This guide familiarizes readers with the solaris10 zone brand, which allows OpenSolaris administrators to create Solaris 10 Containers. The guide explains how the brand affects Solaris development and how developers can enhance the brand so that it can cope with changes to Solaris 10 and OpenSolaris. This document is aimed at all Solaris kernel developers whose work might affect the solaris10 brand's functionality. (The introduction and the section entitled "What kinds of changes to OpenSolaris and Solaris 10 might break the solaris10 brand?" delineate the kinds of projects and fixes that are impacted.)
Note that this is a living document: The guide changes as the Solaris 10 Containers development team receives feedback from readers. Please send questions, comments, suggestions, and corrections to zones-discuss@opensolaris.org. Please ensure that your emails' subject lines start with "S10C Dev Guide" if the emails specifically address this guide.
This section introduces solaris10-branded zones by providing brief overviews of zones, branded zones, what solaris10-branded zones do, and the reasons why Solaris kernel developers should take solaris10-branded zones into consideration when fixing bugs or adding new features to Solaris 10 or OpenSolaris.
Solaris Zones (also known as Solaris Containers or simply zones) are lightweight virtual machines that isolate user-level workloads on Solaris systems. They differ markedly from virtual machines created with other virtualization technologies such as Xen, VirtualBox, and Logical Domains (LDoms) in that zones do not rely on a hypervisor either to provide abstracted hardware resources or to isolate zones from the native host system and from each other. Zones are built into the Solaris kernel, and many of the kernel's subsystems are zone-aware (i.e., they associate their abstractions with zones and base decisions in part on such associations). Consequently, processes executing within zones experience little overhead (a high estimate is 5% of total execution time) and thus come close to achieving bare-metal performance. Furthermore, the lack of overhead makes zones highly scalable: even low-end consumer desktops are capable of running dozens of zones at a time. The low execution overhead experienced by zoned processes, the relative ease of zone creation and management, and the maturity of zones technology make zones one of the most popular virtualization technologies (if not the most popular) supported on Solaris 10 and OpenSolaris.
However, zones have a few well-known, fundamental limitations. Zones cannot host processes running programs compiled for non-native architectures because zones lack hypervisors. Furthermore, not all Solaris kernel subsystems are zone-aware, which limits the kinds of resources that can be fully isolated and supported within zones. Finally, ordinary zones cannot host user environments from non-native operating systems (OSes). The last limitation is largely eliminated through the use of branded zones, which will be discussed shortly.
You can learn more about zones by visiting the OpenSolaris.org Zones Community Page, the Solaris Containers BigAdmin Page, or the zones chapter of the System Administration Guide on docs.sun.com (Solaris 10 only).
Branded zones are zones that are capable of emulating user environments from OSes other than Solaris 10 and OpenSolaris. Branded zones achieve this by emulating the non-native OSes' system calls (syscalls). Syscalls constitute the sole interface between user environments and kernels; therefore, if a branded zone emulates syscalls such that they have the same side effects as the syscalls of a particular non-native OS (e.g., Linux 2.4), then processes running within the zone will act as though they are running on the targeted non-native OS. A branded zone's brand is the collection of support libraries, support hooks, and auxiliary data files that make emulating the zone's targeted OS possible. Brands are named after the OSes whose syscalls they emulate. For example, there are solaris8 and solaris9 brands on Solaris 10 that allow Solaris 10 zones to host Solaris 8 and Solaris 9 user environments, respectively. Zones that host native Solaris user environments (i.e., zones that lack syscall emulation) are native-branded zones (often simply called native zones). (OpenSolaris' native zones currently use the ipkg brand.)
Maintaining brands can be incredibly difficult. Changes in the semantics of Solaris syscalls (i.e., their parameters and side effects) can break the emulation provided by brands if the brands' support libraries are not updated to account for the changes. Similarly, changes in the semantics of a non-native OS's syscalls can break the emulation provided by brands. Maintaining a brand is especially difficult when the user-kernel interfaces exported by both the native Solaris kernel and the hosted non-native OS are continually in flux (as would be the case if we were to maintain a brand for the latest development releases of Linux 2.6 on OpenSolaris).
You can learn more about branded zones and the framework that makes them possible by visiting the OpenSolaris.org BrandZ Community Page, which describes the brand framework and the lx [Linux 2.4] brand on OpenSolaris, or the Solaris Containers BigAdmin Page.
solaris10-branded zones host Solaris 10 (S10) user environments inside zones on OpenSolaris. They are meant to help maintainers of Solaris 10 systems consolidate their production environments onto systems running OpenSolaris. Workloads running within solaris10-branded zones can take advantage of the performance improvements made to the OpenSolaris kernel and utilize some of the innovative technologies available only on OpenSolaris (e.g., Crossbow VNICs). Only Solaris 10u8 and beyond are supported and tested in such zones.
Ultimately, the purpose of solaris10-branded zones is to provide the proper emulation for Solaris 10 processes running inside the zones so that they work correctly with the OpenSolaris kernel. This is summarized in the following principle, which should serve as a guide for enhancing and maintaining solaris10-branded zones: Any script or program that works in native Solaris 10 zones should also work in solaris10-branded zones.
You can learn more about the solaris10 brand and track the project's progress by visiting the OpenSolaris.org solaris10 Brand Project Page.
Because a solaris10-branded zone is running Solaris 10 user-level binaries on top of the OpenSolaris kernel, mismatches in the user-kernel interfaces provided by both systems are possible. This does not happen within normal OpenSolaris zones (i.e., native OpenSolaris zones) because such zones run OpenSolaris user-level binaries, which are built to run in sync with the OpenSolaris kernel. If you make changes to either OpenSolaris or Solaris 10 that impact their user-kernel boundaries, then you will have to take solaris10-branded zones into consideration. If your changes break the emulation provided by solaris10-branded zones, then you will have to enhance the brand's emulation layer so that it can cope with such changes. OpenSolaris community contribution sponsors should ensure that ON contributions will not break the solaris10 brand.
No. The ABI deals with published, well-documented interfaces such as those provided by libc and documented in the section 2 man pages. The solaris10 brand must deal with undocumented, unstable interfaces such as how libc traps into the kernel. Normally changes to such interfaces are hidden within libraries such as libc in a compatible way so that applications don't notice them. However, solaris10-branded zones run the Solaris 10 version of libc (as well as other Solaris 10 libraries), which is not built to work with the OpenSolaris kernel. Therefore, the brand emulation layer is needed to translate between the Solaris 10 user level code and the OpenSolaris kernel code.
Here is a simple example: issetugid(2) is a libc function whose semantics haven't changed between Solaris 10 and OpenSolaris. However, Solaris 10's libc invokes the SYS_issetugid syscall (syscall number 75), which doesn't take arguments, while OpenSolaris' libc invokes the SYS_privsys syscall (number 82), which takes six arguments. Furthermore, the OpenSolaris kernel replaced SYS_issetugid with SYS_sidsys, which has radically different semantics. Consequently, if a Solaris 10 process linked with the Solaris 10 libc running on top of the OpenSolaris kernel were to invoke issetugid(2), then it would issue the wrong syscall. This might result in the Solaris 10 process dumping core or producing incorrect results. The solution is to emulate the Solaris 10 SYS_issetugid syscall so that SYS_privsys is issued instead (with proper arguments, of course).
No. The purpose of solaris10-branded zones is to translate between the old Solaris 10 code running within them and the new OpenSolaris kernel. There are existing proofs of concept with the lx brand running Linux on Solaris 10 and OpenSolaris as well as the solaris8 and solaris9 brands running those releases on Solaris 10. The capability to run Linux on OpenSolaris did not stop developers from creating ZFS, DTrace, or Crossbow (to name just a few prominent examples of Solaris innovation). You can continue to make new, innovative changes to OpenSolaris, but you will have to take the solaris10 brand into account during your development and you may need to modify the brand so that solaris10-branded zones continue to work with your innovation. This guide tells you how to do this.
Changes that cross the user-kernel boundary affect solaris10-branded zones. In other words, beware all changes that cause cap-I Install(1) flag days. The following list details these kinds of changes:
The solaris10 brand's source tree is laid out as follows:
Syscall emulation is what makes the solaris10 brand work. As mentioned in the introduction, the syscall interface (which includes ioctls) is the sole interface by which user processes interact with the kernel. By emulating syscalls, the solaris10 brand can make Solaris 10 processes running within solaris10-branded zones act as though they are communicating with a Solaris 10 kernel.
When a process execs within a solaris10-branded zone, the dynamic linker loads the solaris10 brand emulation library (which resides in the global zone) prior to all other dynamic libraries to which the process' executable is linked. The emulation library initializes its data structures and registers itself with the solaris10 kernel module via a special brand syscall (SYS_brand with a special subcode). Thus the kernel module will know how to transfer control to the emulation library in the event a syscall is issued from the associated process. Once the emulation library finishes initializing, the dynamic linker continues to set up the associated process and transfers control to its start routine.
When a non-native process in a solaris10-branded zone issues a syscall, the following steps occur in order:
The solaris10 brand's emulation library is constructed from usr/src/lib/brand/solaris10/s10_brand/common/s10_brand.c in the source tree.
To emulate a Solaris 10 syscall, you must create a static function in the brand emulation library with the following signature:
where <function-name> is the name of the function and <syscall-args> are the syscall's parameter declarations. The function should return the errno value that the emulated Solaris 10 syscall would produce. You should store the emulated syscall's return value in <rv>.
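To make the convention concrete, here is a minimal, self-contained sketch. It is not code from the brand. sysret_t is a stand-in whose two-field layout is assumed (the real type comes from the Solaris headers), and s10_example() emulates an imaginary one-argument syscall. What it does show is the shape every emulation function shares: the function's return value is the errno the Solaris 10 syscall would produce, and the syscall's own return value is written through rv.

```c
#include <assert.h>
#include <errno.h>

/*
 * Stand-in for the Solaris sysret_t type; the real definition lives in the
 * Solaris headers and may differ.  Assumed here: two return-value slots.
 */
typedef struct {
	long	sys_rval1;
	long	sys_rval2;
} sysret_t;

/*
 * Hypothetical emulation function for a made-up one-argument syscall that
 * returns its argument doubled.  Return value: the errno the Solaris 10
 * syscall would produce (0 on success).  Syscall result: stored in *rv.
 */
static int
s10_example(sysret_t *rv, int arg)
{
	if (arg < 0)
		return (EINVAL);	/* caller sees errno == EINVAL */

	rv->sys_rval1 = 2L * arg;	/* value returned to the caller */
	rv->sys_rval2 = 0;
	return (0);			/* no error */
}
```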
You must make two additional changes, one to the brand emulation library and the other to the solaris10 brand's kernel module (usr/src/uts/common/brand/solaris10/s10_brand.c in the source tree):
In this example, we will emulate the Solaris 10 SYS_sigqueue syscall. We will name the emulation function s10_sigqueue() and define it thus:
(For information about __systemcall(), see "Issuing System Calls within Emulation Functions".)
We need to modify the entry in the brand emulation library's s10_sysent_table array corresponding to SYS_sigqueue (entry 190) so that s10_sigqueue() will be invoked when processes issue SYS_sigqueue. SYS_sigqueue takes four arguments and returns "default" values in Solaris 10, so the correct entry is:
Finally, we need to modify the solaris10 brand kernel module so that it knows to emulate SYS_sigqueue. We would add the following line to the beginning of the _init() function in the kernel module:
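The following sketch models, in ordinary user-level C, how a sysent-style table routes syscall 190 to its emulation function with a fixed argument count. The entry structure, the NULL-means-NOSYS convention, model_dispatch(), and s10_sigqueue_model() are hypothetical stand-ins, not the actual definitions in s10_brand.c. The real s10_sigqueue() reissues the syscall natively instead of returning a canned result.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef struct { long sys_rval1, sys_rval2; } sysret_t;	/* stand-in */

/* Hypothetical table entry: handler plus the number of arguments it takes. */
typedef int (*s10_handler_t)(sysret_t *, long, long, long, long);
typedef struct {
	s10_handler_t	handler;	/* NULL models a NOSYS entry */
	int		nargs;
} s10_sysent_t;

#define	MODEL_SYS_sigqueue	190	/* entry number used in the text */
#define	MODEL_NSYSCALL		256

static int
s10_sigqueue_model(sysret_t *rv, long pid, long signo, long value, long code)
{
	/* A real emulation function would reissue the syscall natively. */
	(void) value;
	(void) code;
	rv->sys_rval1 = 0;
	return ((pid > 0 && signo > 0) ? 0 : EINVAL);
}

static s10_sysent_t model_sysent_table[MODEL_NSYSCALL] = {
	[MODEL_SYS_sigqueue] = { s10_sigqueue_model, 4 },
};

/* Route a syscall number to its emulation function, as the library would. */
static int
model_dispatch(int sysnum, sysret_t *rv, const long args[4])
{
	const s10_sysent_t *ent;

	if (sysnum < 0 || sysnum >= MODEL_NSYSCALL)
		return (ENOSYS);
	ent = &model_sysent_table[sysnum];
	if (ent->handler == NULL)
		return (ENOSYS);	/* the real library sends SIGSYS instead */
	return (ent->handler(rv, args[0], args[1], args[2], args[3]));
}
```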
All ioctls are issued via the SYS_ioctl syscall, which the brand emulation library emulates in s10_ioctl(). All ioctls are currently emulated by first checking the request argument to ioctl(2) (the cmd argument to s10_ioctl()) and taking appropriate action based on its value (usually calling separate emulation functions to handle subsystem- or device-specific ioctls, such as zfs_ioctl() to handle ZFS ioctls). If the argument does not match any emulated ioctls, then the syscall is passed to the OpenSolaris kernel.
Here is an example of s10_ioctl():
Note that the same ioctl command number might be used by two different devices: Checking the command number does not determine which of the two is being controlled. Although most Solaris devices have unique ioctl commands, there is no guarantee that a Solaris device-specific ioctl command is not used by a third-party device. If this is a concern, then ioctl emulation code can gather more information about the targeted device by performing an fstat operation (SYS_fstat) and checking the results.
For example, suppose that we will emulate the contract file system's (CTFS's) CT_TGET and CT_TSET ioctl commands in the emulation function ctfs_ioctl(). Suppose further that we want to ensure that we only emulate these ioctls when they target CTFS files. We can issue a SYS_fstat syscall and check that the type of the file system containing the targeted file is MNTTYPE_CTFS:
(For information about __systemcall(), see "Issuing System Calls within Emulation Functions".)
Notice that the above function assumes that CT_TGET and CT_TSET ioctls should not be emulated if they are not intended for CTFS files. This might not be the case if another device uses either ioctl command and needs to be emulated.
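The two ideas above can be sketched together: switch on the command number, then confirm the target with fstat before emulating. Everything here is a stand-in so that the example runs outside Solaris. The command values are hypothetical, model_fstat() fakes fstat(2), and model_stat_t carries only a file system type name modelled on the st_fstype field of the Solaris stat structure.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Stand-ins: the real code uses Solaris' struct stat and MNTTYPE_CTFS. */
#define	MODEL_MNTTYPE_CTFS	"ctfs"
#define	MODEL_CT_TGET		1	/* hypothetical command numbers */
#define	MODEL_CT_TSET		2
#define	MODEL_OTHER_CMD		99

typedef struct {
	char	st_fstype[16];	/* file system type name */
} model_stat_t;

/* Hypothetical fstat stand-in: fd 3 is a ctfs file, everything else ufs. */
static int
model_fstat(int fd, model_stat_t *st)
{
	if (fd < 0)
		return (EBADF);
	(void) strcpy(st->st_fstype, fd == 3 ? MODEL_MNTTYPE_CTFS : "ufs");
	return (0);
}

enum ioctl_route { ROUTE_NATIVE, ROUTE_CTFS_EMULATION, ROUTE_ERROR };

/*
 * Decide how an ioctl should be handled: switch on the command first, then
 * confirm with fstat that a CT_TGET/CT_TSET request really targets ctfs.
 */
static enum ioctl_route
model_route_ioctl(int fd, int cmd)
{
	model_stat_t st;

	switch (cmd) {
	case MODEL_CT_TGET:
	case MODEL_CT_TSET:
		if (model_fstat(fd, &st) != 0)
			return (ROUTE_ERROR);
		if (strcmp(st.st_fstype, MODEL_MNTTYPE_CTFS) == 0)
			return (ROUTE_CTFS_EMULATION);
		return (ROUTE_NATIVE);	/* same number, different device */
	default:
		return (ROUTE_NATIVE);	/* not emulated: pass to the kernel */
	}
}
```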
The following sections detail common techniques used in syscall and ioctl emulation functions.
Many emulation functions need to issue syscalls to the native OpenSolaris kernel. To do so, invoke the __systemcall() function as follows:
<SYS-identifier> is the numeric code of the syscall being issued, as defined in usr/src/uts/common/sys/syscall.h (e.g., SYS_fstat), and <syscall-args> are the arguments to the syscall. The syscall's return value is stored in <rv>, which has type sysret_t *. __systemcall() itself returns zero if no error occurred; otherwise it returns the errno value produced by the syscall.
Notice that you should add 1024 to the syscall's numeric identifier. The brand framework treats all syscalls whose identifiers are less than 1024 as emulated syscalls and will bounce them back to the brand emulation library, whereas syscalls whose identifiers are offset by 1024 are treated as native syscalls and are not emulated. If you do not add 1024 to the identifier, then the brand emulation library will handle the syscall, which is probably not what you want. Not adding 1024 might also cause infinite recursion if the emulation function inadvertently invokes itself while issuing the syscall, resulting in stack overflows and, ultimately, core dumps.
Refer to the example emulation function s10_sigqueue() provided in the last section, which emulates the SYS_sigqueue syscall:
In Solaris 10, SYS_sigqueue takes four arguments, but the OpenSolaris version takes five. The new fifth argument is a flag used by OpenSolaris' AIO subsystem and should be zero when SYS_sigqueue is issued from within a solaris10-branded zone. Reissuing the syscall within the emulation function with a zeroed fifth argument solves the problem. The above code issues a native SYS_sigqueue syscall (notice that the function offsets the syscall identifier by 1024), passing all of the arguments provided by the calling process untouched and adding a zero as the fifth argument. The return value of the syscall is stored in rval, which is handed to the calling process when the emulation function completes. The error code (if any) produced by the syscall is returned to the calling process.
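A minimal model of the 1024 offset and the extra argument follows. model_systemcall() is a hypothetical stand-in for __systemcall(). Identifiers below 1024 are counted as calls that would bounce back into the emulation library, while offset identifiers are recorded as native calls. s10_sigqueue_sketch() forwards the caller's four arguments untouched (the count given for the Solaris 10 SYS_sigqueue entry earlier in this guide) and appends a zero flag.

```c
#include <assert.h>
#include <stdarg.h>

typedef struct { long sys_rval1, sys_rval2; } sysret_t;	/* stand-in */

#define	MODEL_NATIVE_OFFSET	1024
#define	MODEL_SYS_sigqueue	190

/* Record of the last "native" call, so the behaviour can be checked. */
static int	last_sysnum = -1;
static long	last_args[5];
static int	emulated_calls;

/*
 * Stand-in for __systemcall(): numbers below 1024 would bounce back to the
 * emulation library (only counted here); numbers offset by 1024 reach the
 * "native kernel" (recorded here).  Native calls take five long arguments.
 */
static int
model_systemcall(sysret_t *rv, int sysnum, ...)
{
	va_list ap;
	int i;

	if (sysnum < MODEL_NATIVE_OFFSET) {
		emulated_calls++;	/* would re-enter the brand library */
		return (0);
	}
	last_sysnum = sysnum - MODEL_NATIVE_OFFSET;
	va_start(ap, sysnum);
	for (i = 0; i < 5; i++)
		last_args[i] = va_arg(ap, long);
	va_end(ap);
	rv->sys_rval1 = 0;
	return (0);
}

/* Pass the four Solaris 10 arguments through and append a zero flag. */
static int
s10_sigqueue_sketch(sysret_t *rv, long pid, long signo, long value, long code)
{
	return (model_systemcall(rv, MODEL_SYS_sigqueue + MODEL_NATIVE_OFFSET,
	    pid, signo, value, code, 0L));
}
```

Dropping the offset in s10_sigqueue_sketch() would send the call down the "emulated" branch instead. In the real library, that branch would reach the emulation function again, which is the recursion hazard described above.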
No truss points are triggered when a syscall is emulated by the brand library. If a native syscall is issued from an emulation function (see "Issuing System Calls within Emulation Functions" above), then the native syscall's truss point is triggered. However, if the emulation function does not issue a native syscall, then you should insert a truss point via the S10_TRUSS_POINT_* macros. These macros issue a SYS_brand syscall in order to simulate a truss point.
The macros have the following signature:
N is an integer between one and five (inclusive) that specifies the number of arguments to report in the truss point. <rv> is a non-NULL sysret_t * that stores the return value of the syscall that performs the truss operation. <SYS-identifier> is the numeric identifier of the syscall being issued, as defined in usr/src/uts/common/sys/syscall.h (e.g., SYS_fstat). <errno-value> is an integer; if it is nonzero, the SYS_brand syscall that simulates the truss point stores it in the calling thread's errno. <arguments> is a comma-separated list of N values to report in the truss point.
The macros return zero for success and an errno error code for failure.
In this example, SYS_systeminfo is emulated entirely in the brand emulation library by the function s10_sysinfo(). If a process running in a solaris10-branded zone issues SYS_systeminfo and an instance of truss is observing the process' syscalls, then the latter won't see the SYS_systeminfo syscall because s10_sysinfo() never issues a native syscall. Creating a simulated truss point solves this problem:
Notice that the function passes zero as <errno-value> in the truss macro in order to tell observing truss processes that the syscall completed successfully.
If you need to copy data to or from a buffer or structure provided by the calling process, then you should use s10_uucopy() and s10_uucopystr(). Both prevent the emulation library from performing illegal memory accesses if the calling process provides junk pointers. The signatures of the two functions are identical:
s10_uucopy() copies size bytes from the address from to the address to. s10_uucopystr() behaves like strncpy(3C) in that it copies at most size characters from the string at from to the buffer at to, but unlike strncpy(3C) it never writes a terminating null byte. Both functions indicate success by returning zero and failure by returning an errno value (e.g., EFAULT).
This example expands the example of inserting truss points, which showcased part of the emulation function for SYS_systeminfo.
When the calling process requests either the release (SI_RELEASE) or the version (SI_VERSION) of the kernel, the emulation function produces fake values so that the calling process will see values reflecting a Solaris 10 kernel. Once the appropriate string is selected, s10_uucopystr() copies it to the buffer provided by the calling process. However, s10_uucopystr() does not copy the string's terminating null byte, so the function invokes s10_uucopy() to append a null byte to the end of the buffer.
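The copy-then-terminate pattern can be sketched with user-level stand-ins. model_uucopy() and model_uucopystr() only imitate the documented contracts of s10_uucopy() and s10_uucopystr(): they return zero or an errno value, and the string copy never writes a terminating null byte. Unlike the real functions, they can only catch NULL pointers, not arbitrary bad ones. model_copy_release() is a hypothetical, simplified version of the SI_RELEASE case. It writes the Solaris 10 release string "5.10" and truncates it if the buffer is too small.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Stand-in for s10_uucopy(): the real one turns bad pointers into EFAULT. */
static int
model_uucopy(const void *from, void *to, size_t size)
{
	if (from == NULL || to == NULL)
		return (EFAULT);
	(void) memcpy(to, from, size);
	return (0);
}

/* Copies at most size characters and never writes a terminating NUL. */
static int
model_uucopystr(const void *from, void *to, size_t size)
{
	const char *src = from;
	char *dst = to;
	size_t i;

	if (from == NULL || to == NULL)
		return (EFAULT);
	for (i = 0; i < size && src[i] != '\0'; i++)
		dst[i] = src[i];
	return (0);
}

/* Copy the fake release string, then append the NUL in a separate step. */
static int
model_copy_release(char *buf, size_t bufsize)
{
	static const char release[] = "5.10";	/* Solaris 10 release */
	size_t len = strlen(release);
	int err;

	if (bufsize == 0)
		return (EINVAL);
	if (len >= bufsize)
		len = bufsize - 1;		/* leave room for the NUL */
	if ((err = model_uucopystr(release, buf, len)) != 0)
		return (err);
	return (model_uucopy("", buf + len, 1));	/* append the NUL */
}
```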
Some syscalls' and ioctls' parameters' structures and semantics differ between Solaris 10 and OpenSolaris. For example, the contract file system's ct_param_t structure changed in OpenSolaris so that it looks like
instead of
Notice that the third field, ctpm_value, changed from uint64_t to void *. Unfortunately, this means that the OpenSolaris structure is 12 bytes long on 32-bit systems and 16 bytes long on 64-bit systems, whereas the Solaris 10 structure is 16 bytes long on both. Solaris 10 processes issuing CT_TGET or CT_TSET contract file system ioctls pass the Solaris 10 version of the structure as the ioctl argument, which does not match the OpenSolaris layout (most visibly on 32-bit OpenSolaris systems, where even the structure's size differs).
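The size difference is easy to check with two model structures. Only ctpm_value and its change of type come from the text above. The names and types of the first two fields are assumptions, and the _model_t type names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
	uint32_t	ctpm_id;	/* assumed field */
	uint32_t	ctpm_pad;	/* assumed field */
	uint64_t	ctpm_value;	/* Solaris 10: the value itself */
} s10_ct_param_model_t;

typedef struct {
	uint32_t	ctpm_id;	/* assumed field */
	uint32_t	ctpm_size;	/* assumed field */
	void		*ctpm_value;	/* OpenSolaris: a pointer */
} osol_ct_param_model_t;
```

On a 64-bit build both structures are 16 bytes, so a size check alone would not reveal that ctpm_value now holds a pointer rather than the value itself. On a 32-bit build the OpenSolaris layout shrinks to 12 bytes.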
Other examples of incompatible structure changes include new fields, deleted fields, new flag field values, and field value range changes.
Structure changes can be overcome as follows:
1. Define the Solaris 10 version of the structure in the brand emulation library.
2. In the associated emulation function, declare a stack instance (local variable) of the OpenSolaris version of the structure.
3. Copy the contents of the structure parameter to the local structure variable in the appropriate places (see "Copying Memory" above).
4. Adjust the fields of the structure on the stack as necessary.
5. Issue the syscall or ioctl, passing the OpenSolaris structure instead of the Solaris 10 structure.
6. If the syscall or ioctl modified any of the OpenSolaris structure's fields, copy the modified fields back to the Solaris 10 structure.
Sometimes a stack instance of the Solaris 10 structure must also be declared and used; see the example below for a case in which this is necessary.
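The following sketch walks through those steps for a CT_TGET-style request. It rests on several assumptions. The OpenSolaris structure's ctpm_value is taken to point at a caller-supplied buffer of ctpm_size bytes. native_ioctl_get() is a hypothetical stand-in for the native ioctl, and memcpy() stands in for s10_uucopy(). The sketch also uses stack instances of both structures, which is the case mentioned above in which the Solaris 10 structure must be copied locally too.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

typedef struct {		/* Solaris 10 layout (value stored inline) */
	uint32_t	ctpm_id;
	uint32_t	ctpm_pad;
	uint64_t	ctpm_value;
} s10_param_t;

typedef struct {		/* OpenSolaris layout (value behind a pointer) */
	uint32_t	ctpm_id;
	uint32_t	ctpm_size;
	void		*ctpm_value;
} native_param_t;

/* Hypothetical native "kernel" holding one 64-bit parameter. */
static uint64_t kernel_param = 0x2a;

static int
native_ioctl_get(native_param_t *p)
{
	if (p->ctpm_size < sizeof (uint64_t))
		return (EINVAL);
	(void) memcpy(p->ctpm_value, &kernel_param, sizeof (uint64_t));
	return (0);
}

/* Copy in, translate, call the native interface, copy the result out. */
static int
emulate_get(s10_param_t *user_arg)
{
	s10_param_t s10p;	/* stack copy of the caller's structure */
	native_param_t np;	/* stack instance of the native structure */
	uint64_t value;
	int err;

	(void) memcpy(&s10p, user_arg, sizeof (s10p));	/* s10_uucopy() */
	np.ctpm_id = s10p.ctpm_id;
	np.ctpm_size = sizeof (value);
	np.ctpm_value = &value;
	if ((err = native_ioctl_get(&np)) != 0)
		return (err);
	s10p.ctpm_value = value;	/* fold the result back in */
	(void) memcpy(user_arg, &s10p, sizeof (s10p));	/* s10_uucopy() */
	return (0);
}
```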
The following example shows how the aforementioned ct_param_t structure change can be circumvented by the brand emulation library. ctfs_ioctl() emulates the contract file system CT_TGET and CT_TSET ioctls in the emulation library. Here is its definition:
The solaris10 brand emulation library is compiled twice for both x86 and sparc in order to produce 32- and 64-bit shared libraries. Consequently, it is possible to restrict pieces of code in the emulation library to particular architectures via the standard architecture preprocessor symbols: __x86 for 32- or 64-bit x86; __i386 for 32-bit x86; __amd64 for 64-bit x86 (i.e., x86-64); __sparc for any sparc architecture; __sparcv7, __sparcv8, and __sparcv9 for sparc V7, V8, and V9, respectively; and _LP64 for any 64-bit architecture. For example, suppose that a particular ioctl only needs to be emulated when issued by a 64-bit x86 process. We can implement such emulation by wrapping the emulation code (and its invocations) with preprocessor conditional statements:
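A minimal sketch of that wrapping, using the __amd64 symbol from the list above. s10_demo_ioctl() and route_demo_ioctl() are hypothetical. The point is that the emulation function and the code that invokes it are guarded by the same conditional, so builds for other architectures contain neither the function nor a call to it.

```c
#include <assert.h>

#if defined(__amd64)
/* 64-bit x86 only: stands in for real ioctl emulation. */
static int
s10_demo_ioctl(int cmd)
{
	return (cmd + 1000);
}
#endif

static int
route_demo_ioctl(int cmd)
{
#if defined(__amd64)
	return (s10_demo_ioctl(cmd));	/* emulated on amd64 builds */
#else
	return (cmd);			/* passed through unchanged elsewhere */
#endif
}
```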
However, care must be taken when restricting syscall emulation to specific architectures. If the syscall emulation function will only exist for some architectures, then the s10_sysent_table array in the emulation library should be configured such that it specifies NOSYS for the architectures for which the syscall will not be emulated. The brand's kernel module's s10_emulation_table should also be properly configured so that the syscall is only emulated for those architectures for which emulation is necessary. (For more information about s10_sysent_table and s10_emulation_table, see "Emulating a System Call" above.) For example, if SYS_fstat were to be emulated on sparc alone, then we should define the emulation function thus:
We would have to modify s10_sysent_table as follows (note that SYS_fstat is syscall number 28):
Finally, we would have to modify _init() in the kernel module thus:
If a syscall will only be emulated on one or more 64-bit architectures, then an emulation function usually has to be created for 32-bit architectures that only passes the syscall's arguments to the native kernel. For example, if SYS_fstat should only be emulated by 64-bit x86 processes, then we should define the emulation function thus:
We would have to modify s10_sysent_table as follows:
This specifies that SYS_fstat will be emulated by s10_fstat() on both 32- and 64-bit x86 but not on sparc. Finally, we would have to modify _init() in the kernel module thus:
The above code ensures that only the 64-bit x86 kernel module will pass SYS_fstat syscalls to the brand emulation library. However, both 32- and 64-bit processes are capable of running on 64-bit kernels and the kernel brand framework does not discriminate between syscalls issued by 32- and 64-bit processes. Consequently, s10_sysent_table must be configured so that SYS_fstat is handled by the brand emulation library for both 32- and 64-bit processes even though the 32-bit emulation function simply hands control back to the kernel. If s10_sysent_table were instead configured thus:
then, if a 32-bit x86 process running on a 64-bit kernel issued SYS_fstat, the brand's kernel module would pass the syscall to the brand emulation library, and the library would signal the calling process with SIGSYS (bad syscall) because the 32-bit library does not emulate SYS_fstat.
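The configuration described in the last few paragraphs can be modelled for a single build of the library. The SYS_fstat slot gets a handler on both 32- and 64-bit x86 (a pass-through on 32-bit) and stays empty everywhere else, which stands in for NOSYS. The table, handler type, and return values are hypothetical; only the syscall number 28 and the per-architecture split come from the text. __amd64 and __i386 are tested directly because compilers outside Solaris do not necessarily define __x86.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_t)(int fd);

#if defined(__amd64)
static int
s10_fstat(int fd)
{
	return (fd + 100);	/* stands in for real 64-bit emulation */
}
#elif defined(__i386)
static int
s10_fstat(int fd)
{
	return (fd);		/* 32-bit: just hand the call back */
}
#endif

#define	MODEL_SYS_fstat	28

/*
 * SYS_fstat has a handler on both x86 flavours, so 32-bit processes on a
 * 64-bit kernel still find one; elsewhere the slot stays NULL (NOSYS).
 */
static handler_t model_sysent[64] = {
#if defined(__amd64) || defined(__i386)
	[MODEL_SYS_fstat] = s10_fstat,
#endif
	[0] = NULL,
};
```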
Keep the following considerations in mind when you modify the emulation library:
Modify the solaris10 brand boot script (usr/src/lib/brand/solaris10/zone/s10_boot.ksh in the source tree) as follows:
The brand boot script executes whenever a solaris10-branded zone boots. Once zone bootup is complete, every replaced command X will have been backed up to X.pre_p2v. If you need to restore the replaced command's binary within a zone, then execute mv X.pre_p2v X within the zone, where X is the full path of the replaced command.
Here is an example that replaces /usr/bin/zcat with OpenSolaris' /usr/bin/zcat (with owner root, group bin, and UNIX file permissions 0555) and /usr/demo/dtrace/kstat.d with OpenSolaris' /usr/bin/true (with owner root, group bin, and UNIX file permissions 0644):
Some commands in /usr/bin, such as /usr/bin/pmap, are split into 32- and 64-bit versions that reside in architecture-dependent subdirectories of /usr/bin. For example, the 32-bit version of pmap resides in /usr/bin/i86 while the 64-bit version resides in /usr/bin/amd64 on x86 systems. Sparc systems only have a 64-bit version of pmap: /usr/bin/sparcv9/pmap. To replace a split command such as pmap, you must replace both the 32-bit binary (if it exists) and the 64-bit binary.
Suppose that you want to replace the split command /usr/bin/X with the OpenSolaris version of /usr/bin/X. Then skip steps four and five above and add the following lines below "STEP TWO" instead:
<mode>, <owner>, and <group> have the same semantics as in steps four and five above.
Here is an example that replaces the split command /usr/bin/pmap:
If the replacement command's path differs from that of the replaced command, then add the following lines below "STEP TWO" instead:
X is the name of the replaced command. <32-bit-replacement-abs-path> and <64-bit-replacement-abs-path> are the absolute paths of the replacement commands for the 32- and 64-bit binaries (respectively).
Any OpenSolaris changes that require updates to the brand's emulation must ensure that Solaris 10u8 and later will function correctly inside of solaris10-branded zones. This is simply a matter of straightforward compatibility and no special versioning support is needed.
Versioning becomes a factor when making changes to Solaris 10 code. A single OpenSolaris system can host solaris10-branded zones running different releases of Solaris 10 (u8, u9, etc.). For example, an OpenSolaris system might have two zones foo and bar that host Solaris 10u8 and Solaris 10u9 environments, respectively. This is problematic because different Solaris 10 releases might require different emulation due to backports and other Solaris 10 changes. The solaris10 brand overcomes this problem through a basic versioning system.
Note that enhancing the brand's emulation library to dynamically detect the presence or absence of features in hosted Solaris 10 environments is preferable to versioning. You can see an example of this in s10_lwp_private(), which determines whether a zone's libc works when the %fs x86 segment register is zero. (Solaris 10u8 sets %fs to a nonzero value in 64-bit x86 processes whereas OpenSolaris clears it.) If it does not, then s10_lwp_private() changes the current LWP's %fs register to a nonzero value and adjusts the brand emulation library so that future LWPs created via the SYS_lwp_create syscall will have nonzero %fs registers; otherwise, the emulation library does not change %fs. Read the subsection entitled "Dynamic Feature Detection: Sysent Table Patching" to learn about techniques that improve the performance of emulation functions that utilize dynamic feature detection.
In order to use versioning, you must change OpenSolaris thus:
For example, suppose that you need to backport a fix to Solaris 10u9 that would eliminate the need for MNTFS ioctl emulation. You would have to add a new constant to s10_emulated_features representing the backport. If you were to name the constant S10_FEATURE_ALTERED_MNTFS_IOCTL, then you would have to add it to s10_emulated_features thus:
If your addition were to conflict with another developer's putback because the developer also added a constant (say, S10_FEATURE_SOME_BACKPORT), then you would have to merge the changes thus:
Emulation functions can easily detect fixes and backports inside solaris10-branded zones via the S10_FEATURE_IS_PRESENT() macro. S10_FEATURE_IS_PRESENT(N) is true iff the fix or backport represented by the s10_emulated_features constant N is present in the associated zone's Solaris 10 environment. Armed with this macro and your new s10_emulated_features constant, you can make the emulation library conditionally emulate syscalls or ioctls (or both).
Take the aforementioned example of a backport that obsoletes MNTFS ioctl emulation. You cannot delete the emulation because older Solaris 10 environments (e.g., u8 environments) will need the emulation in order to function properly; therefore, the emulation library should emulate the ioctl iff the Solaris 10 backport is not present. You should make the ioctl's emulation look something like this:
On the other hand, suppose that the backport represented by S10_FEATURE_ALTERED_MNTFS_IOCTL would require MNTFS ioctl emulation that did not previously exist. Swapping the above example's branches would achieve the desired effect: The MNTFS ioctl would be emulated iff S10_FEATURE_IS_PRESENT(S10_FEATURE_ALTERED_MNTFS_IOCTL) were true.
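The conditional emulation above can be modelled as follows. The enumeration, the per-zone array behind S10_FEATURE_IS_PRESENT(), and route_mntfs_ioctl() are hypothetical; the guide does not show how the real macro determines which features a zone provides. The model keeps the convention that S10_NUM_EMULATED_FEATURES is the count of constants, so with one feature defined it equals one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of s10_emulated_features; the count is always last. */
enum s10_emulated_features {
	S10_FEATURE_ALTERED_MNTFS_IOCTL,
	S10_NUM_EMULATED_FEATURES
};

/* Filled in per zone; a plain array stands in for that detection here. */
static bool s10_feature_present[S10_NUM_EMULATED_FEATURES];

#define	S10_FEATURE_IS_PRESENT(n)	(s10_feature_present[(n)])

enum mntfs_route { MNTFS_EMULATE, MNTFS_NATIVE };

/* Emulate the old MNTFS ioctl only when the backport is absent. */
static enum mntfs_route
route_mntfs_ioctl(void)
{
	if (S10_FEATURE_IS_PRESENT(S10_FEATURE_ALTERED_MNTFS_IOCTL))
		return (MNTFS_NATIVE);
	return (MNTFS_EMULATE);
}
```

Swapping the two return statements gives the opposite case, in which the backport introduces emulation that did not previously exist.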
You must do the following as part of your Solaris 10 fix or backport after you finish modifying the solaris10 brand's emulation:
zoneadm(1M) will refuse to boot solaris10-branded zones that host Solaris 10 environments containing unrecognized numbered files in /usr/lib/brand/solaris10. In other words, solaris10-branded zones hosting newer versions of Solaris 10 will not boot if the solaris10 brand does not provide sufficiently up-to-date emulation (i.e., OpenSolaris is not sufficiently up-to-date). For example, the brand will not permit a solaris10-branded zone containing /usr/lib/brand/solaris10/5 to boot when the brand only recognizes s10_emulated_features constants that are less than five (i.e., S10_NUM_EMULATED_FEATURES equals five). This prevents users from hosting Solaris 10 environments that cannot be correctly emulated.
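The boot-time rule can be expressed as a small predicate. zone_is_bootable() is a hypothetical illustration of the rule as stated, not zoneadm(1M)'s implementation. It ignores names that are not plain numbers and rejects any number that is not below the recognized feature count (five, as in the example above).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define	MODEL_NUM_EMULATED_FEATURES	5	/* value used in the text */

/*
 * Given the names of the files found in a zone's /usr/lib/brand/solaris10,
 * refuse to boot if any numbered file is not a recognized feature constant.
 */
static bool
zone_is_bootable(const char *const names[], size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		char *end;
		long n;

		errno = 0;
		n = strtol(names[i], &end, 10);
		if (end == names[i] || *end != '\0' || errno != 0)
			continue;	/* not a numbered file: ignore it */
		if (n < 0 || n >= MODEL_NUM_EMULATED_FEATURES)
			return (false);	/* unknown feature: do not boot */
	}
	return (true);
}
```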
Suppose that you need to change a device's ioctl in OpenSolaris and backport the change to Solaris 10u9 at the same time. Furthermore, suppose that the device is used in native Solaris 10 zones. Then you will have to change OpenSolaris thus:
Additionally, you should do the following in the Solaris 10 backport:
Syscall emulation functions can dynamically adjust syscalls' s10_sysent_table entries to make the emulation library invoke different syscall emulation functions. This technique, called sysent table patching, eliminates repeated dynamic feature detection (e.g., probing zones' environments for indications of whether a fix or backport is present) at the cost of coding and maintaining multiple emulation functions.
s10_lwp_private() uses sysent table patching. As described above, s10_lwp_private() determines whether the Solaris 10 libc can function when the %fs x86 segment register is zero. If libc cannot, then s10_lwp_private() modifies the SYS_lwp_create syscall's entry in s10_sysent_table so that s10_lwp_create_correct_fs emulates SYS_lwp_create syscalls rather than the default handler, s10_lwp_create. s10_lwp_create_correct_fs ensures that new LWPs start in s10_lwp_create_entry_point, which sets the new LWP's %fs register to the legacy nonzero Solaris 10 selector value before making the LWP jump to its true entry point. s10_lwp_create simply hands the SYS_lwp_create syscall to the kernel. s10_lwp_private() is invoked once after a process execs and only while the process is single-threaded; therefore, its sysent table patch, if applied, affects all of the process' SYS_lwp_create syscalls.
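Sysent table patching reduces to swapping a function pointer once. This model is a hypothetical simplification of the mechanism described above. lwp_create_entry stands in for the SYS_lwp_create slot of s10_sysent_table, and model_lwp_private() receives the result of the feature probe as a parameter instead of inspecting libc.

```c
#include <assert.h>
#include <stdbool.h>

typedef int (*lwp_create_fn)(int);

static int lwp_creates_fixed;	/* calls that went through the patched path */

/* Default handler: hand SYS_lwp_create to the kernel unchanged (modelled). */
static int
s10_lwp_create(int arg)
{
	return (arg);
}

/* Alternate handler: would route new LWPs through an %fs-fixing entry. */
static int
s10_lwp_create_correct_fs(int arg)
{
	lwp_creates_fixed++;
	return (arg);
}

/* One-entry stand-in for the SYS_lwp_create slot of s10_sysent_table. */
static lwp_create_fn lwp_create_entry = s10_lwp_create;

/*
 * Run once while the process is still single-threaded: act on the feature
 * probe a single time and patch the table so later calls skip the check.
 */
static void
model_lwp_private(bool libc_needs_nonzero_fs)
{
	if (libc_needs_nonzero_fs)
		lwp_create_entry = s10_lwp_create_correct_fs;
}
```

After the patch, every later call goes straight to the chosen handler, with no repeated probing.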
In general, do the following when you want a syscall to use sysent table patching:
For example, suppose that SYS_fstat needs to be emulated when a zone's Solaris 10 environment contains the fix represented by a hypothetical s10_emulated_features constant named S10_FEATURE_FSTAT_CHANGE. The procedure above would then be applied as follows:
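The following C sketch shows how the hypothetical S10_FEATURE_FSTAT_CHANGE example might look. It assumes that a backported fix is detected by the presence of a file in /usr/lib/brand/solaris10 named after the fix's s10_emulated_features constant, as the numbered files described earlier suggest. Everything here is illustrative. The enum holds only the one hypothetical constant. s10_feature_present(), the two fstat handlers, SKETCH_SYS_fstat, and the one-entry table are invented names and do not reflect the brand's real code.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical feature list. A real list would contain every emulated
 * feature, with new constants added just before the count.
 */
typedef enum s10_emulated_features {
	S10_FEATURE_FSTAT_CHANGE,	/* the text's hypothetical fix */
	S10_NUM_EMULATED_FEATURES
} s10_emulated_features_t;

typedef int (*s10_emul_func_t)(void);

#define	SKETCH_SYS_fstat	0	/* invented syscall index */

/* Stand-in handlers; the return values only identify which one ran. */
int
s10_fstat_passthrough(void)
{
	return (0);	/* would hand SYS_fstat to the kernel unchanged */
}

int
s10_fstat_emulated(void)
{
	return (1);	/* would perform the emulation the fix requires */
}

/* One-entry stand-in for s10_sysent_table; defaults to pass-through. */
s10_emul_func_t s10_sysent_table[] = { s10_fstat_passthrough };

/*
 * Returns 1 if <root>/usr/lib/brand/solaris10/<feature> exists, i.e.,
 * if the zone's Solaris 10 environment is assumed to contain the fix.
 */
int
s10_feature_present(const char *root, s10_emulated_features_t feature)
{
	char path[1024];
	struct stat st;
	int n;

	assert(feature < S10_NUM_EMULATED_FEATURES);
	n = snprintf(path, sizeof (path), "%s/usr/lib/brand/solaris10/%d",
	    root, (int)feature);
	if (n < 0 || (size_t)n >= sizeof (path))
		return (0);
	return (stat(path, &st) == 0);
}

/* Probe once at startup and patch the SYS_fstat entry if needed. */
void
s10_init_fstat(const char *root)
{
	if (s10_feature_present(root, S10_FEATURE_FSTAT_CHANGE))
		s10_sysent_table[SKETCH_SYS_fstat] = s10_fstat_emulated;
}
```

Inside the zone, the probe would run against the zone's own root (an empty root string yields /usr/lib/brand/solaris10/0). From the global zone, you could pass <zonepath>/root instead.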
Please see the section entitled "How is versioning handled for different Solaris 10 updates or patches running in the zone when they need different emulation?" for the full backporting and conditional emulation procedure.
There are two ways to test your OpenSolaris fixes and projects with solaris10-branded zones:
The DIY test page at diy,ireland includes zone tests under Category:Zones. Running these tests automatically runs the tests for solaris10-branded zones as well.
Your system must have the SUNWs10brand package installed. You can install it on OpenSolaris with the pkg(5) command, for example by running pkg install SUNWs10brand.
The OpenSolaris.org development repository publishes the SUNWs10brand package.
Use zonecfg(1M) to configure a new solaris10-branded zone. Be sure to specify the SUNWsolaris10 template when issuing the create subcommand (that is, create -t SUNWsolaris10). For example:
Each zone configuration must specify a zonepath; every other property is optional. Note that the zonepath of a solaris10-branded zone must reside on a ZFS filesystem. The following is an example of a minimal configuration for a solaris10-branded zone:
Once you have configured a solaris10-branded zone, install it with the zoneadm(1M) install subcommand, using either the -a or the -d option. -a <archive-path> specifies an archive of an installed Solaris 10 system (either a physical system or a native Solaris 10 zone) whose files will be installed into the new zone; the archive can be a flash_archive(4), a cpio(1) archive (optionally compressed with gzip(1) or bzip2(1)), a pax(1) "xustar" archive, or a level zero ufsdump(1M). -d <s10-system-root-path> specifies the full path of the root directory of an installed Solaris 10 system (again, either a physical system or a native Solaris 10 zone) whose files will be installed into the new zone. You must also specify either -p, to preserve the virtualized system's configuration, or -u, to run sys-unconfig(1M) within the zone after it is installed. The following example installs the zone configured above from a flash archive of a physical Solaris 10 update 8 system and specifies that sys-unconfig(1M) should be run within the zone:
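The option rules for the install subcommand can be summarized as a small hypothetical check. It does not reproduce zoneadm(1M)'s argument parsing. It only encodes the two constraints stated above, assuming that each pair of choices is mutually exclusive: one installation source (-a or -d) and one configuration choice (-p or -u).

```c
#include <stdbool.h>

/*
 * Hypothetical check of the install rules described above: exactly one
 * source (-a archive or -d directory) and exactly one configuration
 * choice (-p preserve or -u sys-unconfig).
 */
bool
s10_install_opts_valid(bool has_a, bool has_d, bool has_p, bool has_u)
{
	bool one_source = (has_a != has_d);	/* -a xor -d */
	bool one_config = (has_p != has_u);	/* -p xor -u */

	return (one_source && one_config);
}
```

For instance, the flash-archive example above (-a with -u) passes this check, while a command line giving both -a and -d, or neither -p nor -u, does not.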
The solaris10(5) man page contains more information about configuring and installing solaris10-branded zones.
A variety of pre-built Solaris 10u8 images are available within Sun for testing. They are located in /net/kodiak.sfbay/gates/s10brand/public/images.
Boot the zone via the zoneadm(1M) boot subcommand:
Once the zone boots, you can log into it with the zlogin(1) command. If you specified -u when you installed the zone, connect to the zone's console with zlogin -C and step through the sys-unconfig(1M) screens. For example:
The zone's filesystem is accessible from the global zone via <zonepath>/root, where <zonepath> is the zone's path as specified in its configuration.
Once you boot a solaris10-branded zone, you can test your changes in two ways. These options are not mutually exclusive; in fact, doing both is highly recommended:
If your changes affect only a few binaries (e.g., your changes alter private interfaces between special libraries and the kernel), then it is usually sufficient to test the affected binaries in your zone. For example, if your changes alter a ZFS ioctl, then you can execute the relevant ZFS commands in your zone to ensure that the ioctl functions properly. If your changes affect a syscall, then you can write and run tests that invoke library functions that issue the syscall. If you are using native command replacement, log into your zone and exercise the replacement binary: if it is a command, test its CLI and the commands that fork it; if it is a daemon, test the commands that communicate with it. If the interfaces affected by your changes are thread-safe in Solaris 10, you should also consider running multithreaded tests that stress them.
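As an illustration of the kind of multithreaded test suggested above, the following sketch calls fstat(2) on one file from several threads at once and counts calls that fail or report an unexpected size. fstat(2) is only a stand-in for whatever syscall your change affects. run_fstat_stress() and its 64-thread limit are choices made for this sketch. You would build it as a Solaris 10 binary and run it inside the solaris10-branded zone.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define	MAX_STRESS_THREADS	64

/* Per-thread arguments and result. */
struct stress_arg {
	int	fd;		/* descriptor shared by all threads */
	off_t	expected;	/* size every fstat() call should report */
	int	iters;		/* fstat() calls made by this thread */
	int	failures;	/* calls that failed or reported a wrong size */
};

static void *
stress_worker(void *p)
{
	struct stress_arg *a = p;
	struct stat st;
	int i;

	for (i = 0; i < a->iters; i++) {
		if (fstat(a->fd, &st) != 0 || st.st_size != a->expected)
			a->failures++;
	}
	return (NULL);
}

/*
 * Call fstat() on one file from nthreads threads at once. Returns the
 * total number of failed or inconsistent calls (0 means success), or
 * -1 if the test could not be set up.
 */
int
run_fstat_stress(const char *path, int nthreads, int iters)
{
	pthread_t tids[MAX_STRESS_THREADS];
	struct stress_arg args[MAX_STRESS_THREADS];
	struct stat st;
	int fd, i, started = 0, total = 0;

	assert(path != NULL);
	if (nthreads < 1 || nthreads > MAX_STRESS_THREADS)
		return (-1);
	if ((fd = open(path, O_RDONLY)) < 0)
		return (-1);
	if (fstat(fd, &st) != 0) {
		(void) close(fd);
		return (-1);
	}

	for (i = 0; i < nthreads; i++) {
		args[i].fd = fd;
		args[i].expected = st.st_size;
		args[i].iters = iters;
		args[i].failures = 0;
		if (pthread_create(&tids[i], NULL, stress_worker,
		    &args[i]) != 0)
			break;
		started++;
	}
	/* Each thread writes only its own args[i], so no locking is needed. */
	for (i = 0; i < started; i++) {
		(void) pthread_join(tids[i], NULL);
		total += args[i].failures;
	}
	(void) close(fd);
	return (started == nthreads ? total : -1);
}
```

Each thread records failures in its own slot, and the main thread sums them only after pthread_join(), so the sketch needs no mutex.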
Remember, syscalls, ioctls, and other facets of the user-kernel interface as seen from within solaris10-branded zones must function as they do in Solaris 10. The same is true of commands and daemons within solaris10-branded zones.
If your changes are extensive or affect frequently used syscalls or ioctls (e.g., SYS_lwp_create or a tty ioctl), consider running one or more PIT tests, such as MSTC, in your zone in addition to testing your changes manually.
NOTE: The following procedure uses Sun-internal links to the test pages. Links to procedures for external testing will be provided soon.
Determine which PIT tests run within solaris10-branded zones. Many but not all of the existing test suites work in solaris10-branded zones. Any suite that runs in native Solaris 10 zones should run identically in solaris10-branded zones. You can do one of the following to determine whether a test will work:
Afterwards, run the test suite. All PIT test suites can be run via the STEP tool, which you should run from the global zone. When STEP asks for the hostname of your test system, enter the name of the solaris10-branded zone that you created for the test. When it asks for the connection type, specify "zlogin_console".
You will need to follow the procedures for manually testing the zone. However, you will not be able to use one of the pre-built Solaris 10u8 images. Instead, as part of testing your Solaris 10 change, create a flash archive of a physical system on which you have installed your change.
The name test.flar is only an example; you can name the flar as you choose. If your physical Solaris 10 test system has a ZFS root, then you must use the -L option to create the flar with an explicit cpio or pax archiver:
You can now use this flar to install the zone and complete the testing. When installing the zone from your new flar, you must also use the -F option, which bypasses the built-in version checking. This is necessary because only Solaris 10u8 and later are supported, and a BFU image does not look like a supported version of Solaris 10.
If your code changes are simple, an alternative to creating your own flar is to install the zone from one of the pre-existing archives and then replace the affected user-level binaries with the new binaries that you've built. If you use this technique, you must make sure that the base archive is fully compatible with the changed files you're substituting.
© 2012, Oracle Corporation and/or its affiliates.