Introduction to OpenSolaris xVM for Developers

1. About this document

 This document is not intended to provide exhaustive documentation about the OpenSolaris xVM architecture, xVM configuration, or domain management. This document is intended to serve as a basic introduction to the components that make up the xVM universe, with a goal of providing developers with some context for understanding the changes introduced by its arrival.

 For more information about using xVM and/or the xVM architecture, please see:

2. Basic architecture

 xVM is based on the work of the Xen community. In a running system, Xen fits between the hardware and the operating system.

 xVM supports multiple operating system instances simultaneously. Each instance is called a 'domain'. There are two different kinds of domains: the control domain (typically called "dom0") and user domains (called "domU"s). Unlike Solaris Zones, each domain runs a full instance of an operating system.

 The xVM hypervisor virtualizes the system's hardware. This means that it transparently shares and partitions the system's resources (CPUs, memory, NICs, etc.) among the user domains.

 The hypervisor performs the low-level plumbing needed to provide a virtualized platform for operating systems, but it relies heavily on the control domain for almost everything else. The control domain decides which domUs are created, which resources they can access, how much memory they can have, and so on. In addition, xVM does not include any device drivers, so the dom0 performs all device access.

 There are two basic kinds of virtualization: full virtualization and paravirtualization. In full virtualization, the operating system is completely unaware that it is running in a virtualized environment. In paravirtualization, the operating system is aware of the virtualization layer, and works with the hypervisor to achieve higher performance. xVM supports both models. Since it must work closely with the hypervisor layer, dom0 is always paravirtualized. domUs can be either paravirtualized or fully virtualized, and a system can have both varieties running simultaneously.

 Full virtualization requires that the hypervisor transparently intercept many operations that an operating system typically performs directly on the hardware. This interception allows the hypervisor to ensure that a domain cannot read or modify another domain's memory, cannot interfere with its device access, and cannot shut down the CPUs it is using. To implement full virtualization, xVM requires special hardware support. Specifically, you must be using Intel CPUs with VT support (such as the Intel Core Duo) or AMD CPUs with AMD-V technology. Full virtualization allows any x86 operating system to run in a guest domain, including Solaris, Linux, or Windows.

 Paravirtualization is built on top of a hypercall interface. Hypercalls are analogous to system calls, but they are made by the operating system into the hypervisor layer. These hypercalls allow a domain to initiate the creation of a new domain, request the creation of address mappings, pass a buffer from one domain to another, send an interrupt between CPUs, etc. The hypercall interface allows the OS to explicitly initiate the operations that are transparently intercepted in a fully virtualized environment. Since there is no interception required, paravirtualization does not require any special hardware support, and will run on any CPU. Since paravirtualization requires changes to the OS, only specific operating systems can be hosted in a paravirtualized domU. Currently those are limited to Solaris, Linux, and FreeBSD.

3. Resource Virtualization

3.1. CPUs

 xVM assigns domains one or more virtual CPUs (vcpus). Each vcpu contains all the state one would typically associate with a physical CPU: registers, flags, timestamp, etc. A vcpu in xVM is an entity that can be scheduled, like a thread is in Solaris. When it is a domain's turn to run on a CPU, xVM loads the physical CPU with the state captured in the vcpu, and lets it run.

 Solaris treats each vcpu as it would a physical CPU. When xVM selects a vcpu to run, it will be running the thread that Solaris loaded on the vcpu.
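
 The relationship between vcpus and a physical CPU can be illustrated with a toy model. The sketch below is not xVM code: the VCpu class, its fields, and the round-robin policy are assumptions made purely for illustration, and the real scheduler is considerably more sophisticated.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class VCpu:
    """Saved state for one virtual CPU (fields are illustrative only)."""
    domain: str
    vcpu_id: int
    registers: dict = field(default_factory=lambda: {"pc": 0})


def run_round_robin(vcpus, slices, ticks_per_slice=10):
    """Time-share one physical CPU among vcpus; return the run order."""
    run_queue = deque(vcpus)
    order = []
    for _ in range(slices):
        vcpu = run_queue.popleft()
        # "Load" the saved state onto the physical CPU.
        physical_cpu = dict(vcpu.registers)
        # The guest runs; here that just advances its program counter.
        physical_cpu["pc"] += ticks_per_slice
        # "Save" the state back into the vcpu before switching away.
        vcpu.registers = physical_cpu
        order.append((vcpu.domain, vcpu.vcpu_id))
        run_queue.append(vcpu)
    return order
```

 Each vcpu resumes exactly where it left off, which is why the guest OS can treat it as if it were a physical CPU.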

3.2. Memory

 Solaris manages memory in pages. When running directly on hardware, Solaris uses the physical page frame numbers (PFNs) provided by the system's BIOS or firmware. When running under xVM, the hypervisor provides the list of available physical pages. To isolate the guest OS from the hardware, and to enable features such as live migration, the page numbers provided by xVM do not reflect the physical pages as understood by the hardware. Instead, the pages reflect a virtualized view of the system's memory. xVM maps the guest OS's "physical" page frame numbers to the "machine frame numbers" (MFNs) recognized by the hardware.

 For the most part, Solaris manages the PFNs provided by xVM exactly as it does the PFNs used on real hardware. The only part of Solaris that needs to understand that PFNs are virtualized is the very lowest level of the HAT layer. Only when we are modifying a process's page tables do we use MFNs rather than PFNs.
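
 The PFN-to-MFN relationship can be pictured as a per-domain lookup table. The following is a toy sketch, not the HAT implementation: the P2M class, the example frame numbers, and the 4 KB page size are assumptions chosen for illustration.

```python
PAGE_SHIFT = 12  # assume 4 KB pages


class P2M:
    """Toy per-domain table mapping guest PFNs to machine MFNs."""

    def __init__(self, mfns):
        # Index = PFN seen by the guest, value = MFN seen by the hardware.
        self._pfn_to_mfn = list(mfns)

    def pfn_to_mfn(self, pfn):
        return self._pfn_to_mfn[pfn]

    def machine_address(self, pseudo_physical_addr):
        """Translate a guest "physical" address to a machine address."""
        pfn = pseudo_physical_addr >> PAGE_SHIFT
        offset = pseudo_physical_addr & ((1 << PAGE_SHIFT) - 1)
        return (self.pfn_to_mfn(pfn) << PAGE_SHIFT) | offset

    def migrate(self, new_mfns):
        """After migration the MFNs change; the guest's PFNs do not."""
        self._pfn_to_mfn = list(new_mfns)
```

 For example, a domain built with P2M([0x1A2, 0x07F, 0x300]) sees contiguous PFNs 0 through 2 even though the backing machine frames are scattered. Only the code that writes page table entries would need machine_address(); everything above it keeps working in PFNs.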

 If you find yourself needing to be aware of the distinction between MFNs and PFNs outside of the HAT layer, it almost certainly indicates that you are doing something wrong.

3.3. Devices

 Dom0 Solaris drivers are identical to Solaris drivers that run on bare metal. As long as these drivers are written entirely with standard DDI interfaces, they will function identically on dom0 and metal Solaris.

 Any driver that works around the DDI might encounter problems when run on dom0. One area of particular vulnerability is DMA. If a driver attempts to use PFNs directly, rather than using the ddi_dma_*_bind_handle() interfaces, it will miss the necessary PFN->MFN translation, and the driver will be instructing the device to perform I/O to or from the wrong location. The failure will most likely result in a crash of dom0, but it could also lead to silent data corruption.
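
 To see why the missing translation matters for DMA, consider the toy model below. It is a sketch under stated assumptions, not driver code: machine memory is a Python list of frames labeled by owner, dma_write() stands in for a device, and the frame numbers are made up. A real driver obtains device-usable addresses from the ddi_dma_*_bind_handle() interfaces rather than doing any translation itself.

```python
def dma_write(machine_memory, frame, data):
    """Stand-in for a device: it only understands machine frame numbers."""
    machine_memory[frame] = data


def demo():
    # Six machine frames; dom0's PFNs 0..2 are backed by MFNs 3, 5, 1.
    machine_memory = ["hypervisor", "dom0", "domU", "dom0", "domU", "dom0"]
    dom0_pfn_to_mfn = [3, 5, 1]
    buffer_pfn = 2  # the driver's I/O buffer, as dom0 sees it

    # Correct: hand the device the translated frame number.
    good = list(machine_memory)
    dma_write(good, dom0_pfn_to_mfn[buffer_pfn], "packet")

    # Broken: hand the device the raw PFN, skipping the translation.
    bad = list(machine_memory)
    dma_write(bad, buffer_pfn, "packet")
    return good, bad
```

 In the broken case the data lands in machine frame 2, which in this model belongs to a domU, while the driver's own buffer is never filled.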

 There are two different ways device drivers can work in the domU.

 In a fully virtualized domU, xVM will trap any writes to I/O space or any DMA operations, and will transparently forward these requests to the appropriate device in dom0. This trapping and forwarding is an expensive operation, so this kind of device access will likely result in poor performance, particularly for network devices.

 In a paravirtualized domU, each driver has a "front end," which is present within the domU, and a "back end," which runs in dom0. These are referred to as PV drivers. The front end driver takes standard requests from Solaris and forwards them to the back end driver. The back end driver executes the request on the physical hardware and passes any result back to the front end driver, which then notifies Solaris of the request's completion. Since the driver is explicitly hypervisor aware, it is able to work with the hypervisor and the back end driver to deliver much better performance than a driver in a fully virtualized environment.
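
 The request flow between the two halves of a PV driver can be sketched as a pair of queues. This is a toy model only: the class names, the queue-based "ring," and the dict standing in for a disk are assumptions for illustration. Real PV drivers communicate through mechanisms provided by the hypervisor, which this sketch does not attempt to model.

```python
from collections import deque


class BackEnd:
    """Runs in dom0 and touches the (simulated) physical device."""

    def __init__(self, disk):
        self.disk = disk  # block number -> data

    def service(self, requests, responses):
        while requests:
            req_id, op, block, data = requests.popleft()
            if op == "write":
                self.disk[block] = data
                responses.append((req_id, None))
            else:  # "read"
                responses.append((req_id, self.disk.get(block)))


class FrontEnd:
    """Runs in the domU and looks like an ordinary block device."""

    def __init__(self, backend):
        self.backend = backend
        self.requests = deque()
        self.responses = deque()
        self.next_id = 0

    def submit(self, op, block, data=None):
        req_id = self.next_id
        self.next_id += 1
        self.requests.append((req_id, op, block, data))
        return req_id

    def complete(self):
        """Let the back end run, then collect finished requests."""
        self.backend.service(self.requests, self.responses)
        done = dict(self.responses)
        self.responses.clear()
        return done
```

 The front end never touches the device; it only queues requests and reports completions, which is the division of labor described above.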

 There are two main paravirtualized drivers in ON: a virtual block device and a virtual network device. These two drivers present the domU with standard disk and network devices.

 There is also a hybrid method of supporting devices: implement PV drivers in a fully virtualized environment. In this case, the operating system as a whole is still unaware that its environment is virtual, but the PV drivers in the domain are aware of the hypervisor. This method allows us to deliver much better I/O performance without requiring that the full OS be ported to the hypervisor.

4. Source structure and management

4.1 Hypervisor

 The xVM hypervisor is based on the open source Xen project, and the Xen community is largely populated by Linux users. Quite a few changes to the hypervisor were required to allow Solaris to act as the dom0. While some of those have been folded back into the main Xen source tree, we have had to maintain the rest ourselves. Thus, you cannot simply download the source from the Xen open source site and expect it to work with a Solaris dom0.

4.2 Solaris and the Intel platforms

 With the introduction of xVM, there are now two different platforms on the Intel architecture: i86pc and i86xpv. i86pc refers to Solaris running on bare metal. i86xpv refers to Solaris running paravirtualized on top of an xVM hypervisor.

 The i86xpv and i86pc platforms are much more similar than they are different. The code that is specific to the i86xpv platform is found under usr/src/uts/i86xpv. However, the bulk of the code used to implement the i86xpv platform is taken directly from the i86pc sub-tree, usr/src/uts/i86pc.

 There are some header files from the xVM source tree that are needed to build the i86xpv platform code. These headers can be found in common/xen/public. Ensuring that the header files imported into the ON source tree track those in the xVM source tree is a manual process. Any engineer who modifies the header files in one tree must make the identical modification in the other.

 Code that is meant to apply only to i86xpv platforms should be protected with: #ifdef __xpv.

 For the most part, users and userland applications should be completely unaware of which platform they are running on. At user-level, there is complete binary compatibility between i86pc and i86xpv. If you do feel the need to determine which platform you are using, use uname -i.
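
 For tooling that must report the platform, a small wrapper around uname -i is enough. The sketch below is an illustration only: the function names and the wording of the descriptions are assumptions, and only the two platform names come from this document.

```python
import subprocess

PLATFORMS = {
    "i86pc": "Solaris on bare metal",
    "i86xpv": "Solaris paravirtualized on the xVM hypervisor",
}


def describe_platform(name):
    """Map the output of `uname -i` to a human-readable description."""
    name = name.strip()
    return PLATFORMS.get(name, "unknown platform: " + name)


def current_platform():
    """Run `uname -i` and return its output without the trailing newline."""
    result = subprocess.run(
        ["uname", "-i"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

 Calling describe_platform(current_platform()) on a Solaris x86 system would then distinguish the two cases. Given the binary compatibility noted above, ordinary applications should rarely need it.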

5. Booting

 Whether to run Solaris as a virtualized dom0 or as a standalone operating system is a boot-time decision.

 To run Solaris as a standalone OS, continue to use the same GRUB menu entries that you use currently. To run Solaris as a dom0 with the hypervisor, there must be an entry in /boot/grub/menu.lst that specifies the hypervisor, as described here.

6. Observability / debug capability

6.1 xm

 Although the hypervisor and dom0 work closely together to manage a running system, the dom0 operating system has little direct visibility into the hypervisor. The hypervisor's entire address space is inaccessible to the dom0, so the only source of information is xm, a user-space tool that communicates with the hypervisor via hypercalls.

 Some useful xm subcommands and related tools are:

  • xm info  - Report static information about the machine, such as the number of CPUs, total memory, and xVM version information.
  • xm list  - List all domains along with some high-level information about each.
  • xm top   - Analogous to the Linux "top" command, but reports domain information rather than process information.
  • xm log   - Display the contents of the xend log.
  • xm help  - List all available commands.
  • xentrace - Capture trace buffer data from xVM.
  • xentop   - Display continuously updating information about the xVM system and its domains.

6.2 Crash dumps

 On a running system, the hypervisor's memory is completely off-limits to dom0. If the hypervisor crashes, however, the resulting panic dump will generate a core file that provides a unified view of both xVM and dom0. In such a core file, xVM appears as a simple Solaris kernel module called xpv.

 For example:


                > $c
                xpv`panic+0xbf()
                xpv`do_crashdump_trigger+0x19()
                xpv`keypress_softirq+0x35()
                xpv`do_softirq+0x54()
                xpv`idle_loop+0x55()

 To be clear: if a dom0 crashes with a standard Solaris panic, the dump will include just the dom0. It is only when the hypervisor itself panics that the resulting dump includes the xVM state as well.

 For more information about handling xVM panics, as well as an example of debugging an xVM panic, see: http://blogs.sun.com/nilsn/resource/xen_solaris_panic.pdf.

 If a domain appears hung, use xm dump-core to take a dump file. You can look at this file with /bin/mdb.

6.3 DTrace

 Because the hypervisor is isolated from dom0, there is currently no way to apply DTrace directly to the hypervisor. There is, however, a new xpv DTrace provider that allows you to trace the interaction between dom0 and the hypervisor. This provider consists of SDT probes added to the privcmd device driver. To list the available probes, use: dtrace -l -n 'xpv:::'

 While understanding the details of these probes requires a fair bit of knowledge about the OpenSolaris xVM interface, simply enabling them all (dtrace -n 'xpv::: {}') provides a quick high-level introduction to the steps involved with creating domains, destroying domains, migrating domains, etc.

7. Known problems

 For descriptions of bugs, known issues, and tips, see the release notes. 

Copyright © 1999-2007 Sun Microsystems, Inc.

Created by admin on 2009/10/26 12:11
Last modified by johnlev on 2009/10/28 23:16