| Solaris |
|
|
Alexander Kolbasov (alexander.kolbasov@sun.com)
CPU caps provide absolute fine-grained limits on the amount of CPU resources that can be consumed by a project or a zone. It is provided as an extension to resource_controls(5) . This document provides a short description specifically for PSARC. A full description is available in the design document [1] and the implementation is discussed in the implementation guide [2]. The on-line version of this document is available in [5].
CPU caps extend existing facilities that control CPU usage on the system - processor binding, processor sets and the Fair Share Scheduler. Unlike processor binding and processor sets, CPU caps allow applications to run on any CPUs while limiting their CPU usage and provide fine-grained (specified in fractions of a CPU) limit. In contrast to the FSS(7) , CPU caps provide absolute usage limit that does not depend on other applications running in the system. CPU caps can be used together with CPU binding, processor sets and FSS. When used in conjunction with processor sets, CPU caps limit CPU usage within a set. When used with FSS(7) , CPU resources defined by caps are further subdivided using FSS(7) shares.
CPU caps provide several advantages over the mechanisms described above:
The CPU caps are administered using the resource_controls(5) framework or the zonecfg(1M) command (see section 3). Two new resource controls are introduced to set per-project and per-zone CPU caps (see section 6.1).
System administrators can use kstats to observe impact of CPU caps at the zone or project level (see 5.1). The DTrace sched provider is extended with new probes which can be used for per-process or per-thread analysis (see 5.3).
All extensions described here are Committed except for the kstats which are Uncommitted. Patch binding is requested for all.
Customers provided the following major reasons for wanting CPU caps:
In some form, such mechanism for limiting CPU resources is provided by other operating systems. It became one of many check-box items which customers look at when comparing resource management solutions from Sun and other vendors.
Solaris provides different mechanisms which can be used to control CPU usage of applications to some extent:
CPU binding and processor sets provide hard limits for applications since they only run on a bound CPU or within a set, but the minimum granularity for both is a single CPU. The FSS(7) provides soft limits which are enforced only when there is enough competition for CPU resources and is specified in units expressing relationships of processes to each other.
CPU caps are represented by two new resource controls. The cap value is specified in units of one per cent of a CPU regardless of how many CPUs are available or their characteristics. The value should be greater than zero.
A project CPU cap is represented by the project.cpu-cap resource control. It is associated with the privileged level and can be modified only by privileged (superuser) callers (see resource_controls(5) and prctl(1) for the description of privileged level).
The project.cpu-cap resource limit can be set statically for projects in project(4) file. For example, the following line in project(4) file sets persistent CPU cap of 6 CPUs for user akolb:
user.akolb:1234::::project.cpu-cap=(privileged,600,deny)
Project CPU cap can be also dynamically modified or removed on a running system using prctl(1) utility. For example, the following command modifies the CPU cap to limit user akolb to 3 CPUs:
$ prctl -r -t privileged -n project.cpu-cap -v 300 -i project user.akolb
A zone CPU caps is represented by the zone.cpu-cap resource control. Similar to the project cap it is associated with the privileged level and can be modified only by privileged (superuser) callers.
The zone.cpu-cap resource can be set for a zone using zonecfg(1M) command. The following example configures CPU cap for a zone to 3 CPUs:
zonecfg:myzone> add rctl zonecfg:myzone:rctl> set name=zone.cpu-cap zonecfg:myzone:rctl> add value (priv=privileged,limit=300,action=deny) zonecfg:myzone:rctl> end
The zonecfg(1M) is extended with a new resource called capped-cpu, as described in PSARC/2006/496 [3]. The resource value, called ncpus, maps to the zone.cpu-cap rctl.
The following example sets zone CPU cap using the capped-cpu resource:
zonecfg:myzone> add capped-cpu zonecfg:myzone>capped-cpu> set ncpus=3 zonecfg:myzone>capped-cpu>capped-cpu> end
The PSARC/2006/496 [3] describes the capped-cpu resource in the following way:
The capped-cpu resource has a single ncpus property which is a positive decimal with two digits to the right of the decimal. This property is implemented as a special case of the zonecfg cpu-cap alias. The special case handling of this property normalizes the value so that it corresponds to units of cpus and is similar to the ncpus property under the dedicated-cpu resource group. Unlike dedicated-cpu it will not accept a range and it will accept a decimal number. For example, when using ncpus in the dedicated-cpu resource group, a value of 1 means one dedicated cpu. When using ncpus in the capped-cpu resource group, a value of 1 means 100% of a cpu as the zone.cpu-cap setting. A value of 1.25 means 125%, since 100% corresponds to one full cpu on the system when using cpu caps. The intention here is to align the ncpus units as closely as possible in these two cases (dedicated-cpu vs. capped-cpu), given the limitations and capabilities of the two underlying mechanisms (pset vs. rctl).
See PSARC/2006/496 [3] for a description of the dedicated-cpu resource and ncpu property.
Here we commit to the capped-cpu resource.
For all threads running in capped projects or zones, the system keeps track of their CPU usage over short periods of time. When CPU usage of projects or zones reaches specified caps, threads in them do not get scheduled and instead are placed on the special wait queues in the kernel. These threads will become runnable again only when CPU usage drops below the cap level.
The time spent by threads on wait queues is combined with the time spent on run queues and is reported as part of the ``wait-cpu'' (latency) time by procfs. The ``wait-cpu'' time is also used to report time spent waiting on run queues (see pr_wtime field of struct prusage in proc(4) ). Wait times can be seen in the LAT column when prstat(1M) is invoked with the -m option. CPU time spent by threads on wait queues is also accumulated at the LMS_WAIT_CPU micro-state accounting state which is also used to account for time spent waiting on run queues.
We decided to combine the wait times that threads spent waiting on run queues and wait queues into a single bucket because currently there is a fixed set of micro-state accounting types. Any extension of this set will cause offsets of fields in data structures, embedding micro-state data, to change. This, in turn, implies that all proc(4) consumers should be recompiled. We feel that CPU wait time is general enough to include time spent waiting on both run queues and wait queues and it is reasonable to combine them together until micro-state accounting framework is extended.
CPU caps are enforced only for threads running in TS, IA, FX, and FSS scheduling classes and have no effect on threads running in RT (Real-Time) scheduling class.
All CPU usage accounting data is collected using micro-state accounting facility, so the accuracy of CPU caps depends on the accuracy of the micro-state data.
A more detailed description of the implementation is available in the implementation guide [2].
There are several ways to observe the impact of CPU caps at a zone or project level and at a process or thread level. Each project or zone CPU cap exports information via kstats (see 5.1).
The DTrace sched provider is extended with two new probes which can be used for gathering detailed data for times spent on wait queues (see 5.3). These probes provide thread and process level observability for CPU caps.
Zone and project CPU cap kstats contain the following information:
For example, when a project CPU cap is set to 50% for project 1234, the following command will show cap information:
$ kstat -m caps
module: caps instance: 0
name: cpucaps_project_1234 class: project_caps
above_sec 787
below_sec 260551
nwait 0
usage 1
maxusage 51
value 50
The following example is for zone kstats:
module: caps instance: 14
name: cpucaps_zone_14 class: zone_caps
above_sec 0
below_sec 3
maxusage 255
nwait 0
usage 19
value 300
For both zone and project kstats, the kstat instance is the same as zone ID. The PSARC 2006/598 Swap resource control case[4] provides a precedents for such kstats and establishes the naming conventions.
The maximum usage statistics provides a good way for users to estimate how to set up their CPU caps. They can set the cap to a very high value and observe the maximum usage while their application is running for a while. This gives an estimate of maximum CPU requirements for the workload.
The ps(1) command shows threads on the wait queue by displaying ``W'' for their state. For example:
$ ps -o pid,s,comm -p 101262 PID S COMMAND 101262 W /usr/perl5/bin/perl
The prstat(1M) command shows ``wait'' state for threads, sitting on wait queues. For example:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP 100686 akolb 4272K 1516K wait 0 0 0:51:20 25% perl/1
CPU caps provide two new DTrace sched provider probes for observing scheduling impact for threads and processes:
The following description should be added to resource_controls(5) man page:
The Solaris Dynamic Tracing Guide should be updated to include information about new process states and two new sched provider probes.
The proc Provider section of the guide should be updated to reflect the addition of the new SWAIT processor state (table 25-5 ):
The Probes section of the guide (table 26-1 ) should be updated with information about two new probes:
The Arguments section of the guide (table 26-2 ) should be updated with information in following table, describing arguments for CPU Caps probe arguments:
| Probe | args[0] | args[1] |
| cpucaps-sleep | lwpsinfo_t * | psinfo_t * |
| cpucaps-wakeup | lwpsinfo_t * | psinfo_t * |
The Examples section should be updated with examples for CPU Caps probe arguments.
Here is a summary of proposed interface changes. All of them, except kstats are Committed.
:
Alexander Kolbasov 2007-01-10
Terms of Use
|
Privacy
|
Trademarks
|
Copyright Policy
|
Site Guidelines
|
Site Map
|
Help
Your use of this web site or any of its content or software indicates your agreement to be bound by these Terms of Use.
© 2012, Oracle Corporation and/or its affiliates.