Performance
- Scheduler optimizations and shared cache awareness
- Shared cache detection / optimization
- The CPUID instruction allows detection of caches shared between cores and threads on a physical processor. Solaris should detect which cores/threads share caches so that the appropriate affinity and load balancing policies can be implemented. The RFE tracking this work is 6495401. To maximize aggregate cache availability on partially utilized systems, the dispatcher should load balance across the shared cache level; to improve cache utilization, threads that need to migrate should, where possible, move between CPUs that share a cache.
- Monitor/mwait based "halt/wakeup" mechanism
- When CPUs go idle, they eventually invoke the halt instruction, which puts the CPU in the C1 state. At this point the CPU is suspended (in a sense) and is awoken by an interrupt. When work destined for the halted CPU becomes available, the CPU enqueueing the thread on the halted CPU's run queue must send an inter-processor interrupt (IPI) to awaken it; in practice, this is done in the context of a thread dropping a lock. Eliminating the need to send the IPI would improve the performance of the lock dropping path. This can be accomplished with a monitor/mwait based mechanism, since mwait can also be used in the idle() code path to put the CPU in the C1 (or deeper) state. The sleeping CPU can monitor the runnable thread count in its associated dispatch queue, so that as soon as the awakening thread is enqueued, the sleeping CPU comes out of the C1 state. This work is tracked by RFE 6495342.
- New instruction support
- Kernel architecture primitive operations
- Page copy
- Page clear
- bcopy
- Library performance optimization
- C-runtime library
- libm
- HPC
- ACML
- MPI
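As a rough illustration of the shared cache detection item above, the following sketch decodes the EAX value returned by CPUID leaf 4 (Intel's deterministic cache parameters leaf) and groups CPUs by APIC ID. The type and function names (cache_info_t, decode_leaf4_eax, sharing_shift, cpus_share_cache) are hypothetical and are not taken from the Solaris source; the sketch assumes the register value has already been read and that APIC IDs are assigned as Intel describes.

```c
#include <stdint.h>

/* Selected fields of CPUID leaf 4, register EAX (hypothetical struct). */
typedef struct {
    unsigned type;        /* 0 = no more caches, 1 = data, 2 = instruction, 3 = unified */
    unsigned level;       /* 1 = L1, 2 = L2, ... */
    unsigned max_sharing; /* max number of logical CPU IDs sharing this cache */
} cache_info_t;

/* Decode the EAX value reported for one cache by CPUID leaf 4. */
static cache_info_t decode_leaf4_eax(uint32_t eax)
{
    cache_info_t ci;
    ci.type = eax & 0x1f;                        /* bits 4:0 */
    ci.level = (eax >> 5) & 0x7;                 /* bits 7:5 */
    ci.max_sharing = ((eax >> 14) & 0xfff) + 1;  /* bits 25:14, stored minus one */
    return ci;
}

/* Number of low APIC ID bits that vary among CPUs sharing the cache. */
static unsigned sharing_shift(unsigned max_sharing)
{
    unsigned shift = 0;
    while ((1u << shift) < max_sharing)
        shift++;
    return shift;
}

/* Two CPUs share the cache when their APIC IDs agree above those low bits. */
static int cpus_share_cache(uint32_t apicid_a, uint32_t apicid_b,
                            unsigned max_sharing)
{
    unsigned shift = sharing_shift(max_sharing);
    return (apicid_a >> shift) == (apicid_b >> shift);
}
```

For example, an EAX value of 0x4043 decodes as a unified level 2 cache shared by up to two logical CPUs, so APIC IDs 0 and 1 fall in the same sharing group while APIC IDs 1 and 2 do not. A dispatcher could use such groups both to spread load across caches and to prefer migrations within a group.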
Details
Performance work is being tracked with bug reports (also known as CRs, or Change Requests).
List of CRs sorted by priority with links to public CR information, owners, and status:
- 5070897 kernel bzero routine should use non-temporal instructions for larger areas
- Priority: P1
- Owner: Bob Kasten
- Status: In Development/Testing
- 6292199 bcopy and kcopy shouldn't use rep, smov
- Priority: P1
- Owner: Bob Kasten
- Status: In Development/Testing
- no bugid: improve libc memcpy, memmove, memset performance using Intel SSE instructions
- Priority: P1
- Owner: Bob Kasten
- Status: In Development/Testing
- 6495392 use monitor/mwait for halting idle CPUs where supported
- Priority: P2
- Owner: Bill Holler
- Status: RTI filed. Waiting for Perf PIT and ON PIT test results. Targeting S10U5.
- 6537908 Investigate SSE instructions for kernel primitives
- Priority: P2
- Owner: Bill Holler
- Status: Under Investigation
- 6537868 NTA primitives should be cache size aware
- Priority: P2
- Owner: Krishnendu Sadhukhan
- Status: Testing
- 6495401 cpuid based cache hierarchy awareness
- Priority: P3
- Owner: Eric Saxe
- Status: Integrated into Nevada build 69
- 6537919 Use xmm registers for sha1
- Priority: P3
- Owner: Bill Holler
- Status: Under Investigation
- 6284837 Please provide cache information, perhaps with psrinfo
- Priority: P3
- Owner: Bill Holler
- Status: Under Investigation
- 6537876 xmm register save for kernel threads
- Priority: RFE/P4
- Owner: Bill Holler
- Status: Under Investigation
- 6537957 fastheaders with inline needed
- Priority: RFE/P4
- Owner: Compiler group / Stanislav Mekhanoshin?
- Status: none
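To make the intent of CR 6495392 (monitor/mwait for halting idle CPUs) more concrete, here is a small single-threaded model contrasting the two wakeup protocols. It is only a sketch: the real monitor and mwait instructions are normally usable only by the kernel, so this code simulates their effect rather than executing them, and every name in it (model_cpu_t, model_idle, model_enqueue) is hypothetical rather than taken from the Solaris dispatcher.

```c
#include <stdbool.h>

typedef enum { CPU_RUNNING, CPU_HALTED, CPU_MWAITING } cpu_state_t;

/* Toy model of one CPU and its dispatch queue (hypothetical). */
typedef struct {
    cpu_state_t state;
    unsigned nrunnable;      /* runnable threads on this CPU's dispatch queue */
    unsigned ipis_received;  /* explicit wakeup interrupts sent to this CPU */
} model_cpu_t;

/* Idle path: either halt, or arm a monitor on nrunnable and mwait. */
static void model_idle(model_cpu_t *cpu, bool use_mwait)
{
    if (cpu->nrunnable != 0)
        return;  /* work is already queued, so do not sleep */
    cpu->state = use_mwait ? CPU_MWAITING : CPU_HALTED;
}

/*
 * Enqueue path, as run by another CPU.
 * Returns true if the enqueuer had to send an IPI to wake the target.
 */
static bool model_enqueue(model_cpu_t *cpu)
{
    cpu->nrunnable++;  /* the store to the monitored word */
    if (cpu->state == CPU_MWAITING) {
        /* In hardware, the store itself ends the mwait; no IPI is needed. */
        cpu->state = CPU_RUNNING;
        return false;
    }
    if (cpu->state == CPU_HALTED) {
        /* A halted CPU only resumes on an interrupt, so send one. */
        cpu->ipis_received++;
        cpu->state = CPU_RUNNING;
        return true;
    }
    return false;  /* target was already running */
}
```

In this model, enqueueing onto a halted CPU costs the enqueuer one IPI, while enqueueing onto an mwait-ing CPU costs only the store to the run queue count, which is the saving the CR is after on the lock dropping path.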
Related previous work
Previously completed work that the CRs above may build on:
*
on 2009/10/26 12:14