Instruction Based Sampling (IBS) is a performance observability feature available as of AMD family 0x10 processors (e.g. Barcelona). While many modern processors offer performance counters as a mechanism for counting certain performance-relevant events, this data often lacks the specificity needed to gain an accurate understanding of performance (or the lack thereof). As an example, many performance counter facilities enable one to count memory references, but the count doesn't show which memory is being accessed.
In many ways, AMD's Instruction Based Sampling facility bridges this gap. It works by periodically sampling instructions (or instruction ops) from an instruction stream (program execution). Detailed information about the sampled instruction/op is then collected as it makes its way through the pipeline. The information is then made available through the IBS facility.
IBS provides the performance analyst with a mechanism for effectively observing:
IBS is described in Appendix G of the Software Optimization Guide for AMD Family 10h Processors. This article provides an example of how IBS can be used (using matrix multiplication as an example).
A prototype DTrace provider has been developed that allows one to interface with the IBS feature through DTrace. The provider exports a set of ibs DTrace probes that (when enabled) fire after IBS samples an instruction / op.
The information IBS provides about the sampled op/instruction is available both in the body of the DTrace probe and in the probe's predicate. DTrace allows one to easily build predicates to filter for the performance events of interest, and its data aggregation features provide a powerful mechanism for managing, analyzing, and visualizing the stream of performance data the IBS feature provides.
A fairly full featured prototype is available.
The purpose of the IBS DTrace provider is to give convenient access to the IBS functionality. Currently the provider offers two kinds of probes:
Note: The x in the probe name goes into bits [4:19] of the 20-bit count of instruction fetches/micro-ops executed (with bits [0:3] being 0). The actual number of instruction fetches/micro-ops executed before IBS selects an instruction/micro-op for recording data is therefore 16 times x. For instance, x = 1000 corresponds to 16000 instruction fetches/micro-ops executed.
When the probe fires, the recorded data is returned in a data structure as args[0]. The data structures are defined as follows:
    #define IBS_REG_BITFIELD(name, ...) \
        union { \
            uint64_t reg; \
            struct { \
                uint64_t __VA_ARGS__; \
            } bit; \
        } name

    struct ibs_fetch_data {
        uint64_t cpu_id;
        IBS_REG_BITFIELD(IbsFetchCtl,
            IbsFetchMaxCnt:16,
            IbsFetchCnt:16,
            IbsFetchLat:16,
            IbsFetchEn:1,
            IbsFetchVal:1,
            IbsFetchComp:1,
            IbsIcMiss:1,
            IbsPhyAddrValid:1,
            IbsL1TlbPgSz:2,
            IbsL1TlbMiss:1,
            IbsL2TlbMiss:1,
            IbsRandEn:1,
            IbsReserved:6);
        uint64_t IbsFetchLinAd;
        uint64_t IbsFetchPhysAd;
    };

    struct ibs_exec_data {
        uint64_t cpu_id;
        uint64_t IbsOpRip;
        IBS_REG_BITFIELD(IbsOpData,
            IbsCompToRetCtr:16,
            IbsTagToRetCtr:16,
            IbsOpBrnResync:1,
            IbsOpMispReturn:1,
            IbsOpReturn:1,
            IbsOpBrnTaken:1,
            IbsOpBrnMisp:1,
            IbsOpBrnRet:1,
            reserved:26);
        IBS_REG_BITFIELD(IbsOpData2,
            NbIbsReqSrc:3,
            reserved:1,
            NbIbsReqDstProc:1,
            NbIbsReqCacheHitSt:1,
            reserved2:58);
        IBS_REG_BITFIELD(IbsOpData3,
            IbsLdOp:1,
            IbsStOp:1,
            IbsDcL1tlbMiss:1,
            IbsDcL2tlbMiss:1,
            IbsDcL1tlbHit2M:1,
            IbsDcL1tlbHit1G:1,
            IbsDcL2tlbHit2M:1,
            IbsDcMiss:1,
            IbsDcMisAcc:1,
            IbsDcLdBnkCon:1,
            IbsDcStBnkCon:1,
            IbsDcStToLdFwd:1,
            IbsDcStToLdCan:1,
            IbsDcUcMemAcc:1,
            IbsDcWcMemAcc:1,
            IbsDcLockedOp:1,
            IbsDcMabHit:1,
            IbsDcLinAddrValid:1,
            IbsDcPhyAddrValid:1,
            IbsDcL2tlbHit1G:1,
            reserved:12,
            IbsDcMissLat:16,
            reserved2:16);
        uint64_t IbsDcLinAd;
        uint64_t IbsDcPhysAd;
    };
The names of the fields correspond to the register/bitfield names as described in the family 0x10 BKDG. For bitfields, a union is used to simplify access to the individual bits. For more details, refer to the family 10h Optimization Guide (above).
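As a rough illustration of the union-based access, the sketch below reuses the IBS_REG_BITFIELD macro on a cut-down view of IbsFetchCtl and decodes a raw 64-bit value. It assumes a compiler that allocates bit-fields starting at the least significant bit and accepts uint64_t bit-fields (as GCC and Clang do on x86). The names fetch_ctl_view, decode_fetch_max_cnt and ibs_sample_interval are hypothetical helpers for this example, not part of the provider. The interval helper simply follows the note on probe names: the 16-bit value is shifted left by four bits to form the 20-bit count.

```c
#include <assert.h>
#include <stdint.h>

/* Same macro as in the data structure definitions above. */
#define IBS_REG_BITFIELD(name, ...) \
    union { \
        uint64_t reg; \
        struct { \
            uint64_t __VA_ARGS__; \
        } bit; \
    } name

/* Cut-down view of IbsFetchCtl: only the three 16-bit fields are named. */
struct fetch_ctl_view {
    IBS_REG_BITFIELD(IbsFetchCtl,
        IbsFetchMaxCnt:16,
        IbsFetchCnt:16,
        IbsFetchLat:16,
        rest:16);
};

/* Hypothetical helper: pull IbsFetchMaxCnt out of a raw register value. */
uint64_t
decode_fetch_max_cnt(uint64_t raw)
{
    struct fetch_ctl_view v;

    v.IbsFetchCtl.reg = raw;    /* write the whole register ... */
    return (v.IbsFetchCtl.bit.IbsFetchMaxCnt);    /* ... read one field */
}

/*
 * Hypothetical helper: number of fetches/micro-ops between samples for a
 * probe-name value x, assuming x occupies bits [4:19] of the 20-bit count.
 */
uint64_t
ibs_sample_interval(uint64_t x)
{
    assert(x <= 0xFFFF);    /* x must fit in the 16-bit field */
    return (x << 4);
}
```

With these helpers, a probe-name value of 1000 maps to an interval of 16000, matching the example in the note.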
The following simple script sums up the data cache (dcache) misses observed for each executable. Note that the result is not a precise total: IBS samples only a subset of ops, so the sum counts sampled ops that missed rather than every miss. It still gives a reasonable indication of how each executable is doing in terms of cache misses.
    #!/usr/sbin/dtrace -s
    #pragma D option quiet

    ibs-exec-2000
    {
        @exec[execname] = sum(args[0]->IbsOpData3.bit.IbsDcMiss);
    }

    END
    {
        printf("\nDcache misses per exec:\n");
        printa(@exec);
    }
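Because the aggregation counts sampled ops only, one rough way to read the totals is to scale them by the sampling interval. The C sketch below does this under explicit assumptions: each sample is taken to stand for about x * 16 ops (per the note on probe names), sampling is treated as uniform, and estimate_dcache_misses is a hypothetical helper that the provider does not export. Treat the result as an order-of-magnitude estimate, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: scale a count of sampled dcache misses up to an
 * estimated total, assuming one sample per (x << 4) ops.
 */
uint64_t
estimate_dcache_misses(uint64_t sampled_misses, uint64_t x)
{
    uint64_t interval;

    assert(x <= 0xFFFF);        /* x must fit in the 16-bit field */
    interval = x << 4;          /* assumed ops between two samples */
    return (sampled_misses * interval);
}
```

For the ibs-exec-2000 probe used above, each sampled miss would stand for roughly 32000 ops under these assumptions.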
The following script adds more functionality and observes only an executable called "memtest":
    #!/usr/sbin/dtrace -s
    #pragma D option quiet

    /* L2 TLB misses seen on sampled instruction fetches */
    ibs:::ibs-fetch-500
    /execname == "memtest"/
    {
        @fetch[execname] = sum(args[0]->IbsFetchCtl.bit.IbsL2TlbMiss);
    }

    /* dcache misses seen on sampled ops, broken down by CPU */
    ibs-exec-1000
    /execname == "memtest"/
    {
        @exec[execname, args[0]->cpu_id] = sum(args[0]->IbsOpData3.bit.IbsDcMiss);
    }

    /* count dcache misses per linear address, when the address is valid */
    ibs-exec-1000
    /execname == "memtest" && args[0]->IbsOpData3.bit.IbsDcMiss == 1 && args[0]->IbsOpData3.bit.IbsDcLinAddrValid == 1/
    {
        @linadr[args[0]->IbsDcLinAd] = count();
    }

    END
    {
        printf("\nNumber of L2 TLB misses:\n");
        printa(@fetch);
        printf("\nDcache misses per core:\n");
        printa(@exec);
        trunc(@linadr, 10);
        printf("\nTop 10 VA that caused dcache misses:\n");
        /* @linadr has a single key, so one %x conversion plus the value */
        printa("%16x %@10d\n", @linadr);
    }
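The predicate in the third clause boils down to testing two bits of IbsOpData3. For readers who post-process raw register values outside DTrace, a minimal C sketch of the same test follows. The bit positions (7 for IbsDcMiss, 17 for IbsDcLinAddrValid) are derived by counting fields in the structure definition above and assume least-significant-bit-first allocation. The macro and function names are hypothetical and not part of the provider.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, counted from the IbsOpData3 field list. */
#define IBS_DC_MISS            (1ULL << 7)
#define IBS_DC_LIN_ADDR_VALID  (1ULL << 17)

/*
 * Hypothetical helper: true when the sampled op missed the dcache and
 * its linear address is valid, mirroring the D predicate.
 */
bool
is_dcache_miss_with_valid_va(uint64_t ibs_op_data3)
{
    const uint64_t want = IBS_DC_MISS | IBS_DC_LIN_ADDR_VALID;

    return ((ibs_op_data3 & want) == want);
}
```

Using masks on the raw value avoids depending on how a particular compiler lays out bit-fields, at the cost of hard-coding the positions.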
$ hg clone ssh://your-login@hg.opensolaris.org/hg/amd/ibs-gate
For help with using Mercurial, or the ON tools, you can:
$ cd ibs
$ /opt/onbld/bin/bldenv -d /opt/onbld/bin/opensolaris.sh
$ cd usr/src/tools
$ dmake install
$ cd $CODEMGR_WS/usr/src/uts
$ dmake install
$ /opt/onbld/bin/Install -G my_ibs_kernel -k i86pc
$ cd ibs
$ tar xf on-closed-bins.i386.tar
$ /opt/onbld/bin/nightly /opt/onbld/bin/opensolaris.sh
Alternatively, you can use a standalone source package that contains just the files necessary to build the IBS provider:
$ gzcat dtrace-ibs.tar.gz | tar xf -
$ cd dtrace-ibs
$ make
$ make install
$ add_drv ibs
| File | Last updated | Solaris versions supported |
|---|---|---|
| dtrace-ibs.tar.gz | 2010-01-28 18:20 | build 131 and later |
To ease testing of the provider, preliminary binary packages are available for download. These packages contain the IBS provider module and a special devfsadm link module that creates the device link in the /dev filesystem. To install one of these packages, extract the tarball into a directory and use pkgadd(1M) to add it:
$ cd /tmp
$ mkdir ibs
$ cd ibs
$ gzcat SUNWibs.tar.gz | tar xf -
$ pkgadd -d .
| File | Last updated | Solaris versions supported |
|---|---|---|
| SUNWibs.tar.gz | 2010-01-14 17:30 | build 130 and later |
© 2012, Oracle Corporation and/or its affiliates.