xdt+dtrace

Observability Support with xdt Probes and DTrace Scripts

xdt is a DTrace provider for the XenTrace probes
inside the Xen hypervisor. Although its functionality is
limited (it provides time-delayed samples of the XenTrace
buffers with no additional context), it can still be used
to analyze performance issues.

xdt works by enabling tracing inside the hypervisor,
which collects tracing data for a number of trace
points inside Xen. This data is collected in buffers,
one for each physical CPU. Dom0 can map these trace
buffers and read the data, which is what xdt
does. This updated version of xdt supports all
tracepoints inside Xen (previously only a subset was supported).

DTrace scripts that combine data from the xdt provider
with data from the backend drivers in dom0 are
available to obtain performance data.

At specific intervals, xdt (through a cyclic routine)
walks through the trace buffers and feeds the data to
DTrace by calling dtrace_probe(). This means that
all probes fire from this periodic dom0 kernel context,
not from the context in which the traced event occurred.
The only consistent context that a probe can rely
on is its arguments plus a context that xdt maintains
and exports through global variables (this is safe
because all probes are executed in order and on one
CPU at a time). DTrace scripts can access this context
through these kernel variables:

  • `xdt_curpcpu  the current physical CPU (uint_t)
  • `xdt_curdom  the current domain id (uint_t)
  • `xdt_curvcpu  the current virtual CPU (uint_t)
  • `xdt_timestamp  the current Xen system time in nanoseconds (uint64_t)
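As a minimal sketch of how a script might read this context, the variables can be referenced with the D backquote syntax for kernel symbols. The choice of the wake probe and the output format are illustrative assumptions, not something xdt prescribes.

```dtrace
#pragma D option quiet

/* Print the xdt-maintained context each time a vcpu is woken up. */
xdt:sched::wake
{
    printf("pcpu %u  dom %u  vcpu %u  xen time %u ns\n",
        `xdt_curpcpu, `xdt_curdom, `xdt_curvcpu, `xdt_timestamp);
}
```

Saved to a file (for example ctx.d, a hypothetical name), this could be run in dom0 with dtrace -s ctx.d.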

This global context is only available in xdt probes, not
in any other probes, including BEGIN and END. The sched
probes may have explicit domain and vcpu arguments, because the
current domain/vcpu can be changed by the operation that
they report on. Also note that xdt_timestamp reflects the
timestamp that came with the current trace event. It is a
snapshot and does not advance while the probe
executes, so it can only be used to measure the time
between events, not the time elapsed within an event.
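Measuring the time between two events could look like the sketch below, which estimates how long each vcpu stays blocked. It assumes that a wake event following a block event for the same domain/vcpu ends the blocked period, and it uses a global associative array rather than thread-local variables because all xdt probes fire from the same periodic dom0 context. The names blocked and blocked_ns are illustrative.

```dtrace
/* Remember when each vcpu blocked, keyed by the probe's domain/vcpu arguments. */
xdt:sched::block
{
    blocked[arg0, arg1] = `xdt_timestamp;
}

/* On wake-up, record how long the vcpu was blocked (nanoseconds). */
xdt:sched::wake
/blocked[arg0, arg1] != 0/
{
    @blocked_ns[arg0, arg1] = quantize(`xdt_timestamp - blocked[arg0, arg1]);
    blocked[arg0, arg1] = 0;
}
```

The predicate skips wake events for which no earlier block event was seen, for example events that happened before tracing started.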

The following probe classes are available:

  • sched  scheduling events
  • pv  PV domain events
  • hvm  HVM domain events
  • mem  grant events
  • shadow  shadow page table (typically HVM domain) events
  • pm  power management events
  • trace  tracing meta events
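To get a first overview of which classes are active, a script along these lines may help. It assumes that the class name occupies the module field of the probe description, as the probe names listed on this page (for example xdt:sched::block) suggest.

```dtrace
/* Count every xdt event by probe class and probe name. */
xdt:::
{
    @events[probemod, probename] = count();
}

/* Stop after ten seconds; the aggregation is printed on exit. */
tick-10sec
{
    exit(0);
}
```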

Below is a list of all probes, with short descriptions.

xdt:sched::block
  A vcpu blocks in the Xen scheduler.
  arg0 = domain
  arg1 = vcpu

xdt:sched::idle-off-cpu
  The idle domain is being replaced by a newly scheduled domain/vcpu.
  arg0 = domain (will be 0x7fff, the idle domain)
  arg1 = vcpu
  arg2 = runtime for the idle domain in nanoseconds (i.e. the time the physical CPU was idle)

xdt:sched::off-cpu
  A vcpu is being scheduled off of a cpu.
  arg0 = domain
  arg1 = vcpu
  arg2 = time the vcpu spent running on this cpu (nanoseconds)
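A rough sketch of CPU accounting built on the two off-cpu probes follows. It assumes that off-cpu reports only non-idle vcpus and that idle time is reported separately through idle-off-cpu, as the descriptions above imply. The aggregation names are illustrative.

```dtrace
/* Total time each domain/vcpu spent running, in nanoseconds. */
xdt:sched::off-cpu
{
    @run_ns[arg0, arg1] = sum(arg2);
}

/* Total idle time per physical CPU, in nanoseconds. */
xdt:sched::idle-off-cpu
{
    @idle_ns[`xdt_curpcpu] = sum(arg2);
}
```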

xdt:sched::idle-on-cpu
  The idle domain is being scheduled on a cpu.
  arg0 = domain (will be 0x7fff, the idle domain)
  arg1 = vcpu
  arg2 = time spent waiting to be scheduled (not useful for this case)
  arg3 = allocated time slice (nanoseconds)

xdt:sched::on-cpu
  A vcpu is being scheduled on a cpu.
  arg0 = domain
  arg1 = vcpu
  arg2 = time spent waiting to be scheduled (nanoseconds)
  arg3 = allocated time slice (nanoseconds)
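Scheduling latency per domain could be summarized as in this sketch, which uses only the on-cpu arguments described above. The aggregation names are illustrative.

```dtrace
xdt:sched::on-cpu
{
    /* Distribution of the time vcpus waited to be scheduled, per domain. */
    @wait_ns[arg0] = quantize(arg2);

    /* Average time slice handed out, per domain. */
    @slice_ns[arg0] = avg(arg3);
}
```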

xdt:sched::shutdown-crash
  A domain is being shut down because of a crash.
  arg0 = domain
  arg1 = the initiating vcpu

xdt:sched::shutdown-poweroff
  A domain is being shut down because of a poweroff.
  arg0 = domain
  arg1 = the initiating vcpu

xdt:sched::shutdown-reboot
  A domain is being shut down because of a reboot.
  arg0 = domain
  arg1 = the initiating vcpu

xdt:sched::shutdown-suspend
  A domain is being shut down because of a suspend.
  arg0 = domain
  arg1 = the initiating vcpu

xdt:sched::sleep
  A vcpu sleeps.
  arg0 = domain
  arg1 = vcpu

xdt:sched::wake
  A vcpu is woken up.
  arg0 = domain
  arg1 = vcpu

xdt:sched::yield
  A vcpu yields.
  arg0 = domain
  arg1 = vcpu

xdt:sched::add
  A vcpu is added to a guest
  arg0 = domain
  arg1 = vcpu

xdt:sched::adjdom
  Scheduling parameters for a domain were adjusted.
  arg0 = domain

xdt:pv::dt-mapping-fault
  A PV domain faulted in a descriptor table mapping.
  arg0 = instruction pointer
  arg1 = offset in table

xdt:pv::emulate-priv-op
  Xen emulated a privileged operation for a PV domain that trapped.
  arg0 = instruction pointer

xdt:pv::forced-invalid-op
  An invalid op (ud2 insn) was inserted in a PV domain to emulate an op (cpuid).
  arg0 = instruction pointer

xdt:pv::hypercall
  A PV domain made a hypercall into Xen.
  arg0 = hypercall number (%eax register)
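Because the probe carries only the hypercall number, the calling domain has to come from the xdt context. A minimal sketch that counts hypercalls per domain and number:

```dtrace
/* Count hypercalls by calling domain and hypercall number. */
xdt:pv::hypercall
{
    @hypercalls[`xdt_curdom, arg0] = count();
}
```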

xdt:pv::math-state-restore
  Xen restored FPU state after a DNA trap

xdt:pv::page-fault
  Page fault in a PV domain.
  arg0 = instruction pointer
  arg1 = faulting virtual address
  arg2 = error code

xdt:pv::paging-fixup
  A page fault was fixed up by Xen (it occurred because of Xen bookkeeping only)
  arg0 = instruction pointer
  arg1 = faulting virtual address

xdt:pv::pte-write-emul
  An emulated write to a PTE (PV writeable page tables)
  arg0 = page table entry (PTE)
  arg1 = virtual address
  arg2 = instruction pointer

xdt:pv::trap
  A trap occurred in a PV domain.
  arg0 = instruction pointer
  arg1 = trap number
  arg2 = 1 if the error code is valid, 0 otherwise
  arg3 = error code
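The validity flag in arg2 can be used as a predicate so that the error code is only looked at when it is meaningful. A small sketch, with illustrative aggregation names:

```dtrace
/* Count traps per domain and trap number. */
xdt:pv::trap
{
    @traps[`xdt_curdom, arg1] = count();
}

/* Break traps down by error code, but only where the code is valid. */
xdt:pv::trap
/arg2 == 1/
{
    @errcodes[arg1, arg3] = count();
}
```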

xdt:hvm::vmexit
  An event occurred that made a HVM domain trap into the hypervisor HVM
  code. The vmentry trace point (see below) fires when the hypervisor
  resumes executing the HVM guest code. All other hvm tracepoints happen
  between vmexit and vmentry and provide further information on the time
  that a VCPU for a HVM domain spends inside Xen.
  arg0 = exit reason (different for AMD and Intel)
  arg1 = instruction pointer

xdt:hvm::vmentry
  Resume execution of a HVM domain VCPU after a vmexit has been handled.
  No arguments.
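Pairing vmexit with the following vmentry gives an estimate of the time a VCPU spends inside Xen per exit reason. The sketch below keys its state on the xdt context variables, since vmentry has no arguments. It assumes that the next vmentry seen for the same domain/vcpu belongs to the recorded vmexit, so any time the VCPU was descheduled in between is included in the result. The array and aggregation names are illustrative.

```dtrace
/* Record the exit time and exit reason for the current domain/vcpu. */
xdt:hvm::vmexit
{
    exit_ts[`xdt_curdom, `xdt_curvcpu] = `xdt_timestamp;
    exit_reason[`xdt_curdom, `xdt_curvcpu] = arg0;
}

/* On re-entry, aggregate the elapsed time (nanoseconds) by exit reason. */
xdt:hvm::vmentry
/exit_ts[`xdt_curdom, `xdt_curvcpu] != 0/
{
    @in_xen_ns[exit_reason[`xdt_curdom, `xdt_curvcpu]] =
        quantize(`xdt_timestamp - exit_ts[`xdt_curdom, `xdt_curvcpu]);
    exit_ts[`xdt_curdom, `xdt_curvcpu] = 0;
    exit_reason[`xdt_curdom, `xdt_curvcpu] = 0;
}
```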

xdt:hvm::vmmcall
  Explicit entry into the hypervisor from a HVM domain.
  arg0 = call number (%eax)

xdt:hvm::clts
  clts (clear task switch bit) instruction executed in a HVM domain.
  No arguments.

xdt:hvm::cpuid
  cpuid instruction executed in a HVM domain.
  arg0 = %eax (input)
  arg1 = %eax (output)
  arg2 = %ebx
  arg3 = %ecx
  arg4 = %edx

xdt:hvm::cr-read
  Control register read in a HVM domain.
  arg0 = control reg #
  arg1 = value

xdt:hvm::cr-write
  Control register write in a HVM domain.
  arg0 = control reg #
  arg1 = value

xdt:hvm::exception-inject
  Inject an exception into a HVM domain.
  arg0 = trap number
  arg1 = error code

xdt:hvm::virq-inject
  Inject an interrupt into a HVM domain.
  arg0 = vector

xdt:hvm::hlt
  hlt instruction executed in a HVM domain
  arg0 = 1 if VCPU is runnable, 0 if not

xdt:hvm::intr
  vmexit because of a physical interrupt while running a HVM domain.
  No arguments.

xdt:hvm::intr-window
  An interrupt can't be delivered to a HVM domain VCPU yet. This vmexit
  happens if a vmentry is done with an injected interrupt, but for
  some reason, it can't be delivered (interrupts disabled, SS segment reg
  operation).
  arg0 = vector
  arg1 = source (0 = none, 1 = PIC, 2 = LAPIC, 3 = NMI)
  arg2 = info (-1 if unavailable; different for AMD and Intel)

xdt:hvm::nmi
  An NMI occurred while executing a HVM domain.
  No arguments.

xdt:hvm::smi
  An SMI occurred while executing a HVM domain. AMD only.
  No arguments.

xdt:hvm::mce
  MCE in HVM domain.
  No arguments

xdt:hvm::lmsw
  lmsw instruction executed in a HVM domain
  arg0 = value
  
xdt:hvm::mmio-read
xdt:hvm::mmio-write
xdt:hvm::pio-read
xdt:hvm::pio-write
  A PIO instruction was executed, or a memory-mapped I/O region
  was accessed, from a HVM domain.
  arg0 = address (port or physical)
  arg1 = count (for REP instruction prefix)
  arg2 = access size (1, 2, 4, 8)
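Since the four I/O probes share one argument layout, they can be grouped in a single clause. The following sketch counts port accesses by port number and MMIO accesses by access size. The aggregation names are illustrative.

```dtrace
/* Port I/O: count accesses per domain, direction and port. */
xdt:hvm::pio-read,
xdt:hvm::pio-write
{
    @pio[`xdt_curdom, probename, arg0] = count();
}

/* MMIO: count accesses per domain, direction and access size. */
xdt:hvm::mmio-read,
xdt:hvm::mmio-write
{
    @mmio[`xdt_curdom, probename, arg2] = count();
}
```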

xdt:hvm::msr-read
  An MSR was read in a HVM domain.
  arg0 = MSR
  arg1 = value

xdt:hvm::msr-write
  An MSR was written in a HVM domain.
  arg0 = MSR
  arg1 = value
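For MSR accesses the register number and value are easier to read in hexadecimal. A minimal sketch that logs each MSR write together with the domain and vcpu taken from the xdt context:

```dtrace
#pragma D option quiet

/* Log every MSR write with its register number and value in hex. */
xdt:hvm::msr-write
{
    printf("dom %u vcpu %u: wrmsr 0x%x = 0x%x\n",
        `xdt_curdom, `xdt_curvcpu, arg0, arg1);
}
```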

xdt:hvm::invlpg
  The invlpg or invlpga instruction was executed in a HVM domain.
  arg0 = 1 if invlpga, 0 if invlpg
  arg1 = virtual address

xdt:hvm::pagefault-inject
  A page fault is injected into a HVM domain.
  arg0 = error code
  arg1 = faulting guest VA

xdt:hvm::pagefault-xen
  A page fault was fixed by Xen, and not injected into the HVM guest, because
  it was an artifact of the shadow page table code.
  arg0 = error code
  arg1 = faulting guest VA

xdt:mem::page-grant-map
  A domain mapped a grant ref.
  arg0 = domain (owner of the grant)

xdt:mem::page-grant-transfer
  A domain transferred a grant ref.
  arg0 = domain (target domain)

xdt:mem::page-grant-unmap
  A domain unmapped a grant ref.
  arg0 = domain (owner of the grant)
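Grant activity can be summarized per operation and domain with a single clause. This is only a sketch: note that arg0 means the grant owner for map and unmap, but the target domain for transfer.

```dtrace
/* Count grant operations by probe name and the domain reported in arg0. */
xdt:mem::page-grant-map,
xdt:mem::page-grant-unmap,
xdt:mem::page-grant-transfer
{
    @grants[probename, arg0] = count();
}
```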

xdt:shadow::domf-dying
  Fatal error while handling a mapping in the shadow page tables.
  arg0 = virtual address

xdt:shadow::emulate
  A PTE write was handled in the shadow page tables by emulating it.
  arg0 = PTE
  arg1 = written value
  arg2 = guest VA
  arg3 = flags

xdt:shadow::emulate-unshadow-evtinj
  The shadow page table code detected a page fault during exception/interrupt
  injection.
  arg0 = guest frame number
  arg1 = guest VA

xdt:shadow::emulate-unshadow-unhandled
  The shadow page table code failed to emulate a faulting instruction.
  arg0 = guest frame number
  arg1 = guest VA

xdt:shadow::emulate-unshadow-user
  The shadow page table code detected a user mode write to a page table.
  It assumes that this means the page is no longer a page table page.
  arg0 = guest frame number
  arg1 = guest VA

xdt:shadow::false-fast-path
  The shadow page table code detected that a page fault was fixed by another
  VCPU (this should be a rare condition).
  arg0 = guest VA

xdt:shadow::fast-mmio
  The shadow page table code handled a MMIO access through its fast path.
  arg0 = guest VA

xdt:shadow::fast-propagate
  The shadow page table code propagates a not-present page fault to the guest.
  arg0 = guest VA
  
xdt:shadow::fault-not-shadow
  Page fault was not a shadow page table fault and is bounced back to the
  guest.
  arg0 = PTE
  arg1 = guest VA
  arg2 = flags

xdt:shadow::fixup
  A page fault was fixed up in the shadow page table code, because it was
  caused by the shadow PT mechanism itself. The guest will not see it.
  arg0 = PTE
  arg1 = guest VA
  arg2 = flags

xdt:shadow::mmio
  A page fault was determined to be an MMIO access, and will be handled
  as such.
  arg0 = guest VA

xdt:shadow::prealloc-unpin
  Unpin pages as part of a shadow page table pre-allocation.
  arg0 = shadow frame number
  
xdt:shadow::resync-full
  Re-sync all entries on an out-of-sync shadow page table page (was allowed
  to go out of sync as an optimization)
  arg0 = guest frame number

xdt:shadow::resync-only
  Resync only one out-of-sync entry in the shadow page tables.
  arg0 = guest frame number

xdt:shadow::wrmap-bf
  Brute force search of shadow page tables to remove write access.
  arg0 = guest frame number

xdt:pm::freq-change
  Power management frequency change
  arg0 = old frequency
  arg1 = new frequency

xdt:pm::idle-entry
  Power management: entering idle state.
  arg0 = C-state
  arg1 = time

xdt:pm::idle-exit
  Power management: exiting idle state
  arg0 = C-state
  arg1 = time
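The power management probes could be used as in the sketch below. The units of the frequency and time arguments are not stated on this page, so the script only counts and prints the raw values.

```dtrace
#pragma D option quiet

/* Count idle-state entries per physical CPU and C-state. */
xdt:pm::idle-entry
{
    @cstate[`xdt_curpcpu, arg0] = count();
}

/* Log each frequency change with the raw old and new values. */
xdt:pm::freq-change
{
    printf("pcpu %u: frequency %d -> %d\n", `xdt_curpcpu, arg0, arg1);
}
```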

xdt:trace::records-lost
  Trace records were lost inside Xen because of a buffer overflow
  arg0 = current domain
  arg1 = current vcpu
  arg2 = number of lost records
  arg3 = Xen system timestamp at time of first lost record
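
Lost records mean that the data seen by other xdt probes is incomplete, so it can be useful to include a clause like the following sketch in any longer-running script. The aggregation name is illustrative.

```dtrace
/* Report each loss as it is seen and keep a running total. */
xdt:trace::records-lost
{
    printf("lost %d trace records (dom %d, vcpu %d), first at %d ns\n",
        arg2, arg0, arg1, arg3);
    @lost = sum(arg2);
}
```
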
Created by pcotten on 2010/04/08 20:34
Last modified by pcotten on 2010/04/08 20:34