Introduction

To help observe what's going on during the build, we have written a script that
samples the CPU load every second, uses dtrace(1M) to determine the current work
ing directories, dmake(1) targest, executables, and system calls during the buil
d, and runs sar(1) to get some additional information about system CPU, memory,
and I/O utilization.

With the data gathered by this script, one can analyze the build at the level of
 the overall build down to a specific directory, dmake target, executable, and s
ystem call.  The timestamps of when and where each dmake(1) target starts and en
ds and the samples of the CPU load are kept in a consistent way, so they can be
correlated with each other.  All of the data can be massaged into something that
 can be analyzed visually with gnuplot or manipulated into a form that highlight
s the most important information using perl(1), awk(1), sed(1), grep(1), sort(1)
, and other data manipulation tools.

What Is Observed?

The script mentioned above runs the following:

  1. Script to sample and record idle, system, and user time every second, 
  2. dtrace(1M) script to get timestamps of where and when each dmake target starts and ends and time and counts for the directory, dmake(1) target, executables, and system calls.
  3. sar(1) to get additional information on system CPU, memory, and I/O utilization

In particuliar, the following information is gotten:

        - Total time (elapsed and CPU)

        - Idle, system, and user time sampled every second during the build

        - Timestamps of start and end of each target and its current working directory

        - Directory
          - Time (elapsed and CPU)

        - Dmake target
          - Time (elapsed)
          - Time per directory (elapsed)

        - Executables
          - Time (elapsed and CPU)
          - Time per directory (elapsed and CPU)
          - Counts (total and per directory)

        - Syscalls
          - Time (total and per executable)
          - Count (total and per executable)
          - Failure
            - Time (total and per executable)
            - Count (total and per executable)
            - Failed count of open(2) syscalls per executable with name of file

How This Can Be Used?

With this information, one can determine the following:

        - Efficiency of the build

        - CPU utilization (or lack thereof) for any part of the build and what dmake(1) targets were being built then

        - Parallelism for all or any part of the build

        - Time to build all or a specific part of the source tree by directory

        - Time spent building the targets for a specific directory

        - Time and counts that each executable was run in total or in a specific directory

        - Time and counts that each system call was executed in total or for each executable

        -  Time and counts that system calls failed in total or for each executable

Example

Using the observability explained above, we gathered a bunch of data on a non-de
bug clobber build of Solaris ON without running lint(1B) on a SunFire x4600 with
 16 CPUs and 32 gigabytes of RAM.

To see what's going on across the whole build, we massaged the timestamps of whe
n each dmake target starts and ends gotten by the dtrace(1M) script and the idle
 time sampled during the build by the load sampling script into two columns of x
 and y coordinates that gnuplot can generate into graphs.

A graph showing when each dmake target starts and ends was constructed to make it easier to see which targets take the most time and where there are many or few targets being made in parallel.  The timestamp of when each dmake target starts and ends is graphed as a line segment with the start and end times as the endpoints.  The y coordinate for each dmake target is the elapsed time for building that target.

The times for the graph showing when each dmake target starts and ends are in seconds and are correlated to match the graph showing idle time during the build.  By placing one graph above the other, one can see the CPU utilization during the build of one or more targets along with how long each target took and some indication of its parallelism like this:

Idle Times

dmake(1) targets targets")

What Have We Learned

Here are some interesting things that we learned about the build using the data gathered by

        - Most of the elapsed time for usr/src is spent in usr/src/{lib,cmd,uts} in that order with usr/src/lib accounting for a large portion of the time.

        - The dmake targets that take the most elapsed time are "install", "clobber", "all",  and "_msg" and "install" takes the most CPU time.

        - Compiler passes (eg. cc1, ube, acomp, iropt, ir2hf, fbe, cc, etc.) and tools (eg. dmake, sh, ctf{convert,merge}, etc.) are the executables that take the most elapsed and CPU time.

        - Of course, the compiler passes and tools are called a lot too, but commands like rm(1), date(1), basename(1), mcs(1), and install(1) are run a lot as well.

        - The per directory information on executables are interesting to see and may be useful to figuring out what's going in within a certain directory.

        - The system calls that take the most time are wait(2), fork(2), and ones that  manipulate files (eg. write(2), open(2), unlink(2), etc.).

        - cc1, dmake, acomp, fbe, sh, etc. fail a lot on open(2), stat(2), unlink(2), ioctl(2), etc., but most of the time spent in system call failures are in wait(2), open(2), stat(2), etc. and don't take much time

Observing Directories and Targets

Currently it is very difficult to observed what directories were visited during the build and why. Makefiles print current working directories, but it is done inconsistently, the target information is missing and it is very difficult to
see what caused traversal of a directory.

When the same directory is accessed more than once from two different places (e.g. from sun4v and sun4v directories) there is a potential of a race when
the build is happening in parallel, so identifying such potential race conditions is critical.

To do that we developed a special DTrace script
that collects the relevant information during the build. The result can then be post-processed by Perl script
which dumps this information in a human-readable form. Results are available below for usr/src/uts:

With a little bit different postprocessing script we can construct the full build timeline:

last modified by admin on 2009/10/26 13:02
Collectives
Project


© Sun Microsystems Inc. 2009
XWiki Enterprise 1.8.2.19075 - Documentation
Terms Of Use | Privacy | Trademarks | Copyright Policy | Site Guidelines | Site map | Help
Your use of this web site or any of its content or software indicates your agreement to be bound by these Terms of Use.