KMDB Design Overview

Introduction

To quote the MDB answerbook:

The Modular Debugger (MDB) is a general purpose debugging tool for Solaris whose primary feature is its extensibility.

[...]

MDB is a tool that provides a completely customizable environment for debugging these programs and scenarios, including a dynamic module facility that programmers can use to implement their own debugging commands to perform program-specific analysis. Each MDB module can be used to examine the program in several different contexts, including live and post-mortem.

mdb, the userland debugger, is designed to debug the running kernel and kernel crash dumps. It can also control and debug live user processes as well as user core dumps. kmdb extends the debugger's functionality to include instruction-level execution control of the kernel. mdb, by contrast, can only observe the running kernel.

kmdb's goal is to bring mdb's advanced debugging functionality, to the maximum extent practicable, to in-situ kernel debugging. This includes support for loadable debugger modules, debugger commands, ability to process symbolic debugging information, and the various other features which make mdb so powerful.

kmdb is often compared with tracing tools like DTrace. DTrace is designed for tracing in the large -- for safely examining kernel and user process execution at a function level, with minimal impact upon the running system. kmdb, on the other hand, grabs the system by the throat, stopping it in its tracks. It then allows for micro-level (per-instruction) analysis, giving the user the ability to observe the execution of individual instructions, and allowing them to observe and change processor state. Whereas DTrace spends a great deal of energy trying to be safe, kmdb scoffs in the face of safety, letting developers wreak unpleasantness upon the machine in furtherance of the debugging of their code.

Terms

Throughout this document, MDB is used to describe the common debugger core -- the set of functionality common to both mdb and kmdb. mdb is used to refer to the userland debugger. kmdb is used to refer to the in-situ kernel debugger.

Layout

The best way to understand kmdb is by first understanding how mdb does things. We begin by giving an overview of the portions of mdb that are relevant to our later discussion of kmdb. Readers requiring more information about mdb and its operation are advised to consult the Modular Debugger AnswerBook. Having set the stage, we proceed to a discussion of the major design goals behind kmdb. With those goals in mind, we return to the list of components we discussed from an mdb perspective, analyzing them this time from the point of view of kmdb, showing how their implementation fulfills kmdb's design goals. Finally, we embark upon a whirlwind tour of some of the lower-level components of kmdb which weren't described in earlier sections.

MDB components and their implementation in mdb

In this section, we review the parts of MDB that are particularly relevant for our later discussion of kmdb, focusing on how those components are implemented in mdb. That is, we concentrate only on those components whose implementation changes significantly in kmdb. The design of MDB is sufficiently modular that the components requiring change were able to be replaced without disrupting the remainder of the debugger. The components described are shown in Figure 1.

The target layer

The MDB answerbook describes targets as follows:

The target is the program being inspected by the debugger. [...] Each target exports a standard set of properties, including one or more address spaces, one or more symbol tables, a set of load objects, and a set of threads.

Targets are implemented using an ops vector, with each target implementing a subset of the functions in the vector. In-situ targets, such as the user process, or proc, target, will implement virtually all operations. Targets used to debug entities whose execution cannot be controlled, such as the kvm target used for crash dump analysis, implement a smaller subset of the operations. As with many other parts of MDB, the targets are modular, and are designed to be easily replaceable depending on the requirements of the debugging environment.
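To make the ops vector idea concrete, here is a minimal sketch in C. Every name in it (target_ops_t, t_read, t_step, the two example targets, and the error codes) is hypothetical and chosen for illustration; none is taken from the MDB source. It shows a controllable target that fills in every member and a post-mortem target that leaves the execution-control member unset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TGT_ENOTSUP (-1)    /* operation not provided by this target */
#define TGT_EFAULT  (-2)    /* address range not readable */

/* Hypothetical ops vector: one function pointer per target operation. */
typedef struct target_ops {
    const char *t_name;
    int (*t_read)(void *buf, size_t len, size_t addr);
    int (*t_step)(void);    /* NULL when execution cannot be controlled */
} target_ops_t;

/* A tiny fake address space standing in for the debugged entity. */
static const unsigned char fake_memory[8] = { 'k', 'm', 'd', 'b', 0, 0, 0, 0 };

static int
fake_read(void *buf, size_t len, size_t addr)
{
    if (addr > sizeof (fake_memory) || len > sizeof (fake_memory) - addr)
        return (TGT_EFAULT);
    memcpy(buf, fake_memory + addr, len);
    return (0);
}

static int
fake_step(void)
{
    return (0);
}

/* A controllable target fills in every member of the vector. */
const target_ops_t live_target_ops = { "live", fake_read, fake_step };

/* A post-mortem target can read state but leaves execution control unset. */
const target_ops_t dump_target_ops = { "dump", fake_read, NULL };

/* The core calls through wrappers, so a missing operation fails cleanly. */
int
target_read(const target_ops_t *ops, void *buf, size_t len, size_t addr)
{
    assert(ops != NULL);
    if (ops->t_read == NULL)
        return (TGT_ENOTSUP);
    return (ops->t_read(buf, len, addr));
}

int
target_step(const target_ops_t *ops)
{
    assert(ops != NULL);
    if (ops->t_step == NULL)
        return (TGT_ENOTSUP);
    return (ops->t_step());
}
```

Because callers only ever hold a pointer to the vector, swapping one target for another requires no change to the code that issues the requests.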

Figure 1 shows three of the targets used by mdb. The first is the proc target, which is used for the debugging and control of user processes as well as the analysis of user core dumps. The proc target is implemented on top of libproc, which provides the primitives used for process control. The interfaces provided by libproc simplify the implementation of the proc target by hiding the differences between in-situ and post-mortem debugging (one is done with a live process, while the other uses a core file). The target itself is largely concerned with mapping the requests of the debugger to the interfaces exposed by libproc.

Also shown in Figure 1 is the kvm target, which is used for both live and post-mortem kernel debugging. Like the proc target, the kvm target uses a support library (libkvm) to abstract away the differences between live and post-mortem debugging. While the capabilities of the kvm and proc targets are largely the same when used for post-mortem debugging, they differ when the subjects are live. The proc target provides full control of process execution, while the kvm target allows only the inspection and alteration of kernel state. Allowing the debugger to control the execution of the kernel that is responsible for running the debugger would be difficult at best. Consequently, most debugging done with the kvm target is of the post-mortem variety.

The third target shown in Figure 1 is used for the "debugging" of raw files. This allows the data-presentation abilities of mdb to be brought to bear upon flat (usually binary) files. This target lays the foundation for the eventual replacement of something like fsdb, the filesystem debugger.

Debugger module management

Today's kernels are made up of a great many modules, each implementing a different subsystem, and each requiring different tools for analysis and debugging. The same can be said for modern, large-scale user processes, which can incorporate tens or even hundreds of shared libraries and subsystems. A modern modular debugger should, therefore, allow for the augmentation of its basic tool set as needed. MDB allows subsystem-specific debugging facilities to be provided via shared objects known as debugger modules, or dmods. Each dmod provides debugging commands (also known as dcmds) and walkers (iterators) which are used to debug a given subsystem. These modules interface with MDB via the module API layer, and use well-defined interfaces for data retrieval and analysis. This is enforced by the fact that, in the case of both major targets (kvm and proc), the debugger runs in a separate address space from the entity being analyzed. The dcmds are therefore forced to use the module API to access the target. While some dmods link with other support libraries to reduce the duplication of code, most dmods are standalone, consuming only the header files from the subsystems they support.

While the core debugger uses its own code for the management of debugger modules and their metadata, it relies upon a system library, libdl, for the mechanics of module loading and unloading. It is libdl, for example, which knows how to load the dmod into memory, and it is libdl which knows how to integrate that dmod into the debugger's address space.

Terminal I/O

MDB was designed with an eye towards the eventual implementation of something like kmdb, and thus performs most terminal interaction directly. Having built up a list of terminal attributes, MDB handles cursor and character manipulation directly. The MDB subsystem which performs terminal I/O is known as termio.

While termio handles a great deal itself, there is one aspect of terminal management that is provided by a support library. MDB uses libcurses to retrieve the list of terminal attributes for the current terminal from the terminfo database. The current terminal type is retrieved from the environment variable TERM.

Other Stuff

MDB is a large program, with many more subsystems than are described here. One of the benefits arising from the modular design of the debugger is the fact that these other subsystems don't need to change even when used in an environment as radically different as kmdb is from mdb. For example, MDB implements its own routines for the management of ELF symbol tables. ELF being ELF regardless of source, the same subsystem can be used, as-is, in both mdb and kmdb. A description of the MDB subsystems unaffected by kmdb is beyond the scope of this document.

Major kmdb design decisions

The Kernel/Debugger Interface (KDI)

When one implements an in-situ kernel debugger, one must determine the extent to which the debugger will be intermingled with the kernel being debugged. Should the debugger call kernel functions to accomplish its duties, or should the debugger be entirely self-contained? The legacy Solaris in-situ kernel debugger, kadb, hewed to the latter philosophy to a significant extent. The kadb module was as self-contained as possible, to the point where it contained copies of certain low-level kernel routines. That said, there were some kernel routines to which kadb needed access. During debugger startup, it would search for a number of functions by name, saving pointers to them for later use.

There are a number of problems with kadb's approach. First of all, duplicating low-level kernel code in the debugger means that every duplicated routine must be maintained in two places. Furthermore, this duplication, due to the layout of the Solaris source code, results in the copies being significantly separated. It's hard enough to maintain code rife with duplication when the duplicates are co-located. Maintaining duplicates located in wildly disparate locations is next to impossible. During initial analysis of kadb as part of the kmdb project, we discovered several duplicated functions in kadb that had not kept up with hardware-specific changes to the versions in the kernel. The second problem concerns the means by which kadb gained access to the kernel functions it did use. Searching for those functions by name is dangerous, as it leaves the debugger vulnerable to changes in the kernel. A change in the signature of a kernel function used by kadb, for example, would not be caught until kadb failed while trying to use said function.

To some extent, duplication is required due to the nature of a kernel debugger. The kernel debugger cannot, for example, hold locks, and therefore requires lock-free versions of any kernel code that it must call. The lock-free version of a function may not be safe when used in a running kernel context, and therefore must be kept separate from the normal version. Rather than placing that duplicate copy within the debugger itself, we decided to co-locate the duplicate with the original. This reduces the chances of code rot, as an engineer changing the normal version is much more likely to notice the debugger-specific version sitting right next to it.

Access to kernel functionality was formalized through an interface known as the KDI, or Kernel/Debugger Interface. The KDI is an ops vector through which all kernel function calls must pass. Each function called by the debugger has a member in this vector. Whereas an assessment of kernel functionality used by kadb required a search for symbol lookup routines and their consumers, a similar assessment in kmdb simply requires the review of the single ops vector. Furthermore, our use of an ops vector allowed us to use the compiler to monitor the evolution of kernel functions used by kmdb. Any change to a KDI function significant enough to change the function signature will be caught by the compiler during the initialization of the KDI ops vector. Furthermore, the initialization of said vector is easily visible to code analysis tools such as cscope, allowing engineers to quickly determine whether kmdb is a consumer of a given function. With kadb, such a check would require a check of the symbol lookup routines, something that is not automatically done by the code analysis tools used today.
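The compile-time checking described above can be sketched as follows. This is an illustration under stated assumptions, not the actual KDI: the structure kdi_sketch_t, its two members, and the stand-in kernel routines kern_mod_count() and kern_flush_caches() are all hypothetical names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for kernel routines the debugger needs (hypothetical). */
int
kern_mod_count(void)
{
    return (3);
}

void
kern_flush_caches(void)
{
}

/* Hypothetical interface: each kernel call the debugger makes is a member. */
typedef struct kdi_sketch {
    int (*kdi_mod_count)(void);
    void (*kdi_flush_caches)(void);
} kdi_sketch_t;

/*
 * Kernel-side initialization of the vector. If kern_mod_count() were
 * changed to take an argument, this initializer would no longer match the
 * member's type, and the compiler would flag the mismatch at build time
 * rather than leaving it to be discovered when the debugger runs.
 */
const kdi_sketch_t kdi_sketch_ops = {
    .kdi_mod_count = kern_mod_count,
    .kdi_flush_caches = kern_flush_caches,
};

/* Debugger side: holds only the vector, and never looks up names. */
static const kdi_sketch_t *dbg_kdi;

void
dbg_init(const kdi_sketch_t *kdi)
{
    assert(kdi != NULL);
    dbg_kdi = kdi;
}

int
dbg_mod_count(void)
{
    assert(dbg_kdi != NULL);
    return (dbg_kdi->kdi_mod_count());
}
```

The initializer is also the single place a source browser needs to index in order to answer the question "does the debugger use this function?".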

Implementation as a kernel module

kadb was implemented as a standalone. In Solaris, this means that the kadb module was an executable, directly loadable by the boot loader. It had no static dependencies on other modules, thus leading to the symbol lookup problems discussed above. When the use of kadb was requested, the boot process ran something like this:

  1. Boot loader loads kadb
  2. kadb initializes
  3. kadb loads normal standalone, unix
  4. kadb loads unix's interpreter, krtld
  5. kadb passes control to krtld
  6. krtld loads unix's dependencies (genunix, CPU module, platform module, etc)
  7. krtld transfers control to unix

While this allowed the debugger to take early control of the system (it could debug from the first instruction in krtld), that ability came with some significant penalties. The decision to load a 32-bit or 64-bit kernel being made after kadb had loaded and initialized, kadb had to be prepared to debug either variety. kadb's need to execute prior to the loading of unix itself meant that it could not use any functions located in the kernel until the kernel was loaded. While some essential functions were dynamically located later, the result of this restriction was the location of many low-level kernel functions in the debugger itself. A further penalty comes in the form of increased debugger complexity. kadb's need to load unix and krtld requires that it know how to process ELF files, and to load modules into the address space. The boot loader already needs to know how to do that, as does krtld. With kadb as a standalone, the number of separate copies of ELF-processing and module-loading code goes up to three.

The remaining limitations have to do with the timing of the decision to load kadb. As stated above, kadb was a standalone, and standalones can only be loaded at boot. As such, the administrator was required to decide, prior to reboot, whether to load kadb. Once loaded, it could not be unloaded. While the inability to unload the debugger isn't a major limitation, the inability to dynamically load it is. Not knowing whether kadb would be needed during the life of a given system boot, administrators would be faced with an unfortunate choice. On the one hand, they could always load kadb at boot. This kept it always ready for use, but at the cost of the wiring down of a chunk of kernel address space. This could be avoided, of course, by making the other choice -- not loading the debugger at boot. Administrators then ran the risk of not having the debugger around when they needed it.

kmdb's implementation as a normal kernel module solves all of these problems, with only a minor activation-time penalty compared to kadb. When kmdb is loaded at boot, the boot process looks something like this:

  1. Boot loader loads unix
  2. Boot loader loads unix's interpreter, krtld
  3. Boot loader passes control to krtld
  4. krtld loads unix's dependencies (genunix, CPU module, platform module, etc)
  5. krtld loads kmdb
  6. krtld transfers control to unix

As shown above, kmdb loads after the primary kernel modules have been selected and loaded. kmdb can therefore assume that it will be running with the same bit-width as that of the underlying kernel. That is, a 32-bit kmdb will never have to deal with a 64-bit kernel and vice versa.

By loading after the primaries, kmdb can have static symbol dependencies upon the other primary kernel modules. It is this ability that allows the KDI to exist. Even better, kmdb can rely on krtld's selection of the proper CPU and platform modules for this machine. Rather than having to carry around several processor-specific implementations of the same function (or compiling one module for each of four platform types, as kadb did), kmdb can, using the KDI, simply use the proper implementation of a given function from the proper module. When a new platform-specific KDI function is implemented, the developer implements it in a platform-specific way in each platform module. krtld selects the proper platform module on boot, and kmdb automatically ends up using the proper version for the host machine.

Last but certainly not least, kmdb's implementation as a normal kernel module allows it to be dynamically loaded and unloaded. It can still be loaded at boot, but it can also be loaded on-demand by the administrator. If dynamically loaded, it can also be unloaded when no longer needed. This can be a consolation to wary administrators who would otherwise object to the running of a kernel debugger on certain types of machines.

The only disadvantage of the use of a normal kernel module versus a standalone is the loss of the ability to debug the early stages of krtld. In practice, this has not turned out to be a problem, as the early stages of krtld are fairly straightforward and stable.

Every attempt has been made to minimize the differences between the two load types (boot and runtime). Obviously, initialization differs in some respects, as a number of common kernel subsystems simply won't be available during the initialization of boot-loaded kmdb. Largely, though, these differences are dealt with under the covers, and are not visible to the user.

The structure of kmdb

The inner workings of kmdb are best understood by first reviewing the debugger's external structure. kmdb's external structure is dictated, to some extent, by the environments in which it will be used. Those requirements are:

To satisfy the first two requirements, kmdb exists as two separate kernel modules. The first, misc/kmdbmod, contains the meat of the debugger, and is the module loaded by krtld when kmdb is loaded at boot. The second module, drv/kmdb, exists solely to gather property values from the device tree and to present an ioctl-based interface to controlling userland programs such as mdb(1). When kmdb is to be loaded at runtime, mdb opens /dev/kmdb, and uses the ioctl interface to command it to activate. The opening of /dev/kmdb causes drv/kmdb to load. drv/kmdb has a dependency on misc/kmdbmod, which gets loaded as well. Upon receipt of the appropriate ioctl, drv/kmdb calls into misc/kmdbmod, and the debugger is initialized.

If the debugger was loaded at boot, only misc/kmdbmod will be loaded. The module loading subsystem will not have fully initialized at that point. Userland does not exist yet, and given that drv/kmdb exists only to convey ioctl requests from userland to misc/kmdbmod, there is no need to force drv/kmdb to load until an attempt is made to open /dev/kmdb. When someone does attempt to control the debugger via ioctls to /dev/kmdb, drv/kmdb will load. It will then send commands to misc/kmdbmod as in the runtime case above.

We now focus our attention more tightly upon misc/kmdbmod, which itself is composed of two parts. The first, referred to as the debugger, contains the core debugger functionality, as well as the primary subsystems needed to allow the core to control the kernel. The second, referred to as the controller, interacts with the running kernel.

The debugger interacts with the outside world only through a set of well-defined interfaces. One of these is the KDI; the other is composed of a set of functions passed during initialization by the controller. Aside from these interactions, the debugger must, by nature, function as a fully self-contained entity. Put in compilation terms, the debugger, which is built separately from the controller, must not have any unresolved symbols at link time. It is the debugger, and only the debugger, which is active when kmdb has control of the machine.

Behind the scenes, as it were, the controller works to ensure that the debugger's runtime needs are met. The debugger's limited set of direct interactions with the kernel, and the fact that it can only be active when the world has stopped, necessarily limits the sorts of things that it can do. For example, it can neither perform the early stages of kmdb initialization, nor can it load or unload kernel modules.

The former takes place before debugger initialization starts, and is taken care of by the controller. A memory region, known as Oz, is allocated and is set aside for use by the debugger. Other initialization tasks performed by the controller include the creation of trap tables or IDTs, as appropriate, after which control is passed to the debugger for the completion of initialization.

Kernel module loading and unloading, which will be discussed in more detail below, is a task that must be performed by the running kernel. The debugger must rely on the controller to perform these sorts of tasks for it.

In the text that follows, we will use the words driver, debugger, and controller to refer to the components we've just discussed. These three components are indicated in Figure 2 by regions surrounded by dotted lines. When we discuss the entire entity, we refer to it as kmdb. References to the core debugger refer to the set of blue boxes labeled MDB. One unfortunate note: the term "controller" is a relatively recent invention. In many instances, the source code refers to the driver when it means the controller. This doesn't cause nearly as many issues as one might imagine, due to the minor role played by the entity we refer to as the driver.

MDB components and their implementation in kmdb

We will now use our earlier discussion of mdb to motivate our review of the major subsystems used by kmdb. Recall that the three subsystems discussed were the target layer, module management, and terminal management (termio). The implementation of kmdb is largely the story of the replacement of support libraries with subsystems designed to work in kmdb's unique environment.

Figure 2 shows how these replacement subsystems relate to the core debugger.

The target layer

The target layer itself is unchanged in kmdb. What changes is the target implementation itself. Gone are the proc, kvm, and file targets, replaced with a single target called kmdb_kvm. We'll continue to call it kmdb_kvm to avoid confusion with the kvm target used by mdb.

kmdb_kvm can be thought of as a hybrid of the proc and kvm targets. It includes the execution control aspects of proc, such as the ability to set breakpoints and watchpoints, as well as support for single-stepping, continuation, and so forth. This functionality is coupled with the kernel-oriented aspects of the kvm target. The kmdb_kvm target is common between SPARC and x86, and for the most part handles the bits of kernel analysis, management, and control that are generic to the two architectures. With the exceptions of stack trace construction and the display of saved registers, all architecture-specific functionality is abstracted away into the DPI. The DPI's relationship to kmdb_kvm is very similar to that of libkvm to the kvm target, or to that of libproc to the proc target.

A significant portion of kmdb_kvm is devoted to the monitoring of kernel state. As an example, target implementations are required to provide symbol lookup routines for use by the core debugger. Provision of this information requires access to kernel module symbol tables, which are easily accessed by kmdb_kvm. What is not so simple, however, is dealing with the constant churn in the set of loaded modules. Whenever kmdb regains control of the machine, kmdb_kvm scans the entire module list, looking for modules that have loaded or unloaded. Modules which have unloaded will have their kmdb_kvm tracking state (symbol table references, and so forth) destroyed, while modules which have been loaded have new state created. Challenges arise when a module has unloaded and then loaded again since kmdb last had control. This churn must be detected, and tracking state rebuilt.

The tracking of module movement, for lack of a better term, illustrates the interaction between the debugger and the controller. While the debugger could certainly rescan the entire list upon every entry, this would be wasteful. Instead, the controller subscribes to the kernel's module change notification service, and bumps a counter whenever a change has occurred. kmdb_kvm can, upon re-entry, check the value of that counter. If the value has changed since kmdb_kvm last saw it, a module list rescan is necessary.
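The counter check can be sketched in a few lines of C. The names below (modchg_gen, ctl_modchg_notify(), tgt_reentry_check()) are hypothetical, and the sketch assumes a single writer; a real kernel would likely need an atomic increment in the notification path.

```c
#include <assert.h>
#include <stdint.h>

/* Bumped by the controller while the world is running (hypothetical name). */
static volatile uint32_t modchg_gen;

/* Last generation seen by the target, plus a rescan count for the demo. */
static uint32_t tgt_seen_gen;
static unsigned tgt_rescans;

/*
 * Controller side: called from the kernel's module change notification.
 * Assumes a single writer; this sketch does not use atomics.
 */
void
ctl_modchg_notify(void)
{
    modchg_gen++;
}

/*
 * Debugger side: called each time the debugger regains control.
 * Returns 1 if a module list rescan was needed, 0 otherwise.
 */
int
tgt_reentry_check(void)
{
    uint32_t cur = modchg_gen;

    if (cur == tgt_seen_gen)
        return (0);         /* nothing loaded or unloaded */

    tgt_seen_gen = cur;
    tgt_rescans++;          /* stands in for the full list rescan */
    return (1);
}

unsigned
tgt_rescan_count(void)
{
    assert(tgt_seen_gen <= modchg_gen || modchg_gen < tgt_seen_gen);
    return (tgt_rescans);
}
```

Note that any number of module changes between two debugger entries collapses into a single rescan, which is exactly the saving the counter is meant to provide.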

While this interaction with the controller results in a useful optimization for module state management, it becomes crucial for the management of deferred breakpoints. Deferred breakpoints are breakpoints requested for modules that haven't yet loaded. The user's expectation is that the breakpoint will activate when the named module loads. The debugger is responsible for the creation, deletion, enabling, disabling, activation, and deactivation of breakpoints. The user creates the breakpoint using the breakpoint command (::bp). This being a deferred breakpoint for a module that hasn't been loaded, the debugger leaves the breakpoint in a disabled state. When that module has loaded, the breakpoint is enabled. Enabled breakpoints are activated by the debugger when the world is resumed. Activation is the step that actually plants the breakpoint in the target. In kmdb_kvm, the DPI is used to install a breakpoint instruction at the specified virtual address. The key design question: how do we detect the loading of the requested module?

The simplest, cleanest, and slowest approach would be to have kmdb_kvm place an internal breakpoint on the kernel's module loading routine. Whenever a module loaded, the debugger would activate, would check the identity of the loaded module, and would decide whether to enable the breakpoint. Debugger entry isn't cheap. All CPUs must be stopped, and their state must be saved. This particular stop would happen after a module load, so we'd need to rescan the module list. All in all, this is something that we really don't want to have to do every time a module is loaded or unloaded.

If we involve the controller, we can eliminate the unnecessary debugger activations, entering the debugger only when a module named in a deferred breakpoint is loaded or unloaded. How do we do this? We bend the boundaries between the debugger and controller slightly, exposing the list of deferred breakpoints to code which runs when the world is turning. Tie this into the controller's registration with the kernel's module change notification service, and we end up entering the debugger only when a change has occurred in a module named in a deferred breakpoint. We use a quasi-lock-free data structure to allow access to the deferred breakpoint list both from within the debugger (when the world is stopped) and within the module change check (when the world is running).
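A rough sketch of the filtering step follows. It deliberately ignores the concurrency problem that the quasi-lock-free structure solves, and simply shows the decision made in the module change path. All names (dfl_mods, dfl_add(), ctl_mod_changed()) and the fixed-size table are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DFL_MAX 8

/* Hypothetical table of module names that have deferred breakpoints. */
static const char *dfl_mods[DFL_MAX];

/* Counts how many times the sketch would have entered the debugger. */
static unsigned dbg_entries;

/* Debugger side (world stopped): record a module named in a deferred bp. */
int
dfl_add(const char *modname)
{
    assert(modname != NULL);
    for (size_t i = 0; i < DFL_MAX; i++) {
        if (dfl_mods[i] == NULL) {
            dfl_mods[i] = modname;
            return (0);
        }
    }
    return (-1);            /* table full */
}

/*
 * Controller side (world running): called on every module load or unload.
 * Returns 1 if a debugger entry is warranted, 0 if the change is ignored.
 */
int
ctl_mod_changed(const char *modname)
{
    assert(modname != NULL);
    for (size_t i = 0; i < DFL_MAX; i++) {
        if (dfl_mods[i] != NULL && strcmp(dfl_mods[i], modname) == 0) {
            dbg_entries++;  /* stands in for the debugger entry */
            return (1);
        }
    }
    return (0);
}

unsigned
dbg_entry_count(void)
{
    return (dbg_entries);
}
```

The expensive operation (stopping every CPU and saving its state) is thereby replaced, in the common case, by a handful of string comparisons.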

kmdb_kvm is also, as are the proc and kvm targets, home to dcmds that could not be implemented anywhere else. Implemented in the target, they have access to everything the target does, and can thus do things that dcmds implemented in dmods could only dream of doing. As implied above, kmdb_kvm (as well as kvm and proc) implements dcmds which provide stack tracing and register access.

Debugger module management

As discussed earlier, mdb uses libdl for the management of dmods, which are implemented as shared objects. kmdb's implementation is similar, only we don't have libdl. Nor does the debugger have a way to actually load or unload modules. Other than that, they're the same.

We decompose module management into two pieces: the requesting of module loads and unloads, and the implementation of a libdl replacement atop the results of the loading and unloading.

Module loads and unloads: the work request queue (WR)

kmdb implements debugger modules as kernel modules. While we engage in some sleight of hand to keep the dmods off the kernel's main module list, the mechanics of loading and unloading dmods are largely the same as those used for "normal" kernel modules. The primary difference is in the means by which a load or unload is requested. Recall that the debugger, which will receive the load or unload request from the user, is only able to run when the world is stopped. Also note that the loading or unloading of a kernel module is a process which uses many different kernel subsystems. The kernel runtime linker (krtld), the disk driver, VM system, filesystem and many others come into play. Use of these subsystems of course entails the use of locks, threads, and various other things which are anathema to the debugger.

In order to load a dmod, the debugger must therefore ask the controller to do it. The controller runs when the world is turning, and is more than capable of loading and unloading kernel modules. The only thing we need is a channel for communication between the two. That channel is provided by the Work Request Queue, or WR. The WR is composed of two queues: one for messages from the debugger to the controller, and one for messages from the controller to the debugger. The rough sequence of events for a module load is as follows:

  1. User requests a dmod load with ::load
  2. kmdb module layer receives the request, and passes it along to the WR debugger->controller queue
  3. The world is resumed
  4. The controller receives the request
  5. The controller loads the module
  6. The controller returns the request to the debugger as a (successful) reply on the controller->debugger queue
  7. The controller initiates a debugger re-entry
  8. The debugger receives the reply, and makes the contents of the dmod available to the debugger core.

A few details bear mentioning.

The debugger can be activated at any time -- even in the midst of the controller's processing of a load request. The controller must keep this in mind when checking and manipulating the WR queues. The queues themselves are lock-free, and have very strict rules regarding the methods used to access them. For example, the controller may only add to the end of the controller->debugger queue. It sets the next pointer on its request and updates the tail pointer for the queue. Even though the queue is doubly-linked, there's no easy way for the controller, which may be interrupted at any time by the debugger, to set the prev pointer. Accordingly, the debugger's first action upon preparing to process the controller->debugger queue is to traverse it, from tail to head, building the prev pointers. The debugger doesn't have to worry about being interrupted by the controller, and can thus take its time. Similar rules are in place for the debugger->controller queue.
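The access rules for the controller->debugger queue might look something like the sketch below. It rests on two assumptions that the text does not spell out: that the next pointer set by the controller links each new request to the previous tail, and that the debugger detaches the whole chain before walking it. The type and function names are hypothetical, and C11 atomics stand in for whatever primitives the kernel actually provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct wr_node {
    struct wr_node *wn_next;    /* set by the controller: toward the head */
    struct wr_node *wn_prev;    /* built later by the debugger */
    int wn_task;
} wr_node_t;

typedef struct wr_queue {
    _Atomic(wr_node_t *) wq_tail;   /* the only field the controller updates */
    wr_node_t *wq_head;             /* private to the debugger */
} wr_queue_t;

void
wr_queue_init(wr_queue_t *wq)
{
    atomic_init(&wq->wq_tail, (wr_node_t *)NULL);
    wq->wq_head = NULL;
}

/*
 * Controller side: may be interrupted by a debugger entry at any point.
 * The node is fully linked before the tail pointer is swung to it, so the
 * debugger always sees a consistent chain.
 */
void
wr_ctl_post(wr_queue_t *wq, wr_node_t *wn)
{
    wr_node_t *tail = atomic_load(&wq->wq_tail);

    assert(wn != NULL);
    wn->wn_prev = NULL;
    do {
        wn->wn_next = tail;
    } while (!atomic_compare_exchange_weak(&wq->wq_tail, &tail, wn));
}

/*
 * Debugger side: the world is stopped, so nothing else can touch the queue.
 * Walk from tail to head, filling in the prev pointers, and return the head
 * (the oldest request). Following wn_prev from the head yields FIFO order.
 */
wr_node_t *
wr_dbg_collect(wr_queue_t *wq)
{
    wr_node_t *wn = atomic_exchange(&wq->wq_tail, (wr_node_t *)NULL);
    wr_node_t *newer = NULL;

    for (; wn != NULL; wn = wn->wn_next) {
        wn->wn_prev = newer;
        newer = wn;
    }
    wq->wq_head = newer;
    return (newer);
}
```

The controller never writes a prev pointer and never touches the head, which is what lets the debugger interrupt it at any instruction without finding the queue half-updated.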

Every request must be tracked, and must be sent back as a reply at some point. Even fire-and-forget requests, such as those establishing new module search paths, must be returned as replies, even if those replies don't come until the debugger is unloaded. To see why this is necessary, consider the source of the memory underlying the requests. Requests from the debugger are allocated from debugger memory using the debugger's allocator, and can thus only be freed by the debugger. Requests initiated by the controller (for example, an automatic dmod load triggered by the loading of the corresponding kernel module) are allocated by the controller from kernel memory, and can thus only be freed by the kernel. Replies therefore serve a dual purpose -- they provide status to the requester, but also return the request to the requester for freeing.
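The ownership rule can be illustrated with a small sketch. Here calloc() and free() stand in for the debugger's private allocator, and the names (wr_req_t, dbg_req_alloc(), ctl_req_complete(), dbg_reply_consume()) are hypothetical. The point is only that the handling side records a status and hands the same request back, leaving the free to the side that allocated it.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { WR_FROM_DEBUGGER, WR_FROM_CONTROLLER } wr_origin_t;

typedef struct wr_req {
    wr_origin_t wr_origin;  /* which side allocated this request */
    int wr_status;          /* filled in by the side that handled it */
} wr_req_t;

/* Debugger-allocated requests that have not yet come back as replies. */
static unsigned dbg_outstanding;

/* Debugger side: allocate from the debugger's own memory. */
wr_req_t *
dbg_req_alloc(void)
{
    wr_req_t *req = calloc(1, sizeof (*req));

    if (req != NULL) {
        req->wr_origin = WR_FROM_DEBUGGER;
        dbg_outstanding++;
    }
    return (req);
}

/* Controller side: record the result and return the request; never free. */
wr_req_t *
ctl_req_complete(wr_req_t *req, int status)
{
    assert(req != NULL && req->wr_origin == WR_FROM_DEBUGGER);
    req->wr_status = status;
    return (req);
}

/* Debugger side: read the status, then free memory it allocated itself. */
int
dbg_reply_consume(wr_req_t *reply)
{
    int status;

    assert(reply != NULL && reply->wr_origin == WR_FROM_DEBUGGER);
    status = reply->wr_status;
    free(reply);
    dbg_outstanding--;
    return (status);
}

unsigned
dbg_req_outstanding(void)
{
    return (dbg_outstanding);
}
```

A nonzero outstanding count at unload time would indicate a request that was never returned, and therefore memory that could never be freed.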

We'd like to minimize the impact of the debugger on the running system to the extent practicable, and as such don't want the controller to poll for updates to the WR queues. Instead, we want the debugger to tell the controller when work is available for processing. This isn't as simple as it may seem. In the real world, we'd use semaphores or condition variables to signal the availability of work. To use kernel synchronization objects, the debugger would need to call into the kernel to release them. The kernel is most definitely not prepared for a cv_broadcast() call with every CPU stuck in the debugger. Unpleasantness would ensue. The lightest-weight way to communicate with the controller is to post a soft interrupt, the implementation of which is essentially the setting of a bit in the kernel's cpu_t structure. Normal interrupt processing will, when the world has resumed, encounter this bit, and will call the soft interrupt handler registered by the controller. That handler bangs on a semaphore, which triggers the controller's WR processing. Note that these problems apply only for communications from the debugger to the controller. The debugger can simply poll for messages sent in the opposite direction. The debugger being activated relatively infrequently, the occasional check of a message-waiting bit doesn't impose a burden. When the user requests a debugger activation, the last thing on their mind is whether or not the debugger is wasting a few cycles to check for messages.
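The debugger-to-controller notification reduces to setting a flag and letting ordinary interrupt processing notice it later. The sketch below models that with a plain variable; it does not use the kernel's cpu_t or any real soft interrupt interface, and every name in it is hypothetical.

```c
#include <assert.h>

/* Stands in for the pending bit the debugger sets while the world is stopped. */
static volatile unsigned softint_pending;

/* Counts wakeups delivered to the controller's WR processing. */
static unsigned wr_wakeups;

/* Debugger side: no locks and no kernel calls, just a store. */
void
dbg_notify_controller(void)
{
    softint_pending = 1;
}

/* Controller's handler: in the text, this is where the semaphore is posted. */
static void
ctl_softint_handler(void)
{
    assert(softint_pending == 0);   /* the bit is cleared before dispatch */
    wr_wakeups++;
}

/*
 * Stands in for normal interrupt processing after the world has resumed.
 * Returns 1 if the handler ran, 0 if nothing was pending.
 */
int
kern_softint_dispatch(void)
{
    if (!softint_pending)
        return (0);
    softint_pending = 0;
    ctl_softint_handler();
    return (1);
}

unsigned
ctl_wakeup_count(void)
{
    return (wr_wakeups);
}
```

Several notifications posted during one debugger session collapse into a single wakeup, which is sufficient because the controller drains the whole queue when it runs.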

libdl provides a synchronous loading and unloading interface to mdb, thus considerably simplifying its management of dmods. kmdb has no such luxury. As the reader might surmise from the preceding discussion, kmdb's loading and unloading of dmods is decidedly asynchronous. Although every attempt is made to preserve the user's illusion of a blocking load, the asynchronous nature occasionally pokes its head into the open. A breakpoint encountered prior to the completion of the load, for example, will cause an early debugger re-entry. The user is told that a load or an unload is still pending, and is told how to allow it to complete.

libdl wrapper

MDB's dmod management code uses the libdl interfaces for manipulating dmods. dlopen() is used to load modules, dlclose() is used to unload them, and dlsym() is used for symbol lookup. The debugger implements its own versions of these functions (using the same function signatures) to support the illusion of libdl. Underneath, the debugger's symbol table facilities are retargeted to implement dlsym()'s searches of dmod symbol tables.
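
A minimal sketch of the idea follows. The sketch_ prefix, the dmod_t layout, and the static table of already-loaded modules are all assumptions made for illustration; the real wrappers use the libdl names themselves and sit on top of the debugger's symbol table code and the asynchronous load path described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-dmod symbol table entry and module record. */
typedef struct dmod_sym {
	const char *ds_name;
	void *ds_addr;
} dmod_sym_t;

typedef struct dmod {
	const char *dm_name;
	const dmod_sym_t *dm_syms;
	size_t dm_nsyms;
	int dm_refs;
} dmod_t;

/* One sample module, assumed to have been loaded already. */
static int sample_value = 42;
static const dmod_sym_t sample_syms[] = {
	{ "sample_value", &sample_value }
};
static dmod_t loaded_dmods[] = {
	{ "sample", sample_syms, 1, 0 }
};

/* Same shape as dlopen(): returns an opaque handle, or NULL on failure. */
static void *
sketch_dlopen(const char *name, int mode)
{
	size_t i;

	(void) mode;	/* binding modes are ignored in this sketch */
	for (i = 0; i < sizeof (loaded_dmods) / sizeof (loaded_dmods[0]); i++) {
		if (strcmp(loaded_dmods[i].dm_name, name) == 0) {
			loaded_dmods[i].dm_refs++;
			return (&loaded_dmods[i]);
		}
	}
	return (NULL);
}

/* Same shape as dlsym(): search only the given dmod's symbol table. */
static void *
sketch_dlsym(void *handle, const char *name)
{
	const dmod_t *dm = handle;
	size_t i;

	for (i = 0; dm != NULL && i < dm->dm_nsyms; i++) {
		if (strcmp(dm->dm_syms[i].ds_name, name) == 0)
			return (dm->dm_syms[i].ds_addr);
	}
	return (NULL);
}

/* Same shape as dlclose(): 0 on success, non-zero on error. */
static int
sketch_dlclose(void *handle)
{
	dmod_t *dm = handle;

	if (dm == NULL || dm->dm_refs == 0)
		return (-1);
	dm->dm_refs--;
	return (0);
}
```

Because the signatures match libdl's, code written against libdl would not need to know which implementation it is calling.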

Terminal I/O

To implement terminal I/O handling, three things are needed: access to the terminal type, the ability to manipulate that terminal, and routines for actually sending I/O to and from that terminal. The second of these can be further subdivided into the retrieval of terminal characteristics and the use of that knowledge to manipulate the terminal. MDB implements the most difficult of these -- the routines that actually manipulate the terminal based on the gathered characteristics. MDB handles the tracking of cursor position, in-line editing, and the implementation of a pager, and knows how to use the individual terminal attributes (echo this to make the cursor move right, echo that to enable bold, etc.) to accomplish those tasks.

Left to mdb and kmdb are terminal type determination, attribute retrieval, and I/O to the terminal itself. For mdb, this is relatively straightforward. The terminal type can be gathered from the environment, attributes can be retrieved from the terminfo database using libcurses, and I/O accomplished using stdin, stdout, and stderr.

kmdb, as is its wont, has a more difficult time of things. There is no environment from which to gather the current terminal type. There's no easy access to the terminfo database. Completing the trifecta, the I/O methods vary with the type of platform, progress of the boot process, and phase of the moon. As a bonus, kmdb's termio implementation handles interrupt (^C) processing. We'll discuss each in turn. While the preceding sections had happy endings, in that pleasing solutions were found for the enumerated problems, the reader is warned that there are no happy endings in terminal management. Tales of wading through terminal types, to say nothing of the terminfo/termcap databases, are generally suitable only for frightening small children, and always end in woe and the gnashing of teeth.

Retrieving the terminal type

At first glance, gaining access to the terminal type would seem straightforward. Sadly, no. kmdb can be loaded at boot or at runtime. It can be used on a locally-attached console/framebuffer, or it can be used via a serial console. If loaded at runtime, the invocation could be made from a console login, or it could be made from an rsh (or telnet or ...) session. Boot-loaded kmdb on a serial console is the worst, as we have no information regarding the type of terminal attached to the other end of the serial connection. We end up assuming the worst, which is an 80x24 vt100. Boot-loaded kmdb on a machine with a locally-attached console or framebuffer is easier, because we know the terminal type and terminal dimensions for SPARC and x86 consoles. Also easy is a runtime-loaded kmdb from a console login. Assuming that the user set their terminal type correctly, we can use the value of the TERM environment variable. Unfortunately, we can't trust $TERM to be set correctly, so we ignore $TERM if the console is locally-attached. We end up with a pile of heuristics, which generally come up with the right answer. If they don't, they can always be overridden.
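
The pile of heuristics might be sketched roughly as follows. The function name, its parameters, and the exact precedence are assumptions made for illustration; in particular, the text does not say what is done when a runtime load has no usable $TERM, so falling back to vt100 in that case is a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of terminal type selection.
 *   override       - explicit user override, or NULL
 *   console_local  - non-zero for a locally-attached console/framebuffer
 *   local_type     - type known for the local console (platform-supplied)
 *   loaded_at_boot - non-zero if the debugger was loaded at boot
 *   env_term       - value of $TERM at load time, or NULL
 */
static const char *
sketch_term_type(const char *override, int console_local,
    const char *local_type, int loaded_at_boot, const char *env_term)
{
	/* An explicit override always wins. */
	if (override != NULL && *override != '\0')
		return (override);

	/* Local consoles have a known type, so $TERM is ignored. */
	if (console_local)
		return (local_type);

	/* Runtime load on a non-local console: trust $TERM if it is set. */
	if (!loaded_at_boot && env_term != NULL && *env_term != '\0')
		return (env_term);

	/* Boot-loaded on serial (or nothing better known): assume the worst. */
	return ("vt100");
}
```

In this sketch the 80x24 dimensions that accompany the vt100 fallback would be supplied separately.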

Terminal attributes

After considering the mess that is access to $TERM, retrieval of terminfo data is almost trivial. We don't want to compile in a copy of the terminfo database, and can't rely on the ability to gain access to it while the debugger is running. We compromise by hard-coding a selection of terminal types into the debugger. The build process extracts the attributes for each selected terminal from the terminfo database, and compiles them into the debugger. Terminal type selection in kmdb is thus limited to the types selected during the build. It turns out, though, that the vast majority of common terminal types can be covered by a set of 15 terminal types.

Console I/O

Access to the terminal entails the reading of input, the writing of output, and the retrieval of hardware parameters (terminal size and so forth), generally through an ioctl-based interface. MDB's modular I/O subsystem makes our job somewhat easier. Each I/O module provides an ops vector, exposing interfaces for reading, writing, ioctls, and so forth. kmdb has its own I/O module, called promio. promio acts as a front end for promif, which we'll discuss in a moment. For the most part, promio is a pass-through, with the exception of the ioctl function. promio interprets the ioctls sent from termio, and invokes the appropriate promif functions to gather the necessary information. In addition to the aforementioned terminal size ioctl (TIOCGWINSZ), promio's ioctl handler is prepared to deal with requests to get (TCGETS) and set (TCSETSW) hardware parameters. The parameters of interest to kmdb are largely concerned with echoing and newlines.
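
The shape of such an I/O module can be sketched as below. To keep the example portable, the request codes and structures are local stand-ins (SK_TIOCGWINSZ, sk_winsize_t, and so on) rather than the real definitions from the system headers, and the sk_promif_*() functions are hypothetical placeholders for the promif layer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for TIOCGWINSZ, TCGETS and TCSETSW and their argument types. */
enum { SK_TIOCGWINSZ = 1, SK_TCGETS, SK_TCSETSW };

typedef struct sk_winsize {
	unsigned short ws_row;
	unsigned short ws_col;
} sk_winsize_t;

typedef struct sk_termios {
	int tio_echo;	/* echo typed characters */
	int tio_onlcr;	/* map newline to CR-NL on output */
} sk_termios_t;

/* The ops vector every I/O module provides. */
typedef struct sk_io_ops {
	long (*io_read)(void *, size_t);
	long (*io_write)(const void *, size_t);
	int (*io_ioctl)(int, void *);
} sk_io_ops_t;

/* Hypothetical promif layer: fixed-size console, in-memory output. */
static char sk_out[32];
static size_t sk_outlen;
static sk_termios_t sk_promif_tio = { 1, 1 };

static long
sk_promif_read(void *buf, size_t n)
{
	(void) buf;
	(void) n;
	return (0);	/* no input available in this sketch */
}

static long
sk_promif_write(const void *buf, size_t n)
{
	if (n > sizeof (sk_out) - sk_outlen)
		n = sizeof (sk_out) - sk_outlen;
	memcpy(sk_out + sk_outlen, buf, n);
	sk_outlen += n;
	return ((long)n);
}

/* promio: read and write pass straight through; ioctl is interpreted. */
static int
sk_promio_ioctl(int req, void *arg)
{
	switch (req) {
	case SK_TIOCGWINSZ: {
		sk_winsize_t *ws = arg;
		ws->ws_row = 24;	/* would be asked of promif */
		ws->ws_col = 80;
		return (0);
	}
	case SK_TCGETS:
		*(sk_termios_t *)arg = sk_promif_tio;
		return (0);
	case SK_TCSETSW:
		sk_promif_tio = *(const sk_termios_t *)arg;
		return (0);
	default:
		return (-1);
	}
}

static const sk_io_ops_t sk_promio_ops = {
	sk_promif_read, sk_promif_write, sk_promio_ioctl
};
```

A consumer such as termio would only ever see sk_promio_ops, never the functions behind it.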

promif interfaces the debugger with the system's OpenBoot PROM (OBP). While x86 systems don't have PROMs, Solaris (and thus kmdb) tries very hard to pretend that they do. For the most part, this means functions called prom_something(), named to mimic their SPARC counterparts. Whereas the SPARC versions jump into OBP, the x86 versions do whatever is necessary to implement the same functionality without a PROM. promif exposes two classes of interface: those which deal with console (terminal) I/O, and those which are merely wrappers around PROM routines. We'll cover the former group here.

Both SPARC and x86 systems get help from the boot loader (OBP on SPARC) for console I/O during the initial stages of boot. SPARC systems without USB keyboards are able to use OBP for console I/O even after boot. x86 systems, and SPARC systems with USB keyboards, use a kernel subsystem known as polled I/O. Exposed to kmdb via the KDI, polled I/O is a method for interacting directly with the I/O hardware, be it a serial driver, the USB stack, or something completely different, without blocking. Rather than waiting for interrupts, as can be done while the world is turning, the polled I/O subsystem is designed to poll I/O devices until input is available or output has been sent. The bottom line is that the method used for console I/O changes during the boot process. The portion of promif dedicated to console I/O hides this complexity from consumers, exposing only routines for reading and writing bytes. Consumers need not concern themselves with where those bytes come from or go to.

Interrupt (^C) management

Given that kmdb console I/O is synchronous, there is no easy way for an interrupt (^C) from the user to get to the core debugger. In userland, the kernel detects interrupts asynchronously, generates a signal, and inflicts it upon the process. There is no parallel in kmdb. The debugger doesn't know about pending interrupts until it reads the interrupt character from the keyboard. With a simplistic I/O implementation, reading only when we need to, the user would never be able to interrupt anything.

promif works around this limitation by implementing a read-ahead buffer. That buffer is drained when the debugger needs input from the user. It is filled whenever input is available using a non-blocking reader. Attempts are made to fill the buffer whenever input is requested, there's data to be output, or an attempt is made to read or write the kernel's address space. If an interrupt character is discovered during a buffer fill, control passes to the interrupt-handling routine, which halts the command that was executing. Debugger commands that aren't constantly writing to the console, reading from the kernel, or writing to the kernel are very rare (and probably of questionable utility). In practice, this means that a buffer fill attempt will be made soon after the user presses ^C. As a future enhancement, we could, barring the implementation of an asynchronous interrupt-delivery mechanism, expand the number of fill points. In practice, though, this doesn't seem like it'd be necessary.
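
A minimal sketch of such a read-ahead buffer is shown below. The names, the buffer size, and the use of a flag in place of a transfer of control to the interrupt-handling routine are assumptions made for illustration. sk_fake_read() stands in for the non-blocking reader and simply replays a canned string.

```c
#include <assert.h>
#include <stddef.h>

#define	SK_RA_BUFSZ	64
#define	SK_CH_INTR	0x03	/* ^C */

/* Non-blocking reader: returns the next character, or -1 if none. */
typedef int (*sk_nb_read_f)(void);

typedef struct sk_readahead {
	unsigned char ra_buf[SK_RA_BUFSZ];	/* circular buffer */
	size_t ra_head;				/* index of oldest byte */
	size_t ra_count;			/* bytes currently buffered */
	int ra_intr;				/* set when ^C has been seen */
	sk_nb_read_f ra_read;
} sk_readahead_t;

/*
 * Fill point: called whenever input is requested, output is written, or
 * the kernel's address space is accessed. Never blocks.
 */
static void
sk_ra_fill(sk_readahead_t *ra)
{
	int c;

	while (ra->ra_count < SK_RA_BUFSZ && (c = ra->ra_read()) != -1) {
		if (c == SK_CH_INTR) {
			/* The real code would enter the interrupt handler. */
			ra->ra_intr = 1;
			continue;
		}
		ra->ra_buf[(ra->ra_head + ra->ra_count) % SK_RA_BUFSZ] =
		    (unsigned char)c;
		ra->ra_count++;
	}
}

/* Drain one byte for the debugger, or return -1 if nothing is buffered. */
static int
sk_ra_getc(sk_readahead_t *ra)
{
	int c;

	sk_ra_fill(ra);
	if (ra->ra_count == 0)
		return (-1);
	c = ra->ra_buf[ra->ra_head];
	ra->ra_head = (ra->ra_head + 1) % SK_RA_BUFSZ;
	ra->ra_count--;
	return (c);
}

/* Canned input for the sketch: "ab", then ^C, then "c". */
static const char *sk_fake_input = "ab\003c";

static int
sk_fake_read(void)
{
	if (*sk_fake_input == '\0')
		return (-1);
	return ((unsigned char)*sk_fake_input++);
}
```

The interrupt is noticed during the fill, before the debugger has consumed the characters typed ahead of it, which is what lets a ^C cut in on a running command.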

Conclusion

A significant portion of the design and implementation of kmdb was spent filling in the gaping holes left when mdb was separated from its supporting libraries. Certainly one doesn't realize how much is provided by those supporting libraries until one attempts to take them away. These gaps were filled by replacement subsystems whose operations were complicated by the restrictive environment in which kmdb operates. The balance of kmdb's implementation was spent in the development of the KDI functions and in the implementation of the DPI, more on which below. The DPI provides the low-level code that allows the remainder of kmdb to be largely architecture-neutral.

Remaining components

The Debugger/PROM Interface (DPI)

The DPI has a somewhat sordid history, the twists and turns of which have influenced the way it appears today.

kadb on x86, having no PROM, did everything itself. The SPARC version, on the other hand, was dependent upon a great many services provided by OBP. OBP provided trap handling for the debugger. It also took care of debugger entry and the saving of a portion of processor state, among other things.

kmdb was initially planned to be released in conjunction with an enhanced OBP. This new OBP would provide more sophisticated debugging facilities, thus freeing kmdb from having to deal with many low-level, hardware-specific details. For example, the new OBP would manage software breakpoints itself. It would capture and park processors during debugger execution. It would also manage watchpoints.

Recognizing that not all systems would have this new OBP, kmdb was initially designed with a pluggable interface which would allow for its use on systems with both types of OBP. That interface is called the Debugger/PROM Interface, or DPI. On SPARC, there would be one module for the old-style OBP interface, which we called the kadb-style interface (or kaif). SPARC would have a second module for the new-style OBP interface, the name for which has been lost in the sands of time. The debugger would choose between the two modules based on an assessment of OBP features. x86 systems would have a single module, also called kaif.

Some time into the implementation of kmdb (well after the terms DPI and kaif had cemented themselves throughout the source code), the plans for the new-style OBP were dropped. This turned out to be for the best, the reasons for which are beyond the scope of this document. As a result, modern-day kmdb has one module for each architecture. The intervening layer, the DPI, is not strictly necessary. It may not have been invented had it not been for our earlier plans to accommodate multiple styles of OBP interaction. It remains, though, and serves as a useful repository for some functionality common to the two kaif implementations.

The bulk of the kaif module is devoted to the performance of the following five tasks:

  1. Coordination of debugger entry
  2. Manipulation of processor state
  3. Source analysis for execution control
  4. Management of breakpoints and watchpoints
  5. Trap handling

Coordination of debugger entry

kmdb is single-threaded, and establishes a master-slave relationship between the CPUs on the machine. The first CPU to encounter an event which triggers debugger entry, such as a breakpoint, watchpoint, or deliberate entry, becomes the master. The master then cross-traps the remaining CPUs, causing them to enter the debugger as slaves. Slaves spin in busy loops until the world is resumed or until one of them switches places with the master. If multiple CPUs encounter debugger entry events at the same time, and thus race for debugger entry, the first to grab the master lock wins, with the remainder becoming slaves.
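
The race for the master role can be sketched with a single compare-and-swap. The names below are hypothetical, and C11 atomics stand in for whatever primitive the kaif module actually uses for its master lock; cross-trapping and the slave spin loop are left as comments.

```c
#include <assert.h>
#include <stdatomic.h>

#define	SK_NO_MASTER	(-1)

typedef enum { SK_ROLE_MASTER, SK_ROLE_SLAVE } sk_cpu_role_t;

/* Hypothetical master lock: holds the ID of the master CPU, if any. */
static atomic_int sk_master_cpu = SK_NO_MASTER;

/* Called by each CPU as it enters the debugger. */
static sk_cpu_role_t
sk_debugger_enter(int cpuid)
{
	int expected = SK_NO_MASTER;

	if (atomic_compare_exchange_strong(&sk_master_cpu, &expected, cpuid)) {
		/* Winner: would now cross-trap the remaining CPUs. */
		return (SK_ROLE_MASTER);
	}
	/* Loser or cross-trapped CPU: would spin until resume or switch. */
	return (SK_ROLE_SLAVE);
}

/* Called by the master when the world is resumed. Returns 1 on success. */
static int
sk_debugger_resume(int cpuid)
{
	int expected = cpuid;

	return (atomic_compare_exchange_strong(&sk_master_cpu, &expected,
	    SK_NO_MASTER));
}
```

Only one compare-and-swap can succeed while the lock is held, so two CPUs hitting breakpoints at the same instant still yield exactly one master.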

Manipulation of processor state

When processors enter the debugger, they save their register state into per-processor save areas. This state is then exposed to the user of the debugger. The kaif module coordinates the saving of this state, and also implements the lookup routines that allow for its retrieval.

Source analysis for execution control

MDB supports a number of execution control primitives. In addition to breakpoints and watchpoints, which will be discussed shortly, it provides for single-step, step over, step out, and continue. Single-step halts execution at the next instruction. Step over is similar, except for the fact that it does not step into subroutines. That is, it steps to the next instruction in the current routine. Step out steps to the next instruction in the calling routine. Continue resumes system execution on all processors (single-step only resumes execution on the processor being stepped).

Single-step is implemented directly by the kaif module. On x86, this entails the setting of EFLAGS.TF. On SPARC, we set breakpoints at the next possible execution points. If the next instruction is a branch, for example, we may have to set two breakpoints to cover both possible results of the branch.

Step over and step out are implemented independently of single-step. For step over, MDB calls into the target, which calls into the DPI and kaif, asking whether the next instruction requires special processing. If the next instruction is a call, the kaif module returns with the address of the instruction after the call. MDB will place a breakpoint at that location, and will use continue to "step" over the call. If the next instruction is not a call, the kaif module will indicate as such, and MDB will use normal single-step. When the user requests a step out, MDB will, via the target and the DPI, request that the kaif module locate the next instruction in the calling function.
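
The step-over decision can be sketched as follows. This is a deliberately simplified illustration: the function names are hypothetical, and the only call form recognized is the x86 near call with a 32-bit relative displacement (opcode 0xE8, five bytes long). A real implementation would have to decode every call encoding on each architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SK_X86_CALL_REL32	0xE8	/* near call, 32-bit displacement */
#define	SK_X86_CALL_REL32_LEN	5	/* opcode byte plus four bytes */

typedef enum { SK_STEP_SINGLE, SK_STEP_BKPT_AND_CONTINUE } sk_step_plan_t;

/*
 * Stand-in for the kaif query: if the instruction at offset pc is a call,
 * store the offset of the following instruction in *nextp and return 1.
 */
static int
sk_kaif_next_after_call(const uint8_t *text, size_t textlen, size_t pc,
    size_t *nextp)
{
	if (pc < textlen && text[pc] == SK_X86_CALL_REL32 &&
	    textlen - pc >= SK_X86_CALL_REL32_LEN) {
		*nextp = pc + SK_X86_CALL_REL32_LEN;
		return (1);
	}
	return (0);
}

/*
 * Stand-in for MDB's side: decide how to carry out a step over. When the
 * plan is SK_STEP_BKPT_AND_CONTINUE, *bkpt_pc says where the temporary
 * breakpoint belongs.
 */
static sk_step_plan_t
sk_step_over_plan(const uint8_t *text, size_t textlen, size_t pc,
    size_t *bkpt_pc)
{
	if (sk_kaif_next_after_call(text, textlen, pc, bkpt_pc))
		return (SK_STEP_BKPT_AND_CONTINUE);
	return (SK_STEP_SINGLE);
}
```

For a buffer holding a nop, a call, and a ret, stepping over at the call yields a breakpoint on the ret, while stepping over at the nop falls back to ordinary single-step.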

Whereas single-step releases a single processor to execute a single instruction, continue releases all processors, and fully resumes the world. Continue also posts the soft interrupt to the controller, if necessary, in support of debugger module management.

Management of Breakpoints and Watchpoints

Both SPARC and x86 rely on software breakpoints. That is, a specific instruction (int $3 on x86, and ta 0x7e on SPARC) is written at a given location. When control reaches that location, the debugger is entered. Breakpoints are activated by installing one of these instructions, and are deactivated by restoring the original instruction.
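
Activation and deactivation amount to a save, a store, and a restore. The sketch below uses hypothetical names and operates on an ordinary byte array rather than kernel text; it shows the x86 case, where int $3 has the one-byte encoding 0xCC.

```c
#include <assert.h>
#include <stdint.h>

#define	SK_X86_INT3	0xCC	/* one-byte encoding of int $3 */

/* Hypothetical breakpoint record. */
typedef struct sk_bkpt {
	uint8_t *bp_addr;	/* location of the breakpoint */
	uint8_t bp_saved;	/* original byte, valid while active */
	int bp_active;
} sk_bkpt_t;

/* Activate: remember the original byte, then write the trap instruction. */
static void
sk_bkpt_activate(sk_bkpt_t *bp)
{
	if (bp->bp_active)
		return;		/* don't save our own 0xCC as the original */
	bp->bp_saved = *bp->bp_addr;
	*bp->bp_addr = SK_X86_INT3;
	bp->bp_active = 1;
}

/* Deactivate: put the original byte back. */
static void
sk_bkpt_deactivate(sk_bkpt_t *bp)
{
	if (!bp->bp_active)
		return;
	*bp->bp_addr = bp->bp_saved;
	bp->bp_active = 0;
}
```

The guard in sk_bkpt_activate() matters: activating twice without it would record the trap instruction as the "original" and make the breakpoint impossible to remove cleanly.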

Watchpoints are implemented by hardware on both platforms. Space on processors being at a premium, and watchpoints being relatively rarely used (though oh so helpful), processors don't provide many of them, and impose restrictions on the ones they do. SPARC, for example, has two watchpoints -- one physical and one virtual. SPARC watchpoint sizes are restricted to 8 bytes or any non-zero power of 256. x86 implements four watchpoints, even allowing watchpoints on individual I/O port numbers, but imposes restrictions on their size and access type. Watchpoints are activated by writing to the appropriate hardware registers, and are deactivated by clearing those registers. The kaif ensures that the target only activates the supported number of watchpoints. It also checks to make sure that the watchpoints requested meet the hardware limitations. No attempt is made to synthesize more flexible watchpoints.

Trap handling

On SPARC, kmdb has drastically reduced its dependency upon OBP as the project has progressed. This is somewhat ironic in light of our earlier attempts to increase that dependency. Whereas kadb allowed OBP to handle traps, and to coordinate entrance into the debugger, kmdb has its own trap table, handles its own debugger entry, and even handles its own MMU misses.

kmdb also installs its own trap table on x86, though the trap table there is called an IDT. Not having ever had an OBP upon which to become dependent, Solaris x86 in-situ debuggers have always handled their own traps and debugger entry.

When kmdb gains control of the machine, it switches to its trap table. When the world resumes, the trap table used prior to debugger entry is restored. While kmdb is running, traps that are immediately resolvable by the handler (MMU misses to valid addresses, for example) are handled, and control is returned to the execution stream that caused the trap. Traps that are not resolvable by the handler cause a debugger re-entry. In some cases, such as when an access is being made to the kernel's address space, the debugger takes precautions against traps resulting from those accesses. Re-entry caused by such a trap would cause control to be transferred back to the code which initiated the access, with a return code set indicating that an error occurred. Unexpected traps are signs that something has gone wrong, and are grounds for entry into a debugger fault state. The stack trace leading up to the access will be displayed, and the user will be offered the option to induce a crash dump.
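
The distinction between an anticipated trap and an unexpected one can be sketched in portable C, with setjmp()/longjmp() standing in for the debugger's re-entry path. Everything here is hypothetical and simulated: sk_trap() is an ordinary function call rather than a hardware trap, and a small array plays the part of the kernel's address space.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

/* Non-NULL only while a protected access is in progress. */
static jmp_buf *sk_fault_recovery;
static int sk_unexpected_traps;

/* Simulated kernel address space: offsets 0..15 are valid. */
static unsigned char sk_kas[16];

/* Stand-in for the path taken by a trap the handler can't resolve. */
static void
sk_trap(void)
{
	if (sk_fault_recovery != NULL)
		longjmp(*sk_fault_recovery, 1);	/* anticipated: unwind */
	sk_unexpected_traps++;	/* unexpected: would enter the fault state */
}

/* Raw access: "traps" on an invalid address. */
static void
sk_raw_read(void *dst, size_t addr, size_t len)
{
	if (len > sizeof (sk_kas) || addr > sizeof (sk_kas) - len) {
		sk_trap();
		return;
	}
	memcpy(dst, sk_kas + addr, len);
}

/* Protected access: a trap becomes an error return to the initiator. */
static int
sk_protected_read(void *dst, size_t addr, size_t len)
{
	jmp_buf env;

	if (setjmp(env) != 0) {
		sk_fault_recovery = NULL;
		return (-1);
	}
	sk_fault_recovery = &env;
	sk_raw_read(dst, addr, len);
	sk_fault_recovery = NULL;
	return (0);
}
```

A read of a bad address through sk_protected_read() comes back as -1, while the same read made without protection is counted as an unexpected trap, the case in which the real debugger would display a stack trace and offer a crash dump.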