Guidelines for writing ksh-93 built-in commands
David G. Korn
1. INTRODUCTION
A built-in command is executed without creating a separate
process. Instead, the command is invoked as a C function by
ksh. If this function has no side effects in the shell
process, then the behavior of this built-in is identical to
that of the equivalent stand-alone command. The primary
difference in this case is performance. The overhead of
process creation is eliminated. For commands of short
duration, the effect can be dramatic. For example, on SunOS
4.1, running wc on a small file of about 1000 bytes is about
50 times faster as a built-in command.
In addition, built-in commands that have side effects on the
shell environment can be written. This is usually done to
extend the application domain for shell programming. For
example, an X-windows extension that makes heavy use of the
shell variable namespace was implemented as a group of
built-in commands that are added at run time. The result is a
windowing shell that can be used to write X-windows
applications.
While there are definite advantages to adding built-in
commands, there are some disadvantages as well. Since the
built-in command and ksh share the same address space, a
coding error in the built-in program may affect the behavior
of ksh, perhaps causing it to dump core or hang. Debugging
is also more complex since your code is now a part of a
larger entity. A built-in also gives up the isolation of a
separate process, which guarantees that all resources used
by the command will be freed when the command completes.
Also, since the address
space of ksh will be larger, this may increase the time it
takes ksh to fork() and exec() a non-builtin command. It
makes no sense to add a built-in command that takes a long
time to run or that is run only once, since the performance
benefits will be negligible. Built-ins that have side
effects in the current shell environment have the
disadvantage of increasing the coupling between the built-in
and ksh, making the overall system less modular and more
monolithic.
Despite these drawbacks, in many cases extending ksh by
adding built-in commands makes sense and allows reuse of the
shell scripting ability in an application-specific domain.
This memo describes how to write ksh extensions.
2. WRITING BUILT-IN COMMANDS
There is a development kit available for writing ksh
built-ins. The development kit has three directories:
include, lib, and bin. The include directory contains a
subdirectory named ast that contains interface prototypes
for functions that you can call from built-ins. The lib
directory contains the ast library (see footnote 1) and a
library named libcmd that contains versions of several of
the standard POSIX[1] utilities that can be made run-time
built-ins. It
is best to set the value of the environment variable
PACKAGE_ast to the pathname of the directory containing the
development kit. Users of nmake[2] 2.3 and above will then
be able to use the rule
:PACKAGE: ast
in their makefiles and not have to specify any -I switches
to the compiler.
A built-in command has a calling convention similar to the
main function of a program,
int main(int argc, char *argv[]).
However, instead of main, you must use the function name
b_name, where name is the name of the built-in you wish to
define. The built-in function takes a third void* argument
which you can define as NULL. Instead of exit, you need to
use return to terminate your command. The return value
will become the exit status of the command.
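As a minimal sketch of this calling convention, the fragment
below defines a hypothetical built-in named arglen. The name
and its behavior are invented for illustration and are not
part of ksh. It uses only the standard C library, ignores the
third argument, and reports failure with return rather than
exit, since calling exit would terminate the shell process
itself. The STANDALONE macro is also an assumption of this
sketch: defining it at compile time adds a main that forwards
to b_arglen, so the same source can be run as an ordinary
program.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical built-in "arglen": prints the length of its one argument. */
int b_arglen(int argc, char *argv[], void *context)
{
    (void)context;      /* the third argument is unused in this sketch */
    if(argc != 2)
    {
        fprintf(stderr,"Usage: arglen string\n");
        return(2);      /* return, not exit(): the code runs inside ksh */
    }
    printf("%lu\n",(unsigned long)strlen(argv[1]));
    return(0);          /* becomes the exit status of the command */
}

#ifdef STANDALONE
/* Optional wrapper so the same file builds as a stand-alone program. */
int main(int argc, char *argv[])
{
    return(b_arglen(argc,argv,NULL));
}
#endif
```

Keeping the wrapper behind a macro means the shared object
loaded by ksh contains no main, while a debugging build can
still be run and traced as a separate process.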
The steps necessary to create and add a run time built-in
are illustrated in the following simple example. Suppose
you wish to add a built-in command named hello which
requires one argument and prints the word hello followed by
its argument. First, write the following program in the
file hello.c:
#include <stdio.h>

/* Entry point for the built-in named hello */
int b_hello(int argc, char *argv[], void *context)
{
    if(argc != 2)
    {
        fprintf(stderr,"Usage: hello arg\n");
        return(2);
    }
    printf("hello %s\n",argv[1]);
    return(0);
}
Next, the program needs to be compiled. On some systems it
is necessary to specify a compiler option to produce
position independent code for dynamic linking. If you do
not compile with nmake, it is important to specify a
special include directory when compiling built-ins,
cc -pic -I$PACKAGE_ast/include -c hello.c
since the special version of <stdio.h> in the development
kit is required. This command generates hello.o in the
current directory.
On some systems, you cannot load hello.o directly; you must
build a shared library instead. Unfortunately, the method
for generating a shared library differs with operating
system. However, if you are building with the AT&T nmake
program you can use the :LIBRARY: rule to specify this in a
system independent fashion. In addition, if you have
several built-ins, it is desirable to build a shared library
that contains them all.
____________________
1. ast stands for Advanced Software Technology.
The final step is using the built-in. This can be done with
the ksh command builtin. To load the shared library
hello.so and to add the built-in hello, invoke the command,
builtin -f hello hello
The suffix for the shared library can be omitted, in which
case the shell will add an appropriate suffix for the system
that it is loading from. Once this command has been
invoked, you can invoke hello as you do any other command.
It is often desirable to make a command built-in the first
time that it is referenced. The first time hello is
invoked, ksh should load and execute it, whereas for
subsequent invocations ksh should just execute the built-in.
This can be done by creating a file named hello with the
following contents:
function hello
{
    unset -f hello
    builtin -f hello hello
    hello "$@"
}
This file hello needs to be placed in a directory that is in
your FPATH variable. In addition, the full pathname for
hello.so should be used in this script so that the run time
loader will be able to find this shared library no matter
where the command hello is invoked.
3. CODING REQUIREMENTS AND CONVENTIONS
As mentioned above, the entry point for built-ins must be of
the form b_name. Your built-ins can call functions from the
standard C library, the ast library, interface functions
provided by ksh, and your own functions. You should avoid
using any global symbols beginning with sh_, nv_, and ed_
since these are used by ksh itself. In addition, #define
constants in ksh interface files use symbols beginning with
SH_, so you should avoid using names beginning with SH_.
3.1 Header Files
The development kit provides a portable interface to the C
library and to libast. The header files in the development
kit are compatible with K&R C[3], ANSI-C[4], and C++[5].
The best thing to do is to include the header file
<shell.h>. This header file causes the <ast.h> header, the
<error.h> header and the <stak.h> header to be included as
well as defining prototypes for functions that you can call
to get shell services for your builtins. The header file
<ast.h> provides prototypes for many libast functions and
all the symbol and function definitions from the ANSI-C
headers, <stddef.h>, <stdlib.h>, <stdarg.h>, <limits.h>, and
<string.h>. It also provides all the symbols and
definitions for the POSIX[6] headers <sys/types.h>,
<fcntl.h>, and <unistd.h>. You should include <ast.h>
instead of one or more of these headers. The <error.h>
header provides the interface to the error and option
parsing routines defined below. The <stak.h> header
provides the interface to the memory allocation routines
described below.
Programs that want to use the information in <sys/stat.h>
should include the file <ls.h> instead. This provides the
complete POSIX interface to stat() related functions even on
non-POSIX systems.
3.2 Input/Output
ksh uses sfio, the Safe/Fast I/O library[7], to perform all
I/O operations. The sfio library, which is part of libast,
provides a superset of the functionality provided by the
standard I/O library defined in ANSI-C. If none of the
additional functionality is required, and if you are not
familiar with sfio and you do not want to spend the time
learning it, then you can use sfio via the stdio library
interface. The development kit contains the header
<stdio.h> which maps stdio calls to sfio calls. In most
instances the mapping is done by macros or inline functions
so that there is no overhead. The man page for the sfio
library is in an Appendix.
However, there are some very nice extensions and performance
improvements in sfio and if you plan any major extensions I
recommend that you use it natively.
3.3 Error Handling
For error messages it is best to use the ast library
function errormsg() rather than sending output to stderr or
the equivalent sfstderr directly. Using errormsg() will
make error messages appear more uniform to the user.
Furthermore, using errormsg() should make it easier to do
error message translation for other locales in future
versions of ksh.
The first argument to errormsg() specifies the dictionary in
which the string will be searched for translation. The
second argument to errormsg() contains the error type and
value. The third argument is a printf style format and the
remaining arguments are arguments to be printed as part of
the message. A new-line is inserted at the end of each
message and therefore, should not appear as part of the
format string. The second argument should be one of the
following:
ERROR_exit(n): If n is non-zero, the built-in will exit with
    value n after printing the message.
ERROR_system(n): Exit the built-in with exit value n after
    printing the message. The message will display the
    message corresponding to errno enclosed within [ ] at
    the end of the message.
ERROR_usage(n): Will generate a usage message and exit. If
    n is non-zero, the exit value will be 2. Otherwise the
    exit value will be 0.
ERROR_debug(n): Will print a level n debugging message and
    will then continue.
ERROR_warn(n): Prints a warning message. n is ignored.
3.4 Option Parsing
The first thing that a built-in should do is to check the
arguments for correctness and to print any usage messages on
standard error. For consistency with the rest of ksh, it is
best to use the libast functions optget() and optusage() for
this purpose. The header <error.h> includes prototypes for
these functions. The optget() function is similar to the
System V C library function getopt(), but provides some
additional capabilities. Built-ins that use optget()
provide a more consistent user interface.
The optget() function is invoked as
int optget(char *argv[], const char *optstring)
where argv is the argument list and optstring is a string
that specifies the allowable arguments and additional
information that is used to format usage messages. In fact
a complete man page in troff or html can be generated by
passing a usage string as described by the getopts command.
Like getopt(), single letter options are represented by the
letter itself, and options that take a string argument are
followed by the : character. Option strings have the
following special characters:
:     Used after a letter option to indicate that the option
      takes an option argument. The variable opt_info.arg
      will point to this value after the given argument is
      encountered.
#     Used after a letter option to indicate that the option
      can only take a numerical value. The variable
      opt_info.num will contain this value after the given
      argument is encountered.
?     Used after a : or # (and after the optional ?) to
      indicate that the preceding option argument is not
      required.
[...] After a : or #, the characters contained inside the
      brackets are used to identify the option argument when
      generating a usage message.
space The remainder of the string will only be used when
      generating usage messages.
The optget() function returns the matching option letter if
one of the legal options is matched. Otherwise, optget()
returns
':'   If there is an error. In this case the variable
      opt_info.arg contains the error string.
0     Indicates the end of options. The variable
      opt_info.index contains the number of arguments
      processed.
'?'   A usage message has been requested. You normally call
      optusage() to generate and display the usage message.
The following is an example of the option parsing portion of
the wc utility.
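#include <shell.h>
/* fragment: n and file are declared by the enclosing function */
/* loop until optget() returns 0, which marks the end of options */
while((n = optget(argv,"xf:[file]"))) switch(n)
{
    case 'f':
        /* -f takes an argument, available in opt_info.arg */
        file = opt_info.arg;
        break;
    case ':':
        /* option error: opt_info.arg holds the error string */
        error(ERROR_exit(0), "%s", opt_info.arg);
        break;
    case '?':
        /* usage message requested */
        error(ERROR_usage(2), "%s", opt_info.arg);
        break;
}
/* skip the arguments that optget() has processed */
argv += opt_info.index;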
3.5 Storage Management
It is important that any memory used by your built-in be
returned. Otherwise, if your built-in is called frequently,
ksh will eventually run out of memory. You should avoid
using malloc() for memory that must be freed before
returning from your built-in, because by default, ksh will
terminate your built-in in the event of an interrupt and the
memory will not be freed.
The best way to allocate variable sized storage is
through calls to the stak library which is included in
libast and which is used extensively by ksh itself. Objects
allocated with the stakalloc() function are freed when your
function completes or aborts. The stak library provides a
convenient way to build variable length strings and other
objects dynamically. The man page for the stak library is
contained in the Appendix.
Before ksh calls each built-in command, it saves the current
stack location and restores it after it returns. It is not
necessary to save and restore the stack location in the b_
entry function, but you may want to write functions that use
this stack and restore it when leaving the function. The
following coding convention will do this in an efficient
manner:
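yourfunction()
{
    char *savebase;
    int saveoffset;
    /* a non-zero offset means an object is partially built */
    if((saveoffset = staktell()))
        savebase = stakfreeze(0);
    ...
    /* restore the stack to the state it had on entry */
    if(saveoffset)
        stakset(savebase,saveoffset);
    else
        stakseek(0);
}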
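The save and restore discipline described above can be
pictured with a toy stack allocator. The sketch below is not
the stak library and does not reproduce its interface; the
names arena_alloc, arena_mark, arena_release, and
joined_length are hypothetical. It only shows why recording
the stack position on entry and restoring it on exit returns
every temporary allocation at once.

```c
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 4096

/* Toy stack storage; the union forces a suitable alignment. */
static union { unsigned char bytes[ARENA_SIZE]; long double align; } arena;
static size_t arena_top;    /* offset of the first free byte */

/* Record the current stack position. */
size_t arena_mark(void) { return arena_top; }

/* Discard everything allocated since the matching arena_mark(). */
void arena_release(size_t mark) { arena_top = mark; }

/* Allocate n bytes from the top of the stack, or NULL if it is full. */
void *arena_alloc(size_t n)
{
    void *p;
    if(n > ARENA_SIZE - arena_top)
        return NULL;
    n = (n + 15) & ~(size_t)15;     /* keep later blocks aligned */
    p = arena.bytes + arena_top;
    arena_top += n;
    return p;
}

/* Uses temporary storage and leaves the stack as it found it. */
size_t joined_length(const char *a, const char *b)
{
    size_t mark = arena_mark();     /* save the position on entry */
    size_t len = 0;
    char *tmp = arena_alloc(strlen(a) + strlen(b) + 1);
    if(tmp)
    {
        strcpy(tmp,a);
        strcat(tmp,b);
        len = strlen(tmp);
    }
    arena_release(mark);            /* restore the position on exit */
    return len;
}
```

Because release is a single assignment, the caller never has
to track individual allocations, which is the property that
makes stack allocation attractive for a function that may be
interrupted.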
4. CALLING ksh SERVICES
Some of the more interesting applications are those that
extend the functionality of ksh in application specific
directions. A prime example of this is the X-windows
extension which adds builtins to create and delete widgets.
The nval library is used to interface with the shell name
space. The shell library is used to access other shell
services.
4.1 The nval library
A great deal of power is derived from the ability to use
portions of the hierarchical variable namespace provided by
ksh-93 and turn these names into active objects.
The nval library is used to interface with shell variables.
A man page for this library is provided in an Appendix. You
need to include the header <nval.h> to access the functions
defined in the nval library. All the functions provided by
the nval library begin with the prefix nv_. Each shell
variable is an object in an associative table that is
referenced by name. The type Namval_t* is a pointer to a
shell variable. To operate on a shell variable, you first
get a handle to the variable with the nv_open() function and
then supply the handle returned as the first argument of the
function that provides an operation on the variable. You
must call nv_close() when you are finished using this handle
so that the space can be freed once the value is unset. The
two most frequent operations are to get the value of the
variable, and to assign a value to the variable. The
nv_getval() function returns a pointer to the value of the
variable.
In some cases the pointer returned is to a region that will
be overwritten by the next nv_getval() call so that if the
value isn't used immediately, it should be copied. Many
variables can also generate a numeric value. The
nv_getnum() function returns a numeric value for the given
variable pointer, calling the arithmetic evaluator if
necessary.
The nv_putval() function is used to assign a new value to a
given variable. The second argument to nv_putval() is the
value to be assigned and the third argument is a flag which
is used in interpreting the second argument.
Each shell variable can have one or more attributes. The
nv_isattr() function is used to test for the existence of
one or more attributes. See the appendix for a complete list
of attributes.
By default, each shell variable passively stores the string
you give with nv_putval(), and returns the value with
nv_getval(). However, it is possible to turn any node into
an active entity by assigning functions to it that will be
called whenever nv_putval() and/or nv_getval() is called.
In fact there are up to five functions that can be
associated with each variable to override the default
actions. The type Namfun_t is used to define these
functions. Only those that are non-NULL override the default
actions. To override the default actions, you must allocate
an instance of Namfun_t, and then assign the functions that
you wish to override. The putval() function is called by the
nv_putval() function. A NULL for the value argument
indicates a request to unset the variable. The type
argument might contain the NV_INTEGER bit so you should be
prepared to do a conversion if necessary. The getval()
function is called by nv_getval() and must return a
string. The getnum() function is called by the
arithmetic evaluator and must return double. If getnum() is
omitted, the evaluator will call nv_getval() and convert the
result to a number.
The functionality of a variable can further be increased by
adding discipline functions that can be associated with the
variable. A discipline function allows a script that uses
your variable to define functions whose name is
varname.discname where varname is the name of the variable,
and discname is the name of the discipline. When the user
defines such a function, the settrap() function will be
called with the name of the discipline and a pointer to the
parse tree corresponding to the discipline function. The
application determines when these functions are actually
executed. By default, ksh defines get, set, and unset as
discipline functions.
In addition, it is possible to provide a data area that will
be passed as an argument to each of these functions whenever
any of these functions are called. To have private data,
you need to define and allocate a structure that looks like
struct yours
{
    Namfun_t fun;
    your_data_fields;
};
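One way to read this layout: a pointer to a structure and a
pointer to its first member refer to the same address, so a
function that receives the pointer to the first member can
cast it back to reach the private fields. The sketch below
demonstrates the idiom with a simplified stand-in type.
Fun_t, struct counter, and counter_getval are hypothetical
and do not reproduce the real Namfun_t layout or the real
function signatures of the nval library.

```c
#include <stdio.h>

/* Simplified stand-in for Namfun_t: one overridable function. */
typedef struct Fun
{
    const char *(*getval)(struct Fun *);
} Fun_t;

/* Private data: the Fun_t member must come first. */
struct counter
{
    Fun_t fun;
    int hits;       /* private field */
    char buf[32];   /* private field */
};

/* Receives the Fun_t pointer and casts it back to the enclosing struct. */
const char *counter_getval(Fun_t *fp)
{
    struct counter *cp = (struct counter *)fp;
    cp->hits++;
    snprintf(cp->buf,sizeof(cp->buf),"%d",cp->hits);
    return cp->buf;
}
```

Here each call through the function pointer updates the
private counter and returns it as a string, which is the kind
of behavior an active variable could expose.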
4.2 The shell library
There are several functions that are used by ksh itself that
can also be called from built-in commands. The man pages for
these routines are in the Appendix.
The sh_addbuiltin() function can be used to add or delete
builtin commands. It takes the name of the built-in, the
address of the function that implements the built-in, and a
void* pointer that will be passed to this function as the
third argument whenever it is invoked. If the function
address is NULL, the specified built-in will be deleted.
However, special built-in functions cannot be deleted or
modified.
The sh_fmtq() function takes a string and returns a string
that is quoted as necessary so that it can be used as shell
input. This function is used to implement the %q option of
the shell built-in printf command.
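To make "quoted as necessary" concrete, the sketch below
shows one common quoting scheme in plain C. It is not
sh_fmtq() and makes no claim about the rules that function
applies; quote_for_shell, is_safe, and the set of characters
treated as safe are assumptions of this sketch. A string made
only of safe characters is copied unchanged, and anything
else is wrapped in single quotes with each embedded single
quote written as '\''.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Characters this simplified sketch passes through unquoted. */
static int is_safe(int c)
{
    return isalnum((unsigned char)c) || c == '_' || c == '/' ||
           c == '.' || c == '-';
}

/*
 * Hypothetical helper: write a quoted copy of s into out.
 * Returns 0 on success, or -1 if out is too small.
 */
int quote_for_shell(const char *s, char *out, size_t outsize)
{
    size_t need = 2, i = 0;     /* opening and closing quote */
    const char *p;
    int safe = (*s != '\0');    /* the empty string must be quoted */
    for(p = s; *p; p++)
    {
        if(!is_safe(*p))
            safe = 0;
        need += (*p == '\'') ? 4 : 1;
    }
    if(safe)
    {
        if(strlen(s) + 1 > outsize)
            return -1;
        strcpy(out,s);
        return 0;
    }
    if(need + 1 > outsize)
        return -1;
    out[i++] = '\'';
    for(p = s; *p; p++)
    {
        if(*p == '\'')
        {
            memcpy(out + i,"'\\''",4);  /* close, escaped quote, reopen */
            i += 4;
        }
        else
            out[i++] = *p;
    }
    out[i++] = '\'';
    out[i] = '\0';
    return 0;
}
```

With this scheme the input two words becomes 'two words', and
the input it's becomes 'it'\''s'.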
The sh_parse() function returns a parse tree corresponding
to a given file stream. The tree can be executed by
supplying it as the first argument to the sh_trap() function
and giving a value of 1 as the second argument.
Alternatively, the sh_trap() function can parse and execute
a string by passing the string as the first argument and
giving 0 as the second argument.
The sh_isoption() function can be used to see whether
one or more of the option settings is enabled.
REFERENCES
1. POSIX - Part 2: Shell and Utilities, IEEE Std
1003.2-1992, ISO/IEC 9945-2:1993.
2. Glenn Fowler, Nmake reference needed
3. Brian W. Kernighan and Dennis M. Ritchie, The C
Programming Language, Prentice Hall, 1978.
4. American National Standard for Information Systems -
Programming Language - C, ANSI X3.159-1989.
5. Bjarne Stroustrup, C++, Addison Wesley, xxxx
6. POSIX - Part 1: System Application Program Interface,
IEEE Std 1003.1-1990, ISO/IEC 9945-1:1990.
7. David Korn and Kiem-Phong Vo, SFIO - A Safe/Fast
Input/Output library, Proceedings of the Summer Usenix,
pp. , 1991.