Overview of the OpenSolaris
Korn Shell 93 Integration Project
[Draft Version 1.0 as of 2007-02-27, the final version will be available at the conference]
Roland Mainz
Bahnhofstraße 5
35390 Gießen
Germany
roland.mainz@nrubsig.org
Abstract
The aim of this talk is to describe the Korn Shell 93 [ksh93] Integration Project,
outlining the origins and goals of the project.
We describe the "new" Korn Shell [ksh93] features, and its
improvements/advantages over the "old" Korn shell [ksh88]: performance,
usability, administration, localisation, internationalisation,
mathematical functions, networking and builtin commands, and its
components, including libshell - the ksh93 core library.
We present our project's status and progress, focusing on architectural
difficulties and problems we encountered and solved.
We further describe the Solaris specific changes to ksh93, and outline
future directions for this project: short-term, mid-term and long-term goals,
and migration/update status of /usr/bin/ksh [ksh88] to "ksh93" within Solaris
and OpenSolaris-based distributions.
We also outline the future utilisation of ksh93 and libshell within the
Solaris Operating Environment's core components, as well as various
components-related changes and enhancements planned for ksh93.
Finally we provide a description of what could/should be done in a
(revised) future POSIX shell standard.
Motivation
There are several motivations behind this project - partially seeking to fix
longstanding problems and partially looking towards the future of Solaris:
- Customers have requested ksh93 integration and ksh93 features in Solaris for
many years (RFEs supporting this include 4113420, 6332421, 1215363, 4201349,
4448701, 4827484, 4877415, 5034853, etc.).
- Lower the maintenance burden on Sun engineers by delivering an
almost unmodified version of ksh93 (which is actively maintained by its
upstream authors, i.e. David Korn and Glenn Fowler, and the AST community)
instead of the current /usr/bin/ksh codebase, which is
closed-source, highly Solaris-specific, has diverged considerably from the original
ksh88 version (which causes interoperability problems) and is no longer
supported/maintained by upstream.
- Introduction of ksh93 to improve interoperability between Solaris
and other operating systems which ship ksh93 as either
/usr/bin/ksh or /usr/bin/ksh93.
- Improve usability:
- Improve performance:
ksh93 provides significantly better performance for executing scripts thanks to (at least) the following two points:
- builtin commands
ksh93 provides builtin POSIX implementations of several small
and often-used commands (like basename, cat, cmp, cut, dirname,
echo (print provides a superset), join, mkdir, paste, rmdir, sleep,
sync, printf, uniq, wc, etc.) which
do not require the overhead of launching an external command
(avoiding heavyweight operations such as fork(),
setlocale(), etc.), resulting
in significantly faster execution of scripts compared to the
Bourne shell, the Bourne-Again Shell (bash3) or the old Korn shell.
(Note: The interface used to implement the builtin commands is exposed as a public API
and can be used to implement 3rd-party add-ons.)
- avoid fork() if possible
ksh93 avoids fork()'ing a child process whenever possible, unlike other shells.
For example, subshell constructs (like $( command ) or
( command ; command ; )) no longer trigger the creation of a new
process; instead the current "context" is saved and restored when the
subshell is exited. This saves a significant amount of CPU time which is
normally spent on child process creation and destruction and on I/O between
child and parent.
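The observable contract behind the fork() avoidance described above can be sketched as follows. This is only an illustration under our own assumptions, not a measurement: the variable names are hypothetical, and whether a process is actually created depends on the shell which runs the lines. What every shell has to guarantee, and what ksh93 achieves by saving and restoring its context, is that changes made inside a subshell do not leak into the parent.

```shell
# Hypothetical variable names; runs in any POSIX-style shell.
counter=1
workdir=$PWD

# Changes made inside $( ... ) must stay inside, whether or not the
# shell forks a child process to run the subshell.
inside=$( counter=2 ; cd / ; echo "${counter}:${PWD}" )

echo "inside:  ${inside}"
echo "outside: ${counter}:${PWD}"
```

Both the variable assignment and the directory change are undone when the subshell exits, which is exactly the state ksh93 has to save and restore when it skips the fork().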
- Provide better administration tools for shell accounts:
/etc/ksh.kshrc
Defines a session configuration script for all interactive
shells which is read before the per-user ~/.kshrc script
(/etc/profile and ~/.profile are only for
login shells and do not apply to all interactive shells).
This greatly simplifies site- and machine-wide administration since there
is now a way to provide machine-wide defaults (ksh88 only has a
per-user configuration file (~/.kshrc), which caused lots
of headaches for users and admins in the past).
(Note: This is only a tiny piece of a larger feature which will come
later with /etc/env.d/; see below).
- More features:
- Associative arrays (where the subscript is a string) and arrays of
(almost) unlimited size (we even ship a 64bit version of ksh93 to
avoid the limits of the 32bit address space (e.g. ~3.5GB)).
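A minimal sketch of an associative array follows. The array name and its content are hypothetical and only chosen for illustration (the paths are the ones mentioned in this paper); the syntax shown is also accepted by bash version 4 and later.

```shell
# Hypothetical example data: map a shell name to its binary.
typeset -A shell_path
shell_path[ksh88]=/usr/bin/ksh
shell_path[ksh93]=/usr/bin/ksh93
shell_path[posix]=/usr/xpg4/bin/sh

# ${!array[@]} expands to the subscripts (strings, in no guaranteed
# order); ${#array[@]} is the number of elements.
for name in "${!shell_path[@]}" ; do
    echo "${name} -> ${shell_path[$name]}"
done
echo "entries: ${#shell_path[@]}"
```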
- Localisation:
ksh93 has builtin support for localisation (="l10n"),
providing a superset of the external /usr/bin/gettext command
and simplifying the implementation of localised scripts
(/usr/bin/gettext still works but the new method is
faster, far less resource hungry and much easier to use).
Example:
$ export LC_ALL=C
$ echo $"hello world"
hello world
$ export LC_ALL=de_DE.UTF-8
$ echo $"hello world"
Hallo Welt
$ export LC_ALL=fr_FR.UTF-8
$ echo $"hello world"
bonjour, monde
- Builtin floating-point math
(i.e. a superset of /usr/bin/bc + /usr/bin/dc):
Example (calculate π using the Wallis product):
$ float r=1.0 ; \
for (( i=1 ; i < 1000000 ; i++ )) ; do \
(( r=( r * ((4.0 * (i * i)) / (4. * i * i -1.0)) ) )) ; \
done ; \
print $(( r * 2.0 ))
3.14159186819
- Networking:
Example (fetch file via HTTP):
# open TCP channel to the web server specified in ${host} and ${port}
exec 3<>"/dev/tcp/${host}/${port}"
# send HTTP "GET" request to retrieve document ${path}
# (HTTP header lines are terminated by CR LF)
request="GET /${path} HTTP/1.0\r\n"
request+="Host: ${host}\r\n"
request+="User-Agent: osoldevconf2007bot (2007-01-26; $(uname -s -r -p))\r\n"
# the empty line ends the request header; "print" expands the \r and \n escapes
print -n -- "${request}\r\n" >&3
# collect response and send it to stdout
cat <&3
- Make ksh93 and friends available as shared libraries:
ksh93 is based on several libraries including libshell.so
(shell core) and libast.so (platform abstraction and utility library)
which could be useful for implementing new applications or enhancing older ones.
Currently our "future consumers of libshell.so"-list contains:
Realisation/Challenges/Lessons Learned
The main focus is on interoperability, portability of ksh93 scripts between
platforms, compatibility with the "upstream" version of ksh93, usability
and (code) quality - which includes running the AST/ksh test suite and making
sure that the initial code which is put back into OS/Net (= core components
of Solaris) passes this test suite without any errors or warnings
(in the "C" locale for now).
The project progress can be divided into three major parts.
Part one was the development of the initial prototype integration code, which can only be
described as a "failure" (or learning experience) since the AST/ksh build
system and code have some unique properties (and do not work as we had
expected). Part two was the development of the 2nd prototype series, and part three
covers the ARC cases and the adjustments following the ARC cases and review.
The lessons learned during the first prototype include:
- We cannot use AST's iffe ("IF Feature Exists",
a feature-probing system similar to autoconf) in the
OS/Net build: the rules do not allow it, and iffe's use
of timing loops and probing is not compatible with the highly-parallel OS/Net build
(e.g. iffe probes may fail under sufficiently high CPU load).
- We cannot write our own versions of the generated includes by hand - it is
far too much work, the upstream codebase is constantly changing
and the resulting testing requirements are far too great for an Open Source
project run by volunteers who do their work in their spare time.
- Generated 32bit and 64bit sources and include files differ slightly
in their content, and shipping two separate sets of includes is not
desired: for portability we want the same include file paths
as on other platforms. This rules out special Solaris
subdirectories for 32-bit and 64-bit headers, which would cause problems
for cross-compilations and similar scenarios. Solaris can build
both 32-bit and 64-bit binaries regardless of whether the OS is running
in 32-bit mode or 64-bit mode, which also eliminates solutions like making
/usr/include/ast a symlink to the 32-bit or 64-bit headers.
- The AST builtin commands in ksh93 implement POSIX behaviour but some Solaris
tools/utilities in /usr/bin/ predate this standard.
Finally it was decided to enable, for now, only those builtins which are
100% compatible with the versions found in /usr/bin/ (or which can be made
compatible). This is currently the only point where our version of ksh93
differs from the original upstream sources. We will revisit
the issue later (maybe via binding the builtins to /usr/xpg[46]/bin/,
the location where Solaris stores its POSIX binaries if their functionality
differs from the traditional Unix behaviour).
Thanks to the lessons learned during the development of the first
prototype we started over with a different design, focusing on building
the autogenerated includes outside the OS/Net tree, importing
them for each source update, and developing automated solutions to
simplify this process, including "buildksh93.ksh".
This script has several tasks, including:
- Build the upstream sources on SPARC and Intel/AMD64 x86 as 32bit and 64bit targets.
- Provide a stable build environment, independent of the current user's environment (env, locale, shell etc.).
- Modify the AST/ksh build defaults and force C99 and XPG6 modes to ensure that all matching standard APIs are found.
- Provide an easy way to run the AST/ksh test suite against the generated binaries.
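The "stable build environment" task can be sketched like this. Note that this is not taken from buildksh93.ksh; the helper name, the variable MY_LOCAL_SETTING and the chosen values for PATH and HOME are assumptions made for illustration. The idea is simply to start every build step with an empty environment plus a small, fixed set of variables.

```shell
# Hypothetical helper (not from buildksh93.ksh): run a command with an
# empty environment plus a small, fixed set of variables.
run_in_clean_env() {
    env -i \
        PATH=/usr/bin:/bin \
        LC_ALL=C \
        HOME=/ \
        "$@"
}

# A variable exported by the calling user is not visible to the command,
# and the locale is forced to "C" regardless of the user's settings.
export MY_LOCAL_SETTING="should not leak"
leaked=$(run_in_clean_env /bin/sh -c 'echo "${MY_LOCAL_SETTING:-unset}"')
locale_seen=$(run_in_clean_env /bin/sh -c 'echo "${LC_ALL}"')
echo "MY_LOCAL_SETTING in clean environment: ${leaked}"
echo "LC_ALL in clean environment: ${locale_seen}"
```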
The new prototype roughly follows this process:
- Build the "upstream" AST/ksh sources using the buildksh93.ksh script.
- Test the resulting "ksh" binary against the test suite (as a safeguard to catch any problems early in the native build environment).
- Move the new sources into OS/Net:
- Create a diff between the old and new AST/ksh sources, adjust the paths in the resulting patch to match
the source locations in OS/Net and patch the tree with this diff.
- Copy over the generated AST includes to their matching locations in the OS/Net tree.
- Manually adjust the OS/Net Makefiles for any differences.
- Build OS/Net tree.
- Test the resulting ksh binary using the AST/ksh test suite and compare the results against the test suite results from the native AST/ksh build.
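The last step of this process, comparing the two test suite runs, could be automated roughly as follows. The helper name, the file names and the sample log lines are all made up for this sketch; the real AST/ksh test suite output looks different.

```shell
# Hypothetical helper: compare the test suite log of the native AST/ksh
# build with the log of the OS/Net build.
compare_test_logs() {
    status=0
    diff "$1" "$2" > /dev/null 2>&1 || status=$?
    case ${status} in
        0) echo "identical" ;;
        1) echo "different" ;;
        *) echo "error" ;;
    esac
    return "${status}"
}

# Made-up sample logs, only used to exercise the helper.
tmpdir=$(mktemp -d)
printf 'test arith passed\ntest builtins passed\n' > "${tmpdir}/native.log"
printf 'test arith passed\ntest builtins passed\n' > "${tmpdir}/osnet_ok.log"
printf 'test arith passed\ntest builtins FAILED\n' > "${tmpdir}/osnet_bad.log"

same=$(compare_test_logs "${tmpdir}/native.log" "${tmpdir}/osnet_ok.log") || true
changed=$(compare_test_logs "${tmpdir}/native.log" "${tmpdir}/osnet_bad.log") || true
echo "matching build:  ${same}"
echo "deviating build: ${changed}"
rm -rf "${tmpdir}"
```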
Another major issue was implementing 64bit support. Prior to our project the AST/ksh
codebase had 64bit support on various platforms like Dec/Alpha but Solaris/64bit was not
supported. We added Solaris/64bit support for both SPARC and AMD64 to the
upstream sources but faced the problem that the generated sources
and includes were not 100% identical to those created in a Solaris/32bit
build.
Our solution is to build the 32bit and 64bit
includes/sources separately as described above and then use
/usr/bin/diff -D<symbol> to merge each pair into one source file
which serves both 32bit and 64bit builds.
Example (for SPARC):
...
#ifdef __sparcv9
...
/* 64bit include content */
...
#else /* __sparcv9 */
...
/* 32bit include content */
...
#endif /* __sparcv9 */
...
This solution allows us to keep the original design while supporting multiple targets (32bit, 64bit) for one set of includes.
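The merge step can be tried out with two tiny files. The header fragments and the macro names in them are hypothetical stand-ins for the generated AST includes, and the exact layout of the preprocessor directives in the output depends on the diff implementation in use (some emit #ifndef/#else instead of #ifdef/#else).

```shell
# Hypothetical header fragments standing in for generated AST includes.
tmpdir=$(mktemp -d)
printf '#define _ptr_bits 32\n#define _common 1\n' > "${tmpdir}/hdr32.h"
printf '#define _ptr_bits 64\n#define _common 1\n' > "${tmpdir}/hdr64.h"

# diff -D<symbol> old new: compiling the result without <symbol> is
# equivalent to compiling "old", defining <symbol> yields "new".
# diff exits with status 1 when the inputs differ, which is expected here.
diff -D__sparcv9 "${tmpdir}/hdr32.h" "${tmpdir}/hdr64.h" > "${tmpdir}/merged.h" || true

merged=$(cat "${tmpdir}/merged.h")
echo "${merged}"
rm -rf "${tmpdir}"
```

Lines which are identical in both inputs appear only once in the merged file; only the differing lines are wrapped in the conditional.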
During this longer development
phase several improvements were applied to ksh93, too - including an improved filename/variable/etc. expansion mode
(now displays a list of choices with index numbers; typing
<number><tab> then picks a
choice from that list), the addition of almost all mathematical library
functions specified by the ISO C99 standard and various minor fixes and
changes to the codebase (which includes a major cleanup/sweep of almost
all build warnings reported by Sun Studio 11).
After most of the development work on the 2nd prototype series was done we started the first ARC case
("ksh93-integration project").
We then had to file PSARC 2006/587 "/etc/ksh.kshrc for ksh93"
to handle a small mistake/leftover from the original ARC case: we had ARC'ed a new file called /etc/ksh.kshrc but forgot
to specify its content. The file sets the default editor mode to "gmacs", which was VERY important to improve the situation compared to
the old ksh, where every user had to configure the editor mode him-/herself (a challenge for beginners and a source of major
frustration for users of the old Korn shell).
Finally we filed PSARC 2007/035 "ksh93 Amendments"
to handle the remaining
issues which came up during development, mainly that...
- ... ksh93 defines its own version of "getconf" which supports extra AST and
ksh93-specific values (which is mandatory since the ksh93 test suite
depends on this version of "getconf", making it impossible to
disable this builtin command without causing portability/interoperability
issues)
- ... the introduction of the AST message catalog generation tools,
which are themselves dependent on ksh93. The problem is that the OS/Net build rules
do not allow the execution of a binary built from the normal sources as part of
the build itself (including ksh93 and the AST l10n tools), which means
that the build machine itself has to provide these tools. Otherwise the newly
built ksh93 binary may fail to work if an interface or dependency in
something like libc.so.1 (or any other kernel-userland
communication interface/protocol) was added, changed, removed etc. Such a condition is
called a "flag day" in OS/Net, i.e. these changes require an update of the interface/library/etc.
(e.g. libc.so.1) on the build machine before the new binaries
can be executed.
The solution for this problem was pretty much straightforward - we
jump more or less two years ahead in the plan and ship those tools at their designated
location now. The first putback into OS/Net will then have the l10n
message catalog generation disabled until all build machines have been
updated to include the new package containing the new tools and at this
point we'll enable the generation again.
Future work
The ksh93-integration project is a large project (for OpenSolaris.org) and
work on it will keep us busy for the next couple of years.
Some tasks which are ahead of us include:
- Finish the first putback
- Enable the generation of the l10n catalogs in OS/Net once all build machines have been updated to include ksh93
- Migrate the default shell used in OS/Net Makefiles from
/usr/bin/sh to ksh93 (maybe even in a limited way as part of the initial putback)
- DTrace support for ksh93
- Enable pfksh93
- Migrate the POSIX shell implementation
/usr/xpg4/bin/sh to ksh93 (the current code is derived from the old ksh88)
- Start the work on a future POSIX shell standard which should include some of the newer ksh93 features like
- associative arrays
- function-style functions with local variables and scoping
- floating-point math with C99 math functions
- $"..."-style localised strings
- "\u[unicodeval]" to describe a character using its hexadecimal unicode value
- "\w[widecharval]" to describe a character using its hexadecimal widechar value
(for locales like ja_JP.PCK where the character values are not based on Unicode)
- /etc/sh.shrc and ~/.shrc as startup scripts for the standard shell in interactive mode
- /etc/sh.sh_logout as counterpart to /etc/profile
- Default path for loadable functions like
/usr/libs/shell/
- bash: Enable features like /etc/bash.bashrc and /etc/bash.bash_logout
- Work on /etc/env.d/ (shell session/profile
login/startup/shutdown/logout plugin scripts; like /etc/profile.d/
on Linux except that this should cover all session types/modes)
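Of the features proposed for a future POSIX shell standard in the list above, function-style functions with local variables are easy to illustrate. The following is a minimal sketch with hypothetical names; it uses the "function name { ... }" syntax, where typeset inside the function body declares a variable which is local to that function.

```shell
# Hypothetical names, for illustration only.
total=0

function add_to_total {
    typeset increment=$1          # local: not visible to the caller
    typeset total_before=${total} # local copy of the global value
    total=$(( total_before + increment ))
}

add_to_total 5
add_to_total 7
echo "total: ${total}"
echo "increment outside the function: ${increment:-unset}"
```

The global variable total is updated by the function, while increment and total_before disappear again when the function returns.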
Conclusions
Currently the ksh93-integration prototype codebase is being reviewed
and we await the approval for integration ("RTI") within the next couple
of weeks following the review.
References/Links