OpenSolaris
Distributed Source Code Management (DSCM) Requirements
ident "@(#)d-scm-requirements.txt 1.4 06/03/22 SMI"
Summary
This document identifies and explains the requirements for a distributed source code management (SCM) solution to be used with OpenSolaris. The requirements are grouped into three sets of decreasing importance. The document also outlines a number of specific evaluations that will be used to determine whether a candidate SCM meets the various requirements.
Discussion
The requirements described below arise from a number of distinct classes: some are social, in that the requirement is believed necessary for successful use in the community; some are technical, in that the requirement is believed necessary to successfully produce software in a multi-project, multi-committer, multi-site development organization; and some are economic, in that the requirement attempts to describe attributes that would limit the costs of the ongoing use of the tool.
In an attempt to use neutral terms, we use the phrase "candidate SCM" to describe the SCM solution we are evaluating and "current SCM" to refer to the distributed SCM solution in use (inside Sun) at present. (Not all consolidations participating in OpenSolaris use a distributed SCM at present; their SCM requirements are not discussed in this document.)
The requirements are ranked by necessity, using the terminology proposed in IEEE Std 830-1998 [1].
"Essential" requirements
E0. Open source
To be considered for use by the OpenSolaris community, the candidate SCM is expected to be available under an OSI-approved license.
E1. Unbiased and disconnected distribution
Although a distributed SCM may choose to implement some form of dependency relationship between source trees (such as a "parent-child" convention), that relationship must not need to be continuously available for sensible SCM operation.
Moreover, the candidate SCM must support source code updates between two distinct repositories that share a common ancestor but have had no other contact. Sensible operation for disconnected use encompasses all SCM operations that act only on the local repository: creation, modification, or deletion of files or directories, or of the metadata describing these objects or the changes to them; creation of a private branch or child workspace; and creation of intermediate snapshots.
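The update between two repositories that share only a common ancestor can be illustrated with a toy model. The sketch below is not any candidate SCM's implementation; the Repo class, the pull function, and the changeset names are hypothetical, and a repository is reduced to a map from changeset ID to parent IDs.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy repository: maps a changeset ID to the IDs of its parents."""
    changesets: dict[str, tuple[str, ...]] = field(default_factory=dict)

    def commit(self, cset_id: str, *parents: str) -> None:
        # A purely local operation: no other repository is consulted.
        for parent in parents:
            if parent not in self.changesets:
                raise KeyError(f"unknown parent {parent!r}")
        self.changesets[cset_id] = parents


def pull(dest: Repo, src: Repo) -> list[str]:
    """Copy into dest every changeset that src has and dest lacks."""
    missing = [c for c in src.changesets if c not in dest.changesets]
    for cset_id in missing:
        dest.changesets[cset_id] = src.changesets[cset_id]
    return missing


# Two repositories start from the same ancestor and then diverge
# without any further contact.
origin = Repo()
origin.commit("base")
alice = Repo(dict(origin.changesets))
bob = Repo(dict(origin.changesets))
alice.commit("a1", "base")
bob.commit("b1", "base")

# Alice updates directly from Bob; the origin repository is not involved.
print(pull(alice, bob))
```

In this model neither repository is privileged: the same pull function works in either direction, which is the "unbiased" property the requirement asks for.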
E2. Networked operation
The candidate SCM must be able to operate in a sensible and well-performing manner between two hosts in separate administrative domains. Beyond the data contained within the candidate SCM's representation, the only common administrative requirement should be a credential identifying the remote operator initiating the transaction to the other host.
One mechanism that meets this requirement is to tunnel the candidate SCM operation through ssh(1). Candidate SCMs whose implementation requires domains to change security policies in order to open network ports that are unusual, or that are believed to be risky, will be considered to be minimally compliant with this requirement.
Performance measurements will be used to compare candidates, as outlined below. A candidate SCM with performance results in the bottom third of all candidates will be deemed to have failed to meet this requirement.
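The "bottom third" rule can be made concrete once a single ranking metric is chosen. The sketch below assumes that candidates are ranked by total elapsed time and that the size of the bottom third is rounded down; neither choice is stated in this document, and the candidate names and timings are invented for illustration.

```python
def bottom_third(elapsed_seconds: dict[str, float]) -> set[str]:
    """Return the candidates whose elapsed time falls in the slowest third."""
    # Assumption: rank by elapsed time, slowest first, and round the size
    # of the bottom third down, so that fewer than three candidates means
    # that no candidate fails on performance alone.
    ranked = sorted(elapsed_seconds, key=elapsed_seconds.get, reverse=True)
    return set(ranked[: len(ranked) // 3])


# Hypothetical timings, in seconds, for six candidates.
timings = {
    "scm-a": 120.0,
    "scm-b": 95.0,
    "scm-c": 310.0,
    "scm-d": 101.0,
    "scm-e": 88.0,
    "scm-f": 150.0,
}
print(sorted(bottom_third(timings)))
```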
As an example of setting up a testbed that crosses administrative domains, consider Mercurial, which has built-in support for ssh access to a repository: install Mercurial on both ends; create a user account on the remote system; and configure the local and remote systems for public-key ssh authentication.
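Once both ends are configured, the first clone across the ssh tunnel might be driven as in the following sketch. The user, host, and path names are placeholders, and the helper functions are hypothetical; only the "hg clone" command and the ssh:// URL form come from Mercurial itself.

```python
def hg_ssh_url(user: str, host: str, path: str) -> str:
    """Build a Mercurial ssh URL.

    In Mercurial's ssh URL form the path is relative to the remote
    user's home directory.
    """
    return f"ssh://{user}@{host}/{path}"


def clone_command(user: str, host: str, path: str, dest: str) -> list[str]:
    """Return the argument vector for cloning a remote repository over ssh."""
    return ["hg", "clone", hg_ssh_url(user, host, path), dest]


# Placeholder account, host, and repository path for the testbed.
command = clone_command("tester", "remote.example.org", "repos/testbed", "testbed")
print(" ".join(command))
# To execute the clone, pass the list to subprocess.run(command, check=True).
```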
E3. Interface stability and completeness
The storage representation, command line interfaces, network protocols, and hooks interfaces should be documented and have some level of declared commitment. The state of the storage representation and the operations that modify it should be well defined, so that use with advanced file system capabilities can be assessed for hazards. (For example, consistent use with file system snapshot capabilities.)
The stability of the storage representation is important because frequent changes to it could force frequent upgrades of the gate and of personal workspaces, and each such upgrade is an opportunity for problems to arise.
Use of the candidate SCM with advanced file system capabilities should be defined. (For example: Can ZFS clones be used to back up repositories? Can filesystem ACLs be used to control access to portions of the repository?)
E4. Standard operations and transactions
The candidate SCM is expected to support rename and deletion transactions at the file and directory levels. Note that a history-preserving copy operation, followed by a delete operation, may be considered equivalent to a rename.
The following transactions are to be assessed and documented by the evaluating engineer:
- rename at the file and directory levels;
- deletion at the file and directory levels;
- delete file, create new file with the same name, commit the new file;
- delete file (user A), deleting gets backed out, another user (B) commits changes to the same file using a workspace B created before A's original deletion;
- whether the candidate allows references to files as they existed prior to deletion.
Equivalency (another operation or set of operations that might be used in place of an operation not specifically supported), and the reasons for the omission of a transaction, are to be assessed and documented by the evaluating engineer.
E5. Per-changeset metadata
The candidate SCM must be able to associate, at a minimum, an unstructured text fragment with each changeset.
Additional support is to be assessed and documented by the evaluating engineer.
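As an illustration of the minimum asked for by E5, the sketch below models a changeset that carries one unstructured text fragment, plus an optional key/value map standing in for any "additional support" a candidate might offer. The Changeset type, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Changeset:
    cset_id: str
    files: tuple[str, ...]
    # The E5 minimum: free-form text with no imposed structure.
    description: str
    # Assumed stand-in for richer, candidate-specific metadata.
    extra: dict[str, str] = field(default_factory=dict)


change = Changeset(
    cset_id="c1",
    files=("usr/src/example.c",),
    description="Example synopsis written by the committer.",
)
print(change.description)
```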
"Conditional" requirements
C6. Ease of use
The candidate SCM should be easy to install in a reasonably self-contained fashion. In principle, shipment in an OpenSolaris consolidation should be possible with a finite investment of resources, meaning that the candidate SCM does not have a complicated makefile system, has dependencies that can be easily managed, etc.
A user familiar with distributed SCM concepts should be able to understand the primary interfaces from the interfaces themselves and their documentation.
The candidate SCM should offer some assistance with conflict resolution during an update, support the issuance of source code patches, and provide the ability to browse the source tree via a web server.
The candidate SCM should be able to undo the application of a specific changeset ("backout") atomically and easily. Whether an undo can be done at any time or only before any other putback is to be assessed and documented by the evaluating engineer.
C7. "No dedicated server" operational mode
In the interests of machine resource conservation, the candidate SCM should have a mode in which it can operate without a continuously running server process. This mode may have concurrency restrictions or performance limitations compared to its primary server mode.
For instance, within a large administrative domain, it may be more convenient to utilize NFS and a shared identity infrastructure than to rely on the networked operating mode required by E2. A candidate SCM which can sensibly operate in a pure OpenSolaris NFS environment without the establishment of a dedicated server process would meet this requirement.
C8. Tool community health
The community or author of the candidate SCM needs to be active and engaged with their user population. The ability of the candidate SCM's community to absorb, directly or through a liaison, the defects and feature requests of the OpenSolaris community should be estimated, preferably by a direct inquiry to the candidate SCM community.
C9. OpenSolaris community implementation expertise
One or more contributors within the OpenSolaris community need to be able to assess potential defects in the implementation of the candidate SCM and potentially participate in the development of new features or supporting tools for the candidate SCM.
C10. Interface extensibility
Beyond the requirements of E3, an extensible interface, so that OpenSolaris-specific tools might be integrated with SCM operations, is desired. Such an interface might be composed of a documented "hooks" interface, a documented library interface, or some other modular approach. An extensive hooks interface, with hook evaluations able to terminate operations, is a strongly desired attribute in a candidate SCM; a candidate SCM with such an interface will be considered to fully meet this requirement.
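The phrase "hook evaluations able to terminate operations" can be illustrated with a small model. The sketch below is not the hooks interface of any candidate SCM; the HookRegistry class, the "pre-commit" event name, and the require_description check are assumptions made for illustration.

```python
from typing import Callable

# A hook receives a context dictionary and returns False to veto the operation.
Hook = Callable[[dict], bool]


class HookRegistry:
    def __init__(self) -> None:
        self._hooks: dict[str, list[Hook]] = {}

    def register(self, event: str, hook: Hook) -> None:
        self._hooks.setdefault(event, []).append(hook)

    def run(self, event: str, context: dict) -> bool:
        # all() stops at the first hook that returns False.
        return all(hook(context) for hook in self._hooks.get(event, []))


def commit(registry: HookRegistry, description: str, log: list[str]) -> bool:
    """Record a commit unless a pre-commit hook terminates the operation."""
    if not registry.run("pre-commit", {"description": description}):
        return False
    log.append(description)
    return True


def require_description(context: dict) -> bool:
    """Example site-specific check: reject commits with an empty description."""
    return bool(context["description"].strip())


registry = HookRegistry()
registry.register("pre-commit", require_description)
history: list[str] = []
print(commit(registry, "", history), commit(registry, "fix build", history))
```

A hook that can only observe an operation would be enough for notification tools; the ability to veto is what allows project-specific checks to be enforced at the gate.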
C11. Transactional operations and corruption recovery
The operations on the candidate repository should have defined semantics, in particular identifying non-atomic transactions and mechanisms for recovery from a corrupted repository.
C12. Content generality
The candidate SCM should be able to safely represent and track files with binary content, in addition to text files.
NOTE: This is a conditional requirement because lack of direct binary support can be finessed with tools like uuencode(1C) and corresponding makefile magic.
"Optional" requirements
O13. Partial trees
The structure of the ON consolidation and the current SCM solution allow a contributor to work on specific subsets of the source tree in a supported fashion. This requirement states that, while such a mode with support for expressing dependencies between files and directories is valuable, support for partial tree repositories is not necessary.
O14. Per-file histories
The current SCM uses SCCS as a per-file revision storage format. As such, each file has an individual history. This feature allows the combination of disjoint issues to be addressed in a single commit without connecting the per-file history. It is believed that the ability to meet the other requirements stated in this document is sufficiently more valuable than the support of per-file revision histories. Moreover, the construction of per-file histories in reporting and browsing tools can be accomplished by convention in many cases.
That is to say, a candidate SCM that meets E5 is sufficient.
Evaluations
We anticipate a number of qualitative and quantitative tests to evaluate the satisfaction of the various requirements, where a "meets" or "does not meet" result is not applicable.
Representational and performance criteria
These criteria focus on the ability of the candidate SCM to represent a large, long-running, and active source tree. The ON consolidation represents more than 25 000 changesets by over 1300 committers against approximately 40 000 files.
The expected set of meaningful operations for performance evaluation is:
- first pull/clone operation,
- subsequent pull/update operation, and
- push/commit operation.
Performance results for the set of operations will be captured for three distinct scenarios: within a campus, across SWAN between sites, and between two Internet sites. SWAN measurements will be captured between Menlo Park, CA, and each of Burlington, MA; Manchester, UK; and Beijing, PRC. (Equivalent sites may be added or substituted.) For comparison, results will be phrased both as a percentage of sustained bandwidth and as absolute time elapsed (for an identical pair of endpoints). Baseline absolute time comparisons will be made against standard and "turbo" TeamWare for within-SWAN scenarios, and against an rsync copy of the same data for all scenarios.
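The two ways of phrasing a result might be computed as in the sketch below. It assumes that "percentage of sustained bandwidth" means the achieved throughput of the operation divided by the separately measured sustained bandwidth of the link; the function names and the sample figures are hypothetical.

```python
def percent_of_sustained_bandwidth(
    bytes_transferred: int,
    elapsed_seconds: float,
    sustained_bits_per_second: float,
) -> float:
    """Achieved throughput as a percentage of the link's sustained bandwidth."""
    achieved_bits_per_second = bytes_transferred * 8 / elapsed_seconds
    return 100.0 * achieved_bits_per_second / sustained_bits_per_second


def relative_to_baseline(candidate_seconds: float, baseline_seconds: float) -> float:
    """Elapsed time as a multiple of a baseline (for example, an rsync copy)."""
    return candidate_seconds / baseline_seconds


# Hypothetical first clone: 500 MB moved in 400 s over a link that
# sustains 100 Mbit/s, compared with an rsync copy that took 250 s.
print(percent_of_sustained_bandwidth(500_000_000, 400.0, 100_000_000.0))
print(relative_to_baseline(400.0, 250.0))
```

Both figures must be taken for an identical pair of endpoints; otherwise differences in the link dominate differences between candidates.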
The candidate SCM will be evaluated for data integrity by interruption of the set of operations by signal and by machine failure.
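Interruption by signal can be automated with a small harness such as the sketch below, which assumes a POSIX host. The interrupt_after function is hypothetical, and the sleeping child process stands in for a real candidate SCM operation. After each interruption, the candidate's own integrity check (for Mercurial, "hg verify") would be run against the repository.

```python
import signal
import subprocess
import sys
import time


def interrupt_after(command: list[str], delay_seconds: float,
                    sig: int = signal.SIGTERM) -> int:
    """Start command, deliver sig after delay_seconds, and return its exit status."""
    proc = subprocess.Popen(command)
    time.sleep(delay_seconds)
    if proc.poll() is None:  # the operation is still running
        proc.send_signal(sig)
    return proc.wait()


# Stand-in for a long-running SCM operation such as a first clone.
stand_in = [sys.executable, "-c", "import time; time.sleep(30)"]
status = interrupt_after(stand_in, 0.5)
# On POSIX, a negative status is the number of the signal that ended the child.
print(status)
```

Varying the delay across repeated runs exercises interruption at different points in the operation. Machine failure cannot be simulated this way and needs a separate procedure.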
The safety of the candidate SCM with respect to file system capabilities will be evaluated using ZFS snapshot/clone technology for safe repository copies.
Implementation criteria
The candidate SCM implementation will be assessed by a design and code review by an OpenSolaris contributor with expertise in the implementation language of the candidate SCM.
Tools criteria
If available, the candidate SCM is expected to provide or identify a graphical merge program that can be used to resolve conflicts resulting from an update operation. In the case that no known program can be used, the evaluating contributor will assess the work necessary to use one of the standard graphical merge programs.
References
[1] IEEE Std 830-1998, "IEEE Recommended Practice for Software Requirements Specifications", 1998.
Stephen Hahn, PhD Solaris Kernel Development, Sun Microsystems
stephen dot hahn at sun dot com http://blogs.sun.com/sch///