This page has moved!

 The old content is below, but see http://www.opensolaris.org/os/project/nwam/architecture/ for the latest content.


Network Auto-Magic Architecture

 Draft 0.1.2, 2006-Feb-15

John Beck, Jim Carlson, Renee Danson, Michael Hunter,
Anay Panvalkar, Kacheong Poon, Garima Tripathi, Jan Xie

 This is a draft; it is by no means complete. Furthermore, it is being updated on a regular basis: weekly if not more often. Please check the version number and date above. Also, note that the bulk of this specification is in black text on a white background, as most web pages are. Commentary about unresolved issues is in white on orange; all of this will be cleaned up by the time this draft is finalized, at which time it will change from Draft 0.something to Version 1.0.

 There are six focus areas described below:

  1. Overview & Component Interaction
  2. State Machine
  3. Event Handler
  4. UI Abstractions
  5. Network Service Model
  6. Dependencies with the rest of the System

1. Overview & Component Interaction

 Network Profiles, the primary component of the Network Auto-Magic project, are a way to simplify network configuration management, by allowing users to specify various properties which determine how things work in different circumstances. The properties include, but are not limited to:

  • which network interface(s) to use
  • how to obtain IP address(es) for the interface(s) in use
  • which name service(s) to use
  • a host name (and any required variations thereof)
  • routing information
  • a set of IP filter rules
  • smf(5) services
  • others? (It is not yet clear if this list is complete, or if it will be user-extensible.)

 Note that we expect to support "overlay" profiles, which can be thought of as partial profiles which specify certain attributes but inherit other attributes from a "parent" profile.

 Let us begin with an architectural overview. The primary components are:

  • The profile repository. This is where the configuration program stores its data, which will also be read by the profile daemon.
  • The profile configuration program (a.k.a. the UI).
    • Note that there will be both CLI and GUI versions of this program which will perform similar if not identical tasks.
    • In addition to using the repository, it also interacts with the profile daemon.
    • Tasks which users will use this program to perform include:
      • creating, modifying and deleting profiles
      • activating a profile
      • querying information about profiles
  • The profile daemon.
    • This reads data from the repository.
    • It reacts to events as notified by the event handler.
    • It reacts to changes which users make via the configuration program.
    • The "state machine" described in Section 2 is implemented in this daemon.
    • The daemon also interacts with the SMF network services.
  • The event handler. This will likely have at least some kernel component, and will communicate with the profile daemon, informing it of events which may trigger a network reconfiguration.
  • The SMF network services. These are already part of Solaris, but we expect to modify them to some extent. The daemon will restart / refresh some of these services as needed.

 How they interact is roughly as follows:

  • At any given time, exactly one profile is "active". If this is a "parent" profile, then it specifies all attributes; if it is an "overlay" profile, then it specifies some attributes, and the rest are inherited from its parent profile.
  • At boot, the profile daemon consults the repository for the current active profile and configures the network(s) accordingly. It is not yet clear whether:
    • the active profile is always persistent across reboots  or
    • there may be support for a temporarily active profile which does not persist across reboots
  • As events occur which may trigger a change in the network configuration, the event handler detects these and notifies the daemon accordingly. The daemon in turn consults the active profile and may reconfigure the network(s) accordingly.
  • If a user modifies a profile, the configuration program updates the repository and notifies the daemon. If the current active profile is modified, then the daemon may reconfigure the network(s) accordingly.
  • Likewise, if a user activates a new profile, then the configuration program updates the repository and notifies the daemon, which may then reconfigure the network(s) accordingly.
  • If an overlay profile is activated, the configuration program will notify the profile daemon, which may then reconfigure the network(s) accordingly. Note that this is just a special case of the current active profile being changed.
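 The interaction rules above can be sketched as follows. This is only an illustration under stated assumptions, not the project's design: the class and method names (Repository, ProfileDaemon, ConfigProgram, notify) are hypothetical, the repository is an in-memory dictionary, and "reconfiguring the network" is reduced to counting how often the applied attributes change.

```python
class Repository:
    """In-memory stand-in for the profile repository (hypothetical API)."""

    def __init__(self):
        self.profiles = {}   # profile name -> dict of attributes
        self.active = None   # name of the single active profile

    def store(self, name, attributes):
        self.profiles[name] = dict(attributes)


class ProfileDaemon:
    """Reads the repository and reconfigures when the active profile changes."""

    def __init__(self, repository):
        self.repository = repository
        self.applied = None          # attributes most recently applied
        self.reconfigurations = 0    # stand-in for real network changes

    def start(self):
        # At boot: consult the repository for the current active profile.
        self.notify()

    def notify(self):
        # Called by the configuration program after it updates the repository.
        wanted = self.repository.profiles.get(self.repository.active)
        if wanted != self.applied:   # "may reconfigure": skip no-op changes
            self.applied = dict(wanted) if wanted is not None else None
            self.reconfigurations += 1


class ConfigProgram:
    """The UI: updates the repository first, then notifies the daemon."""

    def __init__(self, repository, daemon):
        self.repository = repository
        self.daemon = daemon

    def modify(self, name, attributes):
        self.repository.store(name, attributes)
        self.daemon.notify()

    def activate(self, name):
        self.repository.active = name
        self.daemon.notify()
```

 In this sketch, modifying a profile that is not active leaves the daemon idle, while modifying or replacing the active profile triggers exactly one reconfiguration.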

2. State Machine

 One of our focus areas is "State Machine", which needs to cover both the abstract set of states for the profile daemon, and the set of possible transitions between those states. For now, I will focus on the transitions, with the idea that sufficiently specifying the transitions may suggest what the states themselves should be.

  • There are two event-driven transitions:
    • network++
       A new network has become available. Common possible reasons:
      • a LAN cable is plugged in
      • a wireless card is hot-plugged
      • a wireless scan shows a new AP
      • a new tunnel has been plumbed
         Note that booting and resuming from suspend are really just special cases where one or more of the above appear to happen at once, as the daemon will attempt to "de-queue" all pending events whenever it starts or resumes (i.e., it will attempt to examine all pending events before handling any of them). This will be part of the daemon's "damping" to maximize stability (more on this below).
         There are two interesting sub-cases:
      • We now have one or more networks where we had none previously. Then we do whatever the profile specifies for this/these network(s).
      • We had a network previously, then "lost" it. If this is the same network, then we "resume" using it to the extent possible. Otherwise, we follow the profile's "release" rules for the old network, then the "obtain" rules for the new network.
    • network--
       An existing network has gone away. Common possible reasons:
      • a LAN cable is unplugged
      • a wireless card is unplugged
      • a wireless scan shows an old AP is no longer usable
      • an existing tunnel has been unplumbed
         Note that shutdown and suspend are really just special cases where all networks appear to go away at once.
         Since the network is gone, we can do nothing with respect to it per se. But we can start a timer, then once that timer "pops" (per the profile), we might either reset all connections (if the number of networks is now 0) or try to get all services using the "dead" network to transition to one of the other networks (if the number of networks is now ≥ 1). Also note that when the timer pops, we set the state so that a subsequent "network++" event follows the "there was no previous network" path rather than the "there was a previous network" path. But note that if we get a "network++" event before the timer pops, and determine that the "new network" is the same as the "old network", then we will attempt to "damp" the events out and act as if neither event had occurred.
  • There are also user-driven transitions: whenever a user modifies the active profile, or activates a different profile (including e.g. a punchin tunnel overlay), then the new active profile may result in a transition. Depending on the change(s), there may be nothing to do, or there may be minor reconfigurations to make, or it may be that the user did the equivalent of pressing a giant red "reset" button.
    • XXX more specificity is needed here, including examples
    • A note on "punchin" (the IPsec-based VPN which many of us use to access the SWAN remotely): although tunnels coming and going should be detected by the event handler and thus be handled by the profile daemon as an event-driven transition, it would probably be better for us to work with the punchin team to integrate our stuff together so that punching in and out would involve using our interfaces, and thus be user-driven transitions, with the profile daemon doing the heavy lifting instead of the punchin script.
  • So what states do all the transitions suggest? It is not clear if these transitions suggest any sort of traditional simple state model. E.g., the Zones model whose primary states are Configured, Installed and Running seems impossibly simplistic for what we are trying to achieve. Instead, it seems to me that we ought to come up with an abstract representation of the network configuration, and that abstraction will become the "state". Then whenever a user modifies the active profile or activates a different profile, the network configuration will be changed accordingly, as will our abstract "state". Likewise, whenever an event forces a reconfiguration, the new configuration will be reflected in our abstract "state".
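 The event-driven transitions and the damping timer described above might be modelled roughly as below. This is a sketch under assumptions rather than a proposed implementation: the class name NetworkDamper, the action labels ("obtain", "resume", "release", "reset", "migrate") and the single shared timeout are all hypothetical, and time is passed in explicitly so that the behaviour is easy to follow.

```python
class NetworkDamper:
    """Tracks network++ / network-- events and damps short outages."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds before a loss is treated as final
        self.active = set()      # networks currently usable
        self.pending = {}        # lost network -> time its timer pops

    def network_down(self, name, now):
        # Nothing can be done with the dead network; just start the timer.
        self.active.discard(name)
        self.pending[name] = now + self.timeout
        return []

    def tick(self, now):
        """Pop expired timers and return the resulting actions."""
        actions = []
        for name in sorted(self.pending):
            if now >= self.pending[name]:
                del self.pending[name]
                # No networks left: reset connections. Otherwise try to
                # move services from the dead network to a surviving one.
                kind = "migrate" if self.active else "reset"
                actions.append((kind, name))
        return actions

    def network_up(self, name, now):
        actions = self.tick(now)
        if name in self.pending:
            # Same network returned before its timer popped: damp both
            # events and resume as if neither had occurred.
            del self.pending[name]
            self.active.add(name)
            return actions + [("resume", name)]
        # A different network: release the old one(s), then obtain the new.
        for old in sorted(self.pending):
            actions.append(("release", old))
        self.pending.clear()
        self.active.add(name)
        actions.append(("obtain", name))
        return actions
```

 Once a timer has popped, the lost network is forgotten, so a later network++ for the same name follows the "there was no previous network" path and yields "obtain" rather than "resume".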

3. Event Handler

 The event handler must interface with the kernel, but will probably mostly run in user-land.

 hald will likely be involved, though it does not provide all the information we need. So the event handler will monitor several sources of information: hald, the routing socket, and syseventd. Current thinking is that this monitoring will take place within the "profile daemon" entity.
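 One way to picture several sources feeding a single consumer is a shared queue that the daemon drains completely before acting, in line with the "de-queue all pending events" behaviour described in Section 2. The sketch below is illustrative only: the (source, subject, what) event tuples, the interface names and the "latest event per subject wins" rule are assumptions, not part of this specification.

```python
import queue


def drain(events):
    """Remove and return every event currently queued, in arrival order."""
    pending = []
    while True:
        try:
            pending.append(events.get_nowait())
        except queue.Empty:
            return pending


def coalesce(pending):
    """Keep only the most recent event for each subject (e.g. interface)."""
    latest = {}
    for source, subject, what in pending:
        latest.pop(subject, None)          # re-insert so order reflects recency
        latest[subject] = (source, what)
    return [(source, subject, what)
            for subject, (source, what) in latest.items()]
```

 For example, a link-down followed by a link-up on the same interface collapses to the single link-up event, so the daemon examines the final picture before handling anything.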

 In addition to the monitoring component, work may be required in the kernel to ensure that information is reported in a consistent manner; hald back-end support will also need to be added (this will benefit other projects as well as this one).

 The design of the event handler will be shaped by the answers to the following questions.

  • What information needs to be delivered? What events are we concerned with?
    • link up/down for wired links
    • availability of a new wireless network
    • loss of signal/signal level crosses a threshold for an existing wireless network
    • plumbing/unplumbing of a tunnel
    • creation/removal of an aggregation
    • insertion/removal of a NIC
    • L3 renumbering
       Is this list complete? Are there things on it that should not be?
  • How will the information be obtained?
    • link up/down
       DL_NOTE_LINK_UP/DOWN translates to toggling of IFF_RUNNING, which can be monitored on a routing socket.
       Support for DL_NOTE_LINK_UP/DOWN is not consistent across all drivers. Should we/can we make all of Sun's drivers do this correctly? Or at least some subset that seem important enough?
       We still need to be able to work with drivers which do not support this. What happens if we always assume the link is up?
    • wireless
       If the wireless driver is made to support the DL_NOTE_LINK_UP/DOWN notifications, that would be a useful bit of information, assuming that we all agree on what it means; should probably mean that we have an associated AP or we have established an ad-hoc group. Do any existing wireless drivers support it already (ath does not)?
       An earlier PSARC case (whose official title was WiFi PCMCIA Driver Productization, although it is colloquially known as wificonfig) defines a set of (unstable) wireless driver ioctls that make up the interface with wificonfig. Those interfaces are likely to change, though, as work in that area is going on right now. Is it safe to assume that there will be an interface of some sort that we can use, that will be supported by a critical mass of wireless drivers?
       Even if that is the case, though, the existing ioctls at least only respond to queries; they do not asynchronously send reports as changes happen. Making a user-space program poll for changes in the wireless device state seems pretty painful...are there useful changes coming in the dladm/wificonfig work that is going on?
    • insertion/removal of NICs
       devfs will help here; XXX need more detail.
    • tunnel/aggregation/other L2 changes
       We should be able to get this sort of information from GLDv3, which will have pretty comprehensive L2 information post-Clearview. Possibly something as simple as posting an event to user-land from dls_create() and dls_destroy() will do the trick.
    • L3 renumbering
       This is easy enough to get from a routing socket; or should we be more tied in to the mechanisms that are directly involved with the renumbering (DHCP, LLA, ...)?
  • How will the information be delivered to consumers?
     This is probably a more detailed design question, as the current thinking is that the event handler will be part of the profile daemon entity, our primary consumer.
  • How should we interact with a future FMA strategy for datalinks?
    This discussion might be better placed in the Dependencies with the rest of the system section. It is also pretty conjectural; more detailed knowledge of FMA needs to be applied.
     If we had an FMRI for datalinks, it could report "faulted" in the event of a LAN cable being unplugged, or loss of an AP, for example. Or maybe if a wireless card is unplugged, the FMRI would simply disappear.
     In this scenario, our event handler could be an FMA Agent, consuming fault events from the (to be developed) network diagnosis engine. Say the diagnosis engine decides that an interface needs to be taken offline; the event handler would get a report that this action has been taken, and then propagate this info into the state machine to do whatever magic the current profile calls for.
  • Will the information be stored anywhere? If so, when should snapshots be taken? How many should be stored?
     It is not clear that we need a repository at this point. We will keep this as a place-holder note just in case.
  • See the State Machine section for discussion of how to damp the effects of "bouncing" interfaces, and attempting to determine if a link change is transient or not.
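 To make the IFF_RUNNING idea above concrete, the following sketch derives link up/down events by comparing two snapshots of interface flags. It is a simplification under assumptions: the numeric value of IFF_RUNNING, the snapshot format (a dictionary of interface name to flags) and the assume_up parameter, which models drivers that never report link state by treating them as always up, are all hypothetical. A real event handler would read these flags from routing socket messages.

```python
IFF_RUNNING = 0x40  # assumed flag value; confirm against <net/if.h> on the target


def is_up(name, flags, assume_up):
    """Return True if the interface is present and considered to have link."""
    if name not in flags:
        return False
    return name in assume_up or bool(flags[name] & IFF_RUNNING)


def link_events(old_flags, new_flags, assume_up=frozenset()):
    """Compare two {interface: flags} snapshots; return (name, 'up'|'down')."""
    events = []
    for name in sorted(set(old_flags) | set(new_flags)):
        was = is_up(name, old_flags, assume_up)
        now = is_up(name, new_flags, assume_up)
        if was != now:
            events.append((name, "up" if now else "down"))
    return events
```

 Note the cost of always assuming the link is up: an interface listed in assume_up never produces a "down" event unless it disappears entirely, so cable pulls on such drivers would go unnoticed.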

4. UI Abstractions

 Network Profiles (in general) are a way to simplify network configuration management, by allowing users to specify various properties which determine how things work in different circumstances. I.e., when the system's network connectivity changes, various system services may require re-configuration; each such service will need direction as to where its new configuration data comes from. This might be the name of a new configuration file, or possibly just a few attributes obtained from the "network profile".

 Thus a network profile (in particular) is a set of attributes, some single-valued (e.g., boolean or string or integer), some multi-valued (e.g., HTTP proxy details), which taken together specify the network configuration. A single-valued attribute shall be called a "property", and a multi-valued attribute (or collection of properties) shall be called a "resource". Note that these terms are borrowed from zonecfg(1m) (http://docs.sun.com/app/docs/doc/816-5166/6mbb1kqlk?a=view) and are not necessarily final.

 Some (i.e., the list is admittedly incomplete at this time) attributes which profiles will need to have include:

  • boolean property: automatically activate a wired connection when a cable is plugged in?
  • N-choice property: get IP address (et al.) from DHCP, statically, RARP, something else?
    • If "statically", then a resource is needed to specify the IP address, net mask, default router, etc.
    • Or perhaps this should be a series of properties:
        for X in (some list)
          get X from DHCP or statically or ...?
    • Or we may go with an nsswitch-like mechanism.
  • resource: HTTP proxy information (one property of which could be a string: the path to a proxy-auto-conf file)
     Note: most attributes are system-level, i.e., only a privileged user would be allowed to modify them, but this attribute is user-level, i.e., one which an unprivileged user could modify. This distinction will probably be important at some point.
  • string property: path to firewall (IP Filter) configuration file
  • 4-choice property: what to do in the face of a new wireless AP? (nothing, always query the user, query the user only if new [i.e., if it is the first time we have run across this particular WLAN], connect automatically)
  • N-choice property: what name service to use? (Perhaps the Sparks project will clean this messy area up and save us some work, but we cannot count on that.)
  • N-choice property: how to rank networks
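 The property/resource terminology and a few of the attributes listed above could be represented along these lines. This is a sketch only: the attribute names ("address-source", "new-wireless-ap", "static-address") and the choice labels are invented for illustration, and the final attribute set is explicitly still open.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A multi-valued attribute: a named collection of properties."""
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Profile:
    name: str
    properties: dict = field(default_factory=dict)  # single-valued attributes
    resources: dict = field(default_factory=dict)   # resource name -> Resource


# Hypothetical labels for the 4-choice "new wireless AP" property.
NEW_AP_CHOICES = ("nothing", "always-query", "query-if-new", "connect")


def validate(profile):
    """Return a list of problems found in the profile (empty if none)."""
    problems = []
    policy = profile.properties.get("new-wireless-ap", "nothing")
    if policy not in NEW_AP_CHOICES:
        problems.append("unknown new-wireless-ap choice: %s" % policy)
    # Static addressing needs a resource carrying address, netmask, router.
    if (profile.properties.get("address-source") == "static"
            and "static-address" not in profile.resources):
        problems.append("static addressing requires a static-address resource")
    return problems
```

 A profile that selects static addressing but carries no matching resource is reported as incomplete; adding a Resource named "static-address" clears the problem.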

 We will also need the concept of layered profiles. I.e., when "punching in", some of the previous profile still applies, but some new attributes need to take effect. So a layered profile needs to specify which attributes are inherited from its "parent", and the rest need to be specified, or that none are inherited except a certain set.
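 The two inheritance styles just described (inherit everything the layered profile does not set, or inherit nothing except a named set) can be sketched with plain dictionaries. The function name resolve, the inherit parameter and the attribute names are hypothetical; how inheritance will actually be expressed in a profile is not yet decided.

```python
def resolve(parent, overlay, inherit=None):
    """Return the effective attributes of a layered profile.

    parent, overlay: dicts of attribute name -> value.
    inherit: None means every parent attribute not set by the overlay is
             inherited; otherwise only the named attributes are inherited.
    """
    if inherit is None:
        effective = dict(parent)
    else:
        effective = {name: parent[name] for name in inherit if name in parent}
    effective.update(overlay)   # the overlay's own attributes always win
    return effective
```

 For a "punchin" style overlay, the first mode keeps the parent's addressing and name service while replacing only the proxy and tunnel settings; the second mode lets the overlay start from a nearly clean slate.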

 An issue which has come up repeatedly during design discussions has been that of "user intent". For example, laptops are used very differently than servers, and test servers may be used very differently than production servers. So a knob to indicate this intent seems like a good idea. The form this knob may take needs to be worked out.

5. Network Service Model

 The network service model allows representing/converting qualified network change events into service refresh events. The service model is not envisioned as a separate stateful entity itself but is expected to be represented within the SMF framework.

 At every qualified network change event (within the range of events defined and recognized by the event handler), based on inputs from the event handler and information in the profiles, the 'state machine' sends a notification of change to the 'service model'. A predefined list of services will need to be refreshed. This list is expected to be derived from the dependency list for existing network initialization milestones such as network/physical. However currently SMF does not offer a level of granularity that can accommodate specific actions for a single network interface changing state while other interfaces experience no change. The service model may need a new milestone such as network/ip to accommodate state being abstracted per network interface. A refresh method will have to be added to all services that are expected to participate (directly or otherwise) in the refreshing of the service states. The state machine is expected to directly call the refresh method of the services to be refreshed.

 The refresh should also resolve and take care of inter-service dependencies and automatically refresh those if the services they depend on were refreshed. In other words, a qualified network event should cause a whole slew of service refreshes on the system.
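 The dependency-driven refresh described above amounts to finding every service that depends, directly or indirectly, on a changed service, and refreshing them with dependencies first. The sketch below shows that calculation only. It does not call SMF, the dependency table is a made-up example (network/ip is the hypothetical milestone suggested earlier in this section), and it relies on the graphlib module from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def refresh_order(depends_on, changed):
    """Return the services to refresh, dependencies before dependents.

    depends_on: dict mapping each service to the set of services it needs.
    changed:    services directly affected by the network event.
    """
    # Invert the edges so we can walk from a service to its dependents.
    dependents = {}
    for service, needs in depends_on.items():
        for needed in needs:
            dependents.setdefault(needed, set()).add(service)

    # Collect the changed services plus everything downstream of them.
    affected = set()
    stack = list(changed)
    while stack:
        service = stack.pop()
        if service not in affected:
            affected.add(service)
            stack.extend(dependents.get(service, ()))

    # Order the whole graph once, then keep only the affected services.
    order = TopologicalSorter(depends_on).static_order()
    return [service for service in order if service in affected]
```

 With a table in which network/dns/client needs network/ip and network/ip needs network/physical, a change to network/ip refreshes network/ip and then network/dns/client, and leaves unrelated services alone.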

 Certain services to be refreshed may require a finer level of control and may need certain properties reset/changed.

 Finally the state machine will have to repoll the SMF repository for error checking to make sure that the expected service actions have taken place. Ultimately this status information should be available in the same mechanism the state machine uses to communicate with the end-user.

XXX This section needs more detail, which we are working on for the next draft.

6. Dependencies with the rest of the System

 In order to tie the NWAM architecture into the overall system we need to specify what other subsystems it interfaces with and what requirements it drives on those systems.

6.1 Bonjour

 Bonjour is a system under our control but not covered by this architecture document. It is worthwhile to consider it separately and understand how it fits into our overall usage. Issues related to Bonjour are:

  1. hostname issues
    1. initialization
    2. user preferences
    3. dynamic changes
  2. how mdns fits in? (litmus test might be if nodeinfo or lldp discovered names could fit in later)
  3. how do we utilize service discovery?

 The current choice for the source base for the Bonjour protocol is Avahi. Avahi comes with a C language binding to its API. While a Java API does not currently exist, it is hoped this project can develop or encourage the development of a Java API. Java has widespread support for the development of graphical tools which consume the kind of information Bonjour provides.

 As background, it is possible that Bonjour will be developed in two phases.

 In the first phase, Bonjour would be implemented in a "superficial" way, i.e., we provide the framework, rework some services (for example, the ftp daemon) to participate in the Bonjour framework, and optionally provide some sort of "service browser" to browse for these services. However, service discovery components would not be turned on automatically. There would be some kind of mdns/client service which would be activated if /etc/mdns.conf were present and if "mdns" were present for the ipnodes backend in /etc/nsswitch.conf. There would also be an mdns/server service which would allow local services to advertise themselves on the LAN.
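 The activation condition for the mdns/client service could be checked roughly as follows. This is a sketch, assuming the usual "database: source source ..." line format of nsswitch.conf with "#" comments; the function names are hypothetical, and the existence of /etc/mdns.conf is passed in as a boolean so that the logic can be exercised without touching the file system.

```python
def mdns_in_ipnodes(nsswitch_text):
    """Return True if 'mdns' is listed as a source for the ipnodes database."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        database, _, sources = line.partition(":")
        if database.strip() == "ipnodes":
            # Criteria such as [NOTFOUND=return] are separate tokens and
            # never compare equal to "mdns".
            return "mdns" in sources.split()
    return False


def mdns_client_should_run(conf_exists, nsswitch_text):
    """Both conditions from the text must hold for mdns/client to start."""
    return conf_exists and mdns_in_ipnodes(nsswitch_text)
```

 A caller might pass os.path.exists("/etc/mdns.conf") and the contents of /etc/nsswitch.conf; a commented-out "mdns" entry does not count.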

 In the second phase, a much deeper integration would be provided where clients browse and connect to discovered Bonjour services on the network. (In the ftp example, when you run the ftp client without specifying a hostname, it would automatically browse for ftpd services and offer to connect to them automatically.) This may be more suitable for the desktop application model than the network client-server model; for example, it is the model by which iTunes on Mac OS X finds music on the LAN automatically. It involves doing all of the first phase but also modifying both clients and servers to be Bonjour-aware.

 From an individual end system standpoint having Bonjour on by default is preferred but that is often not the right default for many corporate environments due to security and protocol chattiness concerns. So the default behavior of this protocol might tie to the use of the machine and will have a single control to turn it on/off.

6.2 Virtualization

 Another aspect of configuration is how we deal with the plethora of virtualization technologies being developed. For a zone, if the administrative model stays as it currently is, configuration seems likely to happen mostly in the global zone. But if the design of the Zones admin model changes to allow more privileges to be given to zone admins, that could change radically.

 Other virtualization technologies include:

  • BrandZ is an attribute of a zone so should be dealt with in the same manner as a zone.
  • Xen creates enough separation between the host OS and the guest OS that it is unlikely we will interact with it much.
  • Crossbow is likely to be a feature that we have to be able to interact with. The configuration model for Crossbow is currently in flux, but will be something that involves creating a stack instance and then another step which binds it to a zone or domain.

6.3 HAL

 HAL has come up as a possible data source for NWAM. HAL is a schema and API for accessing information about the system. Initially it was envisioned as the layer Gnome needed to abstract the hardware. Applications of HAL are as diverse as the battery status meter and a network manager.

 There is an effort (to be documented soon on the OpenSolaris web site) to get HAL into Solaris for use in volume management. HAL uses D-Bus as its IPC mechanism. There are system-specific back-ends and an upper layer which also just work.

Further investigation needs to be done in this area.

6.4 Visual Panels

Visual Panels is a new project aimed at creating a better way to configure Solaris. The current demo includes a hook for Network Profiles. We need to investigate further how our GUI would integrate within the Visual Panels framework.

 Issues:

  • How does VP manage dependencies between framework elements?
  • Does VP have the idea of doing a set of operations atomically?
last modified by admin on 2009/11/13 00:34


© Sun Microsystems Inc. 2009