Author: Darren Reed, March 2009
Starting with the release of Solaris 10, networking inside the Solaris kernel has evolved away from being centered on the SVR4 STREAMS approach to a newer design that supports better performance. Whilst the system has continued to support the old interfaces, using them has often meant there is a performance cost.
A classic example of this is the use of STREAMS modules between TCP/IP and network interface drivers, typically used by firewalls and other software packages to inspect all of the packets.
To bring filtering of packets into line with the developments in other parts of Solaris, we've developed a new interface, known as the Packet Filtering Hooks. In addition to providing the means to be notified each time a packet appears at one of the hook points, this new interface also gives the developer kernel access to other basic network interface information, such as interface names and the addresses associated with them.
Callbacks and notifications are available to inform the programmer when a new instance of IP is created, for example to support the booting of a new zone that requires an exclusive instance of IP. Finally, for the first time on Solaris, it is possible to intercept packets on the loopback interface. The loopback packet interception also provides the developer with access to packets as they move between zones that are using a shared instance of IP, the default model.
In summary, the introduction of Packet Filtering Hooks brings Solaris into line with the needs and expectations of developers who are looking to develop value-added network solutions at the kernel level, be they for security (packet filtering/firewall), network address translation (NAT) or other purposes. For Solaris users, these interfaces are available in Solaris 10 Update 7 and later. For OpenSolaris users, these interfaces are available in OpenSolaris 2009.06 and later.
The following functions are exported from the misc/neti and misc/hook kernel modules to support packet filtering. Developers writing code to interface with these functions will need to link their kernel module(s) with -Nmisc/neti and -Nmisc/hook in order to be correctly loaded by the kernel.
The following types have been introduced as part of this API to support the functions above.
There is a substantial amount of programming required to properly set up working with this API, as it supports multiple instances of the IP stack running concurrently in the same kernel. When the IP stack was extended to allow multiple instances of itself for zones, it was also deemed necessary to have multiple instances of the framework supporting packet interception in IP. This section of the document will explore what's necessary to use this API to receive inbound IPv4 packets.
To start out using the API, it is first necessary to decide whether to cater to multiple instances of IP running in the kernel or to interact only with the global zone. This document will focus on what is necessary to set up code to get running. It will not go into any depth on shutdown/destroy; that is left as an exercise for the reader.
To be aware of the presence of IP instances, it is necessary to register callback functions that get activated when an instance is created, shut down and destroyed. The structure used to store these three function pointers should be allocated through a call to net_instance_alloc and later freed, when it is no longer required, by calling net_instance_free. The programmer is expected to give the instance a name, by setting nin_name, and to specify at least the create (nin_create) and destroy (nin_destroy) callbacks. Modifying nin_version is prohibited. Setting nin_shutdown is optional unless the code will be exporting information to kstats. To use kstats on a per-instance basis, it is necessary to use the function net_kstat_create as part of your work during the create callback. Cleanup of the kstat information must happen during the shutdown callback, not the destroy callback, and is done by calling net_kstat_delete.
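extern void *mycreate(const netid_t);
/* Destroy callback: receives the pointer that mycreate returned. */
extern void mydestroy(const netid_t, void *);

net_instance_t *n;

n = net_instance_alloc(NETINFO_VERSION);
if (n != NULL) {
    n->nin_create = mycreate;
    n->nin_destroy = mydestroy;
    n->nin_name = "my module";
    /* On failure nothing was registered, so release the structure. */
    if (net_instance_register(n) != 0)
        net_instance_free(n);
}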
If one or more instances are already present when net_instance_register is called, the create callback will be called for each currently active instance of IP. The framework that makes the callbacks ensures that only one of the create, shutdown and destroy functions is active at any one time for a given instance. It also enforces ordering: once create has been called, shutdown will only be called after create has completed, and similarly destroy does not start until shutdown is complete.
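The ordering rules described above can be pictured with a small user-space model. This is only a sketch: the type and function names (instance_model, model_create and so on) are hypothetical and are not part of the neti API, and the real framework enforces the ordering itself rather than returning errors to the caller.

```c
#include <stddef.h>

/* Hypothetical model of one IP instance; not a real neti structure. */
enum life_state { LS_NONE, LS_CREATED, LS_SHUTDOWN, LS_DESTROYED };

struct instance_model {
    int id;
    enum life_state state;
};

/* create is only valid for an instance that has not been created yet. */
int
model_create(struct instance_model *im)
{
    if (im->state != LS_NONE)
        return (-1);
    im->state = LS_CREATED;
    return (0);
}

/* shutdown may only follow a completed create. */
int
model_shutdown(struct instance_model *im)
{
    if (im->state != LS_CREATED)
        return (-1);
    im->state = LS_SHUTDOWN;
    return (0);
}

/* destroy may only follow a completed shutdown. */
int
model_destroy(struct instance_model *im)
{
    if (im->state != LS_SHUTDOWN)
        return (-1);
    im->state = LS_DESTROYED;
    return (0);
}

/*
 * Registration replays "create" for every instance that already exists
 * and returns how many instances were newly created.
 */
int
model_register(struct instance_model *list, size_t n)
{
    int created = 0;

    for (size_t i = 0; i < n; i++) {
        if (model_create(&list[i]) == 0)
            created++;
    }
    return (created);
}
```

In this model, registering against two existing instances runs create twice, and an attempt to destroy an instance before it has been shut down is rejected.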
The function below, mycreate, is a simple example of what might be used as the callback when an instance is created. All that this function does is record the network instance identifier in its own private context structure and register a new callback to be called when a new protocol (such as IPv4 or IPv6) is registered with this framework.
Remember that even if there are no zones running (and therefore no instances other than the global zone), calling net_instance_register will always result in the nin_create being exercised for the global zone. As part of the programming to use this interface, it is always necessary to supply the destroy callback so that net_instance_unregister can be later called. Attempts to call net_instance_register with either the nin_create or nin_destroy fields set to NULL will fail.
void *
mycreate(const netid_t id)
{
    mytype_t *ctx;

    ctx = kmem_alloc(sizeof(*ctx), KM_SLEEP);
    ctx->instance_id = id;
    net_instance_notify_register(id, mynewproto, ctx);
    return (ctx);
}
The function mynewproto should expect to be called each time a network protocol is either added to or removed from a networking instance. If there are already registered network protocols operating within the given instance, then the callback will be called for each one that already exists.
For this callback, only the proto argument is filled in by the caller - there is neither an event nor hook name that can be meaningfully supplied at this point. In this example function, only events that announce the registration of the IPv4 protocol are being looked for.
The next step, in this function, is to discover when events are added to the IPv4 protocol by registering the function mynewevent using the net_protocol_notify_register interface.
static int
mynewproto(hook_notify_cmd_t cmd, void *arg, const char *proto,
    const char *event, const char *hook)
{
    mytype_t *ctx = arg;

    /* Only the IPv4 protocol is of interest here. */
    if (strcmp(proto, NHF_INET) != 0)
        return (0);

    switch (cmd) {
    case HN_REGISTER :
        /* Find IPv4 in this instance, then watch for its events. */
        ctx->inet = net_protocol_lookup(ctx->instance_id, proto);
        net_protocol_notify_register(ctx->inet, mynewevent, ctx);
        break;
    case HN_UNREGISTER :
    case HN_NONE :
        break;
    }
    return (0);
}
The table below lists the three protocols that can currently be expected to be seen by the mynewproto callback. In time new protocols may be added, so it is important to fail safely (return the value 0) for any unknown protocol.
| Programming symbol | Protocol |
| NHF_INET | IPv4 |
| NHF_INET6 | IPv6 |
| NHF_ARP | ARP |
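The "return 0 for anything unknown" rule can be sketched in user space as follows. The SKETCH_* strings are placeholders invented for this example; real code compares against the NHF_INET, NHF_INET6 and NHF_ARP symbols from the system headers, and classify_proto and on_proto are hypothetical helper names.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder protocol names; stand-ins for NHF_INET, NHF_INET6, NHF_ARP. */
#define SKETCH_INET   "sketch-ipv4"
#define SKETCH_INET6  "sketch-ipv6"
#define SKETCH_ARP    "sketch-arp"

enum proto_kind { PK_UNKNOWN, PK_IPV4, PK_IPV6, PK_ARP };

/* Map a protocol name to a known kind, or PK_UNKNOWN if it is new to us. */
enum proto_kind
classify_proto(const char *proto)
{
    if (proto == NULL)
        return (PK_UNKNOWN);
    if (strcmp(proto, SKETCH_INET) == 0)
        return (PK_IPV4);
    if (strcmp(proto, SKETCH_INET6) == 0)
        return (PK_IPV6);
    if (strcmp(proto, SKETCH_ARP) == 0)
        return (PK_ARP);
    return (PK_UNKNOWN);
}

/*
 * Callback shape: act only on IPv4 and return 0 for every other
 * protocol, including ones this code has never heard of.
 */
int
on_proto(const char *proto, int *ipv4_seen)
{
    if (classify_proto(proto) == PK_IPV4)
        (*ipv4_seen)++;
    return (0);
}
```

Keeping the classification separate from the callback makes it easy to add handling for IPv6 or ARP later without touching the default path for unknown names.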
Just as the handling of instances and protocols is dynamic, so too is that of the events which live under each protocol. At present there are two types of events supported by this API: network interface events and packet events.
In the function below, the announcement for the presence of the event for inbound packets for IPv4 is being checked for. When that is seen, a hook_t structure is allocated, describing the function to be called for each inbound IPv4 packet.
static int
mynewevent(hook_notify_cmd_t cmd, void *arg, const char *parent,
    const char *event, const char *hook)
{
    mytype_t *ctx = arg;
    char buffer[32];
    hook_t *h;

    /* Act only when the inbound packet event for IPv4 is being added. */
    if ((cmd == HN_REGISTER) &&
        (strcmp(event, NH_PHYSICAL_IN) == 0) &&
        (strcmp(parent, NHF_INET) == 0)) {
        /* Bounded formatting so the name cannot overrun the buffer. */
        snprintf(buffer, sizeof (buffer), "mypkthook_%s_%s", parent, event);
        h = hook_alloc(HOOK_VERSION);
        h->h_hint = HH_NONE;
        h->h_arg = ctx;
        h->h_name = strdup(buffer);
        h->h_func = mypkthook;
        ctx->hook_in = h;
        net_hook_register(ctx->inet, (char *)event, h);
    }
    return (0);
}
The function mynewevent will be called for each event that is added and removed.
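The hook name in mynewevent is assembled in a fixed 32-byte buffer. As a minimal user-space sketch of building such a name defensively, the hypothetical helper build_hook_name below reports truncation instead of silently using a shortened name. It relies on the standard C snprintf return value; the "proto" and "event" strings used with it are placeholders, not the values of the real symbols.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format "mypkthook_<parent>_<event>" into buf.
 * Returns 0 on success, or -1 if the name did not fit in len bytes.
 */
int
build_hook_name(char *buf, size_t len, const char *parent, const char *event)
{
    int n = snprintf(buf, len, "mypkthook_%s_%s", parent, event);

    /* snprintf returns the length it wanted to write, excluding the NUL. */
    if (n < 0 || (size_t)n >= len)
        return (-1);
    return (0);
}
```

Called with a 32-byte buffer and the placeholder strings "proto" and "event", the helper produces "mypkthook_proto_event"; with an 8-byte buffer it returns -1.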
The following events are available for use today:
| Event Name | Data structure | Comment |
| NH_PHYSICAL_IN | hook_pkt_event_t | This event is generated for every packet that arrives at the network protocol and has been received from a network interface driver. |
| NH_PHYSICAL_OUT | hook_pkt_event_t | This event is generated for every packet prior to delivery to the network interface driver for sending from the network protocol layer. |
| NH_FORWARDING | hook_pkt_event_t | This is for all packets that have been received by the system and will be sent out another network interface. It happens after NH_PHYSICAL_IN and before NH_PHYSICAL_OUT. |
| NH_LOOPBACK_IN | hook_pkt_event_t | This event is generated for packets that are received on the loopback interface or that are received by a zone that is sharing its network instance with the global zone. |
| NH_LOOPBACK_OUT | hook_pkt_event_t | This event is generated for packets that are sent on the loopback interface or that are being sent by a zone that is sharing its network instance with the global zone. |
| NH_NIC_EVENTS | hook_nic_event_t | This event is generated for specific changes of state for network interfaces. |
For packet events, there is one specific event for each particular point in the IP stack. This is to enable the developer to be selective about exactly where in the flow of the packets they wish to intercept packets, without having to be overburdened by examining every packet event that happens inside the kernel. For network interface events, the model is different, in part because the events are much lower in volume and because it is more likely that the developer will be interested in several of them, not just one.
The network interface event announces when interfaces...
Over time new network interface events may be added so it is important to always return 0 for any unknown or unrecognised event that the callback function receives.
Finally, we have arrived in the function that is called when a packet is received. In this case the function mypkthook should expect to be called for each inbound packet that arrives in the kernel from a physical network interface. Packets generated internally, that flow between zones using the shared IP instance model or over the loopback interface, will not be seen.
To illustrate the difference between accepting a packet (returning normally) and what is required to drop one, the code below prints out the source and destination address of every 100th packet and then drops it, introducing a packet loss of 1%.
static int
mypkthook(hook_event_token_t tok, hook_data_t data, void *arg)
{
    static int counter = 0;
    mytype_t *ctx = arg;
    hook_pkt_event_t *pkt = (hook_pkt_event_t *)data;
    struct ip *ip;
    size_t bytes;

    bytes = msgdsize(pkt->hpe_mb);
    ip = (struct ip *)pkt->hpe_hdr;

    counter++;
    if (counter == 100) {
        printf("drop %lu bytes received from %x to %x\n",
            (unsigned long)bytes,
            ntohl(ip->ip_src.s_addr), ntohl(ip->ip_dst.s_addr));
        counter = 0;
        /* Free the packet and clear every reference to it. */
        freemsg(*pkt->hpe_mp);
        *pkt->hpe_mp = NULL;
        pkt->hpe_mb = NULL;
        pkt->hpe_hdr = NULL;
        return (1);    /* packet dropped */
    }
    return (0);        /* packet continues normally */
}
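The counter in mypkthook is a plain static int shared by every call. If the hook can run on several CPUs at once (an assumption about the calling context that the text does not state), increments could be lost. The user-space sketch below shows the same every-100th-packet decision using a C11 atomic counter. The name should_drop is hypothetical, and kernel code would use the kernel's own atomic primitives rather than stdatomic.h.

```c
#include <stdatomic.h>

#define DROP_INTERVAL 100u

/* Shared packet counter; static storage starts at zero. */
static atomic_uint pkt_counter;

/*
 * Returns 1 for the 100th, 200th, ... call and 0 otherwise.
 * The fetch-and-add is a single atomic step, so concurrent callers
 * each observe a distinct count. Counter wraparound is ignored here.
 */
int
should_drop(void)
{
    unsigned int seen = atomic_fetch_add(&pkt_counter, 1u) + 1u;

    return ((seen % DROP_INTERVAL) == 0);
}
```

Using the modulus of a monotonically increasing count, rather than resetting the counter to zero, avoids a second write that would itself need to be synchronised.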
Packets are delivered to this function, and to every other function called back from a packet event, one at a time. There is no chaining together of packets with this interface, so the developer should expect only one packet per call and b_next to always be NULL. It is important to remember that although there is no other packet, a single packet may be composed of several mblk_t structures chained together with b_cont.
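To make the b_cont point concrete, here is a user-space sketch of walking the blocks of a single packet and adding up their data. The struct sketch_mblk type is a simplified stand-in that keeps only the fields discussed here; it is not the real mblk_t, and sketch_msg_bytes is a hypothetical helper rather than the kernel's msgdsize.

```c
#include <stddef.h>

/* Simplified stand-in for mblk_t with only the fields discussed above. */
struct sketch_mblk {
    struct sketch_mblk *b_next;  /* next packet; expected to be NULL */
    struct sketch_mblk *b_cont;  /* next block of the same packet */
    unsigned char *b_rptr;       /* first valid byte in this block */
    unsigned char *b_wptr;       /* one past the last valid byte */
};

/* Sum the data bytes of one packet by following its b_cont chain. */
size_t
sketch_msg_bytes(const struct sketch_mblk *mp)
{
    size_t total = 0;

    for (; mp != NULL; mp = mp->b_cont)
        total += (size_t)(mp->b_wptr - mp->b_rptr);
    return (total);
}
```

A packet whose 20-byte header sits in one block and whose 100-byte payload sits in a second block linked by b_cont totals 120 bytes, even though b_next is NULL throughout. Code that only examines the first block would see just the header.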
For a fully worked example that can be compiled and loaded into the kernel, please look at full.c.
To compile this code into a working kernel module on a 64-bit system, the following should be sufficient:
gcc -D_KERNEL -m64 -c full.c
ld -dy -Nmisc/neti -Nmisc/hook -r full.o -o full