Programming in the NetScript Toolkit

Sushil da Silva
DCC Laboratory
Columbia University
September, 1998

Introduction

NetScript is a language and environment for implementing protocols or services. NetScript programs can be deployed into network nodes to dynamically extend the network with new protocols and services.

A NetScript network consists of a collection of network nodes (e.g., PCs, switches, routers), each of which runs one or more NetScript engines. A NetScript engine is a software abstraction of a programmable packet-processing device. Each NetScript engine consists of dataflow components, called boxes in NetScript, that process the packet streams that flow through them. Packets that flow through a NetScript node are processed by successive boxes to perform various protocol functions. Typical NetScript boxes do packet header analysis, packet demultiplexing, or other protocol functions.

NetScript boxes can be dispatched to remote network engines and dynamically connected to boxes that reside there to extend the network with new communication functions. NetScript uses interconnection to achieve protocol composition--composite packet-processing protocols are built by connecting together the typed ports of boxes. For example, an IP router implemented in NetScript could be dynamically extended with firewall functions. Or such a router might be extended to monitor traffic, support content-filtering on the edge of a network domain, or perform load balancing and traffic shaping. NetScript is useful in any application that processes packet streams.

The NetScript system consists of two components: NetScript, a textual dataflow language for composing packet-processing protocols, and the NetScript Toolkit, a set of Java classes to which the textual language compiles. This release of NetScript includes the NetScript Toolkit but not the front-end language, which will be introduced in the next version of the system. The language will consist of three integrated components -- a dataflow composition language, a presentation language for defining the format of network packets, and a classification language for building packet classifiers. Even without the language, we hope that the NetScript Toolkit will prove useful as a framework for building active protocols.

This tutorial shows how to program with the NetScript Toolkit. In what follows, I first introduce the NetScript language and its computational model. Then, I describe the underlying Toolkit. I use the language to describe the main concepts in the Toolkit. Each language construct maps to an equivalent class in the NetScript Toolkit. Therefore, an understanding of NetScript's dataflow model and constructs will help you quickly master the Java classes in the Toolkit.

Impatient readers may want to skip directly to the examples. The NetScript distribution comes with a number of example protocols, including Ethernet, ARP, IP, etc. These protocols, located in the package netscript.protocol, can be used as templates for creating new protocols.

NetScript's Dataflow Model

NetScript views the task of an active node as that of processing packet streams and allocating node resources to support this processing. NetScript uses dataflow to model the programming of packet-stream processing functions. A dataflow system consists of a collection of interconnected programs that process data streams flowing between them. These programs, called boxes in NetScript, are connected together through their input and output ports. The boxes form a reactive system in which data (in the form of packets) flows from one box to another. Arrival of data at one or more input ports of a box triggers computation within that box; otherwise the box sleeps until data arrives to trigger it.

Dataflow differs from conventional imperative programming languages such as C or Java in a number of ways. First, program or protocol composition is achieved through interconnection. One composes dataflow programs by connecting together simpler boxes. In imperative programming, the unit of composition is the function. Second, dataflow is data-driven. It is the asynchronous arrival of data messages and their flow through the dataflow graph that determines which boxes are computing and which are not. Multiple boxes can be executing concurrently. In imperative languages, the program counter determines which part of the program is active. Finally, dataflow provides a simple model of concurrency and synchronization. All data must flow through the formal interfaces of a box (i.e., its ports). There is no global or shared concurrent state between boxes. Thus there is no need for primitives for synchronizing shared state (e.g., mutex locks or semaphores).
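To make composition by interconnection concrete, here is a minimal sketch in plain Java. It does not use the NetScript Toolkit: the names DataflowSketch, Out, Upper, and Collector are hypothetical and exist only for this illustration. It also delivers messages with direct method calls on a single thread, so it shows the data-driven wiring of boxes but not the concurrent execution described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;

// Hypothetical sketch of the dataflow idea; none of these names are Toolkit classes.
class DataflowSketch {

    // An output port: fans each message out to every connected input handler.
    static final class Out<T> {
        private final List<Consumer<T>> sinks = new ArrayList<>();

        void connect(Consumer<T> inport) {
            sinks.add(inport);
        }

        void send(T msg) {
            for (Consumer<T> sink : sinks) {
                sink.accept(msg);
            }
        }
    }

    // A box that upper-cases each string it receives and passes it on.
    static final class Upper {
        final Out<String> out = new Out<>();

        void in(String msg) {
            out.send(msg.toUpperCase(Locale.ROOT));
        }
    }

    // A box that records whatever arrives on its input port.
    static final class Collector {
        final List<String> seen = new ArrayList<>();

        void in(String msg) {
            seen.add(msg);
        }
    }

    // Wire source -> upper -> collector and push two messages through.
    static List<String> run() {
        Out<String> source = new Out<>();
        Upper upper = new Upper();
        Collector collector = new Collector();

        source.connect(upper::in);         // composition is done by connecting ports
        upper.out.connect(collector::in);

        source.send("arp");                // arrival of data triggers each box in turn
        source.send("ip");
        return collector.seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Note that neither Upper nor Collector refers to the other: the only place that knows how the boxes fit together is the wiring code in run().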

NetScript's dataflow model permits dynamic composition of protocols. Boxes can be dispatched to NetScript nodes and connected to boxes that reside there in order to extend the network with various functions. For example, a running IP router written in NetScript could be extended dynamically with firewalling, protocol analysis, or traffic-shaping capabilities.

NetScript Constructs

The NetScript language provides a small set of primitives for declaring boxes, defining input ports (abbreviated inport), output ports (abbreviated outport) and a means to compose compound boxes by connecting ports of simpler boxes together.

NetScript is not a complete language for doing general-purpose computation. Rather, it is useful to view NetScript as a coordination language, one which separates the logical organization of a concurrent system (box composition, interconnections) from how sequential computation in each component is achieved. NetScript's fundamental constructs (boxes, ports, messages) are exported through a minimal set of extensions to Java.

This section is an overview of boxes, inports, outports, box composition, and messages. We will use a couple of examples to help explain the concepts.

Boxes

The box is the central construct in NetScript and the unit of program composition. A box declaration consists of four parts: the box name, inport and outport declarations, a declaration of internal boxes, and a connect statement that defines the connections between internal boxes. When a box is loaded at a NetScript engine, NetScript instantiates its internal boxes and makes the connections between them, as specified in the connect statement.

The syntax for a box declaration is:

box-declaration :: "box" name "{" (box-body-declaration)* [connect-declaration] "}"

box-body-declaration :: inport-declaration |
                        outport-declaration |
                        field-declaration |
                        method-declaration

connect-declaration :: "connect" "{" (name "->" name ";")+ "}"

Outports

An outport represents one of the outputs of a NetScript box. It can be connected to zero or more inports of other boxes. Boxes communicate by using a send operator to deliver messages on their outports. When a message is sent on an outport, it is delivered to all connected inports. An outport declaration includes a name to uniquely identify the port type and a type signature.

There are actually two kinds of outport: one-way outports and two-way outports. An outport is one-way if its declaration does not specify a return type. This means that the outport only generates data and does not expect a reply. A one-way port is asynchronous: send simply enqueues its arguments for delivery to all connected inports and returns immediately. A port need not be connected to anything. If an outport is not connected to any downstream boxes, its message stream is transparently dropped.

A two-way outport declares a return type. This means that a two-way outport expects a reply. It also means that at most one inport can be connected to a two-way outport. A two-way port is synchronous: send delivers its arguments to the receiving inport and waits until the handler for that inport has generated a reply. Two-way ports allow the destination inport to return a result to the calling outport.
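The difference between the two kinds of outport can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the Toolkit's implementation: the class names OneWayOut and TwoWayOut, the drain() method (which stands in for the engine's scheduler), and the choice to throw an exception when a two-way outport is unconnected are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch; these are not NetScript Toolkit classes.
class PortKindsSketch {

    // One-way outport: send() only enqueues and returns immediately.
    static final class OneWayOut<T> {
        private final List<Consumer<T>> inports = new ArrayList<>();
        private final Queue<T> pending = new ArrayDeque<>();

        void connect(Consumer<T> inport) {
            inports.add(inport);
        }

        void send(T msg) {
            if (inports.isEmpty()) {
                return;               // unconnected outport: message is dropped
            }
            pending.add(msg);
        }

        int pendingCount() {
            return pending.size();
        }

        // Stand-in for the engine: deliver queued messages to every connected inport.
        void drain() {
            T msg;
            while ((msg = pending.poll()) != null) {
                for (Consumer<T> inport : inports) {
                    inport.accept(msg);
                }
            }
        }
    }

    // Two-way outport: at most one inport, and send() returns the handler's reply.
    static final class TwoWayOut<T, R> {
        private Function<T, R> inport;

        void connect(Function<T, R> handler) {
            if (inport != null) {
                throw new IllegalStateException("two-way outport is already connected");
            }
            inport = handler;
        }

        R send(T msg) {
            if (inport == null) {
                // Assumption: the source text does not say what happens in this case.
                throw new IllegalStateException("two-way outport is not connected");
            }
            return inport.apply(msg); // synchronous: caller waits for the reply
        }
    }

    public static void main(String[] args) {
        TwoWayOut<String, String> routeLookup = new TwoWayOut<>();
        routeLookup.connect(dst -> "eth0");   // hypothetical routing policy box
        System.out.println(routeLookup.send("10.0.0.1"));
    }
}
```

In the sketch, the box that owns routeLookup never learns which routing policy answered the request; swapping the policy only requires a different connect call.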

Unlike a function call in imperative programming, a box outport need not know to which inports it has been connected. For example, an IP router box might include a two-way outport that looks up routes for incoming IP packets. This outport could be connected to a RIP, OSPF, or BGP routing policy manager box at various nodes, depending on the particular routing protocol being used in the network. Thus box composition provides true encapsulation and information-hiding.

The syntax for an outport declaration is:

outport-declaration :: "outport" type name "("  formal-parameters ")" ";"

formal-parameters :: [formal-parameter ("," formal-parameter)*]
formal-parameter :: type variable-declarator-id
variable-declarator-id :: ID ["[" "]"]

The type of a formal parameter is one of the Java primitive types (boolean, char, byte, short, int, long, float, double), String, a class that implements the interface netscript.kernel.msg.Copyable, or an array of one of these types.

Inports

An inport represents one of the inputs to a box. The inport declaration has three parts: a name, an argument signature, and an optional event-handler. When a message from an outport arrives at an inport, the event-handler, if present, is called to process the message. The event-handler is simply sequential Java code that executes in response to the arrival of a message at an inport. Typically, the event-handler operates on the message, possibly modifies it, and outputs it on one or more of the outports of the box. For example, the EthSplitter box in Example 2 demultiplexes a stream of incoming Ethernet packets into two outgoing streams, one for packets that encapsulate ARP and the other for packets that encapsulate IP.

The syntax for an inport declaration is:

inport-declaration :: "inport" type name "("  formal-parameters ")" (java-block | ";")

Example 1: Ethernet Printer Box

To illustrate these concepts, consider a simple example. EthPrinter is a box that filters and prints Ethernet packets. It receives a stream of Ethernet packets on its single inport (ethIn) and generates the same stream on its single outport ethOut. Each incoming Ethernet packet is passed to a box called split which splits its input stream (ethIn) into two output streams (ipOut and arpOut). split examines the proto field in each incoming packet and sends IP packets to ipOut and ARP packets to arpOut. It drops all other packets.

Example Box: Ethernet Packet Printer

The two outports of split are connected respectively to printIP and printARP, which simply print each incoming packet on the console. Finally, the ethIn inport of EthPrinter is also connected directly to its ethOut outport, thereby forwarding all incoming packets to ethOut. Thus EthPrinter could be used as a debugger box by connecting it between an Ethernet packet generator and another box that operates on Ethernet packets.

box EthPrinter
{
  inport void ethIn (EthPacket m);
  outport void ethOut (EthPacket m);

  // declarations for internal boxes
  EthSplitter split;
  ARPPrinter printARP;
  IPPrinter printIP;

  connect
  {
    ethIn -> split.ethIn;              // feed every packet to the splitter
    ethIn -> ethOut;                   // and forward it unchanged
    split.ipOut -> printIP.in1;
    split.arpOut -> printARP.in1;
  }
}

Box Composition

In NetScript, one builds protocol software by connecting simpler boxes together to form a composite box that performs some desired function. This composite box can then be used as a component in other NetScript programs or it can be deployed directly on NetScript engines in the network. In a box definition, the connect block defines the connections between internal boxes.

The connect operator (->) connects a port of one box to a port of another. NetScript uses dot notation to refer to ports within a box. Thus split.ethIn refers to the ethIn port of split.

Port connection is strongly typed. Therefore a connection is valid only if the signatures of the two ports match. The signatures of two ports match if both have the same number of formal parameters and the types of corresponding parameters also match. Even when ports are connected dynamically, port types are checked to ensure that the composition is type safe.
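The matching rule can be sketched in a few lines of plain Java. This is only an illustration: the Signature class and the connect() helper are hypothetical, and the sketch assumes that "matching" means the parameter types are identical, position by position (the text does not say whether subtypes are accepted).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a port signature check; not a NetScript Toolkit class.
class PortTypeCheckSketch {

    // A port signature is modelled as the ordered list of its parameter types.
    static final class Signature {
        private final List<Class<?>> params;

        Signature(Class<?>... params) {
            this.params = Arrays.asList(params.clone());
        }

        // Same number of parameters, and identical types in each position.
        boolean matches(Signature other) {
            return params.equals(other.params);
        }
    }

    // Refuse to connect two ports whose signatures differ.
    static void connect(Signature from, Signature to) {
        if (!from.matches(to)) {
            throw new IllegalArgumentException("port signatures do not match");
        }
        // A real engine would record the connection here.
    }

    public static void main(String[] args) {
        Signature packetOut = new Signature(byte[].class);
        Signature packetIn = new Signature(byte[].class);
        Signature twoArgIn = new Signature(byte[].class, int.class);

        System.out.println(packetOut.matches(packetIn));
        System.out.println(packetOut.matches(twoArgIn));
    }
}
```

Running the same check at connection time, rather than only at compile time, is what keeps dynamically connected boxes type safe.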

NetScript also allows connections from outport to outport, from inport to inport, and from inport to outport. However, such connections are only allowed on the 'inside' of a box to connect a port of the enclosing box to the port of an internal box. In the example above, the connection from inport ethIn of EthPrinter to inport ethIn of split is an inport to inport connection. Typically, such connections simply forward messages to the next connected port. Thus messages arriving on ethIn of EthPrinter are forwarded directly to the ethIn port of split. An internal connection (inport to inport, outport to outport, inport to outport) is legal only if the number and types of the parameters of the two ports are the same.

The NetScript compiler takes a box description such as the one above and translates it into equivalent Java source (subclasses of Box, Inport, and Outport). This translated source is passed to a Java compiler to create byte-code class files that are dynamically loaded at NetScript engines. When a box is dynamically loaded at a remote NetScript engine, compiler-generated code in the box constructor creates the internal boxes and makes the connections between them. NetScript comes with a Java class library that allows NetScript boxes to dynamically create boxes from Java, make connections between boxes, query engine and box state, and so on.

Example 2: Use of event-handlers, packet-classification and message-passing

The program shown here implements the EthSplitter box used in the previous example. It illustrates the use of an event-handler, and of send to communicate on an outport.

box EthSplitter
{
  outport void ipOut (EthPacket eth);  // outport type declaration
  outport void arpOut (EthPacket eth); // outport type declaration

  inport void ethIn (EthPacket eth)    // inport type declaration
  {
    short proto = eth.getProto();
    if (proto == 0x800)                // IP packet inside
    {
      ipout.send (eth);                // send packet on ipout outport
    }
    else if (proto == 0x806)           // ARP packet inside
    {
      arpout.send (eth);               // send packet on arpout outport
    }
    // all other packets are dropped
  }

  // port instance declarations
  public ipOut ipout;
  public arpOut arpout;
  public ethIn ethin;
}

EthSplitter takes a stream of Ethernet packets on its input port (ethIn) and generates two output streams on its outports ipOut and arpOut. The ipOut outport carries Ethernet packets that encapsulate IP; the arpOut outport carries Ethernet packets that encapsulate ARP. All other packets are dropped.

The body of a NetScript event-handler is written in Java. Any code that can appear in a Java method body can be used in a NetScript event-handler, except that the handler body must not propagate any Java exceptions.

When a message arrives at an inport that contains an event-handler, the NetScript runtime system assigns the incoming arguments to the respective parameters of the event-handler and calls the event-handler. The event-handler runs atomically to completion.

In the example above, the body of the event-handler is a simple if statement. This if statement demultiplexes the input stream based on the proto field in the incoming Ethernet frame. It uses send() to direct IP packets to the ipOut outport and ARP packets to the arpOut outport.

How Boxes Communicate: The send Operator

A box uses the send operator of an outport to communicate with other boxes. The arguments to send are passed by value. That is, for each port connected to an outport, NetScript makes a copy of the arguments passed to send and queues these arguments for delivery to each downstream port. Note that in an active network, components can potentially come from multiple vendors. NetScript's copying semantics enforce protection boundaries between boxes and ensure that independent boxes cannot share or modify global state. Message copying, however, is extremely inefficient. To overcome the inefficiencies of copying, NetScript provides two types of message, mutable and immutable. Fields of mutable messages can be repeatedly modified, whereas immutable messages cannot be changed after creation. NetScript makes a deep copy when passing a mutable message between ports. Immutable messages, however, are efficiently passed by reference when communicating boxes share the same address space. Section xxx shows how to declare mutable and immutable messages.
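The following sketch illustrates the copying rule in plain Java. It is not Toolkit code: the Message interface, its forDelivery() method, and the two message classes are hypothetical names chosen for this illustration. The point is only that a mutable message must be deep-copied at a port boundary, whereas an immutable one can be handed over as the same object.

```java
// Hypothetical sketch of by-value delivery; these are not NetScript Toolkit classes.
class CopySemanticsSketch {

    // What a port would hand to a downstream inport for a given message.
    interface Message {
        Message forDelivery();
    }

    // Mutable message: each receiver gets its own deep copy.
    static final class MutableMsg implements Message {
        private final byte[] bytes;

        MutableMsg(byte[] bytes) {
            this.bytes = bytes.clone();
        }

        byte get(int i) {
            return bytes[i];
        }

        void set(int i, byte value) {
            bytes[i] = value;
        }

        @Override
        public Message forDelivery() {
            return new MutableMsg(bytes);   // constructor clones the array
        }
    }

    // Immutable message: no setters, so sharing one reference is safe.
    static final class ImmutableMsg implements Message {
        private final byte[] bytes;

        ImmutableMsg(byte[] bytes) {
            this.bytes = bytes.clone();
        }

        byte get(int i) {
            return bytes[i];
        }

        @Override
        public Message forDelivery() {
            return this;                    // passed by reference, no copy
        }
    }

    public static void main(String[] args) {
        MutableMsg sent = new MutableMsg(new byte[] {1, 2, 3});
        MutableMsg received = (MutableMsg) sent.forDelivery();
        sent.set(0, (byte) 9);              // sender keeps writing after send
        System.out.println(received.get(0)); // receiver's copy is unaffected
    }
}
```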

Programming in the NetScript Toolkit

This section is an overview of the NetScript Toolkit, a set of Java classes that implement NetScript's central constructs: boxes, inports, outports, and messages. The front-end language described above compiles directly to the Java classes in the Toolkit. In fact, there is a one-to-one mapping from language constructs to the Java classes described here. In this section, you will learn how to use the Toolkit to create and compose NetScript boxes. These boxes can be dispatched to NetScript engines at nodes in the network, then dynamically loaded and installed there, to extend the network with new functions.

All the classes described here are part of the package netscript.kernel. The accompanying javadoc documentation provides further information on individual classes.

To create a new box, you need to subclass netscript.kernel.Box. For each box outport with a unique signature, you need to subclass netscript.kernel.Outport. Similarly, for each box inport with a unique signature, you need to subclass netscript.kernel.Inport.

To introduce a new packet type, for instance to process UDP packets, you will need to subclass netscript.lang.kernel.msg.Packet. The package netscript.protocol.* provides packet format definitions for a number of common protocols (e.g., Ethernet, ARP, IP, UDP, etc).

Example: EthPrinter

This example shows the Java implementation of the NetScript EthPrinter example in the previous section. It shows how to subclass Box to create a new kind of NetScript box. It also shows how to define a NetScript inport and outport in Java, to declare internal boxes and to connect these together. Here's the Java code:

import netscript.kernel.Box;
import netscript.kernel.Inport;
import netscript.kernel.Outport;
import netscript.protocol.ethernet.EthPacket;

class EthPrinter extends Box
{
  // inport and outport declarations
  public EthInport ethIn;
  public EthOutport ethOut;

  // declarations for internal boxes (same names as in the NetScript version)
  EthSplitter split;
  ARPPrinter printARP;
  IPPrinter printIP;

  // Note: the import for InitException is not shown in this listing.
  public void init (String args[])
    throws InitException
  {
    // connect ports of internal boxes together
    this.connectInportToInport ("ethIn", split, "ethIn");
    this.connectInportToOutport ("ethIn", this, "ethOut");
    split.connectOutportToInport ("ipOut", printIP, "in1");
    split.connectOutportToInport ("arpOut", printARP, "in1");
  }

  // inport void EthInport(EthPacket eth)
  public class EthInport extends Inport
  {
    // handleMsg() declares the signature of this inport
    public void handleMsg(EthPacket eth)
    {
      // empty body
    }

    // tell the engine not to call this handler
    public boolean hasHandler()
    {
      return false;
    }
  }

  // outport void EthOutport(EthPacket eth)
  public class EthOutport extends Outport
  {
    // send() declares the signature of this outport
    public synchronized void send(EthPacket eth)
    {
      // set up the outport argument and send the message
      setArg(0, eth);
      sendMsg();
    }
  }
}

There are several things to note about this program.

Example: Defining an Event-Handler in Java

This example shows how to define an event-handler for an inport. To do this, you simply need to define a public handleMsg() method in the appropriate Inport subclass. The example below shows how the EthSplitter box from the previous example is written in Java.

// import Box, Inport, Outport, and EthPacket classes
import netscript.kernel.Box;
import netscript.kernel.Inport;
import netscript.kernel.Outport;
import netscript.protocol.ethernet.EthPacket;

class EthSplitter extends Box
{
  // inport and outport declarations
  public EthInport  ethIn;
  public EthOutport ipOut;
  public EthOutport arpOut;
  public EthOutport otherOut;

  // no internal box declarations

  public class EthInport extends Inport
  {
  
    public void handleMsg (EthPacket eth)
    {
      short proto = eth.getProto();
      if (proto == 0x800)
        ipOut.send (eth);
      else if (proto == 0x806)
        arpOut.send (eth);
      else 
        otherOut.send(eth);
    }
  }
  
  public class EthOutport extends Outport
  {
    public synchronized void send (EthPacket eth)
    {
      setArg(0, eth);
      sendMsg();
    }  
  }
}
The important things to notice about this program are:

NetScript's Packet Presentation Language

Network packets are encoded (i.e., serialized or flattened) into a byte-stream for transmission on the physical medium. On reception at network nodes, the protocol stack converts (or parses) the incoming serial representation into equivalent language-level data structures.

Typically, the packet formats of network protocols are specified in a machine-independent presentation language. Examples of such languages include ASN.1, CORBA IDL, and XDR. Protocol designers and implementors specify packet formats with a presentation language, and then use an appropriate compiler to generate a library of functions that convert between the serial network representation of a packet and equivalent language-level data structures. For example, protocol implementations written in the C language might use the XDR language to specify packet formats; an XDR compiler would then generate equivalent C data structures (structs) and a set of functions to parse a serial byte-stream into an appropriate C structure and to flatten this structure into a serial byte-stream.

NetScript, which does not provide data structures of its own, inherits Java's central data structure, the object. Packets arriving off the physical medium must be parsed into equivalent Java objects and, vice versa, Java objects that represent packets must be flattened into an appropriate serial representation. But existing protocols (e.g., Ethernet, IP, TCP, etc) use encoding schemes that are incompatible with the native representation of Java objects. Although programmers can do this conversion manually, the task is extremely tedious and error-prone.

NetScript seeks to automate this frequent operation. The language provides a message construct which allows programmers to declare the logical structure of network packets. The message construct is compiled into an equivalent Java class, which, whenever possible, provides parse() and flatten() operations to automatically convert from and to serial representations of the packet structure. As an example, consider the following declaration of an Ethernet packet:

message big_endian Eth
{
    byte  dst[6];              // Ethernet dest address, 6 bytes long
    byte  src[6];              // Ethernet source address, 6 bytes long
    short proto;               // Encapsulation key, 2 bytes long
  body 
    byte  data[0..1492];       // encapsulated data 0 to 1492 bytes long
}

This declaration says that an Ethernet packet consists of a header of three fields (a 6-byte destination address, followed by a 6-byte source address, followed by a 2-byte protocol identifier) and a body of 0 to 1492 bytes. The big_endian keyword declares the byte order of the serial representation of Ethernet messages. The body keyword is used to identify and separate the packet header from the packet body. This allows NetScript to internally represent the packet as a tree (DAG) of byte-arrays and to efficiently perform frequent packet operations such as de-encapsulation (strip off header) and encapsulation (catenate header and body). This data structure is similar to the message structure in the x-kernel.
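To give a feel for the conversion code that such a declaration saves you from writing, here is a hand-written sketch of parse and flatten operations for the header layout above, using java.nio.ByteBuffer from the standard Java library. It is not the code the NetScript compiler generates: the class name EthCodecSketch and its fields are hypothetical, it stores the packet as one flat set of arrays rather than as a tree of byte-arrays, and it keeps proto in an int so that values above 0x7FFF stay positive.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical hand-written codec for the Eth layout; not compiler-generated code.
class EthCodecSketch {
    static final int HEADER_LEN = 14;   // 6 (dst) + 6 (src) + 2 (proto)

    final byte[] dst = new byte[6];
    final byte[] src = new byte[6];
    int proto;                          // unsigned 16-bit value held in an int
    byte[] data = new byte[0];

    // Parse a serial frame into fields, reading multi-byte values big-endian.
    static EthCodecSketch parse(byte[] frame) {
        if (frame.length < HEADER_LEN) {
            throw new IllegalArgumentException("frame shorter than Ethernet header");
        }
        ByteBuffer buf = ByteBuffer.wrap(frame).order(ByteOrder.BIG_ENDIAN);
        EthCodecSketch p = new EthCodecSketch();
        buf.get(p.dst);
        buf.get(p.src);
        p.proto = buf.getShort() & 0xFFFF;
        p.data = new byte[buf.remaining()];
        buf.get(p.data);
        return p;
    }

    // Flatten the fields back into the serial representation.
    byte[] flatten() {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_LEN + data.length)
                                   .order(ByteOrder.BIG_ENDIAN);
        buf.put(dst).put(src).putShort((short) proto).put(data);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] frame = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 0x08, 0x06, 0x41, 0x42};
        EthCodecSketch p = parse(frame);
        System.out.println(Integer.toHexString(p.proto));
    }
}
```

Even for this three-field header, the offsets, lengths, and byte order all have to be kept in step by hand in both directions, which is the tedium the message construct is meant to remove.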

The NetScript compiler converts this declaration to the following two Java classes. (For conciseness, not all method implementations are shown; full implementations are in the package netscript.protocol.ethernet.) The key points to note here are that the NetScript translator generates two classes (ImmutableEthPacket and MutableEthPacket), each of which encapsulates the packet byte-stream representation in private internal state and exports access operations on packet fields through appropriate setter and getter methods. For example, getProto() retrieves the proto field from within the packet while hiding its internal data representation; it automatically performs byte-swapping and alignment appropriate for the host machine.

The translator creates immutable and mutable versions of each message declaration. Immutable messages do not export any set operations. As such, the fields of an immutable message cannot be changed once the message is created. Since immutable messages cannot be changed, they can be passed by reference from box to box. This permits efficient (single-copy) message-passing of references but maintains protection boundaries between boxes. (Protecting boxes from each other is a major design goal of NetScript since engines can be composed of components that come from various suppliers.)

The contents of mutable messages can be changed through compiler-generated setter methods. However, mutable messages must be copied when communicated between boxes in order to maintain protection boundaries between active elements (boxes). Although many protocols (e.g., protocol analyzers) do not need to write to incoming packet streams, some protocols (e.g., NAT address translators, proxy caches and firewalls) do need to change fields in protocol headers. Mutable messages are useful in such cases. In order to convert from one format to the other, the NetScript compiler generates the methods toMutable() and toImmutable().
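The relationship between the two generated classes can be sketched as follows. This is a simplified illustration, not the compiler's output: the classes ImmutableEth and MutableEth are hypothetical stand-ins that hold only the proto field, whereas the real generated classes wrap the packet's byte-stream representation. Only the method names toMutable(), toImmutable(), and getProto() come from the text above; setProto() is assumed by analogy.

```java
// Hypothetical, single-field stand-ins for the generated packet classes.
class EthMessagePairSketch {

    // Immutable version: getters only, so it can be shared between boxes.
    static final class ImmutableEth {
        private final int proto;

        ImmutableEth(int proto) {
            this.proto = proto;
        }

        int getProto() {
            return proto;
        }

        // Conversion produces a fresh, independent mutable object.
        MutableEth toMutable() {
            return new MutableEth(proto);
        }
    }

    // Mutable version: adds setters, for boxes that rewrite header fields.
    static final class MutableEth {
        private int proto;

        MutableEth(int proto) {
            this.proto = proto;
        }

        int getProto() {
            return proto;
        }

        void setProto(int proto) {     // assumed setter name
            this.proto = proto;
        }

        // Conversion takes a snapshot; later set calls do not affect it.
        ImmutableEth toImmutable() {
            return new ImmutableEth(proto);
        }
    }

    public static void main(String[] args) {
        ImmutableEth incoming = new ImmutableEth(0x0800);
        MutableEth scratch = incoming.toMutable();
        scratch.setProto(0x0806);      // e.g. a box that rewrites the header
        ImmutableEth outgoing = scratch.toImmutable();
        System.out.println(incoming.getProto() != outgoing.getProto());
    }
}
```

A box that needs to rewrite a header would typically receive the immutable form, convert it, modify the copy, and convert back before sending, so that downstream boxes can again share the result by reference.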

The current NetScript distribution comes with packet classes for a variety of existing protocols, including Ethernet, ARP, IP, UDP, etc. The implementations of these packet classes can be found in the netscript.protocol.* package hierarchy. Should you need to create packet declarations for your own protocols, use one of the included protocols as a template.