NPR: Router Plugins
NPR plugins provide users a powerful mechanism for extending the functionality of a basic NPR. In the simplest approach, users can insert pre-written plugins that act on packets as they move through the NPR and then direct selected packets to the plugin using a filter. For example, you can send TCP packets to the delay plugin to emulate propagation delay. In its most general form, users can write their own plugins. For example, a user could replace the entire functionality of the Queue Manager by sending all packets to plugins that implement a completely different strategy for scheduling packets.
As shown to the right, packets arriving to the NPR pass through the PLC (Parse, Lookup and Classify) module, and then normally pass through the Queue Manager and Header Formatting blocks before being transmitted out to the network. However, users can use a filter to direct packets to plugins for customized packet processing. For example, plugins can be used to:
- Examine packet headers and/or bodies
- Modify packet headers and/or bodies
- Delay packets
- Produce additional packets
- Change the normal packet forwarding action
As shown in the diagram (right), there are five microengines (MEs) that can be used to execute plugin code. A plugin can be one of the predefined plugins, or it can be a user-supplied plugin. This section discusses how to use predefined plugins. The following section, Writing An NPR Plugin, discusses how to write your own plugin.
Typically (path 1), after a packet has been processed by one or more plugins, the packet can be reinjected into the normal packet path by sending it to the Queue Manager. But if a plugin is used only for monitoring and the packet is a packet copy produced by an auxiliary filter, the plugin might want to drop the packet (path 2). A third possibility is that after processing the packet, the plugin might reinject the packet (with or without modification) back into the packet path for reclassification (path 3). In this reclassification scenario, the NPR also allows the user to tag a packet with a positive integer tag which can be used during reclassification. Some of these capabilities are demonstrated by predefined plugins.
As shown in the RLI plugin dialogue window to the right, plugin users can:
- Add (Load) a plugin into one of the five plugin MEs
- Delete (Unload) a plugin from a ME
- Send a command (message) to a plugin
- Add plugin counter data to a graph
- Output debug messages from a plugin to a file
A user might send a message to a plugin to either get information from a plugin or to change a plugin parameter.
Although this section discusses how to use predefined plugins, it is instructive to see some of the things that are involved in writing your own plugin. A plugin developer writes a plugin in microengine (ME) C, which is the standard C-like language provided by Intel for use on MEs. The most important differences between microengine C and ANSI C are dictated by the IXP architecture. First, there is no dynamic memory allocation because there is no OS or other entity to manage memory. Second, all program variables and tables must be explicitly declared to reside in a particular type of memory (registers, ME local memory, scratchpad, SRAM, DRAM) because there is no caching. Third, there is no stack and hence no recursion. And finally, you can have up to eight threads per ME, but these threads must share control explicitly (i.e., there is no preemption).
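The explicit memory-placement requirement can be sketched as below. The `__declspec(...)` region qualifiers are the mechanism microengine C uses for this; the guard macro and the particular variable names here are illustrative, and the qualifiers are stubbed out so the sketch compiles on an ordinary host compiler:

```c
/* Sketch of microengine C memory placement.  On a host compiler the
   ME-specific qualifiers are stubbed away; on the real toolchain they
   pin each variable to a specific memory type. */
#include <stdint.h>
#include <assert.h>

#ifndef __MICROENGINE__           /* hypothetical guard for host builds */
#define __declspec(region)        /* stub: ignore the region qualifier  */
#endif

/* There is no heap and no caching, so placement is an explicit choice. */
__declspec(local_mem) uint32_t pkt_count;      /* fast, per-ME memory  */
__declspec(scratch)   uint32_t ring_head;      /* shared scratchpad    */
__declspec(sram)      uint32_t flow_table[64]; /* larger shared table  */

uint32_t bump_count(void) { return ++pkt_count; }
```

On the real toolchain, choosing the wrong region is a performance decision, not just a correctness one, since each memory type has a very different access latency.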
To help users who are unfamiliar with this programming environment, we have developed a programming framework that lowers the entry barrier for writing simple to moderately complex plugins. Note that users are not required to use the framework. Users who are already experts with the IXP can do whatever they wish with the five plugin MEs. The framework consists of a basic plugin structure that handles tasks common to most plugins and a plugin API that provides many functions that are useful for packet processing.
To support plugin developers, we provide a plugin API. The API consists of helper functions for common packet processing steps as well as functions that hide some of the complexity of interacting with packets and packet meta-data. Some examples are:
- Read packet headers from DRAM into local structures
- Increment local counters
- Read/write meta-data
- Prepare packets for sending to other blocks in the router
- Compute a checksum
Much of the complexity in these functions deals with reading or writing potentially unaligned memory so that the plugin developer need not worry about such things.
A plugin developer writes plugin code in the microengine C language and then compiles the code into a loadable module. In the simplest case, the developer only needs to write the packet handling part of the code; i.e., the part that gets the packet (actually the meta-packet), processes the packet, and then puts the packet into the input buffer of the next NPR component (e.g., Queue Manager). The plugin developer typically writes new plugin code by copying the code of some simple plugin (e.g., count) and then writes the code in the handle_pkt_user() function. For example, in the predefined count plugin, the handle_pkt function does the necessary I/O and then calls handle_pkt_user to do the packet processing (incrementing a counter). In most cases, the control message handling code from a simple plugin can be easily customized for most applications.
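The handle_pkt / handle_pkt_user split described above can be sketched as follows. The meta-packet type, the counter helper, and the return convention are hypothetical stand-ins for the framework's real API, which this text does not spell out:

```c
/* Sketch of the framework split: handle_pkt() does the packet I/O,
   handle_pkt_user() holds the user's processing.  All names other than
   the two handler functions are illustrative stand-ins. */
#include <stdint.h>
#include <assert.h>

typedef struct {
    uint32_t len;        /* packet length in bytes */
    uint32_t out_port;   /* output port chosen by the filter */
} meta_pkt;              /* stand-in for the real meta-packet */

static uint32_t plugin_counter0;                   /* stands in for PluginCounter 0 */
static void counter_incr(uint32_t *c) { (*c)++; }  /* stand-in for the API call */

/* The part a plugin writer edits: a count plugin just counts. */
void handle_pkt_user(meta_pkt *p) {
    (void)p;                          /* the packet itself is untouched */
    counter_incr(&plugin_counter0);
}

/* The framework part: fetch, process, hand off to the Queue Manager. */
uint32_t handle_pkt(meta_pkt *p) {
    handle_pkt_user(p);
    return p->out_port;               /* framework enqueues for this port */
}
```

The point of the split is that a new plugin usually only replaces the body of handle_pkt_user(), leaving the I/O and hand-off code untouched.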
Installing A Plugin
This page describes how to install a predefined plugin. We demonstrate the procedure using the delay plugin as the exemplar. In the example, there are two TCP flows with senders shown on the left and receivers on the right such that h5x2 sends packets to h1x2 and h4x2 sends to h2x2. All data packets are maximum length (1470 bytes). Output port 1.4 will be a 10 Mbps bottleneck which is fed by two 150,000-byte queues (queue 64 for h5x2 traffic and queue 65 for h4x2 traffic). We will delay ACK packets arriving to NPR 2 (right) by 50 msec (milliseconds). The detailed steps in a similar example are given in Examples => TCP With Delay.
Recall that the steps involved in installing a plugin are:
- Load a plugin into one of five MEs (microengines).
- Create a Filter to direct packets to the ME where the plugin is located.
We will load the delay plugin in NPR 2's plugin microengine 0. Then, we will install filters at ports 2.2 and 2.3 to direct the ACK packets to the delay plugin and then on to queue 64 at output port 2.1.
To the right are two figures that schematically show the key components that govern the two TCP flows. The first figure shows the forward path taken by the data packets as they leave the two senders. The second figure shows the reverse path taken by the returning ACK packets as they leave the two receivers.
In the forward path, we install a filter F at each of the input ports 1.2 and 1.3 to direct packets to reserved queues 64 and 65 respectively at output port 1.4 (the bottleneck). We will configure the two queues so that they get an equal share of the 10 Mbps output port capacity. Each of the two reserved queues will be configured to hold at most 150,000 bytes. When the packets from both flows reach port 2.1, the routing table R forwards them to their respective output ports (2.2 and 2.3) where they are queued in default datagram (non-reserved) queues for transmission to h1x2 and h2x2 respectively. All ports have a capacity of 1 Gbps except the bottleneck port.
In the reverse path, we install a filter F at each of the input ports 2.2 and 2.3. The filters will be configured to direct all packets (TCP, ICMP, ARP, etc.) to the delay plugin D and then on to reserved queue 64 at output port 2.1. The delay plugin has been programmed to have a default delay of 50 msec. We use default datagram queues at all output ports for the ACK packets except output port 2.1 and accept the default port capacities of 1 Gbps for all output ports.
In the explanation below, we show only the installation of the plugin and the associated filters and assume that you know how to install the other filters, configure the bottleneck port and generate the TCP traffic. We begin with the installation of the delay plugin.
Load The Pre-Defined Delay Plugin Into NPR 2, Microengine 0
There are pre-defined (standard) plugins available to the user. The code for these plugins is located in the subdirectories of the directory /users/onl/npr/plugins/. For example, the delay plugin source code is located at /users/onl/npr/plugins/delay/delay.c. Most of these plugins were written to demonstrate various capabilities of plugins. As of this writing, the pre-defined plugins are:
- null: Forward packets without modification.
- nstats: Count TCP, UDP and ICMP packets and then drop them.
- ipHdr: Check the checksum fields and forward packets to other MEs depending on whether the checksum was correct or not.
- delay: Delay packets.
- debug: Produce debug messages.
- count: Count packets using standard plugin counters.
Details of these plugins are documented in README files stored in their source code directories (e.g., /users/onl/npr/plugins/delay/README). They are also described later in this tutorial.
Other plugins that should come on-line soon demonstrate how to: modify a packet; drop a packet; drop a packet and delay undropped packets; send packet headers to an endhost; emulate packet transmission delay; and transmit packets in priority order.
To load the pre-defined (standard) delay plugin, you must:
- Select Configuration => Plugin Table for the target NPR.
- Select Operations => Configuration => Add Plugin.
- Enter the microengine number 0 into the microengine field and the full path to the delay plugin into the path field. The full path in this case is /users/onl/npr/plugins/delay/uof/delay.uof.
- Upon successful completion the plugin path and microengine number will show up in the Plugin Table.
The Plugin Management Window for NPR 2 will appear.
Recall that the source code (and binaries) for these plugins are located in subdirectories of the directory ~onl/npr/plugins. Although all of the plugins are useful as examples of plugin coding, the most useful plugins from a usage point of view are delay, nstats and count.
The Plugin Dialogue box will appear.
Note: Recall that there are five MEs (numbered 0 through 4) that can be used to run plugins. We could have installed the delay plugin into any of the five MEs since we will use only one plugin. Our choice of ME 0 was arbitrary. But when you install a filter to direct packets to this plugin, you will need to indicate the microengine number that you entered here.
Create a Filter
Now we need to install filters at input ports 2.2 and 2.3 to direct packets to the delay plugin. To do this for input port 2.2, we need to:
- RLI Window: Select NPR.2:port2 => Filter Table.
- NPR.2:port2 Tables Window: Select Edit => Add Filter.
- Add Filter Window: Modify the default filter so that all packets will be forwarded to the delay plugin:
- protocol: Select * so that packets with any protocol field (including TCP, ICMP and ARP) will be delayed.
- port/plugin selection: Change the selection to plugin only so that packets will be sent to a plugin.
- output ports: Enter 1 so that when the delay plugin forwards a packet, it will be sent to output port 1 (i.e., port 2.1).
- output plugins: Enter 0 so that matching packets will be sent to plugin ME 0.
Note: We specified earlier in the Plugin Table that the delay plugin was to be loaded into ME 0.
- qid: Enter 64 so that when a packet is sent to output port 1, it will be placed in reserved queue 64. For this example, we could have left the value at 0 which would have meant packets would be placed into one of the 64 datagram queues using a hash function. Our selection of QID 64 was arbitrary and was chosen to demonstrate the capability for selecting a specific queue.
- Select Add to add the filter.
- RLI Window: Select File => Commit to commit the change.
This opens up the Filter Table at port 2.2.
This opens up the Add Filter Window which allows you to modify the default filter fields.
Note: We accepted the default values for about half of the fields. The top two lines indicate that we don't care about the IP address fields or port fields; i.e., this filter will match any values in those fields. We accept the default priority of 50, making this filter higher priority than the default route table priority of 60 (a lower number means higher priority).
Test The Delay Plugin
Now we test that packets sent from h5x2 to h1x2 and from h4x2 to h2x2 will be delayed by 50 msec by sending some ping packets from the senders to the receivers.
The figure (right) shows that we logged into h5x2 and sent five ping packets to h1x2. The RTT of the first packet was 226 msec, much larger than the expected 50 msec. The extra time occurred because an ARP request-reply sequence had to complete before the first packet could be forwarded to its destination. But the RTT of the remaining four packets is 50.1 msec, very close to the desired delay considering that the RTT would be in the tenths of milliseconds without the delay plugin.
The next two figures show the bandwidth and queue length charts when we send TCP traffic using iperf. The second flow from h4x2 starts about 4 seconds after the one from h5x2. The "1.2 rx" and "1.3 rx" curves are the bandwidths seen at input ports 1.2 and 1.3, and the "2.2 tx" and "2.3 tx" curves are the bandwidths seen at output ports 2.2 and 2.3 respectively.
Both flows start by doubling the send window every RTT as they go through their slow-start phase. This phase ends when several packets are dropped. Then, fast retransmission/fast recovery begins and perhaps a timeout eventually occurs. Finally, another slow-start period starts but with an ssthresh value that limits the send window to something near the available transmission capacity. When the second flow contends for the 10 Mbps bottleneck capacity, each flow gets about 5 Mbps (an equal share of the transmission capacity). When the first flow finishes, the second flow gets the entire capacity of the bottleneck link.
Note that since our monitoring period is 1 second, the charts won't show the fine details of the slow-start phase. But a rough calculation will show that the slow-start period is about right. Since the senders are sending maximum-sized packets (1470 bytes + 40 bytes of headers), the packets are about 12,000 bits. That means that the input rate in the first RTT is about (12,000 bits/50 msec) = 0.24 Mbps. This rate doubles every RTT so that in round k, the effective transmission rate of the sender will be about 0.24 x 2^k Mbps. Since the bottleneck link is 10 Mbps, we seek the smallest k such that:
0.24 x 2^k Mbps >= 10 Mbps
So, we expect that it will take just under 6 RTTs or about 300 msec to experience the first packet drop. Since 300 msec is much less than our monitoring period of 1 second, we shouldn't be surprised that the bandwidth chart doesn't show that the first flow didn't reach 10 Mbps during its first slow-start phase. But note that it does reach that rate in its second slow-start phase and remains there until the second flow contends for the bottleneck capacity.
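The estimate above can be checked mechanically; this small function doubles the rate each round and returns the first round at or above the bottleneck rate (the exact crossing is at k ≈ 5.4, hence "just under 6 RTTs"):

```c
/* Back-of-envelope slow-start estimate: starting at start_mbps and
   doubling every RTT, how many rounds until the rate reaches the
   bottleneck capacity? */
#include <assert.h>

int slow_start_rounds(double start_mbps, double bottleneck_mbps) {
    int k = 0;
    double rate = start_mbps;       /* rate in round k is start * 2^k */
    while (rate < bottleneck_mbps) {
        rate *= 2.0;
        k++;
    }
    return k;                       /* first k with start * 2^k >= bottleneck */
}
```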
The queue length charts monitor the length of queues 64 and 65 at output port 1.4. Both queues were configured to have a capacity (threshold) of 150,000 bytes. We see the classic saw-tooth pattern of the queue lengths during the congestion avoidance phase, where the sender windows increase by one packet every RTT. When the senders detect a packet drop, they cut their windows in half and begin the bandwidth probe again. If we had also monitored packet drops at port 1.4 as described in Monitoring Queue Length and Packet Drops, we would have seen drops correlated with the queues reaching their capacities of 150,000 bytes.
What if we want a delay different than 50 msec? The next section describes how to configure the delay plugin for a different delay value and how to get counts such as the maximum number of packets queued in the delay queue.
Sending A Message To A Plugin
Most useful plugins respond to control messages. Some control messages provide feedback to the user (GET) while others allow the user to set plugin variable values (SET). For example, most plugins have a counter that indicates the number of packets seen by the plugin. The plugin developer usually provides one control message for retrieving that counter value and another for resetting the counter. As an example, the delay plugin has four visible counters: npkts (number of packets seen), maxinq (maximum number of packets in the delay queue), ndrops (number of packet drops from queue overflow), and ninq (number of packets in the delay queue). But some plugins allow the user to change the behavior of the plugin by setting control variables. For example, the delay plugin normally delays packets by 50 msec, but a user can change that delay through the control message interface.
We have already seen how to install a plugin through a plugin control window. This same window can be used to send messages to a specific plugin through the Edit => Send Command to Plugin menu item. This section describes how messages get to and from a plugin and illustrates how to send messages to a plugin.
Plugin Message Architecture
This section describes the architecture of the plugin message system by following a control message as it makes its way from the RLI to the Xscale processor and then to the plugin, which sends a reply message back to the RLI.
The figure (right) shows the =counts control message as it is created by the user in the Send Command to Plugin window. The message entered in the message window is sent unedited to the target plugin. You should realize the following about the message you enter:
- You cannot enter more than 28 characters (including spaces). This is usually not an issue since most commands are much shorter than 28 characters. But since spaces are sent as entered, do not assume that you can pad the message with an arbitrary number of spaces.
- A plugin receiving a message expects the message to be a sequence of white-space separated words. Typically, a plugin will convert numeric words to their internal form. For example, the delay plugin message "delay= 25" sets the delay to 25 msec. The word "25" is passed in ASCII form to the plugin, where it is converted to a 32-bit integer.
- There is no standard input message format except that all plugins to date assume that the first word is a command or operation name (e.g., =counts) and that 0 or more words corresponding to command arguments follow the command. The number of arguments is plugin dependent.
NPR Router Plugins => Predefined Plugins documents the command formats for predefined (or standard) plugins. By convention, plugin writers are expected to supply a README file for each plugin that documents the control messages. The README files for predefined plugins are located in the standard plugin source code subdirectories located at ~onl/npr/plugins/ (e.g., ~onl/npr/plugins/delay/README for the delay plugin).
Shown in the figure (right) is the path (shown in red) taken by a request message from the RLI to plugin 0 through the Xscale and the returning reply message, which takes the reverse path. The RLI transmits the plugin message to a control daemon running on the Xscale processor of the target NPR. The control daemon forms an internal message consisting of a 4-byte (32-bit) header followed by the 28-byte user message and puts the message into the SRAM message queue of the target plugin (plugin0 in this example). The figure also shows that these message queues are 512 words (one word is 32 bits). A simple calculation shows that a message queue can hold at most 64 of these 8-word messages; i.e., these message queues are small when compared with the meta-packet queues, which are 64K words each.
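The internal message layout described above (a one-word header followed by the 28-byte user message, 32 bytes in all) can be sketched as a C struct. Only the sizes come from the text; the name of the header field is an assumption:

```c
/* Sketch of the internal plugin control message: a 4-byte header
   followed by the 28-byte user message, 8 words (32 bytes) total.
   Field names are illustrative. */
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

#define PLUGIN_MSG_BODY_BYTES 28

typedef struct {
    uint32_t header;                      /* one-word header added by the daemon */
    char     body[PLUGIN_MSG_BODY_BYTES]; /* user message, e.g. "=counts"        */
} plugin_msg;

/* Size of one message in 32-bit words. */
size_t msg_words(void) { return sizeof(plugin_msg) / 4; }
```

The reply travels in the same 8-word format, with the body holding an ASCII C-string as described below.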
Meanwhile one thread of each plugin is typically blocked in its handle_msg routine waiting for a control message to arrive to its message queue. When the message handling thread is given control of the ME, it wakes up and processes the control message and sends back a reply. The outgoing reply message has the same format as the incoming request message (one word header followed by a 7-word message body). By convention, the body of the reply message is an ASCII C-string. This means that if a plugin wants to return one or more integer values, it converts the value from internal representation to a string before putting the reply message in its reply queue.
A plugin inserts the reply message into its reply message queue. Eventually, a control daemon running on the Xscale processor of the plugin's NPR reads the reply message and sends it to the RLI where it is displayed in the Command Log window.
Delay Plugin Example
Each plugin understands a specific set of commands. We use the delay plugin as an example and illustrate how to get counter values, reset counters, and set the delay to 25 msec. A more detailed example is given in the example page NPR Tutorial => Examples => TCP With Delay and specifically in these two subsections: Install a delay plugin and Get counts from the delay plugin.
Here is a list of the most useful delay plugin commands that can be entered in an RLI plugin message window:
| Type | Command | Description |
|------|---------|-------------|
| SET | delay= X | Set the delay to X msec (Note SPACE character after '=') |
| SET | reset | Reset all counter (npkts, maxinq, ndrops, ninq) values |
| GET | =counts | Get counts (npkts, maxinq, ndrops) |
| GET | =delay | Get the current delay setting (msec) |
| GET | =ninq | Get the number of packets in the delay queue |
Our intent here is not to describe the delay plugin in detail but to illustrate the concept of a plugin control message. More details about the delay plugin in particular and predefined plugins in general are described later in NPR Tutorial => Router Plugins => Predefined Plugins and summarized in NPR Tutorial => Summary Information => Predefined Plugins.
CAVEAT: At this point, you might think that there is a standard format for control messages because the table above seems to indicate that GET operations begin with '=' and SET operations end with '='. As of this writing, there is no such standard or convention applied to all plugins. Only the delay plugin uses this convention.
In the example below, we assume that the delay plugin has been installed in microengine (ME) 0 of NPR 2. Furthermore, the screen shots are for the case where TCP packets go through a path with the following characteristics:
- 10 Mbps bottleneck
- 50 msec delay
The figures below illustrate how to read counter values and then how to reset them. We then show how to set the delay to 25 msec and how the maxinq counter value is affected by halving the delay.
- Read the main delay plugin counters
Select Edit => Send Command to Plugin
- Send Command Window:
Enter 0 in the microengine box; Enter =counts in the message box; and Select Enter.
- npkts: Total number of packets seen by the plugin
- maxinq: The maximum number of packets in the delay queue
- ndrops: The number of packets dropped by the plugin
Plugin Command Log Window: A message shows that the three counter values are npkts = 82008, maxinq = 43, and ndrops = 0.
A Send Command window appears allowing you to enter a plugin command.
The =counts command reads the three main packet counters:
Note: It is interesting that the value of maxinq is 43; i.e., at some point the plugin had 43 packets in its queue. A simple calculation will show that the bandwidth-delay product (BDP) for our example is about 42 packets (= 10 Mbps x 50 msec / 12 Kbits/pkt). This is no coincidence, since the delay queue represents the packet pipeline associated with the 50 msec of delay.
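The BDP arithmetic in the note is simple enough to express directly:

```c
/* Bandwidth-delay product in packets, rounded to the nearest packet:
   rate (bits/sec) * delay (sec) / packet size (bits). */
#include <assert.h>

int bdp_packets(double rate_bps, double delay_sec, double bits_per_pkt) {
    return (int)(rate_bps * delay_sec / bits_per_pkt + 0.5);
}
```

With the example's numbers (10 Mbps, 50 msec, ~12,000-bit packets), this gives the 42-packet pipeline that the maxinq counter approximated.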
Suppose we want to repeat the TCP experiment but with a 25 msec delay instead of 50 msec. Before doing so, we would like to reset the delay plugin counters and then verify that the counters have been zeroed.
We follow the same procedure for the =counts command above but instead send the reset command first.
Send Command Window:
Enter 0 in the microengine box; Enter reset in the message box; and Select Enter.
Send Command Window:
Enter =counts in the message box; and Select Enter.
The figure shows the output from these two commands with the last one showing that the three main counter values have been zeroed.
Note: The convention followed by the delay plugin for command names is that commands that read plugin variables begin with the equals character (e.g., =delay, =counts), while those that write plugin variables end with the equals character (e.g., delay=).
- Send Command Window:
Enter 0 in the microengine box; Enter delay= 25 in the message box; and Select Enter.
- If we start another TCP flow and then read the counters, we should see the effect of reducing the delay by one-half on the maxinq counter.
- Send Command Window:
Enter 0 in the microengine box; Enter =counts in the message box; and Select Enter.
Watch Out!!! There must be at least one space after the equal sign in the delay= command.
The figure showing the Plugin Command Log Window shows in the last two lines that the plugin delay was set to 25 msec. Furthermore, it shows that the value of maxinq is 23 which is a little more than one-half of its previous value of 43 when the delay was 50 msec.
Predefined Plugins
We introduced the delay plugin earlier. This section introduces you to the other predefined plugins. The predefined plugins have been written by the ONL staff with one of two purposes in mind:
- To provide useful functionality; or
- To demonstrate some programming feature
Thus, some plugins are more useful to the plugin writer while others are more useful to the plugin user. We call the former type of plugin a demo plugin (or demonstration plugin) and the latter a user plugin. For the most part, demo plugins are not useful by themselves except as coding examples. However, user plugins often contain code that demonstrates some programming feature and can also be useful to the plugin writer. All of the predefined plugins can be used as a base for developing more extended plugins.
The source code and documentation for predefined plugins can be found in the subdirectories of the directory ~onl/npr/plugins/. For example, the source code for the delay plugin is in the directory ~onl/npr/plugins/delay/. You can get the most up-to-date information on predefined plugins by looking at the README files in each of the plugin directories (e.g., ~onl/npr/plugins/delay/README describes the delay plugin). This section gives an overview of the predefined plugins. The page NPR Tutorial => Writing A Plugin => More Useful Code provides some more details of their inner workings.
The following table summarizes the functionality and/or purpose of the predefined plugins. We list the user plugins first followed by the demo plugins.
| Plugin | Type | Description |
|--------|------|-------------|
| delay | User | Delay packets X msec. Described in detail in this tutorial. |
| drop-delay | User | Drop and delay packets. |
| emulink | User | Emulates the transmission delay of a link with capacity R. |
| nstats | User | Counts ICMP, TCP and UDP packets and then drops all packets. |
| priq | User | Schedules packets based on three priority levels: high, medium and low. |
| delay++ | User | Delay packets X msec (plugin chaining). |
| erd++ | User | Early Random Drop (plugin chaining). |
| shaper++ | User | Traffic shaper (plugin chaining). |
| mycount | Demo | Described in detail in this tutorial. |
| damage | Demo | Damage UDP packets selected deterministically or probabilistically. |
| debug | Demo | Code shows how to display debug messages and handle control messages. Counts packets using an internal variable instead of a PluginCounter. |
| count | Demo | Counts packets using PluginCounter 0 and forwards packets. |
| drop | Demo | Drops selected packets or drops every kth packet. |
| ipHdr | Demo | Checks IP checksum, TCP/UDP checksum, and IP header length fields. Forwards TCP/UDP packets to other plugins depending on checksum results. Rotates the payload bytes of UDP packets. |
| null | Demo | Forwards packets. Serves as a starting point for building other plugins. |
| stringSub | Demo | Replaces all occurrences of the string "hello" with the string "adieu" in the packet payload. |
Because the set of predefined plugins continues to grow, these tutorial pages only touch on the most basic ones. Consult the README files in the subdirectories of ~onl/npr/plugins/ for the most up-to-date documentation.
Basic User Plugins
We describe the user plugins in more detail below. But we leave the complete details to the tutorial pages referenced at the beginning of this page and the README files in the source code directories.
The delay plugin delays packets for a number of milliseconds. Because numerous examples of its use have been given earlier, we will not describe it further here.
The drop-delay plugin is an extension of the delay plugin. The plugin first determines if it should drop a packet. Then, any surviving packet is delayed by a fixed number of milliseconds. The drop selection can be deterministic or probabilistic. If deterministic, the user specifies a stride value k which causes every kth packet to be dropped. If probabilistic, the user specifies a percentage P which causes P percent of packets to be randomly dropped.
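The two drop-selection modes just described can be sketched as below. The function names and state layout are illustrative, not the plugin's real internals:

```c
/* Sketch of drop-delay's two selection modes. */
#include <stdint.h>
#include <stdlib.h>
#include <assert.h>

/* Deterministic: with stride k, drop every kth packet.
   *seen counts packets processed so far. */
int drop_deterministic(uint32_t *seen, uint32_t k) {
    return (++(*seen) % k) == 0;
}

/* Probabilistic: drop roughly `percent` percent of packets. */
int drop_probabilistic(uint32_t percent) {
    return (uint32_t)(rand() % 100) < percent;
}
```

A surviving packet would then be handed to the same delay logic the basic delay plugin uses.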
The emulink plugin emulates the delay experienced by a link with capacity R. This is different from setting the port rate in the Queue Table because the port rate setting configures a token bucket regulator. The regulator will allow an initial burst of two maximum-sized packets at the maximum rate (1 Gbps). But the plugin will always delay the first packet by L/R, where L is the length of the packet; i.e., the packet is delayed by its theoretical transmission delay. If the next packet arrives during the transmission period of the first packet, it will be queued until the first packet's emulated transmission delay has finished. In the long run, a token bucket with average rate R will have the same long-term average output rate as the emulated link with capacity R.
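The departure rule emulink applies can be written in one line: a packet of length L starts "transmitting" at the later of its arrival time and the previous packet's departure, and departs L/R later. This sketch (names are illustrative) captures that rule:

```c
/* Emulated link departure time: a packet begins service at the later of
   its arrival and the previous departure, and occupies the link for
   len_bits / rate_bps seconds. */
#include <assert.h>

double emulink_depart(double arrival, double prev_depart,
                      double len_bits, double rate_bps) {
    double start = (arrival > prev_depart) ? arrival : prev_depart;
    return start + len_bits / rate_bps;
}
```

For example, at 10 Mbps a 12,000-bit packet occupies the emulated link for 1.2 msec; a second packet arriving during that interval is queued behind it.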
The nstats plugin counts the number of ICMP, TCP, and UDP packets, storing the counts in PluginCounters 0, 1, and 2 respectively. If you want the packets to reach their destination, use an auxiliary filter to direct packet copies to the plugin so that the original packets can make it to their destination. PluginCounters can be easily monitored through the Monitoring => PluginCounter menu item that each NPR has. The next page gives an example of the nstats plugin in use, and the tutorial page NPR Tutorial => Examples => Using An Auxiliary Filter gives a detailed example of its use.
The priq plugin implements priority queueing for three traffic priorities: high, medium and low. The priority is based on the QID selected in the filter that directs the meta-packet to the plugin. All packets are placed into queue 64 of the chosen output port in priority order. To do this, it immediately passes all high-priority meta-packets to the Queue Manager, but queues medium- and low-priority packets until queue 64 is empty.
Plugin Chaining Plugins
Plugin chaining is a meta-packet forwarding paradigm that follows these rules:
- If the plugin is loaded into plugin ME 4, it will forward all meta-packets to the Queue Manager.
- If the plugin is loaded into a plugin ME k where k is not 4, it will forward all meta-packets to plugin ME k+1.
By convention, plugins with names ending in ++ (e.g., shaper++, delay++, erd++) follow the plugin chaining paradigm. An example of a plugin chain might be the plugin sequence shaper++, delay++, erd++ in which traffic is first shaped, then delayed, and finally dropped or forwarded to the Queue Manager depending on the length of its destination queue.
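The two chaining rules above reduce to a single next-hop function; the sentinel used here for "Queue Manager" is illustrative:

```c
/* Plugin chaining rule: ME 4 hands meta-packets to the Queue Manager;
   every other ME k hands them to ME k+1. */
#include <assert.h>

#define TARGET_QM (-1)   /* illustrative sentinel meaning "Queue Manager" */

int chain_next(int me) {
    return (me == 4) ? TARGET_QM : me + 1;
}
```

So a chain loaded as shaper++ (ME 2), delay++ (ME 3), erd++ (ME 4) processes each packet in that order before the Queue Manager sees it.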
The delay++ plugin is just the basic delay plugin that has been changed to conform to the plugin chaining rules.
The erd++ plugin implements the Early Random Drop (ERD) queue management algorithm: it drops packets with a fixed probability if the length of the queue being managed exceeds a threshold. ERD was intended to punish unresponsive, aggressive flows. The idea was that if a large backlog developed, we should drop packets with a fixed probability. The hope was that flows consuming more than their fair share would see their packets dropped. But ERD didn't work well in practice. Since ERD was first developed, better queue management algorithms have been developed such as RED (Random Early Detection), BLUE, Stochastic Fair BLUE, and Queue State DRR.
The shaper++ plugin implements traffic shaping using a token bucket; that is, the output of a traffic shaper with burst size B and rate R has a long-term average rate of R, with an initial burst of up to B bytes after a sufficiently long idle period. Thus, it maintains a constant output rate of R when sufficiently backlogged but allows a burst of at most B bytes if it has been idle long enough. Equivalently, it ensures that the following bound is maintained while a queue is backlogged:
Number of bits transmitted in time t <= R*t + 8*B
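A token bucket enforcing this bound can be sketched as follows. This is an illustrative model, not the shaper++ code; the rate and burst values in the example are arbitrary:

```python
# Illustrative token-bucket sketch of the shaper++ behavior: tokens
# accrue at rate R (here in bytes/sec) up to a bucket of B bytes, and a
# packet may be sent only when enough tokens are available. This
# enforces: bits transmitted in time t <= R*t + 8*B.
class TokenBucket:
    def __init__(self, rate_bytes_per_sec, burst_bytes):
        self.rate = rate_bytes_per_sec
        self.burst = burst_bytes
        self.tokens = burst_bytes      # full bucket after a long idle period
        self.last_time = 0.0

    def conforms(self, pkt_len, now):
        # add tokens for the elapsed time, capped at the burst size B
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last_time) * self.rate)
        self.last_time = now
        if self.tokens >= pkt_len:
            self.tokens -= pkt_len     # packet may be sent now
            return True
        return False                   # shaper would queue it until tokens accrue

tb = TokenBucket(rate_bytes_per_sec=125_000, burst_bytes=3000)  # 1 Mbps, 3 KB burst
print(tb.conforms(1500, now=0.0))   # -> True  (initial burst)
print(tb.conforms(1500, now=0.0))   # -> True  (burst not yet exhausted)
print(tb.conforms(1500, now=0.0))   # -> False (must wait for tokens)
```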
More details of the plugins that support chaining can be found on the page NPR Tutorial => Writing A Plugin => More Useful Code.
Monitoring With A Plugin
Suppose that you want to chart the value of a plugin variable. The RLI makes it easy to do this through its monitoring menus. For example, the nstats plugin keeps track of the number of ICMP, TCP and UDP packets by recording these values in special registers called PluginCounters. The RLI makes it easy to chart these PluginCounter values through the PluginTable => Operations => Monitoring => PluginCounter menu item available in each NPR. This section begins with a description of PluginCounters. Then, it shows how to chart the values stored in PluginCounters, using the ICMP, TCP and UDP counters maintained by the nstats plugin as an example.
The nstats plugin counts the number of ICMP, TCP and UDP packets that it receives. These three counts are stored in the first three register counters available in each plugin ME for storing general counts (four are available). These register counter values are labeled as PluginCounters in the RLI and are visible to the user through the PluginTable => Operations => Monitoring => PluginCounter menu item available with each NPR. To the RLI user, the three counters of interest are known as PluginCounter 0, PluginCounter 1, and PluginCounter 2.
Although Figure 25 shows the nstats plugin loaded into ME 0, it can be loaded into any of the five plugin MEs.
The general concept of register counters was briefly described in the tutorial page NPR Tutorial => Packet Processing. The register counters available to the plugin MEs are listed in NPR Tutorial => Summary Information => Counters as registers 38-57. The 20 counters are evenly divided among the five plugin MEs. (Aside: A plugin programmer refers to these counters through names that have the form ONL_ROUTER_PLUGIN_x_CNTR_y where x denotes the ME number and y denotes the counter number (i.e., x is in the range 0:4, and y is in the range 0:3).)
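Since registers 38-57 hold the 20 plugin counters, four per ME, the mapping from ONL_ROUTER_PLUGIN_x_CNTR_y to a register number is simple arithmetic. The sketch below assumes the counters are assigned consecutively, ME by ME; the exact assignment order is an assumption, not something stated in the tutorial:

```python
# Illustrative arithmetic (the consecutive assignment order is an
# assumption): registers 38-57 hold the 20 plugin counters, four per ME,
# so ONL_ROUTER_PLUGIN_x_CNTR_y would map to register 38 + 4*x + y.
def plugin_counter_register(me, counter):
    assert 0 <= me <= 4 and 0 <= counter <= 3
    return 38 + 4 * me + counter

print(plugin_counter_register(0, 0))   # -> 38 (first plugin counter)
print(plugin_counter_register(4, 3))   # -> 57 (last plugin counter)
```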
We will use the dumbbell network shown on the right. The hosts h6x2, h5x2 and h4x2 on the left send ICMP, UDP and TCP traffic respectively to their counterparts on the right (i.e., h6x2 sends ICMP packets using ping to h3x2; h5x2 sends UDP packets using iperf to h1x2; and h4x2 sends TCP packets using iperf to h2x2). All forward traffic goes through the 3.415 Mbps bottleneck at port 1.4 and shares a reserved queue that has a capacity of 300,000 bytes. To direct the traffic to this shared queue, there is a filter at each of the input ports attached to the senders.
Furthermore, we stagger the starting times of the three senders so that the ICMP traffic from h6x2 starts first. After four seconds, the UDP traffic from h5x2 starts, and then after another four seconds, the TCP traffic from h4x2 starts.
The nstats plugin is installed in microengine 0 of NPR 2 (the NPR on the right), and auxiliary filters are installed at input port 2.0 to direct copies of the incoming traffic from NPR 1 to the plugin. Specifically, one auxiliary (sampling) filter sends a 12.5% sample of TCP packets to the plugin, and another auxiliary filter sends all ICMP and UDP packets to the plugin. This second filter is actually configured to send all packets to the plugin but with a priority lower than the first filter, which matches only TCP packets.
We would like to plot the values of the three nstats packet counters. Suppose that we have already created an empty nstats counts chart using the Monitoring => Add Monitoring Display menu item in the main RLI window. Below is an example of how to plot the number of ICMP packets sent to the nstats plugin:
- nstats counts Window: Select Parameter => Add Parameter (not shown).
- RLI Window: Select NPR.2 => Monitoring => PluginCounter. An Add Parameter dialogue window appears.
- Add Parameter Window: Fill in the fields and select Enter.
  - microengine: 0 (we loaded the nstats plugin into ME 0 earlier)
  - counter: 0 (the ICMP packet count is maintained in PluginCounter 0 by the plugin; this field should be 1 for the TCP plot and 2 for the UDP plot)
  - secs: 1
  - Check the rate box.
- Repeat all of the above steps for the TCP counter and the UDP counter; i.e., PluginCounters 1 and 2.
- Change the plot labels from the default PluginCounter to ICMP, TCP and UDP. The chart label will appear in the nstats counts window.
The result is shown to the right.
The two figures below show the result of sending the traffic described earlier. We started the traffic by running the following script from the onlusr host:
1 source /users/onl/.topology                                 # get defs of external interfaces
2 ssh $h6x2 ping -c 120 -s 1400 -i 0.2 h3x2 > /dev/null &     # icmp
3 sleep 4
4 ssh $h5x2 /usr/local/bin/iperf -c h1x2 -u -w 1m -t 20 &     # udp
5 sleep 4
6 ssh $h4x2 /usr/local/bin/iperf -c h2x2 -w 16m -t 0.01 &     # tcp
The ping command on line 2 sends 120 1400-byte packets at an interval of 0.2 seconds (i.e., five packets per second for 24 seconds). The iperf command on line 4 sends UDP traffic at 1 Mbps for 20 seconds, leaving about 2.415 Mbps of capacity for the TCP traffic. The iperf command on line 6 sends TCP traffic for only 0.01 seconds. We will see later that the actual duration of the TCP traffic will be much longer than 0.01 seconds because of retransmissions.
The chart on the left is the chart we got from running the traffic script. The chart on the right is the result from zooming in on the chart on the left (View => Zoom In From Selection).
The chart on the right shows the four-second stagger of the starting times of the three flows and that the ICMP traffic lasts for 24 seconds as expected. The chart on the left shows that about 1,705 UDP packets were sent in about 20 seconds and that the nstats plugin saw about 350 TCP packets (recall that only a 12.5% sample of the TCP packets is directed to the plugin). Furthermore, the slopes of the ICMP and UDP plots are nearly linear, indicating that the traffic was sent at a near-constant rate.
The iperf UDP window (not shown) shows that 2.39 MB of UDP traffic was received and that 81 packets out of 1,785 packets were lost. A rough calculation shows that the UDP packet count makes sense. Since the packet length is 1,470 bytes, the 2.39 MB received is 2.39 x 1024 x 1024 / 1470 bytes/pkt, or about 1,705 packets. Also, if we send maximum-sized packets (about 1,500 bytes) at 1 Mbps for 20 seconds, the number of packets sent will be roughly 1 Mbps x 20 sec / 12,000 bits/pkt, or about 1,700 packets.
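The rough UDP arithmetic above can be checked directly:

```python
# Check the rough UDP packet-count arithmetic from the text.
received_bytes = 2.39 * 1024 * 1024     # 2.39 MB reported by iperf
pkt_len = 1470                          # bytes per UDP packet
pkts_from_bytes = received_bytes / pkt_len
print(round(pkts_from_bytes))           # -> 1705

# Alternatively: 1 Mbps for 20 s at ~12,000 bits per max-sized packet
pkts_from_rate = 1_000_000 * 20 / 12_000
print(round(pkts_from_rate))            # -> 1667, i.e., about 1,700
```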
The TCP packet count is not as straightforward to verify. The iperf TCP window (not shown) and the bandwidth chart of the sending traffic rates give us some idea whether the TCP packet count makes sense.
The iperf TCP window showed that 3.99 MB of TCP traffic was received at an average rate of 2.32 Mbps. This 2.32 Mbps is close to the 2.415 Mbps of remaining output port capacity. But the 3.99 MB received by the server might seem strange since we sent TCP traffic for only 0.01 seconds. The bandwidth chart suggests a partial explanation.
The chart on the left shows the bandwidths of the traffic coming from the senders (i.e., Monitoring => RXBYTE values), and the chart on the right shows a close up of the region when traffic began.
The TCP sending rate corresponds to the 1.3 rx curve; i.e., it is the traffic from n1p3. It appears that during the slow-start phase, the sending rate reaches about 5 Mbps even though the bottleneck is 3.415 Mbps. Then, the sending rate is throttled twice before it settles in at around 2.4 Mbps (i.e., the residual capacity of the bottleneck). It appears that about 2.4 Mbps of traffic is sent for about 15 seconds, or about 3,000 packets. The 3.99 MB received as maximum-sized packets is about 2,846 packets. Since we sampled only 12.5% of these packets, we would expect about 356 TCP packets at the nstats plugin, which is what we saw earlier. Only an examination of tcpdump output would reveal that the sender did indeed send out 2,846 packets in the first 0.01 seconds and then spent the remaining 15 seconds retransmitting some of these packets.
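The TCP arithmetic can be checked the same way as the UDP count:

```python
# Check the rough TCP packet-count arithmetic from the text.
received_bytes = 3.99 * 1024 * 1024      # 3.99 MB reported by iperf
pkt_len = 1470                           # ~maximum-sized packet payload
tcp_pkts = received_bytes / pkt_len
print(round(tcp_pkts))                   # -> 2846

sampled = tcp_pkts * 0.125               # the 12.5% sampling filter
print(round(sampled))                    # -> 356, matching the chart
```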