Dev:onl reservations

The Open Network Lab now uses multiple 52-port gigabit switches to interconnect the hosts, routers, and other hardware (collectively referred to as components) that users can allocate to build their own topologies. Two of the ports on each switch support 12 Gbps interfaces, typically used to connect the switches into a ring topology.

Original Model

Previously, we only had enough hardware to fill one switch. Since each switch is internally non-blocking, any component could be connected to any other component without any concern over switch utilization. As such, the reservation model had no need to account for the actual topologies used in experiments. Therefore, only the number of each component type was recorded to make a reservation. For example, a reservation might consist of 4 NPRs and 10 hosts. That is all the reservation system knew about, and that is all the RLI (and web pages) kept track of.
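
For concreteness, a reservation under the original model amounted to little more than a per-type component count. The following sketch (Python) illustrates that idea; the class, field names, and inventory numbers are purely illustrative, not the actual RLI data structures.

 # Hypothetical sketch of a count-only reservation record (original model).
 from dataclasses import dataclass, field
 
 @dataclass
 class Reservation:
     start: str                                    # reservation start time
     end: str                                      # reservation end time
     counts: dict = field(default_factory=dict)    # component type -> number requested
 
 # The reservation described above: 4 NPRs and 10 hosts.
 r = Reservation(start="start time", end="end time",
                 counts={"NPR": 4, "host": 10})
 
 # Admission under the original model: just compare counts against inventory,
 # since any component can reach any other through the single non-blocking switch.
 inventory = {"NPR": 8, "host": 40}                # illustrative numbers only
 ok = all(inventory.get(kind, 0) >= n for kind, n in r.counts.items())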

New Model

Unfortunately, the very simplistic original model no longer works. The addition of so many new components has forced us to use multiple switches. The limited 12 Gbps capacity between switches means that the reservation algorithm needs to ensure that at any time no more than 12 Gbps of traffic can possibly be flowing between any two switches. Otherwise, users may see artifacts in their experiments that can't be explained within the context of that experiment. At first, the plan had been to use the original model and compute worst-case inter-switch capacities, assuming that each component in an experiment was mapped to a specific switch. This is extremely limiting, however. As an example, consider this switch setup:

Example physical topology.

There are two switches, each with 20 hosts and 4 5-port NPRs attached. If a user asked for an experiment with 8 NPRs and 20 hosts, we would have no choice but to reject the request. It could be that they would connect all of the hosts on one switch to NPRs on the same switch and never use any of the inter-switch capacity. On the other hand, they could also connect each of the 20 hosts on one switch to a host on the second switch, which would require 20 Gbps of traffic to cross the 12 Gbps link. In other words, under worst-case assumptions we could never accept experiments that were this large (or larger). This example may be a bit extreme, but it certainly illustrates that we cannot use the original model with worst-case inter-switch capacity assumptions.
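
To spell out that worst-case reasoning, here is a small sketch (Python) of the check a count-only model would be forced to make. The 1 Gbps host links, the 12 Gbps inter-switch link, and the 20 hosts per switch come from the example above; the function and variable names are assumptions for illustration.

 # Illustrative worst-case check under the original (count-only) model.
 HOST_GBPS = 1
 INTER_SWITCH_GBPS = 12
 HOSTS_PER_SWITCH = 20
 
 def could_exceed_inter_switch(hosts_requested):
     # Worst case: every host placed on switch 1 is wired straight across to
     # something on switch 2, so each one adds 1 Gbps of possible cross-switch traffic.
     hosts_on_switch_1 = min(hosts_requested, HOSTS_PER_SWITCH)
     return hosts_on_switch_1 * HOST_GBPS > INTER_SWITCH_GBPS
 
 # The 20-host request from the example: worst case is 20 Gbps > 12 Gbps,
 # so under count-only worst-case assumptions it has to be rejected.
 print(could_exceed_inter_switch(20))   # True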

The only way we could see to proceed was to change the model to account for the actual topology in the user's experiment, rather than just counts of components. Unfortunately, this has some ramifications that affect the end-user experience.
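
As a rough illustration of what a topology-aware model has to check, the sketch below (Python; the data layout, function names, and example component names are assumptions, not the actual reservation code) sums the bandwidth of every experiment link whose endpoints land on different switches and compares each total against the 12 Gbps inter-switch capacity.

 # Illustrative topology-aware admission check (not the actual reservation code).
 # An experiment is a list of links; each link names two components and a
 # bandwidth in Gbps. 'placement' maps each component to the switch it sits on.
 INTER_SWITCH_GBPS = 12
 
 def inter_switch_demand(links, placement):
     """Total Gbps that must cross each pair of switches for this topology."""
     demand = {}
     for a, b, gbps in links:
         sw_a, sw_b = placement[a], placement[b]
         if sw_a != sw_b:
             pair = tuple(sorted((sw_a, sw_b)))
             demand[pair] = demand.get(pair, 0) + gbps
     return demand
 
 def fits(links, placement):
     return all(g <= INTER_SWITCH_GBPS
                for g in inter_switch_demand(links, placement).values())
 
 # Example: twenty 1 Gbps host-to-host links that cross between the two
 # switches add up to 20 Gbps, which does not fit on the 12 Gbps link.
 placement = {}
 for i in range(20):
     placement["h%d" % i] = 1    # hosts attached to switch 1
     placement["g%d" % i] = 2    # hosts attached to switch 2
 cross_links = [("h%d" % i, "g%d" % i, 1) for i in range(20)]
 print(fits(cross_links, placement))   # False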