SPP Hardware Components
Overview
Figure 2. Main components of an SPP node
Figure 2 shows the main components of an SPP node. Most of the components are blades in an Advanced Telecommunications Computing Architecture (ATCA) blade server. ATCA is a standard for telecom-class blade servers, appropriate for a wide range of applications, including high capacity routers.

All input and output occurs through the Line Card (LC), an NP-based subsystem with one or more physical interfaces. The LC forwards each arriving packet to the system component configured to process it, and it queues outgoing packets for transmission, ensuring that each slice gets its appropriate share of the network interface bandwidth. The architecture can support multiple LCs, but the systems being deployed for GENI have one LC each.

The General Purpose Processing Engines (GPE) are conventional dual processor server blades running the PlanetLab OS (currently Linux 2.6 with PlanetLab-specific extensions) and hosting the vServers that serve application slices.

The Network Processing Engine (NPE) is a server blade containing two NP subsystems, each comprising an Intel IXP 2850 NP with 17 internal processor cores, 3 banks of SDRAM, 3 banks of QDR SRAM and a Ternary Content Addressable Memory (TCAM). The architecture supports multiple NPEs, but the deployed systems have a single NPE each. The NPE supports fast path processing for slices that elect to use this capability and provides up to 10 Gb/s of IO bandwidth.

The Control Processor (CP) is a separate rack-mount server that hosts the software coordinating the operation of the system as a whole. The CP also hosts a NetFPGA card with four 1 GbE interfaces, which will be made available for use by researchers.

The switching substrate includes a chassis switch and a separate external switch that provides additional 1 GbE ports. The chassis switch board actually includes two switches: a Fabric Switch with both 1 GbE and 10 GbE ports for data traffic, and a Base Switch with 1 GbE ports for control traffic.
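To make the idea of per-slice bandwidth sharing concrete, the sketch below shows a deficit round robin (DRR) scheduler draining per-slice output queues, where each slice's long-term share is set by its quantum. This is an illustrative sketch only; nothing here specifies the LC's actual scheduling discipline, and the data structures, quantum values and function names are assumptions.

```c
#include <stddef.h>
#include <stdio.h>

/* Illustrative per-slice output scheduling using deficit round robin (DRR).
 * This sketches the general idea of giving each slice a configurable share
 * of link bandwidth; it makes no claim about the SPP's actual scheduler. */

#define MAX_SLICES 64

struct pkt {
    struct pkt *next;
    size_t      len;                       /* packet length in bytes */
};

struct slice_q {
    struct pkt *head;                      /* FIFO of packets awaiting transmission */
    size_t      quantum;                   /* bytes credited per round: sets the share */
    size_t      deficit;                   /* unused credit carried within a round */
};

static struct slice_q slices[MAX_SLICES];

/* Append a packet to a slice's output queue (O(n) tail walk, fine for a demo). */
static void enqueue(int s, struct pkt *p)
{
    p->next = NULL;
    if (slices[s].head == NULL) {
        slices[s].head = p;
    } else {
        struct pkt *q = slices[s].head;
        while (q->next != NULL)
            q = q->next;
        q->next = p;
    }
}

/* One DRR round: each backlogged slice may send up to quantum + leftover
 * deficit bytes, so long-term bandwidth is proportional to its quantum. */
static void schedule_round(void (*transmit)(struct pkt *))
{
    for (int s = 0; s < MAX_SLICES; s++) {
        struct slice_q *q = &slices[s];
        if (q->head == NULL) {
            q->deficit = 0;                /* idle slices accumulate no credit */
            continue;
        }
        q->deficit += q->quantum;
        while (q->head != NULL && q->head->len <= q->deficit) {
            struct pkt *p = q->head;
            q->head = p->next;
            q->deficit -= p->len;
            transmit(p);
        }
        if (q->head == NULL)
            q->deficit = 0;                /* queue drained: drop leftover credit */
    }
}

static void transmit(struct pkt *p)
{
    printf("sent %zu-byte packet\n", p->len);
}

int main(void)
{
    struct pkt a = { .len = 1500 }, b = { .len = 64 };

    slices[0].quantum = 1500;              /* slice 0 gets a larger share ... */
    slices[1].quantum = 500;               /* ... than slice 1 */
    enqueue(0, &a);
    enqueue(1, &b);
    schedule_round(transmit);              /* one scheduling round over all slices */
    return 0;
}
```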
Figure 3. General Purpose Processing Engine (Radisys ATCA 4310)
General Purpose Processing Engine (GPE)
The GPEs (see Figure 3) are dual processor blade servers, specifically Radisys ATCA 4310 blades with 2 GHz Intel Xeon processors, 4 GB of memory and an on-board SAS disk (37 GB). They have two GbE network interfaces, one on the fabric switch and one on the base switch. The base switch interface is reserved for control traffic only and is not directly accessible to user applications running on the GPEs.
Network Processing Engine (NPE)
The Network Processing Engine is implemented using a Radisys ATCA 7010 blade, which contains two Intel IXP 2850 NP subsystems. The 7010 blade communicates with the chassis switch through a Fabric Interface Card (FIC), a small mezzanine card that allows the 7010 to be used with different types of chassis switches. In the case of the SPP, the FIC provides a 10 GbE interface to the chassis switch that passes through the ATCA backplane. The 7010 blade can be configured with an optional input/output card mounted on the rear side of the chassis; such rear-mounted cards are referred to as Rear Transition Modules (RTM). The NPE does not use the RTM, but the Line Card does.

The various components on the blade communicate through a Serial Peripheral Interface (SPI) switch supporting data rates of just over 12 Gb/s. The SPI interface transfers data in fixed-length cells of 64 bytes, so it is subject to segmentation losses when transferring variable-length packets. The two NP subsystems share a Ternary Content Addressable Memory (TCAM) which can store up to 18 Mb of data and can be configured to support word lengths ranging from 72 to 576 bits. The xScale processors share a separate network connection to the base switch, which is used for control communication.
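To get a rough sense of what the 18 Mb TCAM capacity means in practice, the sketch below computes how many lookup entries fit at several word widths. The intermediate widths (144 and 288 bits) and the use of 1024*1024 bits per Mb are assumptions for illustration, and any per-entry overhead in the real device is ignored.

```c
#include <stdio.h>

/* Back-of-the-envelope TCAM capacity: entries = total bits / word width.
 * 18 Mb is taken as 18 * 1024 * 1024 bits; real devices may reserve some
 * capacity for internal use, which is ignored here. */
int main(void)
{
    const long total_bits = 18L * 1024 * 1024;
    const int  widths[]   = { 72, 144, 288, 576 };   /* word widths in bits */

    for (size_t i = 0; i < sizeof widths / sizeof widths[0]; i++)
        printf("%3d-bit words: %6ld entries\n", widths[i], total_bits / widths[i]);
    return 0;
}
```

With these assumptions, the table runs from roughly 262,144 entries at 72 bits down to 32,768 entries at 576 bits.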
Figure 4. Radisys ATCA 7010 Network Processor Blade with 10x1 GbE IO card
Each of the IXP 2850s has an xScale management processor that runs an embedded version of Linux, plus 16 MicroEngines (MEs), which are 32-bit RISC processors optimized for packet processing. Each ME has a small program store capable of holding 8K instructions, a register file and a small data memory. There is an on-chip SRAM that can be accessed by all of the MEs, and there are multiple interfaces to off-chip memory, including four SRAM interfaces and three DRAM interfaces.
As with any modern processor, the primary challenge in achieving high performance is coping with the large processor/memory latency gap. Retrieving data from off-chip memory can take 50-100 ns (or more), meaning that in the time it takes to retrieve a piece of data from memory, a processor can potentially execute over 100 instructions. The challenge for processor designers is to ensure that the processor stays busy in spite of this.

Conventional processors cope with the memory latency gap primarily using caches. However, for caches to be effective, applications must exhibit locality of reference, and networking applications typically exhibit very limited locality of reference with respect to their data. Since caches are relatively ineffective for networking workloads, the IXP provides a different mechanism for coping with the memory latency gap: hardware multithreading.

Each ME has eight separate sets of processor registers (including the program counter), which form the ME's hardware thread contexts. An ME can switch from one context to another in 2 clock cycles, allowing it to stay busy doing useful work even when several of its hardware threads are suspended, waiting for data to be retrieved from external memory.
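A small back-of-the-envelope model shows why eight hardware threads per ME are enough to hide this latency. The sketch below assumes each thread does C cycles of useful work between memory references that each take M cycles to complete; the specific numbers (20 cycles of work, 140 cycles of memory latency, i.e. roughly 100 ns at a 1.4 GHz class clock) are assumptions chosen only to illustrate the trend, not IXP 2850 measurements.

```c
#include <stdio.h>

/* Rough model of latency hiding with hardware multithreading on an ME.
 * Each thread alternates C cycles of computation with a memory access that
 * takes M cycles.  With N threads, one thread's wait overlaps the others'
 * computation, so ME utilization is roughly min(1, N*C / (C + M)).
 * C and M below are illustrative assumptions, not measured values. */
int main(void)
{
    const double C = 20.0;     /* compute cycles between memory references */
    const double M = 140.0;    /* cycles spent waiting for off-chip memory */

    for (int n = 1; n <= 8; n++) {
        double util = n * C / (C + M);
        if (util > 1.0)
            util = 1.0;
        printf("%d thread(s): ME busy about %3.0f%% of the time\n", n, util * 100.0);
    }
    return 0;
}
```

With these assumed numbers, a single thread keeps the ME busy only about 12% of the time, while eight threads reach full utilization.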
The MEs include small FIFOs (called Next Neighbor Rings) connecting each ME to its neighboring ME, which can be used to support pipelined processing of packets. A pipelined program structure makes it easy to use the processing power of the MEs effectively, since the parallel components of the system are largely decoupled from one another. Pipelined processing also makes effective use of the limited ME program stores, since each ME need only store the instructions for its stage of the pipeline.
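The next-neighbor rings are hardware FIFOs with their own access instructions, but their role can be pictured with an ordinary single-producer/single-consumer ring buffer passing packet handles from one pipeline stage to the next. The sketch below is a software analogue only; the ring size, the 32-bit handle type and the omission of memory-ordering details are simplifying assumptions.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Software analogue of a next-neighbor FIFO between two pipeline stages:
 * a single-producer/single-consumer ring of packet handles. */

#define RING_SLOTS 128U                   /* must be a power of two */

struct nn_ring {
    uint32_t buf[RING_SLOTS];             /* packet handles passed downstream */
    uint32_t head;                        /* advanced only by the consumer stage */
    uint32_t tail;                        /* advanced only by the producer stage */
};

/* Upstream stage: push a handle; returns false if the downstream stage is behind. */
static bool ring_put(struct nn_ring *r, uint32_t handle)
{
    uint32_t next = (r->tail + 1) & (RING_SLOTS - 1);
    if (next == r->head)
        return false;                     /* ring full */
    r->buf[r->tail] = handle;
    r->tail = next;
    return true;
}

/* Downstream stage: pop the next handle; returns false if nothing is queued. */
static bool ring_get(struct nn_ring *r, uint32_t *handle)
{
    if (r->head == r->tail)
        return false;                     /* ring empty */
    *handle = r->buf[r->head];
    r->head = (r->head + 1) & (RING_SLOTS - 1);
    return true;
}

int main(void)
{
    struct nn_ring r = { .head = 0, .tail = 0 };
    uint32_t h;

    ring_put(&r, 42);                     /* stage 1 hands a packet handle on */
    if (ring_get(&r, &h))                 /* stage 2 picks it up */
        printf("stage 2 received handle %u\n", (unsigned)h);
    return 0;
}
```

Because each stage keeps only its own instructions and communicates through the ring, a slow stage simply backs up its ring rather than disturbing the stages around it.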
Line Card (LC)
The Line Card is implemented using the same Radisys 7010 blade as the NPEs. The one difference is that the Line Cards are configured with the optional IO card. There are multiple IO cards available for the 7010; the Line Cards use one with ten 1 GbE interfaces. The card supports swappable modules, allowing it to accommodate either copper or fiber connections. Packets are transferred between the IO card and the 7010 board's NP subsystems through the on-board SPI switch. This requires that Ethernet packets be segmented into 64-byte cells for transfer through the SPI switch, which can lead to segmentation losses for packets whose length is just over a multiple of the cell size.
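The segmentation loss is easy to quantify: a packet of L bytes occupies ceil(L/64) cells, so a packet one byte longer than a multiple of 64 wastes most of its last cell. The sketch below prints the resulting SPI efficiency for a few packet lengths; it considers cell padding only and assumes there is no additional per-cell header overhead.

```c
#include <stdio.h>

/* Segmentation overhead of carrying variable-length packets in fixed
 * 64-byte SPI cells: a packet of L bytes occupies ceil(L/64) cells, so
 * efficiency = L / (64 * ceil(L/64)).  Per-cell header overhead, if any,
 * is ignored here. */
int main(void)
{
    const int cell = 64;
    const int lengths[] = { 64, 65, 128, 129, 1500 };

    for (size_t i = 0; i < sizeof lengths / sizeof lengths[0]; i++) {
        int    L     = lengths[i];
        int    cells = (L + cell - 1) / cell;             /* ceil(L / 64) */
        double eff   = (double)L / (cells * cell);
        printf("%4d-byte packet -> %2d cells, %5.1f%% of SPI bandwidth used\n",
               L, cells, 100.0 * eff);
    }
    return 0;
}
```

With these numbers, a 65-byte packet uses only about half of the SPI capacity it occupies, while a 1500-byte packet is over 97% efficient.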
Figure 5. Switch blade (Radisys ATCA 2210)
Switching Substrate
The chassis switch is a Radisys ATCA 2210 card, which includes two switches: a fabric switch with 10 GbE ports (which can optionally be configured as 1 GbE ports) and a base switch with 1 GbE ports. The fabric switch has four expansion ports on the front panel that can be used to connect directly to other components. The base switch also has four front panel ports, which the SPP uses for connections to the CP and the shelf manager. The external switch is a Netgear GSM 7228 with twenty-four 1 GbE interfaces and the ability to host up to four 10 GbE interfaces. One of these 10 GbE interfaces is equipped in the SPP and connected to the chassis switch through one of the chassis switch's front panel ports.
ATCA Chassis
The SPP uses a Shroff 5U six-slot ATCA chassis (model name Zephyr) with an integrated Shelf Manager. The Shelf Manager has an on-board CPU that provides low-level maintenance access to the chassis, allowing the various blades in the chassis to be remotely controlled. This is used primarily to force a reboot of a component that is not responding as expected.
Control Processor
The Control Processor (CP) is implemented by a Dell PowerEdge 860 with 2 GB of memory and 160 GB of disk. It has three 1 GbE network connections. One is connected to the base switch (for control communication), one is connected to the fabric switch (for data communication to and from the line card) and one serves as a “back door” for remote maintenance access to the CP. The CP is also equipped with several serial interfaces, which connect to the Shelf Manager, the chassis switch blade, the external switch and each of the two GPEs. These provide backup maintenance access, in the event that standard access mechanisms fail.
NetFPGA
The NetFPGA is a PCI card that hosts a Xilinx Virtex-II Pro 50 FPGA, on-board SRAM and DRAM, and four 1 GbE interfaces. Each SPP has one NetFPGA, which is available as a resource for use by researchers. The NetFPGA is hosted by the CP, and its four network connections go to the external switch; from there, packets can be forwarded to any of the other components in the system. More details on the NetFPGA can be found at http://www.netfpga.org/.
Figure 6. NetFPGA card