Every device family in the Xilinx UltraScale+ portfolio (Virtex UltraScale+ FPGAs, Kintex UltraScale+ FPGAs, and Zynq UltraScale+ MPSoCs) has members with 28Gbps-capable GTY transceivers. That’s likely to be important to you as the number and forms of small, 28Gbps interconnect grow. You have many choices for such interconnect these days, including:
The following 5.5-minute video demonstrates all of these interfaces operating with 25.78Gbps lanes on Xilinx VCU118 and KCU116 Eval Kits, as concisely explained (as usual) by Xilinx’s “Transceiver Marketing Guy” Martin Gilpatric. Martin also discusses some of the design challenges associated with these high-speed interfaces.
But first, as a teaser, I could not resist showing you the wide-open IBERT eye on the 25.78Gbps Samtec FireFly AOC:
Now that’s a desirable eye.
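A quick aside on where the odd-looking 25.78Gbps number comes from. The sketch below assumes the lanes carry 25GbE-style traffic with 64b/66b line coding, which is the usual reason a nominal 25Gbps lane runs at 25.78125Gbps on the wire. The function name is made up for illustration.

```python
from fractions import Fraction

# 64b/66b line coding sends 66 bits on the wire for every 64 payload bits.
# Assumption: the demo lanes use this coding, as standard 25GbE lanes do.
ENCODING_OVERHEAD = Fraction(66, 64)


def serial_lane_rate_gbps(payload_gbps: int) -> float:
    """Return the on-the-wire lane rate for a given payload rate in Gbps."""
    return float(payload_gbps * ENCODING_OVERHEAD)


lane_rate = serial_lane_rate_gbps(25)
print(f"25G payload lane -> {lane_rate} Gbps on the wire")
print(f"Four such lanes  -> {4 * lane_rate} Gbps aggregate")
```

Four of these lanes side by side give the familiar 100GbE lane arrangement.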
Here’s the new video:
Netcope Technologies’ NFB-100G2Q NIC broke industry records for 100GbE performance earlier this year by achieving 148.8M packets/sec throughput on DPDK (the Data Plane Development Kit) for 64-byte packets—which is 100GbE’s theoretical maximum. That’s good news if you’re deploying NFV applications. Going faster is the name of the game, after all. That performance—tracking the theoretical maximum as defined by line rate and packet size—continues as the frame size gets larger. You can see that from this performance graph from the Netcope Web site:
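The theoretical maximum that the NIC tracks can be reproduced with simple arithmetic. Here is a minimal sketch, assuming standard Ethernet framing in which every frame also occupies 8 bytes of preamble/start-of-frame delimiter and a 12-byte inter-frame gap on the wire. The function and constant names are illustrative only.

```python
LINE_RATE_BPS = 100_000_000_000   # 100GbE
PREAMBLE_SFD_BYTES = 8            # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12         # minimum idle time between frames


def max_packets_per_second(frame_bytes: int,
                           line_rate_bps: int = LINE_RATE_BPS) -> float:
    """Upper bound on frames per second for a given frame size."""
    wire_bytes = frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES
    return line_rate_bps / (wire_bytes * 8)


for size in (64, 128, 256, 512, 1024, 1518):
    mpps = max_packets_per_second(size) / 1e6
    print(f"{size:5d}-byte frames: {mpps:7.2f} Mpackets/sec")
```

For 64-byte frames this works out to roughly 148.8M packets/sec, matching the figure quoted above, and the bound falls as the frame size grows.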
It’s possible to go even faster; you just need a faster line rate. That’s what the just-announced Netcope NFB-200G2QL 200G Programmable Smart NIC is for: sending packets to your application at 200Gbps over two 100GbE connections. The Netcope NFB-100G2Q NIC is based on a Xilinx Virtex-7 580T FPGA. The NFB-200G2QL Smart NIC (with NACA/NASA-style air scoop) is based on a Xilinx Virtex UltraScale+ FPGA.
The Netcope NFB-200G2QL 200G Programmable Smart NIC is based on a Xilinx Virtex UltraScale+ FPGA
For more information about Netcope’s DPDK performance, see the company’s White Paper titled “Improving DPDK Performance.”
For more information about the Netcope NFB-100G2 NIC in Xcell Daily, see “Brief demo at SC15 by NetCOPE shows the company’s 100G Ethernet boards in action.”
Earlier this year, the University of New Hampshire’s InterOperability Laboratory (UNH-IOL) held a 25G and 50G Plugfest, and everybody came to the party to test the compatibility of their implementations with each other. The long list of partiers included:
“The 25 Gigabit Ethernet Consortium is an open organization to all third parties who wish to participate as members to enable the transmission of Ethernet frames at 25 or 50 Gigabit per second (Gbps) and to promote the standardization and improvement of the interfaces for applicable products.”
From the Consortium’s press release about the plugfest:
“The testing demonstrated a high degree of multi-vendor interoperability and specification conformance.”
For its part, Xilinx tested its 10/25G High-Speed Ethernet LogiCORE IP and 40/50G High-Speed Ethernet LogiCORE Subsystem IP using the Xilinx VCU108 Eval Kit based on a Virtex UltraScale XCVU095-2FFVA2104E FPGA over copper using different cable lengths. Consortium rules do not permit me to tell you which companies interoperated with each other, but I can say that Xilinx tested against every company on the above list. I’m told that the Xilinx 25G/50G receiver “did well.”
Xilinx Virtex UltraScale VCU108 Eval Kit
To paraphrase Douglas Adams’ Hitchhiker’s Guide to the Galaxy: “400GE is fast. Really fast. You just won't believe how vastly, hugely, mind-bogglingly fast it is.”
Xilinx, Microsoft/Azure Networking, and Juniper held a 400GE panel at OFC 2017 that explored the realities of the 400GE ecosystems, deployment models and why the time for 400GE has arrived. The half-hour video below is from OFC 2017. Xilinx’s Mark Gustlin discusses the history of Ethernet from 10Mbps in the 1980s to today’s 400GE, including an explanation of lower-speed variants and why they exist. He also provides technical explanations for why the 400GE IEEE technical specs look the way they do and what 400GE optical modules will look like as they evolve. Microsoft/Azure Networking’s Brad Booth describes what he expects Azure’s multi-campus, data-center networking architecture to look like in 2019 and how he expects 400GE to fit into that architecture. Finally, Juniper’s David Ofelt discusses how the 400GE development model has flipped: the hyperscale developers and system vendors are now driving the evolution and the carriers are following their lead. He also touches on the technical issues that have held up 400GE development and what happens when we max out on optical module density (we’re almost there).
For more information about 400GE in Xcell Daily, see:
Xilinx announced the addition of the P4_16 network programming language for SDN applications to its SDNet Development Environment for high-speed (1Gbps to 100Gbps) packet processing back in May. (See “The P4 has landed: SDNet 2017.1 gets P4-to-FPGA compilation capability for 100Gbps data-plane packet processing.”) An OFC 2017 panel session in March, presented by Xilinx, Barefoot Networks, Netcope Technologies, and MoSys, discussed the adoption of P4, the emergent high-level language for packet processing, and early implementations of P4 for FPGA and ASIC targets. Here’s a half-hour video of that panel discussion.
Metamako decided that it needed more than one Xilinx UltraScale FPGA to deliver the low latency and high performance it wanted from its newest networking platform. The resulting design is a 1RU or 2RU box that houses one, two, or three Kintex UltraScale or Virtex UltraScale+ FPGAs, connected by “near-zero” latency links. The small armada of FPGAs means that the platform can run multiple networking applications in parallel—very quickly. This new networking platform allows Metamako to expand far beyond its traditional market—financial transaction networking—into other realms such as medical imaging, SDR (software-defined radio), industrial control, and telecom. The FPGAs are certainly capable of implementing tasks in all of these applications with extremely high performance.
Metamako’s Triple-FPGA Networking Platform
The Metamako platform offers an extensive range of standard networking features including data fan-out, scalable broadcast, connection monitoring, patching, tapping, time-stamping, and a deterministic port-to-FPGA latency of just 3nsec. Metamako also provides a developer’s kit for the platform, with features that include:
This latest networking platform from Metamako demonstrates a key attribute of Xilinx All Programmable technology: the ability to fully differentiate a product by exploiting the any-to-any connectivity and high-speed processing capabilities of Xilinx silicon using Xilinx’s development tools. No other chip technology could provide Metamako with a comparable mix of extreme connectivity, speed, and design flexibility.
When someone asks where Xilinx All Programmable devices are used, I find it a hard question to answer because there’s such a very wide range of applications—as demonstrated by the thousands of Xcell Daily blog posts I’ve written over the past several years.
Now, there’s a 5-minute “Powered by Xilinx” video with clips from several companies using Xilinx devices for applications including:
That’s a huge range covered in just five minutes.
Here’s the video:
Light Reading’s International Group Editor Ray Le Maistre recently interviewed David Levi, CEO of Ethernity Networks, who discusses the company’s FPGA-based All Programmable ACE-NIC, a Network Interface Controller with 40Gbps throughput. The carrier-grade ACE-NIC accelerates vEPC (virtual Evolved Packet Core, a framework for virtualizing the functions required to converge voice and data on 4G LTE networks) and vCPE (virtual Customer Premises Equipment, a way to deliver routing, firewall security and virtual private network connectivity services using software rather than dedicated hardware) applications by 50x, dramatically reducing end-to-end latency associated with NFV platforms. Ethernity’s ACE-NIC is based on a Xilinx Kintex-7 FPGA.
“The world is crazy about our solution—it’s amazing,” says Levi in the Light Reading video interview.
Ethernity Networks All Programmable ACE-NIC
Because Ethernity implements its NIC IP in a Kintex-7 FPGA, it was natural for Le Maistre to ask Levi when his company would migrate to an ASIC. Levi’s answer surprised him:
“We offer a game changer... We invested in technology—which is covered by patents—that consumes 80% less logic than competitors. So essentially, a solution that you may want to deliver without our patents will cost five times more on FPGA… With this kind of solution, we succeed over the years in competing with off-the-shelf components… with the all-programmable NIC, operators enjoy the full programmability and flexibility at an affordable price, which is comparable to a rigid, non-programmable ASIC solution.”
In other words, Ethernity plans to stay with All Programmable devices for its products. In fact, Ethernity Networks announced last year that it had successfully synthesized its carrier-grade switch/router IP for the Xilinx Zynq UltraScale+ MPSoC and that the throughput performance increases to 60Gbps per IP core with the 16nm device, and to 120Gbps with two instances of that core. “We are going to use this solution for novel SDN/NFV market products, including embedded SR-IOV (single-root input/output virtualization), and for high density port solutions,” said Levi.
Towards the end of the video interview, Levi looks even further into the future when he discusses Amazon Web Services’ (AWS’) recent support of FPGA acceleration. (That’s the Amazon EC2 F1 compute instance based on Xilinx Virtex UltraScale+ FPGAs rolled out earlier this year.) Because it’s already based on Xilinx All Programmable devices, Ethernity’s networking IP runs on the Amazon EC2 F1 instance. “It’s an amazing opportunity for the company [Ethernity],” said Levi. (Try doing that in an ASIC.)
Here’s the Light Reading video interview:
When discussed in Xcell Daily two years ago, Exablaze’s 48-port ExaLINK Fusion Ultra Low Latency Switch and Application Platform with the company’s FastMUX option was performing fast Ethernet port aggregation on as many as 15 Ethernet ports with blazingly fast 100nsec latency. (See “World’s fastest Layer 2 Ethernet switch achieves 110nsec switching using 20nm Xilinx UltraScale FPGAs.”) With its new FastMUX upgrade, also available free to existing customers with a current support contract as a field-installable firmware upgrade, Exablaze has now cut that number in half, to an industry-leading 49nsec (actually, between 48.79nsec and 58.79nsec). The FastMUX option aggregates 15 server connections into a single upstream port. All 48 ExaLINK Fusion ports, including the FastMUX ports, are cross-point enabled so that they can support layer 1 features such as tapping for logging, patching for failover, and packet counters and signal quality statistics for monitoring.
The ExaLINK Fusion platform is based on a Xilinx 20nm UltraScale FPGA, which gave Exablaze the ability first to create the fast switching and fast aggregation hardware and massive 48-port connectivity and then to improve the product’s design by taking advantage of the FPGA’s reprogrammability, which simply requires a firmware upgrade that can be performed in the field.
Perhaps you think DPDK (Data Plane Development Kit) is a high-speed data-movement standard that’s strictly for networking applications. Perhaps you think DPDK is an Intel-specific specification. Perhaps you think DPDK is restricted to the world of host CPUs and ASICs. Perhaps you’ve never heard of DPDK—given its history, that’s certainly possible. If any of those statements is correct, keep reading this post.
Originally, DPDK was a set of data-plane libraries and NIC (network interface controller) drivers developed by Intel for fast packet processing on Intel x86 microprocessors. That is the DPDK origin story. Last April, DPDK became a Linux Foundation Project. It lives at DPDK.org and is now processor agnostic.
DPDK consists of several main libraries that you can use to:
So far, DPDK certainly sounds like a networking-specific development kit but, as Atomic Rules’ CTO Shep Siegel says, “If you can make your data-movement problem look like a packet-movement problem,” then DPDK might be a helpful shortcut in your development process.
Siegel knows more than a bit about DPDK because his company has just released Arkville, a DPDK-aware FPGA/GPP data-mover IP block and DPDK PMD (Poll Mode Driver) that allow Linux DPDK applications to offload server cycles to FPGA gates in tandem with the Linux Foundation’s 17.05 release of the open-source DPDK libraries. Atomic Rules’ Arkville release is compatible with Xilinx Vivado 2017.1 (the latest version of the Vivado Design Suite), which was released in April. Currently, Atomic Rules provides two sample designs:
(Atomic Rules’ example designs for Arkville were compiled with Vivado 2017.1 as well.)
These examples are data movers; Arkville is a packet conduit. This conduit presents a DPDK interface on the CPU side and AXI interfaces on the FPGA side. There’s a convenient spot in the Arkville conduit where you can add your own hardware for processing those packets. That’s where the CPU offloading magic happens.
Atomic Rules’ Arkville IP works well with all Xilinx UltraScale devices but it works especially well with Xilinx UltraScale+ All Programmable devices that provide two integrated PCIe Gen3 x16 controllers. (That includes devices in the Kintex UltraScale+ and Virtex UltraScale+ FPGA families and the Zynq UltraScale+ MPSoC device families.)
Two controllers matter because, as BittWare’s VP of Network Products Craig Lund says, “100G Ethernet is hard. It’s not clear that you can use PCIe to get [that bit rate] into a server [using one PCIe Gen3 x16 interface]. From the PCIe specs, it looks like it should be easy, but it isn’t.” If you are handling minimum-size packets, says Lund, there are lots of them: more than 148 million per second at 100GbE line rate. If you’re handling big packets, then you need a lot of bandwidth. Either use case presents a throughput challenge to a single PCIe Root Complex. In practice, you really need two.
BittWare has implemented products using the Atomic Rules Arkville IP, based on its XUPP3R PCIe card, which incorporates a Xilinx Virtex UltraScale+ VU13P FPGA. One of the many unique features of this BittWare board is that it has two PCIe Gen3 x16 ports: one available on an edge connector and the other available on an optional serial expansion port. This second PCIe Gen3 x16 port can be connected to a second PCIe slot for added bandwidth.
However, even that’s not enough, says Lund. You don’t just need two PCIe Gen3 x16 slots; you need two PCIe Gen3 Root Complexes, and that means you need a 2-socket motherboard with two physical CPUs to handle the traffic. Here’s a simplified block diagram that illustrates Lund’s point:
BittWare’s XUPP3R PCIe Card has two PCIe Gen3 x16 ports: one on an edge connector and the other on an optional serial expansion port for added bandwidth
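To see why one PCIe Gen3 x16 link looks adequate on paper yet leaves little headroom for 100GbE, here is a back-of-the-envelope sketch. The 8GT/s signaling rate and 128b/130b coding are standard PCIe Gen3 parameters. The per-packet overhead (24 bytes of framing, header, and CRC per 256-byte payload) is an illustrative assumption, not a measured figure, and the function names are invented for this example.

```python
GEN3_GT_PER_SEC = 8.0        # PCIe Gen3 signaling rate per lane, in GT/s
GEN3_ENCODING = 128 / 130    # 128b/130b line coding efficiency


def pcie_gen3_gbps(lanes: int) -> float:
    """Raw per-direction bandwidth of a Gen3 link after line coding."""
    return GEN3_GT_PER_SEC * lanes * GEN3_ENCODING


def usable_gbps(link_gbps: float,
                payload_bytes: int = 256,    # assumed max payload per TLP
                overhead_bytes: int = 24) -> float:  # assumed per-TLP overhead
    """Rough payload bandwidth once per-packet PCIe overhead is removed."""
    return link_gbps * payload_bytes / (payload_bytes + overhead_bytes)


one_port = pcie_gen3_gbps(16)
print(f"One Gen3 x16 port, raw:    {one_port:.1f} Gbps")
print(f"One Gen3 x16 port, usable: {usable_gbps(one_port):.1f} Gbps (assumed overhead)")
print(f"Two Gen3 x16 ports, raw:   {2 * one_port:.1f} Gbps")
```

Under these assumptions a single port has only modest margin over a 100Gbps stream before descriptor traffic, flow control, and host memory behavior are counted, which is consistent with Lund’s argument for a second port.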
BittWare has used its XUPP3R PCIe card and the Arkville IP to develop two additional products:
Note: For more information about Atomic Rules’ IP and BittWare’s XUPP3R PCIe card, see “BittWare’s UltraScale+ XUPP3R board and Atomic Rules IP run Intel’s DPDK over PCIe Gen3 x16 @ 150Gbps.”
Arkville is a product offered by Atomic Rules. The XUPP3R PCIe card is a product offered by BittWare. Please contact these vendors directly for more information about these products.
DFC Design’s Xenie FPGA module product family pairs a Xilinx Kintex-7 FPGA (a 70T or a 160T) with a Marvell Alaska X 88X3310P 10GBASE-T PHY on a small board. The module breaks out six of the Kintex-7 FPGA’s 12.5Gbps GTX transceivers and three full FPGA I/O banks (for a total of 150 single-ended I/O or up to 72 differential pairs) with configurable I/O voltage to two high-speed, high-pin-count, board-to-board connectors. A companion Xenie BB Carrier board accepts the Xenie FPGA board and breaks out the high-speed GTX transceivers into a 10GBASE-T RJ45 connector, an SFP+ optical cage, and four SDI connectors (two inputs and two outputs).
Here’s a block diagram and photo of the Xenie FPGA module:
Xenie FPGA module based on a Xilinx Kintex-7 FPGA
And here’s a photo of the Xenie BB Carrier board that accepts the Xenie FPGA module:
Xenie BB Carrier board
These are open-source designs.
Here’s a block diagram of the Ethernet example:
Please contact DFC Design directly for more information.
After telegraphing its intent for more than a year, Xilinx has now added the P4_16 language to its SDNet Development Environment for high-speed (1Gbps to 100Gbps) packet processing. SDNet release 2017.1 includes a generally accessible, front-end P4-to-SDNet translator. P4_16 is the latest version of the P4 language and the SDNet workflow compiles packet-processing descriptions into data-plane switching algorithms instantiated in high-speed Xilinx FPGAs. Xilinx debuted the new SDNet release at this week’s P4 Developer Day and P4 Workshop held at Stanford U. in Palo Alto, CA. (There was a beta version of the translator in the prior SDNet 2016.4 release.)
There’s information about the new Xilinx P4-to-SDNet translator in the latest version of the SDNet Packet Processor User Guide (UG1012) and the P4-SDNet Translator User Guide (UG1252). If you’re up on recent developments with the P4_16 language, you might want to jump to these user guides directly. Otherwise, you might want to take a look at this Linley Group White Paper titled “Xilinx SDNet: A New Way to Specify Network Hardware”, written by Senior Analyst Loring Wirbel, or watch this short video first:
And if you have a couple of hours to devote to learning a lot more about the P4 language, try this video from the P4 Language Consortium, which includes presentations from Vladimir Gurevich from Barefoot Networks, Ben Pfaff from VMware, Johann Tonsing from Netronome, and Gordon Brebner from Xilinx:
The 1-minute video appearing below shows two 56Gbps, PAM-4 demos from the recent OFC 2017 conference. The first demo shows a CEI-56G-MR (medium-reach, 50cm, chip-to-chip and low-loss backplane) connection in which a Xilinx 56Gbps PAM-4 test chip communicates through a QSFP module over a cable to a Credo device. A second PAM-4 demo using CEI-56G-LR (long-reach, 100cm, backplane-style) interconnect shows a Xilinx 56Gbps PAM-4 test chip communicating over a Molex backplane to a Credo device, which is then communicating with a GlobalFoundries device over an FCI backplane, which is then communicating over a TE backplane back to the Xilinx device. This second demo illustrates the growing, multi-company ecosystem supporting PAM-4.
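If PAM-4 is new to you, the core idea fits in a few lines: each symbol takes one of four amplitude levels and therefore carries two bits, so a 56Gbps lane signals at half that symbol rate. The sketch below assumes a Gray-coded bit-pair-to-level mapping, which is common practice for PAM-4 links. The mapping used by the demo silicon is not stated in the source, and the function names are illustrative.

```python
# Gray-coded mapping from bit pairs to the four PAM-4 levels (assumed):
# adjacent levels differ in exactly one bit, so a one-level slicing
# error corrupts only a single bit.
GRAY_PAM4 = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}


def pam4_encode(bits: list[int]) -> list[int]:
    """Map an even-length bit sequence onto PAM-4 levels 0..3."""
    if len(bits) % 2:
        raise ValueError("PAM-4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def symbol_rate_gbaud(bit_rate_gbps: float, bits_per_symbol: int = 2) -> float:
    """Symbol rate needed to carry a given bit rate."""
    return bit_rate_gbps / bits_per_symbol


print(pam4_encode([0, 0, 0, 1, 1, 1, 1, 0]))   # [0, 1, 2, 3]
print(symbol_rate_gbaud(56))                   # 28.0
```

Four levels also mean three stacked eye openings between them instead of the single eye of a two-level signal.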
For more information about the Xilinx PAM-4 test chip, see “3 Eyes are Better than One for 56Gbps PAM4 Communications: Xilinx silicon goes 56Gbps for future Ethernet,” and “Got 90 seconds to see a 56Gbps demo with an instant 2x upgrade from 28G to 56G backplane? Good!”
Looking for a relatively painless overview of the current state of the art for high-speed Ethernet used in data centers and for telecom? You should take a look at this just-posted, 30-minute video of a panel discussion at OFC2017 titled “400GE from Hype to Reality.” The panel members included:
Gustlin starts by discussing the history of 400GbE’s development, starting with a study group organized in 2013. Today, the 400GbE spec is at draft 3.1 and the plan is to produce a final standard by December 2017.
Booth answers a very simple question in his talk: “Yes, we will” use 400GbE in the data center. He then proceeds to give a fairly detailed description of the data centers and networking used to create Microsoft’s Azure cloud-computing platform.
Ofelt describes the genesis of the 400GbE standard. Prior to 400G, says Ofelt, system vendors worked with end users (primarily telecom companies) to develop faster Ethernet standards. Once a standard appeared, there would be a deployment ramp. Although 400GbE development started that way, the people building hyperscale data centers sort of took over and they want to deploy 400GbE at scale, ASAP.
Don’t be fooled by the title of this panel. There’s plenty of discussion about 25GbE through 100GbE and 200GbE as well, so if you need a quick update on high-speed Ethernet’s status, this 30-minute video is for you.
Samtec recorded a demo of its FireFly FQSFP twinax cable assembly carrying four 28Gbps lanes from a Xilinx Virtex UltraScale+ VU9P FPGA on a VCU118 eval board to a QSFP optical cage at the recent OFC 2017 conference in Los Angeles. (The Virtex UltraScale+ VU9P FPGA has 120 GTY transceivers capable of 32.75Gbps operation and the VCU118 eval kit includes the Samtec FireFly daughtercard with cable assembly.) Samtec’s FQSFP assembly plugs mid-board into a FireFly connector on the VCU118 board. The 28Gbps signals then “fly over” the board through to the QSFP cage and loop back over the same path, where they are received back into the FPGA. The demonstration shows 28Gbps performance on all four links with zero bit errors.
As explained in the video, the advantage to using the Samtec FireFly flyover system is that it takes the high-speed 28Gbps signals out of the pcb-design equation, making the pcb easier to design and less expensive to manufacture. Significant savings in pcb manufacturing cost can result for large board designs, which no longer need to deal with signal-integrity issues and controlled-impedance traces for such high-speed routes.
Samtec has now posted the 2-minute video from OFC 2017 on YouTube and here it is:
Note: Martin Rowe recently published a related technical article about the Samtec FireFly system titled "High-speed signals jump over PCB traces" on the EDN.com Web site.
Here’s a 90-second video showing a 56Gbps Xilinx test chip with a 56Gbps PAM4 SerDes transceiver operating with plenty of SI margin and an error rate better than 10^-12 over a backplane originally designed for 28Gbps operation.
Note: This working demo employs a Xilinx test chip. The 56Gbps PAM4 SerDes is not yet incorporated into a product. Not yet.
For more information about this test chip, see “3 Eyes are Better than One for 56Gbps PAM4 Communications: Xilinx silicon goes 56Gbps for future Ethernet.”
Today, Xilinx posted information about the new $2995 Kintex UltraScale+ KCU116 Eval Kit on Xilinx.com. If you’re looking to get into the UltraScale+ FPGAs’ GTY transceiver races—to 32.75Gbps—this is a great kit to start with. The kit includes:
Here’s a nice shot of the KCU116 board from the kit’s quickstart guide:
Kintex UltraScale+ KCU116 Eval Board
Among the key features of this board are the four SFP+ optical cages there on the left. Those handle 25Gbps optical modules, driven of course by four of the KU5P FPGA’s GTY transceivers.
Take a look.
InnoRoute has just started shipping its TrustNode extensible, ultra-low-latency (2.5μsec) IPv6 OpenFlow SDN router as a pcb-level product. The design combines a 1.9GHz, quad-core Intel Atom processor running Linux with a Xilinx FPGA to implement the actual ultra-low-latency router hardware. (You’re not implementing that as a Linux app running on an Atom processor!) The TrustNode Router reference design features twelve GbE ports. Here’s a photo of the TrustNode SDN Router board:
InnoRoute TrustNode SDN Router Board with 12 GbE ports
Based on the pcb layout in the photo, it appears to me that the Xilinx FPGA implementing the 12-port SDN router is under that little black heatsink in the center of the board nearest to all of the Ethernet ports while the quad-core processor running Linux must be sitting there in the back under that great big silver heatsink with an auxiliary cooling fan, near the processor-associated USB ports and SDcard carrier.
InnoRoute’s TrustNode Web page is slightly oblique as to which Xilinx FPGA is used in this design but the description sort of winnows the field. First, the description says that you can customize InnoRoute’s TrustNode router design using the Xilinx Vivado HL Design Suite WebPACK Edition—which you can download at no cost—so we know that the FPGA must be a 28nm series 7 device or newer. Next, the description says that the design uses 134.6k LUTs, 269.2k flip-flops, and 12.8Mbits of BRAM. Finally, we see that the FPGA must be able to handle twelve Gigabit Ethernet ports.
The Xilinx FPGA that best fits this description is an Artix-7 A200.
You can use this TrustNode board to jump into the white-box SDN router business immediately, or at least as fast as you can mill and drill an enclosure and screen your name on the front. In fact, InnoRoute has kindly created a nice-looking rendering of a suggested enclosure design for you:
InnoRoute TrustNode SDN Router (rendering)
The router’s implementation as IP in an FPGA along with the InnoRoute documentation and the Vivado tools mean that you can enhance the router’s design and add your special sauce to break out of the white box. (White Box Plus? White Box Premium? White Box Platinum? Hey, I’m from marketing and I’m here to help.)
This design enhancement and differentiation are what Xilinx All Programmable devices are especially good at delivering. You are not stuck with some ASSP designer’s concept of what your customers need. You can decide. You can differentiate. And you will find that many customers are willing to pay for that differentiation.
Note: Please contact InnoRoute directly for more information on the TrustNode SDN Router.
Next week at OFC 2017 in Los Angeles, Acacia Communications, Optelian, Precise-ITC, Spirent, and Xilinx will present the industry’s first interoperability demo supporting 200/400GbE connectivity over standardized OTN and DWDM. Putting that succinctly, the demo is all about packing more bits/λ, so that you can continue to use existing fiber instead of laying more.
Callite-C4 400GE/OTN Transponder IP from Precise-ITC instantiated in a Xilinx Virtex UltraScale+ VU9P FPGA will map native 200/400GbE traffic—generated by test equipment from Spirent—into 2x100 and 4x100 OTU4-encapsulated signals. The 200GbE and 400GbE standards are still in flux, so instantiating the Precise-ITC transponder IP in an FPGA allows the design to quickly evolve with the standards with no BOM or board changes. Concise translation: faster time to market with much less risk.
Callite-C4 400GE/OTN Transponder IP Block Diagram
Optelian’s TMX-2200 200G muxponder, scheduled for release later this year, will muxpond the OTU4 signals into 1x200Gbps or 2x200Gbps DP-16QAM using Acacia Communications’ CFP2-DCO coherent pluggable transceiver.
The Optelian and Precise-ITC exhibit booths at OFC 2017 are 4139 and 4141 respectively.
Next week at the OFC Optical Networking and Communication Conference & Exhibition in Los Angeles, Xilinx will be in the Ethernet Alliance booth demonstrating the industry’s first, standard-based, multi-vendor 400GE network. A 400GE MAC and PCS instantiated in a Xilinx Virtex UltraScale+ VU9P FPGA will be driving a Finisar 400GE CFP8 optical module, which in turn will communicate with a Spirent 400G test module over a fiber connection.
In addition, Xilinx will be demonstrating:
If you’re visiting OFC, be sure to stop by the Xilinx booth (#1809).
Berten DSP’s GigaX API for the Xilinx Zynq SoC creates a high-speed, 200Mbps full-duplex communications channel between a GbE port and the Zynq SoC’s PL (programmable logic) through an attached SDRAM buffer and an AXI DMA controller IP block. Here’s a diagram to clear up what’s happening:
The software API implements IP filtering and manages TCP/UDP headers, which help you implement a variety of hardware-accelerated Ethernet systems including Ethernet bridges, programmable network nodes, and network offload appliances. Here’s a performance curve illustrating the kind of throughput you can expect:
Please contact Berten DSP directly for more information about the GigaX API.
As the BittWare video below explains, CPUs are simply not able to process 100GbE packet traffic without hardware acceleration. BittWare’s new StreamSleuth, to be formally unveiled at next week’s RSA Conference in San Francisco (Booth S312), adroitly handles blazingly fast packet streams thanks to a hardware assist from an FPGA. And as the subhead in the title slide of the video presentation says, StreamSleuth lets you program its FPGA-based packet-processing engine “without the hassle of FPGA programming.”
(Translation: you don’t need Verilog or VHDL proficiency to get this box working for you. You get all of the FPGA’s high-performance goodness without the bother.)
That said, as BittWare’s Network Products VP & GM Craig Lund explains, this is not an appliance that comes out of the box ready to roll. You need (and want) to customize it. You might want to add packet filters, for example. You might want to actively monitor the traffic. And you definitely want the StreamSleuth to do everything at wire-line speeds, which it can. “But one thing you do not have to do,” says Lund, “is learn how to program an FPGA.” You still get the performance benefits of FPGA technology, without the hassle. That means that a much wider group of network and data-center engineers can take advantage of BittWare’s StreamSleuth.
As Lund explains, “100GbE is a different creature” than prior, slower versions of Ethernet. Servers cannot directly deal with 100GbE traffic and “that’s not going to change any time soon.” The “network pipes” are now getting bigger than the server’s internal “I/O pipes.” This much traffic entering a server this fast clogs the pipes and also causes “cache thrash” in the CPU’s L3 cache.
Sounds bad, doesn’t it?
What you want is to reduce the network traffic of interest down to something a server can look at. To do that, you need filtering. Lots of filtering. Lots of sophisticated filtering. More sophisticated filtering than what’s available in today’s commodity switches and firewall appliances. Ideally, you want a complete implementation of the standard BPF/pcap filter language running at line rate on something really fast, like a packet engine implemented in a highly parallel FPGA.
The same thing holds true for attack mitigation at 100GbE line rates. Commodity switching hardware isn’t going to do this for you at 100GbE (10GbE yes but 100GbE, “no way”) and you can’t do it in software at these line rates. “The solution is FPGAs,” says Lund, and BittWare’s StreamSleuth with FPGA-based packet processing gets you there now.
Software-based defenses cannot withstand Denial of Service (DoS) attacks at 100GbE line rates. FPGA-accelerated packet processing can.
So what’s that FPGA inside of the BittWare StreamSleuth doing? It comes preconfigured for packet filtering, load balancing, and routing. (“That’s a Terabit router in there.”) To go beyond these capabilities, you use the BPF/pcap language to program your requirements into the StreamSleuth’s 100GbE packet processor using a GUI or APIs. That packet processor is implemented with a Xilinx Virtex UltraScale+ VU9P FPGA.
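For readers who haven’t met BPF/pcap filters, here is a rough software picture of what a one-line expression such as `tcp dst port 443` asks a packet engine to do. This is a simplified sketch, not BittWare’s implementation: it handles only untagged IPv4 frames and ignores fragments, and every function name in it is hypothetical.

```python
import struct

ETH_HEADER_BYTES = 14
ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6
IPPROTO_UDP = 17


def matches_tcp_dst_port(frame: bytes, port: int) -> bool:
    """Simplified equivalent of the pcap filter 'tcp dst port <port>'."""
    if len(frame) < ETH_HEADER_BYTES + 20:
        return False
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return False
    ihl_bytes = (frame[ETH_HEADER_BYTES] & 0x0F) * 4   # IPv4 header length
    if ihl_bytes < 20 or frame[ETH_HEADER_BYTES + 9] != IPPROTO_TCP:
        return False
    tcp_offset = ETH_HEADER_BYTES + ihl_bytes
    if len(frame) < tcp_offset + 4:
        return False
    _src_port, dst_port = struct.unpack_from("!HH", frame, tcp_offset)
    return dst_port == port


def build_frame(protocol: int, dst_port: int) -> bytes:
    """Build a minimal synthetic Ethernet/IPv4 frame for demonstration."""
    eth = bytes(12) + struct.pack("!H", ETHERTYPE_IPV4)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64,
                     protocol, 0, bytes(4), bytes(4))
    l4 = struct.pack("!HH", 12345, dst_port) + bytes(16)
    return eth + ip + l4


traffic = [build_frame(IPPROTO_TCP, 443), build_frame(IPPROTO_UDP, 443),
           build_frame(IPPROTO_TCP, 80)]
kept = [f for f in traffic if matches_tcp_dst_port(f, 443)]
print(f"{len(kept)} of {len(traffic)} frames passed the filter")
```

A CPU would have to run a check like this on every arriving packet, which is exactly the per-packet work that becomes untenable in software at 100GbE line rates.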
Here’s what the guts of the BittWare StreamSleuth look like:
And here’s a block diagram of the StreamSleuth’s packet processor:
The Virtex UltraScale+ FPGA resides on a BittWare XUPP3R PCIe board. If that rings a bell, perhaps you read about that board here in Xcell Daily last November. (See “BittWare’s UltraScale+ XUPP3R board and Atomic Rules IP run Intel’s DPDK over PCIe Gen3 x16 @ 150Gbps.”)
Finally, here’s the just-released BittWare StreamSleuth video with detailed use models and explanations:
For more information about the StreamSleuth, contact BittWare directly or go see the company’s StreamSleuth demo at next week’s RSA conference. For more information about the packet-processing capabilities of Xilinx All Programmable devices, click here. And for information about the new Xilinx Reconfigurable Acceleration Stack, click here.
Accolade’s newly announced ATLAS-1000 Fully Integrated 1U OEM Application Acceleration Platform pairs a Xilinx Kintex UltraScale KU060 FPGA on its motherboard with an Intel x86 processor on a COM Express module to create a network-security application accelerator. The ATLAS-1000 platform integrates Accolade’s APP (Advanced Packet Processor), instantiated in the Kintex UltraScale FPGA, which delivers acceleration features for line-rate packet processing including lossless packet capture, nanosecond-precision packet timestamping, packet merging, packet filtering, flow classification, and packet steering. The platform accepts four 10G SFP+ or two 40G QSFP pluggable optical modules. Although the ATLAS-1000 is designed as a flow-through security platform, especially for bump-in-the-wire applications, there’s also 1Tbyte worth of on-board local SSD storage.
Accolade Technology's ATLAS-1000 Fully Integrated 1U OEM Application Acceleration Platform
Here’s a block diagram of the ATLAS-1000 platform:
All network traffic enters the FPGA-based APP for packet processing. Packet data is then selectively forwarded to the x86 CPU COM Express module depending on the defined application policy.
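Here is a minimal sketch of what policy-based steering means in practice: each packet is checked against an ordered rule list, and the first matching rule decides whether the packet goes to the CPU, passes straight through, or is dropped. Everything here (the Packet and Rule types, the action names, the example policy) is hypothetical and written for illustration. It does not represent Accolade's APP or its API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    protocol: str
    dst_port: int


@dataclass(frozen=True)
class Rule:
    name: str
    predicate: Callable[[Packet], bool]
    action: str  # "to_cpu", "pass_through", or "drop" (made-up action names)


def steer(packet: Packet, rules: List[Rule], default: str = "pass_through") -> str:
    """Return the action of the first rule that matches, else the default."""
    for rule in rules:
        if rule.predicate(packet):
            return rule.action
    return default


# Example policy: inspect DNS on the CPU, drop one source range, pass the rest.
EXAMPLE_POLICY = [
    Rule("inspect-dns", lambda p: p.protocol == "udp" and p.dst_port == 53, "to_cpu"),
    Rule("block-range", lambda p: p.src_ip.startswith("203.0.113."), "drop"),
]

if __name__ == "__main__":
    print(steer(Packet("10.0.0.1", "10.0.0.2", "udp", 53), EXAMPLE_POLICY))
```

The point of doing this in the FPGA is that only the traffic the policy selects ever reaches the x86 CPU, so the processor spends its cycles on the interesting packets.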
Please contact Accolade Technology directly for more information about the ATLAS-1000.
Aquantia has packed its Ethernet PHY—capable of operating at 10Gbps over 100m of Cat 6a cable (or 5Gbps down to 100Mbps over 100m of Cat 5e cable)—with a Xilinx Kintex-7 FPGA, creating a universal Gigabit Ethernet component with extremely broad capabilities. Here’s a block diagram of the new AQLX107 device:
This Aquantia device gives you a space-saving, one-socket solution for a variety of Ethernet designs including controllers, protocol converters, and anything-to-Ethernet bridges.
Please contact Aquantia for more information about this unique Ethernet chip.
Do you have a big job to do? How about a terabit router bristling with optical interconnect? Maybe you need a DSP monster for phased-array radar or sonar. Beamforming for advanced 5G applications using MIMO antennas? Some other high-performance application with mind-blowing processing and I/O requirements?
You need to look at Xilinx Virtex UltraScale+ FPGAs with their massive data-flow and routing capabilities, massive memory bandwidth, and massive I/O bandwidth. These attributes sweep away design challenges caused by performance limits of lesser devices.
Now you can quickly get your hands on a Virtex UltraScale+ Eval Kit so you can immediately start that challenging design work. The new eval kit is the Xilinx VCU118 with an on-board Virtex UltraScale+ VU9P FPGA. Here’s a photo of the board included with the kit:
Xilinx VCU118 Eval Board with Virtex UltraScale+ VU9P FPGA
The VCU118 eval kit’s capabilities spring from the cornucopia of on-chip resources provided by the Virtex UltraScale+ VU9P FPGA including:
If you can’t build what you need with the VCU118’s on-board Virtex UltraScale+ VU9P FPGA—and it’s sort of hard to believe that’s even possible—just remember, there are even larger parts in the Virtex UltraScale+ FPGA family.
Think you can design the lowest-latency network switch on the planet? That’s the challenge of the NetFPGA 2017 design contest. You have until April 13, 2017 to develop a working network switch using the NetFPGA SUME dev board, which is based on a Xilinx Virtex-7 690T FPGA. Contest details are here. (The contest started on November 16.)
Competing designs will be evaluated using OSNT, an Open Source Network Tester, and testbenches will be available online so that users can experiment with and independently evaluate their designs. The competition is open to students of all levels (undergraduate and postgraduate) as well as to non-students. Winners will be announced at the NetFPGA Developers Summit, to be held on Thursday, April 20 through Friday, April 21, 2017 in Cambridge, UK.
Note: There is no need to own a NetFPGA SUME platform to take part in the competition because the competition offers online access to one. However, you may want one for debugging purposes because there’s no online debug access to the online NetFPGA SUME platform. (NetFPGA SUME dev boards are available from Digilent. Click here.)
NetFPGA SUME Board (available from Digilent)
Intel’s DPDK (Data Plane Development Kit) is a set of software libraries that improves packet processing performance on x86 CPU hosts by as much as 10x. According to Intel, its DPDK plays a critical role in SDN and NFV applications. Last week at SC16 in Salt Lake City, BittWare demonstrated Intel’s DPDK running on a Xeon CPU and streaming packets over a PCIe Gen3 x16 interface at an aggregate rate of 150Gbps (transmit + receive) to and from BittWare’s new XUPP3R PCIe board. The demo used Atomic Rules’ Arkville DPDK-aware data mover IP instantiated in the 16nm Xilinx Virtex UltraScale+ VU9P FPGA on BittWare’s board. The Arkville DPDK-aware data mover marshals packets between the IP block implemented in the FPGA’s programmable logic and the CPU host’s memory using the Intel DPDK API/ABI. Atomic Rules’ Arkville IP plus a high-speed MAC looks like a line-rate-agnostic, bare-bones L2 NIC.
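To put that 150Gbps aggregate figure in context, here is a rough calculation of the raw bandwidth of a PCIe Gen3 x16 link. It uses only the published Gen3 signaling rate (8GT/sec per lane) and 128b/130b line encoding. It deliberately ignores packet headers, flow control, and other protocol overhead, so the usable payload ceiling is lower than the numbers it prints. The function name is illustrative.

```python
GEN3_GT_PER_SEC = 8.0            # PCIe Gen3 signaling rate per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def pcie_gen3_raw_gbps(lanes: int) -> float:
    """Raw post-encoding bandwidth in one direction, in Gbps."""
    return GEN3_GT_PER_SEC * ENCODING_EFFICIENCY * lanes


if __name__ == "__main__":
    per_direction = pcie_gen3_raw_gbps(16)
    both_directions = 2 * per_direction
    demo_aggregate_gbps = 150.0  # transmit + receive figure quoted for the demo
    print(f"Raw x16 bandwidth, one direction: {per_direction:.1f} Gbps")
    print(f"Raw x16 bandwidth, both directions: {both_directions:.1f} Gbps")
    print(f"Demo share of raw link: {demo_aggregate_gbps / both_directions:.0%}")
```

The raw figure is about 126Gbps in each direction, so a 150Gbps transmit-plus-receive aggregate uses roughly 60% of the link’s raw capacity before any protocol overhead is counted.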
BittWare’s XUPP3R PCIe board with an on-board Xilinx Virtex UltraScale+ VU9P FPGA
Here’s a very short video of BittWare’s VP of Systems & Solutions Ron Huizen explaining his company’s SC16 demo:
Here’s an equally short video made by Atomic Rules with a bit more info:
If this all looks vaguely familiar, perhaps you’re remembering an Xcell Daily post that appeared just last May where BittWare demonstrated an Atomic Rules UDP Offload Engine running on its XUSP3S PCIe board, which is based on a Xilinx Virtex UltraScale VU095 FPGA. (See “BittWare and Atomic Rules demo UDP Offload Engine @ 25 GbE rates; BittWare intros PCIe Networking card for 4x 100 GbE.”) For the new XUPP3R PCIe board, BittWare has now jumped from the 20nm Virtex UltraScale FPGAs to the latest 16nm Virtex UltraScale+ FPGAs.
Today, Xilinx announced four members of a new Virtex UltraScale+ HBM device family that combines high-performance 16nm Virtex UltraScale+ FPGAs with 32 or 64Gbits of HBM (high-bandwidth memory) DRAM in one device. The resulting devices deliver a 20x improvement in memory bandwidth relative to DDR SDRAM—more than enough to keep pace with the needs of 400G Ethernet, multiple 8K digital-video channels, or high-performance hardware acceleration for cloud servers.
These new Virtex UltraScale+ HBM devices are part of the 3rd generation of Xilinx 3D FPGAs, which began with the Virtex-7 2000T that Xilinx started shipping way back in 2011. (See “Generation-jumping 2.5D Xilinx Virtex-7 2000T FPGA delivers 1,954,560 logic cells using 6.8 BILLION transistors (PREVIEW!)”) Xilinx co-developed this 3D IC technology with TSMC, and the Virtex UltraScale+ HBM devices represent the current, production-proven state of the art.
Here’s a table listing salient features of these four new Virtex UltraScale+ HBM devices:
Each of these devices incorporates 32 or 64Gbits of HBM DRAM with more than 1000 I/O lines connecting each HBM stack through the silicon interposer to the logic device, which contains a hardened HBM memory controller that manages one or two HBM devices. This memory controller has 32 high-performance AXI channels, allowing high-bandwidth interconnect to the Virtex UltraScale+ devices’ programmable logic and access to many routing channels in the FPGA fabric. Any AXI port can access any physical memory location in the HBM devices.
In addition, these Virtex UltraScale+ HBM FPGAs are the first Xilinx devices to offer the new, high-performance CCIX cache-coherent interface announced just last month. (See “CCIX Consortium develops Release1 of its fully cache-coherent interconnect specification, grows to 22 members.”) CCIX simplifies the design of offload accelerators for hyperscale data centers by providing low-latency, high-bandwidth, fully coherent access to server memory. The specification employs a subset of full coherency protocols and is ISA-agnostic, meaning that the specification’s protocols are independent of the attached processors’ architecture and instruction sets. CCIX pairs well with HBM and the new Xilinx UltraScale+ HBM FPGAs provide both in one package.
Here’s an 8-minute video with additional information about the new Virtex UltraScale+ HBM devices:
Yesterday, Aquantia announced its QuantumStream technology, which drives 100Gbps data over direct-attached copper cable through an SFP connector. Aquantia notes that this is not a 4x25Gbps or 2x50Gbps connection; it’s a true, 1-lane, 100Gbps data stream. The technology is based on a 56Gbps SerDes IP core from GLOBALFOUNDRIES implemented in 14nm FinFET technology. Aquantia has added its own magic in the form of its patented Mixed-Mode Signal Processing (MMSP) and Multi-Core Signal Processing (MCSP) architectural innovations.
The company expects this technology to significantly change data-center interconnect, both inter- and intra-rack, with connections up to a few meters in length. Looks pretty cool for top-of-rack switches.
Full disclosure: You’ll find that Xilinx is listed as a strategic investor on Aquantia’s home page.
Accolade’s 3rd-gen, dual-port, 100G ANIC-200Ku PCIe Lossless Packet Capture Adapter can classify packets in 32 million flows simultaneously—enabled by Xilinx UltraScale FPGAs—while dissipating a mere 50W. The board features two CFP4 optical adapter cages and can time-stamp packets with 4nsec precision. You can directly link two ANIC-200Ku Packet Capture Adapters with a direct-attach cable to handle lossless, aggregated traffic flows at 200Gbps.
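For readers unfamiliar with flow classification, the sketch below shows the basic idea in software: packets are grouped by their 5-tuple (source and destination address, source and destination port, protocol), and a bounded table keeps per-flow counters and timestamps. This is a toy model with invented names (FlowKey, FlowTable). It says nothing about how the ANIC-200Ku implements its flow table in the FPGA.

```python
from typing import Dict, NamedTuple


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


class FlowTable:
    """Bounded table of per-flow statistics keyed by 5-tuple."""

    def __init__(self, max_flows: int) -> None:
        self.max_flows = max_flows
        self.flows: Dict[FlowKey, dict] = {}

    def classify(self, key: FlowKey, length: int, timestamp_ns: int) -> bool:
        """Account a packet to its flow; return False if the table is full."""
        entry = self.flows.get(key)
        if entry is None:
            if len(self.flows) >= self.max_flows:
                return False  # no room to track a new flow
            entry = {"packets": 0, "bytes": 0,
                     "first_ns": timestamp_ns, "last_ns": timestamp_ns}
            self.flows[key] = entry
        entry["packets"] += 1
        entry["bytes"] += length
        entry["last_ns"] = timestamp_ns
        return True


if __name__ == "__main__":
    table = FlowTable(max_flows=32_000_000)  # capacity figure from the text
    web = FlowKey("10.0.0.1", "10.0.0.2", 40000, 443, 6)
    table.classify(web, 64, timestamp_ns=0)
    table.classify(web, 1500, timestamp_ns=4)
    print(table.flows[web])
```

In hardware, the lookup and the counter update happen for every packet at line rate, which is the part that software tables like this one cannot keep up with at 100G.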
Applications for the adapter include:
Accolade’s 3rd-gen, dual-port, 100G ANIC-200Ku PCIe Lossless Packet Capture Adapter