Editor’s Note: This content is contributed by Thanaporn Sangpaithoon, General Manager at Design Gateway Co., Ltd.
The Kintex® UltraScale+™ family offers the best price/performance/watt balance among Xilinx® FPGA devices built on TSMC 16nm FinFET technology. Combined with the new UltraRAM and interconnect optimization technology (SmartConnect), these devices deliver the most cost-effective solution for applications that require high-end capabilities such as GTY transceivers for 100Gbps networking and PCIe® Gen4 connectivity, especially networking and data storage applications.
This article demonstrates a 100Gb/s solution implementing the TOE100G TCP Offload Engine networking IP core and the NVMe PCIe Gen4 SSD host IP core on Xilinx's KCU116 Evaluation Kit: a no-CPU solution that delivers ~12GB/s TCP transmission over the 100GbE interface, while the NVMeG4-IP core achieves incredibly fast performance of ~4GB/s per SSD.
The KCU116 is ideal for evaluating key Kintex UltraScale+ features, equipped with onboard 32-bit DDR4-2666 memory, an FMC expansion port for an M.2 NVMe SSD, and PCIe Gen4 x8 lanes. Its 16 x 28Gb/s GTY transceivers serve both the PCIe Gen4 and 100GbE interfaces in our demo implementation.
The TOE100G IP core implements the TCP/IP stack in hardwired logic and connects with Xilinx's 100Gb Ethernet Subsystem module as the lower-layer hardware. The user interface of TOE100G IP consists of a register interface for control signals and a FIFO interface for data signals. TOE100G IP is designed to connect with the 100Gb Ethernet Subsystem, which uses a 512-bit AXI4-Stream user interface. The Ethernet Subsystem, provided by Xilinx, includes the EMAC, PCS, and PMA functions. The clock frequency of the user interface of the 100Gb Ethernet Subsystem is 322.265625 MHz.
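Because TOE100G-IP implements the full TCP/IP stack in hardware, the remote peer in a throughput test can be an ordinary software TCP endpoint on a PC. As a hedged illustration (the port number, buffer size, and transfer size below are arbitrary choices, not part of the IP's specification), a minimal Python sink that counts received bytes might look like this; here a loopback client stands in for the FPGA:

```python
import socket
import threading

def run_sink(port, result, ready):
    """Accept one TCP connection and count the bytes received."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    ready.set()                      # signal that the server is listening
    conn, _ = srv.accept()
    total = 0
    while True:
        chunk = conn.recv(1 << 20)   # 1 MiB receive buffer (arbitrary)
        if not chunk:
            break
        total += len(chunk)
    result.append(total)
    conn.close()
    srv.close()

# Loopback self-test standing in for the FPGA: send 16 MiB and verify the count.
result, ready = [], threading.Event()
t = threading.Thread(target=run_sink, args=(50100, result, ready))
t.start()
ready.wait()
cli = socket.create_connection(("127.0.0.1", 50100))
cli.sendall(b"\x00" * (16 << 20))
cli.close()
t.join()
print(result[0])  # 16777216 bytes received
```

In a real test the FPGA, configured in Client mode, would open the connection to this sink instead of the loopback client.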
Full TCP/IP stack implementation
Support one session per TOE100G IP (multisession can be implemented by using multiple TOE100G IPs)
Support both Server and Client mode (Passive/Active open and close)
Support Jumbo frame
Simple data interface by standard FIFO interface
Simple control interface by single-port RAM interface
Designed to connect with Xilinx’s 100Gb Ethernet Subsystem
FPGA resource usage on the XCKU5P-2FFVB676E FPGA device is shown in Table 1 below.
Xilinx’s 100Gb Ethernet Subsystem
Xilinx's 100G Ethernet Subsystem implements the MAC layer and physical layer for 100Gb Ethernet. The user interface that connects with TOE100G IP is a 512-bit AXI4-Stream. Xilinx provides the 100G Ethernet Subsystem (Ethernet MAC and Ethernet PCS/PMA) with many features, described on the following website: https://www.xilinx.com/products/intellectual-property/cmac_usplus.html
Design Gateway’s NVMe PCIe Gen4 Host Controller for GTY Transceivers
More details of the NVMeG4-IP for GTY transceivers are described in Xilinx's Adaptable Advantage Blog.
FPGA resource usage for NVMeG4-IP implementation is shown in Table 2 below.
Example TOE100G-IP implementation & performance result on KCU116
Figure 4 shows the overview of the reference design based on the KCU116 to demonstrate the TOE100G-IP implementation. The demo system includes a bare-metal MicroBlaze system, user logic, and Xilinx's 100Gb Ethernet Subsystem.
The demo system is designed to evaluate TOE100G-IP operation in both Client and Server mode. The test logic allows testing by sending and receiving data with a test pattern at the highest possible data rate on the user interface side. For the 100GbE interface on the KCU116, four SFP+ transceivers (25GBASE-R) and fiber cables are required, as shown in Figure 5.
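The article does not specify the exact test pattern the demo logic generates and checks; a common choice in this kind of throughput demo is a 32-bit incrementing counter. A minimal software sketch of such a generator/verifier pair, under that assumption, is:

```python
import struct

def gen_pattern(n_words, start=0):
    """Generate a 32-bit incrementing test pattern (little-endian).
    An assumed stand-in for the demo's hardware pattern generator."""
    return b"".join(struct.pack("<I", (start + i) & 0xFFFFFFFF)
                    for i in range(n_words))

def verify_pattern(data, start=0):
    """Return the index of the first mismatching 32-bit word, or -1 if intact."""
    for i in range(len(data) // 4):
        (word,) = struct.unpack_from("<I", data, 4 * i)
        if word != (start + i) & 0xFFFFFFFF:
            return i
    return -1

buf = gen_pattern(1024)
print(verify_pattern(buf))          # -1: pattern intact
corrupted = buf[:40] + b"\xff" + buf[41:]
print(verify_pattern(corrupted))    # 10: index of the first corrupted word
```

An incrementing pattern like this lets the verifier localize the first corrupted word, which is useful when debugging link or FIFO issues.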
Figure 5: TOE100G-IP demo environment set up on KCU116. (Image source: Design Gateway)
The example performance test result for FPGA-to-FPGA speed, comparing 100G with the other 1G/10G/25G/40G speeds, is shown in Figure 6.
Figure 6: TOE100G-IP performance comparison with 1G/10G/25G/40G on KCU116. (Image source: Design Gateway)
The test result demonstrates that TOE100G-IP is capable of achieving ~12GB/s TCP transmission speed.
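The ~12GB/s figure is close to the theoretical ceiling of the link. A back-of-the-envelope check (the Ethernet/IP/TCP overhead values are standard protocol constants; the 8,960-byte jumbo MTU is an assumption, since the demo's MTU is not stated):

```python
LINE_RATE = 100e9          # 100GbE line rate, bits/s
MTU = 8960                 # assumed jumbo-frame MTU, bytes
ETH_OVERHEAD = 14 + 4      # Ethernet header + FCS, bytes
WIRE_GAP = 8 + 12          # preamble/SFD + inter-frame gap, bytes
IP_TCP_HDRS = 20 + 20      # IPv4 + TCP headers (no options), bytes

wire_bytes = MTU + ETH_OVERHEAD + WIRE_GAP    # wire time consumed per frame
payload = MTU - IP_TCP_HDRS                   # TCP payload per frame
max_goodput = LINE_RATE * payload / wire_bytes / 8   # bytes/s
print(round(max_goodput / 1e9, 2))            # ~12.39 GB/s theoretical ceiling
```

Under these assumptions the measured ~12GB/s corresponds to roughly 97% of the achievable TCP goodput on a 100GbE link.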
Example of NVMeG4-IP implementation & performance result on KCU116
Figure 7 shows the overview of the reference design based on the KCU116 to demonstrate a single-channel (1CH) NVMeG4-IP implementation. It is possible to implement multiple instances of NVMeG4-IP in a customized design to achieve higher storage performance if FPGA resources are available.
The demo system writes and verifies data with the NVMe SSD on the KCU116. The user controls the test operation through a Serial console. For the NVMe SSD to interface with the KCU116, an AB18-PCIeX16 adapter board is required, as shown in Figure 8.
Figure 8: NVMeG4-IP demo environment set up on KCU116. (Image source: Design Gateway)
The example performance test result is shown in Figure 9.
Figure 9: NVMe SSD read/write performance on KCU116 by using Aorus NVMe PCIe Gen4 SSD. (Image source: Design Gateway)
Both the TOE100G-IP and NVMeG4-IP cores utilize the 100Gbps and PCIe Gen4 connectivity of the KCU116 board for networking and NVMe storage application implementation. TOE100G-IP is capable of ~12GB/s point-to-point TCP transmission over 100GbE, while NVMeG4-IP can provide very high-performance storage at ~4GB/s per SSD. Storage performance can be increased further by a RAID implementation.
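How much a RAID configuration helps depends on where the bottleneck sits. A capacity-planning sketch under idealized assumptions (the ~4GB/s per-SSD and ~12GB/s link figures come from the measurements above; perfectly linear RAID-0 scaling is an idealization):

```python
def striped_throughput(n_ssd, per_ssd_gbs=4.0, link_limit_gbs=12.0):
    """Idealized RAID-0 aggregate in GB/s: linear SSD scaling capped by
    the ~12 GB/s TCP goodput of the 100GbE link measured above."""
    return min(n_ssd * per_ssd_gbs, link_limit_gbs)

for n in (1, 2, 3, 4):
    print(n, striped_throughput(n))   # 4.0, 8.0, 12.0, 12.0 GB/s
```

In this model, about three ~4GB/s SSDs saturate the 100GbE path; additional drives then add capacity and redundancy rather than streaming throughput.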
Together, they open up new opportunities for advanced system-level solutions such as sensor data capture, onboard computation, and AI-based edge computing devices.
For more details on TOE100G-IP and NVMeG4-IP, including datasheets, available reference designs, and demo environment setup, please visit Design Gateway's website.