Partner: NVMe PCIe Gen4 Host Controller Core by Leveraging Xilinx's UltraScale+ GTY Transceivers

Xilinx Employee

Editor’s Note: This content is contributed by Thanaporn Sangpaithoon, General Manager of Design Gateway Co., Ltd.


The Xilinx® UltraScale+™ FPGAs and SoCs with GTY transceivers can support a PCI Express® Gen4 interface. Design Gateway’s NVMe Host Controller IP core is designed to leverage the GTY transceivers to support the latest PCIe Gen4 NVMe SSDs. The IP core has been implemented on Xilinx’s Virtex® UltraScale+ FPGA VCU118 Evaluation Kit, where it achieves read/write performance of more than 4GB/s.


Implementation of NVMe Host Controller on UltraScale+ GTY Transceiver

NVMe Implementation. (Image source: Design Gateway)


Conventionally, an NVMe host is implemented with a host processor that works with a PCIe controller to transfer data to and from the NVMe SSD. The NVMe protocol is handled by a device driver, which communicates with the PCIe controller hardware, a CPU peripheral connected through a very high-speed bus. External DDR memory is required to hold the data buffers and command queues used to transfer data between the PCIe controller and the SSD.

UltraScale+ devices with GTY transceivers are capable of PCIe Gen4 interface support. However, a PCIe Gen4 integrated block and Arm® processor are not available on some devices.

Design Gateway solved this problem by developing the NVMeG4-IP core, which runs as a stand-alone NVMe host controller with built-in PCIe soft IP and PCIe bridge logic in a single core. It provides access to NVMe PCIe Gen4 SSDs through a simplified user interface with standard features, so it can be used without knowledge of the NVMe protocol.


Overview of NVMeG4-IP

NVMeG4-IP block diagram. (Image source: Design Gateway)


Key Features

  • Implements the application layer, transaction layer, data link layer, and some parts of the physical layer to access the NVMe SSD without requiring a CPU or external DDR memory
  • Operates with the Xilinx PCIe PHY IP configured as 4-lane PCIe Gen4 (256-bit bus interface)
  • Includes 256Kb RAM data buffer
  • Supports six commands: Identify, Shutdown, Write, Read, SMART, and Flush (additional commands can be supported as an option)
  • User clock frequency must be greater than or equal to the PCIe clock (250MHz for Gen4)
  • Available reference designs:
    • ZCU102 with AB17-M2FMC adapter board
    • KCU105 with AB18-PCIeX16/AB16-PCIeXOVR adapter board
    • VCU118 with AB18-PCIeX16 adapter board
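
The 256-bit bus width and 250MHz clock in the feature list can be sanity-checked against the link rate with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: the 16 GT/s per-lane rate and 128b/130b line encoding are the standard PCIe Gen4 figures rather than numbers taken from this article, the function names are illustrative, and packet and protocol overhead are ignored.

```python
# Back-of-the-envelope bandwidth check (illustrative names, not part of the IP).

def pcie_link_gbytes_per_s(gt_per_s: float, lanes: int,
                           encoding: float = 128 / 130) -> float:
    """Link bandwidth in GB/s after line encoding, before protocol overhead."""
    return gt_per_s * lanes * encoding / 8  # 8 bits per byte


def bus_gbytes_per_s(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth in GB/s of a parallel bus moving one word per clock."""
    return (width_bits / 8) * (clock_mhz * 1e6) / 1e9


# Assumption: standard Gen4 rate of 16 GT/s per lane, 4 lanes as in the list.
link = pcie_link_gbytes_per_s(16.0, 4)
# From the feature list: 256-bit bus interface clocked at 250 MHz.
bus = bus_gbytes_per_s(256, 250.0)

print(f"PCIe Gen4 x4 link : {link:.2f} GB/s")
print(f"256-bit @ 250 MHz : {bus:.2f} GB/s")
print(f"Bus keeps up with the link: {bus >= link}")
```

Under these assumptions the internal bus has slightly more peak bandwidth than the encoded link, which is consistent with the requirement that the user clock run at least as fast as the PCIe clock.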

FPGA resources on the XCVU9P-FLGA2104-2L FPGA device are shown in the table below.

Example Implementation Statistics for UltraScale+ Devices

Because of its very low FPGA resource usage, the NVMeG4-IP core is also suitable for building a multi-channel RAID system with very high performance and the lowest possible FPGA resource consumption.


Implementation and Performance Result on the VCU118

NVMeG4-IP demo environment set up on VCU118. (Image source: Design Gateway)


Example test results from running the demo system on the VCU118 with a 1 TB GIGABYTE AORUS NVMe PCIe Gen4 SSD are shown in the figure below.

NVMe SSD read/write performance on the VCU118 by using GIGABYTE AORUS NVMe PCIe Gen4 SSD (Image source: Design Gateway)



The NVMeG4-IP core enables the NVMe PCIe Gen4 SSD interface on the VCU118 evaluation kit, and more generally on Xilinx’s UltraScale+ devices that have GTY transceivers but no PCIe Gen4 integrated block. NVMeG4-IP delivers the highest possible performance with the lowest possible FPGA resource usage for NVMe SSD access without requiring a CPU. It is well suited to high-performance NVMe storage without CPU intervention, and because it uses the GTY transceivers directly, multiple NVMe SSD interfaces can be implemented without being limited by the number of PCIe integrated blocks available in the FPGA device.


For more details on NVMeG4-IP and the available reference designs, please visit Design Gateway’s website at


How much data were you writing in a burst? Can it maintain 4GB/s to the example SSD continuously? We have seen issues with this SSD reducing the write speed after a 200GB write, and again when the write is repeated.

Write speed mostly depends on the NVMe SSD's NAND flash cell technology. In particular, an NVMe SSD that mixes high- and low-performance cells (SLC and MLC/TLC) may not be able to provide consistent performance over the entire disk.
NVMeG4-IP itself can write to the NVMe SSD continuously, as long as write data is available in the User FIFO and the NVMe SSD keeps requesting data to be written from the IP.
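
To illustrate why a drive with mixed cell types can show a lower average on long writes, here is a minimal two-phase model in Python. It is only a sketch under stated assumptions: the drive is assumed to write at a fast rate until a fixed amount of data has been written and at a slower rate afterwards. The 200GB threshold and the 4GB/s fast rate echo the figures mentioned in this thread, while the 1GB/s slow rate and the function name are hypothetical and not measurements of any particular SSD.

```python
# Two-phase write model (hypothetical; not a measurement of any real SSD).

def average_write_speed(total_gb: float, cache_gb: float,
                        fast_gb_per_s: float, slow_gb_per_s: float) -> float:
    """Average GB/s for one continuous write of total_gb.

    The first cache_gb are written at fast_gb_per_s, the rest at slow_gb_per_s.
    """
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    fast_part = min(total_gb, cache_gb)
    slow_part = total_gb - fast_part
    seconds = fast_part / fast_gb_per_s + slow_part / slow_gb_per_s
    return total_gb / seconds


# Assumed parameters: 200 GB fast region at 4 GB/s, then 1 GB/s (hypothetical).
short_write = average_write_speed(100, 200, 4.0, 1.0)
long_write = average_write_speed(400, 200, 4.0, 1.0)

print(f"100 GB write averages {short_write:.2f} GB/s")
print(f"400 GB write averages {long_write:.2f} GB/s")
```

In this model a write that stays within the fast region keeps the full rate, while a longer write averages well below it even though the host side never stalls.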