This blog entry covers important information you should understand before designing with memory interfaces on Versal™ ACAP devices.
It also links to relevant documentation, tutorials, and example designs.
You can find all of our Versal-related blogs here.
Versal ACAP offers the hardened Integrated DDR Memory Controller (DDRMC) along with soft memory interface IP options.
Additionally, the Performance AXI Traffic Generator is available to stimulate the memory IP both in simulation and, after synthesis, in hardware for performance analysis.
The Versal Integrated DDRMC is the preferred solution because it saves power and programmable logic resources and eases timing closure.
The DDRMC has programmable network on chip (NoC) interface ports and is designed to handle multiple streams of traffic.
Additionally, it supports Quality of Service (QoS) classes to ensure appropriate prioritization of commands.
The NoC is a configurable AXI network used for sharing data between IP endpoints in the programmable logic (PL), the processing system (PS), and other hard blocks. This device-wide infrastructure is a high-speed, integrated data path with dedicated switching.
The Versal soft IP offerings are located within the PL and are similar to the soft memory interface IP offerings in the UltraScale/UltraScale+ device families.
Utilize the Versal™ Network on Chip/Multiple DDR Memory Controllers Tutorial. This example connects several different DDR devices in one design, all communicating with the PS through the NoC. It connects one DDR4 device and two interleaved LPDDR4 devices, which requires one NoC instance to configure the DDRMC for the DDR4 device and another NoC instance to configure the two interleaved DDRMCs for the two LPDDR4 devices.
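The tutorial interleaves two LPDDR4 devices across two DDRMCs. As a rough mental model of what interleaving buys you, the Python sketch below assumes that consecutive fixed-size blocks ("granules") of the system address space alternate between controllers, so a long linear transfer is split evenly between them. The function names and the 4 KB default granule are hypothetical choices for illustration; the available interleave granularities and the actual address decode are set by the NoC IP configuration, so check PG313 for your design.

```python
from typing import List, Tuple


def interleave(addr: int, num_controllers: int = 2, granule: int = 4096) -> Tuple[int, int]:
    """Map a system byte address to (controller index, controller-local address).

    Hypothetical model: blocks of `granule` bytes rotate round-robin across controllers.
    """
    if granule <= 0 or granule & (granule - 1):
        raise ValueError("granule must be a positive power of two")
    block, offset = divmod(addr, granule)
    controller = block % num_controllers   # which controller owns this block
    local_block = block // num_controllers  # block index within that controller
    return controller, local_block * granule + offset


def bytes_per_controller(start: int, length: int, num_controllers: int = 2,
                         granule: int = 4096) -> List[int]:
    """Count how many bytes of a linear transfer land on each controller."""
    counts = [0] * num_controllers
    addr, end = start, start + length
    while addr < end:
        controller, _ = interleave(addr, num_controllers, granule)
        chunk = min(granule - addr % granule, end - addr)  # stay within one granule
        counts[controller] += chunk
        addr += chunk
    return counts
```

For example, `bytes_per_controller(0, 16384)` returns `[8192, 8192]`: a 16 KB transfer is split evenly, so both controllers can service it in parallel.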
Utilize the Efficient Data Movement with Versal Network on Chip tutorial. This tutorial uses a complex design example to demonstrate how the Versal™ Network on Chip (NoC) simplifies the design process for on-chip data movement. For comparison, a similar design is built in a Zynq® UltraScale+™ device. The NoC frees up programmable logic resources that are consumed by SmartConnect in the Zynq UltraScale+ design. Both designs can be run in hardware, and you can measure data movement and power consumption for comparison purposes.
Appropriately Configure the NoC IP: Configure the IP for the appropriate number of AXI masters, AXI slaves, inter-NoC interfaces, and memory controllers. Based on your traffic model (see the Design for Performance section below), enter the QoS for each AXI channel and set the DDR address mapping.
Configuring memory interfaces within the DDRMC differs from previous device families. The DDR controllers are implemented using the NoC IP Wizard. Rather than selecting a memory device from a drop-down menu, you configure the target memory device options directly: memory density parameters, JEDEC timing parameters, and mode register settings. The wizard also provides an option for future device expansion, which ensures that the DDRMC pinout remains sufficient if you later expand the memory topology, for example by adding ranks or slots or by moving to 3DS devices. Review Chapter 4, "Integrated Memory Controller (DDRMC) Architecture," in (PG313) for additional information.
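Setting the DDR address mapping decides which address bits select the row, bank group, bank, and column. The Python sketch below illustrates that effect with a simplified model; the geometry (a 64-bit DDR4 interface built from 8 Gb x16 devices), the field names, and the mapping orders are assumptions for illustration and are not the NoC IP Wizard's option names. It counts how often back-to-back 64-byte bursts of a sequential 64 KB stream hit the same bank group, which in DDR4 requires the longer tCCD_L spacing between column commands instead of tCCD_S.

```python
from typing import Dict, List

# Assumed example geometry: 64-bit DDR4 built from 8 Gb x16 devices.
# Field widths in bits; "col" counts only the column bits above one 64-byte burst.
GEOMETRY: Dict[str, int] = {"row": 16, "bg": 1, "bank": 2, "col": 7}
BURST_OFFSET_BITS = 6  # 8-byte bus x burst length 8 = 64 bytes per access


def decode(addr: int, order: List[str]) -> Dict[str, int]:
    """Split a byte address into DRAM fields; `order` lists fields MSB -> LSB."""
    fields: Dict[str, int] = {}
    value = addr >> BURST_OFFSET_BITS  # drop the offset within one burst
    for name in reversed(order):       # peel fields off starting at the LSB end
        width = GEOMETRY[name]
        fields[name] = value & ((1 << width) - 1)
        value >>= width
    return fields


def same_bank_group_pairs(order: List[str], length: int = 64 * 1024) -> int:
    """Count back-to-back bursts of a sequential stream that hit the same bank group."""
    step = 1 << BURST_OFFSET_BITS
    groups = [decode(a, order)["bg"] for a in range(0, length, step)]
    return sum(1 for prev, cur in zip(groups, groups[1:]) if prev == cur)


if __name__ == "__main__":
    for order in (["row", "bg", "bank", "col"], ["row", "col", "bank", "bg"]):
        print("-".join(order), same_bank_group_pairs(order))
```

Running it prints `row-bg-bank-col 1022` and `row-col-bank-bg 0`: placing the bank-group bit just above the burst offset lets a sequential stream alternate bank groups. Whether that order is best for your system depends on your actual address and command patterns, which is why the traffic model comes first.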
Design for Performance: When designing with the NoC and DDRMC, planning for performance up front is critical. Chapter 7, "NoC Performance Tuning," in (PG313) reviews the key performance measures of bandwidth and latency, the system design trade-offs that affect performance, and how to optimize the performance of the NoC and the integrated DDR memory controllers. Before designing for performance in your system, work through the “Versal Network on Chip/DDR Memory Controller Performance Tuning” tutorial on GitHub, which demonstrates how to refine a design to meet performance goals. You start with a system DDR traffic specification and learn how to model it with the NoC, DDR memory controllers, and AXI traffic generators (TGs).
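Bandwidth and latency are tied together by Little's law, which also explains why the tuning steps later in this post stress keeping reads and writes outstanding: to sustain bandwidth B at round-trip latency L with transactions of S bytes, a master needs roughly B × L / S transactions in flight. A minimal Python sketch follows; the example numbers are hypothetical, so substitute latencies measured in your own simulation.

```python
import math


def outstanding_needed(bandwidth_gbps: float, latency_ns: float, txn_bytes: int) -> int:
    """Little's law: transactions in flight = throughput x latency / transaction size.

    bandwidth_gbps is in GB/s (1e9 bytes/s), so 1 GB/s x 1 ns = 1 byte.
    """
    bytes_in_flight = bandwidth_gbps * latency_ns
    return math.ceil(bytes_in_flight / txn_bytes)


if __name__ == "__main__":
    # Hypothetical master: 16 GB/s target, 500 ns observed latency, 256-byte transactions.
    print(outstanding_needed(16, 500, 256))  # 8000 bytes in flight -> 32 transactions
```

If a master or traffic generator cannot keep that many transactions outstanding, it cannot reach the target bandwidth at that latency, regardless of how the DDRMC is configured.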
When you are ready, design for performance in your system:
Model the traffic flow. Determine the system’s traffic requirements including command and address patterns.
Determine the system’s aggregate (read and write) bandwidth and bandwidth for each master.
Compare the maximum theoretical bandwidth with the actual, achievable bandwidth for the NoC and DDRMC. For NoC bandwidth, refer to the Performance Metrics section of (PG313). For LPDDR4/DDR4 bandwidth, account for SDRAM overhead such as bus turnaround time, page misses, and maintenance commands.
Run simulations to ensure that channels are executing traffic as expected, determine bottlenecks, and utilize the levers available to tune for performance.
The Performance AXI Traffic Generator is intended for modeling traffic masters in Versal ACAP designs for performance evaluation of network on chip (NoC) based solutions. It is available in two versions: Non-Synthesizable for simulations only and Synthesizable for both simulations and running in hardware. Custom traffic patterns can be loaded into the IP through a .csv file. (Xilinx Answer 75782) provides .csv examples.
Include the AXI Performance Monitor IP, which displays read/write latency and bandwidth.
Tune for performance and re-simulate:
Ensure that you have the right number of NoC NMUs and DDRMCs to meet your requirements. Interleaving memories, additional memories, wider data widths, and running the memories faster are options to consider.
Ensure that you have a DDR component that meets your traffic needs. For example, a DDR component with more banks and bank groups reduces page misses and switching penalties, resulting in better performance.
Compare DDR bandwidth for single-channel versus dual-channel configurations.
Maximize the efficiency of your DRAM command and address mapping to reduce DRAM penalties such as page misses. Consider your address pattern, command pattern, transaction size, and the number of threads accessing DDR.
Ensure that enough outstanding reads and outstanding writes (visible in simulation) are queued at all times.
Look at how axi_cache is set. If a transaction is marked as modifiable (AxCACHE[1] = 1 in AXI4), the NoC is permitted to modify it so that it executes more efficiently.
Minimize data width conversions.
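Several of the steps above (aggregate bandwidth, theoretical versus achievable bandwidth, adding interleaved memories) come down to simple arithmetic. The Python sketch below totals per-master read and write demand and compares it with the theoretical DDR peak (data rate × bus width) derated by an efficiency factor. The master names, bandwidth figures, and the 0.7 efficiency are hypothetical placeholders; real efficiency depends on the SDRAM overheads listed above and should come from simulation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Master:
    name: str
    read_gbps: float   # required read bandwidth, GB/s
    write_gbps: float  # required write bandwidth, GB/s


def ddr_peak_gbps(data_rate_mtps: int, bus_width_bits: int) -> float:
    """Theoretical peak in GB/s: transfers per second x bytes per transfer."""
    return data_rate_mtps * bus_width_bits / 8 / 1000


def check_budget(masters: List[Master], data_rate_mtps: int, bus_width_bits: int,
                 controllers: int = 1, efficiency: float = 0.7) -> Tuple[float, float, bool]:
    """Return (aggregate demand, assumed achievable bandwidth, fits).

    `controllers` models identical interleaved memory controllers sharing the load;
    `efficiency` is a placeholder derating for SDRAM overhead.
    """
    demand = sum(m.read_gbps + m.write_gbps for m in masters)
    achievable = controllers * ddr_peak_gbps(data_rate_mtps, bus_width_bits) * efficiency
    return demand, achievable, demand <= achievable


if __name__ == "__main__":
    # Hypothetical traffic model: two masters sharing DDR4-3200 x64 memory.
    masters = [Master("dma", 8.0, 4.0), Master("accel", 6.0, 2.0)]
    for n in (1, 2):
        demand, achievable, fits = check_budget(masters, 3200, 64, controllers=n)
        print(f"{n} controller(s): demand {demand:.1f} GB/s, "
              f"achievable {achievable:.2f} GB/s, fits={fits}")
```

With these placeholder numbers, 20 GB/s of demand exceeds the roughly 17.9 GB/s one controller is assumed to deliver, while two interleaved controllers leave headroom. The read/write mix also matters: the same total with frequent direction changes costs more bus turnaround time, which a single efficiency factor hides.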
Follow Pin and PCB Requirements: Ensure that your DDRMC pinout and PCB meet all requirements, and then validate them with signal integrity simulations. The following tools are available to design and simulate your PCB:
Xilinx documentation is organized around a set of user design processes to help you find relevant content for your design needs. Visit these Design Process Hubs for complete information related to your design process: