
Versal Embedded Memory/FIFO Generator and XPM_MEMORY/FIFO: Introduction and Debugging Techniques.

pthakare
Moderator

Introduction

 

This blog entry covers important information to consider when using the Embedded Memory Generator, the Embedded FIFO Generator, and the XPM_MEMORY/XPM_FIFO macros in designs targeted at Versal™ ACAPs.

It also includes links to relevant references, test designs and test benches.

You might also want to refer to these other blog entries, which are very helpful when targeting your design to Versal.

 

Embedded Memory Generator (EMG)

 

Introduction

Before beginning, it is helpful to be familiar with the Block Memory Generator (BMG) core. The BMG builds memories from embedded block RAM resources in UltraScale™, UltraScale+™, Zynq®-7000, 7 series, and mature devices (Spartan-6, Virtex-5, etc.).

The EMG core is a similar memory constructor that generates area- and performance-optimized memories using embedded block RAM and UltraRAM resources in Versal devices.

You can find the details of these resources here.

Changes and Enhancements compared to BMG

 

  • Unlike the BMG, which can be used in both the RTL flow (IP catalog) and the IP integrator (IPI) flow, the EMG can only be used in the IP integrator flow.

    If you are asking yourself "then how can I construct the memory in the RTL flow?", the recommendation is to use the XPM_MEMORY macros, which are covered later in this article. The other alternatives are inference and direct primitive instantiation.

             


  • Memory initialization is supported in both the Memory Controller and Stand Alone operating modes. In the BMG, memory initialization is only supported in Stand Alone mode.


  • The EMG supports Error Correction Capability and Error Injection.


  • The EMG offers a feature called Auto Sleep Mode (valid only when the MEMORY_PRIMITIVE is UltraRAM) that looks ahead to RAM accesses through a variable-length input pipeline, and dynamically sleeps when none are pending.


 

  • An additional enhancement is that you can explicitly specify the Cascade Height, enable or disable assertions, and set write protection.
    Cascade Height specifies the number of block RAMs/UltraRAMs in a cascading structure. The maximum value is 16 for block RAM and 64 for UltraRAM.
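The relationship between Cascade Height and the number of primitives can be sketched in Python. This is a simplified, hypothetical sizing model, not how the EMG or Vivado actually maps a memory. It assumes every primitive holds a fixed number of words at the chosen width, for example 4096 words for a 72-bit-wide UltraRAM. It only enforces the maximum values quoted above (16 for block RAM, 64 for UltraRAM).

```python
import math

# Maximum Cascade Height values quoted in this article.
MAX_CASCADE_HEIGHT = {"block": 16, "ultra": 64}


def cascade_plan(depth_words, primitive_depth, primitive, cascade_height):
    """Return (primitives_needed, cascade_chains) for a depth-only split.

    Simplified model (assumption): each primitive stores `primitive_depth`
    words at the chosen width, and each cascade chain links at most
    `cascade_height` primitives.
    """
    limit = MAX_CASCADE_HEIGHT[primitive]
    if not 1 <= cascade_height <= limit:
        raise ValueError(f"cascade height for {primitive} must be 1..{limit}")
    primitives = math.ceil(depth_words / primitive_depth)
    chains = math.ceil(primitives / cascade_height)
    return primitives, chains


if __name__ == "__main__":
    # Hypothetical 64K-deep x 72-bit memory built from 4096 x 72 UltraRAMs.
    print(cascade_plan(65536, 4096, "ultra", 8))   # (16, 2)
    print(cascade_plan(65536, 4096, "ultra", 16))  # (16, 1)
```

A larger Cascade Height generally puts more primitives in one chain, so fewer chains need to be multiplexed, at the cost of a longer cascade path. Which trade-off meets timing depends on the design.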

 

Test Block design

There is no example design/test bench for this IP core release (Vivado 2020.2). The following test block design illustrates the use of the EMG.

You might also want to refer to this example of a basic read/write to AXI BRAM from the PS-APU through the NoC in Versal.

[Figure: EMG test block design]

Hardware Debug Techniques

  1. Before testing on hardware, perform post-place and route timing simulation.
    If the design works in post-place and route timing simulation but problems are seen on hardware, this could indicate a PCB issue.
    Ensure that all clock sources are active and clean.
  2. If data corruption is observed, check the timing. The setup and hold times of all control signals, including Enable (EN), must always be met during assertion and deassertion.
  3. Run the methodology check (Report tab > Report Methodology, or report_methodology in the Tcl console). The methodology check is very useful for finding missing or incorrect timing constraints and IP-related design rules that have not been followed. (Refer to this blog for more details on report_methodology.)
  4. If using MMCMs in the design, ensure that all MMCMs have obtained lock by monitoring the locked port.
  5. If your outputs go to 0, check your licensing.
  6. When the EMG is configured in Memory Controller mode, ensure that the READ_LATENCY settings in the EMG and the AXI BRAM Controller match.
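Item 6 can be illustrated with a simplified, cycle-based Python model. This is not Xilinx code, and simulate_reads is a hypothetical helper. The model issues back-to-back reads. It shows which data the controller captures when its assumed read latency differs from the latency the memory actually has.

```python
def simulate_reads(mem, addrs, mem_latency, ctrl_latency):
    """Return the value the controller captures for each back-to-back read.

    Hypothetical cycle model: one read is issued per clock cycle. The memory
    drives mem[addr] onto its output `mem_latency` cycles after the address,
    and the controller samples the output `ctrl_latency` cycles after issuing
    the read. None marks a cycle with no valid read data in this model (real
    hardware would show a stale or undefined value instead).
    """
    last_cycle = len(addrs) + max(mem_latency, ctrl_latency)
    dout = [None] * (last_cycle + 1)
    for cycle, addr in enumerate(addrs):
        dout[cycle + mem_latency] = mem[addr]
    return [dout[cycle + ctrl_latency] for cycle in range(len(addrs))]


if __name__ == "__main__":
    mem = [10, 20, 30, 40]
    reads = [0, 1, 2, 3]
    print(simulate_reads(mem, reads, 2, 2))  # [10, 20, 30, 40] latencies match
    print(simulate_reads(mem, reads, 2, 1))  # [None, 10, 20, 30] sampled too early
    print(simulate_reads(mem, reads, 1, 2))  # [20, 30, 40, None] sampled too late
```

A mismatch in either direction shifts every captured word by the latency difference. On hardware this looks like data corruption even though timing is met.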

     

Embedded FIFO Generator (EFG)

 

Introduction

In addition to the BMG, it is helpful to be familiar with the FIFO Generator IP core. The FIFO Generator builds FIFOs from embedded block RAM, distributed RAM, or built-in FIFO resources in UltraScale, UltraScale+, Zynq-7000, 7 series, and mature devices (Spartan-6, Virtex-5, etc.).

Like the FIFO Generator, the EFG for Versal is a fully verified first-in first-out (FIFO) memory queue for applications requiring in-order storage and retrieval.

Changes and Enhancements compared to FIFO Generator

  • Unlike the FIFO Generator, which can be used in both the RTL flow (IP catalog) and the IP integrator flow, the EFG can only be used in the IP integrator (IPI) flow.
    For FIFO implementation in the RTL flow, use the XPM_FIFO macro, which is covered in the next section of this article.
  • MEMORY_TYPE = URAM is supported for common clock FIFO implementations.
    Note: Versal devices have no hard (built-in) FIFOs, so the Built-in FIFO option is not available for FIFO implementation.


  • Configurable read latency has been added for Standard Read Mode. The block RAM macros and UltraRAM macros have built-in embedded registers that can be used to pipeline data and improve macro timing.


  • Single Bit and Double Bit Error Injection capability has been added.


  • As with the EMG, the user can explicitly specify Cascade Height and Enable/Disable assertions.
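Single- and double-bit error injection exercise a single-error-correct, double-error-detect (SECDED) code. The Python sketch below uses a textbook extended Hamming (8,4) code on 4 data bits purely as an illustration. It is not the ECC code implemented in the Versal block RAM/UltraRAM hardware, which protects much wider words. Flipping stored bits by hand stands in for error injection. The status strings are named after the sbiterr/dbiterr flags of the FIFO cores.

```python
def encode(data4):
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword.

    Index 0 holds overall parity; indices 1..7 form a Hamming(7,4) code with
    parity bits at positions 1, 2, 4 and data bits at positions 3, 5, 6, 7.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(data4 >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # make the total parity even
    return code


def decode(code):
    """Return (data, status), where status is 'ok', 'sbiterr' or 'dbiterr'."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set-bit positions is 0 for a clean word
    odd_parity = sum(code) % 2 == 1
    fixed = list(code)
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        fixed[syndrome] ^= 1  # single-bit error: flip it back (0 = parity bit)
        status = "sbiterr"
    else:
        status = "dbiterr"  # two flips: detected but not correctable
    data = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return data, status


if __name__ == "__main__":
    word = encode(0b1011)
    single = list(word)
    single[5] ^= 1                 # like a single-bit error injection
    double = list(word)
    double[3] ^= 1
    double[6] ^= 1                 # like a double-bit error injection
    print(decode(word))            # (11, 'ok')
    print(decode(single))          # (11, 'sbiterr')
    print(decode(double)[1])       # dbiterr
```

Injecting a double-bit error is a way to confirm that downstream logic reacts to an uncorrectable error. A single-bit injection should be corrected transparently.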

 

Test Block design

There is no example design/test bench for this IP core release (Vivado 2020.2). How the Embedded FIFO Generator core is used depends on the interface required:

  • Native: Implements a native FIFO (rarely used in the IP integrator flow).
  • AXI Full/Lite: Implements an AXI4 or AXI4-Lite FIFO in First-Word-Fall-Through mode.
  • AXI Stream: Implements an AXI4-Stream FIFO in First-Word-Fall-Through mode.
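The AXI interfaces above use First-Word-Fall-Through (FWFT) mode. The hypothetical FifoReadModel class below contrasts FWFT with standard read mode at a purely behavioral level. It is not cycle-accurate: real FWFT FIFOs take a few clock cycles before the first word falls through, and that latency is ignored here.

```python
from collections import deque


class FifoReadModel:
    """Hypothetical behavioral model of a FIFO read port (not cycle-accurate).

    mode="std":  dout changes only when a read is requested (rd_en).
    mode="fwft": the oldest word is already presented on dout; a read
                 (rd_en) consumes it so that the next word can fall through.
    """

    def __init__(self, mode):
        if mode not in ("std", "fwft"):
            raise ValueError("mode must be 'std' or 'fwft'")
        self.mode = mode
        self.q = deque()
        self.dout = None  # None: no valid word has been presented

    def write(self, value):
        self.q.append(value)
        if self.mode == "fwft" and self.dout is None:
            self.dout = self.q.popleft()  # head word falls through to dout

    def read(self):
        """One clock cycle with rd_en asserted."""
        if self.mode == "std":
            if self.q:  # a read of an empty FIFO is ignored; dout holds
                self.dout = self.q.popleft()
        else:
            # Consume the word on dout; the next one (if any) falls through.
            self.dout = self.q.popleft() if self.q else None


if __name__ == "__main__":
    std, fwft = FifoReadModel("std"), FifoReadModel("fwft")
    for fifo in (std, fwft):
        fifo.write(1)
        fifo.write(2)
    print(std.dout, fwft.dout)  # None 1  (FWFT shows data before any read)
    std.read()
    fwft.read()
    print(std.dout, fwft.dout)  # 1 2
```

In FWFT mode, a read acts as an acknowledgment of the word already on the output. This maps naturally onto AXI valid/ready handshakes.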


 

Example:

Embedded FIFO generator configured with Interface Type = AXI4 Full (Vivado 2020.2)


 

Hardware/Interface Debug Techniques

  1. As with the EMG, first perform post-place and route timing simulation.
    If the design works in post-place and route timing simulation but problems are seen on hardware, this could indicate a PCB issue. Ensure that all clock sources are active and clean.
  2. If data corruption is observed, check the timing. The setup and hold times of all control signals, including Enable (EN), must always be met during assertion and deassertion.
  3. Run the methodology check as suggested in Debug Techniques for EMG.
  4. If using MMCMs in the design, ensure that all MMCMs have obtained lock by monitoring the locked port.
  5. If your outputs go to 0, check your licensing.
  6. Ensure that wr_en and rd_en are not toggling during reset.
  7. If an independent clock FIFO is used, ensure that wr_en comes from the write clock domain and rd_en comes from the read clock domain.
  8. If data is not being written, check whether FULL is High (the core does not accept writes while full), that the core is not in reset, and that wr_en is synchronous to the write clock.
  9. If data is not being read, check whether EMPTY is High (the core cannot be read while empty), that the core is not in reset, and that rd_en is synchronous to the read clock.
  10. For reset, the clock(s) must be available when the reset is applied.
    If for any reason the clock(s) are lost at the time of reset, release the reset only when the clock(s) are available again. Violating this requirement might cause unexpected behavior. For example, the busy signals might get stuck, and the device would then need to be reconfigured.
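Items 6, 8, and 9 can be illustrated with a small behavioral model. The GuardedFifo class below is hypothetical and deliberately simplified. It has a single clock, so it does not model the clock-domain crossing in item 7 or the busy signals in item 10. It ignores requests while in reset. It reports dropped requests through overflow/underflow flags named after the status outputs that FIFO cores commonly provide.

```python
from collections import deque


class GuardedFifo:
    """Hypothetical single-clock behavioral model of FIFO handshakes.

    A write is dropped while the FIFO is full and a read is dropped while it
    is empty; `overflow`/`underflow` flag the dropped request for that cycle.
    All requests are ignored while reset is asserted.
    """

    def __init__(self, depth):
        self.depth = depth
        self.q = deque()
        self.rst = False
        self.overflow = False
        self.underflow = False

    @property
    def full(self):
        return len(self.q) == self.depth

    @property
    def empty(self):
        return not self.q

    def reset(self, asserted):
        self.rst = asserted
        if asserted:
            self.q.clear()

    def cycle(self, wr_en=False, din=0, rd_en=False):
        """Apply one clock edge; return the word read, or None."""
        self.overflow = self.underflow = False
        if self.rst:
            return None  # requests during reset are not serviced
        was_full, was_empty = self.full, self.empty  # flags before the edge
        dout = None
        if rd_en:
            if was_empty:
                self.underflow = True
            else:
                dout = self.q.popleft()
        if wr_en:
            if was_full:
                self.overflow = True
            else:
                self.q.append(din)
        return dout


if __name__ == "__main__":
    f = GuardedFifo(depth=2)
    f.reset(True)
    f.cycle(wr_en=True, din=7)    # ignored: FIFO is in reset
    f.reset(False)
    f.cycle(wr_en=True, din=1)
    f.cycle(wr_en=True, din=2)
    f.cycle(wr_en=True, din=3)    # dropped: FIFO is full
    print(f.full, f.overflow)     # True True
    print(f.cycle(rd_en=True))    # 1
```

When debugging on hardware, probing the equivalent full/empty flags together with wr_en/rd_en usually shows quickly which of these conditions is blocking the transfer.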

 

Xilinx Parameterized Macro (XPM_MEMORY and XPM_FIFO)

 

Introduction

 

These macros are part of the Xilinx Parameterized Macro (XPM) library in Vivado. Because they are parameterized, they are easier to use than directly instantiated primitives.

The synthesis tools will automatically expand the macros to their underlying primitives.


 

Using XPMs

Refer to the "Enabling Xilinx Parameterized Macros" section in the libraries guide.

Instantiation Templates are available in Vivado as well as in a downloadable ZIP file.


 

Testbench

XPM_FIFO 

XPM_MEMORY  (Not yet available)

Hardware Debug Techniques

  1. The "Introduction" section of each macro in the libraries guide covers the reset and clocking guidelines. If you see unintended FIFO behavior, double-check the reset guidelines.
  2. There might be an issue where the memory type is set to "ultra" (MEMORY_PRIMITIVE for XPM_MEMORY, FIFO_MEMORY_TYPE for XPM_FIFO) but the FIFO/memory is constructed using block RAMs.
    This happens when the clocking mode is not set correctly. UltraRAM is a single-clock, two-port synchronous memory. XPM_MEMORY therefore needs CLOCKING_MODE = "common_clock", and an XPM_FIFO must be a common-clock FIFO (xpm_fifo_sync).
  3. Make sure to run the methodology check as suggested in previous sections.
  4. If the XPM_FIFO wr_data_count/rd_data_count is not changing correctly, check the count widths.
    To reflect the correct value, WR_DATA_COUNT_WIDTH should be log2(FIFO_WRITE_DEPTH)+1 and RD_DATA_COUNT_WIDTH should be log2(FIFO_READ_DEPTH)+1, where FIFO_READ_DEPTH = FIFO_WRITE_DEPTH * WRITE_DATA_WIDTH / READ_DATA_WIDTH.
    Consider the following use case, in which the wr_data_count value output by the FIFO is half of the expected write data count: wr_data_count increments by one for every two words written into the FIFO.

    xpm_fifo_async #(
      .CDC_SYNC_STAGES(2),
      .DOUT_RESET_VALUE("0"),
      .ECC_MODE("no_ecc"),
      .FIFO_MEMORY_TYPE("auto"),
      .FIFO_READ_LATENCY(1),
      .FIFO_WRITE_DEPTH(512),
      .FULL_RESET_VALUE(0),
      .PROG_EMPTY_THRESH(10),
      .PROG_FULL_THRESH(10),
      .RD_DATA_COUNT_WIDTH(11),
      .READ_DATA_WIDTH(128),
      .READ_MODE("std"),
      .RELATED_CLOCKS(0),
      .USE_ADV_FEATURES("1707"),  // enables wr_data_count and rd_data_count, among other flags
      .WAKEUP_TIME(0),
      .WRITE_DATA_WIDTH(512),
      .WR_DATA_COUNT_WIDTH(9)     // too narrow: FIFO_WRITE_DEPTH = 512 needs log2(512)+1 = 10
    ) xpm_fifo_async_inst (
      // port connections omitted in this example
    );

[Figure: simulation of the configuration above, showing wr_data_count with WR_DATA_COUNT_WIDTH = 9]

Because the FIFO write depth is 512, WR_DATA_COUNT_WIDTH should be set to 10 for an accurate wr_data_count.
With fewer bits, the output is truncated to the MSBs of the correct count.
In the simulation above, the 9 MSBs of the 10-bit count are assigned to wr_data_count[8:0]. The reported value is therefore the true count shifted right by one bit, that is, halved.

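The width rule and the truncation seen in the simulation can be checked with a few lines of Python. This is a sketch: count_width, read_depth, and truncated_count are hypothetical helpers. truncated_count simply models the MSB truncation described above, not the macro's internal logic.

```python
def count_width(depth):
    """Port width needed for an exact data count: log2(depth) + 1.

    The extra bit lets the count reach `depth` itself (completely full).
    XPM_FIFO depths are powers of two, so log2(depth) + 1 == bit_length().
    """
    if depth <= 0 or depth & (depth - 1):
        raise ValueError("depth must be a power of two")
    return depth.bit_length()


def read_depth(write_depth, write_width, read_width):
    """FIFO_READ_DEPTH = FIFO_WRITE_DEPTH * WRITE_DATA_WIDTH / READ_DATA_WIDTH."""
    return write_depth * write_width // read_width


def truncated_count(true_count, full_width, port_width):
    """Model of the behavior described above: a narrow port keeps the MSBs."""
    return true_count >> (full_width - port_width)


if __name__ == "__main__":
    wr_width = count_width(512)                         # 10
    rd_width = count_width(read_depth(512, 512, 128))   # 2048 words -> 12
    print(wr_width, rd_width)                           # 10 12
    for written in (2, 4, 6):
        # With WR_DATA_COUNT_WIDTH = 9 the count advances once per two writes.
        print(written, truncated_count(written, wr_width, 9))  # 2 1 / 4 2 / 6 3
```

By the same rule, the read side of this example (2048 words of 128 bits) needs RD_DATA_COUNT_WIDTH = 12 for an exact rd_data_count. That is one bit more than the 11 used in the instantiation.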

 

Additional Resources

  1. Using XPM Memory in IP Integrator  (page 74): 
    https://www.xilinx.com/support/documentation/sw_manuals/xilinx2020_2/ug898-vivado-embedded-design.pdf.
  2. Achieving optimal timing performance using automatic pipelining of a URAM matrix in Vivado synthesis:
    https://forums.xilinx.com/t5/Design-and-Debug-Techniques-Blog/Achieving-optimal-timing-performance-by-automatic-pipelining-of/ba-p/971760