Visitor HSCD12
Registered: 10-07-2015

Intra-clock setup violation in XPM_FIFO_SYNC

Hi,

We currently have a problem with several intra-clock setup violations when using XPM_FIFO_SYNC
(Vivado 2017.3, clock frequency 156.25 MHz).

The affected paths are
-from

*/ge_xpm_fifo_sync_inst.xpm_fifo_sync_inst/xpm_fifo_base_inst/gen_sdpram.xpm_memory_base_inst/gen_wr_a.gen_word_narrow.mem_reg_bram_4/CLKBWRCLK

-to

*/ge_xpm_fifo_sync_inst.xpm_fifo_sync_inst/xpm_fifo_base_inst/gen_sdpram.xpm_memory_base_inst/gen_wr_a.gen_word_narrow.mem_reg_bram_1/ENBWREN


What could be the problem?
Do we have to manually add timing constraints when using this parameterized macro?

9 Replies
Xilinx Employee
Registered: 05-14-2008

Re: Intra-clock setup violation in XPM_FIFO_SYNC

Can you post the code of your XPM_FIFO_SYNC instantiation and attach the timing report?

-vivian

Visitor HSCD12
Registered: 10-07-2015

Re: Intra-clock setup violation in XPM_FIFO_SYNC
-- Requires "library xpm; use xpm.vcomponents.all;" in the design unit's context clause.
xpm_fifo_sync_inst : xpm_fifo_sync
      generic map (
         FIFO_MEMORY_TYPE         => "block",                   --string; "auto", "block", "distributed", or "ultra" ;
         ECC_MODE                 => "no_ecc",                  --string; "no_ecc" or "en_ecc";
         FIFO_WRITE_DEPTH         => 8192,                      --positive integer
         WRITE_DATA_WIDTH         => 72,                        --positive integer
         WR_DATA_COUNT_WIDTH      => 14,                        --positive integer
         PROG_FULL_THRESH         => 8001,                      --positive integer
         FULL_RESET_VALUE         => 0,                         --positive integer; 0 or 1
         USE_ADV_FEATURES         => "0707",
         READ_MODE                => "std",                     --string; "std" or "fwft";
         FIFO_READ_LATENCY        => 1,                         --positive integer;
         READ_DATA_WIDTH          => 72,                        --positive integer
         RD_DATA_COUNT_WIDTH      => 14,                        --positive integer
         PROG_EMPTY_THRESH        => 10,                        --positive integer
         DOUT_RESET_VALUE         => "0",                       --string
         WAKEUP_TIME              => 0                          --positive integer; 0 or 2;
      )
      port map (
         rst                     => reset,
         wr_clk                  => wr_clk,
         wr_en                   => fifo_wr,
         din                     => control_din & raw_din,
         full                    => fifo_full,
         overflow                => OPEN,
         wr_rst_busy             => OPEN,
         rd_en                   => fifo_rd,
         dout                    => fifo_dout,
         empty                   => fifo_empty,
         underflow               => OPEN,
         rd_rst_busy             => OPEN,
         prog_full               => fifo_progFull,
         wr_data_count           => fifo_wrCnt,
         prog_empty              => OPEN,
         rd_data_count           => fifo_rdCnt,
         sleep                   => '0',
         injectsbiterr           => '0',
         injectdbiterr           => '0',
         sbiterr                 => OPEN,
         dbiterr                 => OPEN
        );

The input signals to xpm_fifo_sync_inst are registered before they enter the XPM_FIFO_SYNC.

Xilinx Employee
Registered: 05-14-2008

Re: Intra-clock setup violation in XPM_FIFO_SYNC

The BRAMs are not well pipelined because you have FIFO_READ_LATENCY set to 1.

Try increasing this number.
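
For example (just a sketch of the change relative to your instantiation above; the value 2 is only an illustration, not a verified fix):

         READ_MODE                => "std",                     --string; "std" or "fwft";
         FIFO_READ_LATENCY        => 2,                         --2 or more lets the macro add pipeline registers on the BRAM read path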

-vivian

Visitor HSCD12
Registered: 10-07-2015

Re: Intra-clock setup violation in XPM_FIFO_SYNC

Unfortunately, that is not an option for us.
We will probably revert to built-in FIFOs instead, since they do not lead to a timing violation in our design.

Guide avrumw
Registered: 01-23-2009

Re: Intra-clock setup violation in XPM_FIFO_SYNC

The startpoint you showed doesn't make sense - that is a clock pin of one of the RAMs (so probably not the actual startpoint).

Show us the complete timing path.

Avrum

Visitor HSCD12
Registered: 10-07-2015

Re: Intra-clock setup violation in XPM_FIFO_SYNC

The complete paths can be found in the timing report attached to the message above.

Teacher drjohnsmith
Registered: 07-09-2009

Re: Intra-clock setup violation in XPM_FIFO_SYNC

If I remember correctly, in the XPM_FIFO_SYNC everything is timed off the write clock.

There is a great app note on these somewhere.

You need the async FIFO, I think, to have separate read and write clocks.

And like all the block RAMs in the FPGAs, they are assumed to be synchronous, using the output registers to achieve the speed.

Teacher drjohnsmith
Registered: 07-09-2009

Re: Intra-clock setup violation in XPM_FIFO_SYNC

And re-reading the question, I don't know why I mentioned the read clock!

Time for a coffee. Oh, how I wish this thing allowed us to edit posts, but alas, the old IE I use does not allow that.

 

Guide avrumw
Registered: 01-23-2009

Re: Intra-clock setup violation in XPM_FIFO_SYNC (Accepted Solution)

OK - the reporting of the path is a bit weird since it starts in the block RAM, but...

The problem is very simple - you are asking for an 8192x72 FIFO. Since the maximum size of a FIFO36 is 512x72, this requires 16 FIFO36 instances. When building FIFOs that are larger than one BRAM, the tool does depth expansion - it assumes the first RAM contains the first 512 entries of the FIFO, the second contains the next 512, and so on.

Thus, to pop data, the data may come from any of the 16 FIFO36 instances. To get this to a single output port, the tools use a combination of the dedicated RAM-to-RAM cascade logic (the 72-bit 2:1 MUX in each RAM) and fabric logic. From your timing report, it is using the dedicated cascade path for 4 RAMs, and then (presumably) merging the other 3 groups of 4 RAMs using some of the LUTs you see at the end of the path.

The net result is that the tool simply can't do this at the speed you want. These 16 RAMBs are "big blocks" - they physically take a reasonable amount of space on the die. Furthermore, RAM reads are "slow" - the clock->Q of the first RAM (alone) is 1.596ns. Add to this the cascade through the next 3 (which take 0.36ns each) and 2.62ns of your 6.4ns period are already consumed (for the group of 4 RAMs). Then these 4 need to be merged with the other 3 groups of 4 and MUXed together.

Then, it appears that you have some logic that processes the returned data and, based on that, determines whether or not to pop the FIFO again (which really pops any one of the 16 RAMs). The net result is a path that cannot be done in 6.4ns.

In general, when working with block RAMs (or FIFOs), you need to take into account that the RAMs are

  • Physically far apart from each other
    • This may mean that some RAMs get pushed physically far from the logic driving them/receiving their data
  • Slow to read
    • This is why the RAMs have the optional DO_REG, which decreases the clock->Q delay but increases the read latency by one clock

As a result, it is "good practice" to

  • pipeline the address, control and write data going to RAMBs if you can (especially large ones)
  • pipeline the return data coming back from RAMs (if you can) - sometimes even twice at higher frequencies (see the sketch after the next paragraph)
    • one register to use the DO_REG and one for the routing delay

A corollary of this: never try to drive the input of one RAM with the output of another RAM...
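
As a minimal sketch of that extra fabric register stage (the names fifo_dout_q, rd_d1 and rd_d2 are hypothetical; the other signals are from the instantiation posted above):

   signal fifo_dout_q  : std_logic_vector(71 downto 0);  -- re-registered read data
   signal rd_d1, rd_d2 : std_logic;                      -- fifo_rd delayed to match the data

   ...

   process (wr_clk)   -- sync FIFO: both sides run on the same clock
   begin
      if rising_edge(wr_clk) then
         fifo_dout_q <= fifo_dout;   -- breaks the long BRAM->fabric path
         rd_d1       <= fifo_rd;     -- one delay for FIFO_READ_LATENCY = 1
         rd_d2       <= rd_d1;       -- second delay for the extra dout register
      end if;
   end process;

Downstream logic then consumes fifo_dout_q, qualified by rd_d2, instead of using fifo_dout directly.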

While there may be some things that you can play with in the tool (like changing the bram_max_cascade_height value), this is not likely to be enough. Most likely you will need to re-architect this portion of the design. Maybe

  • Try to implement the FIFO using width expansion rather than depth expansion (see the sketch after this list)
    • The tools will not do this for you automatically - you will have to instantiate it manually
  • Implement some kind of FIFO depth-expansion mechanism (where the first of the 8193 words of the FIFO is actually implemented in registers)
    • Something similar to how first-word-fall-through is implemented
  • others
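
As a rough, untested sketch of the width-expansion idea (gen_lanes, fifo_din, lane_full and lane_empty are hypothetical names, and I have not verified how the macro maps each lane to BRAM): slice the 72-bit word into eighteen 4-bit lanes, each with its own full-depth FIFO. An 8192x4 memory is a native RAMB36 configuration, so no read cascade should be needed, at the cost of 18 BRAMs instead of 16.

   -- Hypothetical width expansion: 18 narrow full-depth FIFO lanes in parallel.
   -- All lanes share wr_en/rd_en, so they fill and drain in lockstep, and the
   -- flags of lane 0 can serve for the whole FIFO.
   -- (assumes: signal lane_full, lane_empty : std_logic_vector(17 downto 0);)
   gen_lanes : for i in 0 to 17 generate
      lane_inst : xpm_fifo_sync
         generic map (
            FIFO_MEMORY_TYPE  => "block",
            FIFO_WRITE_DEPTH  => 8192,
            WRITE_DATA_WIDTH  => 4,
            READ_DATA_WIDTH   => 4,
            READ_MODE         => "std",
            FIFO_READ_LATENCY => 1
         )
         port map (
            rst           => reset,
            wr_clk        => wr_clk,
            wr_en         => fifo_wr,
            din           => fifo_din(4*i+3 downto 4*i),
            rd_en         => fifo_rd,
            dout          => fifo_dout(4*i+3 downto 4*i),
            full          => lane_full(i),
            empty         => lane_empty(i),
            sleep         => '0',
            injectsbiterr => '0',
            injectdbiterr => '0'
         );
   end generate;

   fifo_full  <= lane_full(0);    -- all lanes fill identically
   fifo_empty <= lane_empty(0);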

None of these will happen automatically with the XPM_FIFO_SYNC - you will have to implement this yourself.

Avrum
