05-29-2019 01:05 AM
Hi,
we are currently seeing several intra-clock setup violations when using XPM_FIFO_SYNC.
(Vivado 2017.3, clock frequency 156.25 MHz)
The affected paths are
-from
*/ge_xpm_fifo_sync_inst.xpm_fifo_sync_inst/xpm_fifo_base_inst/gen_sdpram.xpm_memory_base_inst/gen_wr_a.gen_word_narrow.mem_reg_bram_4/CLKBWRCLK
-to
*/ge_xpm_fifo_sync_inst.xpm_fifo_sync_inst/xpm_fifo_base_inst/gen_sdpram.xpm_memory_base_inst/gen_wr_a.gen_word_narrow.mem_reg_bram_1/ENBWREN
What could be the problem?
Do we have to manually add timing constraints when using this parameterized macro?
05-29-2019 01:30 AM
Can you post the code of your XPM_FIFO_SYNC instantiation and attach the timing report?
-vivian
05-29-2019 02:27 AM
xpm_fifo_sync_inst : xpm_fifo_sync
  generic map (
    FIFO_MEMORY_TYPE    => "block",   --string; "auto", "block", "distributed", or "ultra"
    ECC_MODE            => "no_ecc",  --string; "no_ecc" or "en_ecc"
    FIFO_WRITE_DEPTH    => 8192,      --positive integer
    WRITE_DATA_WIDTH    => 72,        --positive integer
    WR_DATA_COUNT_WIDTH => 14,        --positive integer
    PROG_FULL_THRESH    => 8001,      --positive integer
    FULL_RESET_VALUE    => 0,         --positive integer; 0 or 1
    USE_ADV_FEATURES    => "0707",
    READ_MODE           => "std",     --string; "std" or "fwft"
    FIFO_READ_LATENCY   => 1,         --positive integer
    READ_DATA_WIDTH     => 72,        --positive integer
    RD_DATA_COUNT_WIDTH => 14,        --positive integer
    PROG_EMPTY_THRESH   => 10,        --positive integer
    DOUT_RESET_VALUE    => "0",       --string
    WAKEUP_TIME         => 0          --positive integer; 0 or 2
  )
  port map (
    rst           => reset,
    wr_clk        => wr_clk,
    wr_en         => fifo_wr,
    din           => control_din & raw_din,
    full          => fifo_full,
    overflow      => OPEN,
    wr_rst_busy   => OPEN,
    rd_en         => fifo_rd,
    dout          => fifo_dout,
    empty         => fifo_empty,
    underflow     => OPEN,
    rd_rst_busy   => OPEN,
    prog_full     => fifo_progFull,
    wr_data_count => fifo_wrCnt,
    prog_empty    => OPEN,
    rd_data_count => fifo_rdCnt,
    sleep         => '0',
    injectsbiterr => '0',
    injectdbiterr => '0',
    sbiterr       => OPEN,
    dbiterr       => OPEN
  );
The input signals to xpm_fifo_sync_inst are registered before they enter the XPM_FIFO_SYNC.
05-29-2019 02:48 AM
The BRAMs are not well pipelined as you have "FIFO_READ_LATENCY" set to 1.
Try increasing this number.
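For example, changing only this generic in the instantiation above (3 is just an illustrative value - use the smallest latency your design can tolerate):

    READ_MODE         => "std",  -- unchanged; "std" allows latencies > 1
    FIFO_READ_LATENCY => 3,      -- was 1; adds output register stages after
                                 -- the BRAM read and the wide output mux

The extra output register stages may give the slow BRAM read and multiplexing path more room to meet timing.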
-vivian
05-29-2019 04:20 AM
Unfortunately, that is not an option for us.
We will probably revert to built-in FIFOs instead, since they do not lead to a timing violation in our design.
06-02-2019 04:46 PM
The startpoint you showed doesn't make sense - that is a clock pin of one of the RAMs (so probably not the actual startpoint).
Show us the complete timing path.
Avrum
06-03-2019 03:38 AM
The complete path(s) can be found in the timing report attached to the message above.
06-03-2019 03:51 AM
If I remember correctly...
in the XPM_FIFO_SYNC, everything is timed off of the write clock.
There is a great app note on these somewhere.
I think you need the async FIFO to have separate read and write clocks.
And like all the block RAMs in the FPGAs, they are assumed to be synchronous, with output registers used to achieve speed.
06-03-2019 03:53 AM
And re-reading the question, I don't know why I mentioned the read clock!
Time for a coffee. Oh, how I wish this forum allowed us to edit posts, but alas the old IE I use does not allow that.
06-03-2019 01:56 PM
OK - the reporting of the path is a bit weird since it starts in the block RAM, but...
The problem is very simple - you are asking for an 8192x72 FIFO. Since the maximum size of a FIFO36 is 512x72, this requires 16 FIFO36 instances. When building FIFOs larger than one BRAM, the tool does depth expansion - it assumes the first RAM contains the first 512 entries of the FIFO, the 2nd contains the next 512, etc...
Thus, to pop data, the data may come from any of the 16 FIFO36 instances. To get this to a single output port, the tools use a combination of the dedicated RAM-to-RAM cascade logic (the 72-bit 2:1 MUX in each RAM) and fabric logic. From your timing report, it is using the dedicated cascade path for 4 RAMs, and then (presumably) merging the other 3 groups of 4 RAMs using some of the LUTs you see at the end of the path.
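To make the depth expansion concrete, here is a hypothetical sketch (the signal names are mine, not what the tool actually generates) of how a 13-bit entry address maps onto the 16 RAMs:

    -- Hypothetical illustration: 8192 x 72 built from 16 FIFO36 of
    -- 512 x 72 each (8192 = 16 * 512, so the entry address is 13 bits)
    signal rd_addr : unsigned(12 downto 0);  -- entry number, 0 to 8191
    signal bank    : unsigned(3 downto 0);   -- which FIFO36, 0 to 15
    signal offset  : unsigned(8 downto 0);   -- entry within that FIFO36
    -- ...
    bank   <= rd_addr(12 downto 9);  -- upper 4 bits select the RAM
    offset <= rd_addr(8 downto 0);   -- lower 9 bits address within it
    -- the 72-bit dout must then be selected 16:1 from the banks - the
    -- dedicated cascade MUXes handle groups of 4, fabric LUTs the rest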
The net result is that the tool simply can't do this at the speed you want. These 16 RAMBs are "big blocks" - they physically take a considerable amount of space on the die. Furthermore, RAM reads are "slow" - the clock->Q of the first RAM (alone) is 1.596ns. Add to this the cascade through the next 3 (which take 0.36ns each) and 2.62ns of your 6.4ns period is already consumed (for the group of 4 RAMs). Then these 4 need to be merged with the other 3 groups of 4 and MUXed together.
Then, it appears that you have some logic that processes the returned data, and based on that determines whether or not to pop the RAM again (which really pops any one of the 16 RAMs). The net result is a path that cannot be done in 6.4ns.
In general, when working with block RAMs (or FIFOs), you need to take into account the fact that the RAMs are physically big and their reads are slow.
As a result, it is "good practice" to register the output of a block RAM in fabric flip-flops before doing anything else with it.
A corollary of this is to never try to drive the input of one RAM with the output of another RAM...
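As a sketch of that practice (hypothetical signal names again), capture the FIFO output in fabric registers before anything - including the "pop again" decision - looks at it:

    -- Hypothetical sketch: give the BRAM read + 16:1 mux a full clock
    -- period by registering the FIFO output before using it
    process (clk)
    begin
      if rising_edge(clk) then
        rd_issued      <= fifo_rd and not fifo_empty; -- a read was accepted
        fifo_dout_reg  <= fifo_dout;                  -- data from that read
        fifo_valid_reg <= rd_issued;                  -- aligned with dout_reg
      end if;
    end process;
    -- downstream processing, and the decision to assert fifo_rd again,
    -- operate on fifo_dout_reg / fifo_valid_reg one cycle later

This costs a cycle of latency - essentially the same tradeoff as raising FIFO_READ_LATENCY.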
While there may be some things that you can play with in the tool (like changing the bram_max_cascade_height value), this is not likely to be enough. Most likely you will need to re-architect this portion of the design. Maybe the FIFO can be split into several smaller independent FIFOs, or maybe the wide output multiplexing and the "pop again" decision can be spread over several pipelined clock cycles.
None of these will happen using the XPM_FIFO_SYNC - you will have to implement this yourself.
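One more note: if you do experiment with the cascade height and end up inferring the RAM yourself, newer Vivado versions document a CASCADE_HEIGHT synthesis attribute that can be placed on the memory in RTL - whether 2017.3 honors it is something to check in UG901 for your version:

    -- Hypothetical example on a user-inferred memory signal named "mem"
    attribute cascade_height : integer;
    attribute cascade_height of mem : signal is 1; -- forbid dedicated cascading

Limiting the cascade only moves the merging into fabric LUTs, where you can pipeline it - it does not by itself fix the timing.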
Avrum