mahmoud_xilinx
Observer
Registered: ‎08-11-2019

Pipeline output registers for URAM

Hi,

I'm using Vivado 2018.3 with an xcvu5p device.

In a large design, I've used 8 cascaded URAMs to store my hash tables. To improve timing, I've enabled the output pipeline registers (on Port A) of the URAM blocks for all 8 blocks. The code is attached (HashTable_8_URAM.v).

Synthesis and implementation complete normally and timing is met. Here are the instantiations of the required URAMs:

HashTable_8_URAM T1_QTY (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a), .dina(T1_QTY_dina), .addra(T1_addra), .douta(T1_QTY_douta),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b), .dinb(T1_QTY_dinb), .addrb(T1_addrb), .doutb(T1_QTY_doutb));

HashTable_8_URAM T1_PRC (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a), .dina(T1_PRC_dina), .addra(T1_addra), .douta(T1_PRC_douta),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b), .dinb(T1_PRC_dinb), .addrb(T1_addrb), .doutb(T1_PRC_doutb));

HashTable_8_URAM T2_QTY (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a), .dina(T2_QTY_dina), .addra(T2_addra), .douta(T2_QTY_douta),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b), .dinb(T2_QTY_dinb), .addrb(T2_addrb), .doutb(T2_QTY_doutb));

HashTable_8_URAM T2_PRC (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a), .dina(T2_PRC_dina), .addra(T2_addra), .douta(T2_PRC_douta),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b), .dinb(T2_PRC_dinb), .addrb(T2_addrb), .doutb(T2_PRC_doutb));

HashTable_8_URAM T3_QTY (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a), .dina(T3_QTY_dina), .addra(T3_addra), .douta(T3_QTY_douta),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b), .dinb(T3_QTY_dinb), .addrb(T3_addrb), .doutb(T3_QTY_doutb));

HashTable_8_URAM T3_PRC (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a), .dina(T3_PRC_dina), .addra(T3_addra), .douta(T3_PRC_douta),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b), .dinb(T3_PRC_dinb), .addrb(T3_addrb), .doutb(T3_PRC_doutb));


Next, I increased the number of stored hash tables fourfold. In my top-level file, I replicated the URAM instances four times, as follows:

  HashTable_8_URAM T1_QTY_1 (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a_1), .dina(T1_QTY_dina), .addra(T1_addra), .douta(T1_QTY_douta_1),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b_1), .dinb(T1_QTY_dinb), .addrb(T1_addrb), .doutb(T1_QTY_doutb_1));

HashTable_8_URAM T1_PRC_1 (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a_1), .dina(T1_PRC_dina), .addra(T1_addra), .douta(T1_PRC_douta_1),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b_1), .dinb(T1_PRC_dinb), .addrb(T1_addrb), .doutb(T1_PRC_doutb_1));

HashTable_8_URAM T2_QTY_1 (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a_1), .dina(T2_QTY_dina), .addra(T2_addra), .douta(T2_QTY_douta_1),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b_1), .dinb(T2_QTY_dinb), .addrb(T2_addrb), .doutb(T2_QTY_doutb_1));

HashTable_8_URAM T2_PRC_1 (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a_1), .dina(T2_PRC_dina), .addra(T2_addra), .douta(T2_PRC_douta_1),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b_1), .dinb(T2_PRC_dinb), .addrb(T2_addrb), .doutb(T2_PRC_doutb_1));

HashTable_8_URAM T3_QTY_1 (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a_1), .dina(T3_QTY_dina), .addra(T3_addra), .douta(T3_QTY_douta_1),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b_1), .dinb(T3_QTY_dinb), .addrb(T3_addrb), .doutb(T3_QTY_doutb_1));

HashTable_8_URAM T3_PRC_1 (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a_1), .dina(T3_PRC_dina), .addra(T3_addra), .douta(T3_PRC_douta_1),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b_1), .dinb(T3_PRC_dinb), .addrb(T3_addrb), .doutb(T3_PRC_doutb_1));

/****************************************************************************************************************************************************/
HashTable_8_URAM T1_QTY_2 (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a_2), .dina(T1_QTY_dina), .addra(T1_addra), .douta(T1_QTY_douta_2),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b_2), .dinb(T1_QTY_dinb), .addrb(T1_addrb), .doutb(T1_QTY_doutb_2));

HashTable_8_URAM T1_PRC_2 (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a_2), .dina(T1_PRC_dina), .addra(T1_addra), .douta(T1_PRC_douta_2),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b_2), .dinb(T1_PRC_dinb), .addrb(T1_addrb), .doutb(T1_PRC_doutb_2));

HashTable_8_URAM T2_QTY_2 (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a_2), .dina(T2_QTY_dina), .addra(T2_addra), .douta(T2_QTY_douta_2),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b_2), .dinb(T2_QTY_dinb), .addrb(T2_addrb), .doutb(T2_QTY_doutb_2));

HashTable_8_URAM T2_PRC_2 (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a_2), .dina(T2_PRC_dina), .addra(T2_addra), .douta(T2_PRC_douta_2),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b_2), .dinb(T2_PRC_dinb), .addrb(T2_addrb), .doutb(T2_PRC_doutb_2));

HashTable_8_URAM T3_QTY_2 (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a_2), .dina(T3_QTY_dina), .addra(T3_addra), .douta(T3_QTY_douta_2),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b_2), .dinb(T3_QTY_dinb), .addrb(T3_addrb), .doutb(T3_QTY_doutb_2));

HashTable_8_URAM T3_PRC_2 (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a_2), .dina(T3_PRC_dina), .addra(T3_addra), .douta(T3_PRC_douta_2),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b_2), .dinb(T3_PRC_dinb), .addrb(T3_addrb), .doutb(T3_PRC_doutb_2));

/****************************************************************************************************************************************************/
HashTable_8_URAM T1_QTY_3 (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a_3), .dina(T1_QTY_dina), .addra(T1_addra), .douta(T1_QTY_douta_3),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b_3), .dinb(T1_QTY_dinb), .addrb(T1_addrb), .doutb(T1_QTY_doutb_3));

HashTable_8_URAM T1_PRC_3 (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a_3), .dina(T1_PRC_dina), .addra(T1_addra), .douta(T1_PRC_douta_3),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b_3), .dinb(T1_PRC_dinb), .addrb(T1_addrb), .doutb(T1_PRC_doutb_3));

HashTable_8_URAM T2_QTY_3 (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a_3), .dina(T2_QTY_dina), .addra(T2_addra), .douta(T2_QTY_douta_3),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b_3), .dinb(T2_QTY_dinb), .addrb(T2_addrb), .doutb(T2_QTY_doutb_3));

HashTable_8_URAM T2_PRC_3 (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a_3), .dina(T2_PRC_dina), .addra(T2_addra), .douta(T2_PRC_douta_3),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b_3), .dinb(T2_PRC_dinb), .addrb(T2_addrb), .doutb(T2_PRC_doutb_3));

HashTable_8_URAM T3_QTY_3 (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a_3), .dina(T3_QTY_dina), .addra(T3_addra), .douta(T3_QTY_douta_3),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b_3), .dinb(T3_QTY_dinb), .addrb(T3_addrb), .doutb(T3_QTY_doutb_3));

HashTable_8_URAM T3_PRC_3 (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a_3), .dina(T3_PRC_dina), .addra(T3_addra), .douta(T3_PRC_douta_3),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b_3), .dinb(T3_PRC_dinb), .addrb(T3_addrb), .doutb(T3_PRC_doutb_3));

/****************************************************************************************************************************************************/
HashTable_8_URAM T1_QTY_4 (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a_4), .dina(T1_QTY_dina), .addra(T1_addra), .douta(T1_QTY_douta_4),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b_4), .dinb(T1_QTY_dinb), .addrb(T1_addrb), .doutb(T1_QTY_doutb_4));

HashTable_8_URAM T1_PRC_4 (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a_4), .dina(T1_PRC_dina), .addra(T1_addra), .douta(T1_PRC_douta_4),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b_4), .dinb(T1_PRC_dinb), .addrb(T1_addrb), .doutb(T1_PRC_doutb_4));

HashTable_8_URAM T2_QTY_4 (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a_4), .dina(T2_QTY_dina), .addra(T2_addra), .douta(T2_QTY_douta_4),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b_4), .dinb(T2_QTY_dinb), .addrb(T2_addrb), .doutb(T2_QTY_doutb_4));

HashTable_8_URAM T2_PRC_4 (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a_4), .dina(T2_PRC_dina), .addra(T2_addra), .douta(T2_PRC_douta_4),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b_4), .dinb(T2_PRC_dinb), .addrb(T2_addrb), .doutb(T2_PRC_doutb_4));

HashTable_8_URAM T3_QTY_4 (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a_4), .dina(T3_QTY_dina), .addra(T3_addra), .douta(T3_QTY_douta_4),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b_4), .dinb(T3_QTY_dinb), .addrb(T3_addrb), .doutb(T3_QTY_doutb_4));

HashTable_8_URAM T3_PRC_4 (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a_4), .dina(T3_PRC_dina), .addra(T3_addra), .douta(T3_PRC_douta_4),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b_4), .dinb(T3_PRC_dinb), .addrb(T3_addrb), .doutb(T3_PRC_doutb_4));
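
For readers following along, the fourfold replication above could also be expressed with a generate loop. This is only a rough sketch for the T1_QTY table: the widths DW and AW, the packed enable and output buses, and the module name t1_qty_banks_sketch are assumptions made for illustration and are not taken from the attached HashTable_8_URAM.v.

```verilog
// Sketch only: DW, AW, and the packed buses are assumed, not from the original design.
module t1_qty_banks_sketch #(
    parameter NUM_COPIES = 4,   // four replicated hash-table sets, as in the post
    parameter DW = 72,          // assumed data width
    parameter AW = 15           // assumed address width
) (
    input  wire                     clk,
    input  wire                     T1_rdb_wr_a,
    input  wire                     T1_rdb_wr_b,
    input  wire [NUM_COPIES-1:0]    T1_en_a,       // bit g stands in for T1_en_a_<g+1>
    input  wire [NUM_COPIES-1:0]    T1_en_b,       // bit g stands in for T1_en_b_<g+1>
    input  wire [AW-1:0]            T1_addra,
    input  wire [AW-1:0]            T1_addrb,
    input  wire [DW-1:0]            T1_QTY_dina,
    input  wire [DW-1:0]            T1_QTY_dinb,
    output wire [NUM_COPIES*DW-1:0] T1_QTY_douta,  // slice g stands in for T1_QTY_douta_<g+1>
    output wire [NUM_COPIES*DW-1:0] T1_QTY_doutb
);
    genvar g;
    generate
        for (g = 0; g < NUM_COPIES; g = g + 1) begin : g_copy
            // HashTable_8_URAM is the poster's own module (not shown in the thread).
            HashTable_8_URAM u_qty (
                .clk(clk),
                .rsta(1'b0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a[g]),
                .dina(T1_QTY_dina), .addra(T1_addra),
                .douta(T1_QTY_douta[g*DW +: DW]),
                .rstb(1'b0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b[g]),
                .dinb(T1_QTY_dinb), .addrb(T1_addrb),
                .doutb(T1_QTY_doutb[g*DW +: DW])
            );
        end
    endgenerate
endmodule
```

Note that a loop like this changes the hierarchical instance names (for example g_copy[3].u_qty instead of T1_QTY_4), so any constraints or reports that refer to the old names would need updating.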

 

But now, when I check the path with the worst slack (WNS), regardless of whether timing is met, it runs through the nets of these same URAM blocks, as shown below:

Schematic.jpg

As you can see, the URAM output pipeline registers are not taking effect. However, this happens for only one block of URAMs (a chain of 8 cascaded URAMs), for example the last one, T3_PRC_4. The registers are enabled and work for the other URAM blocks.

If I change the code slightly and rerun synthesis and implementation, the failing block of URAMs can be a different one (for example, T2_PRC_4).

Can you help me fix this bug?

4 Replies
mahmoud_xilinx
Observer
Registered: ‎08-11-2019

Can anyone help me with this?

inth
Visitor
Registered: ‎11-23-2018

It could be because of

.OREG_B("FALSE"),

or maybe because of

.REG_CAS_A("FALSE"), // Optional Port A cascade register

.REG_CAS_B("FALSE"), // Optional Port B cascade register

 

Have you tried using the xpm_memory_spram instantiation method instead?

driesd
Xilinx Employee
Registered: ‎11-28-2007

Hi Mahmoud,

I agree with @inth: you haven't enabled any output or cascade registers.

In general, we recommend using XPM memories or RTL inference. XPM examples can be found in the language templates under the Tools menu in Vivado.
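
As an illustration of the RTL inference route, here is a minimal single-port sketch with extra output pipeline stages. It is not the template shipped with Vivado: the module name uram_sp_infer_sketch and the parameter values are assumptions, and it covers only one port rather than the dual-port memory discussed in this thread. The ram_style attribute asks synthesis to use UltraRAM; whether the pipeline stages are absorbed into the URAM output or cascade registers is decided by the tool.

```verilog
// Sketch only: widths, depth, and NBPIPE are assumed values for illustration.
module uram_sp_infer_sketch #(
    parameter AWIDTH = 15,  // assumed: 32K words
    parameter DWIDTH = 72,  // assumed: native URAM data width
    parameter NBPIPE = 2    // output pipeline stages after the memory read (must be >= 1)
) (
    input  wire              clk,
    input  wire              en,    // port enable
    input  wire              we,    // 1 = write, 0 = read
    input  wire [AWIDTH-1:0] addr,
    input  wire [DWIDTH-1:0] din,
    output wire [DWIDTH-1:0] dout
);
    // Ask synthesis to map this array to UltraRAM.
    (* ram_style = "ultra" *)
    reg [DWIDTH-1:0] mem [0:(1<<AWIDTH)-1];

    reg [DWIDTH-1:0] mem_q;               // registered read data from the array
    reg [DWIDTH-1:0] pipe [0:NBPIPE-1];   // extra output pipeline stages
    integer i;

    // Synchronous write, or synchronous read when not writing.
    always @(posedge clk) begin
        if (en) begin
            if (we)
                mem[addr] <= din;
            else
                mem_q <= mem[addr];
        end
    end

    // Free-running output pipeline; candidates for the URAM output registers.
    always @(posedge clk) begin
        pipe[0] <= mem_q;
        for (i = 1; i < NBPIPE; i = i + 1)
            pipe[i] <= pipe[i-1];
    end

    assign dout = pipe[NBPIPE-1];
endmodule
```

In this sketch, read data reaches dout NBPIPE + 1 clock edges after the address is sampled, so the surrounding logic has to account for that latency.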

For your case, you actually need xpm_memory_tdpram, which is dual-port like your memory. This XPM is very easy to instantiate and configure, and the port naming is mostly the same. By increasing the latency, you can infer more pipeline registers.
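
A rough sketch of what such an xpm_memory_tdpram wrapper might look like is given here. The wrapper name hash_table_xpm_sketch, the widths, the depth, and the read latency values are assumptions chosen for illustration, and rdb_wr_a/rdb_wr_b are assumed to follow the URAM288 convention (1 = write, 0 = read). Parameters not listed are left at their XPM defaults; check the Vivado language template for the full list in your tool version.

```verilog
// Sketch only: AW, DW, and the latency values are assumed, not taken from the original design.
module hash_table_xpm_sketch #(
    parameter AW       = 15,  // assumed: 32K words
    parameter DW       = 72,  // assumed: native URAM data width
    parameter RD_LAT_A = 3,   // assumed read latency on port A (more latency = more pipelining)
    parameter RD_LAT_B = 3    // assumed read latency on port B
) (
    input  wire          clk,
    input  wire          rsta, rdb_wr_a, en_a,
    input  wire [AW-1:0] addra,
    input  wire [DW-1:0] dina,
    output wire [DW-1:0] douta,
    input  wire          rstb, rdb_wr_b, en_b,
    input  wire [AW-1:0] addrb,
    input  wire [DW-1:0] dinb,
    output wire [DW-1:0] doutb
);
    xpm_memory_tdpram #(
        .ADDR_WIDTH_A(AW),              .ADDR_WIDTH_B(AW),
        .BYTE_WRITE_WIDTH_A(DW),        .BYTE_WRITE_WIDTH_B(DW),   // whole-word writes
        .CLOCKING_MODE("common_clock"),
        .MEMORY_PRIMITIVE("ultra"),                                // request UltraRAM
        .MEMORY_SIZE((1 << AW) * DW),                              // total size in bits
        .READ_DATA_WIDTH_A(DW),         .READ_DATA_WIDTH_B(DW),
        .READ_LATENCY_A(RD_LAT_A),      .READ_LATENCY_B(RD_LAT_B),
        .WRITE_DATA_WIDTH_A(DW),        .WRITE_DATA_WIDTH_B(DW),
        .WRITE_MODE_A("no_change"),     .WRITE_MODE_B("no_change")
    ) u_mem (
        .clka(clk),        .clkb(clk),
        .ena(en_a),        .enb(en_b),
        .wea(rdb_wr_a),    .web(rdb_wr_b),   // assumed polarity: 1 = write
        .addra(addra),     .addrb(addrb),
        .dina(dina),       .dinb(dinb),
        .douta(douta),     .doutb(doutb),
        .rsta(rsta),       .rstb(rstb),
        .regcea(1'b1),     .regceb(1'b1),    // output registers always enabled
        .sleep(1'b0),
        .injectsbiterra(1'b0), .injectdbiterra(1'b0),
        .injectsbiterrb(1'b0), .injectdbiterrb(1'b0),
        .sbiterra(), .dbiterra(), .sbiterrb(), .dbiterrb()
    );
endmodule
```

The ports mirror the instantiations earlier in the thread, so a wrapper along these lines could in principle be swapped in for HashTable_8_URAM, provided the real widths and the intended read latency are filled in.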

 

Best regards

Dries

mahmoud_xilinx
Observer
Registered: ‎08-11-2019

Hi inth & driesd,
Thanks for your reply.

I haven't enabled REG_CAS_A because I don't want to add a lot of latency to my design, so I've only enabled the output register of the last URAM in the cascade chain.
And since I use only port A, there is no need to enable the output pipeline registers on port B.
Also, as I described, the design works correctly when I have just one cascaded chain of URAMs; the bug only arises when I go to 4 of them.

Finally, regarding XPM: as far as I know there is no conceptual difference, so I expect to face the same bug there too.

But I will try it soon ...
