12-22-2020 10:26 AM
Hi,
I'm using Vivado 2018.3 and an xcvu5p device.
In a large design, I use 8 cascaded URAMs to store my hash tables. To improve the timing of the design, I've enabled the output pipeline registers (on Port A) of the URAM blocks for all 8 blocks. The code is attached (HashTable_8_URAM.v).
Synthesis and implementation complete normally, and timing is met. Here is the instantiation of the required URAMs:
HashTable_8_URAM T1_QTY (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a), .dina(T1_QTY_dina), .addra(T1_addra), .douta(T1_QTY_douta),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b), .dinb(T1_QTY_dinb), .addrb(T1_addrb), .doutb(T1_QTY_doutb));
HashTable_8_URAM T1_PRC (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a), .dina(T1_PRC_dina), .addra(T1_addra), .douta(T1_PRC_douta),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b), .dinb(T1_PRC_dinb), .addrb(T1_addrb), .doutb(T1_PRC_doutb));
HashTable_8_URAM T2_QTY (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a), .dina(T2_QTY_dina), .addra(T2_addra), .douta(T2_QTY_douta),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b), .dinb(T2_QTY_dinb), .addrb(T2_addrb), .doutb(T2_QTY_doutb));
HashTable_8_URAM T2_PRC (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a), .dina(T2_PRC_dina), .addra(T2_addra), .douta(T2_PRC_douta),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b), .dinb(T2_PRC_dinb), .addrb(T2_addrb), .doutb(T2_PRC_doutb));
HashTable_8_URAM T3_QTY (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a), .dina(T3_QTY_dina), .addra(T3_addra), .douta(T3_QTY_douta),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b), .dinb(T3_QTY_dinb), .addrb(T3_addrb), .doutb(T3_QTY_doutb));
HashTable_8_URAM T3_PRC (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a), .dina(T3_PRC_dina), .addra(T3_addra), .douta(T3_PRC_douta),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b), .dinb(T3_PRC_dinb), .addrb(T3_addrb), .doutb(T3_PRC_doutb));
Next, I increased the number of stored hash tables fourfold. In my top file, I replicated the URAM instances 4 times, as follows:
HashTable_8_URAM T1_QTY_1 (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a_1), .dina(T1_QTY_dina), .addra(T1_addra), .douta(T1_QTY_douta_1),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b_1), .dinb(T1_QTY_dinb), .addrb(T1_addrb), .doutb(T1_QTY_doutb_1));
HashTable_8_URAM T1_PRC_1 (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a_1), .dina(T1_PRC_dina), .addra(T1_addra), .douta(T1_PRC_douta_1),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b_1), .dinb(T1_PRC_dinb), .addrb(T1_addrb), .doutb(T1_PRC_doutb_1));
HashTable_8_URAM T2_QTY_1 (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a_1), .dina(T2_QTY_dina), .addra(T2_addra), .douta(T2_QTY_douta_1),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b_1), .dinb(T2_QTY_dinb), .addrb(T2_addrb), .doutb(T2_QTY_doutb_1));
HashTable_8_URAM T2_PRC_1 (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a_1), .dina(T2_PRC_dina), .addra(T2_addra), .douta(T2_PRC_douta_1),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b_1), .dinb(T2_PRC_dinb), .addrb(T2_addrb), .doutb(T2_PRC_doutb_1));
HashTable_8_URAM T3_QTY_1 (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a_1), .dina(T3_QTY_dina), .addra(T3_addra), .douta(T3_QTY_douta_1),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b_1), .dinb(T3_QTY_dinb), .addrb(T3_addrb), .doutb(T3_QTY_doutb_1));
HashTable_8_URAM T3_PRC_1 (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a_1), .dina(T3_PRC_dina), .addra(T3_addra), .douta(T3_PRC_douta_1),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b_1), .dinb(T3_PRC_dinb), .addrb(T3_addrb), .doutb(T3_PRC_doutb_1));
/****************************************************************************************************************************************************/
HashTable_8_URAM T1_QTY_2 (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a_2), .dina(T1_QTY_dina), .addra(T1_addra), .douta(T1_QTY_douta_2),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b_2), .dinb(T1_QTY_dinb), .addrb(T1_addrb), .doutb(T1_QTY_doutb_2));
HashTable_8_URAM T1_PRC_2 (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a_2), .dina(T1_PRC_dina), .addra(T1_addra), .douta(T1_PRC_douta_2),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b_2), .dinb(T1_PRC_dinb), .addrb(T1_addrb), .doutb(T1_PRC_doutb_2));
HashTable_8_URAM T2_QTY_2 (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a_2), .dina(T2_QTY_dina), .addra(T2_addra), .douta(T2_QTY_douta_2),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b_2), .dinb(T2_QTY_dinb), .addrb(T2_addrb), .doutb(T2_QTY_doutb_2));
HashTable_8_URAM T2_PRC_2 (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a_2), .dina(T2_PRC_dina), .addra(T2_addra), .douta(T2_PRC_douta_2),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b_2), .dinb(T2_PRC_dinb), .addrb(T2_addrb), .doutb(T2_PRC_doutb_2));
HashTable_8_URAM T3_QTY_2 (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a_2), .dina(T3_QTY_dina), .addra(T3_addra), .douta(T3_QTY_douta_2),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b_2), .dinb(T3_QTY_dinb), .addrb(T3_addrb), .doutb(T3_QTY_doutb_2));
HashTable_8_URAM T3_PRC_2 (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a_2), .dina(T3_PRC_dina), .addra(T3_addra), .douta(T3_PRC_douta_2),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b_2), .dinb(T3_PRC_dinb), .addrb(T3_addrb), .doutb(T3_PRC_doutb_2));
/****************************************************************************************************************************************************/
HashTable_8_URAM T1_QTY_3 (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a_3), .dina(T1_QTY_dina), .addra(T1_addra), .douta(T1_QTY_douta_3),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b_3), .dinb(T1_QTY_dinb), .addrb(T1_addrb), .doutb(T1_QTY_doutb_3));
HashTable_8_URAM T1_PRC_3 (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a_3), .dina(T1_PRC_dina), .addra(T1_addra), .douta(T1_PRC_douta_3),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b_3), .dinb(T1_PRC_dinb), .addrb(T1_addrb), .doutb(T1_PRC_doutb_3));
HashTable_8_URAM T2_QTY_3 (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a_3), .dina(T2_QTY_dina), .addra(T2_addra), .douta(T2_QTY_douta_3),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b_3), .dinb(T2_QTY_dinb), .addrb(T2_addrb), .doutb(T2_QTY_doutb_3));
HashTable_8_URAM T2_PRC_3 (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a_3), .dina(T2_PRC_dina), .addra(T2_addra), .douta(T2_PRC_douta_3),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b_3), .dinb(T2_PRC_dinb), .addrb(T2_addrb), .doutb(T2_PRC_doutb_3));
HashTable_8_URAM T3_QTY_3 (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a_3), .dina(T3_QTY_dina), .addra(T3_addra), .douta(T3_QTY_douta_3),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b_3), .dinb(T3_QTY_dinb), .addrb(T3_addrb), .doutb(T3_QTY_doutb_3));
HashTable_8_URAM T3_PRC_3 (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a_3), .dina(T3_PRC_dina), .addra(T3_addra), .douta(T3_PRC_douta_3),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b_3), .dinb(T3_PRC_dinb), .addrb(T3_addrb), .doutb(T3_PRC_doutb_3));
/****************************************************************************************************************************************************/
HashTable_8_URAM T1_QTY_4 (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a_4), .dina(T1_QTY_dina), .addra(T1_addra), .douta(T1_QTY_douta_4),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b_4), .dinb(T1_QTY_dinb), .addrb(T1_addrb), .doutb(T1_QTY_doutb_4));
HashTable_8_URAM T1_PRC_4 (.clk(clk),
.rsta(0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a_4), .dina(T1_PRC_dina), .addra(T1_addra), .douta(T1_PRC_douta_4),
.rstb(0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b_4), .dinb(T1_PRC_dinb), .addrb(T1_addrb), .doutb(T1_PRC_doutb_4));
HashTable_8_URAM T2_QTY_4 (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a_4), .dina(T2_QTY_dina), .addra(T2_addra), .douta(T2_QTY_douta_4),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b_4), .dinb(T2_QTY_dinb), .addrb(T2_addrb), .doutb(T2_QTY_doutb_4));
HashTable_8_URAM T2_PRC_4 (.clk(clk),
.rsta(0), .rdb_wr_a(T2_rdb_wr_a), .en_a(T2_en_a_4), .dina(T2_PRC_dina), .addra(T2_addra), .douta(T2_PRC_douta_4),
.rstb(0), .rdb_wr_b(T2_rdb_wr_b), .en_b(T2_en_b_4), .dinb(T2_PRC_dinb), .addrb(T2_addrb), .doutb(T2_PRC_doutb_4));
HashTable_8_URAM T3_QTY_4 (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a_4), .dina(T3_QTY_dina), .addra(T3_addra), .douta(T3_QTY_douta_4),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b_4), .dinb(T3_QTY_dinb), .addrb(T3_addrb), .doutb(T3_QTY_doutb_4));
HashTable_8_URAM T3_PRC_4 (.clk(clk),
.rsta(0), .rdb_wr_a(T3_rdb_wr_a), .en_a(T3_en_a_4), .dina(T3_PRC_dina), .addra(T3_addra), .douta(T3_PRC_douta_4),
.rstb(0), .rdb_wr_b(T3_rdb_wr_b), .en_b(T3_en_b_4), .dinb(T3_PRC_dinb), .addrb(T3_addrb), .doutb(T3_PRC_doutb_4));
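For reference, the same fourfold replication could be written with a generate loop, which keeps the copies identical and gives each one a predictable hierarchical name (for example g_copy[3].PRC) that is easy to find in the timing report. This is only a minimal sketch for the T1 tables: the wrapper name T1_tables_x4, the widths DW and AW, and the packing of the four enables and outputs into buses are assumptions made for illustration, since the real port widths are defined in HashTable_8_URAM.v.

```verilog
// Sketch only. DW, AW and the packed-bus layout are assumptions; the real
// widths come from HashTable_8_URAM.v, which is not shown in this thread.
module T1_tables_x4 #(
    parameter DW   = 72,  // assumed data width (one URAM word)
    parameter AW   = 15,  // assumed address width (8 cascaded 4K-deep URAMs)
    parameter NCPY = 4    // number of replicated table copies
) (
    input  wire               clk,
    input  wire               T1_rdb_wr_a, T1_rdb_wr_b,
    input  wire [NCPY-1:0]    T1_en_a, T1_en_b,          // bit i replaces T1_en_*_<i+1>
    input  wire [AW-1:0]      T1_addra, T1_addrb,
    input  wire [DW-1:0]      T1_QTY_dina, T1_QTY_dinb,
    input  wire [DW-1:0]      T1_PRC_dina, T1_PRC_dinb,
    output wire [NCPY*DW-1:0] T1_QTY_douta, T1_QTY_doutb, // slice i replaces *_dout*_<i+1>
    output wire [NCPY*DW-1:0] T1_PRC_douta, T1_PRC_doutb
);
    genvar i;
    generate
        for (i = 0; i < NCPY; i = i + 1) begin : g_copy
            // Quantity table, copy i
            HashTable_8_URAM QTY (.clk(clk),
                .rsta(1'b0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a[i]),
                .dina(T1_QTY_dina), .addra(T1_addra),
                .douta(T1_QTY_douta[i*DW +: DW]),
                .rstb(1'b0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b[i]),
                .dinb(T1_QTY_dinb), .addrb(T1_addrb),
                .doutb(T1_QTY_doutb[i*DW +: DW]));
            // Price table, copy i
            HashTable_8_URAM PRC (.clk(clk),
                .rsta(1'b0), .rdb_wr_a(T1_rdb_wr_a), .en_a(T1_en_a[i]),
                .dina(T1_PRC_dina), .addra(T1_addra),
                .douta(T1_PRC_douta[i*DW +: DW]),
                .rstb(1'b0), .rdb_wr_b(T1_rdb_wr_b), .en_b(T1_en_b[i]),
                .dinb(T1_PRC_dinb), .addrb(T1_addrb),
                .doutb(T1_PRC_doutb[i*DW +: DW]));
        end
    endgenerate
endmodule
```

The resets are tied off with a sized constant (1'b0) rather than a bare 0, which avoids a port width mismatch warning on a 1-bit port. T2 and T3 would follow the same pattern.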
But now, if I check the path with the worst slack (regardless of whether timing is met or not), it belongs to the nets of these same URAM blocks, as shown below:
As you can see, the output pipeline registers of the URAM don't work. But this affects just one block of URAMs (a chain of 8 cascaded URAMs), for example the last one, T3_PRC_4; the registers are enabled and work for the other URAM blocks.
If the code changes slightly and I rerun synthesis and implementation, the failing URAM block can be a different one (for example, T2_PRC_4).
Can you help me fix this bug?
12-25-2020 01:30 AM
No one to help me?
01-05-2021 04:34 AM
It could be because
.OREG_B("FALSE"),
Or maybe because
.REG_CAS_A("FALSE"), // Optional Port A cascade register
.REG_CAS_B("FALSE"), // Optional Port B cascade register
Have you tried using the xpm_memory_spram instantiation method instead?
01-06-2021 02:38 AM
Hi Mahmoud,
I agree with @inth: you haven't enabled any output or cascade registers.
In general, we recommend using XPM memories or inferring the memory in RTL. Examples of XPM instantiations can be found in the Language Templates in the Tools menu of Vivado.
For your case, you actually need xpm_memory_tdpram, which is dual-port like your memory. It's very easy to instantiate this XPM and configure it. The port names are mostly the same. By increasing the read latency, you can infer more pipeline registers.
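As a rough illustration of that suggestion, here is a minimal sketch of a wrapper that keeps the port names from the original post and maps them onto xpm_memory_tdpram. The wrapper name HashTable_xpm, the widths DW and AW, and the read latency RD_LAT = 4 are assumptions, not values taken from HashTable_8_URAM.v. It also assumes rdb_wr is 1 for a write and 0 for a read. Only a subset of the XPM parameters is overridden here; please check the names and allowed values against the Language Templates of your Vivado version.

```verilog
// Sketch only. Widths, latency and the wrapper name are assumptions.
module HashTable_xpm #(
    parameter DW     = 72,  // assumed data width (one URAM word)
    parameter AW     = 15,  // assumed address width (8 x 4K words)
    parameter RD_LAT = 4    // assumed read latency; higher values leave room for more pipeline stages
) (
    input  wire          clk,
    input  wire          rsta, rdb_wr_a, en_a,
    input  wire [DW-1:0] dina,
    input  wire [AW-1:0] addra,
    output wire [DW-1:0] douta,
    input  wire          rstb, rdb_wr_b, en_b,
    input  wire [DW-1:0] dinb,
    input  wire [AW-1:0] addrb,
    output wire [DW-1:0] doutb
);
    xpm_memory_tdpram #(
        .MEMORY_PRIMITIVE   ("ultra"),          // request UltraRAM
        .CLOCKING_MODE      ("common_clock"),   // both ports on one clock
        .MEMORY_SIZE        (DW * (1 << AW)),   // total size in bits
        .ADDR_WIDTH_A       (AW),
        .WRITE_DATA_WIDTH_A (DW),
        .READ_DATA_WIDTH_A  (DW),
        .BYTE_WRITE_WIDTH_A (DW),               // whole-word writes, so wea is 1 bit
        .READ_LATENCY_A     (RD_LAT),
        .WRITE_MODE_A       ("no_change"),
        .ADDR_WIDTH_B       (AW),
        .WRITE_DATA_WIDTH_B (DW),
        .READ_DATA_WIDTH_B  (DW),
        .BYTE_WRITE_WIDTH_B (DW),               // whole-word writes, so web is 1 bit
        .READ_LATENCY_B     (RD_LAT),
        .WRITE_MODE_B       ("no_change")
    ) u_mem (
        .clka (clk),      .clkb (clk),          // clkb is not used in common_clock mode
        .rsta (rsta),     .rstb (rstb),
        .ena  (en_a),     .enb  (en_b),
        .regcea (1'b1),   .regceb (1'b1),       // output registers always enabled
        .wea  (rdb_wr_a), .web  (rdb_wr_b),     // assumes rdb_wr = 1 means write
        .addra (addra),   .addrb (addrb),
        .dina  (dina),    .dinb  (dinb),
        .douta (douta),   .doutb (doutb),
        .sleep (1'b0),
        .injectsbiterra (1'b0), .injectdbiterra (1'b0),
        .injectsbiterrb (1'b0), .injectdbiterrb (1'b0),
        .sbiterra (), .dbiterra (),             // ECC status outputs left unconnected
        .sbiterrb (), .dbiterrb ()
    );
endmodule
```

Note that the read latency of this wrapper will probably differ from that of the hand-instantiated chain, so the surrounding logic has to be adjusted to the chosen RD_LAT.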
Best regards
Dries
01-06-2021 04:08 AM
Hi inth & driesd,
Thanks for your reply.
I haven't enabled REG_CAS_A because I don't want to add a lot of latency to my design, so I've enabled only the output register of the last URAM in the cascade chain.
And since I use only port A, there is no need to enable the output pipeline registers on port B.
Also, as I described, the design works correctly when I have just one cascaded chain of URAMs, but when I go to 4 of them, the bug appears.
Finally, regarding XPM: as far as I know, there is no conceptual difference, so I expect to face this bug there too.
But I will try it soon ...