LUT utilization when I mux several BRAMS/SRAMS

Visitor
Posts: 14
Registered: 05-31-2018

I want to implement SRAM muxing logic like the code below.

This switch statement consumes a lot of LUTs.

It looks as if I am effectively using 32 SRAMs in parallel.

I tested with cyclic factor=1 and got very low LUT utilization, but the throughput was also very low.

Does anybody know why the SRAM mux needs so many LUTs, and how I can reduce resource usage while keeping the same performance?

// Pack 32 consecutive elements of each array into one wide memory word
#pragma HLS ARRAY_RESHAPE variable=sram_0 cyclic factor=32 dim=1
#pragma HLS ARRAY_RESHAPE variable=sram_1 cyclic factor=32 dim=1
#pragma HLS ARRAY_RESHAPE variable=sram_2 cyclic factor=32 dim=1
#pragma HLS ARRAY_RESHAPE variable=sram_3 cyclic factor=32 dim=1
#pragma HLS ARRAY_RESHAPE variable=sram_4 cyclic factor=32 dim=1
#pragma HLS ARRAY_RESHAPE variable=sram_5 cyclic factor=32 dim=1
#pragma HLS ARRAY_RESHAPE variable=sram_6 cyclic factor=32 dim=1
#pragma HLS ARRAY_RESHAPE variable=sram_7 cyclic factor=32 dim=1

// Each iteration reads one element from whichever RAM sram_number selects
for (i = 0; i < 100; i++) {
    switch (sram_number) {
        case 0: read_data = sram_0[i]; break;
        case 1: read_data = sram_1[i]; break;
        case 2: read_data = sram_2[i]; break;
        case 3: read_data = sram_3[i]; break;
        case 4: read_data = sram_4[i]; break;
        case 5: read_data = sram_5[i]; break;
        case 6: read_data = sram_6[i]; break;
        case 7: read_data = sram_7[i]; break;
        default: break;  // read_data keeps its previous value
    }
    do_something();
}
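One way to picture where the extra logic comes from: with `ARRAY_RESHAPE ... cyclic factor=32`, element `i` sits in lane `i % 32` of wide word `i / 32`, so one memory word holds 32 consecutive elements side by side. When `i` is not a compile-time constant (the loop is not unrolled), each read has to fetch the wide word and then pick one lane out of 32, and the `switch` adds an 8-way select on top of that. The C model below is only a sketch of that index arithmetic; `FACTOR`, `WIDE_WORDS`, `wide_word_t`, `reshape_cyclic`, `reshaped_read` and the `int32_t` element type are hypothetical, and the model says nothing about the exact LUT count HLS will report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes: 32 lanes per wide word (the cyclic factor used in
   the pragmas above) and 4 wide words, i.e. 128 logical elements. */
#define FACTOR 32
#define WIDE_WORDS 4

/* Software model of one reshaped array: each address holds FACTOR
   consecutive elements side by side. */
typedef struct {
    int32_t lane[FACTOR];
} wide_word_t;

/* Pack a flat array into wide words using the cyclic mapping:
   element i goes to word i / FACTOR, lane i % FACTOR. */
static void reshape_cyclic(const int32_t flat[FACTOR * WIDE_WORDS],
                           wide_word_t mem[WIDE_WORDS])
{
    for (int i = 0; i < FACTOR * WIDE_WORDS; i++)
        mem[i / FACTOR].lane[i % FACTOR] = flat[i];
}

/* Read logical element i. The word index is the single memory access;
   the lane index is a FACTOR-to-1 select, which in hardware becomes a
   multiplexer whenever i is not known at compile time. */
static int32_t reshaped_read(const wide_word_t mem[WIDE_WORDS], int i)
{
    assert(i >= 0 && i < FACTOR * WIDE_WORDS);
    int word = i / FACTOR;
    int lane = i % FACTOR;
    return mem[word].lane[lane];
}
```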

Scholar
Posts: 2,709
Registered: 04-26-2015

Re: LUT utilization when I mux several BRAMS/SRAMS

Can you give us a bit more code? Is that loop from 0 to 100 unrolled? If not, I can't see how factor=1 reshaping would be any slower than factor=32.
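A rough way to frame that question: if the loop is not unrolled, each iteration consumes one element, so 100 iterations need about 100 memory accesses whatever the reshape factor is; a wide word only helps when several iterations can share one access. The helper below is an idealized count under loudly stated assumptions (a single memory port, one access per clock, `unroll` consecutive elements consumed per loop trip starting at index 0, latency and II ignored); `reads_needed` is a hypothetical name, not anything HLS reports.

```c
#include <assert.h>

/* Idealized number of accesses needed to read n consecutive elements,
   assuming each access returns `factor` elements (one wide word), the
   loop is unrolled by `unroll`, and there is a single memory port. */
static int reads_needed(int n, int factor, int unroll)
{
    assert(n >= 0 && factor > 0 && unroll > 0);
    /* One trip can only use as many elements as both the word width
       and the unroll factor allow. */
    int per_access = factor < unroll ? factor : unroll;
    return (n + per_access - 1) / per_access;  /* ceiling division */
}
```

Under this model a factor of 32 with no unrolling gives the same count as a factor of 1, which is the point being made about the loop.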


In practical terms, if you never actually access multiple RAMs at once (e.g. you're doing 100 reads from one RAM, then 100 reads from another RAM, etc.), it would make sense to merge them into one big array and do a cyclic partition on that. That way you're always using all the RAM ports, rather than only ever using 1/8 of the available ports.
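A sketch of that suggestion, assuming all eight RAMs have the same depth and element type: the eight arrays become one array indexed by `sram_number * DEPTH + i`, so the `switch` turns into address arithmetic, and a cyclic partition spreads consecutive addresses across several banks. `NUM_RAMS`, `DEPTH`, `BANKS`, `merged`, `read_merged` and `bank_of` are hypothetical names, the factor of 8 is only an example, and the real gain still depends on how the loop is unrolled or pipelined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes: 8 former RAMs of DEPTH elements each. DEPTH is a
   multiple of BANKS so every former RAM starts on bank 0. */
#define NUM_RAMS 8
#define DEPTH    128
#define BANKS    8

/* Former sram_0 .. sram_7 stored back to back in one array. */
static int32_t merged[NUM_RAMS * DEPTH];

/* Replaces the 8-way switch: select the RAM by offsetting the address.
   The HLS directive asks for BANKS interleaved memories; an ordinary C
   compiler simply ignores the unknown pragma. */
static int32_t read_merged(int sram_number, int i)
{
#pragma HLS ARRAY_PARTITION variable=merged cyclic factor=8 dim=1
    assert(sram_number >= 0 && sram_number < NUM_RAMS);
    assert(i >= 0 && i < DEPTH);
    return merged[sram_number * DEPTH + i];
}

/* Bank that a given element lands in under the cyclic partition. */
static int bank_of(int sram_number, int i)
{
    return (sram_number * DEPTH + i) % BANKS;
}
```

Consecutive reads from any one former RAM cycle through all `BANKS` banks, which is what lets an unrolled or pipelined loop use every port instead of one RAM's worth.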

Visitor
Posts: 14
Registered: 05-31-2018

Re: LUT utilization when I mux several BRAMS/SRAMS

I used the ARRAY_PARTITION pragma instead of ARRAY_RESHAPE.
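For comparison with the original reshape directives: `ARRAY_PARTITION ... cyclic factor=32` splits an array into 32 separate, smaller memories (element `i` goes to bank `i % 32` at address `i / 32`), each with its own ports, whereas `ARRAY_RESHAPE` with the same factor keeps a single memory whose words are 32 elements wide. The C model below only illustrates the partitioned layout; `FACTOR`, `DEPTH`, `partitioned_t` and the two helpers are hypothetical, the pragma in the comment assumes the same factor and dimension as the original post, and the model does not by itself explain the LUT savings reported here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes: factor 32 as in the thread, 128 elements per array. */
#define FACTOR 32
#define DEPTH  128

/* Software model of a cyclically partitioned array: FACTOR independent
   banks, each DEPTH / FACTOR deep. In HLS the directive would be written
   inside the function that uses the array, e.g.
     #pragma HLS ARRAY_PARTITION variable=sram_0 cyclic factor=32 dim=1 */
typedef struct {
    int32_t bank[FACTOR][DEPTH / FACTOR];
} partitioned_t;

/* Element i lives in bank i % FACTOR at address i / FACTOR. */
static void partitioned_write(partitioned_t *p, int i, int32_t v)
{
    assert(i >= 0 && i < DEPTH);
    p->bank[i % FACTOR][i / FACTOR] = v;
}

static int32_t partitioned_read(const partitioned_t *p, int i)
{
    assert(i >= 0 && i < DEPTH);
    return p->bank[i % FACTOR][i / FACTOR];
}
```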

The synthesis result shows that it uses just a few LUTs.

I will verify the functionality and throughput.

Thanks!