Visitor hun03177
Registered: 05-31-2018

LUT utilization when I mux several BRAMS/SRAMS

I want to implement SRAM muxing logic like the code below.

This switch statement consumes a lot of LUTs.

It seems to come from using 32 parallel SRAMs (cyclic factor 32).

I tested with cyclic factor = 1 and got very low LUT utilization, but the throughput was also very low.

Does anybody know why the SRAM mux needs so many LUTs, and is there a way to reduce resource usage while keeping the same performance?

 #pragma HLS ARRAY_RESHAPE variable=sram_0 cyclic factor=32 dim=1
 #pragma HLS ARRAY_RESHAPE variable=sram_1 cyclic factor=32 dim=1
 #pragma HLS ARRAY_RESHAPE variable=sram_2 cyclic factor=32 dim=1
 #pragma HLS ARRAY_RESHAPE variable=sram_3 cyclic factor=32 dim=1
 #pragma HLS ARRAY_RESHAPE variable=sram_4 cyclic factor=32 dim=1
 #pragma HLS ARRAY_RESHAPE variable=sram_5 cyclic factor=32 dim=1
 #pragma HLS ARRAY_RESHAPE variable=sram_6 cyclic factor=32 dim=1
 #pragma HLS ARRAY_RESHAPE variable=sram_7 cyclic factor=32 dim=1

// Read 100 words from whichever SRAM sram_number selects.
for (i = 0; i < 100; i++) {
    // The switch becomes an 8:1 multiplexer in front of the eight SRAMs.
    switch (sram_number) {
        case 0: read_data = sram_0[i]; break;
        case 1: read_data = sram_1[i]; break;
        case 2: read_data = sram_2[i]; break;
        case 3: read_data = sram_3[i]; break;
        case 4: read_data = sram_4[i]; break;
        case 5: read_data = sram_5[i]; break;
        case 6: read_data = sram_6[i]; break;
        case 7: read_data = sram_7[i]; break;
        default: break;  // read_data keeps its previous value
    }
    do_something();
}
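One way to see where the LUTs can come from is to model the address mapping of `ARRAY_RESHAPE ... cyclic factor=32` in plain C++. In a cyclic reshape, element i is stored in lane i % 32 of wide word i / 32, so a scalar read with a variable index fetches one 32-element-wide word and then selects a single lane. The sketch below is a software model only, not what the tool emits; it assumes 32-bit elements and a hypothetical depth of 128, and `kDepth`, `reshape_cyclic` and `read_element` are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Software model of ARRAY_RESHAPE cyclic factor=32 (not HLS output).
constexpr std::size_t kFactor = 32;                // cyclic factor from the post
constexpr std::size_t kDepth = 128;                // hypothetical depth
constexpr std::size_t kWords = kDepth / kFactor;   // depth of the reshaped RAM
static_assert(kDepth % kFactor == 0, "depth must be a multiple of the factor");

// One wide RAM word holds kFactor original elements side by side.
using WideWord = std::array<std::uint32_t, kFactor>;
using ReshapedRam = std::array<WideWord, kWords>;

// Element i goes to lane (i % kFactor) of word (i / kFactor).
ReshapedRam reshape_cyclic(const std::array<std::uint32_t, kDepth>& a) {
    ReshapedRam ram{};
    for (std::size_t i = 0; i < kDepth; ++i) {
        ram[i / kFactor][i % kFactor] = a[i];
    }
    return ram;
}

// A scalar read a[i] becomes: fetch one wide word, then pick one lane.
// In hardware the lane pick is a kFactor:1 mux of 32-bit values.
std::uint32_t read_element(const ReshapedRam& ram, std::size_t i) {
    assert(i < kDepth);
    const WideWord& word = ram[i / kFactor];  // one RAM access
    return word[i % kFactor];                 // lane-select multiplexer
}
```

With eight such arrays behind the switch, each read implies a lane-select mux per array on top of the 8:1 case mux, which is one plausible reason for the high LUT count; the actual netlist depends on the tool version and on how the loop is scheduled.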

Scholar u4223374
Registered: 04-26-2015

Re: LUT utilization when I mux several BRAMS/SRAMS

Can you give us a bit more code? Is that loop from 0 to 100 unrolled? If not, I can't see how factor=1 reshaping would be any slower than factor=32.
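The unrolling question comes down to how many reads per clock cycle the loop issues. As a rule of thumb (an assumption here, not a formula from the HLS documentation), a memory bank serves at most as many accesses per cycle as it has ports, so the minimum bank count is the ceiling of reads per cycle over ports per bank. A pipelined loop that is not unrolled issues one read per cycle, so one bank already keeps up. `min_banks` is a hypothetical helper.

```cpp
#include <cassert>

// Hypothetical helper: minimum number of memory banks so that
// reads_per_cycle reads can be served in a single clock cycle when each
// bank has ports_per_bank read ports (e.g. 2 for a true dual-port block RAM).
unsigned min_banks(unsigned reads_per_cycle, unsigned ports_per_bank) {
    assert(ports_per_bank > 0);
    // Integer ceiling division: ceil(reads_per_cycle / ports_per_bank).
    return (reads_per_cycle + ports_per_bank - 1) / ports_per_bank;
}
```

For example, `min_banks(1, 2)` is 1 (loop pipelined but not unrolled), while `min_banks(32, 2)` is 16 (loop unrolled by 32 on dual-port RAMs), so splitting into 32 banks only adds useful read bandwidth when the loop body is actually replicated.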

 

In practical terms, if you never actually access multiple RAMs at once (e.g. you're doing 100 reads from one RAM, then 100 reads from another RAM, etc.), it would make sense to merge them into one big array and do a cyclic partition on that. That way, you're always using all the RAM ports, rather than only ever using 1/8 of the available ports.
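A minimal sketch of this merge, assuming all eight RAMs have the same depth: store them back to back in one array, turn `sram_number` into a base offset instead of a switch, and partition the merged array cyclically. The depth of 128, the factor of 4, and the names `sram_all`, `read_block` and `bank_of` are hypothetical. Because 128 is a multiple of 4, indices i..i+3 of any RAM fall into four different banks, so each unrolled iteration can use every bank's port whichever RAM is selected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNumRams = 8;    // sram_0 .. sram_7 from the post
constexpr std::size_t kRamDepth = 128; // assumed depth of each original RAM
constexpr std::size_t kBanks = 4;      // must match the literals in the pragmas
static_assert(kRamDepth % kBanks == 0, "keeps every RAM aligned to bank 0");

// Cyclic partition: flat address addr lives in bank addr % kBanks.
constexpr std::size_t bank_of(std::size_t addr) { return addr % kBanks; }

// RAM r occupies sram_all[r * kRamDepth] .. sram_all[r * kRamDepth + kRamDepth - 1].
void read_block(const std::uint32_t sram_all[kNumRams * kRamDepth],
                unsigned sram_number, std::uint32_t out[100]) {
#pragma HLS ARRAY_PARTITION variable=sram_all cyclic factor=4 dim=1
    assert(sram_number < kNumRams);
    const std::size_t base = sram_number * kRamDepth;  // replaces the switch
    for (std::size_t i = 0; i < 100; ++i) {
#pragma HLS UNROLL factor=4
        out[i] = sram_all[base + i];
    }
}
```

A standard C++ compiler ignores the HLS pragmas (possibly with an unknown-pragma warning), so the function can be checked in software before synthesis.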

Visitor hun03177
Registered: 05-31-2018

Re: LUT utilization when I mux several BRAMS/SRAMS

I used the ARRAY_PARTITION pragma instead of ARRAY_RESHAPE.

The synthesis results show that it uses only a few LUTs.
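The revised code is not shown in the thread. Assuming the only change was the pragma kind, the read path would look roughly like the sketch below. The function name `read_and_sum`, the depth of 128, the `uint32_t` element type, and summing the reads as a stand-in for `do_something()` are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kSramDepth = 128;  // assumed; the post does not give the depth

// Hypothetical reconstruction: the original switch, with each ARRAY_RESHAPE
// replaced by ARRAY_PARTITION. Summing the reads stands in for do_something().
std::uint32_t read_and_sum(const std::uint32_t sram_0[kSramDepth],
                           const std::uint32_t sram_1[kSramDepth],
                           const std::uint32_t sram_2[kSramDepth],
                           const std::uint32_t sram_3[kSramDepth],
                           const std::uint32_t sram_4[kSramDepth],
                           const std::uint32_t sram_5[kSramDepth],
                           const std::uint32_t sram_6[kSramDepth],
                           const std::uint32_t sram_7[kSramDepth],
                           int sram_number) {
#pragma HLS ARRAY_PARTITION variable=sram_0 cyclic factor=32 dim=1
#pragma HLS ARRAY_PARTITION variable=sram_1 cyclic factor=32 dim=1
#pragma HLS ARRAY_PARTITION variable=sram_2 cyclic factor=32 dim=1
#pragma HLS ARRAY_PARTITION variable=sram_3 cyclic factor=32 dim=1
#pragma HLS ARRAY_PARTITION variable=sram_4 cyclic factor=32 dim=1
#pragma HLS ARRAY_PARTITION variable=sram_5 cyclic factor=32 dim=1
#pragma HLS ARRAY_PARTITION variable=sram_6 cyclic factor=32 dim=1
#pragma HLS ARRAY_PARTITION variable=sram_7 cyclic factor=32 dim=1
    std::uint32_t sum = 0;
    for (int i = 0; i < 100; i++) {
        std::uint32_t read_data = 0;  // stays 0 for an out-of-range sram_number
        switch (sram_number) {
            case 0: read_data = sram_0[i]; break;
            case 1: read_data = sram_1[i]; break;
            case 2: read_data = sram_2[i]; break;
            case 3: read_data = sram_3[i]; break;
            case 4: read_data = sram_4[i]; break;
            case 5: read_data = sram_5[i]; break;
            case 6: read_data = sram_6[i]; break;
            case 7: read_data = sram_7[i]; break;
            default: break;
        }
        sum += read_data;  // stand-in for do_something()
    }
    return sum;
}
```

Unlike the original fragment, this sketch resets `read_data` every iteration, so an invalid `sram_number` contributes zero instead of a stale value.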

I will now verify functionality and throughput.

Thanks!
