Explorer
Registered: ‎05-21-2017

Same design, different amount of BRAM usage between different devices?

Hello,

 

My design should, in theory, fit in a smaller FPGA. However, HLS uses far more BRAMs when targeting the smaller FPGA, which makes the design impossible to implement on it.

 

Specifically, for the smaller FPGA (xc7z020clg484-1) I get:

 

================================================================
== Utilization Estimates
================================================================
* Summary: 
+-----------------+---------+-------+--------+-------+
|       Name      | BRAM_18K| DSP48E|   FF   |  LUT  |
+-----------------+---------+-------+--------+-------+
|DSP              |        -|      -|       -|      -|
|Expression       |        -|      -|      45|     72|
|FIFO             |        -|      -|       -|      -|
|Instance         |        -|    128|   45101|  21200|
|Memory           |      360|      -|     240|     64|
|Multiplexer      |        -|      -|       -|   2450|
|Register         |        -|      -|     504|      -|
+-----------------+---------+-------+--------+-------+
|Total            |      360|    128|   45890|  23786|
+-----------------+---------+-------+--------+-------+
|Available        |      280|    220|  106400|  53200|
+-----------------+---------+-------+--------+-------+
|Utilization (%)  |      128|     58|      43|     44|
+-----------------+---------+-------+--------+-------+

and for the larger FPGA (xczu3eg-sfva625-1-i) I get:

 

================================================================
== Utilization Estimates
================================================================
* Summary: 
+-----------------+---------+-------+--------+-------+-----+
|       Name      | BRAM_18K| DSP48E|   FF   |  LUT  | URAM|
+-----------------+---------+-------+--------+-------+-----+
|DSP              |        -|      -|       -|      -|    -|
|Expression       |        -|      -|      45|     74|    -|
|FIFO             |        -|      -|       -|      -|    -|
|Instance         |        -|    128|   41805|  21948|    -|
|Memory           |      238|      -|     240|     64|    -|
|Multiplexer      |        -|      -|       -|   2450|    -|
|Register         |        -|      -|     504|      -|    -|
+-----------------+---------+-------+--------+-------+-----+
|Total            |      238|    128|   42594|  24536|    0|
+-----------------+---------+-------+--------+-------+-----+
|Available        |      432|    360|  141120|  70560|    0|
+-----------------+---------+-------+--------+-------+-----+
|Utilization (%)  |       55|     35|      30|     34|    0|
+-----------------+---------+-------+--------+-------+-----+

for the exact same design.

 

 

I cannot understand why my design does not fit in the smaller FPGA. Is it an issue with the internal architecture of the FPGA?
I have attached the full synthesis reports for anyone who would like to take a closer look.

 

 

Cheers,

Panos

 

Without proper software tools the hardware is unusable no matter how good and well designed it is.
4 Replies
Scholar u4223374
Registered: ‎04-26-2015

Re: Same design, different amount of BRAM usage between different devices?

Hmm, that's a new one.

 

My first guess would be that you've got different synthesis directives active. Directives kept in their own file are unique to a solution (i.e. if you've got one solution for the ZU3EG and another for the 7Z020, they will not share directives). The easy way to check is to change the ZU3EG solution to build for the 7Z020 and compare the results. If that does turn out to be the issue, the long-term fix is to add the directives as pragmas within the code, which are shared across all solutions.
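To illustrate what moving directives into the code could look like, here is a minimal sketch. The kernel is made up for illustration: `scale_rows`, `N` and `buf` are hypothetical names and do not come from the hw_conv design in this thread. `PIPELINE` and `ARRAY_PARTITION` are standard Vivado HLS pragmas, but which directives the real design needs is unknown here.

```cpp
#include <cassert>

const int N = 16;  // hypothetical buffer size, for illustration only

// Hypothetical kernel: scales the first n inputs by gain and adds 1.
void scale_rows(const int in[N], int out[N], int n, int gain) {
    assert(n >= 0 && n <= N);  // keep the loop bounds inside the arrays

    int buf[N];
    // The pragma sits next to the array it affects, so every solution that
    // compiles this file picks it up. An ordinary C++ compiler ignores it.
#pragma HLS ARRAY_PARTITION variable=buf complete dim=1

    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        buf[i] = in[i] * gain;
    }
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = buf[i] + 1;
    }
}
```

Because the pragmas travel with the source file, switching the part in a solution cannot silently drop them the way a per-solution directives file can.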

 

If that fails, it's time to build the problematic functions one at a time and see what's causing this behaviour. "hw_conv_p_weightscud" looks like the logical place to start.

 

 

Explorer
Registered: ‎05-21-2017

Re: Same design, different amount of BRAM usage between different devices?

@u4223374

 

I only change the device - the directives are from the same file.

 

The reports above were generated by running the following Tcl file with vivado_hls on the command line:

 

open_project -reset time_conv_layer

set_top hw_conv
add_files ./hw_conv_helper.cpp -cflags "-std=c++0x"
add_files ./hw_conv.cpp -cflags "-std=c++0x"
add_files -tb ./sw_conv.cpp -cflags "-std=c++0x"
add_files -tb ./conv_tb.cpp -cflags "-std=c++0x"

set all_solution [list sol_100p1 sol_100p2]
set all_part [list xczu3eg-sfva625-1-i xc7z020clg484-1]
set all_clock_period [list 10 10]

foreach solution $all_solution part $all_part clock_period $all_clock_period {

	open_solution -reset $solution
	set_part $part
	create_clock -period $clock_period -name default
	source "./directives.tcl"
	csynth_design

}

exit
Scholar u4223374
Registered: ‎04-26-2015

Re: Same design, different amount of BRAM usage between different devices?

After looking through the reports, the Zynq 7020 one makes perfect sense to me. This is how HLS has normally worked: stack block RAMs end-to-end to get the required depth, then put those in parallel for the width. On "p_weights", for example, it's trying to build an 18,432*16-bit RAM. Using 18K BRAMs, it's stacking two RAMs in 16K*1-bit mode end-to-end to get 32K*1-bit, then stacking 16 of those in parallel to get the width. The result is a neat 32 RAMs.

 

The ZU3EG version is apparently not doing that, and I cannot figure out what it is doing instead. Reversing the above approach (i.e. starting with 1K*16-bit RAMs and stacking those) should only require 18 RAMs, since 18,432 / 1,024 = 18. Doing the bottom 16384*16-bit section with the original approach (16 RAMs) and then sticking a little bit of RAM on top for the remaining 2,048 words of depth would make sense too, but that still only requires 18 RAMs. I can't see any way that it can need 19!
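The RAM counts above can be checked with a few lines of arithmetic. This is only a sketch of the counting argument, not of what HLS does internally. The function names are made up, and the 1K*16-bit mode follows the simplification used in this thread (parity bits are ignored).

```cpp
#include <cassert>

// Primitives needed to tile a depth x width memory from identical
// primitives configured as mode_depth x mode_width.
int brams_needed(int depth, int width, int mode_depth, int mode_width) {
    assert(depth > 0 && width > 0 && mode_depth > 0 && mode_width > 0);
    const int deep = (depth + mode_depth - 1) / mode_depth;  // stacked end-to-end
    const int wide = (width + mode_width - 1) / mode_width;  // placed in parallel
    return deep * wide;
}

// Depth-first mapping described for the 7Z020: 16K x 1-bit primitives.
int depth_first(int depth, int width) {
    return brams_needed(depth, width, 16384, 1);
}

// Reversed mapping: 1K x 16-bit primitives stacked for depth.
int width_first(int depth, int width) {
    return brams_needed(depth, width, 1024, 16);
}

// Hybrid: the first 16K words depth-first, the remainder in 1K x 16-bit.
int hybrid(int depth, int width) {
    const int low = depth < 16384 ? depth : 16384;
    const int rest = depth - low;
    int total = brams_needed(low, width, 16384, 1);
    if (rest > 0) {
        total += brams_needed(rest, width, 1024, 16);
    }
    return total;
}
```

For the 18,432*16-bit "p_weights" array, `depth_first(18432, 16)` gives 32, while `width_first(18432, 16)` and `hybrid(18432, 16)` both give 18.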

 

As for why it's behaving differently on the two chips, I think I've found the answer. The UltraScale+ block RAMs support a much more extensive cascading system than the 7-series ones; you can stack (for example) 18 1K*16-bit RAMs to get an 18K*16-bit RAM inside a block RAM column, without using general logic resources. The 7-series can't do that; HLS would have to build 18 individual 1K*16-bit RAMs and a 16-bit 18-to-1 multiplexer to select between them. This adds both resource cost and delay, so HLS never does it unless you force it to.
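To make the 7-series alternative concrete, here is a behavioural sketch of 18 separate 1K*16-bit banks with an 18-to-1 output multiplexer. It is a plain C++ model for illustration only, not HLS-generated structure. `BankedRam`, `kBanks` and `kBankDepth` are made-up names.

```cpp
#include <cassert>
#include <cstdint>

const int kBanks = 18;        // 18 banks of 1K words cover 18,432 words
const int kBankDepth = 1024;  // each bank models one 1K x 16-bit RAM

struct BankedRam {
    std::uint16_t bank[kBanks][kBankDepth];

    void write(int addr, std::uint16_t value) {
        assert(addr >= 0 && addr < kBanks * kBankDepth);
        // Upper address bits pick the bank, lower 10 bits pick the word.
        bank[addr / kBankDepth][addr % kBankDepth] = value;
    }

    std::uint16_t read(int addr) const {
        assert(addr >= 0 && addr < kBanks * kBankDepth);
        const int select = addr / kBankDepth;
        const int offset = addr % kBankDepth;

        // Every bank is read at the same offset, as it would be in hardware.
        std::uint16_t outputs[kBanks];
        for (int b = 0; b < kBanks; ++b) {
            outputs[b] = bank[b][offset];
        }
        // This selection is the 16-bit 18-to-1 multiplexer: on the 7-series
        // it has to be built from general logic.
        return outputs[select];
    }
};
```

The final selection in `read` is the part that costs LUTs and adds delay when there is no dedicated cascade path to do it inside the block RAM column.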

Explorer
Registered: ‎05-21-2017

Re: Same design, different amount of BRAM usage between different devices?

 

It would be very nice if there was an official answer for this one because this behavior results in a big waste of resources.
