06-06-2019 12:44 AM - edited 06-06-2019 12:49 AM
My current design uses more resources than are available on the FPGA board. For the baseline implementation, an array is declared as:
with a total BRAM_18k utilization of 57.
After applying the following pragma, the BRAM_18k utilization becomes 29*4 = 116, even though the total number of bits (262144*4 = 1048576) is the same as above.
ap_uint<512> out_mem
#pragma HLS array_partition variable=out_mem cyclic factor=4 dim=1
Is there any way to reduce the utilization of BRAM_18k in this case?
06-06-2019 07:06 AM
Unfortunately, I don't see a good solution at first glance. The unpartitioned RAM has a very high utilization of the 57 BRAMs. The 4 partitioned RAMs have a much lower utilization. The data width appears to be the issue. A data width of 512-bits requires 29 RAMs each with an 18-bit data bus.
You might have to look at your algorithm and see if you can make changes at that level.
06-06-2019 07:24 AM
As @tedbooth has said, getting that 512-bit width is requiring that each sub-array uses 29 block RAMs. The widest each RAM can be in TDP mode is 18-bit, which leads to a depth of 1024 elements - twice what you actually need.
Two options from here:
- Only partition it with a factor of 2, taking advantage of the fact that each RAM has two ports (i.e., you can still read four values per cycle). This should reduce the total number of RAMs to 58.
- Keep the partition factor of 4, but put the RAM in SDP mode. This means you only get a single read port and a single write port from each sub-array, but in SDP mode the RAMs can operate in 512x36-bit configuration. This should reduce the number required for each sub-array down to 15, for a total of 60 RAMs used.
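The counts quoted in this thread can be reproduced with a small back-of-the-envelope model. This is only a sketch, not how Vivado HLS actually estimates resources: it assumes the usual BRAM_18K aspect ratios (16Kx1, 8Kx2, 4Kx4, 2Kx9, 1Kx18, plus 512x36 in SDP mode) and picks whichever shape needs the fewest primitives. The function names are made up for illustration, and the total depth of 2048 words is derived from the 1048576 bits / 512 bits per word quoted above.

```cpp
#include <algorithm>
#include <climits>
#include <vector>

// One possible shape (depth x width) of a BRAM_18K primitive.
struct Aspect { int depth; int width; };

static int ceil_div(int a, int b) { return (a + b - 1) / b; }

// Rough estimate of BRAM_18K primitives for one memory of
// `depth` words by `width` bits. Assumption: the tool picks the
// aspect ratio that minimizes the primitive count.
int bram18k_for(int depth, int width, bool sdp) {
    std::vector<Aspect> shapes = {
        {16384, 1}, {8192, 2}, {4096, 4}, {2048, 9}, {1024, 18}};
    if (sdp) shapes.push_back({512, 36});  // 36-bit shape only in SDP mode
    int best = INT_MAX;
    for (const Aspect &s : shapes) {
        int n = ceil_div(depth, s.depth) * ceil_div(width, s.width);
        best = std::min(best, n);
    }
    return best;
}

// Same estimate for an array split into `factor` equal sub-arrays.
int bram18k_partitioned(int total_depth, int width, int factor, bool sdp) {
    return factor * bram18k_for(ceil_div(total_depth, factor), width, sdp);
}
```

With a 2048 x 512-bit array, `bram18k_partitioned(2048, 512, 1, false)` gives 57, factor 4 in TDP gives 116, factor 2 in TDP gives 58, and factor 4 in SDP gives 60, which matches the numbers above. Treat it as a sanity check only; the synthesis report is what counts.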
06-09-2019 05:03 AM - edited 06-09-2019 05:04 AM
However, I was just wondering if there is any way, or any HLS pragma, to tell Vivado HLS to configure the BRAM_18k so that it trades address space for bit width. The array case is acceptable, but I am more concerned about implementing a shallow but wide FIFO queue, such as:
hls::stream<ap_uint<512> > queue;
#pragma HLS stream variable=queue depth=4 dim=1
This requires only 2048 bits of storage (4 x 512), plus some logic resources. But as of right now, it wastes ~29 BRAM_18k.
Question: Is there any way in HLS to configure the BRAM_18k to trade address space for data width?
06-09-2019 05:09 AM
06-09-2019 06:31 AM - edited 06-09-2019 06:33 AM
However, I do not know whether it is applicable in HLS, or how I would use it in HLS. I am not so familiar with this advanced topic. I will take some time to look into it.
06-10-2019 05:46 AM
The XPM that you mentioned is not applicable to HLS. That is for RTL development.
HLS does allow the user to specify how an operation or array is implemented by way of the RESOURCE directive. Search "set_directive_resource" in UG902 for more details.
In the case of a shallow but wide memory, you might want to look at distributed memory instead of Block RAM.
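As a rough sketch of what that could look like for the 512-bit, depth-4 queue from the earlier post: the RESOURCE pragma below asks for a LUTRAM-based FIFO core instead of Block RAM. The `produce`/`consume`/`top` functions and the `word_t` typedef are made up for illustration, and the core name `FIFO_LUTRAM` is assumed from the core list in UG902; check that it is available in your tool version and confirm the result in the synthesis report.

```cpp
#include <ap_int.h>
#include <hls_stream.h>

typedef ap_uint<512> word_t;  // hypothetical name for the 512-bit word

// Hypothetical producer: pushes n words into the queue.
static void produce(const word_t *in, hls::stream<word_t> &q, int n) {
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        q.write(in[i]);
    }
}

// Hypothetical consumer: pops n words from the queue.
static void consume(hls::stream<word_t> &q, word_t *out, int n) {
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = q.read();
    }
}

void top(const word_t *in, word_t *out, int n) {
#pragma HLS DATAFLOW
    hls::stream<word_t> queue;
#pragma HLS STREAM variable=queue depth=4
// Assumption: request a distributed-RAM FIFO rather than a BRAM one.
#pragma HLS RESOURCE variable=queue core=FIFO_LUTRAM
    produce(in, queue, n);
    consume(queue, out, n);
}
```

The idea is simply that a 4-deep FIFO fits comfortably in LUT-based storage, so the ~29 BRAM_18k should no longer be needed; the cost moves to LUTs instead.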