06-19-2018 11:05 PM
Hi there,
I currently have a design that uses quite large amounts of BRAM. The device I am targeting is the XCKU40, which has a maximum of 21.1 Mb of Block RAM.
After synthesis, the report says that 0.0 BRAMs are implemented. I am not sure what is happening there; it appears to be optimizing things out, although it is odd that implementation then tries to place BRAMs that were supposedly optimized away. After synthesis, I run implementation, which eventually fails because the design requires more cells than are available in the device. Here are the error messages:
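As a sanity check on the 21.1 Mb figure, here is a small Python calculation. It assumes 600 RAMB36 sites (the count reported in the placer errors), 36 Kbit per RAMB36 including parity (32 Kbit of data plus 4 Kbit of parity), and 1 Kbit = 1024 bits; these are assumptions for illustration, not numbers read out of a Xilinx tool.

```python
# Block RAM capacity of the XCKU40.
# Assumptions: 600 RAMB36 sites, 36 Kbit each including parity,
# 32 Kbit each counting data bits only, 1 Kbit = 1024 bits.
RAMB36_SITES = 600
KBIT = 1024

total_bits = RAMB36_SITES * 36 * KBIT   # data + parity bits
data_bits = RAMB36_SITES * 32 * KBIT    # data bits only (parity unused)

total_mbit = total_bits / (KBIT * KBIT)
data_mbit = data_bits / (KBIT * KBIT)

print(f"total: {total_mbit:.1f} Mb, data-only: {data_mbit:.2f} Mb")
```

The data-only figure (18.75 Mb) is the more relevant ceiling for data widths that cannot make use of the parity bits.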
[Place 30-640] Place Check : This design requires more RAMB36/FIFO cells than are available in the target device. This design requires 622 of such cell types but only 600 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.
[Place 30-640] Place Check : This design requires more RAMB18 and RAMB36/FIFO cells than are available in the target device. This design requires 1303 of such cell types but only 1200 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.
[Place 30-640] Place Check : This design requires more RAMB36E2 cells than are available in the target device. This design requires 622 of such cell types but only 600 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.
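The three Place 30-640 counts appear to be consistent with each other if each RAMB36 occupies two RAMB18 sites (UG573 describes a RAMB36 as usable as two independent RAMB18s). Under that assumption, this Python sketch back-calculates how many standalone RAMB18s were requested and how far over the limit the design is; the decoding is an interpretation of the messages, not something the tool reports directly.

```python
# Decode the Place 30-640 counts into RAMB18-equivalent sites.
# Assumption: one RAMB36 counts as two RAMB18 sites in the combined check.
RAMB36_REQUIRED = 622      # RAMB36/FIFO and RAMB36E2 messages
COMBINED_REQUIRED = 1303   # "RAMB18 and RAMB36/FIFO" message
RAMB36_SITES = 600

ramb18_equiv_from_36 = 2 * RAMB36_REQUIRED
standalone_ramb18 = COMBINED_REQUIRED - ramb18_equiv_from_36
overflow_36 = RAMB36_REQUIRED - RAMB36_SITES
overflow_18 = COMBINED_REQUIRED - 2 * RAMB36_SITES

print(f"standalone RAMB18s: {standalone_ramb18}")
print(f"over by {overflow_36} RAMB36 sites / {overflow_18} RAMB18 sites")
```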
Okay, so it appears that I am exceeding the device maximum. However, if I do the calculations, I should be using around 80% of the theoretical maximum:
My design is for video processing, and uses the following:
06-20-2018 03:54 PM
Hi @aaron_holliday. Are there any out-of-context modules or IP in this design? Since they are synthesized separately, their Block RAM utilization would not show up in the top-level (global) synthesis report.
I would try opening the post-synthesis design and checking the utilization at that point with report_utilization.
06-20-2018 06:52 AM
BRAMs are allocated in 16 Kbit (RAMB18) or 32 Kbit (RAMB36) chunks of data storage (18 Kbit or 36 Kbit counting the parity bits), whether you use 1 bit or all of the bits. The data width also determines how efficiently the BRAMs are used: a data width that is a power of 2 (2, 4, 8, 16, 32...) can use an entire BRAM, while other data widths will typically waste part of it. This depends on whether or not you are able to use the parity bits as additional data bits.
The Block Memory Generator usually does a pretty good job of building BRAM memories efficiently, but there is typically some amount of wasted BRAM space.
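The rounding-up effect can be estimated with a short Python function. This is a simplified model, not how Vivado actually maps memories: it assumes each memory tiles a single RAMB36E2 aspect ratio in both depth and width (the tools can mix primitives and use RAMB18s or LUTRAM), and the aspect ratios are as I understand them from UG573 (widths include parity; 512 x 72 is simple dual-port only). The example sizes are hypothetical.

```python
import math

# RAMB36E2 aspect ratios as (depth, width); widths include parity bits.
# Assumption: 512 x 72 is only available in simple dual-port mode.
RAMB36_ASPECT_RATIOS = [
    (32768, 1), (16384, 2), (8192, 4), (4096, 9),
    (2048, 18), (1024, 36), (512, 72),
]

def ramb36_count(depth: int, width: int) -> int:
    """Smallest number of RAMB36 primitives for a depth x width memory,
    assuming one aspect ratio is tiled in both depth and width."""
    return min(
        math.ceil(depth / d) * math.ceil(width / w)
        for d, w in RAMB36_ASPECT_RATIOS
    )

# Hypothetical examples. In this RAMB36-only model, even a 2048 x 1
# memory occupies a whole RAMB36; a 2048 x 14 buffer fits in one
# 2K x 18 configuration, leaving 4 of the 18 bits per word unused.
print(ramb36_count(2048, 1))
print(ramb36_count(2048, 14))
print(ramb36_count(4096, 14))
```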
You can find all info about BRAMs in the UltraScale Architecture Memory Resources User Guide (UG573).
https://www.xilinx.com/support/documentation/user_guides/ug573-ultrascale-memory-resources.pdf
06-20-2018 04:26 PM
Hi @marcb, I believe there are out-of-context modules/IP in the design. I ran report_utilization and got the following for Block RAM usage, so it is odd that my design exceeds the maximum:
Any idea why implementation would be failing? Perhaps there are other categories in which BRAM is being used?
06-20-2018 04:30 PM
@tedbooth Thanks for this information. I think that in my case, the memory exceeding the maximum is not due to BRAMs being partially full. Even if they are, each of my memory structures might waste at most one entire RAMB36E2; given that I have ~50 FIFOs and 5 frame buffers, that would be 55 wasted RAMB36s in the worst case. My utilization is apparently at 220/600, so even with hundreds more it should still fit. Thank you for the document, I will read through it.
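The worst-case bound in this reply can be written out in Python using only the figures stated in the thread; one wasted RAMB36 per memory is the poster's own upper-bound assumption.

```python
# Figures quoted in the thread. Assumption (from the post): each memory
# wastes at most one whole RAMB36.
REPORTED_RAMB36 = 220      # report_utilization result quoted in the post
SITES = 600                # RAMB36 sites on the XCKU40
PLACER_REQUIRED = 622      # from the Place 30-640 errors

memories = 50 + 5          # ~50 FIFOs plus 5 frame buffers
worst_case_total = REPORTED_RAMB36 + memories * 1
unexplained = PLACER_REQUIRED - worst_case_total

print(f"worst case: {worst_case_total}/{SITES} ({worst_case_total / SITES:.0%})")
print(f"placer asked for {unexplained} more RAMB36s than this bound")
```

The placer asked for 347 more RAMB36s than this bound, which suggests the 220 figure did not capture everything the placer was asked to fit.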
06-20-2018 04:32 PM
Actually, RAMs with data widths of 9, 18, and 36 utilize the underlying structure best (i.e. 8-bit bytes + 1 parity bit = 9 bits, and multiple lanes thereof).
I suggest running:
report_ram_utilization -detail
and looking at the results. The report format isn't the best, but you should be able to get a good idea of where all your Block RAMs are.
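To put numbers on this, the Python sketch below rounds a word width up to the next native port width (including parity) and reports the fraction of stored bits that carry data. It assumes the memory is shallow enough that a deeper, narrower aspect ratio would not help, and it uses my reading of the UltraScale port widths (72 bits only in simple dual-port mode); both are assumptions, not tool behavior.

```python
import math

# Native RAMB36 port widths including parity bits.
# Assumption: from the UltraScale aspect ratios; 72 is simple dual-port only.
NATIVE_WIDTHS = [1, 2, 4, 9, 18, 36, 72]

def width_efficiency(width: int) -> float:
    """Fraction of stored bits that hold data when a `width`-bit word is
    placed in the narrowest native port that holds it, or split across
    72-bit slices if it is wider than 72 bits."""
    for native in NATIVE_WIDTHS:
        if width <= native:
            return width / native
    slices = math.ceil(width / 72)
    return width / (slices * 72)

for w in (8, 9, 14, 16, 18, 36):
    print(w, f"{width_efficiency(w):.0%}")
```

Under these assumptions a 14-bit word sits in an 18-bit slot, so roughly 78% of those bits hold data, versus 100% for widths of 9, 18 and 36.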
Good Luck
Mark
06-20-2018 06:00 PM
Hi @markcurry, thanks for the suggestion. Unfortunately, the data widths in the pipeline I've implemented are set by a 14-bit sensor, so unless I truncate the sensor output, I cannot really use widths of 9, 18 or 36. I will, however, run the RAM utilization report after this implementation run finishes and post the results. Thanks
06-20-2018 07:03 PM
Hi @markcurry, the results of report_ram_utilization -detail are hard to read, as you said, haha. But the report is helpful, especially for confirming which clocks are driving which RAMs. My design has now been implemented successfully with OK timing, so I think I will close this post. I still suspect that the RAM usage is slightly higher than it should be, but since my design fits, it doesn't bother me too much. Thanks for the report_ram_utilization tip.