05-18-2021 05:42 AM
I am currently using a Basys 3 board, with a MicroBlaze soft processor in my design.
I am looking for a way to read and write data to BRAM quickly. I am using a custom IP to do the computations on the data. The outputs computed by the IP are stored in the BRAM and then sent back to the IP, which processes the data differently this time, and so on. This continues for a few rounds, so data transfer in and out of the BRAM is an important factor in the performance of my design.
I looked at the AXI DMA approach, i.e. 1. connecting the DMA's AXI-Stream interfaces (M_AXIS_MM2S and S_AXIS_S2MM) to the custom IP, and 2. connecting the DMA's memory-mapped AXI interfaces (M_AXI_MM2S and M_AXI_S2MM) to the BRAM, through an AXI BRAM Controller and then a Block Memory Generator.
Part 1: If I understand this correctly, although the data in and out of the custom IP uses AXI-Stream, the transfer to and from the BRAM still uses the memory-mapped AXI interface, which slows the design down. (Here I am assuming that AXI-Stream transfers are much faster than memory-mapped AXI transfers, which take many cycles to move data in and out of the BRAM, hence a slower design.)
So I would like someone to confirm or reject this notion (and, if it is wrong, explain the right answer), and if possible suggest a better approach for my design.
Part 2: Although I am currently using the Basys 3 board, in the future I will move to a higher-end board and will eventually use DDR memory instead of BRAM. Is there a way to use BRAM on my current board so that it resembles using DDR? In other words, a design where the BRAM stands in for DDR, so that I can test my ideas for the application now as if I were already using DDR on the higher-end board, and not have to change much when I port this Basys 3 design to that board.
05-18-2021 06:09 AM - edited 05-18-2021 06:19 AM
The fastest way to interface with Xilinx BRAMs is to use their native interface, i.e. the signals ADDR*, EN*, WE*, DI*, DO*, etc.
For a complete description, consult UG473 (7 Series FPGAs Memory Resources User Guide): https://www.xilinx.com/support/documentation/user_guides/ug473_7Series_Memory_Resources.pdf
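To illustrate why the native interface is the fastest path, here is a minimal Python behavioural sketch of one BRAM port. It is not Xilinx code and not cycle-exact against the primitive: it assumes no output register (read data is available right after the clock edge that samples the address) and WRITE_FIRST-style behaviour on writes, and the class and argument names (NativeBramPort, addr, en, we, din) are hypothetical stand-ins for the ADDR/EN/WE/DI/DO pins. The point is that there is no handshake: with EN held high you get one new word per clock.

```python
class NativeBramPort:
    """Simplified cycle model of one native BRAM port (hypothetical, no output register).

    Arguments loosely mirror the native pins: addr (ADDR), en (EN), we (WE),
    din (DI); the return value plays the role of DO.
    """

    def __init__(self, depth: int, width_bits: int) -> None:
        self.mem = [0] * depth
        self.mask = (1 << width_bits) - 1
        self.dout = 0

    def clock(self, addr: int, en: bool, we: bool = False, din: int = 0) -> int:
        """Apply one rising clock edge and return DO after that edge."""
        if en:
            if we:
                self.mem[addr] = din & self.mask  # data wider than the port is truncated
            # WRITE_FIRST-style: after a write, DO shows the newly written word
            self.dout = self.mem[addr]
        # With EN low, the memory is untouched and DO keeps its previous value
        return self.dout


port = NativeBramPort(depth=1024, width_bits=32)
for i in range(4):
    port.clock(addr=i, en=True, we=True, din=i * 10)

# Back-to-back reads: one word per clock, no address/response handshake
reads = [port.clock(addr=i, en=True) for i in range(4)]
print(reads)  # [0, 10, 20, 30]
```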
Part 1: Yes, cycles are lost in the protocol conversion in both directions. Full AXI4 is a heavy protocol and should be avoided if the design allows it. AXI-Stream is comparatively lightweight, and converting between native signals and AXI-Stream is relatively easy. But remember that AXI-Stream carries no memory addresses (which is probably why full AXI is used wherever memories need to be written and read).
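A rough cycle-count model makes the cost concrete. This Python sketch assumes one data beat per clock in both cases and charges a fixed, hypothetical number of extra cycles per memory-mapped burst for the address handshake, write response and interconnect latency. The real figure depends on how the AXI DMA, interconnect and BRAM controller are configured, so treat overhead_per_burst as something to measure, not a datasheet value. The 256-beat cap is the AXI4 INCR burst limit.

```python
import math

AXI4_MAX_INCR_BEATS = 256  # AXI4 INCR bursts carry at most 256 beats


def mm_transfer_cycles(n_words: int, burst_len: int, overhead_per_burst: int) -> int:
    """Approximate cycles to move n_words over memory-mapped AXI4.

    Assumes one beat per cycle plus a fixed (hypothetical) per-burst overhead.
    """
    if not 1 <= burst_len <= AXI4_MAX_INCR_BEATS:
        raise ValueError("burst_len must be between 1 and 256 for AXI4 INCR bursts")
    bursts = math.ceil(n_words / burst_len)
    return n_words + bursts * overhead_per_burst


def stream_transfer_cycles(n_words: int) -> int:
    """AXI-Stream with TVALID and TREADY held high: one beat per cycle, no address phase."""
    return n_words


n = 1024
print(stream_transfer_cycles(n))                                   # 1024
print(mm_transfer_cycles(n, burst_len=16, overhead_per_burst=4))   # 1280
print(mm_transfer_cycles(n, burst_len=1, overhead_per_burst=4))    # 5120
```

The takeaway is that long bursts amortize the per-burst overhead, while single-beat accesses (which is what a processor doing individual loads and stores tends to generate) pay it on every word.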
Keep in mind that the BRAM also has to be interfaced to the MicroBlaze, so you have little choice but to use the interface the MicroBlaze uses to communicate with memory. In my opinion, custom glue logic between the MicroBlaze and the BRAM's native signals would be the fastest option, and you can build your cycle-accurate logic there.
Part 2: If you want to move to DDR memory, you have to study the documentation for the Xilinx MIG (Memory Interface Generator) IP core. Here again, the MIG core can be generated with either a native interface or a full AXI4 interface. In AXI4 mode, the AXI4 signals are converted internally to the native interface, so some cycles are lost in the conversion.
It all depends on how much latency your design can tolerate, and on how fast your custom IP consumes data and needs more. Generally, placing a well-sized buffer memory (built from BRAMs) just before the MIG core should help. You will have to experiment with your design a bit at this stage.
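As a starting point for sizing that buffer, here is a back-of-the-envelope Python sketch. It assumes the buffer is full when the memory stops delivering, that the custom IP keeps consuming at a fixed rate during the stall, and that the memory can afterwards deliver faster than the IP consumes. The stall length and rates in the example are made-up numbers, to be replaced with figures measured on your own system.

```python
import math


def prefetch_depth(consume_per_cycle: float, worst_stall_cycles: int) -> int:
    """Buffer depth in words that covers a worst-case stall without starving the IP.

    Rounded up to a power of two, as FIFO depths usually are.
    """
    needed = math.ceil(consume_per_cycle * worst_stall_cycles)
    depth = 1
    while depth < needed:
        depth *= 2
    return depth


def refill_cycles(drained_words: int, produce_per_cycle: float, consume_per_cycle: float) -> float:
    """Cycles to refill the buffer after a stall while the IP keeps consuming.

    Returns math.inf if the memory cannot outpace the IP (the buffer never recovers).
    """
    surplus = produce_per_cycle - consume_per_cycle
    if surplus <= 0:
        return math.inf
    return math.ceil(drained_words / surplus)


# Hypothetical example: the IP consumes 1 word/cycle and the memory may stall for 200 cycles
depth = prefetch_depth(consume_per_cycle=1, worst_stall_cycles=200)
print(depth)                                                          # 256
print(refill_cycles(200, produce_per_cycle=2, consume_per_cycle=1))  # 200
```

If stalls can arrive close together, the refill time matters as much as the depth: the buffer has to be back to full before the next stall begins.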
05-18-2021 06:22 AM
Regarding the move to DDR:
This is a big step, in that BRAM and DDR are very different.
With BRAM, every access takes the same time and the data is always available.
With DDR, all sorts of things can affect speed, and if the data is not refreshed (either by your own logic or by the controller as part of its cycle), it is lost.
Reading a whole DDR cache line is the fastest access.
Reading a single byte is likely to be very slow.
You can be held up by an enforced refresh or a recalibration.
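Two of these points can be put into rough numbers. The Python sketch below assumes a DDR3/DDR4-style fixed burst length of 8, so each read moves burst_len × bus width bytes whether or not you need them (it ignores burst chop and command latency, which only make small reads worse), and it models refresh as the fraction of time the device is busy, tRFC/tREFI. The timing values in the example are illustrative only; take the real tRFC and tREFI from your memory device's datasheet.

```python
import math


def burst_efficiency(requested_bytes: int, bus_bytes: int, burst_len: int = 8) -> float:
    """Fraction of transferred bytes that were actually requested (aligned access assumed)."""
    bytes_per_burst = bus_bytes * burst_len
    bursts = math.ceil(requested_bytes / bytes_per_burst)
    return requested_bytes / (bursts * bytes_per_burst)


def refresh_availability(t_rfc_ns: float, t_refi_ns: float) -> float:
    """Fraction of time not blocked by refresh: every tREFI, the device is busy for tRFC."""
    return 1.0 - t_rfc_ns / t_refi_ns


# Example: 16-bit (2-byte) DDR bus, so each BL8 burst moves 16 bytes
print(burst_efficiency(1, bus_bytes=2))   # 0.0625: one useful byte out of 16
print(burst_efficiency(64, bus_bytes=2))  # 1.0: four full bursts
# Illustrative timing values only; use your device's datasheet numbers
print(round(refresh_availability(t_rfc_ns=260, t_refi_ns=7800), 3))  # 0.967
```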
In short, it is very different from BRAM;
you will probably need a substantially different system for each, not just a change of memory controller.
Also, regarding data rate in general:
BRAMs are dual-port: you can do two writes, two reads, or one write and one read at the same time, or a single read or write.
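A minimal Python sketch of what dual-port buys you: two independent operations on the same clock edge. The DualPortBram class and the op tuples are hypothetical. The model simply rejects same-address accesses where either port writes, because the outcome of such collisions depends on the BRAM configuration, and real designs arrange their addressing so the case never occurs.

```python
from typing import Optional, Tuple

# A port operation: None (idle), ("r", addr) or ("w", addr, data)
Op = Optional[tuple]


class DualPortBram:
    """Simplified model of a true dual-port BRAM: ports A and B act on the same clock edge."""

    def __init__(self, depth: int) -> None:
        self.mem = [0] * depth

    def clock(self, port_a: Op, port_b: Op) -> Tuple[Optional[int], Optional[int]]:
        active = [op for op in (port_a, port_b) if op is not None]
        if (len(active) == 2 and active[0][1] == active[1][1]
                and "w" in (active[0][0], active[1][0])):
            raise ValueError("address collision: same address with at least one write")
        # With collisions excluded, reads and writes touch different words,
        # so the order in which the two ports are applied does not matter.
        results = []
        for op in (port_a, port_b):
            if op is not None and op[0] == "w":
                self.mem[op[1]] = op[2]
                results.append(None)
            elif op is not None:
                results.append(self.mem[op[1]])
            else:
                results.append(None)
        return results[0], results[1]


bram = DualPortBram(depth=16)
print(bram.clock(("w", 0, 5), ("w", 1, 7)))  # two writes in one cycle: (None, None)
print(bram.clock(("r", 0), ("r", 1)))        # two reads in one cycle: (5, 7)
print(bram.clock(("w", 2, 9), ("r", 0)))     # one write and one read: (None, 5)
```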
Width also affects the data rate: writing 128 bytes into 128-byte-wide BRAMs in parallel (one cycle at frequency f) is faster than writing them to 16-byte-wide BRAMs at four times the frequency (eight cycles at 4f, i.e. twice the time).
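The width-versus-frequency comparison is simple arithmetic: peak bandwidth is width × clock × ports. In this Python sketch the 100 MHz base clock is an arbitrary assumption, and the figures are peak rates that ignore any stalls.

```python
import math


def peak_bandwidth(width_bytes: int, clock_hz: float, ports: int = 1) -> float:
    """Peak bytes per second: one word per port per clock."""
    return width_bytes * clock_hz * ports


def time_to_write(n_bytes: int, width_bytes: int, clock_hz: float) -> float:
    """Seconds to write n_bytes through a single port of the given width."""
    return math.ceil(n_bytes / width_bytes) / clock_hz


f = 100e6  # assumed base clock
print(time_to_write(128, width_bytes=128, clock_hz=f))     # one cycle at f: 1e-08 s
print(time_to_write(128, width_bytes=16, clock_hz=4 * f))  # eight cycles at 4f: 2e-08 s
print(peak_bandwidth(128, f) / peak_bandwidth(16, 4 * f))  # 2.0
```

Using both ports of a dual-port BRAM (ports=2) doubles the peak figure again, which is why width and port usage are usually cheaper ways to gain bandwidth than pushing the clock.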