m.asiatici
Visitor
Registered: ‎02-01-2017

Achieving peak bandwidth with single requests on ZC706 DDR3

Hello,

I am trying to find the best configuration to achieve peak DDR3 bandwidth using single-beat 512-bit read requests on a ZC706. My target frequency is 200 MHz. In my final application, requests will not necessarily be sequential, but for now I am sending sequential requests as a benchmark, to find the configuration that gives the highest performance at least in the ideal case.

With bursts of at least two beats I can essentially reach the theoretical 12.8 GB/s (64 bytes x 200 MHz), but with single-beat requests I am stuck at about 70% of that value. I measured this with the following configuration:

my_accelerator -> AXI SmartConnect -> MIG

with all AXI SmartConnect settings left at their defaults. Note that my_accelerator can issue an unlimited number of outstanding read requests.
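For reference, the bandwidth figures quoted above can be reproduced with a few lines of arithmetic. This is only a sanity check of the numbers in this post; the function name is made up for illustration and 1 GB is taken as 1e9 bytes.

```python
def peak_bandwidth_gb_per_s(bus_width_bits: int, clock_hz: float) -> float:
    """Bandwidth with one full-width data beat per clock cycle, in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_beat = bus_width_bits // 8
    return bytes_per_beat * clock_hz / 1e9


# 512-bit data bus at 200 MHz, as in the post
peak = peak_bandwidth_gb_per_s(512, 200e6)

# "about 70%" is the approximate figure reported for single-beat requests
observed = 0.70 * peak

print(f"peak = {peak:.2f} GB/s, single-beat reads = about {observed:.2f} GB/s")
```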

Looking at both sides of the SmartConnect with an ILA, it seems that the SmartConnect does not have enough buffering capacity to allow fully pipelined operation on the AR channel: the memory latency is 44-45 cycles, but I can have at most 33 requests outstanding, which explains the roughly 70% bandwidth limit (33/45 is about 73%):

Screenshot from 2021-02-18 10-47-10.png
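The relation between outstanding requests, latency and achievable bandwidth can be sketched with Little's law. This is a simplified model, not a description of the SmartConnect internals: it assumes a constant read latency, that each request returns its beats back to back, and that the only limit is the number of requests in flight. The function names are hypothetical.

```python
import math


def sustained_fraction(max_outstanding: int, latency_cycles: int,
                       beats_per_request: int = 1) -> float:
    """Fraction of peak bandwidth when at most max_outstanding requests are in flight.

    Simplified model: in every window of latency_cycles cycles, at most
    max_outstanding * beats_per_request data beats can come back.
    """
    return min(1.0, max_outstanding * beats_per_request / latency_cycles)


def min_outstanding_for_peak(latency_cycles: int, beats_per_request: int = 1) -> int:
    """Smallest number of in-flight requests that covers the whole latency window."""
    return math.ceil(latency_cycles / beats_per_request)


# Numbers from the post: 33 outstanding requests, 44-45 cycles of latency
for latency in (44, 45):
    single = sustained_fraction(33, latency)
    burst2 = sustained_fraction(33, latency, beats_per_request=2)
    needed = min_outstanding_for_peak(latency)
    print(f"latency {latency}: single-beat {single:.0%}, "
          f"two-beat bursts {burst2:.0%}, need {needed} outstanding single-beat reads")
```

Under these assumptions, 33 outstanding single-beat reads cover only about three quarters of the latency window, while the same 33 slots filled with two-beat bursts cover it completely, which matches the observation that bursts of two beats reach the peak.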

The bottleneck is introduced by the SmartConnect as the ARREADY coming from the MIG always remains high:

Screenshot from 2021-02-18 10-48-28.png

I tried playing around with the buffering settings in the SmartConnect (such as adding extra AR pipeline buffers, since AR_SIZE in S00_buffer is limited to 32), but everything I tried ended up reducing the number of outstanding reads even further.

Surprisingly, if I remove the SmartConnect and connect the accelerator directly to the MIG, the MIG behaves differently and only accepts one request every other cycle, which was not the case with the SmartConnect in place:

Screenshot from 2021-02-18 11-21-14.png
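The effect of this acceptance rate on single-beat reads can be estimated in the same spirit. The sketch below is an assumption-laden upper bound, not a measurement: it only models the rate at which AR requests are accepted and ignores every other limit, and the function name is hypothetical.

```python
def issue_limited_fraction(beats_per_request: int, cycles_per_accept: int) -> float:
    """Upper bound on the fraction of peak bandwidth set by the AR acceptance rate.

    If one request is accepted every cycles_per_accept cycles and each request
    returns beats_per_request data beats, the data bus cannot be busy more than
    beats_per_request / cycles_per_accept of the time.
    """
    return min(1.0, beats_per_request / cycles_per_accept)


# One request accepted every other cycle, as seen with the direct connection
single_beat = issue_limited_fraction(beats_per_request=1, cycles_per_accept=2)
two_beat = issue_limited_fraction(beats_per_request=2, cycles_per_accept=2)

print(f"single-beat ceiling: {single_beat:.0%}, two-beat ceiling: {two_beat:.0%}")
```

In other words, an AR channel that accepts a request only every other cycle caps single-beat reads at half of the peak, while bursts of two beats or more are not affected by it.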

I also tried increasing NUM_READ_OUTSTANDING on the AXI Master port of my IP from the default of 2 to 128 but it made no difference.

Is there a way to increase the buffering capacity inside the SmartConnect or, alternatively, to get the MIG to behave just like with the SmartConnect even when connected directly to my IP?

Thanks,

Mikhail
