Observer
Registered: ‎07-11-2016

How can I test the actual bandwidth of FPGA?

I am using a 7-series board, and I want to measure, for example, the maximum amount of data that the AFI port can transfer in one cycle, assuming the data is 16-bit. My tool is SDSoC. Is there an efficient method in SDSoC, such as partitioning the bandwidth, to transfer the data?

3 Replies
Moderator
Registered: ‎04-17-2011

Moving to correct board.
Regards,
Debraj
Xilinx Employee
Registered: ‎06-29-2015

Hi a1300012709,

In Zynq-7000 devices, the AFI/HP/ACP ports have a configurable 32/64-bit data width. By default, SDSoC configures these ports for 64-bit width and connects AXI DMAs to them, which also use 64-bit data widths.

If your data is an array of 16-bit elements and you transfer it using SDSoC, the DMA reads it out in contiguous 64-bit chunks, which are then downsized to 16-bit elements before being passed into your accelerator core.
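
To make the 64-bit-to-16-bit downsizing concrete, here is a minimal plain-C sketch of the idea. It is not the actual data width converter IP. The function names are hypothetical, and the lane order (element 0 in the least-significant 16 bits) is an assumption for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 64-bit beats needed to carry n 16-bit elements (4 per beat). */
static size_t beats_for_elements(size_t n)
{
    return (n + 3) / 4;
}

/* Split one 64-bit beat into four 16-bit elements.
 * ASSUMPTION: element 0 sits in the least-significant 16 bits. */
static void downsize_beat(uint64_t beat, uint16_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (uint16_t)(beat >> (16 * i));
}
```

For example, an array of 10 elements would need 3 beats under this model, with the last beat only half used.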

The DMA has a default buffer size of 512 64-bit words, so it will read data from the PS port as fast as possible. But if your accelerator doesn't consume the data fast enough, the buffer will eventually back up and your throughput will drop.
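
As a rough illustration of the back-up effect, the sketch below uses an idealized model in which the buffer fills and drains at constant rates. The function name and the model are assumptions for illustration only; the real DMA and interconnect behave in a more complicated way.

```c
#include <stdint.h>

/* Idealized model: cycles until a buffer of `capacity` items is full when
 * `fill_per_cycle` items arrive and `drain_per_cycle` items leave each cycle.
 * Returns 0 if the buffer never fills (drain keeps up with fill). */
static uint64_t cycles_until_full(uint64_t capacity,
                                  uint64_t fill_per_cycle,
                                  uint64_t drain_per_cycle)
{
    if (fill_per_cycle <= drain_per_cycle)
        return 0;
    uint64_t net = fill_per_cycle - drain_per_cycle;
    return (capacity + net - 1) / net; /* round up */
}
```

Counting in 16-bit elements, 512 64-bit words hold 2048 elements. With 4 elements arriving and 1 leaving per cycle, this model says the buffer fills after about 683 cycles, after which the producer is held to the consumer's rate.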

If in SDSoC you use a Data Motion Clock Frequency (the frequency the DMA runs at) of 100 MHz, the DMA can provide 4x 16-bit elements every clock cycle. However, the data width converter that takes one 64-bit chunk and turns it into 16-bit elements runs at the same frequency as the DMA, so you get only 1x 16-bit element per cycle. Even if you choose an accelerator frequency of 400 MHz, the data width converter still runs at 100 MHz, so your core won't be able to consume the data fast enough to keep the DMA from backing up.
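
The arithmetic behind those numbers can be sketched in plain C. These are theoretical peak figures only, and the function names are hypothetical.

```c
#include <stdint.h>

/* Peak throughput in bytes per second for a stream that moves
 * `bits_per_cycle` bits on every cycle of a `clock_hz` clock. */
static uint64_t peak_bytes_per_sec(uint64_t clock_hz, unsigned bits_per_cycle)
{
    return clock_hz * bits_per_cycle / 8u;
}

/* The end-to-end rate is limited by the slower of two stages in the chain. */
static uint64_t bottleneck_bytes_per_sec(uint64_t stage_a, uint64_t stage_b)
{
    return stage_a < stage_b ? stage_a : stage_b;
}
```

At 100 MHz, the 64-bit DMA side peaks at 800 MB/s, while the 16-bit output of the width converter peaks at 200 MB/s, so the chain as a whole is limited to 200 MB/s.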

If you really want to maximize the PS port bandwidth, you should consider reconfiguring your hardware function so that it has 64-bit streaming interfaces. Inside your core you can do whatever data manipulation is needed to operate on 4x 16-bit elements per cycle.
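
As a minimal plain-C sketch of operating on 4x 16-bit elements carried in one 64-bit word: adding a constant is only a stand-in for your real computation, the function name is hypothetical, and the lane order (lane 0 in the least-significant bits) is an assumption. It does not show SDSoC pragmas or HLS-specific types.

```c
#include <stdint.h>

/* Add `k` to each of the four 16-bit lanes packed in `word`.
 * Each lane wraps independently modulo 2^16.
 * ASSUMPTION: lane 0 occupies the least-significant 16 bits. */
static uint64_t add_to_lanes(uint64_t word, uint16_t k)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; ++lane) {
        uint16_t v = (uint16_t)(word >> (16 * lane)); /* extract lane */
        uint16_t r = (uint16_t)(v + k);               /* per-lane op  */
        result |= (uint64_t)r << (16 * lane);         /* repack lane  */
    }
    return result;
}
```

In a hardware function, a loop like this would typically be unrolled so that all four lanes are handled in the same cycle.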

Sam

Observer
Registered: ‎07-11-2016

Many thanks for your reply.

I now use a struct in my C code to pack four 16-bit values into 64 bits. But when I measured the transfer time, I found that the port cannot transfer one 64-bit struct per cycle. What's the problem?
