Registered: ‎03-11-2019

Increasing throughput with multiple instances of a hardware block (zynq-7000)

Hi everyone

I have a hardware accelerator (B1) connected to the processor via an AXI-Stream interface in a Zynq-7000 device.

I would like to understand where the bottleneck is, and how I could increase the throughput by synthesizing multiple instances of the hardware accelerator in the FPGA.

For instance, would I achieve a speedup if I create three blocks B1, B2, B3, connected to the HP0, HP1, and HP3 ports, respectively? 

What is the best approach to benefit from this kind of parallelism (having multiple instances of a particular kernel) in the FPGA?

 

Best,

Medrano

Accepted Solutions
Registered: ‎03-28-2016

@medrano,

I would suggest that you first concentrate on optimizing the HLS IP to make sure that it is as efficient as possible.

After that, whether multiple instances will increase the system's throughput really depends on the architecture of your system.  If the IP transfers data to/from the DDR directly, or if it uses a DMA to transfer the data, then multiple instances could improve the throughput.  The trick then is handling the DDR buffers: can the system fill and empty them fast enough to keep all of the IP instances running at the same time?
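As a rough way to reason about where the bottleneck sits, the sketch below models aggregate throughput as the smaller of two limits: what N instances can process, and what the DDR/DMA path can sustain. The function name and all numbers are hypothetical placeholders, not measurements from any real design; you would substitute figures measured on your own system.

```python
def system_throughput(n_instances, per_instance_mb_s, memory_mb_s):
    """Estimate aggregate throughput in MB/s.

    Each instance can process at most per_instance_mb_s, and all
    instances together cannot exceed what the memory path supplies.
    """
    compute_limit = n_instances * per_instance_mb_s
    return min(compute_limit, memory_mb_s)


# Hypothetical numbers, for illustration only.
PER_INSTANCE_MB_S = 400.0  # assumed rate of one accelerator instance
MEMORY_MB_S = 1000.0       # assumed sustained rate of the DDR/DMA path

for n in (1, 2, 3, 4):
    total = system_throughput(n, PER_INSTANCE_MB_S, MEMORY_MB_S)
    if n * PER_INSTANCE_MB_S <= MEMORY_MB_S:
        limiter = "accelerators"
    else:
        limiter = "memory path"
    print(f"{n} instance(s): {total:.0f} MB/s, limited by {limiter}")
```

With these assumed figures, a third instance still helps a little, but a fourth adds nothing because the memory path is already saturated.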

To instantiate multiple instances, you can use several of the HP ports on the PS.  Depending upon the data rate, you could also use an AXI_Interconnect to link multiple IP instances to one HP port.  You will need to make the HP port as wide as possible and run the AXI clock as fast as possible.
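To see why width and clock both matter, here is a back-of-the-envelope sketch. It assumes one data beat per clock cycle as the theoretical peak of a single AXI data channel (read and write channels are separate, so this is per direction), and applies an assumed efficiency factor to estimate how many instances could share one port. The 150 MHz clock, the 0.75 efficiency, and the 400 MB/s per-instance rate are hypothetical; on the Zynq-7000 the HP ports can be configured as 32 or 64 bits wide.

```python
import math


def port_peak_mb_s(width_bits, clock_mhz):
    """Theoretical peak of one AXI data channel, one beat per clock."""
    # bytes per beat * million beats per second = MB/s
    return width_bits / 8 * clock_mhz


def instances_per_port(width_bits, clock_mhz, per_instance_mb_s,
                       efficiency=0.75):
    """Number of instances one port can feed at an assumed efficiency."""
    usable = port_peak_mb_s(width_bits, clock_mhz) * efficiency
    return math.floor(usable / per_instance_mb_s)


# Hypothetical clock and per-instance rate, for illustration only.
for width in (32, 64):
    peak = port_peak_mb_s(width, 150)
    fits = instances_per_port(width, 150, per_instance_mb_s=400)
    print(f"{width}-bit @ 150 MHz: peak {peak:.0f} MB/s, "
          f"room for {fits} instance(s)")
```

Under these assumptions, doubling the port width doubles the peak, which is the difference between one port feeding a single instance and feeding two.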

Ted Booth | Tech. Lead FPGA Design Engineer | DesignLinx Solutions
https://www.designlinxhs.com

Registered: ‎03-11-2019

Thanks a lot for your answer, @tedbooth.

What exactly do you mean by making the HP ports as wide as possible? Do you mean the data width of the AXI-Stream interface?

Are you aware of any example or application note where a similar problem is addressed?

 

Best,

Medrano

Registered: ‎03-28-2016

@medrano,

Yes, set the AXI data width as wide as possible.
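One way a wider data width pays off, sketched here in software terms: if the samples are narrower than the bus, several of them can travel in each beat. The example below packs 16-bit samples into 64-bit words. The sample size, the function name, and the lane ordering (first sample in the lowest bits) are assumptions for illustration; the accelerator would have to unpack the lanes the same way.

```python
def pack_samples(samples, sample_bits=16, word_bits=64):
    """Pack narrow samples into wide words, first sample in the low bits."""
    per_word = word_bits // sample_bits
    mask = (1 << sample_bits) - 1
    words = []
    for i in range(0, len(samples), per_word):
        word = 0
        for lane, sample in enumerate(samples[i:i + per_word]):
            # Place each sample in its own lane of the wide word.
            word |= (sample & mask) << (lane * sample_bits)
        words.append(word)
    return words


# Eight 16-bit samples need eight beats on a 16-bit stream,
# but only two beats on a 64-bit stream.
words = pack_samples(list(range(1, 9)))
print([hex(w) for w in words])
```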

Ted Booth | Tech. Lead FPGA Design Engineer | DesignLinx Solutions
https://www.designlinxhs.com