Observer
1,102 Views
Registered: ‎07-23-2013

Increasing FastX throughput


Hi,

 

I have built an FPGA board that takes in video from a custom grayscale camera that can output video at a rate of 300 FPS, and I would like to run FASTX on every frame. Regarding the camera interface, after the raw pixel data is deserialized into an 8-bit wide bus, the pixel frequency is 300MHz. Instead of using an 8-bit wide single-pixel AXI Stream, I've grouped the pixels into an 8-byte wide (64-bit) AXI Stream that runs off of the DDR3 UI clock at 100MHz.

 

I have some questions:

 

1. If I use a 64-bit wide bus to group together 8 pixels, would hls::AXIvideo2Mat correctly convert this into an hls::Mat even though the Mat is typed as 'HLS_8UC1'? I believe that I would need to add a 'stream' directive with an appropriate depth to handle the 64-bit to 8-bit conversion.

 

2. The FASTX core is performing well with a video size of 1024x768 = 786,432 pixels. According to the HLS synthesis report the max latency is 797,218 clock cycles, which is only 10,786 cycles (~0.1 ms at 100MHz) more than one cycle per pixel! Unfortunately, I need the latency/interval to be lower so that I can process data at least 3X faster. Is there a way to pipeline the FASTX function, or to use something like array partitioning, to help FASTX parallelize the task? I understand that it will take more resources, but at the moment my speed limit is roughly 125 FPS.

 

Thanks for any feedback,

 

Dave


Accepted Solutions
Advisor
1,069 Views
Registered: ‎04-26-2015

HLS can automatically optimize things to take one or more clock cycles per pixel on average - as you've found, it's getting pretty close to one pixel per clock cycle with the existing system.

 

Getting it to do the operation in less than one cycle on average (i.e. more than one pixel per cycle) is a significant challenge. You'll have to essentially rewrite the FASTX algorithm from scratch, as none of the built-in functions can do multiple pixels per clock cycle. Doing eight at a time should not be too expensive from a computational point of view, but it's going to require a lot of careful buffering.
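To make the width conversion concrete, here is a minimal plain C++ sketch of splitting one 64-bit stream word into eight 8-bit pixels and packing them back. It only illustrates the unpacking step, not a multi-pixel FASTX. The function names `unpack_pixels` and `pack_pixels` are hypothetical, and the byte order (pixel 0 in the least significant byte) is an assumption that would have to match how the camera interface actually packs the word.

```cpp
#include <array>
#include <cstdint>

// Assumption: pixel 0 occupies bits [7:0] of the word and pixel 7 occupies
// bits [63:56]. Swap the shift direction if the interface packs the other way.
std::array<std::uint8_t, 8> unpack_pixels(std::uint64_t word) {
    std::array<std::uint8_t, 8> px{};
    for (int i = 0; i < 8; ++i) {
        // Shift the wanted byte down to the bottom and mask off the rest.
        px[i] = static_cast<std::uint8_t>((word >> (8 * i)) & 0xFFu);
    }
    return px;
}

// Inverse operation: rebuild the 64-bit word from eight pixels.
std::uint64_t pack_pixels(const std::array<std::uint8_t, 8>& px) {
    std::uint64_t word = 0;
    for (int i = 0; i < 8; ++i) {
        word |= static_cast<std::uint64_t>(px[i]) << (8 * i);
    }
    return word;
}
```

Note that unpacking alone does not raise throughput: if the eight pixels are then fed one per clock into a single-pixel pipeline, the rate is still limited to about one pixel per cycle, which is why the per-pixel processing itself would need to be rewritten.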

 

As a workaround, can you just boost the clock to 250MHz and thereby get a bit over 300 FPS while still doing one pixel per clock cycle? I've done a >350MHz HLS block on a 7-series FPGA before, and I would imagine that UltraScale would have a lot more headroom.
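The clock figures can be checked with a little arithmetic. The sketch below assumes the block starts a new frame only once every 797,218 cycles (i.e. the frame interval equals the reported max latency, with no overlap between frames). The helper names are hypothetical.

```cpp
#include <cstdint>

// Frames per second for a block that needs `cycles_per_frame` clock cycles
// per frame (assumes frame interval == reported latency).
constexpr double frames_per_second(double clock_hz, std::uint64_t cycles_per_frame) {
    return clock_hz / static_cast<double>(cycles_per_frame);
}

// Minimum clock frequency needed to hit `target_fps` under the same assumption.
constexpr double required_clock_hz(double target_fps, std::uint64_t cycles_per_frame) {
    return target_fps * static_cast<double>(cycles_per_frame);
}
```

With 797,218 cycles per frame, this gives roughly 125 FPS at 100MHz and roughly 313 FPS at 250MHz, and about 239MHz is the minimum clock for 300 FPS, so 250MHz leaves a small margin.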

 

 


2 Replies
Observer
1,024 Views
Registered: ‎07-23-2013
Thanks for the reply,

That's what I was apprehensively anticipating.
