01-21-2018 10:28 PM
I am a graduate student, and I have recently been looking for articles on how to implement image processing on an FPGA. I searched an electronic journal for English-language articles and read one or two, but they did not cover FPGA image processing in much depth. Is there any material on this topic that you could share with me? Thank you very much for your reply.
01-21-2018 10:37 PM
01-22-2018 12:21 AM
01-22-2018 02:47 AM
The normal approach to template matching in HLS would be to store your template in block RAM, written over AXI Lite (or read from system RAM by an AXI Master). Then you stream the main image in and compare it to the on-chip template.
If the template is very large then you might have to pull it in from external RAM too. This is not a nice solution, but it's about the best option available.
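To make the idea concrete, here is a rough sketch in plain C++ rather than real HLS code. The names `TemplateStore`, `sad_at` and the 4x4 size `TPL` are made up for illustration, SAD is just one possible comparison, and the array only stands in for block RAM; the AXI interfaces themselves are not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

constexpr int TPL = 4;  // hypothetical template edge length

// Stands in for a block-RAM array that is filled before streaming starts.
struct TemplateStore {
    std::uint8_t mem[TPL * TPL] = {};

    // Copy the template in, as a host write or a one-off read from RAM would.
    void load(const std::uint8_t* src) {
        for (int i = 0; i < TPL * TPL; ++i) mem[i] = src[i];
    }
};

// Sum of absolute differences between the stored template and the image
// window whose top-left corner is (row, col). `image` is row-major.
std::uint32_t sad_at(const TemplateStore& t, const std::uint8_t* image,
                     int width, int height, int row, int col) {
    assert(row >= 0 && col >= 0 && row + TPL <= height && col + TPL <= width);
    std::uint32_t sum = 0;
    for (int r = 0; r < TPL; ++r) {
        for (int c = 0; c < TPL; ++c) {
            int p = image[(row + r) * width + (col + c)];
            int q = t.mem[r * TPL + c];
            sum += static_cast<std::uint32_t>(std::abs(p - q));
        }
    }
    return sum;
}
```

A lower result means a better match, so a window identical to the template scores 0.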
01-22-2018 10:37 PM
01-23-2018 02:35 AM
I've done it that way, but I've actually had more success with a rather simpler approach: just buffer enough lines of the image to do a full comparison. If your template is (for example) 20x20 pixels, you'll have to buffer 20 image lines. Then on each cycle you just apply the matching operation (whatever method you choose to use; NCC, SAD, etc). I used this method because for me it was easier to write and more space efficient (with 8-bit greyscale images it's cheaper to store extra pixels than to store all your in-progress convolution/correlation sums). However, it turns out that HLS is very good at optimizing this operation, and so it also turned out to be both small (in terms of hardware) and fast (I think the claimed top speed was something like 400MHz on a 7-series chip).
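The line-buffer approach could look roughly like the following. This is a plain C++ sketch under my own assumptions, not the code I actually synthesized: the class name `LineBufferMatcher` and its `push` method are hypothetical, SAD is used as the metric, and `std::vector` stands in for what would be fixed-size on-chip arrays in an HLS design.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

class LineBufferMatcher {
public:
    LineBufferMatcher(int width, int k, const std::vector<std::uint8_t>& tpl)
        : width_(width), k_(k), tpl_(tpl),
          lines_(static_cast<std::size_t>(k) * width, 0) {
        assert(k > 0 && width >= k);
        assert(tpl.size() == static_cast<std::size_t>(k) * k);
    }

    // Feed one pixel in raster order. Returns true and sets `sad` once a full
    // k x k window ending at this pixel is available in the buffered lines.
    bool push(std::uint8_t px, std::uint32_t& sad) {
        // The k buffered lines are reused as a ring, indexed by row modulo k.
        lines_[(row_ % k_) * width_ + col_] = px;
        const bool valid = row_ >= k_ - 1 && col_ >= k_ - 1;
        if (valid) {
            sad = 0;
            for (int r = 0; r < k_; ++r) {
                const int img_row = row_ - (k_ - 1) + r;  // absolute image row
                for (int c = 0; c < k_; ++c) {
                    const int img_col = col_ - (k_ - 1) + c;
                    const int p = lines_[(img_row % k_) * width_ + img_col];
                    const int t = tpl_[r * k_ + c];
                    sad += static_cast<std::uint32_t>(std::abs(p - t));
                }
            }
        }
        if (++col_ == width_) { col_ = 0; ++row_; }
        return valid;
    }

private:
    int width_;
    int k_;
    std::vector<std::uint8_t> tpl_;
    std::vector<std::uint8_t> lines_;  // k image lines of 8-bit pixels
    int row_ = 0;
    int col_ = 0;
};
```

The only state carried between pixels is the k lines of 8-bit data; no partial sums are stored, which is where the space saving comes from.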
Using your method, where you get one line of the image at a time, you have to store the in-progress correlation sums. For a sample application (10x10 template, SAD, 8-bit greyscale image) each correlation sum fits in a 16-bit value: each pixel difference is 8-bit minus 8-bit, giving 9-bit signed; taking the absolute value brings that back to 8 bits unsigned, and summing a 10x10 array adds 7 bits, for 15 bits in total. Then you have to store 10 rows of this, although each row can potentially be only 631 entries long (640 - 10 + 1) if you ignore incomplete correlations. 10*631*16-bit is a much more expensive RAM than 10*640*8-bit.
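If you want to redo this arithmetic for other sizes, a small helper along these lines works. The names `bits_for`, `BufferCost` and `buffer_cost` are made up for this sketch; it assumes SAD on unsigned pixels, counts valid positions per row as width - k + 1, and reports the exact minimum sum width before any rounding up to a convenient storage width.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits needed to hold an unsigned value up to `max_value`.
int bits_for(std::uint64_t max_value) {
    int bits = 0;
    while (max_value > 0) {
        ++bits;
        max_value >>= 1;
    }
    return bits;
}

struct BufferCost {
    int sad_bits;            // minimum width of one SAD sum
    long pixel_buffer_bits;  // k full image lines of raw pixels
    long sum_buffer_bits;    // k rows of in-progress sums, valid columns only
};

BufferCost buffer_cost(int width, int k, int pixel_bits) {
    assert(k > 0 && width >= k && pixel_bits > 0 && pixel_bits <= 16);
    // Largest absolute difference between two unsigned pixels.
    const std::uint64_t max_abs_diff = (std::uint64_t{1} << pixel_bits) - 1;
    // Worst case: every one of the k*k differences is at its maximum.
    const std::uint64_t max_sad = max_abs_diff * k * k;

    BufferCost cost;
    cost.sad_bits = bits_for(max_sad);
    cost.pixel_buffer_bits = static_cast<long>(k) * width * pixel_bits;
    cost.sum_buffer_bits = static_cast<long>(k) * (width - k + 1) * cost.sad_bits;
    return cost;
}
```

For a 640-pixel-wide image, a 10x10 template and 8-bit pixels, `buffer_cost(640, 10, 8)` gives 51,200 bits of line buffer against 94,650 bits of partial sums at the 15-bit minimum, or 100,960 bits if each sum is padded to 16 bits.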
01-24-2018 01:07 AM
As you said, my idea of loading only a few lines of image data at a time would come at a greater cost. To make it work, I need to figure out the following:
1. How the data comes in, and how much of the FPGA's resources will be used.
2. Matching the template against a few lines of image data will keep producing convolution results, and these results need to be cached in some kind of buffer.
What I most want to know is how much of the FPGA's resources this would use.