Observer alfredoer
10,676 Views
Registered: ‎06-06-2015

hls::Filter2D not synthesizing DSP blocks

Hi,

 

First post here, so please bear with me in case this has been answered already or I missed it in the documentation. 

 

I am implementing an image detection algorithm that uses a 2D filter operation to extract candidate locations in an image that might match a template. I have tested the C code, run C/RTL cosimulation, and even tested the HLS block in real hardware, and it works as expected. The tools work great in this sense. However, no DSP blocks are being inferred. I would expect the 2D filter operation to infer many DSP blocks, but all the multiplications are being synthesized to FFs and LUTs, which consumes a lot of fabric and lowers the maximum speed at which the algorithm can run.

 

This is how I have my hls::Mat defined:

 

typedef hls::Mat<MAX_IMAGE_SIZE, MAX_IMAGE_SIZE, HLS_16U>     ACCUM_IMAGE;

 

And this is the portion of the code that is relevant to the filter operation:

void image_filter_hw( AXI_STREAM& INPUT_STREAM, uint32_t threshold, volatile uint32_t *count, uint32_t *ID, uint32_t *Rev){

	// Function interface
	#pragma HLS INTERFACE s_axilite port=return bundle=LITE
	// AXI streaming interfaces for input/output
	#pragma HLS INTERFACE axis port=INPUT_STREAM
	// Control inputs
	#pragma HLS INTERFACE s_axilite port=threshold bundle=LITE
	// Outputs
	#pragma HLS INTERFACE s_axilite port=count bundle=LITE
	#pragma HLS INTERFACE s_axilite port=ID bundle=LITE
	#pragma HLS INTERFACE s_axilite port=Rev bundle=LITE
	// Data outputs
	ACCUM_IMAGE src(MAX_IMAGE_SIZE, MAX_IMAGE_SIZE);
	ACCUM_IMAGE edges(MAX_IMAGE_SIZE, MAX_IMAGE_SIZE);
	ACCUM_IMAGE hough_accumulator(MAX_IMAGE_SIZE, MAX_IMAGE_SIZE);

	// Initialize
	AXI_WINDOW c_kern;
	#pragma HLS RESOURCE variable=c_kern core=ROM_1P_BRAM
	uint32_t local_threshold = threshold;
	uint32_t hits = 0;
	#pragma HLS RESOURCE variable=hits core=AddSub_DSP
	init_kernel(&c_kern);
	hls::Point_<int> anchor;
	anchor.x = -1;
	anchor.y = -1;

#pragma HLS dataflow
    hls::AXIvideo2Mat(INPUT_STREAM, src);
    hls::Sobel<1,0,3>(src, edges);
    hls::Filter2D(edges, hough_accumulator, c_kern, anchor);

/* non relevant code omitted */

The Filter2D instance ends up consuming over 109k FFs, 123k LUTs, and 0 DSPs. (It is a big convolution, but I have tried small kernels and still no DSPs are inferred, so kernel size does not seem to be the cause.) I should note that the Sobel filter does not synthesize any DSP48 blocks either.

 

Is there a specific data type that I should use to allow the tools to infer DSPs?

 

I know I can go in and modify the Filter2D operation and add #pragmas, but that would defeat the purpose of using the HLS libraries in the first place. I might be overlooking something in the tools that has happened to someone else, so I welcome ideas and suggestions.

 

Best regards,

Alfredo

3 Replies
Observer alfredoer
9,864 Views
Registered: ‎06-06-2015

Re: hls::Filter2D not synthesizing DSP blocks

After working for a few weeks on the project, I am now able to answer my own question. It is simply a matter of including hls_math.h; multiply/accumulate operations are then properly synthesized to DSP slices (including the OpenCV equivalents). I should note that either way, the synthesized IP block works correctly in real hardware.

Adventurer
4,772 Views
Registered: ‎07-18-2016

Re: hls::Filter2D not synthesizing DSP blocks

Hello,

I am also using the hls::Filter2D function in my algorithm. The algorithm uses dataflow, inline, and pipeline directives. LUT utilization for the whole algorithm is 100%, so I need to reduce it by at least 40 to 50%. Of all the functions, Filter2D uses the most LUTs.

 

Resource utilization for this function is as follows:

            BRAM    DSP48E    FF       LUT
Filter2D    31      41        22048    30361

hls::Filter2D(img_grey2, img_lap, kernel, anchor);

img_lap uses the double data type, and kernel is a 31x31 window.

How can I reduce the number of LUTs used by this function? How can I force it to use DSP48 slices?

I have also attached the synthesis report.

 

Scholar u4223374
4,733 Views
Registered: ‎04-26-2015

Re: hls::Filter2D not synthesizing DSP blocks

@rashmi_ha

 

You should start a new thread for this. No use re-using an ancient thread for a different question. With that said, have you read his conclusion and included hls_math.h?

 

Is there anything you can do to cut down the kernel size? Can it be separated into a 31*1 and a 1*31 kernel, for example? 31*31 is pretty huge. Or can you cut it down to float (32-bit) precision, or even HLS's "half" precision floating point (16-bit)?
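
Whether a kernel can be split this way can be checked in plain C++ before touching the HLS design. The following is only a minimal sketch, assuming the kernel is an outer product of a column vector and a row vector (many kernels are not). The names conv_full and conv_separable, and the small sizes K and N, are made up for illustration and are not part of the HLS video library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sizes for illustration; the kernel in this thread is 31x31.
const int K = 3;  // kernel side
const int N = 6;  // image side

typedef std::vector<std::vector<long> > Image;

// Direct 2D filter over the "valid" region: K*K image multiplies per pixel.
// The 2D coefficient is formed here as col[i] * row[j].
Image conv_full(const Image& img, const long col[K], const long row[K]) {
    assert(img.size() == static_cast<std::size_t>(N));
    const int M = N - K + 1;
    Image out(M, std::vector<long>(M, 0));
    for (int y = 0; y < M; ++y)
        for (int x = 0; x < M; ++x)
            for (int i = 0; i < K; ++i)
                for (int j = 0; j < K; ++j)
                    out[y][x] += img[y + i][x + j] * (col[i] * row[j]);
    return out;
}

// Two 1D passes: a 1xK row filter, then a Kx1 column filter.
// That is 2*K multiplies per pixel instead of K*K.
Image conv_separable(const Image& img, const long col[K], const long row[K]) {
    assert(img.size() == static_cast<std::size_t>(N));
    const int M = N - K + 1;
    Image tmp(N, std::vector<long>(M, 0));
    for (int y = 0; y < N; ++y)
        for (int x = 0; x < M; ++x)
            for (int j = 0; j < K; ++j)
                tmp[y][x] += img[y][x + j] * row[j];
    Image out(M, std::vector<long>(M, 0));
    for (int y = 0; y < M; ++y)
        for (int x = 0; x < M; ++x)
            for (int i = 0; i < K; ++i)
                out[y][x] += tmp[y + i][x] * col[i];
    return out;
}
```

If both functions agree on test images, the kernel is separable in this sense. For a 31x31 window that would mean 62 multiplies per pixel rather than 961, which is the saving the two-kernel suggestion is aiming for.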

 

The other thing to check is whether there are pipeline directives higher in the hierarchy than the Filter2D call. A pipeline directive inlines and unrolls everything within the pipelined region, which tends to do horrible things to resource consumption. I suspect that this isn't actually a problem in your case (because if you'd done that it'd be at 10000% resource consumption, not 100%) but it's still worth checking.
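
To make the placement issue concrete, here is a rough sketch of what "higher in the hierarchy" means. It is not taken from the poster's design: dot, filter_row and TAPS are hypothetical names, and a plain C++ compiler simply ignores the HLS pragma.

```cpp
#include <cassert>
#include <cstdint>

const int TAPS = 4;  // hypothetical tap count, far smaller than a 31x31 window

// Inner multiply-accumulate. With the directive on this loop, each
// iteration issues a single multiply.
int32_t dot(const int16_t* window, const int16_t* coeff) {
    int32_t acc = 0;
    for (int i = 0; i < TAPS; ++i) {
#pragma HLS PIPELINE II=1
        acc += static_cast<int32_t>(window[i]) * coeff[i];
    }
    return acc;
}

void filter_row(const int16_t* in, int32_t* out, int n, const int16_t* coeff) {
    assert(n >= TAPS);
    for (int x = 0; x + TAPS <= n; ++x) {
        // Moving "#pragma HLS PIPELINE" up to this loop would ask the tool
        // to unroll the loop inside dot(), i.e. TAPS multipliers in parallel.
        out[x] = dot(in + x, coeff);
    }
}
```

With a 31x31 window in place of TAPS, the same move up the hierarchy would multiply the number of parallel multipliers accordingly, which is why the placement is worth checking in the synthesis report.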
