Explorer
Registered: 04-23-2014

expf() takes enormous amount of resources

I have been trying to implement a bilateral filter for my video processing pipeline (pixel-rate processing). The filter requires taking the exponential of values from a memory window; the formula is as follows:

F = e^( -(X-Y)^2 / Z^2 ) * P

where X is a pixel in the window, Y is the center pixel, Z^2 (0.02 in the code) controls the photometric falloff, and P is the precomputed spatial Gaussian weight G[row][col].

The function below takes 2-3 times more resources than are available on the chip. Is that reasonable? Is there a "correct" way to implement such a function?

unsigned char gaussianWeights(hls::Window<5,5,unsigned char> *I)
{
	unsigned char out;
	float FI,F;
	float sumF=0;
	float sumFI=0;
	int row, col;

	for(row=0; row<5; row++){
		for(col=0; col<5; col++){
			F = ( expf( -(float)(  (I->getval(row,col) - I->getval(2,2))*(I->getval(row,col) - I->getval(2,2)) )/0.02 ) * G[row][col] );
			FI = F * I->getval(row,col);
			sumF = sumF+F;
			sumFI = sumFI + FI;
		}
	}

	out = (sumFI/sumF)*255;
	return out;
}

Teacher muzaffer
Registered: 03-31-2012

Re: expf() takes enormous amount of resources

@naz_rb which chip are you using and what directives are you adding to the synthesis?

- Please mark the Answer as "Accept as solution" if information provided is helpful.
Give Kudos to a post which you think is helpful and reply oriented.

Explorer
Registered: 04-23-2014

Re: expf() takes enormous amount of resources

@muzaffer I'm using a ZC702 board. I am trying to implement a pixel-stream 5x5 filter with two nested for loops; the inner loop is pipelined with II=1, and the function above is called inside that inner loop to perform the calculation on the window. As the snippet shows, the function itself has no directives; I assume it is automatically inlined to meet pixel-rate processing.

Teacher muzaffer
Registered: 03-31-2012

Re: expf() takes enormous amount of resources

@naz_rb I tried the function you show by itself and its size is quite reasonable (roughly 20% of what you show). I think your outer loops are doing something funny. Can you show the function that calls this code?

Observer martin-91x
Registered: 10-02-2015

Re: expf() takes enormous amount of resources

Just to be sure: you want to calculate the weights for a Gaussian filter with a 5x5 kernel?

Scholar u4223374
Registered: 04-26-2015

Re: expf() takes enormous amount of resources

If you've pipelined the outer loop, that's unrolled both of your inner loops - so it's doing 25 of those calculations per clock cycle, plus a uint8-to-float conversion, plus a bunch of multiplies, plus a floating-point divide.
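To illustrate that point, here is a minimal C++ sketch of the caller structure described above. It is an assumption, not the original code: `filter_frame`, `kernel5x5`, `kWidth` and `kHeight` are hypothetical names, a plain 5x5 array stands in for `hls::Window`, the spatial weight is left out, border pixels are simply copied, and the window is read straight from a frame array where a real design would use line buffers. With `#pragma HLS PIPELINE II=1` on the per-pixel loop, and the kernel inlined (as assumed above), the two 5x5 loops inside `kernel5x5` have to be fully unrolled, so the exponential, the multiplies and the accumulations in their body are built 25 times. A regular C++ compiler ignores the HLS pragmas, so the same file can be used for C simulation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical frame size for the sketch.
const int kWidth = 16;
const int kHeight = 8;

// Photometric-only version of the per-window kernel (spatial weight omitted).
static uint8_t kernel5x5(const uint8_t win[5][5]) {
    float sumF = 0.0f, sumFI = 0.0f;
    for (int r = 0; r < 5; r++) {
        for (int c = 0; c < 5; c++) {
            // When HLS unrolls these loops, each of the 25 iterations gets
            // its own exp, multipliers and adders.
            float d = (win[r][c] - win[2][2]) / 255.0f;
            float F = std::exp(-(d * d) / 0.02f);
            sumF += F;
            sumFI += F * win[r][c];
        }
    }
    assert(sumF >= 1.0f);  // the center tap always has weight 1
    return static_cast<uint8_t>(sumFI / sumF + 0.5f);
}

// One output pixel per loop iteration; borders are copied unchanged.
void filter_frame(const uint8_t in[kHeight][kWidth], uint8_t out[kHeight][kWidth]) {
    for (int y = 0; y < kHeight; y++) {
        for (int x = 0; x < kWidth; x++) {
#pragma HLS PIPELINE II=1
            // Everything below, including the loops in kernel5x5 once it is
            // inlined, is unrolled so a new pixel can start every clock.
            if (y < 2 || y >= kHeight - 2 || x < 2 || x >= kWidth - 2) {
                out[y][x] = in[y][x];
                continue;
            }
            uint8_t win[5][5];
            for (int r = 0; r < 5; r++)
                for (int c = 0; c < 5; c++)
                    win[r][c] = in[y - 2 + r][x - 2 + c];
            out[y][x] = kernel5x5(win);
        }
    }
}
```

For a flat input image the output equals the input (every photometric weight is 1), which makes a handy first C-simulation check.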

 

It appears that the whole exponential calculation only requires a single input value:

I->getval(row,col) - I->getval(2,2)

Since the image is 8-bit, the difference is 9-bit (-255 to 255). You could build a 512-entry lookup table with 36-bit entries for this exponential, which fits in a single 18K block RAM (512 x 36 = 18,432 bits), and you've got loads of those available. Even if you need 25 lookups per cycle (as you do with fully unrolled loops), each block RAM has two read ports, so that's only 13 RAMs.
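A minimal C++ sketch of that lookup table, under a few assumptions: intensities are normalized to 0..1 before dividing by 0.02 (which is what the H[] values posted later in this thread correspond to), the entries are float rather than 36-bit fixed point, and `make_diff_lut` / `photometric_weight` are hypothetical names. The signed difference is offset by 256 so that it becomes a 9-bit, non-negative index; slot 0 (difference -256) is never read.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// 9-bit address: the signed difference d in [-255, 255] is stored at d + 256.
constexpr int kLutSize = 512;
constexpr int kLutOffset = 256;

// Precomputes exp(-(d/255)^2 / 0.02) for every possible pixel difference.
std::array<float, kLutSize> make_diff_lut() {
    std::array<float, kLutSize> lut{};
    for (int i = 0; i < kLutSize; i++) {
        float d = (i - kLutOffset) / 255.0f;  // normalized difference
        lut[i] = std::exp(-(d * d) / 0.02f);
    }
    return lut;
}

// One table read replaces one expf() call.
float photometric_weight(const std::array<float, kLutSize>& lut,
                         uint8_t pixel, uint8_t center) {
    int index = int(pixel) - int(center) + kLutOffset;  // 1..511
    assert(index > 0 && index < kLutSize);
    return lut[index];
}
```

In HLS the table would be a `const` array initialized from generated values so it can be mapped to ROM; whether the tool actually places it in block RAM or in LUTs is something to confirm in the synthesis report.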

 

 

Explorer
Registered: 04-23-2014

Re: expf() takes enormous amount of resources

@martin-91x Yes. The window contains the neighboring pixels of the pixel of interest (just like in any video pipeline). I have my Gaussian coefficients precalculated and stored in the array float G[5][5]. What I calculate with the exponential function are the "photometric" weights.
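A floating-point reference model of this weight computation is useful as a golden model in C simulation before optimizing anything. The sketch below is not the original code: `bilateral_ref` is a hypothetical name, the window is a plain 5x5 array instead of `hls::Window`, and both weights are computed directly from formulas that appear to reproduce the posted G and H tables. Intensities are normalized to 0..1 before dividing by 0.02; with the raw 0..255 differences used in the first snippet, the exponent is already -50 for a difference of 1, so every off-center photometric weight is effectively zero. Note also that `0.02` without an `f` suffix is a double, so the divide in the first snippet is done in double precision; `0.02f` keeps it in float. The result is scaled back to 0..255 once, at the end.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Reference bilateral filter output for one 5x5 window.
// Spatial weight:     exp(-((r-2)^2 + (c-2)^2) / 18)    (appears to match G)
// Photometric weight: exp(-((p - center)/255)^2 / 0.02)  (appears to match H)
uint8_t bilateral_ref(const uint8_t win[5][5]) {
    const float center = win[2][2] / 255.0f;
    float sumF = 0.0f;   // sum of combined weights
    float sumFI = 0.0f;  // weighted sum of normalized intensities
    for (int r = 0; r < 5; r++) {
        for (int c = 0; c < 5; c++) {
            float spatial = std::exp(-float((r - 2) * (r - 2) + (c - 2) * (c - 2)) / 18.0f);
            float p = win[r][c] / 255.0f;
            float photometric = std::exp(-((p - center) * (p - center)) / 0.02f);
            float F = spatial * photometric;
            sumF += F;
            sumFI += F * p;
        }
    }
    assert(sumF >= 1.0f);  // the center tap alone contributes weight 1
    // Back to 0..255 with rounding.
    return static_cast<uint8_t>(std::lround(sumFI / sumF * 255.0f));
}
```

A flat window should come back unchanged, and a window with a sharp edge through it should keep the center pixel's side of the edge, which is the point of the photometric term.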

Explorer
Registered: 04-23-2014

Re: expf() takes enormous amount of resources

@u4223374 I don't quite understand your proposed solution. Could you provide more details, please?

Observer martin-91x
Registered: 10-02-2015

Re: expf() takes enormous amount of resources

Well, you could create a lookup table containing all possible output values of the exponential function. Then you could use this difference

I->getval(row,col) - I->getval(2,2)

as the input (adding an offset so that negative differences become valid array indices).

I think this is what he meant; at least, that's how I would do it :)

Explorer
Registered: 04-23-2014

Re: expf() takes enormous amount of resources

Yep, I actually started doing it this way; I did not realize that there were only 256 different values. At this point, I still wonder whether there is a better implementation that would use far fewer resources if I did want to calculate the values in real time.

Explorer
Registered: 04-23-2014

Re: expf() takes enormous amount of resources

So I created H[256] and G[5][5] arrays of precalculated exponential values for all cases and eliminated the expf() function, but my resource usage is almost as insane as before. Evidently, the problem was not in that function. Should there be specific directives for those arrays? Also, could someone please comment on the content of the arrays? I generated the values externally, and I am not sure whether the format is correct.

 

unsigned char gaussianWeights(hls::Window<5,5,unsigned char> *w)
{
	unsigned char out;
	ap_ufixed<16,8,AP_TRN_ZERO,AP_SAT> Ic;
	float F,FI,If;
	int row,col;
	int indx;
	float sumF=0;
	float sumFI=0;

	for(row=0; row<5; row++){
		for(col=0; col<5; col++){
			indx = abs(w->getval(row,col) - w->getval(2,2));
			F = H[indx] * G[row][col];
			Ic = w->getval(row,col);
			If = (float)(Ic>>8);
			FI = F * If;
			sumF = sumF+F;
			sumFI = sumFI + FI;
		}
	}

	out = (sumFI/sumF)*255;
	return out;
}

const float G[5][5] =	{	{ 0.6411804, 0.7574651, 0.8007374, 0.7574651, 0.6411804},
				{ 0.7574651, 0.8948393, 0.9459595, 0.8948393, 0.7574651},
				{ 0.8007374, 0.9459595, 1.0000000, 0.9459595, 0.8007374},
				{ 0.7574651, 0.8948393, 0.9459595, 0.8948393, 0.7574651},
				{ 0.6411804, 0.7574651, 0.8007374, 0.7574651, 0.6411804}};
const float H[256] = {1,0.99923,0.99693,0.9931,0.98777,0.98096,0.9727,0.96302,0.95198,0.93962,0.92599,0.91116,0.89518,0.87814,
		0.8601,0.84113,0.82132,0.80074,0.77947,0.75761,0.73523,0.71241,0.68924,0.6658,0.64217,0.61842,0.59464,
		0.57089,0.54725,0.52378,0.50055,0.47762,0.45503,0.43285,0.41111,0.38987,0.36915,0.349,0.32945,0.31051,
		0.29221,0.27456,0.25759,0.24129,0.22568,0.21075,0.19651,0.18294,0.17006,0.15783,0.14626,0.13534,0.12503,
		0.11533,0.10622,0.097683,0.089691,0.082227,0.075268,0.068792,0.062777,0.0572,0.052038,0.047269,0.042871,
		0.038823,0.035103,0.03169,0.028565,0.025709,0.023103,0.020729,0.018571,0.016612,0.014836,0.01323,0.01178,
		0.010472,0.0092957,0.0082386,0.0072905,0.0064416,0.0056828,0.0050056,0.0044024,0.0038659,0.0033896,
		0.0029674,0.0025938,0.0022637,0.0019727,0.0017164,0.0014911,0.0012934,0.0011201,0.00096862,0.00083632,
		0.00072097,0.00062058,0.00053335,0.00045768,0.00039213,0.00033546,0.00028654,0.00024438,0.0002081,
		0.00017693,0.0001502,0.00012731,0.00010775,9.1049e-05,7.682e-05,6.4715e-05,5.4433e-05,4.5715e-05,
		3.8334e-05,3.2096e-05,2.6831e-05,2.2395e-05,1.8664e-05,1.5531e-05,1.2904e-05,1.0705e-05,8.8666e-06,
		7.3329e-06,6.0551e-06,4.9923e-06,4.1097e-06,3.378e-06,2.7723e-06,2.2717e-06,1.8586e-06,1.5183e-06,
		1.2384e-06,1.0085e-06,8.201e-07,6.6584e-07,5.3976e-07,4.3688e-07,3.5307e-07,2.849e-07,2.2954e-07,1.8465e-07,
		1.4831e-07,1.1894e-07,9.5241e-08,7.6146e-08,6.0785e-08,4.8449e-08,3.8557e-08,3.0638e-08,2.4307e-08,1.9255e-08,
		1.523e-08,1.2028e-08,9.484e-09,7.4668e-09,5.8696e-09,4.607e-09,3.6104e-09,2.8251e-09,2.2071e-09,1.7217e-09,
		1.341e-09,1.0429e-09,8.0978e-10,6.2782e-10,4.8599e-10,3.7563e-10,2.8988e-10,2.2336e-10,1.7184e-10,1.3201e-10,
		1.0125e-10,7.7536e-11,5.9287e-11,4.5263e-11,3.4503e-11,2.6261e-11,1.9957e-11,1.5143e-11,1.1472e-11,8.6782e-12,
		6.5545e-12,4.9429e-12,3.7218e-12,2.7981e-12,2.1004e-12,1.5743e-12,1.1781e-12,8.8026e-13,6.5672e-13,4.8919e-13,
		3.6384e-13,2.7019e-13,2.0034e-13,1.4832e-13,1.0964e-13,8.0919e-14,5.9631e-14,4.3876e-14,3.2234e-14,2.3645e-14,
		1.7318e-14,1.2664e-14,9.2468e-15,6.7413e-15,4.9071e-15,3.5665e-15,2.5881e-15,1.8752e-15,1.3567e-15,9.7996e-16,
		7.0678e-16,5.0897e-16,3.6595e-16,2.6272e-16,1.8832e-16,1.3478e-16,9.6316e-17,6.8722e-17,4.8959e-17,3.4825e-17,
		2.4734e-17,1.7539e-17,1.2419e-17,8.7794e-18,6.1971e-18,4.3676e-18,3.0735e-18,2.1595e-18,1.515e-18,1.0612e-18,
		7.4217e-19,5.1826e-19,3.6135e-19,2.5156e-19,1.7486e-19,1.2136e-19,8.4095e-20,5.8185e-20,4.0196e-20,2.7726e-20,
		1.9095e-20,1.3131e-20,9.0157e-21,6.1806e-21,4.2305e-21,2.8913e-21,1.973e-21,1.3443e-21,9.1449e-22,6.2116e-22,
		4.2127e-22,2.8527e-22,1.9287e-22};
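On the question about the array contents: the posted values appear to match H[d] = exp(-(d/255)^2 / 0.02) for d = 0..255 (for example H[1] = 0.99923 and H[255] = exp(-50) = 1.9287e-22) and G[r][c] = exp(-((r-2)^2 + (c-2)^2) / 18), i.e. a spatial Gaussian with sigma = 3. That is a reconstruction from the numbers, not something stated in the thread. A small host-side generator along these lines (hypothetical names `h_entry`, `g_entry`, `print_tables`) would reproduce both tables in the same `const float` initializer format, so the C code and the formula stay in sync.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Photometric weight for an absolute 8-bit difference d (0..255).
double h_entry(int d) {
    assert(d >= 0 && d <= 255);
    double x = d / 255.0;  // normalized difference
    return std::exp(-(x * x) / 0.02);
}

// Spatial weight for window position (r, c); the center (2, 2) gets 1.0.
double g_entry(int r, int c) {
    assert(r >= 0 && r < 5 && c >= 0 && c < 5);
    int dist2 = (r - 2) * (r - 2) + (c - 2) * (c - 2);
    return std::exp(-dist2 / 18.0);  // 2 * sigma^2 = 18, i.e. sigma = 3
}

// Writes both tables as C initializers, ready to paste into the HLS source.
void print_tables(std::FILE* out) {
    std::fprintf(out, "const float G[5][5] = {\n");
    for (int r = 0; r < 5; r++) {
        std::fprintf(out, "\t{");
        for (int c = 0; c < 5; c++)
            std::fprintf(out, " %.7f%s", g_entry(r, c), c < 4 ? "," : "");
        std::fprintf(out, "}%s\n", r < 4 ? "," : "");
    }
    std::fprintf(out, "};\n");
    std::fprintf(out, "const float H[256] = {");
    for (int d = 0; d < 256; d++)
        std::fprintf(out, "%s%.5g", d ? "," : "", h_entry(d));
    std::fprintf(out, "};\n");
}
```

Calling `print_tables(stdout);` from a small host program and redirecting the output into a header is one way to regenerate both tables whenever Z or sigma changes.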

 

Teacher muzaffer
Registered: 03-31-2012

Re: expf() takes enormous amount of resources

@naz_rb your problem was already diagnosed by @u4223374: "If you've pipelined the outer loop, that's unrolled both of your inner loops - so it's doing 25 of those calculations per clock cycle, plus a uint8-to-float conversion, plus a bunch of multiplies, plus a floating-point divide."

 

It might help if you partition the H and G arrays fully. You need to simplify the gaussianWeights function as it looks in its fully unrolled form.
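A sketch of what that could look like, assuming Vivado HLS-style pragmas (`ARRAY_PARTITION ... complete`, `UNROLL`), a plain 5x5 array instead of `hls::Window`, and a hypothetical function name `gaussian_weights_lut`. For C simulation the H and G tables are filled by a hypothetical `init_tables()`; the HLS source would use the const initializers posted above instead, and the exact pragma placement for global arrays may need adjusting for your tool version. The pixel is kept as an integer 0..255, so the `Ic>>8` conversion and the final `*255` go away, and the unrolled body is 25 table reads, the multiplies and sums, and a single divide. This is only a restructuring sketch; whether fully partitioning a 256-entry H beats several ROM copies is something the synthesis report has to settle.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Stand-ins for the thread's const H[256] and G[5][5]. For C simulation they
// are filled by init_tables(); the HLS source would use const initializers.
static float H[256];
static float G[5][5];

void init_tables() {
    for (int d = 0; d < 256; d++) {
        float x = d / 255.0f;
        H[d] = std::exp(-(x * x) / 0.02f);
    }
    for (int r = 0; r < 5; r++)
        for (int c = 0; c < 5; c++)
            G[r][c] = std::exp(-float((r - 2) * (r - 2) + (c - 2) * (c - 2)) / 18.0f);
}

// Same math as gaussianWeights(), written with its fully unrolled form in mind.
uint8_t gaussian_weights_lut(const uint8_t win[5][5]) {
#pragma HLS ARRAY_PARTITION variable=win complete dim=0
#pragma HLS ARRAY_PARTITION variable=G complete dim=0
#pragma HLS ARRAY_PARTITION variable=H complete dim=1
    float sumF = 0.0f, sumFI = 0.0f;
    for (int r = 0; r < 5; r++) {
#pragma HLS UNROLL
        for (int c = 0; c < 5; c++) {
#pragma HLS UNROLL
            int d = std::abs(int(win[r][c]) - int(win[2][2]));  // 0..255
            float F = H[d] * G[r][c];
            sumF += F;
            sumFI += F * win[r][c];  // pixel kept in 0..255: no >>8, no *255
        }
    }
    assert(sumF >= 1.0f);  // center tap: H[0] * G[2][2] = 1
    return static_cast<uint8_t>(sumFI / sumF + 0.5f);  // the only divide
}
```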

Observer martin-91x
Registered: 10-02-2015

Re: expf() takes enormous amount of resources

Following up on @muzaffer's post: I don't know what precision you need, but you could switch to integer math for your calculations and use float only at the end. One possibility is to map your arrays to a specific integer range, e.g. H to 16-bit integers (1 => 65535, ..., 1/65535 = 1.526e-5 => 1).
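A sketch of that integer-only variant, under assumptions: H is quantized to 16 bits with 1.0 -> 65535 (entries below 1/65535 round to 0), G to 8 bits with 1.0 -> 255 (an arbitrary choice for the sketch), the accumulators are 64-bit, and the single division is a rounded integer divide. `Hq`, `Gq`, `init_int_tables` and `bilateral_int` are hypothetical names, and the tables are computed at start-up for C simulation only. Because the output is an 8-bit pixel, even the final step can stay integer here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>

static uint16_t Hq[256];   // photometric weight, 1.0 -> 65535
static uint8_t  Gq[5][5];  // spatial weight,     1.0 -> 255

void init_int_tables() {
    for (int d = 0; d < 256; d++) {
        double x = d / 255.0;
        Hq[d] = static_cast<uint16_t>(std::lround(std::exp(-(x * x) / 0.02) * 65535.0));
    }
    for (int r = 0; r < 5; r++)
        for (int c = 0; c < 5; c++) {
            int dist2 = (r - 2) * (r - 2) + (c - 2) * (c - 2);
            Gq[r][c] = static_cast<uint8_t>(std::lround(std::exp(-dist2 / 18.0) * 255.0));
        }
}

uint8_t bilateral_int(const uint8_t win[5][5]) {
    uint64_t sumF = 0;   // at most 25 * 65535 * 255, under 2^29
    uint64_t sumFI = 0;  // at most sumF * 255, under 2^37
    for (int r = 0; r < 5; r++)
        for (int c = 0; c < 5; c++) {
            int d = std::abs(int(win[r][c]) - int(win[2][2]));
            uint32_t F = uint32_t(Hq[d]) * Gq[r][c];  // 16 x 8 -> 24 bits
            sumF += F;
            sumFI += uint64_t(F) * win[r][c];
        }
    assert(sumF > 0);  // the center tap contributes 65535 * 255
    // Rounded integer division; the result is already in 0..255.
    return static_cast<uint8_t>((sumFI + sumF / 2) / sumF);
}
```

In HLS the accumulators would more likely be sized exactly (e.g. `ap_uint<29>` and `ap_uint<37>` from the comments above) rather than 64-bit; the widths follow from the worst-case bounds noted in the code.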
