Visitor dishlamian
Registered: ‎12-03-2019

Inferring Floating-point Fused Multiply and Add in HLS

I spent the bulk of the past few days scouring Xilinx's documentation and forums and trying several different coding patterns, but could not get the HLS compiler to infer something as simple as an FP32 fused multiply-add (FMA). Is there a specific coding style that needs to be followed to get this? For example, how can the following basic example be modified to infer an FMA that uses 2 DSPs on Zynq UltraScale+, rather than the 5 DSPs (3 for the mul and 2 for the add) I get because the add and multiply are not being fused? The 2-DSP figure is based on the numbers given at https://www.xilinx.com/support/documentation/ip_documentation/ru/floating-point.html and was also confirmed by instantiating the floating-point IP from the IP catalogue.

float test(float a, float b, float c)
{
	float d = 0;
	d = a * b + c;
	return d;
}

I also tried the built-in fma function as follows, but the DSP usage still stayed at 5.

float test(float a, float b, float c)
{
	float d = 0;
	d = hls::fma(a, b, c);
	return d;
}
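
For reference, here is a minimal host-side sketch of what "fused" changes numerically. It is plain standard C++ and says nothing about what the HLS compiler infers. The helper names (mul_then_add, fused_mul_add, self_check) are hypothetical, and the example assumes IEEE 754 binary32 floats with round-to-nearest-even.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: multiply, round the product to float, then add.
float mul_then_add(float a, float b, float c)
{
	// volatile forces the product to be rounded and stored as a float,
	// so the compiler cannot contract the two operations into one FMA.
	volatile float p = a * b;
	return p + c;
}

// Hypothetical helper: std::fma computes a * b + c with a single rounding.
float fused_mul_add(float a, float b, float c)
{
	return std::fma(a, b, c);
}

// a * a = 1 + 2^-11 + 2^-24 needs 25 significand bits, so the separate
// multiply rounds away the 2^-24 term before the add can see it.
void self_check()
{
	const float a = 1.0f + std::ldexp(1.0f, -12);     // 1 + 2^-12
	const float c = -(1.0f + std::ldexp(1.0f, -11));  // -(1 + 2^-11)
	assert(mul_then_add(a, a, c) == 0.0f);
	assert(fused_mul_add(a, a, c) == std::ldexp(1.0f, -24));
}
```

So the two versions of test() above are not guaranteed to return bit-identical results, independent of how many DSPs each one uses.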

I also placed and routed the above code as an IP, hoping that the mapper might be smart enough to fuse the operations, but it didn't. I also checked the RESOURCE directive in the documentation, but there is no option to override resource usage for FMA, only for add, mul and other operations. I also tried the workaround mentioned here, but although it seemed to work for integers, it didn't work for floats.

I am using Vivado HLS v2019.2 and my target is the ZCU102 board.

Teacher muzaffer
Registered: ‎03-31-2012

Re: Inferring Floating-point Fused Multiply and Add in HLS

Have you also compared LUTs and DFFs? The xczu9 single-precision FMA has this report for LUT/DFF/DSP usage:

LUT   DFF    DSP
704   1076   2

Maybe it's possible to trade off LUTs/DFFs against DSPs?

Another option is to limit the use of DSP resources for this module and see if it helps.

Visitor dishlamian
Registered: ‎12-03-2019

Re: Inferring Floating-point Fused Multiply and Add in HLS

@muzaffer Actually, I did that recently by generating the IPs in Vivado 2019.2. For a fixed total number of DSPs (e.g. 2), the FMA core uses more logic and FFs than the combination of FMUL and FADD (e.g. the 2-DSP variation of FMUL plus the 0-DSP variation of FADD), which is very counterintuitive. Hence, even if it were possible to infer FMA using HLS, there would be no point in doing so from an area-utilization point of view. I am now wondering what the point of having the FMA core is in the first place, since the peak operating frequencies of the cores are also hardly different. The 4-DSP variation of the FMA core uses even more logic and FFs than the 2-DSP version, which is even more counterintuitive.

xilinx.jpg
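
For anyone who wants to try the FMUL + FADD split from HLS, the following is a minimal sketch. It assumes the RESOURCE pragma and the floating-point core names listed in UG902 for Vivado HLS (FMul_fulldsp and FAddSub_nodsp). Mapping those names to the 2-DSP multiplier and 0-DSP adder on UltraScale+ is an assumption that should be checked against the synthesis report. The function name test_split is hypothetical.

```cpp
// Sketch only: the product and the sum get separate variables so that
// each operation can be bound to its own core.
float test_split(float a, float b, float c)
{
	float p = a * b;
	// Assumed core name for a DSP-based multiplier (verify the DSP count).
#pragma HLS RESOURCE variable=p core=FMul_fulldsp
	float d = p + c;
	// Assumed core name for an adder built from logic only (no DSPs).
#pragma HLS RESOURCE variable=d core=FAddSub_nodsp
	return d;
}
```

A regular C++ compiler ignores the pragmas, so the same function can still be used in a C testbench. The result is rounded twice (once after the multiply, once after the add), so it is not numerically a fused operation.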

P.S. For future reference, I am pretty positive that Xilinx HLS is, at this point in time, incapable of inferring FMA blocks, not that there is any point in doing so considering the above results.
