Explorer
3,036 Views
Registered: ‎03-22-2017

DSPs are not used when they should be

I am using Vivado 2017.1 and running the simple HLS example described in XAPP599 (https://www.xilinx.com/support/documentation/application_notes/xapp599-floating-point-vivado-hls.pdf, page 7).

 

void core (float *r, float a, float b, float c, float d)
{
	*r = a + b + c + d;
}

 

According to the application note (written for Vivado 2012.x), the generated RTL should use DSPs. See the screenshot from the application note PDF:

 

reference.png

 

But, in my case, the generated RTL implementation does not use any DSP. See my Vivado HLS report:

implementation.png

 

Does anybody know the reason for that?

 

Thank you

 


Accepted Solutions
Scholar jprice
5,044 Views
Registered: ‎01-28-2014

Re: DSPs are not used when they should be

Bit of a preface: this kind of question is more related to HDL synthesis than to HLS. The synthesis engine is ultimately what decides which operations are mapped to which resources. HLS generates HDL that the Vivado synthesis engine can consume. You can apply directives to force operations into DSP48s (these are implemented as attributes in the HDL). Forcing the synthesis engine down a particular route this way is possible but often ill-advised.

 

In a modern FPGA an add isn't very expensive to implement in logic. More importantly, in most cases an adder is much faster when implemented in LUTs than in a DSP48 when the add has to complete in a single cycle. This is because the time it takes to route signals into a DSP48 column, through the DSP48 primitive, and back out into general fabric is rather large, much larger than a single 2ns clock cycle. When you lowered the clock speed, the tool realized that this adder could now be implemented in a DSP48 and chose to do so. In most cases completing an add in a single cycle is desirable for performance (ignoring very large adders).

It is possible to register the results of adders (using the latency option of the resource directive; make sure you pick the pipelined adder for the resource itself). However, this increases the latency of the add operation. In many instances this is fine, but in others the extra latency is unacceptable and so it's better to use LUTs. In general, the adders in DSP48s are meant for the Multiply and Accumulate (MAC) operations that are extremely common in digital signal processing (thus the name DSP48).
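As a rough illustration of the resource directive mentioned above, here is a minimal sketch in HLS C. The function names `core_dsp` and `add_pipelined` are hypothetical, and the core names `FAddSub_fulldsp` and `AddSubnS` are assumed from the core list in the Vivado HLS User Guide (UG902), so check them against the guide for your tool version. Whether the requested binding still meets timing at a 2ns clock is not guaranteed.

```c
/* Sketch only: the RESOURCE pragmas are honored by Vivado HLS.
 * An ordinary C compiler ignores them (it may warn about an unknown
 * pragma), so the functions still compute plain sums. */

/* Hypothetical variant of core() from the question. Each intermediate
 * sum gets its own variable so a core can be requested for it.
 * Core name assumed from UG902: DSP48-based floating-point adder. */
void core_dsp(float *r, float a, float b, float c, float d)
{
    float ab, abc, abcd;
#pragma HLS RESOURCE variable=ab core=FAddSub_fulldsp
#pragma HLS RESOURCE variable=abc core=FAddSub_fulldsp
#pragma HLS RESOURCE variable=abcd core=FAddSub_fulldsp

    ab   = a + b;      /* same evaluation order as a + b + c + d */
    abc  = ab + c;
    abcd = abc + d;
    *r   = abcd;
}

/* Hypothetical integer example: request a pipelined adder and give it
 * two cycles, trading latency for easier timing closure.
 * Core name assumed from UG902: AddSubnS (N-stage pipelined adder). */
void add_pipelined(int *r, int a, int b)
{
    int sum;
#pragma HLS RESOURCE variable=sum core=AddSubnS latency=2

    sum = a + b;
    *r  = sum;
}
```

Comparing the HLS resource report with and without the pragmas, and at both 2ns and 10ns target periods, should show whether the extra latency is worth the LUTs saved in a given design.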

3 Replies
Explorer
3,021 Views
Registered: ‎03-22-2017

Re: DSPs are not used when they should be

I think I found the issue.

 

I am targeting a Zynq UltraScale+ and my target clock period is 2ns. With that constraint, no DSPs are used. At a lower frequency, i.e. a longer clock period (for example 10ns), DSPs are used.

 

Can you guys elaborate a little more on when and why to use DSPs rather than LUTs?

 

Thank you

Xilinx Employee
2,976 Views
Registered: ‎08-01-2008

Re: DSPs are not used when they should be

You can use the Vivado synthesis attribute USE_DSP48 to tell the synthesis tool whether or not to infer DSP48s, for example for integer multipliers.

 

USE_DSP48 Verilog Example
(* use_dsp48 = "yes" *) module test(clk, in1, in2, out1);

USE_DSP48 VHDL Example
attribute use_dsp48 : string;
attribute use_dsp48 of P_reg : signal is "no";

USE_DSP48 XDC Example
#apply it on an instantiation of a module
set_property USE_DSP48 NO [get_cells u_mult]

#apply it on a register inside an instance of a module
set_property USE_DSP48 YES [get_cells u_mult/p_i]

#apply it on all instantiations of a module
set_property USE_DSP48 NO [get_cells -hierarchical -filter { REF_NAME =~  "mult" } ]
Thanks and Regards
Balkrishan