Visitor brugger
8,794 Views
Registered: ‎08-21-2013

HLS: Efficiently apply RESOURCE constraints to expressions.

Hello,

 

I have code with many floating-point additions and multiplications and want to reduce DSP resource allocation. I read in the documentation how this can be done with set_directive_resource. It seems to me that this directive is limited to one C++ operator. I have the following code; assume everything is float:

  

float out = input1 * input2 + input3 * input4 + input5 * input6 + input7 * input8;

In the standard setting 3 DSPs are used for multiplications (FMul_maxdsp) and 2 DSPs for addition (FAddSub_fulldsp). When I simply do

  

float out = input1 * input2 + input3 * input4 + input5 * input6 + input7 * input8;
#pragma HLS RESOURCE variable=out core=FMul_nodsp
#pragma HLS RESOURCE variable=out core=FAddSub_nodsp

only one of the multiplication instances is actually implemented with FMul_nodsp. The remaining three still use FMul_maxdsp. A warning is also issued that the second directive could not be applied. Is there a way to reduce the DSPs for the whole expression?

 

The only solution I am currently aware of would be:

 

float subexpr1 = input1 * input2;
#pragma HLS RESOURCE variable=subexpr1 core=FMul_nodsp
float subexpr2 = input3 * input4;
#pragma HLS RESOURCE variable=subexpr2 core=FMul_nodsp
float subexpr3 = input5 * input6;
#pragma HLS RESOURCE variable=subexpr3 core=FMul_nodsp
float subexpr4 = input7 * input8;
#pragma HLS RESOURCE variable=subexpr4 core=FMul_nodsp

float subout1 = subexpr1 + subexpr2;
#pragma HLS RESOURCE variable=subout1 core=FAddSub_nodsp
float subout2 = subout1 + subexpr3;
#pragma HLS RESOURCE variable=subout2 core=FAddSub_nodsp
float out = subout2 + subexpr4;
#pragma HLS RESOURCE variable=out core=FAddSub_nodsp

 

I currently have expressions for differential equations with up to 15 operators and many brackets. I don't want to imagine what they would look like after doing this.

 

Is there a shorter way of doing this? Or is there a global setting for binding operators to functional units? The last version looks just horrible.

 

Regards

Christian

 

PS: For me HLS reports that an FMul_maxdsp core uses 3 DSPs, 143 FFs and 321 LUTs, while an FMul_fulldsp core uses only 2 DSPs, 144 FFs and 297 LUTs. If that is true, why would anyone prefer the standard setting FMul_maxdsp over FMul_fulldsp?

Moderator
8,781 Views
Registered: ‎04-17-2011

Re: HLS: Efficiently apply RESOURCE constraints to expressions.

There are a couple of things you can try:

1. set_directive_resource

 

For example: given Result=A*B in function foo, to specify that the multiplication is implemented with a two-stage, pipelined multiplier core, Mul2S:


set_directive_resource -core Mul2S foo Result


If the variable is used with multiple operators, the code must be modified to ensure there is a single variable for each operator. For example:


Result = (A*B) + C;


Should be changed to:


Result_tmp = (A*B);
Result = Result_tmp + C;

The functional cores are listed at page 208: http://www.xilinx.com/support/documentation/sw_manuals/xilinx2013_2/ug902-vivado-high-level-synthesis.pdf

 

2. config_bind : To control binding of operators.

Example: config_bind -min_op add

 

The -min_op option to the config_bind command instructs High-Level Synthesis to create a design with the minimum number of the specified operators.

 

3. set_directive_allocation

set_directive_allocation -limit 256 -type operation foo mul

 

For example, if a design called foo has 317 multiplications but the FPGA has only 256 multipliers, the allocation command above directs High-Level Synthesis to create a design schedule with a maximum of 256 multiplication (mul) operators.
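
As a rough illustration of option 3, the same limit can be written as a pragma inside the function instead of a Tcl directive. This is only a sketch: the function body, the array length kLen and the limit of 2 are made up for the example, and the pragma spelling follows the Vivado HLS form (instances, limit, then the type), which should be checked against the user guide for the tool version in use.

```cpp
// Hypothetical array length, chosen only for this illustration.
const int kLen = 8;

// Element-wise products. Fully unrolled, the loop contains kLen
// multiplications; the ALLOCATION pragma asks the scheduler to share
// at most 2 multiplier instances among them.
void foo(const int a[kLen], const int b[kLen], int out[kLen]) {
    // Assumed pragma equivalent of:
    //   set_directive_allocation -limit 2 -type operation foo mul
#pragma HLS ALLOCATION instances=mul limit=2 operation
    for (int i = 0; i < kLen; ++i) {
#pragma HLS UNROLL
        out[i] = a[i] * b[i];
    }
}
```

A standard C++ compiler ignores the HLS pragmas, so the function can still be exercised in C simulation. Note that a limit like this trades throughput for area, which is why it does not help a fully pipelined design where every multiplier is busy in every cycle.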

Regards,
Debraj
----------------------------------------------------------------------------------------------
Kindly note- Please mark the Answer as "Accept as solution" if information provided is helpful.

Give Kudos to a post which you think is helpful and reply oriented.
----------------------------------------------------------------------------------------------
Visitor brugger
8,776 Views
Registered: ‎08-21-2013

Re: HLS: Efficiently apply RESOURCE constraints to expressions.

Dear debrajr,

thank you for your quick reply.

1. set_directive_resource
If I understand you correctly, you suggest that I ensure there is a single variable for each operator. This is basically what I did in my last version. Or is there anything different?

2. config_bind and 3. set_directive_allocation
I see how those directives can help to reduce the number of multipliers and therefore also the number of DSPs. In my case this goes in the wrong direction; let me clarify. I want a fully pipelined design to get the most power-efficient architecture. All the multipliers will be fully utilized in every clock cycle and none can be shared. So neither directive helps, because the number of multipliers is fixed and cannot be reduced.

What I want is to reduce the number of DSP slices for the whole function. Multipliers, for example, can be implemented with different cores on the FPGA. There are cores that take many DSPs and few LUTs, and others that take no DSPs and many LUTs. In my design I have a shortage of DSPs and still plenty of LUTs. That is why I want to trade DSPs for LUTs by selecting a different type of floating-point core (FMul_fulldsp instead of FMul_maxdsp) for the same operation (floating-point multiplication).
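
To put numbers on this trade, here is a small back-of-the-envelope sketch for the four-multiply, three-add expression from my first post. It assumes the DSP figures quoted earlier are per core instance (3 for FMul_maxdsp, 2 for FMul_fulldsp, 2 for FAddSub_fulldsp) and that the _nodsp cores use no DSP slices at all. The constant and function names are made up for the illustration, and LUT costs are left out because I have no figures for the _nodsp cores.

```cpp
// DSP slices per core instance, as reported in this thread.
constexpr int kDspFMulMaxDsp = 3;      // FMul_maxdsp
constexpr int kDspFMulFullDsp = 2;     // FMul_fulldsp
constexpr int kDspFAddSubFullDsp = 2;  // FAddSub_fulldsp
constexpr int kDspNoDsp = 0;           // assumption: *_nodsp cores use no DSPs

// Total DSP slices when every operator gets its own core instance
// (fully pipelined, no sharing).
constexpr int total_dsps(int n_mul, int dsp_per_mul,
                         int n_add, int dsp_per_add) {
    return n_mul * dsp_per_mul + n_add * dsp_per_add;
}

// The expression has 4 multiplications and 3 additions.
constexpr int kDefaultBinding =
    total_dsps(4, kDspFMulMaxDsp, 3, kDspFAddSubFullDsp);   // 4*3 + 3*2
constexpr int kFullDspMultipliers =
    total_dsps(4, kDspFMulFullDsp, 3, kDspFAddSubFullDsp);  // 4*2 + 3*2
constexpr int kNoDspEverywhere =
    total_dsps(4, kDspNoDsp, 3, kDspNoDsp);                 // 0

static_assert(kDefaultBinding == 18, "default binding");
static_assert(kFullDspMultipliers == 14, "FMul_fulldsp multipliers");
static_assert(kNoDspEverywhere == 0, "all operators on nodsp cores");
```

Under these assumptions, one such expression goes from 18 DSPs down to 14 with FMul_fulldsp, or to none if every operator is bound to a _nodsp core.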

BTW: It would be a great feature to be able to specify the maximum number of DSP resources for a function, much like set_directive_allocation does for operations, but for FPGA resources instead. HLS would then figure out where to use cores with fewer DSPs so that frequency and power are affected the least. I assume FMul_nodsp will be slower and draw more power than FMul_maxdsp, which uses more specialized hard cores. So the tool should replace the multiplications that have the lowest utilization and are not on the critical path.

Regards,
Christian

Xilinx Employee
8,733 Views
Registered: ‎03-24-2010

Re: HLS: Efficiently apply RESOURCE constraints to expressions.

Hi brugger,

I'm afraid that you need to do as you have already tried: modify your code to ensure there is only a single variable for each operator. See the following explanation from the HLS User Guide:

 

The resource directive uses the assigned variable as the target for the resource. Given code Result=A*B in function foo, this example specifies that the multiplication be implemented with a two-stage, pipelined multiplier core, Mul2S:

set_directive_resource -core Mul2S foo Result

If the variable is used with multiple operators, the code must be modified to ensure there is a single variable for each operator. For example:

Result = (A*B) + C;

Should be changed to:

Result_tmp = (A*B);
Result = Result_tmp + C;

The directive is then specified on Result_tmp to control the multiplier resource, or on Result to control the adder resource.
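
One way to keep long expressions readable under the single-variable-per-operator rule might be to hide each operator in a small wrapper function, so that the temporary variable and its pragma are written only once. The following is only a sketch: mul_nodsp, add_nodsp and sum_of_products are hypothetical names, not part of any Xilinx library, and it assumes that the RESOURCE binding is kept when the wrapper is inlined, which should be confirmed in the synthesis report.

```cpp
// Hypothetical helper: one multiplication, one variable, one binding.
static float mul_nodsp(float a, float b) {
#pragma HLS INLINE
    float product = a * b;
#pragma HLS RESOURCE variable=product core=FMul_nodsp
    return product;
}

// Hypothetical helper: one addition, one variable, one binding.
static float add_nodsp(float a, float b) {
#pragma HLS INLINE
    float sum = a + b;
#pragma HLS RESOURCE variable=sum core=FAddSub_nodsp
    return sum;
}

// Same expression as in the original post, evaluated left to right:
// input1*input2 + input3*input4 + input5*input6 + input7*input8
float sum_of_products(float input1, float input2, float input3,
                      float input4, float input5, float input6,
                      float input7, float input8) {
    return add_nodsp(
        add_nodsp(
            add_nodsp(mul_nodsp(input1, input2), mul_nodsp(input3, input4)),
            mul_nodsp(input5, input6)),
        mul_nodsp(input7, input8));
}
```

A standard C++ compiler ignores the HLS pragmas, so the result can be compared against the plain expression in C simulation before checking the bound cores in the report.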

Regards,
brucey