kelemixx
Contributor
Registered: ‎11-21-2014

WARNING: Estimated clock period exceeds the target

Hi!

 

I'm trying to synthesize a very small project using Vivado HLS. All the code is in one file, "fmmod.c":

 

#include <ap_cint.h>

void fmmod(int12 x, int12 *y) {

	static const int12 sin_lut[4096] = {
#include "sin_lut.dat"
	};

	static uint24 acc;
	uint12 pha;

	acc += x * 717;
	pha = acc >> 12;
	pha = pha + 1024;

	*y = sin_lut[pha];

}

 

The target device is xc7a200tfbg484-2, and the clock is 10 ns with 1.25 ns uncertainty. The Vivado HLS version is 2016.2. After clicking the C Synthesis button I got some warnings in the console:

 

WARNING: [SCHED 204-21] Estimated clock period (10.3ns) exceeds the target (target clock period: 10ns, clock uncertainty: 1.25ns, effective delay budget: 8.75ns).
WARNING: [SCHED 204-21] The critical path consists of the following:
    wire read on port 'x' (0 ns)
    'mul' operation ('tmp_1', test1/fmmmod.c:12) (2.82 ns)
    'add' operation ('tmp_2', test1/fmmmod.c:12) (3.53 ns)
    'partselect' operation ('pha', test1/fmmmod.c:13) (0 ns)
    'add' operation ('pha', test1/fmmmod.c:14) (1.6 ns)
    'getelementptr' operation ('sin_lut_addr', test1/fmmmod.c:16) (0 ns)
    'load' operation ('sin_lut_load', test1/fmmmod.c:16) on array 'sin_lut' (2.39 ns)

Below is the Performance Analysis (screenshots: Capture2.PNG, Capture.PNG).


I know this means HLS wants to do a '*', two '+' and a ROM load in one clock cycle, so the path takes longer than 10 ns. But why doesn't HLS increase the latency to meet the timing requirement?
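
The numbers in the warning can be checked by hand: the listed delays add up to about 10.34 ns, while the budget is the target period minus the uncertainty, 10 - 1.25 = 8.75 ns. Below is a minimal sketch of that arithmetic in plain C. The function and array names are made up for illustration and are not part of any HLS API.

```c
#include <stddef.h>

/* Sum the per-operation delays (in ns) along one combinational path. */
double path_delay_ns(const double *delays, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += delays[i];
    return total;
}

/* Effective delay budget as printed in the warning: target minus uncertainty. */
double delay_budget_ns(double target_ns, double uncertainty_ns)
{
    return target_ns - uncertainty_ns;
}

/* Delays copied from the SCHED 204-21 warning, in path order:
 * port read, mul, add, partselect, add, getelementptr, load. */
const double fmmod_path_ns[7] = { 0.0, 2.82, 3.53, 0.0, 1.6, 0.0, 2.39 };
```

Summing the seven entries gives roughly 10.34 ns, which matches the 10.3 ns estimate in the warning and exceeds the 8.75 ns budget.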

 

If HLS can't do this automatically for me, what should I do to make the result meet the timing?

 

Another strange thing: I also have an older version of Vivado installed (2014.2). The same code generates a different result on 2014.2, and it meets timing (the latency goes from 1 to 2).

 

Attached is the full synthesis report for "fmmod".

siva_krishna
Visitor
Registered: ‎04-06-2016

Hello,

You are right, I have faced this problem too. The same design gives different results in Vivado HLS and SDSoC. Also, suppose a design exceeds the target at 10 ns (e.g. the estimate is 11.5 ns). If I synthesize the same design for 6 ns, the estimate becomes 7.7 ns. Why couldn't the tool reach 7.7 ns when I asked for 10 ns? I don't know the exact reason. It may be due to the optimization effort the tool applies for a particular clock period.

u4223374
Advisor
Registered: ‎04-26-2015

Yes, same issue here.

 

I suspect that HLS is optimising each function or loop separately, and stopping optimisation once each one is under 10 ns. Unfortunately, each one being under 10 ns does not guarantee that the combined set is under 10 ns.

nagabhar
Xilinx Employee
Registered: ‎05-07-2015

Hi @kelemixx

Interesting. Did you try applying the PIPELINE directive to your fmmod function to see whether the estimated frequency improves?

Thanks
Bharath
kelemixx
Contributor
Registered: ‎11-21-2014

Hi, @nagabhar

 

I tried adding the PIPELINE directive to my project. Contents of directives.tcl:

 

set_directive_pipeline "fmmod"

The result is slightly different from the one above: the Interval goes from 2 to 1, but the Latency is still 1, so it still does not meet timing. Here are the screenshots (Capture.PNG, Capture2.PNG).

From the result, we can see that HLS still wants all operations in one clock cycle, and the estimated frequency has not improved.

Attached is the full report using the PIPELINE directive. I have also uploaded my source code here. Thanks for your help.

kelemixx
Contributor
Registered: ‎11-21-2014

Hi,

 

For now, my only workaround is to reduce the clock target a little, e.g. from 10 ns to 9.5 ns. In fact I changed it to 5 ns, and HLS gave me a result that meets timing.

Anyway, this helps.

debrajr
Moderator
Registered: ‎04-17-2011
The tool seems to be overly pessimistic in your case. With the Evaluate option of Export RTL, it shows the following:

#=== Resource usage ===
SLICE: 2
LUT: 3
FF: 1
DSP: 1
BRAM: 3
SRL: 0
#=== Final timing ===
CP required: 10.000
CP achieved: 5.775
Timing met
INFO: [Common 17-206] Exiting Vivado at Fri Jul 15 12:13:45 2016...
Finished export RTL.
Regards,
Debraj
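
For reference, the "CP required" and "CP achieved" figures in the report above translate into slack and clock frequency as sketched below. This is only the arithmetic; the helper names are hypothetical and not part of Vivado.

```c
/* Timing slack in ns: a positive value means the requirement is met. */
double slack_ns(double required_ns, double achieved_ns)
{
    return required_ns - achieved_ns;
}

/* Convert a clock period in ns to a frequency in MHz. */
double period_to_mhz(double period_ns)
{
    return 1000.0 / period_ns;
}
```

With the values from the report (10.000 ns required, 5.775 ns achieved), the slack is about 4.225 ns and the achieved period corresponds to roughly 173 MHz.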
kelemixx
Contributor
Registered: ‎11-21-2014

Hi, @debrajr

 

Thanks for your reply.

 

It's very interesting that the RTL synthesized from the generated Verilog/VHDL has much better timing. So does HLS know this, and can I safely ignore the timing warnings?

What if not? I mean, if the RTL does not meet timing either, what could I do to optimize the path? Which directive could help?

debrajr
Moderator
Registered: ‎04-17-2011
I think so. Try the code below:

#include <ap_cint.h>

void fmmod(int12 x, int12 *y) {

	static const int12 sin_lut[4096] = {
#include "sin_lut.dat"
	};

	static uint24 acc;
	uint12 pha;
	uint24 temp;

	temp = x * 717;
	/* Bind the multiplication result to the Mul core explicitly. */
#pragma HLS RESOURCE variable=temp core=Mul
	acc = temp + acc;
	//acc += x * 717;
	pha = acc >> 12;
	pha = pha + 1024;

	*y = sin_lut[pha];

}

It should be within 10 ns in HLS.
Regards,
Debraj
kelemixx
Contributor
Registered: ‎11-21-2014

Hi, @debrajr

 

Thanks for your code, it works.

 

However, I'm confused: without the RESOURCE directive, the * operator is implemented using a DSP (Mul) too. Why does assigning it manually add a cycle of latency (which is what makes timing pass)?

mik3l3_hdl
Adventurer
Registered: ‎08-15-2019

Hi,

Why don't you simply replace your sin_lut.dat with a DDS IP core within your design?

Basically, you synthesize only the portion of your C/C++ code that generates "pha", with the latency directive "#pragma HLS latency min=0 max=0", then bring the synthesized VHDL/Verilog module into Vivado HLx and connect its pha port to the input port of a DDS IP core configured to generate sinusoidal values.

Hope this helps
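
A minimal sketch of the phase-only portion described above, written as plain C so it can be checked on a desktop compiler. Everything beyond the original arithmetic is an assumption made for illustration: the function name `fmmod_phase`, passing the accumulator by pointer instead of keeping it `static`, and the masks that emulate the `uint24`/`uint12` wrap-around of `ap_cint.h`. How the resulting value is wired to a DDS core is not shown.

```c
#include <stdint.h>

/* Hypothetical phase-only model of fmmod: advances a 24-bit accumulator
 * by x * 717 and returns the 12-bit value the original code uses as the
 * sin_lut index. x is expected to be in the int12 range [-2048, 2047]. */
uint32_t fmmod_phase(uint32_t *acc, int32_t x)
{
    /* In an HLS source, the poster's "#pragma HLS latency min=0 max=0"
     * would be placed here; it is left out so this file builds anywhere. */

    /* uint24 wrap-around emulated with a mask; the unsigned cast makes a
     * negative step wrap modulo 2^24. */
    *acc = (*acc + (uint32_t)(x * 717)) & 0xFFFFFFu;

    uint32_t pha = (*acc >> 12) & 0xFFFu;   /* top 12 bits of the accumulator */
    return (pha + 1024u) & 0xFFFu;          /* + 1024 with uint12 wrap-around */
}
```

Starting from a zeroed accumulator, an input of 0 returns 1024, an input of 100 returns 1041, and an input of -1 wraps around and returns 1023.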

 
