hs_mul
Newbie
1,410 Views
Registered: ‎05-25-2018

Finding the shortest critical path of a clocked design

Hello everybody,

I have a question regarding the best achievable critical path delay of a clocked design, for example a multiplier with clocked inputs and outputs. Until now I have been using clock constraints such as

 

create_clock -period 7.0 -name clock -add [get_ports -filter {NAME =~ "*clk*" && DIRECTION == "IN" }]

But the critical path delay depends strongly on the given clock period. My assumption is that the placement is optimised only until the timing constraints are met; after that, Vivado applies no further optimisation. Because of that I wrote a Tcl script that reduces the clock period after every run until the timing constraints cannot be met anymore. But here the actual problems begin: the results differ a lot for small differences in clock period.

 

As an example:

 

For a multiplier with a 4.202 ns clock period constraint, Vivado 2017.4 creates an implementation with a 4.09 ns critical path delay, while for the same design with a 4.2 ns clock period constraint it produces a critical path delay of 4.168 ns. Furthermore, if the clock period constraint is set to an unreasonably small value, e.g. 1 ns, the critical path delay changes again to another value, which may be faster or slower than those achieved with the previous timing constraints.
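
For reference, a critical path delay figure of this kind can be read back from a routed design as the constrained period minus the worst setup slack (WNS). The snippet below is a minimal sketch, assuming a routed design is open in Vivado and the clock is named "clock" as in the create_clock constraint above. The result is really the smallest period the current implementation would meet, so it includes setup time, skew and clock uncertainty as well as the pure data path delay.

```tcl
# Assumes a routed design is open and the clock is called "clock".
set period [get_property PERIOD [get_clocks clock]]

# Worst setup path of the design and its slack (WNS), in ns.
set path [get_timing_paths -setup -max_paths 1 -nworst 1]
set wns  [get_property SLACK $path]

# Smallest period this particular implementation would still meet.
set tmin [expr {$period - $wns}]
puts [format "period %.3f ns, WNS %.3f ns, min period %.3f ns (%.1f MHz)" \
    $period $wns $tmin [expr {1000.0 / $tmin}]]
```

Logging this value for every run makes results obtained with different period constraints directly comparable.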

Is there any "guideline" for getting the fastest possible critical path delay? The only purpose is the comparison of different arithmetic structures. I can change the clock frequency to any value since there is no other module besides the arithmetic structure.

Thanks in advance,
Hendrik

 

 

5 Replies
jmcclusk
Mentor
1,405 Views
Registered: ‎02-24-2014

You are facing the "noise" problem of FPGA place and route. FPGA place and route is an NP-hard problem, the results depend strongly on the initial placement, and the solution space for timing closure is notoriously rough and discontinuous. Finding a reasonable estimate of Fmax (maximum clock frequency) takes a lot of averaging over multiple trials. Very ambitious clock constraints frequently produce disastrous results, so the plot of Fmax achieved vs Fmax attempted is far from linear or smooth.

 

Better results are frequently achieved with post-placement physical optimization ("phys_opt_design") passes, sometimes run multiple times, along with phys_opt_design passes after routing as well. And sometimes it just takes many place & route runs to get lucky and find a solution with the best timing.
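
As an illustration of where those passes sit in a non-project flow, here is a minimal sketch. It assumes a synthesized design is already open; the second post-placement pass and the report file name are only examples, not a recommended recipe.

```tcl
# Assumes a synthesized design is open (e.g. after synth_design or open_checkpoint).
opt_design
place_design

# Post-placement physical optimization; repeating it sometimes helps further.
phys_opt_design
phys_opt_design -directive AggressiveExplore

route_design

# Post-route physical optimization pass.
phys_opt_design

# Example report file name.
report_timing_summary -file timing_summary.rpt
```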

 

I get my best results by examining the timing failures and then either inserting more pipeline registers or hand-placing the logic in the critical path relative to each other using RLOC constraints. It's a lot of work. The low-hanging fruit is usually just adding more pipeline registers.

 

Then there is the problem of maintaining the optimal critical paths when a design gets full of logic, and optimal placement seems out of reach for the standard placement algorithms.    There's a good reason why Vivado is chock full of methods for constraining logic placement.  

 

Bottom line: brute-force search for the fastest timing is about the only practical approach. If you have the computing power available, minor parameter variation over many place & route attempts in parallel is a good approach. Gradient descent largely fails because the achieved timing is not a smooth function of the parameters being varied, so the derivatives it needs don't exist.
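
One possible shape for such a brute-force search is sketched below. It varies the placer directive at a fixed clock constraint and keeps the routed checkpoint with the best WNS. The checkpoint names are hypothetical, the directive list is just a sample, and the loop runs sequentially; to run trials in parallel you would start one Vivado process per trial.

```tcl
# Hypothetical input: a checkpoint written after opt_design with write_checkpoint.
set start_dcp  post_opt.dcp
set best_wns   -1e9
set directives {Default Explore ExtraNetDelay_high ExtraNetDelay_low AltSpreadLogic_high}

foreach d $directives {
    # Every trial starts from the same unplaced design.
    open_checkpoint $start_dcp
    place_design -directive $d
    phys_opt_design
    route_design

    # Worst setup slack of this trial, in ns.
    set wns [get_property SLACK [get_timing_paths -setup -max_paths 1 -nworst 1]]
    puts "directive $d: WNS = $wns ns"

    # Keep only the best routed result (hypothetical output name).
    if {$wns > $best_wns} {
        set best_wns $wns
        write_checkpoint -force best_routed.dcp
    }
    close_design
}
puts "best WNS over all trials: $best_wns ns"
```

The same loop could iterate over a list of clock periods instead of directives, in which case the period minus WNS is the number to compare rather than WNS itself.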

Don't forget to close a thread when possible by accepting a post as a solution.
1,373 Views
Registered: ‎01-22-2015

@hs_mul

 

I agree with @jmcclusk that consistently achieving timing closure (and keeping Fmax high) comes from careful thought about each timing path that fails timing analysis, and that this careful thought often leads to adding pipeline registers, synchronizers, and constraints (RLOC and others).

 

However, sometimes you’ve not had time for this careful thought and the deadline is tomorrow - but you got lucky and (after many trials) you achieved timing closure.  Then, the boss asks you to make a very small change to the firmware – which you know will put you back in that “many trials – hope I get lucky” place.

 

If you find yourself in this situation, you might try the Vivado feature called Incremental Compile, which is described in AR-57853 and UG904. In short, this feature tries to reuse the previous implementation where you got lucky and achieved timing closure, and only changes the small part of the implementation that corresponds to your small change in firmware.
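
In a non-project flow this roughly looks as follows. This is a sketch only: both checkpoint names are hypothetical, and the reference checkpoint is assumed to be the routed result of the run that met timing.

```tcl
# Hypothetical name: synthesized checkpoint of the slightly modified design.
open_checkpoint post_synth_modified.dcp
opt_design

# Hypothetical name: routed checkpoint of the earlier run that met timing.
# Placement and routing of unchanged logic is reused from it.
read_checkpoint -incremental reference_routed.dcp

place_design
phys_opt_design
route_design

# Shows how much of the reference implementation was actually reused.
report_incremental_reuse
```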

 

Mark

hs_mul
Newbie
1,303 Views
Registered: ‎05-25-2018

Thanks for the response,

 

at least I know now that it is not only my own inability to find a suitable solution. Since adding extra hardware is not useful here, and writing manual placement constraints for all the different designs and sizes is not feasible, "brute forcing" seems to be the only solution.

During ISE times there was this PlanAhead tool, which had the feature of optimizing the delay in multiple runs, remembering the fastest out of a set of runs with small modifications to the constraints. Is there still anything comparable available in Vivado, or is the best way to go to write a Tcl script that tests a large set of constraints?

 

The "incremental compile flow" appears to be a good method for keeping the implementation time as short as possible.


Best regards

1,656 Views
Registered: ‎01-22-2015

Hendrik,

 

    During ISE times there was this PlanAhead tool, which had the feature of optimizing the delay in multiple runs …

I think you are referring to ISE Xplorer/SmartXplorer, described nicely <here>.  Apparently, it is not part of Vivado for reasons shown <here>.
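
As a rough Vivado-side counterpart, several implementation runs with different strategies can be launched from Tcl in project mode and their WNS compared afterwards. The following is only a sketch under assumptions: a project is open, its synthesis run is synth_1, the run names are made up, and the -flow string and strategy names have to match what your Vivado version actually lists.

```tcl
# Assumes an open project whose synthesis run is synth_1.
# Run names (impl_<strategy>) are hypothetical; the -flow string and the
# strategy names are version-specific and should be checked in your install.
set strategies {Performance_Explore Performance_ExplorePostRoutePhysOpt Performance_NetDelay_high}
set runs {}
foreach s $strategies {
    set r impl_$s
    create_run -flow {Vivado Implementation 2017} -parent_run synth_1 -strategy $s $r
    lappend runs $r
}

# Launch all runs, up to four jobs at a time.
launch_runs $runs -jobs 4

# Wait for each run to finish and print its worst negative slack.
foreach r $runs {
    wait_on_run $r
    puts "$r: WNS = [get_property STATS.WNS [get_runs $r]] ns"
}
```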

 

I have not used Xplorer but I can relate to what you are trying to do. There is a part of me that wants to tweak my HDL and optimize the slack (WNS) reported by Vivado. If @austin were listening in to our discussion, I think he’d say something like “optimizing slack is fine for the ivory-tower guys but it is not what Vivado does and it is not what we should do as engineers”.

 

Anyway, good luck with your arithmetic structures work.

 

Mark


hs_mul
Newbie
1,277 Views
Registered: ‎05-25-2018

Hi Mark,

 

thanks a lot for your help. I will try to get as close as possible to the ivory-tower without entering it :D

 

Best Regards,

Hendrik
