replicating FFs with phys_opt_design

I'm trying to improve the timing on an aws-fpga design (using Vivado 2019.2 on Linux with xcvu9p-flgb2104). It runs at 300 MHz, but I'd like to speed it up to 350 or 400 MHz. In the design there are many places where the last FF in a register pipeline chain fans out like this:

[Attached image: Capture.PNG]

Clearly, replicating the driving FF could dramatically improve the timing (assuming there is slack in the preceding pipeline stage). phys_opt_design should do this but so far I have been unable to get it to do what I want.

phys_opt_design -directive AggressiveFanoutOpt

This helps a lot but it doesn't fix all of the obvious problems. I've tried running it iteratively but it stops making any additional changes after 2 or 3 runs.

I've tried making a list of all the failing nets (https://forums.xilinx.com/t5/Implementation/How-to-best-find-negative-slack-high-fanout-nets/td-p/1134063) and using 

phys_opt_design -force_replication_on_nets [get_nets $my_hfo_nets]

This has the opposite problem. It goes way overboard and adds too many extra FFs (e.g. 8 replicated FFs for a fanout of 16!). The extra utilization and congestion make the routed results worse.
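
For reference, one way to keep forced replication away from low-fanout nets is to filter the failing-net list by pin count before handing it to phys_opt_design. This is only a sketch, assuming a placed design is open in a Vivado Tcl session. The `min_fanout` threshold and the variable names are hypothetical, and the filter relies on the net properties TYPE and FLAT_PIN_COUNT. It does not limit how many replicas Vivado creates per net, only which nets are forced.

```tcl
# Sketch only: run in a Vivado Tcl session with a placed design open.
# min_fanout is a hypothetical tuning knob, not a phys_opt_design option.
set min_fanout 32

# Worst setup path per endpoint among the paths with negative slack.
set failing_paths [get_timing_paths -setup -slack_lesser_than 0 \
                       -max_paths 1000 -nworst 1]

set hfo_nets {}
if {[llength $failing_paths] > 0} {
    # Nets on the failing paths, keeping only signal nets whose pin count
    # (driver plus loads) reaches the threshold.
    set failing_nets [get_nets -quiet -of_objects $failing_paths]
    set hfo_nets [filter $failing_nets \
                      "TYPE == SIGNAL && FLAT_PIN_COUNT >= $min_fanout"]
}

# Force replication only on the filtered subset.
if {[llength $hfo_nets] > 0} {
    phys_opt_design -force_replication_on_nets $hfo_nets
}
```

Raising `min_fanout` trades timing improvement against the extra utilization that forced replication adds.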

Does anyone know of a better way of replicating FFs on failing routes? I'm looking for the middle path between the overkill of -force_replication_on_nets and the underkill of AggressiveFanoutOpt. I'd like to just add one or maybe two FFs, as needed. Doing this in RTL would be really difficult. The phys_opt_design approach seems correct, but I just can't make it do what I want.


Accepted Solutions

I stumbled across this site: https://hwjedi.wordpress.com/2017/02/09/vivado-non-project-mode-part-iii-phys-opt-looping/. I added something similar to my script and am getting way better results. The looping I tried earlier was too limited (it only used the -fanout_opt option). The mixture of different directives seems to work much better. It is certainly much better than the sledgehammer approach of -force_replication_on_nets.

I'm now down to WNS=-0.017 | TNS=-0.927 with a 375MHz clock. Whoo hoo!

Clearly this is a feature that needs to be added to project mode implementation.
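
The looping idea can be sketched roughly as follows. This is not the script from the linked post, just a minimal illustration for a non-project-mode flow, run after place_design and before route_design. The directive list, the `max_passes` limit, and the `get_wns` helper are assumptions chosen for the example.

```tcl
# Sketch only: run after place_design in a non-project-mode Vivado script.

# Hypothetical helper: slack of the single worst setup path.
proc get_wns {} {
    return [get_property SLACK \
                [get_timing_paths -setup -max_paths 1 -nworst 1]]
}

# Directives to cycle through; this mix is an assumption and worth tuning.
set directives {AggressiveExplore AggressiveFanoutOpt AlternateReplication}
set max_passes 5

set prev_wns [get_wns]
for {set pass 1} {$pass <= $max_passes && $prev_wns < 0} {incr pass} {
    foreach d $directives {
        phys_opt_design -directive $d
    }
    set wns [get_wns]
    puts "phys_opt pass $pass: WNS = $wns"
    # Stop once a full pass no longer improves the worst slack.
    if {$wns <= $prev_wns} {
        break
    }
    set prev_wns $wns
}
```

The loop ends when timing is met, when a pass brings no improvement, or when `max_passes` is reached, whichever comes first.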


Replies

pspear@dwavesys.com 


"This has the opposite problem. It goes way overboard and adds too many extra FFs (e.g. 8 replicated FFs for a fanout of 16!). The extra utilization and congestion make the routed results worse."

Even before replicating FFs, it sounds like you are already (or very nearly) facing a congestion problem. You may need to resolve the congestion first (see "Reducing Net Delay Caused by Congestion" in UG949).
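
To quantify the congestion before changing the replication strategy, a congestion report can be generated from the Tcl console. A minimal sketch, assuming a placed or routed design is open; the report file name is arbitrary:

```tcl
# Write a congestion report for the currently open design.
report_design_analysis -congestion -file congestion.rpt
```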

 

When not facing congestion problems, phys_opt_design -force_replication_on_nets has worked well for me using the method shown in the following post (although this seems to be the same as what you did).

https://forums.xilinx.com/t5/Xilinx-IP-Catalog/BRAM-address-pipelining/m-p/831646#M3872

"I'd like to just add one or maybe two FFs, as needed. Doing this in RTL would be really difficult."

I suspect that the RTL route with a bunch of DONT_TOUCH constraints/attributes is your only option.

Mark


Something as simple as a -max_replication <N> option to go with -force_replication_on_nets would solve this problem. Are there any undocumented features we could use here?


I'm definitely facing congestion problems (congestion level around 5 on the long routes). I'm balanced precariously between congestion and routing path length issues. If I add pipeline stages, the results get worse. If I remove any pipeline stages, the results get worse. I've been wrestling with this design for several months now. Even getting it to work at 300 MHz is a major triumph.

If there is no way to get phys_opt_design to work better, that would be really disappointing. It is definitely where this kind of fix should be done. I spent a long time trying to do much simpler custom fanouts in RTL. It is horrible. Every change seems to break the functionality and incurs a day or two of simulation. No thanks.

With the enormous size of the FPGAs on the Alveo and AWS boards, this is going to be a major stumbling block for a lot of new designs. I sure hope Xilinx is focusing on it.
