07-31-2017 08:07 AM
I'm using Vivado 2016.4 (Kintex Ultrascale) and have a design where hold timing is the difficult thing to meet. I've done some clock tree balancing with global buffers, but it's an ASIC design I can't change much.
I have a non-project mode build script that goes through the normal (opt/place/phys_opt/route) steps. I've noticed I can then run phys_opt/route again to improve hold timing and after doing that twice the router finishes with:
Phase 10 Post Router Timing
INFO: [Route 35-57] Estimated Timing Summary | WNS=0.000 | TNS=0.000 | WHS=0.025 | THS=0.000 |
However, when I run report_timing_summary it fails on 123 end points (WHS -0.076, THS -2.625).
I realize the router is only making an estimate, but I'm curious whether it fails to notice that hold isn't met and gives up too early. Ideally I'd fix the clock network to balance all the synchronous clocks, but short of that, is repeatedly running phys_opt_design -hold_fix and routing again reasonable/safe?
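For reference, the repeated hold-fix/reroute idea could be sketched roughly as below. This is only a sketch under assumptions: a routed design is already open in memory, the pass limit and checkpoint file names are made up for illustration, and success is judged by the timing engine (get_timing_paths) rather than by the router's estimated summary.

```tcl
# Worst hold slack (ns) according to the timing engine, not the router estimate.
proc worst_hold_slack {} {
    set path [get_timing_paths -hold -max_paths 1 -nworst 1]
    return [get_property SLACK $path]
}

set max_passes 3                      ;# hypothetical cap on extra passes
set whs [worst_hold_slack]

for {set pass 1} {$pass <= $max_passes && $whs < 0} {incr pass} {
    # Save the current state so a pass that makes things worse can be discarded.
    write_checkpoint -force before_hold_pass_${pass}.dcp   ;# hypothetical name

    phys_opt_design -hold_fix
    route_design -directive AdvancedSkewModeling

    set new_whs [worst_hold_slack]
    puts "pass $pass: WHS $whs -> $new_whs ns"

    if {$new_whs <= $whs} {
        puts "No improvement; the checkpoint written before this pass is the better result."
        break
    }
    set whs $new_whs
}
```

Stopping as soon as a pass fails to improve WHS matters here, since each extra route_design run costs a couple of hours and the results are not guaranteed to get better from one pass to the next.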
08-01-2017 03:47 AM - edited 08-01-2017 03:49 AM
Try using different place and route directives. Since hold is failing by only 76 ps, that might help. Please post the report_timing_summary log file here.
08-01-2017 07:56 AM
I'm not able to share the full timing report, but here is a general summary of each stage of the build which might help:
Stage            Time      tnsFailingEp  whs (ns)   ths (ns)       thsFailingEp  thsTotalEp
Synth            0         0             -0.333000  -1452.589000   20403         187277
opt_design       00:01:08  0             -0.333000  -6975.234000   52308         187276
place_design     00:14:02  0             -3.221000  -16354.364000  22574         187276
phys_opt_design  00:23:46  0             -2.849000  -16170.429000  22510         187291
route_design     02:01:17  0             -0.963000  -48.244000     674           187291
phys_opt_design  00:32:20  0             -0.430000  -40.233000     654           187291
route_design     02:39:52  0             -1.516000  -14.927000     356           187291
phys_opt_design  00:25:56  0             -1.483000  -25.421000     348           187291
route_design     02:06:57  0             -0.076000  -2.625000      123           187291
There are about 70 clocks defined and several are FF dividers. Balancing the clock skew on synchronous clocks would be the ideal solution for sure.
The first pass through the flow uses these options:
opt_design -sweep -remap -propconst
place_design -directive ExtraTimingOpt
phys_opt_design -directive ExploreWithHoldFix
route_design -directive AdvancedSkewModeling
The additional phys_opt_design calls are with -hold_fix only. The additional route_design calls use -directive AdvancedSkewModeling. I've since tried some other build flows similar to this and seen the hold time increase, so I'm guessing this isn't a consistent strategy. I also noticed I have set_param route.enableHoldExpnBailout 1 in that script, which still seems to give similar (almost identical) results but takes less time. Maybe that parameter is why the router hold estimate didn't match the timing summary?
I'm happy to try other options, but was mainly posting to understand if the router did not realize hold time was still failing. Now that I realize enableHoldExpnBailout was on, maybe that explains it. I'm still a bit surprised it reports 0 THS though.
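To make the mismatch easy to spot in a build script, the router's estimate line can be parsed and compared against the timing engine. A minimal sketch follows. The proc name is made up, the sample line is the one quoted earlier in this thread, and the Vivado-only comparison is guarded so the parsing part also runs in a plain tclsh.

```tcl
# Extract WNS/TNS/WHS/THS from a router "Estimated Timing Summary" log line.
proc parse_route_estimate {line} {
    set result [dict create]
    foreach key {WNS TNS WHS THS} {
        if {[regexp "$key=(-?\[0-9.\]+)" $line -> value]} {
            dict set result $key $value
        }
    }
    return $result
}

set line {INFO: [Route 35-57] Estimated Timing Summary | WNS=0.000 | TNS=0.000 | WHS=0.025 | THS=0.000 |}
set est [parse_route_estimate $line]
puts "router estimate: WHS=[dict get $est WHS] ns, THS=[dict get $est THS] ns"

# Inside Vivado only: compare against the timing engine's worst hold path.
if {[llength [info commands get_timing_paths]]} {
    set sta_whs [get_property SLACK [get_timing_paths -hold -max_paths 1 -nworst 1]]
    if {$sta_whs < 0 && [dict get $est WHS] >= 0} {
        puts "WARNING: router estimated hold as met, but STA reports WHS=$sta_whs ns"
    }
}
```

A check like this does not explain why the estimate is optimistic, but it keeps the script from treating the router's summary as the final word.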
08-07-2017 04:59 AM
08-08-2017 08:36 AM
Thanks for the input. A little more background:
Yes, this is an ASIC prototype build, and I totally understand that generating all the clocks in fabric is discouraged. I'm very familiar with FPGA designs, but this is my first ASIC prototype. Their goal is to avoid changing the code as much as possible so that the prototype also emulates their clock design. Due to the number of dividers and clock muxes, there isn't an easy path to Xilinx clock resources for all clocks. On the plus side, the clock rate is <20 MHz and setup is not an issue.
Most of the design actually is synchronous to a divided and undivided clock with lots of clock gating. The constraints were carried over from the ASIC and appear to cover all the CDC cases. The problem is definitely high skew caused by all the clock gates/dividers on synchronous clocks. The real issue is the number of functional clock gates throughout the design from what I can tell.
Ideally I'm hoping to get code changes to at least balance the clock logic paths and minimize some of the skew.
We've tried using Synplify Pro clock gate conversion, but it ended up only partially converting a lot of clocks, which led to even more skew problems. I will be keeping my eye out for good spots to use Xilinx resources to simplify things, but a lot of the gating is functionally required and not easy to replace.
I'm happy to get any suggestions related to this problem.
I'm still curious whether the Vivado router could think it met timing and finish when in reality it hasn't met timing on some nets. Or maybe it really couldn't do any better and knew it wasn't done, but due to rounding or something it reports 0 timing errors in the estimate?
08-08-2017 08:46 AM
Also, I ran report_clock_interaction and it looks ok, although I hadn't looked at that previously. Thanks.
08-10-2017 03:04 AM
08-11-2017 09:05 AM
I don't think I could share the full report due to all the path info. The failures are not in the async section. The failures are a mix of Intra/Inter clock paths but all on synchronous paths that have logic (gates/muxes) in the clock path. Being an ASIC emulation effort we don't really want to modify that if possible.
Also to be clear, the "report_timing_summary" output does show these failures. It's the "route_design" summary that I'm talking about, which clearly says it is *estimated* timing, but no negative slack.
If I look at a timing report for the worst failing path, the setup check has ~100 ns of slack, but hold fails by 76 ps. It seems like that is something the router should be able to fix... That's why I wonder if the router somehow overlooks this small failure.
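For anyone wanting to look at the same thing, the worst hold paths and their clock skew can be pulled out roughly as follows. This is a sketch that assumes the routed design is open. The report file name is made up, and the timing-path property names (SLACK, SKEW, STARTPOINT_CLOCK, ENDPOINT_CLOCK) are as I recall them, so confirm them with report_property on one path first.

```tcl
# Full report with the clock paths expanded, so gates/muxes/dividers in the
# clock network show up in the source and destination clock delays.
report_timing -hold -max_paths 10 -nworst 1 -path_type full_clock_expanded \
    -file worst_hold_paths.rpt   ;# hypothetical file name

# One-line summary per path: hold slack, clock skew, launch and capture clocks.
foreach path [get_timing_paths -hold -max_paths 10 -nworst 1] {
    puts [format "slack %8.3f ns  skew %8.3f ns  %s -> %s" \
        [get_property SLACK $path] \
        [get_property SKEW $path] \
        [get_property STARTPOINT_CLOCK $path] \
        [get_property ENDPOINT_CLOCK $path]]
}
```

If the skew on a failing path is large compared with the data path delay, the router has to add that much detour delay on the data net to fix hold, which is where the already long (~2.5 ns) nets come from.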
08-11-2017 09:24 AM
Also, routing congestion is ~60% horizontal/vertical... so I'd think routing longer paths would be possible although some nets are already around 2.5ns.