02-04-2021 12:33 AM
I'm implementing a large design on UltraScale+ (xcvu9p), all Verilog with very few physical-IO IPs (CMAC and PCIe), using Vivado 2020.1. I've noticed the implementation results change dramatically if I change the directory I'm running the design from, or the names of the Verilog files, still with the exact same content. The start times of the runs are also different, but I hope that's not what's changing the results!
I use Tcl scripts and makefiles, so the results should be completely reproducible. I usually test a couple of different implementation parameters, e.g. different strategies in different directories, and then select the best result to use. However, when I copy the exact same design to be run from the main directory (i.e. VCU1525_100g instead of VCU1525_100g_updated_modulex), or when I rename the winning file for a specific module (for example modulex2.v to modulex.v), the results change dramatically: sometimes better timing, usually worse. And when I vimdiff the run logs up to the end of placement, they are exactly the same: number of cells, number of lines in the log, everything (other than the directory name, time of run, and checksum values, obviously).
In the most recent case I met timing with WNS 0.015: the run got (WNS=-0.238 | TNS=-2.280) after placement and hence ran phys_opt. The exact same run, with just the running directory and a filename changed, got WNS=0.192 after placement, phys_opt didn't run, and the final run failed timing!
I remember ISE had a seed, so you could get consistent results; then Vivado said the seed was removed but the run results would be consistent. Did they mean even the file names have to be the same to get consistency? Is there a way to give it a seed, or to keep directory/file names out of whatever seed it uses? I'm trying to make my design better, but such randomness makes it impossible to conclude which design is better! There also seems to be no way to force phys_opt when WNS is greater than 0.
P.S.: Using Vivado 2020.2 is not an answer, since it does much worse on several designs that I have. I guess it's trying a different SLR-crossing algorithm, which always makes things much worse!
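For reference, in a scripted non-project flow one workaround is simply to call phys_opt_design unconditionally after placement instead of gating it on negative WNS. A minimal sketch (checkpoint and report names are placeholders, not from my real scripts):

```tcl
# Sketch of a non-project flow that always runs phys_opt, regardless of WNS.
# File names and the directive choice are placeholders/assumptions.
open_checkpoint post_place.dcp
phys_opt_design -directive AggressiveExplore   ;# runs even when WNS > 0
route_design
phys_opt_design                                ;# optional post-route pass
report_timing_summary -file post_route_timing.rpt
write_checkpoint -force post_route.dcp
```

In project mode this is harder, since the runs infrastructure itself decides whether phys_opt fires; driving the steps from a Tcl checkpoint flow sidesteps that.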
02-11-2021 10:00 AM
I did some more runs, and it is actually even more confusing! After some changes to the XDC file to reduce the probability of some unexpected errors, I got it to close timing with WNS 0.010, and the PR runs also went through nicely and met the same timing. This successful run was started from VCU1525_100g_12.
Now I copied the whole directory to VCU1525_100g (the base one I keep in my repo) and reran it. It failed timing miserably. Then I created VCU1525_100g_12 again, copied the project-generation files back, and ran make clean + make: it failed timing miserably, in exactly the same way! So it's neither the directory location nor the time of the run.
Then I thought maybe Vivado had cached something. I deleted ~/.Xil and ~/.Xilinx (keeping just the node-locked license file), and also found a bunch of files in /tmp/ and deleted them all. In both directories I did make clean + make, within seconds of each other, in different tmux sessions. VCU1525_100g_12 met timing exactly as before (0.010), and the other one met timing with 0.005!!!!
I vimdiffed the logs to see where the difference starts, and they were the same until 3.1.1 (Global Routing), meaning the exact same numbers of Unrouted Nets and Partially Routed Nets, and even the same estimated SLL demand per column. But in 3.1.2 the first iteration of Global Routing starts with a Total GR Congestion of 3169 instead of 3173, and the rest of the results are different. In the runs that failed timing miserably (TNS -60), the numbers of unrouted and partially routed nets differed by 1 or 2, and the rest went completely different.
I understand Vivado is solving a very challenging routing problem with a heuristic algorithm, where a small change can totally change the final results. That being said, the random seed was removed with the statement that Vivado would be deterministic.
But I'm not even trying to explore different designs, I'm just trying to get the same results! The placement seems to be the same, so how can I force the router to take the same steps each time? I'm using the Performance_ExtraTimingOpt strategy; maybe there is a bug there. The design is large and fails timing without it, although now I wonder whether the directory name was simply unlucky when I tried the other strategies! I literally cannot draw any conclusions from the runs or know what to optimize. Let me know if sharing a DCP file or run log would be helpful.
P.S.: Those unexpected errors are that, for some directories/XDC options, I got an unrouted-net error in the PR runs, or it even sometimes says two PR blocks are sharing the same clock domain, while it was fine all along; the warning even says vertical sharing while they are left and right! And I have 2 PR runs, and it says it for only one of them!
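One thing I may try to make the router repeat its earlier decisions is Vivado's incremental implementation, which reuses placement and routing from a reference checkpoint. A sketch, assuming a good routed checkpoint was saved from an earlier run (file names are placeholders):

```tcl
# Incremental-flow sketch (non-project mode); reference_routed.dcp is a
# placeholder for a previously saved, timing-clean routed checkpoint.
open_checkpoint post_synth.dcp
read_checkpoint -incremental reference_routed.dcp ;# seed prior place/route
place_design
route_design
report_incremental_reuse                          ;# shows how much was reused
```

This doesn't make the tool deterministic, but it should pin most of the placement and routing to the reference run.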
02-11-2021 12:24 PM
Please check this AR on Vivado repeatability: https://www.xilinx.com/support/answers/61599.html
If the design sources are the same and the runs are on the same OS, then Vivado should give repeatable results even when run from different directories. The first check is to make sure you are using a supported OS with the Vivado version you are using. When you compare the log files, in which phase of the run do you see the checksums diverge?
Can you share the design with us? I can send you a private message (ezmove) with steps to share the design only with Xilinx.
02-11-2021 12:52 PM - edited 02-11-2021 12:59 PM
- I'm running Ubuntu 18.04.4 LTS (kernel 4.15.0-96-generic) which is mentioned in Vivado Design Suite User Guide for 2020.1. And they are on the same machine, just different tmux sessions.
- The checksums first diverge in "Phase 3.10 Fast Optimization", even though the WNS numbers, the number of added cells, and everything else are exactly the same before routing.
- I am running the implementation with a single thread (no -jobs 12 or the like), and I can see it using a single thread in top.
Yes I can share the design, please let me know the steps.
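For what it's worth, the single-thread setting in my run script looks roughly like this (a sketch; the set_param has to come before any implementation commands):

```tcl
# Force single-threaded execution for the whole run, to rule out
# multithreading as a source of nondeterminism.
set_param general.maxThreads 1
puts "maxThreads = [get_param general.maxThreads]"
```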
02-11-2021 10:12 PM
I have sent you an email; please check and respond. In the meantime, can you please retry after applying the following tactical patch for 2020.1, which fixes a known issue around constraints?
02-12-2021 12:21 PM - edited 02-12-2021 12:24 PM
I've tried the patch, but it didn't help; the results actually became more inconsistent. I also made a separate copy of the project to send to you, which has one fewer component in the directory path (~/fpga_src instead of ~/X/fpga_src).
- ~/X/fpga_src/VCU1525_100g top directory went from WNS=0.005 to WNS=-0.032 and TNS=-4.578
- ~/X/fpga_src/VCU1525_100g_12 top directory went from WNS=0.010 to WNS=0.000, actually with physical opt after routing
- ~/fpga_src/VCU1525_100g got the same WNS=0.010 as before.
- ~/fpga_src/VCU1525_100g_12 closed timing with WNS=0.000 after phys_opt after routing.
Checking where the checksums diverge: the two _12 runs are exactly the same, and the rest all diverge from "Phase 3.10 Fast Optimization".
P.S.: I verified that the patch is applied, it says "Vivado v2020.1_AR75369 (64-bit)" when starting the run.
02-15-2021 04:13 AM
Does your design have any TIMING-6 or TIMING-7 Critical Warnings in report_methodology? These special DRCs check whether you have described a relationship between 2 timed clocks while Vivado has not detected any common node/primary clock.
In some cases, although rare, having these in a design can cause QoR issues such as inconsistent results between runs.
02-15-2021 09:12 AM - edited 02-15-2021 03:56 PM
The only concerning warning I get from report_methodology is this one:
Net pcie4_uscale_plus_inst/inst/gt_top_i/diablo_gt.diablo_gt_phy_wrapper/gt_wizard.gtwizard_top_i/pcie4_uscale_plus_0_gt_i/inst/gen_gtwizard_gtye4_top.pcie4_uscale_plus_0_gt_gtwizard_gtye4_inst/gen_gtwizard_gtye4.gen_channel_container.gen_enabled_channel.gtye4_channel_wrapper_inst/channel_inst/txoutclk_out is assigned constraint CLOCK_DELAY_GROUP 'group_i0' and is not driven by a global clock buffer. Instead the driver is GTYE4_CHANNEL cell pcie4_uscale_plus_inst/inst/gt_top_i/diablo_gt.diablo_gt_phy_wrapper/gt_wizard.gtwizard_top_i/pcie4_uscale_plus_0_gt_i/inst/gen_gtwizard_gtye4_top.pcie4_uscale_plus_0_gt_gtwizard_gtye4_inst/gen_gtwizard_gtye4.gen_channel_container.gen_enabled_channel.gtye4_channel_wrapper_inst/channel_inst/gtye4_channel_gen.gen_gtye4_channel_inst.GTYE4_CHANNEL_PRIM_INST. This could potentially cause timing issues because of increased skew between synchronous clocks. The CLOCK_DELAY_GROUP should be assigned to the net segments directly driven by the global clock buffers that require their trees to be balanced. Please check your constraints for correctness.
It's a CLOCK_DELAY_GROUP driver-check warning with type NTCN-1, but it's coming from inside the Xilinx PCIe IP and I don't have any control over it. I think it also results in a bad-practice warning (TIMING-9):
One or more asynchronous Clock Domain Crossing has been detected between 2 clock domains through a set_false_path or a set_clock_groups or set_max_delay -datapath_only constraint but no double-registers logic synchronizer has been found on the side of the capture clock. It is recommended to run report_cdc for a complete and detailed CDC coverage. Please consider using XPM_CDC to avoid Critical severities
You can find the short and long versions of the CDC report attached. The critical warnings carry an exception of "False Path" for the ones where I used a sync reset, or "Asynch Clock Groups" for the warnings inside the Xilinx PCIe core. Please let me know if I'm misreading something in those reports.
Other than these 2, the rest of the warnings point out async resets driven from LUTs, BRAM timing that might be suboptimal, or input/output delay values that I set myself.
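The exceptions I mentioned are along these lines (illustrative XDC only; the clock and cell names here are placeholders, not from the real design):

```tcl
# Placeholder names: clk_user/clk_pcie and rst_sync_reg do not exist as-is.
# False path onto the first stage of a reset synchronizer (sync-reset case):
set_false_path -to [get_cells -hierarchical -filter {NAME =~ *rst_sync_reg[0]*}]
# Declare two domains asynchronous (the "Asynch Clock Groups" exceptions):
set_clock_groups -asynchronous \
    -group [get_clocks clk_user] -group [get_clocks clk_pcie]
```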
I've also checked the DRC warnings, which are a few overutilized soft pblocks, plus several instances of this one:
URAM288 mem_reg_uram_0 has CASCADE_ORDER_A (NONE) set to FIRST or NONE, the CAS_IN_ADDR_A[22:0] bus pins should be tied LOW.
It would have been nicer if Vivado had set this properly after inference, but I doubt it would cause any problems.
03-15-2021 11:30 AM
03-16-2021 06:48 AM
I don't see anything in the warnings that could be causing the inconsistent results. I'll check in with Syed here.
04-16-2021 11:21 AM
There was no follow-up for more than two months, but I might have found a clue to the problem.
There is a set of warnings that didn't make any sense and showed up for some builds but not always (I've included one instance; the others are just a different bit in the same register, or similar registers). The warning refers to a module inside corundum (https://github.com/corundum/corundum) that I use as part of my project:
Phase 1.2 IO Placement/ Clock Placement/ Build Placer Device WARNING: [Place 30-439] A collection of cells which have restrictive placement or routing requirements has some cells within some area groups and some cells outside. This is likely to cause placement problems. The cells involved are "core_inst/pcie_controller_inst/virtual_ports.corundum_inst/iface.interface_inst/port.port_inst/rx_checksum_inst/genblk1.sum_reg_reg_i_1" "core_inst/pcie_controller_inst/rx_checksum_inst/genblk1.sum_reg_reg_i_1" "core_inst/pcie_controller_inst/rx_checksum_inst/genblk1.sum_reg_reg_i_1" in Carry-chain. The area groups involved are "Corundum_pblock" ...
The corundum build by itself does not show this warning, but since I have a pblock for corundum, that seems to trigger the problem. The warning message makes no sense, and I even tried putting a rule in the XDC to include all replicated cells from corundum and keep them in that pblock, to no avail. I also tried resizing the pblocks to get rid of the warning, again to no avail. Since the project was working in deployment and the warning was not consistent, I gave up on it.
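The XDC rule I tried was along these lines (a sketch; the filter pattern is an assumption about how the replicated cells are named):

```tcl
# Assumption: replicated cells keep the original hierarchical prefix (plus a
# suffix such as _rep), so a wildcard on the parent instance should catch
# both the originals and the replicas.
add_cells_to_pblock [get_pblocks Corundum_pblock] \
    [get_cells -hierarchical -filter {NAME =~ *corundum_inst*}]
```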
The interesting part is that I noticed this warning was present in the inconsistent builds. But I recently updated corundum, and now the warnings are gone and the builds seem to be consistent. The confusing part is that the module mentioned in the warning was not changed at all, while the modules that did change mostly sit in the adjacent pblock (the PCIe DMA modules).
@dsheils I'm wondering if there is a way to get more insight into that warning if it shows up again, because the warning itself was misleading: something more verbose about what the placer is trying to do that causes the problem and then leads to the perplexing inconsistent Fast Optimization results. I couldn't find any documentation or forum posts about this warning.