melkin
Explorer
Registered: ‎02-13-2012

Is there an implementation comparison tool?

We run Vivado using non-GUI Tcl scripts (batch mode). The general flow is:

run synth
write post_synth.dcp

run opt
write post_opt.dcp

run placer
run phys_opt
write post_place.dcp

run router
run phys_opt
write post_route.dcp

generate bit and bin files
generate reports

 

Given the .dcp files created above, we have a Tcl script that reruns the placer, router, and phys_opt with a select set of 16 alternative placer "-directive" values and 2 alternative flag settings. After the placer, phys_opt and the router use the original -directive settings. When all implementations are done, the script generates a .csv file that summarizes timing closure and run time for these 32 different implementations. Quite often, only a few of these achieve timing closure.
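A minimal Python sketch of the sweep described above, assuming each run is launched as `vivado -mode batch -source <run>.tcl` starting from `post_opt.dcp`. The directive names and the `-timing_summary` flag variant are only stand-ins for the poster's actual 16 directives and 2 flag settings, and the WNS and runtime values fed to the CSV writer would have to come from your own report parsing, which is not shown.

```python
import csv
import io
import itertools


def make_run_tcl(directive, place_flags, run_dir,
                 phys_opt_args="", route_args="", checkpoint="post_opt.dcp"):
    """Build a batch-mode Tcl script for one placer experiment.

    Only the placer directive/flags vary; phys_opt_design and route_design
    get the same (original) arguments in every run.
    """
    lines = [
        f"open_checkpoint {checkpoint}",
        f"place_design -directive {directive} {place_flags}".rstrip(),
        f"phys_opt_design {phys_opt_args}".rstrip(),
        f"route_design {route_args}".rstrip(),
        f"phys_opt_design {phys_opt_args}".rstrip(),
        f"write_checkpoint -force {run_dir}/post_route.dcp",
        f"report_timing_summary -file {run_dir}/timing_summary.rpt",
    ]
    return "\n".join(lines) + "\n"


def sweep(directives, flag_variants):
    """Yield (run_name, tcl_text) for every directive x flag-variant pair."""
    for directive, (tag, flags) in itertools.product(directives, flag_variants.items()):
        run_name = f"{directive}_{tag}"
        yield run_name, make_run_tcl(directive, flags, run_name)


def summary_csv(results):
    """Turn per-run results (dicts with run, wns_ns, runtime_s) into CSV text.

    met_timing here looks at setup WNS only; hold (WHS) would need its own column.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["run", "wns_ns", "met_timing", "runtime_s"])
    writer.writeheader()
    for row in results:
        writer.writerow({**row, "met_timing": row["wns_ns"] >= 0})
    return buf.getvalue()


if __name__ == "__main__":
    # Example directive names; replace with your own set of 16.
    directives = ["Explore", "ExtraNetDelay_high", "AltSpreadLogic_high", "ExtraTimingOpt"]
    # Stand-in for the two flag settings: no extra flag, and -timing_summary.
    flag_variants = {"base": "", "ts": "-timing_summary"}
    runs = dict(sweep(directives, flag_variants))
    print(len(runs), "runs")  # 4 directives x 2 variants
    print(runs["Explore_ts"])
```

With 16 directives and 2 variants, `sweep` yields the 32 scripts described in the post; keeping the non-placer arguments fixed is what makes the runs comparable.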

I have observed that even small changes to the RTL will change which placer directives are successful. The Vivado build log and reports also show significantly different intermediate results.

I would like to be able to compare these results, either against each other or against prior runs, to get a better understanding why the various placer directives succeed one time but fail other times.

Is there a tool to help with these comparisons? It could be built into Vivado, Tcl-based, or even a third-party software product.

avrumw
Expert
Registered: ‎01-23-2009

I would like to be able to compare these results, either against each other or against prior runs, to get a better understanding why the various placer directives succeed one time but fail other times.

You can't - it is impossible to find a pattern in these results since none exists.

If you think about some of these processes (placement is the easiest to understand), the number of possible solutions is truly immense. I use this example to understand this. Take a tiny FPGA (by today's standards): the smallest Virtex-5. It had 4800 slices. If you had a design that used 75% of them (3600 slices), then when it came time for placement, the placer could put the first slice of your design in any of the 4800 slices, the next one in any of the 4799 remaining ones, the next in any of the remaining 4798, and so on. The number of possible solutions is 4800P3600, or 4800!/1200!, which is about 6x10^12411.
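The arithmetic can be checked with a short Python sketch. It works in log10 via `math.lgamma` (ln Γ(n+1) = ln n!), since 4800! itself is far too large to print usefully; the 13.8-billion-year age of the universe is the only input not taken from the post.

```python
import math


def log10_permutations(n, k):
    """log10 of nPk = n! / (n - k)!, computed with lgamma to avoid huge integers."""
    return (math.lgamma(n + 1) - math.lgamma(n - k + 1)) / math.log(10)


def sci(log10_value):
    """Format a base-10 logarithm as 'm.mx10^e'."""
    exponent = math.floor(log10_value)
    return f"{10 ** (log10_value - exponent):.1f}x10^{exponent}"


# 3600 occupied slices placed into 4800 sites: 4800P3600 arrangements.
placements = log10_permutations(4800, 3600)
print(sci(placements))            # 6.0x10^12411

# A billion trials per second since the Big Bang (about 13.8 billion years).
seconds = 13.8e9 * 365.25 * 24 * 3600
trials = 1e9 * seconds
print(f"{trials:.1e}")            # 4.4e+26

# Shrinking the space a billion-fold only subtracts 9 from the exponent.
print(sci(placements - 9))        # 6.0x10^12402
```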

If you had started iterating through these solutions at the moment the universe began (the Big Bang) and tried a billion every second since then, you would have searched only about 4x10^26 of them (of the 6x10^12411 that exist). This gives you an idea of how immense this search space is.

When you have a search space this big, you can never find the "best" solution; you are looking for one that is "good enough". For this you have heuristics that attempt to walk through this immense space, finding increasingly better solutions. Another way of looking at this is as an attempt to find a minimum of a function in a 4800-dimensional space. You start somewhere in this immense space and start walking in a direction that gives a better result than your current one. You continue until you find one that is "good enough" or you get trapped in a local minimum that isn't good enough.
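To make the "walk" concrete, here is a toy Python sketch, not Vivado's algorithm: cells on a 1-D row of slots, a wire-length cost over a hypothetical netlist of two-pin nets, and greedy pairwise-swap descent that only accepts improving moves. It stops at the first arrangement that no single swap can improve, which is a local minimum, and different random starting points (seeds) can stop in different ones.

```python
import random


def wirelength(pos, nets):
    """Total 1-D wire length; pos[cell] is the slot index, nets are 2-pin (a, b)."""
    return sum(abs(pos[a] - pos[b]) for a, b in nets)


def greedy_descent(n_cells, nets, seed):
    """Random start, then accept any pairwise swap that lowers the cost.

    Stops when no single swap helps: a local minimum, not necessarily the best.
    Returns (starting_cost, final_cost).
    """
    rng = random.Random(seed)
    pos = list(range(n_cells))
    rng.shuffle(pos)                       # the random starting point
    start_cost = cost = wirelength(pos, nets)
    improved = True
    while improved:
        improved = False
        for i in range(n_cells):
            for j in range(i + 1, n_cells):
                pos[i], pos[j] = pos[j], pos[i]
                new_cost = wirelength(pos, nets)
                if new_cost < cost:
                    cost, improved = new_cost, True
                else:
                    pos[i], pos[j] = pos[j], pos[i]   # undo the swap
    return start_cost, cost


# A 12-cell chain plus a few "cross" nets (hypothetical netlist).
nets = [(i, i + 1) for i in range(11)] + [(0, 6), (3, 9), (2, 11)]
for seed in range(5):
    print(seed, greedy_descent(12, nets, seed))
```

Real placers use far richer moves and ways of escaping local minima, but the shape is the same: a cost, a starting point, and improving steps.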

All of the heuristics for doing this are "mathematically chaotic". This means that your result is extremely sensitive to your starting point. If you change anything in the system - your RTL, your constraints, your project options, any LOC'ed cells, you are starting in a different spot in this immense space, and the "walk" will end up in a completely different location in the space. This is chaotic.
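The sensitivity to the starting point can be illustrated with the logistic map, a standard textbook chaotic system rather than a model of any placer. In this minimal Python sketch, two starting points 1e-10 apart stay close for a few steps and then diverge completely.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate x -> r*x*(1-x); r = 4.0 is the fully chaotic regime."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(r * x * (1.0 - x))
    return xs


a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)   # a "seemingly insignificant" change
for step in range(0, 61, 10):
    print(f"step {step:2d}: |a - b| = {abs(a[step] - b[step]):.3e}")
```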

What you are seeing in your process is the result of this chaos. By changing your directives, you are simply changing your starting point, and, due to chaos, you end up with a completely different result. If you change your RTL, you are changing the starting point again, so a directive that worked great on your last version of the RTL has no more and no less of a chance of giving you a good result on your slightly modified RTL. There is no pattern...

Avrum

 

melkin
Explorer
Registered: ‎02-13-2012

Avrum,

First, thanks. I understand your arguments about the vast number of possible solutions, although I would also point out that with good timing, I/O pin, and internal location constraints, the probability calculation cannot assume that every cell can end up in any site within the FPGA. A CMACE4 cannot be placed in general SLICE locations, nor in a PCIE hard macro site; a 1.8V LVDS IO will be defined and fixed to a LOC by the board designer. Once those are placed, then timing constraints and floor planning will force associated and related cells to be localized in that area. Nonetheless, this still leaves an extremely large (and incalculable) set of possibilities.

My inquiry was also not asking for a tool that would predict results a priori. The starting conditions for a given design are the same, i.e. the .dcp after synthesis and opt_design have run. Once the set of implementations for that has finished, some post-route, post-phys_opt results will have met timing and some will not. Vivado will report which timing paths have constraint violations in each of the results. With the help of a post-run analysis tool, patterns could emerge and be interpreted by a human mind (or maybe by an ML algorithm someday). Those patterns might then give the humans clues about where they can strengthen the design's inputs (RTL, constraints, tool directives and flags) to make the results more predictable and less random. Or, in the chaos theory terms you used above, to reduce the number and/or depth of local minima.
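A post-run comparison of the kind described here could start as simply as the hypothetical Python helper below; it is not an existing Vivado feature. It assumes you have already extracted, for each run, whether it met timing and the set of failing endpoint names (for example from each run's timing report; that parsing is not shown, and the endpoint names are made up). It counts how often each endpoint fails, finds endpoints that fail in every failing run, and uses a Jaccard overlap to compare two runs' failure sets.

```python
from collections import Counter


def compare_runs(runs):
    """runs: {run_name: (met_timing, set_of_failing_endpoints)}.

    Returns (fail_counts, always_failing): how many failing runs each endpoint
    appears in, and the endpoints that fail in every failing run.
    """
    failing = [endpoints for met, endpoints in runs.values() if not met]
    fail_counts = Counter()
    for endpoints in failing:
        fail_counts.update(endpoints)
    always_failing = set.intersection(*failing) if failing else set()
    return fail_counts, always_failing


def jaccard(a, b):
    """Overlap of two failing-endpoint sets: 1.0 = identical, 0.0 = disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical endpoint names; real ones would come from your timing reports.
runs = {
    "Explore_base": (True, set()),
    "ExtraTimingOpt_base": (False, {"u_ddr/rd_q_reg[3]/D", "u_pcie/st_q_reg/D"}),
    "AltSpreadLogic_high_ts": (False, {"u_ddr/rd_q_reg[3]/D"}),
}
counts, always = compare_runs(runs)
print(counts.most_common())
print(always)
print(jaccard(runs["ExtraTimingOpt_base"][1], runs["AltSpreadLogic_high_ts"][1]))
```

An endpoint that fails in every failing run is a candidate for an RTL or constraint fix, regardless of which directive happened to close timing.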

My inquiry was not asking for a magic bullet.  I was simply asking the community if there was any information they could share about an analysis assisting tool.  Maybe nothing exists.  Maybe some research has been done that is far from a practical application for FPGA developers.  Maybe there are some prototype algorithms that others are willing to share.  But without asking, each of us is stuck in our own local minimum in the solution space.

drjohnsmith
Teacher
Registered: ‎07-09-2009

A great question.

   I'm looking forward to hearing the answers.

 

One question:

    you highlighted failing paths,

        and then said that the aim is to make the results more predictable.

Why would you want to predict failing paths?

 

It sounds a bit like you're trying to reverse engineer the Xilinx proprietary tools?

 

<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>
avrumw
Expert
Registered: ‎01-23-2009

A CMACE4 cannot be placed in general SLICE locations, nor in a PCIE hard macro site; a 1.8V LVDS IO will be defined and fixed to a LOC by the board designer.

All of this is true - there are different resources that can only go in certain locations, or are normally in fixed locations through things like PACKAGE_PIN constraints and associated I/O constraints.

Once those are placed, then timing constraints and floor planning will force associated and related cells to be localized in that area. 

So this isn't actually true. They are not "forced" to a location; the cost function (which for Vivado is a function of worst negative slack, wire length and congestion that the tools try to minimize) is influenced by constraints and heavily weighted to draw them toward specific locations, but all solutions are viable. Presumably most heuristics will relatively easily walk away from clearly high-cost solutions (like placing a flip-flop that is connected directly to a LOC'ed IOB on the opposite side of the die), but those solutions are still part of the search space.
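To illustrate the cost-function view, here is a hypothetical Python cost combining the three terms named above. The weights, the 0.8 congestion threshold, and the metric values are all invented for illustration; Vivado's actual cost function is not public. The far-away placement is still a legal point in the search space; it just costs more.

```python
from dataclasses import dataclass


@dataclass
class PlacementMetrics:
    wns_ns: float       # worst negative slack; negative means failing
    wirelength: float   # estimated total wire length, arbitrary units
    congestion: float   # peak routing utilization estimate, 0.0 to 1.0+


def placement_cost(m, w_timing=100.0, w_wire=0.01, w_cong=10.0, cong_limit=0.8):
    """Hypothetical weighted cost; lower is better. All weights are made up."""
    timing_penalty = max(0.0, -m.wns_ns)              # only failing slack costs
    congestion_penalty = max(0.0, m.congestion - cong_limit)
    return (w_timing * timing_penalty
            + w_wire * m.wirelength
            + w_cong * congestion_penalty)


# Flip-flop next to its LOC'ed IOB vs. on the far side of the die.
near = PlacementMetrics(wns_ns=0.1, wirelength=5000, congestion=0.6)
far = PlacementMetrics(wns_ns=-1.2, wirelength=5400, congestion=0.6)
print(placement_cost(near), placement_cost(far))
```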

And even if you reduce the search space by a billion fold, that reduces the space to 6x10^12402; it is still unimaginably large.

But really my point behind this is that the system is truly mathematically chaotic. My suspicion is that no system, be it a human mind or an AI could detect a pattern in the results because the chaos in the system precludes there being any pattern.

In fact, this is one of the things that we as humans have trouble understanding. I can't count the number of times I have heard the question/complaint from a designer that they had a design that met timing, had to make a seemingly insignificant RTL change to an unrelated (and not timing-critical) portion of the RTL, and now the design can't meet timing, and they don't understand why. As humans we are constantly looking for patterns, for cause and effect. When we encounter truly chaotic systems that have no patterns, we still grasp for them and have trouble coming to grips with the fact that none exists.

Anyway, to answer your question, I know of no such tool, and I have never heard of anyone seriously investigating one. And it's certainly not for lack of desire: this problem has been plaguing the electronics industry (and, I'm sure, many others) since the beginning, and has therefore cost companies immense amounts of money. So the monetary incentive to find better solutions is certainly there...

The implementation processes within Vivado are the culmination of years of investigation, improvement and tweaking, taking approaches from all kinds of industry and academic investigation. They are continually evolving and improving - Vivado is leaps and bounds better than ISE, and newer versions of the tools (perhaps not monotonically) generally do better than older versions. But they still have the same problem - these are still heuristics (now better heuristics) for dealing with the immense problem space.

Avrum

melkin
Explorer
Registered: ‎02-13-2012

So this isn't actually true - they are not "forced" to a location, ... 

Semantics, and perhaps a bit of linguistic ambiguity. I used "force" (present tense, not past tense) to describe an attractor, not a limiter. Like any force of nature, it can be opposed by counteracting forces. There are only a fixed number of SLICEs and routes available in a given region of the FPGA, creating a repulsive force that will not allow the entire circuit to be packed into the resources closest to the attractor. Similarly, a second attractor can be pulling on those same logic elements. The proverbial conflict between the immovable object and the irresistible force.
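The attractor/repulsion picture can be sketched as a toy greedy placer in Python (hypothetical, not how Vivado places): each cell is pulled toward its anchor slot, but a slot holds only `capacity` cells, so later cells spill outward, and a second group anchored nearby gets pushed further away.

```python
def place_near_anchors(cells, n_slots, capacity=1):
    """Toy greedy placer. cells: list of (cell_name, anchor_slot) in placement order.

    Attraction: each cell takes the free slot closest to its anchor
    (ties go to the lower slot index). Repulsion: a slot already holding
    `capacity` cells is skipped, so later cells spill outward.
    """
    load = [0] * n_slots
    placement = {}
    for cell, anchor in cells:
        for slot in sorted(range(n_slots), key=lambda s: (abs(s - anchor), s)):
            if load[slot] < capacity:
                load[slot] += 1
                placement[cell] = slot
                break
        else:
            raise ValueError(f"no free slot left for {cell}")
    return placement


# Group c is anchored at slot 5, group d at slot 6 (e.g. two constrained IOs).
cells = [("c0", 5), ("c1", 5), ("c2", 5), ("d0", 6), ("d1", 6)]
print(place_near_anchors(cells, n_slots=10))
# {'c0': 5, 'c1': 4, 'c2': 6, 'd0': 7, 'd1': 8}
```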

 

... I have never heard of anyone seriously investigating one. ... So the monetary incentive for finding better solutions to this are certainly there...

The implementation processes within Vivado are the culmination of years of investigation, improvement and tweaking, taking approaches from all kinds of industry and academic investigation. They are continually evolving and improving....

Those two statements almost seem at odds with each other.  The fact that Xilinx applies industry and academic investigations to evolve and improve their product would suggest that an investigation along these lines is a possibility.  Applied science and engineering often take years/decades/centuries to turn out practical tools once the basic research has matured. 

And I am sure there have been dead-ends or cost prohibitive hurdles that have stopped developments.  We were approached ~5 years ago by a couple of startup developers.  Their product was conceptually something similar to what I was inquiring about.  Their product took a logic design, ran Vivado with a lot of different options, then fed the results into their proprietary AI ML algorithm.  The output of that was a set of recommendations which should produce better implementation results.  The drawback for us was that 1) it was still in development, 2) it was expensive, 3) we absorbed all the risk, they did not.  I don't know if they were ever successful.  They have not returned to tell us about their advances.  Perhaps their business model failed.

Improvements and advances are not made by those that say "it cannot be done".  I am not some starry eyed young engineer seeking to revolutionize the world.  But I am not so jaded as to believe that there is zero value in such a product or project.  I believe in Deming's approach to problems -- continuous, small incremental improvements versus leaps and bounds attempts.  Maybe this will be a sideline project for me in the few years left until I retire.  I may not even finish it but hand it down to the next generation behind me.

Anyway, this has been an interesting discussion.

melkin
Explorer
Registered: ‎02-13-2012

One question:

    you highlighted failing paths,

        and then said that the aim is to make the results more predictable.

Why would you want to predict failing paths?

Predicting failing paths is not the goal. Understanding the differences between multiple approaches that led to different results (success/failure) is the goal, and then using that to steer the next efforts toward the paths more likely to lead to success.

The process is actually twofold. Understand what works well and preserve that. Understand what is not working and avoid or change that. The saying "crazy is doing the same thing over and over and expecting a different result" seems apropos. Specifically, from the avoidance perspective, it could make sense to reduce the number of placer directives being tried if they never result in a successful implementation. That frees up computation resources and indirectly saves time and cost.
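A sketch of the pruning idea in Python, assuming a history of (build, directive, met_timing) records collected from past sweeps; the directive names below are placeholders. The `min_tries` threshold is arbitrary, and whether past failures predict future ones is exactly what this thread disputes, so the output is a suggestion rather than a rule.

```python
from collections import defaultdict


def directive_stats(history):
    """history: iterable of (build_id, directive, met_timing).

    Returns {directive: (tries, wins)}.
    """
    tries, wins = defaultdict(int), defaultdict(int)
    for _build, directive, met in history:
        tries[directive] += 1
        wins[directive] += int(met)
    return {d: (tries[d], wins[d]) for d in tries}


def prune(history, min_tries=5):
    """Split directives into (keep, drop).

    A directive is dropped only after at least min_tries runs with no success;
    anything with a success, or too few tries to judge, is kept.
    """
    stats = directive_stats(history)
    keep = sorted(d for d, (t, w) in stats.items() if w > 0 or t < min_tries)
    drop = sorted(d for d in stats if d not in keep)
    return keep, drop


history = ([(b, "dirA", b % 2 == 0) for b in range(5)]
           + [(b, "dirB", False) for b in range(6)]
           + [(b, "dirC", False) for b in range(2)])
print(prune(history))  # (['dirA', 'dirC'], ['dirB'])
```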

It sounds a bit like you're trying to reverse engineer the Xilinx proprietary tools?

No.  I have no desire to reverse engineer Xilinx's tools.  The Xilinx tools are a black box and will remain so from my perspective.  But through experimentation and observations of that black box, if I can identify some approaches that work better than others, that is sufficient.  

What I am doing is trying new strategies and techniques that support my team's deliverables goals (FPGA implementations) to downstream engineering teams, and ultimately my employer's competitive edge and time-to-market goals.

drjohnsmith
Teacher
Registered: ‎07-09-2009

"What I am doing is trying new strategies and techniques that support my team's deliverables goals (FPGA implementations) to downstream engineering teams, and ultimately my employer's competitive edge and time-to-market goals."

 

That's a great goal.

  At the end of the day, we all learn tricks over time that make simulation / synthesis / implementation quicker or produce a better result.

Look at the MUX 7 in the newer parts: it seems the tools still have fun optimising them, and muxes can still be built by hand better than the tools manage, but 99.9% of the time the tools are damned good enough.

One question:

    as I say, we have all picked up tricks over the decades,

       but I am also conscious that the tools change over time, in some cases quite radically.

ISE to Vivado was a big one,

    ISE "classic" to ISE 3 was a big one,

let alone ABEL to VHDL, etc.

Even now, the current Vivado tools seem to have very different optimisation routines from the first generations;

   I know some of the things I do out of habit now have no real effect on the code produced, whereas they used to be essential.

So the question is:

   how long do you think the work will give your employer an advantage?

Out of interest,

   do you follow this?

https://www.xilinx.com/products/design-tools/ultrafast.html

 

syedz
Moderator
Registered: ‎01-16-2013

@melkin 

 

Are you aware of Report QoR Assessment (RQA)? It is a new Vivado feature that gives a score and guidance for design closure.

Also, check this forum blog: https://forums.xilinx.com/t5/Design-and-Debug-Techniques-Blog/Design-Closure-with-RQA-and-RQS/ba-p/1118594 

 

--Syed

---------------------------------------------------------------------------------------------
Kindly note- Please mark the Answer as "Accept as solution" if information provided is helpful.
Give Kudos to a post which you think is helpful and reply oriented.

Did you check our new quick reference timing closure guide (UG1292)?
---------------------------------------------------------------------------------------------
avrumw
Expert
Registered: ‎01-23-2009

Those two statements almost seem at odds with each other.  The fact that Xilinx applies industry and academic investigations to evolve and improve their product would suggest that an investigation along these lines is a possibility

I don't see them as being at odds with each other. It is one thing to look for ways of improving by tweaking the heuristics for how you do your walk - how you decide which direction to walk in, how you decide to avoid (or get kicked out of) local minima, even how to tweak your cost function (and probably many many more that I can't think of). This is an ongoing task, and probably one that will always be able to be improved.

But what you are asking for is looking at the starting point (in this case, only one starting point, the directives) and only the ending point (the fully implemented design) and looking for a pattern to emerge between these two. It is my assertion that due to the chaos in the solving heuristics, no such pattern exists. 

Avrum

 

driesd
Xilinx Employee
Registered: ‎11-28-2007

Hi @melkin,

Interesting discussion here. Although there is some truth in @avrumw's words that there is some mathematical chaos, Vivado is very much timing driven.

Obviously, before placement the imprecision of the timing estimation is a problem, but the flow is still timing driven: a path with 1 logic level will be handled very differently from one with 10 logic levels, and a register with a fanout of 5 very differently from one with a fanout of 500.

My point is: Vivado's algorithms have vastly improved over the algorithms of ISE and we developed a methodology to get the best QoR out of Vivado: the UltraFAST Design Methodology (UFDM). You can take a training course on that or you can read UG949 or one of the Quick Reference Guides UG1231 or UG1292.

The most important thing to know about Vivado is that it is timing driven and WNS driven. The timing estimation is pretty good, so the main principle of UFDM is baselining, which consists of two parts:

1) ensure your constraints are pristine (correct and complete). Review the check_timing section of the timing summary report and review the methodology report.

We did a blog series on the methodology report a while back. Take especially a look at "Using the Methodology Report Part Two: Effect of Methodology Violations on QoR"

You should not have any of the following violations:

  • TIMING-6
  • TIMING-7
  • TIMING-8
  • TIMING-14
  • TIMING-35

In 2021.1 this will become more apparent: once you have run the methodology report, these violations, the most critical ones for QoR, will also be shown in the timing summary report.
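A small Python sketch for flagging these checks, assuming you have saved the methodology report as text (for example with `report_methodology -file`) and that the check IDs appear verbatim in it. It only looks for the IDs; it does not interpret severity or violation counts, and the sample text below is a placeholder, not real report output.

```python
import re

# The methodology checks listed above as most critical for QoR.
CRITICAL_QOR_CHECKS = {"TIMING-6", "TIMING-7", "TIMING-8", "TIMING-14", "TIMING-35"}


def critical_violations(report_text):
    """Return the critical check IDs that appear in a methodology report's text."""
    found = set(re.findall(r"\bTIMING-\d+\b", report_text))
    hits = found & CRITICAL_QOR_CHECKS
    return sorted(hits, key=lambda check: int(check.split("-")[1]))


sample = """TIMING-6   <description>
TIMING-18  <description>
TIMING-14  <description>
"""
print(critical_violations(sample))  # ['TIMING-6', 'TIMING-14']
```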

2) review and close timing after every stage of Vivado and monitor regressions. Regressions at any stage point to specific issues:

  • after synthesis: here you will see timing issues due to logic levels. In addition you could run the failfast report (see UG1292) to analyze timing using net budgeting or LUT budgeting
  • after placement/physopt: here you will see timing issues due to physical issues like fanout or SLR partitioning
    • make sure you don't have any large hold timing failures (> 0.5ns WHS) before proceeding to route_design
    • ensure you don't have any setup failures as router will only...route
  • during route design timing may regress due to routing congestion or hold time fixing
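Stage-by-stage baselining can be automated with a hypothetical Python checker like the one below. It assumes you have already collected setup WNS and hold WHS (in ns) after each stage, for example from `report_timing_summary` output; the stage names and thresholds follow the bullets above, with WHS worse than -0.5 ns treated as a large hold failure.

```python
STAGES = ("synth", "place", "phys_opt", "route")


def review_stages(metrics, hold_limit_ns=-0.5):
    """metrics: {stage: (wns_ns, whs_ns)}. Returns human-readable warnings."""
    warnings = []
    prev_wns = None
    for stage in STAGES:
        if stage not in metrics:
            continue
        wns, whs = metrics[stage]
        # Any drop in WNS from the previous stage is a regression to investigate.
        if prev_wns is not None and wns < prev_wns:
            warnings.append(f"{stage}: WNS regressed {prev_wns:+.3f} -> {wns:+.3f} ns")
        if stage == "synth" and wns < 0:
            warnings.append(f"synth: WNS {wns:+.3f} ns, check logic levels")
        # Before route_design: no large hold failures and no setup failures.
        if stage in ("place", "phys_opt"):
            if whs < hold_limit_ns:
                warnings.append(f"{stage}: WHS {whs:+.3f} ns, fix large hold failures before route_design")
            if wns < 0:
                warnings.append(f"{stage}: setup failing (WNS {wns:+.3f} ns) before route_design")
        prev_wns = wns
    return warnings


metrics = {"synth": (0.20, 0.05), "place": (-0.05, -0.70),
           "phys_opt": (0.01, -0.10), "route": (-0.02, 0.01)}
for warning in review_stages(metrics):
    print(warning)
```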

Follow this methodology, because simply comparing implemented designs will not show you the real problems. If you don't tackle the real problem, like a logic level issue after synthesis or an SLR crossing issue after placement, Vivado will focus on those paths and won't spend enough effort on other paths that it could otherwise have solved.

 

I understand this is maybe not the exact answer you were looking for, but this should guide you to the correct methodology for best QoR of your design with Vivado.

 

Best regards

Dries

--------------------------------------------------------------------------------------------------------------------
Please mark the Answer as "Accept as solution" if the information provided is helpful.

Give Kudos to a post which you think is helpful and reply oriented by clicking the star next to the post.