02-22-2018 04:10 PM - edited 02-22-2018 04:50 PM
Could someone give me a reality check on how fast the Vivado tool chain is expected to run? I mean the entire Synthesis, Implementation, Generate Bitstream sequence.
Here's what I get on a simple "hello world" design involving a single top.v with a simple 40-bit counter, sending 8 outputs to pins, no additional IP, targeting a CMod A7-35. I am using pretty much out-of-the-box settings in Vivado, so far as I am aware.
If I make a minor change to top.v, then click on "Generate Bitstream", this invokes the entire chain (no surprise), which takes a total of over 3 minutes to run, using 100% of one CPU, and sometimes up to an additional 50% of another.
Windows 7, on an AMD Phenom 6-core CPU with 8 GB of RAM. There does not seem to be undue memory paging, and not all that much disk activity in general, so this seems to be CPU bound.
Is this normal? Or am I unknowingly asking it to go particularly slow somehow?
02-22-2018 04:32 PM
@gwideman Before I answer, let me ask a couple questions.
Are you coming from a software development background? Are you comparing Vivado with say, MS Visual Studio?
02-22-2018 04:47 PM
The short answer is: I am comparing it to the clock.
If you are asking whether I have experience with other environments that have a build process, yes I do. I don't know how this would be relevant, since I don't expect a build process that produces an "X" to have any relationship to one that produces a "Y".
I do expect that development iterations should be as fast as possible, because waiting costs time, attention and salary. As a consequence, I would expect that if a build process doesn't have to do very much (surely producing a bitstream for a counter is pretty easy), then it should run fast.
Hence my motivation to find out whether a baseline build time of over 3 minutes is normal, or have I missed something?
02-22-2018 05:03 PM
A build time for any modern FPGA of 3 minutes would be considered extraordinarily fast. Even for a "hello world" type application.
Most FPGA build times are measured in hours, in *rare cases* tens of minutes. The level of integration of even the smallest available FPGAs is still quite large.
02-22-2018 05:17 PM
@gwideman This is such a broad question. But here is one point I can offer before others chip in with more details: VS/gcc compilation is a linear process. FPGA synthesis/implementation is an optimization process (in the sense of mathematical optimization). That is the fundamental difference. It is so that two runs with the same inputs can have different results (non deterministic), while with VS/gcc you are guaranteed to have the same binary checksum in the end.
But yes, all "FPGA compilers" take quite a long time to finish. Targeting smaller devices speeds things up considerably. That is why I love my Zybo and Zedboards for rapid prototyping - results are typically 10 times faster than targeting a larger device like the MPSoCs. It is not abnormal for a design to take several hours to finish. This is the same with any vendor, by the way.
With experience, you will see there are a few things you can do about it, among them:
- using the RuntimeOptimized directive
- using incremental compilation
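To make those two concrete, here is a hedged sketch of how both can be set from the Vivado Tcl console. The run names `synth_1`/`impl_1` are the Vivado defaults (adjust to your project), and the checkpoint path is a placeholder you would point at the routed `.dcp` written by a previous successful implementation run:

```tcl
# Faster synthesis: the RuntimeOptimized directive, applied via its strategy
set_property strategy Flow_RuntimeOptimized [get_runs synth_1]

# Faster implementation: Flow_RuntimeOptimized (or Flow_Quick, which skips
# timing-driven optimization entirely - fine for quick sanity builds only)
set_property strategy Flow_RuntimeOptimized [get_runs impl_1]

# Incremental compile: point the run at a reference checkpoint (.dcp) from a
# previous successful implementation; by default Vivado writes one under
# <project>.runs/impl_1/ (path below is a placeholder)
set_property incremental_checkpoint ./prev_routed.dcp [get_runs impl_1]
```

The same settings are reachable in the GUI via the run's Settings dialog; the Tcl form is just easier to show in a post.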
02-22-2018 05:31 PM
hbucher: I appreciate your taking the time to answer. And yes, I understand that FPGA optimization is an iterative process, though I would assume at this late date it's hardly a naive optimize-from-scratch process, and must have some heuristics to give it a sensible start.
I am very interested to try the various options that might speed things up. Sadly, though, I have already seen the Vivado implementation doc and discussions of the Flow_RuntimeOptimized and Flow_Quick flows, which seem promising, but I didn't find a description of how to actually invoke them in the Vivado UI.
So if you could fill in that blank, that would be a great leap forward.
The same goes for Incremental Compilation. That sounds promising too, but when I follow the instructions at the link you provided, I get to the step where it wants me to "Browse to the reference DCP file as the incremental compile checkpoint", and that clearly depends on some other step that hasn't been explained. (And the file dialog pops up, and I have no idea where to browse to in order to find a suitable file.)
Again, filling in that blank would be lovely!
Thanks -- Graham
02-22-2018 05:37 PM
Actually, MPSoC designs can take comparatively less time to implement if you baseline against die size. The MPSoC cores represent a larger portion of the design that's non-malleable. It's fixed, so there's little to "work on", if you will.
Some hard numbers for Graham.
One of our ZU7 MPSoC designs, relatively empty, with easy timing goals: ~1 hour implementation.
A moderately full KU040 design: ~5 hours.
Another moderately full KU040 design, but with harder timing goals: ~8 hours.
I just saw your new post inquiring about incremental flows and other similar strategies to speed things up. Those incremental flows are NOT aimed at helping designs that take tens of minutes or less to build. They are designed to help even larger designs than mine above - trying to decrease tens of HOURS of build time to just a few hours.
Just trying to level-set your expectations.
02-22-2018 05:59 PM
@gwideman As any genXer would know, there is a YouTube video for that
Strategies - see item (e) page 25 on UG904
02-22-2018 06:48 PM
So just a couple of points...
First, 3 minutes is nothing. You have to understand that Vivado is designed to handle huge designs - the stated goal is 10+ million LUTs. To be able to process designs of that size in reasonable time, the databases that Vivado uses have to be optimized to support the (I was going to say billions, but it's way larger than that) optimizations that need to be done to complete a reasonably full, reasonably large FPGA. The format of these databases is critical...
The time spent creating these databases and transferring them from process to process is insignificant compared to the overall run time on a large design. However, on a tiny design, it dominates - almost certainly the vast majority of the 3 minutes you are seeing is this maintenance, with virtually none of it being the actual synthesis or implementation. And that is even though the synthesis/implementation process does start completely from scratch on each run (there is no caching or reuse done on designs).
Furthermore, I suspect that you are running in project mode. The overhead of project mode - again, not significant on a reasonably sized design - is dominating on your small design. In project mode, the design is written to and read from disk more than once, and the constraints are processed a couple of times - all of this is overhead. If you were to do the same design in non-project mode, it would probably be measurably faster.
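For a sense of what non-project mode looks like, here is a minimal sketch for a design like the one described in this thread. File names are placeholders; the part string is the Artix-7 on the Cmod A7-35, and you would run it with `vivado -mode batch -source build.tcl`:

```tcl
# Minimal non-project flow sketch - no project database is created,
# so the per-step disk round-trips of project mode are avoided.
read_verilog top.v
read_xdc top.xdc
synth_design -top top -part xc7a35tcpg236-1  ;# Artix-7 on the Cmod A7-35
opt_design
place_design
route_design
write_bitstream -force top.bit
```

Note this trades away the GUI's run management and reporting conveniences; it is mainly attractive for scripted or iterative builds.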
But again, this whole discussion is meaningless - the speed of implementation of a tiny design is a don't-care to most people. What they care about is the speed of large designs - on these Vivado performs very well - substantially faster and more predictably than the previous tool (ISE). But even with that, and even with very fast machines with lots of memory, the complete implementation process for large FPGAs (particularly ones with tough timing requirements) is measured in hours - sometimes even a good number of them.
One last point:
It is so that two runs with the same inputs can have different results (non deterministic)
This is not strictly true. In fact, this would be considered a bug in the tools. The correct term for the implementation tools is "chaotic", not "non-deterministic". While chaotic sounds worse, it isn't...
The mathematical definition of "chaotic" is that the process is extremely sensitive to the initial conditions. Stated differently, any change, no matter how minor or how insignificant it seems, can result in a drastically different implementation. This is true of the implementation process - we often hear the complaint "My design was meeting timing, and then I made one tiny change to fix a bug in a totally non-critical part of the design, and now it doesn't meet timing anymore" - that is the chaotic nature of the system.
But as chaotic as it is, it is deterministic. What that means is that starting with exactly the same input conditions (i.e. all the same RTL files, the same tool options, the same constraints...), you will get exactly the same result. And Vivado is deterministic (or at least it is supposed to be!).
02-22-2018 06:53 PM
@avrumw Theoretically correct. Vivado would be deterministic IF you could control for EVERY variable. However, in practice you hardly can.
For example, how can you possibly control for cache effects on an Intel platform? Put another process running, or even a system service, and you get enough entropy to affect the Vivado result.
02-22-2018 07:09 PM
For example, how can you possibly control for cache effects on an Intel platform?
Things like this are not supposed to affect the determinism of Vivado. The heuristics used to perform the process (i.e. the code that is the Vivado synthesis and implementation processes) are deterministic. If "extra-algorithm" effects like these (what else is running on the machine, the cache state) could affect an algorithm running in a process, NO code running on a machine would be deterministic.
So, if you use
- the same RTL code (with absolutely no change at all)
- the same constraints (with absolutely no change at all)
- the same script to control the run (from scratch - starting by creating a clean project, and assuming the script doesn't do any seeding of the run, like changing a version number or a date code)
- the same version of the tool
you should (I would argue must) end up with exactly the same bitstream.
It is even supposed to be true if you run these same conditions on a different machine, although I have heard that the Windows and Linux versions of the tools don't generate bit exact results with each other (even though it was my understanding that they should).
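For anyone wanting to check this on their own design, a hedged sketch of a byte-for-byte comparison of the outputs of two clean runs (paths are hypothetical; runs in plain tclsh). One caveat to be aware of: the .bit file header can embed generation metadata such as a timestamp, so a raw checksum mismatch does not by itself prove the implementation differed - comparing placement/routing reports or checkpoints is a more robust cross-check:

```tcl
# Compare two bitstream files byte for byte.
proc same_bits {a b} {
    set data {}
    foreach f [list $a $b] {
        set ch [open $f rb]      ;# binary mode - no newline translation
        lappend data [read $ch]
        close $ch
    }
    return [expr {[lindex $data 0] eq [lindex $data 1]}]
}

# Hypothetical output directories from two from-scratch runs
puts [expr {[same_bits runA/top.bit runB/top.bit] ? "identical" : "DIFFER"}]
```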
02-22-2018 07:29 PM
@avrumw Would you say then if a rogue process (some user's crontab) kicks in and takes some cpu, the result is likely to be different?
I ask this because I have seen this question come up over and over from people frustrated that they run continuous integration tools and want to rely on checksums to sign off on their "best practices". This is a big deal in banks, for example, with lots of regulation and scrutiny from everyone, especially after the 2008 mess. Invariably I see them rant and vent, with Xilinx employees trying to help, and the thread gets left hanging without conclusion.
I think people have to be told and understand that to be reproducible, the runs have to happen in a nearly-perfect environment, which is very hard to achieve.
02-22-2018 07:41 PM
Would you say then if a rogue process (some user's crontab) kicks in and takes some cpu, the result is likely to be different?
My understanding is no - this is not supposed to make a difference.
But I have no mechanism to state this "authoritatively". I have done some training material development where a lab was run multiple times by multiple people on multiple computers with exactly the same inputs, and I observed exactly the same outputs - the same routing resources, the same placement, exactly the same timing. So I know that you can get exactly (or apparently exactly) the same results from the same inputs. But I cannot extend this observation to "all designs everywhere" (and by design these were small and uncomplicated designs).
But again, it was my understanding that one of the stated goals of the tools is determinism...
Maybe someone like @austin can get an "official" answer on this one...
02-22-2018 07:56 PM
@avrumw Here's a thread left on a dissonant note
02-22-2018 09:53 PM
Thanks @hbucher for the additional pointers to video and docs on Incremental Compilation and Strategies. That was helpful.
I have tried creating and configuring a run using the Flow_RuntimeOptimized synthesis strategy and the Flow_Quick implementation strategy, and that sped the entire run from about 3:10 to 2:35.
02-23-2018 07:59 AM
Yes, determinism is quite important. For example, the safety-critical systems business requires that the tools also meet the requirements involved in designing, creating, and maintaining a safety-critical system. Getting the tools certified is far easier if they produce a consistent result given the same inputs. As the tools and device(s) are certified as an IEC 61508 Safety Element out of Context (SEooC) for an example design, the process of getting your actual system certified is simplified. The final system must still go through the certification process, but you will not have to prove the device or the tools are safe - we have already provided evidence of that. You get to concentrate on how your design is safe.
02-23-2018 08:23 AM
So, to be clear, am I correct in asserting that the Vivado tool set is deterministic, even in the presence of "outside" interference on the computer like what @hbucher is describing (i.e. processor context switching, processor cache state)?
Given this, do you have any ideas about the observations highlighted in the thread linked by @hbucher, where a customer seems to be describing some non-determinism?
02-23-2018 09:06 AM
Only those tools certified would be under consideration (i.e. a specific 201x.y release) for a certified solution. I believe we have one so far; I cannot recall which one. All later versions will likely work too (such as 2017.4).
Building for Zynq involves much more than 'just' a bitstream. Depending on the OS chosen, it might also contain many open source dependencies, which, if not locked down to specific (hash) versions, will lead to differences.
Vivado for the synthesis, implementation, bitstream is all that I am speaking to.
A certified OS for safety critical is its own exercise.
02-23-2018 10:38 AM
@austin OK, so I was not aware that Vivado actually goes to lengths to ensure that determinism is enforced. That is actually very good to know.