11-24-2010 01:37 PM
I'm building the same project on a Windows and Linux machine, and get different results.
The XST differences are in the number of warnings and logic utilization.
The difference is less than 1%, but still I'd like to know why.
Another problem is that the build on Linux doesn't meet timing, whereas the one on Windows does.
The tools I'm using are 12.1 - M.53d
Linux 64-bit server, Windows XP
The .xise, .prj, and .xst files are identical
11-24-2010 11:06 PM
XST should produce the same results, since the optimization process is deterministic.
Have you absolutely made sure that your settings and inputs are identical, e.g. by using a TCL script (autogenerated with all properties set)?
If it still is the way you described, I'd also like to hear some comment from Xilinx on this.
For the implementation part, even the speed of your PC matters.
Mainly because P&R uses heuristic algorithms, and there are apparently timeouts or speed-dependent iteration depths to get acceptable results on slow machines within moderate computing time.
Faster computers give better results, and so the OS may have an impact too.
Have a nice synthesis
11-25-2010 10:58 AM
Yes, the project settings are identical.
Why would PC speed matter for the implementation part? Are you saying that a build on the same PC will produce different results depending on the CPU load?
# Faster computers give better results
That's not what I'm seeing. The faster computer (the Linux server vs. the Win XP machine, 12.x tools) gave worse results in terms of timing closure. That's consistent across several projects (200-300 MHz clocks, high logic utilization) that use mid-range Virtex-5 chips.
11-25-2010 11:59 PM
I never mentioned CPU load.
I only said that the OS may have an impact. But there I had in mind different system-dependent libraries that may be used to gauge system performance, or that are used in some other way by some of the Xilinx PAR algorithms.
The PAR algorithms used by Xilinx are not deterministic; that's why two PAR runs on the same design rarely produce identical results. Of course the number of iterations in the algorithms could be held constant, but that would cause extremely long runtimes on slower machines, which annoys customers - at least those who can't afford buying the latest and greatest PC. So some derating mechanism makes sense, producing an acceptable (but not the best possible) result within a moderate runtime on slower machines.
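The derating mechanism described above can be sketched as an improvement loop bounded by wall-clock time rather than by iteration count. This is purely a speculative illustration of the idea (not Xilinx's actual implementation; all names here are made up):

```python
import random
import time

def improve_until_deadline(cost_fn, neighbor_fn, state, budget_s, seed=0):
    """Greedy local search that stops when the wall-clock budget expires.

    Because the stop condition is elapsed time rather than iteration
    count, a faster machine evaluates more candidates within the same
    budget -- and can therefore land on a different (often better) result.
    """
    rng = random.Random(seed)
    best, best_cost = state, cost_fn(state)
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        cand = neighbor_fn(best, rng)
        cand_cost = cost_fn(cand)
        if cand_cost < best_cost:   # keep only improvements
            best, best_cost = cand, cand_cost
    return best, best_cost

# Toy "placement": minimize the distance of a point from the origin.
start = (10.0, 10.0)
cost = lambda p: p[0] ** 2 + p[1] ** 2
step = lambda p, rng: (p[0] + rng.uniform(-1, 1), p[1] + rng.uniform(-1, 1))

_, final_cost = improve_until_deadline(cost, step, start, budget_s=0.1)
# final_cost never exceeds the starting cost, but its exact value
# depends on how many iterations the machine completed in 0.1 s.
```

With the same seed but different CPU speeds, the loop body runs a different number of times, which is exactly the speed-dependent behavior discussed above.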
Of course these conclusions are just based on the observations described below.
My statement that faster computers give better results is based on an observation we made in our lab.
We started PAR with the same data and tool on a number of computers ranging from a 500 MHz Pentium III to 2.2 GHz Athlons, all running the same Linux OS. The tool was provided by the server (so it too was always the same), running on the clients.
The faster machines consistently produced better results concerning the timing than the slower ones.
Of course this was seen on some older ISE (probably some 10.x version).
This could have been changed, but I doubt that.
Do the computers you used for your tests also span such a wide performance range (up to 4x in speed)?
If not, your observations may be caused by some other factors. (OS, memory etc.)
11-26-2010 10:28 AM
We have WinXP / Win7 laptops with 4GB memory and 2 GHz dual core CPUs.
Linux build servers have 8 cores and 64 GB of memory.
I haven't seen a strong correlation between speed and timing results. What I've seen is that the results are different.
In fact, the projects I've mentioned have worse timing results on the faster machines. Build time is more than 2x faster on the build servers than on the laptops. This is with ISE 12.x.
03-11-2012 04:44 PM
I have been told by the Xilinx webcase team that this is a known bug (different build results on Windows and Linux machines) and that it is fixed in ISE 13.3. But there are no public release notes available for this.
05-23-2013 09:17 AM
There is no such thing as a "deterministic" algorithm - only shades of gray. If you control all possible variables, for example all the inputs to a system such as RTL and parameters, scripts, timing and physical constraints, the executable, the hardware machine, the OS version and the memory available to it, and limit multi-threading and all other processes running concurrently on the machine, you can approach a "repeatable" result from run to run - but this is not quite the same as "deterministic." You have to understand that scheduling effects, particularly when you inject parallelism and multiple threads may complete at different times relative to one another, affect the outcome of any computational process. We have made great strides with Vivado to remove the most common and obvious sources of non-repeatability - but there should never be any expectation of perfection.
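A concrete reason thread-completion order matters: floating-point addition is not associative, so if parallel workers accumulate partial results in whichever order they happen to finish, the totals can differ from run to run. A minimal illustration:

```python
# The same three numbers, summed in two different groupings,
# as might happen when parallel workers finish in different orders:
a = (0.1 + 0.2) + 0.3   # one worker finishes 0.1 + 0.2 first
b = 0.1 + (0.2 + 0.3)   # another ordering of the same reduction

print(a == b)   # False: 0.6000000000000001 vs 0.6
```

A timing or cost metric accumulated this way can therefore come out slightly different on each run, even with identical inputs.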
It is true that if you hold all things the same and get different results run over run, we consider that a bug - but how often is that useful, or even possible? Some of our customers need this, and go to great lengths to ensure it with methodology and expensive hardware enclaves. But for the majority, your energy is far better spent ensuring that the design is robust and reasonable enough to implement that you get repeatable results. The further upstream you work, the more repeatably and more quickly your designs will close. This means architecting your RTL and choosing the most appropriate IP and parameters to achieve your desired specifications. It means providing complete and reasonable constraints, in particular considering clock domain crossings and I/O. It means overconstraining only where necessary, with minimal constraints. And it means debugging and iterating as far upstream as possible, in synthesis or earlier, to give the design the best chance in back-end implementation.
Finally, you can do all these things and still have design closure issues. The closer you are to the upper limits of fabric performance and utilization of physical resources, the higher your chances of having closure and repeatability issues - bugs in the tools notwithstanding. This is true not just of Xilinx FPGA methodology, but of our competitors' PLD flows, ASICs, and full-custom circuits as well.
10-27-2014 11:10 AM
There is an answer record, AR# 23904, that on the contrary implies the implementation process is deterministic and that any occurrence of random behavior is considered a bug or a developer's fault. Is this AR obsolete, or how should I interpret it?
I would like to hear more details on the sources of non-determinism in these algorithms. I understand that, according to many sources such as wp361 - Maintaining Repeatable Results, we have to focus on writing clean HDL and on synthesis. But I would still like a more detailed insight into the topic, for peace of mind. Some coworkers have blamed the tools for their problems before, and since we sometimes work on QoR projects, I would like to be able to trust the tools to do their job.
By running a few small designs repeatedly under the same conditions, I tried to reproduce the problem, but was not successful; the checksums were always the same. Do differences in results occur only rarely, or only with big designs? I used ISE 14.1.
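For what it's worth, the repeated-run checksum test described above is easy to automate. Here is a minimal sketch; the `echo` stand-in is just a placeholder, and with ISE you would invoke your real build step and hash the generated output files (e.g. the .ncd produced by par) rather than stdout:

```python
import hashlib
import subprocess

def run_digest(cmd):
    """Run a command and return the SHA-256 digest of its stdout."""
    out = subprocess.run(cmd, capture_output=True, check=True).stdout
    return hashlib.sha256(out).hexdigest()

def is_repeatable(cmd, runs=5):
    """True if every run of `cmd` produces byte-identical output."""
    return len({run_digest(cmd) for _ in range(runs)}) == 1

# Placeholder command; substitute your actual flow and hash its output
# files instead of stdout when checking a real ISE build.
print(is_repeatable(["echo", "hello"]))   # deterministic command -> True
```

Any difference in even one run collapses the set to more than one digest, flagging the flow as non-repeatable under those conditions.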
I look forward to receiving a reply.