08-26-2018 10:15 AM
I see much longer elapsed times for Vivado 2018.2 compared to 2017.2 on a Linux system. I use Ubuntu 16.04 LTS, thus a supported system. I have cross-checked on a Debian Wheezy system, with very comparable trends.
The issue is visible in synthesis, implementation and bitstream generation, so I simply post it in the synthesis section of the forum.
It's most pronounced for small designs. The elapsed times shown in the GUI are

           2017.2   2018.2
  synth_1    0:51     1:31
  impl_1     1:04     2:57

so both the synthesis step and the implementation step are much slower in 2018.2 than they are in 2017.2.
Two example projects, with identical sources, ran on the same CPU, are attached.
Usually I use a fully scripted implementation flow. The elapsed times for a somewhat larger design, still small by today's standards, for synthesis only, for synthesis and implementation, and for full synthesis+implementation+bitgen are

                    2017.2    2018.2
  synth            1m08.6s   2m16.6s
  synth+impl       3m10.6s   6m56.1s
  synth+impl+bit   3m50.1s   9m16.1s
From this it's clear that all three steps are substantially slower.
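To put numbers on "substantially slower", here is a minimal Python sketch that converts the elapsed times quoted above to seconds and prints the 2018.2/2017.2 ratio per flow. The helper names `to_seconds` and `slowdown` are made up for this illustration and are not part of any Vivado tooling.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a 'time'-style stamp such as '3m50.1s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized elapsed time: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def slowdown(old: str, new: str) -> float:
    """Ratio of the new elapsed time to the old one."""
    return to_seconds(new) / to_seconds(old)


# Elapsed times quoted in the post, as (2017.2, 2018.2) pairs.
runs = {
    "synth": ("1m08.6s", "2m16.6s"),
    "synth+impl": ("3m10.6s", "6m56.1s"),
    "synth+impl+bit": ("3m50.1s", "9m16.1s"),
}

for step, (old, new) in runs.items():
    print(f"{step:15s} {slowdown(old, new):.2f}x")
```

With the numbers above this gives roughly 2.0x for synthesis alone, 2.2x for synth+impl and 2.4x for the full flow.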
When looking at a CPU monitor (like good old xosview) one immediately sees that for 2018.2 runs vivado uses a lot of system state time. In certain phases, which last for half a minute, one sees 75% system time. This is not observed for 2017.2. Needless to say that tests were done on an otherwise idle machine, and that no significant paging was visible.
I used pidstat to trace user and system state time and got for full synth+impl+bit runs
  very small design
    2017.1:  usr: 136.07  sys:   6.36  tot:  142.43
    2018.2:  usr: 180.85  sys: 190.21  tot:  371.06
  small design
    2017.1:  usr: 492.83  sys:  19.99  tot:  512.82
    2018.2:  usr: 853.77  sys: 224.00  tot: 1077.77
It is clearly visible that usr time increases moderately (by a factor of about 1.3 to 1.7), while sys time explodes by more than an order of magnitude.
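For anyone who wants to cross-check such totals without pidstat, here is a minimal Python sketch of an alternative. It is not the method used above: it reads the accumulated user and system CPU time of a child process through the standard `resource` module (Unix only). The command shown is a stand-in; a Vivado batch run such as `vivado -mode batch -source build.tcl` would take its place, where `build.tcl` is a hypothetical script name.

```python
import resource
import subprocess
import sys


def cpu_times(cmd):
    """Run cmd to completion and return its (usr, sys) CPU seconds.

    RUSAGE_CHILDREN only covers children that have terminated and been
    waited for, so helper processes that are never reaped are not counted.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


# Stand-in command (an assumption for this example); replace it with the
# real build command to measure an actual run.
usr_time, sys_time = cpu_times([sys.executable, "-c", "sum(range(10**6))"])
print(f"usr: {usr_time:.2f} sys: {sys_time:.2f} tot: {usr_time + sys_time:.2f}")
```

Unlike the pidstat traces, this only gives totals for the whole run, so it cannot show in which phase the system time is spent.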
Last but not least, I checked in which phases Vivado 2018.2 behaves differently than Vivado 2017.2:
  --- synth step ---
  INFO: [Device 21-403] Loading part xc7a100tcsg324-1
      --> 75-80% system time
  --- impl step ---
  INFO: [Device 21-403] Loading part xc7a100tcsg324-1
      --> 75-80% system time
  Starting Cache Timing Information Task
      --> 55-80% system time
  report writing
  INFO: [Device 21-403] Loading part xc7a100tcsg324-1
      --> 75-80% system time
  INFO: [Timing 38-478] Restoring timing data from binary archive.
      --> 75-80% system time
  --- bitgen ---
  INFO: [Device 21-403] Loading part xc7a100tcsg324-1
      --> 75-80% system time
  INFO: [Timing 38-478] Restoring timing data from binary archive.
      --> 75-80% system time
  INFO: [IP_Flow 19-2313] Loaded Vivado IP repository 'Vivado/2018.2/data/ip'.
      --> is simply idle for quite some time
From all the above the bottom line is
1. it seems that a very inefficient I/O library is used
     INFO: [Device 21-403] Loading part ...
     INFO: [Timing 38-478] Restoring timing data from binary archive
2. it seems that vivado waits for something when loading the IP repository
The effects I've seen are quite drastic, and reproducible on two Linux systems (one being a supported Ubuntu 16.04 LTS). So I wonder whether others have seen similar issues.
08-26-2018 11:06 AM
That's a fantastic set of tests, well done.
Must admit I see small designs taking longer too,
whilst large designs seem to be faster,
but that is all it was, a feeling.
I'm looking forward to some Xilinx input on this.
09-01-2018 08:18 AM
Did you see this under Unix/Linux (if yes, which flavor), or under Windows?
It would be very interesting to understand whether this is a Unix/Linux integration issue or a general one.
09-01-2018 08:35 AM
The different companies I work for have various versions of Windows and Linux.
And as I said, it's a feeling.
Generally, for Vivado, Linux is faster than Windows, and Windows 7 was faster than 10,
but the original tests, which were not done by me, are very interesting.
09-01-2018 09:11 AM
I've done a little bit more research on this issue. I've tested two designs
  snhumanio   ~190 LUT;  ~160 Flops;  ~1% slice of 7a35 (Basys3)
  w11a       ~5700 LUT; ~2400 Flops; ~20% slice of 7a35 (Basys3)
so one 'null-design' and one small design, both for a Digilent Basys3 board on two systems
  sys1   Ubuntu 16.04 LTS; dual core
  sys2   Debian 7; XEON quad core with hyperthreading
which both have Vivado versions from 2016.4 up to 2018.2 installed, and get as elapsed times for my scripted build flow (syn+imp+bit)
        -- snhumanio --     ----- w11a -----
           sys1     sys2       sys1      sys2
2016.4  3m44.2s  2m34.2s    8m00.0s   5m00.5s
2017.1  3m28.5s  2m19.6s    8m00.6s   4m50.9s
2017.2  3m43.6s      n/a    8m35.9s       n/a
2017.3      n/a  6m12.1s        n/a   9m09.1s
2017.4  9m36.5s  7m17.7s   15m14.7s  10m10.1s
2018.1  9m30.4s  7m47.8s   14m08.9s  10m20.5s
2018.2  8m55.2s  7m19.3s   13m47.6s  10m01.8s
The snhumanio design essentially measures general setup overhead, because there is little to compile or route.
From this it is apparent that
As stated before, watching a process monitor, like xosview, shows for 2018.2
I used pidstat to trace this, and visualized the result with gnuplot. Four pictures are attached, which show the CPU utilization (in %) over time; red is system state time, green is user state time.
The transitions from synth to impl to bitgen phases are nicely visible.
It is striking that 2018.2 has extended times with about 75% system time, while 2017.1 shows only a very moderate system time fraction. The data for the pictures were taken on sys2, on an otherwise idle system, back-to-back in one session.
It would be nice to hear from others whether they see similar effects. Some feedback from Xilinx is of course also much appreciated.
09-01-2018 09:42 AM
I am currently checking your test case with Vivado 2018.3 internally, and I will get back to you with the outcome. If the issue still persists, we need to send it to the factory.
09-03-2018 02:54 AM
I have filed a CR (Change Request) on this issue. The factory will look into it and may make the necessary changes in upcoming versions of Vivado.
09-03-2018 12:23 PM
09-03-2018 09:00 PM
Yes, 2018.3 and 2018.2 showed the same results, with high run times compared to 2017.2. Hence the CR.
09-05-2018 10:02 AM
Thanks for the statement. From this I conclude that the essentials of my observations could be reproduced. I'll flag this as 'solved' when I see a Vivado version which restores the pre-2017.3 performance.
Thanks again, Walter
09-24-2018 02:08 AM
Not surprising. Software has become a code-and-stuff pile-up exercise. It has reached that point where faster clocks (not much faster nowadays) cannot cope with the global software overweight. Time for GPU makers making money until the next bottleneck. Look at android apps... how many can you stuff in a GB? They are lean. It's the opposite for PC apps: how many GB takes an app... Anyone sees why?
01-03-2019 06:21 AM
I've re-tested the cases described on 2018-09-01 with Vivado 2018.3. The measured execution times for a syn+imp+bit run were

        -- snhumanio --     ----- w11a -----
           sys1     sys2       sys1      sys2
2017.1  3m28.5s  2m19.6s    8m00.6s   4m50.9s
2017.2  3m43.6s      n/a    8m35.9s       n/a
2018.2  8m55.2s  7m19.3s   13m47.6s  10m01.8s
2018.3  4m32.2s  3m44.5s   12m01.7s   8m20.2s
Vivado 2018.3 is faster than 2018.2, but still significantly slower than 2017.1 or 2017.2. I've also repeated the pidstat traces. Vivado 2018.3 no longer shows the excessive amount of system time seen in 2018.2, see appended figures. However, there is still a hard-to-explain idle time of several tens of seconds at the end of the bitgen phase. This was also observed for 2018.2, but not for 2017.1.
01-03-2019 11:59 AM
01-03-2019 01:05 PM
In the tests I did on one of my projects, I saw a substantial improvement with 2018.3 in both synthesis and implementation. Of course, your mileage may vary.