mmatusov
Voyager
1,654 Views
Registered: ‎02-17-2009

Multithreading in Vivado

I know there have been a number of discussions on this topic, but I was wondering if someone could summarize the current state of affairs. I have a big design in an RFSoC, which is mostly a BD with lots of IP cores. I have a fast machine with 48 CPU cores available, but it seems that Vivado doesn't make good use of them, especially during preparation for OOC runs, whatever this means. I set general.maxThreads to 32 (the maximum accepted) but still see only 4 or 5 CPU cores being used. This phase takes many hours for my design after every small change in the BD.

21 Replies
joancab
Teacher
1,618 Views
Registered: ‎05-11-2015

Threads are a software division that doesn't necessarily translate to cores. The "multi-core adventure" that started when they couldn't turn up the clocks any more didn't work well because it needed extra investment from software manufacturers, which they didn't make. So that's the sad reality: many cores, a few in use, and lots of people believing they have something powerful, with just an educated minority spotting the trick.

mmatusov
Voyager
1,606 Views
Registered: ‎02-17-2009

@joancab - Have you just called me uneducated? I understand that some parts of the software don't lend themselves well to multi-threading and/or using multiple cores, but preparing OOC runs seems to go through the sub-designs sequentially, and I am still hoping that it's me doing something wrong.

markcurry
Scholar
1,605 Views
Registered: ‎09-16-2009

> The "multi-core adventure" that started when they couldn't turn up the clocks any more didn't work well because it needed extra investment from software manufacturers, which they didn't make.

To be fair, only certain kinds of problems can be helped by parallel threads (Amdahl's law). It's not just a case of "software manufacturers" (i.e. Xilinx) not doing something. It's just that only a few parts of the build process can actually benefit from using multiple cores. Last I remember, place, route and timing can make use of 8 cores, synthesis only 2.
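
For reference, Amdahl's law can be written as below, where p is the fraction of the runtime that can be parallelized and N is the number of cores. The value p = 0.5 in the worked example is purely illustrative and is not a measurement of any Vivado step.

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}, \qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}

% Illustrative only: p = 0.5
S(8) = \frac{1}{0.5 + 0.5/8} = \frac{1}{0.5625} \approx 1.78, \qquad
S(48) = \frac{1}{0.5 + 0.5/48} \approx 1.96
```

Under that assumed p, going from 8 to 48 cores would buy almost nothing, since the speedup can never exceed 2.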

If you can figure out a better algorithm to "parallelize" the FPGA build process, you'll be a rich person.

drjohnsmith
Teacher
1,567 Views
Registered: ‎07-09-2009

Totally agree with the above.

The tools are inherently single-core; some bits can use two cores, but that's about it.

An interesting aside: I haven't done this for a few years, but it might be worthwhile to try a virtual machine. In the last experiments we did, a 16-core processor could "simulate" the 2-core processor faster than the real processor! The monitor software we had showed that more cores were being thrashed than when we had no VM, and the results came out quicker.

Then if you use Linux on the VM, things get even quicker.

<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>
mmatusov
Voyager
1,556 Views
Registered: ‎02-17-2009

> If you can figure out a better algorithm to "parallelize" the FPGA build process, you'll be a rich person.

I think what I am asking at the moment doesn't require much magic. As I've said, the "preparing OOC runs" phase should be able to take advantage of at least as many cores as there are separate sub-designs.

mmatusov
Voyager
1,554 Views
Registered: ‎02-17-2009

@drjohnsmith - That's interesting about a virtual machine. With regards to Linux, is your experience that Vivado works faster in the Linux environment? Any idea by how much? Thanks.

richardhead
Scholar
1,500 Views
Registered: ‎08-01-2012

You need to increase the max jobs setting. This defines the maximum number of OOC runs that compile in parallel. Each job can then use up to maxThreads threads.

While loads of cores are great for running the OOC runs in parallel, these aren't run very often (unless you do a fresh rebuild every time). Otherwise, having maxThreads above 4 doesn't really give much benefit.

Synthesis doesn't use much more than 1 thread. The only thing that really uses all the threads is the router. Placement generally uses fewer.
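
A rough sketch of how the two knobs might be budgeted against each other, assuming each parallel run may use up to general.maxThreads threads. The helper jobs_for_cores and the example numbers are hypothetical and not part of Vivado; set_param general.maxThreads and launch_runs -jobs are the real commands. The Vivado calls are guarded so the arithmetic can also be tried in a plain tclsh.

```tcl
# Hypothetical helper: how many parallel runs fit if each run may use
# up to threads_per_job threads. Integer division, but never less than 1.
proc jobs_for_cores {cores threads_per_job} {
    set jobs [expr {$cores / $threads_per_job}]
    return [expr {$jobs < 1 ? 1 : $jobs}]
}

set cores 48          ;# assumption: logical CPUs on the build machine
set max_threads 4     ;# assumption: little benefit reported above 4
set max_jobs [jobs_for_cores $cores $max_threads]

# Only call the Vivado commands when running inside Vivado with a project open.
if {[llength [info commands launch_runs]] > 0} {
    set_param general.maxThreads $max_threads
    launch_runs synth_1 -jobs $max_jobs
    wait_on_run synth_1
} else {
    puts "would launch synth_1 with -jobs $max_jobs, maxThreads $max_threads"
}
```

Whether this split is a good one depends on how many OOC runs are actually out of date; with only a handful of stale IPs, most of the jobs would sit idle.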

joancab
Teacher
1,417 Views
Registered: ‎05-11-2015

@mmatusov , no, you spotted it, didn't you?

drjohnsmith
Teacher
1,401 Views
Registered: ‎07-09-2009

Re: different environments / PCs

Certainly there has been lots of experimentation over the decades, but it's a moving target, and actual numbers from yesterday are irrelevant.

Try it; it costs nothing but your time.

Things like memory speed and amount of memory, a GPU card versus a built-in GPU, motherboard type and in particular its chipset and bus speed all have an effect.

"Gamer" PC performance rather than "desktop" PC performance is what you're looking for.

dpaul24
Scholar
1,382 Views
Registered: ‎08-07-2014

@mmatusov ,

In older Vivado versions, there used to be an init.tcl file (or one had to create it, I do not remember) under C:\Xilinx\Vivado\<vivado-version>\scripts

where something like

set_param general.maxThreads 8

could be added.

For recent versions I am not very sure how it is or can be done. I did not bother to find out because, as explained by someone else above, it affects only OOC and routing runs (which are not run very often).

------------FPGA enthusiast------------
Consider giving "Kudos" if you like my answer. Please mark my post "Accept as solution" if my answer has solved your problem
Asking for solutions to problems via PM will be ignored.

richardhead
Scholar
1,375 Views
Registered: ‎08-01-2012

@dpaul24 

You can set them via params on the current project, or arguments to the build commands:

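set max_threads 8   ;# example value, tune for your machine
set max_jobs 8      ;# example value: number of runs launched in parallel
set_param general.maxThreads $max_threads
launch_runs synth_1 -jobs $max_jobs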

 

mmatusov
Voyager
1,336 Views
Registered: ‎02-17-2009

@joancab - There was a smiley face after my question when I was typing it but it didn't make it through for some reason... The truth is, I am probably not educated enough.

mmatusov
Voyager
1,325 Views
Registered: ‎02-17-2009

> "Gamer" PC performance rather than "desktop" PC performance is what you're looking for.

That's what I have:
AMD Ryzen Threadripper 3960X 24-core  CPU
ASUS ROG STRIX TRX40-E Gaming motherboard
128 GB DDR4 3600 MHz
SSD hard drive

maps-mpls
Mentor
1,318 Views
Registered: ‎06-20-2017

>In the older Vivado, there used to be a init.tcl file (or one had to create it, do not remember) under C:\Xilinx\Vivado\<vivado-version>\scripts --@dpaul24 

Nice tip. BTW, it's now called Vivado_init.tcl. Same location (on Windows).

There are other tricks to speed up builds (e.g., HD) but I find them more useful in non-embedded design flows.

BTW, PetaLinux tools are much better if you have a lot of cores.

When a Vivado build from scratch takes more than 2-4 hours, that is when I start employing other tactics. For a 45-minute build it is not worth it.

I just dug up some stuff I found helpful on RFSoC builds, that may or may not help @mmatusov (specifically, look into create_ip_run and generate_target commands...which you may need to refine and more selectively target):

 

 

set MY_IP_1 whatever_it_is1
set MY_IP_2 whatever_it_is2
set GENERATED_BD_FILE whatever_it_is3 ; # often design_1.bd, may need full path
# Create OOC synthesis runs without starting OOC synthesis
# in order to set properties of some of the OOC runs
create_ip_run [get_files -of_objects [get_fileset sources_1] $GENERATED_BD_FILE]
# Turn on retiming and performance optimization during OOC synth for these two high performance IPs
set_property strategy Flow_PerfOptimized_high [get_runs $MY_IP_1]
set_property STEPS.SYNTH_DESIGN.ARGS.RETIMING true [get_runs $MY_IP_1]
set_property strategy Flow_PerfOptimized_high [get_runs $MY_IP_2]
set_property STEPS.SYNTH_DESIGN.ARGS.RETIMING true [get_runs $MY_IP_2]
generate_target all [get_files $GENERATED_BD_FILE]
# Example values (assumptions); tune for your machine
set NUM_THREADS 8
set NUM_JOBS 8
set_param general.maxThreads $NUM_THREADS
launch_runs synth_1 -jobs $NUM_JOBS
wait_on_run synth_1

 

There are other issues (not shown) I had to solve to get reasonable builds, such as applying placement properties via tcl script for a tcl script determined number of primitives (DSP48s).  However, most projects won't need that level of optimization.  (I needed it because I used up to 100% of the DSP48s at max frequency, and without the attributes, Vivado implementation takes a long time finding an optimal solution to place 100% of DSP48s).

There may be syntax errors above; I had to do some editing to remove project-identifying information.

 

*** Destination: Rapid design and development cycles *** Please remember to give internet points to those who help you here. ***
drjohnsmith
Teacher
1,287 Views
Registered: ‎07-09-2009

For reference, big builds can easily take overnight.

   

maps-mpls
Mentor
1,269 Views
Registered: ‎06-20-2017

>For reference, Big builds can easily take over night

Your builds sound far less than optimal. Could be:

1.  Poorly written constraints.

2.  Too small a part for the design.

3.  Project mode.

For reference, you want to target 2-4 builds per day or have lots of room for milestone slips...meaning, you're working for government.

markcurry
Scholar
1,249 Views
Registered: ‎09-16-2009

> Your builds sound far less than optimal.

Not in my experience. We regularly have 6-8 hour jobs (xcvu7p-flva2104). For the bigger multi-die parts it can easily go longer.

This is just an unfortunate side effect of these larger, deeply embedded parts. Is it ideal? No way! Can it be better? I hope so! But for these designs, getting only 1 spin a day (or two if I'm lucky, get up early for the first spin, and stay up late for the second) is just the way things are.

--Mark

mmatusov
Voyager
1,243 Views
Registered: ‎02-17-2009

@maps-mpls - Thank you for the tips! I must be doing something wrong, probably at multiple levels, but my worst nightmare right now is how long it takes after a small change to the BD design, such as disabling TREADY on a few AXIS_SWITCH blocks, to actually start synthesis. I am doing this in project mode and I believe Vivado is essentially doing BD validation after I launch a synthesis run. I don't know if the sheer size of my BD design is the problem, but it takes hours before the few required OOC runs actually start. After that it is not too bad. Implementation takes about 6.5 hours, and that's where my multi-core machine really shines, as I can run multiple strategies in parallel in the same amount of time.
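
For anyone who wants to script that last step, here is a minimal sketch of launching several implementation strategies in parallel from Tcl, assuming a project-mode design whose runs are named synth_1 and impl_1 (the Vivado defaults). The new run names and the choice of three strategies are made up for illustration; the strategy names are ones that ship with recent Vivado versions but should be checked against your install. It needs an open project in Vivado and will not run in a plain tclsh.

```tcl
# Assumed strategy list; check the names available in your Vivado version.
set strategies {Performance_Explore Performance_ExtraTimingOpt Congestion_SpreadLogic_high}

# Reuse the flow of the existing impl_1 run so the sketch is not tied
# to a particular version string.
set flow [get_property FLOW [get_runs impl_1]]

set new_runs {}
foreach strategy $strategies {
    set run_name "impl_$strategy"   ;# hypothetical naming scheme
    create_run $run_name -parent_run synth_1 -flow $flow -strategy $strategy
    lappend new_runs $run_name
}

# One job per strategy so that all of them run at the same time.
launch_runs [get_runs $new_runs] -jobs [llength $new_runs]
foreach run_name $new_runs {
    wait_on_run $run_name
}
```

Each run still uses up to general.maxThreads threads of its own, so the number of strategies times that setting is what should fit within the available cores.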

richardhead
Scholar
1,217 Views
Registered: ‎08-01-2012

@maps-mpls 

That's not entirely fair.

1. 2019.2 has a known issue processing constraints. Moving to 2020.2 chopped an hour off our builds (and looking through the logs, a LOT of this time is in constraint processing).

2. This is usually a constant, not a variable. It's pretty normal to add features over time.

3. I don't know about the difference. But I really don't want to handle the threading for building 100 OOC Xilinx IPs in non-project mode (launch_runs synth_1 handles it for you in project mode).

Our builds are currently 4 hours. And this has been a fair benchmark everywhere I've worked. A previous job required me to run 40 builds a night at 8 hours per build (we had plenty of horsepower) to get maybe 2-3 builds that met timing. And that had plenty of area constraints. I never want to have to do that again.

 

maps-mpls
Mentor
1,136 Views
Registered: ‎06-20-2017

@richardhead 

I did not mean to be unfair.  

A 4-hour build is okay for 2 builds per long day during debug.

drjohnsmith
Teacher
962 Views
Registered: ‎07-09-2009

Plus one build overnight!

 
