
Issues Setting up LSF

Hello Xilinx board!
I've been trying to set up a test LSF cluster at work, and I've run into a series of problems getting it to work that I'm hoping to find some answers to here.

For a bit of background, my machine is running Ubuntu 16.04 in VirtualBox on a Windows 10 host machine. I'm using Vivado 2018.2. The cluster currently consists of only an Ubuntu master and an Ubuntu slave, running in separate virtual machines.
After working through some permissions issues, I can now start the cluster, test it from Vivado (the test passes), and send work to the slave.

The LSF command I'm using for this is:
bsub -R select[type=X86_64] -N -q normal -m slave-vm

This puts jobs in the "normal" queue and sends them only to slave-vm, so the master is excluded from jobs.
I can see the directories in the .runs directory being populated:

[Screenshot: the .runs directory being populated, Ubuntu 16.04 VM running in Oracle VM VirtualBox, 2019-05-28]

(For my simple test project I only have three out of context modules that need to be synthesized).
Partway through synthesizing any of these modules, however, I receive an error like the following:

*** Running vivado
with args -log zcu102_led_controller_simpl_0_0.vds -m64 -product Vivado -mode batch -messageDb vivado.pb -notrace -source zcu102_led_controller_simpl_0_0.tcl


****** Vivado v2018.2 (64-bit)
**** SW Build 2258646 on Thu Jun 14 20:02:38 MDT 2018
**** IP Build 2256618 on Thu Jun 14 22:10:49 MDT 2018
** Copyright 1986-2018 Xilinx, Inc. All Rights Reserved.

source zcu102_led_controller_simpl_0_0.tcl -notrace
Command: synth_design -top zcu102_led_controller_simpl_0_0 -part xczu9eg-ffvb1156-2-e -mode out_of_context
Starting synth_design
Attempting to get a license for feature 'Synthesis' and/or device 'xczu9eg'
INFO: [Common 17-349] Got license for feature 'Synthesis' and/or device 'xczu9eg'
INFO: Launching helper process for spawning children vivado processes
INFO: Helper process launched with PID 7633
---------------------------------------------------------------------------------
Starting RTL Elaboration : Time (s): cpu = 00:00:04 ; elapsed = 00:00:08 . Memory (MB): peak = 1659.742 ; gain = 0.000 ; free physical = 5584 ; free virtual = 9201
---------------------------------------------------------------------------------
INFO: [Synth 8-6157] synthesizing module 'zcu102_led_controller_simpl_0_0' [/home/projects/xilinx-zcu102-2018.2/hardware/xilinx-zcu102-2018.2/xilinx-zcu102-2018.2.srcs/sources_1/bd/zcu102/ip/zcu102_led_controller_simpl_0_0/synth/zcu102_led_controller_simpl_0_0.v:57]
INFO: [Synth 8-6157] synthesizing module 'led_controller_simplified' [/home/projects/xilinx-zcu102-2018.2/hardware/xilinx-zcu102-2018.2/xilinx-zcu102-2018.2.srcs/sources_1/bd/zcu102/ipshared/809c/hdl/led_controller_simplified.v:23]
INFO: [Synth 8-6155] done synthesizing module 'led_controller_simplified' (1#1) [/home/projects/xilinx-zcu102-2018.2/hardware/xilinx-zcu102-2018.2/xilinx-zcu102-2018.2.srcs/sources_1/bd/zcu102/ipshared/809c/hdl/led_controller_simplified.v:23]
INFO: [Synth 8-6155] done synthesizing module 'zcu102_led_controller_simpl_0_0' (2#1) [/home/projects/xilinx-zcu102-2018.2/hardware/xilinx-zcu102-2018.2/xilinx-zcu102-2018.2.srcs/sources_1/bd/zcu102/ip/zcu102_led_controller_simpl_0_0/synth/zcu102_led_controller_simpl_0_0.v:57]
---------------------------------------------------------------------------------
Finished RTL Elaboration : Time (s): cpu = 00:00:05 ; elapsed = 00:00:13 . Memory (MB): peak = 1659.742 ; gain = 0.000 ; free physical = 5445 ; free virtual = 9069
---------------------------------------------------------------------------------
Failed to open file ./.Xil/Vivado-7542-seth-VirtualBox/elab.rtd. Please check the path and rerun synthesis.
invalid command name "NULL"
INFO: [Common 17-83] Releasing license: Synthesis
6 Infos, 0 Warnings, 0 Critical Warnings and 1 Errors encountered.
synth_design failed
ERROR: [Common 17-69] Command failed: Synthesis failed - please see the console or run log file for details
INFO: [Common 17-206] Exiting Vivado at Tue May 28 16:11:37 2019...

I have verified that the file it fails to open, elab.rtd, is being created at that path with the proper permissions.
This was hard to verify, because the .Xil directory is wiped as soon as the failure occurs.

Since I assumed I was running into a permissions issue of some sort, I tried to simplify the problem and take the intricacies of LSF out of the equation:
I put my small test project on a shared network drive and tried synthesizing it on a single computer (the master VM).

I have the project directory mounted as a samba share using the following entry in /etc/fstab:
//192.168.7.2/share/Seth/xilinx-zcu102-2018.2 /home/projects/xilinx-zcu102-2018.2 cifs user=***,pass=***,rw,file_mode=0777,dir_mode=0777 0 0

Surprisingly, I get the exact same failure as above when synthesizing the project from the network drive without LSF.
If I put the exact same project locally on my machine, at the same path where the share is mounted, it synthesizes just fine.

Does Vivado support storing the project directory on a network share?
Is there something special I need to do to make this work?
Since LSF is an option for remote synthesis, and LSF requires shared directories
between the computers in the cluster, I assumed this configuration was supported by Vivado.

Any tips or hints from people who have tried this configuration are greatly appreciated.
I've been banging my head against the wall trying to get this to work!

Thanks,
~Seth

Accepted Solution

Re: Issues Setting up LSF

Hi seth.kramer@spectranetix.com

It seems to be a VirtualBox issue or a Samba issue.

I suggest using the latest VirtualBox, and NFS instead of Samba, without the delayed-write option.

Best regards,

Re: Issues Setting up LSF

Thanks for the reply watari!
I was already using a very recent version of VirtualBox (I've since upgraded to 6.0.8, which didn't resolve the issue),
but you're probably right that write delays are causing my issue.

Unfortunately I'm not a networking person by trade, so I'll probably have to do some
spelunking to figure out how I might reduce the delays.

My original test setup was using a directory shared via samba on my master VM,
and then mounted on my slave VM.
From a brief look at the Samba configuration options, it appears that something like:
socket options = IPTOS_LOWDELAY TCP_NODELAY
might help.
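For reference, `socket options` is a Samba `[global]` parameter, so the relevant part of the master's smb.conf would look roughly like this sketch (assuming the default Ubuntu location, /etc/samba/smb.conf). Samba has to be restarted before the change takes effect, e.g. `sudo systemctl restart smbd` on Ubuntu 16.04:

```ini
# /etc/samba/smb.conf on the master (excerpt, sketch)
[global]
   socket options = IPTOS_LOWDELAY TCP_NODELAY
```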

I'll also have to look into setting up NFS as a possible option.

Re: Issues Setting up LSF

I tried configuring my Samba server with different options to minimize the delay, but had no success with just:
socket options = IPTOS_LOWDELAY TCP_NODELAY

Based on the suggestions here, I then set up an NFS server on my master.
I had some permissions issues getting this to work at first, but worked through them.
This solved my issue and now LSF works in Vivado for me.
Below is a brief description of setting up NFS for LSF in hopes that it helps someone in the future trying to configure this.

I set up my initial NFS configuration using the guide here:
https://www.linuxuprising.com/2018/11/easy-nfs-share-setup-in-ubuntu-linux.html
Essentially, it uses Simple NFS GUI to give you a basic setup.

I then made some modifications to my master and slave so that the slave has permission
to read, write, and execute files.

Master
I modified the entry for my share made in /etc/exports to look like this:

# Shared folder NFS as Server
/home/build/projects/ 192.168.7.91(rw,all_squash,anonuid=0,anongid=0,sync)

As I understand it, this squashes every client user's access to an anonymous user
with root access (anonuid=0,anongid=0).
This could be a large security risk in a final implementation, but it works just fine for my testing.
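If this setup is kept past testing, one less risky variant (a sketch, untested here) is to squash clients to a dedicated, non-root build account instead of root. The UID/GID 1001 below is a placeholder, not part of the original setup: use whichever account owns /home/build/projects on the master, and make sure that directory is writable by it. After editing /etc/exports, `sudo exportfs -ra` re-reads the file without restarting the NFS server.

```
# /etc/exports on the master (sketch): map all client users to a non-root
# account. 1001 is a placeholder UID/GID for the owner of the project tree.
/home/build/projects/ 192.168.7.91(rw,all_squash,anonuid=1001,anongid=1001,sync)
```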


Slave
I modified the entry created in /etc/fstab:

# Shared folder NFS from Server & mount point
192.168.7.92:/home/build/projects /home/build/projects nfs exec,auto,hard,intr 0 0

I changed the mount path (/home/build/projects in my case) to match that of the master,
since for LSF to work, the paths to the project directory must be the same for all hosts in the cluster.
I also replaced the default "users" option with "exec", since "users" implies the noexec option.
A good way to check the current options for a given mountpoint is by looking at the entry made for it in /etc/mtab.
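That /etc/mtab check can also be scripted. Below is a minimal POSIX sh sketch, assuming the standard whitespace-separated mtab layout (device, mount point, type, options, dump, pass); the function names `mount_opts` and `allows_exec` are made up for this example. It prints a mount point's options and reports whether the mount allows executing files:

```shell
#!/bin/sh
# Sketch: read a mount point's options from a mount table such as /etc/mtab.
# Assumed field layout (same as /proc/mounts):
#   device  mount-point  fs-type  options  dump  pass

# mount_opts MOUNT_POINT [TABLE]
# Prints the comma-separated options of the last entry for MOUNT_POINT
# (the last entry is the one on top if several mounts are stacked there).
# TABLE defaults to /etc/mtab. Prints nothing if the mount point is absent.
mount_opts() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' "${2:-/etc/mtab}"
}

# allows_exec MOUNT_POINT [TABLE]
# Returns 0 if MOUNT_POINT is mounted without noexec, 1 otherwise
# (including when the mount point is not found in the table at all).
allows_exec() {
    opts=$(mount_opts "$1" "$2")
    [ -n "$opts" ] || return 1
    case ",$opts," in
        *,noexec,*) return 1 ;;
        *) return 0 ;;
    esac
}
```

For example, `allows_exec /home/build/projects && echo "exec OK"` on the slave. On systems with util-linux, `findmnt -no OPTIONS /home/build/projects` prints the same option list.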

Re: Issues Setting up LSF

Hi Seth,

What kind of cluster software are you using?

I'm trying to find out which cluster software works with Vivado, so that I can eventually set up a cluster.

Thanks,

Magnus

Re: Issues Setting up LSF

We were using IBM's Platform LSF.
It ended up not being a viable solution for our needs, though, because Vivado 2018 didn't have great support for LSF. It can only be used for a precious few parts of the compilation process. Details below, as I remember them.

For Synthesis:
Initial IP generation can only be done on the Client computer, not the cluster.
For large projects this can take a considerable amount of time, as it appears to use only one thread (one core).

OOC IP, however, can benefit from a speedup, because it creates a number of jobs that the cluster can work on. How much depends on the dependencies between IP: we had several large IP that gated progress on subsequent components. If the systems in your cluster don't have higher overall clock speeds than the Client computer you're using, you may not get as much of a speedup as you were hoping for.
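A rough way to see why dependencies and clock speed matter, using a simplified model assumed here for illustration (it ignores LSF scheduling and file-transfer overhead): let $t_i$ be the time OOC run $i$ takes on the Client computer, let $s$ be the speed of a cluster host relative to the Client ($s < 1$ means slower), and let $C$ range over chains of runs that must wait on one another. Even with unlimited hosts, the OOC stage on the cluster cannot finish faster than its slowest chain:

```latex
T_{\text{cluster}} \;\ge\; \frac{1}{s}\,\max_{C} \sum_{i \in C} t_i
```

For example, if the longest chain takes 60 minutes on the Client and the cluster hosts run at $s = 0.8$, the cluster needs at least $60 / 0.8 = 75$ minutes for that stage no matter how many hosts are available, while the Client's lower bound for the same chain is 60 minutes. Extra hosts only help when the total amount of work, rather than one long chain, is the bottleneck.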

For Implementation:
LSF doesn't appear to work for this step. I encountered a bug here that I wasn't able to resolve, asked on the forum, and received no help. However, since implementation only ever uses a maximum of 8 threads at a time, I would assume that a sufficiently powerful Client computer wouldn't see a speedup in this area anyway.
Using remote hosts is an option for implementation, though, so if your Client computer is somewhat weak, this might be worth exploring.


In Summary:
It appears that LSF is only useful for OOC IP generation.
Remote hosts are a possible way to speed up Implementation if the remote system is much stronger than the Client system.

~Seth
