02-05-2020 06:45 AM
I'd like to integrate the FPGA toolchain into our regular CI setup, which is based on Jenkins and Docker, to close the last remaining gap in our pipeline.
We already keep VHDL sources in version control, and we have automated testing of bitstreams (where the image is deployed to several machines, written to flash, and run through a testsuite on real hardware), but what is currently missing is automated compilation.
Is there a minimal Docker image containing just commandline compilers and the IEEE libraries that we could base this setup on (passing in information about the licensing server through environment variables)?
02-07-2020 11:43 AM
The answer here is going to be a resounding "no".
We have nightly automated builds, and regressions. We avoid the Vivado GUI like the plague, and have mature script based FPGA build flows. But all of that requires a full Xilinx toolchain install - which is quite large. It'd be nice if it weren't, but an FPGA build is a complicated process, not really comparable to a software "compile" beyond a first glance comparison.
I'm interested, perhaps, in hearing others' experience with sticking the Vivado toolchain in a Docker image, and how that works. But it's going to be a very large image.
(Note, I've no actual experience with Docker, just a passing curiosity that I might want to play with it one day...)
02-07-2020 01:16 PM
> But it's going to be a very large image
I suggest you consider the "volume mount" option, "rootless mode", and "using NFS" to reduce the container image size.
These keywords are very important when preparing a CI/CD environment.
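As a rough sketch of the volume-mount idea: the toolchain stays on the host (or an NFS share) and is mounted read-only into a small base image, with the licence server passed in through the environment. The image name, paths, and build script used when calling this function are hypothetical. `XILINXD_LICENSE_FILE` is assumed to be the variable your tool version reads for a `port@host` licence server, so check that against your install.

```python
import shlex


def docker_run_command(image, tool_dir, workspace, license_server, build_cmd):
    """Build the argv for a throwaway container with the toolchain mounted in.

    Assumptions (not from the thread): the toolchain is mounted at
    /opt/Xilinx inside the container and the licence server is read
    from XILINXD_LICENSE_FILE.
    """
    return [
        "docker", "run", "--rm",                 # container is removed after the build
        "-v", f"{tool_dir}:/opt/Xilinx:ro",      # toolchain stays outside the image
        "-v", f"{workspace}:/work",              # checked-out sources
        "-w", "/work",
        "-e", f"XILINXD_LICENSE_FILE={license_server}",
        image,
        *build_cmd,
    ]


def as_shell_line(argv):
    """Render the argv as a single shell-quoted line, e.g. for a CI log."""
    return shlex.join(argv)
```

Because the tool install lives in the mounted volume, the image itself only needs the base OS and whatever runtime libraries the tools depend on.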
02-07-2020 01:48 PM
I understand shared NFS mounts (and shared guest/host volumes) quite well.
I guess my ignorance of Docker, Jenkins, and CI/CD environments is showing. Maybe you or @simonrichter could help clear things up for me.
I understand the basic concepts of each, but I'm not sure how they apply to FPGA design. Perhaps another thread is in order, but I'll add a few more comments/questions here until people tire of it.
From my (again rudimentary) knowledge, Jenkins servers (and CI/CD environments in general) have parallels to ASIC/FPGA nightly builds, and nightly self-checking testbenches and regressions. Stuff we digital designers have been doing for more than 25 years. That's not intended as a dig at software folks. Given the nature of digital designs, bugs can cost months of delays and very expensive (>$1M!) new mask sets, etc. We've been forced to do these types of things for a very long time. Verification is king.
Docker is somewhere in between a Linux "chroot" on steroids, and a full-on virtual guest machine. I'm not exactly sure where it fits in the middle, but the basic concept is one can create (potentially) many virtual sandboxes, where one can test/verify new software. Question: on a specific host, the only variation between different Docker images is just the underlying file system, correct? One doesn't vary architectures in Docker images a la virtualization, correct? Said another way, the ABI of executables is fixed across all Docker images (and must match the host?).
How either of these applies to FPGA (or ASIC) design, I'm unclear - and actually quite curious. If anyone out there is doing things like this for FPGA design - I'd be curious to understand their flow.
(Not a dinosaur yet... And willing to learn)
02-10-2020 03:23 AM
@markcurry CI builds are precisely that, and the goal is to have a single system that implements the entire pipeline from the source code to a tested installation image of the entire software stack.
If there is an FPGA in the system, there is usually an upgrade mechanism in place that is under the control of the device firmware, part of system startup is a test whether the FPGA version is the expected one, and an automatic reload if not, because that way we can always be sure to have matching interfaces, and we can reuse the normal software update channels for FPGA updates.
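The startup check described above boils down to something like the following sketch. The two callbacks are placeholders: how the firmware actually reads the version and reprograms the device is not shown here.

```python
def ensure_fpga_version(read_version, reload_bitstream, expected):
    """Reload the FPGA if its reported version differs from the expected one.

    read_version and reload_bitstream are hypothetical callbacks supplied
    by the device firmware. Returns True if a reload was triggered.
    """
    if read_version() == expected:
        return False
    reload_bitstream()
    # Re-check so a failed update does not go unnoticed.
    if read_version() != expected:
        raise RuntimeError("FPGA version still wrong after reload")
    return True
```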
Generation of the FPGA bitstreams is just one step somewhere in the middle of a longer pipeline, usually there is a step before that generates HDL files for the bus interface and the register map, so these are consistent with the software's view -- e.g. if the bus interface reacts to a fixed physical address, there will be an OpenFirmware device tree fragment generated as well that becomes part of the bootloader, so the OS knows where the hardware is mapped and which driver to load.
Because the pipeline is complex, the setup instructions for the required toolchain are also supposed to be machine-readable, and this is where Docker comes in: I have a description file that says "to install Xilinx ISE 14.7, unpack this tar file into an empty directory, then run the 'batchxsetup' utility and point it to this batch description file". This is run inside a container, and the contents of the file system are then archived afterwards and given a name to refer to them, so whenever a compilation step requires ISE to be installed, it can refer to the tag, and the pipeline orchestrator makes sure the step is run inside a container with ISE installed.
Without Docker, this could still be done, but someone would have to manually install ISE on a machine and connect that machine to the CI system.
That would still leave the problem of running the build automatically, which I've asked in a separate thread, because deploying ISE in Docker is already a difficult enough topic on its own. I can somewhat successfully build a container with ISE installed, but it is massive because all the GUI tools and the IP are installed when only the commandline tools and the IEEE library would suffice for us, and we don't have any clue about licence handling because containers are destroyed after the build is complete, which takes the checked-out licence with them.
02-16-2020 02:06 PM
Sorry for my late reply.
If you have already prepared a build flow with Tcl and shell scripts for Vivado, you can easily try building a CI/CD flow.
But I guess it's a little difficult to describe the build flow in Jenkins. It's a little complex for a HW engineer, because the purpose of building the final design is different between SW and HW.
02-17-2020 12:34 AM
I recommend you try and keep the FPGA compilation and SW compilation separate.
Wherever I have worked where CI is involved, the FPGA firmware is a known good binary used in the SW build flow. Because there is so much variability in a FW build, and because of the time it takes, it's usually best to have the FW as a separate pipeline.
We have our FW pipeline set up as:
Checks - sims - build - test - publish
Checks would be simple syntax checks, quick FPGA elaboration etc.
You need to have your own scripting set up for this to work. Because build resources are limited, and builds take several hours, everything from the build phase onward is manual. Once a build is "known good" it can be put into whatever location is needed so that the SW can take the binary as part of its build processes.
We have this all set up for Vivado. No experience with ISE.
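The checks - sims - build - test - publish flow, with automatic early stages and manually triggered later ones, could be sketched like this. The stage callables and the way approval is recorded are placeholders; a real setup would use the CI system's own manual-trigger mechanism.

```python
STAGES = ["checks", "sims", "build", "test", "publish"]
MANUAL_FROM = "build"  # this stage and everything after it needs a manual trigger


def run_pipeline(actions, approved=frozenset()):
    """Run stages in order and return a list of (stage, status) tuples.

    actions maps each stage name to a callable returning True on success.
    approved is the set of manual stages someone has explicitly triggered.
    The run stops at the first failure or the first unapproved manual stage.
    """
    results = []
    manual = False
    for stage in STAGES:
        if stage == MANUAL_FROM:
            manual = True
        if manual and stage not in approved:
            results.append((stage, "waiting for manual trigger"))
            break
        ok = actions[stage]()
        results.append((stage, "ok" if ok else "failed"))
        if not ok:
            break
    return results
```

With no approvals, a push only ever runs the cheap stages, so the expensive multi-hour build is never started by accident.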
02-18-2020 01:59 PM
Our flow is similar to Richard's. A "CI" like flow for RTL -> Simulation -> regression sims -> Build FPGA (repeat). We've done this in Vivado and ISE just fine. (Although I'm still wondering what Jenkins and Docker could offer to this process. I don't see any benefits...)
Another "CI" loop that software uses downstream. Jenkins + Docker used a lot there.
But the bridge between these - test, and publish new FPGAs is an extensively manual process. I think this is a fairly standard flow in the industry. IMHO keeping the hardware "CI" loop and software "CI" loop separate is probably a wise thing as Richard notes.
There's certainly overlap. Part of the regression sims run for the FPGA would certainly include self-testing sims targeting the hardware->software API. But it's going to be targeted. You're going to be running, at best, a heavily redacted version of production software in simulation - due to the sheer run time.
On the other side - on actual hardware, you can be running mostly production software, but then your visibility is limited. It's all an optimization game - how does one verify our design in the most optimal, and most complete way?
02-19-2020 02:01 AM
@markcurry Docker is just a container system. The idea is that if you have a Docker image containing the tools you need to build, you need minimal setup on a server - the Docker image just deploys to a machine with all the tools in it and runs. So you can just add a load of generic servers and the Docker container just gets deployed and the job run - then the container is torn down at the end of the build.
This can be done for Vivado, but often you need machines that have different requirements to what software needs - usually a fast processor with lots of RAM, whereas software builds usually just want lots of threads.
We used to have Jenkins running overnight simulations and builds on command. The issue with this was it all ran out of master using individual jobs. We have migrated to GitLab and now have a CI setup such that whenever anyone pushes to a branch it runs all the syntax checks and simulations - they get quick feedback if they broke something in their branch. The idea is that master should never be broken because the checks all got fixed in the branch. They cannot press build on anything if any of the sims broke. This all comes for "free" as part of a pipeline setup.
For various reasons we might be migrating back to Jenkins, but Jenkins pipelines do seem to be more capable than GitLab ones currently. GitLab also has some really, really annoying issues (like jobs getting cancelled if they don't get started for an hour!!!). It's really designed around software builds, and the idea that a build may take longer than an hour seems mostly alien to the developers! It's not even something you can easily fix. If you can write some Python, you can work around it, but it's not exactly clean.
Jenkins should have none of these silly problems.
02-19-2020 06:01 AM
I'd like to optimize in two places:
For one, I need to make the setup repeatable across multiple machines. So far, we've had one person who actually understands VHDL in the room, now we have two, so now there are two machines on which we compile, these need to give consistent results, and version control has just gotten a lot more complex. We can't work with "just check in everything including the logs and binaries" anymore, because all of these files cause merge conflicts.
The other thing is that I'd like to automate the tests that are running on actual hardware to see that PCIe link training actually succeeds on all ten test machines.
For the latter, we can use a CI system to automatically deploy new bitstreams to a rack full of test machines, these are then automatically programmed into flash and a test utility is run afterwards and the reports collected.
I would now like to close the gap to the first problem as well: the build machine should get the files that are under source control and only these files, and it should produce a bitstream that is then tested, to make sure that the project in source control is correct and consistent.
The Xilinx toolchain is fairly brittle here: if I run xflow and tell it to use fpga.flw, it will either use the one that is already there, or copy the standard one to the current directory and then use that, so if one developer decides to change the flow file in order to pull in an extra user constraints file, and forgets to check it in, they get a working bitstream when they compile locally, but the project state in version control does not work because its timings are too loose, with no error indication during build whatsoever. If the other developer then updates the project, they get a non-working state and have to debug why that is, and that costs a lot of time.
Ideally, developers would test locally as before, with the machine on their desktop, and if they are satisfied, check in their changes to version control. Overnight, new bitstreams are built and run the full gauntlet, then the results are manually inspected and the bitstream released for inclusion in the next software version (which is still a manual step on the software side as well, so they have a version control entry for the change to a newer bitstream, as that might require interface adaptations).
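One way to notice the stray fpga.flw case described above is to list everything in a working copy that is not under version control before checking in. A minimal sketch, assuming the list of tracked paths is obtained from the version control system by some other means (not shown):

```python
import os


def untracked_files(build_dir, tracked):
    """Return relative paths that exist under build_dir but are not tracked.

    tracked is an iterable of paths relative to build_dir, as reported by
    the version control system (how that list is produced is an assumption
    left to the caller).
    """
    tracked = {os.path.normpath(p) for p in tracked}
    extra = []
    for root, _dirs, files in os.walk(build_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), build_dir)
            if os.path.normpath(rel) not in tracked:
                extra.append(rel)
    return sorted(extra)
```

Build outputs and logs would show up here as well, so in practice the result would need filtering against an ignore list.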
02-19-2020 06:31 AM - edited 02-19-2020 06:34 AM
The firmware build does not need to go into the repo - just into some known location or blob store. The problem with rebuilding the firmware every time is that the results are not guaranteed to meet timing. And ISE doesn't even produce the same results for the same source code + seed. Hence it's best to have a "known good" firmware build constantly available without having to rebuild. This is how all companies I've worked at have done it.
As for testing on real hardware - it is perfectly possible from CI. You'll have to set up your own scripts to do this.
02-19-2020 06:48 AM
I'll chime in here, as I have a few years' experience of using Jenkins to automate FPGA verification and build release processes. My $0.02 would be to simply focus on using Jenkins, without trying to use Docker. I think the tradeoff is that you'll need to make sure that the Jenkins execution hosts have been set up with the proper tools before the Jenkins job can be executed on them, which I presume is what SW folks use Docker containers for (to spell out the dependencies needed on the system for a given software build). I don't have a lot of Docker experience, so take that with a grain of salt.
In my experience (across two companies automating their FPGA verification and release process), I've seen that the exact workflow used will depend heavily on the existing team workflows, and whatever the existing IT support is. In one company, we had "nightly builds" that ran on any design that had changes during the day, and sent out "nastygrams" to anyone who broke things, in addition to requiring that before any release was made, the entire simulation and "build" processes complete without any errors or timing violations. That worked well at that company. Now, at another company, we validate our simulations at every pull request back to master (since here git is used, and at the last place Perforce was used, and no pull-request procedure was in place). So we don't let pull-requests get merged back into the master branch, unless all of the existing tests pass, and usually require new tests to be written against the new features added. Then we can simplify the automated release builds so that they just run off of the master branch whenever we decide, and Jenkins automates those builds, and validates that they pass timing before officially publishing the build and making it available for consumption.
In a very simplistic Jenkins setup, you could have all of the Jenkins config on a virtual machine, and set up your existing Linux build system as an execution host for Jenkins. Then you'd basically be using the Jenkins framework to simply automate your simulations and builds, on the same host(s) that you'd usually perform your builds. You can then easily extend that to multiple hosts through Jenkins, and it will do some level of load balancing across them. I don't think that the Jenkins scheduling system is super rich on load balancing features across hosts (most likely not as rich as a grid scheduling system would be), but I've found it to be sufficient for the use case I have. If it's not sufficient for you, then the job that Jenkins starts can still simply send the jobs to your grid, and have that grid manage the load distribution for you.
Hopefully this is useful, sorry for the rambling!
02-19-2020 06:56 AM
Right, my goal isn't to rebuild every time, but to be able to rebuild when I need to.
Right now, we have one installation of ISE that we pull working images from, and one installation of ISE that generates broken images. That doesn't stop us from working, but it means that only the person who has the working installation can build images, and no one else can modify VHDL sources and test them. That is slightly unsatisfactory, to say the least.
That's why I want to move to a process where the "working" installation is one that is owned by no one, because it lives on some server in the basement and is reset for every build. If a project compiles there and produces a working bitstream, then that becomes the "known good" state, and the fact that the only way to get any information into this box is to check it into version control means that there is a good chance that a new workstation can be set up to produce working binaries from that state as well.
02-19-2020 07:08 AM
I'll just throw out there what appears to me to be the old-school way of solving these things. Xilinx installs (and ALL our EDA software, for that matter) are done on an NFS drive by one user, who verifies the integrity of the install. The new tool is then declared OK, and everyone can use it. The NFS share is exported to as many machines as we desire. This includes virtual servers. So, only one tool install shared across many machines.
Multiple machines are managed by SGE/LSF/other grid software/load balancing software.
Also note, all our design directories are on mounted NFS shares as well. Different machines producing different results are pretty much impossible. (Or at most a very rare exception)
02-19-2020 07:09 AM
@simonrichter I would investigate why one machine creates working builds and the other one doesn't. How do you know it doesn't work? Is it throwing errors during the build process? Or is it just that the bitfile doesn't "work"? If it's the latter, you've likely got timing problems from an incomplete specification of FPGA timing requirements. If a build meets timing and builds, then it should always "work" from any machine. If it doesn't, you have a problem with the source.
02-19-2020 07:10 AM
Hmm, so you are saying that on one machine, its ISE installation generates desired results, whereas on the other machine, its ISE installation generates different results? I don't have any experience with ISE, but I have had extensive experience in looking at repeatability of build results in both Quartus and Vivado, and my experience has been that so long as you are running the exact same source code (absolutely no changes, not even comment changes) and you are running the exact same version of Vivado, then you can recreate the exact same MD5SUM of the bitstream using Jenkins automation (note that Jenkins has a "Fingerprint" feature that is very handy for tracking whether bitstreams are identical between builds or not; just make sure the bitstream doesn't have a timestamp in the header before you compare it).
In all of my experiences, we've always had the Vivado (or Quartus) toolset on a network mounted drive and had all the various versions of the tool we need loaded onto that drive, then each build simply controls which version it runs with its environment and scripts. Maybe moving the ISE installation to a network mounted drive would solve some of your consistency problems? That being said, I also don't have any build repeatability experience with ISE, so there is a chance that it fundamentally isn't repeatable. Although repeatability isn't really a concern in the case where you "store away" known good bitstreams that can then be referenced by a version number. The only reason repeatability would be a concern in that setup would be if you encountered some bug in the field that required an incredibly deep root cause analysis, in which case you'd want to try and recreate it exactly from source code and dig into the details, which also may or may not be something necessary in each situation.
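The MD5 comparison mentioned above can be sketched in a few lines. The `skip_header_bytes` parameter is a hypothetical way to ignore a fixed-length header that contains a timestamp; the actual header layout of a bitstream is not covered here, and generating the bitstream without a timestamp in the first place avoids the problem entirely.

```python
import hashlib


def bitstream_digest(path, skip_header_bytes=0):
    """Return the MD5 hex digest of a file, optionally ignoring a leading header.

    skip_header_bytes is an assumption for illustration: it only helps if
    the varying part (e.g. a timestamp) sits in a fixed-length prefix.
    """
    h = hashlib.md5()
    with open(path, "rb") as f:
        f.seek(skip_header_bytes)
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def same_bitstream(path_a, path_b, skip_header_bytes=0):
    """True if both files hash identically after skipping the header."""
    return (bitstream_digest(path_a, skip_header_bytes)
            == bitstream_digest(path_b, skip_header_bytes))
```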
02-19-2020 07:13 AM
02-19-2020 07:17 AM
Regarding Vivado producing an EXACT (i.e. md5sum output match) binary from the exact same source files. Have you actually done this, or just heard about it?
As far as I was aware, this process is theoretically possible - more so in Vivado than it was in ISE. But it still requires many hoops to jump through. I.e. one must turn off ALL multi-threaded abilities throughout the entire implementation process, sacrificing sometimes significant build time optimizations. Plus some other hoops.
Pretty pointless IMHO, although I know some industries have some sort of certification process requiring such a confirmation of the process. For us, we have, at bare minimum, a timestamp buried in the bitstream. So we'll never produce an exact bit match. And we really don't care. As long as it matches (LEC wise) the RTL description, and passes timing, things are good.
02-19-2020 07:27 AM
@markcurry I have actually tracked this in Jenkins myself, with both Quartus and Vivado if I recall correctly. It was at my previous place of employment (haven't been there for 18 months), so I don't recall all of the particular details, but I do recall needing to specifically generate the correct version of the bitstream in Vivado that didn't have a timestamp in the header so that I could accurately track this.
It wasn't because of an industry requirement to do such, but more of an OCD CI/CD metric that I could track and know which builds were identical to which, coupled with a few nasty bugs that bit us in the backside and being able to recreate them exactly would have greatly sped up their root cause analysis.
I don't recall needing to turn off a bunch of features to make bitfiles repeatable, but it's been a while since I went down that hole, so I'll admit that I don't recall all the exact details.
You are absolutely right though, MOST folks likely won't ever care about exact repeatability, and in my current place of employment, it's not important, and we therefore aren't set up to track it at all.
02-19-2020 07:50 AM
The bitstream compiles fine, but doesn't work (as in, it stops the host machine the FPGA is attached to from booting by stepping on the IRQ line and keeping the CPU in an interrupt loop), so it is most likely a constraint not being processed.
My feeling is that someone modified a file in the installation. My initial suspect was fpga.flw, since that is responsible for pulling in the (singular) UCF file during an xflow based build, but that's not it, since we build from within ISE and don't use xflow at all (as far as I can see) -- and since we have two UCF files in the source folder and neither has the name expected by fpga.flw.
I will have to compare the "working" installation against a fresh one, that is already clear -- the end goal is to reach a state where I have written documentation on how to install ISE onto a new machine, get the sources, build an image and test it, and the ideal outcome is that this documentation is also machine-readable.
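Comparing the "working" installation against a fresh one could start from a file-by-file hash comparison along these lines. This is only a sketch: the directory roots are whatever the two installs happen to be, and files that legitimately differ per machine (logs, caches, licence data) would need to be filtered out by hand.

```python
import hashlib
import os


def tree_manifest(root):
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if not os.path.isfile(full):  # skip broken symlinks and the like
                continue
            h = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            manifest[os.path.relpath(full, root)] = h.hexdigest()
    return manifest


def compare_trees(reference, suspect):
    """Report files missing from, added to, or changed in suspect vs. reference."""
    ref, sus = tree_manifest(reference), tree_manifest(suspect)
    return {
        "missing": sorted(ref.keys() - sus.keys()),
        "added": sorted(sus.keys() - ref.keys()),
        "changed": sorted(p for p in ref.keys() & sus.keys() if ref[p] != sus[p]),
    }
```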
02-19-2020 08:29 AM
I've just skim-read all these posts.
As I understand it, you have two machines that run ISE, and from each, with the same source, you get a different bit file.
When you use one bit file your design runs; when you use the other your design fails.
Is this consistent build to build, one machine fails and one doesn't?
Do both builds put out the same warnings? Look in the log file.
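Checking whether both builds emit the same warnings can be automated with a small helper like the one below. It assumes that warnings appear in the log as lines beginning with `WARNING:`; adjust the prefix if your logs look different.

```python
def warning_lines(log_text, prefix="WARNING:"):
    """Collect the distinct warning lines from a build log.

    The "WARNING:" prefix is an assumption about the log format.
    """
    return {
        line.strip()
        for line in log_text.splitlines()
        if line.strip().startswith(prefix)
    }


def diff_warnings(log_a, log_b):
    """Return (warnings only in log_a, warnings only in log_b), each sorted."""
    a, b = warning_lines(log_a), warning_lines(log_b)
    return sorted(a - b), sorted(b - a)
```

A warning that shows up on only one machine, for example about a constraint being ignored, would be a good first place to look.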
ISE is an old dog that has various randomisations in it to "prevent" the tools getting stuck looking for a false minimum.
ISE also does not run on Windows 10, and the VM version is in my experience very variable.
Re W10, it might work one day and fail the next, for no apparent reason. Let alone the W10 update system. My advice, if you have the VM version or are trying to run under W10, is don't.
Get your own version of a VM, with support, and install Windows 7 and all its patches, and then install ISE 14.7.
Then run that VM on as many machines as you want. Results, in my experience, will then be consistent.