Anonymous

re-synthesis required (re-configuration)

The FPGA provides synthesis as a kind of service, with the disadvantage that it is only done once: after that the user cannot use the fabric for anything else without re-synthesising, which requires another computer with an operating system and the ISE/Vivado program, plus a re-design, and that usually takes a few weeks because a re-synthesis is more like a redesign than a programming modification.

 

The idea of having an array in which one can route all elements any way one wishes is great, but this incredible advancement stays "limited" by the fixed synthesis that cannot be undone at run-time (unless the FPGA pretends it is a processor, and then it is no longer an FPGA, as with the Zynq).

 

It is time to ask Xilinx to move forward to a new generation of FPGA where the synthesis is done at runtime (but that would involve finding a way to have the synthesis done by the FPGA itself instead of by an external, bulky program such as Vivado).

 

Is that feasible? Is it not the only way the FPGA could become the state of the art in every computing device?

 

Mentor

Re: re-synthesis required (re-configuration)

The Zynq isn't an FPGA pretending to be a processor. It's a processor with an FPGA stuck to it. The processor is not programmable logic, and none of the programmable logic is used for the processor.


Xilinx already has a generation of FPGA where the synthesis can be done at runtime: it's called the XC3000. I reckon any modern PC could complete a synthesis and implementation run for one of these more-or-less instantly. Of course, by modern standards they're pretty tiny - and you'll have a hard time finding them for sale.

Running synthesis/implementation on the FPGA is not a magic bullet. Yes, some steps are highly parallel and you could potentially drop the time by an order of magnitude or two - so your time drops from several hours to several minutes. That assumes that the design (a) actually fits on the chip, and (b) fits on there reasonably easily, without requiring a huge number of iterations and/or human assistance. For realtime you want it closer to several milliseconds. Several steps are also not parallel, and would not work well on an FPGA.

Hardware-acceleration of Vivado has been discussed before on these forums, and it's something that I'd definitely like to see. Cutting the build time down from a few hours to maybe half an hour would mean that I could do 6 - 8 design iterations in a day rather than two. Preventing Vivado from occupying all my CPU time would also help a lot. However, this is not a solution for realtime processing.

Incidentally, this is no different to normal PC software. For big software projects in C, you let the compiler run for a couple of hours/days in order to produce an executable for a specific type of system. You don't expect to do the compile-and-run at runtime. Interpreted languages (which are compiled at runtime) are effectively the same as HDL simulation - they save compilation time but they're not quick.

Anonymous

Re: re-synthesis required (re-configuration)

You write "It's a processor with an FPGA stuck to it" please look at the FPGA chip it has the ZYNQ inside not outside

 

then : "Xilinx already has a generation of FPGA where the synthesis can be done at runtime: it's called the XC3000"

May I ask you to read: https://www.xilinx.com/support/documentation/data_sheets/3000.pdf

I spent 2 years working on the XC4000 under Iain Mc Nally, and the EEPROM is outside the chip, not inside.

 

You also write "I reckon any modern PC could complete a synthesis and implementation run for one of these more-or-less instantly". I ran ISE 4 and it took many minutes, whereas the FPGA's "instant" is measured in ns, not minutes, may I remind you. But the point is that you need an external device (a computer and a synthesis program) to do it, and that external device is very slow relative to the FPGA, which can run data streams in and out over 400 pins with operations in ns (1 nanosecond is 10^-9 seconds, one billionth). That contradicts what you wrote: "Yes, some steps are highly parallel and you could potentially drop the time by an order of magnitude". May I remind you that the fastest Analog Devices Blackfin or TI TigerSHARC DSPs have interrupt switchover times of more than 1 µs; if you compare that to 400 pins running at, say, 400 MHz, there are at least 5-6 orders of magnitude, not one, and that is per single chip.

 

then "For realtime you want it closer to several milliseconds" yes that was true for the Commodore64 but on a modern server you have perhaps 100 users that are accessing the machine then what?  each user has perhaps just one HTML but its already several thousand elements that in 0x86 assembler each take many instructions so that its causing the lags experienced making the FPGA real in demand to help accelerate that, no 1mS is not fast in FPGA terms its 5-6 magnitudes slower?

 

Finally, on the time of compilation: the FPGA needs a human + external PC + FPGA + EEPROM to allow what was once configured (in the past it was burnt into the CPLD with UV light) to be configured once more. Similar to a program that does that every single clock cycle? Besides, Vivado is not as fast as a normal software development IDE; I recently did C# with Visual Studio 2015 and it compiles almost instantly. But that is not the point; the point is to have the FPGA re-synthesize instantly depending on the user's code.

 

We are talking about HDL code here, so the FPGA would have its logic gates "glued" dynamically, which would allow one FPGA to perform 1000 tasks instead of, as it is now, 1000 FPGAs each with a single synthesized task.

 

I have now spent 2 years re-training in software, and I see what they do there that the FPGA really has to learn to do in the next generations; it has to happen now, it is a need, a real need, so I would appreciate it if people were less complacent. Therefore, saying that the FPGA is re-programmable is a blatant lie, because it is programmable only: the re-programming needs external hardware (EEPROM) and even a human on top of it, so it can be done in a few hours or weeks, but the device itself, on its own, is incapable of re-programming, FYI.

 

I really want to thank everyone at Xilinx for having maintained this architecture, unique in the world, and it is up to us who use it to prove that it can endure and also fulfil the needs out there in industry.

 

Would it not be wonderful if one could bridge the gap and have the routing within the FPGA defined by code, instead of the synthesis forcing only one routing of the elements and then forcing the same "hard-coded" gate array configuration?

 

 

Mentor

Re: re-synthesis required (re-configuration)

>> please look at the FPGA chip, it has the Zynq inside, not outside

You mean on Xilinx's pretty diagram of the Zynq?

https://www.xilinx.com/content/dam/xilinx/imgs/block-diagrams/zynq-mp-core-dual.png

Yes, it's got the Zynq Processing System (ie the CPU) inside the chip - but the chip is not the FPGA. Only the Programmable Logic side of it is the FPGA.

The Zynq Ultrascale+ has a Mali 400 GPU in the chip; are you going to tell me that it's just a GPU that's pretending to be a CPU and an FPGA as well?

>> I spent 2 years working on the XC4000 under Iain Mc Nally, and the EEPROM is outside the chip, not inside

How is that relevant? The EEPROM location has no effect on the ability to run synthesis.

>> I ran ISE 4 and it took many minutes, whereas the FPGA's "instant" is measured in ns, not minutes, may I remind you. But the point is that you need an external device (a computer and a synthesis program) to do it, and that external device is very slow relative to the FPGA

I did say "on a modern PC". Give me a synthesis tool for the XC3000/XC4000 that works on a Core i7, and it'll finish synthesis pretty much instantly. Even a small Zynq 7020 design finishes almost instantly. Obviously on the 286s and 386s that were common at the time of the XC3000/XC4000, synthesis would take a good deal longer.

Regarding the external PC: isn't the whole point of this to make the FPGA accelerate PC tasks? So you'll always have the PC available anyway.

>> which can run data streams in and out over 400 pins with operations in ns (1 nanosecond is 10^-9 seconds, one billionth). That contradicts what you wrote: "Yes, some steps are highly parallel and you could potentially drop the time by an order of magnitude". May I remind you that the fastest Analog Devices Blackfin or TI TigerSHARC DSPs have interrupt switchover times of more than 1 µs; if you compare that to 400 pins running at, say, 400 MHz, there are at least 5-6 orders of magnitude, not one, and that is per single chip

I must admit that I am very, very confused. How does the number of pins have any relevance here? Speed of data getting into or out of the chip is not the limiting factor for synthesis and implementation; if it was then it'd be done in about 100ms (how long it takes to get all the HDL code into the CPU from the HDD, and get the bitstream back out).

An FPGA can do operations in 1ns, but a modern CPU can do operations in 200ps (1ps = 10^-12 seconds, or one trillionth of a second). It's completely irrelevant unless you're (a) doing the same operations, and (b) only doing one of them. Similarly, how does the switchover time to an interrupt have any relevance to synthesis time? It's like saying "my torch takes only 1ns to turn on when I flick the switch, therefore it's quicker than a DSP!"

>> then "For realtime you want it closer to several milliseconds" yes that was true for the Commodore64 but on a modern server you have perhaps 100 users that are accessing the machine then what?  each user has perhaps just one HTML but its already several thousand elements that in 0x86 assembler each take many instructions so that its causing the lags experienced making the FPGA real in demand to help accelerate that, no 1mS is not fast in FPGA terms its 5-6 magnitudes slower?

OK, I can live with demanding times shorter than a few milliseconds. So now we want it closer to several microseconds (ie a full synthesis and implementation run in maybe 1000 FPGA clock cycles), or even nanoseconds (a full synthesis and implementation run in 1 - 2 clock cycles). This is going to be really, really hard.

Consider a modern PC running synthesis. Mine (quad-core approx. 3.3GHz Xeon) takes about two hours to do a synthesis/implementation run on a modest Zynq 7045 design. It appears that a CPU of this standard manages around 117 billion instructions per second (at an average of 0.008ns each). This gives us a very rough estimate of 800 trillion instructions for a synthesis and implementation run. If you shift this to an FPGA doing one instruction per nanosecond, then you're looking at a synthesis/implementation time of 800,000 seconds or about nine days. Even if you can make a thousand processing cores on the FPGA, each dealing with one instruction every nanosecond, you're still looking at over ten minutes.
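For anyone who wants to check that arithmetic, here it is as a small Python sketch; the instruction rate and the one-instruction-per-nanosecond figure are the rough assumptions from the paragraph above, not measurements:

# Back-of-envelope only; all inputs are the assumed figures from the text above.
cpu_instr_per_sec = 117e9             # assumed throughput of the quad-core Xeon
build_time_sec = 2 * 3600             # observed ~2 hour synthesis/implementation run
total_instructions = cpu_instr_per_sec * build_time_sec   # ~8.4e14, "800 trillion"

fpga_op_time_sec = 1e-9               # assume one instruction per nanosecond on the FPGA
single_core_sec = total_instructions * fpga_op_time_sec   # ~840,000 s, roughly nine days
thousand_core_sec = single_core_sec / 1000                # ~840 s, about 14 minutes

print(f"{total_instructions:.2e} instructions")
print(f"one FPGA core:   {single_core_sec / 86400:.1f} days")
print(f"1000 FPGA cores: {thousand_core_sec / 60:.1f} minutes")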

>> Finally, on the time of compilation - you still don't get it - the FPGA needs a human + external PC + FPGA + EEPROM to allow what was once configured (in the past it was burnt into the CPLD with UV light) to be configured once more. Similar to a program that does that every single clock cycle?

No, I definitely don't get it. An FPGA could easily do the build in an automated way (just need a suitable TCL script). You don't need the EEPROM; if you're expecting to change the image at run-time then you can just load the SRAM cells directly. A program does not reload itself in every clock cycle.
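To illustrate the "automated way": here is a minimal sketch of a human-free build, driven from Python purely as an example. The source file names, top-module name and part number are placeholders, not anything taken from this thread:

# Sketch of a fully automated (no human in the loop) Vivado build.
# File names, top module and part number below are placeholders.
import subprocess, textwrap

build_tcl = textwrap.dedent("""
    read_verilog    top.v
    read_xdc        constraints.xdc
    synth_design    -top top -part xc7z045ffg900-2
    opt_design
    place_design
    route_design
    write_bitstream -force top.bit
""")

with open("build.tcl", "w") as f:
    f.write(build_tcl)

# Batch mode runs the script and exits without any interaction.
subprocess.run(["vivado", "-mode", "batch", "-source", "build.tcl"], check=True)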

>> Besides, Vivado is not as fast as a normal software development IDE; I recently did C# with Visual Studio 2015 and it compiles almost instantly. But that is not the point; the point is to have the FPGA re-synthesize instantly depending on the user's code

Compiling for a CPU is a far, far easier task than building for an FPGA. On a CPU you're just converting sequential instructions in C into sequential instructions in machine code. What you get out is a program with some (previously unknown) size that uses some (unknown) amount of RAM - and that's just fine. After all, a desktop PC has more or less unlimited RAM and HDD space, so you don't need to be too careful about conserving them. Even so, building a large program for a CPU, on a CPU, takes a lot of time. A couple of hours isn't unusual for a Linux kernel compile.

On an FPGA, you're first converting HDL code into a netlist. This is very similar to C, and would work fine in parallel systems. Then you have to take that and squeeze it into a specific amount of space on the chip, with limitations on which things can be connected to which other things, and limitations on each type of resource. Then, once you've finally got that sorted out, you realise that the signals can't get between all of the required parts (the Route step), so you have to rip it all up and try again. And again. And again. Finally you get the routing done, and then find that while the signals reach the required places, they don't do so at the right speed. So now you rip it all up again, and place it all again, and route it all again. These steps are less parallelism-friendly; since every change affects other hardware a parallel system would spend most of its time communicating between cores/threads.

The short version is that CPU compilation is an "easy" problem; in fact it's probably just about linear time. FPGA synthesis is easy too, but optimal place & route is an NP-hard problem - which means that actually finding an optimal solution is impractical. Instead we have approximate iterative solutions (as used by Vivado), which allow a tradeoff between the time taken (number of iterations) and the likelihood of approaching an optimal result.
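As a toy illustration of that tradeoff (this is not how Vivado works internally, just the general iterative-improvement idea, with made-up cell and net names), in Python:

# Toy placement by random pairwise swaps with an iteration budget.
# More iterations -> better result on average; never guaranteed optimal.
import random

def wirelength(placement, nets):
    # Cost: sum of the bounding-box sizes of each net's cell positions.
    total = 0
    for net in nets:
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def place(cells, nets, grid, iterations):
    sites = [(x, y) for x in range(grid) for y in range(grid)]
    placement = dict(zip(cells, random.sample(sites, len(cells))))
    cost = wirelength(placement, nets)
    for _ in range(iterations):
        a, b = random.sample(cells, 2)                            # pick two cells
        placement[a], placement[b] = placement[b], placement[a]   # swap their sites
        new_cost = wirelength(placement, nets)
        if new_cost <= cost:
            cost = new_cost                                       # keep an improving swap
        else:
            placement[a], placement[b] = placement[b], placement[a]  # undo it
    return placement, cost

cells = ["u0", "u1", "u2", "u3"]
nets = [["u0", "u1"], ["u1", "u2", "u3"]]
print(place(cells, nets, grid=4, iterations=1000)[1])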

 

 

With regards to an FPGA instantly updating to whatever code the user was writing - what purpose would this have? 99.99% of users are never, ever going to be writing HDL code, so there would be nothing for the FPGA to do. Of those who are writing it, very few are going to want it to run immediately. Their code will be for an embedded system, and it'll only make sense to run that on the embedded system.

 

It seems more logical to have the FPGA automatically become an accelerator of whatever sort is required. There are definitely ways to do that, but a full synthesis and implementation run is not the right way to go about it - in the same way that when you start Excel, it doesn't sit there and compile the whole program first. It'd make much more sense to have a selection of accelerator cores (eg. "HTML accelerator", "Javascript accelerator", "the Excellerator", etc) which are pre-compiled and can be loaded on demand. This is very much how DLLs work on desktop PCs. Implementing this on a Zynq is already possible without any help from Xilinx; but nobody's bothered to write those accelerators yet.
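As a sketch of what "loaded on demand" could look like on a Zynq-7000 running Linux, using the xdevcfg interface; the accelerator names and .bin paths here are invented for illustration only:

# Pre-compiled accelerator bitstreams selected at run-time, DLL-style.
# Assumes a Zynq-7000 with the Linux xdevcfg driver; paths are made up.
ACCELERATORS = {
    "html":       "/lib/firmware/html_accel.bin",
    "javascript": "/lib/firmware/js_accel.bin",
    "excel":      "/lib/firmware/excellerator.bin",
}

def load_accelerator(name):
    """Reconfigure the programmable logic with a pre-built bitstream."""
    with open(ACCELERATORS[name], "rb") as src, open("/dev/xdevcfg", "wb") as pl:
        pl.write(src.read())       # PL reconfiguration typically takes tens of ms

load_accelerator("javascript")      # swap in whichever core the workload needs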

Anonymous

Re: re-synthesis required (re-configuration)

I agree with you that it would be neat if the FPGA could have software allowing it to be used as an accelerator.

 

You mention ""HTML accelerator", "Javascript accelerator", "the Excellerator"". I am amazed, because that is exactly what is most needed: web pages are now dynamic, with animations, 3D content, games and other things going on that push the server systems to their limit.

 

Inspired by what you have told me, I am going to try over the next 30 days to build a prototype and post it, to see whether it works to have a stream of JS or HTML or calculation tasks thrown at the FPGA, with the synthesis made in such a way that the routing is redefinable in accordance with the code.

 
