02-08-2021 12:13 AM
I managed to create a few designs using multiple SLRs. Since then, I have been wondering, as the title of the topic suggests:
Sorry if this question has already been asked; I did not find any answer using the search function of the forum.
Thank you in advance,
02-08-2021 03:50 AM
Please reference the following documents:
02-08-2021 05:20 AM
It's primarily a simple question of available technology. The big Virtex UltraScale+ chips have 35 billion transistors - more than ten times as many as an Intel Xeon Broadwell-E 10-core CPU. Is it possible to build that as a monolithic chip? Probably. Is it possible to do it cost-effectively and reliably? Maybe not.
FPGAs are particularly challenging in that you need all the parts to be the same. With something like a CPU or GPU, you can provide more "stuff" than required (e.g. twelve cores in what will be sold as a "10-core" CPU). If a core doesn't work, you just disable it and sell the chip as a 10-core CPU. With an FPGA, because the timing is absolutely dependent on where every part is on the chip, if you disable a bit of block RAM on one chip and a different bit of block RAM on another chip, the two are no longer compatible - they will need different bitstreams and different layouts to achieve timing. To make this at least a bit more practical, it makes a lot of sense to build smaller chips (which are each less likely to have defects, simply due to the lower complexity), test them, and then put a bunch of them together.
Added to that, the market for such huge chips is probably fairly small. I would expect that the die used in the big Virtex chips is probably also used in smaller Virtex and Kintex chips - but instead of having 2/3/4 of them, there'll be only one. That way Xilinx doesn't have to run a whole production line for a very small volume.
Added to that, having different dies means you can mix and match technology. The sort of chip suitable for ultra-dense, ultra-fast logic is probably not something you want interfacing to the outside world with all of its nasty high voltages, ESD, etc. If you have separate dies then you can use a larger, more robust manufacturing process for the I/O ports and a smaller, higher-performance process for the processing parts.
The SSI does make the chip less flexible, but that's not such a bad thing. It simplifies place & route (because the constraints help to define how the process must be done), and it prevents a situation where almost all the chip real estate is occupied with moving data between sections.
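To make the "constraints help define the process" point concrete: in Vivado you can pin logic to a particular SLR, so the inter-die crossings happen at known, registered boundaries. A minimal XDC sketch (the instance names `stage0_i` and `stage1_i` are hypothetical, made up for illustration):

```tcl
# Hypothetical example: assign two pipeline stages to specific SLRs
# so the SLR crossing lands at a known, registered boundary.
set_property USER_SLR_ASSIGNMENT SLR0 [get_cells stage0_i]
set_property USER_SLR_ASSIGNMENT SLR1 [get_cells stage1_i]
```

With the partitioning fixed up front like this, the placer only has to solve each SLR's internal layout rather than a single huge global problem.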
02-09-2021 08:29 PM
@miker thanks a lot for the white papers! A few comments:
p.2: the title "The challenge of interconnecting multiple FPGAs" assumes that the only way to increase the capacity of current FPGAs is to actually connect multiple FPGAs. Is that the case? What follows in this white paper was essentially my question. The challenge is to increase I/O and resources while at the same time reducing latency. Are multi-FPGA and SLR really the best candidates over a bigger monolithic FPGA?
"Devices of this logic capacity are physically impossible to achieve in a 20nm monolithic solution, as they would exhibit unacceptable variation in performance and static power consumption"
I understand, this is a technological limitation then.
When looking at figure 3, the comparison is difficult (by the way, what does LC refer to, logic cells?). For example, the Intel Stratix 10 GX 10M achieves 5.5M LEs per die (for 2 dies).
@u4223374 Thanks a lot for your detailed answer!
I understand better now.
"The SSI does make the chip less flexible, but that's not such a bad thing. It simplifies place & route (because the constraints help to define how the process must be done), and it prevents a situation where almost all the chip real estate is occupied with moving data between sections."
Would a design using pblocks, well constrained on a monolithic device, not be as efficient (or better)?
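For reference, what I mean by "a design using pblocks" is roughly this kind of XDC floorplanning (a sketch only; the pblock name, cell name `core_i`, and SLICE range are hypothetical and would depend on the actual device):

```tcl
# Hypothetical sketch: constrain a module to a rectangular region
# of a monolithic device, mimicking the kind of partitioning that
# SLR boundaries impose on an SSI device.
create_pblock pblock_core
resize_pblock pblock_core -add {SLICE_X0Y0:SLICE_X100Y119}
add_cells_to_pblock pblock_core [get_cells core_i]
```

In other words, manually recreating the partitioning constraints on a single large die, rather than having the SLR boundaries force them.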