abyssion
Observer
792 Views
Registered: 06-24-2016

Why "Xilinx" choose to organize FPGAs into multiple SLR?

Dear community,

I managed to create a few designs using multiple SLRs. Since then, I have been wondering, as the subject of this post suggests:

  • Why "Xilinx" choose to developed FPGA chip organized with several SLR?
    • Is it a fundamental limitation linked to the electronic design?
    • Does it make easier the synthesis of larger design?
    • UG872 mentioned the advantages, but why not just creating a large FPGA instead of some virtual FPGA connected with SLL?
  • I do not know if this is the right place to ask, but since I have only used Xilinx devices, I do not know whether Stacked Silicon Interconnect technology is used only by Xilinx or whether other FPGA vendors use it as well. Is this technology common in most high-end FPGAs, or is it a Xilinx-only feature? If so, what are its advantages over the alternatives?

Sorry if this question has already been asked; I did not find any answer using the forum's search function.

Thank you in advance,

Aby

0 Kudos
3 Replies
miker
Xilinx Employee
748 Views
Registered: 11-30-2007

@abyssion 

Please reference the following documents:

  • Xilinx Stacked Silicon Interconnect Technology Delivers Breakthrough FPGA Capacity, Bandwidth, and Power Efficiency White Paper (WP380; v1.2)
  • Xilinx Multi-node Technology Leadership Continues with UltraScale+ Portfolio "3D on 3D" Solutions White Paper (WP472; v1.0)
Please Reply, Kudos, and Accept as Solution.
u4223374
Advisor
722 Views
Registered: 04-26-2015

It's primarily a simple question of available technology. The big Virtex UltraScale+ chips have 35 billion transistors - more than ten times as many as an Intel Xeon Broadwell-E 10-core CPU. Is it possible to build that as a monolithic chip? Probably. Is it possible to do it cost-effectively and reliably? Maybe not.

FPGAs are particularly challenging in that you need all the parts to be the same. With something like a CPU or GPU, you can provide more "stuff" than required (e.g. twelve cores in what will be sold as a "10-core" CPU). If a bit doesn't work, you just disable it and sell the part as a 10-core CPU. With an FPGA, because the timing is absolutely dependent on where every part is on the chip, if you disable a bit of block RAM on one chip and a different bit of block RAM on another chip, the two are no longer compatible - they will need different bitstreams and different layouts to achieve timing. To make this at least a bit more practical, it makes a lot of sense to build smaller chips (which are each less likely to have problems, simply due to the lower complexity), test them, and then put together a bunch of them.

Added to that, the market for such huge chips is probably fairly small. I would expect that the die used in the big Virtex chips is probably also used in smaller Virtex and Kintex chips - but instead of having 2/3/4 of them, there'll be only one. That way Xilinx doesn't have to run a whole production line for a very small volume.

Added to that, having different dies means you can mix and match technology. The sort of chip suitable for ultra-dense, ultra-fast logic is probably not something you want interfacing to the outside world with all of its nasty high voltages, ESD, etc. If you have separate dies then you can use a larger, more robust manufacturing process for the I/O ports and a smaller, higher-performance process for the processing parts.

 

The SSI does make the chip less flexible, but that's not such a bad thing. It simplifies place & route (because the constraints help to define how the process must be done), and it prevents a situation where almost all the chip real estate is occupied with moving data between sections.
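
If you haven't floorplanned an SSI part before, the constraints involved are fairly compact. A minimal Vivado XDC sketch (the pblock name and cell paths here are just placeholders) might look like this:

    # Hard-fence one module into SLR0 with a pblock
    create_pblock pblock_slr0
    resize_pblock [get_pblocks pblock_slr0] -add {SLR0}
    add_cells_to_pblock [get_pblocks pblock_slr0] [get_cells design_top/stage0]

    # Or give the placer a softer hint about which SLR a hierarchy belongs to
    set_property USER_SLR_ASSIGNMENT SLR1 [get_cells design_top/stage1]

Either way, the SLR boundary becomes an explicit constraint the placer can work with, rather than one enormous sea of logic.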

abyssion
Observer
625 Views
Registered: 06-24-2016

@miker thanks a lot for the white papers! A few comments:

WP380

p.2: the title "The Challenge of Interconnecting Multiple FPGAs" assumes that the only way to increase the capacity of current FPGAs is to actually connect multiple FPGAs. Is that the case? What follows in this white paper was essentially my question: the challenge is to increase I/O and resources while at the same time reducing latency. Are multi-FPGA and SLR approaches really the best candidates compared with a bigger monolithic FPGA?

WP472

WP472 says:

"Devices of this logic capacity are physically impossible to achieve in a 20nm monolithic solution, as they would exhibit unacceptable variation in performance and static power consumption"

I understand; this is a technological limitation, then.

Looking at Figure 3, the comparison is difficult (by the way, what does LC refer to? Logic cells?). For example, the Intel Stratix 10 GX 10M achieves 5.5M LEs per die (with 2 dies).

@u4223374 Thanks a lot for your detailed answer!
I understand better now.

"The SSI does make the chip less flexible, but that's not such a bad thing. It simplifies place & route (because the constraints help to define how the process must be done), and it prevents a situation where almost all the chip real estate is occupied with moving data between sections."

Would a design using Pblocks and well constrained on a monolithic device not be just as efficient (or better)?
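
To be concrete, I am thinking of something like the following XDC (the pblock and instance names are only placeholders), which fences a module into a few clock regions of a monolithic part much as an SLR boundary would:

    # Hypothetical pblock on a monolithic device, mimicking an SLR-style fence
    create_pblock pblock_core
    resize_pblock [get_pblocks pblock_core] -add {CLOCKREGION_X0Y0:CLOCKREGION_X1Y3}
    add_cells_to_pblock [get_pblocks pblock_core] [get_cells top/core_inst]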

 

Best,

Aby

0 Kudos