01-02-2021 12:44 AM - edited 01-02-2021 01:13 AM
I'm trying to synthesize my IP, which contains 3 arrays of 32-bit std_logic_vector, for a total of 10,000 entries.
Does that mean I need an FPGA with at least 10,000 * 32 = 320,000 flip-flops?
01-02-2021 01:37 AM - edited 01-02-2021 01:41 AM
If that data is registered I/O (ports), yes, that's the bare minimum just to hold the bits.
If that data is internal, it could be held in BRAM. Such wide I/O is impractical: it causes routing congestion, and routing can fail. It's preferable to serialize (stream) both input and output.
When you compile in HLS you get a resource usage estimate; check that.
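As a rough back-of-the-envelope check of the register-versus-BRAM trade-off above, the sketch below counts the flip-flops needed to hold the data in registers and the number of block RAMs needed to hold the same data. The block geometry (1024 words x 36 bits per block) is an assumption for illustration, not a figure from this thread; check the datasheet of your device for the real configurations.

```python
import math

def flip_flops_needed(words, word_bits):
    """One flip-flop per stored bit, before any control or mux logic."""
    return words * word_bits

def bram_blocks_needed(words, word_bits, bram_depth=1024, bram_width=36):
    """Blocks needed to store the data.

    bram_depth and bram_width are ASSUMED values (hypothetical geometry);
    real devices offer several depth/width configurations.
    """
    blocks_wide = math.ceil(word_bits / bram_width)
    blocks_deep = math.ceil(words / bram_depth)
    return blocks_wide * blocks_deep

# Figures from the question: 10,000 entries of 32 bits.
print(flip_flops_needed(10_000, 32))   # storage held in registers
print(bram_blocks_needed(10_000, 32))  # same data held in assumed block RAMs
```

Under these assumptions the storage itself is a handful of block RAMs. The cost of using BRAM is access bandwidth, since each block only exposes a port or two per cycle.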
01-02-2021 02:53 AM
It is internal. In fact, I serialize the input and output, but keeping everything out of BRAM lets me do all my calculations in 1 cycle. With BRAM I will have to serialize the accesses.
Where can I see the resource usage in Vivado before synthesis?
01-02-2021 03:47 AM - edited 01-02-2021 03:48 AM
Many people insist, "I want my calculations fast, I want them in one cycle." That is latency = 1. Getting things done in 5 ns (the period of a 200 MHz clock) is unnecessary in most cases. Do you really need that? Such a scheme may be impossible or use a ridiculous amount of resources. Most so-called high-speed data processing actually refers to the input and output data rate, that is, the II (initiation interval). You can, for example, feed data at 200 MHz (one item every 5 ns) while each data block takes, say, 200 cycles, so the output comes out with a 1 microsecond delay. That's probably fast enough to steer a hypersonic vehicle. What is your need for speed?
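To make the latency versus II distinction concrete, here is a minimal sketch of the arithmetic in the example above. The function name and its parameters are made up for illustration, and it assumes a fully pipelined design that accepts a new input every `ii_cycles` clock cycles.

```python
def pipeline_timing(clock_mhz, latency_cycles, ii_cycles):
    """Return (period_ns, latency_ns, throughput in million items/s).

    Assumes a fully pipelined design: a new input is accepted every
    ii_cycles cycles, and each result appears latency_cycles later.
    """
    period_ns = 1000.0 / clock_mhz
    latency_ns = latency_cycles * period_ns
    throughput_mitems = clock_mhz / ii_cycles
    return period_ns, latency_ns, throughput_mitems

# Example from the post: 200 MHz clock, 200 cycles per block, II = 1.
period, latency, rate = pipeline_timing(200, 200, 1)
print(period, latency, rate)
```

With these inputs the latency is 1000 ns (1 microsecond), yet the design still accepts one item every 5 ns. Latency and data rate are separate numbers, and usually only the second one has to be high.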
01-02-2021 04:06 AM
Yes Joancab, you are absolutely right. I realized that during this project.
In fact, it is an academic project, and even if we use several thousand cycles for the calculations inside the IP, that is negligible compared to the whole algorithm. Thanks for your help.
However, I can confirm that it just crashes when there are not enough resources LOL
01-02-2021 04:09 AM
By the way, can you confirm that the estimate is roughly 1 flip-flop per bit we want to store? (To choose the right FPGA model for a project.)
01-02-2021 04:19 AM
Many people come to FPGAs with the idea that they are capable and fast, that they can do everything, they beat processors, they have billions of transistors, blah, blah. Marketing, just marketing. FPGAs are capable and fast but, like anything, they have their limitations. One cannot just drop in 1k layers of a DNN with 10k nodes each and expect it to fit and/or route.
The problem with academic projects (not criticizing the academic world) is that on the whiteboard (a blackboard in my day) everything is possible. It takes little extra effort to write 1000 instead of 100. But in practice, in many cases it is not just "I need ten times more of everything". I've made a similar mistake a number of times: starting with a big idea, then realizing the real-world limitations, and eventually moving to a much smaller working model that I would later optimize and gradually improve. That's how prototypes and demonstrations are born and products get developed.
01-02-2021 05:37 PM
@Charlycop That is correct, but the flip-flops are not the problem here. It's like saying "a person is 40cm wide and 30cm long front-to-back, and you can safely stack them ten high (with a 10-storey building), so if we want to design a new city to rival Tokyo (~40M people) we need to find a block of land 1600m * 1200m. That will clearly fit everyone...". It will, as long as nobody ever needs to move. Once you add a place to live for each person, a place to work for each person, roads, train lines, public buildings, etc., each person needs quite a lot more space.
In your case, all the other logic that goes around the flip-flops is going to be the problem. How do you fill the flip-flops? With a data bus from an I/O port to every single flip-flop on the chip? And a way of enabling any 32 out of the 320,000 flip-flops that you're using? How do you get data back out of the flip-flops? If it's ever going to be serialised (e.g. for output), then you're going to have a 32-bit 10,000:1 multiplexer which will have to be spread right across the chip. And then you've got the LUTs and DSPs to actually do the processing on the data. If each of your calculations requires a single 32-bit integer multiply, then that'll require 40,000 DSP slices (10,000 multiplies at about four DSP slices each). This will then require at least four XCVU13P chips, so $160K US.
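The DSP estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the figure of four DSP slices per 32-bit multiply is inferred from the numbers in this post (40,000 slices for 10,000 multiplies), and the 12,288 DSP slices per XCVU13P is an assumption taken from the published product table, so verify it against the current datasheet.

```python
import math

def dsp_estimate(multiplies, dsp_per_multiply, dsp_per_chip):
    """Return (total DSP slices, number of chips needed).

    All three inputs are estimates supplied by the caller; routing,
    LUT and flip-flop limits are NOT modelled here.
    """
    total = multiplies * dsp_per_multiply
    chips = math.ceil(total / dsp_per_chip)
    return total, chips

# 10,000 parallel 32-bit multiplies, ~4 DSP slices each (inferred),
# 12,288 DSP slices per chip (assumed XCVU13P figure).
total, chips = dsp_estimate(10_000, 4, 12_288)
print(total, chips)
```

Doing the same work over several cycles divides the multiplier count accordingly, which is where serialising the calculation pays off.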
@joancab Excellent rant! And something that definitely needed to be said. It's all very well for major governments to demand extreme resources (e.g. I'm sure that if you told the NSA that you can crack RSA encryption, but you need a chip with a million DSP slices, then within a couple of months they'll have built a chip with a million DSP slices). For the rest of us, about 10% of the work is finding a nice algorithm - the other 90% is making compromises on the algorithm (or replacing it entirely), hardware, implementation, etc. until it works.
01-03-2021 05:33 AM
@joancab you're right, but I think this was my mistake: I should have done the calculations beforehand. This project is the way they chose to teach us how to make our ideas fit inside a real FPGA... and clearly it's working.