03-03-2017 10:29 AM
I have a question regarding the Floating-Point IP 7.1 core and the latency parameter for the multiplication and division.
My question is actually very similar to this older post:
where the poster is skeptical that, say, a division can be implemented with 0 latency and few resources.
I understand that many improvements have been made in FPGA architecture since that question was posted in 2011, but I am still curious about the feasibility of a full design with these cores.
I am very interested in having several mathematical operations calculated on the FPGA, and I have previously always used fixed-point logic to save resources and satisfy the precision needs of each operation.
Nevertheless, late requirements are making me reconsider using floating point, and I am therefore curious how realistic it is to have a design on an FPGA with 15+ divisions and 15+ multiplications, in double precision and with 0 latency (combinational only), using the 7.1 IP core. I am actually more interested in low latency than in high throughput.
What are the costs, if actually possible, of using 0 latency?
If cost, and therefore FPGA size and resources, were not a problem (using an xc7vx690, for example), would my maximum clock frequency be limited to a really low value?
Would the resource consumption increase so much that the design won't fit in the FPGA?
I am assuming that it is possible, given that 0 latency can be selected for the division in the IP core GUI and that the post-synthesis functional simulation works properly for a simple example.
But I don't want to continue with the full design before having a better overview of the limitations I will run into.
I've read several (older) papers on low-latency division on an FPGA, which normally report 1-2 clock cycles of latency, and I've searched through the whole IP core documentation, but there doesn't seem to be any information discouraging me from doing this.
I thought I'd post this here as my last filter before proceeding with the full design and simulation using these 0 latency operations.
03-03-2017 11:04 AM
@humtm A low-latency divider will be quite slow. It's quite easy to generate each kind of IP you need and implement them independently to get an estimate of the size and delay; that can probably be done in less than an hour. Division is an inherently sequential process, so it is difficult to speed it up much, and it doesn't even benefit from DSP blocks.
03-05-2017 02:28 AM
I suspect that it'd make more sense to have a sequential divider clocked far higher than the "base" IP. Much lower resource usage, and probably a similar speed.
The combinational one may well have "zero-cycle latency", but when it can only pass timing at 5 MHz it would be more correct to say "200 ns latency". The sequential one might take twenty cycles to run, but if the vastly reduced amount of work per clock cycle means it can run at 100 MHz, then you end up with the same 200 ns latency.
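To make that arithmetic concrete, here is a minimal Python sketch. The function name `latency_ns` is made up for illustration, and the 5 MHz and 100 MHz figures are just the hypothetical numbers from this post, not measured results for the IP core. It assumes a combinational core still costs one full clock period before its result can be registered.

```python
def latency_ns(cycles: int, fmax_mhz: float) -> float:
    """Wall-clock latency in ns for a result that takes `cycles` clock periods."""
    if cycles < 1 or fmax_mhz <= 0:
        raise ValueError("cycles must be >= 1 and fmax_mhz must be positive")
    period_ns = 1000.0 / fmax_mhz  # one clock period in nanoseconds
    return cycles * period_ns


# Assumption: a "zero-latency" (combinational) core is modeled as cycles=1,
# because its output still has to settle within one clock period.
combinational = latency_ns(cycles=1, fmax_mhz=5.0)    # hypothetical 5 MHz
sequential = latency_ns(cycles=20, fmax_mhz=100.0)    # hypothetical 100 MHz

print(f"combinational: {combinational} ns, sequential: {sequential} ns")
```

Both cases come out to the same wall-clock time, so the cycle count alone says little; the real numbers have to come from implementing each variant and reading the timing report.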
03-06-2017 02:13 PM
A sequential divider with "much lower resource usage" would be an iterative one, so in addition to having a large latency it would also have a large "initiation interval", i.e. it is not pipelineable and can only start and complete a new division every N cycles. A pipelined divider would have the same resource usage as a "zero latency" divider (i.e. quite large), but one can start a new division every cycle, and after the initial pipeline fill one gets a new result every cycle. I think it's important to note this distinction.
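The difference can be sketched with a simple cycle-count model in Python. The function name `total_cycles` is invented for illustration; the 20-cycle latency is borrowed from the example earlier in the thread and 15 operations from the original question, so none of these are figures for the actual IP core. The model assumes all operands are available up front and go through a single divider.

```python
def total_cycles(n_ops: int, latency: int, initiation_interval: int) -> int:
    """Cycles until the last of `n_ops` results appears from one divider.

    latency: cycles from starting one operation to its result.
    initiation_interval: cycles between the starts of consecutive operations.
    """
    if n_ops < 1 or latency < 1 or initiation_interval < 1:
        raise ValueError("all arguments must be >= 1")
    # The last operation starts (n_ops - 1) * II cycles after the first,
    # and its result appears `latency` cycles after that.
    return (n_ops - 1) * initiation_interval + latency


# Illustrative numbers only: 15 divisions, 20-cycle latency.
iterative = total_cycles(n_ops=15, latency=20, initiation_interval=20)
pipelined = total_cycles(n_ops=15, latency=20, initiation_interval=1)

print(f"iterative: {iterative} cycles, pipelined: {pipelined} cycles")
```

With these numbers the iterative divider needs 300 cycles and the pipelined one 34, even though a single division has the same 20-cycle latency in both. If only one division result is ever needed at a time, the two behave the same and the smaller iterative core wins on resources.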