09-25-2020 02:41 PM
Hi,
Is there a better way to do division on UltraScale+ devices?
I have a design running at 125 MHz (8 ns clock period) that performs a large division (125000000/115200). It consumes 43 LUTs and 130 CARRY8s and has a 34.087 ns data path delay, which leads to a -26.233 ns setup timing violation.
Regards
P.S. I do this division to calculate the clocks per bit for a UART, and the result is intended to change on the fly.
09-26-2020 12:53 AM
If both the numerator and the denominator are fixed, why bother calculating it in hardware at all? Does one of them ever change? How are you doing the division? Are you using a divider IP core, or simply a/b in the RTL?
If you can convert it into an A * (1/N) calculation where (1/N) is precalculated in fixed point, it will be much faster and much smaller.
09-26-2020 05:03 AM
Pretty much any approach to division will be better than the obvious approach (i.e. the "/" operator). It's easy (and compact) to make a divider that gives you one bit of output per clock cycle. The largest value you'll ever handle is a 27-bit number, so the division could complete in under 30 cycles. I suspect that for a UART, a ~30 cycle setup delay would be irrelevant (30 cycles at 125 MHz = ~1/36th of a bit at 115200 bps).
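To make the one-bit-per-cycle idea concrete, here is a minimal software model of a restoring (shift-and-subtract) divider. It is only a sketch: the function name `serial_divide` and the 27-bit default width are my own assumptions for illustration, and each loop pass stands in for one clock cycle of the RTL you would actually write.

```python
def serial_divide(numerator: int, denominator: int, width: int = 27):
    """Restoring shift-subtract division, one quotient bit per loop pass.

    Each pass models one clock cycle of a hypothetical hardware divider.
    Returns (quotient, remainder).
    """
    if denominator <= 0:
        raise ValueError("denominator must be a positive integer")
    if numerator < 0 or numerator >> width:
        raise ValueError("numerator does not fit in `width` unsigned bits")

    remainder = 0
    quotient = 0
    for bit in range(width - 1, -1, -1):      # MSB first, `width` cycles total
        # Shift the next numerator bit into the partial remainder.
        remainder = (remainder << 1) | ((numerator >> bit) & 1)
        quotient <<= 1
        if remainder >= denominator:          # trial subtraction succeeds
            remainder -= denominator
            quotient |= 1
    return quotient, remainder


if __name__ == "__main__":
    # 125 MHz clock, 115200 baud: 1085 clocks per bit, remainder 8000.
    print(serial_divide(125_000_000, 115_200))
```

In hardware the loop body becomes a shift register, one subtractor/comparator and a cycle counter, so the critical path is a single subtraction instead of the whole division.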
09-26-2020 12:59 PM - edited 09-26-2020 01:00 PM
09-27-2020 02:01 PM
09-27-2020 02:02 PM
09-27-2020 02:04 PM
09-27-2020 02:22 PM
For example, let's say you had N / 4.
Instead of dividing, you can work out what 1/4 is in fixed point (binary 0.01) and multiply by that, because a multiply can easily be done in a single clock.
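As a rough sketch of the same trick for a divisor that is not a power of two, the model below precomputes the reciprocal as an unsigned fixed-point constant and replaces the divide with a multiply and a right shift. The names `reciprocal` and `divide_by_constant` and the choice of 32 fractional bits are assumptions for illustration. With a finite number of fractional bits the result can be off by one for some inputs, so check the range you care about before relying on it.

```python
FRAC_BITS = 32  # assumed number of fractional bits in the stored reciprocal


def reciprocal(divisor: int, frac_bits: int = FRAC_BITS) -> int:
    """Return ceil(2**frac_bits / divisor) as an unsigned fixed-point constant.

    Rounding up avoids underestimating 1/divisor, so exact multiples of the
    divisor do not come out one too small.
    """
    if divisor <= 0:
        raise ValueError("divisor must be a positive integer")
    return -(-(1 << frac_bits) // divisor)


def divide_by_constant(n: int, recip: int, frac_bits: int = FRAC_BITS) -> int:
    """Approximate n // divisor with one multiply and one right shift."""
    return (n * recip) >> frac_bits


if __name__ == "__main__":
    # N / 4 with two fractional bits: the reciprocal is binary 0.01, i.e. 1.
    print(reciprocal(4, frac_bits=2))                               # 1
    print(divide_by_constant(100, reciprocal(4, 2), frac_bits=2))   # 25

    # 125 MHz / 115200 baud using a precomputed 1/115200.
    print(divide_by_constant(125_000_000, reciprocal(115_200)))     # 1085
```

If the divisor changes at run time but only between a handful of known values (standard baud rates, say), the reciprocals could be kept in a small lookup table so the multiply is the only arithmetic left in the data path.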
09-28-2020 05:29 AM
Hi @richardhead @avrumw and @u4223374
UltraScale+ devices have DSP blocks. Can I use them for efficient division? If yes, is there a Xilinx example for this?
09-28-2020 05:57 AM
The easiest way is to convert your divide into a multiply, as already suggested.
09-28-2020 07:04 AM
@richardhead Multiplying the way you described reduced the data path delay from 34.087 ns to 30.662 ns, but that still does not meet the timing requirement.
09-28-2020 08:41 AM
30 ns is a very slow path, especially for a US+. This implies you have no pipelining in a long combinatorial path. 5 ns should be an easily achievable path delay in a US+, with the DSPs able to go much faster than that.