fpgalearner
Voyager
1,080 Views
Registered: ‎04-11-2016

Division in UltraScale+ devices

Hi,

Is there any way to do division more efficiently in UltraScale+ devices?

I have a design running at 125 MHz (8 ns clock period). It performs a large-number division (125000000/115200) that consumes 43 LUTs and 130 CARRY8s and has a 34.087 ns data path delay, which leads to a -26.233 ns setup timing violation.

Regards 

P.S. I do this division to calculate the clocks per bit for a UART, and the value is intended to change on the fly.

11 Replies
richardhead
Scholar
1,045 Views
Registered: ‎08-01-2012

If both the numerator and the denominator are fixed, why bother with the calculation at all? Does one of them ever change? How are you doing the division? Are you using a divider IP core, or simply a/b in the RTL?

If you can convert it into an A * (1/N) calculation, where (1/N) is already calculated in fixed point, it will be much faster and much smaller.

u4223374
Advisor
1,029 Views
Registered: ‎04-26-2015

Pretty much any approach to division will be better than the obvious approach (i.e. the "/" operator). It's easy (and compact) to make a divider that gives you one bit of output per clock cycle. The absolute maximum that you'll generate is a 27-bit number, so you could have this division completed in under 30 cycles. I suspect that for a UART, a ~30 cycle setup delay would be irrelevant (30 cycles at 125 MHz = ~1/36th of a bit at 115200 bps).
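
As a rough illustration of the one-bit-per-cycle idea, here is a software model of a restoring (shift-and-subtract) divider. It is only a sketch in Python, not RTL. The name `serial_divide` and the `width` parameter are made up for this example, and each loop iteration stands in for one clock cycle.

```python
def serial_divide(numerator, denominator, width=27):
    """Model of a restoring divider that resolves one quotient bit per step.

    Assumes 0 <= numerator < 2**width and denominator > 0.
    Returns (quotient, remainder).
    """
    if denominator <= 0:
        raise ZeroDivisionError("denominator must be positive")
    if not 0 <= numerator < (1 << width):
        raise ValueError("numerator does not fit in the given width")

    quotient = 0
    remainder = 0
    # One iteration corresponds to one clock cycle in hardware, MSB first.
    for bit in range(width - 1, -1, -1):
        # Shift the next numerator bit into the running remainder.
        remainder = (remainder << 1) | ((numerator >> bit) & 1)
        # If the denominator fits, subtract it and set this quotient bit.
        if remainder >= denominator:
            remainder -= denominator
            quotient |= 1 << bit
    return quotient, remainder


if __name__ == "__main__":
    # 125 MHz clock, 115200 baud: 27 iterations
    print(serial_divide(125_000_000, 115_200))
```

In hardware, each step would need roughly one shift, one compare and one subtract, so the logic between registers stays short compared with a fully combinatorial a/b.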

 

avrumw
Expert
1,004 Views
Registered: ‎01-23-2009

P.S. I do such division in my design to calculate clocks per bit for UART and is intended to alter on the fly.

There are relatively few "legal" Baud rates for a UART - there are something like 14 of them. Rather than do the division, why not just have a table of the proper divider for each of the valid UART Baud rates?
 
Furthermore, most of the Baud rates are related - there is a whole stream of them that are 300*2^N, so if you have the divisor for 300, then you just divide it by 2^N for the other rates. Taking the 300*2^N ones out, that leaves you with something like 6 others (a few of which are related).
 
A divider seems like overkill here...
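
To make the 300*2^N relationship concrete, here is a small Python sketch. It assumes the 125 MHz clock from the original post and rounds the base divisor to the nearest integer; the names `BASE_DIVISOR` and `divisor_for_power_of_two_rate` are hypothetical. Because the shift truncates, the result can differ by one count from a directly rounded division.

```python
CLOCK_HZ = 125_000_000  # clock frequency from the original post
BASE_BAUD = 300

# Divisor for 300 baud, rounded to the nearest integer (an assumption;
# a truncating division would also be a valid choice).
BASE_DIVISOR = (CLOCK_HZ + BASE_BAUD // 2) // BASE_BAUD


def divisor_for_power_of_two_rate(n):
    """Clocks per bit for a rate of 300 * 2**n, using only a right shift."""
    return BASE_DIVISOR >> n


if __name__ == "__main__":
    for n in range(8):  # 300, 600, 1200, ..., 38400
        baud = BASE_BAUD << n
        print(baud, divisor_for_power_of_two_rate(n))
```

In hardware the shift is just wiring, so only the base divisor (plus the handful of unrelated rates) would need to be stored.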
 
Avrum
fpgalearner
Voyager
957 Views
Registered: ‎04-11-2016

@richardhead Simply a/b in the RTL.
The numerator is fixed and the denominator varies. What do you mean by "where (1/N) is already calculated in fixed point"?
fpgalearner
Voyager
957 Views
Registered: ‎04-11-2016

@u4223374 Can you explain this a little bit?
"The absolute maximum that you'll generate is a 27-bit number"
fpgalearner
Voyager
956 Views
Registered: ‎04-11-2016

@avrumw I am using minicom as a UART terminal for my tests, together with a 16550 UART from TI, and these are the possible baud rates:
300, 1200, 2400, 4800, 9600, 19200, 38400, 56000, 57600, 115200, 128000, 230400, 460800, 500000, 576000, 921600, 1000000, 1152000, 1500000, 2000000, 2500000, 3000000, 3500000, 4000000
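
For reference, the table approach suggested earlier in the thread could be applied to this list roughly as follows. This is a Python sketch rather than RTL; it assumes the 125 MHz clock and rounds each divisor to the nearest integer, and the names `clocks_per_bit` and `DIVISOR_TABLE` are made up for the example. The same values could be generated offline and placed in a small ROM or case statement.

```python
CLOCK_HZ = 125_000_000  # clock frequency from the original post

# Baud rates listed above.
BAUD_RATES = (
    300, 1200, 2400, 4800, 9600, 19200, 38400, 56000, 57600, 115200,
    128000, 230400, 460800, 500000, 576000, 921600, 1000000, 1152000,
    1500000, 2000000, 2500000, 3000000, 3500000, 4000000,
)


def clocks_per_bit(baud):
    """Clock cycles per UART bit, rounded to the nearest integer."""
    return (CLOCK_HZ + baud // 2) // baud


# Precomputed once; at run time only a lookup is needed, no division.
DIVISOR_TABLE = {baud: clocks_per_bit(baud) for baud in BAUD_RATES}

if __name__ == "__main__":
    for baud, divisor in DIVISOR_TABLE.items():
        actual = CLOCK_HZ / divisor
        error_pct = 100.0 * (actual - baud) / baud
        print(f"{baud:>8} baud -> divisor {divisor:>6}, error {error_pct:+.3f}%")
```

The printed error column shows how far each integer divisor lands from the nominal rate, which matters most at the highest baud rates where the divisor is small.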
richardhead
Scholar
941 Views
Registered: ‎08-01-2012

@fpgalearner 

For example, let's say you had N / 4.

Instead of dividing, you can work out what 1/4 is in fixed point (0.01 in binary) and do a multiplication, because a multiply can easily be done in a single clock.
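
A minimal Python sketch of that idea, for a divisor that is known in advance. The helper names `reciprocal_constant` and `divide_by_multiply` and the choice of `frac_bits` are assumptions for illustration. For divisors that are not powers of two the reciprocal is rounded, so the result only matches true integer division when enough fractional bits are used for the input range.

```python
def reciprocal_constant(divisor, frac_bits):
    """Fixed-point value of 1/divisor with frac_bits fractional bits, rounded up."""
    return -(-(1 << frac_bits) // divisor)  # ceil(2**frac_bits / divisor)


def divide_by_multiply(x, recip, frac_bits):
    """Approximate x / divisor as a multiply followed by a right shift."""
    return (x * recip) >> frac_bits


if __name__ == "__main__":
    # N / 4: 0.01 in binary is the integer 1 with 2 fractional bits.
    quarter = reciprocal_constant(4, 2)
    print(divide_by_multiply(20, quarter, 2))

    # A non-power-of-two divisor needs more fractional bits (40 is an
    # arbitrary choice here, not a recommendation).
    recip = reciprocal_constant(115_200, 40)
    print(divide_by_multiply(125_000_000, recip, 40))
```

The reciprocal is computed ahead of time, so the hardware only sees a constant multiplier and a fixed shift.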

fpgalearner
Voyager
885 Views
Registered: ‎04-11-2016

Hi @richardhead, @avrumw and @u4223374,

UltraScale+ devices have DSP blocks. Can I use them for efficient division? If so, is there a Xilinx example for this?

richardhead
Scholar
876 Views
Registered: ‎08-01-2012

@fpgalearner 

The easiest way is to convert your divide into a multiply, as already suggested.

fpgalearner
Voyager
863 Views
Registered: ‎04-11-2016

@richardhead Multiplying the way you mentioned reduced the data path delay from 34.087 ns to 30.662 ns, but that still does not meet the timing requirement.

richardhead
Scholar
846 Views
Registered: ‎08-01-2012

30 ns is a very slow path, especially for a US+. This implies you have no pipelining in a long combinatorial path. 5 ns should be an easily achievable path delay in a US+, with the DSPs able to go much faster than that.
