10-04-2016 11:57 AM
using Spartan 6 - 100.
1- Is it possible to do a 16-bit division in one clock cycle (no pipelining; the result needs to be ready in the next clock cycle)?
2- Is it possible to do an 8-bit division in one clock cycle (no pipelining; the result needs to be ready in the next clock cycle)?
PS: The numbers are random 16-bit values, not constants. I need a true 16-bit division.
10-04-2016 12:15 PM
10-04-2016 12:32 PM
You can use a large lookup table (or tables) to get an answer in one clock cycle. Two 8-bit numbers give 16 bits of address (65,536 possible entries), so for an 8-bit result that is 64K bytes of BRAM used as a giant lookup table (requires 4 36k bit BRAMS).
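To make the addressing concrete, here is a rough software model of such a table in Python. It is only a sketch: the names (`build_div_rom`, `lut_divide`, `DIV_BY_ZERO`), the choice to put the dividend in the upper address byte, and the value returned for division by zero are all assumptions of mine, not anything a particular tool or core prescribes.

```python
# Software model of a 64K-entry division ROM for two 8-bit operands.
# All names and the address layout are illustrative assumptions.
DIV_BY_ZERO = 0xFF  # assumed convention; pick whatever suits the design


def build_div_rom():
    """Precompute dividend // divisor for every 16-bit address."""
    rom = bytearray(1 << 16)  # 65,536 entries, one byte each
    for dividend in range(256):
        for divisor in range(256):
            addr = (dividend << 8) | divisor  # {dividend, divisor}
            rom[addr] = dividend // divisor if divisor else DIV_BY_ZERO
    return rom


DIV_ROM = build_div_rom()


def lut_divide(dividend, divisor):
    """One 'read' of the ROM: the quotient comes straight out of the table."""
    return DIV_ROM[((dividend & 0xFF) << 8) | (divisor & 0xFF)]


print(len(DIV_ROM))        # 65536
print(lut_divide(200, 7))  # 28
```

In hardware the same contents would be loaded as the BRAM initialization data, and the read takes one clock.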
In the early days of FPGA devices, many tricks like the one I described above were used. Another one is to use a lookup table to create the 1/divisor value, and multiply that by the dividend (use a DSP48 to get the result in one clock). This uses less BRAM, but requires two clocks (use the PLL to double the system clock to get back to one cycle).
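A minimal Python sketch of the reciprocal trick for the 8-bit case follows. The scaling (reciprocal stored as ceil(2^16 / divisor), product shifted right by 16) is my assumption about how one might do it, and the names `RECIP_ROM`, `recip_divide` and `SHIFT` are made up for illustration. The final line checks the result against true integer division for every 8-bit operand pair, since a badly chosen scale factor would give off-by-one quotients.

```python
# Model of "look up 1/divisor, then multiply by the dividend" for 8-bit operands.
# SHIFT and all names are illustrative assumptions, not a vendor recipe.
SHIFT = 16


def build_recip_rom():
    """256-entry table of ceil(2**SHIFT / d); entry 0 is a placeholder."""
    return [0] + [((1 << SHIFT) + d - 1) // d for d in range(1, 256)]


RECIP_ROM = build_recip_rom()


def recip_divide(dividend, divisor):
    """Clock 1: table read. Clock 2: multiply and drop the fraction bits."""
    recip = RECIP_ROM[divisor]          # widest entry is 65536 (17 bits) for divisor 1
    return (dividend * recip) >> SHIFT  # divisor 0 yields 0 here (placeholder)


# Exhaustive check against true integer division for all non-zero divisors.
print(all(recip_divide(n, d) == n // d
          for n in range(256) for d in range(1, 256)))  # True
```

The table here has only 256 entries instead of 65,536, which is where the BRAM saving comes from. Wider operands need a wider reciprocal and a fresh check of the rounding.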
10-04-2016 12:51 PM
Last time I checked, 64K bytes took 16 36K BRAMs, although this is still reasonable in a newer device. What you can do in a single clock cycle obviously depends on the clock period. Large BRAM arrays can result in long route lengths, so it's not just a matter of the BRAM's max clock rate.
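The arithmetic behind that count, as a quick sketch. It assumes a 36K block is used in its byte-wide 4K x 9 configuration (8 data bits plus 1 parity bit per location); check the block RAM user guide for your device family, since the block size and available aspect ratios differ between families.

```python
# Block count for a 64K x 8 lookup table.
# BRAM_DEPTH_X9 is an assumption: one 36K block configured as 4K x 9.
TABLE_ENTRIES = 1 << 16  # 16 address bits
BRAM_DEPTH_X9 = 4096     # byte-wide locations per block (assumed)

blocks_needed = -(-TABLE_ENTRIES // BRAM_DEPTH_X9)  # ceiling division
bytes_in_four_blocks = 4 * BRAM_DEPTH_X9

print(blocks_needed)         # 16
print(bytes_in_four_blocks)  # 16384, i.e. 16K bytes rather than 64K
```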
10-04-2016 05:29 PM - edited 10-04-2016 05:29 PM
It depends on clock frequency.
If the clock frequency is low enough, say under 1 MHz, it works fine without a lookup table or pipelining.
So I recommend clarifying your requirements first.
10-05-2016 02:26 AM
As @watari has said, with a low enough clock speed you can definitely do 16-bit division in one clock cycle. You have two options: either you can do it with combinational logic or with faster-clocked sequential logic.
Combinational logic has the advantage that it's simple. I'm not sure about ISE, but Vivado can actually turn the "/" operator into a combinational divider - which greatly simplifies implementation. However, it's going to be a pretty substantial bit of hardware.
A sequential divider running at a higher clock speed (e.g. 20 MHz, so that a division completes within one 1 MHz cycle) would be much smaller, but somewhat more difficult to implement (due to the need for multiple clocks). One option might be to run the whole design at 20 MHz and artificially delay the outputs so it looks like 1 MHz.
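For reference, here is a Python model of one common sequential scheme, restoring (shift-and-subtract) division, which produces one quotient bit per fast clock. I'm assuming this particular algorithm for illustration; the function name and the `width` parameter are mine. With 16-bit operands it takes 16 iterations, which fits inside the 20 fast cycles available in the example above.

```python
# Bit-serial restoring divider model: one loop iteration per fast clock cycle.
# The algorithm choice and all names are illustrative assumptions.
def restoring_divide(dividend, divisor, width=16):
    """Return (quotient, remainder) for unsigned operands, divisor != 0."""
    quotient = 0
    remainder = 0
    for i in reversed(range(width)):  # MSB first
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Trial subtraction: keep the result only if it does not go negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder


print(restoring_divide(50000, 123))  # (406, 62)
```

In hardware each iteration is just a shift, a subtractor and a mux, which is why this is so much smaller than a fully combinational divider.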
Of course, by far the most preferable option is to just write the code so that single-cycle division isn't required. Either remove division entirely or set up the algorithm to allow multiple-cycle delays.