rudy
Explorer
802 Views
Registered: ‎04-29-2010

80x80 bit multiplication

Hi, 

Is there a Xilinx IP core that performs very wide multiplications (such as 80-bit x 80-bit) by breaking them down into several smaller multiplications spread over multiple clock cycles?

Or is there no such IP, so that we need to break such wide multiplications down into smaller ones over multiple clock cycles ourselves?

8 Replies
markcurry
Scholar
715 Views
Registered: ‎09-16-2009

I'd start with just inferring from RTL, and see what that gets you.  I'm reasonably confident Vivado will build the multiplier with reasonable efficiency. If it meets your needs, you're done.

Regards

Mark

dpaul24
Scholar
692 Views
Registered: ‎08-07-2014

@rudy ,

Or, there is no such an IP, and we need to manually break down such wide multiplication to smaller ones over multiple clock cycles ourselves?

For 7 series FPGAs there is the DSP48E1 slice, and the latest one is the LogiCORE™ DSP Macro.

So 80x80 is definitely very big and I do not know about any Xilinx macro that wide. I guess you have to break it down at the RTL level and let the tool do the inference.

drjohnsmith
Teacher
679 Views
Registered: ‎07-09-2009

Definitely try the inference.

Just remember to include a good few pipeline registers on the output, so the tools can push them back into the DSP blocks.

Also be careful about reset on those registers; best is not to use one.

 

avrumw
Expert
642 Views
Registered: ‎01-23-2009

The Xilinx IP "Multiply" (mult_gen) will generate multipliers with inputs that are wider than the native DSPs. However, they appear to be limited to 64 bit inputs (for what appears to be no particularly good reason).

The DSP48s are designed to be cascaded to create wider functions. Of particular interest are the Z_MUX options for P >> 17 and PCIN >> 17. The value 17 is significant because the multiplier is a 25x18 multiplier (where the top bit is the sign bit), so decomposing your inputs on 17-bit boundaries (i.e. in multiples of 2^17) allows you to sum the partial products through the cascade. This is the basis of the wide multiplication implemented in mult_gen: it is fairly easy to see how you can build a 25xN multiplier (where N is any value) by cascading and pipelining ceil(N/17) DSP48 cells using the PCIN >> 17 path.
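To make the PCIN >> 17 idea concrete, here is a small software model of it in Python. It is only a sketch of the arithmetic, not RTL and not what mult_gen actually generates: the function name `cascaded_multiply` and the unsigned-only treatment (a up to 24 bits, b cut into 17-bit chunks) are assumptions made for illustration. Each loop iteration stands for one DSP48 stage that computes a * b_chunk plus the previous stage's P shifted right by 17; the low 17 bits of each stage are final product bits.

```python
CHUNK = 17  # unsigned bits usable on the 18-bit signed multiplier port


def cascaded_multiply(a: int, b: int, b_width: int) -> int:
    """Model an unsigned (narrow a) x (wide b) multiply as a DSP-style cascade."""
    mask = (1 << CHUNK) - 1
    n_stages = -(-b_width // CHUNK)  # ceil(b_width / 17) stages
    result = 0
    carry_in = 0  # models the PCIN >> 17 input of each stage
    for i in range(n_stages):
        b_chunk = (b >> (CHUNK * i)) & mask
        p = a * b_chunk + carry_in  # multiply-add done inside one stage
        if i < n_stages - 1:
            # Low 17 bits are finished; the rest moves on to the next stage.
            result |= (p & mask) << (CHUNK * i)
            carry_in = p >> CHUNK
        else:
            # Last stage: all remaining bits belong to the product.
            result |= p << (CHUNK * i)
    return result
```

With a 24-bit a and an 80-bit b this uses ceil(80/17) = 5 stages, and every intermediate P stays well inside a 48-bit accumulator.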

I would start with mult_gen and ask for a 64x64 multiplier, study the connections between the DSP48s (and the OPMODEs) and then extend the structure to 80 bits. A quick configuration of the mult_gen shows that it can implement a 64x64 multiplier with 16 DSP48 cells and a latency of 18 clock cycles...

Avrum

markcurry
Scholar
620 Views
Registered: ‎09-16-2009

For kicks, I coded up a quick example with RTL inference.  Unsigned multiply (80bit*80bit) = 160 bit product.

15 stages of pipeline on the input arguments, and output product.

Code is little more than:

 

 wire [ _PRODUCT_WIDTH - 1 : 0 ] product = a_selected_sign_extend * b_selected_sign_extend;

 

(Plus pipeline registers, not shown)

Result:

Easily hit 500 MHz with 0.775 ns slack on a KU15P (synthesis timing estimate).

25 DSP48s.  That seems excessive, but I've not really thought it through too much.
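One rough way to sanity-check that count is to tile the 80x80 product with unsigned sub-multipliers and count the tiles. The sketch below is a back-of-the-envelope estimate only; the helper name `tile_count` is hypothetical, and it says nothing about what Vivado actually built. It assumes the DSP48E2 in UltraScale+ parts has a 27x18 signed multiplier, so 26x17 bits are usable for unsigned operands.

```python
import math


def tile_count(a_bits: int, b_bits: int, tile_a: int, tile_b: int) -> int:
    """Number of tile_a x tile_b unsigned multipliers needed to cover a_bits x b_bits."""
    return math.ceil(a_bits / tile_a) * math.ceil(b_bits / tile_b)


# Symmetric split on 17-bit boundaries for both operands: 5 x 5 tiles.
symmetric = tile_count(80, 80, 17, 17)

# Using the wider port for one operand: ceil(80/26) x ceil(80/17) = 4 x 5 tiles.
asymmetric = tile_count(80, 80, 26, 17)
```

A symmetric 17-bit split gives 25 tiles, which happens to match the reported count, while using the wider port on one operand would need 20. Whether that is the reason for the 25 DSP48s here is not established.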

As I said, if it meets your needs, just go with simple.

Regards,

Mark

Bernard2154
Newbie
456 Views
Registered: ‎07-21-2021

I am pretty much pleased with your good work. You put really very helpful information. Keep it up. Keep blogging.

 

joancab
Teacher
447 Views
Registered: ‎05-11-2015

It's not very difficult to implement any N-bit multiplier. First, you can split any number into the form:

x = Sa + b

Where S is kind of a shift operator, so a is the higher bits and b the lower.

A product of two such numbers becomes:

(Sa + b)(Sc + d) = SSac + Sad + Sbc + bd

So, if you divide each number into K pieces (here K = 2) you need K^2 products that can be done in parallel and then you add them up (can be pipelined) taking into account the shift indicated by S.
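The decomposition can be checked with a short software model in Python. This is a sketch of the arithmetic only, not an RTL implementation; the function name `split_multiply` and its parameters are made up for illustration. Each operand is cut into K equal pieces, all K^2 piece products are formed, and each is added in at the shift given by the positions of its two pieces.

```python
def split_multiply(x: int, y: int, width: int, k: int) -> int:
    """Multiply two unsigned width-bit numbers using k pieces per operand."""
    piece = -(-width // k)  # bits per piece, ceil(width / k)
    mask = (1 << piece) - 1
    xs = [(x >> (piece * i)) & mask for i in range(k)]  # low piece first
    ys = [(y >> (piece * j)) & mask for j in range(k)]
    total = 0
    for i, xi in enumerate(xs):
        for j, yj in enumerate(ys):
            # k*k independent small products; the shift plays the role of S^(i+j).
            total += (xi * yj) << (piece * (i + j))
    return total
```

For 80-bit operands, k = 2 gives four 40x40 products (the SSac + Sad + Sbc + bd case above), and k = 5 gives twenty-five 16x16 products. In hardware the small products could run in parallel, with the additions pipelined.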

Thao25
Newbie
181 Views
Registered: ‎07-30-2021

Multiplication can be done exactly as with decimal numbers, except that you have only two digits (0 and 1). The only number facts to remember are that 0*1=0 and 1*1=1 (this is the same as a logical "and"). In this case the result was 7 bits, which can be extended to 8 bits by adding a 0 at the left.
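The schoolbook method described here can be written as a shift-and-add loop. The Python below is a minimal sketch for non-negative integers; the function name `shift_add_multiply` is chosen for illustration. Each 1 bit of the multiplier adds a shifted copy of the multiplicand, which is the bitwise "and" of that bit with every bit of a.

```python
def shift_add_multiply(a: int, b: int) -> int:
    """Schoolbook binary multiplication of two non-negative integers."""
    product = 0
    shift = 0
    while b:
        if b & 1:
            # This multiplier bit is 1: add the multiplicand at this position.
            product += a << shift
        b >>= 1
        shift += 1
    return product
```

This processes one multiplier bit per iteration, so a direct hardware version would take 80 steps for an 80-bit multiplier.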

 
