Showing results for 
Show  only  | Search instead for 
Did you mean: 
Registered: ‎06-19-2019

How to multiply two 1024-bit unsigned integers ?

I have to do a multiplication operation of two 1024-bit unsigned integers comming from serial communication (RS232) for the target board of BASYS 2. It seems that simply writing a verilog code multiplying these two numbers by (*) does not synthesize, or more precisely takes very long time to synthesize, at the end I think the resources of my board ( DSP) will not be enough for direct multiplication. What simple method/topology I can use to do such a big multiplication operation for varying inputs at the correct timing? And if I am doing the multiplication operation for smaller numbers at my serial communication system in one STATE, will the new method take more than one state? How will it perform in terms of timing when I am sending my outputs as character by character to the Putty screen ( 8-bit each character)


Edit: Thank you all for your help! untill now I have tried direct multiplication and fully serial add- and -shift multiplicaitons and they did not work. 

This is my target device resources:, it seems that I do not have even 

DSP in my FPGA, and my tries to implement use more than %700 ratio of the area! 

I think that I need a super-compressing method to do this kind of multiplication,

do you think that I can do it with any other algorithim or even using BRAM? 

0 Kudos
5 Replies
Registered: ‎05-21-2015

Two options I know of:

  • You can always implement a multiplication with a state machine.  Split the 1024x1024 multiplication into a polynomial ax^64 + bx^63 + ..., where x=2^16, do the various pieces and add them back together again
  • My Numerical Recipes in C book recommended turning exceptionally large multiplies into Fourier Transforms.  I've been successful using that algorithm from C, although I've never tried it in an FPGA.  I'm not quite certain how well (or poorly) it would work, but it might be worth trying.


Registered: ‎07-09-2009

If you have two number so 1024 bits in , how many bits out are you expecting ?

      answer 2048 , think about how much logic that is going to take as a single adder. ..

       look up booths multipliers,


You could try tw techniques,


a) serial multiplication, e.g.

b) look at splitting the multiplicatoin into parts, 



<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>
Registered: ‎06-25-2010

please use Vivado HLS to write a multiplier IP. you can limit resource by dicrective ALLOCATION, detail at to ug902 ch1 "optimize the design"--"optimize for area". 

0 Kudos
Registered: ‎07-16-2009

Hello mustafaghanim,

Yes you can multiply two 1024-bit numbers in XC3S100E, but please keep in mind that it is one of the smallest FPGAs. If I looked correctly (Table 9 of it contains about 1920 flip-flops. So if you want to multiply two 1024 bit numbers you will need to store them in RAM (preferably BlockRAM since number of LUT is also limited).

As a prove, it can be done, I propose to use clasic pen and paper multiplication ( ). To make it work in FPGA few modification should be done.

First, since Spartan 3 multipliers support 18x18 bits, I would start by splitting 1024bit wide number into 18bit digits. 

Second modification is that I would not store all results of multiplications but only one partial sum. 

The proposed solution will need 4096bit memory (2x 1024-bit operands, 1x 2048 partial sum result), one 18x18 multiplier, one 36bit adder, 2 6bit counter for counting the position of digit and some FSM control logic (There you need to solve issue of cary for addition, so possibly another 6bit counter). 

Now, lets compute expected latency of proposed algorithm, first we need to know the number of digits in each operand (1024/18) which is 57. So for every digit of the second operand, we will need to perform 57 multiplication and addition. Since there is 57 digits in the second operand as well, 57*57 MAC operation will be necessary. 


Registered: ‎07-09-2009

You could also speed it up by using more of the multipliers available , and the ability to fast cascade,
<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>