rudy

Explorer

07-14-2021 02:54 PM

802 Views

Registered:
04-29-2010

80x80 bit multiplication

Hi,

Is there any Xilinx IP core that performs very wide multiplication (such as 80-bit x 80-bit) by breaking it down into several smaller multiplications over multiple clock cycles?

Or is there no such IP, so that we need to manually break down such a wide multiplication into smaller ones over multiple clock cycles ourselves?

8 Replies

markcurry

Scholar

07-19-2021 11:15 AM - edited 07-19-2021 11:17 AM

715 Views

Registered:
09-16-2009

I'd start with just inferring from RTL, and see what that gets you. I'm reasonably confident Vivado will build the multiplier with reasonable efficiency. If it meets your needs, you're done.

Regards

Mark

dpaul24

Scholar

07-19-2021 12:34 PM

692 Views

Registered:
08-07-2014

@rudy ,

*Or is there no such IP, so that we need to manually break down such a wide multiplication into smaller ones over multiple clock cycles ourselves?*

For 7 series FPGAs there is the DSP48E1 slice, and the latest one is the LogiCORE™ DSP Macro.

So 80x80 is definitely very big, and I do not know of any Xilinx macro that wide. I guess you have to break it down at the RTL level and let the tool do the inference.

drjohnsmith

Teacher

07-19-2021 12:43 PM

679 Views

Registered:
07-09-2009

Definitely try the inference.

Just remember to include a good few pipeline registers on the output so the tools can push them back into the DSP block.

Also be careful about resets; best is not to use one.

avrumw

Expert

07-19-2021 03:05 PM

642 Views

Registered:
01-23-2009

The Xilinx IP "Multiply" (mult_gen) will generate multipliers with inputs that are wider than the native DSPs. However, they appear to be limited to 64 bit inputs (for what appears to be no particularly good reason).

The DSP48s are designed to be cascaded to create wider functions. Of particular interest are the Z_MUX options for P >> 17 and PCIN >> 17. The value 17 is significant because the multiplier is a 25x**18** multiplier (where the top bit is the sign bit), so decomposing your inputs on 17-bit boundaries (i.e. in multiples of 2^17) allows you to sum some partial products... This is the basis of the wide multiplication implemented in mult_gen: it is fairly easy to see how you can generate a 25xN multiplier (where N is any value) by cascading and pipelining ceil(N/17) DSP48 cells using the PCIN >> 17 path.

I would start with mult_gen and ask for a 64x64 multiplier, study the connections between the DSP48s (and the OPMODEs) and then extend the structure to 80 bits. A quick configuration of the mult_gen shows that it can implement a 64x64 multiplier with 16 DSP48 cells and a latency of 18 clock cycles...
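
The 17-bit decomposition described above can be checked in software before any RTL is written. The following is a minimal Python sketch of the arithmetic only, assuming unsigned operands; it is not the mult_gen implementation, and the names `wide_multiply` and `CHUNK` are made up for illustration. Each loop iteration stands in for one DSP48 stage: a new partial product is added to the previous accumulator shifted right by 17, and the low 17 bits that fall out are final product bits.

```python
CHUNK = 17  # unsigned slice width for the 18-bit signed port (assumes unsigned operands)


def wide_multiply(a, b, n_bits):
    """Software model of a * b, consuming b in 17-bit slices, LSB first.

    Hypothetical helper for checking the arithmetic. In a real DSP48E1,
    `a` would have to fit the 25-bit signed port (24 bits unsigned).
    """
    stages = -(-n_bits // CHUNK)          # ceil(n_bits / 17)
    mask = (1 << CHUNK) - 1
    acc = 0                               # models the P / PCIN cascade value
    product = 0
    for i in range(stages):
        b_slice = (b >> (i * CHUNK)) & mask
        # New partial product plus the previous stage shifted right by 17.
        acc = a * b_slice + (acc >> CHUNK)
        # The low 17 bits of this stage are final bits of the product.
        product |= (acc & mask) << (i * CHUNK)
    # Whatever remains above the last 17 bits forms the top of the product.
    product |= (acc >> CHUNK) << (stages * CHUNK)
    return product


if __name__ == "__main__":
    a = 0xABCDEF                          # 24-bit operand
    b = (1 << 80) - 12345                 # 80-bit operand
    print(wide_multiply(a, b, 80) == a * b)
```

For N = 80 this uses ceil(80/17) = 5 stages. A full 80x80 multiply would also need the other operand split, which this sketch does not attempt.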

Avrum

markcurry

Scholar

07-19-2021 04:05 PM - edited 07-19-2021 04:06 PM

620 Views

Registered:
09-16-2009

For kicks, I coded up a quick example with RTL inference. Unsigned multiply (80bit*80bit) = 160 bit product.

15 stages of pipeline on the input arguments, and output product.

Code is little more than:

` wire [ _PRODUCT_WIDTH - 1 : 0 ] product = a_selected_sign_extend * b_selected_sign_extend;`

(Plus pipeline registers, not shown)

Result:

Easily hit 500 MHz (0.775 ns slack) (KU15P) (Synthesis)

25 DSP48s. That seems excessive, but I've not really thought it through too much.

As I said, if it meets your needs, just go with simple.
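
The DSP counts quoted in this thread can be sanity-checked with a simple tile count. The sketch below assumes, as a guess rather than anything confirmed by the tools, that the product is split into equal rectangular partial products with one DSP48 per tile. `tile_count` is a hypothetical helper, not a Vivado or Xilinx function.

```python
def tile_count(a_bits, b_bits, tile_a, tile_b):
    """Number of tile_a x tile_b partial products needed to cover a_bits x b_bits.

    Assumption: one DSP48 per partial product, unsigned operands.
    """
    tiles_a = -(-a_bits // tile_a)   # ceil(a_bits / tile_a)
    tiles_b = -(-b_bits // tile_b)   # ceil(b_bits / tile_b)
    return tiles_a * tiles_b


if __name__ == "__main__":
    print(tile_count(80, 80, 17, 17))  # 25
    print(tile_count(64, 64, 17, 17))  # 16
    print(tile_count(80, 80, 24, 17))  # 20
```

Under that assumption, 17x17 tiles give 25 for 80x80 and 16 for 64x64, which matches both figures reported in this thread. Using the full 24-bit unsigned width of a 25x18 multiplier on one operand would give 20. Whether the tools actually tile the product this way is not established here.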

Regards,

Mark

Bernard2154

Newbie

07-21-2021 04:22 AM - edited 07-22-2021 10:05 PM

456 Views

Registered:
07-21-2021

joancab

Teacher

07-21-2021 05:19 AM

447 Views

Registered:
05-11-2015

It's not very difficult to implement any N-bit multiplier. First, you can split any number into the form:

x = Sa + b

Where S is kind of a shift operator (multiplication by 2^m, where m is the width of b), so a is the higher bits and b the lower.

A product of two such numbers becomes:

`(Sa + b)(Sc + d) = SSac + Sad + Sbc + bd`

So, if you divide each number into K pieces (here K = 2), you need K^2 products, which can be done in parallel; you then add them up (this can be pipelined), taking into account the shift indicated by S.
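
The decomposition above can be written out as a short Python model, which works as a reference for checking the arithmetic of an RTL version. This is a minimal sketch assuming unsigned operands and equal-width pieces. The names `split` and `multiply_by_pieces` are made up for illustration, and the default of two 40-bit pieces is just one way to cover 80 bits.

```python
def split(x, k, piece_bits):
    """Split x into k pieces of piece_bits each, least significant first."""
    mask = (1 << piece_bits) - 1
    return [(x >> (i * piece_bits)) & mask for i in range(k)]


def multiply_by_pieces(x, y, k=2, piece_bits=40):
    """Compute x * y from k*k small products, each shifted into place.

    Piece i of x times piece j of y carries a weight of S^(i+j),
    where S = 2**piece_bits.
    """
    xs = split(x, k, piece_bits)
    ys = split(y, k, piece_bits)
    total = 0
    for i, xi in enumerate(xs):
        for j, yj in enumerate(ys):
            # In hardware these k*k products are independent and can run in parallel.
            total += (xi * yj) << ((i + j) * piece_bits)
    return total


if __name__ == "__main__":
    x = (1 << 80) - 3
    y = (1 << 79) + 0x1234567
    print(multiply_by_pieces(x, y) == x * y)
    print(multiply_by_pieces(x, y, k=5, piece_bits=17) == x * y)
```

With K = 2 the loop visits exactly the four terms SSac, Sad, Sbc and bd. Raising K to 5 with 17-bit pieces gives 25 small products.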

Thao25

Newbie

07-30-2021 11:04 PM - edited 08-01-2021 08:56 PM

181 Views

Registered:
07-30-2021