Visitor liu2015@
Registered: 09-06-2018

A question about DSP48E2 1.75X performance in wp486

Hi

In WP486, it says: "As a result, 8 DSP slices here perform 7x2 INT8 multiply-add operations, 1.75X the INT8 deep learning operations compared to competitive devices with the same number of multipliers."

How is the 1.75x calculated?

Xilinx Employee
Registered: 09-18-2018

Re: A question about DSP48E2 1.75X performance in wp486

Hi,

The Xilinx devices discussed in WP486 are UltraScale (US) and UltraScale+ (U+) devices, whose DSP slice has a 27x18-bit multiplier. The 27-bit input is wide enough to perform two INT8 multiply-add operations in one slice. However, only 7 product terms can be accumulated using a single DSP slice, so an additional DSP slice is required to ensure that overflow does not occur. Thus 7 DSP slices each compute two INT8 multiply-add operations, and 1 extra DSP slice contains the overflow. How the two INT8 multiply-add operations are achieved is described on pages 2, 3, and 4 of WP486, in Figure 2 and Figure 3.
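The packing idea can be modeled in a few lines of Python. This is a minimal, hypothetical sketch rather than code from WP486: it assumes signed INT8 inputs `a`, `b`, and a shared input `c`, with `a` shifted left by 18 bits (a 16-bit product plus the 2 spare bits mentioned later in the thread). One wide multiply yields `a*c` in the upper bits and `b*c` in the lower 18 bits; the lower field is sign-extended and subtracted to recover `a*c`. The names `SHIFT`, `sign_extend`, and `packed_mul` are illustrative only.

```python
SHIFT = 18  # assumed spacing between the two packed products


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def packed_mul(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (a*c, b*c) computed with a single wide multiplication."""
    for v in (a, b, c):
        assert -128 <= v <= 127, "inputs must be signed INT8"
    packed = (a << SHIFT) + b          # a*2^18 + b, fits a 27-bit signed input
    assert -(1 << 26) <= packed < (1 << 26)
    product = packed * c               # one multiply: a*c*2^18 + b*c
    low = sign_extend(product, SHIFT)  # b*c, recovered from the lower field
    high = (product - low) >> SHIFT    # a*c, after removing b*c's sign bits
    return high, low


print(packed_mul(-128, 127, -128))  # (16384, -16256)
```

Sign-extending the lower field is what undoes the borrow a negative `b*c` takes from the upper product.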

Competing devices have only 18x19-bit multipliers. To perform two INT8 MACCs concurrently on one DSP slice, at least one of the multiplier inputs must be at least 24 bits wide and the carry accumulator must be at least 32 bits wide. An 18x19-bit multiplier does not meet this requirement, so 2 DSP slices are needed to perform 2 INT8 MACCs on those devices.
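The 24-bit and 32-bit minimums can be reproduced with a rough bit count. This sketch assumes the two products are packed as `a*2^s + b` with the smallest shift `s` that keeps them from overlapping, namely the 16-bit width of a signed INT8 x INT8 product, and it ignores the guard bits needed for accumulation. Because it only counts bits (8 input bits plus the shift, 16 product bits plus the shift), it gives lower bounds, in line with the "at a minimum" wording above. The multiplier widths are the ones quoted in the thread; `packing_widths` is a hypothetical helper.

```python
INT8_BITS = 8
PRODUCT_BITS = 2 * INT8_BITS  # a signed INT8 x INT8 product needs 16 bits


def packing_widths(shift: int) -> tuple[int, int]:
    """Return minimum (packed input width, packed product width) for `shift`."""
    return INT8_BITS + shift, PRODUCT_BITS + shift


min_input, min_acc = packing_widths(PRODUCT_BITS)
print(min_input, min_acc)  # 24 32

# Widest multiplier input quoted in the thread for each device family.
# Only the input-width condition is checked here.
multipliers = {"27x18 (DSP48E2)": 27, "18x19 (competitor)": 19}
for name, widest_input in multipliers.items():
    print(name, "wide enough for two INT8 MACCs:", widest_input >= min_input)
```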

So 7x2 INT8 MACCs require 14 DSP slices on a competing device, compared with only 8 DSP slices on a Xilinx FPGA. That is an improvement of 1.75x (14/8 = 1.75) over competitors.
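The ratio is simply a count of INT8 MACCs per DSP slice. A minimal sketch using only the numbers from this thread (the constant names are illustrative):

```python
PACKED_SLICES = 7              # DSP48E2 slices that each perform two INT8 MACCs
EXTRA_SLICES = 1               # extra slice added to contain overflow
MACCS_PER_PACKED = 2           # INT8 MACCs per packed DSP48E2 slice
MACCS_PER_COMPETITOR_MULT = 1  # an 18x19 multiplier performs one INT8 MACC

xilinx_slices = PACKED_SLICES + EXTRA_SLICES                  # 8
int8_maccs = PACKED_SLICES * MACCS_PER_PACKED                 # 14
competitor_slices = int8_maccs // MACCS_PER_COMPETITOR_MULT   # 14

advantage = competitor_slices / xilinx_slices
print(xilinx_slices, int8_maccs, competitor_slices, advantage)  # 8 14 14 1.75
```

Equivalently, with the same 8 multipliers a competing device performs 8 INT8 MACCs versus 14, which gives the same 14/8 ratio.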

Visitor liu2015@
Registered: 09-06-2018

Re: A question about DSP48E2 1.75X performance in wp486

Hi vkanchan, thanks for the reply.

I already know how the two INT8 multiply-add operations are achieved.

I still do not understand the statement “But only 7 product terms can be accumulated using a single DSP slice”. I think only 4 product terms can be accumulated using a single DSP slice.

There are 2 bits remaining between the lower and upper product terms (Figure 3, WP486), so only 4 product terms can be accumulated.

And how can the extra DSP slice contain the overflow?
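One way to see where a limit of 7 (rather than 4) could come from is a range check on the lower field. This is an illustrative sketch, not an explanation taken from WP486. It assumes, as in the post, an 18-bit lower field (a 16-bit product plus 2 spare bits) interpreted as a signed value. An INT8 x INT8 product lies in [-16256, 16384], so its magnitude is at most 2^14 rather than 2^15; an 18-bit signed field (maximum 131071) therefore holds 7 worst-case products (7 x 16384 = 114688) but not 8 (131072). Python integers are unbounded, so the 48-bit accumulator limit is not modeled, and the helpers `max_terms` and `packed_macc` are hypothetical.

```python
FIELD_BITS = 18  # assumed lower-field width: 16-bit product + 2 spare bits
LOW_MIN = -(1 << (FIELD_BITS - 1))      # -131072
LOW_MAX = (1 << (FIELD_BITS - 1)) - 1   # 131071
PROD_MIN, PROD_MAX = -128 * 127, -128 * -128  # INT8 x INT8 range: -16256..16384


def max_terms() -> int:
    """Largest n such that any sum of n INT8 products fits the lower field."""
    n = 0
    while (n + 1) * PROD_MAX <= LOW_MAX and (n + 1) * PROD_MIN >= LOW_MIN:
        n += 1
    return n


def packed_macc(terms: list[tuple[int, int, int]]) -> tuple[int, int]:
    """Accumulate (sum of a*c, sum of b*c) with one packed multiply per term."""
    acc = sum(((a << FIELD_BITS) + b) * c for a, b, c in terms)
    low = acc & ((1 << FIELD_BITS) - 1)
    if low > LOW_MAX:                    # sign-extend the lower field
        low -= 1 << FIELD_BITS
    return (acc - low) >> FIELD_BITS, low


print(max_terms())  # 7
worst = (-128, -128, -128)             # every product is 16384
print(packed_macc([worst] * 7))        # (114688, 114688): both sums correct
print(packed_macc([worst] * 8))        # (131073, -131072): lower field overflowed
```

Under these assumptions the "2 spare bits means 4 terms" estimate is conservative, because it treats each product as if it could use the full 16-bit signed range.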
