02-18-2019 07:10 PM
02-23-2019 09:18 AM
The Xilinx devices mentioned in WP486 are UltraScale (US) and UltraScale+ (US+) devices, whose DSP slice has a 27x18-bit multiplier. The 27-bit input makes it possible to perform two INT8 multiply-add operations at once. However, only 7 product terms can be accumulated using a single DSP slice. To ensure overflow does not occur, an additional DSP slice is required. Thus 7 DSP slices are used, each computing two INT8 multiply-add operations, plus 1 extra DSP slice to contain overflow. How the two INT8 multiply-add operations are achieved is described on pages 2, 3, and 4 of WP486, with Figure 2 and Figure 3.
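As a rough illustration of the packing idea, here is a minimal Python sketch. It assumes the upper operand is offset by 18 bits from the lower one, as I read Figure 3 of WP486, and it models the accumulator with unbounded Python integers rather than the real DSP slice hardware. The function names (`pack`, `unpack`, `dual_macc`, `max_worst_case_terms`) are made up for this example.

```python
SHIFT = 18  # assumed bit offset between the lower and upper INT8 operands

def pack(a, b):
    """Place signed INT8 value a above signed INT8 value b in one wide operand."""
    return (a << SHIFT) + b

def unpack(acc):
    """Split an accumulated packed result into (sum of a*c, sum of b*c)."""
    low = acc & ((1 << SHIFT) - 1)
    if low >= 1 << (SHIFT - 1):   # reinterpret the lower field as signed
        low -= 1 << SHIFT
    high = (acc - low) >> SHIFT   # removing the lower field undoes its borrow
    return high, low

def dual_macc(a_vals, b_vals, c_vals):
    """Accumulate a*c and b*c together using one wide multiply per term."""
    acc = 0
    for a, b, c in zip(a_vals, b_vals, c_vals):
        acc += pack(a, b) * c
    return unpack(acc)

def max_worst_case_terms(field_bits=SHIFT):
    """Count worst-case INT8 products that fit in a signed field of field_bits."""
    worst = (-128) * (-128)                    # 16384, largest INT8 product
    return ((1 << (field_bits - 1)) - 1) // worst

print(dual_macc([3, -5], [7, 2], [-4, 6]))
print(max_worst_case_terms())
```

Under these assumptions the lower field is an 18-bit signed value, so the worst-case product of 16384 can be accumulated 7 times before the lower sum spills into the upper result.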
The competitor devices have only 18x19-bit multipliers. At a minimum, one of the inputs to the multiplier needs to be at least 24 bits wide and the carry accumulator needs to be 32 bits wide to perform two INT8 MACCs concurrently on one DSP slice. So clearly 2 DSP slices are required to perform 2 INT8 MACCs there.
So 7x2 INT8 MACCs on a competitor device require 14 DSP slices. By comparison, only 8 DSP slices are required on a Xilinx FPGA, which is an improvement of 1.75x (14/8) over competitors.
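The slice count can be checked with a few lines of Python. This is only a sketch of the arithmetic in this post: the figures (two MACCs per packed slice, one extra overflow slice, one MACC per competitor slice) are taken from the explanation above, and the function names are made up for the example.

```python
import math

def packed_slices(n_macc, macc_per_slice=2, overflow_slices=1):
    """DSP slices needed when each slice computes two INT8 MACCs,
    plus the extra slice(s) reserved for overflow."""
    return math.ceil(n_macc / macc_per_slice) + overflow_slices

def unpacked_slices(n_macc):
    """DSP slices needed when each slice computes one INT8 MACC."""
    return n_macc

n_macc = 7 * 2  # 7 product terms for each of the two packed operands
ratio = unpacked_slices(n_macc) / packed_slices(n_macc)
print(packed_slices(n_macc), unpacked_slices(n_macc), ratio)
```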
02-28-2019 11:04 PM
Hi vkanchan, thanks for the reply.
I already know how the two INT8 multiply-add operations are achieved.
I still cannot understand “But only 7 product terms can be accumulated using a single DSP slice”. I think only 4 product terms can be accumulated using a single DSP slice.
There are 2 bits remaining between the lower and upper product terms (Figure 3, WP486), thus only 4 product terms can be accumulated.
And how can the extra DSP slice contain the overflow?