The DSP48 slice has been improved significantly in the new Xilinx UltraScale architecture while maintaining backward compatibility with the DSP slice in the Xilinx 7 series All Programmable device generation. A simplified UltraScale DSP48E2 slice looks like this:
Each DSP tile in the UltraScale architecture contains two of these DSP48E2 slices. Key improvements to the slice include:
The multiplier has been expanded to 27x18 bits (from 25x18 in the 7 series’ DSP48E1).
The pre-adder has been expanded to 27 bits (from 25 in the 7 series’ DSP48E1).
The pre-adder can now accept input from either the A or B inputs in addition to the D input.
The output of the pre-adder can be squared.
The ALU now accepts a fourth operand through the W multiplexer.
The new wide XOR block can operate as eight 12-bit XORs, four 24-bit XORs, two 48-bit XORs, or one 96-bit XOR.
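To make the list above a bit more concrete, here is a rough behavioral sketch in Python of the widths involved. It is only an illustration, not the slice's actual logic: the function names are hypothetical, the 48-bit ALU result width is assumed, only a plain four-operand add is modeled, and the squaring path assumes the narrower multiplier port sees just the low 18 bits of the pre-adder result. The wide XOR is modeled generically as one parity bit per segment, without the real pin mapping.

```python
def wrap_signed(value, bits):
    """Wrap an integer into the two's-complement range of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pre_adder(d, operand, subtract=False):
    """27-bit pre-adder: D + operand or D - operand (operand comes from A or B)."""
    return wrap_signed(d - operand if subtract else d + operand, 27)


def multiply(a27, b18):
    """Signed 27x18 multiply; each operand is first wrapped to its port width."""
    return wrap_signed(a27, 27) * wrap_signed(b18, 18)


def square_pre_adder(d, operand):
    """Square the pre-adder result by feeding it to both multiplier ports.

    Assumption: the narrower port sees only the low 18 bits, so the square
    is exact only when D + operand fits in 18 signed bits.
    """
    ad = pre_adder(d, operand)
    return multiply(ad, ad)


def alu_sum(w, x, y, z):
    """Four-operand add wrapped to 48 bits (other ALU modes are omitted)."""
    return wrap_signed(w + x + y + z, 48)


def wide_xor(word96, segment_bits):
    """Return one parity bit per segment of a 96-bit word, lowest segment first."""
    if segment_bits not in (12, 24, 48, 96):
        raise ValueError("segment width must be 12, 24, 48, or 96")
    mask = (1 << segment_bits) - 1
    return [bin((word96 >> (i * segment_bits)) & mask).count("1") & 1
            for i in range(96 // segment_bits)]
```

For example, `square_pre_adder(100, 23)` evaluates (100 + 23) squared, and `wide_xor(word, 12)` returns eight parity bits while `wide_xor(word, 96)` returns one.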
For DSP applications, the UltraScale architecture’s DSP48E2 slice can implement complex 18x18-bit multipliers with three slices (1.5 DSP tiles) and complex 18x27-bit multipliers with four slices (two DSP tiles). The new 96-bit XOR unit lets you use the DSP48E2 slices to build efficient accelerators for ECC, forward error correction (FEC), and CRC computations in a variety of wired and wireless communications applications.
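One well-known way to get a complex multiply out of three real multipliers is to trade the fourth multiply for extra additions, which is the kind of work a pre-adder can absorb. The Python sketch below shows that arithmetic only. Whether the three-slice mapping uses exactly this factorization is an assumption on my part, and the function name is hypothetical.

```python
def complex_multiply_3(a, b, c, d):
    """Return (real, imag) of (a + jb) * (c + jd) using three real multiplies.

    With 18-bit signed inputs, each sum or difference below needs 19 bits,
    which fits the wider (27-bit) multiplier port fed by the pre-adder.
    """
    k1 = c * (a + b)  # pre-add a + b, then multiply by c
    k2 = a * (d - c)  # pre-subtract d - c, then multiply by a
    k3 = b * (c + d)  # pre-add c + d, then multiply by b
    real = k1 - k3    # equals a*c - b*d
    imag = k1 + k2    # equals a*d + b*c
    return real, imag
```

The two final combinations are plain additions and subtractions of products, so no fourth multiplier is needed.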
You cannot see all of these improvements and the many others made to the DSP48E2 slice in the simplified block diagram, so here’s the full-blown block diagram taken from the UltraScale Architecture DSP Slice User Guide, UG579. (I’m not about to explain this block diagram in a blog. Please check out the User Guide for much more detailed information.)
The DSP48E2 slice is one of the significant ways that the UltraScale architecture delivers the fastest DSP processing while consuming fewer routing and CLB resources than ever.