

The UltraScale architecture offers many advantages for next-gen wireless designs. Think 5G


Note: The following post is based on the article “Xilinx’s 20-nm UltraScale Architecture Advances Wireless Radio Applications” by Michael Pecot, Wireless Systems Architect at Xilinx, which appears in the hot-off-the-press Xcell Journal, issue 87.


Upcoming 5G wireless communications systems will likely be required to support much wider bandwidths (200 MHz and larger) than the 4G systems used today, along with large antenna arrays, enabled by higher carrier frequencies that make it possible to build much smaller antenna elements. These so-called massive MIMO applications, together with more stringent latency requirements, will increase design complexity by an order of magnitude. Xilinx announced the 20nm UltraScale family of All Programmable devices at the end of last year. This new technology brings many advantages for wireless communications.


First, UltraScale devices consume 10-15% less static power than 7-series devices of the same size and 20-25% less dynamic power for similar designs. There is a performance advantage as well. Even the slowest-speed-grade UltraScale devices support designs with clock rates higher than 500 MHz, while 7-series devices require a mid-speed grade to reach such rates.


Xilinx has also significantly lowered SerDes power consumption in the UltraScale product line. UltraScale SerDes transceivers support 12.5Gbps throughput on even the slowest UltraScale speed grade, which allows JESD204B interfacing at its maximum speed. JESD204B interfaces will soon be available on most DACs and ADCs of interest to wireless system designers.
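To put the 12.5Gbps figure in context, here is a minimal Python sketch of the usual JESD204B lane-rate arithmetic: the serial rate per lane is M x N' x Fs x 10/8 divided by the number of lanes L, where the 10/8 factor accounts for 8b/10b encoding. The converter parameters in the example (two converters, 16-bit words, 491.52 Msps) are assumptions chosen for illustration and do not come from the article.

```python
import math

JESD204B_MAX_LANE_RATE_BPS = 12.5e9  # maximum lane rate cited in the text


def lane_rate_bps(m, n_prime, fs_hz, lanes):
    """Serial rate per lane, including 8b/10b encoding overhead.

    m        -- number of converters sharing the link
    n_prime  -- bits per sample word on the link, including padding
    fs_hz    -- sample rate of each converter on the link
    lanes    -- number of serial lanes (L)
    """
    return m * n_prime * fs_hz * 10 / 8 / lanes


def min_lanes(m, n_prime, fs_hz, max_rate_bps=JESD204B_MAX_LANE_RATE_BPS):
    """Smallest lane count that keeps each lane at or below max_rate_bps."""
    total_bps = lane_rate_bps(m, n_prime, fs_hz, lanes=1)
    return math.ceil(total_bps / max_rate_bps)


if __name__ == "__main__":
    # Hypothetical dual-converter device: M=2, N'=16, Fs=491.52 Msps.
    lanes = min_lanes(2, 16, 491.52e6)
    rate = lane_rate_bps(2, 16, 491.52e6, lanes)
    print(f"{lanes} lanes at {rate / 1e9:.4f} Gbps each")
```

Under these assumed parameters the link needs two lanes. A transceiver limited to a lower lane rate would need more lanes for the same converter, which is why full-rate support on the slowest speed grade matters.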


However, the improvements to the DSP48 slice and block RAMs (BRAMs) have the most impact on radio design architectures. These building blocks are especially useful for implementing radio digital front-end (DFE) applications.


Kintex UltraScale devices contain as many as 5,520 DSP48 slices. That’s almost three times the maximum count of 1,920 available in 7-series FPGAs. The Kintex UltraScale DSP-to-logic ratio is more closely aligned with what is typically required for DFE designs. Kintex UltraScale devices have 8 to 8.5 DSP48 slices per 1K lookup tables (LUTs), while this number is only around six for 7-series devices. In addition, numerous enhancements were made to the DSP48E1 slice used in 7-series devices to produce the UltraScale architecture’s DSP48E2 slice.


All of these DSP-related factors make higher integration levels possible when designing with Kintex UltraScale devices. (See the previous Xcell Daily blog post “The UltraScale DSP48E2: More DSP in every slice.”) For example, you can implement a complete 8Tx/8Rx DFE system with instantaneous bandwidth of 80 to 100 MHz in a single midrange UltraScale FPGA. The same system requires a two-chip solution with 7-series devices, with each chip effectively supporting a 4x4 system. (For a detailed functional description of such designs, read the Xilinx white paper WP445, “Enabling High-Speed Radio Designs with Xilinx All Programmable FPGAs and SoCs”.)



The BRAMs used in UltraScale devices employ two new advancements: hardware data cascading and dynamic power gating. Data multiplexers embedded between every upper and lower adjacent BRAM in a column allow you to build larger memories in a bottom-up fashion without additional use of logic resources. The cascade permits construction of large memories requiring more than one BRAM while simultaneously supporting minimal footprint, higher clock rate, and minimal power.
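The idea behind the cascade can be illustrated with a small behavioral model in Python. This is only a sketch under simplifying assumptions: the class name, the choice of eight blocks of 2K x 16 bits, and the enable counter are all hypothetical, and the model does not represent the actual UltraScale primitive or its timing. It shows the upper address bits selecting a single block in the column, so only that block is enabled for a given access.

```python
class CascadedMemory:
    """Behavioral model of a column of BRAMs joined by a data cascade.

    Hypothetical illustration only: the upper address bits pick one
    block, and only that block is enabled for the access.
    """

    def __init__(self, num_blocks=8, block_depth=2048, width=16):
        self.block_depth = block_depth
        self.mask = (1 << width) - 1
        self.blocks = [[0] * block_depth for _ in range(num_blocks)]
        # Number of accesses for which each block was enabled.
        self.enable_counts = [0] * num_blocks

    def _select(self, addr):
        # Upper address bits choose the block, lower bits the word in it.
        block, offset = divmod(addr, self.block_depth)
        if not 0 <= block < len(self.blocks):
            raise IndexError(f"address {addr} out of range")
        self.enable_counts[block] += 1
        return block, offset

    def write(self, addr, value):
        block, offset = self._select(addr)
        self.blocks[block][offset] = value & self.mask

    def read(self, addr):
        block, offset = self._select(addr)
        return self.blocks[block][offset]


if __name__ == "__main__":
    mem = CascadedMemory()
    mem.write(5000, 0x1234)   # address 5000 falls in block 2
    print(hex(mem.read(5000)), mem.enable_counts)
```

In this model the write and the read each enable only block 2, while the other seven blocks stay idle. In hardware, the block selection and output multiplexing are handled by the embedded cascade multiplexers rather than by fabric logic.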





UltraScale BRAM with Data Cascade



For example, consider a 16K-deep memory storing 16-bit data. In a 7-series device, this memory is best implemented with eight 36Kbit BRAMs configured as 16Kx2-bit to avoid external data multiplexing, which adds logic resources and latency and can negatively impact timing and routing congestion. However, this approach enables all eight BRAMs during every read and write, which increases dynamic operating power. A better solution employs a 2Kx16-bit memory configuration in which only one of the eight BRAMs is enabled for any given read or write operation, which reduces dynamic power by 87.5% relative to the 16Kx2-bit configuration. The cascade feature added to UltraScale BRAMs, together with the new dynamic power-gating capability, makes this preferred design approach possible.
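The 87.5% figure follows from simple arithmetic, sketched below in Python. The helper function is hypothetical, and the calculation rests on a first-order assumption that BRAM dynamic power scales linearly with the number of blocks enabled per access. It ignores the power of the cascade multiplexers and of the routing.

```python
import math


def bram_usage(mem_depth, mem_width, bram_depth, bram_width):
    """Return (total BRAMs, BRAMs enabled per access) for a memory
    built from blocks configured as bram_depth x bram_width."""
    rows = math.ceil(mem_depth / bram_depth)   # blocks stacked in depth
    cols = math.ceil(mem_width / bram_width)   # blocks side by side in width
    # One access touches a single row of blocks across the full width.
    return rows * cols, cols


K = 1024
total_wide, enabled_wide = bram_usage(16 * K, 16, 16 * K, 2)   # 16Kx2-bit
total_deep, enabled_deep = bram_usage(16 * K, 16, 2 * K, 16)   # 2Kx16-bit

# Assumption: dynamic power is proportional to enabled blocks per access.
reduction = 1 - enabled_deep / enabled_wide

if __name__ == "__main__":
    print(f"16Kx2:  {total_wide} BRAMs, {enabled_wide} enabled per access")
    print(f"2Kx16:  {total_deep} BRAMs, {enabled_deep} enabled per access")
    print(f"dynamic power reduction: {reduction:.1%}")
```

Both configurations use eight BRAMs. The difference is that the 2Kx16-bit arrangement enables one block per access instead of eight, so the estimated reduction is 1 - 1/8 = 87.5%.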


The full version of Michael Pecot’s article appears in Xcell Journal, issue 87.