FPGA Advances Compact Spatial Multiplexing Filters for Massive MIMO wireless systems

by Xilinx Employee on 03-07-2017 12:46 PM

 

By Lei Guan, MTS Nokia Bell Labs (lei.guan@nokia.com)

 

Many signal-processing stages in wireless communications, such as equalization and precoding, require linear convolution functions. In particular, complex linear convolution will play a very important role in future-proofing massive MIMO systems through frequency-dependent, spatial-multiplexing filter banks (SMFBs), which enable efficient utilization of the wireless spectrum (see Figure 1). My team at Nokia Bell Labs has developed a compact, FPGA-based SMFB implementation.

 

 


 

Figure 1 - Simplified diagram of SMFB for Massive MIMO wireless communications

 

 

 

Architecturally, linear convolution shares the same structure used for discrete finite impulse response (FIR) filters, employing a combination of multiplications and additions. A direct FPGA implementation of linear convolution may exceed the user's budget for key DSP48 resources, even when using the compact semi-parallel implementation architecture described in “Xilinx FPGA Enables Scalable MIMO Precoding Core” in the Xilinx Xcell Journal, Issue 94.

 

From a signal-processing perspective, the discrete FIR filter describes the linear convolution function in the time domain. Because linear convolution in the time domain is equivalent to multiplication in the frequency domain, an alternative algorithm called “fast linear convolution” (FLC) is a good candidate for FPGA implementation. Unsurprisingly, such an implementation is a game of trade-offs between space and time, that is, between silicon area and latency. In this article, we mercifully skip the math for the FLC operation (you will find many more details in the book “FPGA-based Digital Convolution for Wireless Applications”). Instead, let’s take a closer look at the multi-branch FLC FPGA core that our team created.
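
For readers who want to see the time-domain/frequency-domain equivalence in action, the following is a minimal Python sketch, not the FPGA design itself. It zero-pads both sequences to a power-of-two length, multiplies their FFTs point by point, and checks the result against a direct multiply-and-add convolution. The function names (fft, ifft, direct_convolve, fast_convolve) and the test data are illustrative assumptions.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(X):
    """Inverse FFT with the 1/N normalization applied once at the end."""
    n = len(X)
    return [v / n for v in fft(X, inverse=True)]

def direct_convolve(x, h):
    """FIR-style linear convolution: nested multiply-and-add."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def fast_convolve(x, h):
    """Linear convolution via zero-padded FFT, pointwise product, IFFT."""
    out_len = len(x) + len(h) - 1
    n = 1
    while n < out_len:          # padding avoids circular wrap-around
        n *= 2
    X = fft(list(x) + [0j] * (n - len(x)))
    H = fft(list(h) + [0j] * (n - len(h)))
    Y = [a * b for a, b in zip(X, H)]
    return ifft(Y)[:out_len]

# Small complex test vectors (arbitrary illustrative values).
x = [1 + 2j, -1 + 0.5j, 0.25 - 1j, 2 + 0j, -0.5 - 0.5j]
h = [0.5 + 0j, 0 - 1j, 0.25 + 0.25j]
max_err = max(abs(a - b)
              for a, b in zip(direct_convolve(x, h), fast_convolve(x, h)))
print("results agree:", max_err < 1e-9)
```

The direct version needs len(x) × len(h) complex multiplications, while the FFT version replaces most of them with transforms whose cost grows as N log N. That is the space/time trade-off the paragraph above refers to.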

 

The design targets supplied by the system team included:

 

  • The FLC core should be able to operate on multi-rate LTE systems (5MHz, 10MHz and 20MHz).
  • Each data stream to an antenna pair requires a 160-tap complex asymmetric FIR-type linear convolution filter. For example, if we are going to transmit 4 LTE data streams via 32 antennas, we require 4×32 = 128 FIR filters of 160 taps each.
  • The core should be easily stackable or cascadable.
  • Core latency should be less than one tenth of one time slot of an LTE-FDD radio frame (a slot lasts 0.5ms, so the limit is 50μs).
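
The numbers in these targets can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation. It assumes the 0.5ms LTE slot length and borrows the 30.72MSPS input rate quoted later in this article. The "direct" figure assumes one complex multiply-accumulate per tap per sample for every filter, which is the cost a plain FIR structure would incur.

```python
# Design-target arithmetic (integer math to avoid rounding surprises).
streams = 4                      # LTE data streams in the example
antennas = 32                    # transmit antennas in the example
taps = 160                       # complex taps per filter

filters = streams * antennas     # one filter per stream/antenna pair
slot_us = 500                    # LTE slot duration in microseconds (0.5 ms)
latency_budget_us = slot_us // 10

sample_rate_hz = 30_720_000      # input rate quoted later in the article
budget_samples = sample_rate_hz * latency_budget_us // 1_000_000

# Cost of a direct (time-domain) implementation, in complex MACs.
macs_per_sample = filters * taps
macs_per_second = macs_per_sample * sample_rate_hz

print("filters:", filters)
print("latency budget (us):", latency_budget_us)
print("latency budget (input samples):", budget_samples)
print("direct complex MACs per sample period:", macs_per_sample)
print("direct complex MACs per second:", macs_per_second)
```

At the assumed input rate, the 50μs budget corresponds to 1536 input sample periods, which bounds how much block buffering a frequency-domain design can afford.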

 

Figure 2 shows the top-level design of the resulting FLC core in the Vivado System Generator Environment. Figure 3 illustrates the simplified processing stages at the module level with four branches as an example.

 

 


 

 

Figure 2 - Top level of the FLC core in Xilinx Vivado System Generator

 

 


 

 

Figure 3 - Illustration of multi-branch FLC-core processing (using 4 branches as an example)

 

 

 

The multi-branch FLC-core contains the following five processing stages, isolated by registers for logic separation and timing improvement:

 

  1. InBuffer Stage: This module caches the incoming continuous, slow-rate (30.72MSPS) data stream and reproduces it as bursty data segments at a higher processing rate (e.g., 368.64MSPS). Functions in multiple branches of the later processing stages, such as the FFT, CM and IFFT modules, can then share the DSP48-intensive blocks in a TDM manner, resulting in a very compact implementation. Our FPGA engineer built a very compact buffer based on four dual-port block RAMs, as shown in Figure 4.

 


 

Figure 4 - Simple Dual-Port RAM based input data buffer and reproduce stage
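
To get a feel for why the rate increase makes time-division sharing possible, the following sketch runs a rough cycle budget. It rests on several assumptions that the article does not state explicitly: segments are 512 points long (inferred from the 512-coefficient groups), segments overlap by taps − 1 samples as in an overlap-save scheme, eight branches share one streaming block, and that block consumes one sample per fast clock cycle.

```python
# Rough TDM cycle budget; every structural choice here is an assumption.
slow_rate_hz = 30_720_000        # incoming sample rate from the article
fast_rate_hz = 368_640_000       # example processing rate from the article
fft_size = 512                   # assumed segment length
taps = 160                       # filter length from the design targets
branches = 8                     # assumed number of branches sharing a block

speedup = fast_rate_hz // slow_rate_hz          # fast cycles per input sample
new_samples = fft_size - (taps - 1)             # fresh samples per segment
cycles_available = new_samples * speedup        # fast cycles per segment period
cycles_needed = branches * fft_size             # one cycle per point per branch
fits = cycles_needed <= cycles_available

print("speedup:", speedup)
print("new samples per segment:", new_samples)
print("cycles available / needed:", cycles_available, "/", cycles_needed)
print("shared block keeps up:", fits)
```

Under these assumptions the shared block needs 4096 fast cycles per segment period and has 4236 available, so a single time-multiplexed instance could keep pace with the input stream.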


  2. FFT Stage: To save valuable R&D time at the prototyping stage, we used the existing Xilinx FFT IP-core directly. This core can be easily configured through the provided GUI, and we chose pipelined streaming I/O to minimize the FFT core’s idle processing time. We also selected natural-order output to maintain correct processing for the subsequent IFFT operation.
  3. Complex Multiplication (CM) Stage: After converting the data from the time domain to the frequency domain, we added a complex multiplication processing stage to perform convolution in the frequency domain. We implemented a fully pipelined complex multiplier using three DSP48 blocks at a latency cost of 6 clock cycles. We instantiated a dual-port, 4096-word RAM for storing eight FLC coefficient groups. Each coefficient group contains 512 I&Q frequency-domain coefficients converted by another FFT-core. We implemented multiple parallel complex multiplications using only one high-speed TDM-based CM to minimize DSP48 utilization.
  4. IFFT Stage: This module provides the IFFT function. It was configured similarly to the FFT module.
  5. OutBuffer Stage: At this stage, the processed data streams are interleaved at the data-block level. We passed this high-speed sequential data stream to 8 parallel buffer modules built using dual-port RAMs. Each module buffers and re-assembles the bursty segmental convolution data into a final data stream at the original data rate. Delay lines are required to synchronize the eight data streams.
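
The five stages can be mimicked in software for a single branch. The Python sketch below is a behavioral illustration under stated assumptions, not the FPGA core. It assumes an overlap-save segmentation (the article does not say whether overlap-save or overlap-add is used), uses a small 16-point transform instead of 512 points, and models the three-DSP48 multiplier with the well-known three-real-multiplication rearrangement, which is an assumption about how those DSP48 blocks are used. All function names are hypothetical.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def cmul3(a, b):
    """Complex product using three real multiplications instead of four."""
    k1 = b.real * (a.real + a.imag)
    k2 = a.real * (b.imag - b.real)
    k3 = a.imag * (b.real + b.imag)
    return complex(k1 - k3, k1 + k2)

def overlap_save(x, h, n=16):
    """Filter x with h in n-point segments (n a power of two, n >= len(h))."""
    m = len(h)
    step = n - (m - 1)                        # new samples per segment
    H = fft(list(h) + [0j] * (n - m))         # coefficients transformed once
    padded = [0j] * (m - 1) + list(x)         # zero history for first segment
    padded += [0j] * ((-len(x)) % step)       # pad so the last segment is full
    y = []
    for start in range(0, len(padded) - (m - 1), step):
        seg = padded[start:start + n]                  # InBuffer: overlapped segment
        X = fft(seg)                                   # FFT stage
        Y = [cmul3(a, b) for a, b in zip(X, H)]        # CM stage
        blk = [v / n for v in fft(Y, inverse=True)]    # IFFT stage
        y.extend(blk[m - 1:])                          # OutBuffer: drop aliased part
    return y[:len(x)]

def direct_fir(x, h):
    """Reference: time-domain FIR output, same length as the input."""
    return [sum(h[j] * x[k - j] for j in range(len(h)) if k - j >= 0)
            for k in range(len(x))]

# Deterministic illustrative data: 40 input samples, 5 complex taps.
x = [complex((i * 7) % 5 - 2, (i * 3) % 4 - 1.5) for i in range(40)]
h = [1 + 0j, 0.5 - 0.5j, -0.25 + 1j, 0 + 0.75j, 0.1 - 0.2j]
max_err = max(abs(a - b) for a, b in zip(overlap_save(x, h), direct_fir(x, h)))
print("segmented FLC matches direct FIR:", max_err < 1e-9)
```

In a multi-branch version, the same FFT, CM and IFFT functions would be called once per branch and per segment, each time with a different coefficient group. That reuse is what time-division sharing in hardware exploits.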

 

Table 1 compares the performance of our FLC design with that of a semi-parallel solution. Implemented in Xilinx UltraScale and UltraScale+ FPGAs, our compact FLC core provides a cost-effective, power-efficient, single-chip, frequency-dependent Massive MIMO spatial-multiplexing solution for actual field trials. For more information, please contact the author.

 

 

Table 1.jpg

 

 

About the Author
  • Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.