The multi-GHz processing capabilities of Xilinx FPGAs never fail to amaze me, and the following video from National Instruments (NI) demonstrating the real-time signal-generation and analysis capabilities of the NI PXIe-5840 VST (Vector Signal Transceiver) is merely one more proof point. The NI VST is designed for use in a wide range of RF test systems including 5G and IoT RF applications, ultra-wideband radar prototyping, and RFIC testing. In the demo below, this 2nd-generation NI VST is generating an RF signal spanning 1.2GHz to 2.2GHz (1GHz of analog bandwidth) containing five equally spaced LTE channels. The analyzer portion of the VST is simultaneously and in real time demodulating and decoding the signal constellations in two of the five LTE channels.
The resulting analysis screen generated by NI’s LabVIEW software tells the story:
The NI PXIe-5840 VST can perform all of these feats in real time because there’s a Xilinx Virtex-7 690T FPGA inside pulling the levers. (NI’s 1st-generation VSTs employed Xilinx Virtex-6 FPGAs.)
Here's the 2-minute video of the NI VST demo:
Please contact National Instruments directly for more information on its VST family.
For additional blogs about NI’s line of VSTs, see:
You want to learn how to design with and use RF, right? Students from all levels and backgrounds looking to improve their RF knowledge will want to take a look at the new ADALM-PLUTO SDR USB Learning Module from Analog Devices. The $149 USB module has an RF range of 325MHz to 3.8GHz with separate transmit and receive channels and 20MHz of instantaneous bandwidth. It pairs two devices that seem made for each other: an Analog Devices AD9363 Agile RF Transceiver and a Xilinx Zynq Z-7010 SoC.
Analog Devices’ $149 ADALM-PLUTO SDR USB Learning Module
Here’s an extremely simplified block diagram of the module:
Analog Devices’ ADALM-PLUTO SDR USB Learning Module Block Diagram
However, the learning module’s hardware is of little use without training material, and Analog Devices has already created dozens of online tutorials and teaching materials for this device. Topics include ADS-B aircraft position, receiving NOAA and Meteor-M2 weather satellite imagery, GSM analysis, listening to TETRA signals, and pager decoding.
Pentek has published the 10th edition of “Putting FPGAs to Work in Software Radio Systems,” a 90-page tutorial written by Rodger H. Hosking, Pentek’s Vice-President & Cofounder. As the preface of this tutorial guide states:
“FPGAs have become an increasingly important resource for software radio systems. Programmable logic technology now offers significant advantages for implementing software radio functions such as DDCs (Digital Downconverters). Over the past few years, the functions associated with DDCs have seen a shift from being delivered in ASICs (Application-Specific ICs) to operating as IP (Intellectual Property) in FPGAs.
“For many applications, this implementation shift brings advantages that include design flexibility, higher precision processing, higher channel density, lower power, and lower cost per channel. With the advent of each new, higher-performance FPGA family, these benefits continue to increase.
“This handbook introduces the basics of FPGA technology and its relationship to SDR (Software-Defined Radio) systems. A review of Pentek’s GateFlow FPGA Design Resources is followed by a discussion of features and benefits of FPGA-based DDCs. Pentek SDR products that utilize FPGA technology and applications based on such products are also presented.”
Pentek has long used Xilinx All Programmable devices in its board-level products and that long experience shows in some unique, multi-generational analysis of the performance improvements in Xilinx’s Virtex FPGA generations starting with the Virtex-II Pro family (introduced in 2002) and moving through the Virtex-7 device family.
By Dr. Rajan Bedi, Spacechips
Several of my satcom ground-segment clients and I are considering Xilinx's recently announced RFSoC for future transceivers and I want to share the benefits of this impending device. (Note: For more information on the Xilinx RFSoC, see “Xilinx announces RFSoC with 4Gsamples/sec ADCs and 6.4Gsamples/sec DACs for 5G, other apps. When we say ‘All Programmable,’ we mean it!”)
Direct RF/IF sampling and direct DAC up-conversion are currently being used very successfully in-orbit and on the ground. For example, bandpass sampling provides flexible RF frequency planning with some spacecraft by directly digitizing L- and S-band carriers to remove expensive and cumbersome superheterodyne down-conversion stages. Today, many navigation satellites directly re-construct the L-band carrier from baseband data without using traditional up-conversion. Direct RF/IF Sampling and direct DAC up-conversion have dramatically reduced the BOM, size, weight, power consumption, as well as the recurring and non-recurring costs of transponders. Software-defined radio (SDR) has given operators real scalability, reusability, and reconfigurability. Xilinx's new RFSoC will offer further hardware integration advantages for the ground segment.
The Xilinx RFSoC integrates multi-Gsamples/sec ADCs and DACs into a 16nm Zynq UltraScale+ MPSoC. At this geometry and with this technology, the mixed-signal converters draw very little power and economies of scale make it possible to add a lot of digital post-processing (Small A/Big D!) to implement functions such as DDC (digital down-conversion), DUC (digital up-conversion), AGC (automatic gain control), and interleaving calibration.
While CMOS scaling has improved ADC and DAC sample rates, which results in greater bandwidths at lower power, the transconductance of transistors and the size of the analog input/output voltage swing are reduced for analog designs, which impacts G/T at the satellite receiver. (G/T is antenna gain-to-noise-temperature, a figure of merit in the characterization of antenna performance where G is the antenna gain in decibels at the receive frequency and T is the equivalent noise temperature of the receiving system in kelvins. The receiving system’s noise temperature is the summation of the antenna noise temperature and the RF-chain noise temperature from the antenna terminals to the receiver output.)
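The G/T definition above reduces to a one-line calculation. Here is a minimal sketch of it. The function name and the example numbers (a 40dBi antenna, 50K antenna noise temperature, 150K RF-chain noise temperature) are hypothetical values chosen only for illustration; they are not figures for any particular ground station.

```python
import math

def g_over_t_db_per_k(antenna_gain_dbi, antenna_noise_temp_k, rf_chain_noise_temp_k):
    """G/T in dB/K: antenna gain in dB minus 10*log10 of the system noise temperature."""
    # System noise temperature is the sum of antenna and RF-chain contributions.
    system_noise_temp_k = antenna_noise_temp_k + rf_chain_noise_temp_k
    return antenna_gain_dbi - 10.0 * math.log10(system_noise_temp_k)

# Hypothetical example values, for illustration only.
example_g_over_t = g_over_t_db_per_k(40.0, 50.0, 150.0)
print(f"G/T = {example_g_over_t:.2f} dB/K")  # about 17 dB/K for these inputs
```

Doubling the system noise temperature in this sketch costs about 3dB of G/T, which is why any loss of front-end performance shows up directly in this figure of merit.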
Integrating ADCs and DACs with Xilinx's programmable MPSoC fabric shrinks physical footprint, reduces chip-to-chip latency, and completely eliminates the external digital interfaces between the mixed-signal converters and the FPGA. These external interfaces typically consume appreciable power. For parallel-I/O connections, they also need large amounts of pc board space and are difficult to route.
There will be a number of devices in the Xilinx RFSoC family, each containing different ADC/DAC combinations targeting different markets. Depending on the number of integrated mixed-signal converters, Xilinx is predicting a 55% to 77% reduction in footprint compared to current discrete implementations using JESD204B high-speed serial links between the FPGA and the ADCs and DACs, as illustrated below. Integration will also benefit clock distribution both at the device and system level.
Figure 1: RFSoC device concept (Source Xilinx)
The RFSoC’s integrated 12-bit ADCs can each sample up to 4Gsamples/sec, which offers flexible bandwidth and RF frequency-planning options. The analog input bandwidth of each ADC appears to be 4GHz, which allows direct RF/IF sampling up to the S-band.
Direct RF/IF sampling obeys the bandpass Nyquist Theorem when oversampling at 2x the information bandwidth (or greater) and undersampling the absolute carrier frequencies. For example, the spectrum below shows a 48.5MHz L-band signal centered at 1.65GHz, digitized using an undersampling rate of 140.5Msamples/sec. The resulting oversampling ratio is 2.9 with the information located in the 24th Nyquist zone. Digitization aliases the bandpass information to the first Nyquist zone, which may or may not be baseband depending on your application. If not, the RFSoC's integrated DDC moves the alias to dc, allowing the use of a low-pass filter.
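The numbers in this example can be checked with a few lines of arithmetic. The sketch below uses the carrier, bandwidth, and sample rate quoted above; the helper function names are my own, and the spectral-inversion remark relies on the usual convention that even-numbered Nyquist zones fold into the first zone mirrored.

```python
def nyquist_zone(frequency_hz, sample_rate_hz):
    """Return the 1-based Nyquist zone that contains frequency_hz."""
    return int(frequency_hz // (sample_rate_hz / 2)) + 1

def alias_frequency(frequency_hz, sample_rate_hz):
    """Return where frequency_hz lands in the first Nyquist zone (0 to fs/2)."""
    folded = frequency_hz % sample_rate_hz
    return folded if folded <= sample_rate_hz / 2 else sample_rate_hz - folded

sample_rate_hz = 140.5e6   # undersampling rate from the text
carrier_hz = 1.65e9        # L-band center frequency from the text
bandwidth_hz = 48.5e6      # information bandwidth from the text

zone = nyquist_zone(carrier_hz, sample_rate_hz)
alias_hz = alias_frequency(carrier_hz, sample_rate_hz)
oversampling_ratio = sample_rate_hz / bandwidth_hz

# The whole band must sit inside one zone, or the band edges fold onto the signal.
band_in_one_zone = (nyquist_zone(carrier_hz - bandwidth_hz / 2, sample_rate_hz)
                    == nyquist_zone(carrier_hz + bandwidth_hz / 2, sample_rate_hz))
spectrum_inverted = zone % 2 == 0  # even zones alias with the spectrum mirrored

print(zone, alias_hz / 1e6, round(oversampling_ratio, 1), band_in_one_zone, spectrum_inverted)
```

For these values the carrier falls in zone 24 and aliases to 36MHz, which is not at dc, so this is a case where a DDC stage would be needed to bring the signal to baseband.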
Figure 2: Direct L-Band Sampling
As the sample rate increases, the noise spectral density spreads across a wider Nyquist region with respect to the original signal bandwidth. Each time the sampling frequency doubles, the noise spectral density decreases by 3dB as the noise re-distributes across twice the bandwidth, which increases dynamic range and SNR. Understandably, operators want to exploit this processing gain! A larger oversampling ratio also moves the aliases further apart, relaxing the specification of the anti-aliasing filter. Furthermore, oversampling increases the correlation between successive samples in the time-domain, allowing the use of a decimating filter to remove some samples and reduce the interface rate between the ADC and the FPGA.
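The 3dB-per-doubling rule follows from the commonly used processing-gain expression 10·log10(fs / (2·BW)). Here is a minimal sketch of that expression, assuming noise that is spread evenly across the Nyquist band; the function name is my own and the sample rates are simply the earlier L-band example rate and its double.

```python
import math

def processing_gain_db(sample_rate_hz, signal_bandwidth_hz):
    """SNR improvement from filtering out noise outside the signal bandwidth,
    assuming the noise is spread evenly from dc to fs/2."""
    return 10.0 * math.log10((sample_rate_hz / 2.0) / signal_bandwidth_hz)

signal_bandwidth_hz = 48.5e6
gain_at_fs = processing_gain_db(140.5e6, signal_bandwidth_hz)
gain_at_2fs = processing_gain_db(281.0e6, signal_bandwidth_hz)

# Doubling the sample rate buys 10*log10(2), or roughly 3 dB.
print(f"extra gain from doubling fs: {gain_at_2fs - gain_at_fs:.2f} dB")
```

The gain only materializes after a digital filter removes the out-of-band noise, which is one reason the decimating filter mentioned above matters.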
The RFSoC’s integrated 14-bit DACs operate up to 6.4Gsamples/sec, which also offers flexible bandwidth and RF frequency-planning options.
Just like any high-frequency, large bandwidth mixed-signal device, designing an RFSoC into a system requires careful consideration of floor-planning, front/back-end component placement, routing, grounding, and analog-digital segregation to achieve the required system performance. The partitioning starts at the die and extends to the module/sub-system level with all the analog signals (including the sampling clock) typically on one side of an ADC or DAC. Given the RFSoC's high sampling frequencies, at the pcb level, analog inputs and outputs must be isolated further to prevent crosstalk between adjacent channels and clocks, and from digital noise.
At low carrier frequencies, the performance of an ADC or DAC is limited by its resolution and linearity (DNL/INL). However, at higher signal frequencies, SNR is determined primarily by the sampling clock’s purity. For direct RF/IF applications, minimizing jitter will be key to achieving the desired performance as shown below:
Figure 3: SNR of an ideal ADC vs analog input frequency and clock jitter
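Curves like these are usually generated from the standard approximation SNR = -20·log10(2π·f_in·t_jitter) for the jitter-limited case, combined with the ideal quantization limit of 6.02N + 1.76 dB. The sketch below assumes those two textbook expressions; the function names and the 100fs RMS jitter value are illustrative assumptions, not RFSoC specifications.

```python
import math

def jitter_limited_snr_db(input_frequency_hz, rms_jitter_s):
    """SNR ceiling set by sampling-clock jitter for a full-scale sine input."""
    return -20.0 * math.log10(2.0 * math.pi * input_frequency_hz * rms_jitter_s)

def ideal_quantization_snr_db(bits):
    """SNR of an ideal N-bit converter with a full-scale sine input."""
    return 6.02 * bits + 1.76

def combined_snr_db(*snr_values_db):
    """Combine independent noise contributions by summing their noise powers."""
    total_noise = sum(10.0 ** (-snr / 10.0) for snr in snr_values_db)
    return -10.0 * math.log10(total_noise)

# Illustrative case: a 1.65GHz input, an assumed 100fs RMS clock jitter, 12 bits.
jitter_snr = jitter_limited_snr_db(1.65e9, 100e-15)
quant_snr = ideal_quantization_snr_db(12)
total_snr = combined_snr_db(jitter_snr, quant_snr)
print(f"jitter: {jitter_snr:.1f} dB, quantization: {quant_snr:.1f} dB, total: {total_snr:.1f} dB")
```

With these assumed numbers the jitter term (about 59.7dB) sits well below the 12-bit quantization limit (74dB), so the clock, not the converter resolution, sets the achievable SNR.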
While there are aspects of the mixed-signal processing that could be improved, judging from the early announcements and information posted on their website, Xilinx has done a good job with the RFSoC. Although designed more for 5G MIMO and wireless backhaul than specifically for satellite communication, the RFSoC's ADCs and DACs have sufficient dynamic range and offer flexible RF frequency-planning options for many ground-segment OEMs.
The specification of the RFSoC's ADC will allow ground receivers to directly digitize the information broadcast at traditional satellite communication frequencies at L- and S-band as well as the larger bandwidths used by high-throughput digital payloads. Thanks to its reprogrammability, the same RFSoC-based architecture with its wideband ADCs can be re-used for other frequency plans without having to re-engineer the hardware.
The RFSoC's DAC specification will allow ground transmitters to directly construct approximately 3GHz of bandwidth up to the X-band (9.6GHz). Xilinx says that first samples of RFSoC will become available in 2018 and I look forward to designing the part into satcom systems and sharing my experiences with you.
Dr. Rajan Bedi pioneered the use of Direct RF/IF Sampling and direct DAC up-conversion for the space industry with many in-orbit satellites currently using these techniques. He was previously invited by ESA and NASA to present his work and was also part of the project teams which developed many of the ultra-wideband ADCs and DACs currently on the market. These devices are successfully operating in orbit today. Last year, his company, Spacechips, was awarded High-Reliability Product of the Year for advancing Software Defined Radio.
Spacechips provides space electronics design consultancy services to manufacturers of satellites and spacecraft around the world. The company also helps OEMs assess the benefits of COTS components and exploit the advantages of direct RF/IF sampling and direct DAC up-conversion. Prior to founding Spacechips, Dr. Bedi headed the Mixed-Signal Design Group at Airbus Defence & Space in the UK for twelve years. Rajan is the author of Out-of-this-World Design, the popular, award-winning blog on Space Electronics. He also teaches direct RF/IF sampling and direct DAC up-conversion techniques in his Mixed-Signal and FPGA courses which are offered around the world. Rajan offers a series of unique training courses, Courses for Rocket Scientists, which teach and compare all space-grade FPGAs as well as the use of COTS Xilinx UltraScale and UltraScale+ parts for implementing spacecraft IP. Rajan has designed every space-grade FPGA into satellite systems!
By Lei Guan, MTS Nokia Bell Labs (email@example.com)
Many wireless communications signal-processing stages, for example equalization and precoding, require linear convolution functions. In particular, complex linear convolution will play a very important role in future-proofing massive MIMO systems through frequency-dependent, spatial-multiplexing filter banks (SMFBs), which enable efficient utilization of wireless spectrum (see Figure 1). My team at Nokia Bell Labs has developed a compact, FPGA-based SMFB implementation.
Figure 1 - Simplified diagram of SMFB for Massive MIMO wireless communications
Architecturally, linear convolution shares the same structure used for discrete finite impulse response (FIR) filters, employing a combination of multiplications and additions. Direct implementation of linear convolution in FPGAs may not satisfy the user constraints regarding key DSP48 resources, even when using the compact semi-parallel implementation architecture described in “Xilinx FPGA Enables Scalable MIMO Precoding Core” in the Xilinx Xcell Journal, Issue 94.
From a signal-processing perspective, the discrete FIR filter describes the linear convolution function in the time domain. Because linear convolution in the time domain is equivalent to multiplication in the frequency domain, an alternative algorithm called “fast linear convolution” (FLC) is a good candidate for FPGA implementation. Unsurprisingly, such an implementation is a game of trade-offs between space and time, between silicon area and latency. In this article, we mercifully skip the math for the FLC operation (but you will find many more details in the book “FPGA-based Digital Convolution for Wireless Applications”). Instead, let’s take a closer look at the multi-branch FLC FPGA core that our team created.
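For readers who want to see the time-domain/frequency-domain equivalence in action, here is a small software sketch. It is not the multi-branch FPGA core described in this article; it is a single-block, zero-padded FFT convolution written in plain Python, with a simple recursive radix-2 FFT of my own so the example has no dependencies. All function names are illustrative.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    result = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddled
        result[k + n // 2] = even[k] - twiddled
    return result

def ifft(values):
    """Inverse FFT with the 1/N scaling applied once at the end."""
    n = len(values)
    return [v / n for v in fft(values, inverse=True)]

def direct_convolve(x, h):
    """Time-domain linear convolution: the multiply-add structure of an FIR filter."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def fast_convolve(x, h):
    """Linear convolution via zero-padded FFTs and a pointwise multiply."""
    n_out = len(x) + len(h) - 1
    n_fft = 1
    while n_fft < n_out:          # pad so circular convolution equals linear
        n_fft *= 2
    x_freq = fft(list(x) + [0j] * (n_fft - len(x)))
    h_freq = fft(list(h) + [0j] * (n_fft - len(h)))
    y_freq = [a * b for a, b in zip(x_freq, h_freq)]
    return ifft(y_freq)[:n_out]

# Complex-valued example data (arbitrary values for illustration).
signal = [1 + 1j, 2 - 1j, 0.5j, -1, 3]
taps = [0.5, -0.25j, 1 + 2j]
max_error = max(abs(a - b) for a, b in zip(direct_convolve(signal, taps),
                                           fast_convolve(signal, taps)))
print(f"largest difference between the two methods: {max_error:.2e}")
```

The direct method costs one complex multiply per sample per tap, while the FFT route replaces that with transforms plus one multiply per frequency bin. That trade is what makes the approach attractive when multipliers are the scarce resource, at the cost of buffering and latency.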
The design targets supplied by the system team included:
Figure 2 shows the top-level design of the resulting FLC core in the Vivado System Generator Environment. Figure 3 illustrates the simplified processing stages at the module level with four branches as an example.
Figure 2 - Top level of the FLC core in Xilinx Vivado System Generator
Figure 3 - Illustration of multi-branch FLC-core processing (using 4 branches as an example)
The multi-branch FLC-core contains the following five processing stages, isolated by registers for logic separation and timing improvement:
Figure 4 - Simple Dual-Port RAM based input data buffer and reproduce stage
Table 1 compares the performance of our FLC design and a semi-parallel solution. Our compact FLC core implemented with Xilinx UltraScale and UltraScale+ FPGAs creates a cost-effective, power-efficient, single-chip, frequency-dependent Massive MIMO spatial-multiplexing solution for actual field trials. For more information, please contact the author.
A simple press release last month from the UK’s University of Bristol announced a 5G Massive MIMO milestone jointly achieved by BT, the Universities of Bristol and Lund, and National Instruments (NI): serving 2Gbps to 24 users simultaneously using a 20MHz LTE channel. That’s just short of 100 bits/sec/Hz and improves upon today’s LTE system capacity by 10x. The system that achieved this latest LTE milestone is based on the same Massive MIMO SDR system, built with NI USRP RIO dual-channel SDR radios, that delivered 145.6 bps/Hz in 5G experiments last year. (See “Kapow! NI-based 5G Massive MIMO SDR proto system ‘chock full of FPGAs’ sets bandwidth record: 145.6 bps/Hz in 20MHz channel.”)
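The spectral-efficiency figures are simply sum rate divided by channel bandwidth. Here is a quick sketch of that arithmetic using the round numbers quoted above; the variable and function names are mine, and the per-user figure is an even split that ignores any protocol overhead.

```python
def spectral_efficiency_bps_per_hz(sum_rate_bps, bandwidth_hz):
    """Aggregate throughput delivered per hertz of channel bandwidth."""
    return sum_rate_bps / bandwidth_hz

channel_bandwidth_hz = 20e6

# Latest trial: roughly 2Gbps shared by 24 users in one 20MHz channel.
latest_efficiency = spectral_efficiency_bps_per_hz(2e9, channel_bandwidth_hz)
average_rate_per_user_bps = 2e9 / 24

# Earlier record: 145.6 bps/Hz in the same bandwidth implies this sum rate.
earlier_sum_rate_bps = 145.6 * channel_bandwidth_hz

print(latest_efficiency, average_rate_per_user_bps / 1e6, earlier_sum_rate_bps / 1e9)
```

In other words, each of the 24 users averages a little over 83Mbps, and the earlier 145.6 bps/Hz result corresponds to a sum rate of about 2.9Gbps in the same 20MHz.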
According to the press release:
“Initial experiments took place in BT’s large exhibition hall and used 12 streams in a single 20MHz channel to show the real-time transmission and simultaneous reception of ten unique video streams, plus two other spatial channels demonstrating the full richness of spatial multiplexing supported by the system.
“The system was also shown to support the simultaneous transmission of 24 user streams operating with 64QAM on the same radio channel with all modems synchronising over-the-air. It is believed that this is the first time such an experiment has been conducted with truly un-tethered devices, from which the team were able to infer a spectrum efficiency of just less than 100bit/s/Hz and a sum rate capacity of circa two Gbits/s in this single 20MHz wide channel.”
The NI USRP SDRs are based on Xilinx Kintex-7 325T FPGAs. Again, quoting from the press release:
“The experimental system uses the same flexible SDR platform from NI that leading wireless researchers in industry and academia are using to define 5G. To achieve accurate, real-time performance, the researchers took full advantage of the system's FPGAs using LabVIEW Communications System Design and the recently announced NI MIMO Application Framework. As lead users, both the Universities of Bristol and Lund worked closely with NI to implement, test and debug this framework prior to its product release. It now provides the ideal foundations for the rapid development, optimization and evaluation of algorithms and techniques for massive MIMO.”
Here’s a BT video describing this latest milestone in detail:
National Instruments (NI) has just added two members to its growing family of USRP RIO SDRs (software-defined radios)—the USRP-2944 and USRP-2945—with the widest frequency ranges, highest bandwidth, and best RF performance in the family. The USRP-2945 features a two-stage superheterodyne architecture that achieves superior selectivity and sensitivity required for applications such as spectrum analysis and monitoring, and signals intelligence. With four receiver channels, and the capability to share local oscillators, this SDR also sets new industry price/performance benchmarks for direction-finding applications. The USRP-2944 is a 2x2 MIMO-capable SDR that features 160MHz of bandwidth per channel and a frequency range of 10 MHz to 6 GHz. This SDR operates in bands well suited to LTE and WiFi research and exploration.
NI USRP RIO Platform
Like all of its USRP RIO products, the NI USRP-2944 and USRP-2945 incorporate Xilinx Kintex-7 FPGAs for local, real-time signal processing. The Kintex-7 FPGA implements a reconfigurable LabVIEW FPGA target that incorporates DSP48 coprocessing for high-rate, low-latency applications. With the company’s LabVIEW unified design flow, researchers can create prototype designs faster and significantly shorten the time needed to achieve results.
Here’s a block diagram showing the NI USRP RIO SDR architecture:
USRP RIO Block Diagram
Adam Taylor just published an EETimes review of the Xilinx RFSoC, announced earlier this week. (See “Game-Changing RFSoCs from Xilinx”.) Taylor has a lot of experience with high-speed analog converters: he’s designed systems based on them—so his perspective is that of a system designer who has used these types of devices and knows where the potholes are—and he’s worked for a semiconductor company that made them—so he should know what to look for with a deep, device-level perspective.
Here’s the capsulized summary of his comments in EETimes:
“The ADCs are sampled at 4 Gsps (gigasamples per second), while the DACs are sampled at 6.4 Gsps, all of which provides the ability to work across a very wide frequency range. The main benefit of this, of course, is a much simpler RF front end, which reduces not only PCB footprint and the BOM cost but -- more crucially -- the development time taken to implement a new system.”
“…these devices offer many advantages beyond the simpler RF front end and reduced system power that comes from such a tightly-coupled solution.”
“These devices also bring with them a simpler clocking scheme, both at the device-level and the system-level, ensuring clock distribution while maintaining low phase noise / jitter between the reference clock and the ADCs and DACs, which can be a significant challenge.”
“These RFSoCs will also simplify the PCB layout and stack, removing the need for careful segregation of high-speed digital signals from the very sensitive RF front-end.”
“I, for one, am very excited to learn more about RFSoCs and I cannot wait to get my hands on one.”
For more information about the new Xilinx RFSoC, see “Xilinx announces RFSoC with 4Gsamples/sec ADCs and 6.4Gsamples/sec DACs for 5G, other apps. When we say ‘All Programmable,’ we mean it!” and “The New All Programmable RFSoC—and now the video.”
Yesterday, Xilinx announced breakthrough RF converter technology that allows the creation of an RFSoC with multi-Gsamples/sec DACs and ADCs on the same piece of TSMC 16nm FinFET silicon as the digital programmable-logic circuitry, the microprocessors, and the digital I/O. This capability transforms the Zynq UltraScale+ MPSoC into an RFSoC that's ideal for implementing 5G and other advanced RF system designs. (See “Xilinx announces RFSoC with 4Gsamples/sec ADCs and 6.4Gsamples/sec DACs for 5G, other apps. When we say ‘All Programmable,’ we mean it!” for more information about that announcement.)
Today there’s a 4-minute video with Sr. Staff Technical Marketing Engineer Anthony Collins providing more details including an actual look at the performance of a 16nm test chip with the 12-bit, 4Gsamples/sec ADC and the 14-bit, 6.4Gsamples/sec DAC in operation.
Here’s the video:
To learn more about the All Programmable RFSoC architecture, click here or contact your friendly, neighborhood Xilinx sales representative.
Xilinx has just introduced a totally new technology for high-speed RF designs: an integrated RF-processing subsystem consisting of RF-class ADCs and DACs implemented on the same piece of 16nm UltraScale+ silicon along with the digital programmable-logic, microprocessor, and I/O circuits. This technology transforms the All Programmable Zynq UltraScale+ MPSoC into an RFSoC. The technology’s high-performance, direct-RF sampling simplifies the design of all sorts of RF systems while cutting power consumption, reducing the system’s form factor, and improving accuracy—driving every critical, system-level figure of merit in the right direction.
The fundamental converter technology behind this announcement was recently discussed in two ISSCC 2017 papers by Xilinx authors: “A 13b 4GS/s Digitally Assisted Dynamic 3-Stage Asynchronous Pipelined-SAR ADC” and “A 330mW 14b 6.8GS/s Dual-Mode RF DAC in 16nm FinFET Achieving -70.8dBc ACPR in a 20MHz Channel at 5.2GHz.” (You can download a PDF copy of those two papers here.)
This advanced RF converter technology vastly extends the company’s engineering developments that put high-speed, on-chip analog processing onto Xilinx All Programmable devices starting with the 1Msamples/sec XADC converters introduced on All Programmable 7 series devices way back in 2012. However, these new 16nm RFSoC converters are much, much faster—by more than three orders of magnitude. Per today’s technology announcement, the RFSoC’s integrated 12-bit ADC achieves 4Gsamples/sec and the integrated 14-bit DAC achieves 6.4Gsamples/sec, which places Xilinx RFSoC technology squarely into the arena for 5G direct-RF design as well as millimeter-wave backhaul, radar, and EW applications.
Here’s a block diagram of the RFSoC’s integrated RF subsystem:
Xilinx Zynq UltraScale+ RFSoC RF Subsystem
In addition to the analog converters, the RF Data Converter subsystem includes mixers, a numerically controlled oscillator (NCO), decimation/interpolation, and other DSP blocks dedicated to each channel. The RF subsystem can handle real and complex signals, required for IQ processing. The analog converters achieve high sample rates, large dynamic range, and the resolution required for 5G radio-head and backhaul applications. In some cases, the integrated digital down-conversion (DDC) built into the RF subsystem requires no additional FPGA resources.
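To make the NCO, mixer, and decimation chain concrete, here is a minimal software model of a digital down-converter. It is a sketch of the general technique only and says nothing about how the RFSoC's hardened blocks are actually built: the boxcar average is a deliberately crude stand-in for a real decimation filter, the function name is my own, and the 1GHz test tone is a hypothetical input at the 4Gsamples/sec ADC rate.

```python
import cmath
import math

def digital_down_convert(samples, sample_rate_hz, center_frequency_hz, decimation):
    """Mix real samples to complex baseband with an NCO, then filter and decimate."""
    # NCO plus complex mixer: multiply by exp(-j*2*pi*fc*n/fs) to shift fc down to dc.
    mixed = [sample * cmath.exp(-2j * math.pi * center_frequency_hz * n / sample_rate_hz)
             for n, sample in enumerate(samples)]
    # Boxcar average over each block of 'decimation' samples acts as a crude
    # low-pass filter and reduces the output rate by the same factor.
    baseband = []
    for start in range(0, len(mixed) - decimation + 1, decimation):
        block = mixed[start:start + decimation]
        baseband.append(sum(block) / decimation)
    return baseband

sample_rate_hz = 4.0e9      # ADC sample rate quoted in the text
tone_frequency_hz = 1.0e9   # hypothetical test tone
tone = [math.cos(2 * math.pi * tone_frequency_hz * n / sample_rate_hz) for n in range(64)]

baseband = digital_down_convert(tone, sample_rate_hz, tone_frequency_hz, decimation=8)
print(len(baseband), [round(abs(v), 3) for v in baseband])
```

A unit-amplitude real tone at the NCO frequency comes out as a constant complex value of magnitude 0.5 at one-eighth of the input rate, because the mixer splits the real tone into a dc term and an image at twice the frequency that the filter removes.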
The end result is breakthrough integration. The analog-to-digital signal chain, in particular, is supported by a hardened DSP subsystem for flexible configuration by the analog designer. This leads to a 50-75% reduction in system power and system footprint, along with the needed flexibility to adapt to evolving specifications and network topologies.
Where does that system-power reduction come from? The integration of both the digital and analog-conversion electronics on one piece of silicon eliminates a lot of power-hungry I/O and takes the analog converters down to the 16nm FinFET realm. Here’s a power-reduction table from the backgrounder with three MIMO radio example systems:
How about the form-factor reduction? Here’s a graphical example:
You save the pcb space needed by the converters and you save the space required to route all of the length-matched, serpentine pcb I/O traces between the converters and the digital SoCs. All of that I/O connectivity and the length matching now takes place on-chip.
To learn more about the All Programmable RFSoC architecture, click here or contact your friendly, neighborhood Xilinx sales representative.
Note: When we say “All Programmable” we mean it.
TI has created a power supply reference design for the Xilinx Zynq UltraScale+ MPSoC specifically for Remote Radio Head (RRH) and backhaul applications, but there’s no reason you can’t use it in any other design employing the Zynq UltraScale+ MPSoC. The compact reference design is based on TI’s TPS6508640 power-management IC (PMIC), which is a pretty sophisticated power supply controller, along with several power FETs and a TPS544C25 high-current regulator. The TPS6508640 PMIC reduces board size, cost, and power loss using a high switching frequency and separate rails for core supplies.
The design creates ten regulated supply voltages for the Zynq UltraScale+ MPSoC based on a 12V source supply. Here’s what TI’s reference design looks like:
Here’s what the design looks like when placed on a pc board:
You’ll find a PDF describing this reference design in detail here.
Please contact TI directly for additional details.
Avnet has just announced the 1x1 version of its PicoZed SDR 2x2 SOM that you can use for rapid development of software-defined radio applications. The 62x100mm form factor for the PicoZed SDR 1x1 SOM is the same as that used for the 2x2 version but the PicoZed SDR 1x1 SOM uses the Analog Devices AD9364 RF Agile Transceiver instead of the AD9361 used in the PicoZed SDR 2x2 SOM. Another difference is that the 2x2 version of the PicoZed SDR SOM employs a Xilinx Zynq Z-7035 SoC and the 1x1 SOM uses a Zynq Z-7020 SoC.
Avnet’s Zynq-based PicoZed SDR 1x1 SOM
One final difference: The Avnet PicoZed SDR 1x1 sells for $549 and the PicoZed SDR 2x2 sells for $1095. So if you liked the idea of the original PicoZed SDR SOM but wished for a lower-cost entry point, your wish is granted, with immediate availability.
Do you have a big job to do? How about a terabit router bristling with optical interconnect? Maybe you need a DSP monster for phased-array radar or sonar. Beamforming for advanced 5G applications using MIMO antennas? Some other high-performance application with mind-blowing processing and I/O requirements?
You need to look at Xilinx Virtex UltraScale+ FPGAs with their massive data-flow and routing capabilities, massive memory bandwidth, and massive I/O bandwidth. These attributes sweep away design challenges caused by performance limits of lesser devices.
Now you can quickly get your hands on a Virtex UltraScale+ Eval Kit so you can immediately start that challenging design work. The new eval kit is the Xilinx VCU118 with an on-board Virtex UltraScale+ VU9P FPGA. Here’s a photo of the board included with the kit:
Xilinx VCU118 Eval Board with Virtex UltraScale+ VU9P FPGA
The VCU118 eval kit’s capabilities spring from the cornucopia of on-chip resources provided by the Virtex UltraScale+ VU9P FPGA including:
If you can’t build what you need with the VCU118’s on-board Virtex UltraScale+ VU9P FPGA—and it’s sort of hard to believe that’s even possible—just remember, there are even larger parts in the Virtex UltraScale+ FPGA family.
Innovative Integration has just announced the rugged K707 Digital Receiver, which pairs a quad-core Intel Core i7 microprocessor running 64-bit Linux with a Xilinx Kintex-7 K410T FPGA. The receiver accepts one or two 4-channel FMC-310 310Msamples/sec ADC modules, which provide the receiver with as many as six antenna inputs and 100MHz real-time bandwidth. (There’s an optional 3-18GHz tuner as well.)
The Kintex-7 FPGA implements 128 DDC (digital down-conversion) channels and a spectrum analyzer. Eight 16-channel DDC banks support monitoring of 128 DDC channels per FMC-310 module. Each DDC bank can select its own FMC-310 ADC and decimation rate and each DDC channel has its own programmable tuner and programmable low-pass filtering with bandwidths to 800kHz.
Innovative Integration’s rugged K707 Digital Receiver
The K707 Digital Receiver packages output data in VITA-49 format with accurate timestamps synchronous to an external PPS signal and attains a sustained logging rate of up to 1,300 Mbytes/sec (until you run out of disk space). An embedded digital power meter monitors any ADC input’s power, allowing analog gain control of external front-end devices.
Innovative Integration supplies a development kit for the K707 Digital Receiver that permits you to create custom instrumentation for advanced applications in the form of user-developed VHDL cores that you instantiate in the Xilinx Kintex-7 FPGA.
For more information on the K707 Rugged Digital Receiver, contact Innovative Integration directly.
Analog Devices (ADI) introduced the AD9371 Integrated, Dual Wideband RF Transceiver back in May as part of its “RadioVerse.” You use the AD9371 for building extremely flexible, digital radios with operating frequencies of 300MHz to 6GHz, which covers most of the licensed and unlicensed cellular bands. The IC supports receiver bandwidths to 100MHz. It also supports observation receiver and transmit synthesis bandwidths to 250MHz, which you can use to implement digital correction algorithms.
Last week, the company started shipping FMC eval cards based on the AD9371: the ADRV9371-N/PCBZ and ADRV9371-W/PCBZ.
ADRV9371-N Eval Board for the Analog Devices AD9371 Integrated Wideband RF Transceiver
ADI was showing one of these new AD9371 Eval Boards in operation this week at the GNU Radio Conference held in Boulder, Colorado. The board was plugged into the FMC connector on a Xilinx ZC706 Eval Kit, which is based on a Xilinx Zynq Z-7045 SoC. The Xilinx Zynq SoC and the AD9371 make an extremely powerful design combination for developing all sorts of SDRs (software-defined radios).
Appropriately located in the Glenn Miller Ballroom on the University of Colorado at Boulder campus, the GNU Radio Conference (GRCon) 2016 will focus on the latest research and the newest products to implement software-defined radio based on GNU Radio, a free and open software radio ecosystem. There’s a 1-day New Users’ Intro Day (Monday, September 12) with a simultaneous Advanced Track that day for non-beginners, followed by four more days of tutorials, presentations, and receptions. There’s also an area set up for vendor booths and demos. The conference organizers include the GNU Radio Foundation and the University of Colorado at Boulder.
The GRCon Diamond sponsor for the event is Ettus Research, a National Instruments company. Ettus makes the dc-to-6GHz USRP (Universal Software Radio Peripheral) line of SDR platforms including the:
Why use FPGAs for SDR? Because SDR requires a lot of fast DSP and FPGAs have that.
For more information about Ettus Research USRPs, see:
MaXentric Technologies develops cutting-edge wireless products ranging from simple, low-cost millimeter broadband wireless transceivers, to passive RFID readers, to high-efficiency envelope-tracking power amplifiers (ETPAs) for the military defense and telecommunications/broadcast commercial markets. The cutting edge for RF power amplifiers is all about linearity without wasting power. ETPAs address the increasingly crowded frequency spectrum and increasing demand for higher data rates through non-constant RF envelopes and high peak-to-average power ratios (PAPR). Simply put, ET technology allows the operator to use only as much power as necessary to provide the amplified output.
Communication systems using ETPAs require high linearity to minimize signal distortion, reduce bit error rate, improve spectral efficiency, and reduce adjacent channel interference. These requirements gave MaXentric Technologies many parameters to measure and optimize during ETPA development and the company required an automated testbench to test designs for various target applications that use the LTE and 5G wireless bands, GPS signaling, and military RF. The testbench had to be able to test any ETPA for any application, and all on the same day with simply a click to change the RF frequency.
The company selected a variety of PXI instruments from National Instruments (NI) to build the ETPA testbench and uses NI’s LabVIEW System Development Software to program the testbench hardware. An NI PXI-5646R Vector Signal Transceiver (VST) serves as the testbench’s RF signal generator and RF feedback analyzer, which permits the testbench to quickly switch RF frequencies from LTE, to GPS, to military. (NI’s 1st-generation PXI-5646R VST was based on a Xilinx Virtex-6 FPGA. Its just-introduced 2nd-generation PXIe-5840 Vector Signal Transceiver is based on a Xilinx Virtex-7 690T FPGA, see “NI launches 2nd-Gen 6.5GHz Vector Signal Transceiver with 5x the instantaneous bandwidth, FPGA programmability.”) An NI PXIe-5451 Arbitrary Waveform Generator initially generated the envelope waveforms for the PA. The NI PXIe-5451 also incorporates an FPGA for digital flatness correction and inline waveform processing.
Here’s a semi-block diagram of the ETPA testbench:
MaXentric Technologies ETPA Testbench
MaXentric used this testbench to tune and optimize an LTE Band 1 (2.14 GHz) ETPA based on the company’s MaXEA 1.0 Integrated Envelope Modulator and a GaN RF transistor used as the power amplifier. The time alignment of the envelope amplitude and RF input signals to the ETPA is critical for optimized, efficient RF performance. Time misalignment between these two signals distorts the output signal, degrades ACPR (adjacent channel power ratio), and reduces efficiency. MaXentric could literally see the improvement or degradation in ETPA linearity and efficiency as the testbench altered the alignment between the RF signal and the envelope supply in real time.
The testbench-enabled tuning greatly improved the LTE ETPA’s linearity and efficiency. Before tuning, the GaN ETPA achieved 6W of output power with 11.5dB of gain and 53% power added efficiency (PAE). After optimization, the ETPA’s output power and gain remained the same with a slight improvement in PAE to 54.6% and with better than -45dBc ACPR—a highly desirable result.
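For a rough sense of what those PAE figures mean in terms of supply power, the following Python sketch applies the standard definition PAE = (Pout - Pin) / Pdc to the numbers quoted above. The DC power values it prints are derived estimates, not figures reported by MaXentric, and the function names are purely illustrative.

```python
def db_to_linear(gain_db):
    """Convert a power gain in dB to a linear power ratio."""
    return 10 ** (gain_db / 10)


def dc_power_watts(p_out_w, gain_db, pae):
    """Solve PAE = (P_out - P_in) / P_dc for the DC supply power P_dc."""
    p_in_w = p_out_w / db_to_linear(gain_db)  # RF drive power implied by the gain
    return (p_out_w - p_in_w) / pae


# Figures quoted in the text: 6W output, 11.5dB gain, PAE of 53% and 54.6%.
dc_before = dc_power_watts(6.0, 11.5, 0.53)
dc_after = dc_power_watts(6.0, 11.5, 0.546)

print(f"Estimated DC power before tuning: {dc_before:.2f} W")
print(f"Estimated DC power after tuning:  {dc_after:.2f} W")
print(f"Estimated DC power saved:         {dc_before - dc_after:.2f} W")
```

Because output power and gain stayed the same, the PAE improvement shows up entirely as a lower estimated DC draw for the same RF output.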
According to MaXentric, using NI’s LabVIEW cut the development time for this ETPA testbench from nearly a year to two months.
Note: This project won in the Electronics and Semiconductor category at this month’s NI Engineering Impact Awards. You can read more details in the full case study here.
In May, a team of 5G engineers from the Universities of Bristol and Lund set a new world record for wireless spectrum efficiency—145.6 bps/Hz in a narrow 20MHz channel serving 22 clients—using a 5G Massive MIMO SDR system developed and prototyped with FPGA-based SDR equipment from National Instruments (NI). This result breaks the team’s own record set less than two months prior, which had been 79.4bps/Hz serving 12 clients. 5G system goals require this sort of extreme, breakthrough data-carrying efficiency to achieve 1000x more data bandwidth for customers.
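To put those spectrum-efficiency numbers in more familiar terms, here is a small Python sketch that converts bps/Hz into aggregate throughput and per-client efficiency. It uses only the figures quoted above; the variable and function names are illustrative.

```python
CHANNEL_BANDWIDTH_HZ = 20e6  # the 20MHz channel used for the new record


def aggregate_throughput_bps(efficiency_bps_per_hz, bandwidth_hz):
    """Total data rate carried by the channel across all clients."""
    return efficiency_bps_per_hz * bandwidth_hz


new_record, new_clients = 145.6, 22
old_record, old_clients = 79.4, 12

new_throughput = aggregate_throughput_bps(new_record, CHANNEL_BANDWIDTH_HZ)
per_client_new = new_record / new_clients
per_client_old = old_record / old_clients
improvement = new_record / old_record

print(f"Aggregate throughput at 145.6 bps/Hz: {new_throughput / 1e9:.3f} Gbps")
print(f"Per-client efficiency, new record:    {per_client_new:.2f} bps/Hz")
print(f"Per-client efficiency, old record:    {per_client_old:.2f} bps/Hz")
print(f"Improvement over the previous record: {improvement:.2f}x")
```

The per-client efficiency comes out nearly identical for both records, which suggests that the gain came from serving more clients at once in the same spectrum.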
128-Antenna 5G Massive MIMO System, with researchers Paul Harris and Siming Zhang
This latest 5G testbed uses a 128-antenna MIMO array (dubbed ARIES) coupled to 64 NI USRP RIO dual-channel SDR radios, which are based on Xilinx Kintex-7 325T FPGAs. The 64 USRP RIO units talk to multiple NI PXIe-7976R FlexRIO FPGA modules (based on Xilinx Kintex-7 410T FPGAs) plugged into NI PXIe chassis for further processing of the Massive MIMO signals. The USRP RIO units connect over high-speed PCIe cabling, through PCIe switch boxes, to NI PXIe-8384 interface modules plugged into the same chassis as the FlexRIO FPGA modules. (By my count, there are at least 68 Kintex-7 FPGAs in this system.) The entire system was developed with NI’s LabVIEW System Design Software. There’s a full description of the 5G Massive MIMO system here, a case study here, and here’s a system block diagram:
Massive MIMO SDR system developed by the Universities of Bristol and Lund using NI hardware and software
This project won big at last week’s NI Engineering Impact Awards. It won in its category, “Wireless and Mobile communications”; it won the “Powered by Xilinx” Award; it won the “Engineering Grand Challenges Award”; it won the “HPE Edgeline Big Analog Data Award”; and it won the “2016 Customer Application of the Year Award” from NI.
No surprise then that NI has introduced the world’s first Application Framework for Massive MIMO to speed innovation in 5G prototyping based on this groundbreaking research. You can scale 5G testbeds from 4 to 128 antennas using this NI 5G framework. More information is available in this 3-minute video:
For more information about NI’s USRP RIO SDR, see “Software-defined radio dev platform for 5G research handles MIMO, massive MIMO using Kintex-7 FPGA.”
To see additional uses for the NI PXIe-7976R FlexRIO FPGA modules, see “NI and Nokia Networks develop 10Gbps, FPGA-based mmWave SDR transceiver for 5G using new NI mmWave Transceiver System,” and “What do you need to build the world’s highest-performance, Real-Time Spectrum Analyzer? RADX says: NI, Xilinx, and Kintex-7 FPGAs.”
You can use Keysight Technologies’ U5340A FPGA Development Kit for High-Speed Digitizers to develop custom algorithms with high-speed Keysight digitizers with 8- to 12-bit resolution and sampling rates ranging from 1 to 4 Gsamples/sec. Keysight’s Giovanni Lucia has just published an article titled “Embedding custom real-time processing in a multi-gigasample high-speed digitizer” on Embedded.com that describes why you would want to develop custom processing algorithms directly into a high-speed digitizer and how to go about doing it.
Lucia explains that custom processing algorithms running on an on-board FPGA speed algorithmic execution and reduce the amount of data you’ll need to extract from the digitizer, reducing I/O and storage loads. A surprising revelation is that you can also insert encryption algorithms into the FPGA, which improves data security by never allowing unencrypted data to leave the digitizer. Development projects targeting rapidly changing markets such as 5G may well benefit from such security measures.
The article also enumerates the minimum features you should expect from an FPGA development kit designed for digitizers:
Keysight’s U5340A FPGA Development Kit is built on Mentor Graphics HDL Designer and ModelSim and Xilinx development tools and LogiCore IP. The kit is compatible with several Keysight digitizers including:
All of these high-speed Keysight digitizers have Xilinx Virtex-6 FPGAs on board for signal processing, both Keysight-written and custom.
Note: For more information about the Keysight U5340A FPGA Development Kit, see “Keysight ups PCIe and AXIe game—lets you add signal processing to the FPGAs in its high-speed digitizers.”
Today, National Instruments (NI) launched its 2nd-generation PXIe-5840 Vector Signal Transceiver (VST), which combines a 6.5GHz RF vector signal generator and a 6.5GHz vector signal analyzer in a 2-slot PXIe module. The instrument has 1GHz of instantaneous bandwidth and is designed for use in a wide range of RF test systems including 5G and IoT RF applications, ultra-wideband radar prototyping, and RFIC testing. Like all NI instruments, the PXIe-5840 VST is programmable with the company’s LabVIEW system-design environment and that programmability reaches all the way down to the VST’s embedded Xilinx Virtex-7 690T FPGA. (NI’s 1st-generation VSTs employed Xilinx Virtex-6 FPGAs.)
National Instruments uses this FPGA programmability to create varied RF test systems such as this 8x8 MIMO RF test system:
And this mixed-signal IoT test system:
For additional information on NI’s line of VSTs, see:
Epiq Solutions has just announced the Sidekiq SDR (software-defined radio) card in the diminutive M.2 form factor (30mm x 42mm x 4mm). It has a full 2x2 MIMO RF interface that covers 70MHz to 6GHz. Compared to the company’s MiniPCIe version, this card uses PCIe Gen2 to double the data rate while reducing the size by 20% using the M.2 form factor. The card is based on a Xilinx Artix-7 A50T FPGA, which implements the SDR signal processing and allows advanced users to add their own processing blocks to radically alter and increase the card’s signal processing capabilities. Epiq provides a Sidekiq Platform Development Kit (PDK) to customers with a software API for interfacing to the card and customizable source code for the FPGA-based SDR reference design.
Epiq Solutions’ Sidekiq SDR Card in an M.2 form factor, based on a Xilinx Artix-7 FPGA
The asteroid called 5G is speeding towards earth and if you’re involved in 5G infrastructure development, then fronthaul and fronthaul interfaces are likely to be in your targeting crosshairs. Comcores and Xilinx recently published a lengthy article on this topic titled “Fronthaul Evolution Toward 5G: Standards and Proof of Concepts.” (A shorter version of the article appears here.) These articles provide a detailed overview of innovations that are creating design challenges for new radio interfaces. These innovations include massive MIMO, carrier aggregation, multi-band support, and radio-cell densification and they are driving all sorts of new advances including the development of a Next Generation Fronthaul Interface (NGFI) under the IEEE P1914.1 initiatives.
Xilinx now offers a variety of proof-of-concept designs for an Ethernet-to-CPRI Gateway, allowing immediate experimentation with NGFI challenges. In addition, Comcores has developed an I/Q switch platform based on Xilinx silicon technology and the companies’ collective fronthaul expertise. For more information, see this Xilinx Web page and for information on the Comcores CPRI I/Q Cross connect Switch, click here.
On Wednesday, June 15, MathWorks’ Noam Levine will be presenting a free Webinar titled “Streamline your Software-Defined Radio with Model-Based Design.” This webinar will show you how to use Model-Based Design with MATLAB and Simulink as a common design framework for developing SDR systems based on the Xilinx Zynq-7000 SoC.
Nutaq just posted a rather complex video showing its 2nd-generation PicoSDR 8x8 connected to a 16-element, 2D antenna array and processing a real-world eNB downlink. The hardware system is being controlled by MATLAB’s LTE System Toolbox. Nutaq’s PicoSDR 8×8-E relies on one 0-6GHz radio, built on AD9361 RFICs from Analog Devices and controlled by an on-board Xilinx Virtex-6 FPGA as shown in the following block diagram:
Here’s the LTE demo on video:
Please contact Nutaq for information about the PicoSDR 8x8.
Avnet has just rolled out its second FMC Carrier Card for the Avnet PicoZed SOM, which is based on a Xilinx Zynq-7000 SoC (a Z-7010, Z-7015, Z-7020, or Z-7030). The $349 PicoZed FMC Carrier Card V2 greatly expands the I/O capabilities of the PicoZed SOM with connector interfaces for the on-module Gigabit Ethernet PHY and USB PHY. The carrier card also has a micro SD card and USB-UART. The majority of the Zynq SoC’s programmable-logic I/O pins are brought out to an LPC FMC connector. In addition, the card provides an HDMI output port, a real-time clock, a high-performance clock synthesizer, two MAC ID EEPROMs, and several Digilent-compatible Pmod connectors. The four serial transceivers on the 7015 and 7030 SOMs are allocated to a PCIe Gen2 x1 card edge interface, the previously mentioned FMC connector, an SFP+ cage for high-speed optical networking, and general-purpose SMA connectors.
Here’s a block diagram of the PicoZed FMC Carrier Card V2:
$349 Avnet PicoZed FMC Carrier Card V2 Block Diagram
The carrier card comes bundled with Wind River Pulsar Linux to speed embedded development, including the development of IoT—particularly industrial IoT (IIoT)—devices. With the FMC expansion connector, you can rapidly develop and prototype many types of embedded systems for vision and video, motor-control, and SDR applications.
Here’s a 5-minute Avnet video to explain things:
Even if you’re not especially interested in a PicoZed carrier card at the moment, it’s worth watching this video for some design tips embedded in it that you’ll find particularly interesting for your own Zynq-based hardware designs. At the 3-minute mark in the video, Avnet Project Engineer Dan Rozwood discusses some interesting specifics of a cost-reduced clocking system designed for this new carrier card based on an IDT programmable-clock IC. You don’t get design tips like this dropped on you every day, so take five minutes to grab this one.
For more information about the Avnet PicoZed SOM, see:
By Robin Getz, Analog Devices and Luc Langlois, Avnet Electronics Marketing
By integrating the critical RF signal path and high-speed programmable logic in a fully verified system-on-module (SOM), Avnet’s PicoZed SDR SOM delivers the flexibility of software-defined radio in a device the size of a deck of cards, enabling frequency-agile, wideband 2x2 receive and transmit paths in the 70-MHz to 6.0-GHz range for diverse fixed and mobile SDR applications.
PicoZed SDR combines the Analog Devices AD9361 integrated RF Agile Transceiver with the Xilinx Z-7035 Zynq-7000 All Programmable SoC. The architecture is ideal for mixed software-hardware implementations of complex applications, such as digital receivers, in which the digital front end (physical layer) is implemented in programmable logic, while the upper protocol layers run in software on a dual-core ARM Cortex-A9 MPCore processor. Let’s look at the software-related features of the PicoZed SDR throughout the development process.
Leveraging the full potential of PicoZed SDR calls for a robust, multidomain simulation environment to model the entire signal chain, from the RF analog electronics to the baseband digital algorithms. This is the inherent value of Model-Based Design, a methodology from MathWorks that places the system model at the center of the development process, spanning from requirements definition through design, code generation, implementation and testing. Avnet worked with Analog Devices and MathWorks to develop a support infrastructure for PicoZed SDR in each facet of the design process, starting at the initial prototyping phase.
Using a MATLAB software construct called System objects, MathWorks created a support package for Xilinx Zynq-Based Radio that enables PicoZed SDR as an RF front end to prototype SDR designs right out of the box. Optimized for iterative computations that process large streams of data, System objects automate streaming data between PicoZed SDR and the MATLAB and Simulink environments in a configuration known as radio-in-the-loop, as shown in Figure 1.
Akin to concepts of object-oriented programming, System objects are created by a constructor call to a class name, either in MATLAB code or as a Simulink block. Once a System object is instantiated, you can invoke various methods to stream data through the System object during simulation. The Communications System Toolbox Support Package for Xilinx Zynq-Based Radio from MathWorks contains predefined classes for the PicoZed SDR receiver and transmitter, each with tunable configuration attributes for the AD9361, such as RF center frequency and sampling rate. The code example in Figure 2 creates a PicoZed SDR receiver System object to receive data on a single channel, with the AD9361 local oscillator frequency set to 2.5 GHz and a baseband sampling rate of 1 megasample/second (Msps). The captured data is saved using a log.
Analog Devices has developed the Libiio library to ease the development of software interfacing to Linux Industrial I/O (IIO) devices, such as the AD9361 on the PicoZed SDR SOM. The open-source (GNU Lesser General Public License V2.1) library abstracts the low-level details of the hardware and provides a simple yet complete programming interface that can be used for advanced projects. The library consists of a high-level application programming interface and a set of back ends, as shown in Figure 3.
As shown in Figure 4, the hardware-software co-design workflow in HDL Coder from MathWorks lets you explore the optimal partition of your design between software and hardware targeting the Zynq SoC. The part destined for programmable logic can be automatically packaged as an IP core, including hardware interface components such as ARM AMBA AXI4 or AXI4-Lite interface-accessible registers, AXI4 or AXI4-Lite interfaces, AXI4-Stream video interfaces, and external ports. The MathWorks HDL Workflow Advisor IP core generation workflow lets you insert your generated IP core into a predefined embedded system project in the Xilinx Vivado HLx Design Suite. HDL Workflow Advisor contains all the elements Vivado IDE needs to deploy your design to the SoC platform, except for the custom IP core and embedded software that you generate.
If you have a MathWorks Embedded Coder license, you can automatically generate the software interface model, generate embedded C/C++ code from it, and build and run the executable on the Linux kernel on the ARM processor within the Zynq SoC. The generated embedded software includes AXI driver code, generated from the AXI driver blocks, that controls the HDL IP core. Alternatively, you can write the embedded software and manually build it for the ARM processor.
Note: This article was abstracted from a much longer article that appeared in Xcell Software Journal, Issue 3.
Yesterday, National Instruments (NI) unveiled a new mmWave Transceiver System, which serves as a modular, reconfigurable SDR platform for 5G R&D projects. This prototyping platform offers 2GHz of real-time bandwidth for evaluating transmission systems designs in the mmWave E band, which is 71-76GHz for NI’s modular transmit and receive radio heads. You can prototype unidirectional and bidirectional single-antenna and MIMO systems using one or more pairs of these radio heads in conjunction with the transceiver system’s modular PXIe processing chassis.
National Instruments mmWave Transceiver System
The block diagram for this mmWave transceiver system shows that it relies heavily on FPGAs—specifically Xilinx All Programmable devices—to perform the required real-time processing in both the transmitter and receiver chains. Here’s the system block diagram:
National Instruments mmWave Transceiver System Block Diagram
The key to this system’s modularity is NI’s 18-slot PXIe-1085 chassis, which accepts a long list of NI processing modules as well as ADC, DAC, and RF transceiver modules. For the NI mmWave Transceiver System, critical processing modules include the NI PXIe-7976R FlexRIO FPGA module, based on a Xilinx Kintex-7 410T FPGA, and the NI PXIe-7902R FPGA module, based on a Xilinx Virtex-7 485T.
NI PXIe-7976R FlexRIO FPGA module based on a Xilinx Kintex-7 410T FPGA
NI PXIe-7902 FPGA module based on a Xilinx Virtex-7 485T
The NI mmWave Transceiver System maps the different mmWave processing tasks to multiple FPGAs, depending on the particular configuration, in a software-configurable manner using the company’s LabVIEW System Design Software, which provides deep hardware control even down into the FPGAs distributed in the system’s various PXIe processing modules. NI’s LabVIEW relies on the Xilinx Vivado Design Suite for compiling the FPGA configurations. The FPGAs distributed in the NI mmWave Transceiver System provide the flexible, high-performance, low-latency processing required to quickly build and evaluate prototype 5G radio transceiver systems in the mmWave band.
NI has posted a 2-minute video of 5G mmWave proof-of-concept work it’s done with Nokia Networks over the past year using early versions of the mmWave Transceiver System. Using this system, NI and Nokia Networks developed one of the first mmWave communication links capable of streaming data at 10Gbps. The quick-prototyping nature of NI’s transceiver prototyping system along with the graphical LabVIEW development environment saved Nokia Networks a year’s development time (!!!), according to the estimate in this video:
By Lei Guan, Member of Technical Staff, Bell Laboratories, Nokia
Massive-MIMO wireless systems have risen to the forefront as the preferred foundation architecture for 5G wireless networks. A low-latency precoding implementation scheme is critical for enjoying the benefits of the multi-transmission architecture inherent in the multiple-input, multiple-output (MIMO) approach. Our team built a high-speed, low-latency precoding core with Xilinx System Generator and the Vivado Design Suite that is simple and scalable.
Due to their intrinsic multiuser spatial-multiplexing transmission capability, massive-MIMO systems significantly increase the signal-to-interference-and-noise ratio at both the legacy single-antenna user equipment and the evolved multi-antenna user terminals. The result is more network capacity, higher data throughput and more efficient spectral utilization.
But massive-MIMO technology does have its challenges. To use it, telecom engineers need to build multiple RF transceivers and multiple antennas based on a radiating phased array. They also have to utilize digital horsepower to perform the so-called precoding function.
Our solution was to build a low-latency and scalable frequency-dependent precoding piece of intellectual property (IP), which can be used in Lego fashion for both centralized and distributed massive-MIMO architectures.
Key to this DSP R&D project were high-performance Xilinx 7 series FPGAs, along with Xilinx’s Vivado Design Suite 2015.1 with System Generator and MATLAB/Simulink.
Precoding in Generalized Systems
In a cellular network, user data streams that radiate from generalized MIMO transmitters will be “shaped” in the air by the so-called channel response between each transmitter and receiver at a particular frequency. In other words, different data streams will go through different paths, reaching the receiver at the other end of the airspace. Even the same data stream will behave differently at certain times because of a different “experience” in the frequency domain.
This inherent wireless-transmission phenomenon is equivalent to applying a finite impulse response (FIR) filter with a particular frequency response to each data stream, resulting in poor system performance due to the frequency “distortion” introduced by the wireless channels. If we treat the wireless channel as a big black box, only the inputs (transmitter outputs) and outputs (receiver inputs) are apparent at the system level. We can actually add a pre-equalization black box at the MIMO transmitter side with the inverse channel response to precompensate the channel black-box effects, and then the cascaded system will provide reasonably “corrected” data streams at the receiver equipment.
We call this pre-equalization approach precoding, which basically means applying a group of “reshaping” coefficients at the transmitter chain. For example, if we are going to transmit NRX independent data streams with NTX (number of transmitters) antennas, we will need to perform a pre-equalization precoding at a cost of NRX × NTX temporary complex linear convolution operations and corresponding combining operations before radiating NTX RF signals to the air.
A straightforward low-latency implementation of complex linear convolution is a FIR-type complex discrete digital filter in the time domain.
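As a minimal sketch of the operation count described above, the following pure-Python example precodes N_RX data streams for N_TX antennas using one complex linear convolution per stream/antenna pair, followed by a combining step per antenna. The function names and the `coeffs[r][t]` layout are assumptions made for illustration; they are not taken from the authors' design.

```python
def convolve(x, h):
    """Complex linear convolution of sequence x with FIR taps h."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def precode(streams, coeffs):
    """Precode N_RX streams into N_TX antenna signals.

    streams[r]   : samples of data stream r
    coeffs[r][t] : FIR taps applied to stream r on its way to antenna t
    Returns the N_TX antenna signals and the number of convolutions performed.
    """
    n_tx = len(coeffs[0])
    out_len = len(streams[0]) + len(coeffs[0][0]) - 1
    antennas = [[0j] * out_len for _ in range(n_tx)]
    convolutions = 0
    for r, stream in enumerate(streams):
        for t in range(n_tx):
            branch = convolve(stream, coeffs[r][t])
            convolutions += 1
            for n, value in enumerate(branch):
                antennas[t][n] += value  # combining step for antenna t
    return antennas, convolutions


# Toy example: 2 data streams, 3 antennas, 2-tap filters.
streams = [[1, 2], [3, 4]]
coeffs = [
    [[1, 0], [0, 1], [0, 0]],   # stream 0 toward antennas 0, 1, 2
    [[0, 0], [1, 0], [1j, 0]],  # stream 1 toward antennas 0, 1, 2
]
antenna_signals, count = precode(streams, coeffs)
print(count, "convolutions for", len(streams), "streams and", len(coeffs[0]), "antennas")
```

The convolution count always equals N_RX × N_TX, which is why the cost of each individual FIR branch matters so much in a massive-MIMO system.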
System Functional Requirements
Under the mission to create a low-latency precoding piece of IP, my team faced a number of essential requirements.
1. We had to precode one data stream into multiple-branch parallel data streams with different sets of coefficients.
2. We needed to place a complex asymmetric FIR function with a tap length of 100 or more at each branch to provide reasonable precoding performance.
3. The precoding coefficients needed to be updated frequently.
4. The designed core must be easily updated and expanded to support different scalable system architectures.
5. Precoding latency should be as low as possible with given resource constraints.
Moreover, besides attending to the functional requirements for a particular design, we had to be mindful of hardware resource constraints as well. In other words, creating a resource-friendly algorithm implementation would be beneficial in terms of key limited hardware resources such as DSP48 slices, the dedicated hardware multipliers on Xilinx FPGAs.
High-Speed, Low-Latency Precoding (HLP) Core Design
Essentially, scalability is a key feature that must be addressed before you begin a design of this nature. A scalable design will enable a sustainable infrastructure evolution in the long term and lead to an optimal, cost-effective deployment strategy in the short term. Scalability comes from modularity. Following this philosophy, we created a modularized generic complex FIR filter evaluation platform in Simulink with Xilinx System Generator.
Figure 1 illustrates the top-level system architecture. Simulink_HLP_core describes multibranch complex FIR filters with discrete digital filter blocks in Simulink, while FPGA_HLP_core realizes multibranch complex FIR filters with Xilinx resource blocks in System Generator, as shown in Figure 2.
Different FIR implementation architectures lead to different FPGA resource utilizations. Table 1 compares the complex multipliers (CM) used in a 128-tap complex asymmetric FIR filter in different implementation architectures. We assume the IQ data rate is 30.72 Msamples/second (20MHz bandwidth LTE-Advanced signal).
The full parallel implementation architecture is quite straightforward because it maps simply onto the direct-form I FIR architecture, but it uses a lot of CM resources. A full serial implementation architecture uses the fewest CM resources by sharing the same CM unit across 128 operations in a time-division multiplexing (TDM) manner, but it would have to run at a clock rate that is impossible for state-of-the-art FPGAs.
A practical solution is to choose a partially parallel implementation architecture, which splits the sequential long filter chain into several segmental parallel stages. Two examples are shown in Table 1. We went for plan A due to its minimal CM utilization and reasonable clock rate. We can actually determine the final architecture by manipulating the data rate, clock rate and number of sequential stages thus:
F_CLK = F_DATA × N_TAP ÷ N_SS
where F_CLK is the processing clock rate, F_DATA is the data rate, N_TAP is the length of the filters and N_SS is the number of sequential stages.
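The clock-rate relationship above can be checked with a few lines of Python. This sketch assumes that plan A uses 8 stages, which is consistent with the 16 = 128/8 coefficients per cRAM and the 491.52MHz processing clock mentioned later in the article; the function name is illustrative.

```python
def processing_clock_mhz(f_data_msps, n_tap, n_ss):
    """Clock rate in MHz from: clock = data rate * filter length / number of stages."""
    if n_tap % n_ss != 0:
        raise ValueError("filter length must divide evenly across the stages")
    return f_data_msps * n_tap / n_ss


F_DATA_MSPS = 30.72  # IQ data rate for a 20MHz LTE-Advanced signal
N_TAP = 128          # complex FIR taps per branch

full_serial = processing_clock_mhz(F_DATA_MSPS, N_TAP, 1)      # one shared multiplier
partially_parallel = processing_clock_mhz(F_DATA_MSPS, N_TAP, 8)  # assumed plan A
full_parallel = processing_clock_mhz(F_DATA_MSPS, N_TAP, N_TAP)   # one multiplier per tap

print(f"Full serial:        {full_serial:.2f} MHz")
print(f"Partially parallel: {partially_parallel:.2f} MHz")
print(f"Full parallel:      {full_parallel:.2f} MHz")
```

The full serial case lands near 4GHz, which illustrates why it is impractical, while the 8-stage split needs a clock of 16 times the data rate.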
Then we created three main modules:
Branch 1 includes four subprocessing stages isolated by registers for better timing: a FIR coefficients RAM (cRAM) sequential-write and parallel-read stage; a complex multiplication stage; a complex addition stage; and a segmental accumulation-and-downsample stage.
In order to minimize the I/O numbers for the core, our first stage involved creating a sequential write operation to load the coefficients from storage to the FIR cRAM in a TDM manner (each cRAM contains 16 = 128/8 IQ coefficients). We designed a parallel read operation to feed the FIR coefficients to the CM core simultaneously.
In the complex multiplication stage, in order to minimize the DSP48 utilization, we chose the efficient, fully pipelined three-multiplier architecture to perform complex multiplication at a cost of six time cycles of latency.
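The article does not say which three-multiplier formulation the team used. As an illustration only, the Python sketch below shows one common arrangement (often attributed to Gauss) that trades one of the four real multiplications of a naive complex multiply for extra additions, which is the kind of trade that saves DSP48 multipliers.

```python
def complex_multiply_3(a, b, c, d):
    """Compute (a + jb) * (c + jd) using three real multiplications.

    Returns (real, imag). A naive implementation needs four multiplications:
    real = a*c - b*d and imag = a*d + b*c.
    """
    k1 = c * (a + b)  # multiplication 1
    k2 = a * (d - c)  # multiplication 2
    k3 = b * (c + d)  # multiplication 3
    return k1 - k3, k1 + k2


# Cross-check against Python's built-in complex arithmetic.
real, imag = complex_multiply_3(3, 4, 5, -2)
reference = complex(3, 4) * complex(5, -2)
print((real, imag), reference)
```

The three multiplications are independent of each other, so in hardware they can run in parallel and be pipelined, with the pre-adders and post-adders accounting for some of the added latency.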
Next, the complex addition stage aggregates the outputs of the CMs into a single stream. Finally, the segmental accumulation-and-downsample stage accumulates the temporary substreams for 16 time cycles to derive the corresponding linear convolution results of a 128-tap FIR filter, and to downsample the high-speed streams back to match the data-sampling rate of the system—here, 30.72MHz.
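The following Python model is a behavioral sketch of the partially parallel idea described above, not the authors' System Generator design. It assumes 8 parallel segments that each step through 16 taps, sums the 8 products on every fast clock cycle, accumulates for 16 cycles, and then emits one output per input sample. The way taps are assigned to segments is an assumption. The model is checked against a direct FIR computation.

```python
import random


def direct_fir(h, x):
    """Reference FIR: y[n] = sum over k of h[k] * x[n - k]."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def partially_parallel_fir(h, x, n_stages=8):
    """FIR computed by n_stages parallel segments, each handling its taps in sequence."""
    taps_per_stage = len(h) // n_stages
    y = []
    for n in range(len(x)):                    # one pass per input sample
        accumulator = 0
        for t in range(taps_per_stage):        # fast clock cycles within one sample
            stage_sum = 0
            for s in range(n_stages):          # parallel complex multipliers
                k = s * taps_per_stage + t     # tap read from segment s's coefficient RAM
                if n - k >= 0:
                    stage_sum += h[k] * x[n - k]
            accumulator += stage_sum           # segmental accumulation
        y.append(accumulator)                  # downsample: one result per sample
    return y


rng = random.Random(0)


def random_iq(count):
    """Integer-valued IQ samples so that the comparison below is exact."""
    return [complex(rng.randint(-8, 8), rng.randint(-8, 8)) for _ in range(count)]


h = random_iq(128)  # 128-tap complex asymmetric filter
x = random_iq(200)  # short test stream
print("Outputs match:", partially_parallel_fir(h, x) == direct_fir(h, x))
```

Changing `n_stages` shows the trade-off directly: more stages means more multipliers but fewer fast clock cycles per sample.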
We performed the IP verification in two steps. First, we compared the outputs of the FPGA_HLP_core with the referenced double-precision multibranch FIR core in Simulink. We found we had achieved a relative amplitude error of less than 0.04 percent for a 16-bit-resolution version. A wider data width will provide better performance at the cost of more resources.
After verifying the function, it was time to validate the silicon performance. So our second step was to synthesize and implement the created IP in the Vivado Design Suite 2015.1 targeting the FPGA fabric of the Zynq-7000 All Programmable SoC (equivalent to a Kintex xc7k325tffg900-2). With full hierarchy in the tools’ synthesis settings and default implementation settings, it was easy to achieve the required timing at a 491.52MHz internal processing clock rate, since we created a fully pipelined design with clear registered hierarchies.
The HLP IP we designed can be easily used to create a larger massive-MIMO precoding core. Table 2 presents selected application scenarios, with key resource utilizations. You will need an extra aggregation stage to deliver the final precoding results.
For example, as shown in Figure 4, it’s easy to build a 4 x 4 precoding core by plugging in four HLP cores and one extra pipelined data aggregation stage.
Efficient and Scalable
We have illustrated how to quickly build an efficient and scalable DSP linear convolution application in the form of a massive-MIMO precoding core with Xilinx System Generator and Vivado design tools. You could expand this core to support longer-tap FIR applications by either using more sequential stages in the partially parallel architecture, or by reasonably increasing the processing clock rate to do a faster job. For the latter case, it would be helpful to identify the bottleneck and critical path of the target devices regarding the actual implementation architecture.
Then, co-optimization of hardware and algorithms would be a good approach to tune the system performance, such as development of a more compact precoding algorithm regarding hardware utilization. Initially, we focused on a precoding solution with the lowest latency. For our next step, we are going to explore an alternative solution for better resource utilization and power consumption.
For more information, please contact the author by e-mail: firstname.lastname@example.org.
Note: This article appeared in the latest issue of Xcell Journal, Issue 94.
If you look at what’s happening with Moore’s Law (just read any article about the topic during the last two years), you see that systems design is being forced to make use of All Programmable devices at an increasing rate because of the enormous NRE costs associated with roll-your-own ASICs at 16nm, 10nm, and below. Companies still need the differentiation afforded by custom hardware to boost product margins in their competitive, global marketplaces, but they need to get it in a different way.
Nowhere is that more true than in the six Megatrends that Xilinx has identified:
These Megatrends drive the future of the electronics industry—and they drive Xilinx’s future as well. Xilinx has made a slick, 4-minute video discussing these trends:
The latest issue of Xcell Journal, issue 94, is now online with these feature-length articles:
You can also download a PDF of the entire issue here.