The open-source RISC-V processor has a growing ecosystem and user community, so it's not surprising that someone would want to put one of these processors into a low-cost FPGA like a Xilinx Artix-7 device. And what could be easier than doing so with an existing low-cost dev board? Cue Digilent's Arty Dev Board, currently on sale for $89.99 here. Normally, you'd find a copy of the Xilinx MicroBlaze soft RISC processor core inside of Arty's Artix-7 FPGA, but the SiFive Freedom E310 microcontroller platform, which combines a RISC-V processor with peripherals, seems to fit just fine. That's just what Andrew Black has done, using the no-cost Xilinx Vivado HL WebPack Edition to compile the HDL.
Digilent’s ARTY Artix-7 FPGA Dev Board
With Black’s step-by-step instructions based on SiFive's "Freedom E300 Arty FPGA Dev Kit Getting Started Guide", you can do the same pretty easily. (See “Build an open source MCU and program it with Arduino.”)
Note: For more information on the Digilent Arty Dev Board, see “ARTY—the $99 Artix-7 FPGA Dev Board/Eval Kit with Arduino I/O and $3K worth of Vivado software. Wait, What????” and “Free Webinar on $99 Arty dev kit, based on Artix-7 FPGA, now online.”
IEEE Spectrum has been rolling out the first inductees into its new Chip Hall of Fame this week, and today it was the turn of the Xilinx XC2064, the world's first commercial FPGA. Here's the first paragraph of the writeup:
“Back in the early 1980s, chip designers tried to get the most out of each and every transistor on their circuits. But then Ross Freeman had a pretty radical idea. He came up with a chip packed with transistors that formed loosely organized logic blocks with connections that could be configured and reconfigured with software. As a result, sometimes a bunch of transistors wouldn’t be used—heresy!—but Freeman was betting that Moore’s Law would eventually make transistors so cheap that no one would care. He was right. To market his chip, called a field-programmable gate array, or FPGA, Freeman cofounded Xilinx. (Apparently, a weird concept called for a weird company name.)”
The photo of the XC2064 used on this page is of a chip that’s sitting in my drawer at Xilinx. I purchased this chip from eBay a couple of years ago so I’d have a copy of the company’s earliest device. The date code on this XC2064 FPGA is early 1988, less than three years after the first devices rolled out of the fab.
Xilinx XC2064--the world's first commercial FPGA--inducted into the IEEE Chip Hall of Fame today
The Xilinx XC2064 FPGA takes its place among some of the most famous, most important ICs ever created including the Fairchild μA741 op-amp, the Signetics 555 timer, Mostek’s MK4096 4Kbit DRAM, Intel’s 8088 microprocessor, MOS Technology’s 6502 microprocessor, Motorola’s MC68000 microprocessor, and the Zilog Z80 microprocessor. So the XC2064 is in very, very good company. It’s appropriate that this is the Independence Day weekend here in the US because these are all truly revolutionary ICs.
Note: For true history buffs, I also have a photo of two early XC2064 engineering samples in PLCC packages dated late in 1985:
Early XC2064 FPGA engineering samples dated late 1985
Also for true history buffs, I’ve attached a PDF of the original XC2064 press release, dated November 1, 1985.
In February, I wrote a blog detailing the use of a Xilinx Kintex-7 K325T or K410T FPGA in Keysight's new line of high-speed AWGs (arbitrary waveform generators) and signal digitizers. (See "Kintex-7 FPGAs sweep the design of six new Keysight high-speed PXI AWGs and Digitizers.") The six new Keysight PXI instruments in that blog included the M3100A 100Msamples/sec, 4- or 8-channel FPGA digitizer; the M3102A 500Msamples/sec, 2- or 4-channel FPGA digitizer; the M3201A 500Msamples/sec FPGA arbitrary waveform generator; the M3202A 1Gsamples/sec FPGA arbitrary waveform generator; the M3300A 500Msamples/sec, 2-channel FPGA AWG/digitizer combo; and the M3302A 500Msamples/sec, 4-channel FPGA AWG/digitizer combo.
In that blog post, I wrote:
“This family of Keysight M3xxx instruments clearly demonstrates the ability to create an FPGA-based hardware platform that enables rapid development of many end products from one master set of hardware designs. In this case, the same data-acquisition and AWG block diagrams recur on the data sheets of these instruments, so you know there’s a common set of designs.”
And that’s still true. Incorporating a Xilinx All Programmable FPGA, Zynq SoC, or Zynq UltraScale+ MPSoC into your product design allows you to create a hardware platform (or platforms) that give you a fast way to spin out new, highly differentiated products based on that platform. Keysight, realizing that the FPGA capability would be useful to its own customers as well, exposed much of the internal FPGA capabilities in these instruments through the Keysight M3602A Graphical FPGA Development Environment, which allows you to customize these instruments using off-the-shelf DSP blocks, MATLAB/Simulink, the Xilinx CORE Generator and Vivado IP cores, and the Xilinx Vivado Design Suite with either VHDL or Verilog code.
Keysight’s M3602A FPGA Block Diagram Editor
A recent Keysight email alerted me to three new application notes Keysight has published that detail the use of on-board FPGA resources to enhance the instruments for specific applications. The three app notes are:
Only All Programmable devices give you this kind of high-speed hardware programmability in addition to microprocessor-based software programmability. These Keysight instruments and the M3602A Development Environment are one more demonstration of why that combination is a very handy option to consider when designing your own products.
As I concluded in that February blog post (and it’s worth repeating):
“Xilinx FPGAs are inherently well-suited to this type of platform-based product design because of the All-Programmable (I/O, hardware, and software) nature of the devices. I/O programmability permits any-to-any connectivity—as is common with, for example, camera designs when you’re concerned about adapting to a range of sensors or different ADCs and DACs for digitizers and AWGs. Hardware programmability allows you to rapidly modify real-time signal-processing or motor-control algorithms—as is common with diverse designs including high-speed instrument designs and industrial controllers.”
Of course these same ideas apply to all types of products, not just AWGs and digitizers.
(You can access the three Keysight app notes here.)
Bittware’s XUPP3R PCIe card based on the Xilinx Virtex UltraScale+ VU9P FPGA has become really popular with customers. (See “BittWare’s UltraScale+ XUPP3R board and Atomic Rules IP run Intel’s DPDK over PCIe Gen3 x16 @ 150Gbps.”) That popularity has led to the inevitable question from BittWare’s customers: How about a bigger FPGA? Although physically, it’s easy to stick a bigger device on a big PCIe card, there’s an issue with heat—getting rid of it. To tackle this engineering problem, BittWare has developed an entirely new platform called “Viper” that employs computer-based thermal modeling, heat pipes, channeled airflow, and the new Xilinx “lidless” D2104 package to get heat out of the FPGA and into the cooling airstream of the PCIe card cage more efficiently. (For more information about the Xilinx lidless D2104 package, see “Mechanical and Thermal Design Guidelines for the UltraScale+ FPGA D2104 Lidless Flip-Chip Packages.”)
The first card to use the Viper platform is the BittWare XUPVV4.
BittWare’s XUPVV4 PCIe Card employs the company’s new Viper Platform with heat-pipe cooling for lidless FPGAs
Here are the specs for the BittWare XUPVV4:
You should be able to build pretty much whatever you want with this board. So, if someone comes to you and says, "you're gonna need a bigger FPGA," take a look at the BittWare XUPVV4. Plug it into a server and accelerate something today.
Last year at Embedded World 2016, a vision-guided robot based on a Xilinx Zynq UltraScale+ ZU9 MPSoC incorporated into a ZCU102 eval kit autonomously played solitaire on an Android tablet in the Xilinx booth. (See “3D Delta Printer plays Robotic Solitaire on a Touchpad under control of a Xilinx Zynq UltraScale+ MPSoC.”) This year at Embedded World 2017, an upgraded and improved version of the robot again appeared in the Xilinx booth, still playing solitaire.
In the original implementation, an HD video camera monitored the Android tablet’s screen to image the solitaire playing cards. Acceleration hardware implemented in the Zynq MPSoC’s PL (programmable logic) performed real-time preprocessing of the HD video stream including Sobel edge detection. Software running on the Zynq MPSoC’s ARM Cortex-A53 APU (Application Processing Unit) recognized the playing cards from the processed video supplied by the Zynq MPSoC’s PL and planned the solitaire game moves for the robot. The Zynq MPSoC’s dual-core ARM Cortex-R5 RPU (Real-Time Processing Unit) operating in lockstep—useful for safety-critical applications such as robotic control—operated the robotic stylus positioner, fashioned from a 3D Delta printer. The other processing sections of the Zynq UltraScale+ ZU9 MPSoC were also gainfully employed in this demo.
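The Sobel stage is worth a closer look. In the demo it's implemented as hardware in the Zynq MPSoC's PL, but the arithmetic itself is just a pair of 3x3 convolutions. Here's a minimal software model in Python, offered strictly as an illustrative sketch rather than the demo's actual code:

```python
# Gradient kernels for the Sobel operator.
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
GY = [[-1, -2, -1],
      [ 0,  0,  0],
      [ 1,  2,  1]]

def sobel(image):
    """Return a gradient-magnitude image; border pixels are left at zero."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(GX[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(GY[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)  # cheap |gx|+|gy| magnitude estimate
    return out

# A sharp vertical edge: dark on the left, bright on the right.
img = [[0, 0, 255, 255] for _ in range(4)]
edges = sobel(img)  # interior pixels along the edge light up strongly
```

A hardware version pipelines exactly this window arithmetic, one pixel per clock, which is why the PL handles it so easily at HD video rates.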
This year a trained, 3-layer Convolutional BNN (Binary Neural Network) with 256 neurons/layer executed the playing-card recognition algorithm. The tangible results: improved accuracy and a performance boost of 11,320x! (Not to mention the offloading of the recognition task from the Zynq MPSoC’s APU.)
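If you haven't met BNNs before: binarizing weights and activations to +1/-1 turns each neuron's dot product into an XNOR followed by a popcount, exactly the kind of bit-level operation FPGAs excel at, and that goes a long way toward explaining a speedup of that magnitude. Here's a hedged Python sketch of a single binary neuron; the bit packing and threshold convention are illustrative assumptions, not details of the Xilinx demo:

```python
def bnn_neuron(x_bits, w_bits, n):
    """One binarized neuron: bit = 1 encodes +1, bit = 0 encodes -1.
    The +1/-1 dot product equals 2 * popcount(XNOR(x, w)) - n;
    the neuron's binary output is the sign of that dot product."""
    mask = (1 << n) - 1
    agree = ~(x_bits ^ w_bits) & mask    # XNOR: 1 wherever the signs match
    dot = 2 * bin(agree).count("1") - n  # popcount, rescaled to the +1/-1 domain
    return 1 if dot >= 0 else 0

# 8-bit example: input and weights agree in 6 of 8 positions, so dot = +4.
out = bnn_neuron(0b10110100, 0b10110111, 8)
```

In the PL, the XNOR and popcount for all 256 neurons in a layer can run in parallel, with no multipliers needed at all.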
Here’s a new, 2-minute video explaining the new autonomous solitaire-playing demo system:
Note: For more information about BNNs and programmable logic, see:
Metamako decided that it needed more than one Xilinx UltraScale FPGA to deliver the low latency and high performance it wanted from its newest networking platform. The resulting design is a 1RU or 2RU box that houses one, two, or three Kintex UltraScale or Virtex UltraScale+ FPGAs, connected by “near-zero” latency links. The small armada of FPGAs means that the platform can run multiple networking applications in parallel—very quickly. This new networking platform allows Metamako to expand far beyond its traditional market—financial transaction networking—into other realms such as medical imaging, SDR (software-defined radio), industrial control, and telecom. The FPGAs are certainly capable of implementing tasks in all of these applications with extremely high performance.
Metamako’s Triple-FPGA Networking Platform
The Metamako platform offers an extensive range of standard networking features including data fan-out, scalable broadcast, connection monitoring, patching, tapping, time-stamping, and a deterministic port-to-FPGA latency of just 3nsec. Metamako also provides a developer’s kit with the platform with features that include:
This latest networking platform from Metamako demonstrates a key attribute of Xilinx All Programmable technology: the ability to fully differentiate a product by exploiting the any-to-any connectivity and high-speed processing capabilities of Xilinx silicon using Xilinx’s development tools. No other chip technology could provide Metamako with a comparable mix of extreme connectivity, speed, and design flexibility.
Drone maker Zerotech announced the Dobby AI pocket-sized drone earlier this year. Now, there’s a Xilinx video of DeePhi Tech’s Fuzhang Shi explaining a bit more about the machine-learning innards of the Dobby AI drone, which uses deep-learning algorithms for tasks including pedestrian detection, tracking, and gesture recognition. DeePhi’s algorithms are running on a Xilinx Zynq Z-7020 SoC integrated into the Dobby AI drone.
Power consumption, stability, and cost are all critical factors in drone design, and DeePhi developed a low-power, low-cost, high-stability system using the Zynq SoC, which executes 230GOPS while consuming a mere 3W. This is far more power-efficient than running a similar application on CPUs or GPUs, explains Fuzhang Shi.
Zerotech’s Dobby AI Palm-sized autonomous drone pcb with Zynq Z-7020 SoC running DeePhi deep-learning algorithms
Here’s the 2-minute video:
You can now download the Vivado Design Suite 2017.2 HLx editions, which include many new UltraScale+ devices:
In addition, the low-cost Spartan-7 XC7S50 FPGA has been added to the WebPack edition.
Download the latest releases of the Vivado Design Suite HL editions here.
By Adam Taylor
Having looked previously at the profiling we can perform on the Zynq SoC's and Zynq UltraScale+ MPSoC's PS (processing system) cores, the next step is to examine the performance of the AXI links between the Zynq SoC's PS and the PL (programmable logic). We can use a similar approach for both Zynq SoC and Zynq MPSoC devices, so this blog covers both.
We’ll use the AXI Performance Monitor (APM) IP block to monitor the performance between the Zynq PS and the PL. These blocks are instantiated within the programmable logic design and monitor selected AXI links. When we insert these APM blocks into the PL, we must ensure they have at least eight counters and are set for the advanced mode.
Configuring the APM for insertion into the PL
In the example used below for the Zynq SoC, I have added a BRAM memory connected to the PS using the PS Master AXI interface and an AXI BRAM controller. I have connected the APM monitor port (called a slot) on the AXI interface between the AXI Interconnect and the AXI BRAM controller. This enables us to monitor the read and write throughput and latencies when our software application accesses the memory.
Zynq APM Design Example
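From the software side, that BRAM is just a region of the address map, and on Linux you can reach it by mapping /dev/mem, which is one easy way to generate the AXI traffic the APM will observe. The sketch below is a hedged illustration: the base address and size are assumptions standing in for whatever Vivado's address editor assigned in your own design.

```python
import mmap
import os
import struct

BRAM_BASE = 0x40000000  # assumed address from Vivado's address editor
BRAM_SIZE = 0x2000      # assumed 8-Kbyte BRAM

def open_bram():
    """Map the BRAM's physical address range into this process (target only)."""
    fd = os.open("/dev/mem", os.O_RDWR | os.O_SYNC)
    return mmap.mmap(fd, BRAM_SIZE, offset=BRAM_BASE)

def write_word(buf, offset, value):
    struct.pack_into("<I", buf, offset, value)  # little-endian 32-bit write

def read_word(buf, offset):
    return struct.unpack_from("<I", buf, offset)[0]

# Off target, the same helpers can be exercised against a plain bytearray:
buf = bytearray(BRAM_SIZE)
write_word(buf, 4, 0x12345678)
```

On the target you'd call open_bram() and poke the real hardware; every such access crosses the monitored AXI link and shows up in the APM counters.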
For the Zynq UltraScale+ MPSoC design, I followed a similar approach. However, I connected a second BRAM and AXI BRAM controller to the PS. This enabled me to monitor the low-power AXI interface and the full-power AXI master interfaces between the PS and PL.
Zynq MPSoC Design Example
I also added in a second APM to monitor the second AXI BRAM interface. However, when it comes to the Zynq MPSoC, these are not the only APM blocks present within the design. Several additional APMs exist within the PS subsystem itself and we can also monitor these within XSDK just as we do the ones within the PL.
These additional APMs are located in the following channels:
PS Subsystem Showing the APM locations in the Zynq MPSoC
This capability also provides a wider range of information about our Zynq MPSoC system and can be used for more advanced safety and security applications.
As with the PS cores, performance monitoring provides both a graphical view and a summary table of the latency and data transfers. Running the performance analysis on the AXI interfaces resembles what we did previously. However, before we run it we need to configure the APM blocks.
We do this by right-clicking on "performance analysis on local," enabling the APM (PS Master), and clicking on the edit button. This opens a second dialog that allows us to configure the PSU APMs and the inserted PL APMs.
Configuring the APMs within the design (Example Shown Zynq MPSoC)
Once the APM is configured we can run the code on the Zynq SoC or Zynq UltraScale+ MPSoC and obtain the results.
Running the code on the Zynq MPSoC APU to transfer data to and from the BRAMs using first the LPD AXI interface and then the FPD interface provides the results as shown in the tables below.
APM Results accessing BRAM via the LPD AXI Interface
APM Results accessing BRAM via the FPD AXI Interface
You can also see in this example that PSU APM slots are being reported. (Slot 0 = DDRC, Slot 1 = CCI, Slot 2 = OCM and Slot 5 = LPD). AXI Performance Monitor slot 0 is the LPD AXI interface to PL and Slot 1 is the FPD AXI interface to PL.
We now understand a little bit more about how we can profile our system, so we can optimize our design if necessary to get the best results. What we need to look at in future blogs is how we can optimize the overall design.
Code is available on GitHub as always.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
Think you don’t need HBM (high-bandwidth memory) in your FPGA-based designs? There was probably a time, not that long ago, when you thought you didn’t need a smartphone. Still think so? With its 460Gbytes/sec bandwidth, HBM doesn’t crash through the memory wall, it vaults you over the wall. And who needs to get over the memory wall? Anyone working with high-speed Ethernet, high-res video, and most high-performance DSP applications. Pretty much anything you’d use a Xilinx UltraScale+ All Programmable device for. Here’s a chart illustrating the problem:
Allow me to translate this chart for you: “You’re not going to get there with DDR SDRAM.”
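A quick back-of-the-envelope calculation shows why. A single 64-bit DDR4-2666 interface peaks at roughly 21Gbytes/sec, so matching HBM's 460Gbytes/sec would take more than twenty of them; note that the DDR4 speed grade here is my assumption for illustration, not a figure taken from the chart.

```python
def ddr4_gbytes_per_sec(mtransfers_per_sec, bus_bits=64):
    """Peak bandwidth of one DDR interface, ignoring refresh and turnaround."""
    return mtransfers_per_sec * 1e6 * (bus_bits // 8) / 1e9

dimm = ddr4_gbytes_per_sec(2666)       # one DDR4-2666 x64 interface
interfaces_to_match_hbm = 460 / dimm   # how many you'd need to equal HBM
```

Even before you count the pins, the pcb routing, and the power for twenty-plus external memory interfaces, the arithmetic makes the case.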
Fortunately, there’s no longer a need for me to convince you that you need HBM. There’s an 11-page White Paper to do that job. It’s titled “Virtex UltraScale+ HBM FPGA: A Revolutionary Increase in Memory Performance.”
And, if you weren’t aware that Xilinx was adding HBM to its FPGAs, read this blog from last November: “Xilinx Virtex UltraScale+ FPGAs incorporate 32 or 64Gbits of HBM, delivers 20x more memory bandwidth than DDR.”
Last month, Xilinx Product Marketing Manager Darren Zacher presented a webinar on the extremely popular $99 Arty Dev Kit, which is based on a Xilinx Artix-7 A35T FPGA, and that webinar is now online. If you're wondering whether this might be the right way for you to get some design experience with the latest FPGA development tools and silicon, spend an hour with Zacher and Arty. The kit is available from Avnet and Digilent.
For more information about the Arty Dev Kit, see: “ARTY—the $99 Artix-7 FPGA Dev Board/Eval Kit with Arduino I/O and $3K worth of Vivado software. Wait, What????”
Anthony Collins, Harpinder Matharu, and Ehab Mohsen of Xilinx have just published an application article about the 16nm Xilinx RFSoC in Microwave Journal titled "RFSoC Integrates RF Sampling Data Converters for 5G New Radio." Xilinx announced the RFSoC, which is based on the 16nm Xilinx Zynq UltraScale+ MPSoC, back in February (see "Xilinx announces RFSoC with 4Gsamples/sec ADCs and 6.4Gsamples/sec DACs for 5G, other apps. When we say "All Programmable," we mean it!"). The Xcell Daily blog with that announcement has been very popular. Last week, another blog gave more details (see "Ready for a few more details about the Xilinx All Programmable RFSoC? Here you go"), and now there's this article in Microwave Journal.
This new article gets into many specifics with respect to designing the RFSoC into systems with block diagrams and performance numbers. In particular, there’s a table showing MIMO radio designs based on the RFSoC with 37% to 51% power reductions and significant pcb real-estate savings due to the RFSoC’s integrated, multi-Gbps ADCs and DACs.
If you’re looking to glean a few more technical details about the RFSoC, this article is the latest place to go.
Cloud computing and application acceleration for a variety of workloads including big-data analytics, machine learning, video and image processing, and genomics are big data-center topics, and if you're one of those people looking for acceleration guidance, read on. If you're looking to accelerate compute-intensive applications such as automated driving and ADAS or local video processing and sensor fusion, this blog post's for you, too. The basic problem here is that CPUs are too slow and they burn too much power. You may have one or both of these challenges. If so, you may be considering a GPU or an FPGA as an accelerator in your design.
How to choose?
Although GPUs started as graphics accelerators, primarily for gamers, a few architectural tweaks and a ton of software have made them suitable as general-purpose compute accelerators. With the right software tools, it’s not too difficult to recode and recompile a program to run on a GPU instead of a CPU. With some experience, you’ll find that GPUs are not great for every application workload. Certain computations such as sparse matrix math don’t map onto GPUs well. One big issue with GPUs is power consumption. GPUs aimed at server acceleration in a data-center environment may burn hundreds of watts.
With FPGAs, you can build any sort of compute engine you want with excellent performance/power numbers. You can optimize an FPGA-based accelerator for one task, run that task, and then reconfigure the FPGA if needed for an entirely different application. The amount of computing power you can bring to bear on a problem is scary big. A Virtex UltraScale+ VU13P FPGA can deliver 38.3 INT8 TOPS (that’s tera operations per second) and if you can binarize the application, which is possible with some neural networks, you can hit 500TOPS. That’s why you now see big data-center operators like Baidu and Amazon putting Xilinx-based FPGA accelerator cards into their server farms. That’s also why you see Xilinx offering high-level acceleration programming tools like SDAccel to help you develop compute accelerators using Xilinx All Programmable devices.
For more information about the use of Xilinx devices in such applications including a detailed look at operational efficiency, there’s a new 17-page White Paper titled “Xilinx All Programmable Devices: A Superior Platform for Compute-Intensive Systems.”
There’s considerable 5G experimentation taking place as the radio standards have not yet gelled and researchers are looking to optimize every aspect. SDRs (software-defined radios) are excellent experimental tools for such research—NI’s (National Instruments’) SDR products especially so because, as the Wireless Communication Research Laboratory at Istanbul Technical University discovered:
“NI SDR products helped us achieve our project goals faster and with fewer complexities due to reusability, existing examples, and the mature community. We had access to documentation around the examples, ready-to-run conceptual examples, and courseware and lab materials around the grounding wireless communication topics through the NI ecosystem. We took advantage of the graphical nature of LabVIEW to combine existing blocks of algorithms more easily compared to text-based options.”
Researchers at the Wireless Communication Research Laboratory were experimenting with UFMC (universal filtered multicarrier) modulation, a leading modulation candidate technique for 5G communications. Although current communication standards frequently use OFDM (orthogonal frequency-division multiplexing), it is not considered to be a suitable modulation technique for 5G systems due to its tight synchronization requirements, inefficient spectral properties (such as high spectral side-lobe levels), and cyclic prefix (CP) overhead. UFMC has relatively relaxed synchronization requirements.
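To make the technique concrete, here's a toy UFMC transmit chain for a single sub-band in pure Python: QPSK symbols land on a handful of subcarriers, an inverse DFT produces the time-domain burst, and a short FIR filter shapes the sub-band's spectrum. The subcarrier layout is invented and a plain moving-average filter stands in for the Dolph-Chebyshev filter UFMC systems typically use, so treat this strictly as a sketch of the signal flow:

```python
import cmath
import random

N = 64                      # inverse-DFT size (one UFMC symbol)
L = 16                      # sub-band filter length
SUBCARRIERS = range(8, 20)  # 12 adjacent subcarriers: one sub-band (assumed)

def idft(freq_bins):
    """Naive O(N^2) inverse DFT; an FFT would do the same job faster."""
    n = len(freq_bins)
    return [sum(freq_bins[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]

# Random QPSK symbols on the sub-band's carriers; all other bins stay empty.
random.seed(0)
freq = [0j] * N
for k in SUBCARRIERS:
    freq[k] = complex(random.choice((-1, 1)), random.choice((-1, 1))) / 2 ** 0.5

time_burst = idft(freq)

# Per-sub-band FIR filtering is what distinguishes UFMC from plain OFDM.
h = [1.0 / L] * L  # moving average standing in for a Dolph-Chebyshev design
tx = [sum(h[j] * time_burst[t - j] for j in range(L) if 0 <= t - j < N)
      for t in range(N + L - 1)]  # filtered burst: N + L - 1 samples
```

The filtering suppresses each sub-band's spectral side lobes, which is exactly the OFDM weakness the text describes, and it does so without OFDM's cyclic prefix.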
The research team at the Wireless Communication Research Laboratory implemented UFMC modulation using two USRP-2921 SDRs, a PXI-6683H timing module, and a PXIe-5644R VST (Vector Signal Transceiver) module from National Instruments (NI), all programmed with NI's LabVIEW systems engineering software. Using this equipment, they achieved better spectral results than with OFDM and, by exploiting UFMC's sub-band filtering approach, they've proposed enhanced versions of UFMC. Details are available in the NI case study titled "Using NI Software Defined Radio Solutions as a Testbed of 5G Waveform Research." This project was a finalist in the 2017 NI Engineering Impact Awards, RF and Mobile Communications category, held last month in Austin as part of NI Week.
5G UFMC Modulation Testbed based on Equipment from National Instruments
Note: NI’s USRP-2921 SDR is based on a Xilinx Spartan-6 FPGA; the NI PXI-6683 timing module is based on a Xilinx Virtex-5 FPGA; and the PXIe-5644R VST is based on a Xilinx Virtex-6 FPGA.
Although humans once served as the final inspectors for pcbs, today’s component dimensions and manufacturing volumes mandate the use of camera-based automated optical inspection (AOI) systems. Amfax has developed a 3D AOI system—the a3Di—that uses two lasers to make millions of 3D measurements with better than 3μm accuracy. One of the company’s customers uses an a3Di system to inspect 18,000 assembled pcbs per day.
The a3Di control system is based on a National Instruments (NI) cRIO-9075 CompactRIO controller—with an integrated Xilinx Virtex-5 LX25 FPGA—programmed with NI’s LabVIEW systems engineering software. The controller manages all aspects of the a3Di AOI system including monitoring and control of:
The system provides height-graded images like this:
3D Image of a3Di’s Measurement Data: Colors represent height, with Z resolution down to less than a micron. The blue section at the top indicates signs of board warp. Laser etched component information appears on some of the ICs.
The a3Di system then compares this image against a stored golden reference image to detect manufacturing defects.
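Conceptually, that comparison is a per-pixel height check against a tolerance. Here's a minimal Python sketch; the 3μm threshold matches the a3Di's stated accuracy, but the data layout and pass/fail rule are my assumptions, not Amfax's algorithm:

```python
TOLERANCE_UM = 3.0  # matches the a3Di's stated measurement accuracy

def defect_map(measured, golden, tol=TOLERANCE_UM):
    """Flag every pixel whose height deviates from the reference by > tol."""
    return [[abs(m - g) > tol for m, g in zip(m_row, g_row)]
            for m_row, g_row in zip(measured, golden)]

golden = [[100.0, 100.0],   # reference heights in microns (layout illustrative)
          [100.0, 250.0]]
measured = [[101.0, 90.0],  # one joint sits 10um low: a defect
            [100.5, 250.0]]
flags = defect_map(measured, golden)
```

A production system would add registration, warp compensation, and per-region tolerances, but the core pass/fail decision reduces to this kind of thresholded difference.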
Amfax says it has found the CompactRIO system to be "dependable, reliable, and cost-effective." In addition, the company found it could get far better timing resolution with the CompactRIO system than the 1msec resolution usually provided by PLC controllers.
This project was a 2017 NI Engineering Impact Award Finalist in the Electronics and Semiconductor category last month at NI Week. It is documented in this NI case study.
Hyundai Heavy Industries (HHI) is the world's foremost shipbuilding company and the company's Engine and Machinery Division (HHI-EMD) is the world's largest marine diesel engine builder. HHI's HiMSEN medium-sized engines are four-stroke diesels with output power ranging from 960kW to 25MW. These engines power electric generators on large ships and serve as the propulsion engine on medium and small ships. HHI-EMD is always developing newer, more fuel-efficient engines because the fuel costs for these large diesels run about $2000/hour. Better fuel efficiency will significantly reduce operating costs and emissions.
For that research, HHI-EMD developed monitoring and diagnostic equipment to better understand engine combustion performance and an HIL system to test new engine controller designs. The test and HIL systems are based on equipment from National Instruments (NI).
Engine instrumentation must be able to monitor 10-cylinder engines running at thousands of RPM while measuring crankshaft angle to 0.1 degree of resolution. From that information, the engine test and monitoring system calculates in-cylinder peak pressure, mean effective pressure, and cycle-to-cycle pressure variation. All this must happen every 10μsec for each cylinder.
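It's easy to check how demanding that is. At 0.1-degree crank resolution, the time between samples shrinks quickly with engine speed; the little Python calculation below shows that at 2,000 RPM (an assumed speed for illustration) a sample must be captured roughly every 8μsec, which squares with the stated per-cylinder processing budget:

```python
def usec_between_samples(rpm, resolution_deg=0.1):
    """Time available to capture one crank-angle sample at a given speed."""
    degrees_per_sec = rpm / 60.0 * 360.0
    return resolution_deg / degrees_per_sec * 1e6

t = usec_between_samples(2000)  # microseconds per 0.1-degree step
```

Multiply that by ten cylinders' worth of concurrent pressure channels and the case for doing the acquisition and math in an FPGA makes itself.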
HHI-EMD elected to use an NI cRIO-9035 Controller, which incorporates a Xilinx Kintex-7 70T FPGA, to serve as the platform for developing its HiCAS test and data-acquisition system. The HiCAS system monitors all aspects of the engine under test including engine speed, in-cylinder pressure, and pressures in the intake and exhaust systems. This data helped HHI-EMD engineers analyze the engine’s overall performance and the performance of key parts using thermodynamic analysis. HiCAS provides real-time analysis of dynamic data including:
Using the collected data, the engineering team then developed a model of the diesel engine, resulting in the development of an HIL system used to exercise the engine controllers. This engine model runs in real time on an NI PXI system synchronized with the high-speed signal-sensor simulation software running on the PXI system's multifunction FPGA-based FlexRIO module. The HIL system transmits signals to the engine controllers, simulating an operating engine and eliminating the operating costs of a large diesel engine during these tests. HHI-EMD credits the FPGAs in these systems for making the calculations run fast enough for real-time simulation. The simulated engine also permits fault testing without the risk of damaging an actual engine. Of course, all of this is programmed using NI's LabVIEW systems engineering software and LabVIEW FPGA.
HHI-EMD HIL Simulator for Marine Diesel Engines
According to HHI-EMD, development of the HiCAS engine-monitoring system and virtual verification based on the HIL system shortened development time from more than three years to one, significantly accelerating the time-to-market for HHI-EMD’s more eco-friendly marine diesel engines.
This project was a 2017 NI Engineering Impact Award Finalist in the Transportation and Heavy Equipment category last month at NI Week and won the 2017 HPE Edgeline Big Analog Data Award. It is documented in this NI case study.
Chang Guang Satellite Technology, China's first commercial remote sensing satellite company, develops and operates the JILIN-1 high-resolution remote-sensing satellite series, which has pioneered the application of commercial satellites in China. The company contemplates putting 60 satellites in orbit by 2020 and 138 satellites in orbit by 2030. Achieving that goal is going to take a lot of testing, and testing consumes about 70% of the development cycle for space-based systems. So Chang Guang Satellite Technology knew it would need to automate its test systems and turned to National Instruments (NI) for assistance. The resulting automated test system has three core test systems using products from NI:
A Chang Guang Satellite Technology test system based on NI’s 1st-generation VST and FlexRIO PXIe modules
Here’s a sample image from the company’s growing satellite imaging portfolio:
Shanghai Disneyland as viewed from space
NI’s VSTs and FlexRIO modules are all based on multiple generations of Xilinx FPGAs. The company’s 2nd-generation VSTs are based on Virtex-7 FPGAs and its latest FlexRIO modules are based on Kintex-7 FPGAs.
This project was a 2017 NI Engineering Impact Award Finalist in the Aerospace and Defense category last month at NI Week. It is documented in this NI case study.
For more information about NI’s VST family, see:
Avnet has formally introduced its MiniZed dev board based on the Xilinx Zynq Z-7000S SoC with the low, low price of just $89. For this, you get a Zynq Z-7007S SoC with one ARM Cortex-A9 processor core, 512Mbytes of DDR3L SDRAM, 128Mbits of QSPI Flash, 8Gbytes of eMMC Flash memory, WiFi 802.11 b/g/n, and Bluetooth 4.1. The MiniZed board incorporates an Arduino-compatible shield interface, two Pmod connectors, and a USB 2.0 host interface for fast peripheral expansion. You'll also find an STMicroelectronics LIS2DS12 motion and temperature sensor and an MP34DT05 digital microphone on the board. This is a low-cost dev board that packs the punch of a fast ARM Cortex-A9 processor, programmable logic, a dual-wireless communications system, and easy system expandability.
I find the software that accompanies the board equally interesting. According to the MiniZed Product Brief, the $89 price includes a voucher for an SDSoC license so you can program the programmable logic on the Zynq SoC using C or C++ in addition to Verilog or VHDL using Vivado. This is a terrific deal on a Zynq dev board, whether you’re a novice or an experienced Xilinx user.
Avnet’s announcement says that the board will start shipping in early July.
Stefan Rousseau, senior technical marketing engineer for Avnet, said, “Whether customers are developing a Linux-based system or have a simple bare metal implementation, with MiniZed, Zynq-7000 development has never been easier. Designers need only connect to their laptops with a single micro-USB cable and they are up and running. And with Bluetooth or Wi-Fi, users can also connect wirelessly, transforming a mobile phone or tablet into an on-the-go GUI.”
Here’s a photo of the MiniZed Dev board:
Avnet’s $89 MiniZed Dev Board based on a Xilinx Zynq Z-7007S SoC
And here’s a block diagram of the board:
Avnet’s $89 MiniZed Dev Board Block Diagram
Many engineers in Canada wear the Iron Ring on their finger, presented to engineering graduates as a symbolic, daily reminder that they have an obligation not to design structures or other artifacts that fail catastrophically. (Legend has it that the iron in the ring comes from the first Quebec Bridge—which collapsed during its construction in 1907—but the legend appears to be untrue.) All engineers, whether wearing the Canadian Iron Ring or not, feel an obligation to develop products that do not fail dangerously. For buildings and other civil engineering works, that usually means designing structures with healthy design margins even for worst-case projected loading. However, many structures encounter worst-case loads infrequently or never. For example, a sports stadium experiences maximum loading for perhaps 20 or 30 days per year, for only a few hours at a time when it fills with sports fans. The rest of the time, the building is empty and the materials used to ensure that the structure can handle those loads are not needed to maintain structural integrity.
The total energy consumed by a structure over its lifetime is a combination of the energy needed to mine and fabricate the building materials and to build the structure (embodied energy) and the energy needed to operate the building (operational energy). The resulting energy curve looks something like this:
For completely passive structures, which describes most structures built over the past several thousand years, embodied energy dominates the total consumed energy because structural members must be designed to bear the full design load at all times. Alternatively, a smart structure with actuators that stiffen the structure only when needed will require more operational energy but the total required embodied energy will be smaller. Looking at the above conceptual graph, a well-designed active-passive system minimizes the total required energy for the structure.
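The embodied/operational trade-off in that conceptual graph can be sketched numerically. The numbers below are purely illustrative (invented for this sketch, not taken from the research): embodied energy falls as material is removed, operational energy grows superlinearly with the share of load handled actively, and the sum has an interior minimum.

```python
# Illustrative sketch with hypothetical numbers: total lifetime energy as a
# function of how much worst-case load capacity is shifted from passive
# material (embodied energy) to active compensation (operational energy).

def total_energy(active_fraction, embodied_full=1000.0, op_scale=400.0):
    """Return (embodied, operational, total) lifetime energy, arbitrary units.

    active_fraction: share of the worst-case load handled by actuators (0..1).
    Embodied energy falls as material is removed; operational energy grows
    quadratically because actuators work harder and more often.
    """
    embodied = embodied_full * (1.0 - 0.6 * active_fraction)
    operational = op_scale * active_fraction ** 2
    return embodied, operational, embodied + operational

# Sweep the design space and find the minimum-energy active/passive mix.
candidates = [i / 100.0 for i in range(101)]
best = min(candidates, key=lambda f: total_energy(f)[2])
print(f"minimum total energy at active fraction ~ {best:.2f}")
```

With these made-up coefficients the optimum sits at an intermediate mix (neither fully passive nor fully active), which is exactly the point the conceptual graph makes.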
Active control has already been used in structure design, most widely for vibration control. During his doctoral work, Gennaro Senatore formulated a new methodology for designing adaptive structures. His research project was a collaboration between University College London and Expedition Engineering. As part of that project, Senatore built a large-scale prototype of an active-passive structure at the University College London structures laboratory. The resulting prototype is a 6m cantilever spatial truss with a 37.5:1 span-to-depth ratio. Here’s a photo of the large-scale prototype truss:
You can see the actuators just beneath the top surface of the truss. When the actuators are not energized, the cantilever truss flexes quite a lot with a load placed at the extreme end. However, this active system detects the load-induced flexion and compensates by energizing the actuators and stiffening the cantilever.
Here’s a photo showing the amount of flex induced by a 100kg load at the end of the cantilever without and with energized actuators:
The top half of the image shows that the truss flexes 170mm under load when the actuators are not energized, but only 2mm when the system senses the load and energizes the linear actuators.
The truss incorporates ten linear electric actuators that stiffen the truss when sensors detect a load-induced deflection. The control system for this active-passive truss consists of a National Instruments (NI) CompactRIO cRIO-9024 controller, 45 strain-gage sensors, 10 actuators, and five driver boards (one for each actuator pair). The NI cRIO-9024 controller pairs with a card cage that accepts I/O modules and incorporates a Virtex-5 FPGA for reconfigurable I/O. (That’s what the “RIO” in cRIO stands for.) In this application, the integral Virtex-5 FPGA also provides in-line processing for acquired and generated signals.
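The sense-and-stiffen loop can be sketched in a few lines. This is a generic proportional controller with hypothetical plant behavior and gains, not the actual NI/LabVIEW control law, but it shows the basic idea: sensors read the residual deflection each cycle and the actuators correct a fraction of it.

```python
# Minimal sketch of the sense-and-stiffen idea (hypothetical plant and gain,
# not the actual cRIO/LabVIEW controller): a proportional loop drives
# actuator extension to cancel load-induced tip deflection.

def simulate(load_deflection_mm, gain=0.8, steps=50):
    """Each step: sensors read residual deflection, actuators compensate."""
    actuator_mm = 0.0
    for _ in range(steps):
        residual = load_deflection_mm - actuator_mm  # strain-gage measurement
        actuator_mm += gain * residual               # proportional correction
    return load_deflection_mm - actuator_mm          # final residual deflection

residual = simulate(170.0)  # 170mm uncontrolled flex, as in the prototype photo
print(f"residual deflection: {residual:.3f} mm")
```

With any gain between 0 and 2 the loop converges, driving the 170mm uncontrolled deflection down toward the millimeter range reported for the prototype.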
The system is programmed using NI’s LabVIEW systems engineering software.
A large structure would require many such subsystems, all communicating through a network. This is clearly one very useful way to employ the IIoT in structures.
This project was a 2017 NI Engineering Impact Award Finalist in the Industrial Machinery and Control category last month at NI Week. It is documented in this NI case study, which includes many more technical details and a short video showing the truss in action as a load is applied.
National Instruments (NI) has just announced a baseband version of its 2nd-Generation PXIe VST (Vector Signal Transceiver), the PXIe-5820, with 1GHz of complex I/Q bandwidth. It’s designed to address the most challenging RF front-end module and transceiver test applications. Of course, you program it with NI’s LabVIEW system engineering software like all NI instruments and, like its RF sibling the PXIe-5840, the PXIe-5820 baseband VST is based on a Xilinx Virtex-7 690T FPGA and a chunk of the FPGA’s programmable logic is available to users for creating real-time, application-specific signal processing using LabVIEW FPGA. According to Ruan Lourens, NI’s Chief Architect of RF R&D, “The baseband VST can be tightly synchronized with the PXIe-5840 RF VST to sub-nanosecond accuracy, to offer a complete solution for RF and baseband differential I/Q testing of wireless chipsets.”
NI’s new PXIe-5820 Baseband VST
How might you use this feature? Here’s a very recent, 2-minute video demonstration of a DPD (digital predistortion) measurement application that provides a pretty good example:
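The video shows an NI measurement application; as a generic illustration of what DPD itself does (this sketch is not the NI application, and the cubic PA model and coefficient are invented for the example), a predistorter applies an approximate inverse of the amplifier's nonlinearity before the signal reaches the amplifier:

```python
# Generic digital predistortion (DPD) illustration with a hypothetical
# memoryless PA model: pre-applying an approximate inverse of the PA's
# cubic compression linearizes the overall response.

def pa(x, a3=-0.1):
    """Toy power-amplifier model: mild cubic compression."""
    return x + a3 * x ** 3

def predistort(x, a3=-0.1):
    """First-order inverse of the cubic term: y = x - a3*x^3."""
    return x - a3 * x ** 3

samples = [i / 10.0 for i in range(-10, 11)]  # test ramp in [-1, 1]
err_raw = max(abs(pa(x) - x) for x in samples)
err_dpd = max(abs(pa(predistort(x)) - x) for x in samples)
print(f"peak error without DPD: {err_raw:.4f}, with DPD: {err_dpd:.4f}")
```

Real DPD systems estimate the inverse model adaptively from measured PA output, which is exactly the kind of closed-loop measurement the VST pair is built for.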
My very good friend Jack Ganssle, the firmware expert and consultant, publishes a free semi-monthly newsletter called the Embedded Muse for engineers working on embedded systems and the latest issue, #330, contains this testimonial for Saleae and its FPGA-based logic analyzers:
“Another reader has nice things to say about Saleae. Dan Smith writes:
“I've owned Saleae logic analyzers since 2011, starting with their original (and at that time, only) logic analyzer, aptly named "Logic", which was an 8-channel unit with very good host-side software to go with it. Never had a problem with the unit or the software.
“Fast forward to a Saturday morning in 2017, where I was debugging a strange bus timing problem under a tight project deadline. I'd woken up early because I couldn't "turn my mind off" from the night before. I was using a newer, more expensive model with features I needed (analog capture, much higher bandwidth, etc.). For some reason, the unit wouldn't enumerate when I plugged it in that morning. I contacted support on a Saturday morning; to my surprise, I had a response later that day from Mark, one of the founders and also the primary architect and developer of the host-side software. After discussing the problem and my situation, they sent me a replacement unit right away, even before the old unit was returned and inspected. I'd also received excellent support a few weeks earlier getting the older Logic unit working with a strange combination of MacBook Pro, outdated version of OSX and an uncooperative USB port.
“My point is simply that even though the Saleae products -- hardware and software -- are excellent, it's their customer service that has earned my loyalty. Too often, great service and support go unmentioned; in my case, it's what saved me. And yes, I debugged the problem that weekend and met the deadline!”
Saleae engineers its compact, USB-powered logic analyzers using FPGAs like the Xilinx Spartan-6 LX16 FPGA used in its Logic Pro 8 logic analyzer/scope. (See “Jack Ganssle reviews the Saleae Logic Pro 8 logic analyzer/scope, based on a Spartan-6 FPGA” and “Compact, 8-channel, 500Msamples/sec logic analyzer relies on Spartan-6 FPGA to provide most functions.”) Although the Spartan-6 FPGA is a low-cost device, its logic and I/O programmability are a great match for the logic analyzer’s I/O and data-capture needs.
Saleae Logic Pro 8 logic analyzer/scope
MathWorks has just published a 30-minute video titled “FPGA for DSP applications: Fixed Point Made Easy.” The video targets users of the company’s MATLAB and Simulink software tools and covers fixed-point number systems, how these numbers are represented in MATLAB and in FPGAs, quantization and quantization challenges, sources of error and minimizing these errors, how to use MathWorks’ design tools to understand these concepts, implementation of fixed-point DSP algorithms on FPGAs using MathWorks’ tools, and the advantages of the Xilinx DSP48 block—which you’ll find in all Xilinx 28nm series 7, 20nm UltraScale, and 16nm UltraScale+ devices including Zynq SoCs and Zynq UltraScale+ MPSoCs.
The video also shows the development of an FIR filter using MathWorks’ fixed-point tools as an example with some useful utilization feedback that helps you optimize your design. The video also briefly shows how you can use MathWorks’ HDL Coder tool to develop efficient, single-precision, floating-point DSP hardware for Xilinx FPGAs.
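The core quantization idea the video covers can be shown in plain Python (the video itself uses MATLAB and MathWorks tooling; the Q8.8 word format below is just an example choice): a real value is scaled by 2^n, rounded to an integer, and saturated to the word length, which bounds the error at half an LSB.

```python
# Plain-Python sketch of fixed-point quantization (illustrative Q8.8 format,
# not MathWorks tooling): quantize a real value to a signed fixed-point
# integer with saturation, then measure the quantization error.

def to_fixed(value, frac_bits=8, word_bits=16):
    """Quantize to a signed fixed-point integer with saturation."""
    scale = 1 << frac_bits
    raw = round(value * scale)
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, raw))

def from_fixed(raw, frac_bits=8):
    """Convert the fixed-point integer back to a float."""
    return raw / (1 << frac_bits)

x = 0.7071067811865476        # sqrt(2)/2
q = to_fixed(x)               # Q8.8: 8 integer bits, 8 fractional bits
xq = from_fixed(q)
print(f"quantized: {xq}, error: {abs(x - xq):.6f}")  # error <= 2^-9 (half LSB)
```

Choosing the split between integer and fractional bits is exactly the range-versus-precision trade-off that the video, and the DSP48 block's fixed word widths, are about.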
By Adam Taylor
We can create very responsive design solutions using Xilinx Zynq SoC or Zynq UltraScale+ MPSoC devices, which enable us to architect systems that exploit the advantages provided by both the PS (processing system) and the PL (programmable logic) in these devices. When we work with logic designs in the PL, we can optimize performance using design techniques like pipelining and other UltraFast design methods, and we can see the results of our optimizations in simulation and in the Vivado implementation results.
When it comes to optimizing the software running on the PS's processor cores, things may appear a little more opaque. However, they are not as opaque as they seem: we can gather statistics on our running code with ease using the performance analysis capabilities built into XSDK. Using performance analysis, we can examine the performance of the software running on the processor cores, and we can monitor AXI performance within the PL to ensure that the overall design is optimized for the application at hand.
Using performance analysis, we can examine several aspects of our running code:
For those who may not be familiar with the concept, a stall occurs when the cache does not contain the requested data, which must then be fetched from main memory. While the data is fetched, the core can continue to process other instructions using out-of-order (OOO) execution; however, the processor will eventually run out of independent instructions and must wait for the data it needs. This wait is called a stall.
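The stall behavior described above can be illustrated with a toy cache model. The cycle costs and cache geometry below are hypothetical (chosen for the sketch, not real Zynq UltraScale+ MPSoC timings), but the model shows why access patterns dominate the stall count: a sequential walk misses once per cache line, while a line-sized stride misses on every access.

```python
# Toy direct-mapped cache model (hypothetical cycle costs and geometry, not
# real Zynq timings): each miss stalls the core for a main-memory fetch.

HIT_CYCLES, MISS_CYCLES = 1, 20
LINE_WORDS, CACHE_LINES = 8, 64

def run(addresses):
    """Return (hits, misses, total_cycles) for a stream of word addresses."""
    tags = [None] * CACHE_LINES
    hits = misses = cycles = 0
    for addr in addresses:
        line = addr // LINE_WORDS
        idx, tag = line % CACHE_LINES, line // CACHE_LINES
        if tags[idx] == tag:                       # hit: data already cached
            hits, cycles = hits + 1, cycles + HIT_CYCLES
        else:                                      # miss: core stalls on fetch
            tags[idx] = tag
            misses, cycles = misses + 1, cycles + MISS_CYCLES
    return hits, misses, cycles

seq = run(range(4096))                  # sequential walk: one miss per line
strided = run(range(0, 4096 * 8, 8))    # stride = line size: every access misses
print("sequential:", seq, " strided:", strided)
```

The PMU's counters report exactly this kind of hit/miss/stall breakdown for the real hardware, which is what makes the XSDK statistics actionable.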
We can gather these stall statistics thanks to the Performance Monitor Unit (PMU) contained within each of the Zynq UltraScale+ MPSoC’s CPUs. The PMU provides six profile counters, which XSDK configures and post-processes to generate the statistics above.
If we want to use the performance monitor within XSDK, we need to work with a debug build and then open the Performance Analysis perspective. If we have not done so before, we can open the perspective as shown below:
Opening the Performance Analysis Perspective
With the performance analysis perspective open, we can debug the application as normal. However, before we click on the run icon (the debugger should be set to stop at main, as default), we need to start the performance monitor. To do that, right click on the “System Debugger on Local” symbol within the performance monitor window and click start.
Starting the Performance Analysis
Then, once we execute the program, the statistics will be gathered and we can analyse them within XSDK to determine the best optimizations for our code.
To demonstrate how we can use this technique to deliver a more optimized system, I have created a design that runs on the ZedBoard and performs AES256 encryption on 1024 packets of information. When this code was run on the ZedBoard, the following execution statistics were collected:
So far, these performance statistics only look at code executing on the PS itself. Next time, we will look at how we can use the AXI Performance Monitor with XSDK. If we wish to do this, we need to first instrument the design in Vivado.
Code is available on Github as always.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
Linc, Perrone Robotics’ autonomous Lincoln MKZ automobile, took a drive around the Perrone paddock at the TU Automotive autonomous vehicle show in Detroit last week and Dan Isaacs, Xilinx’s Director of Connected Systems in Corporate Marketing, was there to shoot photos and video. Perrone’s Linc test vehicle operates autonomously using the company’s MAX (Mobile Autonomous X), a “comprehensive full-stack, modular, real-time capable, customizable, robotics software platform for autonomous (self-driving) vehicles and general purpose robotics.” MAX runs on multiple computing platforms including one based on an Iveia controller, which is based on an Iveia Atlas SOM, which in turn is based on a Xilinx Zynq UltraScale+ MPSoC. The Zynq UltraScale+ MPSoC handles the avalanche of data streaming from the vehicle’s many sensors to ensure that the car travels the appropriate path and avoids hitting things like people, walls and fences, and other vehicles. That’s all pretty important when the car is driving itself in public. (For more information about Perrone Robotics’ MAX, see “Perrone Robotics builds [Self-Driving] Hot Rod Lincoln with its MAX platform, on a Zynq UltraScale+ MPSoC.”)
Here’s a photo of Perrone’s sensored-up Linc autonomous automobile in the Perrone Robotics paddock at TU Automotive in Detroit:
And here’s a photo of the Iveia control box with the Zynq UltraScale+ MPSoC inside, running Perrone’s MAX autonomous-driving software platform. (Note the controller’s small size and lack of a cooling fan):
Opinions about the feasibility of autonomous vehicles are one thing. Seeing the Lincoln MKZ’s 3800 pounds of glass, steel, rubber, and plastic being controlled entirely by a little silver box in the trunk, that’s something entirely different. So here’s the video that shows Perrone Robotics’ Linc in action, driving around the relative safety of the paddock while avoiding the fences, pedestrians, and other vehicles:
If you’re designing next-generation avionics systems, you may be facing some challenges:
Do these sound like your challenges? Want some help? Check out this June 20 Webinar.
When someone asks where Xilinx All Programmable devices are used, I find it a hard question to answer because there’s such a very wide range of applications—as demonstrated by the thousands of Xcell Daily blog posts I’ve written over the past several years.
Now, there’s a 5-minute “Powered by Xilinx” video with clips from several companies using Xilinx devices for applications including:
That’s a huge range covered in just five minutes.
Here’s the video:
Signal Integrity Journal just published a new article titled “Addressing the 5G Challenge with Highly Integrated RFSoC,” written by four Xilinx authors. The article discusses some potential uses for Xilinx RFSoC technology, announced in February. (See “Xilinx announces RFSoC with 4Gsamples/sec ADCs and 6.4Gsamples/sec DACs for 5G, other apps. When we say “All Programmable,” we mean it!”)
Cutting to the chase of this 2600-word article, the Xilinx RFSoC is going to save you a ton of power and make it easier for you to achieve your performance goals for 5G and many other advanced, mixed-signal system designs.
If you’re involved in the design of a system like that, you really should read the article.
Light Reading’s International Group Editor Ray Le Maistre recently interviewed David Levi, CEO of Ethernity Networks, who discusses the company’s FPGA-based All Programmable ACE-NIC, a Network Interface Controller with 40Gbps throughput. The carrier-grade ACE-NIC accelerates vEPC (virtual Evolved Packet Core, a framework for virtualizing the functions required to converge voice and data on 4G LTE networks) and vCPE (virtual Customer Premise Equipment, a way to deliver routing, firewall security and virtual private network connectivity services using software rather than dedicated hardware) applications by 50x, dramatically reducing end-to-end latency associated with NFV platforms. Ethernity’s ACE-NIC is based on a Xilinx Kintex-7 FPGA.
“The world is crazy about our solution—it’s amazing,” says Levi in the Light Reading video interview.
Ethernity Networks All Programmable ACE-NIC
Because Ethernity implements its NIC IP in a Kintex-7 FPGA, it was natural for Le Maistre to ask Levi when his company would migrate to an ASIC. Levi’s answer surprised him:
“We offer a game changer... We invested in technology—which is covered by patents—that consumes 80% less logic than competitors. So essentially, a solution that you may want to deliver without our patents will cost five times more on FPGA… With this kind of solution, we succeed over the years in competing with off-the-shelf components… with the all-programmable NIC, operators enjoy the full programmability and flexibility at an affordable price, which is comparable to a rigid, non-programmable ASIC solution.”
In other words, Ethernity plans to stay with All Programmable devices for its products. In fact, Ethernity Networks announced last year that it had successfully synthesized its carrier-grade switch/router IP for the Xilinx Zynq UltraScale+ MPSoC and that throughput increases to 60Gbps per IP core with the 16nm device—and to 120Gbps with two instances of that core. “We are going to use this solution for novel SDN/NFV market products, including embedded SR-IOV (single-root input/output virtualization), and for high-density port solutions,” said Levi.
Towards the end of the video interview, Levi looks even further into the future when he discusses Amazon Web Services’ (AWS’) recent support of FPGA acceleration. (That’s the Amazon EC2 F1 compute instance based on Xilinx Virtex UltraScale+ FPGAs rolled out earlier this year.) Because it’s already based on Xilinx All Programmable devices, Ethernity’s networking IP runs on the Amazon EC2 F1 instance. “It’s an amazing opportunity for the company [Ethernity],” said Levi. (Try doing that in an ASIC.)
Here’s the Light Reading video interview:
With LED automotive lighting now becoming commonplace, newer automobiles can communicate with each other and with roadside infrastructure by quickly flashing their lights (LiFi) instead of using radio protocols. Researchers at OKATEM—the Centre of Excellence in Optical Wireless Communication Technologies at Ozyegin University in Turkey—have developed an OFDM-based LiFi demonstrator for V2V (vehicle-to-vehicle) and V2I (vehicle-to-infrastructure) applications that has achieved 50Mbps communication between vehicles as far apart as 70m in a lab atmospheric emulator.
Inside the OKATEM LiFi Atmospheric Emulator
The demo system is based on PXIe equipment from National Instruments (NI) including FlexRIO FPGA modules. (NI’s PXIe FlexRIO modules are based on Xilinx Virtex-5 and Virtex-7 FPGAs.) The FlexRIO modules implement the LiFi OFDM protocols including channel coding, 4-QAM modulation, and an N-point IFFT. Here’s a diagram of the setup:
Researchers developed the LiFi system using NI’s LabVIEW system engineering software. Initial tests demonstrated a data rate of 50Mbps with as much as 70m between two cars, depending on the photodetectors’ location on the car (particularly their height above ground level). Further work will try to improve total system performance by integrating advanced capabilities such as multiple-input, multiple-output (MIMO) communication and link adaptation on top of the OFDM architecture.
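The OFDM signal chain the FlexRIO modules implement (4-QAM mapping, N-point IFFT, cyclic prefix) can be sketched end to end in plain Python. This is an illustration of the technique only, not the OKATEM/LabVIEW implementation, and the tiny subcarrier count and prefix length are chosen for readability.

```python
# Minimal OFDM round trip (illustrative sizes, not the OKATEM system):
# Gray-coded 4-QAM mapping, N-point IFFT, cyclic prefix, then the receiver
# strips the prefix, takes the FFT, and demaps bits by symbol sign.
import cmath

N, CP = 8, 2  # subcarriers and cyclic-prefix length (illustrative)

def qam4(bits):
    """Map bit pairs to Gray-coded 4-QAM (QPSK) symbols."""
    m = {(0, 0): 1 + 1j, (1, 0): -1 + 1j, (1, 1): -1 - 1j, (0, 1): 1 - 1j}
    return [m[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def idft(X):  # N-point inverse DFT (direct form, fine for N = 8)
    return [sum(x * cmath.exp(2j * cmath.pi * k * n / N)
                for k, x in enumerate(X)) / N for n in range(N)]

def dft(x):   # N-point forward DFT
    return [sum(v * cmath.exp(-2j * cmath.pi * k * n / N)
                for n, v in enumerate(x)) for k in range(N)]

bits = [0, 0, 0, 1, 1, 1, 1, 0] * 2      # 16 bits -> 8 subcarriers
tx = idft(qam4(bits))                     # frequency domain -> time domain
tx = tx[-CP:] + tx                        # prepend cyclic prefix
rx = dft(tx[CP:])                         # receiver: strip CP, back to freq domain
demapped = [b for s in rx for b in (int(s.real < 0), int(s.imag < 0))]
print("bits recovered:", demapped == bits)
```

A real link would add the channel coding mentioned above plus equalization for the optical channel; the cyclic prefix is what makes that per-subcarrier equalization possible.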
This project was a 2017 NI Engineering Impact Award Winner in the RF and Mobile Communications category last month at NI Week. It is documented in this NI case study.