We have detected your current browser version is not the latest one. Xilinx.com uses the latest web technologies to bring you the best online experience possible. Please upgrade to a Xilinx.com supported browser:Chrome, Firefox, Internet Explorer 11, Safari. Thank you!

Compute Acceleration: GPU or FPGA? New White Paper gives you numbers

by Xilinx Employee ‎06-14-2017 02:24 PM - edited ‎06-14-2017 02:28 PM (3,431 Views)


Cloud computing and application acceleration for a variety of workloads including big-data analytics, machine learning, video and image processing, and genomics are big data-center topics and if you’re one of those people looking for acceleration guidance, read on. If you’re looking to accelerate compute-intensive applications such as automated driving and ADAS or local video processing and sensor fusion, this blog post’s for you to. The basic problem here is that CPUs are too slow and they burn too much power. You may have one or both of these challenges. If so, you may be considering a GPU or an FPGA as an accelerator in your design.


How to choose?


Although GPUs started as graphics accelerators, primarily for gamers, a few architectural tweaks and a ton of software have made them suitable as general-purpose compute accelerators. With the right software tools, it’s not too difficult to recode and recompile a program to run on a GPU instead of a CPU. With some experience, you’ll find that GPUs are not great for every application workload. Certain computations such as sparse matrix math don’t map onto GPUs well. One big issue with GPUs is power consumption. GPUs aimed at server acceleration in a data-center environment may burn hundreds of watts.


With FPGAs, you can build any sort of compute engine you want with excellent performance/power numbers. You can optimize an FPGA-based accelerator for one task, run that task, and then reconfigure the FPGA if needed for an entirely different application. The amount of computing power you can bring to bear on a problem is scary big. A Virtex UltraScale+ VU13P FPGA can deliver 38.3 INT8 TOPS (that’s tera operations per second) and if you can binarize the application, which is possible with some neural networks, you can hit 500TOPS. That’s why you now see big data-center operators like Baidu and Amazon putting Xilinx-based FPGA accelerator cards into their server farms. That’s also why you see Xilinx offering high-level acceleration programming tools like SDAccel to help you develop compute accelerators using Xilinx All Programmable devices.


For more information about the use of Xilinx devices in such applications including a detailed look at operational efficiency, there’s a new 17-page White Paper titled “Xilinx All Programmable Devices: A Superior Platform for Compute-Intensive Systems.”






There’s considerable 5G experimentation taking place as the radio standards have not yet gelled and researchers are looking to optimize every aspect. SDRs (software-defined radios) are excellent experimental tools for such research—NI’s (National Instruments’) SDR products especially so because, as the Wireless Communication Research Laboratory at Istanbul Technical University discovered:


“NI SDR products helped us achieve our project goals faster and with fewer complexities due to reusability, existing examples, and the mature community. We had access to documentation around the examples, ready-to-run conceptual examples, and courseware and lab materials around the grounding wireless communication topics through the NI ecosystem. We took advantage of the graphical nature of LabVIEW to combine existing blocks of algorithms more easily compared to text-based options.”


Researchers at the Wireless Communication Research Laboratory were experimenting with UFMC (universal filtered multicarrier) modulation, a leading modulation candidate technique for 5G communications. Although current communication standards frequently use OFDM (orthogonal frequency-division multiplexing), it is not considered to be a suitable modulation technique for 5G systems due to its tight synchronization requirements, inefficient spectral properties (such as high spectral side-lobe levels), and cyclic prefix (CP) overhead. UFMC has relatively relaxed synchronization requirements.


The research team at the Wireless Communication Research Laboratory implemented UFMC modulation using two USRP-2921 SDRs, a PXI-6683H timing module, and a PXIe-5644R VST (Vector signal Transceiver) module from National Instruments (NI)–and all programmed with NI’s LabVIEW systems engineering software. Using this equipment, they achieved better spectral results over the OFDM usage and, by exploiting UFMC’s sub-band filtering approach, they’ve proposed enhanced versions of UFMC. Details are available in the NI case study titled “Using NI Software Defined Radio Solutions as a Testbed of 5G Waveform Research.” This project was a finalist in the 2017 NI Engineering Impact Awards, RF and Mobile Communications category, held last month in Austin as part of NI Week.



5G UFMC Modulation Testbed.jpg 



5G UFMC Modulation Testbed based on Equipment from National Instruments



Note: NI’s USRP-2921 SDR is based on a Xilinx Spartan-6 FPGA; the NI PXI-6683 timing module is based on a Xilinx Virtex-5 FPGA; and the PXIe-5644R VST is based on a Xilinx Virtex-6 FPGA.







Although humans once served as the final inspectors for pcbs, today’s component dimensions and manufacturing volumes mandate the use of camera-based automated optical inspection (AOI) systems. Amfax has developed a 3D AOI system—the a3Di—that uses two lasers to make millions of 3D measurements with better than 3μm accuracy. One of the company’s customers uses an a3Di system to inspect 18,000 assembled pcbs per day.


The a3Di control system is based on a National Instruments (NI) cRIO-9075 CompactRIO controller—with an integrated Xilinx Virtex-5 LX25 FPGA—programmed with NI’s LabVIEW systems engineering software. The controller manages all aspects of the a3Di AOI system including monitoring and control of:



  • Machine motors
  • Control switches
  • Optical position sensors
  • Inverters
  • Up and downstream SMEMA (Surface Mount Equipment Manufacturers Association) conveyor control
  • Light tower
  • Pneumatics
  • Operator manual controls for width PCB control
  • System emergency stop



The system provides height-graded images like this:




Amfax 3D PCB image.jpg 


3D Image of a3Di’s Measurement Data: Colors represent height, with Z resolution down to less than a micron. The blue section at the top indicates signs of board warp. Laser etched component information appears on some of the ICs.




The a3Di system then compares this image against a stored golden reference image to detect manufacturing defects.


Amfax says that it has found the CompactRIO system to be “CompactRIO system has proven to be a dependable, reliable, and cost-effective.” In addition, the company found it could get far better timing resolution with the CompactRIO system than the 1msec resolution usually provided by PLC controllers.



This project was a 2017 NI Engineering Impact Award Finalist in the Electronics and Semiconductor category last month at NI Week. It is documented in this NI case study.


HIL simulator based on NI equipment allows Hyundai to cut marine diesel development from 3 years to one

by Xilinx Employee ‎06-13-2017 12:09 PM - edited ‎06-13-2017 12:11 PM (1,324 Views)


Hyundai Heavy Industries (HHI) is the world’s foremost shipbuilding company and the company’s Engine and Machinery Division (HHI-EMD) is the world’s largest marine diesel engine builder. HHI’s HiMSEN medium-sized engines are four-stroke diesels with output power ranging from 960kW to 25MW. These engines power electric generators on large ships and serve as the propulsion engine on medium and small ships. HHI-EMD is always developing newer, more fuel-efficient engines because the fuel costs for these large diesels runs about $2000/hour. Better fuel efficiency will significantly reduce operating costs and emissions.


For that research, HHI-EMD developed monitoring and diagnostic equipment to better understand engine combustion performance and an HIL system to test new engine controller designs. The test and HIL systems are based on equipment from National Instruments (NI).


Engine instrumentation must be able to monitor 10-cylinder engines running at thousands of RPM while measuring crankshaft angle to 0.1 degree of resolution. From that information, the engine test and monitoring system calculates in-cylinder peak pressure, mean effective pressure, and cycle-to-cycle pressure variation. All this must happen every 10 μ sec for each cylinder.


HHI-EMD elected to use an NI cRIO-9035 Controller, which incorporates a Xilinx Kintex-7 70T FPGA, to serve as the platform for developing its HiCAS test and data-acquisition system. The HiCAS system monitors all aspects of the engine under test including engine speed, in-cylinder pressure, and pressures in the intake and exhaust systems. This data helped HHI-EMD engineers analyze the engine’s overall performance and the performance of key parts using thermodynamic analysis. HiCAS provides real-time analysis of dynamic data including:


  • In-cylinder peak pressure
  • Indicated mean effective pressure and cycle-to-cycle variation
  • Cylinder-to-cylinder distribution
  • Cyclic moving parts fault diagnosis


Using the collected data, the engineering team then developed a model of the diesel engine, resulting in the development of an HMI system used to exercise the engine controllers. This engine model runs in real time on an NI PXI system synchronized with the high-speed signal-sensor simulation software running on the PXI system’s multifunction FPGA-based FlexRIO module. The HMI system transmits signals to the engine controllers, simulating an operating engine and eliminating the operating costs of a large diesel engine during these tests. HHI-EMD credits the FPGAs in these systems for making the calculations run fast enough for real-time simulation. The simulated engine also permits fault testing without the risk of damaging an actual engine. Of course, all of this is programmed using NI’s LabVIEW systems engineering software and LabVIEW FPGA.





HHI-EMD HIL Simulator for Marine Diesel Engines.jpg 



HHI-EMD HIL Simulator for Marine Diesel Engines




According to HHI-EMD, development of the HiCAS engine-monitoring system and virtual verification based on the HIL system shortened development time from more than three years to one, significantly accelerating the time-to-market for HHI-EMD’s more eco-friendly marine diesel engines.




This project was a 2017 NI Engineering Impact Award Finalist in the Transportation and Heavy Equipment category last month at NI Week and won the 2017 HPE Edgeline Big Analog Data Award. It is documented in this NI case study.










Chang Guang Satellite Technology, China’s first commercial remote sensing satellite company, develops and operates the JILIN-1 high-resolution remote-sensing satellite series, which has pioneered the application of commercial satellites in China. The company contemplates putting 60 satellites in orbit by 2020 and 138 satellites in orbit by 2030. Achieving that goal is going to take a lot of testing and testing consumes about 70% of the development cycle for space-based systems. So Chang Guang Satellite Technology knew it would need to automate its test systems and turned to National Instruments (NI) for assistance. The resulting automated test system has three core test systems using products from NI:


  • An S-band ground monitoring system, based mainly on NI’s 1st-generation PXI RF Vector Signal Transceiver (VST) and an NI FlexRIO module
  • An on-orbit satellite dynamic model hardware-in-the-loop (HIL) system with a sub-1msec closed-loop period, 100x better than a conventional test system design
  • A GPS simulator using an FPGA-based FlexRIO module to simulate high-dynamic, in-orbit satellite navigation signals



Chang Guang Test System.jpg


A Chang Guang Satellite Technology test system based on NI’s 1st-generation VST and FlexRIO PXIe modules



Here’s a sample image from the company’s growing satellite imaging portfolio:




Disneyland from Space.jpg 



Shanghai Disneyland as viewed from space





NI’s VSTs and FlexRIO modules are all based on multiple generations of Xilinx FPGAs. The company’s 2nd-generation VSTs are based on Virtex-7 FPGAs and its latest FlexRIO modules are based on Kintex-7 FPGAs.




This project was a 2017 NI Engineering Impact Award Finalist in the Aerospace and Defense category last month at NI Week. It is documented in this NI case study.




For more information about NI’s VST family, see:








Avnet has formally introduced its MiniZed dev board based on the Xilinx Zynq Z-7000S SoC with the low, low price of just $89. For this, you get a Zynq Z-7007S SoC with one ARM Cortex-A9 processor core, 512Mbytes of DDR3L SDRAM, 128Mbits of QSPI Flash, 8Gbytes of eMMC Flash memory, WiFi 802.11 b/g/n, and Bluetooth 4.1. The MiniZed board incorporates an Arduino-compatible shield interface, two Pmod connectors, and a USB 2.0 host interface for fast peripheral expansion. You’ll also find an ST Microelectronics LIS2DS12 Motion and temperature sensor and an MP34DT05 Digital Microphone on the board. This is a low-cost dev board that packs the punch of a fast ARM Cortex-A9 processor, programmable logic, a dual-wireless communications system, and easy system expandability.


I find the software that accompanies the board equally interesting. According to the MiniZed Product Brief, the $89 price includes a voucher for an SDSoC license so you can program the programmable logic on the Zynq SoC using C or C++ in addition to Verilog or VHDL using Vivado. This is a terrific deal on a Zynq dev board, whether you’re a novice or an experienced Xilinx user.


Avnet’s announcement says that the board will start shipping in early July.


Stefan Rousseau, senior technical marketing engineer for Avnet, said, “Whether customers are developing a Linux-based system or have a simple bare metal implementation, with MiniZed, Zynq-7000 development has never been easier. Designers need only connect to their laptops with a single micro-USB cable and they are up and running. And with Bluetooth or Wi-Fi, users can also connect wirelessly, transforming a mobile phone or tablet into an on-the-go GUI.”




Here’s a photo of the MiniZed Dev board:



Avnet MiniZed 3.jpg 


Avnet’s $89 MiniZed Dev Board based on a Xilinx Zynq Z-7007S SoC



And here’s a block diagram of the board:



MiniZed Block Diagram.jpg 


Avnet’s $89 MiniZed Dev Board Block Diagram


Many engineers in Canada wear the Iron Ring on their finger, presented to engineering graduates as a symbolic, daily reminder that they have an obligation not to design structures or other artifacts that fail catastrophically. (Legend has it that the iron in the ring comes from the first Quebec Bridge—which collapsed during its construction in 1907—but the legend appears to be untrue.) All engineers, whether wearing the Canadian Iron Ring or not, feel an obligation to develop products that do not fail dangerously. For buildings and other civil engineering works, that usually means designing structures with healthy design margins even for worst-case projected loading. However, many structures encounter worst-case loads infrequently or never. For example, a sports stadium experiences maximum loading for perhaps 20 or 30 days per year, for only a few hours at a time when it fills with sports fans. The rest of the time, the building is empty and the materials used to ensure that the structure can handle those loads are not needed to maintain structural integrity.


The total energy consumed by a structure over its lifetime is a combination of the energy needed to mine and fabricate the building materials and to build the structure (embodied energy) and the energy needed to operate the building (operational energy). The resulting energy curve looks something like this:




Embodied versus Operational Energy for a Structure.jpg



For completely passive structures, which describes most structures built over the past several thousand years, embodied energy dominates the total consumed energy because structural members must be designed to bear the full design load at all times. Alternatively, a smart structure with actuators that stiffen the structure only when needed will require more operational energy but the total required embodied energy will be smaller. Looking at the above conceptual graph, a well-designed active-passive system minimizes the total required energy for the structure.


Active control has already been used in structure design, most widely for vibration control. During his doctorate work, Gennaro Senatore formulated a new methodology to design adaptive structures. His research project was a collaboration between the University College London and Expedition Engineering. As part of that project, Senatore built a large scale prototype of an active-passive structure at the University College London structures laboratory. The resulting prototype is a 6m cantilever spatial truss with a 37.5:1 span-to-depth ratio. Here’s a photo of the large-scale prototype truss:



Active-Passive Cantilever Truss.jpg



You can see the actuators just beneath the top surface of the truss. When the actuators are not energized, the cantilever truss flexes quite a lot with a load placed at the extreme end. However, this active system detects the load-induced flexion and compensates by energizing the actuators and stiffening the cantilever.


Here’s a photo showing the amount of flex induced by a 100kg load at the end of the cantilever without and with energized actuators:




Cantilever Flexion.jpg 



The top half of the image shows that the truss flexes 170mm under load when the actuators are not energized, but only 2mm when the system senses the load and energizes the linear actuators.


The truss incorporates ten linear electric actuators that stiffen the truss when sensors detect a load-induced deflection. The control system for this active-passive truss consists of a National Instruments (NI) CompactRIO cRIO-9024 controller, 45 strain-gage sensors, 10 actuators, and five driver boards (one for each actuator pair.) The NI cRIO-9024 controller pairs with a card cage that accepts I/O modules and incorporates a Virtex-5 FPGA for reconfigurable I/O. (That’s what the “RIO” in cRIO stands for.) In this application, the integral Virtex-5 FPGA also provides in-line processing for acquired and generated signals.

The system is programmed using NI’s LabVIEW systems engineering software.


A large structure would require many such subsystems, all communicating through a network. This is clearly one very useful way to employ the IIoT in structures.



This project was a 2017 NI Engineering Impact Award Finalist in the Industrial Machinery and Control category last month at NI Week. It is documented in this NI case study, which includes many more technical details and a short video showing the truss in action as a load is applied.





National Instruments (NI) has just announced a baseband version of its 2nd-Generation PXIe VST (Vector Signal Transceiver), the PXIe-5820, with 1GHz of complex I/Q bandwidth. It’s designed to address the most challenging RF front-end module and transceiver test applications. Of course, you program it with NI’s LabVIEW system engineering software like all NI instruments and, like its RF sibling the PXIe-5840, the PXIe-5820 baseband VST is based on a Xilinx Virtex-7 690T FPGA and a chunk of the FPGA’s programmable logic is available to users for creating real-time, application-specific signal processing using LabVIEW FPGA. According to Ruan Lourens, NI’s Chief Architect of RF R&D, “The baseband VST can be tightly synchronized with the PXIe-5840 RF VST to sub-nanosecond accuracy, to offer a complete solution for RF and baseband differential I/Q testing of wireless chipsets.”




NI PXIe-5820 Baseband VST.jpg 


NI’s new PXIe-5820 Baseband VST





How might you use this feature? Here’s a very recent, 2-minute video demonstration of a DPD (digital predistortion) measurement application that provides a pretty good example:






MathWorks has just published a 30-minute video titled “FPGA for DSP applications: Fixed Point Made Easy.” The video targets users of the company’s MATLAB and Simulink software tools and covers fixed-point number systems, how these numbers are represented in MATLAB and in FPGAs, quantization and quantization challenges, sources of error and minimizing these errors, how to use MathWorks’ design tools to understand these concepts, implementation of fixed-point DSP algorithms on FPGAs using MathWorks’ tools, and the advantages of the Xilinx DSP48 block—which you’ll find in all Xilinx 28nm series 7, 20nm UltraScale, and 16nm UltraScale+ devices including Zynq SoCs and Zynq UltraScale+ MPSoCs.


The video also shows the development of an FIR filter using MathWorks’ fixed-point tools as an example with some useful utilization feedback that helps you optimize your design. The video also briefly shows how you can use MathWorks’ HDL Coder tool to develop efficient, single-precision, floating-point DSP hardware for Xilinx FPGAs.






When someone asks where Xilinx All Programmable devices are used, I find it a hard question to answer because there’s such a very wide range of applications—as demonstrated by the thousands of Xcell Daily blog posts I’ve written over the past several years.


Now, there’s a 5-minute “Powered by Xilinx” video with clips from several companies using Xilinx devices for applications including:


  • Machine learning for manufacturing
  • Cloud acceleration
  • Autonomous cars, drones, and robots
  • Real-time 4K, UHD, and 8K video and image processing
  • VR and AR
  • High-speed networking by RF, LED-based free-air optics, and fiber
  • Cybersecurity for IIoT


That’s a huge range covered in just five minutes.


Here’s the video:






With LED automotive lighting now becoming commonplace, newer automobiles have the ability to communicate with each other (V2V communications) and with roadside infrastructure by quickly flashing their lights (LiFi) instead of using radio protocols. Researchers at OKATEM—the Centre of Excellence in Optical Wireless Communication Technologies at Ozyegin University in Turkey—have developed an OFDM-based LiFi demonstrator for V2V (vehicle-to-vehicle) and V2I (vehicle-to-infrastructure) applications that has achieved 50Mbps communications between vehicles as far apart as 70m in a lab atmospheric emulator.




Inside the OKATEM LiFi Atmospheric Chamber.jpg 


Inside the OKATEM LiFi Atmospheric Emulator



The demo system is based on PXIe equipment from National Instruments (NI) including FlexRIO FPGA modules. (NI’s PXIe FlexRIO modules are based on Xilinx Virtex-5 and Virtex-7 FPGAs.) The FlexRIO modules implement the LiFi OFDM protocols including channel coding, 4-QAM modulation, and an N-IFFT. Here’s a diagram of the setup:



LiFi V2V Communications Block Diagram.jpg 




Researchers developed the LiFi system using NI’s LabVIEW and LabVIEW system engineering software. Initial LiFi system performance demonstrated a data rate of 50 Mbps with as much as 70m between two cars, depending on the photodetectors’ location in the car (particularly its height above ground level). Further work will try to improve the total system performance by integrating advanced capabilities such as multiple-input, multiple-output (MIMO) communication and link adaptation on the top of OFDM architecture.




This project was a 2017 NI Engineering Impact Award Winner in the RF and Mobile Communications category last month at NI Week. It is documented in this NI case study.


A wide range of commercial, government, and social applications require precise aerial imaging. These application range from the management of high-profile, international-scale humanitarian and disaster relief programs to everyday commercial use—siting large photovoltaic arrays for example. Satellites can capture geospatial imagery across entire continents, often at the expense of spatial resolution. Satellites also lack the flexibility to image specific areas on demand. You must wait until the satellite is above the real estate of interest. Spookfish Limited in Australia along with ICON Technologies have developed the Spookfish Airborne Imaging Platform (SAIP) based on COTS (commercial off-the-shelf) products including National Instruments’ (NI’s) PXIe modules and LabVIEW systems engineering software that can capture precise images with resolutions of 6cm/pixel to better than 1cm/pixel from a light aircraft cruising at 160 knots at altitudes to 12,000 feet.


The 1st-generation SAIP employs one or more cameras installed in a tube attached to the belly of a light aircraft. Success with the initial prototype led to the development of a 2nd-generation design with two camera tubes. The system has continued to grow and now accommodates as many as three camera tubes with as many as four cameras per tube.


The multiple cameras must be steered precisely in continuous, synchronized motion while recording camera angles, platform orientation, and platform acceleration. All of this data is used to post-process the image data. At typical operating altitudes and speeds, the cameras must be steered with millidegree precision and the camera angles and platform position must be logged with near-microsecond accuracy and precision. Spookfish then uses a suite of open-source and proprietary computer-vision and photogrammetry techniques to process the imagery, which results in orthophotos, elevation data, and 3D models.


Here’s a block diagram of the Spookfish SAIP:



Spookfish SAIP Block diagram.jpg 




The NI PXIe system in the SAIP design consists of a PXIe-1082DC chassis, a PXIe-8135 RT controller, a PXI-6683H GPS/PPS synchronization module, a PXIe-6674T clock and timing module, a PXIe-7971R FlexRIO FPGA Module, and a PXIe-4464 sound and vibration module. (The PXIe7971R FlexRIO module is based on a Xilinx Kintex-7 325T FPGA. The PXI-6683H synchronization module and the PXIe-6674T clock and timing module are both based on Xilinx Virtex-5 FPGAs.)


Here’s an aerial image captured by an SAIP system at 6cm/pixel:



Spookfish SAIP image at 6cm per pixel.jpg 



And here’s a piece of an aerial image taken by an SAIP system at 1.5cm/pixel:



Spookfish SAIP image at 6cm per pixel.jpg 




During its multi-generation development, the SAIP system quickly evolved far beyond its originally envisioned performance specification as new requirements arose. For example, initial expectations were that logged data would only need to be tagged with millisecond accuracy. However, as the project progressed, ICON Technologies and NI improved the system’s timing accuracy and precision by three orders of magnitude.


NI’s FPGA-based FlexRIO technology was also crucial in meeting some of these shifting performance targets. Changing requirements pushed the limits of some of the COTS interfaces, so custom FlexRIO interface implementations optimized for the tasks were developed as higher-speed replacements. Often, NI’s FlexRIO technology is employed for the high-speed computation available in the FPGA’s DSP slices, but in this case it was the high-speed programmable I/O that was needed.


Spookfish and ICON Technologies are now developing the next-generation SAIP system. Now that the requirements are well understood, they’re considering a Xilinx FPGA-based or Zynq-based NI CompactRIO controller as a replacement for the PXIe system. NI’s addition of TSN (time-sensitive networking) to the CompactRIO family’s repertoire makes such a switch possible. (For more information about NI’s TSN capabilities, see “IOT and TSN: Baby you can drive my [slot] car. TSN Ethernet network drives slot cars through obstacles at NI Week.”)




This project was a 2017 NI Engineering Impact Award finalist in the Energy category last month at NI Week. It is documented in this NI case study.


There’s only one problem with the deuterium gas plasma inside of a fusion reactor: it’s hot, really hot! It must be 10 million ˚F (20 million ˚C) hot—hotter than the sun’s surface—if you want to achieve fusion. If this hot plasma touches the relatively cold sides of the reaction vessel, the plasma vanishes. So, you need to confine the plasma tightly in a magnetic field so that it doesn’t escape. You need to do that for long time periods if you want a fusion reaction that reliably produces power, which is after all the objective. How long is a long time?


Many minutes.


Researchers working on the Large Helical Device (LHD), a superconducting stellerator (a form of plasma fusion reactor) project initiated by Japan’s National Institute for Fusion Science to conduct fusion-plasma confinement research in a steady-state machine, have developed an advanced control system based on National Instruments (NI) CompactRIO embedded controller programmed in NI’s LabVIEW and LabVIEW FPGA to keep the plasma confined and hot inside of the reactor.





Interior of the LHD



Stabilizing the plasma inside of the LHD requires real-time control of high-energy heating, magnetic fields generated by superconducting electromagnets, and deuterium gas injection based on observed information such as plasma density, temperature, and optical emission. The heating is supplied by 30kV power lines, so control-system mistakes can have catastrophic consequences. In the past, LHD experiments required two or three operators for complex monitoring and response.


Here’s a “simplified” diagram of the LHD’s plasma control system:



LHD Control Diagram.jpg 




With the NI CompactRIO controller, bolstered by the high performance of its internal Xilinx FPGA (all of NI’s CompactRIO controllers are based on Xilinx FPGAs or the Zynq SoC), the LHD control system sustained a high-performance plasma for more than 48 minutes with a total injected energy was 3.4GJ (that’s GigaJoules). The 48-minute duration for the sustained plasma sets a record that bests the previous record set more than a decade previously by more than 3x.


On March 7, 2017, LHD ignited its first deuterium plasma.


This amazing project was a 2017 NI Engineering Impact Award finalist in the Energy category last month at NI Week and won the 2017 Engineering Grand Challenges Award. It is documented in this NI case study.



Medium- and heavy-duty fleet vehicles account for a mere 4% of the vehicles in use today but they consume 40% of the fuel used in urban environments, so they are cost-effective targets for innovations that can significantly improve fuel economy. Lightning Systems (formerly Lightning Hybrids) has developed a patented hydraulic hybrid power-train system called ERS (Energy Recovery System) that can be retrofitted to new or existing fleet vehicles including delivery trucks and shuttle buses. This hybrid system can reduce fleet fuel consumption by 20% and decrease NOx emissions (the key component of smog) by as much as 50%! In addition to being a terrific story about energy conservation and pollution control, the development of the ERS system tells a great story about using National Instruments’ (NI’s) comprehensive line of LabVIEW-compatible CompactRIO (cRIO) and Single-Board RIO (sbRIO) controllers to develop embedded controllers destined for production.


Like an electric hybrid vehicle power train, an ERS-enhanced power train recovers energy during vehicle braking and adds that energy back into the power train during acceleration. However, Lightning Systems’ ERS stores the energy using hydraulics instead of electricity.


Here are the components of the ERS retrofit system, shown installed in series with a power train’s drive shaft:




Lightning Hybrids ERS Diagram.jpg 



Major components in the Lightning Systems ERS Hybrid Retrofit System




The power-transfer module (PTM) in the above image drives the hydraulic pump/motor during vehicle braking, pumping hydraulic fluid into the high- and low-pressure accumulator tanks, which act like mechanical batteries that store energy in tanks pressurized by nitrogen-filled bladders. When the vehicle accelerates, the pump/motor operates as a motor driven by the pressurized hydraulic fluid’s energy stored in the accumulators. The hydraulic motor puts energy back into the vehicle’s drive train through the PTM. A valve manifold controls the filling and emptying of the accumulator tanks during vehicle operation and all of the ERS control sequencing is handled by a National Instruments (NI) RIO controller programmed using NI’s LabVIEW system development software. All of NI’s Compact and Single-Board RIO controllers incorporate a Xilinx FPGA or a Xilinx Zynq SoC to provide real-time control of closed-loop systems.


Lightning Systems has developed four generations of ERS controllers based on NI’s CompactRIO and Single-Board RIO controllers. The company based its first ERS prototype controller on an 8-slot NI CRIO-9024 controller and deployed the design in pilot systems. A 2nd-generation ERS prototype controller used a 4-slot NI cRIO-9075 controller, which incorporates a Xilinx Spartan-6 LX25 FPGA. The 3rd-generation ERS controller used an NI sbRIO-9626 paired with a custom daughterboard. The sbRIO-9626 incorporates a larger Xilinx Spartan-6 LX45 FPGA and Lightning Systems fielded approximately 100 of these 3rd-generation ERS controllers.




Lightning Hybrids v2 v3 v4 ERS Controllers.jpg 


Three generations of Lightning Systems’ ERS controller (from left to right: v2, v3, and v4) based on

National Instruments' Compact RIO and Single-Board RIO controllers




For its 4th-generation ERS controller, the company is using NI’s sbRIO-9651 single-board RIO SOM (system on module), which is based on a Xilinx Zynq Z-7020 SoC. The SOM is also paired with a custom daughterboard. Using NI’s Zynq-based SOM reduces the controller cost by 60% while boosting the on-board processing power and adding in a lot more programmable logic. The SOM’s additional processing power allowed Lightning Systems to implement new features and algorithms that have increased fuel economy.




Lightning Hybrids v4 ERS Controller.jpg 


Lightning Systems v4 ERS Controller uses a National Instruments sbRIO-9651 SOM based on a

Xilinx Zynq Z-7020 SoC




Lightning Systems is able to easily migrate its LabVIEW code throughout these four ERS controller generations because all of NI’s CompactRIO and Single-Board RIO controllers are software-compatible. In addition, this controller design allows easy field upgrades to the software, which reduces vehicle downtime.


Lightning Systems has developed a modular framework so that the company can quickly retrofit the ERS to most medium- and heavy-duty vehicles with minimal new design work or vehicle modification. The PTM/manifold combination mounts between the vehicle’s frame rails. The accumulators can reside remotely, wherever space is available, and connect to the valve manifold through high-pressure hydraulic lines. The system is designed for easy installation and the company can typically convert a vehicle’s power train into a hybrid system in less than a day. Lightning Systems has already received orders for ERS hybrid systems from customers in Alaska, Colorado, Illinois, and Massachusetts, as well as around the world in India and the United Kingdom.




Lightning Hybrids Typical ERS Installation.jpg 


Typical Lightning Systems ERS Installation



This project recently won a 2017 NI Engineering Impact Award in the Transportation and Heavy Equipment category and is documented in this NI case study.




Blood pumps for extracorporeal life support (ECLS) are used in medical therapies to support failing human organ systems. Conventional blood pumps use mechanically driven impellers supported on bearings and these impellers are prone to stress and heat concentration on the shaft-bearing contact areas, which increases hemolysis (rupture or destruction of red blood cells) and thrombosis (blood clots). Both are bad news in the bloodstream. In addition, ECLS applications require that any components that touch the blood be disposable, to prevent infection.


The Precision Motion Control Lab at MIT and Ension, Inc. are developing a new type of blood pump with a low-cost, disposable, bearingless impeller to reduce costs in ECLS applications. Magnetic levitation through reluctance coupling replaces the impeller’s mechanical bearings and hysteresis coupling drives the impeller using magnetically induced torque, which eliminates the mechanical drive shaft. Both magnetic forces are supplied by a 12-coil electromagnet in this new design.


To further reduce the cost of the replaceable rotor/impeller assembly, the design team substituted a steel ring made of type D2 tool steel for the normal permanent magnet in the rotor. The “D2 ring” is inductively magnetized by the coupled magnetic fields from the stator electromagnets. Reluctance coupling pulls the outer edges of the ring, causing it to levitate, while a rotating magnetic field generated by the twelve stator coils imparts rotational torque on the D2 ring, causing the impeller to spin.


Controlling the stator coils to produce the correct magnetic fields for levitation and motion requires closed-loop control of all twelve electromagnets in the stator. The design team chose the National Instruments (NI) MyRIO Student Embedded Controller because it’s easily programmed in NI’s LabVIEW systems engineering software package and because the MyRIO’s integrated Xilinx Zynq Z-7010 SoC incorporates the high-speed programmable logic needed to provide real-time, deterministic, closed-loop stator control.


Here’s a photo of a prototype bearingless motor for this design, showing the 12-magnet stator and the D2 ring rotor on the left and a National Instruments MyRIO controller on the right (and yes, that’s the Xilinx Zynq SoC peeking through the plastic window in the MyRIO controller):




Bearingless Blood Pump Prototype Motor.jpg 




Closed-loop feedback comes from four eddy-current sensors, which are sense coils driven by Texas Instruments LDC1101 16-bit LDCs (inductance-to-digital converters). The four LDC boards appear in the upper left part of the above photo. The four eddy-current sensors are organized in two pairs that differentially measure real-time rotor position. Each sensor connects to the MyRIO controller and the Zynq SoC using individual 5-wire SPI interfaces, as shown below:




Bearingless Blood Pump LDC Detail.jpg 




The MyRIO controller drives the blood pump's 12-phase stator through twelve analog channels—built from an NI cRIO-9076 4-slot CompactRIO controller (with an integrated Xilinx Virtex-5 LX45 FPGA), three NI-9263 voltage output modules, and one NI 9205 voltage input module—and twelve custom linear transconductance power amplifiers. The flexibility this setup provides permits the design team to experiment with and refine different motor-control algorithms.


Closed-loop, drive-with-feedback control algorithms are implemented in the Zynq SoC’s programmable logic because software-based microcontroller or microprocessor control loops would not have been fast enough or sufficiently deterministic. Although this controller design is capable of implementing a 46KHz control loop, the actual loop rate is 10KHz because that’s fast enough for this electromechanical system. The Zynq SoC’s 32-bit ARM Cortex-A9 processors in the MyRIO controller implement the system’s user interface and data logging.


This project won a 2017 NI Engineering Impact Award in the Advanced Research category and is documented in this NI case study.



Adam Taylor’s MicroZed Chronicles Part 199: The AD9467 SDSOC Platform

by Xilinx Employee ‎05-31-2017 03:36 PM - edited ‎05-31-2017 03:54 PM (2,999 Views)


By Adam Taylor


Having got the base hardware and software designs up and running, the next step is to create a SDSoC platform so that we can use this design efficiently. The SDSoC platform allows us to implement algorithms at a much higher level using C or C++. We can therefore develop C or C++ programs using SDSoC to access the ADC sample data within the DDR memory and verify that our algorithms work correctly. Once we are sure that we have the correct algorithmic function (but not necessarily the desired performance), we can accelerate these algorithms by putting them into the Zynq SoC’s programmable logic (PL) rather seamlessly. Taking such an approach enables us to use one base design for a range of applications. Because we are developing in a higher language, the time taken to produce the first working demonstration is reduced.


To generate an SDSoC platform, we need a Vivado base design, the necessary software libraries, and three definition files:


  • XPFM – This is the top-level definition of the Platform – Generated by hand
  • HPFM – The hardware definition of the platform – Generated by Vivado
  • SPFM – Defines the software definition of the platform – Generated by hand


The first thing we need to do to create the SDSoC platform (I am using version 2016.3) is to modify the design in Vivado using the UG1146 requirements for a hardware platform. This means that we need to update the concatenation block and move the used interrupts down to the least significant inputs. This frees up the remaining interrupts so that SDSoC can use them when it accelerates an algorithm using hardware. I also enabled all four FCLKs and Resets from the Zynq SoC’s PS (processing system) to the PL and instantiated the reset blocks for each of these clocks. I then followed the steps within UG1146 to create the hardware metadata to create one half of the platform. In this case, the hardware side of the SDSoC Platform makes available the AXI ACP, AXI HP2, AXI HP3, and AXI GP Master 1 connections. The other AXI interfaces are already in use by the existing AD9467 demo design.


There is one more thing we need as we create the hardware platform. Because this is a custom platform, which uses custom IP, we need to ensure that the IP is within the Vivado project for the SDSoC Platform. If it is not, then when we try to build our SDSoC platform we will get several failures in the build process because it cannot find IP information. The simplest method for preventing this problem is to use the Vivado Archive function to archive the design. Then the archived design will be extracted and used to define the SDSoC hardware platform.


To create the software platform (as we are using the ZedBoard for this example), I initially copied the software and top-level XML file from the <SDSoC Install>/platforms/zed directory, before editing them to reflect the needs of the platform:





Top Level of the ad9467_fmc_zed SDSoC Platform



These steps provided me with an SDSoC platform that I can use for development with the ZedBoard and the AD9467 FMC. My next step then was to perform some pipe cleaning to ensure that the platform functions as intended. To do this I wanted to:



  1. Build the AD9467 demo application and run it from with SDSoC with no acceleration.
  2. Create a simple acceleration example built onto to the base hardware. For this I am going to use one of the matrix multiply examples.



As I did not declare a prebuilt platform, SDSoC will generate the hardware the first time we build the application. I did this to ensure that SDSoC can re-build the hardware design without any accelerations but with the custom IP blocks needed for the AD9467 demo.






Vivado Diagram as used for the AD9467 Demo application



Having built the first application successfully, I then ran it on the ZedBoard with the AD9467 FMC connected and observed the same performance as I had previously seen when using SDK. This means that I can start developing that use the data provided by the AD9467 within the SDSoC environment.






However, once I have finished generating and testing my algorithms in C/C++, I will want to accelerate elements of the design. That is where the second test of the platform comes in: to test that the platform is correctly defined and is therefore capable of accelerating C and C++ functions into the hardware. Within the AD9467 FMC SDSoC platform, I created an example application for acceleration using one of the predefined SDSoC examples: the mmult. This will add functionality necessary to perform the MMult within the hardware in addition to the base design we have been using for the AD9467.





Accelerating the mmult_accel function in the AD9467 FMC Zed Platform






Resultant SDSoC Vivado design, AD9467 FMC design with additional hardware for the mmult_accel function (circled in red)






MMULT results on the AD9467 FMC Zed Platform



Generating this SDSoC platform was pretty simple and it allows us to develop our applications much faster than would be the case if we were using a standard HDL based approach. We will look at how we can do signal processing with this platform in future blogs.



I have uploaded the SDSoC Platform to the following git hub repository which is different to the standard one due to the organization of the platform.






If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.




  • First Year E Book here
  • First Year Hardback here.



MicroZed Chronicles hardcopy.jpg 



  • Second Year E Book here
  • Second Year Hardback here



 MicroZed Chronicles Second Year.jpg



Adam Taylor’s MicroZed Chronicles Part 198: Building the 250Msamples/sec AD9467 FMC Card

by Xilinx Employee ‎05-30-2017 10:48 AM - edited ‎05-30-2017 10:50 AM (3,788 Views)


By Adam Taylor



Last week I mentioned, the Analog Devices AD9467 FMC in the blog and how we could use it with the Xilinx SDSoC development environment to capture data with a simple data-capture chain and then develop and accelerate the algorithm using a high-level language like C or C++.





Analog Devices AD9467 FMC and Zynq-based Avnet ZedBoard Combined




The AD9467 FMC contains the AD9467 ADC, which provides 16-bit quantization at sampling rates of up to 250Msamples/sec (MSPS). These specs allow us to use the AD9467 to sample Intermediate Frequency (IF) signals. An IF is used to move an RF carrier wave down from or up to a higher frequency for reception or transmission.


The first thing we need to do with the AD9467 board is to work out the clocking scheme we’ll use to provide the ADC with a sample clock. We have three options:


  1. Apply an externally generated sine-wave. This option allows us to easily change the sampling frequency. However, to ensure good convertor performance, we’ll need a low-jitter clock from a quality signal source.
  2. Use the on-board oscillator. This option provides a fixed 250MHz reference clock to the ADC. It has the advantage of being an on-board resource with a known good layout. However, its sampling frequency is fixed.
  3. Use the on-board AD9517—an SPI-controlled, 12-output clock generator. This option gives us the ability to set the sampling frequency as desired.


To change between the three sources, we add and remove ac coupling capacitors from the circuit to put the correct clock generator in the clock path. By default, the clock path is configured to use the external clock source.


However, before we can create an SDSoC Platform, we need to create a base design in Vivado. This base design interfaces with the AD9467 FMC and transfers the sampled data into the Zynq SoC’s PS (processing system) DDR memory using DMA. Rather helpfully, the AD9467 FMC comes with a Vivado example that we can use with the ZedBoard. This example design creates the structure to transfer samples into the PS DDR SDRAM using DMA.


To recreate this design, the first thing we need to do is download the Analog Devices Git Hub repository, which contains both the shared IP elements required and the actual Vivado design example. To ensure we are using the latest possible tool chain, select the latest tool revision from the Git Hub and download a zip of the repository or clone the repository from here.


To build this project, we need to be using either a Linux box or, if we are using Microsoft Windows, we’ll need to download and install CYGWIN. If you are using CYGWIN, you need to make sure you have Vivado in your path.


To build the project you just need to use either a terminal or CYGWIN to navigate to the AD9467_FMC directory and execute the make file for the Zed version.





Make file running in CYGWIN to recreate the project




Once this has been recreated, we will be able to open our project in Vivado, explore the design in the block diagram, and export the design. We can then use the test application software to complete the demo.






AD9467 FMC example design




As can be seen in the above example, these steps add the FMC example into the existing Zynq base hardware design so that all the other interfaces like HDMI are still available. These additional interfaces can be very useful to us. In the diagram above, you can see the highlighted path from the AD9467 receiver IP, into a DMA IP block and then an AXI Interconnect block that connects to a Zynq HP (high-performance) AXI port. This design allows the data move seamless into the PS DDR SDRAM for future processing.


Of course to do this we need to run some software on the Zynq SoC’s ARM Cortex-A9 processor to configure the AD9467, the AD9517, and the simple internal processing pipeline. You can download the demo application example from here on GitHub. Helpfully, it comes with batch files (one for Linux one for Windows), which are used to create the demo software application to support the Vivado design.


When we run this example on the Zynq SoC, we will find that it performs a number of tests prior to performing the first ADC sample capture.






Terminal Output from ZedBoard if the FMC is present




The samples will be stored at 0x0800_0000 within the DDR SDRAM. Using the debug facility within SDK, we can examine these values and see that they are updated when the sampling occurs.





DDR Memory location at 0x0800_0000 following power cycle






DDR Memory Location at 0x0800_0000 following the samples being captured




With this up and working, we can now think about how we can use the base platform efficiently to implement higher-level signal-processing algorithms.




Code is available on Github as always.


If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.




  • First Year E Book here
  • First Year Hardback here.



MicroZed Chronicles hardcopy.jpg 



  • Second Year E Book here
  • Second Year Hardback here



MicroZed Chronicles Second Year.jpg




This week at its annual NI Week conference in Austin, Texas, National Instruments (NI) announced a new FlexRIO PXIe module, the PXIe-7915, based on three Xilinx Kintex UltraScale FPGAs. NI’s PCIe FlexRIO modules serve two purposes in NI-based systems: flexible, programmable, high-speed I/O and high-speed computation (usually DSP). NI’s customers access these FlexRIO resources using the company’s LabVIEW FPGA software, part of NI’s LabVIEW graphical development environment. Thanks to the Kintex UltraScale FPGA, new FlexRIO PXIe-7915 module contains significantly more programmable resources and delivers significantly more performance than previous FlexRIO modules, which are all based on earlier generations of Xilinx FPGAs. The set of graphs below shows the increased resources and performance delivered by the PXIe-7915 FlexRIO module in NI systems relative to previous-generation FlexRIO modules based on Xilinx Kintex-7 FPGAs:




FlexRIO UltraScale Graphs.jpg 



However, the UltraScale-based FlexRIO modules are not simply standalone products. They serve as design platforms for NI’s design engineers, who will use these platforms to develop many new, high-performance instruments. In fact, NI introduced the first two of these new instruments at NI Week 2017: the PXIe-5763 and PXIe-5764 high-speed, quad-channel, 16-bit digitizers. Here are the specs for these two new digitizers from NI:



NI FlexRIO Digitizers based on Kintex UltraScale FPGAs.jpg 



Previous digitizers in this product family employed parallel LVDS signals to couple high-speed ADCs to an FPGA. However, today’s fastest ADCs employ high-speed serial interfaces--particularly the JESD-204B interface specification—necessitating a new design. The resulting new design uses the FlexRIO PXIe-7915 card as a motherboard and the JESD204B ADC card as a mezzanine board, as shown in this photo:



NI FlexRIO PXIe-5764 Digitizer.jpg 




NI’s design engineers took advantage of the pin compatibility among various Kintex UltraScale FPGAs to maximize the flexibility of their design. They can populate the FlexRIO PXIe-7915 card with a Kintex UltraScale KU035, KU040, or KU060 FPGA depending on customer requirements. This flexibility allows them to create multiple products using one board layout—a hallmark of a superior, modular platform design.


Normally, you access the programmable-logic features of a Xilinx FPGA or Zynq SoC inside of an NI product using LabVIEW FPGA, and that’s certainly still true. However, NI has added something extra in its LabVIEW 2017 release: a Xilinx Vivado Project Export feature that provides direct access to the Xilinx Vivado Design Suite tools for hardware engineers experienced with writing HDL code for programmable logic. Here’s how it works:



LabVIEW Vivado Export Design Flow.jpg 




You can export all the necessary hardware files from LabVIEW 2017 to a Vivado project that is pre-configured for your specific deployment target. Any LabVIEW signal-processing IP in the LabVIEW design is included in the export as encrypted IP cores. As an added bonus, you can use the new Xilinx Vivado Project Export on all of NI’s FlexRIO and high-speed-serial devices based on Xilinx Kintex-7 or newer FPGAs.



NI has published a White Paper describing all of this. You’ll find it here.


Please contact National Instruments directly for more information about the new FlexRIO modules and LabVIEW 2017.



Adam Taylor’s MicroZed Chronicles, Part 196: SDSoC and Levels of Abstraction

by Xilinx Employee ‎05-22-2017 09:40 AM - edited ‎05-22-2017 10:28 AM (4,522 Views)


By Adam Taylor



We have looked at SDSoC several times throughout this series, however I recently organized and presented at the NMI FPGA Machine Vision event and during the coffee breaks and lunch, attendees showed considerable interest in SDSoC—not only for its use in the Xilinx reVISION acceleration stack but also its use in a range of over developments. As such, I thought it would be worth some time looking at what SDSoC is and the benefits we have previously gained using it. I also want to discuss a new use case.





SDSoC Development Environment




SDSoC is an Eclipse-based, system-optimizing compiler that allows us to develop our Zynq SoC or Zynq UltraScale+ MPSoC design in its entirety using C or C++. We can then profile the application to find aspects that cause performance bottlenecks and move then into the Zynq device’s Programmable Logic (PL). SDSoC does this using HLS (High Level Synthesis) and a connectivity framework that’s transparent to the user. What this means is that we are able develop at a higher level of abstraction and hence reduce the time to market of the product or demonstration.


To do this, SDSoC needs a hardware platform, which can be pre-defined or custom. Typically, these platforms within the PL provide the basics: I/O interfaces and DMA transfers to and from Zynq device’s PS’ (Processing System’s) DDR SDRAM. This frees up most the PL resources and PL/PS interconnects to be used by SDSoC when it accelerates functions.


This ability to develop at a higher level and accelerate performance by moving functions into the PL enables us to produce very flexible and responsive systems. This blog has previously looked at acceleration examples including AES encryption, matrix multiplication, and FIR Filters. The reduction in execution time has been significant in these cases. Here’s a table of these previously discussed examples:





Previous Acceleration Results with SDSoC. Blogs can be found here




To aid us in the optimization of the final application, we can use pragmas to control the HLS optimizations. We can use SDSoC’s tracing and profiling capabilities while optimizing these accelerated functions and the interaction between the PS and PL.


Here’s an example of a trace:





Results of tracing an example application

(Orange = Software, Green = Accelerated function and Blue = Transfer)



Let us take a look at a simple use case to demonstrate SDSoC’s abilities.


Frequency Modulated Continuous Wave (FMCW) RADAR is used for a number of applications that require the ability to detect objects and gauge their distance. FMCW applications make heavy use of FFT and other signal-processing techniques such as windowing, Constant False Alarm Rate (CFAR), and target velocity and range extraction. These algorithms and models are ideal for description using a high-level language such as C / C++. SDSoC can accelerate the execution of functions described this way and such an approach allows you to quickly demonstrate the application.


It is possible to create a simple FMCW receive demo using a ZedBoard and an AD9467 FPGA Mezzanine Card (FMC). At the simplest level, the hardware element of the SDSoC platform needs to be able to transfer samples received from the ADC into the PS memory space and then transfer display data from the PS memory space to the display, which in most cases will be connected with DVI or HDMI interfaces.






Example SDSoC Platform for FMCW application



This platform permits development of the application within SDSoC at a higher level. It also provides a platform that we can use for several different applications, not just FMCW. Rather helpfully, the AD9467 FMC comes with a reference design that can serve as the hardware element of the SDSoC Platform. It also provides drivers, which can be used as part of the software element.


With a platform in hand, it is possible to write the application within the SDSoC using C or C++, where we can make use of the acceleration libraries and stacks including matrix multiplication, math functions, and the ability to wrap bespoke HLD IP cores and use them within the development.


Developing in this manner provides a much faster development process, and provides a more responsive solution as it leverages the Zynq PL for inherently parallel or pipelined functions. It also makes it easier to upgrade designs in terms. As the majority development will also use C or C++ and because SDSoC is a system-optimizing complier, the application developer does not need to be a HDL specialist.




Code is available on Github as always.


If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.




  • First Year E Book here
  • First Year Hardback here.



MicroZed Chronicles hardcopy.jpg 




  • Second Year E Book here
  • Second Year Hardback here



MicroZed Chronicles Second Year.jpg 







After telegraphing its intent for more than a year, Xilinx has now added the P416 language to its SDNet Development Environment for high-speed (1Gbps to 100Gbps) packet processing. SDNet release 2017.1 includes a generally accessible, front-end P4-to-SDNet translator. P416 is the latest version of the P4 language and the SDNet workflow compiles packet-processing descriptions into data-plane switching algorithms instantiated in high-speed Xilinx FPGAs. Xilinx debuted the new SDNet release at this week’s P4 Developer Day and P4 Workshop held at Stanford U. in Palo Alto, CA. (There was a beta version of the translator in the prior SDNet 2016.4 release.)


There’s information about the new Xilinx P4-toSDNet translator in the latest version of the SDNet Packet Processor User Guide (UG1012) and the P4-SDNet Translator User Guide (UG1252). If you’re up on recent developments with the P416 language, you might want to jump to these user guides directly. Otherwise, you might want to take a look at this Linley Group White Paper titled “Xilinx SDNet: A New Way to Specify Network Hardware”, written by Senior Analyst Loring Wirbel, or watch this short video first:






And if you have a couple of hours to devote to learning a lot more about the P4 language, try this video from the P4 Language Consortium, which includes presentations from Vladimir Gurevich from Barefoot Networks, Ben Pfaff from VMware, Johann Tonsing from Netronome, and Gordon Brebner from Xilinx:








A paper titled “Evaluating Rapid Application Development with Python for Heterogeneous Processor-based FPGAs” that discusses the advantages and efficiencies of Python-based development using the PYNQ development environment—based on the Python programming language and Jupyter Notebooks—and the Digilent PYNQ-Z1 board, which is based on the Xilinx Zynq SoC, recently won the Best Short Paper award at the 25th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM 2017) held in Napa, CA. The paper’s authors—Senior Computer Scientist Andrew G. Schmidt, Computer Scientist Gabriel Weisz, and Research Director Matthew French from the USC Viterbi School of Engineering’s Information Sciences Institute—evaluated the impact of, the performance implications, and the bottlenecks associated with using PYNQ for application development on Xilinx Zynq devices. The authors then compared their Python-based results against existing C-based and hand-coded implementations.



The authors do a really nice job of describing what PYNQ is:



“The PYNQ application development framework is an open source effort designed to allow application developers to achieve a “fast start” in FPGA application development through use of the Python language and standard “overlay” bitstreams that are used to interact with the chip’s I/O devices. The PYNQ environment comes with a standard overlay that supports HDMI and Audio inputs and outputs, as well as two 12-pin PMOD connectors and an Arduino-compatible connector that can interact with Arduino shields. The default overlay instantiates several MicroBlaze processor cores to drive the various I/O interfaces. Existing overlays also provide image filtering functionality and a soft-logic GPU for experimenting with SIMT [single instruction, multiple threads] -style programming. PYNQ also offers an API and extends common Python libraries and packages to include support for Bitstream programming, directly access the programmable fabric through Memory-Mapped I/O (MMIO) and Direct Memory Access (DMA) transactions without requiring the creation of device drivers and kernel modules.”



They also do a nice job of explaining what PYNQ is not:



“PYNQ does not currently provide or perform any high-level synthesis or porting of Python applications directly into the FPGA fabric. As a result, a developer still must use create a design using the FPGA fabric. While PYNQ does provide an Overlay framework to support interfacing with the board’s IO, any custom logic must be created and integrated by the developer. A developer can still use high-level synthesis tools or the aforementioned Python-to-HDL projects to accomplish this task, but ultimately the developer must create a bitstream based on the design they wish to integrate with the Python [code].”



Consequently, the authors did not simply rely on the existing PYNQ APIs and overlays. They also developed application-specific kernels for their research based on the Redsharc project (see “Redsharc: A Programming Model and On-Chip Network for Multi-Core Systems on a Programmable Chip”) and they describe these extensions in the FCCM 2017 paper as well.




Redsharc Project.jpg




So what’s the bottom line? The authors conclude:


“The combining of both Python software and FPGA’s performance potential is a significant step in reaching a broader community of developers, akin to Raspberry Pi and Ardiuno. This work studied the performance of common image processing pipelines in C/C++, Python, and custom hardware accelerators to better understand the performance and capabilities of a Python + FPGA development environment. The results are highly promising, with the ability to match and exceed performances from C implementations, up to 30x speedup. Moreover, the results show that while Python has highly efficient libraries available, such as OpenCV, FPGAs can still offer performance gains to software developers.”


In other words, there’s a vast and unexplored territory—a new, more efficient development space—opened to a much broader system-development audience by the introduction of the PYNQ development environment.


For more information about the PYNQ-Z1 board and PYNQ development environment, see:







This week, just in time for the Embedded Vision Summit in Santa Clara, Aldec announced its TySOM-2A Embedded Prototyping Board based on a Xilinx Zynq Z-7030 SoC. The board features a combination of memories (1Gbyte of DDR3 SDRAM, SPI flash memory, EEPROM, microSD), communication interfaces (2× Gigabit Ethernet, 4× USB 2.0, UART-via-USB, Wi-Fi, Bluetooth, HDMI 1.4), an FMC connector, and other miscellaneous modules (LEDs, DIP switches, XADC, RTC, accelerometer, temperature sensor). Here’s a photo of the TySOM-2A board:





Aldec TySOM-2A Board.jpg



Aldec TySOM-2A Embedded Prototyping Board based on a Xilinx Zynq Z-7030 SoC



In its booth at the Summit, Aldec demonstrated a real-time, face-detection reference design running on the Zynq SoC. The program depends on the accelerated processing capabilities of the Zynq SoC’s programmable logic to run this complex code, processing a 1280x720-pixel video stream in real time. The most computationally intensive parts of the code including edge detection, color-space conversion, and frame merging were off-loaded from Zynq SoC’s ARM Cortex-A9 processor to the device’s programmable logic using Xilinx’s SDSoC Development Environment.


Here’s a very short video showing the demo:






This week at the Embedded Vision Summit in Santa Clara, CA, Mario Bergeron demonstrated a design he’d created that combines real-time visible and IR thermal video streams from two different sensors. (Bergeron is a Senior FPGA/DSP Designer with Avnet.) The demo runs on an Avnet PicoZed SOM (System on Module) based on a Xilinx Zynq Z-7030 SoC. The PicoZed SOM is the processing portion of the Avnet PicoZed Embedded Vision Kit. An FMC-mounted Python-1300-C image sensor supplies the visible video stream in this demo and a FLIR Systems Lepton image sensor supplies the 60x80-pixel IR video stream. The Lepton IR sensor connects to the PicoZed SOM over a Pmod connector on the PicoZed.


Here’s a block diagram of this demo:



Avnet reVISION demo with PicoZed Embedded Vision Kit.jpg 



Bergeron integrated these two video sources and developed the code for this demo using the new Xilinx reVISION stack, which includes a broad range of development resources for vision-centric platform, algorithm, and application development. The Xilinx SDSoC Development Environment and the Vivado Design Suite including the Vivado HLS high-level synthesis tool are all part of the reVISION stack, which also incorporates OpenCV libraries and machine-learning frameworks such as Caffe.


In this demo, Bergeron’s design takes the visible image stream and performs a Sobel edge extraction on the video. Simultaneously, the design also warps and resizes the IR Thermal image stream so that the Sobel edges can be combined with the thermal image. The Sobel and resizing algorithms come from the Xilinx reVISION stack library and Bergeron wrote the video-combining code in C. He then synthesized these three tasks in hardware to accelerate them because they were the most compute-intensive tasks in the demo. Vivado HLS created the hardware accelerators for these tasks directly from the C code and SDSoC connected the accelerator cores to the ARM processor with DMA hardware and generated the software drivers.


Here’s a diagram showing the development process for this demo and the resulting system:



Avnet reVISION demo Project Diagram.jpg 



In the video below, Bergeron shows that the unaccelerated Sobel algorithm running in software consumes 100% of an ARM Cortex-A9 processor in the Zynq Z-7030 SoC and still only achieves about one frame/sec—far too slow. By accelerating this algorithm in the Zynq SoC’s programmable logic using SDSoC and Vivado HLS, Bergeron cut the processor load by more than 80% and achieved real-time performance. (By my back-of-the envelope calculation, that’s about a 150x speedup: going from 1 to 30 frames/sec and cutting the processor load by more than 80%.)


Here’s the 5-minute video of this fascinating demo:







For more information about the Avnet PicoZed Embedded Vision Kit, see “Avnet’s $1500, Zynq-based PicoZed Embedded Vision Kit includes Python-1300-C camera and SDSoC license.”



For more information about the Xilinx reVISION stack, see “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge,” and “Yesterday, Xilinx announced the reVISION stack for software-defined embedded-vision apps. Today, there’s two demo videos.”



In this 40-minute webinar, Xilinx will present a new approach that allows you to unleash the power of the FPGA fabric in Zynq SoCs and Zynq UltraScale+ MPSoCs using hardware-tuned OpenCV libraries, with a familiar C/C++ development environment and readily available hardware development platforms. OpenCV libraries are widely used for algorithm prototyping by many leading technology companies and computer vision researchers. FPGAs can achieve unparalleled compute efficiency on complex algorithms like dense optical flow and stereo vision in only a few watts of power.


This Webinar is being held on July 12. Register here.


Here’s a fairly new, 4-minute video showing a 1080p60 Dense Optical Flow demo, developed with the Xilinx SDSoC Development Environment in C/C++ using OpenCV libraries:





For related information, see Application Note XAPP1167, “Accelerating OpenCV Applications with Zynq-7000 All Programmable SoC using Vivado HLS Video Libraries.”


Plethora IIoT develops cutting‑edge solutions to Industry 4.0 challenges using machine learning, machine vision, and sensor fusion. In the video below, a Plethora IIoT Oberon system monitors power consumption, temperature, and the angular speed of three positioning servomotors in real time on a large ETXE-TAR Machining Center for predictive maintenance—to spot anomalies with the machine tool and to schedule maintenance before these anomalies become full-blown faults that shut down the production line. (It’s really expensive when that happens.) The ETXE-TAR Machining Center is center-boring engine crankshafts. This bore is the critical link between a car’s engine and the rest of the drive train including the transmission.




Plethora IIoT Oberon System.jpg 




Plethora uses Xilinx Zynq SoCs and Zynq UltraScale+ MPSoCs as the heart of its Oberon system because these devices’ unique combination of software-programmable processors, hardware-programmable FPGA fabric, and programmable I/O allow the company to develop real-time systems that implement sensor fusion, machine vision, and machine learning in one device.


Initially, Plethora IIoT’s engineers used the Xilinx Vivado Design Suite to develop their Zynq-based designs. Then they discovered Vivado HLS, which allows you to take algorithms in C, C++, or SystemC directly to the FPGA fabric using hardware compilation. The engineers’ first reaction to Vivado HLS: “Is this real or what?” They discovered that it was real. Then they tried the SDSoC Development Environment with its system-level profiling, automated software acceleration using programmable logic, automated system connectivity generation, and libraries to speed programming. As they say in the video, “You just have to program it and there you go.”


Here’s the video:





Plethora IIoT is showcasing its Oberon system in the Industrial Internet Consortium (IIC) Pavilion during the Hannover Messe Show being held this week. Several other demos in the IIC Pavilion are also based on Zynq All Programmable devices.



You are never going to get past a certain performance barrier by compiling C for a software-programmable processor. At some point, you need hardware acceleration.


As an analogy: You can soup up a car all you want; it’ll never be an airplane.


Sure, you can bump the processor clock rate. You can add processor cores and distribute the tasks. Both of these approaches increase power consumption, so you’ll need a bigger and more expensive power supply; they increase heat generation, which means you will need better cooling and probably a bigger heat sink or a fan (or another fan); and all of these things increase BOM costs.


Are you sure you want to take that path? Really?


OK, you say. This blog’s from an FPGA company (actually, Xilinx is an “All Programmable” company), so you’ll no doubt counsel me to use an FPGA to accelerate these tasks and I don’t want to code in Verilog or VHDL, thank you very much.


Not a problem. You don’t need to.


You can get the benefit of hardware acceleration while coding in C or C++ using the Xilinx SDSoC development environment. SDSoC produces compiled software automatically coupled to hardware accelerators and all generated directly from your high-level C or C++ code.


That’s the subject of a new Chalk Talk video just posted on the eejournal.com Web site. Here’s one image from the talk:



SDSoC Acceleration Results.jpg



This image shows three complex embedded tasks and the improvements achieved with hardware acceleration:



  • 2-camera, 3D disparity mapping – 292x speed improvement


  • Sobel filter video processing – 30x speed improvement


  • Binary neural network – 1000x speed improvement



A beefier software processor or multiple processor cores will not get you 1000x more performance—or even 30x—no matter how you tweak your HLL code, and software coders will sweat bullets just to get a few percentage points of improvement. For such big performance leaps, you need hardware.


Here’s the 14-minute Chalk Talk video:






By Adam Taylor



Having introduced the Aldec TySOM-2 FPGA Prototyping Board, based on the Xilinx Zynq SoC, and the face detection application running on it, I thought it would be a good idea to take a more detailed examination of the face-detection application’s architecture.


The face detection example uses one Blue Eagle camera, which is connected to the Aldec FMC-ADAS card. The processed frames showing the detected face are output via the TySOM-2 board’s HDMI port. What is worth pointing out is that the application running on the TySOM-2 board, face detection in this case, is enabled by the software. The Zynq PL (programmable logic) hardware design provides the capability to interface with the camera, for sharing the video frames with the Zynq PS (processing system) through the DDR SDRAM, and for display output.


Any application could be implemented—not just face detection. It could be object tracking. I could be corner detection. It could be anything. This is one of the things that makes development of image-processing systems on the Zynq so powerful. We can use the same base platform on the TySOM-2 board and customize the application in software. Of course, we can also use the Xilinx SDSoC development environment to further accelerate the algorithm into the TySOM-2 platform’s remaining resources to increase performance.


The Blue Eagle camera transmits the video stream using a, FPD-Link III link. These links use a high-speed, bi-directional CML (Current Mode Logic) link to transfer the image data. An FPD-Link III receiving device (a TI DS90UB914Q-Q1 FPD-Link III SER/DES) is used on the ADAS FMC to implement this camera interface. This device is configured for the application in hand using the I2C peripheral in the Zynq SoC’s PS. This device provides video to the Zynq PL in a parallel format: the parallel data bits, HSync, VSync, and a pixel clock.






We need to process the frames and store them within the Zynq PS’ DDR SDRAM using Video DMA (Direct Memory Access) to ensure that we can access the image frames within DDR memory using the Zynq SoC’s ARM Cortex-A9 processor. We need to use several IP blocks that come as standard IP within Vivado to implement this. These IP blocks transfer data using the AXI streaming protocol--AXIS.


Therefore, the first thing needed is to convert the received video in parallel format into an AXIS stream. Once the video is in the correct format, we can use the VDMA IP block to transfer video data to and from the Zynq PS’ DDR SDRAM, where the software running on the Zynq SoC’s ARM Cortex-A9 processors can access the frames and implement the application algorithms.


Unlike previous examples we have examined, which used a single AXI High Performance (AXI HP) port, this example uses two of the Zynq SOC’s AXI HP interface ports, one in each direction. This configuration requires a slightly more complicated DMA architecture because we’ll need two VDMA IP Blocks. Within the Zynq PL, the AXI standard used for most IP blocks is AXI 4.0 while the ports on the Zynq SoC implement AXI 3.0. Therefore, we need to use an AXI Interconnect or a protocol convertor to convert between the two standards.







This use of two interfaces will make no performance difference when compared to a single HP AXI interface because the S0 and S1 AXI HP Ports on the Zynq SoC which are used by this configuration are multiplexed down to the M0 port on the memory interconnect and finally connected to the S3 port on the DDR SDRAM controller. This is shown below in the interconnection diagram from UG585, the TRM for the Zynq SoC.







Once the VDMA is implemented, the design then perform color-space conversion, chroma resampling, and finally passes to an on-screen display module. Once this has been completed, the video stream must be converted from AXIS to parallel video, which can then be output to the HDMI transmitter.


With this hardware platform completed, the next step is to write the software to create the application. For this we have the choice of using SDK or using SDSoC, which adds the ability to accelerate some of the application algorithm functions using programmable logic. As this example is implemented on the Zynq Z-7100 SoC, which has a significant amount of free, on-chip programmable resources following the implementation of the base platform, we’ll be using SDSoC for this example. We will look at the software architecture next time.


My code is available on Github as always.


If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.




  • First Year E Book here
  • First Year Hardback here.



MicroZed Chronicles hardcopy.jpg 



  • Second Year E Book here
  • Second Year Hardback here


MicroZed Chronicles Second Year.jpg 



Here’s a 40-minute teardown video of a Vision Research Phantom v5 high-speed high-speed, 1024x1024-pixel, 1000frames/sec video camera (circa 2001) from tesla500’s YouTube video channel. His methodical teardown and excellent system-level explanation uncovers a couple of “huge” Xilinx XC4020 FPGAs (circa 2000) on the timing and interface boards and Xilinx XC9500 CPLDs implementing the timing and control on the four high-speed capture-memory boards. There’s also a Hitachi SH-2 32-bit RISC processor with a hardware MAC (for DSP) on the timing board.


The XC4020 FPGAs are 3rd-generation devices that each have 784 CLBs (1560 LUTs total). They were big in their day but they’re very small now. These days, I think you could implement all of the digital timing and control circuitry in this camera including the SH-2 processor’s capabilities using the smallest single-core Zynq Z-7007S SoC—with the ARM Cortex-A9 processor in the Zynq SoC running considerably more than 20x faster than the turn-of-the-millennium SH-2 processor’s roughly 28MHz maximum clock rate.


Of course, Vision Research has moved far beyond 1000 frames/sec over the past 17 years. Its latest cameras can go 1000x faster than that, hitting 1M frames/sec when configured with the company’s FAST option (fast indeed!), while the Phantom v5 is no longer listed even on the company’s “discontinued cameras” page. Nevertheless, I found tesla500’s teardown and explanations fascinating and valuable.


Of course, Xilinx All Programmable devices have long been used to design advanced video equipment like the Vision Research Phantom v5 high-speed camera. Which allows me to quickly remind you of the recent launch of the Xilinx reVISION stack launch for embedded-vision applications. (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”)


And now, here’s tesla500’s Vision Research Phantom v5 high-speed camera teardown video:








How to use machine learning for embedded vision—and many other embedded applications

by Xilinx Employee ‎03-30-2017 10:02 AM - edited ‎03-30-2017 12:00 PM (3,577 Views)


Image3.jpg Adam Taylor and Xilinx’s Sr. Product Manager for SDSoC and Embedded Vision Nick Ni have just published an article on the EE News Europe Web site titled “Machine learning in embedded vision applications.” That title’s pretty self-explanatory, but there are a few points I’d like to highlight. Then you can go read the full article yourself.


As the article states, “Machine learning spans several industry mega trends, playing a very prominent role within not only Embedded Vision (EV), but also Industrial Internet of Things (IIoT) and Cloud Computing.” In other words, if you’re designing products for any embedded market, you might well find yourself at a competitive disadvantage if you’re not adding machine-learning features to your road map.


This article closely ties machine learning with neural networks (including Feed-forward Neural Networks (FNNs), Recurrent Neural Networks (RNNs), and Deep Neural Networks (DNNs), and Convolutional Neural Networks (CNNs)). Neural networks are not programmed; they’re trained. Then, if they’re part of an embedded design, they’re deployed. Training is usually done using floating-point neural-network implementations but, for efficiency (power and cost), deployed neural networks can use fixed-point representations with very little or no loss of accuracy. (See “Counter-Intuitive: Fixed-Point Deep-Learning Inference Delivers 2x to 6x Better CNN Performance with Great Accuracy.”)


The programmable logic inside of Xilinx FPGAs, Zynq SoCs, and Zynq UltraScale+ MPSoCs is especially good at implementing fixed-point neural networks, as described in this article by Nick Ni and Adam Taylor. (Go read the article!)


Meanwhile, this is a good time to remind you of the recent Xilinx introduction of the reVISION stack for neural network development using Xilinx All Programmable devices. For more information about the Xilinx reVISION stack, see:
















MathWorks’ HDL Coder wins Embedded World AWARD in Nuremberg last week

by Xilinx Employee ‎03-21-2017 02:07 PM - edited ‎03-22-2017 05:59 AM (3,870 Views)


The organizers of last week’s Embedded World show in Nuremberg gave out embedded AWARDS in three categories last week during the show and MathWorks’ HDL Coder won in the tools category. (See the announcement here.) If you don’t know about this unique development tool, now is a good time to become acquainted with it. HDL Coder accepts model-based designs created using MathWorks’ MATLAB and Simulink and can generate VHDL or Verilog for all-hardware designs or hardware and software code for designs based on a mix of custom hardware and embedded software running on a processor. That means that HDL Coder works well with Xilinx FPGAs and Zynq SoCs.


Here’s a diagram of what HDL Coder does:



MathWorks HDL Coder.jpg 



You might also want to watch this detailed MathWorks video titled “Accelerate Design Space Exploration Using HDL Coder Optimizations.” (Email registration required.)



For more information about using MathWorks HDL Coder to target your designs for Xilinx devices, see:









About the Author
  • Be sure to join the Xilinx LinkedIn group to get an update for every new Xcell Daily post! ******************** Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.