
 

Think you don’t need HBM (high-bandwidth memory) in your FPGA-based designs? There was probably a time, not that long ago, when you thought you didn’t need a smartphone. Still think so? With its 460Gbytes/sec bandwidth, HBM doesn’t crash through the memory wall; it vaults you over the wall. And who needs to get over the memory wall? Anyone working with high-speed Ethernet, high-res video, and most high-performance DSP applications. Pretty much anything you’d use a Xilinx UltraScale+ All Programmable device for. Here’s a chart illustrating the problem:

 

 

Memory Wall.jpg 

 

 

Allow me to translate this chart for you: “You’re not going to get there with DDR SDRAM.”

 

Fortunately, there’s no longer a need for me to convince you that you need HBM. There’s an 11-page White Paper to do that job. It’s titled “Virtex UltraScale+ HBM FPGA: A Revolutionary Increase in Memory Performance.”

 

And, if you weren’t aware that Xilinx was adding HBM to its FPGAs, read this blog from last November: “Xilinx Virtex UltraScale+ FPGAs incorporate 32 or 64Gbits of HBM, delivers 20x more memory bandwidth than DDR.”

 

 

 

 

Last month, Xilinx Product Marketing Manager Darren Zacher presented a Webinar on the extremely popular $99 Arty Dev Kit, which is based on a Xilinx Artix-7 A35T FPGA, and the recording is now online. If you’re wondering whether this might be the right way for you to get some design experience with the latest FPGA development tools and silicon, spend an hour with Zacher and Arty. The kit is available from Avnet and Digilent.

 

Register to watch the video here.

 

 

ARTY v4.jpg 

 

 

For more information about the Arty Dev Kit, see: “ARTY—the $99 Artix-7 FPGA Dev Board/Eval Kit with Arduino I/O and $3K worth of Vivado software. Wait, What????”

 

 

 

 

 

Anthony Collins, Harpinder Matharu, and Ehab Mohsen of Xilinx have just published an application article about the 16nm Xilinx RFSoC in Microwave Journal titled “RFSoC Integrates RF Sampling Data Converters for 5G New Radio.” Xilinx announced the RFSoC, which is based on the 16nm Xilinx Zynq UltraScale+ MPSoC, back in February (see “Xilinx announces RFSoC with 4Gsamples/sec ADCs and 6.4Gsamples/sec DACs for 5G, other apps. When we say “All Programmable,” we mean it!”). The Xcell Daily blog with that announcement has been very popular. Last week, another blog gave more details (see “Ready for a few more details about the Xilinx All Programmable RFSoC? Here you go”), and now there’s this article in Microwave Journal.

 

This new article gets into many specifics of designing the RFSoC into systems, complete with block diagrams and performance numbers. In particular, there’s a table showing MIMO radio designs based on the RFSoC with 37% to 51% power reductions and significant PCB real-estate savings due to the RFSoC’s integrated, multi-Gbps ADCs and DACs.

 

If you’re looking to glean a few more technical details about the RFSoC, this article is the latest place to go.

 

 

Compute Acceleration: GPU or FPGA? New White Paper gives you numbers


 

Cloud computing and application acceleration for a variety of workloads including big-data analytics, machine learning, video and image processing, and genomics are big data-center topics, and if you’re one of those people looking for acceleration guidance, read on. If you’re looking to accelerate compute-intensive applications such as automated driving and ADAS or local video processing and sensor fusion, this blog post’s for you too. The basic problem here is that CPUs are too slow and they burn too much power. You may have one or both of these challenges. If so, you may be considering a GPU or an FPGA as an accelerator in your design.

 

How to choose?

 

Although GPUs started as graphics accelerators, primarily for gamers, a few architectural tweaks and a ton of software have made them suitable as general-purpose compute accelerators. With the right software tools, it’s not too difficult to recode and recompile a program to run on a GPU instead of a CPU. With some experience, you’ll find that GPUs are not great for every application workload. Certain computations such as sparse matrix math don’t map onto GPUs well. One big issue with GPUs is power consumption. GPUs aimed at server acceleration in a data-center environment may burn hundreds of watts.

 

With FPGAs, you can build any sort of compute engine you want with excellent performance/power numbers. You can optimize an FPGA-based accelerator for one task, run that task, and then reconfigure the FPGA if needed for an entirely different application. The amount of computing power you can bring to bear on a problem is scary big. A Virtex UltraScale+ VU13P FPGA can deliver 38.3 INT8 TOPS (that’s tera operations per second) and if you can binarize the application, which is possible with some neural networks, you can hit 500TOPS. That’s why you now see big data-center operators like Baidu and Amazon putting Xilinx-based FPGA accelerator cards into their server farms. That’s also why you see Xilinx offering high-level acceleration programming tools like SDAccel to help you develop compute accelerators using Xilinx All Programmable devices.

 

For more information about the use of Xilinx devices in such applications including a detailed look at operational efficiency, there’s a new 17-page White Paper titled “Xilinx All Programmable Devices: A Superior Platform for Compute-Intensive Systems.”

 

 

 

 

 

There’s considerable 5G experimentation taking place as the radio standards have not yet gelled and researchers are looking to optimize every aspect. SDRs (software-defined radios) are excellent experimental tools for such research—NI’s (National Instruments’) SDR products especially so because, as the Wireless Communication Research Laboratory at Istanbul Technical University discovered:

 

“NI SDR products helped us achieve our project goals faster and with fewer complexities due to reusability, existing examples, and the mature community. We had access to documentation around the examples, ready-to-run conceptual examples, and courseware and lab materials around the grounding wireless communication topics through the NI ecosystem. We took advantage of the graphical nature of LabVIEW to combine existing blocks of algorithms more easily compared to text-based options.”

 

Researchers at the Wireless Communication Research Laboratory were experimenting with UFMC (universal filtered multicarrier) modulation, a leading modulation candidate technique for 5G communications. Although current communication standards frequently use OFDM (orthogonal frequency-division multiplexing), it is not considered to be a suitable modulation technique for 5G systems due to its tight synchronization requirements, inefficient spectral properties (such as high spectral side-lobe levels), and cyclic prefix (CP) overhead. UFMC has relatively relaxed synchronization requirements.
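
To make the comparison concrete, here’s a minimal NumPy sketch of a UFMC transmitter: each sub-band gets its own IFFT and is then shaped by a frequency-shifted FIR filter, which is what suppresses the spectral side lobes and removes the need for a cyclic prefix. All parameters are illustrative; they are not the values used in the Istanbul Technical University testbed.

```python
import numpy as np
from scipy.signal.windows import chebwin  # Dolph-Chebyshev taps, a common UFMC prototype filter

# Illustrative parameters only -- not the testbed's actual configuration
N_FFT = 1024      # IFFT size
K     = 12        # subcarriers per sub-band
B     = 10        # number of sub-bands
L     = 74        # sub-band FIR filter length
ATT   = 40        # filter side-lobe attenuation, dB

rng = np.random.default_rng(0)
prototype = chebwin(L, at=ATT)                  # low-pass prototype filter taps

tx = np.zeros(N_FFT + L - 1, dtype=complex)     # one UFMC symbol
for b in range(B):
    # QPSK symbols for this sub-band
    bits = rng.integers(0, 2, (K, 2))
    syms = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)

    # Map the symbols onto the sub-band's contiguous subcarriers, then IFFT
    freq = np.zeros(N_FFT, dtype=complex)
    start = b * K
    freq[start:start + K] = syms
    time_domain = np.fft.ifft(freq)

    # Shift the prototype filter to the sub-band's center and filter (the "F" in UFMC)
    fc = (start + K / 2) / N_FFT
    taps = prototype * np.exp(2j * np.pi * fc * np.arange(L))
    tx += np.convolve(time_domain, taps)

# Unlike CP-OFDM there is no cyclic prefix; the L-1 samples of filter
# transient act as the guard interval instead.
```

Comparing the spectrum of tx against a plain IFFT output shows the reduced side-lobe levels that make UFMC attractive for the relaxed-synchronization 5G scenarios described above.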

 

The research team at the Wireless Communication Research Laboratory implemented UFMC modulation using two USRP-2921 SDRs, a PXI-6683H timing module, and a PXIe-5644R VST (Vector Signal Transceiver) module from National Instruments (NI), all programmed with NI’s LabVIEW systems engineering software. Using this equipment, they achieved better spectral results than with OFDM and, by exploiting UFMC’s sub-band filtering approach, they’ve proposed enhanced versions of UFMC. Details are available in the NI case study titled “Using NI Software Defined Radio Solutions as a Testbed of 5G Waveform Research.” This project was a finalist in the 2017 NI Engineering Impact Awards, RF and Mobile Communications category, held last month in Austin as part of NI Week.

 

 

5G UFMC Modulation Testbed.jpg 

 

 

5G UFMC Modulation Testbed based on Equipment from National Instruments

 

 

Note: NI’s USRP-2921 SDR is based on a Xilinx Spartan-6 FPGA; the NI PXI-6683 timing module is based on a Xilinx Virtex-5 FPGA; and the PXIe-5644R VST is based on a Xilinx Virtex-6 FPGA.

 

 

 

 

 

 

Although humans once served as the final inspectors for PCBs, today’s component dimensions and manufacturing volumes mandate the use of camera-based automated optical inspection (AOI) systems. Amfax has developed a 3D AOI system—the a3Di—that uses two lasers to make millions of 3D measurements with better than 3μm accuracy. One of the company’s customers uses an a3Di system to inspect 18,000 assembled PCBs per day.

 

The a3Di control system is based on a National Instruments (NI) cRIO-9075 CompactRIO controller—with an integrated Xilinx Virtex-5 LX25 FPGA—programmed with NI’s LabVIEW systems engineering software. The controller manages all aspects of the a3Di AOI system including monitoring and control of:

 

 

  • Machine motors
  • Control switches
  • Optical position sensors
  • Inverters
  • Up and downstream SMEMA (Surface Mount Equipment Manufacturers Association) conveyor control
  • Light tower
  • Pneumatics
  • Operator manual controls for PCB width control
  • System emergency stop

 

 

The system provides height-graded images like this:

 

 

 

Amfax 3D PCB image.jpg 

 

3D Image of a3Di’s Measurement Data: Colors represent height, with Z resolution down to less than a micron. The blue section at the top indicates signs of board warp. Laser-etched component information appears on some of the ICs.

 

 

 

The a3Di system then compares this image against a stored golden reference image to detect manufacturing defects.

 

Amfax says that it has found the CompactRIO system to be “dependable, reliable, and cost-effective.” In addition, the company found it could get far better timing resolution with the CompactRIO system than the 1msec resolution usually provided by PLC controllers.

 

 

This project was a 2017 NI Engineering Impact Award Finalist in the Electronics and Semiconductor category last month at NI Week. It is documented in this NI case study.

 

HIL simulator based on NI equipment allows Hyundai to cut marine diesel development from 3 years to one


 

Hyundai Heavy Industries (HHI) is the world’s foremost shipbuilding company and the company’s Engine and Machinery Division (HHI-EMD) is the world’s largest marine diesel engine builder. HHI’s HiMSEN medium-sized engines are four-stroke diesels with output power ranging from 960kW to 25MW. These engines power electric generators on large ships and serve as the propulsion engine on medium and small ships. HHI-EMD is always developing newer, more fuel-efficient engines because the fuel costs for these large diesels run about $2000/hour. Better fuel efficiency will significantly reduce operating costs and emissions.

 

For that research, HHI-EMD developed monitoring and diagnostic equipment to better understand engine combustion performance and an HIL system to test new engine controller designs. The test and HIL systems are based on equipment from National Instruments (NI).

 

Engine instrumentation must be able to monitor 10-cylinder engines running at thousands of RPM while measuring crankshaft angle to 0.1 degree of resolution. From that information, the engine test and monitoring system calculates in-cylinder peak pressure, mean effective pressure, and cycle-to-cycle pressure variation. All this must happen every 10μsec for each cylinder.
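
For a sense of the per-cycle math involved, here’s a minimal Python sketch (my own toy helpers, not HHI’s LabVIEW code): IMEP is the closed-cycle integral of cylinder pressure over cylinder volume divided by the displaced volume, and cycle-to-cycle variation is typically reported as the coefficient of variation of IMEP.

```python
import numpy as np

def imep(pressure_pa, volume_m3, displacement_m3):
    """IMEP = (closed-cycle integral of p dV) / Vd, in Pa (J/m^3)."""
    # Trapezoidal integration of p dV around one complete engine cycle
    work_j = np.sum(0.5 * (pressure_pa[1:] + pressure_pa[:-1]) * np.diff(volume_m3))
    return work_j / displacement_m3

def cycle_stats(cycles_pa, volume_m3, displacement_m3):
    """Peak pressure per cycle, per-cycle IMEP, and cycle-to-cycle variation."""
    peaks = cycles_pa.max(axis=1)                # in-cylinder peak pressure
    imeps = np.array([imep(c, volume_m3, displacement_m3) for c in cycles_pa])
    cov = imeps.std() / imeps.mean()             # coefficient of variation of IMEP
    return peaks, imeps, cov
```

The hard part is not the arithmetic, it’s the deadline: at a 10μsec-per-cylinder budget, this bookkeeping has to live in the cRIO’s FPGA fabric rather than in software.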

 

HHI-EMD elected to use an NI cRIO-9035 Controller, which incorporates a Xilinx Kintex-7 70T FPGA, to serve as the platform for developing its HiCAS test and data-acquisition system. The HiCAS system monitors all aspects of the engine under test including engine speed, in-cylinder pressure, and pressures in the intake and exhaust systems. This data helped HHI-EMD engineers analyze the engine’s overall performance and the performance of key parts using thermodynamic analysis. HiCAS provides real-time analysis of dynamic data including:

 

  • In-cylinder peak pressure
  • Indicated mean effective pressure and cycle-to-cycle variation
  • Cylinder-to-cylinder distribution
  • Cyclic moving parts fault diagnosis

 

Using the collected data, the engineering team then developed a model of the diesel engine, resulting in the development of an HIL system used to exercise the engine controllers. This engine model runs in real time on an NI PXI system synchronized with the high-speed signal-sensor simulation software running on the PXI system’s multifunction FPGA-based FlexRIO module. The HIL system transmits signals to the engine controllers, simulating an operating engine and eliminating the operating costs of a large diesel engine during these tests. HHI-EMD credits the FPGAs in these systems for making the calculations run fast enough for real-time simulation. The simulated engine also permits fault testing without the risk of damaging an actual engine. Of course, all of this is programmed using NI’s LabVIEW systems engineering software and LabVIEW FPGA.

 

 

 

 

HHI-EMD HIL Simulator for Marine Diesel Engines.jpg 

 

 

HHI-EMD HIL Simulator for Marine Diesel Engines

 

 

 

According to HHI-EMD, development of the HiCAS engine-monitoring system and virtual verification based on the HIL system shortened development time from more than three years to one, significantly accelerating the time-to-market for HHI-EMD’s more eco-friendly marine diesel engines.

 

 

 

This project was a 2017 NI Engineering Impact Award Finalist in the Transportation and Heavy Equipment category last month at NI Week and won the 2017 HPE Edgeline Big Analog Data Award. It is documented in this NI case study.

 

 

 

 

 

 

 

 

 

Chang Guang Satellite Technology, China’s first commercial remote sensing satellite company, develops and operates the JILIN-1 high-resolution remote-sensing satellite series, which has pioneered the application of commercial satellites in China. The company plans to put 60 satellites in orbit by 2020 and 138 satellites in orbit by 2030. Achieving that goal is going to take a lot of testing, and testing consumes about 70% of the development cycle for space-based systems. So Chang Guang Satellite Technology knew it would need to automate its test systems and turned to National Instruments (NI) for assistance. The resulting automated test system has three core test systems using products from NI:

 

  • An S-band ground monitoring system, based mainly on NI’s 1st-generation PXI RF Vector Signal Transceiver (VST) and an NI FlexRIO module
  • An on-orbit satellite dynamic model hardware-in-the-loop (HIL) system with a sub-1msec closed-loop period, 100x better than a conventional test system design
  • A GPS simulator using an FPGA-based FlexRIO module to simulate high-dynamic, in-orbit satellite navigation signals

 

 

Chang Guang Test System.jpg

 

A Chang Guang Satellite Technology test system based on NI’s 1st-generation VST and FlexRIO PXIe modules

 

 

Here’s a sample image from the company’s growing satellite imaging portfolio:

 

 

 

Disneyland from Space.jpg 

 

 

Shanghai Disneyland as viewed from space

 

 

 

 

NI’s VSTs and FlexRIO modules are all based on multiple generations of Xilinx FPGAs. The company’s 2nd-generation VSTs are based on Virtex-7 FPGAs and its latest FlexRIO modules are based on Kintex-7 FPGAs.

 

 

 

This project was a 2017 NI Engineering Impact Award Finalist in the Aerospace and Defense category last month at NI Week. It is documented in this NI case study.

 

 

 

For more information about NI’s VST family, see:

 

 

 

 

 

 

 

Avnet has formally introduced its MiniZed dev board based on the Xilinx Zynq Z-7000S SoC with the low, low price of just $89. For this, you get a Zynq Z-7007S SoC with one ARM Cortex-A9 processor core, 512Mbytes of DDR3L SDRAM, 128Mbits of QSPI Flash, 8Gbytes of eMMC Flash memory, WiFi 802.11 b/g/n, and Bluetooth 4.1. The MiniZed board incorporates an Arduino-compatible shield interface, two Pmod connectors, and a USB 2.0 host interface for fast peripheral expansion. You’ll also find an STMicroelectronics LIS2DS12 motion and temperature sensor and an MP34DT05 digital microphone on the board. This is a low-cost dev board that packs the punch of a fast ARM Cortex-A9 processor, programmable logic, a dual-wireless communications system, and easy system expandability.

 

I find the software that accompanies the board equally interesting. According to the MiniZed Product Brief, the $89 price includes a voucher for an SDSoC license so you can program the programmable logic on the Zynq SoC using C or C++ in addition to Verilog or VHDL using Vivado. This is a terrific deal on a Zynq dev board, whether you’re a novice or an experienced Xilinx user.

 

Avnet’s announcement says that the board will start shipping in early July.

 

Stefan Rousseau, senior technical marketing engineer for Avnet, said, “Whether customers are developing a Linux-based system or have a simple bare metal implementation, with MiniZed, Zynq-7000 development has never been easier. Designers need only connect to their laptops with a single micro-USB cable and they are up and running. And with Bluetooth or Wi-Fi, users can also connect wirelessly, transforming a mobile phone or tablet into an on-the-go GUI.”

 

 

 

Here’s a photo of the MiniZed Dev board:

 

 

Avnet MiniZed 3.jpg 

 

Avnet’s $89 MiniZed Dev Board based on a Xilinx Zynq Z-7007S SoC

 

 

And here’s a block diagram of the board:

 

 

MiniZed Block Diagram.jpg 

 

Avnet’s $89 MiniZed Dev Board Block Diagram

 

Many engineers in Canada wear the Iron Ring on their finger, presented to engineering graduates as a symbolic, daily reminder that they have an obligation not to design structures or other artifacts that fail catastrophically. (Legend has it that the iron in the ring comes from the first Quebec Bridge—which collapsed during its construction in 1907—but the legend appears to be untrue.) All engineers, whether wearing the Canadian Iron Ring or not, feel an obligation to develop products that do not fail dangerously. For buildings and other civil engineering works, that usually means designing structures with healthy design margins even for worst-case projected loading. However, many structures encounter worst-case loads infrequently or never. For example, a sports stadium experiences maximum loading for perhaps 20 or 30 days per year, for only a few hours at a time when it fills with sports fans. The rest of the time, the building is empty and the materials used to ensure that the structure can handle those loads are not needed to maintain structural integrity.

 

The total energy consumed by a structure over its lifetime is a combination of the energy needed to mine and fabricate the building materials and to build the structure (embodied energy) and the energy needed to operate the building (operational energy). The resulting energy curve looks something like this:

 

 

 

Embodied versus Operational Energy for a Structure.jpg
 

 

 

For completely passive structures, which describes most structures built over the past several thousand years, embodied energy dominates the total consumed energy because structural members must be designed to bear the full design load at all times. Alternatively, a smart structure with actuators that stiffen the structure only when needed will require more operational energy but the total required embodied energy will be smaller. Looking at the above conceptual graph, a well-designed active-passive system minimizes the total required energy for the structure.
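
Stated as a formula (my notation; Senatore’s work presents this as the conceptual graph above), the design problem is to choose the degree of active control α that minimizes the sum of the two energies:

```latex
E_{\text{total}}(\alpha) = E_{\text{embodied}}(\alpha) + E_{\text{operational}}(\alpha),
\qquad
\alpha^{*} = \arg\min_{\alpha} E_{\text{total}}(\alpha)
```

E_embodied falls as α grows because less material is needed when actuators handle the rare peak loads, while E_operational rises because the actuators consume energy, so the optimum sits between the all-passive and all-active extremes.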

 

Active control has already been used in structure design, most widely for vibration control. During his doctorate work, Gennaro Senatore formulated a new methodology to design adaptive structures. His research project was a collaboration between University College London and Expedition Engineering. As part of that project, Senatore built a large-scale prototype of an active-passive structure at the University College London structures laboratory. The resulting prototype is a 6m cantilever spatial truss with a 37.5:1 span-to-depth ratio. Here’s a photo of the large-scale prototype truss:

 

 

Active-Passive Cantilever Truss.jpg
 

 

 

You can see the actuators just beneath the top surface of the truss. When the actuators are not energized, the cantilever truss flexes quite a lot with a load placed at the extreme end. However, this active system detects the load-induced flexion and compensates by energizing the actuators and stiffening the cantilever.

 

Here’s a photo showing the amount of flex induced by a 100kg load at the end of the cantilever without and with energized actuators:

 

 

 

Cantilever Flexion.jpg 

 

 

The top half of the image shows that the truss flexes 170mm under load when the actuators are not energized, but only 2mm when the system senses the load and energizes the linear actuators.

 

The truss incorporates ten linear electric actuators that stiffen the truss when sensors detect a load-induced deflection. The control system for this active-passive truss consists of a National Instruments (NI) CompactRIO cRIO-9024 controller, 45 strain-gage sensors, 10 actuators, and five driver boards (one for each actuator pair). The NI cRIO-9024 controller pairs with a card cage that accepts I/O modules and incorporates a Virtex-5 FPGA for reconfigurable I/O. (That’s what the “RIO” in cRIO stands for.) In this application, the integral Virtex-5 FPGA also provides in-line processing for acquired and generated signals.

The system is programmed using NI’s LabVIEW systems engineering software.
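
Conceptually, the control loop is simple; here’s a toy proportional sketch of the sense-compute-actuate cycle (names, gains, and the sensor-to-actuator mapping are all hypothetical; the real controller logic is described in the NI case study):

```python
import numpy as np

N_SENSORS, N_ACTUATORS = 45, 10      # matches the counts described above
GAIN = 0.8                           # proportional gain, illustrative only

# In the real system this matrix would come from a calibrated structural model
# mapping strain readings to corrective actuator extensions -- not random data.
rng = np.random.default_rng(1)
INFLUENCE = rng.normal(size=(N_ACTUATORS, N_SENSORS))

def control_step(strains, setpoint=0.0):
    """One sense-compute-actuate cycle: drive measured strain toward setpoint."""
    error = setpoint - strains                 # deviation from the unloaded shape
    command = GAIN * (INFLUENCE @ error)       # actuator extension commands
    return np.clip(command, -1.0, 1.0)         # respect actuator travel limits
```

In the prototype, sensor acquisition and signal conditioning run in the cRIO’s Virtex-5 fabric, which is what keeps the response fast enough to catch flexion as it happens.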

 

A large structure would require many such subsystems, all communicating through a network. This is clearly one very useful way to employ the IIoT in structures.

 

 

This project was a 2017 NI Engineering Impact Award Finalist in the Industrial Machinery and Control category last month at NI Week. It is documented in this NI case study, which includes many more technical details and a short video showing the truss in action as a load is applied.

 

 

 

 

National Instruments (NI) has just announced a baseband version of its 2nd-Generation PXIe VST (Vector Signal Transceiver), the PXIe-5820, with 1GHz of complex I/Q bandwidth. It’s designed to address the most challenging RF front-end module and transceiver test applications. Of course, you program it with NI’s LabVIEW system engineering software like all NI instruments. Like its RF sibling the PXIe-5840, the PXIe-5820 baseband VST is based on a Xilinx Virtex-7 690T FPGA, and a chunk of the FPGA’s programmable logic is available to users for creating real-time, application-specific signal processing using LabVIEW FPGA. According to Ruan Lourens, NI’s Chief Architect of RF R&D, “The baseband VST can be tightly synchronized with the PXIe-5840 RF VST to sub-nanosecond accuracy, to offer a complete solution for RF and baseband differential I/Q testing of wireless chipsets.”

 

 

 

NI PXIe-5820 Baseband VST.jpg 

 

NI’s new PXIe-5820 Baseband VST

 

 

 

 

How might you use this feature? Here’s a very recent, 2-minute video demonstration of a DPD (digital predistortion) measurement application that provides a pretty good example:

 

 

 

 

Latest Embedded Muse newsletter publishes testimonial for Saleae and its FPGA-based Logic Analyzers


 

My very good friend Jack Ganssle, the firmware expert and consultant, publishes a free semi-monthly newsletter called the Embedded Muse for engineers working on embedded systems and the latest issue, #330, contains this testimonial for Saleae and its FPGA-based logic analyzers:

 

“Another reader has nice things to say about Saleae. Dan Smith writes:

 

“I've owned Saleae logic analyzers since 2011, starting with their original (and at that time, only) logic analyzer, aptly named "Logic", which was an 8-channel unit with very good host-side software to go with it.  Never had a problem with the unit or the software.

 

“Fast forward to a Saturday morning in 2017, where I was debugging a strange bus timing problem under a tight project deadline.  I'd woken up early because I couldn't "turn my mind off" from the night before.  I was using a newer, more expensive model with features I needed (analog capture, much higher bandwidth, etc.)  For some reason, the unit wouldn't enumerate when I plugged it in that morning.  I contacted support on a Saturday morning; to my surprise, I had a response later that day from Mark, one of the founders and also the primary architect and developer of the host-side software. After discussing the problem and my situation, they sent me a replacement unit right away, even before the old unit was returned and inspected.  I'd also received excellent support a few weeks earlier getting the older Logic unit working with a strange combination of MacBook Pro, outdated version of OSX and an uncooperative USB port.

 

“My point is simply that even though the Saleae products -- hardware and software -- are excellent, it's their customer service that has earned my loyalty.  Too often, great service and support go unmentioned; in my case, it's what saved me.  And yes, I debugged the problem that weekend and met the deadline!”

 

 

Saleae engineers its compact, USB-powered logic analyzers using FPGAs like the Xilinx Spartan-6 LX16 FPGA used in its Logic Pro 8 logic analyzer/scope. (See “Jack Ganssle reviews the Saleae Logic Pro 8 logic analyzer/scope, based on a Spartan-6 FPGA” and “Compact, 8-channel, 500Msamples/sec logic analyzer relies on Spartan-6 FPGA to provide most functions.”) Although the Spartan-6 FPGA is a low-cost device, its logic and I/O programmability are a great match for the logic analyzer’s I/O and data-capture needs.

 

 

Saleae Logic Analyzer with Spartan-6 FPGA.jpg

 

Saleae Logic Pro 8 logic analyzer/scope

 

 

 

MathWorks has just published a 30-minute video titled “FPGA for DSP applications: Fixed Point Made Easy.” The video targets users of the company’s MATLAB and Simulink software tools and covers:

  • Fixed-point number systems and how these numbers are represented in MATLAB and in FPGAs
  • Quantization and quantization challenges
  • Sources of error and how to minimize them
  • Using MathWorks’ design tools to understand these concepts
  • Implementing fixed-point DSP algorithms on FPGAs with MathWorks’ tools
  • The advantages of the Xilinx DSP48 block, which you’ll find in all Xilinx 28nm series 7, 20nm UltraScale, and 16nm UltraScale+ devices including Zynq SoCs and Zynq UltraScale+ MPSoCs

 

The video also walks through the development of an FIR filter using MathWorks’ fixed-point tools as an example, with some useful utilization feedback that helps you optimize your design. It also briefly shows how you can use MathWorks’ HDL Coder tool to develop efficient, single-precision, floating-point DSP hardware for Xilinx FPGAs.
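
The core idea translates outside of MATLAB, too. Here’s a minimal Python sketch (my toy helpers, not a MathWorks API) of quantizing a value to a signed fixed-point format and watching the quantization error shrink as fractional bits are added:

```python
def to_fixed(x, frac_bits, word_bits=16):
    """Quantize x to a signed fixed-point integer with frac_bits fractional bits."""
    q = round(x * (1 << frac_bits))                  # round to nearest code
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, q))                       # saturate on overflow

def to_float(q, frac_bits):
    return q / (1 << frac_bits)

x = 0.7071067811865476                               # 1/sqrt(2)
for frac_bits in (7, 11, 15):
    q = to_fixed(x, frac_bits)
    err = x - to_float(q, frac_bits)
    print(f"{frac_bits:2d} fractional bits: code {q:6d}, error {err:+.2e}")

# Each extra fractional bit halves the worst-case quantization error -- the
# bits-versus-resources tradeoff the video explores for DSP48-based filters.
```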

 

 

 

 

 

By Adam Taylor

 

We can create very responsive design solutions using Xilinx Zynq SoC or Zynq UltraScale+ MPSoC devices, which enable us to architect systems that exploit the advantages provided by both the PS (processor system) and the PL (programmable logic) in these devices. When we work with logic designs in the PL, we can optimize performance using design techniques like pipelining and other UltraFast design methods, and we can see the results of our optimization techniques in simulation and in the Vivado implementation results.

 

When it comes to optimizing the software running on the PS, things may appear a little more opaque. However, things are not what they might appear. We can gather statistics on our executing code with ease using the performance-analysis capabilities built into XSDK. Using performance analysis, we can examine the performance of the software we have running on the PS’s processor cores and we can monitor AXI performance within the PL to ensure that the software design is optimized for the application at hand.

 

Using performance analysis, we can examine several aspects of our running code:

 

  • CPU Utilization – Percentage of non-idling CPU clock cycles
  • CPU Instructions Per Cycle – Estimated number of executed instructions per cycle
  • L1 Cache Data Miss Rate % – L1 data-cache miss rate
  • L1 Cache Access Per msec – Number of L1 data-cache accesses
  • CPU Write Instructions Stall per cycle – Estimated number of stall cycles per instruction
  • CPU Read Instructions Stall per cycle – Estimated number of stall cycles per instruction

 

For those who may not be familiar with the concept, a stall occurs when the cache does not contain the requested data, which must then be fetched from main memory. While the data is fetched, the core can continue to process other instructions using out-of-order (OOO) execution; however, the processor will eventually run out of independent instructions and will have to wait for the information it needs. This is called a stall.
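
You can feel what those counters measure even from Python: the same arithmetic gets dramatically slower when the access pattern defeats the data cache. A rough NumPy illustration (timings will vary by machine, and a Zynq CPU’s caches are much smaller than a PC’s):

```python
import time
import numpy as np

a = np.arange(1 << 24, dtype=np.float64)   # 128MB: far larger than any cache
M = 1 << 18                                # touch the same element count each run

for stride in (1, 8, 64):
    view = a[: M * stride : stride]        # same work, increasingly scattered reads
    t0 = time.perf_counter()
    view.sum()
    print(f"stride {stride:2d}: {(time.perf_counter() - t0) * 1e3:6.1f} ms")

# stride 1 streams whole 64-byte cache lines; stride 8 uses one value per line;
# stride 64 forces roughly one cache miss (and its stall) per element.
```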

 

We can gather these stall statistics thanks to the Performance Monitor Unit (PMU) contained within each of the Zynq UltraScale+ MPSoC’s CPUs. The PMU provides six profile counters, which are configured and post-processed by XSDK to generate the statistics above.

 

If we want to use the performance monitor within SDK, we need to work with a debug build and then open the Performance Monitor Perspective within XSDK. If we have not done so before, we can open the perspective as shown below:

 

 

Image1.jpg 

 

 

Image2.jpg

 

 

Opening the Performance Analysis Perspective

 

 

With the performance analysis perspective open, we can debug the application as normal. However, before we click on the run icon (the debugger should be set to stop at main, as is the default), we need to start the performance monitor. To do that, right-click on the “System Debugger on Local” symbol within the performance monitor window and click start.

 

 

Image3.jpg

 

Starting the Performance Analysis

 

 

 

Then, once we execute the program, the statistics will be gathered and we can analyze them within XSDK to determine the best optimizations for our code.

 

To demonstrate how we can use this technique to deliver a more optimized system, I have created a design that runs on the ZedBoard and performs AES256 encryption on 1024 packets of information. When this code was run on the ZedBoard, the following execution statistics were collected:

 

 

Image4.jpg

 

Performance Graphs

 

 

 

Image5.jpg

 

Performance Counters

 

 

 

So far, these performance statistics only look at code executing on the PS itself. Next time, we will look at how we can use the AXI Performance Monitor with XSDK. If we wish to do this, we need to first instrument the design in Vivado.

 

 

 

 

Code is available on GitHub as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E-Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E-Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

Linc, Perrone Robotics’ autonomous Lincoln MKZ automobile, took a drive around the Perrone paddock at the TU Automotive autonomous vehicle show in Detroit last week and Dan Isaacs, Xilinx’s Director of Connected Systems in Corporate Marketing, was there to shoot photos and video. Perrone’s Linc test vehicle operates autonomously using the company’s MAX (Mobile Autonomous X), a “comprehensive full-stack, modular, real-time capable, customizable, robotics software platform for autonomous (self-driving) vehicles and general purpose robotics.” MAX runs on multiple computing platforms including one based on an Iveia controller, which is built around an Iveia Atlas SOM, which in turn is based on a Xilinx Zynq UltraScale+ MPSoC. The Zynq UltraScale+ MPSoC handles the avalanche of data streaming from the vehicle’s many sensors to ensure that the car travels the appropriate path and avoids hitting things like people, walls and fences, and other vehicles. That’s all pretty important when the car is driving itself in public. (For more information about Perrone Robotics’ MAX, see “Perrone Robotics builds [Self-Driving] Hot Rod Lincoln with its MAX platform, on a Zynq UltraScale+ MPSoC.”)

 

Here’s a photo of Perrone’s sensored-up Linc autonomous automobile in the Perrone Robotics paddock at TU Automotive in Detroit:

 

 

Perrone Robotics Linc Autonomous Driving Lincoln MKZ.jpg 

 

 

And here’s a photo of the Iveia control box with the Zynq UltraScale+ MPSoC inside, running Perrone’s MAX autonomous-driving software platform. (Note the controller’s small size and lack of a cooling fan):

 

 

Iveia Autonomous Driving Controller for Perrone Robotics.jpg 

 

 

Opinions about the feasibility of autonomous vehicles are one thing. Seeing the Lincoln MKZ’s 3800 pounds of glass, steel, rubber, and plastic being controlled entirely by a little silver box in the trunk, that’s something entirely different. So here’s the video that shows Perrone Robotics’ Linc in action, driving around the relative safety of the paddock while avoiding the fences, pedestrians, and other vehicles:

 

 

 

If you’re designing next-generation avionics systems, you may be facing some challenges:

 

  • Developing scalable, reconfigurable, common-compute platforms for flexible deployment
  • Managing tradeoffs in size, weight, power, and cost and between component- and board-level development
  • Meeting safety requirements including those related to radiation effects
  • Meeting stringent and evolving certification requirements for DO-254, DO-178, and guidance related to multi-core use
  • Maximizing hardware and software reuse and dealing with associated certification artifacts while ensuring an appropriate level of design integrity and system safety

 

Do these sound like your challenges? Want some help? Check out this June 20 Webinar.

 

 

When someone asks where Xilinx All Programmable devices are used, I find it a hard question to answer because there’s such a very wide range of applications—as demonstrated by the thousands of Xcell Daily blog posts I’ve written over the past several years.

 

Now, there’s a 5-minute “Powered by Xilinx” video with clips from several companies using Xilinx devices for applications including:

 

  • Machine learning for manufacturing
  • Cloud acceleration
  • Autonomous cars, drones, and robots
  • Real-time 4K, UHD, and 8K video and image processing
  • VR and AR
  • High-speed networking by RF, LED-based free-air optics, and fiber
  • Cybersecurity for IIoT

 

That’s a huge range covered in just five minutes.

 

Here’s the video:

 

 

 

 

 

Signal Integrity Journal just published a new article titled “Addressing the 5G Challenge with Highly Integrated RFSoC,” written by four Xilinx authors. The article discusses some potential uses for Xilinx RFSoC technology, announced in February. (See “Xilinx announces RFSoC with 4Gsamples/sec ADCs and 6.4Gsamples/sec DACs for 5G, other apps. When we say “All Programmable,” we mean it!”)

 

Cutting to the chase of this 2600-word article, the Xilinx RFSoC is going to save you a ton of power and make it easier for you to achieve your performance goals for 5G and many other advanced, mixed-signal system designs.

 

If you’re involved in the design of a system like that, you really should read the article.

 

 

 

Light Reading’s International Group Editor Ray Le Maistre recently interviewed David Levi, CEO of Ethernity Networks, who discusses the company’s FPGA-based All Programmable ACE-NIC, a Network Interface Controller with 40Gbps throughput. The carrier-grade ACE-NIC accelerates vEPC (virtual Evolved Packet Core, a framework for virtualizing the functions required to converge voice and data on 4G LTE networks) and vCPE (virtual Customer Premise Equipment, a way to deliver routing, firewall security and virtual private network connectivity services using software rather than dedicated hardware) applications by 50x, dramatically reducing end-to-end latency associated with NFV platforms. Ethernity’s ACE-NIC is based on a Xilinx Kintex-7 FPGA.

 

“The world is crazy about our solution—it’s amazing,” says Levi in the Light Reading video interview.

 

 

Ethernity Networks ACE-NIC.jpg

 

Ethernity Networks All Programmable ACE-NIC

 

 

Because Ethernity implements its NIC IP in a Kintex-7 FPGA, it was natural for Le Maistre to ask Levi when his company would migrate to an ASIC. Levi’s answer surprised him:

 

“We offer a game changer... We invested in technology—which is covered by patents—that consumes 80% less logic than competitors. So essentially, a solution that you may want to deliver without our patents will cost five times more on FPGA… With this kind of solution, we succeed over the years in competing with off-the-shelf components… with the all-programmable NIC, operators enjoy the full programmability and flexibility at an affordable price, which is comparable to a rigid, non-programmable ASIC solution.”

 

In other words, Ethernity plans to stay with All Programmable devices for its products. In fact, Ethernity Networks announced last year that it had successfully synthesized its carrier-grade switch/router IP for the Xilinx Zynq UltraScale+ MPSoC and that the throughput performance increases to 60Gbps per IP core with the 16nm device—and 120Gbps with two instances of that core. “We are going to use this solution for novel SDN/NFV market products, including embedded SR-IOV (single-root input/output virtualization), and for high density port solutions,” said Levi.

 

Towards the end of the video interview, Levi looks even further into the future when he discusses Amazon Web Services’ (AWS’) recent support of FPGA acceleration. (That’s the Amazon EC2 F1 compute instance based on Xilinx Virtex UltraScale+ FPGAs rolled out earlier this year.) Because it’s already based on Xilinx All Programmable devices, Ethernity’s networking IP runs on the Amazon EC2 F1 instance. “It’s an amazing opportunity for the company [Ethernity],” said Levi. (Try doing that in an ASIC.)

 

Here’s the Light Reading video interview:

 

 

 

 

 

 

With LED automotive lighting now becoming commonplace, newer automobiles have the ability to communicate with each other (V2V communications) and with roadside infrastructure by quickly flashing their lights (LiFi) instead of using radio protocols. Researchers at OKATEM—the Centre of Excellence in Optical Wireless Communication Technologies at Ozyegin University in Turkey—have developed an OFDM-based LiFi demonstrator for V2V (vehicle-to-vehicle) and V2I (vehicle-to-infrastructure) applications that has achieved 50Mbps communications between vehicles as far apart as 70m in a lab atmospheric emulator.

 

 

 

Inside the OKATEM LiFi Atmospheric Chamber.jpg 

 

Inside the OKATEM LiFi Atmospheric Emulator

 

 

The demo system is based on PXIe equipment from National Instruments (NI) including FlexRIO FPGA modules. (NI’s PXIe FlexRIO modules are based on Xilinx Virtex-5 and Virtex-7 FPGAs.) The FlexRIO modules implement the LiFi OFDM protocols including channel coding, 4-QAM modulation, and an N-IFFT. Here’s a diagram of the setup:

 

 

LiFi V2V Communications Block Diagram.jpg 

 

 

 

Researchers developed the LiFi system using NI’s LabVIEW systems engineering software and LabVIEW FPGA. Initial LiFi system performance demonstrated a data rate of 50Mbps with as much as 70m between two cars, depending on the photodetectors’ location in the car (particularly its height above ground level). Further work will try to improve the total system performance by integrating advanced capabilities such as multiple-input, multiple-output (MIMO) communication and link adaptation on top of the OFDM architecture.
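
For reference, the heart of such an OFDM transmitter (the 4-QAM mapping and N-point IFFT that the FlexRIO FPGAs implement) fits in a few lines. A NumPy sketch with illustrative parameters, not OKATEM’s actual configuration:

```python
import numpy as np

N, CP = 256, 16                           # IFFT size and cyclic-prefix length (illustrative)
rng = np.random.default_rng(0)

bits = rng.integers(0, 2, 2 * N)          # 2 bits per 4-QAM symbol
# 4-QAM (QPSK) mapping: each bit pair picks one of four constellation points
syms = ((2 * bits[0::2] - 1) + 1j * (2 * bits[1::2] - 1)) / np.sqrt(2)

ofdm = np.fft.ifft(syms)                  # N-IFFT: subcarriers to time domain
tx = np.concatenate([ofdm[-CP:], ofdm])   # prepend the cyclic prefix

# In an intensity-modulated LiFi link the samples must also be made real and
# non-negative (e.g., Hermitian-symmetric subcarrier mapping plus a DC bias)
# before they can drive the LED.
```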

 

 

 

This project was a 2017 NI Engineering Impact Award Winner in the RF and Mobile Communications category last month at NI Week. It is documented in this NI case study.

 

When discussed in Xcell Daily two years ago, Exablaze’s 48-port ExaLINK Fusion Ultra Low Latency Switch and Application Platform with the company’s FastMUX option was performing fast Ethernet port aggregation on as many as 15 Ethernet ports with blazingly fast 100nsec latency. (See “World’s fastest Layer 2 Ethernet switch achieves 110nsec switching using 20nm Xilinx UltraScale FPGAs.”) With its new FastMUX upgrade, also available free to existing customers with a current support contract as a field-installable firmware upgrade, Exablaze has now cut that number in half, to an industry-leading 49nsec (actually, between 48.79nsec and 58.79nsec). The FastMUX option aggregates 15 server connections into a single upstream port. All 48 ExaLINK Fusion ports including the FastMux ports are cross-point enabled so that they can support layer 1 features such as tapping for logging, patching for failover, and packet counters and signal quality statistics for monitoring.

 

 

 

Exablaze ExaLINK Fusion Switch.jpg 

 

 

 

The ExaLINK Fusion platform is based on a Xilinx 20nm UltraScale FPGA, which gave Exablaze the ability to initially create the fast switching and aggregation hardware and massive 48-port connectivity, and then to improve the product’s design by taking advantage of the FPGA’s reprogrammability, which requires only a firmware upgrade that can be performed in the field.

 

 

 

 

 

A wide range of commercial, government, and social applications require precise aerial imaging. These applications range from the management of high-profile, international-scale humanitarian and disaster-relief programs to everyday commercial use—siting large photovoltaic arrays, for example. Satellites can capture geospatial imagery across entire continents, often at the expense of spatial resolution. Satellites also lack the flexibility to image specific areas on demand; you must wait until the satellite is above the real estate of interest. Spookfish Limited in Australia, along with ICON Technologies, has developed the Spookfish Airborne Imaging Platform (SAIP) based on COTS (commercial off-the-shelf) products including National Instruments’ (NI’s) PXIe modules and LabVIEW systems engineering software. The SAIP can capture precise images with resolutions of 6cm/pixel to better than 1cm/pixel from a light aircraft cruising at 160 knots at altitudes up to 12,000 feet.

 

The 1st-generation SAIP employs one or more cameras installed in a tube attached to the belly of a light aircraft. Success with the initial prototype led to the development of a 2nd-generation design with two camera tubes. The system has continued to grow and now accommodates as many as three camera tubes with as many as four cameras per tube.

 

The multiple cameras must be steered precisely in continuous, synchronized motion while recording camera angles, platform orientation, and platform acceleration. All of this data is used to post-process the image data. At typical operating altitudes and speeds, the cameras must be steered with millidegree precision and the camera angles and platform position must be logged with near-microsecond accuracy and precision. Spookfish then uses a suite of open-source and proprietary computer-vision and photogrammetry techniques to process the imagery, which results in orthophotos, elevation data, and 3D models.
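
Some quick arithmetic, using only the figures quoted above, shows why those specs are what they are:

```python
import math

altitude_m = 12_000 * 0.3048          # 12,000 feet
speed_mps = 160 * 0.514444            # 160 knots

# Ground displacement of the aim point per millidegree of pointing error:
per_mdeg = altitude_m * math.tan(math.radians(0.001))
print(f"1 millidegree at 12,000 ft ~ {per_mdeg * 100:.1f} cm on the ground")      # ~6.4 cm

# Aircraft ground travel per microsecond of timestamp error:
print(f"1 microsecond at 160 knots ~ {speed_mps * 1e-6 * 1e3:.2f} mm of travel")  # ~0.08 mm
```

One millidegree of pointing error already equals a full 6cm pixel, so the steering and logging budgets above are just the imaging spec restated in angle and time.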

 

Here’s a block diagram of the Spookfish SAIP:

 

 

Spookfish SAIP Block diagram.jpg 

 

 

 

The NI PXIe system in the SAIP design consists of a PXIe-1082DC chassis, a PXIe-8135 RT controller, a PXI-6683H GPS/PPS synchronization module, a PXIe-6674T clock and timing module, a PXIe-7971R FlexRIO FPGA Module, and a PXIe-4464 sound and vibration module. (The PXIe-7971R FlexRIO module is based on a Xilinx Kintex-7 325T FPGA. The PXI-6683H synchronization module and the PXIe-6674T clock and timing module are both based on Xilinx Virtex-5 FPGAs.)

 

Here’s an aerial image captured by an SAIP system at 6cm/pixel:

 

 

Spookfish SAIP image at 6cm per pixel.jpg 

 

 

And here’s a piece of an aerial image taken by an SAIP system at 1.5cm/pixel:

 

 

Spookfish SAIP image at 6cm per pixel.jpg 

 

 

 

During its multi-generation development, the SAIP system quickly evolved far beyond its originally envisioned performance specification as new requirements arose. For example, initial expectations were that logged data would only need to be tagged with millisecond accuracy. However, as the project progressed, ICON Technologies and NI improved the system’s timing accuracy and precision by three orders of magnitude.

 

NI’s FPGA-based FlexRIO technology was also crucial in meeting some of these shifting performance targets. Changing requirements pushed the limits of some of the COTS interfaces, so custom FlexRIO interface implementations optimized for the tasks were developed as higher-speed replacements. Often, NI’s FlexRIO technology is employed for the high-speed computation available in the FPGA’s DSP slices, but in this case it was the high-speed programmable I/O that was needed.

 

Spookfish and ICON Technologies are now developing the next-generation SAIP system. Now that the requirements are well understood, they’re considering a Xilinx FPGA-based or Zynq-based NI CompactRIO controller as a replacement for the PXIe system. NI’s addition of TSN (time-sensitive networking) to the CompactRIO family’s repertoire makes such a switch possible. (For more information about NI’s TSN capabilities, see “IOT and TSN: Baby you can drive my [slot] car. TSN Ethernet network drives slot cars through obstacles at NI Week.”)

 

 

 

This project was a 2017 NI Engineering Impact Award finalist in the Energy category last month at NI Week. It is documented in this NI case study.

 

Adam Taylor has just published a blog on the EEWeb.com site titled “The Benefits of HW/SW Co-Simulation for Zynq-Based Designs,” where he discusses the use of hardware/software co-simulation to verify the hardware in designs based on the Xilinx Zynq SoC. The blog continues by discussing Aldec’s Riviera-PRO advanced verification platform, which combines a high-performance simulation engine, advanced debugging capabilities at different abstraction levels, and support for the latest language and verification library standards. Taylor then covers the bridge between Riviera-PRO and Xilinx’s QEMU emulator.

 

It’s not a long blog, so perhaps after you read it you’ll want more. Well, more is available. (Adam might say “More is on offer.”) Taylor is conducting a Webinar for Aldec on June 29 titled “Addressing the Challenges of SoC Verification in practice using Co-Simulation.” During the Webinar, Taylor will discuss the challenges you’ll face when working with the Zynq SoC; he’ll introduce the concept of co-simulation, discuss its constituent parts, and demonstrate advanced debugging techniques based on co-simulation. Then he’ll examine the required environment and pre-requisites needed for co-simulation. All that in just an hour!

 

 

Register here.

 

 

There’s only one problem with the deuterium gas plasma inside of a fusion reactor: it’s hot, really hot! It must be 20 million ˚C (36 million ˚F) hot—hotter than the sun’s surface—if you want to achieve fusion. If this hot plasma touches the relatively cold sides of the reaction vessel, the plasma vanishes. So, you need to confine the plasma tightly in a magnetic field so that it doesn’t escape. You need to do that for long time periods if you want a fusion reaction that reliably produces power, which is, after all, the objective. How long is a long time?

 

Many minutes.

 

Researchers working on the Large Helical Device (LHD), a superconducting stellarator (a form of plasma fusion reactor) project initiated by Japan’s National Institute for Fusion Science to conduct fusion-plasma confinement research in a steady-state machine, have developed an advanced control system based on a National Instruments (NI) CompactRIO embedded controller programmed in NI’s LabVIEW and LabVIEW FPGA to keep the plasma confined and hot inside of the reactor.

 

 

LHD.jpg 

 

Interior of the LHD

 

 

Stabilizing the plasma inside of the LHD requires real-time control of high-energy heating, magnetic fields generated by superconducting electromagnets, and deuterium gas injection based on observed information such as plasma density, temperature, and optical emission. The heating is supplied by 30kV power lines, so control-system mistakes can have catastrophic consequences. In the past, LHD experiments required two or three operators for complex monitoring and response.

 

Here’s a “simplified” diagram of the LHD’s plasma control system:

 

 

LHD Control Diagram.jpg 

 

 

 

With the NI CompactRIO controller, bolstered by the high performance of its internal Xilinx FPGA (all of NI’s CompactRIO controllers are based on Xilinx FPGAs or the Zynq SoC), the LHD control system sustained a high-performance plasma for more than 48 minutes with a total injected energy of 3.4GJ (that’s gigajoules). The 48-minute duration for the sustained plasma sets a record, besting by more than 3x the previous record set more than a decade earlier.

 

On March 7, 2017, LHD ignited its first deuterium plasma.

 

This amazing project was a 2017 NI Engineering Impact Award finalist in the Energy category last month at NI Week and won the 2017 Engineering Grand Challenges Award. It is documented in this NI case study.

 

 

“Cancer” is one of the scariest words in the English language and 1.6 million people in the US alone will be diagnosed with some form of cancer just this year. About 320,000 of those diagnosed cases will be eligible for proton therapy but there are currently only 24 proton treatment centers in the US. The math says that about 5% of the eligible patients will receive proton therapy and the rest will be treated another way. That’s not a great solution and this graph illustrates why:

 

 

 

Proton therapy Energy chart.jpg 

 

 

Protons can deliver more energy to the tumor and less energy to surrounding healthy tissue.

 

One reason that there are so few proton-treatment centers in the US is that you need a synchrotron or cyclotron to create a sufficiently strong beam of high-energy protons. ProNova is developing a lower-cost, smaller, lighter, and more energy-efficient proton-therapy system called the ProNova SC360 that will make proton therapy more available to cancer patients. It’s doing that by developing a cyclotron using superconducting magnets, a multiplexed delivery system that can deliver the resulting proton beam to as many as five treatment rooms, and a treatment gantry that can securely hold and position the patient while precisely delivering a 4mm-to-8mm proton beam to the tumor with 1mm positioning accuracy.

 

 

ProNova 360 Proton treatment System.jpg 

 

ProNova SC360 Proton Therapy System

 

 

 

It takes a lot of real-time control to do all of this, so ProNova developed the DDS (Dose Delivery System) for the SC360 using three National Instruments (NI) sbRIO-9626 embedded controllers, which incorporate a Xilinx Spartan-6 LX45 FPGA for real-time control. The three controllers implement four specific tasks:

 

  • Control the proton beam intensity
  • Position the proton beam using scanning magnets
  • Monitor delivered dosage to within 1% of the prescribed dose
  • Monitor all aspects of the proton beam and shut off the beam in the case of a fault

 

A treatment plan contains a set of locations, or spots, in 3D space (horizontal-X, vertical-Y, depth-Z) that each receive a prescribed radiological dose. The system delivers the high-energy protons to the tumor by scanning the variable-intensity beam back and forth through the tumor volume, as shown below:

 

 

ProNova Scanning Diagram.jpg 

 

 

 

In addition, these subsystems are responsible for safely removing the beam from the treatment room during spot transitions and enforcing safety interlocks. Hard-wired control signals pass between the Spartan-6 FPGAs on each of the sbRIO controllers to signal spot completion, spot advancement, and treatment faults. Each of these three sbRIO applications is programmed using NI’s LabVIEW systems engineering software.
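
Here’s a schematic sketch of the spot-delivery loop those subsystems implement (all names are hypothetical, and the real DDS splits this logic across three FPGA-based controllers with hard-wired interlocks rather than running it as one Python loop):

```python
from dataclasses import dataclass

@dataclass
class Spot:
    x_mm: float     # horizontal scanning-magnet target
    y_mm: float     # vertical scanning-magnet target
    layer: int      # depth layer, selected by beam energy
    dose: float     # prescribed dose for this spot

TOLERANCE = 0.01    # delivered dose must stay within 1% of prescription

def deliver(plan, magnets, beam, dosimeter):
    """Hypothetical spot-scanning loop: position, irradiate, verify, advance."""
    for spot in plan:
        beam.park()                                 # beam out of the room during transition
        magnets.move_to(spot.x_mm, spot.y_mm)       # <800 usec spot transition
        beam.set_energy_for_layer(spot.layer)
        beam.on()
        while dosimeter.delivered() < spot.dose:    # ~5 msec typical spot duration
            if beam.fault():
                beam.shut_off()                     # the hard interlock path
                raise RuntimeError("treatment fault: beam shut off")
        beam.off()
        error = abs(dosimeter.delivered() - spot.dose) / spot.dose
        assert error <= TOLERANCE, "dose outside 1% tolerance"
        dosimeter.reset()
```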

 

Here’s a block diagram of the ProNova DDS beam-control and -positioning system:

 

 

 

ProNova Beam Control Block Diagram.jpg 

 

 

Here’s an example of the kinds of signals generated by this control system:

 

 

 

ProNova Waveforms.jpg

 

 

Proton-beam spot durations are on the order of 5msec and spot transitions take less than 800μsec.

 

ProNova received FDA approval for the SC360 earlier this year and plans to start treating the first patients later this year at the Provision Center for Proton Therapy in Knoxville, Tennessee.

 

Here’s a 4-minute video explaining the system in detail. It starts sort of over the top, but quickly settles down to the facts:

 

 

 


This life-changing project won a 2017 NI Engineering Impact Award in the Industrial Machinery and Control category last month at NI Week and the 2017 Humanitarian Award. It is documented in this NI case study.

 

Perhaps you think DPDK (Data Plane Development Kit) is a high-speed data-movement standard that’s strictly for networking applications. Perhaps you think DPDK is an Intel-specific specification. Perhaps you think DPDK is restricted to the world of host CPUs and ASICs. Perhaps you’ve never heard of DPDK—given its history, that’s certainly possible. If any of those statements describes you, keep reading this post.

 

Originally, DPDK was a set of data-plane libraries and NIC (network interface controller) drivers developed by Intel for fast packet processing on Intel x86 microprocessors. That is the DPDK origin story. Last April, DPDK became a Linux Foundation Project. It lives at DPDK.org and is now processor agnostic.

 

DPDK consists of several main libraries that you can use to:

 

  • Send and receive packets while minimizing the number of CPU cycles needed (usually less than 80)
  • Develop fast packet-capture algorithms
  • Run 3rd-party fast-path stacks

 

So far, DPDK certainly sounds like a networking-specific development kit but, as Atomic Rules’ CTO Shep Siegel says, “If you can make your data-movement problem look like a packet-movement problem,” then DPDK might be a helpful shortcut in your development process.

 

Siegel knows more than a bit about DPDK because his company has just released Arkville, a DPDK-aware FPGA/GPP data-mover IP block and DPDK PMD (Poll Mode Driver) that allow Linux DPDK applications to offload server cycles to FPGA gates in tandem with the Linux Foundation’s 17.05 release of the open-source DPDK libraries. Atomic Rules’ Arkville release is compatible with Xilinx Vivado 2017.1 (the latest version of the Vivado Design Suite), which was released in April. Currently, Atomic Rules provides two sample designs:

 

 

  • Four-Port, Four-Queue 10 GbE example (Arkville + 4×10 GbE MAC)
  • Single-Port, Single-Queue 100 GbE example (Arkville + 1×100 GbE MAC)

 

(Atomic Rules’ example designs for Arkville were compiled with Vivado 2017.1 as well.)

 

 

These examples are data movers; Arkville is a packet conduit. This conduit presents a DPDK interface on the CPU side and AXI interfaces on the FPGA side. There’s a convenient spot in the Arkville conduit where you can add your own hardware for processing those packets. That’s where the CPU offloading magic happens.

 

Atomic Rules’ Arkville IP works well with all Xilinx UltraScale devices but it works especially well with Xilinx UltraScale+ All Programmable devices that provide two integrated PCIe Gen3 x16 controllers. (That includes devices in the Kintex UltraScale+ and Virtex UltraScale+ FPGA families and the Zynq UltraScale+ MPSoC device families.)

 

Why?

 

Because, as BittWare’s VP of Network Products Craig Lund says, “100G Ethernet is hard. It’s not clear that you can use PCIe to get [that bit rate] into a server [using one PCIe Gen3 x16 interface]. From the PCIe specs, it looks like it should be easy, but it isn’t.” If you are handling minimum-size packets, says Lund, there are lots of them—more than 14 million per second. If you’re handling big packets, then you need a lot of bandwidth. Either use case presents a throughput challenge to a single PCIe Root Complex. In practice, you really need two.
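
The back-of-envelope arithmetic behind Lund’s point (my numbers, not BittWare’s):

```python
# Minimum-size Ethernet frame on the wire: 64B frame + 20B preamble/IPG = 84B
pps = 100e9 / (84 * 8)
print(f"100GbE minimum-size packet rate: {pps / 1e6:.1f} Mpps")       # ~148.8 Mpps

# PCIe Gen3: 8 GT/s per lane with 128b/130b encoding, 16 lanes per port
raw_bps = 16 * 8e9 * (128 / 130)
print(f"PCIe Gen3 x16 raw: {raw_bps / 1e9:.0f} Gbps")                 # ~126 Gbps

# TLP headers and flow control typically eat another ~20%, leaving roughly
# 100 Gbps usable: no headroom for line-rate 100GbE plus its descriptors,
# which is why a second x16 port and a second root complex make sense.
```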

 

BittWare has implemented products using the Atomic Rules Arkville IP, based on its XUPP3R PCIe card, which incorporates a Xilinx Virtex UltraScale+ VU13P FPGA. One distinctive feature of this BittWare board is its two PCIe Gen3 x16 ports: one on an edge connector and the other on an optional serial expansion port. The second port can be connected to a second PCIe slot for added bandwidth.

 

However, even that’s not enough says Lund. You don’t just need two PCIe Gen3 x16 slots; you need two PCIe Gen2 Root Complexes and that means you need a 2-socket motherboard with two physical CPUs to handle the traffic. Here’s a simplified block diagram that illustrates Lund’s point:

 

 

BittWare XUPP3R PCIe Card with two processors.jpg 

 

 

BittWare’s XUPP3R PCIe Card has two PCIe Gen3 x16 ports: one on an edge connector and the other on an optional serial expansion port for added bandwidth

 

 

 

BittWare has used its XUPP3R PCIe card and the Arkville IP to develop two additional products.

 

 

 

Note: For more information about Atomic Rules’ IP and BittWare’s XUPP3R PCIe card, see “BittWare’s UltraScale+ XUPP3R board and Atomic Rules IP run Intel’s DPDK over PCIe Gen3 x16 @ 150Gbps.”

 

 

Arkville is a product offered by Atomic Rules. The XUPP3R PCIe card is a product offered by BittWare. Please contact these vendors directly for more information about these products.

 

 

 

 

 

Ball Aerospace Methane Monitor Aircraft.jpg 

According to pipeline101.org, there are 2.1 million miles of natural gas distribution pipelines in the US alone. With that much pipe, you can bet that there are leaks. In fact, a massive and widely reported leak at the Aliso Canyon underground gas storage facility near Los Angeles released an estimated 97,100 tonnes of methane and 7,300 tonnes of ethane into the atmosphere in 2015. Although PHMSA (the Pipeline and Hazardous Materials Safety Administration) programs have reduced serious pipeline incidents by 39% since 2009, more than 250 serious leaks have occurred in the past eight years.

 

Manual inspection of millions of miles of pipeline is problematic and expensive, so a fast way to conduct such inspections from low-cost aircraft could cut costs significantly, increase inspection coverage, and make inspections more timely. Drawing on 50 years of remote-sensing expertise, Ball Aerospace in Boulder, CO has developed just such a method using a pair of airborne lasers with fast processing supplied by an FPGA-based FlexRIO PXIe module from National Instruments (NI).

 

Natural gas consists primarily of methane (CH4), and methane absorbs light between 1.6455µm and 1.6456µm while absorbing essentially none at 1.6454µm. Firing laser pulses into the air at an absorbed and an unabsorbed wavelength and measuring the reflected energy at each wavelength produces a differential absorption lidar (DIAL) measurement. In other words, the result is a remote-sensing tool for long-range detection of airborne methane. Ball engineers used these methane characteristics to develop Methane Monitor, which can measure methane concentrations in the air. Better yet, this system can visualize methane plumes, positively identifying and pinpointing leaks.
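Here’s a sketch of the arithmetic behind that DIAL measurement: the ratio of the “off-line” return to the “on-line” return, together with the differential absorption cross-section and the range, yields a path-averaged methane density. All of the numbers below are placeholders for illustration, not values from the Ball system.

```c
/* Illustrative DIAL arithmetic: two returns, one at the "on" wavelength
 * that methane absorbs and one at the "off" wavelength that it does not.
 * Every value here is a placeholder, not a calibrated figure. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double p_on   = 0.82;   /* received power, on-line  (arbitrary units) */
    double p_off  = 1.00;   /* received power, off-line (arbitrary units) */
    double range  = 500.0;  /* one-way range to the target, m             */
    double dsigma = 1e-24;  /* differential absorption cross-section, m^2
                               (placeholder value)                        */

    /* Path-averaged molecular density from the DIAL ratio:
       n = ln(P_off / P_on) / (2 * dsigma * R) */
    double n = log(p_off / p_on) / (2.0 * dsigma * range);

    printf("path-averaged CH4 density: %.3e molecules/m^3\n", n);
    return 0;
}
```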

 

Proper location and tracking of targets, attitude correction, and geo-location of methane measurements require some pretty tight synchronization among all of the Methane Monitor’s components. Ball used an NI PXIe chassis as the measurement platform and populated it with FlexRIO digitizer and FPGA modules.

 

 

 

 

The Methane Monitor uses three or four transducers with the NI 5761 FlexRIO digitizer adapter module, and the resulting throughput (between 11.2Gbps and 15.2Gbps) dictates FPGA-based processing. After calibrating each ADC for its transducer chain, the FPGA performs variable offset correction and serializes the high-throughput data between laser firings. The FlexRIO module’s Virtex-5 FPGA captures the DIAL signals, measures range, and calculates methane concentration at 1000 to 10,000 measurements per second. Naturally, all of this is programmed using NI’s LabVIEW system engineering software.
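For a rough feel for the offset-correction step, here is the per-sample arithmetic in C. On the real system this runs in the FlexRIO module’s FPGA fabric (programmed from LabVIEW FPGA); the channel count and calibration offsets below are invented for the example.

```c
/* Illustrative per-channel offset correction of the kind the FlexRIO
 * FPGA applies before serializing samples. Channel count and offsets
 * are invented for the example. */
#include <stdint.h>
#include <stdio.h>

#define CHANNELS 4
#define SAMPLES  8

int main(void)
{
    /* Calibration offsets, one per ADC/transducer chain (made up) */
    const int16_t offset[CHANNELS] = { 37, -12, 5, 21 };

    /* A few raw samples per channel (zeroed here for brevity) */
    int16_t raw[CHANNELS][SAMPLES] = {{ 0 }};
    int16_t corrected[CHANNELS][SAMPLES];

    /* Subtract each channel's calibrated offset from every sample */
    for (int ch = 0; ch < CHANNELS; ch++)
        for (int s = 0; s < SAMPLES; s++)
            corrected[ch][s] = (int16_t)(raw[ch][s] - offset[ch]);

    printf("corrected[0][0] = %d\n", corrected[0][0]);
    return 0;
}
```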

 

To date, this system has logged more than 100 hours of flight time and can detect methane flow rates as low as 50 standard cubic feet/hour with sensing swaths as wide as 200m. The system produces strikingly vivid results like this:

 

 

 

Ball Aerospace Methane Monitor.jpg 

 

Real-world methane plumes discovered by Methane Monitor. On the left is real-time data overlaid on Google Maps. On the right is postprocessed data overlaid on Google Maps. The straight green lines are overlays of buried oil and gas infrastructure. The legend ranges from 0 ppm-m (blue) to 1,000 ppm-m (red) CH4 above background. For reference, the current background level of methane globally is approximately 1.9 ppm.

 

 

 

Ball Aerospace plans to double the operating altitude of this system to 3000 feet, double the spatial resolution, and quintuple the width of the survey swath by upgrading the digitizer to an NI PXIe-5172 Reconfigurable PXI Oscilloscope, which is based on a Xilinx Kintex-7 FPGA.

 

This impressive project won a 2017 NI Engineering Impact Award in the Energy category last month at NI Week and is documented in this NI case study.

 

 

 

By Adam Taylor

 

So far, our examination of the Zynq UltraScale+ MPSoC has focused mainly upon the PS (processing system) side of the device. However, to fully utilize the device’s capabilities, we need to examine the PL (programmable logic) side as well. So in this blog, we will look at the different AXI interfaces between the PS and the PL.

 

 

Image1.jpg

 

Zynq MPSoC Interconnect Structure

 

 

 

These different AXI interfaces provide a mixture of master and slave ports from the PS perspective, and they can be coherent or non-coherent. The PS is the master for the following interfaces:

 

  1. FPD High Performance Master (HPM) – Two interfaces within the Full Power Domain.
  2. LPD High Performance Master (HPM) – One interface within the Low Power Domain.

 

For the remaining interfaces the PL is the master:

 

  1. FPD High Performance Coherent (HPC) – Two interfaces within the Full Power Domain. These interfaces pass through the CCI (Cache Coherent Interconnect) and provide one-way coherency from the PL to the PS.
  2. FPD High Performance (HP) – Four interfaces within the Full Power Domain. These interfaces provide non-coherent transfers.
  3. LPD AXI (Low Power Domain) – One interface within the Low Power Domain.
  4. Accelerator Coherency Port (ACP) – One interface within the Full Power Domain. This interface provides one-way (I/O) coherency, allowing PL masters to snoop the APU caches.
  5. AXI Coherency Extension (ACE) – One interface within the Full Power Domain. This interface provides full coherency through the CCI. For this interface, the PL master needs to have a cache within the PL.

 

Except for the ACE and ACP interfaces, which have a fixed data width, the remaining interfaces have a selectable data width of 32, 64, or 128 bits.

 

To support the different power domains within the Zynq MPSoC, each of the master interfaces within the PS is provided with an AXI isolation block that isolates the interface should a power domain be powered down. To protect the APU and RPU from hanging during an AXI access, each PS master interface also has an AXI timeout block to recover from any incorrect AXI interactions—for example, if the PL is not powered or configured.

 

We can use these interfaces easily within our Vivado design, where we can enable, disable, and configure each desired interface.

 

 

Image2.jpg

 

 

 

Once you have enabled and configured the desired interfaces, you can connect them into your design in the PL. Within the simple example in this blog post, we are going to transfer data to and from a BRAM located within the PL.

 

 

Image3.jpg 

 

 

This example uses the AXI master connected to the low-power domain (LPD). Both the APU and the RPU can address the BRAM via this interface thanks to the SMMU, the Central Switch, and the Low Power Switch. Using the LPD AXI interconnect also allows the RPU to access the PL when the FPD (full-power domain) is powered down, although it does add complexity when accessing the interface from the APU.

 

This simple example performs the following steps (a minimal bare-metal sketch follows the list):

 

  • Read 256 addresses and check that they are all zero.
  • Write an incrementing count into the 256 addresses.
  • Read back the data stored in the 256 addresses to demonstrate that the data was written correctly.
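Here’s a minimal bare-metal sketch of those three steps. The BRAM base address below is an assumption—use whatever address the Vivado address editor assigned to the BRAM controller in your design—and error handling is kept to a simple printout.

```c
/* Minimal bare-metal sketch of the BRAM read/write test.
 * BRAM_BASE is an assumed mapping for the LPD AXI master; check the
 * Vivado address editor for the value in your own design. */
#include "xil_io.h"
#include "xil_printf.h"

#define BRAM_BASE 0x80000000U  /* assumed address-editor mapping */
#define WORDS     256U

int main(void)
{
    u32 i, rd;

    /* Step 1: read the 256 locations and check that they are zero */
    for (i = 0; i < WORDS; i++) {
        rd = Xil_In32(BRAM_BASE + 4U * i);
        if (rd != 0U)
            xil_printf("addr %d not zero: 0x%08x\r\n", i, rd);
    }

    /* Step 2: write an incrementing count into the same locations */
    for (i = 0; i < WORDS; i++)
        Xil_Out32(BRAM_BASE + 4U * i, i);

    /* Step 3: read back and print to confirm the writes */
    for (i = 0; i < WORDS; i++) {
        rd = Xil_In32(BRAM_BASE + 4U * i);
        xil_printf("addr %d = %d\r\n", i, rd);
    }
    return 0;
}
```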

 

 

 

Image4.jpg

 

Program starting to read addresses for step 1

 

 

 
Image5.jpg

 

Data written to the first 256 BRAM addresses

 

 

 

Image6.jpg 

 

Data read back to confirm the write

 

 

The key element in our designs is selecting the right AXI interface for the application and data transfers at hand and ensuring that we get the best possible performance from the interconnect. Next time, we will look at quality of service and the AXI performance monitor.

 

 

 

Code is available on GitHub as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E-Book here
  • First Year Hardback here

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E-Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

“My Pappy said

Son, you’re gonna

Drive me to drinkin’

If you don’t stop drivin’

That Hot Rod Lincoln” — Commander Cody & His Lost Planet Airmen

 

 

In other words, you need an autonomous vehicle.

 

For the last 14 years, Perrone Robotics has focused on creating platforms that allow vehicle manufacturers to quickly integrate a variety of sensors and control algorithms into a self-driving vehicle. The company’s MAX (Mobile Autonomous X) is a “comprehensive full-stack, modular, real-time capable, customizable, robotics software platform for autonomous (self-driving) vehicles and general purpose robotics.”

 

Sensors for autonomous vehicles include cameras, lidar, radar, ultrasound, and GPS. All of these sensors generate a lot of data—about 1Mbyte/sec on the Perrone test platform. Designers need to break the processing required for these sensors into tasks that can be distributed across multiple processors and then fuse the processed sensor data (sensor fusion) to achieve real-time, deterministic performance. For the most demanding tasks, software-based processing won’t deliver a sufficiently quick response.

 

Self-driving systems must make as many as 100 decisions/sec based on real-time sensor data. You never know what will come at you.

 

According to Perrone’s Chief Revenue Officer Dave Hofert, the Xilinx Zynq UltraScale+ MPSoC with its multiple ARM Cortex-A53 and -R5 processors and programmable logic can handle all of these critical tasks and provides a “solution that scales,” with enough processing power to bring in machine learning as well.

 

Here’s a brand new, 3-minute video with more detail and a lot of views showing a Perrone-equipped Lincoln driving very carefully all by itself:

 

 

 

 

For more detailed information about Perrone Robotics, see this new feature story from an NBC TV affiliate.

 

 

 

Medium- and heavy-duty fleet vehicles account for a mere 4% of the vehicles in use today, but they consume 40% of the fuel used in urban environments, which makes them cost-effective targets for innovations that can significantly improve fuel economy. Lightning Systems (formerly Lightning Hybrids) has developed a patented hydraulic hybrid power-train system called ERS (Energy Recovery System) that can be retrofitted to new or existing fleet vehicles including delivery trucks and shuttle buses. This hybrid system can reduce fleet fuel consumption by 20% and decrease NOx emissions (a key component of smog) by as much as 50%! In addition to being a terrific story about energy conservation and pollution control, the development of the ERS system is a great example of using National Instruments’ (NI’s) comprehensive line of LabVIEW-compatible CompactRIO (cRIO) and Single-Board RIO (sbRIO) controllers to develop embedded controllers destined for production.

 

Like an electric hybrid vehicle power train, an ERS-enhanced power train recovers energy during vehicle braking and adds that energy back into the power train during acceleration. However, Lightning Systems’ ERS stores the energy using hydraulics instead of electricity.

 

Here are the components of the ERS retrofit system, shown installed in series with a power train’s drive shaft:

 

 

 

Lightning Hybrids ERS Diagram.jpg 

 

 

Major components in the Lightning Systems ERS Hybrid Retrofit System

 

 

 

The power-transfer module (PTM) in the above image drives the hydraulic pump/motor during vehicle braking, pumping hydraulic fluid into the high- and low-pressure accumulator tanks, which act like mechanical batteries, storing energy in tanks pressurized by nitrogen-filled bladders. When the vehicle accelerates, the pump/motor operates as a motor, driven by the pressurized hydraulic fluid from the accumulators, and returns energy to the vehicle’s drive train through the PTM. A valve manifold controls the filling and emptying of the accumulator tanks during vehicle operation, and all of the ERS control sequencing is handled by an NI RIO controller programmed using NI’s LabVIEW system development software. All of NI’s CompactRIO and Single-Board RIO controllers incorporate a Xilinx FPGA or a Xilinx Zynq SoC to provide real-time control of closed-loop systems.
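As a rough illustration of why an accumulator behaves like a mechanical battery, the sketch below estimates the energy banked during one braking event from the isothermal gas-compression relation W = p1·V1·ln(p2/p1). The precharge pressure, working pressure, and volume are invented for the example—they are not Lightning Systems figures.

```c
/* Illustrative energy estimate for a nitrogen-bladder accumulator.
 * All pressures and volumes are placeholder values, not Lightning
 * Systems specifications. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double p1 = 12.0e6;  /* nitrogen precharge pressure, Pa (~120 bar)    */
    double p2 = 30.0e6;  /* pressure after a braking event, Pa (~300 bar) */
    double v1 = 0.030;   /* gas volume at precharge, m^3 (30 L)           */

    /* Energy stored compressing the gas isothermally:
       W = p1 * v1 * ln(p2 / p1) */
    double joules = p1 * v1 * log(p2 / p1);

    printf("stored energy: %.0f kJ (~%.3f kWh)\n",
           joules / 1e3, joules / 3.6e6);
    return 0;
}
```

With these placeholder numbers the accumulator banks on the order of 330kJ—roughly the kinetic energy a loaded delivery truck sheds braking from city speeds—which is why the system can give a meaningful boost on the next acceleration.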

 

Lightning Systems has developed four generations of ERS controllers based on NI’s CompactRIO and Single-Board RIO controllers. The company based its first ERS prototype controller on an 8-slot NI cRIO-9024 controller and deployed the design in pilot systems. A 2nd-generation ERS prototype controller used a 4-slot NI cRIO-9075 controller, which incorporates a Xilinx Spartan-6 LX25 FPGA. The 3rd-generation ERS controller used an NI sbRIO-9626 paired with a custom daughterboard. The sbRIO-9626 incorporates a larger Xilinx Spartan-6 LX45 FPGA, and Lightning Systems fielded approximately 100 of these 3rd-generation ERS controllers.

 

 

 

Lightning Hybrids v2 v3 v4 ERS Controllers.jpg 

 

Three generations of Lightning Systems’ ERS controller (from left to right: v2, v3, and v4) based on

National Instruments' Compact RIO and Single-Board RIO controllers

 

 

 

For its 4th-generation ERS controller, the company is using NI’s sbRIO-9651 single-board RIO SOM (system on module), which is based on a Xilinx Zynq Z-7020 SoC. The SOM is also paired with a custom daughterboard. Using NI’s Zynq-based SOM reduces the controller cost by 60% while boosting the on-board processing power and adding in a lot more programmable logic. The SOM’s additional processing power allowed Lightning Systems to implement new features and algorithms that have increased fuel economy.

 

 

 

Lightning Hybrids v4 ERS Controller.jpg 

 

Lightning Systems v4 ERS Controller uses a National Instruments sbRIO-9651 SOM based on a

Xilinx Zynq Z-7020 SoC

 

 

 

Lightning Systems is able to easily migrate its LabVIEW code throughout these four ERS controller generations because all of NI’s CompactRIO and Single-Board RIO controllers are software-compatible. In addition, this controller design allows easy field upgrades to the software, which reduces vehicle downtime.

 

Lightning Systems has developed a modular framework so that the company can quickly retrofit the ERS to most medium- and heavy-duty vehicles with minimal new design work or vehicle modification. The PTM/manifold combination mounts between the vehicle’s frame rails. The accumulators can reside remotely, wherever space is available, and connect to the valve manifold through high-pressure hydraulic lines. The system is designed for easy installation and the company can typically convert a vehicle’s power train into a hybrid system in less than a day. Lightning Systems has already received orders for ERS hybrid systems from customers in Alaska, Colorado, Illinois, and Massachusetts, as well as around the world in India and the United Kingdom.

 

 

 

Lightning Hybrids Typical ERS Installation.jpg 

 

Typical Lightning Systems ERS Installation

 

 

This project recently won a 2017 NI Engineering Impact Award in the Transportation and Heavy Equipment category and is documented in this NI case study.

 

 

About the Author

Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He has served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report, and has extensive experience in computing, microprocessors, microcontrollers, embedded-systems design, design IP, EDA, and programmable logic.