

Anthony Collins, Harpinder Matharu, and Ehab Mohsen of Xilinx have just published an application article about the 16nm Xilinx RFSoC in Microwave Journal titled “RFSoC Integrates RF Sampling Data Converters for 5G New Radio.” Xilinx announced the RFSoC, which is based on the 16nm Xilinx Zynq UltraScale+ MPSoC, back in February (see “Xilinx announces RFSoC with 4Gsamples/sec ADCs and 6.4Gsamples/sec DACs for 5G, other apps. When we say “All Programmable,” we mean it!”). The Xcell Daily blog with that announcement has been very popular. Last week, another blog gave more details (see “Ready for a few more details about the Xilinx All Programmable RFSoC? Here you go”), and now there’s this article in Microwave Journal.


This new article gets into many specifics with respect to designing the RFSoC into systems with block diagrams and performance numbers. In particular, there’s a table showing MIMO radio designs based on the RFSoC with 37% to 51% power reductions and significant PCB real-estate savings due to the RFSoC’s integrated, multi-Gbps ADCs and DACs.


If you’re looking to glean a few more technical details about the RFSoC, this article is the latest place to go.




Avnet has formally introduced its MiniZed dev board based on the Xilinx Zynq Z-7000S SoC with the low, low price of just $89. For this, you get a Zynq Z-7007S SoC with one ARM Cortex-A9 processor core, 512Mbytes of DDR3L SDRAM, 128Mbits of QSPI Flash, 8Gbytes of eMMC Flash memory, WiFi 802.11 b/g/n, and Bluetooth 4.1. The MiniZed board incorporates an Arduino-compatible shield interface, two Pmod connectors, and a USB 2.0 host interface for fast peripheral expansion. You’ll also find an STMicroelectronics LIS2DS12 motion and temperature sensor and an MP34DT05 digital microphone on the board. This is a low-cost dev board that packs the punch of a fast ARM Cortex-A9 processor, programmable logic, a dual-wireless communications system, and easy system expandability.


I find the software that accompanies the board equally interesting. According to the MiniZed Product Brief, the $89 price includes a voucher for an SDSoC license so you can program the programmable logic on the Zynq SoC using C or C++ in addition to Verilog or VHDL using Vivado. This is a terrific deal on a Zynq dev board, whether you’re a novice or an experienced Xilinx user.


Avnet’s announcement says that the board will start shipping in early July.


Stefan Rousseau, senior technical marketing engineer for Avnet, said, “Whether customers are developing a Linux-based system or have a simple bare metal implementation, with MiniZed, Zynq-7000 development has never been easier. Designers need only connect to their laptops with a single micro-USB cable and they are up and running. And with Bluetooth or Wi-Fi, users can also connect wirelessly, transforming a mobile phone or tablet into an on-the-go GUI.”




Here’s a photo of the MiniZed Dev board:





Avnet’s $89 MiniZed Dev Board based on a Xilinx Zynq Z-7007S SoC



And here’s a block diagram of the board:





Avnet’s $89 MiniZed Dev Board Block Diagram


MathWorks has just published a 30-minute video titled “FPGA for DSP applications: Fixed Point Made Easy.” The video targets users of the company’s MATLAB and Simulink software tools and covers fixed-point number systems, how these numbers are represented in MATLAB and in FPGAs, quantization and quantization challenges, sources of error and minimizing these errors, how to use MathWorks’ design tools to understand these concepts, implementation of fixed-point DSP algorithms on FPGAs using MathWorks’ tools, and the advantages of the Xilinx DSP48 block—which you’ll find in all Xilinx 28nm series 7, 20nm UltraScale, and 16nm UltraScale+ devices including Zynq SoCs and Zynq UltraScale+ MPSoCs.


The video also shows the development of an FIR filter using MathWorks’ fixed-point tools as an example with some useful utilization feedback that helps you optimize your design. The video also briefly shows how you can use MathWorks’ HDL Coder tool to develop efficient, single-precision, floating-point DSP hardware for Xilinx FPGAs.
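To make the fixed-point idea concrete, here is a minimal Python sketch of quantizing a real number to a signed fixed-point word. It does not use MathWorks' tools or their API; the function name `quantize` and the default 16-bit word with 15 fractional bits are assumptions chosen purely for illustration.

```python
def quantize(value, word_length=16, frac_length=15):
    """Map a real value to a signed fixed-point code, then back to a float.

    Illustrative only: rounds to the nearest code (Python's round() sends
    exact ties to the even code) and saturates at the ends of the range.
    """
    scale = 1 << frac_length                 # codes per unit
    lowest = -(1 << (word_length - 1))       # most negative code
    highest = (1 << (word_length - 1)) - 1   # most positive code
    code = round(value * scale)
    code = max(lowest, min(highest, code))   # saturate instead of wrapping
    return code, code / scale


code, approx = quantize(0.1)
print(code, approx, abs(approx - 0.1))       # stored code, value it represents, quantization error
```

With these settings, any in-range value lands within half a code step (2^-16) of the original, while a value such as 1.0 falls just outside the representable range and saturates to the largest positive code.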






By Adam Taylor


We can create very responsive design solutions using Xilinx Zynq SoC or Zynq UltraScale+ MPSoC devices, which enable us to architect systems that exploit the advantages provided by both the PS (processor system) and the PL (programmable logic) in these devices. When we work with logic designs in the PL, we can optimize performance using design techniques like pipelining and other UltraFast design methods. We can see the results of our optimization techniques using simulation and Vivado implementation results.


When it comes to optimizing the software, which runs on acceleration cores instantiated in the PS, things may appear a little more opaque. However, things are not what they might appear. We can gather statistics on our accelerated code with ease using the performance analysis capabilities built into XSDK. Using performance analysis, we can examine the performance of the software we have running on the acceleration cores and we can monitor AXI performance within the PL to ensure that the software design is optimized for the application at hand.


Using performance analysis, we can examine several aspects of our running code:


  • CPU Utilization – Percentage of non-idling CPU clock cycles
  • CPU Instructions Per Cycle – Estimated number of executed instructions per cycle
  • L1 Cache Data Miss Rate % – L1 data-cache miss rate
  • L1 Cache Access Per msec – Number of L1 data-cache accesses
  • CPU Write Instructions Stall per cycle – Estimated number of stall cycles per instruction
  • CPU Read Instructions Stall per cycle – Estimated number of stall cycles per instruction


For those who may not be familiar with the concept, a stall occurs when the cache does not contain the requested data, which must then be fetched from main memory. While the data is fetched, the core can continue to process different instructions using out-of-order (OOO) execution; however, the processor will eventually run out of independent instructions and will have to wait for the information it needs. This is called a stall.


We can gather these stall statistics thanks to the Performance Monitor Unit (PMU) contained within each of the Zynq UltraScale+ MPSoC’s CPUs. The PMU provides six profile counters, which are configured by and post-processed by XSDK to generate the statistics above.
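As a rough illustration of the post-processing step, the sketch below turns a set of raw counts into the statistics listed earlier. The exact events XSDK programs into the PMU and the formulas it applies are not given here, so every field name in `CounterSample` and every formula in `derive_metrics` is an assumption, not XSDK's documented method.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Hypothetical raw counts for one sampling window (names are illustrative)."""
    cycles: int              # CPU clock cycles in the window
    idle_cycles: int         # cycles in which the CPU was idling
    instructions: int        # instructions executed
    l1d_accesses: int        # L1 data-cache accesses
    l1d_misses: int          # L1 data-cache misses
    read_stall_cycles: int   # cycles stalled on reads
    write_stall_cycles: int  # cycles stalled on writes
    window_ms: float         # length of the sampling window in milliseconds


def derive_metrics(s):
    """Assumed formulas for the statistics shown in the performance view."""
    return {
        "cpu_utilization_pct": 100.0 * (s.cycles - s.idle_cycles) / s.cycles,
        "instructions_per_cycle": s.instructions / s.cycles,
        "l1d_miss_rate_pct": 100.0 * s.l1d_misses / s.l1d_accesses,
        "l1d_accesses_per_ms": s.l1d_accesses / s.window_ms,
        "read_stalls_per_instruction": s.read_stall_cycles / s.instructions,
        "write_stalls_per_instruction": s.write_stall_cycles / s.instructions,
    }


sample = CounterSample(1_000_000, 250_000, 800_000, 200_000, 5_000, 40_000, 8_000, 10.0)
for name, value in derive_metrics(sample).items():
    print(f"{name}: {value:.3f}")
```

The sample numbers are made up; the point is only that each displayed statistic is a simple ratio of two raw counts.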


If we want to use the performance monitor within XSDK, we need to work with a debug build and then open the Performance Monitor Perspective within XSDK. If we have not done so before, we can open the perspective as shown below:









Opening the Performance Analysis Perspective



With the performance analysis perspective open, we can debug the application as normal. However, before we click on the run icon (the debugger should be set to stop at main, as default), we need to start the performance monitor. To do that, right click on the “System Debugger on Local” symbol within the performance monitor window and click start.





Starting the Performance Analysis




Then, once we execute the program, the statistics will be gathered and we can analyse them within XSDK to determine the best optimizations for our code.


To demonstrate how we can use this technique to deliver a more optimized system, I have created a design that runs on the ZedBoard and performs AES256 encryption on 1024 packets of information. When this code was run on the ZedBoard, the following execution statistics were collected:





Performance Graphs






Performance Counters




So far, these performance statistics only look at code executing on the PS itself. Next time, we will look at how we can use the AXI Performance Monitor with XSDK. If we wish to do this, we need to first instrument the design in Vivado.





Code is available on Github as always.


If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.




  • First Year E Book here
  • First Year Hardback here.



MicroZed Chronicles hardcopy.jpg 



  • Second Year E Book here
  • Second Year Hardback here



MicroZed Chronicles Second Year.jpg 




Linc, Perrone Robotics’ autonomous Lincoln MKZ automobile, took a drive around the Perrone paddock at the TU Automotive autonomous vehicle show in Detroit last week and Dan Isaacs, Xilinx’s Director Connected Systems in Corporate Marketing, was there to shoot photos and video. Perrone’s Linc test vehicle operates autonomously using the company’s MAX (Mobile Autonomous X), a “comprehensive full-stack, modular, real-time capable, customizable, robotics software platform for autonomous (self-driving) vehicles and general purpose robotics.” MAX runs on multiple computing platforms including one based on an Iveia controller, which is based on an Iveia Atlas SOM, which in turn is based on a Xilinx Zynq UltraScale+ MPSoC. The Zynq UltraScale+ MPSoC handles the avalanche of data streaming from the vehicle’s many sensors to ensure that the car travels the appropriate path and avoids hitting things like people, walls and fences, and other vehicles. That’s all pretty important when the car is driving itself in public. (For more information about Perrone Robotics’ MAX, see “Perrone Robotics builds [Self-Driving] Hot Rod Lincoln with its MAX platform, on a Zynq UltraScale+ MPSoC.”)


Here’s a photo of Perrone’s sensored-up Linc autonomous automobile in the Perrone Robotics paddock at TU Automotive in Detroit:



Perrone Robotics Linc Autonomous Driving Lincoln MKZ.jpg 



And here’s a photo of the Iveia control box with the Zynq UltraScale+ MPSoC inside, running Perrone’s MAX autonomous-driving software platform. (Note the controller’s small size and lack of a cooling fan):



Iveia Autonomous Driving Controller for Perrone Robotics.jpg 



Opinions about the feasibility of autonomous vehicles are one thing. Seeing the Lincoln MKZ’s 3800 pounds of glass, steel, rubber, and plastic being controlled entirely by a little silver box in the trunk, that’s something entirely different. So here’s the video that shows Perrone Robotics’ Linc in action, driving around the relative safety of the paddock while avoiding the fences, pedestrians, and other vehicles:




If you’re designing next-generation avionics systems, you may be facing some challenges:


  • Developing scalable, reconfigurable, common-compute platforms for flexible deployment
  • Managing tradeoffs in size, weight, power, and cost and between component- and board-level development
  • Meeting safety requirements including those related to radiation effects
  • Meeting stringent and evolving certification requirements for DO-254, DO-178, and guidance related to multi-core use
  • Maximizing hardware and software reuse and dealing with associated certification artifacts while ensuring an appropriate level of design integrity and system safety


Do these sound like your challenges? Want some help? Check out this June 20 Webinar.



When someone asks where Xilinx All Programmable devices are used, I find it a hard question to answer because there’s such a very wide range of applications—as demonstrated by the thousands of Xcell Daily blog posts I’ve written over the past several years.


Now, there’s a 5-minute “Powered by Xilinx” video with clips from several companies using Xilinx devices for applications including:


  • Machine learning for manufacturing
  • Cloud acceleration
  • Autonomous cars, drones, and robots
  • Real-time 4K, UHD, and 8K video and image processing
  • VR and AR
  • High-speed networking by RF, LED-based free-air optics, and fiber
  • Cybersecurity for IIoT


That’s a huge range covered in just five minutes.


Here’s the video:






Signal Integrity Journal just published a new article titled “Addressing the 5G Challenge with Highly Integrated RFSoC,” written by four Xilinx authors. The article discusses some potential uses for Xilinx RFSoC technology, announced in February. (See “Xilinx announces RFSoC with 4Gsamples/sec ADCs and 6.4Gsamples/sec DACs for 5G, other apps. When we say “All Programmable,” we mean it!”)


Cutting to the chase of this 2600-word article, the Xilinx RFSoC is going to save you a ton of power and make it easier for you to achieve your performance goals for 5G and many other advanced, mixed-signal system designs.


If you’re involved in the design of a system like that, you really should read the article.




Light Reading’s International Group Editor Ray Le Maistre recently interviewed David Levi, CEO of Ethernity Networks, who discusses the company’s FPGA-based All Programmable ACE-NIC, a Network Interface Controller with 40Gbps throughput. The carrier-grade ACE-NIC accelerates vEPC (virtual Evolved Packet Core, a framework for virtualizing the functions required to converge voice and data on 4G LTE networks) and vCPE (virtual Customer Premise Equipment, a way to deliver routing, firewall security and virtual private network connectivity services using software rather than dedicated hardware) applications by 50x, dramatically reducing end-to-end latency associated with NFV platforms. Ethernity’s ACE-NIC is based on a Xilinx Kintex-7 FPGA.


“The world is crazy about our solution—it’s amazing,” says Levi in the Light Reading video interview.





Ethernity Networks All Programmable ACE-NIC



Because Ethernity implements its NIC IP in a Kintex-7 FPGA, it was natural for Le Maistre to ask Levi when his company would migrate to an ASIC. Levi’s answer surprised him:


“We offer a game changer... We invested in technology—which is covered by patents—that consumes 80% less logic than competitors. So essentially, a solution that you may want to deliver without our patents will cost five times more on FPGA… With this kind of solution, we succeed over the years in competing with off-the-shelf components… with the all-programmable NIC, operators enjoy the full programmability and flexibility at an affordable price, which is comparable to a rigid, non-programmable ASIC solution.”


In other words, Ethernity plans to stay with All Programmable devices for its products. In fact, Ethernity Networks announced last year that it had successfully synthesized its carrier-grade switch/router IP for the Xilinx Zynq UltraScale+ MPSoC and that the throughput performance increases to 60Gbps per IP core with the 16nm device, and to 120Gbps with two instances of that core. “We are going to use this solution for novel SDN/NFV market products, including embedded SR-IOV (single-root input/output virtualization), and for high density port solutions,” said Levi.


Towards the end of the video interview, Levi looks even further into the future when he discusses Amazon Web Services’ (AWS’) recent support of FPGA acceleration. (That’s the Amazon EC2 F1 compute instance based on Xilinx Virtex UltraScale+ FPGAs rolled out earlier this year.) Because it’s already based on Xilinx All Programmable devices, Ethernity’s networking IP runs on the Amazon EC2 F1 instance. “It’s an amazing opportunity for the company [Ethernity],” said Levi. (Try doing that in an ASIC.)


Here’s the Light Reading video interview:







A wide range of commercial, government, and social applications require precise aerial imaging. These applications range from the management of high-profile, international-scale humanitarian and disaster relief programs to everyday commercial uses such as siting large photovoltaic arrays. Satellites can capture geospatial imagery across entire continents, often at the expense of spatial resolution. Satellites also lack the flexibility to image specific areas on demand. You must wait until the satellite is above the real estate of interest. Spookfish Limited in Australia, along with ICON Technologies, has developed the Spookfish Airborne Imaging Platform (SAIP) based on COTS (commercial off-the-shelf) products including National Instruments’ (NI’s) PXIe modules and LabVIEW systems engineering software. The SAIP can capture precise images with resolutions of 6cm/pixel to better than 1cm/pixel from a light aircraft cruising at 160 knots at altitudes to 12,000 feet.


The 1st-generation SAIP employs one or more cameras installed in a tube attached to the belly of a light aircraft. Success with the initial prototype led to the development of a 2nd-generation design with two camera tubes. The system has continued to grow and now accommodates as many as three camera tubes with as many as four cameras per tube.


The multiple cameras must be steered precisely in continuous, synchronized motion while recording camera angles, platform orientation, and platform acceleration. All of this data is used to post-process the image data. At typical operating altitudes and speeds, the cameras must be steered with millidegree precision and the camera angles and platform position must be logged with near-microsecond accuracy and precision. Spookfish then uses a suite of open-source and proprietary computer-vision and photogrammetry techniques to process the imagery, which results in orthophotos, elevation data, and 3D models.
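A little geometry shows why the steering and timing requirements are so tight. The sketch below uses the 160-knot cruise speed and 12,000-foot altitude quoted above; the flat-ground, straight-down camera model and the function names are simplifying assumptions made for illustration, not part of the Spookfish design.

```python
import math

KNOT_IN_M_PER_S = 1852.0 / 3600.0   # one knot is one nautical mile (1852 m) per hour
FOOT_IN_M = 0.3048


def ground_error_m(altitude_ft, angle_error_deg):
    """Ground displacement caused by a small pointing error of a camera looking
    straight down over flat ground (assumed model)."""
    return altitude_ft * FOOT_IN_M * math.tan(math.radians(angle_error_deg))


def along_track_error_m(speed_knots, timing_error_s):
    """Distance the aircraft travels during a timestamp error."""
    return speed_knots * KNOT_IN_M_PER_S * timing_error_s


print(ground_error_m(12_000, 0.001))      # one millidegree of pointing error
print(along_track_error_m(160, 1e-3))     # one millisecond of timing error
print(along_track_error_m(160, 1e-6))     # one microsecond of timing error
```

Under these assumptions, a single millidegree of pointing error at 12,000 feet shifts the image by about 6.4cm on the ground, roughly one pixel at 6cm/pixel. A one-millisecond timestamp error at 160 knots corresponds to about 8.2cm of aircraft travel, while a one-microsecond error corresponds to less than a tenth of a millimeter.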


Here’s a block diagram of the Spookfish SAIP:



Spookfish SAIP Block diagram.jpg 




The NI PXIe system in the SAIP design consists of a PXIe-1082DC chassis, a PXIe-8135 RT controller, a PXI-6683H GPS/PPS synchronization module, a PXIe-6674T clock and timing module, a PXIe-7971R FlexRIO FPGA Module, and a PXIe-4464 sound and vibration module. (The PXIe-7971R FlexRIO module is based on a Xilinx Kintex-7 325T FPGA. The PXI-6683H synchronization module and the PXIe-6674T clock and timing module are both based on Xilinx Virtex-5 FPGAs.)


Here’s an aerial image captured by an SAIP system at 6cm/pixel:



Spookfish SAIP image at 6cm per pixel.jpg 



And here’s a piece of an aerial image taken by an SAIP system at 1.5cm/pixel:



Spookfish SAIP image at 6cm per pixel.jpg 




During its multi-generation development, the SAIP system quickly evolved far beyond its originally envisioned performance specification as new requirements arose. For example, initial expectations were that logged data would only need to be tagged with millisecond accuracy. However, as the project progressed, ICON Technologies and NI improved the system’s timing accuracy and precision by three orders of magnitude.


NI’s FPGA-based FlexRIO technology was also crucial in meeting some of these shifting performance targets. Changing requirements pushed the limits of some of the COTS interfaces, so custom FlexRIO interface implementations optimized for the tasks were developed as higher-speed replacements. Often, NI’s FlexRIO technology is employed for the high-speed computation available in the FPGA’s DSP slices, but in this case it was the high-speed programmable I/O that was needed.


Spookfish and ICON Technologies are now developing the next-generation SAIP system. Now that the requirements are well understood, they’re considering a Xilinx FPGA-based or Zynq-based NI CompactRIO controller as a replacement for the PXIe system. NI’s addition of TSN (time-sensitive networking) to the CompactRIO family’s repertoire makes such a switch possible. (For more information about NI’s TSN capabilities, see “IOT and TSN: Baby you can drive my [slot] car. TSN Ethernet network drives slot cars through obstacles at NI Week.”)




This project was a 2017 NI Engineering Impact Award finalist in the Energy category last month at NI Week. It is documented in this NI case study.


Adam Taylor has just published a blog on the EEWeb.com site titled “The Benefits of HW/SW Co-Simulation for Zynq-Based Designs” where he discusses the use of hardware/software co-simulation to verify the hardware in your hardware designs based on the Xilinx Zynq SoC. The blog continues by discussing Aldec's Riviera-PRO advanced verification platform, which combines a high-performance simulation engine, advanced debugging capabilities at different abstraction levels, and support for the latest Language and Verification Library Standards. Taylor then covers the bridge between Riviera-PRO and Xilinx’s QEMU emulator.


It’s not a long blog, so perhaps after you read it you’ll want more. Well, more is available. (Adam might say “More is on offer.”) Taylor is conducting a Webinar for Aldec on June 29 titled “Addressing the Challenges of SoC Verification in practice using Co-Simulation.” During the Webinar, Taylor will discuss the challenges you’ll face when working with the Zynq SoC; he’ll introduce the concept of co-simulation, discuss its constituent parts, and demonstrate advanced debugging techniques based on co-simulation. Then he’ll examine the required environment and pre-requisites needed for co-simulation. All that in just an hour!



Register here.



There’s only one problem with the deuterium gas plasma inside of a fusion reactor: it’s hot, really hot! It must be 10 million ˚F (20 million ˚C) hot—hotter than the sun’s surface—if you want to achieve fusion. If this hot plasma touches the relatively cold sides of the reaction vessel, the plasma vanishes. So, you need to confine the plasma tightly in a magnetic field so that it doesn’t escape. You need to do that for long time periods if you want a fusion reaction that reliably produces power, which is after all the objective. How long is a long time?


Many minutes.


Researchers working on the Large Helical Device (LHD), a superconducting stellarator (a form of plasma fusion reactor) project initiated by Japan’s National Institute for Fusion Science to conduct fusion-plasma confinement research in a steady-state machine, have developed an advanced control system based on a National Instruments (NI) CompactRIO embedded controller programmed in NI’s LabVIEW and LabVIEW FPGA to keep the plasma confined and hot inside of the reactor.





Interior of the LHD



Stabilizing the plasma inside of the LHD requires real-time control of high-energy heating, magnetic fields generated by superconducting electromagnets, and deuterium gas injection based on observed information such as plasma density, temperature, and optical emission. The heating is supplied by 30kV power lines, so control-system mistakes can have catastrophic consequences. In the past, LHD experiments required two or three operators for complex monitoring and response.


Here’s a “simplified” diagram of the LHD’s plasma control system:



LHD Control Diagram.jpg 




With the NI CompactRIO controller, bolstered by the high performance of its internal Xilinx FPGA (all of NI’s CompactRIO controllers are based on Xilinx FPGAs or the Zynq SoC), the LHD control system sustained a high-performance plasma for more than 48 minutes with a total injected energy of 3.4GJ (that’s gigajoules). The 48-minute duration sets a record that exceeds the previous record, set more than a decade earlier, by more than 3x.
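For a sense of scale, the two figures above imply an average injected power. The short sketch below is just that arithmetic; the helper name `average_power_mw` is made up for illustration, and because the plasma lasted "more than" 48 minutes, the result is an upper bound on the average.

```python
def average_power_mw(energy_gj, duration_min):
    """Average power in megawatts for an energy in gigajoules spread over a duration in minutes."""
    joules = energy_gj * 1e9
    seconds = duration_min * 60.0
    return joules / seconds / 1e6


print(round(average_power_mw(3.4, 48), 2))   # average injected power in MW
```

That works out to a little under 1.2MW held steadily for the better part of an hour.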


On March 7, 2017, LHD ignited its first deuterium plasma.


This amazing project was a 2017 NI Engineering Impact Award finalist in the Energy category last month at NI Week and won the 2017 Engineering Grand Challenges Award. It is documented in this NI case study.



Perhaps you think DPDK (Data Plane Development Kit) is a high-speed data-movement standard that’s strictly for networking applications. Perhaps you think DPDK is an Intel-specific specification. Perhaps you think DPDK is restricted to the world of host CPUs and ASICs. Perhaps you’ve never heard of DPDK—given its history, that’s certainly possible. If any of those statements is correct, keep reading this post.


Originally, DPDK was a set of data-plane libraries and NIC (network interface controller) drivers developed by Intel for fast packet processing on Intel x86 microprocessors. That is the DPDK origin story. Last April, DPDK became a Linux Foundation Project. It lives at DPDK.org and is now processor agnostic.


DPDK consists of several main libraries that you can use to:


  • Send and receive packets while minimizing the number of CPU cycles needed (usually less than 80)
  • Develop fast packet-capture algorithms
  • Run 3rd-party fast-path stacks


So far, DPDK certainly sounds like a networking-specific development kit but, as Atomic Rules’ CTO Shep Siegel says, “If you can make your data-movement problem look like a packet-movement problem,” then DPDK might be a helpful shortcut in your development process.


Siegel knows more than a bit about DPDK because his company has just released Arkville, a DPDK-aware FPGA/GPP data-mover IP block and DPDK PMD (Poll Mode Driver) that allow Linux DPDK applications to offload server cycles to FPGA gates. The release works in tandem with the Linux Foundation’s 17.05 release of the open-source DPDK libraries. Atomic Rules’ Arkville release is compatible with Xilinx Vivado 2017.1 (the latest version of the Vivado Design Suite), which was released in April. Currently, Atomic Rules provides two sample designs:



  • Four-Port, Four-Queue 10 GbE example (Arkville + 4×10 GbE MAC)
  • Single-Port, Single-Queue 100 GbE example (Arkville + 1×100 GbE MAC)


(Atomic Rules’ example designs for Arkville were compiled with Vivado 2017.1 as well.)



These examples are data movers; Arkville is a packet conduit. This conduit presents a DPDK interface on the CPU side and AXI interfaces on the FPGA side. There’s a convenient spot in the Arkville conduit where you can add your own hardware for processing those packets. That’s where the CPU offloading magic happens.


Atomic Rules’ Arkville IP works well with all Xilinx UltraScale devices but it works especially well with Xilinx UltraScale+ All Programmable devices that provide two integrated PCIe Gen3 x16 controllers. (That includes devices in the Kintex UltraScale+ and Virtex UltraScale+ FPGA families and the Zynq UltraScale+ MPSoC device families.)




That matters because, as BittWare’s VP of Network Products Craig Lund says, “100G Ethernet is hard. It’s not clear that you can use PCIe to get [that bit rate] into a server [using one PCIe Gen3 x16 interface]. From the PCIe specs, it looks like it should be easy, but it isn’t.” If you are handling minimum-size packets, says Lund, there are lots of them: more than 14 million per second. If you’re handling big packets, then you need a lot of bandwidth. Either use case presents a throughput challenge to a single PCIe Root Complex. In practice, you really need two.
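The numbers behind Lund's point are easy to reproduce. The sketch below assumes standard Ethernet framing (a 64-byte minimum frame plus 20 bytes of preamble, start-of-frame delimiter, and inter-frame gap on the wire) and PCIe Gen3 signaling (8 GT/s per lane with 128b/130b encoding). The function names are illustrative, and the PCIe figure is a raw link rate that ignores packet and protocol overhead.

```python
ETH_WIRE_OVERHEAD_BYTES = 20   # 7 preamble + 1 start-of-frame delimiter + 12 inter-frame gap


def packets_per_second(line_rate_bps, frame_bytes=64):
    """Maximum frame rate on an Ethernet link for a given frame size."""
    bits_on_wire = (frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8
    return line_rate_bps / bits_on_wire


def pcie_gen3_raw_gbps(lanes=16):
    """Raw PCIe Gen3 bandwidth per direction, before any packet overhead."""
    return 8.0 * lanes * 128.0 / 130.0


print(packets_per_second(10e9))    # minimum-size frames on one 10 GbE port
print(packets_per_second(100e9))   # minimum-size frames on one 100 GbE port
print(pcie_gen3_raw_gbps(16))      # one Gen3 x16 link
```

Under these assumptions, minimum-size frames arrive at roughly 14.9 million per second on each 10 GbE port and roughly 148.8 million per second at 100 GbE. A single Gen3 x16 link tops out at about 126Gbps in each direction before overhead, which leaves little headroom above a 100Gbps line rate.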


BittWare has implemented products using the Atomic Rules Arkville IP, based on its XUPP3R PCIe card, which incorporates a Xilinx Virtex UltraScale+ VU13P FPGA. One of the many unique features of this BittWare board is that it has two PCIe Gen3 x16 ports: one available on an edge connector and the other available on an optional serial expansion port. This second PCIe Gen3 x16 port can be connected to a second PCIe slot for added bandwidth.


However, even that’s not enough, says Lund. You don’t just need two PCIe Gen3 x16 slots; you need two PCIe Gen3 Root Complexes and that means you need a 2-socket motherboard with two physical CPUs to handle the traffic. Here’s a simplified block diagram that illustrates Lund’s point:






BittWare’s XUPP3R PCIe Card has two PCIe Gen3 x16 ports: one on an edge connector and the other on an optional serial expansion port for added bandwidth




BittWare has used its XUPP3R PCIe card and the Arkville IP to develop two additional products:




Note: For more information about Atomic Rules’ IP and BittWare’s XUPP3R PCIe card, see “BittWare’s UltraScale+ XUPP3R board and Atomic Rules IP run Intel’s DPDK over PCIe Gen3 x16 @ 150Gbps.”



Arkville is a product offered by Atomic Rules. The XUPP3R PCIe card is a product offered by BittWare. Please contact these vendors directly for more information about these products.






By Adam Taylor


So far, our examination of the Zynq UltraScale+ MPSoC has focused mainly upon the PS (processing system) side of the device. However, to fully utilize the device’s capabilities we need to examine the PL (programmable logic) side also. So in this blog, we will look at the different AXI interfaces between the PS and the PL.





Zynq MPSoC Interconnect Structure




These different AXI interfaces provide a mixture of master and slave ports from the PS perspective and they can be coherent or not. The PS is the master for the following interfaces:


  1. FPD High Performance Master (HPM) – Two interfaces within the Full Power Domain.
  2. LPD High Performance Master (HPM) – One Interface within the Low Power Domain.


For the remaining interfaces the PL is the master:


  1. FPD High Performance Coherent (HPC) – Two Interfaces within the Full Power Domain. These interfaces pass through the CCI (Cache Coherent Interconnect) and provide one-way coherency from the PL to the PS.
  2. FPD High Performance (HP) – Four Interfaces within the Full Power Domain. These interfaces provide non-coherent transfers.
  3. Low Power Domain – One interface within the Low Power Domain.
  4. Accelerator Coherency Port (ACP) – One interface within the Full Power Domain. This interface provides one-way coherency (IO) allowing PL masters to snoop the APU Cache.
  5. Accelerator Coherency Extension (ACE) – One interface within the Full Power Domain. This interface provides full coherency using the CCI. For this interface, the PL master needs to have a cache within the PL.


Except for the ACE and ACP interfaces, which have a fixed data width, the remaining interfaces have a selectable data width of 32, 64, or 128 bits.
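It can help to keep the interfaces above in a small lookup table when planning a design. The sketch below simply restates the two lists and the data-width note as Python data; the `AxiPort` type, its field names, and the helper functions are illustrative and do not correspond to any Xilinx API. Coherency is marked "not stated" where the text above does not describe it.

```python
from collections import namedtuple

# name, which side is the AXI master, power domain, number of interfaces, coherency
AxiPort = namedtuple("AxiPort", "name master domain count coherency")

PS_PL_PORTS = [
    AxiPort("FPD HPM", "PS", "FPD", 2, "not stated"),
    AxiPort("LPD HPM", "PS", "LPD", 1, "not stated"),
    AxiPort("FPD HPC", "PL", "FPD", 2, "one-way, via CCI"),
    AxiPort("FPD HP", "PL", "FPD", 4, "non-coherent"),
    AxiPort("LPD", "PL", "LPD", 1, "not stated"),
    AxiPort("ACP", "PL", "FPD", 1, "one-way (IO)"),
    AxiPort("ACE", "PL", "FPD", 1, "full, via CCI"),
]

FIXED_WIDTH_PORTS = {"ACP", "ACE"}   # all other ports: 32, 64, or 128 bits


def count_ports(master):
    """Total number of interfaces for which the given side ('PS' or 'PL') is master."""
    return sum(p.count for p in PS_PL_PORTS if p.master == master)


def has_selectable_width(name):
    """True if the port's data width can be chosen in the design."""
    return name not in FIXED_WIDTH_PORTS


print(count_ports("PS"), count_ports("PL"))
```

Laid out this way, it is easy to see that the PS masters three interfaces while the PL masters nine, and that only the LPD ports remain usable when the full-power domain is shut down.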


To support the different power domains within the Zynq MPSoC, each of the master interfaces within the PS is provided with an AXI isolation block that isolates the interface should a power domain be powered down. To protect the APU and RPU from hanging up performing an AXI access, each PS master interface also has an AXI timeout block to recover from any incorrect AXI interactions, for example, if the PL is not powered or configured.


We can use these interfaces simply within our Vivado design, where we can enable, disable, and configure the desired interface.







Once you have enabled and configured the desired interfaces, you can connect them into your design in the PL. Within the simple example in this blog post, we are going to transfer data to and from a BRAM located within the PL.






This example uses the AXI master connected to the low-power domain (LPD). Both the APU and the RPU can address the BRAM via this interface thanks to the SMMU, the Central Switch, and the Low Power Switch. Using the LPD AXI interconnect also allows the RPU to access the PL if the FPD (full-power domain) is powered down. Of course, it does increase complexity when using the APU.


This simple example performs the following steps:


  • Read 256 addresses and check that they are all zero.
  • Write a count into the 256 addresses.
  • Read back the data stored in the 256 addresses to demonstrate that the data was written correctly.
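The three steps above can be modelled on a host machine before touching hardware. The sketch below is a Python stand-in, not the code that runs on the board: a plain list plays the role of the BRAM, and the name `bram_self_test` is hypothetical. On the target, the list accesses would be replaced by reads and writes to the BRAM's address range as assigned in the Vivado design, which is not shown here.

```python
BRAM_WORDS = 256   # number of addresses exercised by the example


def bram_self_test(mem):
    """Run the read-zero, write-count, read-back sequence on a list-like memory.

    Returns True only if the memory started out all zero and every written
    count value reads back correctly.
    """
    # Step 1: read every address and check that it is zero.
    if any(mem[i] != 0 for i in range(BRAM_WORDS)):
        return False
    # Step 2: write a count into each address.
    for i in range(BRAM_WORDS):
        mem[i] = i
    # Step 3: read the data back and confirm it matches what was written.
    return all(mem[i] == i for i in range(BRAM_WORDS))


memory = [0] * BRAM_WORDS      # freshly initialised stand-in for the BRAM
print(bram_self_test(memory))
```

Running the model a second time on the same list fails at the first step, because the memory now holds the count values rather than zeros.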






Program Starting to read addresses for part 1





Data written to the first 256 BRAM addresses






Data read back to confirm the write



The key element in our designs is selecting the correct AXI interface for the application and data transfers at hand and ensuring that we are getting the best possible performance from the interconnect. Next time we will look at the quality of service and the AXI performance monitor.




Code is available on Github as always.





“My Pappy said

Son, you’re gonna

Drive me to drinkin’

If you don’t stop drivin’

That Hot Rod Lincoln” — Commander Cody & His Lost Planet Airmen



In other words, you need an autonomous vehicle.


For the last 14 years, Perrone Robotics has focused on creating platforms that allow vehicle manufacturers to quickly integrate a variety of sensors and control algorithms into a self-driving vehicle. The company’s MAX (Mobile Autonomous X) is a “comprehensive full-stack, modular, real-time capable, customizable, robotics software platform for autonomous (self-driving) vehicles and general purpose robotics.”


Sensors for autonomous vehicles include cameras, lidar, radar, ultrasound, and GPS. All of these sensors generate a lot of data—about 1Mbyte/sec for the Perrone test platform. Designers need to break up all of the processing required for these sensors into tasks that can be distributed to multiple processors and then fuse the processed sensor data (sensor fusion) to achieve real-time, deterministic performance. For the most demanding tasks, software-based processing won’t deliver sufficiently quick response.


Self-driving systems must make as many as 100 decisions/sec based on real-time sensor data. You never know what will come at you.


According to Perrone’s Chief Revenue Officer Dave Hofert, the Xilinx Zynq UltraScale+ MPSoC with its multiple ARM Cortex-A53 and -R5 processors and programmable logic can handle all of these critical tasks and provides a “solution that scales,” with enough processing power to bring in machine learning as well.


Here’s a brand new, 3-minute video with more detail and a lot of views showing a Perrone-equipped Lincoln driving very carefully all by itself:





For more detailed information about Perrone Robotics, see this new feature story from an NBC TV affiliate.




Medium- and heavy-duty fleet vehicles account for a mere 4% of the vehicles in use today but they consume 40% of the fuel used in urban environments, so they are cost-effective targets for innovations that can significantly improve fuel economy. Lightning Systems (formerly Lightning Hybrids) has developed a patented hydraulic hybrid power-train system called ERS (Energy Recovery System) that can be retrofitted to new or existing fleet vehicles including delivery trucks and shuttle buses. This hybrid system can reduce fleet fuel consumption by 20% and decrease NOx emissions (the key component of smog) by as much as 50%! In addition to being a terrific story about energy conservation and pollution control, the development of the ERS system tells a great story about using National Instruments’ (NI’s) comprehensive line of LabVIEW-compatible CompactRIO (cRIO) and Single-Board RIO (sbRIO) controllers to develop embedded controllers destined for production.


Like an electric hybrid vehicle power train, an ERS-enhanced power train recovers energy during vehicle braking and adds that energy back into the power train during acceleration. However, Lightning Systems’ ERS stores the energy using hydraulics instead of electricity.


Here are the components of the ERS retrofit system, shown installed in series with a power train’s drive shaft:







Major components in the Lightning Systems ERS Hybrid Retrofit System




The power-transfer module (PTM) in the above image drives the hydraulic pump/motor during vehicle braking, pumping hydraulic fluid into the high- and low-pressure accumulator tanks, which act like mechanical batteries that store energy in tanks pressurized by nitrogen-filled bladders. When the vehicle accelerates, the pump/motor operates as a motor driven by the pressurized hydraulic fluid’s energy stored in the accumulators. The hydraulic motor puts energy back into the vehicle’s drive train through the PTM. A valve manifold controls the filling and emptying of the accumulator tanks during vehicle operation and all of the ERS control sequencing is handled by a National Instruments (NI) RIO controller programmed using NI’s LabVIEW system development software. All of NI’s Compact and Single-Board RIO controllers incorporate a Xilinx FPGA or a Xilinx Zynq SoC to provide real-time control of closed-loop systems.


Lightning Systems has developed four generations of ERS controllers based on NI’s CompactRIO and Single-Board RIO controllers. The company based its first ERS prototype controller on an 8-slot NI cRIO-9024 controller and deployed the design in pilot systems. A 2nd-generation ERS prototype controller used a 4-slot NI cRIO-9075 controller, which incorporates a Xilinx Spartan-6 LX25 FPGA. The 3rd-generation ERS controller used an NI sbRIO-9626 paired with a custom daughterboard. The sbRIO-9626 incorporates a larger Xilinx Spartan-6 LX45 FPGA, and Lightning Systems fielded approximately 100 of these 3rd-generation ERS controllers.




Three generations of Lightning Systems’ ERS controller (from left to right: v2, v3, and v4) based on National Instruments' CompactRIO and Single-Board RIO controllers




For its 4th-generation ERS controller, the company is using NI’s sbRIO-9651 single-board RIO SOM (system on module), which is based on a Xilinx Zynq Z-7020 SoC. The SOM is also paired with a custom daughterboard. Using NI’s Zynq-based SOM reduces the controller cost by 60% while boosting the on-board processing power and adding in a lot more programmable logic. The SOM’s additional processing power allowed Lightning Systems to implement new features and algorithms that have increased fuel economy.




Lightning Systems v4 ERS Controller uses a National Instruments sbRIO-9651 SOM based on a Xilinx Zynq Z-7020 SoC




Lightning Systems is able to easily migrate its LabVIEW code throughout these four ERS controller generations because all of NI’s CompactRIO and Single-Board RIO controllers are software-compatible. In addition, this controller design allows easy field upgrades to the software, which reduces vehicle downtime.


Lightning Systems has developed a modular framework so that the company can quickly retrofit the ERS to most medium- and heavy-duty vehicles with minimal new design work or vehicle modification. The PTM/manifold combination mounts between the vehicle’s frame rails. The accumulators can reside remotely, wherever space is available, and connect to the valve manifold through high-pressure hydraulic lines. The system is designed for easy installation and the company can typically convert a vehicle’s power train into a hybrid system in less than a day. Lightning Systems has already received orders for ERS hybrid systems from customers in Alaska, Colorado, Illinois, and Massachusetts, as well as around the world in India and the United Kingdom.






Typical Lightning Systems ERS Installation



This project recently won a 2017 NI Engineering Impact Award in the Transportation and Heavy Equipment category and is documented in this NI case study.




Blood pumps for extracorporeal life support (ECLS) are used in medical therapies to support failing human organ systems. Conventional blood pumps use mechanically driven impellers supported on bearings and these impellers are prone to stress and heat concentration on the shaft-bearing contact areas, which increases hemolysis (rupture or destruction of red blood cells) and thrombosis (blood clots). Both are bad news in the bloodstream. In addition, ECLS applications require that any components that touch the blood be disposable, to prevent infection.


The Precision Motion Control Lab at MIT and Ension, Inc. are developing a new type of blood pump with a low-cost, disposable, bearingless impeller to reduce costs in ECLS applications. Magnetic levitation through reluctance coupling replaces the impeller’s mechanical bearings and hysteresis coupling drives the impeller using magnetically induced torque, which eliminates the mechanical drive shaft. Both magnetic forces are supplied by a 12-coil electromagnet in this new design.


To further reduce the cost of the replaceable rotor/impeller assembly, the design team substituted a steel ring made of type D2 tool steel for the normal permanent magnet in the rotor. The “D2 ring” is inductively magnetized by the coupled magnetic fields from the stator electromagnets. Reluctance coupling pulls the outer edges of the ring, causing it to levitate, while a rotating magnetic field generated by the twelve stator coils imparts rotational torque on the D2 ring, causing the impeller to spin.


Controlling the stator coils to produce the correct magnetic fields for levitation and motion requires closed-loop control of all twelve electromagnets in the stator. The design team chose the National Instruments (NI) MyRIO Student Embedded Controller because it’s easily programmed in NI’s LabVIEW systems engineering software package and because the MyRIO’s integrated Xilinx Zynq Z-7010 SoC incorporates the high-speed programmable logic needed to provide real-time, deterministic, closed-loop stator control.


Here’s a photo of a prototype bearingless motor for this design, showing the 12-magnet stator and the D2 ring rotor on the left and a National Instruments MyRIO controller on the right (and yes, that’s the Xilinx Zynq SoC peeking through the plastic window in the MyRIO controller):




Bearingless Blood Pump Prototype Motor.jpg 




Closed-loop feedback comes from four eddy-current sensors, which are sense coils driven by Texas Instruments LDC1101 16-bit LDCs (inductance-to-digital converters). The four LDC boards appear in the upper left part of the above photo. The four eddy-current sensors are organized in two pairs that differentially measure real-time rotor position. Each sensor connects to the MyRIO controller and the Zynq SoC using individual 5-wire SPI interfaces, as shown below:
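To illustrate why the sensors are paired, here is a minimal sketch of a differential position measurement. It assumes each pair sits on opposite sides of the rotor along one axis and that each reading has already been converted to a gap distance. Neither detail is stated in the case-study summary above, and the function names are hypothetical. Subtracting the two readings cancels any offset common to both sensors, such as thermal drift.

```python
def differential_position(gap_a, gap_b):
    """Return rotor displacement toward sensor A from two opposing gap readings.

    If the rotor moves a distance d toward sensor A, gap_a shrinks by d and
    gap_b grows by d, so half the difference recovers d. Any offset that is
    common to both readings cancels in the subtraction.
    """
    return (gap_b - gap_a) / 2.0


def rotor_xy(x_a, x_b, y_a, y_b):
    """Combine two sensor pairs (assumed to lie on orthogonal axes) into an (x, y) position."""
    return differential_position(x_a, x_b), differential_position(y_a, y_b)


if __name__ == "__main__":
    # Nominal gap 1.0, rotor shifted 0.1 toward sensor A, common offset 0.05 on both.
    print(differential_position(1.0 - 0.1 + 0.05, 1.0 + 0.1 + 0.05))
```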




Bearingless Blood Pump LDC Detail.jpg 




The MyRIO controller drives the blood pump's 12-phase stator through twelve analog channels, built from an NI cRIO-9076 4-slot CompactRIO controller (with an integrated Xilinx Spartan-6 LX45 FPGA), three NI 9263 voltage output modules, and one NI 9205 voltage input module, and through twelve custom linear transconductance power amplifiers. The flexibility this setup provides permits the design team to experiment with and refine different motor-control algorithms.


Closed-loop, drive-with-feedback control algorithms are implemented in the Zynq SoC’s programmable logic because software-based microcontroller or microprocessor control loops would not have been fast enough or sufficiently deterministic. Although this controller design is capable of implementing a 46kHz control loop, the actual loop rate is 10kHz because that’s fast enough for this electromechanical system. The Zynq SoC’s 32-bit ARM Cortex-A9 processors in the MyRIO controller implement the system’s user interface and data logging.


This project won a 2017 NI Engineering Impact Award in the Advanced Research category and is documented in this NI case study.



Adam Taylor’s MicroZed Chronicles Part 199: The AD9467 SDSOC Platform

by Xilinx Employee ‎05-31-2017 03:36 PM - edited ‎05-31-2017 03:54 PM (2,999 Views)


By Adam Taylor


Having got the base hardware and software designs up and running, the next step is to create an SDSoC platform so that we can use this design efficiently. The SDSoC platform allows us to implement algorithms at a much higher level using C or C++. We can therefore develop C or C++ programs using SDSoC to access the ADC sample data within the DDR memory and verify that our algorithms work correctly. Once we are sure that we have the correct algorithmic function (but not necessarily the desired performance), we can accelerate these algorithms by putting them into the Zynq SoC’s programmable logic (PL) rather seamlessly. Taking such an approach enables us to use one base design for a range of applications. Because we are developing in a higher-level language, the time taken to produce the first working demonstration is reduced.


To generate an SDSoC platform, we need a Vivado base design, the necessary software libraries, and three definition files:


  • XPFM – This is the top-level definition of the Platform – Generated by hand
  • HPFM – The hardware definition of the platform – Generated by Vivado
  • SPFM – The software definition of the platform – Generated by hand


The first thing we need to do to create the SDSoC platform (I am using version 2016.3) is to modify the design in Vivado using the UG1146 requirements for a hardware platform. This means that we need to update the concatenation block and move the used interrupts down to the least significant inputs. This frees up the remaining interrupts so that SDSoC can use them when it accelerates an algorithm using hardware. I also enabled all four FCLKs and Resets from the Zynq SoC’s PS (processing system) to the PL and instantiated the reset blocks for each of these clocks. I then followed the steps within UG1146 to create the hardware metadata to create one half of the platform. In this case, the hardware side of the SDSoC Platform makes available the AXI ACP, AXI HP2, AXI HP3, and AXI GP Master 1 connections. The other AXI interfaces are already in use by the existing AD9467 demo design.


There is one more thing we need as we create the hardware platform. Because this is a custom platform, which uses custom IP, we need to ensure that the IP is within the Vivado project for the SDSoC Platform. If it is not, then when we try to build our SDSoC platform we will get several failures in the build process because it cannot find IP information. The simplest method for preventing this problem is to use the Vivado Archive function to archive the design. Then the archived design will be extracted and used to define the SDSoC hardware platform.


To create the software platform (as we are using the ZedBoard for this example), I initially copied the software and top-level XML file from the <SDSoC Install>/platforms/zed directory, before editing them to reflect the needs of the platform:





Top Level of the ad9467_fmc_zed SDSoC Platform



These steps provided me with an SDSoC platform that I can use for development with the ZedBoard and the AD9467 FMC. My next step then was to perform some pipe cleaning to ensure that the platform functions as intended. To do this I wanted to:



  1. Build the AD9467 demo application and run it from within SDSoC with no acceleration.
  2. Create a simple acceleration example built onto the base hardware. For this I am going to use one of the matrix multiply examples.



As I did not declare a prebuilt platform, SDSoC will generate the hardware the first time we build the application. I did this to ensure that SDSoC can re-build the hardware design without any accelerations but with the custom IP blocks needed for the AD9467 demo.






Vivado Diagram as used for the AD9467 Demo application



Having built the first application successfully, I then ran it on the ZedBoard with the AD9467 FMC connected and observed the same performance as I had previously seen when using SDK. This means that I can start developing applications that use the data provided by the AD9467 within the SDSoC environment.






However, once I have finished generating and testing my algorithms in C/C++, I will want to accelerate elements of the design. That is where the second test of the platform comes in: to test that the platform is correctly defined and is therefore capable of accelerating C and C++ functions into the hardware. Within the AD9467 FMC SDSoC platform, I created an example application for acceleration using one of the predefined SDSoC examples: the mmult. This will add functionality necessary to perform the MMult within the hardware in addition to the base design we have been using for the AD9467.





Accelerating the mmult_accel function in the AD9467 FMC Zed Platform






Resultant SDSoC Vivado design, AD9467 FMC design with additional hardware for the mmult_accel function (circled in red)






MMULT results on the AD9467 FMC Zed Platform



Generating this SDSoC platform was pretty simple and it allows us to develop our applications much faster than would be the case if we were using a standard HDL-based approach. We will look at how we can do signal processing with this platform in future blogs.



I have uploaded the SDSoC Platform to the following GitHub repository, which is different from the standard one because of how the platform is organized.









MIT and Continuum develop “Human Organ Systems Under Test” chip using Zynq-based NI MyRIO controller

by Xilinx Employee ‎05-30-2017 04:52 PM - edited ‎05-30-2017 05:06 PM (3,219 Views)


Last week in Austin at NI Week, MIT Professor Dr. Dave Trumper and Senior Software and Electrical Engineer Jared Kirschner from Continuum demonstrated a “Human Physiome on a Chip,” a collection of human organ tissues linked by nutrient flows and controlled by a National Instruments (NI) MyRIO controller, which is based on a Xilinx Zynq Z-7010 SoC. The Human Physiome chip, developed by an MIT team headed by Dr. Linda Griffith along with Continuum, contains wells for as many as ten different human cells from organs including liver, brain, gut, heart, kidney, pancreas, and bone marrow. The purpose of this “organ systems under test” device is to study the way human organ systems may respond to various drug therapies in vitro. The work is funded by DARPA.






7-Organ Version of MIT’s Human Physiome Chip



The nutrient system for the Human Physiome chip consists of as many as a dozen micropumps. Each micropump consists of three small pneumatic valves operated in a sequence that moves the fluid nutrient through the chip. Continuum developed a micropump controller using NI’s MyRIO student controller, one of many Zynq-based products in the NI RIO line of controllers. This controller has 36 control channels (12 pumps times three valves per pump) and pressure sensing. Software to operate the controller is based on NI’s LabVIEW system development software. Continuum needed to develop a system that could control twelve micropumps with a 1kHz update rate and chose the Zynq-based MyRIO controller as the appropriate design solution for this application.
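The channel arithmetic above (12 pumps, three valves each, updated at 1kHz) can be sketched as follows. This is an illustration only. The channel numbering and the three-step valve pattern are placeholders invented for the example, not Continuum's actual mapping or valve timing, and the real controller is programmed in LabVIEW rather than Python.

```python
PUMPS = 12
VALVES_PER_PUMP = 3
CHANNELS = PUMPS * VALVES_PER_PUMP      # 36 control channels
UPDATE_RATE_HZ = 1000                   # 1kHz update rate quoted above
UPDATE_PERIOD_S = 1.0 / UPDATE_RATE_HZ  # time budget per update: 1 ms

# Placeholder pattern: one valve actuated per step. This is NOT the real
# pumping sequence, which the article does not describe.
HYPOTHETICAL_SEQUENCE = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]


def channel_index(pump, valve):
    """Map a (pump, valve) pair onto a flat channel number (assumed layout)."""
    if not (0 <= pump < PUMPS and 0 <= valve < VALVES_PER_PUMP):
        raise ValueError("pump or valve index out of range")
    return pump * VALVES_PER_PUMP + valve


def output_frame(step):
    """Build all 36 channel states for one update, with every pump at the same step."""
    pattern = HYPOTHETICAL_SEQUENCE[step % len(HYPOTHETICAL_SEQUENCE)]
    frame = [0] * CHANNELS
    for pump in range(PUMPS):
        for valve in range(VALVES_PER_PUMP):
            frame[channel_index(pump, valve)] = pattern[valve]
    return frame


if __name__ == "__main__":
    print(CHANNELS, UPDATE_PERIOD_S, output_frame(0)[:6])
```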





Continuum built its Micropump controller around the Zynq-based NI MyRIO Platform



Here’s a 7-minute NI Week video describing this extremely unusual control application:








Note: For more information about the MyRIO controller, see “How Xilinx All Programmable technology has fundamentally changed business at National Instruments.”







Blue Origin New Shepard Rocket Launch – photo courtesy of Blue Origin




Jason Smith, an Instrumentation and Controls Engineer at commercial space pioneer Blue Origin, spoke during a keynote at last week’s NI Week in Austin, TX. According to Smith, Blue Origin’s corporate mission is to “see millions of people living and working in space.” To that end, the company is developing two rocket systems. The first to be tested is the New Shepard, a fully reusable, vertical-takeoff, vertical-landing space vehicle designed for suborbital missions. (Alan Shepard was the first American to rocket into space aboard the Freedom 7 Mercury space capsule atop a Redstone rocket in a suborbital mission.) The New Shepard rocket uses one of the company’s 3rd-generation BE-3 engines, which burns liquid hydrogen using liquid oxygen as the oxidizer. Smith said that Blue Origin has already successfully launched and landed its New Shepard rockets five times and the first crewed flight is scheduled for next year.


The company is also developing the more powerful New Glenn rocket for low-Earth-orbit missions based on seven of its 4th-generation BE-4 engines, which burn liquefied natural gas, again using liquid oxygen as the oxidizer. (John Glenn became the first American to reach Earth orbit in 1962 in the Friendship 7 Mercury capsule atop an Atlas rocket.) Blue Origin expects the BE-4 engine to be ready for testing sometime this year, and United Launch Alliance (ULA), maker of the Atlas V and Delta IV launch systems, has chosen the BE-4 engine to power its next-generation Vulcan launch vehicle. The New Glenn rocket is designed to be reused as many as 100 times.





Blue Origin’s BE-4 Rocket Engine



Smith is responsible for developing test techniques and test cells to ensure that everything Blue Origin builds, from parachutes to guidance fins to reusable rocket engines, works in rigorous launch-and-land missions. To ensure that all rocket components work as designed, Blue Origin builds dedicated test stands with thousands of monitoring and control channels based on National Instruments’ (NI’s) equipment and LabVIEW and TestStand software. Although earlier test systems at Blue Origin were based on NI’s PXI modular instrumentation and CompactDAQ data acquisition systems, Smith is standardizing the design of his newest test systems using NI’s cRIO-9068 8-slot CompactRIO extended-temperature controller because these systems provide the autonomy and robustness required by the demanding test-cell environments. NI’s cRIO-9068 incorporates a Xilinx Zynq Z-7020 SoC, which provides that autonomy and robust control.






NI cRIO-9068 CompactRIO Extended-Temperature Controller based on a Xilinx Zynq Z-7020 SoC




Here’s a 7-minute video of Jason Smith’s keynote at NI Week that includes some great shots of the New Shepard rocket taking off and landing. (I advise you to turn up the volume for this one and watch the amazing thrust vectoring as the rocket sticks its landing.)





Adam Taylor’s MicroZed Chronicles Part 198: Building the 250Msamples/sec AD9467 FMC Card

by Xilinx Employee ‎05-30-2017 10:48 AM - edited ‎05-30-2017 10:50 AM (3,788 Views)


By Adam Taylor



Last week I mentioned the Analog Devices AD9467 FMC and how we could use it with the Xilinx SDSoC development environment to capture data with a simple data-capture chain and then develop and accelerate the algorithm using a high-level language like C or C++.





Analog Devices AD9467 FMC and Zynq-based Avnet ZedBoard Combined




The AD9467 FMC contains the AD9467 ADC, which provides 16-bit quantization at sampling rates of up to 250Msamples/sec (MSPS). These specs allow us to use the AD9467 to sample Intermediate Frequency (IF) signals. An IF is used to move an RF carrier wave down from or up to a higher frequency for reception or transmission.
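A few lines of arithmetic show what these specifications imply for the rest of the design. The sketch below assumes the ADC runs at its maximum 250Msamples/sec rate and that each sample occupies a full 16 bits with no packing or DMA overhead. The function names and the 140MHz IF used in the example are illustrative choices, not values from this blog.

```python
SAMPLE_RATE_HZ = 250e6  # maximum AD9467 sample rate quoted above
BITS_PER_SAMPLE = 16    # ADC resolution quoted above


def nyquist_bandwidth(fs):
    """Widest band that can be captured without aliasing at sample rate fs."""
    return fs / 2


def raw_data_rate_bytes(fs, bits):
    """Sample payload in bytes per second, ignoring packing and transfer overhead."""
    return fs * bits / 8


def alias_frequency(f_in, fs):
    """Frequency at which a tone at f_in appears in the sampled data."""
    f = f_in % fs
    return fs - f if f > fs / 2 else f


if __name__ == "__main__":
    print(nyquist_bandwidth(SAMPLE_RATE_HZ))                     # Hz
    print(raw_data_rate_bytes(SAMPLE_RATE_HZ, BITS_PER_SAMPLE))  # bytes/sec
    print(alias_frequency(140e6, SAMPLE_RATE_HZ))                # Hz
```

Under these assumptions the DMA path into DDR has to sustain about 500Mbytes/sec, and an IF above 125MHz folds back into the first Nyquist zone, which is the basis of IF undersampling.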


The first thing we need to do with the AD9467 board is to work out the clocking scheme we’ll use to provide the ADC with a sample clock. We have three options:


  1. Apply an externally generated sine wave. This option allows us to easily change the sampling frequency. However, to ensure good converter performance, we’ll need a low-jitter clock from a quality signal source.
  2. Use the on-board oscillator. This option provides a fixed 250MHz reference clock to the ADC. It has the advantage of being an on-board resource with a known good layout. However, its sampling frequency is fixed.
  3. Use the on-board AD9517—an SPI-controlled, 12-output clock generator. This option gives us the ability to set the sampling frequency as desired.


To change between the three sources, we add and remove ac coupling capacitors from the circuit to put the correct clock generator in the clock path. By default, the clock path is configured to use the external clock source.


However, before we can create an SDSoC Platform, we need to create a base design in Vivado. This base design interfaces with the AD9467 FMC and transfers the sampled data into the Zynq SoC’s PS (processing system) DDR memory using DMA. Rather helpfully, the AD9467 FMC comes with a Vivado example that we can use with the ZedBoard. This example design creates the structure to transfer samples into the PS DDR SDRAM using DMA.


To recreate this design, the first thing we need to do is download the Analog Devices GitHub repository, which contains both the shared IP elements required and the actual Vivado design example. To ensure we are using the latest possible tool chain, select the latest tool revision from GitHub and download a zip of the repository or clone the repository from here.


To build this project, we need to be using either a Linux box or, if we are using Microsoft Windows, we’ll need to download and install CYGWIN. If you are using CYGWIN, you need to make sure you have Vivado in your path.


To build the project you just need to use either a terminal or CYGWIN to navigate to the AD9467_FMC directory and execute the make file for the Zed version.





Make file running in CYGWIN to recreate the project




Once this has been recreated, we will be able to open our project in Vivado, explore the design in the block diagram, and export the design. We can then use the test application software to complete the demo.






AD9467 FMC example design




As can be seen in the above example, these steps add the FMC example into the existing Zynq base hardware design so that all the other interfaces like HDMI are still available. These additional interfaces can be very useful to us. In the diagram above, you can see the highlighted path from the AD9467 receiver IP, into a DMA IP block and then an AXI Interconnect block that connects to a Zynq HP (high-performance) AXI port. This design allows the data to move seamlessly into the PS DDR SDRAM for future processing.


Of course, to do this we need to run some software on the Zynq SoC’s ARM Cortex-A9 processor to configure the AD9467, the AD9517, and the simple internal processing pipeline. You can download the demo application example from here on GitHub. Helpfully, it comes with batch files (one for Linux, one for Windows), which are used to create the demo software application to support the Vivado design.


When we run this example on the Zynq SoC, we will find that it performs a number of tests prior to performing the first ADC sample capture.






Terminal Output from ZedBoard if the FMC is present




The samples will be stored at 0x0800_0000 within the DDR SDRAM. Using the debug facility within SDK, we can examine these values and see that they are updated when the sampling occurs.





DDR Memory location at 0x0800_0000 following power cycle






DDR Memory Location at 0x0800_0000 following the samples being captured




With this up and working, we can now think about how we can use the base platform efficiently to implement higher-level signal-processing algorithms.




Code is available on Github as always.






By Adam Taylor



So far on this journey (which is only just beginning) of looking at the Zynq UltraScale+ MPSoC, we have explored mostly the A53 processors within the Application Processing Unit (APU). However, we must not overlook the Real-Time Processing Unit (RPU), which contains two 32-bit ARM Cortex-R5 RISC processors and operates within the Low Power Domain of the Zynq MPSoC’s PS (processing system).






R5 RPU Architecture



The RPU executes real-time processing applications, including safety-critical applications. As such, you can use it for applications that must comply with IEC61508 or ISO 26262. We will be looking at this capability in more detail in a future blog. To support this, the RPU can operate in two distinct modes:


  • Split (Performance): both cores operate independently
  • Lock-Step: both cores operate in lockstep


Of course, selecting lock-step mode is one of the steps we take when implementing a safety application (see chapter 8 of the TRM for full safety and security capabilities). To provide deterministic processing times, both ARM Cortex-R5 cores include 128Kbytes of Tightly Coupled Memory (TCM) in addition to the caches and OCM (on-chip memory). How the TCMs are used depends upon the operating mode. In split mode, each processor has 128Kbytes of TCM (divided into A and B TCMs). In lock-step mode, there is one 256Kbyte TCM.





RPU in Lock Step Mode



At reset, the default setting configures the RPU to operate in lock-step mode. However, we can change between the operating modes while the processor group is in reset. We do this by updating the RPU Global Control Register SLCLAMP bit, which clamps the outputs of the redundant processors, and the SLSPLIT bit, which sets the operating mode. We cannot change the RPU’s operating mode during operation, so we need to decide upfront during the architectural phase which mode we desire for a given application.


However, we do not have to worry about setting these bits when we use the debugger or generate a boot image. Instead, we can use these tools to configure the operating mode. In the rest of this blog, I want to look at how we configure the RPU operating mode both in our debug applications and during boot-image generation.


The first way that we verify many of our designs is to use the System Debugger within SDK, which allows us to connect over JTAG or Ethernet and download our application. Using this method, we can of course use breakpoints and step through the code as it operates, to get to the bottom of any issues in the design. Within the debug configuration tab, we can also enable the RPU to operate in split mode if that’s the mode we want after system reset.





Debug Configuration to enable RPU Split Mode



When you download the code and run it on the Zynq MPSoC’s RPU, you will be able to see the operating mode within the debug window. This should match with your debug configuration setting.





Debug Window showing Lock-Step Mode



Once we are happy with the application, we will want to create a boot image and we will want to determine the RPU operating mode when we create that boot image. We can add the RPU elf to the FSBL, FPGA, and APU files using the boot-image dialog. To select the RPU mode, we choose the edit option and then select the destination CPU: either both ARM Cortex-R5 cores in lockstep or the ARM Cortex-R5 core we wish it to run on if we are using split mode.






Selecting the R5 Mode of operation when generating a boot image



Of course if we want to be sure we are in the correct mode in this operation, we need to read the RPU Global Control register and ensure the correct mode is selected as expected.
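As a sketch of that check, the function below decodes a value read from the RPU Global Control register. The bit positions (SLSPLIT at bit 3, SLCLAMP at bit 4) and the polarity (SLSPLIT set means split mode, SLCLAMP set means the redundant core's outputs are clamped) are assumptions that should be confirmed against the Zynq UltraScale+ register reference. The function name `rpu_mode` is hypothetical, and on the device the value would come from a memory-mapped register read rather than a Python integer.

```python
# Assumed bit positions within the RPU Global Control register.
# Verify these against the register reference before relying on them.
SLSPLIT_BIT = 3
SLCLAMP_BIT = 4

# TCM available in each mode, from the description earlier in this blog.
TCM_KBYTES = {"split": 128, "lock-step": 256}


def rpu_mode(glbl_cntl):
    """Decode the RPU operating mode from a register value."""
    slsplit = (glbl_cntl >> SLSPLIT_BIT) & 1
    slclamp = (glbl_cntl >> SLCLAMP_BIT) & 1
    if slsplit == 1 and slclamp == 0:
        return "split"
    if slsplit == 0 and slclamp == 1:
        return "lock-step"
    # Any other combination does not match either documented mode.
    return "inconsistent"


if __name__ == "__main__":
    for value in (0x08, 0x10, 0x18):
        mode = rpu_mode(value)
        print(hex(value), mode, TCM_KBYTES.get(mode))
```

Comparing the decoded mode with the one selected in the debug configuration or boot-image dialog gives a simple start-up sanity check.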


Now that we understand the different operating modes of the Zynq UltraScale+ MPSoC’s RPU, we can come back to these modes when we look at the security and safety capabilities provided by the Zynq MPSoC.



Code is available on Github as always.





TI has a new design example for a 2-device power converter to supply multiple voltage rails to a Xilinx Zynq UltraScale+ MPSoC for Remote Radio Heads and wireless backhaul applications, but the design looks usable across the board for many applications of the Zynq MPSoC. The two TI power-control and -conversion devices in this reference design are the TPS6508640 configurable, multi-rail PMIC for multicore processors and the TPS544C25 high-current, single-channel dc-dc converter. Here’s a simplified diagram of the design:




TI Remote Radio Head Power Supply Design Example.jpg



Please contact TI for more information about these power-control and –conversion devices.

Adam Taylor’s MicroZed Chronicles, Part 196: SDSoC and Levels of Abstraction

by Xilinx Employee ‎05-22-2017 09:40 AM - edited ‎05-22-2017 10:28 AM (4,522 Views)


By Adam Taylor



We have looked at SDSoC several times throughout this series. However, I recently organized and presented at the NMI FPGA Machine Vision event, and during the coffee breaks and lunch, attendees showed considerable interest in SDSoC, not only for its use in the Xilinx reVISION acceleration stack but also for its use in a range of other developments. As such, I thought it would be worth spending some time looking at what SDSoC is and the benefits we have previously gained using it. I also want to discuss a new use case.





SDSoC Development Environment




SDSoC is an Eclipse-based, system-optimizing compiler that allows us to develop our Zynq SoC or Zynq UltraScale+ MPSoC design in its entirety using C or C++. We can then profile the application to find aspects that cause performance bottlenecks and move them into the Zynq device’s Programmable Logic (PL). SDSoC does this using HLS (High Level Synthesis) and a connectivity framework that’s transparent to the user. What this means is that we are able to develop at a higher level of abstraction and hence reduce the time to market of the product or demonstration.


To do this, SDSoC needs a hardware platform, which can be pre-defined or custom. Typically, these platforms within the PL provide the basics: I/O interfaces and DMA transfers to and from the DDR SDRAM attached to the Zynq device’s PS (Processing System). This frees up most of the PL resources and PL/PS interconnects for SDSoC to use when it accelerates functions.


This ability to develop at a higher level and accelerate performance by moving functions into the PL enables us to produce very flexible and responsive systems. This blog has previously looked at acceleration examples including AES encryption, matrix multiplication, and FIR Filters. The reduction in execution time has been significant in these cases. Here’s a table of these previously discussed examples:





Previous Acceleration Results with SDSoC. Blogs can be found here




To aid us in the optimization of the final application, we can use pragmas to control the HLS optimizations. We can use SDSoC’s tracing and profiling capabilities while optimizing these accelerated functions and the interaction between the PS and PL.


Here’s an example of a trace:





Results of tracing an example application

(Orange = Software, Green = Accelerated function and Blue = Transfer)



Let us take a look at a simple use case to demonstrate SDSoC’s abilities.


Frequency Modulated Continuous Wave (FMCW) RADAR is used for a number of applications that require the ability to detect objects and gauge their distance. FMCW applications make heavy use of FFT and other signal-processing techniques such as windowing, Constant False Alarm Rate (CFAR), and target velocity and range extraction. These algorithms and models are ideal for description using a high-level language such as C / C++. SDSoC can accelerate the execution of functions described this way and such an approach allows you to quickly demonstrate the application.
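To make the range-extraction step concrete, here is a minimal sketch. It is written in Python for brevity, whereas an SDSoC design would express the same steps in C or C++. It assumes a linear sawtooth chirp, for which the target range follows from the beat frequency as R = c · f_beat · T_chirp / (2 · B). The function names and the example parameters are illustrative and do not come from the design described here, and a naive DFT stands in for the FFT that a real implementation would use.

```python
import cmath
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def beat_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


def target_range(f_beat, bandwidth, chirp_time):
    """Range in metres for a linear sawtooth chirp: R = c * f_beat * T / (2 * B)."""
    return SPEED_OF_LIGHT * f_beat * chirp_time / (2.0 * bandwidth)


if __name__ == "__main__":
    # Synthetic beat signal: a pure tone in DFT bin 8 of a 64-sample capture.
    fs, n = 64_000.0, 64
    capture = [math.sin(2 * math.pi * 8 * i / n) for i in range(n)]
    f_b = beat_frequency(capture, fs)
    print(f_b, target_range(f_b, bandwidth=150e6, chirp_time=1e-3))
```

The DFT loop is the kind of inherently parallel, regular computation that is a natural candidate for acceleration in the PL, while the final range calculation is cheap enough to stay in software.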


It is possible to create a simple FMCW receive demo using a ZedBoard and an AD9467 FPGA Mezzanine Card (FMC). At the simplest level, the hardware element of the SDSoC platform needs to be able to transfer samples received from the ADC into the PS memory space and then transfer display data from the PS memory space to the display, which in most cases will be connected with DVI or HDMI interfaces.






Example SDSoC Platform for FMCW application



This platform permits development of the application within SDSoC at a higher level. It also provides a platform that we can use for several different applications, not just FMCW. Rather helpfully, the AD9467 FMC comes with a reference design that can serve as the hardware element of the SDSoC Platform. It also provides drivers, which can be used as part of the software element.


With a platform in hand, it is possible to write the application within SDSoC using C or C++, where we can make use of the acceleration libraries and stacks including matrix multiplication, math functions, and the ability to wrap bespoke HDL IP cores and use them within the development.


Developing in this manner provides a much faster development process and a more responsive solution because it leverages the Zynq PL for inherently parallel or pipelined functions. It also makes it easier to upgrade designs. Because the majority of the development uses C or C++ and because SDSoC is a system-optimizing compiler, the application developer does not need to be an HDL specialist.




Code is available on Github as always.









Enea just announced that it has added a BSP (board support package) for the Zynq UltraScale+ MPSoC and ZCU102 Eval Kit to its POSIX-compliant, multicore OSE operating system. OSE offers embedded developers extremely low latency, low jitter, and minimal processing overhead to deliver bare-metal performance that extracts maximum performance from heterogeneous processors like the Zynq UltraScale+ MPSoC. According to Enea, OSE supports both SMP (symmetric multiprocessing) and AMP (asymmetric multiprocessing) and delivers linear performance scalability for MPSoCs with as many as 24 cores, so it should be able to easily handle the four or more 64- and 32-bit ARM Cortex-A53 and –R5 processor cores in the various Zynq UltraScale+ MPSoC family members.





Xilinx ZCU102 Eval Kit for the Zynq UltraScale+ MPSoC




Enea’s carrier-grade OSE has long been used in the telecom industry and is incorporated into more than half of the world's radio base stations. In addition, OSE is used in automotive, medical, and avionics designs.



Never at a loss for words, Adam Taylor has just published some additional thoughts on designing with Xilinx All Programmable devices over at the EEWeb.com site. His post, titled “Make Something Awesome with the $99 FPGA-Based Arty Development Board,” serves as a reminder or an invitation to attend the free May 31 Xilinx Webinar titled “Make Something Awesome with the $99 Arty Embedded Kit.”


Here’s what Adam has to say about FPGA design today:


“Both the maker and hobby communities are increasingly using FPGAs within their designs. This is thanks to the provision of boards at the right price point for the market, coupled with the availability of easy-to-use development tools that include simulation and High-Level Synthesis (HLS) capabilities.


“Let's be honest; compared to the reputation FPGAs have had historically, developing FPGA-based designs in this day-and-age is much simpler. This is largely thanks to a wide range of IP modules that are supplied with the development tools from board vendors and places like OpenCores.”



Adam’s article discusses two low-cost Digilent boards:








Digilent Arty Z7 Development Board




Adam concludes his article with this: “Overall, if you are looking to take your first steps into the world of FPGAs, then the Arty (Artix-based) or the Arty Z7 (Zynq 7000-based) should be high on your list of development boards to consider.”



High-Frequency Trading on Xilinx FPGAs? Aldec demos Kintex UltraScale board at Trading Show 2017, Chicago

by Xilinx Employee ‎05-17-2017 04:39 PM - edited ‎05-17-2017 05:07 PM (3,529 Views)


You’ve probably heard that “time equals money.” That’s especially true with high-frequency trading (HFT), which seeks high profits based on super-short portfolio holding periods driven by quant (quantitative) modeling. Microseconds make the difference in the HFT arena. As a result, a lot of high-frequency trading companies use FPGA-based hardware to make decisions and place trades and a lot of those companies use Xilinx FPGAs. No doubt that’s why Aldec is showing its HES-HPC-DSP-KU115 FPGA accelerator board at the Trading Show 2017 being held in Chicago, starting today.






Aldec HES-HPC-DSP-KU115 FPGA accelerator board




This board is based on two Xilinx All Programmable devices: the Kintex UltraScale KU115 FPGA and the Zynq Z-7100 SoC (the largest member of the Zynq SoC family). This board has been optimized for High Performance Computing (HPC) applications and prototyping of DSP algorithms thanks to the Kintex UltraScale KU115 FPGA’s 5520 DSP blocks. This board partners the Kintex UltraScale FPGA with six simultaneously accessible external memories—two DDR4 SODIMMs and four low-latency RLDRAMs—providing immense FPGA-to-memory bandwidth.


The Zynq Z-7100 SoC can operate as an embedded Linux host CPU and it can implement a PCIe host interface and multiple Gigabit Ethernet ports.


In addition, the Aldec HES-HPC-DSP-KU115 FPGA accelerator board has two QSFP+ optical-module sockets for 40Gbps network connections.




By Adam Taylor


When I demonstrated how to boot the ZedBoard using the TFTP server, there was one aspect I did not demonstrate: configuring the Zynq SoC’s PL (programmable logic) over the TFTP. It’s very simple to do. We can include the PL bin file along with the Kernel, RAM Disk, and Device Tree blob on the server and then allow U-Boot to configure the PL as it boots, just as we did for the other elements.


We can also configure the Zynq SoC’s PL at any time we want using either Linux or bare-metal applications. To do this we use the DevC (Device Configuration)/PCAP (Processor Configuration Access Port) within the Zynq SoC’s PS (processing system). There are three methods through which we can configure the PL: the most obvious is JTAG, followed by PCAP under PS control, and finally the ICAP (Internal Configuration Access Port). It is through the DevC interface that we configure the PL when the device boots using the FSBL or U-Boot. The ICAP path is the least-used method and requires a configured PL prior to its use. One example where you might use the ICAP path would be to allow a MicroBlaze soft-core processor to reconfigure the PL.







When the device is running, we can replace the contents of the PL with an updated design using the same interface. All that we need to do is to have generated the new bit file and ensure that it is accessible to the program running on the ARM Cortex-A9 processors in the Zynq SoC’s PS so that they can download it via the DevC interface.


If we are using Linux, we can upload the file into the file system using FTP. We can then use the built-in DevC driver within the Linux Kernel to download the bit file.








From a command prompt, we can enter the command:



cat {filename} > /dev/xdevcfg



to download the bit file. When I did this for a simple ZedBoard design, as shown below (one that includes the ability to drive the LEDs connected to the PL), the “Done” LED lit. Of course, to ensure correct operation we need to have the device tree blob correctly configured to support the PL design.
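If the download needs to happen from an application rather than from the shell, the same operation can be scripted. The sketch below is a Python equivalent of the `cat` command above: it streams the configuration file into the device node in chunks. The helper name `load_bitstream` and the chunk size are illustrative choices, and the sketch assumes the `/dev/xdevcfg` node is present and writable on the target.

```python
def load_bitstream(bitstream_path: str, device_path: str = "/dev/xdevcfg") -> int:
    """Copy a PL configuration file to the DevC device node; return bytes written."""
    written = 0
    with open(bitstream_path, "rb") as src, open(device_path, "wb") as dst:
        while True:
            chunk = src.read(64 * 1024)  # chunk size is an arbitrary choice
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
    return written


# Example use on the target (file name is a placeholder):
#     load_bitstream("design.bin")
```

Passing a different `device_path` makes it possible to exercise the function on a development machine by writing to an ordinary file instead of the device node.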








If we want to configure the Zynq SoC’s PL using bare-metal software, we can use a similar approach. The BSP comes with an example file that downloads a PL image using the DevC interface provided that we have the PL file loaded into the Zynq SoC’s attached DDR memory. We can access the example and include it within our design using the System.MSS file, which is provided when we generate a BSP.







To correctly use the example provided, we need to have a PL bit file loaded in the DDR Memory. For a production-ready system, we would have to store the PL configuration file within a non-volatile memory and then load it into the DDR at a known address before running the DevC example code. However, to demonstrate the concept, we can use the debugger to download the configuration file into the DDR at the desired memory location.


Within the application example, all we need to do is define the location of the configuration file and the size of the file:
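As a rough illustration of those two values, the Python sketch below derives them on the development machine and prints them as C definitions. It assumes the example expects the size as a count of 32-bit words, and the macro names `BIT_STREAM_LOCATION` and `BIT_STREAM_SIZE_WORDS` are taken from my recollection of the BSP example, so check them against the example source you actually generate. The DDR address used in the demo is only a placeholder.

```python
def size_in_words(size_bytes: int) -> int:
    """Round a byte count up to a whole number of 32-bit words."""
    return (size_bytes + 3) // 4


def format_defines(ddr_address: int, size_bytes: int) -> str:
    """Return C definitions for the file location and size (macro names assumed)."""
    return (
        f"#define BIT_STREAM_LOCATION   0x{ddr_address:08X}\n"
        f"#define BIT_STREAM_SIZE_WORDS 0x{size_in_words(size_bytes):08X}"
    )


if __name__ == "__main__":
    # Placeholder address and size; use the real DDR address and file size.
    print(format_defines(0x00400000, 16))
```

On a real project, the byte count would come from the configuration file itself, for example via `os.path.getsize`.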







Now that we have seen how to reconfigure the PL in its entirety, we can use a similar approach to partially reconfigure regions within the PL, which we will look at in future blogs.




Code is available on Github as always.






LMI Technologies’ Gocator 3210 is a smart, metrology-grade, stereo-imaging snapshot sensor that produces 3D point clouds of scanned objects with 35μm accuracy over fields as large as 100x154mm at 4fps. The diminutive (190x142x49mm) Gocator 3210 pairs a 2Mpixel stereo camera with an industrial LED-based illuminator that projects structured blue light onto the subject to aid precise measurement of object width, height, angles, and radii. An integral Xilinx Zynq SoC accelerates these measurements so that the Gocator 3210 can scan objects at 4Hz, which LMI says is 4x the speed of such a sensor setup feeding raw data to a host CPU for processing. This fast scanning speed means that parts can pass by the Gocator for inspection on a production line without stopping for the measurement to be made. The Gocator uses a GigE interface for host connection.





LMI Technologies Gocator 3210 3D Smart Stereo Vision Sensor



LMI provides a browser-based GUI to process the point clouds and 3D models generated by the Gocator. That means the processing—which includes the calculation of object width, height, angles, and radii—all takes place inside of the Gocator. No additional host software is required.


Here’s a photo of LMI’s GUI showing a 3D scan of an automotive cylinder head (a typical application for this type of sensor):




LMI Gocator GUI.jpg



LMI also offers an SDK so that you can develop sophisticated inspection programs that run on the Gocator. The company has also produced an extensive series of interesting training videos for the Gocator sensor family.


Finally, here’s a short (3 minutes) but information-dense video explaining the Gocator’s features and capabilities:






LMI’s VP of Sales Len Chamberlain has just published a blog titled “Meeting the Demand for Application-Specific 3D Solutions” that further discusses the Gocator 3210’s features and applications.



A paper titled “Evaluating Rapid Application Development with Python for Heterogeneous Processor-based FPGAs” recently won the Best Short Paper award at the 25th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM 2017) held in Napa, CA. The paper discusses the advantages and efficiencies of Python-based development using the PYNQ development environment (based on the Python programming language and Jupyter Notebooks) and the Digilent PYNQ-Z1 board, which is based on the Xilinx Zynq SoC. The paper’s authors (Senior Computer Scientist Andrew G. Schmidt, Computer Scientist Gabriel Weisz, and Research Director Matthew French from the USC Viterbi School of Engineering’s Information Sciences Institute) evaluated the impact, the performance implications, and the bottlenecks associated with using PYNQ for application development on Xilinx Zynq devices. The authors then compared their Python-based results against existing C-based and hand-coded implementations.



The authors do a really nice job of describing what PYNQ is:



“The PYNQ application development framework is an open source effort designed to allow application developers to achieve a “fast start” in FPGA application development through use of the Python language and standard “overlay” bitstreams that are used to interact with the chip’s I/O devices. The PYNQ environment comes with a standard overlay that supports HDMI and Audio inputs and outputs, as well as two 12-pin PMOD connectors and an Arduino-compatible connector that can interact with Arduino shields. The default overlay instantiates several MicroBlaze processor cores to drive the various I/O interfaces. Existing overlays also provide image filtering functionality and a soft-logic GPU for experimenting with SIMT [single instruction, multiple threads] -style programming. PYNQ also offers an API and extends common Python libraries and packages to include support for Bitstream programming, directly access the programmable fabric through Memory-Mapped I/O (MMIO) and Direct Memory Access (DMA) transactions without requiring the creation of device drivers and kernel modules.”



They also do a nice job of explaining what PYNQ is not:



“PYNQ does not currently provide or perform any high-level synthesis or porting of Python applications directly into the FPGA fabric. As a result, a developer still must use create a design using the FPGA fabric. While PYNQ does provide an Overlay framework to support interfacing with the board’s IO, any custom logic must be created and integrated by the developer. A developer can still use high-level synthesis tools or the aforementioned Python-to-HDL projects to accomplish this task, but ultimately the developer must create a bitstream based on the design they wish to integrate with the Python [code].”



Consequently, the authors did not simply rely on the existing PYNQ APIs and overlays. They also developed application-specific kernels for their research based on the Redsharc project (see “Redsharc: A Programming Model and On-Chip Network for Multi-Core Systems on a Programmable Chip”) and they describe these extensions in the FCCM 2017 paper as well.




Redsharc Project.jpg




So what’s the bottom line? The authors conclude:


“The combining of both Python software and FPGA’s performance potential is a significant step in reaching a broader community of developers, akin to Raspberry Pi and Ardiuno. This work studied the performance of common image processing pipelines in C/C++, Python, and custom hardware accelerators to better understand the performance and capabilities of a Python + FPGA development environment. The results are highly promising, with the ability to match and exceed performances from C implementations, up to 30x speedup. Moreover, the results show that while Python has highly efficient libraries available, such as OpenCV, FPGAs can still offer performance gains to software developers.”


In other words, there’s a vast and unexplored territory—a new, more efficient development space—opened to a much broader system-development audience by the introduction of the PYNQ development environment.


For more information about the PYNQ-Z1 board and PYNQ development environment, see:






About the Author
  • Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.