UPGRADE YOUR BROWSER

We have detected your current browser version is not the latest one. Xilinx.com uses the latest web technologies to bring you the best online experience possible. Please upgrade to a Xilinx.com supported browser:Chrome, Firefox, Internet Explorer 11, Safari. Thank you!

 

Last month, Xilinx Product Marketing Manager Darren Zacher presented a Webinar on the extremely popular $99 Arty Dev Kit, which is based on a Xilinx Artix-7 A35T FPGA, and it’s now online. If you’re wondering if this might be the right way for you to get some design experience with the latest FPGA development tools and silicon, spend an hour with Zacher and Arty. The kit is available from Avnet and Digilent.

 

Register to watch the video here.

 

 

ARTY v4.jpg 

 

 

For more information about the Arty Dev Kit, see: “ARTY—the $99 Artix-7 FPGA Dev Board/Eval Kit with Arduino I/O and $3K worth of Vivado software. Wait, What????

 

 

 

 

 

Avnet has formally introduced its MiniZed dev board based on the Xilinx Zynq Z-7000S SoC with the low, low price of just $89. For this, you get a Zynq Z-7007S SoC with one ARM Cortex-A9 processor core, 512Mbytes of DDR3L SDRAM, 128Mbits of QSPI Flash, 8Gbytes of eMMC Flash memory, WiFi 802.11 b/g/n, and Bluetooth 4.1. The MiniZed board incorporates an Arduino-compatible shield interface, two Pmod connectors, and a USB 2.0 host interface for fast peripheral expansion. You’ll also find an ST Microelectronics LIS2DS12 Motion and temperature sensor and an MP34DT05 Digital Microphone on the board. This is a low-cost dev board that packs the punch of a fast ARM Cortex-A9 processor, programmable logic, a dual-wireless communications system, and easy system expandability.

 

I find the software that accompanies the board equally interesting. According to the MiniZed Product Brief, the $89 price includes a voucher for an SDSoC license so you can program the programmable logic on the Zynq SoC using C or C++ in addition to Verilog or VHDL using Vivado. This is a terrific deal on a Zynq dev board, whether you’re a novice or an experienced Xilinx user.

 

Avnet’s announcement says that the board will start shipping in early July.

 

Stefan Rousseau, senior technical marketing engineer for Avnet, said, “Whether customers are developing a Linux-based system or have a simple bare metal implementation, with MiniZed, Zynq-7000 development has never been easier. Designers need only connect to their laptops with a single micro-USB cable and they are up and running. And with Bluetooth or Wi-Fi, users can also connect wirelessly, transforming a mobile phone or tablet into an on-the-go GUI.”

 

 

 

Here’s a photo of the MiniZed Dev board:

 

 

Avnet MiniZed 3.jpg 

 

Avnet’s $89 MiniZed Dev Board based on a Xilinx Zynq Z-7007S SoC

 

 

And here’s a block diagram of the board:

 

 

MiniZed Block Diagram.jpg 

 

Avnet’s $89 MiniZed Dev Board Block Diagram

 

By Adam Taylor

 

We can create very responsive design solutions using Xilinx Zynq SoC or Zynq UltraScale+ MPSoC devices, which enble us to architect systems that exploit the advantages provided by both the PS (processor system) and the PL (programmable logic) in these devices. When we work with logic designs in the PL, we can optimize the performance of design techniques like pipelining and other UltraFast design methods. We can see the results of our optimization techniques using simulation and Vivado implementation results.

 

When it comes to optimizing the software, which runs on acceleration cores instantiated in the PS, things may appear a little more opaque. However, things are not what they might appear. We can gather statistics on our accelerated code with ease using the performance analysis capabilities built into XSDK. Using performance analysis, we can examine the performance of the software we have running on the acceleration cores and we can monitor AXI performance within the PL to ensure that the software design is optimized for the application at hand.

 

Using performance analysis, we can examine several aspects of our running code:

 

  • CPU Utilization – Percentage of non-idling CPU clock cycles
  • CPU Instructions Per Cycle – Estimated number of executed instructions per cycle
  • L1 Cache Data Miss Rate % – L1 data-cache miss rate
  • L1 Cache Access Per msec – Number of L1 data-cache accesses
  • CPU Write Instructions Stall per cycle – Estimated number of stall cycles per instruction
  • CPU Read Instructions Stall per cycle – Estimated number of stall cycles per instruction

 

For those who may not be familiar with the concept, a stall occurs when the cache does not contain the requested data, which must then be fetched from main memory. While the data is fetched, the core can continue to process different instructions using out-of-order (OOO) execution, however the processor will eventually run out of independent instructions. It will have to wait for the information it needs. This is called a stall.

 

We can gather these stall statistics thanks to the Performance Monitor Unit (PMU) contained within each of the Zynq UltraScale+ MPSoC’s CPUs. The PMU provides six profile counters, which are configured by and post processed by XSDK to generate the statistics above.

 

If we want to use the performance monitor within SDK, we need to work with a debug build and then open the Performance Monitor Perspective within XSDK. If we have not done so before, we can open the perspective as shown below:

 

 

Image1.jpg 

 

 

Image2.jpg

 

 

Opening the Performance Analysis Perspective

 

 

With the performance analysis perspective open, we can debug the application as normal. However, before we click on the run icon (the debugger should be set to stop at main, as default), we need to start the performance monitor. To do that, right click on the “System Debugger on Local” symbol within the performance monitor window and click start.

 

 

Image3.jpg

 

Starting the Performance Analysis

 

 

 

Then, once we execute the program, the statistics will be gathered and we can analyse them within XDSK to determine the best optimizations for our code.

 

To demonstrate how we can use this technique to deliver a more optimized system, I have created a design that runs on the ZedBoard and performs AES256 Encryption on 1024 packets of information. When this code was run the ZedBoard the following execution statistics were collected:

 

 

Image4.jpg

 

Performance Graphs

 

 

 

Image5.jpg

 

Performance Counters

 

 

 

So far, these performance statistics only look at code executing on the PS itself. Next time, we will look at how we can use the AXI Performance Monitor with XSDK. If we wish to do this, we need to first instrument the design in Vivado.

 

 

 

 

Code is available on Github as always.

 

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

When someone asks where Xilinx All Programmable devices are used, I find it a hard question to answer because there’s such a very wide range of applications—as demonstrated by the thousands of Xcell Daily blog posts I’ve written over the past several years.

 

Now, there’s a 5-minute “Powered by Xilinx” video with clips from several companies using Xilinx devices for applications including:

 

  • Machine learning for manufacturing
  • Cloud acceleration
  • Autonomous cars, drones, and robots
  • Real-time 4K, UHD, and 8K video and image processing
  • VR and AR
  • High-speed networking by RF, LED-based free-air optics, and fiber
  • Cybersecurity for IIoT

 

That’s a huge range covered in just five minutes.

 

Here’s the video:

 

 

 

 

 

Perhaps you think DPDK (Data Plane Development Kit) is a high-speed data-movement standard that’s strictly for networking applications. Perhaps you think DPDK is an Intel-specific specification. Perhaps you think DPDK is restricted to the world of host CPUs and ASICs. Perhaps you’ve never heard of DPDK—given its history, that’s certainly possible. If any of those statements is correct, keep reading this post.

 

Originally, DPDK was a set of data-plane libraries and NIC (network interface controller) drivers developed by Intel for fast packet processing on Intel x86 microprocessors. That is the DPDK origin story. Last April, DPDK became a Linux Foundation Project. It lives at DPDK.org and is now processor agnostic.

 

DPDK consists of several main libraries that you can use to:

 

  • Send and receive packets while minimizing the number of CPU cycles needed (usually less than 80)
  • Develop fast packet-capture algorithms
  • Run 3rd-party fast-path stacks

 

So far, DPDK certainly sounds like a networking-specific development kit but, as Atomic Rules’ CTO Shep Siegel says, “If you can make your data-movement problem look like a packet-movement problem,” then DPDK might be a helpful shortcut in your development process.

 

Siegel knows more than a bit about DPDK because his company has just released Arkville, a DPDK-aware FPGA/GPP data-mover IP block and DPDK PMD (Poll Mode Driver) that allow Linux DPDK applications to offload server cycles to FPGA gates in tandem with the Linux Foundation’s 17.05 release of the open-source DPDK libraries. Atomic Rules’ Arkville release is compatible with Xilinx Vivado 2017.1 (the latest version of the Vivado Design Suite), which was released in April. Currently, Atomic rules provides two sample designs:

 

 

  • Four-Port, Four-Queue 10 GbE example (Arkville + 4×10 GbE MAC)
  • Single-Port, Single-Queue 100 GbE example (Arkville + 1×100 GbE MAC)

 

(Atomic Rules’ example designs for Arkville were compiled with Vivado 2017.1 as well.)

 

 

These examples are data movers; Arkville is a packet conduit. This conduit presents a DPDK interface on the CPU side and AXI interfaces on the FPGA side. There’s a convenient spot in the Arkville conduit where you can add your own hardware for processing those packets. That’s where the CPU offloading magic happens.

 

Atomic Rules’ Arkville IP works well with all Xilinx UltraScale devices but it works especially well with Xilinx UltraScale+ All Programmable devices that provide two integrated PCIe Gen3 x16 controllers. (That includes devices in the Kintex UltraScale+ and Virtex UltraScale+ FPGA families and the Zynq UltraScale+ MPSoC device families.)

 

Why?

 

Because, as BittWare’s VP of Network Products Craig Lund says, “100G Ethernet is hard. It’s not clear that you can use PCIe to get [that bit rate] into a server [using one PCIe Gen3 x16 interface]. From the PCIe specs, it looks like it should be easy, but it isn’t.” If you are handling minimum-size packets, says Lund, there are lots of them—more than 14 million per second. If you’re handling big packets, then you need a lot of bandwidth. Either use case presents a throughput challenge to a single PCIe Root Complex. In practice, you really need two.

 

BittWare has implemented products using the Atomic Rules Arkville IP, based on its XUPP3R PCIe card, which incorporates a Xilinx Virtex UltraScale+ VU13P FPGA. One of the many unique features of this BittWare board is that it has two PCIe Gen3 x16 ports: one available on an edge connector and the other available on an optional serial expansion port. This second PCIe Gen3 x16 port can be connected to a second PCIe slot for added bandwidth.

 

However, even that’s not enough says Lund. You don’t just need two PCIe Gen3 x16 slots; you need two PCIe Gen2 Root Complexes and that means you need a 2-socket motherboard with two physical CPUs to handle the traffic. Here’s a simplified block diagram that illustrates Lund’s point:

 

 

BittWare XUPP3R PCIe Card with two processors.jpg 

 

 

BittWare’s XUPP3R PCIe Card has two PCIe Gen3 x16 ports: one on an edge connector and the other on an optional serial expansion port for added bandwidth

 

 

 

BittWare has used its XUPP3R PCIe card and the Arkville IP to develop two additional products:

 

 

 

Note: For more information about Atomic Rules’ IP and BittWare’s XUPP3R PCIe card, see “BittWare’s UltraScale+ XUPP3R board and Atomic Rules IP run Intel’s DPDK over PCIe Gen3 x16 @ 150Gbps.”

 

 

Arkville is a product offered by Atomic Rules. The XUPP3R PCIe card is a product offered by BittWare. Please contact these vendors directly for more information about these products.

 

 

 

 

 

By Adam Taylor

 

So far, our examination of the Zynq UltraScale MPSoC + has focused mainly upon the PS (processing system) side of the device. However, to fully utilize the device’s capabilities we need to examine the PL (programmable logic) side also. So in this blog, we will look at the different AXI interfaces between the PS and the PL.

 

 

Image1.jpg

 

Zynq MPSoC Interconnect Structure

 

 

 

These different AXI interfaces provide a mixture of master and slave ports from the PS perspective and they can be coherent or not. The PS is the master for the following interfaces:

 

  1. FPD High Performance Master (HPM) – Two interfaces within the Full Power Domain.
  2. LPD High Performance Master (HPM) – One Interface within the Low Power Domain.

 

For the remaining interfaces the PL is the master:

 

  1. FPD High Performance Coherent (HPC) – Two Interfaces within the Full Power Domain. These interfaces pass through the CCI (Cache Coherent Interconnect) and provide one-way coherency from the PL to the PS.
  2. FPD High Performance (HP) – Four Interfaces within the Full Power Domain. These interfaces provide non-coherent transfers.
  3. Low Power Domain – One interface within the Low Power Domain.
  4. Accelerator Coherency Port (ACP) – One interface within the Full Power Domain. This interface provides one-way coherency (IO) allowing PL masters to snoop the APU Cache.
  5. Accelerator Coherency Extension (ACE) – One interface within the Full Power Domain. This interface provides full coherency using the CCI. For this interface, the PL master needs to have a cache within the PL.

 

Except for the ACE and ACP interfaces, which have a fixed data width, the remaining interfaces have a selectable data width of 32, 64, or 128 bits.

 

To support the different power domains within the Zynq MPSoC, each of the master interfaces within the PS is provided with an AXI isolation block that isolates the interface should a power domain be powered down. To protect the APU and RPU from hanging up performing an AXI access, each PS master interface also has a AXI timeout block to recover from any incorrect AXI interactions—for example, if the PL is not powered or configured.

 

We can use these interfaces simply within our Vivado design, where we can enable, disable, and configure the desired interface.

 

 

Image2.jpg

 

 

 

Once you have enabled and configured the desired interfaces, you can connect them into your design in the PL. Within the simple example in this blog post, we are going to transfer data to and from a BRAM located within the PL.

 

 

Image3.jpg 

 

 

This example uses the AXI master connected to the low-power domain (LPD). However, both the APU and the RPU can address the BRAM via this interface thanks to the SMMU, the Central Switch, and the Low Power Switch. However, the use of the LPD AXI interconnect will allow the RPU to access the PL if the FPD (full-power domain) is powered down. Of course, it does increase complexity when using the APU.

 

This simple example performs the following steps:

 

  • Reads 256 addresses and check that they are all zero.
  • Write a count into the 256 addresses.
  • Read back the data stored in the 256 addresses to demonstrate that the data was written correctly.

 

 

 

Image4.jpg

 

Program Starting to read addresses for part 1

 

 

 
Image5.jpg

 

Data written to the first 256 BRAM addresses

 

 

 

Image6.jpg 

 

Data read back to confirm the write

 

 

The key element in our designs is selecting the correct AXI interface for the application and data transfers at hand and ensuring that we are getting the best possible performance from the interconnect. Next time we will look at the quality of service and the AXI performance monitor.

 

 

 

Code is available on Github as always.

 

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

Adam Taylor’s MicroZed Chronicles Part 199: The AD9467 SDSOC Platform

by Xilinx Employee ‎05-31-2017 03:36 PM - edited ‎05-31-2017 03:54 PM (2,999 Views)

 

By Adam Taylor

 

Having got the base hardware and software designs up and running, the next step is to create a SDSoC platform so that we can use this design efficiently. The SDSoC platform allows us to implement algorithms at a much higher level using C or C++. We can therefore develop C or C++ programs using SDSoC to access the ADC sample data within the DDR memory and verify that our algorithms work correctly. Once we are sure that we have the correct algorithmic function (but not necessarily the desired performance), we can accelerate these algorithms by putting them into the Zynq SoC’s programmable logic (PL) rather seamlessly. Taking such an approach enables us to use one base design for a range of applications. Because we are developing in a higher language, the time taken to produce the first working demonstration is reduced.

 

To generate an SDSoC platform, we need a Vivado base design, the necessary software libraries, and three definition files:

 

  • XPFM – This is the top-level definition of the Platform – Generated by hand
  • HPFM – The hardware definition of the platform – Generated by Vivado
  • SPFM – Defines the software definition of the platform – Generated by hand

 

The first thing we need to do to create the SDSoC platform (I am using version 2016.3) is to modify the design in Vivado using the UG1146 requirements for a hardware platform. This means that we need to update the concatenation block and move the used interrupts down to the least significant inputs. This frees up the remaining interrupts so that SDSoC can use them when it accelerates an algorithm using hardware. I also enabled all four FCLKs and Resets from the Zynq SoC’s PS (processing system) to the PL and instantiated the reset blocks for each of these clocks. I then followed the steps within UG1146 to create the hardware metadata to create one half of the platform. In this case, the hardware side of the SDSoC Platform makes available the AXI ACP, AXI HP2, AXI HP3, and AXI GP Master 1 connections. The other AXI interfaces are already in use by the existing AD9467 demo design.

 

There is one more thing we need as we create the hardware platform. Because this is a custom platform, which uses custom IP, we need to ensure that the IP is within the Vivado project for the SDSoC Platform. If it is not, then when we try to build our SDSoC platform we will get several failures in the build process because it cannot find IP information. The simplest method for preventing this problem is to use the Vivado Archive function to archive the design. Then the archived design will be extracted and used to define the SDSoC hardware platform.

 

To create the software platform (as we are using the ZedBoard for this example), I initially copied the software and top-level XML file from the <SDSoC Install>/platforms/zed directory, before editing them to reflect the needs of the platform:

 

 

Image1.jpg 

 

Top Level of the ad9467_fmc_zed SDSoC Platform

 

 

These steps provided me with an SDSoC platform that I can use for development with the ZedBoard and the AD9467 FMC. My next step then was to perform some pipe cleaning to ensure that the platform functions as intended. To do this I wanted to:

 

 

  1. Build the AD9467 demo application and run it from with SDSoC with no acceleration.
  2. Create a simple acceleration example built onto to the base hardware. For this I am going to use one of the matrix multiply examples.

 

 

As I did not declare a prebuilt platform, SDSoC will generate the hardware the first time we build the application. I did this to ensure that SDSoC can re-build the hardware design without any accelerations but with the custom IP blocks needed for the AD9467 demo.

 

 

 

Image2.jpg

 

Vivado Diagram as used for the AD9467 Demo application

 

 

Having built the first application successfully, I then ran it on the ZedBoard with the AD9467 FMC connected and observed the same performance as I had previously seen when using SDK. This means that I can start developing that use the data provided by the AD9467 within the SDSoC environment.

 

 

Image3.jpg

 

 

However, once I have finished generating and testing my algorithms in C/C++, I will want to accelerate elements of the design. That is where the second test of the platform comes in: to test that the platform is correctly defined and is therefore capable of accelerating C and C++ functions into the hardware. Within the AD9467 FMC SDSoC platform, I created an example application for acceleration using one of the predefined SDSoC examples: the mmult. This will add functionality necessary to perform the MMult within the hardware in addition to the base design we have been using for the AD9467.

 

 

Image4.jpg

 

Accelerating the mmult_accel function in the AD9467 FMC Zed Platform

 

 

 

Image5.jpg 

 

Resultant SDSoC Vivado design, AD9467 FMC design with additional hardware for the mmult_accel function (circled in red)

 

 

 

Image6.jpg

 

MMULT results on the AD9467 FMC Zed Platform

 

 

Generating this SDSoC platform was pretty simple and it allows us to develop our applications much faster than would be the case if we were using a standard HDL based approach. We will look at how we can do signal processing with this platform in future blogs.

 

 

I have uploaded the SDSoC Platform to the following git hub repository which is different to the standard one due to the organization of the platform.

 

 

 

 

 

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

 MicroZed Chronicles Second Year.jpg

 

 

Adam Taylor’s MicroZed Chronicles Part 198: Building the 250Msamples/sec AD9467 FMC Card

by Xilinx Employee ‎05-30-2017 10:48 AM - edited ‎05-30-2017 10:50 AM (3,788 Views)

 

By Adam Taylor

 

 

Last week I mentioned, the Analog Devices AD9467 FMC in the blog and how we could use it with the Xilinx SDSoC development environment to capture data with a simple data-capture chain and then develop and accelerate the algorithm using a high-level language like C or C++.

 

 

Image1.jpg

 

Analog Devices AD9467 FMC and Zynq-based Avnet ZedBoard Combined

 

 

 

The AD9467 FMC contains the AD9467 ADC, which provides 16-bit quantization at sampling rates of up to 250Msamples/sec (MSPS). These specs allow us to use the AD9467 to sample Intermediate Frequency (IF) signals. An IF is used to move an RF carrier wave down from or up to a higher frequency for reception or transmission.

 

The first thing we need to do with the AD9467 board is to work out the clocking scheme we’ll use to provide the ADC with a sample clock. We have three options:

 

  1. Apply an externally generated sine-wave. This option allows us to easily change the sampling frequency. However, to ensure good convertor performance, we’ll need a low-jitter clock from a quality signal source.
  2. Use the on-board oscillator. This option provides a fixed 250MHz reference clock to the ADC. It has the advantage of being an on-board resource with a known good layout. However, its sampling frequency is fixed.
  3. Use the on-board AD9517—an SPI-controlled, 12-output clock generator. This option gives us the ability to set the sampling frequency as desired.

 

To change between the three sources, we add and remove ac coupling capacitors from the circuit to put the correct clock generator in the clock path. By default, the clock path is configured to use the external clock source.

 

However, before we can create an SDSoC Platform, we need to create a base design in Vivado. This base design interfaces with the AD9467 FMC and transfers the sampled data into the Zynq SoC’s PS (processing system) DDR memory using DMA. Rather helpfully, the AD9467 FMC comes with a Vivado example that we can use with the ZedBoard. This example design creates the structure to transfer samples into the PS DDR SDRAM using DMA.

 

To recreate this design, the first thing we need to do is download the Analog Devices Git Hub repository, which contains both the shared IP elements required and the actual Vivado design example. To ensure we are using the latest possible tool chain, select the latest tool revision from the Git Hub and download a zip of the repository or clone the repository from here.

 

To build this project, we need to be using either a Linux box or, if we are using Microsoft Windows, we’ll need to download and install CYGWIN. If you are using CYGWIN, you need to make sure you have Vivado in your path.

 

To build the project you just need to use either a terminal or CYGWIN to navigate to the AD9467_FMC directory and execute the make file for the Zed version.

 

 

Image2.jpg 

 

Make file running in CYGWIN to recreate the project

 

 

 

Once this has been recreated, we will be able to open our project in Vivado, explore the design in the block diagram, and export the design. We can then use the test application software to complete the demo.

 

 

 

Image3.jpg 

 

AD9467 FMC example design

 

 

 

As can be seen in the above example, these steps add the FMC example into the existing Zynq base hardware design so that all the other interfaces like HDMI are still available. These additional interfaces can be very useful to us. In the diagram above, you can see the highlighted path from the AD9467 receiver IP, into a DMA IP block and then an AXI Interconnect block that connects to a Zynq HP (high-performance) AXI port. This design allows the data move seamless into the PS DDR SDRAM for future processing.

 

Of course to do this we need to run some software on the Zynq SoC’s ARM Cortex-A9 processor to configure the AD9467, the AD9517, and the simple internal processing pipeline. You can download the demo application example from here on GitHub. Helpfully, it comes with batch files (one for Linux one for Windows), which are used to create the demo software application to support the Vivado design.

 

When we run this example on the Zynq SoC, we will find that it performs a number of tests prior to performing the first ADC sample capture.

 

 

 

Image4.jpg 

 

Terminal Output from ZedBoard if the FMC is present

 

 

 

The samples will be stored at 0x0800_0000 within the DDR SDRAM. Using the debug facility within SDK, we can examine these values and see that they are updated when the sampling occurs.

 

 

Image5.jpg

 

DDR Memory location at 0x0800_0000 following power cycle

 

 

 

Image6.jpg

 

DDR Memory Location at 0x0800_0000 following the samples being captured

 

 

 

With this up and working, we can now think about how we can use the base platform efficiently to implement higher-level signal-processing algorithms.

 

 

 

Code is available on Github as always.

 

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg

 

 

 

By Adam Taylor

 

 

So far on this journey (which is only just beginning) of looking at the Zynq UltraScale+ MPSoC we have explored mostly the A53 processors within the Application Processing Unit (APU). However, we must not overlook the Real-Time Processing Unit (RPU), which contains two ARM Cortex-R5 32 bit RISC processors and operates within the Zynq MPSoC’s PS’ (processing systems’) Low Power Domain.

 

 

Image1.jpg 

 

 

R5 RPU Architecture

 

 

The RPU executes real-time processing applications, including safety-critical applications. As such, you can use it for applications that must comply with IEC61508 or ISO 26262. We will be looking at this capability in more detail in a future blog. To support this, the RPU can operate in two distinct modes:

 

  • Split or Performance: - Both cores operate independently
  • Lock-Step: - Both cores operate in lockstep

 

Of course, it is the lock-step mode which is implemented as one step when a safety application is being implemented (see chapter 8 of the TRM for full safety and security capabilities). To provide deterministic processing times, both ARM Cortex-R5 cores include 128KB of Tightly Coupled Memory (TCM) in addition to the Caches and OCM (on-chip memory). How the TCMs are used depends upon the operating mode. In Split mode, each processor has 128Kbytes of TCM (divided into A and B TCMs). In lock-step mode, there is one 256Kbyte TCM.

 

 

Image2.jpg

 

RPU in Lock Step Mode

 

 

At reset, the default setting configures the RPU to operate in lock-step mode. However, we can change between the operating modes while the processor group is in reset. We do this by updating the RPU Global Control Register SLCAMP bit, which clamps the outputs of the redundant processors, and the SLSPLIT bit, which sets the operating mode. We cannot change the RPU’s operating mode during operation, so we need to decide upfront during the architectural phase which mode we desire for a given application.

 

However, we do not have to worry about setting these bits when we use the debugger or generate a boot image. Instead we can use these to configure the operating mode. What I want to look at in the rest of the blog is look at how we configure the RPU operating mode both in our debug applications and boot-image generation.

 

The first way that we verify many of our designs is to use the System Debugger within SDK, which allows us to connect over JTAG or Ethernet and download our application. Using this method, we can of course use breakpoints and step through the code as it operates, to get to the bottom of any issues in the design. Within the debug configuration tab, we can also enable the RPU to operate in split mode if that’s the mode we want after system reset.

 

 

Image3.jpg 

 

Debug Configuration to enable RPU Split Mode

 

 

When you download the code and run it on the Zynq MPSoC’s RPU, you will be able to see the operating mode within the debug window. This should match with your debug configuration setting.

 

 

Image4.jpg

 

Debug Window showing Lock-Step Mode

 

 

Once we are happy with the application, we will want to create a boot image and we will want to determine the RPU operating mode when we create that boot image. We can add the RPU elf to the FSB, FPGA, and APU files using the boot-image dialog. To select the RPU mode, we choose the edit option and then select the destination CPU—either both ARM Cortex-R5 cores in lockstep or the ARM Cortex-R5 core we wish it run on if we are using split mode.

 

 

 

Image5.jpg

 

Selecting the R5 Mode of operation when generating a boot image

 

 

Of course if we want to be sure we are in the correct mode in this operation, we need to read the RPU Global Control register and ensure the correct mode is selected as expected.

 

Now that we understand the different operating modes of the Zynq UltraScale+ MPSoC’s RPU, we can come back to these modes when we look at the security and safety capabilities provided by the Zynq MPSoC.

 

 

Code is available on Github as always.

 

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

MicroZed Chronicles Second Year.jpg 

 

 

This week at its annual NI Week conference in Austin, Texas, National Instruments (NI) announced a new FlexRIO PXIe module, the PXIe-7915, based on three Xilinx Kintex UltraScale FPGAs. NI’s PCIe FlexRIO modules serve two purposes in NI-based systems: flexible, programmable, high-speed I/O and high-speed computation (usually DSP). NI’s customers access these FlexRIO resources using the company’s LabVIEW FPGA software, part of NI’s LabVIEW graphical development environment. Thanks to the Kintex UltraScale FPGA, new FlexRIO PXIe-7915 module contains significantly more programmable resources and delivers significantly more performance than previous FlexRIO modules, which are all based on earlier generations of Xilinx FPGAs. The set of graphs below shows the increased resources and performance delivered by the PXIe-7915 FlexRIO module in NI systems relative to previous-generation FlexRIO modules based on Xilinx Kintex-7 FPGAs:

 

 

 

FlexRIO UltraScale Graphs.jpg 

 

 

However, the UltraScale-based FlexRIO modules are not simply standalone products. They serve as design platforms for NI’s design engineers, who will use these platforms to develop many new, high-performance instruments. In fact, NI introduced the first two of these new instruments at NI Week 2017: the PXIe-5763 and PXIe-5764 high-speed, quad-channel, 16-bit digitizers. Here are the specs for these two new digitizers from NI:

 

 

NI FlexRIO Digitizers based on Kintex UltraScale FPGAs.jpg 

 

 

Previous digitizers in this product family employed parallel LVDS signals to couple high-speed ADCs to an FPGA. However, today’s fastest ADCs employ high-speed serial interfaces--particularly the JESD-204B interface specification—necessitating a new design. The resulting new design uses the FlexRIO PXIe-7915 card as a motherboard and the JESD204B ADC card as a mezzanine board, as shown in this photo:

 

 

NI FlexRIO PXIe-5764 Digitizer.jpg 

 

 

 

NI’s design engineers took advantage of the pin compatibility among various Kintex UltraScale FPGAs to maximize the flexibility of their design. They can populate the FlexRIO PXIe-7915 card with a Kintex UltraScale KU035, KU040, or KU060 FPGA depending on customer requirements. This flexibility allows them to create multiple products using one board layout—a hallmark of a superior, modular platform design.

 

Normally, you access the programmable-logic features of a Xilinx FPGA or Zynq SoC inside of an NI product using LabVIEW FPGA, and that’s certainly still true. However, NI has added something extra in its LabVIEW 2017 release: a Xilinx Vivado Project Export feature that provides direct access to the Xilinx Vivado Design Suite tools for hardware engineers experienced with writing HDL code for programmable logic. Here’s how it works:

 

 

LabVIEW Vivado Export Design Flow.jpg 

 

 

 

You can export all the necessary hardware files from LabVIEW 2017 to a Vivado project that is pre-configured for your specific deployment target. Any LabVIEW signal-processing IP in the LabVIEW design is included in the export as encrypted IP cores. As an added bonus, you can use the new Xilinx Vivado Project Export on all of NI’s FlexRIO and high-speed-serial devices based on Xilinx Kintex-7 or newer FPGAs.

 

 

NI has published a White Paper describing all of this. You’ll find it here.

 

Please contact National Instruments directly for more information about the new FlexRIO modules and LabVIEW 2017.

 

 

Adam Taylor’s MicroZed Chronicles, Part 196: SDSoC and Levels of Abstraction

by Xilinx Employee ‎05-22-2017 09:40 AM - edited ‎05-22-2017 10:28 AM (4,522 Views)

 

By Adam Taylor

 

 

We have looked at SDSoC several times throughout this series, however I recently organized and presented at the NMI FPGA Machine Vision event and during the coffee breaks and lunch, attendees showed considerable interest in SDSoC—not only for its use in the Xilinx reVISION acceleration stack but also its use in a range of over developments. As such, I thought it would be worth some time looking at what SDSoC is and the benefits we have previously gained using it. I also want to discuss a new use case.

 

 

Image1.jpg 

 

SDSoC Development Environment

 

 

 

SDSoC is an Eclipse-based, system-optimizing compiler that allows us to develop our Zynq SoC or Zynq UltraScale+ MPSoC design in its entirety using C or C++. We can then profile the application to find aspects that cause performance bottlenecks and move then into the Zynq device’s Programmable Logic (PL). SDSoC does this using HLS (High Level Synthesis) and a connectivity framework that’s transparent to the user. What this means is that we are able develop at a higher level of abstraction and hence reduce the time to market of the product or demonstration.

 

To do this, SDSoC needs a hardware platform, which can be pre-defined or custom. Typically, these platforms within the PL provide the basics: I/O interfaces and DMA transfers to and from Zynq device’s PS’ (Processing System’s) DDR SDRAM. This frees up most the PL resources and PL/PS interconnects to be used by SDSoC when it accelerates functions.

 

This ability to develop at a higher level and accelerate performance by moving functions into the PL enables us to produce very flexible and responsive systems. This blog has previously looked at acceleration examples including AES encryption, matrix multiplication, and FIR Filters. The reduction in execution time has been significant in these cases. Here’s a table of these previously discussed examples:

 

 

Image2.jpg

 

Previous Acceleration Results with SDSoC. Blogs can be found here

 

 

 

To aid us in the optimization of the final application, we can use pragmas to control the HLS optimizations. We can use SDSoC’s tracing and profiling capabilities while optimizing these accelerated functions and the interaction between the PS and PL.

 

Here’s an example of a trace:

 

 

Image3.jpg 

 

Results of tracing an example application

(Orange = Software, Green = Accelerated function and Blue = Transfer)

 

 

Let us take a look at a simple use case to demonstrate SDSoC’s abilities.

 

Frequency Modulated Continuous Wave (FMCW) RADAR is used for a number of applications that require the ability to detect objects and gauge their distance. FMCW applications make heavy use of FFT and other signal-processing techniques such as windowing, Constant False Alarm Rate (CFAR), and target velocity and range extraction. These algorithms and models are ideal for description using a high-level language such as C / C++. SDSoC can accelerate the execution of functions described this way and such an approach allows you to quickly demonstrate the application.

 

It is possible to create a simple FMCW receive demo using a ZedBoard and an AD9467 FPGA Mezzanine Card (FMC). At the simplest level, the hardware element of the SDSoC platform needs to be able to transfer samples received from the ADC into the PS memory space and then transfer display data from the PS memory space to the display, which in most cases will be connected with DVI or HDMI interfaces.

 

 

 

Image4.jpg

 

Example SDSoC Platform for FMCW application

 

 

This platform permits development of the application within SDSoC at a higher level. It also provides a platform that we can use for several different applications, not just FMCW. Rather helpfully, the AD9467 FMC comes with a reference design that can serve as the hardware element of the SDSoC Platform. It also provides drivers, which can be used as part of the software element.

 

With a platform in hand, it is possible to write the application within the SDSoC using C or C++, where we can make use of the acceleration libraries and stacks including matrix multiplication, math functions, and the ability to wrap bespoke HLD IP cores and use them within the development.

 

Developing in this manner provides a much faster development process, and provides a more responsive solution as it leverages the Zynq PL for inherently parallel or pipelined functions. It also makes it easier to upgrade designs in terms. As the majority development will also use C or C++ and because SDSoC is a system-optimizing complier, the application developer does not need to be a HDL specialist.

 

 

 

Code is available on Github as always.

 

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

 

 

By Adam Taylor

 

When I demonstrated how to boot the ZedBoard using the TFTP server, there was one aspect I did not demonstrate: configuring the Zynq SoC’s PL (programmable logic) over the TFTP. It’s very simple to do. We can include the PL bin file along with the Kernel, RAM Disk, and Device Tree blob on the server and then allow U-Boot to configure the PL as it boots, just as we did for the other elements.

 

We can also configure the Zynq SoC’s PL at any time we want using either Linux or bare-metal applications. To do this we use the DevC (Device configuration)/PCAP (Processor Configuration Access Port) within the Zynq SoC’s PS (processing system). There are three methods through which we configure the PL. The most obvious being JTAG, followed by PCAP under PS control, with the final method being the ICAP (Internal Configuration Access Port). It is through the DevC interface that we configure the PL when the device boots using the FSBL or U-Boot. The ICAP path is the least-used method and requires a configured PL prior to its use. One example where you might use the ICAP path would be to allow a MicroBlaze soft-core processor to reconfigure the PL.

 

 

Image1.jpg 

 

 

 

When the device is running, we can replace the contents of the PL with an updated design using the same interface. All that we need to do is to have generated the new bit file and ensure that it is accessible to the program running on the ARM Cortex-A9 processors in the Zynq SoC’s PS so that they can download it via the DevC interface.

 

If we are using Linux, we can upload the file into the file system using FTP. We can then use the built-in DevC driver within the Linux Kernel to download the bit file.

 

 

 

Image2.jpg 

 

 

 

From a command prompt, we can enter the command:

 

 

cat {filename} > /dev/xdevcfg

 

 

to download the bit file. When I did this for a simple Zedboard design, as shown below—which includes the ability to drive the LEDS connected to the PL—the “Done” LED lit. Of course, to ensure correct operation we need to have the device tree blob correctly configured to support the PL design.

 

 

 

Image3.jpg 

 

 

 

If we want to configure the Zynq SoC’s PL using bare-metal software, we can use a similar approach. The BSP comes with an example file that downloads a PL image using the DevC interface provided that we have the PL file loaded into the Zynq SoC’s attached DDR memory. We can access the example and include it within our design using the System.MSS file, which is provided when we generate a BSP.

 

 

 

Image4.jpg 

 

 

To correctly use the example provided, we need to have a PL bit file loaded in the DDR Memory. For a production-ready system, we would have to store the PL configuration file within a non-volatile memory and then load it into the DDR at a known address before running the DevC example code. However, to demonstrate the concept, we can use the debugger to download the configuration file into the DDR at the desired memory location.

 

Within the application example, all we need to do is define the location of the configuration file and the size of the file:

 

 

 

Image5.jpg

 

 

Having demonstrated how we can reconfigure the PL in its entirety, we can also use a similar approach to partially reconfigure regions within the PL, which we will look at in future blogs.

 

 

 

Code is available on Github as always.

 

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

MicroZed Chronicles Second Year.jpg 

 

 

 

The huge number of low-cost Arduino peripheral shields from dozens of vendors makes the Arduino form factor extremely attractive. Now you can take advantage of that shield library using the Zynq SoC with the inexpensive, €89 Trenz ArduZynq, which puts a single-core Xilinx Zynq Z-7007S SoC along with 512Mbytes of DDR3L SDRAM, 16Mbytes of SPI Flash memory, and a MicroSD card socket into an Arduino form factor.

 

Here’s a photo of the Trenz ArduZynq board:

 

 

Trenz ArduZynq.jpg 

 

 

€89 Trenz ArduZynq puts a single-core Xilinx Zynq Z-7007S SoC into the Arduino form factor

 

 

 

Here’s a pinout diagram of the ArduZynq:

 

 

Trenz ArduZynq Pinout.jpg

 

Trenz ArduZynq Pinout

 

 

 

 

Hardware and software design for this board is supported by the Xilinx Vivado Design Suite HL WebPACK Edition, downloadable at no cost.

 

 

Note: For more information about the single-core Zynq SoC family, see “One-ARM Zynq family joins the Zynq parade. Now you can choose from 31 devices with one, two, four, or six ARM microprocessors.”

 

 

By Adam Taylor

 

One of the great things about many of the Zynq SoC’s PS (processing system) peripherals is that we can break them out via the PL (programmable logic) I/O. This capability provides us with great flexibility at the system level as we can implement more peripherals than can be supported over the Zynq SoC’s MIO on its own. However, during the years of writing the MicroZed Chronicles, I have been asked questions by a few readers’ about mapping from the PS to the PL I/O using EMIO and how to map when using PS GPIO. So in this post, I am going to address those questions and provide a nice simple reference for how to do it.

 

We can break out many of the PS peripherals into the PL using EMIO. The exceptions are the USB ports, the SMC (static memory controller), and the QSPI Flash controller. There may be some performance degradation when the EMIO is used. For example, SDcard controller I/O operates at 50MHz when routed to the MIO and 25MHz when routed to the EMIO.

 

 

Image1.jpg

 

Peripheral and Routing to the EMIO

 

 

When we route signals to the EMIO, we will see the appropriate port appear at the top level of the Zynq IP block within Vivado. To enable these signals, we need to configure the SPI to use the EMIO, which is done on the MIO configuration tab of the Zynq IP Configuration Wizard. We can enable the SPI and from the IO drop down select EMIO. This will create a SPI port at the top level of the design.

 

 

Image2.jpg 

 

Selecting the EMIO for SPI

 

 

Image3.jpg  

Resultant SPI port on the Zynq Block with port added

 

 

 

 

We can then use the standard XDC constraints file to route the I/O to any of the PL pins as we would for a normal element within the PL design.

 

Where it gets slightly more complicated is when we are using the GPIO and decide to extend that using EMIO. Suddenly, we need to understand GPIO banks and GPIO Numbers and IO pins.

 

The Zynq-7000 series provides 54 GPIO signals in two banks dedicated the MIO (although if you use all 54 pins you cannot use any other peripherals). These banks consist of a 32-bit bank 0 and a 22-bit bank 1. Additionally, there are also two EMIO-only banks. Both are 32 bits wide. Within the EMIO, these banks provide 64 inputs, 64 outputs, and another 64 output enables that can be used as outputs, giving us a total of 192 I/O signals (64 Inputs, 128 Outputs).

 

These GPIO signals are numbered from 0 to 53, for banks within the PS MIO, and 54 to 117 for GPIO within the EMIO region. When we break these signals out into the EMIO, the Zynq IP block will show them on the Zynq IP block. Note that GPIO 0 on the Zynq port is Pin 54 for the ARM cores. These IO signals can then be routed to the PL IO as we would any other signal and tied to a specific IO pin and standard using the XDC file. The diagram below shows the relationship between the different elements:

 

 

 

Image4.jpg 

 

 

It does get slightly confusing however when we use software to drive the GPIO signals within SDK. To drive the desired GPIO pin, we must use either the bank or the pin number. For GPIO signals, the pin numbers range from 0 and 53. For the EMIO signals, pin numbers range from 54 to 117. Once we understand this and that we route the signals in the PL just like we do any other signal, we can quickly use the EMIO using the XGPIOPS library provided with the BSP.

 

Hopefully this makes things a little clearer to those still starting out the relationships among the Zynq software, the Zynq SoC’s PL, and the XDC file.

 

 

 

Code is available on Github as always.

 

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

Face it, you use PCIe to go fast. That’s the whole point of the interface. So when you move data over your PCIe buses, you likely want to go as fast as possible. Perhaps you’d like some tips on getting maximum PCIe performance when designing with Xilinx’s most advanced FPGAs. You’re in luck , there’s a new 13-minute video that discusses that topic.

 

The video covers these contributors to PCIe performance:

 

  • Selecting the appropriate link speed and width
  • Maximum payload size
  • Largest possible transfer size
  • Enabling the maximum number of DMA channels
  • Polling versus interrupts (polling is faster)

 

The video explores a PCIe design for the KCU105 Kintex UltraScale FPGA Evaluation Kit using the Vivado Design Suite’s graphical IP Integrator (IPI) tool. The design took only about 20 to 30 minutes to create using IPI.

 

The video then discusses the results of various performance experiments using this design. Results like this:

 

 

 

PCIe results.jpg 

 

 

 

Here’s the video:

 

 

 

 

 

Adam Taylor’s MicroZed Chronicles, Part 192: Pmod – What if there is no Driver?

by Xilinx Employee ‎05-08-2017 10:32 AM - edited ‎05-08-2017 10:33 AM (5,779 Views)

 

By Adam Taylor

 

 

We recently examined how we could use Pmods in our system. There are a lot of Pmods available from many vendors but drivers are not necessarily available for all of them. If there is no driver available, we can use the Pmod bridge in the Zynq SoC’s PL (programmable logic), which enables us to correctly map Pmod ports on our selected development board and to create our own Zynq PS (processing system) driver. If we were to explore one of the provided drivers, we would find these also use the Pmod bridge, coupled with an AXI IIC or SPI component.

 

 

 

Image1.jpg 

 

Pmod AD2 PL Driver Components

 

 

In this example, I am going to be using Digilent’s DA4 octal DAC Pmod. I’ll integrate it with Digilent’s dual ADC AD2 Pmod, which we previously used in the driver example. We will develop our own driver with the Pmod bridge, generate an analog signal using the DA4 Pmod, and then receive the signal using the AD2 Pmod.

 

 

 

Image2.jpg

 

Pmod DA4 test set up

 

 

The Pmod bridge allows us to define the input types for both the top and bottom row of the Pmod connector. We can select from either GPIO, UART, IIC, or SPI interfaces. We make this selection for each of the Pmod rows in line with the Pmod we wish to drive. Selecting the desired type makes the pinout of the Pmod connector align with the standard for the interface type.

 

For the DA4, we need to use a SPI interface on the top row only. With this selected, we need to provide the actual SPI communication channel. As we are using the Zynq SoC, we have two options. The first would be to use an AXI SPI IP block within the PL and connected to the bridge. The second approach—and the one I am going to use—is to connect the bridge to the Zynq PS’ SPI using EMIO. This choice provides us with the ability to wire the pins from the PS SPI ports to the bridge inputs directly.

 

To do this we need to read the standard to ensure we can map from the SPI pins to the input pins on the bridge in the correct order (e.g. which PS SPI signal is connected to IN_0?). As these pins on the bridge represent different interface types, they are named generically. The diagram below shows how I did it for the DA4. Once we have mapped the pins for this example, we can build the project, export it to SDK, and then write the software to drive our DA4.

 

 

Image3.jpg

 

 

We can use the SPI drivers created by the BSP within SDK to drive the DA4. To interact with the DA4, the first thing we need to do is initialize the SPI controller. Once we have set the SPI Options for clock phase and master operation, we can then define a buffer and use the polled-transfer mode to transfer the required information to the DA4. A more complex driver would use an interrupt-driven approach as opposed to a polled one.

 

 

Image4.jpg 

 

 

I have uploaded the file I created to drive the DA4 onto the git hub repository. To test it I drove a simple ramp output and used the scope feature in the Digilent Analog Discovery module to monitor the DAC output. I received the following signal:

 

 

Image5.jpg 

 

 

With this completed and the DA4 known to be working as expected, I connected the DA4 and the AD2 together so that the Zynq SoC could receive the signal:

 

 

Image6.jpg 

 

 

When doing this, we need to be careful to ensure the signal output by the DA4 is within the AD2 Pmod’s operating range.

 

Having completed this and shown that the DA4 is working on the hardware, we now understand how we can create drivers if there is no driver available.

 

 

Code is available on Github as always.

 

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

This week at the Embedded Vision Summit in Santa Clara, CA, Mario Bergeron demonstrated a design he’d created that combines real-time visible and IR thermal video streams from two different sensors. (Bergeron is a Senior FPGA/DSP Designer with Avnet.) The demo runs on an Avnet PicoZed SOM (System on Module) based on a Xilinx Zynq Z-7030 SoC. The PicoZed SOM is the processing portion of the Avnet PicoZed Embedded Vision Kit. An FMC-mounted Python-1300-C image sensor supplies the visible video stream in this demo and a FLIR Systems Lepton image sensor supplies the 60x80-pixel IR video stream. The Lepton IR sensor connects to the PicoZed SOM over a Pmod connector on the PicoZed.

 

Here’s a block diagram of this demo:

 

 

Avnet reVISION demo with PicoZed Embedded Vision Kit.jpg 

 

 

Bergeron integrated these two video sources and developed the code for this demo using the new Xilinx reVISION stack, which includes a broad range of development resources for vision-centric platform, algorithm, and application development. The Xilinx SDSoC Development Environment and the Vivado Design Suite including the Vivado HLS high-level synthesis tool are all part of the reVISION stack, which also incorporates OpenCV libraries and machine-learning frameworks such as Caffe.

 

In this demo, Bergeron’s design takes the visible image stream and performs a Sobel edge extraction on the video. Simultaneously, the design also warps and resizes the IR Thermal image stream so that the Sobel edges can be combined with the thermal image. The Sobel and resizing algorithms come from the Xilinx reVISION stack library and Bergeron wrote the video-combining code in C. He then synthesized these three tasks in hardware to accelerate them because they were the most compute-intensive tasks in the demo. Vivado HLS created the hardware accelerators for these tasks directly from the C code and SDSoC connected the accelerator cores to the ARM processor with DMA hardware and generated the software drivers.

 

Here’s a diagram showing the development process for this demo and the resulting system:

 

 

Avnet reVISION demo Project Diagram.jpg 

 

 

In the video below, Bergeron shows that the unaccelerated Sobel algorithm running in software consumes 100% of an ARM Cortex-A9 processor in the Zynq Z-7030 SoC and still only achieves about one frame/sec—far too slow. By accelerating this algorithm in the Zynq SoC’s programmable logic using SDSoC and Vivado HLS, Bergeron cut the processor load by more than 80% and achieved real-time performance. (By my back-of-the envelope calculation, that’s about a 150x speedup: going from 1 to 30 frames/sec and cutting the processor load by more than 80%.)

 

Here’s the 5-minute video of this fascinating demo:

 

 

 

 

 

 

For more information about the Avnet PicoZed Embedded Vision Kit, see “Avnet’s $1500, Zynq-based PicoZed Embedded Vision Kit includes Python-1300-C camera and SDSoC license.”

 

 

For more information about the Xilinx reVISION stack, see “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge,” and “Yesterday, Xilinx announced the reVISION stack for software-defined embedded-vision apps. Today, there’s two demo videos.”

 

Free May 31 Webinar on the $99 ARTY FPGA Dev Board: Make something Awesome!

by Xilinx Employee ‎05-03-2017 02:55 PM - edited ‎05-25-2017 04:38 PM (4,944 Views)

 

ARTY v3 Small.jpgLooking for a low-cost way to get into the most advanced FPGA tools and device families? The $99 ARTY Eval Kit available from Avnet and Digilent is an excellent choice. The ARTY board features a Xilinx Artix-7 A35T FPGA with 256Mbytes of DDR3 SDRAM, Ethernet, four Digilent Pmod ports, and a set of Arduino Shield headers. The Artix-7 A35T FPGA is big enough to implement the 32-bit MicroBlaze soft RISC processor core .

 

Perhaps best of all, the kit includes a voucher for a downloadable copy of the Xilinx Vivado HL Design Edition, device-locked to the Artix-7 A35T FPGA. The full-featured Vivado HL Design Edition includes a lot of great tools including the IP Integrator (IPI), which lets you quickly build FPGA-based designs using graphical representations of IP blocks with intelligent, automated stitching.

 

This edition of Vivado also includes Vivado HLS, so you can experiment with logic synthesis based on design descriptions written in C, C++, or SystemC. The $99 ARTY FPGA Dev Board is really a great way to get started with the industry’s most advanced design tools for FPGA-based design.

 

If you want a closer look at the ARTY Dev Kit, there’s a free, 1-hour Webinar on May 31 you might want to attend. Register here.

 

For more information about the ARTY eval Kit, see “ARTY—the $99 Artix-7 FPGA Dev Board/Eval Kit with Arduino I/O and $3K worth of Vivado software. Wait, What????

 

 

NI PXIe-5172.jpg

 

National Instruments’ (NI’s) PXIe-5172, the newest member of the company’s PXIe-517x family, features four or eight 14-bit, 250Msamples/sec channels. Like the existing members in the family, an on-board Xilinx Kintex-7 FPGA manages the PXIe-5172’s measurement and control features. What’s new is that a portion of the FPGA is now available for user-defined, real-time functions. You can use NI’s LabVIEW FPGA, which integrates with the Xilinx Vivado Design Suite, to define new DSP functions and advanced triggering for the DSO. (You cannot realize such real-time functions at these speeds using software-based microprocessor implementations. You need the speed of programmable hardware.)

 

 

 

Here’s a feature comparison chart of the PXIe-517x DSO family:

 

 

 

 

NI PXIe-517x DSO Family.jpg

 

 

 

 

Here’s a block diagram of a PXIe-517x DSO:

 

 

NI PXIe-517x DSO Family.jpg

 

National Instruments PXIe-517x DSO Block Diagram

 

 

 

Please contact NI directly for more information about the PXIe-517x DSO family.

 

 

Note: Xcell Daily previously discussed PXIe-517x DSO instruments. See “FPGA-based PXIe Digital Oscilloscope is part of National Instruments’ new wave of Software Designed Instruments.”

 

 

By Adam Taylor

 

We need to be able to create more advanced, event-driven applications for Xilinx Zynq UltraScale+ MPSoCs but before that can happen, we need to look at some of the more complex aspects of these devices. In particular, we need to understand how interrupts function within the Zynq MPSoC’s PS (processing system). As would be expected, the Zynq MPSoC’s interrupt structure is slightly more complicated than the Zynq SoC’s PS because the Zynq MPSoC has more processor cores.

 

 

 

Image1.jpg 

 

 

Architectural view of the Interrupt System

 

 

 

The Zynq UltraScale+ MPSoC’s interrupt architecture has four main elements:

 

  1. RPU Generic Interrupt Controller V1 (GIC) – Manages Interrupts within the RPU (real-time processing unit)
  2. APU Generic Interrupt Controller V2 (GIC) – Manages Interrupts within the APU (application processing unit) with support for virtualization
  3. Inter Process Interrupt (IPI) – Enables interrupts between processing units
  4. GIC Proxy – Collates Interrupts acting as a GIC for the PMU (performance monitor unit)

 

At the highest level, we can break these interrupts down into several groupings, which are supplied to each element of the architecture:

 

  • Shared Peripheral Interrupts – 160 interrupt sources. Can be generated by the peripherals within the PS (e.g. IOU peripherals, PCIe etc.) and the PL (programmable logic) within the design
  • Private Peripheral Interrupts – These interrupts are private to a specific processor core
  • Software Generated Interrupts – Interrupts generated by software

 

Shared Peripheral Interrupts can also be sourced by the PL, which is where it gets interesting. We can enable interrupts in either direction between the PS and PL, from within the PS-PL configuration tab of the Zynq MPSoC customization GUI.

 

For the RPU, we are provided an IRQ and an FIQ for each processor core. For fast, low-latency responses, we should use the FIQ. For typical interrupt sources, we should use the IRQ.

 

 

 

Image2.jpg

 

 

RPU IRQ and FIQ Interrupts Enabled for each Core on the MPSoC

 

 

When it comes to the APU, we have two options for connecting the interrupts. The first option is to use the legacy IRQ and FIQ interrupts. There’s one of each for each processor core within the APU. When enabled at the top level of the Zynq MPSoC’s IP Block, we get two 4-bit ports—one for the IRQ and one for the FIQ. Again, the FIQ input provides lowest-latency interrupt response.

 

 

 

Image3.jpg 

 

 

APU IRQ and FIQ Interrupts Enabled for each Core on the MPSoC

 

 

 

The second approach to interrupts is to use interrupt groups. The APU’s GICv2 supports two interrupt groups: zero and one. You can assign interrupts within group zero to the IRQ or FIQ, while those within group 1 can only be assigned to IRQ. This assignment occurs internally within the GICv2. We can also use these interrupt groups when we implement secure environments, with group 0 being used for secure interrupts and group 1 for not-secure.

 

 

 

Image4.jpg 

 

 

APU IRQ Groups Enabled for each Core on the MPSoC

 

 

 

We can use the Inter-Processor Interrupt (IPI) to enable interrupts between the APU, RPU, and PMU. The IPI enables processors within the APU, RPU and PMU to interrupt each other. There IPI also has the ability to interrupt one or more softcore processors implemented within the Zynq MPSoC’s PL.

 

 

 

Image5.jpg

 

 

IPI Interrupt Numbers in the SPI

 

 

 

In addition to the interrupt, the IPI also provides a 32-byte IPI payload buffer for each direction, which can be used for limited communication. The IPI provides eight masters: the APU, RPU0, RPU1, and PMU, along with the LPD and FPD S AXI Interfaces. These masters can be changed from the default allocation by selecting advanced, on the MPSoC Re-Customisation GUI.

 

The final element of the interrupt structure is the Proxy GIC, which collates the shared interrupts connected to the LPU GIC and provides a series of Interrupt Status Registers, used by the PMU.

 

Now that we understand Zynq UltraScale+ MPSoC interrupts a little more, we will look at how we can use these within our designs going forward.

 

Code is available on Github as always.

 

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

Adam Taylor’s MicroZed Chronicles, Part 190: TFTP Boot

by Xilinx Employee on ‎05-01-2017 02:16 PM (4,492 Views)

 

By Adam Taylor

 

 

There are times when we want to configure our system over a network. This maybe because we are commissioning equipment in the lab and want to be able to pick up the latest development builds. Alternatively, we may have multiple systems all connected to a central hub, for example an inflight entertainment system, and configuring it over the network simplifies software updates.

 

In our Zynq SoC and Zynq UltraScale+ MPSoC designs, we can configure systems over the network using TFTP (the Trivial File Transfer Protocol) if we correctly configure U-Boot, the second-stage boot loader. To configure and implement this, we need the following:

 

  • A Linux machine (or a Virtual Machine running Linux) to modify and rebuild U-Boot
  • The latest PetaLinux Release for the Zynq SoC or MPSoC with uImage, Device Tree, and RAM Disk
  • The Xilinx SDK to create the Boot.bin file, which contains the FSBL and SSBL (first- and second-stage boot loaders)
  • The TFTP Server running on a network to serve the uImage, Device Tree, and RAM Disk

 

If this is a fresh Linux installation on a virtual machine, we need to ensure that we have the correct packages loaded to support the build. If not already present, we’ll need to install git, libssl, and ARM’s cross-compilation tools. We can do this by executing the following commands using a terminal (I am using Ubuntu 16.04.2 on my Virtual Machine):

 

sudo apt-get install git

sudo apt-get install libssl-dev

sudo apt-get install gcc-arm-linux-gnueabi

sudo apt-get install device-tree-compiler

sudo apt-get install device-tree-overlay

 

 

Once we have done this, we can clone the Xilinx U-Boot Git Hub onto the Linux installation and make the necessary changes to enable boot over TFTP.  To clone the repository, use the following command:

 

 

git clone git://github.com/Xilinx/u-boot-xlnx.git

 

 

Now that we have the source code for U-Boot, we can make the necessary modifications. But first, we should understand a little about how it works. U-Boot is a second-stage boot loader (SSBL), which is loaded into memory by the FSBL. The SSBL then loads the Linux image, Ram Disk, and device tree into memory, allowing the kernel to start. To do this, U-Boot can read the image, device tree, and RAM disc from several different media including an SD card, QSPI flash memory, NOR or NAND flash memory, or it can download these files using TFTP over Ethernet.

 

To ensure that we can boot from TFTP within U-Boot, we need to update the currently selected boot method to configure over a network. Using this boot method, the system will load the FSBL and U-Boot first from an SD Card or QSPI and will then look for the remaining images on an TFTP server. It will not look for these images on the selected media. This process is simple and requires only one file in U-Boot to be updated, which is Zynq_Common.h. You can find this file in:

 

 

<Cloned Location>/u-boot-xlnx/include/configs

 

 

For this example, I am going to be using the ZedBoard configured to boot from an SD Card. That means I need to change Zynq_common.h stored on the SD Card to look for boot files over the network instead of loading further data from the SD Card. We need to change the original script from:

 

 

Image1.jpg 

 

 

To the following:

 

 

Image2.jpg

 

 

We also need to define the ZedBoard’s IP addresses and the server’s address where the images are located. We do this by adding in the following commands within the CONFIG_EXTRA_ENV_SETTINGS definition:

 

 

 

Image3.jpg

 

Setting the server and ZedBoard IP Address

 

 

 

ipaddr is the IP address selected for the ZedBoard while serverip is the IP address of the TFTP server. This definition also defines the names of the image, device tree, and RAM Disk along with the addresses they will be loaded into. Unless you have a need to change them, I advise you to leave them unaltered.

 

Once these modifications have been completed, the next step is to re-build U-Boot and create a bin file to place on the SD Card. To build U-Boot, ensure that you are within the u-boot-xlnx directory and enter the following commands using a terminal:

 

 

export ARCH=arm

export CROSS_COMPILE=arm-linux-gnueabi-

make zynq_zed_config

make

 

 

These commands will produce an ELF file called u-boot. There will be no file extension. We can use this file with the Zedboard FSBL to create the necessary Boot.Bin file for our SD Card. For the purposes of this demo, I am using the latest release from here. I used the provided FSBL elf with the u-boot elf just created to generate a boot.bin within SDK’s create boot image tool.  

 

 

Image4.jpg 

 

 

Creating the BIN Image using SDK

 

 

 

Once the file has been created, we can copy it to the SD Card and insert the memory card into the ZedBoard. Then when we turn the ZedBoard on, it will attempt to locate the server. Therefore, we need to set up the TFTP server prior to this. I used my laptop and a program called TFTPD to create a server.

 

 

With the server set up and the ZedBoard connected to the network and powered on, the images were downloaded over the network and the Linux images were downloaded and successfully booted as shown below in the images.

 

 

 

Image5.jpg

 

Downloading the uImage over TFTP

 

 

 

 

Image6.jpg 

 

 PetaLinux running on the ZedBoard following the TFTP boot

 

 

 

Image7.jpg 

 

 Log File from the TFTP Server showing transfer of the images

 

 

 

 

I will upload the boot.bin file onto the GitHub for those who are using the ZedBoard. We can of course re-build u-boot for a range of development boards. There is more info on building u-boot here.

 

 

 

Code is available on Github as always.

 

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

 MicroZed Chronicles hardcopy.jpg

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

MicroZed Chronicles Second Year.jpg

 

 

 

Adam Taylor’s MicroZed Chronicles, Part 189: Zynq SoC XADC Sampling Modes

by Xilinx Employee ‎04-26-2017 11:04 AM - edited ‎04-26-2017 11:05 AM (4,205 Views)

 

By Adam Taylor

 

In some applications, we wish to maintain the phase relationship between sampled signals. The Zynq SoC’s XADC contains two ADCs, which we can operate simulateneously in lock step to maintain the phase relationship between two sampled signals. To do this, we use the sixteen auxillary inputs with Channels 0-7 assigned to ADC A and channeld 8-15 assigned to ADC B. In simultaneous mode, we can therefore perform conversions on channels 0 to 7 and at the same time, perform conversions on channels 8 to 15.

 

 

Image1.jpg 

 

 

In simultaneous mode, we can also continue to sample the on-chip parameters, however they are not sampled simultaneously. We are unable to perform automatic calibration in simultaneous mode but we can use another mode to perform calibration when needed. This should be sufficent because calibration is generally performed only on power up of the device for most applications.

 

To use the simulatenous mode, we first need a hardware design on Vivado that breaks out the AuX0 and AuX8 channels. On the Zedboard and MicroZed I/O Carrier Cards, these signals are broken out to the AMS connector. This allows me to connect signal sources to the AMS header to stimulate the I/O pin with a signal. For this example, I an using a Digilent Analog Discovery module as signal source.

  

The hardware design within the Zynq for this example appears below:

 

 

Image2.jpg

 

 

Writing the software in SDK for simultaneous mode is very similar to the other modes of operation we have used in the past. The only major difference is that we need to make sure we have configured the simultaneous channels in the sequencer. Once this is done and we have configured the input format we want—bipolar or unipolar, averaging, etc.—we can start the sequencer using the XSM_SEQ_MODE_SIMUL mode definition.

 

When I ran this on the MicroZed set up as shown above and stored 64 samples from both the AUX0 and AUX8 input using input signals that were offset by 180 degrees, I was able to recover the following waveform, which shows the phase relations ship is maintained:

 

 

Image3.jpg 

 

 

If we want, we can also use simultaneous-conversion mode with an external analog multiplexer. All we need to do is configure the design to use the external mux as we did previously. Perhaps the difference this time is that we need to use two external analog multiplexers because we need to be able to select the two channels to convert simultaneously. Also, we need only use three address bits to cover the 0-7 address range, as opposed four address bits that we needed for addressing all sixteen analog inputs when we previously used sequencer mode. We use the lower three address bits of the four available address bits.

 

 

 

Image4.jpg

 

 

 

At this point, the only XADC mode that we have not looked at is independent mode. This mode is like the XADC’s default (safe) mode, however in independent mode ADC A monitors the internal on chip parameters while ADC B samples the external inputs. Independent mode is intended to implement a monitoring mode. As such, the alarms are active so you can use this mode for implementing security and anti-tamper features in your design.

 

 

Code is available on Github as always.

 

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

Plethora IIoT develops cutting‑edge solutions to Industry 4.0 challenges using machine learning, machine vision, and sensor fusion. In the video below, a Plethora IIoT Oberon system monitors power consumption, temperature, and the angular speed of three positioning servomotors in real time on a large ETXE-TAR Machining Center for predictive maintenance—to spot anomalies with the machine tool and to schedule maintenance before these anomalies become full-blown faults that shut down the production line. (It’s really expensive when that happens.) The ETXE-TAR Machining Center is center-boring engine crankshafts. This bore is the critical link between a car’s engine and the rest of the drive train including the transmission.

 

 

 

Plethora IIoT Oberon System.jpg 

 

 

 

Plethora uses Xilinx Zynq SoCs and Zynq UltraScale+ MPSoCs as the heart of its Oberon system because these devices’ unique combination of software-programmable processors, hardware-programmable FPGA fabric, and programmable I/O allow the company to develop real-time systems that implement sensor fusion, machine vision, and machine learning in one device.

 

Initially, Plethora IIoT’s engineers used the Xilinx Vivado Design Suite to develop their Zynq-based designs. Then they discovered Vivado HLS, which allows you to take algorithms in C, C++, or SystemC directly to the FPGA fabric using hardware compilation. The engineers’ first reaction to Vivado HLS: “Is this real or what?” They discovered that it was real. Then they tried the SDSoC Development Environment with its system-level profiling, automated software acceleration using programmable logic, automated system connectivity generation, and libraries to speed programming. As they say in the video, “You just have to program it and there you go.”

 

Here’s the video:

 

 

 

 

Plethora IIoT is showcasing its Oberon system in the Industrial Internet Consortium (IIC) Pavilion during the Hannover Messe Show being held this week. Several other demos in the IIC Pavilion are also based on Zynq All Programmable devices.

 

Adam Taylor’s MicroZed Chronicles, Part 188: Pmods!

by Xilinx Employee ‎04-24-2017 10:34 AM - edited ‎04-24-2017 10:43 AM (7,323 Views)

 

By Adam Taylor

 

Over the length of this series, we have looked at several different development boards. One thing that is common to many of these boards: they provide one or more Pmod (Peripheral module) connections that allow us to connect small peripherals to our boards. Pmods expand our prototype designs to create final systems. We have not looked in much detail at Pmods but they are an important aspect of many developments. As such, it would be remiss for me not to address them.

The Pmod standard itself was developed by Digilent and is an open-source de facto standard to ensure wide adoption of this very useful interface. There’s a wide range of available Pmods from DA/AD convertors to GPS receivers and OLED displays.

 

Over the years, we have looked at several Zynq-based boards with at least one Pmod port. In some cases, these boards provide Pmod ports that are connected to either the Zynq SoC’s PL (programmable logic), the PS (processing system), or both. If a PS connection is used, we can use the Zynq SoC’s MIO to provide the interface. If the Pmod connection is to the PL, then we need to create our own interface to the Pmod device. Regardless of whether we use the PL or the PS, we will need a software driver to interface with it.

 

 

Image1.jpg

 

Various Zynq-based dev boards and their Pmod connections

 

 

 

That comment may initially bring you to the thought that we need to develop our own Pmod drivers from scratch. This of course increases the time it takes to develop the application. For many Pmods, this is not the case. There is wide range of existing drivers we can use for both the PL and PS we can use within our designs.

 

The first thing we need to do it download the Digilent Vivado library. This library contains several Pmod drivers and DVI sinks and sources plus other very useful IP blocks that can accelerate our design.

 

Once you have downloaded this library, examine the file structure. You will notice multiple folders under the Pmods folder. Each of these folders is named for an available Pmod (e.g. Pmod_AD2 which is ADC). Within each of these drivers, you will see files structures as shown below:

 

 

Image2.jpg 

 

 

Within this structure, the folders contain:

 

  • Drivers – C source files for use in SDK. These files provide drivers and, in many cases, example applications.
  • Src – The HDL source for the IP module.
  • Utils – Contains the board interfaces, e.g. the outputs.
  • Xgui – Contains TCL files used to instantiate the IP modules.

 

The next step, if we wish to use these IP modules, is to include the directory as a repository within our Vivado design. We do this by selecting the project settings within our project. We can add a new repository pointing to the Digilent Vivado library we have just downloaded using the IP settings repository manager tab:

 

 

 

Image3.jpg

 

 

 

Once this is done, we should be able to see the Pmod IP cores within the Vivado IP Catalog. We can then use these IP cores in within our design in the same way we use all other IP.

 

 

 

Image4.jpg

 

 

 

Once we have created our block diagram in Vivado, we can customize the Pmod IP blocks and select the Pmod Port they are connected to—assuming the board definition for the development board we are using supports that.

 

 

Image5.jpg

 

 

 

 

In the case below, which targets the new Digilent ARTY Z7 board, the AD2 Pmod is being connected to Pmod Port B:

 

 

Image6.jpg

 

 

 

If we are unable to find a driver for the Pmod we want to use, we can use the Pmod Bridge driver, which will enable us to create an interface to the desired Pmod with the correct pinout.

 

When it comes to software, all we need to do is import the files from the drivers/<Pmod_name>/src directory to our SDK project. Adding these files will provide a range of drives that we can use to interface with the Pmod PL instantiation and talk to the connected Pmod. If there is example code available, we will find this under the drivers/<Pmod name>/examples directory. When I ran this example code for the PmodAD2 it worked as expected:

 

 

Image7.jpg

 

 

This enables us to get our designs up and running even faster.

 

 

My code is available on Github as always.

 

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg

 

 

The Vivado Design Suite HLx Editions 2017.1 release is now available for download. The Vivado HL Design Edition and HL System Edition now support partial reconfiguration. Partial reconfiguration is available for the Vivado WebPACK Edition at a reduced price.

 

Xilinx partial reconfiguration technology allows you to swap FPGA-based functions in and out of your design on the fly, eliminating the need to fully reconfigure the FPGA and re-establish links. Partial reconfigurability gives you the ability to update feature sets in deployed systems, fix bugs, and migrate to new standards while critical functions remain active. This capability dramatically expands the flexible use of Xilinx All Programmable designs in a truly wide variety of applications.

 

For example, a detailed article published on the WeChat Web site by Siglent about the company’s new, entry-level SDS1000X-E DSO family—based on a Xilinx Zynq Z-7020 SoC—suggests that the new DSO family’s system design employs the Zynq SoC’s partial-reconfiguration capability to further reduce the parts count and the board footprint: “The PL section has 220 DSP slices and 4.9 Mb Block RAM; coupled with high throughput between the PS and PL data interfaces, we have the flexibility to configure different hardware resources for different digital signal processing.” (See “Siglent 200MHz, 1Gsample/sec SDS1000X-E Entry-Level DSO family with 14M sample points is based on Zynq SoC.”)

 

 

 

Siglent SDS1202X-E DSO.jpg
 

 

 

Siglent’s new, entry-level SDS1000X-E DSO family is based on a Xilinx Zynq Z-7020 SoC

 

 

 

In addition, the Vivado 2017.1 release includes support for the Xilinx Spartan-7 7S50 FPGA (Vivado WebPACK support will be in a later release). The Spartan-7 FPGAs are the lowest-cost devices in the 28nm Xilinx 7 series and they’re optimized for low, low cost per I/O while delivering terrific performance/watt. Compared to Xilinx Spartan-6 FPGAs, Spartan-7 FPGAs run at half the power consumption (for comparable designs) and with 30% more operating frequency. The Spartan-7 S50 FPGA is a mid-sized family member with 52,160 logic cells, 2.7Mbits of BRAM, 120 DSP slices, and 250 single-ended I/O pins. It’s a very capable FPGA. (For more information about the Spartan-7 FPGA family, see “Today, there are six new FPGAs in the Spartan-7 device family. Want to meet them?” and “Hot (and Cold) Stuff: New Spartan-7 1Q Commercial-Grade FPGAs go from -40 to +125°C!”)

 

 

Spartan-7 Family Table with 1Q devices.jpg 

 

Spartan-7 FPGA Family Table

 

 

 

 

 

AT&T recently announced the development of a one-of-a-kind 5G channel sounder—internally dubbed the “Porcupine” for obvious reasons—that can characterize a 5G transmission channel using 6000 angle-of-arrival measurements in 150msec, down from 15 minutes using conventional pan/tilt units. These channel measurements capture how wireless signals are affected in a given environment. For instance, channel measurements can show how objects such as trees, buildings, cars, and even people reflect or block 5G signals. The Porcupine allows measurement of 5G mmWave frequencies via drive testing, something that was simply not possible using other mmWave channel sounders. Engineers at AT&T used the mmWave Transceiver System and LabVIEW System Design Software including LabVIEW FPGA from National Instruments (NI) to develop this system.

 

 

 

AT&T Porcupine channel sounder.jpg

 

 

AT&T “Porcupine” 5G Channel Sounder

 

 

 

NI designed the mmWave Transceiver System as a modular, reconfigurable SDR platform for 5G R&D projects. This prototyping platform offers 2GHz of real-time bandwidth for evaluating mmWave transmission systems using NI’s modular transmit and receive radio heads in conjunction with the transceiver system’s modular PXIe processing chassis.

 

The key to this system’s modularity is NI’s 18-slot PXIe-1085 chassis, which accepts a long list of NI processing modules as well as ADC, DAC, and RF transceiver modules. NI’s mmWave Transceiver System uses the NI PXIe-7902 FPGA module—based on a Xilinx Virtex-7 485T—for real-time processing.

 

 

NI PXIe-7902 FPGA Module.jpg

 

 

NI PXIe-7902 FPGA module based on a Xilinx Virtex-7 485T

 

 

NI’s mmWave Transceiver System maps different mmWave processing tasks to multiple FPGAs in a software-configurable manner using the company’s LabVIEW System Design Software. NI’s LabVIEW relies on the Xilinx Vivado Design Suite for compiling the FPGA configurations. The FPGAs distributed in the NI mmWave Transceiver System provide the flexible, high-performance, low-latency processing required to quickly build and evaluate prototype 5G radio transceiver systems in the mmWave band—like AT&T’s Porcupine.

 

 

 

By Adam Taylor

 

Having introduced the Real-Time Clock (RTC) in the Xilinx Zynq UltraScale+ MPSoC, the next step is to write some simple software to set the time, get the time, and calibrate the RTC. Doing this is straightforward and aligns with how we use other peripherals in the Zynq MPSoC and Zynq-7000 SoC.

 

 

Image1.jpg

 

 

Like all Zynq peripherals, the first thing we need to do with the RTC is look up the configuration and then use it to initialize the peripheral device. Once we have the RTC initialized, we can configure and use it. We can use the functions provided in the xrtcpsu.h header file to initialize and use the RTC. All we need to do is correctly set up the driver instance and include the xrtcpsu.h header file. If you want to examine the file’s contents, you will find them within the generated BSP for the MPSoC. Under this directory, you will also find all the other header files needed for your design. Which files are available depends upon how you configured the MPSoC in Vivado (e.g. what peripherals are present in the design).

 

We need to use a driver instance to use the RTC within our software application. For the RTC, that’s XRtcPsu, which defines the essential information such as the device configuration, oscillator frequency, and calibration values. This instance is used in all interactions with the RTC using the functions in the xrtcpsu.h header file.

 

 

Image2.jpg

 

As I explained last week, the RTC counts the number of seconds, so we will need to convert to and from values in units of seconds. The xrtcpsu.h header file contains several functions to support these conversions. To support this, we’ll use a C structure to hold the real date prior to conversion and loading into the RTC or to hold the resultant conversion date following conversion from the seconds counter.

 

 

Image3.jpg

 

 

 

We can use the following functions to set or read the RTC (which I did in the code example available here):

 

  • XRtcPsu_GetCurrentTime – Gets the current time in seconds from the RTC
  • XRtcPsu_SecToDateTime – Converts the time in seconds to the date format contained within XRtcPSU_DT
  • XRtcPsu_DateTimeToSec – Converts the date in a format of XRtcPsu_DT into seconds
  • XRtcPsu_SetTime – Sets the RTC to the current time in seconds

 

By convention, the functions used to set the RTC seconds counter is based on a time epoch from 1/1/2000. If we are going to be using internet time, which is often based on a 1/1/1970 epoch by a completely different convention, we will need to convert from one format to another. The functions provided for the RTC only support years between 2000 and 2099.

 

In the example code, we’ve used these functions to report the last set time before allowing the user to enter the time over using a UART. Once the time has been set, the RTC is calibrated before being re-initialized. The RTC is then read once a second and the values output over the UART giving the image shown at the top of this blog. This output will continue until the MPSoC is powered down.

 

To really exploit the capabilities provided by the RTC, we need to enable the interrupts. I will look at RTC interrupts in the Zynq MPSoC in the next issue of the MicroZed Chronicles, UltraZed Edition. Once we understand how interrupts work, we can look at the RTC alarms. I will also fit a battery to the UltraZed board to test its operation on battery power.

 

The register map with the RTC register details can be found here.

 

 

My code is available on Github as always.

 

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

MicroZed Chronicles Second Year.jpg 

 

By Adam Taylor

 

 

Having introduced the Aldec TySOM-2 FPGA Prototyping Board, based on the Xilinx Zynq SoC, and the face detection application running on it, I thought it would be a good idea to take a more detailed examination of the face-detection application’s architecture.

 

The face detection example uses one Blue Eagle camera, which is connected to the Aldec FMC-ADAS card. The processed frames showing the detected face are output via the TySOM-2 board’s HDMI port. What is worth pointing out is that the application running on the TySOM-2 board, face detection in this case, is enabled by the software. The Zynq PL (programmable logic) hardware design provides the capability to interface with the camera, for sharing the video frames with the Zynq PS (processing system) through the DDR SDRAM, and for display output.

 

Any application could be implemented—not just face detection. It could be object tracking. I could be corner detection. It could be anything. This is one of the things that makes development of image-processing systems on the Zynq so powerful. We can use the same base platform on the TySOM-2 board and customize the application in software. Of course, we can also use the Xilinx SDSoC development environment to further accelerate the algorithm into the TySOM-2 platform’s remaining resources to increase performance.

 

The Blue Eagle camera transmits the video stream using a, FPD-Link III link. These links use a high-speed, bi-directional CML (Current Mode Logic) link to transfer the image data. An FPD-Link III receiving device (a TI DS90UB914Q-Q1 FPD-Link III SER/DES) is used on the ADAS FMC to implement this camera interface. This device is configured for the application in hand using the I2C peripheral in the Zynq SoC’s PS. This device provides video to the Zynq PL in a parallel format: the parallel data bits, HSync, VSync, and a pixel clock.

 

 

Image1.jpg 

 

 

We need to process the frames and store them within the Zynq PS’ DDR SDRAM using Video DMA (Direct Memory Access) to ensure that we can access the image frames within DDR memory using the Zynq SoC’s ARM Cortex-A9 processor. We need to use several IP blocks that come as standard IP within Vivado to implement this. These IP blocks transfer data using the AXI streaming protocol--AXIS.

 

Therefore, the first thing needed is to convert the received video in parallel format into an AXIS stream. Once the video is in the correct format, we can use the VDMA IP block to transfer video data to and from the Zynq PS’ DDR SDRAM, where the software running on the Zynq SoC’s ARM Cortex-A9 processors can access the frames and implement the application algorithms.

 

Unlike previous examples we have examined, which used a single AXI High Performance (AXI HP) port, this example uses two of the Zynq SOC’s AXI HP interface ports, one in each direction. This configuration requires a slightly more complicated DMA architecture because we’ll need two VDMA IP Blocks. Within the Zynq PL, the AXI standard used for most IP blocks is AXI 4.0 while the ports on the Zynq SoC implement AXI 3.0. Therefore, we need to use an AXI Interconnect or a protocol convertor to convert between the two standards.

 

 

Image2.jpg

 

 

 

This use of two interfaces will make no performance difference when compared to a single HP AXI interface because the S0 and S1 AXI HP Ports on the Zynq SoC which are used by this configuration are multiplexed down to the M0 port on the memory interconnect and finally connected to the S3 port on the DDR SDRAM controller. This is shown below in the interconnection diagram from UG585, the TRM for the Zynq SoC.

 

 

 

Image3.jpg 

 

 

Once the VDMA is implemented, the design then perform color-space conversion, chroma resampling, and finally passes to an on-screen display module. Once this has been completed, the video stream must be converted from AXIS to parallel video, which can then be output to the HDMI transmitter.

 

With this hardware platform completed, the next step is to write the software to create the application. For this we have the choice of using SDK or using SDSoC, which adds the ability to accelerate some of the application algorithm functions using programmable logic. As this example is implemented on the Zynq Z-7100 SoC, which has a significant amount of free, on-chip programmable resources following the implementation of the base platform, we’ll be using SDSoC for this example. We will look at the software architecture next time.

 

My code is available on Github as always.

 

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

MicroZed Chronicles Second Year.jpg 

 

 

If you have a mere 14 minutes to spare, you can watch this new video that will show you how to set up the Zynq UltraScale+ MPSoC’s hardened, embedded PCIe block as a Root Port using the Vivado Design Suite. The target system is the ZCU102 eval kit (currently on sale for half price) and the video shows you how to use the PetaLinux tools to connect to a PCIe-connected NVMe SSD.

 

This is a fast, painless way to see a complete set of Xilinx development tools being used to create a fully operational system based on the Zynq UltraScale+ MPSoC in less than a quarter of an hour.

 

 

 

 

 

Hardent, A Xilinx Approved Training Provider, is conducting a free 1-hour Webinar that will discuss best practices for Constraints Management using the the Vivado Design Suite on April 18. Topic covered will include:

 

  • Four steps to creating clock constraints
  • The importance of Baselining a design
  • The four different methods to enter design constraints
  • How to include variables and conditional statements within constraint files
  • How to modify and remove constraints both manually and using graphical tools
  • Shortcuts to creating and adding constraints directly from Xilinx reports and design-analysis tools

 

Register here.

 

 

Labels
About the Author
  • Be sure to join the Xilinx LinkedIn group to get an update for every new Xcell Daily post! ******************** Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.