
In this 40-minute webinar, Xilinx will present a new approach that allows you to unleash the power of the FPGA fabric in Zynq SoCs and Zynq UltraScale+ MPSoCs using hardware-tuned OpenCV libraries, with a familiar C/C++ development environment and readily available hardware development platforms. OpenCV libraries are widely used for algorithm prototyping by many leading technology companies and computer vision researchers. FPGAs can achieve unparalleled compute efficiency on complex algorithms like dense optical flow and stereo vision in only a few watts of power.

 

This Webinar is being held on July 12. Register here.

 

Here’s a fairly new, 4-minute video showing a 1080p60 Dense Optical Flow demo, developed with the Xilinx SDSoC Development Environment in C/C++ using OpenCV libraries:

 

 

 

 

For related information, see Application Note XAPP1167, “Accelerating OpenCV Applications with Zynq-7000 All Programmable SoC using Vivado HLS Video Libraries.”

 

Plethora IIoT develops cutting‑edge solutions to Industry 4.0 challenges using machine learning, machine vision, and sensor fusion. In the video below, a Plethora IIoT Oberon system monitors power consumption, temperature, and the angular speed of three positioning servomotors in real time on a large ETXE-TAR Machining Center for predictive maintenance—to spot anomalies with the machine tool and to schedule maintenance before these anomalies become full-blown faults that shut down the production line. (It’s really expensive when that happens.) The ETXE-TAR Machining Center is center-boring engine crankshafts. This bore is the critical link between a car’s engine and the rest of the drive train including the transmission.

 

 

 

Plethora IIoT Oberon System.jpg 

 

 

 

Plethora uses Xilinx Zynq SoCs and Zynq UltraScale+ MPSoCs as the heart of its Oberon system because these devices’ unique combination of software-programmable processors, hardware-programmable FPGA fabric, and programmable I/O allow the company to develop real-time systems that implement sensor fusion, machine vision, and machine learning in one device.

 

Initially, Plethora IIoT’s engineers used the Xilinx Vivado Design Suite to develop their Zynq-based designs. Then they discovered Vivado HLS, which allows you to take algorithms in C, C++, or SystemC directly to the FPGA fabric using hardware compilation. The engineers’ first reaction to Vivado HLS: “Is this real or what?” They discovered that it was real. Then they tried the SDSoC Development Environment with its system-level profiling, automated software acceleration using programmable logic, automated system connectivity generation, and libraries to speed programming. As they say in the video, “You just have to program it and there you go.”

 

Here’s the video:

 

 

 

 

Plethora IIoT is showcasing its Oberon system in the Industrial Internet Consortium (IIC) Pavilion during the Hannover Messe Show being held this week. Several other demos in the IIC Pavilion are also based on Zynq All Programmable devices.

 

 

You are never going to get past a certain performance barrier by compiling C for a software-programmable processor. At some point, you need hardware acceleration.

 

As an analogy: You can soup up a car all you want; it’ll never be an airplane.

 

Sure, you can bump the processor clock rate. You can add processor cores and distribute the tasks. Both of these approaches increase power consumption, so you’ll need a bigger and more expensive power supply; they increase heat generation, which means you will need better cooling and probably a bigger heat sink or a fan (or another fan); and all of these things increase BOM costs.

 

Are you sure you want to take that path? Really?

 

OK, you say. This blog’s from an FPGA company (actually, Xilinx is an “All Programmable” company), so you’ll no doubt counsel me to use an FPGA to accelerate these tasks and I don’t want to code in Verilog or VHDL, thank you very much.

 

Not a problem. You don’t need to.

 

You can get the benefit of hardware acceleration while coding in C or C++ by using the Xilinx SDSoC development environment. SDSoC automatically produces compiled software coupled to hardware accelerators, all generated directly from your high-level C or C++ code.

 

That’s the subject of a new Chalk Talk video just posted on the eejournal.com Web site. Here’s one image from the talk:

 

 

SDSoC Acceleration Results.jpg

 

 

This image shows three complex embedded tasks and the improvements achieved with hardware acceleration:

 

 

  • 2-camera, 3D disparity mapping – 292x speed improvement

 

  • Sobel filter video processing – 30x speed improvement

 

  • Binary neural network – 1000x speed improvement

 

 

A beefier software processor or multiple processor cores will not get you 1000x more performance—or even 30x—no matter how you tweak your HLL code, and software coders will sweat bullets just to get a few percentage points of improvement. For such big performance leaps, you need hardware.

 

Here’s the 14-minute Chalk Talk video:

 

 

 

 

 

By Adam Taylor

 

 

Having introduced the Aldec TySOM-2 FPGA Prototyping Board, based on the Xilinx Zynq SoC, and the face-detection application running on it, I thought it would be a good idea to examine the face-detection application’s architecture in more detail.

 

The face-detection example uses one Blue Eagle camera, which is connected to the Aldec FMC-ADAS card. The processed frames showing the detected face are output via the TySOM-2 board’s HDMI port. What is worth pointing out is that the application running on the TySOM-2 board, face detection in this case, is enabled by the software. The Zynq PL (programmable logic) hardware design provides the capability to interface with the camera, to share video frames with the Zynq PS (processing system) through the DDR SDRAM, and to drive the display output.

 

Any application could be implemented—not just face detection. It could be object tracking. It could be corner detection. It could be anything. This is one of the things that makes development of image-processing systems on the Zynq so powerful. We can use the same base platform on the TySOM-2 board and customize the application in software. Of course, we can also use the Xilinx SDSoC development environment to accelerate parts of the algorithm into the TySOM-2 platform’s remaining programmable-logic resources to increase performance.

 

The Blue Eagle camera transmits the video stream over an FPD-Link III link. These links use a high-speed, bidirectional CML (Current Mode Logic) connection to transfer the image data. An FPD-Link III receiving device (a TI DS90UB914Q-Q1 FPD-Link III SER/DES) on the ADAS FMC implements this camera interface. This device is configured for the application in hand using the I2C peripheral in the Zynq SoC’s PS. It supplies video to the Zynq PL in a parallel format: the parallel data bits, HSync, VSync, and a pixel clock.

 

 

Image1.jpg 

 

 

We need to process the frames and store them within the Zynq PS’ DDR SDRAM using Video DMA (Direct Memory Access) to ensure that we can access the image frames within DDR memory using the Zynq SoC’s ARM Cortex-A9 processor. We need to use several IP blocks that come as standard IP within Vivado to implement this. These IP blocks transfer data using the AXI streaming protocol--AXIS.

 

Therefore, the first thing needed is to convert the received video in parallel format into an AXIS stream. Once the video is in the correct format, we can use the VDMA IP block to transfer video data to and from the Zynq PS’ DDR SDRAM, where the software running on the Zynq SoC’s ARM Cortex-A9 processors can access the frames and implement the application algorithms.

 

Unlike previous examples we have examined, which used a single AXI High Performance (AXI HP) port, this example uses two of the Zynq SoC’s AXI HP interface ports, one in each direction. This configuration requires a slightly more complicated DMA architecture because we’ll need two VDMA IP blocks. Within the Zynq PL, the AXI standard used by most IP blocks is AXI4, while the ports on the Zynq SoC implement AXI3. Therefore, we need to use an AXI Interconnect or a protocol converter to convert between the two standards.

 

 

Image2.jpg

 

 

 

Using two interfaces makes no performance difference compared to a single AXI HP interface because the Zynq SoC’s S0 and S1 AXI HP ports, which this configuration uses, are multiplexed down to the M0 port on the memory interconnect and finally connect to the S3 port on the DDR SDRAM controller. This is shown below in the interconnection diagram from UG585, the TRM for the Zynq SoC.

 

 

 

Image3.jpg 

 

 

Once the VDMA is implemented, the design performs color-space conversion and chroma resampling, and finally passes the stream to an on-screen display module. Once this has been completed, the video stream must be converted from AXIS back to parallel video, which can then be output to the HDMI transmitter.

 

With this hardware platform completed, the next step is to write the software to create the application. For this we have the choice of using SDK or using SDSoC, which adds the ability to accelerate some of the application algorithm functions using programmable logic. As this example is implemented on the Zynq Z-7100 SoC, which has a significant amount of free, on-chip programmable resources following the implementation of the base platform, we’ll be using SDSoC for this example. We will look at the software architecture next time.

 

My code is available on Github as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

MicroZed Chronicles Second Year.jpg 

 

 

Here’s a 40-minute teardown video of a Vision Research Phantom v5 high-speed, 1024x1024-pixel, 1000-frames/sec video camera (circa 2001) from tesla500’s YouTube video channel. His methodical teardown and excellent system-level explanation uncover a couple of “huge” Xilinx XC4020 FPGAs (circa 2000) on the timing and interface boards and Xilinx XC9500 CPLDs implementing the timing and control on the four high-speed capture-memory boards. There’s also a Hitachi SH-2 32-bit RISC processor with a hardware MAC (for DSP) on the timing board.

 

The XC4020 FPGAs are 3rd-generation devices that each have 784 CLBs (1560 LUTs total). They were big in their day but they’re very small now. These days, I think you could implement all of the digital timing and control circuitry in this camera including the SH-2 processor’s capabilities using the smallest single-core Zynq Z-7007S SoC—with the ARM Cortex-A9 processor in the Zynq SoC running considerably more than 20x faster than the turn-of-the-millennium SH-2 processor’s roughly 28MHz maximum clock rate.

 

Of course, Vision Research has moved far beyond 1000 frames/sec over the past 17 years. Its latest cameras can go 1000x faster than that, hitting 1M frames/sec when configured with the company’s FAST option (fast indeed!), while the Phantom v5 is no longer listed even on the company’s “discontinued cameras” page. Nevertheless, I found tesla500’s teardown and explanations fascinating and valuable.

 

Of course, Xilinx All Programmable devices have long been used to design advanced video equipment like the Vision Research Phantom v5 high-speed camera, which allows me to quickly remind you of the recent launch of the Xilinx reVISION stack for embedded-vision applications. (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”)

 

And now, here’s tesla500’s Vision Research Phantom v5 high-speed camera teardown video:

 

 

 

 

 

 

 

How to use machine learning for embedded vision—and many other embedded applications

by Xilinx Employee, 03-30-2017

 

Adam Taylor and Xilinx’s Sr. Product Manager for SDSoC and Embedded Vision Nick Ni have just published an article on the EE News Europe Web site titled “Machine learning in embedded vision applications.” That title’s pretty self-explanatory, but there are a few points I’d like to highlight. Then you can go read the full article yourself.

 

As the article states, “Machine learning spans several industry mega trends, playing a very prominent role within not only Embedded Vision (EV), but also Industrial Internet of Things (IIoT) and Cloud Computing.” In other words, if you’re designing products for any embedded market, you might well find yourself at a competitive disadvantage if you’re not adding machine-learning features to your road map.

 

This article closely ties machine learning to neural networks, including Feed-forward Neural Networks (FNNs), Recurrent Neural Networks (RNNs), Deep Neural Networks (DNNs), and Convolutional Neural Networks (CNNs). Neural networks are not programmed; they’re trained. Then, if they’re part of an embedded design, they’re deployed. Training is usually done using floating-point neural-network implementations but, for efficiency (power and cost), deployed neural networks can use fixed-point representations with very little or no loss of accuracy. (See “Counter-Intuitive: Fixed-Point Deep-Learning Inference Delivers 2x to 6x Better CNN Performance with Great Accuracy.”)
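To make the fixed-point idea concrete, here is a toy numpy sketch (my own illustration, not the Xilinx tool flow) that quantizes a set of floating-point weights to 8-bit fixed point and reports the worst-case rounding error:

import numpy as np

# Toy sketch only: quantize "trained" float weights to symmetric int8 and
# measure how small the reconstruction error is.
weights = np.random.randn(1000).astype(np.float32)   # stand-in for trained weights
scale = np.abs(weights).max() / 127.0                 # symmetric int8 scaling factor
q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
dequantized = q.astype(np.float32) * scale
print("max quantization error:", float(np.abs(weights - dequantized).max()))

For well-behaved weight distributions the rounding error stays a small fraction of the dynamic range, which is why fixed-point inference gives up so little accuracy while saving power and logic.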

 

The programmable logic inside of Xilinx FPGAs, Zynq SoCs, and Zynq UltraScale+ MPSoCs is especially good at implementing fixed-point neural networks, as described in this article by Nick Ni and Adam Taylor. (Go read the article!)

 

Meanwhile, this is a good time to remind you of the recent Xilinx introduction of the reVISION stack for neural network development using Xilinx All Programmable devices. For more information about the Xilinx reVISION stack, see:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

MathWorks’ HDL Coder wins Embedded World AWARD in Nuremberg last week

by Xilinx Employee, 03-21-2017

 

The organizers of last week’s Embedded World show in Nuremberg gave out embedded AWARDS in three categories during the show, and MathWorks’ HDL Coder won in the tools category. (See the announcement here.) If you don’t know about this unique development tool, now is a good time to become acquainted with it. HDL Coder accepts model-based designs created using MathWorks’ MATLAB and Simulink and can generate VHDL or Verilog for all-hardware designs, or hardware and software code for designs based on a mix of custom hardware and embedded software running on a processor. That means that HDL Coder works well with Xilinx FPGAs and Zynq SoCs.

 

Here’s a diagram of what HDL Coder does:

 

 

MathWorks HDL Coder.jpg 

 

 

You might also want to watch this detailed MathWorks video titled “Accelerate Design Space Exploration Using HDL Coder Optimizations.” (Email registration required.)

 

 

For more information about using MathWorks HDL Coder to target your designs for Xilinx devices, see:

 

 

 

 

 

 

 

 

 

AEye is the latest iteration of the eye-tracking technology developed by EyeTech Digital Systems. The AEye chip is based on the Zynq Z-7020 SoC. It’s located immediately adjacent to the imaging sensor, which makes for compact, stand-alone systems. This technology is finding its way into diverse vision-guided systems in the automotive, AR/VR, and medical diagnostic arenas. According to EyeTech, the Zynq SoC’s unique abilities allow the company to create products it could not build any other way.

 

With the advent of the reVISION stack, EyeTech is looking to expand its product offerings into machine learning, as discussed in this short, 3-minute video:

 

 

 

 

 

 

For more information about EyeTech, see:

 

 

 

 

EETimes’ Junko Yoshida with some expert help analyzes this week’s Xilinx reVISION announcement

by Xilinx Employee, 03-15-2017

 

This week, EETimes’ Junko Yoshida published an article titled “Xilinx AI Engine Steers New Course” that gathers some comments from industry experts and from Xilinx with respect to Monday’s reVISION stack announcement. To recap, the Xilinx reVISION stack is a comprehensive suite of industry-standard resources for developing advanced embedded-vision systems based on machine learning and machine inference.

 

(See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”)

 

As Xilinx Senior Vice President of Corporate Strategy Steve Glaser tells Yoshida, Xilinx designed the stack to “enable a much broader set of software and systems engineers, with little or no hardware design expertise, to develop intelligent vision-guided systems easier and faster.”

 

Yoshida continues:

 

While talking to customers who have already begun developing machine-learning technologies, Xilinx identified ‘8 bit and below fixed point precision’ as the key to significantly improve efficiency in machine-learning inference systems.

 

 

Yoshida also interviewed Karl Freund, Senior Analyst for HPC and Deep Learning at Moor Insights & Strategy, who said:

 

Artificial Intelligence remains in its infancy, and rapid change is the only constant.” In this circumstance, Xilinx seeks “to ease the programming burden to enable designers to accelerate their applications as they experiment and deploy the best solutions as rapidly as possible in a highly competitive industry.

 

 

She also quotes Loring Wirbel, a Senior Analyst at The Linley group, who said:

 

What’s interesting in Xilinx's software offering, [is that] this builds upon the original stack for cloud-based unsupervised inference, Reconfigurable Acceleration Stack, and expands inference capabilities to the network edge and embedded applications. One might say they took a backward approach versus the rest of the industry. But I see machine-learning product developers going a variety of directions in trained and inference subsystems. At this point, there's no right way or wrong way.

 

 

There’s a lot more information in the EETimes article, so you might want to take a look for yourself.

 

 

 

EEJournal’s Kevin Morris weighs in on Monday’s Xilinx reVISION stack launch for embedded-vision apps

by Xilinx Employee, 03-14-2017

 

Today, EEJournal’s Kevin Morris published a review article titled “Teaching Machines to See: Xilinx Launches reVISION” following Monday’s announcement of the Xilinx reVISION stack for developing vision-guided applications. (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”)

 

Morris writes:

 

But vision is one of the most challenging computational problems of our era. High-resolution cameras generate massive amounts of data, and processing that information in real time requires enormous computing power. Even the fastest conventional processors are not up to the task, and some kind of hardware acceleration is mandatory at the edge. Hardware acceleration options are limited, however. GPUs require too much power for most edge applications, and custom ASICs or dedicated ASSPs are horrifically expensive to create and don’t have the flexibility to keep up with changing requirements and algorithms.

 

“That makes hardware acceleration via FPGA fabric just about the only viable option. And it makes SoC devices with embedded FPGA fabric - such as Xilinx Zynq and Altera SoC FPGAs - absolutely the solutions of choice. These devices bring the benefits of single-chip integration, ultra-low latency and high bandwidth between the conventional processors and the FPGA fabric, and low power consumption to the embedded vision space.

 

Later on, Morris gets to the fly in the ointment:

 

“Oh, yeah. There’s still that “almost impossible to program” issue.”

 

And then he gets to the solution:

 

reVISION, announced this week, is a stack - a set of tools, interfaces, and IP - designed to let embedded vision application developers start in their own familiar sandbox (OpenVX for vision acceleration and Caffe for machine learning), smoothly navigate down through algorithm development (OpenCV and NN frameworks such as AlexNet, GoogLeNet, SqueezeNet, SSD, and FCN), targeting Zynq devices without the need to bring in a team of FPGA experts. reVISION takes advantage of Xilinx’s previously-announced SDSoC stack to facilitate the algorithm development part. Xilinx claims enormous gains in productivity for embedded vision development - with customers predicting cuts of as much as 12 months from current schedules for new product and update development.

 

In many systems employing embedded vision, it’s not just the vision that counts. Increasingly, information from the vision system must be processed in concert with information from other types of sensors such as LiDAR, SONAR, RADAR, and others. FPGA-based SoCs are uniquely agile at handling this sensor fusion problem, with the flexibility to adapt to the particular configuration of sensor systems required by each application. This diversity in application requirements is a significant barrier for typical “cost optimization” strategies such as the creation of specialized ASIC and ASSP solutions.

 

The performance rewards for system developers who successfully harness the power of these devices are substantial. Xilinx is touting benchmarks showing their devices delivering an advantage of 6x images/sec/watt in machine learning inference with GoogLeNet @batch = 1, 42x frames/sec/watt in computer vision with OpenCV, and ⅕ the latency on real-time applications with GoogLeNet @batch = 1 versus “NVidia Tegra and typical SoCs.” These kinds of advantages in latency, performance, and particularly in energy-efficiency can easily be make-or-break for many embedded vision applications.

 

 

But don’t take my word for it, read Morris’ article yourself.

 

 

 

 

 

As part of today’s reVISION announcement of a new, comprehensive development stack for embedded-vision applications, Xilinx has produced a 3-minute video showing you just some of the things made possible by this announcement.

 

Here it is:

 

 

Adam Taylor’s MicroZed Chronicles, Part 177: Introducing the reVision stack

by Xilinx Employee, 03-13-2017

 

By Adam Taylor

 

Several times in this series, we have looked at image processing using the Avnet EVK and the ZedBoard. Along with the basics, we have examined object tracking using OpenCV running on the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PS (processing system) and using HLS with its video library to generate image-processing algorithms for the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL (programmable logic, see blogs 140 to 148 here).

 

Xilinx’s reVISION is an embedded-vision development stack that provides support for a wide range of frameworks and libraries often used for embedded-vision applications. Most exciting, from my point of view, is that the stack includes acceleration-ready OpenCV functions.

 

Image1.jpg 

 

 

The stack itself is split into three layers. Once we select or define our platform, we will be mostly working at the application and algorithm layers. Let’s take a quick look at the layers of the stack:

 

  1. Platform layer: This is the lowest level of the stack and is the one on which the remaining stack layers are built. This layer includes platform definitions of the hardware and the software environment. Should we choose not to use a predefined platform, we can generate a custom platform using Vivado.

 

  2. Algorithm layer: Here we create our application using SDSoC and the platform definition for the target hardware. It is within this layer that we can use the acceleration-ready OpenCV functions along with predefined and optimized implementations for CNN (convolutional neural network) developments such as inference accelerators within the PL.

 

  3. Application Development layer: This is the highest layer of the stack, where high-level frameworks such as Caffe and OpenVX are used to complete the application.

 

As I mentioned above, one of the most exciting aspects of the reVISION stack is the ability to accelerate a wide range of OpenCV functions using the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL. We can group the OpenCV functions that can be hardware-accelerated in the PL into four categories:

 

  1. Computation – Includes functions such as absolute difference between two frames, pixel-wise operations (addition, subtraction and multiplication), gradient, and integral operations
  2. Input Processing – Supports bit-depth conversions, channel operations, histogram equalization, remapping, and resizing.
  3. Filtering – Supports a wide range of filters including Sobel, Custom Convolution, and Gaussian filters.
  4. Other – Provides a wide range of functions including Canny/Fast/Harris edge detection, thresholding, SVM, HoG, LK Optical Flow, Histogram Computation, etc.

 

What is very interesting about these function calls is that we can optimize them for resource usage or performance within the PL. The main optimization method is specifying the number of pixels to be processed during each clock cycle. For most accelerated functions, we can choose to process either one or eight pixels per clock. Processing more pixels per clock cycle reduces latency but increases resource utilization; processing one pixel per clock minimizes the resource requirements at the cost of increased latency. We select the number of pixels processed per clock cycle via the function call.

 

Over the next few blogs, we will look more at the reVISION stack and how we can use it. However, in the best Blue Peter tradition, the image below shows the result of running a reVISION Harris corner-detection OpenCV function accelerated within the PL.

 

 

Image2.jpg

 

 

Accelerated Harris Corner Detection in the PL

 

 

 

 

Code is available on Github as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

 

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg

 

Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge

by Xilinx Employee, 03-13-2017

 

Today, Xilinx announced a comprehensive suite of industry-standard resources for developing advanced embedded-vision systems based on machine learning and machine inference. It’s called the reVISION stack and it allows design teams without deep hardware expertise to use a software-defined development flow to combine efficient machine-learning and computer-vision algorithms with Xilinx All Programmable devices to create highly responsive systems. (Details here.)

 

The Xilinx reVISION stack includes a broad range of development resources for platform, algorithm, and application development including support for the most popular neural networks: AlexNet, GoogLeNet, SqueezeNet, SSD, and FCN. Additionally, the stack provides library elements such as pre-defined and optimized implementations for CNN network layers, which are required to build custom neural networks (DNNs and CNNs). The machine-learning elements are complemented by a broad set of acceleration-ready OpenCV functions for computer-vision processing.

 

For application-level development, Xilinx supports industry-standard frameworks including Caffe for machine learning and OpenVX for computer vision. The reVISION stack also includes development platforms from Xilinx and third parties, which support various sensor types.

 

The reVISION development flow starts with a familiar, Eclipse-based development environment; the C, C++, and/or OpenCL programming languages; and associated compilers all incorporated into the Xilinx SDSoC development environment. You can now target reVISION hardware platforms within the SDSoC environment, drawing from a pool of acceleration-ready, computer-vision libraries to quickly build your application. Soon, you’ll also be able to use the Khronos Group’s OpenVX framework as well.

 

For machine learning, you can use popular frameworks including Caffe to train neural networks. Within one Xilinx Zynq SoC or Zynq UltraScale+ MPSoC, you can use Caffe-generated .prototxt files to configure a software scheduler running on one of the device’s ARM processors to drive CNN inference accelerators—pre-optimized for and instantiated in programmable logic. For computer vision and other algorithms, you can profile your code, identify bottlenecks, and then designate specific functions that need to be hardware-accelerated. The Xilinx system-optimizing compiler then creates an accelerated implementation of your code, automatically including the required processor/accelerator interfaces (data movers) and software drivers.

 

The Xilinx reVISION stack is the latest in an evolutionary line of development tools for creating embedded-vision systems. Xilinx All Programmable devices have long been used to develop such vision-based systems because these devices can interface to any image sensor and connect to any network—which Xilinx calls any-to-any connectivity—and they provide the large amounts of high-performance processing horsepower that vision systems require.

 

Initially, embedded-vision developers used the existing Xilinx Verilog and VHDL tools to develop these systems. Xilinx introduced the SDSoC development environment for HLL-based design two years ago and, since then, SDSoC has dramatically and successfully shortened development cycles for thousands of design teams. Xilinx’s new reVISION stack now enables an even broader set of software and systems engineers to develop intelligent, highly responsive embedded-vision systems faster and more easily using Xilinx All Programmable devices.

 

And what about the performance of the resulting embedded-vision systems? How do their performance metrics compare against systems based on embedded GPUs or the typical SoCs used in these applications? Xilinx-based systems significantly outperform the best of this group, which employs Nvidia devices. Benchmarks of the reVISION flow using Zynq SoC targets against the Nvidia Tegra X1 have shown as much as:

 

  • 6x better images/sec/watt in machine learning
  • 42x higher frames/sec/watt for computer-vision processing
  • 1/5th the latency, which is critical for real-time applications

 

Image1.jpg 

 

There is huge value to having a very rapid and deterministic system-response time and, for many systems, the faster response time of a design that's been accelerated using programmable logic can mean the difference between success and catastrophic failure. For example, the figure below shows the difference in response time between a car’s vision-guided braking system created with the Xilinx reVISION stack running on a Zynq UltraScale+ MPSoC relative to a similar system based on an Nvidia Tegra device. At 65mph, the Xilinx embedded-vision system’s response time stops the vehicle 5 to 33 feet faster depending on how the Nvidia-based system is implemented. Five to 33 feet could easily mean the difference between a safe stop and a collision.
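The distance figures follow directly from vehicle speed multiplied by the difference in response time. Here’s a quick back-of-envelope Python check; the latency values below are my own assumptions chosen to reproduce the quoted range, not published Xilinx numbers:

# Back-of-envelope check of the braking example
speed_fps = 65 * 5280 / 3600            # 65 mph is roughly 95.3 feet per second
for extra_latency_s in (0.05, 0.35):    # assumed extra response time of the slower system
    print(round(speed_fps * extra_latency_s, 1), "feet")   # prints roughly 4.8 and 33.4 feet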

 

 

Image2.jpg 

 

(Note: This example appears in the new Xilinx reVISION backgrounder.)

 

 

The last two years have generated more machine-learning technology than all of the advancements over the previous 45 years and that pace isn't slowing down. Many new types of neural networks for vision-guided systems have emerged along with new techniques that make deployment of these neural networks much more efficient. No matter what you develop today or implement tomorrow, the hardware and I/O reconfigurability and software programmability of Xilinx All Programmable devices can “future-proof” your designs whether it’s to permit the implementation of new algorithms in existing hardware; to interface to new, improved sensing technology; or to add an all-new sensor type (like LIDAR or Time-of-Flight sensors, for example) to improve a vision-based system’s safety and reliability through advanced sensor fusion.

 

Xilinx is pushing even further into vision-guided, machine-learning applications with the new Xilinx reVISION Stack and this announcement complements the recently announced Reconfigurable Acceleration Stack for cloud-based systems. (See “Xilinx Reconfigurable Acceleration Stack speeds programming of machine learning, data analytics, video-streaming apps.”) Together, these new development resources significantly broaden your ability to deploy machine-learning applications using Xilinx technology—from inside the cloud to the very edge.

 

 

You might also want to read “Xilinx AI Engine Steers New Course” by Junko Yoshida on the EETimes.com site.

 

 

 

The amazing “snickerdoodle one”—a low-cost, single-board computer with wireless capability based on the Xilinx Zynq Z-7010 SoC—is once more available for purchase on the Crowd Supply crowdsourcing Web site. Shipments are already going out to existing backers and, if you missed out on the original crowdsourcing campaign, you can order one for the post-campaign price of $95. That’s still a huuuuge bargain in my book. (Note: There is a limited number of these boards available, so if you want one, now’s the time to order it.)

 

In addition, you can still get the “snickerdoodle black” with a faster Zynq Z-7020 SoC and more SDRAM that also includes an SDSoC software license, all for $195. Finally, snickerdoodle’s creator krtkl has added two mid-priced options: the snickerdoodle prime and snickerdoodle prime LE—also based on Zynq Z-7020 SoCs—for $145.

 

 

Snickerdoodle.jpg

The krtkl snickerdoodle low-cost, single-board computer based on a Xilinx Zynq SoC

 

 

 

Ryan Cousins at krtkl sent me this table that helps explain the differences among the four snickerdoodle versions:

 

 

Snickerdoodle table.jpg

 

 

 

For more information about krtkl’s snickerdoodle SBC, see:

 

 

 

 

 

 

 

 

 

 

If you’re still uncertain as to what System View’s Visual System Integrator hardware/software co-development tool for Xilinx FPGAs and Zynq SoCs does, the following 3-minute video should make it crystal clear. Visual System Integrator extends the Xilinx Vivado Design Suite and makes it a system-design tool for a wide variety of embedded systems based on Xilinx devices.

 

This short video demonstrates System View’s tool being used for a Zynq-controlled robotic arm:

 

 

 

 

 

For more information about System View’s Visual System Integrator hardware/software co-development tool, see:

 

 

 

 

 

Last year, I wrote about a new graphical system-level design tool called Visual System Integrator that lets you “graphically describe complete, heterogeneous, high-performance, systems based on ‘Platforms’ built from processors and Xilinx All Programmable devices.” (See “Visual System Integrator enables rapid system development and integration using processors and Xilinx FPGAs.”) I always thought that definition was a bit too abstract and now there’s a short 2.5-minute video that makes the abstract a bit more concrete:

 

 

 

 

There’s an even shorter companion video that demonstrates the tool being used to create a 10GbE inline packet processing system using a Xilinx Virtex-7 FPGA as a hardware accelerator for an x86 microprocessor:

 

 

 

 

 

In total, you need only five minutes to get a good overview of this relatively new development tool.

 

 

Dense Optical Flow hardware-acceleration on Zynq SoC made easier by SDSoC and OpenCV libraries

by Xilinx Employee, 01-25-2017

 

The 4-minute video below shows a real-time dense optical flow demonstration running on a Xilinx Zynq SoC. The entire demo was developed using C/C++, the Xilinx SDSoC development environment, and associated OpenCV libraries. The dense optical flow algorithm compares successive video images to estimate the apparent motion of each pixel in one of the images. This technique is used in video compression, object detection, object tracking, and image segmentation. Dense optical flow is a computationally intensive operation, which makes it an ideal candidate for hardware acceleration using the programmable logic in a small, low-power Zynq SoC.
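For readers who want to see what “dense optical flow” means in code, here’s a minimal desktop-Python OpenCV sketch of the Farneback variant. It is purely illustrative; the demo in the video was written in C/C++ and accelerated through SDSoC:

import cv2

cap = cv2.VideoCapture(0)                      # any video source will do
ok, prev = cap.read()
if not ok:
    raise SystemExit("no video source available")
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # One 2D motion vector per pixel, estimated by comparing successive frames
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    print("mean pixel motion:", abs(flow).mean())
    prev_gray = gray

Running this per-pixel estimate at 1080p60, as the demo does, is exactly the kind of workload that overwhelms an embedded CPU and maps naturally onto the Zynq SoC’s programmable logic.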

 

As Xilinx Senior Product Manager for SDSoC and Embedded Vision Nick Ni explains, SDSoC lowers the barriers to using the Zynq SoC in these embedded-vision applications because the tool makes it relatively easy for software developers accustomed to using only C or C++ to develop hardware-accelerated applications with the coding tools and styles they already know. SDSoC then converts the code that requires acceleration into hardware and automatically links this hardware to the software through DMA.

 

 

 

 

 

Magicians are very good at creating the illusion of levitating objects, but the Institute for Integrated Systems at Ruhr University Bochum (RUB) has developed a system that does the real thing—quite precisely. The system levitates a steel ball using an electromagnet controlled by an Avnet PicoZed SOM, which in turn is based on a Xilinx Zynq-7000 SoC. An FMCW (frequency-modulated, continuous-wave) radar module jointly developed by RUB and the Fraunhofer Institute senses the ball’s position, and that data feeds a PID control loop that controls the pulse-width-modulated current supplied to the electromagnet that levitates the steel ball.

 

 

Fraunhofer FMCW Radar Sensor.jpg 

 

FMCW radar sensor module jointly developed by RUB and the Fraunhofer Institute

 

 

 

The entire system was developed using the Xilinx SDSoC development environment with hardware acceleration used for the critical paths in the control loop resulting in fast, repeatable, real-time system response. The un-accelerated code runs on the Zynq SoC’s dual-core ARM Cortex-A9 processor and the code translated into hardware by SDSoC resides in the Zynq SoC’s programmable logic. SDSoC seamlessly manages the interaction between the system’s software and the hardware accelerators and the Zynq SoC provides a single-chip solution to the sensor-driven-control design problem.

 

Here’s a 3-minute video that captures the entire demo:

 

 

 

 

 

 

By Adam Taylor

 

To wrap up this blog for the year, we are going to complete the SDSoC integration using the shared library.

 

To recap, we have generated a bit file using the Xilinx SDSoC development environment that implements the matrix multiply example using the PL (programmable logic) on the base PYNQ platform, which we previously defined using SDSoC. The final step is to get it all integrated, and the first task is to upload the following files to the PYNQ board:

 

  • .bit – The bit file for the programmable logic
  • .tcl – The TCL description of the block diagram with address ranges
  • .so – The generated shared library

 

My file names are slightly different because I generated them as part of the previous blog.

 

Using a program like WinSCP, I uploaded these three files to the PYNQ bitstream directory, the same place we uploaded our previous design to.

 

 

Image1.jpg

 

 

The next step is to develop the Jupyter notebook so that we can drive the new overlay that we have created. To get this up and running we need to do the following:

 

  • Download and verify the overlay
  • Create an MMIO class to interface with the existing block RAM which remains in the overlay
  • Create a CFFI class to interface with the shared library
  • Write a simple example to interface the overlay using the MMIO and CFFI classes

 

This is very similar to what we have done previously, with the exception of creating the CFFI class, so that is where the rest of this blog will focus.

 

The first thing we need to do is find the names of the functions within the shared library, because SDSoC creates a different name from that of the actual accelerated function. We can find the renamed files under <project>/<build config>/_sds/swstubs, while the hardware files are under <project>/<build config>/_sds/p0/ipi.

 

If you already have the shared library on your PYNQ board, then you can use the command nm -D <path & shared library name> to examine its contents if you access the PYNQ via an SSH session.

 

With the name of the function known, we can create the CFFI class within our Jupyter notebook. For this example, the class needs two functions: one for initialization and another to interact with the library. The more complicated of the two is the initialization function, in which we must define the location of the shared library within the file system. As mentioned earlier, I uploaded the shared library to the same location as the bit and TCL files. We also need to declare the functions contained within the shared library and then, finally, open the shared library.

 

 

Image2.jpg 

 

The second function within the class is what we call when we wish to make use of the shared library. We can then use this class as we do any other within the rest of our program. In fact, this approach is often used in Python development to bind C and Python together.
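As a rough sketch of what such a class might look like (the shared-library name, its location, and the exact signature of the renamed function are assumptions for illustration; check yours with nm -D as noted above):

import cffi

class MMultLibrary:
    # Minimal sketch of the CFFI wrapper described above.

    def __init__(self, lib_path="/home/Xilinx/pynq/bitstream/libmmult.so"):
        self.ffi = cffi.FFI()
        # Declare the accelerated function exactly as SDSoC renamed it
        self.ffi.cdef("void _p0_mmult_accel_0(float *a, float *b, float *c);")
        self.lib = self.ffi.dlopen(lib_path)

    def mmult(self, a, b, c):
        # a, b and c are CFFI buffers, e.g. created with self.ffi.new("float[1024]")
        self.lib._p0_mmult_accel_0(a, b, c)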

 

This example shows just how easily we can create overlays using SDSoC and interface with them using Python and the PYNQ development system. If you want to try it and you do not currently have a license for SDSoC, you can obtain a free 60-day evaluation here with the new release.

 

As I mentioned up top, this is the last blog of 2016. I will resume writing in the New Year and, to give you a taste of what we are going to be looking at in 2017, amongst other things I will be featuring:

 

  • UltraZed-EG
  • OpenAMP
  • Image Processing using the PYNQ
  • Advanced sensor interfacing techniques using the Avnet EVK
  • Interfacing to Servos and robotics

 

Until then, have a great Christmas and New Year and thanks for reading the series.

 

 

 

Code is available on Github as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

 

 MicroZed Chronicles hardcopy.jpg

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

 MicroZed Chronicles Second Year.jpg

 

 

 

 

All of Adam Taylor’s MicroZed Chronicles are cataloged here.

 

 

 

 

You can develop and deploy FPGA-accelerated cloud apps using the Xilinx SDAccel development environment with no downloads and no local FPGA hardware using a new Web-based service from Nimbix. This service runs on a Nimbix platform named JARVICE, which is specifically designed for Big Data and Big Compute workloads.

 

Here’s a new 2.5-minute video demonstrating the Nimbix platform in action:

 

 

 

 

 

You can develop apps and deploy them on JARVICE. The Nimbix service is available as a subscription and as a pay-as-you-go service for only a few bucks per hour.

 

For more information about the Xilinx SDAccel development environment for cloud-based apps, see “Removing the Barrier for FPGA-Based OpenCL Data Center Servers.” To read about applications created using SDAccel, see:

 

 

 

 

 

 

 

This unlikely new project on the Instructables Web site uses a $189 Digilent ZYBO trainer board (based on a Xilinx Zynq Z-7010 SoC) to track balloons with an attached Webcam and then pop them with a high-powered semiconductor laser. The tracking system is programmed with OpenCV.

 

Here’s a view down the bore of the laser:

 

Laser Balloon Popper.jpg 

 

And there’s a 1-second video of the system in action on the Instructables Web page.

 

Fun aside, this system demonstrates that even the smallest Zynq SoC can be used for advanced embedded-vision systems. You can get more information about embedded-vision systems based on Xilinx silicon and tools at the new Embedded Vision Developer Zone.

 

Note: For more information about Digilent’s ZYBO trainer board, see “ZYBO has landed. Digilent’s sub-$200 Zynq-based Dev Board makes an appearance (with pix!)”

 

 

SDSoC Logo.jpg 

The latest version of the Xilinx SDSoC Development Environment for Zynq UltraScale+ MPSoCs and Zynq-7000 SoCs, 2016.3, is now available for download and includes the following features:

 

  • Full-feature support for Zynq UltraScale+ MPSoC devices including 64-bit addressing
  • Support for the Zynq UltraScale+ MPSoC’s dual-core ARM Cortex-R5 hardware
  • Linaro-based gcc 5.2-2015.11-2 32-bit and 64-bit tool chains with several compiler enhancements including:
    • Scheduling enhancements for pipelined hardware functions
    • Support for arbitrary numbers of function arguments
    • Support for scalars up to 1024 bits, including double, long long
    • Support for AXI bus data widths to 1024 bits
    • Enhanced pragma processing: user-defined trace points, separate RESOURCE and ASYNC pragmas
  • Vivado Tcl APIs to export hardware metadata specification for custom platforms
  • Embedded Vision Design Examples and OpenCV library
  • Support for QEMU and RTL emulation

 

 

Complete release notes for SDSoC 2016.3 are available here.

 

If you already have SDSoC, you should know what to do to get the upgrade. If not, you can download a 60-day free eval copy here, get the SDSoC user guide here, and a tutorial here.

 

 

 

 

By Adam Taylor

 

As I described last week, we need a platform to fuse Python and SDSoC. In the longer term, I want to perform some image processing with this platform. So although I am going to remove most of the logic from the base design, we need to keep the following in the hardware to ensure that we can correctly boot the PYNQ board:

 

  1. SWSled GPIO
  2. Btns GPIO
  3. RGBLeds GPIO
  4. Block Memory – Added in the MicroZed Chronicles, Part 158

 

We will leave the block memory within the design to demonstrate that the build produced by SDSoC is unique and different from the original boot.bin file. Doing so will enable us to use the notebook we previously used to read and write the block RAM. However, this time we will not need to load the overlay first.

 

 

Image1.jpg

 

 

Stripped Down Vivado Platform

 

 

As we know by now, we need two elements to create an SDSoC platform: a hardware definition and a software definition. We can create the hardware definition within Vivado itself. This is straightforward: we declare the available AXI ports, clocks, and interrupts. I have created a script to do this; it’s available in the GitHub repository. You can run it from the command line of the TCL console within Vivado.

 

The software definition will take a little more thought. Because we are using a Linux-based approach, we need the following:

 

  • uImage – The PYNQ kernel image
  • .dtb – The device tree blob
  • .elf – The first-stage boot loader (FSBL)
  • .elf – The second-stage boot loader (U-Boot)
  • .bif – Used to determine the boot order

 

 

We can obtain most of these items from the PYNQ master repository that we downloaded from GitHub previously, under the file location:

 

 

<Path>/PYNQ-master/PYNQ-master/Pynq-Z1/sdk/bootbin

 

 

Within this directory, you can find the FSBL, device tree, U-Boot, and a boot.bif. What is missing, however, is the actual Linux kernel: the uImage. We already have this image on the SD card we have been running the PYNQ from recently, so I merely copied this file into the SDSoC platform directory.

 

With the platform defined, we can create a simple program that does not have any accelerators, and we can use SDSoC to build the contents of the SD card. Once built, we can copy the contents to the SD card and boot the PYNQ. We should see the LEDs flash as normal when the PYNQ is ready for use.

 

We should be able to access the BRAM we have left within the design using the same notebook as before, but with the overlay section commented out. You should be able to read and write the memory. You should also check that if you change the base address away from the correct address, the notebook no longer works correctly.
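For reference, here is a minimal sketch of that check in a notebook cell; the base address is carried over from the earlier Part 158 design and is an assumption here, so use whatever your Vivado address map shows:

from pynq import MMIO

bram = MMIO(0x46000000, 0x2000)                # assumed BRAM controller base address and range
for i in range(8):
    bram.write(i * 4, i)                       # 32-bit words, so step addresses by 4
print([bram.read(i * 4) for i in range(8)])    # should read back 0..7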

 

Having proved that we can build a design without accelerating a function, the next step is to ensure that we can build a design that does accelerate a function. I therefore used the matrix multiply example to generate a simple design that shows how to use the platform to accelerate a function in hardware. This is the final confirmation that we have defined the platform correctly.

 

Creating a new project targeting the same platform as before, with the example code and with generation of a shared library selected, produced the following hardware build in Vivado:

 

 

Image2.jpg

 

 

 

MMult hardware example as created by SDSoC

 

 

 

Clearly, we can see the addition of the accelerated hardware.

 

All that is needed now is to upload the bit, tcl, and so files to the PYNQ and then write a notebook to put them to work.

 

 

Code is available on Github as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

 

MicroZed Chronicles hardcopy.jpg 

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

All of Adam Taylor’s MicroZed Chronicles are cataloged here.

 

 

 

 

 

 

 

 

By Adam Taylor

 

One of the benefits of the PYNQ system is that we can integrate hardware overlays for the PYNQ’s Zynq SoC and use them with ease in a Python programming environment. As we have seen over the last few weeks, it is pretty simple to create and integrate a hardware overlay. However, we still need to be able to develop an overlay with the functions we desire. Ideally, to continue to leverage the benefits of the high-level PYNQ system, we want to develop the overlays using a similar high-level approach.

 

The traditional way to develop hardware overlays for the FPGA fabric in the Zynq SoC is to use Vivado as we’ve done previously, perhaps combined with Vivado HLS to implement complex functions defined in C or C++. The Xilinx SDSoC development environment allows us to create applications that run on the Zynq SoC’s ARM Cortex-A9 processors (the PS or processor system) and the programmable logic (the PL). We can move functions between them as we desire to accelerate parts of the design. If we do this using a high-level language like C or C++, SDSoC combines the capabilities of Vivado HLS with a connectivity framework.

 

 

 

Image1.jpg

 

 

How SDSoC and Pynq can be combined

 

 

What this means for the PYNQ system is that we can use SDSoC to create a hardware overlay using Vivado HLS and then interface to it using Python’s C Foreign Function Interface (CFFI). Using CFFI is very similar to the approach we undertook last week. In theory, this approach allows us to create hardware overlays without the need to write a line of HDL.

 

The first step in using SDSoC is to create an SDSoC platform. As we have discussed before, an SDSoC platform requires both a hardware definition and a software definition. We can create the hardware definition from within Vivado. For the software definition, we can use a template for a Linux operating system.  The base PYNQ design will serve as our foundation because we want to ensure that the PS settings are correct. However to free up resources in the PL for SDSoC, we may want to prune out some of the logic functions.

 

Once the platform has been created within SDSoC, we can take advantage of the support for high-level frameworks like OpenCV and the other supported HLS libraries to create the application we want. SDSoC will automatically generate the required bit file and TCL file for a build. However, in this case we also need the C files generated by SDSoC to interface with the accelerated function in the Zynq PL. We do this using a shared library, which we can call from within the Python environment. We can create a shared library by ticking the option when we create a new SDSoC project, like so:

 

 

 

Image2.jpg

 

 

Setting the shared library option

 

 

To make use of the shared library, we will need to know the names of the functions contained within it. These functions will be renamed by SDSoC during the build process and we will need to use these modified names within the Python CFFI interface because that is what is included within the shared library.

 

For example, using the matrix multiply example in SDSoC, the name of the accelerated function becomes:

 

 

mmult_accel  -> _p0_mmult_accel_0

 

 

These files will be available under the <project>/<build config>/_sds/swstubs while the hardware files are under <project>/<build config>/_sds/p0/ipi.
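A minimal Python sketch of how that renamed symbol might be declared and called through CFFI follows; the library path, argument types, and matrix size are assumptions for illustration only:

import cffi

ffi = cffi.FFI()
ffi.cdef("void _p0_mmult_accel_0(float *a, float *b, float *c);")   # the renamed symbol
lib = ffi.dlopen("/home/Xilinx/pynq/bitstream/libmmult.so")          # assumed upload location
a = ffi.new("float[1024]")   # assumed 32x32 input matrix
b = ffi.new("float[1024]")
c = ffi.new("float[1024]")
lib._p0_mmult_accel_0(a, b, c)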

 

This is how the previous example we ran, the Sobel filter (and the FIR filter), was designed.

 

Over the next few weeks, we will look in more depth at how we create our own SDSoC platform and how we implement it within the PYNQ environment.

 

Code is available on Github as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

 

MicroZed Chronicles hardcopy.jpg 

 

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

All of Adam Taylor’s MicroZed Chronicles are cataloged here.

 

 

 

 

 

 

By Adam Taylor

 

 

Having re-created the base hardware overlay for our PYNQ dev board, we’ll now modify the overlay to add our own memory-mapped peripheral. As we are modifying the base overlay, this will be a new overlay—one that we need to correctly integrate into the PYNQ environment.

 

While this will be a simple example, we can use the same techniques used here to create as complicated or as simple an overlay as we desire.

 

To demonstrate how we do this, I am going to introduce a new block memory within the PL that we can read from and write to using the Python environment.

 

 

Image1.jpg

 

 

The new blocks are highlighted

 

 

 

To do this we need to do the following in the Vivado design:

 

 

  1. Create a new AXI port (port 13) on the AXI Interconnect connected to General Purpose Master 0
  2. Import a new BRAM controller and configure it to have only one port
  3. Use the Block Memory Generator to create a BRAM. Set the mode to BRAM Controller, single port RAM
  4. Map the new BRAM controller to the Zynq SoC’s PS memory map

 

 

With these four things completed, we are ready to build the bit file. Once the file has been generated, we are halfway towards building an overlay we can use in our design. The other half requires generating a TCL script that defines the address map of the bit file. To do this, we need to use the command:

 

write_bd_tcl <name.tcl>

 

Once we have the TCL and bit files, we can move on to the next stage, which is to import the files and create the drivers and application.

 

This is where we need to power on the PYNQ dev board and connect it to the network along with our development PC. Once the PYNQ is up and running, we can connect to it using a program like WinSCP to upload the bit file and the TCL file.

 

Within the current directory structure on the PYNQ board, there is a bit stream directory we can use at:

 

 

/home/Xilinx/pynq/bitstream/

 

 

You will find the files needed to support the base overlay under this directory.

 

 

 

Image2.jpg 

 

Base overlay and modified overlay following upload

 

 

Once this has been uploaded, we need to create a notebook to use it. We need to make use of the existing overlay module provided with the PYNQ package to do this. This module will allow us to download the overlay into the PL of the PYNQ. Once it is downloaded, we need to check that it downloaded correctly, which we can do using the ol.is_loaded() function.
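In notebook form, that amounts to something like the following sketch; the overlay file name is an assumption, so use whatever name you gave your uploaded bit file:

from pynq import Overlay

ol = Overlay("/home/Xilinx/pynq/bitstream/part158.bit")   # assumed file name
ol.download()            # program the Zynq PL with the new bitstream
print(ol.is_loaded())    # should report True once the download has completed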

 

 

Image3.jpg

 

 

Downloading the new overlay

 

 

The simplest way to interface with the new overlay is to use the MMIO module within the PYNQ package. This module allows us to interface directly with memory-mapped peripherals. First, however, we need to define a new class within which we can declare the functions that interact with the overlay. For this example, I have called my class part158 to follow the blog numbering.

 

 

Image4.jpg

 

 

 

Looking within the class, we have defined the base address and address range using the line:

 

 

mmio = MMIO(0x46000000, 0x00002000)

 

 

Three function definitions in the above figure define:

 

  • The initialization function (in this case, this function merely writes a 0 to address 0)
  • A function that writes data into the BRAM
  • Another function that reads data from the BRAM.

 

(Remember that the byte address increments by 4 for each 32-bit word because this is a 32-bit system.)
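As an illustrative sketch of such a class (the method names are my own and the notebook in the figure may differ, but the MMIO calls are the standard PYNQ ones):

from pynq import MMIO

class part158:
    def __init__(self):
        # Base address and range from the Vivado address map for the new BRAM controller
        self.mmio = MMIO(0x46000000, 0x00002000)
        # Initialization: write a 0 to the first BRAM location
        self.mmio.write(0x0, 0x0)

    def write_bram(self, offset, data):
        # Each 32-bit word occupies 4 byte addresses, hence the multiply by 4
        self.mmio.write(offset * 4, data)

    def read_bram(self, offset):
        return self.mmio.read(offset * 4)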

 

With the class defined, we can then write a simple script that writes data to and reads data from the BRAM, just as we would with any other function. Initially we will write a simple counting sequence, followed by random numbers.
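A minimal sketch of such a script, using the hypothetical class above, might be:

from random import randint

bram = part158()

# Write a counting sequence into the BRAM and read it back
for i in range(10):
    bram.write_bram(i, i)
print([bram.read_bram(i) for i in range(10)])

# Repeat with random numbers
for i in range(10):
    bram.write_bram(i, randint(0, 255))
print([bram.read_bram(i) for i in range(10)])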

 

 

Image5.jpg

 

 

When I executed the notebook, I received the results below:

 

 

Image6.jpg 

 

Once we have this new hardware overlay up and running, we can create a more complex overlay and interact with it using the MMIO module.

 

 

Code is available on Github as always.

 


 

 

 

 

Alpha Data’s booth at this week’s SC16 conference in Salt Lake City held the company’s latest top-of-the-line FPGA accelerator card, the ADM-PCIE-9V3, based on the 16nm Xilinx Virtex UltraScale+ VU3P-2 FPGA. Announced just this week, the card also features two QSFP28 sockets that each accommodate one 100GbE connection or four 25GbE connections. If you have a full-height slot available, you can add two more 100GbE interfaces using Samtec FireFly Micro Flyover Optical modules and run four 100GbE interfaces simultaneously. All of this high-speed I/O capability comes courtesy of the 40 32.75Gbps SerDes ports on the Virtex UltraScale+ VU3P FPGA.

 

 

Alpha Data ADM-PCIE-9V3.jpg 

 

Alpha Data ADM-PCIE-9V3 Accelerator Card based on a Xilinx Virtex UltraScale+ VU3P-2 FPGA

 

 

To back up the board’s extreme Ethernet bandwidth, the ADM-PCIE-9V3 incorporates two banks of 72-bit DDR4-2400 SDRAM with ECC and a per-bank capacity of 8Gbytes, for a total of 16Gbytes of on-board SDRAM. All of this fits on a half-length, low-profile PCIe card that features a PCIe Gen4 x8 or PCIe Gen3 x16 host connection; the board also supports the OpenPOWER CAPI coherent interface. (The PCIe configuration is programmable, thanks to the on-board Virtex UltraScale+ FPGA.)

 

 

Taken as a whole, this new accelerator card delivers serious processing and I/O firepower along every dimension you might care to measure, whether it’s Ethernet bandwidth, memory capacity, or processing power.

 

The Alpha Data ADM-PCIE-9V3 board is based on a Xilinx Virtex UltraScale+ FPGA, so it can serve as a target for the Xilinx SDAccel development environment, which gives application developers a CPU- and GPU-like flow for writing high-performance code in OpenCL, C, or C++ while targeting ready-to-go, plug-in FPGA hardware. In addition, Alpha Data offers an optional Board Support Package for the ADM-PCIE-9V3 accelerator board with example FPGA designs, application software, a mature API, and driver support for Microsoft Windows and Linux to further ease cloud-scale application development and deployment in hyperscale data centers.

 

 

 

By Adam Taylor

 

 

Having done the easy part and got the Pynq set up and running a simple “hello world” program, I next wanted to look at the overlays that sit within the PL, how they work, and how we can use the supplied base overlay.

 

What is an overlay? An overlay is a design that’s loaded into the Zynq SoC’s programmable logic (PL). The overlay can be designed to accelerate a function in the programmable logic or to provide an interfacing capability using the PL. In short, overlays give Pynq its unique capabilities.

 

What is important to understand about the overlay is that there is not a Python-to-PL high-level synthesis process involved. Instead, we develop the overlay using one of the standard Xilinx design methodologies (SDSoC, Vivado, or Vivado HLS). Once we’ve created the bit file for the overlay, we then integrate it within the Pynq architecture and establish the required parameters to communicate with it using Python.

 

Like all things with the Zynq SoC that we have looked at to date, this is very simple. We can easily integrate with the Python environment using the bit file and other files provided with the Vivado build. We do this with the Python MMIO class, which allows us to interact with designs in the PL through memory-mapped reads and writes.  The memory map of the current overlay in the PL is all we need. Of course, we can change the contents of the PL on the fly as our application requires to accelerate functions in the PL.
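To give a flavor of how little Python this takes, here is a hedged sketch of an MMIO access; the base address, range, and offset below are placeholders, so substitute the values from your own overlay’s memory map:

from pynq import MMIO

# Placeholder base address and range - use your overlay's address map
periph = MMIO(0x43C00000, 0x10000)

periph.write(0x0, 0x1234)      # memory-mapped write into the PL
print(hex(periph.read(0x0)))   # memory-mapped read back from the PL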

 

We will be looking more at how we can create our own overlay over the next few weeks. However, if you want to know more in the short term, I suggest you read the Pynq manual here. If you are thinking of developing your own overlay, be sure to base it on the base overlay Vivado design to ensure that the configuration of the Zynq SoC’s Processor System (PS) and the PS/PL interfaces are correct.

 

The supplied base overlay provides support for several interfaces including the HDMI port and a wide range of PMODs.

 

The real power of the Pynq system comes from the open source community developing and sharing overlays. I want to look at a couple of these in the remainder of this blog. These overlays are available via GitHub and provide a Sobel Filter for the HDMI input and output and a FIR filter. You’ll find them here:

 

 

 

 

The first thing we need to do is install the package. For this example, I am going to install the Sobel filter. To do this, we need to use a terminal program to download and install the overlay and its associated files.

 

 

We can do this using PuTTY, logging in with the username and password Xilinx. The command to install the overlay is then:

 

 

sudo -H pip install --upgrade 'git+https://github.com/beja65536/pz1_sobelfilter'

 

 

Image1.jpg 

 

Installing the Sobel Filter

 

 

Once this has been downloaded, the next step is to download the zip file containing the Jupyter notebook from GitHub and upload it under the examples directory. This is simple to do: just select Upload and navigate to the location of the notebook you wish to upload.

 

 

Image2.jpg 

 

This notebook also performs the installation of the overlay if you have not already done it via the terminal. Either way, you only need to do this once.

 

 

Once this is uploaded, we can connect the Pynq to an HDMI source and an HDMI monitor and run the example. For this example, I am going to connect the Pynq between the Embedded Vision Kit and the display and then run the notebook.

 

 

Image3.jpg

 

 

When I did this, the notebook produced the image below showing the result of the Sobel filter. Overall, it was very easy to get up and running with an overlay other than the base overlay.

 

 

Image4.jpg 

 

 

Code is available on Github as always.

 


 

 

 

This week, Techfocus Media’s President Kevin Morris wrote the following in an article published on the EEJournal Web site:

 

“Designers of FPGA tools should take heed. There is a vast number of different types of users entering the FPGA domain, and the majority are not FPGA experts. If FPGAs are to expand into the numerous new and exciting markets for which they’re suitable, the primary battleground will be tools, not chips. New users should not have to learn FPGA-ese in order to get an FPGA to work in their system. At some point, people with little or no hardware expertise at all will need to be able to customize the function of FPGAs.”

 

In a nutshell, this paragraph describes the philosophy behind the Xilinx SDx Development Environments including SDAccel, SDSoC, and SDNet. These application-specific development environments are designed to allow people versed in software engineering and other disciplines to get a hardware performance boost from Xilinx All Programmable devices without the need to become FPGA experts (in Morris’ terminology).

 

Later in the article, Morris writes:

 

“Higher levels of abstraction in design creation need to replace HDL. System-level design tools need to take into account both the hardware and software components of an application. Tools - particularly lower-level implementation tools such as synthesis and place-and-route - need to move ever closer to full automation.”

 

He might as well be writing about the Xilinx SDSoC development environment. If this is the sort of development tool you seek, you might want to check it out.

 

 

VisualApplets 3.jpg

 

Silicon Software’s VisualApplets has long been a handy GUI-based tool for designers creating high-performance, image-processing systems using FPGAs. The company is now offering a free e-book that shows you how the latest version, VisualApplets 3, lets you create such systems with Silicon Software’s V-series frame grabbers or compatible Baumer LX VisualApplets video cameras in as little as 1 week.

 

Click here to sign up for a free copy of the book.

 

 

Adam Taylor's MicroZed Chronicles Part 155: Introducing the PYNQ (Python + Zynq) Dev Board


 

By Adam Taylor

 

Having recently received a Xilinx/Digilent PYNQ Dev Board, I want to spend some time looking at this rather exciting Zynq-based board. For those not familiar with the PYNQ, it combines the capability of the Zynq SoC with the productivity of the Python programming language, and it comes in a rather catchy pink color.

 

 

Image1.jpg

 

PYNQ up and running on my desk

 

 

Hardware-wise, PYNQ incorporates an on-board Xilinx Zynq Z-7020 SoC, 512Mbytes of DDR SDRAM, HDMI In and Out, Audio In and Out, two PMOD ports, and support for the popular Arduino Interface Header. We can configure the board from either the SD card or QSPI. On its own, PYNQ would be a powerful development board. However, there are even more exciting aspects to this board that enable us to develop applications that use the Zynq SoC’s Programmable Logic.

 

The Zynq SoC runs a Linux kernel with a specific package that supports all of the PYNQ’s capabilities. Using this package, it is possible to place hardware overlays (in reality, bit files developed in Vivado) into the programmable logic of the Zynq.

 

The base overlay supports all of the PYNQ’s interfaces, as shown below:

 

 

Image2.jpg

 

PYNQ PL hardware overlay

 

 

Within the supplied software environment, the PYNQ hardware and interfaces are supported by the Pynq package. This package allows you to use the Python language to drive PYNQ’s GPIO, video, and audio interfaces along with a wide range of PMOD boards. We use this package within the code we develop and document using the Jupyter notebook, which is the next part of the PYNQ framework.

 

As engineers, we ought to be familiar with the Python language and Linux, even if we are not experts in either. However, we may be unfamiliar with Jupyter notebooks. These are Web-based, interactive environments that allow us to run code and to include widgets, documentation, plots, and even video within the Jupyter notebook Web pages.

 

A Jupyter notebook server runs on the Linux system that’s running on the PYNQ’s Zynq SoC. We use this interface to develop our PYNQ applications. Jupyter notebooks and overlays are the core of the PYNQ development methodology, and over the next series of blogs we are going to explore how we can use these notebooks and overlays, and even develop our own as required.

 

Let’s look at how we can power up the board and get our first “hello world” program running. We’ll develop a simple program that allows us to understand the process flow.

 

The first thing to do is to configure an SD card with the latest kernel image, which we can download from here. With this downloaded, the next step is to write the image file to the SD card using an application like Win32 Disk Imager (if we are using Microsoft Windows).

 

Insert the SD card into the PYNQ board (check that the jumper is set for SD boot) and connect a network cable to the Ethernet port. Power the board up and, once it boots, we can connect to the PYNQ board using a browser.

 

In a new browser window, enter the address http://pynq:9090, which will take us to a log-on page where we enter the username Xilinx. From there we will see the Jupyter notebook’s welcome page:

 

 

Image3.jpg

The PYNQ welcome page

 

 

Clicking on “Welcome to Pynq.ipynb” will open a welcome page that tells us how to navigate around the notebook and where to find supporting material.

 

For this example, we are going to create our own very simple example to demonstrate the flow, as I mentioned earlier. Again, we run the Python programs from within the Jupyter notebook. We can see which programs are currently running on the PYNQ by clicking on the “Running” tab, which is present on most notebook pages. Initially we have no notebooks running, so clicking on it right now will only show us that there are no running notebooks.

 

 

Image4.jpg

Notebooks running on the PYNQ

 

 

To create your own example, click on the examples page and then click on “New.” Select “Python 3” under “Notebooks” from the menu on the right:

 

 

Image5.jpg 

Creating a new notebook

 

 

This will create a new notebook called Untitled. We can rename it to whatever we desire by clicking on “Untitled,” which opens a dialog box. I am going to name my example after the number of this MicroZed Chronicles blog post (Part 155).

 

 

Image6.jpg

 

Changing the name of the Notebook

 

 

The next thing to do is enter the code we wish to run on the PYNQ. Within the notebook, we can mark text as either Code, Markdown, Heading, or Raw NBConvert.

 

 

Image7.jpg

 

We can mark text as either Code, Markdown, Heading, or Raw NBConvert

 

 

For now, select “Code” (if it is not already selected) and enter the code: print("hello world")

 

 

 

Image8.jpg

 

The code to run in the notebook

 

 

We click the play button to run this very short program. With the cell selected and all being well, you will see the result appear as below:

 

 

Image9.jpg

 

Running the code

 

 

Image10.jpg

 

 Result of Running the Code

 

 

If we look under the running tab again, we will see that this time there is a running application:

 

 

Image11.jpg


Running Notebooks

 

 

 

If we wish to stop the notebook from running, we click on the shutdown button.

 

Next time, we will look at how we can use the PYNQ in more complex scenarios.

 

We can also use the PYNQ board as a traditional Zynq-based development board if desired. This makes the PYNQ one of the best dev-board choices available now.

 

Note that you can also log on to the PYNQ board using a terminal program like PuTTY with the username and password Xilinx.

 

 

 

 

Code is available on Github as always.

 


 

 

 

 

 
