
MathWorks’ HDL Coder wins Embedded World AWARD in Nuremberg last week

by Xilinx Employee ‎03-21-2017 02:07 PM - edited ‎03-22-2017 05:59 AM (691 Views)

 

The organizers of last week’s Embedded World show in Nuremberg gave out embedded AWARDS in three categories during the show, and MathWorks’ HDL Coder won in the tools category. (See the announcement here.) If you don’t know about this unique development tool, now is a good time to become acquainted with it. HDL Coder accepts model-based designs created using MathWorks’ MATLAB and Simulink and can generate VHDL or Verilog for all-hardware designs, or hardware and software code for designs based on a mix of custom hardware and embedded software running on a processor. That means that HDL Coder works well with Xilinx FPGAs and Zynq SoCs.

 

Here’s a diagram of what HDL Coder does:

 

 

MathWorks HDL Coder.jpg 

 

 

You might also want to watch this detailed MathWorks video titled “Accelerate Design Space Exploration Using HDL Coder Optimizations.” (Email registration required.)

 

 

For more information about using MathWorks HDL Coder to target your designs for Xilinx devices, see:

 

 

 

 

 

 

 

 

 

AEye is the latest iteration of the eye-tracking technology developed by EyeTech Digital Systems. The AEye chip is based on the Zynq Z-7020 SoC. It’s located immediately adjacent to the imaging sensor, which makes for compact, stand-alone systems. This technology is finding its way into diverse vision-guided systems in the automotive, AR/VR, and medical diagnostic arenas. According to EyeTech, the Zynq SoC’s unique abilities allow the company to create products it could not build any other way.

 

With the advent of the reVISION stack, EyeTech is looking to expand its product offerings into machine learning, as discussed in this short, 3-minute video:

 

 

 

 

 

 

For more information about EyeTech, see:

 

 

 

 

EETimes’ Junko Yoshida, with some expert help, analyzes this week’s Xilinx reVISION announcement

by Xilinx Employee ‎03-15-2017 01:25 PM - edited ‎03-22-2017 07:20 AM (585 Views)

 

This week, EETimes’ Junko Yoshida published an article titled “Xilinx AI Engine Steers New Course” that gathers some comments from industry experts and from Xilinx with respect to Monday’s reVISION stack announcement. To recap, the Xilinx reVISION stack is a comprehensive suite of industry-standard resources for developing advanced embedded-vision systems based on machine learning and machine inference.

 

(See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”)

 

As Xilinx Senior Vice President of Corporate Strategy Steve Glaser tells Yoshida, Xilinx designed the stack to “enable a much broader set of software and systems engineers, with little or no hardware design expertise, to develop intelligent vision-guided systems easier and faster.”

 

Yoshida continues:

 

While talking to customers who have already begun developing machine-learning technologies, Xilinx identified ‘8 bit and below fixed point precision’ as the key to significantly improve efficiency in machine-learning inference systems.
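If “8 bit and below fixed point precision” is unfamiliar, it refers to representing network weights and activations with small integers instead of 32-bit floating point, which shrinks memory traffic and arithmetic cost. Here is a toy numpy sketch of generic symmetric 8-bit quantization (an illustration of the general technique only, not Xilinx’s implementation):

import numpy as np

w = np.random.randn(4, 4).astype(np.float32)   # toy float32 weights
scale = np.abs(w).max() / 127.0                # symmetric 8-bit scale factor
w_q = np.round(w / scale).astype(np.int8)      # quantize to int8
w_dq = w_q.astype(np.float32) * scale          # dequantize to inspect the error
print(np.abs(w - w_dq).max())                  # worst-case quantization error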

 

 

Yoshida also interviewed Karl Freund, Senior Analyst for HPC and Deep Learning at Moor Insights & Strategy, who said:

 

“Artificial Intelligence remains in its infancy, and rapid change is the only constant.” In this circumstance, Xilinx seeks “to ease the programming burden to enable designers to accelerate their applications as they experiment and deploy the best solutions as rapidly as possible in a highly competitive industry.”

 

 

She also quotes Loring Wirbel, a Senior Analyst at The Linley Group, who said:

 

What’s interesting in Xilinx's software offering, [is that] this builds upon the original stack for cloud-based unsupervised inference, Reconfigurable Acceleration Stack, and expands inference capabilities to the network edge and embedded applications. One might say they took a backward approach versus the rest of the industry. But I see machine-learning product developers going a variety of directions in trained and inference subsystems. At this point, there's no right way or wrong way.

 

 

There’s a lot more information in the EETimes article, so you might want to take a look for yourself.

 

 

 

 

Today, EEJournal’s Kevin Morris published a review article titled “Teaching Machines to See: Xilinx Launches reVISION” following Monday’s announcement of the Xilinx reVISION stack for developing vision-guided applications. (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”)

 

Morris writes:

 

“But vision is one of the most challenging computational problems of our era. High-resolution cameras generate massive amounts of data, and processing that information in real time requires enormous computing power. Even the fastest conventional processors are not up to the task, and some kind of hardware acceleration is mandatory at the edge. Hardware acceleration options are limited, however. GPUs require too much power for most edge applications, and custom ASICs or dedicated ASSPs are horrifically expensive to create and don’t have the flexibility to keep up with changing requirements and algorithms.

 

“That makes hardware acceleration via FPGA fabric just about the only viable option. And it makes SoC devices with embedded FPGA fabric - such as Xilinx Zynq and Altera SoC FPGAs - absolutely the solutions of choice. These devices bring the benefits of single-chip integration, ultra-low latency and high bandwidth between the conventional processors and the FPGA fabric, and low power consumption to the embedded vision space.”

 

Later on, Morris gets to the fly in the ointment:

 

“Oh, yeah. There’s still that ‘almost impossible to program’ issue.”

 

And then he gets to the solution:

 

reVISION, announced this week, is a stack - a set of tools, interfaces, and IP - designed to let embedded vision application developers start in their own familiar sandbox (OpenVX for vision acceleration and Caffe for machine learning), smoothly navigate down through algorithm development (OpenCV and NN frameworks such as AlexNet, GoogLeNet, SqueezeNet, SSD, and FCN), targeting Zynq devices without the need to bring in a team of FPGA experts. reVISION takes advantage of Xilinx’s previously-announced SDSoC stack to facilitate the algorithm development part. Xilinx claims enormous gains in productivity for embedded vision development - with customers predicting cuts of as much as 12 months from current schedules for new product and update development.

 

In many systems employing embedded vision, it’s not just the vision that counts. Increasingly, information from the vision system must be processed in concert with information from other types of sensors such as LiDAR, SONAR, RADAR, and others. FPGA-based SoCs are uniquely agile at handling this sensor fusion problem, with the flexibility to adapt to the particular configuration of sensor systems required by each application. This diversity in application requirements is a significant barrier for typical “cost optimization” strategies such as the creation of specialized ASIC and ASSP solutions.

 

The performance rewards for system developers who successfully harness the power of these devices are substantial. Xilinx is touting benchmarks showing their devices delivering an advantage of 6x images/sec/watt in machine learning inference with GoogLeNet @batch = 1, 42x frames/sec/watt in computer vision with OpenCV, and ⅕ the latency on real-time applications with GoogLeNet @batch = 1 versus “NVidia Tegra and typical SoCs.” These kinds of advantages in latency, performance, and particularly in energy-efficiency can easily be make-or-break for many embedded vision applications.

 

 

But don’t take my word for it, read Morris’ article yourself.

 

 

 

 

 

As part of today’s reVISION announcement of a new, comprehensive development stack for embedded-vision applications, Xilinx has produced a 3-minute video showing you just some of the things made possible by this announcement.

 

Here it is:

 

 

Adam Taylor’s MicroZed Chronicles, Part 177: Introducing the reVISION stack

by Xilinx Employee ‎03-13-2017 10:39 AM - edited ‎03-22-2017 07:19 AM (1,272 Views)

 

By Adam Taylor

 

Several times in this series, we have looked at image processing using the Avnet EVK and the ZedBoard. Along with the basics, we have examined object tracking using OpenCV running on the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PS (processing system) and using HLS with its video library to generate image-processing algorithms for the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL (programmable logic, see blogs 140 to 148 here).

 

Xilinx’s reVISION is an embedded-vision development stack that provides support for a wide range of frameworks and libraries often used for embedded-vision applications. Most exciting, from my point of view, is that the stack includes acceleration-ready OpenCV functions.

 

Image1.jpg 

 

 

The stack itself is split into three layers. Once we select or define our platform, we will be mostly working at the application and algorithm layers. Let’s take a quick look at the layers of the stack:

 

  1. Platform layer: This is the lowest level of the stack and is the one on which the remaining stack layers are built. This layer includes platform definitions of the hardware and the software environment. Should we choose not to use a predefined platform, we can generate a custom platform using Vivado.

 

  2. Algorithm layer: Here we create our application using SDSoC and the platform definition for the target hardware. It is within this layer that we can use the acceleration-ready OpenCV functions along with predefined and optimized implementations for Customized Neural Network (CNN) developments such as inference accelerators within the PL.

 

  3. Application Development Layer: The highest layer of the stack. Development here is where high-level frameworks such as Caffe and OpenVX are used to complete the application.

 

As I mentioned above, one of the most exciting aspects of the reVISION stack is the ability to accelerate a wide range of OpenCV functions using the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL. We can group the OpenCV functions that can be hardware-accelerated using the PL into four categories:

 

  1. Computation – Includes functions such as absolute difference between two frames, pixel-wise operations (addition, subtraction, and multiplication), gradient, and integral operations.
  2. Input Processing – Supports bit-depth conversions, channel operations, histogram equalization, remapping, and resizing.
  3. Filtering – Supports a wide range of filters including Sobel, Custom Convolution, and Gaussian filters.
  4. Other – Provides a wide range of functions including Canny/Fast/Harris edge detection, thresholding, SVM, HoG, LK Optical Flow, Histogram Computation, etc.

 

What is very interesting about these function calls is that we can optimize them for resource usage or performance within the PL. The main optimization method is specifying the number of pixels to be processed during each clock cycle. For most accelerated functions, we can choose to process either one or eight pixels. Processing more pixels per clock cycle reduces latency but increases resource utilization, while processing one pixel per clock minimizes the resource requirements at the cost of increased latency. We control the number of pixels processed per clock via the function call.

 

Over the next few blogs, we will look more at the reVISION stack and how we can use it. However, in the best Blue Peter tradition, the image below shows the result of running a reVISION Harris corner-detection OpenCV function accelerated within the PL.

 

 

Image2.jpg

 

 

Accelerated Harris Corner Detection in the PL

 

 

 

 

Code is available on Github as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

 

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg

 

Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge

by Xilinx Employee ‎03-13-2017 07:37 AM - edited ‎03-22-2017 07:19 AM (2,702 Views)

 

Today, Xilinx announced a comprehensive suite of industry-standard resources for developing advanced embedded-vision systems based on machine learning and machine inference. It’s called the reVISION stack and it allows design teams without deep hardware expertise to use a software-defined development flow to combine efficient machine-learning and computer-vision algorithms with Xilinx All Programmable devices to create highly responsive systems. (Details here.)

 

The Xilinx reVISION stack includes a broad range of development resources for platform, algorithm, and application development including support for the most popular neural networks: AlexNet, GoogLeNet, SqueezeNet, SSD, and FCN. Additionally, the stack provides library elements such as pre-defined and optimized implementations for CNN network layers, which are required to build custom neural networks (DNNs and CNNs). The machine-learning elements are complemented by a broad set of acceleration-ready OpenCV functions for computer-vision processing.

 

For application-level development, Xilinx supports industry-standard frameworks including Caffe for machine learning and OpenVX for computer vision. The reVISION stack also includes development platforms from Xilinx and third parties, which support various sensor types.

 

The reVISION development flow starts with a familiar, Eclipse-based development environment; the C, C++, and/or OpenCL programming languages; and associated compilers all incorporated into the Xilinx SDSoC development environment. You can now target reVISION hardware platforms within the SDSoC environment, drawing from a pool of acceleration-ready, computer-vision libraries to quickly build your application. Soon, you’ll also be able to use the Khronos Group’s OpenVX framework as well.

 

For machine learning, you can use popular frameworks including Caffe to train neural networks. Within one Xilinx Zynq SoC or Zynq UltraScale+ MPSoC, you can use Caffe-generated .prototxt files to configure a software scheduler running on one of the device’s ARM processors to drive CNN inference accelerators—pre-optimized for and instantiated in programmable logic. For computer vision and other algorithms, you can profile your code, identify bottlenecks, and then designate specific functions that need to be hardware-accelerated. The Xilinx system-optimizing compiler then creates an accelerated implementation of your code, automatically including the required processor/accelerator interfaces (data movers) and software drivers.

 

The Xilinx reVISION stack is the latest in an evolutionary line of development tools for creating embedded-vision systems. Xilinx All Programmable devices have long been used to develop such vision-based systems because these devices can interface to any image sensor and connect to any network—which Xilinx calls any-to-any connectivity—and they provide the large amounts of high-performance processing horsepower that vision systems require.

 

Initially, embedded-vision developers used the existing Xilinx Verilog and VHDL tools to develop these systems. Xilinx introduced the SDSoC development environment for HLL-based design two years ago and, since then, SDSoC has dramatically and successfully shortened development cycles for thousands of design teams. Xilinx’s new reVISION stack now enables an even broader set of software and systems engineers to develop intelligent, highly responsive embedded-vision systems faster and more easily using Xilinx All Programmable devices.

 

And what about the performance of the resulting embedded-vision systems? How do their performance metrics compare against systems based on embedded GPUs or the typical SoCs used in these applications? Xilinx-based systems significantly outperform the best of this group, which employ Nvidia devices. Benchmarks of the reVISION flow using Zynq SoC targets against the Nvidia Tegra X1 have shown as much as:

 

  • 6x better images/sec/watt in machine learning
  • 42x higher frames/sec/watt for computer-vision processing
  • 1/5th the latency, which is critical for real-time applications

 

Image1.jpg 

 

There is huge value to having a very rapid and deterministic system-response time and, for many systems, the faster response time of a design that's been accelerated using programmable logic can mean the difference between success and catastrophic failure. For example, the figure below shows the difference in response time between a car’s vision-guided braking system created with the Xilinx reVISION stack running on a Zynq UltraScale+ MPSoC and a similar system based on an Nvidia Tegra device. At 65mph, the Xilinx embedded-vision system’s faster response time stops the vehicle 5 to 33 feet sooner, depending on how the Nvidia-based system is implemented. Five to 33 feet could easily mean the difference between a safe stop and a collision.
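If you want to sanity-check those numbers, the arithmetic is straightforward: at constant speed, the extra stopping distance equals speed multiplied by the extra response latency. Here is a minimal Python sketch, assuming the entire 5-to-33-foot gap comes from response-time latency alone (a simplification; the backgrounder's exact model isn't reproduced here):

v_fps = 65 * 5280 / 3600                 # 65 mph is roughly 95.3 feet per second

for gap_ft in (5, 33):
    dt_ms = gap_ft / v_fps * 1000        # latency difference implied by the gap
    print("{} ft at 65 mph -> ~{:.0f} ms of extra latency".format(gap_ft, dt_ms))

That works out to roughly 50 to 350 milliseconds of response-time difference between the two implementations.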

 

 

Image2.jpg 

 

(Note: This example appears in the new Xilinx reVISION backgrounder.)

 

 

The last two years have generated more machine-learning technology than all of the advancements over the previous 45 years and that pace isn't slowing down. Many new types of neural networks for vision-guided systems have emerged along with new techniques that make deployment of these neural networks much more efficient. No matter what you develop today or implement tomorrow, the hardware and I/O reconfigurability and software programmability of Xilinx All Programmable devices can “future-proof” your designs whether it’s to permit the implementation of new algorithms in existing hardware; to interface to new, improved sensing technology; or to add an all-new sensor type (like LIDAR or Time-of-Flight sensors, for example) to improve a vision-based system’s safety and reliability through advanced sensor fusion.

 

Xilinx is pushing even further into vision-guided, machine-learning applications with the new Xilinx reVISION Stack and this announcement complements the recently announced Reconfigurable Acceleration Stack for cloud-based systems. (See “Xilinx Reconfigurable Acceleration Stack speeds programming of machine learning, data analytics, video-streaming apps.”) Together, these new development resources significantly broaden your ability to deploy machine-learning applications using Xilinx technology—from inside the cloud to the very edge.

 

 

You might also want to read “Xilinx AI Engine Steers New Course” by Junko Yoshida on the EETimes.com site.

 

 

 

The amazing “snickerdoodle one”—a low-cost, single-board computer with wireless capability based on the Xilinx Zynq Z-7010 SoC—is once more available for purchase on the Crowd Supply crowdfunding Web site. Shipments are already going out to existing backers and, if you missed out on the original crowdfunding campaign, you can order one for the post-campaign price of $95. That’s still a huuuuge bargain in my book. (Note: There is a limited number of these boards available, so if you want one, now’s the time to order it.)

 

In addition, you can still get the “snickerdoodle black” with a faster Zynq Z-7020 SoC and more SDRAM that also includes an SDSoC software license, all for $195. Finally, snickerdoodle’s creator krtkl has added two mid-priced options: the snickerdoodle prime and snickerdoodle prime LE—also based on Zynq Z-7020 SoCs—for $145.

 

 

Snickerdoodle.jpg

The krtkl snickerdoodle low-cost, single-board computer based on a Xilinx Zynq SoC

 

 

 

Ryan Cousins at krtkl sent me this table that helps explain the differences among the four snickerdoodle versions:

 

 

Snickerdoodle table.jpg

 

 

 

For more information about krtkl’s snickerdoodle SBC, see:

 

 

 

 

 

 

 

 

 

 

If you’re still uncertain as to what System View’s Visual System Integrator hardware/software co-development tool for Xilinx FPGAs and Zynq SoCs does, the following 3-minute video should make it crystal clear. Visual System Integrator extends the Xilinx Vivado Design Suite and makes it a system-design tool for a wide variety of embedded systems based on Xilinx devices.

 

This short video demonstrates System View’s tool being used for a Zynq-controlled robotic arm:

 

 

 

 

 

For more information about System View’s Visual System Integrator hardware/software co-development tool, see:

 

 

 

 

 

Last year, I wrote about a new graphical system-level design tool called Visual System Integrator that lets you “graphically describe complete, heterogeneous, high-performance, systems based on ‘Platforms’ built from processors and Xilinx All Programmable devices.” (See “Visual System Integrator enables rapid system development and integration using processors and Xilinx FPGAs.”) I always thought that definition was a bit too abstract and now there’s a short 2.5-minute video that makes the abstract a bit more concrete:

 

 

 

 

There’s an even shorter companion video that demonstrates the tool being used to create a 10GbE inline packet processing system using a Xilinx Virtex-7 FPGA as a hardware accelerator for an x86 microprocessor:

 

 

 

 

 

In total, you need only five minutes to get a good overview of this relatively new development tool.

 

 

Dense Optical Flow hardware-acceleration on Zynq SoC made easier by SDSoC and OpenCV libraries

by Xilinx Employee ‎01-25-2017 12:31 PM - edited ‎01-25-2017 12:35 PM (2,536 Views)

 

The 4-minute video below shows a real-time, dense optical flow demonstration running on a Xilinx Zynq SoC. The entire demo was developed using C/C++, the Xilinx SDSoC development environment, and associated OpenCV libraries. The dense optical flow algorithm compares successive video images to estimate the apparent motion of each pixel in one of the images. This technique is used in video compression, object detection, object tracking, and image segmentation. Dense optical flow is a computationally intensive operation, which makes it an ideal candidate for hardware acceleration using the programmable logic in a small, low-power Zynq SoC.
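To make the computation concrete, here is a minimal, CPU-only Python sketch using OpenCV’s Farneback dense-flow function. This illustrates what the algorithm produces; it is not the hardware-accelerated implementation shown in the video, and “input.mp4” is a hypothetical source clip:

import cv2

cap = cv2.VideoCapture("input.mp4")
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # flow[y, x] holds a 2D motion vector estimated for every pixel
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    prev_gray = gray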

 

As Xilinx Senior Product Manager for SDSoC and Embedded Vision Nick Ni explains, SDSoC lowers the barriers to using the Zynq SoC in these embedded-vision applications because the tool makes it relatively easy for software developers accustomed to using only C or C++ to develop hardware-accelerated applications with the coding tools and styles they already know. SDSoC then converts the code that requires acceleration into hardware and automatically links this hardware to the software through DMA.

 

 

 

 

 

Magicians are very good at creating the illusion of levitating objects but the Institute for Integrated Systems at Ruhr University Bochum (RUB) has developed a system that does the real thing—quite precisely. The system levitates a steel ball using an electromagnet controlled by an Avnet PicoZed SOM, which in turn is based on a Xilinx Zynq Z-7000 SoC. An FMCW (frequency-modulated, continuous wave) radar module jointly developed by RUB and the Fraunhofer Institute senses the ball’s position and that data feeds a PID control loop that controls the pulse-width-modulated current supplied to an electromagnet that levitates the steel ball.
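For readers unfamiliar with PID control, the following toy, software-only Python sketch shows the loop concept. The RUB system implements its loop in the Zynq SoC’s PL for deterministic speed; the gains and setpoint below are arbitrary placeholders, not the actual system parameters:

def make_pid(kp, ki, kd, setpoint):
    state = {"integral": 0.0, "prev_err": 0.0}

    def step(measurement, dt):
        # Classic PID: proportional + integral + derivative of the error
        err = setpoint - measurement
        state["integral"] += err * dt
        deriv = (err - state["prev_err"]) / dt
        state["prev_err"] = err
        return kp * err + ki * state["integral"] + kd * deriv

    return step

pid = make_pid(kp=2.0, ki=0.5, kd=0.1, setpoint=10.0)  # hold the ball at 10 mm
duty = pid(measurement=9.2, dt=0.001)                  # updated PWM duty command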

 

 

Fraunhofer FMCW Radar Sensor.jpg 

 

FMCW radar sensor module jointly developed by RUB and the Fraunhofer Institute

 

 

 

The entire system was developed using the Xilinx SDSoC development environment, with hardware acceleration used for the critical paths in the control loop, resulting in fast, repeatable, real-time system response. The un-accelerated code runs on the Zynq SoC’s dual-core ARM Cortex-A9 processor, and the code translated into hardware by SDSoC resides in the Zynq SoC’s programmable logic. SDSoC seamlessly manages the interaction between the system’s software and the hardware accelerators, and the Zynq SoC provides a single-chip solution to the sensor-driven-control design problem.

 

Here’s a 3-minute video that captures the entire demo:

 

 

 

 

 

 

By Adam Taylor

 

To wrap up this blog for the year, we are going to complete the SDSoC integration using the shared library.

 

To recap, we have generated a bit file using the Xilinx SDSoC development environment that implements the matrix multiply example using the PL (programmable logic) on the base PYNQ platform, which we previously defined using SDSoC. The final step is to get it all integrated, and the first task is to upload the following files to the PYNQ board:

 

  • bit – The bit file for the programmable logic
  • tcl – The TCL file description of the block diagram with address ranges
  • so – The generated shared library

 

The names are slightly different as I generated them as part of the previous blog.

 

Using a program like WinSCP, I uploaded these three files to the PYNQ bitstream directory, the same place we uploaded our previous design to.

 

 

Image1.jpg

 

 

The next step is to develop the Jupyter notebook so that we can drive the new overlay that we have created. To get this up and running we need to do the following:

 

  • Download and verify the overlay
  • Create an MMIO class to interface with the existing block RAM which remains in the overlay
  • Create a CFFI class to interface with the shared library
  • Write a simple example to interface the overlay using the MMIO and CFFI classes

 

This is very similar to what we have done previously, with the exception of creating the CFFI class, so that is where the rest of this blog will focus.

 

The first thing we need to do is learn the names of the functions within the shared library, because SDSoC creates a name different from that of the actual accelerated function. We can find the renamed files under <project>/<build config>/_sds/swstubs, while the hardware files are under <project>/<build config>/_sds/p0/ipi.

 

If you already have the shared library on your PYNQ board, then you can use the command nm -D <path & shared library name> to examine its contents if you access the PYNQ via an SSH session.

 

With the name of the function known, we can create a CFFI class within our Jupyter notebook. In the class for this example, we need to create two functions: one for initialization and another to interact with the library. The more complicated of the two is the initialization, within which we must define the location of the shared library in the file system. As mentioned earlier, I have uploaded the shared library to the same location as the bit and TCL files. We also need to declare the functions contained within the shared library and then finally open the shared library.
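A minimal sketch of such a class might look like the following, assuming the SDSoC-renamed matrix-multiply function _p0_mmult_accel_0 from the previous blog keeps float-pointer arguments; the library name libmmult.so is illustrative, so substitute whatever you actually uploaded:

from cffi import FFI

class MMultLib:
    def __init__(self):
        self.ffi = FFI()
        # Declare the function as it is exported by the shared library
        self.ffi.cdef("void _p0_mmult_accel_0(float *A, float *B, float *C);")
        # Open the shared library from its location in the file system
        self.lib = self.ffi.dlopen("/home/Xilinx/pynq/bitstream/libmmult.so")

    def mmult(self, a, b, c):
        # Hand the buffers through to the PL-accelerated function
        self.lib._p0_mmult_accel_0(a, b, c)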

 

 

Image2.jpg 

 

The second function within the class is what we call when we wish to make use of the shared library. We can then make use of this class as we do any other within the rest of our program. In fact, this approach is used often in Python development to bind together C and Python.

 

This example shows just how easily we can create overlays using SDSoC and interface with them using Python and the PYNQ development system. If you want to try it and you do not currently have a license for SDSoC, you can obtain a free 60-day evaluation of the new release here.

 

As I mentioned up top, this is the last blog of 2016. I will resume writing in the New Year; to give you a taste of what we are going to be looking at in 2017, amongst other things I will be featuring:

 

  • UltraZed-EG
  • OpenAMP
  • Image Processing using the PYNQ
  • Advanced sensor interfacing techniques using the Avnet EVK
  • Interfacing to Servos and robotics

 

Until then, have a great Christmas and New Year and thanks for reading the series.

 

 

 

Code is available on Github as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

 

 MicroZed Chronicles hardcopy.jpg

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

 MicroZed Chronicles Second Year.jpg

 

 

 

 

All of Adam Taylor’s MicroZed Chronicles are cataloged here.

 

 

 

 

You can develop and deploy FPGA-accelerated cloud apps using the Xilinx SDAccel development environment with no downloads and no local FPGA hardware using a new Web-based service from Nimbix. This service runs on a Nimbix platform named JARVICE, which is specifically designed for Big Data and Big Compute workloads.

 

Here’s a new 2.5-minute video demonstrating the Nimbix platform in action:

 

 

 

 

 

You can develop apps and deploy them on JARVICE. The Nimbix service is available as a subscription and as a pay-as-you-go service for only a few bucks per hour.

 

For more information about the Xilinx SDAccel development environment for cloud-based apps, see “Removing the Barrier for FPGA-Based OpenCL Data Center Servers.” To read about applications created using SDAccel, see:

 

 

 

 

 

 

 

This unlikely new project on the Instructables Web site uses a $189 Digilent ZYBO trainer board (based on a Xilinx Zynq Z-7010 SoC) to track balloons with an attached Webcam and then pop them with a high-powered semiconductor laser. The tracking system is programmed with OpenCV.

 

Here’s a view down the bore of the laser:

 

Laser Balloon Popper.jpg 

 

And there’s a 1-second video of the system in action on the Instructables Web page.

 

Fun aside, this system demonstrates that even the smallest Zynq SoC can be used for advanced embedded-vision systems. You can get more information about embedded-vision systems based on Xilinx silicon and tools at the new Embedded Vision Developer Zone.

 

Note: For more information about Digilent’s ZYBO trainer board, see “ZYBO has landed. Digilent’s sub-$200 Zynq-based Dev Board makes an appearance (with pix!)”

 

 

SDSoC Logo.jpg 

The latest version of the Xilinx SDSoC Development Environment for Zynq UltraScale+ MPSoCs and Zynq-7000 SoCs, 2016.3, is now available for download and includes the following features:

 

  • Full-feature support for Zynq UltraScale+ MPSoC devices including 64-bit addressing
  • Support for the Zynq UltraScale+ MPSoC’s dual-core ARM Cortex-R5 hardware
  • Linaro-based gcc 5.2-2015.11-2 32-bit and 64-bit tool chains with several compiler enhancements including:
    • Scheduling enhancements for pipelined hardware functions
    • Support for arbitrary numbers of function arguments
    • Support for scalars up to 1024 bits, including double, long long
    • Support for AXI bus data widths to 1024 bits
    • Enhanced pragma processing: user-defined trace points, separate RESOURCE and ASYNC pragmas
  • Vivado Tcl APIs to export hardware metadata specification for custom platforms
  • Embedded Vision Design Examples and OpenCV library
  • Support for QEMU and RTL emulation

 

 

Complete release notes for SDSoC 2016.3 are available here.

 

If you already have SDSoC, you should know what to do to get the upgrade. If not, you can download a 60-day free eval copy here, get the SDSoC user guide here, and a tutorial here.

 

 

 

 

By Adam Taylor

 

As I described last week, we need a platform to fuse Python and SDSoC. In the longer term, I want to perform some image processing with this platform. So although I am going to remove most of the logic from the base design, we need to keep the following in the hardware to ensure that we can correctly boot up the PYNQ board:

 

  1. SWSled GPIO
  2. Btns GPIO
  3. RGBLeds GPIO
  4. Block Memory – Added in the MicroZed Chronicles, Part 158

 

We will leave the block memory within the design to demonstrate that the build produced by SDSoC is unique and different when compared to the original boot.bin file. Doing so will enable us to use the notebook we previously used to read and write the Block RAM. However, this time we will not need to download the overlay first.

 

 

Image1.jpg

 

 

Stripped Down Vivado Platform

 

 

As we know by now, we need two elements to create an SDSoC platform: a hardware definition and a software definition. We can create the hardware definition within Vivado itself. This is straightforward. We declare the available AXI ports, clocks, and interrupts. I have created a script to do this; it’s available in the GitHub repository. You can run it from the command line of the TCL console within Vivado.

 

The software definition will take a little more thought. Because we are using a Linux-based approach, we need the following:

 

  • uImage – The Pynq Kernel
  • dtb – The device tree blob
  • fsbl.elf – The first-stage boot loader
  • u-boot.elf – The second-stage boot loader
  • bif – Used to determine the boot order

 

 

We can obtain most of these items from the PYNQ Master repository that we downloaded from GitHub previously, under the file location:

 

 

<Path>/PYNQ-master/PYNQ-master/Pynq-Z1/sdk/bootbin

 

 

Within this directory, you can find the FSBL, device tree, U-Boot, and a boot.bif. What is missing, however, is the actual Linux kernel: the uImage. We already have this image on the SD card we have been running the PYNQ from recently. I merely copied this file into the SDSoC platform directory.

 

With the platform defined, we can create a simple program that does not have any accelerators and use SDSoC to build the contents of the SD card. Once built, we can copy the contents to the SD card and boot the PYNQ. We should see the LEDs flash as normal when the PYNQ is ready for use.

 

We should be able to access the BRAM we have left within the design using the same notebook as before, but with the overlay section commented out. You should be able to read and write from the memory. You should also check that, if you change the base address away from the correct address, the notebook no longer works correctly.

 

Having proved that we can build a design without accelerating a function, the next step is to ensure that we can build a design that does accelerate a function. I therefore used the matrix multiply example to generate a simple design that shows how to correctly use the platform to accelerate hardware. This is the final check we need to confirm that we have defined the platform correctly.

 

Creating a new project, targeting the same platform as before, with the example code, and targeting the generation of a shared library produced the following hardware build in Vivado:

 

 

Image2.jpg

 

 

 

MMult hardware example as created by SDSoC

 

 

 

Clearly, we can see the addition of the accelerated hardware.

 

All that is needed now is to upload the bit, tcl, and so files to the PYNQ and then write a notebook to put them to work.

 

 

Code is available on Github as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

 

MicroZed Chronicles hardcopy.jpg 

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

All of Adam Taylor’s MicroZed Chronicles are cataloged here.

 

 

 

 

 

 

 

 

By Adam Taylor

 

One of the benefits of the PYNQ system is that we can integrate hardware overlays for the PYNQ’s Zynq Z-7000 SoC and use them with ease in a Python programming environment. As we have seen over the last few weeks, it is pretty simple to create and integrate a hardware overlay. However, we still need to be able to develop an overlay with the functions we desire. Ideally, to continue to leverage the benefits of the high-level PYNQ system, we want to develop the overlays using a similar high-level approach.

 

The traditional way to develop hardware overlays for the FPGA fabric in the Zynq SoC is to use Vivado as we’ve done previously, perhaps combined with Vivado HLS to implement complex functions defined in C or C++. The Xilinx SDSoC development environment allows us to create applications that run on the Zynq SoC’s ARM Cortex-A9 processors (the PS or processor system) and the programmable logic (the PL). We can move functions between them as we desire to accelerate parts of the design. If we do this using a high-level language like C or C++, SDSoC combines the capabilities of Vivado HLS with a connectivity framework.

 

 

 

Image1.jpg

 

 

How SDSoC and Pynq can be combined

 

 

What this means for the PYNQ system is that we can use SDSoC to create a hardware overlay using Vivado HLS and then interface to it using Python’s C Foreign Function Interface (CFFI). Using CFFI is very similar to the approach we undertook last week. In theory, this approach allows us to create hardware overlays without the need to write a line of HDL.

 

The first step in using SDSoC is to create an SDSoC platform. As we have discussed before, an SDSoC platform requires both a hardware definition and a software definition. We can create the hardware definition from within Vivado. For the software definition, we can use a template for a Linux operating system. The base PYNQ design will serve as our foundation because we want to ensure that the PS settings are correct. However, to free up resources in the PL for SDSoC, we may want to prune out some of the logic functions.

 

Once the platform has been created within SDSoC, we can take advantage of the support for high-level frameworks like OpenCV and the other supported HLS libraries to create the application we want. SDSoC will automatically generate the required bit file and TCL file for a build. However, in this case, we also need the C files generated by SDSoC to interface with the accelerated function in the Zynq PL. We do this using a shared library, which we can call from within the Python environment. We can create a shared library by ticking the option when we create a new SDSoC project, like so:

 

 

 

Image2.jpg

 

 

Setting the shared library option

 

 

To make use of the shared library, we will need to know the names of the functions contained within it. These functions will be renamed by SDSoC during the build process and we will need to use these modified names within the Python CFFI interface because that is what is included within the shared library.

 

For example, using the matrix multiply example in SDSoC, the name of the accelerated function becomes:

 

 

mmult_accel  -> _p0_mmult_accel_0

 

 

These files will be available under the <project>/<build config>/_sds/swstubs while the hardware files are under <project>/<build config>/_sds/p0/ipi.

 

This is how the previous example we ran, the Sobel filter (and the FIR filter), was designed.

 

Over the next few weeks, we will look more in depth at how we create our own SDSoC platform and how we implement it within the PYNQ environment.

 

Code is available on Github as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

 

MicroZed Chronicles hardcopy.jpg 

 

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

All of Adam Taylor’s MicroZed Chronicles are cataloged here.

 

 

 

 

 

 

By Adam Taylor

 

 

Having re-created the base hardware overlay for our PYNQ dev board, we’ll now modify the overlay to add our own memory-mapped peripheral. As we are modifying the base overlay, this will be a new overlay—one that we need to correctly integrate into the PYNQ environment.

 

While this will be a simple example, we can use the same techniques used here to create as complicated or as simple an overlay as we desire.

 

To demonstrate how we do this, I am going to introduce a new block memory within the PL that we can read from and write to using the Python environment.

 

 

Image1.jpg

 

 

The new blocks are highlighted

 

 

 

To do this we need to do the following in the Vivado design:

 

 

  1. Create a new AXI port (port 13) on the AXI Interconnect connected to General Purpose Master 0
  2. Import a new BRAM controller and configure it to have only one port
  3. Use the Block Memory Generator to create a BRAM. Set the mode to BRAM Controller, single port RAM
  4. Map the new BRAM controller to the Zynq SoC’s PS memory map

 

 

With these four things completed, we are ready to build the bit file. Once the file has been generated, we are halfway towards building an overlay we can use in our design. The other half requires generating a TCL script that defines the address map of the bit file. To do this, we need to use the command:

 

write_bd_tcl <name.tcl>

 

Once we have the TCL and bit files, we can move on to the next stage, which is to import the files and create the drivers and application.

 

This is where we need to power on the PYNQ dev board and connect it to the network with our development PC. Once the PYNQ configuration is uploaded, we can connect to it using a program like WinSCP to upload the bit file and the tcl file.

 

Within the current directory structure on the PYNQ board, there is a bitstream directory we can use at:

 

 

/home/Xilinx/pynq/bitstream/

 

 

You will find the files needed to support the base overlay under this directory.

 

 

 

Image2.jpg 

 

Base overlay and modified overlay following upload

 

 

Once this has been uploaded, we need to create a notebook to use it. We need to make use of the existing overlay module provided with the PYNQ package to do this. This module will allow us to download the overlay into the PL of the PYNQ. Once it is downloaded, we need to check that it downloaded correctly, which we can do using the ol.is_loaded() function.
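A minimal sketch of those steps, assuming the files were uploaded as part158.bit and part158.tcl (illustrative names) to the bitstream directory shown above:

from pynq import Overlay

ol = Overlay("/home/Xilinx/pynq/bitstream/part158.bit")
ol.download()              # program the PL with the new overlay
print(ol.is_loaded())      # should report True once the download completes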

 

 

Image3.jpg

 

 

Downloading the new overlay

 

 

The simplest way to interface with the new overlay is to use the MMIO module within the PYNQ Package. This module allows us to interface directly to memory-mapped peripherals. First however, we need to define a new class within which we can declare the functions to interact with the overlay. For this example, I have called my class part158 to follow the blog numbering.

 

 

Image4.jpg

 

 

 

Looking within the class, we have defined the base address and address range using the line:

 

 

mmio = MMIO(0x46000000,0x00002000)

 

 

Three function definitions in the above figure define:

 

  • The initialization function (in this case, this function merely writes a 0 to address 0)
  • A function that writes data into the BRAM
  • Another function that reads data from the BRAM.

 

(Remember that the address increments by 4 for each word because this is a 32-bit system.)

 

With the class defined, we can then write a simple script that writes data to and reads data from the BRAM, as we would for any other function. Initially, we will write a simple counting sequence, followed by random numbers.
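Putting the pieces together, a minimal sketch of the class and test script just described might look like this (the method names are illustrative; the base address and range come from the MMIO line above):

from pynq import MMIO
import random

class part158:
    def __init__(self):
        self.mmio = MMIO(0x46000000, 0x00002000)
        self.mmio.write(0, 0)             # initialization: write a 0 to address 0

    def write_bram(self, index, data):
        self.mmio.write(index * 4, data)  # addresses step by 4 in a 32-bit system

    def read_bram(self, index):
        return self.mmio.read(index * 4)

# Counting sequence first, then random numbers
bram = part158()
for i in range(16):
    bram.write_bram(i, i)
print([bram.read_bram(i) for i in range(16)])

for i in range(16):
    bram.write_bram(i, random.randint(0, 255))
print([bram.read_bram(i) for i in range(16)])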

 

 

Image5.jpg

 

 

When I executed the notebook, I received the results below:

 

 

Image6.jpg 

 

Once we have this new hardware overlay up and running, we can create a more complex overlay and interact with it using the MMIO module.

 

 

Code is available on Github as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

 

 MicroZed Chronicles hardcopy.jpg

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

 MicroZed Chronicles Second Year.jpg

 

 

 

 

All of Adam Taylor’s MicroZed Chronicles are cataloged here.

 

 

 

 

Alpha Data’s booth at this week’s SC16 conference in Salt Lake City held the company’s latest top-of-the-line FPGA accelerator card, the ADM-PCIE-9V3, based on the 16nm Xilinx Virtex UltraScale+ VU3P-2 FPGA. Announced just this week, the card also features two QSFP28 sockets that each accommodate one 100GbE connection or four 25GbE connections. If you have a full-height slot available, you can add two more 100GbE interfaces using Samtec FireFly Micro Flyover Optical modules and run four 100GbE interfaces simultaneously. All of this high-speed I/O capability comes courtesy of the 40 32.75Gbps SerDes ports on the Virtex UltraScale+ VU3P FPGA.

 

 

Alpha Data ADM-PCIE-9V3.jpg 

 

Alpha Data ADM-PCIE-9V3 Accelerator Card based on a Xilinx Virtex UltraScale+ VU3P-2 FPGA

 

 

To back up the board’s extreme Ethernet bandwidth, the ADM-PCIE-9V3 board incorporates two banks of 72-bit, DDR4-2400 SDRAM with ECC and a per-bank capacity of 8Gbytes, for a total of 16Gbytes of on-board SDRAM. All of this fits on a half-length, low-profile PCIe card, which features a PCIe Gen4 x8 or a PCIe Gen3 x16 host connection, and the board supports the OpenPOWER CAPI coherent interface. (The PCIe configuration is programmable, thanks to the on-board Virtex UltraScale+ FPGA.)

 

 

Taken as a whole, this new accelerator card delivers serious processing and I/O firepower along every dimension you might care to measure, whether it’s Ethernet bandwidth, memory capacity, or processing power.

 

The Alpha Data ADM-PCIE-9V3 board is based on a Xilinx Virtex UltraScale+ FPGA so it can serve as a target for the Xilinx SDAccel development environment, which delivers a CPU- and GPU-like development environment for application developers who wish to develop high-performance code using OpenCL, C, or C++ while targeting ready-to-go, plug-in FPGA hardware. In addition, Alpha Data offers an optional Board Support Package for the ADM-PCIE-9V3 accelerator board with example FPGA designs, application software, a mature API, and driver support for Microsoft Windows and Linux to further ease cloud-scale application development and deployment in hyperscale data centers.

 

 

 

By Adam Taylor

 

 

Having done the easy part and got the Pynq all set up and running a simple “hello world” program, I wanted to look next at the overlays which sit within the PL, how they work, and how we can use the base overlay provided.

 

What is an overlay? An overlay is a design that’s loaded into the Zynq SoC’s programmable logic (PL). The overlay can be designed to accelerate a function in the programmable logic or to provide an interfacing capability using the PL. In short, overlays give PYNQ its unique capabilities.

 

What is important to understand about the overlay is that there is not a Python-to-PL high-level synthesis process involved. Instead, we develop the overlay using one of the standard Xilinx design methodologies (SDSoC, Vivado, or Vivado HLS). Once we’ve created the bit file for the overlay, we then integrate it within the Pynq architecture and establish the required parameters to communicate with it using Python.

 

Like all things with the Zynq SoC that we have looked at to date, this is very simple. We can easily integrate with the Python environment using the bit file and other files provided with the Vivado build. We do this with the Python MMIO class, which allows us to interact with designs in the PL through memory-mapped reads and writes.  The memory map of the current overlay in the PL is all we need. Of course, we can change the contents of the PL on the fly as our application requires to accelerate functions in the PL.

 

We will be looking more at how we can create our own overlay over the next few weeks. However, if you want to know more in the short term, I suggest you read the PYNQ manual here. If you are thinking of developing your own overlay, be sure to base it on the base overlay Vivado design to ensure that the configuration of the Zynq SoC’s Processor System (PS) and the PS/PL interfaces are correct.

 

The supplied base overlay provides support for several interfaces including the HDMI port and a wide range of PMODs.

 

The real power of the Pynq system comes from the open source community developing and sharing overlays. I want to look at a couple of these in the remainder of this blog. These overlays are available via GitHub and provide a Sobel Filter for the HDMI input and output and a FIR filter. You’ll find them here:

 

 

 

 

The first thing we need to do is install the packages. For this example, I am going to install the Sobel filter. To do this, we need to use a terminal program to download and install the overlay and its associated files.

 

 

We can do this using PuTTY and log in easily with the user name and password of Xilinx. The command to install the overlay is then:

 

 

sudo -H pip install --upgrade 'git+https://github.com/beja65536/pz1_sobelfilter'

 

 

Image1.jpg 

 

Installing the Sobel Filter

 

 

Once this has been downloaded, the next step is to download the zip file containing the Jupyter notebook from GitHub and upload it under the examples directory. This is simple to do: just select upload and navigate to the location of the notebook you wish to upload.

 

 

Image2.jpg 

 

This notebook also performs the installation of the overlay if you have not done this via the terminal. However, you only need to do this once.

 

 

Once this is uploaded, we can connect the Pynq to an HDMI source and an HDMI monitor and run the example. For this example, I am going to connect the Pynq between the Embedded Vision Kit and the display and then run the notebook.

 

 

Image3.jpg

 

 

When I did this, the notebook produced the image below showing the result of the Sobel Filter. Overall, this was very easy to get up and running using a different overlay that is not the base overlay.

 

 

Image4.jpg 

 

 

Code is available on Github as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

 

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

All of Adam Taylor’s MicroZed Chronicles are cataloged here.

 

 

 

This week, Techfocus Media’s President Kevin Morris wrote the following in an article published on the EEJournal Web site:

 

“Designers of FPGA tools should take heed. There is a vast number of different types of users entering the FPGA domain, and the majority are not FPGA experts. If FPGAs are to expand into the numerous new and exciting markets for which they’re suitable, the primary battleground will be tools, not chips. New users should not have to learn FPGA-ese in order to get an FPGA to work in their system. At some point, people with little or no hardware expertise at all will need to be able to customize the function of FPGAs.”

 

In a nutshell, this paragraph describes the philosophy behind the Xilinx SDx Development Environments including SDAccel, SDSoC, and SDNet. These application-specific development environments are designed to allow people versed in software engineering and other disciplines to get a hardware performance boost from Xilinx All Programmable devices without the need to become FPGA experts (in Morris’ terminology).

 

Later in the article, Morris writes:

 

“Higher levels of abstraction in design creation need to replace HDL. System-level design tools need to take into account both the hardware and software components of an application. Tools - particularly lower-level implementation tools such as synthesis and place-and-route - need to move ever closer to full automation.”

 

He might as well be writing about the Xilinx SDSoC development environment. If this is the sort of development tool you seek, you might want to check it out.

 

 

VisualApplets 3.jpg

 

Silicon Software’s VisualApplets has long been a handy GUI-based tool for designers creating high-performance, image-processing systems using FPGAs. The company is now offering a free e-book that shows you how the latest version, VisualApplets 3, lets you create such systems with Silicon Software’s V-series frame grabbers or compatible Baumer LX VisualApplets video cameras in as little as 1 week.

 

Click here to sign up for a free copy of the book.

 

 

Adam Taylor's MicroZed Chronicles Part 155: Introducing the PYNQ (Python + Zynq) Dev Board

by Xilinx Employee ‎11-06-2016 03:59 PM - edited ‎11-11-2016 11:41 AM (10,345 Views)

 

By Adam Taylor

 

Having recently received a Xilinx/Digilent PYNQ dev board, I want to spend some time looking at this rather exciting Zynq-based board. For those not familiar with the PYNQ, it combines the capability of the Zynq SoC with the productivity of the Python programming language, and it comes in a rather eye-catching pink color.

 

 

Image1.jpg

 

PYNQ up and running on my desk

 

 

Hardware-wise, PYNQ incorporates an on-board Xilinx Zynq Z-7020 SoC, 512Mbytes of DDR SDRAM, HDMI In and Out, Audio In and Out, two PMOD ports, and support for the popular Arduino interface header. We can configure the board from either the SD card or QSPI. On its own, PYNQ would be a powerful development board. However, there are even more exciting aspects to this board that enable us to develop applications that use the Zynq SoC’s programmable logic.

 

The Zynq SoC runs a Linux kernel with a specific package that supports all of the PYNQ’s capabilities. Using this package, it is possible to place hardware overlays (in reality, bit files developed in Vivado) into the programmable logic of the Zynq.

 

The base PYNQ overlay supports all of the PYNQ interfaces, as shown below:

 

 

Image2.jpg

 

PYNQ PL hardware overlay

 

 

Within the supplied software environment, the PYNQ hardware and interfaces are supported by the Pynq package. This package allows you to use the Python language to drive PYNQ’s GPIO, video, and audio interfaces along with a wide range of PMOD boards. We use this package within the code we develop and document using the Jupyter notebook, which is the next part of the PYNQ framework.
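As a quick taste of the Pynq package, here is a minimal sketch that mirrors one of the on-board buttons onto an LED, assuming the base overlay is loaded and using the board API from the early PYNQ releases:

from pynq.board import LED, Button

led = LED(0)
btn = Button(0)

if btn.read():        # read() returns 1 while the button is pressed
    led.on()
else:
    led.off()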

 

As engineers, we ought to be familiar with the Python language and Linux, even if we are not experts in either. However, we may be unfamiliar with Jupyter notebooks. These are Web-based, interactive environments that allow us to run code and embed widgets, documentation, plots, and even video within the Jupyter notebook Web pages.

 

A Jupyter notebook server runs within the Linux kernel that’s running on the PYNQ’s Zynq SoC. We use this interface to develop our PYNQ applications. Jupyter notebooks and overlays are the core of the PYNQ development methodology and over the next series of blogs we are going to explore how we can use these notebooks and overlays and even develop our own as required.

 

Let’s look at how we can power up the board and get our first “hello world” program running. We’ll develop a simple program that allows us to understand the process flow.

 

The first thing to do is to configure an SD card with the latest kernel image, which we can download from here. With this downloaded, the next step is to write the ISO file to the SD card using an application like Win Disk Imager (if we are using Microsoft Windows).

 

Insert the SD card into the PYNQ board (check that the jumper is set for SD boot) and connect a network cable to the Ethernet port. Power the board up and, once it boots, we can connect to the PYNQ board using a browser.

 

In a new browser window, enter the address http://pynq:9090, which will take us to a log-on page where we enter the username Xilinx. From there we will see the Jupyter notebook’s welcome page:

 

 

Image3.jpg

The PYNQ welcome page

 

 

Clicking on “Welcome to Pynq.ipynb” will open a welcome page that tells us how to navigate around the notebook and where to find supporting material.

 

For this example, we are going to create our own very simple example to demonstrate the flow, as I mentioned earlier. Again, we run the Python programs from within the Jupyter notebook. We can see which programs we currently have running on the PYNQ by clicking on the “Running” tab, which is present on most notebook pages. Initially we have no notebooks running, so clicking on it right now will only show us that there are no running notebooks.

 

 

Image4.jpg

Notebooks running on the PYNQ

 

 

To create your own example, open the examples page and click on “New.” Select “Python 3” under “Notebooks” from the menu on the right:

 

 

Image5.jpg 

Creating a new notebook

 

 

This will create a new notebook called “Untitled.” We can change the name to whatever we desire by clicking on “Untitled,” which will open a dialog box that allows us to change the name. I am going to name my example after the number of this MicroZed Chronicles blog post (Part 155).

 

 

Image6.jpg

 

Changing the name of the Notebook

 

 

The next thing to do is enter the code we want to run on the PYNQ. Within the notebook, we can mark the contents of a cell as Code, Markdown, Heading, or Raw NBConvert.

 

 

Image7.jpg

 

We can mark a cell as Code, Markdown, Heading, or Raw NBConvert

 

 

For now, select “Code” (if it is not already selected) and enter the single line of code shown below.
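The entire program is this single line:

print("hello world")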

 

 

 

Image8.jpg

 

The code to run in the notebook

 

 

We click the play button to run this very short program. With the cell selected and all being well, we will see the result appear as below:

 

 

Image9.jpg

 

Running the code

 

 

Image10.jpg

 

 Result of Running the Code

 

 

If we look under the running tab again, we will see that this time there is a running application:

 

 

Image11.jpg


Running Notebooks

 

 

 

If we wish to stop the notebook from running, we click on the shutdown button.

 

Next time, we will look at how we can use the PYNQ in more complex scenarios.

 

We can also use the PYNQ board as a traditional Zynq-based development board if desired, which makes the PYNQ one of the best dev-board choices available today.

 

Note: you can also log on to the PYNQ board using a terminal program like PuTTY with the username and password Xilinx.

 

 

 

 

Code is available on Github as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E-Book here
  • First Year Hardback here

 

 

 

 MicroZed Chronicles hardcopy.jpg

 

 

  • Second Year E-Book here
  • Second Year Hardback here

 

 

 

 MicroZed Chronicles Second Year.jpg

 

 

 

All of Adam Taylor’s MicroZed Chronicles are cataloged here.

 

 

 

 

 

Adam Taylor’s MicroZed Chronicles, Part 154: SDSoC Tracing

by Xilinx Employee on ‎10-31-2016 09:48 AM (5,740 Views)

 

By Adam Taylor

 

Using the Xilinx SDSoC Development Environment allows us to create a complex system with functions in both the Zynq SoC’s PS and PL (the Processing System and the Programmable Logic). To achieve the best system performance and optimization, we need to see the interaction between the PS and the PL, and this is where tracing comes in. Unlike AXI profiling, which enables us to ensure we are using the AXI links between the PS and PL optimally, tracing shows us the detailed interaction taking place between the software and the hardware acceleration. To do this, SDSoC instruments the design, adding several blocks that trace events within the PL so that the solution can be traced as a whole.

 

We can enable tracing using the SDSoC Project Overview: select the Enable Event Tracing check box and set the active build configuration to SDDebug. We can then build our application with the trace functionality built in. To ensure there are no issues, I recommend that you clean the build first (Project -> Clean).

 

 

Image1.jpg 

 

Using SDSoC Project Overview to configure a build with built-in trace

 

 

For this blog post, I am going to target the Avnet ZedBoard running a standalone (bare-metal) OS and use a matrix-multiplication example.

 

Once SDSoC has generated the build files, we need to execute and run the traced design from within SDSoC itself. To run things from within the SDSoC environment, we need to connect both the JTAG and UART ports to our development PC. If the target is running a Linux operating system, then we also need to connect the Ethernet port to the same network as the development PC.

 

Power up the target board and then, within the SDSoC Project Explorer, expand your working project. Beneath the SDDebug folder, right-click on the resultant ELF file, then select Debug and then Trace Application.

 

 

Image2.jpg

 

 

Executing the trace application

 

 

This will then program the bit file into the target board, execute the instrumented design, capture the trace results, and upload them to SDSoC. If we have a terminal window connected to the target board to capture the UART’s output, we will see the results of the program being executed:

 

 

Image3.jpg

 

 

UART results of the program

 

 

 

The trace application executes within SDSoC and uploads a trace log along with a graphical view of that log. The log shows the starting and stopping of the software, data transfers, and hardware acceleration:

 

 

Image4.jpg

 

SDSoC Trace Log

 

 

Image5.jpg

 

 

Results of tracing an example application

(Orange = Software, Green = Accelerated function and Blue = Transfer)

 

 

 

Running the trace creates a new project containing the trace details, which can be seen within the project explorer.

 

 

Image6.jpg 

 

New Project containing trace data

 

 

Tracing instruments the PL side of the design, so I ran the same build with and without tracing enabled to compare the differences in resource utilization. The results appear below:

 

 

Image7.jpg

 

 

Resource Utilization with Tracing enabled (left) and Normal Build (right)

 

 

There is a small difference between the two builds, but implementing trace within the design does not significantly increase the resources required.

 

I also made a short video for this blog post. Here it is:

 

 

 

 

 

 

Code is available on Github as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E-Book here
  • First Year Hardback here

 

 

 MicroZed Chronicles hardcopy.jpg

 

 

 

  • Second Year E-Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg

 

 

 

All of Adam Taylor’s MicroZed Chronicles are cataloged here.

 

 

 

 

NI’s FPGA-based, 400MHz, 100Vpp, 2-channel, PXIe DSO captures 1G, 14-bit samples/sec

by Xilinx Employee ‎10-26-2016 03:15 PM - edited ‎10-26-2016 03:31 PM (6,528 Views)

 

Today, National Instruments (NI) announced the PXIe-5164 2-channel, 400MHz DSO (digital sampling oscilloscope) that captures 1G 14-bit samples/sec per channel. One of the scope’s key features is the ability to handle 100Vpp input signals—a feature not usually found on box-level DSOs. With programmable level offsets, the PXIe-5164 scope can actually handle inputs over a ±250V range. And, since this is Xcell Daily, you have probably already figured out that there’s an FPGA inside NI’s PXIe-5164 scope handling the capturing, triggering, memory control for the 1.5Gbyte sample memory, and the PXIe interface. It’s a Xilinx Kintex-7 410 FPGA.

 

 

NI PXIe-5164 Dual-Channel DSO.jpg

 

NI’s PXIe-5164 2-channel, 400MHz PXIe DSO captures 1G 14-bit samples/sec per channel

 

 

Moreover, you can use digital filtering to adjust the PXIe-5164 scope’s response for characteristics such as passband flatness and linear phase response. This filtering is based on a 16-tap finite impulse response (FIR) filter implemented in the FPGA. The filter parameters are determined during factory calibration and can be altered during external calibration. The calibration procedure requires a power meter with a very flat frequency response, so a guaranteed and traceable flat frequency response is transferred to the PXIe-5164 during calibration. Factory-calibrated flatness is guaranteed to ±0.31dB out to 330MHz.
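For readers who want a feel for the arithmetic, a 16-tap FIR filter simply forms each output sample as a weighted sum of the 16 most recent input samples: y[n] = h[0]x[n] + h[1]x[n-1] + ... + h[15]x[n-15]. Here is a minimal Python sketch of that operation. The coefficients below are placeholders for illustration only; the real values in the PXIe-5164 come from per-unit factory calibration.

import numpy as np

# Sixteen placeholder FIR taps. In the PXIe-5164, the real coefficients
# are determined during factory calibration to flatten the passband.
taps = np.ones(16) / 16.0            # a simple moving average stands in here

samples = np.random.randn(1000)      # stand-in for a block of ADC samples

# Causal FIR filtering: each output is the weighted sum of the last 16 inputs.
corrected = np.convolve(samples, taps)[:len(samples)]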

 

You can also harness the PXIe-5164 scope’s internal Kintex-7 FPGA to implement custom triggers (details here) or custom input-signal processing using the company’s graphical LabVIEW System Design Software package. The internal FPGA provides benefits that range from low-latency device under test (DUT) control to CPU load reduction. It’s simply not possible to do processing at these speeds using a software-driven microprocessor. You need an FPGA. In addition, the FPGA-based hardware platform design allows NI to rapidly create a family of PXIe DSOs with different sample rates and sample resolutions.

 

For more information about the NI PXIe-5164 DSO, contact National Instruments directly.

 

MVTec Software has released HALCON Embedded, a version of its popular HALCON vision package that generates applications directly for Zynq-based hardware. HALCON is a machine-vision development environment with a large library of powerful 2D and 3D image-processing functions; it is compatible with a truly broad range of imaging devices. HALCON runs on CPU-based hosts, while HALCON Embedded lets you develop your machine-vision application on a host using C, C++, and other development languages. You can then move this code to a diverse assortment of embedded hardware, which now includes hardware based on the Xilinx Zynq-7000 SoC.

 

 

HALCON Embedded.jpg 

 

Note: Contact MVTec for more information about HALCON Embedded for the Zynq SoC.

 

 

 

By Adam Taylor

 

Over the last few weeks, and indeed the last year, we have looked at the Xilinx SDSoC Development Environment in detail. However, two areas we have not yet examined are SDSoC’s performance-monitoring and trace capabilities.

 

Performance monitoring allows us to examine the performance of the processors executing applications within our system. We can also see, in considerable detail, the performance of the AXI interconnect used as part of the Zynq SoC’s PL acceleration. This feature allows us to understand the interaction between the PS and the PL. The tracing capability, which warrants more detailed treatment, will be the focus of another blog.

 

We enable the AXI performance monitor using SDSoC’s Project Overview. On the right-hand side under options, there is a tick box labeled Insert AXI Performance Monitoring. Checking this box and then cleaning the build prior to a complete rebuild of the project (with the active configuration set to SDDebug) tells SDSoC to insert AXI performance-monitoring blocks into the design.

 

 

Image1.jpg

 

 

For this example, I will target the ZedBoard with one of the demo applications: the matrix-multiplication example running as a bare-metal solution. We can monitor AXI performance under both standalone code and Linux.

 

Once the application is built, we need to connect the ZedBoard to our development PC using both the UART and the JTAG connectors.

 

To run the examples on our target board, we will be using an approach that differs from what we have done before—i.e. we are not going to copy the generated files onto an SD card. Instead, we are going to use SDSoC’s debugger. We will also be using two new perspectives within SDSoC: the Debug perspective (which should be familiar to those of us who have used the Xilinx SDK previously) and the Performance Analysis perspective.

 

The first thing we need to do with the files we’ve generated is create a debug configuration for the ELF file. Within SDSoC, under Project Explorer, open the folder for the project we have just compiled, expand the SDDebug folder, and select the project’s ELF file.

 

 

Image2.jpg

 

 

 

Right-click on this selection and select Debug As -> Debug Configurations. Create a new Xilinx SDSoC Application configured as in the image below.

 

 

Image3.jpg

 

 

Selecting the Debug Configuration

 

 

 

On the Application tab, check the “Stop program at entry” option. This prevents the program from running the moment it is downloaded and allows us to control the program’s execution.

 

 

Image4.jpg

 

 

SDSoC Debug Configuration

 

 

 

Image5.jpg

 

 

Ensuring the program waits at entry

 

 

With this complete, click on Debug. The bit file will be loaded, and the application will be downloaded and held at the entry point, awaiting our command. You may see a dialog asking you to switch to the Debug perspective; click yes and you will see that the application has been loaded and is paused.

 

 

Image6.jpg

 

Program downloaded and awaiting execution

 

 

If we want to execute the program, we can click the resume button (or hit F8) as shown above. However, if we do that, we will not obtain the performance data. To get the performance data, we need to open the Performance Analysis perspective by clicking on the open-perspective button.

 

 

Image7.jpg 

 

We can also select Window->Perspective-> Open Perspective-> Other

 

 

 

Image8.jpg

 

 

Selecting the Performance Analysis Perspective

 

 

This will open the Performance Analysis perspective. However, before we can obtain the performance analysis, we need to define the underlying hardware. This is very simple to do under the Performance Session Manager: just select Run.

 

 

Image9.jpg

 

The Performance Session Manager settings

 

 

This will open a dialog box that allows us to define the clock rate and the APM (AXI Performance Monitor) information. This information resides in the following location:

 

 

<project>\SDDebug\_sds\p0\ipi\<projectname>.sdk\<projectname>.hdf

 

 

 

Image10.jpg

 

 

Defining the APM slots in the design

 

 

Once this is complete, we can run the program and capture the information of interest within the performance graphs. These performance graphs relate to either the PS or the APM performance. I captured the following when I ran the program:

 

 

 

Image11.jpg

 

 

Result of the APM Performance Analysis Graph

 

 

 

Image12.jpg

 

 

Result from the APM Performance Analysis Counters

 

 

 

Performance analysis allows us to examine the performance of our system in more depth, which helps us better understand the interaction between the Zynq SoC’s PS and PL.

 

 

 

Code is available on Github as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E-Book here
  • First Year Hardback here

 

 

 

 MicroZed Chronicles hardcopy.jpg

 

 

  • Second Year E-Book here
  • Second Year Hardback here

 

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

All of Adam Taylor’s MicroZed Chronicles are cataloged here.

 

 

 

 

 

 

 

 

 

Agnisys IDesignSpec and ISequenceSpec can generate synthesizable HDL for Vivado from plain-text specs

by Xilinx Employee ‎10-20-2016 10:23 AM - edited ‎10-20-2016 03:49 PM (5,763 Views)

 

 

Here’s the system engineer’s dream: Write a concise specification for a system, feed it to the right tool, and get a working design out of the other end of the tool. As I said, it’s a dream. But Agnisys seems bound and determined to make this dream a reality. The company offers two design tools—IDesignSpec and ISequenceSpec—that perform this feat for system and test designs. Here’s a diagram of the IDesignSpec flow:

 

 

Agnisys IDesignSpec.jpg 

 

Agnisys IDesignSpec Design Flow

 

 

Note that IDesignSpec accepts specifications in a variety of text-centric formats, including IP-XACT and XML, and emits a number of files, including synthesizable Verilog or VHDL, which slips right into the Xilinx Vivado HLx Design Suite.

 

Now this may sound complex and I’m sure that whatever’s going on under the hood is indeed complicated; but perhaps your job isn’t, as demonstrated in the following 5-minute demo video:

 

 

 

 

 

Contact Agnisys directly for more information about IDesignSpec and ISequenceSpec.

 

 

 

 

 

 

 

Programmable logic control of power electronics—where to start? What dev boards to use?

by Xilinx Employee ‎10-18-2016 10:24 AM - edited ‎10-18-2016 10:28 AM (4,177 Views)

 

A great new blog post on the ELMG Web site discusses three entry-level dev boards you can use to learn about controlling power electronics with FPGAs. (This post follows a Part 1 post that discusses the software you can use—namely Xilinx Vivado HLS and SDSoC—to develop power-control FPGA designs.)

 

And what are those three boards? They should be familiar to any Xcell Daily reader:

 

 

The $99 Digilent ARTY dev board (Artix-7 FPGA)

 

ARTY Board v2 White.jpg 

 

 

 

The Avnet ZedBoard (Zynq-7000 SoC)

 

ZedBoard V2.jpg

 

 

 

 

 

The Avnet MicroZed SOM (Zynq-7000 SoC)

 

 

MicroZed V2.jpg

 

 

 

 

Who is ELMG? The company has spent the last 25 years developing digitally controlled power converters for motor drives, industrial switch-mode power supplies, reactive power compensation, medium-voltage systems, power-quality systems, motor starters, appliances, and telecom switch-mode power supplies.

 

 

For more information about the ARTY board, see: ARTY—the $99 Artix-7 FPGA Dev Board/Eval Kit with Arduino I/O and $3K worth of Vivado software. Wait, What????

 

 

For more information about the MicroZed and the ZedBoard, see the 150+ blog posts in Adam Taylor’s MicroZed Chronicles.

 

 
