Embedded-vision applications present many design challenges. A new ElectronicsWeekly.com article titled “Bringing embedded vision systems to market,” written by Michaël Uyttersprot, a Technical Marketing Manager at Avnet Silica, discusses these challenges and their solutions.
First, the article enumerates several design challenges including:
Next, the article discusses Avnet Silica’s various design offerings that help engineers quickly develop embedded-vision designs. Products discussed include:
The Avnet PicoZed Embedded Vision Kit is based on the Xilinx Zynq SoC
If you’re about to develop any sort of embedded-vision design, it might be worth your while to read the short article and then connect with your friendly neighborhood Avnet or Avnet Silica rep.
For more information about the Avnet PicoZed Embedded Vision Kit, see “Avnet’s $1500, Zynq-based PicoZed Embedded Vision Kit includes Python-1300-C camera and SDSoC license.”
DornerWorks is one of only three Xilinx Premier Alliance Partners in North America offering design services, so the company has more than a little experience using Xilinx All Programmable devices. The company has just launched a new learn-by-email series with “interesting shortcuts or automation tricks related to FPGA development.”
The series is free but you’ll need to provide an email address to receive the lessons. I signed up and immediately received a link to the first lesson titled “Algorithm Implementation and Acceleration on Embedded Systems” written by DornerWorks’ Anthony Boorsma. It contains information about the Xilinx Zynq SoC and Zynq UltraScale+ MPSoC and the Xilinx SDSoC development environment.
Sign up here.
Last month, a user on EmbeddedRelated.com going by the handle stephaneb started a thread titled “When (and why) is it a good idea to use an FPGA in your embedded system design?” Olivier Tremois (oliviert), a Xilinx DSP Specialist FAE based in France, provided an excellent, comprehensive, concise, Xilinx-specific response worth repeating in the Xcell Daily blog:
As a Xilinx employee I would like to contribute on the Pros ... and the Cons.
Let's start with the Cons: if there is a processor that suits all your needs in terms of cost/power/performance/IOs, just go for it. You won't be able to design the same thing in an FPGA at the same price.
Now if you need some kind of glue logic around it (IOs), or if your design needs multiple processors/GPUs to reach the required performance, then it's time to talk to your local FPGA dealer (preferably a Xilinx distributor!). I will try to answer a few remarks I saw throughout this thread:
FPGA/SoC: In the majority of the FPGA designs I’ve seen during my career at Xilinx, there was some kind of processor. In pure FPGAs (Virtex/Kintex/Artix/Spartan) it is a soft processor (MicroBlaze or PicoBlaze) and in a [Zynq SoC or Zynq UltraScale+ MPSoC] it is a hard processor (dual-core Arm Cortex-A9 [for Zynq SoCs] and quad-A53 plus dual-R5 [for Zynq UltraScale+ MPSoCs]). The choice is now more complex: processor only, processor with an FPGA alongside, FPGA only, or integrated processor/FPGA. The tendency is toward the latter due to all the savings incurred: PCB, power, devices, ...
Power: Pure FPGAs are making incredible progress, but if you want really low power in stand-by mode you should look at the Zynq UltraScale+ MPSoC, which contains many processors and, in particular, a Power Management Unit that can switch different regions of the processors/programmable logic on and off.
Analog: Since Virtex-5 (2006), Xilinx has included ADCs in its FPGAs, initially limited to internal parameter measurements (voltage, temperature, ...). [These ADC blocks are] called the System Monitor. With 7 series (2011) [devices], Xilinx included a dual 1Msample/sec@12-bit ADC with internal/external measurement capabilities. Lately Xilinx [has] announced very high performance ADCs/DACs integrated into the Zynq UltraScale+ RFSoC: 4Gsamples/sec@12-bit ADCs / 6.5Gsamples/sec@14-bit DACs. Potential applications are telecom (5G), cable (DOCSIS), and radar (phased array).
Security: The bitstream that is stored in the external Flash can be encoded [encrypted]. Decoding [decrypting] is performed within the FPGA during bitstream download. Zynq-7000 SoCs and Zynq UltraScale+ MPSoCs support encoded [encrypted] bitstreams and secure boot for the processor[s].
Ease of Use: This is the big part of the equation. Customers need to take this into account to achieve the right time to market. With the 7 series devices in 2012, Xilinx introduced a new integrated tool called Vivado. Since then a number of features and new tools have been [added to Vivado]:
There are also tools related to the MathWorks environment [MATLAB and Simulink]:
All this to say that FPGA vendors have [expended] tremendous effort to make FPGAs and derivative devices easier to program. There is still a learning curve, [but it] is much shorter than it used to be…
If you want your design to run at maximum speed at the lowest possible power consumption (and who does not?), then you want to run your algorithms using fixed-point hardware. With that in mind, MathWorks has just published an extensive guide to “Best Practices for Converting MATLAB Code to Fixed Point” for MATLAB-based designs with a nearly hour-long companion video.
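The accuracy cost of moving to fixed point is easy to explore before committing to hardware. The short sketch below (plain Python, not MathWorks tooling, and a hypothetical Q1.15 format chosen only for illustration) quantizes a sine wave and measures the worst-case round-trip error:

```python
import math

def to_fixed(x, frac_bits=15):
    """Quantize a real value to a signed fixed-point integer (round to nearest)."""
    return round(x * (1 << frac_bits))

def from_fixed(n, frac_bits=15):
    """Convert the fixed-point integer back to a float for comparison."""
    return n / (1 << frac_bits)

# Quantize one period of a sine wave to Q1.15 and measure the error.
samples = [math.sin(2 * math.pi * i / 256) for i in range(256)]
quantized = [from_fixed(to_fixed(s)) for s in samples]
max_err = max(abs(a - b) for a, b in zip(samples, quantized))

# Round-to-nearest error is bounded by half an LSB: 2**-16, about 1.5e-5.
print(f"max quantization error: {max_err:.2e}")
```

Running this kind of sweep across your own algorithm's signals is essentially what the MathWorks guide systematizes: pick word lengths, measure the error against the floating-point reference, and iterate.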
MathWorks has been advocating model-based design using its MATLAB and Simulink development tools for some time because the design technique allows you to develop more complex software with better quality in less time. (See the MathWorks White Paper: “How Small Engineering Teams Adopt Model-Based Design.”) Model-based design employs a mathematical and visual approach to developing complex control and signal-processing systems through the use of system-level modeling throughout the development process—from initial design, through design analysis, simulation, automatic code generation, and verification. These models are executable specifications that consist of block diagrams, textual programs, and other graphical elements. Model-based design encourages rapid exploration of a broader design space than other design approaches because you can iterate your design more quickly, earlier in the design cycle. Further, because these models are executable, verification becomes an integral part of the development process at every step. Hopefully, this design approach results in fewer (or no) surprises at the end of the design cycle.
Xilinx supports model-based design using MATLAB and Simulink through the new Xilinx Model Composer, a design tool that integrates into the MATLAB and Simulink environments. The Xilinx Model Composer includes libraries with more than 80 high-level, performance-optimized, Xilinx-specific blocks including application-specific blocks for computer vision, image processing, and linear algebra. You can also import your own custom IP blocks written in C and C++, which are subsequently processed by Vivado HLS.
Here’s a block diagram that shows you the relationship among MathWorks’ MATLAB, Simulink, and Xilinx Model Composer:
Finally, here’s a 6-minute video explaining the benefits and use of Xilinx Model Composer:
Good machine learning depends heavily on large training-data sets, which are not always available. One solution to this problem is transfer learning, which allows a new neural network to use an already-trained neural network as its starting point. Kaan Kara at ETH Zurich has published an example of transfer learning as a Jupyter Notebook for the Zynq- and Python-based PYNQ development environment on GitHub. The demo uses the ZipML-PYNQ overlay to analyze astronomical images of galaxies, sorting them into two classes: images that show merging galaxies and images that don't.
The work is discussed further in a paper presented at the IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2017. The paper is titled “FPGA-Accelerated Dense Linear Machine Learning: A Precision-Convergence Trade-Off.”
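The notebook itself targets PYNQ hardware, but the core idea of transfer learning can be sketched in a few lines of plain Python: freeze a pretrained feature extractor and train only a small classifier head on top. In this toy sketch (my own illustration, not the ZipML-PYNQ code), the "pretrained" layer is just a frozen random linear map standing in for a real trained network body:

```python
import random

random.seed(42)

# Frozen "pretrained" feature extractor: a fixed random linear map from
# 2 raw inputs to 4 features. In real transfer learning this would be the
# body of an already-trained network whose weights are left untouched.
W_frozen = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(4)]

def extract_features(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W_frozen]

# Toy dataset: two linearly separable clusters labeled +1 and -1.
data = [([random.uniform(0.5, 1.5), random.uniform(0.5, 1.5)], 1) for _ in range(20)]
data += [([random.uniform(-1.5, -0.5), random.uniform(-1.5, -0.5)], -1) for _ in range(20)]

# Train ONLY the small classifier head (a perceptron) on the frozen features.
w, b = [0.0] * 4, 0.0
for _ in range(100):
    for x, label in data:
        f = extract_features(x)
        pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else -1
        if pred != label:  # perceptron update on mistakes only
            w = [wi + label * fi for wi, fi in zip(w, f)]
            b += label

correct = sum(
    1 for x, label in data
    if (1 if sum(wi * fi for wi, fi in zip(w, extract_features(x))) + b > 0 else -1) == label
)
print(f"training accuracy: {correct / len(data):.2f}")
```

Because only the small head is trained, far less labeled data is needed than training the whole network from scratch, which is exactly the appeal for the galaxy-classification demo.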
Designing SDRs (software-defined radios)? MathWorks and Analog Devices have joined together to bring you a free Webinar titled “Radio Deployment on SoC Platforms.” It’s a 45-minute class that discusses hardware and software development for SDR designs using MathWorks’ MATLAB, Simulink, and HDL Coder to:
Analog Devices’ Zynq-based RF SOM on a Carrier Card
There will be three broadcasts of the Webinar on December 13 to accommodate viewers around the world. Register here. Register even if you cannot attend and you’ll receive a link to a recording of the session.
RS-Online has a running series of design articles, and a new one written by Adam Taylor titled “Software Defined SoC on Arty Z7-20, Xilinx ZYNQ evaluation board” tells you how to get started with the Xilinx SDSoC Development Environment using the Digilent Arty Z7-20, which is based on a Zynq Z-7020 SoC. What’s great about SDSoC is that it lets you program code on the Zynq SoC’s dual-core ARM Cortex-A9 MPCore processors and accelerate specific tasks using the Zynq SoC’s programmable logic, all while using C, C++, or OpenCL.
The other great thing is that the Digilent Arty Z7-20 with a voucher for SDSoC costs a mere $219 from Digilent.
You can’t find a more cost-effective way of learning how to use FPGA-based hardware acceleration to break performance bottlenecks in your embedded designs.
For more information about the Digilent Arty Z7, see “Arty Z7: Digilent’s new Zynq SoC trainer and dev board—available in two flavors for $149 or $209.”
If you’ve got some high-speed RF analog work to do, VadaTech’s new AMC598 and VPX598 Quad ADC/Quad DAC modules appear to be real workhorses. The four 14-bit ADCs (implemented with two AD9208 dual ADCs) operate at 3Gsamples/sec, and the four 16-bit DACs (four AD9162 or AD9164 DACs) operate at 12Gsamples/sec. You’re not going to drive those sorts of data rates over the host bus, so the modules carry local memory in the form of three DDR4 SDRAM banks totaling 20Gbytes of on-board SDRAM. A Xilinx Kintex UltraScale KCU115 FPGA (aka the DSP Monster, the largest Kintex UltraScale family member with 5520 DSP slices that bring an immense amount of digital signal processing power to bear on those RF analog signals) manages all of the on-board resources (memory, analog converters, and host bus) and handles the blazingly fast data transfers. The modules let you create RF waveform generators and advanced RF-capture systems for applications including communications and signal intelligence (COMINT/SIGINT), radar, and electronic warfare, using Xilinx tools including the Vivado Design Suite HLx Editions and the Xilinx Vivado System Generator for DSP, which can be used in conjunction with MathWorks’ MATLAB and the Simulink model-based design tool.
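A quick back-of-the-envelope calculation (my own arithmetic, assuming 16-bit word packing per sample, not a VadaTech specification) shows why the converter traffic has to terminate in local DDR4 rather than cross the host bus:

```python
# Raw converter data rates for the AMC598, assuming each 14-bit ADC and
# 16-bit DAC sample is packed into a 16-bit (2-byte) word.
BYTES_PER_SAMPLE = 2

adc_rate = 4 * 3e9 * BYTES_PER_SAMPLE   # four ADCs at 3 Gsamples/sec
dac_rate = 4 * 12e9 * BYTES_PER_SAMPLE  # four DACs at 12 Gsamples/sec

print(f"aggregate ADC rate: {adc_rate / 1e9:.0f} GB/s")   # 24 GB/s
print(f"aggregate DAC rate: {dac_rate / 1e9:.0f} GB/s")   # 96 GB/s

# Even a fast host link (a PCIe Gen3 x8 connection moves roughly 8 GB/s)
# falls far short, hence the 20 GB of on-board DDR4 for capture/playback.
```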
Here’s a block diagram of the AMC598 module:
VadaTech AMC598 Quad ADC/Quad DAC Block Diagram
And here’s a photo of the AMC598 Quad ADC/Quad DAC module:
VadaTech AMC598 Quad ADC/Quad DAC
Note: Please contact VadaTech directly for more information about the AMC598 and VPX598 Quad ADC/Quad DAC modules.
Digilent has announced a major upgrade to the Zynq-based Zybo dev board, now called the Zybo Z7. The original board was based on a Xilinx Zynq Z-7010 SoC with the integrated Arm Cortex-A9 MPCore processors running at 650MHz. The new Zybo Z7-10 and -20 dev boards are based on the Zynq Z-7010 and Z-7020 SoC respectively, and the processors now run at 667MHz. The Zybo Z7-10 sells for $199 (currently, you can get a voucher for the Xilinx SDSoC development environment for $10 more) and the Zybo Z7-20 board with triple the programmable logic resources sells for $299 (and currently includes the SDSoC voucher).
Digilent Zybo Z7-20 Dev Board based on Zynq Z-7020 SoC
In addition to the faster processors, the Zybo Z7 incorporates several other upgrades over the original Zybo board. SDRAM capacity has increased from 512Mbytes on the original Zybo board to 1Gbyte on the Zybo Z7. The new boards now have two HDMI ports to support “bump-in-the-wire” HDMI applications. Both boards now also include a connector with a MIPI CSI-2 interface for video camera connections. You can plug a Raspberry Pi Camera Module directly into this connector, and Digilent also plans to offer a camera module for this port.
Here’s a video explaining some of the highlights of the new Zybo Z7.
Note: For more information about the Zybo Z7 dev board, please contact Digilent directly.
Programmable logic is proving to be an excellent, flexible implementation medium for neural networks that gets faster and faster as you go from floating-point to fixed-point representation—making it ideal for embedded AI and machine-learning applications—and the latest proof point is a recently published paper written by Yufeng Hao and Steven Quigley in the Department of Electronic, Electrical and Systems Engineering at the University of Birmingham, UK. The paper is titled “The implementation of a Deep Recurrent Neural Network Language Model on a Xilinx FPGA” and it describes a successful implementation and training of a fixed-point Deep Recurrent Neural Network (DRNN) using the Python programming language; the Theano math library and framework for multi-dimensional arrays; the open-source, Python-based PYNQ development environment; the Digilent PYNQ-Z1 dev board; and the Xilinx Zynq Z-7020 SoC on the PYNQ-Z1 board. Using a Python DRNN hardware-acceleration overlay, the two-person team achieved 20GOPS of processing throughput for an NLP (natural language processing) application with this design and outperformed earlier FPGA-based implementations by factors ranging from 2.75x to 70.5x.
Most of the paper discusses NLP and the LM (language model), “which is involved in machine translation, voice search, speech tagging, and speech recognition.” The paper then discusses the implementation of a DRNN LM hardware accelerator using Vivado HLS and Verilog to synthesize a custom overlay for the PYNQ development environment. The resulting accelerator contains five Process Elements (PEs) capable of delivering 20 GOPS in this application. Here’s a block diagram of the design:
DRNN Accelerator Block Diagram
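A DRNN stacks recurrent layers whose core work is repeated matrix-vector multiplication. As a generic illustration (not the paper's actual model or code, and floating-point rather than fixed-point), a single recurrent language-model step looks like this in plain Python:

```python
import math
import random

random.seed(0)

VOCAB, HIDDEN = 8, 4  # toy sizes; a real language model is far larger

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

Wxh = rand_matrix(HIDDEN, VOCAB)   # input-to-hidden weights
Whh = rand_matrix(HIDDEN, HIDDEN)  # hidden-to-hidden (recurrent) weights
Why = rand_matrix(VOCAB, HIDDEN)   # hidden-to-output weights

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(token_id, h):
    """One recurrent step: consume a token, return (next-token probs, new state)."""
    x = [1.0 if i == token_id else 0.0 for i in range(VOCAB)]  # one-hot input
    h_new = [math.tanh(a + b) for a, b in zip(matvec(Wxh, x), matvec(Whh, h))]
    logits = matvec(Why, h_new)
    exps = [math.exp(l) for l in logits]
    probs = [e / sum(exps) for e in exps]  # softmax over the vocabulary
    return probs, h_new

h = [0.0] * HIDDEN
for token in [1, 5, 2]:  # feed a short token sequence
    probs, h = rnn_step(token, h)
print(f"next-token distribution sums to {sum(probs):.3f}")
```

The matvec calls dominate the arithmetic, which is why the paper's five Process Elements, performing that work in fixed point in programmable logic, deliver such a large speedup.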
There are plenty of deep technical details embedded in this paper but this one sentence sums up the reason for this blog post about the paper: “More importantly, we showed that a software and hardware joint design and simulation process can be useful in the neural network field.” This statement is doubly true considering that the PYNQ-Z1 dev board sells for $229.
Twelve student and industry teams competed for 30 straight hours in the Xilinx Hackathon 2017 competition in early October and the 3-minute wrap video just appeared on YouTube. The video shows a lot of people having a lot of fun with the Zynq-based Digilent PYNQ-Z1 dev board and Python-based PYNQ development environment:
In the end, the prizes:
For detailed descriptions of the Hackathon entries, see “12 PYNQ Hackathon teams competed for 30 hours, inventing remote-controlled robots, image recognizers, and an air keyboard.”
And a special “Thanks!” to SparkFun for supplying much of the Hackathon hardware. SparkFun is headquartered just down the road from the Xilinx facility in Longmont, Colorado.
Xilinx has a terrific tool designed to get you from product definition to working hardware quickly. It’s called SDSoC. Digilent has a terrific dev board to get you up and running with the Zynq SoC quickly. It’s the low-cost Arty Z7. A new blog post by Digilent’s Alex Wong titled “Software Defined SoC on Arty Z7-20, Xilinx ZYNQ evaluation board” posted on RS Online’s DesignSpark site gives you a detailed, step-by-step tutorial on using SDSoC with the Digilent Arty Z7. In particular, the focus is on the ease of moving functions from software running on the Zynq SoC’s Arm Cortex-A9 processors into the Zynq SoC’s programmable hardware using Vivado HLS, which is embedded in SDSoC, so that you get the performance benefit of hardware-based task execution.
Digilent’s Arty Z7 dev board
Envious of all the cool FPGA-accelerated applications showing up on the Amazon AWS EC2 F1 instance like the Edico Genome DRAGEN Genome Pipeline that set a Guinness World Record last week, the DeePhi ASR (Automatic Speech Recognition) Neural Network announced yesterday, Ryft’s cloud-based search and analysis tools, or NGCodec’s RealityCodec video encoder?
Well, you can shake off that green-eyed monster by signing up for the free, live, half-day Amazon AWS EC2 F1 instance and SDAccel dev lab being held at SC17 in Denver on the morning of November 15 at The Studio Loft in the Denver Performing Arts Complex (1400 Curtis Street), just across the street from the Denver Convention Center where SC17 is being held. Xilinx is hosting the lab, and technology experts from Xilinx, Amazon Web Services, Ryft, and NGCodec will be available onsite.
Here’s the half-day agenda:
8:00 AM Doors open, Registration, and Continental Breakfast
9:00 AM Welcome, Technology Discussion, F1 Developer Use Cases and Demos
9:35 AM Break
9:45 AM Hands-on Training Begins
12:00 PM Developer Lab Concludes
A special guest speaker from Amazon Web Services is also on the agenda.
Lab instruction time includes:
Seats are necessarily limited for a lab like this, so you might want to get your request in immediately. Where? Here.
Earlier this month, I described Aaware’s $199 Far-Field Development Platform for cloud-based, voice-controlled systems such as Amazon’s Alexa and Google Home. (See “13 MEMS microphones plus a Zynq SoC gives services like Amazon’s Alexa and Google Home far-field voice recognition clarity.”) This far-field, sound-capture technology exhibits some sophisticated abilities including:
Aaware’s Far-Field Development Platform
These features are layered on top of a Xilinx Zynq SoC or Zynq UltraScale+ MPSoC and Aaware’s CTO Chris Eddington feels that the Zynq devices provide “well over” 10x the performance of an embedded processor thanks to the devices’ on-chip programmable logic, which offloads a significant amount of processing from the on-chip ARM Cortex processor(s). (Aaware can squeeze its technology into a single-core Zynq Z-7007S SoC and can scale up to larger Zynq SoC and Zynq UltraScale+ MPSoC devices as needed by the customer application.)
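Aaware doesn't publish its algorithms, but the textbook starting point for multi-microphone far-field capture is delay-and-sum beamforming: re-align each microphone's signal by its known arrival delay so that sound from the steered direction adds coherently. A minimal two-microphone sketch (my own illustration, with an assumed tone frequency and delay):

```python
import math

SAMPLE_RATE = 16000
TONE_HZ = 2000
DELAY_SAMPLES = 3  # arrival-time difference between the two microphones

def mic_signal(delay):
    """A clean tone as each microphone would hear it, offset by its delay."""
    return [math.sin(2 * math.pi * TONE_HZ * (n - delay) / SAMPLE_RATE)
            for n in range(256)]

mic_a = mic_signal(0)
mic_b = mic_signal(DELAY_SAMPLES)

# Delay-and-sum: advance mic_b by the known delay, then average. The steered
# direction adds coherently; summing without alignment partially cancels.
aligned = [(a + b) / 2 for a, b in zip(mic_a[:-DELAY_SAMPLES], mic_b[DELAY_SAMPLES:])]
naive = [(a + b) / 2 for a, b in zip(mic_a, mic_b)]

def rms(s):
    return math.sqrt(sum(x * x for x in s) / len(s))

print(f"aligned RMS {rms(aligned):.3f} vs unaligned RMS {rms(naive):.3f}")
```

With 13 microphones and real-time delay estimation across many candidate directions, the arithmetic multiplies quickly, which is the kind of parallel workload the Zynq programmable logic absorbs.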
Aaware’s algorithm development is based on a unique tool chain:
This tool chain allows Aaware to fit the features it wants into the smallest Zynq Z-7007S SoC or to scale up to the largest Zynq UltraScale+ MPSoC.
Amazon AWS’ re:Invent 2017 takes place in Las Vegas on November 27 through December 1. (Tickets are nearly sold out as of today.) CMP402, a class session during the event, is titled “Accelerate Your C/C++ Applications with Amazon EC2 F1 Instances.” Here’s the verbatim class description:
“The newly introduced Amazon EC2 F1 OpenCL development workflow helps software developers with little to no FPGA experience supercharge their applications with Amazon EC2 F1. Join us for an overview and demonstration of how to accelerate your C/C++ applications in the cloud using OpenCL with Amazon EC2 F1 instances. We walk you through the development flow for creating a custom hardware acceleration for a software algorithm. Attendees get hands-on and creative by optimizing an algorithm for maximum acceleration on Amazon EC2 F1 instances.”
The Amazon AWS EC2 F1 instance gets its acceleration from Xilinx Virtex UltraScale+ VU9P FPGAs and the C/C++/OpenCL programming facility is based on SDAccel—Xilinx’s development environment for accelerating cloud-based applications using C, C++, or OpenCL—which became available for the AWS EC2 F1 instance just last month. (See “SDAccel for cloud-based application acceleration now available on Amazon’s AWS EC2 F1 instance.”)
For more information about the Amazon AWS EC2 F1 instance, see:
Exactly a week ago, Xilinx introduced the Zynq UltraScale+ RFSoC family, which is a new series of Zynq UltraScale+ MPSoCs with RF ADCs and DACs and SD-FECs added. (See “Zynq UltraScale+ RFSoC: All the processing power of 64- and 32-bit ARM cores, programmable logic plus RF ADCs, DACs.”) This past Friday at the Xilinx Showcase held in Longmont, Colorado, Senior Marketing Engineer Lee Hansen demonstrated a Zynq UltraScale+ ZU28DR RFSoC with eight 12-bit, 4Gsamples/sec RF ADCs, eight 14-bit, 6.4Gsamples/sec RF DACs, and eight SD-FECs connected through an appropriate interface to National Instruments’ LabVIEW Systems Engineering Development Environment.
The demo system was generating signals using the RF DACs, receiving the signals using the RF ADCs, and then displaying the resulting signal spectrum using LabVIEW.
Here’s a 3-minute video of the demo:
Over the past weekend, Xilinx held a Showcase and PYNQ Hackathon in its Summit Retreat Center in Longmont, Colorado. About 100 people from tech companies all over Colorado attended the Showcase and twelve teams—about 40 people including students from local universities and engineers from industry—competed in the Hackathon.
Here are a few images from the Xilinx Showcase and PYNQ Hackathon:
Xilinx Summit Retreat Center in Longmont, Colorado
Xilinx CTO Ivo Bolsens welcomes everyone to the Xilinx Showcase
Xilinx Showcase Attendees
More Xilinx Showcase Attendees
Xilinx VP of Interactive Design Tools and Xilinx Longmont Site Manager Dan Gibbons Welcomes the Hackers to the PYNQ Hackathon
Abo’s Pizza (Boulder’s Finest) for Friday Night Dinner
Handing out Digilent PYNQ-Z1 boards
The PYNQ Hackathon Work Begins
Giving a little help to the participants
Focus, Focus, Focus
A Mini Lecture about the PYNQ Logictools Overlay for Hackathon Attendees
Getting a little shuteye
A view of Longs Peak from the Xilinx Longmont Summit Retreat Center
The event was organized by an internal Xilinx team including:
In addition, a team of helpers was on hand over the 30-hour duration of the event to answer questions:
For more information about the PYNQ Hackathon, see “12 PYNQ Hackathon teams competed for 30 hours, inventing remote-controlled robots, image recognizers, and an air keyboard.”
For more information about the Python-based, open-source PYNQ development environment and the Zynq-based Digilent PYNQ-Z1 dev board, see “Python + Zynq = PYNQ, which runs on Digilent’s new $229 pink PYNQ-Z1 Python Productivity Package.”
Twelve student and industry teams competed for 30 straight hours in the Xilinx Hackathon 2017 competition over the weekend at the Summit Retreat Center in the Xilinx corporate facility located in Longmont, Colorado. Each team member received a Digilent PYNQ-Z1 dev board, which is based on a Xilinx Zynq Z-7020 SoC, and then used their fertile imaginations to conceive of and develop working code for an application using the open-source, Python-based PYNQ development environment, which is based on self-documenting Jupyter Notebooks. The online electronics and maker retailer SparkFun, located just down the street from the Xilinx facility in Longmont, supplied boxes of compatible peripheral boards with sensors and motor controllers to spur the team members’ imaginations. Several of the teams came from local universities including the University of Colorado at Boulder and the Colorado School of Mines in Golden, Colorado. At the end of the competition, eleven of the teams presented their results using their Jupyter Notebooks. Then came the prizes.
For the most part, team members had never used the PYNQ-Z1 boards and were not familiar with using programmable logic. In part, that was the intent of the Hackathon—to connect teams of inexperienced developers with appropriate programming tools and see what develops. That’s also the reason that Xilinx developed PYNQ: so that software developers and students could take advantage of the improved embedded performance made possible by the Zynq SoC’s programmable hardware without having to use ASIC-style (HDL) design tools to design hardware (unless they want to do so, of course).
Here are the projects developed by the teams, in the order presented during the final hour of the Hackathon (links go straight to the teams’ GitHub repositories with their Jupyter Notebooks that document the projects with explanations and “working” code):
Team John Cena’s Voice-Controlled Mobile Robot
Team “Joy of Pink” developed an emoji generator that interprets facial expressions using Microsoft’s cloud-based Azure Emotion API
Team Caffeine’s Audio Fiend Tone-Based Robotic Controller
After the presentations, the judges deliberated for a few minutes using multiple predefined criteria and then awarded the following prizes:
Congratulations to the winners and to all of the teams who spent 30 hours with each other in a large room in Colorado to experience the joy of hacking code to tackle some tough problems. (A follow-up blog will include a photographic record of the event so that you can see what it was like.)
For more information about the PYNQ development environment and the Digilent PYNQ-Z1 board, see “Python + Zynq = PYNQ, which runs on Digilent’s new $229 pink PYNQ-Z1 Python Productivity Package.”
Late last month, I wrote about an announcement by DNAnexus and Edico Genome that described a huge reduction in the cost and time to analyze genomic information, enabled by Amazon’s FPGA-accelerated AWS EC2 F1 instance. (See “Edico Genome and DNAnexus announce $20, 90-minute genome analysis on Amazon’s FPGA-accelerated AWS EC2 F1 instance.”) The AWS Partner Network blog has just published more details in an article written by Amazon’s Aaron Friedman, titled “How DNAnexus and Edico Genome are Powering Precision Medicine on Amazon Web Services (AWS).”
The details are exciting to say the least. The article begins with this statement:
“Diagnosing the medical mysteries behind acutely ill babies can be a race against time, filled with a barrage of tests and misdiagnoses. During the first few days of life, a few hours can save or seal the fate of patients admitted to the neonatal intensive care units (NICUs) and pediatric intensive care units (PICUs). Accelerating the analysis of the medical assays conducted in these hospitals can improve patient outcomes, and, in some cases, save lives.”
Then, if you read far enough into the post, you find this statement:
“Rady Children’s Institute for Genomic Medicine is one of the global leaders in advancing precision medicine. To date, the institute has sequenced the genomes of more than 3,000 children and their family members to diagnose genetic diseases. 40% of these patients are diagnosed with a genetic disease, and 80% of these receive a change in medical management. This is a remarkable rate of change in care, considering that these are rare diseases and often involve genomic variants that have not been previously observed in other individuals.”
This example is merely a road sign, pointing the way to even more exciting developments in FPGA-accelerated, cloud-based computing to come. Well-known Silicon Valley venture capitalist Jim Hogan directly addressed these developments in a speech at San Jose State University just a couple of weeks ago. (See “Four free training videos (two hour's worth) on using Xilinx SDAccel to create apps for Amazon AWS EC2 F1 instances.”)
The Amazon AWS EC2 F1 instance is a cloud service that’s based on multiple Xilinx Virtex UltraScale+ VU9P FPGAs installed in Amazon’s Web servers. For more information on the AWS EC2 F1 Instance in Xcell Daily, see:
By Adam Taylor
The Xilinx Zynq UltraScale+ MPSoC is a good fit for many applications, including embedded vision. Its APU with two or four 64-bit Arm Cortex-A53 processors, its Mali GPU, its DisplayPort interface, and its on-chip programmable logic (PL) give the Zynq UltraScale+ MPSoC plenty of processing power to address exciting applications such as ADAS and vision-guided robotics with relative ease. Further, we can use the device’s PL and its programmable I/O to interface with a range of vision and video standards including MIPI, LVDS, parallel, VoSPI, etc. When it comes to interfacing image sensors, the Zynq UltraScale+ MPSoC can handle just about anything you throw at it.
Once we’ve brought the image into the Zynq UltraScale+ MPSoC’s PL, we can implement an image-processing pipeline using existing IP cores from the Xilinx library or we can develop our own custom IP cores using Vivado HLS (high-level synthesis). However, for many applications we’ll need to move the images into the device’s PS (processing system) domain before we can apply exciting application-level algorithms such as decision making or use the Xilinx reVISION acceleration stack.
I thought I would kick off the fourth year of this blog with a look at how we can use VDMA instantiated in the Zynq UltraScale+ MPSoC’s PL to transfer images from the PL to the PS-attached DDR memory without processor intervention. You often need to make such high-speed background transfers in a variety of applications.
To do this we will use the following IP blocks:
Once configured over its AXI Lite interface, the Test Pattern Generator outputs test patterns which are then transferred into the PS-attached DDR memory. We can demonstrate that this has been successful by examining the memory locations using SDK.
Enabling the FPD Master and Slave Interfaces
For this simple example, we’ll clock both the AXI networks at the same frequency, driven by PL_CLK_0 at 100MHz.
For a deployed system, an image sensor would replace the TPG as the image source and we would need to ensure that the VDMA input-channel clocks (Slave-to-Memory-Map and Memory-Map-to-Slave) were fast enough to support the required pixel and frame rate. For example, a sensor with a resolution of 1280 pixels by 1024 lines running at 60 frames per second would require a clock rate of at least 108MHz. We would need to adjust the clock frequency accordingly.
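The 108MHz figure follows from the sensor's total frame timing, not just its active pixels. As a rough check (assuming standard VESA SXGA blanking for a 1280×1024 frame, which gives totals of about 1688×1066; a real sensor's blanking comes from its datasheet):

```python
# Required pixel clock for 1280x1024 at 60 frames/sec. Counting active
# pixels alone underestimates the clock, because every line and frame
# also carries blanking intervals.
ACTIVE_H, ACTIVE_V, FPS = 1280, 1024, 60
TOTAL_H, TOTAL_V = 1688, 1066  # assumed VESA SXGA totals including blanking

active_only = ACTIVE_H * ACTIVE_V * FPS
with_blanking = TOTAL_H * TOTAL_V * FPS

print(f"active pixels only : {active_only / 1e6:.1f} MHz")   # ~78.6 MHz
print(f"with blanking      : {with_blanking / 1e6:.1f} MHz") # ~108.0 MHz
```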
Block Diagram of the completed design
To aid visibility within this example, I have included three ILA modules, which are connected to the outputs of the Test Pattern Generator, AXI VDMA, and the Slave Memory Interconnect. Adding these modules enables the use of Vivado’s hardware manager to verify that the software has correctly configured the TPG and the VDMA to transfer the images.
With the Vivado design complete and built, creating the application software to configure the TPG and VDMA to generate images and move them from the PL to the PS is very straightforward. We use the AXIVDMA, V_TPG, Video Common APIs available under the BSP lib source directory to aid in creating the application. The software itself performs the following:
The application then starts generating test frames, which the VDMA transfers from the TPG into the PS DDR memory. I disabled the caches for this example to ensure that the DDR memory is updated.
Examining the ILAs, you will see the TPG generating frames and the VDMA transferring the stream into memory mapped format:
TPG output, TUSER indicates start of frame while TLAST indicates end of line
VDMA Memory Mapped Output to the PS
Examining the frame store memory location within the PS DDR memory using SDK demonstrates that the pixel values are present.
Test Pattern Pixel Values within the PS DDR Memory
You can use the same approach in Vivado when creating software for a Zynq-7000 SoC instead of a Zynq UltraScale+ MPSoC by enabling the AXI GP master for the AXI Lite bus and the AXI HP slave for the VDMA channel.
Should you be experiencing trouble with your VDMA-based image-processing chain, you might want to read this blog.
The project, as always, is on GitHub.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
First Year E-Book here
First Year Hardback here
Second Year E-Book here
Second Year Hardback here
MathWorks has just published a 4-part mini course that teaches you how to develop vision-processing applications using MATLAB, HDL Coder, and Simulink, then walks you through a practical example targeting a Xilinx Zynq SoC using a lane-detection algorithm in Part 4.
Click here for each of the classes:
PDF Solutions provides yield-improvement technologies and services to the IC-manufacturing industry to lower manufacturing costs, improve profitability, and shorten time to market. One of the company’s newest solutions is the eProbe series of e-beam tools used for inline electrical characterization and process control. These tools combine an SEM (scanning electron microscope) and an optical microscope and have the unique ability to provide real-time image analysis of nanometer-scale features. The eProbe development team selected National Instruments’ (NI’s) LabVIEW to control the eProbe system and brought in JKI—a LabVIEW consulting company, Xilinx Alliance Program member, and NI Silver Alliance Partner—to help develop the system.
PDF Solutions eProbe e-beam tool combines an SEM with an optical microscope
In less than four months, JKI helped PDF Solutions attain a 250MHz pixel-acquisition rate from the prototype eProbe using a combination of NI’s FlexRIO module, based on a Xilinx Kintex-7 FPGA, and NI’s LabVIEW FPGA module. According to the PDF Solutions case study published on the JKI Web site, using NI’s LabVIEW allowed the PDF/JKI team to implement the required, real-time FPGA logic and easily integrate third-party FPGA IP in a fraction of the time required by alternative design platforms while still achieving the project’s image-throughput goals.
LabVIEW controls most of the functions within the eProbe that perform the wafer inspection including:
JKI contributed both to the eProbe’s software architecture design and the development of various high-level software components that coordinate and control the low-level hardware functions including data acquisition and image manipulation.
Although the eProbe’s control system runs within NI’s LabVIEW environment, the system’s user interface is based on a C# application from The PEER Group called the Peer Tool Orchestrator (PTO). JKI developed the interface between the eProbe’s front-end user interface and its LabVIEW-based control system using its internally developed tools. (Note: JKI offers several LabVIEW development tools and templates directly on this Web page.)
eProbe user interface screen
Once PDF Solutions started fielding eProbe systems, JKI sent people to work with PDF Solutions’ customers on site in a collaboration that helped generate ideas for future algorithm and tool improvements.
For more information about real-time LabVIEW development using the NI LabVIEW FPGA module and Xilinx-based NI hardware, contact JKI directly.
Xylon has a new hardware/software development kit for quickly implementing embedded, multi-camera vision systems for ADAS and AD (autonomous driving), machine-vision, AR/VR, guided robotics, drones, and other applications. The new logiVID-ZU Vision Development Kit is based on the Xilinx Zynq UltraScale+ MPSoC and includes four Xylon 1Mpixel video cameras based on the TI FPD (flat-panel display) Link-III interface. The kit supports HDMI video input and display output and comes complete with extensive software deliverables including pre-verified camera-to-display SoC designs built with licensed Xylon logicBRICKS IP cores, reference designs and design examples prepared for the Xilinx SDSoC Development Environment, and complete demo Linux applications.
Xylon’s new logiVID-ZU Vision Development Kit
Please contact Xylon for more information about the new logiVID-ZU Vision Development Kit.
Earlier this week at San Jose State University (SJSU), Jim Hogan, one of Silicon Valley’s most successful venture capitalists, gave a talk on the disruptive effects that cognitive science and AI are already having on society. In a short portion of that talk, Hogan discussed how he and one of his teams developed the world’s most experienced lung-cancer radiologist—an AI app—for $75:
Hogan’s trained AI radiologist can look at lung images and find possibly cancerous tumors based on thousands of cases in the CDC database. However, said Hogan, the US Veterans Administration has a database with millions of cases. Yes, his team used that database for training too.
Hogan predicted that something like 25 million AI apps like his lung-cancer-specific radiologist will be developed over the next few years. His $75 example is meant to prove the cost feasibility of developing that many useful apps.
Hogan made me a believer.
On a related note about AWS app development, Xilinx has just posted four training videos showing you how to develop FPGA-accelerated apps using Xilinx’s SDAccel on Amazon’s AWS EC2 F1 instance. That’s nearly two hours of free training available at your desk. (For more information on the AWS EC2 F1 instance, see “SDAccel for cloud-based application acceleration now available on Amazon’s AWS EC2 F1 instance” and “AWS makes Amazon EC2 F1 instance hardware acceleration based on Xilinx Virtex UltraScale+ FPGAs generally available”.)
Here are the videos:
Note: You can watch Jim Hogan’s 90-minute presentation at SJSU by clicking here.
SDAccel—Xilinx’s development environment for accelerating cloud-based applications using C, C++, or OpenCL—is now available on Amazon’s AWS EC2 F1 instance. (Formal announcement here.) The Amazon EC2 F1 compute instance allows you to create custom hardware accelerators for your application using cloud-based server hardware that incorporates multiple Xilinx Virtex UltraScale+ VU9P FPGAs. SDAccel automates the acceleration of software applications by building application-specific FPGA kernels for the AWS EC2 F1. You can also use HDLs including Verilog and VHDL to define hardware accelerators in SDAccel. With this release, you can access SDAccel through the AWS FPGA developer AMI.
For more information about Amazon’s AWS EC2 F1 instance, see:
For more information about SDAccel, see:
Brandon Treece from National Instruments (NI) has just published an article titled “CPU or FPGA for image processing: Which is best?” on Vision-Systems.com. NI offers a Vision Development Module for LabVIEW, the company’s graphical systems design environment, which can run vision algorithms on both CPUs and FPGAs, so the perspective is a knowledgeable one. Abstracting the article: what an FPGA-accelerated imaging pipeline buys you is speed. If you’re performing four 6msec operations on each video frame, a CPU needs 24msec (four times 6msec) to complete the operations, while an FPGA offers parallelism that shortens each operation and permits overlap among the operations, as illustrated in this figure taken from the article:
In this example, the FPGA needs 6msec to perform the four overlapped operations plus another 2msec to transfer the video frame back and forth between the processor and the FPGA, for a total of 8msec versus the CPU’s 24msec: a 3x speedup.
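The arithmetic behind that figure fits in a few lines of Python, assuming the four operations overlap fully in the FPGA fabric so per-frame latency is bounded by the slowest stage plus the transfer time:

```python
def cpu_time_ms(op_times):
    """Sequential CPU: operations run back-to-back on each frame."""
    return sum(op_times)

def fpga_time_ms(op_times, transfer_ms):
    """Overlapped FPGA pipeline: per-frame latency is bounded by the
    slowest stage, plus the frame transfer to/from the processor."""
    return max(op_times) + transfer_ms

ops = [6, 6, 6, 6]                              # four 6 msec operations
print(cpu_time_ms(ops))                         # 24
print(fpga_time_ms(ops, transfer_ms=2))         # 8
print(cpu_time_ms(ops) / fpga_time_ms(ops, 2))  # 3.0
```

The model also makes clear why the transfer overhead matters: shrink the operations and the fixed 2msec transfer eats a growing share of the budget.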
Treece then demonstrates that the acceleration is actually much greater in the real world. He uses the example of a video processing sequence needed for particle counting that includes these three major steps:
Here’s an image series that shows you what’s happening at each step:
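To make the flow concrete, here's a toy Python version of a particle counter: binarize the frame, then count connected bright blobs. This is a generic sketch, not the NI Vision implementation that Treece benchmarks:

```python
import numpy as np
from collections import deque

def count_particles(image, threshold):
    """Binarize the frame, then count 4-connected bright blobs."""
    binary = image > threshold
    visited = np.zeros_like(binary, dtype=bool)
    h, w = binary.shape
    count = 0
    for r in range(h):
        for c in range(w):
            if binary[r, c] and not visited[r, c]:
                count += 1                      # new particle found
                visited[r, c] = True
                queue = deque([(r, c)])
                while queue:                    # flood-fill this blob
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w \
                                and binary[ny, nx] and not visited[ny, nx]:
                            visited[ny, nx] = True
                            queue.append((ny, nx))
    return count

frame = np.zeros((10, 10))
frame[1:3, 1:3] = 255    # particle 1
frame[6:8, 5:9] = 255    # particle 2
print(count_particles(frame, 128))   # 2
```

The per-pixel thresholding and neighborhood operations in this loop are exactly the kind of work that maps naturally onto FPGA fabric, which is where the speedup in the next paragraph comes from.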
Using the NI Vision Development Module for LabVIEW, he then runs the algorithm on an NI cRIO-9068 CompactRIO controller, which is based on a Xilinx Zynq Z-7020 SoC. Running the algorithm on the Zynq SoC’s ARM Cortex-A9 processor takes 166.7msec per frame. Running the same algorithm but accelerating the video processing in the Zynq SoC’s integral FPGA hardware takes 8msec. Add another 0.5msec for DMA transfer of the pre- and post-processed video frames between the Zynq SoC’s CPU and FPGA and you get roughly a 20x speedup.
A key point here is that because the cRIO-9068 controller is based on the Zynq SoC, and because NI’s Vision Development Module for LabVIEW supports FPGA-based algorithm acceleration, this is an easy choice to make. The resources are there for your use. You merely need to click the “Go-Fast” button.
For more information about NI’s Vision Development Module for LabVIEW and cRIO-9068 controller, please contact NI directly.
A new open-source tool named GUINNESS makes it easy for you to develop binarized (2-valued) neural networks (BNNs) for Zynq SoCs and Zynq UltraScale+ MPSoCs using the SDSoC Development Environment. GUINNESS is a GUI-based tool that uses the Chainer deep-learning framework to train a binarized CNN. In a paper titled “On-Chip Memory Based Binarized Convolutional Deep Neural Network Applying Batch Normalization Free Technique on an FPGA,” presented at the recent 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, authors Haruyoshi Yonekawa and Hiroki Nakahara describe a system they developed to implement a binarized CNN for the VGG-16 benchmark on the Xilinx ZCU102 Eval Kit, which is based on a Zynq UltraScale+ ZU9EG MPSoC. Nakahara presented the GUINNESS tool again this week at FPL2017 in Ghent, Belgium.
According to the IEEE paper, the Zynq-based BNN is 136.8x faster and 44.7x more power efficient than the same CNN running on an ARM Cortex-A57 processor. Compared to the same CNN running on an Nvidia Maxwell GPU, the Zynq-based BNN is 4.9x faster and 3.8x more power efficient.
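Much of a BNN's speed and power efficiency comes from replacing multiply-accumulate operations with XNOR and popcount. As a rough illustration (not the paper's implementation), here's the standard identity for a dot product of two {-1,+1} vectors packed into integers:

```python
def binary_dot(a_bits, w_bits, n):
    """Dot product of two {-1,+1} vectors of length n, packed as n-bit
    integers (bit=1 means +1, bit=0 means -1), using the identity:
    dot = n - 2 * popcount(a XOR w)."""
    return n - 2 * bin(a_bits ^ w_bits).count("1")

# Check against the straightforward +/-1 arithmetic
a = [+1, -1, +1, +1]
w = [+1, +1, -1, +1]
pack = lambda v: sum((1 << i) for i, x in enumerate(v) if x == +1)
assert binary_dot(pack(a), pack(w), len(a)) == sum(x * y for x, y in zip(a, w))
```

In FPGA fabric, the XOR and popcount for wide bit vectors cost a handful of LUTs instead of DSP multipliers, and binarized weights small enough to live in on-chip memory avoid DDR traffic entirely, which is where the paper's power-efficiency numbers come from.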
Xilinx ZCU102 Zynq UltraScale+ MPSoC Eval Kit
The Xilinx Technology Showcase 2017 will highlight FPGA acceleration as used in Amazon’s cloud-based AWS EC2 F1 Instance and in high-performance, embedded-vision designs—including vision/video, autonomous driving, Industrial IoT, medical, surveillance, and aerospace/defense applications. The event takes place on Friday, October 6 at the Xilinx Summit Retreat Center in Longmont, Colorado.
You’ll also have a chance to see the latest ways you can use the increasingly popular Python programming language to create Zynq-based designs. The Showcase is a prelude to the 30-hour Xilinx Hackathon starting immediately after the Showcase. (See “Registration is now open for the Colorado PYNQ Hackathon—strictly limited to 35 participants. Apply now!”)
The Xilinx Technology Showcase runs from 3:00 to 5:00 PM.
Click here for more details and for registration info.
Xilinx Colorado, Longmont Facility
For more information about the FPGA-accelerated Amazon AWS EC2 F1 Instance, see:
The Xilinx Hackathon is a 30-hour marathon event being held at the Xilinx “Retreat” (also known as the Xilinx Colorado facility in Longmont, but see the image below), starting on October 7. The organizers are looking for no more than 35 heroic coders who will receive a Python-programmable, Zynq-based Digilent/Xilinx PYNQ-Z1 board and an assortment of Arduino-compatible shields and sensors. The intent, as Zaphod Beeblebrox might say, is to create something not just amazing but “amazingly amazing.”
Xilinx Colorado, Longmont Facility
In case you’ve not read about it, the PYNQ project is an open-source project from Xilinx that makes it easy to design high-performance embedded systems using Xilinx Zynq Z-7000 SoCs. Here’s what’s on the PYNQ-Z1 board:
The PYNQ-Z1 Board
For more information about the PYNQ project, see: