Today, Digilent announced a $299 bundle including its Zybo Z7-20 dev board (based on a Xilinx Zynq Z-7020 SoC), a Pcam 5C 5Mpixel (1080P) color video camera, and a Xilinx SDSoC development environment voucher. (That’s the same price as a Zybo Z7-20 dev board without the camera.) The Zybo Z7 dev board includes a new 15-pin FFC connector that allows the board to interface with the Pcam 5C camera over 2-lane MIPI CSI-2 and I2C interfaces. (This connector is pin-compatible with the Raspberry Pi’s FFC camera port.) The Pcam 5C camera is based on the Omnivision OV5640 image sensor.
Digilent has created the Pcam 5C + Zybo Z7 demo project to get you started. The demo accepts video from the Pcam 5C camera and passes it out to a display via the Zybo Z7’s HDMI port. All IP used in the demo, including a D-PHY receiver, CSI-2 decoder, Bayer-to-RGB converter, and gamma correction, is free and open-source so you can study exactly how the D-PHY and CSI-2 decoding work and then develop your own embedded vision products.
If you want this deal, you’d better hurry. The offer expires February 23—three weeks from today.
Rigol’s new RSA5000 real-time spectrum analyzer allows you to capture, identify, isolate, and analyze complex RF signals with a 40MHz real-time bandwidth over either a 3.2GHz or 6.5GHz signal span. It’s designed for engineers working on RF designs in the IoT and IIoT markets as well as industrial, scientific, and medical equipment. Rigol was demonstrating the RSA5000 real-time spectrum analyzer at this week’s DesignCon being held at the Santa Clara Convention Center. I listened to a presentation from Rigol’s North American General Manager Mike Rizzo and then a demo by Rigol’s Director of Product Marketing & Software Applications Chris Armstrong, both captured in the 2.5-minute video below.
Rigol RSA5000 Real-Time Spectrum Analyzer
Based on what I saw in the demo, this is an extremely responsive instrument—far more responsive than a swept spectrum analyzer—with several visualization display modes to help you isolate the significant signal in a sea of signals and noise, in real time. It’s capable of continuously executing 146,484 FFTs/sec, which results in a minimum 100% POI (probability of intercept) of 7.45μsec. You need some real DSP horsepower to achieve that sort of performance and the Rigol RSA5000 real-time spectrum analyzer gets this performance from a pair of Xilinx Zynq Z-7015 SoCs. (You'll find many more details about real-time spectrum analysis and the RSA5000 Real-Time Spectrum Analyzer in the Rigol app note "Realtime Spectrum Analyzer vs Spectrum Analyzer," attached at the end of this post. See below.)
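As a quick sanity check on the quoted FFT rate, the short Python sketch below converts 146,484 FFTs/sec into the time between consecutive FFTs. It does not derive the 7.45μsec POI figure, which also depends on the FFT length, window, and overlap (none of which are given here); the function name is purely illustrative.

```python
# Time between consecutive FFTs at the rate quoted in the post.
FFT_RATE_PER_SEC = 146_484  # figure quoted in the post


def fft_interval_us(rate_per_sec: float) -> float:
    """Return the interval between FFT starts, in microseconds."""
    return 1e6 / rate_per_sec


interval = fft_interval_us(FFT_RATE_PER_SEC)
print(f"{interval:.2f} usec between FFTs")
```

The interval comes out shorter than the quoted minimum POI duration, which is what you would expect if a signal must persist across more than one FFT frame to be captured at full amplitude.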
Here’s the short presentation and demo of the Rigol RSA5000 real-time spectrum analyzer from DesignCon 2018:
Mike Rizzo told me that the Rigol design engineers selected the Zynq Z-7015 SoCs for three main reasons:
High-bandwidth access between the Zynq SoC’s PS (processing system) and PL (programmable logic)
Excellent development tools including Xilinx’s Vivado HLS
If you’re looking for a very capable spectrum analyzer, give the Rigol RSA5000 a look. If you’re designing your own real-time system and need high-speed computation coupled with fast user response, take a look at the line of Xilinx Zynq SoCs and Zynq UltraScale+ MPSoCs.
In a new report titled “Hitting the accelerator: the next generation of machine-learning chips,” Deloitte Global predicted that “by the end of 2018, over 25 percent of all chips used to accelerate machine learning in the data center will be FPGAs and ASICs.” The report then continues: “These new kinds of chips should increase dramatically the use of ML, enabling applications to consume less power and at the same time become more responsive, flexible and capable, which is likely to expand the addressable market.” And later in the Deloitte Global report:
“There will also be over 200,000 FPGA and 100,000 ASIC chips sold for ML applications.”
“…the new kinds of chips may dramatically increase the use of ML, enabling applications to use less power and at the same time become more responsive, flexible and capable, which is likely to expand the addressable market…”
“Total 2018 FPGA chip volume for ML would be a minimum of 200,000. The figure is almost certainly going to be higher, but by exactly how much is difficult to predict.”
These sorts of statements are precisely why Xilinx has rapidly expanded its software offerings for machine-learning development from the edge to the cloud. That includes the reVISION stack for developing responsive and reconfigurable vision systems and the Reconfigurable Acceleration stack for developing and deploying platforms at cloud scale.
Xcell Daily has covered the FPGA-accelerated AWS EC2 F1 instances from Amazon Web Services several times. The AWS EC2 F1 instances allow AWS customers to develop accelerated code in C, C++, OpenCL, Verilog, or VHDL and run it on Amazon servers augmented with hardware-accelerator cards based on multiple Xilinx Virtex UltraScale+ VU9P FPGAs. (See below.)
A new AWS case study titled “Xilinx Speeds Testing Time, Increases Developer Productivity Using AWS” turns the tables. It discusses Xilinx’s use of AWS services to speed development of Xilinx development software such as the Vivado and SDx development environments. Xilinx employs extensive regression testing when developing new releases of these complex tools and the resulting demand spikes called for more “elastic” server resources. (Amazon’s “EC2” designation stands for “Elastic Compute Cloud.”)
As the case study states:
“Xilinx addressed its infrastructure-scaling problem by migrating to a high-performance computing (HPC) cluster running on Amazon Web Services (AWS). ‘We evaluated several cloud providers and chose AWS because it had the best tools and most mature solution,’” says [Ambs] Kesavan, [software engineering and DevOps director at Xilinx].
For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:
Huawei’s FACS cloud offering is based on a PCIe server card that incorporates a Xilinx Virtex UltraScale+ VU9P FPGA. (Huawei also offers the board for on-premise installations.) In addition to the hardware, Huawei offers three major development tools for FACS:
An SDAccel-based shell that offers fast, easy development. SDAccel is Xilinx’s development environment for C, C++, and OpenCL. This shell also provides access to Xilinx’s Vivado development environment.
A DPDK shell for high-performance applications. Intel originally developed DPDK as a packet-processing framework for accelerated server systems and Huawei’s implementation can support throughputs as high as 12Gbytes/sec.
A Professional Simulation Platform that encapsulates more than two decades of Huawei’s FPGA development experience.
With these offerings, Davies said, Huawei is looking to add partners to expand its ecosystem and is particularly interested in talking to companies that offer:
There’s a Huawei Cloud Marketplace that serves as an outlet for FACS applications. The company is also welcoming end users to try the service.
Here’s a video of Davies’ 32-minute presentation at XDF:
Amazon’s Senior Director of Business Development and Product, Gadi Hutt, gave an in-depth presentation at the recent Xilinx Developers Forum in Frankfurt, Germany where he detailed the specifics, advantages, and the nuts-and-bolts “how to” with respect to using the FPGA-based AWS EC2 F1 instances to accelerate your business.
First, Hutt gave one of the most succinct definitions of “the cloud” I’ve heard: “the on-demand delivery of compute, storage, networking, etc. services.” This definition is free of the niggling details such as hardware, networking, power, and cooling that you are now free to ignore.
Then Hutt listed the advantages of cloud-based services:
Agility and speed of innovation
Elasticity: scale up or down quickly, as needed
Breadth of functionality
Go global in minutes
From there, Hutt provided a deep explanation of the steps you need to take to distribute cloud-based services globally. He also quoted a Gartner estimate, which said that AWS (Amazon Web Services) has more compute capacity than all of the other cloud providers combined. Certainly, this Gartner report puts AWS far in the upper right corner of the Gartner Magic Quadrant for Cloud Infrastructure as a Service, Worldwide.
Using AWS allows your company to “get out of IT” and focus on providing specialized services where you can add value, said Hutt. “You can focus on your core business,” he continued.
Earlier this month, Xilinx held a developer’s forum in Frankfurt, Germany and Xilinx’s Senior Director for Software and IP Ramine Roane discussed the growing role of Xilinx All Programmable devices in his opening remarks, which appear in a New Electronics article written by Neil Tyler titled “Resurgence of interest in FPGAs helped by new services via the Cloud.” Roane started by stating something that any design team already knows: CPU architectures are failing to meet the demand of increasing workloads because Dennard frequency and power scaling (often erroneously lumped into Moore’s Law, which is really about transistor and density scaling) essentially died several years ago after several decades of robust health. The current workaround, multicore architectures, rapidly hits its own limits in most embedded systems where there just aren’t enough tasks to distribute to dozens of processor cores.
The article then quotes Roane:
“There are too many transistors switching at the same time and current leakage at lower geometries is hitting power constraint limits, and this is all happening at a time when workload demand is growing exponentially both in the Cloud and at the edge.”
One solution, hardware application accelerators, only makes sense if the production volumes are justified. For that, you need a killer app, said Roane.
Problem: there just aren’t that many killer apps.
The current situation plays to the strengths of Xilinx All Programmable devices, which can be reconfigured for a truly wide range of applications. “They provide configurable processor sub-systems and hardware that can be reconfigured dynamically,” said Roane.
The problem, of course, is that taking advantage of the programmable hardware resources in Xilinx devices has not been as easy as it might be. In the past, you needed specialized hardware-design skills; you needed to know Verilog or VHDL; you needed to wade into possibly unfamiliar hardware waters.
Roane emphasized that things are very different today. As the article states, “Xilinx and its growing ecosystem of partners are now delivering a much richer development stack so that hardware, embedded and application software developers can program them more easily by using higher level programming options, like C, C++ and OpenCL.”
“We are now able to deliver a development stack that designers are increasingly familiar with and which is also available on the Cloud via secure cloud services platforms,” added Roane, referring to Xilinx-based cloud acceleration offerings from Amazon Web Services (AWS EC2 F1 instances) and Alibaba Cloud.
For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:
Embedded-vision applications present many design challenges and a new ElectronicsWeekly.com article written by Michaël Uyttersprot, a Technical Marketing Manager at Avnet Silica, and titled “Bringing embedded vision systems to market” discusses these challenges and solutions.
First, the article enumerates several design challenges including:
Meeting hi-res image-processing demands within cost, size, and power goals
Handling a variety of image-sensor types
Handling multiple image sensors in one camera
Real-time compensation (lens correction) for inexpensive lenses
Distortion correction, depth detection, dynamic range, and sharpness enhancement
Next, the article discusses Avnet Silica’s various design offerings that help engineers quickly develop embedded-vision designs. Products discussed include:
DornerWorks is one of only three Xilinx Premier Alliance Partners in North America offering design services, so the company has more than a little experience using Xilinx All Programmable devices. The company has just launched a new learn-by-email series with “interesting shortcuts or automation tricks related to FPGA development.”
The series is free but you’ll need to provide an email address to receive the lessons. I signed up and immediately received a link to the first lesson titled “Algorithm Implementation and Acceleration on Embedded Systems” written by DornerWorks’ Anthony Boorsma. It contains information about the Xilinx Zynq SoC and Zynq UltraScale+ MPSoC and the Xilinx SDSoC development environment.
As a Xilinx employee I would like to contribute on the Pros ... and the Cons.
Let's start with the Cons: if there is a processor that suits all your needs in terms of cost/power/performance/IOs just go for it. You won't be able to design the same thing in an FPGA at the same price.
Now if you need some kind of glue logic around (IOs), or your design needs multiple processors/GPUs due to the required performance then it's time to talk to your local FPGA dealer (preferably Xilinx distributor!). I will try to answer a few remarks I saw throughout this thread:
FPGA/SoC: In the majority of the FPGA designs I’ve seen during my career at Xilinx, I saw some kind of processor. In pure FPGAs (Virtex/Kintex/Artix/Spartan) it is a soft-processor (MicroBlaze or PicoBlaze) and in a [Zynq SoC or Zynq UltraScale+ MPSoC], it is a hard processor (dual-core Arm Cortex-A9 [for Zynq SoCs] and Quad-A53+Dual-R5 [for Zynq UltraScale+ MPSoCs]). The choice is now more complex: Processor Only, Processor with an FPGA aside, FPGA only, Integrated Processor/FPGA. The tendency is for the latter due to all the savings incurred: PCB, power, devices, ...
Power: Pure FPGAs are making incredible progress, but if you want really low power in stand-by mode you should look at the Zynq UltraScale+ MPSoC, which contains many processors and particularly a Power Management Unit that can switch on/off different regions of the processors/programmable logic.
Analog: Since Virtex-5 (2006), Xilinx has included ADCs in its FPGAs, which were limited to internal parameter measurements (Voltage, Temperature, ...). [These ADC blocks are] called the System Monitor. With 7 series (2011) [devices], Xilinx included a dual 1Msamples/sec@12-bits ADC with internal/external measurement capabilities. Lately Xilinx [has] announced very high performance ADCs/DACs integrated into the Zynq UltraScale+ RFSoC: 4Gsamples/sec@12 bits ADCs / 6.5Gsamples/sec@14 bits DACs. Potential applications are Telecom (5G), Cable (DOCSIS) and Radar (Phased-Array).
Security: The bitstream that is stored in the external Flash can be encoded [encrypted]. Decoding [decrypting] is performed within the FPGA during bitstream download. Zynq-7000 SoCs and Zynq UltraScale+ MPSoCs support encoded [encrypted] bitstreams and secured boot for the processor[s].
Ease of Use: This is the big part of the equation. Customers need to take this into account to get the right time to market. Since 2012 and [with] 7 series devices, Xilinx introduced a new integrated tool called Vivado. Since then a number of features/new tools have been [added to Vivado]:
IP Integrator (IPI): a graphical interface to stitch IPs together and generate bitstreams for complete systems.
Vivado HLS (High Level Synthesis): a tool that allows you to generate HDL code from C/C++ code. This tool will generate IPs that can be handled by IPI.
SDSoC (Software Defined SoC): This tool allows you to design complete systems, software and hardware on a Zynq SoC/Zynq UltraScale+ MPSoC platform. This tool with some plugins will allow you to move part of your C/C++ code to programmable logic (calling Vivado HLS in the background).
SDAccel: an OpenCL (and more) implementation. Not relevant for this thread.
There are also tools related to the MathWorks environment [MATLAB and Simulink]:
System Generator for DSP (aka SysGen): Low-level Simulink library (designed by Xilinx for Xilinx FPGAs). Allows you to program HDL code with blocks. This tool achieves even better performance (clock/area) than HDL code as each block is an instance of an IP (from register, adder, counter, multiplier up to FFT, FIR compiler, and VHLS IP). Bit-true and cycle-true simulations.
Xilinx Model Composer (XMC): available since ... yesterday! Again a Simulink blockset but based on Vivado HLS. Much faster simulations. Bit-true but not cycle-true.
All this to say that FPGA vendors have [expended] tremendous effort to make FPGAs and derivative devices easier to program. You still need a learning curve [but it] is much shorter than it used to be…
If you want your design to run at maximum speed at the lowest possible power consumption (and who does not?), then you want to run your algorithms using fixed-point hardware. With that in mind, MathWorks has just published an extensive guide to “Best Practices for Converting MATLAB Code to Fixed Point” for MATLAB-based designs with a nearly hour-long companion video.
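To make the fixed-point idea concrete, here is a minimal Python sketch (not MATLAB, and not the MathWorks workflow) that quantizes values to a signed 16-bit word with 15 fractional bits and performs a saturating multiply. The Q1.15 format, the round-to-nearest quantization, and all helper names are assumptions chosen for illustration.

```python
# Assumed format for illustration: signed 16-bit word, 15 fractional bits.
WORD_BITS = 16
FRAC_BITS = 15


def saturate(raw: int, word_bits: int = WORD_BITS) -> int:
    """Clamp an integer to the range of a signed word of the given width."""
    lowest = -(1 << (word_bits - 1))
    highest = (1 << (word_bits - 1)) - 1
    return max(lowest, min(highest, raw))


def to_fixed(value: float, frac_bits: int = FRAC_BITS,
             word_bits: int = WORD_BITS) -> int:
    """Quantize a float to fixed point (round to nearest, then saturate)."""
    return saturate(round(value * (1 << frac_bits)), word_bits)


def to_float(raw: int, frac_bits: int = FRAC_BITS) -> float:
    """Convert a fixed-point integer back to a float."""
    return raw / (1 << frac_bits)


def fixed_mul(a: int, b: int, frac_bits: int = FRAC_BITS,
              word_bits: int = WORD_BITS) -> int:
    """Multiply two fixed-point values and rescale to the same format.

    The right shift discards the extra fractional bits (rounding toward
    negative infinity), then the result is saturated to the word width.
    """
    return saturate((a * b) >> frac_bits, word_bits)


half = to_fixed(0.5)
quarter = to_fixed(0.25)
print(to_float(fixed_mul(half, quarter)))
```

Note that +1.0 is not representable in this format, so `to_fixed(1.0)` saturates to the largest positive code. Choosing word and fraction lengths so that this kind of overflow does not hurt the algorithm is the part of the conversion that takes real care.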
MathWorks has been advocating model-based design using its MATLAB and Simulink development tools for some time because the design technique allows you to develop more complex software with better quality in less time. (See the MathWorks White Paper: “How Small Engineering Teams Adopt Model-Based Design.”) Model-based design employs a mathematical and visual approach to developing complex control and signal-processing systems through the use of system-level modeling throughout the development process: from initial design, through design analysis, simulation, automatic code generation, and verification. These models are executable specifications that consist of block diagrams, textual programs, and other graphical elements. Model-based design encourages rapid exploration of a broader design space than other design approaches because you can iterate your design more quickly, earlier in the design cycle. Further, because these models are executable, verification becomes an integral part of the development process at every step. Hopefully, this design approach results in fewer (or no) surprises at the end of the design cycle.
Xilinx supports model-based design using MATLAB and Simulink through the new Xilinx Model Composer, a design tool that integrates into the MATLAB and Simulink environments. The Xilinx Model Composer includes libraries with more than 80 high-level, performance-optimized, Xilinx-specific blocks including application-specific blocks for computer vision, image processing, and linear algebra. You can also import your own custom IP blocks written in C and C++, which are subsequently processed by Vivado HLS.
Here’s a block diagram that shows you the relationship among MathWorks’ MATLAB, Simulink, and Xilinx Model Composer:
Finally, here’s a 6-minute video explaining the benefits and use of Xilinx Model Composer:
Good machine learning heavily depends on large training-data sets, which are not always available. There’s a solution to this problem called transfer learning, which allows the new neural network to leverage an already trained neural network as a starting point. Kaan Kara at ETH Zurich has published an example of transfer learning as a Jupyter Notebook for the Zynq-and-Python based PYNQ development environment on Github. This demo uses the ZipML-PYNQ overlay and analyzes astronomical images of galaxies and puts the images into one of two classes: one showing images of merging galaxies and one that doesn’t.
Designing SDRs (software-defined radios)? MathWorks and Analog Devices have joined together to bring you a free Webinar titled “Radio Deployment on SoC Platforms.” It’s a 45-minute class that discusses hardware and software development for SDR designs using MathWorks’ MATLAB, Simulink, and HDL Coder to:
Model and simulate radio designs
Verify algorithms in simulation with streaming RF data
Deploy radio designs on hardware with HDL and C-code generation
Analog Devices’ Zynq-based RF SOM on a Carrier Card
There will be three broadcasts of the Webinar on December 13 to accommodate viewers around the world. Register here. Register even if you cannot attend and you’ll receive a link to a recording of the session.
If you’ve got some high-speed RF analog work to do, VadaTech’s new AMC598 and VPX598 Quad ADC/Quad DAC modules appear to be real workhorses. The four 14-bit ADCs (using two AD9208 dual ADCs) operate at 3Gsamples/sec and the quad 16-bit DACs (using four AD9162 or AD9164 DACs) operate at 12Gsamples/sec. You’re not going to drive those sorts of data rates over the host bus so the modules have local memory in the form of three DDR4 SDRAM banks for a total of 20Gbytes of on-board SDRAM. A Xilinx Kintex UltraScale KU115 FPGA manages all of the on-board resources (memory, analog converters, and host bus) and handles the blazingly fast data-transfer rates. (The KU115, aka the DSP Monster, is the largest Kintex UltraScale FPGA family member, with 5520 DSP slices that give you an immense amount of digital signal processing power to bring to bear on those RF analog signals.) That allows you to create RF waveform generators and advanced RF-capture systems for applications including communications and signal intelligence (COMINT/SIGINT), radar, and electronic warfare. You develop these designs using Xilinx tools including the Vivado Design Suite HLx Editions and the Xilinx Vivado System Generator for DSP, which can be used in conjunction with MathWorks’ MATLAB and the Simulink model-based design tool.
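As a rough check on why these converters need local memory, the quoted sample rates can be turned into raw payload bandwidth. The Python sketch below assumes every sample moves at its full resolution with no packing overhead and no decimation or interpolation inside the converters, which is a simplification rather than anything VadaTech states; the function name is illustrative.

```python
def raw_bandwidth_gbytes_per_sec(channels: int, bits_per_sample: int,
                                 samples_per_sec: float) -> float:
    """Raw sample payload in Gbytes/sec (1 Gbyte = 1e9 bytes)."""
    bits_per_sec = channels * bits_per_sample * samples_per_sec
    return bits_per_sec / 8 / 1e9


# Figures quoted in the post: four 14-bit ADCs at 3Gsamples/sec and
# four 16-bit DACs at 12Gsamples/sec.
adc_rate = raw_bandwidth_gbytes_per_sec(4, 14, 3e9)
dac_rate = raw_bandwidth_gbytes_per_sec(4, 16, 12e9)

print(f"ADC side: {adc_rate:.0f} Gbytes/sec")
print(f"DAC side: {dac_rate:.0f} Gbytes/sec")
```

Under these assumptions, the ADC side alone would fill the 20Gbytes of on-board SDRAM in roughly one second, which gives a feel for the capture depths involved.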
Here’s a block diagram of the AMC598 module:
VadaTech AMC598 Quad ADC/Quad DAC Block Diagram
And here’s a photo of the AMC598 Quad ADC/Quad DAC module:
VadaTech AMC598 Quad ADC/Quad DAC
Note: Please contact VadaTech directly for more information about the AMC598 and VPX598 Quad ADC/Quad DAC modules.
Digilent has announced a major upgrade to the Zynq-based Zybo dev board, now called the Zybo Z7. The original board was based on a Xilinx Zynq Z-7010 SoC with the integrated Arm Cortex-A9 MPCore processors running at 650MHz. The new Zybo Z7-10 and -20 dev boards are based on the Zynq Z-7010 and Z-7020 SoC respectively, and the processors now run at 667MHz. The Zybo Z7-10 sells for $199 (currently, you can get a voucher for the Xilinx SDSoC development environment for $10 more) and the Zybo Z7-20 board with triple the programmable logic resources sells for $299 (and currently includes the SDSoC voucher).
Digilent Zybo Z7-20 Dev Board based on Zynq Z-7020 SoC
In addition to the faster processors, there are several additional upgrades made to the Zybo Z7 versus the Zybo dev board. SDRAM capacity has increased from 512Mbytes on the original Zybo board to 1Gbyte on the Zybo Z7. The new boards now have two HDMI ports to support “bump-in-the-wire” HDMI applications. Both boards now also include a connector with a MIPI CSI-2 interface for video camera connections. You can plug a Raspberry Pi Camera Module directly into this connector and Digilent also plans to offer a camera module for this port.
Here’s a video explaining some of the highlights of the new Zybo Z7.
Note: For more information about the Zybo Z7 dev board, please contact Digilent directly.
Programmable logic is proving to be an excellent, flexible implementation medium for neural networks that gets faster and faster as you go from floating-point to fixed-point representation, which makes it ideal for embedded AI and machine-learning applications. The latest proof point is a recently published paper written by Yufeng Hao and Steven Quigley in the Department of Electronic, Electrical and Systems Engineering at the University of Birmingham, UK. The paper is titled “The implementation of a Deep Recurrent Neural Network Language Model on a Xilinx FPGA” and it describes a successful implementation and training of a fixed-point Deep Recurrent Neural Network (DRNN) using the Python programming language; the Theano math library and framework for multi-dimensional arrays; the open-source, Python-based PYNQ development environment; the Digilent PYNQ-Z1 dev board; and the Xilinx Zynq Z-7020 SoC on the PYNQ-Z1 board. Using a Python DRNN hardware-acceleration overlay, the two-person team achieved 20GOPS of processing throughput for an NLP (natural language processing) application with this design and outperformed earlier FPGA-based implementations by factors ranging from 2.75x to 70.5x.
Most of the paper discusses NLP and the LM (language model), “which is involved in machine translation, voice search, speech tagging, and speech recognition.” The paper then discusses the implementation of a DRNN LM hardware accelerator using Vivado HLS and Verilog to synthesize a custom overlay for the PYNQ development environment. The resulting accelerator contains five Process Elements (PEs) capable of delivering 20 GOPS in this application. Here’s a block diagram of the design:
DRNN Accelerator Block Diagram
There are plenty of deep technical details embedded in this paper but this one sentence sums up the reason for this blog post about the paper: “More importantly, we showed that a software and hardware joint design and simulation process can be useful in the neural network field.” This statement is doubly true considering that the PYNQ-Z1 dev board sells for $229.
Xilinx has a terrific tool designed to get you from product definition to working hardware quickly. It’s called SDSoC. Digilent has a terrific dev board to get you up and running with the Zynq SoC quickly. It’s the low-cost Arty Z7. A new blog post by Digilent’s Alex Wong titled “Software Defined SoC on Arty Z7-20, Xilinx ZYNQ evaluation board” posted on RS Online’s DesignSpark site gives you a detailed, step-by-step tutorial on using SDSoC with the Digilent Arty Z7. In particular, the focus here is on the ease of moving functions from software running on the Zynq SoC’s Arm Cortex-A9 processors to the Zynq SoC’s programmable hardware using Vivado HLS, which is embedded in SDSoC. That’s so that you can get the performance benefit of hardware-based task execution.
Envious of all the cool FPGA-accelerated applications showing up on the Amazon AWS EC2 F1 instance like the Edico Genome DRAGEN Genome Pipeline that set a Guinness World Record last week, the DeePhi ASR (Automatic Speech Recognition) Neural Network announced yesterday, Ryft’s cloud-based search and analysis tools, or NGCodec’s RealityCodec video encoder?
Well, you can shake off that green monster by signing up for the free, live, half-day Amazon AWS EC2 F1 instance and SDAccel dev lab being held at SC17 in Denver on the morning of November 15 at The Studio Loft in the Denver Performing Arts Complex (1400 Curtis Street), just across the street from the Denver Convention Center where SC17 is being held. Xilinx is hosting the lab and technology experts from Xilinx, Amazon Web Services, Ryft, and NGCodec will be available onsite.
Here’s the half-day agenda:
8:00 AM Doors open, Registration, and Continental Breakfast
9:00 AM Welcome, Technology Discussion, F1 Developer Use Cases and Demos
9:35 AM Break
9:45 AM Hands-on Training Begins
12:00 PM Developer Lab Concludes
A special guest speaker from Amazon Web Services is also on the agenda.
Lab instruction time includes:
Step-by-step instructions to connect to an F1 instance
Interactive walkthrough of the SDAccel Development Environment
Highlights of SDAccel IDE features: compile, debug, profile
Instruction for how to develop a sample framework acceleration app
Seats are necessarily limited for a lab like this, so you might want to get your request in immediately. Where? Here.
The ability to cancel interfering noise without a reference signal. (Competing solutions focus on AEC—acoustic echo cancellation—which cancels noise relative to a required audio reference channel.)
Support for non-uniform 1D and 2D microphone array spacing.
Scales up with more microphones for noisier environments.
Offers a one-chip solution for sound capture, multiple wake words, and customer applications. (Today this is a two-chip solution.)
Makes everything available in a “software-ready” environment: Just log in to the Ubuntu Linux environment and use Aaware’s streaming audio API to begin application development.
Aaware’s Far-Field Development Platform
These features are layered on top of a Xilinx Zynq SoC or Zynq UltraScale+ MPSoC and Aaware’s CTO Chris Eddington feels that the Zynq devices provide “well over” 10x the performance of an embedded processor thanks to the devices’ on-chip programmable logic, which offloads a significant amount of processing from the on-chip ARM Cortex processor(s). (Aaware can squeeze its technology into a single-core Zynq Z-7007S SoC and can scale up to larger Zynq SoC and Zynq UltraScale+ MPSoC devices as needed by the customer application.)
Aaware’s algorithm development is based on a unique tool chain:
Algorithm development in MathWorks’ MATLAB.
Hand-coding of an equivalent application in C++.
Initial hardware-accelerator synthesis from the C++ specification using Vivado HLS.
Use of Xilinx SDSoC to connect the hardware accelerators to the AXI bus and memory.
This tool chain allows Aaware to fit the features it wants into the smallest Zynq Z-7007S SoC or to scale up to the largest Zynq UltraScale+ MPSoC.
“The newly introduced Amazon EC2 F1 OpenCL development workflow helps software developers with little to no FPGA experience supercharge their applications with Amazon EC2 F1. Join us for an overview and demonstration of how to accelerate your C/C++ applications in the cloud using OpenCL with Amazon EC2 F1 instances. We walk you through the development flow for creating a custom hardware acceleration for a software algorithm. Attendees get hands-on and creative by optimizing an algorithm for maximum acceleration on Amazon EC2 F1 instances.”
Over the past weekend, Xilinx held a Showcase and PYNQ Hackathon in its Summit Retreat Center in Longmont, Colorado. About 100 people from tech companies all over Colorado attended the Showcase and twelve teams—about 40 people including students from local universities and engineers from industry—competed in the Hackathon.
Here are a few images from the Xilinx Showcase and PYNQ Hackathon:
Xilinx Summit Retreat Center in Longmont, Colorado
Xilinx CTO Ivo Bolsens welcomes everyone to the Xilinx Showcase
Xilinx Showcase Attendees
More Xilinx Showcase Attendees
Xilinx VP of Interactive Design Tools and Xilinx Longmont Site Manager Dan Gibbons Welcomes the Hackers to the PYNQ Hackathon
Abo’s Pizza (Boulder’s Finest) for Friday Night Dinner
Twelve student and industry teams competed for 30 straight hours in the Xilinx Hackathon 2017 competition over the weekend at the Summit Retreat Center in the Xilinx corporate facility located in Longmont, Colorado. Each team member received a Digilent PYNQ-Z1 dev board, which is based on a Xilinx Zynq Z-7020 SoC, and then used their fertile imaginations to conceive of and develop working code for an application using the open-source, Python-based PYNQ development environment, which is based on self-documenting Jupyter Notebooks. The online electronics and maker retailer Sparkfun, located just down the street from the Xilinx facility in Longmont, supplied boxes of compatible peripheral boards with sensors and motor controllers to spur the team members’ imaginations. Several of the teams came from local universities including the University of Colorado at Boulder and the Colorado School of Mines in Golden, Colorado. At the end of the competition, eleven of the teams presented their results using their Jupyter Notebooks. Then came the prizes.
For the most part, team members had never used the PYNQ-Z1 boards and were not familiar with using programmable logic. In part, that was the intent of the Hackathon—to connect teams of inexperienced developers with appropriate programming tools and see what develops. That’s also the reason that Xilinx developed PYNQ: so that software developers and students could take advantage of the improved embedded performance made possible by the Zynq SoC’s programmable hardware without having to use ASIC-style (HDL) design tools to design hardware (unless they want to do so, of course).
Here are the projects developed by the teams, in the order presented during the final hour of the Hackathon (links go straight to the teams’ GitHub repositories, where Jupyter notebooks document the projects with explanations and “working” code):
Team “from timemachine import timetravel” developed a sine wave generator with a PYNQ-callable frequency modulator and an audio spectrum analyzer. The team had time to develop a couple of different versions of the spectrum analyzer but not enough to link the generator and analyzer together.
Team “John Cena” developed a voice-controlled mobile robot. An application on a PC captured the WAV file for a spoken command sequence and this file was then wirelessly transmitted to the mobile robot, which interpreted commands and executed them.
Team John Cena’s Voice-Controlled Mobile Robot
Inspired by the recent Nobel Physics prize given to the 3-person team that architected the LIGO gravity-wave observatories, Team “Daedalus” developed a Hackathon entry called “Sonic LIGO”—a sound localizer that takes audio captured by multiple microphones, uses time correlation to filter audio noise from the sounds of interest, and then triangulates the location of the sound using its phase derived from each microphone. Examples of sound events the team wanted to locate included hand claps and gun shots. The team planned to use its members’ three PYNQ-Z1 boards for the triangulation.
Team “Questionable” from the Colorado School of Mines developed an automated parking lot assistant to aid students looking for a parking space near the university. The design uses two motion detectors to detect cars passing through each lot’s entrances and exits. Timing between the two sensors determines whether the car is entering or leaving the lot. The team calls their application PARQYNG and produced a YouTube video to explain the idea:
Team “Snapback” developed a Webcam-equipped cap that captures happy moments by recognizing smiling faces and using that recognition to trigger the capture of a short video clip, which is then wirelessly uploaded to the cloud for later viewing. The application was inspired by the oncoming memory loss of one team member’s grandmother.
Team “Trimble” from Trimble, Inc. developed a sophisticated application that determines position using photogrammetry techniques. The design uses the Zynq SoC’s programmable logic to accelerate the calculations.
Team “Codeing Crazy” developed an “air keyboard” (it’s like a working air guitar but it’s a keyboard) using OpenCV to recognize the image of a hand in space, locate the recognized object in a space that’s predefined as a keyboard, and then play the appropriate note.
Team “Joy of Pink” from CU Boulder developed a real-time emoji generator that recognizes facial expressions in an image, interprets the emotion shown on the subject’s face by sending the image to Microsoft’s cloud-based Azure Emotion API, and then substitutes the appropriate emoji in the image.
Team “Joy of Pink” developed an emoji generator based on facial interpretation on Microsoft’s cloud-based Azure Emotion API
Team “Harsh Constraints” plunged headlong into a Verilog-based project to develop a 144MHz LVDS Cameralink interface to a thermal camera. It was a very ambitious venture for a team that had never before used Verilog.
Team “Caffeine” developed a tone-controlled robot using audio filters instantiated in the Zynq SoC’s programmable logic to decode four audio tones which then control robot motion. Here’s a block diagram:
Team Caffeine’s Audio Fiend Tone-Based Robotic Controller
Team “Lynx” developed a face-recognition system that stores faces in a cloud-hosted spreadsheet on Google Drive based on whether or not the system has seen that face before. The system uses Haar-cascade detection implemented with OpenCV.
After the presentations, the judges deliberated for a few minutes using multiple predefined criteria and then awarded the following prizes:
The “Murphy’s Law” prize for dealing with insurmountable circumstances went to Team Harsh Constraints.
The “Best Use of Programmable Logic” prize went to Team Caffeine.
The “Runner Up” prize went to Team Snapback.
The “Grand Prize” went to Team Questionable.
Congratulations to the winners and to all of the teams who spent 30 hours with each other in a large room in Colorado to experience the joy of hacking code to tackle some tough problems. (A follow-up blog will include a photographic record of the event so that you can see what it was like.)
The details are exciting to say the least. The article begins with this statement:
“Diagnosing the medical mysteries behind acutely ill babies can be a race against time, filled with a barrage of tests and misdiagnoses. During the first few days of life, a few hours can save or seal the fate of patients admitted to the neonatal intensive care units (NICUs) and pediatric intensive care units (PICUs). Accelerating the analysis of the medical assays conducted in these hospitals can improve patient outcomes, and, in some cases, save lives.”
Then, if you read far enough into the post, you find this statement:
“Rady Children’s Institute for Genomic Medicine is one of the global leaders in advancing precision medicine. To date, the institute has sequenced the genomes of more than 3,000 children and their family members to diagnose genetic diseases. 40% of these patients are diagnosed with a genetic disease, and 80% of these receive a change in medical management. This is a remarkable rate of change in care, considering that these are rare diseases and often involve genomic variants that have not been previously observed in other individuals.”
The Amazon AWS EC2 F1 instance is a cloud service that’s based on multiple Xilinx Virtex UltraScale+ VU9P FPGAs installed in Amazon’s Web servers. For more information on the AWS EC2 F1 Instance in Xcell Daily, see:
The Xilinx Zynq UltraScale+ MPSoC is good for many applications including embedded vision. Its APU with two or four 64-bit ARM Cortex-A53 processors, Mali GPU, DisplayPort interface, and on-chip programmable logic (PL) give the Zynq UltraScale+ MPSoC plenty of processing power to address exciting applications such as ADAS and vision-guided robotics with relative ease. Further, we can use the device’s PL and its programmable I/O to interface with a range of vision and video standards including MIPI, LVDS, parallel, VoSPI, etc. When it comes to interfacing image sensors, the Zynq UltraScale+ MPSoC can handle just about anything you throw at it.
Once we’ve brought the image into the Zynq UltraScale+ MPSoC’s PL, we can implement an image-processing pipeline using existing IP cores from the Xilinx library or we can develop our own custom IP cores using Vivado HLS (high-level synthesis). However, for many applications we’ll need to move the images into the device’s PS (processing system) domain before we can apply exciting application-level algorithms such as decision making or use the Xilinx reVISION acceleration stack.
I thought I would kick off the fourth year of this blog with a look at how we can use VDMA instantiated in the Zynq MPSoC’s PL to transfer images from the PL to the PS-attached DDR Memory without processor intervention. You often need to make such high-speed background transfers in a variety of applications.
To do this we will use the following IP blocks:
Zynq MPSoC core – Configured to enable both a Full Power Domain (FPD) AXI HP Master and FPD HPC AXI Slave, along with providing at least one PL clock and reset to the PL fabric.
VDMA core – Configured for write-only operation, with the No FSync option and a Genlock Mode of master
Test Pattern Generator (TPG) – Configurable over the AXI Lite interface
AXI Interconnects – Implement the Master and Slave AXI networks
Once configured over its AXI Lite interface, the Test Pattern Generator outputs test patterns which are then transferred into the PS-attached DDR memory. We can demonstrate that this has been successful by examining the memory locations using SDK.
Enabling the FPD Master and Slave Interfaces
For this simple example, we’ll clock both the AXI networks at the same frequency, driven by PL_CLK_0 at 100MHz.
For a deployed system, an image sensor would replace the TPG as the image source and we would need to ensure that the VDMA channel clocks (Stream-to-Memory-Map and Memory-Map-to-Stream) were fast enough to support the required pixel and frame rate. For example, a sensor with a resolution of 1280 pixels by 1024 lines running at 60 frames per second would require a clock rate of at least 108MHz, which is the standard pixel clock for that video mode once the blanking intervals are included. We would need to adjust the clock frequency accordingly.
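To see where a figure like 108MHz comes from, here is a minimal Python sketch of the arithmetic. It assumes one pixel is transferred per clock and uses the commonly published VESA timing totals for 1280x1024 at 60Hz (1688 clocks per line and 1066 lines per frame, including blanking). Those totals and the function name are assumptions added for illustration and do not come from the design described here, so check them against your own sensor's datasheet.

```python
def required_pixel_clock_hz(clocks_per_line, lines_per_frame, frames_per_second):
    """Return the clock rate needed to move one pixel per clock cycle."""
    return clocks_per_line * lines_per_frame * frames_per_second


FRAMES_PER_SECOND = 60

# Active video only: 1280 pixels by 1024 lines, no blanking.
active_only_hz = required_pixel_clock_hz(1280, 1024, FRAMES_PER_SECOND)

# Assumed totals including horizontal and vertical blanking
# (commonly published VESA timing for 1280x1024 at 60Hz).
with_blanking_hz = required_pixel_clock_hz(1688, 1066, FRAMES_PER_SECOND)

print(f"Active pixels only: {active_only_hz / 1e6:.2f} MHz")
print(f"Including blanking: {with_blanking_hz / 1e6:.2f} MHz")
```

The active-pixel rate alone is well under 100MHz. It is the blanking overhead that pushes the requirement up to roughly 108MHz, which is why a 100MHz PL clock is fine for the TPG example but not for this sensor.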
Block Diagram of the completed design
To aid visibility within this example, I have included three ILA modules, which are connected to the outputs of the Test Pattern Generator, AXI VDMA, and the Slave Memory Interconnect. Adding these modules enables the use of Vivado’s hardware manager to verify that the software has correctly configured the TPG and the VDMA to transfer the images.
With the Vivado design complete and built, creating the application software to configure the TPG and VDMA to generate images and move them from the PL to the PS is very straightforward. We use the AXIVDMA, V_TPG, and Video Common APIs available under the BSP lib source directory to aid in creating the application. The software itself performs the following:
Initialize the TPG and the AXI VDMA for use in the software application
Configure the TPG to generate a test pattern with the following settings
Set the Image Width to 1280, Image Height to 1080
Set the color space to YCRCB, 4:2:2 format
Set the TPG background pattern
Enable the TPG and set it for auto reloading
Configure the VDMA to write data into the PS memory
Set up the VDMA parameters using a variable of the type XAxiVdma_DmaSetup – remember the horizontal size and stride are measured in bytes not pixels.
Configure the VDMA with the setting defined above
Set the VDMA frame store location address in the PS DDR
Start VDMA transfer
The application will then start generating test frames, transferred from the TPG into the PS DDR memory. I disabled the caches for this example to ensure that the DDR memory is updated.
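Because the horizontal size and stride are given in bytes, it is worth working the numbers out before filling in the XAxiVdma_DmaSetup variable. The Python sketch below does that arithmetic for the 1280 by 1080 test pattern. It assumes 8 bits per component, so a YCrCb 4:2:2 pixel occupies 2 bytes, and it assumes the stride equals the line length with no padding. The helper and class names are hypothetical and are not part of the Xilinx driver.

```python
from dataclasses import dataclass


@dataclass
class VdmaWriteGeometry:
    hori_size_bytes: int  # bytes of valid data per line
    stride_bytes: int     # byte offset between the starts of consecutive lines
    vert_size_lines: int  # number of lines per frame
    frame_bytes: int      # memory occupied by one frame store


def vdma_write_geometry(width_pixels, height_lines, bytes_per_pixel, stride_bytes=None):
    """Work out the byte-based sizes for one frame store (hypothetical helper)."""
    hori_size_bytes = width_pixels * bytes_per_pixel
    if stride_bytes is None:
        # Assumption: lines are packed back to back with no padding.
        stride_bytes = hori_size_bytes
    if stride_bytes < hori_size_bytes:
        raise ValueError("stride must be at least one full line of data")
    return VdmaWriteGeometry(
        hori_size_bytes=hori_size_bytes,
        stride_bytes=stride_bytes,
        vert_size_lines=height_lines,
        frame_bytes=stride_bytes * height_lines,
    )


# Assumption: YCrCb 4:2:2 with 8-bit components, so 2 bytes per pixel.
geometry = vdma_write_geometry(width_pixels=1280, height_lines=1080, bytes_per_pixel=2)
print(geometry)
```

With these assumptions each line is 2560 bytes and each frame store needs 2,764,800 bytes, so consecutive frame store addresses in the PS DDR should be at least that far apart. If your TPG is configured for a different component width, the bytes-per-pixel value changes accordingly.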
Examining the ILAs, you will see the TPG generating frames and the VDMA transferring the stream into memory mapped format:
TPG output, TUSER indicates start of frame while TLAST indicates end of line
VDMA Memory Mapped Output to the PS
Examining the frame store memory location within the PS DDR memory using SDK demonstrates that the pixel values are present.
Test Pattern Pixel Values within the PS DDR Memory
You can use the same approach in Vivado when creating software for a Zynq Z-7000 SoC instead of a Zynq UltraScale+ MPSoC by enabling the AXI GP master for the AXI Lite bus and AXI HP slave for the VDMA channel.
Should you be experiencing trouble with your VDMA based image processing chain, you might want to read this blog.
MathWorks has just published a 4-part mini course that teaches you how to develop vision-processing applications using MATLAB, HDL Coder, and Simulink, then walks you through a practical example targeting a Xilinx Zynq SoC using a lane-detection algorithm in Part 4.
PDF Solutions provides yield-improvement technologies and services to the IC-manufacturing industry to lower manufacturing costs, improve profitability, and shorten time to market. One of the company’s newest solutions is the eProbe series of e-beam tools used for inline electrical characterization and process control. These tools combine an SEM (scanning electron microscope) and an optical microscope and have the unique ability to provide real-time image analysis of nanometer-scale features. The eProbe development team selected National Instruments’ (NI’s) LabVIEW to control the eProbe system and brought in JKI (a LabVIEW consulting company, Xilinx Alliance Program member, and NI Silver Alliance Partner) to help develop the system.
PDF Solutions eProbe e-beam tool combines an SEM with an optical microscope
In less than four months, JKI helped PDF Solutions attain a 250MHz pixel-acquisition rate from the prototype eProbe using a combination of NI’s FlexRIO module, based on a Xilinx Kintex-7 FPGA, and NI’s LabVIEW FPGA module. According to the PDF Solutions case study published on the JKI Web site, using NI’s LabVIEW allowed the PDF/JKI team to implement the required, real-time FPGA logic and easily integrate third-party FPGA IP in a fraction of the time required by alternative design platforms while still achieving the project’s image-throughput goals.
LabVIEW controls most of the functions within the eProbe that perform the wafer inspection including:
Controlling the x and y axes of the stage
Sampling and driving various I/O points for the electron gun and the column
Controlling the load port and equipment frontend module
Overseeing the vacuum and interlocking components
Directing and managing SEM and optical image acquisition.
JKI contributed both to the eProbe’s software architecture design and the development of various high-level software components that coordinate and control the low-level hardware functions including data acquisition and image manipulation.
Although the eProbe’s control system runs within NI’s LabVIEW environment, the system’s user interface is based on a C# application from The PEER Group called the Peer Tool Orchestrator (PTO). JKI developed the interface between the eProbe’s front-end user interface and its LabVIEW-based control system using its internally developed tools. (Note: JKI offers several LabVIEW development tools and templates directly on this Web page.)
eProbe user interface screen
Once PDF Solutions started fielding eProbe systems, JKI sent people to work with PDF Solutions’ customers on site in a collaboration that helped generate ideas for future algorithm and tool improvements.
For more information about real-time LabVIEW development using the NI LabVIEW FPGA module and Xilinx-based NI hardware, contact JKI directly.