Two new papers, one about hardware and one about software, describe the Snowflake CNN accelerator and accompanying Torch7 compiler developed by several researchers at Purdue U. The papers are titled “Snowflake: A Model Agnostic Accelerator for Deep Convolutional Neural Networks” (the hardware paper) and “Compiling Deep Learning Models for Custom Hardware Accelerators” (the software paper). The authors of both papers are Andre Xian Ming Chang, Aliasger Zaidy, Vinayak Gokhale, and Eugenio Culurciello from Purdue’s School of Electrical and Computer Engineering and the Weldon School of Biomedical Engineering.
In the abstract, the hardware paper states:
“Snowflake, implemented on a Xilinx Zynq XC7Z045 SoC is capable of achieving a peak throughput of 128 G-ops/s and a measured throughput of 100 frames per second and 120 G-ops/s on the AlexNet CNN model, 36 frames per second and 116 Gops/s on the GoogLeNet CNN model and 17 frames per second and 122 G-ops/s on the ResNet-50 CNN model. To the best of our knowledge, Snowflake is the only implemented system capable of achieving over 91% efficiency on modern CNNs and the only implemented system with GoogLeNet and ResNet as part of the benchmark suite.”
The primary goal of the Snowflake accelerator design was computational efficiency. Efficiency and bandwidth are the two primary factors influencing accelerator throughput. The hardware paper says that the Snowflake accelerator achieves 95% computational efficiency and that it can process networks in real time. Because it is implemented on a Xilinx Zynq Z-7045, power consumption is a miserly 5W according to the software paper, well within the power budget of many embedded systems.
The hardware paper also states:
“Snowflake with 256 processing units was synthesized on Xilinx's Zynq XC7Z045 FPGA. At 250MHz, AlexNet achieved in 93.6 frames/s and 1.2GB/s of off-chip memory bandwidth, and 21.4 frames/s and 2.2GB/s for ResNet18.”
Here’s a block diagram of the Snowflake machine architecture from the software paper, from the micro level on the left to the macro level on the right:
There’s room for future performance improvement, notes the hardware paper:
“The Zynq XC7Z045 device has 900 MAC units. Scaling Snowflake up by using three compute clusters, we will be able to utilize 768 MAC units. Assuming an accelerator frequency of 250 MHz, Snowflake will be able to achieve a peak performance of 384 G-ops/s. Snowflake can be scaled further on larger FPGAs by increasing the number of clusters.”
This is where I point out that a Zynq Z-7100 SoC has 2020 “MAC units” (actually, DSP48E1 slices)—which is a lot more than you find on the Zynq Z-7045 SoC—and the Zynq UltraScale+ ZU15EG MPSoC has 3528 DSP48E2 slices—which is much, much larger still. If speed and throughput are what you desire in a CNN accelerator, then either of these parts would be worthy of consideration for further development.
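The arithmetic behind those peak numbers is easy to reproduce. Here’s a minimal sketch that assumes each MAC unit performs two operations per clock cycle (one multiply and one accumulate), which is consistent with the paper’s 768 MAC units delivering 384 G-ops/s at 250MHz. The 256-MAC cluster size is inferred from the quote’s three clusters and 768 MAC units. The cluster counts computed for the larger devices are purely hypothetical upper bounds: they assume one MAC per DSP slice, the same 250MHz clock, and no other resource limits, none of which the papers claim.

```python
# Peak-throughput sketch. Assumption: each MAC unit performs 2 ops per cycle
# (one multiply plus one accumulate), which matches the paper's quoted figures.
OPS_PER_MAC = 2
MACS_PER_CLUSTER = 256  # inferred: 768 MAC units across three clusters


def peak_gops(mac_units, clock_mhz):
    """Peak throughput in G-ops/s for a given MAC count and clock frequency."""
    return mac_units * OPS_PER_MAC * clock_mhz / 1000.0


def max_clusters(dsp_slices):
    """Hypothetical upper bound on clusters if DSP slices were the only limit."""
    return dsp_slices // MACS_PER_CLUSTER


# DSP slice counts as quoted in the text above.
devices = {
    "Zynq Z-7045": 900,
    "Zynq Z-7100": 2020,
    "Zynq UltraScale+ ZU15EG": 3528,
}

for name, dsp in devices.items():
    clusters = max_clusters(dsp)
    macs = clusters * MACS_PER_CLUSTER
    print(f"{name}: up to {clusters} clusters, {macs} MACs, "
          f"{peak_gops(macs, 250):.0f} G-ops/s peak at 250 MHz")
```

The same function gives 128 G-ops/s for the 256-unit implementation at 250MHz, which is the peak figure quoted in the hardware paper’s abstract.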
A recent White Paper published by National Instruments (NI) and titled “ADAS HIL With Sensor Fusion” discusses the challenges presented by systems that have fused many diverse sensors—and automated driver assistance systems (ADAS) present some of the biggest challenges yet. ADAS systems fuse radar, visible and IR cameras, LIDAR, and ultrasound sensors into an environment sensing system of nearly unprecedented complexity, yet they’re expected to work reliably en masse while installed in consumer-grade automobiles. That’s a stiff challenge indeed.
Meeting these challenges requires good design, yes, and it also requires large amounts of testing, preferably automated testing, because there are too many tests to run by hand.
This NI White Paper discusses the way that Altran Italia met these challenges by creating an ADAS HIL Test Environment Suite based on NI instruments and the LabVIEW Development Environment that could test a variety of automotive electronics control units (ECUs) including a Sensor Fusion ECU, a Radar ECU, and a camera ECU. The ADAS HIL test system generates 3D video scenes for the cameras and simulates radar-detected objects using an RF simulator that’s based, in part, on one or more NI 2nd-generation PXIe-5840 VSTs (Vector Signal Transceivers), which in turn are based on a Xilinx Virtex-7 690T FPGA. (See “NI launches 2nd-Gen 6.5GHz Vector Signal Transceiver with 5x the instantaneous bandwidth, FPGA programmability.”)
ADAS HIL Test Environment
Korea-based ATUS (Across The Universe) has developed a working automotive vision sensor that recognizes objects such as cars and pedestrians using a 17.53frames/sec video stream. A CNN (convolutional neural network) performs the object recognition on 20 different object classes and runs in the programmable logic fabric on a Xilinx Zynq Z7045 SoC. The programmable logic clocks at 200MHz and the entire design draws 10.432W. That’s about 10% of the power required by CPUs or GPUs to implement this CNN.
Here’s a block diagram of the recognition engine in the Zynq SoC’s programmable logic fabric:
ATUS’ Object-Recognition CNN runs in the programmable logic fabric of a Zynq Z7045 SoC
Here’s a short video of ATUS’ Automotive Vision Sensor in action, running on a Xilinx ZC706 eval kit:
Please contact ATUS for more information about their Automotive Vision Sensor.
The latest “Powered by Xilinx” video, published today, provides more detail about the Perrone Robotics MAX development platform for developing all types of autonomous robots—including self-driving cars. MAX is a set of software building blocks for handling many types of sensors and controls needed to develop such robotic platforms.
Perrone Robotics has MAX running on the Xilinx Zynq UltraScale+ MPSoC and relies on that heterogeneous All Programmable device to handle the multiple, high-bit-rate data streams from complex sensor arrays that include lidar systems and multiple video cameras.
Perrone is also starting to develop with the new Xilinx reVISION stack and plans to both enhance the performance of existing algorithms and develop new ones for its MAX development platform.
Here’s the 4-minute video:
Last month, I wrote about Perrone Robotics’ Autonomous Driving Platform based on the Zynq UltraScale+ MPSoC. (See “Linc the autonomous Lincoln MKZ running Perrone Robotics' MAX AI takes a drive in Detroit without puny humans’ help” and “Perrone Robotics builds [Self-Driving] Hot Rod Lincoln with its MAX platform, on a Zynq UltraScale+ MPSoC.”) That platform runs on a controller box supplied by iVeia. In the 2-minute video below, iVeia’s CTO Mike Fawcett describes the attributes of the Zynq UltraScale+ MPSoC that make it a superior implementation technology for autonomous driving platforms. The Zynq UltraScale+ MPSoC’s immense, heterogeneous computing power supplied by six ARM processors plus programmable logic and a few more programmable resources flexibly delivers the monumental amount of processing required for vehicular sensor fusion and real-time perception processing while consuming far less power and generating far less heat than competing solutions involving CPUs or GPUs.
Here’s the video:
Cloud computing and application acceleration for a variety of workloads including big-data analytics, machine learning, video and image processing, and genomics are big data-center topics and if you’re one of those people looking for acceleration guidance, read on. If you’re looking to accelerate compute-intensive applications such as automated driving and ADAS or local video processing and sensor fusion, this blog post’s for you too. The basic problem here is that CPUs are too slow and they burn too much power. You may have one or both of these challenges. If so, you may be considering a GPU or an FPGA as an accelerator in your design.
How to choose?
Although GPUs started as graphics accelerators, primarily for gamers, a few architectural tweaks and a ton of software have made them suitable as general-purpose compute accelerators. With the right software tools, it’s not too difficult to recode and recompile a program to run on a GPU instead of a CPU. With some experience, you’ll find that GPUs are not great for every application workload. Certain computations such as sparse matrix math don’t map onto GPUs well. One big issue with GPUs is power consumption. GPUs aimed at server acceleration in a data-center environment may burn hundreds of watts.
With FPGAs, you can build any sort of compute engine you want with excellent performance/power numbers. You can optimize an FPGA-based accelerator for one task, run that task, and then reconfigure the FPGA if needed for an entirely different application. The amount of computing power you can bring to bear on a problem is scary big. A Virtex UltraScale+ VU13P FPGA can deliver 38.3 INT8 TOPS (that’s tera operations per second) and if you can binarize the application, which is possible with some neural networks, you can hit 500TOPS. That’s why you now see big data-center operators like Baidu and Amazon putting Xilinx-based FPGA accelerator cards into their server farms. That’s also why you see Xilinx offering high-level acceleration programming tools like SDAccel to help you develop compute accelerators using Xilinx All Programmable devices.
For more information about the use of Xilinx devices in such applications including a detailed look at operational efficiency, there’s a new 17-page White Paper titled “Xilinx All Programmable Devices: A Superior Platform for Compute-Intensive Systems.”
Linc, Perrone Robotics’ autonomous Lincoln MKZ automobile, took a drive around the Perrone paddock at the TU Automotive autonomous vehicle show in Detroit last week and Dan Isaacs, Xilinx’s Director Connected Systems in Corporate Marketing, was there to shoot photos and video. Perrone’s Linc test vehicle operates autonomously using the company’s MAX (Mobile Autonomous X), a “comprehensive full-stack, modular, real-time capable, customizable, robotics software platform for autonomous (self-driving) vehicles and general purpose robotics.” MAX runs on multiple computing platforms including one based on an iVeia controller, which is based on an iVeia Atlas SOM, which in turn is based on a Xilinx Zynq UltraScale+ MPSoC. The Zynq UltraScale+ MPSoC handles the avalanche of data streaming from the vehicle’s many sensors to ensure that the car travels the appropriate path and avoids hitting things like people, walls and fences, and other vehicles. That’s all pretty important when the car is driving itself in public. (For more information about Perrone Robotics’ MAX, see “Perrone Robotics builds [Self-Driving] Hot Rod Lincoln with its MAX platform, on a Zynq UltraScale+ MPSoC.”)
Here’s a photo of Perrone’s sensored-up Linc autonomous automobile in the Perrone Robotics paddock at TU Automotive in Detroit:
And here’s a photo of the iVeia control box with the Zynq UltraScale+ MPSoC inside, running Perrone’s MAX autonomous-driving software platform. (Note the controller’s small size and lack of a cooling fan):
Opinions about the feasibility of autonomous vehicles are one thing. Seeing the Lincoln MKZ’s 3800 pounds of glass, steel, rubber, and plastic being controlled entirely by a little silver box in the trunk, that’s something entirely different. So here’s the video that shows Perrone Robotics’ Linc in action, driving around the relative safety of the paddock while avoiding the fences, pedestrians, and other vehicles:
When someone asks where Xilinx All Programmable devices are used, I find it a hard question to answer because there’s such a very wide range of applications—as demonstrated by the thousands of Xcell Daily blog posts I’ve written over the past several years.
Now, there’s a 5-minute “Powered by Xilinx” video with clips from several companies using Xilinx devices for applications including:
That’s a huge range covered in just five minutes.
Here’s the video:
With LED automotive lighting now becoming commonplace, newer automobiles have the ability to communicate with each other (V2V communications) and with roadside infrastructure by quickly flashing their lights (LiFi) instead of using radio protocols. Researchers at OKATEM—the Centre of Excellence in Optical Wireless Communication Technologies at Ozyegin University in Turkey—have developed an OFDM-based LiFi demonstrator for V2V (vehicle-to-vehicle) and V2I (vehicle-to-infrastructure) applications that has achieved 50Mbps communications between vehicles as far apart as 70m in a lab atmospheric emulator.
Inside the OKATEM LiFi Atmospheric Emulator
The demo system is based on PXIe equipment from National Instruments (NI) including FlexRIO FPGA modules. (NI’s PXIe FlexRIO modules are based on Xilinx Virtex-5 and Virtex-7 FPGAs.) The FlexRIO modules implement the LiFi OFDM protocols including channel coding, 4-QAM modulation, and an N-IFFT. Here’s a diagram of the setup:
Researchers developed the LiFi system using NI’s LabVIEW and LabVIEW system engineering software. Initial LiFi system performance demonstrated a data rate of 50 Mbps with as much as 70m between two cars, depending on the photodetectors’ location in the car (particularly its height above ground level). Further work will try to improve the total system performance by integrating advanced capabilities such as multiple-input, multiple-output (MIMO) communication and link adaptation on top of the OFDM architecture.
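To make the 4-QAM modulation and IFFT steps mentioned above a bit more concrete, here’s a minimal software sketch of one OFDM symbol. It is not the OKATEM design: the Gray-coded bit mapping, the function names, and the naive DFT routines (standing in for a hardware N-point IFFT/FFT) are all assumptions for illustration. It also leaves out channel coding and the extra steps an intensity-modulated optical link generally needs to produce a real, non-negative drive signal.

```python
import cmath

# Assumed Gray-coded 4-QAM constellation: two bits per subcarrier symbol.
QAM4 = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}
QAM4_INV = {point: bits for bits, point in QAM4.items()}


def idft(freq):
    """Naive inverse DFT, a stand-in for the N-point IFFT in the transmitter."""
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def dft(time):
    """Naive forward DFT, used by the receiver to recover the subcarriers."""
    n = len(time)
    return [sum(time[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def ofdm_modulate(bits):
    """Map bit pairs onto 4-QAM subcarriers, then convert to time-domain samples."""
    pairs = [(bits[i], bits[i + 1]) for i in range(0, len(bits), 2)]
    return idft([QAM4[p] for p in pairs])


def ofdm_demodulate(samples):
    """Recover bits with a hard decision on the sign of each subcarrier's I and Q."""
    bits = []
    for s in dft(samples):
        point = complex(1 if s.real >= 0 else -1, 1 if s.imag >= 0 else -1)
        bits.extend(QAM4_INV[point])
    return bits


tx_bits = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
rx_bits = ofdm_demodulate(ofdm_modulate(tx_bits))
print("round trip ok:", rx_bits == tx_bits)
```

Over an ideal channel, the demodulator recovers the transmitted bits exactly. A real link adds noise and distortion between the two steps, and that is the job of the channel coding stage.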
This project was a 2017 NI Engineering Impact Award Winner in the RF and Mobile Communications category last month at NI Week. It is documented in this NI case study.
“My Pappy said
Son, you’re gonna
Drive me to drinkin’
If you don’t stop drivin’
That Hot Rod Lincoln” — Commander Cody & His Lost Planet Airmen
In other words, you need an autonomous vehicle.
For the last 14 years, Perrone Robotics has focused on creating platforms that allow vehicle manufacturers to quickly integrate a variety of sensors and control algorithms into a self-driving vehicle. The company’s MAX (Mobile Autonomous X) is a “comprehensive full-stack, modular, real-time capable, customizable, robotics software platform for autonomous (self-driving) vehicles and general purpose robotics.”
Sensors for autonomous vehicles include cameras, lidar, radar, ultrasound, and GPS. All of these sensors generate a lot of data—about 1Mbyte/sec for the Perrone test platform. Designers need to break up all of the processing required for these sensors into tasks that can be distributed to multiple processors and then fuse the processed sensor data (sensor fusion) to achieve real-time, deterministic performance. For the most demanding tasks, software-based processing won’t deliver sufficiently quick response.
Self-driving systems must make as many as 100 decisions/sec based on real-time sensor data. You never know what will come at you.
According to Perrone’s Chief Revenue Officer Dave Hofert, the Xilinx Zynq UltraScale+ MPSoC with its multiple ARM Cortex-A53 and -R5 processors and programmable logic can handle all of these critical tasks and provides a “solution that scales,” with enough processing power to bring in machine learning as well.
Here’s a brand new, 3-minute video with more detail and a lot of views showing a Perrone-equipped Lincoln driving very carefully all by itself:
For more detailed information about Perrone Robotics, see this new feature story from an NBC TV affiliate.
Mentor has just announced the DRS360 platform for developing autonomous driving systems based on the Xilinx Zynq UltraScale+ MPSoC. The automotive-grade DRS360 platform is already designed and tested for deployment in ISO 26262 ASIL D-compliant systems.
This platform offers comprehensive sensor-fusion capabilities for multiple cameras, radar, LIDAR, and other sensors while offering “dramatic improvements in latency reduction, sensing accuracy and overall system efficiency required for SAE Level 5 autonomous vehicles.” In particular, the DRS360 platform’s use of the Zynq UltraScale+ MPSoC permits the use of “raw data sensors,” thus avoiding the power, cost, and size penalties of microcontrollers and the added latency of local processing at the sensor nodes.
Eliminating pre-processing microcontrollers from all system sensor nodes brings many advantages to the autonomous-driving system design including improved real-time performance, significant reductions in system cost and complexity, and access to all of the captured sensor data for a maximum-resolution, unfiltered model of the vehicle’s environment and driving conditions.
Rather than try to scale lower levels of ADAS up, Mentor’s DRS360 platform is optimized for Level 5 autonomous driving, and it’s engineered to easily scale down to Levels 4, 3 and even 2. This approach makes it far easier to develop a system at whatever autonomy level you’re targeting because the DRS360 platform is already designed to handle the most complex tasks from the beginning.
I did not go to Embedded World in Nuremberg this week but apparently SemiWiki’s Bernard Murphy was there and he’s published his observations about three Zynq-based reference designs that he saw running in Aldec’s booth on the company’s Zynq-based TySOM embedded dev and prototyping boards.
Aldec TySOM-2 Embedded Prototyping Board
Murphy published this article titled “Aldec Swings for the Fences” on SemiWiki and wrote:
“At the show, Aldec provided insight into using the solution to model the ARM core running in QEMU, together with a MIPI CSI-2 solution running in the FPGA. But Aldec didn’t stop there. They also showed off three reference designs designed using this flow and built on their TySOM boards.
“The first reference design targets multi-camera surround view for ADAS (automotive – advanced driver assistance systems). Camera inputs come from four First Sensor Blue Eagle systems, which must be processed simultaneously in real-time. A lot of this is handled in software running on the Zynq ARM cores but the computationally-intensive work, including edge detection, colorspace conversion and frame-merging, is handled in the FPGA. ADAS is one of the hottest areas in the market and likely to get hotter since Intel just acquired Mobileye.
“The next reference design targets IoT gateways – also hot. Cloud interface, through protocols like MQTT, is handled by the processors. The gateway supports connection to edge devices using wireless and wired protocols including Bluetooth, ZigBee, Wi-Fi and USB.
“Face detection for building security, device access and identifying evil-doers is also growing fast. The third reference design is targeted at this application, using similar capabilities to those on the ADAS board, but here managing real-time streaming video as 1280x720 at 30 frames per second, from an HDR-CMOS image sensor.”
The article contains a photo of the Aldec TySOM-2 Embedded Prototyping Board, which is based on a Xilinx Zynq Z-7045 SoC. According to Murphy, Aldec developed the reference designs using its own and other design tools including the Aldec Riviera-PRO simulator and QEMU. (For more information about the Zynq-specific QEMU processor emulator, see “The Xilinx version of QEMU handles ARM Cortex-A53, Cortex-R5, Cortex-A9, and MicroBlaze.”)
Then Murphy wrote this:
“So yes, Aldec put together a solution combining their simulator with QEMU emulation and perhaps that wouldn’t justify a technical paper in DVCon. But business-wise they look like they are starting on a much bigger path. They’re enabling FPGA-based system prototype and build in some of the hottest areas in systems today and they make these solutions affordable for design teams with much more constrained budgets than are available to the leaders in these fields.”
This week, EETimes’ Junko Yoshida published an article titled “Xilinx AI Engine Steers New Course” that gathers some comments from industry experts and from Xilinx with respect to Monday’s reVISION stack announcement. To recap, the Xilinx reVISION stack is a comprehensive suite of industry-standard resources for developing advanced embedded-vision systems based on machine learning and machine inference.
As Xilinx Senior Vice President of Corporate Strategy Steve Glaser tells Yoshida, “Xilinx designed the stack to ‘enable a much broader set of software and systems engineers, with little or no hardware design expertise to develop, intelligent vision guided systems easier and faster.’”
“While talking to customers who have already begun developing machine-learning technologies, Xilinx identified ‘8 bit and below fixed point precision’ as the key to significantly improve efficiency in machine-learning inference systems.”
Yoshida also interviewed Karl Freund, Senior Analyst for HPC and Deep Learning at Moor Insights & Strategy, who said:
“Artificial Intelligence remains in its infancy, and rapid change is the only constant.” In this circumstance, Xilinx seeks “to ease the programming burden to enable designers to accelerate their applications as they experiment and deploy the best solutions as rapidly as possible in a highly competitive industry.”
She also quotes Loring Wirbel, a Senior Analyst at The Linley group, who said:
“What’s interesting in Xilinx's software offering, [is that] this builds upon the original stack for cloud-based unsupervised inference, Reconfigurable Acceleration Stack, and expands inference capabilities to the network edge and embedded applications. One might say they took a backward approach versus the rest of the industry. But I see machine-learning product developers going a variety of directions in trained and inference subsystems. At this point, there's no right way or wrong way.”
There’s a lot more information in the EETimes article, so you might want to take a look for yourself.
Today, following Monday’s announcement of the Xilinx reVISION stack for developing vision-guided applications, EEJournal’s Kevin Morris published a review article titled “Teaching Machines to See: Xilinx Launches reVISION.” (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”) Morris writes:
“But vision is one of the most challenging computational problems of our era. High-resolution cameras generate massive amounts of data, and processing that information in real time requires enormous computing power. Even the fastest conventional processors are not up to the task, and some kind of hardware acceleration is mandatory at the edge. Hardware acceleration options are limited, however. GPUs require too much power for most edge applications, and custom ASICs or dedicated ASSPs are horrifically expensive to create and don’t have the flexibility to keep up with changing requirements and algorithms.
“That makes hardware acceleration via FPGA fabric just about the only viable option. And it makes SoC devices with embedded FPGA fabric - such as Xilinx Zynq and Altera SoC FPGAs - absolutely the solutions of choice. These devices bring the benefits of single-chip integration, ultra-low latency and high bandwidth between the conventional processors and the FPGA fabric, and low power consumption to the embedded vision space.”
Later on, Morris gets to the fly in the ointment:
“Oh, yeah, There’s still that “almost impossible to program” issue.”
And then he gets to the solution:
“reVISION, announced this week, is a stack - a set of tools, interfaces, and IP - designed to let embedded vision application developers start in their own familiar sandbox (OpenVX for vision acceleration and Caffe for machine learning), smoothly navigate down through algorithm development (OpenCV and NN frameworks such as AlexNet, GoogLeNet, SqueezeNet, SSD, and FCN), targeting Zynq devices without the need to bring in a team of FPGA experts. reVISION takes advantage of Xilinx’s previously-announced SDSoC stack to facilitate the algorithm development part. Xilinx claims enormous gains in productivity for embedded vision development - with customers predicting cuts of as much as 12 months from current schedules for new product and update development.
“In many systems employing embedded vision, it’s not just the vision that counts. Increasingly, information from the vision system must be processed in concert with information from other types of sensors such as LiDAR, SONAR, RADAR, and others. FPGA-based SoCs are uniquely agile at handling this sensor fusion problem, with the flexibility to adapt to the particular configuration of sensor systems required by each application. This diversity in application requirements is a significant barrier for typical “cost optimization” strategies such as the creation of specialized ASIC and ASSP solutions.
“The performance rewards for system developers who successfully harness the power of these devices are substantial. Xilinx is touting benchmarks showing their devices delivering an advantage of 6x images/sec/watt in machine learning inference with GoogLeNet @batch = 1, 42x frames/sec/watt in computer vision with OpenCV, and ⅕ the latency on real-time applications with GoogLeNet @batch = 1 versus “NVidia Tegra and typical SoCs.” These kinds of advantages in latency, performance, and particularly in energy-efficiency can easily be make-or-break for many embedded vision applications.”
But don’t take my word for it, read Morris’ article yourself.
As part of today’s reVISION announcement of a new, comprehensive development stack for embedded-vision applications, Xilinx has produced a 3-minute video showing you just some of the things made possible by this announcement.
Here it is:
By Adam Taylor
Several times in this series, we have looked at image processing using the Avnet EVK and the ZedBoard. Along with the basics, we have examined object tracking using OpenCV running on the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PS (processing system) and using HLS with its video library to generate image-processing algorithms for the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL (programmable logic, see blogs 140 to 148 here).
Xilinx’s reVISION is an embedded-vision development stack that provides support for a wide range of frameworks and libraries often used for embedded-vision applications. Most exciting, from my point of view, is that the stack includes acceleration-ready OpenCV functions.
The stack itself is split into three layers. Once we select or define our platform, we will be mostly working at the application and algorithm layers. Let’s take a quick look at the layers of the stack:
As I mentioned above, one of the most exciting aspects of the reVISION stack is the ability to accelerate a wide range of OpenCV functions using the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL. We can group the OpenCV functions that can be hardware-accelerated using the PL into four categories:
What is very interesting with these function calls is that we can optimize them for resource usage or performance within the PL. The main optimization method is specifying the number of pixels to be processed during each clock cycle. For most accelerated functions, we can choose to process either one or eight pixels. Processing more pixels per clock cycle reduces latency but increases resource utilization. Processing one pixel per clock minimizes the resource requirements at the cost of increased latency. We control the number of pixels processed per clock via the function call.
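To get a feel for the latency side of that trade-off, here is a rough back-of-the-envelope sketch. It is a simple model, not the reVISION API: the function names, the 1920x1080 frame size, and the 150MHz PL clock are hypothetical values chosen for illustration, and the model ignores blanking intervals, pipeline fill, and memory stalls.

```python
import math


def frame_cycles(width, height, pixels_per_clock):
    """Clock cycles needed to stream one frame through a pixel pipeline."""
    return height * math.ceil(width / pixels_per_clock)


def frame_time_ms(width, height, pixels_per_clock, clock_mhz):
    """Time to stream one frame, in milliseconds, at the given PL clock."""
    return frame_cycles(width, height, pixels_per_clock) / (clock_mhz * 1000.0)


# Hypothetical example: a 1920x1080 frame with the PL clocked at 150 MHz.
for ppc in (1, 8):
    cycles = frame_cycles(1920, 1080, ppc)
    ms = frame_time_ms(1920, 1080, ppc, 150)
    print(f"{ppc} pixel(s)/clock: {cycles} cycles, {ms:.3f} ms per frame")
```

In this simple model, moving from one to eight pixels per clock cuts the per-frame streaming time by a factor of eight. The price is the extra parallel datapath logic in the PL, which the model does not attempt to estimate.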
Over the next few blogs, we will look more at the reVISION stack and how we can use it. However, in the best Blue Peter tradition, the image below shows the result of running a reVISION Harris OpenCV acceleration function within the PL.
Accelerated Harris Corner Detection in the PL
Code is available on GitHub as always.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
Today, Xilinx announced a comprehensive suite of industry-standard resources for developing advanced embedded-vision systems based on machine learning and machine inference. It’s called the reVISION stack and it allows design teams without deep hardware expertise to use a software-defined development flow to combine efficient machine-learning and computer-vision algorithms with Xilinx All Programmable devices to create highly responsive systems. (Details here.)
The Xilinx reVISION stack includes a broad range of development resources for platform, algorithm, and application development including support for the most popular neural networks: AlexNet, GoogLeNet, SqueezeNet, SSD, and FCN. Additionally, the stack provides library elements such as pre-defined and optimized implementations for CNN network layers, which are required to build custom neural networks (DNNs and CNNs). The machine-learning elements are complemented by a broad set of acceleration-ready OpenCV functions for computer-vision processing.
For application-level development, Xilinx supports industry-standard frameworks including Caffe for machine learning and OpenVX for computer vision. The reVISION stack also includes development platforms from Xilinx and third parties, which support various sensor types.
The reVISION development flow starts with a familiar, Eclipse-based development environment; the C, C++, and/or OpenCL programming languages; and associated compilers all incorporated into the Xilinx SDSoC development environment. You can now target reVISION hardware platforms within the SDSoC environment, drawing from a pool of acceleration-ready, computer-vision libraries to quickly build your application. Soon, you’ll also be able to use the Khronos Group’s OpenVX framework.
For machine learning, you can use popular frameworks including Caffe to train neural networks. Within one Xilinx Zynq SoC or Zynq UltraScale+ MPSoC, you can use Caffe-generated .prototxt files to configure a software scheduler running on one of the device’s ARM processors to drive CNN inference accelerators—pre-optimized for and instantiated in programmable logic. For computer vision and other algorithms, you can profile your code, identify bottlenecks, and then designate specific functions that need to be hardware-accelerated. The Xilinx system-optimizing compiler then creates an accelerated implementation of your code, automatically including the required processor/accelerator interfaces (data movers) and software drivers.
The Xilinx reVISION stack is the latest in an evolutionary line of development tools for creating embedded-vision systems. Xilinx All Programmable devices have long been used to develop such vision-based systems because these devices can interface to any image sensor and connect to any network—which Xilinx calls any-to-any connectivity—and they provide the large amounts of high-performance processing horsepower that vision systems require.
Initially, embedded-vision developers used the existing Xilinx Verilog and VHDL tools to develop these systems. Xilinx introduced the SDSoC development environment for HLL-based design two years ago and, since then, SDSoC has dramatically and successfully shortened development cycles for thousands of design teams. Xilinx’s new reVISION stack now enables an even broader set of software and systems engineers to develop intelligent, highly responsive embedded-vision systems faster and more easily using Xilinx All Programmable devices.
And what about the performance of the resulting embedded-vision systems? How do their performance metrics compare against systems based on embedded GPUs or the typical SoCs used in these applications? Xilinx-based systems significantly outperform the best of this group, which employ Nvidia devices. Benchmarks of the reVISION flow using Zynq SoC targets against Nvidia Tegra X1 have shown as much as:
There is huge value to having a very rapid and deterministic system-response time and, for many systems, the faster response time of a design that's been accelerated using programmable logic can mean the difference between success and catastrophic failure. For example, the figure below shows the difference in response time between a car’s vision-guided braking system created with the Xilinx reVISION stack running on a Zynq UltraScale+ MPSoC relative to a similar system based on an Nvidia Tegra device. At 65mph, the Xilinx embedded-vision system’s faster response time stops the vehicle 5 to 33 feet sooner depending on how the Nvidia-based system is implemented. Five to 33 feet could easily mean the difference between a safe stop and a collision.
(Note: This example appears in the new Xilinx reVISION backgrounder.)
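To put those distances in perspective, here is a quick back-of-the-envelope sketch in Python that converts the 5-to-33-foot difference at 65mph into the response-time difference it implies. It assumes the vehicle keeps traveling at a constant 65mph during the extra delay and that braking is otherwise identical. Those assumptions, the function names, and the resulting millisecond figures are mine and do not come from the Xilinx backgrounder.

```python
# Back-of-the-envelope only: assumes constant speed during the extra
# response delay. These helper names are illustrative, not from Xilinx.
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def mph_to_feet_per_second(mph):
    """Convert a speed in miles per hour to feet per second."""
    return mph * FEET_PER_MILE / SECONDS_PER_HOUR


def extra_latency_ms(distance_feet, speed_mph):
    """Response-time difference (ms) implied by a stopping-distance difference."""
    return distance_feet / mph_to_feet_per_second(speed_mph) * 1000.0


if __name__ == "__main__":
    for feet in (5, 33):
        ms = extra_latency_ms(feet, 65)
        print(f"{feet} ft at 65 mph is about {ms:.0f} ms of extra response time")
```

Under those assumptions, the gap works out to roughly 52 to 346 milliseconds of additional response time, which is a long time for a vehicle covering about 95 feet every second.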
The last two years have produced more advances in machine-learning technology than the previous 45 years combined, and the pace isn't slowing down. Many new types of neural networks for vision-guided systems have emerged along with new techniques that make deployment of these neural networks much more efficient. No matter what you develop today or implement tomorrow, the hardware and I/O reconfigurability and software programmability of Xilinx All Programmable devices can “future-proof” your designs, whether you need to implement new algorithms in existing hardware; to interface to new, improved sensing technology; or to add an all-new sensor type (like LIDAR or Time-of-Flight sensors, for example) to improve a vision-based system’s safety and reliability through advanced sensor fusion.
Xilinx is pushing even further into vision-guided, machine-learning applications with the new Xilinx reVISION Stack and this announcement complements the recently announced Reconfigurable Acceleration Stack for cloud-based systems. (See “Xilinx Reconfigurable Acceleration Stack speeds programming of machine learning, data analytics, video-streaming apps.”) Together, these new development resources significantly broaden your ability to deploy machine-learning applications using Xilinx technology—from inside the cloud to the very edge.
You might also want to read “Xilinx AI Engines Steers New Course” by Junko Yoshida on the EETimes.com site.
It’s amazing what you can do with a few low-cost video cameras and FPGA-based, high-speed video processing. One example: the Virtual Flying Camera that Xylon has implemented with just four video cameras and a Xilinx Zynq-7000 SoC. This setup gives the driver a flying, 360-degree view of a car and its surroundings. It’s also known as a bird’s-eye view, but in this case the bird can fly around the car.
Many implementations of this sort of video technology use GPUs for the video processing, but Xylon instead uses custom hardware, designed with Xylon logicBRICKS IP cores and implemented in the Zynq SoC’s programmable logic. This custom hardware enables very fast execution of complex video operations including camera lens-distortion correction, video frame grabbing, video rotation, and perspective changes, as well as the seamless stitching of four processed video streams into a single display output, all in real time. This design approach assures the lowest possible video processing delay at significantly lower power consumption when compared to GPU-based implementations.
A Xylon logi3D Scalable 3D Graphics Controller soft-IP core—also implemented in the Zynq SoC’s programmable logic—renders a 3D vehicle and the surrounding view on the driver’s information display. The Xylon Surround View system permits real-time 3D image generation even in programmable SoCs without an on-chip GPU, as long as there’s programmable logic available to implement the graphics controller. The current version of the Xylon ADAS Surround View Virtual Flying Camera system runs on the Xylon logiADAK Automotive Driver Assistance Kit that is based on the Xilinx Zynq-7000 All Programmable SoC.
Here’s a 2-minute video of the Xylon Surround View system in action:
If you’re attending the CAR-ELE JAPAN show in Tokyo next week, you can see the Xylon Surround View system operating live in the Xilinx booth.
Next week, the Xilinx booth at the CAR-ELE JAPAN show at Tokyo Big Sight will hold a variety of ADAS (Advanced Driver Assistance Systems) demos based on Xilinx Zynq SoC and Zynq UltraScale+ MPSoC devices from several companies including:
The Zynq UltraScale+ MPSoC and original Zynq SoC offer a unique mix of ARM 32- and 64-bit processors with the heavy-duty processing you get from programmable logic, needed to process and manipulate video and to fuse data from a variety of sensors such as video and still cameras, radar, lidar, and sonar to create maps of the local environment.
If you are developing any sort of sensor-based electronic system for future automotive products, you might want to come by the Xilinx booth (E35-38) to see what’s already been explored. We’re ready to help you get a jump on your design.
Aldec has posted a new 4-minute video with a demonstration of its TySOM-2 Embedded Development Kit generating a 360° view from four Blue Eagle DC3K-1-LVD video cameras plugged into an FMC-ADAS card that is in turn plugged into the TySOM-2 board. The TySOM-2 board is based on a Xilinx Zynq Z-7045 SoC. The demo uses Aldec’s Multi-Camera Surround View technology.
The four video-camera feeds appear choppy in the demo until the FPGA-based acceleration is turned on. At that point, the four video feeds appear on screen in real time with corner-detection annotation added at the full frame rate, thanks to the FPGA-based video processing.
Here’s Aldec’s new video:
This week, National Instruments (NI) announced a technology demonstration of a test system for 76-81GHz automotive radar, targeting ADAS (Advanced Driver Assistance Systems) applications. The system is based on the company’s mmWave front-end technology and its PXIe-5840 2nd-generation vector signal transceiver (VST), introduced earlier this year, which combines a 6.5GHz RF vector signal generator and a 6.5GHz vector signal analyzer in a 2-slot PXIe module. (See “NI launches 2nd-Gen 6.5GHz Vector Signal Transceiver with 5x the instantaneous bandwidth, FPGA programmability.”) The ADAS Test Solution combines NI’s banded, frequency-specific upconverters and downconverters for the 76–81GHz radar band with the 2nd-generation VST’s 1GHz of real-time bandwidth.
The PXIe-5840 VST gets its real-time signal-analysis capabilities from a Xilinx Virtex-7 690T FPGA.
National Instruments PXIe-5840 2nd-generation vector signal transceiver (VST)
Xilinx had a table in Maker’s Alley at the 8th Annual Sparkfun Autonomous Vehicle Competition (AVC), held today in Niwot, Colorado near Boulder. AEs and software engineers from the nearby Xilinx Longmont facility staffed the table along with Aaron Behman and me. We answered many questions and demonstrated an optical-flow algorithm running on a Zynq-based ZC706 Eval Kit. The demo accepted HDMI video from a camcorder, converted the live HD video stream to greyscale, extracted motion information on a frame-by-frame basis, and displayed the motion on a video monitor using color-coding to express the direction and magnitude of the motion, all in real time. We also gave out 50 Xilinx SDSoC licenses and awarded five Zynq-based ZYBO kits to lucky winners. Digilent supplied the kits. (See “About those Zynq-based Zybo boards we're giving away at Sparkfun’s Autonomous Vehicle Competition: They’re kits now!”)
The Xilinx table in Maker’s Alley at Sparkfun AVC 2016
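For readers wondering how motion can be expressed with color, here is a minimal Python sketch of one common convention: the direction of a motion vector picks the hue and its magnitude picks the brightness. The description of the ZC706 demo does not say which mapping it actually used, so the HSV scheme, the function name flow_to_rgb, and the max_magnitude parameter are all assumptions made for illustration.

```python
import colorsys
import math


def flow_to_rgb(dx, dy, max_magnitude):
    """Map one motion vector to 8-bit RGB: hue = direction, brightness = magnitude.

    Illustrative convention only; not taken from the demo described above.
    """
    # Direction of motion as a fraction of a full turn, 0.0 up to (not including) 1.0.
    angle = math.atan2(dy, dx)
    hue = (angle % (2 * math.pi)) / (2 * math.pi)

    # Magnitude scaled into 0.0..1.0 and clamped so fast motion saturates.
    magnitude = math.hypot(dx, dy)
    value = min(magnitude / max_magnitude, 1.0) if max_magnitude > 0 else 0.0

    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, value)
    return round(r * 255), round(g * 255), round(b * 255)


if __name__ == "__main__":
    print(flow_to_rgb(10, 0, 10))   # motion along +x at full magnitude
    print(flow_to_rgb(-10, 0, 10))  # motion along -x at full magnitude
    print(flow_to_rgb(0, 0, 10))    # no motion
```

With this mapping, a stationary pixel stays black while pixels moving in opposite directions land on opposite sides of the color wheel, which makes the overall motion pattern easy to read at a glance.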
In case you are not familiar with the Sparkfun AVC, it’s an autonomous vehicle competition and this year, there were two classes of autonomous vehicle: Classic and Power Racing. The Classic class vehicles were about the size of an R/C car and raced on an appropriately sized track with hazards including the Discombobulator (a gasoline-powered turntable), a ball pit, hairpin turns, and an optional dirt-track shortcut. The Power Racing class was based on kids’ Power Wheels vehicles, which are sized to be driven by young kids but in this race were required to carry adults. There were races for both autonomous and human-driven Power Racers.
Here’s a video of one of the Sparkfun AVC Classic races getting off to a particularly rocky start:
Here’s a short video of an Autonomous Power Racing race, getting off to an equally disastrous start:
And here’s a long video of an entire, 30-lap, human-driven Power Racing race:
On Saturday, September 17, you’ll be able to get one of 50 free license vouchers for the Xilinx SDSoC Development Environment, which we’re pre-loading along with Vivado HL on a USB drive so you won’t even need to download the software. (Worth $995!)
Where and how?
At the Xilinx Tent in Maker’s Alley, part of Sparkfun’s 8th annual Autonomous Vehicle Competition (AVC) in Niwot, Colorado. (That’s between Boulder and Longmont if you don’t know about Google Maps.)
There’s one tiny catch. You need an admission ticket to get in.
How much? Early bird AVC tickets are on sale here for $6. Admission at the door on the day of the AVC is $8. That’s a tiny, tiny price for a full day of entertainment watching autonomous vehicles race against time while fighting robots maul or burn each other to a cinder.
However, there’s a way to knock another buck off the already low, low early bird admission price: use the secret discount code SFEFRIENDS.
See you in Niwot. Wear your asbestos underpants.
For more information about the Sparkfun AVC and the Xilinx SDSoC giveaway, see:
Xilinx will be attending this year’s Sparkfun AVC (Autonomous Vehicle Competition) in Colorado on September 17. Haven’t heard about the Sparkfun AVC? Incredibly, this is its eighth year and there are four different competitions this year:
Sparkfun’s AVC is taking place in the Sparkfun parking lot. Sparkfun is located in beautiful Niwot, Colorado. Where’s that? On the Diagonal halfway between Boulder and Longmont, of course.
Haven’t heard of Sparkfun? They’re an online electronics retailer at the epicenter of the maker movement. Sparkfun’s Web site is chock full of tutorials and just-plain-weird videos for all experience levels from beginner to engineer. I’m a regular viewer of the company’s Friday new-product videos. Also a long-time customer.
Xilinx will be exhibiting an embedded-vision demo in the Maker’s Alley tent at AVC this year because Xilinx All Programmable devices like the Zynq-7000 SoC and Zynq UltraScale+ MPSoC give you a real competitive advantage when developing a quick, responsive autonomous vehicle.
If you are entering this year’s AVC and are using Xilinx All Programmable devices in your vehicle, please let me know in the comments below or come to see us in the tent at the event. We want to help make you famous for your effort!
Here’s an AVC video from Sparkfun to give you a preview of the AVC:
Here's the PRS video:
And here's the Robot Combat video:
Xylon has introduced logiADAK 3.2, the latest version of the company’s ADAS toolset for the Xilinx Zynq-7000 SoC. This new release includes a new toolset for driver drowsiness detection based on facial movements monitored through a camera placed in a vehicle cabin and significantly expanded and improved forward camera collision avoidance ADAS based on detection and recognition of vehicles, pedestrians and bikes. The current logiADAK kit includes around ten different ADAS applications, ranging from design frameworks to complete, production-ready solutions that help you create highly differentiated driver assistance applications.
If you’re not yet familiar with Xylon’s logiADAK toolkit, here’s Xilinx’s Aaron Behman with a quick, 90-second video demo shot at the recent Embedded Vision Summit:
If you look at what’s happening with Moore’s Law (just read any article about the topic during the last two years), you see that systems design is being forced to make use of All Programmable devices at an increasing rate because of the enormous NRE costs associated with roll-your-own ASICs at 16nm, 10nm, and below. Companies still need the differentiation afforded by custom hardware to boost product margins in their competitive, global marketplaces, but they need to get it in a different way.
Nowhere is that more true than in the six Megatrends that Xilinx has identified:
These Megatrends drive the future of the electronics industry—and they drive Xilinx’s future as well. Xilinx has made a slick, 4-minute video discussing these trends:
Truthfully, I didn’t write that headline. It’s the title of yesterday’s Frost & Sullivan press release awarding Xilinx the 2016 North American Frost & Sullivan Award for Product Leadership, based on the consulting firm’s recent analysis of the automotive programmable logic devices market for advanced driver assistance systems (ADAS). The press release continues: “Xilinx is uniquely positioned to cater to current and future market needs.”
To date, you’ve seen very little in the Xcell Daily blog about Xilinx and ADAS systems, not because Xilinx isn’t working closely with automotive Tier 1 suppliers and OEMs on ADAS systems but because those companies really have not wanted any publicity about that highly competitive work and so I could not write about the many, many design wins. In reality, more than 20 of these automotive suppliers and OEMs have been working with Xilinx on ADAS designs over the last few years.
The subhead of the Frost & Sullivan press release captures the reality of this effort:
“Superior product value has made Xilinx’s devices the preferred choice for current and evolving ADAS modules among global OEMs.”
And, since I’m already quoting from this Frost & Sullivan press release, let me add this quote:
“The company has strong technical capabilities and a successful track record in multiple sensor applications that include radar, light detection and ranging (LIDAR), and camera systems, all of which give it an edge over competing system on chip (SoC) suppliers,” said Frost & Sullivan Industry Analyst, Arunprasad Nandakumar. “Xilinx’s Zynq UltraSCALE+ multiprocessor SoC (MPSoC), scores high on scalability, modularity, reliability, and quality.”
“Xilinx adheres to self-defined standards that exceed industry requirements. Its FPGAs and PLDs are far ahead of the baseline defined by AEC-Q100, which is the standard stress test qualification requirement for electronic components used in automotive applications. In fact, Xilinx has introduced its own Beyond AEC-Q100 testing that characterizes its robust XA family of products.”
And this final quote sums it up:
“In recognition of its strong product portfolio, which is aligned perfectly with the vision of automated driving, Xilinx receives the 2016 North American Frost & Sullivan Product Leadership Award. Each year, this award is presented to the company that has developed a product with innovative features and functionality, gaining rapid acceptance in the market. The award recognizes the quality of the solution and the customer value enhancements it enables.
“Frost & Sullivan’s Best Practices Awards recognize companies in a variety of regional and global markets for outstanding achievement in areas such as leadership, technological innovation, customer service, and product development. Industry analysts compare market participants and measure performance through in-depth interviews, analysis, and extensive secondary research.”
Would you like to see the results of those in-depth interviews, analysis, and extensive secondary research? Thought you might.
There’s a companion 12-page Frost & Sullivan research paper attached to this blog. Just click below.
I’ve written previously about Apertus, the Belgian company behind the AXIOM open-source 4K cinema camera effort. (See below.) I met with two of the Apertus principals, Sebastian Pichelhofer and Herbert Pötzl, at last month’s Embedded World 2016 in Nuremberg. They carry the coolest business cards I’ve seen in a long, long time:
Pichelhofer and Pötzl were making the rounds at the Embedded World show to talk about their 3rd-generation AXIOM camera, the Gamma. This is the big, modular, pro-level 4K cinema camera that leverages the knowledge gained in the design of the AXIOM Alpha and Beta cameras. Like the earlier cameras, the AXIOM Gamma is based on a CMOSIS imager and a Xilinx Zynq-7000 SoC (a Z-7030). The AXIOM Beta is based on an Avnet MicroZed SOM with a Zynq Z-7020 SoC.
Here’s a closeup photo of the AXIOM Beta’s Image Sensor Module:
AXIOM Beta 4K Cinema Camera Image Sensor Module
And here’s a photo of the back of the AXIOM Beta Image Sensor Module showing the Zynq-based Avnet MicroZed board that’s currently being used:
Back Side of AXIOM Beta 4K Cinema Camera Image Sensor Module showing Avnet MicroZed SOM
The AXIOM Beta is currently operational and the gents from Apertus directed me to the Antmicro booth at the show to see a working model. Here’s a photo from the Antmicro booth:
A working AXIOM Beta 4K camera in the Antmicro booth
Antmicro, located in Poland, is a partner working with Apertus on the AXIOM camera. Although I didn’t see it at Embedded World, here’s a photo of the AXIOM Gamma Image Sensor Module prototype from the Antmicro Web site:
AXIOM Gamma 4K Cinema Camera Image Sensor Module
While at the Antmicro booth, I met team leader Karol Gugala, who impressed me with his knowledge of the Zynq-7000 SoC. He’s already developed several Zynq-based projects including a distance-measuring system for an autonomous mining vehicle based on stereo video imagers. Here’s a photo of that project taken at the Antmicro booth:
Antmicro Zynq-based Stereo Distance Measuring Board
Although we spoke for only 10 minutes or so, I was really impressed with Gugala’s knowledge and his considerable experience with the Zynq-7000 SoC. I immediately dubbed him “King of Zynq,” in my mind at least. Antmicro is currently working with Apertus on the AXIOM Gamma design and I can hardly wait to see what this international team produces.
Earlier Xcell Daily blog posts about the AXIOM 4K cinema cameras:
Xylon’s logiADAK Automotive Driver Assistance Kit and logiRECORDER Multi-Channel Video Recording ADAS Kit provide you with a number of essential building blocks needed to develop your own vision-based advanced driver assistance systems (ADAS) based on the Xilinx Zynq SoC for a wide range of vehicle designs. The logiADAK kit comes with a full set of driver-assistance (DA) demo applications, customizable reference SoC designs, software drivers, libraries, and documentation. The logiRECORDER kit includes the hardware and software necessary for synchronous video recording of up to six uncompressed video streams from Xylon video cameras.
Xylon has just published a short video showing these kits in action:
The CAR-ELE show for automotive OEMs and Tier 1 suppliers kicked off at Tokyo Big Sight in Japan today and I received this image of an RC car equipped with five video cameras and a Zynq SoC from Naohiro Jinbo at the Xilinx booth:
The image shows a transparent-bodied RC car equipped with the five video cameras facing off against four pedestrians and two other vehicles towards the bottom of the image. You can also see two screen pairs at the top of the booth. The left screen in the rightmost screen pair shows a bird’s-eye view around the RC car. That image is a real-time fusion of the five video streams from the cameras on the RC car. The other screen in the rightmost pair shows real-time object detection in action. Pedestrians are highlighted in bounding boxes. Both screens are generated live by the car’s on-board Zynq SoC and both of these demos rely on the programmable logic in the Zynq SoC to perform the heavy lifting required by the real-time video processing.
This 5-Camera ADAS Development Platform demo is being presented by Xylon, eVS (embedded Vision Systems), and DDC (Digital Design Corp). The demo is based on Xylon’s logiADAK Driver Assistance Kit version 3.1, which extends the functionality of the company’s logiADAK platform to include efficient multi-object classification, encompassing vehicle and cyclist detection in addition to pedestrian detection.
"Tokyo Big Sight at Night" by Masato Ohta from Tokyo, Japan. - Flickr. Licensed under CC BY 2.0 via Commons