
 

The just-announced VICO-4 TICO SDI Converter from Village Island employs visually lossless 4:1 TICO compression to funnel 4K60p video (carried on four 3G-SDI streams or one 12G-SDI stream) onto a single 3G-SDI output stream, which reduces infrastructure costs for transport, cabling, routing, and compression in broadcast networks.

 

 

 

Village Island VICO-4.jpg

 

 

VICO-4 4:1 SDI Converter from Village Island

 

 

 

Here’s a block diagram of what’s going on inside of Village Island’s VICO-4 TICO SDI Converter:

 

 

Village Island VICO-4 Block Diagram.jpg 

 

And here’s a diagram showing you what broadcasters can do with this sort of box:

 

 

Village Island VICO-4 Distribution Diagram.jpg

 

 

 

The reason this is even possible in a real-time broadcast environment is that the lightweight intoPIX TICO compression algorithm has very low latency (just a few video lines) when implemented in hardware as IP. (Software-based, frame-by-frame video compression is out of the question in an application such as this because it introduces too much delay.)
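
For a rough sense of why 4:1 is the magic ratio here, a back-of-the-envelope calculation (my own arithmetic, assuming 10-bit 4:2:2 video and nominal SDI line rates—not figures from Village Island) looks like this:

```python
# Back-of-the-envelope numbers only (not from the VICO-4 datasheet):
# why 4:1 compression lets 4K60p ride on a single 3G-SDI link.
pixels_per_frame = 3840 * 2160          # UHD frame
frames_per_sec = 60
bits_per_pixel = 20                     # 10-bit 4:2:2 (Y plus alternating Cb/Cr)

raw_bps = pixels_per_frame * frames_per_sec * bits_per_pixel
compressed_bps = raw_bps / 4            # visually lossless 4:1 TICO ratio
link_3g_sdi = 2.97e9                    # nominal 3G-SDI line rate

print(f"Raw active video : {raw_bps / 1e9:.2f} Gbps")        # ~9.95 Gbps
print(f"After 4:1 TICO   : {compressed_bps / 1e9:.2f} Gbps")  # ~2.49 Gbps
print("Fits one 3G-SDI link:", compressed_bps < link_3g_sdi)  # True
```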

 

A look at the VICO-4’s main (and only) circuit board reveals one main chip implementing the 4:1 compression and signal multiplexing. And that chip is… a Xilinx Kintex UltraScale KU035 FPGA. It has plenty of on-chip programmable logic for the TICO compression IP and sixteen 16.3Gbps transceiver ports—more than enough to handle the 3G- and 12G-SDI I/O required by this application.

 

 

Village Island VICO-4 pcb.jpg 

 

 

Note: Paltek is distributing Village Island’s VICO-4 board in Japan as an OEM component. The board needs 12Vdc at ~25VA.

 

 

 

For more information about TICO compression IP, see:

 

 

 

 

 

 

 

Laser-based, industrial 3D Camera from VRmagic resolves complex surfaces with 1/64 sub-pixel accuracy

by Xilinx Employee ‎03-23-2017 10:48 AM - edited ‎03-23-2017 11:08 AM (438 Views)


VRmagic LineCam3D.jpg

A configurable, COG (center-of-gravity), laser-line extraction algorithm allows VRmagic’s LineCam3D to resolve complex surface contours with 1/64 sub-pixel accuracy. (The actual measurement precision, which can be as small as a micrometer, depends on the optics attached to the camera.) The camera must process the captured video internally because, at its maximum 1kHz scan rate, it would produce far more raw contour data than can be pumped over the camera’s GigE Vision interface. The algorithm therefore runs in real time on the camera’s internal Xilinx 7 series FPGA, which is paired with a TI DaVinci SoC (to handle other processing chores) and 2Gbytes of DDR3 SDRAM. The camera’s imager is a 2048x1088-pixel CMOSIS CMV2000 CMOS image sensor with a pipelined global shutter. The VRmagic LineCam3D also has a 2D imaging mode that permits the extraction of additional object information, such as surface printing, that would not appear on the contour scans (as demonstrated in the photo below).
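
If you’re curious what a COG line-extraction step looks like in principle, here’s a minimal NumPy sketch of the idea—an intensity-weighted average of row positions per image column. It is purely illustrative and is not VRmagic’s algorithm, which is configurable and runs in FPGA hardware:

```python
import numpy as np

def cog_laser_line(frame, threshold=32):
    """Estimate the laser-line row position in each image column with
    sub-pixel precision using a center-of-gravity (COG) weighting.
    `frame` is a 2-D array of intensities (rows x columns)."""
    rows = np.arange(frame.shape[0], dtype=np.float64)[:, None]
    weights = frame.astype(np.float64)
    weights[weights < threshold] = 0.0          # ignore background pixels
    mass = weights.sum(axis=0)
    # COG = sum(row * intensity) / sum(intensity), per column
    with np.errstate(invalid="ignore", divide="ignore"):
        cog = (rows * weights).sum(axis=0) / mass
    cog[mass == 0] = np.nan                     # no laser found in this column
    return cog

# Toy example: a synthetic laser line whose true position is exactly row 50.
img = np.zeros((100, 4))
img[49, :] = 100
img[50, :] = 200
img[51, :] = 100
print(cog_laser_line(img))                      # ~50.0 for every column
```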

 

Here’s a composite photo of the camera’s line-scan contour output (upper left), the original object being scanned (lower left), and the image of the object constructed from the contour scans (right):

 

 

VRmagic LineCam3D Output.jpg 

 

In laser-triangulation measurement setups, the camera’s lens plane is not parallel to the scanned object’s image plane, which means that only a relatively small part of the laser-scanned image would normally be in focus due to limited depth of focus. To compensate for this, the LineCam3D integrates a 10° tilt-shift adapter into its rugged IP65/67 aluminum housing, to expand the maximum in-focus object height. Anyone familiar with photographic tilt-shift lenses—mainly used for architectural photography in the non-industrial world—immediately recognizes this as the Scheimpflug principle, which increases depth of focus by tilting the lens relative to both the imager plane and the subject plane. It’s fascinating that this industrial camera incorporates this ability into the camera body so that any C-mount lens can be used as a tilt-shift lens.

 

 

For more information about the LineCam3D camera, please contact VRmagic directly.

 

 

 

The Sundance DSP PXIe700 module is a 3U PXIe card with an on-board Xilinx Kintex-7 FPGA (a 325T or a 410T), so it can perform nearly any signal-processing or control task you can imagine.

 

 

Sundance PXIe-700 Kintex-7 Module Photo.jpg 

 

 

Sundance PXIe700 module based on a Xilinx Kintex-7 FPGA

 

 

Here’s a block diagram of the Sundance PXIe700 module:

 

 

Sundance PXIe-700 Kintex-7 Module.jpg

 

 

Sundance PXIe700 module based on a Xilinx Kintex-7 FPGA, Block Diagram

 

 

 

Sundance provides this board with the SCom IP Core, which communicates with the host through the PCIe interface and gives the user logic instantiated in the Kintex-7 FPGA a multichannel streaming interface to the host CPU, along with sample applications. Other IP cores, a Windows driver, a DLL, and user-interface software are also available. The PXIe700 data sheet also mentions a VideoGuru toolset that can turn this hardware into a video test center for NTSC, VGA, DVI, SMPTE, GigE Vision, and other video standards. (Contact Sundance DSP for more details.)

 

The Sundance product page also shows the PXIe700 board with a couple of Sundance FMC modules attached as example applications:

 

 

Sundance PXIe-700 With attached FMC-DAQ2p5.jpg 

 

 

Sundance PXIe700 with attached FMC-DAQ2p5 multi-Gsample/sec ADC and DAC card

 

 

 

 

Sundance PXIe-700 With attached FMC-ADC500-5.jpg

 

 

Sundance PXIe700 with attached FMC-ADC500-5 5-channel, 500Msamples/sec, 16-bit ADC card

 

 

 

 

 

 

I did not go to Embedded World in Nuremberg this week but apparently SemiWiki’s Bernard Murphy was there and he’s published his observations about three Zynq-based reference designs that he saw running in Aldec’s booth on the company’s Zynq-based TySOM embedded dev and prototyping boards.

 

 

Aldec TySOM-2 Prototyping Board.jpg

 

Aldec TySOM-2 Embedded Prototyping Board

 

 

 

Murphy published this article titled “Aldec Swings for the Fences” on SemiWiki and wrote:

 

 

“At the show, Aldec provided insight into using the solution to model the ARM core running in QEMU, together with a MIPI CSI-2 solution running in the FPGA. But Aldec didn’t stop there. They also showed off three reference designs designed using this flow and built on their TySOM boards.

 

“The first reference design targets multi-camera surround view for ADAS (automotive – advanced driver assistance systems). Camera inputs come from four First Sensor Blue Eagle systems, which must be processed simultaneously in real-time. A lot of this is handled in software running on the Zynq ARM cores but the computationally-intensive work, including edge detection, colorspace conversion and frame-merging, is handled in the FPGA. ADAS is one of the hottest areas in the market and likely to get hotter since Intel just acquired Mobileye.

 

“The next reference design targets IoT gateways – also hot. Cloud interface, through protocols like MQTT, is handled by the processors. The gateway supports connection to edge devices using wireless and wired protocols including Bluetooth, ZigBee, Wi-Fi and USB.

 

“Face detection for building security, device access and identifying evil-doers is also growing fast. The third reference design is targeted at this application, using similar capabilities to those on the ADAS board, but here managing real-time streaming video as 1280x720 at 30 frames per second, from an HDR-CMOS image sensor.”

 

The article contains a photo of the Aldec TySOM-2 Embedded Prototyping Board, which is based on a Xilinx Zynq Z-7045 SoC. According to Murphy, Aldec developed the reference designs using its own and other design tools including the Aldec Riviera-PRO simulator and QEMU. (For more information about the Zynq-specific QEMU processor emulator, see “The Xilinx version of QEMU handles ARM Cortex-A53, Cortex-R5, Cortex-A9, and MicroBlaze.”)

 

Then Murphy wrote this:

 

“So yes, Aldec put together a solution combining their simulator with QEMU emulation and perhaps that wouldn’t justify a technical paper in DVCon. But business-wise they look like they are starting on a much bigger path. They’re enabling FPGA-based system prototype and build in some of the hottest areas in systems today and they make these solutions affordable for design teams with much more constrained budgets than are available to the leaders in these fields.”

 

 

 

Image3.jpg

AEye is the latest iteration of the eye-tracking technology developed by EyeTech Digital Systems. The AEye chip is based on the Zynq Z-7020 SoC and sits immediately adjacent to the imaging sensor, which makes for compact, stand-alone systems. This technology is finding its way into diverse vision-guided systems in the automotive, AR/VR, and medical-diagnostic arenas. According to EyeTech, the Zynq SoC’s unique abilities allow the company to create products it could not build any other way.

 

With the advent of the reVISION stack, EyeTech is looking to expand its product offerings into machine learning, as discussed in this short, 3-minute video:

 

 

 

 

 

 

For more information about EyeTech, see:

 

 

 

 

EETimes’ Junko Yoshida with some expert help analyzes this week’s Xilinx reVISION announcement

by Xilinx Employee ‎03-15-2017 01:25 PM - edited ‎03-22-2017 07:20 AM (585 Views)

 

Image3.jpg

This week, EETimes’ Junko Yoshida published an article titled “Xilinx AI Engine Steers New Course” that gathers comments from industry experts and from Xilinx about Monday’s reVISION stack announcement. To recap, the Xilinx reVISION stack is a comprehensive suite of industry-standard resources for developing advanced embedded-vision systems based on machine learning and machine inference.

 

(See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”)

 

As Xilinx Senior Vice President of Corporate Strategy Steve Glaser tells Yoshida, Xilinx designed the stack to “enable a much broader set of software and systems engineers, with little or no hardware design expertise, to develop intelligent vision-guided systems easier and faster.”

 

Yoshida continues:

 

While talking to customers who have already begun developing machine-learning technologies, Xilinx identified ‘8 bit and below fixed point precision’ as the key to significantly improve efficiency in machine-learning inference systems.
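
As a quick aside, the idea behind “8 bit and below fixed point precision” is easy to sketch. The minimal NumPy example below (my own illustration, not the Xilinx tool flow) quantizes floating-point weights to int8 with a single per-tensor scale factor:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization of float weights to 8-bit
    fixed point: w is approximated by scale * q, with q an int8 value."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(w - q.astype(np.float32) * scale).max()
print(f"scale={scale:.5f}, worst-case error={err:.5f}")
# Storage drops 4x versus float32 and the multiply-accumulates become integer
# operations, which is where the inference-efficiency gain comes from.
```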

 

 

Yoshida also interviewed Karl Freund, Senior Analyst for HPC and Deep Learning at Moor Insights & Strategy, who said:

 

“Artificial Intelligence remains in its infancy, and rapid change is the only constant.” In this circumstance, Xilinx seeks “to ease the programming burden to enable designers to accelerate their applications as they experiment and deploy the best solutions as rapidly as possible in a highly competitive industry.”

 

 

She also quotes Loring Wirbel, a Senior Analyst at The Linley group, who said:

 

What’s interesting in Xilinx's software offering, [is that] this builds upon the original stack for cloud-based unsupervised inference, Reconfigurable Acceleration Stack, and expands inference capabilities to the network edge and embedded applications. One might say they took a backward approach versus the rest of the industry. But I see machine-learning product developers going a variety of directions in trained and inference subsystems. At this point, there's no right way or wrong way.

 

 

There’s a lot more information in the EETimes article, so you might want to take a look for yourself.

 

 

 

Zynq + PYNQ + Python + BNNs: Machine inference does not get any easier… or faster

by Xilinx Employee ‎03-14-2017 03:10 PM - edited ‎03-15-2017 10:25 AM (3,435 Views)

 

Machine learning and machine inference based on CNNs (convolutional neural networks) are the latest way to classify images and, as I wrote in Monday’s blog post about the new Xilinx reVISION announcement, “The last two years have generated more machine-learning technology than all of the advancements over the previous 45 years and that pace isn't slowing down.” (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”) The challenge now is to make the CNNs run faster while consuming less power. It would be nice to make them easier to use as well.

 

OK, that’s a setup. A paper published last month at the 25th International Symposium on Field Programmable Gate Arrays titled “FINN: A Framework for Fast, Scalable Binarized Neural Network Inference” describes a method to speed up CNN-based inference while cutting power consumption by reducing CNN precision in the inference machines. As the paper states:

 

…a growing body of research demonstrates this approach [CNN] incorporates significant redundancy. Recently, it has been shown that neural networks can classify accurately using one- or two-bit quantization for weights and activations.  Such a combination of low-precision arithmetic and small memory footprint presents a unique opportunity for fast and energy-efficient image classification using Field Programmable Gate Arrays (FPGAs). FPGAs have much higher theoretical peak performance for binary operations compared to floating point, while the small memory footprint removes the off-chip memory bottleneck by keeping parameters on-chip, even for large networks. Binarized Neural Networks (BNNs), proposed by Courbariaux et al., are particularly appealing since they can be implemented almost entirely with binary operations, with the potential to attain performance in the teraoperations per second (TOPS) range on FPGAs.
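
The “binary operations” the authors mention boil down to an XNOR-popcount trick: when weights and activations are constrained to ±1, a dot product collapses into bitwise logic that FPGA fabric handles extremely well. Here’s a tiny illustrative sketch (my own, not FINN’s code):

```python
import numpy as np

def binary_dot(a_bits, w_bits, n):
    """Dot product of two ±1 vectors packed as 0/1 bits:
    result = n - 2 * popcount(a XOR w), which is equivalent to the
    XNOR-popcount formulation used in BNN hardware."""
    return n - 2 * bin(a_bits ^ w_bits).count("1")

n = 8
a = np.array([ 1, -1,  1,  1, -1, -1,  1, -1])
w = np.array([ 1,  1, -1,  1, -1,  1,  1,  1])

# Pack +1 -> 1 and -1 -> 0 into plain Python integers.
pack = lambda v: int("".join('1' if x > 0 else '0' for x in v), 2)

print(binary_dot(pack(a), pack(w), n))   # bitwise result
print(int(np.dot(a, w)))                 # reference: ordinary dot product
```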

 

The paper then describes the techniques developed by the authors to generate BNNs and instantiate them in FPGAs. The results, based on experiments using a Xilinx ZC706 eval kit (which carries a Zynq Z-7045 SoC), are impressive:

 

When it comes to pure image throughput, our designs outperform all others. For the MNIST dataset, we achieve an FPS which is over 48/6x over the nearest highest throughput design [1] for our SFC-max/LFC-max designs respectively. While our SFC-max design has lower accuracy than the networks implemented by Alemdar et al., our LFC-max design outperforms their nearest accuracy design by over 6/1.9x for throughput and FPS/W respectively. For other datasets, our CNV-max design outperforms TrueNorth for FPS by over 17/8x for CIFAR-10 / SVHN datasets respectively, while achieving 9.44x higher throughput than the design by Ovtcharov et al., and 2.2x over the fastest results reported by Hegde et al. Our prototypes have classification accuracy within 3% of the other low-precision works, and could have been improved by using larger BNNs.

 

There’s something even more impressive, however. This design approach to creating BNNs is so scalable that it’s now on a low-end platform—the $229 Digilent PYNQ-Z1. (Digilent’s academic price for the PYNQ-Z1 is only $65!) Xilinx Research Labs in Ireland, NTNU (Norwegian U. of Science and Technology), and the U. of Sydney have released an open-source Binarized Neural Network (BNN) Overlay for the PYNQ-Z1 based on the work described in the above paper.

 

According to Giulio Gambardella of Xilinx Research Labs, “…running on the PYNQ-Z1 (a smaller Zynq 7020), [the overlay] can achieve 168,000 image classifications per second with 102µsec latency on the MNIST dataset with 98.40% accuracy, and 1700 images per second with 2.2msec latency on the CIFAR-10, SVHN, and GTSRB datasets, with 80.1%, 96.69%, and 97.66% accuracy respectively, running at under 2.5W.”

 

 

PYNQ-Z1.jpg

 

Digilent PYNQ-Z1 board, based on a Xilinx Zynq Z-7020 SoC

 

 

 

Because the PYNQ-Z1 programming environment centers on Python and Jupyter, the package includes a number of Jupyter notebooks that demonstrate what the overlay can do through live code running on the PYNQ-Z1 board, along with equations, visualizations, explanatory text, and program results (including images).

 

There are also examples of this BNN in practical application:

 

 

 

 

For more information about the Digilent PYNQ-Z1 board, see “Python + Zynq = PYNQ, which runs on Digilent’s new $229 pink PYNQ-Z1 Python Productivity Package.”

 

 

 

 

Image3.jpg

Today, EEJournal’s Kevin Morris published an article titled “Teaching Machines to See: Xilinx Launches reVISION,” reviewing Monday’s announcement of the Xilinx reVISION stack for developing vision-guided applications. (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”)

 

Morris writes:

 

But vision is one of the most challenging computational problems of our era. High-resolution cameras generate massive amounts of data, and processing that information in real time requires enormous computing power. Even the fastest conventional processors are not up to the task, and some kind of hardware acceleration is mandatory at the edge. Hardware acceleration options are limited, however. GPUs require too much power for most edge applications, and custom ASICs or dedicated ASSPs are horrifically expensive to create and don’t have the flexibility to keep up with changing requirements and algorithms.

 

“That makes hardware acceleration via FPGA fabric just about the only viable option. And it makes SoC devices with embedded FPGA fabric - such as Xilinx Zynq and Altera SoC FPGAs - absolutely the solutions of choice. These devices bring the benefits of single-chip integration, ultra-low latency and high bandwidth between the conventional processors and the FPGA fabric, and low power consumption to the embedded vision space.

 

Later on, Morris gets to the fly in the ointment:

 

“Oh, yeah. There’s still that ‘almost impossible to program’ issue.”

 

And then he gets to the solution:

 

reVISION, announced this week, is a stack - a set of tools, interfaces, and IP - designed to let embedded vision application developers start in their own familiar sandbox (OpenVX for vision acceleration and Caffe for machine learning), smoothly navigate down through algorithm development (OpenCV and NN frameworks such as AlexNet, GoogLeNet, SqueezeNet, SSD, and FCN), targeting Zynq devices without the need to bring in a team of FPGA experts. reVISION takes advantage of Xilinx’s previously-announced SDSoC stack to facilitate the algorithm development part. Xilinx claims enormous gains in productivity for embedded vision development - with customers predicting cuts of as much as 12 months from current schedules for new product and update development.

 

In many systems employing embedded vision, it’s not just the vision that counts. Increasingly, information from the vision system must be processed in concert with information from other types of sensors such as LiDAR, SONAR, RADAR, and others. FPGA-based SoCs are uniquely agile at handling this sensor fusion problem, with the flexibility to adapt to the particular configuration of sensor systems required by each application. This diversity in application requirements is a significant barrier for typical “cost optimization” strategies such as the creation of specialized ASIC and ASSP solutions.

 

The performance rewards for system developers who successfully harness the power of these devices are substantial. Xilinx is touting benchmarks showing their devices delivering an advantage of 6x images/sec/watt in machine learning inference with GoogLeNet @batch = 1, 42x frames/sec/watt in computer vision with OpenCV, and ⅕ the latency on real-time applications with GoogLeNet @batch = 1 versus “NVidia Tegra and typical SoCs.” These kinds of advantages in latency, performance, and particularly in energy-efficiency can easily be make-or-break for many embedded vision applications.

 

 

But don’t take my word for it, read Morris’ article yourself.

 

 

 

 

 

As part of today’s reVISION announcement of a new, comprehensive development stack for embedded-vision applications, Xilinx has produced a 3-minute video showing you just some of the things made possible by this announcement.

 

Here it is:

 

 

Adam Taylor’s MicroZed Chronicles, Part 177: Introducing the reVision stack

by Xilinx Employee ‎03-13-2017 10:39 AM - edited ‎03-22-2017 07:19 AM (1,272 Views)

 

By Adam Taylor

 

Several times in this series, we have looked at image processing using the Avnet EVK and the ZedBoard. Along with the basics, we have examined object tracking using OpenCV running on the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PS (processing system) and using HLS with its video library to generate image-processing algorithms for the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL (programmable logic, see blogs 140 to 148 here).

 

Xilinx’s reVision is an embedded-vision development stack that provides support for a wide range of frameworks and libraries often used for embedded-vision applications. Most exciting, from my point of view, is that the stack includes acceleration-ready OpenCV functions.

 

Image1.jpg 

 

 

The stack itself is split into three layers. Once we select or define our platform, we will be mostly working at the application and algorithm layers. Let’s take a quick look at the layers of the stack:

 

  1. Platform layer: This is the lowest level of the stack and is the one on which the remaining stack layers are built. This layer includes platform definitions of the hardware and the software environment. Should we choose not to use a predefined platform, we can generate a custom platform using Vivado.

 

  2. Algorithm layer: Here we create our application using SDSoC and the platform definition for the target hardware. It is within this layer that we can use the acceleration-ready OpenCV functions along with predefined and optimized implementations for Customized Neural Network (CNN) developments such as inference accelerators within the PL.

 

  3. Application Development Layer: The highest layer of the stack. Development here is where high-level frameworks such as Caffe and OpenVX are used to complete the application.

 

As I mentioned above, one of the most exciting aspects of the reVISION stack is the ability to accelerate a wide range of OpenCV functions using the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL. We can group the OpenCV functions that can be hardware-accelerated in the PL into four categories:

 

  1. Computation – Includes functions such as absolute difference between two frames, pixel-wise operations (addition, subtraction and multiplication), gradient, and integral operations
  2. Input Processing – Supports bit-depth conversions, channel operations, histogram equalization, remapping, and resizing.
  3. Filtering – Supports a wide range of filters including Sobel, Custom Convolution, and Gaussian filters.
  4. Other – Provides a wide range of functions including Canny/Fast/Harris edge detection, thresholding, SVM, HoG, LK Optical Flow, Histogram Computation, etc.

 

What is very interesting with these function calls is that we can optimize them for resource usage or performance within the PL. The main optimization method is specifying the number of pixels to be processed during each clock cycle. For most accelerated functions, we can choose to process either one or eight pixels per clock. Processing more pixels per clock cycle increases throughput and reduces latency but increases resource utilization; processing one pixel per clock minimizes the resource requirements at the cost of increased latency. We control the number of pixels processed per clock via the function call.
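
To put some illustrative numbers on that tradeoff (these assume a 150MHz PL clock, which is my assumption and not a reVISION specification):

```python
# Illustrative numbers only -- the achievable PL clock depends on the device
# and the design, not on anything quoted in the reVISION documentation.
pl_clock_hz = 150e6                       # assumed programmable-logic clock
uhd_pixel_rate = 3840 * 2160 * 60         # ~498 Mpixel/s for 2160p60

for ppc in (1, 8):
    throughput = pl_clock_hz * ppc
    print(f"{ppc} pixel/clock -> {throughput / 1e6:.0f} Mpixel/s, "
          f"keeps up with 2160p60: {throughput >= uhd_pixel_rate}")
```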

 

Over the next few blogs, we will look more at the reVision stack and how we can use it. However, in the best Blue Peter tradition, the image below shows the result of running a reVision-accelerated Harris OpenCV function within the PL.

 

 

Image2.jpg

 

 

Accelerated Harris Corner Detection in the PL
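
For reference, the plain software version of this function is essentially a one-line OpenCV call, which is handy as a functional baseline before the function is moved into the PL. This is just the stock desktop OpenCV API with placeholder file names—not Adam’s accelerated code, which (as noted below) is on GitHub:

```python
import cv2
import numpy as np

# Software-only baseline for the Harris function (file names are placeholders).
img = cv2.imread("test_frame.png")
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# blockSize=2, Sobel aperture=3, Harris k=0.04 (the classic OpenCV example values)
harris = cv2.cornerHarris(gray, 2, 3, 0.04)

# Mark strong corners in red for a quick visual check.
img[harris > 0.01 * harris.max()] = (0, 0, 255)
cv2.imwrite("harris_corners.png", img)
```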

 

 

 

 

Code is available on Github as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

 

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg

 

Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge

by Xilinx Employee ‎03-13-2017 07:37 AM - edited ‎03-22-2017 07:19 AM (2,702 Views)

 

Image3.jpg

Today, Xilinx announced a comprehensive suite of industry-standard resources for developing advanced embedded-vision systems based on machine learning and machine inference. It’s called the reVISION stack and it allows design teams without deep hardware expertise to use a software-defined development flow to combine efficient machine-learning and computer-vision algorithms with Xilinx All Programmable devices to create highly responsive systems. (Details here.)

 

The Xilinx reVISION stack includes a broad range of development resources for platform, algorithm, and application development including support for the most popular neural networks: AlexNet, GoogLeNet, SqueezeNet, SSD, and FCN. Additionally, the stack provides library elements such as pre-defined and optimized implementations for CNN network layers, which are required to build custom neural networks (DNNs and CNNs). The machine-learning elements are complemented by a broad set of acceleration-ready OpenCV functions for computer-vision processing.

 

For application-level development, Xilinx supports industry-standard frameworks including Caffe for machine learning and OpenVX for computer vision. The reVISION stack also includes development platforms from Xilinx and third parties, which support various sensor types.

 

The reVISION development flow starts with a familiar, Eclipse-based development environment; the C, C++, and/or OpenCL programming languages; and associated compilers all incorporated into the Xilinx SDSoC development environment. You can now target reVISION hardware platforms within the SDSoC environment, drawing from a pool of acceleration-ready, computer-vision libraries to quickly build your application. Soon, you’ll also be able to use the Khronos Group’s OpenVX framework as well.

 

For machine learning, you can use popular frameworks including Caffe to train neural networks. Within one Xilinx Zynq SoC or Zynq UltraScale+ MPSoC, you can use Caffe-generated .prototxt files to configure a software scheduler running on one of the device’s ARM processors to drive CNN inference accelerators—pre-optimized for and instantiated in programmable logic. For computer vision and other algorithms, you can profile your code, identify bottlenecks, and then designate specific functions that need to be hardware-accelerated. The Xilinx system-optimizing compiler then creates an accelerated implementation of your code, automatically including the required processor/accelerator interfaces (data movers) and software drivers.

 

The Xilinx reVISION stack is the latest in an evolutionary line of development tools for creating embedded-vision systems. Xilinx All Programmable devices have long been used to develop such vision-based systems because these devices can interface to any image sensor and connect to any network—which Xilinx calls any-to-any connectivity—and they provide the large amounts of high-performance processing horsepower that vision systems require.

 

Initially, embedded-vision developers used the existing Xilinx Verilog and VHDL tools to develop these systems. Xilinx introduced the SDSoC development environment for HLL-based design two years ago and, since then, SDSoC has dramatically and successfully shortened development cycles for thousands of design teams. Xilinx’s new reVISION stack now enables an even broader set of software and systems engineers to develop intelligent, highly responsive embedded-vision systems faster and more easily using Xilinx All Programmable devices.

 

And what about the performance of the resulting embedded-vision systems? How do their performance metrics compare against systems based on embedded GPUs or the typical SoCs used in these applications? Xilinx-based systems significantly outperform the best of this group, which employ Nvidia devices. Benchmarks of the reVISION flow using Zynq SoC targets against the Nvidia Tegra X1 have shown as much as:

 

  • 6x better images/sec/watt in machine learning
  • 42x higher frames/sec/watt for computer-vision processing
  • 1/5th the latency, which is critical for real-time applications

 

Image1.jpg 

 

There is huge value in having a very rapid and deterministic system-response time and, for many systems, the faster response time of a design that's been accelerated using programmable logic can mean the difference between success and catastrophic failure. For example, the figure below shows the difference in response time between a car’s vision-guided braking system created with the Xilinx reVISION stack running on a Zynq UltraScale+ MPSoC and a similar system based on an Nvidia Tegra device. At 65mph, the Xilinx embedded-vision system’s faster response time stops the vehicle 5 to 33 feet sooner, depending on how the Nvidia-based system is implemented. Five to 33 feet could easily mean the difference between a safe stop and a collision.

 

 

Image2.jpg 

 

(Note: This example appears in the new Xilinx reVISION backgrounder.)
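
As a quick sanity check on the 5-to-33-foot figure, here’s the arithmetic. The latency deltas are back-calculated by me from the stated distances; they are not published measurements:

```python
# The latency deltas below are back-calculated from the 5-to-33-foot figure;
# they are illustrative, not published measurements.
speed_ft_per_s = 65 * 5280 / 3600            # 65 mph is about 95.3 ft/s

for extra_latency_s in (0.052, 0.346):       # ~52 ms and ~346 ms of extra delay
    extra_travel = speed_ft_per_s * extra_latency_s
    print(f"{extra_latency_s * 1000:.0f} ms more latency -> "
          f"{extra_travel:.1f} ft of additional travel before braking")
```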

 

 

The last two years have generated more machine-learning technology than all of the advancements over the previous 45 years and that pace isn't slowing down. Many new types of neural networks for vision-guided systems have emerged along with new techniques that make deployment of these neural networks much more efficient. No matter what you develop today or implement tomorrow, the hardware and I/O reconfigurability and software programmability of Xilinx All Programmable devices can “future-proof” your designs whether it’s to permit the implementation of new algorithms in existing hardware; to interface to new, improved sensing technology; or to add an all-new sensor type (like LIDAR or Time-of-Flight sensors, for example) to improve a vision-based system’s safety and reliability through advanced sensor fusion.

 

Xilinx is pushing even further into vision-guided, machine-learning applications with the new Xilinx reVISION Stack and this announcement complements the recently announced Reconfigurable Acceleration Stack for cloud-based systems. (See “Xilinx Reconfigurable Acceleration Stack speeds programming of machine learning, data analytics, video-streaming apps.”) Together, these new development resources significantly broaden your ability to deploy machine-learning applications using Xilinx technology—from inside the cloud to the very edge.

 

 

You might also want to read “Xilinx AI Engine Steers New Course” by Junko Yoshida on the EETimes.com site.

 

 

 

By Adam Taylor

 

Embedded vision is one of my many FPGA/SoC interests. Recently, I have been doing some significant development work with the Avnet Embedded Vision Kit (EVK) (for more info on the EVK and its uses, see Issues 114 to 126 of the MicroZed Chronicles). As part of my development, I wanted to synchronize the EVK display output with an external source—also useful if we desire to synchronize multiple image streams.

 

Implementing this is straightforward provided we have the correct architecture. The main element we need is a buffer between the upstream camera/image-sensor chain and the downstream output-timing and -processing chain. VDMA (Video Direct Memory Access) provides this buffer by allowing us to store frames from the upstream image-processing pipeline in DDR SDRAM and then read out the frames into a downstream processing pipeline with different timing.

 

The architectural concept appears below:

 

 

Image1.jpg

 

 

VDMA buffering between upstream and downstream with external sync

 

 

For most downstream chains, we use a combination of the video timing controller (VTC) and AXI Stream to Video Out IP blocks, both provided in the Vivado IP library. These two IP blocks work together. The VTC provides output timing and generates signals such as VSync and HSync. The AXI Stream to Video Out IP Block synchronizes its incoming AXIS stream with the timing signals provided by the VTC to generate the output video signals. Once the AXI Stream to Video Out block has synchronized with these signals, it is said to be locked and it will generate output video and timing signals that we can use.

 

The VTC itself is capable of both detecting input video timing and generating output video timing. These can be synchronized if you desire. If no video input timing signals are available to the VTC, then the input frame sync pulse (FSYNC_IN) serves to synchronize the output timing.  

 

 

Image2.jpg

 

 

Enabling Synchronization with FSYNC_IN or the Detector

 

 

 

If FSYNC_IN alone is used to synchronize the output, we need to use not only FSYNC_IN but also the VTC-provided frame sync out (FSYNC_OUT) and GEN_CLKEN to ensure correct synchronization. GEN_CLKEN is an input enable that allows the VTC generator output stage to be clocked.

 

The FSYNC_OUT pulse can be configured to occur at any point within the frame. For this application, it has been configured to be generated at the very end of the frame. This configuration can take place in the VTC re-configuration dialog within Vivado for a one-time approach or, if an AXI Lite interface is provided, the pulse can be repositioned at run time.

 

The algorithm used to synchronize the VTC to an external signal is:

 

  • Generate a 1-clock-wide pulse on FSYNC_IN reception
  • Enable GEN_CLK
  • Wait for the FSYNC_OUT to be received
  • Disable GEN_CLK
  • Repeat from step 1

 

Should GEN_CLK not be disabled, the VTC will continue to run freely and will generate the next frame sequence. Issuing another FSYNC_IN while this is occurring will not result in re-synchronisation but will result in the AXI Stream to Video Out IP block being unable to synchronize the AXIS video with the timing information and losing lock.

 

Therefore, to control the enabling of the GEN_CLKEN we need to create a simple RTL block that implements the algorithm above.
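
To make the intent of that RTL block concrete, here is a purely behavioral sketch in Python (illustration only—the real implementation is a few lines of HDL). Note how an FSYNC_IN arriving while GEN_CLKEN is already asserted is ignored, which avoids the loss-of-lock problem described above:

```python
# Behavioral sketch only -- the real design is a small RTL block.
def gen_clken_controller(events):
    """Yield the GEN_CLKEN state after each FSYNC event."""
    gen_clken = False
    for ev in events:
        if ev == "FSYNC_IN" and not gen_clken:
            gen_clken = True      # external trigger: let the VTC generate a frame
        elif ev == "FSYNC_OUT" and gen_clken:
            gen_clken = False     # end of frame: freeze the VTC until the next trigger
        # An FSYNC_IN arriving while GEN_CLKEN is already high is ignored.
        yield ev, gen_clken

events = ["FSYNC_IN", "FSYNC_OUT", "FSYNC_IN", "FSYNC_IN", "FSYNC_OUT"]
for ev, en in gen_clken_controller(events):
    print(f"{ev:10s} GEN_CLKEN={en}")
```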

 

 

Image3.jpg

 

Vivado Project Demonstrating the concept

 

 

When simulated, this design resulted in the VTC synchronizing to the FSYNC_IN signal as intended. It also worked the same when I implemented it in my EVK kit, allowing me to synchronize the output to an external trigger.

 

 

Image4.jpg

 

Simulation Results

 

 

 

Code is available on Github as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

 

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

 

RFEL has supplied the UK’s Defence Science and Technology Laboratory (DSTL), an executive agency sponsored by the UK’s Ministry of Defence, with two of its Zynq-based HALO Rapid Prototype Development Systems (RPDS). DSTL will evaluate video-processing algorithms using IP from RFEL and 3rd parties in real-time, interactive video trials for military users. The HALO RPDS dramatically speeds assessment of complex video-processing solutions and provides real-time prototypes (conventional software-based simulations do not provide real-time performance).

 

 

RFEL HALO RPDS.jpg

 

HALO Rapid Prototype Development Systems (RPDS)

 

 

HALO is a small, lightweight, real-time video-processing subsystem based on the Xilinx Zynq Z-7020 or Z-7030 SoCs. It’s also relatively low-cost. HALO is designed for fast integration of high-performance vision capabilities for extremely demanding video applications—and military video applications are some of the most demanding because things whiz by pretty quickly and mistakes are very costly. The Zynq SoC’s software and hardware programmability give the HALO RPDS the flexibility to adapt to a wide variety of video-processing applications while providing real-time response.

 

Here’s a block diagram of the HALO RPDS:

 

 

RFEL HALO Block Diagram.jpg

 

HALO Rapid Prototype Development Systems (RPDS) Block Diagram

 

 

As you can see from the light blue boxes at the top of this block diagram, a variety of real-time video-processing algorithms are already available. RFEL itself offers many such cores for:

 

 

All of these video-processing functions operate in real time because they are hardware implementations instantiated in the Zynq SoC’s PL (programmable logic). In addition, the Zynq SoC’s extensive collection of I/O peripherals and programmable I/O mean that the HALO RPDS can interface with a broad range of image and video sources and displays. (That's why we say that Zynq SoCs are All Programmable.)

 

DSTL procured two HALO RPDS systems to support very different video processing investigations, for diverse potential applications. One system is being used to evaluate RFEL's suite of High Definition (HD) video-stabilization IP products to create bespoke solutions. The second system is being used to evaluate 3rd-party algorithms and their performance. The flexibility and high performance of the Zynq-based HALO RPDS system means that it is now possible for DSTL to rapidly experiment with many different hardware-based algorithms. Of course, any successful candidate solutions are inherently supported on the HALO platform, so the small, lightweight HALO system provides both a prototyping platform and an implementation platform.

 

 

For previous coverage of an earlier version of RFEL’s HALO system, see “Linux + Zynq + Hardware Image Processing = Fused Driver Vision Enhancement (fDVE) for Tank Drivers.”

 

What are people doing with the Amazon Web Services FPGA-based F1 services? Quite a lot.

by Xilinx Employee ‎02-09-2017 11:31 AM - edited ‎02-09-2017 12:07 PM (2,713 Views)

 

Amazon Web Services (AWS) rolled out the F1 instance for cloud application development based on Xilinx Virtex UltraScale+ VU9P FPGAs last November. (See “Amazon picks Xilinx UltraScale+ FPGAs to accelerate AWS, launches F1 instance with 8x VU9P FPGAs per instance.”) It appears from the following LinkedIn post that people are already using it to do some pretty interesting things:

 

 

AWS F1 Neural Net application.jpg 

 

 

If you’re interested in Cloud computing applications based on the rather significant capabilities of Xilinx-based hardware application acceleration, check out the Xilinx Acceleration Zone.

 

The new VC Z series of industrial Smart Cameras from Vision Components incorporates a Xilinx Zynq Z-7010 SoC to give the cameras programmable local processing. The VC nano Z camera series is available as a bare-board imaging platform called the VCSBC series or as a fully enclosed camera called the VC series. The VCSBC series is available with 752x480-pixel (WVGA), 1280x1024-pixel (SXGA), 1600x1200-pixel, or 2048x1536-pixel sensors. These camera modules acquire video at rates from 50 to 120 frames/sec depending on sensor size. All four of these modules are also available with remote sensor heads as the VCSBC nano Z-RH series to ease system integration. Thanks to the added video-processing horsepower of the Zynq SoC, these modules are also offered in dual-sensor, stereo-imaging versions called the VCSBC nano Z-RH-2 series.

 

 

VCSBC nano Stereo Camera.jpg
 

Vision Components VCSBC nano Z-RH-2 industrial stereo smart camera module

 

 

These same cameras are also available from Vision Components with rugged enclosures and lens mounts as the VC nano Z series and the VC pro Z series. The VC pro Z versions can be equipped with IR LED illumination.

 

 

VC pro Z Enclosed Camera Module.jpg

 

 

Vision Components VC pro Z enclosed industrial smart camera

 

 

 

The ability to create more than a dozen different programmable cameras and camera modules from one platform arises directly from the use of the integrated Xilinx Zynq SoC. The cameras use the Zynq SoC’s dual-core ARM Cortex-A9 MPCore processor to run Linux and to support the extensive programmability made possible by software tools such as Halcon Embedded from MVTec Software, which allows you to comfortably develop applications on a PC and then export them to Vision Components’ Smart Cameras. The Zynq SoC’s on-chip programmable logic is able to perform a variety of vision-processing tasks such as white-light interferometry, color conversion, and high-speed image recognition (such as OCR, bar-code reading, and license-plate recognition) in real time.

 

These cameras make use of the extensive, standard I/O capabilities in the Zynq SoC including high-speed Ethernet, I2C, and serial I/O while the Zynq SoC’s programmable I/O provides the interfacing flexibility needed to accommodate the four existing image sensors offered in the series or any other sensor that Vision Components might wish to add to the VC Z series of smart cameras in the future. According to Endre J. Tóth, Director of Business Development at Vision Components, these programmable capabilities give his company a real competitive advantage.

 

Here’s a 5-minute video detailing some of the applications you can address with these Smart Cameras from Vision components:

 

 

 

 

Note: For more information about these Smart Cameras, please contact Vision Components directly.

 

 

 

 

 

SDVoE Logo.jpg

The Pro AV industry’s transition from proprietary audio/video transport schemes to lower-cost, IP-based solutions is already underway but, as in any new field, the differing approaches make the situation somewhat chaotic. That’s why 14 leading vendors are launching the SDVoE (Software Defined Video Over Ethernet) Alliance at this week’s ISE 2017 show in Amsterdam (Booth 12-H55).

 

The SDVoE Alliance is a non-profit consortium that’s developing standards to provide “an end-to-end hardware and software platform for AV extension, switching, processing and control through advanced chipset technology, common control APIs and interoperability.” The consortium also plans to create an ecosystem around SDVoE technology.

 

An SDVoE announcement late last week said that fourteen new companies were joining the original six founding member companies (AptoVision, Aquantia, Christie Digital, NETGEAR, Sony, and ZeeVee). The new member companies are:

 

  • DVIGear
  • Grandbeing
  • IDK Corporation
  • Arista
  • Aurora Multimedia
  • HDCVT
  • Techlogix Networx
  • Xilinx

 

 

You might recognize Aquantia’s name on this list from the recent Xcell Daily blog post about the company’s new AQLX107 device, which packs an Ethernet PHY—capable of operating at 10Gbps over 100m of Cat 6a cable (or 5Gbps down to 100Mbps over 100m of Cat 5e cable)—along with a Xilinx Kintex-7 FPGA into one compact package.

 

The connection here is not at all coincidental. The Aquantia  AQLX107 “FPGA-programmable PHY” makes a pretty nice device for implementing SDVoE and in fact, Aquantia and AptoVision announced such an implementation just today. According to this announcement, “Combined with AptoVision’s BlueRiver technology, the AQLX107 can be used to transmit true 4K60 video across off-the-shelf 10G Ethernet networks and standard category cable with zero frame latency.  Audio and video processing, including upscaling, downscaling, and multi-image compositing are all realizable on the SDVoE hardware and software platform made possible by the AQLX107.”

 

The presence of Xilinx on this list of SDVoE Alliance members also should not be surprising. Xilinx has long worked with major Pro AV vendors to meet a wide variety of professional and broadcast-video challenges including any-to-any connectivity, all of the latest video-compression technologies, and video over IP.

 

In fact, Xilinx and its Xilinx Alliance Members will be demonstrating some of the most recent AV innovations and implementations in the Xilinx booth exhibiting at this week’s ISE show including:

 

  • 4K HEVC Real-Time Compression – presented by Xilinx
  • 4K HDMI Over IP Using TICO – presented by intoPIX
  • 4K HDMI Over IP Using VC-2 HQ – presented by Barco Silex
  • 4K Real Time Warp & Image Stitching – presented by Omnitek
  • 8K Real Time Video Processing – presented by Omnitek

 

Check out these demonstrations in booth 14-B132 at the ISE 2017 show.

 

 

 

Earlier this week at Photonics West in San Francisco, Tattile introduced the high-speed, 12Mpixel S12MP Smart Camera based on a Xilinx Zynq Z-7030 SoC. (See “Tattile’s 12Mpixel S12MP industrial GigE Smart Camera captures 300 frames/sec with Zynq SoC processing help.”) However, the S12MP camera is not the company’s first smart camera to be based on Xilinx Zynq SoCs. In fact, the company previously introduced four other color/monochrome/multispectral, C-mount smart cameras based on various Zynq SoC family members:

 

  • The 640x480-pixel, 120 frames/sec S50 Compact Smart Camera series based on a CMOSIS CMV300 image sensor and a single-core Xilinx Zynq Z-7000S SoC.

 

 

Tattile S50 Smart Camera.jpg 

 

Tattile S50 Compact Smart Camera based on a single-core Xilinx Zynq Z-7000S SoC

 

 

 

  • The VGA-to-4Mpixel, 35-250 frames/sec Next-Generation S100 Smart Camera series based on one of three CMOSIS image sensors and a dual-core Xilinx Zynq Z-7000 SoC.

 

 

 

Tattile S100 Smart Camera.jpg 

 

Tattile S100 Compact Smart Camera based on a dual-core Xilinx Zynq Z-7000 SoC

 

 

 

  • The 4.2Mpixel, 180 frames/sec High-Performance S200 Smart Camera series based on a CMOSIS CMV4000 image sensor and a dual-core Xilinx Zynq Z-7000 SoC.

 

  • The Hyperspectral S200 Hyp Smart Camera series based on one of three hyperspectral image sensors and a dual-core Xilinx Zynq Z-7000 SoC.

 

 

Tattile S200 Smart Camera.jpg

 

 

Tattile S200 and S200 Hyp Smart Cameras based on a dual-core Xilinx Zynq Z-7000 SoC

 

 

All of these cameras use the Zynq SoC’s on-chip programmable logic to perform a variety of real-time vision processing. For example, the S50 and S100 Smart Cameras use the on-chip programmable logic for image acquisition and image preprocessing. The S200 Hyp camera uses the programmable logic to also perform reflectance calculations and multispectral image/cube reconstruction. In addition, Tattile is able to make the real-time processing capabilities of the programmable logic available in these cameras to its customers through software including a graphical development tool.
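
As an illustration of the sort of reflectance calculation that gets pushed into the programmable logic, here is a minimal NumPy sketch of flat-field-style reflectance normalization against dark and white reference frames. It is a generic textbook formulation, not Tattile’s actual pipeline:

```python
import numpy as np

def reflectance(raw, dark_ref, white_ref):
    """Per-pixel reflectance estimate from a raw multispectral frame and
    previously captured dark/white reference frames (all the same shape)."""
    num = raw.astype(np.float32) - dark_ref
    den = np.clip(white_ref.astype(np.float32) - dark_ref, 1e-6, None)
    return np.clip(num / den, 0.0, 1.0)

# Tiny example with made-up 2x2 single-band frames.
raw   = np.array([[120, 200], [60, 240]], dtype=np.uint16)
dark  = np.full((2, 2), 40, dtype=np.uint16)
white = np.full((2, 2), 240, dtype=np.uint16)
print(reflectance(raw, dark, white))    # [[0.4 0.8] [0.1 1.0]]
```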

 

The compatible Xilinx Zynq Z-7000 and Z-7000S SoCs give Tattile’s development teams a choice of several devices with a variety of cost/performance/capability ratios while allowing Tattile to develop a unified camera platform on which to base a growing family of programmable smart cameras. The Zynq SoCs’ programmable I/O permits any type of image sensor to be used, including the multispectral line, tiled, and mosaic sensors used in the S200 Hyp series. The same basic controller design can be reused multiple times and the design is future-proof—ready to handle any new sensor type that might be introduced at a later date.

 

That’s exactly what happened with the newly introduced S12MP Smart Camera.

 

 

Please contact Tattile directly for more information about these Smart Cameras.

 

A Very Short Conversation with Ximea about Subminiature Video Cameras and Very Small FPGAs

by Xilinx Employee ‎02-02-2017 01:00 PM - edited ‎02-02-2017 09:54 PM (1,803 Views)

 

Yesterday at Photonics West, my colleague Aaron Behman and I stopped by the Ximea booth and had a very brief conversation with Max Larin, Ximea’s CEO. Ximea makes a very broad line of industrial and scientific cameras and a lot of them are based on several generations of Xilinx FPGAs. During our conversation, Max removed a small pcb from a plastic bag and showed it to us. “This is the world’s smallest industrial camera,” he said while palming a 13x13mm board. It was one of Ximea’s MU9 subminiature USB cameras based on a 5Mpixel ON Semiconductor (formerly Aptina) MT9P031 image sensor. Ximea’s MU9 subminiature camera is available as a color or monochrome device.

 

Here are front and back photos of the camera pcb:

 

 

Ximea MU9 Subminiature Camera.jpg

 

Ximea 5Mpixel MU9 subminiature USB camera

  

 

As you can see, the size of the board is fairly well determined by the 10x10mm image sensor, its bypass capacitors, and a few other electronic components mounted on the front of the board. Nearly all of the active electronics and the camera’s I/O connector are mounted on the rear. A Cypress CY7C68013 EZ-USB Microcontroller operates the camera’s USB interface and the device controlling the sensor is a Xilinx Spartan-3 XC3S50 FPGA in an 8x8mm package. FPGAs with their logic and I/O programmability are great for interfacing to image sensors and for processing the video images generated by these sensors.

 

Our conversation with Max Larin at Photonics West got me to thinking. I wondered, “What would I use to design this board today?” My first thought was to replace both the Spartan-3 FPGA and the USB microcontroller with a single- or dual-core Xilinx Zynq SoC, which can easily handle all of the camera’s functions including the USB interface, reducing the parts count by one “big” chip. But the Zynq SoC family’s smallest package size is 13x13mm—the same size as the camera pcb—and that’s physically just a bit too large.

 

The XC3S50 FPGA used in this Ximea subminiature camera is the smallest device in the Spartan-3 family. It has 1728 logic cells and 72Kbits of BRAM. That’s a lot of programmable capability in an 8x8mm package even though the Spartan-3 FPGA family first appeared way back in 2003. (See “New Spartan-3 FPGAs Are Cost-Optimized for Design and Production.”)

 

There are two newer Spartan FPGA families to consider when creating a design today, Spartan-6 and Spartan-7, and both device families include multiple devices in 8x8mm packages. So I decided to see how much I might pack into a more modern FPGA with the same pcb real-estate footprint.

 

The simple numbers from the data sheets tell part of the story. A Spartan-3 XC3S50 provides you with 1728 logic cells, 72Kbits of BRAM, and 89 I/O pins. The Spartan-6 XC6SLX4, XC6SLX9, and XC6SLX16 provide you with 3840 to 14,579 logic cells, 216 to 576Kbits of BRAM, and 106 I/O pins. The Spartan-7 XC7S6 and XC7S15 provide 6000 to 12,800 logic cells, 180 to 360Kbits of BRAM, and 100 I/O pins. So both the Spartan-6 and Spartan-7 FPGA families provide nice upward-migration paths for new designs.

 

However, the simple data-sheet numbers don’t tell the whole story. For that, I needed to talk to Jayson Bethurem, the Xilinx Cost Optimized Portfolio Product Line Manager, and get more of the story. Jayson pointed out a few more things.

 

First and foremost, the Spartan-7 FPGA family offers a 2.5x performance/watt improvement over the Spartan-6 family. That’s a significant advantage right there. The Spartan-7 FPGAs are significantly faster than the Spartan-6 FPGAs as well. Spartan-6 devices in the -1L speed grade have a 250MHz Fmax versus 464MHz for Spartan-7 -1 or -1L parts. The fastest Spartan-6 devices in the -3 speed grade have an Fmax of 400MHz (still not as fast as the slowest Spartan-7 speed grade) and the fastest Spartan-7 FPGAs, the -2 parts, have an Fmax of 628MHz. So if you feel the need for speed, the Spartan-7 FPGAs are the way to go.

 

I’d be remiss not to mention tools. As Jayson reminded me, the Spartan-7 family gives you entrée into the world of Vivado Design Suite tools. That means you get access to the Vivado IP catalog and Vivado’s IP Integrator (IPI) with its automated integration features. These are two major benefits.

 

Finally, some rather sophisticated improvements to the Spartan-7 FPGA family’s internal routing architecture means that the improved placement and routing tools in the Vivado Design Suite can pack more of your logic into Spartan-7 devices and get more performance from that logic due to reduced routing congestion. So directly comparing logic cell numbers between the Spartan-6 and Spartan-7 FPGA families from the data sheets is not as exact a science as you might assume.

 

The nice thing is: you have plenty of options.

 

 

For previous Xcell Daily blog posts about Ximea industrial and scientific cameras, see:

 

 

 

 

 

Tattile’s rugged, new S12MP Ultra High Resolution Smart Camera for industrial and machine-vision applications pairs a 12Mpixel CMOSIS CMV12000 image sensor with a Xilinx Zynq Z-7030 SoC to create a compact, high-performance, programmable imaging system capable of capturing 300 full-resolution 12Mpixel frames/sec at 10 bits/pixel (and 140 frames/sec at 12 bits/pixel). The camera can capture partial-resolution video at even higher frame rates under programmatic control of the Zynq SoC. An on-board GigE Vision server streams the captured video to an Ethernet-connected host and an integrated SD card slot permits as much as 32Gbytes of local video storage. The camera takes F-mount lenses, measures only 80x80x60mm (without the lens mount), and consumes just 12W from a 12Vdc supply.
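
Some rough arithmetic (mine, not Tattile’s) shows why that on-camera Zynq processing matters—the raw sensor data rate at full frame rate dwarfs what a GigE Vision link can carry:

```python
# Approximate raw data rates (sensor overhead ignored).
pixels = 4096 * 3072                    # CMOSIS CMV12000: ~12.6 Mpixel

for fps, bits in ((300, 10), (140, 12)):
    gbps = pixels * fps * bits / 1e9
    print(f"{fps} frames/sec at {bits} bits/pixel -> {gbps:.1f} Gbps raw"
          " (GigE Vision tops out around 1 Gbps)")
```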

 

 

Tattile S12MP Ultra High Resolution Smart Camerat.jpg 

 

 

The S12MP Ultra High Resolution Smart Camera is the latest in a growing line of Smart Cameras from Tattile. The company has focused on adding intelligence to its latest cameras to help customers reduce overall system costs in a variety of vision applications. To that end, Tattile has exposed the programmable logic inside of the S12MP camera to permit its customers to develop and run custom real-time vision algorithms in the camera itself using the Xilinx Vivado Design Suite. According to Tattile, pushing vision processing to the edge in this manner increases vision-system performance and lowers cost.

 

For more information about the S12MP Ultra High Resolution Smart Camera, please contact Tattile directly.

 

 

 

Aquantia has packed its Ethernet PHY—capable of operating at 10Gbps over 100m of Cat 6a cable (or 5Gbps down to 100Mbps over 100m of Cat 5e cable)—with a Xilinx Kintex-7 FPGA, creating a universal Gigabit Ethernet component with extremely broad capabilities. Here’s a block diagram of the new AQLX107 device:

 

 

Aquantia AQLX107 PHY Block Diagram.jpg 

 

 

This Aquantia device gives you a space-saving, one-socket solution for a variety of Ethernet designs including controllers, protocol converters, and anything-to-Ethernet bridges.

 

Please contact Aquantia for more information about this unique Ethernet chip.

 

 

Dense Optical Flow hardware-acceleration on Zynq SoC made easier by SDSoC and OpenCV libraries

by Xilinx Employee ‎01-25-2017 12:31 PM - edited ‎01-25-2017 12:35 PM (2,536 Views)

 

The 4-minute video below demonstrates real-time, dense optical flow running on a Xilinx Zynq SoC. The entire demo was developed using C/C++, the Xilinx SDSoC development environment, and the associated OpenCV libraries. The dense optical flow algorithm compares successive video frames to estimate the apparent motion of each pixel in one of the frames. This technique is used in video compression, object detection, object tracking, and image segmentation. Dense optical flow is a computationally intensive operation, which makes it an ideal candidate for hardware acceleration using the programmable logic in a small, low-power Zynq SoC.
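
For comparison, here is what a purely software dense-optical-flow loop looks like on a desktop using OpenCV’s Farnebäck implementation (a stock OpenCV example with a placeholder input file—not the SDSoC-accelerated demo code):

```python
import cv2

cap = cv2.VideoCapture("input.mp4")          # placeholder clip
ok, prev = cap.read()
if not ok:
    raise SystemExit("could not read the input clip")
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Dense flow: one (dx, dy) motion vector per pixel.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, _ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    print(f"mean pixel motion: {mag.mean():.2f} px")
    prev_gray = gray
cap.release()
```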

 

As Xilinx Senior Product Manager for SDSoC and Embedded Vision Nick Ni explains, SDSoC lowers the barriers to using the Zynq SoC in these embedded-vision applications because the tool makes it relatively easy for software developers accustomed to using only C or C++ to develop hardware-accelerated applications with the coding tools and styles they already know. SDSoC then converts the code that requires acceleration into hardware and automatically links this hardware to the software through DMA.

 

 

 

 

 

You can now watch four hour-long Xilinx “Vision with Precision” Webinars at your convenience, on demand. The four Webinars are:

 

 

  • Medical Imaging
  • Video Surveillance
  • Vision-Guided Robotics and Drone Applications
  • Machine Vision Applications

 

 

More info and links to the Webinars here.

 

It’s amazing what you can do with a few low-cost video cameras and FPGA-based, high-speed video processing. One example: the Virtual Flying Camera that Xylon has implemented with just four video cameras and a Xilinx Zynq Z-7000 SoC. This setup gives the driver a flying, 360-degree view of a car and its surroundings. It’s also known as a bird’s-eye view, but in this case the bird can fly around the car.

 

Many such implementations of this sort of video technology use GPUs for the video processing, but Xylon instead uses custom hardware implemented in the Zynq SoC’s programmable logic and designed with Xylon logicBRICKS IP cores. That custom hardware enables very fast execution of complex video operations including camera lens-distortion correction, video frame grabbing, video rotation, perspective changes, and the seamless stitching of four processed video streams into a single display output, all in real time. This design approach minimizes video-processing delay while consuming significantly less power than GPU-based implementations.
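
To make those operations concrete, here’s a software-only OpenCV sketch of two of the steps named above, lens-distortion correction and a perspective (bird’s-eye) change, for a single camera. The calibration values and homography are placeholders chosen purely for illustration; they are not Xylon’s numbers, and the real system performs this work in programmable logic with logicBRICKS IP rather than in software.

// Software illustration of lens-distortion correction and a top-down
// perspective warp for one camera. All numeric values are placeholders.
#include <opencv2/opencv.hpp>

int main() {
    cv::Mat frame = cv::imread("camera0.png");   // one captured frame (hypothetical file name)
    if (frame.empty()) return -1;

    // Placeholder intrinsics and distortion coefficients for a wide-angle lens
    cv::Mat K = (cv::Mat_<double>(3, 3) << 500, 0, frame.cols / 2.0,
                                             0, 500, frame.rows / 2.0,
                                             0,   0, 1);
    cv::Mat dist = (cv::Mat_<double>(1, 5) << -0.35, 0.12, 0, 0, 0);

    cv::Mat undistorted;
    cv::undistort(frame, undistorted, K, dist);  // lens-distortion correction

    // Placeholder ground-plane homography for the perspective change
    cv::Mat H = (cv::Mat_<double>(3, 3) << 1.0, -0.4,    0,
                                           0.0,  0.6,  100,
                                           0.0, -0.001,  1);
    cv::Mat topDown;
    cv::warpPerspective(undistorted, topDown, H, undistorted.size());

    // In the full system, four such views are stitched into one display output
    cv::imwrite("camera0_topdown.png", topDown);
    return 0;
}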

 

A Xylon logi3D Scalable 3D Graphics Controller soft-IP core—also implemented in the Zynq SoC’s programmable logic—renders a 3D vehicle and the surrounding view on the driver’s information display. The Xylon Surround View system permits real-time 3D image generation even in programmable SoCs without an on-chip GPU, as long as there’s programmable logic available to implement the graphics controller. The current version of the Xylon ADAS Surround View Virtual Flying Camera system runs on the Xylon logiADAK Automotive Driver Assistance Kit that is based on the Xilinx Zynq-7000 All Programmable SoC.

 

Here’s a 2-minute video of the Xylon Surround View system in action:

 

 

 

 

If you’re attending the CAR-ELE JAPAN show in Tokyo next week, you can see the Xylon Surround View system operating live in the Xilinx booth.

 

 

 

Next week, the Xilinx booth at the CAR-ELE JAPAN show at Tokyo Big Sight will hold a variety of ADAS (Advanced Driver Assistance Systems) demos based on Xilinx Zynq SoC and Zynq UltraScale+ MPSoC devices from several companies including:

 

 

  • A camera-based driver monitoring system by Fovio, a pioneer in the emerging market segment of Driver Monitoring Systems
  • A multi-camera system with Ethernet-based Audio/Video Bridging (AVB) by Regulus, NEC Communication Systems, and Linear Technology
  • An advanced camera-and-display E-Mirror System by Toyota Tsusho Electronics Corporation
  • A high-end surround-view system employing sensor fusion by Xylon
  • A deep-learning system based on a CNN (Convolutional Neural Network) running on a Zynq UltraScale+ MPSoC

 

 

The Zynq UltraScale+ MPSoC and the original Zynq SoC offer a unique mix of 32- and 64-bit ARM processors plus the heavy-duty parallel processing of programmable logic, which is needed to process and manipulate video and to fuse data from sensors such as video and still cameras, radar, lidar, and sonar into maps of the local environment.

 

If you are developing any sort of sensor-based electronic systems for future automotive products, you might want to come by the Xilinx booth (E35-38) to see what’s already been explored. We’re ready to help you get a jump on your design.

 

 

 

All Internet-connected video devices produce data streams that are processed somewhere in the cloud, said Xilinx Chief Video Architect Johan Janssen during a talk at November’s SC16 conference in Salt Lake City. FPGAs are well suited to video acceleration and deliver better compute density than cloud servers based on microprocessors. One example Janssen gave during his talk shows a Xilinx Virtex UltraScale VU190 FPGA improving the video-stream encoding rate from 3 to 60fps while cutting power consumption in half, compared to a popular Intel Xeon microprocessor executing the same encoding task. For power-constrained data centers, that’s a 40x efficiency improvement (20x the encoding throughput at half the power) with no increase in electrical or heat load. In other words, it costs a lot less operationally to use FPGAs for video encoding in data centers.

 

Here’s the 7-minute video of Janssen’s talk at SC16:

 

 

 

 

 

Nextera Video is helping the broadcast video industry migrate to video-over-IP as quickly as possible with an FPGA IP core, developed for Xilinx UltraScale and other Xilinx FPGAs, that compresses 4K video using Sony’s low-latency, noise-free NMI (Network Media Interface) packet protocols and achieves compression ratios of 3:1 to 14:1. The company’s products can transport compressed 4Kp60 video between all sorts of broadcast equipment over standard 10G IP switches, which significantly lowers equipment and operating costs for broadcasters.

 

Here’s a quick video that describes Nextera’s approach:

 

 

 

 

It’s been fascinating to watch Apertus’ efforts to develop the crowd-funded Axiom Beta open 4K cinema camera over the past few years. It’s based on On Semi and CMOSIS image sensors and a MicroZed dev board sporting a Xilinx Zynq Z-7030 SoC. Apertus released a Team Talk video last November with Max Gurresch and Sebastian Pichelhofer discussing the current state of the project, focusing on its mechanical and housing aspects.

 

Here’s a photo from the video showing a diagram of the camera’s electronic board stack including the MicroZed board:

 

 

Apertus Axion Exploded Diagram.jpg

 

 

Axiom Beta open-source 4K Cinema Camera Electronic Board Stack

 

 

 

And here’s a rendering of the current thinking for a camera enclosure, which is discussed at length in the video.

 

 

 

Axiom Beta Concept housing.jpg

 

 

Axiom Beta open-source 4K Cinema Camera Housing Concept

 

 

 

If you’re interested in following the detailed thinking behind this complex imaging product, along with many of the issues that come with a crowd-funded project like the Axiom Beta camera, watch this video:

 

 

 

 

For more information about the Axiom Beta 4K cinema camera, see “How to build 4K Cinema Cameras: The Apertus Prescription includes Zynq ingredients.”

 

 

Ultra HD H.264 Video Codec IP runs on Zynq Z-7045 SoC

by Xilinx Employee on ‎01-03-2017 02:45 PM (3,400 Views)

 

Anand V Kulkarni, Engineering Manager, Atria Logic India Pvt Ltd, Bangalore, India

 

 

Atria Logic’s H.264 codec IP blocks (the AL-H264E-4KI422-HW encoder and the AL-H264D-4KI422-HW decoder) achieve UHD 4K@60fps video, each running on a Xilinx Zynq Z-7045 SoC as shown in the figure below.

 

 

article1.jpg

 

Block Diagram of Atria Logic UHD H.264 Codec Solution

 

 

Atria Logic’s AL-H264E-4KI422-HW is a hardware-based, feature-rich, low-latency, high-quality, H.264 (AVC) UHD Hi422 Intra encoder IP core. The AL-H264E-4KI422-HW encoder pairs with the Atria Logic AL-H264D-4KI422-HW low-latency decoder IP.

 

The IP cores’ features include:

 

 

  • Complete modular implementation that you can customize and scale
  • H.264 Intra-only Hi422 Level 5.1 encoder and decoder
  • Integrated HDMI2.0 receiver and transmitter subsystems
  • 8/10-bit support
  • YUV 4:2:2/4:4:4, RGB support
  • Very low latency at ~0.3sec
  • Variable bit rate (VBR) and constant bit rate (CBR) support
  • Video quality of 0.99 SSIM or 50dB PSNR or higher
  • Video processing subsystem for pre/post processing, including color-space conversion, video scaling, and chroma subsampling
  • Gigabit Ethernet streaming output support

 

 

When devising a plan for evaluating our UHD encoder and decoder IP cores against the 4K@60fps performance requirement, we needed a flexible, powerful platform. We settled on the Xilinx ZC706 evaluation kit based on the Zynq Z-7045 SoC because:

 

 

 

 

  • The Zynq Z-7045 SoC’s programmable logic can accommodate the encoder and decoder IP logic while meeting our stringent timing requirements to achieve the required performance.

 

  • The Zynq SoC’s processing system with its dual-core ARM Cortex-A9 MPCore processor gave us the ability to modify application driver software and to build customizations like an application-specific GUI.

 

 

The H.264 encoder supports the H.264 Hi422 (High-422) profile at Level 5.1 (3840x2160p30) for Intra-only coding. Support for 10-bit video content means that there is no grayscale or color degradation in terms of banding. Support for YUV 4:2:2 video content means that there is better color separation—especially noticeable for red colors—which makes images appear sharper. These video-quality attributes are especially important for medical-imaging applications.

 

 

article3.jpg

 

 

Atria Logic UHD H.264 Encoder IP Block Diagram

 

 

Support for Intra-only encoding allows the H.264 encoder to operate at frame-level latencies. A macroblock-line-level pipelined architecture further reduces the latency to the sub-frame level: about 0.3msec. A pipelined design that processes 8 pixels/clock allows the encoder to handle 4K@60fps in real time; 3840x2160 pixels at 60fps is roughly 500Mpixels/sec, which works out to a pixel-processing clock of a little more than 62MHz at 8 pixels/clock.

 

Implementation of the Atria Logic H.264 encoder consumes only 78% of the Zynq Z-7045 SoC’s programmable logic and DSP resources and 55% of the available RAM, leaving ample room for other required circuitry.

 

The H.264 decoder supports the H.264 Hi422 (High-422) profile at Level 5.1 (3840x2160p30) for Intra-only coding. As with the encoder, support for 10-bit video content means that there is no grayscale or color degradation in terms of banding. The decoder also supports YUV 4:2:2 video content. Support for Intra-only decoding using a pipelined architecture allows the decoder to operate at frame-rate latencies.

 

 

 

article4.png

 

Atria Logic UHD H.264 Decoder IP Block Diagram

 

 

Low latency is important for any closed-loop man/machine application. When the Atria Logic AL-H264E-4KI422-HW encoder is connected to the Atria Logic AL-H264D-4KI422-HW low-latency decoder via an IP network, the glass-to-glass latency is about 0.6msec (excluding transmission latency), which is well under a single frame period at 60fps.

 

An efficient implementation of the Atria Logic H.264 decoder only takes up 68% of the Zynq Z-7045 SoC’s programmable logic resources, 35% of available DSP resources, and 45% of the available RAM, leaving ample room for implementation of any other required circuitry.

 

The design’s HDMI subsystem consists of two major modules, the Xilinx LogiCORE HDMI TX and RX subsystems, configured as shown in the figure below:

 

 

article2.jpg

 

 

The HDMI Transceiver (GTX) module transmits and receives the serial HDMI TX and RX data, converting between these serial streams and on-chip parallel data streams as needed. It employs the Zynq SoC’s high-speed GT transceivers as the HDMI PHY.

 

The TX subsystem consists of the transmitter core, AXI video bridge, video timing controller, and an optional HDCP module. An AXI video stream carries two or four pixels per clock into the HDMI TX subsystem and supports 8, 10, and 12 bits per component. This stream conforms to the video protocol defined in the Video IP chapter of the AXI Reference Guide (UG761). The TX subsystem’s video bridge converts the incoming video AXI-stream to native video and the video timing controller generates the native video timing. The audio AXI stream transports multiple channels of uncompressed audio data into the HDMI TX subsystem. The Zynq Z-7045 SoC’s ARM Cortex-A9 processor controls the HDMI TX subsystem’s transmitter blocks through the CPU interface.
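
As a mental model of what travels over that AXI video stream, here’s a small, self-contained C++ sketch of one stream beat carrying two 10-bit 4:2:2 pixels, with tuser marking start-of-frame and tlast marking end-of-line. The exact field packing shown is my simplification; consult the Xilinx HDMI subsystem and AXI4-Stream video documentation for the authoritative format.

// Conceptual model of an AXI4-Stream video beat: two pixels per clock,
// 10 bits per component, YCbCr 4:2:2 (two pixels share one Cb/Cr pair).
// The packing is illustrative, not taken from the LogiCORE IP documentation.
#include <cstdint>
#include <cstdio>

struct AxiVideoBeat {
    uint64_t tdata;  // Y0, Y1, Cb, Cr: four 10-bit components (40 bits used)
    bool     tuser;  // start of frame (SOF)
    bool     tlast;  // end of line (EOL)
};

AxiVideoBeat packTwoPixels422(uint16_t y0, uint16_t y1,
                              uint16_t cb, uint16_t cr,
                              bool sof, bool eol) {
    auto clip10 = [](uint16_t v) { return static_cast<uint64_t>(v & 0x3FF); };
    AxiVideoBeat beat{};
    beat.tdata = clip10(y0) | (clip10(y1) << 10) |
                 (clip10(cb) << 20) | (clip10(cr) << 30);
    beat.tuser = sof;
    beat.tlast = eol;
    return beat;
}

int main() {
    // First beat of a frame: two mid-gray pixels
    AxiVideoBeat beat = packTwoPixels422(512, 512, 512, 512,
                                         /*sof=*/true, /*eol=*/false);
    std::printf("tdata=0x%016llx tuser=%d tlast=%d\n",
                static_cast<unsigned long long>(beat.tdata),
                beat.tuser, beat.tlast);
    return 0;
}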

 

The HDMI RX subsystem incorporates three AXI interfaces. A video bridge converts captured native video to AXI streaming video and outputs the video data through the AXI video interface using the video protocol defined in the Video IP chapter of the AXI Reference Guide (UG761). The video timing controller measures the video timing. Received audio is transmitted through the AXI streaming audio interface. A CPU interface provides processor access to the peripherals’ control and status data.

 

The HDCP module is optional and is not included in the standard deliverables.

 

 

 

 

 

Baumer’s new line of intelligent LX VisualApplets industrial cameras delivers image and video processing with image resolutions up to 20Mpixels at high frame rates. The cameras contain sufficient FPGA-accelerated, image-processing power to perform real-time image pre-processing according to application-specific programming created using Silicon Software’s VisualApplets graphical programming environment. This pre-processing improves an imaging system’s throughput and real-time response while reducing the amount of data uploaded to a host.

 

 

Baumer LX VisualApplets camera.jpg 

 

Baumer intelligent LX VisualApplets industrial camera

 

 

 

The Baumer LX VisualApplets cameras perform this pre-processing in camera using an internal Xilinx Spartan-6 LX150 FPGA and 256Mbytes of DDR3 SDRAM. The cameras support a GigE Vision interface over 100m of cable. Incidentally, these new industrial cameras recently won a Platinum-level award in the Vision Systems Design 2016 Innovators Awards Program.

 

The LX VisualApplets industrial camera product family includes seven models with sensor resolutions ranging from 2Mpixels to 20Mpixels, all based on CMOSIS image sensors. Here’s a table listing details for the seven 2D and 3D models in the product family:

 

 

Baumer LX VisualApplets camera table.jpg 

 

 

Here’s a lighthearted, 3.5-minute whiteboard video that concisely describes the advantages of in-camera, FPGA-based, image-stream pre-processing:

 

 

 

 

 

This unlikely new project on the Instructables Web site uses a $189 Digilent ZYBO trainer board (based on a Xilinx Zynq Z-7010 SoC) to track balloons with an attached Webcam and then pop them with a high-powered semiconductor laser. The tracking system is programmed with OpenCV.
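
For a flavor of how such tracking is typically done with OpenCV, here’s a minimal, host-side C++ sketch: HSV color thresholding followed by a largest-contour centroid that becomes the aim point. The color range and overall structure are my guesses for illustration; the project’s actual code running on the ZYBO may look quite different.

// Minimal color-threshold balloon-tracking sketch in OpenCV (illustrative only).
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

int main() {
    cv::VideoCapture cap(0);
    if (!cap.isOpened()) return -1;

    cv::Mat frame, hsv, mask;
    while (cap.read(frame)) {
        cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);

        // Threshold for a red balloon (guessed HSV range)
        cv::inRange(hsv, cv::Scalar(0, 120, 80), cv::Scalar(10, 255, 255), mask);
        cv::morphologyEx(mask, mask, cv::MORPH_OPEN,
                         cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5)));

        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
        if (!contours.empty()) {
            // Use the largest blob's centroid as the target
            auto largest = std::max_element(contours.begin(), contours.end(),
                [](const std::vector<cv::Point>& a, const std::vector<cv::Point>& b) {
                    return cv::contourArea(a) < cv::contourArea(b);
                });
            cv::Moments m = cv::moments(*largest);
            if (m.m00 > 0) {
                cv::Point target(static_cast<int>(m.m10 / m.m00),
                                 static_cast<int>(m.m01 / m.m00));
                cv::circle(frame, target, 8, cv::Scalar(0, 255, 0), 2);
                // A real system would translate 'target' into laser-aiming commands here.
            }
        }
        cv::imshow("Balloon tracker", frame);
        if (cv::waitKey(1) == 27) break;   // Esc to quit
    }
    return 0;
}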

 

Here’s a view down the bore of the laser:

 

Laser Balloon Popper.jpg 

 

And there’s a 1-second video of the system in action on the Instructables Web page.

 

Fun aside, this system demonstrates that even the smallest Zynq SoC can be used for advanced embedded-vision systems. You can get more information about embedded-vision systems based on Xilinx silicon and tools at the new Embedded Vision Developer Zone.

 

Note: For more information about Digilent’s ZYBO trainer board, see “ZYBO has landed. Digilent’s sub-$200 Zynq-based Dev Board makes an appearance (with pix!)”

 

 
