

RedZone Robotics’ Solo—a camera-equipped, autonomous sewer-inspection robot—gives operators a detailed, illuminated view of the inside of a sewer pipe by crawling the length of the pipe and recording video of the conditions it finds inside. A crew can deploy a Solo robot in less than 15 minutes and then move to another site to launch yet another Solo robot, thus conducting several inspections simultaneously and cutting the cost per inspection. The treaded robot traverses the pipeline autonomously and then returns to the launch point for retrieval. If the robot encounters an obstruction or blockage, it attempts to negotiate the problem three times before aborting the inspection and returning to its entry point. The robot fits into pipes as small as eight inches in diameter and even operates in pipes that contain some residual waste water.






RedZone Robotics Autonomous Sewer-Pipe Inspection Robot




Justin Starr, RedZone’s VP of Technology, says that the Solo inspection robot uses its on-board Spartan FPGA for image processing and for AI. Image-processing algorithms compensate for lens aberrations and also perform a level of sensor fusion for the robot’s multiple sensors. “Crucial” AI routines in the Spartan FPGA help the robot keep track of where it is in the pipeline and tell the robot what to do when it encounters an obstruction.


Starr also says that RedZone is already evaluating Xilinx Zynq devices to extend the robot’s capabilities. “It’s not enough for the Solo to just grab information about what it sees, but let’s actually look at those images. Let’s have the Solo go through that inspection data in real time and generate a preliminary report of what it saw. It used to be the stuff of science fiction but now it’s becoming reality.”


Want to see the Solo in action? Here’s a 3-minute video:










Here’s a hot-off-the-camera, 3-minute video showing a demonstration of two ZCU106 dev boards based on the Xilinx Zynq UltraScale+ ZU7EV MPSoCs with integrated H.265 hardware encoders and decoders. The first ZCU106 board in this demo processes an input stream from a 4K MIPI video camera by encoding it, packetizing it, and then transmitting it over a GigE connection to the second board, which depacketizes, decodes, and displays the video stream on a 4K monitor. Simultaneously, the second board performs the same encoding, packetizing, and transmission of another video stream from a second 4K MIPI camera to the first ZCU106 board, which displays the second video stream on another 4K display.


Note that the integrated H.265 hardware codecs in the Zynq UltraScale+ ZU7EV MPSoC can handle as many as eight simultaneous video streams in both directions.


Here’s the short video demo of this system in action:





For more information about the ZCU106 dev board and the Zynq UltraScale+ EV MPSoCs, contact your friendly, neighborhood Xilinx or Avnet sales representative.




By Adam Taylor


One ongoing area we have been examining is image processing. We’ve looked at the algorithms and at how to capture images from different sources. A few weeks ago, we looked at the different methods we could use to receive HDMI data and followed up with an example using an external CODEC (P1 & P2). In this blog we are going to look at using internal IP cores to receive HDMI images in conjunction with the Analog Devices AD8195 HDMI buffer, which equalizes the line. Equalization is critical when using long HDMI cables.





Nexys board, FMC HDMI and the Digilent PYNQ-Z1




To do this I will be using the Digilent FMC HDMI card, which provisions one of its channels with an AD8195. The AD8195 on the FMC HDMI card needs a 3v3 supply, which is not available on the ZedBoard unless I break out my soldering iron. Instead, I broke out my Digilent Nexys Video trainer board, which comes fitted with an Artix-7 FPGA and an FMC connector. This board has built-in support for HDMI RX and TX, but the HDMI RX path on this board supports only 1m of HDMI cable while the AD8195 on the FMC HDMI card supports cable runs of up to 20m—far more useful in many distributed applications. So we’ll add the FMC HDMI card.


First, I instantiated a MicroBlaze soft microprocessor system in the Nexys Video card’s Artix-7 FPGA to control the simple image-processing chain needed for this example. Of course, you can implement the same approach to the logic design that I outline here using a Xilinx Zynq SoC or Zynq UltraScale+ MPSoC. The Zynq PS simply replaces the MicroBlaze.


 The hardware design we need to build this system is:


  • MicroBlaze controller with local memory, AXI UART, MicroBlaze Interrupt controller, and DDR Memory Interface Generator.
  • DVI2RGB IP core to receive the HDMI signals and convert them to a parallel video format.
  • Video Timing Controller, configured for detection.
  • ILA connected between the VTC and the DVI2RGB cores, used for verification.
  • Clock Wizard used to generate a 200MHz clock, which supplies the DDR MIG and DVI2RGB cores. All other cores are clocked by the MIG UI clock output.
  • Two 3-bit GPIO modules. The first module will set the VADJ to 3v3 on the HDMI FMC. The second module enables the AD8195 and provides the hot-plug detection.







The final step in this hardware build is to map the interface pins from the AD8195 to the FPGA’s I/O pins through the FMC connector. We’ll use the TMDS_33 SelectIO standard for the HDMI clock and data lanes.


Once the hardware is built, we need to write some simple software to perform the following:



  • Disable the VADJ regulator using pin 2 on the first GPIO port.
  • Set the desired output voltage on VADJ using pins 0 & 1 on the first GPIO port.
  • Enable the VADJ regulator using pin 2 on the first GPIO port.
  • Enable the AD8195 using pin 0 on the second GPIO port.
  • Enable pre-equalization using pin 1 on the second GPIO port.
  • Assert the Hot-Plug Detection signal using pin 2 on the second GPIO port.
  • Read the registers within the VTC to report the modes and status of the video received.
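As a sketch, the GPIO values driven in the steps above can be composed with a couple of plain-C helpers. The bit positions mirror the list above, but the encoding of the voltage-select bits is an assumption; in the real application these values would be written through the Xilinx XGpio driver.

```c
#include <assert.h>
#include <stdint.h>

/* Bit assignments per the steps above (voltage-select encoding assumed):
   port 1: bits 0-1 = VADJ voltage select, bit 2 = VADJ regulator enable
   port 2: bit 0 = AD8195 enable, bit 1 = pre-equalization, bit 2 = hot-plug */
#define VADJ_SEL_MASK  0x3u
#define VADJ_EN        (1u << 2)
#define AD8195_EN      (1u << 0)
#define PRE_EQ_EN      (1u << 1)
#define HPD_ASSERT     (1u << 2)

/* Compose the value for the first GPIO port: voltage select plus enable. */
static uint32_t vadj_port_value(uint32_t sel, int enable)
{
    uint32_t v = sel & VADJ_SEL_MASK;
    if (enable)
        v |= VADJ_EN;
    return v;
}

/* Compose the value for the second GPIO port: AD8195 enable, pre-eq, HPD. */
static uint32_t ad8195_port_value(int enable, int pre_eq, int hpd)
{
    return (enable ? AD8195_EN : 0u) |
           (pre_eq ? PRE_EQ_EN : 0u) |
           (hpd ? HPD_ASSERT : 0u);
}
```

Disabling, programming, and re-enabling VADJ then amounts to writing vadj_port_value(sel, 0) followed by vadj_port_value(sel, 1) to the first port.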



To test this system, I used a Digilent PYNQ-Z1 board to generate different video modes. The first step in verifying that the interface works is to use the ILA to check that the pixel clock is received and its DLL is locked, and that the core is generating horizontal and vertical sync signals along with the correct pixel values.


Provided the sync signals and pixel clock are present, the VTC will be able to detect and classify the video mode. The application software will then report the detected mode via the terminal window.
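Once detection succeeds, the reporting step boils down to mapping the detected active dimensions to a mode name. Here is a minimal, purely illustrative classifier; the real application software reads the VTC's detector registers rather than a hard-coded table.

```c
#include <assert.h>
#include <string.h>

/* Map an active resolution, as reported by the VTC detector, to a mode
   name for the terminal report. The table is illustrative only. */
static const char *classify_mode(int h_active, int v_active)
{
    if (h_active == 800  && v_active == 600)  return "SVGA";
    if (h_active == 1024 && v_active == 768)  return "XGA";
    if (h_active == 1280 && v_active == 720)  return "720p";
    if (h_active == 1920 && v_active == 1080) return "1080p";
    return "unknown";
}
```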





ILA Connected to the DVI to RGB core monitoring its output







Software running on the Nexys Video detecting SVGA mode (800 pixels by 600 lines)




With the correct video mode being detected by the VTC, we can now configure a VDMA write channel to move the image from the logic into a DDR frame buffer.



You can find the project on GitHub.



If you are working with video applications you should also read these:



PL to PS using VDMA

What to do if you have VDMA issues  

Creating a MicroBlaze System Video

Writing MicroBlaze Software  




If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.




First Year E-Book here

First Year Hardback here.








Second Year E-Book here

Second Year Hardback here







Photonfocus has just added two cameras that specialize in seeing in the SWIR (short-wave IR, 930nm to 1700nm) spectral range to its large and growing line of industrial cameras. The company’s MV3-D640I-M01-144 cameras are based on the Sofradir SNAKE low-noise InGaAs image sensor, with 640x512-pixel resolution and 12-bit grayscale output. CameraLink or GigE interfaces are available. SNR is said to be as much as 1200:1, and the camera can operate at 300 frames/sec at full resolution. A global shutter makes µsec-long exposures possible even at the camera’s highest frame rate and permits a constant frame rate independent of exposure time.






Photonfocus MV3-D640I-M01-144-CL camera based on Sofradir SNAKE low-noise SWIR image sensor




Like many of the company’s industrial cameras, the MV3-D640I-M01-144 cameras employ a Xilinx All Programmable device—an Artix-7 FPGA in this case—to interface to the image sensor and to give the camera some of its programmable, frame-rate features. For this camera, that means as many as 256 configurable regions of interest, which enables hyperspectral applications.



For information about other Photonfocus industrial cameras, see:















Twelve student and industry teams competed for 30 straight hours in the Xilinx Hackathon 2017 competition over the weekend at the Summit Retreat Center in the Xilinx corporate facility located in Longmont, Colorado. Each team member received a Digilent PYNQ-Z1 dev board, which is based on a Xilinx Zynq Z-7020 SoC, and then used their fertile imaginations to conceive of and develop working code for an application using the open-source, Python-based PYNQ development environment, which is based on self-documenting Jupyter Notebooks. The online electronics and maker retailer SparkFun, located just down the street from the Xilinx facility in Longmont, supplied boxes of compatible peripheral boards with sensors and motor controllers to spur the team members’ imaginations. Several of the teams came from local universities including the University of Colorado at Boulder and the Colorado School of Mines in Golden, Colorado. At the end of the competition, eleven of the teams presented their results using their Jupyter Notebooks. Then came the prizes.


For the most part, team members had never used the PYNQ-Z1 boards and were not familiar with using programmable logic. In part, that was the intent of the Hackathon—to connect teams of inexperienced developers with appropriate programming tools and see what develops. That’s also the reason that Xilinx developed PYNQ: so that software developers and students could take advantage of the improved embedded performance made possible by the Zynq SoC’s programmable hardware without having to use ASIC-style (HDL) design tools to design hardware (unless they want to do so, of course).


Here are the projects developed by the teams, in the order presented during the final hour of the Hackathon (links go straight to the teams’ GitHub repositories with their Jupyter Notebooks that document the projects with explanations and “working” code):



  • Team “from timemachine import timetravel” developed a sine-wave generator with a PYNQ-callable frequency modulator and an audio spectrum analyzer. The team had time to develop a couple of different versions of the spectrum analyzer, but not enough time to link the generator and analyzer together.


  • Team “John Cena” developed a voice-controlled mobile robot. An application on a PC captured the WAV file for a spoken command sequence and this file was then wirelessly transmitted to the mobile robot, which interpreted commands and executed them.





Team John Cena’s Voice-Controlled Mobile Robot




  • Inspired by the recent Nobel Physics prize given to the 3-person team that architected the LIGO gravity-wave observatories, Team “Daedalus” developed a Hackathon entry called “Sonic LIGO”—a sound localizer that takes audio captured by multiple microphones, uses time correlation to filter audio noise from the sounds of interest, and then triangulates the location of the sound using the phase derived from each microphone. Examples of sound events the team wanted to locate included hand claps and gunshots. The team planned to use its members’ three PYNQ-Z1 boards for the triangulation.


  • Team “Questionable” from the Colorado School of Mines developed an automated parking lot assistant to aid students looking for a parking space near the university. The design uses two motion detectors to detect cars passing through each lot’s entrances and exits. Timing between the two sensors determines whether the car is entering or leaving the lot. The team calls their application PARQYNG and produced a YouTube video to explain the idea:






  • Team “Snapback” developed a Webcam-equipped cap that captures happy moments by recognizing smiling faces and using that recognition to trigger the capture of a short video clip, which is then wirelessly uploaded to the cloud for later viewing. The application was inspired by the progressive memory loss of one team member’s grandmother.


  • Team “Trimble” from Trimble, Inc. developed a sophisticated application for determining position using photogrammetry techniques. The design uses the Zynq SoC’s programmable logic to accelerate the calculations.


  • Team “Codeing Crazy” developed an “air keyboard” (it’s like a working air guitar but it’s a keyboard) using OpenCV to recognize the image of a hand in space, locate the recognized object within a region predefined as a keyboard, and then play the appropriate note.


  • Team “Joy of Pink” from CU Boulder developed a real-time emoji generator that recognizes facial expressions in an image, interprets the emotion shown on the subject’s face by sending the image to Microsoft’s cloud-based Azure Emotion API, and then substitutes the appropriate emoji into the image.





Team “Joy of Pink” developed an emoji generator based on facial-expression interpretation using Microsoft’s cloud-based Azure Emotion API




  • Team “Harsh Constraints” plunged headlong into a Verilog-based project to develop a 144MHz LVDS CameraLink interface to a thermal camera. It was a very ambitious venture for a team that had never before used Verilog.



  • Team “Caffeine” developed a tone-controlled robot using audio filters instantiated in the Zynq SoC’s programmable logic to decode four audio tones which then control robot motion. Here’s a block diagram:






Team Caffeine’s Audio Fiend Tone-Based Robotic Controller




  • Team “Lynx” developed a face-recognition system that stores faces in the cloud in a spreadsheet on a Google drive based on whether or not the system has seen that face before. The system uses Haar-Cascade detection written in OpenCV.



After the presentations, the judges deliberated for a few minutes using multiple predefined criteria and then awarded the following prizes:



  • The “Murphy’s Law” prize for dealing with insurmountable circumstances went to Team Harsh Constraints.


  • The “Best Use of Programmable Logic” prize went to Team Caffeine.


  • The “Runner Up” prize went to Team Snapback.


  • The “Grand Prize” went to Team Questionable.



Congratulations to the winners and to all of the teams who spent 30 hours with each other in a large room in Colorado to experience the joy of hacking code to tackle some tough problems. (A follow-up blog will include a photographic record of the event so that you can see what it was like.)




For more information about the PYNQ development environment and the Digilent PYNQ-Z1 board, see “Python + Zynq = PYNQ, which runs on Digilent’s new $229 pink PYNQ-Z1 Python Productivity Package.”




By Adam Taylor


The Xilinx Zynq UltraScale+ MPSoC is good for many applications, including embedded vision. Its APU with two or four 64-bit ARM Cortex-A53 processors, its Mali GPU, the DisplayPort interface, and on-chip programmable logic (PL) give the Zynq UltraScale+ MPSoC plenty of processing power to address exciting applications such as ADAS and vision-guided robotics with relative ease. Further, we can use the device’s PL and its programmable I/O to interface with a range of vision and video standards including MIPI, LVDS, parallel, VoSPI, etc. When it comes to interfacing image sensors, the Zynq UltraScale+ MPSoC can handle just about anything you throw at it.


Once we’ve brought the image into the Zynq UltraScale+ MPSoC’s PL, we can implement an image-processing pipeline using existing IP cores from the Xilinx library or we can develop our own custom IP cores using Vivado HLS (high-level synthesis). However, for many applications we’ll need to move the images into the device’s PS (processing system) domain before we can apply exciting application-level algorithms such as decision making or use the Xilinx reVISION acceleration stack.






The original MicroZed Evaluation kit and UltraZed board used for this demo




I thought I would kick off the fourth year of this blog with a look at how we can use VDMA instantiated in the Zynq MPSoC’s PL to transfer images from the PL to the PS-attached DDR Memory without processor intervention. You often need to make such high-speed background transfers in a variety of applications.


To do this we will use the following IP blocks:


  • Zynq MPSoC core – Configured to enable both a Full Power Domain (FPD) AXI HP Master and FPD HPC AXI Slave, along with providing at least one PL clock and reset to the PL fabric.
  • VDMA core – Configured for write-only operation, no FSync option, and with a Genlock mode of master.
  • Test Pattern Generator (TPG) – Configurable over the AXI Lite interface.
  • AXI Interconnects – Implement the master and slave AXI networks.



Once configured over its AXI Lite interface, the Test Pattern Generator outputs test patterns which are then transferred into the PS-attached DDR memory. We can demonstrate that this has been successful by examining the memory locations using SDK.





Enabling the FPD Master and Slave Interfaces




For this simple example, we’ll clock both the AXI networks at the same frequency, driven by PL_CLK_0 at 100MHz.


For a deployed system, an image sensor would replace the TPG as the image source and we would need to ensure that the VDMA input-channel clocks (Slave-to-Memory-Map and Memory-Map-to-Slave) were fast enough to support the required pixel and frame rate.  For example, a sensor with a resolution of 1280 pixels by 1024 lines running at 60 frames per second would require a clock rate of at least 108MHz. We would need to adjust the clock frequency accordingly.
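The 108MHz figure can be checked with a one-line calculation; the 1688x1066 totals below are the standard VESA blanking-inclusive dimensions for a 1280x1024 (SXGA) mode:

```c
#include <assert.h>

/* Estimate the pixel clock a video mode requires. The totals must include
   blanking: for SXGA (1280x1024 active pixels) the VESA totals are
   1688x1066, which at 60 frames/sec gives the 108MHz figure quoted above. */
static double pixel_clock_hz(int h_total, int v_total, int frames_per_sec)
{
    return (double)h_total * (double)v_total * (double)frames_per_sec;
}
```

pixel_clock_hz(1688, 1066, 60) evaluates to roughly 107.96MHz, hence the 108MHz minimum clock rate quoted above.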






Block Diagram of the completed design




To aid visibility within this example, I have included three ILA modules, which are connected to the outputs of the Test Pattern Generator, AXI VDMA, and the Slave Memory Interconnect. Adding these modules enables the use of Vivado’s hardware manager to verify that the software has correctly configured the TPG and the VDMA to transfer the images.


With the Vivado design complete and built, creating the application software to configure the TPG and VDMA to generate images and move them from the PL to the PS is very straightforward. We use the AXIVDMA, V_TPG, Video Common APIs available under the BSP lib source directory to aid in creating the application. The software itself performs the following:


  1. Initialize the TPG and the AXI VDMA for use in the software application
  2. Configure the TPG to generate a test pattern configured as below
    1. Set the Image Width to 1280, Image Height to 1080
    2. Set the color space to YCRCB, 4:2:2 format
    3. Set the TPG background pattern
    4. Enable the TPG and set it for auto reloading
  3. Configure the VDMA to write data into the PS memory
    1. Set up the VDMA parameters using a variable of the type XAxiVdma_DmaSetup – remember the horizontal size and stride are measured in bytes not pixels.
    2. Configure the VDMA with the setting defined above
    3. Set the VDMA frame store location address in the PS DDR
    4. Start VDMA transfer

The application will then start generating test frames, transferred from the TPG into the PS DDR memory. I disabled the caches for this example to ensure that the DDR memory is updated.
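The byte-versus-pixel point in step 3.1 can be sketched as follows. The struct below is a simplified, hypothetical stand-in for the fields of the driver’s XAxiVdma_DmaSetup, assuming packed YCrCb 4:2:2 at two bytes per pixel:

```c
#include <assert.h>
#include <stdint.h>

/* YCrCb 4:2:2 packs two bytes per pixel. */
#define BYTES_PER_PIXEL_YUV422 2u

/* Simplified stand-in for the relevant XAxiVdma_DmaSetup fields:
   the VDMA's horizontal size and stride are in bytes, not pixels. */
struct vdma_setup {
    uint32_t vert_size;   /* lines per frame */
    uint32_t hori_size;   /* bytes per line */
    uint32_t stride;      /* bytes between the starts of adjacent lines */
};

static struct vdma_setup make_setup(uint32_t width_px, uint32_t height_lines)
{
    struct vdma_setup s;
    s.vert_size = height_lines;
    s.hori_size = width_px * BYTES_PER_PIXEL_YUV422;
    s.stride    = s.hori_size;  /* contiguous frame buffer, no padding */
    return s;
}
```

For the 1280x1080 test pattern configured above, the horizontal size and stride both come out at 2560 bytes, not 1280.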


Examining the ILAs, you will see the TPG generating frames and the VDMA transferring the stream into memory mapped format:






TPG output, TUSER indicates start of frame while TLAST indicates end of line






VDMA Memory Mapped Output to the PS




Examining the frame store memory location within the PS DDR memory using SDK demonstrates that the pixel values are present.





Test Pattern Pixel Values within the PS DDR Memory





You can use the same approach in Vivado when creating software for a Zynq Z-7000 SoC instead of a Zynq UltraScale+ MPSoC by enabling the AXI GP master for the AXI Lite bus and the AXI HP slave for the VDMA channel.


Should you be experiencing trouble with your VDMA-based image-processing chain, you might want to read this blog.



The project, as always, is on GitHub.




If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.




First Year E-Book here

First Year Hardback here.








Second Year E-Book here

Second Year Hardback here






Moor Insights & Strategy publishes Amazon/Xilinx cloud-acceleration analysis on Forbes.com

by Xilinx Employee, 09-27-2017


Karl Freund, a Senior Analyst at Moor Insights & Strategy, has just published an article on Forbes.com titled “Amazon And Xilinx Deliver New FPGA Solutions” that discusses Amazon’s use of Xilinx Virtex UltraScale+ FPGAs in the Amazon AWS EC2 F1 instance and how those resources are now being used more widely to distribute cloud-based applications through AWS Marketplace’s Amazon Machine Images (AMIs). Freund gave three specific examples of companies using these resources to offer accelerated, cloud-based services:



  • NGCodec offers accelerated video encoding and AR/VR processing for cloud-based video services including surveillance and OTT (over-the-top) TV.


  • Ryft offers advanced query and search services.


  • Edico Genome offers the DRAGEN Bio-IT Platform for genomic analytics.



Freund also notes that cloud-based applications from these three companies are not “GPU-friendly,” which means that these applications benefit far more from FPGA-based acceleration than from GPU acceleration.



NGCodec, Ryft, and Edico Genome have all appeared in Xcell Daily posts. For more information, see:








Today, the Virgo collaboration announced that the Virgo gravitational-wave detector located near Pisa in Italy detected a burst of gravity waves from the merger of two black holes on August 14. The black-hole event that generated these gravity waves occurred nearly two billion years ago. This is the fourth confirmed event of this type. Five countries are involved in the Virgo scientific collaboration: Italy, France, the Netherlands, Poland, and Hungary. The three previous gravity-wave events were observed by instruments operated by the LIGO scientific collaboration in the US, which also detected this fourth event. Gravity detectors like Virgo and LIGO are immense—the size is required to detect the effects of gravity waves as they pass through the spacetime fabric surrounding the earth.







The Virgo Gravity-Wave Observatory near Pisa, Italy




The VIRGO instrument has two 3km detector arms set at a right angle, and the instrument tries to detect tiny changes in the lengths of the arms as a gravity wave passes through and slightly alters the dimensions of the spacetime fabric. The change VIRGO is trying to detect is on the order of 10⁻¹⁸ m.


It’s all done with lasers, suspended ultra-precise mirrors, and interferometry. VIRGO’s laser was recently upgraded with a wider, more powerful beam to increase the instrument’s sensitivity through a program called Advanced VIRGO, which was dedicated earlier this year in a ceremony on February 20 and only started taking data two weeks before detecting the August 14 event.


The increased laser power introduces thermal effects on the laser optics and the mirrors, which creates high-order modes in the beam. These high-order modes can reduce the circulating power in the interferometer and reduce the instrument’s sensitivity. Advanced VIRGO uses a thermal-compensation system to reduce these high-order modes, and one of the instrument’s three phase cameras provides the instrumentation needed for closed-loop, real-time thermal control. The phase cameras were developed by Nikhef, the National Institute for Subatomic Physics in Amsterdam. (See “Advanced Virgo phase cameras” for more information.)


The phase camera’s sensor is sampled by a 14-bit ADC at 500MHz and the digitized signal feeds a Xilinx Virtex-7 485T FPGA, which implements the real-time processing (because, what else could you use for high-speed, real-time vision processing?).




MathWorks has just published a 4-part mini course that teaches you how to develop vision-processing applications using MATLAB, HDL Coder, and Simulink, then walks you through a practical example targeting a Xilinx Zynq SoC using a lane-detection algorithm in Part 4.



Click here for each of the classes:









Eye Vision Technology (EVT) has announced a tiny new line in its EyeCheck series of smart industrial cameras. According to the EVT announcement, the EyeCheck ZQ smart camera is “hardly bigger than an index finger” (no precise specifications available). Even at that size, the EyeCheck ZQ camera manages to integrate four illuminating LEDs around the camera’s lens. An onboard Xilinx Zynq SoC adds the camera’s intelligence and makes the camera software-programmable for applications including bar, DMC, and QR code reading; pattern matching; counting and measuring objects; and object error detection for detecting visible manufacturing defects, as an example.






EVT’s Zynq-based EyeCheck ZQ smart camera is a little larger than an index finger




Programming involves drag-and-drop, graphical programming using the company’s EyeVision software. According to the EVT announcement, use of the Zynq SoC allows the camera to run applications much faster than any other camera in the company’s EyeCheck series. EyeVision commands take the form of icons that can be lined up in sequence; as an example, a bar-code-reading application program requires only three icons. The EyeCheck ZQ camera is also available as a pre-programmed vision sensor, which EVT calls its EyeSens series.


Because of its high processing speed, small size, and light weight, the EyeCheck ZQ smart camera is well suited for smart-imaging applications including robot arms used for production lines in the automotive or electronics industry.




PDF Solutions provides yield-improvement technologies and services to the IC-manufacturing industry to lower manufacturing costs, improve profitability, and shorten time to market. One of the company’s newest solutions is the eProbe series of e-beam tools used for inline electrical characterization and process control. These tools combine an SEM (scanning electron microscope) and an optical microscope and have the unique ability to provide real-time image analysis of nanometer-scale features. The eProbe development team selected National Instruments’ (NI’s) LabVIEW to control the eProbe system and brought in JKI—a LabVIEW consulting company, Xilinx Alliance Program member, and NI Silver Alliance Partner—to help develop the system.






PDF Solutions eProbe e-beam tool combines an SEM with an optical microscope




In less than four months, JKI helped PDF Solutions attain a 250MHz pixel-acquisition rate from the prototype eProbe using a combination of NI’s FlexRIO module, based on a Xilinx Kintex-7 FPGA, and NI’s LabVIEW FPGA module. According to the PDF Solutions case study published on the JKI Web site, using NI’s LabVIEW allowed the PDF/JKI team to implement the required, real-time FPGA logic and easily integrate third-party FPGA IP in a fraction of the time required by alternative design platforms while still achieving the project’s image-throughput goals.


LabVIEW controls most of the functions within the eProbe that perform the wafer inspection including:



  • Controlling the x- and y-axis motion of the stage
  • Sampling and driving various I/O points for the electron gun and the column
  • Controlling the load port and equipment front-end module
  • Overseeing the vacuum and interlocking components
  • Directing and managing SEM and optical image acquisition



JKI contributed both to the eProbe’s software architecture design and the development of various high-level software components that coordinate and control the low-level hardware functions including data acquisition and image manipulation.


Although the eProbe’s control system runs within NI’s LabVIEW environment, the system’s user interface is based on a C# application from The PEER Group called the Peer Tool Orchestrator (PTO). JKI developed the interface between the eProbe’s front-end user interface and its LabVIEW-based control system using its internally developed tools. (Note: JKI offers several LabVIEW development tools and templates directly on this Web page.)






eProbe user interface screen




Once PDF Solutions started fielding eProbe systems, JKI sent people to work with PDF Solutions’ customers on site in a collaboration that helped generate ideas for future algorithm and tool improvements.



For more information about real-time LabVIEW development using the NI LabVIEW FPGA module and Xilinx-based NI hardware, contact JKI directly.



The just-announced MV1-D2048x1088-3D06 industrial laser-triangulation video camera from Photonfocus is based on a high-sensitivity, 2048x1088-pixel CMOSIS CMV2000 V3 CMOS image sensor and takes advantage of a new, real-time Photonfocus LineFinder algorithm running in the camera’s on-board Xilinx Spartan-6 FPGA to compute the laser line position with sub-pixel accuracy. Using this information, the camera computes the height profile of an object with no help needed from a host PC. You can program specific regions of interest within the camera imaging frame and the size of the region of interest determines the 3D triangulation rate, which ranges from 42 frames/sec for a 2048x1088-pixel region of interest in 2D+3D imaging mode to 18619 frames/sec for a 2048x11-pixel region of interest in 3D imaging mode.






Photonfocus MV1-D2048x1088-3D06 industrial laser-triangulation video camera based on a Xilinx Spartan-6 FPGA





Like many of its existing industrial video cameras, Photonfocus’ MV1-D2048x1088-3D06 is based on a Spartan-6 LX75T FPGA, which serves as a platform for the company’s smart industrial video cameras. Use of the Spartan-6 FPGA permitted Photonfocus to create an extremely flexible real-time, vision-processing platform that serves as a foundation for several very different types of cameras based on very different imaging sensors with very different sensor interfaces including:










Xylon has a new hardware/software development kit for quickly implementing embedded, multi-camera vision systems for ADAS and AD (autonomous driving), machine-vision, AR/VR, guided robotics, drones, and other applications. The new logiVID-ZU Vision Development Kit is based on the Xilinx Zynq UltraScale+ MPSoC and includes four Xylon 1Mpixel video cameras based on the TI FPD (flat-panel display) Link-III interface. The kit supports HDMI video input and display output and comes complete with extensive software deliverables including pre-verified camera-to-display SoC designs built with licensed Xylon logicBRICKS IP cores, reference designs and design examples prepared for the Xilinx SDSoC Development Environment, and complete demo Linux applications.







Xylon’s new logiVID-ZU Vision Development Kit




Please contact Xylon for more information about the new logiVID-ZU Vision Development Kit.






Adam Taylor’s MicroZed Chronicles, Part 216: Capturing the HDMI video mode with the ADV7611 HDMI FMC

by Xilinx Employee, 09-18-2017 11:33 AM (edited 09-18-2017 11:34 AM)


By Adam Taylor



With the bit file generated, we are now able to create software that configures the ADV7611 Low-Power HDMI Receiver chip and the Zynq SoC’s VTC (Video Timing Controller). If we do this correctly, the VTC will then be able to report the input video mode.


To be able to receive and detect the video mode, the software must perform the following steps:


  • Initialize and configure the Zynq SoC’s I2C controller for master operation at 100 kHz
  • Initialize and configure the VTC
  • Configure the ADV7611
  • Sample the VTC once a second, reporting the detected video mode





ZedBoard, FMC HDMI, and the PYNQ dev board connected for testing




Configuring the I2C controller and the VTC is very simple. We have done both several times throughout this series (see these links: I2C, VTC). Configuring the ADV7611 is more complicated and is performed over I2C. This is where the example gets a little involved: the ADV7611 uses eight internal I2C slave addresses to configure its different sub-functions.







To reduce address-contention issues, seven of these addresses are user configurable. Only the IO Map has a fixed default address.


I2C addressing uses 7 bits. However, the ADV7611 documentation specifies 8-bit addresses, which includes a Read/Write bit. If we do not understand the translation between these 7- and 8-bit addresses, we will experience addressing issues because the Read/Write bit is set or cleared depending on the function we call from XIICPS.h.


The picture below shows the conversion from 8-bit to 7-bit format. The simplest method is to shift the 8-bit address one place to the right.







We need to create a header file containing the commands to configure each of the eight ADV7611’s sub functions.


This raises the question of where to obtain the information needed to configure the ADV7611 device. Rather helpfully, the Analog Devices EngineerZone provides several resources, including a recommended register-settings guide and several pre-tested scripts that you can download and use to configure the device for most use cases. All we need to do is select the desired use case and incorporate the commands into our header file.


One thing we must be very careful with is that the first command issued to the ADV7611 must be an I2C reset command. You may see a NACK on the I2C bus in response to this command because the reset asserts very quickly. We also need to wait an appropriate period after issuing the reset command before continuing to load commands. In this example, I decided to wait the same time as following a hard reset, which the data sheet specifies as 5 msec.


Once 5msec has elapsed following the reset, we can continue loading configuration data, which includes the Extended Display Identification Data (EDID) table. The EDID identifies to the source the capabilities of the display. Without a valid EDID table, the HDMI source will not start transmitting data.


Having properly configured the ADV7611, we may want to read back registers to ensure that it is properly configured or to access the device’s status. To do this successfully, we need to perform what is known as an I2C repeat start in the transaction following the initial I2C write. A repeat start is used when a master issues a write command and then wants to read back the result immediately. Issuing the repeat start prevents another device from interrupting the sequence.


We can configure the I2C controller to issue repeat starts between write and read operations within our software application by using the function call XIicPs_SetOptions(&Iic,XIICPS_REP_START_OPTION). Once we have completed the transaction we need to clear the repeat start option using the XIicPs_ClearOptions(&Iic,XIICPS_REP_START_OPTION) function call. Otherwise we may have issues with communication.


Once configured, the ADV7611 starts free running. It will generate HDMI Frames even with no source connected. The VTC will receive these input frames, lock to them and determine the video mode. We can obtain both the timing parameters and video mode by using the VTC API. The video modes that can be detected are:







Initially, in its free-running mode, the ADV7611 outputs video in 640x480-pixel format. Checking the VTC registers, it is also possible to observe that the detector has locked to the incoming sync signals and has detected the mode correctly, as shown in the image below:







With the free-running mode functioning properly, the next step is to stimulate the FMC HDMI with different resolutions to ensure that they are correctly detected.


To test the application, we will use a PYNQ dev board. The PYNQ board is ideal for this application because it is easily configured for different HDMI video standards using just a few lines of Python, as shown below. The only downside is that the PYNQ board does not generate fully compliant 1080p video timing.



SVGA video outputting 800 pixels by 600 lines @ 60Hz






720P video outputting 1280 pixels by 720 Lines @ 60 Hz






SXGA video outputting 1280 pixels by 1024 lines @ 60Hz







Having performed these tests, it is clear the ADV7611 on the FMC HDMI is working as required and is receiving and decoding different HDMI resolutions correctly. At the same time, the VTC is correctly detecting the video mode, enabling us to capture video data on our Zynq SoC or Zynq UltraScale+ MPSoC systems for further processing.


The FMC HDMI has another method of receiving HDMI that equalizes the channel and passes it through to the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL for decoding. I will create an example design based upon that input over the next few weeks.


Note that we can also use this same approach with a MicroBlaze soft processor core instantiated in a Xilinx FPGA.




Code is available on Github as always.



If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.




First Year E Book here

First Year Hardback here.



MicroZed Chronicles hardcopy.jpg 




Second Year E Book here

Second Year Hardback here



MicroZed Chronicles Second Year.jpg





Brandon Treece from National Instruments (NI) has just published an article titled “CPU or FPGA for image processing: Which is best?” on Vision-Systems.com. NI offers a Vision Development Module for LabVIEW, the company’s graphical systems design environment, which can run vision algorithms on both CPUs and FPGAs, so the perspective is a knowledgeable one. Abstracting the article, what you get from an FPGA-accelerated imaging pipeline is speed. If you’re performing four 6msec operations on each video frame, a CPU will need 24msec (four times 6msec) to complete the operations, while an FPGA offers you parallelism that shortens the processing time for each operation and permits overlap among the operations, as illustrated by this figure taken from the article:




NI Vision Acceleration.jpg 



In this example, the FPGA needs a total of 6msec to perform the four operations and another 2msec to transfer a video frame back and forth between processor and FPGA. The CPU needs a total of 24msec for all four operations. The FPGA needs 8msec, for a 3x speedup.


Treece then demonstrates that the acceleration is actually much greater in the real world. He uses the example of a video processing sequence needed for particle counting that includes these three major steps:



  • Convolution filtering to sharpen the image
  • Thresholding to produce a binary image
  • Morphology to remove holes in the binary particles



Here’s an image series that shows you what’s happening at each step:



NI Vision Acceleration Steps.jpg 



Using the NI Vision Development Module for LabVIEW, he then runs the algorithm on an NI cRIO-9068 CompactRIO controller, which is based on a Xilinx Zynq Z-7020 SoC. Running the algorithm on the Zynq SoC’s ARM Cortex-A9 processor takes 166.7msec per frame. Running the same algorithm but accelerating the video processing using the Zynq SoC’s integral FPGA hardware takes 8msec. Add in another 0.5msec for DMA transfer of the pre- and post-processed video frames between the Zynq SoC’s CPU and FPGA and you get about a 20x speedup.


A key point here is that because the cRIO-9068 controller is based on the Zynq SoC, and because NI’s Vision Development Module for LabVIEW supports FPGA-based algorithm acceleration, this is an easy choice to make. The resources are there for your use. You merely need to click the “Go-Fast” button.



For more information about NI’s Vision Development Module for LabVIEW and cRIO-9068 controller, please contact NI directly.





BrainChip Holdings has just announced the BrainChip Accelerator, a PCIe server-accelerator card that simultaneously processes 16 channels of video in a variety of video formats using spiking neural networks rather than convolutional neural networks (CNNs). The BrainChip Accelerator card is based on a 6-core implementation of BrainChip’s Spiking Neural Network (SNN) processor instantiated in an on-board Xilinx Kintex UltraScale FPGA.


Here’s a photo of the BrainChip Accelerator card:





BrainChip Accelerator card with six SNNs instantiated in a Kintex UltraScale FPGA




Each BrainChip core performs fast, user-defined image scaling, spike generation, and SNN comparison to recognize objects. The SNNs can be trained using low-resolution images as small as 20x20 pixels. According to BrainChip, SNNs as implemented in the BrainChip Accelerator cores excel at recognizing objects in low-light, low-resolution, and noisy environments.


The BrainChip Accelerator card can process 16 channels of video simultaneously with an effective throughput of more than 600 frames per second while dissipating a mere 15W for the entire card. According to BrainChip, that’s a 7x improvement in frames/sec/watt when compared to a GPU-accelerated CNN-based, deep-learning implementation for neural networks like GoogleNet and AlexNet. Here’s a graph from BrainChip illustrating this claim:




BrainChip Efficiency Chart.jpg 





SNNs mimic human brain function (synaptic connections, neuron thresholds) more closely than do CNNs and rely on models based on spike timing and intensity. Here’s a graphic from BrainChip comparing a CNN model with the Spiking Neural Network model:





BrainChip Spiking Neural Network comparison.jpg 



For more information about the BrainChip Accelerator card, please contact BrainChip directly.




By Adam Taylor



When we surveyed the different types of HDMI sources and sinks recently for our Zynq SoC and Zynq UltraScale+ MPSoC designs, one HDMI receiver we discussed was the ADV7611. This device receives three TMDS data streams and converts them into discrete video and audio outputs, which can then be captured and processed. The ADV7611 is a very capable but somewhat complex device that requires configuration prior to use. We are going to examine how we can include one within our design.






ZedBoard HDMI Demonstration Configuration




To do this, we need an ADV7611. Helpfully, the FMC HDMI card provides two HDMI inputs, one of which uses an ADV7611. The second equalizes the TMDS data lanes and passes them on directly to the Zynq SoC for decoding.


To demonstrate how we can get this device up and running with our Zynq SoC or Zynq UltraScale+ MPSoC, we will create an example that uses the ZedBoard with the HDMI FMC. For this example, we first need to create a hardware design in Vivado that interfaces with the ADV7611 on the HDMI FMC card. To keep this initial example simple, I will be only receiving the timing signals output by the ADV7611. These signals are:


  • Local Locked Clock (LLC) – The pixel clock.
  • HSync – Horizontal Sync, indicates the start of a new line.
  • VSync – Vertical Sync, indicates the start of a new frame.
  • Video Active – indicates that the pixel data is valid (e.g. we are not in a Sync or Blanking period)


This approach uses the VTC’s (Video Timing Controller’s) detector to receive the sync signals and identify the received video’s timing parameters and video mode. Once the VTC correctly identifies the video mode, we know the ADV7611 has been configured correctly. It is then a simple step to connect the received pixel data to a Video-to-AXIS IP block and use VDMA to write the received video frames into DDR memory for further processing.


For this example, we need the following IP blocks:


  • VTC (Video Timing Controller) – Configured for detection and to receive sync signals only.
  • ILA – Connected to the sync signals so that we can see that they are toggling correctly—to aid debugging and commissioning.
  • Constant – Set to a constant 1 to drive the clock-enable and detector-enable inputs.


The resulting block diagram appears below. The eagle-eyed will also notice the addition of both a GPIO output and an I2C bus from the processor system. We need these to control and configure the ADV7611.






Simple Architecture to detect the video type



Following power up, the ADV7611 generates no sync signals or video. We must first configure the device, which requires the use of an I2C bus. We therefore need to enable one of the two I2C controllers within the Zynq PS and route its IO to the EMIO so that we can connect the I2C signals (SDA and SCL) to the correct pins on the FMC connector. The ADV7611 is a complex device to configure, with multiple I2C addresses that address different internal functions within the device, such as EDID and High-bandwidth Digital Content Protection (HDCP).


We also need to be able to reset the ADV7611 following the application of power to the ZedBoard and FMC HDMI. We use a PS GPIO pin, output via the EMIO, to do this. Using a controllable I/O pin for this function allows the application software to reset the device each time we run the program. This capability is also helpful when debugging the software application because it ensures that we start from a fresh reset each time the program runs, preventing a previous configuration from affecting the next.


With the block diagram completed, all that remains is to build the design with the location constraints (identified below) to connect to the correct pins on the FMC connector for the ADV7611.






Vivado Constraints for the ADV7611 Design




Once Vivado generates the bit file, we are ready to begin configuring the ADV7611. Using the I2C interface this way is quite complex, so we will examine the necessary steps in detail in the next blog. However, the image below shows one set of results from testing the completed software as it detects a VGA (640 pixels by 480 lines) input:







VTC output when detecting VGA input format















Code is available on Github as always.



If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.




  • First Year E Book here
  • First Year Hardback here.




MicroZed Chronicles hardcopy.jpg




  • Second Year E Book here
  • Second Year Hardback here



MicroZed Chronicles Second Year.jpg 



A new open-source tool named GUINNESS makes it easy for you to develop binarized (2-valued) neural networks (BNNs) for Zynq SoCs and Zynq UltraScale+ MPSoCs using the SDSoC Development Environment. GUINNESS is a GUI-based tool that uses the Chainer deep-learning framework to train a binarized CNN. In a paper titled “On-Chip Memory Based Binarized Convolutional Deep Neural Network Applying Batch Normalization Free Technique on an FPGA,” presented at the recent 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, authors Haruyoshi Yonekawa and Hiroki Nakahara describe a system they developed to implement a binarized CNN for the VGG-16 benchmark on the Xilinx ZCU102 Eval Kit, which is based on a Zynq UltraScale+ ZU9EG MPSoC. Nakahara presented the GUINNESS tool again this week at FPL2017 in Ghent, Belgium.


According to the IEEE paper, the Zynq-based BNN is 136.8x faster and 44.7x more power efficient than the same CNN running on an ARM Cortex-A57 processor. Compared to the same CNN running on an Nvidia Maxwell GPU, the Zynq-based BNN is 4.9x faster and 3.8x more power efficient.


GUINNESS is now available on GitHub.







Xilinx ZCU102 Zynq UltraScale+ MPSoC Eval Kit








The Xilinx Technology Showcase 2017 will highlight FPGA-acceleration as used in Amazon’s cloud-based AWS EC2 F1 Instance and for high-performance, embedded-vision designs—including vision/video, autonomous driving, Industrial IoT, medical, surveillance, and aerospace/defense applications. The event takes place on Friday, October 6 at the Xilinx Summit Retreat Center in Longmont, Colorado.


You’ll also have a chance to see the latest ways you can use the increasingly popular Python programming language to create Zynq-based designs. The Showcase is a prelude to the 30-hour Xilinx Hackathon starting immediately after the Showcase. (See “Registration is now open for the Colorado PYNQ Hackathon—strictly limited to 35 participants. Apply now!”)


The Xilinx Technology Showcase runs from 3:00 to 5:00 PM.


Click here for more details and for registration info.






Xilinx Colorado, Longmont Facility





For more information about the FPGA-accelerated Amazon AWS EC2 F1 Instance, see:









Adam Taylor’s MicroZed Chronicles, Part 214: Addressing VDMA Issues

by Xilinx Employee, 09-05-2017 12:04 PM (edited 09-06-2017 08:41 AM)


By Adam Taylor



Video Direct Memory Access (VDMA) is one of the key IP blocks used within many image-processing applications. It allows frames to be moved between the Zynq SoC’s and Zynq UltraScale+ MPSoC’s PS and PL with ease. Once the frame is within the PS domain, we have several processing options available. We can implement high-level image processing algorithms using open-source libraries such as OpenCV and acceleration stacks such as the Xilinx reVISION stack if we wish to process images at the edge. Alternatively, we can transmit frames over Gigabit Ethernet, USB3, PCIe, etc. for offline storage or later analysis.


It can be infuriating when our VDMA-based image-processing chain does not work as intended. Therefore, we are going to look at a simple VDMA example and the steps we can take to ensure that it works as desired.


The simple VDMA example shown below contains the basic elements needed to provide VDMA output to a display. The processing chain starts with a VDMA read that obtains the current frame from DDR memory. To correctly size the data-stream width, we use an AXIS subset converter to convert the 32-bit data read from DDR memory into a 24-bit format with 8 bits for each RGB color component. Finally, we output the image with an AXIS-to-video output block that converts the AXIS stream to parallel video with video data and sync signals, using timing provided by the Video Timing Controller (VTC). We can use this parallel video output to drive a VGA, HDMI, or other video display output with an appropriate PHY.


This example outlines a read case from the PS to the PL and corresponding output. This is a more complicated case than performing a frame capture and VDMA write because we need to synchronize video timing to generate an output.







Simple VDMA-Based Image-Processing Pipeline




So what steps can we take if the VDMA-based image pipeline does not function as intended? To correct the issue:


  1. Check resets and clocks, as we would when debugging any application. Ensure that the reset polarity is correct for each module, as there will be mixed polarities. Ensure that the pixel clock is correct for the required video timing and that it is supplied to both the VTC and the AXIS-to-Video-Out blocks. Also check that the clock driving the AXIS network can support the required image throughput.
  2. Check that the clock enables on both the VTC and AXIS-to-Video-Out blocks are tied to the correct level to enable the clocks.
  3. Check that the VTC is correctly configured, especially if you are using the AXI interface to define the configuration through the application software. When configuring the VTC using AXI, it is important to make sure we have set the source registers to the VTC generator, enabled register updates, and defined the timing parameters required.
  4. Check the connections between the VTC and AXIS-to-Video-Out Blocks. Ensure that the horizontal and vertical blanking signals are also connected along with the horizontal and vertical syncs.
  5. Check the AXIS-to-Video-Out timing mode. If we are using VDMA, the timing mode of the AXIS-to-Video-Out block should be set to master. This enables the AXIS-to-Video-Out block to assert back pressure on the AXIS data stream to halt the frame-buffer output. This mechanism permits the AXIS-to-Video-Out block to manage the flow of pixels, enabling synchronization and lock. You may also want to increase the size of the internal buffer from the default.
  6. Check that the AXIS-to-Video-Out VTC_ce signal is not connected to the VTC gen clock enable, as it is when configured for slave operation. Such a connection will prevent the AXIS-to-Video-Out block from locking to the AXIS video stream.
  7. Insert ILAs. Inserting these within the design allows us to observe the detailed workings of the AXI buses. When commissioning a new image-processing pipeline, I insert ILA blocks on the VTC output and the VDMA MM-to-AXIS port so that I can observe the generated timing signals and the VDMA output stream. When observing the AXI stream, the tuser signal identifies the start of frame and the tlast signal marks the end of line. You may also want to observe the AXIS-to-Video-Out block’s 32-bit status output, which indicates the locked status along with additional debug information.
  8. Ensure that HSize and Stride are set correctly. These are defined by the application software and configure the VDMA with frame-store information. HSize represents the horizontal size of the image and Stride represents the distance in memory between the starts of successive image lines. Both HSize and Stride are defined in bytes. As such, when working with u32 or u16 types, take care to set these values to reflect the number of bytes used.



Hopefully by the time you have checked these points, the issue with your VDMA based image processing pipeline will have been identified and you can start developing the higher-level image processing algorithms needed for the application.



Code is available on Github as always.



If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.




  • First Year E Book here
  • First Year Hardback here.



MicroZed Chronicles hardcopy.jpg 




  • Second Year E Book here
  • Second Year Hardback here



MicroZed Chronicles Second Year.jpg 


A commitment to “Any Media over Any Network,” at a time when video has rapidly proliferated across all markets, requires another commitment: any-to-any video transcoding. That’s because the video you want is often not coded in the format you want (compression standard, bit rate, frame rate, resolution, color depth, etc.). As a result, transcoding has become a big deal, and supporting the myriad video formats already available, plus the new ones to come, is a big challenge.


Would you like some help? Wish granted.


Xilinx’s Pro AV & Broadcast Video Systems Architect Alex Luccisano is presenting two free, 1-hour webinars on September 26 that cover video transcoding and how you can use Xilinx Zynq UltraScale+ EV MPSoCs for real-time, multi-stream video transcoding in your next design.



Click here for the 7:00 am (PST), 14:00 (GMT) Webinar on September 26.


Click here for the 10:00 am (PST), 17:00 (GMT) Webinar on September 26.




A recent Sensorsmag.com article written by Nick Ni and Adam Taylor titled “Accelerating Sensor Fusion Embedded Vision Applications” discusses some of the sensor-fusion principles behind, among other things, the 3D stereo vision used in the Carnegie Robotics MultiSense stereo cameras discussed in today’s earlier blog titled “Carnegie Robotics’ FPGA-based GigE 3D cameras help robots sweep mines from a battlefield, tend corn, and scrub floors.” We’re putting a growing number of sensors into systems, and turning the deluge of raw sensor data into usable information is a tough computational job.


Describing some of that job’s particulars consumes the first half of Ni and Taylor’s article. The second half then discusses some implementation strategies based on the new Xilinx reVISION stack, which is built on top of Xilinx Zynq SoCs and Zynq UltraScale+ MPSoCs.


If there are a lot of sensors in your next design, particularly image sensors, be sure to take a look at this article.



Carnegie Robotics currently uses a Spartan-6 FPGA in its GigE 3D imaging sensors to fuse the video feeds from the two cameras in each stereo pair, generating 2.1 billion correspondence matches/sec from the left and right video streams and, from those matches, 15M points/sec of 3D point-cloud data. This data helps the company’s robots make safe movement decisions and avoid obstacles while operating in unknown, unstructured environments. The company’s 3D sensors are used in unmanned vehicles and robots that generally weigh between 100 and 1000 pounds and operate in unstructured environments in applications as diverse as agriculture, building maintenance, mining, and battlefield mine sweeping. All of this is described by Carnegie Robotics’ CTO Chris Osterwood in a new 3-minute “Powered by Xilinx” video, which appears below.


The company is a spinout of Carnegie Mellon University’s National Robotics Engineering Center (NREC), one of the world’s premier research and development organizations for advanced field robotics, machine vision and autonomy. It offers a variety of 3D stereo cameras including:



  • The MultiSense S7, a rugged, high-resolution, high-data-rate, high-accuracy GigE 3D imaging sensor.
  • The MultiSense S21, a long-range, low-latency GigE imaging sensor based on the S7 stereo-imaging sensor but with wide (21cm) separation between the stereo camera pair for increased range
  • The MultiSense SL, a tri-modal GigE imaging sensor that fuses high-resolution, high-accuracy 3D stereo vision from the company’s MultiSense S7 stereo-imaging sensor with laser ranging (0.4 to 10m).







Carnegie Robotics MultiSense SL Tri-Modal 3D Imaging Sensor





All of these Carnegie Robotics cameras consume less than 10W, thanks in part to the integrated Spartan-6 FPGA, which uses 1/10 of the power required by a CPU to generate 3D data from the 2.1 billion correspondence matches/sec. The Multisense SL served as the main perceptual “head” sensor for the six ATLAS robots that participated in the DARPA Robotics Challenge Trials in 2013. Five of these robots placed in the top eight finishers during the DARPA trials.


The video below also briefly discusses the company’s plans to migrate to a Zynq SoC, which will allow Carnegie Robotics’ sensors to perform more in-camera computation and will further reduce the overall robotic system’s size, weight, power consumption and image latency. That’s a lot of engineering dimensions all being driven in the right direction by the adoption of the more integrated Zynq SoC All Programmable technology.


Earlier this year, Carnegie Robotics and Swift Navigation announced that they were teaming up to develop a line of multi-sensor navigation products for autonomous vehicles, outdoor robotics, and machine control. Swift develops precision, centimeter-accurate GNSS (global navigation satellite system) products. The joint announcement included a photo of Swift Navigation’s Piksi Multi—a multi-band, multi-constellation RTK GNSS receiver clearly based on a Zynq Z-7020 SoC.








Swift Navigation Piksi Multi multi-band, multi-constellation RTK GNSS receiver, based on a Zynq SoC.




There are obvious sensor-fusion synergies between the product-design trajectory based on the Zynq SoC, as described by Chris Osterwood in the “Powered by Xilinx” video below, and Swift Navigation’s existing, Zynq-based Piksi Multi GNSS receiver.


Here’s the Powered by Xilinx video:







By Adam Taylor


With the hardware platform built using the Zynq-based Avnet MiniZed dev board, the next step in this adventure is to write the software so we can display images on the 7-inch touch display. To do this, we need to write a bare-metal software application that does the following:


  • Configure the video timing controller (VTC) to generate timings required for the 800x480-pixel WGA (Wide Video Graphics Array) display.
  • Create three frame buffers within the PS (processing system) DDR SDRAM.
  • Configure the FLIR Lepton IR camera and store images in the current write frame buffer.
  • Configure the VDMA to read from the current read frame buffer.


The first step is to configure the VTC to generate video timing signals for the desired resolution. Failing to do this correctly means that the AXI-Stream-to-Video-Out block won’t lock to the AXIS video stream.


The VTC is a core component, present in most image-processing pipelines (ISPs). The VTC’s function is not just limited to generating timing signals; it also detects video input timing. This feature allows the VTC to lock its timing generation with input video streams. That’s a key capability if the ISP needs to be agile and if it’s to adapt on the fly to changes in input resolution.









The VTC generator can be configured either from its own registers, which we update by writing to them directly, or from the VTC detector registers. For this exercise, we need to set the VTC generator register sources correctly because we are only using the generator half of the VTC and not the detector half. The VTC’s power-on default is to take configuration data from the detector registers, which is not the mode we wish to use here. To set the VTC register source, we’ll use a variable of the structure type XVtc_SourceSelect in conjunction with the function XVtc_SetSource().







Together these lines of code set the VTC control-register bits 8 to 26, which determine the source for each register. Each of these bits controls a specific generator register source. For example, bit 8 controls the Frame Horizontal Size register. Setting this bit to “0” instructs the VTC to use the detector settings while a “1” instructs the VTC to use the generator’s internal register settings.


Failing to do this results in writes to the generator registers having no effect on the generated video timing, which can be a rather frustrating issue to track down.


With the correct register source set, the next step is to write the timing parameters. We need the following settings for the 7-Inch touch display:








These parameters are stored in a variable of the XVtc_Timing type. We write them into the VTC using the XVtc_SetGeneratorTiming() function:







Of course, the VDMA and the frame buffers must also be aligned with the VTC. The current design uses three frame buffers to store the output images. Each frame buffer is based on the u32 type and declared as a one-dimensional array containing the total number of pixels in the image.


The u32 type is ideal for the frame buffer because each pixel in the 7-inch touch display requires eight-bit Red, Green, and Blue values. Therefore, we need 24 bits per pixel. Each frame buffer has an associated pointer that we’ll use for frame-buffer access. We initialize these pointers just after the program starts.


We use the VDMA to display the contents of the frame buffer. The key VDMA configuration parameters are stored within a variable of the type XAxiVdma_DmaSetup. This is where we define the vertical and horizontal sizes, the stride, and the frame-store addresses. The DMA is then configured using this data and the XAxiVdma_DmaConfig() and XAxiVdma_DmaSetBufferAddr() functions. One very important thing to remember here is that the horizontal size and stride are specified in bytes. So in this example, they are set to 800 * 4 because each u32 word consists of four bytes.
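Mixing up pixel counts and byte counts here is a classic bug, so a quick self-contained check (HoriSizeInput, Stride, and VertSizeInput in the comments are field names from the xaxivdma.h XAxiVdma_DmaSetup structure; the helper functions themselves are mine, for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define H_PIXELS 800u
#define V_LINES  480u

/* The VDMA's HoriSizeInput and Stride fields take BYTE counts,
 * while VertSizeInput takes a line count. With one u32 per pixel
 * the horizontal byte count is pixels * 4. */
static uint32_t hsize_bytes(void)  { return H_PIXELS * (uint32_t)sizeof(uint32_t); }
static uint32_t stride_bytes(void) { return H_PIXELS * (uint32_t)sizeof(uint32_t); }

/* Total bytes in one frame store. */
static uint32_t frame_bytes(void)  { return stride_bytes() * V_LINES; }
```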







We’ll use code from the previous example (p1 & p2) to interface with the FLIR Lepton IR camera. This code communicates with the camera over I2C and SPI interfaces. Once the image has been received from the camera, the code copies it into the frame buffer. However, to use most of the available display area, we apply a simple digital zoom to scale up the 80x60-pixel image from the Lepton 2 camera: we output each pixel eight times in both dimensions to generate a 640x480-pixel image, which we position within the 7-inch touch display’s 800x480 pixels. We set the remaining pixels to a constant color. As this is a touch display, this remaining space would be ideal for command buttons and other user-interface elements.
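A minimal sketch of the zoom-and-copy step, assuming 8-bit Lepton pixel data and the green-channel mapping described below (the function name and constant fill color are mine):

```c
#include <assert.h>
#include <stdint.h>

#define CAM_W   80
#define CAM_H   60
#define ZOOM     8            /* 80x60 -> 640x480 */
#define DISP_W 800
#define DISP_H 480

/* Replicate each Lepton pixel 8x in both dimensions, map the 8-bit
 * value onto the display's green channel, and fill the unused
 * right-hand band with a constant color. */
static void zoom_to_display(uint8_t src[CAM_H][CAM_W],
                            uint32_t dst[DISP_H][DISP_W])
{
    for (int y = 0; y < DISP_H; y++) {
        for (int x = 0; x < DISP_W; x++) {
            if (x < CAM_W * ZOOM)
                dst[y][x] = (uint32_t)src[y / ZOOM][x / ZOOM] << 8;
            else
                dst[y][x] = 0x000040u;   /* constant-color band */
        }
    }
}
```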


Putting all this together results in the image below. The green coloring comes from mapping the 8-bit Lepton image data into the green channel of the display.







This combination of the FLIR Lepton camera and the Zynq-based MiniZed dev board results in a very compact and cost-efficient thermal-imaging solution. The next step in our journey is to get the MiniZed’s wireless communications working with PetaLinux so that we can transmit these images over the air.



I have uploaded the initial complete design to GitHub; it is available here.



If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.




  • First Year E Book here
  • First Year Hardback here.



 MicroZed Chronicles hardcopy.jpg



  • Second Year E Book here
  • Second Year Hardback here



MicroZed Chronicles Second Year.jpg 


Pinnacle’s Denali-MC Real-Time, HDR-Capable Image Signal Processor Supports 29 CMOS Image Sensors

by Xilinx Employee ‎08-22-2017 10:33 AM - edited ‎08-22-2017 10:41 AM (7,682 Views)


Pinnacle Imaging Systems’ configurable Denali-MC HDR video and HDR still ISP (Image Signal Processor) IP supports 29 different HDR-capable CMOS image sensors (including nine Aptina/ON Semi, six Omnivision, and eleven Sony sensors) and twelve different pixel-level gain and frame-set HDR methods using 16-bit processing. The IP can be useful in a wide variety of applications including, but certainly not limited to:


  • Surveillance/Public Safety
  • ADAS/Autonomous Driving
  • Intelligent Traffic systems
  • Body Cameras
  • Machine Vision



Pinnacle Denali-MC ISP Core.jpg


Pinnacle’s Denali-MC Image Signal Processor Core Block Diagram




Pinnacle has implemented its Denali-MC IP on a Xilinx Zynq Z-7045 SoC (from the photo on the Denali-MC product page, it appears that Pinnacle used a Xilinx Zynq ZC706 Eval Kit as the implementation vehicle) and it has produced this impressive 3-minute video of the IP in real-time action:





Please contact Pinnacle directly for more information about the Denali-MC ISP IP. The data sheet for the Denali-MC ISP core is here.



Adam Taylor’s MicroZed Chronicles, Part 212: Building an IoT Application with the MiniZed Dev Board

by Xilinx Employee ‎08-21-2017 10:34 AM - edited ‎08-21-2017 10:48 AM (14,067 Views)


By Adam Taylor


Avnet’s Zynq-based MiniZed is one of the most interesting dev boards we have looked at in this series. Thanks to its small form factor and its WiFi and Bluetooth capabilities, it is ideal for demonstrating Internet of Things (IoT) applications. We are now going to combine the FLIR Lepton camera module with the MiniZed to create a simple IoT application.






The approach I am going to follow for this demonstration is to update the MiniZed PetaLinux hardware design to do the following:


  • Interface with the FLIR Lepton camera module
  • Implement a video-processing pipeline that supports a 7-inch touch display connected to the MiniZed’s Pmod ports


The use of the local 7-inch touch display has two purposes. First, it demonstrates that the FLIR Lepton camera and the MiniZed are working correctly before I invest too much time in getting WiFi image transmission working. Second, the touch display could be used for local control and display if required, in an industrial (IIoT) application for example.


Opening the existing MiniZed Vivado project, you will notice that it contains the Zynq SoC (for the first time in this series, a single-core Zynq) and an RTL block that interfaces with the WiFi and Bluetooth radio modules. This interface uses the processing system’s (PS’s) SDIO0 for WiFi and UART0 for Bluetooth. When we develop software, we must therefore remember to define STDIN/STDOUT as PS UART1 if we need a UART for debugging.


To this diagram we will add the following IP Blocks:


  • Quad SPI Core – Configured for single-mode operation. Receives the VoSPI from the Lepton.
  • Video Timing Controller – Generates the video timing signals for display output.
  • VDMA – Reads an image from the PS DDR and converts it into a PL (programmable logic) AXI Stream.
  • AXI Stream to Video Out – Converts the AXI Streamed video data to parallel video with timing synchronization provided by the Video Timing Core.
  • Zed_ALI3_Controller – Display controller for the 7-inch touch-screen display.


The Zed_ALI3_Controller IP block can be downloaded from the Avnet GitHub. Once it is downloaded, running the supplied Tcl script within the Vivado project creates an IP block we can include in our design.


The clocking architecture is now a little more complicated and includes the new Zed_ALI3_Controller block. This module generates the pixel clock, which is supplied to the VTC and the AXI Stream to Video Out blocks. Zynq-generated clocks provide the reference clock (33.33MHz) to the Zed_ALI3_Controller and the clocks for the AXI networks.


This demonstration uses two AXI networks. The first is the general-purpose (GP) network; the software uses this GP AXI network to configure IP blocks within the PL, including the VDMA and VTC.


The second AXI network uses the High Performance AXI interface to transfer images from the PS DDR memory into the image-processing stream in the PL.





The complete block diagram




We connect the FLIR Lepton camera module as we did previously (p1 & p2), to the MiniZed shield connector, making use of the shield’s I2C and SPI connections.


The I2C pins are mapped into the constraints file already used for the temperature and motion sensors. Therefore, all we need to do is add the SPI I/O pin locations and standards.


The FLIR Lepton camera’s AREF supply pin is not enabled. To power the camera on the shield connector as in the previous example, we take 5V power via a flying lead connected between the opposite shield connector’s 5V supply and the back of the FLIR Lepton camera.





FLIR Lepton Connected to the MiniZed in the Shield Header




We’ll need both Pmod connectors to output the image to the 7-inch display. The required pinout appears below. The differential pins on the Pmod connectors are used for the video output lines, with the I/O standard set to TMDS_33.






Pmod Pinout




With the basic hardware design in place, all that remains is to generate the software builds. Initially, I will build a bare-metal application to verify that the design functions as intended. This step-by-step process stems from my strong belief in incremental verification as a project progresses.




  • You need to install the MiniZed board definition files into your Vivado /data/boards/board_files directory to work with the MiniZed dev board. If you have not already done so, they are available here.


  • This blog welcomes Daniel Taylor, born today.




Code is available on Github as always.




Adam Taylor’s MicroZed Chronicles, Part 211: Working with HDMI using Zynq SoC and MPSoC Dev Boards

by Xilinx Employee ‎08-14-2017 10:12 AM - edited ‎08-14-2017 10:16 AM (9,237 Views)


By Adam Taylor


Throughout this series we have looked at numerous image-processing applications. One of the simplest ways to capture or display an image in these applications is HDMI (High Definition Multimedia Interface), a proprietary standard that carries HD digital video and audio data. Its widespread adoption by video displays and cameras makes HDMI an ideal interface for our Zynq-based image-processing applications.


In this blog, I am going to outline the different options for implementing HDMI in a Zynq design, using the different boards we have examined as targets. This exploration will also provide ideas for when we design our own custom hardware.





Arty Z7 HDMI In and Out Example




The several Zynq boards we have used in this series so far support HDMI using one of two methods: an external or internal CODEC.






Zynq-based boards with HDMI capabilities




If the board uses an external CODEC, it is fitted with an Analog Devices ADV7511 (for transmission) or ADV7611 (for reception). The external HDMI CODEC interfaces directly with the HDMI connector and generates the TMDS (Transition-Minimized Differential Signalling) signals containing the image and audio data.


The interface between the CODEC and the Zynq PL (programmable logic) consists of an I2C bus, a pixel-data bus, timing sync signals, and the pixel clock. We route the pixel data, sync signals, and clock directly into the PL. We use the I2C controller in the Zynq PS (processing system) for the I2C interface, with the Zynq SoC’s I2C IO signals routed via the EMIO to the PL IO.

To ease integration between the CODEC and the PL, Avnet has developed two IP cores, available on the Avnet GitHub. These IP blocks sit at the very beginning and end of the image-processing chain when you use them to interface to external CODECs.


The alternative approach is to use an internal CODEC located within the Zynq PL. In this case, the HDMI TMDS signals are routed directly to the PL IO and the CODEC is implemented in programmable logic. To save us having to write such complicated CODECs from scratch, Digilent provides two CODEC IP cores, available from the Digilent GitHub. When using these cores, the TMDS signals’ IO standard within the constraints file must be set to TMDS_33.


Note: This IO standard is only available on the High Range (HR) IO banks.





 HDMI IP Cores mentioned in the blog




Not every board I have discussed in the MicroZed Chronicles series can both receive and transmit HDMI signals. The ZedBoard and TySOM only provide HDMI output. If we are using one of these boards and the application must receive HDMI signals, we can use the FMC connector with an FMC HDMI input card.


The Digilent FMC-HDMI provides two HDMI inputs, demonstrating reception with both external and internal CODECs: the first input uses the ADV7611, while the second equalizes the HDMI signals and passes them through to be decoded directly in the Zynq PL.







This gives us the ability to demonstrate both internal and external CODECs for reception on the ZedBoard, while using its external CODEC for image transmission.


First, however, I need to get my soldering iron out and fit a jumper to J18 so that we can set VADJ on the ZedBoard to 3.3V, as required by the FMC-HDMI.


We should also remember that while I have predominantly talked about the Zynq SoC here, the same discussion applies to the Zynq UltraScale+ MPSoC, although that device family also incorporates DisplayPort capabilities.



Code is available on Github as always.








Two new papers, one about hardware and one about software, describe the Snowflake CNN accelerator and its accompanying Torch7 compiler, developed by researchers at Purdue University. The papers are titled “Snowflake: A Model Agnostic Accelerator for Deep Convolutional Neural Networks” (the hardware paper) and “Compiling Deep Learning Models for Custom Hardware Accelerators” (the software paper). The authors of both papers are Andre Xian Ming Chang, Aliasger Zaidy, Vinayak Gokhale, and Eugenio Culurciello of Purdue’s School of Electrical and Computer Engineering and Weldon School of Biomedical Engineering.


In the abstract, the hardware paper states:



“Snowflake, implemented on a Xilinx Zynq XC7Z045 SoC is capable of achieving a peak throughput of 128 G-ops/s and a measured throughput of 100 frames per second and 120 G-ops/s on the AlexNet CNN model, 36 frames per second and 116 Gops/s on the GoogLeNet CNN model and 17 frames per second and 122 G-ops/s on the ResNet-50 CNN model. To the best of our knowledge, Snowflake is the only implemented system capable of achieving over 91% efficiency on modern CNNs and the only implemented system with GoogLeNet and ResNet as part of the benchmark suite.”



The primary goal of the Snowflake accelerator design was computational efficiency. Efficiency and bandwidth are the two primary factors influencing accelerator throughput. The hardware paper says that the Snowflake accelerator achieves 95% computational efficiency and that it can process networks in real time. Because it is implemented on a Xilinx Zynq Z-7045, power consumption is a miserly 5W according to the software paper, well within the power budget of many embedded systems.


The hardware paper also states:



“Snowflake with 256 processing units was synthesized on Xilinx’s Zynq XC7Z045 FPGA. At 250MHz, AlexNet achieved 93.6 frames/s and 1.2GB/s of off-chip memory bandwidth, and 21.4 frames/s and 2.2GB/s for ResNet18.”



Here’s a block diagram of the Snowflake machine architecture from the software paper, from the micro level on the left to the macro level on the right:



Snowflake CNN Accelerator Block Diagram.jpg 



There’s room for future performance improvement, notes the hardware paper:



“The Zynq XC7Z045 device has 900 MAC units. Scaling Snowflake up by using three compute clusters, we will be able to utilize 768 MAC units. Assuming an accelerator frequency of 250 MHz, Snowflake will be able to achieve a peak performance of 384 G-ops/s. Snowflake can be scaled further on larger FPGAs by increasing the number of clusters.”
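As a sanity check, the quoted peak figures are consistent with each MAC unit contributing two operations (a multiply and an add) per clock cycle:

```c
#include <assert.h>
#include <stdint.h>

/* Peak throughput in G-ops/s for a given number of MAC units at a
 * given clock, counting a multiply and an add as two operations. */
static uint64_t peak_gops(uint64_t mac_units, uint64_t clock_mhz)
{
    return mac_units * 2u * clock_mhz / 1000u;
}
```

At 250MHz, 256 processing units give the 128 G-ops/s peak quoted earlier, and 768 MAC units give the projected 384 G-ops/s.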



This is where I point out that a Zynq Z-7100 SoC has 2020 “MAC units” (actually, DSP48E1 slices)—which is a lot more than you find on the Zynq Z-7045 SoC—and the Zynq UltraScale+ ZU15EG MPSoC has 3528 DSP48E2 slices—which is much, much larger still. If speed and throughput are what you desire in a CNN accelerator, then either of these parts would be worthy of consideration for further development.


Korea-based ATUS (Across The Universe) has developed a working automotive vision sensor that recognizes objects such as cars and pedestrians in a 17.53 frames/sec video stream. A CNN (convolutional neural network) performs the object recognition across 20 object classes and runs in the programmable-logic fabric of a Xilinx Zynq Z7045 SoC. The programmable logic clocks at 200MHz and the entire design draws 10.432W. That’s about 10% of the power required by CPUs or GPUs to implement this CNN.


Here’s a block diagram of the recognition engine in the Zynq SoC’s programmable logic fabric:






ATUS’ Object-Recognition CNN runs in the programmable logic fabric of a Zynq Z7045 SoC




Here’s a short video of ATUS’ Automotive Vision Sensor in action, running on a Xilinx ZC706 eval kit:






Please contact ATUS for more information about their Automotive Vision Sensor.





The latest “Powered by Xilinx” video, published today, provides more detail about the Perrone Robotics MAX development platform for developing all types of autonomous robots—including self-driving cars. MAX is a set of software building blocks for handling many types of sensors and controls needed to develop such robotic platforms.


Perrone Robotics has MAX running on the Xilinx Zynq UltraScale+ MPSoC and relies on that heterogeneous All Programmable device to handle the multiple, high-bit-rate data streams from complex sensor arrays that include lidar systems and multiple video cameras.


Perrone is also starting to develop with the new Xilinx reVISION stack and plans to both enhance the performance of existing algorithms and develop new ones for its MAX development platform.


Here’s the 4-minute video:



About the Author

Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He has served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.