
Adam Taylor’s MicroZed Chronicles, Part 211: Working with HDMI using Zynq SoC and MPSoC Dev Boards

by Xilinx Employee 08-14-2017 10:12 AM - edited 08-14-2017 10:16 AM (1,580 Views)

 

By Adam Taylor

 

Throughout this series we have looked at numerous image-processing applications. One of the simplest ways to capture or display an image in these applications is using HDMI (High Definition Multimedia Interface). HDMI is a proprietary standard that carries HD digital video and audio data. It is a widely adopted standard supported by many video displays and cameras. Its widespread adoption makes HDMI an ideal interface for our Zynq-based image processing applications.

 

In this blog, I am going to outline the different options for implementing HDMI in our Zynq designs, using the different boards we have looked at as targets. This exploration will also provide ideas for when we are designing our own custom hardware.

 

 

 
Image1.jpg

 

Arty Z7 HDMI In and Out Example

 

 

 

The Zynq boards we have used in this series so far support HDMI using one of two methods: an external CODEC or an internal CODEC.

 

 

 

Image2.jpg

 

Zynq-based boards with HDMI capabilities

 

 

 

If the board uses an external CODEC, it is fitted with an Analog Devices ADV7511 or ADV7611 for transmission and reception respectively. The external HDMI CODEC interfaces directly with the HDMI connector and generates the TMDS (Transition-Minimized Differential Signalling) signals containing the image and audio data.

 

The interface between the CODEC and the Zynq PL (programmable logic) consists of an I2C bus, a pixel-data bus, timing sync signals, and the pixel clock. We route the pixel data, sync signals, and clock directly into the PL. We use the I2C controller in the Zynq PS (processing system) for the I2C interface, with the Zynq SoC’s I2C IO signals routed via the EMIO to the PL IO.
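
To make the EMIO-routed I2C path a little more concrete, here is a minimal bare-metal sketch of writing one CODEC configuration register from the Zynq PS using the standalone XIicPs driver. The device address, register number, and value below are placeholders for illustration only; the real values come from the CODEC’s datasheet, and the EMIO routing itself is set up in the Vivado hardware design.

#include "xparameters.h"
#include "xiicps.h"

/* Placeholder values for illustration; consult the CODEC datasheet. */
#define CODEC_I2C_ADDR   0x39    /* 7-bit I2C address (assumed)  */
#define CODEC_REG        0x41    /* register number (assumed)    */
#define CODEC_REG_VALUE  0x10    /* value to write (assumed)     */

static XIicPs Iic;

int write_codec_register(void)
{
    XIicPs_Config *cfg = XIicPs_LookupConfig(XPAR_XIICPS_0_DEVICE_ID);
    if (cfg == NULL || XIicPs_CfgInitialize(&Iic, cfg, cfg->BaseAddress) != XST_SUCCESS)
        return XST_FAILURE;

    XIicPs_SetSClk(&Iic, 100000);    /* 100 kHz SCL */

    /* One register write: register number followed by the new value */
    u8 buf[2] = { CODEC_REG, CODEC_REG_VALUE };
    return XIicPs_MasterSendPolled(&Iic, buf, sizeof(buf), CODEC_I2C_ADDR);
}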

To ease integration between the CODEC and the PL, Avnet has developed two IP cores, which are available on the Avnet GitHub. In the image-processing chain, these IP blocks sit at the very beginning and end of the chain if you are using them to interface to external CODECs.

 

The alternative approach is to use an internal CODEC located within the Zynq PL. In this case, the HDMI TMDS signals are routed directly to the PL IO and the CODEC is implemented in programmable logic. To save having to write such complicated CODECs from scratch, Digilent provides two CODEC IP cores, available from the Digilent GitHub. Using these cores within the design means that the TMDS signals’ IO standard in the constraints file must be set to TMDS_33.

 

Note: This IO standard is only available on the High Range (HR) IO banks.

 

 

Image3.jpg

 

 HDMI IP Cores mentioned in the blog

 

 

 

Not every board I have discussed in the MicroZed Chronicles series can both receive and transmit HDMI signals. The ZedBoard and TySOM only provide HDMI output. If we are using one of these boards and the application must receive HDMI signals, we can use the FMC connector with an FMC HDMI input card.

 

The Digilent FMC-HDMI provides two HDMI inputs and the ability to receive HDMI data using both external and internal CODECs. The first input uses the ADV7611, while the second equalizes the HDMI signals and passes them through to be decoded directly in the Zynq PL.

 

 

Image4.jpg

 

 

 

This provides us with the ability to demonstrate how both internal and external CODECs can be implemented for HDMI reception on the ZedBoard, which uses an external CODEC for image transmission.

 

However, first I need to get my soldering iron out to fit a jumper to J18 so that we can set VADJ on the ZedBoard to 3.3V, as required by the FMC-HDMI.

 

We should also remember that while I have predominantly talked about the Zynq SoC here, the same discussion applies to the Zynq UltraScale+ MPSoC, although that device family also incorporates DisplayPort capabilities.

 

 

Code is available on GitHub as always.

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

 MicroZed Chronicles hardcopy.jpg

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

Two new papers, one about hardware and one about software, describe the Snowflake CNN accelerator and accompanying Torch7 compiler developed by several researchers at Purdue U. The papers are titled “Snowflake: A Model Agnostic Accelerator for Deep Convolutional Neural Networks” (the hardware paper) and “Compiling Deep Learning Models for Custom Hardware Accelerators” (the software paper). The authors of both papers are Andre Xian Ming Chang, Aliasger Zaidy, Vinayak Gokhale, and Eugenio Culurciello from Purdue’s School of Electrical and Computer Engineering and the Weldon School of Biomedical Engineering.

 

In the abstract, the hardware paper states:

 

 

“Snowflake, implemented on a Xilinx Zynq XC7Z045 SoC is capable of achieving a peak throughput of 128 G-ops/s and a measured throughput of 100 frames per second and 120 G-ops/s on the AlexNet CNN model, 36 frames per second and 116 Gops/s on the GoogLeNet CNN model and 17 frames per second and 122 G-ops/s on the ResNet-50 CNN model. To the best of our knowledge, Snowflake is the only implemented system capable of achieving over 91% efficiency on modern CNNs and the only implemented system with GoogLeNet and ResNet as part of the benchmark suite.”

 

 

The primary goal of the Snowflake accelerator design was computational efficiency. Efficiency and bandwidth are the two primary factors influencing accelerator throughput. The hardware paper says that the Snowflake accelerator achieves 95% computational efficiency and that it can process networks in real time. Because it is implemented on a Xilinx Zynq Z-7045, power consumption is a miserly 5W according to the software paper, well within the power budget of many embedded systems.

 

The hardware paper also states:

 

 

“Snowflake with 256 processing units was synthesized on Xilinx's Zynq XC7Z045 FPGA. At 250MHz, AlexNet achieved 93.6 frames/s and 1.2GB/s of off-chip memory bandwidth, and 21.4 frames/s and 2.2GB/s for ResNet18.”

 

 

Here’s a block diagram of the Snowflake machine architecture from the software paper, from the micro level on the left to the macro level on the right:

 

 

Snowflake CNN Accelerator Block Diagram.jpg 

 

 

There’s room for future performance improvement, notes the hardware paper:

 

 

“The Zynq XC7Z045 device has 900 MAC units. Scaling Snowflake up by using three compute clusters, we will be able to utilize 768 MAC units. Assuming an accelerator frequency of 250 MHz, Snowflake will be able to achieve a peak performance of 384 G-ops/s. Snowflake can be scaled further on larger FPGAs by increasing the number of clusters.”
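
For context, the quoted peak figure follows from counting each multiply-accumulate as two operations: 768 MACs × 250 MHz × 2 ops per MAC = 384 G-ops/s.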

 

 

This is where I point out that a Zynq Z-7100 SoC has 2020 “MAC units” (actually, DSP48E1 slices)—which is a lot more than you find on the Zynq Z-7045 SoC—and the Zynq UltraScale+ ZU15EG MPSoC has 3528 DSP48E2 slices—which is much, much larger still. If speed and throughput are what you desire in a CNN accelerator, then either of these parts would be worthy of consideration for further development.

 

Korea-based ATUS (Across The Universe) has developed a working automotive vision sensor that recognizes objects such as cars and pedestrians in a 17.53 frames/sec video stream. A CNN (convolutional neural network) performs the object recognition on 20 different object classes and runs in the programmable logic fabric of a Xilinx Zynq Z-7045 SoC. The programmable logic clocks at 200MHz and the entire design draws 10.432W. That’s about 10% of the power required by CPUs or GPUs to implement this CNN.

 

Here’s a block diagram of the recognition engine in the Zynq SoC’s programmable logic fabric:

 

 

 

ATUS CNN.jpg

 

ATUS’ Object-Recognition CNN runs in the programmable logic fabric of a Zynq Z-7045 SoC

 

 

 

Here’s a short video of ATUS’ Automotive Vision Sensor in action, running on a Xilinx ZC706 eval kit:

 

 

 

 

 

Please contact ATUS for more information about their Automotive Vision Sensor.

 

 

 

 

The latest “Powered by Xilinx” video, published today, provides more detail about the Perrone Robotics MAX development platform for developing all types of autonomous robots—including self-driving cars. MAX is a set of software building blocks for handling many types of sensors and controls needed to develop such robotic platforms.

 

Perrone Robotics has MAX running on the Xilinx Zynq UltraScale+ MPSoC and relies on that heterogeneous All Programmable device to handle the multiple, high-bit-rate data streams from complex sensor arrays that include lidar systems and multiple video cameras.

 

Perrone is also starting to develop with the new Xilinx reVISION stack and plans to both enhance the performance of existing algorithms and develop new ones for its MAX development platform.

 

Here’s the 4-minute video:

 

 

 

Last month, I wrote about Perrone Robotics’ Autonomous Driving Platform based on the Zynq UltraScale+ MPSoC. (See “Linc the autonomous Lincoln MKZ running Perrone Robotics' MAX AI takes a drive in Detroit without puny humans’ help” and “Perrone Robotics builds [Self-Driving] Hot Rod Lincoln with its MAX platform, on a Zynq UltraScale+ MPSoC.”) That platform runs on a controller box supplied by iVeia. In the 2-minute video below, iVeia’s CTO Mike Fawcett describes the attributes of the Zynq UltraScale+ MPSoC that make it a superior implementation technology for autonomous driving platforms. The device’s immense, heterogeneous computing power, supplied by six ARM processors plus programmable logic and several other programmable resources, flexibly delivers the monumental amount of processing required for vehicular sensor fusion and real-time perception processing, while consuming far less power and generating far less heat than competing solutions built on CPUs or GPUs.

 

Here’s the video:

 

 

 

 

 

Adam Taylor’s MicroZed Chronicles, Part 206: Software for the Digilent Nexys Video Project

by Xilinx Employee 07-12-2017 10:21 AM - edited 07-12-2017 11:04 AM (6,150 Views)

 

By Adam Taylor

 

With the MicroBlaze soft processor system up and running on the Nexys Video Artix-7 FPGA Trainer Board, we need some software to generate a video output signal. In this example, we are going to use the MicroBlaze processor to generate test patterns. To do this, we’ll write data into the Nexys board’s DDR SDRAM so that the VDMA can read this data and output it over HDMI.

 

The first thing we will need to do in the software is define the video frames, which are going to be stored in memory and output by the VDMA. To do this, we will define three frames within memory. We will define each frame as a two-dimensional array:

 

u8 frameBuf[DISPLAY_NUM_FRAMES][DEMO_MAX_FRAME];

 

Where DISPLAY_NUM_FRAMES is set to 3 and DEMO_MAX_FRAME is set to 1920 * 1080 * 3. This accounts for the maximum frame resolution, and the final multiplication by 3 accommodates the three bytes per pixel (8 bits each for red, green, and blue).

 

To access these frames, we use an array of pointers to each of the three frame buffers. Defining things this way eases our interaction with the frames.
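
A minimal sketch of that arrangement appears below. The frameBuf, DISPLAY_NUM_FRAMES, and DEMO_MAX_FRAME names come from the declaration above; the pointer array and function names are illustrative rather than taken from the demo sources.

#include "xil_types.h"

#define DISPLAY_NUM_FRAMES 3
#define DEMO_MAX_FRAME     (1920 * 1080 * 3)   /* maximum resolution, 3 bytes per pixel */

/* Three frame buffers in DDR plus an array of pointers for convenient access */
u8  frameBuf[DISPLAY_NUM_FRAMES][DEMO_MAX_FRAME];
u8 *pFrames[DISPLAY_NUM_FRAMES];

void init_frame_pointers(void)
{
    for (int i = 0; i < DISPLAY_NUM_FRAMES; i++)
        pFrames[i] = frameBuf[i];
}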

 

With the frames defined, the next step is to initialize and configure the peripherals within the design. These are:

 

  • VDMA – Uses DMA to move data from the board’s DDR SDRAM to the output video chain.
  • Dynamic Clocking IP – Outputs the pixel clock frequency and multiples of this frequency for the HDMI output.
  • Video Timing Controller 0 – Defines the output display timing depending upon resolution.
  • Video Timing Controller 1 – Determines the video timing of the received input. In this demo, this controller grabs input frames from a source.

 

To ensure the VDMA functions correctly, we need to define the stride. This is the separation in bytes between the start of each line within the DDR memory. For this application, the stride is 3 * 1920, which corresponds to the maximum line length.
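
As a quick illustration of how the stride is used when addressing a pixel, the plain-C helper below assumes the 3-bytes-per-pixel layout described above (the names are illustrative; u8 and u32 come from xil_types.h):

#define DEMO_STRIDE (1920 * 3)   /* bytes between the start of consecutive lines */

/* Byte address of pixel (x, y) within a frame.  The line pitch in memory stays
 * at the full-HD stride even when a smaller resolution is being displayed. */
static inline u8 *pixel_addr(u8 *frame, u32 x, u32 y)
{
    return frame + (y * DEMO_STRIDE) + (x * 3);
}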

When it comes to the application, we will be able to set different display resolutions from 640x480 to 1920x1080.

 

 

Image1.jpg 

 

 

No matter what resolution we select, we will be able to draw test patterns on the screen using software functions that write to the DDR SDRAM. When we change resolutions, we will need to reconfigure the VDMA, Video Timing Controller 0, and the dynamic clocking module.
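
As a rough sketch of what the VDMA part of that reconfiguration can look like, here is a version using the standalone xaxivdma driver and the frame-buffer pointers from the earlier sketch. The Video Timing Controller and dynamic clocking steps are omitted, error handling is minimal, and structure-field names can differ slightly between driver versions, so treat this as a starting point rather than the demo’s actual code.

#include "xaxivdma.h"

/* Reprogram the VDMA read (memory-to-stream) channel for a new resolution. */
int reconfigure_vdma_read(XAxiVdma *vdma, u32 width, u32 height)
{
    XAxiVdma_DmaSetup setup = {0};

    setup.VertSizeInput     = height;
    setup.HoriSizeInput     = width * 3;   /* bytes actually read per line    */
    setup.Stride            = 1920 * 3;    /* line pitch of the frame buffers */
    setup.FrameDelay        = 0;
    setup.EnableCircularBuf = 1;

    for (int i = 0; i < DISPLAY_NUM_FRAMES; i++)
        setup.FrameStoreStartAddr[i] = (UINTPTR)pFrames[i];

    if (XAxiVdma_DmaConfig(vdma, XAXIVDMA_READ, &setup) != XST_SUCCESS)
        return XST_FAILURE;
    if (XAxiVdma_DmaSetBufferAddr(vdma, XAXIVDMA_READ, setup.FrameStoreStartAddr) != XST_SUCCESS)
        return XST_FAILURE;

    return XAxiVdma_DmaStart(vdma, XAXIVDMA_READ);
}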

 

Our next step is to generate video output. With this example, there are many functions within the main application that generate, capture, and display video. These are:

 

  1. Bar Test Pattern – Generates several color bars across the screen
  2. Blended Test Pattern – Generates a blended color test pattern across the screen
  3. Streaming from the HDMI input to the output
  4. Grab an input frame and invert colors
  5. Grab an input frame and scale to the current display resolution

 

Within each of these functions, we pass a pointer to the frame currently being output so that we can modify the pixel values in memory. This can be done simply as shown in the code snippet below, which sets the red, blue, and green pixels. Each pixel color value is an unsigned 8-bit number.
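
As a rough sketch of the idea, here is a function that fills the current frame with eight color bars. The byte ordering of the three color channels and the function name are assumptions, not taken from the demo code.

/* Fill the active frame with eight vertical color bars. */
void draw_color_bars(u8 *frame, u32 width, u32 height, u32 stride)
{
    static const u8 bars[8][3] = {
        {255, 255, 255}, {0, 255, 255}, {255, 255, 0}, {0, 255, 0},
        {255, 0, 255},   {0, 0, 255},   {255, 0, 0},   {0, 0, 0}
    };

    for (u32 y = 0; y < height; y++) {
        for (u32 x = 0; x < width; x++) {
            const u8 *c = bars[(x * 8) / width];   /* which bar this column falls in */
            u8 *p = frame + (y * stride) + (x * 3);
            p[0] = c[0];  p[1] = c[1];  p[2] = c[2];
        }
    }
}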

 

 

Image2.jpg 

 

 

When we run the application, we can choose which of the functions we want to exercise using the menu output over the UART terminal:

 

 

Image3.jpg 

 

 

 

Setting the program to output the color bars and the blended test pattern gave the outputs below on my display:

 

 

 

Image4.jpg 

 

 

Now we know how we can write information to DDR memory and see it appear on our display. We could generate a Mandelbrot pattern using this approach pretty simply and I will put that on my list of things to cover in a future blog.

 

 

Code is available on GitHub as always.

 

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

 MicroZed Chronicles hardcopy.jpg

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

 MicroZed Chronicles Second Year.jpg

 

 

 

 

Mindray, one of the world’s top medical ultrasound vendors, believes that the ZONE Sonography Technology (ZST+) in its cart-based Resona 7 premium color ultrasound system delivers unprecedented ultrasound imaging quality that helps doctors non-invasively peer into their patients with much better clarity, which in turn helps them achieve a deeper understanding of the images and deliver better, more accurate diagnoses than was previously possible. According to the company, ZST+ takes medical ultrasound imaging from “conventional beamforming” to “channel data-based processing” that enhances images through advanced acoustic acquisition (10x faster than conventional line-by-line beamforming), dynamic pixel focusing (provides pixel uniformity from near field to far field), sound-speed compensation (allows for tissue variation), enhanced channel-data processing (improves image clarity), and total-recall imaging (permits retrospective processing of complete, captured data sets, further improving image clarity and reducing the need for repeated scanning).

 

 

 

Mindray Resona 7 Premium Ultrasound Imaging System v2.jpg

 

Mindray Resona 7 Premium Ultrasound System

 

 

Many of these advanced, real-time, ultrasound-processing and -imaging features are made possible by and implemented in a Xilinx Kintex-7 FPGA. For example, one of the advanced features enabled by ZST+ is “V Flow,” which can show blood flow direction and velocity using colored arrow overlays in an image with a refresh rate as fast as 600 images/sec. Here’s a mind-blowing, 6-second YouTube video by Medford Medical Solutions LLC showing just what this looks like:

 

 

 

 

Mindray V Flow Real-Time Ultrasound Blood-Flow Imaging

 

 

That’s real-time blood flow imaging, and it’s the kind of high-performance image-processing speed you can only achieve using programmable logic.

 

The Resona 7 system provides many advanced ultrasound-imaging capabilities in addition to V Flow. Because of this broad capability spectrum, doctors are able to use the Resona series of medical ultrasound imaging machines in radiology applications—including abdominal imaging and imaging of small organs and blood vessels; vascular hemodynamics evaluation; and obstetrics/gynecology applications including fetal CNS (central nervous system) imaging. (The fetal brain undergoes major developmental changes throughout pregnancy.) Resona 7 systems are also used for clinical medical research.

 

 

Mindray Fetal 3D Image v2.jpg

 

Fetal 3D image generated by a Mindray Resona 7 Premium Ultrasound Imaging System

 

 

 

Since its founding, the company has continuously explored ways to improve diagnostic confidence in ultrasound imaging. The recently developed ZST+ collects the company’s latest imaging advances into one series of imaging systems. However, ZST+ is not “finished.” Mindray is constantly improving the component ZST+ technologies and has just released version 2.0 of the Resona 7’s operating software. That continuous improvement effort explains why Mindray selected Xilinx All Programmable technology in the form of a Kintex-7 FPGA, which permits the revision and enhancement of existing real-time features and the addition of new features through what is effectively a software upgrade. Because of this, Mindray calls ZST+ a “living technology” and believes that the Kintex-7 FPGA is the core of this living technology.

 

 

Free Webinar on “Any Media Over Any Network: Streaming and Recording Design Solutions.” July 18

by Xilinx Employee 07-11-2017 11:21 AM - edited 07-11-2017 12:44 PM (4,999 Views)

 

On July 18 (that’s one week from today), Xilinx’s Video Systems Architect Alex Luccisano will be presenting a free 1-hour Webinar on streaming media titled “Any Media Over Any Network: Streaming and Recording Solution.” He’ll be discussing key factors such as audio/video codecs, bit rates, formats, and resolutions in the development of OTT (over-the-top) and VOD (video-on-demand) boxes and live-streaming equipment. Alex will also be discussing the Xilinx Zynq UltraScale+ MPSoC EV device family, which incorporates a hardened, multi-stream AVC/HEVC simultaneous encode/decode block that supports UHD-4Kp60. That’s the kind of integration you need to develop highly differentiated pro AV and broadcast products (and any other streaming-media or recording products) that stand well above the competition.

 

Register here.

 

 

By Adam Taylor

 

With the Vivado design for the Lepton thermal imaging IR camera built and the breakout board connected to the Arty Z7 dev board, the next step is to update the software so that we can receive and display images. To do this, we can also use the HDMI-out example software application as this correctly configures the board’s VDMA output. We just need to remove the test-pattern generation function and write our own FLIR control and output function as a replacement.

 

This function must do the following:

 

 

  1. Configure the I2C and SPI peripherals using the XIicPs and XSpi APIs provided when we generated the BSP. To ensure that we can communicate with the Lepton camera, we need to set the I2C address to 0x2A and configure the SPI for CPOL=1, CPHA=1, and master operation (a configuration sketch follows this list).
  2. Once we can communicate over the I2C interface, we need to read the status register to determine whether the Lepton camera module is ready. If the camera is correctly configured and ready when we read this register, the Lepton camera will respond with 0x06.
  3. With the camera module ready, we can read out an image and store it within memory. To do this we execute several SPI reads.
  4. Having captured the image, we can move the stored image into the memory location being accessed by VDMA to display the image.
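
A minimal sketch of step 1 appears below, assuming the standalone XIicPs and XSpi drivers and polled operation. The device-ID macros depend on how the peripherals are named in the Vivado design, so treat them as placeholders.

#include "xparameters.h"
#include "xiicps.h"
#include "xspi.h"

#define LEPTON_I2C_ADDR  0x2A   /* 7-bit command/control address from step 1 */

static XIicPs Iic;
static XSpi   Spi;

int init_lepton_interfaces(void)
{
    /* PS I2C controller, routed to the Shield pins via EMIO */
    XIicPs_Config *iic_cfg = XIicPs_LookupConfig(XPAR_XIICPS_0_DEVICE_ID);
    if (iic_cfg == NULL ||
        XIicPs_CfgInitialize(&Iic, iic_cfg, iic_cfg->BaseAddress) != XST_SUCCESS)
        return XST_FAILURE;
    XIicPs_SetSClk(&Iic, 100000);            /* 100 kHz is a conservative start */

    /* AXI Quad SPI in the PL, used as a standard SPI master for VoSPI */
    XSpi_Config *spi_cfg = XSpi_LookupConfig(XPAR_AXI_QUAD_SPI_0_DEVICE_ID);
    if (spi_cfg == NULL ||
        XSpi_CfgInitialize(&Spi, spi_cfg, spi_cfg->BaseAddress) != XST_SUCCESS)
        return XST_FAILURE;

    /* Master mode with CPOL = 1 (clock idles high) and CPHA = 1 */
    XSpi_SetOptions(&Spi, XSP_MASTER_OPTION |
                          XSP_CLK_ACTIVE_LOW_OPTION |
                          XSP_CLK_PHASE_1_OPTION);
    XSpi_Start(&Spi);
    XSpi_IntrGlobalDisable(&Spi);            /* polled transfers */
    XSpi_SetSlaveSelect(&Spi, 0x01);         /* select the Lepton on slave-select bit 0 */

    return XST_SUCCESS;
}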

 

 

To successfully read out an image from the Lepton camera, we need to synchronize the VoSPI output to find the start of the first line in the image. The camera outputs each line as a 160-byte block (Lepton 2) or two 160-byte blocks (Lepton 3), and each block has a 2-byte ID and a 2-byte CRC. We can use this ID to capture the image, identify valid frames, and store them within the image store.
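
A sketch of that synchronization for the Lepton 2 is shown below, reusing the XSpi instance configured in the earlier sketch. It assumes a 164-byte packet (2-byte ID, 2-byte CRC, and 160 payload bytes) and treats packets whose ID reads 0xFxx as discard packets; CRC checking and resynchronization after missed packets are left out for brevity.

#define VOSPI_PACKET_BYTES 164
#define LEPTON_WIDTH        80
#define LEPTON_HEIGHT       60

extern XSpi Spi;                         /* configured in the earlier sketch */

/* Capture one 80x60 frame of 14-bit pixels into img[][]. */
void capture_lepton_frame(u16 img[LEPTON_HEIGHT][LEPTON_WIDTH])
{
    u8  packet[VOSPI_PACKET_BYTES];
    int lines_done = 0;

    while (lines_done < LEPTON_HEIGHT) {
        /* One SPI transfer per packet; the transmitted bytes are don't-care */
        XSpi_Transfer(&Spi, packet, packet, VOSPI_PACKET_BYTES);

        u16 id = ((u16)packet[0] << 8) | packet[1];
        if ((id & 0x0F00) == 0x0F00)     /* discard packet, not image data */
            continue;

        u16 line = id & 0x0FFF;          /* packet number = line number for Lepton 2 */
        if (line >= LEPTON_HEIGHT)
            continue;

        for (int x = 0; x < LEPTON_WIDTH; x++)
            img[line][x] = ((u16)packet[4 + 2 * x] << 8) | packet[5 + 2 * x];

        lines_done++;
    }
}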

 

Performing steps 3 and 4 allows us to increase the size of the displayed image on the screen. The Lepton 2 camera used for this example has a resolution of only 80 horizontal pixels by 60 vertical pixels. This image would be very small when displayed on a monitor, so we can easily scale the image to 640x480 pixels by outputting each pixel and line eight times. This scaling produces a larger image that’s easier to recognize on the screen, although it may look a little blocky.
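
One simple way to express that replication in C is shown below. This sketch assumes the 14-bit thermal data has already been reduced to an 8-bit grey value per pixel and that the display frame buffer uses 3 bytes per pixel, as in the HDMI-out example.

#define SCALE 8   /* 80x60 -> 640x480 by pixel and line replication */

/* Nearest-neighbour upscale of the Lepton image into the display frame buffer. */
void scale_to_display(const u8 src[60][80], u8 *frame, u32 stride)
{
    for (u32 y = 0; y < 60 * SCALE; y++) {
        for (u32 x = 0; x < 80 * SCALE; x++) {
            u8  grey = src[y / SCALE][x / SCALE];
            u8 *p    = frame + (y * stride) + (x * 3);
            p[0] = grey;  p[1] = grey;  p[2] = grey;   /* grey: R = G = B */
        }
    }
}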

 

However, scaling alone will not present the best image quality as we have not configured the Lepton camera module to optimize its output. To get the best quality image from the camera module, we need to use the I2C command interface to enable parameters such as AGC (automatic gain control), which affects the contrast and quality of the output image, and flat-field correction to remove pixel-to-pixel variation.

 

To write or read back the camera module’s settings, we need to create a data structure as shown below and write that structure into the camera module. If we are reading back the settings, we then perform an I2C read to retrieve the parameters. Each 16-bit access requires two 8-bit commands (a code sketch of the sequence appears after the list):

 

  • Write to the command word at address 0x00 0x04.
  • Generate the command-word data formed from the Module ID, Command ID, Type, and Protection bit. This word informs the camera module which element of the camera we wish to address and whether we wish to read, write, or execute the command.
  • Write the number of words to be read or written to the data-length register at address 0x00 0x06.
  • Write the number of data words to addresses 0x00 0x08 to 0x00 0x26.
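
Here is a minimal sketch of that register sequence for a “set” operation, reusing the Iic instance and LEPTON_I2C_ADDR from the configuration sketch above. The register addresses come from the list; the exact ordering of the data, length, and command writes (and the status polling that should follow) depends on the command type, so check the Lepton software interface documentation before relying on this.

/* Lepton command/control interface registers, from the list above */
#define LEPTON_REG_COMMAND   0x0004
#define LEPTON_REG_DATA_LEN  0x0006
#define LEPTON_REG_DATA_0    0x0008

extern XIicPs Iic;                       /* configured in the earlier sketch */

static int lepton_write_reg(u16 reg, u16 value)
{
    u8 buf[4] = { reg >> 8, reg & 0xFF, value >> 8, value & 0xFF };
    return XIicPs_MasterSendPolled(&Iic, buf, sizeof(buf), LEPTON_I2C_ADDR);
}

/* Issue a 'set' command: data words first, then the data length, then the
 * command word built from the Module ID, Command ID, Type, and Protection bit. */
static int lepton_set(u16 command_word, const u16 *data, u16 n_words)
{
    for (u16 i = 0; i < n_words; i++)
        lepton_write_reg(LEPTON_REG_DATA_0 + (2 * i), data[i]);

    lepton_write_reg(LEPTON_REG_DATA_LEN, n_words);
    return lepton_write_reg(LEPTON_REG_COMMAND, command_word);
}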

 

This sequence allows us to configure the Lepton camera so that we get the best performance. When I executed the updated program, I could see the image that appears below on the monitor: me taking a picture of the screen. The image has been scaled up by a factor of 8.

 

 

Image1.jpg 

 

 

Now that we have this image on the screen, I want to integrate this design with the MiniZed dev board and configure the camera to transfer images over a wireless network.

 

Code is available on GitHub as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

 

 

Freelance documentary cameraman, editor, and producer/director Johnnie Behiri has just published a terrific YouTube video interview with Sebastian Pichelhofer, acting Project Leader of Apertus’ Zynq-based AXIOM Beta open-source 4K video camera project. (See below for more Xcell Daily blog posts about the AXIOM open-source 4K video camera.) This video is remarkable in the amount of valuable information packed into its brief, 20-minute duration. This video is part of Behiri’s cinema5D Web site and there’s a companion article here.

 

First, Sebastian explains the concept behind the project: develop a camera with features in demand, with development funded by a crowd-funding campaign. Share the complete, open-source design with community members so they can hack it, improve it, and give these improvements and modifications back to the community.

 

A significant piece of news: Sebastian says that the legendary Magic Lantern team (a group dedicated to adding substantial enhancements to the video and imaging capabilities of Canon dSLR cameras) is now on board as the project’s color-science experts. As a result, says Sebastian, the camera will be able to feature push-button selection of different “film stocks.” Film selection was one way for filmmakers to control the “look” of a film, back in the days when they used film. These days, camera companies devote a lot of effort to developing their own “film” look, but the AXIOM Beta project wants flexibility in this area, as in all other areas. I think Sebastian’s discussion of camera color science from end to end is excellent and worth watching just by itself.

 

I also appreciated Sebastian’s very interesting discussion of the challenges associated with a crowd-funded, open-source project like the AXIOM Beta. The heart of the AXIOM Beta camera’s electronic package is a Zynq SoC on an Avnet MicroZed SOM and that design choice strongly supports the project team’s desire to be able to quickly incorporate the latest innovations and design changes into systems in the manufacturing process. Here's a photo captured from the YouTube interview:

 

 

 

AXIOM Beta Interview Screen Capture 1.jpg 

 

 

 

At 14:45 in the video, Sebastian attempts to provide an explanation of the FPGA-based video pipeline’s advantages in the AXIOM Beta 4K camera—to the non-technical Behiri (and his mother). It’s not easy to contrast the sequential processing of microprocessor-based image and video processing with the same processing on highly parallel programmable logic when talking to a non-engineering audience, especially on the fly in a video interview, but Sebastian makes a valiant effort. By the way, the image-processing pipeline’s design is also open-source and Sebastian suggests that some brave souls may well want to develop improvements.

 

At the end of the interview, there are some video clips captured by a working AXIOM prototype. Of course, they are cat videos. How appropriate for YouTube! The videos are nearly monochrome (gray cats) and shot wide open so there’s a very shallow depth of field, but they still look very good to me for prototype footage. (There are additional video clips including HDR clips here on Apertus’ Web site.)

 

 

 

Here’s the cinema5D video interview:

 

 

 

 

 

 

Additional Xcell Daily posts about the AXIOM Beta open-source video camera project:

 

 

 

 

 

 

reVISION Cobot logo.jpg

In a free Webinar taking place on July 12, Xilinx experts will present a new design approach that unleashes the immense processing power of FPGAs using the Xilinx reVISION stack including hardware-tuned OpenCV libraries, a familiar C/C++ development environment, and readily available hardware-development platforms to develop advanced vision applications based on complex, accelerated vision-processing algorithms such as dense optical flow. Even though the algorithms are advanced, power consumption is held to just a few watts thanks to Xilinx’s All Programmable silicon.

 

Register here.

 

 

By Adam Taylor

 

Over this blog series, I have written a lot about how we can use the Zynq SoC in our designs. We have looked at a range of different applications, especially embedded vision. However, some systems use a pure FPGA approach to embedded vision, as opposed to an SoC like the members of the Zynq family, so in this blog we are going to look at how we can build a simple HDMI input-and-output video-processing system using the Artix-7 XC7A200T FPGA on the Nexys Video Artix-7 FPGA Trainer Board. (The Artix-7 A200T is the largest member of the Artix-7 FPGA device family.)

 

Here’s a photo of my Nexys Video Artix-7 FPGA Trainer Board:

 

 

 

Image1.jpg

 

Nexys Video Artix-7 FPGA Trainer Board

 

 

 

For those not familiar with it, the Nexys Video Trainer Board is intended for teaching and prototyping video and vision applications. As such, it comes with the following I/O and peripheral interfaces designed to support video reception, processing, and generation/output:

 

 

  • HDMI Input
  • HDMI Output
  • Display Port Output
  • Ethernet
  • UART
  • USB Host
  • 512 MB of DDR SDRAM
  • Line In / Mic In / Headphone Out / Line Out
  • FMC

 

 

To create a simple image-processing pipeline, we need to implement the following architecture:

 

 

 

Image2.jpg 

 

 

The supervising processor (in this case, a Xilinx MicroBlaze soft-core RISC processor implemented in the Artix-7 FPGA) monitors communications with the user interface and configures the image-processing pipeline as required for the application. In this simple architecture, data received over the HDMI input is converted from its parallel format of Video Data, HSync and VSync into an AXI Streaming (AXIS) format. We want to convert the data into an AXIS format because the Vivado Design Suite provides several image-processing IP blocks that use this data format. Being able to support AXIS interfaces is also important if we want to create our own image-processing functions using Vivado High Level Synthesis (HLS).

 

The MicroBlaze processor needs to be able to support the following peripherals:

 

 

  • AXI UART – Enables communication and control of the system
  • AXI Timer – Enables the MicroBlaze to time events
  • MicroBlaze Debugging Module – Enables the debugging of the MicroBlaze
  • MicroBlaze Local Memory – Connected to DLMB and ILMB (Data & Instruction Local Memory Bus)

 

We’ll use the memory interface generator to create a DDR interface to the board’s SDRAM. This interface and the SDRAM create a common frame store accessible to both the image-processing pipeline and the supervising processor via an AXI interconnect.

 

Creating a simple image-processing pipeline requires the use of the following IP blocks:

 

 

  • DVI2RGB – HDMI input IP provided by Digilent
  • RGB2DVI – HDMI output IP provided by Digilent
  • Video In to AXI4-Stream – Converts a parallel-video input to AXI Streaming protocol (Vivado IP)
  • AXI4-Stream to Video Out – Converts an AXI Stream to a parallel-video output (Vivado IP)
  • Video Timing Controller Input – Detects the incoming video parameters (Vivado IP)
  • Video Timing Controller Output – Generates the output video timing parameters (Vivado IP)
  • Video Direct Memory Access – Enables images to be written to and from the DDR SDRAM

 

 

The core of this video-processing chain is the VDMA, which we use to move the image into the DDR memory.

 

 

Image3.jpg 

 

 

 

The diagram above demonstrates how the IP block converts from streamed data to memory-mapped data for the read and write channels. Both VDMA channels provide the ability to convert between streaming and memory-mapped data as required. The write channel supports Stream-to-Memory-Mapped conversion while the read channel provides Memory-Mapped-to-Stream conversion.

 

When all this is put together in Vivado to create the initial base system, we get the architecture below, which is provided by the Nexys Video HDMI example.

 

 

Image4.jpg 

 

 

 

All that is required now is to look at the software required to configure the image-processing pipeline. I will explain that next time.

 

 

 

Code is available on GitHub as always.

 

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Adam Taylor Special Edition.jpg

 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

MicroZed Chronicles Second Year.jpg 

 

 

 

By Adam Taylor

 

We have looked at embedded vision several times throughout this series; however, it has always been within the visible portion of the electromagnetic (EM) spectrum. Infra-red (IR) is another popular imaging band of the EM spectrum, one that allows us to see thermal emissions from objects in the world around us. For this reason, IR imaging is widely used when we want to see in low-light conditions or at night, and in a range of other exciting applications from wildfire detection to defense.

 

Therefore, over the next two blogs we are going to look at getting the FLIR Lepton IR camera up and running with the Zynq-based Arty Z7 dev board from Digilent. I selected Digilent’s Arty Z7 because it has an HDMI output port so we can output the image from the Lepton IR camera to a display. The Arty Z7 board also has the Arduino/chipKIT Shield connector, which we can use to connect directly to the Lepton camera itself.

 

 

Image1.jpg 

 

Digilent’s Arty Z7 dev board with the Lepton IR camera plugged into the board’s Arduino/chipKIT Shield connector

 

 

 

The Lepton IR camera from FLIR is an 80x60-pixel (Lepton 2) or 160x120-pixel (Lepton 3) long-wave infra-red (LWIR) camera module. As a microbolometer-based thermal sensor, it operates without the need for cryogenic cooling, unlike HgCdTe-based sensors. Instead, a microbolometer works by each pixel changing resistance when IR radiation strikes it, and this resistance change indicates the temperature in that part of the scene. Typically, microbolometer-based thermal imagers have much-reduced resolution when compared to a cooled imager. They do, however, make thermal-imaging systems simpler to create.

 

To get the Lepton camera up and running with the Arty Z7 board, we need a breakout board for mounting the Lepton camera module. This breakout board simplifies the power connection, breaks out the camera’s control and video interfaces, and allows us to connect directly into the Arty Z7’s Shield connector.

 

The Lepton is controlled using a 2-wire interface, which is remarkably similar to I2C. This similarity allows us to use the Zynq I2C controller over EMIO to issue commands to the camera. The camera supplies 14-bit video output using Video over SPI (VoSPI). This video interface uses the SCLK, CS, and MISO signals, with the camera module acting as the SPI slave. However, as we need to receive 16 bits of data for each pixel in the VoSPI transaction, we cannot use the SPI peripheral in the Zynq SoC’s PS (processing system), which only works with 8-bit data.

 

Instead, we will use an AXI QSPI IP block instantiated in the Zynq SoC’s PL (programmable logic), correctly configured to work with standard SPI. This is a simple example of why Zynq SoCs are so handy for I/O-centric embedded designs. You can accommodate just about any I/O requirement you encounter with a configurable IP block or a little HDL code.

 

Implementing the above will enable us to control the camera module on the breakout board and receive the video into the PS memory space. To display the received image, we need to be able to create a video pipeline that reads the image from the PS DDR SDRAM and outputs it over HDMI.

 

The simplest way to do this is to update the HDMI output reference design, which is available on the Digilent GitHub:

 

 

 

Image2.jpg 

 

 

 

 

To update this design, we are going to do the following:

 

  1. Add an AXI QSPI configured for 16-bit standard SPI
  2. Enable the PS I2C, routing the signal via the EMIO
  3. Map both the I2C and the SPI I/O to the Arty Z7 board’s Shield connector

 

 

We can then update the software running on the Zynq processor core to control the camera module, receive the VoSPI, and configure the HDMI output channel.

 

For this example, I have plugged the breakout board with the camera module into the Shield connector so that the SDA and SCL pins on the Shield connector and the breakout board align. This means we can use the Shield connector’s IO10 to IO13 pins for the VoSPI. We do not use IO11, which would be the SPI interface’s MOSI, because that signal is unused in this application.

 

However, if we use this approach we must also provide an additional power signal to the breakout board and camera module because the Shield connector on the Arty Z7 cannot supply the 5V required on the A pin; that pin is connected to a Zynq I/O pin instead. Therefore, I used a wire from the Shield connector’s 5V pin on the opposite side to supply power to the Lepton breakout board’s 5V power input.

 

With the hardware up and running and the Vivado design rebuilt, we can then open SDK and update the software as required to display the image. We will look at that next time.

 

 

 

Code is available on GitHub as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

 

Last year at Embedded World 2016, a vision-guided robot based on a Xilinx Zynq UltraScale+ ZU9 MPSoC incorporated into a ZCU102 eval kit autonomously played solitaire on an Android tablet in the Xilinx booth. (See “3D Delta Printer plays Robotic Solitaire on a Touchpad under control of a Xilinx Zynq UltraScale+ MPSoC.”) This year at Embedded World 2017, an upgraded and improved version of the robot again appeared in the Xilinx booth, still playing solitaire.

 

In the original implementation, an HD video camera monitored the Android tablet’s screen to image the solitaire playing cards. Acceleration hardware implemented in the Zynq MPSoC’s PL (programmable logic) performed real-time preprocessing of the HD video stream including Sobel edge detection. Software running on the Zynq MPSoC’s ARM Cortex-A53 APU (Application Processing Unit) recognized the playing cards from the processed video supplied by the Zynq MPSoC’s PL and planned the solitaire game moves for the robot. The Zynq MPSoC’s dual-core ARM Cortex-R5 RPU (Real-Time Processing Unit) operating in lockstep—useful for safety-critical applications such as robotic control—operated the robotic stylus positioner, fashioned from a 3D Delta printer. The other processing sections of the Zynq UltraScale+ ZU9 MPSoC were also gainfully employed in this demo.

 

This year a trained, 3-layer Convolutional BNN (Binary Neural Network) with 256 neurons/layer executed the playing-card recognition algorithm. The tangible results: improved accuracy and a performance boost of 11,320x! (Not to mention the offloading of the recognition task from the Zynq MPSoC’s APU.)

 

Here’s a new, 2-minute video explaining the new autonomous solitaire-playing demo system:

 

 

 

 

Note: For more information about BNNs and programmable logic, see:

 

 

 

 

 

 

Drone maker Zerotech announced the Dobby AI pocket-sized drone earlier this year. Now, there’s a Xilinx video of DeePhi Tech’s Fuzhang Shi explaining a bit more about the machine-learning innards of the Dobby AI drone, which uses deep-learning algorithms for tasks including pedestrian detection, tracking, and gesture recognition. DeePhi’s algorithms are running on a Xilinx Zynq Z-7020 SoC integrated into the Dobby AI drone.

 

Power consumption, stability, and cost are all critical factors in drone design, and DeePhi developed a low-power, low-cost, high-stability system using the Zynq SoC, which executes 230GOPS while consuming a mere 3W. This is far more power-efficient than running similar applications on CPUs or GPUs, explains Fuzhang Shi.

 

 

 

Dobby AI PCB with Zynq SoC.jpg

 

Zerotech’s Dobby AI palm-sized autonomous drone PCB with a Zynq Z-7020 SoC running DeePhi deep-learning algorithms

 

 

 

 

 

Here’s the 2-minute video:

 

 

Compute Acceleration: GPU or FPGA? New White Paper gives you numbers

by Xilinx Employee 06-14-2017 02:24 PM - edited 06-14-2017 02:28 PM (10,379 Views)

 

Cloud computing and application acceleration for a variety of workloads including big-data analytics, machine learning, video and image processing, and genomics are big data-center topics, and if you’re one of those people looking for acceleration guidance, read on. If you’re looking to accelerate compute-intensive applications such as automated driving and ADAS or local video processing and sensor fusion, this blog post’s for you too. The basic problem here is that CPUs are too slow and they burn too much power. You may have one or both of these challenges. If so, you may be considering a GPU or an FPGA as an accelerator in your design.

 

How to choose?

 

Although GPUs started as graphics accelerators, primarily for gamers, a few architectural tweaks and a ton of software have made them suitable as general-purpose compute accelerators. With the right software tools, it’s not too difficult to recode and recompile a program to run on a GPU instead of a CPU. With some experience, you’ll find that GPUs are not great for every application workload. Certain computations such as sparse matrix math don’t map onto GPUs well. One big issue with GPUs is power consumption. GPUs aimed at server acceleration in a data-center environment may burn hundreds of watts.

 

With FPGAs, you can build any sort of compute engine you want with excellent performance/power numbers. You can optimize an FPGA-based accelerator for one task, run that task, and then reconfigure the FPGA if needed for an entirely different application. The amount of computing power you can bring to bear on a problem is scary big. A Virtex UltraScale+ VU13P FPGA can deliver 38.3 INT8 TOPS (that’s tera operations per second) and if you can binarize the application, which is possible with some neural networks, you can hit 500TOPS. That’s why you now see big data-center operators like Baidu and Amazon putting Xilinx-based FPGA accelerator cards into their server farms. That’s also why you see Xilinx offering high-level acceleration programming tools like SDAccel to help you develop compute accelerators using Xilinx All Programmable devices.

 

For more information about the use of Xilinx devices in such applications including a detailed look at operational efficiency, there’s a new 17-page White Paper titled “Xilinx All Programmable Devices: A Superior Platform for Compute-Intensive Systems.”

 

 

 

 

 

Although humans once served as the final inspectors for PCBs, today’s component dimensions and manufacturing volumes mandate the use of camera-based automated optical inspection (AOI) systems. Amfax has developed a 3D AOI system—the a3Di—that uses two lasers to make millions of 3D measurements with better than 3μm accuracy. One of the company’s customers uses an a3Di system to inspect 18,000 assembled PCBs per day.

 

The a3Di control system is based on a National Instruments (NI) cRIO-9075 CompactRIO controller—with an integrated Xilinx Virtex-5 LX25 FPGA—programmed with NI’s LabVIEW systems engineering software. The controller manages all aspects of the a3Di AOI system including monitoring and control of:

 

 

  • Machine motors
  • Control switches
  • Optical position sensors
  • Inverters
  • Up and downstream SMEMA (Surface Mount Equipment Manufacturers Association) conveyor control
  • Light tower
  • Pneumatics
  • Operator manual controls for PCB width control
  • System emergency stop

 

 

The system provides height-graded images like this:

 

 

 

Amfax 3D PCB image.jpg 

 

3D Image of a3Di’s Measurement Data: Colors represent height, with Z resolution down to less than a micron. The blue section at the top indicates signs of board warp. Laser etched component information appears on some of the ICs.

 

 

 

The a3Di system then compares this image against a stored golden reference image to detect manufacturing defects.

 

Amfax says that it has found the CompactRIO system to be dependable, reliable, and cost-effective. In addition, the company found it could get far better timing resolution with the CompactRIO system than the 1msec resolution usually provided by PLC controllers.

 

 

This project was a 2017 NI Engineering Impact Award Finalist in the Electronics and Semiconductor category last month at NI Week. It is documented in this NI case study.

 

 

Linc, Perrone Robotics’ autonomous Lincoln MKZ automobile, took a drive around the Perrone paddock at the TU Automotive autonomous vehicle show in Detroit last week, and Dan Isaacs, Xilinx’s Director of Connected Systems in Corporate Marketing, was there to shoot photos and video. Perrone’s Linc test vehicle operates autonomously using the company’s MAX (Mobile Autonomous X), a “comprehensive full-stack, modular, real-time capable, customizable, robotics software platform for autonomous (self-driving) vehicles and general purpose robotics.” MAX runs on multiple computing platforms, including one based on an iVeia controller built around an iVeia Atlas SOM, which in turn is based on a Xilinx Zynq UltraScale+ MPSoC. The Zynq UltraScale+ MPSoC handles the avalanche of data streaming from the vehicle’s many sensors to ensure that the car travels the appropriate path and avoids hitting things like people, walls and fences, and other vehicles. That’s all pretty important when the car is driving itself in public. (For more information about Perrone Robotics’ MAX, see “Perrone Robotics builds [Self-Driving] Hot Rod Lincoln with its MAX platform, on a Zynq UltraScale+ MPSoC.”)

 

Here’s a photo of Perrone’s sensored-up Linc autonomous automobile in the Perrone Robotics paddock at TU Automotive in Detroit:

 

 

Perrone Robotics Linc Autonomous Driving Lincoln MKZ.jpg 

 

 

And here’s a photo of the iVeia control box with the Zynq UltraScale+ MPSoC inside, running Perrone’s MAX autonomous-driving software platform. (Note the controller’s small size and lack of a cooling fan):

 

 

Iveia Autonomous Driving Controller for Perrone Robotics.jpg 

 

 

Opinions about the feasibility of autonomous vehicles are one thing. Seeing the Lincoln MKZ’s 3800 pounds of glass, steel, rubber, and plastic being controlled entirely by a little silver box in the trunk, that’s something entirely different. So here’s the video that shows Perrone Robotics’ Linc in action, driving around the relative safety of the paddock while avoiding the fences, pedestrians, and other vehicles:

 

 

 

When someone asks where Xilinx All Programmable devices are used, I find it a hard question to answer because there’s such a very wide range of applications—as demonstrated by the thousands of Xcell Daily blog posts I’ve written over the past several years.

 

Now, there’s a 5-minute “Powered by Xilinx” video with clips from several companies using Xilinx devices for applications including:

 

  • Machine learning for manufacturing
  • Cloud acceleration
  • Autonomous cars, drones, and robots
  • Real-time 4K, UHD, and 8K video and image processing
  • VR and AR
  • High-speed networking by RF, LED-based free-air optics, and fiber
  • Cybersecurity for IIoT

 

That’s a huge range covered in just five minutes.

 

Here’s the video:

 

 

 

 

 

A wide range of commercial, government, and social applications require precise aerial imaging. These applications range from the management of high-profile, international-scale humanitarian and disaster relief programs to everyday commercial uses such as siting large photovoltaic arrays. Satellites can capture geospatial imagery across entire continents, often at the expense of spatial resolution. Satellites also lack the flexibility to image specific areas on demand: you must wait until the satellite is above the real estate of interest. Spookfish Limited in Australia, along with ICON Technologies, has developed the Spookfish Airborne Imaging Platform (SAIP) based on COTS (commercial off-the-shelf) products, including National Instruments’ (NI’s) PXIe modules and LabVIEW systems engineering software, that can capture precise images with resolutions of 6cm/pixel to better than 1cm/pixel from a light aircraft cruising at 160 knots at altitudes up to 12,000 feet.

 

The 1st-generation SAIP employs one or more cameras installed in a tube attached to the belly of a light aircraft. Success with the initial prototype led to the development of a 2nd-generation design with two camera tubes. The system has continued to grow and now accommodates as many as three camera tubes with as many as four cameras per tube.

 

The multiple cameras must be steered precisely in continuous, synchronized motion while recording camera angles, platform orientation, and platform acceleration. All of this data is used to post-process the image data. At typical operating altitudes and speeds, the cameras must be steered with millidegree precision and the camera angles and platform position must be logged with near-microsecond accuracy and precision. Spookfish then uses a suite of open-source and proprietary computer-vision and photogrammetry techniques to process the imagery, which results in orthophotos, elevation data, and 3D models.

 

Here’s a block diagram of the Spookfish SAIP:

 

 

Spookfish SAIP Block diagram.jpg 

 

 

 

The NI PXIe system in the SAIP design consists of a PXIe-1082DC chassis, a PXIe-8135 RT controller, a PXI-6683H GPS/PPS synchronization module, a PXIe-6674T clock and timing module, a PXIe-7971R FlexRIO FPGA Module, and a PXIe-4464 sound and vibration module. (The PXIe-7971R FlexRIO module is based on a Xilinx Kintex-7 325T FPGA. The PXI-6683H synchronization module and the PXIe-6674T clock and timing module are both based on Xilinx Virtex-5 FPGAs.)

 

Here’s an aerial image captured by an SAIP system at 6cm/pixel:

 

 

Spookfish SAIP image at 6cm per pixel.jpg 

 

 

And here’s a piece of an aerial image taken by an SAIP system at 1.5cm/pixel:

 

 

Spookfish SAIP image at 6cm per pixel.jpg 

 

 

 

During its multi-generation development, the SAIP system quickly evolved far beyond its originally envisioned performance specification as new requirements arose. For example, initial expectations were that logged data would only need to be tagged with millisecond accuracy. However, as the project progressed, ICON Technologies and NI improved the system’s timing accuracy and precision by three orders of magnitude.

 

NI’s FPGA-based FlexRIO technology was also crucial in meeting some of these shifting performance targets. Changing requirements pushed the limits of some of the COTS interfaces, so custom FlexRIO interface implementations optimized for the tasks were developed as higher-speed replacements. Often, NI’s FlexRIO technology is employed for the high-speed computation available in the FPGA’s DSP slices, but in this case it was the high-speed programmable I/O that was needed.

 

Spookfish and ICON Technologies are now developing the next-generation SAIP system. Now that the requirements are well understood, they’re considering a Xilinx FPGA-based or Zynq-based NI CompactRIO controller as a replacement for the PXIe system. NI’s addition of TSN (time-sensitive networking) to the CompactRIO family’s repertoire makes such a switch possible. (For more information about NI’s TSN capabilities, see “IOT and TSN: Baby you can drive my [slot] car. TSN Ethernet network drives slot cars through obstacles at NI Week.”)

 

 

 

This project was a 2017 NI Engineering Impact Award finalist in the Energy category last month at NI Week. It is documented in this NI case study.

 

DFC Design’s Xenie FPGA module product family pairs a Xilinx Kintex-7 FPGA (a 70T or a 160T) with a Marvell Alaska X 88X3310P 10GBASE-T PHY on a small board. The module breaks out six of the Kintex-7 FPGA’s 12.5Gbps GTX transceivers and three full FPGA I/O banks (for a total of 150 single-ended I/O or up to 72 differential pairs) with configurable I/O voltage to two high-speed, high-pin-count, board-to-board connectors. A companion Xenie BB Carrier board accepts the Xenie FPGA board and breaks out the high-speed GTX transceivers into a 10GBASE-T RJ45 connector, an SFP+ optical cage, and four SDI connectors (two inputs and two outputs).

 

Here’s a block diagram and photo of the Xenie FPGA module:

 

 

 

 

DFC Design Xenia FPGA Module.jpg 

 

 

Xenie FPGA module based on a Xilinx Kintex-7 FPGA

 

 

 

And here’s a photo of the Xenie BB Carrier board that accepts the Xenie FPGA module:

 

 

 

DFC Design Xenia BB Carrier Board.jpg 

 

Xenie BB Carrier board

 

 

These are open-source designs.

 

DFC Design has developed a UDP core for this design, available on OpenCores.org and has published two design examples: an Ethernet example and a high-speed camera design.

 

Here’s a block diagram of the Ethernet example:

 

 

 

 

DFC Design Ethernet Example System.jpg 

 

 

Please contact DFC Design directly for more information.

 

 

 

LMI Technologies’ Gocator 3210 is a smart, metrology-grade, stereo-imaging snapshot sensor that produces 3D point clouds of scanned objects with 35μm accuracy over fields as large as 100x154mm at 4fps. The diminutive (190x142x49mm) Gocator 3210 pairs a 2Mpixel stereo camera with an industrial LED-based illuminator that projects structured blue light onto the subject to aid precise measurement of object width, height, angles, and radii. An integral Xilinx Zynq SoC accelerates these measurements so that the Gocator 3210 can scan objects at 4Hz, which LMI says is 4x the speed of such a sensor setup feeding raw data to a host CPU for processing. This fast scanning speed means that parts can pass by the Gocator for inspection on a production line without stopping for the measurement to be made. The Gocator uses a GigE interface for host connection.

 

 

LMI Technologies Gocator 3210.jpg

 

LMI Technologies Gocator 3210 3D Smart Stereo Vision Sensor

 

 

LMI provides a browser-based GUI to process the point clouds and 3D models generated by the Gocator. That means the processing—which includes the calculation of object width, height, angles, and radii—all takes place inside of the Gocator. No additional host software is required.

 

Here’s a photo of LMI’s GUI showing a 3D scan of an automotive cylinder head (a typical application for this type of sensor):

 

 

 

LMI Gocator GUI.jpg

 

 

LMI also offers an SDK so that you can develop sophisticated inspection programs that run on the Gocator. The company has also produced an extensive series of interesting training videos for the Gocator sensor family.

 

Finally, here’s a short (3 minutes) but information-dense video explaining the Gocator’s features and capabilities:

 

 

 

 

 

LMI’s VP of Sales Len Chamberlain has just published a blog titled “Meeting the Demand for Application-Specific 3D Solutions” that further discusses the Gocator 3210’s features and applications.

 

 

A paper titled “Evaluating Rapid Application Development with Python for Heterogeneous Processor-based FPGAs” recently won the Best Short Paper award at the 25th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM 2017), held in Napa, CA. The paper discusses the advantages and efficiencies of Python-based development using the PYNQ development environment—based on the Python programming language and Jupyter Notebooks—and the Digilent PYNQ-Z1 board, which is based on the Xilinx Zynq SoC. The paper’s authors—Senior Computer Scientist Andrew G. Schmidt, Computer Scientist Gabriel Weisz, and Research Director Matthew French from the USC Viterbi School of Engineering’s Information Sciences Institute—evaluated the impact, the performance implications, and the bottlenecks associated with using PYNQ for application development on Xilinx Zynq devices. The authors then compared their Python-based results against existing C-based and hand-coded implementations.

 

 

The authors do a really nice job of describing what PYNQ is:

 

 

“The PYNQ application development framework is an open source effort designed to allow application developers to achieve a “fast start” in FPGA application development through use of the Python language and standard “overlay” bitstreams that are used to interact with the chip’s I/O devices. The PYNQ environment comes with a standard overlay that supports HDMI and Audio inputs and outputs, as well as two 12-pin PMOD connectors and an Arduino-compatible connector that can interact with Arduino shields. The default overlay instantiates several MicroBlaze processor cores to drive the various I/O interfaces. Existing overlays also provide image filtering functionality and a soft-logic GPU for experimenting with SIMT [single instruction, multiple threads] -style programming. PYNQ also offers an API and extends common Python libraries and packages to include support for Bitstream programming, directly access the programmable fabric through Memory-Mapped I/O (MMIO) and Direct Memory Access (DMA) transactions without requiring the creation of device drivers and kernel modules.”

 

 

They also do a nice job of explaining what PYNQ is not:

 

 

“PYNQ does not currently provide or perform any high-level synthesis or porting of Python applications directly into the FPGA fabric. As a result, a developer still must use create a design using the FPGA fabric. While PYNQ does provide an Overlay framework to support interfacing with the board’s IO, any custom logic must be created and integrated by the developer. A developer can still use high-level synthesis tools or the aforementioned Python-to-HDL projects to accomplish this task, but ultimately the developer must create a bitstream based on the design they wish to integrate with the Python [code].”

 

 

Consequently, the authors did not simply rely on the existing PYNQ APIs and overlays. They also developed application-specific kernels for their research based on the Redsharc project (see “Redsharc: A Programming Model and On-Chip Network for Multi-Core Systems on a Programmable Chip”) and they describe these extensions in the FCCM 2017 paper as well.

 

 

 

Redsharc Project.jpg

 

 

 

So what’s the bottom line? The authors conclude:

 

“The combining of both Python software and FPGA’s performance potential is a significant step in reaching a broader community of developers, akin to Raspberry Pi and Ardiuno. This work studied the performance of common image processing pipelines in C/C++, Python, and custom hardware accelerators to better understand the performance and capabilities of a Python + FPGA development environment. The results are highly promising, with the ability to match and exceed performances from C implementations, up to 30x speedup. Moreover, the results show that while Python has highly efficient libraries available, such as OpenCV, FPGAs can still offer performance gains to software developers.”

 

In other words, there’s a vast and unexplored territory—a new, more efficient development space—opened to a much broader system-development audience by the introduction of the PYNQ development environment.

 

For more information about the PYNQ-Z1 board and PYNQ development environment, see:

 

 

 

 

 

 

The new PALTEK DS-VU 3 P-PCIE Data Brick places a Xilinx Virtex UltraScale+ VU3P FPGA along with 8Gbytes of DDR4-2400 SDRAM, two VITA57.1 FMC connectors, and four Samtec FireFly Micro Flyover ports on one high-bandwidth PCIe Gen3 card with an x16 host connector. The card aims to provide FPGA-based hardware acceleration for applications including 2K/4K video processing, machine learning, big data analysis, financial analysis, and high-performance computing.

 

 

Paltek Data Brick.jpg 

 

PALTEK Data Brick packs Virtex UltraScale+ VU3P FPGA onto a PCIe card

 

 

 

The Samtec Micro Flyover ports accept both ECUE copper twinax and ECUO optical cables. The ECUE twinax cables are for short-reach applications and have a throughput of 28Gbps per channel. The ECUO optical cables operate at a maximum data rate of 14Gbps per channel and are available with as many as 12 simplex or duplex channels (with 28Gbps optical channels in development at Samtec).

 

For broadcast video applications, PALTEK also offers companion 12G-SDI Rx and 12G-SDI-Tx cards that can break out eight 12G-SDI video channels from one FireFly connection.

 

Please contact PALTEK directly for more information about these products.

 

 

 

 For more information about the Samtec FireFly system, see:

 

 

 

 

 

 

A Tale of Two Cameras: You’re gonna need a bigger FPGA

by Xilinx Employee on ‎05-05-2017 04:02 PM (10,205 Views)

 

Cutting-edge industrial and medical camera maker NET (New Electronic Technology) had a table at this week’s Embedded Vision Summit where the company displayed two generations of GigE cameras: the GigEPRO and CORSIGHT. Both of these camera families include multiple cameras that accommodate a wide range of monochrome and color image sensors. There are sixteen different C- and CS-mount cameras in the GigEPRO family with sensors ranging from WVGA (752x480 pixels) to WQUXGA (3664x2748 pixels) and a mix of global and rolling shutters. The CORSIGHT family includes eleven cameras with sensors ranging from WVGA (752x480 pixels) to QSXGA (2592x1944 pixels), or one of two line-scan sensors (2048 or 4096 pixels), with a mix of global and rolling shutters. In addition to its Gigabit Ethernet interface, the CORSIGHT cameras have WiFi, Bluetooth, USB 2.0, and optional GSM interfaces. Both the GigEPRO and CORSIGHT cameras are user-programmable and have on-board, real-time image processing, which can be augmented with customer-specific image-processing algorithms.

 

 

NET GigEPRO Camera.jpg 

 

 

GigEPRO Camera from NET

 

 

 

 

NET CORESIGHT Camera.jpg 

 

CORSIGHT Camera from NET

 

 

 

You program both cameras with NET’s GUI-based SynView Software Development Kit, which generates code for controlling the NET cameras and for processing the acquired images. When you create a program, SynView automatically determines if the required functionality is available in camera hardware. If not, SynView will do the necessary operations in software (although this increases the host CPU’s load). NET’s GigEPRO and CORSIGHT cameras are capable of performing significant on-board image processing right out of the box, including Bayer decoding for color cameras, LUT (Lookup Table) conversion, white balance, gamma, brightness, contrast, color correction, and saturation.
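To give a sense of what one of those steps, LUT conversion for gamma correction, amounts to computationally, here is a small NumPy sketch. It is not NET’s implementation, which runs inside the camera’s FPGA; it simply expresses the same per-pixel table lookup in Python:

```python
import numpy as np

def gamma_lut(gamma: float) -> np.ndarray:
    """Build a 256-entry lookup table mapping 8-bit input to gamma-corrected output."""
    x = np.arange(256) / 255.0
    return np.clip((x ** (1.0 / gamma)) * 255.0, 0, 255).astype(np.uint8)

# Apply the table to every pixel with a single indexed lookup. This kind of
# per-pixel table operation maps naturally onto FPGA logic.
lut = gamma_lut(2.2)
image = np.random.randint(0, 256, size=(480, 752), dtype=np.uint8)  # WVGA-sized test frame
corrected = lut[image]
```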

 

Which leads to the question: What’s performing all of these real-time, image-processing functions in NET’s GigEPRO and CORSIGHT cameras?

 

Xilinx FPGAs, of course. (This should not be a surprise. After all, you’re reading a post in the Xilinx Xcell Daily blog.)

 

The GigEPRO cameras are based on Spartan-6 FPGAs—an LX45, LX75, or LX100 depending on the family member. At the Embedded Vision Summit, Dr. Thomas Däubler, NET’s Managing Director and CTO, explained to me that “the FPGAs are what give the GigEPRO cameras their PRO features.” In fact, there is user space reserved in the larger FPGAs for customer-specific algorithms to be performed in real time inside the camera itself. What sort of algorithms? Däubler gave me two examples: laser triangulation and QR-code recognition. Indeed, he said, some of NET’s customers perform all of the image processing and analysis in the camera and never send the image to the host—just the results of the analysis. Of course, this distributed-processing approach greatly reduces the host CPU’s processing load and therefore allows one host computer to handle many more cameras.

 

Here’s a photo from the Summit showing a NET GigEPRO camera inspecting a can on a spinning platform while reading the label on the can:

 

 

 

NET GigEPRO Camera Inspects Object on Spinning Table and Reads Label.jpg 

 

 

NET GigEPRO Camera Inspects Object on Spinning Table and Reads Label

 

 

There’s a second important reason for using the FPGA in NET’s GigEPRO cameras: the FPGAs create a hardware platform that allowed NET to develop the sixteen GigEPRO family members that handle many different image sensors with varied hardware interfaces and timing requirements. NET relied on the Spartan-6 FPGAs’ I/O programmability to help with this aspect of the camera family’s design.

 

So when it came time for NET to develop a new intelligent camera family—the recently introduced CORSIGHT smart vision system—with even more features, did NET’s design engineers continue to use FPGAs for real-time image processing?

 

Of course they did. For the newer camera, and for the same reasons, they chose the Xilinx Artix-7 FPGA family.

 

And here’s the CORSIGHT camera in action:

 

 

 

NET CORSIGHT Camera Inspects Object on Spinning Table and Reads Label.jpg

 

 

NET CORSIGHT Camera Inspects Object on Spinning Table

 

 

 

Note: For more information about the GigEPRO and CORSIGHT camera families, and the SynView software, please contact NET directly.

 

 

 

This week, just in time for the Embedded Vision Summit in Santa Clara, Aldec announced its TySOM-2A Embedded Prototyping Board based on a Xilinx Zynq Z-7030 SoC. The board features a combination of memories (1Gbyte of DDR3 SDRAM, SPI flash memory, EEPROM, microSD), communication interfaces (2× Gigabit Ethernet, 4× USB 2.0, UART-via-USB, Wi-Fi, Bluetooth, HDMI 1.4), an FMC connector, and other miscellaneous modules (LEDs, DIP switches, XADC, RTC, accelerometer, temperature sensor). Here’s a photo of the TySOM-2A board:

 

 

 

 

Aldec TySOM-2A Board.jpg

 

 

Aldec TySOM-2A Embedded Prototyping Board based on a Xilinx Zynq Z-7030 SoC

 

 

In its booth at the Summit, Aldec demonstrated a real-time face-detection reference design running on the Zynq SoC. The design depends on the accelerated processing capabilities of the Zynq SoC’s programmable logic to run this complex code, processing a 1280x720-pixel video stream in real time. The most computationally intensive parts of the code, including edge detection, color-space conversion, and frame merging, were off-loaded from the Zynq SoC’s ARM Cortex-A9 processor to the device’s programmable logic using the Xilinx SDSoC Development Environment.

 

Here’s a very short video showing the demo:

 

 

 

 

 

This week at the Embedded Vision Summit in Santa Clara, CA, Mario Bergeron demonstrated a design he’d created that combines real-time visible and IR thermal video streams from two different sensors. (Bergeron is a Senior FPGA/DSP Designer with Avnet.) The demo runs on an Avnet PicoZed SOM (System on Module) based on a Xilinx Zynq Z-7030 SoC. The PicoZed SOM is the processing portion of the Avnet PicoZed Embedded Vision Kit. An FMC-mounted Python-1300-C image sensor supplies the visible video stream in this demo and a FLIR Systems Lepton image sensor supplies the 60x80-pixel IR video stream. The Lepton IR sensor connects to the PicoZed SOM over a Pmod connector on the PicoZed.

 

Here’s a block diagram of this demo:

 

 

Avnet reVISION demo with PicoZed Embedded Vision Kit.jpg 

 

 

Bergeron integrated these two video sources and developed the code for this demo using the new Xilinx reVISION stack, which includes a broad range of development resources for vision-centric platform, algorithm, and application development. The Xilinx SDSoC Development Environment and the Vivado Design Suite including the Vivado HLS high-level synthesis tool are all part of the reVISION stack, which also incorporates OpenCV libraries and machine-learning frameworks such as Caffe.

 

In this demo, Bergeron’s design takes the visible image stream and performs a Sobel edge extraction on the video. Simultaneously, the design warps and resizes the IR thermal image stream so that the Sobel edges can be combined with the thermal image. The Sobel and resizing algorithms come from the Xilinx reVISION stack library, and Bergeron wrote the video-combining code in C. He then synthesized these three tasks into hardware to accelerate them because they were the most compute-intensive tasks in the demo. Vivado HLS created the hardware accelerators for these tasks directly from the C code, and SDSoC connected the accelerator cores to the ARM processor with DMA hardware and generated the software drivers.
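For a rough idea of what that processing chain computes, here is the same three-step pipeline (Sobel edge extraction, resizing the thermal frame, and blending) sketched with OpenCV in Python. This is not Bergeron’s reVISION/C code, which was synthesized into the programmable logic; the frame sizes, colormap, and blend weights are assumptions for illustration:

```python
import cv2
import numpy as np

def fuse_visible_and_thermal(visible_bgr: np.ndarray, thermal_small: np.ndarray) -> np.ndarray:
    """Blend Sobel edges from the visible stream with an upscaled thermal frame."""
    # 1. Sobel edge extraction on the visible image (the step accelerated in the PL).
    gray = cv2.cvtColor(visible_bgr, cv2.COLOR_BGR2GRAY)
    gx = cv2.convertScaleAbs(cv2.Sobel(gray, cv2.CV_16S, 1, 0, ksize=3))
    gy = cv2.convertScaleAbs(cv2.Sobel(gray, cv2.CV_16S, 0, 1, ksize=3))
    edges = cv2.addWeighted(gx, 0.5, gy, 0.5, 0)

    # 2. Resize the low-resolution thermal frame up to the visible frame's size
    #    and false-color it for display.
    thermal_big = cv2.resize(thermal_small, (visible_bgr.shape[1], visible_bgr.shape[0]),
                             interpolation=cv2.INTER_LINEAR)
    thermal_color = cv2.applyColorMap(thermal_big, cv2.COLORMAP_JET)

    # 3. Combine edges and thermal image (hypothetical 50/50 blend).
    edges_bgr = cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR)
    return cv2.addWeighted(edges_bgr, 0.5, thermal_color, 0.5, 0)

# Example usage with synthetic frames standing in for the two sensors.
visible = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)
thermal = np.random.randint(0, 256, (60, 80), dtype=np.uint8)
fused = fuse_visible_and_thermal(visible, thermal)
```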

 

Here’s a diagram showing the development process for this demo and the resulting system:

 

 

Avnet reVISION demo Project Diagram.jpg 

 

 

In the video below, Bergeron shows that the unaccelerated Sobel algorithm running in software consumes 100% of an ARM Cortex-A9 processor in the Zynq Z-7030 SoC and still achieves only about one frame/sec—far too slow. By accelerating this algorithm in the Zynq SoC’s programmable logic using SDSoC and Vivado HLS, Bergeron cut the processor load by more than 80% and achieved real-time performance. (By my back-of-the-envelope calculation, that’s roughly a 150x speedup: going from 1 to 30 frames/sec is a 30x improvement, and doing it with less than a fifth of the processor load multiplies that by another factor of five.)

 

Here’s the 5-minute video of this fascinating demo:

 

 

 

 

 

 

For more information about the Avnet PicoZed Embedded Vision Kit, see “Avnet’s $1500, Zynq-based PicoZed Embedded Vision Kit includes Python-1300-C camera and SDSoC license.”

 

 

For more information about the Xilinx reVISION stack, see “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge,” and “Yesterday, Xilinx announced the reVISION stack for software-defined embedded-vision apps. Today, there’s two demo videos.”

 

How to tackle KVM (Keyboard/Video/Mouse) challenges at 4K and beyond: Any Media Over Any Network

by Xilinx Employee ‎05-04-2017 11:05 AM - edited ‎05-04-2017 11:08 AM (9,262 Views)

 

We’ve had KVM (keyboard, video, mouse) switches for controlling multiple computers from one set of user-interface devices for a long, long time. Go back far enough, and you were switching RS-232 ports to control multiple computers or other devices with one serial terminal. Here’s what they looked like back in the day:

 

 

Old KVM Switch.jpg 

 

 

In those days, these KVM switches could be entirely mechanical. Now, they can’t. There are different video resolutions, different coding and compression standards, there’s video over IP (Ethernet), etc. Today’s KVM switch is also a many-to-many converter. Your vintage rotary switch isn’t going to cut it for today’s Pro AV and Broadcast applications.

 

If you need to meet this kind of design challenge—today—you need low-latency video codecs like H.265/HEVC and compression standards such as TICO; you need 4K and 8K video resolution with conversion to and from HD; and you need compatibility and interoperability with all sorts of connectivity standards including 3G/12G-SDI and high-speed Ethernet. In short, you need “Any Media Over Any Network” capability and you need all of that without exploding your BOM cost.

 

Where are you going to get it?

 

Well, considering that this is the Xilinx Xcell Daily blog, it’s a good bet that you’re going to hear about the capabilities of at least one Xilinx All Programmable device.

 

Actually, this blog is about a couple of upcoming Webinars being held on May 23 titled “Any Media Over Any Network: KVM Extenders, Switches and KVM-over-IP.” The Webinars are identical but are being held at two different times to accommodate worldwide time zones. In this webinar, Xilinx will show you how you can use the Zynq UltraScale+ MPSoC in KVM applications. The webinar will highlight how Xilinx and its partners’ video-processing and -connectivity IP cores along with the integrated H.265/HEVC codec in the three Zynq UltraScale+ MPSoC EV family members can quickly and easily address new opportunities in the KVM market.

 

 

  • Register here for the free webinar being held at 7am Pacific Daylight Time (UTC-07:00).

 

  • Register here for the free webinar being held at 10am Pacific Daylight Time (UTC-07:00).

 

 

 

 

 

 

In another upcoming 40-minute webinar, Xilinx will present a new approach that allows you to unleash the power of the FPGA fabric in Zynq SoCs and Zynq UltraScale+ MPSoCs using hardware-tuned OpenCV libraries, with a familiar C/C++ development environment and readily available hardware development platforms. OpenCV libraries are widely used for algorithm prototyping by many leading technology companies and computer vision researchers. FPGAs can achieve unparalleled compute efficiency on complex algorithms like dense optical flow and stereo vision while consuming only a few watts of power.
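For reference, here is what dense optical flow looks like as plain software: a short Python sketch using OpenCV’s Farnebäck method. It illustrates the algorithm class discussed in the webinar, not the hardware-tuned library implementation, and the frame data is synthetic:

```python
import cv2
import numpy as np

# Two consecutive grayscale frames (synthetic stand-ins for a 1080p stream).
prev_frame = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)
next_frame = np.roll(prev_frame, 2, axis=1)  # shift right by 2 pixels to fake motion

# Dense optical flow: one (dx, dy) motion vector per pixel. Positional arguments are
# (prev, next, flow, pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags).
flow = cv2.calcOpticalFlowFarneback(prev_frame, next_frame, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)

magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
print("mean motion magnitude:", magnitude.mean())
```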

 

This Webinar is being held on July 12. Register here.

 

Here’s a fairly new, 4-minute video showing a 1080p60 Dense Optical Flow demo, developed with the Xilinx SDSoC Development Environment in C/C++ using OpenCV libraries:

 

 

 

 

For related information, see Application Note XAPP1167, “Accelerating OpenCV Applications with Zynq-7000 All Programmable SoC using Vivado HLS Video Libraries.”

 

Plethora IIoT develops cutting‑edge solutions to Industry 4.0 challenges using machine learning, machine vision, and sensor fusion. In the video below, a Plethora IIoT Oberon system monitors power consumption, temperature, and the angular speed of three positioning servomotors in real time on a large ETXE-TAR Machining Center for predictive maintenance—to spot anomalies with the machine tool and to schedule maintenance before these anomalies become full-blown faults that shut down the production line. (It’s really expensive when that happens.) The ETXE-TAR Machining Center is center-boring engine crankshafts. This bore is the critical link between a car’s engine and the rest of the drive train including the transmission.

 

 

 

Plethora IIoT Oberon System.jpg 

 

 

 

Plethora uses Xilinx Zynq SoCs and Zynq UltraScale+ MPSoCs as the heart of its Oberon system because these devices’ unique combination of software-programmable processors, hardware-programmable FPGA fabric, and programmable I/O allow the company to develop real-time systems that implement sensor fusion, machine vision, and machine learning in one device.

 

Initially, Plethora IIoT’s engineers used the Xilinx Vivado Design Suite to develop their Zynq-based designs. Then they discovered Vivado HLS, which allows you to take algorithms in C, C++, or SystemC directly to the FPGA fabric using hardware compilation. The engineers’ first reaction to Vivado HLS: “Is this real or what?” They discovered that it was real. Then they tried the SDSoC Development Environment with its system-level profiling, automated software acceleration using programmable logic, automated system connectivity generation, and libraries to speed programming. As they say in the video, “You just have to program it and there you go.”

 

Here’s the video:

 

 

 

 

Plethora IIoT is showcasing its Oberon system in the Industrial Internet Consortium (IIC) Pavilion during the Hannover Messe Show being held this week. Several other demos in the IIC Pavilion are also based on Zynq All Programmable devices.

 

About the Author
Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.