 

Exactly a week ago, Xilinx introduced the Zynq UltraScale+ RFSoC family, a new series of Zynq UltraScale+ MPSoCs that adds RF ADCs, RF DACs, and SD-FECs. (See “Zynq UltraScale+ RFSoC: All the processing power of 64- and 32-bit ARM cores, programmable logic plus RF ADCs, DACs.”) This past Friday at the Xilinx Showcase held in Longmont, Colorado, Senior Marketing Engineer Lee Hansen demonstrated a Zynq UltraScale+ ZU28DR RFSoC with eight 12-bit, 4Gsamples/sec RF ADCs, eight 14-bit, 6.4Gsamples/sec RF DACs, and eight SD-FECs connected through an appropriate interface to National Instruments’ LabVIEW Systems Engineering Development Environment.

 

The demo system was generating signals using the RF DACs, receiving the signals using the RF ADCs, and then displaying the resulting signal spectrum using LabVIEW.

 

Here’s a 3-minute video of the demo:

 

 

 

 

 

Over the past weekend, Xilinx held a Showcase and PYNQ Hackathon in its Summit Retreat Center in Longmont, Colorado. About 100 people from tech companies all over Colorado attended the Showcase and twelve teams—about 40 people including students from local universities and engineers from industry—competed in the Hackathon.

 

Here are a few images from the Xilinx Showcase and PYNQ Hackathon:

 

 

 

 Xilinx Longmont Summit Retreat Center.jpg

 

Xilinx Summit Retreat Center in Longmont, Colorado

 

 

 

Showcase 3.jpg

 

Xilinx CTO Ivo Bolsens welcomes everyone to the Xilinx Showcase

 

 

 

Showcase 1.jpg

 

Xilinx Showcase Attendees

 

 

 

Showcase 2.jpg 

 

More Xilinx Showcase Attendees

 

 

 

Dans Welcome to the Hackathon.jpg

 

Xilinx VP of Interactive Design Tools and Xilinx Longmont Site Manager Dan Gibbons Welcomes the Hackers to the PYNQ Hackathon

 

 

 

Pizza Dinner.jpg

 

Abo’s Pizza (Boulder’s Finest) for Friday Night Dinner

 

 

 

Handing out Pynqs.jpg

 

Handing out Digilent PYNQ-Z1 boards

 

 

 

Work Begins 1.jpg

 

The PYNQ Hackathon Work Begins

 

 

 

A little advice.jpg

 

Giving a little help to the participants

 

 

 

Work Begins 2.jpg

 

Focus, Focus, Focus

 

 

 

 Logictools Mini Lecture.jpg

 

A Mini Lecture about the PYNQ Logictools Overlay for Hackathon Attendees

 

 

 

 

 

 Catching some sleep.jpg

 

Getting a little shuteye

 

 

 

Longs Peak.jpg

 

A view of Longs Peak from the Xilinx Longmont Summit Retreat Center

 

 

 

 

The event was organized by an internal Xilinx team including:

 

  • Craig Abramson
  • Mindy Brooks
  • Derek Curd
  • Greg Daughtry
  • Brad Fross
  • Adrian Hernandez
  • Graham Schelle

 

 

In addition, a team of helpers was on hand over the 30-hour duration of the event to answer questions:

 

  • Kv Thanjavur Bhaaskar
  • Anurag Dubey
  • Vikhyat Goyal
  • David Gronlund
  • Dustin Richmond
  • Naveen Purushotham
  • Rock Qu

 

 

For more information about the PYNQ Hackathon, see “12 PYNQ Hackathon teams competed for 30 hours, inventing remote-controlled robots, image recognizers, and an air keyboard.”

 

 

For more information about the Python-based, open-source PYNQ development environment and the Zynq-based Digilent PYNQ-Z1 dev board, see “Python + Zynq = PYNQ, which runs on Digilent’s new $229 pink PYNQ-Z1 Python Productivity Package.”

 

 

 

 

 

Twelve student and industry teams competed for 30 straight hours in the Xilinx Hackathon 2017 competition over the weekend at the Summit Retreat Center in the Xilinx corporate facility located in Longmont, Colorado. Each team member received a Digilent PYNQ-Z1 dev board, which is based on a Xilinx Zynq Z-7020 SoC, and then used their fertile imaginations to conceive of and develop working code for an application using the open-source, Python-based PYNQ development environment, which is based on self-documenting Jupyter Notebooks. The online electronics and maker retailer SparkFun, located just down the street from the Xilinx facility in Longmont, supplied boxes of compatible peripheral boards with sensors and motor controllers to spur the team members’ imaginations. Several of the teams came from local universities including the University of Colorado at Boulder and the Colorado School of Mines in Golden, Colorado. At the end of the competition, eleven of the teams presented their results using their Jupyter Notebooks. Then came the prizes.

 

For the most part, team members had never used the PYNQ-Z1 boards and were not familiar with using programmable logic. In part, that was the intent of the Hackathon—to connect teams of inexperienced developers with appropriate programming tools and see what develops. That’s also the reason that Xilinx developed PYNQ: so that software developers and students could take advantage of the improved embedded performance made possible by the Zynq SoC’s programmable hardware without having to use ASIC-style (HDL) design tools to design hardware (unless they want to do so, of course).

 

Here are the projects developed by the teams, in the order presented during the final hour of the Hackathon (links go straight to the teams’ GitHub repositories with the Jupyter notebooks that document the projects with explanations and “working” code):

 

 

  • Team “from timemachine import timetravel” developed a sine wave generator with a PYNQ-callable frequency modulator and an audio spectrum analyzer. The team had time to develop a couple of different versions of the spectrum analyzer but not enough time to link the generator and analyzer together.

 

  • Team “John Cena” developed a voice-controlled mobile robot. An application on a PC captured the WAV file for a spoken command sequence and this file was then wirelessly transmitted to the mobile robot, which interpreted commands and executed them.

 

 

Team John Cena Voice-Controlled Mobile Robot.jpg

 

Team John Cena’s Voice-Controlled Mobile Robot

 

 

 

  • Inspired by the recent Nobel Prize in Physics awarded to the three-person team that architected the LIGO gravitational-wave observatories, Team “Daedalus” developed a Hackathon entry called “Sonic LIGO”—a sound localizer that takes audio captured by multiple microphones, uses time correlation to filter audio noise from the sounds of interest, and then triangulates the location of the sound using its phase as derived from each microphone. Examples of sound events the team wanted to locate include hand claps and gunshots. The team planned to use its members’ three PYNQ-Z1 boards for the triangulation.

 

  • Team “Questionable” from the Colorado School of Mines developed an automated parking lot assistant to aid students looking for a parking space near the university. The design uses two motion detectors to detect cars passing through each lot’s entrances and exits. Timing between the two sensors determines whether the car is entering or leaving the lot. The team calls their application PARQYNG and produced a YouTube video to explain the idea:

 

 

 

 

 

  • Team “Snapback” developed a Webcam-equipped cap that captures happy moments by recognizing smiling faces and using that recognition to trigger the capture of a short video clip, which is then wirelessly uploaded to the cloud for later viewing. This application was inspired by the progressive memory loss of one team member’s grandmother.

 

  • Team “Trimble” from Trimble, Inc. developed a sophisticated application that determines position using photogrammetry techniques. The design uses the Zynq SoC’s programmable logic to accelerate the calculations.

 

  • Team “Codeing Crazy” developed an “air keyboard” (it’s like a working air guitar but it’s a keyboard) using OpenCV to recognize the image of a hand in space, locate the recognized object within a region predefined as a keyboard, and then play the appropriate note.

 

  • Team “Joy of Pink” from CU Boulder developed a real-time emoji generator that recognizes facial expressions in an image, interprets the emotion shown on the subject’s face by sending the image to Microsoft’s cloud-based Azure Emotion API, and then substitutes the appropriate emoji in the image.

 

 

Team Joy of Pink Emoji Generator.jpg

 

Team “Joy of Pink” developed an emoji generator based on facial interpretation using Microsoft’s cloud-based Azure Emotion API

 

 

 

  • Team “Harsh Constraints” plunged headlong into a Verilog-based project to develop a 144MHz LVDS Camera Link interface to a thermal camera. It was a very ambitious venture for a team that had never before used Verilog.

 

 

  • Team “Caffeine” developed a tone-controlled robot using audio filters instantiated in the Zynq SoC’s programmable logic to decode four audio tones which then control robot motion. Here’s a block diagram:

 

 

Team Caffeine Audio Fiend Machine.jpg

 

 

Team Caffeine’s Audio Fiend Tone-Based Robotic Controller

 

 

 

  • Team “Lynx” developed a face-recognition system that stores faces in the cloud, in a spreadsheet on a Google Drive, based on whether or not the system has seen that face before. The system uses Haar-cascade detection implemented with OpenCV.

 

 

After the presentations, the judges deliberated for a few minutes using multiple predefined criteria and then awarded the following prizes:

 

 

  • The “Murphy’s Law” prize for dealing with insurmountable circumstances went to Team Harsh Constraints.

 

  • The “Best Use of Programmable Logic” prize went to Team Caffeine.

 

  • The “Runner Up” prize went to Team Snapback.

 

  • The “Grand Prize” went to Team Questionable.

 

 

Congratulations to the winners and to all of the teams who spent 30 hours with each other in a large room in Colorado to experience the joy of hacking code to tackle some tough problems. (A follow-up blog will include a photographic record of the event so that you can see what it was like.)

 

 

 

For more information about the PYNQ development environment and the Digilent PYNQ-Z1 board, see “Python + Zynq = PYNQ, which runs on Digilent’s new $229 pink PYNQ-Z1 Python Productivity Package.”

 

 

 

Late last month, I wrote about an announcement by DNAnexus and Edico Genome that described a huge reduction in the cost and time to analyze genomic information, enabled by Amazon’s FPGA-accelerated AWS EC2 F1 instance. (See “Edico Genome and DNAnexus announce $20, 90-minute genome analysis on Amazon’s FPGA-accelerated AWS EC2 F1 instance.”) The AWS Partner Network blog has just published more details in an article written by Amazon’s Aaron Friedman, titled “How DNAnexus and Edico Genome are Powering Precision Medicine on Amazon Web Services (AWS).”

 

 

The details are exciting to say the least. The article begins with this statement:

 

“Diagnosing the medical mysteries behind acutely ill babies can be a race against time, filled with a barrage of tests and misdiagnoses. During the first few days of life, a few hours can save or seal the fate of patients admitted to the neonatal intensive care units (NICUs) and pediatric intensive care units (PICUs). Accelerating the analysis of the medical assays conducted in these hospitals can improve patient outcomes, and, in some cases, save lives.”

 

 

Then, if you read far enough into the post, you find this statement:

 

“Rady Children’s Institute for Genomic Medicine is one of the global leaders in advancing precision medicine. To date, the institute has sequenced the genomes of more than 3,000 children and their family members to diagnose genetic diseases. 40% of these patients are diagnosed with a genetic disease, and 80% of these receive a change in medical management. This is a remarkable rate of change in care, considering that these are rare diseases and often involve genomic variants that have not been previously observed in other individuals.”

 

 

This example is merely a road sign, pointing the way to even more exciting developments in FPGA-accelerated, cloud-based computing to come. Well-known Silicon Valley venture capitalist Jim Hogan directly addressed these developments in a speech at San Jose State University just a couple of weeks ago. (See “Four free training videos (two hour's worth) on using Xilinx SDAccel to create apps for Amazon AWS EC2 F1 instances.”)

 

 

The Amazon AWS EC2 F1 instance is a cloud service that’s based on multiple Xilinx Virtex UltraScale+ VU9P FPGAs installed in Amazon’s Web servers. For more information on the AWS EC2 F1 Instance in Xcell Daily, see:

 

 

 

 

 

 

 

 

By Adam Taylor

 

The Xilinx Zynq UltraScale+ MPSoC is good for many applications including embedded vision. Its APU with two or four 64-bit ARM Cortex-A53 processors, its Mali GPU, its DisplayPort interface, and its on-chip programmable logic (PL) give the Zynq UltraScale+ MPSoC plenty of processing power to address exciting applications such as ADAS and vision-guided robotics with relative ease. Further, we can use the device’s PL and its programmable I/O to interface with a range of vision and video standards including MIPI, LVDS, parallel, VoSPI, etc. When it comes to interfacing image sensors, the Zynq UltraScale+ MPSoC can handle just about anything you throw at it.

 

Once we’ve brought the image into the Zynq UltraScale+ MPSoC’s PL, we can implement an image-processing pipeline using existing IP cores from the Xilinx library or we can develop our own custom IP cores using Vivado HLS (high-level synthesis). However, for many applications we’ll need to move the images into the device’s PS (processing system) domain before we can apply exciting application-level algorithms such as decision making or use the Xilinx reVISION acceleration stack.

 

 

 

Image1.jpg 

 

The original MicroZed Evaluation kit and UltraZed board used for this demo

 

 

 

I thought I would kick off the fourth year of this blog with a look at how we can use VDMA instantiated in the Zynq MPSoC’s PL to transfer images from the PL to the PS-attached DDR Memory without processor intervention. You often need to make such high-speed background transfers in a variety of applications.

 

To do this we will use the following IP blocks:

 

  • Zynq MPSoC core – Configured to enable both a Full Power Domain (FPD) AXI HP master and an FPD HPC AXI slave, along with providing at least one PL clock and reset to the PL fabric
  • VDMA core – Configured for write-only operations, no FSync option, and a Genlock mode of master
  • Test Pattern Generator (TPG) – Configurable over the AXI Lite interface
  • AXI Interconnects – Implement the master and slave AXI networks

 

 

Once configured over its AXI Lite interface, the Test Pattern Generator outputs test patterns which are then transferred into the PS-attached DDR memory. We can demonstrate that this has been successful by examining the memory locations using SDK.

 

 

Image2.jpg 

 

Enabling the FPD Master and Slave Interfaces

 

 

 

For this simple example, we’ll clock both the AXI networks at the same frequency, driven by PL_CLK_0 at 100MHz.

 

For a deployed system, an image sensor would replace the TPG as the image source and we would need to ensure that the VDMA input-channel clocks (Slave-to-Memory-Map and Memory-Map-to-Slave) were fast enough to support the required pixel and frame rate.  For example, a sensor with a resolution of 1280 pixels by 1024 lines running at 60 frames per second would require a clock rate of at least 108MHz. We would need to adjust the clock frequency accordingly.
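To see where the 108MHz figure comes from, here is a quick back-of-the-envelope check. It is a minimal sketch assuming the VESA SXGA blanking totals (1688 total pixels per line, 1066 total lines per frame); what actually matters are your sensor's own totals, including its blanking intervals:

/* Back-of-the-envelope pixel-clock check for the sensor example above.
 * Assumes VESA SXGA (1280x1024 at 60Hz) blanking totals of 1688 x 1066;
 * substitute your sensor's actual totals, which include the blanking intervals. */
#include <stdio.h>

int main(void)
{
    const unsigned long h_total = 1688;   /* 1280 active pixels + horizontal blanking */
    const unsigned long v_total = 1066;   /* 1024 active lines  + vertical blanking   */
    const unsigned long fps     = 60;

    unsigned long pixel_clock_hz = h_total * v_total * fps;   /* roughly 108 MHz */
    printf("Required pixel clock: %.1f MHz\n", pixel_clock_hz / 1e6);
    return 0;
}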

 

 

 

Image3.jpg

 

Block Diagram of the completed design

 

 

 

To aid visibility within this example, I have included three ILA modules, which are connected to the outputs of the Test Pattern Generator, AXI VDMA, and the Slave Memory Interconnect. Adding these modules enables the use of Vivado’s hardware manager to verify that the software has correctly configured the TPG and the VDMA to transfer the images.

 

With the Vivado design complete and built, creating the application software to configure the TPG and VDMA to generate images and move them from the PL to the PS is very straightforward. We use the AXI VDMA, V_TPG, and Video Common APIs available under the BSP lib source directory to aid in creating the application. The software itself performs the following:

 

  1. Initialize the TPG and the AXI VDMA for use in the software application
  2. Configure the TPG to generate a test pattern configured as below
    1. Set the Image Width to 1280, Image Height to 1080
    2. Set the color space to YCRCB, 4:2:2 format
    3. Set the TPG background pattern
    4. Enable the TPG and set it for auto reloading
  3. Configure the VDMA to write data into the PS memory
    1. Set up the VDMA parameters using a variable of the type XAxiVdma_DmaSetup – remember the horizontal size and stride are measured in bytes not pixels.
    2. Configure the VDMA with the setting defined above
    3. Set the VDMA frame store location address in the PS DDR
    4. Start VDMA transfer

The application will then start generating test frames, transferred from the TPG into the PS DDR memory. I disabled the caches for this example to ensure that the DDR memory is updated.
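The steps above map fairly directly onto the driver APIs just mentioned. Here is a minimal sketch of steps 2 and 3, assuming the standalone BSP's xv_tpg and xaxivdma drivers; the device IDs, frame-store address, bytes-per-pixel value, and TPG format/pattern codes are placeholders rather than values from this design, and exact function names and types can vary between BSP versions:

/* Minimal sketch of steps 2 and 3 above using the standalone BSP drivers.
 * XPAR_* device IDs come from xparameters.h and depend on your design; the
 * frame-store address, bytes-per-pixel value, and TPG format/pattern codes
 * are placeholders -- check xv_tpg.h and xaxivdma.h in your BSP. */
#include "xparameters.h"
#include "xv_tpg.h"
#include "xaxivdma.h"
#include "xstatus.h"

#define FRAME_WIDTH      1280
#define FRAME_HEIGHT     1080
#define BYTES_PER_PIXEL  4            /* placeholder: match your AXIS/VDMA data width */
#define FRAME_STORE_ADDR 0x10000000   /* placeholder frame store in PS DDR */

XV_tpg   Tpg;
XAxiVdma Vdma;

int configure_tpg_and_vdma(void)
{
    /* Step 2: configure the TPG over its AXI Lite interface */
    XV_tpg_Initialize(&Tpg, XPAR_V_TPG_0_DEVICE_ID);
    XV_tpg_Set_height(&Tpg, FRAME_HEIGHT);
    XV_tpg_Set_width(&Tpg, FRAME_WIDTH);
    XV_tpg_Set_colorFormat(&Tpg, 2);   /* YCrCb 4:2:2 -- verify the enum value in your BSP */
    XV_tpg_Set_bckgndId(&Tpg, 9);      /* background pattern ID, e.g. color bars -- see xv_tpg.h */
    XV_tpg_EnableAutoRestart(&Tpg);
    XV_tpg_Start(&Tpg);

    /* Step 3: configure the VDMA write (S2MM) channel into PS DDR */
    XAxiVdma_Config *Cfg = XAxiVdma_LookupConfig(XPAR_AXI_VDMA_0_DEVICE_ID);
    XAxiVdma_CfgInitialize(&Vdma, Cfg, Cfg->BaseAddress);

    XAxiVdma_DmaSetup WriteCfg = {0};
    WriteCfg.VertSizeInput          = FRAME_HEIGHT;
    WriteCfg.HoriSizeInput          = FRAME_WIDTH * BYTES_PER_PIXEL;  /* bytes, not pixels */
    WriteCfg.Stride                 = FRAME_WIDTH * BYTES_PER_PIXEL;  /* bytes, not pixels */
    WriteCfg.EnableCircularBuf      = 1;
    WriteCfg.FrameStoreStartAddr[0] = FRAME_STORE_ADDR;

    if (XAxiVdma_DmaConfig(&Vdma, XAXIVDMA_WRITE, &WriteCfg) != XST_SUCCESS)
        return XST_FAILURE;
    XAxiVdma_DmaSetBufferAddr(&Vdma, XAXIVDMA_WRITE, WriteCfg.FrameStoreStartAddr);

    /* Step 3.4: start the transfer */
    return XAxiVdma_DmaStart(&Vdma, XAXIVDMA_WRITE);
}

In a real application you would of course check every return code, and, as noted above, either disable the caches or manage them explicitly so that the frame-store contents are visible from SDK.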

 

Examining the ILAs, you will see the TPG generating frames and the VDMA converting the stream into memory-mapped format:

 

 

 

Image4.jpg

 

TPG output, TUSER indicates start of frame while TLAST indicates end of line

 

 

 

Image5.jpg

 

VDMA Memory Mapped Output to the PS

 

 

 

Examining the frame store memory location within the PS DDR memory using SDK demonstrates that the pixel values are present.

 

 

Image6.jpg 

 

Test Pattern Pixel Values within the PS DDR Memory

 

 

 

 

You can use the same approach in Vivado when creating software for a Zynq-7000 SoC instead of a Zynq UltraScale+ MPSoC by enabling the AXI GP master for the AXI Lite bus and the AXI HP slave for the VDMA channel.

 

Should you be experiencing trouble with your VDMA-based image-processing chain, you might want to read this blog.

 

 

The project, as always, is on GitHub.

 

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

First Year E Book here

First Year Hardback here.

 

 

 

MicroZed Chronicles hardcopy.jpg  

 

 

 

Second Year E Book here

Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

MathWorks has just published a 4-part mini course that teaches you how to develop vision-processing applications using MATLAB, HDL Coder, and Simulink, then walks you through a practical example targeting a Xilinx Zynq SoC using a lane-detection algorithm in Part 4.

 

 

Click here for each of the classes:

 

 

 

 

 

 

 

 

PDF Solutions provides yield-improvement technologies and services to the IC-manufacturing industry to lower manufacturing costs, improve profitability, and shorten time to market. One of the company’s newest solutions is the eProbe series of e-beam tools used for inline electrical characterization and process control. These tools combine an SEM (scanning electron microscope) and an optical microscope and have the unique ability to provide real-time image analysis of nanometer-scale features. The eProbe development team selected National Instruments’ (NI’s) LabVIEW to control the eProbe system and brought in JKI—a LabVIEW consulting company, Xilinx Alliance Program member, and NI Silver Alliance Partner—to help develop the system.

 

 

PDF Solutions eProbe 100 e-beam inspection system.jpg 

 

 

PDF Solutions eProbe e-beam tool combines an SEM with an optical microscope

 

 

 

In less than four months, JKI helped PDF Solutions attain a 250MHz pixel-acquisition rate from the prototype eProbe using a combination of NI’s FlexRIO module, based on a Xilinx Kintex-7 FPGA, and NI’s LabVIEW FPGA module. According to the PDF Solutions case study published on the JKI Web site, using NI’s LabVIEW allowed the PDF/JKI team to implement the required, real-time FPGA logic and easily integrate third-party FPGA IP in a fraction of the time required by alternative design platforms while still achieving the project’s image-throughput goals.

 

LabVIEW controls most of the functions within the eProbe that perform the wafer inspection including:

 

 

  • Controlling the x and y axes of the stage
  • Sampling and driving various I/O points for the electron gun and the column
  • Controlling the load port and equipment frontend module
  • Overseeing the vacuum and interlocking components
  • Directing and managing SEM and optical image acquisition.

 

 

JKI contributed both to the eProbe’s software architecture design and the development of various high-level software components that coordinate and control the low-level hardware functions including data acquisition and image manipulation.

 

Although the eProbe’s control system runs within NI’s LabVIEW environment, the system’s user interface is based on a C# application from The PEER Group called the Peer Tool Orchestrator (PTO). JKI developed the interface between the eProbe’s front-end user interface and its LabVIEW-based control system using its internally developed tools. (Note: JKI offers several LabVIEW development tools and templates directly on this Web page.)

 

 

eProbe 150 User Interface Screen.jpg

 

 

eProbe user interface screen

 

 

 

Once PDF Solutions started fielding eProbe systems, JKI sent people to work with PDF Solutions’ customers on site in a collaboration that helped generate ideas for future algorithm and tool improvements.

 

 

For more information about real-time LabVIEW development using the NI LabVIEW FPGA module and Xilinx-based NI hardware, contact JKI directly.

 

 

Xylon has a new hardware/software development kit for quickly implementing embedded, multi-camera vision systems for ADAS and AD (autonomous driving), machine-vision, AR/VR, guided-robotics, drone, and other applications. The new logiVID-ZU Vision Development Kit is based on the Xilinx Zynq UltraScale+ MPSoC and includes four Xylon 1Mpixel video cameras based on the TI FPD-Link III (flat-panel display link) interface. The kit supports HDMI video input and display output and comes complete with extensive software deliverables including pre-verified camera-to-display SoC designs built with licensed Xylon logicBRICKS IP cores, reference designs and design examples prepared for the Xilinx SDSoC Development Environment, and complete demo Linux applications.

 

 

 

 

Xylon logiVID-ZU Vision Development Kit .jpg 

 

Xylon’s new logiVID-ZU Vision Development Kit

 

 

 

Please contact Xylon for more information about the new logiVID-ZU Vision Development Kit.

 

 

 

 

 

 

Earlier this week at San Jose State University (SJSU), Jim Hogan, one of Silicon Valley’s most successful venture capitalists, gave a talk on the disruptive effects that cognitive science and AI are already having on society. In a short portion of that talk, Hogan discussed how he and one of his teams developed the world’s most experienced lung-cancer radiologist—an AI app—for $75:

 

  • US Centers for Disease Control (CDC) lung-imaging database: $25
  • Cloud storage for the 4Tbyte database: $25
  • Time to train the AI on the CDC database using Amazon’s AWS: $25

 

Hogan’s trained AI radiologist can look at lung images and find possibly cancerous tumors based on thousands of cases in the CDC database. However, said Hogan, the US Veterans Administration has a database with millions of cases. Yes, his team used that database for training too.

 

Hogan predicted that something like 25 million AI apps like his lung-cancer-specific radiologist will be developed over the next few years. His $75 example is meant to prove the cost feasibility of developing that many useful apps.

 

Hogan made me a believer.

 

In a related connection to AWS app development, Xilinx has just posted four training videos showing you how to develop FPGA-accelerated apps using Xilinx’s SDAccel on Amazon’s AWS EC2 F1 instance. That’s nearly two hours of free training available at your desk. (For more information on the AWS EC2 F1 instance, see “SDAccel for cloud-based application acceleration now available on Amazon’s AWS EC2 F1 instance” and “AWS makes Amazon EC2 F1 instance hardware acceleration based on Xilinx Virtex UltraScale+ FPGAs generally available”.)

 

 

Here are the videos:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Note: You can watch Jim Hogan’s 90-minute presentation at SJSU by clicking here.

 

 

SDAccel—Xilinx’s development environment for accelerating cloud-based applications using C, C++, or OpenCL—is now available on Amazon’s AWS EC2 F1 instance. (Formal announcement here.) The Amazon EC2 F1 compute instance allows you to create custom hardware accelerators for your application using cloud-based server hardware that incorporates multiple Xilinx Virtex UltraScale+ VU9P FPGAs. SDAccel automates the acceleration of software applications by building application-specific FPGA kernels for the AWS EC2 F1. You can also use HDLs including Verilog and VHDL to define hardware accelerators in SDAccel. With this release, you can access SDAccel through the AWS FPGA developer AMI.

 

 

For more information about Amazon’s AWS EC2 F1 instance, see:

 

 

 

 

 

 

 

 

For more information about SDAccel, see:

 

 

 

 

 

 

Brandon Treece from National Instruments (NI) has just published an article titled “CPU or FPGA for image processing: Which is best?” on Vision-Systems.com. NI offers a Vision Development Module for LabVIEW, the company’s graphical systems design environment, which can run vision algorithms on both CPUs and FPGAs, so the perspective is a knowledgeable one. Abstracting the article, what you get from an FPGA-accelerated imaging pipeline is speed. If you’re performing four 6msec operations on each video frame, a CPU will need 24msec (four times 6msec) to complete the operations, while an FPGA offers you parallelism that shortens the processing time for each operation and permits overlap among the operations, as illustrated in this figure taken from the article:

 

 

 

NI Vision Acceleration.jpg 

 

 

In this example, the FPGA needs a total of 6msec to perform the four operations and another 2msec to transfer a video frame back and forth between processor and FPGA. The CPU needs a total of 24msec for all four operations. The FPGA needs 8msec, for a 3x speedup.

 

Treece then demonstrates that the acceleration is actually much greater in the real world. He uses the example of a video processing sequence needed for particle counting that includes these three major steps:

 

 

  • Convolution filtering to sharpen the image
  • Thresholding to produce a binary image
  • Morphology to remove holes in the binary particles

 

 

Here’s an image series that shows you what’s happening at each step:

 

 

NI Vision Acceleration Steps.jpg 

 

 

Using the NI Vision Development Module for LabVIEW, he then runs the algorithm on an NI cRIO-9068 CompactRIO controller, which is based on a Xilinx Zynq Z-7020 SoC. Running the algorithm on the Zynq SoC’s ARM Cortex-A9 processor takes 166.7msec per frame. Running the same algorithm but accelerating the video processing using the Zynq SoC’s integral FPGA hardware takes 8msec. Add in another 0.5msec for DMA transfer of the pre- and post-processed video frames back and forth between the Zynq SoC’s CPU and FPGA and you get about a 20x speedup.

 

A key point here is that because the cRIO-9068 controller is based on the Zynq SoC, and because NI’s Vision Development Module for LabVIEW supports FPGA-based algorithm acceleration, this is an easy choice to make. The resources are there for your use. You merely need to click the “Go-Fast” button.

 

 

For more information about NI’s Vision Development Module for LabVIEW and cRIO-9068 controller, please contact NI directly.

 

 

 

 

A new open-source tool named GUINNESS makes it easy for you to develop binarized (2-valued) neural networks (BNNs) for Zynq SoCs and Zynq UltraScale+ MPSoCs using the SDSoC Development Environment. GUINNESS is a GUI-based tool that uses the Chainer deep-learning framework to train a binarized CNN. In a paper titled “On-Chip Memory Based Binarized Convolutional Deep Neural Network Applying Batch Normalization Free Technique on an FPGA,” presented at the recent 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, authors Haruyoshi Yonekawa and Hiroki Nakahara describe a system they developed to implement a binarized CNN for the VGG-16 benchmark on the Xilinx ZCU102 Eval Kit, which is based on a Zynq UltraScale+ ZU9EG MPSoC. Nakahara presented the GUINNESS tool again this week at FPL2017 in Ghent, Belgium.

 

According to the IEEE paper, the Zynq-based BNN is 136.8x faster and 44.7x more power efficient than the same CNN running on an ARM Cortex-A57 processor. Compared to the same CNN running on an Nvidia Maxwell GPU, the Zynq-based BNN is 4.9x faster and 3.8x more power efficient.

 

GUINNESS is now available on GitHub.

 

 

 

ZCU102 Board Photo.jpg 

 

 

Xilinx ZCU102 Zynq UltraScale+ MPSoC Eval Kit

 

 

 

 

 

 

 

The Xilinx Technology Showcase 2017 will highlight FPGA-acceleration as used in Amazon’s cloud-based AWS EC2 F1 Instance and for high-performance, embedded-vision designs—including vision/video, autonomous driving, Industrial IoT, medical, surveillance, and aerospace/defense applications. The event takes place on Friday, October 6 at the Xilinx Summit Retreat Center in Longmont, Colorado.

 

You’ll also have a chance to see the latest ways you can use the increasingly popular Python programming language to create Zynq-based designs. The Showcase is a prelude to the 30-hour Xilinx Hackathon starting immediately after the Showcase. (See “Registration is now open for the Colorado PYNQ Hackathon—strictly limited to 35 participants. Apply now!”)

 

The Xilinx Technology Showcase runs from 3:00 to 5:00 PM.

 

Click here for more details and for registration info.

 

 

 

Xilinx Longmont.jpg

 

Xilinx Colorado, Longmont Facility

 

 

 

 

For more information about the FPGA-accelerated Amazon AWS EC2 F1 Instance, see:

 

 

 

 

 

 

 

 

 

The Xilinx Hackathon is a 30-hour marathon event being held at the Xilinx “Retreat” (also known as the Xilinx Colorado facility in Longmont, but see the image below), starting on October 7. The organizers are looking for no more than 35 heroic coders who will receive a Python-programmable, Zynq-based Digilent/Xilinx PYNQ-Z1 board and an assortment of Arduino-compatible shields and sensors. The intent, as Zaphod Beeblebrox might say, is to create something not just amazing but “amazingly amazing.”

 

 

 

Xilinx Longmont.jpg 

 

 

Xilinx Colorado, Longmont Facility

 

 

 

 

  • Want to compete as a team? No problem. The Xilinx Hackathon rules allow teams as large as four people, but the head count is capped at 35, so if you have a large team you’d better get your name on the invite list early. Better yet, get your name on the list early even if you’re competing solo.

 

  • How much does it cost to enter? Zero. Zip, Nada, Nothing.

 

  • What are the prizes? We’re offering more than $2000 in cash prizes plus all competitors keep their PYNQ-Z1 boards. Also, winners and other amazingly amazing projects will get incredible, amazingly amazing recognition in the Xcell Daily blog, which will be covering this event. Your fame is assured.

 

  • What should you bring? “A laptop, laptop charger, phone, charger(s), headphones, a pillow, toiletries, an extra set of clothes, and a water bottle.” (It’s a 30-hour hackathon, but it’s at the Xilinx retreat; see the photo, again.) Xilinx will provide you with three meals per day (good hackers will figure out how many meals they’ll get in 30 hours) as well as snacks, drinks, and caffeine stimulation.

 

 

  • Where do I sign up? Here. A crack Xilinx team will hand-select and invite the lucky 35 participants from this list. (That’s the Final Five times seven.)

 

 

In case you’ve not read about it, the PYNQ project is an open-source project from Xilinx that makes it easy to design high-performance embedded systems using Xilinx Zynq Z-7000 SoCs. Here’s what’s on the PYNQ-Z1 board:

 

 

  • Xilinx Zynq Z-7020 SoC with a dual-core ARM Cortex-A9 MPCore processor running at 650MHz
  • 512Mbytes of DDR3 SDRAM with a 1050Mbps data rate
  • 16Mbytes of Quad-SPI Flash memory
  • A MicroSD slot for the PYNQ environment
  • USB OTG host port
  • USB programming port
  • Gigabit Ethernet port
  • Microphone
  • Mono audio output jack
  • HDMI input port
  • HDMI output port
  • Two Digilent PMOD ports
  • Arduino-format header for Arduino shields
  • Pushbuttons, switches, and LEDs

 

 

 

PYNQ-Z1.jpg 

 

 

The PYNQ-Z1 Board

 

 

 

 

For more information about the PYNQ project, see:

 

 

 

 

 

 

 

 

Adam Taylor’s MicroZed Chronicles, Part 214: Addressing VDMA Issues

by Xilinx Employee, 09-05-2017 12:04 PM (edited 09-06-2017 08:41 AM)

 

By Adam Taylor

 

 

Video Direct Memory Access (VDMA) is one of the key IP blocks used within many image-processing applications. It allows frames to be moved between the Zynq SoC’s and Zynq UltraScale+ MPSoC’s PS and PL with ease. Once the frame is within the PS domain, we have several processing options available. We can implement high-level image processing algorithms using open-source libraries such as OpenCV and acceleration stacks such as the Xilinx reVISION stack if we wish to process images at the edge. Alternatively, we can transmit frames over Gigabit Ethernet, USB3, PCIe, etc. for offline storage or later analysis.

 

It can be infuriating when our VDMA-based image-processing chain does not work as intended. Therefore, we are going to look at a simple VDMA example and the steps we can take to ensure that it works as desired.

 

The simple VDMA example shown below contains the basic elements needed to provide VDMA output to a display. The processing chain starts with a VDMA read that obtains the current frame from DDR memory. To correctly size the data-stream width, we use an AXIS subset converter to convert the 32-bit data read from DDR memory into a 24-bit format that represents each RGB color component with 8 bits. Finally, we output the image with an AXIS-to-video output block that converts the AXIS stream to parallel video with video data and sync signals, using timing provided by the Video Timing Controller (VTC). We can use this parallel video output to drive a VGA, HDMI, or other video display output with an appropriate PHY.

 

This example outlines a read case from the PS to the PL and corresponding output. This is a more complicated case than performing a frame capture and VDMA write because we need to synchronize video timing to generate an output.

 

 

 

Image1.jpg

 

 

Simple VDMA-Based Image-Processing Pipeline
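Before getting into the troubleshooting steps, it helps to have a concrete picture of how the read side of such a pipeline is typically set up in application software. Here is a minimal sketch using the standalone xaxivdma driver; it assumes the driver instance has already been initialized, and the frame dimensions, bytes-per-pixel value, and frame-buffer address are placeholders rather than values from this design:

/* Minimal sketch of setting up the VDMA read (MM2S) channel that feeds the
 * display path above, using the standalone xaxivdma driver. Frame dimensions,
 * bytes per pixel, and the frame-buffer address are placeholders. */
#include "xaxivdma.h"
#include "xstatus.h"

#define H_ACTIVE        1280
#define V_ACTIVE        720
#define BYTES_PER_PIXEL 4            /* 32-bit words in DDR; trimmed to 24 bits in the PL */
#define FRAME_BUFFER    0x10000000   /* placeholder frame buffer in PS DDR */

int start_display_dma(XAxiVdma *VdmaPtr)
{
    XAxiVdma_DmaSetup ReadCfg = {0};

    ReadCfg.VertSizeInput          = V_ACTIVE;
    ReadCfg.HoriSizeInput          = H_ACTIVE * BYTES_PER_PIXEL;   /* bytes, not pixels */
    ReadCfg.Stride                 = H_ACTIVE * BYTES_PER_PIXEL;   /* bytes, not pixels */
    ReadCfg.EnableCircularBuf      = 1;    /* keep cycling through the frame store */
    ReadCfg.FrameStoreStartAddr[0] = FRAME_BUFFER;

    if (XAxiVdma_DmaConfig(VdmaPtr, XAXIVDMA_READ, &ReadCfg) != XST_SUCCESS)
        return XST_FAILURE;
    if (XAxiVdma_DmaSetBufferAddr(VdmaPtr, XAXIVDMA_READ, ReadCfg.FrameStoreStartAddr) != XST_SUCCESS)
        return XST_FAILURE;

    return XAxiVdma_DmaStart(VdmaPtr, XAXIVDMA_READ);   /* frames start flowing to the VTC-timed output */
}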

 

 

 

So what steps can we take if the VDMA-based image pipeline does not function as intended? To correct the issue:

 

  1. Check resets and clocks as we would when debugging any application. Ensure that the reset polarity is correct for each module because there will be mixed polarities. Ensure that the pixel clock is correct for the required video timing and that it is supplied to both the VTC and the AXIS-to-Video-Out blocks, and that the clock driving the AXIS network can support the image throughput.
  2. Check that the clock enables on both the VTC and AXIS-to-Video-Out blocks are tied to the correct level to enable the clocks.
  3. Check that the VTC is correctly configured, especially if you are using the AXI interface to define the configuration through the application software. When configuring the VTC using AXI, it is important to make sure we have set the source registers to the VTC generator, enabled register updates, and defined the required timing parameters.
  4. Check the connections between the VTC and AXIS-to-Video-Out blocks. Ensure that the horizontal and vertical blanking signals are connected along with the horizontal and vertical syncs.
  5. Check the AXIS-to-Video-Out timing mode. If we are using VDMA, the timing mode of the AXIS-to-Video-Out block should be set to master. This enables the AXIS-to-Video-Out block to assert back pressure on the AXIS data stream to halt the frame-buffer output. This mechanism permits the AXIS-to-Video-Out block to manage the flow of pixels by enabling synchronization and lock. You may also want to increase the size of the internal buffer from the default.
  6. Check that the AXIS-to-Video-Out VTC_ce signal is not connected to the VTC gen clock enable, as is the case when configured for slave operation. This would prevent the AXIS-to-Video-Out block from locking to the AXIS video stream.
  7. Insert ILAs. Inserting these within the design allows us to observe the detailed workings of the AXI buses. When commissioning a new image-processing pipeline, I insert ILA blocks on the VTC output and the VDMA MM-to-AXIS port so that I can observe the generated timing signals and the VDMA output stream. When observing the AXI stream, the tuser signal identifies the start of frame and the tlast signal marks the end of line. You may also want to observe the AXIS-to-Video-Out 32-bit status output, which indicates the locked status along with additional debug information.
  8. Ensure that HSize and Stride are set correctly (see the sketch after this list). These are defined by the application software and configure the VDMA with frame-store information. HSize represents the horizontal size of the image and Stride represents the distance in memory between the image lines. Both HSize and Stride are defined in bytes. As such, when working with U32 or U16 types, take care to set these values to reflect the number of bytes used.
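For point 8, the byte arithmetic is worth spelling out. A minimal sketch, with example values only:

/* Point 8 above: HSize and Stride are measured in bytes, not pixels.
 * Example values only; use your own width and pixel type. */
#include <stdint.h>

#define WIDTH_PIXELS 1280

uint32_t hsize_u32 = WIDTH_PIXELS * sizeof(uint32_t);   /* 5120 bytes when each pixel is a u32 */
uint32_t hsize_u16 = WIDTH_PIXELS * sizeof(uint16_t);   /* 2560 bytes when each pixel is a u16 */

/* Stride is the distance in bytes between the starts of consecutive lines;
 * it must be at least HSize (equal to HSize for tightly packed frames). */
uint32_t stride_u32 = WIDTH_PIXELS * sizeof(uint32_t);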

 

 

Hopefully, by the time you have checked these points, the issue with your VDMA-based image-processing pipeline will have been identified and you can start developing the higher-level image-processing algorithms needed for the application.

 

 

Code is available on GitHub as always.

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

by Anthony Boorsma, DornerWorks

 

 

Why aren’t you getting all of the performance that you expect after moving a task or tasks from the Zynq PS (processing system) to its PL (programmable logic)? If you used SDSoC to develop your embedded design, there’s help available. Here’s some advice from DornerWorks, a Premier Xilinx Alliance Program member. This blog is adapted from a recent post on the DornerWorks Web site titled “Fine Tune Your Heterogeneous Embedded System with Emulation Tools.”

 

 

 

Thanks to Xilinx’s SDSoC Development Environment, offloading portions of your software algorithm to a Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL (programmable logic) to meet system performance requirements is straightforward. Once you have familiarized yourself with SDSoC’s data-transfer options for moving data back and forth between the PS and PL, you can select the appropriate data mover that represents the best choice for your design. SDSoC’s software estimation tool then shows you the expected performance results.

 

Yet when performing the ultimate test of execution—on real silicon—the performance of your system sometimes fails to match expectations and you need to discover the cause… and the cure. Because you’ve offloaded software tasks to the PL, your existing software debugging/analysis methods do not fully apply because not all of the processing occurs in the PS.

 

You need to pinpoint the cause of the unexpected performance gap. Perhaps you made a sub-optimal choice of data mover. Perhaps the offloaded code was not a good candidate for offloading to the PL. You cannot cure the performance problem without knowing its cause.

 

Just how do you investigate and debug system performance on a Zynq-based heterogeneous embedded system with part of the code running in the PS and part in the PL?

 

If you are new to the world of debugging PL data processing, you may not be familiar with the options you have for viewing PL data flow. Fortunately, if you used SDSoC to accelerate software tasks by offloading them to the PL, there is an easy solution. SDSoC has an emulation capability for viewing the simulated operation of your PL hardware within the context of your overall system.

 

This emulation capability allows you to identify any timing issues with the data flow into or out of the auto-generated IP blocks that accelerate your offloaded software. The same capability can also show you if there is an unexpected slowdown in the offloaded software acceleration itself.

 

Using this tool can help you find performance bottlenecks. You can investigate these potential bottlenecks by watching your data flow through the hardware via the displayed emulation signal waveforms. Similarly, you can investigate the interface points by watching the data signals transfer data between the PS and the PL. This information provides key insights that help you find and fix your performance issues.

 

To demonstrate how you can debug/emulate a hardware-accelerated function, we’ll focus on one IP block: the matrix multiplier IP block from the Xilinx Multiply and Add (MMADD) example, shown in Figure 1.

 

 

 

Image1.jpg

 

Figure 1: Multiplier IP block with Port A expanded to show its signals

 

 

 

We will look at the waveforms for the signals to and from this Mmult IP block in the emulation. Specifically we will view the A_PORTA signals as shown in the figure above. These signals represent the data input for matrix A, which corresponds to the software input param A to the matrix multiplier function.

 

To get started with the emulation, enable generation of the “emulation model” configuration for the build in the SDSoC project’s settings, as shown in Figure 2.

 

 

 

Image2.jpg 

 

 

Figure 2: The mmult Project Settings needed to enable emulation

 

 

 

Next, rebuild your project as normal. After building your project with emulation-model support enabled in the configuration, run the emulator by selecting “Start/Stop Emulation” under the “Xilinx Tools” menu option. When a window opens, select “Start” to start the emulator. SDSoC will then automatically launch an instance of Xilinx Vivado, which opens the auto-generated PL project that SDSoC created for you as a subproject within your SDSoC project.

 

We specifically want to view the A_PORTA signals of the Mmult IP block. These signals must be added to the Wave Window to be viewed during a simulation. The available Mmult signals can be viewed in the Objects pane by selecting the mmult_1 block in the Scopes pane. To add the A_PORTA signals to the Wave Window, select all of the “A_*” signals in the Objects pane, right click, and select “Add to Wave Window” as shown in Figure 3.

 

 

 

Image3.jpg

 

 

Figure 3: Behavioral Simulation – mmult_1 signals highlighted

 

 

 

Now you can run the emulation and view the signal states in the waveform viewer. Start the emulator by clicking “Run All” from the “Run” drop-down menu as shown in Figure 4.

 

 

 

Image4.jpg

 

 

Figure 4: Start emulation of the PL

 

 

 

Back in SDSoC’s toolchain environment, you can now run a debugging session that connects to this emulation session as it would to your software running on the target. From the “Run” menu option, select “Debug As -> 1 Launch on Emulator (SDSoC Debugger)” to start the debug session as shown in Figure 5.

 

 

 

Image5.jpg

 

 

Figure 5: Connect Debug Session to run the PL emulation

 

 

 

Now you can step or run through your application test code and view the signals of interest in the emulator. Shown below in Figure 6 are the A_PORTA signals we highlighted earlier and their signal values at the end of the PL logic operation using the Mmult and Add example test code.

 

 

Image6.jpg

 

Figure 6: Emulated mmult_1 signal waveforms

 

 

 

These signals tell us a lot about the performance of the offloaded code now running in the PL and we used familiar emulation tools to obtain this troubleshooting information. This powerful debugging method can help illuminate unexpected behavior in your hardware-accelerated C algorithm by allowing you to peer into the black box of PL processing, thus revealing data-flow behavior that could use some fine-tuning.

 

 

A recent Sensorsmag.com article written by Nick Ni and Adam Taylor titled “Accelerating Sensor Fusion Embedded Vision Applications” discusses some of the sensor-fusion principles behind, among other things, the 3D stereo vision used in the Carnegie Robotics Multisense stereo cameras discussed in today’s earlier blog titled “Carnegie Robotics’ FPGA-based GigE 3D cameras help robots sweep mines from a battlefield, tend corn, and scrub floors.” We’re starting to put large numbers of sensors into systems, and turning the deluge of raw sensor data into usable information is a tough computational job.

 

Describing some of that job’s particulars consumes the first half of Ni and Taylor’s article. The second half of the article then discusses some implementation strategies based on the new Xilinx reVISION stack, which is built on top of Xilinx Zynq SoCs and Zynq UltraScale+ MPSoCs.

 

If there are a lot of sensors in your next design, particularly image sensors, be sure to take a look at this article.

 

 

By Anthony Boorsma, DornerWorks

 

Having some trouble choosing between Vivado HLS and SDSoC? Here’s some advice from DornerWorks, a Premier Xilinx Alliance Program member. This blog is adapted from a recent post on the DornerWorks Web site titled “Algorithm Implementation and Acceleration on Embedded Systems.”

 

 

How does an engineer already experienced and comfortable with working in the Zynq SoC’s software-based PS (processing system) domain take advantage of the additional flexibility and processing power of the Zynq SoC’s PL (programmable logic)? The traditional method is through education and training to learn to program the PL using an HDL such as Verilog or VHDL. Another way is to learn and use a tool that allows you to take a software-based design written exclusively for the ARM 32-bit processors in the PS and transfer some or most of the tasks to the PL, without writing HDL descriptions.

 

One such tool is Xilinx’s Vivado High Level Synthesis (HLS). By leveraging the capabilities of HLS, you can prototype a design using the Zynq PS and then move functionality to the PL to boost performance. The advantage of this tool is that it generates IP blocks that can be used in the programmable logic of Xilinx FPGAs as well as Xilinx Zynq SoCs and Zynq UltraScale+ MPSoCs.

 

Logic optimization occurs when Vivado HLS synthesizes your algorithm’s C model and creates RTL. Code directives (essentially guidelines for the tool’s optimization process) are available that allow you to guide the HLS tool’s synthesis from the C-model source to the RTL that is ultimately compiled into the FPGA bitstream. If you are working with an existing algorithm modeled in C, C++, or SystemC and need to implement this algorithm in custom logic for added performance, then HLS is a great tool choice.

 

However, be aware that the data movers that transfer data between the Zynq PS and the PL must be manually configured for performance when using Vivado HLS. This can become a complicated process when there’s significant data transfer between the domains.

 

A recent innovation that simplifies data-mover configuration is the development of the Xilinx SDSoC (Software-Defined System on Chip) Development Environment for use with Zynq SoCs and Zynq UltraScale+ MPSoCs. SDSoC builds on Vivado HLS’ capabilities by using HLS to perform the C-to-RTL conversion but with the convenient addition of automatically generated data movers, which greatly simplifies configuring the connection between the software running on the Zynq PS and the accelerated algorithm executing in the Zynq PL. SDSoC also allows you to guide data-mover generation by providing a set of pragmas to make specific data-mover choices. The SDSoC directive pragmas give you control over the automatically generated data movers but still require some minimal manual configuration. Code-directive pragmas for RTL optimization available in Vivado HLS are also available in SDSoC and can be used in tandem with SDSoC pragmas to optimize both the PL algorithm and the automatically generated data movers.
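To make the pragma discussion concrete, here is a minimal sketch of what SDSoC data-mover pragmas and a Vivado HLS optimization pragma look like on a hypothetical accelerated function. The function name, matrix size, and specific pragma choices below are illustrative assumptions, not code from the template project discussed later:

/* Sketch of SDSoC data-mover pragmas combined with a Vivado HLS optimization
 * pragma on a hypothetical hardware-accelerated function. Names and sizes are
 * illustrative only. */
#define N 32

/* Guide SDSoC's data-mover generation: stream each array sequentially and
 * copy N*N elements per call (suits a DMA-style data mover). */
#pragma SDS data access_pattern(A:SEQUENTIAL, B:SEQUENTIAL, C:SEQUENTIAL)
#pragma SDS data copy(A[0:N*N], B[0:N*N], C[0:N*N])
void mmult_accel(float A[N*N], float B[N*N], float C[N*N])
{
    float Blocal[N][N];
    float a_row[N];

    /* Read B strictly in order into on-chip storage so all three ports can stream */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
#pragma HLS PIPELINE II=1
            Blocal[i][j] = B[i * N + j];
        }

    for (int i = 0; i < N; i++) {
        /* A is also read strictly in order, one row at a time */
        for (int k = 0; k < N; k++)
            a_row[k] = A[i * N + k];

        for (int j = 0; j < N; j++) {
/* Vivado HLS directive: pipeline the inner product */
#pragma HLS PIPELINE II=1
            float acc = 0.0f;
            for (int k = 0; k < N; k++)
                acc += a_row[k] * Blocal[k][j];
            C[i * N + j] = acc;   /* C is written strictly in order */
        }
    }
}

The SDS pragmas steer the automatically generated data movers (streaming access, copy semantics), while the HLS pragma shapes the RTL that HLS generates for the loops; as described above, the two operate at different levels and can be used in tandem.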

 

It is possible to disable the SDSoC auto-generated data movers and use only the HLS optimizations. Shown below are an IP block diagram generated with the auto-configured SDSoC data movers and one generated without them.

 

The following screen shots are taken from a Xilinx-provided template project demonstrating the acceleration of a software matrix multiplication and addition algorithm, provided with the SDx installation. We used the SDx 2016.4 toolchain and targeted an Avnet Zedboard with a standalone OS configuration for this example.

 

 

Image1.jpg

 

 

Here is a screen shot of the same block, but without the SDSoC data movers. (We have disabled the automatic generation of data movers within SDSoC by manually declaring the AXI HLS interface directives for both the mmult and madd accelerated IP blocks.)

 

 

Image2.jpg 

 

 

To achieve the best algorithm performance, be prepared to familiarize yourself with and use both the SDSoC and Vivado HLS user guides and data sheets. SDSoC provides a superset of Vivado HLS’s capabilities.

 

If you are developing and accelerating your model from first principles but want the flexibility of testing and proving out a design in software first, and you don’t intend to use a Zynq SoC, then the Vivado HLS toolset is the place to start. A design started in HLS is transferable to SDSoC if requirements change. Alternatively, if using a Zynq-based system is possible, it is worthwhile to start with SDSoC right away.

 

 

 

 

Amazon Web Services (AWS) is now offering the Xilinx SDAccel Development Environment as a private preview. SDAccel empowers hardware designers to easily deploy their RTL designs in the AWS F1 FPGA instance. It also automates the acceleration of code written in C, C++ or OpenCL by building application-specific accelerators on the F1. This limited time preview is hosted in a private GitHub repo and supported through an AWS SDAccel forum. To request early access, click here.

 

Last September at the GNU Radio Conference in Boulder, Colorado, Ettus Research announced the RFNoC & Vivado Challenge for SDR (software-defined radio). Ettus’ RFNoC (RF Network on Chip) is designed to allow you to efficiently harness the latest-generation FPGAs for SDR applications without being an expert firmware or FPGA developer. Today, Ettus Research and Xilinx announced the three challenge winners.

 

Ettus’ GUI-based RFNoC design tool allows you to create FPGA applications as easily as you can create GNU Radio flowgraphs. This includes the ability to seamlessly transfer data between your host PC and an FPGA. It dramatically eases the task of FPGA off-loading in SDR applications. Ettus’ RFNoC is built upon Xilinx’s Vivado HLS.

 

Here are the three winning teams and their projects:

 

 

 

 

 

Finally, here’s a 5-minute video announcing the winners along with the prizes they have won:

 

 

 

 

It will take you five or ten minutes to read the new SDSoC article written by Nick Ni and Adam Taylor titled “Developing all programmable logic using the SDSoC environment” and after you’re done, you’ll have a very good idea of why you might want to try Xilinx’s new SDSoC Development Environment for C, C++, and OpenCL application development. The article quickly guides you through a typical software/hardware development cycle and then gives you the performance results for an AES cryptography application.

 

The resulting performance chart showing as much as a 75% reduction in clock cycles for the application alone should be enough to pique your interest:

 

 

SDSoC Performance Chart.jpg

 

 

 

Got a problem getting enough performance out of your processor-based embedded system? You might want to watch a 14-minute video that does a nice job of explaining how you can develop hardware accelerators directly from your C/C++ code using the Xilinx SDK.

 

How much acceleration do you need? If you don’t know for sure, the video gives an example of an autonomous drone with vision and control tasks that need real-time acceleration.

 

What are your alternatives? If you need to accelerate your code, you can:

 

  • Increase your processor’s clock speed, likely requiring a faster speed grade
  • Add more processor cores to share the load
  • Switch to a higher-end, code-compatible processor

 

Unfortunately, each of these three alternatives increases power consumption. There’s another alternative, however, that can actually cut power consumption. That alternative is based on the use of Xilinx All Programmable Zynq SoCs and Zynq UltraScale+ MPSoCs. By moving critical code into custom hardware accelerators implemented in the programmable logic incorporated into all Zynq family members, you can relieve the processor of the associated processing burden and actually slow the processor’s clock speed, thus reducing power. It’s quite possible to cut overall power consumption using this approach.

 

Ah, but implementing these accelerators. Aye, there’s the rub!

 

It turns out that implementation of these hardware accelerators might not be as difficult as you imagine. The Xilinx SDK is already a C/C++ development environment based on familiar IDE and compiler technology. Under the hood, the SDK serves as a single cockpit for all Zynq-based development work—software and hardware. It also includes SDSoC, the piece of the puzzle you need to convert C/C++ code into acceleration hardware using a 3-step process:

 

 

  • Code profiling to identify time-consuming tasks that are critical to real-time operation
  • Software/hardware partitioning based on the profiling data
  • Software/hardware compilation based on the system partitioning

 

One development platform, SDK, serves all members of the Zynq SoC and Zynq UltraScale+ MPSoC device families, giving you a huge price/performance range.

 

Here’s that 14-minute video:

 

 

 

 

 

The latest “Powered by Xilinx” video, published today, provides more detail about the Perrone Robotics MAX development platform for developing all types of autonomous robots—including self-driving cars. MAX is a set of software building blocks for handling many types of sensors and controls needed to develop such robotic platforms.

 

Perrone Robotics has MAX running on the Xilinx Zynq UltraScale+ MPSoC and relies on that heterogeneous All Programmable device to handle the multiple, high-bit-rate data streams from complex sensor arrays that include lidar systems and multiple video cameras.

 

Perrone is also starting to develop with the new Xilinx reVISION stack and plans to both enhance the performance of existing algorithms and develop new ones for its MAX development platform.

 

Here’s the 4-minute video:

 

 

reVISION Cobot logo.jpg

In a free Webinar taking place on July 12, Xilinx experts will present a new design approach that unleashes the immense processing power of FPGAs using the Xilinx reVISION stack including hardware-tuned OpenCV libraries, a familiar C/C++ development environment, and readily available hardware-development platforms to develop advanced vision applications based on complex, accelerated vision-processing algorithms such as dense optical flow. Even though the algorithms are advanced, power consumption is held to just a few watts thanks to Xilinx’s All Programmable silicon.

 

Register here.

 

 

Xilinx announced the addition of the P4₁₆ network programming language for SDN applications to its SDNet Development Environment for high-speed (1Gbps to 100Gbps) packet processing back in May. (See “The P4 has landed: SDNet 2017.1 gets P4-to-FPGA compilation capability for 100Gbps data-plane packet processing.”) An OFC 2017 panel session in March—presented by Xilinx, Barefoot Networks, Netcope Technologies, and MoSys—discussed the adoption of P4, the emergent high-level language for packet processing, and early implementations of P4 for FPGA and ASIC targets. Here’s a half-hour video of that panel discussion.

 

 

 

 

For more information, you might also want to take a look at the Xilinx P4-SDNet Translator User Guide and the SDNet Packet Processor User Guide, which was recently updated.

 

 

Doulos Logo.gif 

If the new Xilinx SDSoC Design Environment, with its C/C++ software-centric design flow for Xilinx Zynq SoCs and Zynq UltraScale+ MPSoCs, has piqued your interest, you’ll want to catch the free 1-hour Webinar that Doulos is teaching on that design flow on July 7.

 

Register here.

 

 

 

Compute Acceleration: GPU or FPGA? New White Paper gives you numbers


 

Cloud computing and application acceleration for a variety of workloads including big-data analytics, machine learning, video and image processing, and genomics are big data-center topics. If you’re looking for acceleration guidance in the data center, read on. If you’re looking to accelerate compute-intensive applications such as automated driving and ADAS or local video processing and sensor fusion, this blog post is for you too. The basic problem is that CPUs are too slow and they burn too much power. You may have one or both of these challenges. If so, you may be considering a GPU or an FPGA as an accelerator in your design.

 

How to choose?

 

Although GPUs started as graphics accelerators, primarily for gamers, a few architectural tweaks and a ton of software have made them suitable as general-purpose compute accelerators. With the right software tools, it’s not too difficult to recode and recompile a program to run on a GPU instead of a CPU. With some experience, you’ll find that GPUs are not great for every application workload. Certain computations such as sparse matrix math don’t map onto GPUs well. One big issue with GPUs is power consumption. GPUs aimed at server acceleration in a data-center environment may burn hundreds of watts.

 

With FPGAs, you can build any sort of compute engine you want with excellent performance/power numbers. You can optimize an FPGA-based accelerator for one task, run that task, and then reconfigure the FPGA if needed for an entirely different application. The amount of computing power you can bring to bear on a problem is scary big. A Virtex UltraScale+ VU13P FPGA can deliver 38.3 INT8 TOPS (that’s tera operations per second) and if you can binarize the application, which is possible with some neural networks, you can hit 500TOPS. That’s why you now see big data-center operators like Baidu and Amazon putting Xilinx-based FPGA accelerator cards into their server farms. That’s also why you see Xilinx offering high-level acceleration programming tools like SDAccel to help you develop compute accelerators using Xilinx All Programmable devices.
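
Why does binarization push the numbers so high? Because a binarized multiply-accumulate collapses into an XNOR followed by a popcount, operations that map extremely densely onto FPGA fabric. Here's a minimal host-side C++ sketch of that core operation, purely for illustration; in an FPGA accelerator it's implemented in LUTs, not C++.

// Binarized dot product: weights and activations are constrained to +1/-1 and
// packed one bit per value (bit = 1 means +1, bit = 0 means -1), so 64
// multiply-accumulates collapse into one XNOR and one popcount.
#include <cstdint>
#include <bitset>

int binary_dot64(std::uint64_t a, std::uint64_t b)
{
    std::uint64_t agree = ~(a ^ b);     // XNOR: bit set where the signs match
    int matches = static_cast<int>(std::bitset<64>(agree).count());
    return 2 * matches - 64;            // matches minus mismatches
}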

 

For more information about the use of Xilinx devices in such applications including a detailed look at operational efficiency, there’s a new 17-page White Paper titled “Xilinx All Programmable Devices: A Superior Platform for Compute-Intensive Systems.”

 

 

 

 

 

There’s considerable 5G experimentation taking place as the radio standards have not yet gelled and researchers are looking to optimize every aspect. SDRs (software-defined radios) are excellent experimental tools for such research—NI’s (National Instruments’) SDR products especially so because, as the Wireless Communication Research Laboratory at Istanbul Technical University discovered:

 

“NI SDR products helped us achieve our project goals faster and with fewer complexities due to reusability, existing examples, and the mature community. We had access to documentation around the examples, ready-to-run conceptual examples, and courseware and lab materials around the grounding wireless communication topics through the NI ecosystem. We took advantage of the graphical nature of LabVIEW to combine existing blocks of algorithms more easily compared to text-based options.”

 

Researchers at the Wireless Communication Research Laboratory were experimenting with UFMC (universal filtered multicarrier) modulation, a leading modulation candidate technique for 5G communications. Although current communication standards frequently use OFDM (orthogonal frequency-division multiplexing), it is not considered to be a suitable modulation technique for 5G systems due to its tight synchronization requirements, inefficient spectral properties (such as high spectral side-lobe levels), and cyclic prefix (CP) overhead. UFMC has relatively relaxed synchronization requirements.
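
To see how UFMC differs structurally from OFDM, here's a highly simplified, illustrative C++ transmit sketch: each sub-band gets its own IDFT and its own FIR filter, and the filtered sub-band signals are summed, with no cyclic prefix. The filter, sizes, and symbol mapping below are placeholders chosen for clarity, not the research team's design, which was built in LabVIEW.

// Simplified UFMC transmitter sketch (illustrative only).
#include <complex>
#include <vector>
#include <cmath>

using cd = std::complex<double>;
static const double PI = 3.14159265358979323846;

// Naive N-point IDFT of frequency-domain symbols X.
std::vector<cd> idft(const std::vector<cd>& X)
{
    const std::size_t N = X.size();
    std::vector<cd> x(N);
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t k = 0; k < N; ++k)
            x[n] += X[k] * std::exp(cd(0.0, 2.0 * PI * k * n / N)) / double(N);
    return x;
}

// One UFMC symbol: qam holds the data symbols for numBands sub-bands of width W
// (qam.size() == numBands * W and numBands * W <= N). Each sub-band is IDFT'd,
// filtered by a length-L Hann-windowed FIR shifted to the sub-band center, and
// then all filtered sub-band signals are summed.
std::vector<cd> ufmc_symbol(const std::vector<cd>& qam,
                            std::size_t N, std::size_t W, std::size_t L)
{
    std::vector<cd> out(N + L - 1, cd(0.0, 0.0));
    const std::size_t numBands = qam.size() / W;

    for (std::size_t b = 0; b < numBands; ++b) {
        // Map this sub-band's symbols onto its subcarriers.
        std::vector<cd> X(N, cd(0.0, 0.0));
        for (std::size_t i = 0; i < W; ++i)
            X[b * W + i] = qam[b * W + i];
        const std::vector<cd> x = idft(X);

        // Prototype FIR filter frequency-shifted to the sub-band center.
        const double fc = (b * W + W / 2.0) / double(N);
        std::vector<cd> h(L);
        for (std::size_t n = 0; n < L; ++n) {
            const double w = 0.5 * (1.0 - std::cos(2.0 * PI * n / (L - 1)));
            h[n] = w * std::exp(cd(0.0, 2.0 * PI * fc * n));
        }

        // Filter (linear convolution) and accumulate into the composite signal.
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t m = 0; m < L; ++m)
                out[n + m] += x[n] * h[m];
    }
    return out;
}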

 

The research team at the Wireless Communication Research Laboratory implemented UFMC modulation using two USRP-2921 SDRs, a PXI-6683H timing module, and a PXIe-5644R VST (Vector Signal Transceiver) module from National Instruments (NI), all programmed with NI’s LabVIEW systems engineering software. Using this equipment, they achieved better spectral results than with OFDM and, by exploiting UFMC’s sub-band filtering approach, they have proposed enhanced versions of UFMC. Details are available in the NI case study titled “Using NI Software Defined Radio Solutions as a Testbed of 5G Waveform Research.” This project was a finalist in the 2017 NI Engineering Impact Awards, RF and Mobile Communications category, held last month in Austin as part of NI Week.

 

 

5G UFMC Modulation Testbed.jpg 

 

 

5G UFMC Modulation Testbed based on Equipment from National Instruments

 

 

Note: NI’s USRP-2921 SDR is based on a Xilinx Spartan-6 FPGA; the NI PXI-6683 timing module is based on a Xilinx Virtex-5 FPGA; and the PXIe-5644R VST is based on a Xilinx Virtex-6 FPGA.

 

 

 

 

 

 

Although humans once served as the final inspectors for PCBs, today’s component dimensions and manufacturing volumes mandate the use of camera-based automated optical inspection (AOI) systems. Amfax has developed a 3D AOI system—the a3Di—that uses two lasers to make millions of 3D measurements with better than 3μm accuracy. One of the company’s customers uses an a3Di system to inspect 18,000 assembled PCBs per day.

 

The a3Di control system is based on a National Instruments (NI) cRIO-9075 CompactRIO controller—with an integrated Xilinx Virtex-5 LX25 FPGA—programmed with NI’s LabVIEW systems engineering software. The controller manages all aspects of the a3Di AOI system including monitoring and control of:

 

 

  • Machine motors
  • Control switches
  • Optical position sensors
  • Inverters
  • Up and downstream SMEMA (Surface Mount Equipment Manufacturers Association) conveyor control
  • Light tower
  • Pneumatics
  • Operator manual controls for PCB width control
  • System emergency stop

 

 

The system provides height-graded images like this:

 

 

 

Amfax 3D PCB image.jpg 

 

3D Image of a3Di’s Measurement Data: Colors represent height, with Z resolution down to less than a micron. The blue section at the top indicates signs of board warp. Laser-etched component information appears on some of the ICs.

 

 

 

The a3Di system then compares this image against a stored golden reference image to detect manufacturing defects.
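
The golden-reference idea itself is conceptually simple, as this minimal OpenCV sketch shows: subtract the golden height map from the measured one and flag pixels that deviate beyond a tolerance. The file names and threshold below are placeholders; Amfax's actual a3Di algorithms are far more sophisticated and are not public.

// Minimal golden-reference comparison sketch (illustrative only).
#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    // Height-map images: brighter pixels represent taller features.
    cv::Mat golden = cv::imread("golden_board.png", cv::IMREAD_GRAYSCALE);
    cv::Mat test   = cv::imread("test_board.png",   cv::IMREAD_GRAYSCALE);
    if (golden.empty() || test.empty()) return 1;

    // Per-pixel height difference against the golden reference.
    cv::Mat diff;
    cv::absdiff(golden, test, diff);

    // Flag pixels whose height deviates beyond an allowed tolerance.
    cv::Mat defects;
    cv::threshold(diff, defects, 25, 255, cv::THRESH_BINARY);

    std::cout << "candidate defect pixels: " << cv::countNonZero(defects) << "\n";
    return 0;
}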

 

Amfax says that it has found the CompactRIO system to be dependable, reliable, and cost-effective. In addition, the company found it could get far better timing resolution with the CompactRIO system than the 1msec resolution usually provided by PLC controllers.

 

 

This project was a 2017 NI Engineering Impact Award Finalist in the Electronics and Semiconductor category last month at NI Week. It is documented in this NI case study.

 

HIL simulator based on NI equipment allows Hyundai to cut marine diesel development from 3 years to one


 

Hyundai Heavy Industries (HHI) is the world’s foremost shipbuilding company and the company’s Engine and Machinery Division (HHI-EMD) is the world’s largest marine diesel engine builder. HHI’s HiMSEN medium-sized engines are four-stroke diesels with output power ranging from 960kW to 25MW. These engines power electric generators on large ships and serve as the propulsion engine on medium and small ships. HHI-EMD is always developing newer, more fuel-efficient engines because the fuel costs for these large diesels run about $2000/hour. Better fuel efficiency will significantly reduce operating costs and emissions.

 

For that research, HHI-EMD developed monitoring and diagnostic equipment to better understand engine combustion performance and an HIL system to test new engine controller designs. The test and HIL systems are based on equipment from National Instruments (NI).

 

Engine instrumentation must be able to monitor 10-cylinder engines running at thousands of RPM while measuring crankshaft angle with 0.1-degree resolution. From that information, the engine test and monitoring system calculates in-cylinder peak pressure, mean effective pressure, and cycle-to-cycle pressure variation. All this must happen every 10μsec for each cylinder.

 

HHI-EMD elected to use an NI cRIO-9035 Controller, which incorporates a Xilinx Kintex-7 70T FPGA, to serve as the platform for developing its HiCAS test and data-acquisition system. The HiCAS system monitors all aspects of the engine under test including engine speed, in-cylinder pressure, and pressures in the intake and exhaust systems. This data helped HHI-EMD engineers analyze the engine’s overall performance and the performance of key parts using thermodynamic analysis. HiCAS provides real-time analysis of dynamic data including:

 

  • In-cylinder peak pressure
  • Indicated mean effective pressure and cycle-to-cycle variation
  • Cylinder-to-cylinder distribution
  • Cyclic moving parts fault diagnosis
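
As a rough illustration of the per-cycle math behind the first two items above, here's a minimal C++ sketch that computes peak cylinder pressure and indicated mean effective pressure (IMEP) from sampled in-cylinder pressure and the corresponding cylinder volume derived from crank angle. The data layout is a placeholder, not HHI-EMD's implementation, which runs under LabVIEW and LabVIEW FPGA.

// Per-cycle analysis sketch: peak pressure and IMEP (illustrative only).
#include <vector>
#include <algorithm>
#include <cstddef>

struct CycleResult {
    double peakPressure;  // Pa
    double imep;          // Pa (indicated mean effective pressure)
};

// pressure[i] is the in-cylinder pressure (Pa) at sample i of one engine cycle;
// volume[i] is the cylinder volume (m^3) at the same crank angle.
CycleResult analyze_cycle(const std::vector<double>& pressure,
                          const std::vector<double>& volume,
                          double displacedVolume /* m^3 */)
{
    CycleResult r{0.0, 0.0};
    r.peakPressure = *std::max_element(pressure.begin(), pressure.end());

    // Indicated work per cycle: the closed integral of p dV (trapezoidal rule).
    double work = 0.0;
    for (std::size_t i = 1; i < pressure.size(); ++i)
        work += 0.5 * (pressure[i] + pressure[i - 1]) * (volume[i] - volume[i - 1]);

    // IMEP is the indicated work per cycle divided by the displaced volume.
    r.imep = work / displacedVolume;
    return r;
}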

 

Using the collected data, the engineering team then developed a model of the diesel engine, resulting in the development of an HIL system used to exercise the engine controllers. This engine model runs in real time on an NI PXI system synchronized with the high-speed signal-sensor simulation software running on the PXI system’s multifunction FPGA-based FlexRIO module. The HIL system transmits signals to the engine controllers, simulating an operating engine and eliminating the operating costs of a large diesel engine during these tests. HHI-EMD credits the FPGAs in these systems for making the calculations run fast enough for real-time simulation. The simulated engine also permits fault testing without the risk of damaging an actual engine. Of course, all of this is programmed using NI’s LabVIEW systems engineering software and LabVIEW FPGA.

 

 

 

 

HHI-EMD HIL Simulator for Marine Diesel Engines.jpg 

 

 

HHI-EMD HIL Simulator for Marine Diesel Engines

 

 

 

According to HHI-EMD, development of the HiCAS engine-monitoring system and virtual verification based on the HIL system shortened development time from more than three years to one, significantly accelerating the time-to-market for HHI-EMD’s more eco-friendly marine diesel engines.

 

 

 

This project was a 2017 NI Engineering Impact Award Finalist in the Transportation and Heavy Equipment category last month at NI Week and won the 2017 HPE Edgeline Big Analog Data Award. It is documented in this NI case study.

 

 

 

 

 

 

 

 

About the Author

Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He has served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.