
Optimizing an OpenCL Application for Video Watermarking in FPGAs

by Xilinx Employee on 05-21-2015 11:25 AM


By Jasmina Vasiljevic, University of Toronto, and Fernando Martinez Vallina, PhD, Xilinx


(Excerpted from the latest issue of Xcell Journal)


We recently used Xilinx’s SDAccel development environment to compile and optimize a video-watermarking application written in OpenCL for an FPGA accelerator card. Video content providers use watermarking to brand and protect their content. Our goal was to design a watermarking application that would process high-definition (HD) video at 1080p resolution with a target throughput of 30fps running on an Alpha Data ADM-PCIE-7V3 card.


The SDAccel development environment enables designers to take applications captured in OpenCL and compile them to an FPGA without requiring knowledge of the underlying FPGA implementation tools. The video-watermarking application serves as a perfect way to introduce the main optimization techniques available in SDAccel.


The main function of the video-watermarking algorithm is to overlay a logo at a specific location on a video stream. The logo used for the watermark can be either active or passive. An active logo is typically a short, repeating video clip, while a passive logo is a still image. The most common technique among broadcasting companies that brand their video streams is to use a company logo as a passive watermark, so that was the aim of our example design. The application inserts the passive logo pixel by pixel, according to the following equations:



out_y[x][y] = (255-mask[x][y]) * in_y[x][y] + mask[x][y] * logo_y[x][y]


out_cr[x][y] = (255-mask[x][y]) * in_cr[x][y] + mask[x][y] * logo_cr[x][y]


out_cb[x][y] = (255-mask[x][y]) * in_cb[x][y] + mask[x][y] * logo_cb[x][y]



The input and output frames are two-dimensional arrays in which pixels are expressed in the YCbCr color space. In this color space, each pixel is represented by three components: Y is the luma component, Cb is the blue-difference chroma component, and Cr is the red-difference chroma component. Each component is an 8-bit value, for a total of 24 bits per pixel.
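Before committing the blend to hardware, it can help to check it against a small software reference model. The sketch below is a pure-Python model, not the authors’ OpenCL kernel. It assumes the mask is an 8-bit value (0 keeps the input pixel, 255 shows the logo) and that the sum is divided by 255 to bring it back into 8 bits, since the equations as written produce values up to 255 * 255 = 65,025. That normalization and all function names are assumptions for illustration.

```python
from typing import Dict, List

Plane = List[List[int]]  # one 8-bit component (Y, Cb, or Cr), indexed [row][col]


def blend_component(in_val: int, logo_val: int, mask_val: int) -> int:
    """Apply out = (255 - mask) * in + mask * logo to one 8-bit component.

    The rounded division by 255 is an ASSUMED normalization that maps the
    16-bit intermediate result back into the 0..255 range.
    """
    acc = (255 - mask_val) * in_val + mask_val * logo_val  # at most 65,025
    return (acc + 127) // 255


def watermark_plane(in_plane: Plane, logo_plane: Plane, mask: Plane) -> Plane:
    """Blend one component plane pixel by pixel using a shared mask."""
    return [
        [blend_component(i, l, m) for i, l, m in zip(in_row, logo_row, mask_row)]
        for in_row, logo_row, mask_row in zip(in_plane, logo_plane, mask)
    ]


def watermark_frame(frame: Dict[str, Plane], logo: Dict[str, Plane],
                    mask: Plane) -> Dict[str, Plane]:
    """Apply the same mask to the Y, Cb, and Cr planes of a YCbCr frame."""
    return {c: watermark_plane(frame[c], logo[c], mask) for c in ("y", "cb", "cr")}
```

Because each output pixel depends only on the matching input, logo, and mask values, there is no dependency between pixels, which is what makes this loop a good candidate for pipelining on an FPGA.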


The system on which we executed the application is shown in the figure below. It is composed of an Alpha Data ADM-PCIE-7V3 card communicating with an x86 processor over a PCIe link. In this system, the host processor retrieves the input video stream from disk and transfers it to the device global memory, which is the memory on the FPGA card that is directly accessible from the FPGA. In addition to the video frames placed in device global memory, the host transfers the logo and mask to the accelerator card, where they are placed in on-chip memory to take advantage of the low latency of block RAM (BRAM). The code that runs on the host processor is responsible for sending a video frame to the FPGA accelerator card, launching the accelerator, and then retrieving the processed frame from the card.
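A back-of-the-envelope calculation shows how much data the host must move per frame to hit the 1080p, 30fps target. This sketch assumes 24 bits per pixel (as described above) and that each frame crosses the PCIe link once in each direction; it ignores the one-time logo and mask transfer and any protocol overhead.

```python
def frame_bytes(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8


def link_bandwidth(width: int, height: int, fps: float, bits_per_pixel: int = 24) -> float:
    """Bytes per second that must cross the link in EACH direction."""
    return frame_bytes(width, height, bits_per_pixel) * fps


per_frame = frame_bytes(1920, 1080)             # 6,220,800 bytes per frame
per_direction = link_bandwidth(1920, 1080, 30)  # 186,624,000 bytes/s each way
frame_budget_ms = 1000.0 / 30                   # time available per frame
print(f"{per_frame} B/frame, {per_direction / 1e6:.1f} MB/s each way, "
      f"{frame_budget_ms:.1f} ms per frame")
```

Roughly 187MB/s in each direction is well within the bandwidth of a multi-lane PCIe link, so the raw transfer volume alone is unlikely to rule out the 30fps target; the 33ms per-frame budget must also cover kernel execution.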






System Overview for the Video Watermarking Application



The optimizations necessary when creating applications like this one using SDAccel are software optimizations. Thus, these optimizations are similar to the ones required to extract performance from other processing fabrics, such as GPUs. As a result of using SDAccel, the details of getting the PCIe link to work, drivers, IP placement and interconnect became a non-issue, allowing us as designers to focus solely on the target application.





This blog entry is an excerpt from an article in the latest issue of Xcell Journal.


Alpha Data, the maker of the FPGA-based ADM-PCIE-7V3 PCIe accelerator card discussed in this article, has joined the OpenPOWER Foundation, a group of technology organizations working collaboratively to build advanced server, networking, storage, and acceleration technology as well as industry-leading open-source software aimed at delivering more choice, control, and flexibility to developers of next-generation hyperscale and cloud data centers.




Peel me a grape—and then watch the da Vinci surgical robot suture the grape back together

by Xilinx Employee on 05-21-2015 10:20 AM

Intuitive Surgical has worked with Xilinx since 2003 to speed the delivery of increasingly advanced da Vinci robotic surgical system capabilities to operating rooms. Multiple generations of the da Vinci system have relied heavily on Xilinx devices, going all the way back to the Virtex-2 Pro FPGA, to deliver the speed and flexibility required by a finely controlled robotic surgical system. (Note: The latest 16nm Virtex UltraScale+ All Programmable devices are now many, many generations ahead of those 0.13μm Virtex-2 Pro days.)


According to David Powell, Principal Design Engineer for Intuitive Surgical, “As we started using the Xilinx device, we discovered it to be quite a nice design platform—so nice, in fact, that follow-on platforms have evolved to employ dozens of Xilinx FPGAs in all of the main system components. Our first board to employ a Xilinx FPGA was up and running in two hours. After that, we found we could get a board up and running in just minutes—these kind of results are almost unheard of.”


If you want to know more about how Intuitive Surgical is using Xilinx All Programmable devices to speed product development, improve performance, and reduce system costs (in horse racing, that’s called a “trifecta”), then take a look at “Medical Robotics Improve Patient Outcomes and Satisfaction.”


I’m more of a seeing-is-believing kind of guy, so here’s a 2-minute video showing a da Vinci surgical robot suturing the skin of a grape (which someone has obviously peeled), and there’s a surprise ending you won’t want to miss:







From the article about Intuitive Surgical in Xcell Journal, Issue 77:


Powell pointed to a close partnership with Xilinx’s technical staff, sales force and executives as another key to success. “We know Xilinx devices backwards and forwards now, and this really helps us make a difference in many lives,” he said. “It always comes back to the patients. We hear from people every day who tell us how a new procedure changed or saved their life. That’s what motivates us to deliver the best technology.”


As the infamous saying goes, you can’t be too rich or too thin. Although that may or may not be true, in the world of network switching you truly can’t have too many Ethernet ports. Time was, the number of Gigabit Ethernet ports you could have on one FPGA was limited to the number of SerDes ports on the device. That’s no longer true. With the advent of Xilinx UltraScale All Programmable devices, you can now use low-power LVDS SelectIO pins (in addition to SerDes transceivers) for 1000Base-X Ethernet ports. If you find that assertion tough to swallow, here’s a 5-minute video, complete with a technical explanation, eye diagrams, and J-BERT jitter histograms, to convince you that the UltraScale LVDS SelectIO pins deliver far better low-jitter I/O performance than Gigabit Ethernet needs, even with nearly all of the FPGA’s on-chip logic resources toggling:





Of course, you may need more than I/O pins to complete an Ethernet port: you may also need Gigabit MAC and SGMII IP. You’ll find the Gigabit MAC listed on the Xilinx Web site as the Xilinx Tri-Mode Ethernet Media Access Controller (TEMAC) and the SGMII IP core as the Ethernet 1G/2.5G BASE-X PCS/PMA or SGMII LogiCORE. For some switching and multiplexing applications, you won’t even need the MAC.


How many ports can you get on one FPGA using LVDS SelectIO pins to implement Gigabit Ethernet on Xilinx UltraScale devices? Well, of course, that depends on the size of the device. I’m told that it’s certainly possible to fit 40 Gigabit Ethernet ports on one Kintex UltraScale KU040 device. That’s the second-smallest Kintex UltraScale FPGA, the one that entered full production last December (see “First Kintex UltraScale FPGA enters full production. Two dev boards now available to help you design advanced new systems”). My quick calculations show that 40 ports’ worth of Gigabit Ethernet won’t come close to filling the device, even with the MACs. UltraScale devices really do alter reality when it comes to system-design assumptions.


So, would 40 fully configurable, low-jitter Gigabit Ethernet ports on one chip help you with your next design?




Note: The maximum number of differential HP I/O pairs on an UltraScale KU040 FPGA is 192.
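The 192-pair figure in the note makes a quick pin budget possible. This sketch assumes each 1000Base-X/SGMII port uses one differential pair to transmit and one to receive, and it ignores clocking, bank, and placement constraints, which will lower the usable count in a real design. It also shows the 1.25Gbaud line rate that 8b/10b encoding implies for 1Gbps of payload.

```python
PAIRS_PER_PORT = 2          # assumption: one TX pair + one RX pair per port
KU040_MAX_HP_PAIRS = 192    # maximum differential HP I/O pairs, per the note


def pairs_needed(ports: int) -> int:
    """Differential pairs consumed by a given number of ports."""
    return ports * PAIRS_PER_PORT


def max_ports(available_pairs: int) -> int:
    """Upper bound on port count from the pair budget alone."""
    return available_pairs // PAIRS_PER_PORT


def line_rate_gbaud(payload_gbps: float, encoded_bits: int = 10, data_bits: int = 8) -> float:
    """Serial line rate after block encoding (8b/10b for 1000Base-X)."""
    return payload_gbps * encoded_bits / data_bits


needed = pairs_needed(40)
print(needed, KU040_MAX_HP_PAIRS - needed, max_ports(KU040_MAX_HP_PAIRS), line_rate_gbaud(1.0))
```

By this crude count, 40 ports use 80 of the 192 pairs, which is consistent with the claim that 40 ports leave plenty of headroom.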




By Devadas Varma and Tom Feist, Xilinx


(Excerpted from the latest issue of Xcell Journal)



Mission-critical enterprise servers often use specialized hardware for application acceleration, including graphics processing units (GPUs) and digital signal processors (DSPs). Xilinx’s new SDAccel development environment removes programming as a barrier to using FPGAs in this role by providing developers with a familiar CPU/GPU-like environment.


Spartan-6 FPGA Dev Board plays Doom, a first-person shooter video game from 1993

by Xilinx Employee on 05-20-2015 11:26 AM

Yesterday, I mentioned a port of Professor Niklaus Wirth’s Oberon programming language and operating system to Saanlima’s Pipistrello, a Spartan-6 FPGA development board. (See “Oberon System Implemented on a Low-Cost FPGA Board.”) There are many uses for a dev board, however, and another port to this board is the first-person shooter “Doom,” originally developed by id Software to run on DOS PCs.


This FPGA-based implementation is a recompilation of a source port called “Chocolate Doom” that runs on a 100MHz Xilinx MicroBlaze soft processor instantiated in the Pipistrello board’s Spartan-6 LX45 FPGA, with additional HDL to handle the mouse, game controllers, video, and sound. Running a game originally written for x86 PCs on an FPGA-based soft-core processor might not be the most obvious way to fully employ a Spartan-6 FPGA, but I think it’s a pretty cool demo nevertheless.


Here’s a short, 26-second video showing Doom running in demo mode on the Spartan-6 LX45 FPGA:





Note: There are no sound effects in this short video because we didn’t hook up any speakers to the Pipistrello board.



Convolutional Neural Networks (CNNs) and deep learning are revolutionizing all sorts of recognition applications, from image and speech recognition to big-data mining. Baidu’s Dr. Ren Wu, a GPU application pioneer, gave a keynote at last week’s Embedded Vision Summit 2015 announcing that Baidu’s GPU-based deep-learning CNN had achieved world-leading accuracy on the ImageNet Large Scale Visual Recognition Challenge data set. (See “Baidu Leads in Artificial Intelligence Benchmark” and Baidu’s paper.) GPUs are currently the implementation technology of choice for CNN researchers because of their familiar programming model, but GPUs have prohibitive power consumption. Also at the Embedded Vision Summit, Auviz Systems founder and CEO Nagesh Gupta presented results of related work on image-processing CNNs. Auviz Systems has been developing FPGA-based middleware IP for data centers that cuts application power consumption.


Oberon System Implemented on a Low-Cost FPGA Board

by Xilinx Employee on 05-19-2015 03:40 PM


By Niklaus Wirth, Professor (retired), Swiss Federal Institute of Technology (ETH)


(Excerpted from the latest issue of Xcell Journal)


In 1988, Jürg Gutknecht and I completed and published the programming language Oberon as the successor to two other languages, Pascal and Modula-2, which I had developed earlier in my career. We originally designed the Oberon language to be more streamlined and efficient than Modula-2 so that it could better help academics teach system programming to computer science students. To advance this endeavor, in 1990 we developed the Oberon operating system (OS) as a modern implementation for workstations that use windows and have word-processing abilities. We then published a book that details both the Oberon compiler and the operating system of the same name. The book, entitled Project Oberon, includes thorough instructions and source code.


The ExaLINK Fusion Ultra Low Latency Switch and Application Platform from Exablaze delivers 110nsec Layer 2 switching, 100nsec “Layer 1.5” forwarding and aggregation, and 5nsec Layer 1 patching and tapping to as many as 48 front-panel 10G SFP+ or twelve 40G QSFP+ optical Ethernet ports by relying heavily on the speed and flexibility of 20nm Xilinx UltraScale FPGAs. What do you do with that kind of capability? Among other things, you can:


  • Patch together two ports
  • Fan out a port with multicast data
  • Connect multiple servers to a single uplink
  • Switch packets at Layer 2 with only 110nsec of latency





Here’s a short video overview of the product:








Each year, Frost & Sullivan presents an award for new product innovation to the company that has developed an innovative element in a product by leveraging leading-edge technologies. This week, Frost & Sullivan recognized Typhoon HIL with the 2015 Global Frost & Sullivan Award for New Product Innovation. “Typhoon HIL provides tools for automated testing of control systems, software regression testing, pre-certification and continuous test and integration processes, which translate into a fully integrated HIL solution,” said Frost & Sullivan Research Analyst Viswam Sathiyanarayanan. “In addition, the company uses its own highly optimized and fully integrated software and hardware platform, thus redefining the ease of use and greatly reducing deployment effort.”


The Typhoon HIL 4 and HIL 6 Series are real-time emulators for hardware-in-the-loop (HIL) testing of power electronics in applications such as solar inverters, battery storage, wind turbines and motor drives. Prior to the development of the Typhoon HIL Series emulators, HIL systems could only manage millisecond-to-second simulation time steps, not nearly fast enough for power electronics. Typhoon HIL emulators were the first to achieve 1μsec time steps.
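A toy fixed-step model shows why the simulation time step matters for power electronics. The sketch below integrates a PWM-driven voltage source feeding an RL load with forward Euler; it is an illustration only, not Typhoon HIL’s solver, and the 20kHz switching frequency, 100V bus, 1Ω/1mH load, and function names are all assumptions. At a 1μsec step, each 50μsec switching period spans 50 solver steps; at a 1msec step, a single step would cover 20 whole switching periods.

```python
def steps_per_period(f_sw_hz: float, dt_s: float) -> float:
    """Number of fixed solver steps that fit into one switching period."""
    return (1.0 / f_sw_hz) / dt_s


def simulate_rl_pwm(v_bus: float = 100.0, duty: float = 0.5, f_sw: float = 20e3,
                    r_ohm: float = 1.0, l_henry: float = 1e-3, dt: float = 1e-6,
                    t_end: float = 20e-3) -> list:
    """Forward-Euler integration of L*di/dt = v - R*i with a PWM source v.

    Returns one inductor-current sample per time step.
    """
    period_steps = int(round(1.0 / (f_sw * dt)))   # 50 steps at 20kHz and 1us
    on_steps = int(round(duty * period_steps))     # steps with the switch on
    n_steps = int(round(t_end / dt))
    i = 0.0
    samples = []
    for n in range(n_steps):
        v = v_bus if (n % period_steps) < on_steps else 0.0
        i += dt * (v - r_ohm * i) / l_henry        # explicit Euler update
        samples.append(i)
    return samples


current = simulate_rl_pwm()
last_period = current[-50:]
print(f"mean {sum(last_period) / 50:.2f} A, ripple {max(last_period) - min(last_period):.2f} A")
```

With these values the current settles near duty * v_bus / r_ohm = 50A with roughly 1.25A of switching ripple. Setting dt to 1e-3 makes period_steps round to zero, so the model cannot represent the switching at all.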


Here’s a great quote from the Typhoon HIL402 Web page:



Flipped classroom






“Imagine the power engineering instruction in which the technology enables students to develop their engineering intuition by playing with megawatts as if they were avatars in their computer games… The HIL402 kit makes large scale unsupervised hands on instruction in the field of power engineering and Smart Grid possible.”



When you need hardware in the loop with the consistent, high-speed response times required by accurate emulation, software running on a microprocessor just can’t meet the need. The Typhoon HIL402 emulator is based on the Xilinx Zynq SoC and the Typhoon HIL400 emulator is based on Virtex-5 FPGAs. The Typhoon HIL600 and HIL602 emulators are based on Xilinx Virtex-6 FPGAs.


The typical Embedded Vision system must process video frames, extract features from those processed frames, and then make decisions based on the extracted features. Pixel-level tasks can require hundreds of operations per pixel, which adds up to hundreds of GOPS (giga operations/sec) for HD or 4K2K video. Frame-based tasks, by contrast, “only” require millions of operations per second, but their algorithms are more complex. You need a hardware implementation for the pixel-level tasks, while fast processors can handle the more complex frame-based tasks. With this explanation, Mario Bergeron, a Technical Marketing Engineer at Avnet, launched into his presentation at last week’s Embedded Vision Summit 2015 in Santa Clara, California.
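The “hundreds of GOPS” figure follows from simple multiplication. This sketch assumes a 60fps frame rate and 500 operations per pixel, values picked only to sit in the “hundreds of operations per pixel” range the text mentions.

```python
def pixel_rate(width: int, height: int, fps: float) -> float:
    """Pixels per second that a pixel-level pipeline must sustain."""
    return width * height * fps


def required_gops(width: int, height: int, fps: float, ops_per_pixel: int) -> float:
    """Giga-operations per second for a given per-pixel workload."""
    return pixel_rate(width, height, fps) * ops_per_pixel / 1e9


for name, w, h in (("1080p", 1920, 1080), ("4K2K", 3840, 2160)):
    print(f"{name} @ 60fps, 500 ops/pixel: {required_gops(w, h, 60, 500):.0f} GOPS")
```

Under these assumptions, 4K2K video needs roughly 249 GOPS, several orders of magnitude more than the millions of operations per second cited for frame-based tasks.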







Gaze tracking makes jump from assistive technology niche to mainstream with the help of a Zynq SoC

by Xilinx Employee on 05-18-2015 02:59 PM

Applications that help people with disabilities are a special pleasure to blog about, and this blog’s all about using technology to help people overcome tremendous challenges. The technology is gaze tracking, as embodied in the EyeTech Digital Systems’ AEye eye tracker. This technology performs a seemingly simple task: figure out where someone is looking. The measuring techniques have been known since 1901. Implementation? Well, that’s taken more than 100 years of development, and EyeTech has been at the forefront of this work for almost two decades. Here’s the current gaze-tracking process flow used by EyeTech:



EyeTech Gaze-Tracking Process Flow




Originally, EyeTech used commercial analog video cameras and PCs to create a “Windows mouse” that could be controlled with nothing more than eye positioning. EyeTech’s eye-tracking technology determines gaze direction from pupil position and 850nm IR light reflections from the human cornea. The technology’s original market was disabled users who needed assistive technology to interact more fully with the world at large. These disabilities are caused by numerous factors including ALS, cerebral palsy, muscular dystrophy, spinal cord injuries, traumatic brain injuries, and stroke. Eye-tracking technology makes a large, qualitative difference in the lives of people affected by these challenges. There’s an entire page full of video testimonials to the transformative power of this technology on the EyeTech Web site.
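The pupil-plus-corneal-reflection idea can be illustrated with a simplified sketch: find the centroid of the dark pupil and of the bright IR glint in a grayscale eye image, then take the vector between them, which shifts as the eye rotates. This is a generic textbook version of the technique, not EyeTech’s algorithm; the thresholds, the linear calibration model, and all function names are hypothetical, and practical trackers use sub-pixel fitting and richer calibration.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]  # 8-bit grayscale, indexed [row][col]
Vec = Tuple[float, float]


def centroid(img: Image, predicate: Callable[[int], bool]) -> Optional[Vec]:
    """Mean (x, y) position of all pixels whose value satisfies `predicate`."""
    sx = sy = n = 0
    for y, row in enumerate(img):
        for x, p in enumerate(row):
            if predicate(p):
                sx += x
                sy += y
                n += 1
    return (sx / n, sy / n) if n else None


def pupil_glint_vector(img: Image, dark_thresh: int = 40,
                       bright_thresh: int = 220) -> Optional[Vec]:
    """Vector from the corneal glint to the pupil center (hypothetical thresholds)."""
    pupil = centroid(img, lambda p: p < dark_thresh)
    glint = centroid(img, lambda p: p > bright_thresh)
    if pupil is None or glint is None:
        return None  # blink or tracking loss
    return (pupil[0] - glint[0], pupil[1] - glint[1])


def to_screen(vec: Vec, cal_x: Tuple[float, float, float],
              cal_y: Tuple[float, float, float]) -> Vec:
    """Map the vector to screen coordinates with an assumed linear calibration.

    cal_x and cal_y are (offset, gain_dx, gain_dy) triples that would come from
    a per-user calibration in which the user looks at known screen targets.
    """
    dx, dy = vec
    return (cal_x[0] + cal_x[1] * dx + cal_x[2] * dy,
            cal_y[0] + cal_y[1] * dx + cal_y[2] * dy)
```

Both centroid searches touch every pixel of every frame, which makes them the kind of pixel-level work that suits programmable logic.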


However important the assistive-technology market is, it’s relatively small, and Robert Chappell, EyeTech’s founder, realized that the technology could have far more utility for a much larger user base if he could reduce the implementation costs, size, and power consumption. Here were Chappell’s goals:


  • Stand-alone operation (no PC needed)
  • “Compact” size
  • Low power (< 5W)
  • Low cost (< $200)
  • Superior eye-tracking capability
  • Multi-OS support
  • Field upgradeable
  • Reasonable development time and costs


These are not huge hurdles for typical embedded systems, but when your algorithms require a PC to handle the processing load, these goals present significant design challenges for an embedded version. No doubt, Chappell and his team would have used a microcontroller if they could have found a suitable device with sufficient processing horsepower. But with the existing code running on PC-class x86 processors, shrinking the task into one device was not easy.


Chappell learned about the Xilinx Zynq SoC at exactly the right time, and it seemed like exactly the right type of device for his project. The Zynq SoC’s on-chip, dual-core ARM Cortex-A9 MPCore processors could run the existing PC-based code after a recompilation and deliver an operable system. Then, Chappell’s team could gradually move the most performance-hungry tasks to the Zynq SoC’s on-chip PL (programmable logic) to accelerate sections of the code. Porting the code took two years, and the team size varied from two to four engineers working part time on the project.


Ultimately, the project resulted in a product that can track a gaze at frame rates ranging from 40 to 200+ frames/sec. Many gaze-tracking applications can use the slower frame rates, but certain applications, such as testing for brain injuries, require the faster frame rate for an accurate result.


Here’s a photo of the resulting AEye pc board:



EyeTech AEye Module with Zynq Z-7020 SoC



This is a fairly small board! A Zynq Z-7020 SoC measures 17mm on a side and the board is only slightly taller than the Zynq SoC package. Note the US dime shown on the right of the above image for a size comparison. Here’s a hardware block diagram of the AEye board:



EyeTech AEye Module Block Diagram




And here’s how EyeTech has apportioned the tasks between the Zynq SoC’s PS (processor system) and PL:



EyeTech Gaze-Tracking Task Allocation



Chappell notes that the availability of a high-performance PS and PL in the Zynq SoC made for an ideal rapid-development environment because the boundary between the hardware and software is not rigid. The ability to move tasks from the PS to the PL is what permitted the design team to achieve better than 200 fps frame rates.


How mainstream could this gaze-tracking technology get? How about one such eye tracker per car to help fight driver fatigue; chemically induced inattention; and distraction from cell phones, tablets, and the like? If proven effective, insurance companies may soon be lobbying for this feature to be made mandatory in new cars. Science fiction? Just watch this video from Channel 3 TV news in Mesa, AZ.



(Note: This blog is a summary of a presentation made by Robert Chappell and Dan Isaacs of Xilinx at last week’s Embedded Vision Summit 2015, held in Santa Clara, CA.)


Synopsys has announced new DesignWare Hybrid IP Prototyping Kits that integrate a Virtualizer Development Kit (VDK) for the ARMv8 Base Platform and a DesignWare IP Prototyping Kit. The Hybrid IP Prototyping Kits provide software engineers with a ready-to-use solution for accelerating software development, validation, code porting, software debug, and optimization for ARM Cortex-A57 and Cortex-A53 processor cores in big.LITTLE configurations. The prototyping platform is based on the Synopsys HAPS-DX prototyping system, which incorporates a Xilinx Virtex-7 690T FPGA.


That’s a lot to take in, so here’s a short Synopsys video that explains what this product can do for you:







Phenomenal Cosmic Prototyping Power, Itty Bitty Package: The new S2C Single VU440 Prodigy Logic Module

by Xilinx Employee on 05-18-2015 11:10 AM

What do you get when you put one 20nm Xilinx Virtex UltraScale VU440 All Programmable device on a prototyping board?


  • 44M ASIC gates (4.433M logic cells)
  • 88.6Mbits of internal memory
  • 2880 DSP slices
  • An on-board DDR4 SODIMM socket (max capacity 8Gbytes)
  • 1,152 I/O pins
  • 44 16Gbps GTH serial transceivers



These are the specs for S2C’s new Single VU440 Prodigy Logic Module and here’s a photo:



S2C Single VU440 Prodigy Logic Module



On the remote chance that these resources are not sufficient for your prototyping needs, you can configure 16 of these bad boys in an S2C Cloud Cube to create a multi-user prototyping environment of ginormous proportions.






By Adam Taylor


Following on from last week’s blog, we’re heading towards programming an OLED display for the ZedBoard (not the MicroZed). Before we jump too far into programming the OLED, however, I thought it would be a good idea to ensure that we have correctly configured the SPI port for the application. This could save us lots of time later on, and it’s very simple to do. In fact, it is so simple that I am going to demonstrate two different methods in this single blog post. The first method routes the SPI pins out via the Zynq SoC’s MIO, while the second routes them out via the EMIO. What’s the difference? Keep reading.


The best way to demo SoC IP for video? Cadence says “Xilinx”

by Xilinx Employee on 05-15-2015 04:06 PM

What does Cadence do when it wants to demo its IVP image/video processor and MIPI IP? It builds a Xilinx-based FPGA emulation platform, of course. (It takes way too long and costs far too much to build an SoC for a demo.) Pulin Desai of Cadence was at this week’s Embedded Vision Summit 2015 in Santa Clara, California, and I captured this quick video of the Cadence IVP in action, performing real-time face detection and sending the resulting annotated video stream to an LCD using the company’s MIPI IP.







The Cadence demo platform is based on Xilinx All Programmable silicon including two Artix-7 FPGAs and a third device hidden under a heat sink and fan. The Cadence MIPI IP is physical IP, so it is implemented as a small custom IC on a small red daughtercard for this demo.


Note: You can implement MIPI interfaces directly with Xilinx devices and nothing more than a few resistors; see “Swipe these Low Cost FPGA-based MIPI DSI and CSI-2 Interfaces for Video Displays and Cameras.”



VectorBlox Matrix Processor IP for FPGAs accelerates image, video, other types of processing

by Xilinx Employee on 05-15-2015 03:04 PM

You know what it’s like when you connect with a professor who’s really, really good at explaining things? That’s how I felt talking to Guy Lemieux, who is both the CEO of VectorBlox, an embedded supercomputing IP vendor, and a Professor of Electrical and Computer Engineering at the University of British Columbia. We met at this week’s Embedded Vision Summit in Santa Clara, California, where I got a fast education in matrix processor IP in general and a real-time demo of the VectorBlox MXP Matrix Processor IP core. (See the video below.)


The VectorBlox MXP is a scalable, soft matrix coprocessor that you can drop into an FPGA to accelerate image, vision, and other tasks that require vector or matrix processing. If your system has a lot of such processing to handle and needs real-time performance, you’ve got three design paths to choose from:


  1. HDL design using Verilog or VHDL
  2. High-level synthesis using a C-to-gates compiler like Xilinx Vivado HLS
  3. Use a vector co-processor to boost performance


Path number 1 is the traditional path taken by hardware designers since HDLs became popular at the end of the 1980s. In the early 1980s, nascent HDL compilers weren’t that great at generating hardware, commonly described as having poor QoR (Quality of Results). Many designers back then either felt or said outright, “I'll give you my schematics when you pry them from my cold, dead hands.”


You don’t see many systems being designed with schematics these days. Systems have gotten far too complicated and HDLs represent a far more suitable level of abstraction. Tools change with the times.


First-generation high-level synthesis tools, as embodied in Synopsys’ Behavioral Compiler, met with similar resistance and they didn’t go very far. However, design path number two has become viable as HLS compilers have improved. You can now find a growing number of testimonials to the effectiveness of such tools, like this one from NAB 2014.


Design path number three has the compelling allure of a software-based, quick-iteration design approach. Software compilation remains faster than HDL-based hardware compilation followed by placement and routing, but this path depends on a processor with appropriate matrix acceleration, which is not really the purview of the usual RISC suspects.


Matrix processing is exactly what the VectorBlox MXP is designed to do.


This week at the Embedded Vision Summit, TeraDeep demonstrated real-time video classification from streaming video using its deep-learning neural network IP running on a Xilinx Kintex-7 FPGA, the same FPGA fabric you find in a Zynq Z-7045 SoC. Image-search queries run in data center servers usually use CPUs and GPUs, which consume a lot of power. Running the same algorithms on a properly configured FPGA can reduce the power consumption by 3x-5x according to Vinayak Gokhale, a hardware engineer at TeraDeep, who was running the following demo in the Xilinx booth at the event:






Note that this demo can classify the images using as many as 40 categories simultaneously without degrading the real-time performance.

Among the many demos at this week’s Embedded Vision Summit held at the Santa Clara Convention Center was a demonstration of a Zynq-based development workflow using MathWorks’ Simulink and HDL Coder to create a fully operational, real-time pedestrian detector based on the HOG (Histogram of Oriented Gradients) algorithm. The model for this application was developed entirely in MathWorks’ Simulink and the company’s HDL Coder generated the HDL code for implementing the HOG algorithm’s SVM (support vector machine) classifier in the programmable logic section of a Xilinx Zynq SoC. The Xilinx Vivado Design Suite converted the HDL into a hardware implementation for the Zynq SoC.


This design takes real-time HD video, processes the video in the Zynq SoC’s programmable-logic implementation of the SVM classifier, and passes the results back to the Zynq SoC’s dual-core ARM Cortex-A9 MPCore processor, which annotates the video stream and then outputs the result.
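For readers new to HOG, the sketch below shows the two ingredients in simplified form: a 9-bin, magnitude-weighted histogram of gradient orientations for one cell, and the linear SVM decision function (a dot product with a learned weight vector plus a bias) that scores the concatenated features. It is a textbook simplification, not MathWorks’ Simulink model; bin interpolation, block normalization, and trained weights are omitted, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cell_histogram(cell: List[List[float]], bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 deg).

    Uses centered [-1, 0, 1] differences on interior pixels and hard bin
    assignment; full HOG adds bin interpolation and block normalization.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


def svm_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Linear SVM decision value; a positive score would mean "pedestrian"."""
    return sum(f * w for f, w in zip(features, weights)) + bias
```

The classifier is a long multiply-accumulate over the feature vector for every detection window, which is why it maps naturally onto the DSP slices in the Zynq SoC’s programmable logic.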


Here’s a video of the demo, presented by MathWorks’ Principal Development Engineer Steve Kuznicki at the Embedded Vision Summit:







The Xilinx Zynq SoC is the heavy lifter of the embedded world. Yes, there’s a 1GHz, dual-core, ARM Cortex-A9 MPCore processor on the device for the more conventional, software-driven embedded tasks including operating systems and application programs. But there’s also a nicely sized block of programmable logic fabric including embedded DSP cores and memory when you absolutely, positively need hardware-driven response times. The problem, of course, is learning how to use something like a Zynq SoC when you’ve never used one for a design.


Here’s an answer: a free, new 110-page User Guide, UG1165, titled “Zynq-7000 All Programmable SoC: Embedded Design Tutorial, A Hands-On Guide to Effective Embedded System Design.” The User Guide shows you how to set up and manage a project using the Xilinx Vivado Design Suite; how to get a “Hello World” application up and running; how to write, profile, and debug an application for the Zynq SoC; how to create custom IP using the programmable logic on the Zynq SoC; and how to write Linux device drivers for that IP.





This week in Japan, Renesas will be demonstrating a Deterministic Deep Database Search Engine based on one of the company’s S-series Network Search Engine ICs and the Xilinx 200Gbps Programmable Packet Processor, built into a Xilinx Virtex-7 FPGA and generated by the Xilinx SDNet Development Environment. Using the Xilinx Vivado Design Suite, Renesas engineers developed a controller for the Network Search Engine IC as an IP block that’s also instantiated in the Virtex-7 FPGA. Here’s a block diagram of the system:



Renesas Deterministic Deep Database Search Engine



Renesas is using its R8A20686BG-G 80Mbit Dual-Port Interlaken-LA TCAM to store search data. The device is designed for large table searches; is capable of performing 2 billion searches/sec; supports 80-, 160-, 320- and 640-bit search keys; and connects to a packet processor using 12-lane 10.3125/12.5Gbps Interlaken serial ports.


In this demo, the Renesas TCAM is connected to a Xilinx Eval Board carrying a Xilinx Virtex-7 FPGA. A custom Programmable Packet Processor created by the Xilinx SDNet development environment generates and feeds search keys to the Renesas device, which performs the searches in real time and passes search results back to the Programmable Packet Processor. A MicroBlaze RISC processor instantiated in the Virtex-7 FPGA handles table maintenance in the Renesas TCAM.
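A TCAM stores each entry as a value plus a care mask, so every bit can be 0, 1, or “don’t care,” and a search returns the highest-priority (lowest-index) matching entry. The sketch below is a software reference model of that lookup; the real device compares a key against every entry in parallel, which is how it sustains billions of searches per second, whereas this loop is sequential. The table layout and function names are hypothetical and are not the Renesas device’s interface.

```python
from typing import List, Optional, Tuple

# Each entry is (value, care_mask): key bits where care_mask is 1 must equal
# the corresponding value bits; bits where care_mask is 0 are "don't care".
Entry = Tuple[int, int]


def tcam_search(table: List[Entry], key: int) -> Optional[int]:
    """Return the index of the first (highest-priority) matching entry."""
    for index, (value, care) in enumerate(table):
        if ((key ^ value) & care) == 0:
            return index
    return None


def prefix_entry(prefix: int, prefix_len: int, key_bits: int) -> Entry:
    """Build an entry that matches the top `prefix_len` bits of a `key_bits`-bit key."""
    care = ((1 << prefix_len) - 1) << (key_bits - prefix_len)
    return (prefix << (key_bits - prefix_len), care)


table = [prefix_entry(0x0A01, 16, 32), prefix_entry(0x0A, 8, 32), (0, 0)]
print(tcam_search(table, 0x0A010203), tcam_search(table, 0x0A050000),
      tcam_search(table, 0xC0A80001))
```

Placing longer prefixes ahead of shorter ones turns first-match priority into longest-prefix matching, a common use of TCAMs in packet processing.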


Here’s a photo of the working demo system:



Renesas Network Search Engine Demo System




The board on the left is the Xilinx Virtex-7 Eval Board and the board on the right has the Renesas S-series Network Search Engine IC. The boards are linked through a high-speed CFP cable, appearing at the bottom of the photo. The rainbow ribbon cable between the boards carries a low-speed housekeeping connection employed by the MicroBlaze processor instantiated in the Virtex-7 FPGA.



The Xilinx Zynq SoC’s PS (Processor System) incorporates two USB 2.0 On-The-Go (OTG) controllers, each of which can act as a USB host or a USB device. They can also dynamically switch roles between host and device. The USB protocol defines four primary transfer types:


  • Control Transfer
  • Bulk Transfer
  • Isochronous Transfer
  • Interrupt Transfer
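A device tells the host which of these transfer types each endpoint uses through the endpoint descriptor defined in the USB 2.0 specification: bits 1..0 of `bmAttributes` select Control, Isochronous, Bulk, or Interrupt, and bit 7 of `bEndpointAddress` gives the direction. The helper functions below are a minimal, hypothetical decoder written for illustration; they are not part of the Zynq USB drivers or the Tech Tip.

```python
from enum import IntEnum


class TransferType(IntEnum):
    """Encoding of bits 1..0 of an endpoint descriptor's bmAttributes field."""
    CONTROL = 0b00
    ISOCHRONOUS = 0b01
    BULK = 0b10
    INTERRUPT = 0b11


def transfer_type(bm_attributes: int) -> TransferType:
    # Only the two low-order bits select the transfer type; higher bits carry
    # extra information (e.g. isochronous synchronization and usage type).
    return TransferType(bm_attributes & 0b11)


def is_in_endpoint(b_endpoint_address: int) -> bool:
    # Bit 7 set means IN (device-to-host); clear means OUT (host-to-device).
    return bool(b_endpoint_address & 0x80)


# Usage: a bulk IN endpoint, the kind a serial-port data pipe might use.
print(transfer_type(0x02).name, is_in_endpoint(0x81))  # BULK True
```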


The Zynq SoC’s OTG controllers support all four transfer types. A new Tech Tip posted in the Xilinx Wiki explains how to enable the USB configuration options and gives step-by-step procedures for using the Zynq SoC’s USB OTG controllers in device mode, using the bulk transfer type for a serial-communication device abstraction.



Among the many buried gems in the Xilinx Wiki is a new Tech Tip that shows you how to use the Zynq SoC as a mass-storage device, connected to a host over a USB 2.0 connection. Here’s a simple block diagram of the design:



[Image: Zynq SoC mass-storage Tech Tip block diagram]




Click here to access the Tech Tip and associated files.

The Roads Must Roll: Zynq SoC will be used to build Intelligent Transport System in Singapore

by Xilinx Employee 05-11-2015 11:29 AM - edited 05-11-2015 11:35 AM

The Land Transport Authority (LTA) of Singapore is responsible for planning, operating, and maintaining Singapore’s land transport infrastructure and systems. Singapore has a population of nearly 5.5 million people living on 718.3 km² spread across 63 islands, so running a smooth transport system is not a simple task. To help manage this infrastructure, Singapore has implemented a sophisticated Intelligent Transport System along with a number of other transport initiatives, including free public transportation before the morning peak hours, a vehicle quota system, a congestion charge, and an extensive public transport system.



[Image: Singapore roads]


Singapore’s Land Transport Authority plans, operates, and maintains the nation’s road infrastructure



The city has pioneered a variety of technologies for its land transport system, including one of the world’s first Electronic Road Pricing (ERP) systems, with tolls that vary according to traffic flow. The ERP system uses short-range radio communications to deduct charges from smart cards inserted in vehicles. Other intelligent elements include an Expressway Monitoring and Advisory System, which alerts motorists to traffic accidents on major roads, and GPS systems installed in the city’s taxis, which monitor and report on traffic conditions around the city. Information from these systems feeds into the Intelligent Transport System’s Operations Control Centre, which consolidates the data and provides real-time traffic information to the public.


Singapore’s LTA has selected the automotive-qualified version of the Xilinx Zynq SoC for its Intelligent Transport System project to help keep Singapore's 164km of roads and tunnel systems safe while maximizing road network efficiency and capacity, and monitoring and managing traffic flow. The program was awarded to Xilinx on March 20 at a signing ceremony at LTA headquarters in Singapore.



Note: This blog’s headline is an homage to Robert Heinlein’s short story “The Roads Must Roll,” written in 1940. The story’s conclusion: the price of high-tech transportation is eternal vigilance. The automotive-grade Zynq SoC is a very good choice for such vigilance.





Adam Taylor’s MicroZed(ish) Chronicles Part 81: Simple Communication Interfaces

by Xilinx Employee 05-11-2015 10:15 AM - edited 05-11-2015 10:15 AM


By Adam Taylor


So far in this journey, we have looked at getting data on and off the MicroZed board using Ethernet. We have not yet looked at communicating with on-board peripherals such as real-time clocks, non-volatile memories, and unique sensors. These communications often employ either I2C or SPI. Here’s a look at both:


  • SPI (Serial Peripheral Interface) is a serial, 4-wire, full-duplex interface originally developed by Motorola Semiconductor. It subsequently became a de facto standard and is commonly used for intra-module communication, i.e. transferring data between peripherals and the processor or FPGA within the same module. SPI is often used for semiconductor memories, ADCs, CODECs, and MMC and SD memory cards. The SPI system architecture consists of a single master or multiple masters and one or more slaves.


  • The I2C (Inter-Integrated Circuit, usually pronounced “I-squared-C”) interface is a multi-master, 2-wire, serial bus developed by Philips in the early 1980s with a purpose similar to Motorola’s SPI. Only half-duplex mode is possible with I2C due to its 2-wire nature. The advantage: you save two pins, and in the 1980s, when a 40-pin DIP was considered “large,” saving two pins was a lot more important than it is today. The I2C standard persists because it’s economical and useful.
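The difference between the two buses shows up in how a single byte moves. In SPI, every clock shifts one bit out on MOSI and one bit in on MISO at the same time, so master and slave swap shift-register contents after eight clocks. In I2C, the master first sends a byte holding the 7-bit slave address plus a read/write bit, and data then flows in only one direction at a time. The sketch below models just those two ideas; clock polarity and phase, ACK bits, and clock stretching are omitted, and the function names are illustrative rather than any Xilinx driver API.

```python
from typing import Tuple


def spi_exchange(master_byte: int, slave_byte: int) -> Tuple[int, int]:
    """Model one 8-bit, MSB-first SPI transfer.

    On each clock the master drives one bit on MOSI while the slave drives
    one bit on MISO, so both sides send and receive simultaneously.
    Returns (byte received by master, byte received by slave).
    """
    master_rx = 0
    slave_rx = 0
    for bit in range(7, -1, -1):
        mosi = (master_byte >> bit) & 1   # master -> slave
        miso = (slave_byte >> bit) & 1    # slave -> master
        master_rx = (master_rx << 1) | miso
        slave_rx = (slave_rx << 1) | mosi
    return master_rx, slave_rx


def i2c_address_byte(address_7bit: int, read: bool) -> int:
    """First byte of an I2C transaction: 7-bit address, then R/W (1 = read)."""
    if not 0 <= address_7bit <= 0x7F:
        raise ValueError("I2C 7-bit addresses range from 0x00 to 0x7F")
    return (address_7bit << 1) | int(read)


print([hex(b) for b in spi_exchange(0xA5, 0x3C)])  # ['0x3c', '0xa5']
print(hex(i2c_address_byte(0x68, read=True)))       # 0xd1
```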


The Zynq SoC’s PS (processor system) incorporates two I2C and two SPI peripheral interfaces, which can be routed through either the MIO or the Zynq PL’s (Programmable Logic’s) EMIO as shown below:






Xilinx just published a new App Note, XAPP1258, titled “Using VxWorks 7 BSP with the Zynq-7000 AP SoC.” Wind River’s VxWorks is the granddaddy of RTOSes with tailored profiles for aerospace, industrial, medical, networking, and consumer applications. It was introduced way back in 1987 and designed to provide a bullet-proof, deterministic operating system for microprocessors.


Both the Spirit and Opportunity Mars rovers, which landed on Mars in early 2004, and the Curiosity Mars Science Laboratory employ VxWorks. You’ll also find VxWorks in diverse systems such as the Boeing 787 Dreamliner, some of BMW’s iDrive systems, the Apple AirPort Extreme, Linksys wireless routers, assorted industrial robots, RAID storage controllers from IBM, and a ton of networking equipment from nearly all the majors.


In other words, the VxWorks RTOS is well tested and tempered by time.


Multicore-capable VxWorks 7, introduced in early 2014, is the latest incarnation of this venerable operating system. The Xilinx Zynq SoC with its dual-core ARM Cortex-A9 MPCore processor makes a mighty fine host for VxWorks and the new app note will help you get things running.





Evaluating the Linearity of RF-DAC Multiband Transmitters

by Xilinx Employee on 05-08-2015 02:34 PM


By Lei Guan, Member of Technical Staff, Bell Laboratories, Alcatel Lucent Ireland


(Excerpted from the latest issue of Xcell Journal)


Emerging RF-class data converters (namely, RF DACs and RF ADCs) architecturally make it possible to create compact multiband transceivers. But the nonlinearities inherent in these new devices can be a stumbling block. The nonlinearity of these RF devices has two faces in the frequency domain: in-band and out-of-band. In-band nonlinearity refers to unwanted frequency terms within the TX band, while out-of-band nonlinearity consists of undesired frequency terms outside the TX band.
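As a worked illustration of those two faces: when two tones at f1 and f2 pass through a device with third-order nonlinearity, intermodulation products appear at 2f1 - f2, 2f2 - f1, 2f1 + f2, and 2f2 + f1. The sketch below lists those products and sorts them into in-band and out-of-band terms for a given TX band. The tone frequencies and the 2110 to 2170 MHz band in the usage lines are example values, not figures from the Bell Labs tests, and harmonics and higher-order products are deliberately ignored.

```python
from typing import List, Tuple


def im3_products(f1: int, f2: int) -> List[int]:
    """Third-order intermodulation frequencies of a two-tone signal (same units as input)."""
    return [2 * f1 - f2, 2 * f2 - f1, 2 * f1 + f2, 2 * f2 + f1]


def split_by_band(freqs: List[int], band_lo: int, band_hi: int) -> Tuple[List[int], List[int]]:
    """Separate frequencies into those inside [band_lo, band_hi] and those outside it."""
    in_band = [f for f in freqs if band_lo <= f <= band_hi]
    out_of_band = [f for f in freqs if not band_lo <= f <= band_hi]
    return in_band, out_of_band


# Example: two tones at 2140 MHz and 2150 MHz, TX band 2110 to 2170 MHz.
in_band, out_of_band = split_by_band(im3_products(2140, 2150), 2110, 2170)
print(in_band)      # [2130, 2160]
print(out_of_band)  # [6430, 6440]
```

The close-in products (2f1 - f2 and 2f2 - f1) land next to the tones and cannot be filtered away, which is why they dominate in-band linearity measurements.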


Here at Bell Labs Ireland, we have created a flexible software-and-hardware platform to rapidly evaluate RF DACs that are potential candidates for next-generation wireless systems. The three key elements of this R&D project are a high-performance Xilinx FPGA, Xilinx intellectual property (IP), and MATLAB. We tried to minimize the FPGA resource usage while keeping the system as flexible as possible. A system block diagram appears below:



[Image: Simplified block diagram of the RF-DAC linearity evaluation tester]




We picked the latest Analog Devices RF-DAC evaluation boards (AD9129 and AD9739A) and the Xilinx ML605 evaluation board. The ML605 board comes with a Virtex-6 XC6VLX240T-1FFG1156 FPGA device, which contains fast-switching I/Os (up to 710 MHz) and serdes units (up to 5 Gbps) for interfacing the RF DACs.


The FPGA portion of the design includes a clock distribution unit, a state machine-based system control unit and a DDS core-based multitone generation unit, along with two units built around Block RAM: a small BRAM-based control message storage unit (cRAM core) and a BRAM array-based user data storage unit (dRAM core).


The clock is the life pulse of the FPGA. In order to ensure that multiple clocks are properly distributed across FPGA banks, we chose Xilinx’s clock-management core, which provides an easy, interactive way of defining and specifying clocks. A compact instruction core built around a state machine serves as the system control unit.


We designed two testing strategies: a continuous-wave (CW) signals test (xDDS) and a wideband signals test (xRAM). Multitone CW testing has long been the preferred choice of RF engineers for characterizing the nonlinearity of RF components. Keeping the same testing philosophy, we created a tunable four-tone logic core based on a direct digital synthesizer (DDS), which actually uses a pair of two-tone signals to stimulate the RF DAC in two separate frequency bands. By tuning the four tones independently, we can evaluate the linearity performance of the RF DAC—that is, the location and the power of the intermodulation spurs in the frequency domain. CW signal testing is an inherently narrowband operation. To further evaluate the RF DAC regarding wideband performance, we need to drive it with concurrent multiband, multimode signals, such as dual-mode UMTS and LTE signals at 2.1 GHz and 2.6 GHz, respectively.
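The DDS idea behind the four-tone core can be modeled in a few lines. Each tone has its own phase accumulator that advances every clock by a frequency tuning word, FTW = round(f_out × 2^N / f_clk), and the accumulator's top bits address a sine table; tuning the four tones independently amounts to changing four tuning words. This is a generic floating-point sketch with an assumed 32-bit accumulator and 1024-entry table. It is not the configuration of the Xilinx DDS core used in the Bell Labs design, and the clock rate and tone frequencies in the usage lines are hypothetical.

```python
import math
from typing import List, Sequence

ACC_BITS = 32   # phase-accumulator width (assumed)
LUT_BITS = 10   # sine look-up table address width (assumed)
SINE_LUT = [math.sin(2 * math.pi * i / (1 << LUT_BITS)) for i in range(1 << LUT_BITS)]


def tuning_word(f_out_hz: float, f_clk_hz: float) -> int:
    """Frequency tuning word: phase increment per clock for the requested tone."""
    return round(f_out_hz * (1 << ACC_BITS) / f_clk_hz)


def dds_multitone(freqs_hz: Sequence[float], f_clk_hz: float, n_samples: int) -> List[float]:
    """Sum one phase-accumulator DDS channel per tone, scaled to stay within [-1, 1]."""
    words = [tuning_word(f, f_clk_hz) for f in freqs_hz]
    phases = [0] * len(words)
    acc_mask = (1 << ACC_BITS) - 1
    samples = []
    for _ in range(n_samples):
        total = 0.0
        for k, word in enumerate(words):
            # The top LUT_BITS of the accumulator address the sine table.
            total += SINE_LUT[phases[k] >> (ACC_BITS - LUT_BITS)]
            phases[k] = (phases[k] + word) & acc_mask   # wraps modulo 2**ACC_BITS
        samples.append(total / len(words))
    return samples


# Two independently tunable two-tone pairs, one per band (hypothetical values).
tones = [100e6, 105e6, 300e6, 305e6]
waveform = dds_multitone(tones, f_clk_hz=1e9, n_samples=4096)
```

Dividing the sum by the number of tones is one simple way to keep the composite signal within full scale; real designs usually trade that headroom against output power more carefully.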


We chose MATLAB as the software host, simply because it has many advantages in terms of digital signal processing (DSP) capability. What’s more, MATLAB also provides a handy tool called GUIDE for laying out a graphical user interface (GUI). The figure below illustrates the GUI that we created for the platform:



[Image: RF-DAC evaluation tester GUI]



Note: This blog is an excerpt. To read the full article in the latest issue of Xcell Journal, click here.

It’s da BOM: How to lower BOM costs using non-intuitive design techniques—a free White Paper

by Xilinx Employee 05-07-2015 03:53 PM - edited 05-08-2015 09:41 PM

Successful products are profitable products and profit is what you can sell the product for minus the BOM (the bill of materials), manufacturing, shipping, and sales costs. Of those knobs, the BOM cost is often the one you can most easily dial down. For products in cost-sensitive markets, BOM costs are even more critical. However, looking at the cost of just one component isn’t going to get you to a true minimum because a system’s BOM cost consists of several interdependent component costs. It’s best to take a holistic approach to achieve the lowest overall BOM cost.


How do you go all holistic on a BOM? There’s a new Xilinx White Paper titled “Reducing System BOM Cost with Xilinx's Low-End Portfolio” that shows you how.


Depending on requirements, systems designed for cost-sensitive markets can be architected in many ways including:


  • Based solely on a single-chip microcontroller
  • Based on a microcontroller plus an ASSP for a performance boost
  • Based on a microcontroller plus an FPGA for a more flexible performance boost
  • Based on an application processor plus an ASSP plus memory for even more performance
  • Based on an application processor plus memory plus an FPGA—yet more performance + flexibility
  • Etc., etc., etc.


Let’s face it, even if this is the Xilinx Xcell Daily blog, if you can design your entire system with a 50-cent microcontroller, you’d better do that (I have). It gives you a lot of bang for half a buck. However, you won’t get a lot of performance from a 50-cent microcontroller. If the system requires any heavy lifting—video processing, signal processing, or high-speed I/O for example—you are going to need more horsepower.


To get more processing horsepower while keeping BOM costs low, you could architect the system using one of the above configurations and then work on minimizing the cost of each component in the design. That approach gets you to local minima but not necessarily a global minimum. The cost of any given component includes not only the component cost itself but also the cost of other required support components (like memory for an application processor) and the cost of pc board real estate for those components.


Component selection impacts pc board cost by affecting the required board area, the number of board layers, and the need for through holes and blind vias. The cost of processors, ASSPs, memory and—yes—FPGAs all impact the BOM cost and also affect performance requirements and sourcing flexibility. So it’s important to do the right math by taking a holistic view, examining the impact that component selection has on the design of the whole system including the BOM cost.


The best strategy—not the only one but one that’s worked for more than 40 years—is to use more system integration to entirely eliminate components from the BOM. As it turns out, Xilinx FPGAs and the Zynq SoC are really good at this.


Does this sound like something you’d like to investigate further? Then by all means, download and read “Reducing System BOM Cost with Xilinx's Low-End Portfolio”.



Speed is the name of the game for digital radio design and for many other high-speed systems as well. No surprise there. It should also not be a surprise that there are special device features and design techniques that yield more performance, sometimes a lot more performance, if only you know what to use and how. A new on-demand Xilinx video Webinar has just been posted that gives you this know-how, and it’s free.


The Webinar is titled “How to Efficiently Implement Flexible and Full-Featured Digital Radio Solutions Using All Programmable SoCs” but don’t let the title fool you. Wireless System Architect Michel Pecot works long hours to figure out the best ways to extract maximum performance from Xilinx FPGAs and Zynq SoCs and he shares many of these design techniques and tricks with you in this 55-minute video webinar.


For example, Pecot discusses specific ways to optimize FPGA-based system implementations—starting at the architectural level—to get maximum clock rates and maximum performance with exceptional resource utilization. From the Webinar:



[Image from the Webinar: Sub-band splitting and multi-stage carrier mixing/extraction]



The Digilent ZYBO board based on the Xilinx Zynq SoC is a full-featured development board with 512Mbytes of DDR3 SDRAM, HDMI, VGA, Ethernet, MicroSD slot, OTG USB 2.0, audio inputs and outputs, and six Digilent PMOD expansion connectors. Normally, this board sells for $189 ($125 academic pricing) but you can win one from the ARM Connected Community with a power supply and some Digilent swag—and pretty easily. Just click here and leave a comment, stating what you’d do with this board if you won it.


However, tick tock, Cinderella. You have until 11:59 p.m. PST on May 14 to enter. The offer turns into a pumpkin at midnight.



[Photo: ZYBO board, top view closeup]



Digilent ZYBO board based on the Xilinx Zynq SoC



There are only five comments so far, so your chances are good right now.


Good luck!


By Dr. Javier Díaz (Chief Executive Officer), Rafael Rodríguez-Gómez (Chief Technical Officer), and Dr. Eduardo Ros (Chief Operating Officer), Seven Solutions SL



(Excerpted from the latest issue of Xcell Journal)


An Ethernet-based technology called White Rabbit, born at CERN, the European Organization for Nuclear Research, promises to meet the precise timing needs of high-speed, widely distributed applications including 100G Ethernet and 5G mobile telecom networks, smart grids, high-frequency trading, and geopositioning systems. Named after the time-obsessed rabbit in Alice in Wonderland, White Rabbit is based on, and compatible with, standard mechanisms such as PTPv2 (IEEE-1588v2) and Synchronous Ethernet, but modifies them to achieve subnanosecond accuracy. White Rabbit inherently performs self-calibration over long-distance links and can distribute time to a very large number of devices with very little degradation.
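The PTPv2 mechanism that White Rabbit builds on reduces to arithmetic on four timestamps from a two-way message exchange: t1 (master sends Sync), t2 (slave receives it), t3 (slave sends Delay_Req), and t4 (master receives it). Plain PTP assumes both directions have equal delay, so any asymmetry shows up directly as offset error, which is why White Rabbit adds link calibration. The sketch below shows the standard calculation with a single, assumed-known asymmetry term standing in for that calibration; it is a simplification for illustration, not White Rabbit's actual link model, and the timestamps are made-up example values.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PtpExchange:
    t1: float  # master sends Sync (master time base), ns
    t2: float  # slave receives Sync (slave time base), ns
    t3: float  # slave sends Delay_Req (slave time base), ns
    t4: float  # master receives Delay_Req (master time base), ns


def offset_and_delay(ts: PtpExchange, asymmetry_ns: float = 0.0) -> Tuple[float, float]:
    """Return (slave clock offset from master, mean one-way delay) in ns.

    asymmetry_ns is the master->slave delay minus the slave->master delay,
    assumed known from calibration (0 reproduces the plain PTP assumption).
    """
    master_to_slave = ts.t2 - ts.t1   # = delay_ms + offset
    slave_to_master = ts.t4 - ts.t3   # = delay_sm - offset
    offset = (master_to_slave - slave_to_master - asymmetry_ns) / 2
    mean_delay = (master_to_slave + slave_to_master) / 2
    return offset, mean_delay


# Slave clock 500 ns ahead; link delays 1010 ns (m->s) and 990 ns (s->m).
ex = PtpExchange(t1=0.0, t2=1510.0, t3=2000.0, t4=2490.0)
print(offset_and_delay(ex))                     # (510.0, 1000.0): 10 ns error
print(offset_and_delay(ex, asymmetry_ns=20.0))  # (500.0, 1000.0)
```

Note that a 20 ns delay asymmetry produces a 10 ns offset error in the uncorrected case: half the asymmetry leaks straight into the time estimate.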


From the very beginning, Seven Solutions, based in Granada, Spain, has collaborated in the design of White Rabbit products including not only the electronics but also the firmware and gateware. The company also provides customization and turnkey solutions based on this technology. As an extension of Ethernet, White Rabbit technology is being evaluated for possible inclusion in the next Precision Time Protocol standard (IEEE-1588v3) in the framework of a high-accuracy profile. Standardization would facilitate WR’s integration with a wide range of diverse technologies in the future.


About the Author
  • Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.