
Zynq-based snickerdoodle says "Hello World" using Micrium’s µC/OS

by Xilinx Employee ‎07-27-2016 11:10 AM - edited ‎07-27-2016 11:13 AM (335 Views)


Jonathan Blanchard is a lead embedded software developer for real-time kernels at Micrium, the company responsible for the µC/OS RTOS that runs on a variety of processors including the dual-core ARM Cortex-A9 processor in the Xilinx Zynq-7000 SoC. Blanchard has just posted a step-by-step blog about getting Micrium’s µC/OS up and running on the low-cost, Zynq-based snickerdoodle board from krtkl. (For more information about the snickerdoodle, see “$55, Zynq-based, WiFi-enabled Snickerdoodle makes MAKE:’s Maker’s Guide to Boards.”)




Zynq-based snickerdoodle board from krtkl



Blanchard’s blog describes how he installed the µC/OS RTOS on the snickerdoodle and how he downloaded the appropriate BSP for the Xilinx SDK under Vivado 2016.2.


His blog ends happily: “And indeed, the snickerdoodle said hello.”










NGCodec is now offering its H.265/HEVC video codec through the Origami Ecosystem launched earlier this year by Image Matters (see “Image Matters launches Origami Ecosystem for developing advanced 4K/8K video apps using the FPGA-based Origami module”). NGCodec says that its H.265/HEVC video encoder is optimized for low latency and low bit rates and can support two 1080p30 channels or one 1080p60 channel on the Origami B20 board, which is based on a Xilinx Kintex UltraScale KU060 FPGA. (For more information on the Origami B20 board, see “Image Matters, inrevium America, and Tokyo Electron Device team up on agile FPGA platform for fast design of advanced 4K/8K video systems.”) NGCodec plans to sell the Origami B20 board pre-loaded with its H.265/HEVC codec IP in 4Q2016.


For more information about NGCodec’s H.265/HEVC video encoder, see “NGCodec demos HEVC H.265 Real-Time Encoder running on Kintex-7 FPGA at NAB 2015” and “NGCodec demos real-time HEVC/H.265 hardware encoder at NAB 2016 – new demo captured on video.”


And here’s the video demo of the NGCodec H.265/HEVC video encoder running on a Xilinx KCU105 dev board (based on a Kintex UltraScale KU040 FPGA), recorded earlier this year at NAB 2016:







Here’s a 3-minute video by Andrew Powell showing you something pretty slick and useful—that you can connect a $13 SainSMART I2C 20-character by 4-line display to a Xilinx Zynq-7000 SoC on a Digilent ZYBO trainer board through a PMOD connector with no glue, graft Arduino LCD drivers to PetaLinux, and get “Hello World” up on the LCD. Easy Peasy.


Warning: Someone needs to introduce Mr. Powell to the concept of a camera tripod, but the video makes its point nevertheless.






Powell’s code is on github.



You can use Keysight Technologies’ U5340A FPGA Development Kit for High-Speed Digitizers to develop custom algorithms with high-speed Keysight digitizers with 8- to 12-bit resolution and sampling rates ranging from 1 to 4 Gsamples/sec. Keysight’s Giovanni Lucia has just published an article titled “Embedding custom real-time processing in a multi-gigasample high-speed digitizer” on Embedded.com that describes why you would want to develop custom processing algorithms directly into a high-speed digitizer and how to go about doing it.



Keysight U5340A FPGA Development Kit



Lucia explains that custom processing algorithms running on an on-board FPGA speed algorithmic execution and reduce the amount of data you’ll need to extract from the digitizer, reducing I/O and storage loads. A surprising revelation is that you can also insert encryption algorithms into the FPGA, which improves data security by never allowing unencrypted data to leave the digitizer. Development projects targeting rapidly changing markets such as 5G may well benefit from such security measures.


The article also enumerates the minimum features you should expect from an FPGA development kit designed for digitizers:


  • A pre-configured environment, thereby saving the time-consuming effort of selecting the tools, adapting constraints and scripts, and verifying the flow.
  • A database of optimized and standardized cores, which -- coupled with a stable and reliable design flow -- ensures fast compilation times and repeatability of results.
  • A functional system-level testbench environment allowing quick verification of the custom algorithm’s integration before running the build flow.
  • An automatic, flexible, and configurable build script to ease repetitive tasks, thereby allowing users to produce multiple bitfiles every day.
  • A direct line of support to an FPGA and digitizer expert who can address any issue and consult on the best approach to implementing the algorithm(s) to assure the success of the project.
  • A software driver offering a standardized API, support for multiple modern languages like Python, a dedicated custom interface, and a reusable calibration routine.
  • A suite of complete example designs and an easy design-migration path facilitating deployment on multiple products and form factors with small delta effort.



Keysight’s U5340A FPGA Development Kit is built on Mentor Graphics’ HDL Designer and ModelSim, Xilinx development tools, and Xilinx LogiCORE IP. The kit is compatible with several Keysight digitizers including:



All of these high-speed Keysight digitizers have Xilinx Virtex-6 FPGAs on board for signal processing, both Keysight-written and custom.


Note: For more information about the Keysight U5340A FPGA Development Kit, see “Keysight ups PCIe and AXIe game—lets you add signal processing to the FPGAs in its high-speed digitizers.”




Adam Taylor’s MicroZed Chronicles Part 140: Embedded Vision, HLS, and OpenCV on the Zynq-7000 SoC

by Xilinx Employee ‎07-26-2016 10:02 AM - edited ‎07-26-2016 10:21 AM (818 Views)


By Adam Taylor


Over the last several blogs, we have looked at how we build a Linux system using both the RAM Disk and file system approaches. The most recent blog culminated with the ZedBoard functioning as a single board computer.


Over the coming weeks, I want to explore embedded vision. To do this we are going to be using the following for vision applications:


  • ZedBoard running as a single board computer
  • Avnet Embedded Vision Kit
  • OpenCV
  • High-Level Synthesis and how we can use it in image-processing applications






Zynq SBC running simple OpenCV demo



An increasing number of embedded applications use vision, ranging from simple security and monitoring systems to robotics, driver awareness, and medical imaging. We must also remember that embedded vision can cover a wider section of the electromagnetic spectrum, from ultraviolet, which is used in scientific imaging, to infrared, commonly used for night vision, security and safety, and thermography.


I think it’s sensible to dedicate a number of blogs to embedded vision topics to see the different approaches, the challenges, and how we can overcome the challenges.


No matter which area of the spectrum we are working in, a typical embedded vision system will need to:


  • Configure the imaging device to output images in the correct format, frame rate, etc.
  • Process the received raw data. Processing examples include color filter interpolation if a Bayer filter is used, color-space conversions and corrections, image enhancement (e.g. noise filtering), edge enhancement, etc. Depending upon the sensor used, processing can be straightforward or pretty complicated.
  • Implement the image-processing algorithms required for our application. Typically this can require a number of stages and is very processing-intense.
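To make the raw-data processing step above concrete, here is a minimal sketch of color filter interpolation (demosaicing) for an RGGB Bayer mosaic. This is plain Python with hypothetical function names for illustration only; real pipelines use bilinear or edge-aware interpolation, usually implemented in the FPGA fabric, but the nearest-neighbor version shows the idea:

```python
def demosaic_rggb(raw, width, height):
    """Nearest-neighbor demosaic sketch for an RGGB Bayer mosaic.
    raw: flat list of sensor samples, row-major.
    Returns a list of (r, g, b) tuples, one per pixel."""
    def sample(x, y):
        # Clamp coordinates at the image border.
        x = min(max(x, 0), width - 1)
        y = min(max(y, 0), height - 1)
        return raw[y * width + x]

    rgb = []
    for y in range(height):
        for x in range(width):
            # Even rows alternate R G R G...; odd rows alternate G B G B...
            if y % 2 == 0 and x % 2 == 0:      # red site
                r, g, b = sample(x, y), sample(x + 1, y), sample(x + 1, y + 1)
            elif y % 2 == 0:                   # green site on a red row
                r, g, b = sample(x - 1, y), sample(x, y), sample(x, y + 1)
            elif x % 2 == 0:                   # green site on a blue row
                r, g, b = sample(x, y - 1), sample(x, y), sample(x + 1, y)
            else:                              # blue site
                r, g, b = sample(x - 1, y - 1), sample(x - 1, y), sample(x, y)
            rgb.append((r, g, b))
    return rgb
```

Each output pixel simply borrows the two missing color samples from its nearest neighbors, which is why a 2x2 input yields four full RGB pixels from only four sensor samples.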


The beauty of the Zynq-7000 SoC is that we can perform image processing operations within the PS (processor system) or the PL (programmable logic) and indeed we can use tools like we have looked at previously (such as SDSoC) to accelerate PS performance. Or we can use HLS (High Level Synthesis) to generate RTL modules used in the image-processing algorithm.


When we develop image-processing applications we normally use HLLs (high-level languages) and libraries of algorithms to save time. One such library is OpenCV, which provides a number of C++ algorithms for real-time computer-vision applications.


We can use OpenCV on Microsoft Windows or Linux machines. This means we can develop our algorithm on a development machine and then cross compile it to run on the Zynq SoC’s PS under Linux. Even more exciting, Xilinx Vivado HLS supports OpenCV. We can create AXI Streaming IP modules and drop them into the Zynq SoC’s PL within the image-processing chain. Now that is cool.


First, we’ll look at how we can use Vivado HLS and OpenCV on the ZedBoard SBC that we created last week. To use OpenCV on the Zynq SoC, we need to install both the include files and the libraries it uses. We can then develop the OpenCV code.


The first step for the Zynq SBC is to open a terminal window and download OpenCV. There are a number of ways you can do this, including building it from scratch using the source; however, I opted for the simplest method and used a single command. In my defense, I am time limited when I write these blogs. You may also be under a time crunch, so my approach actually has broad appeal.


I used this command to load OpenCV on the Zynq SBC:



sudo apt-get install libopencv-dev



Once the OpenCV files had loaded, I was ready to write my first OpenCV application, which opens and displays a specific file. You can find this on my GitHub page. (See below for a link.)


When it comes to compiling the code we can use the built-in GCC compiler using the command line:



g++ `pkg-config --cflags opencv` <filename.cpp> `pkg-config --libs opencv` -o <output name>



When I ran this, I got the image appearing above.


Now so far, we have assumed that we wanted to develop the embedded-vision application on the Zynq SBC itself. However, the Xilinx SDK also comes with the OpenCV libraries, so if we wish to use them we can develop our application using SDK on a workstation host and then upload the executable to our SBC or to another Zynq implementation. To do this, though, we need to make SDK aware of the include directory and the library locations. These are shown in the screenshots below for my installation of Xilinx SDK:





Setting Include Directory in SDK






Setting Libraries in Xilinx SDK



Now we have pipe-cleaned the process. Next week, we’ll look a little more at different image-processing applications using this SBC and the many different ways we can implement them.




The code is available on GitHub as always.


If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.




  • First Year E Book here
  • First Year Hardback here.




 MicroZed Chronicles hardcopy.jpg



  • Second Year E Book here
  • Second Year Hardback here




 MicroZed Chronicles Second Year.jpg




The Ethernet Alliance and NBASE-T Alliance have announced a collaborative effort to accelerate mainstream deployment of 2.5GBASE-T and 5GBASE-T Ethernet, which leverages the last 13 years of infrastructure construction based on Cat5e and Cat6 cabling (some 70 billion meters of cable!). The two organizations plan to validate multi-vendor interoperability at a plugfest scheduled for the week of October 10, 2016 at the University of New Hampshire InterOperability Laboratory (UNH-IOL) in Durham, NH. For more information on the Ethernet Alliance/NBASE-T Alliance plugfest, please contact morgan@ethernetalliance.org or admin@nbaset.org.


Note: Xilinx is a founding member of the NBASE-T Alliance. (See “NBASE-T aims to boost data center bandwidth and throughput by 5x with existing Cat 5e/6 cable infrastructure” and “12 more companies join NBASE-T alliance for 2.5 and 5Gbps Ethernet standards.”)


For additional information about the PHY technology behind NBASE-T, see “Boost data center bandwidth by 5x over Cat 5e and 6 cabling. Ask your doctor if Aquantia’s AQrate is right for you” and “Teeny, tiny, 2nd-generation 1- and 4-port PHYs do 5 and 2.5GBASE-T Ethernet over 100m (and why that’s important).”






ACDC Lane.jpg 

I’m not sure that Inrevium named its ACDC Quattro Kintex UltraScale Development Platform after the Australian rock and roll band AC/DC, but the specs on the board certainly do recall the band’s albums High Voltage and Powerage. This is one high-powered dev board and much of that power comes from the Xilinx Kintex UltraScale KU115 FPGA, the largest member of the Kintex UltraScale FPGA family with 1.451 million system logic cells. The Kintex UltraScale KU115 is also a DSP monster with 5520 DSP48E2 slices on chip. (See “The UltraScale DSP48E2: More DSP in every slice.”) The FPGA is flanked by 4Gbytes of DDR4-2400 SDRAM and enough I/O to do what needs to be done. I/O ports on the board include one SFP+ optical cage, four FMC connectors, and 20 SMA connectors for various clocks. The board has already been proven to work with Inrevium HDMI4K, 12G-SDI, DP1.2, V-by-One, MIPI, and Zynq FMC cards.



Here’s a block diagram of the Inrevium ACDC Quattro Kintex UltraScale Development Platform:



Inrevium Quattro Kintex UltraScale Dev Board.jpg



Block diagram of the Inrevium ACDC Quattro Kintex UltraScale Development Platform




And here’s a photo of the board:



Inrevium Quattro Kintex UltraScale Dev Board.jpg


Inrevium ACDC Quattro Kintex UltraScale Development Platform



Please contact Inrevium for more information about the ACDC Quattro Kintex UltraScale Development Platform.

Digilent bakes All Programmable cookies—the eating kind

by Xilinx Employee on ‎07-20-2016 11:31 AM (1,117 Views)


While browsing the Digilent Web site, I spotted a new blog posted by Quinn Sullivan titled “Embrace Your Geekness!” that featured cookies—not the HTML kind but the eating kind. In the same spirit as the Zynq UltraScale+ MPSoC cookie discussed in one of Xcell Daily’s posts last year (see “First Xilinx Zynq UltraScale+ MPSoC Cookie Ships”), Sullivan’s blog post shows four cookies made to look like four Digilent products, all based on Xilinx All Programmable devices:



Digilent cookies.jpg



Digilent’s All Programmable Cookies




From left to right, these Digilent products are:








The cookies look so good, it’s a shame to eat them. I don’t suppose they’ll power up.



Several low-cost Digilent dev boards based on Xilinx All Programmable devices have Digilent PMOD expansion connectors on them including:




ARTY Board v2.jpg



Digilent ARTY Board with four PMOD connectors along the top




“PMOD” is a contraction of “Peripheral Modules” and Digilent offers a boatload of these small PMODs to get your prototype up and running quickly. Currently, Digilent is running a “Summer Sale” on PMODs “now through the end of Summer.” I counted 68 PMODs on sale—way too many to list here. However, I’ll list a few of my favorites with the list and sale prices:

















This is your chance to stock up on some peripherals at pretty low prices.


Need to mate an NVMe SSD to your FPGA or Zynq SoC without solder? Opsero’s FPGA Drive to the rescue

by Xilinx Employee ‎07-19-2016 01:37 PM - edited ‎07-21-2016 11:28 AM (1,688 Views)


Opsero Electronic Design’s FPGA Drive cleverly allows you to connect an M.2 NVMe SSD to an FPGA dev board using a PCIe or FMC connection. A picture is worth 100 words in a blog post:



Ospero FPGA Drive.jpg



Opsero Electronic Design’s FPGA Drive connects an M.2 SSD to your FPGA/SoC dev board



The image on the left shows an FPGA Drive board with the PCIe form factor plugged into a Xilinx KC705 Kintex-7 FPGA Eval Kit and the image on the right shows an FPGA Drive board with an FMC connector plugged onto an Avnet PicoZed SOM based on a Xilinx Zynq Z-7030 SoC. A standard M.2 NVMe SSD will plug into either FPGA Drive board.


However, a hardware connection alone is not sufficient to make an SSD work within a system. You need drivers and a file system. For information on that software, turn to Jeff Johnson’s FPGAdeveloper blog titled “Measuring the speed of an NVMe PCIe SSD in PetaLinux” where Johnson tests the FPGA Drive’s performance using Xilinx PetaLinux on the KC705 board—where it runs on a Xilinx MicroBlaze processor—and the Avnet PicoZed SOM—where it runs on the Zynq SoC’s dual-core ARM Cortex-A9 MPCore processor. (Johnson is an electronic design consultant and Opsero is his design services company.)




Here’s a pleasant surprise: About two years ago, Andy Brown purchased 40 obsolete but unused, waffle-packed Xilinx Virtex-E FPGAs (introduced in 1999, see page 5 in Xcell journal, Issue 34) on eBay for the bargain price of less than £3 each and decided to build an FPGA development board around the device. (Note: Please do not consider eBay as an authorized Xilinx distributor based on this Xcell Daily blog post. They’re not, at least not in the present space-time continuum.)


Now practically speaking, the Virtex-E XCV600E device that Brown uses in this tutorial has been obsolete for many years—so obsolete that you will have a hard time locating the data sheet on the Xilinx Web site (it’s easier using Google) and you’ll need a really old version of the Xilinx ISE Design Suite—ISE Version 10.1 was the last to support Virtex-E devices—and a PC or virtual PC running Microsoft Windows XP to create designs for these historic parts.


The result of Brown’s efforts, and the reason for this blog post, is a really nice, basic tutorial on FPGA design with a jam-packed, 24-minute video based on his experience in designing this dev board.


Brown’s lightning-fast tutorial covers:


  • PCB design and escape routing for BGA packages
  • reflow soldering and surface-mount assembly considerations
  • FPGA basics including a discussion of Xilinx FPGA I/O banks
  • FPGA power-supply filtering basics
  • FPGA development-tool basics


And it’s all explained in Brown’s exceptionally easy-to-understand, matter-of-fact manner.



Andy Browns Virtex-E Dev Board.jpg



Andy Brown’s FPGA Development Board based on an obsolete Xilinx Virtex-E FPGA

(Photo from Andy’s Workshop)




And now, here for your entertainment and amusement is a Virtex-E FPGA marketing video from the year 2000, complete with some exciting disco music and an FPGA memory hierarchy that has stood the test of time for nearly two decades, until Xilinx introduced its 16nm UltraScale+ device families (keep reading below the video for more details):






For comparison purposes, the largest Xilinx Virtex-E device circa 1999, the XCV3200E with 73,008 logic cells and 1,038,336 bits of BRAM, is functionally smaller than a contemporary, mid-sized, 28nm Xilinx Artix-7 A75T FPGA with 75,520 logic cells and 3,780,000 bits of BRAM. In addition, the Artix-7 FPGA has some very important programmable features that the Xilinx Virtex-E FPGA did not, like its 180 DSP48E1 slices and eight 6.6Gbps SerDes transceivers.


As amusing as this comparison of old versus new might be, the Virtex-E FPGA family was still quite capable for its day. For example, here’s a 16-year-old paper titled “20-GFLOPS QR processor on a Xilinx Virtex-E FPGA,” written by Richard L. Walke, Robert W. M. Smith, and Gaye Lightbody of the Defence Evaluation and Research Agency in the UK and published in the year 2000 in the SPIE Proceedings. The FPGA used for the work cited in this paper was a Xilinx Virtex-E XCV3200E.


Xilinx Virtex-E FPGAs were the “bigger, better, faster” versions of the original, groundbreaking Xilinx Virtex parts, made possible by jumping from a 0.22μm IC process technology to a 0.18μm IC process technology. These first Virtex and Virtex-E devices introduced BRAMs (block RAMs) with true dual-port capabilities. Much improved BRAMs are still available today on Xilinx All Programmable devices and they are now augmented with UltraRAMs in the latest Xilinx UltraScale+ FPGAs and Zynq UltraScale+ MPSoCs. For the first time in 17 years, UltraRAMs introduce a new level in the All Programmable device memory hierarchy. (For more information about UltraRAM, see “UltraRAM: a new tool in the memory hierarchy you’ll want because it fits so well into your system designs.”)


Brown had his own very valid educational reasons for designing his dev board using the Xilinx Virtex-E parts that he purchased through eBay. You may well want that design experience or you may want to start learning to use modern FPGAs more quickly. The Virtex-E XCV600E FPGA on Andy Brown’s dev board actually has fewer on-chip programmable-logic resources than today’s smallest Artix-7 FPGA, the A15T. So if you want to dive right into FPGA exploration, you might want to consider getting the $99 Digilent ARTY board based on a Xilinx Artix-7 A35T FPGA or the $189 Digilent ZYBO trainer board based on a Xilinx Zynq Z-7010 SoC. Both the Artix-7 A35T FPGA and the Zynq Z-7010 SoC have more on-chip FPGA resources than the 17-year-old Virtex-E XCV600E FPGA and you’ll get to use the latest Xilinx Vivado Design Suite tools with these newer devices.


(For more information about the Digilent ARTY board, see “ARTY—the $99 Artix-7 FPGA Dev Board/Eval Kit with Arduino I/O and $3K worth of Vivado software. Wait, What????” and for more information on the Digilent ZYBO trainer board, see “ZYBO has landed. Digilent’s sub-$200 Zynq-based Dev Board makes an appearance (with pix!)”)




IBM Power Systems recently placed a story on the Washington Post Web site titled “Powerful computing crunches genomic data at warp speed” that describes the contribution Edico Genome is making to medical research using hardware accelerators to speed genomic research. Edico Genome president and CEO Pieter van Rooyen and his team were analyzing blood work in South Africa, working on two diseases—HIV and tuberculosis—and they developed a custom hardware accelerator card to speed results. That accelerator card is the DRAGEN, which plugs into IBM’s Power Systems S822LC high-performance computing servers based on IBM’s POWER8 processor. The DRAGEN accelerator card is based on a Xilinx 28nm FPGA.


The result? “Previously, sequencing one genome would have taken around 30 hours,” van Rooyen said. “We have that down to 26 minutes. And soon, we’ll have that down to 10 minutes.” He foresees a time when a full sequence can be compiled in near real time.



Edico Genome Dragen Accelerator Board.jpg



Edico Genome Dragen Accelerator Card for Genome Analysis is based on a Xilinx 28nm FPGA



For more information about Edico Genome’s Dragen accelerator card, see “FPGA-based Edico Genome Dragen Accelerator Card for IBM OpenPOWER Server Speeds Exome/Genome Analysis by 60x.”

Adam Taylor’s MicroZed Chronicles Part 139: Linux File System and Single Board Computer

by Xilinx Employee ‎07-18-2016 10:20 AM - edited ‎07-18-2016 10:22 AM (1,486 Views)


By Adam Taylor


In the build we used last week for our Linux example, we placed the ZedBoard’s required boot files on an SD Card. One of these files was a RAMdisk image of the file system needed for Linux. This file loads into RAM each time we boot the ZedBoard. Changes we make within the file system while the board is operating (e.g. transferred files) are lost the next time we reboot.


If we actually want to keep these changes between power cycles, we need to put a file system on non-volatile memory such as the SD Card. Boards like the snickerdoodle and Parallella use a file system in place of the RAMdisk. Although it is possible to update the RAMdisk contents, as this blog shows, it requires a number of steps.


Going forward I want to look at image processing and OpenCV, so it is important that we have a file system we can save changes in. I also want to use the ZedBoard as a single-board computer, so I need to use the HDMI output to generate a display of the desktop. That’s another reason we need a file system.


To do this, we need a file system and we need to format the SD Card correctly. Looking first at the SD Card, we need to create two partitions. One partition is FAT formatted and that’s where we store the following files:


  • Boot.bin
  • zImage (the kernel image)
  • Device tree blob
  • Any other files needed for boot


The second SD Card partition is where we store the file system. This is normally the larger of the two partitions and it’s formatted with a Linux file system format (e.g. EXT2 or EXT4).


We will need to use a Linux-based OS to create the partitions and then install the file system. This is where our virtual machine comes in very handy.


With the Linux OS up and running, insert the SD Card into its reader, open a disk utility, and format the SD Card. Once it’s formatted, we create two partitions called boot and rootfs, sized and formatted as follows:


  • Boot, 512MB, FAT32 (bootable)
  • Rootfs, 3.5GB, EXT4 or EXT2
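For those working from the command line rather than a GUI disk utility, the same two-partition layout can be expressed as an sfdisk script. This is a sketch only; the device name /dev/sdX is a placeholder for your actual card reader device, and repartitioning erases everything on the card:

```
# Hypothetical sfdisk script; apply with: sudo sfdisk /dev/sdX < sdcard.layout
label: dos
# Partition 1: boot, 512MiB, FAT32 (partition type 0x0c), bootable
size=512MiB, type=c, bootable
# Partition 2: rootfs, remaining space, Linux (partition type 0x83)
type=83
```

After partitioning, format the first partition with mkfs.vfat and the second with mkfs.ext4 (or mkfs.ext2).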





Typically, the file system will be distributed as a compressed folder which you must extract onto the file system partition.


One of the most commonly used file systems for the Zynq-7000 SoC is Linaro, which can be obtained from linaro.org. This is the file system used for this example. I will be using the Zedboard.org desktop Ubuntu Linux example available here.


Should we need to build a system from scratch, we configure u-boot so that it knows where the file system resides.


While the example comes with the device tree, kernel image, and Boot.bin file necessary to boot the system, we need to download the filesystem.


With the file system downloaded, the next step is to unzip its contents to the SD Card’s rootfs partition. We do this using the following command in a Linux terminal window:


sudo tar --strip-components=3 -C <path to rootfs>/rootfs -xzpf <path to downloaded fs>/linaro-precise-ubuntu-desktop-20121124-560.tar.gz binary/boot/filesystem.dir






This may take some time (5 to 15 minutes or even longer). The final step is to move the DTB, kernel image, and boot.bin files onto the boot partition of the SD Card. We should then be ready to boot the system.


When I did this I was presented with the following desktop on my HDMI Monitor:







The code is available on GitHub as always.


If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.




  • First Year E Book here
  • First Year Hardback here.



 MicroZed Chronicles hardcopy.jpg




  • Second Year E Book here
  • Second Year Hardback here



MicroZed Chronicles Second Year.jpg 



Note: The Amazon ebooks have been listed at $0.00 for a long time. That’s about to end on July 20 so you might want to download them today or tomorrow before the price changes.


VadaTech may call them PCIe carrier cards for FMC but they’ve got big, bigger, biggest FPGAs on board

by Xilinx Employee ‎07-15-2016 11:15 AM - edited ‎07-19-2016 06:16 AM (2,109 Views)


VadaTech just announced three new PCIe carrier cards for FMC, which means that each of the PCIe boards has a VITA-57 FMC connector on it but they also have some pretty capable Xilinx FPGAs on board as well. The three new cards are:



PCI516—PCIe FPGA Carrier for FMC, Virtex-7 690T FPGA

PCI592—PCIe FPGA Carrier for FMC, Kintex UltraScale KU115 FPGA

PCI595—PCIe FPGA Carrier for FMC, Virtex UltraScale VU440 FPGA



Today’s press release says that these cards are “ideal for bringing COTS PCIe systems up to date with the latest FPGAs,” and they are. They’re also good for ASIC prototyping/emulation and for building 100G networking gear; the series of three PCIe carrier cards gives you a family of products to use as a broad foundation for a range of end products.




Vadatech PCI595 FPGA Carrier for FMC.jpg


VadaTech PCI595-- PCIe FPGA Carrier for FMC, Virtex UltraScale VU440 FPGA

Adam Taylor thinks there are ten FPGA design techniques that you should know. Do you?

by Xilinx Employee ‎07-14-2016 01:35 PM - edited ‎07-14-2016 06:51 PM (6,307 Views)


Adam Taylor, author of the long-running MicroZed Chronicles and all-round engineer’s engineer, thinks there are ten FPGA design techniques that every design engineer should know. He’s published the list in an article on the EETimes Web site. These ten design techniques represent the fundamentals of digital system design and it would be well worth your time to learn them.


The ten design techniques are:


  1. State machine design: FSMs (finite state machines) are the bedrock of digital design. They’re deterministic, so they often serve as the base for safety-critical designs but they’re used everywhere for simpler sequential control applications where a microprocessor is overkill. Taylor has published a longer treatment of FSM design in an article titled “How to Implement State Machines in Your FPGA.” (This article has nearly 30,000 views to date.) Note that Xilinx offers you an intermediate choice between FSMs and microprocessors called the PicoBlaze microcontroller, a seriously tiny piece of IP that predictably executes one machine instruction every two clock cycles so it’s ideal for implementing state machines using assembly code instead of describing the FSM in HDL. (See “Hidden Gems: The Xilinx PicoBlaze Microcontroller—a tiny RISC processor for FPGAs.”)
  2. Basic FPGA math: FPGAs don’t have ready-built ALUs like microprocessors but they do contain all the bits and pieces you need to create any type of ALU including plenty of fast, killer DSP slices that give you a boatload of parallelism. Many digital systems must perform some sort of math so you need to know how to build math functions from these bits and pieces if you’re going to use an FPGA for your design. That means you need to understand fixed-point representation and how to implement it with the digital building blocks available in the FPGA. Taylor has published an article on this topic in EETimes titled “The basics of FPGA mathematics.” Or, you can cheat by selecting a device that also incorporates a microprocessor or two (or six) like the Xilinx Zynq-7000 SoC and the Xilinx Zynq UltraScale+ MPSoC. The ARM Cortex microprocessor cores incorporated into the devices in the Zynq product families all incorporate 32- or 64-bit ALUs and single- and double-precision FPUs (floating-point units).
  3. FIFO (first-in, first out) buffers: Taylor states that FIFO buffers are “one of the most useful structures within a FPGA.” That sounds like a good reason for you to understand them. You often find FIFO buffers used to buffer between subsystems where the data-producing function and the data-consuming function operate at different instantaneous rates. FIFO buffers with independent read and write clocks are especially good for crossing clock domains when data moves from one subsystem to the next. Xilinx BRAMs (block RAMs) incorporate special dedicated FIFO control circuitry that simplifies the design of FIFO buffers using the Xilinx FIFO Generator built into the Xilinx Vivado Design Suite.
  4. The CORDIC Algorithm: If you need to implement trigonometric functions on an FPGA, you will quickly learn about CORDIC (COordinate Rotation DIgital Computer) algorithms. Jack Volder at Convair developed the CORDIC algorithm in 1956 when he was tasked with developing a digital navigation computer for the B-58 Hustler bomber aircraft. (That's the iconic delta-winged bomber portrayed in the truly scary 1964 movie Fail-Safe.) HP used CORDIC algorithms a decade later when it developed trig functions for the HP 9100, its first desktop calculator, and again for the HP 35, the first handheld scientific calculator. (See “The HP 9100 Project: An Exothermic Reaction.”) CORDIC computations use only adds and shifts—no multipliers needed—so they’re fast and require relatively little hardware for implementation. Taylor recommends this tutorial on implementing the CORDIC algorithms on FPGAs, written by Ray Andraka in 1998. Taylor wrote an article about the topic too. You’ll find it on page 50 of Xcell Journal, Issue 79.
  5. Metastability Protection: Metastability is a tough design problem that will not go away until everything runs on the same clock, which is never. It occurs when you violate a register’s setup or hold times, which can cause the register’s output to oscillate. That’s a bad thing. It often happens when you’re dealing with asynchronous clocks or clock skew that you can’t control. Consequently, you need to know how to tame metastability. (It never actually vanishes entirely—welcome to quantum states and chaos theory.) Austin Lesea, the resident metastability expert at Xilinx, writes: “Metastability is difficult to measure, and there is actually some disagreement on HOW it is measured. …we have measured metastability eight ways (at least).” Typically, you use flip-flop synchronizers to reduce the likelihood of metastability. Taylor previously published a tutorial in EETimes on metastability problems titled “Wrapping One's Brain Around Metastability.”
  6. Discrete & Fast Fourier Transforms: Typically, systems deal with signals in the time domain but working with them in the frequency domain often simplifies signal manipulation. Taylor writes: “Depending upon the type of the signal -- repetitive or non-repetitive; discrete or non-discrete—there are a number of methods one can use to convert between time and frequency domains, including Fourier series, Fourier transforms, or Z transforms. Within electronic signal processing in general and FPGA applications in particular, the engineer is most often interested in one type of transform, the Discrete Fourier Transform (DFT), and its most popular variant, the Fast Fourier Transform (FFT). Of course, in order to eventually return to the time domain, we will also need to use the inverse DFT or FFT.” You’ll find Taylor’s article “Getting to Grips with the Frequency Domain” on page 48 of Xcell Journal, Issue 91.
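As an illustration of what the DFT actually computes, here is the direct O(N²) form in a few lines of Python; an FFT produces identical results in O(N log N):

```python
import cmath
import math

# Direct DFT sketch (O(N^2)); an FFT computes the same result faster.

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N))
            for k in range(N)]

# A pure tone at bin 2 of an 8-point transform concentrates its energy
# there (and in the mirror bin N-2, its negative-frequency pair).
N = 8
signal = [math.cos(2 * math.pi * 2 * n / N) for n in range(N)]
spectrum = dft(signal)
print([round(abs(X), 3) for X in spectrum])  # peaks of 4.0 at bins 2 and 6
```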
  7. Polynomial Approximation: Polynomial expressions are a faster, easier way of calculating complex mathematical functions with sufficient accuracy for engineering purposes. You’ll find Taylor’s article “Calculating Mathematically Complex Functions” on page 44 of Xcell Journal, Issue 87 where he writes: “One of the great benefits of FPGAs is that you can use their embedded DSP blocks to tackle the knottiest mathematical transfer functions. Polynomial approximation is one good way to do it.”
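As a hypothetical example of the technique, here is a degree-5 polynomial for sin(x) evaluated with Horner's rule, so each step is a single multiply-accumulate, a natural fit for a chain of FPGA DSP slices:

```python
import math

# Polynomial-approximation sketch: sin(x) ~ x - x^3/6 + x^5/120,
# valid to good engineering accuracy on a narrow range such as
# [-pi/4, pi/4]. Written in Horner form over x^2.

COEFFS = [1.0 / 120.0, -1.0 / 6.0, 1.0]  # highest power of x^2 first

def sin_approx(x):
    x2 = x * x
    acc = 0.0
    for c in COEFFS:
        acc = acc * x2 + c   # one multiply-accumulate per coefficient
    return acc * x

x = math.pi / 6
print(sin_approx(x), math.sin(x))  # both ~0.5
```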
  8. Infinite Impulse Response Filters: You can implement Butterworth, Bessel, Chebyshev (Types I and II), and elliptic filters using the infinite impulse response (IIR) function. IIR filters are one of the most efficient digital filter implementations and they can provide the stopband, passband ripple, and roll-off characteristics you require. There’s an older Xilinx White Paper on the topic, written in 2009 by Michael Francis, titled “Infinite Impulse Response Filter Structures in Xilinx FPGAs.”
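Behaviorally, an IIR section is just a feedback difference equation. Here is a sketch of a direct-form-I biquad with illustrative (not designed) coefficients:

```python
# Direct-form-I IIR biquad sketch; the coefficients below are
# placeholders for illustration, not a designed filter.

def biquad(x, b, a):
    """Filter sequence x with b = [b0, b1, b2], a = [1, a1, a2]."""
    y = []
    x1 = x2 = y1 = y2 = 0.0   # delay registers
    for xn in x:
        yn = (b[0] * xn + b[1] * x1 + b[2] * x2
              - a[1] * y1 - a[2] * y2)   # feedback terms make it IIR
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

# Impulse response of a simple smoother: the feedback term a1
# gives an exponentially decaying (infinite) tail.
out = biquad([1.0, 0.0, 0.0, 0.0], b=[0.5, 0.0, 0.0], a=[1.0, -0.5, 0.0])
print(out)  # [0.5, 0.25, 0.125, 0.0625]
```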
  9. Finite Impulse Response Filters: Taylor writes: “Finite Impulse Response (FIR) filters are used when we want to guarantee a stable filter in which the phase of the filter will remain constant. This comes at the cost of an increased size (order) to obtain the required roll off and attenuation when compared to an IIR filter.” There’s a FIR filter compiler in the Vivado Design Suite.
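For comparison, here is a behavioral sketch of a direct-form FIR filter; with no feedback path, stability is guaranteed, and in hardware each tap maps onto one multiply-accumulate in a DSP slice:

```python
# Direct-form FIR sketch: each output is a dot product of the most
# recent samples with the tap coefficients.

def fir(x, taps):
    n_taps = len(taps)
    delay = [0.0] * n_taps   # models the shift register of samples
    y = []
    for xn in x:
        delay = [xn] + delay[:-1]
        y.append(sum(t * d for t, d in zip(taps, delay)))
    return y

# A 4-tap moving average: symmetric taps give linear phase.
out = fir([4.0, 8.0, 4.0, 8.0, 4.0], taps=[0.25] * 4)
print(out)  # [1.0, 3.0, 4.0, 6.0, 6.0]
```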
  10. Image Processing Filters: FPGAs are just really good for implementing fast, real-time image-processing pipelines that perform tasks such as color correction, interpolation, deinterlacing, and scaling. A wide variety of FPGA-based video- and image-processing cores including a Real-time Video Engine are available from Xilinx.
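The core operation in many of these pipeline stages is a small 2D convolution. Here is an illustrative pure-Python 3x3 kernel pass; an FPGA pipeline would stream the same arithmetic through line buffers at one pixel per clock:

```python
# Illustrative 3x3 convolution over a grayscale image, the kernel
# operation behind blur, sharpen, and edge-detection stages.

def conv3x3(img, kernel):
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):          # skip the 1-pixel border
        for c in range(1, w - 1):
            acc = 0.0
            for kr in range(3):
                for kc in range(3):
                    acc += kernel[kr][kc] * img[r + kr - 1][c + kc - 1]
            out[r][c] = acc
    return out

# A box-blur kernel averages each pixel with its 8 neighbors.
BOX = [[1 / 9] * 3 for _ in range(3)]
img = [[9, 9, 9], [9, 0, 9], [9, 9, 9]]
print(round(conv3x3(img, BOX)[1][1], 6))  # 8.0
```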


That’s Adam Taylor’s top 10. Master ‘em all!


The MV1-D2048x1088-HS03-96-G2 GigE hyperspectral camera from Photonfocus images 2048x1088-pixel video at 42fps in 16 spectral bands from 470nm to 630nm (the blue through orange part of the visible spectrum) using an IMEC snapshot mosaic CMV2K-SM4X4-470-630-VIS CMOS hyperspectral image sensor. The camera images each spectral band at 512x256 pixels. This new Photonfocus GigE hyperspectral video camera complements the Photonfocus MV1-D2048x1088-HS02-96-G2 GigE hyperspectral video camera introduced last year, which images the 600nm-to-975nm range (red through near-infrared) in 25 spectral bands. (See “Hyperspectral GigE video cameras from Photonfocus see the unseen @ 42fps for diverse imaging applications.”)





Photonfocus MV1-D2048x1088-HS03-96-G2 GigE hyperspectral video camera and IMEC image sensor




Here’s a spectral sensitivity plot for this new hyperspectral camera:






Photonfocus MV1-D2048x1088-HS03-96-G2 GigE hyperspectral video camera spectral sensitivity plot



Not coincidentally, the new Photonfocus hyperspectral camera is based on a Xilinx Spartan-6 FPGA vision platform—the same vision platform the company developed and used in last year’s MV1-D2048x1088-HS02-96-G2 GigE hyperspectral video camera. The FPGA-based platform gives the company maximum ability to adapt to new image sensor types as they appear. The Spartan-6 FPGA allows the engineers at Photonfocus to more easily adapt to radically different sensor types with different pin-level interfaces, bit-level interface protocols, and overall image-processing requirements.


Here, a 16-band hyperspectral image sensor and a 25-band hyperspectral image sensor clearly require significantly different image processing. Using an FPGA-based platform like the Spartan-6-based one Photonfocus employs allows you to turn out new products quickly and to capitalize on developments such as the introduction of new image sensors. This latest camera in the growing Photonfocus line is proof.


Contact Photonfocus for information about these GigE hyperspectral cameras.




According to an EETimes article titled “Virtual Nets Pave Road to Acceleration” written by Cavium’s Nabil Damouny, the ETSI (European Telecommunications Standards Institute) NFV work group has published several NFV (network functions virtualization) guides over the past six months relating to networking data-plane acceleration for VNFs (virtual network functions) requiring high performance and/or low latency. These documents address use cases, VNF interfaces, virtual switch benchmarks, and virtual network management. (Damouny is an ETSI NFV work group rapporteur, meaning he reports on the group’s activities.)


The documents include:














The ETSI Centre for Testing and Interoperability is organizing its first NFV Plugfest, currently scheduled for January 23 to February 3, 2017. It will be held in Leganes near Madrid, Spain. There’s also an ETSI NFV Webinar titled “ETSI NFV Interfaces and Architecture Overview,” scheduled for September 8, 2016.


Note: Xilinx Ireland is a member of the ETSI NFV work group.



New Video: Vision-based industrial robotics demo runs on Zynq UltraScale+ MPSoC, uses the WHOLE chip

by Xilinx Employee ‎07-13-2016 11:44 AM - edited ‎07-13-2016 03:57 PM (2,282 Views)


OK, it’s a solitaire-playing robot based on a delta-configured 3D-printer chassis, but this demo of the Zynq UltraScale+ MPSoC thoroughly uses the device’s full capabilities to perform all of the following tasks on one chip:


  • Video and image processing
  • Object detection and recognition
  • Algorithm-based decision making
  • Motion path selection
  • Motor-drive control
  • Safety-critical event detection and safe shutdown
  • GUI for user interaction, status, and control
  • System configuration and security management


The demo distributes these tasks across all of the computing and processing elements in the Zynq UltraScale+ MPSoC including:


  • Four ARM Cortex-A53 application processors
  • Two ARM Cortex-R5 real-time processors operating in lockstep for safety-critical tasks
  • ARM Mali-400 GPU
  • Xilinx UltraScale+ programmable logic fabric


The Zynq UltraScale+ MPSoC’s programmable logic fabric handles input from the demo system’s camera, which requires high-speed processing. Letting the on-chip FPGA fabric handle the direct video processing is far more practical than using software-driven microprocessors in terms of power consumption and performance. Meanwhile, the on-chip ARM Cortex-A53 processors are ideal for image recognition and decision making based on the recognized objects.


The dual-core ARM Cortex-R5 real-time processors handle motor control. They operate in lockstep because this is a safety-critical operation. Any time there’s metal in motion that could cause accidental injury, you have a safety-critical task on your hands.


Finally, the Mali-400 handles the GUI’s graphics and the video inset overlay.


Quite a lot packed into that one chip. Maybe this system looks a lot like one you need to design.


Here’s the 4-minute video:





For more information on the Zynq UltraScale+ MPSoC and additional technical details about this demo, see Glenn Steiner's blog post "Heterogeneous Multiprocessing: What Is It and Why Do You Need It?"





The MANGO Project, funded from the European Union’s Horizon 2020 research and innovation program, has developed a research vehicle for exploring power, performance, and predictability for new manycore architectures with an emphasis on extreme resource efficiency through power-efficient, heterogeneous HPC (high-performance computing) architectures. This hardware research vehicle, the MANGO architecture, serves as both a computing platform and an emulation platform for new HPC architectures.


The current plan is to demonstrate three real-time applications running on the MANGO platform: transcoding, medical imaging, and security-related applications. In addition, researchers will use the MANGO platform for researching QoS strategies within a heterogeneous HPC context and hardware/software co-design for power and thermal management.


The MANGO platform combines general-purpose compute nodes (GNs) with heterogeneous acceleration nodes (HNs). MANGO GNs are standard blade servers that combine high-end CPUs and GPUs. MANGO HNs are based on reprogrammable hardware using Xilinx All Programmable devices, implemented using modular proFPGA quad motherboards and daughter cards from PRO DESIGN Electronic GmbH. Each proFPGA daughter card currently carries a Xilinx Virtex-7 2000T FPGA or a Xilinx Zynq-7000 SoC.


MANGO HNs implement tiles, defined as a basic computing unit with the needed communication support. The figure below shows the overall MANGO architecture with GNs on the left and HNs on the right.






MANGO Architecture Phase I



Tiles are replicated many times within the system, making them appear to be homogeneous components. Internally, however, each tile can be customized to meet different computing needs and capabilities, resulting in a heterogeneous computing configuration as shown below:






MANGO Architecture Tile Detail



The figure shows a daughter card carrying a Xilinx Virtex-7 2000T FPGA implementing four HN tiles and a daughter card carrying a Zynq-7000 SoC implementing two HN tiles.


System architecture is developed using PRO DESIGN’s proFPGA Builder software, which automatically detects the physical layout of motherboards and daughter cards and generates a code framework for multi-FPGA HDL designs, including scripts for simulation, synthesis, and running the design. The Xilinx Vivado Design Suite then performs synthesis, placement, and routing for the Xilinx FPGAs in the system.


For more information about the MANGO Project, download this paper: “The MANGO FET-HPC Project: An Overview.”



Today, National Instruments (NI) launched its 2nd-generation PXIe-5840 Vector Signal Transceiver (VST), which combines a 6.5GHz RF vector signal generator and a 6.5GHz vector signal analyzer in a 2-slot PXIe module. The instrument has 1GHz of instantaneous bandwidth and is designed for use in a wide range of RF test systems including 5G and IoT RF applications, ultra-wideband radar prototyping, and RFIC testing. Like all NI instruments, the PXIe-5840 VST is programmable with the company’s LabVIEW system-design environment and that programmability reaches all the way down to the VST’s embedded Xilinx Virtex-7 690T FPGA. (NI’s 1st-generation VSTs employed Xilinx Virtex-6 FPGAs.)



NI 2nd-generation VST internal detail, showing the embedded Virtex-7 FPGA


National Instruments uses this FPGA programmability to create varied RF test systems such as this 8x8 MIMO RF test system:



NI 8x8 MIMO RF test system based on 2nd-generation VSTs




And this mixed-signal IoT test system:



NI mixed-signal IoT test system based on 2nd-generation VSTs




For additional information on NI’s line of VSTs, see:









S2C has added eight new Prototype Ready interface cards and accessories to its growing library of off-the-shelf hardware and software products compatible with its Prodigy Complete Prototyping Platform. These new modules allow you to prototype SoC designs using a variety of pre-verified interfaces that work out of the box with S2C’s comprehensive family of Prodigy Logic Modules, which includes modules based on Xilinx Virtex UltraScale, Kintex UltraScale, Virtex-7, and Kintex-7 All Programmable devices. Interfaces in the library now include general-purpose ports (GPIO, USB 2.0, USB 3.0, PCIe and PCI, Gigabit Ethernet, GMII and RGMII, and RS-232); high-speed GT-based ports (PCIe Gen2 and Gen3, SFP+, and SATA); media-oriented peripherals (HDMI, DVI, MIPI, and VGA); and expansion ports (FMC and DDR2/3/4).


For more information about S2C’s Prodigy Logic Modules, see:





And don’t forget to get a copy of S2C’s new book on FPGA Prototyping, “PROTOTYPICAL: The Emergence of FPGA-based Prototyping for SoC Design.” (See “New S2C Book on FPGA Prototyping: Download it for free immediately before they change their minds!”)







ELMG’s Dlog IP and ControlScope app provide deep insight into FPGA-based power conversion for Zynq SoCs

by Xilinx Employee ‎07-11-2016 11:50 AM - edited ‎07-11-2016 11:52 AM (1,981 Views)


When you’re first developing the algorithms to use programmable logic for controlling a lot of power, things can go south very quickly, so sophisticated diagnostics are pretty handy. ELMG Digital Power, a New Zealand consultancy focused on power control, has developed a sophisticated monitoring capability in the form of a data-collection IP block called Dlog that you can use to collect data from within a Xilinx Zynq-7000 SoC, and a companion application called ControlScope that allows you to visualize significant events occurring in your power-control design. There’s a new blog posted on ELMG’s Web site that describes these two products.


Combined, Dlog and ControlScope allow you to collect and analyze large amounts of data so that you can detect power-control problems such as:


  • single sample errors
  • clipping
  • overflow
  • underflow or precision loss and
  • bursty instability due to precision loss


Data can be logged to on-board Flash memory cards at a high data rate. It can also be transferred over Ethernet to a PC at 25Mbytes/sec.


This is probably a good time to remind you of tomorrow’s free ELMG Webinar on Zynq-specific power control presented by Dr. Tim King, ELMG Digital Power’s Principal FPGA Engineer. Key digital power control questions to be answered in this Webinar include:



  • What is important in digital power, including numeric precision and latency?
  • How do you design a compensator in the digital domain?
  • Why would you use an FPGA for digital power, and why the Zynq-7000 SoC in particular?
  • What are the key issues for digital controllers based on programmable logic including the serial-parallel trade-off, choosing between fixed- and floating-point math, determining the needed precision, and selecting sample rates?
  • What building blocks are available for digital control including ELMG’s licensable IP cores?
  • How can you use the Zynq SoC’s dual-core, 32-bit ARM Cortex-A9 MPCore processor to full advantage for power control?


  • The Webinar will also discuss IIR digital filter design in a case study, along with understanding the delta operator


There are a limited number of spots for this Webinar and I received an email today from ELMG that said only 150 spots remained.


Register here.




By Adam Taylor


Having completed the “hello world” program on our Zynq-based Snickerdoodle system last week (see “Adam Taylor’s MicroZed Chronicles Part 137: Getting the Snickerdoodle to say “hello world” and wireless transfer”), we’ll now look more deeply into how we can exploit the capabilities of the Zynq-7000 SoC’s PL (programmable logic) side with the Linux OS.


We have looked at Linux previously, including Xilinx’s PetaLinux. However, looking back I see that there are a few areas that I want to cover in more depth. So before showing you how to build a Zynq SoC PL configuration and an appropriate Linux OS for the Snickerdoodle, I am going to quickly review what we need to do for the general case.


We first look at what we need to do to create a Zynq design that exploits the PL’s hardware capabilities while running a Linux OS. To run this we need the following:


  • First-stage bootloader - Generated by Xilinx SDK.
  • Bit file – Generated by Vivado.
  • Second-stage bootloader – U-Boot, which loads the kernel image and the root file system.
  • Root file system – Here we have two options: a ramdisk image or a file system on a separate partition on the boot medium.
  • Kernel image – Can be prebuilt or regenerated from source.
  • Device tree blob - Identifies the hardware configuration to the kernel.


The procedures for generating the PL bit file and the first-stage bootloader are the same regardless of which operating system we wish to use. However, the remaining tasks will be new if we have not developed for Linux before.


Rather helpfully, Xilinx provides everything we need prebuilt (kernel image, U-Boot, ramdisk, etc.) with each new version of Vivado. We can obtain prebuilt versions of these files from the Xilinx Linux Wiki for the ZedBoard, ZC702, and ZC706 dev boards. Using these prebuilt kernel, ramdisk, and U-Boot files with an updated device tree that represents the PL design in the bit file is a good way to get our system up and running quickly. To use this approach, we need to check that the prebuilt kernel contains the drivers for our PL design. We can find the list of drivers here.


To make the Linux OS familiar with the hardware, we use the device tree blob, which details the memory, interrupts, locations, etc. of the hardware connected to the processor. When we develop a bare-metal application, we need to generate a Board Support Package (BSP) that contains details of the drivers required and address locations. The device tree blob does something similar to the BSP but does it for Linux. We can generate the raw device tree source (DTS) file within either Microsoft Windows or Linux. However, we can only compile the DTS into the DTB using a Linux installation.


We need to download the device tree plug-in for the Xilinx SDK from the Xilinx GitHub to generate the DTS file. We can also get all of the other files needed to build the kernel, U-Boot, etc. from the same GitHub repository.


Once we have downloaded the device tree plug-in onto our computer, we need to add it as a repository so that the SDK can generate the DTS from our hardware platform specification. In my system I added this as a global repository. With the plug-in installed, we will see a new device_tree option under file type when we select File -> New -> Board Support Package. For Linux applications, we wish to generate a device tree file.


The example hardware platform specification for this demo is simple and connects the PS to the ZedBoard’s LEDs via an 8-bit AXI_GPIO module.






This process takes your hardware platform from Vivado (importing it as we previously have done for bare-metal builds) and creates the device tree source. As Vivado generates this file, it will ask you to define any boot arguments. For the pre-built Zedboard, these arguments are:


console=ttyPS0,115200 root=/dev/ram rw earlyprintk






This process generates several files:


  • system.dts – Contains the boot arguments and the main system definitions.
  • pl.dtsi – Called up by system.dts; contains the definitions for the PL hardware’s memory-mapped devices.
  • zynq-7000.dtsi – Called up by system.dts; contains the definitions for the wider PS system.


We then use the device tree compiler to convert these files into a compiled device tree blob that we can use in our system. We must use a Linux machine to do this.


The first thing to do within our Linux environment is to download the device tree compiler. If you do not already have it, use the command:


sudo apt-get install device-tree-compiler



Once this is installed, we can compile the device tree source using the command:



dtc -I dts -O dtb -o <path>/devicetree.dtb <path>/system.dts



The DTB (device tree blob) is data rather than executable code, so it does not require cross-compilation for the ARM architecture.



With the device tree compiled, we can then create a boot image (boot.bin) using a first-stage bootloader based on the hardware platform and the prebuilt uboot.elf.






We can then put the boot.bin, devicetree.dtb, ramdisk, and kernel image files on an SD Card, insert it into the ZedBoard, and the Linux OS should boot successfully. For this example, the PL design has an AXI_GPIO module connected to the LEDs on the ZedBoard. If all is working properly, we will be able to toggle the LEDs on and off.


There are also other ways to check for successful mapping to the PL hardware. For example, we can connect to the ZedBoard using WinSCP and explore the file system of the Linux OS running on the ZedBoard. To see that our PL device has been correctly picked up, we can navigate to the directory /sys/firmware/devicetree/base/amba_pl/ where we’ll see the GPIO module and the address range that Vivado assigned to it.


If we wish to test the functionality of the GPIO driving the LEDs using SSH, we can access the ZedBoard and issue commands to control the LED status. We find these within the /sys/class/gpio/ directory, where there are two exported gpiochips. The LEDs are connected to the first one, which ranges from GPIO 898 to 905. These I/O addresses correspond to the eight LEDs. We can work out the GPIO size by looking in the gpiochipXXX directories and examining the ngpio file.
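The same sysfs writes can be scripted. Here is a hedged Python sketch of a helper that mirrors those shell commands; the GPIO base of 898 matches this ZedBoard example but may differ on other boards, and the helper itself is illustrative, not part of any Xilinx tool:

```python
import os

# Hypothetical helper mirroring the sysfs GPIO sequence:
#   echo N > export; echo out > direction; echo 1 > value

GPIO_ROOT = "/sys/class/gpio"

def set_led(gpio_num, value, root=GPIO_ROOT):
    gpio_dir = os.path.join(root, "gpio%d" % gpio_num)
    if not os.path.isdir(gpio_dir):
        # Ask the kernel to expose the pin if it isn't exported yet
        with open(os.path.join(root, "export"), "w") as f:
            f.write(str(gpio_num))
    with open(os.path.join(gpio_dir, "direction"), "w") as f:
        f.write("out")
    with open(os.path.join(gpio_dir, "value"), "w") as f:
        f.write("1" if value else "0")

# e.g. set_led(898, True) would light the first LED on the ZedBoard
```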


We can quickly test the GPIO by turning on the LEDs using the commands in the screen shot below:






The code is available on GitHub as always.


If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.




  • First Year e-book here
  • First Year hardback here







  • Second Year e-book here
  • Second Year hardback here






There’s a new memory in the system-design hierarchy and it’s called UltraRAM. For as long as we’ve had digital electronic computing machines—since 1945—we’ve used memory hierarchy to handle varied storage requirements. At the small, fast end are registers. At the other end, where there are massive storage needs, there’s disk and tape. Focusing exclusively on semiconductor memory, DRAM and Flash EPROM occupy the big, slow end of today's memory hierarchy. System designers strongly want to keep data on chip as long as possible because of the power and delay penalties associated with going off chip. UltraRAM, a new and larger on-chip SRAM technology incorporated into all Xilinx UltraScale+ device families (Virtex UltraScale+ FPGAs, Kintex UltraScale+ FPGAs, and Zynq UltraScale+ MPSoCs), adds a new and very useful level in the memory hierarchy for systems based on All Programmable FPGAs and SoCs.



UltraRAM in the memory hierarchy



You will want to consider using UltraRAMs for your next design. Here’s why:


As discussed in previous Xcell Daily blog posts, UltraRAM blocks are dual-ported, 288Kbit synchronous SRAMs that blow away old on-chip memory limits. You can easily stitch together several UltraRAM blocks to create larger, multi-Mbit, on-chip memories without using additional programmable logic.


UltraRAMs do not replace the bulk storage of off-chip SDRAMs or the non-volatile storage of Flash EPROM. Instead, they postpone the need to go off chip with your data. You want to do that for at least three reasons:


  1. On-chip SRAM is faster than off-chip SDRAM (duh!)
  2. Timing closure is much, much easier with on-chip memory
  3. A simple SRAM interface means no need for a memory controller to handle complex memory interfacing protocols


Instead of replacing SDRAM and Flash memory, you can use the UltraRAMs’ speed and capacity to replace intermediate-capacity storage currently served by off-chip SRAMs, RL-DRAMs, and CAMs in applications including:


  • Shallow buffers
  • Memory tables
  • Intermediate caches



Absorbing those external memories into an FPGA or MPSoC can reduce your BOM cost in addition to boosting performance, cutting power consumption, and decreasing the amount of PCB real estate consumed by external memory chips.


You might think that UltraRAMs eliminate the need for BRAMs, which FPGAs have incorporated for more than a decade. Not so. BRAMs have at least three advantages over UltraRAM:


  1. Configurable port width—for more efficient memory use
  2. Built-in FIFO circuitry—less effort and lower on-chip resource use if you’re building FIFOs
  3. True dual porting with separate clocks per port—oh so useful for crossing clock boundaries


For these reasons, all UltraScale+ devices incorporate BRAMs in addition to any on-chip UltraRAMs.


There’s an excellent, new 20-minute video that gives you the technical details you need to help you judge the value of UltraRAM to your system-level design. Here it is:







And don’t forget the UltraRAM White Paper, available here.


For previous Xcell Daily UltraRAM coverage, see


VME board design discovers that FPGAs are indeed the Fountain of Youth

by Xilinx Employee ‎07-08-2016 10:51 AM - edited ‎07-08-2016 11:44 AM (2,403 Views)


I designed workstation boards at Cadnetix based on VERSAmodule bus protocols back in 1981 and I used PLDs to do so. That was three years before Xilinx was founded and four years before the birth of the FPGA, so I used bipolar PALs (mostly 16L8s) for my designs. VME (VERSAmodule Eurocard) board designers later used FPGAs. A new article published this week on Electronic Design’s Web site titled “VME Interfaces Return to FPGAs” discusses the full-circle history of VMEbus design from FPGAs to ASSPs and now back to FPGAs. Author Michael Slonosky, Senior Product Marketing Manager for Power Architecture SBCs at Curtiss-Wright Defense Solutions, writes:


“Originally handled with discrete components, the VME interface moved to FPGAs to reduce board real-estate usage and increase performance and features. FPGA implementations then migrated into integrated devices, such as Tundra/IDT’s Universe, Universe II, and more recently, the Tempe device. Now, following the announcement in 2014 that the popular Tempe Tsi148 VMEbus bridge chip was being end-of-life’d (EOL), many leading COTS vendors have returned to the FPGA approach.”


The author then cites Curtiss-Wright’s Helix, an FPGA-based replacement for the Tempe Tsi148 VMEbus bridge chip with superset features. In addition to allowing Curtiss-Wright to add features, basing Helix on an FPGA essentially future-proofs the interface design. Curtiss-Wright is a major VMEbus supplier so the company is necessarily committed to making sure that there’s a future roadmap for its existing board designs and for new complementary boards yet to be designed.


And which FPGA did Curtiss-Wright use to implement the Helix design? A Xilinx Artix-7 A50T, which the company prominently highlights in this product photo on its VME Web page:





Curtiss-Wright Helix VMEbus Interface Chip based on Artix-7 FPGA



This example underscores the FPGA’s ability to act like a Fountain of Youth for still-viable system designs facing oblivion simply because of component obsolescence. FPGAs like the Artix-7 device used by Curtiss-Wright to implement Helix extend a product’s design life. Unlike rather specialized ASSPs like a VMEbus interface chip, FPGAs tend to have extremely long life cycles because their universal nature allows them to be designed into a very wide range of long-lived systems.


Proof? Here’s what Curtiss-Wright’s Web page says:


“Going forward, Helix will be implemented in our flagship VME boards. Several current products (Power Architecture VME-194 and Intel VME-1908) will be updated to use Helix, thereby significantly extending their lifecycle.”


The VMEbus dates back to 1981 and the legendary 16/32-bit Motorola 68000 microprocessor, yet it’s still used today for advanced military and telecom systems. The “VM” in VME stands for VERSAmodule, which was Motorola’s first card-level bus for the 68000 processor, and the “E” stands for Eurocard, a European-standard PC-board format that used vastly superior 2-piece DIN connectors in place of the VERSAmodule’s card-edge connectors. VMEbus cards became extremely popular, especially for military and telecom applications, because they were rugged and they provided a reasonable amount of board space for electronics. Instead of fading away like many early microprocessor bus standards, the VMEbus has tracked electronic advances with speed and capability extensions standardized under VITA (the VMEbus International Trade Association), which became an official standardization body way, way back in 1993. Now that VME design has reverted to using FPGAs as interface chips, perhaps the ever-evolving VME standard will see another 35 years of longevity.




Existing system architectures aren't keeping up with the performance demands of data-intensive applications. What can you do? Accelerate! But coupling a powerful server processor with hardware acceleration technology isn't easy. That’s why the OpenPOWER Foundation is sponsoring the Developer Challenge, split into “courses.” The two courses that apply to FPGA acceleration are:


  1. The Open Road Test
    • Port your application to OpenPOWER
    • Performance optimize & race against your first compile
    • Go faster with accelerators
  2. The Cognitive Cup: Multiple Deep Learning challenges to choose from:
    • ArtNet: Recognize artwork with Deep Learning
    • TuneNet: Guide Programmer Optimization Using Deep Learning
    • YourNet: Define and solve your own Deep Learning problem


The grand prize for each course is an all-expenses-paid trip to Supercomputing 2016 for one team member and an Apple Watch Sport for each of as many as five team members. Second prize for each course is an Apple iPad Air 16GB for as many as five team members. Third prize for each course is an Apple Watch Sport for as many as five team members.


You’ll be working in the OpenPOWER Foundation’s SuperVessel development environment, which now includes the Xilinx SDAccel Development Environment, which delivers a GPU-like and CPU-like programming experience for data center workload acceleration. (See “Google and Rackspace to develop servers based on IBM’s POWER architecture while IBM and Xilinx bring SDAccel to SuperVessel.”) If you would like to develop your Open Road entry using SDAccel, sign up for the Developer Challenge and then request access to SDAccel from Xilinx.


Slightly confused? You can watch the recorded Orientation Hangout here to meet the experts and get a tour through useful resources and demos to help you get going on your challenge submission. There’s also a Hangout on using AccDNN for Deep Learning on FPGAs on Thursday 6/7 at 9PM EST. (That’s today!)




With the Zynq UltraScale+ MPSoC, there’s never an excuse for running shy of processing power. If as many as four 64-bit ARM Cortex-A53 application processors, two ARM Cortex-R5 real-time processors, a hardened and embedded ARM Mali-400 GPU, and an embedded H.265/H.264 video codec don’t get you there, then you’ll find big chunks of 16nm FPGA programmable logic fabric on these devices to build whatever hardware you need. Accelerators built with programmable logic can boost performance by as much as 100x while consuming significantly less power than the same function implemented in software.


Xilinx, along with EEJournal.com, has just posted a new 14-minute Chalk Talk video covering these topics. If you watch closely, you’ll also see a futures roadmap fly by, hinting at a 7nm Zynq family that would make the Zynq architecture a 3-node family with lots of options for your next design, and your next design, and your next.


Here’s the video:






Here are two additional Zynq UltraScale+ MPSoC infobits to consider:



  • If you’re even slightly interested in using the Zynq UltraScale+ MPSoC’s broad array of heterogeneous processors, then a really good place to start would be Mentor Embedded’s upcoming July 13 Webinar “Developing Multiple OSes on a Xilinx Zynq UltraScale+ MPSoC” because successfully running multiple OSes on an MPSoC that combines multiple asymmetric microprocessor cores (ARM Cortex-A53s and Cortex-R5s in the case of the Zynq UltraScale+ MPSoC), graphics processing units, DSPs, FPGA-based offload engines, and programmable I/O is no simple feat. You have less than a week to reserve a slot. Register here.


  • According to the Zynq UltraScale+ MPSoC product selection guide, more than a dozen of the Zynq UltraScale+ MPSoC family members incorporate Xilinx’s new UltraRAM. What’s that? UltraRAM blocks are large, fast, on-chip, dual-ported synchronous 288Kbit SRAMs with a fixed configuration of 4,096 72-bit words. They’re a lot bigger than the Block RAMs (BRAMs) that you now find in every Xilinx All Programmable device and they’re denser, which allows Xilinx to pack in significantly more storage per mm². Zynq UltraScale+ MPSoCs have as much as 36Mbits of on-chip UltraRAM (and as much as 34.6Mbits of BRAM). There’s a new Xilinx White Paper titled “UltraRAM: Breakthrough Embedded Memory Integration on UltraScale+ Devices” that more fully describes Xilinx UltraRAM and discusses features and use models.
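The UltraRAM geometry quoted above is easy to sanity-check. Here's a quick sketch, using only the figures from the selection guide, that confirms the block size and estimates how many UltraRAM blocks a maxed-out device carries:

```python
# Sanity-check the UltraRAM figures quoted above.
WORDS_PER_BLOCK = 4096   # fixed configuration: 4,096 words per UltraRAM block
BITS_PER_WORD = 72       # 72-bit words

bits_per_block = WORDS_PER_BLOCK * BITS_PER_WORD
print(bits_per_block)            # 294912
print(bits_per_block / 1024)     # 288.0 -> the quoted 288Kbit block size

# Largest quoted Zynq UltraScale+ MPSoC UltraRAM capacity: 36 Mbits
blocks_in_biggest_device = (36 * 1024 * 1024) // bits_per_block
print(blocks_in_biggest_device)  # 128
```

So a 36Mbit device works out to 128 UltraRAM blocks, consistent with the 288Kbit-per-block figure.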






There’s a new Xilinx video online that discusses the power and performance advantages of the new 16nm Kintex UltraScale+ FPGAs versus the extremely successful 28nm Kintex-7 All Programmable devices. Essentially, you get a fortunate choice:



  • You can cut the operating power of a design by 50% while getting a 20% performance boost, with a resulting 2.4x improvement in performance/Watt.


  • You can cut operating power of a design by 20% while getting a 60% performance boost, doubling performance/Watt.


Both are great choices to have and you can make the choice at any time because all you’re doing is selecting the core operating voltage. The Kintex UltraScale+ FPGAs will run with core operating voltages of either 0.85V or 0.72V. Which you pick depends on your power and performance requirements.
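The performance/Watt figures above follow directly from the stated power and performance deltas; the gain is simply relative performance divided by relative power. A quick sketch of the arithmetic:

```python
# Performance/Watt gain = (relative performance) / (relative power),
# where both are expressed relative to the Kintex-7 baseline.
def perf_per_watt_gain(power_cut, perf_boost):
    """power_cut and perf_boost are fractions, e.g. 0.50 = 50%."""
    return (1 + perf_boost) / (1 - power_cut)

# First option: 50% power cut, 20% performance boost
print(round(perf_per_watt_gain(0.50, 0.20), 1))  # 2.4
# Second option: 20% power cut, 60% performance boost
print(round(perf_per_watt_gain(0.20, 0.60), 1))  # 2.0
```

That is, 1.2/0.5 = 2.4x and 1.6/0.8 = 2.0x, matching the two bullets above.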


Here’s the 5-minute video:






HiTech Global’s new HTG-847 Emulation/Prototyping Platform provides ASIC and SoC design teams with a fast way to test their designs. The platform accommodates two or four Xilinx Virtex UltraScale VU440 FPGAs for a maximum capacity of 22.164M system logic cells, 11,500 DSP slices, and 354.4Mbits of fast on-chip SRAM. Each Virtex UltraScale VU440 FPGA communicates with each of the other three Virtex UltraScale VU440 FPGAs on the platform using four 16.3Gbps GTH serial transceivers and 168 single-ended I/O pins as shown in this block diagram:






HiTech Global’s HTG-847 Emulation/Prototyping Platform – Block Diagram



The HTG-847 Emulation/Prototyping Platform features 12 FMC expansion connectors carrying an aggregate of 144 GTH serial transceivers, each capable of bidirectional 16.3Gbps communications, and 1920 single-ended SelectIO pins. The company offers several FMC mezzanine boards to expand the HTG-847 Emulation/Prototyping Platform and FMC cables for connecting each HTG-847 board to two adjacent boards.
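Those transceiver counts translate into substantial aggregate expansion bandwidth. A back-of-the-envelope calculation using only the figures above:

```python
# Aggregate serial bandwidth across the HTG-847's expansion connectors.
GTH_COUNT = 144        # GTH transceivers across the 12 expansion connectors
GTH_RATE_GBPS = 16.3   # line rate per transceiver, per direction

one_way_gbps = round(GTH_COUNT * GTH_RATE_GBPS, 1)
print(one_way_gbps)                 # 2347.2 Gbps (~2.35 Tbps) each way
print(round(one_way_gbps * 2, 1))   # 4694.4 Gbps counting both directions
```

Roughly 2.35Tbps of raw serial bandwidth in each direction, before accounting for line coding or protocol overhead.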


Here’s a photo of the board:






HiTech Global’s HTG-847 Emulation/Prototyping Platform





Mentor Embedded, along with Xilinx, will present a free Webinar on July 13 titled “Developing Multiple OSes on a Xilinx Zynq UltraScale+ MPSoC.” You might want to register for this event if you are using or considering the Zynq UltraScale+ MPSoC in your latest design, because successfully running multiple OSes on an MPSoC that combines multiple asymmetric microprocessor cores (ARM Cortex-A53s and Cortex-R5s in the case of the Zynq UltraScale+ MPSoC), graphics processing units, DSPs, FPGA-based offload engines, and programmable I/O is no simple feat. This Webinar promises to help you gain a deeper understanding of the key issues and challenges encountered when developing and debugging software on complex systems based on Xilinx Zynq UltraScale+ MPSoCs. You will learn how to realize the full potential of the device.


In particular, the Webinar will cover Mentor Embedded’s OpenAMP-compatible Multicore Framework and its extensions. You will learn about various use cases and configurations that allow applications on the various cores in the Zynq UltraScale+ MPSoC to run natively, either supervised or in trusted secure environments, or both. These use cases will allow you to design robust and secure devices that can be certified to ISO 26262, IEC 61508, or DO-178C standards, if needed.


Register here.



About the Author
  • Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.