

For a limited time, Digilent is offering a $100 discount (that’s half off) on its Digital Discovery, a USB 24-channel logic analyzer (800Msamples/sec max acquisition rate @ 8 bits) and 16-channel digital pattern generator. (See “$199.99 Digital Discovery from Digilent implements 800Msample/sec logic analyzer, pattern generator. Powered by Spartan-6” for more information on this interesting little instrument.) To get the discount, you need to order at least $500 worth of items in the FPGA category on Digilent’s Web site.



[Image: Digilent Digital Discovery fall promo]




Digilent offers a truly wide selection of development and trainer boards in this category. Officially listed are boards based on Xilinx Spartan-6, Spartan-7, Artix-7, Kintex-7, and Virtex-7 FPGAs. You’ll also find several boards based on the Xilinx Zynq Z-7000 SoC including the Zybo, Arty Z7, ZedBoard, and the Python-based PYNQ-Z1. If you’re into vintage design, you’ll even find some older Xilinx devices on boards in this list including the CoolRunner-II CPLD and the Spartan-3E FPGA.


Why the deal on a Digilent Digital Discovery board and how does this deal work? Details in this new Digilent post by Kaitlyn Franz titled “Debugging Done Right.”






Got memory transfer-rate problems? Need more memory speed? GSI Technology may have an antidote for your problem in its SigmaQuad-IIIe SRAM, which is capable of transferring 7.2Gbytes/sec in both the read and write directions simultaneously when connected to a Xilinx Kintex UltraScale FPGA over separate, 36-bit read/write buses. That’s with an 800MHz clock rate.
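That 7.2Gbytes/sec figure is easy to sanity-check. A quick back-of-the-envelope sketch, assuming double-data-rate transfers (two 36-bit words per bus clock) on each of the separate read and write buses:

```python
# Back-of-envelope check of the SigmaQuad-IIIe transfer rate quoted above.
# Assumes DDR data transfers (two 36-bit words per clock) on each bus.
clock_hz = 800e6          # 800MHz bus clock
bus_bits = 36             # width of each separate read/write bus
words_per_clock = 2       # double data rate

bytes_per_sec = clock_hz * words_per_clock * bus_bits / 8
print(bytes_per_sec / 1e9)  # Gbytes/sec, per direction
```

Counting all 36 bits of each bus, the numbers work out to 7.2Gbytes/sec in each direction.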


How does GSI know this? The company has developed its own eval board to develop the memory controller IP needed to effect this nosebleed transfer rate, and that IP is available free to GSI’s customers. (Contact GSI directly for more information.)


Here’s a photo of the GSI Eval Board with a SigmaQuad-IIIe SRAM connected to a Xilinx Kintex UltraScale KU040 FPGA:




[Image]


GSI’s SigmaQuad-IIIe SRAM/Kintex UltraScale Eval Board




For more information, see this GSI brochure titled “Leading-Edge Memory Solutions for UltraScale & UltraScale+ FPGAs.” You might note from the brochure that GSI is already experimenting with Xilinx Kintex UltraScale+ FPGAs and its SigmaQuad-IVe SRAMs. Unsurprisingly, that combination goes even faster. More details when they’re available.




Earlier this year, I wrote a blog about a free Doulos Webinar titled “Developing Linux Systems on the Zynq SoC Using Yocto” and more than 13,000 people read that blog, so I’m happy to say that Doulos has updated that Webinar, which is now titled “Developing Linux Systems on Zynq UltraScale+ Using Yocto.” Despite the title, the Webinar appears to cover Linux for both the Zynq SoC and Zynq UltraScale+ MPSoC. Doulos Senior MTS Simon Goda will be the presenter once again.


The webinar will cover:


  • A detailed introduction to creating a bootable Linux system
  • How to customize Linux to fit your project’s specific requirements
  • How Xilinx supports Yocto using the meta-xilinx layer



Once again, the Webinar is free and will be presented at two different times on October 6 to accommodate the widest range of time zones worldwide.


For more information and to register, click here.


By the way, it’s still free.


Sounds like a riddle, doesn’t it? How do you squeeze 1024 Xilinx Kintex UltraScale FPGAs plus 16Tbytes of DDR4 SDRAM into a standard 19-inch data-center rack and why would you do that? I can’t tell you how you would do it but I can tell you how IBM Research did it. They started with the design of the FPGA card, mounting a Kintex UltraScale KU060 FPGA on a PCIe card along with a big chunk of DDR4 SDRAM and a Cypress Semiconductor PSoC with an on-chip ARM Cortex-M3 processor for housekeeping over USB. They also instantiated a 10GBASE-KR 10Gbps backplane Ethernet NIC in the FPGA. (This is definitely an application where you want those bulletproof Xilinx UltraScale MGT SerDes transceivers.)


The card looks like this:




[Image: IBM Research Kintex UltraScale FPGA server card]




Next, IBM Research stuffed 32 of these cards into a half-rack “sled”—a passive carrier board that electronically aggregates the boards with an Intel FM6000 multi-layer switch chip that funnels the 10GbE connections into eight 40GbE optical connections. Then, they bolted two sleds into a 2U, 19-inch rack chassis that connects to the rest of the rack via the 16 40GbE ports on the north side of the Ethernet switches. Install 16 of these chassis into a rack, add 50kW of power and water cooling, and there you have it.
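The rack arithmetic checks out. A quick tally of the numbers above (note that the per-card DRAM figure is my inference from the rack total, not a number from the paper):

```python
# Tallying IBM Research's FPGA rack from the figures above
cards_per_sled = 32
sleds_per_chassis = 2
chassis_per_rack = 16

fpgas_per_rack = cards_per_sled * sleds_per_chassis * chassis_per_rack
ddr4_per_card_gbytes = 16 * 1024 // fpgas_per_rack  # 16Tbytes spread across the rack

print(fpgas_per_rack)        # Kintex UltraScale KU060 FPGAs per rack
print(ddr4_per_card_gbytes)  # implied Gbytes of DDR4 per card
```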


What do you have? Allow me to quote from the conclusion of the IBM paper titled “An FPGA Platform for Hyperscalers,” presented at last month’s IEEE Hot Interconnects conference in Santa Clara:


“…we first compared the network performance of our disaggregated FPGA with that obtained from bare-metal servers, virtual machines, and containers. The results showed that standalone disaggregated FPGAs outperform them in terms of network latency and throughput by a factor of up to 35x and 73x, respectively. We also observed that the Ethernet NIC integrated within the FPGA fabric was consuming less than 10% of the total FPGA resources.”



Note: The open-source TCP/IP stack used in this system was developed by Xilinx and the Systems Group at ETH Zurich and can be found at: http://github.com/fpgasystems/fpga-network-stack. It is written in Vivado HLS and supports thousands of concurrent connections at a rate of 10Gbps.



Guitar pedal aficionados will love this winning entry in the Xilinx University Program’s Open Hardware 2017 competition, Student Category. It’s a senior-year project from Vladi Litmanovich and Adi Mikler of Tel Aviv University, and it’s based on an Avnet ZedBoard Dev Kit. The on-board Zynq Z-7020 SoC accepts audio from an electric guitar, passes it through four effects processors, sums the resulting audio, and ships the audio to a guitar amplifier. The result is a multi-effect processor similar to the stacked pedal processors favored by musicians for the last 50 years to achieve just the right sound for each song.


The on-board Zynq SoC implements the multi-effects processor’s user interface, based on the ZedBoard’s switches and LEDs, and implements four real-time audio effects using the Zynq SoC’s programmable logic:


  • Distortion and Overdrive
  • Octavelo (an Octaver plus Tremolo)
  • Tremolo
  • Delay


Here’s a block diagram of the audio-processing chain:



[Image: Guitar multi-effects processor block diagram]




There’s a complete description of the project with implementation details on GitHub.



Because this is music, a video demo is much better for illustrating these audio effects:










I really enjoy watching EEVBlog’s Dave Jones tear down a piece of equipment and give his analysis of the good and bad of the design under scrutiny. The subject of Jones’ latest teardown is a Rigol DL3021 200W Programmable DC Electronic Load and I watched this video with no expectation that it would offer up an Xcell Daily blog topic. After all, who uses an FPGA in an electronic load? Rigol, that’s who. There’s a Xilinx Spartan-6 LX9 FPGA at the heart of the Rigol DL3021 Electronic Load.


The interest in electronic loads has spiked recently. They’re great for testing power supplies but they’re really seeing more bench and manufacturing floor use as battery-powered products and utility-grade solar panels proliferate. You can find inexpensive, bare-board, bare-bones 60W and 150W electronic loads for as little as $20 or $30 on eBay. These low-end loads are based on a power FET and a fan sink originally intended for cooling gamer PC processors. The next step up gets you a $300, 150W load that actually looks like a bench instrument. Then there are the high-end programmable loads with more precision and multiple load channels that cost thousands of dollars. Most of the low-end loads are based on one or more power FETs controlled by an inexpensive microcontroller. That’s all the electronic control they need for the time, voltage, and current resolution they provide.


The Rigol DL3021 that’s the subject of Dave Jones’ latest teardown is a $499, 200W, 150V, 40A, single-channel electronic load. There’s also a 350W version called the Rigol DL3031 and there are more expensive “A” versions of the Rigol DL3021 and DL3031 electronic loads with better and faster specs, more features, and color screens. This is also something that Jones opines about in his teardown video.



[Image: Rigol DL3000-series electronic load]



So why is there a Xilinx Spartan-6 FPGA deep inside the design of the Rigol DL3021? Only Rigol knows for sure but I think there are visible clues in the design of the instrument. For starters, the instrument is designed with three major electronics subsystems. The main board holds the power FET array that does the electronic load’s heavy lifting. The power FETs are mechanically and thermally attached to a large aluminum box heat sink with a large fan on one end to force cooling air through the box. In the 200W Rigol DL3021 that Dave examines, the power FET array is partially populated. The missing power FETs likely appear in the 350W Rigol DL3031 and DL3031A loads.


The controller on the main power FET board is a Xilinx Spartan-6 LX9 FPGA. That’s a “small” FPGA with “only” 9152 logic cells, 576Kbits of memory, and 16 DSPs. For a “small” device, the Spartan-6 LX9 packs ample processing power to handle the real-time monitoring and control of the load’s power FETs. That’s also true for the faster “A” variants of the Rigol DL3021 and DL3031 electronic loads.



[Image: Rigol DL3000-series electronic load, Spartan-6 FPGA detail]




You’ll also find a processor board plugged into a PCIe slot in the Rigol DL3021 load’s main board. The processor board obviously handles the instrument’s front panel and graphical user interface and the I/O ports on the back of the instrument including LXI Ethernet, USB, and RS-232. The processor board uses a Freescale/NXP i.MX283 Multimedia Applications Processor with an ARM926EJ-S processor running at 454MHz with 16Kbytes of instruction cache and 32Kbytes of data cache. The i.MX283 Applications Processor operates the I/O ports and manages the front panel (the electronic load’s third major electronic subsystem) over the PCIe bus that spans the main board.


Why not use the i.MX283 processor to control everything? For precision measurement and control, you really should not use a cached processor because you can’t depend on getting instrument-grade timing consistency from software running out of cache. An FPGA, by contrast, delivers absolutely dependable, consistent, deterministic timing, and in a precision electronic load, that’s exactly what you want.


Here’s Jones’ 45-minute EEVblog teardown video of the Rigol DL3021 Electronic Load with all of Dave’s insights and his usual, colorful commentary:















This week, EXFO announced and demonstrated its FTBx-88400NGE Power Blazer 400G Ethernet Tester at the ECOC 2017 optical communications conference in Gothenburg, Sweden using a Xilinx VCU140 FPGA design platform as an interoperability target. The VCU140 development platform is based on a Xilinx Virtex UltraScale+ VU9P FPGA. EXFO’s FTBx-88400NGE Power Blazer offers advanced testing for the full suite of new 400G technologies including support for FlexE (Flex Ethernet), 400G Ethernet, and high-speed transceiver validation. The FlexE function supports one or more bonded 100GBASE-R PHYs carrying multiple Ethernet MACs operating at rates of 10, 40, or n x 25Gbps. FlexE is a key data-center technology that helps data centers deliver faster, more flexible links while 400G solutions are still emerging.


Here’s a photo of the ECOC 2017 demo:




[Image: EXFO FTBx-88400NGE Power Blazer demo]




This demonstration is yet one more proof point for the 400GbE standard, which will be used in a variety of high-speed communications applications including data-center interconnect, next-generation switch and router line cards, and high-end OTN transponders.




Farhad Fallahlalehzari, an Aldec Application Engineer, has just published a blog on the Aldec Web site titled “Demystifying AXI Interconnection for Zynq SoC FPGA.” So if it’s a mystery, please click over to that blog post so that the topic will no longer mystify.



Adam Taylor’s MicroZed Chronicles, Part 216: Capturing the HDMI video mode with the ADV7611 HDMI FMC

by Xilinx Employee, 09-18-2017


By Adam Taylor



With the bit file generated, we are now able to create software that configures the ADV7611 Low-Power HDMI Receiver chip and the Zynq SoC’s VTC (Video Timing Controller). If we do this correctly, the VTC will then be able to report the input video mode.


To be able to receive and detect the video mode, the software must perform the following steps:


  • Initialize and configure the Zynq SoC’s I2C controller for master operation at 100kHz
  • Initialize and configure the VTC
  • Configure the ADV7611
  • Sample the VTC once a second, reporting the detected video mode





ZedBoard, FMC HDMI, and the PYNQ dev board connected for testing




Configuring the I2C and VTC is very simple. We have done both several times throughout this series (see these links: I2C, VTC). Configuring the ADV7611 is more involved and is performed over I2C: the device uses eight internal I2C slave addresses to configure its different sub-functions.







To reduce address-contention issues, seven of these addresses are user configurable. Only the IO Map has a fixed default address.


I2C addressing uses 7 bits. However, the ADV7611 documentation specifies 8-bit addresses, which includes a Read/Write bit. If we do not understand the translation between these 7- and 8-bit addresses, we will experience addressing issues because the Read/Write bit is set or cleared depending on the function we call from XIICPS.h.


The picture below shows the conversion from 8-bit to 7-bit format. The simplest method is to shift the 8-bit address one place to the right.
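The conversion itself is a one-liner. Here’s a quick sketch of the arithmetic (shown in Python for brevity; the same shift works in the C driver code), using the ADV7611 IO Map’s default 8-bit address 0x98 from the datasheet as an example:

```python
def to_7bit(addr_8bit):
    """Convert an 8-bit datasheet-style I2C address (R/W bit in bit 0)
    to the 7-bit form that the I2C driver expects."""
    # Drop the R/W bit by shifting one place to the right
    return addr_8bit >> 1

# The ADV7611 IO Map's default 8-bit address is 0x98;
# the driver wants the 7-bit form, 0x4C.
print(hex(to_7bit(0x98)))  # -> 0x4c
```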







We need to create a header file containing the commands to configure each of the eight ADV7611’s sub functions.


This raises the question of where to obtain the information to configure the ADV7611 device. Rather helpfully, the Analog Devices EngineerZone provides several resources including a recommended register-settings guide and several pre-tested scripts that you can download and use to configure the device for most use cases. All we need to do is select the desired use case and incorporate the commands into our header file.


One thing we must be very careful with is that the first command issued to the ADV7611 must be an I2C reset command. You may see a NACK on the I2C bus in response to this command as the reset asserts very quickly. We also need to wait an appropriate period after issuing the reset command before continuing to load commands. In this example, I decided to wait the same time as following a hard reset, which the data sheet specifies as 5msec.


Once 5msec has elapsed following the reset, we can continue loading configuration data, which includes the Extended Display Identification Data (EDID) table. The EDID identifies to the source the capabilities of the display. Without a valid EDID table, the HDMI source will not start transmitting data.


Having properly configured the ADV7611, we may want to read back registers to ensure that it is properly configured or to access the device’s status. To do this successfully, we need to perform what is known as an I2C repeated start in the transaction following the initial I2C write. A repeated start is used when a master issues a write command and then wants to read back the result immediately. Issuing the repeated start prevents another device from interrupting the sequence.


We can configure the I2C controller to issue repeated starts between write and read operations within our software application by using the function call XIicPs_SetOptions(&Iic,XIICPS_REP_START_OPTION). Once we have completed the transaction, we need to clear the repeated-start option using the XIicPs_ClearOptions(&Iic,XIICPS_REP_START_OPTION) function call. Otherwise, we may have communication issues.


Once configured, the ADV7611 starts free running. It will generate HDMI Frames even with no source connected. The VTC will receive these input frames, lock to them and determine the video mode. We can obtain both the timing parameters and video mode by using the VTC API. The video modes that can be detected are:







Initially, in its free-running mode, the ADV7611 outputs video in 640x480 pixel format. Checking the VTC registers, it is also possible to observe that the detector has locked to the incoming sync signals and has detected the mode correctly, as shown in the image below:







With the free-running mode functioning properly, the next step is to stimulate the FMC HDMI with different resolutions to ensure that they are correctly detected.


To test the application, we will use a PYNQ dev board. The PYNQ is ideal for this application because it is easily configured for different HDMI video standards using just a few lines of Python, as shown below. The only downside is that the PYNQ board does not generate fully compliant 1080p video timing.



SVGA video outputting 800 pixels by 600 lines @ 60Hz






720P video outputting 1280 pixels by 720 lines @ 60Hz






SXGA video outputting 1280 pixels by 1024 lines @ 60Hz







Having performed these tests, it is clear the ADV7611 on the FMC HDMI is working as required and is receiving and decoding different HDMI resolutions correctly. At the same time, the VTC is correctly detecting the video mode, enabling us to capture video data on our Zynq SoC or Zynq UltraScale+ MPSoC systems for further processing.


The FMC HDMI has another method of receiving HDMI that equalizes the channel and passes it through to the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL for decoding. I will create an example design based upon that input over the next few weeks.


Note that we can also use this same approach with a MicroBlaze soft processor core instantiated in a Xilinx FPGA.




Code is available on GitHub as always.



If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.




First Year E Book here

First Year Hardback here.



[Image: MicroZed Chronicles, first-year hardcopy]




Second Year E Book here

Second Year Hardback here



[Image: MicroZed Chronicles, second year]





A news story from the University of Oxford dated yesterday says that carbon dating performed by scientists at Oxford’s Bodleian Libraries has pushed back the date of the Bakhshali manuscript by about half a millennium. That’s important because the Bakhshali manuscript, found near Peshawar, India (now Pakistan) in 1881, contains the earliest known use of a piece of technology that’s absolutely essential to the development of programmable logic. That piece of technology is…






Of course, without zero, there is no digital engineering. There is no binary math nor Boolean arithmetic. There’s nothing—but that’s not zero. Zero is so ingrained and pervasive in modern digital technology that it fades into the technological wallpaper in our engineering minds.


It was not always so. Zero as a full-fledged numeral does not appear in the historical record prior to the Bakhshali manuscript, which had been presumed to be contemporary with a 9th-century inscription of a symbolic representation of zero that appears on a temple wall in Gwalior, Madhya Pradesh, India. However, recent carbon dating pushes the creation date of the Bakhshali manuscript back in time to between 200 and 400 AD according to this short 3-minute video featuring Professor Marcus du Sautoy at the University of Oxford:







After I wrote this blog, I found another, more detailed video featuring Professor du Sautoy confirming the premise of this blog: that an important building block in the foundation of all digital engineering, including FPGAs, is represented on the small piece of birch bark used to create the Bakhshali manuscript:






Ever seen an entire DSO fit inside a probe? How about a wireless version? That describes the new €299, rechargeable, 30MHz, 200Msamples/sec IkaScope WS200 Wireless DSO from IkaLogic. Here’s a photo of the IkaScope in action:






The IkaScope DSO communicates over WiFi with a computer or tablet




Here are the IkaScope’s specs:



[Image: IkaScope specifications]


(Spec sheet here.)




Here’s a 90-second video showing the IkaScope in action:





The IkaScope will ship this month and IkaLogic plans to support wireless connection of the IkaScope to a variety of platforms including Windows, Linux, Mac OS X, iOS, and Android. Platform-support apps will roll out over the next three months.


IkaLogic’s Ibrahim Kamal says that the IkaScope’s design is based on a Xilinx Spartan-3 FPGA, which was “quite enough” for the DSO’s 200Msamples/sec acquisition rate. Kamal says that the selected device “just fits the design.”


Except for the analog front end and the WiFi radio communications, the FPGA handles “almost everything” including:


  • Analog control
  • Triggering
  • Signal buffering
  • WiFi packet generation



Kamal also says that the Spartan-3 FPGA’s internal memory offered a good way to save board space and that the device’s programmable I/O features were a great help when interfacing the FPGA to the analog front end’s ADC. In addition, the Spartan-3 FPGA communicates with the WiFi chip using a high-speed SPI interface, obviously built from the FPGA’s programmable logic. The design was created using VHDL.



Please contact IkaLogic directly for more information about the IkaScope.




Earlier this month, the CHIME (Canadian Hydrogen Intensity Mapping Experiment) radio telescope came online and started scanning the universe at an intergalactic scale from the Dominion Radio Astrophysical Observatory in the heart of Canada’s wine country in British Columbia. The radio telescope is helping to generate a 3D map of the hydrogen distributed across the observable universe (the part that can be observed from the northern hemisphere anyway). CHIME has multiple missions including:



  • Map the history of the expansion rate of the Universe by observing hydrogen gas in distant galaxies that were very strongly affected by dark energy.


  • Detect FRBs (fast radio bursts) to act as an early warning system for the wider astrophysical community.


  • Monitor known pulsars in the Northern sky to investigate the properties of neutron stars and ionized gas in the interstellar medium, helping to verify the predictions of general relativity and to aid the search for gravitational waves.



Other than electrons, the CHIME radio telescope has no moving parts. Instead, the telescope consists of four parallel, adjacent cylindrical reflectors, each measuring 20x100m and oriented north-to-south. The telescope scans the heavens as the Earth turns. CHIME’s four reflectors feed 256 focal-point antennas located along each cylindrical axis (for a total of 1024 antennas), and each antenna generates signal feeds from two polarizations for a total of 2048 signal feeds. CHIME’s front-end electronics then sample each signal at 800Msamples/sec, producing 1.6384Tsamples/sec (that’s terasamples per second!) in aggregate, which amounts to a front-end feed of 13Tbps.
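Those front-end numbers multiply out exactly as quoted. A quick check, assuming 8-bit samples (which is what the quoted 13Tbps figure implies):

```python
# Reproducing the CHIME front-end data-rate figures quoted above
antennas = 4 * 256            # four reflectors, 256 focal-point antennas each
feeds = antennas * 2          # two polarizations per antenna
sample_rate = 800e6           # samples/sec per signal feed

total_sps = feeds * sample_rate
print(total_sps / 1e12)       # Tsamples/sec

# The 13Tbps front-end feed follows if each sample is 8 bits (my assumption)
print(total_sps * 8 / 1e12)   # Tbps
```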


CHIME needs some serious front-end processing to handle this torrent of input data and that processing occurs in a pair of FPGA-based “F-Engines,” which are housed in two shielded 20-foot shipping containers located adjacent to the cylindrical reflector array.


Here’s a photo of the CHIME reflector array and the F-engine containers:




[Image]


CHIME Radio Telescope with F-Engine Containers




The F-Engines convert each μsec of raw input data (with 2048 samples/μsec) into a spectral representation spanning 400MHz to 800MHz with a frequency resolution of 0.39MHz. They bin the spectral data and ship the processed signals to a GPU-based “X-Engine” via optical fiber.
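The 0.39MHz figure falls straight out of the frame size. Here’s a sketch of the channelization math, assuming a plain 2048-point FFT per microsecond of data (the real ICE firmware likely uses a polyphase filter bank, but the bin spacing works out the same):

```python
import numpy as np

fs = 800e6                # 800Msamples/sec per signal feed
n = 2048                  # samples in each 1-usec frame

resolution_mhz = fs / n / 1e6
print(resolution_mhz)     # -> 0.390625, the ~0.39MHz resolution quoted above

frame = np.random.randn(n)       # stand-in for one microsecond of raw samples
spectrum = np.fft.rfft(frame)    # 1025 bins spanning one 400MHz Nyquist zone
# CHIME samples the 400-800MHz band in the second Nyquist zone, so these
# bins map onto 400MHz-800MHz.
print(len(spectrum))             # -> 1025
```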


Here’s a simplified block diagram of the CHIME radio telescope:




[Image]


CHIME Block Diagram




CHIME’s F-Engines use the FPGA-based “ICE” system developed by the McGill Cosmology Instrumentation Laboratory specifically for the CHIME radio telescope requirements and the requirements of the South Pole Telescope. The ICE system addresses the needs of these two radio telescopes by combining high-density DSP with high-bandwidth networking—two of an FPGA’s greatest strengths. The ICE system hardware consists of FPGA motherboards, application-specific daughter boards, and crates with custom backplanes.


The ICE motherboard incorporates two industry-standard FMC connectors (for the application-specific daughter boards) that connect to a Xilinx Kintex-7 FPGA. The Kintex-7 FPGA also provides the board with its twenty-eight 10Gbps serial ports for inter-board networking and data offload. An on-board ARM co-processor running Linux manages motherboard resources and allows users to quickly implement high-level algorithms in C or other popular programming languages.





[Image]


The ICE motherboard incorporates a Kintex-7 FPGA connected to 16 ADCs mounted on the two FMC daughter cards





[Image]


The CHIME F-Engine




If this all looks somewhat familiar, you may be one of the 19,000 people who have read the Xcell Daily post from 2014 that describes CHIME Pathfinder—a smaller, pilot version of the now-operational CHIME radio telescope. (See “FPGAs Aid Search for Dark Energy with CHIME Telescope” for considerably more technical information about the ICE-based F-Engine electronics.)







Ryft deploys cloud-based search and analysis on Amazon’s FPGA-accelerated AWS EC2 F1 instance

by Xilinx Employee, 09-13-2017


Ryft has announced that it now offers its cloud-based Ryft Cloud search and analysis tools on Amazon’s FPGA-accelerated AWS EC2 F1 instance through Amazon’s AWS Marketplace. When Xcell Daily last covered Ryft, the company had introduced the Ryft ONE, an FPGA-accelerated data-analytics platform. (See “FPGA-based Ryft ONE search accelerator delivers 100x performance advantage over Apache Spark in the data center.”)


Now you can access Ryft’s accelerated search and analysis algorithms instantly through Amazon’s EC2 F1 compute instance, which gets its acceleration from multiple Xilinx Virtex UltraScale+ VU9P FPGAs. According to Ryft, FPGA acceleration using the AWS EC2 F1 instance boosts application performance by 91X compared to traditional CPU-based cloud analytics.


How fast is that? Ryft has published a benchmark chart that shows you just how fast that is:




[Image: Ryft acceleration chart for AWS EC2 F1]




The announcement includes a link to a Ryft White Paper titled “Powering Elastic Search in the Cloud: Transform High-Performance Analytics in the AWS Cloud for Fast, Data-Driven Decisions.”




For more information about Amazon’s AWS EC2 F1 instance, see:











SDAccel—Xilinx’s development environment for accelerating cloud-based applications using C, C++, or OpenCL—is now available on Amazon’s AWS EC2 F1 instance. (Formal announcement here.) The Amazon EC2 F1 compute instance allows you to create custom hardware accelerators for your application using cloud-based server hardware that incorporates multiple Xilinx Virtex UltraScale+ VU9P FPGAs. SDAccel automates the acceleration of software applications by building application-specific FPGA kernels for the AWS EC2 F1. You can also use HDLs including Verilog and VHDL to define hardware accelerators in SDAccel. With this release, you can access SDAccel through the AWS FPGA developer AMI.



For more information about Amazon’s AWS EC2 F1 instance, see:









For more information about SDAccel, see:







Brandon Treece from National Instruments (NI) has just published an article titled “CPU or FPGA for image processing: Which is best?” on Vision-Systems.com. NI offers a Vision Development Module for LabVIEW, the company’s graphical systems design environment, which can run vision algorithms on both CPUs and FPGAs, so the perspective is a knowledgeable one. Abstracting the article, what you get from an FPGA-accelerated imaging pipeline is speed. If you’re performing four 6msec operations on each video frame, a CPU needs 24msec (four times 6msec) to complete the operations serially, while an FPGA offers parallelism that shortens each operation’s processing time and permits overlap among the operations, as illustrated in this figure from the article:




[Image: NI vision acceleration timing diagram]



In this example, the CPU needs a total of 24msec to perform the four operations sequentially. The FPGA needs 6msec to perform the overlapped operations plus another 2msec to transfer the video frame back and forth between the processor and the FPGA, for a total of 8msec and a 3x speedup.
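The timing argument reduces to a couple of lines of arithmetic. A sketch of the figure’s numbers (an illustration, not NI’s code):

```python
# Four 6-msec operations: sequential on a CPU, overlapped in FPGA fabric
stage_ms = 6
stages = 4
transfer_ms = 2            # moving the frame to and from the FPGA

cpu_ms = stages * stage_ms          # operations run back-to-back on the CPU
fpga_ms = stage_ms + transfer_ms    # operations overlap; latency is one pass

print(cpu_ms, fpga_ms, cpu_ms / fpga_ms)  # -> 24 8 3.0
```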


Treece then demonstrates that the acceleration is actually much greater in the real world. He uses the example of a video processing sequence needed for particle counting that includes these three major steps:



  • Convolution filtering to sharpen the image
  • Thresholding to produce a binary image
  • Morphology to remove holes in the binary particles
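The three steps above can be sketched in a few lines of NumPy on a toy test image. This is a hypothetical illustration of the algorithm’s shape, not NI’s LabVIEW implementation:

```python
import numpy as np

def sharpen(img):
    # Step 1: 3x3 sharpening convolution (zero-padded borders)
    k = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
    p = np.pad(img, 1)
    h, w = img.shape
    return sum(k[i, j] * p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def dilate(b):
    # True wherever any pixel in the 3x3 neighborhood is True
    p = np.pad(b, 1)
    h, w = b.shape
    return np.any([p[i:i + h, j:j + w] for i in range(3) for j in range(3)], axis=0)

def erode(b):
    # True only where the whole 3x3 neighborhood is True
    p = np.pad(b, 1)
    h, w = b.shape
    return np.all([p[i:i + h, j:j + w] for i in range(3) for j in range(3)], axis=0)

img = np.zeros((7, 7), dtype=int)
img[1:6, 1:6] = 200      # a bright 5x5 "particle"...
img[3, 3] = 0            # ...with a one-pixel hole in the middle

binary = sharpen(img) > 100      # Step 2: threshold to a binary image
closed = erode(dilate(binary))   # Step 3: morphological closing fills the hole

print(int(binary.sum()), int(closed.sum()))  # -> 24 25 (hole filled)
```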



Here’s an image series that shows you what’s happening at each step:



[Image: NI vision processing steps]



Using the NI Vision Development Module for LabVIEW, he then runs the algorithm on an NI cRIO-9068 CompactRIO controller, which is based on a Xilinx Zynq Z-7020 SoC. Running the algorithm on the Zynq SoC’s ARM Cortex-A9 processor takes 166.7msec per frame. Running the same algorithm but accelerating the video processing using the Zynq SoC’s integral FPGA hardware takes 8msec. Add in another 0.5msec for DMA transfer of the pre- and post-processed video frames back and forth between the Zynq SoC’s CPU and FPGA and you get about a 20x speedup.


A key point here is that because the cRIO-9068 controller is based on the Zynq SoC, and because NI’s Vision Development Module for LabVIEW supports FPGA-based algorithm acceleration, this is an easy choice to make. The resources are there for your use. You merely need to click the “Go-Fast” button.



For more information about NI’s Vision Development Module for LabVIEW and cRIO-9068 controller, please contact NI directly.





What happens when you host a genomic analysis application on the FPGA-accelerated Amazon AWS EC2 F1 instance? You get Edico Genome’s and DNAnexus’ dramatic announcement of a $20, 90-minute offer to analyze an entire human genome. Edico Genome previously ported the DRAGEN pipeline to Amazon’s FPGA instances and DNAnexus customers can now leverage Edico Genome’s DRAGEN app as a turnkey solution. DNAnexus provides a global network for sharing and managing genomic data and tools to accelerate genomics. New and existing DNAnexus customers have access to the DRAGEN app.


The two companies have launched a promotion, lasting from Aug. 28 to Oct. 31, where whole-genome analysis on the AWS EC2 F1 2x instances costs $20 and takes about an hour and a half. In the next few weeks, Edico Genome’s DRAGEN will be available through DNAnexus on the F1 16x instances as well, which reduces analysis time to 20 minutes or so. Whole-exome analysis will cost about $5 during the promotional period.


The Amazon AWS EC2 F1 instance is a cloud service that’s based on multiple Xilinx Virtex UltraScale+ VU9P FPGAs installed in Amazon’s Web servers.




For more information about Edico Genome’s DRAGEN processor and genome analysis in Xcell Daily, see:









BrainChip Holdings has just announced the BrainChip Accelerator, a PCIe server-accelerator card that simultaneously processes 16 channels of video in a variety of video formats using spiking neural networks rather than convolutional neural networks (CNNs). The BrainChip Accelerator card is based on a 6-core implementation of BrainChip’s Spiking Neural Network (SNN) processor instantiated in an on-board Xilinx Kintex UltraScale FPGA.


Here’s a photo of the BrainChip Accelerator card:



BrainChip FPGA Board.jpg 


BrainChip Accelerator card with six SNNs instantiated in a Kintex UltraScale FPGA




Each BrainChip core performs fast, user-defined image scaling, spike generation, and SNN comparison to recognize objects. The SNNs can be trained using low-resolution images as small as 20x20 pixels. According to BrainChip, SNNs as implemented in the BrainChip Accelerator cores excel at recognizing objects in low-light, low-resolution, and noisy environments.


The BrainChip Accelerator card can process 16 channels of video simultaneously with an effective throughput of more than 600 frames per second while dissipating a mere 15W for the entire card. According to BrainChip, that’s a 7x improvement in frames/sec/watt when compared to GPU-accelerated, CNN-based deep-learning implementations such as GoogLeNet and AlexNet. Here’s a graph from BrainChip illustrating this claim:




BrainChip Efficiency Chart.jpg 





SNNs mimic human brain function (synaptic connections, neuron thresholds) more closely than do CNNs and rely on models based on spike timing and intensity. Here’s a graphic from BrainChip comparing a CNN model with the Spiking Neural Network model:
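BrainChip has not published its SNN internals, but the spike-timing idea can be illustrated with a generic leaky integrate-and-fire neuron. This is a textbook model, not BrainChip’s implementation, and the constants below are purely illustrative:

```python
def lif_neuron(inputs, leak=0.9, threshold=1.0):
    """Generic leaky integrate-and-fire neuron: accumulates input,
    leaks charge over time, and emits a spike when the membrane
    potential crosses the threshold (then resets)."""
    potential = 0.0
    spikes = []
    for x in inputs:
        potential = potential * leak + x   # integrate with leak
        if potential >= threshold:
            spikes.append(1)               # fire
            potential = 0.0                # reset after the spike
        else:
            spikes.append(0)
    return spikes

# A steady weak input accumulates until the neuron periodically fires
print(lif_neuron([0.3] * 10))  # → [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
```

The timing of those output spikes (not just their count) carries information, which is the key difference from a CNN’s static activations.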





BrainChip Spiking Neural Network comparison.jpg 



For more information about the BrainChip Accelerator card, please contact BrainChip directly.




Digilent sent me an Analog Discovery 2 early last month. (See “Just arrived: Digilent’s $279 Analog Discovery 2 multi-instrument based on a Spartan-6 FPGA and a ton of ADI chips.”) The Analog Discovery 2 lists a two-channel USB digital oscilloscope (1MΩ, ±25V, differential, 14-bit, 100MS/s, 30MHz+ bandwidth) and a two-channel arbitrary function generator (±5V, 14-bit, 100MS/s, 12MHz+ bandwidth) among its many integrated instruments, all made possible by the reprogrammable capabilities of a Xilinx Spartan-6 FPGA. I wanted to break these particular instruments out so I could use them more easily with conventional BNC-terminated probes and cables. Digilent sells an Analog Discovery BNC Adapter Board with four BNC connectors for $19.99, so the most reasonable thing to do would have been to order one.



Digilent BNC Adapter.jpg



Digilent’s $19.99 BNC Adapter Board for the Analog Discovery 2




I built my own instead. Why not? How hard could it be?


My first inclination was to get four BNC connectors and an extruded aluminum box and drill out the box for the connectors over at TechShop San Jose where I am qualified to use one of the Jet end mills. (Cue the “Tim ‘the Toolman’ Taylor” alpha male grunting.) I did get as far as stopping by the most excellent Excess Solutions surplus electronics store in San Jose where I picked up four panel-mount BNCs, but then an even better solution occurred to me. (More grunting.)


I got a 4-channel OdiySurveil Cat 5 Passive HD Transceiver balun box from Amazon for $12.59. These balun boxes are designed to take as many as four channels of HD video from coax cables and route them over long lengths (1500m!!!) of twisted-pair cabling. The balun box’s beefy steel enclosure is pre-punched for four BNCs and comes complete with four BNC connectors wired to an RJ45 Cat5 Ethernet connector and an 8-pin Phoenix Contact screw-terminal block through a PCB securely mounted inside the box. It’s twelve bucks and change, nicely finished with no drilling or milling required, delivered to your door in two days by Amazon Prime.



Original Balun Box.jpg 



OdiySurveil Cat 5 Passive HD Transceiver




Here’s what it looks like inside:



Original Balun Box Interior.jpg 



OdiySurveil Cat 5 Passive HD Transceiver Unmodified Interior




Sure, there were some pesky balun transformers and a few other unneeded passive components on the board, but I quickly popped those off the PCB with a screwdriver (more grunting), cleaned up the pads, and shorted out the appropriate connections on the board with eight short jumper wires. I still know which end of the soldering iron to hold, apparently.




Modified Balun Box Interior.jpg



OdiySurveil Cat 5 Passive HD Transceiver Modified Interior




Then, I needed to decide whether to use the box’s Phoenix Contact terminal block or the RJ45 connector to connect to the Analog Discovery 2’s double-row, 0.1-inch header strip. At first, I thought I’d use the terminal block but the more I considered that RJ45 connector, the better it looked. A quick trip over to IT here at Xilinx secured a Cat6 cable in a huge box of orphaned Ethernet cables.


I snipped one end off of the cable and then cut the outer insulation off of the cable end. There were the four unshielded, twisted-wire pairs that I needed for the connection to the Analog Discovery 2. I also ordered a reasonably large kit of crimpable “Dupont” pins with plastic headers from eBay for about $9 (including shipping).


Note: Somehow in the last 30 years, I missed the transition point where “Berg pins” became “Dupont pins.”



When everything arrived, I stripped the twisted pairs, crimped on the Dupont pins, and inserted the pins into a 2x5-pin plastic header.


A big piece of heat-shrink tubing cleaned up everything and there you go, one bulletproof breakout box. Finally, I made a printable label using Visio and attached it to the top of the box, which now looks like this:




Finished Breakout Box.jpg



Finished Analog Discovery 2 Breakout Box



Works great.



(Note: This isn't a great example of a wise make-versus-buy decision. If I weren't blogging this project, I'd probably just buy Digilent's adapter board.)




For more information about the Analog Discovery 2, please contact Digilent directly and see these previous Xcell Daily posts:






Ag Tech Precision Spray Autopilot uses Aerotenna’s new, Zynq-based Smart Drone Dev Platform

by Xilinx Employee ‎09-11-2017 11:45 AM - edited ‎09-11-2017 04:23 PM (2,470 Views)


Last week, Aerotenna announced its ready-to-fly Smart Drone Development Platform, based on its OcPoC with Xilinx Zynq Mini Flight Controller. Other components in the $3499 Drone Dev Platform include Aerotenna’s μLanding radar altimeter, three Aerotenna μSharp-Patch collision avoidance radar sensors, one Aerotenna CAN hub, a remote controller, and a pre-assembled quadcopter airframe:



Aerotenna Drone Dev Platform.jpg 




The OcPoC flight controller uses the Zynq SoC’s additional processing power and I/O flexibility—embodied in the on-chip programmable FPGA fabric and programmable I/O—to handle the immense sensor load presented to a drone in flight through sensor fusion and on-board, real-time processing. The Aerotenna OcPoC flight controller handles more than 100 sense inputs.


How well do all of these Aerotenna drone components work together? Well, one indication of how well integrated they are is another announcement last week—made jointly by Aerotenna and UAVenture of Switzerland—to roll out the Precision Spray Autopilot—a “simple plug-and-play solution, allowing quick and easy integration into your existing multirotor spraying drones.” This piece of advanced Ag Tech is designed to create smart drones for agricultural spraying applications.



Precision Spray Autopilot.jpg 


The Precision Spray Autopilot in Action




The Precision Spray Autopilot’s features include:



  • Fly spray missions from a tablet or with a remote control
  • Radar-based, high-performance terrain following
  • Real-time adjustment of flight height and speed
  • In-flight spray-rate monitoring and control
  • Auto-refill and return to mission
  • Field-tested in simple and complex fields



What’s even cooler is this 1-minute demo video of the Precision Spray Autopilot in action:








ARM, Cadence, TSMC, and Xilinx have announced a collaboration to develop a CCIX (Cache Coherent Interconnect for Accelerators) test chip in TSMC’s 7nm FinFET process technology with a 2018 completion date. The test chip will demonstrate multiple ARM CPUs, CMN-600 coherent on-chip bus, and foundation IP communicating to other chips including Xilinx’s Virtex UltraScale+ FPGAs over the coherent, 25Gbps CCIX fabric. Cadence is supplying the CCIX controller and PHY IP for the test chip as well as PCIe Gen 4, DDR4 PHY, and Peripheral IP blocks. In addition, Cadence verification and implementation tools are being used to design and build the test chip. According to the announced plan, the test chip tapes out early in the first quarter of 2018, with silicon availability expected in the second half of 2018.


You can’t understand the importance of this announcement if you aren’t fully up to speed on CCIX, which Xcell Daily has discussed a few times in the recent past.


CCIX simplifies the design of offload accelerators for hyperscale data centers by providing low-latency, high-bandwidth, fully coherent access to server memory. The specification employs a subset of full coherency protocols and is ISA-agnostic, meaning that the specification’s protocols are independent of the attached processors’ architecture and instruction sets. Full coherency is unique to the CCIX specification. It permits accelerators to cache processor memory and processors to cache accelerator memory.


CCIX is designed to provide coherent interconnection between server processors and hardware accelerators, memory, and among hardware accelerators as shown below:



CCIX Configurations.jpg


Sample CCIX Configurations



The CCIX Consortium announced Release 1 of the CCIX spec a little less than a year ago. CCIX Consortium members Xilinx and Amphenol FCI demonstrated a CCIX interface operating at 25Gbps using two Xilinx 16nm UltraScale+ devices through an Amphenol/FCI PCI Express CEM connector and a trace card earlier this year.


As the CCIX Consortium’s Web site says:


“CCIX simplifies the development and adoption by extending well-established data center hardware and software infrastructure.  This ultimately allows system designers to seamlessly integrate the right combination of heterogeneous components to address their specific system needs.”


For more information, see these earlier Xcell Daily CCIX blog posts:










By Adam Taylor



When we surveyed the different types of HDMI sources and sinks recently for our Zynq SoC and Zynq UltraScale+ MPSoC designs, one HDMI receiver we discussed was the ADV7611. This device receives three TMDS data streams and converts them into discrete video and audio outputs, which can then be captured and processed. Of course, the ADV7611 is a very capable and somewhat complex device that requires configuration prior to use. We are going to examine how we can include one within our design.






ZedBoard HDMI Demonstration Configuration




To do this, we need an ADV7611. Helpfully, the FMC HDMI card provides two HDMI inputs, one of which uses an ADV7611. The second equalizes the TMDS data lanes and passes them on directly to the Zynq SoC for decoding.


To demonstrate how we can get this device up and running with our Zynq SoC or Zynq UltraScale+ MPSoC, we will create an example that uses the ZedBoard with the HDMI FMC. For this example, we first need to create a hardware design in Vivado that interfaces with the ADV7611 on the HDMI FMC card. To keep this initial example simple, I will be only receiving the timing signals output by the ADV7611. These signals are:


  • Local Locked Clock (LLC) – The pixel clock.
  • HSync – Horizontal Sync, indicates the start of a new line.
  • VSync – Vertical Sync, indicates the start of a new frame.
  • Video Active – indicates that the pixel data is valid (e.g. we are not in a Sync or Blanking period)


This approach uses the VTC’s (Video Timing Controller’s) detector to receive the sync signals and identify the received video’s timing parameters and video mode. Once the detector correctly identifies the video mode, we know the ADV7611 has been configured correctly. It is then a simple step to connect the received pixel data to a Video-to-AXIS IP block and use VDMA to write the received video frames into DDR memory for further processing.
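Conceptually, the detector’s job is to count pixel clocks between sync pulses and match the totals against known video timings. Here is a rough software model of that inference (illustrative only; the real Xilinx IP also measures blanking intervals and sync polarity). The totals below are the standard VESA/CEA figures, which include blanking:

```python
def detect_mode(pixels_per_line, lines_per_frame):
    """Map measured timing counts (LLC clocks per line, lines per
    frame, blanking included) to a video mode."""
    known_modes = {
        (800, 525): "VGA 640x480@60",   # VESA DMT total for 640x480
        (1650, 750): "720p60",          # CEA-861 total for 1280x720
        (2200, 1125): "1080p60",        # CEA-861 total for 1920x1080
    }
    return known_modes.get((pixels_per_line, lines_per_frame), "unknown")

# 800 LLC clocks per HSync and 525 HSyncs per VSync means VGA
print(detect_mode(800, 525))  # → VGA 640x480@60
```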


For this example, we need the following IP blocks:


  • VTC (Video Timing Controller) – Configured for detection and to receive sync signals only.
  • ILA – Connected to the sync signals so that we can see that they are toggling correctly—to aid debugging and commissioning.
  • Constant – Set to a constant 1 to enable the clock and detector enables.


The resulting block diagram appears below. The eagle-eyed will also notice the addition of both a GPIO output and an I2C bus from the processor system. We need these to control and configure the ADV7611.






Simple Architecture to detect the video type



Following power up, the ADV7611 generates no sync signals or video. We must first configure the device, which requires the use of an I2C bus. We therefore need to enable one of the two I2C controllers within the Zynq PS and route its IO to the EMIO so that we can route the I2C signals (SDA and SCL) to the correct pins on the FMC connector. The ADV7611 is a complex device to configure, with multiple I2C addresses that address different internal functions within the device, such as EDID and High-bandwidth Digital Content Protection (HDCP).
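The configuration traffic itself is just a long sequence of (device address, register, value) writes over I2C. Here is a hedged sketch of that pattern; the addresses and register values below are placeholders, not the real ADV7611 script, which comes from Analog Devices’ documentation and driver package:

```python
# Placeholder configuration script: each entry is
# (7-bit I2C device address, register, value). The values here are
# hypothetical stand-ins for the vendor-supplied ADV7611 script.
CONFIG_SCRIPT = [
    (0x4C, 0xF4, 0x80),  # hypothetical: map a secondary I2C address
    (0x4C, 0x01, 0x06),  # hypothetical: select the video standard
    (0x4C, 0x00, 0x02),  # hypothetical: select the input mode
]

def build_i2c_writes(script):
    """Turn the script into raw I2C write transactions:
    [write-address byte (7-bit address << 1), register, value]."""
    return [bytes([(dev << 1) | 0, reg, val]) for dev, reg, val in script]

for frame in build_i2c_writes(CONFIG_SCRIPT):
    print(frame.hex())
```

On the Zynq PS, each of these transactions would be handed to the enabled I2C controller; the next post walks through the real register sequence.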


We also need to be able to reset the ADV7611 following the application of power to the ZedBoard and FMC HDMI. We use a PS GPIO pin, output via the EMIO, to do this. Using a controllable I/O pin for this function allows the application software to reset the device each time we run the program. This capability is also helpful when debugging the software application, because starting from a fresh reset each time the program runs prevents previous configurations from affecting the next run.


With the block diagram completed, all that remains is to build the design with the location constraints (identified below) to connect to the correct pins on the FMC connector for the ADV7611.






Vivado Constraints for the ADV7611 Design




Once Vivado generates the bit file, we are ready to begin configuring the ADV7611. Using the I2C interface this way is quite complex, so we will examine the steps needed to do this in detail in the next blog post. However, the image below shows one set of results from testing the completed software as it detects a VGA (640-pixel by 480-line) input:







VTC output when detecting VGA input format















Code is available on Github as always.



If you want E Book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.




  • First Year E Book here
  • First Year Hardback here.




MicroZed Chronicles hardcopy.jpg




  • Second Year E Book here
  • Second Year Hardback here



MicroZed Chronicles Second Year.jpg 



By Chetan Khona, Xilinx


The term Industrial IoT (IIoT) refers to a multidimensional, tightly coupled chain of systems involving edge devices, cloud applications, sensors, algorithms, safety, security, vast protocol libraries, human-machine interfaces (HMI), and other elements that must interoperate. If you’re designing equipment destined for IIoT networks, you have a lot of requirements to meet. This article discusses several.


Note: This article has been adapted from a new Xilinx White Paper titled “Key Attributes of an Intelligent IIoT Edge Platform.”



IT-OT Convergence


Some describe the IIoT as a convergence of information technology (IT) and operational technology (OT). The data-intensive nature of IT applications requires all these elements to come together with critical tasks performed reliably and on schedule. There’s usually a far more time-sensitive element to the OT applications. Designers generally meet these diverse IIoT requirements and challenges using embedded electronics at the IIoT edge (e.g., motion controllers, protection relays, programmable logic controllers, and similar systems) because embedded systems support deterministic communication and real-time control.


Equipment operating on IIoT networks at timescales on the order of hundreds of microseconds (or less) often need to operate in factories and remote locations for decades without being touched—but they can be updated remotely via the networks that connect them. Relying solely on multicore embedded processors in these applications can lead to a series of difficult and costly marketing and engineering trade-offs focused on managing functional timing issues and performance bottlenecks. A more advanced approach that manages determinism, latency, and performance while eliminating interference between the IT and OT domains and within subsystems in the OT domain produces better results.


Sometimes, you just need hardware to meet these challenges because software is just too slow, even when running on multiple processor cores. Augmenting static microprocessor architectures with specialized hardware to create a balanced division of labor is not a new concept in the world of embedded electronics. What is new is the need to adapt both the tasks and the division of labor over time. For example, an upgraded predictive-maintenance algorithm might require more sensor inputs than before, or entirely new types of sensors with new types of interfaces. These sensors invariably require local processing as well, to offload the centralized cloud application that’s crunching the data from all of the edge nodes. Offloading the incremental sensor-processing calculations to hardware maintains the overall loading and avoids overburdening the edge processor.



TSN and Legacy Industrial Networks


The IIoT networks linking these new systems are equally dynamic. They evolve almost daily. Edge and system-wide protocols including OPC-UA (the OPC Foundation Open Platform Communications-Unified Architecture) and DDS (Data Distribution Service for Real-Time Systems) are gaining significant momentum. Both of these protocols benefit from time-sensitive networking (TSN), a deterministic Ethernet-based transport that manages mixed-criticality data streams. TSN significantly advances the vision of a unified network protocol across the edge and throughout the majority of the IIoT solution chain because it supports varying degrees of scheduled traffic alongside best-effort traffic.


The goal is to integrate TSN into the IIoT endpoint to enable scheduled traffic alongside best-effort traffic with minimum impact on control-function timing. Yet TSN is an evolving standard, so using ASICs or ASSP chipsets developed before all aspects of the TSN standard and market-specific profiles are finalized carries some risk. Similarly, attempting to add TSN support to an existing controller using a software-only approach may exhibit unpredictable timing behavior and might not meet timing requirements.


Ultimately, TSN requires a form of time-awareness not available in controllers today. A good TSN implementation requires the addition of both hardware and software—something that’s easily done using a device that integrates processors and programmable hardware like the Xilinx Zynq SoC and Zynq UltraScale+ MPSoC. These devices minimize the effects of adding TSN capabilities by implementing bandwidth-intensive, time-critical functions in hardware without significant impact to the software timing. (Xilinx offers an internally developed, fully standards-compatible, optimized TSN subsystem for the Zynq SoC and Zynq UltraScale+ MPSoC device families.)
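The “scheduled versus best-effort” idea at the heart of TSN (IEEE 802.1Qbv time-aware shaping) can be sketched as a repeating gate-control list that opens specific traffic-class queues only during fixed time windows. This is a simplified illustration of the concept, not Xilinx’s TSN subsystem, and the window sizes are made up:

```python
# Simplified 802.1Qbv-style gate control list: each entry opens a set
# of traffic-class queues for a window within the repeating cycle.
GATE_CONTROL_LIST = [
    ({7}, 250_000),        # 250us window: only time-critical class 7
    ({0, 1, 2}, 750_000),  # 750us window: best-effort classes
]

def open_queues(t_ns):
    """Return which traffic-class queues are open at time t_ns."""
    cycle_ns = sum(window for _, window in GATE_CONTROL_LIST)
    t = t_ns % cycle_ns    # position within the repeating cycle
    for queues, window in GATE_CONTROL_LIST:
        if t < window:
            return queues
        t -= window
    return set()

print(open_queues(100_000))  # inside the scheduled window → {7}
print(open_queues(500_000))  # best-effort window → {0, 1, 2}
```

The hardware version of this gating is what needs cycle-accurate time-awareness, which is why a software-only retrofit struggles to meet the timing.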


Because industrial networking is not new, IIoT systems will need to support the lengthy list of legacy industrial protocols that have been developed and used throughout the industry’s past, and this need will persist for many years. Most modern SoCs support only a small fraction of these industrial protocols and cannot easily be retrofitted for the rest. In addition, the number of network interfaces that one controller must support can often exceed an SoC’s I/O capabilities. In contrast, the programmable hardware and I/O within Zynq SoCs and Zynq UltraScale+ MPSoCs easily support these legacy protocols without causing the unwanted timing side effects to mainstream software and firmware that a software-based networking approach might cause.




Security and the IIoT


IIoT design must follow a “defense-in-depth” approach to cybersecurity. Defense in depth is a form of multilayered security that reaches all the way from the supply chain to the end-customers’ enterprise and cloud application software. (That’s a very long chain—and one that requires its own article. This article’s scope is the chain of trust for deployed embedded electronics at the IIoT edge.)


With the network extending to the analog-digital boundary, data needs to be secured as soon as it enters the digital domain—usually at the edge. Defense-in-depth security requires a strong hardware root of trust that starts with secure and measured boot operations; run-time security through isolation of hardware, operating systems, and software; and secure communications. The entire network should employ trusted remote attestation servers for independent validation of credentials, certificate authorities, and so forth.


Security is not a static proposition. Five notable revisions have been made to the transport layer security (TLS) secure-messaging protocol since 1995, with more to come. The cryptographic algorithms that underpin protocols like TLS can be implemented in software, but such changes on the IT side can adversely affect time-critical OT performance. Architectural tools such as hypervisors and other isolation methods can reduce this impact, but pairing these software concepts with devices that incorporate programmable hardware, like the Zynq SoC and Zynq UltraScale+ MPSoC, also makes it possible to support new, and even yet-to-be-defined, cryptographic functions years after equipment deployment.




Edico Genome moves genomic analysis and storage to the cloud using Amazon’s AWS EC2 F1 Instance

by Xilinx Employee ‎09-08-2017 09:57 AM - edited ‎09-08-2017 10:00 AM (2,340 Views)


Edico Genome has been developing genetic-analysis algorithms for a while now. (See this Xcell Daily story from 2015, “FPGA-based Edico Genome Dragen Accelerator Card for IBM OpenPOWER Server Speeds Exome/Genome Analysis by 60x”). The company originally planned to accelerate its algorithm by developing an ASIC, but decided this was a poor implementation choice because of the rapid development of its algorithms. Once you develop an ASIC, it’s frozen in time. Instead, Edico Genome found that Xilinx FPGAs were an ideal match for the company’s development needs and so the company developed the Dragen Accelerator Card for exome/genome analysis.


This hardware was well suited to Edico Genome’s customers that wanted to have on-site hardware for genomic analysis but the last couple of years have seen a huge movement to cloud-based apps including genomic analysis. So Edico Genome moved its algorithms to Amazon’s AWS EC2 F1 Instance, which offers accelerated computing thanks to Xilinx UltraScale+ VU9P FPGAs. (See “AWS makes Amazon EC2 F1 instance hardware acceleration based on Xilinx Virtex UltraScale+ FPGAs generally available.”)


Edico Genome now offers cloud-based genomic processing and genomic storage in the cloud through Amazon’s AWS EC2 F1 Instance. Like its genomic analysis algorithms, the company’s cloud-based genomic storage takes advantage of the FPGA acceleration offered by Amazon’s AWS EC2 F1 Instance to achieve 2x to 4x compression. When you’re dealing with the human genome, you’re talking about storing 80Gbytes per genome so fast, 2x to 4x compression is a pretty important benefit.
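At that scale the savings add up quickly. A back-of-the-envelope sketch using the 80-Gbyte-per-genome figure above:

```python
genome_gb = 80  # raw storage per human genome, per the text

# FPGA-accelerated compression at the stated 2x to 4x ratios
for ratio in (2, 4):
    stored = genome_gb / ratio
    print(f"{ratio}x compression: {stored:.0f} GB stored, "
          f"{genome_gb - stored:.0f} GB saved per genome")
```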


This is all explained by Edico Genome’s VP of Engineering Rami Mehio in an information-packed 3-minute video:






Embedded C Coding Standard Book.jpg 

Embedded Systems Design magazine’s former editor-in-chief Michael Barr published the “Embedded C Coding Standard” a decade ago, and now he’d like you to have a free PDF copy. Developing coding standards is not nearly as much fun as actually developing code, so getting a big head start with a standard developed by one of the world’s foremost embedded software experts is a huge advantage. Getting it for free—that’s even huger.


Oh, and that link above… it leads to an online HTML version of the Embedded C Coding Standard as well.


These are great resources if you are developing embedded systems based on the Xilinx Zynq SoC or Zynq UltraScale+ MPSoC.


Tell Michael that Steve sent you.






A new open-source tool named GUINNESS makes it easy for you to develop binarized (2-valued) neural networks (BNNs) for Zynq SoCs and Zynq UltraScale+ MPSoCs using the SDSoC Development Environment. GUINNESS is a GUI-based tool that uses the Chainer deep-learning framework to train a binarized CNN. In a paper titled “On-Chip Memory Based Binarized Convolutional Deep Neural Network Applying Batch Normalization Free Technique on an FPGA,” presented at the recent 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, authors Haruyoshi Yonekawa and Hiroki Nakahara describe a system they developed to implement a binarized CNN for the VGG-16 benchmark on the Xilinx ZCU102 Eval Kit, which is based on a Zynq UltraScale+ ZU9EG MPSoC. Nakahara presented the GUINNESS tool again this week at FPL2017 in Ghent, Belgium.


According to the IEEE paper, the Zynq-based BNN is 136.8x faster and 44.7x more power efficient than the same CNN running on an ARM Cortex-A57 processor. Compared to the same CNN running on an Nvidia Maxwell GPU, the Zynq-based BNN is 4.9x faster and 3.8x more power efficient.
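The efficiency comes from the core trick of binarized networks: with weights and activations constrained to ±1, a multiply-accumulate collapses into an XNOR plus a popcount, which maps beautifully onto FPGA logic. A minimal illustration of the general BNN arithmetic (not the paper’s specific implementation):

```python
def binary_dot(a_bits, w_bits, n):
    """Dot product of two length-n ±1 vectors packed as integers
    (bit 1 = +1, bit 0 = -1): XNOR marks where signs agree, popcount
    counts the agreements, and rescaling recovers the true value."""
    xnor = ~(a_bits ^ w_bits) & ((1 << n) - 1)  # 1 wherever signs agree
    matches = bin(xnor).count("1")
    return 2 * matches - n  # +1 per agreement, -1 per disagreement

# Example: a = [+1,-1,+1,+1], w = [+1,+1,-1,+1] (bit 3 down to bit 0)
a = 0b1011
w = 0b1101
print(binary_dot(a, w, 4))  # → 0, i.e. 1 - 1 - 1 + 1
```

In hardware, one LUT-based XNOR/popcount tree replaces a wide multiplier array, which is where the speed and power advantages over full-precision CNNs originate.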


GUINNESS is now available on GitHub.




ZCU102 Board Photo.jpg 



Xilinx ZCU102 Zynq UltraScale+ MPSoC Eval Kit








Need to build a networking monster for financial services, low-latency trading, or cloud-based applications? The raw materials you need are already packed into Silicom Denmark’s SmartNIC fb4CGg3@VU PCIe card, which is based on a Xilinx Virtex UltraScale or Virtex UltraScale+ FPGA:



  • 1-to-16-lane PCIe Gen1/Gen2/Gen3
  • Optional 2x8 PCIe lanes on a secondary connector
  • Xilinx Virtex UltraScale+ (VU9P) or Virtex UltraScale (VU125 or VU080) FPGA (other FPGA sizes optional)
  • Four QSFP28 ports for 100G, 4x25G, or 4x10G optical modules or direct-attached copper cabling
  • 4Gbytes of DDR4-2400 SDRAM
  • SODIMM sockets for 4Gbytes of DDR4-2133 SDRAM




Silicom Denmark fb4CGg3 PCIe Card.jpg 


Silicom Denmark’s SmartNIC fb4CGg3@VU PCIe card




The SmartNIC fb4CGg3@VU PCIe card includes complete NIC functionality (TCP Offload Engine (TOE), UDP Offload Engine, and drivers).



Please contact Silicom Denmark directly for more information about the SmartNIC fb4CGg3@VU PCIe card.





The Xilinx Technology Showcase 2017 will highlight FPGA-acceleration as used in Amazon’s cloud-based AWS EC2 F1 Instance and for high-performance, embedded-vision designs—including vision/video, autonomous driving, Industrial IoT, medical, surveillance, and aerospace/defense applications. The event takes place on Friday, October 6 at the Xilinx Summit Retreat Center in Longmont, Colorado.


You’ll also have a chance to see the latest ways you can use the increasingly popular Python programming language to create Zynq-based designs. The Showcase is a prelude to the 30-hour Xilinx Hackathon starting immediately after the Showcase. (See “Registration is now open for the Colorado PYNQ Hackathon—strictly limited to 35 participants. Apply now!”)


The Xilinx Technology Showcase runs from 3:00 to 5:00 PM.


Click here for more details and for registration info.




Xilinx Longmont.jpg


Xilinx Colorado, Longmont Facility





For more information about the FPGA-accelerated Amazon AWS EC2 F1 Instance, see:










The Xilinx Hackathon is a 30-hour marathon event being held at the Xilinx “Retreat” (also known as the Xilinx Colorado facility in Longmont, but see the image below), starting on October 7. The organizers are looking for no more than 35 heroic coders who will receive a Python-programmable, Zynq-based Digilent/Xilinx PYNQ-Z1 board and an assortment of Arduino-compatible shields and sensors. The intent, as Zaphod Beeblebrox might say, is to create something not just amazing but “amazingly amazing.”




Xilinx Longmont.jpg 



Xilinx Colorado, Longmont Facility





  • Want to compete as a team? No problem. The Xilinx Hackathon rules allow teams as large as four people, but the body count is capped at 35 so if you have a large team you’d better get your name on the invite list early. Better yet, get your name on the list early even if you’re competing solo.


  • How much does it cost to enter? Zero. Zip, Nada, Nothing.


  • What are the prizes? We’re offering more than $2000 in cash prizes plus all competitors keep their PYNQ-Z1 boards. Also, winners and other amazingly amazing projects will get incredible, amazingly amazing recognition in the Xcell Daily blog, which will be covering this event. Your fame is assured.


  • What should you bring? “A laptop, laptop charger, phone, charger(s), headphones, a pillow, toiletries, an extra set of clothes, and a water bottle.” (It’s a 30-hour hackathon, but it’s at the Xilinx retreat; see photo, again.) Xilinx will provide you with three meals per day (good hackers will figure out how many meals they’ll get in 30 hours) as well as snacks, drinks, and caffeine stimulation.



  • Where do I sign up? Here. A crack Xilinx team will hand-select and invite the lucky 35 participants from this list. (That’s the Final Five times seven.)



In case you’ve not read about it, the PYNQ project is an open-source project from Xilinx that makes it easy to design high-performance embedded systems using Xilinx Zynq Z-7000 SoCs. Here’s what’s on the PYNQ-Z1 board:



  • Xilinx Zynq Z-7020 SoC with a dual-core ARM Cortex-A9 MPCore processor running at 650MHz
  • 512Mbytes of DDR3 SDRAM running at 1050MHz
  • 16Mbytes of Quad-SPI Flash memory
  • A MicroSD slot for the PYNQ environment
  • USB OTG host port
  • USB programming port
  • Gigabit Ethernet port
  • Microphone
  • Mono audio output jack
  • HDMI input port
  • HDMI output port
  • Two Digilent PMOD ports
  • Arduino-format header for Arduino shields
  • Pushbuttons, switches, and LEDs







The PYNQ-Z1 Board





For more information about the PYNQ project, see:









Amazon previews OpenCL Development Environment to FPGA-accelerated AWS EC2 F1 Instance

by Xilinx Employee ‎09-06-2017 10:40 AM - edited ‎09-06-2017 11:05 AM (2,312 Views)


Yesterday, Amazon announced a preview of an OpenCL development flow for the AWS EC2 F1 Instance, which is an FPGA-accelerated cloud-computing service based on Xilinx Virtex UltraScale+ VU9P FPGAs. According to Amazon, “…developers with little to no FPGA experience, will find a familiar development experience and now can use the cloud-scale availability of FPGAs to supercharge their applications.” In addition, wrote Amazon: “The FPGA Developer AMI now enables a graphical design canvas, enabling faster AFI development using a graphical flow, and leveraging pre-integrated verified IP blocks,” and "We have also upgraded the FPGA Developer AMI to Vivado 2017.1 SDx, improving the synthesis quality and runtime capabilities."


A picture is worth 1000 words:




Amazon AWS EC2 F1 Graphical Design.jpg 





For more information and to sign up for the preview, please visit Amazon’s preview page.



For more information about the Amazon EC2 F1 Instance based on Xilinx Virtex UltraScale+ FPGAs, see “AWS makes Amazon EC2 F1 instance hardware acceleration based on Xilinx Virtex UltraScale+ FPGAs generally available” and “AWS does a deep-dive video on the Amazon EC2 F1 Instance, a cloud accelerator based on Xilinx Virtex UltraScale+ FPGAs.”




Curious about using Amazon’s AWS EC2 F1 Instance? Want a head start? Falcon Computing in Santa Clara, California has a 2-day seminar just for you titled “Accelerate Applications on AWS EC2 F1.” It’s being taught by Professor Jason Cong from the Computer Science Department at the University of California, Los Angeles, and it’s taking place on September 28-29 at Falcon’s HQ.


Here’s the agenda:



Falcon Computing AWS F1 Instance Seminar Agenda.jpg 



Register here.


Please contact Falcon Computing directly for more information about this Amazon AWS EC2 F1 Instance Seminar.



For more information about the Amazon EC2 F1 Instance based on Xilinx Virtex UltraScale+ FPGAs, see “AWS makes Amazon EC2 F1 instance hardware acceleration based on Xilinx Virtex UltraScale+ FPGAs generally available” and “AWS does a deep-dive video on the Amazon EC2 F1 Instance, a cloud accelerator based on Xilinx Virtex UltraScale+ FPGAs.”


About the Author

Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He has served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.

Be sure to join the Xilinx LinkedIn group to get an update for every new Xcell Daily post!