Audiophiles swoon over DirectStream DSD audio DAC based on Spartan-6 FPGA

by Xilinx Employee ‎02-12-2016 04:09 PM - edited ‎02-13-2016 09:10 AM (494 Views)


This blog post takes us into the rarefied world of audiophiles—people who are constantly on the prowl to find sweeter and sweeter sound. It’s a story about using a low-cost Spartan-6 FPGA to create a high-end, highly regarded audio product. Finally, it’s a personal story of stumbling back into the world of an old friend who undertook a 7-year journey to create this new audiophile product.



Ted Smith studied at MIT and then developed software at word-processing pioneer Wang before moving to Boulder, Colorado in the early 1980s where we worked together at EDA pioneer Cadnetix. Smith wrote a lot of the OS and EDA code that ran on 68010- and 68020-based engineering workstation hardware that I designed. (That was, ahem, almost 25 years ago.) After Cadnetix, Smith got into audio big time, developing code for digital audio workstations at Boulder-based Waveframe. He’s long been an audiophile.


In 2002, Smith heard an early demo of Sony’s multichannel DSD (Direct-Stream Digital) high-resolution digital audio format as implemented on an SACD (Super Audio CD) and the audiophile in him was hooked. From that point on, his goal has been perfecting equipment for the DSD format. Even though SACDs did not achieve the same success as the more mainstream audio CD, audiophiles and recording engineers continue to have a keen interest in Sony’s advanced DSD digital-audio format, so there continues to be a high-end market for audio equipment that can do justice to the format’s sonic capabilities.


Ted wanted to build a DSD DAC to transform the digital DSD bit stream into audio. For his first experiment, he tapped the digital-audio stream from a $1500 Sony DVP-S9000ES DVD/SACD player, ran the stream through a hand-wired, passive, low-pass filter and “it sounded pretty good,” says Smith. He then built a prototype of the DSD DAC he envisioned.


That first prototype apparently didn’t sound very good.


The second one sounded worse.


So Smith developed a more ambitious prototype on a finished pc board. He says he put everything he knew into this third prototype as a last-ditch effort to prove out his ideas. If this third prototype failed to impress, Smith was done. “It sounded pretty darn good,” he says. Good enough, in fact, to impress Gus Skinnas, a principal SACD mastering engineer, who then contacted Paul McGowan, CEO of PS Audio, an established high-end audio equipment manufacturer in Boulder. McGowan was also impressed, enough so to want to manufacture a product based on Smith’s prototype.


One of Smith’s friends, a Xilinx employee (Xilinx has a Colorado facility near Boulder), suggested that Smith use an FPGA for his digital design. “Since I’m a software guy, FPGAs are great” says Smith. “I don’t have to turn the hardware to make changes.” The FPGA had extra capacity so Smith added digital-audio interfaces including PCM and TOSLINK, saying to himself “I’ll take care of the software later and make it work.”


“And I did,” he concludes.


After the first successful pc-board design, Smith went through a series of refinements, testing his design along the way using other audiophiles’ equipment and speakers so that his design was not accidentally tuned just for his personal audio system. McGowan says that when he got Smith’s system into PS Audio, it sounded remarkably better than the DAC he'd been making and selling for more than four years. That DAC did not handle DSD streams.


Smith’s audio DAC didn’t just sound good with SACDs; it made CDs sound better as well, adds McGowan, “closer to high-resolution audio than I thought possible.” In fact, McGowan felt that Smith’s FPGA-based DAC stood up well against audiophile DACs costing $10,000 to $20,000. “It just slayed them,” he said. (Note: We’re talking about an audiophile DAC based on a low-cost Xilinx Spartan-6 FPGA here.)


The secret to Smith’s sweet-sounding DAC: jitter (actually, lack of jitter). “DSD is incredibly sensitive to jitter,” says Smith. Clock jitter shows up directly as noise in the output stream. “People are laughing at me when I say I can hear 2psec,” says Smith, “but…when you have a good noise floor (120dB), you can hear it.” Smith credits Xilinx Principal Engineer Austin Lesea with help on reducing jitter. Lesea is a jitter virtuoso. Here’s Lesea’s White Paper on the topic: “Jitter: Variations in the Significant Instants of a Clock or Data Signal.”
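For a rough sense of scale, here is a small Python sketch built on the textbook sampling-theory bound for a full-scale sine wave of frequency f clocked with RMS jitter σ: SNR = -20·log10(2πfσ). This is a general estimate, not an analysis of the DirectStream DAC. The 20kHz tone, the assumption of random uncorrelated jitter, and the function name `jitter_limited_snr_db` are all illustrative choices.

```python
import math


def jitter_limited_snr_db(tone_hz, jitter_rms_s):
    """Textbook SNR ceiling (in dB) set by random clock jitter on a sine tone."""
    return -20.0 * math.log10(2.0 * math.pi * tone_hz * jitter_rms_s)


# 2 psec of RMS jitter (Smith's figure) on an assumed 20 kHz tone.
snr_2ps = jitter_limited_snr_db(20e3, 2e-12)
# Ten times more jitter costs 20 dB of headroom.
snr_20ps = jitter_limited_snr_db(20e3, 20e-12)
print(f"2 psec:  {snr_2ps:.1f} dB")
print(f"20 psec: {snr_20ps:.1f} dB")
```

Under these assumptions, 2psec of jitter puts the ceiling at roughly 132dB, which is not far above a 120dB noise floor, so picosecond-level jitter is at least in the range where it could matter.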


PS Audio introduced its $5995 DirectStream DAC, using Ted Smith's DAC design, on March 1, 2014.






PS Audio’s $5995 DirectStream DAC for Audiophiles, based on a Spartan-6 FPGA



Here’s a block diagram of what’s going on inside of the DirectStream DAC’s Spartan-6 FPGA:






DirectStream DAC Digital Signal Flow




How good is PS Audio’s DirectStream DAC? Here’s a list of awards it’s won:



  • Digital Product of the Year Stereophile
  • DAC of the Year TAS
  • Stereophile Recommended Component Class A+
  • DAC of the Year Hi-Fi+ 2015
  • Golden Ear Award, TAS 2014, 2015
  • Stereophile Editors’ Choice 2014
  • Digital Audio Review Knock Out award
  • Greatest Bits Award, AudioStream
  • Blue Moon Award,
  • Home Theater Review Best of 2014
  • Positive Feedback Brutus Award
  • Audiophilia Star Component
  • Audio Excellence Award 2015



But PS Audio’s DirectStream DAC isn’t the product that suddenly appeared on my radar and prompted me to write this blog. PS Audio just introduced a cost-reduced version of the DirectStream DAC: the $3999 DirectStream Junior DAC.







PS Audio’s $3999 DirectStream Junior DAC for Audiophiles, based on a Spartan-6 FPGA



Where did the cost reduction come from? The DirectStream Junior DAC uses a simplified, lower-cost chassis; a simplified, less expensive user interface instead of a touchscreen; one smaller, simplified pc board with less expensive output stages (passively filtered amplifiers with direct-drive outputs instead of high-current analog video amplifiers driving a specially designed audio output transformer); and a simplified system power supply—but it has exactly the same Ted Smith DSD audio-processing engine instantiated in exactly the same low-cost Spartan-6 FPGA producing exactly the same output bit stream for the low-pass filter.
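To see why a one-bit stream followed by nothing more than a low-pass filter can carry audio, here is a minimal Python sketch. It is not Ted Smith’s design. It assumes a first-order sigma-delta modulator and a simple moving-average filter (the names `sigma_delta` and `moving_average` are illustrative), whereas real DSD encoders use higher-order modulators running at 2.8224MHz, which is 64 times the CD sample rate.

```python
def sigma_delta(samples):
    """First-order sigma-delta modulator: inputs in (-1, 1) -> stream of +1/-1."""
    bits = []
    integrator = 0.0
    feedback = 0.0
    for x in samples:
        # Accumulate the error between the input and the last output bit.
        integrator += x - feedback
        feedback = 1.0 if integrator >= 0 else -1.0
        bits.append(feedback)
    return bits


def moving_average(bits, window):
    """Crude low-pass filter: average of the most recent `window` bits."""
    out = []
    acc = 0.0
    for n, b in enumerate(bits):
        acc += b
        if n >= window:
            acc -= bits[n - window]
        if n >= window - 1:
            out.append(acc / window)
    return out


# A constant input of 0.25 becomes a bit stream whose density of +1s encodes 0.25.
stream = sigma_delta([0.25] * 4000)
recovered = moving_average(stream, 64)
print(sum(stream) / len(stream))   # close to 0.25
print(recovered[-1])               # also close to 0.25
```

The output bits are only ever +1 or -1, yet their running average tracks the input. That is why the analog side of a DSD DAC can be as simple as a low-pass filter, and why timing errors on those bits translate so directly into audible noise.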


Here’s a photo of the simplified pc board in PS Audio’s DirectStream Junior DAC:





PC Board detail in PS Audio’s DirectStream Junior DAC



So, was anything lost in the cost reduction? According to the Ultra High-End Audio and Home Theater Review:


“The two instruments have near identical character of sound. Full, rich, warm, never electronic. Both units share the remarkable ability of helping Red Book CDs sound close to high resolution audio, and uncover a wealth of music long buried in home libraries.”


Oh, and one more benefit to using an FPGA to implement the DSD audio: you can upgrade the configuration with a USB memory stick.


If you’d like more technical details, click here for PS Audio’s White Paper.



Note: Much of this blog is based on Steve Rochlin’s 30-minute video interview with Ted Smith and Paul McGowan posted on Enjoy the Music.com. Here’s that interview:






Say Hi to ARTY: Win a free ARTY Artix-7 FPGA Dev Kit at Embedded World, February 23 and 24

by Xilinx Employee ‎02-12-2016 09:31 AM - edited ‎02-12-2016 09:31 AM (405 Views)


Do you like to win free stuff? Are you going to the Embedded World show in Nuremberg later this month? How about winning a Digilent ARTY Dev Kit based on a low-power Artix-7 A35T -1LI FPGA that includes a license for a device-locked copy of Xilinx Vivado HLx Design Edition?


Sounds good, doesn’t it? And the entry requirements are pretty simple:


Come to the Xilinx booth at Embedded World in Nuremberg—Hall 1, stand 1-205, take a photo of our booth, and post the picture to the Twitter account @XilinxEMEA along with a catchy caption and the hashtag #SayHiToARTY.


That’s it. That’s your entry to win an ARTY Dev Kit. The winner will be drawn at random at 12.30 pm on each of the competition’s two days from the correctly completed entries.


Note: You must be following @XilinxEMEA on Twitter to win.






The Digilent ARTY Artix-7 FPGA Dev Board




Hope to see you at Embedded World.


And now, the official contest legal boilerplate:



Twitter Competition


The Competition promoter is Xilinx Ireland, Logic Drive, Citywest Business Campus, Dublin 24, Ireland (“Xilinx”).


  1. The Competition will run on two days. It will open on 23 February, 2016 at 9.00am CET and close at 11.30am CET, and will open again on 24 February, 2016 from 9.00am CET to 11.30am CET (the “Competition Period”).


  2. To qualify for entry and to be considered as “Qualifying Entries”, entrants must:


  • come to Xilinx stand Hall 1, stand 1-205;
  • take a photo of the Xilinx booth; and
  • post it to the Twitter account @XilinxEMEA along with a caption and the hashtag #SayHiToARTY.


Your entry will not be considered unless you are following @XilinxEMEA. Incomplete entries and entries that do not satisfy the requirements of entry will be disqualified and will not be considered.


  3. One prize is available on each day of the Competition. The prize is a Digilent ARTY board (the “Prize”). The winner will be drawn at random at 12.30 pm on each day of the Competition, from the Qualifying Entries, by an independent adjudicator appointed in the sole discretion of Xilinx.


  4. There is no entry fee and no purchase necessary to enter this competition.


  5. Winners will be contacted via their Twitter account to arrange pick up of their prize. Xilinx takes no responsibility for postage or shipping of the prize in circumstances where a winner is unable to arrange collection of the prize.


  6. Winners may be required to show valid identification before receiving their prize.


  7. Entrants must be over the age of 18.


  8. Employees of Xilinx Ireland (or the Xilinx group of companies) or their family members shall not be permitted to enter the competition.  Third parties involved in the provision of the competition or coordinating the competition set-up shall not be permitted to enter the competition.


  9. The winners agree and consent to the use of their name, image and nationality for advertising and publicity purposes in local and online media without additional remuneration.


  10. No responsibility can be accepted for entries not received for whatever reason.


  11. To the maximum extent permitted by applicable law, Xilinx shall not be liable for any loss, damage or injury resulting from acceptance, possession, use, or misuse of the Prize, except for any injury to life, body or health, or any loss or damages arising from intentional misconduct or grossly negligent breach of duty by Xilinx.


  12. Xilinx reserves the right to cancel or amend the competition and these terms and conditions without notice in the event of a catastrophe, war, civil or military disturbance, act of God or any actual or anticipated breach of any applicable law or regulation or any other event outside of its control. Any changes to the competition will be notified to entrants as soon as possible by Xilinx.


  13. Entry into the Competition signifies your acceptance of these terms.



It’s been a long, long wait.


Einstein’s theory of general relativity (GR)—first presented in November, 1915—predicted the existence of gravitational waves, which are ripples in the fabric of spacetime that propagate at the speed of light. After 100 years, gravitational waves were the last consequence of Einstein’s GR awaiting experimental verification. Other GR consequences including time dilation, gravitationally-induced frequency shifting of light, gravitational light lensing, and the orbital precession and orbital decay of celestial bodies have all been directly observed. Although the existence of gravitational waves was indirectly confirmed by noting that the observed orbital decay of a binary pulsar was consistent with GR’s prediction of energy loss through gravitational radiation (which causes gravitational waves), direct observation of gravitational waves has eluded scientists… until today’s announcement in Washington, DC made by the LIGO Scientific Collaboration.


Here’s an 8-minute video of the announcement that visually illustrates what the scientists think happened to the black holes:





(The full-length announcement video is here and the actual announcement starts at 27:14.)


The experimental search for gravitational waves to directly verify the last remaining unconfirmed GR prediction started more than half a century ago, during the 1960s. Researchers working on the Advanced LIGO (Laser Interferometer Gravitational-wave Observatory) team announced today that the recently upgraded Advanced LIGO experiment detected the gravitational waves resulting from the cataclysmic collision of two closely orbiting black holes to create one bigger black hole about 1.3 billion light years distant from Earth. Advanced LIGO, called in today’s press conference announcing the findings “the most precise measuring device ever built,” uses a precision timing-distribution system based on Xilinx Spartan-3E FPGAs.


The LIGO project started in 1992 with funding from the US National Science Foundation. Two LIGO observatories were built—one in Hanford, Washington and the other in Livingston, Louisiana. These sites are separated by 3,002 kilometers (1,865 miles)—roughly ten light milliseconds—and that’s important to the story because GR predicts that gravitational waves should travel at the speed of light. So a gravitational wave passing through one LIGO site should arrive at the other site in something like a few milliseconds.
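The “ten light milliseconds” figure is easy to check. The Python sketch below divides the site separation quoted above by the speed of light; the constant and function names are illustrative.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458   # defined value of c
SITE_SEPARATION_KM = 3002.0             # Hanford to Livingston, as quoted above


def light_travel_ms(distance_km):
    """Time for a signal moving at the speed of light to cover distance_km."""
    return distance_km / SPEED_OF_LIGHT_KM_PER_S * 1e3


max_delay_ms = light_travel_ms(SITE_SEPARATION_KM)
print(f"Maximum arrival-time difference: {max_delay_ms:.2f} ms")
```

The result, about 10ms, is an upper bound. It applies only to a wave traveling along the line that joins the two sites; a wave arriving from any other direction reaches the second detector sooner than that.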


Each LIGO observatory consists of an L-shaped tube. Each tubular leg of the “L” is 4km long and the tubes are evacuated (they’re in vacuum). The tubes house laser interferometers capable of detecting slight length differences between the two legs of the L-shaped tube. The theory is that a gravitational wave will alter the fabric of spacetime enveloping the observatory, shrinking one leg of the tube while stretching the other in a cyclic manner as the wave passes. (We’re talking about spacetime distortions predicted to be on the order of 10^-18 m, which is smaller than the width of a proton.) LIGO’s laser interferometer is designed to detect these variations in spacetime, allowing direct observation of gravitational waves.


LIGO operated for eight years, from 2002 to 2010, without observing any gravitational waves. Then the observatories were shut down for a 5-year retrofit to improve the interferometers’ sensitivity by as much as an order of magnitude. The retrofit was finished in mid-September, 2015 and the interferometers went on line with 3x improved sensitivity—10x improvement is the ultimate goal.


Today’s announcement by the LIGO team said that the refurbished LIGO experiments observed gravitational waves on September 14, 2015. First, the LIGO facility in Louisiana detected a gravitational wave. Seven milliseconds later, the Washington facility detected a wave with the same signature, thus observationally confirming the last remaining GR prediction and ending a 100-year search.


As you can see from this very brief description, time and precise time measurement play an important role in the LIGO experiment. The absolute time and the relative arrival times of a passing gravitational wave at the interferometers located at the two observatories are crucial for determining the direction of the celestial source generating the detected gravitational waves. In addition, timing at different parts of each interferometer must be closely synchronized, which is complicated by the large size of the gravitational-wave detectors (4km per leg).
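Here is a hedged Python sketch of how a relative arrival time constrains direction. It treats the wave as a plane wave moving at the speed of light, so the delay Δt between two sites a distance d apart satisfies cos θ = c·Δt/d, where θ is the angle between the source direction and the line joining the sites. The 7ms delay and 3,002km separation come from the text above; the function name is illustrative, and this is simple geometry rather than the LIGO team’s actual analysis.

```python
import math

SPEED_OF_LIGHT_KM_PER_S = 299_792.458


def source_angle_deg(delay_s, baseline_km):
    """Angle between the source direction and the detector baseline."""
    cos_theta = SPEED_OF_LIGHT_KM_PER_S * delay_s / baseline_km
    if abs(cos_theta) > 1.0:
        raise ValueError("delay exceeds the light travel time between sites")
    return math.degrees(math.acos(cos_theta))


angle = source_angle_deg(7e-3, 3002.0)
print(f"Source lies about {angle:.1f} degrees off the baseline")
```

With only two detectors, one delay pins the source to a ring on the sky at that angle around the baseline rather than to a single point. That also shows why timing matters so much: every extra nanosecond of timestamp uncertainty widens the ring.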


The upgraded Advanced LIGO interferometers use a timing-distribution system that’s slaved to GPS time. A collection of master/slave modules distributes UTC-synchronized timing information over an optical fiber network configured in a star topology that runs throughout each facility with enough precision to satisfy experimental needs. Events are time-stamped with 15nsec of overall precision. The timing system also synchronizes ADCs, DACs, and processors in the interferometer detection subsystems. According to published literature, the timing modules are based on Xilinx Spartan-3E FPGAs. For more information about the design of these timing modules, see:




Now it may seem strange to be discussing Spartan-3E FPGAs, which were introduced way back in 2005 based on “absolutely ancient” 90nm IC process technology, but it’s an important aspect of the experimental equipment design when you’re discussing a long-term scientific project initiated in 1992 with a 5-year upgrade that started in 2010. The second paper referenced above is from 2007. So this small aspect of the LIGO project speaks to the long lifespan of Xilinx semiconductor products and to their long-term support, something that’s pretty important to many Xilinx customers like the LIGO Scientific Collaboration, and to Xilinx.




Personal note: The Advanced LIGO interferometer owes a lot to the Michelson Interferometer developed by Albert Michelson in 1881 and the improved interferometer developed by Michelson and Edward Morley in 1887 to detect luminiferous aether, theorized at the time to be the medium that propagated electromagnetic waves through space. No such medium was detected by these experiments, conducted at the Case School in Cleveland (which subsequently became the Case Institute of Technology and is now Case Western Reserve University or CWRU, my alma mater), or by subsequent aether experiments conducted as recently as 2009 using masers, lasers, and optical resonators. (Einstein’s GR neither requires nor precludes aether.) However, these negative findings still don’t prove the non-existence of luminiferous aether, just as a half century of unsuccessful searching didn’t prove the non-existence of gravitational waves. As of today, we know that they do exist. Einstein is proven right once again.


And just because he’s cool, because he explains how an interferometer works, and because he says that the Michelson-Morley experiment transformed experimental science, here’s Neil deGrasse Tyson to explain the experiment:









In a paper published at the recent SC15 (accompanying poster here), Ashish Sirasao, Elliott Delaye, Ravi Sunkavalli, and Stephen Neuendorffer of Xilinx describe their use of the OpenCL language and the Xilinx SDAccel Design Environment to accelerate execution of the Smith-Waterman alignment algorithm, which is used for genome sequencing. Smith-Waterman algorithmic performance is measured in GCUPS (billions of cell updates per second) and, taking a quick shortcut to the reported result, the systolic array architecture implemented for this FPGA-accelerated Smith-Waterman algorithm and instantiated in a Xilinx Virtex-7 690T FPGA on an off-the-shelf Alpha Data ADM-PCIE-7V3 PCIe card runs:


  • 3.9x faster with nearly 19x better performance/W than it does on a 12-core Intel X86 server CPU
  • 6x faster with more than 21x better performance/W than it does on a 60-core Intel Xeon Phi MIC (Many Integrated Core Architecture) coprocessor
  • 30% faster with nearly 12x better performance/W than it does on an nVidia Tesla K40 GPU with 2880 stream processors





Alpha Data ADM-PCIE-7V3 PCIe card based on a Xilinx Virtex-7 690T FPGA




Here are the Smith-Waterman performance results, taken from the SC15 poster:



SDAccel Smith-Waterman Performance from SC15.jpg 



Saying that these performance and performance/W results are significant is putting it mildly.


The diagram below from the SC15 poster shows why the Smith-Waterman algorithm is well-suited to a highly parallel systolic-processing approach:



Smith-Waterman Systolic Processing.jpg 



Of course, large FPGAs like the Xilinx Virtex-7 690T have abundant parallel computing resources so they are adept at implementing highly parallel compute engines such as the systolic array needed to efficiently execute the Smith-Waterman algorithm.
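To make the parallelism concrete, here is a minimal Python sketch of Smith-Waterman scoring that walks the score matrix one anti-diagonal at a time. Every cell on an anti-diagonal depends only on cells from the two previous anti-diagonals, so a systolic array can update a whole anti-diagonal in the same clock cycle. This is not the code from the SC15 paper: the scoring values (match +2, mismatch -1, linear gap -1) and the function names are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score of strings a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    # d = i + j indexes an anti-diagonal; cells sharing d are independent.
    for d in range(2, rows + cols - 1):
        i_lo = max(1, d - (cols - 1))
        i_hi = min(rows - 1, d - 1)
        for i in range(i_lo, i_hi + 1):      # this loop is the parallel one
            j = d - i
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,
                          h[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                          h[i - 1][j] + gap,       # gap in b
                          h[i][j - 1] + gap)       # gap in a
            best = max(best, h[i][j])
    return best


def gcups(len_a, len_b, seconds):
    """Billions of matrix-cell updates per second for one alignment."""
    return len_a * len_b / seconds / 1e9


print(smith_waterman_score("ACGT", "TTACGTTT"))
print(gcups(1000, 1000, 1e-3))
```

In software the inner loop runs serially. In a systolic array each processing element owns one value of `i`, so the inner loop collapses into a single step and throughput scales with the number of cells.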


The authors’ experiments with FPGA-based Smith-Waterman algorithm implementations were multi-dimensional. In one dimension, the experiments determined the optimal number of systolic cells per OpenCL kernel versus the number of instantiated kernel instances needed to obtain maximum algorithmic performance. In this implementation, that number turns out to be 32 systolic cells per OpenCL kernel based on numerical analysis of the results, as shown in the diagram below (taken from the poster).



Smith-Waterman Optimal PE per OpenCL kernel.jpg 


Several more experimental dimensions are represented by performance and performance/W comparisons with the Smith-Waterman algorithm running on the 12-core Intel Xeon CPU, the 60-core Intel Xeon Phi MIC coprocessor, and the nVidia Tesla K40 GPU (as reviewed in the results table appearing a few paragraphs above).


Perhaps the most significant result, however, is not the FPGA implementation’s better performance or even its vastly superior performance/W but its ease of use. This paper demonstrates how you can compile OpenCL code using SDAccel to successfully implement high-performance, low-power systolic arrays on FPGAs, something that was previously possible only by writing RTL code. It’s that sort of result that will put FPGA acceleration into more data centers more quickly than anything else.


Here’s a thumbnail image of the SC15 Poster, which capsulizes the information from the paper:



Smith-Waterman SDAccel Poster.jpg 



If this real-world example has piqued your curiosity about algorithmic FPGA-acceleration or SDAccel, you might want to read:



Yesterday, I wrote a blog about Pentek’s use of the Virtex-6 FPGA as a platform for developing a series of CompactPCI, AMC, PCIe, and VPX modules all supporting the VITA 49 Radio Transport Standard. (See “Virtex-6 FPGA powers Pentek VITA 49 Radio Transport Standard CompactPCI/AMC/PCIe/VPX Modules for SDR.”) As it happens, Pentek has extended that platform design generationally using the Xilinx Virtex-7 FPGA family to create the FlexorSet family of higher-performance analog I/O boards. Currently there are two such boards in the family:




Pentek equipped both FlexorSet boards with four 250Msamples/sec, 16-bit ADCs and two 800Msamples/sec 16-bit DACs. The design employs an FMC mezzanine connector in these applications to reuse one board-level analog converter design across both the VPX and PCIe FlexorSet boards, placing the ADCs and DACs on the Flexor Model 3312 FMC mezzanine card as shown in this photo of the two FlexorSet boards:





The Pentek FlexorSet Model 5973-312 3U VPX and FlexorSet Model 7070-312 PCIe boards




Here’s a block diagram of the Pentek FlexorSet Model 5973-312 3U VPX board:




Pentek FlexorSet Model 5973-312 3U VPX Board Block Diagram




Here’s a block diagram of the Pentek FlexorSet Model 7070-312 PCIe board:





Pentek FlexorSet Model 7070-312 PCIe Board Block Diagram



Note the similarities between the two diagrams and the similarity of the above two block diagrams with the Pentek VITA 49 Radio Transport Standard modules based on the Xilinx Virtex-6 FPGA:







Pentek VITA 49 Radio Transport Standard Module Block Diagram




The similarity runs deeper than the board-level design, reaching down into the FPGA itself as shown in the block diagrams from the datasheets of the two board series:





Pentek VITA 49 Radio Transport Standard Module, FPGA detail







Pentek FlexorSet Board, FPGA detail



These Pentek high-performance, board-level products demonstrate several extremely important factors in FPGA-based system design for extended product families. First, Pentek has used FPGAs to support the development of multi-member product families:


  • The seven-member Vita 49 Radio Transport Standard Module family based on the Virtex-6 FPGA
  • The two-member FlexorSet board family based on the Virtex-7 FPGA


The programmable nature of the Xilinx FPGAs allows Pentek to create one core design that can then be modified to meet different host-bus I/O requirements.


Second, it certainly appears from the FPGA-level block diagrams that Pentek has been able to move much of its key design IP from the company’s Virtex-6 platform to the newer Virtex-7 platform. This sort of IP reuse really accelerates product design and cuts development costs.


Third, use of the newer Virtex-7 FPGA family in the FlexorSet designs permits Pentek to step up to faster ADCs (250Msamples/sec), to add a couple of fast DACs, to support dynamic partial reconfiguration of the FPGA through the company’s GateXpress FPGA Configuration Manager, and to add as many as twelve 1Gbps optical interconnect channels compatible with VITA 66 on the VPX version of the board and brought out on an MTP optical connector on the PCIe version.


As VP and Pentek cofounder Roger Hosking says in a recent interview, “…you can be sure that every few months there’s new technology around to allow you to do those things with greater speeds and greater processing power as well as smaller package sizes and higher densities.”






Brief demo at SC15 by NetCOPE shows the company’s 100G Ethernet boards in action

by Xilinx Employee ‎02-09-2016 10:24 AM - edited ‎02-09-2016 10:26 AM (616 Views)


The 2-minute video below shows two 100G Ethernet cards from NetCOPE Technologies, the 2-port NFB-100G2 and the single-port NFB-100G1, in action. Both boards are based on Xilinx Virtex-7 580T FPGAs, which have 48 on-chip, 13.1Gbps GTH transceivers and more than half a million logic cells, so you can use these boards as development platforms to implement a variety of high-speed networking applications including high-speed packet capture or high-speed electronic trading.







The following video, a scant 64 seconds long, shows an FPGA-accelerated FFmpeg video-scaling app running in real time on an IBM Power8 server. The app was developed using the Xilinx SDAccel Development Environment and runs on an Alpha Data accelerator card plugged directly into the server.






Here’s one very short, 80-second video shot late last year at SC15 showing two machine-learning demos running on Xilinx Virtex-7 FPGAs. The first shows an image-recognition algorithm from Auviz Systems running on the Virtex-7 FPGA. The application is written in OpenCL and was compiled for the Virtex-7 FPGA using the Xilinx SDAccel development environment. The FPGA-accelerated version of the algorithm processes 600 images/sec while consuming 28W. The unaccelerated version of the algorithm running on an X86 CPU processes 125 images/sec while consuming 90W. The Virtex-7 FPGA delivers a 15x performance/W improvement in this demo.
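The 15x figure follows directly from the numbers quoted in the demo. Here is the arithmetic as a short Python sketch; the function name is illustrative.

```python
def images_per_joule(images_per_s, watts):
    """Throughput per watt, which is the same as images processed per joule."""
    return images_per_s / watts


fpga = images_per_joule(600, 28)   # FPGA-accelerated version
cpu = images_per_joule(125, 90)    # unaccelerated x86 version

print(f"FPGA: {fpga:.2f} images/J")
print(f"CPU:  {cpu:.2f} images/J")
print(f"Performance/W improvement: {fpga / cpu:.1f}x")
print(f"Raw speedup: {600 / 125:.1f}x")
```

The improvement combines a 4.8x raw speedup with a power draw that is less than one third of the CPU’s.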


The second Virtex-7 FPGA demo, from MulticoreWare, is a vehicle-detection algorithm. MulticoreWare developed this application for FPGA hosting in just three weeks with no prior FPGA design experience using the SDAccel Development Environment. The FPGA-accelerated version of the algorithm runs about 20x faster than on an unaccelerated x86 CPU.


Here’s the 80-second video:





Virtex-6 FPGA powers Pentek VITA 49 Radio Transport Standard CompactPCI/AMC/PCIe/VPX Modules for SDR

by Xilinx Employee ‎02-08-2016 02:27 PM - edited ‎02-08-2016 03:46 PM (897 Views)


The Pentek Cobalt 71664 is the latest in a series of CompactPCI, AMC, PCIe, and VPX modules: multichannel, 200MHz, 16-bit A/D modules that conform to the VITA 49 Radio Transport Standard for SDR (software-defined radio) applications. The VITA 49 transport protocol conveys time-stamped signal data and meta-data in packetized form, providing an interoperability framework for interfacing RF sensors (radio receivers and transmitters) to processing resources. Prior to the development of the VITA 49 standard, each SDR receiver manufacturer developed custom and proprietary digitized data formats and meta-data formats that made interoperability of data from different receivers impossible, which made life-cycle support of radio architectures problematic.


Each module in the Pentek Cobalt series features four programmable, multiband DDCs (digital down converters) based on the Xilinx Virtex-6 FPGA. The FPGA also gives the board its I/O flexibility, which allows Pentek to easily adapt it to multiple host-bus protocols.






Pentek VITA-49 Radio Transport Standard Module, Model 71664 (XMC version)



Pentek’s current lineup of Cobalt VITA 49 modules includes:




Here’s a block diagram of the Pentek Cobalt board:






Pentek VITA-49 Radio Transport Standard Module Block Diagram



As you can see, the Virtex-6 FPGA ties the four high-speed ADCs to the on-board sample memory. The FPGA performs all of the host-bus I/O in addition to the ADC and memory interfacing. In addition, the FPGA performs a significant amount of per-channel signal processing including:


  • Decimation
  • Power and threshold detection
  • Data packing and flow control
  • Metadata generation
  • VITA 49 formatting and packaging
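For readers unfamiliar with what a digital down converter does, here is a minimal Python sketch of the mixing and decimation steps. It is an illustration only, not Pentek’s implementation. It assumes a 200MHz sample rate (matching the ADC rate mentioned earlier), a complex numerically controlled oscillator, and a simple integrate-and-dump filter, where a production DDC would typically use CIC and FIR filter stages. The function name is hypothetical.

```python
import cmath
import math


def digital_down_convert(samples, sample_rate_hz, center_hz, decimation):
    """Shift center_hz down to 0 Hz, then average and keep 1 of every N samples."""
    baseband = []
    acc = 0j
    for n, x in enumerate(samples):
        # Numerically controlled oscillator: complex tone at -center_hz.
        nco = cmath.exp(-2j * math.pi * center_hz * n / sample_rate_hz)
        acc += x * nco
        if (n + 1) % decimation == 0:
            # Integrate-and-dump: a crude low-pass filter plus decimation.
            baseband.append(acc / decimation)
            acc = 0j
    return baseband


fs = 200e6
# A 50 MHz test tone sampled at 200 Msamples/sec.
tone = [math.cos(2 * math.pi * 50e6 * n / fs) for n in range(64)]
iq = digital_down_convert(tone, fs, 50e6, 8)
print(len(iq), iq[0])
```

Tuning the converter to the tone’s own frequency turns the 50MHz signal into a constant complex value near 0.5 at one eighth of the original sample rate. This per-sample multiply-accumulate work, repeated across four channels at hundreds of megasamples per second, is the kind of load that maps naturally onto FPGA DSP resources.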



Pentek is nice enough to show us exactly what’s happening inside of the Virtex-6 FPGA in this diagram from the datasheet:





Pentek VITA-49 Radio Transport Standard Module, FPGA detail



You simply cannot coax a microprocessor to perform this kind of high-speed signal processing. It takes a programmable device. The design solution also requires flexible, programmable I/O to accommodate the many I/O standards supported by this one multi-product design. This is precisely what the Virtex-6 FPGA provides to Pentek and that is why Xilinx refers to its semiconductor devices as All Programmable: hardware, software, and I/O.


The Virtex-6 FPGAs in these Pentek VITA-49 modules are based on 40nm IC process technology. Since these devices were introduced, Xilinx has introduced and is now shipping three more device generations based on TSMC’s 28nm, 20nm, and 16nm FinFET+ processes. With each new generation, these All Programmable devices offer more programmability, faster performance, lower operating power, and easier-to-use design tools. If you have a tough design challenge ahead of you, it’s worth your time to take a look and see what this technology can do for you.

Adam Taylor’s MicroZed Chronicles, Part 118: VDMA Hardware

by Xilinx Employee on ‎02-08-2016 11:17 AM (1,410 Views)


By Adam Taylor



Now that we have the video test-pattern generator up and running, the next step is to insert the VDMA into our design. We are going to insert the VDMA between the test pattern generator and the AXI-Stream-to-Video-Output block.


The VDMA consists of independent write and read channels, allowing the block to write frames from the input video stream to a memory-mapped location or to read data from memory-mapped locations and send it to the AXI stream output. Typically, a number of frames will be stored in memory-mapped locations so that the video data can be manipulated (if desired) before the frame is output. An example of manipulation would be object-detection algorithms running on the PS (Processor System) side of the Zynq SoC.


We want to transfer the stream data to our DDR memory. To do this, we will connect to the High Performance port on the PS. Both the read and the write channels from the VDMA need to be connected to the High Performance port. We’ll add an AXI Interconnect block between the VDMA and the PS to make the connection.







The processor uses an AXI4-Lite interface to control the VDMA. We’ll connect this interface to the same General Purpose AXI interface that connects the PS to the test pattern generator.


I have kept the clocking architecture in this example very simple. All of the AXI peripherals run from fabric clock 0 at 100MHz. The video timing generator and the video output section of the AXI-Stream-to-Video-Output block run at a 40MHz pixel frequency provided by fabric clock 1.
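As a rough sanity check on these clock choices, the short sketch below compares the pixel data rate on the video side with the peak rate a 24-bit AXI stream could carry at the fabric clock rate. It assumes one 24-bit pixel moves per clock and ignores blanking and handshake stalls; the function name is made up for illustration.

```python
def stream_rate_mbytes_per_s(clock_mhz, bits_per_beat):
    """Peak data rate of a stream moving one beat per clock, in Mbytes/s."""
    return clock_mhz * bits_per_beat / 8

# Video output side: 40MHz pixel clock (fabric clock 1), 24-bit pixels
pixel_rate = stream_rate_mbytes_per_s(40, 24)

# AXI stream side: 100MHz (fabric clock 0), 24-bit stream
axi_rate = stream_rate_mbytes_per_s(100, 24)

# How much faster the AXI side can move data than the display consumes it
headroom = axi_rate / pixel_rate

print(pixel_rate, axi_rate, headroom)
```

Under these assumptions the AXI side has comfortable headroom over the pixel rate, which is why a single, simple fabric clock for all of the AXI peripherals is a reasonable starting point.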


I have also included a number of ILA blocks within the design to enable inspection and to help us understand what is happening within the design:


  • ILA 0 connected to the Video Timing Generator H and V Sync Signals
  • ILA 1 connected to the AXI-Stream-to-Video-Output Syncs
  • ILA 2 connected to the AXI Stream output from the VDMA
  • ILA 3 connected to the AXI Stream to Video output status
  • ILA 4 connected to the AXI stream generated by the test pattern generator


Inserting the VDMA requires that we change the mode of operation on the AXI-Stream-to-Video-Output generator from master to slave.


When it comes to configuring the VDMA module, we need to be sure to set the output stream width correctly (24 bits). I left the number of frame buffers set to three but we can add many more if necessary.
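To get a feel for how much DDR memory the frame buffers need, here is a minimal sizing sketch. The 800x600 resolution is an assumption (it is the common display mode that uses a 40MHz pixel clock, but the resolution is not stated above), and the function name is hypothetical.

```python
def frame_buffer_bytes(width, height, bits_per_pixel, num_frames):
    """Return (bytes per line, bytes per frame, bytes for all frames)."""
    bytes_per_line = width * bits_per_pixel // 8
    frame_bytes = bytes_per_line * height
    return bytes_per_line, frame_bytes, frame_bytes * num_frames

# Assumed 800x600 frame, 24-bit pixels, three frame buffers
stride, frame, total = frame_buffer_bytes(800, 600, 24, 3)
print(stride, frame, total)
```

Adding more frame buffers scales the total linearly, so this is an easy number to re-check whenever the buffer count or resolution changes.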






I also configured the VDMA to allow unaligned transfers. This setting lets the VDMA start transfers at addresses that do not fall on the AXI memory-mapped data-width boundaries.
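A simple way to see when this option matters is to check whether a buffer's start address and line stride are multiples of the memory-mapped bus width in bytes. The sketch below assumes a 64-bit memory-mapped data width, and the helper names are hypothetical; it is an illustration of the alignment idea, not the VDMA's own logic.

```python
def is_aligned(value, bus_width_bits):
    """True if value is a multiple of the bus width expressed in bytes."""
    return value % (bus_width_bits // 8) == 0

def needs_unaligned_support(start_addr, stride_bytes, bus_width_bits=64):
    """True if either the start address or the line stride is unaligned."""
    return not (is_aligned(start_addr, bus_width_bits)
                and is_aligned(stride_bytes, bus_width_bits))

# 800 pixels x 3 bytes = 2400-byte lines, buffer on an aligned address
aligned_case = needs_unaligned_support(0x10000000, 800 * 3)

# 1366 pixels x 3 bytes = 4098-byte lines, not a multiple of 8
odd_width_case = needs_unaligned_support(0x10000000, 1366 * 3)

# Aligned line length, but the buffer starts 3 bytes off a boundary
odd_start_case = needs_unaligned_support(0x10000003, 800 * 3)

print(aligned_case, odd_width_case, odd_start_case)
```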







The final step was to connect the VDMA and Test pattern generator interrupts to the PS. I decided to use the shared PPI interrupts for this purpose and therefore added a concatenation block to combine the three interrupt sources and supply them to the Zynq interrupt port.
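When the software is written, each input of the concatenation block needs an interrupt ID. The sketch below maps a bit position on the Zynq IRQ_F2P port to a GIC interrupt ID, based on the Zynq-7000 interrupt map as I understand it (IRQ_F2P[7:0] on IDs 61 to 68 and IRQ_F2P[15:8] on IDs 84 to 91). The source names and their order on the concat block are assumptions for illustration; check the IDs against the generated xparameters.h for your own design.

```python
def irq_f2p_to_gic_id(bit):
    """Map an IRQ_F2P bit position (0-15) to its GIC interrupt ID."""
    if 0 <= bit <= 7:
        return 61 + bit
    if 8 <= bit <= 15:
        return 84 + (bit - 8)
    raise ValueError("IRQ_F2P has only 16 inputs (bits 0 to 15)")

# Hypothetical order of the three sources on the concat block inputs
sources = ["vdma_mm2s", "vdma_s2mm", "test_pattern_generator"]
irq_ids = {name: irq_f2p_to_gic_id(bit) for bit, name in enumerate(sources)}
print(irq_ids)
```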


With all of this implemented, the next step is to look at how we write software to get the VDMA up and running and outputting frames.





Complete block diagram






Implemented design





The code is available on Github as always.


If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.




  • First Year E Book here
  • First Year Hardback here.




 MicroZed Chronicles hardcopy.jpg



  • Second Year E Book here
  • Second Year Hardback here



 MicroZed Chronicles Second Year.jpg




You also can find links to all the previous MicroZed Chronicles blogs on my own Web site, here.





Xilinx Fellow Steve Trimberger has been elected to the National Academy of Engineering’s (NAE) membership in honor of his contributions to FPGA architecture and programming tools. The non-profit NAE’s mission is to promote engineering as a vibrant profession and to advise the US Federal government on engineering and technology topics. Trimberger, already an IEEE and ACM Fellow, joined Xilinx in time to contribute to the design of the third major Xilinx FPGA architecture, the XC4000, and he led the team that developed the XACT design-automation software for that family. (That was so long ago that separate software versions for the IBM PC and the Apollo, Sun-4, and HP 9000 Series 700 “Snakes” workstations were needed.)


According to “The Big Board” in the hallway at Xilinx HQ that I consult for these historic blog posts, the XC4000 is the device that “put Xilinx on the map.” I called Steve Trimberger for more explanation. He said that the XC4000 FPGA was the first to have on-chip user memory in the form of dual-ported LUT RAM and the first to have dedicated arithmetic logic (fast carry/borrow chains), which greatly increased the FPGA’s computation speed. The XC4000 series was also the first to be specifically designed for automated placement and routing.


Steve explained that earlier FPGAs had rather ad hoc routing design because the chip designers knew that the devices would be configured manually. The XC4000 FPGA design had more routing resources; those resources were more regular in their design; and there was a formal routing hierarchy, making it possible for the software team to create the first practical automated place-and-route software for FPGAs. That level of automation was required as the XC4000 FPGA gate capacity advanced the state of the art from about 5K gates to around 100K gates—making system-level, FPGA-based design feasible for the first time. As a result of these innovations, Steve says that the XC4000 series was the first FPGA family used to implement soft microprocessor designs.


Since then, Steve has made numerous, major contributions to the industry as evidenced by the more than two hundred patents granted to him in the field during his tenure at Xilinx. He’s especially proud of the fact that many of those patents represent fundamental innovations that have become standard practice and have become ubiquitous features in today’s FPGAs.


You can always learn a lot about programmable logic and engineering just by hanging around with Steve. For example, here’s a 40-minute presentation he recorded at the University of Toronto titled “The Three Ages of the FPGA” that covers the history of programmable logic:





Please join everyone here at Xilinx in congratulating Steve for this latest honor.



The latest FPGA-compatible TICO encoder and decoder compression IP cores from intoPIX support 4K/UHD video—they’re also 8K-capable—with a 4:4:4 color space (8-, 10-, or 12-bit color depth) using a variable, visually lossless data-compression ratio as large as 4:1, which allows you to transport 4K/UHD video over 3G-SDI or 10G Ethernet links. The “new” thing for these cores is the 4:4:4 color-space support. These IP cores work with all 7 series and newer Xilinx All Programmable devices as well as the Spartan-6 and Virtex-6 FPGA families. Spartan-6 and Artix-7 devices support 60p frame rates. Virtex-7, Kintex-7, Zynq-7000, and all UltraScale devices support 120p frame rates. Compression latency is said to be only a few microseconds. The company also says its new TICO cores “have already been integrated into top of the line product offerings from key players of the industry.” Translation: They are already production-proven.
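To see why a 4:1 ratio matters, here is a back-of-the-envelope calculation of the raw and compressed bit rates for one example format. It counts active pixels only and ignores blanking and link overhead; the 60p, 10-bit format chosen here is just one of the supported combinations, and the function name is made up.

```python
def video_gbps(width, height, fps, bits_per_component, components=3):
    """Active-pixel video bit rate in Gbps (4:4:4 means 3 full components)."""
    return width * height * fps * bits_per_component * components / 1e9

# 4K/UHD, 60 frames/s, 10 bits per component, 4:4:4
raw = video_gbps(3840, 2160, 60, 10)

# Apply the largest quoted compression ratio of 4:1
compressed = raw / 4

print(round(raw, 2), round(compressed, 2))
```

In this example the uncompressed stream is too large for a 10G Ethernet link while the compressed one fits with room to spare. The same function can be used to check other frame rates and bit depths against a given link's capacity.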


If you are heading to next week’s ISE Show in Amsterdam, stop by the intoPIX booth to see these cores in action.


For more information, contact intoPIX directly.




The Barco Silex Viper-HV-4K OEM board for 4K HDMI transport over IP compresses 4K/UHD video using hardware-based, real-time SMPTE 2042 VC-2 LD High Quality (visually lossless) video compression and sends it over 1Gbps Ethernet, allowing pro-AV system developers to create point-to-point unicast or multicast distribution networks with low-cost, widely available IP networking equipment and standard, low-cost Ethernet cables.



Barco Silex Viper 4K-UHD Video over IP Board.jpg


Barco Silex Viper 4K/UHD Video-over-IP Transport Board for pro-AV OEMs



There are two variants of the board: the Viper-HV-4K-TX for converting 4K/UHD HDMI video into a compressed-video Ethernet stream and the Viper-HV-4K-RX for converting the compressed Ethernet video stream back into 4K/UHD HDMI video. Here’s a simple diagram showing you how these boards play together in an Ethernet network:



Barco Silex Viper Board Networking.jpg




The boards can be powered using standardized IEEE 802.3af Power-over-Ethernet (POE) so a standard industrial POE networking switch can power the boards, which means you don’t need additional external power supplies at the network endpoints to operate these boards.


Applications for the Viper boards include:


  • Video conferencing
  • Residential Audio/Video distribution
  • Digital signage
  • Video wall
  • Real-time local video network
  • HDMI extender
  • HDMI capture and transmission to server
  • VC-2 LD encoding/decoding



A Web-based GUI supplied with the Viper board supports real-time video upscaling, downscaling, and cropping to match the receiving displays’ requirements. It can also insert a logo, an image, or crawling text on top of the video content with many configurable parameters including the position and transparency of the logo, the font size, color, and speed of the text crawl.


The programmable FPGA fabric in a Zynq SoC handles the real-time video/audio processing while the Zynq SoC’s on-chip, dual-core ARM Cortex-A9 MPCore processor handles the board’s control and configuration. Barco Silex is willing to customize both the FPGA video-processing hardware configuration and the code running on the Zynq SoC’s ARM processor to meet specific customer requirements. The Zynq SoC’s unique hardware, software, and I/O programmability make this customization possible. For example, Barco Silex has security and cryptographic expertise that the company can apply to the video processing taking place on the board.


Contact Barco Silex directly for more information about the Viper-HV-4K 4K/UHD Video-over-IP Transport Board.



Yesterday, Xilinx announced that Xilinx Technology Ventures—the company’s investment arm, which identifies and invests in early-stage companies with innovative new technologies—has launched a Data Center Ecosystem Investment Program to accelerate the adoption of Xilinx All Programmable devices and targeted development tools such as SDAccel by data-center and cloud application developers. The first technology company to receive funds under this new program is TeraDeep, a pioneer in neural-network programming and deep learning.


Xcell Daily discussed TeraDeep a couple of years ago. (See “Teradeep gets a better than 8x GOPS/W advantage for real-time, image-recognition engine implemented with FPGA in the Zynq SoC.”) Back then, performance data indicated that TeraDeep’s FPGA-based convolutional network implementation delivered more than 8x better GOPS/W performance compared to microprocessors and GPU accelerators performing the same work. TeraDeep’s Web site currently says “up to 5x” the throughput at perhaps 10% (or less) of the power consumption. That’s a potential 50x improvement in GOPS/W if my multiplication still serves. Perhaps Microsoft need not locate its future data centers underwater (for cooling) after all.
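The multiplication is easy to check. In the sketch below, the gain in GOPS/W is simply the throughput ratio divided by the power ratio; the function name and the choice to express power as a percentage are only for illustration.

```python
def gops_per_watt_gain(throughput_ratio, power_percent):
    """Efficiency gain when throughput scales by throughput_ratio and
    power drops to power_percent (in %) of the original."""
    return throughput_ratio * 100 / power_percent

# "Up to 5x" the throughput at 10% of the power consumption
gain = gops_per_watt_gain(5, 10)

# The "(or less)" case: the same throughput at 5% of the power
gain_at_5_percent = gops_per_watt_gain(5, 5)

print(gain, gain_at_5_percent)
```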


No less a luminary than Electronics Weekly’s Components Editor and prolific industry blogger David Manners has weighed in on the announcement. As usual, my friend David wrote some things I can’t:



Cute Move By Xilinx To Boost Data-Centre ICs


“Xilinx has announced a clever wheeze to counteract the Intel-Altera combo effect in data centre chips. When Intel realised that FPGA technology could boost its server chips it bought Altera for $16 billion.


“Not having $16 billion, but wanting to boost its server chips with FPGA technology, Applied Micro formed a technology development partnership with Xilinx.


“Qualcomm has done the same.


“Other makers of ARM-based server chips have also beaten a path to Logic Drive for the enhancements which FPGA can bring to server chips.


“The initial idea of Qualcomm, AMCC and others is, like Intel/Altera, to package an FPGA with a server processor in the same package and then to move to integrating them both on the same die.


“Now, Xilinx has set up something called its 'Data Center Ecosystem Investment Program' to invest in start-ups which can contribute to Xilinx's data centre chips.


“Xilinx is now itself beating paths to doors - to the doors of any company which can improve its data centre ICs.


“First to get a knock on the door is TeraDeep - a kind of AI company which has acceleration technology which runs on Xilinx FPGAs.”


[Note: Logic Drive is the name of the road into Xilinx’s San Jose campus.]



logic drive sign.jpg



You generally do not get competitive information in the Xcell Daily blog for one simple reason: as a Xilinx employee, there’s absolutely nothing I can write about the competition that you will find credible. So, I don’t bother to write it.


However, if someone else writes it, well that’s a different story.


Today, Kevin Morris over at EE Journal published an article that discusses the competitive situation in programmable logic following Xilinx’s recent announcement regarding first customer shipments of Virtex UltraScale+ devices based on TSMC’s 16nm FinFET+ process. (See the Xcell Daily blog post “Xilinx announces first customer shipment of Virtex UltraScale+ devices based on TSMC’s 16nm FF+ process” for more information regarding that announcement.)


The title of Morris’ article is “Xilinx 1, Intel 0” and there’s really nothing left for me to add to that. Read his article and see what he has to say.



The Xilinx IDF (Isolation Design Flow) allows you to implement hardware designs for applications requiring both physical and logical isolation of security- and safety-critical modules. An example of such an application is a Single-Chip Cryptography (SCC) design. Many designers still think that it’s necessary to implement such designs in multiple devices to get absolute isolation but this is not strictly true. Using tools included in the Xilinx Vivado HLx Design Suite editions, you can develop single-chip designs where critical modules are logically, physically, and provably separated, as shown in the example graphic below:



Isolated Design Example Block Diagram.jpg



The Vivado tools you’ll use include IP Integrator, the TCL scripting tool and scripts for floorplanning, and the Vivado Isolation Verifier.


I’m not going to explain the procedures here in this blog because there’s a new and very readable, 45-page application note—XAPP1256 titled “Zynq-7000 AP SoC Isolation Design Flow Lab (Vivado Design Suite 2015.2)” by Ed Hallett, a Xilinx Staff Applications Engineer—which explains the process in detail. Download it and take a look if this is the sort of design you need to do. Also, if you don’t happen to be using the Zynq SoC, that’s OK. The process applies to all Xilinx 7 series devices.


Note: This application note also references XAPP1222 titled “Isolation Design Flow for Xilinx 7 Series FPGAs or Zynq-7000 AP SoCs (Vivado Tools),” also written by Ed Hallett.






In two days, Doulos will be conducting two free, 1-hour training sessions titled “Understanding the IP Flow in Vivado.” There’s one session conveniently timed for you if you live in the UK, Europe, or Asia. There’s another for you if you’re in North America (or South America, I suppose).


The Webinar will cover:


  • Customizing IP for the Vivado IP Catalog
  • Generating output products using Vivado
  • Instantiating IP described in Verilog or VHDL



One key word here: “FREE”


Sign up here, now.





The Skreens Plus HDMI video switcher/combiner is already a big winner, having garnered nearly half a million dollars in startup pledges on Kickstarter (see “Zynq-based Skreens Nexus on Kickstarter gets 1068 backers with $390K pledged. 5 days left in the funding campaign!”) and last month the product took home another couple of awards from the CES 2016 show in Las Vegas:




skreens nexus with tv.jpg


Skreens Nexus HDMI video switcher/combiner




The Skreens Nexus is based on a Xilinx Zynq-7045 SoC.


Also, see “Zynq-based Skreens HDMI video switcher/combiner goes from huge Kickstarter success to CES.”





NASA’s JPL (Jet Propulsion Laboratory) has just released a thrilling flyover video animation based on a 3D model of the Dwarf Planet Ceres. (You can see and play with a 3D model of Ceres here.) Dawn's Framing Camera shot these images while the spacecraft was in a high-altitude mapping orbit. The false-color movie and the 3D model are based on images taken by the Dawn spacecraft. Dawn's Framing Camera team at the German Aerospace Center (DLR) created the 3D model and this video. (More information on the creation of 3D asteroid models from images here.)


Here’s the Ceres flyby video:





The Dawn spacecraft’s Framing Camera is a multi-talented refractive telescope with a 1024x1024-pixel CCD imager capped with an electronic shutter. There are two identical Framing Cameras in the spacecraft for redundancy and each camera is controlled by its own FPGA-based DPU (data processing unit) developed at IDA (the Institut für Datentechnik und Kommunikationsnetze at TU Braunschweig) in Germany. A space-grade, radiation-tolerant 90nm Xilinx Virtex-4QV FPGA implements the Framing Cameras’ major DPU functions.


For more information about the use of an FPGA in the Dawn framing camera imaging system, see “Visit to a small planet: NASA’s Dawn spacecraft sends video postcard from Ceres in the asteroid belt.”



Saanlima’s Pepino is an entry-level FPGA development board based on a Xilinx Spartan-6 LX9 FPGA. It costs just $99.95 when loaded with 1Mbyte of SRAM and an upgraded version with a larger Spartan-6 LX25 FPGA and 2Mbytes of SRAM costs $139.95—not much more. Pepino includes functions and features typically needed to implement computing and gaming platforms including fast/wide SRAM, PS/2 keyboard and mouse connectors, a VGA connector, an SD card socket, and a dual-channel audio connector. Critically, the board includes a built-in JTAG-over-USB programmer that’s directly supported by Xilinx development tools (iMPACT, ChipScope, SDK etc.) and 3rd-party JTAG programming tools without the need for an external (and relatively expensive) JTAG cable.



Here’s a photo of the board:



Saanlima Pepino.jpg


Saanlima Pepino board based on a Xilinx Spartan-6 FPGA



The Pepino board is specifically designed to implement and run Niklaus Wirth’s Oberon RISC system (Project Oberon) and includes the required 32-bit, fast SRAM not typically found on a low-end FPGA development board. However, you can certainly use this board as a low-end, general-purpose FPGA development board or FPGA trainer.


If all of this dimly rings a bell, see “Oberon System Implemented on a Low-Cost FPGA Board,” which describes Saanlima’s slightly more expensive Pipistrello board, based on a Spartan-6 LX45.


Just how potent are the Xilinx multi-Gbps GTY transceivers? A new 4.5-minute video by Xilinx’s master SerDes showman and Transceiver Technical Marketing Manager Martin Gilpatric shows two separate demos using the GTY transceivers on a 20nm Xilinx Virtex UltraScale FPGA to successfully drive 100Gbps over a 100G backplane and over four lanes of 5m Twinax copper cable, conforming to 100GBASE-KP4 and 100GBASE-CR4 for backplane and Twinax cable respectively, with as much as 35dB of loss. The GTY transceiver’s low-jitter, high-speed clocking and auto-adaptive equalization make this robust performance possible.
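To put that loss figure in perspective, the sketch below converts a loss in dB into the fraction of signal amplitude that survives the channel. It assumes the figure is a channel insertion loss using the usual 20·log10 voltage convention; the function name is hypothetical.

```python
def loss_db_to_amplitude_ratio(loss_db):
    """Fraction of the transmitted voltage amplitude left after loss_db of loss."""
    return 10 ** (-loss_db / 20)

ratio = loss_db_to_amplitude_ratio(35)
print(f"{ratio:.4f} of the amplitude remains ({ratio * 100:.2f}%)")
```

Under that assumption, less than 2% of the transmitted amplitude reaches the receiver, which is why the equalization matters so much.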


Here’s the video:





Xilinx 20nm Virtex UltraScale FPGAs are already in volume production so you can immediately start using them to lower the cost of systems running at 100Gbps by replacing the expensive optical links with less-expensive, direct-attach copper cables. High-speed backplanes and Top-of-Rack switches are ideal targets for this sort of cost reduction.


Note: The same holds true for 1- and 2-lane systems running at 25 and 50Gbps.


And what about those 16nm Virtex UltraScale+ FPGAs with the 32.75Gbps GTY transceivers that started to ship last week? (See “Xilinx announces first customer shipment of Virtex UltraScale+ devices based on TSMC’s 16nm FF+ process” and “Remember that 16nm FinFET Virtex UltraScale+ FPGA shipped last week? The GTY SerDes are running at 32.75Gbps…”.) Stay tuned.




Last Friday, Xilinx announced customer shipments of the first 16nm Virtex UltraScale+ FPGAs based on TSMC’s FinFET+ technology. (See “Xilinx announces first customer shipment of Virtex UltraScale+ devices based on TSMC’s 16nm FF+ process.”) One significant feature of these new devices is their 32.75Gbps GTY SerDes ports. These GTY ports are not like the 28.05Gbps SerDes ports on the Xilinx Virtex-7 580T and 870T, which use 3D assembly to add the high-speed ports to 28nm silicon. These 32.75Gbps GTY ports are designed right into the 16nm FinFET+ devices, on the same die.


And these GTY ports actually do work at 32.75Gbps. Already.


The transmitter eye tells all in this screen capture from a new Virtex UltraScale+ FPGA video:



Virtex UltraScale Plus GTY Eye.jpg 



Minutes after the transceiver lab received its first Virtex UltraScale+ board, it had:


  • The GTY PLLs locking
  • The GTY transmitter sending bits
  • The GTY receiver recovering those bits



The 2.5-minute video shows the Virtex UltraScale+ GTY ports operating successfully at 32.75Gbps through an attenuating medium that includes board-to-board connectors with a total of 25dB of loss and in the presence of noise (operating for three days, so far, in the video) without bit errors. Low-jitter transmitters and auto-adaptive equalization make this possible.


Here’s the video:




By Adam Taylor


I was hoping this week to focus on how to use the VDMA. However, a few things happened as I worked towards this and they demand a blog in their own right. My end goal is to create an imaging system with the ZedBoard Embedded Vision Kit using SDSoC. This means that the hardware platform built in Vivado has to be in an SDSoC-compatible version.


Until recently I was using SDSoC V2015.2.1, having upgraded in the last week to Vivado and SDSoC 2015.4. This upgrade brought about several IP upgrades that required some changes to the previously developed software. As a result, I did not see what I expected on the VGA monitor when I first built the system using the upgraded tools and IP.


The biggest change was that the new system uses software to configure the test pattern generator, which makes it more complicated to simulate. I know we can use Bus Function Models to simulate the Zynq interface. However, I wanted to troubleshoot both my software and hardware at the same time (because I only have one day a week to write these blogs and, to get it working, I typically write a blog on a Sunday). Therefore, the best way to troubleshoot is to have the actual hardware in the loop during the test so that I can run my software on real hardware.


We can do this very simply within the Vivado environment by making use of the Integrated Logic Analyzer (ILA) IP block, which can be attached either to signals and buses or to internal AXI interfaces. Once these analyzers have been inserted, we can not only monitor them within Vivado but we can also use SDK to download our software application and run that too. Therefore, we can use the ILA together with the software debugger (breakpoints, watching memory, etc.) to find the root cause of our problem.

The ILA block is available from the Vivado IP catalog and we add it to our design like we would any other IP module. We have the option of monitoring AXI interfaces (and which type of AXI4—Lite, streaming, etc.) or simpler bits and buses. We configure these choices by double clicking on the block and customizing as required.


To help me debug the test pattern design, I added three ILA blocks to monitor the outputs of the Video Timing Generator and the AXI-Stream-to-Video outputs. I also added an ILA to monitor the AXI stream generated by the test pattern generator as shown below:





AXI Stream monitoring ILA






Timing Generator and Video Output ILA



After I inserted the required ILAs into the design, I regenerated the output products for the new design, implemented it, and generated the bit file. I then opened the hardware manager in Vivado and also opened SDK. We need both going forward.


Within the Vivado hardware manager, I programmed the ZedBoard’s Zynq SoC with the bit file over the JTAG cable. The hardware manager also identifies and loads an LTX file that contains information about the signals being monitored by each ILA. Within the hardware manager, you will see an ILA window for each ILA in your design. These windows enable you to set triggers and analyze the results.





ILA window open and triggered before the SW application was executed



However, it will do nothing until we run some software. In this case, we can do so very simply with SDK. We need to create a new debug configuration over GDB that only downloads the application and not the FPGA configuration file, as shown below:




Creating the Debug Application



Once that is complete, we can launch the debugger and run the software as we desire on the Zynq SoC. We can add breakpoints and pause execution as required to investigate any issues we see.







At the same time, Vivado’s hardware manager allows us to configure the ILAs so that they trigger when software-generated events occur. That way, we can check whether or not they do occur. Depending upon what we have connected to the ILA, we can also monitor events that are asynchronous to the software.
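If the idea of a trigger is new, the following toy model may help. It is plain software and not how the ILA itself is built or configured: it scans a list of sampled AXI stream signals, keeps a little history, and returns the window of samples around the first one that matches the trigger condition. The trigger used here (tvalid and tuser both high) assumes the usual AXI4-Stream video convention where tuser marks the start of a frame; all names are hypothetical.

```python
from collections import deque

def capture_window(samples, trigger, pre=2, post=2):
    """Return the samples around the first one that satisfies trigger.

    Keeps `pre` samples of history and records `post` samples after the
    trigger fires. Returns None if the trigger condition never occurs.
    """
    history = deque(maxlen=pre)          # rolling pre-trigger buffer
    stream = iter(samples)
    for sample in stream:
        if trigger(sample):
            window = list(history) + [sample]
            for _ in range(post):
                following = next(stream, None)
                if following is None:    # ran out of samples
                    break
                window.append(following)
            return window
        history.append(sample)
    return None

def start_of_frame(sample):
    """Trigger condition: a valid beat with the start-of-frame flag set."""
    return sample["tvalid"] == 1 and sample["tuser"] == 1

idle = {"tvalid": 0, "tuser": 0, "tdata": 0}

# Four idle cycles, then a frame starts once the software enables the source
trace = [dict(idle) for _ in range(4)]
trace.append({"tvalid": 1, "tuser": 1, "tdata": 0xFF0000})
trace += [{"tvalid": 1, "tuser": 0, "tdata": 0xFF0000} for _ in range(3)]

window = capture_window(trace, start_of_frame)

# If the software never starts the stream, the trigger never fires
never = capture_window([dict(idle) for _ in range(8)], start_of_frame)

print(len(window), never)
```

A capture that never triggers is as informative as one that does: it tells us that the software-generated event we expected did not reach the hardware.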





Setting the trigger on the AXI Stream monitor






Triggering on the AXI Stream following the issue of the software run command with the application executing on the Zynq SoC




Following this approach enabled me to quickly identify the cause of the problem: the AXI-Stream-to-Video-Output IP was not locking correctly.


This approach also enabled me to see that the test pattern generator was correctly generating its output under software command.


Sorry for the diversion but I think it was a rather useful distraction.




The code is available on Github as always.






If you need to implement HDMI 1.4 or 2.0 transmitters or receivers, then the new Xilinx App Note XAPP1275, “HDMI 2.0 Implementation on Kintex UltraScale FPGA GTH Transceivers,” is for you. This 30-page app note tells you how to implement HDMI ports using the 16.3Gbps Kintex UltraScale GTH SerDes transceivers and several pieces of Xilinx LogiCORE IP and the Xilinx MicroBlaze soft processor core. The app note gives you a reference design, with downloadable code, that will serve as a starting point for your own design. Here’s a block diagram of that reference design targeting the KCU105 Kintex UltraScale Eval Kit:



HDMI Reference Design.jpg

Xilinx announces first customer shipment of Virtex UltraScale+ devices based on TSMC’s 16nm FF+ process

by Xilinx Employee ‎01-28-2016 07:15 AM - edited ‎02-01-2016 02:58 PM (4,768 Views)


Xilinx just announced first customer shipments of high-end Virtex UltraScale+ devices based on TSMC’s 16nm FF+ (enhanced FinFET) process technology. As TSMC states on its 16FF+ Web page, “… the 16nm technology offers substantial power reduction for the same chip performance.” The Virtex UltraScale+ FPGAs are Xilinx’s biggest system chips yet, with big-system features including several million logic cells; as much as 455Mbits of on-chip Block RAM including UltraRAM (really big Block RAM); integrated 100G Ethernet MAC with RS-FEC, 150G Interlaken, and PCIe Gen3 x16 and PCIe Gen4 x8 cores; and as many as 128 32.75Gbps SerDes ports with an aggregate bandwidth that tips the scales at a mind-boggling 8.4Tbps.
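The aggregate bandwidth figure can be reproduced with a couple of lines of arithmetic. The sketch below assumes the 8.4Tbps number counts both the transmit and receive directions of every port, which is the reading that makes the quoted figures agree.

```python
ports = 128
rate_gbps = 32.75

# Bandwidth in one direction across all ports, in Gbps
one_direction_gbps = ports * rate_gbps

# Counting transmit and receive together, converted to Tbps
bidirectional_tbps = 2 * one_direction_gbps / 1000

print(one_direction_gbps, bidirectional_tbps)
```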


Think these chips are “glue”? Fuhgettaboutit!


These chips are for designing and building large, robust systems with leading-edge performance (2x to 5x more system-level performance/Watt than you get with 28nm devices). Take a look at some of the reference designs listed on the Virtex UltraScale+ Web page:



That first design, the 24-channel Radar beamformer? You get a 66% reduction in BOM cost and a 66% reduction in total power consumption by switching from a Kintex UltraScale device to a Virtex UltraScale+ device. Those figures alone should make you want to take a look. (Caveat: Your mileage may vary. Every system is different.)


Today’s announcement also talks about this Virtex UltraScale+ device shipment being another proof point, representing “three consecutive generations of leadership technology at 28nm, 20nm, and now at 16nm.” Frankly, that’s from an IC-design perspective. As a system designer, I prefer to think of it this way—today’s announcement is about shipping the first devices in the ninth consecutive, distinct product line that Xilinx has developed with TSMC using three leading-edge IC process generations. Those nine distinct product lines include seven FPGA families, the groundbreaking Zynq All Programmable SoC family, and the even-more-capable Zynq UltraScale+ MPSoC family:



Although it goes without saying, I’ll say it anyway: This sort of thing just does not happen without an immensely strong semiconductor manufacturing partner and that phrase aptly describes TSMC, which has helped Xilinx pull in schedules and ship devices early—beating schedules that were already overly optimistic.


Today’s announcement also says (so I’m not giving anything away here) that more than 100 Xilinx customers are already actively developing designs based on devices in the three UltraScale+ device portfolios. That’s a better, more credible way of telling you that there is already tool support for UltraScale+ devices baked into the latest shipping version of the Vivado Design Suite HLx editions (2015.4), which include Vivado HLS (high-level synthesis).


It’s also a way of letting you know that if you’re not already designing your next system with Xilinx UltraScale+ devices, then it looks like you could already be trailing your competition. ‘Nuff said.



Xcell Daily has covered several FPGA Prototyping boards based on the largest Xilinx FPGAs and now you can add the UltraScale XCVU440 FPGA Module from proFPGA to that list. As the name says, this board is based on a Xilinx Virtex UltraScale VU440 and the prototyping board has a rated capacity of 30M ASIC gates. This FPGA Prototyping board offers ten extension sites with as many as 1327 user I/Os for daughter boards (e.g. memory boards, interface boards), interconnecting cables, or customer-specific application boards and works in combination with proFPGA’s uno, duo, or quad motherboards, which respectively carry one, two, or four members of the growing family of proFPGA FPGA Prototyping boards. (That’s 120M ASIC gates worth of prototyping capacity when you fully load a quad motherboard with four UltraScale XCVU440 FPGA Modules!)



proFPGA UltraScale VU440 FPGA Prototyping board.jpg



proFPGA’s UltraScale XCVU440 FPGA Module



The proFPGA system architecture employs a modular and scalable system concept. The FPGAs are assembled on dedicated FPGA modules, which you then plug into a proFPGA motherboard. The full proFPGA product series consists of the three motherboards (uno, duo, and quad); different kinds of FPGA Modules based on various programmable devices including Xilinx Virtex UltraScale and Virtex-7 FPGAs and the Xilinx Zynq-7000 SoC; a set of interconnection boards and cables; and various daughter boards such as DDR3/DDR4 memory boards and high-speed interface boards (PCIe, USB 3.0 and Gigabit Ethernet).


The proFPGA UltraScale XCVU440 FPGA Module connects to the motherboard through four high-density connectors, as shown in the block diagram below:



proFPGA UltraScale VU440 FPGA Prototyping board block diagram.jpg



proFPGA’s UltraScale XCVU440 FPGA Module Block Diagram



A design suite called proFPGA Builder provides the comprehensive development environment you need to create and run your designs. The proFPGA software automatically detects the physical board assembly and generates the complete code framework for multi-FPGA HDL designs, including scripts for simulation, synthesis, and running your design.




Note: For more information about the Zynq-based proFPGA FPGA Prototyping module, see “Zynq-based prototyping module accelerates ARM-based SoC development.”


The VMEbus—which dates back to the earliest days of 16-bit microprocessors—started life as the Eurocard version of Motorola’s VERSAbus, designed explicitly for Motorola Semiconductor’s 68000 microprocessor. The “VME” in VMEbus stands for “VERSA module European,” the VERSAmodule being the first (and less successful) board standard for 68000-based systems. Many board-level vendors jumped onto the VMEbus but it soon became clear that the 68000 microprocessor had a serious competitor in the Intel x86 processor dynasty and, as a consequence, VMEbus board vendors needed a way of adapting Intel’s PCI bus to the VMEbus. Bus-bridge chips filled this need and one of the leading semiconductor vendors of bridge chips, Tundra in Canada, introduced a first-generation PCI-to-VME bridge chip in 1997. That was nearly two decades ago. Since then, the VMEbus has evolved into VME64.


IDT acquired Tundra in 2009 along with its second-generation PCI-X-to-VME64 bridge chip, the Tsi148, which IBM manufactured at its Essex Junction fabrication facility. Then, in 2014, IBM decided to get out of the IC foundry business and the fate of the Tsi148 bridge chip was sealed. IDT announced end of life for the second-generation bridge chip in 2014 and production ceased in 2015, which could have ended the life of a lot of VME64 boards.


Except that’s not what happened.


X-ES (Extreme Engineering Solutions), an active VME64 board maker, decided that it did not want to curtail its successful VME64 product line. It also did not want to design an ASIC replacement for the IDT/Tundra bus bridge chip for the obvious reasons (design and NRE costs, risk, and schedule). Instead, X-ES selected the Xilinx Artix-7 FPGA as a target for a redesigned bus bridge chip (simply named the VME Bridge) and, thanks to the radically expanded resources available in the 28nm All Programmable device, was able to replicate all of the functions of the EOL’ed bus bridge chip and add features as well, including support for a PCIe Gen2 x4 connection to the host and a DMA-buffered interface to external DDR3 SDRAM.


Because the new X-ES bridge chip is based on an FPGA, the bus functions can be upgraded at any time, even after boards have been shipped. What’s more, because the new bus bridge is now a design that’s explicitly targeted at an FPGA implementation, it can be moved with relative ease to the next FPGA generation, and the next, and the next using the same Xilinx Vivado HLx tools. Go ahead, try that with an ASIC.


X-ES has already designed the Artix-7 FPGA Bus Bridge into several VME64 boards including the XCalibur4531 SBC, shown below, which puts an Intel Core i7 Broadwell-H microprocessor onto the VME64 bus using a bus bridge designed into a Xilinx Artix-7 FPGA:






The X-ES XCalibur4531 puts an Intel Core i7 Broadwell-H microprocessor onto the VME64 bus using a bus bridge designed into a Xilinx Artix-7 FPGA



Here’s a block diagram of the X-ES XCalibur4531 VME64 board that illustrates how the FPGA-based Bus Bridge fits into the system:






X-ES XCalibur4531 VME64 SBC Block Diagram



The Intel QM87 chipset delivers I/O such as Ethernet and SATA to the VME64 bus, but it is the PCIe-to-VME64 Bridge chip, built into an Artix-7 FPGA, that implements the VME64 bus protocol.


Contact X-ES for more information about VME64 boards based on the FPGA Bus Bridge.


The Fiberblaze fb8XG NIC (network interface card) pairs two QSFP+ optical cages with a Xilinx Kintex UltraScale FPGA on a PCIe card with an 8-lane PCIe Gen1/Gen2/Gen3 host interface. The two QSFP+ cages on the NIC support 2x40Gbps or 8x10Gbps optical links. The on-board FPGA can be either a Kintex UltraScale KU060 or KU115 device. There’s also an option to use a Kintex UltraScale KU085 FPGA. (These 20nm FPGAs have 32 to 64 bulletproof 16.3Gbps SerDes ports to support the high data rates required by the QSFP+ optical cages.)





The Fiberblaze fb8XG NIC pairs two QSFP+ optical cages with a Xilinx Kintex UltraScale FPGA
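To put the fb8XG's quoted link rates in perspective, here is a rough back-of-the-envelope sketch in Python. The optical figures come from the description above; the per-lane PCIe signaling rates and line-encoding efficiencies are the standard published values for each PCIe generation, and the function and variable names are purely illustrative. The sketch ignores PCIe packet and protocol overhead, so real host throughput would be lower still.

```python
# Aggregate optical line rate, using the figures quoted for the fb8XG.
QSFP_CAGES = 2
optical_2x40 = QSFP_CAGES * 40        # two 40Gbps links
optical_8x10 = QSFP_CAGES * 4 * 10    # eight 10Gbps links (four per cage)

# Standard PCIe per-lane signaling rate (GT/s) and line-encoding efficiency.
# Gen1/Gen2 use 8b/10b encoding; Gen3 uses 128b/130b.
PCIE_GENERATIONS = {
    "Gen1": (2.5, 8 / 10),
    "Gen2": (5.0, 8 / 10),
    "Gen3": (8.0, 128 / 130),
}


def pcie_gbps(generation: str, lanes: int = 8) -> float:
    """Raw PCIe bandwidth in Gbps, per direction, after line encoding only."""
    rate_gtps, efficiency = PCIE_GENERATIONS[generation]
    return rate_gtps * efficiency * lanes


if __name__ == "__main__":
    print(f"Optical aggregate: {optical_2x40} Gbps")
    for gen in PCIE_GENERATIONS:
        print(f"PCIe {gen} x8: {pcie_gbps(gen):.1f} Gbps")
```

Under these assumptions the two cages can carry more traffic in aggregate than even a Gen3 x8 host link can move in one direction, which is one reason an on-board FPGA that can process traffic at line rate is attractive on a card like this.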



Fiberblaze supports all of its fbNIC products with a hand-optimized SDK and device drivers so that you can start developing high-performance applications right out of the box. Fiberblaze also provides Layer 2 and 3 network capabilities through the company’s industry-proven MAC and TCP/IP solutions. (Currently, there are eight Fiberblaze fbNIC boards based on the Xilinx Kintex UltraScale, Virtex-7, and Virtex-6 FPGA families.)



Coming off of a successful CrowdSupply campaign (see “Zynq-based Snickerdoodle Dev Board hits 1001 pledges, 105% of funding goal on CrowdSupply crowd-funding site”) where it collected pledges totaling 211% of goal, krtkl’s (pronounced “critical’s”) $55, WiFi-enabled Snickerdoodle has now racked up another accolade. It’s made MAKE:’s Maker’s Guide to Boards. You’ll find the Snickerdoodle listing here.





Krtkl’s $55, Zynq-based, WiFi-enabled Snickerdoodle Dev Board


BBC uses Zynq and ZedBoard to replace entire rack of NICAM audio-coding equipment dating back to 1983

by Xilinx Employee ‎01-25-2016 06:23 PM - edited ‎01-28-2016 08:40 AM (8,562 Views)


Thanks to Hackaday’s recent article, "35 Million People Didn’t Notice When Zynq Took Over Their Radio," Twitter, and Xilinx’s Social Media wrangler Nicole Whalen, we now know that the BBC replaced an entire rack full of NICAM audio-coding and -distribution gear, developed and fielded in 1983, with one small rack-mount box based on the Xilinx Zynq SoC and the low-cost ZedBoard development kit. The news originated with a January 20 posting titled “35 million people didn’t notice a thing…” by Justin Mitchell, a principal engineer at BBC R&D. Mitchell writes “The output of these coders are listened to by about 61% of the UK population (35-40 million people) each week,” so you get the sense that this was a high-profile project.


As Mitchell explains: “The NICAM equipment … in the basement of Broadcasting House was originally installed in the autumn of 1983. The circuit boards that made up the NICAM equipment were failing due to their age and the supply of spare circuit boards was running out. In addition, the faulty circuit boards were becoming difficult to repair because some of the components they use have become obsolete and can no longer be bought. So we looked to make a replacement.”



He continues: “…the new coder … replaces the 3 data combiners (which combine the RDS data with the transmitter control information), the 6-channel audio coder, the CRC inserter, the CRC checker, the 6-channel audio decoder and the 3 data splitters (which separate the RDS data and transmitter control information). It also includes the NICAM test waveform generator.”
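For readers unfamiliar with the CRC inserter and CRC checker pairing that Mitchell mentions, here is a minimal, generic sketch of the idea in Python. It is not the BBC's design: the CRC-16/CCITT polynomial, the byte-oriented framing, and the function names are all assumptions chosen for illustration, and the actual NICAM equipment may use an entirely different polynomial and frame layout.

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF).

    The polynomial is an assumption for illustration only.
    """
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def insert_crc(payload: bytes) -> bytes:
    """Sender side: append the 16-bit CRC, most significant byte first."""
    return payload + crc16_ccitt(payload).to_bytes(2, "big")


def check_crc(frame: bytes) -> bool:
    """Receiver side: the CRC over payload plus appended CRC is zero if intact."""
    return len(frame) >= 2 and crc16_ccitt(frame) == 0


if __name__ == "__main__":
    frame = insert_crc(b"123456789")
    print(check_crc(frame))                               # intact frame
    print(check_crc(bytes([frame[0] ^ 0x01]) + frame[1:]))  # one flipped bit
```

The inserter runs where the data is coded and the checker runs at the far end of the distribution link, so a frame corrupted in transit can be detected before it is decoded.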


Here’s a photo of the inside of the new NICAM codec box with the ZedBoard located in the center of the image:





Zynq-based BBC NICAM Audio Codec with ZedBoard in the center of the image



How well did the switchover work? It’s all in the title of Mitchell’s article. Nobody noticed. That’s the hallmark of a 100% successful replacement effort.





About the Author
  • Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.