
 

 

A quick look at the latest product table for the Xilinx Zynq UltraScale+ RFSoC will tell you that the sample rate for the devices’ RF-class, 14-bit DAC has jumped to 6.554Gsamples/sec, up from 6.4Gsamples/sec. I asked Senior Product Line Manager Wouter Suverkropp about the change and he told me that the increase supports “…an extra level of oversampling for DOCSIS3.1 [designers]. The extra oversampling gives them 3dB processing gain and therefore simplifies the external circuits even further.”
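The 3dB figure in that quote follows from the usual rule of thumb for oversampling: spreading quantization noise over a wider bandwidth and then filtering yields a processing gain of 10·log10(oversampling ratio). The sketch below only illustrates that arithmetic. It assumes the "extra level of oversampling" amounts to a doubling of the oversampling ratio, which the quote does not state explicitly, and the function name is made up for illustration.

```python
import math


def processing_gain_db(oversampling_ratio: float) -> float:
    """Processing gain in dB obtained by oversampling at the given ratio."""
    if oversampling_ratio <= 0:
        raise ValueError("oversampling ratio must be positive")
    return 10.0 * math.log10(oversampling_ratio)


if __name__ == "__main__":
    # Assumption: one extra level of oversampling means a 2x ratio.
    print(f"{processing_gain_db(2.0):.2f} dB")
```

Each further doubling would add roughly another 3dB under the same assumption.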

 

 

 

RFSoC Conceptual Diagram.jpg 

 

Zynq UltraScale+ RFSoC Conceptual Diagram

 

 

 

For more information about the Zynq UltraScale+ RFSoC, see:

 

 

 

 

 

 

 

 

 

 

 

 

Javier Alejandro Varela and Professor Dr-Ing Norbert Wehn of the University of Kaiserslautern’s Microelectronic Systems Design Research Group have just published a White Paper titled “Running Financial Risk Management Applications on FPGA in the Amazon Cloud” and the last sentence in the White Paper’s abstract reads:

 

 

“…our FPGA implementation achieves a 10x speedup on the compute intensive part of the code, compared to an optimized parallel implementation on multicore CPU, and it delivers a 3.5x speedup at application level for the given setup.”

 

 

The University of Kaiserslautern’s Microelectronic Systems Design Research Group has been working on accelerating financial applications using FPGAs in connection with high-performance computing systems since 2010, and that research has recently migrated to cloud-based computing systems including Amazon’s EC2 F1 Instance, which is based on Xilinx Virtex UltraScale+ FPGAs. The results in this White Paper are based on using OpenCL code and the Xilinx SDAccel development environment.
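The two speedup numbers in the abstract (10x on the compute-intensive part, 3.5x at application level) can be related with Amdahl's law. The sketch below is a back-of-the-envelope estimate, not something taken from the White Paper: it assumes the simple Amdahl model with no data-transfer or setup overhead, and the function names are hypothetical.

```python
def application_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: overall speedup when a fraction of the runtime is accelerated."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / kernel_speedup)


def accelerated_fraction(kernel_speedup: float, app_speedup: float) -> float:
    """Invert Amdahl's law to find the fraction of runtime that was accelerated."""
    return (1.0 - 1.0 / app_speedup) / (1.0 - 1.0 / kernel_speedup)


if __name__ == "__main__":
    p = accelerated_fraction(kernel_speedup=10.0, app_speedup=3.5)
    print(f"accelerated fraction of original runtime: {p:.1%}")
    print(f"check: {application_speedup(p, 10.0):.2f}x")
```

Under that simplified model, the quoted figures would imply that a bit under 80% of the original CPU runtime sat in the accelerated code. The real split may differ because transfers to and from the FPGA also take time.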

 

 

 

For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:

 

 

 

 

 

 

 

Xilinx has announced availability of automotive-grade Zynq UltraScale+ MPSoCs, enabling development of safety-critical ADAS and Autonomous Driving systems. The 4-member Xilinx Automotive XA Zynq UltraScale+ MPSoC family is qualified according to AEC-Q100 test specifications with full ISO 26262 ASIL-C level certification and is ideally suited for various automotive platforms by delivering the right performance/watt while integrating critical functional safety and security features.

 

The XA Zynq UltraScale+ MPSoC family has been certified to meet ISO 26262 ASIL-C level requirements by Exida, one of the world's leading accredited certification companies specializing in automation and automotive system safety and security. The product includes a "safety island" designed for real-time processing functional safety applications that has been certified to meet ISO 26262 ASIL-C level requirements.  In addition to the safety island, the device’s programmable logic can be used to create additional safety circuits tailored for specific applications such as monitors, watchdogs, or functional redundancy. These additional hardware safety blocks effectively allow ASIL decomposition and fault-tolerant architecture designs within a single device.

 

 

Xilinx XA Zynq UltraScale Plus MPSoC.jpg 

 

  

 

 

Bitmain manufactures Bitcoin, Litecoin, and other cryptocurrency mining machines and currently operates the world’s largest cryptocurrency mines. The company’s latest-generation Bitcoin miner, the Antminer S9, incorporates 189 of Bitmain’s 16nm ASIC, the BM1387, which performs the Bitcoin hash algorithm at a rate of 14 TeraHashes/sec. (See “Heigh ho! Heigh ho! Bitmain teams 189 bitcoin-mining ASICs with a Zynq SoC to create world's most powerful bitcoin miner.”) The company also uses one Zynq Z-7010 SoC to control those 189 hash-algorithm ASICs.

 

 

Bitmain Antminer S9.jpg 

 

Bitmain’s Antminer S9 Bitcoin Mining Machine uses a Zynq Z-7010 SoC as a main control processor

 

 

 

The Powered by Xilinx program has just published a 3-minute video containing an interview with Yingfei Li, Bitmain’s Marketing Director, and Wenguo Zhang, Bitmain’s Hardware R&D Director. In the video, Zhang explains that the Zynq Z-7010 solved multiple hidden problems with the company’s previous-generation control panel, thanks to the Zynq SoC’s dual-core Arm Cortex-A9 MPCore processor and the on-chip programmable logic.

 

Due to the success that Bitmain has had with Xilinx Zynq SoCs in its Antminer S9 Bitcoin mining machine, the company is now exploring the use of Xilinx 20nm and 16nm devices (UltraScale and UltraScale+) for future, planned AI platforms and products.

 

 

 

 

DornerWorks is one of only three Xilinx Premier Alliance Partners in North America offering design services, so the company has more than a little experience using Xilinx All Programmable devices. The company has just launched a new learn-by-email series with “interesting shortcuts or automation tricks related to FPGA development.”

 

The series is free but you’ll need to provide an email address to receive the lessons. I signed up and immediately received a link to the first lesson titled “Algorithm Implementation and Acceleration on Embedded Systems” written by DornerWorks’ Anthony Boorsma. It contains information about the Xilinx Zynq SoC and Zynq UltraScale+ MPSoC and the Xilinx SDSoC development environment.

 

Sign up here.

 

 

 

The recent introduction of the groundbreaking Xilinx Zynq UltraScale+ RFSoC means that there are big changes in store for the way advanced RF and comms systems will be designed. With as many as 16 RF-class ADCs and DACs on one device along with a metric ton or two of other programmable resources, the Zynq UltraScale+ RFSoC makes it possible to start thinking about single-chip Massive MIMO systems. A new EDN.com article by Paul Newson, Hemang Parekh, and Harpinder Matharu titled “Realizing 5G New Radio massive MIMO systems” teases a few details for building such systems and includes this mind-tickling photo:

 

 

 

Zynq UltraScale RFSoC Massive MIMO proto system.jpg 

 

 

A sharp eye and keen memory will link that photo to a demo from last October’s Xilinx Showcase at the Xilinx facility in Longmont, Colorado. Here’s Xilinx’s Lee Hansen demonstrating a similar system based on the Xilinx Zynq UltraScale+ RFSoC:

 

 

 

 

For more details about the Zynq UltraScale+ RFSoC, contact your friendly neighborhood Xilinx or Avnet sales rep and see these previous Xcell Daily blog posts:

 

 

 

 

 

 

 

 

 

 

 

Last month, a user on EmbeddedRelated.com going by the handle stephaneb started a thread titled “When (and why) is it a good idea to use an FPGA in your embedded system design?” Olivier Tremois (oliviert), a Xilinx DSP Specialist FAE based in France, provided an excellent, comprehensive, concise, Xilinx-specific response worth repeating in the Xcell Daily blog:

 

 

 

As a Xilinx employee I would like to contribute on the Pros ... and the Cons.

 

Let's start with the Cons: if there is a processor that suits all your needs in terms of cost/power/performance/IOs just go for it. You won't be able to design the same thing in an FPGA at the same price.


Now if you need some kind of glue logic around (IOs), or your design need multiple processors/GPUs due to the required performance then it's time to talk to your local FPGA dealer (preferably Xilinx distributor!). I will try to answer a few remarks I saw throughout this thread:

 

FPGA/SoC: In the majority of the FPGA designs I’ve seen during my career at Xilinx, I saw some kind of processor. In pure FPGAs (Virtex/Kintex/Artix/Spartan) it is a soft-processor (Microblaze or Picoblaze) and in a [Zynq SoC or Zynq Ultrascale+ MPSoC], it is a hard processor (dual-core Arm Cortex-A9 [for Zynq SoCs] and Quad-A53+Dual-R5 [for Zynq UltraScale+ MPSoCs]). The choice is now more complex: Processor Only, Processor with an FPGA aside, FPGA only, Integrated Processor/FPGA. The tendency is for the latter due to all the savings incurred: PCB, power, devices, ...

 

Power: Pure FPGAs are making incredible progress, but if you want really low power in stand-by mode you should look at the Zynq Ultrascale+ MPSoC, which contains many processors and particularly a Power Management Unit that can switch on/off different regions of the processors/programmable logic.

 

Analog: Since Virtex-5 (2006), Xilinx has included ADCs in its FPGAs, which were limited to internal parameter measurements (Voltage, Temperature, ...). [These ADC blocks are] called the System Monitor. With 7 series (2011) [devices], Xilinx included a dual 1Msamples/sec@12-bits ADC with internal/external measurement capabilities. Lately Xilinx [has] announced very high performance ADCs/DACs integrated into the Zynq UltraScale+ RFSoC: 4Gsamples/sec@12 bits ADCs / 6.5Gsamples/sec@14 bits DACs. Potential applications are Telecom (5G), Cable (DOCSIS) and Radar (Phased-Array).

 

Security: The bitstream that is stored in the external Flash can be encoded [encrypted]. Decoding [decrypting] is performed within the FPGA during bitstream download. Zynq-7000 SoCs and Zynq Ultrascale+ MPSoCs support encoded [encrypted] bitstreams and secured boot for the processor[s].

 

Ease of Use: This is the big part of the equation. Customers need to take this into account to get the right time to market. Since 2012 and [with] 7 series devices, Xilinx introduced a new integrated tool called Vivado. Since then a number of features/new tools have been [added to Vivado]:

 

  • IP Integrator(IPI): a graphical interface to stitch IPs together and generate bitstreams for complete systems.

 

  • Vivado HLS (High Level Synthesis): a tool that allows you to generate HDL code from C/C++ code. This tool will generate IPs that can be handled by IPI.

 

 

  • SDSoC (Software Defined SoC): This tool allows you to design complete systems, software and hardware on a Zynq SoC/Zynq UltraScale+ MPSoC platform. This tool with some plugins will allow you to move part of your C/C++ code to programmable logic (calling Vivado HLS in the background).

 

  • SDAccel: an OpenCL (and more) implementation. Not relevant for this thread.

 

 

There are also tools related to the MathWorks environment [MATLAB and Simulink]:

 

 

  • System Generator for DSP (aka SysGen): Low-level Simulink library (designed by Xilinx for Xilinx FPGAs). Allows you to program HDL code with blocks. This tool achieves even better performance (clock/area) than HDL code as each block is an instance of an IP (from register, adder, counter, multiplier up to FFT, FIR compiler, and VHLS IP). Bit-true and cycle-true simulations.

 

  • Xilinx Model Composer (XMC): available since ... yesterday! Again a Simulink blockset but based on Vivado HLS. Much faster simulations. Bit-true but not cycle-true.

 

 

All this to say that FPGA vendors have [expended] tremendous effort to make FPGAs and derivative devices easier to program. You still need a learning curve [but it] is much shorter than it used to be…

 

 

 

 

One of life’s realities is that the most advanced semiconductor devices—including the Xilinx Zynq UltraScale+ MPSoCs—require multiple voltage supplies for proper operation. That means that you must devote a part of the system engineering effort for a product based on these devices to the power subsystem. Put another way, it’s been a long, long time since the days when a single 5V supply and a bypass capacitor were all you needed. Fortunately, there’s help. Xilinx has a number of vendor partners with ready, device-specific power-management ICs (PMICs). Case in point: Dialog Semiconductor.

 

If you need to power a Zynq UltraScale+ ZU3EG, ZU7EV, or ZU9CG MPSoC, you’ll want to check out Dialog’s App Note AN-PM-095 titled “Power Solutions for Xilinx Zynq Ultrascale+ ZU9EG.” This document contains reference designs for cost-optimized, PMIC-based circuits specifically targeting the power requirements for Zynq UltraScale+ MPSoCs. According to Xilinx Senior Tech Marketing Manager for Analog and Power Delivery Cathal Murphy, Dialog Semi’s PMICs can be used for low-cost power-supply designs because they generate as many as 12 power rails per device. They also switch at frequencies as high as 3MHz, which means that you can use smaller, less expensive passive devices in the design.

 

It also means that your overall power-management design will be smaller. For example, Dialog Semi’s power-management ref design for a Zynq UltraScale+ ZU9 MPSoC requires only 1.5in² of board space, or less for smaller devices in the MPSoC family.

 

You don’t need to visualize that in your head. Here’s a photo and chart supplied by Cathal:

 

 

Dialog Semi Zynq UltraScale Plus MPSoC PMICs.jpg 

 

 

The Dialog Semi reference design is hidden under the US 25-cent piece.

 

As the chart notes, these Dialog Semi PMICs have built-in power sequencing and can be obtained preprogrammed for Zynq-specific power sequences from distributors such as Avnet.

 

Cathal also pointed out that Dialog Semi has long been supplying PMICs to the consumer market (think smartphones and tablets) and that the power requirements for Zynq UltraScale+ MPSoCs map well into the existing capabilities of PMICs designed for this market, so you reap the benefit of the company’s volume manufacturing expertise.

 

Note: If you’re looking for a PMIC to power your Spartan-7 FPGA design, check out Dialog Semi’s DA9062 with four buck converters and four LDOs.

 

 

Adam Taylor’s MicroZed Chronicles, Part 231: “Developing Image Processing Platforms”—The Video

by Xilinx Employee ‎01-08-2018 09:33 AM - edited ‎01-15-2018 09:44 AM (6,368 Views)

 

Adam Taylor has been writing about the use of Xilinx All Programmable devices for image-processing platforms for quite a while and he has wrapped up much of what he knows into a 44-minute video presentation, which appears below. Adam is presenting tomorrow at the Xilinx Developer Forum being held in Frankfurt, Germany.

 

 

 

 

You’ll find a PDF of his slides attached below:

Powered by Xilinx: Another look at KORTIQ’s FPGA-based AIScale CNN Accelerator

by Xilinx Employee ‎01-04-2018 02:00 PM - edited ‎01-04-2018 02:16 PM (4,687 Views)

 

A previous blog at the end of last November discussed KORTIQ’s FPGA-based AIScale CNN Accelerator, which takes pre-trained CNNs (convolutional neural networks)—including industry standards such as ResNet, AlexNet, Tiny Yolo, and VGG-16—compresses them, and fits them into Xilinx’s full range of programmable logic fabrics. (See “KORTIQ’s AIScale Accelerator fits trained CNNs into large or small All Programmable devices, allowing you to pick the right price/performance ratio for your application.”) A short, new Powered by Xilinx video provides more details about Kortiq and its accelerated CNN.

 

In the video, KORTIQ CEO Harold Weiss discusses using low-end Zynq SoCs (up to the Z-7020) and Zynq UltraScale+ MPSoCs (the ZU2 and ZU3) to create low-power solutions that deliver “just enough” performance for target industrial applications such as video processing, which requires billions of operations per second. The Zynq SoCs and Zynq UltraScale+ MPSoCs consume far less power than competing GPUs and CPUs while accelerating multiple CNN layers including convolutional layers, pooling layers, fully connected layers, and adding layers.

 

Here’s the new video:

 

 

 

 

Vivado 2017.4 is now available. Download it now to get these new features (see the release notes for complete details):

 

 

 

 

Download the new version of the Vivado Design Suite HLx editions here.

 

 

 

 

Continental AG has announced its Assisted & Automated Driving Control Unit, based on Xilinx All Programmable technology and developed in collaboration with Xilinx. According to the company, “…the Assisted & Automated Driving Control Unit will enable Continental’s customers to get to market faster by building upon the Open Computing Language (OpenCL) framework…” and “…offers a scalable product family for assisted and automated driving fulfilling the highest safety requirements (ASIL D) by 2019.” 

 

 

 

Continental Assisted Driving Control Unit.jpg 

 

 

Continental AG’s Assisted & Automated Driving Control Unit is based on Xilinx All Programmable technology

 

 

 

Continental’s incorporation of Xilinx All Programmable technology “provides developers the ability to optimize software for the appropriate processing engine or to create their own hardware accelerators with the Xilinx All Programmable technology. The result is the ultimate freedom to optimize performance, without sacrificing latency, power dissipation, or the flexibility to move software algorithms between the integrated chips, as the project progresses.”

 

“Our Assisted & Automated Driving Control Unit will enable automotive engineers to create their own differentiated solutions for machine learning, and sensor fusion. Xilinx’s All Programmable Technology was chosen as it offers flexibility and scalability to address the ever-changing and new requirements along the way to fully automated self-driving cars,” said Karl Haupt, Head of Continental’s Advanced Driver Assistance Systems business unit. “For Continental, the Assisted & Automated Driving Control Unit is a central element for implementing the required functional safety architecture and, at the same time, a host for the comprehensive environment model and driving functions needed for automated driving.”

 

Continental will be exhibiting at next week’s CES in Las Vegas.

 

 

 

 

By Adam Taylor

 

 

What better way to start the New Year than with a new Adam Taylor MicroZed Chronicles blog? – The Editor

 

 

 

Following on from the popularity of my final blog of last year where I presented several tips for better image-processing systems, I thought I would kick off the 2018 series of blogs by providing a number of tips for using the XADC and Sysmon in Zynq SoCs and Zynq UltraScale+ MPSoCs.

 

Whether our targeted device uses an XADC or Sysmon depends upon the device family. If we are targeting a 7 series FPGA or Zynq SoC device, we will be using the XADC. If the target is an UltraScale FPGA, UltraScale+ FPGA, or a Zynq UltraScale+ MPSoC, we’ll be using a Sysmon block. Behaviorally, the on-chip XADC and Sysmon blocks are very similar but there are some minor differences in architecture and maximum sampling rates between the two. Including the XADC or Sysmon adds a very interesting analog/mixed-signal capability to your design and helps reduce the number of external components. Because they can monitor internal device parameters along with external signals, you can also use the XADC/Sysmon blocks to implement a comprehensive system health and security monitoring solution critical for many applications.

 

Here are some of my favourite tips for using the Xilinx XADC/Sysmon blocks:

 

 

 

  • Remember Nyquist’s sampling theorem and configure sample clocks correctly

 

 

Image1.jpg 

 

 

 

To prevent signal aliasing, you must set the XADC/Sysmon sampling rate to at least twice the highest frequency of the signal being quantized. When sampling external signals, the XADC and Sysmon have different maximum sampling frequencies of 1000ksamples/sec and 200ksamples/sec respectively. To set the appropriate sampling frequency, we need to consider the relationship between the clock provided to the XADC/Sysmon (called DClock) and the resultant internally derived clock used for sampling (called ADC Clock). Both the XADC and Sysmon take a minimum of 26 internal ADC Clock cycles to perform a conversion. To achieve the maximum conversion rate of 1000ksamples/sec for the XADC, we therefore need to set the ADC Clock to 26MHz. For the Sysmon block, we need to set the ADC Clock to 5.2MHz to achieve the full 200ksamples/sec sample rate. ADC Clock frequencies below these will result in lower sampling rates. Correctly setting the sampling rate depends upon the device you are using and the access method:

 

 

  1. Zynq PS APB Access – The XADC is clocked using the PCAP_2X clock, which has a nominal frequency of 200MHz. This clock is divided internally within the DevC, using the XADCIF_CFG[TCKRATE] register settings, to generate a DClock for the XADC with a maximum frequency of 50MHz. The DClock is then further divided down (minimum division by 2) using the XADC configuration register to select the desired conversion rate.
  2. Zynq MPSoC APB Access – The PS and PL Sysmon are clocked by the AMS clock, which has a range of 0 to 52MHz. This is then divided further by the Sysmon to create the ADC Clock used for sampling. This further division is a minimum of 2 for the PS Sysmon or 8 for the PL Sysmon using the Sysmon configuration clock division registers PS/PL CONFIG_REG2.
  3. AXI Access using FPGA, Zynq SoC, or Zynq UltraScale+ MPSoC (PL Sysmon) – Use the system management wizard or XADC wizard configuration to determine the AXI clock frequency and hence the sampling clock via the IP customization tab. The required sample clock frequency can then be supplied by either a fabric clock (Zynq SoC or Zynq UltraScale+ MPSoC) or clock wizard (FPGA, Zynq SoC, or Zynq UltraScale+ MPSoC).
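
The clock arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the relationship between DClock, the divider, and the sample rate, using the 26-cycle conversion time and the minimum division values quoted in the text. The function names are made up for this example, and the sketch does not model the actual register encodings or any other restrictions on legal divider values, so check the relevant user guide before relying on a particular setting.

```python
CYCLES_PER_CONVERSION = 26  # minimum ADC Clock cycles per conversion (from the text)


def adc_clock_hz(sample_rate_sps: int) -> int:
    """ADC Clock frequency needed to reach a given sample rate."""
    return sample_rate_sps * CYCLES_PER_CONVERSION


def divider_for(dclock_hz: int, max_adc_clock_hz: int, min_divider: int = 2) -> int:
    """Smallest integer divider that keeps the ADC Clock at or below the limit."""
    ceil_div = -(-dclock_hz // max_adc_clock_hz)  # integer ceiling division
    return max(min_divider, ceil_div)


def sample_rate_sps(dclock_hz: int, divider: int) -> float:
    """Resulting sample rate for a given DClock and divider."""
    return dclock_hz / divider / CYCLES_PER_CONVERSION


if __name__ == "__main__":
    # XADC with a 50MHz DClock and a 26MHz ADC Clock limit
    xadc_div = divider_for(50_000_000, adc_clock_hz(1_000_000))
    print("XADC divider:", xadc_div, "rate:", sample_rate_sps(50_000_000, xadc_div))

    # PL Sysmon with a 52MHz AMS clock, a 5.2MHz ADC Clock limit, minimum division of 8
    sysmon_div = divider_for(52_000_000, adc_clock_hz(200_000), min_divider=8)
    print("Sysmon divider:", sysmon_div, "rate:", sample_rate_sps(52_000_000, sysmon_div))
```

With a 50MHz DClock the sketch picks a divider of 2, giving a 25MHz ADC Clock and about 961ksamples/sec, slightly below the XADC maximum. With a 52MHz AMS clock and a divider of 10, the Sysmon reaches its full 200ksamples/sec.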

 

 

 

 

  • Configure the Analog Inputs Correctly

 

Image2.jpg 

 

 

 

The analog inputs are defined by IP Integrator or software to be either unipolar or bipolar and you can control the input configuration for each analog input individually. When a unipolar signal is quantized, the input signal can range between 0V and 1V. For a bipolar input, the differential voltage between the Vp and Vn inputs is ±0.5V. Selecting the right mode ensures the best performance and avoids damaging the analog inputs. For unipolar configurations, Vp cannot be negative with respect to Vn. For bipolar inputs, Vp and Vn can swing positive and negative with respect to the common-mode (reference) voltage. Bipolar mode provides better noise performance because any common-mode noise coupled onto the Vp and Vn signals will be removed thanks to differential sampling.

 

When it comes to providing better performance in electrically noisy environments, you can also turn on input-channel averaging to average out the noise.
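
To make the unipolar and bipolar ranges above concrete, here is a minimal sketch that converts a raw conversion result into volts. It rests on assumptions that are not stated in this blog and should be checked against the device user guide: a 12-bit result, a 1V full-scale span, straight binary coding in unipolar mode, and two's complement coding in bipolar mode. The function names are hypothetical.

```python
FULL_SCALE_V = 1.0  # assumed full-scale span (0V to 1V, or -0.5V to +0.5V)
BITS = 12           # assumed result width


def unipolar_volts(code: int) -> float:
    """Straight binary code to volts for a 0V to 1V unipolar input."""
    if not 0 <= code < (1 << BITS):
        raise ValueError("code out of range")
    return code * FULL_SCALE_V / (1 << BITS)


def bipolar_volts(code: int) -> float:
    """Two's complement code to volts for a bipolar (Vp - Vn) input."""
    if not 0 <= code < (1 << BITS):
        raise ValueError("code out of range")
    if code & (1 << (BITS - 1)):  # sign bit set, so the value is negative
        code -= 1 << BITS
    return code * FULL_SCALE_V / (1 << BITS)


if __name__ == "__main__":
    for raw in (0x000, 0x7FF, 0x800, 0xFFF):
        print(f"0x{raw:03X}: unipolar {unipolar_volts(raw):+.4f}V, "
              f"bipolar {bipolar_volts(raw):+.4f}V")
```

The same raw code therefore means different voltages in the two modes, which is one more reason to make sure the software interpretation matches the configured input mode.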

 

 

 

 

  • Leverage the External Multiplexer Capabilities

 

Image3.jpg 

 

 

Both the XADC and the Sysmon can accept as many as seventeen external differential analog signals using one dedicated Vp/Vn pair and sixteen Auxiliary Vp/Vn pairs. Doing so of course uses several I/O signal pins, as many as 34 if all analog inputs are used. This may present issues, especially on smaller devices where I/O-pin availability might be tightly constrained. To address this, the XADC/Sysmon can drive an external multiplexer, which reduces the number of pins required and also allows you to use an external mux with added protection for harsh operating environments (e.g. ESD protection).

 

 

 

 

 

  • Consider the Anti-Aliasing Filter effect on Conversion Performance

 

Image4.jpg 

 

 

Implementing an Anti-Aliasing filter on the front end of the XADC/Sysmon external inputs is critical to ensuring that only the signals we want are quantized.

 

The external resistor and capacitors in the AAF will increase the overall settling time. Therefore, we need to ensure the external AAF also does not adversely affect the total settling time and consequently the conversion performance. Failing to provide adequate system-level settling time can result in ADC measurement errors because the sampling capacitor will not charge to its final value.

 

Xilinx application note XAPP795, “Driving the Xilinx Analog-to-Digital Converter,” provides very useful information on this subject.
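
As a rough illustration of why the AAF components matter, the sketch below estimates how long a single-pole RC filter takes to settle to within half an LSB, using the standard first-order relationship t = R·C·ln(2^(N+1)). This is a simplified model and not the procedure from the application note: it lumps the filter into one resistance and one capacitance and ignores the ADC's internal sampling network. The component values are hypothetical.

```python
import math


def settling_time_s(r_ohms: float, c_farads: float, bits: int) -> float:
    """Time for a single-pole RC network to settle to within 0.5 LSB at N bits."""
    tau = r_ohms * c_farads
    return tau * math.log(2 ** (bits + 1))


if __name__ == "__main__":
    # Hypothetical AAF values: 100 ohms and 1nF, settling to 12-bit accuracy
    t = settling_time_s(100.0, 1e-9, 12)
    print(f"settling time: {t * 1e9:.0f} ns")
```

If the estimated settling time approaches the time available for acquisition at your chosen sample rate, the filter values or the sampling configuration need another look.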

 

 

 

 

  • Use the Alarms and Set Appropriate Thresholds

 

 

Image5.jpg 

 

 

 

Both the XADC and Sysmon can monitor internal power supply voltages and temperatures. This is a great feature when we initially commission the boards because we can verify that the power supplies are delivering the expected voltages. We can even use the temperature sensor to verify thermal calculations at the high and low end of qualification environments.

 

 

When it comes to creating the run-time application, you should use the temperature and voltage alarms, which are based on defined thresholds for core voltages and device temperature. Should the measured parameter fall outside of these defined thresholds, an alarm allows further action to be taken. Configured correctly, this alarm capability can be used to generate an interrupt that alerts the processing system to a problem. Depending upon which alarm has been raised, the system can then act to either protect itself or undertake graceful degradation, thus preventing sudden failure.
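
The decision logic behind that paragraph can be sketched in software terms as follows. This is a plain Python model of the idea and does not use the XADC/Sysmon driver API. The parameter names, the threshold values, and the policy for choosing between protection and graceful degradation are all hypothetical and would come from your own device data sheet and system requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    lower: float
    upper: float


# Hypothetical limits for illustration only
THRESHOLDS = {
    "temperature_c": Threshold(lower=-40.0, upper=85.0),
    "vccint_v": Threshold(lower=0.95, upper=1.05),
}


def raised_alarms(measurements: dict, thresholds: dict) -> list:
    """Return the sorted names of parameters that fall outside their thresholds."""
    return sorted(
        name
        for name, value in measurements.items()
        if not thresholds[name].lower <= value <= thresholds[name].upper
    )


def choose_action(alarms: list) -> str:
    """Hypothetical policy: temperature alarms protect, other alarms degrade."""
    if "temperature_c" in alarms:
        return "protect"
    if alarms:
        return "degrade"
    return "normal"


if __name__ == "__main__":
    sample = {"temperature_c": 91.5, "vccint_v": 1.00}
    alarms = raised_alarms(sample, THRESHOLDS)
    print(alarms, choose_action(alarms))
```

In a real design the comparison happens in hardware and the alarm arrives as an interrupt, so the software only needs the second step: mapping the raised alarm to an action.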

 

 

 

Hopefully these tips will enable you to create smoother XADC/Sysmon solutions. If you experience any issues, I have a page on my website that links to all previous XADC / Sysmon examples in this series.   

 

 

 

 

You can find the example source code on GitHub.

 

 

Adam Taylor’s Web site is http://adiuvoengineering.com/.

 

 

If you want E Book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

First Year E Book here

 

First Year Hardback here.

 

  

 

MicroZed Chronicles hardcopy.jpg 

 

 

 

Second Year E Book here

 

Second Year Hardback here

 

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

OKI IDS and Avnet have jointly announced a new board for developing ADAS (advanced driver assistance systems) and advanced SAE Level 4/5 autonomous driving systems based on two Xilinx UltraScale+ MPSoCs.

 

 

 

 

OKI Avnet ADAS Platform Based on Two Zynq UltraScale Plus MPSoCs.jpg 

 

 

 

Avnet plans to start distributing the board in Japan in February, 2018 and will then expand into other parts of Asia. The A4-sized board interfaces to as many as twelve sensors including cameras and other types of imagers. The board operates on 12V and, according to the announcement, consumes about 20% of the power compared to similar hardware based on GPUs because it employs the Xilinx UltraScale+ MPSoCs as its foundation.  

 

Want to see this board in person? You can, at the Xilinx booth at Automotive World 2018 being held at Tokyo Big Sight from January 17th to 19th. (Hall East 6, 54-47)

 

 

Ann Steffora Mutschler has just published an article on the SemiconductorEngineering.com Web site titled “Mixing Interface Protocols” that describes some of the complexities of SoC design, all related to the proliferation of various on- and off-chip I/O protocols. However, the article can just as easily be read as a reason for using programmable-logic devices such as Xilinx Zynq SoCs, Zynq UltraScale+ MPSoCs, and FPGAs in your system designs.

 

For example, here’s Mutschler’s lead sentence:

 

“Continuous and pervasive connectivity requires devices to support multiple interface protocols, but that is creating problems at multiple levels because each protocol is based on a different set of assumptions.”

 

This sentence nicely sums up the last two decades of interface design philosophy for programmable-logic devices. Early on, it became clear that a lot of logic translation was needed to connect early FPGAs to the rest of a system. When Xilinx developed I/O pins with programmable logic levels, it literally wiped out a big chunk of the market for level-translator chips. When MGTs (multi-gigabit serial transceivers) started to become popular for moving large amounts of data from one subsystem to another, Xilinx moved those onto its devices as well.

 

So if you’d like to briefly glimpse into the chaotic I/O scene that’s creating immense headaches for SoC designers, take a read through Ann Steffora Mutschler’s new article. If you’d like to sidestep those headaches, just remember that Xilinx’s engineering team has already suffered them for you.

 

 

Looking for a quick explanation of the FPGA-accelerated AWS EC2 F1 instance? Here’s a 3-minute video

by Xilinx Employee ‎12-19-2017 10:45 AM - edited ‎12-19-2017 10:49 AM (8,526 Views)

 

The AWS EC2 F1 compute instance allows you to create custom hardware accelerators for your application using cloud-based server hardware that incorporates multiple Xilinx Virtex UltraScale+ VU9P FPGAs. Several companies now list applications for FPGA-accelerated AWS EC2 F1 instances in the AWS Marketplace in application categories including:

 

 

  • Video processing
  • Data analytics
  • Genomics
  • Machine Learning

 

 

Here’s a 3-minute video overview recorded at the recent SC17 conference in Denver:

 

 

 

 

 

 

In an article published in EETimes today titled “Programmable Logic Holds the Key to Addressing Device Obsolescence,” Xilinx’s Giles Peckham argues that the use of programmable devices—such as the Zynq SoCs, Zynq UltraScale+ MPSoCs, and FPGAs offered by Xilinx—can help prevent product obsolescence in long-lived products designed for industrial, scientific, and military applications. And that assertion is certainly true. But in this blog, I want to highlight the response by a reader using the handle MWagner_MA who wrote:

 

“Given the pace of change in FPGA's, I don't know if an FPGA will be a panacea for chip obsolescence issues. However, when changes in system design occur for hooking up new peripherals to a design off board, FPGA's can extend the life of a product 5+ years assuming you can get board-compatible FPGA's. Comm channels are what come to mind. If you use the same electrical interface but have an updated protocol, programmable logic can be a solution. Another solution is that when devices on SPI or I2C busses go obsolete, FPGA code can get updated to accommodate, even changing protocol if necessary assuming the right pins are connected at the other chip (like an A/D).”

 

 

MWagner_MA’s response is nuanced and tempered with obvious design experience. However, I will need to differ with the comment that the pace of change in FPGAs means something significant within the context of product obsolescence. Certainly FPGAs go obsolete, but it takes a long, long time.

 

Case in point:

 

I received an email just today from Xilinx about this very topic. (Feel free to insert amusement here about Xilinx’s corporate blogger being on the company’s promotional email list.) The email is about Xilinx’s Spartan-6 FPGAs, which were first announced in 2009. That’s eight or nine years ago. Today’s email states that Xilinx plans to ship Spartan-6 devices “until at least 2027.” That’s another nine or ten years into the future for a resulting product-line lifespan of nearly two decades and that’s not all that unusual for Xilinx parts. In other words, Xilinx FPGAs are in another universe entirely when compared to the rapid pace of obsolescence for semiconductor devices like PC and server processors. That’s something to keep in mind when you’re designing products destined for a long life in the field.

 

If you want to see the full long-life story for the Spartan-6 FPGA family, click here.

 

 

 

The Raptor from Rincon Research implements a 2x2 MIMO SDR (software-defined radio) in a compact 5x2.675-inch form factor by combining the capabilities of the Analog Devices AD9361 RF Agile Transceiver and the Zynq UltraScale+ ZU9EG MPSoC. The board has an RF tuning range of 70MHz to 6GHz. On-board memory includes 4Gbytes of DDR4 SDRAM, a pair of QSPI Flash memory chips, and an SD card socket. Digital I/O options include three on-board USB connectors (two USB 3.0 ports and one USB 2.0 port) and, through a mezzanine board, 10/100/1000 Ethernet, two SFP+ optical cages, an M.2 SATA port, DisplayPort, and a Samtec FireFly connector. Rincon Research provides the board along with a BSP, drivers, and COTS tool support.

 

Here’s a block diagram of the Raptor board:

 

 

Rincon Research Raptor Block Diagram.jpg

 

Rincon Research’s Raptor, a 2x2 MIMO SDR Board, Block Diagram

 

 

 

Here are photos of the Raptor main board and its I/O expansion mezzanine board:

 

 

 

Rincon Research Raptor Board.jpg 

 

Rincon Research’s Raptor 2x2 MIMO SDR Board

 

 

 

 

Rincon Research Raptor IO Mezzanine Board.jpg 

 

Rincon Research’s Raptor I/O Expansion Board

 

 

 

Please contact Rincon Research for more information about the Raptor SDR.

 

 

 

By Adam Taylor

 

 

For the final MicroZed Chronicles blog of the year, I thought I would wrap up with several tips to help when you are creating embedded-vision systems based on Zynq SoC, Zynq UltraScale+ MPSoC, and Xilinx FPGA devices.

 

Note: These tips and more will be part of Adam Taylor’s presentation at the Xilinx Developer Forum that will be held in Frankfurt, Germany on January 9.

 

 

 

Image1.jpg 

 

 

 

 

  1. Design in Flexibility from the Beginning

 

 

Image2.jpg

 

 

Video Timing Controller used to detect the incoming video standard

 

 

Use the flexibility provided by the Video Timing Controller (VTC) and reconfigurable clocking architectures such as Fabric Clocks, MMCM, and PLLs.  Using the VTC and associated software running on the PS (processor system) in the Zynq SoC and Zynq UltraScale+ MPSoC, it is possible to detect different video standards from an input signal at run time and to configure the processing and output video timing accordingly. Upon detection of a new video standard, the software running on the PS can configure new clock frequencies for the pixel clock and the image-processing chain along with re-configuring VDMA frame buffers for the new image settings. You can use the VTC’s timing detector and timing generator to define the new video timing. To update the output video timings for the new standard, the VTC can use the detected video settings to generate new output video timings.
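
As a rough illustration of the software side of this flow, the sketch below works out the pixel clock that a newly detected video standard would need from its total (active plus blanking) line and frame dimensions. The function name and the lookup table are hypothetical and are not part of any Xilinx driver; the totals used are the usual published figures for 720p60 and 1080p60.

```python
# Hypothetical helper: derive the pixel clock for a detected video standard.
# None of these names come from the VTC driver; they are for illustration only.

def pixel_clock_hz(h_total, v_total, frame_rate_hz):
    """Pixel clock = pixels per line x lines per frame x frames per second.

    h_total and v_total include the blanking intervals, not just the
    active picture area.
    """
    return h_total * v_total * frame_rate_hz


# Total timings (active + blanking) for two common standards.
STANDARDS = {
    "720p60": (1650, 750, 60),
    "1080p60": (2200, 1125, 60),
}

for name, (h_total, v_total, fps) in STANDARDS.items():
    print(name, pixel_clock_hz(h_total, v_total, fps) / 1e6, "MHz")
```

In a real design, the result would be handed to whatever code reprograms the clocking resources for the pixel clock; that step is device specific and is not shown here.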

 

 

 

  2. Convert input video to AXI Streaming as soon as possible to leverage IP and HLS

 

 

Image3.jpg 

 

 

Converting Data into the AXI Streaming Format

 

 

 

Vivado provides a range of key IP cores that implement most of the functions required by an image-processing chain, such as Color Filter Interpolation, Color Space Conversion, VDMA, and Video Mixing. Similarly, Vivado HLS can generate IP cores that use the AXI interconnect to ease integration within Vivado designs. Therefore, to get maximum benefit from the available IP and tool chain capabilities, we need to convert our incoming video data into the AXI Streaming format as soon as possible in the image-processing chain. We can use the Video-In-to-AXI-Stream IP core as an aid here. This core converts video from a parallel format consisting of synchronization signals and pixel values into our desired AXI Streaming format. A good tip when using this IP core is that the sync inputs do not need to be timed as per a VGA standard; they are edge triggered. This eases integration with different video formats such as Camera Link, with its frame-valid, line-valid, and pixel information format, for example.

 

 

 

  3. Use Logic Debugging Resources

 

 

Image4.jpg 

 

 

 

Insertion of the ILA monitoring the output stage

 

 

 

Insert integrated logic analyzers (ILAs) at key locations within the image-processing chain. Including these ILAs from day one in the design can help speed commissioning of the design. When implementing an image-processing chain in a new design, I insert ILAs, as a minimum, in the following locations:

 

  • Directly behind the receiving IP module—especially if it is a custom block. This ILA enables me to be sure that I am receiving data from the imager / camera.
  • On the output of the first AXI Streaming IP Core. This ILA allows me to be sure the image-processing core has started to move data through the AXI interconnect. If you are using VDMA, remember you will not see activity on the interconnect until you have configured the VDMA via software.
  • On the AXI-Streaming-to-Video-Out IP block, if used. I also consider connecting the video timing controller generator outputs to this ILA. This enables me to determine if the AXI-Stream-to-Video-Out block is correctly locked and the VTC is generating output timing.

 

When combined with the test patterns discussed below, insertion of ILAs allows us to zero in faster on any issues in the design which prevent the desired behavior.

 

 

 

  4. Select an Imager / Camera with a Test Pattern capability

 

 

Image5.jpg 

 

 

Incorrectly received incrementing test pattern captured by an ILA

 

 

 

If possible, when selecting the imaging sensor or camera for a project, choose one that provides a test pattern video output. You can then use this standard test pattern to ensure the reception, decoding, and image-processing chain is configured correctly because you’ll know exactly what the original video signal looks like. You can combine the imager/camera test pattern with ILAs connected close to the data reception module to determine if any issues you are experiencing when displaying an image are internal to the device and the image-processing chain or are the result of the imager/camera configuration.

 

We can verify the deterministic pixel values of the test pattern using the ILA. If the pixel values, line length, and the number of lines are as we expect, then it is not an imager configuration issue. More likely you will find the issue(s) within the receiving module and the image-processing chain.  This is especially important when using complex imagers/cameras that require several tens, or sometimes hundreds of configuration settings to be applied before an image is obtained.
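
The check described above can also be automated once pixel values have been exported from the ILA. The sketch below is a minimal example that assumes an incrementing pattern which rises by one per pixel and wraps at the pixel bit width; the real pattern depends on the imager, and the function name is made up for illustration.

```python
def first_pattern_error(pixels, start=0, step=1, bits=8):
    """Return the index of the first pixel that breaks an incrementing
    test pattern, or None if every pixel matches.

    Assumes the pattern increases by `step` per pixel and wraps at
    2**bits. Adjust these to match the imager's documented pattern.
    """
    mask = (1 << bits) - 1
    expected = start & mask
    for index, value in enumerate(pixels):
        if value != expected:
            return index
        expected = (expected + step) & mask
    return None


# A correctly received line, including one wrap from 255 back to 0.
good_line = list(range(256)) + [0, 1, 2]
# A line where one pixel was dropped (3 is missing).
bad_line = [0, 1, 2, 4, 5]

print(first_pattern_error(good_line))
print(first_pattern_error(bad_line))
```

Knowing the index of the first mismatch helps to tell a dropped pixel (a one-off slip) from a misconfigured bit width or byte order (errors from the very first pixels).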

 

 

  5. Include a Test Pattern Generator in your Zynq SoC, Zynq UltraScale+ MPSoC, or FPGA design

 

 

Image6.jpg 

 

 

Tartan Color Bar Test Pattern

 

 

 

If you include a test-pattern generator within the image-processing chain, you can use it to verify the VDMA frame buffers, output video timing, and decoding prior to the integration of the imager/camera. This reduces integration risks. To gain maximum benefit, the test-pattern generator should be configured with the same color space and resolution as the final imager. The test pattern generator should be included as close to the start of the image-processing chain as possible. This enables more of the image-processing pipeline to be verified, demonstrating that the image-processing pipeline is correct. When combined with test pattern capabilities on the imager, this enables faster identification of any problems.

 

 

 

  6. Understand how Video Direct Memory Access stores data in memory

 

 

Image7.jpg 

 

 

 

Video Direct Memory Access (VDMA) allows us to use the processor DDR memory as a frame buffer. This enables access to the images from the processor cores in the PS to perform higher-level algorithms if required. VDMA also provides the buffering required for frame-rate and resolution changes. Understanding how VDMA stores pixel data within the frame buffers is critical if the image-processing pipeline is to work as desired when configured.

 

One of the major points of confusion when implementing VDMA-based solutions centers around the definition of the frame size within memory. The frame buffer is defined in memory by three parameters: Horizontal Size (HSize), Vertical Size (VSize), and Stride. The two parameters that define the horizontal layout of the image are the HSize and the Stride. Like VSize, which defines the number of lines in the image, the HSize defines the length of each line. However, instead of being measured in pixels, the horizontal size is measured in bytes. We therefore need to know how many bytes make up each pixel.

 

The Stride defines the distance in memory between the start of one line and the start of the next. The Stride must be at least equal to the horizontal size; setting it exactly equal to the HSize makes the most efficient use of the DDR memory. Increasing the Stride introduces a gap between lines. Implementing this gap can be very useful when verifying that the imager data is received correctly because it provides a clear indication of where a line of the image starts and ends within memory.
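
To make the relationship between the three parameters concrete, here is a small sketch of the arithmetic. The function names, the example base address, and the 1280x720 frame are illustrative assumptions, not values from a particular design or from the VDMA driver.

```python
def frame_buffer_layout(width_px, height_lines, bytes_per_pixel, stride_bytes=None):
    """Compute HSize, VSize and Stride for a frame buffer.

    HSize and Stride are in bytes; VSize is in lines. If no stride is
    given, lines are packed back to back (Stride == HSize).
    """
    hsize = width_px * bytes_per_pixel
    stride = hsize if stride_bytes is None else stride_bytes
    if stride < hsize:
        raise ValueError("Stride must be at least HSize")
    return {
        "hsize": hsize,
        "vsize": height_lines,
        "stride": stride,
        "gap": stride - hsize,                   # unused bytes after each line
        "total_bytes": stride * height_lines,    # a safe size to allocate
    }


def line_start_address(base_addr, line, stride):
    """Address of the first byte of a given line (line 0 is the first line)."""
    return base_addr + line * stride


# Example: 1280x720 with 3 bytes per pixel and a 4096-byte stride.
layout = frame_buffer_layout(1280, 720, 3, stride_bytes=4096)
print(layout)
print(hex(line_start_address(0x10000000, 2, layout["stride"])))
```

With a 4096-byte stride, every line starts on a round address and is followed by 256 unused bytes, which makes line boundaries easy to spot in a memory viewer.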

 

These six simple techniques have helped me considerably when creating image-processing examples for this blog or solutions for clients, and they significantly ease both the creation and commissioning of designs.

 

As I said, this is my last blog of the year. We will continue this series in the New Year. Until then I wish you all happy holidays.

 

 

 

You can find the example source code on GitHub.

 

 

Adam Taylor’s Web site is http://adiuvoengineering.com/.

 

 

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

 

 

 

First Year E Book here

 

First Year Hardback here.

 

  

 

MicroZed Chronicles hardcopy.jpg 

 

 

 

Second Year E Book here

 

Second Year Hardback here

 

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

 

 

TSN (time-sensitive networking) is a set of evolving IEEE standards that support a mix of deterministic, real-time and best-effort traffic over fast Ethernet connections. The TSN set of standards is becoming increasingly important in many industrial networking situations, particularly for IIoT (the Industrial Internet of Things). SoC-e has developed TSN IP that you can instantiate in Xilinx All Programmable devices. (Because the standards are still evolving, implementing the TSN hardware in reprogrammable hardware is a good idea.)

 

In particular, the company offers the MTSN (Multiport TSN Switch IP Core) IP core, which provides precise time synchronization of network nodes using synchronized, distributed local clocks with a reference and IEEE 802.1Qbv for enhanced traffic scheduling. You can currently instantiate the SoC-e core on all of the Xilinx 7 series devices (the Zynq SoC and Spartan-7, Artix-7, Kintex-7, and Virtex-7 FPGAs), Virtex and Kintex UltraScale devices, and all UltraScale+ devices (the Zynq UltraScale+ MPSoCs and Virtex and Kintex UltraScale+ FPGAs).

 

Here’s a short three-and-a-half minute video explaining TSN and the SoC-e MTSN IP:

 

 

 

 

Tincy YOLO: a real-time, low-latency, low-power object detection system running on a Zynq UltraScale+ MPSoC

by Xilinx Employee 12-14-2017 10:39 AM - edited 12-15-2017 06:15 AM

 

Last week at the NIPS 2017 conference in Long Beach, California, a Xilinx team demonstrated a live object-detection implementation of a YOLO—“you only look once”—network called Tincy YOLO (pronounced “teensy YOLO”) running on a Xilinx Zynq UltraScale+ MPSoC. Tincy YOLO combines reduced precision, pruning, and FPGA-based hardware acceleration to speed network performance by 160x, resulting in a YOLO network capable of operating on video frames at 16fps while dissipating a mere 6W.

 

 

Figure 5.jpg 

 

Live demo of Tincy YOLO at NIPS 2017. Photo credit: Dan Isaacs

 

 

 

Here’s a description of that demo:

 

 

 

 

TincyYOLO: a real-time, low-latency, low-power object detection system running on a Zynq UltraScale+ MPSoC

 

 

By Michaela Blott, Principal Engineer, Xilinx

 

 

The Tincy YOLO demonstration shows real-time, low-latency, low-power object detection running on a Zynq UltraScale+ MPSoC device. In object detection, the challenge is to identify objects of interest within a scene and to draw bounding boxes around them, as shown in Figure 1. Object detection is useful in many areas, particularly in advanced driver assistance systems (ADAS) and autonomous vehicles where systems need to automatically detect hazards and to take the right course of action. Tincy YOLO leverages the “you only look once” (YOLO) algorithm, which delivers state-of-the-art object detection. Tincy YOLO is based on the Tiny YOLO convolutional network, which is based on the Darknet reference network. Tincy YOLO has been optimized through heavy quantization and modification to fit into the Zynq UltraScale+ MPSoC’s PL (programmable logic) and Arm Cortex-A53 processor cores to produce the final, real-time demo.

 

 

Figure 1.jpg 

 

Figure 1: YOLO-recognized people with bounding boxes

 

 

 

To appreciate the computational challenge posed by Tiny YOLO, note that it takes 7 billion floating-point operations to process a single frame. Before you can conquer this computational challenge on an embedded platform, you need to pull many levers. Luckily, the all-programmable Zynq UltraScale+ MPSoC platform provides many levers to pull. Figure 2 summarizes the versatile and heterogeneous architectural options of the Zynq platform.

 

 

Figure 2.jpg 

 

Figure 2: Tincy YOLO Platform Overview

 

 

 

The vanilla Darknet open-source neural network framework is optimized for CUDA acceleration but its generic, single-threaded processing option can target any C-programmable CPU. Compiling Darknet for the embedded Arm processors in the Zynq UltraScale+ MPSoC left us with a sobering performance of one recognized frame every 10 seconds. That’s about two orders of magnitude of performance away from a useful ADAS implementation. It also produces a very limited live-video experience.

 

To create Tincy YOLO, we leveraged several of the Zynq UltraScale+ MPSoC’s architectural features in steps, as shown in Figure 3. Our first major move was to quantize the computation of the network’s twelve inner (also known as hidden) layers by giving them binary weights and 3-bit activations. We then pruned this network to reduce the total operations to 4.5 GOPs/frame.
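
For readers unfamiliar with reduced-precision networks, the sketch below shows one generic way to obtain binary weights and 3-bit activations. It is only an illustration of the idea: the sign-based weight binarization, the uniform activation quantizer, and the clipping range are assumptions, and this is not the quantization scheme used to train Tincy YOLO.

```python
def binarize_weights(weights):
    """Map each weight to +1 or -1 according to its sign (zero maps to +1)."""
    return [1 if w >= 0 else -1 for w in weights]


def quantize_activation(x, bits=3, max_val=1.0):
    """Uniformly quantize an activation to an integer code in [0, 2**bits - 1].

    Values are clipped to [0, max_val] first; max_val is an assumed range.
    """
    levels = (1 << bits) - 1          # 7 steps for 3 bits
    clipped = min(max(x, 0.0), max_val)
    return round(clipped / max_val * levels)


def quantized_dot(weights, activations):
    """Dot product using binary weights and 3-bit activation codes.

    With +1/-1 weights, each multiply reduces to an add or a subtract,
    which is what makes this style of layer cheap in programmable logic.
    """
    w_bin = binarize_weights(weights)
    a_q = [quantize_activation(a) for a in activations]
    return sum(w * a for w, a in zip(w_bin, a_q))


print(binarize_weights([0.7, -0.2, 0.0]))
print([quantize_activation(a) for a in (0.0, 0.3, 1.0, 2.0)])
print(quantized_dot([0.7, -0.2], [1.0, 0.3]))
```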

 

 

 

Figure 3.jpg 

 

Figure 3: Steps used to achieve a 160x speedup of the Tiny YOLO network

 

 

 

We created a reduced-precision accelerator using a variant of the FINN BNN library (https://github.com/Xilinx/BNN-PYNQ) to offload the quantized layers into the Zynq UltraScale+ MPSoC’s PL. These layers account for more than 97% of all the computation within the network. Moving the computations for these layers into hardware bought us a 30x speedup of their specific execution, which translated into an 11x speedup within the overall application context, bringing the network’s performance up to 1.1fps.

 

We tackled the remaining outer layers by exploiting the NEON SIMD vector capabilities built into the Zynq UltraScale+ MPSoC’s Arm Cortex-A53 processor cores, which gained another 2.2x speedup. Then we cracked down on the complexity of the initial convolution using maxpool elimination for another 2.2x speedup. This work raised the frame rate to 5.5fps. A final re-write of the network inference to parallelize the CPU computations across all four of the Zynq UltraScale+ MPSoC’s Arm Cortex-A53 processor cores delivered video performance at 16fps.
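
The speedup figures quoted above can be chained together as a quick sanity check. The sketch below uses only numbers from the text (one frame every 10 seconds, 11x, 2.2x, 2.2x, and the final 16fps); the function name is made up. Because the quoted factors are rounded, the chained result comes out slightly below the 5.5fps stated in the text.

```python
def chain_fps(baseline_fps, factors):
    """Apply successive speedup factors and return the frame rate after each."""
    rates = []
    fps = baseline_fps
    for factor in factors:
        fps *= factor
        rates.append(fps)
    return rates


baseline = 1 / 10                       # one frame every 10 seconds
rates = chain_fps(baseline, [11, 2.2, 2.2])

print("after PL offload:          %.2f fps" % rates[0])
print("after NEON:                %.2f fps" % rates[1])
print("after maxpool elimination: %.2f fps" % rates[2])

# Overall speedup from the software baseline to the final 16 fps demo.
print("overall speedup: %.0fx" % (16 / baseline))

# Factor implied for the final four-core parallelization step,
# taking the 5.5 fps figure from the text as the starting point.
print("implied multicore gain: %.1fx" % (16 / 5.5))
```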

 

The result of these changes appears in Figure 4, which demonstrates better recognition accuracy than Tiny YOLO.

 

 

 

Figure 4.jpg 

 

Figure 4: Tincy YOLO results

 

 

 

 

 

High-frequency trading is all about speed, which explains why Aldec’s new reconfigurable HES-HPC-HFT-XCVU9P PCIe card for high-frequency trading (HFT) apps is powered by a Xilinx Virtex UltraScale+ VU9P FPGA. That’s about as fast as you can get with any sort of reprogrammable or reconfigurable technology. The Virtex UltraScale+ FPGA directly connects to all of the board’s critical, high-speed interface ports—Ethernet, QSFP, and PCIe x16—and implements the communications protocols for those standard interfaces as well as the memory control and interface for the board’s three QDR-II+ memory modules. Consequently, there’s no time-consuming chip-to-chip interconnection. Picoseconds count in HFT applications, so the FPGA’s ability to implement all of the card’s logic is a real competitive advantage for Aldec. The new FPGA accelerator is extremely useful for implementing time-sensitive trading strategies such as Market Making, Statistical Arbitrage, and Algorithmic Trading and is compatible with 1U and larger trading systems.

 

 

Aldec HES-HPC-HFT-XCVU9P PCIe card .jpg 

 

 

Aldec’s HES-HPC-HFT-XCVU9P PCIe card for high-frequency trading apps—Powered by a Xilinx Virtex UltraScale+ FPGA

 

 

 

 

Here’s a block diagram of the board:

 

 

 

Aldec HES-HPC-HFT-XCVU9P PCIe card block diagram.jpg

 

 

Aldec’s HES-HPC-HFT-XCVU9P PCIe card block diagram

 

 

 

Please contact Aldec directly for more information about the HES-HPC-HFT-XCVU9P PCIe card.

 

 

 

Eideticom’s NoLoad (NVMe Offload) platform uses FPGA-based acceleration on PCIe FPGA cards and in cloud-based FPGA servers to provide storage and compute acceleration through standardized NVMe and NVMe over Fabrics protocols. The NoLoad product itself is a set of IP that implements the NoLoad accelerator. The company is offering Hardware Eval Kits that target two FPGA-based PCIe cards: the Nallatech 250S FlashGT+ card, which is based on a Xilinx Kintex UltraScale+ KU15P FPGA, and the Alpha Data ADM-PCIE-9V3, which is based on a Xilinx Virtex UltraScale+ VU3P FPGA.

 

The NoLoad platform allows networked systems to share FPGA acceleration resources across the network fabric. For example, Eideticom offers an FPGA-accelerated Reed-Solomon Erasure Coding engine that can supply codes to any storage facility on the network.

 

Here’s a 6-minute video that explains the Eideticom NoLoad offering with a demo from the Xilinx booth at the recent SC17 conference:

 

 

 

 

 

For more information about the Nallatech 250S+ SSD accelerator, see “Nallatech 250S+ SSD accelerator boosts storage speed of four M.2 NVMe drives using Kintex UltraScale+ FPGA.”

 

 

For more information about the Alpha Data ADM-PCIE-9V3, see “Blazing new Alpha Data PCIe Accelerator card sports Virtex UltraScale+ VU3P FPGA, 4x 100GbE ports, 16Gbytes of DDR4 SDRAM.”

 

The latest hypervisor to host Wind River’s VxWorks RTOS alongside Linux is the Xen Project Hypervisor, an open-source virtualization platform from the Linux Foundation. DornerWorks has released a version of the Xen Project Hypervisor called Virtuosity (the hypervisor formerly known as the Xen Zynq Distribution) that runs on the Arm Cortex-A53 processor cores in the Xilinx Zynq UltraScale+ MPSoC. Consequently, Wind River has partnered with DornerWorks to provide a Xen Project Hypervisor solution for VxWorks and Linux on the Xilinx Zynq UltraScale+ MPSoC ZCU102 eval kit.

 

Having VxWorks and Linux running on the same system allows developers to create hybrid software systems that offer the combined advantages of the two operating systems, with VxWorks managing mission-critical functions and Linux managing human-interactive functions and network cloud connection functions.

 

Wind River has just published a blog about using VxWorks and Linux on the Arm Cortex-A53 processor, concisely titled “VxWorks on Xen on ARM Cortex A53,” written by Ka Kay Achacoso. The blog describes an example system with VxWorks running signal-processing and spectrum-analysis applications. Results are compiled into a JSON string and sent through the virtual network to Ubuntu. On Ubuntu, the Apache2 HTTP server sends results to a browser using Node.js and Chart.js to format the data display.
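
The data flow between the two guests can be pictured with a short sketch. This is not code from the Wind River blog: the JSON field names are invented, the sender would be a VxWorks application rather than Python, and a local socket pair stands in for the hypervisor's virtual network.

```python
import json
import socket


def encode_results(freqs_hz, magnitudes_db):
    """Pack spectrum-analysis results into a JSON string (field names are hypothetical)."""
    return json.dumps({"freq_hz": list(freqs_hz), "mag_db": list(magnitudes_db)})


def decode_results(payload):
    """Parse the JSON string back into a dictionary on the receiving side."""
    return json.loads(payload)


# A connected socket pair stands in for the virtual network link.
sender, receiver = socket.socketpair()

sender.sendall(encode_results([1000, 2000], [-3.0, -20.5]).encode("utf-8"))
sender.close()                          # closing signals end of message

received = b""
while True:
    chunk = receiver.recv(4096)
    if not chunk:
        break
    received += chunk
receiver.close()

results = decode_results(received.decode("utf-8"))
print(results["freq_hz"], results["mag_db"])
```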

 

Here’s a block diagram of the system in the Wind River blog:

 

 

 

Wind River VxWorks and Linux Hybrid System.jpg 

 

VxWorks and Linux Hybrid OS System

 

 

 

VxWorks runs as a guest OS on top of the unmodified Virtuosity hypervisor.

 

 

 

For more information about DornerWorks Xen Hypervisor (Virtuosity), see:

 

 

 

 

 

 

There was a live AWS EC2 F1 application-acceleration Developer’s Workshop during Amazon’s re:Invent 2017 last month. If you couldn’t make it, don’t worry. It’s now online and you can run through it in about two hours (I’m told). This workshop teaches you how to develop accelerated applications using the AWS F1 OpenCL flow and the Xilinx SDAccel development environment for the AWS EC2 F1 platform, which uses Xilinx Virtex UltraScale+ FPGAs as high-performance hardware accelerators.

 

The architecture of the AWS EC2 F1 platform looks like this:

 

 

AWS EC2 F1 Architecture.jpg 

 

AWS EC2 F1 Architecture

 

 

 

This developer workshop is divided into four modules. Amazon recommends that you complete each module before proceeding to the next.

 

  1. Connecting to your F1 instance 
    You will start an EC2 F1 instance based on the FPGA developer AMI and connect to it using a remote desktop client. Once connected, you will confirm you can execute a simple application on F1.
  2. Experiencing F1 acceleration 
    AWS F1 instances are ideal to accelerate complex workloads. In this module you will experience the potential of F1 by using FFmpeg to run both a software implementation and an F1-optimized implementation of an H.265/HEVC encoder.
  3. Developing and optimizing F1 applications with SDAccel 
    You will use the SDAccel development environment to create, profile and optimize an F1 accelerator. The workshop focuses on the Inverse Discrete Cosine Transform (IDCT), a compute intensive function used at the heart of all video codecs.
  4. Wrap-up and next steps 
    Explore next steps to continue your F1 experience after the re:Invent 2017 Developer Workshop.
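
For reference, the IDCT mentioned in module 3 can be written in a few lines. The sketch below is a plain one-dimensional, orthonormal DCT-II and its inverse; it is not the workshop's OpenCL kernel, and video codecs typically apply the transform in two dimensions to small blocks by running it over rows and then columns.

```python
import math


def _scale(k, n):
    """Orthonormal scaling factor for coefficient k of an n-point transform."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(samples):
    """Forward DCT-II with orthonormal scaling."""
    n = len(samples)
    return [
        _scale(k, n) * sum(
            samples[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        for k in range(n)
    ]


def idct_1d(coeffs):
    """Inverse of dct_1d: rebuilds the samples from the coefficients."""
    n = len(coeffs)
    return [
        sum(
            _scale(k, n) * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]


samples = [52, 55, 61, 66, 70, 61, 64, 73]
restored = idct_1d(dct_1d(samples))
print([round(v, 6) for v in restored])
```

Each output sample needs a multiply-accumulate over every coefficient, which is the kind of regular, compute-intensive loop nest that maps well to an FPGA accelerator.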

 

 

Access the online AWS EC2 F1 Developer’s Workshop here.

 

 

For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:

 

 

 

 

 

 

 

 

Accolade’s new ANIC-200Kq Flow Classification and Filtering Adapter brings packet processing, storage optimization, and scalable Flow Classification at 100GbE through two QSFP28 optical cages. Like the company’s ANIC-200Ku Lossless Packet Capture adapter introduced last year, the ANIC-200Kq board is based on a Xilinx UltraScale FPGA so it’s able to run a variety of line-speed packet-processing algorithms including the company’s new “Flow Shunting” feature.

 

 

 

Accolade ANIC-200Kq Flow Classification and Filtering Adapter.jpg 

 

Closeup view of the QSFP28 ports on Accolade’s ANIC-200Kq Flow Classification and Filtering Adapter

 

 

 

The new ANIC-200Kq adapter differs from the older ANIC-200Ku adapter in its optical I/O ports. The ANIC-200Kq adapter incorporates two QSFP28 optical cages and the ANIC-200Ku adapter incorporates two CFP2 cages. Both the QSFP28 and CFP2 interfaces accept SR4 and LR4 modules. The QSFP28 optical cages put Accolade’s ANIC-200Kq adapter squarely in the 25, 40, and 100GbE arenas, providing data center architects with additional architectural flexibility when designing their optical networks. For this reason, QSFP28 is fast becoming the universal form factor for new data center installations.

 

 

For more information in Xcell Daily about Accolade’s fast Flow Classification and Filtering Adapters, see:

 

 

 

 

 

 

By Adam Taylor

 

Getting the best performance from our embedded-vision systems often requires that we capture frames individually for later analysis in addition to displaying them. Programs such as Octave, Matlab, or ImageJ can analyze these captured frames, allowing us to:

 

  • Compare the received pixel values against those expected for a test or calibration pattern.
  • Examine the Image Histogram, enabling histogram equalization to be implemented if necessary.
  • Ensure that the integration time of the imager is set correctly for the scene type.
  • Examine the quality of the image sensor to identify defective pixels—for example dead or stuck-at pixels.
  • Determine the noise present in the image. The noise present will be due to both inherent imager noise sources—for example fixed pattern noise, device noise and dark current—and also due to system noise as coupled in via power supplies and other sources of electrical noise in the system design.

 

Typically, this testing may occur in the lab as part of the hardware design validation and is often performed before the higher levels of the application software are available.  Such testing is often implemented using a bare-metal approach on the processor system.

 

If we are using VDMA, the logical point to extract the captured data is from the frame buffer in the DDR SDRAM attached to the Zynq SoC’s or MPSoC’s PS. There are two methods we can use to examine the contents of this buffer:

 

  • Use the XSCT terminal to read out the frame buffer and post-process it using a TCL script.
  • Output the frame buffer over RS232 or Ethernet using the Light Weight IP Stack and then capture the image data in a terminal for post-processing using a TCL file.

 

For this example, I am going to use the UltraZed design we created a few weeks ago to examine PL-to-PS image transfers in the Zynq UltraScale+ MPSoC (see here). This design rather helpfully uses the test pattern generator to transfer a test image to a frame buffer in the PS-attached DDR SDRAM. In this example, we will extract the test pattern and convert it into a bit-map (BMP) file. Once we have the bit-map file, we can read it into the analysis program of choice.

 

BMP files are very simple. In the most basic format, they consist of a BMP Header, Device Independent Bitmap (DIB) Header, and the pixel array. In this example the pixel array will consist of 24-bit pixels, using eight bits each for blue, green and red pixel values.

 

It is important to remember two key facts when generating the pixel array. First, each line must be padded with zeros so that its length in bytes is a multiple of four, allowing for 32-bit word access. Second, the BMP image is stored upside down in the array. That is, the first line of the pixel array is the bottom line of the image.
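
Both facts are easy to get wrong, so here is a minimal sketch of them in code. The function names are hypothetical, and the input is assumed to be a top-down list of rows of (red, green, blue) tuples; a 24-bit BMP stores each pixel in Blue-Green-Red order.

```python
def bmp_row_size(width_px, bytes_per_pixel=3):
    """Row length in bytes, rounded up to the next multiple of four."""
    return (width_px * bytes_per_pixel + 3) & ~3


def build_pixel_array(rows_top_down):
    """Build a 24-bit BMP pixel array from rows of (r, g, b) tuples.

    rows_top_down[0] is the top line of the image. The BMP array starts
    with the bottom line, so the rows are written in reverse order.
    """
    out = bytearray()
    for row in reversed(rows_top_down):
        line = bytearray()
        for r, g, b in row:
            line += bytes((b, g, r))        # BMP byte order is Blue-Green-Red
        line += bytes(-len(line) % 4)       # zero padding to a 4-byte boundary
        out += line
    return bytes(out)


# A 1x2 image: one pixel per line, so each 3-byte line gets 1 byte of padding.
image = [[(1, 2, 3)],      # top line
         [(4, 5, 6)]]      # bottom line
print(bmp_row_size(1), list(build_pixel_array(image)))
```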

 

Combined, both headers equal 54 bytes in length and are structured as shown below:

 

 

 

Image1.jpg

 

Bitmap Header Construction

 

 

 

Image2.jpg 

 

DIB Header Construction
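
As a text companion to the two header figures, the sketch below packs a 14-byte BMP file header and a 40-byte DIB header (the common BITMAPINFOHEADER layout) for an uncompressed 24-bit image. The function name and the 2835 pixels-per-metre resolution value are assumptions; the values in the figures above may differ.

```python
import struct


def bmp_headers(width_px, height_px, bits_per_pixel=24):
    """Return the 54 bytes of BMP file header plus DIB header."""
    row_size = (width_px * bits_per_pixel // 8 + 3) & ~3   # padded row length
    image_size = row_size * height_px
    pixel_offset = 14 + 40                                 # headers come first

    file_header = struct.pack(
        "<2sIHHI",
        b"BM",                       # signature
        pixel_offset + image_size,   # total file size in bytes
        0, 0,                        # reserved fields
        pixel_offset,                # offset of the pixel array
    )
    dib_header = struct.pack(
        "<IiiHHIIiiII",
        40,                          # size of this header
        width_px,
        height_px,                   # positive height means bottom-up rows
        1,                           # colour planes
        bits_per_pixel,
        0,                           # compression: none
        image_size,
        2835, 2835,                  # pixels per metre (about 72 dpi), assumed
        0, 0,                        # palette size, important colours
    )
    return file_header + dib_header


headers = bmp_headers(1280, 720)
print(len(headers), headers[:2])
```

All multi-byte fields are little-endian, which is why the format string starts with `<`.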

 

 

 

Having understood what is involved in creating the file, all we need to do now is gather the pixel data from the PS-attached DDR SDRAM and output it in the correct format.

 

As we have done several times before in this blog, it is a good idea to double-check that the frame buffer actually contains pixel values before we extract them. We can examine the contents of the frame buffer using the memory viewer in SDK. However, the view we choose affects how easily we can interpret the pixel values and hence the frame. This is due to how the VDMA packs the pixels into the frame buffer.

 

The default view for the Memory viewer is to display 32-bit words as shown below:

 

 

 

Image3.jpg

 

TPG Test Pattern in memory

 

 

 

The data we are working with has a pixel width of 24 bits. To ensure efficient use of the DDR SDRAM memory, the VDMA packs the 24-bit pixels into 32-bit values, splitting pixels across locations. This can make things a little confusing when we look at the memory contents for expected pixel values. Because we know the image is formatted as 8-bit RGB, a better view is to configure the memory display to list the memory contents in byte order. We then know that each group of three bytes represents one pixel.
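
The same regrouping can be done in software on values copied out of the 32-bit word view. The sketch below assumes little-endian words (the usual configuration for the Arm cores) and simply groups the resulting bytes in threes; which byte of each group is red, green, or blue depends on the video format configured in the design.

```python
import struct


def words_to_pixels(words):
    """Turn packed 32-bit words into a list of 3-byte pixel tuples.

    Words are treated as little-endian, so the least significant byte of
    each word is the first byte in memory. Trailing bytes that do not
    form a whole pixel are ignored.
    """
    raw = struct.pack("<%dI" % len(words), *words)
    usable = len(raw) - len(raw) % 3
    return [tuple(raw[i:i + 3]) for i in range(0, usable, 3)]


# Three 32-bit words hold exactly four 24-bit pixels.
words = [0x44332211, 0x88776655, 0xCCBBAA99]
for pixel in words_to_pixels(words):
    print([hex(component) for component in pixel])
```

Note how the second pixel takes its first byte from the top of the first word and its remaining two bytes from the second word, which is the splitting across locations described above.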

 

 

 

Image4.jpg

 

 

TPG Test Pattern in memory Byte View

 

 

 

 

Having confirmed that the frame buffer contains image data, I am going to output the BMP information over the RS232 port for this example. I have selected this interface because it is the simplest interface available on many development boards and it takes only a few seconds to read out even a large image.

 

The first thing I did in my SDK application was to create a structure that defines the header and sets the values as required for this example:

 

 

Image5.jpg 

 

Header Structure in the application

 

 

 

I then created a simple loop that creates three u8 arrays, each the size of the image. There is one array for each color element. I then used these arrays with the header information to output the BMP information, taking care to use the correct format for the pixel array. A BMP pixel array organizes the pixel element as Blue-Green-Red:

 

 

Image6.jpg 

 

Body of the Code to Output the Image

 

 

 

Wanting to keep the process automated and without the need to copy and paste to capture the output, I used PuTTY as the terminal program to receive the output data. I selected PuTTY because it is capable of saving received data to a log file.

 

 

Image7.jpg 

 

Putty Configuration for logging

 

 

 

Of course, this log file contains an ASCII representation of the BMP. To view it, we need to convert it to a binary file of the same values. I wrote a simple TCL script to do this. The script performs the conversion, reading in the ASCII file and writing out the binary BMP File.
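
For readers who prefer not to use TCL, the same conversion can be sketched in Python. This is not the script shown in the figure below. It assumes that the log holds the BMP as two-digit hexadecimal byte values separated by whitespace; the actual output format of the application is not shown in the text, and any header line that the terminal adds to the log would need to be removed first.

```python
def ascii_hex_to_bytes(text):
    """Convert whitespace-separated hex byte values (e.g. "42 4D 36") to bytes."""
    return bytes(int(token, 16) for token in text.split())


def convert_log(log_path, bmp_path):
    """Read an ASCII hex log and write the equivalent binary file.

    Returns the number of bytes written.
    """
    with open(log_path, "r") as source:
        data = ascii_hex_to_bytes(source.read())
    with open(bmp_path, "wb") as destination:
        destination.write(data)
    return len(data)


# The first bytes of any BMP file are the ASCII characters "BM".
print(ascii_hex_to_bytes("42 4D\n36 00"))
```

A call such as `convert_log("capture.log", "capture.bmp")` (both file names are placeholders) would then produce a file that an image viewer or analysis tool can open.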

 

 

Image8.jpg 

 

TCL ASCII to Binary Conversion Widget

 

 

 

With this complete, we have the BMP image which we can load into Octave, Matlab, or another tool for analysis. Below is an example of the tartan color-bar test pattern that I captured from the Zynq frame buffer using this method:

 

 

 

Image9.jpg

 

Generated BMP captured from the PS DDR

 

 

 

Now if we can read from the frame buffer, then it springs to mind that we can use the same process to write a BMP image into the frame buffer. This can be especially useful when we want to generate overlays and use them with the video mixer.

 

We will look at how we do this in a future blog.

 

 

 

You can find the example source code on GitHub.

 

 

Adam Taylor’s Web site is http://adiuvoengineering.com/.

 

 


 

 

 

 

 

Every year at the end of the year for the past few decades, the staff of EDN has sifted through the thousands of electronic products they’ve written about over the past year to select the Hot 100 products for the year. In EDN’s opinion, the Xilinx Zynq UltraScale+ RFSoC is one of the Hot 100 products for 2017 in the “RF & Networking” Category.

 

Members of the Xilinx Zynq UltraScale+ RFSoC device family integrate multi-gigasample/sec RF ADCs and DACs, soft-decision forward error correction (SD-FEC) IP blocks, Xilinx UltraScale architecture programmable logic fabric, and an Arm Cortex-A53/Cortex-R5 multi-core processing subsystem into one chip. The Zynq UltraScale+ RFSoC is a category killer for many, many applications that need “high-speed analog-in, high-speed analog-out, digital-processing-in-the-middle” capabilities due to the devices’ extremely high integration level. It most assuredly will reduce the size, power, and complexity of traditional antenna structures in many RF applications, especially for 5G antenna systems.

 

As I wrote when the Zynq UltraScale+ RFSoC family won the IET Innovation Award in the Communications category, “There's simply no other device like the Zynq UltraScale+ RFSoC on the market, as suggested by this award.”

 

 

 

RFSoC Conceptual Diagram.jpg 

 

Zynq UltraScale+ RFSoC Conceptual Diagram

 

 

 

For more information about the Zynq UltraScale+ RFSoC, see:

 

 

 

 

 

 

 

 

 

 

 

 

The upcoming Xilinx Developer Forum in Frankfurt, Germany on January 9 will feature a hands-on Developer Lab titled “Accelerating Applications with FPGAs on AWS.” During this afternoon session, you’ll gain valuable hands-on experience with the FPGA-accelerated AWS EC2 F1 instance and hear from a special guest speaker from Amazon Web Services. Attendance is limited on a first-come, first-served basis, so you must register here.

 

 

For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:

 

 

 

 

 

 

 

 

 

Netcope’s NP4, a cloud-based programming tool, allows you to specify networking behavior using declarations written in the P4 network-specific, high-level programming language for the company’s high-performance, programmable Smart NICs based on Xilinx Virtex UltraScale+ and Virtex-7 FPGAs. The programming process involves the following steps:

 

  1. Write the P4 code.
  2. Upload your code to the NP4 cloud.
  3. Wait for the application to autonomously translate your P4 code into VHDL and synthesize the FPGA configuration.
  4. Download the firmware bitstream and upload it to the FPGA on your Netcope NIC.

 

Netcope calls NP4 its “Firmware as a Service” offering. If you are interested in trying NP4, you can request free trial access to the cloud service here.

 

 

Netcope NFB-200G2QL Programmable NIC.jpg

 

Netcope Technologies’ NFB-200G2QL 200G Ethernet Smart NIC based on a Virtex UltraScale+ FPGA

 

 

 

For more information about Netcope and P4 in Xcell Daily, see:

 

 

 

For more information about Netcope’s FPGA-based NICs in Xcell Daily, see:

 

 

 

 

 

About the Author
  • Be sure to join the Xilinx LinkedIn group to get an update for every new Xcell Daily post! ******************** Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.