
 

Embedded-vision applications present many design challenges. A new ElectronicsWeekly.com article by Michaël Uyttersprot, a Technical Marketing Manager at Avnet Silica, titled “Bringing embedded vision systems to market,” discusses these challenges and their solutions.

 

First, the article enumerates several design challenges including:

 

  • Meeting hi-res image-processing demands within cost, size, and power goals
  • Handling a variety of image-sensor types
  • Handling multiple image sensors in one camera
  • Real-time compensation (lens correction) for inexpensive lenses
  • Distortion correction, depth detection, dynamic range, and sharpness enhancement

 

 

Next, the article discusses Avnet Silica’s various design offerings that help engineers quickly develop embedded-vision designs. Products discussed include:

 

 

 

Avnet PicoZed Embedded Vision Kit.jpg 

 

The Avnet PicoZed Embedded Vision Kit is based on the Xilinx Zynq SoC

 

 

 

If you’re about to develop any sort of embedded-vision design, it might be worth your while to read the short article and then connect with your friendly neighborhood Avnet or Avnet Silica rep.

 

 

 

For more information about the Avnet PicoZed Embedded Vision Kit, see “Avnet’s $1500, Zynq-based PicoZed Embedded Vision Kit includes Python-1300-C camera and SDSoC license.”

 

 

 

 A YouTube video maker with the handle “takeshi i” has just posted an 18-minute video titled “IoT basics with ZYBO (Zynq)” that demonstrates an IoT design created with a $199 Digilent Zybo Z7 dev board based on a Xilinx Zynq SoC. (Note: It's a silent video.)

 

First, the YouTube video demonstrates the IoT design interacting with an app on a mobile phone. Then the video takes you step by step through the creation process using the Xilinx Vivado development environment.

 

The YouTuber writes:

 

“I implemented a web server using Python and bottle framework, which works with another C++ application. The C++ application controls my custom IPs (such as PWM) implemented in PL block. A user can control LEDs, 3-color LEDs, buttons and switches mounted on ZYBO board.”
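The quote describes a classic Zynq software pattern: a Linux user-space program driving custom PL peripherals through memory-mapped registers. Here’s a minimal, generic C sketch of that pattern. (This is not takeshi i’s code; the PWM base address and register layout are hypothetical placeholders for whatever Vivado’s address editor assigns in your design.)

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define PWM_BASE 0x43C00000UL  /* hypothetical AXI base address of the custom PWM IP */
    #define MAP_SIZE 0x1000UL

    int main(void)
    {
        int fd = open("/dev/mem", O_RDWR | O_SYNC);
        if (fd < 0) {
            perror("open /dev/mem");
            return 1;
        }

        /* Map one page of the PL peripheral's register space into user space */
        volatile uint32_t *pwm = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
                                      MAP_SHARED, fd, PWM_BASE);
        if (pwm == MAP_FAILED) {
            perror("mmap");
            close(fd);
            return 1;
        }

        pwm[0] = 512;  /* hypothetical register 0: PWM duty cycle */
        pwm[1] = 1;    /* hypothetical register 1: enable bit */

        munmap((void *)pwm, MAP_SIZE);
        close(fd);
        return 0;
    }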

 

The YouTube video’s Web page also lists the resources you need to recreate the IoT design:

 

 

 

 

Here’s the video:

 

 

 

 

 

 

A quick look at the latest product table for the Xilinx Zynq UltraScale+ RFSoC will tell you that the sample rate for the devices’ RF-class, 14-bit DAC has jumped to 6.554Gsamples/sec, up from 6.4Gsamples/sec. I asked Senior Product Line Manager Wouter Suverkropp about the change and he told me that the increase supports “…an extra level of oversampling for DOCSIS3.1 [designers]. The extra oversampling gives them 3dB processing gain and therefore simplifies the external circuits even further.”

 

 

 

RFSoC Conceptual Diagram.jpg 

 

Zynq UltraScale+ RFSoC Conceptual Diagram

 

 

 

For more information about the Zynq UltraScale+ RFSoC, see:

 

 

 

 

 

 

 

 

 

 

 

 

Xilinx has announced availability of automotive-grade Zynq UltraScale+ MPSoCs, enabling development of safety-critical ADAS and autonomous-driving systems. The 4-member Xilinx Automotive XA Zynq UltraScale+ MPSoC family is qualified according to AEC-Q100 test specifications with full ISO 26262 ASIL-C level certification and is ideally suited for various automotive platforms, delivering the right performance/watt while integrating critical functional-safety and security features.

 

The XA Zynq UltraScale+ MPSoC family has been certified to meet ISO 26262 ASIL-C level requirements by Exida, one of the world's leading accredited certification companies specializing in automation and automotive system safety and security. The product includes a "safety island" designed for real-time processing functional safety applications that has been certified to meet ISO 26262 ASIL-C level requirements.  In addition to the safety island, the device’s programmable logic can be used to create additional safety circuits tailored for specific applications such as monitors, watchdogs, or functional redundancy. These additional hardware safety blocks effectively allow ASIL decomposition and fault-tolerant architecture designs within a single device.

 

 

Xilinx XA Zynq UltraScale Plus MPSoC.jpg 

 

  

 

 

Bitmain manufactures Bitcoin, Litecoin, and other cryptocurrency mining machines and currently operates the world’s largest cryptocurrency mines. The company’s latest-generation Bitcoin miner, the Antminer S9, incorporates 189 of Bitmain’s 16nm BM1387 ASICs, which perform the Bitcoin hash algorithm at a rate of 14 Terahashes/sec. (See “Heigh ho! Heigh ho! Bitmain teams 189 bitcoin-mining ASICs with a Zynq SoC to create world's most powerful bitcoin miner.”) The company also uses one Zynq Z-7010 SoC to control those 189 hash-algorithm ASICs.

 

 

Bitmain Antminer S9.jpg 

 

Bitmain’s Antminer S9 Bitcoin Mining Machine uses a Zynq Z-7010 SoC as a main control processor

 

 

 

The Powered by Xilinx program has just published a 3-minute video containing an interview with Yingfei Li, Bitmain’s Marketing Director, and Wenguo Zhang, Bitmain’s Hardware R&D Director. In the video, Zhang explains that the Zynq Z-7010 solved multiple hidden problems with the company’s previous-generation control panel, thanks to the Zynq SoC’s dual-core Arm Cortex-A9 MPCore processor and the on-chip programmable logic.

 

Due to the success that Bitmain has had with Xilinx Zynq SoCs in its Antminer S9 Bitcoin mining machine, the company is now exploring the use of Xilinx 20nm and 16nm devices (UltraScale and UltraScale+) for planned future AI platforms and products.

 

 

 

 

DornerWorks is one of only three Xilinx Premier Alliance Partners in North America offering design services, so the company has more than a little experience using Xilinx All Programmable devices. The company has just launched a new learn-by-email series with “interesting shortcuts or automation tricks related to FPGA development.”

 

The series is free but you’ll need to provide an email address to receive the lessons. I signed up and immediately received a link to the first lesson titled “Algorithm Implementation and Acceleration on Embedded Systems” written by DornerWorks’ Anthony Boorsma. It contains information about the Xilinx Zynq SoC and Zynq UltraScale+ MPSoC and the Xilinx SDSoC development environment.

 

Sign up here.

 

 

 

Need a tiny but powerful SOM for your next embedded project? The iWave iW-RainboW-G28M SOM, based on a Xilinx Zynq Z-7007S, Z-7014S, Z-7010, or Z-7020 SoC, is certainly tiny: it’s a 67.6x37mm plug-in SODIMM. With one or two Arm Cortex-A9 MPCore processors, 512Mbytes of DDR3 SDRAM, 512Mbytes of NAND flash, Gigabit Ethernet and USB 2.0 ports, and an optional WiFi/Bluetooth module, it certainly qualifies as powerful. It’s also offered in an industrial temperature range (-40°C to +85°C).

 

 

iWave iW-RainboW-G28M SOM .jpg 

 

iWave’s iW-RainboW-G28M SoDIMM SOM is based on any one of four Xilinx Zynq SoCs

 

 

 

iWave’s SOM design obviously takes advantage of the pin compatibility built into the Xilinx Zynq Z-7000S and Z-7000 device families.

 

You’ll find the announcement for the iW-RainboW-G28M SoDIMM SOM here and the data sheet here.

 

 

Please contact iWave directly for more information about the iW-RainboW-G28M SoDIMM SOM.

 

 

 

 

The recent introduction of the groundbreaking Xilinx Zynq UltraScale+ RFSoC means that there are big changes in store for the way advanced RF and comms systems will be designed. With as many as 16 RF-class ADCs and DACs on one device along with a metric ton or two of other programmable resources, the Zynq UltraScale+ RFSoC makes it possible to start thinking about single-chip Massive MIMO systems. A new EDN.com article by Paul Newson, Hemang Parekh, and Harpinder Matharu titled “Realizing 5G New Radio massive MIMO systems” teases a few details for building such systems and includes this mind-tickling photo:

 

 

 

Zynq UltraScale RFSoC Massive MIMO proto system.jpg 

 

 

A sharp eye and a keen memory will link that photo to a demo from last October’s Xilinx Showcase at the Xilinx facility in Longmont, Colorado. Here’s Xilinx’s Lee Hansen demonstrating a similar system based on the Xilinx Zynq UltraScale+ RFSoC:

 

 

 

 

For more details about the Zynq UltraScale+ RFSoC, contact your friendly neighborhood Xilinx or Avnet sales rep and see these previous Xcell Daily blog posts:

 

 

 

 

 

 

 

 

 

 

 

Last month, a user on EmbeddedRelated.com going by the handle stephaneb started a thread titled “When (and why) is it a good idea to use an FPGA in your embedded system design?” Olivier Tremois (oliviert), a Xilinx DSP Specialist FAE based in France, provided an excellent, comprehensive, concise, Xilinx-specific response worth repeating in the Xcell Daily blog:

 

 

 

As a Xilinx employee I would like to contribute on the Pros ... and the Cons.

 

Let’s start with the Cons: if there is a processor that suits all your needs in terms of cost/power/performance/IOs, just go for it. You won't be able to design the same thing in an FPGA at the same price.


Now if you need some kind of glue logic around (IOs), or your design needs multiple processors/GPUs due to the required performance, then it's time to talk to your local FPGA dealer (preferably a Xilinx distributor!). I will try to answer a few remarks I saw throughout this thread:

 

FPGA/SoC: In the majority of the FPGA designs I’ve seen during my career at Xilinx, I saw some kind of processor. In pure FPGAs (Virtex/Kintex/Artix/Spartan) it is a soft processor (MicroBlaze or PicoBlaze) and in a [Zynq SoC or Zynq UltraScale+ MPSoC], it is a hard processor (dual-core Arm Cortex-A9 [for Zynq SoCs] and quad-core Cortex-A53 plus dual-core Cortex-R5 [for Zynq UltraScale+ MPSoCs]). The choice is now more complex: processor only, processor with an FPGA alongside, FPGA only, or integrated processor/FPGA. The tendency is toward the latter due to all the savings incurred: PCB, power, devices, ...

 

Power: Pure FPGAs are making incredible progress, but if you want really low power in stand-by mode you should look at the Zynq UltraScale+ MPSoC, which contains many processors and, in particular, a Power Management Unit that can switch on/off different regions of the processors/programmable logic.

 

Analog: Since Virtex-5 (2006), Xilinx has included ADCs in its FPGAs, which were limited to internal parameter measurements (voltage, temperature, ...). [These ADC blocks are] called the System Monitor. With 7 series (2011) [devices], Xilinx included a dual 1Msamples/sec@12-bits ADC with internal/external measurement capabilities. Lately Xilinx [has] announced very high-performance ADCs/DACs integrated into the Zynq UltraScale+ RFSoC: 4Gsamples/sec@12-bits ADCs / 6.5Gsamples/sec@14-bits DACs. Potential applications are Telecom (5G), Cable (DOCSIS), and Radar (Phased-Array).

 

Security: The bitstream that is stored in the external Flash can be encoded [encrypted]. Decoding [decrypting] is performed within the FPGA during bitstream download. Zynq-7000 SoCs and Zynq Ultrascale+ MPSoCs support encoded [encrypted] bitstreams and secured boot for the processor[s].

 

Ease of Use: This is the big part of the equation. Customers need to take this into account to get the right time to market. Since 2012 and [with] 7 series devices, Xilinx introduced a new integrated tool called Vivado. Since then a number of features/new tools have been [added to Vivado]:

 

  • IP Integrator (IPI): a graphical interface to stitch IPs together and generate bitstreams for complete systems.

 

  • Vivado HLS (High Level Synthesis): a tool that allows you to generate HDL code from C/C++ code. This tool will generate IPs that can be handled by IPI. (See the short HLS sketch after this list.)

 

 

  • SDSoC (Software Defined SoC): This tool allows you to design complete systems, software and hardware on a Zynq SoC/Zynq UltraScale+ MPSoC platform. This tool with some plugins will allow you to move part of your C/C++ code to programmable logic (calling Vivado HLS in the background).

 

  • SDAccel: an OpenCL (and more) implementation. Not relevant for this thread.
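To make the Vivado HLS bullet above concrete, here’s a minimal, hypothetical C function of the sort that HLS turns into an IPI-ready IP core. The pragmas request an AXI4-Lite control interface, AXI4 master data ports, and a pipelined loop; treat it as a workflow sketch rather than production code:

    #define N 256

    /* HLS synthesizes this loop into hardware; the pragmas request an
       AXI4-Lite control interface and AXI4 master data ports, plus a
       pipelined inner loop (one result per clock once the pipeline fills). */
    void vadd(const int a[N], const int b[N], int out[N])
    {
    #pragma HLS INTERFACE s_axilite port=return
    #pragma HLS INTERFACE m_axi depth=256 port=a
    #pragma HLS INTERFACE m_axi depth=256 port=b
    #pragma HLS INTERFACE m_axi depth=256 port=out
        for (int i = 0; i < N; i++) {
    #pragma HLS PIPELINE II=1
            out[i] = a[i] + b[i];
        }
    }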

 

 

There are also tools related to the MathWorks environment [MATLAB and Simulink]:

 

 

  • System Generator for DSP (aka SysGen): a low-level Simulink library (designed by Xilinx for Xilinx FPGAs) that allows you to program HDL code with blocks. This tool achieves even better performance (clock/area) than hand-written HDL code because each block is an instance of an IP (from register, adder, counter, and multiplier up to FFT, FIR Compiler, and VHLS IP). Bit-true and cycle-true simulations.

 

  • Xilinx Model Composer (XMC): available since ... yesterday! Again a Simulink blockset but based on Vivado HLS. Much faster simulations. Bit-true but not cycle-true.

 

 

All this to say that FPGA vendors have [expended] tremendous effort to make FPGAs and derivative devices easier to program. You still need a learning curve [but it] is much shorter than it used to be…

 

 

 

 

One of life’s realities is that the most advanced semiconductor devices—including the Xilinx Zynq UltraScale+ MPSoCs—require multiple voltage supplies for proper operation. That means that you must devote a part of the system engineering effort for a product based on these devices to the power subsystem. Put another way, it’s been a long, long time since the days when a single 5V supply and a bypass capacitor were all you needed. Fortunately, there’s help. Xilinx has a number of vendor partners with ready, device-specific power-management ICs (PMICs). Case in point: Dialog Semiconductor.

 

If you need to power a Zynq UltraScale+ ZU3EG, ZU7EV, or ZU9CG MPSoC, you’ll want to check out Dialog’s App Note AN-PM-095 titled “Power Solutions for Xilinx Zynq Ultrascale+ ZU9EG.” This document contains reference designs for cost-optimized, PMIC-based circuits specifically targeting the power requirements for Zynq UltraScale+ MPSoCs. According to Xilinx Senior Tech Marketing Manager for Analog and Power Delivery Cathal Murphy, Dialog Semi’s PMICs can be used for low-cost power-supply designs because they generate as many as 12 power rails per device. They also switch at frequencies as high as 3MHz, which means that you can use smaller, less expensive passive devices in the design.

 

It also means that your overall power-management design will be smaller. For example, Dialog Semi’s power-management reference design for a Zynq UltraScale+ ZU9 MPSoC requires only 1.5 square inches of board space (less for smaller devices in the MPSoC family).

 

You don’t need to visualize that in your head. Here’s a photo and chart supplied by Cathal:

 

 

Dialog Semi Zynq UltraScale Plus MPSoC PMICs.jpg 

 

 

The Dialog Semi reference design is hidden under the US 25-cent piece.

 

As the chart notes, these Dialog Semi PMICs have built-in power sequencing and can be obtained preprogrammed with Zynq-specific power sequences from distributors such as Avnet.

 

Cathal also pointed out that Dialog Semi has long been supplying PMICs to the consumer market (think smartphones and tablets) and that the power requirements for Zynq UltraScale+ MPSoCs map well into the existing capabilities of PMICs designed for this market, so you reap the benefit of the company’s volume manufacturing expertise.

 

Note: If you’re looking for a PMIC to power your Spartan-7 FPGA design, check out Dialog Semi’s DA9062 with four buck converters and four LDOs.

 

 

 

Late last week, Avnet announced that it’s now offering the Aaware Sound Capture Platform paired with the MiniZed Zynq SoC development platform as a complete dev kit for voice-based cloud services including Amazon Alexa and Google Home. It’s listed on the Avnet site for $198.99. Avnet and Aaware are demonstrating the new kit at CES 2018, being held this week in Las Vegas. You’ll find them at the Eureka Park booth #50212 in the Sands Expo.

 

 

Aaware and Avnet Voice Dev Kit.jpg 

 

The Aaware Sound Capture Platform coupled to a Zynq-based Avnet MiniZed dev board

 

 

The Aaware Sound Capture Platform couples as many as 13 MEMS microphones (you can use fewer in a 1D linear or 2D array) with a Xilinx Zynq Z-7010 SoC to pre-filter incoming voice, delivering a clean voice data stream to local or cloud-based voice recognition hardware. The system has a built-in wake word (like “Alexa” or “OK, Google”) that triggers the unit’s filtering algorithms.

 

Avnet’s MiniZed dev board is usually based on a single-core Zynq Z-7007S SoC, but the MiniZed board included in this kit is actually based on a dual-core Zynq Z-7010 SoC. The board also offers outstanding wireless I/O in the form of a WiFi 802.11b/g/n module and a Bluetooth 4.1 module.

 

 

 

For more information about the Aaware Sound Capture Platform, see:

 

 

 

 

 

 

Adam Taylor’s MicroZed Chronicles, Part 231: “Developing Image Processing Platforms”—The Video


 

Adam Taylor has been writing about the use of Xilinx All Programmable devices for image-processing platforms for quite a while and he has wrapped up much of what he knows into a 44-minute video presentation, which appears below. Adam is presenting tomorrow at the Xilinx Developer Forum being held in Frankfurt, Germany.

 

 

 

 

You’ll find a PDF of his slides attached below:

 

My good friend Jack Ganssle has long published The Embedded Muse email newsletter and the January 2, 2018 issue (#341!) includes an extensive review of the new $759, Zynq-based Siglent SDS1204X-E 4-channel DSO. Best of all, he’s giving one of these bad boys away at the end of January. (Contest details below.)

 

 

Siglent SDS-1204X-E.jpg 

 

Siglent’s Zynq-based SDS1204X-E 4-channel DSO. Photo credit: Jack Ganssle

 

 

 

The Siglent SDS1204X-E is the 4-channel version of the Siglent SDS1202X-E that EEVblog’s Dave Jones tore down last April. (See “Dave Jones tears down the new, <$400, Zynq-powered, Siglent SDS1202X-E 2-channel, 200MHz, 1Gsamples/sec DSO.”) I personally bought one of those scopes and I can attest to its being one sweet instrument. You should read Jack’s detailed review on his Web site, but here’s his summary:

 

“I'm blown away by the advanced engineering and quality of manufacturing exhibited by this and some other Chinese test equipment. Steve Leibson wrote a piece about how the unit works, and it's clear that the innovation and technology in this unit are world-class.”

 

 

In my own review of the Siglent SDS1202X-E last November, I wrote:

 

 

“Siglent’s SDS-1202X-E and SDS-1104X-E DSOs once again highlight the Zynq SoC’s flexibility and capability when used as the foundation for a product family. The Zynq SoC’s unique combination of a dual-core Arm Cortex-A9 MPCore processing subsystem and a good-sized chunk of Xilinx 7 series FPGA permits the development of truly high-performance platforms.”

 

Last April, I wrote:

 

“The new SDS1000X-E DSO family illustrates the result of selecting a Zynq SoC as the foundation for a system design. The large number of on-chip resources permit you to think outside of the box when it comes to adding features. Once you’ve selected a Zynq SoC, you no longer need to think about cramming code into the device to add features. With the Zynq SoC’s hardware, software, and I/O programmability, you can instead start thinking up new features that significantly improve the product’s competitive position in your market.

 

“This is precisely what Siglent’s engineers were able to do. Once the Zynq SoC was included in the design, the designers of this entry-level DSO family were able to think about which high-performance features they wished to migrate to their new design.”

 

 

 

All of that is equally true for the Siglent SDS1204X-E 4-channel DSO, which is further proof of just how good the Zynq SoC is when used as the foundation for an entire product family.

 

Now if you want to win the Siglent SDS1204X-E 4-channel DSO that Jack’s giving away at the end of January, you first need to subscribe to The Embedded Muse. The subscription is free; Jack’s an outstanding engineer and a wonderful writer, and he’s not going to sell or even give your email address to anyone else, so consider the Embedded Muse subscription a bonus for entering the drawing. After you subscribe, you can enter the contest here. (Note: It’s Jack’s contest, so if you have questions, you need to ask him.)

 

Looking to turbocharge Amazon’s Alexa or Google Home? Aaware’s Zynq-based kit is the tool you need.


 

How do you get reliable, far-field voice recognition; robust, directional voice recognition in the presence of strong background noise; and multiple wake words for voice-based cloud services such as Amazon’s Alexa and Google Home? Aaware has an answer with its $199, Zynq-based Far-Field Development Platform. (See “13 MEMS microphones plus a Zynq SoC gives services like Amazon’s Alexa and Google Home far-field voice recognition clarity.”) A new Powered by Xilinx Demo Shorts video gives you additional info and another demo. (That’s a Zynq-based krtkl snickerdoodle processing board in the video.)

 

 

 

 

Powered by Xilinx: Another look at KORTIQ’s FPGA-based AIScale CNN Accelerator


 

A previous blog at the end of last November discussed KORTIQ’s FPGA-based AIScale CNN Accelerator, which takes pre-trained CNNs (convolutional neural networks)—including industry standards such as ResNet, AlexNet, Tiny Yolo, and VGG-16—compresses them, and fits them into Xilinx’s full range of programmable logic fabrics. (See “KORTIQ’s AIScale Accelerator fits trained CNNs into large or small All Programmable devices, allowing you to pick the right price/performance ratio for your application.”) A short, new Powered by Xilinx video provides more details about KORTIQ and its accelerated CNN.

 

In the video, KORTIQ CEO Harold Weiss discusses using low-end Zynq SoCs (up to the Z-7020) and Zynq UltraScale+ MPSoCs (the ZU2 and ZU3) to create low-power solutions that deliver “just enough” performance for target industrial applications such as video processing, which requires billions of operations per second. The Zynq SoCs and Zynq UltraScale+ MPSoCs consume far less power than competing GPUs and CPUs while accelerating multiple CNN layers including convolutional layers, pooling layers, fully connected layers, and adding layers.

 

Here’s the new video:

 

 

 

 

Vivado 2017.4 is now available. Download it now to get these new features (see the release notes for complete details):

 

 

 

 

Download the new version of the Vivado Design Suite HLx editions here.

 

 

 

 

Continental AG has announced its Assisted & Automated Driving Control Unit, based on Xilinx All Programmable technology and developed in collaboration with Xilinx. According to the company, “…the Assisted & Automated Driving Control Unit will enable Continental’s customers to get to market faster by building upon the Open Computing Language (OpenCL) framework…” and “…offers a scalable product family for assisted and automated driving fulfilling the highest safety requirements (ASIL D) by 2019.” 

 

 

 

Continental Assisted Driving Control Unit.jpg 

 

 

Continental AG’s Assisted & Automated Driving Control Unit is based on Xilinx All Programmable technology

 

 

 

Continental’s incorporation of Xilinx All Programmable technology “provides developers the ability to optimize software for the appropriate processing engine or to create their own hardware accelerators with the Xilinx All Programmable technology. The result is the ultimate freedom to optimize performance, without sacrificing latency, power dissipation, or the flexibility to move software algorithms between the integrated chips, as the project progresses.”

 

“Our Assisted & Automated Driving Control Unit will enable automotive engineers to create their own differentiated solutions for machine learning, and sensor fusion. Xilinx’s All Programmable Technology was chosen as it offers flexibility and scalability to address the ever-changing and new requirements along the way to fully automated self-driving cars,” said Karl Haupt, Head of Continental’s Advanced Driver Assistance Systems business unit. “For Continental, the Assisted & Automated Driving Control Unit is a central element for implementing the required functional safety architecture and, at the same time, a host for the comprehensive environment model and driving functions needed for automated driving.”

 

Continental will be exhibiting at next week’s CES in Las Vegas.

 

 

 

 

By Adam Taylor

 

 

What better way to start the New Year than with a new Adam Taylor MicroZed Chronicles blog? – The Editor

 

 

 

Following on from the popularity of my final blog of last year where I presented several tips for better image-processing systems, I thought I would kick off the 2018 series of blogs by providing a number of tips for using the XADC and Sysmon in Zynq SoCs and Zynq UltraScale+ MPSoCs.

 

Whether our targeted device uses an XADC or a Sysmon block depends upon the device family. If we are targeting a 7 series FPGA or Zynq SoC device, we will be using the XADC. If the target is an UltraScale FPGA, UltraScale+ FPGA, or a Zynq UltraScale+ MPSoC, we’ll be using a Sysmon block. Behaviorally, the on-chip XADC and Sysmon blocks are very similar but there are some minor differences in architecture and maximum sampling rates between the two. Including the XADC or Sysmon adds a very interesting analog/mixed-signal capability to your design and helps reduce the number of external components. Because they can monitor internal device parameters along with external signals, you can also use the XADC/Sysmon blocks to implement a comprehensive system health and security monitoring solution, which is critical for many applications.

 

Here are some of my favourite tips for using the Xilinx XADC/Sysmon blocks:

 

 

 

  • Remember Nyquist’s sampling theorem and configure sample clocks correctly

 

 

Image1.jpg 

 

 

 

To prevent signal aliasing, you must set the XADC/Sysmon sampling rate to at least twice the frequency of the signal being quantized. When sampling external signals, the XADC and Sysmon have different maximum sampling frequencies of 1000Ksamples/sec and 200Ksamples/sec respectively. To set the appropriate sampling frequency, we need to consider the relationship between the clock provided to the XADC/Sysmon (called DClock) and the resultant internally derived clock used for sampling (called ADC Clock). Both the XADC and Sysmon take a minimum of 26 internal ADC Clock cycles to perform a conversion. To achieve the maximum conversion rate of 1000KSPS for the XADC, we therefore need to set the ADC Clock at 26 MHz. For the Sysmon block, we need to set the ADC Clock to 5.2MHz to achieve the full 200ksamples/sec sample rate. ADC clock frequencies below these will result in lower sampling rates. Correctly setting the sampling rate depends upon the device you are using and the access method:

 

 

  1. Zynq PS APB Access – The XADC is clocked using the PCAP_2X clock, which has a nominal frequency of 200MHz. This is divided further internally within the DevC to generate a DClock for the XADC with a maximum frequency of 50MHz, using the XADCIF_CFG[TCKRATE] register settings. This clock is then further divided down (minimum division by 2) using the XADC configuration register to select the desired conversion rate.
  2. Zynq MPSoC APB Access – The PS and PL Sysmon are clocked by the AMS clock, which has a range of 0 to 52MHz. This is then divided further by the Sysmon to create the ADC Clock used for sampling. This further division is a minimum of 2 for the PS Sysmon or 8 for the PL Sysmon, set using the Sysmon configuration clock-division registers PS/PL CONFIG_REG2.
  3. AXI Access using an FPGA, Zynq SoC, or Zynq UltraScale+ MPSoC (PL Sysmon) – Use the System Management wizard or XADC wizard configuration to determine the AXI clock frequency and hence the sampling clock via the IP customization tab. The required sample clock frequency can then be supplied by either a fabric clock (Zynq SoC or Zynq UltraScale+ MPSoC) or a clocking wizard (FPGA, Zynq SoC, or Zynq UltraScale+ MPSoC).
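As a quick sanity check on the clocking arithmetic above, here’s a tiny self-contained C sketch of the DClock-to-sample-rate relationship. The DClock frequency and divider are example values, not recommendations:

    #include <stdio.h>

    int main(void)
    {
        const double dclk_hz = 50e6;  /* example DClock after the TCKRATE divider (XADC case) */
        const unsigned div   = 2;     /* configuration-register clock divider (minimum 2) */

        double adcclk_hz = dclk_hz / div;  /* internally derived ADC Clock */
        double fs = adcclk_hz / 26.0;      /* 26 ADC Clock cycles per conversion */

        printf("ADC Clock = %.1f MHz -> sample rate = %.1f kSPS\n",
               adcclk_hz / 1e6, fs / 1e3); /* 25 MHz -> ~961.5 kSPS here */
        return 0;
    }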

 

 

 

 

  • Configure the Analog Inputs Correctly

 

Image2.jpg 

 

 

 

The analog inputs are defined by IP Integrator or software to be either unipolar or bipolar, and you can control the input configuration for each analog input individually. When a unipolar signal is quantized, the input signal can range between 0V and 1V. For a bipolar input, the differential voltage between the Vp and Vn inputs is ±0.5V. Selecting the right mode ensures the best performance and avoids damaging the analog inputs. For unipolar configurations, Vp cannot be negative with respect to Vn. For bipolar inputs, Vp and Vn can swing positive and negative with respect to the common-mode (reference) voltage. Bipolar mode provides better noise performance because any common-mode noise coupled onto the Vp and Vn signals will be removed thanks to differential sampling.

 

When it comes to providing better performance in electrically noisy environments, you can also turn on input-channel averaging to average out the noise.
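Here’s a minimal bare-metal sketch, using the Xilinx xadcps driver on a Zynq SoC, that enables 16-sample averaging and reads the first auxiliary channel. It’s a starting point only; check your BSP’s xadcps.h for the exact constant names:

    /* Bare-metal sketch using the Xilinx xadcps driver (Zynq-7000 PS-side
       XADC access). Illustrative only. */
    #include "xadcps.h"
    #include "xparameters.h"

    int read_aux0_averaged(u16 *value)
    {
        static XAdcPs xadc;
        XAdcPs_Config *cfg = XAdcPs_LookupConfig(XPAR_XADCPS_0_DEVICE_ID);
        if (cfg == NULL)
            return -1;
        XAdcPs_CfgInitialize(&xadc, cfg, cfg->BaseAddress);

        /* Average 16 samples per result to tame noise on the analog input */
        XAdcPs_SetAvg(&xadc, XADCPS_AVG_16_SAMPLES);

        /* Read the raw conversion result for auxiliary channel 0 */
        *value = XAdcPs_GetAdcData(&xadc, XADCPS_CH_AUX_MIN);
        return 0;
    }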

 

 

 

 

  • Leverage the External Multiplexer Capabilities

 

Image3.jpg 

 

 

Both the XADC and the Sysmon can accept as many as seventeen external differential analog signals using one dedicated Vp/Vn pair and sixteen auxiliary Vp/Vn pins. Doing so of course uses several I/O signal pins—as many as 34 I/O pins if all analog inputs are used. This may present issues, especially on smaller devices where I/O-pin availability might be tightly constrained, so the XADC/Sysmon can drive an external multiplexer, which reduces the number of pins required and also allows you to use an external mux with added protection for harsh operating environments (e.g. ESD protection).

 

 

 

 

 

  • Consider the Anti-Aliasing Filter’s effect on Conversion Performance

 

Image4.jpg 

 

 

Implementing an Anti-Aliasing filter on the front end of the XADC/Sysmon external inputs is critical to ensuring that only the signals we want are quantized.

 

The external resistor and capacitors in the AAF will increase the overall settling time. Therefore, we need to ensure the external AAF also does not adversely affect the total settling time and consequently the conversion performance. Failing to provide adequate system-level settling time can result in ADC measurement errors because the sampling capacitor will not charge to its final value.
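A common rule of thumb (a rough guide only, not a substitute for the analysis in the application note below) is that settling to within 1/2 LSB of an N-bit converter takes about ln(2^(N+1)) RC time constants. A quick C sketch with example component values:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double r = 1000.0;  /* example series resistance, ohms */
        const double c = 1e-9;    /* example filter capacitance, farads */
        const int nbits = 12;     /* XADC resolution */

        double tau = r * c;
        double t_settle = tau * log(pow(2.0, nbits + 1)); /* ~9 tau for 12 bits */

        printf("tau = %.1f ns, settling time ~ %.1f ns\n",
               tau * 1e9, t_settle * 1e9);
        return 0;
    }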

 

Xilinx application note XAPP795, “Driving the Xilinx Analog-to-Digital Converter,” provides very useful information on this subject.

 

 

 

 

  • Use the Alarms and Set Appropriate Thresholds

 

 

Image5.jpg 

 

 

 

Both the XADC and Sysmon can monitor internal power supply voltages and temperatures. This is a great feature when we initially commission the boards because we can verify that the power supplies are delivering the expected voltages. We can even use the temperature sensor to verify thermal calculations at the high and low end of qualification environments.

 

 

When it comes to creating the run-time application, you should use the temperature and voltage alarms, which are based on defined thresholds for core voltages and device temperature. Should a measured parameter fall outside these defined thresholds, an alarm allows further action to be taken. Configured correctly, this alarm capability can be used to generate an interrupt that alerts the processing system to a problem. Depending upon which alarm has been raised, the system can then act to either protect itself or undertake graceful degradation, thus preventing sudden failure.
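Here’s a sketch of arming the upper temperature alarm from bare metal with the xadcps driver. The raw encoding follows the XADC temperature transfer function documented in UG480; verify the mask and register names against your driver version:

    #include "xadcps.h"

    /* Arm the XADC upper temperature alarm at upper_c degrees C.
       Raw encoding per UG480: Temp(C) = (code / 65536) * 503.975 - 273.15. */
    void arm_temp_alarm(XAdcPs *xadc, double upper_c)
    {
        u16 raw = (u16)((upper_c + 273.15) * 65536.0 / 503.975);

        XAdcPs_SetAlarmThreshold(xadc, XADCPS_ATR_TEMP_UPPER, raw);

        /* Enable the temperature alarm output; it can then be routed to an
           interrupt so software can throttle or shut down gracefully.
           (Check xadcps_hw.h for the exact enable-mask name.) */
        XAdcPs_SetAlarmEnables(xadc, XADCPS_CFR1_ALM_TEMP_MASK);
    }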

 

 

 

Hopefully these tips will enable you to create smoother XADC/Sysmon solutions. If you experience any issues, I have a page on my website that links to all previous XADC / Sysmon examples in this series.   

 

 

 

 

You can find the example source code on GitHub.

 

 

Adam Taylor’s Web site is http://adiuvoengineering.com/.

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

First Year E Book here

 

First Year Hardback here.

 

  

 

MicroZed Chronicles hardcopy.jpg 

 

 

 

Second Year E Book here

 

Second Year Hardback here

 

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

OKI IDS and Avnet have jointly announced a new board for developing ADAS (advanced driver assistance systems) and advanced SAE Level 4/5 autonomous-driving systems based on two Xilinx Zynq UltraScale+ MPSoCs.

 

 

 

 

OKI Avnet ADAS Platform Based on Two Zynq UltraScale Plus MPSoCs.jpg 

 

 

 

Avnet plans to start distributing the board in Japan in February 2018 and will then expand into other parts of Asia. The A4-sized board interfaces to as many as twelve sensors including cameras and other types of imagers. The board operates on 12V and, according to the announcement, consumes about 20% of the power of similar GPU-based hardware because it employs Xilinx Zynq UltraScale+ MPSoCs as its foundation.

 

Want to see this board in person? You can, at the Xilinx booth at Automotive World 2018, being held at Tokyo Big Sight from January 17th to 19th. (Hall East 6, 54-47)

 

 

Ann Steffora Mutschler has just published an article on the SemiconductorEngineering.com Web site titled “Mixing Interface Protocols” that describes some of the complexities of SoC design—all related to the proliferation of various on- and off-chip I/O protocols. However, the article can just as easily be read as a reason for using programmable-logic devices such as Xilinx Zynq SoCs, Zynq UltraScale+ MPSoCs, and FPGAs in your system designs.

 

For example, here’s Mutschler’s lead sentence:

 

“Continuous and pervasive connectivity requires devices to support multiple interface protocols, but that is creating problems at multiple levels because each protocol is based on a different set of assumptions.”

 

This sentence nicely sums up the last two decades of interface design philosophy for programmable-logic devices. Early on, it became clear that a lot of logic translation was needed to connect early FPGAs to the rest of a system. When Xilinx developed I/O pins with programmable logic levels, it literally wiped out a big chunk of the market for level-translator chips. When MGTs (multi-gigabit serial transceivers) started to become popular for moving large amounts of data from one subsystem to another, Xilinx moved those onto its devices as well.

 

So if you’d like a brief glimpse into the chaotic I/O scene that’s creating immense headaches for SoC designers, take a read through Ann Steffora Mutschler’s new article. If you’d like to sidestep those headaches, just remember that Xilinx’s engineering team has already suffered them for you.

 

 

Digilent’s end-of-the-year sale includes a few Xilinx-based products


 

There are a few Xilinx-based dev boards and instruments on sale right now at Digilent. You’ll find them on the “End of the Year Sale” Web page:

 

 

  • Zynq-based ZedBoard: $396.00 (marked down from $495.00)

 

ZedBoard V2.jpg

 

 

 

  • Digital Discovery Logic Analyzer and Pattern Generator (based on a Spartan-6 FPGA): $169.99 (marked down from $199.99)

 

 

 

Digilent Digital Discovery Module.jpg

 

 

 

  • Analog Discovery 2 Maker Bundle (based on a Spartan-6 FPGA): $261.75 (marked down from $349.00)

 

 

Digilent Analog Discovery 2 v2.jpg

 

 

 

If you were considering one of these Digilent products, now’s probably the time to buy.

 

 

 

 

 

 

In an article published in EETimes today titled “Programmable Logic Holds the Key to Addressing Device Obsolescence,” Xilinx’s Giles Peckham argues that the use of programmable devices—such as the Zynq SoCs, Zynq UltraScale+ MPSoCs, and FPGAs offered by Xilinx—can help prevent product obsolescence in long-lived products designed for industrial, scientific, and military applications. And that assertion is certainly true. But in this blog, I want to highlight the response by a reader using the handle MWagner_MA who wrote:

 

“Given the pace of change in FPGA's, I don't know if an FPGA will be a panacea for chip obsolescence issues. However, when changes in system design occur for hooking up new peripherals to a design off board, FPGA's can extend the life of a product 5+ years assuming you can get board-compatible FPGA's. Comm channels are what come to mind. If you use the same electrical interface but have an updated protocol, programmable logic can be a solution. Another solution is that when devices on SPI or I2C busses go obsolete, FPGA code can get updated to accomodate, even changing protocol if necessary assuming the right pins are connected at the other chip (like an A/D).”

 

 

MWagner_MA’s response is nuanced and tempered with obvious design experience. However, I will need to differ with the comment that the pace of change in FPGAs means something significant within the context of product obsolescence. Certainly FPGAs go obsolete, but it takes a long, long time.

 

Case in point:

 

I received an email just today from Xilinx about this very topic. (Feel free to insert amusement here about Xilinx’s corporate blogger being on the company’s promotional email list.) The email is about Xilinx’s Spartan-6 FPGAs, which were first announced in 2009. That’s eight or nine years ago. Today’s email states that Xilinx plans to ship Spartan-6 devices “until at least 2027.” That’s another nine or ten years into the future for a resulting product-line lifespan of nearly two decades and that’s not all that unusual for Xilinx parts. In other words, Xilinx FPGAs are in another universe entirely when compared to the rapid pace of obsolescence for semiconductor devices like PC and server processors. That’s something to keep in mind when you’re designing products destined for a long life in the field.

 

If you want to see the full long-life story for the Spartan-6 FPGA family, click here.

 

 

 

The Raptor from Rincon Research implements a 2x2 MIMO SDR (software-defined radio) in a compact 5x2.675-inch form factor by combining the capabilities of the Analog Devices AD9361 RF Agile Transceiver and the Zynq UltraScale+ ZU9EG MPSoC. The board has an RF tuning range of 70MHz to 6GHz. On-board memory includes 4Gbytes of DDR4 SDRAM, a pair of QSPI Flash memory chips, and an SD card socket. Digital I/O options include three on-board USB connectors (two USB 3.0 ports and one USB 2.0 port) and, through a mezzanine board, 10/100/1000 Ethernet, two SFP+ optical cages, an M.2 SATA port, DisplayPort, and a Samtec FireFly connector. Rincon Research provides the board along with a BSP, drivers, and COTS tool support.

 

Here’s a block diagram of the Raptor board:

 

 

Rincon Research Raptor Block Diagram.jpg

 

Rincon Research’s Raptor, a 2x2 MIMO SDR Board, Block Diagram

 

 

 

Here are photos of the Raptor main board and its I/O expansion mezzanine board:

 

 

 

Rincon Research Raptor Board.jpg 

 

Rincon Research’s Raptor 2x2 MIMO SDR Board

 

 

 

 

Rincon Research Raptor IO Mezzanine Board.jpg 

 

Rincon Research’s Raptor I/O Expansion Board

 

 

 

Please contact Rincon Research for more information about the Raptor SDR.

 

 

 

By Adam Taylor

 

 

For the final MicroZed Chronicles blog of the year, I thought I would wrap up with several tips to help when you are creating embedded-vision systems based on Zynq SoC, Zynq UltraScale+ MPSoC, and Xilinx FPGA devices.

 

Note: These tips and more will be part of Adam Taylor’s presentation at the Xilinx Developer Forum that will be held in Frankfurt, Germany on January 9.

 

 

 

Image1.jpg 

 

 

 

 

  1. Design in Flexibility from the Beginning

 

 

Image2.jpg

 

 

Video Timing Controller used to detect the incoming video standard

 

 

Use the flexibility provided by the Video Timing Controller (VTC) and reconfigurable clocking architectures such as fabric clocks, MMCMs, and PLLs. Using the VTC and associated software running on the PS (processor system) in the Zynq SoC and Zynq UltraScale+ MPSoC, it is possible to detect different video standards from an input signal at run time and to configure the processing and output video timing accordingly. Upon detection of a new video standard, the software running on the PS can configure new clock frequencies for the pixel clock and the image-processing chain, along with reconfiguring the VDMA frame buffers for the new image settings. You can use the VTC’s timing detector and timing generator to define the new video timing: the VTC can use the detected video settings to generate new output video timings for the new standard.
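Here’s a hedged bare-metal sketch of that detect-then-regenerate flow using the Xilinx VTC driver (xvtc). The device ID macro is a placeholder and the function names should be checked against your BSP version:

    #include "xvtc.h"
    #include "xparameters.h"

    int follow_input_timing(void)
    {
        static XVtc vtc;
        XVtc_Timing timing;

        XVtc_Config *cfg = XVtc_LookupConfig(XPAR_VTC_0_DEVICE_ID); /* placeholder ID */
        if (cfg == NULL)
            return -1;
        XVtc_CfgInitialize(&vtc, cfg, cfg->BaseAddress);

        XVtc_EnableDetector(&vtc);            /* watch the incoming video */
        XVtc_GetDetectorTiming(&vtc, &timing);

        /* Re-use the detected timing to drive the output stage */
        XVtc_SetGeneratorTiming(&vtc, &timing);
        XVtc_RegUpdateEnable(&vtc);
        XVtc_EnableGenerator(&vtc);
        return 0;
    }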

 

 

 

  2. Convert input video to AXI Interconnect as soon as possible to leverage IP and HLS

 

 

Image3.jpg 

 

 

Converting Data into the AXI Streaming Format

 

 

 

Vivado provides a range of key IP cores that implement most of the functions required by an image-processing chain—functions such as Color Filter Interpolation, Color Space Conversion, VDMA, and Video Mixing. Similarly, Vivado HLS can generate IP cores that use the AXI interconnect to ease integration within Vivado designs. Therefore, to get maximum benefit from the available IP and tool chain capabilities, we need to convert our incoming video data into the AXI Streaming format as soon as possible in the image-processing chain. We can use the Video-In-to-AXI-Stream IP core as an aid here. This core converts video from a parallel format consisting of synchronization signals and pixel values into our desired AXI Streaming format. A good tip when using this IP core is that the sync inputs do not need to be timed as per a VGA standard; they are edge triggered. This eases integration with different video formats such as Camera Link, with its frame-valid, line-valid, and pixel information format, for example.

 

 

 

  3. Use Logic Debugging Resources

 

 

Image4.jpg 

 

 

 

Insertion of the ILA monitoring the output stage

 

 

 

Insert integrated logic analyzers (ILAs) at key locations within the image-processing chain. Including these ILAs from day one in the design can help speed commissioning of the design. When implementing an image-processing chain in a new design, I insert ILAs at a minimum in the following locations:

 

  • Directly behind the receiving IP module—especially if it is a custom block. This ILA enables me to be sure that I am receiving data from the imager / camera.
  • On the output of the first AXI Streaming IP Core. This ILA allows me to be sure the image-processing core has started to move data through the AXI interconnect. If you are using VDMA, remember you will not see activity on the interconnect until you have configured the VDMA via software.
  • On the AXI-Streaming-to-Video-Out IP block, if used. I also consider connecting the video timing controller generator outputs to this ILA as well. This enables me to determine if the AXI-Stream-to-Video-Out block is correctly locked and the VTC is generating output timing.

 

When combined with the test patterns discussed below, insertion of ILAs allows us to zero in faster on any issues in the design which prevent the desired behavior.

 

 

 

  4. Select an Imager / Camera with a Test Pattern capability

 

 

Image5.jpg 

 

 

Incorrectly received incrementing test pattern captured by an ILA

 

 

 

If possible when selecting the imaging sensor or camera for a project, choose one that provides a test pattern video output. You can then use this standard test pattern to ensure the reception, decoding, and image-processing chain is configured correctly because you’ll know exactly what the original video signal looks like. You can combine the imager/camera test pattern with ILAs connected close to the data-reception module to determine whether any issues you experience when displaying an image are internal to the device and its image-processing chain or are the result of the imager/camera configuration.

 

We can verify the deterministic pixel values of the test pattern using the ILA. If the pixel values, line length, and the number of lines are as we expect, then it is not an imager configuration issue. More likely you will find the issue(s) within the receiving module and the image-processing chain.  This is especially important when using complex imagers/cameras that require several tens, or sometimes hundreds of configuration settings to be applied before an image is obtained.

 

 

  5. Include a Test Pattern Generator in your Zynq SoC, Zynq UltraScale+ MPSoC, or FPGA design

 

 

Image6.jpg 

 

 

Tartan Color Bar Test Pattern

 

 

 

If you include a test-pattern generator within the image-processing chain, you can use it to verify the VDMA frame buffers, output video timing, and decoding prior to the integration of the imager/camera. This reduces integration risks. To gain maximum benefit, the test-pattern generator should be configured with the same color space and resolution as the final imager. The test pattern generator should be included as close to the start of the image-processing chain as possible. This enables more of the image-processing pipeline to be verified, demonstrating that the image-processing pipeline is correct. When combined with test pattern capabilities on the imager, this enables faster identification of any problems.

 

 

 

  6. Understand how Video Direct Memory Access stores data in memory

 

 

Image7.jpg 

 

 

 

Video Direct Memory Access (VDMA) allows us to use the processor DDR memory as a frame buffer. This enables access to the images from the processor cores in the PS to perform higher-level algorithms if required. VDMA also provides the buffering required for frame-rate and resolution changes. Understanding how VDMA stores pixel data within the frame buffers is critical if the image-processing pipeline is to work as desired when configured.

 

One of the major points of confusion when implementing VDMA-based solutions centers around the definition of the frame size within memory. The frame buffer is defined in memory by three parameters: Horizontal Size (HSize), Vertical Size (VSize), and Stride. The two parameters that define the horizontal size of the image are the HSize and the Stride. Like VSize, which defines the number of lines in the image, the HSize defines the length of each line. However, instead of being measured in pixels, the horizontal size is measured in bytes. We therefore need to know how many bytes make up each pixel.

 

The Stride defines the distance between the start of one line and another. To gain efficient use of the DDR memory, the Stride should at least equal the horizontal size. Increasing the Stride introduces a gap between lines. Implementing this gap can be very useful when verifying that the imager data is received correctly because it provides a clear indication of when a line of the image starts and ends with memory.
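A small C sketch of the arithmetic just described, assuming a 1280x720 frame of 32-bit pixels (example numbers only), shows how HSize, VSize, and Stride locate any pixel in the frame buffer:

    #include <stdint.h>

    #define BYTES_PER_PIXEL 4u
    #define H_SIZE (1280u * BYTES_PER_PIXEL)  /* line length in bytes (0x1400) */
    #define V_SIZE 720u                       /* number of lines */
    #define STRIDE 0x1800u                    /* bytes between line starts; >= H_SIZE,
                                                 leaving a visible gap between lines */

    /* Address of pixel (x, y) inside the VDMA frame buffer */
    static inline uint32_t *pixel_addr(uint8_t *frame_base, uint32_t x, uint32_t y)
    {
        return (uint32_t *)(frame_base + (uint64_t)y * STRIDE
                                       + (uint64_t)x * BYTES_PER_PIXEL);
    }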

 

These six simple techniques have helped me considerably when creating image-processing examples for this blog and solutions for clients, and they significantly ease both the creation and commissioning of designs.

 

As I said, this is my last blog of the year. We will continue this series in the New Year. Until then I wish you all happy holidays.

 

 

 

You can find the example source code on GitHub.

 

 

Adam Taylor’s Web site is http://adiuvoengineering.com/.

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

First Year E Book here

 

First Year Hardback here.

 

  

 

MicroZed Chronicles hardcopy.jpg 

 

 

 

Second Year E Book here

 

Second Year Hardback here

 

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

 

 

The European Tulipp (Towards Ubiquitous Low-power Image Processing Platforms) project has just published a very short, 102-second video that graphically demonstrates the advantages of FPGA-accelerated video processing over CPU-based processing. In this demo, a Sundance EMC²-DP board based on a Xilinx Zynq Z-7030 SoC accepts real-time video from a 720p camera and applies filters to the video before displaying the images on an HDMI screen.

 

Here are the performance comparisons for the real-time video processing:

 

  • CPU-based grayscale filter: ~15fps
  • FPGA-accelerated grayscale filter: ~90fps (6x speedup)
  • CPU-based Sobel filter: ~2fps
  • FPGA-accelerated Sobel filter: ~90fps (45x speedup)

 

 

The video rotates through the various filters so that you can see the effects.
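For a sense of the per-pixel work behind the demo’s grayscale filter, here’s the standard BT.601 luma computation in plain C. This is the kind of embarrassingly parallel loop that the FPGA fabric accelerates; the code is a generic sketch, not the Tulipp project’s implementation:

    #include <stddef.h>
    #include <stdint.h>

    /* Convert packed RGB888 to 8-bit grayscale using integer-scaled
       BT.601 weights: 0.299R + 0.587G + 0.114B (scaled by 256). */
    void grayscale(const uint8_t *rgb, uint8_t *gray, size_t pixels)
    {
        for (size_t i = 0; i < pixels; i++) {
            uint32_t r = rgb[3 * i + 0];
            uint32_t g = rgb[3 * i + 1];
            uint32_t b = rgb[3 * i + 2];
            gray[i] = (uint8_t)((77 * r + 150 * g + 29 * b) >> 8);
        }
    }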

 

 

 

 

 

TSN (time-sensitive networking) is a set of evolving IEEE standards that support a mix of deterministic, real-time, and best-effort traffic over fast Ethernet connections. The TSN set of standards is becoming increasingly important in many industrial networking situations, particularly for the IIoT (Industrial Internet of Things). SoC-e has developed TSN IP that you can instantiate in Xilinx All Programmable devices. (Because the standards are still evolving, implementing the TSN hardware in reprogrammable logic is a good idea.)

 

In particular, the company offers the Multiport TSN Switch (MTSN) IP core, which provides precise time synchronization of network nodes using synchronized, distributed local clocks with a reference, plus IEEE 802.1Qbv enhanced traffic scheduling. You can currently instantiate the SoC-e core on all of the Xilinx 7 series devices (the Zynq SoC and Spartan-7, Artix-7, Kintex-7, and Virtex-7 FPGAs), Virtex and Kintex UltraScale devices, and all UltraScale+ devices (the Zynq UltraScale+ MPSoCs and Virtex and Kintex UltraScale+ FPGAs).

 

Here’s a short, three-and-a-half-minute video explaining TSN and the SoC-e MTSN IP:

 

 

 

 

Tincy YOLO: a real-time, low-latency, low-power object detection system running on a Zynq UltraScale+ MPSoC


 

Last week at the NIPS 2017 conference in Long Beach, California, a Xilinx team demonstrated a live object-detection implementation of a YOLO—“you only look once”—network called Tincy YOLO (pronounced “teensy YOLO”) running on a Xilinx Zynq UltraScale+ MPSoC. Tincy YOLO combines reduced precision, pruning, and FPGA-based hardware acceleration to speed network performance by 160x, resulting in a YOLO network capable of operating on video frames at 16fps while dissipating a mere 6W.

 

 

Figure 5.jpg 

 

Live demo of Tincy YOLO at NIPS 2017. Photo credit: Dan Isaacs

 

 

 

Here’s a description of that demo:

 

 

 

 

TincyYOLO: a real-time, low-latency, low-power object detection system running on a Zynq UltraScale+ MPSoC

 

 

By Michaela Blott, Principal Engineer, Xilinx

 

 

The Tincy YOLO demonstration shows real-time, low-latency, low-power object detection running on a Zynq UltraScale+ MPSoC device. In object detection, the challenge is to identify objects of interest within a scene and to draw bounding boxes around them, as shown in Figure 1. Object detection is useful in many areas, particularly in advanced driver assistance systems (ADAS) and autonomous vehicles where systems need to automatically detect hazards and to take the right course of action. Tincy YOLO leverages the “you only look once” (YOLO) algorithm, which delivers state-of-the-art object detection. Tincy YOLO is based on the Tiny YOLO convolutional network, which is based on the Darknet reference network. Tincy YOLO has been optimized through heavy quantization and modification to fit into the Zynq UltraScale+ MPSoC’s PL (programmable logic) and Arm Cortex-A53 processor cores to produce the final, real-time demo.

 

 

Figure 1.jpg 

 

Figure 1: YOLO-recognized people with bounding boxes

 

 

 

To appreciate the computational challenge posed by Tiny YOLO, note that it takes 7 billion floating-point operations to process a single frame. Before you can conquer this computational challenge on an embedded platform, you need to pull many levers. Luckily, the all-programmable Zynq UltraScale+ MPSoC platform provides many levers to pull. Figure 2 summarizes the versatile and heterogeneous architectural options of the Zynq platform.

 

 

Figure 2.jpg 

 

Figure 2: Tincy YOLO Platform Overview

 

 

 

The vanilla Darknet open-source neural network framework is optimized for CUDA acceleration but its generic, single-threaded processing option can target any C-programmable CPU. Compiling Darknet for the embedded Arm processors in the Zynq UltraScale+ MPSoC left us with a sobering performance of one recognized frame every 10 seconds. That’s about two orders of magnitude of performance away from a useful ADAS implementation. It also produces a very limited live-video experience.

 

To create Tincy YOLO, we leveraged several of the Zynq UltraScale+ MPSoC’s architectural features in steps, as shown in Figure 3. Our first major move was to quantize the computation of the network’s twelve inner (aka hidden) layers by giving them binary weights and 3-bit activations. We then pruned this network to reduce the total operations to 4.5 GOPs/frame.
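To see why binary weights help so much, consider the inner loop of a convolution: with weights restricted to {-1, +1}, each multiply-accumulate collapses into a conditional add or subtract of the activation. Here’s a conceptual C sketch (illustrative only; FINN’s actual hardware datapath is more elaborate):

    #include <stddef.h>
    #include <stdint.h>

    /* Dot product with binary weights packed as sign bits: weight bit 1
       means +1, bit 0 means -1. Activations are small unsigned values
       (3-bit in Tincy YOLO), so no multiplier is needed at all. */
    int32_t binary_weight_dot(const uint8_t *act, const uint32_t *wbits, size_t n)
    {
        int32_t acc = 0;
        for (size_t i = 0; i < n; i++) {
            int bit = (wbits[i / 32] >> (i % 32)) & 1; /* packed sign bit */
            acc += bit ? (int32_t)act[i] : -(int32_t)act[i];
        }
        return acc;
    }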

 

 

 

Figure 3.jpg 

 

Figure 3: Steps used to achieve a 160x speedup of the Tiny YOLO network

 

 

 

We created a reduced-precision accelerator using a variant of the FINN BNN library (https://github.com/Xilinx/BNN-PYNQ) to offload the quantized layers into the Zynq UltraScale+ MPSoC’s PL. These layers account for more than 97% of all the computation within the network. Moving the computations for these layers into hardware bought us a 30x speedup of their specific execution, which translated into an 11x speedup within the overall application context, bringing the network’s performance up to 1.1fps.

 

We tackled the remaining outer layers by exploiting the NEON SIMD vector capabilities built into the Zynq UltraScale+ MPSoC’s Arm Cortex-A53 processor cores, which gained another 2.2x speedup. Then we cracked down on the complexity of the initial convolution using maxpool elimination for another 2.2x speedup. This work raised the frame rate to 5.5fps. A final re-write of the network inference to parallelize the CPU computations across all four of the Zynq UltraScale+ MPSoC’s Arm Cortex-A53 processor cores delivered video performance at 16fps.

 

The result of these changes appears in Figure 4, which demonstrates better recognition accuracy than Tiny YOLO.

 

 

 

Figure 4.jpg 

 

Figure 4: Tincy YOLO results

 

 

 

 

 

An article titled “Living on the Edge” by Farhad Fallah, one of Aldec’s Application Engineers, on the New Electronics Web site recently caught my eye because it succinctly describes why FPGAs are so darn useful for many high-performance, edge-computing applications. Here’s an example from the article:

 

“The benefits of Cloud Computing are many-fold… However, there are a few disadvantages to the cloud too, the biggest of which is that no provider can guarantee 100% availability.”

 

There’s always going to be some delay when you ship data to the cloud for processing. You will need to wait for the answer. The article continues:

 

“Edge processing needs to be high-performance and in this respect an FPGA can perform several different tasks in parallel.”

 

The article then continues to describe a 4-camera ADAS demo based on Aldec’s TySOM-2-7Z100 prototyping board that was shown at this year’s Embedded Vision Summit held in Santa Clara, California. (The TySOM-2-7Z100 proto board is based on the Xilinx Zynq Z-7100 SoC—the largest member of the Zynq SoC family.)

 

 

 

 

Aldec TySOM-2-Z100 Prototyping Board.jpg 

 

Aldec’s TySOM-2-7Z100 prototyping board

 

 

 

Then the article describes the significant performance boost that the Zynq SoC’s FPGA fabric provides:

 

“The processing was shared between a dual-core ARM Cortex-A9 processor and FPGA logic (both of which reside within the Zynq device) and began with frame grabbing images from the cameras and applying an edge detection algorithm (‘edge’ here in the sense of physical edges, such as objects, lane markings etc.). This is a computational-intensive task because of the pixel-level computations being applied (i.e. more than 2 million pixels). To perform this task on the ARM CPU a frame rate of only 3 per second could have been realized, whereas in the FPGA 27.5 fps was achieved.”

 

That’s nearly a 10x performance boost thanks to the on-chip FPGA fabric. Could your application benefit similarly?

 

 

 

 

The latest hypervisor to host Wind River’s VxWorks RTOS alongside Linux is the Xen Project Hypervisor, an open-source virtualization platform from the Linux Foundation. DornerWorks has released a version of the Xen Project Hypervisor called Virtuosity (the hypervisor formerly known as the Xen Zynq Distribution) that runs on the Arm Cortex-A53 processor cores in the Xilinx Zynq UltraScale+ MPSoC. Consequently, Wind River has partnered with DornerWorks to provide a Xen Project Hypervisor solution for VxWorks and Linux on the Xilinx Zynq UltraScale+ MPSoC ZCU102 eval kit.

 

Having VxWorks and Linux running on the same system allows developers to create hybrid software systems that combine the advantages of the two operating systems, with VxWorks managing mission-critical functions and Linux managing human-interface and cloud-connectivity functions.

 

Wind River has just published a blog about using VxWorks and Linux on the Arm Cortex-A53 processor, concisely titled “VxWorks on Xen on ARM Cortex A53,” written by Ka Kay Achacoso. The blog describes an example system with VxWorks running signal-processing and spectrum-analysis applications. Results are compiled into a JSON string and sent through the virtual network to Ubuntu. On Ubuntu, the Apache2 HTTP server sends the results to a browser, using Node.js and Chart.js to format the data display.

 

Here’s a block diagram of the system in the Wind River blog:

 

 

 

Wind River VxWorks and Linux Hybrid System.jpg 

 

VxWorks and Linux Hybrid OS System

 

 

 

VxWorks runs as a guest OS on top of the unmodified Virtuosity hypervisor.

 

 

 

For more information about DornerWorks’ Virtuosity hypervisor, see:

 

 

 

 

 

 

By Adam Taylor

 

Getting the best performance from our embedded-vision systems often requires that we capture frames individually for later analysis in addition to displaying them. Programs such as Octave, Matlab, or ImageJ can analyze these captured frames, allowing us to:

 

  • Compare the received pixel values against those expected for a test or calibration pattern.
  • Examine the image histogram, enabling histogram equalization to be implemented if necessary (see the sketch after this list).
  • Ensure that the integration time of the imager is set correctly for the scene type.
  • Examine the quality of the image sensor to identify defective pixels, for example dead or stuck-at pixels.
  • Determine the noise present in the image. This noise arises both from inherent imager noise sources, for example fixed-pattern noise, device noise, and dark current, and from system noise coupled in via power supplies and other sources of electrical noise in the system design.
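For the histogram check in particular, here is a minimal sketch, assuming the 8-bit pixel data for one channel already sits in a buffer:

#include <stddef.h>
#include <stdint.h>

/* Build a 256-bin histogram of one 8-bit image channel; a starting point
   for the histogram-equalization check listed above. */
void histogram_u8(const uint8_t *pixels, size_t count, uint32_t hist[256])
{
    for (int i = 0; i < 256; i++)
        hist[i] = 0;
    for (size_t i = 0; i < count; i++)
        hist[pixels[i]]++;
}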

 

Typically, this testing occurs in the lab as part of hardware design validation and is often performed before the higher levels of the application software are available. Such testing is usually implemented using a bare-metal approach on the processor system.

 

If we are using VDMA, the logical point to extract the captured data is from the frame buffer in the DDR SDRAM attached to the Zynq SoC’s or MPSoC’s PS. There are two methods we can use to examine the contents of this buffer:

 

  • Use the XSCT terminal to read out the frame buffer and post-process it using a TCL script.
  • Output the frame buffer over RS232 or Ethernet using the lightweight IP (lwIP) stack, then capture the image data in a terminal for post-processing using a TCL script.

 

For this example, I am going to use the UltraZed design we created a few weeks ago to examine PL-to-PS image transfers in the Zynq UltraScale+ MPSoC (see here). This design rather helpfully uses the test pattern generator to transfer a test image to a frame buffer in the PS-attached DDR SDRAM. In this example, we will extract the test pattern and convert it into a bitmap (BMP) file. Once we have the bitmap file, we can read it into the analysis program of choice.

 

BMP files are very simple. In the most basic format, they consist of a BMP header, a Device-Independent Bitmap (DIB) header, and the pixel array. In this example, the pixel array will consist of 24-bit pixels, using eight bits each for the blue, green, and red values.

 

It is important to remember two key facts when generating the pixel array. First, each line must be padded with zeros so that its length in bytes is a multiple of four, allowing 32-bit word access. Second, the BMP image is stored upside down in the array: the first line of the pixel array is the bottom line of the image.
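In C, these two rules reduce to a stride calculation and a reversed row index. A minimal sketch, assuming 24-bit pixels:

#include <stdint.h>

/* Bytes per BMP row: 3 bytes per 24-bit pixel, rounded up to a multiple
   of four so each row starts on a 32-bit boundary. */
static uint32_t bmp_stride(uint32_t width)
{
    return (width * 3u + 3u) & ~3u;
}

/* BMP rows are stored bottom-up: image row y (0 = top) begins at this
   byte offset within the pixel array. */
static uint32_t bmp_row_offset(uint32_t width, uint32_t height, uint32_t y)
{
    return (height - 1u - y) * bmp_stride(width);
}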

 

Combined, the two headers total 54 bytes and are structured as shown below:

 

 

 

Image1.jpg

 

Bitmap Header Construction

 

 

 

Image2.jpg 

 

DIB Header Construction
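Pulling the two header tables together, here is one way the 54-byte header might be declared in C. This is a sketch only: the field names are illustrative, and the packing pragma assumes a GCC-style toolchain.

#include <stdint.h>

/* 54-byte BMP + DIB (BITMAPINFOHEADER) header, matching the layout above.
   Packed so no padding creeps in between fields. */
#pragma pack(push, 1)
typedef struct {
    /* BMP file header: 14 bytes */
    uint16_t type;        /* 'BM' = 0x4D42                 */
    uint32_t file_size;   /* 54 + pixel-array size         */
    uint16_t reserved1;
    uint16_t reserved2;
    uint32_t offset;      /* 54: start of the pixel array  */
    /* DIB header (BITMAPINFOHEADER): 40 bytes */
    uint32_t dib_size;    /* 40                            */
    int32_t  width;       /* pixels                        */
    int32_t  height;      /* positive: stored bottom-up    */
    uint16_t planes;      /* 1                             */
    uint16_t bpp;         /* 24                            */
    uint32_t compression; /* 0 = BI_RGB (uncompressed)     */
    uint32_t image_size;  /* may be 0 for BI_RGB           */
    int32_t  x_ppm;       /* resolution, pixels per metre  */
    int32_t  y_ppm;
    uint32_t colors;      /* 0                             */
    uint32_t important;   /* 0                             */
} bmp_header_t;
#pragma pack(pop)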

 

 

 

Having understood what is involved in creating the file, all we need to do now is gather the pixel data from the PS-attached DDR SDRAM and output it in the correct format.

 

As we have done several times before in this blog, before we extract the pixel values it is a good idea to double-check that the frame buffer actually contains them. We can examine the contents of the frame buffer using the memory viewer in SDK. However, the view we choose determines how easily we can interpret the pixel values, and hence the frame, because of how the VDMA packs the pixels into the frame buffer.

 

The default view for the Memory viewer is to display 32-bit words as shown below:

 

 

 

Image3.jpg

 

TPG Test Pattern in memory

 

 

 

The data we are working with has a pixel width of 24 bits. To use the DDR SDRAM memory efficiently, the VDMA packs the 24-bit pixels into 32-bit words, splitting pixels across word boundaries. This can make things a little confusing when we check the memory contents for expected pixel values. Because we know the image is formatted as 8-bit RGB, a better view is to configure the memory display to list the contents in byte order; each group of three bytes then represents one pixel.
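The same byte-wise interpretation is easy to reproduce in code when sanity-checking the buffer. A sketch, in which FRAME_BUFFER_BASE and the component order are assumptions for illustration:

#include <stdint.h>

#define FRAME_BUFFER_BASE 0x10000000u  /* assumed VDMA frame-buffer address */

/* Walk the packed frame buffer as raw bytes: with 24-bit pixels, every
   three consecutive bytes form one pixel, regardless of how the VDMA
   packed them into 32-bit words. */
void check_first_line(uint32_t width)
{
    volatile uint8_t *fb = (volatile uint8_t *)FRAME_BUFFER_BASE;

    for (uint32_t p = 0; p < width; p++) {   /* first line only */
        uint8_t c0 = fb[3 * p + 0];
        uint8_t c1 = fb[3 * p + 1];
        uint8_t c2 = fb[3 * p + 2];
        /* compare c0, c1, c2 with the expected test-pattern values */
        (void)c0; (void)c1; (void)c2;
    }
}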

 

 

 

Image4.jpg

 

 

TPG Test Pattern in memory Byte View

 

 

 

 

Having confirmed that the frame buffer contains image data, I am going to output the BMP information over the RS232 port for this example. I have selected this interface because it is the simplest interface available on many development boards and it takes only a few seconds to read out even a large image.

 

The first thing I did in my SDK application was to create a structure that defines the header and sets the values as required for this example:

 

 

Image5.jpg 

 

Header Structure in the application

 

 

 

I then created three u8 arrays, each the size of the image, one for each color element. A simple loop combines these arrays with the header information to output the BMP data, taking care to use the correct format for the pixel array. A BMP pixel array orders each pixel’s elements as blue-green-red:

 

 

Image6.jpg 

 

Body of the Code to Output the Image
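The image above shows the body of the actual code. As a stand-in, here is a minimal sketch of such an output loop; the array names, outbyte() (the standalone BSP’s raw character output routine), and the ASCII-hex formatting are illustrative assumptions rather than a copy of the code shown above:

#include <stdint.h>  /* uint8_t corresponds to Xilinx's u8 typedef */

extern void outbyte(char c);  /* standalone BSP's raw UART output */

static const char hexdig[] = "0123456789abcdef";

/* Send one byte as two ASCII hex characters over the UART. */
static void send_hex_byte(uint8_t b)
{
    outbyte(hexdig[b >> 4]);
    outbyte(hexdig[b & 0x0F]);
}

/* Emit the 54-byte header, then the pixel array bottom-up in
   blue-green-red order, zero-padding each row to a 4-byte boundary. */
void output_bmp(const uint8_t *header, const uint8_t *red,
                const uint8_t *green, const uint8_t *blue,
                int width, int height)
{
    int pad = (4 - (width * 3) % 4) % 4;

    for (int i = 0; i < 54; i++)
        send_hex_byte(header[i]);

    for (int y = height - 1; y >= 0; y--) {      /* bottom row first */
        for (int x = 0; x < width; x++) {
            int idx = y * width + x;
            send_hex_byte(blue[idx]);            /* BMP order: B, G, R */
            send_hex_byte(green[idx]);
            send_hex_byte(red[idx]);
        }
        for (int p = 0; p < pad; p++)
            send_hex_byte(0);
    }
}

Sending hex pairs rather than raw bytes keeps the terminal log printable, which is what makes the conversion step described below straightforward.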

 

 

 

Wanting to keep the process automated and to avoid capturing the output by copy-and-paste, I used Putty as the terminal program to receive the output data. I selected Putty because it is capable of saving received data to a log file.

 

 

Image7.jpg 

 

Putty Configuration for logging

 

 

 

Of course, this log file contains an ASCII representation of the BMP. To view it, we need to convert it to a binary file containing the same values. I wrote a simple TCL script to do this; it reads in the ASCII file and writes out the binary BMP file.

 

 

Image8.jpg 

 

TCL ASCII to Binary Conversion Widget
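The script itself is TCL; purely to illustrate the conversion logic, the same idea looks like this in C (the file names are placeholders):

#include <stdio.h>

/* Convert a log file of ASCII hex pairs back into a binary BMP.
   File names are placeholders; error handling is kept minimal. */
int main(void)
{
    FILE *in  = fopen("capture.log", "r");
    FILE *out = fopen("capture.bmp", "wb");
    unsigned int byte;

    if (in == NULL || out == NULL)
        return 1;
    while (fscanf(in, "%2x", &byte) == 1)   /* two hex digits per byte */
        fputc((int)byte, out);
    fclose(in);
    fclose(out);
    return 0;
}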

 

 

 

With this complete, we have a BMP image that we can load into Octave, Matlab, or another tool for analysis. Below is an example of the tartan color-bar test pattern that I captured from the Zynq frame buffer using this method:

 

 

 

Image9.jpg

 

Generated BMP captured from the PS DDR

 

 

 

Now, if we can read from the frame buffer, it follows that we can use the same process to write a BMP image into the frame buffer. This can be especially useful when we want to generate overlays for use with the video mixer.

 

We will look at how we do this in a future blog.

 

 

 

You can find the example source code on GitHub.

 

 

Adam Taylor’s Web site is http://adiuvoengineering.com/.

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

First Year E-Book here

 

First Year Hardback here.

 

  

 

MicroZed Chronicles hardcopy.jpg 

 

 

Second Year E-Book here

 

Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

About the Author
Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.