With last week’s introduction of the Digilent Arty Z7 Zynq SoC training and dev board, I felt it was time to review some of the low-cost boards that occupy an outsized piece of mindshare in my head. So here are seven that have appeared previously in the Xcell Daily blog, listed in alphabetical order by vendor:
Analog Devices ADALM-PLUTO ($149): Students from all levels and backgrounds looking to improve their RF knowledge will want to take a look at the new ADALM-PLUTO SDR USB Learning Module from Analog Devices. The $149 USB module has an RF range of 325MHz to 3.8GHz with separate transmit and receive channels and 20MHz of instantaneous bandwidth. It pairs two devices that seem made for each other: an Analog Devices AD9363 Agile RF Transceiver and a Xilinx Zynq Z-7010 SoC.
Analog Devices’ $149 ADALM-PLUTO SDR USB Learning Module
Digilent ARTY Z7 ($149 to $209): The first thing you’ll note from the Arty Z7 dev board photo is that there’s a Zynq SoC in the middle of the board. You’ll also see the board’s USB, Ethernet, Pmod, and HDMI ports. On the left, you can see double rows of tenth-inch headers in an Arduino/chipKIT shield configuration. There are a lot of ways to connect to this board, which should make it a student’s or experimenter’s dream board considering what you can do with a Zynq SoC.
Digilent Arty Z7 dev board for makers and hobbyists
Digilent PYNQ-Z1 ($229): PYNQ is an open-source project that makes it easy for you to design embedded systems based on the Xilinx Zynq-7000 SoC using the Python language, associated libraries, and the Jupyter Notebook, a pretty nice, collaborative learning and development environment for many programming languages including Python. PYNQ allows you to exploit the benefits of programmable logic used together with microprocessors to build more capable embedded systems with superior performance on embedded tasks.
Digilent PYNQ-Z1 Dev Board
Krtkl’s snickerdoodle ($95 to $195): The amazing, Zynq-based “snickerdoodle one” is a low-cost, single-board computer with wireless capability based on the Xilinx Zynq Z-7010 SoC, available for purchase on the Crowd Supply crowdfunding Web site.
Krtkl’s Zynq-based, WiFi-enabled Snickerdoodle Dev Board
National Instruments myRIO ($400 to $1001): The NI myRIO hardware/software development platform for NI’s LabVIEW system design software is based on the Zynq-7010 All Programmable SoC. About the size of a small paperback book (so that it easily drops into a backpack), the NI myRIO sports ten analog inputs, six analog outputs, left and right audio channels, 40 digital I/O lines (SPI, I2C, UART, PWM, rotary encoder), an on-board 3-axis accelerometer, and two 34-pin expansion headers.
The Zynq-based NI myRIO
National Instruments RoboRIO ($435 for FRC teams): The NI roboRIO robotic controller was specifically designed for the FIRST Robotics Competition (FRC). The FRC event is a particular passion for NI’s founder, Dr. James Truchard.
NI roboRIO for First Robotics Competition teams
Trenz ZynqBerry (€79.00 to €109.00): The Trenz Electronic TE0726 ZynqBerry Dev Board puts a Xilinx Zynq Z-7010 or Z-7020 SoC into a Raspberry-Pi-compatible form factor with 64Mbytes of LPDDR2 SDRAM, four USB ports (in a hub configuration), a 100Mbps Ethernet port, an HDMI port, MIPI DSI and CSI-2 connectors, a PWM digital audio jack, and 128Mbits of Flash memory for configuration and operation.
TE0726 ZynqBerry Dev Board from Trenz
By Adam Taylor
A couple of weeks ago, I talked about the Xilinx reVISION stack and the support it provides for OpenVX and OpenCV. One of the most exciting things I explained was how we can accelerate several OpenCV functions (including the OpenVX Core functions) using the Zynq SoC’s programmable logic. What I did not look at was the other significant part of the reVISION stack: its support for machine learning.
Machine learning is increasingly important for embedded-vision applications because it helps systems evolve from being vision-enabled to being vision-guided autonomous systems. Machine learning is often used in embedded-vision applications to identify and classify information contained within an image. The embedded-vision system uses these identifications and classifications to make informed decisions in real time, enabling increased interaction with the environment.
For those unfamiliar with machine learning, it is most often implemented through the creation and training of a neural network. Neural networks are modelled on the human cerebral cortex in that each neuron receives an input, processes it, and communicates the processed signal to another neuron. Neural networks typically consist of an input layer, internal (hidden) layer(s), and an output layer.
Those familiar with machine learning may have come across the term “deep learning.” This is where there are several hidden layers in the neural network, allowing more complex machine-learning algorithms to be implemented.
When working with neural networks in embedded-vision applications, we need to use a 2D network. This is where Convolutional Neural Networks (CNNs) are used. CNNs are deep-learning networks that contain several convolutional and sub-sampling layers along with a separate, fully connected network to perform the final classification. Within the convolution layer, the input image will be broken down into several overlapping smaller tiles.
The results from this convolution layer are used to create an activation map via an activation layer, which is followed by further sub-sampling and additional convolution stages before the final, fully connected network. The exact implementation of the CNN varies depending upon the network architecture implemented (GoogLeNet, SSD, AlexNet). However, a CNN will typically contain at least convolutional, activation, sub-sampling (pooling), and fully connected layers.
The weights used for each of these elements are determined via training, and one of the CNN’s advantages is the relative ease of training the network. Training requires large data sets and high-performance computers to correctly determine the weights for each stage.
To ease the development of machine-learning applications, many engineers use a framework like Caffe, which supports the implementation and training of machine learning. The use of frameworks allows us to work at a higher level and maximize reuse. Using a framework, we don’t need to start from scratch each time we develop an application.
The Xilinx reVISION stack provides an integrated Caffe framework flow, which allows us to take the prototxt definition of the network and the trained weights and deploy the machine-learning application. (Note that network training is separate and distinct from deployment.) To enable this, the Xilinx reVISION stack provides several hardware-accelerated functions that can be implemented within the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL (programmable logic) to create the machine-learning inference engine. The reVISION stack also provides examples for a wide range of network structures, enabling us to get up and running with our machine-learning application without the need to initially compile the PL design. Once we are happy with the machine-learning application, we can then use the SDSoC flow to develop our own embedded-vision application containing the optimized machine-learning application.
Using the Zynq PL provides for an optimal implementation that delivers faster response times when interacting with the embedded-vision system environment. This is especially true as machine-learning applications are increasingly implemented using fixed-point integers like INT8, which are ideal for implementation in DSP elements.
Machine learning is going to be a hot area for several applications. So I will be coming back to this topic in detail as the MicroZed Chronicles progress—with some examples of course.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
The multi-GHz processing capabilities of Xilinx FPGAs never fail to amaze me, and the following video from National Instruments (NI) demonstrating the real-time signal-generation and analysis capabilities of the NI PXIe-5840 VST (Vector Signal Transceiver) is merely one more proof point. The NI VST is designed for use in a wide range of RF test systems including 5G and IoT RF applications, ultra-wideband radar prototyping, and RFIC testing. In the demo below, this 2nd-generation NI VST is generating an RF signal spanning 1.2GHz to 2.2GHz (1GHz of analog bandwidth) containing five equally spaced LTE channels. The analyzer portion of the VST is simultaneously and in real time demodulating and decoding the signal constellations in two of the five LTE channels.
The resulting analysis screen generated by NI’s LabVIEW software tells the story:
The reason that the NI PXIe-5840 VST can perform all of these feats in real time is that there’s a Xilinx Virtex-7 690T FPGA inside pulling the levers. (NI’s 1st-generation VSTs employed Xilinx Virtex-6 FPGAs.)
Here's the 2-minute video of the NI VST demo:
Please contact National Instruments directly for more information on its VST family.
For additional blogs about NI’s line of VSTs, see:
If you’re going to strap two 12Gsamples/sec, 16-bit DACs and two 6.4Gsamples/sec, 12-bit ADCs into your VPX/AMC module, you’d better include massive real-time DSP horsepower to tame them. That’s exactly what VadaTech has done with its VPX599 and AMC599 modules by placing a Xilinx Kintex UltraScale KU115 FPGA (along with 16 or 20Gbytes of high-speed DDR4 SDRAM) on the modules’ digital carrier board mated to an FMC analog converter board.
VadaTech AMC599 ADC/DAC Module
VadaTech VPX599 ADC/DAC Module
Here’s a block diagram of the AMC599 module (the VPX599 block diagram is quite similar):
VadaTech AMC599 ADC/DAC Module Block Diagram
At these conversion rates, raw data streams to and from the host CPU are quite impractical, so you must, repeat must, have on-board processing and local storage—and what other processing genie besides a Xilinx UltraScale FPGA would you trust to handle and process those sorts of extreme streams?
Please contact VadaTech directly for more information on the VPX599 and AMC599 modules.
The just-announced VICO-4 TICO SDI Converter from Village Island employs visually lossless 4:1 TICO compression to funnel a 4K60p video (on four 3G-SDI video streams or one 12G-SDI stream) onto a single 3G-SDI output stream, which reduces infrastructure costs for transport, cabling, routing, and compression in broadcast networks.
VICO-4 4:1 SDI Converter from Village Island
Here’s a block diagram of what’s going on inside of Village Island’s VICO-4 TICO SDI Converter:
And here’s a diagram showing you what broadcasters can do with this sort of box:
The reason this is even possible in a real-time broadcast environment is that the lightweight intoPIX TICO compression algorithm has very low latency (just a few video lines) when implemented in hardware as IP. (Software-based, frame-by-frame video compression is totally out of the question in an application such as this; it introduces too much delay.)
Looking at the VICO-4’s main (and only) circuit board shows one main chip implementing the 4:1 compression and signal multiplexing. And that chip is… a Xilinx Kintex UltraScale KU035 FPGA. It has plenty of on-chip programmable logic for the TICO compression IP and it has sixteen 16.3Gbps transceiver ports—more than enough to handle the 3G- and 12G-SDI I/O required by this application.
Note: Paltek in Japan is distributing Village Island’s VICO-4 board in Japan as an OEM component. The board needs 12Vdc at ~25VA.
For more information about TICO compression IP, see:
A configurable, COG (center-of-gravity), laser-line extraction algorithm allows VRmagic’s LineCam3D to resolve complex surface contours with 1/64 sub-pixel accuracy. (The actual measurement precision, which can be as small as a micrometer, depends on the optics attached to the camera.) The camera must process the captured video internally because, at its maximum 1KHz scan rate, there would be far more raw contour data than can be pumped over the camera’s GigE Vision interface. The algorithm therefore runs in real time on the camera’s internal Xilinx 7 series FPGA, which is paired with a TI DaVinci SoC and 2Gbytes of DDR3 SDRAM to handle other processing chores. The camera’s imager is a 2048x1088-pixel CMOSIS CMV2000 CMOS image sensor with a pipelined global shutter. The VRmagic LineCam3D also has a 2D imaging mode that permits the extraction of additional object information such as surface printing that would not appear on the contour scans (as demonstrated in the photo below).
Here’s a composite photo of the camera’s line-scan contour output (upper left), the original object being scanned (lower left), and the image of the object constructed from the contour scans (right):
In laser-triangulation measurement setups, the camera’s lens plane is not parallel to the scanned object’s image plane, which means that only a relatively small part of the laser-scanned image would normally be in focus due to limited depth of focus. To compensate for this, the LineCam3D integrates a 10° tilt-shift adapter into its rugged IP65/67 aluminum housing, to expand the maximum in-focus object height. Anyone familiar with photographic tilt-shift lenses—mainly used for architectural photography in the non-industrial world—immediately recognizes this as the Scheimpflug principle, which increases depth of focus by tilting the lens relative to both the imager plane and the subject plane. It’s fascinating that this industrial camera incorporates this ability into the camera body so that any C-mount lens can be used as a tilt-shift lens.
For more information about the LineCam3D camera, please contact VRmagic directly.
There’s a new line in the table for Spartan-7 FPGAs in the FPGA selection guide on page 2 showing an expanded-temperature range option of -40 to +125°C for all six family members. These are “Expanded Temp” -1Q devices. So if you have the need for extreme hi-temp (or low-temp) operation, you might want to check into these devices. Ask your friendly neighborhood Xilinx or Avnet sales rep.
For more information about the Spartan-7 FPGA product family, see:
National Instruments’ (NI’s) VirtualBench All-in-One Instrument, based on the Xilinx Zynq Z-7020 SoC, combines a mixed-signal oscilloscope with protocol analysis, an arbitrary waveform generator, a digital multimeter, a programmable DC power supply, and digital I/O. The PC- or tablet-based user-interface software allows you to make all of those instruments play together as a troubleshooting symphony. That point is made very apparent in this new 3-minute video demonstrating the speed at which you can troubleshoot circuits using all of the VirtualBench’s capabilities in concert:
For more Xcell Daily blog posts about the NI VirtualBench All-in-One instrument, see:
For more information about the VirtualBench, please contact NI directly.
I’ve known this was coming for more than a week, but last night I got double what I expected. Digilent’s Web site has been teasing the new Arty Z7 Zynq SoC dev board for makers and hobbyists for a week—but with no listed price. Last night, prices appeared. That’s right, there are two versions of the board available:
Digilent Arty Z7 dev board for makers and hobbyists
Other than that, the board specs appear identical.
The first thing you’ll note from the photo is that there’s a Zynq SoC in the middle of the board. You’ll also see the board’s USB, Ethernet, Pmod, and HDMI ports. On the left, you can see double rows of tenth-inch headers in an Arduino/chipKIT shield configuration. There are a lot of ways to connect to this board, which should make it a student’s or experimenter’s dream board considering what you can do with a Zynq SoC. (In case you don’t know, there’s a dual-core ARM Cortex-A9 MPCore processor on the chip along with a hearty serving of FPGA fabric.)
Oh yeah. The Xilinx Vivado HL Design Suite WebPACK tools? Those are available at no cost. (So is Digilent’s attractive cardboard packaging, according to the Arty Z7 Web page.)
Although the Arty Z7 board has now appeared on Digilent’s Web site, the product’s Web page says the expected release date is March 27. That’s five whole days away!
As they say, operators are standing by.
Please contact Digilent directly for more Arty Z7 details.
The organizers of last week’s Embedded World show in Nuremberg gave out embedded AWARDS in three categories during the show, and MathWorks’ HDL Coder won in the tools category. (See the announcement here.) If you don’t know about this unique development tool, now is a good time to become acquainted with it. HDL Coder accepts model-based designs created using MathWorks’ MATLAB and Simulink and can generate VHDL or Verilog for all-hardware designs, or hardware and software code for designs based on a mix of custom hardware and embedded software running on a processor. That means that HDL Coder works well with Xilinx FPGAs and Zynq SoCs.
Here’s a diagram of what HDL Coder does:
You might also want to watch this detailed MathWorks video titled “Accelerate Design Space Exploration Using HDL Coder Optimizations.” (Email registration required.)
For more information about using MathWorks HDL Coder to target your designs for Xilinx devices, see:
Sundance PXIe700 module based on a Xilinx Kintex-7 FPGA
Here’s a block diagram of the Sundance PXIe700 module:
Sundance PXIe700 module based on a Xilinx Kintex-7 FPGA, Block Diagram
Sundance provides this board with the SCom IP Core, which communicates with the host through the PCIe interface and provides the user logic instantiated in the Kintex-7 FPGA with a multichannel streaming interface to the host CPU, along with sample applications. Other IP cores, a Windows driver, a DLL, and user-interface software are also available. The PXIe700 data sheet also mentions a VideoGuru toolset that can turn this hardware into a video test center for NTSC, VGA, DVI, SMPTE, GigE-Vision, and other video standards. (Contact Sundance DSP for more details.)
The Sundance product page also shows the PXIe700 board with a couple of Sundance FMC modules attached as example applications:
Sundance PXIe700 with attached FMC-DAQ2p5 multi-Gsample/sec ADC and DAC card
Sundance PXIe700 with attached FMC-ADC500-5 5-channel, 500Msamples/sec, 16-bit ADC card
You want to learn how to design with and use RF, right? Students from all levels and backgrounds looking to improve their RF knowledge will want to take a look at the new ADALM-PLUTO SDR USB Learning Module from Analog Devices. The $149 USB module has an RF range of 325MHz to 3.8GHz with separate transmit and receive channels and 20MHz of instantaneous bandwidth. It pairs two devices that seem made for each other: an Analog Devices AD9363 Agile RF Transceiver and a Xilinx Zynq Z-7010 SoC.
Analog Devices’ $149 ADALM-PLUTO SDR USB Learning Module
Here’s an extremely simplified block diagram of the module:
Analog Devices’ ADALM-PLUTO SDR USB Learning Module Block Diagram
However, the learning module’s hardware is of little use without training material, and Analog Devices has already created dozens of online tutorials and teaching materials for this device, covering ADS-B aircraft-position reception, NOAA and Meteor-M2 weather-satellite imagery, GSM analysis, listening to TETRA signals, and pager decoding.
By Adam Taylor
So far, we have addressed how to communicate, control, and read conversions from the Sysmon block in the Zynq UltraScale+ MPSoC’s PS (processing system). However, we have not addressed the Sysmon in the Zynq UltraScale+ MPSoC’s PL (programmable logic), which provides us the ability to monitor external signals.
As I mentioned in my first blog looking at the Zynq UltraScale+ MPSoC’s AMS capabilities, the first thing we need to do to use the PL Sysmon is check that it is available. We do this by reading the AMS block’s combined status register, named PL_SYSMON_CONTROL_STATUS, which resides at address 0xFFA50044. Only the LSB of this register is used. If that bit is set high, the PL Sysmon is available; otherwise, we should not try to address it.
We can access this register using a simple call to read the address as below:
pl_status = Xil_In32(0xFFA50044); // pl_status is a u32
Once we have confirmed availability, we can use the same approach to configure the PL Sysmon that we used in the previous blog to configure the PS Sysmon. The only difference is that instead of the XSYSMON_PS block identifier, which says we wish to use the PS Sysmon, we need to use the XSYSMON_PL block identifier. This block identifier establishes the base address of the Sysmon we wish to use.
We can select the Sysmon channel we wish to monitor using the sequencer masks available within xsysmonpsu_hw.h. Most of these sequencer channels are named for their channel (e.g. VP_VN) because they are dedicated to one Sysmon or the other. The supply channels are not so named, so when addressing either the PL or PS Sysmon we need to understand which voltage parameter we are monitoring. The table below shows the monitored parameter for both the PS and PL channels.
What is very helpful is the ability of the PL Sysmon to monitor the PS core voltages. This is especially of interest for mission-critical applications. We will talk more about this in future blogs.
Putting this all together in a simple application that we run on the Zynq UltraScale+ MPSoC’s ARM Cortex-A53 cores (the ARM Cortex-R5 cores can do the same), we can see the following results when reading both the PL and PS Sysmons:
Mapping the supplies above against the Zynq UltraScale+ MPSoC’s data sheet and IOCC user guide, the voltages for the I/O banks show that both Sysmons are reading the correct values.
At this point, we have addressed one or the other of the two Sysmon blocks using its block identifier. However, there is a third block identifier, XSYSMON_AMS, which addresses the AMS block itself. Within the AMS block, we can read back the values of several other system-critical voltages: the five Zynq UltraScale+ MPSoC PS PLL voltages, the DDR PLL voltage, the battery voltage, and the VCC INT, BRAM, and AUX voltages.
We can address these voltages within the AMS block using the correct channel number and the “get ADC data” function as shown below:
XSysMonPsu_GetAdcData(InstancePtr, XSM_CH_VCC_PSLL0, XSYSMON_AMS);
When we are working with these channel numbers, it can get confusing as to which block identifier we should be using. It is worth remembering that channels 0-6, 8-10, and 13-37 are addressed by the PL or PS Sysmon block identifiers, while channels 38-53 are addressed only by the AMS identifier.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
I did not go to Embedded World in Nuremberg this week but apparently SemiWiki’s Bernard Murphy was there and he’s published his observations about three Zynq-based reference designs that he saw running in Aldec’s booth on the company’s Zynq-based TySOM embedded dev and prototyping boards.
Aldec TySOM-2 Embedded Prototyping Board
Murphy published this article titled “Aldec Swings for the Fences” on SemiWiki and wrote:
“At the show, Aldec provided insight into using the solution to model the ARM core running in QEMU, together with a MIPI CSI-2 solution running in the FPGA. But Aldec didn’t stop there. They also showed off three reference designs designed using this flow and built on their TySOM boards.
“The first reference design targets multi-camera surround view for ADAS (automotive – advanced driver assistance systems). Camera inputs come from four First Sensor Blue Eagle systems, which must be processed simultaneously in real-time. A lot of this is handled in software running on the Zynq ARM cores but the computationally-intensive work, including edge detection, colorspace conversion and frame-merging, is handled in the FPGA. ADAS is one of the hottest areas in the market and likely to get hotter since Intel just acquired Mobileye.
“The next reference design targets IoT gateways – also hot. Cloud interface, through protocols like MQTT, is handled by the processors. The gateway supports connection to edge devices using wireless and wired protocols including Bluetooth, ZigBee, Wi-Fi and USB.
“Face detection for building security, device access and identifying evil-doers is also growing fast. The third reference design is targeted at this application, using similar capabilities to those on the ADAS board, but here managing real-time streaming video as 1280x720 at 30 frames per second, from an HDR-CMOS image sensor.”
The article contains a photo of the Aldec TySOM-2 Embedded Prototyping Board, which is based on a Xilinx Zynq Z-7045 SoC. According to Murphy, Aldec developed the reference designs using its own and other design tools including the Aldec Riviera-PRO simulator and QEMU. (For more information about the Zynq-specific QEMU processor emulator, see “The Xilinx version of QEMU handles ARM Cortex-A53, Cortex-R5, Cortex-A9, and MicroBlaze.”)
Then Murphy wrote this:
“So yes, Aldec put together a solution combining their simulator with QEMU emulation and perhaps that wouldn’t justify a technical paper in DVCon. But business-wise they look like they are starting on a much bigger path. They’re enabling FPGA-based system prototype and build in some of the hottest areas in systems today and they make these solutions affordable for design teams with much more constrained budgets than are available to the leaders in these fields.”
Digilent says that its new $199.99 Digital Discovery—a low-cost USB instrument that combines a 24-channel, 800Msamples/sec logic analyzer; a 16-bit, 100Msamples/sec digital pattern generator; and a 100mA power supply—“was created to be the ultimate embedded development companion.” Further, “Its features and specifications were deliberately chosen to maintain a small and portable form factor, withstand use in a variety of environments, and keep costs down, while balancing the requirements of operating on USB Power.”
(Note: Skip to the bottom of this blog for a limited-time offer. Then come back.)
Here’s a photo of the Digital Discovery:
Digilent’s Digital Discovery—a combined 24-channel, 800Msamples/sec logic analyzer and 16-bit, 100Msamples/sec digital pattern generator
If that form factor looks familiar, you’re probably reminded of the company’s Analog Discovery and Analog Discovery 2 100Msamples/sec USB DSO, logic analyzer, and power supply. (See “$279 Analog Discovery 2 DSO, logic analyzer, power supply, etc. relies on Spartan-6 for programmability, flexibility.”) And just like the Analog Discovery modules, the Digilent Digital Discovery is based on a Xilinx Spartan-6 FPGA (an LX25), which becomes pretty clear when you look at the product’s board photo. The Spartan-6 FPGA is right there in the center of the board:
Digilent’s Digital Discovery is based on a Xilinx Spartan-6 FPGA
When I write “based on,” what I mean to say is that Digilent’s IP in the Spartan-6 FPGA pretty much implements the entire low-cost instrument—as clearly shown in the block diagram:
Digilent’s Digital Discovery Module Block Diagram
And what does this $199.99 instrument do, considering that it’s implemented using a low-cost FPGA? Here are the specs (and pretty impressive specs they are):
*Note: to obtain speeds of 200MS/s and higher, the High Speed Adapter must be used.
So, you may have noted that asterisk on the logic analyzer's maximum sample rate. You need a Digital Discovery High Speed Adapter to attain the full 800Msamples/sec acquisition rate on the logic analyzer. Normally, that’s another $49.99. However, for the first 100 Digital Discovery buyers, Digilent is throwing in the high-speed adapter for free.
Operators are standing by.
AEye is the latest iteration of the eye-tracking technology developed by EyeTech Digital Systems. The AEye chip is based on the Zynq Z-7020 SoC and is located immediately adjacent to the imaging sensor, which makes for compact, stand-alone systems. This technology is finding its way into diverse vision-guided systems in the automotive, AR/VR, and medical-diagnostic arenas. According to EyeTech, the Zynq SoC’s unique abilities allow the company to create products it could not build any other way.
With the advent of the reVISION stack, EyeTech is looking to expand its product offerings into machine learning, as discussed in this short, 3-minute video:
For more information about EyeTech, see:
Next week at OFC 2017 in Los Angeles, Acacia Communications, Optelian, Precise-ITC, Spirent, and Xilinx will present the industry’s first interoperability demo supporting 200/400GbE connectivity over standardized OTN and DWDM. Putting that succinctly, the demo is all about packing more bits/λ, so that you can continue to use existing fiber instead of laying more.
Callite-C4 400GE/OTN Transponder IP from Precise-ITC, instantiated in a Xilinx Virtex UltraScale+ VU9P FPGA, will map native 200/400GbE traffic—generated by test equipment from Spirent—into 2x100 and 4x100 OTU4-encapsulated signals. The 200GbE and 400GbE standards are still in flux, so instantiating the Precise-ITC transponder IP in an FPGA allows the design to evolve quickly with the standards with no BOM or board changes. Concise translation: faster time to market with much less risk.
Callite-C4 400GE/OTN Transponder IP Block Diagram
Optelian’s TMX-2200 200G muxponder, scheduled for release later this year, will muxpond the OTU4 signals into 1x200Gbps or 2x200Gbps DP-16QAM using Acacia Communications’ CFP2-DCO coherent pluggable transceiver.
The Optelian and Precise-ITC exhibit booths at OFC 2017 are 4139 and 4141 respectively.
By Adam Taylor
In looking at the Zynq UltraScale+ MPSoC’s AMS capabilities so far, we have introduced the two slightly different Sysmon blocks residing within the Zynq UltraScale+ MPSoC’s PS (processing system) and PL (programmable logic). In this blog, I am going to demonstrate how we can get the PS Sysmon up and running on both the ARM Cortex-A53 and Cortex-R5 processor cores in the Zynq UltraScale+ MPSoC’s PS. There is little difference between the two processor types, but I think it is important to show you how to use both.
The process to use the Sysmon is the same as it is for many of the peripherals we have looked at previously with the MicroZed Chronicles:
The function names in parentheses are those we use to perform each desired operation, provided we pass the correct parameters. In the simplest case, as in this example, we can then poll the output registers using the XSysMonPsu_GetAdcData() function. All of these functions are defined within the file xsysmonpsu.h, which is available under the board support package’s Lib Src directory in SDK.
Examining the functions, you will notice that each of the functions used in steps 4 to 8 requires an input parameter called SysmonBlk. You must pass this parameter to the function; it is how we select which Sysmon (within the PS or the PL) we want to address. For this example, we will be specifying the PS Sysmon using XSYSMON_PS, which is also defined within xsysmonpsu.h. If we want to address the PL Sysmon, we use the XSYSMON_PL definition, which we will be looking at next time.
Another header file of use is xsysmonpsu_hw.h. Within this file, we can find the definitions required to correctly select the channels we wish to sample in the sequencer. These are defined in the format:
This simple example samples the following within the PS Sysmon:
We can use the conversion functions provided within xsysmonpsu.h to convert the raw value supplied by the ADC into a temperature or voltage. However, the PS I/O banks are capable of supporting 3.3V logic. As such, the conversion macro from raw reading to voltage is not correct for these I/O banks or for the HD banks in the PL. (We will look at the different I/O bank types in another blog.)
The full-scale voltage is 3V for most of the voltage conversions. However, in line with UG580 page 43, we need to use a full scale of 6V for the PS IO banks; otherwise we will see a value only half of what we expect for that bank’s supply voltage. With this in mind, my example contains a conversion function at the top of the source file for these IO banks, to ensure that we get the correct value.
The Zynq UltraScale+ MPSoC architecture permits both the APU (the ARM Cortex-A53 processors) and the RPU (the ARM Cortex-R5 processors) to address the Sysmon. To demonstrate this, the same file was used in applications first targeting an ARM Cortex-A53 processor in the APU and then targeting the ARM Cortex-R5 processor in the RPU. I used Core 0 in both cases.
The only difference between these two cases was the need to create new applications that select the core to be targeted and to update the FSBL to load the correct core. (See “Adam Taylor’s MicroZed Chronicles, Part 172: UltraZed Part 3—Saying hello world and First-Stage Boot” for more information on how to do this.)
Results when using the ARM Cortex-A53 Core 0 Processor
Results when using the ARM Cortex-R5 Core 0 Processor
When I ran the same code, which is available in the GitHub directory, I received the output shown above over the terminal program, demonstrating that it works on both the ARM Cortex-A53 and ARM Cortex-R5 cores.
Next time we will look at how we can use the PL Sysmon.
Code is available on GitHub as always.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
This week, EETimes’ Junko Yoshida published an article titled “Xilinx AI Engine Steers New Course” that gathers some comments from industry experts and from Xilinx with respect to Monday’s reVISION stack announcement. To recap, the Xilinx reVISION stack is a comprehensive suite of industry-standard resources for developing advanced embedded-vision systems based on machine learning and machine inference.
As Xilinx Senior Vice President of Corporate Strategy Steve Glaser tells Yoshida, “Xilinx designed the stack to ‘enable a much broader set of software and systems engineers, with little or no hardware design expertise, to develop intelligent vision-guided systems easier and faster.’”
“While talking to customers who have already begun developing machine-learning technologies, Xilinx identified ‘8 bit and below fixed point precision’ as the key to significantly improve efficiency in machine-learning inference systems.”
Yoshida also interviewed Karl Freund, Senior Analyst for HPC and Deep Learning at Moor Insights & Strategy, who said:
“Artificial Intelligence remains in its infancy, and rapid change is the only constant.” In this circumstance, Xilinx seeks “to ease the programming burden to enable designers to accelerate their applications as they experiment and deploy the best solutions as rapidly as possible in a highly competitive industry.”
She also quotes Loring Wirbel, a Senior Analyst at The Linley Group, who said:
“What’s interesting in Xilinx's software offering, [is that] this builds upon the original stack for cloud-based unsupervised inference, Reconfigurable Acceleration Stack, and expands inference capabilities to the network edge and embedded applications. One might say they took a backward approach versus the rest of the industry. But I see machine-learning product developers going a variety of directions in trained and inference subsystems. At this point, there's no right way or wrong way.”
There’s a lot more information in the EETimes article, so you might want to take a look for yourself.
Next week at the OFC Optical Networking and Communication Conference & Exhibition in Los Angeles, Xilinx will be in the Ethernet Alliance booth demonstrating the industry’s first standards-based, multi-vendor 400GE network. A 400GE MAC and PCS instantiated in a Xilinx Virtex UltraScale+ VU9P FPGA will be driving a Finisar 400GE CFP8 optical module, which in turn will communicate with a Spirent 400G test module over a fiber connection.
In addition, Xilinx will be demonstrating:
If you’re visiting OFC, be sure to stop by the Xilinx booth (#1809).
PLDA has announced the XpressRICH4-AXI PCIe 4.0 configurable IP block, which ties an on-chip AXI bus to PCIe 4.0. The IP block complies with the PCI Express Base 4.0r7 specification and supports endpoint, root-port, and dual-mode configurations. The IP supports Xilinx Virtex-7, Virtex UltraScale, and Kintex UltraScale devices and can be used for ASIC designs as well.
Here’s a block diagram of the core:
PLDA XpressRICH4-AXI PCIe 4.0 configurable IP Block Diagram
Please contact PLDA directly for more information about this IP.
This week at Embedded World in Nuremberg, Lynx Software Technologies is demonstrating its port of the LynxSecure Separation Kernel hypervisor to the ARM Cortex-A53 processors on the Xilinx Zynq UltraScale+ MPSoC. According to Robert Day, Vice President of Marketing at Lynx, "ARM designers are now able to run safety critical environments alongside a general purpose OS like Linux or LynxOS RTOS on the same Xilinx processor without compromising safety, security or real-time performance. Use cases include automotive systems based on environments such as AUTOSAR RTA-BSW from ETAS and avionics designs using LynxOS-178 RTOS from Lynx. Designers can match the security of air-gap hardware partitioning without incurring the cost, power and size overhead of separate hardware."
The LynxSecure port to the Zynq UltraScale+ MPSoC supports modular software architectures and tight integration with the Zynq UltraScale+ MPSoC’s FPGA fabric for hosting bare-metal applications, trusted functions, and open-source projects on a single SoC with secure partitioning. You have the option to decide which functions run in software using LynxSecure bare-metal apps and which functions you need to hardware-accelerate through the Zynq UltraScale+ MPSoC’s FPGA fabric.
The LynxSecure technology was designed to satisfy high-assurance computing requirements in support of the NIST, NSA Common Criteria, and NERC CIP evaluation processes, which are used to regulate military and industrial computing environments.
The LynxSecure Separation Kernel hypervisor provides:
Here’s a diagram of the LynxSecure Separation Kernel hypervisor architecture:
Please contact Lynx Software Technologies directly for information about the LynxSecure Separation Kernel hypervisor.
Machine learning and machine inference based on CNNs (convolutional neural networks) are the latest way to classify images and, as I wrote in Monday’s blog post about the new Xilinx reVISION announcement, “The last two years have generated more machine-learning technology than all of the advancements over the previous 45 years and that pace isn't slowing down.” (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”) The challenge now is to make the CNNs run faster while consuming less power. It would be nice to make them easier to use as well.
OK, that’s a setup. A paper published last month at the 25th International Symposium on Field Programmable Gate Arrays titled “FINN: A Framework for Fast, Scalable Binarized Neural Network Inference” describes a method to speed up CNN-based inference while cutting power consumption by reducing CNN precision in the inference machines. As the paper states:
“…a growing body of research demonstrates this approach [CNN] incorporates significant redundancy. Recently, it has been shown that neural networks can classify accurately using one- or two-bit quantization for weights and activations. Such a combination of low-precision arithmetic and small memory footprint presents a unique opportunity for fast and energy-efficient image classification using Field Programmable Gate Arrays (FPGAs). FPGAs have much higher theoretical peak performance for binary operations compared to floating point, while the small memory footprint removes the off-chip memory bottleneck by keeping parameters on-chip, even for large networks. Binarized Neural Networks (BNNs), proposed by Courbariaux et al., are particularly appealing since they can be implemented almost entirely with binary operations, with the potential to attain performance in the teraoperations per second (TOPS) range on FPGAs.”
The paper then describes the techniques developed by the authors to generate BNNs and instantiate them into FPGAs. The results, based on experiment using a Xilinx ZC706 eval kit based on a Zynq Z-7045 SoC, are impressive:
“When it comes to pure image throughput, our designs outperform all others. For the MNIST dataset, we achieve an FPS which is over 48/6x over the nearest highest throughput design for our SFC-max/LFC-max designs respectively. While our SFC-max design has lower accuracy than the networks implemented by Alemdar et al., our LFC-max design outperforms their nearest accuracy design by over 6/1.9x for throughput and FPS/W respectively. For other datasets, our CNV-max design outperforms TrueNorth for FPS by over 17/8x for CIFAR-10 / SVHN datasets respectively, while achieving 9.44x higher throughput than the design by Ovtcharov et al., and 2.2x over the fastest results reported by Hegde et al. Our prototypes have classification accuracy within 3% of the other low-precision works, and could have been improved by using larger BNNs.”
There’s something even more impressive, however. This design approach to creating BNNs is so scalable that it’s now on a low-end platform—the $229 Digilent PYNQ-Z1. (Digilent’s academic price for the PYNQ-Z1 is only $65!) Xilinx Research Labs in Ireland, NTNU (Norwegian U. of Science and Technology), and the U. of Sydney have released an open-source Binarized Neural Network (BNN) Overlay for the PYNQ-Z1 based on the work described in the above paper.
According to Giulio Gambardella of Xilinx Research Labs, “…running on the PYNQ-Z1 (a smaller Zynq 7020), [the overlay] can achieve 168,000 image classifications per second with 102µsec latency on the MNIST dataset with 98.40% accuracy, and 1700 images per second with 2.2msec latency on the CIFAR-10, SVHN, and GTSRB datasets, with 80.1%, 96.69%, and 97.66% accuracy respectively, running at under 2.5W.”
Digilent PYNQ-Z1 board, based on a Xilinx Zynq Z-7020 SoC
Because the PYNQ-Z1 programming environment centers on Python and the Jupyter development environment, the package includes a number of Jupyter notebooks that demonstrate what the overlay can do through live code running on the PYNQ-Z1 board, along with equations, visualizations, explanatory text, and program results, including images.
There are also examples of this BNN in practical application:
For more information about the Digilent PYNQ-Z1 board, see “Python + Zynq = PYNQ, which runs on Digilent’s new $229 pink PYNQ-Z1 Python Productivity Package.”
Today, EEJournal’s Kevin Morris has published a review article titled “Teaching Machines to See: Xilinx Launches reVISION” following Monday’s announcement of the Xilinx reVISION stack for developing vision-guided applications. (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”)
“But vision is one of the most challenging computational problems of our era. High-resolution cameras generate massive amounts of data, and processing that information in real time requires enormous computing power. Even the fastest conventional processors are not up to the task, and some kind of hardware acceleration is mandatory at the edge. Hardware acceleration options are limited, however. GPUs require too much power for most edge applications, and custom ASICs or dedicated ASSPs are horrifically expensive to create and don’t have the flexibility to keep up with changing requirements and algorithms.
“That makes hardware acceleration via FPGA fabric just about the only viable option. And it makes SoC devices with embedded FPGA fabric - such as Xilinx Zynq and Altera SoC FPGAs - absolutely the solutions of choice. These devices bring the benefits of single-chip integration, ultra-low latency and high bandwidth between the conventional processors and the FPGA fabric, and low power consumption to the embedded vision space.”
Later on, Morris gets to the fly in the ointment:
“Oh, yeah. There’s still that ‘almost impossible to program’ issue.”
And then he gets to the solution:
“reVISION, announced this week, is a stack - a set of tools, interfaces, and IP - designed to let embedded vision application developers start in their own familiar sandbox (OpenVX for vision acceleration and Caffe for machine learning), smoothly navigate down through algorithm development (OpenCV and NN frameworks such as AlexNet, GoogLeNet, SqueezeNet, SSD, and FCN), targeting Zynq devices without the need to bring in a team of FPGA experts. reVISION takes advantage of Xilinx’s previously-announced SDSoC stack to facilitate the algorithm development part. Xilinx claims enormous gains in productivity for embedded vision development - with customers predicting cuts of as much as 12 months from current schedules for new product and update development.
“In many systems employing embedded vision, it’s not just the vision that counts. Increasingly, information from the vision system must be processed in concert with information from other types of sensors such as LiDAR, SONAR, RADAR, and others. FPGA-based SoCs are uniquely agile at handling this sensor fusion problem, with the flexibility to adapt to the particular configuration of sensor systems required by each application. This diversity in application requirements is a significant barrier for typical “cost optimization” strategies such as the creation of specialized ASIC and ASSP solutions.
“The performance rewards for system developers who successfully harness the power of these devices are substantial. Xilinx is touting benchmarks showing their devices delivering an advantage of 6x images/sec/watt in machine learning inference with GoogLeNet @batch = 1, 42x frames/sec/watt in computer vision with OpenCV, and ⅕ the latency on real-time applications with GoogLeNet @batch = 1 versus “NVidia Tegra and typical SoCs.” These kinds of advantages in latency, performance, and particularly in energy-efficiency can easily be make-or-break for many embedded vision applications.”
But don’t take my word for it, read Morris’ article yourself.
Pentek has published the 10th edition of “Putting FPGAs to Work in Software Radio Systems,” a 90-page tutorial written by Rodger H. Hosking, Pentek’s Vice-President & Cofounder. As the preface of this tutorial guide states:
“FPGAs have become an increasingly important resource for software radio systems. Programmable logic technology now offers significant advantages for implementing software radio functions such as DDCs (Digital Downconverters). Over the past few years, the functions associated with DDCs have seen a shift from being delivered in ASICs (Application-Specific ICs) to operating as IP (Intellectual Property) in FPGAs.
“For many applications, this implementation shift brings advantages that include design flexibility, higher precision processing, higher channel density, lower power, and lower cost per channel. With the advent of each new, higher-performance FPGA family, these benefits continue to increase.
“This handbook introduces the basics of FPGA technology and its relationship to SDR (Software-Defined Radio) systems. A review of Pentek’s GateFlow FPGA Design Resources is followed by a discussion of features and benefits of FPGA-based DDCs. Pentek SDR products that utilize FPGA technology and applications based on such products are also presented.”
Pentek has long used Xilinx All Programmable devices in its board-level products and that long experience shows in some unique, multi-generational analysis of the performance improvements in Xilinx’s Virtex FPGA generations starting with the Virtex-II Pro family (introduced in 2002) and moving through the Virtex-7 device family.
Yesterday, Xilinx announced the reVISION stack for software-defined, embedded-vision apps. (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”) Today, there’s a video demo of an application built with the reVISION stack and a second with more technical detail on optical flow. The first demo shows a working system that accepts video from two cameras, performs an optical flow transformation on one video stream, processes a recorded video stream from an SD card through a CNN, and then merges the video and optical-flow representations and outputs the resulting video on an HDMI screen. All of these operations occur within one Xilinx Zynq UltraScale+ ZU9EG MPSoC running on a Xilinx ZCU102 eval kit.
Here’s a block diagram of the demo showing how the three video streams are captured, processed, and displayed by the Zynq UltraScale+ MPSoC:
The two image sensors in this first demo are worthy of note. One is the new Sony IMX274LQC 4K60 image sensor designed for industrial applications. It’s connected to the ZCU102 eval kit over a MIPI interface running through one of the eval kit’s FMC connectors. The second camera is a Stereolabs ZED 2K stereo camera interfaced over USB 3.0, a high-speed native interface for the Zynq UltraScale+ MPSoC.
Here’s the 2-minute video of the operating demo:
For more details on the optical-flow part of the demo, here’s another 2-minute video:
By Dr. Rajan Bedi, Spacechips
Several of my satcom ground-segment clients and I are considering Xilinx's recently announced RFSoC for future transceivers and I want to share the benefits of this impending device. (Note: For more information on the Xilinx RFSoC, see “Xilinx announces RFSoC with 4Gsamples/sec ADCs and 6.4Gsamples/sec DACs for 5G, other apps. When we say “All Programmable,” we mean it!”)
Direct RF/IF sampling and direct DAC up-conversion are currently being used very successfully in-orbit and on the ground. For example, bandpass sampling provides flexible RF frequency planning on some spacecraft by directly digitizing L- and S-band carriers to remove expensive and cumbersome superheterodyne down-conversion stages. Today, many navigation satellites directly re-construct the L-band carrier from baseband data without using traditional up-conversion. Direct RF/IF sampling and direct DAC up-conversion have dramatically reduced the BOM, size, weight, power consumption, as well as the recurring and non-recurring costs of transponders. Software-defined radio (SDR) has given operators real scalability, reusability, and reconfigurability. Xilinx's new RFSoC will offer further hardware integration advantages for the ground segment.
The Xilinx RFSoC integrates multi-Gsamples/sec ADCs and DACs into a 16nm Zynq UltraScale+ MPSoC. At this geometry and with this technology, the mixed-signal converters draw very little power and economies of scale make it possible to add a lot of digital post-processing (Small A/Big D!) to implement functions such as DDC (digital down-conversion), DUC (digital up-conversion), AGC (automatic gain control), and interleaving calibration.
While CMOS scaling has improved ADC and DAC sample rates, which results in greater bandwidths at lower power, the transconductance of transistors and the size of the analog input/output voltage swing are reduced for analog designs, which impacts G/T at the satellite receiver. (G/T is antenna gain-to-noise-temperature, a figure of merit in the characterization of antenna performance where G is the antenna gain in decibels at the receive frequency and T is the equivalent noise temperature of the receiving system in kelvins. The receiving system’s noise temperature is the summation of the antenna noise temperature and the RF-chain noise temperature from the antenna terminals to the receiver output.)
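The G/T figure of merit described above can be written compactly (a standard relation, not from the Xilinx material):

```latex
\frac{G}{T}\;[\mathrm{dB/K}] \;=\; G_{\mathrm{rx}}\;[\mathrm{dBi}] \;-\; 10\log_{10}\!\bigl(T_{\mathrm{ant}} + T_{\mathrm{chain}}\bigr)
```

where \(T_{\mathrm{ant}}\) is the antenna noise temperature and \(T_{\mathrm{chain}}\) the receive-chain noise temperature, both in kelvins. A reduced analog input swing raises the effective noise temperature of the chain, which directly degrades G/T.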
Integrating ADCs and DACs with Xilinx's programmable MPSoC fabric shrinks physical footprint, reduces chip-to-chip latency, and completely eliminates the external digital interfaces between the mixed-signal converters and the FPGA. These external interfaces typically consume appreciable power. For parallel-I/O connections, they also need large amounts of pc board space and are difficult to route.
There will be a number of devices in the Xilinx RFSoC family, each containing different ADC/DAC combinations targeting different markets. Depending on the number of integrated mixed-signal converters, Xilinx is predicting a 55% to 77% reduction in footprint compared to current discrete implementations using JESD204B high-speed serial links between the FPGA and the ADCs and DACs, as illustrated below. Integration will also benefit clock distribution both at the device and system level.
Figure 1: RFSoC device concept (Source: Xilinx)
The RFSoC’s integrated 12-bit ADCs can each sample up to 4Gsamples/sec, which offers flexible bandwidth and RF frequency-planning options. The analog input bandwidth of each ADC appears to be 4GHz, which allows direct RF/IF sampling up to the S-band.
Direct RF/IF sampling obeys the bandpass Nyquist Theorem when oversampling at 2x the information bandwidth (or greater) and undersampling the absolute carrier frequencies. For example, the spectrum below shows a 48.5MHz L-band signal centered at 1.65GHz, digitized using an undersampling rate of 140.5Msamples/sec. The resulting oversampling ratio is 2.9, with the information located in the 24th Nyquist zone. Digitization aliases the bandpass information to the first Nyquist zone, which may or may not be baseband depending on your application. If not, the RFSoC's integrated DDC moves the alias to dc, allowing the use of a low-pass filter.
Figure 2: Direct L-Band Sampling
As the sample rate increases, the noise spectral density spreads across a wider Nyquist region with respect to the original signal bandwidth. Each time the sampling frequency doubles, the noise spectral density decreases by 3dB as the noise re-distributes across twice the bandwidth, which increases dynamic range and SNR. Understandably, operators want to exploit this processing gain! A larger oversampling ratio also moves the aliases further apart, relaxing the specification of the anti-aliasing filter. Furthermore, oversampling increases the correlation between successive samples in the time-domain, allowing the use of a decimating filter to remove some samples and reduce the interface rate between the ADC and the FPGA.
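The 3dB-per-doubling behavior follows from the standard ideal-ADC SNR expression with processing gain (a textbook relation, stated here for reference):

```latex
\mathrm{SNR}_{\mathrm{ideal}} \;=\; 6.02\,N \;+\; 1.76\,\mathrm{dB} \;+\; 10\log_{10}\!\left(\frac{f_s}{2\,\mathrm{BW}}\right)
```

where \(N\) is the ADC resolution in bits. Doubling \(f_s\) increases the last term by \(10\log_{10}2 \approx 3\,\mathrm{dB}\). For the L-band example above (\(f_s = 140.5\) Msamples/sec, \(\mathrm{BW} = 48.5\) MHz), the processing gain works out to roughly 1.6dB.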
The RFSoC’s integrated 14-bit DACs operate up to 6.4Gsamples/sec, which also offers flexible bandwidth and RF frequency-planning options.
Just like any high-frequency, large-bandwidth mixed-signal device, designing an RFSoC into a system requires careful consideration of floor-planning, front/back-end component placement, routing, grounding, and analog-digital segregation to achieve the required system performance. The partitioning starts at the die and extends to the module/sub-system level, with all the analog signals (including the sampling clock) typically on one side of an ADC or DAC. Given the RFSoC's high sampling frequencies, analog inputs and outputs must be isolated further at the PCB level to prevent crosstalk between adjacent channels and clocks, and from digital noise.
At low carrier frequencies, the performance of an ADC or DAC is limited by its resolution and linearity (DNL/INL). However at higher signal frequencies, SNR is determined primarily by the sampling clock’s purity. For direct RF/IF applications, minimizing jitter will be key to achieving the desired performance as shown below:
Figure 3: SNR of an ideal ADC vs analog input frequency and clock jitter
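Curves like those in Figure 3 follow the standard jitter-limited SNR bound for an ideal ADC (a textbook relation, not taken from the Xilinx datasheet):

```latex
\mathrm{SNR}_{\mathrm{jitter}} \;=\; -20\log_{10}\!\bigl(2\pi f_{\mathrm{in}}\, t_j\bigr)
```

where \(f_{\mathrm{in}}\) is the analog input frequency and \(t_j\) the rms clock jitter. For example, sampling a 1.65GHz carrier with 100fs of rms jitter limits SNR to about 60dB regardless of the converter's resolution, which is why clock purity dominates at direct RF/IF frequencies.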
While there are aspects of the mixed-signal processing that could be improved, judging from the early announcements and the information posted on its website, Xilinx has done a good job with the RFSoC. Although designed more for 5G MIMO and wireless backhaul than for satellite communication, the RFSoC's ADCs and DACs have sufficient dynamic range and offer flexible RF frequency-planning options for many ground-segment OEMs.
The specification of the RFSoC's ADC will allow ground receivers to directly digitize the information broadcast at traditional satellite communication frequencies at L- and S-band as well as the larger bandwidths used by high-throughput digital payloads. Thanks to its reprogrammability, the same RFSoC-based architecture with its wideband ADCs can be re-used for other frequency plans without having to re-engineer the hardware.
The RFSoC's DAC specification will allow ground transmitters to directly construct approximately 3GHz of bandwidth up to the X-band (9.6GHz). Xilinx says that first samples of RFSoC will become available in 2018 and I look forward to designing the part into satcom systems and sharing my experiences with you.
Dr. Rajan Bedi pioneered the use of Direct RF/IF Sampling and direct DAC up-conversion for the space industry with many in-orbit satellites currently using these techniques. He was previously invited by ESA and NASA to present his work and was also part of the project teams which developed many of the ultra-wideband ADCs and DACs currently on the market. These devices are successfully operating in orbit today. Last year, his company, Spacechips, was awarded High-Reliability Product of the Year for advancing Software Defined Radio.
Spacechips provides space electronics design consultancy services to manufacturers of satellites and spacecraft around the world. The company also helps OEMs assess the benefits of COTS components and exploit the advantages of direct RF/IF sampling and direct DAC up-conversion. Prior to founding Spacechips, Dr. Bedi headed the Mixed-Signal Design Group at Airbus Defence & Space in the UK for twelve years. Rajan is the author of Out-of-this-World Design, the popular, award-winning blog on Space Electronics. He also teaches direct RF/IF sampling and direct DAC up-conversion techniques in his Mixed-Signal and FPGA courses which are offered around the world. Rajan offers a series of unique training courses, Courses for Rocket Scientists, which teach and compare all space-grade FPGAs as well as the use of COTS Xilinx UltraScale and UltraScale+ parts for implementing spacecraft IP. Rajan has designed every space-grade FPGA into satellite systems!
As part of today’s reVISION announcement of a new, comprehensive development stack for embedded-vision applications, Xilinx has produced a 3-minute video showing you just some of the things made possible by this announcement.
Here it is:
By Adam Taylor
Several times in this series, we have looked at image processing using the Avnet EVK and the ZedBoard. Along with the basics, we have examined object tracking using OpenCV running on the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PS (processing system) and using HLS with its video library to generate image-processing algorithms for the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL (programmable logic, see blogs 140 to 148 here).
Xilinx’s reVISION is an embedded-vision development stack that provides support for a wide range of frameworks and libraries often used for embedded-vision applications. Most exciting, from my point of view, is that the stack includes acceleration-ready OpenCV functions.
The stack itself is split into three layers. Once we select or define our platform, we will be mostly working at the application and algorithm layers. Let’s take a quick look at the layers of the stack:
As I mentioned above one of the most exciting aspects of the reVISION stack is the ability to accelerate a wide range of OpenCV functions using the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL. We can group the OpenCV functions that can be hardware-accelerated using the PL into four categories:
What is very interesting about these function calls is that we can optimize them for resource usage or performance within the PL. The main optimization method is specifying the number of pixels to be processed during each clock cycle. For most accelerated functions, we can choose to process either one or eight pixels per clock. Processing more pixels per clock cycle reduces latency but increases resource utilization, while processing one pixel per clock minimizes the resource requirements at the cost of increased latency. We control the number of pixels processed per clock via the function call.
Over the next few blogs, we will look more at the reVISION stack and how we can use it. However, in the best Blue Peter tradition, the image below shows the result of running a reVISION Harris corner-detection OpenCV function accelerated within the PL.
Accelerated Harris Corner Detection in the PL
Code is available on GitHub as always.
Today, Xilinx announced a comprehensive suite of industry-standard resources for developing advanced embedded-vision systems based on machine learning and machine inference. It’s called the reVISION stack and it allows design teams without deep hardware expertise to use a software-defined development flow to combine efficient machine-learning and computer-vision algorithms with Xilinx All Programmable devices to create highly responsive systems. (Details here.)
The Xilinx reVISION stack includes a broad range of development resources for platform, algorithm, and application development including support for the most popular neural networks: AlexNet, GoogLeNet, SqueezeNet, SSD, and FCN. Additionally, the stack provides library elements such as pre-defined and optimized implementations for CNN network layers, which are required to build custom neural networks (DNNs and CNNs). The machine-learning elements are complemented by a broad set of acceleration-ready OpenCV functions for computer-vision processing.
For application-level development, Xilinx supports industry-standard frameworks including Caffe for machine learning and OpenVX for computer vision. The reVISION stack also includes development platforms from Xilinx and third parties, which support various sensor types.
The reVISION development flow starts with a familiar, Eclipse-based development environment; the C, C++, and/or OpenCL programming languages; and associated compilers all incorporated into the Xilinx SDSoC development environment. You can now target reVISION hardware platforms within the SDSoC environment, drawing from a pool of acceleration-ready, computer-vision libraries to quickly build your application. Soon, you’ll also be able to use the Khronos Group’s OpenVX framework as well.
For machine learning, you can use popular frameworks including Caffe to train neural networks. Within one Xilinx Zynq SoC or Zynq UltraScale+ MPSoC, you can use Caffe-generated .prototxt files to configure a software scheduler running on one of the device’s ARM processors to drive CNN inference accelerators—pre-optimized for and instantiated in programmable logic. For computer vision and other algorithms, you can profile your code, identify bottlenecks, and then designate specific functions that need to be hardware-accelerated. The Xilinx system-optimizing compiler then creates an accelerated implementation of your code, automatically including the required processor/accelerator interfaces (data movers) and software drivers.
The Xilinx reVISION stack is the latest in an evolutionary line of development tools for creating embedded-vision systems. Xilinx All Programmable devices have long been used to develop such vision-based systems because these devices can interface to any image sensor and connect to any network—which Xilinx calls any-to-any connectivity—and they provide the large amounts of high-performance processing horsepower that vision systems require.
Initially, embedded-vision developers used the existing Xilinx Verilog and VHDL tools to develop these systems. Xilinx introduced the SDSoC development environment for HLL-based design two years ago and, since then, SDSoC has dramatically and successfully shortened development cycles for thousands of design teams. Xilinx’s new reVISION stack now enables an even broader set of software and systems engineers to develop intelligent, highly responsive embedded-vision systems faster and more easily using Xilinx All Programmable devices.
And what about the performance of the resulting embedded-vision systems? How do their performance metrics compare against systems based on embedded GPUs or the typical SoCs used in these applications? Xilinx-based systems significantly outperform the best of this group, which employ Nvidia devices. Benchmarks of the reVISION flow using Zynq SoC targets against the Nvidia Tegra X1 have shown significant advantages in both performance per watt and response latency.
There is huge value in a very rapid, deterministic system-response time and, for many systems, the faster response of a design accelerated with programmable logic can mean the difference between success and catastrophic failure. For example, the figure below shows the difference in response time between a car’s vision-guided braking system created with the Xilinx reVISION stack running on a Zynq UltraScale+ MPSoC and a similar system based on an Nvidia Tegra device. At 65 mph, the Xilinx embedded-vision system’s faster response time stops the vehicle 5 to 33 feet sooner, depending on how the Nvidia-based system is implemented. Five to 33 feet could easily mean the difference between a safe stop and a collision.
(Note: This example appears in the new Xilinx reVISION backgrounder.)
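To see how response time maps to stopping distance: at 65 mph a vehicle covers about 95 feet per second, so every 100 ms of extra latency costs roughly 9.5 feet before braking even begins. The helper below does that arithmetic; the latency values in the usage note are back-calculated assumptions chosen to reproduce the stated 5- and 33-foot deltas, not figures from the Xilinx benchmark:

```cpp
#include <cassert>
#include <cmath>

// Distance (in feet) a vehicle travels during a given reaction
// latency at a given speed. 1 mph = 5280/3600 ft/s.
double latency_distance_ft(double speed_mph, double latency_s) {
    double speed_ft_per_s = speed_mph * 5280.0 / 3600.0;
    return speed_ft_per_s * latency_s;
}
```

At 65 mph, a latency difference of about 52 ms corresponds to 5 feet of extra travel, and about 346 ms corresponds to 33 feet.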
The last two years have generated more machine-learning technology than all of the advances over the previous 45 years, and that pace isn’t slowing down. Many new types of neural networks for vision-guided systems have emerged, along with new techniques that make deploying those networks much more efficient. No matter what you develop today or implement tomorrow, the hardware and I/O reconfigurability and software programmability of Xilinx All Programmable devices can “future-proof” your designs, whether that means implementing new algorithms in existing hardware; interfacing to new, improved sensing technology; or adding an all-new sensor type (LIDAR or Time-of-Flight sensors, for example) to improve a vision-based system’s safety and reliability through advanced sensor fusion.
Xilinx is pushing even further into vision-guided, machine-learning applications with the new Xilinx reVISION Stack and this announcement complements the recently announced Reconfigurable Acceleration Stack for cloud-based systems. (See “Xilinx Reconfigurable Acceleration Stack speeds programming of machine learning, data analytics, video-streaming apps.”) Together, these new development resources significantly broaden your ability to deploy machine-learning applications using Xilinx technology—from inside the cloud to the very edge.
You might also want to read “Xilinx AI Engine Steers New Course” by Junko Yoshida on the EETimes.com site.