

 

I did not go to Embedded World in Nuremberg this week, but SemiWiki’s Bernard Murphy was there and he’s published his observations about three Zynq-based reference designs that he saw running in Aldec’s booth on the company’s TySOM embedded development and prototyping boards.

 

 


 

Aldec TySOM-2 Embedded Prototyping Board

 

 

 

Murphy published this article titled “Aldec Swings for the Fences” on SemiWiki and wrote:

 

 

“At the show, Aldec provided insight into using the solution to model the ARM core running in QEMU, together with a MIPI CSI-2 solution running in the FPGA. But Aldec didn’t stop there. They also showed off three reference designs designed using this flow and built on their TySOM boards.

 

“The first reference design targets multi-camera surround view for ADAS (automotive – advanced driver assistance systems). Camera inputs come from four First Sensor Blue Eagle systems, which must be processed simultaneously in real-time. A lot of this is handled in software running on the Zynq ARM cores but the computationally-intensive work, including edge detection, colorspace conversion and frame-merging, is handled in the FPGA. ADAS is one of the hottest areas in the market and likely to get hotter since Intel just acquired Mobileye.

 

“The next reference design targets IoT gateways – also hot. Cloud interface, through protocols like MQTT, is handled by the processors. The gateway supports connection to edge devices using wireless and wired protocols including Bluetooth, ZigBee, Wi-Fi and USB.

 

“Face detection for building security, device access and identifying evil-doers is also growing fast. The third reference design is targeted at this application, using similar capabilities to those on the ADAS board, but here managing real-time streaming video as 1280x720 at 30 frames per second, from an HDR-CMOS image sensor.”

 

The article contains a photo of the Aldec TySOM-2 Embedded Prototyping Board, which is based on a Xilinx Zynq Z-7045 SoC. According to Murphy, Aldec developed the reference designs using its own and other design tools including the Aldec Riviera-PRO simulator and QEMU. (For more information about the Zynq-specific QEMU processor emulator, see “The Xilinx version of QEMU handles ARM Cortex-A53, Cortex-R5, Cortex-A9, and MicroBlaze.”)

 

Then Murphy wrote this:

 

“So yes, Aldec put together a solution combining their simulator with QEMU emulation and perhaps that wouldn’t justify a technical paper in DVCon. But business-wise they look like they are starting on a much bigger path. They’re enabling FPGA-based system prototype and build in some of the hottest areas in systems today and they make these solutions affordable for design teams with much more constrained budgets than are available to the leaders in these fields.”

 

 

 

Digilent says that its new $199.99 Digital Discovery—a low-cost USB instrument that combines a 24-channel, 800Msamples/sec logic analyzer; a 16-bit, 100Msamples/sec digital pattern generator; and a 100mA power supply—“was created to be the ultimate embedded development companion.” Further, “Its features and specifications were deliberately chosen to maintain a small and portable form factor, withstand use in a variety of environments, and keep costs down, while balancing the requirements of operating on USB Power.”

 

(Note: Skip to the bottom of this blog for a limited-time offer. Then come back.)

 

Here’s a photo of the Digital Discovery:

 

 


 

Digilent’s Digital Discovery—a combined 24-channel, 800Msamples/sec logic analyzer and 16-bit, 100Msamples/sec digital pattern generator

 

 

If that form factor looks familiar, you’re probably reminded of the company’s Analog Discovery and Analog Discovery 2 100Msamples/sec USB DSO, logic analyzer, and power supply. (See “$279 Analog Discovery 2 DSO, logic analyzer, power supply, etc. relies on Spartan-6 for programmability, flexibility.”) And just like the Analog Discovery modules, the Digilent Digital Discovery is based on a Xilinx Spartan-6 FPGA (an LX25), which becomes pretty clear when you look at the product’s board photo. The Spartan-6 FPGA is right there in the center of the board:

 

 


 

Digilent’s Digital Discovery is based on a Xilinx Spartan-6 FPGA

 

 

When I write “based on,” what I mean to say is that Digilent’s IP in the Spartan-6 FPGA pretty much implements the entire low-cost instrument—as clearly shown in the block diagram:

 

 


 

 

Digilent’s Digital Discovery Module Block Diagram

 

 

And what does this $199.99 instrument do, considering that it’s implemented using a low-cost FPGA? Here are the specs (and pretty impressive specs they are):

 

 

  • 24-channel, 800Msamples/sec* digital logic analyzer (1.2…3.3V CMOS)
  • 16-channel, 100Msamples/sec pattern generator (1.2…3.3V)
  • 16-channel virtual digital I/O including buttons, switches, and LEDs – perfect for logic training applications
  • Two input/output digital trigger signals for linking multiple instruments (1.2…3.3V CMOS)
  • A programmable power supply of 1.2…3.3V/100mA. The same voltage supplies the Logic Analyzer input buffers and the Pattern Generator input/output buffers to maintain logic-level compatibility with the circuit under test.
  • Digital Bus Analyzers (SPI, I²C, UART, Parallel)

 

*Note: to obtain speeds of 200MS/s and higher, the High Speed Adapter must be used.

 

 

So, you may have noted that asterisk on the logic analyzer's maximum sample rate. You need a Digital Discovery High Speed Adapter to attain the full 800Msamples/sec acquisition rate on the logic analyzer. Normally, that’s another $49.99. However, for the first 100 Digital Discovery buyers, Digilent is throwing in the high-speed adapter for free.

 

Operators are standing by.

 

 

 

 

 

AEye is the latest iteration of the eye-tracking technology developed by EyeTech Digital Systems. The AEye chip is based on the Zynq Z-7020 SoC and sits immediately adjacent to the imaging sensor, which makes for compact, stand-alone systems. This technology is finding its way into diverse vision-guided systems in the automotive, AR/VR, and medical diagnostic arenas. According to EyeTech, the Zynq SoC’s unique abilities allow the company to create products it could not build any other way.

 

With the advent of the reVISION stack, EyeTech is looking to expand its product offerings into machine learning, as discussed in this short, 3-minute video:

 

 

 

 

 

 

For more information about EyeTech, see:

 

 

 

 

 

Next week at OFC 2017 in Los Angeles, Acacia Communications, Optelian, Precise-ITC, Spirent, and Xilinx will present the industry’s first interoperability demo supporting 200/400GbE connectivity over standardized OTN and DWDM. Putting that succinctly, the demo is all about packing more bits/λ, so that you can continue to use existing fiber instead of laying more.

 

Callite-C4 400GE/OTN Transponder IP from Precise-ITC instantiated in a Xilinx Virtex UltraScale+ VU9P FPGA will map native 200/400GbE traffic—generated by test equipment from Spirent—into 2x100 and 4x100 OTU4-encapsulated signals. The 200GbE and 400GbE standards are still in flux, so instantiating the Precise-ITC transponder IP in an FPGA allows the design to quickly evolve with the standards with no BOM or board changes. Concise translation: faster time to market with much less risk.

 

 


 

Callite-C4 400GE/OTN Transponder IP Block Diagram

 

 

 

Optelian’s TMX-2200 200G muxponder, scheduled for release later this year, will muxpond the OTU4 signals into 1x200Gbps or 2x200Gbps DP-16QAM using Acacia Communications’ CFP2-DCO coherent pluggable transceiver.

 

 

The Optelian and Precise-ITC exhibit booths at OFC 2017 are 4139 and 4141 respectively.

 

 

 

By Adam Taylor

 

In looking at the Zynq UltraScale+ MPSoC’s AMS capabilities so far, we have introduced the two slightly different Sysmon blocks residing within the Zynq UltraScale+ MPSoC’s PS (processing system) and PL (programmable logic). In this blog, I am going to demonstrate how we can get the PS Sysmon up and running using both the ARM Cortex-A53 and Cortex-R5 processor cores in the Zynq UltraScale+ MPSoC’s PS. There is little difference between the two processor types, but I think it is important to show you how to use both.

 

The process to use the Sysmon is the same as it is for many of the peripherals we have looked at previously with the MicroZed Chronicles:

 

  1. Look Up the configuration of the Sysmon Peripheral (XSysMonPsu_LookupConfig)
  2. Initialize the Sysmon Peripheral (XSysMonPsu_CfgInitialize)
  3. Reset the Sysmon (XSysMonPsu_Reset)
  4. Set the Sequencer to safe mode while we update its configuration (XSysMonPsu_SetSequencerMode)
  5. Disable the alarms (XSysMonPsu_SetAlarmEnables)
  6. Set the Sequencer Enables for the channels we want to sample (XSysMonPsu_SetSeqChEnables)
  7. Set the ADC Clock Divisor (XSysMonPsu_SetAdcClkDivisor)
  8. Set the Sequencer Mode (XSysMonPsu_SetSequencerMode)

 

The function names in parentheses are the driver calls we use to perform each operation, provided we pass the correct parameters. In the simplest case, as in this example, we can then poll the output registers using the XSysMonPsu_GetAdcData() function. All of these functions are defined within the file xsysmonpsu.h, which is available under the Board Support Package Lib Src directory in SDK.

 

Examining the functions, you will notice that each of the functions used in steps 4 to 8 requires an input parameter called SysmonBlk. You must pass this parameter to the function. This parameter is how we select which Sysmon (within the PS or the PL) we want to address. For this example, we will be specifying the PS Sysmon using XSYSMON_PS, which is also defined within xsysmonpsu.h. If we want to address the PL, we use the XSYSMON_PL definition, which we will be looking at next time.
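To make the sequence concrete, here is a minimal sketch of steps 1 to 8 plus a polled read. The function names come from the list above; the exact signatures, channel constants, and sequencer-mask names should be checked against xsysmonpsu.h and xsysmonpsu_hw.h in your BSP, since they can vary between SDK releases.

    #include "xparameters.h"
    #include "xstatus.h"
    #include "xil_printf.h"
    #include "xsysmonpsu.h"

    static XSysMonPsu SysMonInst;

    int InitAndReadPsSysmon(void)
    {
        XSysMonPsu_Config *Config;
        u32 RawTemp;

        /* Steps 1 & 2: look up the configuration and initialize the driver instance */
        Config = XSysMonPsu_LookupConfig(XPAR_XSYSMONPSU_0_DEVICE_ID);
        if (Config == NULL)
            return XST_FAILURE;
        XSysMonPsu_CfgInitialize(&SysMonInst, Config, Config->BaseAddress);

        /* Step 3: reset the Sysmon */
        XSysMonPsu_Reset(&SysMonInst);

        /* Step 4: put the sequencer into safe mode while we change its configuration */
        XSysMonPsu_SetSequencerMode(&SysMonInst, XSM_SEQ_MODE_SAFE, XSYSMON_PS);

        /* Step 5: disable the alarms */
        XSysMonPsu_SetAlarmEnables(&SysMonInst, 0x0, XSYSMON_PS);

        /* Step 6: enable the channels to sample; the mask here is illustrative and
           should be built from the XSYSMONPSU_SEQ_CH*_<Param>_MASK definitions */
        XSysMonPsu_SetSeqChEnables(&SysMonInst, XSYSMONPSU_SEQ_CH0_TEMP_MASK, XSYSMON_PS);

        /* Step 7: set the ADC clock divisor */
        XSysMonPsu_SetAdcClkDivisor(&SysMonInst, 32, XSYSMON_PS);

        /* Step 8: restart the sequencer, here in continuous-pass mode */
        XSysMonPsu_SetSequencerMode(&SysMonInst, XSM_SEQ_MODE_CONTINPASS, XSYSMON_PS);

        /* Poll a raw result (channel constant name may differ in your driver version);
           the conversion helpers in xsysmonpsu.h turn this into degrees or volts */
        RawTemp = XSysMonPsu_GetAdcData(&SysMonInst, XSM_CH_TEMP, XSYSMON_PS);
        xil_printf("Temperature raw reading = %u\r\n", RawTemp);

        return XST_SUCCESS;
    }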

 

Another useful header file is xsysmonpsu_hw.h. Within this file, we can find the definitions required to correctly select the channels we wish to sample in the sequencer. These are defined in the format:

 

 

XSYSMONPSU_SEQ_CH*_<Param>_MASK

 

 

This simple example samples the following within the PS Sysmon:

 

  1. Temperature
  2. Low Power Core Supply Voltage
  3. Full Power Core Supply Voltage
  4. DDR Supply Voltage
  5. Supply voltage for PS IO banks 0 to 3

 

We can use the conversion functions provided within xsysmonpsu.h to convert the raw values supplied by the ADC into temperatures and voltages. However, the PS IO banks are capable of supporting 3.3V logic. As such, the conversion macro from raw reading to voltage is not correct for these IO banks or for the HD banks in the PL. (We will look at the different IO bank types in another blog.)

 

The full-scale voltage is 3V for most of the voltage conversions. However, in line with UG580 (page 43), we need to use a full scale of 6V for the PS IO banks. Otherwise we will see a value only half of what we are expecting for that bank’s supply voltage. With this in mind, my example contains a conversion function at the top of the source file for these IO banks, to ensure that we get the correct value.
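A conversion along these lines is sketched below. The helper name is mine, and it assumes the standard raw-to-voltage macro scales a 16-bit result against a 3V full scale, so the only change needed for the PS IO banks is the 6V scaling.

    /* Hypothetical helper: convert a raw PS IO bank reading to volts.
       The stock raw-to-voltage macro assumes a 3V full scale (raw * 3.0 / 65536);
       per UG580 the PS IO bank channels use a 6V full scale, so scale by 6V here. */
    static float PsIoRawToVoltage(u32 RawData)
    {
        return ((float)RawData * 6.0f) / 65536.0f;
    }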

 

The Zynq UltraScale+ MPSoC architecture permits both the APU (the ARM Cortex-A53 processors) and the RPU (the ARM Cortex-R5 processors) to address the Sysmon. To demonstrate this, the same file was used in applications first targeting an ARM Cortex-A53 processor in the APU and then targeting the ARM Cortex-R5 processor in the RPU. I used Core 0 in both cases.

 

The only difference between these two cases was the need to create new applications selecting the core to be targeted and then to update the FSBL to load the correct core. (See “Adam Taylor’s MicroZed Chronicles, Part 172: UltraZed Part 3—Saying hello world and First-Stage Boot” for more information on how to do this.)

 

 

 


 

Results when using the ARM Cortex-A53 Core 0 Processor

 

 

 


 

Results when using the ARM Cortex-R5 Core 0 Processor

 

 

 

When I ran the same code, which is available in the GitHub repository, I saw the results shown above in the terminal program, demonstrating the code working on both the ARM Cortex-A53 and ARM Cortex-R5 cores.

 

Next time we will look at how we can use the PL Sysmon.

 

 

 

Code is available on GitHub as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 


  

 

  • Second Year E Book here
  • Second Year Hardback here

 

 


 

 

 

 

EETimes’ Junko Yoshida with some expert help analyzes this week’s Xilinx reVISION announcement

by Xilinx Employee, 03-15-2017

 

This week, EETimes’ Junko Yoshida published an article titled “Xilinx AI Engine Steers New Course” that gathers some comments from industry experts and from Xilinx with respect to Monday’s reVISION stack announcement. To recap, the Xilinx reVISION stack is a comprehensive suite of industry-standard resources for developing advanced embedded-vision systems based on machine learning and machine inference.

 

(See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”)

 

As Xilinx Senior Vice President of Corporate Strategy Steve Glaser tells Yoshida, “Xilinx designed the stack to ‘enable a much broader set of software and systems engineers, with little or no hardware design expertise, to develop intelligent vision-guided systems easier and faster.’”

 

Yoshida continues:

 

While talking to customers who have already begun developing machine-learning technologies, Xilinx identified ‘8 bit and below fixed point precision’ as the key to significantly improve efficiency in machine-learning inference systems.

 

 

Yoshida also interviewed Karl Freund, Senior Analyst for HPC and Deep Learning at Moor Insights & Strategy, who said:

 

“Artificial Intelligence remains in its infancy, and rapid change is the only constant.” In this circumstance, Xilinx seeks “to ease the programming burden to enable designers to accelerate their applications as they experiment and deploy the best solutions as rapidly as possible in a highly competitive industry.”

 

 

She also quotes Loring Wirbel, a Senior Analyst at The Linley Group, who said:

 

What’s interesting in Xilinx's software offering, [is that] this builds upon the original stack for cloud-based unsupervised inference, Reconfigurable Acceleration Stack, and expands inference capabilities to the network edge and embedded applications. One might say they took a backward approach versus the rest of the industry. But I see machine-learning product developers going a variety of directions in trained and inference subsystems. At this point, there's no right way or wrong way.

 

 

There’s a lot more information in the EETimes article, so you might want to take a look for yourself.

 

 

 

 

Next week at the OFC Optical Networking and Communication Conference & Exhibition in Los Angeles, Xilinx will be in the Ethernet Alliance booth demonstrating the industry’s first, standard-based, multi-vendor 400GE network. A 400GE MAC and PCS instantiated in a Xilinx Virtex UltraScale+ VU9P FPGA will be driving a Finisar 400GE CFP8 optical module, which in turn will communicate with a Spirent 400G test module over a fiber connection.

 

In addition, Xilinx will be demonstrating:

 

 


 

 

  • The world’s first complete FlexE 1.0 solution showcasing bonding, sub-rating and channelization on UltraScale+ FPGAs.

 

  • LLDP packet snooping on transport line cards to allow SDN controllers to build network topology maps, which aid data-center network automation.

 

  • Optical technology abstraction in DCI transport.

 

If you’re visiting OFC, be sure to stop by the Xilinx booth (#1809).

 

 

 

PLDA has announced the XpressRICH4-AXI PCIe 4.0 configurable IP block that ties an on-chip AXI bus to PCIe 4.0. The IP block complies with the PCI Express Base 4.0r7 specification and supports endpoint, root port, and dual-mode configurations. The IP supports Xilinx Virtex-7, Virtex UltraScale, and Kintex UltraScale devices and can be used for ASIC design as well.

 

Here’s a block diagram of the core:

 

 


 

 

PLDA XpressRICH4-AXI PCIe 4.0 configurable IP Block Diagram

 

 

Please contact PLDA directly for more information about this IP.

 

 

 

This week at Embedded World in Nuremberg, Lynx Software Technologies is demonstrating its port of the LynxSecure Separation Kernel hypervisor to the ARM Cortex-A53 processors on the Xilinx Zynq UltraScale+ MPSoC. According to Robert Day, Vice President of Marketing at Lynx, "ARM designers are now able to run safety critical environments alongside a general purpose OS like Linux or LynxOS RTOS on the same Xilinx processor without compromising safety, security or real-time performance. Use cases include automotive systems based on environments such as AUTOSAR RTA-BSW from ETAS and avionics designs using LynxOS-178 RTOS from Lynx. Designers can match the security of air-gap hardware partitioning without incurring the cost, power and size overhead of separate hardware."

 

The LynxSecure port to the Zynq UltraScale+ MPSoC supports modular software architectures and tight integration with the Zynq UltraScale+ MPSoC’s FPGA fabric for hosting bare-metal applications, trusted functions, and open-source projects on a single SoC with secure partitioning.  You have the option to decide which functions run in software using LynxSecure bare-metal apps and which functions you need to hardware-accelerate through the Zynq UltraScale+ MPSoC’s FPGA fabric.

 

The LynxSecure technology was designed to satisfy high-assurance computing requirements in support of the NIST, NSA Common Criteria, and NERC CIP evaluation processes, which are used to regulate military and industrial computing environments.

 

The LynxSecure Separation Kernel hypervisor provides:

 

  • Safety & Security
  • Domain Isolation
  • Trusted Execution Environments
  • Reference Monitor Plugins (e.g. Firewalls, IDS encryption, guards)

 

 

Here’s a diagram of the LynxSecure Separation Kernel hypervisor architecture:

 

 

 


 

 

 

Please contact Lynx Software Technologies directly for information about the LynxSecure Separation Kernel hypervisor.

 

 

Zynq + PYNQ + Python + BNNs: Machine inference does not get any easier… or faster

by Xilinx Employee, 03-14-2017

 

Machine learning and machine inference based on CNNs (convolutional neural networks) are the latest way to classify images and, as I wrote in Monday’s blog post about the new Xilinx reVISION announcement, “The last two years have generated more machine-learning technology than all of the advancements over the previous 45 years and that pace isn't slowing down.” (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”) The challenge now is to make the CNNs run faster while consuming less power. It would be nice to make them easier to use as well.

 

OK, that’s a setup. A paper published last month at the 25th International Symposium on Field Programmable Gate Arrays titled “FINN: A Framework for Fast, Scalable Binarized Neural Network Inference” describes a method to speed up CNN-based inference while cutting power consumption by reducing CNN precision in the inference machines. As the paper states:

 

…a growing body of research demonstrates this approach [CNN] incorporates significant redundancy. Recently, it has been shown that neural networks can classify accurately using one- or two-bit quantization for weights and activations.  Such a combination of low-precision arithmetic and small memory footprint presents a unique opportunity for fast and energy-efficient image classification using Field Programmable Gate Arrays (FPGAs). FPGAs have much higher theoretical peak performance for binary operations compared to floating point, while the small memory footprint removes the off-chip memory bottleneck by keeping parameters on-chip, even for large networks. Binarized Neural Networks (BNNs), proposed by Courbariaux et al., are particularly appealing since they can be implemented almost entirely with binary operations, with the potential to attain performance in the teraoperations per second (TOPS) range on FPGAs.

 

The paper then describes the techniques developed by the authors to generate BNNs and instantiate them into FPGAs. The results, based on experiments using a Xilinx ZC706 eval kit based on a Zynq Z-7045 SoC, are impressive:

 

When it comes to pure image throughput, our designs outperform all others. For the MNIST dataset, we achieve an FPS which is over 48/6x over the nearest highest throughput design [1] for our SFC-max/LFC-max designs respectively. While our SFC-max design has lower accuracy than the networks implemented by Alemdar et al., our LFC-max design outperforms their nearest accuracy design by over 6/1.9x for throughput and FPS/W respectively. For other datasets, our CNV-max design outperforms TrueNorth for FPS by over 17/8x for CIFAR-10 / SVHN datasets respectively, while achieving 9.44x higher throughput than the design by Ovtcharov et al., and 2.2x over the fastest results reported by Hegde et al. Our prototypes have classification accuracy within 3% of the other low-precision works, and could have been improved by using larger BNNs.

 

There’s something even more impressive, however. This design approach to creating BNNs is so scalable that it’s now on a low-end platform—the $229 Digilent PYNQ-Z1. (Digilent’s academic price for the PYNQ-Z1 is only $65!) Xilinx Research Labs in Ireland, NTNU (Norwegian U. of Science and Technology), and the U. of Sydney have released an open-source Binarized Neural Network (BNN) Overlay for the PYNQ-Z1 based on the work described in the above paper.

 

According to Giulio Gambardella of Xilinx Research Labs, “…running on the PYNQ-Z1 (a smaller Zynq 7020), [the BNN overlay] can achieve 168,000 image classifications per second with 102µsec latency on the MNIST dataset with 98.40% accuracy, and 1700 images per second with 2.2msec latency on the CIFAR-10, SVHN, and GTSRB datasets, with 80.1%, 96.69%, and 97.66% accuracy respectively, running at under 2.5W.”

 

 


 

Digilent PYNQ-Z1 board, based on a Xilinx Zynq Z-7020 SoC

 

 

 

Because the PYNQ-Z1 programming environment centers on Python and the Jupyter development environment, a number of Jupyter notebooks are associated with this package. They demonstrate what the overlay can do through live code that runs on the PYNQ-Z1 board, along with equations, visualizations, explanatory text, and program results, including images.

 

There are also examples of this BNN in practical application:

 

 

 

 

For more information about the Digilent PYNQ-Z1 board, see “Python + Zynq = PYNQ, which runs on Digilent’s new $229 pink PYNQ-Z1 Python Productivity Package.”

 

 

 

EEJournal’s Kevin Morris weighs in on Monday’s Xilinx reVISION stack launch for embedded-vision apps

by Xilinx Employee, 03-14-2017

 

Today, EEJournal’s Kevin Morris has published a review article titled “Teaching Machines to See: Xilinx Launches reVISION” following Monday’s announcement of the Xilinx reVISION stack for developing vision-guided applications. (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”)

 

Morris writes:

 

But vision is one of the most challenging computational problems of our era. High-resolution cameras generate massive amounts of data, and processing that information in real time requires enormous computing power. Even the fastest conventional processors are not up to the task, and some kind of hardware acceleration is mandatory at the edge. Hardware acceleration options are limited, however. GPUs require too much power for most edge applications, and custom ASICs or dedicated ASSPs are horrifically expensive to create and don’t have the flexibility to keep up with changing requirements and algorithms.

 

“That makes hardware acceleration via FPGA fabric just about the only viable option. And it makes SoC devices with embedded FPGA fabric - such as Xilinx Zynq and Altera SoC FPGAs - absolutely the solutions of choice. These devices bring the benefits of single-chip integration, ultra-low latency and high bandwidth between the conventional processors and the FPGA fabric, and low power consumption to the embedded vision space.

 

Later on, Morris gets to the fly in the ointment:

 

“Oh, yeah. There’s still that “almost impossible to program” issue.”

 

And then he gets to the solution:

 

reVISION, announced this week, is a stack - a set of tools, interfaces, and IP - designed to let embedded vision application developers start in their own familiar sandbox (OpenVX for vision acceleration and Caffe for machine learning), smoothly navigate down through algorithm development (OpenCV and NN frameworks such as AlexNet, GoogLeNet, SqueezeNet, SSD, and FCN), targeting Zynq devices without the need to bring in a team of FPGA experts. reVISION takes advantage of Xilinx’s previously-announced SDSoC stack to facilitate the algorithm development part. Xilinx claims enormous gains in productivity for embedded vision development - with customers predicting cuts of as much as 12 months from current schedules for new product and update development.

 

In many systems employing embedded vision, it’s not just the vision that counts. Increasingly, information from the vision system must be processed in concert with information from other types of sensors such as LiDAR, SONAR, RADAR, and others. FPGA-based SoCs are uniquely agile at handling this sensor fusion problem, with the flexibility to adapt to the particular configuration of sensor systems required by each application. This diversity in application requirements is a significant barrier for typical “cost optimization” strategies such as the creation of specialized ASIC and ASSP solutions.

 

The performance rewards for system developers who successfully harness the power of these devices are substantial. Xilinx is touting benchmarks showing their devices delivering an advantage of 6x images/sec/watt in machine learning inference with GoogLeNet @batch = 1, 42x frames/sec/watt in computer vision with OpenCV, and ⅕ the latency on real-time applications with GoogLeNet @batch = 1 versus “NVidia Tegra and typical SoCs.” These kinds of advantages in latency, performance, and particularly in energy-efficiency can easily be make-or-break for many embedded vision applications.

 

 

But don’t take my word for it, read Morris’ article yourself.

 

 

 

 

 

Pentek has published the 10th edition of “Putting FPGAs to Work in Software Radio Systems,” a 90-page tutorial written by Rodger H. Hosking, Pentek’s Vice-President & Cofounder. As the preface of this tutorial guide states:

 


“FPGAs have become an increasingly important resource for software radio systems. Programmable logic technology now offers significant advantages for implementing software radio functions such as DDCs (Digital Downconverters). Over the past few years, the functions associated with DDCs have seen a shift from being delivered in ASICs (Application-Specific ICs) to operating as IP (Intellectual Property) in FPGAs.

 

“For many applications, this implementation shift brings advantages that include design flexibility, higher precision processing, higher channel density, lower power, and lower cost per channel. With the advent of each new, higher-performance FPGA family, these benefits continue to increase.

 

“This handbook introduces the basics of FPGA technology and its relationship to SDR (Software-Defined Radio) systems. A review of Pentek’s GateFlow FPGA Design Resources is followed by a discussion of features and benefits of FPGA-based DDCs. Pentek SDR products that utilize FPGA technology and applications based on such products are also presented.”

 

 

 

Pentek has long used Xilinx All Programmable devices in its board-level products and that long experience shows in some unique, multi-generational analysis of the performance improvements in Xilinx’s Virtex FPGA generations starting with the Virtex-II Pro family (introduced in 2002) and moving through the Virtex-7 device family.

 

 

 

Yesterday, Xilinx announced the reVISION stack for software-defined, embedded-vision apps. (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”) Today, there’s a video demo of an application built with the reVISION stack and a second with more technical detail on optical flow. The first demo shows a working system that accepts video from two cameras, performs an optical flow transformation on one video stream, processes a recorded video stream from an SD card through a CNN, and then merges the video and optical-flow representations and outputs the resulting video on an HDMI screen. All of these operations occur within one Xilinx Zynq UltraScale+ ZU9EG MPSoC running on a Xilinx ZCU102 eval kit.

 

Here’s a block diagram of the demo showing how the three video streams are captured, processed, and displayed by the Zynq UltraScale+ MPSoC:

 

 


 

 

The two image sensors in this first demo are worthy of note. One is the new Sony IMX274LQC 4K60 image sensor designed for industrial applications. It’s connected to the ZCU102 eval kit over a MIPI interface running through one of the eval kit’s FMC connectors. The second camera is a Stereolabs ZED 2K stereo camera interfaced over USB 3.0, a high-speed native interface for the Zynq UltraScale+ MPSoC.

 

Here’s the 2-minute video of the operating demo:

 

 

 

 

 

For more details on the optical-flow part of the demo, here’s another 2-minute video:

 

 

 

 

 

Using the Xilinx RFSoC for Satcom applications

by Xilinx Employee, 03-13-2017

 

By Dr. Rajan Bedi, Spacechips

 

Several of my satcom ground-segment clients and I are considering Xilinx's recently announced RFSoC for future transceivers and I want to share the benefits of this impending device. (Note: For more information on the Xilinx RFSoC, see “Xilinx announces RFSoC with 4Gsamples/sec ADCs and 6.4Gsamples/sec DACs for 5G, other apps. When we say “All Programmable,” we mean it!”)

 

Direct RF/IF sampling and direct DAC up-conversion are currently being used very successfully in-orbit and on the ground. For example, bandpass sampling provides flexible RF frequency planning with some spacecraft by directly digitizing L- and S-band carriers to remove expensive and cumbersome superheterodyne down-conversion stages. Today, many navigation satellites directly re-construct the L-band carrier from baseband data without using traditional up-conversion. Direct RF/IF Sampling and direct DAC up-conversion have dramatically reduced the BOM, size, weight, power consumption, as well as the recurring and non-recurring costs of transponders. Software-defined radio (SDR) has given operators real scalability, reusability, and reconfigurability. Xilinx's new RFSoC will offer further hardware integration advantages for the ground segment.

 

The Xilinx RFSoC integrates multi-Gsamples/sec ADCs and DACs into a 16nm Zynq UltraScale+ MPSoC. At this geometry and with this technology, the mixed-signal converters draw very little power and economies of scale make it possible to add a lot of digital post-processing (Small A/Big D!) to implement functions such as DDC (digital down-conversion), DUC (digital up-conversion), AGC (automatic gain control), and interleaving calibration.

 

While CMOS scaling has improved ADC and DAC sample rates, which results in greater bandwidths at lower power, the transconductance of transistors and the size of the analog input/output voltage swing are reduced for analog designs, which impacts G/T at the satellite receiver. (G/T is antenna gain-to-noise-temperature, a figure of merit in the characterization of antenna performance where G is the antenna gain in decibels at the receive frequency and T is the equivalent noise temperature of the receiving system in kelvins. The receiving system’s noise temperature is the summation of the antenna noise temperature and the RF-chain noise temperature from the antenna terminals to the receiver output.)
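In equation form (the standard definition, added here for reference):

    G/T [dB/K] = G [dBi] − 10·log10( T_sys [K] )

For example, a 40dBi receive antenna with a 150K system noise temperature gives a G/T of roughly 18.2dB/K.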

 

Integrating ADCs and DACs with Xilinx's programmable MPSoC fabric shrinks physical footprint, reduces chip-to-chip latency, and completely eliminates the external digital interfaces between the mixed-signal converters and the FPGA. These external interfaces typically consume appreciable power. For parallel-I/O connections, they also need large amounts of pc board space and are difficult to route.

 

There will be a number of devices in the Xilinx RFSoC family, each containing different ADC/DAC combinations targeting different markets. Depending on the number of integrated mixed-signal converters, Xilinx is predicting a 55% to 77% reduction in footprint compared to current discrete implementations using JESD204B high-speed serial links between the FPGA and the ADCs and DACs, as illustrated below. Integration will also benefit clock distribution both at the device and system level.

 

 

RFSoC Footprint Reduction 2.jpg

 

Figure 1: RFSoC device concept (Source Xilinx)

 

 

The RFSoC’s integrated 12-bit ADCs can each sample up to 4Gsamples/sec, which offers flexible bandwidth and RF frequency-planning options. The analog input bandwidth of each ADC appears to be 4GHz, which allows direct RF/IF sampling up to the S-band.

 

Direct RF/IF sampling obeys the bandpass Nyquist Theorem when oversampling at 2x the information bandwidth (or greater) and undersampling the absolute carrier frequencies. For example, the spectrum below shows a 48.5MHz L-band signal centered at 1.65GHz, digitized using an undersampling rate of 140.5Msamples/sec. The resulting oversampling ratio is 2.9 with the information located in the 24th Nyquist zone. Digitization aliases the bandpass information to the first Nyquist zone, which may or may not be baseband depending on your application. If not, the RFSoC's integrated DDC moves the alias to dc, allowing the use of a low-pass filter.
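As a quick check of those numbers (my arithmetic, not taken from the original figure): the carrier sits in Nyquist zone ceil(2 × 1650MHz / 140.5MHz) = ceil(23.5) = 24; the oversampling ratio relative to the 48.5MHz information bandwidth is 140.5 / 48.5 ≈ 2.9; and because 1650 − 11 × 140.5 = 104.5MHz lies above fs/2 = 70.25MHz, the digitized carrier aliases (spectrally inverted) to 140.5 − 104.5 = 36MHz in the first Nyquist zone.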

 

 


 

 

Figure 2: Direct L-Band Sampling

 

 

As the sample rate increases, the noise spectral density spreads across a wider Nyquist region with respect to the original signal bandwidth. Each time the sampling frequency doubles, the noise spectral density decreases by 3dB as the noise re-distributes across twice the bandwidth, which increases dynamic range and SNR. Understandably, operators want to exploit this processing gain! A larger oversampling ratio also moves the aliases further apart, relaxing the specification of the anti-aliasing filter. Furthermore, oversampling increases the correlation between successive samples in the time-domain, allowing the use of a decimating filter to remove some samples and reduce the interface rate between the ADC and the FPGA.
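For reference, the ideal-ADC relationship behind that processing gain is:

    SNR_ideal ≈ 6.02·N + 1.76dB + 10·log10( fs / (2·BW) )

where N is the converter resolution in bits and the last term is the processing gain. Doubling fs for a fixed information bandwidth adds 10·log10(2) ≈ 3dB, matching the 3dB figure above.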

 

The RFSoC’s integrated 14-bit DACs operate up to 6.4Gsamples/sec, which also offers flexible bandwidth and RF frequency-planning options.

 

Just like any high-frequency, large bandwidth mixed-signal device, designing an RFSoC into a system requires careful consideration of floor-planning, front/back-end component placement, routing, grounding, and analog-digital segregation to achieve the required system performance. The partitioning starts at the die and extends to the module/sub-system level with all the analog signals (including the sampling clock) typically on one side of an ADC or DAC. Given the RFSoC's high sampling frequencies, at the pcb level, analog inputs and outputs must be isolated further to prevent crosstalk between adjacent channels and clocks, and from digital noise.

 

At low carrier frequencies, the performance of an ADC or DAC is limited by its resolution and linearity (DNL/INL). However at higher signal frequencies, SNR is determined primarily by the sampling clock’s purity. For direct RF/IF applications, minimizing jitter will be key to achieving the desired performance as shown below:
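The curves in Figure 3 below follow the standard jitter-limited SNR expression:

    SNR_jitter = −20·log10( 2π · f_in · t_j )

For example, sampling a 1.65GHz carrier with 100fs of total clock jitter limits the SNR to about 60dB regardless of the converter’s resolution, which is why clock purity dominates at high input frequencies.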

 

 


 

Figure 3: SNR of an ideal ADC vs analog input frequency and clock jitter

 

 

While there are aspects of the mixed-signal processing that could be improved, from the early announcements and information posted on their website, Xilinx has done a good job with the RFSoC. Although not specifically designed for satellite communication, but more so for 5G MIMO and wireless backhaul, the RFSoC's ADCs and DACs have sufficient dynamic range and offer flexible RF frequency-planning options for many ground-segment OEMs.

 

The specification of the RFSoC's ADC will allow ground receivers to directly digitize the information broadcast at traditional satellite communication frequencies at L- and S-band as well as the larger bandwidths used by high-throughput digital payloads. Thanks to its reprogrammability, the same RFSoC-based architecture with its wideband ADCs can be re-used for other frequency plans without having to re-engineer the hardware.

 

The RFSoC's DAC specification will allow ground transmitters to directly construct approximately 3GHz of bandwidth up to the X-band (9.6GHz). Xilinx says that first samples of RFSoC will become available in 2018 and I look forward to designing the part into satcom systems and sharing my experiences with you.

 

 

 

Dr. Rajan Bedi pioneered the use of Direct RF/IF Sampling and direct DAC up-conversion for the space industry with many in-orbit satellites currently using these techniques. He was previously invited by ESA and NASA to present his work and was also part of the project teams which developed many of the ultra-wideband ADCs and DACs currently on the market. These devices are successfully operating in orbit today. Last year, his company, Spacechips, was awarded High-Reliability Product of the Year for advancing Software Defined Radio.

 

Spacechips provides space electronics design consultancy services to manufacturers of satellites and spacecraft around the world. The company also helps OEMs assess the benefits of COTS components and exploit the advantages of direct RF/IF sampling and direct DAC up-conversion. Prior to founding Spacechips, Dr. Bedi headed the Mixed-Signal Design Group at Airbus Defence & Space in the UK for twelve years. Rajan is the author of Out-of-this-World Design, the popular, award-winning blog on Space Electronics. He also teaches direct RF/IF sampling and direct DAC up-conversion techniques in his Mixed-Signal and FPGA courses which are offered around the world. Rajan offers a series of unique training courses, Courses for Rocket Scientists, which teach and compare all space-grade FPGAs as well as the use of COTS Xilinx UltraScale and UltraScale+ parts for implementing spacecraft IP. Rajan has designed every space-grade FPGA into satellite systems!

 

 

As part of today’s reVISION announcement of a new, comprehensive development stack for embedded-vision applications, Xilinx has produced a 3-minute video showing you just some of the things made possible by this announcement.

 

Here it is:

 

 

Adam Taylor’s MicroZed Chronicles, Part 177: Introducing the reVISION stack

by Xilinx Employee, 03-13-2017

 

By Adam Taylor

 

Several times in this series, we have looked at image processing using the Avnet EVK and the ZedBoard. Along with the basics, we have examined object tracking using OpenCV running on the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PS (processing system) and using HLS with its video library to generate image-processing algorithms for the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL (programmable logic, see blogs 140 to 148 here).

 

Xilinx’s reVISION is an embedded-vision development stack that provides support for a wide range of frameworks and libraries often used for embedded-vision applications. Most exciting, from my point of view, is that the stack includes acceleration-ready OpenCV functions.

 


 

 

The stack itself is split into three layers. Once we select or define our platform, we will be mostly working at the application and algorithm layers. Let’s take a quick look at the layers of the stack:

 

  1. Platform layer: This is the lowest level of the stack and is the one on which the remaining stack layers are built. This layer includes platform definitions of the hardware and the software environment. Should we choose not to use a predefined platform, we can generate a custom platform using Vivado.

 

  2. Algorithm layer: Here we create our application using SDSoC and the platform definition for the target hardware. It is within this layer that we can use the acceleration-ready OpenCV functions along with predefined and optimized implementations for Convolutional Neural Network (CNN) developments such as inference accelerators within the PL.

 

  3. Application Development Layer: The highest layer of the stack. Development here is where high-level frameworks such as Caffe and OpenVX are used to complete the application.

 

As I mentioned above, one of the most exciting aspects of the reVISION stack is the ability to accelerate a wide range of OpenCV functions using the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL. We can group the OpenCV functions that can be hardware-accelerated in the PL into four categories:

 

  1. Computation – Includes functions such as absolute difference between two frames, pixel-wise operations (addition, subtraction and multiplication), gradient, and integral operations
  2. Input Processing – Supports bit-depth conversions, channel operations, histogram equalization, remapping, and resizing.
  3. Filtering – Supports a wide range of filters including Sobel, Custom Convolution, and Gaussian filters.
  4. Other – Provides a wide range of functions including Canny/Fast/Harris edge detection, thresholding, SVM, HoG, LK Optical Flow, Histogram Computation, etc.

 

What is very interesting with these function calls is that we can optimize them for resource usage or performance within the PL. The main optimization method is specifying the number of pixels to be processed during each clock cycle. For most accelerated functions, we can choose to process either one or eight pixels per clock. Processing more pixels per clock cycle reduces latency but increases resource utilization, while processing one pixel per clock minimizes the resource requirements at the cost of increased latency. We control the number of pixels processed per clock via the function call.
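As a rough illustration of that trade-off (generic arithmetic, not a reVISION benchmark): the time to process one frame is approximately (rows × cols) / (pixels per clock × f_clk), so a 1920x1080 frame with a 150MHz PL clock takes roughly 13.8msec at one pixel per clock and roughly 1.7msec at eight pixels per clock, with the eight-pixel datapath costing correspondingly more PL resources.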

 

Over the next few blogs, we will look more at the reVISION stack and how we can use it. However, in the best Blue Peter tradition, the image below shows the result of running a reVISION-accelerated Harris OpenCV function within the PL.

 

 


 

 

Accelerated Harris Corner Detection in the PL

 

 

 

 

Code is available on GitHub as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 


 

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

 


 

Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge

by Xilinx Employee, 03-13-2017

 

Today, Xilinx announced a comprehensive suite of industry-standard resources for developing advanced embedded-vision systems based on machine learning and machine inference. It’s called the reVISION stack and it allows design teams without deep hardware expertise to use a software-defined development flow to combine efficient machine-learning and computer-vision algorithms with Xilinx All Programmable devices to create highly responsive systems. (Details here.)

 

The Xilinx reVISION stack includes a broad range of development resources for platform, algorithm, and application development including support for the most popular neural networks: AlexNet, GoogLeNet, SqueezeNet, SSD, and FCN. Additionally, the stack provides library elements such as pre-defined and optimized implementations for CNN network layers, which are required to build custom neural networks (DNNs and CNNs). The machine-learning elements are complemented by a broad set of acceleration-ready OpenCV functions for computer-vision processing.

 

For application-level development, Xilinx supports industry-standard frameworks including Caffe for machine learning and OpenVX for computer vision. The reVISION stack also includes development platforms from Xilinx and third parties, which support various sensor types.

 

The reVISION development flow starts with a familiar, Eclipse-based development environment; the C, C++, and/or OpenCL programming languages; and associated compilers all incorporated into the Xilinx SDSoC development environment. You can now target reVISION hardware platforms within the SDSoC environment, drawing from a pool of acceleration-ready, computer-vision libraries to quickly build your application. Soon, you’ll also be able to use the Khronos Group’s OpenVX framework as well.

 

For machine learning, you can use popular frameworks including Caffe to train neural networks. Within one Xilinx Zynq SoC or Zynq UltraScale+ MPSoC, you can use Caffe-generated .prototxt files to configure a software scheduler running on one of the device’s ARM processors to drive CNN inference accelerators—pre-optimized for and instantiated in programmable logic. For computer vision and other algorithms, you can profile your code, identify bottlenecks, and then designate specific functions that need to be hardware-accelerated. The Xilinx system-optimizing compiler then creates an accelerated implementation of your code, automatically including the required processor/accelerator interfaces (data movers) and software drivers.

 

The Xilinx reVISION stack is the latest in an evolutionary line of development tools for creating embedded-vision systems. Xilinx All Programmable devices have long been used to develop such vision-based systems because these devices can interface to any image sensor and connect to any network—which Xilinx calls any-to-any connectivity—and they provide the large amounts of high-performance processing horsepower that vision systems require.

 

Initially, embedded-vision developers used the existing Xilinx Verilog and VHDL tools to develop these systems. Xilinx introduced the SDSoC development environment for HLL-based design two years ago and, since then, SDSoC has dramatically and successfully shortened development cycles for thousands of design teams. Xilinx’s new reVISION stack now enables an even broader set of software and systems engineers to develop intelligent, highly responsive embedded-vision systems faster and more easily using Xilinx All Programmable devices.

 

And what about the performance of the resulting embedded-vision systems? How do their performance metrics compare against systems based on embedded GPUs or the typical SoCs used in these applications? Xilinx-based systems significantly outperform the best of this group, which employ Nvidia devices. Benchmarks of the reVISION flow using Zynq SoC targets against the Nvidia Tegra X1 have shown as much as:

 

  • 6x better images/sec/watt in machine learning
  • 42x higher frames/sec/watt for computer-vision processing
  • 1/5th the latency, which is critical for real-time applications

 


 

There is huge value to having a very rapid and deterministic system-response time and, for many systems, the faster response time of a design that's been accelerated using programmable logic can mean the difference between success and catastrophic failure. For example, the figure below shows the difference in response time between a car’s vision-guided braking system created with the Xilinx reVISION stack running on a Zynq UltraScale+ MPSoC and a similar system based on an Nvidia Tegra device. At 65mph, the Xilinx embedded-vision system’s faster response stops the vehicle 5 to 33 feet sooner, depending on how the Nvidia-based system is implemented. Five to 33 feet could easily mean the difference between a safe stop and a collision.
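As a rough sanity check of that figure (my arithmetic): 65mph is about 95.3 feet per second, so a 5- to 33-foot difference in stopping point corresponds to a response-time advantage of roughly 50 to 350msec.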

 

 


 

(Note: This example appears in the new Xilinx reVISION backgrounder.)

 

 

The last two years have generated more machine-learning technology than all of the advancements over the previous 45 years and that pace isn't slowing down. Many new types of neural networks for vision-guided systems have emerged along with new techniques that make deployment of these neural networks much more efficient. No matter what you develop today or implement tomorrow, the hardware and I/O reconfigurability and software programmability of Xilinx All Programmable devices can “future-proof” your designs whether it’s to permit the implementation of new algorithms in existing hardware; to interface to new, improved sensing technology; or to add an all-new sensor type (like LIDAR or Time-of-Flight sensors, for example) to improve a vision-based system’s safety and reliability through advanced sensor fusion.

 

Xilinx is pushing even further into vision-guided, machine-learning applications with the new Xilinx reVISION Stack and this announcement complements the recently announced Reconfigurable Acceleration Stack for cloud-based systems. (See “Xilinx Reconfigurable Acceleration Stack speeds programming of machine learning, data analytics, video-streaming apps.”) Together, these new development resources significantly broaden your ability to deploy machine-learning applications using Xilinx technology—from inside the cloud to the very edge.

 

 

You might also want to read “Xilinx AI Engine Steers New Course” by Junko Yoshida on the EETimes.com site.

 

 

About the Author
Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.