The Xcell Daily blog will be archived on June 28th. Thank you Steve Leibson for your informative and entertaining posts over the past several years. Our community is grateful! To keep up-to-date on everything Xilinx, please subscribe to our two new blogs written by employees, partners and industry experts:
Xilinx Xclusive – Your source for Xilinx announcements, customer success stories, industry trends and more.
Adaptive Advantage – Xilinx products, tools, IP, boards, and solution news directly from our subject matter experts to you.
To receive e-mail notifications when new posts are made, go to the blog page, select the “Blog Options” menu and then click “Subscribe”. Registration for the Xilinx community forums is required.
Thanks for your continued support. We hope you join the conversation.
Last August, I wrote a blog titled “The $10 Xilinx Box, the FPGA Board, and the Man from HP Labs: An Xcell Daily triple mystery story” about a 20-year-old, Xilinx-branded box containing a Xilinx demo board with two early-generation Xilinx FPGAs (an XC3020 and an XC4003E) that I’d found at HSC Electronics, a Silicon Valley surplus electronics store. According to the Xilinx shipping invoice in the box, it had belonged to Dave Moberly, who had worked for HP Labs at the time and had been instrumental in getting FPGA technology introduced into HP instrumentation, starting with the HP 8145A Optical TDR. Sadly, David Moberly passed away on March 9, 2017, so I was not able to interview him.
However, early this month, an Xcell Daily reader using the handle “tomshoup” left a comment on that blog post that continues Moberly’s story. I found the comment so compelling that I’m promoting it to a full blog post below:
I was David Moberly's next-to-last manager at Agilent and had hired him into his last job at HP/Agilent from elsewhere in HP, around 1999. HP Labs at 1501 Page Mill Road was incubating a new medical business that was far different from the existing capital-equipment medical business. We were designing patient monitoring equipment to be used to monitor patients with congestive heart failure at home, to be offered as a service to Medicare HMOs. We built a simple set of instruments to measure weight (yes, we built a bathroom scale, but it was an HP bathroom scale), blood pressure and heart rate, and a single-lead ECG rhythm strip. We introduced this product to the market in 1999 at the Heart Failure Society of America's annual meeting in San Francisco that year. After about 18 months we had ~5,000 patients under monitoring for a marquee customer. Philips, which bought the medical business from Agilent in 2001, still offers a current generation of this equipment and the associated service.
David was one of several people who basically came up to me and said "I've always wanted to work on medical equipment but didn't want to move to Boston (location of HP's medical business), so here I am." He was one of the most inquisitive people I've ever met, irritatingly so sometimes, but always in an endearing way.
I left Agilent in 2002 when they sold the business to Philips but kept bumping into David and learned he had a love affair with FPGAs. One product category he was over the moon about was mixed-signal acquisition systems: basically an FPGA providing 2 to 4 channels of analog 'scope function, 16 channels of logic analyzer input, some power supply output, spectrum analysis, arbitrary waveform generation, and other features, all using a laptop as the user interface. Somehow David became the go-to guy to test drive these because of his blogging about them, so manufacturers of such systems would send him new devices to play with and, hopefully, write up in his blogs.
David and I ended up working together again early in 2013. I was a contractor at a medical-device company and the company needed help writing software test plans and protocols. I introduced them to David and they approved me hiring him as a subcontractor. We worked together for about another year. David was retired by then and wasn't really looking for work, but he was intrigued by the product and the chance to work again. He told me afterwards that he paid for his daughter's wedding with that gig. He also gave me a wonderful thank-you gift: a brand new Analog Discovery by Digilent, one of those [Spartan-6] FPGA-based mixed-signal acquisition systems. I've used it a lot in my current consulting work and think of David every time I use it. David's fondness for these instruments rubbed off on me and we exchanged a round of e-mail as I lusted after the Picoscope 3000, the Cadillac in David's words; I still lust after it but haven't taken the leap. I know if I bought it David would be proud, and if he were still alive he would probably want to borrow it to take it apart.
Through mutual friends I knew David had had a bout with cancer. We met for coffee after he was in remission and true to form he told me in detail about his treatment, with his usual level of fascination at the technology, almost to the molecular level.
After David died, I knew his wife Cheryl was faced with the too-common task of being the widow of a pack-rat engineer who had excess storage capacity: think containers. So I volunteered to help Cheryl sort through the Moberly archives and introduced her to HSC Electronics as a way to recycle some of what David had collected. Having known David, and ending up sorting through his stash with Cheryl, I could easily imagine the glee he must have felt when he acquired all those goodies we sorted through.
David had a wonderful career: MIT education, Apple, Trimble, HP, HP Labs, Agilent, Philips. A small team of engineers, with a couple of Davids in it, could probably build anything and run it with an FPGA and a couple of AA batteries.
My thanks to “tomshoup” for sharing this story with Xcell Daily.
Meanwhile, there seems to be a new pledge level that I don’t recall: a $179 level that includes a 1080p video camera. That’s in addition to the touch screen and voice input, which gives the Mycroft Mark II an even more interesting user interface. There are only a limited number of $179 pledge options, with 177 remaining as of the posting of this blog.
Aldec recently published a detailed example design for a high-performance, re-programmable network router/switch based on the company’s TySOM-2A-7Z030 embedded development board and an FMC-NET daughter card. The TySOM-2A-7Z030 incorporates a Xilinx Zynq Z-7030 SoC. In this design, the Zynq SoC’s dual-core Arm Cortex-A9 MPCore processor runs OpenWrt, a Linux distribution for embedded devices that’s a favorite among network-switch developers. The design employs the programmable logic (PL) in the Zynq SoC to create four 1G/2.5G Ethernet MACs that connect to the FMC-NET card’s four Ethernet PHYs and a 10G Ethernet subsystem that connects to the FMC-NET card’s QSFP+ card cage.
Bittware has already announced two PCIe boards for these HBM-enhanced Xilinx FPGAs:
The XUPVVH: a double-slot board that accommodates HBM-enhanced Virtex UltraScale+ VU35P or VU37P FPGAs (each with 8Gbytes of HBM DRAM) with four QSFP28 optical cages and two DIMM slots that accommodate as much as 256Gbytes of DDR4 SDRAM (128Gbytes/slot).
We recently looked at how we could use the Zynq SoC’s XADC streaming output with DMA. In that example, I demonstrated outputting only one XADC channel over an AXI stream. However, it is important to understand how to transfer multiple channels within an AXI stream to processor memory, whether we are using the XADC as the source or not.
To demonstrate this, I will be updating the XADC design that we used for the previous streaming example. Upgrading the software is simple. All we need to do is enable another XADC channel when we configure the sequencer and update the API we use. Updating the hardware is a little more complicated.
To upgrade the Vivado hardware design, the first thing we need to do is replace the DMA IP module with the multi-channel DMA (MCDMA) IP core. The MCDMA IP core supports as many as 16 different input channels. DMA channels are mapped to AXI Stream contents using the TDest bus, which is part of the AXIS standard.
As with the previous XADC streaming example, we’ll configure the MCDMA for uni-directional operation (write only) and support for two channels:
Configuration of the MCDMA IP core.
Vivado design with the MCDMA IP Core (Tcl BD available on GitHub)
TDest is the AXI signal used for routing AXI Stream contents. In addition, when we configure the XADC for AXI Streaming, the different XADC channels output on the stream are identified by the TId bus.
To be able to use the MCDMA in conjunction with the XADC, we need to remap the XADC TId channel to the MCDMA TDest channel. We also need to packetize data by asserting TLast on the MCDMA AXIS input.
In the previous example, we used a custom IP core to generate the TLast signal. A better solution, however, is to remap the TId and generate the TLast signal using the AXIS subset converter.
AXIS Subset Converter Configuration
The eagle-eyed will at this point notice that the XADC uses channel numbers up to 31, with the auxiliary inputs using channel IDs 16 to 31, which are outside the channel range of the MCDMA. If we are using the auxiliary inputs, we can also use the AXIS subset converter to remap these higher channel numbers into the MCDMA range by mapping the lower four bits of the XADC TId onto the MCDMA TDest channel. When using this method, the lower XADC channels cannot be used; otherwise there would be a conflict.
Output of the XADC with multiple channels (Temperature & VPVN Channels)
Output of the AXIS subset block following remapping and TLast Generation
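The TId-to-TDest remapping described above amounts to keeping only the lower four bits of the XADC channel ID. A minimal sketch in Python (the helper name is my own; the channel numbers follow the XADC convention described above):

```python
def remap_tid_to_tdest(tid: int) -> int:
    """Map a 5-bit XADC channel ID (TId) onto a 4-bit MCDMA TDest
    by keeping only the lower four bits, as the AXIS subset
    converter is configured to do here."""
    return tid & 0xF

# Auxiliary inputs use channel IDs 16 to 31, outside the MCDMA's
# 16-channel range, so they wrap onto TDest values 0 to 15.
print(remap_tid_to_tdest(16))  # aux channel 16 -> TDest 0
print(remap_tid_to_tdest(31))  # aux channel 31 -> TDest 15

# This is why the lower XADC channels cannot be used at the same
# time: channel 0 and auxiliary channel 16 collide on TDest 0.
assert remap_tid_to_tdest(0) == remap_tid_to_tdest(16)
```

This also makes the conflict noted above concrete: any two channel IDs that differ only in their upper bit land on the same MCDMA channel.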
When it comes to updating the software application, we need to use the XMCDMA.h header APIs to configure the MCDMA and set up the buffer descriptors for each of the channels. The software performs the following steps:
Allocate memory areas for the receive buffer and the buffer descriptors.
For each channel, create the buffer descriptors.
Populate the buffer descriptors with the receive buffer address.
Reset the receive buffer memory contents to zero.
Invalidate the cache covering the receive buffer so that the processor reads the values the DMA writes to DDR memory.
Commit the channels to the hardware.
Start the MCDMA transfers for each channel.
The software application defines several buffer descriptors for each channel. For this example, I have used a single receive buffer, so the received data for both channels shares the same address space. This can be seen below: halfwords starting with 0x4 relate to the VPVN input, while the device-temperature halfwords start with 0x9.
Memory Contents showing the two channels
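To sanity-check values like those in the capture above, the raw 16-bit XADC samples can be converted to physical units. The 12-bit conversion result sits in the upper bits of each halfword, and the transfer functions below follow the Xilinx XADC user guide (UG480); the sample values are illustrative, not taken from the capture:

```python
def xadc_code(halfword: int) -> int:
    """Extract the 12-bit conversion result from a left-justified
    16-bit XADC sample."""
    return (halfword >> 4) & 0xFFF

def vpvn_volts(halfword: int) -> float:
    """Unipolar VP/VN input: full scale is 1.0V over 12 bits (UG480)."""
    return xadc_code(halfword) / 4096.0

def temperature_c(halfword: int) -> float:
    """On-chip temperature sensor transfer function (UG480)."""
    return xadc_code(halfword) * 503.975 / 4096.0 - 273.15

# Illustrative samples: a VPVN halfword starting with 0x4 and a
# temperature halfword starting with 0x9, as in the capture above.
print(round(vpvn_volts(0x4000), 3))     # 0.25 (volts)
print(round(temperature_c(0x9900), 1))  # ~28 degrees C, a plausible die temperature
```

The fact that halfwords beginning 0x9 decode to a sensible die temperature is a quick confirmation that the channel routing is working as intended.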
It is a simple adaptation of the existing software to use multiple receive buffers in memory. For many applications, separate receive buffers are more useful.
Being able to move AXI data streams to memory-mapped locations is a vital requirement for many applications—for example, signal processing, communication, and sensor interfacing. The AXIS subset converter allows us to correctly remap and format the stream data into a format that the MCDMA IP core accepts.
These are very early days for autonomous vehicles, and if you’re designing control systems for such machines, you’d better be thinking of using technologies that bring adaptable intelligence to the party. (Would you want to ride in a self-driving car that’s not adaptable?)
There is no PCIe Gen5—yet—but there’s a 32Gbps/lane future out there and TE Connectivity demonstrated that future at this week’s DesignCon 2018. The demo’s real purpose was to show the capabilities of TE Connectivity’s Sliver connector system, which includes card-edge and cabled connectors. In the demo at DesignCon, four channels carry 32Gbps data streams through surface-mount and right-angle connectors to create a mockup of a future removable-storage device. Those 32Gbps data streams are generated, transmitted, and received by bulletproof Xilinx UltraScale+ GTY transceivers operating reliably at PCIe Gen5’s anticipated 32Gbps/lane data rate despite 35dB of loss through the demo system.
In the following short video, Samtec’s Ralph Page describes the demo and mentions the nice eyes and clear data levels, as seen on the Xilinx demo software screen positioned above the demo boards. He also mentions the BER: 5.29×10⁻⁸. That’s the error rate before adding the error-reducing capabilities of an FEC, which can drop the error rate by perhaps another ten orders of magnitude or more.
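To put those figures in perspective, here is a back-of-the-envelope calculation (my own arithmetic, not from the demo) of how often bit errors occur at that BER and lane rate, before and after a hypothetical FEC that improves the error rate by ten orders of magnitude:

```python
lane_rate_bps = 32e9   # 32Gbps per lane
ber_raw = 5.29e-8      # measured BER quoted in the demo
fec_gain = 1e10        # assumed ten-orders-of-magnitude FEC improvement

# Before FEC: errors arrive steadily.
errors_per_sec_raw = lane_rate_bps * ber_raw
print(f"{errors_per_sec_raw:.0f} errors/sec before FEC")  # ~1693

# After the assumed FEC gain: errors become vanishingly rare.
ber_fec = ber_raw / fec_gain
seconds_per_error = 1 / (lane_rate_bps * ber_fec)
print(f"one error every {seconds_per_error / 86400:.0f} days after FEC")  # ~68 days
```

In other words, the raw link produces a couple of thousand errors per second, while the corrected link would run for a couple of months between errors—which is why FEC is assumed for links at these rates.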
Samtec’s demo points to a foreseeable future where you will be able to develop large backplanes with screamingly fast performance using PAM4 SerDes transceivers.
Today, Digilent announced a $299 bundle including its Zybo Z7-20 dev board (based on a Xilinx Zynq Z-7020 SoC), a Pcam 5C 5Mpixel (1080P) color video camera, and a Xilinx SDSoC development environment voucher. (That’s the same price as a Zybo Z7-20 dev board without the camera.) The Zybo Z7 dev board includes a new 15-pin FFC connector that allows the board to interface with the Pcam 5C camera over 2-lane MIPI CSI-2 and I2C interfaces. (This connector is pin-compatible with the Raspberry Pi’s FFC camera port.) The Pcam 5C camera is based on the Omnivision OV5640 image sensor.
Digilent has created the Pcam 5C + Zybo Z7 demo project to get you started. The demo accepts video from the Pcam 5C camera and passes it out to a display via the Zybo Z7’s HDMI port. All IP used in the demo, including a D-PHY receiver, CSI-2 decoder, Bayer-to-RGB converter, and gamma correction, is free and open-source, so you can study exactly how the D-PHY and CSI-2 decoding works and then develop your own embedded-vision products.
If you want this deal, you’d better hurry. The offer expires February 23—three weeks from today.
Rigol’s new RSA5000 real-time spectrum analyzer allows you to capture, identify, isolate, and analyze complex RF signals with a 40MHz real-time bandwidth over either a 3.2GHz or 6.5GHz signal span. It’s designed for engineers working on RF designs in the IoT and IIoT markets as well as industrial, scientific, and medical equipment. Rigol was demonstrating the RSA5000 real-time spectrum analyzer at this week’s DesignCon being held at the Santa Clara Convention Center. I listened to a presentation from Rigol’s North American General Manager Mike Rizzo and then a demo by Rigol’s Director of Product Marketing & Software Applications Chris Armstrong, both captured in the 2.5-minute video below.
Rigol RSA5000 Real-Time Spectrum Analyzer
Based on what I saw in the demo, this is an extremely responsive instrument—far more responsive than a swept spectrum analyzer—with several visualization display modes to help you isolate the significant signal in a sea of signals and noise, in real time. It’s capable of continuously executing 146,484 FFTs/sec, which results in a minimum 100% POI (probability of intercept) of 7.45μsec. You need some real DSP horsepower to achieve that sort of performance and the Rigol RSA5000 real-time spectrum analyzer gets this performance from a pair of Xilinx Zynq Z-7015 SoCs. (You'll find many more details about real-time spectrum analysis and the RSA5000 Real-Time Spectrum Analyzer in the Rigol app note "Realtime Spectrum Analyzer vs Spectrum Analyzer," attached at the end of this post. See below.)
Here’s the short presentation and demo of the Rigol RSA5000 real-time spectrum analyzer from DesignCon 2018:
Mike Rizzo told me that the Rigol design engineers selected the Zynq Z-7015 SoCs for three main reasons:
High-bandwidth access between the Zynq SoC’s PS (processing system) and PL (programmable logic)
Excellent development tools including Xilinx’s Vivado HLS
If you’re looking for a very capable spectrum analyzer, give the Rigol RSA5000 a look. If you’re designing your own real-time system and need high-speed computation coupled with fast user response, take a look at the line of Xilinx Zynq SoCs and Zynq UltraScale+ MPSoCs.
The 2-minute video below shows you an operational Xilinx Virtex UltraScale+ XCVU37P FPGA, which is enhanced with co-packaged HBM (high-bandwidth memory) DRAM using Xilinx’s well-proven, 3rd-generation 3D manufacturing process. (Xilinx started shipping 3D FPGAs way back in 2011, starting with the Virtex-7 2000T, and we’ve been shipping these types of devices ever since.)
This video was made on the very first day of silicon bringup for the device and it is already operating at full speed (460Gbytes/sec), error-free, over 32 channels. The Virtex UltraScale+ XCVU37P is one big All Programmable device with:
2852K System Logic Cells
70.9Mbits of BRAM
270Mbits of UltraRAM
9024 DSP48E2 slices
8Gbytes of integrated HBM DRAM
96 32.75Gbps GTY SerDes transceivers
Whatever your requirements, whatever your application, chances are this extremely powerful FPGA will deliver all of the heavy lifting (processing, memory, and I/O) that you need.
The Avnet MiniZed is an incredibly low-cost dev board based on the Xilinx Zynq Z-7007S SoC with WiFi and Bluetooth built in. It currently lists for $89 on the Avnet site. If you’d like a fast start with this dev board, Avnet is ready to help: it has placed four MiniZed Speedway Design Workshops online so that you can learn at your own convenience and your own pace. The four workshops are:
In the Developing Zynq Hardware Speedway, you will be introduced to the single Arm Cortex-A9 processor core as you explore its robust AXI peripheral set. Along the way, you will use the Xilinx embedded-systems tool set to design a Zynq AP SoC system, add Xilinx IP as well as custom IP, run software applications to test the IP, and finally debug your embedded system.
From within an Ubuntu OS running within a virtual machine, learn how to install PetaLinux 2017.1 and build embedded Linux targeting MiniZed. In the hands-on labs learn about Yocto and PetaLinux tools to import your own FPGA hardware design, integrate user space applications, and configure/customize PetaLinux.
Using proven flows for SDSoC, the student will learn how to navigate SDSoC. Through hands-on labs, we will create a design for a provided platform and then also create a platform for the Avnet MiniZed. You will see how to accelerate an algorithm in the course lab.
Quite simply, Vadatech’s AMC584 module is an I/O monster. Its immense I/O capabilities start with the five QSFP28 100GbE-capable cages on the module’s front panel. Then there are the AMC Tongues. AMC Tongue 1 is fully routed with SerDes ports and there are as many as 20 lanes routed to Tongue 2. The AMC584 also contains a high-speed Zone 3 connector that provides the primary digital I/O routing and enables multi-module configurations.
The SerDes ports on these boards are all implemented in a Xilinx Virtex UltraScale+ XCVU13P FPGA, which is itself an I/O monster. It has 128 on-chip 32.75Gbps GTY SerDes transceivers, so it makes an ideal foundation for an I/O monster board.
Here’s a block diagram of the Vadatech AMC584 module:
Vadatech AMC584 Module Block Diagram
Now, before you get the idea that the Virtex UltraScale+ XCVU13P FPGA is just I/O, please understand that there are also 3780K system logic cells, 12,288 DSP48E2 slices, 94.5Mbits of BRAM, and 360Mbits of UltraRAM on the device as well, so it’s a DSP monster and a processing monster too. The Virtex UltraScale+ XCVU13P FPGA is capable of implementing just about any system you might imagine.
And just in case the hundreds of Mbits of SRAM on the Virtex UltraScale+ XCVU13P FPGA aren’t sufficient for your processing needs, the AMC584 module also has two banks of DDR4 SDRAM on board.
Vadatech AMC584 Module
Please contact Vadatech directly for more information about the AMC584 Module.
Here are two reasons you might want to participate in this Kickstarter campaign:
The Mycroft Mark II is a hands-free, privacy-oriented, open-source smart speaker with a touch screen. It has advanced far-field voice recognition and multiple wake words for voice-based cloud services such as Amazon’s Alexa and Google Home, courtesy of Aaware’s technology. (See “Looking to turbocharge Amazon’s Alexa or Google Home? Aaware’s Zynq-based kit is the tool you need.”) The finished smart speaker requires a pledge of $129 (or $299 for three of them) but the dev kit version of the Mycroft Mark II requires a pledge of only $99, which is cheap as dev kits go. (Note: there are only 88 of these kits left, as of this writing.)
You could look at the Mycroft Mark II as a general-purpose, $99 Zynq UltraScale+ MPSoC open-source dev kit with a touch screen that’s also been enabled for voice control, which you can use as a platform for a variety of IIoT, cloud computing, or embedded projects. That in itself is a very attractive offer. As the Mycroft Mark II Kickstarter project page says: “The Mark II has special features that make hacking and customizing easy, not to mention thorough documentation and a community to lean on when building. Support for our community is central to the Mycroft mission.” That’s a lot for a sub-$100 dev kit, don’t you think?
If you’d like some intense training on the Xilinx Zynq UltraScale+ MPSoC—one of the most powerful embedded application processor (plus programmable logic) families that you can throw at an embedded-processing application—then Hardent’s 3-day class titled “Embedded System Design for the Zynq UltraScale+ MPSoC” might just be what you’re looking for. There’s a live, E-Learning version kicking off February 7 with live, in-person classes scheduled for North America from February 21 (in Ottawa) through August. The schedule’s on the referenced Web page.
You certainly might want a comprehensive course outline before you decide, so here it is:
Zynq UltraScale+ MPSoC Overview – Overview of the Zynq UltraScale+ MPSoC All Programmable device.
Application Processing Unit – Introduction to the members of the APU (based on 64-bit Arm Cortex-A53 processors) and how to configure and manage the APU cluster.
Real-Time Processing Unit – Introduction to the various elements within the RPU including the dual-core Arm Cortex-R5 processor and different modes of configuration.
QEMU – Introduction to the Quick Emulator: an emulation tool for the Zynq UltraScale+ MPSoC device that lets you run software whenever and wherever, without the actual hardware.
Platform Management Unit – Tools and techniques for debugging your Zynq UltraScale+ MPSoC design.
Booting – Learn how to implement an embedded system including the boot process and boot-image creation.
AXI – Discover how the Zynq UltraScale+ MPSoC’s PS (processing system) and PL (programmable logic) connect to permit designers to create very high performance embedded systems with hardware-speed processing where needed.
Clocks and Resets – Overview of the Zynq UltraScale+ MPSoC’s clocking and reset functions, focusing more on capabilities than specific implementations.
DDR SDRAM and QoS – Learn how to configure the system’s DDR SDRAM to maximize system performance.
System Protection – Covers all the hardware elements that support the separation of software domains within the Zynq UltraScale+ MPSoC’s PS.
Security and Software – Shows you how to use the safety and security features of the Zynq UltraScale+ MPSoC in the context of embedded system design and introduces several standards.
ARM TrustZone Technology – Presents the use of the Arm TrustZone technology.
Linux – Discussion and examples showing you how to configure Linux to manage multiple processors.
Yocto – Compares kernel-building methods between a "pure" Yocto build and the Xilinx PetaLinux build (which uses Yocto "under-the-hood").
OpenAMP – Introduction to the concept of the Multicore Association’s OpenAMP framework for asymmetric multiprocessing on heterogeneous processor architectures like the Zynq UltraScale+ MPSoC.
Hardware/Software Virtualization – Covers the hardware and software elements of virtualization. A lab shows you how to use hypervisors.
Xen Hypervisor – Starts with a description of generic hypervisors and then discusses the details of implementing a hypervisor based on Xen.
Ecosystem Support – Overview of the Zynq UltraScale+ MPSoC’s supported operating systems, software stacks, hypervisors, etc.
FreeRTOS – Overview of FreeRTOS with examples of how to use it.
Software Stack – Introduces the concept of a software stack and discusses the many available stacks for the Zynq UltraScale+ MPSoC.
Curtiss-Wright’s VPX3-535 3U OpenVPX transceiver module implements a single-slot, dual-channel, 6Gsamples/sec analog data-acquisition and processing system using two 12-bit, 6Gsamples/sec ADCs and two 12-bit, 6Gsamples/sec DACs. This is the type of capability you need for demanding applications such as radar, Signal Intelligence (SIGINT), Electronic Warfare (EW), and Software Defined Radio (SDR). This amount of analog-to-digital and digital-to-analog conversion capability demands wicked-fast digital processing and on the VPX3-535 transceiver module, that digital processing comes in the form of two of Xilinx’s most powerful All Programmable devices: a Virtex UltraScale+ VU9P and a Zynq UltraScale+ ZU4 MPSoC.
Here’s a block diagram of the Curtiss-Wright VPX3-535 module:
The VPX3-535 is Curtiss-Wright’s first publicly announced module to feature full compliance to the VITA 48.8 Air-Flow-Through (AFT) cooling standard, which ensures optimal performance in the harshest conditions. VITA 48.8 provides a low-cost, effective means to cool high-power COTS 3U and 6U VPX modules that dissipate ~150W+.
At the same time, Curtiss-Wright is also introducing a conduction-cooled variant, called the VPX3-534, which is designed for applications that do not require the performance of the VPX3-535. The VPX3-534 supports the same dual-channel, 12-bit, 6Gsamples/sec ADC and DAC channels as the VPX3-535 but it replaces the Virtex UltraScale+ FPGA with a Xilinx Kintex UltraScale KU115 FPGA. This module also supports an option for four 3Gsamples/sec ADC channels.
Please contact Curtiss-Wright directly for more information about the VPX3-535 and VPX3-534 OpenVPX transceiver modules.
Keysight published a 14-minute video back in 2015 that gives you the basics behind RF beamforming and its use in 5G applications. The video also invites you to download a free, 30-day trial of Keysight’s SystemVue with Keysight’s 5G simulation library to try out some of the concepts discussed in the video, and the download link still appears to be active.
Here’s the video:
Meanwhile, should you need an implementation technology for RF beamforming (5G or otherwise), allow me to suggest that the new Xilinx Zynq UltraScale+ RFSoC with its many integrated RF ADCs and DACs be at the top of your technology choices. There is literally no other device like the Zynq UltraScale+ RFSoC. It’s in a category of one.
For more information about the Zynq UltraScale+ RFSoC, see:
The MV1-D1280-L01-3D05-1280-G2, based on a LUXIMA LUX1310 image sensor, has a triangulation rate of 948fps for a 1280x1024-pixel image, but narrow its programmable region of interest to 768x16 pixels and the triangulation rate jumps to a blindingly fast 68,800fps.
Photonfocus’ MV1-D1280-L01-3D05-1280-G2 high-speed 3D camera operates at triangulation rates as fast as 68,800fps
As you might expect, the high-speed interface and processing requirements for the sensors in these two 3D-imaging cameras differ significantly, which is why both of these cameras, like other cameras in Photonfocus’ MV1 product line, are based on a Xilinx Spartan-6 LX75T FPGA. As discussed in the prior blog post last September, use of the Spartan-6 FPGA permits Photonfocus to use an extremely flexible and programmable, real-time, vision-processing platform that serves as a foundation for many different types of cameras with very different imaging sensors and very different sensor interfaces—all operating at high speed.
The Zynq UltraScale+ MPSoC is a complex system-on-chip containing as many as four Arm Cortex-A53 application processors, a dual-core Arm Cortex-R5 real-time processor, a Mali GPU, and, of course, programmable logic. When it comes to creating our software application, we want to use the A53-based Application Processing Unit (APU) and the R5-based Real-Time Processing Unit (RPU) appropriately. This means using the APU for computationally intensive, high-level applications or virtualization while using the RPU for real-time control and monitoring.
This means the APU will likely run an operating system such as Linux while the real-time needs are addressed by the RPU using bare-metal software or a simplified OS such as FreeRTOS. An overall system solution often requires communication between the APU and RPU, but communication between different processors running different applications has previously been challenging and ad hoc, with inter-processor communication (IPC) implemented over shared memory, mailboxes, or even networks. As a result, IPC solutions differed from implementation to implementation and device to device, which increased development time and hence time to market.
This is inefficient engineering.
To best leverage the capabilities of the Zynq UltraScale+ MPSoC, we need an open framework that abstracts device-specific interfaces and enables the implementation of AMP (asymmetric multi-processing) with greater ease across multiple projects.
OpenAMP, developed by the Multicore Association, provides everything we need to run different operating systems on the APU and RPU. Of course, for OpenAMP to function from processor to processor, we need an abstraction layer that provides device-specific interfaces (e.g., interrupt handlers, memory requests, and device access). The libmetal library provides these for Xilinx devices through several APIs that abstract the processor.
For our Zynq UltraScale+ MPSoC designs, the provided OpenAMP frameworks enable messaging between the master processor and remote processor and lifecycle management of the remote processor using the following structures:
Remoteproc – enables lifecycle management of the remote processor. This includes downloading the application to the remote processor, stopping and starting the remote processor, and system resource allocation as required.
RPMsg - supports IPC between different processors in the system.
OpenAMP remoteproc and RPMsg concepts
For this example, we are going to run Linux on the APU and a bare-metal application on the RPU using RPMsg within the kernel space. When we run the RPMsg from within the kernel space, the remote application lifecycle must be managed by Linux. This means the remote processor application does not run independently. If we develop the RPMsg application to run within the Linux user space, the remote processor can run independently.
To create this example, we first need to enable remote-processor support within our Linux build. This requires rebuilding the PetaLinux project, customizing the kernel and root file system. If you are not familiar with building PetaLinux, you might want to read this blog.
Within our PetaLinux project, the first thing we need to do is enable the remoteproc driver. In a terminal within the PetaLinux project, issue the command:
petalinux-config -c kernel
This will open the kernel configuration menu. Here we can enable the remote-processor drivers which are located under:
Device Drivers -> Remoteproc drivers
Enabling the Remoteproc drivers
The second step is to include the OpenAMP examples within the file system. Again inside the project, issue the command:
petalinux-config -c rootfs
Within the configuration menu, navigate to Filesystem Packages -> misc and enable the packagegroup-petalinux-openamp:
Enabling the package group
The final step before we can rebuild the PetaLinux image is to update the device tree. We can find an OpenAMP template dtsi file at the location:
Once the kernel, filesystem, and device tree have been updated, rebuild the PetaLinux image using the command below:
petalinux-build
This will generate an updated Linux build that we can copy onto the boot medium of choice and run on our Zynq UltraScale+ MPSoC design.
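As a sketch of that final copy step, assuming an SD card as the boot medium (the `/media/sd-boot` mount point is a hypothetical example, and your boot-image names may differ depending on packaging options):

```shell
# Build products land in images/linux inside the PetaLinux project.
# Copy the boot image and kernel image to the SD card's boot partition
# (the /media/sd-boot mount point is a hypothetical example).
cp images/linux/BOOT.BIN images/linux/image.ub /media/sd-boot/
sync   # make sure the files are flushed before removing the card
```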
Using a terminal connected to our preferred development board (in my case the UltraZed), we can test the OpenAMP examples we included within the Linux file system. There are three examples provided: echo test, matrix multiplication, and proxy server.
I ran the matrix-multiply example because it demonstrates the remote processor performing mathematical calculations.
Using the terminal, I entered the following commands:
Following the on-screen menu and commands, I ran the example, which produced the results below:
Executing the Matrix Multiply Example
Matrix Multiply example running
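For context, loading and starting remote firmware through the Linux kernel's generic remoteproc sysfs interface generally looks like the sketch below. The firmware file name here is purely illustrative; UG1186 documents the exact commands for this PetaLinux build.

```shell
# Firmware is picked up from /lib/firmware; the name below is a hypothetical example.
cp image_matrix_multiply.elf /lib/firmware/
echo image_matrix_multiply.elf > /sys/class/remoteproc/remoteproc0/firmware
echo start > /sys/class/remoteproc/remoteproc0/state   # boot the RPU firmware
# ... run the host-side test application ...
echo stop > /sys/class/remoteproc/remoteproc0/state    # shut the RPU down again
```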
This example shows that the OpenAMP framework is running correctly on the Zynq UltraScale+ MPSoC PetaLinux build and that we can begin to create our own applications. If you want to run the other two examples, refer to UG1186.
If we wish to create our own OpenAMP-based RPU application that uses kernel-space RPMsg, we can develop it in the SDK and install the generated ELF as an app within PetaLinux. This does mean rebuilding the PetaLinux image again; we will look at how to do that in another blog. There is a lot more for us to explore here.
Note: We have looked at OpenAMP before for the Zynq 7000 series of devices in blogs 169 & 171.
Do you need an extremely powerful yet extremely tiny SOM to implement a challenging embedded design? Enclustra’s credit-card sized Mercury+ XU1 is worth your consideration. It packs a Xilinx Zynq UltraScale+ MPSoC, as much as 8Gbytes of DDR4 SDRAM with ECC, 16Gbytes of eMMC Flash memory, two Gbit Ethernet PHYs, two USB 2.0/3.0 PHYs, and 294 user I/O pins on three 168-pin Hirose FX10 connectors into a mere 74x54mm. That’s a lot of computational horsepower in a teeny, tiny package.
Here’s a block diagram of the Mercury+ XU1 SOM:
By itself, the Zynq UltraScale+ MPSoC gives the SOM a tremendous set of resources including:
A quad-core Arm Cortex-A53 64-bit application processor
A dual-core Arm Cortex-R5 32-bit real-time processor
An Arm Mali-400 MP2 GPU
As many as 747K system logic cells (in a Zynq UltraScale+ ZU15EG MPSoC)
As much as 26.2Mbits of BRAM and 31.5Mbits of UltraRAM (in a Zynq UltraScale+ ZU15EG MPSoC)
As many as 3528 DSP48E2 slices (in a Zynq UltraScale+ ZU15EG MPSoC)
As with many product family designs, Enclustra is able to tailor the price/performance of the Mercury+ XU1 SOM by offering multiple versions based on different pin-compatible members of the Zynq UltraScale+ MPSoC family.
Please contact Enclustra directly for more information about the Mercury+ XU1 SOM.
If you’re designing and debugging high-speed logic—as you might with video or radar applications, for example—then perhaps you could use some fast debugging capability. As in really fast. As in much, much, much faster than JTAG. EXOSTIV Labs has a solution. It’s called the EXOSTIV FPGA Debug Probe and it uses the bulletproof, high-speed SerDes ports that are pervasive throughout Xilinx All Programmable device families to extract debug data from running devices with great alacrity.
Here’s a 3-minute video showing the EXOSTIV FPGA Debug Probe communicating with a Xilinx Virtex UltraScale VCU108 Eval Kit, connected through the kit’s high-speed QSFP connector, creating a 50Gbps link between the board and the debugger.
Here’s a second 3-minute video with some additional information. This one shows the EXOSTIV Probe and Dashboard being used to monitor 640 signals in a high-speed video interface design:
You observe the captured data using the EXOSTIV Dashboard, as demonstrated in the above video. The probe and software can handle debug data from as many as 32,768 internal nodes per capture. The mind boggles at the potential complexity being handled here.
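As a rough back-of-envelope for the 640-signal capture in the second video (assuming one bit per probed signal and ignoring protocol overhead, both of which are assumptions), the 50Gbps link works out to a healthy per-signal sample rate:

```shell
# Hypothetical sizing: 640 one-bit probes sharing a 50 Gbps capture link
link_bps=50000000000
probes=640
per_probe_hz=$((link_bps / probes))
echo "$per_probe_hz"   # prints 78125000, i.e. ~78 MHz sustained per signal
```

Even with real-world overheads, that kind of sustained rate dwarfs what a JTAG-based capture path can stream off-chip.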
According to EXOSTIV, the FPGA Debug Probe and Dashboard give you 200,000x more observability into your design than the tools you might currently be using. That’s a major leap in debugging speed and capability that could save you days or weeks of debugging time.
When you’ve exhausted JTAG’s debug capabilities, consider EXOSTIV.
Today marks the launch of Joshua Montgomery’s Mycroft Mark II open-source Voice Assistant, a hands-free, privacy-oriented smart speaker with a touch screen that also happens to be based on a 6-microphone version of Aaware’s Sound Capture Platform. In fact, according to today’s article on EEWeb written by my good friend and industry gadfly Max Maxfield, Aaware is designing the pcb for the Mycroft Mark II Voice Assistant, and that pcb will be based on a Xilinx Zynq UltraScale+ MPSoC. (It’s billed as a “Xilinx quad-core processor” in the Kickstarter project listing.) According to Max’s article, “This PCB will be designed to support different microphone arrays, displays, and cameras such that it can be used for follow-on products that use the Mycroft open-source voice assistant software stack.”
To repeat: That’s an open-source, consumer-level product based on one of the most advanced MPSoCs on the market today, with at least two 64-bit Arm Cortex-A53 processors and two 32-bit Arm Cortex-R5 processors plus a generous chunk of the industry’s most advanced programmable logic based on Xilinx’s 16nm UltraScale+ technology.
Aaware’s technology starts with an array of six individual microphones. The outputs of these microphones are combined and processed with several Aaware-developed algorithms (including acoustic echo cancellation, noise reduction, and beamforming) that allow the Mycroft Mark II smart speaker to isolate the voice of a speaking human even in noisy environments. (See “Looking to turbocharge Amazon’s Alexa or Google Home? Aaware’s Zynq-based kit is the tool you need.”) The combination of Aaware’s Sound Capture Platform, Mycroft’s Mark II smart speaker open-source code, and the immensely powerful Zynq UltraScale+ MPSoC gives you an incredible platform for developing your own end products.
Here’s a 3-minute video demo of the Mycroft Mark II smart speaker’s capabilities:
Pledge $99 on Kickstarter and you’ll get a DIY dev kit that includes the pcbs, an LCD, speakers, and cables but no handsome plastic housing. Pledge $129—thirty bucks more—and you get a built unit in an elegant housing. There are higher pledge levels too.
What’s the risk? As of today, the first day of the pledge campaign, the project is 167% funded, so it’s already a “go.” There are 28 days left to jump in. Also, Mycroft delivered the Mark I speaker, a previous Kickstarter project, last July so the company has a track record of successful Kickstarter project completion.
In a new report titled “Hitting the accelerator: the next generation of machine-learning chips,” Deloitte Global predicted that “by the end of 2018, over 25 percent of all chips used to accelerate machine learning in the data center will be FPGAs and ASICs.” The report then continues: “These new kinds of chips should increase dramatically the use of ML, enabling applications to consume less power and at the same time become more responsive, flexible and capable, which is likely to expand the addressable market.” And later in the Deloitte Global report:
“There will also be over 200,000 FPGA and 100,000 ASIC chips sold for ML applications.”
“…the new kinds of chips may dramatically increase the use of ML, enabling applications to use less power and at the same time become more responsive, flexible and capable, which is likely to expand the addressable market…”
“Total 2018 FPGA chip volume for ML would be a minimum of 200,000. The figure is almost certainly going to be higher, but by exactly how much is difficult to predict.”
These sorts of statements are precisely why Xilinx has rapidly expanded its software offerings for machine-learning development from the edge to the cloud. That includes the reVISION stack for developing responsive and reconfigurable vision systems and the Reconfigurable Acceleration stack for developing and deploying platforms at cloud scale.
XIMEA has added two new high-speed industrial cameras to its xiB-64 family: the 1280x864-pixel CB013 capable of imaging at 3500fps and the 1920x1080-pixel CB019 capable of imaging at 2500fps. As with all digital cameras, the story for these cameras starts with the sensors. The CB013 camera is based on a LUXIMA Technology LUX13HS 1.1Mpixel sensor and the CB019 is based on a LUXIMA Technology LUX19HS 2Mpixel sensor. Both cameras use PCIe 3.0 x8 interfaces capable of 64Gbps sustained transfer rates. Use of the PCIe interface allows a host PC to use DMA for direct transfers of the video stream into the computer’s main memory with virtually no CPU overhead.
Both cameras are also based on a Xilinx Kintex UltraScale KU035 FPGA. Why such a fast FPGA in an industrial video camera? The frame rates and 64Gbps PCIe interface transfer rate are all the explanation you need. The Kintex UltraScale KU035 FPGA has 444K system logic cells and 1700 DSP48E2 slices—ample for handling the different sensors in the camera product line and just about any sort of video processing that’s needed. The Kintex UltraScale FPGA also incorporates two integrated (hardened) PCIe Gen3 IP blocks with sixteen bulletproof 16.3Gbps SerDes transceivers to handle the camera’s PCIe Gen3 interface.
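To see why, consider the raw sensor data rates implied by the resolutions and frame rates above. (The 10-bit pixel depth used here is an assumption, since the sensors' actual bit depth isn't stated.)

```shell
# Raw data rate = width x height x frames/s x bits/pixel (10 bpp assumed)
cb013_bps=$((1280 * 864 * 3500 * 10))
cb019_bps=$((1920 * 1080 * 2500 * 10))
echo "$cb013_bps"   # prints 38707200000, i.e. ~38.7 Gbps
echo "$cb019_bps"   # prints 51840000000, i.e. ~51.8 Gbps
```

Both streams fit within the 64Gbps sustained PCIe budget, but with little headroom—exactly the sort of sustained throughput that calls for a Kintex UltraScale-class FPGA with hardened PCIe blocks.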
Xcell Daily has covered the FPGA-accelerated AWS EC2 F1 instances from Amazon Web Services several times. The AWS EC2 F1 instances allow AWS customers to develop accelerated code in C, C++, OpenCL, Verilog, or VHDL and run it on Amazon servers augmented with hardware-accelerator cards based on multiple Xilinx Virtex UltraScale+ VU9P FPGAs. (See below.)
A new AWS case study titled “Xilinx Speeds Testing Time, Increases Developer Productivity Using AWS” turns the tables. It discusses Xilinx’s use of AWS services to speed development of Xilinx development software such as the Vivado and SDx development environments. Xilinx employs extensive regression testing when developing new releases of these complex tools and the resulting demand spikes called for more “elastic” server resources. (Amazon’s “EC2” designation stands for “Elastic Compute Cloud.”)
As the case study states:
“Xilinx addressed its infrastructure-scaling problem by migrating to a high-performance computing (HPC) cluster running on Amazon Web Services (AWS). ‘We evaluated several cloud providers and chose AWS because it had the best tools and most mature solution,’” says [Ambs] Kesavan, [software engineering and DevOps director at Xilinx].
For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:
Everspin’s nvNITRO NVMe Storage Accelerator is a persistent-memory PCIe storage card for cloud and data-center applications that delivers up to 1.46 million IOPS for random 4Kbyte mixed 70/30 read/write operations. It’s based on Everspin’s STT-MRAM (spin-transfer torque magnetic RAM) chips and uses a Xilinx Kintex UltraScale KU060 FPGA to implement the MRAM controller and the board’s PCIe Gen3 x8 host interface. Everspin has just published an nvNITRO application note titled “Accelerating Fintech Applications with Lossless and Ultra-Low Latency Synchronous Logging using nvNITRO” that details the use of the nvNITRO Storage Accelerator to speed cloud-based financial transactions. The application note explores how Everspin nvNITRO technology can improve FinTech (Financial Technology) performance without creating additional compliance risks.
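For a sense of scale, the quoted IOPS figure translates into raw bandwidth as follows (treating every operation as a full 4Kbyte transfer, which is a simplification):

```shell
# 1.46 million 4 KiB operations per second -> aggregate bytes per second
iops=1460000
bytes_per_sec=$((iops * 4096))
echo "$bytes_per_sec"   # prints 5980160000, i.e. ~5.98 GB/s
```

That’s approaching the roughly 7.9GB/s of usable bandwidth a PCIe Gen3 x8 link provides, so the card operates near the limit of its host interface.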
If you haven’t looked deeply into the intricacies of financial trading transactions, the app note starts with a clarifying block diagram, which shows the multiple layers built into the transaction process:
The diagram shows many opportunities for accelerating transactions, which is important because in this market, microseconds translate into millions of dollars gained or lost.
If you’re developing cloud-based systems and acceleration is important, whether or not you’re developing FinTech applications, take a few minutes to read the Everspin app note.
5G NR gNodeB deployments will start as early as mid-CY2019, and they pose three key challenges:
Instantiating gNodeB and NGCore network functions in Telco Cloud
Next-generation fronthaul that enables gNodeB functional partitioning
Massive MIMO radios
A Xilinx session at Mobile World Congress (MWC) on March 1 titled “Enabling 5G NR Deployments” will discuss these three facets of 5G NR and strategies needed to overcome these challenges in three separate presentations. The presentation titles are:
5G NR acceleration in Telco Cloud
5G Transport Network and Packet Based Fronthaul
Implementing 5G NR Massive MIMO Radio
The three presenters are:
Awanish Verma, Senior Architect and Product Manager in Xilinx’s Communication Business Unit
Harpinder Matharu, Director of the Communications Business at Xilinx
Raghu Rao, Principal Architect and Director of Strategic Marketing for Wireless Products at Xilinx
The three presenters will also discuss real-world lessons learned while working with the supply chain—including operators, system vendors, semiconductor and software providers—in building 5G proofs of concepts and trial testbeds.
The hour-long event starts at 1:00pm and is being held in Hall 8.0, NEXTech Theatre F at MWC in Barcelona.
Huawei’s FACS (FPGA Accelerated Cloud Server) offering is based on a PCIe server card that incorporates a Xilinx Virtex UltraScale+ VU9P FPGA. (Huawei also offers the board for on-premise installations.) In addition to the hardware, Huawei offers three major development tools for FACS:
An SDAccel-based shell that offers fast, easy development. SDAccel is Xilinx’s development environment for C, C++, and OpenCL. This shell also provides access to Xilinx’s Vivado development environment.
A DPDK shell for high-performance applications. Intel originally developed DPDK as a packet-processing framework for accelerated server systems and Huawei’s implementation can support throughputs as high as 12Gbytes/sec.
A Professional Simulation Platform that encapsulates more than two decades of Huawei’s FPGA development experience.
With these offerings, Davies said, Huawei is looking to add partners to expand its ecosystem and is particularly interested in talking to companies that offer:
There’s a Huawei Cloud Marketplace that serves as an outlet for FACS applications. The company is also welcoming end users to try the service.
Here’s a video of Davies’ 32-minute presentation at XDF:
Amazon’s Senior Director of Business Development and Product, Gadi Hutt, gave an in-depth presentation at the recent Xilinx Developers Forum in Frankfurt, Germany where he detailed the specifics, advantages, and the nuts-and-bolts “how to” with respect to using the FPGA-based AWS EC2 F1 instances to accelerate your business.
First, Hutt gave one of the most succinct definitions of “the cloud” I’ve heard: “the on-demand delivery of compute, storage, networking, etc. services.” This definition is free of the niggling details such as hardware, networking, power, and cooling that you are now free to ignore.
Then Hutt listed the advantages of cloud-based services:
Agility and speed of innovation
Elasticity: scale up or down quickly, as needed
Breadth of functionality
Go global in minutes
From there, Hutt provided a deep explanation of the steps you need to take to distribute cloud-based services globally. He also quoted a Gartner estimate, which said that AWS (Amazon Web Services) has more compute capacity than all of the other cloud providers combined. Certainly, this Gartner report puts AWS far in the upper right corner of the Gartner Magic Quadrant for Cloud Infrastructure as a Service, Worldwide.
Using AWS allows your company to “get out of IT” and focus on providing specialized services where you can add value, said Hutt. “You can focus on your core business,” he continued.