Last August, I wrote a blog titled “The $10 Xilinx Box, the FPGA Board, and the Man from HP Labs: An Xcell Daily triple mystery story” about a 20-year-old, Xilinx-branded box containing a Xilinx demo board with two early-generation Xilinx FPGAs (an XC3020 and an XC4003E) that I’d found at HSC Electronics, a Silicon Valley surplus electronics store. According to the Xilinx shipping invoice in the box, it had belonged to Dave Moberly, who worked for HP Labs at the time and had been instrumental in getting FPGA technology introduced into HP instrumentation, starting with the HP 8145A Optical TDR. Sadly, David Moberly passed away on March 9, 2017, so I was not able to interview him.
However, early this month, an Xcell Daily reader using the handle “tomshoup” left a comment on that blog post that continues Moberly’s story. I found the comment so compelling that I’m promoting it to a full blog post below:
I was David Moberly's next-to-last manager at Agilent and had hired him into his last job at HP/Agilent from elsewhere in HP, around 1999. HP Labs at 1501 Page Mill Road was incubating a new medical business that was far different than the existing capital equipment medical business. We were designing patient monitoring equipment to be used to monitor patients with congestive heart failure at home, to be offered as a service to Medicare HMOs. We built a simple set of instruments to measure weight (yes, we built a bathroom scale, but it was an HP bathroom scale), blood pressure and heart rate, and a single lead ECG rhythm strip. We introduced this product to the market in 1999 at the Heart Failure Society of America at their annual meeting in San Francisco that year. After about 18 months we had ~5,000 patients under monitoring for a marquee customer. Philips, which bought the medical business from Agilent in 2001 still offers a current generation of this equipment and the associated service.
David was one of several people who basically came up to me and said "I've always wanted to work on medical equipment but didn't want to move to Boston (location of HP's medical business), so here I am." He was one of the most inquisitive people I've ever met, irritatingly so sometimes, but always in an endearing way.
I left Agilent in 2002 when they sold the business to Philips but kept bumping into David and learned he had a love affair with FPGAs. One product he was over the moon about was mixed-signal acquisition systems that are basically an FPGA that provides 2 to 4 channels of analog 'scope function, 16 channels of logic analyzer input, some power supply output, spectrum analysis, arbitrary waveform generation, and other features, all using a laptop as the user interface. Somehow David became the go-to guy to test drive these because of his blogging about them, so manufacturers of such systems would send him new devices to play with and hopefully write them up in his blogs.
David and I ended up working together again early in 2013. I was a contractor at a medical-device company and the company needed help writing software test plans and protocols. I introduced them to David and they approved me hiring him as a subcontractor. We worked together for about another year then. David was retired then and wasn't really looking for work, but was intrigued with the product and the chance to work again. He told me afterwards that he paid for his daughter's wedding with that gig. He also gave me a wonderful thank-you gift: a brand new Analog Discovery by Digilent, one of those [Spartan-6] FPGA-based mixed-signal acquisition systems. I've used it a lot in my current consulting work and think of David every time I use it. David's fondness for these instruments rubbed off on me and we exchanged a round of e-mail as I lusted after the Picoscope 3000, the Cadillac in David's words; I still lust after it but haven't taken the leap. I know if I buy it David would be proud, and if he was still alive would probably want to borrow it to take it apart.
Through mutual friends I knew David had had a bout with cancer. We met for coffee after he was in remission and true to form he told me in detail about his treatment, with his usual level of fascination at the technology, almost to the molecular level.
After David died, I knew his wife Cheryl was faced with the too-common task of being the widow of a pack-rat engineer who had excess storage capacity: think containers. So I volunteered to help Cheryl sort through the Moberly archives and introduced her to HSC Electronics as a way to recycle some of what David had collected. Having known David, and ending up sorting through his stash with Cheryl, I could easily imagine the glee he must have felt when he acquired all those goodies we sorted through.
David had a wonderful career: MIT education, Apple, Trimble, HP, HP Labs, Agilent, Philips. A small team of engineers, with a couple of Davids in it, could probably build anything and run it with an FPGA and a couple of AA batteries.
My thanks to “tomshoup” for sharing this story with Xcell Daily.
Last week, the Mycroft Mark II Privacy-Centric Open Voice Assistant Kickstarter Project, which is based on Aaware’s far-field Sound Capture Platform and the Xilinx Zynq UltraScale+ MPSoC, hit 300% funding on Kickstarter. Today, the pledge level hit 400%—$200k—with 1235 backers. There are still 18 days left in the funding campaign; still time for you to get in on this very interesting, multi-talented smart speaker and low-cost, open-source Zynq UltraScale+ MPSoC development platform.
Meanwhile, there seems to be a new pledge level that I don’t recall: a $179 level that includes a 1080p video camera. That’s in addition to the touch screen and voice input, which gives the Mycroft Mark II an even more interesting user interface. There are only a limited number of $179 pledge options, with 177 remaining as of the posting of this blog.
Fast Company has also published an article on the Mycroft Mark II Kickstarter project titled “Can Mycroft’s Privacy-Centric Voice Assistant Take On Alexa And Google?” Be sure to take a look.
For more information about the Mycroft Mark II Open Voice Assistant, see:
For more information about Aaware’s far-field Sound Capture Platform, see:
Aldec recently published a very descriptive example design for a high-performance, re-programmable network router/switch based on the company’s TySOM-2A-7Z030 embedded development board and an FMC-NET daughter card. The TySOM-2A-7Z030 incorporates a Xilinx Zynq Z-7030 SoC. The Zynq SoC’s dual-core Arm Cortex-A9 MPCore processor runs the OpenWrt Linux distribution for embedded devices in this design. OpenWrt is a favorite among network switch developers. The design employs the programmable logic (PL) in the Zynq SoC to create four 1G/2.5G Ethernet MACs that connect to the FMC-NET card’s four Ethernet PHYs and a 10G Ethernet subsystem that connects to the FMC-NET card’s QSFP+ card cage.
Here’s a block diagram of the design:
For more information about the Aldec TySOM-2A dev board, see “Aldec introduces TySOM-2A Embedded Proto Board based on Zynq Z-7030 SoC, demos real-time face-detection ref design.”
For more information about the re-programmable network switch design, please contact Aldec directly.
Last week, Xilinx posted a 2-minute video showing a Xilinx Virtex UltraScale+ XCVU37P HBM-enhanced FPGA operating with the on-device HBM DRAM communicating at full speed (460Gbytes/sec), error-free, over 32 channels. (See “Virtex UltraScale+ FPGA augmented with co-packaged HBM DRAM operating at full speed (460Gbytes/sec), error-free, on the very first day of silicon bringup.”)
Bittware has already announced two PCIe boards for these HBM-enhanced Xilinx FPGAs:
Bittware’s XUPVVH double-slot PCIe board, block diagram
Bittware’s XUPSVH single-slot PCIe board, block diagram
As a reminder, Bittware had previously announced the XUPVV4 based on the Virtex UltraScale+ VU13P FPGA. (See “Warning, Naked FPGA: Bittware’s XUPVV4 PCIe card goes big using the Xilinx Virtex UltraScale+ VU13P “lidless” FPGA.”)
Please contact Bittware directly for more information about these PCIe boards.
By Adam Taylor
We recently looked at how we could use the Zynq SoC’s XADC streaming output with DMA. For that example, I demonstrated only outputting one XADC channel over an AXI stream. However, it is important we understand how we can use multiple channels within an AXI stream to transfer them to processor memory whether we are using the XADC as the source or not.
To demonstrate this, I will be updating the XADC design that we used for the previous streaming example. Upgrading the software is simple. All we need to do is enable another XADC channel when we configure the sequencer and update the API we use. Updating the hardware is a little more complicated.
To upgrade the Vivado hardware design, the first thing we need to do is replace the DMA IP module with the multi-channel DMA (MCDMA) IP core. The MCDMA IP core supports as many as 16 different input channels. DMA channels are mapped to AXI Stream contents using the TDest bus, which is part of the AXIS standard.
As with the previous XADC streaming example, we’ll configure the MCDMA for uni-directional operation (write only) and support for two channels:
Configuration of the MCDMA IP core.
Vivado design with the MCDMA IP Core (Tcl BD available on GitHub)
TDest is the AXI signal used for routing AXI Stream contents. In addition, when we configure the XADC for AXI Streaming, the different XADC channels output on the stream are identified by the TId bus.
To be able to use the MCDMA in conjunction with the XADC, we need to remap the XADC TId channel to the MCDMA TDest channel. We also need to packetize data by asserting TLast on the MCDMA AXIS input.
In the previous example, we used a custom IP core to generate the TLast signal. A better solution, however, is to remap the TId and generate the TLast signal using the AXIS subset converter.
AXIS Subset Converter Configuration
The eagle-eyed will at this point notice that the XADC uses channel numbers up to 31, with the auxiliary inputs using channel IDs 16 to 31, which are outside the channel range of the MCDMA. If we are using the auxiliary inputs, we can also use the AXIS subset converter to remap these higher channel numbers into the MCDMA range by remapping the lower four bits of the XADC TId to the MCDMA TDest channel. When using this method, the lower XADC channels (0 to 15) cannot be used at the same time because they would map to the same TDest values and conflict.
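To make the remapping concrete, here is a minimal Python model of what happens to the TId field. It is only a behavioural sketch, not the configuration of the subset converter IP: the function names are hypothetical, and it assumes the remap simply keeps the lower four bits of TId as TDest, as described above.

```python
# Behavioural model only. It mimics keeping the lower four bits of the
# XADC TId as the MCDMA TDest. The names here are illustrative and are
# not part of any Xilinx API.

MCDMA_MAX_CHANNELS = 16  # the MCDMA supports as many as 16 channels


def remap_tid_to_tdest(tid: int) -> int:
    """Return the TDest value produced by keeping TId[3:0]."""
    if not 0 <= tid <= 31:
        raise ValueError("XADC channel IDs run from 0 to 31")
    return tid & (MCDMA_MAX_CHANNELS - 1)


def find_conflicts(enabled_tids):
    """Return {tdest: [tids]} for every TDest shared by more than one TId."""
    by_tdest = {}
    for tid in enabled_tids:
        by_tdest.setdefault(remap_tid_to_tdest(tid), []).append(tid)
    return {dest: tids for dest, tids in by_tdest.items() if len(tids) > 1}


if __name__ == "__main__":
    # Auxiliary inputs 16..31 fold onto TDest 0..15 without overlapping.
    print([remap_tid_to_tdest(t) for t in (16, 19, 31)])  # [0, 3, 15]
    # A low channel and the auxiliary channel 16 above it share one TDest.
    print(find_conflicts([3, 19]))  # {3: [3, 19]}
```

The conflict check shows why the lower channels have to stay disabled when the auxiliary inputs are remapped this way: channel 3 and channel 19 would both arrive at the MCDMA as TDest 3.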
Output of the XADC with multiple channels (Temperature & VPVN Channels)
Output of the AXIS subset block following remapping and TLast Generation
When it comes to updating the software application, we need to use the XMCDMA.h header APIs to configure the MCDMA and set up the buffer descriptors for each of the channels. The software performs the following steps:
The software application defines several buffer descriptors for each channel. For the receive buffer in this example, I have used a single receive buffer, so the received data for both channels shares the same address space. This can be seen below. Halfwords starting with 0x4yyy relate to the VPVN input, while the device temperature halfwords start with 0x9yyy.
Memory Contents showing the two channels
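As a sanity check on values like these, the raw halfwords can be converted to engineering units. The sketch below assumes the standard 7-series XADC transfer functions: a 12-bit result left-justified in a 16-bit word, temperature = code x 503.975 / 4096 - 273.15, and a unipolar 0V to 1V input range on VP/VN. The helper names and the sample values 0x9800 and 0x4000 are mine, chosen only to match the 0x9yyy and 0x4yyy patterns.

```python
# Assumption: 12-bit XADC result is MSB-justified in each 16-bit halfword.
# Helper names are illustrative, not taken from the Xilinx drivers.


def xadc_code12(halfword: int) -> int:
    """Extract the 12-bit conversion result from a 16-bit halfword."""
    return (halfword & 0xFFFF) >> 4


def xadc_temperature_c(halfword: int) -> float:
    """Convert a raw temperature halfword to degrees Celsius."""
    return xadc_code12(halfword) * 503.975 / 4096 - 273.15


def xadc_unipolar_volts(halfword: int) -> float:
    """Convert a raw VP/VN halfword to volts, assuming unipolar 0V to 1V mode."""
    return xadc_code12(halfword) / 4096 * 1.0


if __name__ == "__main__":
    # Hypothetical samples matching the patterns seen in memory.
    print(f"temperature: {xadc_temperature_c(0x9800):.1f} degC")  # about 26 degC
    print(f"VP/VN:       {xadc_unipolar_volts(0x4000):.3f} V")    # 0.250 V
```

A 0x9yyy halfword therefore corresponds to a die temperature somewhere around room temperature and above, which is what you would expect from a board sitting on the bench.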
Adapting the existing software to use multiple receive buffers in memory is simple. For many applications, separate receive buffers are more useful.
Being able to move AXI data streams to memory-mapped locations is a vital requirement for many applications such as signal processing, communication, and sensor interfacing. Using the AXIS subset converter allows us to correctly remap and format the AXIS stream data into a compliant format for the MCDMA IP core.
You can find the example source code on GitHub.
Adam Taylor’s Web site is http://adiuvoengineering.com/.
If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.
First Year E Book here
First Year Hardback here.
Second Year E Book here
Second Year Hardback here
These are very early days for autonomous vehicles and if you’re designing control systems for such machines, you’d better be thinking of using technologies that bring adaptable intelligence to the party. (Would you want to ride in a self-driving car that’s not adaptable?)
The SAE (Society of Automotive Engineers) thinks this is a serious enough topic to host a free Webinar titled “Reconfigurable Chips for Automated/Connected and Cyber-Secure Vehicles.” This is a panel-style Webinar with the following speakers:
Moderated by SAE International’s Lisa Arrigo.
The Webinar is being held on February 22. More details and registration here.
There is no PCIe Gen5—yet—but there’s a 32Gbps/lane future out there and TE Connectivity demonstrated that future at this week’s DesignCon 2018. The demo’s real purpose was to show the capabilities of TE Connectivity’s Sliver connector system which includes card-edge and cabled connectors. In the demo at DesignCon, four channels carry 32Gbps data streams through surface-mount and right-angle connectors to create a mockup of a future removable-storage device. Those 32Gbps data streams are generated, transmitted, and received by bulletproof Xilinx UltraScale+ GTY transceivers operating reliably at a theoretical PCIe Gen5’s 32Gbps/lane data rate despite 35dB of loss through the demo system.
Here’s a 1-minute video of the demo:
Xilinx demonstrated its 56Gbps PAM4 SerDes test chip nearly a year ago (See “3 Eyes are Better than One for 56Gbps PAM4 Communications: Xilinx silicon goes 56Gbps for future Ethernet”) and this week at DesignCon 2018 in Santa Clara, Samtec used that chip to demo its high-speed ExaMAX backplane connectors working on an actual backplane. The demo setup included a Xilinx board with the PAM4 test chip driving a pair of coaxial cables connected to a paddle card plugged into one end of a backplane, which was populated with 14 ExaMAX connectors. A second paddle card at the other end received the PAM4 signals and conveyed them back to the Xilinx board via a second set of coaxial cables. Here’s a photo of the setup:
In the following short video, Samtec’s Ralph Page describes the demo and mentions the nice eyes and clear data levels, as seen on the Xilinx demo software screen positioned above the demo boards. He also mentions the BER: 5.29x10^-8. That’s the error rate before adding the error-reducing capabilities of a FEC, which can drop the error rate by perhaps another ten orders of magnitude or more.
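To put that BER in perspective, here is a back-of-the-envelope calculation in Python. It is only a sketch: it assumes the link in the demo runs at the full 56Gbps, and it takes the "ten orders of magnitude" FEC improvement at face value as a round number, not as a measured result.

```python
# Rough arithmetic only. The line rate and FEC gain below are assumptions
# taken from the figures quoted in the text, not measurements.

LINE_RATE_BPS = 56e9      # assumed: the demo link runs at 56Gbps
PRE_FEC_BER = 5.29e-8     # BER quoted in the video, before FEC
FEC_GAIN_DECADES = 10     # assumed: "perhaps another ten orders of magnitude"


def errors_per_second(ber: float, line_rate_bps: float = LINE_RATE_BPS) -> float:
    """Average number of bit errors per second at a given BER and line rate."""
    return ber * line_rate_bps


def mean_seconds_between_errors(ber: float, line_rate_bps: float = LINE_RATE_BPS) -> float:
    """Average time between bit errors, in seconds."""
    return 1.0 / errors_per_second(ber, line_rate_bps)


post_fec_ber = PRE_FEC_BER * 10.0 ** -FEC_GAIN_DECADES

if __name__ == "__main__":
    print(f"pre-FEC:  {errors_per_second(PRE_FEC_BER):.1f} bit errors per second")
    days = mean_seconds_between_errors(post_fec_ber) / 86400
    print(f"post-FEC: one bit error roughly every {days:.0f} days")
```

Under these assumptions the raw link sees a few thousand bit errors every second, while the FEC-protected link would average about one error every several weeks.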
Samtec’s demo points to a foreseeable future where you will be able to develop large backplanes with screamingly fast performance using PAM4 SerDes transceivers.
Here’s the 46-second demo video:
Today, Digilent announced a $299 bundle including its Zybo Z7-20 dev board (based on a Xilinx Zynq Z-7020 SoC), a Pcam 5C 5Mpixel (1080P) color video camera, and a Xilinx SDSoC development environment voucher. (That’s the same price as a Zybo Z7-20 dev board without the camera.) The Zybo Z7 dev board includes a new 15-pin FFC connector that allows the board to interface with the Pcam 5C camera over 2-lane MIPI CSI-2 and I2C interfaces. (This connector is pin-compatible with the Raspberry Pi’s FFC camera port.) The Pcam 5C camera is based on the Omnivision OV5640 image sensor.
Digilent has created the Pcam 5C + Zybo Z7 demo project to get you started. The demo accepts video from the Pcam 5C camera and passes it out to a display via the Zybo Z7’s HDMI port. All IP used in the demo, including a D-PHY receiver, CSI-2 decoder, Bayer-to-RGB converter, and gamma correction, is free and open-source so you can study exactly how the D-PHY and CSI-2 decoding works and then develop your own embedded vision products.
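If the last two stages of that pipeline are unfamiliar, the following Python sketch shows the idea behind them in software. It is not Digilent's IP and does not reflect how the demo's logic is implemented: it assumes an RGGB Bayer pattern, collapses each 2x2 tile into one RGB pixel instead of interpolating to full resolution, and uses a gamma of 2.2, all of which are my assumptions for illustration.

```python
# Software illustration of Bayer-to-RGB conversion and gamma correction.
# Assumptions: RGGB tile order, 8-bit samples, gamma 2.2, half-resolution
# output. A hardware pipeline would typically interpolate to full resolution.


def demosaic_rggb_2x2(raw):
    """Collapse each 2x2 RGGB tile of a raw frame into one (R, G, B) pixel."""
    rgb = []
    for y in range(0, len(raw) - 1, 2):
        row = []
        for x in range(0, len(raw[y]) - 1, 2):
            r = raw[y][x]
            g = (raw[y][x + 1] + raw[y + 1][x]) // 2  # average the two greens
            b = raw[y + 1][x + 1]
            row.append((r, g, b))
        rgb.append(row)
    return rgb


def build_gamma_lut(gamma=2.2, max_value=255):
    """Precompute a lookup table, which is how gamma is usually applied in logic."""
    return [round(max_value * (i / max_value) ** (1.0 / gamma))
            for i in range(max_value + 1)]


def apply_gamma(rgb, lut):
    """Apply the lookup table to every colour component of every pixel."""
    return [[tuple(lut[c] for c in pixel) for pixel in row] for row in rgb]


if __name__ == "__main__":
    raw_frame = [[200, 100],
                 [120, 50]]  # one hypothetical RGGB tile
    linear = demosaic_rggb_2x2(raw_frame)
    print(linear)  # [[(200, 110, 50)]]
    print(apply_gamma(linear, build_gamma_lut()))
```

The lookup-table approach is worth noting because it maps naturally onto block RAM in programmable logic, whereas the power function itself would be expensive to compute per pixel.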
If you want this deal, you’d better hurry. The offer expires February 23—three weeks from today.
Rigol’s new RSA5000 real-time spectrum analyzer allows you to capture, identify, isolate, and analyze complex RF signals with a 40MHz real-time bandwidth over either a 3.2GHz or 6.5GHz signal span. It’s designed for engineers working on RF designs in the IoT and IIoT markets as well as industrial, scientific, and medical equipment. Rigol was demonstrating the RSA5000 real-time spectrum analyzer at this week’s DesignCon being held at the Santa Clara Convention Center. I listened to a presentation from Rigol’s North American General Manager Mike Rizzo and then a demo by Rigol’s Director of Product Marketing & Software Applications Chris Armstrong, both captured in the 2.5-minute video below.
Rigol RSA5000 Real-Time Spectrum Analyzer
Based on what I saw in the demo, this is an extremely responsive instrument—far more responsive than a swept spectrum analyzer—with several visualization display modes to help you isolate the significant signal in a sea of signals and noise, in real time. It’s capable of continuously executing 146,484 FFTs/sec, which results in a minimum 100% POI (probability of intercept) of 7.45μsec. You need some real DSP horsepower to achieve that sort of performance and the Rigol RSA5000 real-time spectrum analyzer gets this performance from a pair of Xilinx Zynq Z-7015 SoCs. (You'll find many more details about real-time spectrum analysis and the RSA5000 Real-Time Spectrum Analyzer in the Rigol app note "Realtime Spectrum Analyzer vs Spectrum Analyzer," attached at the end of this post. See below.)
Rigol RSA5000 Real-Time Spectrum Analyzer Display Modes
Here’s the short presentation and demo of the Rigol RSA5000 real-time spectrum analyzer from DesignCon 2018:
Mike Rizzo told me that the Rigol design engineers selected the Zynq Z-7015 SoCs for three main reasons:
If you’re looking for a very capable spectrum analyzer, give the Rigol RSA5000 a look. If you’re designing your own real-time system and need high-speed computation coupled with fast user response, take a look at the line of Xilinx Zynq SoCs and Zynq UltraScale+ MPSoCs.
The 2-minute video below shows you an operational Xilinx Virtex UltraScale+ XCVU37P FPGA, which is enhanced with co-packaged HBM (high-bandwidth memory) DRAM using Xilinx’s well-proven, 3rd-generation 3D manufacturing process. (Xilinx started shipping 3D FPGAs way back in 2011, starting with the Virtex-7 2000T and we’ve been shipping these types of devices ever since.)
This video was made on the very first day of silicon bringup for the device and it is already operating at full speed (460Gbytes/sec), error-free, over 32 channels. The Virtex UltraScale+ XCVU37P is one big All Programmable device with:
Whatever your requirements, whatever your application, chances are this extremely powerful FPGA will deliver all of the heavy lifting (processing, memory, and I/O) that you need.
Here’s the video:
For more information about the Virtex UltraScale+ HBM-enhanced device family, see “Xilinx Virtex UltraScale+ FPGAs incorporate 32 or 64Gbits of HBM, delivers 20x more memory bandwidth than DDR.”
The Avnet MiniZed is an incredibly low-cost dev board based on the Xilinx Zynq Z-7007S SoC with WiFi and Bluetooth built in. It currently lists for $89 on the Avnet site. If you’d like a fast start to using this dev board, Avnet is ready to help. As of now, it has placed four MiniZed Speedway Design Workshops online so that you can learn at your own convenience and your own pace. The four workshops are:
In the Developing Zynq Hardware Speedway, you will be introduced to the single Arm Cortex-A9 processor core as you explore its robust AXI peripheral set. In doing so, you will use the Xilinx embedded systems tool set to design a Zynq AP SoC system, add Xilinx IP as well as custom IP, run software applications to test the IP, and finally debug your embedded system.
In the Developing Zynq Software Speedway, you will be introduced to Xilinx SDK and shown how it offers everything necessary to make Zynq software design easy.
From within an Ubuntu OS running within a virtual machine, learn how to install PetaLinux 2017.1 and build embedded Linux targeting MiniZed. In the hands-on labs learn about Yocto and PetaLinux tools to import your own FPGA hardware design, integrate user space applications, and configure/customize PetaLinux.
Using proven flows for SDSoC, the student will learn how to navigate SDSoC. Through hands-on labs, we will create a design for a provided platform and then also create a platform for the Avnet MiniZed. You will see how to accelerate an algorithm in the course lab.
Avnet MiniZed Dev Board
Quite simply, Vadatech’s AMC584 module is an I/O monster. Its immense I/O capabilities start with the five QSFP28 100GbE-capable cages on the module’s front panel. Then there are the AMC Tongues. AMC Tongue 1 is fully routed with SerDes ports and there are as many as 20 lanes routed to Tongue 2. The AMC584 also contains a high-speed Zone 3 connector that provides the primary digital I/O routing and enables multi-module configurations.
The SerDes ports on these boards are all implemented in a Xilinx Virtex UltraScale+ XCVU13P FPGA, which is itself an I/O monster. It has 128 on-chip GTY 32.75Gbps SerDes transceivers, so it makes an ideal foundation for an I/O monster board.
Here’s a block diagram of the Vadatech AMC584 module:
Vadatech AMC584 Module Block Diagram
Now, before you get the idea that the Virtex UltraScale+ XCVU13P FPGA is just I/O, please understand that there are also 3780K system logic cells, 12,288 DSP48E2 slices, 94.5Mbits of BRAM, and 360Mbits of UltraRAM on the device as well, so it’s a DSP monster and a processing monster too. The Virtex UltraScale+ XCVU13P FPGA is capable of implementing just about any system you might imagine.
And just in case the hundreds of Mbits of SRAM on the Virtex UltraScale+ XCVU13P FPGA aren’t sufficient for your processing needs, the AMC584 module also has two banks of DDR4 SDRAM on board.
Vadatech AMC584 Module
Please contact Vadatech directly for more information about the AMC584 Module.
Mycroft AI’s Mycroft Mark II Open Voice Assistant, which is based on Aaware’s far-field Sound Capture Platform and the Xilinx Zynq UltraScale+ MPSoC, is a Kickstarter project initiated last Friday. (See “New Kickstarter Project: The Mycroft Mark II open-source Voice Assistant is based on Aaware’s Sound Capture Platform running on a Zynq UltraScale+ MPSoC.”) The Mycroft Mark II project was fully funded in an astonishingly short seven hours, guaranteeing that the project would proceed. After only four days, the project has reached 300% of its $50,000 pledge goal. As of this writing, 935 backers have pledged $150,801 so the project is most definitely a “go” and the project team is currently developing stretch goals to extend the project’s scope.
Here are two reasons you might want to participate in this Kickstarter campaign:
Mycroft Mark II Voice Assistant Xray Diagram
If you’d like some intense training on the Xilinx Zynq UltraScale+ MPSoC—one of the most powerful embedded application processor (plus programmable logic) families that you can throw at an embedded-processing application—then Hardent’s 3-day class titled “Embedded System Design for the Zynq UltraScale+ MPSoC” might just be what you’re looking for. There’s a live, E-Learning version kicking off February 7 with live, in-person classes scheduled for North America from February 21 (in Ottawa) through August. The schedule’s on the referenced Web page.
You certainly might want a comprehensive course outline before you decide, so here it is:
Curtiss-Wright’s VPX3-535 3U OpenVPX transceiver module implements a single-slot, dual-channel, 6Gsamples/sec analog data-acquisition and processing system using two 12-bit, 6Gsamples/sec ADCs and two 12-bit, 6Gsamples/sec DACs. This is the type of capability you need for demanding applications such as radar, Signal Intelligence (SIGINT), Electronic Warfare (EW), and Software Defined Radio (SDR). This amount of analog-to-digital and digital-to-analog conversion capability demands wicked-fast digital processing and on the VPX3-535 transceiver module, that digital processing comes in the form of two of Xilinx’s most powerful All Programmable devices: a Virtex UltraScale+ VU9P and a Zynq UltraScale+ ZU4 MPSoC.
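A quick calculation shows why that much conversion capability needs serious digital processing behind it. The Python sketch below simply multiplies out the sample rate, resolution, and channel count quoted above. It assumes raw, unpacked 12-bit samples with no framing or encoding overhead, and the function name is my own.

```python
# Rough arithmetic only: raw payload rate of the converters, ignoring any
# framing, encoding, or packing overhead on the actual converter interfaces.


def raw_data_rate_gbps(sample_rate_gsps: float, bits_per_sample: int,
                       channels: int = 1) -> float:
    """Raw converter data rate in Gbits/sec."""
    return sample_rate_gsps * bits_per_sample * channels


if __name__ == "__main__":
    per_adc = raw_data_rate_gbps(6, 12)
    both_adcs = raw_data_rate_gbps(6, 12, channels=2)
    print(f"one ADC:  {per_adc:.0f} Gbits/sec ({per_adc / 8:.0f} Gbytes/sec)")
    print(f"two ADCs: {both_adcs:.0f} Gbits/sec ({both_adcs / 8:.0f} Gbytes/sec)")
```

Each ADC alone produces 72Gbits/sec of raw sample data, and the two DACs need the same amount going the other way.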
Here’s a block diagram of the Curtiss-Wright VPX3-535 module:
The VPX3-535 is Curtiss-Wright’s first publicly announced module to feature full compliance to the VITA 48.8 Air-Flow-Through (AFT) cooling standard, which ensures optimal performance in the harshest conditions. VITA 48.8 provides a low-cost, effective means to cool high-power COTS 3U and 6U VPX modules that dissipate ~150W+.
At the same time, Curtiss-Wright is also introducing a conduction-cooled variant, called the VPX3-534, which is designed for applications that do not require the performance of the VPX3-535. The VPX3-534 supports the same dual-channel, 12-bit, 6Gsamples/sec ADC and DAC channels as the VPX3-535 but it replaces the Virtex UltraScale+ FPGA with a Xilinx Kintex UltraScale KU115 FPGA. This module also supports an option for four 3Gsamples/sec ADC channels.
Please contact Curtiss-Wright directly for more information about the VPX3-535 and VPX3-534 OpenVPX transceiver modules.
Keysight published a 14-minute video back in 2015 that gives you the basics behind RF beamforming and its use in 5G applications. The video also invites you to download a free, 30-day trial of Keysight’s SystemVue with Keysight’s 5G simulation library to try out some of the concepts discussed in the video and the link appears to be active still.
Here’s the video:
Meanwhile, should you need an implementation technology for RF beamforming (5G or otherwise), allow me to suggest that the new Xilinx Zynq UltraScale+ RFSoC with its many integrated RF ADCs and DACs be at the top of your technology choices. There is literally no other device like the Zynq UltraScale+ RFSoC. It’s in a category of one.
For more information about the Zynq UltraScale+ RFSoC, see:
Photonfocus has two new GigE video cameras for high-speed, 3D triangulation used in conjunction with a line laser:
Photonfocus’ MV1-D1280-L01-3D05-1280-G2 high-speed 3D camera operates at triangulation rates as fast as 68,800fps
As you might expect, the high-speed interface and processing requirements for the sensors in these two 3D-imaging cameras differ significantly, which is why both of these cameras, like other cameras in Photonfocus’ MV1 product line, are based on a Xilinx Spartan-6 LX75T FPGA. As discussed in the prior blog post last September, use of the Spartan-6 FPGA permits Photonfocus to use an extremely flexible and programmable, real-time, vision-processing platform that serves as a foundation for many different types of cameras with very different imaging sensors and very different sensor interfaces—all operating at high speed.
By Adam Taylor
The Zynq UltraScale+ MPSoC is a complex system on chip containing as many as four Arm Cortex-A53 application processors, a dual-core Arm Cortex-R5 real-time processor, a Mali GPU, and of course programmable logic. When it comes to generating our software application, we want to use the A53-based Application Processing Unit (APU) and R5 Real-Time Processing Unit (RPU) cores appropriately. This means we want to use the APU for computationally intensive, high-level applications or virtualization while using the RPU for real-time control and monitoring.
This means the APU will likely be running an operating system such as Linux while the real-time needs are addressed by the RPU using bare-metal software or a simplified OS such as FreeRTOS. Often an overall system solution requires communication between the APU and RPU to achieve the desired functionality. However, communication between different processors running different applications has previously been challenging and ad hoc, with inter-processor communications (IPC) implemented using shared memory, mailboxes, or even networks. As a result, IPC solutions differed from implementation to implementation and device to device, which increased development time and hence time to market.
This is inefficient engineering.
To best leverage the capabilities of the Zynq UltraScale+ MPSoC, we need an open framework that allows us to abstract device-specific interfaces and enables the implementation of AMP (asymmetric multi-processing) with greater ease across multiple projects.
OpenAMP, developed by the Multicore Association, provides everything we need to run different operating systems on the APU and RPU. Of course, for OpenAMP to function from processor to processor, we need an abstraction layer that provides device-specific interfaces (e.g. interrupt handlers, memory requests, and device access). The libmetal library provides these for Xilinx devices through several APIs that abstract the processor.
For our Zynq UltraScale+ MPSoC designs, the provided OpenAMP frameworks enable messaging between the master processor and remote processor and lifecycle management of the remote processor using the following structures:
OpenAMP remoteproc and RPMsg concepts
For this example, we are going to run Linux on the APU and a bare-metal application on the RPU using RPMsg within the kernel space. When we run the RPMsg from within the kernel space, the remote application lifecycle must be managed by Linux. This means the remote processor application does not run independently. If we develop the RPMsg application to run within the Linux user space, the remote processor can run independently.
To create this example first we need to enable remote-processor support within our Linux build. This requires that we rebuild the petalinux project, customising the kernel and root fs. If you are not familiar with building petalinux you might want to read this blog.
Within our petalinux project, the first thing we need to do is enable the remoteproc driver. Using a terminal application within the petalinux project, issue the command:
petalinux-config -c kernel
This will open the kernel configuration menu. Here we can enable the remote-processor drivers which are located under:
Device Drivers -> Remoteproc drivers
Enabling the Remoteproc drivers
The second step is to include the OpenAMP examples within the file system. Again inside the project, issue the command:
petalinux-config -c rootfs
Within the configuration menu, navigate to Filesystem Packages -> misc and enable the packagegroup-petalinux-openamp:
Enabling the package group
The final step before we can rebuild the petalinux image is to update the device tree. We can find an OpenAMP template dtsi file at the location:
Within this location you will find example device trees for both the lockstep and split running modes of the RPU cores.
Select the dtsi file with the desired operating mode and copy the contents into the system-user.dtsi at the following location:
Once the kernel, filesystem, and device tree have been updated, rebuild the petalinux image using the command below:
This will generate an updated Linux build that we can copy onto the boot medium of choice and run on our Zynq UltraScale+ MPSoC design.
Using a terminal connected to our preferred development board (in my case the UltraZed), we can test the OpenAMP examples we included within the Linux file System. There are three examples provided: echo test, matrix multiplication, and proxy server.
I ran the matrix-multiply example because it demonstrates the remote processor performing mathematical calculations.
Using the terminal, I entered the following commands:
Following the on-screen menu and commands, I ran the example which provided the results below:
Executing the Matrix Multiply Example
Matrix Multiply example running
This example shows that the OpenAMP framework is running correctly on the Zynq UltraScale+ MPSoC petalinux build and that we can begin to create our own applications. If you want to run the other two examples refer to UG1186.
If we wish to create our own OpenAMP-based application for the RPU, which uses the kernel space RPMsg, we can create this using the SDK and install the generated elf as an app within petalinux. This does mean we need to rebuild the petalinux image again. We will look at how we do this in another blog. There is a lot more for us to explore here.
You can find the example source code on GitHub.
Adam Taylor’s Web site is http://adiuvoengineering.com/.
If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.
First Year E Book here
First Year Hardback here.
Second Year E Book here
Second Year Hardback here
Do you need an extremely powerful yet extremely tiny SOM to implement a challenging embedded design? Enclustra’s credit-card sized Mercury+ XU1 is worth your consideration. It packs a Xilinx Zynq UltraScale+ MPSoC, as much as 8Gbytes of DDR4 SDRAM with ECC, 16Gbytes of eMMC Flash memory, two Gbit Ethernet PHYs, two USB 2.0/3.0 PHYs, and 294 user I/O pins on three 168-pin Hirose FX10 connectors into a mere 74x54mm. That’s a lot of computational horsepower in a teeny, tiny package.
Here’s a block diagram of the Mercury+ XU1 SOM:
By itself, the Zynq UltraScale+ MPSoC gives the SOM a tremendous set of resources including:
As with many product family designs, Enclustra is able to tailor the price/performance of the Mercury+ XU1 SOM by offering multiple versions based on different pin-compatible members of the Zynq UltraScale+ MPSoC family.
Please contact Enclustra directly for more information about the Mercury+ XU1 SOM.
If you’re designing and debugging high-speed logic—as you might with video or radar applications, for example—then perhaps you could use some fast debugging capability. As in really-fast. As in much, much, much faster than JTAG. EXOSTIV Labs has got a solution. It’s called the EXOSTIV FPGA Debug Probe and it uses the [bulletproof] high-speed SerDes ports that are pervasive throughout Xilinx All Programmable device families to extract debug data from running devices with great alacrity.
Here’s a 3-minute video showing the EXOSTIV FPGA Debug Probe communicating with a Xilinx Virtex UltraScale VCU108 Eval Kit, connected through the kit’s high-speed QSFP connector, creating a 50Gbps link between the board and the debugger.
Here’s a second 3-minute video with some additional information. This one shows the EXOSTIV Probe and Dashboard being used to monitor 640 signals in a high-speed video interface design:
You observe the captured data using the EXOSTIV Dashboard, as demonstrated in the above video. The probe and software can handle debug data from as many as 32,768 internal nodes per capture. The mind boggles at the potential complexity being handled here.
According to EXOSTIV, the FPGA Debug Probe and Dashboard give you 200,000x more observability into your design than the tools you might currently be using. That’s a major leap in debugging speed and capability that could save you days or weeks of debugging time.
When you’ve exhausted JTAG’s debug capabilities, consider EXOSTIV.
Today marks the launch of Joshua Montgomery’s Mycroft Mark II open-source Voice Assistant, a hands-free, privacy-oriented smart speaker with a touch screen that also happens to be based on a 6-microphone version of Aaware’s Sound Capture Platform. In fact, according to today’s article on EEWeb written by my good friend and industry gadfly Max Maxfield, Aaware is designing the pcb for the Mycroft Mark II Voice Assistant, which will be based on a Xilinx Zynq UltraScale+ MPSoC. (It’s billed as a “Xilinx quad-core processor” in the Kickstarter project listing.) Max’s article adds: “This PCB will be designed to support different microphone arrays, displays, and cameras such that it can be used for follow-on products that use the Mycroft open-source voice assistant software stack.”
To repeat: That’s an open-source, consumer-level product based on one of the most advanced MPSoCs on the market today with at least two 64-bit Arm Cortex-A53 processors and two 32-bit Arm Cortex-R5 processors plus a generous chunk of the industry’s most advanced programmable logic based on Xilinx’s 16nm UltraScale+ technology.
Aaware’s technology starts with an array of six individual microphones. The outputs of these microphones are combined and processed with several Aaware-developed algorithms including acoustic echo cancellation, noise reduction, and beamforming that allow the Mycroft Mark II smart speaker to isolate the voice of a speaking human even in noisy environments. (See “Looking to turbocharge Amazon’s Alexa or Google Home? Aaware’s Zynq-based kit is the tool you need.”) The combination of Aaware’s Sound Capture Platform, Mycroft’s Mark II smart speaker open-source code, and the immensely powerful Zynq UltraScale+ MPSoC gives you an incredible platform for developing your own end products.
Here’s a 3-minute video demo of the Mycroft Mark II smart speaker’s capabilities:
Pledge $99 on Kickstarter and you’ll get a DIY dev kit that includes the pcbs, an LCD, speakers, and cables but no handsome plastic housing. Pledge $129—thirty bucks more—and you get a built unit in an elegant housing. There are higher pledge levels too.
What’s the risk? As of today, the first day of the pledge campaign, the project is 167% funded, so it’s already a “go.” There are 28 days left to jump in. Also, Mycroft delivered the Mark I speaker, a previous Kickstarter project, last July so the company has a track record of successful Kickstarter project completion.
In a new report titled “Hitting the accelerator: the next generation of machine-learning chips,” Deloitte Global predicted that “by the end of 2018, over 25 percent of all chips used to accelerate machine learning in the data center will be FPGAs and ASICs.” The report then continues: “These new kinds of chips should increase dramatically the use of ML, enabling applications to consume less power and at the same time become more responsive, flexible and capable, which is likely to expand the addressable market.” And later in the Deloitte Global report:
“There will also be over 200,000 FPGA and 100,000 ASIC chips sold for ML applications.”
“…the new kinds of chips may dramatically increase the use of ML, enabling applications to use less power and at the same time become more responsive, flexible and capable, which is likely to expand the addressable market…”
“Total 2018 FPGA chip volume for ML would be a minimum of 200,000. The figure is almost certainly going to be higher, but by exactly how much is difficult to predict.”
These sorts of statements are precisely why Xilinx has rapidly expanded its software offerings for machine-learning development from the edge to the cloud. That includes the reVISION stack for developing responsive and reconfigurable vision systems and the Reconfigurable Acceleration stack for developing and deploying platforms at cloud scale.
Check out the Xilinx Machine Learning Web page for more in-depth information.
XIMEA has added two new high-speed industrial cameras to its xiB-64 family: the 1280x864-pixel CB013 capable of imaging at 3500fps and the 1920x1080-pixel CB019 capable of imaging at 2500fps. As with all digital cameras, the story for these cameras starts with the sensors. The CB013 camera is based on a LUXIMA Technology LUX13HS 1.1Mpixel sensor and the CB019 is based on a LUXIMA Technology LUX19HS 2Mpixel sensor. Both cameras use PCIe 3.0 x8 interfaces capable of 64Gbps sustained transfer rates. Use of the PCIe interface allows a host PC to use DMA for direct transfers of the video stream into the computer’s main memory with virtually no CPU overhead.
Both cameras are also based on a Xilinx Kintex UltraScale KU035 FPGA. Why such a fast FPGA in an industrial video camera? The frame rates and 64Gbps PCIe interface transfer rate are all the explanation you need. The Kintex UltraScale KU035 FPGA has 444K system logic cells and 1700 DSP48E2 slices—ample for handling the different sensors in the camera product line and just about any sort of video processing that’s needed. The Kintex UltraScale FPGA also incorporates two integrated (hardened) PCIe Gen3 IP blocks with sixteen bulletproof 16.3Gbps SerDes transceivers to handle the camera’s PCIe Gen3 interface.
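To put numbers on that explanation, here is a rough back-of-the-envelope sketch in Python that compares each camera's raw pixel stream with the PCIe link. The resolutions and frame rates come from the text above. The bits-per-pixel values are assumptions for illustration only (the sensor bit depth is not stated here), and the PCIe figure is the raw Gen3 line rate (8GT/s per lane, 128b/130b encoding) before any packet or protocol overhead.

```python
# Rough throughput check for the two xiB-64 cameras described above.
# ASSUMPTION: bits per pixel is not given in the text; 8 and 10 are illustrative.

PCIE_GEN3_GT_PER_LANE = 8.0   # PCIe 3.0 signaling rate per lane, in GT/s
PCIE_GEN3_ENCODING = 128 / 130  # 128b/130b line-coding efficiency

# (width, height, frames per second) as quoted in the text
CAMERAS = {
    "CB013": (1280, 864, 3500),
    "CB019": (1920, 1080, 2500),
}


def pixel_rate(width, height, fps):
    """Pixels per second produced by the sensor at full frame rate."""
    return width * height * fps


def stream_gbps(width, height, fps, bits_per_pixel):
    """Raw video stream bandwidth in Gbps for an assumed pixel depth."""
    return pixel_rate(width, height, fps) * bits_per_pixel / 1e9


def pcie_gen3_gbps(lanes=8):
    """Usable PCIe Gen3 line rate in Gbps after 128b/130b encoding."""
    return PCIE_GEN3_GT_PER_LANE * lanes * PCIE_GEN3_ENCODING


link = pcie_gen3_gbps(lanes=8)
for name, (w, h, fps) in CAMERAS.items():
    for bpp in (8, 10):  # hypothetical pixel depths
        rate = stream_gbps(w, h, fps, bpp)
        print(f"{name} @ {bpp} bits/pixel: {rate:.2f} Gbps "
              f"({100 * rate / link:.0f}% of a Gen3 x8 link)")
```

Under these assumptions, both cameras generate tens of gigabits per second of pixel data, which is why a slower interface or a smaller FPGA would struggle to keep up.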
Note that XIMEA has previously introduced large camera family lines based on Xilinx FPGAs, such as the xiC family based on the Xilinx Artix-7 XC7A50T FPGA. (See “Ximea builds 10-member xiC family of USB3 industrial video cameras around Sony CMOS sensors and Xilinx Artix-7 FPGA.”)
For more information about these cameras, please contact XIMEA directly.
Xcell Daily has covered the FPGA-accelerated AWS EC2 F1 instances from Amazon Web Services several times. The AWS EC2 F1 instances allow AWS customers to develop accelerated code in C, C++, OpenCL, Verilog, or VHDL and run it on Amazon servers augmented with hardware-accelerator cards based on multiple Xilinx Virtex UltraScale+ VU9P FPGAs. (See below.)
A new AWS case study titled “Xilinx Speeds Testing Time, Increases Developer Productivity Using AWS” turns the tables. It discusses Xilinx’s use of AWS services to speed development of Xilinx development software such as the Vivado and SDx development environments. Xilinx employs extensive regression testing when developing new releases of these complex tools and the resulting demand spikes called for more “elastic” server resources. (Amazon’s “EC2” designation stands for “Elastic Compute Cloud.”)
As the case study states:
“Xilinx addressed its infrastructure-scaling problem by migrating to a high-performance computing (HPC) cluster running on Amazon Web Services (AWS). ‘We evaluated several cloud providers and chose AWS because it had the best tools and most mature solution,’ says [Ambs] Kesavan, [software engineering and DevOps director at Xilinx].”
For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:
Everspin’s nvNITRO NVMe Storage Accelerator is a persistent-memory PCIe storage card for cloud and data-center applications that delivers up to 1.46 million IOPS for random 4Kbyte mixed 70/30 read/write operations. It’s based on Everspin’s STT-MRAM (spin-transfer torque magnetic RAM) chips and uses a Xilinx Kintex UltraScale KU060 FPGA to implement the MRAM controller and the board’s PCIe Gen3 x8 host interface. Everspin has just published an nvNITRO application note titled “Accelerating Fintech Applications with Lossless and Ultra-Low Latency Synchronous Logging using nvNITRO” that details the use of the nvNITRO Storage Accelerator to speed cloud-based financial transactions. The application note explores how Everspin nvNITRO technology can improve FinTech (Financial Technology) performance without creating additional compliance risks.
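For a sense of scale, the IOPS figure can be turned into data throughput with a few lines of Python. This is only a rough sketch: it assumes that "4Kbyte" means 4096 bytes, and the second number is simply the average spacing between completed operations at that rate, not a measured per-operation latency.

```python
# Convert the quoted IOPS figure into throughput.
# ASSUMPTION: a "4Kbyte" operation is taken to be 4096 bytes.

IOPS = 1.46e6            # random mixed 70/30 read/write, from the text
BLOCK_BYTES = 4 * 1024   # assumed block size in bytes


def throughput_gbytes_per_s(iops, block_bytes):
    """Aggregate data rate in Gbytes/s (1 Gbyte = 1e9 bytes)."""
    return iops * block_bytes / 1e9


def mean_interval_ns(iops):
    """Average time between completed operations, in nanoseconds."""
    return 1e9 / iops


gbytes = throughput_gbytes_per_s(IOPS, BLOCK_BYTES)
print(f"Throughput: {gbytes:.2f} Gbytes/s ({gbytes * 8:.1f} Gbps)")
print(f"Mean interval between operations: {mean_interval_ns(IOPS):.0f} ns")
```

At roughly 6 Gbytes/s, the card is using a large share of what a PCIe Gen3 x8 link can carry.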
If you haven’t looked deeply into the intricacies of financial trading transactions, the app note starts with a clarifying block diagram, which shows the multiple layers built into the transaction process:
The diagram shows many opportunities for accelerating transactions, which is important because in this market, microseconds translate into millions of dollars gained or lost.
If you’re developing cloud-based systems and acceleration is important, whether or not you’re developing FinTech applications, take a few minutes to read the Everspin app note.
For more information about the Everspin nvNITRO Storage Accelerator, see “Everspin’s new MRAM-based nvNITRO NVMe card delivers Optane-crushing 1.46 million IOPS (4Kbyte, mixed 70/30 read/write).”
Please contact Everspin for more information about the nvNITRO Storage Accelerator.
5G NR gNodeB deployments start as early as mid CY2019. Three key challenges:
A Xilinx session at Mobile World Congress (MWC) on March 1 titled “Enabling 5G NR Deployments” will discuss these three facets of 5G NR and strategies needed to overcome these challenges in three separate presentations. The presentation titles are:
The three presenters are:
The three presenters will also discuss real-world lessons learned while working with the supply chain—including operators, system vendors, semiconductor and software providers—in building 5G proofs of concepts and trial testbeds.
The hour-long event starts at 1:00pm and is being held in Hall 8.0, NEXTech Theatre F at MWC in Barcelona.
Click here for more details.
Earlier this month at the Xilinx Developers Forum (XDF) in Frankfurt, Huawei’s Principal Hardware Architect Craig Davies gave a half-hour presentation about Huawei Cloud’s FaaS (FPGAs as a Service). His primary mission: to enlist new Huawei Cloud partners to expand the company’s FACS (FPGA Accelerated Cloud Server) FaaS ecosystem. (Huawei announced the FACS offering at HUAWEI CONNECT 2017 last September, see “Huawei bases new, accelerated cloud service and FPGA Accelerated Cloud Server on Xilinx Virtex UltraScale+ FPGAs.”)
Huawei’s FACS cloud offering is based on a PCIe server card that incorporates a Xilinx Virtex UltraScale+ VU9P FPGA. (Huawei also offers the board for on-premise installations.) In addition to the hardware, Huawei offers three major development tools for FACS:
With these offerings, Davies said, Huawei is looking to add partners to expand its ecosystem and is particularly interested in talking to companies that offer:
There’s a Huawei Cloud Marketplace that serves as an outlet for FACS applications. The company is also welcoming end users to try the service.
Here’s a video of Davies’ 32-minute presentation at XDF:
Amazon’s Senior Director of Business Development and Product, Gadi Hutt, gave an in-depth presentation at the recent Xilinx Developers Forum in Frankfurt, Germany where he detailed the specifics, advantages, and the nuts-and-bolts “how to” with respect to using the FPGA-based AWS EC2 F1 instances to accelerate your business.
First, Hutt gave one of the most succinct definitions of “the cloud” I’ve heard: “the on-demand delivery of compute, storage, networking, etc. services.” This definition is free of the niggling details such as hardware, networking, power, and cooling that you are now free to ignore.
Then Hutt listed the advantages of cloud-based services:
From there, Hutt provided a deep explanation of the steps you need to take to distribute cloud-based services globally. He also quoted a Gartner estimate, which said that AWS (Amazon Web Services) has more compute capacity than all of the other cloud providers combined. Certainly, this Gartner report puts AWS far in the upper right corner of the Gartner Magic Quadrant for Cloud Infrastructure as a Service, Worldwide.
Using AWS allows your company to “get out of IT” and focus on providing specialized services where you can add value, said Hutt. “You can focus on your core business,” he continued.
Then he turned to the specifics of the AWS EC2 F1 instances, which are based on multiple Xilinx Virtex UltraScale+ VU9P FPGAs. Two of the many points Hutt made include:
“There’s pretty good maturity in the software ecosystem, today,” Hutt observed.
One of Hutt’s conclusions with respect to AWS EC2 F1 instances:
“There’s a tremendous opportunity for FPGAs to shine in a number of areas.”
If you’re interested in FPGA-based cloud acceleration, here’s the 48-minute video with Gadi Hutt’s full presentation at XDF:
For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:
Earlier this month, Xilinx held a developer’s forum in Frankfurt, Germany and Xilinx’s Senior Director for Software and IP Ramine Roane discussed the growing role of Xilinx All Programmable devices in his opening remarks, which appear in a New Electronics article written by Neil Tyler titled “Resurgence of interest in FPGAs helped by new services via the Cloud.” Roane started by stating something that any design team already knows: CPU architectures are failing to meet the demand of increasing workloads because Dennard frequency and power scaling (often erroneously lumped into Moore’s Law, which is really about transistor and density scaling) essentially died several years ago after several decades of robust health. The current workaround, multicore architectures, rapidly hits its own limits in most embedded systems where there just aren’t enough tasks to distribute to dozens of processor cores.
The article then quotes Roane:
“There are too many transistors switching at the same time and current leakage at lower geometries is hitting power constraint limits, and this is all happening at a time when workload demand is growing exponentially both in the Cloud and at the edge.”
One solution, hardware application accelerators, only makes sense if the production volumes are justified. For that, you need a killer app, said Roane.
Problem: there just aren’t that many killer apps.
The current situation plays to the strengths of Xilinx All Programmable devices, which can be reconfigured for a truly wide range of applications. “They provide configurable processor sub-systems and hardware that can be reconfigured dynamically,” said Roane.
The problem, of course, is that taking advantage of the programmable hardware resources in Xilinx devices has not been as easy as it might be. In the past, you needed specialized hardware-design skills; you needed to know Verilog or VHDL; you needed to wade into possibly unfamiliar hardware waters.
Roane emphasized that things are very different today. As the article states, “Xilinx and its growing ecosystem of partners are now delivering a much richer development stack so that hardware, embedded and application software developers can program them more easily by using higher level programming options, like C, C++ and OpenCL.”
“We are now able to deliver a development stack that designers are increasingly familiar with and which is also available on the Cloud via secure cloud services platforms,” added Roane, referring to Xilinx-based cloud acceleration offerings from Amazon Web Services (AWS EC2 F1 instances) and Alibaba Cloud.
For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:
For more information about the Xilinx-based Alibaba Cloud F2 offering in Xcell Daily, see: