Last week, the Mycroft Mark II Privacy-Centric Open Voice Assistant Kickstarter Project, which is based on Aaware’s far-field Sound Capture Platform and the Xilinx Zynq UltraScale+ MPSoC, hit 300% funding on Kickstarter. Today, the pledge level hit 400%—$200k—with 1235 backers. There are still 18 days left in the funding campaign; still time for you to get in on this very interesting, multi-talented smart speaker and low-cost, open-source Zynq UltraScale+ MPSoC development platform.
Meanwhile, there seems to be a new pledge level that I don’t recall seeing before: a $179 level that includes a 1080p video camera. That’s in addition to the touch screen and voice input, giving the Mycroft Mark II an even more interesting user interface. Only a limited number of $179 pledges are available, with 177 remaining as of the posting of this blog.
In addition, Fast Company has published an article on the Mycroft Mark II Kickstarter project titled “Can Mycroft’s Privacy-Centric Voice Assistant Take On Alexa And Google?” Be sure to take a look.
For more information about the Mycroft Mark II Open Voice Assistant, see:
For more information about Aaware’s far-field Sound Capture Platform, see:
Last week, Xilinx posted a 2-minute video showing a Xilinx Virtex UltraScale+ XCVU37P HBM-enhanced FPGA operating with the on-device HBM DRAM communicating at full speed (460Gbytes/sec), error-free, over 32 channels. (See “Virtex UltraScale+ FPGA augmented with co-packaged HBM DRAM operating at full speed (460Gbytes/sec), error-free, on the very first day of silicon bringup.”)
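For a sense of scale, that aggregate figure works out to a tidy per-channel rate. (The division below is my own back-of-the-envelope arithmetic, not a Xilinx specification.)

```python
# 460 Gbytes/sec aggregate HBM bandwidth spread across 32 channels
aggregate_gbytes_per_sec = 460
channels = 32

per_channel = aggregate_gbytes_per_sec / channels
print(f"{per_channel} Gbytes/sec per channel")  # 14.375 Gbytes/sec per channel
```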
Bittware has already announced two PCIe boards for these HBM-enhanced Xilinx FPGAs:
Bittware’s XUPVVH double-slot PCIe board, block diagram
Bittware’s XUPSVH single-slot PCIe board, block diagram
As a reminder, Bittware had previously announced the XUPVV4 based on the Virtex UltraScale+ VU13P FPGA. (See “Warning, Naked FPGA: Bittware’s XUPVV4 PCIe card goes big using the Xilinx Virtex UltraScale+ VU13P “lidless” FPGA.”)
Please contact Bittware directly for more information about these PCIe boards.
Rigol’s new RSA5000 real-time spectrum analyzer allows you to capture, identify, isolate, and analyze complex RF signals with a 40MHz real-time bandwidth over either a 3.2GHz or 6.5GHz signal span. It’s designed for engineers working on RF designs in the IoT and IIoT markets as well as industrial, scientific, and medical equipment. Rigol was demonstrating the RSA5000 real-time spectrum analyzer at this week’s DesignCon, held at the Santa Clara Convention Center. I listened to a presentation from Rigol’s North American General Manager Mike Rizzo and then a demo by Rigol’s Director of Product Marketing & Software Applications Chris Armstrong, both captured in the 2.5-minute video below.
Rigol RSA5000 Real-Time Spectrum Analyzer
Based on what I saw in the demo, this is an extremely responsive instrument—far more responsive than a swept spectrum analyzer—with several visualization display modes to help you isolate the significant signal in a sea of signals and noise, in real time. It’s capable of continuously executing 146,484 FFTs/sec, which results in a minimum 100% POI (probability of intercept) of 7.45μsec. You need some real DSP horsepower to achieve that sort of performance, and the Rigol RSA5000 real-time spectrum analyzer gets it from a pair of Xilinx Zynq Z-7015 SoCs. (You’ll find many more details about real-time spectrum analysis and the RSA5000 Real-Time Spectrum Analyzer in the Rigol app note “Realtime Spectrum Analyzer vs Spectrum Analyzer,” attached at the end of this post.)
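Some rough arithmetic (mine, not Rigol’s) connects the FFT rate to the POI number: at 146,484 FFTs/sec, each FFT frame spans roughly 6.8μsec, and a signal must overlap at least one full frame to be captured at full amplitude.

```python
# Each FFT frame's time span is the reciprocal of the FFT rate.
ffts_per_sec = 146_484
frame_time_us = 1e6 / ffts_per_sec
print(f"{frame_time_us:.2f} usec per FFT frame")  # 6.83 usec per FFT frame
```

The published 7.45μsec minimum-duration figure is a little longer than one frame time, which is consistent with that picture.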
Rigol RSA5000 Real-Time Spectrum Analyzer Display Modes
Here’s the short presentation and demo of the Rigol RSA5000 real-time spectrum analyzer from DesignCon 2018:
Mike Rizzo told me that the Rigol design engineers selected the Zynq Z-7015 SoCs for three main reasons:
If you’re looking for a very capable spectrum analyzer, give the Rigol RSA5000 a look. If you’re designing your own real-time system and need high-speed computation coupled with fast user response, take a look at the line of Xilinx Zynq SoCs and Zynq UltraScale+ MPSoCs.
The 2-minute video below shows you an operational Xilinx Virtex UltraScale+ XCVU37P FPGA, which is enhanced with co-packaged HBM (high-bandwidth memory) DRAM using Xilinx’s well-proven, 3rd-generation 3D manufacturing process. (Xilinx started shipping 3D FPGAs way back in 2011, starting with the Virtex-7 2000T and we’ve been shipping these types of devices ever since.)
This video was made on the very first day of silicon bringup for the device and it is already operating at full speed (460Gbytes/sec), error-free, over 32 channels. The Virtex UltraScale+ XCVU37P is one big All Programmable device with:
Whatever your requirements, whatever your application, chances are this extremely powerful FPGA will deliver all of the heavy lifting (processing, memory, and I/O) that you need.
Here’s the video:
For more information about the Virtex UltraScale+ HBM-enhanced device family, see “Xilinx Virtex UltraScale+ FPGAs incorporate 32 or 64Gbits of HBM, delivers 20x more memory bandwidth than DDR.”
Quite simply, Vadatech’s AMC584 module is an I/O monster. Its immense I/O capabilities start with the five QSFP28 100GbE-capable cages on the module’s front panel. Then there are the AMC Tongues. AMC Tongue 1 is fully routed with SerDes ports and there are as many as 20 lanes routed to Tongue 2. The AMC584 also contains a high-speed Zone 3 connector that provides the primary digital I/O routing and enables multi-module configurations.
The SerDes ports on these boards are all implemented in a Xilinx Virtex UltraScale+ XCVU13P FPGA, which is itself an I/O monster. It has 128 on-chip GTY SerDes transceivers, each capable of 32.75Gbps, making it an ideal foundation for an I/O monster board.
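A quick tally (my arithmetic) of the aggregate serial bandwidth shows why “monster” is the right word:

```python
# Aggregate serial bandwidth of the XCVU13P's GTY transceivers
transceivers = 128
gbps_each = 32.75

total_gbps = transceivers * gbps_each
print(f"{total_gbps} Gbps total")  # 4192.0 Gbps total, i.e. nearly 4.2 Tbps
```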
Here’s a block diagram of the Vadatech AMC584 module:
Vadatech AMC584 Module Block Diagram
Now, before you get the idea that the Virtex UltraScale+ XCVU13P FPGA is just I/O, please understand that there are also 3780K system logic cells, 12,288 DSP48E2 slices, 94.5Mbits of BRAM, and 360Mbits of UltraRAM on the device as well, so it’s a DSP monster and a processing monster too. The Virtex UltraScale+ XCVU13P FPGA is capable of implementing just about any system you might imagine.
And just in case the hundreds of Mbits of SRAM on the Virtex UltraScale+ XCVU13P FPGA aren’t sufficient for your processing needs, the AMC584 module also has two banks of DDR4 SDRAM on board.
Vadatech AMC584 Module
Please contact Vadatech directly for more information about the AMC584 Module.
Mycroft AI’s Mycroft Mark II Open Voice Assistant, which is based on Aaware’s far-field Sound Capture Platform and the Xilinx Zynq UltraScale+ MPSoC, is a Kickstarter project initiated last Friday. (See “New Kickstarter Project: The Mycroft Mark II open-source Voice Assistant is based on Aaware’s Sound Capture Platform running on a Zynq UltraScale+ MPSoC.”) The Mycroft Mark II project was fully funded in an astonishingly short seven hours, guaranteeing that the project would proceed. After only four days, the project has reached 300% of its $50,000 pledge goal. As of this writing, 935 backers have pledged $150,801, so the project is most definitely a “go” and the project team is currently developing stretch goals to extend the project’s scope.
Here are two reasons you might want to participate in this Kickstarter campaign:
Mycroft Mark II Voice Assistant X-Ray Diagram
If you’d like some intense training on the Xilinx Zynq UltraScale+ MPSoC—one of the most powerful embedded application processor (plus programmable logic) families that you can throw at an embedded-processing application—then Hardent’s 3-day class titled “Embedded System Design for the Zynq UltraScale+ MPSoC” might just be what you’re looking for. There’s a live, E-Learning version kicking off February 7 with live, in-person classes scheduled for North America from February 21 (in Ottawa) through August. The schedule’s on the referenced Web page.
You certainly might want a comprehensive course outline before you decide, so here it is:
Curtiss-Wright’s VPX3-535 3U OpenVPX transceiver module implements a single-slot, dual-channel, 6Gsamples/sec analog data-acquisition and processing system using two 12-bit, 6Gsamples/sec ADCs and two 12-bit, 6Gsamples/sec DACs. This is the type of capability you need for demanding applications such as radar, Signal Intelligence (SIGINT), Electronic Warfare (EW), and Software Defined Radio (SDR). This amount of analog-to-digital and digital-to-analog conversion capability demands wicked-fast digital processing and on the VPX3-535 transceiver module, that digital processing comes in the form of two of Xilinx’s most powerful All Programmable devices: a Virtex UltraScale+ VU9P and a Zynq UltraScale+ ZU4 MPSoC.
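To see why the converters demand wicked-fast digital processing, consider the raw sample traffic they generate. (This is my back-of-the-envelope arithmetic, ignoring packing and framing overhead.)

```python
# Raw sample traffic for two 12-bit, 6 Gsamples/sec converter channels
channels = 2
bits_per_sample = 12
gsamples_per_sec = 6

adc_gbps = channels * bits_per_sample * gsamples_per_sec  # ADC side
dac_gbps = channels * bits_per_sample * gsamples_per_sec  # DAC side
print(adc_gbps + dac_gbps, "Gbps of converter traffic")  # 288 Gbps of converter traffic
```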
Here’s a block diagram of the Curtiss-Wright VPX3-535 module:
The VPX3-535 is Curtiss-Wright’s first publicly announced module featuring full compliance with the VITA 48.8 Air-Flow-Through (AFT) cooling standard, which ensures optimal performance in the harshest conditions. VITA 48.8 provides a low-cost, effective means of cooling high-power COTS 3U and 6U VPX modules that dissipate roughly 150W or more.
At the same time, Curtiss-Wright is also introducing a conduction-cooled variant, called the VPX3-534, which is designed for applications that do not require the full performance of the VPX3-535. The VPX3-534 supports the same dual-channel, 12-bit, 6Gsamples/sec ADC and DAC channels as the VPX3-535 but replaces the Virtex UltraScale+ FPGA with a Xilinx Kintex UltraScale KU115 FPGA. This module also supports an option for four 3Gsamples/sec ADC channels.
Please contact Curtiss-Wright directly for more information about the VPX3-535 and VPX3-534 OpenVPX transceiver modules.
Keysight published a 14-minute video back in 2015 that gives you the basics behind RF beamforming and its use in 5G applications. The video also invites you to download a free, 30-day trial of Keysight’s SystemVue with Keysight’s 5G simulation library to try out some of the concepts discussed in the video and the link appears to be active still.
Here’s the video:
Meanwhile, should you need an implementation technology for RF beamforming (5G or otherwise), allow me to suggest that the new Xilinx Zynq UltraScale+ RFSoC with its many integrated RF ADCs and DACs be at the top of your technology choices. There is literally no other device like the Zynq UltraScale+ RFSoC. It’s in a category of one.
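If you’d like to experiment with the basic beamforming math before downloading SystemVue, the narrowband array-factor model is easy to sketch. (This toy model and its parameter values are mine, purely for illustration; they are not taken from the Keysight video or the RFSoC documentation.)

```python
import cmath
import math

def array_factor(n_elems, spacing_wl, steer_deg, look_deg):
    """Magnitude of a uniform linear array's response when its element
    phases are set to steer the beam toward steer_deg (narrowband model)."""
    delta = math.sin(math.radians(look_deg)) - math.sin(math.radians(steer_deg))
    phase_step = 2 * math.pi * spacing_wl * delta  # inter-element phase difference
    return abs(sum(cmath.exp(1j * m * phase_step) for m in range(n_elems)))

# 8 elements at half-wavelength spacing, beam steered to 20 degrees
af_peak = array_factor(8, 0.5, steer_deg=20, look_deg=20)  # coherent sum: 8.0
af_off = array_factor(8, 0.5, steer_deg=20, look_deg=-40)  # well below the peak
print(af_peak, af_off)
```

Sweeping `look_deg` traces out the familiar main lobe and sidelobes of a phased array.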
For more information about the Zynq UltraScale+ RFSoC, see:
By Adam Taylor
The Zynq UltraScale+ MPSoC is a complex system-on-chip containing as many as four Arm Cortex-A53 application processors, a dual-core Arm Cortex-R5 real-time processor, a Mali GPU, and of course programmable logic. When it comes to generating our software application, we want to use the A53-based Application Processing Unit (APU) and the R5-based Real-Time Processing Unit (RPU) appropriately. This means we want to use the APU for computationally intensive, high-level applications or virtualization while using the RPU for real-time control and monitoring.
This means the APU will likely be running an operating system such as Linux while the real-time needs are addressed by the RPU running bare-metal software or a simplified OS such as FreeRTOS. Often, the overall system requires communication between the APU and RPU to achieve the desired functionality. However, inter-processor communication (IPC) between processors running different applications has previously been challenging and ad hoc, relying on shared memory, mailboxes, or even networks. As a result, IPC solutions differed from implementation to implementation and from device to device, which increased development time and hence time to market.
This is inefficient engineering.
To best leverage the capabilities of the Zynq UltraScale+ MPSoC, we need an open framework that abstracts device-specific interfaces and enables the implementation of AMP (asymmetric multi-processing) with greater ease across multiple projects.
OpenAMP, developed by the Multicore Association, provides everything we need to run different operating systems on the APU and RPU. Of course, for OpenAMP to function from processor to processor, we need an abstraction layer that provides device-specific interfaces (e.g. interrupt handlers, memory requests, and device access). The libmetal library provides these for Xilinx devices through several APIs that abstract the processor.
For our Zynq UltraScale+ MPSoC designs, the provided OpenAMP frameworks enable messaging between the master processor and remote processor and lifecycle management of the remote processor using the following structures:
OpenAMP remoteproc and RPMsg concepts
For this example, we are going to run Linux on the APU and a bare-metal application on the RPU using RPMsg within the kernel space. When we run the RPMsg from within the kernel space, the remote application lifecycle must be managed by Linux. This means the remote processor application does not run independently. If we develop the RPMsg application to run within the Linux user space, the remote processor can run independently.
To create this example, we first need to enable remote-processor support within our Linux build. This requires that we rebuild the PetaLinux project, customising the kernel and root filesystem. If you are not familiar with building PetaLinux, you might want to read this blog.
Within our PetaLinux project, the first thing we need to do is enable the remoteproc driver. Using a terminal application within the project, issue the command:
petalinux-config -c kernel
This will open the kernel configuration menu. Here we can enable the remote-processor drivers, which are located under:
Device Drivers -> Remoteproc drivers
Enabling the Remoteproc drivers
The second step is to include the OpenAMP examples within the file system. Again inside the project, issue the command:
petalinux-config -c rootfs
Within the configuration menu, navigate to Filesystem Packages -> misc and enable the packagegroup-petalinux-openamp:
Enabling the package group
The final step before we can rebuild the petalinux image is to update the device tree. We can find an OpenAMP template dtsi file at the location:
Within this location you will find example device trees for both the lockstep and split running modes of the RPU cores.
Select the dtsi file with the desired operating mode and copy the contents into the system-user.dtsi at the following location:
Once the kernel, filesystem, and device tree have been updated, rebuild the PetaLinux image using the command below:
This will generate an updated Linux build that we can copy onto the boot medium of choice and run on our Zynq UltraScale+ MPSoC design.
Using a terminal connected to our preferred development board (in my case the UltraZed), we can test the OpenAMP examples we included within the Linux file system. There are three examples provided: echo test, matrix multiplication, and proxy server.
I ran the matrix-multiply example because it demonstrates the remote processor performing mathematical calculations.
Using the terminal, I entered the following commands:
Following the on-screen menu and commands, I ran the example which provided the results below:
Executing the Matrix Multiply Example
Matrix Multiply example running
This example shows that the OpenAMP framework is running correctly on the Zynq UltraScale+ MPSoC PetaLinux build and that we can begin to create our own applications. If you want to run the other two examples, refer to UG1186.
If we wish to create our own OpenAMP-based application for the RPU using kernel-space RPMsg, we can create it using the SDK and install the generated ELF as an app within PetaLinux. Although this means rebuilding the PetaLinux image again, we will look at how to do that in another blog. There is a lot more for us to explore here.
You can find the example source code on GitHub.
Adam Taylor’s Web site is http://adiuvoengineering.com/.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
First Year E-Book here
First Year Hardback here
Second Year E-Book here
Second Year Hardback here
Do you need an extremely powerful yet extremely tiny SOM to implement a challenging embedded design? Enclustra’s credit-card-sized Mercury+ XU1 is worth your consideration. It packs a Xilinx Zynq UltraScale+ MPSoC, as much as 8Gbytes of DDR4 SDRAM with ECC, 16Gbytes of eMMC Flash memory, two Gbit Ethernet PHYs, two USB 2.0/3.0 PHYs, and 294 user I/O pins on three 168-pin Hirose FX10 connectors into a mere 74x54mm footprint. That’s a lot of computational horsepower in a teeny, tiny package.
Here’s a block diagram of the Mercury+ XU1 SOM:
By itself, the Zynq UltraScale+ MPSoC gives the SOM a tremendous set of resources including:
As with many product family designs, Enclustra is able to tailor the price/performance of the Mercury+ XU1 SOM by offering multiple versions based on different pin-compatible members of the Zynq UltraScale+ MPSoC family.
Please contact Enclustra directly for more information about the Mercury+ XU1 SOM.
If you’re designing and debugging high-speed logic—as you might with video or radar applications, for example—then perhaps you could use some fast debugging capability. As in really fast. As in much, much, much faster than JTAG. EXOSTIV Labs has a solution. It’s called the EXOSTIV FPGA Debug Probe and it uses the bulletproof high-speed SerDes ports that are pervasive throughout Xilinx All Programmable device families to extract debug data from running devices with great alacrity.
Here’s a 3-minute video showing the EXOSTIV FPGA Debug Probe communicating with a Xilinx Virtex UltraScale VCU108 Eval Kit, connected through the kit’s high-speed QSFP connector, creating a 50Gbps link between the board and the debugger.
Here’s a second 3-minute video with some additional information. This one shows the EXOSTIV Probe and Dashboard being used to monitor 640 signals in a high-speed video interface design:
You observe the captured data using the EXOSTIV Dashboard, as demonstrated in the above video. The probe and software can handle debug data from as many as 32,768 internal nodes per capture. The mind boggles at the potential complexity being handled here.
According to EXOSTIV, the FPGA Debug Probe and Dashboard give you 200,000x more observability into your design than the tools you might currently be using. That’s a major leap in debugging speed and capability that could save you days or weeks of debugging time.
When you’ve exhausted JTAG’s debug capabilities, consider EXOSTIV.
Today marks the launch of Joshua Montgomery’s Mycroft Mark II open-source Voice Assistant, a hands-free, privacy-oriented smart speaker with a touch screen that also happens to be based on a 6-microphone version of Aaware’s Sound Capture Platform. In fact, according to today’s article on EEWeb written by my good friend and industry gadfly Max Maxfield, Aaware is designing the PCB for the Mycroft Mark II Voice Assistant, which will be based on a Xilinx Zynq UltraScale+ MPSoC. (It’s billed as a “Xilinx quad-core processor” in the Kickstarter project listing.) According to Max’s article, “This PCB will be designed to support different microphone arrays, displays, and cameras such that it can be used for follow-on products that use the Mycroft open-source voice assistant software stack.”
To repeat: That’s an open-source, consumer-level product based on one of the most advanced MPSoCs on the market today, with at least two 64-bit Arm Cortex-A53 processors and two 32-bit Arm Cortex-R5 processors, plus a generous chunk of the industry’s most advanced programmable logic based on Xilinx’s 16nm UltraScale+ technology.
Aaware’s technology starts with an array of six individual microphones. The outputs of these microphones are combined and processed with several Aaware-developed algorithms, including acoustic echo cancellation, noise reduction, and beamforming, that allow the Mycroft Mark II smart speaker to isolate the voice of a speaking human even in noisy environments. (See “Looking to turbocharge Amazon’s Alexa or Google Home? Aaware’s Zynq-based kit is the tool you need.”) The combination of Aaware’s Sound Capture Platform, Mycroft’s Mark II smart speaker open-source code, and the immensely powerful Zynq UltraScale+ MPSoC gives you an incredible platform for developing your own end products.
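To get an intuitive feel for the beamforming piece, here’s a toy delay-and-sum sketch in pure Python. (This is my own illustrative model, not Aaware’s algorithm; real implementations add fractional delays, adaptive weighting, echo cancellation, and much more.)

```python
import math

def delay_and_sum(signals, delays):
    """Average the microphone channels after delaying channel k by delays[k] samples."""
    n = min(len(s) - d for s, d in zip(signals, delays))
    return [sum(s[d + i] for s, d in zip(signals, delays)) / len(signals)
            for i in range(n)]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

# Simulate a tone reaching 6 mics with 2-sample inter-mic arrival delays.
tone = [math.sin(0.3 * t) for t in range(200)]
mics = [[0.0] * (2 * k) + tone for k in range(6)]

aligned = delay_and_sum(mics, [2 * k for k in range(6)])          # steered at source
misaligned = delay_and_sum(mics, [2 * (5 - k) for k in range(6)])  # steered elsewhere

print(rms(aligned) > 3 * rms(misaligned))  # prints True: steering suppresses off-axis energy
```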
Here’s a 3-minute video demo of the Mycroft Mark II smart speaker’s capabilities:
Pledge $99 on Kickstarter and you’ll get a DIY dev kit that includes the PCBs, an LCD, speakers, and cables but no handsome plastic housing. Pledge $129—thirty bucks more—and you get a built unit in an elegant housing. There are higher pledge levels too.
What’s the risk? As of today, the first day of the pledge campaign, the project is 167% funded, so it’s already a “go.” There are 28 days left to jump in. Also, Mycroft delivered the Mark I speaker, a previous Kickstarter project, last July so the company has a track record of successful Kickstarter project completion.
XIMEA has added two new high-speed industrial cameras to its xiB-64 family: the 1280x864-pixel CB013 capable of imaging at 3500fps and the 1920x1080-pixel CB019 capable of imaging at 2500fps. As with all digital cameras, the story for these cameras starts with the sensors. The CB013 camera is based on a LUXIMA Technology LUX13HS 1.1Mpixel sensor and the CB019 is based on a LUXIMA Technology LUX19HS 2Mpixel sensor. Both cameras use PCIe 3.0 x8 interfaces capable of 64Gbps sustained transfer rates. Use of the PCIe interface allows a host PC to use DMA for direct transfers of the video stream into the computer’s main memory with virtually no CPU overhead.
Both cameras are also based on a Xilinx Kintex UltraScale KU035 FPGA. Why such a fast FPGA in an industrial video camera? The frame rates and 64Gbps PCIe interface transfer rate are all the explanation you need. The Kintex UltraScale KU035 FPGA has 444K system logic cells and 1700 DSP48E2 slices—ample for handling the different sensors in the camera product line and just about any sort of video processing that’s needed. The Kintex UltraScale FPGA also incorporates two integrated (hardened) PCIe Gen3 IP blocks with sixteen bulletproof 16.3Gbps SerDes transceivers to handle the camera’s PCIe Gen3 interface.
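It’s worth sanity-checking those frame rates against the 64Gbps PCIe budget. (The 10-bit pixel depth below is my assumption for illustration; check the LUXIMA sensor datasheets for the actual output format.)

```python
# Raw pixel-data rates for the two cameras, assuming 10 bits/pixel
def raw_gbps(width, height, fps, bits_per_pixel=10):
    return width * height * fps * bits_per_pixel / 1e9

cb013 = raw_gbps(1280, 864, 3500)   # CB013 at 3500 fps
cb019 = raw_gbps(1920, 1080, 2500)  # CB019 at 2500 fps
print(f"CB013 {cb013:.1f} Gbps, CB019 {cb019:.1f} Gbps (PCIe link: 64 Gbps)")
```

Both raw streams fit comfortably within the 64Gbps sustained PCIe transfer rate under that assumption.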
Note that XIMEA has previously introduced large camera family lines based on Xilinx FPGAs, such as the xiC family based on the Xilinx Artix-7 XC7A50T FPGA. (See “Ximea builds 10-member xiC family of USB3 industrial video cameras around Sony CMOS sensors and Xilinx Artix-7 FPGA.”)
For more information about these cameras, please contact XIMEA directly.
Everspin’s nvNITRO NVMe Storage Accelerator is a persistent-memory PCIe storage card for cloud and data-center applications that delivers up to 1.46 million IOPS for random 4Kbyte mixed 70/30 read/write operations. It’s based on Everspin’s STT-MRAM (spin-transfer torque magnetic RAM) chips and uses a Xilinx Kintex UltraScale KU060 FPGA to implement the MRAM controller and the board’s PCIe Gen3 x8 host interface. Everspin has just published an nvNITRO application note titled “Accelerating Fintech Applications with Lossless and Ultra-Low Latency Synchronous Logging using nvNITRO” that details the use of the nvNITRO Storage Accelerator to speed cloud-based financial transactions. The application note explores how Everspin nvNITRO technology can improve FinTech (Financial Technology) performance without creating additional compliance risks.
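Those IOPS numbers sit close to the practical limit of the card’s host link, as a little arithmetic (mine, not Everspin’s) shows:

```python
# Sustained throughput implied by 1.46M IOPS of 4Kbyte transfers
iops = 1.46e6
bytes_per_io = 4096
throughput_gbytes = iops * bytes_per_io / 1e9
print(f"{throughput_gbytes:.2f} Gbytes/sec")  # 5.98 Gbytes/sec

# PCIe Gen3 x8 raw capacity: 8 GT/sec/lane x 8 lanes x 128b/130b encoding
pcie_gbytes = 8 * 8 * (128 / 130) / 8
print(f"{pcie_gbytes:.2f} Gbytes/sec link capacity")  # 7.88 Gbytes/sec link capacity
```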
If you haven’t looked deeply into the intricacies of financial trading transactions, the app note starts with a clarifying block diagram, which shows the multiple layers built into the transaction process:
The diagram shows many opportunities for accelerating transactions, which is important because in this market, microseconds translate into millions of dollars gained or lost.
If you’re developing cloud-based systems and acceleration is important, whether or not you’re developing FinTech applications, take a few minutes to read the Everspin app note.
For more information about the Everspin nvNITRO Storage Accelerator, see “Everspin’s new MRAM-based nvNITRO NVMe card delivers Optane-crushing 1.46 million IOPS (4Kbyte, mixed 70/30 read/write).”
Please contact Everspin for more information about the nvNITRO Storage Accelerator.
Earlier this month at the Xilinx Developers Forum (XDF) in Frankfurt, Huawei’s Principal Hardware Architect Craig Davies gave a half-hour presentation about Huawei Cloud’s FaaS (FPGAs as a Service). His primary mission: to enlist new Huawei Cloud partners to expand the company’s FACS (FPGA Accelerated Cloud Server) FaaS ecosystem. (Huawei announced the FACS offering at HUAWEI CONNECT 2017 last September, see “Huawei bases new, accelerated cloud service and FPGA Accelerated Cloud Server on Xilinx Virtex UltraScale+ FPGAs.”)
Huawei’s FACS cloud offering is based on a PCIe server card that incorporates a Xilinx Virtex UltraScale+ VU9P FPGA. (Huawei also offers the board for on-premise installations.) In addition to the hardware, Huawei offers three major development tools for FACS:
With these offerings, Davies said, Huawei is looking to add partners to expand its ecosystem and is particularly interested in talking to companies that offer:
There’s a Huawei Cloud Marketplace that serves as an outlet for FACS applications. The company is also welcoming end users to try the service.
Here’s a video of Davies’ 32-minute presentation at XDF:
Amazon’s Senior Director of Business Development and Product, Gadi Hutt, gave an in-depth presentation at the recent Xilinx Developers Forum in Frankfurt, Germany, where he detailed the specifics, advantages, and nuts-and-bolts “how to” of using the FPGA-based AWS EC2 F1 instances to accelerate your business.
First, Hutt gave one of the most succinct definitions of “the cloud” I’ve heard: “the on-demand delivery of compute, storage, networking, etc. services.” This definition omits the niggling details, such as hardware, networking, power, and cooling, that you are now free to ignore.
Then Hutt listed the advantages of cloud-based services:
From there, Hutt provided a deep explanation of the steps you need to take to distribute cloud-based services globally. He also quoted a Gartner estimate, which said that AWS (Amazon Web Services) has more compute capacity than all of the other cloud providers combined. Certainly, this Gartner report puts AWS far in the upper right corner of the Gartner Magic Quadrant for Cloud Infrastructure as a Service, Worldwide.
Using AWS allows your company to “get out of IT” and focus on providing specialized services where you can add value, said Hutt. “You can focus on your core business,” he continued.
Then he turned to the specifics of the AWS EC2 F1 instances, which are based on multiple Xilinx Virtex UltraScale+ VU9P FPGAs. Two of the many points Hutt made include:
“There’s pretty good maturity in the software ecosystem, today,” Hutt observed.
One of Hutt’s conclusions with respect to AWS EC2 F1 instances:
“There’s a tremendous opportunity for FPGAs to shine in a number of areas.”
If you’re interested in FPGA-based cloud acceleration, here’s the 48-minute video with Gadi Hutt’s full presentation at XDF:
For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:
A quick look at the latest product table for the Xilinx Zynq UltraScale+ RFSoC will tell you that the sample rate for the devices’ RF-class, 14-bit DAC has jumped to 6.554Gsamples/sec, up from 6.4Gsamples/sec. I asked Senior Product Line Manager Wouter Suverkropp about the change and he told me that the increase supports “…an extra level of oversampling for DOCSIS3.1 [designers]. The extra oversampling gives them 3dB processing gain and therefore simplifies the external circuits even further.”
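The 3dB figure follows from the usual oversampling relationship: each doubling of the oversampling ratio spreads the fixed quantization-noise power over twice the bandwidth, yielding 10·log10(2) dB of processing gain in the band of interest. In code:

```python
import math

# Processing gain from doubling the oversampling ratio
gain_db = 10 * math.log10(2)
print(f"{gain_db:.2f} dB")  # 3.01 dB
```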
Zynq UltraScale+ RFSoC Conceptual Diagram
For more information about the Zynq UltraScale+ RFSoC, see:
Javier Alejandro Varela and Professor Dr.-Ing. Norbert Wehn of the University of Kaiserslautern’s Microelectronic Systems Design Research Group have just published a White Paper titled “Running Financial Risk Management Applications on FPGA in the Amazon Cloud” and the last sentence in the White Paper’s abstract reads:
“…our FPGA implementation achieves a 10x speedup on the compute intensive part of the code, compared to an optimized parallel implementation on multicore CPU, and it delivers a 3.5x speedup at application level for the given setup.”
The University of Kaiserslautern’s Microelectronic Systems Design Research Group has been working on accelerating financial applications using FPGAs in connection with high-performance computing systems since 2010, and that research has recently migrated to cloud-based computing systems including Amazon’s EC2 F1 instance, which is based on Xilinx Virtex UltraScale+ FPGAs. The results in this White Paper are based on OpenCL code and the Xilinx SDAccel development environment.
For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:
Xilinx has announced availability of automotive-grade Zynq UltraScale+ MPSoCs, enabling development of safety-critical ADAS and autonomous-driving systems. The 4-member Xilinx Automotive XA Zynq UltraScale+ MPSoC family is qualified according to AEC-Q100 test specifications with full ISO 26262 ASIL-C level certification and is ideally suited for a variety of automotive platforms, delivering the right performance/watt while integrating critical functional-safety and security features.
The XA Zynq UltraScale+ MPSoC family has been certified to meet ISO 26262 ASIL-C level requirements by Exida, one of the world's leading accredited certification companies specializing in automation and automotive system safety and security. The product includes a "safety island" designed for real-time processing functional safety applications that has been certified to meet ISO 26262 ASIL-C level requirements. In addition to the safety island, the device’s programmable logic can be used to create additional safety circuits tailored for specific applications such as monitors, watchdogs, or functional redundancy. These additional hardware safety blocks effectively allow ASIL decomposition and fault-tolerant architecture designs within a single device.
Bitmain manufactures Bitcoin, Litecoin, and other cryptocurrency mining machines and currently operates the world’s largest cryptocurrency mines. The company’s latest-generation Bitcoin miner, the Antminer S9, incorporates 189 of Bitmain’s 16nm ASIC, the BM1387, which performs the Bitcoin hash algorithm at a rate of 14 TeraHashes/sec. (See “Heigh ho! Heigh ho! Bitmain teams 189 bitcoin-mining ASICs with a Zynq SoC to create world's most powerful bitcoin miner.”) The company also uses one Zynq Z-7010 SoC to control those 189 hash-algorithm ASICs.
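A quick division (my arithmetic) gives the per-chip contribution of those 189 ASICs:

```python
# Per-ASIC share of the Antminer S9's aggregate hash rate
total_hashrate_th = 14  # TeraHashes/sec for the whole machine
asic_count = 189

per_asic_gh = total_hashrate_th * 1000 / asic_count
print(f"{per_asic_gh:.1f} GigaHashes/sec per BM1387")  # 74.1 GigaHashes/sec per BM1387
```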
Bitmain’s Antminer S9 Bitcoin Mining Machine uses a Zynq Z-7010 SoC as a main control processor
The Powered by Xilinx program has just published a 3-minute video containing an interview with Yingfei Li, Bitmain’s Marketing Director, and Wenguo Zhang, Bitmain’s Hardware R&D Director. In the video, Zhang explains that the Zynq Z-7010 solved multiple hidden problems with the company’s previous-generation control panel, thanks to the Zynq SoC’s dual-core Arm Cortex-A9 MPCore processor and the on-chip programmable logic.
Due to the success that Bitmain has had with Xilinx Zynq SoCs in its Antminer S9 Bitcoin mining machine, the company is now exploring the use of Xilinx 20nm and 16nm devices (UltraScale and UltraScale+) for planned future AI platforms and products.
DornerWorks is one of only three Xilinx Premier Alliance Partners in North America offering design services, so the company has more than a little experience using Xilinx All Programmable devices. The company has just launched a new learn-by-email series with “interesting shortcuts or automation tricks related to FPGA development.”
The series is free but you’ll need to provide an email address to receive the lessons. I signed up and immediately received a link to the first lesson titled “Algorithm Implementation and Acceleration on Embedded Systems” written by DornerWorks’ Anthony Boorsma. It contains information about the Xilinx Zynq SoC and Zynq UltraScale+ MPSoC and the Xilinx SDSoC development environment.
Sign up here.
The recent introduction of the groundbreaking Xilinx Zynq UltraScale+ RFSoC means that there are big changes in store for the way advanced RF and comms systems will be designed. With as many as 16 RF-class ADCs and DACs on one device along with a metric ton or two of other programmable resources, the Zynq UltraScale+ RFSoC makes it possible to start thinking about single-chip Massive MIMO systems. A new EDN.com article by Paul Newson, Hemang Parekh, and Harpinder Matharu titled “Realizing 5G New Radio massive MIMO systems” teases a few details for building such systems and includes this mind-tickling photo:
A sharp eye and keen memory will link that photo to a demo from last October’s Xilinx Showcase demo at the Xilinx facility in Longmont, Colorado. Here’s Xilinx’s Lee Hansen demonstrating a similar system based on the Xilinx Zynq UltraScale+ RFSoC:
For more details about the Zynq UltraScale+ RFSoC, contact your friendly neighborhood Xilinx or Avnet sales rep and see these previous Xcell Daily blog posts:
Last month, a user on EmbeddedRelated.com going by the handle stephaneb started a thread titled “When (and why) is it a good idea to use an FPGA in your embedded system design?” Olivier Tremois (oliviert), a Xilinx DSP Specialist FAE based in France, provided an excellent, comprehensive, concise, Xilinx-specific response worth repeating in the Xcell Daily blog:
As a Xilinx employee I would like to contribute on the Pros ... and the Cons.
Let's start with the Cons: if there is a processor that suits all your needs in terms of cost/power/performance/IOs, just go for it. You won't be able to design the same thing in an FPGA at the same price.
Now if you need some kind of glue logic around (IOs), or your design needs multiple processors/GPUs due to the required performance, then it's time to talk to your local FPGA dealer (preferably a Xilinx distributor!). I will try to answer a few remarks I saw throughout this thread:
FPGA/SoC: In the majority of the FPGA designs I’ve seen during my career at Xilinx, I saw some kind of processor. In pure FPGAs (Virtex/Kintex/Artix/Spartan) it is a soft processor (MicroBlaze or PicoBlaze) and in a [Zynq SoC or Zynq UltraScale+ MPSoC], it is a hard processor (dual-core Arm Cortex-A9 [for Zynq SoCs] and Quad-A53+Dual-R5 [for Zynq UltraScale+ MPSoCs]). The choice is now more complex: Processor Only, Processor with an FPGA aside, FPGA only, Integrated Processor/FPGA. The tendency is for the latter due to all the savings incurred: PCB, power, devices, ...
Power: Pure FPGAs are making incredible progress, but if you want really low power in stand-by mode you should look at the Zynq Ultrascale+ MPSoC, which contains many processors and particularly a Power Management Unit that can switch on/off different regions of the processors/programmable logic.
Analog: Since Virtex-5 (2006), Xilinx has included ADCs in its FPGAs, which were limited to internal parameter measurements (Voltage, Temperature, ...). [These ADC blocks are] called the System Monitor. With 7 series (2011) [devices], Xilinx included a dual 1Msamples/sec@12-bits ADC with internal/external measurement capabilities. Lately Xilinx [has] announced very high performance ADCs/DACs integrated into the Zynq UltraScale+ RFSoC: 4Gsamples/sec@12 bits ADCs / 6.5Gsamples/sec@14 bits DACs. Potential applications are Telecom (5G), Cable (DOCSIS) and Radar (Phased-Array).
Security: The bitstream that is stored in the external Flash can be encoded [encrypted]. Decoding [decrypting] is performed within the FPGA during bitstream download. Zynq-7000 SoCs and Zynq UltraScale+ MPSoCs support encoded [encrypted] bitstreams and secured boot for the processor[s].
Ease of Use: This is the big part of the equation. Customers need to take this into account to get the right time to market. Since 2012 and [with] 7 series devices, Xilinx introduced a new integrated tool called Vivado. Since then a number of features/new tools have been [added to Vivado]:
There are also tools related to the MathWorks environment [MATLAB and Simulink]:
All this to say that FPGA vendors have [expended] tremendous effort to make FPGAs and derivative devices easier to program. You still need a learning curve [but it] is much shorter than it used to be…
One of life’s realities is that the most advanced semiconductor devices—including the Xilinx Zynq UltraScale+ MPSoCs—require multiple voltage supplies for proper operation. That means that you must devote a part of the system engineering effort for a product based on these devices to the power subsystem. Put another way, it’s been a long, long time since the days when a single 5V supply and a bypass capacitor were all you needed. Fortunately, there’s help. Xilinx has a number of vendor partners with ready, device-specific power-management ICs (PMICs). Case in point: Dialog Semiconductor.
If you need to power a Zynq UltraScale+ ZU3EG, ZU7EV, or ZU9CG MPSoC, you’ll want to check out Dialog’s App Note AN-PM-095 titled “Power Solutions for Xilinx Zynq Ultrascale+ ZU9EG.” This document contains reference designs for cost-optimized, PMIC-based circuits specifically targeting the power requirements for Zynq UltraScale+ MPSoCs. According to Xilinx Senior Tech Marketing Manager for Analog and Power Delivery Cathal Murphy, Dialog Semi’s PMICs can be used for low-cost power-supply designs because they generate as many as 12 power rails per device. They also switch at frequencies as high as 3MHz, which means that you can use smaller, less expensive passive devices in the design.
It also means that your overall power-management design will be smaller. For example, Dialog Semi’s power-management ref design for a Zynq UltraScale+ ZU9 MPSoC requires only 1.5 in² of board space—or less for smaller devices in the MPSoC family.
You don’t need to visualize that in your head. Here’s a photo and chart supplied by Cathal:
The Dialog Semi reference design is hidden under the US 25-cent piece.
As the chart notes, these Dialog Semi PMICs have built-in power sequencing and can be obtained preprogrammed for Zynq-specific power sequences from distributors such as Avnet.
Cathal also pointed out that Dialog Semi has long been supplying PMICs to the consumer market (think smartphones and tablets) and that the power requirements for Zynq UltraScale+ MPSoCs map well into the existing capabilities of PMICs designed for this market, so you reap the benefit of the company’s volume manufacturing expertise.
Adam Taylor has been writing about the use of Xilinx All Programmable devices for image-processing platforms for quite a while and he has wrapped up much of what he knows into a 44-minute video presentation, which appears below. Adam is presenting tomorrow at the Xilinx Developer Forum being held in Frankfurt, Germany.
You’ll find a PDF of his slides attached below:
A previous blog at the end of last November discussed KORTIQ’s FPGA-based AIScale CNN Accelerator, which takes pre-trained CNNs (convolutional neural networks)—including industry standards such as ResNet, AlexNet, Tiny Yolo, and VGG-16—compresses them, and fits them into Xilinx’s full range of programmable logic fabrics. (See “KORTIQ’s AIScale Accelerator fits trained CNNs into large or small All Programmable devices, allowing you to pick the right price/performance ratio for your application.”) A short, new Powered by Xilinx video provides more details about Kortiq and its accelerated CNN.
In the video, KORTIQ CEO Harold Weiss discusses using low-end Zynq SoCs (up to the Z-7020) and Zynq UltraScale+ MPSoCs (the ZU2 and ZU3) to create low-power solutions that deliver “just enough” performance for target industrial applications such as video processing, which requires billions of operations per second. The Zynq SoCs and Zynq UltraScale+ MPSoCs consume far less power than competing GPUs and CPUs while accelerating multiple CNN layers including convolutional layers, pooling layers, fully connected layers, and adding layers.
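To put "billions of operations per second" in perspective, the multiply-accumulate (MAC) count of a single convolutional layer can be estimated directly from its dimensions. Here's a rough sketch (the layer shape below is an AlexNet-like example for illustration, not KORTIQ's data):

```python
def conv_macs(out_h: int, out_w: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-accumulate count for one convolutional layer: every output
    pixel of every output channel needs a k x k x c_in dot product."""
    return out_h * out_w * c_in * c_out * k * k

# First layer of an AlexNet-like network: 55x55 output map, 3 input
# channels, 96 filters of size 11x11 -- over 100 million MACs for one
# layer processing one frame.
macs = conv_macs(55, 55, 3, 96, 11)
print(f"{macs / 1e6:.0f} M MACs")
```

At 30 frames/sec, this single layer already demands more than 3 billion MACs per second, which is why dedicated programmable-logic accelerators can beat general-purpose CPUs and GPUs on power efficiency for this workload.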
Here’s the new video:
Vivado 2017.4 is now available. Download it now to get these new features (see the release notes for complete details):
Download the new version of the Vivado Design Suite HLx editions here.
Continental AG has announced its Assisted & Automated Driving Control Unit, based on Xilinx All Programmable technology and developed in collaboration with Xilinx. According to the company, “…the Assisted & Automated Driving Control Unit will enable Continental’s customers to get to market faster by building upon the Open Computing Language (OpenCL) framework…” and “…offers a scalable product family for assisted and automated driving fulfilling the highest safety requirements (ASIL D) by 2019.”
Continental AG’s Assisted & Automated Driving Control Unit is based on Xilinx All Programmable technology
Continental’s incorporation of Xilinx All Programmable technology “provides developers the ability to optimize software for the appropriate processing engine or to create their own hardware accelerators with the Xilinx All Programmable technology. The result is the ultimate freedom to optimize performance, without sacrificing latency, power dissipation, or the flexibility to move software algorithms between the integrated chips, as the project progresses.”
“Our Assisted & Automated Driving Control Unit will enable automotive engineers to create their own differentiated solutions for machine learning, and sensor fusion. Xilinx’s All Programmable Technology was chosen as it offers flexibility and scalability to address the ever-changing and new requirements along the way to fully automated self-driving cars,” said Karl Haupt, Head of Continental’s Advanced Driver Assistance Systems business unit. “For Continental, the Assisted & Automated Driving Control Unit is a central element for implementing the required functional safety architecture and, at the same time, a host for the comprehensive environment model and driving functions needed for automated driving.”
Continental will be exhibiting at next week’s CES in Las Vegas.
By Adam Taylor
What better way to start the New Year than with a new Adam Taylor MicroZed Chronicles blog? – The Editor
Following on from the popularity of my final blog of last year where I presented several tips for better image-processing systems, I thought I would kick off the 2018 series of blogs by providing a number of tips for using the XADC and Sysmon in Zynq SoCs and Zynq UltraScale+ MPSoCs.
Whether our targeted device uses an XADC or a Sysmon block depends upon the device family. If we are targeting a 7 series FPGA or Zynq SoC device, we will be using the XADC. If the target is an UltraScale FPGA, UltraScale+ FPGA, or a Zynq UltraScale+ MPSoC, we’ll be using a Sysmon block. Behaviorally, the on-chip XADC and Sysmon blocks are very similar but there are some minor differences in architecture and maximum sampling rates between the two. Including the XADC or Sysmon adds a very interesting analog/mixed-signal capability to your design and helps reduce the number of external components. Because they can monitor internal device parameters along with external signals, you can also use the XADC/Sysmon blocks to implement a comprehensive system health and security monitoring solution critical for many applications.
Here are some of my favourite tips for using the Xilinx XADC/Sysmon blocks:
To prevent signal aliasing, you must set the XADC/Sysmon sampling rate to at least twice the frequency of the signal being quantized. When sampling external signals, the XADC and Sysmon have different maximum sampling frequencies of 1000Ksamples/sec and 200Ksamples/sec respectively. To set the appropriate sampling frequency, we need to consider the relationship between the clock provided to the XADC/Sysmon (called DClock) and the resultant internally derived clock used for sampling (called ADC Clock). Both the XADC and Sysmon take a minimum of 26 internal ADC Clock cycles to perform a conversion. To achieve the maximum conversion rate of 1000Ksamples/sec for the XADC, we therefore need to set the ADC Clock at 26MHz. For the Sysmon block, we need to set the ADC Clock to 5.2MHz to achieve the full 200Ksamples/sec sample rate. ADC Clock frequencies below these will result in lower sampling rates. Correctly setting the sampling rate depends upon the device you are using and the access method.
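The clocking arithmetic above is easy to sanity-check. Here's a small sketch (the DClock frequencies and divider values are illustrative; the actual divider is set in the XADC/Sysmon configuration registers):

```python
def sample_rate(dclk_hz: float, divider: int,
                cycles_per_conversion: int = 26) -> float:
    """Continuous-mode sample rate: DClock is divided down to the ADC Clock,
    and each conversion takes at least 26 ADC Clock cycles."""
    adc_clock = dclk_hz / divider
    return adc_clock / cycles_per_conversion

# XADC: 104 MHz DClock / 4 = 26 MHz ADC Clock -> 1 Msample/sec
print(sample_rate(104e6, 4))
# Sysmon: 52 MHz DClock / 10 = 5.2 MHz ADC Clock -> 200 Ksamples/sec
print(sample_rate(52e6, 10))
```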
The analog inputs are defined by IP Integrator or software to be either unipolar or bipolar and you can control the input configuration for each analog input individually. When a unipolar signal is quantized, the input signal can range between 0V and 1V. For a bipolar input, the differential voltage between the Vp and Vn inputs is ±0.5V. Selecting the right mode ensures the best performance and avoids damaging the analog inputs. For unipolar configurations, Vp cannot be negative with respect to Vn. For bipolar inputs, Vp and Vn can swing positive and negative with respect to the common-mode (reference) voltage. Bipolar mode provides better noise performance because any common-mode noise coupled onto the Vp and Vn signals will be removed thanks to differential sampling.
When it comes to providing better performance in electrically noisy environments, you can also turn on input-channel averaging to average out the noise.
Both the XADC and the Sysmon can accept as many as seventeen external differential analog signals using one dedicated Vp/Vn pair and sixteen Auxiliary Vp/Vn pins. Doing so of course uses several I/O signal pins—as many as 34 I/O pins if all analog inputs are used. This may present issues, especially on smaller devices where I/O-pin availability might be tightly constrained. In that case, the XADC/Sysmon can drive an external multiplexer, which reduces the number of pins required and also allows you to use an external mux with added protection for harsh operating environments (e.g. ESD protection).
Implementing an anti-aliasing filter (AAF) on the front end of the XADC/Sysmon external inputs is critical to ensuring that only the signals we want are quantized.
The external resistor and capacitors in the AAF will increase the overall settling time. Therefore, we need to ensure the external AAF also does not adversely affect the total settling time and consequently the conversion performance. Failing to provide adequate system-level settling time can result in ADC measurement errors because the sampling capacitor will not charge to its final value.
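As a rule of thumb, settling to within half an LSB of an N-bit converter takes roughly ln(2^(N+1)) RC time constants. Here's a quick sketch for checking a candidate AAF (the component values below are illustrative, not a recommended design):

```python
import math

def settling_time_s(r_ohms: float, c_farads: float, bits: int = 12) -> float:
    """Time for a single-pole RC front end to settle within 1/2 LSB of an
    N-bit ADC: ln(2**(bits + 1)) time constants."""
    return math.log(2 ** (bits + 1)) * r_ohms * c_farads

# Hypothetical AAF: 1 kOhm series resistance, 10 nF filter capacitor
t = settling_time_s(1e3, 10e-9)
print(f"settling time: {t * 1e6:.1f} us")  # ~90 us: acceptable at low sample
                                           # rates, far too slow for 1 MSPS
```

If the computed settling time exceeds the channel's acquisition window, either reduce the filter impedance or slow the sampling rate.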
Xilinx application note XAPP795, “Driving the Xilinx Analog-to-Digital Converter,” provides very useful information on this subject.
Both the XADC and Sysmon can monitor internal power supply voltages and temperatures. This is a great feature when we initially commission the boards because we can verify that the power supplies are delivering the expected voltages. We can even use the temperature sensor to verify thermal calculations at the high and low end of qualification environments.
When it comes to creating the run-time application, you should use the temperature and voltage alarms, which are based on defined thresholds for core voltages and device temperature. Should a measured parameter fall outside of these defined thresholds, an alarm allows further action to be taken. Configured correctly, this alarm capability can be used to generate an interrupt that alerts the processing system to a problem. Depending upon which alarm has been raised, the system can then act to either protect itself or undertake graceful degradation, thus preventing sudden failure.
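Setting those alarm thresholds means converting physical limits back into 12-bit register codes. Here's a sketch using the XADC transfer functions documented in Xilinx UG480 (the UltraScale/UltraScale+ Sysmon uses slightly different constants, so check UG580 for those devices):

```python
def temp_alarm_code(limit_celsius: float) -> int:
    """Threshold code for the on-chip temperature sensor, inverting the
    XADC transfer function T(degC) = code * 503.975 / 4096 - 273.15."""
    return round((limit_celsius + 273.15) * 4096 / 503.975)

def supply_alarm_code(limit_volts: float) -> int:
    """Threshold code for a supply-voltage sensor: 3 V full scale, 12 bits."""
    return round(limit_volts * 4096 / 3.0)

print(temp_alarm_code(85.0))    # upper threshold for an 85 degC alarm
print(supply_alarm_code(0.97))  # lower threshold for a sagging core rail
```

The resulting codes go into the corresponding alarm threshold registers (shifted into the upper 12 bits of the 16-bit register).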
Hopefully these tips will enable you to create smoother XADC/Sysmon solutions. If you experience any issues, I have a page on my website that links to all previous XADC/Sysmon examples in this series.
You can find the example source code on GitHub.
Adam Taylor’s Web site is http://adiuvoengineering.com/.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
First Year E-Book here
First Year Hardback here
Second Year E-Book here
Second Year Hardback here