Meanwhile, there seems to be a new pledge level that I don’t recall: a $179 level that includes a 1080p video camera. That’s in addition to the touch screen and voice input, which gives the Mycroft Mark II an even more interesting user interface. There are only a limited number of $179 pledge options, with 177 remaining as of the posting of this blog.
Bittware has already announced two PCIe boards for these HBM-enhanced Xilinx FPGAs:
The XUPVVH: a double-slot board that accommodates HBM-enhanced Virtex UltraScale+ VU35P or VU37P FPGAs (each with 8Gbytes of HBM DRAM) with four QSFP28 optical cages and two DIMM slots that accommodate as much as 256Gbytes of DDR4 SDRAM (128Gbytes/slot).
Rigol’s new RSA5000 real-time spectrum analyzer allows you to capture, identify, isolate, and analyze complex RF signals with a 40MHz real-time bandwidth over either a 3.2GHz or 6.5GHz signal span. It’s designed for engineers working on RF designs in the IoT and IIot markets as well as industrial, scientific, and medical equipment. Rigol was demonstrating the RSA5000 real-time spectrum analyzer at this week’s DesignCon being held at the Santa Clara Convention Center. I listened to a presentation from Rigol’s North American General Manager Mike Rizzo and then a demo by Rigol’s Director of Product Marketing & Software Applications Chris Armstrong, both captured in the 2.5-minute video below.
Rigol RSA5000 Real-Time Spectrum Analyzer
Based on what I saw in the demo, this is an extremely responsive instrument—far more responsive than a swept spectrum analyzer—with several visualization display modes to help you isolate the significant signal in a sea of signals and noise, in real time. It’s capable of continuously executing 146,484 FFTs/sec, which results in a minimum 100% POI (probability of intercept) of 7.45μsec. You need some real DSP horsepower to achieve that sort of performance and the Rigol RSA5000 real-time spectrum analyzer gets this performance from a pair of Xilinx Zynq Z-7015 SoCs. (You'll find many more details about real-time spectrum analysis and the RSA5000 Real-Time Spectrum Analyzer in the Rigol app note "Realtime Spectrum Analyzer vs Spectrum Analyzer," attached at the end of this post. See below.)
Here’s the short presentation and demo of the Rigol RSA5000 real-time spectrum analyzer from DesignCon 2018:
Mike Rizzo told me that the Rigol design engineers selected the Zynq Z-7015 SoCs for three main reasons:
High-bandwidth access between the Zynq SoC’s PS (processing system) and PL (programmable logic)
Excellent development tools including Xilinx’s Vivado HLS
If you’re looking for a very capable spectrum analyzer, give the Rigol RSA5000 a look. If you’re designing your own real-time system and need high-speed computation coupled with fast user response, take a look at the line of Xilinx Zynq SoCs and Zynq UltraScale+ MPSoCs.
The 2-minute video below shows you an operational Xilinx Virtex UltraScale+ XCVU37P FPGA, which is enhanced with co-packaged HBM (high-bandwidth memory) DRAM using Xilinx’s well-proven, 3rd-generation 3D manufacturing process. (Xilinx started shipping 3D FPGAs way back in 2011, starting with the Virtex-7 2000T and we’ve been shipping these types of devices ever since.)
This video was made on the very first day of silicon bringup for the device and it is already operating at full speed (460Gbytes/sec), error-free, over 32 channels. The Virtex UltraScale+ XCVU37P is one big All Programmable device with:
2852K System Logic Cells
9Mbits of BRAM
270Mbits of UltraRAM
9024 DSP48E2 slices
8Gbytes of integrated HBM DRAM
96 32.75Gbps GTY SerDes transceivers
Whatever your requirements, whatever your application, chances are this extremely powerful FPGA will deliver all of the heavy lifting (processing, memory, and I/O) that you need.
Quite simply, Vadatech’s AMC584 module is an I/O monster. Its immense I/O capabilities start with the five QSFP28 100GbE-capable cages on the module’s front panel. Then there are the AMC Tongues. AMC Tongue 1 is fully routed with SerDes ports and there are as many as 20 lanes routed to Tongue 2. The AMC584 also contains a high-speed Zone 3 connector that provides the primary digital I/O routing and enables multi-module configurations.
The SerDes ports on these boards are all implemented in a Xilinx Virtex UltraScale+ XCVU13P FPGA, which is itself an I/O monster. It has 128 on-chip GTY 32.75Gbs SerDes transceivers, so it makes an ideal foundation for an I/O monster board.
Here’s a block diagram of the Vadatech AMC584 module:
Vadatech AMC584 Module Block Diagram
Now, before you get the idea that the Virtex UltraScale+ XCVU13P FPGA is just I/O, please understand that there are also 3780K system logic cells, 12,288 DSP48E2 slices, 94.5Mbits of BRAM, and 360Mbits of UltraRAM on the device as well, so it’s a DSP monster and a processing monster too. The Virtex UltraScale+ XCVU13P FPGA is capable of implementing just about any system you might imagine.
And just in case the hundreds of Mbits of SRAM on the Virtex UltraScale+ XCVU13P FPGA aren’t sufficient for your processing needs, the AMC584 module also has two banks of DDR4 SDRAM on board.
Vadatech AMC584 Module
Please contact Vadatech directly for more information about the AMC584 Module.
Here are two reasons you might want to participate in this Kickstarter campaign:
The Mycroft Mark II is a hands-free, privacy-oriented, open-source smart speaker with a touch screen. It has advanced far-field voice recognition and multiple wake words for voice-based cloud services such as Amazon’s Alexa and Google Home, courtesy of Aaware’s technology. (See “Looking to turbocharge Amazon’s Alexa or Google Home? Aaware’s Zynq-based kit is the tool you need.”) The finished smart speaker requires a pledge of $129 (or $299 for three of them) but the dev kit version of the Mycroft Mark II requires a pledge of only $99, which is cheap as dev kits go. (Note: there are only 88 of these kits left, as of this writing.)
You could look at the Mycroft Mark II as a general-purpose, $99 Zynq UltraScale+ MPSoC open-source dev kit with a touch screen that’s also been enabled for voice control, which you can use as a platform for a variety of IIoT, cloud computing, or embedded projects. That in itself is a very attractive offer. As the Mycroft Mark II Kickstarter project page says: “The Mark II has special features that make hacking and customizing easy, not to mention thorough documentation and a community to lean on when building. Support for our community is central to the Mycroft mission.” That’s a lot for a sub-$100 dev kit, don’t you think?
If you’d like some intense training on the Xilinx Zynq UltraScale+ MPSoC—one of the most powerful embedded application processor (plus programmable logic) families that you can throw at an embedded-processing application—then Hardent’s 3-day class titled “Embedded System Design for the Zynq UltraScale+ MPSoC” might just be what you’re looking for. There’s a live, E-Learning version kicking off February 7 with live, in-person classes scheduled for North America from February 21 (in Ottawa) through August. The schedule’s on the referenced Web page.
You certainly might want a comprehensive course outline before you decide, so here it is:
Zynq UltraScale+ MPSoC Overview – Overview of the Zynq UltraScale+ MPSoC All Programmable device.
Application Processing Unit – Introduction to the members of the APU (based on 64-bit Arm Cortex-A53 processors) and how to configure and manage the APU cluster.
Real-Time Processing Unit – Introduction to the various elements within the RPU including the dual-core Arm Cortex-R5 processor and different modes of configuration.
QEMU – Introduction to the Quick Emulator: an emulation tool for the Zynq UltraScale+ MPSoC device that lets you run software whenever, wherever without the actual hardware.
Platform Management Unit –Tools and techniques for debugging your Zynq UltraScale+ MPSoC design.
Booting – Learn how to implement an embedded system including the boot process and boot-image creation.
AXI – Discover how the Zynq UltraScale+ MPSoC’s PS (processing system) and PL (programmable logic) connect to permit designers to create very high performance embedded systems with hardware-speed processing where needed.
Clocks and Resets – Overview of the Zynq UltraScale+ MPSoC’s clocking and reset functions, focusing more on capabilities than specific implementations.
DDR SDRAM and QoS – Learn how to configure the system’s DDR SDRAM to maximize system performance.
System Protection – Covers all the hardware elements that support the separation of software domains within the the Zynq UltraScale+ MPSoC’s PS.
Security and Software – Shows you how to use the safety and security features of the the Zynq UltraScale+ MPSoC in the context of embedded system design and introduces several standards.
ARM TrustZone Technology – Presents the use of the Arm TrustZone technology.
Linux – Discussion and examples showing you how to configure Linux to manage multiple processors.
Yocto – Compares kernel-building methods between a "pure" Yocto build and the Xilinx PetaLinux build (which uses Yocto "under-the-hood").
OpenAMP – Introduction to the concept of the Multicore Association’s OpenAMP framework for asymmetric multiprocessing on heterogeneous processor architectures like the the Zynq UltraScale+ MPSoC.
Hardware/Software Virtualization – Covers the hardware and software elements of virtualization. A lab shows you how to use hypervisors.
Xen Hypervisor – Starts with a description of generic hypervisors and then discusses the details of implementing a hypervisor based on Xen.
Ecosystem Support – Overview of the Zynq UltraScale+ MPSoC’s supported operating systems, software stacks, hypervisors, etc.
FreeRTOS – Overview of FreeRTOS with examples of how to use it.
Software Stack – Introduces the concept of a software stack and discusses the many available stacks for the Zynq UltraScale+ MPSoC.
Curtiss-Wright’s VPX3-535 3U OpenVPX transceiver module implements a single-slot, dual-channel, 6Gsamples/sec analog data-acquisition and processing system using two 12-bit, 6Gsamples/sec ADCs and two 12-bit, 6Gsamples/sec DACs. This is the type of capability you need for demanding applications such as radar, Signal Intelligence (SIGINT), Electronic Warfare (EW), and Software Defined Radio (SDR). This amount of analog-to-digital and digital-to-analog conversion capability demands wicked-fast digital processing and on the VPX3-535 transceiver module, that digital processing comes in the form of two of Xilinx’s most powerful All Programmable devices: a Virtex UltraScale+ VU9P and a Zynq UltraScale+ ZU4 MPSoC.
Here’s a block diagram of the Curtiss-Wright VPX3-535 module:
The VPX3-535 is Curtiss-Wright’s first publicly announced module to feature full compliance to the VITA 48.8 Air-Flow-Through (AFT) cooling standard, which ensures optimal performance in the harshest conditions. VITA 48.8 provides a low-cost, effective means to cool high-power COTS 3U and 6U VPX modules that dissipate ~150W+.
At the same time, Curtiss-Wright is also introducing a conduction-cooled variant, called the VPX3-534, which designed for applications that do not require the performance of the VPX3-535. The VPX3-534 supports the same dual-channel, 12-bit, 6Gsamples/sec ADC and DAC channels as the VPX3-535 but it replaces the Virtex UltraScale+ FPGA with a Xilinx Kintex UltraScale KU115 FPGA. This module also supports an option for four 3Gsamples/sec ADC channels.
Please contact Curtiss-Wright directly for more information about the VPX3-535 and VPX3-534 OpenVPX transceiver modules.
Keysight published a 14-minute video back in 2015 that gives you the basics behind RF beamforming and its use in 5G applications. The video also invites you to download a free, 30-day trial of Keysight’s SystemVue with Keysight’s 5G simulation library to try out some of the concepts discussed in the video and the link appears to be active still.
Here’s the video:
Meanwhile, should you need an implementation technology for RF beamforming (5G or otherwise), allow me to suggest that the new Xilinx Zynq UltraScale+ RFSoC with its many integrated RF ADCs and DACs be at the top of your technology choices. There is literally no other device like the Zynq UltraScale+ RFSoC. It’s in a category of one.
For more information about the Zynq UltraScale+ RFSoC, see:
The Zynq UltraScale MPSoC is a complex system on chip containing as many as four Arm Cortex-A53 application processors, a dual-core Arm Cortex-R5 real-time processor, a Mali GPU, and of course programmable logic. When it comes to generating our software application, we want to use the A53-based Application Processing Unit (APU) and R5 Real-Time Processing Unit (RPU) cores appropriately. This means we want to use the APU for computationally intensive, high-level applications or virtualization while using the RPU for real-time control and monitoring.
This means the APU will likely be running an operating system such as Linux while the real-time needs are addressed by the RPU using bare-metal software or a simplified OS such as FreeRTOS. Often an overall system solution requires communication between the APU and RPU to achieve the desired solution functionality but communication between different processors running different applications has previously been challenging and ad-hoc with inter-processor communications (IPC) using shared memory, mail boxes, or even networks for IPC. As a result, IPC solutions differed from implementation to implementation and device to device, which increased development time and hence time to market.
This is inefficient engineering.
To best leverage the capabilities of the UltraScale+ Zynq MPSoC, we need an open framework that allows us to abstract device-specific interfaces and enables the implementation of AMP (asymmetric multi-processing) with greater ease across multiple projects.
OpenAMP developed by the Multicore Association provides everything we need to run different operating systems on the APU and RPU. Of course, for OpenAMP to function from processor to processor, we need an abstraction layer that provides device-specific interfaces (e.g. interrupt handlers, memory requests, and device access). The libmetal library provides these for Xilinx devices through several APIs that abstract the processor.
For our Zynq UltraScale+ MPSoC designs, the provided OpenAMP frameworks enable messaging between the master processor and remote processor and lifecycle management of the remote processor using the following structures:
Remoteproc – enables lifecycle management of the remote processor. This includes downloading the application to the remote processor, stopping and starting the remote processor, and system resource allocation as required.
RPMsg - supports IPC between different processors in the system.
OpenAMP remoteproc and RPMsg concepts
For this example, we are going to run Linux on the APU and a bare-metal application on the RPU using RPMsg within the kernel space. When we run the RPMsg from within the kernel space, the remote application lifecycle must be managed by Linux. This means the remote processor application does not run independently. If we develop the RPMsg application to run within the Linux user space, the remote processor can run independently.
To create this example first we need to enable remote-processor support within our Linux build. This requires that we rebuild the petalinux project, customising the kernel and root fs. If you are not familiar with building petalinux you might want to read this blog.
Within our petalinux project, the first thing we need to do is enable the remoteproc driver. Using a terminal application within the petalinux project, issue the command:
petalinux-config -c kernel
This will open the kernel configuration menu. Here we can enable the remote-processor drivers which are located under:
Device Drivers -> Remoteproc drivers
Enabling the Remoteproc drivers
The second step is to include the OpenAMP examples within the file system. Again inside the project, issue the command:
petalinux-config -c rootfs
Within the configuration menu, navigate to Filesystem Packages -> misc and enable the packagegroup-petalinux-openamp:
Enabling the package group
The final step before we can rebuild the petalinux image is to update the device tree. We can find an OpenAMP template dtsi file at the location:
Once the kernel, filesystem, and device tree have been updated, rebuild the petalinux image using the command below:
This will generate an updated Linux build that we can copy onto the boot medium of choice and run on our Zynq UltraScale+ MPSoC design.
Using a terminal connected to our preferred development board (in my case the UltraZed), we can test the OpenAMP examples we included within the Linux file System. There are three examples provided: echo test, matrix multiplication, and proxy server.
I ran the matrix-multiply example because it demonstrates the remote processor performing mathematical calculations.
Using the terminal, I entered the following commands:
Following the on-screen menu and commands, I ran the example which provided the results below:
Executing the Matrix Multiply Example
Matrix Multiply example running
This example shows that the OpenAMP framework is running correctly on the Zynq UltraScale+ MPSoC petalinux build and that we can begin to create our own applications. If you want to run the other two examples refer to UG1186.
If we wish to create our own OpenAMP-based application for the RPU, which uses the kernel space RPMsg, we can create this using the SDK and install the generated elf as an app within petalinux. Although it does mean we need to rebuild the petalinux image again, we will look at how we do this in another blog. There is a lot more for us to explore here.
Note: We have looked at OpenAMP before for the Zynq 7000 series of devices in blogs 169 & 171.
Do you need an extremely powerful yet extremely tiny SOM to implement a challenging embedded design? Enclustra’s credit-card sized Mercury+ XU1 is worth your consideration. It packs a Xilinx Zynq UltraScale+ MPSoC, as much as 8Gbytes of DDR4 SDRAM with ECC, 16Gbytes of eMMC Flash memory, two Gbit Ethernet PHYs, two USB 2.0/3.0 PHYs, and 294 user I/O pins on three 168-pin Hirose FX10 connectors into a mere 74x54mm. That’s a lot of computational horsepower in a teeny, tiny package.
Here’s a block diagram of the Mercury+ XU1 SOM:
By itself, the Zynq UltraScale+ MPSoC gives the SOM a tremendous set of resources including:
A quad-core Arm Cortex-A53 64-bit application processor
A dual-core Arm Cortex-R5 32-bit real-time processor
An Arm Mali-400 MP2 GPU
As many as 747K system logic cells (in a Zynq UltraScale+ ZU15EG MPSoC)
As much as 26.2Mbits of BRAM and 31.5Mbits of UltraRAM (in a Zynq UltraScale+ ZU15EG MPSoC)
As many as 3528 DSP48E2 slices (in a Zynq UltraScale+ ZU15EG MPSoC)
As with many product family designs, Enclustra is able to tailor the price/performance of the Mercury+ XU1 SOM by offering multiple versions based on different pin-compatible members of the Zynq UltraScale+ MPSoC family.
Please contact Enclustra directly for more information about the Mercury+ XU1 SOM.
If you’re designing and debugging high-speed logic—as you might with video or radar applications, for example—then perhaps you could use some fast debugging capability. As in really-fast. As in much, much, much faster than JTAG. EXOSTIV Labs has got a solution. It’s called the EXOSTIV FPGA Debug Probe and it uses the [bulletproof] high-speed SerDes ports that are pervasive throughout Xilinx All Programmable device families to extract debug data from running devices with great alacrity.
Here’s a 3-minute video showing the EXOSTIV FPGA Debug Probe communicating with a Xilinx Virtex UltraScale VCU108 Eval Kit, connected through the kit’s high-speed QSFP connector, creating a 50Gbps link between the board and the debugger.
Here’s a second 3-minute video with some additional information. This one shows the EXOSTIV Probe and Dashboard being used to monitor 640 signals in a high-speed video interface design:
You observe the captured data using the EXOSTIV Dashboard, as demonstrated in the above video. The probe and software can handle debug data from as many as 32,768 internal nodes per capture. The mind boggles at the potential complexity being handled here.
According to EXOSTIV, the FPGA Debug Probe and Dashboard give you 200,000x more observability into your design than the tools you might currently be using. That’s a major leap in debugging speed and capability that could save you days or weeks of debugging time.
When you’ve exhausted JTAG’s debug capabilities, consider EXOSTIV.
Today marks the launch of Joshua Montgomery’s Mycroft Mark II open-source Voice Assistant, a hands-free, privacy-oriented smart speaker with a touch screen that also happens to be based on a 6-microphone version of Aaware’s Sound Capture Platform. In fact, according to today’s article on EEWeb written by my good friend and industry gadfly Max Maxfield, Aaware is designing the pcb for the Mycroft Mark II Voice Assistant, which will be based on a Xilinx Zynq UltraScale+ MPSoC according to Max’s article. (It’s billed as a “Xilinx quad-core processor” in the Kickstarter project listing.) According to Max’s article, “This PCB will be designed to support different microphone arrays, displays, and cameras such that it can be used for follow-on products that use the Mycroft open-source voice assistant software stack.”
To repeat: That’s an open-source, consumer-level product based on one of the most advanced MPSoC’s on the market today with at least two 64-bit Arm Cortex-A53 processors and two 32-bit Arm Cortex-R5 processors plus a generous chunk of the industry’s most advanced programmable logic based on Xilinx’s 16nm UltraScale+ technology.
Aaware’s technology starts with an array of six individual microphones. The outputs of these microphones are combined and processed with several Aaware-developed algorithms including acoustic echo cancellation, noise reduction and beamforming that allow the Mycroft Mark II smart speaker to isolate the voice of a speaking human even in noisy environments. (See “Looking to turbocharge Amazon’s Alexa or Google Home? Aaware’s Zynq-based kit is the tool you need.”) The combination of Aaware’s Sound Capture Platform, Mycroft’s Mark II smart speaker open-source code, and the immensely powerful Zynq UltraScale+ MPSoC give you an incredible platform for developing your own end products.
Here’s a 3-minute video demo of the Mycroft Mark II smart speaker’s capabilities:
Pledge $99 on Kickstarter and you’ll get a DIY dev kit that includes the pcbs, an LCD, speakers, and cables but no handsome plastic housing. Pledge $129—thirty bucks more—and you get a built unit in an elegant housing. There are higher pledge levels too.
What’s the risk? As of today, the first day of the pledge campaign, the project is 167% funded, so it’s already a “go.” There are 28 days left to jump in. Also, Mycroft delivered the Mark I speaker, a previous Kickstarter project, last July so the company has a track record of successful Kickstarter project completion.
XIMEA has added two new high-speed industrial cameras to its xiB-64 family: the 1280x864-pixel CB013 capable of imaging at 3500fps and the 1920x1080-pixel CB019 capable of imaging at 2500fps. As with all digital cameras, the story for these cameras starts with the sensors. The CB013 camera is based on a LUXIMA Technology LUX13HS 1.1Mpixel sensor and the CB019 is based on a LUXIMA Technology LUX19HS 2Mpixel sensor. Both cameras use PCIe 3.0 x8 interfaces capable of 64Gbps sustained transfer rates. Use of the PCIe interface allows a host PC to use DMA for direct transfers of the video stream into the computer’s main memory with virtually no CPU overhead.
Both cameras are also based on a Xilinx Kintex UltraScale KU035 FPGA. Why such a fast FPGA in an industrial video camera? The frame rates and 64Gbps PCIe interface transfer rate are all the explanation you need. The Kintex UltraScale KU035 FPGA has 444K system logic cells and 1700 DSP48E2 slices—ample for handling the different sensors in the camera product line and just about any sort of video processing that’s needed. The Kintex UltraScale FPGA also incorporates two integrated (hardened) PCIe Gen3 IP blocks with sixteen bulletproof 16.3Gbps SerDes transceivers to handle the camera’s PCIe Gen3 interface.
Everspin’s nvNITRO NVMe Storage Accelerator is a persistent-memory PCIe storage card for cloud and data-center applications that delive4rs up to 1.46 million IOPS for random 4Kbyte mixed 70/30 read/write operations. It’s based on Everspin’s STT-MRAM (spin-transfer torque magnetic RAM) chips and uses a Xilinx Kintex UltraScale KU060 FPGA to implement the MRAM controller and the board’s PCIe Gen3 x8 host interface. Everspin has just published an nvNITRO application note titled “Accelerating Fintech Applications with Lossless and Ultra-Low Latency Synchronous Logging using nvNITRO” that details the use of the nvNITRO Storage Accelerator to speed cloud-based financial transactions. The application note explores how Everspin nvNITRO technology can improve FinTech (Financial Technology) performance without creating additional compliance risks.
If you haven’t looked deeply into the intricacies of financial trading transactions, the app note starts with a clarifying block diagram, which shows the multiple layers built into the transaction process:
The diagram shows many opportunities for accelerating transactions, which is important because in this market, microseconds translate into millions of dollars gained or lost.
If you’re developing cloud-based systems and acceleration is important, whether or not you’re developing FinTech applications, take a few minutes to read the Everspin app note.
Huawei’s FACS cloud offering is based on a PCIe server card that incorporates a Xilinx Virtex UltraScale+ VU9P FPGA. (Huawei also offers the board for on-premise installations.) In addition to the hardware, Huawei offers three major development tools for FACS:
An SDAccel-based shell that offers fast, easy development. SDAccel is Xilinx’s development environment for C, C++, and OpenCL. This shell also provides access to Xilinx’s Vivado development environment.
A DPDK shell for high-performance applications. Intel originally developed DPDK as a packet-processing framework for accelerated server systems and Huawei’s implementation can support throughputs as high as 12Gbytes/sec.
A Professional Simulation Platform that encapsulates more than two decades of Huawei’s FPGA development experience.
With these offerings, Davies said, Huawei is looking to add partners to expand its ecosystem and is particularly interested in talking to companies that offer:
There’s a Huawei Cloud Marketplace that serves as an outlet for FACS applications. The company is also welcoming end users to try the service.
Here’s a video of Davies’ 32-minute presentation at XDF:
Amazon’s Senior Director of Business Development and Product, Gadi Hutt, gave an in-depth presentation at the recent Xilinx Developers Forum in Frankfurt, Germany where he detailed the specifics, advantages, and the nuts-and-bolts “how to” with respect to using the FPGA-based AWS EC2 F1 instances to accelerate your business.
First, Hutt gave one of the most succinct definitions of “the cloud” I’ve heard: “the on-demand delivery of compute, storage, networking, etc. services.” This definition is free of the niggling details such as hardware, networking, power, and cooling that you are now free to ignore.
Then Hutt listed the advantages of cloud-based services:
Agility and speed of innovation
Elasticity: scale up or down quickly, as needed
Breadth of functionality
Go global in minutes
From there, Hutt provided a deep explanation of the steps you need to take to distribute cloud-based services globally. He also quoted a Gartner estimate, which said that AWS (Amazon Web Services) has more compute capacity than all of the other cloud providers combined. Certainly, this Gartner report puts AWS far in the upper right corner of the Gartner Magic Quadrant for Cloud Infrastructure as a Service, Worldwide.
Using AWS allows your company to “get out of IT” and focus on providing specialized services where you can add value, said Hutt. “You can focus on your core business,” he continued.
A quick look at the latest product table for the Xilinx Zynq UltraScale+ RFSoC will tell you that the sample rate for the devices’ RF-class, 14-bit DAC has jumped to 6.554Gsamples/sec, up from 6.4Gsamples/sec. I asked Senior Product Line Manager Wouter Suverkropp about the change and he told me that the increase supports “…an extra level of oversampling for DOCSIS3.1 [designers]. The extra oversampling gives them 3dB processing gain and therefore simplifies the external circuits even further.”
Zynq UltraScale+ RFSoC Conceptual Diagram
For more information about the Zynq UltraScale+ RFSoC, see:
“…our FPGA implementation achieves a 10x speedup on the compute intensive part of the code, compared to an optimized parallel implementation on multicore CPU, and it delivers a 3.5x speedup at application level for the given setup.”
The University of Kaiserslautern’s Microelectronic Systems Design Research Group has been working on accelerating financial applications using FPGAs in connection with high-performance computing systems since 2010 and that research has recently migrated to cloud-based computing systems including Amazon’s EC2 F1 Instance, which is based on Xilinx Virtex Ultrascale+ FPGAs. The results in this White Paper are based on using OpenCL code and the Xilinx SDAccel development environment.
For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:
Xilinx has announced availability of automotive-grade Zynq UltraScale+ MPSoCs, enabling development of safety critical ADAS and Autonomous Driving systems. The 4-member Xilinx Automotive XA Zynq UltraScale+ MPSoC family is qualified according to AEC-Q100 test specifications with full ISO 26262 ASIL-C level certification and is ideally suited for various automotive platforms by delivering the right performance/watt while integrating critical functional safety and security features.
The XA Zynq UltraScale+ MPSoC family has been certified to meet ISO 26262 ASIL-C level requirements by Exida, one of the world's leading accredited certification companies specializing in automation and automotive system safety and security. The product includes a "safety island" designed for real-time processing functional safety applications that has been certified to meet ISO 26262 ASIL-C level requirements. In addition to the safety island, the device’s programmable logic can be used to create additional safety circuits tailored for specific applications such as monitors, watchdogs, or functional redundancy. These additional hardware safety blocks effectively allow ASIL decomposition and fault-tolerant architecture designs within a single device.
Bitmain’s Antminer S9 Bitcoin Mining Machine uses a Zynq Z-7010 SoC as a main control processor
The Powered by Xilinx program has just published a 3-minute video containing an interview with Yingfei Li, Bitmain’s Marketing Director, and Wenguo Zhang, Bitmain’s Hardware R&D Director. In the video, Zhang explains that the Zynq Z-7010 solved multiple hidden problems with the company’s previous-generation control panel, thanks to the Zynq SoC’s dual-core Arm Cortex-A9 MPCore processor and the on-chip programmable logic.
Due to the success that Bitmain has had with Xilinx Zynq SoCs in it’s Antminer S9 Bitcoin mining machine, the company is now exploring the use of Xilinx 20nm and 16nm devices (UltraScale and UltraScale+) for future, planned AI platforms and products.
DornerWorks is one of only three Xilinx Premier Alliance Partners in North America offering design services, so the company has more than a little experience using Xilinx All Programmable devices. The company has just launched a new learn-by-email series with “interesting shortcuts or automation tricks related to FPGA development.”
The series is free but you’ll need to provide an email address to receive the lessons. I signed up and immediately received a link to the first lesson titled “Algorithm Implementation and Acceleration on Embedded Systems” written by DornerWorks’ Anthony Boorsma. It contains information about the Xilinx Zynq SoC and Zynq UltraScale+ MPSoC and the Xilinx SDSoC development environment.
The recent introduction of the groundbreaking Xilinx Zynq UltraScale+ RFSoC means that there are big changes in store for the way advanced RF and comms systems will be designed. With as many as 16 RF-class ADCs and DACs on one device along with a metric ton or two of other programmable resources, the Zynq UltraScale+ RFSoC makes it possible to start thinking about single-chip Massive MIMO systems. A new EDN.com article by Paul Newson , Hemang Parekh, and Harpinder Matharu titled “Realizing 5G New Radio massive MIMO systems” teases a few details for building such systems and includes this mind-tickling photo:
A sharp eye and keen memory will link that photo to a demo from last October’s Xilinx Showcase demo at the Xilinx facility in Longmont, Colorado. Here’s Xilinx’s Lee Hansen demonstrating a similar system based on the Xilinx Zynq UltraScale+ RFSoC:
For more details about the Zynq UltraScale+ RFSoC, contact your friendly neighborhood Xilinx or Avnet sales rep and see these previous Xcell Daily blog posts:
As a Xilinx employee I would like to contribute on the Pros ... and the Cons.
Let start with the Cons: if there is a processor that suits all your needs in terms of cost/power/performance/IOs just go for it. You won't be able to design the same thing in an FPGA at the same price.
Now if you need some kind of glue logic around (IOs), or your design need multiple processors/GPUs due to the required performance then it's time to talk to your local FPGA dealer (preferably Xilinx distributor!). I will try to answer a few remarks I saw throughout this thread:
FPGA/SoC: In the majority of the FPGA designs I’ve seen during my career at Xilinx, I saw some kind of processor. In pure FPGAs (Virtex/Kintex/Artix/Spartan) it is a soft-processor (Microblaze or Picoblaze) and in a [Zynq SoC or Zynq Ultrascale+ MPSoC], it is a hard processor (dual-core Arm Cortex-A9 [for Zynq SoCs] and Quad-A53+Dual-R5 [for Zynq UltraScale+ MPSoCs]). The choice is now more complex: Processor Only, Processor with an FPGA aside, FPGA only, Integrated Processor/FPGA. The tendency is for the latter due to all the savings incurred: PCB, power, devices, ...
Power: Pure FPGAs are making incredible progress, but if you want really low power in stand-by mode you should look at the Zynq Ultrascale+ MPSoC, which contains many processors and particularly a Power Management Unit that can switch on/off different regions of the processors/programmable logic.
Analog: Since Virtex-5 (2006), Xilinx has included ADCs in its FPGAs, which were limited to internal parameter measurements (Voltage, Temperature, ...). [These ADC blocks are] called the System Monitor. With 7 series (2011) [devices], Xilinx included a dual 1Msamples/sec@12-bits ADC with internal/external measurement capabilities. Lately Xilinx [has] announced very high performance ADCs/DACs integrated into the Zynq UltraScale+ RFSoC: 4Gsamples/sec@12 bits ADCs / 6.5Gsamples/sec@14 bits DACs. Potential applications are Telecom (5G), Cable (DOCSYS) and Radar (Phased-Array).
Security: The bitstream that is stored in the external Flash can be encoded [encrypted]. Decoding [decrypting] is performed within the FPGA during bitstream download. Zynq-7000 SoCs and Zynq Ultrascale+ MPSoCs support encoded [encrypted] bitstreams and secured boot for the processor[s].
Ease of Use: This is the big part of the equation. Customers need to take this into account to get the right time to market. Since 2012 and [with] 7 series devices, Xilinx introduced a new integrated tool called Vivado. Since then a number of features/new tools have been [added to Vivado]:
IP Integrator(IPI): a graphical interface to stitch IPs together and generate bitstreams for complete systems.
Vivado HLS (High Level Synthesis): a tool that allows you to generate HDL code from C/C++ code. This tool will generate IPs that can be handled by IPI.
SDSoC (Software Defined SoC): This tool allows you to design complete systems, software and hardware on a Zynq SoC/Zynq UltraScale+ MPSoC platform. This tool with some plugins will allow you to move part of your C/C++ code to programmable logic (calling Vivado HLS in the background).
SDAccel: an OpenCL (and more) implementation. Not relevant for this thread.
There are also tools related to the MathWorks environment [MATLAB and Simulink]:
System Generator for DSP (aka SysGen): Low-level Simulink library (designed by Xilinx for Xilinx FPGAs). Allows you to program HDL code with blocks. This tools achieves even better performance (clock/area) than HDL code as each block is an instance of an IP (from register, adder, counter, multiplier up to FFT, FIR compiler, and VHLS IP). Bit-true and cycle-true simulations.
Xilinx Model Composer (XMC): available since ... yesterday! Again a Simulink blockset but based on Vivado HLS. Much faster simulations. Bit-true but not cycle-true.
All this to say that FPGA vendors have [expended] tremendous effort to make FPGAs and derivative devices easier to program. You still need a learning curve [but it] is much shorter than it used to be…
One of life’s realities is that the most advanced semiconductor devices—including the Xilinx Zynq UltraScale+ MPSoCs—require multiple voltage supplies for proper operation. That means that you must devote a part of the system engineering effort for a product based on these devices to the power subsystem. Put another way, it’s been a long, long time since the days when a single 5V supply and a bypass capacitor were all you needed. Fortunately, there’s help. Xilinx has a number of vendor partners with ready, device-specific power-management ICs (PMICs). Case in point: Dialog Semiconductor.
If you need to power a Zynq UltraScale+ ZU3EG, ZU7EV, or ZU9CG MPSoC, you’ll want to check out Dialog’s App Note AN-PM-095 titled “Power Solutions for Xilinx Zynq Ultrascale+ ZU9EG.” This document contains reference designs for cost-optimized, PMIC-based circuits specifically targeting the power requirements for Zynq UltraScale+ MPSoCs. According to Xilinx Senior Tech Marketing Manager for Analog and Power Delivery Cathal Murphy, Dialog Semi’s PMICs can be used for low-cost power-supply designs because they generate as many as 12 power rails per device. They also switch at frequencies as high as 3MHz, which means that you can use smaller, less expensive passive devices in the design.
It also means that your overall power-management design will be smaller. For example, Dialog Semi’s power-management ref design for a Zynq UltraScale+ ZU9 MPSoC requires only 1.5in2 of board space—or less for smaller devices in the MPSoC family.
You don’t need to visualize that in your head. Here’s a photo and chart supplied by Cathal:
The Dialog Semi reference design is hidden under the US 25-cent piece.
As the chart notes, these Dialog Semi PMICs have built in power sequencing and can be obtained preprogrammed for Zynq-specific power sequences from distributors such as Avnet.
Cathal also pointed out that Dialog Semi has long been supplying PMICs to the consumer market (think smartphones and tablets) and that the power requirements for Zynq UltraScale+ MPSoCs map well into the existing capabilities of PMICs designed for this market, so you reap the benefit of the company’s volume manufacturing expertise.
Adam Taylor has been writing about the use of Xilinx All Programmable devices for image-processing platforms for quite a while and he has wrapped up much of what he knows into a 44-minute video presentation, which appears below. Adam is presenting tomorrow at the Xilinx Developer Forum being held in Frankfurt, Germany.
In the video, KORTIQ CEO Harold Weiss discusses using low-end Zynq SoCs (up to the Z-7020) and Zynq UltraScale+ MPSoCs (the ZU2 and ZU3) to create low-power solutions that deliver “just enough” performance for target industrial applications such as video processing, which requires billions of operations per second. The Zynq SoCs and Zynq UltraScale+ MPSoCs consume far less power than competing GPUs and CPUs while accelerating multiple CNN layers including convolutional layers, pooling layers, fully connected layers, and adding layers.
Continental AG has announced its Assisted & Automated Driving Control Unit, based on Xilinx All Programmable technology and developed in collaboration with Xilinx. According to the company, “…the Assisted & Automated Driving Control Unit will enable Continental’s customers to get to market faster by building upon the Open Computing Language (OpenCL) framework…” and “…offers a scalable product family for assisted and automated driving fulfilling the highest safety requirements (ASIL D) by 2019.”
Continental AG’s Assisted & Automated Driving Control Unit is based on Xilinx All Programmable technology
Continental’s incorporation of Xilinx All Programmable technology “provides developers the ability to optimize software for the appropriate processing engine or to create their own hardware accelerators with the Xilinx All Programmable technology. The result is the ultimate freedom to optimize performance, without sacrificing latency, power dissipation, or the flexibility to move software algorithms between the integrated chips, as the project progresses.”
“Our Assisted & Automated Driving Control Unit will enable automotive engineers to create their own differentiated solutions for machine learning, and sensor fusion. Xilinx’s All Programmable Technology was chosen as it offers flexibility and scalability to address the ever-changing and new requirements along the way to fully automated self-driving cars,” said Karl Haupt, Head of Continental’s Advanced Driver Assistance Systems business unit. “For Continental, the Assisted & Automated Driving Control Unit is a central element for implementing the required functional safety architecture and, at the same time, a host for the comprehensive environment model and driving functions needed for automated driving.”
Continental will be exhibiting at next week’s CES in Las Vegas.
What better way to start the New Year than with a new Adam Taylor MicroZed Chronicles blog? – The Editor
Following on from the popularity of my final blog of last year where I presented several tips for better image-processing systems, I thought I would kick off the 2018 series of blogs by providing a number of tips for using the XADC and Sysmon in Zynq SoCs and Zynq UltraScale+ MPSoCs.
Whether our targeted device uses a XADC or Sysmon depends upon the device family. If we are targeting a 7 series FPGA or Zynq SoC device, we will be using the XADC. If the target is an UltraScale FPGA, UltraScale+ FPGA, or a Zynq UltraScale+ MPSoC, we’ll be using a Sysmon block. Behaviorally, the on-chip XADC and Sysmon blocks are very similar but there are some minor differences in architecture and maximum sampling rates between the two. Including the XADC or Sysmon adds a very interesting analog/mixed-signal capability to your design and helps reduce the number of external components. Because they can monitor internal device parameters along with external signals, you can also use the XADC/Sysmon blocks to implement a comprehensive system health and security monitoring solution critical for many applications.
Here are some of my favourite tips for using the Xilinx XADC/Sysmon blocks:
Remember Nyquist’s sampling theorem and configure sample clocks correctly
To prevent signal aliasing, you must set the XADC/Sysmon sampling rate to at least twice the frequency of the signal being quantized. When sampling external signals, the XADC and Sysmon have different maximum sampling frequencies of 1000Ksamples/sec and 200Ksamples/sec respectively. To set the appropriate sampling frequency, we need to consider the relationship between the clock provided to the XADC/Sysmon (called DClock) and the resultant internally derived clock used for sampling (called ADC Clock). Both the XADC and Sysmon take a minimum of 26 internal ADC Clock cycles to perform a conversion. To achieve the maximum conversion rate of 1000KSPS for the XADC, we therefore need to set the ADC Clock at 26 MHz. For the Sysmon block, we need to set the ADC Clock to 5.2MHz to achieve the full 200ksamples/sec sample rate. ADC clock frequencies below these will result in lower sampling rates. Correctly setting the sampling rate depends upon the device you are using and the access method:
Zynq PS APB Access – The XADC is clocked using the PCAP_2X clock which has a nominal frequency of 200MHz. This is divided further internally within the DevC to generate a DClock for the XADC, which has a maximum frequency of 50MHz using the XADCIF_CFG[TCKRATE] register settings. This clock is then further divided down (minimum division by 2) using the XDAC configuration register to select the desired conversion rate.
Zynq MPSoC APB Access – The PS and PL Sysmon are clocked by the AMS clock with a range 0 to 52 MHz. This is then divided further by the Sysmon to create the ADC Clock used for sampling. This further division is a minimum of 2 for the PS Sysmon or 8 for the PL Sysmon using the Sysmon configuration clock division registers PS/PL CONFIG_REG2.
AXI Access using FPGA, Zynq SoC, or Zynq UltraScale+ MPSoC (PL Sysmon) – Use the system management wizard or XADC wizard configuration to determine the AXI clock frequency and hence the sampling clock via the IP customization tab. The required sample clock frequency can then be supplied by either a fabric clock (Zynq SoC or Zynq UltraScale+ MPSoC) or clock wizard (FPGA, Zynq SoC, or Zynq UltraScale+ MPSoC).
Configure the Analog Inputs Correctly
The analog inputs are defined by IP Integrator or software to be either unipolar or bipolar and you can control the input configuration for each analog input individually. When a unipolar signal is quantized, the input signal can range between 0V and 1V. For a bipolar input, the differential voltage between the Vp and Vn inputs is ±0.5V. Selecting the right mode ensures the best performance and avoids damaging the analog inputs. For unipolar configurations, Vp cannot be negative with respect to Vn. For bipolar inputs, Vp and Vn can swing positive and negative with respect the common mode (reference) voltage. Bipolar mode provides better noise performance because any common-mode noise coupled onto the Vp and Vn signals will be removed thanks to differential sampling.
When it comes to providing better performance in electrically noisy environments, you can also turn on input-channel averaging to average out the noise.
Leverage the External Multiplexer Capabilities
Both the XADC and the Sysmon can accept as many as seventeen external differential analog signals using one dedicated Vp/Vn pair and sixteen Auxiliary Vp/Vn pins. Doing so of course uses several I/O signal pins—as many as 34 I/O pins if all analog inputs are used. This may present issues, especially on smaller devices where I/O-pin availability might be tightly constrained so the XADC/Sysmon can drive an external multiplexer that reduces the number of pins required and also allows you to use and external mux with added protection for harsh operating environments (e.g. ESD protection).
Consider the Anti-Aliasing Filter effect on Conversion Performance
Implementing an Anti-Aliasing filter on the front end of the XADC/Sysmon external inputs is critical to ensuring that only the signals we want are quantized.
The external resistor and capacitors in the AAF will increase the overall settling time. Therefore, we need to ensure the external AAF also does not adversely affect the total settling time and consequently the conversion performance. Failing to provide adequate system-level settling time can result in ADC measurement errors because the sampling capacitor will not charge to its final value.
Both the XADC and Sysmon can monitor internal power supply voltages and temperatures. This is a great feature when we initially commission the boards because we can verify that the power supplies are delivering the expected voltages. We can even use the temperature sensor to verify thermal calculations at the high and low end of qualification environments.
When it comes to creating the run-time application you should use the temperature and voltage alarms, which are based on defined thresholds for core voltages and device temperature. Should the measured parameter fall outside of these defined thresholds, an alarm allows further action to be taken. Configured correctly this alarm capability can be used to generate an interrupt which alerts the processing system to a problem. Depending upon which alarm which has been raised, the system can then act to either protect itself or undertake graceful degradation, thus preventing sudden failure.
Hopefully these tips will enable you to create smoother XADC/Sysmon solutions. If you experience any issues, I have a page on my website that links to all previous XADC / Sysmon examples in this series.