Adam Taylor just published an EETimes review of the Xilinx RFSoC, announced earlier this week. (See “Game-Changing RFSoCs from Xilinx”.) Taylor has a lot of experience with high-speed analog converters: he’s designed systems based on them—so his perspective is that of a system designer who has used these types of devices and knows where the potholes are—and he’s worked for a semiconductor company that made them—so he should know what to look for with a deep, device-level perspective.
Here’s the capsulized summary of his comments in EETimes:
“The ADCs are sampled at 4 Gsps (gigasamples per second), while the DACs are sampled at 6.4 Gsps, all of which provides the ability to work across a very wide frequency range. The main benefit of this, of course, is a much simpler RF front end, which reduces not only PCB footprint and the BOM cost but -- more crucially -- the development time taken to implement a new system.”
“…these devices offer many advantages beyond the simpler RF front end and reduced system power that comes from such a tightly-coupled solution.”
“These devices also bring with them a simpler clocking scheme, both at the device-level and the system-level, ensuring clock distribution while maintaining low phase noise / jitter between the reference clock and the ADCs and DACs, which can be a significant challenge.”
“These RFSoCs will also simplify the PCB layout and stack, removing the need for careful segregation of high-speed digital signals from the very sensitive RF front-end.”
“I, for one, am very excited to learn more about RFSoCs and I cannot wait to get my hands on one.”
For more information about the new Xilinx RFSoC, see “Xilinx announces RFSoC with 4Gsamples/sec ADCs and 6.4Gsamples/sec DACs for 5G, other apps. When we say ‘All Programmable,’ we mean it!” and “The New All Programmable RFSoC—and now the video.”
Avnet’s new $499 UltraZed PCIe I/O carrier card for its UltraZed-EG SoM (System on Module)—based on the Xilinx Zynq UltraScale+ MPSoC—gives you easy access to the SoM’s 180 user I/O pins, 26 MIO pins from the Zynq MPSoC’s MIO, and 4 GTR transceivers from the Zynq MPSoC’s PS (Processor System) through the PCIe x1 edge connector; two Digilent PMOD connectors; an FMC LPC connector; USB and microUSB, SATA, DisplayPort, and RJ45 connectors; an LVDS touch-panel interface; a SYSMON header; pushbutton switches; and LEDs.
$499 UltraZed PCIe I/O Carrier Card for the UltraZed-EG SoM
That’s a lot of connectivity to track in your head, so here’s a block diagram of the UltraZed PCIe I/O carrier card:
UltraZed PCIe I/O Carrier Card Block Diagram
For information on the Avnet UltraZed SOM, see “Look! Up in the sky! Is it a board? Is it a kit? It’s… UltraZed! The Zynq UltraScale+ MPSoC Starter Kit from Avnet” and “Avnet UltraZed-EG SOM based on 16nm Zynq UltraScale+ MPSoC: $599.” Also, see Adam Taylor’s MicroZed Chronicles about the UltraZed:
Yesterday, Xilinx announced breakthrough RF converter technology that allows the creation of an RFSoC with multi-Gsamples/sec DACs and ADCs on the same piece of TSMC 16nm FinFET silicon as the digital programmable-logic circuitry, the microprocessors, and the digital I/O. This capability transforms the Zynq UltraScale+ MPSoC into an RFSoC that's ideal for implementing 5G and other advanced RF system designs. (See “Xilinx announces RFSoC with 4Gsamples/sec ADCs and 6.4Gsamples/sec DACs for 5G, other apps. When we say ‘All Programmable,’ we mean it!” for more information about that announcement.)
Today there’s a 4-minute video with Sr. Staff Technical Marketing Engineer Anthony Collins providing more details including an actual look at the performance of a 16nm test chip with the 12-bit, 4Gsamples/sec ADC and the 14-bit, 6.4Gsamples/sec DAC in operation.
Here’s the video:
To learn more about the All Programmable RFSoC architecture, click here or contact your friendly, neighborhood Xilinx sales representative.
Xilinx has just introduced a totally new technology for high-speed RF designs: an integrated RF-processing subsystem consisting of RF-class ADCs and DACs implemented on the same piece of 16nm UltraScale+ silicon along with the digital programmable-logic, microprocessor, and I/O circuits. This technology transforms the All Programmable Zynq UltraScale+ MPSoC into an RFSoC. The technology’s high-performance, direct-RF sampling simplifies the design of all sorts of RF systems while cutting power consumption, reducing the system’s form factor, and improving accuracy—driving every critical, system-level figure of merit in the right direction.
The fundamental converter technology behind this announcement was recently discussed in two ISSCC 2017 papers by Xilinx authors: “A 13b 4GS/s Digitally Assisted Dynamic 3-Stage Asynchronous Pipelined-SAR ADC” and “A 330mW 14b 6.8GS/s Dual-Mode RF DAC in 16nm FinFET Achieving -70.8dBc ACPR in a 20MHz Channel at 5.2GHz.” (You can download a PDF copy of those two papers here.)
This advanced RF converter technology vastly extends the company’s engineering developments that put high-speed, on-chip analog processing onto Xilinx All Programmable devices starting with the 1Msamples/sec XADC converters introduced on All Programmable 7 series devices way back in 2012. However, these new 16nm RFSoC converters are much, much faster—by more than three orders of magnitude. Per today’s technology announcement, the RFSoC’s integrated 12-bit ADC achieves 4Gsamples/sec and the integrated 14-bit DAC achieves 6.4Gsamples/sec, which places Xilinx RFSoC technology squarely into the arena for 5G direct-RF design as well as millimeter-wave backhaul, radar, and EW applications.
Here’s a block diagram of the RFSoC’s integrated RF subsystem:
Xilinx Zynq UltraScale+ RFSoC RF Subsystem
In addition to the analog converters, the RF Data Converter subsystem includes mixers, a numerically controlled oscillator (NCO), decimation/interpolation, and other DSP blocks dedicated to each channel. The RF subsystem can handle real and complex signals, required for IQ processing. The analog converters achieve high sample rates, large dynamic range, and the resolution required for 5G radio-head and backhaul applications. In some cases, the integrated digital down-conversion (DDC) built into the RF subsystem requires no additional FPGA resources.
The end result is breakthrough integration. The analog-to-digital signal chain, in particular, is supported by a hardened DSP subsystem for flexible configuration by the analog designer. This leads to a 50-75% reduction in system power and system footprint, along with the needed flexibility to adapt to evolving specifications and network topologies.
Where does that system-power reduction come from? The integration of both the digital and analog-conversion electronics on one piece of silicon eliminates a lot of power-hungry I/O and takes the analog converters down to the 16nm FinFET realm. Here’s a power-reduction table from the backgrounder with three MIMO radio example systems:
How about the form-factor reduction? Here’s a graphical example:
You save the pcb space needed by the converters and you save the space required to route all of the length-matched, serpentine pcb I/O traces between the converters and the digital SoCs. All of that I/O connectivity and the length matching now takes place on-chip.
To learn more about the All Programmable RFSoC architecture, click here or contact your friendly, neighborhood Xilinx sales representative.
Note: When we say “All Programmable” we mean it.
By Adam Taylor
We have now built a basic Zynq UltraScale+ MPSoC hardware design for the UltraZed board in Vivado that got us up and running. We’ve also started to develop software for the cores within the Zynq UltraScale+ MPSoC’s PS (processor system). The logical next step is to generate a simple “hello world” program, which is exactly what we are going to do for one of the cores in the Zynq UltraScale+ MPSoC’s APU (Application Processing Unit).
As with the Zynq Z-7000 SoC, we need three elements to create a simple bare-metal program for the Zynq UltraScale+ MPSoC:
To create a new hardware platform definition, select:
File-> New -> Other -> Xilinx – Hardware Platform Specification
Provide a project name and select the hardware definition file, which was exported from Vivado. You can find the exported file within the SDK directory if you exported it local to the project.
Creating the Hardware platform
Once the hardware platform has been created within SDK, you will see the hardware definition file opens within the file viewer. Browsing through this file, you will see the address ranges of the Zynq UltraScale+ MPSoC’s ARM Cotex-A53 and Cortex-R5 processors and PMU (Performance Monitor Unit) cores within the design. A list of all IP within the processors’ address space appears at the very bottom of the file.
Hardware Platform Specification in SDK file browser
We then use the information provided within the hardware platform to create a BSP for our application. We create a new application by selecting:
File -> New -> Board Support Package
Within the create BSP dialog, we can select the processor this BSP will support, the compiler to be used, and the selected OS, In this case, we’ll use bare metal or FreeRTOS.
For this first example, we will be running the “hello world” program from the APU on processor core 0. We must be sure to target the same core as we create the BSP and application if everything is to function correctly.
Board Support Package Generation
With the BSP created, the next step is to create the application using this BSP. We can create the application in a similar manner to the BSP and hardware platform:
File -> New -> Application Project
This command opens a dialog that allows us to name the project, select the BSP, specify the processor core, and select operating system. On the first tab of the dialog, configure these settings for APU core 0, bare metal, and the BSP just created. On the second tab of the dialog box, select the pre-existing “hello world” application.
Configuring the application
Selecting the Hello World Application
At this point, we have the application ready to run on the UltraZed dev board. We can run the application using either the debugger within SDK or we can boot the device from a non-volatile memory such as an SD card.
To boot from an SD Card, we need to first create a first-stage boot loader (FSBL). To do this, we follow the same process as we do when creating a new application. The FSBL will be based on the current hardware platform but it will have its own BSP with several specific libraries enabled.
Select File -> New -> Application Project
Enter a project name and select the core and OS to support the current build as previously done for the “hello world” application. Click the “Create New” radio button for the BSP and then on the next page, select the Zynq MP FSBL template.
Configuring the FSBL application
Selecting the FSBL template
With the FSBL created, we now need to build all our applications to create the required ELF files for the FSBL and the application. If SDK is set to build automatically, these files will have been created following the creation of the FSBL. If not, then select:
Project -> Build All
Once this process completes, the final step is to create a boot file. The Zynq UltraScale+ MPSoC boots from a file named boot.bin, created by SDK. This file contains the FSBL, FPGA programming file, and the applications. We can create this file by hand and indeed later in this series we will be doing so to examine the more advanced options. However, for the time being we can create a boot.bin by right-clicking on the “hello world” application and selecting the “Create Boot Image” option.
Creating the boot image from the file, from the hello world application
This will populate the “create boot image” dialog correctly with the FSBL, FPGA bit file, and our application—provided the elf files are available.
Boot Image Creation Dialog correctly populated
Once the boot file is created, copy the boot.bin onto a microSD card and insert it into the SD card holder on the UltraZed IOCC (I/O Carrier Card). The final step, before we apply power, is to set SW2 on the UltraZed card to boot from the SD Card. The setting for this is 1 = ON, 2 = OFF, 3 = ON, and 4 = OFF. Now switch on the power on, connect to a terminal window, and you will see the program start and execute.
When I booted this on my UltraZed and IOCC combination, the following appeared in my terminal window:
Hello World Running
Next week we will look a little more at the architecture of the Zynq UltraScale+ MPSoC’s PS.
Code is available on Github as always.
If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.
The steep attenuation of 28Gbps signals on pcbs cause problems for every system designer trying to deal with high-speed serial interfaces. Samtec’s new brochure titled “Samtec Products Supporting Xilinx Virtex UltraScale+ FPGA VCU118 Development Kit” sums up the problem in just one graphic:
The best you can hope for with a pcb (the dark blue line) is about half a db of attenuation per trace inch for a pcb manufactured using PTFE laminates, which were originally developed for microwave and RF applications. Up there at the top of the graph, showing far less attenuation, you see Samtec’s solution: its Twinax FireFly interconnect system, which is designed to carry these high-speed data signals within a system. (For essentially no loss, there’s a compatible optical FireFly module, shown for comparison at the very top of the graph.)
This next table, also taken from the Samtec brochure, sharpens the picture even more:
As the table shows, if you are dealing with 28Gbps transceiver signals (and soon perhaps 56Gbps) and you need to go more than a very few inches, then you’ll want to consider the Samtec FireFly Micro Flyover system.
An easy way to experiment with the Samtec FireFly Micro Twinax interconnect system is to use the company’s FMC+ Active Loopback card, which is included with the Xilinx VCU118 Virtex UltraScale+ FPGA Dev Kit. Here’s a photo of that card plugged into the FMC port of the VCU118 Dev Kit:
Xilinx VCU118 Virtex UltraScale+ FPGA Development Kit with Samtec FireFly Micro Flyover FMC Card
Note that one end of the FireFly Twinax cable is connected to this FMC card and the other end plugs into a FireFly connector already present on the Xilinx VCU118 Dev Kit’s FPGA board.
Samtec has a new 12-page FireFly Application Design Guide that provides more details.
For previous Xcell Daily blog posts about the Samtec FireFly interconnect system, see:
By Adam Taylor
Having introduced the concept of AMP and the OpenAMP frame work in a previous blog (See Adam Taylor’s MicroZed Chronicles, Part 169: OpenAMP and Asymmetric Processing, Part 1), the easiest way to get OpenAMP up and running is to use one of the examples provided. Once we have the example up and running, we can then add our own application. Getting the example up and running before we develop our own application allows us to pipe-clean the development process.
To create the example, we are going to use PetaLinux SDK to correctly configure the PetaLinux image. We will be running PetaLinux on the Zynq SoC. As such, this blog will also serve as a good tutorial for building and using PetaLinux.
The first thing we need to do is download the PetaLinux SDK and install it on our Linux machine, if you do not have one of these, that’s not a problem. You can use a virtual machine, which is exactly what I am doing. Installing PetaLinx is very simple. You can download the installer from here and the installation guide and all of the supporting documentation is available here.
Once we have installed PetaLinux to get this example going, we are going to follow the UG1186 example and run the “echo world” example. To do this, we need to create a new PetaLinux project and then update it to support Open AMP.
When we create a new PetaLinux project, we need to provide the project a BSP (Board Support Package) reference. We can of course create our own BSP, however as I am going to be using the ZedBoard to demonstrate OpenAMP, I downloaded the ZedBoard BSP provided with the PetaLinux Installation files.
To create the new project, open a terminal window and enter the following:
petalinux-create --type project --name <desired name> -- source <path to BSP>
This command creates the project that we can then customize and build. After you create the project, change your directory to the project directory in the terminal window.
The examples we are going to be using are provided within the PetaLinux project. If you open a File Explorer window, you will find these examples under <project>/components/apps/. We plan to create new code, so this is where will be adding new applications as well.
To run these applications, we need to first make some changes to PetaLinux. The first change is to configure the kernel base address. The bare-metal application and the Linux application need to use different base addresses. The bare-metal application has a fixed base address of 0, so we need to offset the Linux base address. To do this within the terminal window in the project directory type the following command:
This command opens a dialog window that allows us to configure the Linux base address. Using the arrow keys, navigate to Subsytem AUTO Hardware Settings ---> Memory Settings and set the kernel base address to 0x10000000 as shown below. Remember to save the configuration and then navigate from the root menu to U-boot Configuration and set the netboot offset to 0x11000000. Again save the configuration before exiting and returning to the root menu.
Setting the Kernel Base Address
Setting the NetBoot Offset
The next step is to configure the kernel to ensure that it contains the remote processor and can load modules and user firmware. To do this, enter the following command in the terminal window:
petalinux-config -c kernel
This will again open a configuration dialog within which we need to set the following:
Make sure Enable Loadable Module Support is set
Under Device Drivers --->Generic Drivers ---> set Userspace firmware loading support
Under Device Drivers --->Remoteproc drivers ensure support for the Zynq remoteproc is set
We also need to correctly configure the kernel memory space between user and kernel. Navigate to kernel features ---> Memory split (<current setting> user/kernel split) ---> and set it to 2G/2G:
Setting the User /Kernel Split
The final stage of configuring the kernel is to support high memory support:
Setting the high memory support under Kernel Features
This completes the changes we need to make to the kernel. Save the changes and exit the dialog.
The next thing we need to configure the Root File System to include the example applications. To do this, enter the following to bring up the rootfs dialog:
petalinux-config -c rootfs
Under Apps, ensure that echo_test, mat_mul_demo, and proxy_app are enabled:
Enabling the Apps
We also need to enable the OpenAMP drivers modules. These are under the Modules:
Enabling the OpenAMP drivers
With these changes made, save the changes and exit the dialog. We are nearly now almost ready to create our new PetaLinux build. But first we need to update the device tree to support OpenAMP. To do this, use File Explorer to navigate to:
Within this directory, you will find a number of files including system_top.dts and openamp-overlay.dtsi. Open the system_top.dts file and tell it to include the openamp-overlay.dtsi file by adding the line:
System_top.dts configured for OpenAMP
Now we are finally ready to build PetaLinux. In the terminal window within the project directory, type:
This will create the following folders within your project directory using the built image files <project>/images/linux.
Within this file you will see a range of files. However, the key files moving forward are:
For this example I want to boot from the SD card so I need to create a boot.bin file. Looking in the directory you will notice one has not been provided. However, all the files we need to create one have been included.
To create a boot.bin file, enter the following using the terminal:
petalinux-package --boot --fsbl images/linux/zynq_fsbl.elf --fpga images/linux/download.bit --uboot
This command creates the boot.bin file that we will put onto our SD card along with the image.ub. Once we have these files on an SD card, connect it to the ZedBoard and boot the board.
Using a serial terminal connected to the ZedBoard, we can now run our example and see that it is running ok. Once the example boots, you need to log in. The username and password are both “root.”
Enter the following command in the terminal to run the example:
modprobe zynq_remoteproc firmware=image_echo_test
Results of running the first command
After the channel is created, you may need to press return to bring up the command prompt again. Now enter the following command:
Result of running the second command
Now it is time to run the application itself. Enter the following:
Start of the Echo Test application
Select option 1 and the test will run to completion. You can exit the application by selecting 2:
Successful test of the example application
At this point, we have successfully demonstrated that OpenAMP is running on the Zynq SoC. (We will look at the Zynq UltraScale+ MPSoC in due course.)
Now we are ready to start developing our own applications using OpenAMP.
Code is available on Github as always.
If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.
As the BittWare video below explains, CPUs are simply not able to process 100GbE packet traffic without hardware acceleration. BittWare’s new Streamsleuth, to be formally unveiled at next week’s RSA Conference in San Francisco (Booth S312), adroitly handles blazingly fast packet streams thanks to a hardware assist from an FPGA. And as the subhead in the title slide of the video presentation says, StreamSleuth lets you program its FPGA-based packet-processing engine “without the hassle of FPGA programming.”
(Translation: you don’t need Verilog or VHDL proficiency to get this box working for you. You get all of the FPGA’s high-performance goodness without the bother.)
That said, as BittWare’s Network Products VP & GM Craig Lund explains, this is not an appliance that comes out of the box ready to roll. You need (and want) to customize it. You might want to add packet filters, for example. You might want to actively monitor the traffic. And you definitely want the StreamSleuth to do everything at wire-line speeds, which it can. “But one thing you do not have to do, says Lund, “is learn how to program an FPGA.” You still get the performance benefits of FPGA technology—without the hassle. That means that a much wider group of network and data-center engineers can take advantage of BittWare’s StreamSleuth.
As Lund explains, “100GbE is a different creature” than prior, slower versions of Ethernet. Servers cannot directly deal with 100GbE traffic and “that’s not going to change any time soon.” The “network pipes” are now getting bigger than the server’s internal “I/O pipes.” This much traffic entering a server this fast clogs the pipes and also causes “cache thrash” in the CPU’s L3 cache.
Sounds bad, doesn’t it?
What you want is to reduce the network traffic of interest down to something a server can look at. To do that, you need filtering. Lots of filtering. Lots of sophisticated filtering. More sophisticated filtering than what’s available in today’s commodity switches and firewall appliances. Ideally, you want a complete implementation of the standard BPF/pcap filter language running at line rate on something really fast, like a packet engine implemented in a highly parallel FPGA.
The same thing holds true for attack mitigation at 100Gbe line rates. Commodity switching hardware isn’t going to do this for you at 100GbE (10GbE yes but 100GbE, “no way”) and you can’t do it in software at these line rates. “The solution is FPGAs” says Lund, and BittWare’s StreamSleuth with FPGA-based packet processing gets you there now.
Software-based defenses cannot withstand Denial of Service (DoS) attacks at 100GbE line rates. FPGA-accelerated packet processing can.
So what’s that FPGA inside of the BittWare Streamsleuth doing? It comes preconfigured for packet filtering, load balancing, and routing. (“That’s a Terabit router in there.”) To go beyond these capabilities, you use the BPF/pcap language to program your requirements into the the StreamSleuth’s 100GbE packet processor using a GUI or APIs. That packet processor is implemented with a Xilinx Virtex UltraScale+ VU9P FPGA.
Here’s what the guts of the BittWare StreamSleuth look like:
And here’s a block diagram of the StreamSleuth’s packet processor:
The Virtex UltraScale+ FPGA resides on a BittWare XUPP3R PCIe board. If that rings a bell, perhaps you read about that board here in Xcell Daily last November. (See “BittWare’s UltraScale+ XUPP3R board and Atomic Rules IP run Intel’s DPDK over PCIe Gen3 x16 @ 150Gbps.”)
Finally, here’s the just-released BittWare StreamSleuth video with detailed use models and explanations:
For more information about the StreamSleuth, contact BittWare directly or go see the company’s StreamSleuth demo at next week’s RSA conference. For more information about the packet-processing capabilities of Xilinx All Programmable devices, click here. And for information about the new Xilinx Reconfigurable Acceleration Stack, click here.
Annapolis Microsystems has adopted the Xilinx Zynq UltraScale+ MPSoC in a big way today by introducing three 6U and 3U OpenVPX boards and one PCIe board based on three of the Zynq UltraScale+ MPSoC family members. The four new COTS boards are:
Annapolis Microsystems WILDSTAR UltraKVP ZP 3PE for 6U OpenVPX
Annapolis Microsystems WILDSTAR UltraKVP ZP 2PE for 6U OpenVPX
Annapolis Microsystems WILDSTAR UltraKVP ZP for 3U OpenVPX
Annapolis Microsystems WILDSTAR UltraKVP ZP for PCIe
Why choose the Xilinx Zynq UltraScale+ MPSoC? Noah Donaldson, Annapolis Micro Systems VP of Product Development explains: “Here at Annapolis we pride ourselves on being first to integrate the latest, cutting-edge components into our FPGA boards, all in pursuit of one goal: Delivering the highest-performing COTS boards and systems that have been proven in some of the most challenging environments on earth.”
That pretty much says it all, doesn’t it?
Amazon Web Services (AWS) rolled out the F1 instance for cloud application development based on Xilinx Virtex UltraScale+ Plus VU0P FPGAs last November. (See “Amazon picks Xilinx UltraScale+ FPGAs to accelerate AWS, launches F1 instance with 8x VU9P FPGAs per instance.) It appears from the following LinkedIn post that people are using it already to do some pretty interesting things:
If you’re interested in Cloud computing applications based on the rather significant capabilities of Xilinx-based hardware application acceleration, check out the Xilinx Acceleration Zone.
By Adam Taylor
Like is MicroZed and PicoZed predecessors, the UltraZed-EG is a System on Module (SOM) that contains all of the necessary support functions for a complete embedded processing system. As a SOM, this module is designed to be integrated with an application-specific carrier card. In this instance, our application-specific card is the Avnet UltraZed IO Carrier Card.
The specific Zynq UltraScale+ MPSoC contained within the UltraZed SOM is the XCZU3EG-SFVA625, which incorporates a quad-core ARM Cortex-A53 APU (Application Processing Unit), dual ARM Cortex-R5 processors in an RPU (Real-Time Processing Unit), and an ARM Mali-400 GPU. Coupled with a very high performance programmable-logic array based on the Xilinx UltraScale+ FPGA fabric, suffice it to say that exploring how to best use all of these resources it will keep us very, very busy. You can find the 36-page product specification for the device here.
The UltraZed SOM itself shown in the diagram below provides us with 2GBytes of DDR4 SDRAM, while non-volatile storage for our application(s) is provided by both dual QSPI or eMMC Flash memory. Most of the Zynq UltraScale+ MPSoC’s PS and PL I/O are broken out to one of three headers to provide maximum flexibility on the application-specific carrier card.
Avnet UltraZed-EG SOM Block Diagram
The UltraZed IO Carrier Card (IOCC), breaks out the I/O pins from the SOM to a wide variety of interface and interconnect technologies including Gigabit Ethernet, USB 2/3, UART, PMOD, Display Port, SATA, and Ardunio Shield. This diverse set of I/O connections give us wide lattitude in developing all sorts of systems. The IOCC also provicdes a USB to JTAG interface allowing us to program and debug our system. You’ll find more information on the IOCC here.
Having introduced the UltraZed and its IOCC, it is time to write a simple “hello world” application and to generate our first Zynq UltraScale+ MPSoC design.
The first step on this journey is make sure we have used the provided voucher to generate a license and downloaded the Design Edition of the Vivado Design Suite.
The next step is to install the board files to provide Vivado with the necessary information to create designs targeting the UltraZed SoM. You can download these files using this link. These board-definition files include information such as the actual Zynq UltraScale+ MPSoC device populated on the SoM, connections to the PS on the IOCC, and a preset configuration for the SoM. We can of course create an example without using these files, however it requires a lot more work.
Once you have downloaded the zip file, extract the contents into the following directory:
<Vivado Install Root>/data/boards/boardfiles
When this is complete, you will see that the UltraZed board defintions are now in the directory and we can now use them within our design.
I should point out at this point that some of the UltraZed boards (including mine) use ES1 silicon. To alert Vivado about this, we need to create a init.tcl file in the scripts directory that will enable us to use ES1 silicon. Doing so is very simple. Within the directory:
Create a file called init.tcl. Enter the line “enable_beta_device*” into this file to enable the use of ES1 silicon within your toolchain.
With this completed we can open Vivado and create a new RTL project. After entering the project name and location, click next on the add sources, IP, and constraints tabs. This should bring you to part selection tab. Click on boards and you should see our UltraZed IOCC board. Select that board and then finish the open project dialog.
This will create a new project.
For this project I am just going to just use the Zynq UltraScale+ MPSoC’s PS to print “hello world.” I usually like to do this with new boards to ensure that I have pipe-cleaned the tool chain. To do this, we need a hardware-definition file to export to SDK to define the hardware platform.
The first step in this sequence is within Flow Navigator. On the left-hand side of the Vivado screen, select the Create Block Diagram option. This will provide a dialog box allowing you to name your block design (or you can leave it default). Click OK and this will create a blank block diagram (in the example below mine is called design_1).
Within this block diagram, we need to add an MPSoC system. Click on the “add IP” button as indicated in the block diagram. This will bring up an IP dialog. Within the search box, type in “MPSoC” and you will see the Zynq UltraScale+ MPSoC IP block. Double click on this and it will be added to the diagram automatically.
Once the block has been added, you will notice a designer assistance notification across the top of the block diagram. For the moment, do not click on that. Instead, double click on the MPSoC IP in your block diagram and it will open up the customization screen for the MPSoC, just like any other IP block.
Looking at the customization screen, you will see it is not yet configured for the target board. For instance, the IOU block has no MIO configuration. Had we not downloaded the board definition, we would now have to configure this by manually. But why do that when we can use the shortcut?
We have the board-definition files, so all we need to do to correctly configure this for the IOCC is close the customization dialog and click on the Run Block Automation notification at the top of the block diagram. This will configure the MPSoC for our use on the IOCC. Within the block automation dialog, check to make sure that the “apply pre-sets” option is selected before clicking OK.
Re-open the MPSoC IP block again and you will see a different configuration of the MPSoC—one that is ready to use with our IOCC.
Do not change anything. Close the dialog box. Then, on the block diagram, connect the PL_CLK0 pin to the maxihpm0_lpd_ack pin. Once that is complete, click on “validate” to ensure that the design has no errors.
The next step is very simple. We’ll create an RTL wrapper file for the block diagram. This will allow us to implement the design. Under the sources tab, right-click on the block diagram and select “create HDL wrapper.” When prompted, select the option that allows Vivado to manage the file for you and click OK.
To generate the bitstream, click on the “Generate Bitstream” icon on the menu bar. If you are prompted about any stages being out of date, re-run them first by clicking on “yes.”
Depending on the speed of your system, this step may take a few minutes or longer to generate the bitstream. Once completed, select the “open implementation” option. Having the implementation open allows us to export the hardware definition file to SDK where we will develop our software.
To export the hardware definition, select File-> Export->Export Hardware. Select “include bit file” and export it.
To those familiar with the original Zynq SoC, all of this should look pretty familiar.
We are now ready to write our first software program—next time.
You can find links to previous editions of the MPSoC edition here
Code is available on Github as always.
If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.
Accolade’s newly announced ATLAS-1000 Fully Integrated 1U OEM Application Acceleration Platform pairs a Xilinx Kintex UltraScale KU060 FPGA on its motherboard with an Intel x86 processor on a COM Express module to create a network-security application accelerator. The ATLAS-1000 platform integrates Accolade’s APP (Advanced Packet Processor), instantiated in the Kintex UltraScale FPGA, which delivers acceleration features for line-rate packet processing including lossless packet capture, nanosecond-precision packet timestamping, packet merging, packet filtering, flow classification, and packet steering. The platform accepts four 10G SFP+ or two 40G QSFP pluggable optical modules. Although the ATLAS-1000 is designed as a flow-through security platform, especially for bump-in-the-wire applications, there’s also 1Tbyte worth of on-board local SSD storage.
Accolade Technology's ATLAS-1000 Fully Integrated 1U OEM Application Acceleration Platform
Here’s a block diagram of the ATLAS-1000 platform:
All network traffic enters the FPGA-based APP for packet processing. Packet data is then selectively forwarded to the x86 CPU COM Express module depending on the defined application policy.
Please contact Accolade Technology directly for more information about the ATLAS-1000.
By Adam Taylor
As we are going to be looking at both the Zynq Z-7000 SoC and the Zynq UltraScale+ MPSoC in this series moving forward, one important aspect we need to consider is how we can best leverage the processor cores provided within our chosen device. How we use these cores of course, depends upon the system architecture we implement to achieve the overall system requirements. Increasingly, system designers use an asymmetric approach to obtain optimal performance and to address the system-design challenges. Of course, system-design challenges are usually application-specific.
At this point, those unfamiliar with the term may find themselves asking what an asymmetric approach is? Simply put, a asymmetric approach is one where different processing elements within a device perform different functions and indeed some may be running different operating systems to achieve that function. One example of this would be one of the two ARM Cortex-A9 processor cores of a Zynq SoC running Linux and handling system communications and other tasks, which do not need to be addressed in real time, while the second processor core runs a bare-metal application or a FreeRTOS application to execute real-time processing tasks and communicating results to the other core.
When we implement systems in such a manner, we call this asymmetric multi-processing or AMP. We have looked at AMP before, briefly. However, we did not look at the OpenAMP framework developed by the Multicore Association. This open-source framework is supported by both the Zynq SoC and Zynq UltraScale+ MPSoC and provides the software elements necessary for us to establish an AMP system. As such, it is something we need to be familiar with as we develop our examples going forward.
The alternative is a symmetric multi-processing (SMP) system in which all the cores run the same operating system and are balancing the workload among themselves. An example of this would be running Linux on both cores of a Zynq SoC.
Creating an AMP system allows us to leverage the parallelism provided by having several processing cores available, i.e. we can set different cores to perform different tasks under the control of a master core. However, AMP does come with challenges such as how inter-process communication is implemented, how resources are shared, process control, and how the life cycle is managed. The OpenAMP framework is designed to address these issues and to enable reuse and portability at the same time.
When working with the Zynq SoC and Zynq UltraScale+ MPSoC, we can implement AMP solutions which have the following configuration:
I should note at this point that while in the Zynq SoC, we can use one core to run Linux as the master, in the Zynq UltraScale+ MPSoC we can use the quad-core APU (based on ARM Cortex-A53 processorsto run Linux while using the dual-core RPU (based on ARM Cortex-R5 processors) as the remote to run the bare-metal or FreeRTOS applications.
The master core, running Linux contains most of the OpenAMP framework within the kernel. There are main three components:
The diagram below (taken from UG1186, “OpenAMP Framework for Zynq Devices”) illustrates the process between master and remote processor using OpenAMP.
Example of OpenAMP flow
Of course, when we build our bare-metal and FreeRTOS applications, we also need to include the necessary libraries within the BSP to enable these to support OpenAMP. The libraries we need to enable are the OpenAMP library and the libmetal library.
Having introduced the OpenAMP concept, next week we will look at how we can get an example up and running on a Zynq device.
Code is available on Github as always.
All of Adam Taylor’s MicroZed Chronicles are cataloged here.
By Adam Taylor
Note: Adam Taylor just cannot stop working with or writing about Xilinx devices (nor would we want him to). So here’s the first instalment of his new sub-series about the Zynq UltraScale+ MPSoC.)
Over the last three years, we have used this blog to look at and will continue to look at how we can use the Zynq-7000 SoC in our designs. However, the next generation Zynq, the Zynq UltraScale+ MPSoC, is now available and it would be remiss if we did not also cover how to use this new device and the Avnet UltraZed board in our designs and applications as well. So, welcome to the UltraZed edition of the MicroZed Chronicles.
Avnet UltraZed board on Carrier Card
Before we delve into the Avnet UltraZed board, which we are going to be using to explore this device, I want to spend some time explaining the internals of the Zynq UltraScale+ MPSoC device itself. Of course, this is just an overview and we will be looking more in depth at all aspects of the device as this series continues.
The Zynq UltraScale+ MPSoC is a heterogeneous processing platform which, like the Zynq-7000 SoC, combines a Processing System (PS) with Programmable Logic (PL). However, both PS and PL in the Zynq UltraScale+ MPSoC are significantly more capable.
Within the Zynq UltraScale+ MPSoC’s PS, we find the following main processing elements (I say main as there are others which will be introduced as well):
These processing elements connect via a central interconnect to the MIO peripherals and other functions and interfaces within the PS. The MIO contains the same SPI, UART, I2C, CAN, etc. that are familiar to developers using the Zynq-7000 SoC. For configuration and storage, we can use SD/eMMC, Quad SPI or NAND Flash, also provided by the MIO, while high-speed system communication is provided via multiple GigE and USB 3 interfaces.
Zynq UltraScale+ MPSoC Block Diagram
The APU connects to the central interconnect via the System Memory Manager Unit (SMMU) and the Cache Coherent Interconnect (CCI) while the RPU connects to it via the low-power domain switch and the SMMU.
Which brings us nicely to the MPSoC power domains. There are four in total; three within the PS; and one in the PL:
We should remember that in these modes, the power dissipation will depend upon which of the components within the domain are currently being used and their operating frequency. These power domains are operated under the control of the Platform Management Unit (PMU). The PMU is a triple-redundant processor. It controls the power-up, reset and system monitoring for the Zynq UltraScale+ MPSoC. The PMU is a very interesting resource as it is also capable of running user-developed programs to provide more detailed system monitoring for safety and security applications.
When it comes to executing our application(s), we can use DDR3/4 SDRAMs or their low-power versions under the control of the integrated DDR controller. Data paths to this controller are directly from the RPU or the APU via its Cache Coherent Interconnect, while the PL, DMA controller, and Display Port Interfaces can be switched between as required.
So far, we have only examined the PS. The PL of the Zynq UltraScale+ MPSoC consists of next-generation programmable-logic fabric from either the Kintex UltraScale+ or Virtex UltraScale+ FPGA families, which include UltraRAM, Block RAM, and DSP48E2 slices. Depending upon which of the Zynq UltraScale+ MPSoC devices you select, you will find increased connectivity solutions like PCIe, Interlaken, GTH and GTY transceivers within the PL. In the EG device family, you will also find an H.265 / H.264 Video Codec.
Like the Zynq-7000 SoC, the PS is the device master and configures the PL after power-up and initialization. The main method communication between the PL and the PS is also very similar and uses AXI Interfaces in both directions. Either the PL or the PS can be the AXI master. Depending upon the interface selected, these can be Cache- or I/O-Coherent or non-coherent; with data widths of 32, 64 or 128 bits.
Additional interfaces between the PS and the PL include:
Having briefly introduced the Zynq UltraScale+ MPSoC architecture, next week we will look at the UltraZed board and begin to build our first example.
You can find an overview of the different Zynq UltraScale+ MPSoC families here.
The in depth Zynq UltraScale+ MPSoC technical reference manual is available here.
SP Devices’ ADQ7 digitizer boasts one or two 14-bit acquisition channels with an aggregate acquisition rate of 10Gsamples/sec (5G samples/sec in 2-channel mode) with either ac or dc coupling and is available in multiple bus formats including MTCA.4, USB3, PCIe, PXIe, and 10 Gbit Ethernet with a maximum, sustained data-transfer rate of 5Gbytes/sec (over PCIe).
Given those specs, you should see an immediate problem: there’s not enough bus bandwidth to get the full, continuous digitized data stream out of the module so you immediately know you’re going to need local storage and on-board processing and data reduction.
And what controls that on-board storage and performs that on-board processing (and does practically everything else as well)?
A Xilinx Kintex UltraScale FPGA, of course.
Here’s a block diagram of the dc-coupled ADQ7DC:
On the left, you see the two pairs of high-speed ADCs that yield the 10Gsamples/sec acquisition speed. Note that all four of those ADCs directly feed the Kintex UltraScale FPGA, which performs several signal-processing steps including calibration; gain and offset adjustments; triggering; and any user-defined, real-time signal processing. You can define those processing blocks using SP Devices’ Dev Kit or you can order one of the several optional signal-processing modules that SP Devices has already developed for the ADQ7 including:
As you can see from the block diagram, the Kintex UltraScale FPGA also manages the ADQ7 module’s overall timing control and manages the data flow through FIFO queues, into and out of the on-board 4Gbyte DRAM, and over the various interfaces offered in this module family—although the native interface appears to be PCIe. (Note: All Kintex UltraScale FPGA family members have at least one integrated PCIe Gen1/2/3 controller, independent of the on-chip programmable logic.)
This digitizer illustrates the use of one Xilinx All Programmable device to implement most of the functions in a complex system. The Kintex UltraScale FPGA in this design implements everything but the ADCs, the clock, the memory, and some of the interface hardware. Everything else is nicely bundled in the one UltraScale device with enough capability left over to allow for the addition of user-defined processing. This sort of flexibility offers real value for system designers.
For more information about the ADQ7 digitizer family, contact SP Devices directly.
TI has created a power supply reference design for the Xilinx Zynq UltraScale+ MPSoC specifically for Remote Radio Head (RRH) and backhaul applications but there’s no reason you can’t use this design in any other design employing the Zynq UltraScale+ MPSoC. The compact reference design is based on TI’s TPS6508640 power-management IC (PMIC), which is a pretty sophisticated power supply controller, several power FETs, and a TPS544C25 high-current regulator. The TPS6508640 PMIC reduces board size, cost, and power loss using a high switching frequency and separate rails for core supplies.
The design creates ten regulated supply voltages for the Zynq UltraScale+ MPSoC based on a 12V source supply. Here’s what TI’s reference design looks like:
Here’s what the design looks like when placed on a pc board:
You’ll find a PDF describing this reference design in detail here.
Please contact TI directly for additional details.
Targeting military and other high-end, real-time computing applications, the PanaTeQ VPX3-ZU1 3U OpenVPX Module delivers the Xilinx Zynq UltraScale+ MPSoC with its six ARM processor cores (a quad-core ARM Cortex-A53 Application Processing Unit and a dual-core ARM Cortex-R5 Real-Time Processing Unit), an ARM MALI-400 Graphic Processing Unit, and big chunk of Xilinx’s advanced 16nm programmable logic available in the 100x160mm VPX form factor. The module also includes an on-board PCIe Gen2 switch driving eight PCIe ports on the VPX P1 port and an FMC site that complies with the Vita 57.1 HPC standard, which makes the VPX3-ZU1 board instantly compatible with a large number add-on I/O modules including the FMC-ZU1RF-A FMC Wideband RF Transceiver module based on the Analog Devices AD9371 Integrated, Dual RF Transceiver with Observation Path.
The VPX-ZU1 is available with one of three Xilinx Zynq UltraScale+ MPSoC devices (ZU6EG/ZU9EG/ZU15EG), and with either 2 or 4Gbytes of 64-bit DDR4-2400 SDRAM with 8-bit ECC for the Zynq UltraScale+ MPSoC’s processor system and either 512Mbytes or 1Gbyte of DDR4-2400 SDRAM connected to the Zynq UltraScale+ MPSoC’s programmable logic.
Here’s a detailed block diagram of the PanaTeQ VPX3-ZU1 3U OpenVPX Module:
PanaTeQ VPX3-ZU1 Block Diagram
PanaTeQ also offers the RTM-ZU1-A 3U OpenVPX Rear IO Transition Module for VPX3-ZU1, which makes the following I/O interfaces available on connectors through the rear-panel connectors:
And the following I/O interfaces on internal connectors:
Xilinx has a version of QEMU—a fast, open-source, just-in-time functional simulator—for the ARM processors in the Zynq SoC and the Zynq UltraScale+ MPSoC and for the company’s MicroBlaze soft processor core. QEMU accelerates code development by giving embedded software developers an enhanced execution environment long before hardware is available and they can continue to use QEMU as a software-development platform even after the hardware is ready. (After all, it’s a lot easier to distribute QEMU to 300 software developers than to ship hardware units to each of them.)
Although QEMU was already available through the open-source community, Xilinx has added several innovations over time to match the multi-core, heterogeneous devices available in the two distinct Zynq device families, augmented by additional MicroBlaze processors instantiated in programmable logic.
The latest version of Xilinx QEMU, available on github at https://github.com/Xilinx/qemu, includes extended features including:
Xilinx is actively developing QEMU enhancements, which means more features are on the way. Meanwhile, you’ll find the Xilinx QEMU Wiki here.
Introduced last month, VadaTech’s AMC596 places just a few chips on an AMC module—a Xilinx Virtex UltraScale VU440 FPGA, a QorIQ PPC2040 quad-core PowerPC processor, and 8Gbytes of DDR4 SDRAM—but you can build nearly anything with a combo like that. As the company says, this module is “ideal for ASIC prototyping/emulation” but it will also perform well in moderate-volume designs where the economics of ASIC design and manufacture do not deliver an advantage.
Here’s a photo of the AMC596 AMC module:
VadaTech AMC596 AMC Module
And here’s a block diagram:
VadaTech AMC596 AMC Module Block Diagram
The Xilinx Virtex UltraScale VU440 FPGA on the VadaTech AMC596 module is a DSP monster with 2880 DSP48E2 slices on chip along with 5541K system logic cells.
Despite the immense processing power built into VadaTech’s AMC596 module, it’s power consumption rating is only about 65W (depending on the application). In addition, the module measures a mere 73.5x180.6 mm. That’s a lot of capability packed into a small, low-power form factor.
Please contact VadaTech directly for additional information about the AMC596 module.
Need a really powerful SOM (system on module) with six heterogeneous ARM processor cores (four 64-bit ARM Cortex-A53 cores and two 32-bit ARM Cortex-R5 cores), an ARM Mali-400 MP2 GPU, and a big chunk of the latest-generation UltraScale+ programmable logic? Sounds like you need the Avnet-designed UltraZed-EG SOM based on a Xilinx Zynq UltraScale+ MPSoC (XCZU3EG). This $599 board is tiny (2x3.5 inches) but packs a wallop in the form of the Zynq UltraScale+ MPSoC, 2Gbytes of DDR4 SDRAM, and a bunch of I/O ports. That’s a ton of processing power for projects that require heavy lifting.
Here’s a photo of the SOM:
$599 Avnet UltraZed-EG SOM
And here’s a block diagram of the SOM to give you a better idea of what’s on the SOM.
Avnet UltraZed-EG SOM Block diagram
Pentek’s new video discusses the broad product line of more than 20 Jade XMC, PCIe, AMC, compact PCI, and VPX boards based on the Xilinx Kintex UltraScale FPGA family. The Pentek Jade modules are designed for high-performance data-acquisition and signal-processing applications with on-board ADCs and DACs as fast as 5G samples/sec.
The broad Jade product line illustrates how a company can take a basic idea and use programmable logic to develop comprehensive, multi-member product lines while minimizing engineering effort by leveraging the numerous resources included in the broad line of mutually compatible Xilinx Kintex UltraScale FPGAs. The Jade family represents the latest generation of related products that Pentek has based on three successive generations of Xilinx FPGAs. This latest generation from Pentek is 13% lighter, uses 23% less power, and costs about 30% less than the preceding generation, partly due to using next-generation Xilinx devices.
The Jade product line illustrates this concept especially because Pentek has not only developed a comprehensive line of board-level products, the company has also created a set of support tools called Navigator Design Suite that provides BSPs and software support for the Jade modules using Pentek-supplied IP for the on-board FPGAs. A companion tool called the Navigator FPGA Design Kit allows you to develop your own IP for high-speed data acquisition and signal processing. The Navigator BSP package and the Navigator FPGA Design Kit are closely linked so that the software and hardware IP dovetail.
Here’s the 4-minute Pentek video:
Note: For additional information on the Pentek Jade product line, see “Pentek kicks its radar, SDR DSP architecture up a notch from Cobalt to Onyx to Jade by jumping to Kintex UltraScale FPGAs.”
Jan Gray’s FPGA.org site has just published a blog post detailing the successful test of the GRVI Phalanx massively parallel accelerator framework, with 1680 open-source RISC-V processor cores running simultaneously on one Xilinx Virtex UltraScale+ VU9P. (That’s a mid-sized Virtex UltraScale+ FPGA.) According to the post, this is the first example of a kilocore RISC-V implementation and represents “the most 32-bit RISC cores on a chip in any technology.”
That’s certainly worth a picture (is a picture worth 1000 cores?):
1680 RISC-V processor cores run simultaneously on a Xilinx VCU118 eval kit with a Virtex UltraScale+ VU9P FPGA
The GRVI Phalanx’s design consists of 210 processing clusters with each cluster comprised of eight RISC-V processor cores, 128Kbytes of multiported RAM, and a 300-bit Hoplite NOC router. Here’s a block diagram of one such Phalanx cluster:
GRVI Phalanx Cluster Block Diagram
Note: Jan Gray contacted Xcell Daily after this post first appeared and wanted to clarify that the RISC-V ISA may be open-source and there may be open-source implementations of the RISC-V processor, but the multicore GRVI Phalanx is a commercial design and is not open-source.
Next week, the Xilinx booth at the CAR-ELE JAPAN show at Tokyo Big Sight will hold a variety of ADAS (Advanced Driver Assistance Systems) demos based on Xilinx Zynq SoC and Zynq UltraScale+ MPSoC devices from several companies including:
The Zynq UltraScale+ MPSoC and original Zynq SoC offer a unique mix of ARM 32- and 64-bit processors with the heavy-duty processing you get from programmable logic, needed to process and manipulate video and to fuse data from a variety of sensors such as video and still cameras, radar, lidar, and sonar to create maps of the local environment.
If you are developing any sort of sensor-based electronic systems for future automotive products, you might want to come by the Xilinx booth (E35-38) to see what’s already been explored. We’re ready to help you get a jump on your design.
The video below shows Ravi Sunkavalli, the Xilinx Sr. Director of Data Center Solutions, discussing how advanced FPGAs like devices based on the Xilinx UltraScale architecture can aid you in developing high-speed networking and storage equipment as data centers migrate to faster internal networking speeds. Sunkavalli posits that CPUs, which are largely used for networking and storage applications connected with today’s 10G networks, quickly run out of gas at 40G and 100G networking speeds. FPGAs can provide “bump-in-the-wire” acceleration for high-speed networking ports thanks to the large number of fast compute elements and the high-speed transceivers incorporated into devices like the Xilinx UltraScale and UltraScale+ FPGAs.
Examples of networking applications already handled by FPGAs include VNF (Virtual Network Functions) such as VPNs, firewalls, and security. FPGAs are already being used to implement high-speed data center storage functions such as error correction, compression, and security.
The following 8-minute video was recorded during a Xilinx technology briefing at the recent SC16 conference in Salt Lake City:
All Internet-connected video devices produce data streams that are processed somewhere in the cloud, said Xilinx Chief Video Architect Johan Janssen during a talk at November’s SC16 conference in Salt Lake City. FPGAs are well suited to video acceleration and deliver better compute density than cloud servers based on microprocessors. One example Janssen gave during his talk shows a Xilinx Virtex UltraScale VU190 FPGA improving the video-stream encoding rate from 3 to 60fps while cutting power consumption by half when compared to the performance of a popular Intel Xeon microprocessor executing the same encoding task. In power-constrained data centers, that’s a 40x efficiency improvement with no increase in electrical or heat load. In other words, it costs a lot less operationally to use FPGA for video encoding in data centers.
Here’s the 7-minute video of Janssen’s talk at SC16:
Last November at SC16 in Salt Lake City, Xilinx Distinguished Engineer Ashish Sirasao gave a 10-minute talk on deploying deep-learning applications using FPGAs with significant performance/watt benefits. Sirasao started by noting that we’re already knee-deep in machine-learning applications: spam filters; cloud-based and embedded voice-to-text converters; and Amazon’s immensely successful, voice-operated Alexa are all examples of extremely successful machine-learning apps in broad use today. More—many more—will follow. These applications all have steep computing requirements.
There are two phases in any machine-learning application. The first is training and the second is deployment. Training is generally done using floating-point implementations so that application developers need not worry about numeric precision. Training is a 1-time event so energy efficiency isn’t all that critical.
Deployment is another matter however.
Putting a trained deep-learning application in a small appliance like Amazon’s Alexa calls for attention to factors such as energy efficiency. Fortunately, said Sirasao, the arithmetic precision of the application can change from training to mass deployment and there are significant energy-consumption gains to be had by deploying fixed-point machine-learning applications. According to Sirasao, you can get accurate machine inference using 8- or 16-bit fixed-point implementations while realizing a 10x gain in energy efficiency for the computing hardware and a 4x gain in memory energy efficiency.
The Xilinx DSP48E2 block implemented in the company’s UltraScale and UltraScale+ devices is especially useful for these machine-learning deployments because its DSP architecture can perform two independent 8-bit operations per clock per DSP block. That translates into nearly double the compute performance, which in turn results in much better energy efficiency. There’s a Xilinx White Paper on this topic titled “Deep Learning with INT8 Optimization on Xilinx Devices.”
Further, Xilinx recently announced its Acceleration Stack for machine-learning (and other cloud-based applications), which allows you to focus on developing your application rather than FPGA programming. You can learn about the Xilinx Acceleration Stack here
Finally, here’s the 10-minute video with Sirasao’s SC16 talk:
Nextera Video is helping the broadcast video industry migrate to video-over-IP as quickly as possible with an FPGA IP core developed for Xilinx UltraScale and other Xilinx FPGAs that compresses 4K video using Sony’s low-latency, noise-free NMI (Network Media Interface) packet protocols to achieve compression ratios of 3:1 to 14:1. The company’s products can transport compressed 4Kp60 video between all sorts of broadcast equipment over standard 10G IP switches, which significantly lowers equipment and operating costs for broadcasters.
Here’s a quick video that describes Nextera’s approach:
S2C wants you to get into system prototyping with the super-capable Xilinx Kintex UltraScale FPGA fast, so it’s running a short-term, limited-time, limited-quantity promo cutting the price of a proto package in half. You get a bundle including the company’s Single KU115 Prodigy Logic Module, the 8Gbyte Prodigy DDR4 Memory Module, and the Prodigy GPIO Extension Module for
Prodigy Kintex UltraScale Proto Package with DDR4, GPIO extension modules
The Kintex UltraScale KU115 FPGA is a DSP monster with 5520 DSP48E2 DSP slices, 1.451 million system logic cells, 75.9Mbits of BRAM, and 52 16.3Gbps GTH serial transceiver ports (48 of which are brought out to connectors on the S2C Prodigy Logic Module), and 832 I/O pins (656 of which are brought out to connectors on the S2C Prodigy Logic Module).
Want that S2C deal? (Of course you do!) Click here.
Better do it fast though, before S2C changes its mind.
You can now download the latest version of the Vivado Design Suite HLx Editions, release 2016.4, which adds support for multiple Xilinx UltraScale+ devices including the Virtex UltraScale+ XCVU11P and XCVU13P FPGAs and board support packages for the Zynq UltraScale+ MPSoC ZCU102-ES2 and Virtex UltraScale+ VCU118-ES1 boards.
Download the latest version here.
VadaTech’s new AMC596 “FPGA carrier” answers the question, “How much processing power can you pack into the MicroTCA AMC form factor?” The answer is: a lot.
VadaTech MicroTCA AMC596 FPGA Carrier
The AMC596 combines the processing horsepower of a Xilinx Virtex UltraScale VU440 FPGA with a QorIQ P2040 quad-core communications processor (that’s a team of four PowerPC e500mc processor cores, each running at 1.2GHz). All of this processing power plus 8Gbytes of 64-bit DDR4 SDRAM for the Virtex UltraScale FPGA and 1Gbyte of DDR3 SDRAM dedicated to the QorIQ processor fits on a board measuring a mere 73.5x180.6mm.
Here’s a block diagram of VadaTech’s AMC596 “FPGA carrier”:
The Virtex UltraScale VU440 FPGA on the VadaTech AMC596 is the largest 20nm Xilinx Virtex UltraScale device and brings 4.433M logic cells, 2880 DSP48E2 slices, and 88.6Mbits of Block RAM to the party. You can build a lot of things with that many on-chip resources. VadaTech’s Web page for the AMC596 suggests that it’s ideal for ASIC prototyping or emulation, but of course there are a lot of interesting things you can do with this much processing power in a small form factor.
Powerful things. Fast things.