By Adam Taylor
So far on this journey (which is only just beginning) of looking at the Zynq UltraScale+ MPSoC, we have explored mostly the A53 processors within the Application Processing Unit (APU). However, we must not overlook the Real-Time Processing Unit (RPU), which contains two ARM Cortex-R5 32-bit RISC processors and operates within the low-power domain of the Zynq MPSoC’s PS (processing system).
R5 RPU Architecture
The RPU executes real-time processing applications, including safety-critical applications. As such, you can use it for applications that must comply with IEC61508 or ISO 26262. We will be looking at this capability in more detail in a future blog. To support this, the RPU can operate in two distinct modes:
Of course, it is the lock-step mode which is used when a safety application is being implemented (see chapter 8 of the TRM for the full safety and security capabilities). To provide deterministic processing times, both ARM Cortex-R5 cores include 128KB of Tightly Coupled Memory (TCM) in addition to the caches and OCM (on-chip memory). How the TCMs are used depends upon the operating mode. In split mode, each processor has 128Kbytes of TCM (divided into A and B TCMs). In lock-step mode, there is one 256Kbyte TCM.
RPU in Lock Step Mode
At reset, the default setting configures the RPU to operate in lock-step mode. However, we can change between the operating modes while the processor group is in reset. We do this by updating the RPU Global Control Register’s SLCLAMP bit, which clamps the outputs of the redundant processors, and its SLSPLIT bit, which sets the operating mode. We cannot change the RPU’s operating mode during operation, so we need to decide upfront, during the architectural phase, which mode we desire for a given application.
However, we do not have to worry about setting these bits manually when we use the debugger or generate a boot image; both can configure the operating mode for us. In the rest of this blog, I want to look at how we configure the RPU operating mode both in our debug applications and during boot-image generation.
The first way that we verify many of our designs is to use the System Debugger within SDK, which allows us to connect over JTAG or Ethernet and download our application. Using this method, we can of course use breakpoints and step through the code as it operates, to get to the bottom of any issues in the design. Within the debug configuration tab, we can also enable the RPU to operate in split mode if that’s the mode we want after system reset.
Debug Configuration to enable RPU Split Mode
When you download the code and run it on the Zynq MPSoC’s RPU, you will be able to see the operating mode within the debug window. This should match with your debug configuration setting.
Debug Window showing Lock-Step Mode
Once we are happy with the application, we will want to create a boot image, and we need to determine the RPU operating mode when we create that boot image. We can add the RPU ELF to the FSBL, FPGA, and APU files using the boot-image dialog. To select the RPU mode, we choose the edit option and then select the destination CPU: either both ARM Cortex-R5 cores in lock-step or, if we are using split mode, the ARM Cortex-R5 core we wish the application to run on.
Selecting the R5 Mode of operation when generating a boot image
Of course, if we want to be sure we are in the correct mode, we can read the RPU Global Control register and confirm that the expected mode is selected.
Now that we understand the different operating modes of the Zynq UltraScale+ MPSoC’s RPU, we can come back to these modes when we look at the security and safety capabilities provided by the Zynq MPSoC.
Code is available on Github as always.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
By: Gaurav Singh
CCIX was just announced last year and already things are getting interesting.
The approach of CCIX as an acceleration interconnect is to work within the existing volume server infrastructure while delivering improvements in performance and cost.
We’ve reached a major milestone. CCIX members Xilinx and Amphenol FCI have recently revealed the first public CCIX technology demo and what it means for the future of data center system design is exciting to consider.
In the demo video below, you’ll see the transferring of a data pattern at 25 Gbps between two Xilinx FPGAs, across a channel comprised of an Amphenol/FCI PCI Express CEM connector and a trace card. The two devices contain Xilinx Transceivers electrically compliant with CCIX. By using the PCI Express infrastructure found in every data center today, we can achieve this 25 Gig performance milestone. The total insertion loss in the demo is greater than 35dB, die pad to die pad, which allows flexibility in system design. We’re seeing excellent margin, and a BER of less than 1E-12.
At 25 Gig, this is the fastest data transfer between accelerators over PCI Express connections ever achieved. It’s three times faster than the top transfer speed of PCI Express Gen3 solutions available today. The application benefits of communicating three times faster between accelerators is significant in data centers, and CCIX is designed to excel in multi-accelerator configurations.
CCIX will enable seamless system integration between processors such as X86, POWER and ARM and all accelerator types, including FPGAs, GPUs, network accelerators and storage adaptors. Even custom ASICs can be incorporated into a CCIX topology. And CCIX gives system designers the flexibility to choose the right combination of heterogeneous components from many different vendors to deliver optimized configurations for the data center.
We’re looking forward to the first products with CCIX sampling later this year.
This week at its annual NI Week conference in Austin, Texas, National Instruments (NI) announced a new FlexRIO PXIe module, the PXIe-7915, offered with a choice of three Xilinx Kintex UltraScale FPGAs. NI’s PXIe FlexRIO modules serve two purposes in NI-based systems: flexible, programmable, high-speed I/O and high-speed computation (usually DSP). NI’s customers access these FlexRIO resources using the company’s LabVIEW FPGA software, part of NI’s LabVIEW graphical development environment. Thanks to the Kintex UltraScale FPGA, the new FlexRIO PXIe-7915 module contains significantly more programmable resources and delivers significantly more performance than previous FlexRIO modules, which are all based on earlier generations of Xilinx FPGAs. The set of graphs below shows the increased resources and performance delivered by the PXIe-7915 FlexRIO module in NI systems relative to previous-generation FlexRIO modules based on Xilinx Kintex-7 FPGAs:
However, the UltraScale-based FlexRIO modules are not simply standalone products. They serve as design platforms for NI’s design engineers, who will use these platforms to develop many new, high-performance instruments. In fact, NI introduced the first two of these new instruments at NI Week 2017: the PXIe-5763 and PXIe-5764 high-speed, quad-channel, 16-bit digitizers. Here are the specs for these two new digitizers from NI:
Previous digitizers in this product family employed parallel LVDS signals to couple high-speed ADCs to an FPGA. However, today’s fastest ADCs employ high-speed serial interfaces, particularly the JESD204B interface specification, necessitating a new design. The resulting new design uses the FlexRIO PXIe-7915 card as a motherboard and the JESD204B ADC card as a mezzanine board, as shown in this photo:
NI’s design engineers took advantage of the pin compatibility among various Kintex UltraScale FPGAs to maximize the flexibility of their design. They can populate the FlexRIO PXIe-7915 card with a Kintex UltraScale KU035, KU040, or KU060 FPGA depending on customer requirements. This flexibility allows them to create multiple products using one board layout—a hallmark of a superior, modular platform design.
Normally, you access the programmable-logic features of a Xilinx FPGA or Zynq SoC inside of an NI product using LabVIEW FPGA, and that’s certainly still true. However, NI has added something extra in its LabVIEW 2017 release: a Xilinx Vivado Project Export feature that provides direct access to the Xilinx Vivado Design Suite tools for hardware engineers experienced with writing HDL code for programmable logic. Here’s how it works:
You can export all the necessary hardware files from LabVIEW 2017 to a Vivado project that is pre-configured for your specific deployment target. Any LabVIEW signal-processing IP in the LabVIEW design is included in the export as encrypted IP cores. As an added bonus, you can use the new Xilinx Vivado Project Export on all of NI’s FlexRIO and high-speed-serial devices based on Xilinx Kintex-7 or newer FPGAs.
NI has published a White Paper describing all of this. You’ll find it here.
Please contact National Instruments directly for more information about the new FlexRIO modules and LabVIEW 2017.
TI has a new design example for a 2-device power converter to supply multiple voltage rails to a Xilinx Zynq UltraScale+ MPSoC for Remote Radio Heads and wireless backhaul applications, but the design looks usable across the board for many applications of the Zynq MPSoC. The two TI power-control and -conversion devices in this reference design are the TPS6508640 configurable, multi-rail PMIC for multicore processors and the TPS544C25 high-current, single-channel dc-dc converter. Here’s a simplified diagram of the design:
Please contact TI for more information about these power-control and –conversion devices.
By Adam Taylor
We have looked at SDSoC several times throughout this series. However, I recently organized and presented at the NMI FPGA Machine Vision event, and during the coffee breaks and lunch, attendees showed considerable interest in SDSoC, not only for its use in the Xilinx reVISION acceleration stack but also for its use in a range of other developments. As such, I thought it would be worth some time looking at what SDSoC is and the benefits we have previously gained using it. I also want to discuss a new use case.
SDSoC Development Environment
SDSoC is an Eclipse-based, system-optimizing compiler that allows us to develop our Zynq SoC or Zynq UltraScale+ MPSoC design in its entirety using C or C++. We can then profile the application to find aspects that cause performance bottlenecks and move them into the Zynq device’s Programmable Logic (PL). SDSoC does this using HLS (High-Level Synthesis) and a connectivity framework that’s transparent to the user. What this means is that we are able to develop at a higher level of abstraction and hence reduce the time to market of the product or demonstration.
To do this, SDSoC needs a hardware platform, which can be pre-defined or custom. Typically, these platforms within the PL provide the basics: I/O interfaces and DMA transfers to and from the Zynq device’s PS (Processing System) DDR SDRAM. This frees up most of the PL resources and PL/PS interconnects to be used by SDSoC when it accelerates functions.
This ability to develop at a higher level and accelerate performance by moving functions into the PL enables us to produce very flexible and responsive systems. This blog has previously looked at acceleration examples including AES encryption, matrix multiplication, and FIR Filters. The reduction in execution time has been significant in these cases. Here’s a table of these previously discussed examples:
Previous Acceleration Results with SDSoC. Blogs can be found here
To aid us in the optimization of the final application, we can use pragmas to control the HLS optimizations. We can use SDSoC’s tracing and profiling capabilities while optimizing these accelerated functions and the interaction between the PS and PL.
Here’s an example of a trace:
Results of tracing an example application
(Orange = Software, Green = Accelerated function and Blue = Transfer)
Let us take a look at a simple use case to demonstrate SDSoC’s abilities.
Frequency Modulated Continuous Wave (FMCW) RADAR is used for a number of applications that require the ability to detect objects and gauge their distance. FMCW applications make heavy use of FFT and other signal-processing techniques such as windowing, Constant False Alarm Rate (CFAR), and target velocity and range extraction. These algorithms and models are ideal for description using a high-level language such as C / C++. SDSoC can accelerate the execution of functions described this way and such an approach allows you to quickly demonstrate the application.
It is possible to create a simple FMCW receive demo using a ZedBoard and an AD9467 FPGA Mezzanine Card (FMC). At the simplest level, the hardware element of the SDSoC platform needs to be able to transfer samples received from the ADC into the PS memory space and then transfer display data from the PS memory space to the display, which in most cases will be connected with DVI or HDMI interfaces.
Example SDSoC Platform for FMCW application
This platform permits development of the application within SDSoC at a higher level. It also provides a platform that we can use for several different applications, not just FMCW. Rather helpfully, the AD9467 FMC comes with a reference design that can serve as the hardware element of the SDSoC Platform. It also provides drivers, which can be used as part of the software element.
With a platform in hand, it is possible to write the application within SDSoC using C or C++, where we can make use of the acceleration libraries and stacks, including matrix multiplication and math functions, along with the ability to wrap bespoke HDL IP cores and use them within the development.
Developing in this manner provides a much faster development process and a more responsive solution, as it leverages the Zynq PL for inherently parallel or pipelined functions. It also makes it easier to upgrade designs in the future. Because the majority of development uses C or C++ and because SDSoC is a system-optimizing compiler, the application developer does not need to be an HDL specialist.
Code is available on Github as always.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
Enea just announced that it has added a BSP (board support package) for the Zynq UltraScale+ MPSoC and ZCU102 Eval Kit to its POSIX-compliant, multicore OSE operating system. OSE offers embedded developers extremely low latency, low jitter, and minimal processing overhead to deliver bare-metal performance that extracts maximum performance from heterogeneous processors like the Zynq UltraScale+ MPSoC. According to Enea, OSE supports both SMP (symmetric multiprocessing) and AMP (asymmetric multiprocessing) and delivers linear performance scalability for MPSoCs with as many as 24 cores, so it should be able to easily handle the four or more 64- and 32-bit ARM Cortex-A53 and –R5 processor cores in the various Zynq UltraScale+ MPSoC family members.
Xilinx ZCU102 Eval Kit for the Zynq UltraScale+ MPSoC
Enea’s carrier-grade OSE has long been used in the telecom industry and is incorporated into more than half of the world's radio base stations. In addition, OSE is used in automotive, medical, and avionics designs.
You’ve probably heard that “time equals money.” That’s especially true with high-frequency trading (HFT), which seeks high profits based on super-short portfolio holding periods driven by quant (quantitative) modeling. Microseconds make the difference in the HFT arena. As a result, a lot of high-frequency trading companies use FPGA-based hardware to make decisions and place trades and a lot of those companies use Xilinx FPGAs. No doubt that’s why Aldec is showing its HES-HPC-DSP-KU115 FPGA accelerator board at the Trading Show 2017 being held in Chicago, starting today.
Aldec HES-HPC-DSP-KU115 FPGA accelerator board
This board is based on two Xilinx All Programmable devices: the Kintex UltraScale KU115 FPGA and the Zynq Z-7100 SoC (the largest member of the Zynq SoC family). This board has been optimized for High Performance Computing (HPC) applications and prototyping of DSP algorithms thanks to the Kintex UltraScale KU115 FPGA’s 5520 DSP blocks. This board partners the Kintex UltraScale FPGA with six simultaneously accessible external memories—two DDR4 SODIMMs and four low-latency RLDRAMs—providing immense FPGA-to-memory bandwidth.
The Zynq Z-7100 SoC can operate as an embedded Linux host CPU, and it can implement a PCIe host interface and multiple Gigabit Ethernet ports.
In addition, the Aldec HES-HPC-DSP-KU115 FPGA accelerator board has two QSFP+ optical-module sockets for 40Gbps network connections.
Amazon Web Services (AWS) has just posted a 35-minute deep-dive video discussing the Amazon EC2 F1 Instance, a programmable cloud accelerator based on Xilinx Virtex UltraScale+ FPGAs. (See “AWS makes Amazon EC2 F1 instance hardware acceleration based on Xilinx Virtex UltraScale+ FPGAs generally available.”) This fresh, new video talks about the development process and the AWS SDK.
Rather than have me filter this interesting video, here it is:
Today, IBM and Xilinx announced PCIe Gen4 16Gtransfers/sec/lane interoperation between an IBM Power9 processor and a Xilinx UltraScale+ All Programmable device. (FYI: That’s double the performance of a PCIe Gen3 connection.) IBM expects this sort of interface to be particularly important in the data center for high-speed, processor-to-accelerator communications, but the history of PCIe evolution clearly suggests that PCIe Gen4 is destined for wide industry adoption across many markets—just like PCIe generations 1 through 3. The thirst for bit rate exists everywhere, in every high-performance design.
All Xilinx Virtex UltraScale+ FPGAs, many Zynq UltraScale+ MPSoCs, and some Kintex UltraScale+ FPGAs incorporate one or more PCIe Gen3/4 hardened, integrated blocks, which can operate as PCIe Gen4 x8 or Gen3 x16 Endpoints or Roots. In addition, all UltraScale+ MGT transceivers (except the PS-GTR transceivers in Zynq UltraScale+ MPSoCs) support the data rates required for PCIe Gen3 and Gen4 interfaces. (See “DS890: UltraScale Architecture and Product Data Sheet: Overview” and “WP458: Leveraging UltraScale Architecture Transceivers for High-Speed Serial I/O Connectivity” for more information.)
With at least four, six, or more hardened, embedded programmable microprocessor cores in Xilinx Zynq UltraScale+ MPSoCs and a nearly unlimited number of soft MicroBlaze processor cores possible in the devices’ programmable logic, you need to start thinking pretty hard about how you’re going to harness all of that software programmability. Hardent would like to help you so it’s offering a free Webinar on June 20 titled “Leveraging The OpenAMP Framework for Heterogeneous Software Architecture.”
The OpenAMP framework for the Zynq UltraScale+ MPSoC virtualizes the Zynq MPSoC processors and makes that consolidated computing power available to software developers in a more familiar form. Hardent’s webinar will discuss the OpenAMP framework and will then outline how designers can leverage the framework to run different operating systems—such as Linux and an RTOS—concurrently, using different processors within the same MPSoC.
In this webinar, you will:
By Adam Taylor
I have previously discussed the Zynq UltraScale+ MPSoC’s interrupt architecture, so this blog will show you how to use these interrupts in a simple example. To do this we are going to use the push button and the LED on the Avnet UltraZed Starter Kit. These peripherals are connected to the Zynq MPSoC’s PS MIO. We will configure the system so that pressing the button generates an interrupt, causing the Zynq MPSoC to toggle the LED on and off.
We are using the UltraZed SoM on the UltraZed IOCC (I/O Carrier Card), so the push button is connected to MIO 26 while the LED is connected to MIO 31. Within Vivado, you can see what each MIO is used for and, if necessary, configure it on the IO configuration tab of the MPSoC Customization wizard. Both of the MIO signals used in this example are on MIO bank 1 and, because we used Vivado’s board automation when we instantiated the MPSoC in our block diagram, the MIO and PS are already configured correctly for both the SoM and the IOCC.
MIO configuration on the MPSoC PS
Because we are using the MIO for this example, we can use the existing Vivado MPSoC design that we’ve been using to date. The main work to get this example up and running will be within SDK, where we need to do the following:
Example running on the MPSoC
To implement this example and write the elements identified above, we will need to use functions contained within the Xilinx PS GPIO, PS Generic Interrupt Controller, and Exception drivers. These were created when we established our BSP.
I have uploaded the source code and the bin file to the GitHub repository. If you are using a board other than the UltraZed IOCC, you can modify this example very simply: change the input and output pin and bank mapping to match the MIO allocations on your board, assuming there is a switch and an LED connected to the MIO.
GPIO Bank and MIO Pin numbers to be updated in the source code for different boards
Once you have updated the source code example for your board, all you need to do is rebuild the project and run it on your hardware.
Code is available on Github as always.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
The new PALTEK DS-VU 3 P-PCIE Data Brick places a Xilinx Virtex UltraScale+ VU3P FPGA along with 8Gbytes of DDR4-2400 SDRAM, two VITA 57.1 FMC connectors, and four Samtec FireFly Micro Flyover ports on one high-bandwidth PCIe Gen3 card with an x16 host connector. The card aims to provide FPGA-based hardware acceleration for applications including 2K/4K video processing, machine learning, big-data analysis, financial analysis, and high-performance computing.
PALTEK Data Brick packs Virtex UltraScale+ VU3P FPGA onto a PCIe card
The Samtec Micro Flyover ports accept both ECUE copper twinax and ECUO optical cables. The ECUE twinax cables are for short-reach applications and have a throughput of 28Gbps per channel. The ECUO optical cables operate at a maximum data rate of 14Gbps per channel and are available with as many as 12 simplex or duplex channels (with 28Gbps optical channels in development at Samtec).
For broadcast video applications, PALTEK also offers companion 12G-SDI Rx and 12G-SDI-Tx cards that can break out eight 12G-SDI video channels from one FireFly connection.
Please contact PALTEK directly for more information about these products.
For more information about the Samtec FireFly system, see:
On May 16, David Pellerin, Business Development Manager at AWS (Amazon Web Services) will be presenting two 1-hour Webinars with a deep dive into Amazon’s EC2 F1 Instance. (The two times are to accommodate different time zones worldwide.) The Amazon EC2 F1 Instance allows you to create custom hardware accelerators for your application using cloud-based server hardware that incorporates multiple Xilinx Virtex UltraScale+ VU9P FPGAs. Each Amazon EC2 F1 Instance can include as many as eight FPGAs, so you can develop extremely large and capable, custom compute engines with this technology. Applications in diverse fields such as genomics research, financial analysis, video processing, security/cryptography, and machine learning are already using the FPGA-accelerated EC2 F1 Instance to improve application performance by as much as 30x over general-purpose CPUs.
Register for Amazon’s Webinar here.
“Xilinx All Programmable FPGAs and SoCs are playing a pivotal role in building 5G systems that can be easily and rapidly updated and enhanced to align with emerging standards and opportunities. The majority of the industry’s 5G proof of concepts, test beds and early commercialization trials for eMBB, URLLC, and mMTC use cases are leveraging Xilinx technology,” because “merchant silicon does not exist and ASICs are not viable this early in the 5G standardization phase. … The first wave of commercial 5G system deployments are likely to rely on these prototypes.”
That’s the premise of a new blog written by Xilinx’s Director Communications Strategic & Technical Marketing Harpinder Matharu and posted on the knect365.com Web site. Follow the link to read Matharu’s full blog post.
For more 5G coverage in Xcell Daily, see:
Face it, you use PCIe to go fast. That’s the whole point of the interface. So when you move data over your PCIe buses, you likely want to go as fast as possible. Perhaps you’d like some tips on getting maximum PCIe performance when designing with Xilinx’s most advanced FPGAs. You’re in luck: there’s a new 13-minute video that discusses that topic.
The video covers these contributors to PCIe performance:
The video explores a PCIe design for the KCU105 Kintex UltraScale FPGA Evaluation Kit using the Vivado Design Suite’s graphical IP Integrator (IPI) tool. The design took only about 20 to 30 minutes to create using IPI.
The video then discusses the results of various performance experiments using this design. Results like this:
Here’s the video:
We’ve had KVM (keyboard, video, mouse) switches for controlling multiple computers from one set of user-interface devices for a long, long time. Go back far enough, and you were switching RS-232 ports to control multiple computers or other devices with one serial terminal. Here’s what they looked like back in the day:
In those days, these KVM switches could be entirely mechanical. Now, they can’t. There are different video resolutions, different coding and compression standards, there’s video over IP (Ethernet), etc. Today’s KVM switch is also a many-to-many converter. Your vintage rotary switch isn’t going to cut it for today’s Pro AV and Broadcast applications.
If you need to meet this kind of design challenge—today—you need low-latency video codecs like H.265/HEVC and compression standards such as TICO; you need 4K and 8K video resolution with conversion to and from HD; and you need compatibility and interoperability with all sorts of connectivity standards including 3G/12G-SDI and high-speed Ethernet. In short, you need “Any Media Over Any Network” capability and you need all of that without exploding your BOM cost.
Where are you going to get it?
Well, considering that this is the Xilinx Xcell Daily blog, it’s a good bet that you’re going to hear about the capabilities of at least one Xilinx All Programmable device.
Actually, this blog is about a couple of upcoming Webinars being held on May 23 titled “Any Media Over Any Network: KVM Extenders, Switches and KVM-over-IP.” The Webinars are identical but are being held at two different times to accommodate worldwide time zones. In this webinar, Xilinx will show you how you can use the Zynq UltraScale+ MPSoC in KVM applications. The webinar will highlight how Xilinx and its partners’ video-processing and -connectivity IP cores along with the integrated H.265/HEVC codec in the three Zynq UltraScale+ MPSoC EV family members can quickly and easily address new opportunities in the KVM market.
Can we talk? About security? You know that it’s a dangerous world out there. For a variety of reasons, bad actors want to steal your data, or steal your customers’ data, or disrupt operations. Your job is not only to design something that works; these days, you also need to design equipment that resists hacking and tampering. PFP Cybersecurity provides IP that helps you create systems that have robust defenses against such exploits.
“PFP” stands for “power fingerprinting,” which combines AI and analog power analysis to create high-speed, next-generation cyber protection that can detect tampering in milliseconds instead of days, weeks, or months. It does this by observing the tiny changes to a system’s power consumption during normal operation, learning what’s normal, and then monitoring power consumption to detect an abnormal situation that might signal tampering.
The 3-minute video below discusses these aspects of PFP Cybersecurity’s IP and also discusses why the Xilinx Zynq SoC and Zynq UltraScale+ MPSoC are a perfect fit for this security IP. The Zynq device families can all perform high-speed signal processing, have built-in analog conversion circuitry for measuring voltage and current, and can implement high-performance machine-learning algorithms for analyzing power usage.
Originally, PFP Cybersecurity designed a monitoring system based on the Zynq SoC for monitoring other systems but, as the video discusses, if the system is already based on a Zynq device, it can monitor itself and return itself to a known good state if tampering is suspected.
Here’s the video:
Note: For more information about PFP Cybersecurity, see “Zynq-based PFP eMonitor brings power-based security monitoring to embedded systems.”
By Adam Taylor
We need to be able to create more advanced, event-driven applications for Xilinx Zynq UltraScale+ MPSoCs but before that can happen, we need to look at some of the more complex aspects of these devices. In particular, we need to understand how interrupts function within the Zynq MPSoC’s PS (processing system). As would be expected, the Zynq MPSoC’s interrupt structure is slightly more complicated than the Zynq SoC’s PS because the Zynq MPSoC has more processor cores.
Architectural view of the Interrupt System
The Zynq UltraScale+ MPSoC’s interrupt architecture has four main elements:
At the highest level, we can break these interrupts down into several groupings, which are supplied to each element of the architecture:
Shared Peripheral Interrupts can also be sourced by the PL, which is where it gets interesting. We can enable interrupts in either direction between the PS and PL, from within the PS-PL configuration tab of the Zynq MPSoC customization GUI.
For the RPU, we are provided an IRQ and an FIQ for each processor core. For fast, low-latency responses, we should use the FIQ. For typical interrupt sources, we should use the IRQ.
RPU IRQ and FIQ Interrupts Enabled for each Core on the MPSoC
When it comes to the APU, we have two options for connecting the interrupts. The first option is to use the legacy IRQ and FIQ interrupts. There’s one of each for each processor core within the APU. When enabled at the top level of the Zynq MPSoC’s IP Block, we get two 4-bit ports—one for the IRQ and one for the FIQ. Again, the FIQ input provides lowest-latency interrupt response.
APU IRQ and FIQ Interrupts Enabled for each Core on the MPSoC
The second approach to interrupts is to use interrupt groups. The APU’s GICv2 supports two interrupt groups: zero and one. You can assign interrupts within group zero to the IRQ or FIQ, while those within group one can only be assigned to the IRQ. This assignment occurs internally within the GICv2. We can also use these interrupt groups when we implement secure environments, with group zero being used for secure interrupts and group one for non-secure interrupts.
APU IRQ Groups Enabled for each Core on the MPSoC
We can use the Inter-Processor Interrupt (IPI) mechanism to enable the processors within the APU, RPU, and PMU to interrupt each other. The IPI also has the ability to interrupt one or more soft-core processors implemented within the Zynq MPSoC’s PL.
IPI Interrupt Numbers in the SPI
In addition to the interrupt itself, the IPI also provides a 32-byte payload buffer for each direction, which can be used for limited communication. The IPI provides eight masters: the APU, RPU0, RPU1, and PMU, along with the LPD and FPD S AXI Interfaces. These masters can be changed from the default allocation by selecting the advanced option on the MPSoC Re-Customisation GUI.
The final element of the interrupt structure is the Proxy GIC, which collates the shared interrupts connected to the RPU’s GIC and provides a series of Interrupt Status Registers used by the PMU.
Now that we understand Zynq UltraScale+ MPSoC interrupts a little more, we will look at how we can use these within our designs going forward.
Code is available on Github as always.
In this 40-minute webinar, Xilinx will present a new approach that allows you to unleash the power of the FPGA fabric in Zynq SoCs and Zynq UltraScale+ MPSoCs using hardware-tuned OpenCV libraries, with a familiar C/C++ development environment and readily available hardware development platforms. OpenCV libraries are widely used for algorithm prototyping by many leading technology companies and computer vision researchers. FPGAs can achieve unparalleled compute efficiency on complex algorithms like dense optical flow and stereo vision in only a few watts of power.
This Webinar is being held on July 12. Register here.
Here’s a fairly new, 4-minute video showing a 1080p60 Dense Optical Flow demo, developed with the Xilinx SDSoC Development Environment in C/C++ using OpenCV libraries:
For related information, see Application Note XAPP1167, “Accelerating OpenCV Applications with Zynq-7000 All Programmable SoC using Vivado HLS Video Libraries.”
The PCI-SIG Compliance Workshop #101, held earlier this month in Milpitas, CA and dedicated to testing PCIe compliance, was the first interoperability test event for the preliminary PCIe 4.0 spec. The preliminary 4.0 testing included CEM electrical testing and Link/Transaction testing at 16 Gtransfers/sec. PLDA went to this workshop with its Gen4SWITCH PCIe 4.0 Board, which is based on the company's PCIe-compliant XpressSWITCH IP and XpressRICH4 controller IP for PCIe 4.0 technology. This configuration supports PCIe 4.0 V0.7. PLDA took a board based on the Xilinx Virtex UltraScale+ VU3P FPGA to the workshop.
PLDA XpressRICH4 IP for AXI Block Diagram
With the PLDA Gen4SWITCH configured in x4 and x1 configurations, the board successfully interoperated in the following systems at the PCI-SIG workshop:
When Xcell Daily last looked at PLDA's Gen4SWITCH PCIe 4.0 Platform Development Kit nearly a year ago (see "PLDA shows working PCIe 4.0 Platform Development Kit operating @ 16G transfers/sec at today's PCI-SIG Developer's Conference"), it was based on a Xilinx Virtex UltraScale VU065 FPGA. It now appears that PLDA may have been able to upgrade this board by taking advantage of the footprint compatibility between the Virtex UltraScale VU065 FPGA and the Xilinx Virtex UltraScale+ VU3P FPGA. Here's a table from the Xilinx UltraScale FPGA Product Selection Guide showing you how the various members of the UltraScale and UltraScale+ FPGA families line up with respect to footprint compatibility:
At the relatively low image resolution permitted by the Xcell Daily layout, you can just make out from the table that the Virtex UltraScale VU065 FPGA and the Xilinx Virtex UltraScale+ VU3P FPGA in the C1517 package have compatible footprints. It actually takes a fair amount of careful engineering to achieve this level of physical compatibility across four different FPGA families (Kintex UltraScale, Virtex UltraScale, Kintex UltraScale+, and Virtex UltraScale+) and multiple devices within these families.
Like any semiconductor device, designing with Xilinx All Programmable devices means dealing with their power-supply requirements—and like any FPGA or SoC, Xilinx devices do have their fair share of requirements in the power-supply department. They require several supply voltages, more or less depending on your I/O requirements, and they need to have these voltages ramped up and down in a certain sequence and with specific ramp rates if they’re to operate properly. On top of that, power-supply designs are board-specific—different for every unique pcb. Dealing with all of these supply specs is a challenging engineering problem, just due to the number of requirements, so you might like some help tackling it.
Here’s some help.
Infineon demonstrated a reference power supply design for Xilinx Zynq UltraScale+ MPSoCs based on its IRPS5401 Flexible Power Management Unit at APEC (the Applied Power Electronics Conference) last month. The reference design employs two IRPS5401 devices to manage and supply ten different power supplies. Here’s a block diagram of the reference design:
This design is used on the Avnet UltraZed SOM, so you know that it’s already proven. (For more information about the Avnet UltraZed SOM, see “Avnet UltraZed-EG SOM based on 16nm Zynq UltraScale+ MPSoC: $599” and “Look! Up in the sky! Is it a board? Is it a kit? It’s… UltraZed! The Zynq UltraScale+ MPSoC Starter Kit from Avnet.”)
Now the UltraZed SOM measures only 2x3.5 inches (50.8x76.2mm) and the power supply consumes only a small fraction of the space on the SOM, so you know that the Infineon power supply design must be compact.
It needs to be.
Here’s a photo of the UltraZed SOM with the power supply section outlined in yellow:
Infineon Power Supply Design on the Avnet UltraZed SOM (outlined in yellow)
Even though this power supply design is clearly quite compact, the high integration level inside of Infineon’s IRPS5401 Flexible Power Management Unit means that you don’t need additional components to handle the power-supply sequencing or ramp rates. The IRPS5401s handle that for you.
However, every Zynq UltraScale+ MPSoC pcb is different because every pcb presents different loads, capacitances, and inductances to the power supply. So you will need to tailor the sequencing and ramp times for each board design. Sounds like a major pain, right?
Well, Infineon felt your pain and offers an antidote. It's an Infineon software app called the PowIRCenter, designed to reduce the time needed to develop the complex supply-voltage sequencing and ramp times to perhaps 15 minutes' worth of work, which is apparently how long it took an Avnet design engineer to set the timings for the UltraZed SOM.
Here’s a 4-minute video where Infineon’s Senior Product Marketing Manager Tony Ochoa walks you through the highlights of this power supply design and the company’s PowIRCenter software:
Just remember, the Infineon IRPS5401 Flexible Power Management Unit isn’t dedicated to the Zynq UltraScale+ MPSoC. You can use it to design power supplies for the full Xilinx device range.
Note: For more information about the IRPS5401 Flexible Power Management Unit, please contact Infineon directly.
Today, intoPIX announced that its lightweight TICO video-compression IP cores for Xilinx FPGAs can now support frame resolutions and rates to 8K60p as well as the previously supported HD and 4K resolutions. Currently, the compression cores support 10-bit, 4:2:2 workflows, but intoPIX also disclosed in a published table (see below) that a future release of the IP core will support 4:4:4 color sampling. The TICO compression standard simplifies the management of live and broadcast video streams over existing video network infrastructures based on SDI and Ethernet by reducing the bandwidth requirements of high-definition and ultra-high-definition video at compression ratios as large as 5:1 (visually lossless at ratios up to 4:1). TICO compression supports live video streams through its low latency—less than 1msec end-to-end.
Conveniently, intoPIX has published a comprehensive chart showing its various TICO compression IP cores and the Xilinx FPGAs that can support them. Here’s the intoPIX chart:
Note that the most cost-effective Xilinx FPGAs, including the Spartan-6 and Artix-7 families, support TICO compression at HD and even some UHD/4K video formats, while the Kintex-7, Virtex-7, and UltraScale device families support all video formats through 8K.
Please contact intoPIX for more information about these IP cores.
Mentor Embedded is now supporting the Android OS (plus Linux and Nucleus) on Zynq UltraScale+ MPSoCs. You can learn more in a free Webinar titled "Android in Safety Critical Designs" that's being held on May 3 and 4. The Webinar will discuss how to use Android in safety-critical designs on the Xilinx Zynq UltraScale+ MPSoC. Register for the Webinar here.
As of today, Amazon Web Services (AWS) has made the FPGA-accelerated Amazon EC2 F1 compute instance generally available to all AWS customers. (See the new AWS video below and this Amazon blog post.) The Amazon EC2 F1 compute instance allows you to create custom hardware accelerators for your application using cloud-based server hardware that incorporates multiple Xilinx Virtex UltraScale+ VU9P FPGAs. Each Amazon EC2 F1 compute instance can include as many as eight FPGAs, so you can develop extremely large and capable, custom compute engines with this technology. According to the Amazon video, use of the FPGA-accelerated F1 instance can accelerate applications in diverse fields such as genomics research, financial analysis, video processing (in addition to security/cryptography and machine learning) by as much as 30x over general-purpose CPUs.
You get access through Amazon's FPGA Developer AMI (an Amazon Machine Image within the Amazon Elastic Compute Cloud (EC2)) and the AWS Hardware Developer Kit (HDK) on Github. Once your FPGA-accelerated design is complete, you can register it as an Amazon FPGA Image (AFI) and deploy it to your F1 instance in just a few clicks. You can reuse and deploy your AFIs as many times and across as many F1 instances as you like, and you can list them in the AWS Marketplace.
The Amazon EC2 F1 compute instance reduces the time and cost needed to develop secure, FPGA-accelerated applications in the cloud, and general availability now makes access quite easy.
Here’s the new AWS video with the general-availability announcement:
The Amazon blog post announcing general availability lists several companies already using the Amazon EC2 F1 instance including:
By Adam Taylor
Having introduced the Real-Time Clock (RTC) in the Xilinx Zynq UltraScale+ MPSoC, the next step is to write some simple software to set the time, get the time, and calibrate the RTC. Doing this is straightforward and aligns with how we use other peripherals in the Zynq MPSoC and Zynq-7000 SoC.
As with all Zynq peripherals, the first thing we need to do with the RTC is look up its configuration and then use that configuration to initialize the peripheral device. Once we have the RTC initialized, we can configure and use it. We can use the functions provided in the xrtcpsu.h header file to initialize and use the RTC. All we need to do is correctly set up the driver instance and include the xrtcpsu.h header file. If you want to examine the file's contents, you will find it within the generated BSP for the MPSoC. Within that BSP directory, you will also find all the other header files needed for your design. Which files are available depends upon how you configured the MPSoC in Vivado (e.g. what peripherals are present in the design).
We need to use a driver instance to use the RTC within our software application. For the RTC, that’s XRtcPsu, which defines the essential information such as the device configuration, oscillator frequency, and calibration values. This instance is used in all interactions with the RTC using the functions in the xrtcpsu.h header file.
As I explained last week, the RTC counts the number of seconds, so we will need to convert to and from values in units of seconds. The xrtcpsu.h header file contains several functions to support these conversions. To support this, we'll use a C structure to hold the real date prior to conversion and loading into the RTC, or to hold the resultant date following conversion from the seconds counter.
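To see what such a date-to-seconds conversion involves, here is a self-contained sketch. The xrtcpsu driver supplies its own date-time structure and conversion routines; the `rtc_datetime_t` type and function below are hypothetical stand-ins that assume the RTC's 2000-epoch and its supported year range of 2000 to 2099, within which every fourth year (including 2000) is a leap year.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's date-time structure. */
typedef struct {
    uint32_t year;   /* 2000..2099 */
    uint32_t month;  /* 1..12 */
    uint32_t day;    /* 1..31 */
    uint32_t hour, min, sec;
} rtc_datetime_t;

/* Days preceding each month in a non-leap year. */
static const uint32_t days_before[12] =
    {0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334};

/* Within 2000..2099 the simple divisible-by-4 rule is exact:
 * 2000 is a leap year and 2100 lies outside the supported range. */
static int is_leap(uint32_t year) { return (year % 4u) == 0u; }

/* Convert a calendar date to seconds since 2000-01-01 00:00:00,
 * the RTC's epoch. */
static uint32_t datetime_to_sec(const rtc_datetime_t *dt)
{
    uint32_t years = dt->year - 2000u;
    uint32_t leaps = (years + 3u) / 4u;   /* leap years before dt->year */
    uint32_t days  = years * 365u + leaps
                   + days_before[dt->month - 1u]
                   + ((is_leap(dt->year) && dt->month > 2u) ? 1u : 0u)
                   + (dt->day - 1u);
    return ((days * 24u + dt->hour) * 60u + dt->min) * 60u + dt->sec;
}
```

The reverse conversion (seconds back to a date) walks the same arithmetic in the other direction, peeling off years, then months, then days; in practice you would let the driver's own helpers do both conversions.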
We can use the following functions to set or read the RTC (which I did in the code example available here):
By convention, the functions used to set the RTC seconds counter are based on a time epoch starting 1/1/2000. If we are going to use internet time, which by a completely different convention is often based on a 1/1/1970 epoch, we will need to convert from one format to the other. The functions provided for the RTC only support years between 2000 and 2099.
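The conversion between the two epochs is just a fixed offset: 946,684,800 seconds separate 1970-01-01 and 2000-01-01. A minimal sketch (the function names are illustrative, not driver calls):

```c
#include <assert.h>
#include <stdint.h>

/* Seconds between the Unix epoch (1970-01-01 00:00:00 UTC) and the
 * RTC's epoch (2000-01-01 00:00:00): a fixed, well-known offset. */
#define RTC_EPOCH_OFFSET 946684800u

/* Convert a Unix timestamp to the RTC's 2000-based seconds count.
 * Only valid for times at or after the 2000 epoch. */
static uint32_t unix_to_rtc(uint32_t unix_sec)
{
    return unix_sec - RTC_EPOCH_OFFSET;
}

/* Convert an RTC seconds count back to a Unix timestamp. */
static uint32_t rtc_to_unix(uint32_t rtc_sec)
{
    return rtc_sec + RTC_EPOCH_OFFSET;
}
```

Note the asymmetry in range: a 32-bit Unix timestamp wraps in 2038, while the RTC's 2000-based counter runs until 2136, so code holding both values in 32 bits needs to be careful about which convention each variable uses.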
In the example code, we've used these functions to report the last set time before allowing the user to enter the time over a UART. Once the time has been set, the RTC is calibrated before being re-initialized. The RTC is then read once a second and the values are output over the UART, giving the image shown at the top of this blog. This output continues until the MPSoC is powered down.
To really exploit the capabilities provided by the RTC, we need to enable the interrupts. I will look at RTC interrupts in the Zynq MPSoC in the next issue of the MicroZed Chronicles, UltraZed Edition. Once we understand how interrupts work, we can look at the RTC alarms. I will also fit a battery to the UltraZed board to test its operation on battery power.
The register map with the RTC register details can be found here.
My code is available on Github as always.
Basic problem: When you're fighting aliens to save the galaxy wearing your VR headset, having a wired tether to pump the video to your headset is really going to crimp your style. Spinning around to blast that battle droid sneaking up on you from behind is just as likely to garrote you as save your neck. What to do? How will you successfully save the galaxy?
Well, NGCodec and Celeno Communications have a demo for you in the NGCodec booth (N2635SP-A) at NAB in the Las Vegas Convention Center next week. Combine NGCodec’s low-latency H.265/HEVC “RealityCodec” video coder/decoder IP with Celeno’s 5GHz 802.11ac WiFi connection and you have a high-definition (2160x1200), high-frame-rate (90 frames/sec) wireless video connection over a 15Mbps wireless channel. This demo uses a 250:1 video compression setting to fit the video into the 15Mbps channel.
In the demo, a RealityCodec hardware instance in a Xilinx Virtex UltraScale+ VU9P FPGA on a PCIe board plugged into a PC running Windows 10 compresses generated video in real time. The PC sends the compressed, 15Mbps video stream to a Celeno 802.11ac WiFi radio, which transmits the video over a standard 5GHz 802.11ac WiFi connection. Another Celeno WiFi radio receives the compressed video stream and sends it to a second RealityCodec for decoding. The decoder hardware is instantiated in a relatively small Xilinx Kintex-7 325T FPGA. The decoded video stream feeding the VR goggles requires 6Gbps of bandwidth, which is why you want to compress it for RF transmission.
Of course, if you’re going to polish off the aliens quickly, you really need that low compression latency. Otherwise, you’re dead meat and the galaxy’s lost. A bad day all around.
Here’s a block diagram of the NAB demo:
Later this month at the NAB Show in Las Vegas, you’ll be able to see several cutting-edge video demos based on the Xilinx Zynq SoC and Zynq UltraScale+ MPSoC in the Omnitek booth (C7915). First up is an HEVC video encoder demo using the embedded, hardened video codec built into the Zynq UltraScale+ ZU7EV MPSoC on a ZCU106 eval board. (For more information about the ZCU106 board, see “New Video: Zynq UltraScale+ EV MPSoCs encode and decode 4K60p H.264 and H.265 video in real time.”)
Next up is a demo of Omnitek’s HDMI 2.0 IP core, announced earlier this year. This core consists of separate transmit and receive subsystems. The HDMI 2.0 Rx subsystem can convert an HDMI video stream (up to 4KP60) into an RGB/YUV video AXI4-Stream and places AUX data in an auxiliary AXI4-Stream. The HDMI 2.0 Tx subsystem converts an RGB/YUV video AXI4-Stream plus AUX data into an HDMI video stream. This IP features a reduced resource count (small footprint in the programmable logic) and low latency.
Finally, Omnitek will be demonstrating a new addition to its OSVP Video Processor Suite: a real-time Image Signal Processing (ISP) Pipeline Subsystem, which can create an RGB video stream from raw image-sensor outputs. The ISP pipeline includes blocks that perform image cropping, defective-pixel correction, black-level compensation, vignette correction, automatic white balancing, and Bayer filter demosaicing.
Omnitek’s Image Signal Processing (ISP) Pipeline Subsystem
Both the HDMI 2.0 and ISP Pipeline Subsystem IP are already proven on Xilinx All Programmable devices including all 7 series devices (Artix-7, Kintex-7, and Virtex-7), Kintex UltraScale and Virtex UltraScale devices, Kintex UltraScale+ and Virtex UltraScale+ devices, and Zynq-7000 SoCs and Zynq UltraScale+ MPSoCs.
By Adam Taylor
When we look at the peripherals in the Zynq UltraScale+ MPSoC's PS (processor system), we see several which, while not identical to those in the Zynq-7000 SoC, perform a similar function (e.g. the Sysmon, the I2C controller, etc.). As would be expected, however, there are also peripherals that are brand new in the MPSoC. One of these is the Real-Time Clock (RTC), which will be the subject of my next few blogs.
The Zynq UltraScale+ MPSoC’s RTC is an interesting starting point in this examination of the on-chip PS peripherals as it can be powered from its own supply PSBATT to ensure that the RTC functions when the rest of the system is powered down. If we want that feature to work in our system design, then we need to include a battery that will provide this power over the equipment’s operating life.
The Zynq UltraScale+ MPSoC's RTC needs an external battery to operate when the system is powered down
As shown in the figure above, the Zynq UltraScale+ MPSoC’s RTC is split into the RTC Core (dark gray rectangle) and the RTC Controller (medium gray “L”). The RTC Core resides within the battery power domain. The core contains all the counters needed to implement the timer functions and includes a tick counter driven directly by the external crystal oscillator (see clocking blog).
At the simplest level, the tick counter determines when a second has elapsed, incrementing a second counter. The operating system uses this second counter to determine the date from a specific reference point known by the operating system. The second counter is 32 bits wide so it can count for 136 years. If necessary, we can set the seconds counter to a known value as well once the low-power domain is operational.
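As a quick sanity check on that 136-year figure, the arithmetic is simply the 32-bit counter's range divided by the seconds in a year (a back-of-the-envelope sketch, not driver code):

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit second counter wraps after 2^32 seconds; at roughly
 * 31.6 million seconds per year that is about 136 years. */
static uint32_t counter_range_years(void)
{
    uint64_t max_seconds  = 1ull << 32;                   /* 4,294,967,296 */
    uint64_t sec_per_year = 365ull * 24u * 60u * 60u
                          + 6u * 60u * 60u;               /* 365.25 days */
    return (uint32_t)(max_seconds / sec_per_year);
}
```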
To ensure timing accuracy, the RTC provides a calibration register that can correct, every 16 seconds, timing errors arising from static inaccuracies caused by the crystal's frequency tolerance. At some point, your application code can determine the RTC's timing inaccuracy against an external timing reference (a GPS-derived time, for example) and then use the computed inaccuracy to discipline the RTC by setting the calibration register.
The Zynq UltraScale+ MPSoC’s RTC incorporates a calibration register for clock-crystal compensation
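A simplified model of what goes into that register: the integer calibration value is the measured ticks-per-second minus one, since the tick counter counts from zero. This sketch deliberately ignores the register's fractional-correction bits (the part applied every 16 seconds), and in real code the xrtcpsu driver's calibration helper would compute the full value for you.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified integer calibration value: the number of oscillator
 * ticks the RTC should count per second, minus one because the
 * counter starts at zero. Fractional correction is ignored here. */
static uint32_t rtc_calibration(uint32_t measured_hz)
{
    return measured_hz - 1u;
}
```

A nominal 32.768kHz crystal therefore yields an integer calibration value of 32767 (0x7FFF); a crystal measured slightly fast or slow shifts the value accordingly, which is how the disciplining described above takes effect.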
The RTC can generate an interrupt once every second when it’s fully powered. (There’s no need for clock interrupts when the RTC is running on battery power because there’s no operational processor to interrupt.) The Zynq UltraScale+ MPSoC’s ARM processor that’s controlling the RTC should have this interrupt enabled so that it can correctly manage it.
During board testing and commissioning, we can use an RTC register bit to clock the counters in place of the external crystal oscillator. This is of interest if we want to verify that alarms occur at set values without waiting the long time they would normally take to arrive if we relied on the oscillator ticks. The alternative is to use different alarm values during testing, which requires a different build of the application software and so is not representative of the actual code.
When it comes to selecting an external crystal for the RTC, we should select either a 32768Hz or a 65536Hz crystal. If the part selected has a 20 PPM tolerance, the RTC’s calibration feature allows us to achieve better than 2 PPM if we use the 32768Hz crystal or 1 PPM if we use the 65536Hz crystal. (We get more calibration resolution with the faster crystal.)
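Those resolution figures follow directly if we assume the fractional calibration can correct by one oscillator tick per 16-second window (an assumption consistent with the numbers quoted above, sketched here rather than taken from the driver):

```c
#include <assert.h>

/* Smallest correction step, in parts per million, assuming the
 * fractional calibration adjusts by one oscillator tick over each
 * 16-second correction window. */
static double calibration_resolution_ppm(double crystal_hz)
{
    return 1.0e6 / (16.0 * crystal_hz);
}
```

With a 32768Hz crystal this gives roughly 1.9 PPM, and doubling the crystal frequency to 65536Hz halves the step to roughly 0.95 PPM, matching the "better than 2 PPM" and "1 PPM" figures.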
We need to use the RTC Controller to access and manage the RTC Core. The controller provides the ability to control and interact with the RTC Core once the low-power domain is powered up. We also use the RTC Controller to configure the interrupts and alarms to be generated. We can set an alarm to occur at any point within the 136-year range of the second counter.
I should also note that battery power is only required when the PS main supplies are not powered. If the main supplies are powered, then the battery does not power the RTC Core. We can use the ratio of the time the system is powered up to the time it spends powered down to correctly size the battery.
In the next blog, we will look at the software we need to write to configure, control, and calibrate the RTC.
My code is available on Github as always.
Xcell Daily has covered the Xilinx version of QEMU for the Zynq UltraScale+ MPSoC, the Zynq-7000 SoC, and any design that uses the Xilinx MicroBlaze 32-bit soft-core RISC processor, but now there's a short video that gives you a good introduction to the tool in just over 10 minutes.
If you haven’t heard about QEMU, it’s a fast, software-based processor emulator and virtual emulation platform developed by the open-source community. Xilinx adopted QEMU several years ago for internal product development and, as a consequence, has developed and extended QEMU to cover the ARM Cortex-A53, -A9, and -R5 processor cores used in the Zynq UltraScale+ MPSoCs and Zynq-7000 SoCs and for the Xilinx MicroBlaze processor core as well. At the same time, Xilinx has fed these enhancements back to the open-source community. (More info here.)
You reap the benefit of this development in the rapid introduction of new Zynq devices and in your ability to download the Xilinx version of QEMU and use it for developing your own designs. The availability of QEMU for the processors used in Zynq devices and the MicroBlaze processor allows you to develop code for your design long before the application-specific hardware IP is ready. That means you can short-cut big chunks of your development cycle when the software team might otherwise be idled, waiting for a development platform. With QEMU, the development platform is as simple as a laptop PC.
That said, here’s the short video:
For additional Xcell Daily coverage of QEMU, see:
If you have a mere 14 minutes to spare, you can watch this new video that will show you how to set up the Zynq UltraScale+ MPSoC’s hardened, embedded PCIe block as a Root Port using the Vivado Design Suite. The target system is the ZCU102 eval kit (currently on sale for half price) and the video shows you how to use the PetaLinux tools to connect to a PCIe-connected NVMe SSD.
This is a fast, painless way to see a complete set of Xilinx development tools being used to create a fully operational system based on the Zynq UltraScale+ MPSoC in less than a quarter of an hour.