We recently looked at how we could use the Zynq SoC’s XADC streaming output with DMA. For that example, I demonstrated only outputting one XADC channel over an AXI stream. However, it is important we understand how we can use multiple channels within an AXI stream to transfer them to processor memory whether we are using the XADC as the source or not.
To demonstrate this, I will be updating the XADC design that we used for the previous streaming example. Upgrading the software is simple. All we need to do is enable another XADC channel when we configure the sequencer and update the API we use. Updating the hardware is a little more complicated.
To upgrade the Vivado hardware design, the first thing we need to do is replace the DMA IP module with the multi-channel DMA (MCDMA) IP core. The MCDMA IP core supports as many as 16 different input channels. DMA channels are mapped to AXI Stream contents using the TDest bus, which is part of the AXIS standard.
As with the previous XADC streaming example, we’ll configure the MCDMA for uni-directional operation (write only) and support for two channels:
Configuration of the MCDMA IP core.
Vivado design with the MCDMA IP Core (Tcl BD available on GitHub)
TDest is the AXI signal used for routing AXI Stream contents. In addition, when we configure the XADC for AXI Streaming, the different XADC channels output on the stream are identified by the TId bus.
To be able to use the MCDMA in conjunction with the XADC, we need to remap the XADC TId channel to the MCDMA TDest channel. We also need to packetize data by asserting TLast on the MCDMA AXIS input.
In the previous example we used a custom IP core to generate the TLast signal. A better solution however is to remap the TId and generate the TLast signal using the AXIS subset converter.
AXIS Subset Converter Configuration
The eagle-eyed will at this point notice that the XADC uses channel numbers up to 31 with the auxiliary inputs using channel ID’s (16 to 31), which are outside the channel range of the MCDMA. If we are using the auxiliary inputs, we can also use the AXIS subset convertor to remap these higher channel numbers into the MCDMA range by remapping the lower four bits of the XADC TId to the MCDMA TDest channel. When using this method the lower XADC channels cannot be used otherwise there would be conflict.
Output of the XADC with multiple channels (Temperature & VPVN Channels)
Output of the AXIS subset block following remapping and TLast Generation
When it comes to updating the software application, we need to use the XMCDMA.h header APIs to configure the MCDMA and set up the buffer descriptors for each of the channels. The software performs the following steps:
Allocate memory areas for the receive buffer and the buffer descriptors.
For each Channel, create the buffer descriptors.
Populate the buffer descriptors and the receive buffer address.
Reset the receive buffer memory contents to zero.
Invalidate the caching in on the receive buffer to ensure the values can be seen in DDR memory.
Commit the channels to the hardware.
Start the MCDMA transfers for each channel.
The software application defines several buffer descriptors for each channel. When it comes to the receive buffer for this example, I have used a single receive buffer so the received data for both channels shares the same address space. This can be seen below. Halfwords starting 0x4yyy relate to the VPVN input while the device temperature half words start 0x9yyy.
Memory Contents showing the two channels
This is a simple adaption to the existing software to use multiple receive buffers in memory. For many applications, separate receive buffers are more useful.
Being able to move AXI Data streams to memory-mapped locations is a vital requirement for many applications—for example signal processing, communication, and sensor interfacing. Using the AXI subset convertor allows us to correctly remap and format the AXIS stream data into a compliant format for the MCDMA IP core.
Rigol’s new RSA5000 real-time spectrum analyzer allows you to capture, identify, isolate, and analyze complex RF signals with a 40MHz real-time bandwidth over either a 3.2GHz or 6.5GHz signal span. It’s designed for engineers working on RF designs in the IoT and IIot markets as well as industrial, scientific, and medical equipment. Rigol was demonstrating the RSA5000 real-time spectrum analyzer at this week’s DesignCon being held at the Santa Clara Convention Center. I listened to a presentation from Rigol’s North American General Manager Mike Rizzo and then a demo by Rigol’s Director of Product Marketing & Software Applications Chris Armstrong, both captured in the 2.5-minute video below.
Rigol RSA5000 Real-Time Spectrum Analyzer
Based on what I saw in the demo, this is an extremely responsive instrument—far more responsive than a swept spectrum analyzer—with several visualization display modes to help you isolate the significant signal in a sea of signals and noise, in real time. It’s capable of continuously executing 146,484 FFTs/sec, which results in a minimum 100% POI (probability of intercept) of 7.45μsec. You need some real DSP horsepower to achieve that sort of performance and the Rigol RSA5000 real-time spectrum analyzer gets this performance from a pair of Xilinx Zynq Z-7015 SoCs. (You'll find many more details about real-time spectrum analysis and the RSA5000 Real-Time Spectrum Analyzer in the Rigol app note "Realtime Spectrum Analyzer vs Spectrum Analyzer," attached at the end of this post. See below.)
Here’s the short presentation and demo of the Rigol RSA5000 real-time spectrum analyzer from DesignCon 2018:
Mike Rizzo told me that the Rigol design engineers selected the Zynq Z-7015 SoCs for three main reasons:
High-bandwidth access between the Zynq SoC’s PS (processing system) and PL (programmable logic)
Excellent development tools including Xilinx’s Vivado HLS
If you’re looking for a very capable spectrum analyzer, give the Rigol RSA5000 a look. If you’re designing your own real-time system and need high-speed computation coupled with fast user response, take a look at the line of Xilinx Zynq SoCs and Zynq UltraScale+ MPSoCs.
The Avnet MiniZed is an incredibly low-cost dev board based on the Xilinx Zynq Z7007S SoC with WiFi and Bluetooth built in. It currently lists for $89 on the Avnet site. If you’d like a fast start to using this dev board, Avnet is ready to help. As of now, it’s placed four MiniZed Speedway Design Workshops online so that you can learn at your own convenience and your own pace. The four workshops are:
In the Developing Zynq Hardware Speedway, you will be introduced to the single ARM Cortex –A9 Processor core as you explore its robust AXI peripheral set. Doing so you will utilize the Xilinx embedded systems tool set to design a Zynq AP SoC system, add Xilinx IP as well as custom IP, run software applications to test the IP, and finally debug your embedded system.
From within an Ubuntu OS running within a virtual machine, learn how to install PetaLinux 2017.1 and build embedded Linux targeting MiniZed. In the hands-on labs learn about Yocto and PetaLinux tools to import your own FPGA hardware design, integrate user space applications, and configure/customize PetaLinux.
Using proven flows for SDSoC, the student will learn how to navigate SDSoC. Through hands-on labs, we will create a design for a provided platform and then also create a platform for the Avnet MiniZed. You will see how to accelerate an algorithm in the course lab.
Xcell Daily has covered the FPGA-accelerated AWS EC2 F1 instances from Amazon Web Services several times. The AWS EC2 F1 instances allows AWS customers to develop accelerated code in C, C++, OpenCL, Verilog, or VHDL and run it on Amazon servers augmented with hardware-accelerated cards based on multiple Xilinx Virtex UltraScale+ VU9P FPGAs. (See below.)
A new AWS case study titled “Xilinx Speeds Testing Time, Increases Developer Productivity Using AWS” turns the tables. It discusses Xilinx’s use of AWS services to speed development of Xilinx development software such as the Vivado and SDx development environments. Xilinx employs extensive regression testing when developing new releases of these complex tools and the resulting demand spikes called for more “elastic” server resources. (Amazon’s “EC2” designation stands for “Elastic Compute Cloud.”)
As the case study states:
“Xilinx addressed its infrastructure-scaling problem by migrating to a high-performance computing (HPC) cluster running on Amazon Web Services (AWS). ‘We evaluated several cloud providers and chose AWS because it had the best tools and most mature solution,’” says [Ambs] Kesavan, [software engineering and DevOps director at Xilinx].
For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:
Huawei’s FACS cloud offering is based on a PCIe server card that incorporates a Xilinx Virtex UltraScale+ VU9P FPGA. (Huawei also offers the board for on-premise installations.) In addition to the hardware, Huawei offers three major development tools for FACS:
An SDAccel-based shell that offers fast, easy development. SDAccel is Xilinx’s development environment for C, C++, and OpenCL. This shell also provides access to Xilinx’s Vivado development environment.
A DPDK shell for high-performance applications. Intel originally developed DPDK as a packet-processing framework for accelerated server systems and Huawei’s implementation can support throughputs as high as 12Gbytes/sec.
A Professional Simulation Platform that encapsulates more than two decades of Huawei’s FPGA development experience.
With these offerings, Davies said, Huawei is looking to add partners to expand its ecosystem and is particularly interested in talking to companies that offer:
There’s a Huawei Cloud Marketplace that serves as an outlet for FACS applications. The company is also welcoming end users to try the service.
Here’s a video of Davies’ 32-minute presentation at XDF:
Without a doubt, how we use the Zynq SoC’s XADC in our developments is the one area I receive the most emails about. Just before Christmas, I received two questions that I thought would make for a pretty good blog. The first question asked how to use the AXI streaming output with DMA while the follow up question was about how to output the captured data for further analysis.
The AXI streaming output of the XADC is very useful when we want to create a signal-processing path, which might include filters, FFTs, or our own custom HLS processing block for example. It is also useful if we want to transfer many samples as efficiently as possible to the PS (processing system) memory for output or higher-level processing and decision making within the PS.
The XADC outputs samples on the AXI Stream for each of its channels when it is enabled. The AXI Streaming interface implements the optional TID bus that identifies the channel currently streaming on the AXI stream data output to allow downstream IP cores to correlate the AXI Stream data with an input channel. If we output only a single XADC channel, we do not need to monitor the TID signal information. However if we are outputting multiple channels in the AXI stream, we need to pay attention to the TID information to ensure that our processing blocks use the correct samples.
We need a more complex DMA architecture to support multi-channel XADC operation. The DMA IP Core must have its scatter-gather engine enabled to provide multi-channel support. Of course, this level of complexity is not required if we’re only using a single XADC channel.
For the following example, we will be using a single XADC output channel so that I can demonstrate these concepts simply. I will return to this example in a later blog and expand the design for multiple output channels.
We will use a DMA transfer from the PL to the PS to move XADC samples into the PS memory. However, we cannot connect the XADC and DMA AXI stream interfaces directly. That design won’t function correctly because the DMA IP Core requires the assertion of the optional AXI Stream signal TLast to signal AXI transfer completion. Unfortunately, the XADC’s AXI Streaming output does not contain this signal so we need to add a block to drive the TLast signal between the XADC and DMA.
This interface block is very simple. It should allow the user to define the size of the AXI transfer and it needs to assert the TLast signal once the AXI transfer size is reached. Rather helpfully, an IP block called Tlast_gen that implements this functionality has been provided here. All we need to do is add this IP core to the project IP repository and include it in our design.
We can use an AXI GPIO in the PL to control the size of the transfer dynamically at run time.
Creating the block diagram within Vivado for this example is very simple. In fact, most of the design can be automatically created using Vivado, as shown in this video:
The final block diagram is below. I have uploaded the TCL BD description to my GitHub to enable more detailed inspection and recreation of the project.
Once the project has been completed in Vivado, we can build the design and export it to SDK to create the application.
The software application performs the following functions:
Initialize the AXI GPIO
Initialize and configure the XADC to read a single channel (VP/VN on the Zedboard)
Reset the DMA channel
Loop forever performing the following
Simple DMA transfer from the PL to the PS
Flush the cache at the PS transfer address to ensure that we can see the data in the PS DDR memory
Flushing the cache is very important. If we don’t flush the cache, we will not see the captured ADC values in memory when we examine the PS DDR memory.
We also need to take care when setting the number of ADC samples transferred. The output Stream is 16 bits wide. The tlast_gen block counts these 16-bit (2-byte) transfers while the DMA transfer counts bytes. So we need we need to set the tlast_gen transfer size to be half the number of bytes the DMA is configured to transfer. If we fail to set this correctly, we will only be able to perform the transfer once. Then the DMA will hang because the tlast signal will not be generated.
Generation of the tlast signal
When I ran this software on the ZedBoard, I could see the ADC values changing as the DMA transfer occurred in a memory watch window.
Memory watch window showing the 256-byte DMA capture
Now that we can capture the ADC values in the PS memory, we may want to extract this information for further analysis—especially if we are currently verifying or validating the design. The simplest way to do this is to write out the values over an RS-232 port. However, this can be a slow process and it requires modification to the application software.
Another method we can use is the XSCT Console within the debug view in SDK. Using XSCT we can:
Read out a memory address range
Read out a memory address range as a TCL list
Read out a memory address range to a binary file
The simplest approach is to output a memory address range. To do this, we use the command:
mrd <address> <number of words>
Reading out 256 words from the address 0x00100000
While this technique outputs the data, the output format is not the easiest for an analysis program to work with because it contains both address and data values.
We can obtain a more useful data output by requesting the data to be output as a TCL list using the command:
mrd -value -size h 0x00100000 128
Reading out a TCL list
We can then use this TCL list with a program like Microsoft Excel, MATLAB, or Octave to further analyze the captured signal:
Captured Data analyzed and plotted in Excel.
Finally, if we want to download a binary file containing the memory contents we can use the command:
mrd -bin -file part233.bin 0x00100000 128
We can then read this binary file into an analysis program like Octave or MATLAB or into custom analysis software.
Hopefully by this point I have answered the questions posed and shared the answers more widely, enabling you to get your XADC designs up and running faster.
First, the YouTube video demonstrates the IoT design interacting with an app on a mobile phone. Then video takes you step-by-step through the creation process using the Xilinx Vivado development environment.
The YouTuber writes:
“I implemented a web server using Python and bottle framework, which works with another C++ application. The C++ application controls my custom IPs (such as PWM) implemented in PL block. A user can control LEDs, 3-color LEDs, buttons and switches mounted on ZYBO board.”
The YouTube video’s Web page also lists the resources you need to recreate the IoT design:
As a Xilinx employee I would like to contribute on the Pros ... and the Cons.
Let start with the Cons: if there is a processor that suits all your needs in terms of cost/power/performance/IOs just go for it. You won't be able to design the same thing in an FPGA at the same price.
Now if you need some kind of glue logic around (IOs), or your design need multiple processors/GPUs due to the required performance then it's time to talk to your local FPGA dealer (preferably Xilinx distributor!). I will try to answer a few remarks I saw throughout this thread:
FPGA/SoC: In the majority of the FPGA designs I’ve seen during my career at Xilinx, I saw some kind of processor. In pure FPGAs (Virtex/Kintex/Artix/Spartan) it is a soft-processor (Microblaze or Picoblaze) and in a [Zynq SoC or Zynq Ultrascale+ MPSoC], it is a hard processor (dual-core Arm Cortex-A9 [for Zynq SoCs] and Quad-A53+Dual-R5 [for Zynq UltraScale+ MPSoCs]). The choice is now more complex: Processor Only, Processor with an FPGA aside, FPGA only, Integrated Processor/FPGA. The tendency is for the latter due to all the savings incurred: PCB, power, devices, ...
Power: Pure FPGAs are making incredible progress, but if you want really low power in stand-by mode you should look at the Zynq Ultrascale+ MPSoC, which contains many processors and particularly a Power Management Unit that can switch on/off different regions of the processors/programmable logic.
Analog: Since Virtex-5 (2006), Xilinx has included ADCs in its FPGAs, which were limited to internal parameter measurements (Voltage, Temperature, ...). [These ADC blocks are] called the System Monitor. With 7 series (2011) [devices], Xilinx included a dual 1Msamples/sec@12-bits ADC with internal/external measurement capabilities. Lately Xilinx [has] announced very high performance ADCs/DACs integrated into the Zynq UltraScale+ RFSoC: 4Gsamples/sec@12 bits ADCs / 6.5Gsamples/sec@14 bits DACs. Potential applications are Telecom (5G), Cable (DOCSYS) and Radar (Phased-Array).
Security: The bitstream that is stored in the external Flash can be encoded [encrypted]. Decoding [decrypting] is performed within the FPGA during bitstream download. Zynq-7000 SoCs and Zynq Ultrascale+ MPSoCs support encoded [encrypted] bitstreams and secured boot for the processor[s].
Ease of Use: This is the big part of the equation. Customers need to take this into account to get the right time to market. Since 2012 and [with] 7 series devices, Xilinx introduced a new integrated tool called Vivado. Since then a number of features/new tools have been [added to Vivado]:
IP Integrator(IPI): a graphical interface to stitch IPs together and generate bitstreams for complete systems.
Vivado HLS (High Level Synthesis): a tool that allows you to generate HDL code from C/C++ code. This tool will generate IPs that can be handled by IPI.
SDSoC (Software Defined SoC): This tool allows you to design complete systems, software and hardware on a Zynq SoC/Zynq UltraScale+ MPSoC platform. This tool with some plugins will allow you to move part of your C/C++ code to programmable logic (calling Vivado HLS in the background).
SDAccel: an OpenCL (and more) implementation. Not relevant for this thread.
There are also tools related to the MathWorks environment [MATLAB and Simulink]:
System Generator for DSP (aka SysGen): Low-level Simulink library (designed by Xilinx for Xilinx FPGAs). Allows you to program HDL code with blocks. This tools achieves even better performance (clock/area) than HDL code as each block is an instance of an IP (from register, adder, counter, multiplier up to FFT, FIR compiler, and VHLS IP). Bit-true and cycle-true simulations.
Xilinx Model Composer (XMC): available since ... yesterday! Again a Simulink blockset but based on Vivado HLS. Much faster simulations. Bit-true but not cycle-true.
All this to say that FPGA vendors have [expended] tremendous effort to make FPGAs and derivative devices easier to program. You still need a learning curve [but it] is much shorter than it used to be…
Mathworks has been advocating model-based design using its MATLAB and Simulink development tools for some time because the design technique allows you to develop more complex software with better quality in less time. (See the Mathworks White Paper: “How Small Engineering Teams Adopt Model-Based Design.”) Model-based design employs a mathematical and visual approach to developing complex control and signal-processing systems through the use of system-level modeling throughout the development process—from initial design, through design analysis, simulation, automatic code generation, and verification. These models are executable specifications that consist of block diagrams, textual programs, and other graphical elements. Model-based design encourages rapid exploration of a broader design space than other design approaches because you can iterate your design more quickly, earlier in the design cycle. Further, because these models are executable, verification becomes an integral part of the development process at every step. Hopefully, this design approach results in fewer (or no) surprises at the end of the design cycle.
Xilinx supports model-based design using MATLAB and Simulink through the new Xilinx Model Composer, a design tool that integrates into the MATLAB and Simulink environments. The Xilinx Model Composer includes libraries with more than 80 high-level, performance-optimized, Xilinx-specific blocks including application-specific blocks for computer vision, image processing, and linear algebra. You can also import your own custom IP blocks written in C and C++, which are subsequently processed by Vivado HLS.
Here’s a block diagram that shows you the relationship among Mathworks’ MATLAB, Simulink, and Xilinx Model Composer:
Finally, here’s a 6-minute video explaining the benefits and use of Xilinx Model Composer:
RHS Research’s PicoEVB FPGA dev board based on an Artix-7 A50T FPGA snaps into an M.2 2230 key A or E slot, which is common in newer laptops. The board measures 22x30mm, which is slightly larger than the Artix-7 FPGA and configuration EEPROM mounted on one side of the board. It has a built-in JTAG connection that works natively with Vivado.
Here’s a photo that shows you the board’s size in relation to a US 25-cent piece:
Even though the board itself is small, you still get a lot of resources in the Artix-7 A50T FPGA including 52,160 logic cells, 120 DSP48 slices, and 2.7Mbits of BRAM.
For the final MicroZed Chronicles blog of the year, I thought I would wrap up with several tips to help when you are creating embedded-vision systems based on Zynq SoC, Zynq UltraScale+ MPSoC, and Xilinx FPGA devices.
Note: These tips and more will be part of Adam Taylor’s presentation at the Xilinx Developer Forum that will be held in Frankfurt, Germany on January 9.
Design in Flexibility from the Beginning
Video Timing Controller used to detect the incoming video standard
Use the flexibility provided by the Video Timing Controller (VTC) and reconfigurable clocking architectures such as Fabric Clocks, MMCM, and PLLs. Using the VTC and associated software running on the PS (processor system) in the Zynq SoC and Zynq UltraScale+ MPSoC, it is possible to detect different video standards from an input signal at run time and to configure the processing and output video timing accordingly. Upon detection of a new video standard, the software running on the PS can configure new clock frequencies for the pixel clock and the image-processing chain along with re-configuring VDMA frame buffers for the new image settings. You can use the VTC’s timing detector and timing generator to define the new video timing. To update the output video timings for the new standard, the VTC can use the detected video settings to generate new output video timings.
Convert input video to AXI Interconnect as soon as possible to leverage IP and HLS
Converting Data into the AXI Streaming Format
Vivado provides a range of key IP cores that implement most of the functions required by an image processing chain—functions such as Color Filter Interpolation, Color Space Conversion, VDMA, and Video Mixing. Similarity Vivado HLS can generate IP cores that use the AXI interconnect to ease integration within Vivado designs. Therefore, to get maximum benefit from the available IP and tool chain capabilities, we need to convert our incoming video data into the AXI Streaming format as soon as possible in the image-processing chain. We can use the Video-In-to-AXI-Stream IP core as an aid here. This core converts video from a parallel format consisting of synchronization signals and pixel values into our desired AXI Streaming format. A good tip when using this IP core is that the sync inputs do not need to be timed as per a VGA standard; they are edge triggered. This eases integration with different video formats such as Camera Link, with its frame-valid, line-valid, and pixel information format, for example.
Use Logic Debugging Resources
Insertion of the ILA monitoring the output stage
Insert integrated logic analyzers (ILAs) at key locations within the image-processing chain. Including these ILAs from day one in the design can help speed commissioning of the design. When implementing an image-processing chain in a new design, I insert ILA’s as a minimum in the following locations:
Directly behind the receiving IP module—especially if it is a custom block. This ILA enables me to be sure that I am receiving data from the imager / camera.
On the output of the first AXI Streaming IP Core. This ILA allows me to be sure the image-processing core has started to move data through the AXI interconnect. If you are using VDMA, remember you will not see activity on the interconnect until you have configured the VDMA via software.
On the AXI-Streaming-to-Video-Out IP block, if used. I also consider connecting the video timing controller generator outputs to this ILA as well. This enables me to determine if the AXI-Stream-to-Video-Out block is correctly locked and the VTC is generating output timing.
When combined with the test patterns discussed below, insertion of ILAs allows us to zero in faster on any issues in the design which prevent the desired behavior.
Select an Imager / Camera with a Test Pattern capability
Incorrectly received incrementing test pattern captured by an ILA
If possible when selecting the imaging sensor or camera for a project, choose one that provides a test pattern video output. You can then use this standard test pattern to ensure the reception, decoding, and image-processing chain is configured correctly because you’ll know exactly what the original video signal looks like. You can combine the imager/camera test pattern with ILAs connected close to the data reception module to determine if any issues you are experiencing when displaying an image is internal to the device and the image processing chain or are the result of the imager/camera configuration.
We can verify the deterministic pixel values of the test pattern using the ILA. If the pixel values, line length, and the number of lines are as we expect, then it is not an imager configuration issue. More likely you will find the issue(s) within the receiving module and the image-processing chain. This is especially important when using complex imagers/cameras that require several tens, or sometimes hundreds of configuration settings to be applied before an image is obtained.
Include a Test Patter Generator in your Zynq SoC, Zynq UltraScale+ MPSoC, or FPGA design
Tartan Color Bar Test Pattern
If you include a test-pattern generator within the image-processing chain, you can use it to verify the VDMA frame buffers, output video timing, and decoding prior to the integration of the imager/camera. This reduces integration risks. To gain maximum benefit, the test-pattern generator should be configured with the same color space and resolution as the final imager. The test pattern generator should be included as close to the start of the image-processing chain as possible. This enables more of the image-processing pipeline to be verified, demonstrating that the image-processing pipeline is correct. When combined with test pattern capabilities on the imager, this enables faster identification of any problems.
Understand how Video Direct Memory Access stores data in memory
Video Direct Memory Access (VDMA) allows us to use the processor DDR memory as a frame buffer. This enables access to the images from the processor cores in the PS to perform higher-level algorithms if required. VDMA also provides the buffering required for frame-rate and resolution changes. Understanding how VDMA stores pixel data within the frame buffers is critical if the image-processing pipeline is to work as desired when configured.
One of the major points of confusion when implementing VDMA-based solutions centers around the definition of the frame size within memory. The frame buffer is defined in memory by three parameters: Horizontal Size (HSize), Vertical Size (VSize). and Stride. The two parameters that define the Horizontal Size of the image are the HSize and the stride of the image. Like VSize, which defines the number of lines in the image, the HSize defines the length of each line. However instead of being measured in pixels the horizontal size is measured in bytes. We therefore need to know how many bytes make up each pixel.
The Stride defines the distance between the start of one line and another. To gain efficient use of the DDR memory, the Stride should at least equal the horizontal size. Increasing the Stride introduces a gap between lines. Implementing this gap can be very useful when verifying that the imager data is received correctly because it provides a clear indication of when a line of the image starts and ends with memory.
These six simple techniques have helped me considerably when creating imageprocessing examples for this blog or solutions for clients and they significantly ease both the creation and commissioning of designs.
As I said, this is my last blog of the year. We will continue this series in the New Year. Until then I wish you all happy holidays.
Over the last couple of weeks, we have examined how we can debug our designs using Micrium’s μC/Probe (Post 1 and Post 2) or with the JTAG to AXI Bridge. However, the best way to minimize time spent debugging is to generate high quality designs in the first place. We can then focus on ensuring that the design functionality is as specified instead of hunting bugs.
To improve the quality of our design, there are several things we can do that help us achieve timing closure and identify design issues and bugs:
Review code to ensure that it not only complies with coding and design standards and to catch functional issues earlier in the design stage.
Correctly constrain the design for clocks, multicycle, and false paths.
Analyze CDCs (Clock Domain Crossings) to ensure that all CDCs are correctly handled.
Perform detailed simulation to test corner cases and boundary conditions.
Over my career, I have spent many hours performing code reviews, checking designs for functionality, and for compliance with coding standards and tool-chain recommendations.
The Blue Pearl Visual Verification Suite is an EDA tool that automates design checking over a range of different customizable rule sets including basic rules, Xilinx Ultrafast design methodology rules, and DO254. The Blue Pearl tools also perform detailed analysis of clocks, counters, state machines, CDCs, paths, and constraints. All of this checking helps engineers gain a better understanding of the functional side of their design. In short, this is a very useful tool set to have in our tool box to improve design quality. Let’s look at how this tool integrates with the Xilinx Vivado design environment and how we can use it on a simple design.
With Blue Pearl installed, the first step is to integrate it with Vivado. To do this we use the Xilinx TCL Store to install the Blue Pearl Visual Verification Suite.
Installing Blue Pearl via the Xilinx TCL Store
Once Blue Pearl is installed, the next step is to create two custom commands. The first command allows us to open a new Blue Pearl project from an open Vivado project. The second command allows updates from Vivado into the Blue Pearl project.
We create these custom commands by selecting tools->custom commands->customizes commands.
Open the Command Customization dialog
This opens a dialog that allows you to create custom commands. For each command, we need to define the callable TCL procedures in the Blue Pearl Visual Verification Suite.
Creating the Launch BPS command
For the “launch BPS” command, we need to use the command:
Creating the Update Command
For the update BPS command, we call the following command:
Once you have completed the addition of the customized commands, you will see two new buttons on the Vivado tool bar.
With the integration completed, we can now use Blue Pearl to analyze and improve the quality of our design if we identify issues that need analysis. Clicking the newly created “launch Blue Pearl” command within a Vivado project opens a new Blue Pearl project for analysis.
As it loads the Vivado design, Blue Pearl checks the code for synthesis and identifies any black boxes. Any syntax errors encountered will be flagged for correction before further analysis can be performed.
There are an extensive number of checks and analysis that can be run on the loaded design, ranging from basic checks to DO254 compliance. There are so many possible checklist items that it might take a little time to select the checks that are important to you. However, once you’ve specified the checks you want, you can save the rules and use them across multiple projects. What is interesting is the tool also reports if the check has been run and not just its status as pass of fail. This explicit feedback mechanism removes the ability of designers to achieve compliance by omission. (And that’s a good thing.)
Blue Pearl Environment
Design Check configuration
As an example, I loaded a project that I am working on to see what the design check and analysis reports look like. The design is simple. It decodes a MIPI stream to frame sync, line sync, and pixel values. While this is a simple design, Blue Pearl still identified a few issues within the code that need consideration to see if they present an issue or not.
The first potential issue identified was in the If/Then/Else (ITE) analysis. The design contains a VHDL process that decodes the MIPI header type. This process is written using an if / elsif structure, which implies a priority encoder. Furthermore, to differentiate between five different header commands, the length of the priority encoder contains a five deep if / elsif structure. Blue Pearl calls this a length of five. By default, Blue Pearl generates warnings on lengths greater than 3. In this case no priority required and a case statement would provide better synthesis results because there is no need to consider input priority. Although each application is different, you as the engineer need to use your own experience and knowledge of the design to decide whether or not priority is needed.
Along with reporting the length of the if structure, ITE analysis also analyzes the number of conditions within a statement. This is important when an if statement contains several conditions because additional conditions require additional logic resources and routing, which will impact our timing performance.
Identification of if / then /else large length
State machines are of course used in designs for control structures. Complex control structures requires large state machines, which can be difficult to follow in the RTL. As part of its analysis, Blue Pearl creates visualizations of state machines within a design. This visualization details the transitions among states, along with identifying any unreachable states. I found this capability very useful not only in debugging and verifying the behavior of my own state machines, but also for visualizing third-party designs. This graphical capability definitely helps me understand the designer’s intent.
FSM Analysis Viewer
Blue Pearl also provides the ability to visualise CDCs and paths and to monitor fan out within a design. These features allow us to identify places in the design where we might want to add CDC-mitigation measures such as re-timing or pipeline registers within the design.
Clock Domain Crossing Analysis
Flip-Flop Fan out reporting
Having touched lightly on the capabilities of Blue Pearl, I am impressed with the results once you have taken the time to set the correct checks and analysis. The analysis provided allows you to catch potential issues earlier in the design cycle, which should reduce the time spent in the lab hunting bugs. In turn, this frees us to spend more of our time testing functionality.
Over the last two instalments of this blog series, we have explored Micrium’s μC/Probe and using it’s virtual dashboard and oscilloscope capabilities to debug code on the Zynq SoC’s PS. (See Post 1 and Post 2.) Then, I started thinking about how we might use μC/Probe to debug hardware in the Zynq SoC’s PL. The most obvious choice is to insert an integrated logic analyzer (ILA) IP core into the hardware design to observe AXI, AXI Stream, or native logic signals as we have done several times in the MicroZed Chronicles series. Similarly, if we wish to change a signal’s value at run time, we can use the virtual input output IP core to toggle the state of a signal or bus. Reading or modifying the contents of an AXI device within the PL will be a little more complicated, but it can be done by using the JTAG-to-AXI Bridge to connect into the AXI network and then reading or writing to any peripheral on the bus using Vivado’s hardware manager and a TCL script.
To debug designs that contain embedded processing, we could of course write a simple software application for the processor that reads and writes registers in the PL. However this is not always possible, especially if there are separate hardware- and software-development teams. FPGA development teams are sometimes unused to developing embedded software so it can be quicker and easier to use the JTAG-to-AXI bridge for testing out basic hardware functionality before passing the design over to the software development team.
There is also another very important use case for the JTAG-to-AXI bridge if the design does not contain an embedded processing core. (For example, it’s not a Zynq SoC that you’re debugging and you haven’t instantiated a MicroBlaze processor in your FPGA.) AXI is often used to configure and connect IP blocks available via the Vivado IP library. Even if you don’t have an embedded processor available, the JTAG-to-AXI bridge enables fast hardware-in-the-loop testing to ensure that the block performs as intended.
Inserting the JTAG-to-AXI Bridge
Regardless of the presence or absence of an embedded processing element, we can use the JTAG to AXI Bridge for:
Hardware-in-the-loop validation of IP blocks with the JTAG-to-AXI bridge acting as a traffic generator.
During initial board commissioning, we can create TCL scripts to test hardware interfaces, which frees us from depending on software development for hardware debugging.
For more formal validation, we can use the JTAG-to-AXI bridge to extract information from memories, making that information available for later analysis.
Including the JTAG-to-AXI core in the design is simple and requires only connections to the AXI network, clock, and reset. When it comes to IP configuration, all we need to do is define the preferred variety of AXI: AXI4 or AXI Lite. In the JTAG-to-AXI bridge configuration dialog, we can also define the read and write queue lengths. These define the number of TCL commands that the bridge can queue. I have defined the maximum queue length of 16 commands in this example because this will simplify the script I need to write.
Customizing the JTAG-to-AXI Bridge
For the example in this blog post, I have implemented a Zynq-based design with the PS connected via an AXI interconnect to an AXI QSPI module, which I’ve configured as a standard SPI master. To enable the JTAG-to-AXI bridge to also be able to access the AXI QSPI module, I added a second slave port to the AXI interconnect, and then connected the JTAG-to-AXI bridge.
Once this is complete we can use the address editor within Vivado, which will clearly show the peripherals that can be addressed by the Zynq SoC’s PS and which peripherals the JTAG-to-AXI bridge can address. In this case, both the PS and JTAG-to-AXI bridge can address the QSPI device on the AXI bus:
Addressable AXI Memory Peripherals for both the PS and JTAG to AXI Bridge
When we use the JTAG-to-AXI bridge, we must also pay attention to our clocking and reset architecture. If we plan to configure the PS before we use the bridge, the PS-provided fabric clocks and resets are suitable. However, if we intend to use the bridge before we configure the PS side of the Zynq SoC, we will need to source a clock from within the PL and ensure that the PS does not provide the AXI network reset. Failure to do this will result in the Vivado hardware manager not being able to detect the JTAG-to-AXI bridge until the PS has been configured.
There are two basic commands: create_hw_axi_txn and run_hw_axi. We use the first command to create a number of AXI transactions, both read and write. We use the second command to execute the transactions, performing the reads and writes as necessary:
TCL Script used to configure and write out using SPI
To demonstrate the power of the JTAG-to-AXI bridge, I created a simple script that configures the QSPI as a master and transmits several words. The scripting commands allow us to assign a name to each transaction that we define. We can then call these named transactions multiple times as needed within the script. This is a very nice feature. In the image of the script above, these names are identified by the names wr_txn_liteX. These named transactions are then called in a different order when executed in the hardware. For example, I did this to first fill up the QSPI TX FIFO before I enabled Master transmissions.
I ran this example on a Digilent Arty Z7 connected to a Digilent Digital Discovery logic analyzer and pattern generator to monitor the SPI transactions. Here’s what I saw in the Digital Discovery’s Waveforms user interface:
Captured SPI waveform generated by the JTAG-to-AXI Bridge
This example demonstrates the ease with which we can implement the JTAG-to-AXI bridge and with which we can create scripts to interact with the AXI networks within the PL of the design. Hopefully these hints will allow you to debug and verify your design with greater ease.
The card’s differential ADC inputs handle ±2.5V signals, resulting in a ±5 V differential voltage range. The card also offers s 64 digital I/O pins on the XMC P14 connector, which can be used as 64 single-ended LVCMOS24 or as 32 differential LVDS25 interfaces, and four 12.5Gbps GTX transceiver pins on P16.
Per the TEWS data sheet: You can develop your own data-acquisition and control applications for the TXMC638 with the Xilinx Vivado Design Suite.
If you’ve got some high-speed RF analog work to do, VadaTech’s new AMC598 and VPX598 Quad ADC/Quad DAC modules appear to be real workhorses. The four 14-bit ADCs (using two AD9208 dual ADCs) operate at 3Gsamples/sec and the quad 16-bit DACs (using four AD9162 or AD9164 DACs) operate at 12Gsamples/sec. You’re not going to drive those sorts of data rates over the host bus so the modules have local memory in the form of three DDR4 SDRAM banks for a total of 20Gbytes of on-board SDRAM. A Xilinx Kintex UltraScale KCU115 FPGA (aka the DSP Monster, the largest Kintex UltraScale FPGA family member with 5520 DSP slices that give you an immense amount of digital signal processing power to bring to bear on those RF analog signals) manages all of the on-board resources (memory, analog converters, and host bus) and handles the blazingly fast data-transfer rates allowing you to create RF waveform generators and advanced RF-capture systems for applications including communications and signal intelligence (COMINT/SIGINT), radar, and electronic warfare using Xilinx tools including the Vivado Design Suite HLx Editions and the Xilinx Vivado System Generator for DSP, which can be used in conjunction with MathWorks’ MATLAB and the Simulink model-based design tool.
Here’s a block diagram of the AMC598 module:
VadaTech AMC598 Quad ADC/Quad DAC Block Diagram
And here’s a photo of the AMC598 Quad ADC/Quad DAC module:
VadaTech AMC598 Quad ADC/Quad DAC
Note: Please contact VadaTech directly for more information about the AMC598 and VPX598 Quad ADC/Quad DAC modules.
In my last blog, I showed you how to use Micrium’s μC/Probe to debug Zynq-based embedded systems in real time without impacting target operation. The example I presented last time demonstrated the basics of how you can use μC/Probe to monitor variables in target memory. While this capability is very useful, μC/Probe has advanced debugging features that can aid us even further to debug our design and provides us with a real-time user interface that lets us control and monitor internal system variables using a variety of graphical interface components.
I want to explore some of these advanced features in this blog including:
Modifying global variables on the target processor during run time.
Implementing an eight-channel oscilloscope, allowing us to monitor multiple variables at run time.
We need to add some simple code to the target to implement the oscilloscope. We do not need to add code if we only wish to modify variables at run time.
To create a more in-depth example, I am going to use the XADC to receive an external signal on the Vp/Vn inputs and then output a copy of the received signal using a Pmod DA4 DAC. The amplitude of the signal output by the Pmod DA4 will be controlled by a software variable which in turn will be controlled using μC/Probe.
To further demonstrate the μC/Probe’s interactive capabilities with GPIO, I have configured the eight LEDs and four DIP switches on the MicroZed IO carrier Card (IOCC) in a manner that will ease interfacing with μC/Probe.
To create this blog’s example in the Vivado design, I instantiated an XADC wizard, routed 12 Zynq PS GPIO signals to the EMIO, and enabled SPI 0 in the Zynq PS. I routed the SPI 0 signals over EMIO as well. I then connected EMIO SPI outputs to a Pmod bridge, as the Pmod DA4 has no defined driver.
Vivado Block Design for the Advanced μC/Probe Example
With the Vivado design built and exported to SDK, I needed to create a software application to perform the following:
Initialize the Zynq SoC’s XADC, GPIO, SPI, and Interrupt controllers
Configure the XADC to receive data from the Vp/Vn inputs
Configure the SPI controller for master operation with a clock rate as close to 50MHz as possible. (Note that 50MHz is the maximum frequency for the DAC on the Pmod DA4.)
Configure the interrupt controller for handling SPI interrupts.
Configure the GPIO with eight outputs to drive the LEDs and four inputs to read the switches.
Within the main program loop, read a value from the XADC, scale it for the appropriate output value, and write the computed value to the Pmod DA4 before updating the status of the LEDs and switches.
This sequence gives us the ability to read in a XADC values and output a scaled version over the Pmod DA4. Within the application, any variable we wish to be able to update using μC/Probe needs to be declared as a global variable. Therefore, in this example, the XADC values, the value to be written to the Pmod DA4, and the LED and switch variables are all declared globally.
To use the oscilloscope, we need to add some target code. This code is defined in three files provided on the Micrium μC/Probe download page. This download also contains the code necessary for implementing a μC/Probe interface to a target using RS232, USB, or Ethernet instead of JTAG.
The three files we need to map into our design are found under the directory:
The first two files, probe_scope.c and probe_scope.h, define the oscilloscope’s behavior and should be left unchanged. Add them to SDK as new include directory paths. The file within the cfg directory, probe_scope_cfg.h, defines the scope configuration. This file allows us to define the number of active channels, the maximum number of samples, support for 16 and 32-bit variables and the sampling frequency default. Copy this file into your working directory.
To make use of the files we have just added, call probe_scope_init() in the main application to initialize the oscilloscope at program start. Samples are then captured using the function probe_scope_sampling(), which is called when we wish to take a sample for the oscilloscope.
In this example I have called the sampling function at the end of the XADC/Pmod DA4 write loop. However, it can also be used with a timer and called during the ISR.
With the target code included and the main application ready, I built a target application boot file and booted the MicroZed.
I used my Digilent Analog Discovery module (since superseded by the Analog Discovery 2 module) to provide a stimulus waveform for the XADC and to capture the output from the Pmod DA4. The Analog Discovery module allows me to change the stimulus as desired and to monitor the Pmod DA4 output. Of course the waveforms I see using the Analog Discovery module will also be displayed on the μC/Probe oscilloscope.
Test Set up with the MicroZed, MicroZed IO Carrier Card, Segger JTAG probe, and Analog Discovery module
I created a simple dashboard within the μC/Probe application that allows me to change the output signal’s amplitude and the LEDs’ status and to monitor the DIP-switch settings on the IO Carrier Card. Doing all of this is simple with the μC/Probe application. We do the same as we did before: drop the graphical component(s) we desire on the dashboard and then associate the appropriate variables in the ELF. Only this time, instead of just monitoring variable values we will also manipulate them in the case of the output scale and LEDs.
To scale the Pmod DA4 output, I placed a horizontal slider and associated the slider with the float variable “scale” from the ELF. I also changed the slider’s full-scale value from 100 to 1.0 using the properties editor.
For each of the LEDs, I used graphical toggle buttons and associated each button with appropriate LED variable within the ELF. Finally, for each of the switches, I placed a graphical LED on the dashboard and configured the graphical LED to illuminate when its associated variable was set.
To create an oscilloscope within the dashboard, right click on the project under the workspace explorer and select the oscilloscope. This will create a new oscilloscope tab on the dashboard. As before, we can associate the variable from the ELF with each of the enabled channels by dragging and dropping as we do for any application.
Adding the Oscilloscope Tab to μC/Probe’s dashboard
Associating an ELF variable with Oscilloscope channel
To be able to see both tabs on the dashboard at the same time, I combed them as shown below. I did this by right-clicking on a tab and selecting “tile horizontally.”
Control and Oscilloscope combined on one μC/Probe dashboard
This approach does hide the oscilloscope control panel. However, you can open the oscilloscope control panel by clicking on “Scope Settings” at the bottom of the display. Doing so will open a traditional Oscilloscope control panel where you can set the triggering, timebase, etc.
Oscilloscope control panel, opened
Putting all of this together allowed me to control the amplitude of the output waveform on the Pmod DA4 using the slider and to control the LED status using the toggle buttons. I can of course see changes in both the input and output stimulus using μC/Probe’s oscilloscope and view the switch status via the simulated LEDs on the dashboard.
To demonstrate all of this in action I created a very short video:
The screen shots below also demonstrate the oscilloscope working.
Capturing the initial sine wave input
Output value to the DAC with the slider in the nominal position
I will be using μC/Probe in future blogs when I want to interact with the application software running on my Zynq target. I continue to be very impressed with μC/Probe’s capabilities.
The following links were useful in developing this blog:
Being able to see internal software variables in our Zynq-based embedded systems in real time is extremely useful during system bring-up and for debugging. I often use an RS-232 terminal during commissioning to report important information like register values from my designs and have often demonstrated that technique in previous blog posts. Information about variable values in a running system provides a measure of reassurance that the design is functioning as intended and, as we progress though the engineering lifecycle, provides verification that the system is continuing to work properly. In many cases we will develop custom test software that reports the status of key variables and processes to help prove that the design functions as intended.
This approach works well and has for decades—since the earliest days of embedded design. However, a better solution for Zynq-based systems that allows us to read the contents of the processor memory and extract the information we need without impacting the target’s operation and without the need to add a single line of code to the running target now presents itself. It’s called μC/Probe and it’s from Micrium, the same company that has long offered the µC/OS RTOS for a wide variety of processors including the Xilinx Zynq SoC and Zynq UltraScale+ MPSoC.
Micrium’s μC/Probe tool allows us to create custom graphical user interfaces that display the memory contents of interest in our systems designs. With this capability, we can create a virtual dashboard that provides control and monitoring of key system parameters and we can do this very simply by dragging and dropping indicator, display, and control components onto the dashboard and associating them with variables in the target memory. In this manner it is possible to both read and write memory locations using the dashboard.
When it comes to using Micrium’s μC/Probe tool with our Zynq solution, we have choices regarding interfacing:
Use a Segger J Link JTAG Pod. In this case, the target system requires no additional code unless we wish to use an advanced μC/Probe feature such as an oscilloscope.
Use RS-232, USB, or TCP/IP. In this case we do not need to use JTAG. However we do need to add some code to the target embedded system. Micrium supplies sample code for us to use.
For this first example, I am going to use a Segger J Link JTAG pod to create a simple example and demonstrate the capabilities. However, the second interface option proves useful for Zynq-based boards that lack a separate JTAG header and instead use a USB-to-JTAG device or if you do not have a J Link probe. We will look at using Micrium’s μC/Probe tool with the second interface option in a later blog.
Of course, the first thing we need to do is create a test application and determine the parameters to observe. The Zynq SoC’s XADC is perfect for this because it provides quantized values of the device temperature and voltage rails. These are ideal parameters to monitor during an embedded system’s test, qualification, and validation so we will use these parameters in this blog.
The test application example will merely read these values in a continuous loop. That’s a very simple program to write (see MicroZed Chronicles 7 and 8 for more information on how to do this). To understand the variables that we can monitor or interact with using μC/Probe, we need to understand that the tool reads in and parses the ELF produced by SDK to get pointers to the memory values of interest. To ensure that the ELF can be properly read in and parsed by μC/Probe, the ELF needs to contain debugging data in the DWARF format. That means that within SDK, we need to set the compile option –gdwarf-2 to ensure that we use the appropriate version of DWARF. Failure to use this switch will result in μC/Probe being unable to read and parse the generated ELF.
We set this compile switch in the C/C++ build settings for the application, as shown below:
Setting the correct DWARF information in Xilinx SDK
With the ELF file properly created, I made a bootable SD card image of the application and powered on the MicroZed. To access the memory, I connected the Segger J Link, which uses a Xilinx adaptor cable to mate with the MicroZed board’s JTAG connector.
MicroZed with the Segger J Link
All I needed to do now was to create the virtual dashboard. Within μC/Probe, I loaded the ELF by clicking on the ELF button in the symbol browser. Once loaded, we can see a list of all the symbols that can be used on μC/Probe’s virtual dashboard.
ELF loaded, and available symbols displayed
For this example, which is monitoring Zynq SoC’s XADC internal signals including device temperature and power, I added six Numeric Indicators and one Angular Gauge Quadrant. Adding graphical elements to the data screen is very simple. All you need to do this find the display element you desire in the toolbox, drag onto the data screen, and drop it in place.
Adding a graphical element to the display
To display information from the running Zynq SoC on μC/Probe’s Numeric Indicators and on the Gauge, I needed to associate each indicator and gauge with a variable in memory. We use the Symbol viewer to do this. Select the variable you want and drag it onto the display indicator as shown below.
Associating a variable with a display element
If you need to scale the display to use the full variable range or otherwise customize it, hold the mouse over the appropriate display element and select the properties editor icon on the right. The properties editor lets you scale the range, enter a simple transfer function, or increase the number of decimal places if desired.
Formatting a Display Element
Once I’d associated all the Numeric Indicators and the Gauge with appropriate variables but before I could run the project and watch the XADC values in real time, one final thing remained: I needed to inform the project how I wished to communicate with the target and select the target processor. For this example, I used Segger’s J Link probe.
Configuring the communication with the target
With this complete. I clicked “run” and captured the following video of the XADC data being captured and displayed by μC/Probe.
All of this was pretty simple and very easy to do. Of course, this short has just scratched the surface of the capabilities of Micrium’s μC/Probe tool. It is possible to implement advanced features such as oscilloscopes, bridges to Microsoft Excel, and communication with the target using terminal windows or more advanced interfaces like USB. In the next blog we will look at how we can use some of these advanced features to create a more in-depth and complex virtual dashboard.
I think I am going to be using Micrium’s μC/Probe tool in many blogs going forward where I want to interact with the Zynq as well.
You can find the example source code on the GitHub.
So far, all of my image-processing examples have used only one sensor and produce one video stream within the Zynq SoC or Zynq UltraScale+ MPSoC PL (programmable logic). However, if we want to work with multiple sensors or overlay information like telemetry on a video frame, we need to do some video mixing.
Video mixing merges several different video streams together to create one output stream. In our designs we can use this merged video stream in several ways:
Tile together multiple video streams to be displayed on a larger display. For example, stitching multiple images into a 4K display.
Blend together multiple image streams as vertical layers to create one final image. For example, adding an overlay or performing sensor fusion.
To do this within our Zynq SoC or Zynq UltraScale+ MPSoC system, we use the Video Mixer IP core, which comes with the Vivado IP library. This IP core mixes as many as eight image streams plus a final logo layer. The image streams are provided to the core via AXI Streaming or AXI memory-mapped inputs. You can select which one on a stream-by-stream basis. The IP Core’s merged-video output uses an AXI Stream.
To give a demonstration of the how we can use the video mixer, I am going to update the MiniZed FLIR Lepton project to use the 10-inch touch display and merge a second video stream using a TPG. Using the 10-inch touch display gives me a larger screen to demonstrate the concept. This screen has been sitting in my office for a while now so it’s time it became useful.
Upgrading to the 10-inch display is easy. All we need to do in the Vivado design is increase the pixel clock frequency (fabric clock 2) from 33.33MHz to 71.1MHz. Along with adjusting the clock frequency, we also need to set the ALI3 controller block to 71.1MHz.
Now include a video mixer within the MiniZed Vivado design. Enable layer one and select a streaming interface with global alpha control enabled. Enabling a layer’s global alpha control allows the video mixer to blend the alpha on a pixel-by-pixel basis. This setting allows pixels to be merged according to the defined alpha value rather than just over riding the pixel on the layer beneath. The alpha value for each layer ranges between 0 (transparent) and 1 (opaque). Each layer’s alpha value is defined within an 8-bit register.
Insertion of the Video Mixer and Video Test Pattern Generator
Enabling layer 1, for AXI streaming and Global Alpha Blending
The FLIR camera provides the first image stream. However we need a second image stream for this example, so we’ll instantiate a video TPG core and connect its output to the video mixer’s layer 1 input. For the video mixer and test pattern generator, be sure to use the high-speed video clock used in the image-processing chain. Build the design and export it to SDK.
We use the API xv_mix.h to configure the video mixer in SDK. This API provides the functions needed to control the video mixer.
The principle of the mixer is simple. There is a master layer and you declare the vertical and horizontal size of this layer using the API. For this example using the 10-inch display, we set the size to 1280 pixels by 800 lines. We can then fill this image space using the layers, either tiling or overlapping them as desired for our application.
Each layer has an alpha register to control blending along with X and Y origin registers and height and width registers. These registers tell the mixer how it should create the final image. Positional location for a layer that does not fill the entire display area is referenced from the top left of the display. Here’s an illustration:
Video Mixing Layers, concept. Layer 7 is a reduced-size image in this example.
To demonstrate the effects of layering in action, I used the test pattern generator to create a 200x200-pixel checkerboard pattern with the video mixer’s TPG layer alpha set to opaque so that it overrides the FLIR Image. Here’s what that looks like:
Test Pattern FLIR & Test Pattern Generator Layers merged, test pattern has higher alpha.
Then I set the alpha to a lower value, enabling merging of the two layers:
Test Pattern FLIR & Generator Layers merged, test pattern alpha lower.
We can also use the video mixer to tile images as shown below. I added three more TPGs to create this image.
Four tiled video streams using the mixer
The video mixer is a good tool to have in our toolbox when creating image-processing or display solutions. It is very useful if we want to merge the outputs of multiple cameras working in different parts of the electromagnetic spectrum. We’ll look at this sort of thing in future blogs.
You can find the example source code on the GitHub.
Programmable logic is proving to be an excellent, flexible implementation medium for neural networks that gets faster and faster as you go from floating-point to fixed-point representation—making it ideal for embedded AI and machine-learning applications—and the latest proof point is a recently published paper written by Yufeng Hao and Steven Quigley in the Department of Electronic, Electrical and Systems Engineering at the University of Birmingham, UK. The paper is titled “The implementation of a Deep Recurrent Neural Network Language Model on a Xilinx FPGA” and it describes a successful implementation and training of a fixed-point Deep Recurrent Neural Network (DRNN) using the Python programming language; the Theano math library and framework for multi-dimensional arrays; the open-source, Python-based PYNQ development environment; the Digilent PYNQ-Z1 dev board; and the Xilinx Zynq Z-7020 SoC on the PYNQ-Z1 board. Using a Python DRNN hardware-acceleration overlay, the two-person team achieved 20GOPS of processing throughput for an NLP (natural language processing) application with this design and outperformed earlier FPGA-based implementation by factors ranging from 2.75x to 70.5x.
Most of the paper discusses NLP and the LM (language model), “which is involved in machine translation, voice search, speech tagging, and speech recognition.” The paper then discusses the implementation of a DRNN LM hardware accelerator using Vivado HLS and Verilog to synthesize a custom overlay for the PYNQ development environment. The resulting accelerator contains five Process Elements (PEs) capable of delivering 20 GOPS in this application. Here’s a block diagram of the design:
DRNN Accelerator Block Diagram
There are plenty of deep technical details embedded in this paper but this one sentence sums up the reason for this blog post about the paper: “More importantly, we showed that a software and hardware joint design and simulation process can be useful in the neural network field.” This statement is doubly true considering that the PYNQ-Z1 dev board sells for $229.
Xilinx has a terrific tool designed to get you from product definition to working hardware quickly. It’s called SDSoC. Digilent has a terrific dev board to get you up and running with the Zynq SoC quickly. It’s the low-cost Arty Z7. A new blog post by Digilent’s Alex Wong titled “Software Defined SoC on Arty Z7-20, Xilinx ZYNQ evaluation board” posted on RS Online’s DesignSpark site gives you a detailed, step-by-step tutorial on using SDSoC with the Digilent Arty S7. In particular, the focus here is on the ease of moving functions from software running on the Zynq SoC’s Arm Cortex-A9 processors to the Zynq SoC’s programmable hardware using Vivado HLS, which is embedded in SDSoC. That’s so that you can get the performance benefit of hardware-based task execution.
As engineers we cannot assume the systems we design will operate as intended 100% of the time. Unexpected events and failures do occur. Depending upon the application, the protections implemented against these unexpected failures vary. For example, a safety-critical system used for an industrial or automotive application requires considerably more failure-mode analysis and much better protection mechanisms than a consumer application. One simple protection mechanism that can be implemented quickly and simply in any application is the use of a watchdog and this blog post describes the three watchdogs in a Zynq UltraScale+ MPSoC.
Watchdogs are intended to protect the processor against the software application crashing and becoming unresponsive. A watchdog is essentially a counter. In normal operation, the application software prevents this counter from reaching its terminal count by resetting it periodically. Should the application software crash and allow the watchdog to reach its terminal count by failing to reset the counter, the watchdog is said to have expired.
Upon expiration, the watchdog generates a processor reset or non-maskable interrupt to enable the system to recover. The watchdog is designed to protect against software failures so it must be implemented physically in silicon. It cannot be a software counter for reasons that should be obvious.
Preventing the watchdog’s expiration can be complicated. We do not want crashes to be masked resulting in the watchdog’s failure to trigger. It is good practice for the application software to restart the watchdog in the main body of application. Using an interrupt service routine to restart the watchdog opens the possibility of the main application crashing but the ISR still being serviced. In that situation, the watchdog will restart without a crash recovery.
The Zynq UltraScale+ MPSoC provides three watchdogs in its processing system (PS):
Full Power Domain (FPD) Watchdog protecting the APU and its interconnect
Low Power Domain (LPD) Watchdog protecting the RPU and its interconnect
Configuration and Security Unit (CSU) Watchdog protecting the CSU and its interconnect
The FPD and LPD watchdogs can be configured to generate a reset, an interrupt, or both should a timeout occur. The CSU can only generate an interrupt, which can then be actioned by the APU, RPU, or the PMU. We use the PMU to manage the effects of a watchdog timeout, configuring it via to act via its global registers.
The FPD and LPD watchdogs are clocked either from an internal 100MHz clock or from an external source connected to the MIO or EMIO. The FPD and LPD watchdogs can output a reset signal via MIO or EMIO. This is helpful if we wish to alert functions in the PL that a watchdog timeout has occurred.
Each watchdog is controlled by four registers:
Zero Mode Register – This control register enables the watchdog and enables generation of reset and interrupt signals along with the ability to define the reset and interrupt pulse duration.
Counter Control Register – This counter configuration register sets the counter reset value and clock pre-scaler.
Restart Register – This write-only register takes a specific key to restart the watchdog.
Status Register – This read-only register indicates if the watchdog has expired.
To ensure that writes to the Zero Mode, Counter Control, and Restart registers are intentional and not the result of an incorrect software operation, write accesses to these registers require that specific keys, different for each register, must be included in the written data word for each write take effect.
Zynq UltraScale+ MPSoC Timer and Watchdog Architecture
To include the FPD or LPD watchdogs in our design, we need to enable them in Vivado. You do so using the I/O configuration tab of the MPSoC customization dialog.
Enabling the SWDT (System Watchdog Timer) in the design
For my example in this blog post, I enabled the external resets and connected them to an ILA within the PL so that I can capture the reset signal when it’s generated.
Simple Test Design for the Zynq UltraScale+ MPSoC watchdog
To configure and use the watchdog, we use SDK and the API defined in xwdtps.h. This API allows us to configure, initialize, start, restart, and stop the selected watchdog with ease.
To use the watchdog to its fullest extent, we also need to configure the PMU to respond to the watchdog error. This is simple and requires writes to the PMU Error Registers (ERROR_SRST_EN_1 and ERROR_EN_1) enabling the PMU to respond to watchdog timeout. This will also cause the PS to assert its Error OUT signal and will result in LED D3 illuminating on the UltraZed SOM when the timeout occurs.
For this example, I also used the PMU persistent global registers, which are cleared only by a power-on reset, to keep a record of the fact that a watchdog event has occurred. This count increments each time a watchdog timeout occurs. After the example design has reset 6 times, the code finally begins to restart the watchdog and stays alive.
Because the watchdog causes a reset event each time the processor reboots, we must take care to clear the previously recorded error. Failure to do so will result in a permanent reset cycle because the timeout error is only cleared by a power-on reset or a software action that clears the error. To prevent this from happening, we need to clear the watchdog fault indication in the PMU’s global ERROR_STATUS_1 register at the start of the program.
Reset signal being issued to the programmable logic
Observing the terminal output after I created my example application, it was obvious that the timeout was occurring and the number of occurrences was incrementing. The screen shot below shows the initial error status register and the error mask register. The error status register is shown twice. The first instance shows the watchdog error in the system and the second confirms that it has been cleared. The reset reason also indicates the cause of the reset. In this case it’s a PS reset.
Looping Watchdog and incrementing count
We have touched lightly on the PMU’s error-handling capabilities. We will explore these capabilities more in a future blog.
Meanwhile, you can find the example source code on the GitHub.
If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.
National Instruments’ (NI’s) PXI FlexRIO modular instrumentation product line has been going strong for more than a decade and the company has just revamped its high-speed PXI analog digitizers and PXI digital signal processors by upgrading to high-performance Xilinx Kintex UltraScale FPGAs. According to NI’s press release, “With Kintex UltraScale FPGAs, the new FlexRIO architecture offers more programmable resources than previous Kintex-7-based FlexRIO modules. In addition, the new mezzanine architecture fits both the I/O module and the FPGA back end within a single, integrated 3U PXI module. For high-speed communication with other modules in the chassis, these new FlexRIO modules feature PCIe Gen 3 x8 connectivity for up to 7 GB/s of streaming bandwidth.” (Note that all Kintex UltraScale FPGAs incorporate hardened PCIe Gen1/2/3 cores.)
The change was prompted by a tectonic shift in analog converter interface technology—away from parallel LVDS converter interfaces and towards newer, high-speed serial protocols like JESD204B. As a result, NI’s engineers re-architected their PXI module architecture by splitting it into interface-compatible Mezzanine I/O modules and a trio of back-end FPGA carrier/PCIe-interface cards—NI calls them “FPGA back ends”—based on three pin-compatible Kintex UltraScale FPGAs: the KU035, KU040, and KU060 Kintex UltraScale devices. These devices allow NI to offer three different FPGA resource levels with the same pcb design.
Modular NI PXI FlexRIO Module based on Xilinx Kintex UltraScale FPGAs
The new products in the revamped FlexRIO product line include:
Coprocessor Modules – Kintex UltraScale PXI FlexRIO Coprocessor Modules add real-time signal processing capabilities to a system. A chassis full of these modules delivers high-density computational resources—the most computational density ever offered by NI.
Module Development Kit – You can use NI’s LabVIEW to program these PXI modules and you can using use the Xilinx Vivado Project Export feature included with LabVIEW FPGA 2017 to develop, simulate, and compile custom I/O modules for more performance or to meet unique application requirements. (More details available from NI here.)
Here’s a diagram showing you the flow for LabVIEW 2017’s Xilinx Vivado Project Export capability:
NI’s use of three Kintex UltraScale FPGA family members to develop these new FlexRIO products illustrate several important benefits associated with FPGA-based design.
First, NI has significantly future-proofed its modular PXIe FlexRIO design by incorporating the flexible, programmable I/O capabilities of Xilinx FPGAs. JESD204B is an increasingly common analog interface standard, easily handled by the Kintex UltraScale FPGAs. In addition, those FPGA I/O pins driving the FlexRIO Mezzanine interface connector are bulletproof, fully programmable, 16.3Gbps GTH/GTY ports that can accommodate a very wide range of high-speed interfaces—nearly anything that NI’s engineering teams might dream up in years to come.
Second, NI is able to offer three different resource levels based on three pin-compatible Kintex UltraScale FPGAs using the same pcb design. This pin compatibility is not accidental. It’s deliberate. Xilinx engineers labored to achieve this compatibility just so that system companies like NI could benefit. NI recognized and took advantage of this pre-planned compatibility. (You'll find that same attention to detail in every one of Xilinx's All Programmable device families.)
Third, NI is able to leverage the same Kintex UltraScale FPGA architecture for its high-speed Digitizer Modules and for its Coprocessor Modules, rather than using two entirely dissimilar chips—one for I/O control in the digitizers and one for the programmable computing engine Coprocessor Modules. The same programmable Kintex UltraScale FPGA architecture suits both applications well. The benefit here is the ability to develop common drivers and other common elements for both types of FlexRIO module.
For more information about these new NI FlexRIO products, please contact NI directly.
Note: Xcell Daily has covered the high-speed JESD204B interface repeatedly in the past. See:
The ability to cancel interfering noise without a reference signal. (Competing solutions focus on AEC—acoustic echo cancellation—which cancels noise relative to a required audio reference channel.)
Support for non-uniform 1D and 2D microphone array spacing.
Scales up with more microphones for noisier environments.
Offers a one-chip solution for sound capture, multiple wake words, and customer applications. (Today this is a two-chip solution.)
Makes everything available in a “software-ready” environment: Just log in to the Ubuntu linux environment and use Aaware’s streaming audio API to begin application development.
Aaware’s Far-Field Development Platform
These features are layered on top of a Xilinx Zynq SoC or Zynq UltraScale+ MPSoC and Aaware’s CTO Chris Eddington feels that the Zynq devices provide “well over” 10x the performance of an embedded processor thanks to the devices’ on-chip programmable logic, which offloads a significant amount of processing from the on-chip ARM Cortex processor(s). (Aaware can squeeze its technology into a single-core Zynq Z-7007S SoC and can scale up to larger Zynq SoC and Zynq UltraScale+ MPSoC devices as needed by the customer application.)
Aaware’s algorithm development is based on a unique tool chain:
Algorithm development in MathWork’s MATLAB.
Hand-coding of an equivalent application in C++.
Initial hardware-accelerator synthesis from the C++ specification using Vivado HLS.
Use of Xilinx SDSoC to connect the hardware accelerators to the AXI bus and memory.
This tool chain allows Aaware to fit the features it wants into the smallest Zynq Z-7007S SoC or to scale up to the largest Zynq UltraScale+ MPSoC.
In all of our previous MicroBlaze soft processor examples, we have used JTAG to download and execute the application code via SDK. You can’t do that in the real world. To deploy a real system, we need the MicroBlaze processor to load and boot its application code from non-volatile memory without our intervention. I thought that showing you how to do this would make for a pretty interesting blog. So using the $99 Digilent Arty A7 board, which is based on a Xilinx Artix-7 FPGA, I am going to demonstrate the steps you need to take using the Arty’s on-board QSPI Flash memory. We will store both the bitstream configuration file and the application software in the QSPI flash.
The QSPI therefore has two roles:
Configure the Artix FPGA
Store the application software
For the first role, we do not need to include a QSPI interface in our Vivado design. All we need to do is update the Vivado configuration settings to QSPI, provided the QSPI flash memory is connected to the FPGA’s configuration pins. However, we need to include a QSPI interface in our design to interface with the QSPI Flash memory once the FPGA is configured and the MicroBlaze soft processor is instantiated. This addition allows a bootloader to copy the application software from the QSPI Flash memory to the Arty’s DDR SDRAM where it actually executes.
Of course, this raises the question:
Where does the MicroBlaze bootloader come from?
The process flowchart for developing the bootloader looks like this:
Development Flow Chart
Our objective is to create a single MCS image containing both the FPGA bitstream and the application software that we can burn into the QSPI Flash memory. To so this we need to perform the following steps in Vivado and SDK:
Include a QSPI Interface within the existing Vivado MicroBlaze design
Edit the device settings in Vivado to configure the device using Master SPI_4 and to compress the bit file, then to build and export the application to SDK
In SDK, create a new application project based on the exported hardware design. During the project-creation dialog, select the SREC SPI Bootloader template. This selection creates an SREC bootloader application that will load the main application code from QSPI Flash memory. Before we build the bootloader ELF, we first need to define the address offset from the base address of the QSPI to the location of the application software. In this case it is 0x600000. We define this offset within blconfig.h. We also need to update the SREC Bootloader BSP to identify the correct serial Flash memory device family. To do this, reconfigure the BSP. The family identification number to use is defined within xilisf.h, available under BSP libsrc. For this application, we select type 5 because the Arty board uses a Micron QSPI device, which is type 5.
Now create a second application project In SDK. This is the application we are going to load using the bootloader. For this application, I created a simple “hello world” application, ensuring in the linker file that this program will run from DDR SDRAM. To create the single MCS file, we need the application software to be in the S-record format. This format stores binary information in an ASCII format. (This format is now 40 years old and was originally developed for the 8-bit Motorola 6800 microprocessor.) We can use SDK to convert the generated ELF into S-record format. To generate the S-record file in SDK, open a bash shell and enter the following command in the directory that contains the application ELF:
cmd /c mb-objcopy -O srec <app>.elf <app>.srec
With the bootloader ELF created, we now need to merge the bitstream with the bootloader ELF in Vivado. This step allows the bootloader to be loaded into and run from the MicroBlaze processor’s local memory following configuration. Because this memory is small, the bootloader application must also be small. If you are experiencing problems reducing the size of the software application, consider using a compiler optimisation before increasing the local memory size.
With the merged bit file created and the S-record file available, use Vivado’s hardware manager to add the configuration memory:
The final step is to generate the unified MCS file containing the merged bitstream and the application software. When generating this file, we need to remember to load the application software using the same offset used in the SREC bootloader.
Once the file is built and burned into the QSPI memory, we can test to see that the MCS file works by connecting the Arty board to a terminal and pressing the board’s reset button. After a few seconds you should see the Arty board’s “done” LED illuminate and then you should see the results of the SREC bootloader appear in the terminal window. This report should show that the S records have loaded from QSPI memory into to DDR SDRAM before the program executes.
We now have a working MicroBlaze system that we can deploy in our designs.
The free, Web-based airhdl register file generator from noasic GmbH, an FPGA design and coaching consultancy and EDA tool developer, uses a simple, online definition tool to create register definitions from which the tool then automatically generates HDL, a C header file, and HTML documentation. The company’s CEO Guy Eschemann has been working with FPGAs for more than 15 years, so he’s got considerable experience in the need to create bulletproof register definitions to achieve design success. His company noasic is a member of the Xilinx Alliance Program.
What’s the big deal about creating registers? Many complex FPGA-based designs now require hundreds or even thousands of registers to operate and monitor a system and keeping these register definitions straight and properly documented, especially within the context of engineering changes, is a tough challenge for any design team.
“The hardware/firmware interface is the junction where hardware and firmware meet and communicate with each other. On the hardware side, it is a collection of addressable registers that are accessible to firmware via reads and writes. This includes the interrupts that notify firmware of events. On the firmware side, it is the device drivers or the low-level software that controls the hardware by writing values to registers, interprets the information read from the registers, and responds to interrupt requests from the hardware. Of course, there is more to hardware than registers and interrupts, and more to firmware than device drivers, but this is the interface between the two and where engineers on both sides must be concerned to achieve successful integration.”
The airhdl EDA tool from noasic is designed to help your hardware and software/firmware teams “achieve successful integration” by creating a central nexus for defining the critical, register-based hardware/firmware interface. It uses a single register map (with built-in version control) to create the HDL register definitions, the C header file for firmware’s use of those registers, and the HTML documentation that both the hardware and software/firmware teams will need to properly integrate the defined registers into a design.
Here’s an 11-minute video made by noasic to explain the airhdl EDA tool:
Consider signing up for access to this free tool. It will very likely save you a lot of time and effort.
For more information about airhdl, use the links above or contact noasic GmbH directly.
One ongoing area we have been examining is image processing. We’ve look at the algorithms and how to capture images from different sources. A few weeks ago, we looked at the different methods we could use to receive HDMI data and followed up with an example using an external CODEC (P1 & P2). In this blog we are going to look at using internal IP cores to receive HDMI images in conjunction with the Analog Devices AD8195 HDMI buffer, which equalizes the line. Equalization is critical when using long HDMI cables.
Nexys board, FMC HDMI and the Digilent PYNQ-Z1
To do this I will be using the Digilent FMC HDMI card, which provisions one of its channels with an AD8195. The AD8195I on the FMC HDMI card needs a 3v3 supply, which is not available on the ZedBoard unless I break out my soldering iron. Instead, I broke out my Digilent Nexys Video trainer board, which comes fitted with an Artix-7 FPGA and an FMC connector. This board has built-in support for HDMI RX and TX but the HDMI RX path on this board supports only 1m of HDMI cable while the AD8195 on the FMC HDMI card supports cable runs of up to 20m—far more useful in many distributed applications. So we’ll add the FMC HDMI card.
First, I instantiated a MicroBlaze soft microprocessor system in the Nexys Video card’s Artix-7 FPGA to control the simple image-processing chain needed for this example. Of course, you can implement the same approach to the logic design that I outline here using a Xilinx Zynq SoC or Zynq UltraScale+ MPSoC. The Zynq PS simply replaces the MicroBlaze.
The hardware design we need to build this system is:
MicroBlaze controller with local memory, AXI UART, MicroBlaze Interrupt controller, and DDR Memory Interface Generator.
DVI2RGB IP core to receive the HDMI signals and convert them to a parallel video format.
Video Timing Controller, configured for detection.
ILA connected between the VTC and the DVI2RBG cores, used for verification.
Clock Wizard used to generate a 200MHz clock, which supplies the DDR MIG and DVI2RGB cores. All other cores are clocked by the MIG UI clock output.
Two 3-bit GPIO modules. The first module will set the VADJ to 3v3 on the HDMI FMC. The second module enables the ADV8195 and provides the hot-plug detection.
The final step in this hardware build is to map the interface pins from the AD8195 to the FPGA’s I/O pins through the FMC connector. We’ll use the TMDS_33 SelectIO standard for the HDMI clock and data lanes.
Once the hardware is built, we need to write some simple software to perform the following:
Disable the VADJ regulator using pin 2 on the first GPIO port.
Set the desired output voltage on VADJ using pins 0 & 1 on the first GPIO port.
Enable the VADJ regulator using pin 2 on the first GPIO port.
Enable the AD8195 using pin 0 on the second GPIO port.
Enable pre- equalization using pin 1 on the second GPIO port.
Assert the Hot-Plug Detection signal using pin 2 on the second GPIO port.
Read the registers within the VTC to report the modes and status of the video received.
To test this system, I used a Digilent PYNQ-Z1 board to generate different video modes. The first step in verifying that this interface is working is to use the ILA to check that the pixel clock is received and that its DLL is locked, along with generating horizontal and vertical sync signals and the correct pixel values.
Provided the sync signals and pixel clock are present, the VTC will be able to detect and classify the video mode. The application software will then report the detected mode via the terminal window.
ILA Connected to the DVI to RGB core monitoring its output
Software running on the Nexys Video detecting SVGA mode (600 pixels by 800 lines)
With the correct video mode being detected by the VTC, we can now configure a VDMA write channel to move the image from the logic into a DDR frame buffer.
These workshops start in November and run through March of next year and there are four full-day workshops in the series:
Developing Zynq Software
Developing Zynq Hardware
Integrating Sensors on MiniZed with PetaLinux
A Practical Guide to Getting Started with Xilinx SDSoC
You can mix and match the workshops to meet your educational requirements. Here’s how Avnet presents the workshop sequence:
These workshops are taking place in cities all over North America including Austin, Dallas, Chicago, Montreal, Seattle, and San Jose, CA. All cities will host the first two workshops. Montreal and San Jose will host all four workshops.
A schedule for workshops in other countries has yet to be announced. The Web page says “Coming soon” so please contact Avnet directly for more information.
Finally, here’s a 1-minute YouTube video with more information about the workshops
For more information on and to register for the Avnet MiniZed SpeedWay Design Workshops, click here.
The Vivado 2017.3 HLx Editions are now available and the Vivado 2017.3 Release Notes (UG973) tells you why you’ll want to download this latest version now. I’ve scanned UG973, watched the companion 20-minute Quick Take video, and cherry-picked twenty of the many enhancements that jumped out at me to help make your decision easier:
Reduced compilation time with a new incremental compilation capability
Improved heuristics to automatically choose between high-reuse and low-reuse modes for incremental compilation
Verification IP (VIP) now included as part of pre-compiled IP libraries including support for AXI-Stream VIP
Enhanced ability to integrate RTL designs into IP Integrator using drag-and-drop operations. No need to run packager any more.
Checks put in place to ensure that IP is available when invoking write_bd_tcl command
write_project_tcl command now includes Block designs if they are part of the project
Hard 100GE Subsystem awareness for the VCU118 UltraScale+ Board with added assistance support
Hard Interlaken Subsystem awareness for the VCU118 UltraScale+ Board with assistance support
Support added for ZCU106 and VCU118 production reference boards
FMC Support added to ZCU102 and ZCU106 reference boards
Bus skew reports (report_bus_skew) from static timing analysis now available in the Vivado IDE
Enhanced ease of use for debug over PCIe using XVC
Partial Reconfiguration (PR) flow support for all UltraScale+ devices in production
Support for optional flags (FIFO Almost Full, FIFO Almost Empty, Read Data Valid) in XPM FIFO
Support for different read and write widths while using Byte Write Enable in XPM Memory
New Avalon to AXI bridge IP
New USXGMII Subsystem that switches 10M/100M/1G/2.5G/5G/10G on 10GBASE-R 10G GT line rate for NBASE-T standard
New TSN (Time-Sensitive Network) subsystem
Simplified interface for DDR configuration in the Processing Systems Configuration Wizard
Fractional Clock support for DisplayPort Audio and Video to reduce BOM costs