Since starting this blog back in 2013, I get emailed questions from time to time. I definitely try to answer as many as I can and recently received an interesting one that I thought would make for a great example blog discussing what is actually occurring when we work with High-Level Synthesis.
The questioner was working with Vivado HLS 2020.1 and Vitis HLS 2020.1 and struggling to achieve the same operation in both tools. The code itself is simple and leverages the ap_wait_n() function to delay for several clock cycles.
When it comes to developing for Xilinx FPGA and heterogeneous SoCs, the ability to work effectively with Vivado, Vitis, and PYNQ is key. Regardless of our level of experience as developers, it is always good to refresh our skills and keep learning about the tools and devices.
This past summer, I was lucky to present several detailed virtual workshops on Vivado, PYNQ, Vitis, and more. For readers who did not see them the first time, these courses are available to watch on demand and all supporting lab books and materials are posted online.
Not every imaging application is created equal, so it’s no surprise that a single Image Signal Processor (ISP) can’t address every application’s requirements. The requirements for an ISP that is optimized for cellphone selfies are very different from the needs of an ISP designed for machine vision applications or feeding in-depth data to autonomous vehicles. Even within a specific application or use case, technology evolution of core components such as image sensors or AI tools often outpace the life cycle of dedicated ASICs. A key advantage of a programmable MPSoC architecture such as the Xilinx® Zynq® UltraScale+™ MPSoC is the flexibility it provides to adapt quickly to changes in components and customize or “tune” an ISP to a customer’s specific application.
ON Semiconductor’s Intelligent Power Modules (IPM) and Transfer Molded Power Integrated Modules are made for efficient motor control. They are suitable for high voltages and high operating temperatures and allow a reduction of the drive’s real estate. Higher switching frequencies facilitate increased efficiency beyond the capabilities of legacy motor control solutions.
That’s why microcontrollers with algorithms in software can’t optimize new technologies. Fast switching that uses advanced algorithms in dedicated logic controls the rotating field in the motor—maximizing efficiency. With motor control in the Xilinx® Zynq®-7000 SoC, the system gets the right performance by design. The speed of the control loops is always predictable, regardless of what else is running inside of the Xilinx SoC. Arm® cores in the processing system make the unit intelligent and ready for the Industrial Internet of Things (IIoT).
Last week we examined the AXI VIP, which we use when working with memory-mapped AXI or AXI-Lite. This week, we will examine the AXI Stream VIP, which is similar in behavior but has enough key differences to warrant a separate blog.
The AXI Stream VIP is extremely useful when we want to generate signal and image processing IP that uses AXI Stream for interfacing. Using the AXI Stream VIP, we can generate stimulus data and, of course, act as a slave to ensure data is output from the UUT.
Compared to the AXI VIP previously examined, the AXI Stream VIP is much simpler to get up and running, although it does use the same two elements in the static and dynamic environments. Again, the static element is what we create in Vivado, and the dynamic is what we implement and control in the test bench.
The designs we implement in Vivado often use AXI interfaces. These might be AXI Lite for configuration and control, AXI Memory Mapped for high-speed memory mapped transfer, or AXI Stream for high-bandwidth streams.
These interfaces can be complex to verify; ensuring the protocol is implemented correctly for bus transactions can present a challenge on its own. To help verify these interfaces in simulation, Vivado provides us with the AXI Verification IP. This IP can be deployed in the following three configurations:
Generating AXI transactions as the bus master
Responding to AXI transactions as a bus slave
Pass-through; in this mode, it is capable of AXI protocol checking
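To give a feel for the kind of rule a protocol checker enforces, here is a minimal Python sketch of one AXI handshake rule: once VALID is asserted, it must stay asserted until READY completes the transfer. This is only an illustration and not how the verification IP is implemented; the function name and the per-cycle sample format are assumptions made for this example.

```python
from typing import Iterable, Optional, Tuple


def first_valid_drop(samples: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return the cycle at which VALID was dropped before a handshake.

    Each sample is a (valid, ready) pair captured on one rising clock edge.
    Returns None when the trace obeys the rule.
    """
    pending = False  # True while VALID has been seen without a matching READY
    for cycle, (valid, ready) in enumerate(samples):
        if pending and not valid:
            return cycle  # VALID deasserted while a transfer was still pending
        # A transfer completes on any edge where VALID and READY are both high.
        pending = bool(valid) and not bool(ready)
    return None


# Hypothetical traces: the first holds VALID until READY, the second drops it early.
good_trace = [(0, 0), (1, 0), (1, 0), (1, 1), (0, 0)]
bad_trace = [(0, 0), (1, 0), (0, 0), (1, 1)]

print(first_valid_drop(good_trace))
print(first_valid_drop(bad_trace))
```

A real checker covers many more rules (stable payload while waiting, reset behaviour, and so on), which is exactly why having them packaged in verification IP is attractive.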
Using High-Level Synthesis to leverage the power of programmable logic allows us to significantly reduce development times. Of course, we want to leverage existing libraries in order to get the best from HLS developments and avoid having to reinvent the wheel each time.
Last year, I examined SLX FPGA and used it to optimise IP cores for implementation in Vivado, looking at security and industrial algorithms. Of course, things have moved on in the HLS world with the introduction of Vitis last November. I was curious to see how SLX FPGA could be used in a Vitis bottom-up flow. When working in a bottom-up flow, we use Vivado HLS to generate a Xilinx Object (XO), which is then added into Vitis for use later in the Vitis application. Such a bottom-up flow allows us to focus on complex algorithms, verify their performance, and ensure that the optimizations for programmable logic provide the best implementation and are kept close to the algorithm.
Available in a PCIe® form factor, BittWare’s RFX-8440 4-channel digital acquisition card leverages the latest generation Xilinx® Zynq® UltraScale+™ RFSoC Gen 3. The card is designed for both development and deployment with extensive expansion options, and RF-ADC/RF-DAC features to match a range of customer applications such as 5G, LTE wireless, phased array Radar, and satellite communications.
I recently received the new Trenz TE0802 development board, designed and manufactured by Digilent partner Trenz Electronic. This board is interesting because it is the first development board to contain a Zynq UltraScale+ MPSoC device of the CG variety. The CG devices sit within the mid-range of the Xilinx Heterogeneous MPSoC portfolio, with the application processing unit containing dual-core Cortex-A53 processors in place of the quad-core processors provided in the EV and EG devices.
Verifying the security of your system can be difficult. In search of a trusted foundation for a single piece of software, repeated cycles of pen testing and iteration can extend development cycles far beyond what the market is willing to wait for. These tests can provide increased confidence in the security of an application but will also fall short of ever proving it.
Let's face it. Customers aren't willing to risk their data security on a hunch that application security is possible. They want proof. Thankfully, the proof is available.
Last week we examined Xilinx simulation and how we could create test benches for behavioural and post-layout simulation, along with creating the switching activity file that is used to provide a more accurate power estimate.
The test bench we created last week was self-checking because it checked the output value against expected values supplied from a text file. This works well for simple examples like the one presented. However, for more complex examples, we may want to use a different structure.
For complex algorithms, it is often a good idea to first create a model which defines the behaviour. This algorithm could be developed in a high-level language such as C or Python. It is important for some projects to get the customer to sign off and agree to the models before further RTL development.
We can use these models within the test bench to verify the HDL implementation on a cycle-by-cycle basis. This ensures the implemented algorithm works exactly as the model intends.
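As a rough sketch of the idea, the Python below pairs a golden model with a cycle-by-cycle comparison. The running-sum model, the function names, and the sample values are all hypothetical, chosen only to illustrate the structure; in practice, the captured output would come from the simulator rather than a hard-coded list.

```python
from typing import List, Optional, Sequence


def reference_model(samples: Sequence[int], taps: int = 4) -> List[int]:
    """Hypothetical golden model: running sum over the last `taps` inputs."""
    return [sum(samples[max(0, i - taps + 1): i + 1]) for i in range(len(samples))]


def first_mismatch(expected: Sequence[int], actual: Sequence[int]) -> Optional[int]:
    """Return the first cycle where the two traces differ, or None if they agree."""
    for cycle, (exp, act) in enumerate(zip(expected, actual)):
        if exp != act:
            return cycle
    if len(expected) != len(actual):
        return min(len(expected), len(actual))  # one trace ended early
    return None


stimulus = [1, 2, 3, 4, 5, 6]
expected = reference_model(stimulus)

# Stand-in for values captured from the HDL simulation (cycle 4 is wrong on purpose).
hdl_output = [1, 3, 6, 10, 15, 18]

print("expected:", expected)
print("first mismatch at cycle:", first_mismatch(expected, hdl_output))
```

Reporting the first failing cycle, rather than a simple pass or fail, makes it much quicker to find the matching point in the waveform viewer.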
Over the years, streaming media has been gaining a lot of traction, whether for live broadcast of an event or for the use of drone technology in various applications. But achieving low latency becomes challenging when it comes to actual live streaming in real time. Furthermore, delivering 4K resolution video footage with ultra-low latency requires a highly reliable codec mechanism to optimize end-to-end video delivery.
Taking this into account, iWave Systems introduces a Zynq® UltraScale+™ MPSoC single board computer (SBC) integrated with Xilinx Sync IP implemented in programmable logic (PL) that works along with an integrated video codec unit (VCU) to optimize end-to-end latency.
No matter how they are captured (RTL, HLS, Model Driven), all programmable logic designs should start with agreed requirements that define the interfacing and functional performance. Depending upon the target application, the requirements may be significant, defining every aspect of operation and behaviour under failure modes. Alternatively, the requirements may be a cardinal point specification of key performance requirements.
Demonstrating that these requirements have been implemented correctly is often the role of simulation. Simulation enables us to stimulate a programmable logic design and observe its outputs.
So, what is involved in creating a simulation from the beginning? Designing a good simulation requires careful thought as to what is going to be tested and how -- even before we begin to write a line of code.
Xilinx and Movandi have teamed up on an Open-RAN Radio Unit (RU) that the companies are demonstrating at the virtual BIG 5G event September 22-24. This continues to advance open 5G architecture and innovation and, most importantly, accelerates deployments.
Networks have become increasingly complex with the advent of 5G, densification, and richer, more demanding applications. To tame this complexity, we cannot use traditional human-intensive means of deploying, optimizing, and operating a network. Instead, networks must be self-driving and should be able to leverage new learning-based technologies to automate operational network functions and reduce OPEX.
Looking both ways before you cross the street is a good way to avoid a catastrophe. Your brain can almost instantly analyze information from your optic nerve, determine there’s an oncoming car, and stop your leg muscles from moving you in its way.
For the car, the same process has a few more steps.
It takes a massive amount of computing capacity to analyze information from digital cameras, determine size and speed from differences in the contrast of individual pixels, identify a human moving into the path of the vehicle, and synthesize that data into a message to the braking system.
In the end, you just hope the message arrives on time.
A large part of a programmable logic developer’s time is not spent implementing RTL, but verifying RTL functionality and behavior. A few weeks ago, a young engineer asked me about simulation and its role in the development process. I intend to create several blogs focusing on how we can use Vivado XSIM to verify our design. However, verification of RTL can be a wider task than just performing a simulation.
The Vitis™ integrated AI Development environment is Xilinx’s development platform for AI inference on Xilinx hardware platforms, consisting of optimized IP, tools, models, and example designs.
Designed with high efficiency and ease of use in mind, it unleashes the full potential of AI acceleration on Xilinx FPGA and Adaptive Compute Acceleration Platforms (ACAPs). To understand better how to get started with Deep Learning, Doulos, in conjunction with Xilinx, has organized a one-day online training workshop for embedded engineers.
Over the last couple of weeks, we have examined how Vivado can help us identify design issues that might impact the implementation.
However, as all engineers know, the earlier we can find an issue, the easier and cheaper it is to correct (both financially and in time spent). What is even better is to avoid the issue in the first place. This is where using Xilinx UltraFast design methodologies can offer significant benefits when implementing designs.
By following the rules outlined below, we can create a design that encounters fewer issues later on. Of course, the UltraFast design methodology guides provide significant insight and explanation as to why the rules exist. However, five of the most critical are summarized below.
More than a decade ago, we anticipated the pervasiveness of PCI Express® and began offering integrated blocks for it in our devices. Over the years, we have refined our integrated block offering with each new Xilinx architecture. We now see PCI Express applied in nearly all our developers’ markets.
The Versal™ architecture continues to offer a Programmable Logic Integrated Block for PCI Express (PL PCIE) further improved from that available in prior architectures and adds an Integrated Block for PCI Express® with DMA and Cache Coherent Interconnect (CPM). Architecturally, CPM is one component of the Versal architecture integrated shell, the whole of which is timing closed and resides “outside” the programmable logic (PL). The illustration below shows where CPM resides, with PL PCIE included for context:
The Xilinx® UltraScale+™ FPGAs and SoCs with GTY transceivers can support a PCI Express® Gen4 interface. Design Gateway’s NVMe Host Controller IP core is designed to leverage the GTY transceivers to support the latest NVMe SSD drive PCIe Gen4 technology. The IP core is implemented on Xilinx’s Virtex® UltraScale+ FPGA VCU118 Evaluation Kit and is able to achieve incredibly fast read/write performance of more than 4 GB/s.
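To put the 4 GB/s figure in context, the short calculation below estimates the payload bandwidth of a PCIe link from the standard line rates (8 GT/s for Gen3, 16 GT/s for Gen4) and 128b/130b encoding. The x4 link width is an assumption, chosen because it is typical for NVMe SSDs; the source does not state the width used, and the result ignores packet and protocol overheads.

```python
def pcie_bandwidth_gbytes(gt_per_s: float, lanes: int) -> float:
    """Approximate link bandwidth in GB/s for 128b/130b-encoded PCIe (Gen3 and Gen4)."""
    payload_bits_per_s = gt_per_s * 1e9 * 128 / 130  # strip line-encoding overhead
    return payload_bits_per_s / 8 * lanes / 1e9


LANES = 4  # assumed link width, typical for NVMe SSDs

gen3_x4 = pcie_bandwidth_gbytes(8.0, LANES)
gen4_x4 = pcie_bandwidth_gbytes(16.0, LANES)

print(f"Gen3 x{LANES}: {gen3_x4:.2f} GB/s")
print(f"Gen4 x{LANES}: {gen4_x4:.2f} GB/s")
```

Under these assumptions, a Gen3 x4 link tops out just under 4 GB/s before protocol overhead, so sustained transfers above 4 GB/s are only possible with the doubled line rate of Gen4.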
One of the most time-consuming elements of implementing an FPGA design is often not the design capture, but achieving the necessary timing performance. To achieve the required timing closure, we may need to adjust the design by inserting pipeline stages and using constraints to correctly define the clocks, their relationship, and even the location of logic elements.
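To see why inserting a pipeline stage helps, consider a deliberately simplified setup slack calculation. The sketch below ignores clock skew, clock uncertainty, and routing variation, and every delay value in it is a made-up number used purely for illustration.

```python
def setup_slack_ns(period_ns: float, logic_delay_ns: float,
                   clk_to_q_ns: float = 0.5, setup_ns: float = 0.1) -> float:
    """Simplified setup slack: clock period minus the register-to-register path delay."""
    return period_ns - (clk_to_q_ns + logic_delay_ns + setup_ns)


period = 5.0  # 200 MHz clock (illustrative)

# One long combinational path of 6 ns between two registers.
slack_before = setup_slack_ns(period, 6.0)

# The same logic split evenly by one pipeline register: each half is 3 ns.
slack_after = setup_slack_ns(period, 3.0)

print(f"slack before pipelining: {slack_before:+.2f} ns")
print(f"slack after pipelining:  {slack_after:+.2f} ns")
```

The negative slack becomes positive at the cost of one extra clock cycle of latency. In a real design the split is rarely this even, which is why the timing reports are needed to guide where a register should go.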
Vivado has many capabilities which can be used to help us understand our design’s implementation as well as the challenges it can present in implementation.
One of these tools is the Design Analysis Report which enables the user to understand the design challenges (e.g. congestion) and make changes to the design or constraints.
Every day there are new devices appearing in homes, offices, hospitals, factories, and thousands of other places that are part of the Internet-of-Things (IoT). They need to be connected to the Internet, and there is a need for a huge amount of raw data to be collected, stored, and processed on the cloud.
Many data centers are available to store the data. However, only some provide features specifically for IoT applications. One of the most complete cloud-based IoT services available is Amazon Web Services (AWS) IoT Greengrass. It enables edge devices to act locally on the data that networked devices generate and provides secure bi-directional communication between the IoT devices and the AWS cloud for management, analytics, and storage.
By using AWS IoT Greengrass, devices can keep the data in sync even when not connected to the Internet. Also, they can run AWS Lambda functions and make predictions based on machine learning models provided by AWS.
In the last blog, we installed the Alveo U200 accelerator card and ensured the card could be correctly validated. In this blog, we are going to look at what remains to be done to run the first executable program on the Alveo card itself.
First, let us do a little recap. In the previous blog, we installed the XRT and the deployment target platform. Once installed, these are available under the /opt/Xilinx/XRT and /opt/Xilinx/DSA directories in the file system. We used some of the available commands to update and validate the Alveo card.
To get started creating our own application on the Alveo card, we need to also install the development target platform. This will also be in the /opt directory once installed and will provide the acceleration platform.
seL4 is a formally verified microkernel that was built with security and performance in mind. It is a very attractive software solution for projects that have rigorous security and/or safety requirements. While seL4 is a great option, one of the drawbacks is that there is less built around it than other, more mature solutions, such as a traditional OS. For example, seL4 doesn’t work on that many platforms, and it doesn’t have many drivers developed.
DornerWorks is trying to mitigate this by building up the ecosystem around seL4 and giving developers the tools they need to use seL4 for their systems. One way we are building up this ecosystem is by porting seL4 to more platforms that can really take advantage of its benefits.
In most of the blogs I write, I demonstrate or explain an FPGA or SoC design technique. This one, however, is going to be a little different because I am going to ask a question.
How do you architect your programmable logic designs?
The question popped to mind as I was working on the architecture for three FPGAs as part of a satellite development. Of course, due to the end application, this architecture will be subjected to several reviews by both the prime contractor and appropriate space agency. Therefore, I want the architecture to show as much detail as possible without making the drawings difficult to maintain so that my design team can work from it with ease.
I get asked this question a lot in my interactions with customers. Okay, maybe not phrased exactly like that, but more or less along the same lines: “Why should I move to Versal™ ACAP and is now the right time to do so?” It’s a great question, and the answer is...
This is a follow-on to last week’s demonstration of developing embedded software using Vitis 2020.1.
In this week’s blog, we are going to look at how we can debug applications running on the processing system using Vitis. To be able to do this, we need a JTAG debugger because the MicroZed does not have on-board JTAG USB capability like some other boards.
To get started debugging the MicroZed, the Digilent JTAG HS2 / HS3 programming cables are the most cost-effective option. Of course, for larger developments, you might want to consider the SmartLynq from Xilinx.
Traditionally, the RF and digital design worlds were separate. That has changed now that the Xilinx® Zynq® UltraScale+™ RFSoC has brought them together in one device. The Zynq® UltraScale+™ RFSoC revolutionized RF and wireless systems by introducing a programmable device with integrated RF data converters. MATLAB® and Simulink® help engineers work across both RF and digital domains, get the most out of their RF hardware, and save time and effort in developing and deploying their wireless processing algorithms on highly integrated devices like the Xilinx Zynq UltraScale+ RFSoC.
In the last blog, we looked at how we could create a MicroZed project in Vivado 2020.1, just like we did when we started the blog nearly 7 years ago. In the previous blog, we had just created the hardware build in Vivado. In this blog, we are going to create a simple hello world program using Vitis.
The first thing we need to do in Vivado is to export the XSA. This is a compressed file containing several elements that enable Vitis to create software applications. Within an XSA file, you will find information on the address map, the IP included, and, of course, the bit file, although the bit file is not necessary to get started with the software development.
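If you are curious about what an exported XSA contains, a few lines of Python can list its members. This sketch assumes the XSA is a standard zip container; the helper name is my own, and the demo builds a stand-in archive with placeholder member names so that it runs without a real export.

```python
import os
import tempfile
import zipfile
from typing import List


def list_xsa_contents(xsa_path: str) -> List[str]:
    """Return the sorted member names of an XSA, assuming it is a zip archive."""
    with zipfile.ZipFile(xsa_path) as archive:
        return sorted(archive.namelist())


# Demo with a stand-in archive; these member names are placeholders, not a real export.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_wrapper.xsa")
    with zipfile.ZipFile(demo_path, "w") as demo:
        demo.writestr("demo_wrapper.bit", b"")
        demo.writestr("demo.hwh", "<hardware/>")
    demo_members = list_xsa_contents(demo_path)

print(demo_members)
```

Pointing `list_xsa_contents` at the path of your own exported file is a quick way to confirm whether the bit file was included in the export.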