The ability to work with OpenCL at higher levels of abstraction is increasingly important for FPGA developers. We can use OpenCL to develop applications for heterogeneous SoCs and acceleration cards, and to create HLS IP blocks.
I wanted to write a few blogs on OpenCL to further highlight its increasing importance. For this blog, I thought featuring an Alveo U50 acceleration card would let me focus on OpenCL itself as well as the resulting acceleration.
Of course, we must first get the U50 installed in my Linux desktop and up and running. Once the board is physically connected, the first step is to boot the system and log into Linux.
As we enter the era of heterogeneous compute, where different processing engines in a single application take us to that next level of performance and efficiency, debug and trace tools have to evolve to keep up with devices.
Today, Xilinx adds a new product to its programming, debug, and trace module portfolio. The SmartLynq+ module is a high-speed debug and trace module primarily targeting designs using the Versal™ platform. It drastically improves configuration and trace speed. The SmartLynq+ module provides up to 28X faster Linux download time via the high-speed debug port (HSDP) than the SmartLynq data cable. For trace capture, the SmartLynq+ module is capable of speeds up to 10Gb/s via HSDP. That's 100X faster than standard JTAG! More rapid iterations and repetitive downloads increase development productivity and reduce the design cycle. This means that you no longer need to spend your precious time on debugging and instead can focus on the launch of your Versal based solutions.
In last week's blog, we examined how to install and use the open-source simulator GHDL. This week, we will expand the use of GHDL with an open-source verification framework for VHDL called UVVM.
UVVM, or Universal VHDL Verification Methodology, is a free and open-source verification framework created by the Norwegian company Bitvis.
Funded in part by the European Space Agency (ESA), UVVM is one of the most powerful VHDL verification frameworks. Along with the framework for creating test benches, which includes scoreboards, alerts, logging, checking, transactions, and more, UVVM also provides several VHDL Verification Components (VVCs). These components give users a fast and easy way to implement standard interfaces like the ones listed below.
To kick off this new year of blogs, I want to build on the verification blogs previously started and examine how we can use VHDL frameworks such as UVVM. UVVM requires VHDL-2008 constructs not currently supported by the Vivado Simulator, so this leaves us with the choice of using either a commercial simulator such as ModelSim or an open-source simulator such as GHDL.
As such, in this blog we are going to look at how to install GHDL and use it to simulate our VHDL designs. GHDL has been around for several years, first released in 2002, and provides VHDL analysis, compilation, and simulation. What makes GHDL different from other simulators is that it compiles the VHDL into machine code using one of several backends, such as GCC, LLVM, or its own internal mcode compiler. GHDL runs on Linux, Windows, and macOS. I will be using a Windows installation for this example.
Xilinx is introducing two Versal™ ACAP evaluation kits: Versal AI Core series VCK190 evaluation kit and Versal Prime series VMK180 evaluation kit. Both the VCK190 and VMK180 evaluation kits come with the same set of accessories and have similar onboard interfaces. The main difference between the kits is...
With every major software release come new capabilities. For example, Vitis and Vivado 2020.2 introduced support for Versal ACAP, and Vitis HLS became the default HLS compiler for both Vivado and Vitis.
One thing you might have missed in the 2020.2 release of Vitis HLS is that Xilinx opened the LLVM Intermediate Representation (IR) layer of Vitis HLS to partners. This allows partners to make additional optimization pragmas available to developers, resulting in a better quality of results.
If you have followed my blogs, courses, and/or webinars, you will know that I do a lot of high-reliability design. In fact, my first-ever article for Xilinx was on mission-critical design over 10 years ago in the Xcell Journal. This year, I have talked a lot about high-reliability design and ran a webinar on high-reliability design a few weeks ago. In addition, one of my recent Hackster projects was demonstrating how we can simulate single event upsets and the impacts on our FSMs, should they occur.
In all of these, I mentioned how we can design state machines that do not lock up if subjected to a single event upset. With the proliferation of programmable logic into automotive and other high-reliability applications, I thought I would examine how we can implement safe state machines using Vivado 2020.2 and Xilinx synthesis.
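Before looking at the Vivado flow itself, the recovery behaviour we are after can be sketched in plain Python (a conceptual model only, not Vivado syntax; the state names and encodings here are invented for illustration):

```python
from enum import Enum

class State(Enum):
    """One-hot style encoding: a single event upset can flip a bit
    and produce a value outside this set."""
    IDLE = 0b001
    RUN  = 0b010
    DONE = 0b100

TRANSITIONS = {State.IDLE: State.RUN,
               State.RUN:  State.DONE,
               State.DONE: State.IDLE}

def next_state(current):
    """Model of a 'safe' state machine: any state value outside the
    defined encoding (e.g. after an upset) recovers to IDLE instead
    of locking up in an unreachable state."""
    if current not in TRANSITIONS:   # illegal encoding detected
        return State.IDLE            # recovery state
    return TRANSITIONS[current]
```

In VHDL, the same idea is the `when others =>` recovery branch of the state-machine case statement, which Vivado synthesis can be directed to preserve (for example, via its `FSM_SAFE_STATE` synthesis attribute).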
The Xilinx® Vivado® Design Suite’s IP integrator provides an excellent way to build custom designs from Xilinx or custom IP using block designs. Users can place these block designs into a hierarchical block, which can then form part of an overall hierarchical design. This is a great way to build complex hierarchical designs in a project.
But what if you’d like to use one of these great block designs that you’ve created in a different Vivado project? These block designs cannot simply be copied into another project and used in a block design within that project.
I recently learned of a simple method to use a block design from one project in a block design of a separate Vivado project.
I recently came across a link to the Trenz ZynqBerry Zero. This is a Zynq Z-7010 device in the same form factor as the Raspberry Pi Zero. I liked the small size and its interfaces looked interesting so I ordered one from Trenz in order to take a closer look.
At first glance, I was surprised at just how small the board is. It measures less than 3 cm by 6.5 cm, and into that space it packs the following:
Xilinx is hosting a new series of virtual technology events called Xilinx Adapt.
The five-part series runs from Nov 2020 through Mar 2021 and presents Xilinx products, technology, and solutions, together with insights from partners and industry leaders. It’s an opportunity to hear about our latest products and technologies and to learn from industry luminaries.
Each segment-focused event includes several sessions spanning 2 or 3 days, ~3 hours per day. Pick your preferred sessions, tune in for a full event, or attend the complete series! Missed something that interests you? Don’t worry – you can register and view on demand until April 30, 2021.
Since starting this blog back in 2013, I have received emailed questions from time to time. I try to answer as many as I can, and I recently received an interesting one that I thought would make a great example blog discussing what is actually occurring when we work with High-Level Synthesis.
The questioner was working with Vivado HLS 2020.1 and Vitis HLS 2020.1 and struggling to achieve the same operation in both tools. The code itself is simple and leverages the ap_wait_n() command to delay for several clock cycles.
When it comes to developing for Xilinx FPGA and heterogeneous SoCs, the ability to work effectively with Vivado, Vitis, and PYNQ is key. Regardless of our level of experience as developers, it is always good to refresh our skills and keep learning about the tools and devices.
This past summer, I was lucky to present several detailed virtual workshops on Vivado, PYNQ, Vitis, and more. For readers who did not see them the first time, these courses are available to watch on demand and all supporting lab books and materials are posted online.
Not every imaging application is created equal, so it’s no surprise that a single Image Signal Processor (ISP) can’t address every application’s requirements. The requirements for an ISP that is optimized for cellphone selfies are very different from the needs of an ISP designed for machine vision applications or feeding in-depth data to autonomous vehicles. Even within a specific application or use case, technology evolution of core components such as image sensors or AI tools often outpaces the life cycle of dedicated ASICs. A key advantage of a programmable MPSoC architecture such as the Xilinx® Zynq® UltraScale+™ MPSoC is the flexibility it provides to adapt quickly to changes in components and customize or “tune” an ISP to a customer’s specific application.
ON Semiconductor’s Intelligent Power Modules (IPM) and Transfer Molded Power Integrated Modules are made for efficient motor control. They are suitable for high voltages and high operating temperatures and allow a reduction of the drive’s real estate. Higher switching frequencies facilitate increased efficiency beyond the capabilities of legacy motor control solutions.
This is why microcontrollers running algorithms in software can’t fully exploit these new technologies. Fast switching that uses advanced algorithms in dedicated logic controls the rotating field in the motor—maximizing efficiency. With motor control in the Xilinx® Zynq®-7000 SoC, the system gets the right performance by design. The speed of the control loops is always predictable, regardless of what else is running inside of the Xilinx SoC. Arm® cores in the processing system make the unit intelligent and ready for the Industrial Internet of Things (IIoT).
Last week we examined the AXI VIP, which we use when working with memory-mapped AXI or AXI-Lite. This week, we will examine the AXI Stream VIP, which is similar in behavior but has enough key differences to warrant a separate blog.
The AXI Stream VIP is extremely useful when we want to develop signal and image processing IP that uses AXI Stream interfaces. Using the VIP, we can generate stimulus data and, of course, act as a slave to verify the data output from the UUT.
Compared to the AXI VIP previously examined, the AXI Stream VIP is much simpler to get up and running, although it does use the same two elements in the static and dynamic environments. Again, the static element is what we create in Vivado, and the dynamic is what we implement and control in the test bench.
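The handshake at the heart of every AXI Stream transfer can be sketched in a few lines of Python (a behavioural model only, invented for illustration; it is not the VIP API, which is driven from the test bench):

```python
def stream_transfer(beats, ready_pattern):
    """Behavioural model of the AXI-Stream handshake: a beat moves
    from master to slave only on cycles where tvalid and tready are
    both asserted, and the master holds each beat until accepted."""
    received = []
    idx = 0
    for tready in ready_pattern:
        tvalid = idx < len(beats)   # master asserts tvalid while data remains
        if tvalid and tready:       # handshake completes: one beat transfers
            received.append(beats[idx])
            idx += 1
    return received
```

Back-pressure from the slave (tready low) simply stalls the master: `stream_transfer([1, 2, 3], [True, False, True, True])` still delivers all three beats, just one cycle later.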
The designs we implement in Vivado often use AXI interfaces. These might be AXI Lite for configuration and control, AXI Memory Mapped for high-speed memory mapped transfer, or AXI Stream for high-bandwidth streams.
These interfaces can be complex to verify; ensuring the protocol is implemented correctly for bus transactions can present a challenge on its own. To help verify these interfaces in simulation, Vivado provides us with the AXI Verification IP (VIP). This IP can be deployed in the following three configurations:
Generating AXI transactions as the bus master
Responding to AXI transactions as a bus slave
Passing transactions through while performing AXI protocol checking
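The three roles can be pictured with a small Python model (purely illustrative and invented here; the real VIP is configured in the Vivado block design and driven from the simulation test bench):

```python
class AxiSlaveModel:
    """Toy stand-in for the slave-mode VIP: it accepts memory-mapped
    writes and services reads from a simple dictionary memory."""
    def __init__(self):
        self.mem = {}

    def write(self, addr, data):
        self.mem[addr] = data
        return "OKAY"               # modeled AXI write-response value

    def read(self, addr):
        return self.mem.get(addr, 0), "OKAY"

def aligned(addr, size_bytes):
    """One simplified example of a pass-through-style check: flag a
    transaction whose address is not aligned to its transfer size.
    (The real VIP checks the full AXI protocol, which also permits
    some unaligned transfers.)"""
    return addr % size_bytes == 0
```

A master-mode test bench would drive `write()` and `read()` calls as stimulus, while the pass-through role sits between master and slave running checks like `aligned()` on every transaction.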
High-Level Synthesis allows us to leverage the power of programmable logic while significantly reducing development time. Of course, we want to leverage existing libraries to get the best from HLS developments and avoid reinventing the wheel each time.
Last year, I examined SLX FPGA and used it to optimise IP cores for implementation in Vivado, looking at security and industrial algorithms. Of course, things have moved on in the HLS world with the introduction of Vitis last November, and I was curious to see how SLX FPGA could be used in a Vitis bottom-up flow. When working in a bottom-up flow, we use Vivado HLS to generate a Xilinx Object (XO), which is then added into Vitis for use in the Vitis application. Such a bottom-up flow allows us to focus on complex algorithms, verify their performance, and ensure the optimisations for programmable logic implementation remain close to the algorithm.
Available in a PCIe® form factor, BittWare’s RFX-8440 4-channel digital acquisition card leverages the latest generation Xilinx® Zynq® UltraScale+™ RFSoC Gen 3. The card is designed for both development and deployment with extensive expansion options, and RF-ADC/RF-DAC features to match a range of customer applications such as 5G, LTE wireless, phased array Radar, and satellite communications.
I recently received the new Trenz TE0802 development board, designed and manufactured by Digilent partner Trenz Electronic. This board is interesting because it is the first development board to contain a Zynq UltraScale+ MPSoC device of the CG variety. The CG devices sit within the mid-range of the Xilinx heterogeneous MPSoC portfolio, with the application processing unit containing a dual-core Cortex-A53 in place of the quad-core processors provided in the EV and EG devices.
Verifying the security of your system can be difficult. In search of a trusted foundation for a single piece of software, teams can spend repeated cycles of pen testing and iteration that extend development far beyond what the market is willing to wait for. These tests can increase confidence in the security of an application, but they fall short of ever proving it.
Let's face it. Customers aren't willing to risk their data security on a hunch that application security is possible. They want proof. Thankfully, the proof is available.
Last week we examined Xilinx simulation and how we could create test benches for behavioural and post-layout simulation, along with creating the switching activity file used to provide a more accurate power estimate.
The test bench we created last week was self-checking because it compared the output value against expected values supplied from a text file. This works well for simple examples like the one presented; however, for more complex designs we may want to use a different structure.
For complex algorithms, it is often a good idea to first create a model that defines the behaviour. This model could be developed in a high-level language such as C or Python. For some projects, it is important to have the customer sign off and agree to the models before further RTL development.
We can use these models within the test bench to verify the HDL implementation on a cycle-by-cycle basis. This ensures the implemented algorithm works exactly as the model intends.
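The structure of such a bench can be sketched in Python (the model, stimulus, and "DUT" below are all invented for illustration; in a real bench the DUT values would be sampled from the HDL simulation each cycle):

```python
def golden_model(sample):
    """Reference model of the algorithm; a saturating doubler is used
    here purely as a stand-in for the agreed behavioural model."""
    return min(sample * 2, 255)

def self_check(stimulus, read_dut_output):
    """Drive the same stimulus into the model and the DUT and compare
    them cycle by cycle, reporting the first mismatch."""
    for cycle, sample in enumerate(stimulus):
        expected = golden_model(sample)
        actual = read_dut_output(sample)  # real bench: sampled from simulator
        if expected != actual:
            return (False, cycle, expected, actual)
    return (True, None, None, None)

def faulty_dut(sample):
    """Stand-in 'DUT' with a deliberate fault at one input value,
    to show the checker localising the mismatch."""
    return 99 if sample == 3 else min(sample * 2, 255)
```

Running `self_check(range(5), faulty_dut)` pinpoints the failing cycle along with the expected and actual values, which is exactly the diagnostic we want from a cycle-by-cycle comparison against the model.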
Over the years, streaming media has been gaining traction, whether for live broadcasts of events or for drone technology in various applications. But achieving low latency becomes challenging when it comes to actual real-time live streaming. Furthermore, delivering 4K-resolution video footage with ultra-low latency requires a highly reliable codec mechanism to optimize end-to-end video delivery.
Taking this into account, iWave Systems has introduced a Zynq® UltraScale+™ MPSoC single board computer (SBC) with Xilinx Sync IP implemented in programmable logic (PL), working alongside the integrated video codec unit (VCU) to optimize end-to-end latency.
No matter how it is captured (RTL, HLS, or model-driven), every programmable logic design should start with agreed requirements that define its interfacing and functional performance. Depending upon the target application, the requirements may be extensive, defining every aspect of operation and behaviour under failure modes. Alternatively, the requirements may be a cardinal-point specification of key performance requirements.
Demonstrating that these requirements have been implemented correctly is often the role of simulation. Simulation enables us to stimulate a programmable logic design and observe its outputs.
So, what is involved in creating a simulation from the beginning? Designing a good simulation requires careful thought about what is going to be tested and how, even before we begin to write a line of code.
Xilinx and Movandi have teamed up on an Open RAN radio unit (RU) that the companies are demonstrating at the virtual BIG 5G event, September 22-24. This continues to advance open 5G architecture and innovation and, most importantly, accelerates deployments.
Networks have become increasingly complex with the advent of 5G, densification, and richer, more demanding applications. To tame this complexity, we cannot use traditional human-intensive means of deploying, optimizing, and operating a network. Instead, networks must be self-driving and should be able to leverage new learning-based technologies to automate operational network functions and reduce OPEX.
Looking both ways before you cross the street is a good way to avoid a catastrophe. Your brain can almost instantly analyze information from your optic nerve, determine there’s an oncoming car, and stop your leg muscles from moving you in its way.
For the car, the same process has a few more steps.
It takes a massive amount of computing capacity to analyze information from digital cameras, determine size and speed from differences in the contrast of individual pixels, identify a human moving into the path of the vehicle, and synthesize that data into a message to the braking system.
In the end, you just hope the message arrives on time.
A large part of a programmable logic developer’s time is not spent implementing RTL, but verifying RTL functionality and behavior. A few weeks ago, a young engineer asked me about simulation and its role in the development process. I intend to create several blogs focusing on how we can use Vivado XSIM to verify our design. However, verification of RTL can be a wider task than just performing a simulation.
The Vitis™ AI integrated development environment is Xilinx’s development platform for AI inference on Xilinx hardware platforms, consisting of optimized IP, tools, models, and example designs.
Designed with high efficiency and ease of use in mind, it unleashes the full potential of AI acceleration on Xilinx FPGAs and Adaptive Compute Acceleration Platforms (ACAPs). To help engineers get started with deep learning, Doulos, in conjunction with Xilinx, has organized a one-day online training workshop for embedded engineers.
Over the last couple of weeks, we have examined how Vivado can help us identify design issues that might impact the implementation.
However, as all engineers know, the earlier we can find an issue, the easier and cheaper it is to correct (both financially and in time spent). What is even better is to avoid the issue in the first place. This is where using Xilinx UltraFast design methodologies can offer significant benefits when implementing designs.
By following the rules outlined below, we can create designs that encounter fewer issues later on. Of course, the UltraFast design methodology documentation provides significant insight and explanation as to why these rules exist. However, five of the most critical are summarized below.
More than a decade ago, we anticipated the pervasiveness of PCI Express® and began offering integrated blocks for it in our devices. Over the years, we have refined our integrated block offering with each new Xilinx architecture. We now see PCI Express applied in nearly all our developers’ markets.
The Versal™ architecture continues to offer a Programmable Logic Integrated Block for PCI Express (PL PCIE) further improved from that available in prior architectures and adds an Integrated Block for PCI Express® with DMA and Cache Coherent Interconnect (CPM). Architecturally, CPM is one component of the Versal architecture integrated shell, the whole of which is timing closed and resides “outside” the programmable logic (PL). The illustration below shows where CPM resides, with PL PCIE included for context: