The free, Web-based airhdl register file generator from noasic GmbH, an FPGA design and coaching consultancy and EDA tool developer, uses a simple, online definition tool to create register definitions from which the tool then automatically generates HDL, a C header file, and HTML documentation. The company’s CEO Guy Eschemann has been working with FPGAs for more than 15 years, so he’s got considerable experience in the need to create bulletproof register definitions to achieve design success. His company noasic is a member of the Xilinx Alliance Program.
What’s the big deal about creating registers? Many complex FPGA-based designs now require hundreds or even thousands of registers to operate and monitor a system and keeping these register definitions straight and properly documented, especially within the context of engineering changes, is a tough challenge for any design team.
The best way I’ve seen to put register definition in context comes from the book “Hardware/Firmware Interface Design: Best Practices for Improving Embedded Systems Development” written by my friend Gary Stringham:
“The hardware/firmware interface is the junction where hardware and firmware meet and communicate with each other. On the hardware side, it is a collection of addressable registers that are accessible to firmware via reads and writes. This includes the interrupts that notify firmware of events. On the firmware side, it is the device drivers or the low-level software that controls the hardware by writing values to registers, interprets the information read from the registers, and responds to interrupt requests from the hardware. Of course, there is more to hardware than registers and interrupts, and more to firmware than device drivers, but this is the interface between the two and where engineers on both sides must be concerned to achieve successful integration.”
The airhdl EDA tool from noasic is designed to help your hardware and software/firmware teams “achieve successful integration” by creating a central nexus for defining the critical, register-based hardware/firmware interface. It uses a single register map (with built-in version control) to create the HDL register definitions, the C header file for firmware’s use of those registers, and the HTML documentation that both the hardware and software/firmware teams will need to properly integrate the defined registers into a design.
Here’s an 11-minute video made by noasic to explain the airhdl EDA tool:
Consider signing up for access to this free tool. It will very likely save you a lot of time and effort.
For more information about airhdl, use the links above or contact noasic GmbH directly.
By Adam Taylor
One ongoing area we have been examining is image processing. We’ve look at the algorithms and how to capture images from different sources. A few weeks ago, we looked at the different methods we could use to receive HDMI data and followed up with an example using an external CODEC (P1 & P2). In this blog we are going to look at using internal IP cores to receive HDMI images in conjunction with the Analog Devices AD8195 HDMI buffer, which equalizes the line. Equalization is critical when using long HDMI cables.
Nexys board, FMC HDMI and the Digilent PYNQ-Z1
To do this I will be using the Digilent FMC HDMI card, which provisions one of its channels with an AD8195. The AD8195I on the FMC HDMI card needs a 3v3 supply, which is not available on the ZedBoard unless I break out my soldering iron. Instead, I broke out my Digilent Nexys Video trainer board, which comes fitted with an Artix-7 FPGA and an FMC connector. This board has built-in support for HDMI RX and TX but the HDMI RX path on this board supports only 1m of HDMI cable while the AD8195 on the FMC HDMI card supports cable runs of up to 20m—far more useful in many distributed applications. So we’ll add the FMC HDMI card.
First, I instantiated a MicroBlaze soft microprocessor system in the Nexys Video card’s Artix-7 FPGA to control the simple image-processing chain needed for this example. Of course, you can implement the same approach to the logic design that I outline here using a Xilinx Zynq SoC or Zynq UltraScale+ MPSoC. The Zynq PS simply replaces the MicroBlaze.
The hardware design we need to build this system is:
The final step in this hardware build is to map the interface pins from the AD8195 to the FPGA’s I/O pins through the FMC connector. We’ll use the TMDS_33 SelectIO standard for the HDMI clock and data lanes.
Once the hardware is built, we need to write some simple software to perform the following:
To test this system, I used a Digilent PYNQ-Z1 board to generate different video modes. The first step in verifying that this interface is working is to use the ILA to check that the pixel clock is received and that its DLL is locked, along with generating horizontal and vertical sync signals and the correct pixel values.
Provided the sync signals and pixel clock are present, the VTC will be able to detect and classify the video mode. The application software will then report the detected mode via the terminal window.
ILA Connected to the DVI to RGB core monitoring its output
Software running on the Nexys Video detecting SVGA mode (600 pixels by 800 lines)
With the correct video mode being detected by the VTC, we can now configure a VDMA write channel to move the image from the logic into a DDR frame buffer.
You can find the project on GitHub
If you are working with video applications you should also read these:
If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.
First Year E Book here
First Year Hardback here.
Second Year E Book here
Second Year Hardback here
Avnet’s MiniZed SpeedWay Design Workshops are designed to help you jump-start your embedded design capabilities using Xilinx Zynq Z-7000S All Programmable SoCs, which meld a processing system based on a single-core, 32-bit, 766MHz Arm Cortex-A9 processor with plenty of Xilinx FPGA fabric. Zynq SoCs are just the thing when you need to design high-performance embedded systems or need to use a processor along with some high-speed programmable logic. Even better, these Avnet workshops focus on using the Avnet MiniZed—a compact, $89 dev board packed with huge capabilities including built-in WiFi and Bluetooth wireless connectivity. (For more information about the Avnet MiniZed dev board, see “Avnet’s $89 MiniZed dev board based on Zynq Z-7000S SoC includes WiFi, Bluetooth, Arduino—and SDSoC! Ships in July.”)
These workshops start in November and run through March of next year and there are four full-day workshops in the series:
You can mix and match the workshops to meet your educational requirements. Here’s how Avnet presents the workshop sequence:
These workshops are taking place in cities all over North America including Austin, Dallas, Chicago, Montreal, Seattle, and San Jose, CA. All cities will host the first two workshops. Montreal and San Jose will host all four workshops.
A schedule for workshops in other countries has yet to be announced. The Web page says “Coming soon” so please contact Avnet directly for more information.
Finally, here’s a 1-minute YouTube video with more information about the workshops
For more information on and to register for the Avnet MiniZed SpeedWay Design Workshops, click here.
The Vivado 2017.3 HLx Editions are now available and the Vivado 2017.3 Release Notes (UG973) tells you why you’ll want to download this latest version now. I’ve scanned UG973, watched the companion 20-minute Quick Take video, and cherry-picked twenty of the many enhancements that jumped out at me to help make your decision easier:
By Adam Taylor
We really need an operating system to harness the capabilities of the two or four 64-bit Arm Cortex-A53 processors cores in the Zynq UltraScale+ MPSoC APU (Application Processing Unit). An operating system enables effective APU use by providing SMP (Symmetric Multi-Processing), multitasking, networking, security, memory management, and file system capabilities. Immediate availability of these resources saves us considerable coding and debugging time and allows us to focus on developing the actual application. Putting it succinctly: don’t reinvent the wheel when it comes to operating systems.
Avnet UltraZed-EG SOM plugged into an IOCC (I/O Carrier Card)
PetaLinux is one of the many operating systems that run on the Zynq UltraScale+ MPSoC’s APU. I am going to focus this blog on what we need to do to get PetaLinux up and running on the Zynq UltraScale+ MPSoC using the Avnet UltraZed-EG SoM. However, the process is the same for any design.
To do this we will need:
To ensure that it installs properly, do not use root to install PetaLinux. In other words, do not use the sudo command. If you want to install PetaLinux in the /opt/pkg directory as recommended in the installation guide, you must first change the directory permissions for your user account. Alternatively, you can install PetaLinux in your home area, which is exactly what I did.
With PetaLinux installed, run the settings script in a terminal window (source settings.sh), which sets the environment variable allowing us to call PetaLinux commands.
When we build PetaLinux, we get a complete solution. The build process creates the Linux image, he device tree blob, and a RAM disk combined into a single FIT image. PetaLinux also generates the PMU firmware, the FSBL (First Stage Boot Loader), and U-boot executables needed to create the boot.bin.
We need to perform the following steps to create the FIT image and boot files:
petalinux-create --type project --template zynqMP --name part219
Petalinux configuration page presented once the hardware definition is imported
Petalinux-config -c <u-boot or PMUFW or device-tree or rootfs>
This might take some time.
petalinux-package --boot --fsbl zynqmp_fsbl.elf --u-boot u-boot.elf --pmufw pmufw.elf –fpga fpga.bit
Once we have created the image file and the bin file, we can copy them to a SD card and boot the UltraZed-EG SOM.
Just as we simulate our logic designs first, we can test the PetaLinux image within our Linux build environment using QEMU. This allows us to verify that the image we have created will load correctly.
We run QEMU by issuing the following command:
petalinux-boot --qemu --image < Linux-image-file>
Created Petalinux image running in QEMU
Once we can see the PetaLinux image booting correctly in QEMU, the next step is to try it on the UltraZed-EG SOM. Copy the image.ub and boot.bin files to an SD card, configure the UltraZed-EG SOM mounted on the IOCC (I/O Carrier Card) to boot from the SD card, insert the SD card, and apply power.
If everything has been done correctly, you should see the FSBL load the PMU firmware in the terminal window. Then, U-Boot should run and load the Linux kernel.
Linux running on the UltraZed-EG SOM
Once the boot process has completed, we can log in using the user name and password of root, and then begin using our PetaLinux environment.
Now that we have the Zynq UltraScale+ MPSoC’s APU up and running with PetaLinux, we can use OpenAMP to download and execute programs in the Zynq UltraScale+ MPSoC’s dual-core ARM Cortex-R5 RPU (Real-time Processing Unit). We will explore how we do this in another blog.
Meanwhile, the following links were helpful in the generation of this image:
The project, as always, is on GitHub.
If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.
First Year E Book here
First Year Hardback here.
Second Year E Book here
Second Year Hardback here
Brian Mathewson, Verification Technologist, Mentor Graphics
FPGA-based designs increasingly use advanced, multi-clocking architectures to meet high-performance throughput and computational requirements. However, an RTL or gate-level simulation of a design with multiple clock domains does not accurately capture the timing associated with data transfers between clock domains. Specifically, simulation does not model metastabilty as signals cross asynchronous clock domains. Consequently, bugs can escape the front-end verification process.
In the lab, these bugs can look like intermittent static-timing issues and you can spend weeks of expensive lab time chasing after them before you realize that CDC (clock domain crossing) metastability is the culprit. Fortunately, there are automated verification solutions that can apply exhaustive formal analysis to root out CDC issues up front.
Mentor recently developed an app for the Questa CDC verification tool that directly integrates with the Xilinx Vivado Design Suite. This new flow helps you analyze CDC paths more quickly by providing deeper CDC analysis. You can install the Questa CDC app from the Xilinx Vivado Tcl store. The app seamlessly integrates with Vivado and automatically creates the configuration files necessary to set up and run Mentor’s Questa CDC tool based on the FPGA design you have loaded into the Vivado Design Suite. Once the utility generates the CDC setup files, you can then launch Questa CDC from within Vivado with the click of a button.
In a real-life case study, a design team was able to run Questa CDC on their design based on a Xilinx FPGA and generated initial results in less than one day. Their structural analysis uncovered timing violations and issues that fell into 3 categories:
Furthermore, this FPGA-based design supported two operating modes and Questa CDC detected violations associated with the disabled modes. These violations pointed out a design bug that incorrectly enabled logic that should have been disabled for the inactive mode.
This same design team also used protocol verification on CDC paths with both “good” and “bad” synchronization structures. For the good structures, synchronizer protocol rules such as stability checks were validated with both formal verification and simulation. For the paths without synchronizers, synchronization structures were added to paths where protocol assertions failed in simulation. The FPGA-based design malfunctioned until all paths with protocol violations were synchronized.
Bottom line: the one week of CDC analysis was more productive than 2 weeks of debug in the lab.
But what about the underlying CDC verification technology? What type of analyses does it support?
Mentor’s Questa CDC is a full-featured CDC solution that includes:
The ability to thoroughly verify clock-domain crossings becomes ever more important as the number of clock domains increases in today's complex FPGA designs. By leveraging the Xilinx Tcl App Store, Mentor’s Questa CDC app for Vivado allows you to get started with advanced CDC analysis using the Questa CDC tool that adds essential capabilities for structural, protocol, and metastability verification. These features ensure that you handle CDC signals earlier in the design stage to avoid costly, time-wasting debug cycles in the lab.
For more detail on Questa CDC, click here.
P.S. If your FPGA designs target automotive applications, note that Questa CDC is the only CDC tool that has been qualified for ISO26262. For more about this and the Mentor “SAFE” program to qualify all tools for ISO26262, click here.
By Adam Taylor
The Xilinx Zynq UltraScale+ MPSoC is good for many applications including embedded vision. It’s APU with two or four 64-bit ARM Cortex-A53 processors, Mali GPU, DisplayPort interface, and on-chip programmable logic (PL) give the Zynq UltraScale+ MPSoC plenty of processing power to address exciting applications such as ADAS and vision-guided robotics with relative ease. Further, we can use the device’s PL and its programmable I/O to interface with a range of vision and video standards including MIPI, LVDS, parallel, VoSPI, etc. When it comes to interfacing image sensors, the Zynq UltraScale+ MPSoC can handle just about anything you throw at it.
Once we’ve brought the image into the Zynq UltraScale+ MPSoC’s PL, we can implement an image-processing pipeline using existing IP cores from the Xilinx library or we can develop our own custom IP cores using Vivado HLS (high-level synthesis). However, for many applications we’ll need to move the images into the device’s PS (processing system) domain before we can apply exciting application-level algorithms such as decision making or use the Xilinx reVISION acceleration stack.
I thought I would kick off the fourth year of this blog with a look at how we can use VDMA instantiated in the Zynq MPSoC’s PL to transfer images from the PL to the PS-attached DDR Memory without processor intervention. You often need to make such high-speed background transfers in a variety of applications.
To do this we will use the following IP blocks:
Once configured over its AXI Lite interface, the Test Pattern Generator outputs test patterns which are then transferred into the PS-attached DDR memory. We can demonstrate that this has been successful by examining the memory locations using SDK.
Enabling the FPD Master and Slave Interfaces
For this simple example, we’ll clock both the AXI networks at the same frequency, driven by PL_CLK_0 at 100MHz.
For a deployed system, an image sensor would replace the TPG as the image source and we would need to ensure that the VDMA input-channel clocks (Slave-to-Memory-Map and Memory-Map-to-Slave) were fast enough to support the required pixel and frame rate. For example, a sensor with a resolution of 1280 pixels by 1024 lines running at 60 frames per second would require a clock rate of at least 108MHz. We would need to adjust the clock frequency accordingly.
Block Diagram of the completed design
To aid visibility within this example, I have included three ILA modules, which are connected to the outputs of the Test Pattern Generator, AXI VDMA, and the Slave Memory Interconnect. Adding these modules enables the use of Vivado’s hardware manager to verify that the software has correctly configured the TPG and the VDMA to transfer the images.
With the Vivado design complete and built, creating the application software to configure the TPG and VDMA to generate images and move them from the PL to the PS is very straightforward. We use the AXIVDMA, V_TPG, Video Common APIs available under the BSP lib source directory to aid in creating the application. The software itself performs the following:
The application will then start generating test frames, transferred from the TPG into the PS DDR memory. I disabled the caches for this example to ensure that the DDR memory is updated.
Examining the ILAs, you will see the TPG generating frames and the VDMA transferring the stream into memory mapped format:
TPG output, TUSER indicates start of frame while TLAST indicates end of line
VDMA Memory Mapped Output to the PS
Examining the frame store memory location within the PS DDR memory using SDK demonstrates that the pixel values are present.
Test Pattern Pixel Values within the PS DDR Memory
You can use the same approach in Vivado when creating software for a Zynq Z-7000 SoC iinstead of a Zynq UltraScale+ MPSoC by enabling the AXI GP master for the AXI Lite bus and AXI HP slave for the VDMA channel.
Should you be experiencing trouble with your VDMA based image processing chain, you might want to read this blog.
The project, as always, is on GitHub.
If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.
First Year E Book here
First Year Hardback here.
Second Year E Book here
Second Year Hardback here
I joined Xilinx five years ago and have looked for a good, introductory book on FPGA-based design ever since because people have repeatedly asked me for my recommendation. Until now, I could mention but not recommend Max Maxfield’s book published in 2004 titled “The Design Warrior’s Guide to FPGAs”—not because it was bad (it’s excellent) but because it’s more than a decade out of date. Patrick Lysaght, a Senior Director in the Xilinx CTO's office, alerted me to a brand new book that I can now recommend to anyone who wants to learn about using programmable logic to design digital systems.
It’s titled “Digital Systems Design with FPGA: Implementation Using Verilog and VHDL” and it was written by Prof. Dr. Cem Ünsalan in the Department of Electrical and Electronics Engineering at Yeditepe University in İstanbul and Dr. Bora Tar, now at the Power Management Research Lab at Ohio State University in Columbus, Ohio. Their book will take you from the basics of digital design and logic into FPGAs; FPGA architecture including programmable logic, block RAM, DSP slices, FPGA clock management, and programmable I/O; hardware description languages with an equal emphasis on Verilog and VHDL; the Xilinx Vivado Design Environment; and then on to IP cores including the Xilinx MicroBlaze and PicoBlaze soft processors. The book ends with 24 advanced embedded design projects. It’s quite obvious that the authors intend that this book be used as a textbook in a college-level digital design class (or two), but I think you could easily use this well-written book for self-directed study as well.
“Digital Systems Design with FPGA: Implementation Using Verilog and VHDL” uses the Xilinx Artix-7 FPGA as a model for describing the various aspects of a modern FPGA and goes on two describe two Digilent development boards based on the Artix-7 FPGA: the $149 Basys3 and the $99 Arty (now called the Arty A7 to differentiate it from the newer Arty S7, based on a Spartan-7 FPGA, and Zynq-based Arty Z7 dev boards). These boards are great for use in introductory design classes and they make powerful, low-cost development boards even for experienced designers.
At 400 pages, “Digital Systems Design with FPGA: Implementation Using Verilog and VHDL” is quite comprehensive and so new that the publisher has yet to put the table of contents online, so I decided to resolve that problem by publishing the contents here so that you can see for yourself how comprehensive the book is:
1.1 Hardware Description Languages
1.2 FPGA Boards and Software Tools
1.3 Topics to Be Covered in the Book
2 Field-Programmable Gate Arrays
2.1 A Brief Introduction to Digital Electronics
2.1.1 Bit Values as Voltage Levels
2.1.2 Transistor as a Switch
2.1.3 Logic Gates from Switches
2.2 FPGA Building Blocks
2.2.1 Layout of the Xilinx Artix-7 XC7A35T FPGA
2.2.2 Input / Output Blocks
2.2.3 Configurable Logic Blocks
2.2.4 Interconnect Resources
2.2.5 Block RAM
2.2.6 DSP Slices
2.2.7 Clock Management
2.2.8 The XADC Block
2.2.9 High-Speed Serial I/O Transceivers
2.2.10 Peripheral Component Interconnect Express Interface
2.3 PPGA-Based Digital System Design Philosophy
2.3.1 How to Think While Using FPGAS
2.3.2 Advantages and Disadvantages of FPGAS
2.4 Usage Areas of FPGAs
3 Basys3 and Arty FPGA Boards
3.1 The Basys3 Board
3.1.1 Powering the Board
3.1.2 Input / Output
3.1.3 Configuring the FPGA
3.1.4 Advanced Connectors
3.1.5 External Memory
3.1.6 Oscillator / Clock
3.2 The Arty Board
3.2.1 Powering the Board
3.2.3 Configuring the FPGA
3.2.4 Advanced Connectors
3.2.5 External Memory
3.2.6 Oscillator / Clock
4 The Vivado Design Suite
4.1 Installation and the Welcome Screen
4.2 Creating a New Project
4.2.1 Adding a Verilog File
4.2.2 Adding a VHDL File
4.3 Synthesizing the Project
4.4 Simulating the Project
4.4.1 Adding a Verilog Testbench File
4.4.2 Adding a VHDL Testbench File
4.5 Implementing the Synthesized Project
4.6 Programming the FPGA
4.6.1 Adding the Basys3 Board Constraint File to the Project
4.6.2 Programming the FPGA on the Basys3 Board
4.6.3 Adding the Arty Board Constraint File to the Project
4.6.4 Programming the FPGA on the Arty Board
4.7 Vivado Design Suite IP Management
4.7.1 Existing IP Blocks in Vivado
4.7.2 Generating a Custom IP
4.8 Application on the Vivado Design Suite
5 Introduction to Verilog and VHDL
5.1 Verilog Fundamentals
5.1.1 Module Representation
5.1.2 Timing and Delays in Modeling
5.1.3 Hierarchical Module Representation
5.2 Testbench Formation in Verilog
5.2.1 Structure of a Verilog Testbench File
5.2.2 Displaying Test Results
5.3 VHDL Fundamentals
5.3.1 Entity and Architecture Representations
5.3.2 Dataflow Modeling
5.3.3 Behavioral Modeling
5.3.4 Timing and Delays in Modeling
5.3.5 Hierarchical Structural Representation
5.4 Testbench Formation in VHDL
5.4.1 Structure of a VHDL Testbench File
5.4.2 Displaying Test Results
5.5 Adding an Existing IP to the Project
5.5.1 Adding an Existmg IP in Verilog
5.5 2 Adding an Existing IP in VHDL
6 Data Types and Operators
6.1 Number Representations
6.1.1 Binary Numbers
6.1.2 Octal Numbers
6.1.3 Hexadecimal Numbers
6.2 Negative Numbers
6.2.1 Signed Bit Representation
6.2.2 One’s Complement Representation
6.2.3 Two’s Complement Representation
6.3 Fixed- and Floating-Point Representations
6.3.1 Fixed-Point Representation
6.3.2 Floating-Point Representation
6.4 ASCII Code
6.5 Arithmetic Operations on Binary Numbers
6.6 Data Types in Verilog
6.6.1 Net and Variable Data Types
6.6.2 Data Values
6.6.3 Naming a Net or Variable
6.6.4 Defining Constants and Parameters
6.6.5 Defining Vectors
6.7 Operators in Verilog
6.7.1 Arithmetic Operators
6.7.2 Concatenation and Replication Operators
6.8 Data Types in VHDL
6.8.1 Signal and Variable Data Types
6.8.2 Data Values
6.8.3 Naming a Signal or Variable
6.8.4 Defining Constants
6.8.5 Defining Arrays
6.9 Operators in VHDL
6.9.1 Arithmetic Operators
6.9.2 Concatenation Operator
6.10 Application on Data Types and Operators
6.11 FPGA Building Blocks Used In Data Types and Operators
6.11.1 Implementation Details of Vector Operations
6.11.2 Implementation Details of Arithmetic Operations
7 Combinational Circuits
7.1 Basic Definitions
7.1.1 Binary Variable
7.1.2 Logic Function
7.1.3 Truth Table
7.2 Logic Gates
7.2.1 The NOT Gate
7.2.2 The OR Gate
7.2.3 The AND Gate
7.2.4 The XOR Gate
7.3 Combinational Circuit Analysis
7.3.1 Logic Function Formation between Input and Output
7.3.2 Boolean Algebra
7.3.3 Gate-Level Minimization
7.4 Combinational Circuit Implementation
7.4.1 Truth Table-Based Implementation
7.4.2 Implementing One-Input Combinational Circuits
7.4.3 Implementing Two-Input Combinational Circuits
7.4.4 Implementing Three-Input Combinational Circuits
7.5 Combinational Circuit Design
7.5.1 Analyzing the Problem to Be Solved
7.5.2 Selecting a Solution Method
7.5.3 Implementing the Solution
7.6 Sample Designs
7.6.1 Home Alarm System
7.6.2 Digital Safe System
7.6.3 Car Park Occupied Slot Counting System
7.7 Applications on Combinational Circuits
7.7.1 Implementing the Home Alarm System
7.7.2 Implementing the Digital Safe System
7.7.3 Implementing the Car Park Occupied Slot Counting System
7.8 FPGA Building Blocks Used in Combinational Circuits
8 Combinational Circuit Blocks
8.1.1 Half Adder
8.1.2 Full Adder
8.1.3 Adders in Verilog
8.1.4 Adders in VHDL
8.2.1 Comparators in Verilog
8.2.2 Comparators in VHDL
8.3.1 Decoders in Verilog
8.3.2 Decoders in VHDL
8.4.1 Encoders in Verilog
8.4.2 Encoders in VHDL
8.5.1 Multiplexers in Verilog
8.5.2 Multiplexers in VHDL
8.6 Parity Generators and Checkers
8.6.1 Parity Generators
8.6.2 Parity Checkers
8.6.3 Parity Generators and Checkers in Verilog
8.6.4 Parity Generators and Checkers in VHDL
8.7 Applications on Combinational Circuit Blocks
8.7.1 Improving the Calculator
8.7.2 Improving the Home Alarm System
8.7.3 Improving the Car Park Occupied Slot Counting System
8.8 FPGA Building Blocks Used in Combinational Circuit Blocks
9 Data Storage Elements
9.1.1 SR Latch
9.1.2 D Latch
9.1.3 Latches in Verilog
9.1.4 Latches in VHDL
9.2.1 D Flip-Flop
9.2.2 JK Flip-Flop
9.2.3 T Flip-Flop
9.2.4 Flip-Flops in Verilog
9.2.5 Flip-Flops in VHDL
9.5 Read-Only Memory
9.5.1 ROM in Verilog
9.5.2 ROM in VHDL
9.5.3 ROM Formation Using IP Blocks
9.6 Random Access Memory
9.7 Application on Data Storage Elements
9.8 FPGA Building Blocks Used in Data Storage Elements
10 Sequential Circuits
10.1 Sequential Circuit Analysis
10.1.1 Definition of State
10.1.2 State and Output Equations
10.1.3 State Table
10.1.4 State Diagram
10.1.5 State Representation in Verilog
10.1.6 State. Representation in VHDL
10.2 Timing in Sequential Circuits
10.2.1 Synchronous Operation
10.2.2 Asynchronous Operation
10.3 Shift Register as a Sequential Circuit
10.3.1 Shift Registers in Verilog
10.3.2 Shift Registers in VHDL
10.3.3 Multiplication and Division Using Shift Registers
10.4 Counter as a Sequential Circuit
10.4.1 Synchronous Counter
10.4.2 Asynchronous Counter
10.4.3 Counters in Verilog
10.4.4 Counters in VHDL
10.4.5 Frequency Division Using Counters
10.5 Sequential Circuit Design
10.6 Applications on Sequential Circuits
10.6.1 Improving the Home Alarm System
10.6.2 Improving the Digital Safe System
10.6.3 Improving the Car Park Occupied Slot Counting System
10.6.4 Vending Machine
10.6.5 Digital Clock
10.7 FPGA Building Blocks Used in Sequential Circuits
11 Embedding a Soft-Core Microcontroller
11.1 Building Blocks of a Generic Microcontroller
11.1.1 Central Processing Unit
11.1.2 Arithmetic Logic Unit
11.1.4 Oscillator / Clock
11.1.5 General Purpose Input/Output
11.1.6 Other Blocks
11.2 Xilinx PicoBlaze Microcontroller
11.2.1 Functional Blocks of PicoBlaze
11.2.2 PicoBlaze in Verilog
11.2.3 PicoBlaze in VHDL
11.2.4 PicoBlaze Application on the Basys3 Board
11.3 Xilinx MicroBlaze Microcontroller
11.3.1 MicroBlaze as an IP Block in Vivado
11.3.2 MicroBlaze MCS Application on the Basys3 Board
11.4 Soft-Core Microcontroller Applications
11.5 FPGA Building Blocks Used in Soft—Core Microcontrollers
12 Digital Interfacing
12.1 Universal Asynchronous Receiver/ Transmitter
12.1.1 Working Principles of UART
12.1.2 UART in Verilog
12.1.3 UART in VHDL
12.1.4 UART Applications
12.2 Serial Peripheral Interface
12.2.1 Working Principles of SPI
12.2.2 SPI in Verilog
12.2.3 SPI in VHDL
12.2.4 SPI Application
12.3 Inter-Integrated Circuit
12.3.1 Working Principles of I2C
12.3.2 I2C in Verilog
12.3.3 I2C in VHDL
12.3.4 I2C Application
12.4 Video Graphics Array
12.4.1 Working Principles of VGA
12.4.2 VGA in Verilog
12.4.3 VGA in VHDL
12.4.4 VGA Application
12.5 Universal Serial Bus
12.5.1 USB-Receiving Module in Verilog
12.5.2 USB-Receiving Module in VHDL
12.5.3 USB Keyboard Application
12.7 FPGA Building Blocks Used in Digital Interfacing
13 Advanced Applications
13.1 Integrated Logic Analyzer 1P Core Usage
13.2 The XADC Block Usage
13.3 Adding Two Floating-Point Numbers
13.5 Home Alarm System
13.6 Digital Safe System
13.7 Car Park Occupied Slot Counting System
13.8 Vending Machine
13.9 Digital Clock
13.10 Moving Wave Via LEDs
13.12 Air Freshener Dispenser
13.13 0bstacle-Avoiding Tank
13.14 Intelligent Washing Machine
13.15 Non-Touch Paper Towel Dispenser
13.16 Traffic Lights
13.17 Car Parking Sensor System
13.18 Body Weight Scale
13.19 Intelligent Billboard
13.20 Elevator Cabin Control System
13.21 Digital Table Tennis Game
13.22 Customer Counter
13.23 Frequency Meter
14 What Is Next?
14.1 Vivado High-Level Synthesis Platform
14.2 Developing a Project in Vivado HLS to Generate IP
14.3 Using the Generated IP in Vivado
Note: In an acknowledgement in the book, the authors thank Xilinx's Cathal McCabe, an AE working within the Xilinx University Program, for his guidance and assistance. They also thank thank Digilent for allowing them to use the Basys3 and Arty board images and sample projects in the book.
By Adam Taylor
Recently I received two different questions from engineers on how to use SPI with the Zynq SoC and Zynq UltraScale+ MPSoC. Having answered these I thought a detailed blog on the different uses of SPI would be of interest.
When we use a Zynq SoC or Zynq UltraScale+ MPSoC in our design we have two options for implementing SPI interfaces:
Selecting which controller to use comes down to understanding the application’s requirements. Both SPI implementations can support all four SPI modes and both can function as either a SPI master or SPI slave. However, there are some suitable differences as the table below demonstrates:
Initially, we will examine using the SPI controller integrated into the PS. We include this peripheral within our design by selecting the SPI controller within the Zynq MIO configuration tab. For this example I will route the SPI signals to the ARTY Z7 SPI connector, which requires use of EMIO via the PL I/O.
Enabling the SPI and mapping to the EMIO
With this selected all that remains within Vivado, is to connect the I/O from the SPI ports. How we do this depends upon whether we want a master or salve SPI implementation. Looking at the available ports on the SPI Controller, you will notice there are matching input (marked xxx_i) and output (marked xxx_o) ports for each SPI port. It is very important that we correctly connect these ports based on the master or slave implementation. Failure to do so will lead to hours of head scratching later when we develop the software because nothing will work as expected if we get the connections wrong. In addition, there is one Slave Select input when the controller is used as a SPI slave and three select pins for use in SPI master mode.
Once the I/O is correctly configured and the project built, we configure the SPI controller as either a master or slave using the SPI configuration options within the application software. To both configure and transfer data using the PS SPI controller, we use the API provided with the BSP, which is defined by XSPIps.H. In this first example, we will configure the PS SPI controller as the SPI Master.
By default, SPI transfers are 8-bit transfers. However we can transmit larger 16- or 32-bit words as well. To transmit 8-bit words, we use the type u8 within our C application. For 16- or 32-bit transfers, we use types u16 or u32 respectively for the read and write buffers.
At first, this may appear to cause a problem or at least generate a compiler warning because both API functions that perform data transfers require a u8 input for the transmit and receive buffers as shown below:
s32 XSpiPs_Transfer(XSpiPs *InstancePtr, u8 *SendBufPtr, u8 *RecvBufPtr, u32 ByteCount);
s32 XSpiPs_PolledTransfer(XSpiPs *InstancePtr, u8 *SendBufPtr, u8 *RecvBufPtr, u32 ByteCount);
To address this issue when using u16 or u32 types, we need to cast the buffers to a u8 pointer as demonstrated:
XSpiPs_PolledTransfer(&SpiInstance, (u8*)&TxBuffer, (u8*)&RxBuffer, 8);
This allows us to work with transfer sizes of 8, 16 or 32 bits. To demonstrate this, I connected the SPI master example to a Digilent Digital Discovery to capture the transmitted data. With the data width changed on the fly from 8- to 16-bits using the above methods in the application software.
Zynq SoC PS SPI Master transmitting four 8-bit words
PS SPI Master transmitting four 16-bit words
The alternative to implementing a SPI interface using the Zynq PS is to implement an AXI QSPI IP core within the Zynq PS. Doing this requires more options being set in the Vivado design, which will limit run-time flexibility. Within the AXI QSPI configuration dialog, we can configure the transaction width, frequency, and number of slaves. One of the most important things we also configure here is whether the AXI QSPI IP core will act as a SPI slave or master. To enable a SPI master, you must check the enable master mode option. If this module is to operate as a slave, this option must be unchecked to ensure the SPISel input pin is present. When the SPI IP core acts as a slave, this pin must be connected to the master’s slave select.
Configuring the AXI Quad SPI
As with the PS SPI controller, the BSP also provides an API for the SPI IP. We use it to develop the application software. This API is defined within the file XSPI.h. I used this API to configure the AXI QSPI as a SPI slave for the second part of the example.
To demonstrate the AXI QSPI core working properly as a SPI Slave once I had created the software. I used Digilent’s Digital Discovery to act as the SPI master, allowing data to be easily transferred between the two.
Transmitting and Receiving Data with the SPI slave. (Blue data is output by the Zynq SPI Slave)
The final design created in Vivado to support both these examples has been uploaded to github.
Final example block diagram
Of course, if you are using a Xilinx FPGA in place of a Zynq SoC or Zynq UltraScale MPSoC, it is possible to use a MicroBlaze soft processor with the same AXI QSPI configuration to implement a SPI interface. Just remember to correctly define it as a master or slave.
I hope this helps outline how we can create both master and slave SPI systems using the two different approaches.
Code is available on Github as always.
First Year E Book here
First Year Hardback here.
Second Year E Book here
Second Year Hardback here
By Adam Taylor
When we surveyed the different types of HDMI sources and sinks recently for our Zynq SoC and Zynq UltraScale+ MPSoC designs, one HDMI receiver we discussed was the ADV7611. This device receives three TDMS data streams and converts them into discrete video and audio outputs, which can then be captured and processed. Of course, the ADV7611 is a very capable and somewhat complex device. It requires configuration prior to use. We are going to examine how we can include one within our design.
ZedBoard HDMI Demonstration Configuration
To do this, we need an ADV7611. Helpfully, the FMC HDMI card provides two HDMI inputs, one of which uses an ADV7611. The second equalizes the TMDS data lanes and passes them on directly to the Zynq SoC for decoding.
To demonstrate how we can get this device up and running with our Zynq SoC or Zynq UltraScale+ MPSoC, we will create an example that uses the ZedBoard with the HDMI FMC. For this example, we first need to create a hardware design in Vivado that interfaces with the ADV7611 on the HDMI FMC card. To keep this initial example simple, I will be only receiving the timing signals output by the ADV7611. These signals are:
This approach uses the VTG’s (Video Timing Generator’s) detector to receive the sync signals and identify the received video’s timing parameters and video mode. Once the ADV7611 correctly identifies the video mode, we have configured correctly. It is then a simple step to connect the received pixel data to a Video-to-AXIS IP block and use VDMA to write the received video frames into DDR memory for further processing.
For this example, we need the following IP blocks:
The resulting block diagram appears below. The eagle-eyed will also notice the addition both a GPIO output and I2C bus from the processor system. We need these to control and configure the ADV7611.
Simple Architecture to detect the video type
Following power up, the ADV7611 generates no sync signals or video. We must first configure the device, which requires the use of an I2C bus. We therefore need to enable one of the two I2C controllers within the Zynq PS and route the IO to the EMIO so that we can then route the I2C signals (SDA and SCL) to the correct pins on the FMC connector. The ADV7611 is a complex device to configure with multiple I2C addresses that address different internal functions within the device. EDID and High-bandwidth Digital Content Protection (HDCP), for example.
We also need to be able to reset the ADV7611 following the application of power to the ZedBoard and FMC HDMI. We use a PS GPIO pin, output via the EMIO, to do this. Using a controllable I/O pin for this function allows the application software to reset of the device each time we run the program. This capability is also helpful when debugging the software application to ensure that we start from a fresh reset each time the program runs—a procedure that prevents previous configurations form affecting the next.
With the block diagram completed, all that remains is to build the design with the location constraints (identified below) to connect to the correct pins on the FMC connector for the ADV7611.
Vivado Constraints for the ADV7611 Design
Once Vivado generates the bit file, we are ready to begin configuring the ADV7611. Using the I2C interface this way is quite complex, so we will examine the steps we need to do this in detail in the next blog. However, the image below shows one set of the results from the testing of the completed software as it detects a VGA (640 pixel by 480 line) input:
VTC output when detecting VGA input format
Code is available on Github as always.
by Anthony Boorsma, DornerWorks
Why aren’t you getting all of the performance that you expect after moving a task or tasks from the Zynq PS (processing system) to its PL (programmable logic)? If you used SDSoC to develop your embedded design, there’s help available. Here’s some advice from DornerWorks, a Premier Xilinx Alliance Program member. This blog is adapted from a recent post on the DornerWorks Web site titled “Fine Tune Your Heterogeneous Embedded System with Emulation Tools.”
Thanks to Xilinx’s SDSoC Development Environment, offloading portions of your software algorithm to a Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL (programmable logic) to meet system performance requirements is straightforward. Once you have familiarized yourself with SDSoC’s data-transfer options for moving data back and forth between the PS and PL, you can select the appropriate data mover that represents the best choice for your design. SDSoC’s software estimation tool then shows you the expected performance results.
Yet when performing the ultimate test of execution—on real silicon—the performance of your system sometimes fails to match expectations and you need to discover the cause… and the cure. Because you’ve offloaded software tasks to the PL, your existing software debugging/analysis methods do not fully apply because not all of the processing occurs in the PS.
You need to pinpoint the cause of the unexpected performance gap. Perhaps you made a sub-optimal choice of data mover. Perhaps the offloaded code was not a good candidate for offloading to the PL. You cannot cure the performance problem without knowing its cause.
Just how do you investigate and debug system performance on a Zynq-based heterogeneous embedded system with part of the code running in the PS and part in the PL?
If you are new to the world of debugging PL data processing, you may not be familiar with the options you have for viewing PL data flow. Fortunately, if you used SDSoC to accelerate software tasks by offloading them to the PL, there is an easy solution. SDSoC has an emulation capability for viewing the simulated operation of your PL hardware that uses the context of your overall system.
This emulation capability allows you to identify any timing issues with the data flow into or out of the auto-generated IP blocks that accelerate your offloaded software. The same capability can also show you if there is an unexpected slowdown in the offloaded software acceleration itself.
Using this tool can help you find performance bottlenecks. You can investigate these potential bottlenecks by watching your data flow through the hardware via the displayed emulation signal waveforms. Similarly, you can investigate the interface points by watching the data signals transfer data between the PS and the PL. This information provides key insights that help you find and fix your performance issues.
We’ll focus on the multiplier IP block from the Xilinx MMADD example to demonstrate how you can debug/emulate a hardware-accelerated function. For simplicity, we will focus on one IP block, the matrix multiplier IP block from the Multiply and Add example, shown in Figure 1.
Figure 1: Multiplier IP block with Port A expanded to show its signals
We will look at the waveforms for the signals to and from this Mmult IP block in the emulation. Specifically we will view the A_PORTA signals as shown in the figure above. These signals represent the data input for matrix A, which corresponds to the software input param A to the matrix multiplier function.
To get started with the emulation, enable generation of the “emulation model” configuration for the build in SDSoC’s project’s settings, as shown in Figure 2.
Figure 2: The mmult Project Settings needed to enable emulation
Next, rebuild your project as normal. After building your project with emulation model support enabled in the configuration, run the emulator by selecting “Start/Stop Emulation” under the “Xilinx Tools” menu option. When a window opens, select “Start” to start the emulator. SDSoC will then automatically launch an instance of Xilinx Vivado, which triggers the auto-generated PL project that SDSoC created for you as a subproject within your SDSoC project.
We specifically want to view the A_PORTA signals of the Mmult IP block. These signals must be added to the Wave Window to be viewed during a simulation. The available Mmult signals can be viewed in the Objects pane by selecting the mmult_1 block in the Scopes pane. To add the A_PORTA signals to the Wave Window, select all of the “A_*” signals in the Objects pane, right click, and select “Add to Wave Window” as shown in Figure 3.
Figure 3: Behavioral Simulation – mmult_1 signals highlighted
Now you can run the emulation and view the signal states in the waveform viewer. Start the emulator by clicking “Run All” from the “Run” drop-down menu as shown in Figure 4.
Figure 4: Start emulation of the PL
Back SDSoC’s toolchain environment, you can now run a debugging session that connects to this emulation session as it would to your software running on the target. From the “Run” menu option, select “Debug As -> 1 Launch on Emulator (SDSoC Debugger)” to start the debug session as shown in Figure 5.
Figure 5: Connect Debug Session to run the PL emulation
Now you can step or run through your application test code and view the signals of interest in the emulator. Shown below in Figure 6 are the A_PORTA signals we highlighted earlier and their signal values at the end of the PL logic operation using the Mmult and Add example test code.
Figure 6: Emulated mmult_1 signal waveforms
These signals tell us a lot about the performance of the offloaded code now running in the PL and we used familiar emulation tools to obtain this troubleshooting information. This powerful debugging method can help illuminate unexpected behavior in your hardware-accelerated C algorithm by allowing you to peer into the black box of PL processing, thus revealing data-flow behavior that could use some fine-tuning.
If you’re teaching digital design (or learning it), then the Digilent Nexys4-DDR FPGA Trainer Board based on the Xilinx Artix-7 A100T FPGA is a very good teaching platform because it provides you with ample programmable logic to work with (15,850 logic cells, 13.14Mbits of on-chip SRAM, and 240 DSP48E1 slices) along with 128Mbytes of DDR2 SDRAM and a good mix of peripherals and it’s paired with the industry’s most advanced system design tool—Xilinx Vivado.
RS University and Digilent have partnered to provide academics with a free half-day workshop on teaching digital systems using FPGAs (and the Nexys4-DDR Trainer Board). The half-day workshop will take place at Coventry U. on October 25th 2017 in the Engineering and Computing Building. (More info here, registration here.)
The following blog post contains explicitly competitive information. If you do not like to read such things or if you live in a country where you’re not supposed to read such things, then stop reading.
In this blog post, I will discuss device performance in a competitive context. Now, whenever you read about “the competition” on a vendor’s Web site, you need to take the information provided with a big grain of salt. It’s hard to believe anything one vendors says about the competition, which is why I so rarely attempt to do so in the Xcell Daily blog.
This post is an exception.
With that caveat stated, let’s rush in where angels fear to tread.
There’s a new 18-page White Paper on the Xilinx.com Web site titled “Measuring Device Performance and Utilization: A Competitive Overview” and written by Frederic Rivoallon, the Vivado HLS and RTL Synthesis Product Manager here at Xilinx. Rivoallon’s White Paper “compares actual Kintex UltraScale FPGA results to Intel’s (formerly Altera) Arria 10, based on publicly available OpenCores designs.” (OpenCores.org declares itself to be “the world’s largest site/community for development of hardware IP cores as open source.”) The data for this White Paper was generated in June, 2017 and is based on the latest versions of the respective design tools available at that time (Vivado Design Suite 2017.1 and Quartus Prime v16.1).
Cutting to the chase, here’s the White Paper’s conclusion, conveniently summarized in the same White Paper’s introduction:
“Verifiable results based on OpenCores designs demonstrate that the Xilinx UltraScale architecture delivers a two-speed-grade performance boost over competing devices while implementing 20% more design content. This boost equates to a generation leap over the closest competitive offering.”
I place in evidence Exhibit 1 (actually Figure 1 in the White Paper), which compares Kintex UltraScale FPGA device utilization versus Arria 10 device utilization and shows that it’s much harder to use all of the Arria 10’s device capacity than it is for the Kintex UltraScale device:
It’s quite reasonable for you to ask “why is this so?” at this point. In fact, you certainly should. I’m told and the White Paper explains that there’s a fundamental architectural reason for this significant utilization disparity. You see it in the architectural difference between a Xilinx UltraScale CLB and an Arria ALM (adaptive logic module). Here’s the picture (which is Figure 2 in the White Paper):
You can see that the two 6-input LUTs in the Arria 10 ALM share four inputs while the two 6-input LUTs in the UltraScale device have independent inputs. (Xilinx UltraScale+ devices employ the same LUT configuration.) There’s no sleight of hand here. Given enough routing resources (which the Xilinx UltraScale architecture has) and a sufficiently clever place-and-route tool (which Vivado has), you will be able to use both 6-input LUTs more often if they have independent inputs than if they have several shared inputs. Hence the greater maximum usable resource capacity for UltraScale and UltraScale+ devices.
And now for Exhibit 2. Here’s the associated performance graph showing FMAX for the various OpenCores IP cores (Figure 3 in the White Paper):
As you might expect from a Xilinx White Paper, the UltraScale device performs better after placement and routing. There are many more such Exhibits (charts and graphs) for you to peruse in the White Paper and Xilinx does not always win.
Well, the purpose of this blog post is twofold. First, I wanted you to be aware of this White Paper. If you’ve read this far, that goal has been achieved. Second, I don’t want you to take my word for it. I am reporting what’s stated in the White Paper but you should know that this White Paper was created in response to a similar White Paper published a few months back by “the competition.” No surprise, the competition’s White Paper came to different conclusions.
So who is right?
As a former Editor-in-Chief of both EDN Magazine and Microprocessor Report, I am well aware of benchmarks. In fact, EEMBC, the industry alliance that developed industry-standard benchmarks for embedded systems, was based on a hands-on project conducted by former EDN editor Markus Levy in 1996 while I was EDN’s Editor-in-Chief. Markus founded EEMBC a year later. I devoted a portion of Chapter 3 in my book “Designing SoCs with Configured Cores” to microprocessor benchmarking and I wrote an entire chapter (Chapter 10) about the history of microprocessor benchmarking for the textbook titled “EDA for IC System Design, Verification, and Testing,” published in 2006. That chapter also discussed some of the many ways to achieve the results you desire from benchmarks. FPGA benchmarks are in a similar state of affairs, going back at least to the 1990s and the famous/infamous PREP benchmark suite.
Here’s what Alexander Carlton at HP in Cupertino, California wrote way back in 1994 in his article on the SPEC Web site titled “Lies, **bleep** Lies, and Benchmarks”:
“It has been said that there are three classes of untruths, and these can be classified (in order from bad to worse) as: Lies, **bleep** Lies, and Benchmarks. Actually, this view is a corollary to the observation that ‘Figures don't lie, but liars can figure...’ Regardless of the derivation of this opinion, criticism of the state of performance marketing has become common in the computer industry press.”
[Editorial note: The blogging tool has modified the article's title to meet its Victorian sense of propriety.]
To my knowledge, no shenanigans were used to achieve the above FPGA benchmark results (I did ask) but I nevertheless caution you to be careful when interpreting the numbers. Here’s how I’d view these White Paper benchmark results:
Your mileage may vary. (Even the US EPA says so.) The only benchmark truly indicative of the device utilization and performance you’ll get for your design is… your design. Benchmarks are merely surrogates for your design.
So go ahead. Download and read the new Xilinx “Measuring Device Performance and Utilization: A Competitive Overview” White Paper, get educated, and then start asking questions.
Now that Amazon has made the FPGA-accelerated Amazon EC2 F1 compute instance generally available to all AWS customers (see “AWS makes Amazon EC2 F1 instance hardware acceleration based on Xilinx Virtex UltraScale+ FPGAs generally available”), just about anyone can get access to the latest Xilinx All Programmable UltraScale+ devices from anywhere, just as long as you have an Internet connection and a Web browser. Xilinx has just published a new video demonstrating the use of its Vivado IP Integrator, a graphical-based design tool, with the AWS EC2 F1 compute instance.
Why use Vivado IP Integrator? As the video says, there are five main reasons:
Here’s the 5-minute video:
By Anthony Boorsma, DornerWorks
Having some trouble choosing between Vivado HLS and SDSoC? Here’s some advice from DornerWorks, a Premier Xilinx Alliance Program member. This blog is adapted from a recent post on the DornerWorks Web site titled “Algorithm Implementation and Acceleration on Embedded Systems”
How does an engineer already experienced and comfortable with working in the Zynq SoC’s software-based PS (processing system) domain take advantage of the additional flexibility and processing power of the Zynq SoC’s PL (programmable logic)? The traditional method is through education and training to learn to program the PL using an HDL such as Verilog or VHDL. Another way is to learn and use a tool that allows you to take a software-based design written exclusively for the ARM 32-bit processors in the PS and transfer some or most of the tasks to the PL, without writing HDL descriptions.
One such tool is Xilinx’s Vivado High Level Synthesis (HLS). By leveraging the capabilities of HLS, you can prototype a design using the Zynq PS and then move functionality to the PL to boost performance. The advantage of this tool is that it generates IP blocks that can be used in the programmable logic of Xilinx FPGAs as well as Xilinx Zynq SoCs and Zynq UltraScale+ MPSoCs.
Logic optimization occurs when Vivado HLS synthesizes your algorithm’s C model and creates RTL. There are code directives (essentially guidelines for the tools’ optimization process) available that allow you to guide the HLS tool’s synthesis from the C model source to the RTL bitstream programmed into the FPGA. If you are working with an existing algorithm modeled in C, C++, or SystemC and need to implement this algorithm in custom logic for added performance, then HLS is a great tool choice.
However, be aware that the data movers that transfer data between the Zynq PS and the PL must be manually configured for performance when using Vivado HLS. This can become a complicated process when there’s significant data transfer between the domains.
A recent innovation that simplifies data-mover configuration is the development of the Xilinx SDSoC (Software-Defined System on Chip) Development Environment for use with Zynq SoCs and Zynq UltraScale+ MPSoCs. SDSoC builds on Vivado HLS’ capabilities by using HLS to perform the C-to-RTL conversion but with the convenient addition of automatically generated data movers, which greatly simplifies configuring the connection between the software running on the Zynq PS and the accelerated algorithm executing in the Zynq PL. SDSoC also allows you to guide data-mover generation by providing a set of pragmas to make specific data-mover choices. The SDSoC directive pragmas give you control over the automatically generated data movers but still require some minimal manual configuration. Code-directive pragmas for RTL optimization available in Vivado HLS are also available in SDSoC and can be used in tandem with SDSoC pragmas to optimize both the PL algorithm and the automatically generated data movers.
It is possible to disable the SDSoC auto generated data movers and only use the HLS optimizations. Demonstrated below are an IP block diagram generated with the auto configured SDSoC data movers and one without them.
The following screen shots are taken from a Xilinx-provided template project demonstrating the acceleration of a software matrix multiplication and addition algorithm, provided with the SDx installation. We used the SDx 2016.4 toolchain and targeted an Avnet Zedboard with a standalone OS configuration for this example.
Here is a screen shot of the same block, but without the SDSoC data movers. (We have disabled the automatic generation of data movers within SDSoC by manually declaring the AXI HLS interface directives for both mmult and madd accelerated IP block.)
To achieve the best algorithm performance, be prepared to familiarize yourself and use both the SDSoC and Vivado HLS user guides and datasheets. SDSoC provides a superset of Vivado HLS’s capabilities.
If you are developing and accelerating your model from first principles but want to take advantage of the flexibility of testing and proving out a design in software first, and you don’t intend to use a Zynq SoC, then using the Vivado HLS toolset straightaway is the place to start. A design started in HLS is transferable to an SDSoC if requirements change. Alternatively, if using a Zynq-based system is possible, it would be worthwhile to start right away with using SDSoC.
Last September at the GNU Radio Conference in Boulder, Colorado, Ettus Research announced the RFNoC & Vivado Challenge for SDR (software-defined radio). Ettus’ RFNoC (RF Network on Chip) is designed to allow you to efficiently harness the latest-generation FPGAs for SDR applications without being an expert firmware or FPGA developer. Today, Ettus Research and Xilinx announced the three challenge winners.
Ettus’ GUI-based RFNoC design tool allows you to create FPGA applications as easily as you can create GNU Radio flowgraphs. This includes the ability to seamlessly transfer data between your host PC and an FPGA. It dramatically eases the task of FPGA off-loading in SDR applications. Ettus’ RFNoC is built upon Xilinx’s Vivado HLS.
Here are the three winning teams and their projects:
Finally, here’s a 5-minute video announcing the winners along with the prizes they have won:
Got a problem getting enough performance out of your processor-based embedded system? You might want to watch a 14-minute video that does a nice job of explaining how you can develop hardware accelerators directly from your C/C++ code using the Xilinx SDK.
How much acceleration do you need? If you don’t know for sure, the video gives an example of an autonomous drone with vision and control tasks that need real-time acceleration.
What are your alternatives? If you need to accelerate your code, you can:
Unfortunately, each of these three alternatives increases power consumption. There’s another alternative however that can actually cut power consumption. That alternative’s based on the use of Xilinx All Programmable Zynq SoCs and Zynq UltraScale+ MPSoCs. By moving critical code into custom hardware accelerators implements in the programmable logic incorporated into all Zynq family members, you can relieve the processor of the associated processing burden and actually slow the processor’s clock speed, thus reducing power. It’s quite possible to cut overall power consumption using this approach.
Ah, but implementing these accelerators. Aye, there’s the rub!
It turns out that implementation of these hardware accelerators might not be as difficult as you imagine. The Xilinx SDK is already a C/C++ development environment based on familiar IDE and compiler technology. Under the hood, the SDK serves as a single cockpit for all Zynq-based development work—software and hardware. It also includes SDSoC, the piece of the puzzle you need to convert C/C++ code into acceleration hardware using a 3-step process:
One development platform, SDK, serves all members of the Zynq SoC and Zynq UltraScale+ MPSoC device families, giving you a huge price/performance range.
Here’s that 14-minute video:
When I first wrote about JTAG for EDN magazine in 1988 (Design for testability creates better products at lower cost), it was not a well-liked standard. No one wanted to dedicate three or four precious pins on an IC package (back when a lot of devices had only 40 pins); no one wanted to spend approximately 2% to 4% of the silicon die’s real estate on testability; and everyone thought that the serial test protocol was slow. Fast forward three decades. JTAG is a definitive standard and we’ve found all sorts of terrific things to do with it besides testing—downloading configurations into FPGAs and debugging designs for example.
JTAG has been an essential part of device configuration, debugging, and performance analysis in Xilinx All Programmable devices for a long, long time. When the number of configuration bits was small, JTAG-based configuration and debug felt fast. Times are a bit different now and JTAG bit rates that were once OK might now feel a bit slow.
As of now, you have a new, faster alternative for JTAG-based configuration, debug, and performance analysis. That alternative is called the Xilinx SmartLynq Data Cable and it boosts default JTAG bitstream programming rates from 0.4 to 4Mbytes/sec (10x) and the JTAG maximum JTAG clock frequency from 12 to 40MHz (3.33x). That’s a lot faster.
The $495 SmartLynq Data Cable is backward compatible with the Xilinx Platform Cable USB II and uses the same, standard PC4 JTAG header connection to the target board. It is compatible with the Vivado Design Suite, Labtools, and Xilinx Software Development Kit. The SmartLynq Data Cable also has some nice features not available with the Xilinx Platform Cable USB II including an Ethernet host interface. (More details available in the SmartLynq Data Cable Quick Start Guide.)
Stop waiting. Check out the SmartLynq Data Cable today.
Blue Pearl software has just announced Visual Verification Suite 2017.2, a suite of advanced RTL verification tools for advanced RTL linting, constraint generation, clock-domain crossing (CDC) analysis, and a debug environment that directly integrates with and augments the tools included in the Xilinx Vivado Design Suite. These tools can help you find more design bugs sooner, before getting into the more time-consuming design and analysis techniques—namely simulation and synthesis.
This new release of Visual Verification Suite 2017.2 includes updates to Blue Pearl’s Analyze RTL linting (super-linting) and debug tools, Synopsys Design Constraints (SDC) file generation, and CDC analysis to accelerate RTL verification. The new release also provides built in FPGA libraries and design rules that follow the Xilinx UltraFast Design Methodology. (Rules can be customized for design reuse and for conformance to safety standards such as STARC and DO-254.) You can download the Blue Pearl app from the Xilinx Tcl Store to integrate the Visual Verification Suite into the Vivado interactive design environment.
The Blue Pearl Visual Verification Suite consists of:
Here’s a block diagram of the Blue Pearl Visual Verification Suite product flow:
And here’s an excellent, 2-minute video explaining the complex interactions of timing, timing constraints, critical path timing, false paths, and multi-cycle paths and their relationship to synthesis:
The Blue Pearl Visual Verification Suite tools come wrapped in a visual environment that make it easier for you to chase down and kill bugs as early in the design cycle as possible. I’m told by Blue Pearl that Visual Verification Suite customers say that the Blue Pearl tools save them more than two weeks of development time in an average 16-week development cycle.
For more information about the Visual Verification Suite, please contact Blue Pearl Software directly.
Note: For more information about the Xilinx UltraFast Design Methodology, see “UltraFast: Hand-picked best practices from industry experts, distilled into a potent Design Methodology” and “Xilinx UltraFast Design Methodology gets free, 2-page Quick Reference Guide that you can read…ultra fast.” You can also download the free “UltraFast Design Methodology Guide for the Vivado Design Suite” and “UltraFast Embedded Design Methodology Guide.”
Blue Pearl also has a White Paper titled “Accelerating Xilinx All Programmable FPGA and SoC Design Verification with Blue Pearl Software” that you might want to read.
Xilinx 7 series FPGAs have 50-pin I/O banks with one common supply voltage for all 50 pins. The smaller Spartan-7 FPGAs have 100 I/O pins in two I/O banks, so it might be convenient in some smaller designs (or even some not-so-small designs) to combine the I/O for configuration and DDR memories into one FPGA I/O bank (plus the dedicated configuration bank 0) if possible so that the remaining I/O bank can operate at a different I/O voltage.
It turns out, you can do this with some MIG (Memory Interface Generator) magic, a little Vivado tool fiddling, and a simple level translator for the Flash memory’s data lines.
Application note XAPP1313 titled “Spartan-7 FPGA Configuration with SPI Flash and Bank 14 at 1.35V” shows you how to do this with a 1.8V Quad SPI Flash memory and 1.35V DDR3L SDRAM. Here’s a simplified diagram of what’s going on:
The advantage here is that you don’t need to move up to a larger FPGA to get another I/O bank.
For step-by-step instructions, see XAPP1313.
If you have read Adam Taylor’s 200+ MicroZed Chronicles here in Xcell Daily, you already know Adam to be an expert in the design of systems based on programmable logic, Zynq SoCs, and Zynq UltraScale+ MPSoCs. But Adam has significant expertise in the development of mission-critical systems based on his aerospace engineering work. He gave a talk about this topic at the recent FPGA Kongress held in Munich and he’s kindly re-recorded his talk, combined with slides in the following 67-minute video.
Adam spends the first two-thirds of the video talking about the design of mission-critical systems in general and then spends the rest of the time talking about Xilinx-specific mission-critical design including the design tools and the Xilinx isolation design flow.
Here’s the video:
Xilinx is starting a Vivado Expert Webinar Series to help you improve your design productivity and the first one, devoted to achieving timing closure in high-speed designs, takes place on July 26. Balachander Krishnamurthy—a Senior Product Marketing Manager for Static Timing Analysis, Constraints and Pin Planning—will present the material and will provide insight into Vivado high-speed timing-closure techniques along with some helpful guidelines.
By Adam Taylor
With the Vivado design for the Lepton thermal imaging IR camera built and the breakout board connected to the Arty Z7 dev board, the next step is to update the software so that we can receive and display images. To do this, we can also use the HDMI-out example software application as this correctly configures the board’s VDMA output. We just need to remove the test-pattern generation function and write our own FLIR control and output function as a replacement.
This function must do the following:
To successfully read out an image from the Lepton camera, we need to synchronize the VoSPI output to find the start of the first line in the image. The camera outputs each line as a 160-byte block (Lepton 2) or two 160-byte blocks (Lepton 3), and each block has a 2-byte ID and a 2-byte CRC. We can use this ID to capture the image, identify valid frames, and store them within the image store.
Performing steps 3 and 4 allows us to increase the size of the displayed image on the screen. The Lepton 2 camera used for this example has a resolution of only 80 horizontal pixels by 60 vertical pixels. This image would be very small when displayed on a monitor, so we can easily scale the image to 640x480 pixels by outputting each pixel and line eight times. This scaling produces a larger image that’s easier to recognize on the screen although may look a little blocky.
However, scaling alone will not present the best image quality as we have not configured the Lepton camera module to optimize its output. To get the best quality image from the camera module, we need to use the I2C command interface to enable parameters such as AGC (automatic gain control), which affects the contrast and quality of the output image, and flat-field correction to remove pixel-to-pixel variation.
To write or read back the camera module’s settings, we need to create a data structure as shown below and write that structure into the camera module. If we are reading back the settings, we can then perform an I2C read to read back the parameters. Each 16-bit access requires two 8-bit commands:
This sequence allows us to configure the Lepton camera so that we get the best performance. When I executed the updated program, I could see the image that appears below, of myself taking a picture of the screen on the monitor screen. The image has been scaled up by a factor of 8.
Now that we have this image on the screen, I want to integrate this design with MiniZed dev board and configure the camera to transfer images over a wireless network.
Code is available on Github as always.
In a free Webinar taking place on July 12, Xilinx experts will present a new design approach that unleashes the immense processing power of FPGAs using the Xilinx reVISION stack including hardware-tuned OpenCV libraries, a familiar C/C++ development environment, and readily available hardware-development platforms to develop advanced vision applications based on complex, accelerated vision-processing algorithms such as dense optical flow. Even though the algorithms are advanced, power consumption is held to just a few watts thanks to Xilinx’s All Programmable silicon.
By Adam Taylor
Over this blog series, I have written a lot about how we can use the Zynq SoC in our designs. We have looked at a range of different applications and especially at embedded vision. However, some systems use a pure FPGA approach to embedded vision, as opposed to an SoC like the members in the Zynq family, so in this blog we are going to look at how we can get a simple HDMI input-and-output video-processing system using the Artix-7 XC7A200T FPGA on the Nexys Video Artix-7 FPGA Trainer Board. (The Artix-7 A200T is the largest member of the Artix-7 FPGA device family.)
Here’s a photo of my Nexys Video Artix-7 FPGA Trainer Board:
Nexys Video Artix-7 FPGA Trainer Board
For those not familiar with it, the Nexys Video Trainer Board is intended for teaching and prototyping video and vision applications. As such, it comes with the following I/O and peripheral interfaces designed to support video reception, processing, and generation/output:
To create a simple image-processing pipeline, we need to implement the following architecture:
The supervising processor (in this case, a Xilinx MicroBlaze soft-core RISC processor implemented in the Artix-7 FPGA) monitors communications with the user interface and configures the image-processing pipeline as required for the application. In this simple architecture, data received over the HDMI input is converted from its parallel format of Video Data, HSync and VSync into an AXI Streaming (AXIS) format. We want to convert the data into an AXIS format because the Vivado Design Suite provides several image-processing IP blocks that use this data format. Being able to support AXIS interfaces is also important if we want to create our own image-processing functions using Vivado High Level Synthesis (HLS).
The MicroBlaze processor needs to be able to support the following peripherals:
AXI Timer – Enables the MicroBlaze to time events
MicroBlaze Debugging Module – Enables the debugging of the MicroBlaze
We’ll use the memory interface generator to create a DDR interface to the board’s SDRAM. This interface and the SDRAM creates a common frame store accessible to both the image-processing pipeline and the supervising processor using an AXI interconnect.
Creating a simple image-processing pipeline requires the use of the following IP blocks:
The core of this video-processing chain is the VDMA, which we use to move the image into the DDR memory.
The diagram above demonstrates how the IP block converts from streamed data to memory-mapped data for the read and write channels. Both VDMA channels provide the ability to convert between streaming and memory-mapped data as required. The write channel supports Stream-to-Memory-Mapped conversion while the read channel provides Memory-Mapped-to-Stream conversion.
When all this is put together in Vivado to create the initial base system, we get the architecture below, which is provided by the Nexys Video HDMI example.
All that is required now is to look at the software required to configure the image-processing pipeline. I will explain that next time.
Code is available on Github as always.
The RISC-V open-source processor has a growing ecosystem and user community so it’s not surprising that someone would want to put one of these processors into a low-cost FPGA like a Xilinx Artix-7 device. And what could be easier than doing so using an existing low-cost dev board? Cue Digilent’s Arty Dev Board, currently on sale for $89.99 here. Normally, you’d find a copy of the Xilinx MicroBlaze soft RISC processor core inside of Arty’s Artix-7 FPGA but a SiFive Freedom E310 microcontroller platform that combines a RISC-V processor with peripherals seems to fit just fine so that’s just what Andrew Black has done using the no-cost Xilinx Vivado HL WebPack Edition to compile the HDL.
Digilent’s ARTY Artix-7 FPGA Dev Board
With Black’s step-by-step instructions based on SiFive's "Freedom E300 Arty FPGA Dev Kit Getting Started Guide", you can do the same pretty easily. (See “Build an open source MCU and program it with Arduino.”)
Note: For more information on the Digilent Arty Dev Board, see “ARTY—the $99 Artix-7 FPGA Dev Board/Eval Kit with Arduino I/O and $3K worth of Vivado software. Wait, What????” and “Free Webinar on $99 Arty dev kit, based on Artix-7 FPGA, now online.”
In February, I wrote a blog detailing the use of a Xilinx Kintex-7 K325T or K410T FPGA in Keysight’s new line of high-speed AWGs (arbitrary waveform generators) and signal digitizers. (See “Kintex-7 FPGAs sweep the design of six new Keysight high-speed PXI AWGs and Digitizers.”) The six new Keysight PXI instruments in that blog included the M3100A 100MSamples/sec, 4 or 8-channel FPGA digitizer; the M3102A 500Msamples/sec, 2 or 4-channel FPGA digitizer; M3201A 500MSamples/sec FPGA arbitrary waveform generator; the M3202A 1GSamples/sec FPGA arbitrary waveform generator; M3300A 500MSamples/sec, 2-channel FPGA AWG/digitizer combo; and the M3302A 500MSamples/sec, 4-channel FPGA AWG/digitizer combo.
In that blog post, I wrote:
“This family of Keysight M3xxx instruments clearly demonstrates the ability to create an FPGA-based hardware platform that enables rapid development of many end products from one master set of hardware designs. In this case, the same data-acquisition and AWG block diagrams recur on the data sheets of these instruments, so you know there’s a common set of designs.”
And that’s still true. Incorporating a Xilinx All Programmable FPGA, Zynq SoC, or Zynq UltraScale+ MPSoC into your product design allows you to create a hardware platform (or platforms) that give you a fast way to spin out new, highly differentiated products based on that platform. Keysight, realizing that the FPGA capability would be useful to its own customers as well, exposed much of the internal FPGA capabilities in these instruments through the Keysight M3602A Graphical FPGA Development Environment, which allows you to customize these instruments using off-the-shelf DSP blocks, MATLAB/Simulink, the Xilinx CORE Generator and Vivado IP cores, and the Xilinx Vivado Design Suite with either VHDL or Verilog code.
Keysight’s M3602A FPGA Block Diagram Editor
A recent Keysight email alerted me to three new application notes Keysight has published that detail the use of on-board FPGA resources to enhance the instruments for specific applications. The three app notes are:
Only All Programmable devices give you this kind of high-speed hardware programmability in addition to microprocessor-based software programmability and these Keysight instruments and the M3602A Development Environment are yet one more demonstration of why that’s a very handy option for you to consider when designing your own products.
As I concluded in that February blog post (and it’s worth repeating):
“Xilinx FPGAs are inherently well-suited to this type of platform-based product design because of the All-Programmable (I/O, hardware, and software) nature of the devices. I/O programmability permits any-to-any connectivity—as is common with, for example, camera designs when you’re concerned about adapting to a range of sensors or different ADCs and DACs for digitizers and AWGs. Hardware programmability allows you to rapidly modify real-time signal-processing or motor-control algorithms—as is common with diverse designs including high-speed instrument designs and industrial controllers.”
Of course these same ideas apply to all types of products, not just AWGs and digitizers.
(You can access the three Keysight app notes here.)
Metamako decided that it needed more than one Xilinx UltraScale FPGA to deliver the low latency and high performance it wanted from its newest networking platform. The resulting design is a 1RU or 2RU box that houses one, two, or three Kintex UltraScale or Virtex UltraScale+ FPGAs, connected by “near-zero” latency links. The small armada of FPGAs means that the platform can run multiple networking applications in parallel—very quickly. This new networking platform allows Metamako to expand far beyond its traditional market—financial transaction networking—into other realms such as medical imaging, SDR (software-defined radio), industrial control, and telecom. The FPGAs are certainly capable of implementing tasks in all of these applications with extremely high performance.
Metamako’s Triple-FPGA Networking Platform
The Metamako platform offers an extensive range of standard networking features including data fan-out, scalable broadcast, connection monitoring, patching, tapping, time-stamping, and a deterministic port-to-FPGA latency of just 3nsec. Metamako also provides a developer’s kit with the platform with features that include:
This latest networking platform from Metamako demonstrates a key attribute of Xilinx All Programmable technology: the ability to fully differentiate a product by exploiting the any-to-any connectivity and high-speed processing capabilities of Xilinx silicon using Xilinx’s development tools. No other chip technology could provide Metamako with a comparable mix of extreme connectivity, speed, and design flexibility.
You can now download the Vivado Design Suite 2017.2 HLx editions, which include many new UltraScale+ devices:
In addition, the low-cost Spartan-7 XC7S50 FPGA has been added to the WebPack edition.
Download the latest releases of the Vivado Design Suite HL editions here.
Last month, Xilinx Product Marketing Manager Darren Zacher presented a Webinar on the extremely popular $99 Arty Dev Kit, which is based on a Xilinx Artix-7 A35T FPGA, and it’s now online. If you’re wondering if this might be the right way for you to get some design experience with the latest FPGA development tools and silicon, spend an hour with Zacher and Arty. The kit is available from Avnet and Digilent.
For more information about the Arty Dev Kit, see: “ARTY—the $99 Artix-7 FPGA Dev Board/Eval Kit with Arduino I/O and $3K worth of Vivado software. Wait, What????”