
 A YouTube video maker with the handle “takeshi i” has just posted an 18-minute video titled “IoT basics with ZYBO (Zynq)” that demonstrates an IoT design created with a $199 Digilent Zybo Z7 dev board based on a Xilinx Zynq SoC. (Note: It's a silent video.)

 

First, the YouTube video demonstrates the IoT design interacting with an app on a mobile phone. Then the video takes you step by step through the creation process using the Xilinx Vivado development environment.

 

The YouTuber writes:

 

“I implemented a web server using Python and bottle framework, which works with another C++ application. The C++ application controls my custom IPs (such as PWM) implemented in PL block. A user can control LEDs, 3-color LEDs, buttons and switches mounted on ZYBO board.”

 

The YouTube video’s Web page also lists the resources you need to recreate the IoT design:

 

 

 

 

Here’s the video:

 

 

 

 

 

Last month, a user on EmbeddedRelated.com going by the handle stephaneb started a thread titled “When (and why) is it a good idea to use an FPGA in your embedded system design?” Olivier Tremois (oliviert), a Xilinx DSP Specialist FAE based in France, provided an excellent, comprehensive, concise, Xilinx-specific response worth repeating in the Xcell Daily blog:

 

 

 

As a Xilinx employee I would like to contribute on the Pros ... and the Cons.

 

Let's start with the Cons: if there is a processor that suits all your needs in terms of cost/power/performance/IOs, just go for it. You won't be able to design the same thing in an FPGA at the same price.


Now if you need some kind of glue logic around (IOs), or your design needs multiple processors/GPUs due to the required performance, then it's time to talk to your local FPGA dealer (preferably a Xilinx distributor!). I will try to answer a few remarks I saw throughout this thread:

 

FPGA/SoC: In the majority of the FPGA designs I’ve seen during my career at Xilinx, I saw some kind of processor. In pure FPGAs (Virtex/Kintex/Artix/Spartan) it is a soft processor (MicroBlaze or PicoBlaze) and in a [Zynq SoC or Zynq UltraScale+ MPSoC], it is a hard processor (dual-core Arm Cortex-A9 [for Zynq SoCs] and quad-A53 + dual-R5 [for Zynq UltraScale+ MPSoCs]). The choice is now more complex: processor only, processor with an FPGA alongside, FPGA only, or integrated processor/FPGA. The tendency is toward the latter due to all the savings incurred: PCB, power, devices, ...

 

Power: Pure FPGAs are making incredible progress, but if you want really low power in stand-by mode you should look at the Zynq UltraScale+ MPSoC, which contains many processors and, in particular, a Power Management Unit that can switch different regions of the processors/programmable logic on and off.

 

Analog: Since Virtex-5 (2006), Xilinx has included ADCs in its FPGAs, which were limited to internal parameter measurements (voltage, temperature, ...). [These ADC blocks are] called the System Monitor. With 7 series (2011) [devices], Xilinx included a dual 1Msample/sec@12-bit ADC with internal/external measurement capabilities. Lately Xilinx [has] announced very high-performance ADCs/DACs integrated into the Zynq UltraScale+ RFSoC: 4Gsample/sec@12-bit ADCs / 6.5Gsample/sec@14-bit DACs. Potential applications are telecom (5G), cable (DOCSIS), and radar (phased array).

 

Security: The bitstream that is stored in the external flash can be encoded [encrypted]. Decoding [decrypting] is performed within the FPGA during bitstream download. Zynq-7000 SoCs and Zynq UltraScale+ MPSoCs support encoded [encrypted] bitstreams and secure boot for the processor[s].

 

Ease of Use: This is the big part of the equation. Customers need to take this into account to get the right time to market. Starting in 2012 with the 7 series devices, Xilinx introduced a new integrated tool called Vivado. Since then a number of features/new tools have been [added to Vivado]:

 

  • IP Integrator (IPI): a graphical interface to stitch IPs together and generate bitstreams for complete systems.

 

  • Vivado HLS (High Level Synthesis): a tool that allows you to generate HDL code from C/C++ code. This tool will generate IPs that can be handled by IPI.

 

 

  • SDSoC (Software Defined SoC): This tool allows you to design complete systems, software and hardware on a Zynq SoC/Zynq UltraScale+ MPSoC platform. This tool with some plugins will allow you to move part of your C/C++ code to programmable logic (calling Vivado HLS in the background).

 

  • SDAccel: an OpenCL (and more) implementation. Not relevant for this thread.

 

 

There are also tools related to the MathWorks environment [MATLAB and Simulink]:

 

 

  • System Generator for DSP (aka SysGen): Low-level Simulink library (designed by Xilinx for Xilinx FPGAs). Allows you to build HDL designs with blocks. This tool achieves even better performance (clock/area) than hand-written HDL code because each block is an instance of an IP (from register, adder, counter, and multiplier up to FFT, FIR Compiler, and VHLS IP). Bit-true and cycle-true simulations.

 

  • Xilinx Model Composer (XMC): available since ... yesterday! Again a Simulink blockset but based on Vivado HLS. Much faster simulations. Bit-true but not cycle-true.

 

 

All this to say that FPGA vendors have [expended] tremendous effort to make FPGAs and derivative devices easier to program. There is still a learning curve, [but it] is much shorter than it used to be…

 

 

 

 

Vivado 2017.4 is now available. Download it now to get these new features (see the release notes for complete details):

 

 

 

 

Download the new version of the Vivado Design Suite HLx editions here.

 

 

 

 

MathWorks has been advocating model-based design using its MATLAB and Simulink development tools for some time because the design technique allows you to develop more complex software with better quality in less time. (See the MathWorks white paper: “How Small Engineering Teams Adopt Model-Based Design.”) Model-based design employs a mathematical and visual approach to developing complex control and signal-processing systems through the use of system-level modeling throughout the development process—from initial design, through design analysis, simulation, automatic code generation, and verification. These models are executable specifications that consist of block diagrams, textual programs, and other graphical elements. Model-based design encourages rapid exploration of a broader design space than other design approaches because you can iterate your design more quickly, earlier in the design cycle. Further, because these models are executable, verification becomes an integral part of the development process at every step. Hopefully, this design approach results in fewer (or no) surprises at the end of the design cycle.

 

Xilinx supports model-based design using MATLAB and Simulink through the new Xilinx Model Composer, a design tool that integrates into the MATLAB and Simulink environments. The Xilinx Model Composer includes libraries with more than 80 high-level, performance-optimized, Xilinx-specific blocks including application-specific blocks for computer vision, image processing, and linear algebra. You can also import your own custom IP blocks written in C and C++, which are subsequently processed by Vivado HLS.

 

Here’s a block diagram that shows you the relationship among MathWorks’ MATLAB, Simulink, and Xilinx Model Composer:

 

 

 

Xilinx Model Composer.jpg 

 

 

 

Finally, here’s a 6-minute video explaining the benefits and use of Xilinx Model Composer:

 

 

 

 

 

 

 

RHS Research’s PicoEVB FPGA dev board based on an Artix-7 A50T FPGA snaps into an M.2 2230 key A or E slot, which is common in newer laptops. The board measures 22x30mm, which is slightly larger than the Artix-7 FPGA and configuration EEPROM mounted on one side of the board. It has a built-in JTAG connection that works natively with Vivado.

 

Here’s a photo that shows you the board’s size in relation to a US 25-cent piece:

 

 

 

RHS Research PicoEVB.jpg 

 

 

 

Even though the board itself is small, you still get a lot of resources in the Artix-7 A50T FPGA including 52,160 logic cells, 120 DSP48 slices, and 2.7Mbits of BRAM.

 

Here’s a block diagram of the board:

 

 

 

 

RHS Research PicoEVB Block Diagram.jpg 

 

 

The PicoEVB is available on Crowd Supply. The project was funded at the end of October.

 

By Adam Taylor

 

 

For the final MicroZed Chronicles blog of the year, I thought I would wrap up with several tips to help when you are creating embedded-vision systems based on Zynq SoC, Zynq UltraScale+ MPSoC, and Xilinx FPGA devices.

 

Note: These tips and more will be part of Adam Taylor’s presentation at the Xilinx Developer Forum that will be held in Frankfurt, Germany on January 9.

 

 

 

Image1.jpg 

 

 

 

 

  1. Design in Flexibility from the Beginning

 

 

Image2.jpg

 

 

Video Timing Controller used to detect the incoming video standard

 

 

Use the flexibility provided by the Video Timing Controller (VTC) and reconfigurable clocking architectures such as fabric clocks, MMCMs, and PLLs. Using the VTC and associated software running on the PS (processor system) in the Zynq SoC and Zynq UltraScale+ MPSoC, it is possible to detect different video standards from an input signal at run time and to configure the processing and output video timing accordingly. Upon detection of a new video standard, the software running on the PS can configure new clock frequencies for the pixel clock and the image-processing chain, along with reconfiguring the VDMA frame buffers for the new image settings. The VTC’s timing detector supplies the detected video settings, which the VTC’s timing generator then uses to produce the new output video timings.
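To make this concrete, here is a minimal C sketch of the kind of lookup the PS software might perform once the VTC’s timing detector reports the active resolution. The pixel-clock values are the standard ones for these formats, but the table and function are purely illustrative and not part of any Xilinx driver:

#include <stdint.h>
#include <stddef.h>

/* Illustrative video-mode table: detected resolution -> pixel clock. */
struct video_mode {
    uint16_t h_active;   /* active pixels per line  */
    uint16_t v_active;   /* active lines per frame  */
    uint32_t pclk_hz;    /* pixel clock to program  */
};

static const struct video_mode modes[] = {
    {  640,  480,  25175000u },  /* VGA, 60 Hz */
    { 1280,  720,  74250000u },  /* 720p60     */
    { 1920, 1080, 148500000u },  /* 1080p60    */
};

/* Given the resolution reported by the VTC detector, return the mode to
   use when reprogramming the MMCM/PLL and resizing the VDMA buffers. */
static const struct video_mode *find_mode(uint16_t h, uint16_t v)
{
    for (size_t i = 0; i < sizeof modes / sizeof modes[0]; i++) {
        if (modes[i].h_active == h && modes[i].v_active == v)
            return &modes[i];
    }
    return NULL; /* unknown standard: leave the chain unchanged */
}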

 

 

 

  2. Convert Input Video to an AXI Stream as Soon as Possible to Leverage IP and HLS

 

 

Image3.jpg 

 

 

Converting Data into the AXI Streaming Format

 

 

 

Vivado provides a range of key IP cores that implement most of the functions required by an image-processing chain—functions such as Color Filter Interpolation, Color Space Conversion, VDMA, and Video Mixing. Similarly, Vivado HLS can generate IP cores that use the AXI interconnect to ease integration within Vivado designs. Therefore, to get maximum benefit from the available IP and tool-chain capabilities, we need to convert our incoming video data into the AXI Streaming format as soon as possible in the image-processing chain. We can use the Video-In-to-AXI-Stream IP core as an aid here. This core converts video from a parallel format consisting of synchronization signals and pixel values into our desired AXI Streaming format. A good tip when using this IP core is that the sync inputs do not need to be timed as per a VGA standard; they are edge-triggered. This eases integration with different video formats such as Camera Link, with its frame-valid, line-valid, and pixel information format, for example.

 

 

 

  3. Use Logic Debugging Resources

 

 

Image4.jpg 

 

 

 

Insertion of the ILA monitoring the output stage

 

 

 

Insert integrated logic analyzers (ILAs) at key locations within the image-processing chain. Including these ILAs from day one in the design can help speed commissioning. When implementing an image-processing chain in a new design, I insert ILAs, at a minimum, in the following locations:

 

  • Directly behind the receiving IP module—especially if it is a custom block. This ILA enables me to be sure that I am receiving data from the imager / camera.
  • On the output of the first AXI Streaming IP Core. This ILA allows me to be sure the image-processing core has started to move data through the AXI interconnect. If you are using VDMA, remember you will not see activity on the interconnect until you have configured the VDMA via software.
  • On the AXI-Streaming-to-Video-Out IP block, if used. I also consider connecting the video timing controller generator outputs to this ILA as well. This enables me to determine if the AXI-Stream-to-Video-Out block is correctly locked and the VTC is generating output timing.

 

When combined with the test patterns discussed below, insertion of ILAs allows us to zero in faster on any issues in the design that prevent the desired behavior.

 

 

 

  4. Select an Imager / Camera with a Test Pattern capability

 

 

Image5.jpg 

 

 

Incorrectly received incrementing test pattern captured by an ILA

 

 

 

If possible, when selecting the imaging sensor or camera for a project, choose one that provides a test-pattern video output. You can then use this standard test pattern to ensure that the reception, decoding, and image-processing chain is configured correctly because you’ll know exactly what the original video signal looks like. You can combine the imager/camera test pattern with ILAs connected close to the data-reception module to determine whether any issues you are experiencing when displaying an image are internal to the device and the image-processing chain or are the result of the imager/camera configuration.

 

We can verify the deterministic pixel values of the test pattern using the ILA. If the pixel values, line length, and number of lines are as we expect, then it is not an imager-configuration issue; more likely, you will find the issue(s) within the receiving module and the image-processing chain. This is especially important when using complex imagers/cameras that require several tens, or sometimes hundreds, of configuration settings to be applied before an image is obtained.

 

 

  5. Include a Test-Pattern Generator in your Zynq SoC, Zynq UltraScale+ MPSoC, or FPGA design

 

 

Image6.jpg 

 

 

Tartan Color Bar Test Pattern

 

 

 

If you include a test-pattern generator within the image-processing chain, you can use it to verify the VDMA frame buffers, output video timing, and decoding prior to the integration of the imager/camera. This reduces integration risk. To gain maximum benefit, the test-pattern generator should be configured with the same color space and resolution as the final imager, and it should be included as close to the start of the image-processing chain as possible so that more of the image-processing pipeline can be verified. When combined with test-pattern capabilities on the imager, this enables faster identification of any problems.

 

 

 

  6. Understand how Video Direct Memory Access stores data in memory

 

 

Image7.jpg 

 

 

 

Video Direct Memory Access (VDMA) allows us to use the processor DDR memory as a frame buffer. This enables access to the images from the processor cores in the PS to perform higher-level algorithms if required. VDMA also provides the buffering required for frame-rate and resolution changes. Understanding how VDMA stores pixel data within the frame buffers is critical if the image-processing pipeline is to work as desired when configured.

 

One of the major points of confusion when implementing VDMA-based solutions centers around the definition of the frame size within memory. The frame buffer is defined in memory by three parameters: Horizontal Size (HSize), Vertical Size (VSize), and Stride. Two parameters define the horizontal dimension of the image: the HSize and the Stride. VSize defines the number of lines in the image, while HSize defines the length of each line. However, instead of being measured in pixels, the horizontal size is measured in bytes, so we need to know how many bytes make up each pixel.

 

The Stride defines the distance in memory between the start of one line and the start of the next. To make efficient use of the DDR memory, the Stride should at least equal the horizontal size. Increasing the Stride beyond that introduces a gap between lines, which can be very useful when verifying that the imager data is received correctly because it provides a clear indication of where each line of the image starts and ends within memory.
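To make the byte-versus-pixel distinction concrete, here is a minimal C sketch for a hypothetical 1280x720 frame with 4-byte pixels; the names are illustrative and are not VDMA driver API:

#include <stdint.h>

#define H_ACTIVE_PIXELS 1280u   /* active pixels per line             */
#define V_ACTIVE_LINES   720u   /* active lines per frame             */
#define BYTES_PER_PIXEL    4u   /* depends on the pixel format in use */

/* HSize is the line length in BYTES, not pixels */
static const uint32_t hsize  = H_ACTIVE_PIXELS * BYTES_PER_PIXEL;

/* VSize is simply the number of lines */
static const uint32_t vsize  = V_ACTIVE_LINES;

/* Most compact packing: each line starts immediately after the previous
   one. Making the stride larger than hsize leaves a gap between lines in
   DDR, which makes line boundaries easy to spot while debugging. */
static const uint32_t stride = H_ACTIVE_PIXELS * BYTES_PER_PIXEL;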

 

These six simple techniques have helped me considerably when creating image-processing examples for this blog or solutions for clients, and they significantly ease both the creation and commissioning of designs.

 

As I said, this is my last blog of the year. We will continue this series in the New Year. Until then I wish you all happy holidays.

 

 

 

You can find the example source code on GitHub.

 

 

Adam Taylor’s Web site is http://adiuvoengineering.com/.

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

First Year E Book here

 

First Year Hardback here.

 

  

 

MicroZed Chronicles hardcopy.jpg 

 

 

 

Second Year E Book here

 

Second Year Hardback here

 

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

 

 

By Adam Taylor

 

Over the last couple of weeks, we have examined how we can debug our designs using Micrium’s μC/Probe (Post 1 and Post 2) or with the JTAG-to-AXI bridge. However, the best way to minimize time spent debugging is to generate high-quality designs in the first place. We can then focus on ensuring that the design functionality is as specified instead of hunting bugs.

 

To improve the quality of our design, there are several things we can do that help us achieve timing closure and identify design issues and bugs:

 

  1. Review code to ensure that it complies with coding and design standards and to catch functional issues early in the design stage.
  2. Ensure compliance with device/tool-chain-recommended coding standards—for example, the Xilinx UltraFast design methodology.
  3. Correctly constrain the design for clocks, multicycle, and false paths.
  4. Analyze CDCs (Clock Domain Crossings) to ensure that all CDCs are correctly handled.
  5. Perform detailed simulation to test corner cases and boundary conditions.

 

Over my career, I have spent many hours performing code reviews, checking designs for functionality, and for compliance with coding standards and tool-chain recommendations.

 

The Blue Pearl Visual Verification Suite is an EDA tool that automates design checking against a range of customizable rule sets, including basic rules, the Xilinx UltraFast design methodology rules, and DO-254. The Blue Pearl tools also perform detailed analysis of clocks, counters, state machines, CDCs, paths, and constraints. All of this checking helps engineers gain a better understanding of the functional side of their design. In short, this is a very useful tool set to have in our toolbox to improve design quality. Let’s look at how this tool integrates with the Xilinx Vivado design environment and how we can use it on a simple design.

 

With Blue Pearl installed, the first step is to integrate it with Vivado. To do this we use the Xilinx TCL Store to install the Blue Pearl Visual Verification Suite.

 

 

Image1.jpg

 

Installing Blue Pearl via the Xilinx TCL Store

 

 

 

Once Blue Pearl is installed, the next step is to create two custom commands. The first command opens a new Blue Pearl project from an open Vivado project. The second command pushes updates from the Vivado project into the Blue Pearl project.

 

We create these custom commands by selecting Tools -> Custom Commands -> Customize Commands.

 

 

Image2.jpg

 

 

Open the Command Customization dialog

 

 

 

This opens a dialog that allows you to create custom commands. For each command, we need to define the callable TCL procedures in the Blue Pearl Visual Verification Suite.

 

 

Image3.jpg 

 

 

Creating the Launch BPS command

 

 

For the “launch BPS” command, we need to use the command:

 

 

::tclapp::bluepearl::bpsvvs::launch_bps

 

 

 

Image4.jpg 

 

Creating the Update Command

 

 

 

For the update BPS command, we call the following command:

 

 

::tclapp::bluepearl::bpsvvs::update_vivado_into_bps

 

 

 

Once you have completed the addition of the customized commands, you will see two new buttons on the Vivado tool bar.

 

With the integration completed, we can now use Blue Pearl to analyze our design, identify issues, and improve its quality. Clicking the newly created “launch Blue Pearl” command within a Vivado project opens a new Blue Pearl project for analysis.

 

As it loads the Vivado design, Blue Pearl checks the code for synthesis and identifies any black boxes. Any syntax errors encountered will be flagged for correction before further analysis can be performed.  

 

An extensive number of checks and analyses can be run on the loaded design, ranging from basic checks to DO-254 compliance. There are so many possible checklist items that it might take a little time to select the checks that are important to you. However, once you’ve specified the checks you want, you can save the rules and use them across multiple projects. What is interesting is that the tool also reports whether each check has been run, not just its status as pass or fail. This explicit feedback mechanism removes the ability of designers to achieve compliance by omission. (And that’s a good thing.)

 

 

Image5.jpg 

 

Blue Pearl Environment

 

 

Image6.jpg 

 

Design Check configuration

 

 

As an example, I loaded a project that I am working on to see what the design-check and analysis reports look like. The design is simple: it decodes a MIPI stream into frame sync, line sync, and pixel values. Even so, Blue Pearl still identified a few places in the code that needed review to determine whether they present a real problem.

 

The first potential issue identified was in the If/Then/Else (ITE) analysis. The design contains a VHDL process that decodes the MIPI header type. This process is written using an if/elsif structure, which implies a priority encoder. Furthermore, to differentiate among five different header commands, the if/elsif structure is five deep; Blue Pearl calls this a length of five and, by default, generates warnings on lengths greater than three. In this case, no priority is required, and a case statement would provide better synthesis results because there is no need to consider input priority. Although each application is different, you as the engineer need to use your own experience and knowledge of the design to decide whether or not priority is needed.
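The design in question is VHDL, but the same trade-off is easy to see in C terms. In this illustrative sketch (the header codes are made up), the if/else-if chain implies priority while the switch does not:

#include <stdint.h>

enum cmd { FRAME_START, FRAME_END, LINE_START, LINE_END, UNKNOWN };

/* if/else-if chain: each comparison is logically gated by the failure of
   all the ones above it, so synthesis must build priority logic. */
static enum cmd decode_priority(uint8_t hdr)
{
    if      (hdr == 0x00) return FRAME_START;
    else if (hdr == 0x01) return FRAME_END;
    else if (hdr == 0x02) return LINE_START;
    else if (hdr == 0x03) return LINE_END;
    else                  return UNKNOWN;
}

/* case/switch: the alternatives are mutually exclusive, so the decoder
   can be built as parallel logic with no implied priority. */
static enum cmd decode_parallel(uint8_t hdr)
{
    switch (hdr) {
    case 0x00: return FRAME_START;
    case 0x01: return FRAME_END;
    case 0x02: return LINE_START;
    case 0x03: return LINE_END;
    default:   return UNKNOWN;
    }
}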

 

Along with reporting the length of the if structure, ITE analysis also analyzes the number of conditions within a statement. This is important when an if statement contains several conditions because additional conditions require additional logic resources and routing, which will impact our timing performance.

 

 

Image7.jpg 

 

Identification of a large if/then/else length

 

 

State machines are, of course, used in designs for control structures. Complex control structures require large state machines, which can be difficult to follow in the RTL. As part of its analysis, Blue Pearl creates visualizations of the state machines within a design. This visualization details the transitions among states and identifies any unreachable states. I found this capability very useful not only for debugging and verifying the behavior of my own state machines, but also for visualizing third-party designs. This graphical capability definitely helps me understand the designer’s intent.

 

 

 

Image8.jpg 

 

FSM Analysis Viewer

 

 

 

Blue Pearl also provides the ability to visualize CDCs and paths and to monitor fan-out within a design. These features allow us to identify places where we might want to add CDC-mitigation measures such as re-timing or pipeline registers.

 

 

Image9.jpg 

 

Clock Domain Crossing Analysis

 

 

 

Image10.jpg 

 

Path Analysis

 

 

 

Image11.jpg 

 

 

Flip-Flop Fan out reporting

 

 

 

Having touched lightly on the capabilities of Blue Pearl, I am impressed with the results once you have taken the time to set up the correct checks and analyses. The analysis provided allows you to catch potential issues earlier in the design cycle, which should reduce the time spent in the lab hunting bugs. In turn, this frees us to spend more of our time testing functionality.

 

You can find the example source code on GitHub.

 

 

Adam Taylor’s Web site is http://adiuvoengineering.com/.

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

First Year E Book here

 

First Year Hardback here.

 

 

  

MicroZed Chronicles hardcopy.jpg 

 

 

 

Second Year E Book here

 

Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

Adam Taylor’s MicroZed Chronicles, Part 226: Debugging FPGA hardware with the JTAG-to-AXI Bridge

by Xilinx Employee, 11-27-2017

 

By Adam Taylor

 

Over the last two installments of this blog series, we have explored Micrium’s μC/Probe and its virtual dashboard and oscilloscope capabilities for debugging code on the Zynq SoC’s PS. (See Post 1 and Post 2.) Then, I started thinking about how we might debug hardware in the Zynq SoC’s PL. The most obvious choice is to insert an integrated logic analyzer (ILA) IP core into the hardware design to observe AXI, AXI Stream, or native logic signals, as we have done several times in the MicroZed Chronicles series. Similarly, if we wish to change a signal’s value at run time, we can use the Virtual Input/Output IP core to toggle the state of a signal or bus. Reading or modifying the contents of an AXI device within the PL is a little more complicated, but it can be done by using the JTAG-to-AXI bridge to connect into the AXI network and then reading or writing any peripheral on the bus using Vivado’s hardware manager and a TCL script.

 

To debug designs that contain embedded processors, we could of course write a simple software application that reads and writes registers in the PL. However, this is not always possible, especially if there are separate hardware- and software-development teams. FPGA development teams are sometimes unused to developing embedded software, so it can be quicker and easier to use the JTAG-to-AXI bridge to test basic hardware functionality before passing the design over to the software team.

 

There is also another very important use case for the JTAG-to-AXI bridge if the design does not contain an embedded processing core. (For example, it’s not a Zynq SoC that you’re debugging and you haven’t instantiated a MicroBlaze processor in your FPGA.) AXI is often used to configure and connect IP blocks available via the Vivado IP library. Even if you don’t have an embedded processor available, the JTAG-to-AXI bridge enables fast hardware-in-the-loop testing to ensure that the block performs as intended.

 

 

Image1.jpg

 

Inserting the JTAG-to-AXI Bridge

 

 

Regardless of the presence or absence of an embedded processing element, we can use the JTAG-to-AXI bridge for:

 

  • Hardware-in-the-loop validation of IP blocks with the JTAG-to-AXI bridge acting as a traffic generator.
  • During initial board commissioning, we can create TCL scripts to test hardware interfaces, which frees us from depending on software development for hardware debugging.
  • For more formal validation, we can use the JTAG-to-AXI bridge to extract information from memories, making that information available for later analysis.

 

Including the JTAG-to-AXI core in the design is simple and requires only connections to the AXI network, clock, and reset. When it comes to IP configuration, all we need to do is define the preferred variety of AXI (AXI4 or AXI4-Lite). In the bridge's configuration dialog, we can also define the read and write queue lengths, which set the number of TCL commands that the bridge can queue. I have defined the maximum queue length of 16 commands in this example because this simplifies the script I need to write.

 

 

 

Image2.jpg

 

Customizing the JTAG-to-AXI Bridge

 

 

 

For the example in this blog post, I have implemented a Zynq-based design with the PS connected via an AXI interconnect to an AXI QSPI module, which I’ve configured as a standard SPI master. To enable the JTAG-to-AXI bridge to also be able to access the AXI QSPI module, I added a second slave port to the AXI interconnect, and then connected the JTAG-to-AXI bridge.

 

Once this is complete, we can use the address editor within Vivado, which clearly shows which peripherals the Zynq SoC’s PS can address and which peripherals the JTAG-to-AXI bridge can address. In this case, both the PS and the JTAG-to-AXI bridge can address the QSPI device on the AXI bus:

 

 

Image3.jpg

 

 

Addressable AXI Memory Peripherals for both the PS and JTAG to AXI Bridge

 

 

 

When we use the JTAG-to-AXI bridge, we must also pay attention to our clocking and reset architecture. If we plan to configure the PS before we use the bridge, the PS-provided fabric clocks and resets are suitable. However, if we intend to use the bridge before we configure the PS side of the Zynq SoC, we will need to source a clock from within the PL and ensure that the PS does not provide the AXI network reset. Failure to do this will result in the Vivado hardware manager not being able to detect the JTAG-to-AXI bridge until the PS has been configured.

 

Vivado will report the error as:

 

ERROR: [Xicom 50-38] xicom: AXI TRANSACTION TIMED OUT.

 

 

We use the hardware manager in Vivado to communicate with the AXI network over JTAG by issuing commands through the TCL window. The commands are defined within PG174: JTAG to AXI Master v1.2 LogiCORE IP Product Guide.

 

There are two basic commands: create_hw_axi_txn and run_hw_axi. We use the first command to create a number of AXI transactions, both read and write. We use the second command to execute the transactions, performing the reads and writes as necessary:

 

 

 

Image4.jpg

 

TCL Script used to configure and write out using SPI

 

 

To demonstrate the power of the JTAG-to-AXI bridge, I created a simple script that configures the QSPI as a master and transmits several words. The scripting commands allow us to assign a name to each transaction that we define; we can then call these named transactions as many times as needed within the script, which is a very nice feature. In the script image above, the transactions are identified by the names wr_txn_liteX, and they are executed in a different order than they were defined. For example, I did this to fill up the QSPI TX FIFO before enabling master transmission.
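In case the screenshot above is hard to read, the general shape of such a script is roughly as follows; the address and data values here are placeholders rather than the AXI QSPI register map:

# Get the bridge instance reported by the hardware manager
set bridge [get_hw_axis hw_axi_1]

# Define named transactions once...
create_hw_axi_txn wr_txn_lite0 $bridge -type write -address 44A00068 -len 1 -data {0000_00AB}
create_hw_axi_txn rd_txn_lite0 $bridge -type read  -address 44A00068 -len 1

# ...then run them as often as needed, in any order
run_hw_axi [get_hw_axi_txns wr_txn_lite0]
run_hw_axi [get_hw_axi_txns rd_txn_lite0]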

 

I ran this example on a Digilent Arty Z7 connected to a Digilent Digital Discovery logic analyzer and pattern generator to monitor the SPI transactions. Here’s what I saw in the Digital Discovery’s Waveforms user interface:

 

 

 

Image6.jpg 

 

Captured SPI waveform generated by the JTAG-to-AXI Bridge

 

 

 

This example demonstrates the ease with which we can implement the JTAG-to-AXI bridge and with which we can create scripts to interact with the AXI networks within the PL of the design. Hopefully these hints will allow you to debug and verify your design with greater ease.

 

 

You can find the example source code on GitHub.

 

 

Adam Taylor’s Web site is http://adiuvoengineering.com/.

 

 

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

 

 

 

First Year E Book here

 

First Year Hardback here.

 

 

  

MicroZed Chronicles hardcopy.jpg 

 

 

 

Second Year E Book here

 

Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

The TEWS TECHNOLOGIES TXMC638 24-channel, 16-bit, 5Msample/sec XMC card couples 24 ADC channels (implemented with LTC2323-16 ADCs) with a Xilinx Kintex-7 FPGA (the K160T, K325T, or K410T) and 1Gbyte of on-board DDR3 SDRAM to create a fully programmable data-acquisition system capable of taking 5M 16-bit samples/sec per channel. Here’s a block diagram:

 

 

TEWS TXMC638 24-channel ADC Card Block Diagram.jpg 

 

 

 

The card’s differential ADC inputs handle ±2.5V signals, resulting in a ±5V differential voltage range. The card also offers 64 digital I/O pins on the XMC P14 connector, which can be used as 64 single-ended LVCMOS24 or as 32 differential LVDS25 interfaces, and four 12.5Gbps GTX transceiver pins on P16.

 

Per the TEWS data sheet: You can develop your own data-acquisition and control applications for the TXMC638 with the Xilinx Vivado Design Suite.

 

 

 

TEWS TXMC638 24-channel ADC Card.jpg

 

TEWS TECHNOLOGIES TXMC638 24-channel, 16-bit, 5Msample/sec XMC card

 

 

 

Please contact TEWS TECHNOLOGIES for more information about the TXMC638 24-channel, 16-bit, 5Msample/sec XMC card.

 

If you’ve got some high-speed RF analog work to do, VadaTech’s new AMC598 and VPX598 Quad ADC/Quad DAC modules appear to be real workhorses. The four 14-bit ADCs (using two AD9208 dual ADCs) operate at 3Gsamples/sec, and the four 16-bit DACs (using four AD9162 or AD9164 DACs) operate at 12Gsamples/sec. You’re not going to drive those sorts of data rates over the host bus, so the modules have local memory in the form of three DDR4 SDRAM banks totaling 20Gbytes. A Xilinx Kintex UltraScale KCU115 FPGA (aka the DSP Monster, the largest Kintex UltraScale family member, with 5520 DSP slices that give you an immense amount of digital signal processing power to bring to bear on those RF analog signals) manages all of the on-board resources (memory, analog converters, and host bus) and handles the blazingly fast data-transfer rates. That allows you to create RF waveform generators and advanced RF-capture systems for applications including communications and signal intelligence (COMINT/SIGINT), radar, and electronic warfare using Xilinx tools including the Vivado Design Suite HLx Editions and the Xilinx Vivado System Generator for DSP, which can be used in conjunction with MathWorks’ MATLAB and the Simulink model-based design tool.

 

Here’s a block diagram of the AMC598 module:

 

 

VadaTech AMC598 Quad ADC and Quad DAC Block Diagram.jpg

 

 

 VadaTech AMC598 Quad ADC/Quad DAC Block Diagram

 

 

 

And here’s a photo of the AMC598 Quad ADC/Quad DAC module:

 

 

 

VadaTech AMC598 Quad ADC and Quad DAC.jpg

 

 

VadaTech AMC598 Quad ADC/Quad DAC

 

 

 

Note: Please contact VadaTech directly for more information about the AMC598 and VPX598 Quad ADC/Quad DAC modules.

 

 

Adam Taylor’s MicroZed Chronicles, Part 225: Advanced System-Level Debugging with Micrium’s μC/Probe

by Xilinx Employee, 11-20-2017

 

By Adam Taylor

 

 

In my last blog, I showed you how to use Micrium’s μC/Probe to debug Zynq-based embedded systems in real time without impacting target operation. The example I presented last time demonstrated the basics of how you can use μC/Probe to monitor variables in target memory. While this capability is very useful, μC/Probe also has advanced debugging features that help us debug our designs even further and provide a real-time user interface for controlling and monitoring internal system variables through a variety of graphical components.

 

I want to explore some of these advanced features in this blog including:

 

  • Modifying global variables on the target processor during run time.
  • Implementing an eight-channel oscilloscope, allowing us to monitor multiple variables at run time.

 

We need to add some simple code to the target to implement the oscilloscope. We do not need to add code if we only wish to modify variables at run time.

 

To create a more in-depth example, I am going to use the XADC to receive an external signal on the Vp/Vn inputs and then output a copy of the received signal using a Pmod DA4 DAC. The amplitude of the signal output by the Pmod DA4 will be controlled by a software variable which in turn will be controlled using μC/Probe.

 

To further demonstrate μC/Probe’s interactive capabilities with GPIO, I have configured the eight LEDs and four DIP switches on the MicroZed I/O Carrier Card (IOCC) in a manner that eases interfacing with μC/Probe.

 

To create this blog’s example in the Vivado design, I instantiated an XADC wizard, routed 12 Zynq PS GPIO signals to the EMIO, and enabled SPI 0 in the Zynq PS. I routed the SPI 0 signals over EMIO as well. I then connected EMIO SPI outputs to a Pmod bridge, as the Pmod DA4 has no defined driver.

 

 

Image1.jpg

 

Vivado Block Design for the Advanced μC/Probe Example

 

 

 

With the Vivado design built and exported to SDK, I needed to create a software application to perform the following:

 

  • Initialize the Zynq SoC’s XADC, GPIO, SPI, and Interrupt controllers
  • Configure the XADC to receive data from the Vp/Vn inputs
  • Configure the SPI controller for master operation with a clock rate as close to 50MHz as possible. (Note that 50MHz is the maximum frequency for the DAC on the Pmod DA4.)
  • Configure the interrupt controller for handling SPI interrupts.
  • Configure the GPIO with eight outputs to drive the LEDs and four inputs to read the switches.
  • Within the main program loop, read a value from the XADC, scale it for the appropriate output value, and write the computed value to the Pmod DA4 before updating the status of the LEDs and switches.

 

This sequence gives us the ability to read in XADC values and output a scaled version over the Pmod DA4. Within the application, any variable we wish to update using μC/Probe needs to be declared as a global variable. Therefore, in this example, the XADC values, the value to be written to the Pmod DA4, and the LED and switch variables are all declared globally.

 

To use the oscilloscope, we need to add some target code. This code is defined in three files provided on the Micrium μC/Probe download page. This download also contains the code necessary for implementing a μC/Probe interface to a target using RS232, USB, or Ethernet instead of JTAG.

 

 

The three files we need to map into our design are found under the directory:

 

 

<download path>\Micrium-Probe-TargetCode-410\Micrium\Software\uC-Probe\Target\Scope

 

 

The first two files, probe_scope.c and probe_scope.h, define the oscilloscope’s behavior and should be left unchanged. Add them to SDK as new include-directory paths. The file within the cfg directory, probe_scope_cfg.h, defines the scope configuration. This file allows us to define the number of active channels, the maximum number of samples, support for 16- and 32-bit variables, and the default sampling frequency. Copy this file into your working directory.

 

To make use of the files we have just added, call probe_scope_init() in the main application to initialize the oscilloscope at program start. Samples are then captured using the function probe_scope_sampling(), which is called when we wish to take a sample for the oscilloscope.

 

In this example I have called the sampling function at the end of the XADC/Pmod DA4 write loop. However, it can also be used with a timer and called during the ISR.
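Pulling the pieces together, the target-side additions are small. The sketch below follows the structure just described; the global variables and scaling are mine, and the Xilinx driver calls for the XADC, SPI, and GPIO are left as comments:

#include <stdint.h>
#include "probe_scope.h"   /* from the Micrium uC/Probe target code */

/* Globals, so that uC/Probe can find them in the ELF */
volatile float    scale = 1.0f;   /* amplitude, driven from the dashboard slider */
volatile uint16_t xadc_value;     /* raw sample read from the XADC Vp/Vn input   */
volatile uint16_t dac_value;      /* scaled value written to the Pmod DA4        */

int main(void)
{
    /* ... initialize the XADC, GPIO, SPI, and interrupt controllers here ... */

    probe_scope_init();                        /* one-time oscilloscope setup */

    for (;;) {
        /* xadc_value = <read Vp/Vn via the XADC driver>;      */
        dac_value = (uint16_t)(xadc_value * scale);
        /* <write dac_value to the Pmod DA4 over SPI>;         */
        /* <update the LEDs and read the switches over GPIO>;  */
        probe_scope_sampling();  /* capture one sample at the end of the loop */
    }
}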

 

With the target code included and the main application ready, I built a target application boot file and booted the MicroZed.

 

I used my Digilent Analog Discovery module (since superseded by the Analog Discovery 2 module) to provide a stimulus waveform for the XADC and to capture the output from the Pmod DA4. The Analog Discovery module allows me to change the stimulus as desired and to monitor the Pmod DA4 output. Of course the waveforms I see using the Analog Discovery module will also be displayed on the μC/Probe oscilloscope.

 

 

Image2.jpg

 

Test Set up with the MicroZed, MicroZed IO Carrier Card, Segger JTAG probe, and Analog Discovery module

 

 

 

I created a simple dashboard within the μC/Probe application that allows me to change the output signal’s amplitude and the LEDs’ status and to monitor the DIP-switch settings on the IO Carrier Card. Doing all of this is simple with μC/Probe. We do the same as we did before: drop the desired graphical component(s) onto the dashboard and then associate them with the appropriate variables in the ELF. Only this time, instead of just monitoring variable values, we will also manipulate them (the output scale and the LEDs).

 

To scale the Pmod DA4 output, I placed a horizontal slider and associated the slider with the float variable “scale” from the ELF. I also changed the slider’s full-scale value from 100 to 1.0 using the properties editor.

 

For each of the LEDs, I used graphical toggle buttons and associated each button with the appropriate LED variable within the ELF. Finally, for each of the switches, I placed a graphical LED on the dashboard and configured it to illuminate when its associated variable was set.

 

To create an oscilloscope within the dashboard, right-click on the project under the workspace explorer and select the oscilloscope. This creates a new oscilloscope tab on the dashboard. As before, we can associate a variable from the ELF with each of the enabled channels by dragging and dropping, as we do for any application.

 

 

Image3.jpg 

 

Adding the Oscilloscope Tab to μC/Probe’s dashboard

 

 

 

Image4.jpg

 

Associating an ELF variable with Oscilloscope channel

 

 

 

To be able to see both tabs on the dashboard at the same time, I combined them as shown below by right-clicking on a tab and selecting “tile horizontally.”

 

 

 

Image5.jpg 

 

Control and Oscilloscope combined on one μC/Probe dashboard

 

 

 

This approach does hide the oscilloscope control panel. However, you can open the oscilloscope control panel by clicking on “Scope Settings” at the bottom of the display. Doing so will open a traditional Oscilloscope control panel where you can set the triggering, timebase, etc.

 

 

Image8.jpg

 

Oscilloscope control panel, opened

 

 

 

Putting all of this together allowed me to control the amplitude of the output waveform on the Pmod DA4 using the slider and to control the LED status using the toggle buttons. I can of course see changes in both the input and output stimulus using μC/Probe’s oscilloscope and view the switch status via the simulated LEDs on the dashboard.

 

To demonstrate all of this in action I created a very short video:

 

 

 

 

 

 

 

 

The screen shots below also demonstrate the oscilloscope working.

 

 

Image6.jpg

 

Capturing the initial sine wave input

 

 

 

Image7.jpg 

 

Output value to the DAC with the slider in the nominal position

 

 

 

I will be using μC/Probe in future blogs when I want to interact with the application software running on my Zynq target. I continue to be very impressed with μC/Probe’s capabilities.

 

 

The following links were useful in developing this blog:

 

 

 

 

You can find the example source code on GitHub.

 

 

Adam Taylor’s Web site is http://adiuvoengineering.com/.

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

First Year E Book here

 

First Year Hardback here.

 

 

  

MicroZed Chronicles hardcopy.jpg 

 

 

 

Second Year E Book here

 

Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

By Adam Taylor

 

Being able to see internal software variables in our Zynq-based embedded systems in real time is extremely useful during system bring-up and for debugging. I often use an RS-232 terminal during commissioning to report important information like register values from my designs and have often demonstrated that technique in previous blog posts. Information about variable values in a running system provides a measure of reassurance that the design is functioning as intended and, as we progress through the engineering lifecycle, provides verification that the system is continuing to work properly. In many cases we will develop custom test software that reports the status of key variables and processes to help prove that the design functions as intended.

 

This approach works well and has for decades—since the earliest days of embedded design. However, a better solution for Zynq-based systems that allows us to read the contents of the processor memory and extract the information we need without impacting the target’s operation and without the need to add a single line of code to the running target now presents itself. It’s called μC/Probe and it’s from Micrium, the same company that has long offered the µC/OS RTOS for a wide variety of processors including the Xilinx Zynq SoC and Zynq UltraScale+ MPSoC.

 

Micrium’s μC/Probe tool allows us to create custom graphical user interfaces that display the memory contents of interest in our system designs. With this capability, we can create a virtual dashboard that provides control and monitoring of key system parameters, and we can do this very simply by dragging and dropping indicator, display, and control components onto the dashboard and associating them with variables in the target memory. In this manner it is possible to both read and write memory locations using the dashboard.

 

When it comes to using Micrium’s μC/Probe tool with our Zynq solution, we have choices regarding interfacing:

 

 

  • Use a Segger J Link JTAG Pod. In this case, the target system requires no additional code unless we wish to use an advanced μC/Probe feature such as an oscilloscope.

 

  • Use RS-232, USB, or TCP/IP. In this case we do not need to use JTAG. However we do need to add some code to the target embedded system. Micrium supplies sample code for us to use.

 

 

For this first example, I am going to use a Segger J Link JTAG pod to create a simple example and demonstrate the capabilities.  However, the second interface option proves useful for Zynq-based boards that lack a separate JTAG header and instead use a USB-to-JTAG device or if you do not have a J Link probe. We will look at using Micrium’s μC/Probe tool with the second interface option in a later blog.

 

Of course, the first thing we need to do is create a test application and determine the parameters to observe. The Zynq SoC’s XADC is perfect for this because it provides quantized values of the device temperature and voltage rails. These are ideal parameters to monitor during an embedded system’s test, qualification, and validation so we will use these parameters in this blog.

 

The test application example will merely read these values in a continuous loop. That’s a very simple program to write (see MicroZed Chronicles 7 and 8 for more information on how to do this). To understand which variables we can monitor or interact with using μC/Probe, we need to understand that the tool reads in and parses the ELF produced by SDK to get pointers to the memory values of interest. For μC/Probe to read and parse the ELF properly, the ELF needs to contain debugging data in the DWARF format. That means that within SDK, we need to set the compile option -gdwarf-2 to ensure that we use the appropriate version of DWARF. Failure to use this switch will result in μC/Probe being unable to read and parse the generated ELF.
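On the command line, the flag simply rides along with the usual debug switch. The exact compiler executable depends on your SDK version and target, so treat this line as an example only:

arm-xilinx-eabi-gcc -g -gdwarf-2 -c main.c -o main.o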

 

We set this compile switch in the C/C++ build settings for the application, as shown below:

 

 

 

Image1.jpg 

 

 

Setting the correct DWARF information in Xilinx SDK

 

 

With the ELF file properly created, I made a bootable SD card image of the application and powered on the MicroZed. To access the memory, I connected the Segger J Link, which uses a Xilinx adaptor cable to mate with the MicroZed board’s JTAG connector.

 

 

 

Image2.jpg

 

 

MicroZed with the Segger J Link

 

 

 

All I needed to do now was to create the virtual dashboard. Within μC/Probe, I loaded the ELF by clicking on the ELF button in the symbol browser. Once loaded, we can see a list of all the symbols that can be used on μC/Probe’s virtual dashboard.

 

 

 

Image3.jpg 

 

 

ELF loaded, and available symbols displayed

 

 

 

For this example, which monitors the Zynq SoC’s XADC internal signals including device temperature and power, I added six Numeric Indicators and one Angular Gauge Quadrant. Adding graphical elements to the data screen is very simple: all you need to do is find the display element you desire in the toolbox, drag it onto the data screen, and drop it in place.

 

 

 

Image4.jpg 

 

 

Adding a graphical element to the display

 

 

 

To display information from the running Zynq SoC on μC/Probe’s Numeric Indicators and on the Gauge, I needed to associate each indicator and gauge with a variable in memory. We use the Symbol viewer to do this. Select the variable you want and drag it onto the display indicator as shown below.

 

 

 

Image5.jpg

 

 

Associating a variable with a display element

 

 

 

If you need to scale the display to use the full variable range or otherwise customize it, hold the mouse over the appropriate display element and select the properties editor icon on the right.  The properties editor lets you scale the range, enter a simple transfer function, or increase the number of decimal places if desired.

 

 

Image6.jpg

 

 

Formatting a Display Element

 

 

 

Once I’d associated all the Numeric Indicators and the Gauge with appropriate variables but before I could run the project and watch the XADC values in real time, one final thing remained: I needed to inform the project how I wished to communicate with the target and select the target processor. For this example, I used Segger’s J Link probe.

 

 

 

Image7.jpg

 

 

Configuring the communication with the target

 

 

 

With this complete, I clicked “run” and captured the following video of the XADC data being captured and displayed by μC/Probe.

 

 

 

 

 

 

All of this was pretty simple and very easy to do. Of course, this short post has just scratched the surface of the capabilities of Micrium’s μC/Probe tool. It is possible to implement advanced features such as oscilloscopes, bridges to Microsoft Excel, and communication with the target using terminal windows or more advanced interfaces like USB. In the next blog we will look at how we can use some of these advanced features to create a more in-depth and complex virtual dashboard.

 

I think I am going to be using Micrium’s μC/Probe tool in many blogs going forward where I want to interact with the Zynq as well.

 

 

 

You can find the example source code on GitHub.

 

 

Adam Taylor’s Web site is http://adiuvoengineering.com/.

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

First Year E Book here

 

First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg  

 

 

 

 

Second Year E Book here

 

Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

 

By Adam Taylor

 

 

So far, all of my image-processing examples have used only one sensor and produced one video stream within the Zynq SoC or Zynq UltraScale+ MPSoC PL (programmable logic). However, if we want to work with multiple sensors or overlay information like telemetry on a video frame, we need to do some video mixing.

 

Video mixing merges several different video streams together to create one output stream. In our designs we can use this merged video stream in several ways:

 

  1. Tile together multiple video streams to be displayed on a larger display. For example, stitching multiple images into a 4K display.
  2. Blend together multiple image streams as vertical layers to create one final image. For example, adding an overlay or performing sensor fusion.

 

To do this within our Zynq SoC or Zynq UltraScale+ MPSoC system, we use the Video Mixer IP core, which comes with the Vivado IP library. This IP core mixes as many as eight image streams plus a final logo layer. The image streams are provided to the core via AXI Streaming or AXI memory-mapped inputs. You can select which one on a stream-by-stream basis. The IP Core’s merged-video output uses an AXI Stream.

 

To demonstrate how we can use the video mixer, I am going to update the MiniZed FLIR Lepton project to use the 10-inch touch display and merge in a second video stream generated by a test pattern generator (TPG). Using the 10-inch touch display gives me a larger screen on which to demonstrate the concept. This screen has been sitting in my office for a while now, so it’s time it became useful.

 

Upgrading to the 10-inch display is easy. All we need to do in the Vivado design is increase the pixel clock frequency (fabric clock 2) from 33.33MHz to 71.1MHz. Along with adjusting the clock frequency, we also need to set the ALI3 controller block to 71.1MHz.

 

Now include a video mixer within the MiniZed Vivado design. Enable layer one and select a streaming interface with global alpha control enabled. Enabling a layer’s global alpha control allows the video mixer to blend that layer on a pixel-by-pixel basis. This setting allows pixels to be merged according to the defined alpha value rather than simply overriding the pixel on the layer beneath. The alpha value for each layer ranges between 0 (transparent) and 1 (opaque) and is defined within an 8-bit register.
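Per pixel, global alpha blending reduces to the standard blend equation. Here is a quick C illustration for a single 8-bit color channel (my own helper, not the mixer API):

#include <stdint.h>

/* alpha = 0 is fully transparent, 255 fully opaque */
static uint8_t blend_channel(uint8_t top, uint8_t below, uint8_t alpha)
{
    return (uint8_t)((top * alpha + below * (255u - alpha)) / 255u);
}

/* Example: blend_channel(0xFF, 0x00, 128) returns 0x80, i.e. a
   half-transparent white pixel over black comes out mid-grey. */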

 

 

Image1.jpg 

 

 

Insertion of the Video Mixer and Video Test Pattern Generator

 

 

 

Image2.jpg

  

Enabling layer 1, for AXI streaming and Global Alpha Blending

 

 

The FLIR camera provides the first image stream. However, we need a second image stream for this example, so we’ll instantiate a video TPG core and connect its output to the video mixer’s layer 1 input. For both the video mixer and the test pattern generator, be sure to use the high-speed video clock from the image-processing chain. Build the design and export it to SDK.

 

We use the API xv_mix.h to configure the video mixer in SDK. This API provides the functions needed to control the video mixer.

 

The principle of the mixer is simple. There is a master layer and you declare the vertical and horizontal size of this layer using the API. For this example using the 10-inch display, we set the size to 1280 pixels by 800 lines. We can then fill this image space using the layers, either tiling or overlapping them as desired for our application.

 

Each layer has an alpha register to control blending along with X and Y origin registers and height and width registers. These registers tell the mixer how it should create the final image. Positional location for a layer that does not fill the entire display area is referenced from the top left of the display. Here’s an illustration:

 

 

 

Image3.jpg 

 

Video Mixing Layers, concept. Layer 7 is a reduced-size image in this example.
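In code terms, each layer therefore boils down to five values. The struct below is purely illustrative (the field names are invented; the real accesses go through xv_mix.h), shown here set up for the 200x200 test-pattern overlay demonstrated next:

#include <stdint.h>

/* Illustrative per-layer settings (names invented for clarity) */
struct mixer_layer {
    uint16_t x_origin;   /* pixels from the left edge of the master layer */
    uint16_t y_origin;   /* lines from the top of the master layer        */
    uint16_t width;      /* layer width in pixels                         */
    uint16_t height;     /* layer height in lines                         */
    uint8_t  alpha;      /* 0 = transparent ... 255 = opaque              */
};

/* A 200x200 TPG layer in the top-left corner of the 1280x800 master
   layer, fully opaque so it overrides the FLIR layer beneath it. */
static const struct mixer_layer tpg_layer = {
    .x_origin = 0,
    .y_origin = 0,
    .width    = 200,
    .height   = 200,
    .alpha    = 255,
};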

 

 

To demonstrate the effects of layering in action, I used the test pattern generator to create a 200x200-pixel checkerboard pattern with the video mixer’s TPG layer alpha set to opaque so that it overrides the FLIR image. Here’s what that looks like:

 

 

 

Image4.jpg

 

Test Pattern FLIR & Test Pattern Generator Layers merged, test pattern has higher alpha.

 

 

 

Then I set the alpha to a lower value, enabling merging of the two layers:

 

 

 

Image5.jpg 

 

Test Pattern FLIR & Generator Layers merged, test pattern alpha lower.

 

 

 

We can also use the video mixer to tile images as shown below. I added three more TPGs to create this image.

 

 

 

Image6.jpg 

 

Four tiled video streams using the mixer

 

 

The video mixer is a good tool to have in our toolbox when creating image-processing or display solutions. It is very useful if we want to merge the outputs of multiple cameras working in different parts of the electromagnetic spectrum. We’ll look at this sort of thing in future blogs.

 

 

You can find the example source code on GitHub.

 

Adam Taylor’s Web site is http://adiuvoengineering.com/.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

First Year E Book here

First Year Hardback here.

 

 

  

MicroZed Chronicles hardcopy.jpg 

 

 

Second Year E Book here

Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

Programmable logic is proving to be an excellent, flexible implementation medium for neural networks, one that gets faster as you move from floating-point to fixed-point representation. That makes it ideal for embedded AI and machine-learning applications. The latest proof point is a recently published paper by Yufeng Hao and Steven Quigley in the Department of Electronic, Electrical and Systems Engineering at the University of Birmingham, UK, titled “The implementation of a Deep Recurrent Neural Network Language Model on a Xilinx FPGA.” It describes the successful implementation and training of a fixed-point Deep Recurrent Neural Network (DRNN) using the Python programming language; the Theano math library and framework for multi-dimensional arrays; the open-source, Python-based PYNQ development environment; the Digilent PYNQ-Z1 dev board; and the Xilinx Zynq Z-7020 SoC on that board. Using a Python DRNN hardware-acceleration overlay, the two-person team achieved 20GOPS of processing throughput for an NLP (natural language processing) application and outperformed earlier FPGA-based implementations by factors ranging from 2.75x to 70.5x.

 

Most of the paper discusses NLP and the LM (language model), “which is involved in machine translation, voice search, speech tagging, and speech recognition.” The paper then discusses the implementation of a DRNN LM hardware accelerator using Vivado HLS and Verilog to synthesize a custom overlay for the PYNQ development environment. The resulting accelerator contains five Process Elements (PEs) capable of delivering 20 GOPS in this application. Here’s a block diagram of the design:

 

 

 

PYNQ DRNN Block Diagram.jpg

 

DRNN Accelerator Block Diagram

 

 

 

There are plenty of deep technical details embedded in this paper but this one sentence sums up the reason for this blog post about the paper: “More importantly, we showed that a software and hardware joint design and simulation process can be useful in the neural network field.” This statement is doubly true considering that the PYNQ-Z1 dev board sells for $229.

 

 

The Xilinx PYNQ Hackathon wrap video just went live.

by Xilinx Employee on 11-01-2017 12:05 PM

 

Twelve student and industry teams competed for 30 straight hours in the Xilinx Hackathon 2017 competition in early October and the 3-minute wrap video just appeared on YouTube. The video shows a lot of people having a lot of fun with the Zynq-based Digilent PYNQ-Z1 dev board and Python-based PYNQ development environment:

 

 

 

 

In the end, the prizes:

 

 

 

  • The “Murphy’s Law” prize for dealing with insurmountable circumstances went to Team Harsh Constraints.

 

  • The “Best Use of Programmable Logic” prize went to Team Caffeine.

 

  • The “Runner Up” prize went to Team Snapback.

 

  • The “Grand Prize” went to Team Questionable.

 

 

For detailed descriptions of the Hackathon entries, see “12 PYNQ Hackathon teams competed for 30 hours, inventing remote-controlled robots, image recognizers, and an air keyboard.”

 

 

And a special “Thanks!” to Sparkfun for supplying much of the Hackathon hardware. Sparkfun is headquartered just down the road from the Xilinx facility in Longmont, Colorado.

 

 

 

 

 

Xilinx has a terrific tool designed to get you from product definition to working hardware quickly. It’s called SDSoC. Digilent has a terrific dev board to get you up and running with the Zynq SoC quickly. It’s the low-cost Arty Z7. A new blog post by Digilent’s Alex Wong titled “Software Defined SoC on Arty Z7-20, Xilinx ZYNQ evaluation board” posted on RS Online’s DesignSpark site gives you a detailed, step-by-step tutorial on using SDSoC with the Digilent Arty Z7. In particular, the focus is on the ease of moving functions from software running on the Zynq SoC’s Arm Cortex-A9 processors to the Zynq SoC’s programmable hardware using Vivado HLS, which is embedded in SDSoC, so that you get the performance benefit of hardware-based task execution.

 

 

 

Digilent Arty Z7.jpg

 

Digilent’s Arty Z7 dev board

 

 

 

 

 

Adam Taylor’s MicroZed Chronicles Part 222, UltraZed Edition Part 20: Zynq Watchdogs

by Xilinx Employee on 10-30-2017 09:49 AM

 

By Adam Taylor

 

As engineers we cannot assume the systems we design will operate as intended 100% of the time. Unexpected events and failures do occur. Depending upon the application, the protections implemented against these unexpected failures vary. For example, a safety-critical system used for an industrial or automotive application requires considerably more failure-mode analysis and much better protection mechanisms than a consumer application. One simple protection mechanism that can be implemented quickly in any application is a watchdog, and this blog post describes the three watchdogs in the Zynq UltraScale+ MPSoC.

 

Watchdogs are intended to protect the processor against the software application crashing and becoming unresponsive. A watchdog is essentially a counter. In normal operation, the application software prevents this counter from reaching its terminal count by resetting it periodically. Should the application software crash and allow the watchdog to reach its terminal count by failing to reset the counter, the watchdog is said to have expired.

 

Upon expiration, the watchdog generates a processor reset or a non-maskable interrupt to enable the system to recover. Because the watchdog is designed to protect against software failures, it must be implemented physically in silicon. It cannot be a software counter, because the software that would maintain such a counter is exactly what the watchdog is meant to guard against.

 

Preventing the watchdog’s expiration can be complicated. We do not want crashes to be masked, resulting in the watchdog’s failure to trigger. It is good practice for the application software to restart the watchdog in the main body of the application. Using an interrupt service routine to restart the watchdog opens the possibility of the main application crashing while the ISR continues to be serviced. In that situation, the watchdog keeps being restarted even though no recovery from the crash takes place.

 

The Zynq UltraScale+ MPSoC provides three watchdogs in its processing system (PS):

 

  • Full Power Domain (FPD) Watchdog protecting the APU and its interconnect
  • Low Power Domain (LPD) Watchdog protecting the RPU and its interconnect
  • Configuration and Security Unit (CSU) Watchdog protecting the CSU and its interconnect

 

The FPD and LPD watchdogs can be configured to generate a reset, an interrupt, or both should a timeout occur. The CSU watchdog can only generate an interrupt, which can then be actioned by the APU, RPU, or PMU. We use the PMU to manage the effects of a watchdog timeout, configuring it to act via its global registers.

 

The FPD and LPD watchdogs are clocked either from an internal 100MHz clock or from an external source connected to the MIO or EMIO. The FPD and LPD watchdogs can output a reset signal via MIO or EMIO. This is helpful if we wish to alert functions in the PL that a watchdog timeout has occurred.

 

Each watchdog is controlled by four registers:

 

  • Zero Mode Register – This control register enables the watchdog and enables generation of reset and interrupt signals along with the ability to define the reset and interrupt pulse duration.
  • Counter Control Register – This counter configuration register sets the counter reset value and clock pre-scaler.
  • Restart Register – This write-only register takes a specific key to restart the watchdog.
  • Status Register – This read-only register indicates if the watchdog has expired.

 

To ensure that writes to the Zero Mode, Counter Control, and Restart registers are intentional and not the result of an errant software operation, write accesses to these registers require a specific key, different for each register, to be included in the written data word for the write to take effect.

 

 

Image1.jpg

 

Zynq UltraScale+ MPSoC Timer and Watchdog Architecture

 

 

 

To include the FPD or LPD watchdogs in our design, we need to enable them in Vivado. You do so using the I/O configuration tab of the MPSoC customization dialog.

 

 

 

Image2.jpg

 

Enabling the SWDT (System Watchdog Timer) in the design

 

 

 

For my example in this blog post, I enabled the external resets and connected them to an ILA within the PL so that I can capture the reset signal when it’s generated.

 

 

Image3.jpg

 

Simple Test Design for the Zynq UltraScale+ MPSoC watchdog

 

 

 

To configure and use the watchdog, we use SDK and the API defined in xwdtps.h. This API allows us to configure, initialize, start, restart, and stop the selected watchdog with ease.
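As a rough guide, a minimal bare-metal sketch using the xwdtps.h driver might look like the following. The device-ID macro and the counter value are illustrative; take the real parameter name from xparameters.h for your build:

#include "xparameters.h"
#include "xstatus.h"
#include "xwdtps.h"

XWdtPs Watchdog;

int start_watchdog(void)
{
    /* Device ID macro is illustrative; check xparameters.h */
    XWdtPs_Config *Cfg = XWdtPs_LookupConfig(XPAR_XWDTPS_0_DEVICE_ID);
    if (Cfg == NULL)
        return XST_FAILURE;
    XWdtPs_CfgInitialize(&Watchdog, Cfg, Cfg->BaseAddress);

    /* Expiry drives a reset rather than an interrupt */
    XWdtPs_EnableOutput(&Watchdog, XWDTPS_RESET_SIGNAL);

    /* Load the counter restart value (value chosen arbitrarily here) */
    XWdtPs_SetControlValue(&Watchdog, XWDTPS_COUNTER_RESET, 0x0FFF);

    XWdtPs_Start(&Watchdog);
    return XST_SUCCESS;
}

/* Call from the main loop, never from an ISR (see the discussion above) */
void service_watchdog(void)
{
    XWdtPs_RestartWdt(&Watchdog);
}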

 

To use the watchdog to its fullest extent, we also need to configure the PMU to respond to the watchdog error. This is simple and requires writes to the PMU error registers (ERROR_SRST_EN_1 and ERROR_EN_1), enabling the PMU to respond to a watchdog timeout. This also causes the PS to assert its Error Out signal, which illuminates LED D3 on the UltraZed SOM when the timeout occurs.

 

For this example, I also used the PMU persistent global registers, which are cleared only by a power-on reset, to keep a count of watchdog events. This count increments each time a watchdog timeout occurs. After the example design has reset six times, the code finally begins to restart the watchdog and stays alive.

 

Because the watchdog causes a reset event each time the processor reboots, we must take care to clear the previously recorded error. Failure to do so results in a permanent reset cycle, because the timeout error is only cleared by a power-on reset or by a software action that clears the error. To prevent this, we clear the watchdog fault indication in the PMU’s global ERROR_STATUS_1 register at the start of the program.
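In code, clearing the latched error can be as simple as one register write. The register address and bit mask below are placeholders that I have not verified; take the real values for the PMU_GLOBAL ERROR_STATUS_1 register from the Zynq UltraScale+ MPSoC register reference (UG1087):

#include "xil_io.h"

/* Placeholder address and mask -- verify against UG1087 before use */
#define PMU_GLOBAL_ERROR_STATUS_1   0xFFD80530U   /* assumed address      */
#define SWDT_ERROR_MASK             (1U << 0)     /* assumed bit position */

/* ERROR_STATUS_1 is write-to-clear: writing the fault bit at the start
   of main() prevents the PMU from re-asserting the reset */
static void clear_watchdog_error(void)
{
    Xil_Out32(PMU_GLOBAL_ERROR_STATUS_1, SWDT_ERROR_MASK);
}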

 

 

 

Image4.jpg

 

Reset signal being issued to the programmable logic

 

 

 

Observing the terminal output after I created my example application, it was obvious that the timeout was occurring and the number of occurrences was incrementing. The screen shot below shows the initial error status register and the error mask register. The error status register is shown twice. The first instance shows the watchdog error in the system and the second confirms that it has been cleared. The reset reason also indicates the cause of the reset. In this case it’s a PS reset.

 

 

 

Image5.jpg

 

 

Looping Watchdog and incrementing count

 

 

 

We have touched lightly on the PMU’s error-handling capabilities. We will explore these capabilities more in a future blog.

 

Meanwhile, you can find the example source code on GitHub.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

First Year E Book here

First Year Hardback here.

 

 

  

MicroZed Chronicles hardcopy.jpg 

 

 

Second Year E Book here

Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

National Instruments’ (NI’s) PXI FlexRIO modular instrumentation product line has been going strong for more than a decade and the company has just revamped its high-speed PXI analog digitizers and PXI digital signal processors by upgrading to high-performance Xilinx Kintex UltraScale FPGAs. According to NI’s press release, “With Kintex UltraScale FPGAs, the new FlexRIO architecture offers more programmable resources than previous Kintex-7-based FlexRIO modules. In addition, the new mezzanine architecture fits both the I/O module and the FPGA back end within a single, integrated 3U PXI module. For high-speed communication with other modules in the chassis, these new FlexRIO modules feature PCIe Gen 3 x8 connectivity for up to 7 GB/s of streaming bandwidth.” (Note that all Kintex UltraScale FPGAs incorporate hardened PCIe Gen1/2/3 cores.)

 

The change was prompted by a tectonic shift in analog converter interface technology—away from parallel LVDS converter interfaces and towards newer, high-speed serial protocols like JESD204B. As a result, NI’s engineers re-architected the PXI module by splitting it into interface-compatible Mezzanine I/O modules and a trio of back-end FPGA carrier/PCIe-interface cards—NI calls them “FPGA back ends”—based on three pin-compatible Kintex UltraScale FPGAs: the KU035, KU040, and KU060 Kintex UltraScale devices. These devices allow NI to offer three different FPGA resource levels with the same PCB design.

 

 

NI FlexRIO Digitizers based on Kintex UltraScale FPGAs.jpg

 

 

 

Modular NI PXI FlexRIO Module based on Xilinx Kintex UltraScale FPGAs

 

 

 

The new products in the revamped FlexRIO product line include:

 

Digitizer Modules – The new PXI FlexRIO digitizers offer higher-speed sample rates and wide bandwidth without compromising dynamic range. The 16-bit, 400MHz PXIe-5763 and PXIe-5764 operate at 500Msamples/sec and 1Gsamples/sec respectively. (Note: NI previewed these modules earlier this year at NI Week. See “NI’s new FlexRIO module based on Kintex UltraScale FPGAs serves as platform for new modular instruments.”)

 

 

NI FlexRIO Digitizers based on Kintex UltraScale FPGAs.jpg 

 

 

  • Coprocessor Modules – Kintex UltraScale PXI FlexRIO Coprocessor Modules add real-time signal processing capabilities to a system. A chassis full of these modules delivers high-density computational resources—the most computational density ever offered by NI.

 

  • Module Development Kit – You can use NI’s LabVIEW to program these PXI modules, and you can use the Xilinx Vivado Project Export feature included with LabVIEW FPGA 2017 to develop, simulate, and compile custom I/O modules for more performance or to meet unique application requirements. (More details are available from NI here.)

 

Here’s a diagram showing you the flow for LabVIEW 2017’s Xilinx Vivado Project Export capability:

 

 

 

 NI Vivado Project Export for LabVIEW 2017.jpg

 

 

 

NI’s use of three Kintex UltraScale FPGA family members to develop these new FlexRIO products illustrates several important benefits associated with FPGA-based design.

 

First, NI has significantly future-proofed its modular PXIe FlexRIO design by incorporating the flexible, programmable I/O capabilities of Xilinx FPGAs. JESD204B is an increasingly common analog interface standard, easily handled by the Kintex UltraScale FPGAs. In addition, those FPGA I/O pins driving the FlexRIO Mezzanine interface connector are bulletproof, fully programmable, 16.3Gbps GTH/GTY ports that can accommodate a very wide range of high-speed interfaces—nearly anything that NI’s engineering teams might dream up in years to come.

 

Second, NI is able to offer three different resource levels based on three pin-compatible Kintex UltraScale FPGAs using the same PCB design. This pin compatibility is not accidental. It’s deliberate. Xilinx engineers labored to achieve this compatibility just so that system companies like NI could benefit. NI recognized and took advantage of this pre-planned compatibility. (You'll find that same attention to detail in every one of Xilinx's All Programmable device families.)

 

Third, NI is able to leverage the same Kintex UltraScale FPGA architecture for its high-speed Digitizer Modules and for its Coprocessor Modules, rather than using two entirely dissimilar chips—one for I/O control in the digitizers and one for the programmable computing engine Coprocessor Modules. The same programmable Kintex UltraScale FPGA architecture suits both applications well. The benefit here is the ability to develop common drivers and other common elements for both types of FlexRIO module.

 

 

For more information about these new NI FlexRIO products, please contact NI directly.

 

 

Note: Xcell Daily has covered the high-speed JESD204B interface repeatedly in the past. See:

 

 

 

 

 

 

 

 

 

 

 

Earlier this month, I described Aaware’s $199 Far-Field Development Platform for cloud-based, voice-controlled systems such as Amazon’s Alexa and Google Home. (See “13 MEMS microphones plus a Zynq SoC gives services like Amazon’s Alexa and Google Home far-field voice recognition clarity.”) This far-field, sound-capture technology exhibits some sophisticated abilities including:

 

  1. The ability to cancel interfering noise without a reference signal. (Competing solutions focus on AEC, acoustic echo cancellation, which cancels noise relative to a required audio reference channel.)
  2. Support for non-uniform 1D and 2D microphone array spacing.
  3. The ability to scale up to more microphones for noisier environments.
  4. A one-chip solution for sound capture, multiple wake words, and customer applications. (Today this is a two-chip solution.)
  5. A “software-ready” environment: just log in to the Ubuntu Linux environment and use Aaware’s streaming audio API to begin application development.

 

 

Aaware Far Field Development PLatform.jpg 

 

Aaware’s Far-Field Development Platform

 

 

 

These features are layered on top of a Xilinx Zynq SoC or Zynq UltraScale+ MPSoC and Aaware’s CTO Chris Eddington feels that the Zynq devices provide “well over” 10x the performance of an embedded processor thanks to the devices’ on-chip programmable logic, which offloads a significant amount of processing from the on-chip ARM Cortex processor(s). (Aaware can squeeze its technology into a single-core Zynq Z-7007S SoC and can scale up to larger Zynq SoC and Zynq UltraScale+ MPSoC devices as needed by the customer application.)

 

Aaware’s algorithm development is based on a unique tool chain:

 

  • Algorithm development in MathWorks’ MATLAB.
  • Hand-coding of an equivalent application in C++.
  • Initial hardware-accelerator synthesis from the C++ specification using Vivado HLS.
  • Use of Xilinx SDSoC to connect the hardware accelerators to the AXI bus and memory.

 

 

This tool chain allows Aaware to fit the features it wants into the smallest Zynq Z-7007S SoC or to scale up to the largest Zynq UltraScale+ MPSoC.

 

 

 

 

 

 

 

By Adam Taylor

 

In all of our previous MicroBlaze soft processor examples, we have used JTAG to download and execute the application code via SDK. You can’t do that in the real world. To deploy a real system, we need the MicroBlaze processor to load and boot its application code from non-volatile memory without our intervention. I thought that showing you how to do this would make for a pretty interesting blog. So using the $99 Digilent Arty A7 board, which is based on a Xilinx Artix-7 FPGA, I am going to demonstrate the steps you need to take using the Arty’s on-board QSPI Flash memory. We will store both the bitstream configuration file and the application software in the QSPI flash.

 

The QSPI therefore has two roles:

 

  • Configure the Artix FPGA
  • Store the application software

 

For the first role, we do not need to include a QSPI interface in our Vivado design. All we need to do is update the Vivado configuration settings to QSPI, provided the QSPI flash memory is connected to the FPGA’s configuration pins. However, we need to include a QSPI interface in our design to interface with the QSPI Flash memory once the FPGA is configured and the MicroBlaze soft processor is instantiated. This addition allows a bootloader to copy the application software from the QSPI Flash memory to the Arty’s DDR SDRAM where it actually executes.

 

Of course, this raises the question:

 

Where does the MicroBlaze bootloader come from?

 

The process flowchart for developing the bootloader looks like this:

 

 

 

Image1.jpg

 

Development Flow Chart

 

 

Our objective is to create a single MCS image containing both the FPGA bitstream and the application software that we can burn into the QSPI Flash memory. To do this, we need to perform the following steps in Vivado and SDK:

 

 

  • Include a QSPI Interface within the existing Vivado MicroBlaze design

 

  • Edit the device settings in Vivado to configure the device using Master SPI_4 and to compress the bit file; then build the design and export it to SDK

 

 

Image2.jpg 

 

 

 

  • In SDK, create a new application project based on the exported hardware design. In the project-creation dialog, select the SREC SPI Bootloader template. This creates an SREC bootloader application that loads the main application code from QSPI Flash memory. Before we build the bootloader ELF, we first need to define the address offset from the base address of the QSPI to the location of the application software; in this case it is 0x600000. We define this offset within blconfig.h (see the snippet after the screenshot below). We also need to update the SREC Bootloader BSP to identify the correct serial Flash memory device family by reconfiguring the BSP. The family identification number is defined within xilisf.h, available under the BSP libsrc directory. The Arty board uses a Micron QSPI device, so we select type 5.

 

 

 Image3.jpg
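For reference, the offset definition in blconfig.h is a one-liner; the macro name here is the one I recall the SREC bootloader template using, so verify it against the generated file:

/* blconfig.h: offset of the application S-record image in QSPI flash.
   0x600000 is the offset chosen for this example. */
#define FLASH_IMAGE_BASEADDR 0x600000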

 

 

 

  • Now create a second application project in SDK. This is the application we are going to load using the bootloader. For this example, I created a simple “hello world” application, ensuring via the linker file that this program runs from DDR SDRAM. To create the single MCS file, we need the application software to be in the S-record format, which stores binary information as ASCII. (The format is now 40 years old and was originally developed for the 8-bit Motorola 6800 microprocessor.) We can use SDK to convert the generated ELF into S-record format. To generate the S-record file in SDK, open a bash shell and enter the following command in the directory that contains the application ELF:

 

cmd /c mb-objcopy -O srec <app>.elf <app>.srec

 

 

  • With the bootloader ELF created, we now need to merge the bitstream with the bootloader ELF in Vivado. This step allows the bootloader to be loaded into and run from the MicroBlaze processor’s local memory following configuration. Because this memory is small, the bootloader application must also be small. If you are experiencing problems reducing the size of the software application, consider using a compiler optimisation before increasing the local memory size.

 

 

Image4.jpg

 

 

 

  • With the merged bit file created and the S-record file available, use Vivado’s hardware manager to add the configuration memory:

 

 

Image5.jpg

 

 

  • The final step is to generate the unified MCS file containing the merged bitstream and the application software. When generating this file, we need to remember to load the application software using the same offset used in the SREC bootloader.

 

 

 Image6.jpg

 

 

 

Once the file is built and burned into the QSPI memory, we can test that the MCS file works by connecting the Arty board to a terminal and pressing the board’s reset button. After a few seconds you should see the Arty board’s “done” LED illuminate and then the output of the SREC bootloader appear in the terminal window. This output should show that the S-records have been loaded from QSPI memory into DDR SDRAM before the program executes.

 

We now have a working MicroBlaze system that we can deploy in our designs.

 

 


 

The project, as always, is on GitHub.

 

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

First Year E Book here

First Year Hardback here.

 

 

  

MicroZed Chronicles hardcopy.jpg 

 

 

Second Year E Book here

Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

 

 

 

The free, Web-based airhdl register file generator from noasic GmbH, an FPGA design and coaching consultancy and EDA tool developer, uses a simple, online definition tool to create register definitions from which the tool automatically generates HDL, a C header file, and HTML documentation. The company’s CEO Guy Eschemann has been working with FPGAs for more than 15 years, so he knows first-hand how important bulletproof register definitions are to design success. His company noasic is a member of the Xilinx Alliance Program.

 

What’s the big deal about creating registers? Many complex FPGA-based designs now require hundreds or even thousands of registers to operate and monitor a system, and keeping these register definitions straight and properly documented, especially in the face of engineering changes, is a tough challenge for any design team.

 

The best way I’ve seen to put register definitions in context comes from the book “Hardware/Firmware Interface Design: Best Practices for Improving Embedded Systems Development” written by my friend Gary Stringham:

 

 

“The hardware/firmware interface is the junction where hardware and firmware meet and communicate with each other. On the hardware side, it is a collection of addressable registers that are accessible to firmware via reads and writes. This includes the interrupts that notify firmware of events. On the firmware side, it is the device drivers or the low-level software that controls the hardware by writing values to registers, interprets the information read from the registers, and responds to interrupt requests from the hardware. Of course, there is more to hardware than registers and interrupts, and more to firmware than device drivers, but this is the interface between the two and where engineers on both sides must be concerned to achieve successful integration.”

 

 

The airhdl EDA tool from noasic is designed to help your hardware and software/firmware teams “achieve successful integration” by creating a central nexus for defining the critical, register-based hardware/firmware interface. It uses a single register map (with built-in version control) to create the HDL register definitions, the C header file for firmware’s use of those registers, and the HTML documentation that both the hardware and software/firmware teams will need to properly integrate the defined registers into a design.
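To make the idea concrete, here is the flavor of C header such a generator emits for firmware use. This is an illustrative sketch of the general pattern, not airhdl’s actual output format; all names and offsets are invented:

/* Illustrative register-map header (names and offsets invented) */
#include <stdint.h>

#define MYIP_CTRL_OFFSET    0x0000u     /* control register   */
#define MYIP_STATUS_OFFSET  0x0004u     /* status register    */
#define MYIP_CTRL_ENABLE    (1u << 0)   /* enable bit in CTRL */

/* Set the enable bit, leaving the other control bits untouched */
static inline void myip_enable(volatile uint32_t *base)
{
    base[MYIP_CTRL_OFFSET / 4u] |= MYIP_CTRL_ENABLE;
}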

 

 

Here’s an 11-minute video made by noasic to explain the airhdl EDA tool:

 

 

 

 

 

Consider signing up for access to this free tool. It will very likely save you a lot of time and effort.

 

 

For more information about airhdl, use the links above or contact noasic GmbH directly.

 

 

By Adam Taylor

 

One ongoing area we have been examining is image processing. We’ve looked at algorithms and how to capture images from different sources. A few weeks ago, we looked at the different methods we could use to receive HDMI data and followed up with an example using an external CODEC (P1 & P2). In this blog, we are going to look at using internal IP cores to receive HDMI images in conjunction with the Analog Devices AD8195 HDMI buffer, which equalizes the line. Equalization is critical when using long HDMI cables.

 

 

Image1.jpg 

 

Nexys board, FMC HDMI and the Digilent PYNQ-Z1

 

 

 

To do this I will be using the Digilent FMC HDMI card, which provisions one of its channels with an AD8195. The AD8195 on the FMC HDMI card needs a 3v3 supply, which is not available on the ZedBoard unless I break out my soldering iron. Instead, I broke out my Digilent Nexys Video trainer board, which comes fitted with an Artix-7 FPGA and an FMC connector. This board has built-in support for HDMI RX and TX, but the HDMI RX path on this board supports only 1m of HDMI cable while the AD8195 on the FMC HDMI card supports cable runs of up to 20m, which is far more useful in many distributed applications. So we’ll add the FMC HDMI card.

 

First, I instantiated a MicroBlaze soft microprocessor system in the Nexys Video card’s Artix-7 FPGA to control the simple image-processing chain needed for this example. Of course, you can implement the same approach to the logic design that I outline here using a Xilinx Zynq SoC or Zynq UltraScale+ MPSoC. The Zynq PS simply replaces the MicroBlaze.

 

 The hardware design we need to build this system is:

 

  • MicroBlaze controller with local memory, AXI UART, MicroBlaze Interrupt controller, and DDR Memory Interface Generator.
  • DVI2RGB IP core to receive the HDMI signals and convert them to a parallel video format.
  • Video Timing Controller, configured for detection.
  • ILA connected between the VTC and the DVI2RGB cores, used for verification.
  • Clock Wizard used to generate a 200MHz clock, which supplies the DDR MIG and DVI2RGB cores. All other cores are clocked by the MIG UI clock output.
  • Two 3-bit GPIO modules. The first module sets VADJ to 3v3 on the HDMI FMC. The second module enables the AD8195 and provides hot-plug detection.

 

 

Image2.jpg 

 

 

 

The final step in this hardware build is to map the interface pins from the AD8195 to the FPGA’s I/O pins through the FMC connector. We’ll use the TMDS_33 SelectIO standard for the HDMI clock and data lanes.

 

Once the hardware is built, we need to write some simple software to perform the following steps (a minimal sketch follows the list):

 

 

  • Disable the VADJ regulator using pin 2 on the first GPIO port.
  • Set the desired output voltage on VADJ using pins 0 & 1 on the first GPIO port.
  • Enable the VADJ regulator using pin 2 on the first GPIO port.
  • Enable the AD8195 using pin 0 on the second GPIO port.
  • Enable pre-equalization using pin 1 on the second GPIO port.
  • Assert the Hot-Plug Detection signal using pin 2 on the second GPIO port.
  • Read the registers within the VTC to report the modes and status of the video received.
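Condensed into code, those steps amount to a handful of AXI GPIO writes. This is a minimal sketch assuming the xgpio.h driver; the device-ID macros and the bit pattern for the VADJ voltage-select pins are illustrative, so check xparameters.h and the FMC HDMI schematic for your build:

#include "xparameters.h"
#include "xgpio.h"

XGpio GpioVadj;   /* first 3-bit GPIO: VADJ control    */
XGpio GpioHdmi;   /* second 3-bit GPIO: AD8195 control */

void setup_hdmi_rx(void)
{
    /* Device IDs are illustrative; check xparameters.h */
    XGpio_Initialize(&GpioVadj, XPAR_AXI_GPIO_0_DEVICE_ID);
    XGpio_Initialize(&GpioHdmi, XPAR_AXI_GPIO_1_DEVICE_ID);
    XGpio_SetDataDirection(&GpioVadj, 1, 0x0);  /* all three bits as outputs */
    XGpio_SetDataDirection(&GpioHdmi, 1, 0x0);

    XGpio_DiscreteWrite(&GpioVadj, 1, 0x0);  /* bit 2 low: VADJ disabled          */
    XGpio_DiscreteWrite(&GpioVadj, 1, 0x3);  /* bits 0-1: 3v3 select (assumed)    */
    XGpio_DiscreteWrite(&GpioVadj, 1, 0x7);  /* bit 2 high: VADJ enabled          */

    /* bit 0: AD8195 enable, bit 1: pre-equalization, bit 2: hot-plug detect */
    XGpio_DiscreteWrite(&GpioHdmi, 1, 0x7);
}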

 

 

To test this system, I used a Digilent PYNQ-Z1 board to generate different video modes. The first step in verifying that this interface is working is to use the ILA to check that the pixel clock is received and that its DLL is locked, along with generating horizontal and vertical sync signals and the correct pixel values.

 

Provided the sync signals and pixel clock are present, the VTC will be able to detect and classify the video mode. The application software will then report the detected mode via the terminal window.
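Reading the detected timing back out is a short exercise with the xvtc.h driver. A minimal sketch, assuming a VTC instance named in xparameters.h (the device-ID macro below is illustrative):

#include "xparameters.h"
#include "xvtc.h"
#include "xil_printf.h"

XVtc Vtc;

void report_detected_mode(void)
{
    XVtc_Timing Timing;

    /* Device ID macro is illustrative; check xparameters.h */
    XVtc_Config *Cfg = XVtc_LookupConfig(XPAR_VTC_0_DEVICE_ID);
    XVtc_CfgInitialize(&Vtc, Cfg, Cfg->BaseAddress);
    XVtc_EnableDetector(&Vtc);

    /* Pull the measured timing from the detector and report it */
    XVtc_GetDetectorTiming(&Vtc, &Timing);
    xil_printf("Detected %u x %u\r\n",
               Timing.HActiveVideo, Timing.VActiveVideo);
}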

 

 

Image3.jpg

 

ILA Connected to the DVI to RGB core monitoring its output

 

 

 

Image4.jpg 

 

 

Software running on the Nexys Video detecting SVGA mode (800 pixels by 600 lines)

 

 

 

With the correct video mode being detected by the VTC, we can now configure a VDMA write channel to move the image from the logic into a DDR frame buffer.

 

 

You can find the project on GitHub.

 

 

If you are working with video applications you should also read these:

 

 

PL to PS using VDMA

What to do if you have VDMA issues  

Creating a MicroBlaze System Video

Writing MicroBlaze Software  

 

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

First Year E Book here

First Year Hardback here.

 

 

 

MicroZed Chronicles hardcopy.jpg  

 

 

 

Second Year E Book here

Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

Avnet’s MiniZed SpeedWay Design Workshops are designed to help you jump-start your embedded design capabilities using Xilinx Zynq Z-7000S All Programmable SoCs, which meld a processing system based on a single-core, 32-bit, 766MHz Arm Cortex-A9 processor with plenty of Xilinx FPGA fabric. Zynq SoCs are just the thing when you need to design high-performance embedded systems or need to use a processor along with some high-speed programmable logic. Even better, these Avnet workshops focus on using the Avnet MiniZed—a compact, $89 dev board packed with huge capabilities including built-in WiFi and Bluetooth wireless connectivity. (For more information about the Avnet MiniZed dev board, see “Avnet’s $89 MiniZed dev board based on Zynq Z-7000S SoC includes WiFi, Bluetooth, Arduino—and SDSoC! Ships in July.”)

 

These workshops start in November and run through March of next year and there are four full-day workshops in the series:

 

  • Developing Zynq Software
  • Developing Zynq Hardware
  • Integrating Sensors on MiniZed with PetaLinux
  • A Practical Guide to Getting Started with Xilinx SDSoC

 

You can mix and match the workshops to meet your educational requirements. Here’s how Avnet presents the workshop sequence:

 

 

Avnet MiniZed Workshops.jpg 

 

 

 

These workshops are taking place in cities all over North America including Austin, Dallas, Chicago, Montreal, Seattle, and San Jose, CA. All cities will host the first two workshops. Montreal and San Jose will host all four workshops.

 

A schedule for workshops in other countries has yet to be announced. The Web page says “Coming soon” so please contact Avnet directly for more information.

 

Finally, here’s a 1-minute YouTube video with more information about the workshops:

 

 

 

 

For more information on and to register for the Avnet MiniZed SpeedWay Design Workshops, click here.

 

 

The Vivado 2017.3 HLx Editions are now available and the Vivado 2017.3 Release Notes (UG973) tells you why you’ll want to download this latest version now. I’ve scanned UG973, watched the companion 20-minute Quick Take video, and cherry-picked twenty of the many enhancements that jumped out at me to help make your decision easier:

 

 

  • Reduced compilation time with a new incremental compilation capability

 

  • Improved heuristics to automatically choose between high-reuse and low-reuse modes for incremental compilation

 

  • Verification IP (VIP) now included as part of pre-compiled IP libraries including support for AXI-Stream VIP

 

  • Enhanced ability to integrate RTL designs into IP Integrator using drag-and-drop operations. No need to run packager any more.

 

  • Checks put in place to ensure that IP is available when invoking write_bd_tcl command

 

  • write_project_tcl command now includes Block designs if they are part of the project

 

  • Hard 100GE Subsystem awareness for the VCU118 UltraScale+ Board with added assistance support

 

  • Hard Interlaken Subsystem awareness for the VCU118 UltraScale+ Board with assistance support

 

  • Support added for ZCU106 and VCU118 production reference boards

 

  • FMC Support added to ZCU102 and ZCU106 reference boards

 

  • Bus skew reports (report_bus_skew) from static timing analysis now available in the Vivado IDE

 

  • Enhanced ease of use for debug over PCIe using XVC

 

  • Partial Reconfiguration (PR) flow support for all UltraScale+ devices in production

 

  • Support for optional flags (FIFO Almost Full, FIFO Almost Empty, Read Data Valid) in XPM FIFO

 

  • Support for different read and write widths while using Byte Write Enable in XPM Memory

 

  • New Avalon to AXI bridge IP

 

  • New USXGMII Subsystem that switches 10M/100M/1G/2.5G/5G/10G on 10GBASE-R 10G GT line rate for NBASE-T standard

 

  • New TSN (Time-Sensitive Network) subsystem

 

  • Simplified interface for DDR configuration in the Processing Systems Configuration Wizard

 

  • Fractional Clock support for DisplayPort Audio and Video to reduce BOM costs

 

 

 

 

By Adam Taylor

 

 

We really need an operating system to harness the capabilities of the two or four 64-bit Arm Cortex-A53 processor cores in the Zynq UltraScale+ MPSoC APU (Application Processing Unit). An operating system enables effective APU use by providing SMP (Symmetric Multi-Processing), multitasking, networking, security, memory management, and file system capabilities. Immediate availability of these resources saves us considerable coding and debugging time and allows us to focus on developing the actual application. Putting it succinctly: don’t reinvent the wheel when it comes to operating systems.

 

 

 

Image1.jpg 

 

Avnet UltraZed-EG SOM plugged into an IOCC (I/O Carrier Card)

 

 

 

PetaLinux is one of the many operating systems that run on the Zynq UltraScale+ MPSoC’s APU. I am going to focus this blog on what we need to do to get PetaLinux up and running on the Zynq UltraScale+ MPSoC using the Avnet UltraZed-EG SoM. However, the process is the same for any design.

 

To do this we will need:

 

 

  • A Zynq UltraScale+ MPSoC Design: For this example, I will use the design we created last week
  • A Linux machine or Linux Virtual Machine
  • PetaLinux and the Xilinx software command-line tool chain installed to configure the PetaLinux OS and perform the build

 

 

To ensure that it installs properly, do not use root to install PetaLinux. In other words, do not use the sudo command. If you want to install PetaLinux in the /opt/pkg directory as recommended in the installation guide, you must first change the directory permissions for your user account.  Alternatively, you can install PetaLinux in your home area, which is exactly what I did.

 

With PetaLinux installed, run the settings script in a terminal window (source settings.sh), which sets the environment variables that allow us to call PetaLinux commands.

 

When we build PetaLinux, we get a complete solution. The build process creates the Linux image, the device tree blob, and a RAM disk, combined into a single FIT image. PetaLinux also generates the PMU firmware, the FSBL (First Stage Boot Loader), and the U-Boot executables needed to create the boot.bin.

 

We need to perform the following steps to create the FIT image and boot files:

 

 

  • Export the Vivado hardware definition. This will export the hardware definition file under the directory <project_name.sdk> within the Vivado project

 

  • Create a new PetaLinux project. We are creating a design for a custom board (i.e. there is no PetaLinux BSP), so we will use the command petalinux-create and use the zynqMP template:

 

 

petalinux-create --type project --template zynqMP --name part219

 

 

  • Import the hardware definition from Vivado to configure the PetaLinux project. This provides not only the bit file and HDF but also the information used to create the device trees

 

petalinux-config --get-hw-description=<path-to-vivado-hardware-description-file>

 

 

 

  • This opens the PetaLinux configuration menu, where you should review the Linux kernel and U-Boot settings. For the basic build in this example, we do not need to change anything.

 

 

 

Image2.jpg

 

PetaLinux configuration menu presented once the hardware definition is imported

 

 

 

 

  • If desired, you can also review the configuration of the constituent parts (U-Boot, the PMU firmware, the device tree, or the root file system) using the command:

 

petalinux-config -c <u-boot | pmufw | device-tree | rootfs>

 

 

  • Build the PetaLinux system image using the command:

 

 

petalinux-build

 

 

This might take some time.

 

 

  • Create the boot.bin file.

 

 

petalinux-package --boot --fsbl zynqmp_fsbl.elf --u-boot u-boot.elf --pmufw pmufw.elf --fpga fpga.bit

 

 

 

Once we have created the image file and the bin file, we can copy them to an SD card and boot the UltraZed-EG SOM.

 

Just as we simulate our logic designs first, we can test the PetaLinux image within our Linux build environment using QEMU. This allows us to verify that the image we have created will load correctly.

 

We run QEMU by issuing the following command:

 

 

petalinux-boot --qemu --image <Linux-image-file>

 

 

 

Image3.jpg

 

The created PetaLinux image running in QEMU

 

 

 

Once we can see the PetaLinux image booting correctly in QEMU, the next step is to try it on the UltraZed-EG SOM. Copy the image.ub and boot.bin files to an SD card, configure the UltraZed-EG SOM mounted on the IOCC (I/O Carrier Card) to boot from the SD card, insert the SD card, and apply power.

 

If everything has been done correctly, you should see the FSBL load the PMU firmware in the terminal window. Then, U-Boot should run and load the Linux kernel.

 

 

 

Image4.jpg 

 

Linux running on the UltraZed-EG SOM

 

 

Once the boot process has completed, we can log in using the user name and password of root, and then begin using our PetaLinux environment.

 

Now that we have the Zynq UltraScale+ MPSoC’s APU up and running with PetaLinux, we can use OpenAMP to download and execute programs in the Zynq UltraScale+ MPSoC’s dual-core ARM Cortex-R5 RPU (Real-time Processing Unit). We will explore how we do this in another blog.

 

Meanwhile, the following links were helpful in the generation of this image:

 

 

https://www.xilinx.com/support/answers/68390.html

 

https://www.xilinx.com/support/answers/67158.html

 

https://wiki.trenz-electronic.de/display/PD/Petalinux+Troubleshoot#PetalinuxTroubleshoot-Petalinux2016.4

 

https://www.xilinx.com/support/documentation/sw_manuals/xilinx2017_2/ug1156-petalinux-tools-workflow-tutorial.pdf

 

 

 

 

The project, as always, is on GitHub.

 

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

First Year E Book here

First Year Hardback here.

 

 

 

MicroZed Chronicles hardcopy.jpg  

 

 

 

Second Year E Book here

Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

Brian Mathewson, Verification Technologist, Mentor Graphics

 

 

FPGA-based designs increasingly use advanced, multi-clocking architectures to meet high-performance throughput and computational requirements. However, an RTL or gate-level simulation of a design with multiple clock domains does not accurately capture the timing associated with data transfers between clock domains. Specifically, simulation does not model metastability as signals cross asynchronous clock domains. Consequently, bugs can escape the front-end verification process.

 

In the lab, these bugs can look like intermittent static-timing issues and you can spend weeks of expensive lab time chasing after them before you realize that CDC (clock domain crossing) metastability is the culprit. Fortunately, there are automated verification solutions that can apply exhaustive formal analysis to root out CDC issues up front.

 

Mentor recently developed an app for the Questa CDC verification tool that directly integrates with the Xilinx Vivado Design Suite. This new flow helps you analyze CDC paths more quickly by providing deeper CDC analysis. You can install the Questa CDC app from the Xilinx Vivado Tcl Store. The app seamlessly integrates with Vivado and automatically creates the configuration files necessary to set up and run Mentor’s Questa CDC tool based on the FPGA design you have loaded into the Vivado Design Suite. Once the utility generates the CDC setup files, you can then launch Questa CDC from within Vivado with the click of a button.

 

In a real-life case study, a design team ran Questa CDC on a Xilinx FPGA-based design and generated initial results in less than one day. The structural analysis uncovered timing violations and issues that fell into three categories:

 

 

  1. Design problems included missing synchronization structures and incorrect synchronizers. The incorrect synchronizers involved combinational logic violations.
  2. Stable transmitting signals that will not toggle during normal operation (i.e., paths that were already “CDC-safe”). Waivers were created for these paths to skip them in subsequent runs of the flow. The Questa CDC waiver flow allows teams to track and manage waivers, so you can review and validate all waivers and assumptions in your projects.
  3. Questionable paths where designers were unsure whether signals were stable. In this case, CDC protocol assertions were generated and validated in simulation.

 

 

Furthermore, this FPGA-based design supported two operating modes and Questa CDC detected violations associated with the disabled modes. These violations pointed out a design bug that incorrectly enabled logic that should have been disabled for the inactive mode.

 

This same design team also used protocol verification on CDC paths with both “good” and “bad” synchronization structures. For the good structures, synchronizer protocol rules such as stability checks were validated with both formal verification and simulation. For the paths without synchronizers, synchronization structures were added to paths where protocol assertions failed in simulation. The FPGA-based design malfunctioned until all paths with protocol violations were synchronized.

 

 

Bottom line: one week of CDC analysis was more productive than two weeks of debug in the lab.

 

But what about the underlying CDC verification technology?  What type of analyses does it support?

 

Mentor’s Questa CDC is a full-featured CDC solution that includes:

 

  • Automatic, complex scheme detection including DMUX, handshake, and FIFO synchronizers
  • Integrated, assertion-based protocol verification for simulation and formal verification technologies
  • CDC static and dynamic coverage and tracking capabilities
  • Modal analysis for designs with multiple operating modes
  • Static and dynamic reconvergence verification
  • Advanced waiver audit flow for tracking and review of exceptions

 

The ability to thoroughly verify clock-domain crossings becomes ever more important as the number of clock domains increases in today's complex FPGA designs. By leveraging the Xilinx Tcl App Store, Mentor’s Questa CDC app for Vivado allows you to get started with advanced CDC analysis using the Questa CDC tool that adds essential capabilities for structural, protocol, and metastability verification. These features ensure that you handle CDC signals earlier in the design stage to avoid costly, time-wasting debug cycles in the lab.

 

For more detail on Questa CDC, click here.

 

P.S. If your FPGA designs target automotive applications, note that Questa CDC is the only CDC tool that has been qualified for ISO26262.  For more about this and the Mentor “SAFE” program to qualify all tools for ISO26262, click here.

 

 

 

By Adam Taylor

 

The Xilinx Zynq UltraScale+ MPSoC is good for many applications, including embedded vision. Its APU with two or four 64-bit Arm Cortex-A53 processor cores, Mali GPU, DisplayPort interface, and on-chip programmable logic (PL) gives the Zynq UltraScale+ MPSoC plenty of processing power to address exciting applications such as ADAS and vision-guided robotics with relative ease. Further, we can use the device’s PL and its programmable I/O to interface with a range of vision and video standards including MIPI, LVDS, parallel, VoSPI, etc. When it comes to interfacing image sensors, the Zynq UltraScale+ MPSoC can handle just about anything you throw at it.

 

Once we’ve brought the image into the Zynq UltraScale+ MPSoC’s PL, we can implement an image-processing pipeline using existing IP cores from the Xilinx library or we can develop our own custom IP cores using Vivado HLS (high-level synthesis). However, for many applications we’ll need to move the images into the device’s PS (processing system) domain before we can apply exciting application-level algorithms such as decision making or use the Xilinx reVISION acceleration stack.

 

 

 

Image1.jpg 

 

The original MicroZed Evaluation kit and UltraZed board used for this demo

 

 

 

I thought I would kick off the fourth year of this blog with a look at how we can use VDMA instantiated in the Zynq MPSoC’s PL to transfer images from the PL to the PS-attached DDR Memory without processor intervention. You often need to make such high-speed background transfers in a variety of applications.

 

To do this we will use the following IP blocks:

 

  • Zynq MPSoC core – Configured to enable both a Full Power Domain (FPD) AXI HP Master and FPD HPC AXI Slave, along with providing at least one PL clock and reset to the PL fabric.
  • VDMA core – Configured for write-only operation, with no FSync option, and with a Genlock mode of master
  • Test Pattern Generator (TPG) – Configurable over the AXI Lite interface
  • AXI Interconnects – Implement the Master and Slave AXI networks

 

 

Once configured over its AXI Lite interface, the Test Pattern Generator outputs test patterns which are then transferred into the PS-attached DDR memory. We can demonstrate that this has been successful by examining the memory locations using SDK.

 

 

Image2.jpg 

 

Enabling the FPD Master and Slave Interfaces

 

 

 

For this simple example, we’ll clock both the AXI networks at the same frequency, driven by PL_CLK_0 at 100MHz.

 

For a deployed system, an image sensor would replace the TPG as the image source and we would need to ensure that the VDMA input-channel clocks (Slave-to-Memory-Map and Memory-Map-to-Slave) were fast enough to support the required pixel and frame rate.  For example, a sensor with a resolution of 1280 pixels by 1024 lines running at 60 frames per second would require a clock rate of at least 108MHz. We would need to adjust the clock frequency accordingly.

 

 

 

Image3.jpg

 

Block Diagram of the completed design

 

 

 

To aid visibility within this example, I have included three ILA modules, which are connected to the outputs of the Test Pattern Generator, AXI VDMA, and the Slave Memory Interconnect. Adding these modules enables the use of Vivado’s hardware manager to verify that the software has correctly configured the TPG and the VDMA to transfer the images.

 

With the Vivado design complete and built, creating the application software to configure the TPG and VDMA to generate images and move them from the PL to the PS is very straightforward. We use the AXIVDMA, V_TPG, and Video Common APIs, available under the BSP lib source directory, to create the application. The software performs the following (a condensed code sketch appears below):

 

  1. Initialize the TPG and the AXI VDMA for use in the software application
  2. Configure the TPG to generate a test pattern configured as below
    1. Set the Image Width to 1280, Image Height to 1080
    2. Set the color space to YCRCB, 4:2:2 format
    3. Set the TPG background pattern
    4. Enable the TPG and set it for auto reloading
  3. Configure the VDMA to write data into the PS memory
    1. Set up the VDMA parameters using a variable of the type XAxiVdma_DmaSetup – remember the horizontal size and stride are measured in bytes not pixels.
    2. Configure the VDMA with the setting defined above
    3. Set the VDMA frame store location address in the PS DDR
    4. Start VDMA transfer

The application will then start generating test frames, transferred from the TPG into the PS DDR memory. I disabled the caches for this example to ensure that the DDR memory is updated.
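Condensed into code, the sequence looks roughly like this. It is a sketch built on the v_tpg and axivdma BSP drivers; the device-ID macros are illustrative, the sizes assume 2 bytes per YCrCb 4:2:2 pixel, and a single VDMA frame store is assumed:

#include "xparameters.h"
#include "xv_tpg.h"
#include "xvidc.h"
#include "xaxivdma.h"

XV_tpg Tpg;
XAxiVdma Vdma;

int start_tpg_to_ddr(UINTPTR FrameAddr)
{
    XAxiVdma_DmaSetup WriteCfg;
    XAxiVdma_Config *VdmaCfg;

    /* Steps 1-2: initialise and configure the TPG (device IDs illustrative) */
    XV_tpg_Initialize(&Tpg, XPAR_V_TPG_0_DEVICE_ID);
    XV_tpg_Set_width(&Tpg, 1280);
    XV_tpg_Set_height(&Tpg, 1080);
    XV_tpg_Set_colorFormat(&Tpg, XVIDC_CSF_YCRCB_422);
    XV_tpg_Set_bckgndId(&Tpg, XTPG_BKGND_COLOR_BARS);
    XV_tpg_EnableAutoRestart(&Tpg);
    XV_tpg_Start(&Tpg);

    /* Step 3: configure the VDMA write channel */
    VdmaCfg = XAxiVdma_LookupConfig(XPAR_AXI_VDMA_0_DEVICE_ID);
    XAxiVdma_CfgInitialize(&Vdma, VdmaCfg, VdmaCfg->BaseAddress);

    WriteCfg.VertSizeInput       = 1080;
    WriteCfg.HoriSizeInput       = 1280 * 2;  /* bytes, not pixels */
    WriteCfg.Stride              = 1280 * 2;  /* bytes, not pixels */
    WriteCfg.FrameDelay          = 0;
    WriteCfg.EnableCircularBuf   = 1;
    WriteCfg.EnableSync          = 0;
    WriteCfg.PointNum            = 0;
    WriteCfg.EnableFrameCounter  = 0;
    WriteCfg.FixedFrameStoreAddr = 0;
    XAxiVdma_DmaConfig(&Vdma, XAXIVDMA_WRITE, &WriteCfg);

    /* Single frame store in PS DDR assumed; then start the transfer */
    XAxiVdma_DmaSetBufferAddr(&Vdma, XAXIVDMA_WRITE, &FrameAddr);
    return XAxiVdma_DmaStart(&Vdma, XAXIVDMA_WRITE);
}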

 

Examining the ILAs, you will see the TPG generating frames and the VDMA transferring the stream into memory-mapped format:

 

 

 

Image4.jpg

 

TPG output, TUSER indicates start of frame while TLAST indicates end of line

 

 

 

Image5.jpg

 

VDMA Memory Mapped Output to the PS

 

 

 

Examining the frame store memory location within the PS DDR memory using SDK demonstrates that the pixel values are present.

 

 

Image6.jpg 

 

Test Pattern Pixel Values within the PS DDR Memory

 

 

 

 

You can use the same approach in Vivado when creating software for a Zynq Z-7000 SoC instead of a Zynq UltraScale+ MPSoC by enabling the AXI GP master for the AXI Lite bus and an AXI HP slave for the VDMA channel.

 

Should you be experiencing trouble with your VDMA-based image-processing chain, you might want to read this blog.

 

 

The project, as always, is on GitHub.

 

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

First Year E Book here

First Year Hardback here.

 

 

 

MicroZed Chronicles hardcopy.jpg  

 

 

 

Second Year E Book here

Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

I joined Xilinx five years ago and have looked for a good, introductory book on FPGA-based design ever since because people have repeatedly asked me for my recommendation. Until now, I could mention but not recommend Max Maxfield’s book published in 2004 titled “The Design Warrior’s Guide to FPGAs”—not because it was bad (it’s excellent) but because it’s more than a decade out of date. Patrick Lysaght, a Senior Director in the Xilinx CTO's office, alerted me to a brand new book that I can now recommend to anyone who wants to learn about using programmable logic to design digital systems.

 

It’s titled “Digital Systems Design with FPGA: Implementation Using Verilog and VHDL” and it was written by Prof. Dr. Cem Ünsalan in the Department of Electrical and Electronics Engineering at Yeditepe University in İstanbul and Dr. Bora Tar, now at the Power Management Research Lab at Ohio State University in Columbus, Ohio. Their book will take you from the basics of digital design and logic into FPGAs; FPGA architecture including programmable logic, block RAM, DSP slices, FPGA clock management, and programmable I/O; hardware description languages with an equal emphasis on Verilog and VHDL; the Xilinx Vivado Design Environment; and then on to IP cores including the Xilinx MicroBlaze and PicoBlaze soft processors. The book ends with 24 advanced embedded design projects. It’s quite obvious that the authors intend that this book be used as a textbook in a college-level digital design class (or two), but I think you could easily use this well-written book for self-directed study as well.

 

 

 

Digital System Design with FPGA Book cover.jpg

 

 

“Digital Systems Design with FPGA: Implementation Using Verilog and VHDL” uses the Xilinx Artix-7 FPGA as a model for describing the various aspects of a modern FPGA and goes on to describe two Digilent development boards based on the Artix-7 FPGA: the $149 Basys3 and the $99 Arty (now called the Arty A7 to differentiate it from the newer Arty S7, based on a Spartan-7 FPGA, and the Zynq-based Arty Z7 dev boards). These boards are great for use in introductory design classes and they make powerful, low-cost development boards even for experienced designers.

 

At 400 pages, “Digital Systems Design with FPGA: Implementation Using Verilog and VHDL” is quite comprehensive. The book is so new that the publisher has yet to put the table of contents online, so I decided to resolve that problem by publishing the contents here, so that you can see for yourself how comprehensive the book is:

 

 

1 Introduction

1.1 Hardware Description Languages

1.2 FPGA Boards and Software Tools

1.3 Topics to Be Covered in the Book

 

2 Field-Programmable Gate Arrays

2.1 A Brief Introduction to Digital Electronics

2.1.1 Bit Values as Voltage Levels

2.1.2 Transistor as a Switch

2.1.3 Logic Gates from Switches

2.2 FPGA Building Blocks

2.2.1 Layout of the Xilinx Artix-7 XC7A35T FPGA

2.2.2 Input / Output Blocks

2.2.3 Configurable Logic Blocks

2.2.4 Interconnect Resources

2.2.5 Block RAM

2.2.6 DSP Slices

2.2.7 Clock Management

2.2.8 The XADC Block

2.2.9 High-Speed Serial I/O Transceivers

2.2.10 Peripheral Component Interconnect Express Interface

2.3 FPGA-Based Digital System Design Philosophy

2.3.1 How to Think While Using FPGAs

2.3.2 Advantages and Disadvantages of FPGAs

2.4 Usage Areas of FPGAs

2.5 Summary

2.6 Exercises

 

3 Basys3 and Arty FPGA Boards

3.1 The Basys3 Board

3.1.1 Powering the Board

3.1.2 Input / Output

3.1.3 Configuring the FPGA

3.1.4 Advanced Connectors

3.1.5 External Memory

3.1.6 Oscillator / Clock

3.2 The Arty Board

3.2.1 Powering the Board

3.2.2 Input/Output

3.2.3 Configuring the FPGA

3.2.4 Advanced Connectors

3.2.5 External Memory

3.2.6 Oscillator / Clock

3.3 Summary

3.4 Exercises

 

4 The Vivado Design Suite

4.1 Installation and the Welcome Screen

4.2 Creating a New Project

4.2.1 Adding a Verilog File

4.2.2 Adding a VHDL File

4.3 Synthesizing the Project

4.4 Simulating the Project

4.4.1 Adding a Verilog Testbench File

4.4.2 Adding a VHDL Testbench File

4.5 Implementing the Synthesized Project

4.6 Programming the FPGA

4.6.1 Adding the Basys3 Board Constraint File to the Project

4.6.2 Programming the FPGA on the Basys3 Board

4.6.3 Adding the Arty Board Constraint File to the Project

4.6.4 Programming the FPGA on the Arty Board

4.7 Vivado Design Suite IP Management

4.7.1 Existing IP Blocks in Vivado

4.7.2 Generating a Custom IP

4.8 Application on the Vivado Design Suite

4.9 Summary

4.10 Exercises

 

5 Introduction to Verilog and VHDL

5.1 Verilog Fundamentals

5.1.1 Module Representation

5.1.2 Timing and Delays in Modeling

5.1.3 Hierarchical Module Representation

5.2 Testbench Formation in Verilog

5.2.1 Structure of a Verilog Testbench File

5.2.2 Displaying Test Results

5.3 VHDL Fundamentals

5.3.1 Entity and Architecture Representations

5.3.2 Dataflow Modeling

5.3.3 Behavioral Modeling

5.3.4 Timing and Delays in Modeling

5.3.5 Hierarchical Structural Representation

5.4 Testbench Formation in VHDL

5.4.1 Structure of a VHDL Testbench File

5.4.2 Displaying Test Results

5.5 Adding an Existing IP to the Project

5.5.1 Adding an Existing IP in Verilog

5.5.2 Adding an Existing IP in VHDL

5.6 Summary

5.7 Exercises

 

6 Data Types and Operators

6.1 Number Representations

6.1.1 Binary Numbers

6.1.2 Octal Numbers

6.1.3 Hexadecimal Numbers

6.2 Negative Numbers

6.2.1 Signed Bit Representation

6.2.2 One’s Complement Representation

6.2.3 Two’s Complement Representation

6.3 Fixed- and Floating-Point Representations

6.3.1 Fixed-Point Representation

6.3.2 Floating-Point Representation

6.4 ASCII Code

6.5 Arithmetic Operations on Binary Numbers

6.5.1 Addition

6.5.2 Subtraction

6.5.3 Multiplication

6.5.4 Division

6.6 Data Types in Verilog

6.6.1 Net and Variable Data Types

6.6.2 Data Values

6.6.3 Naming a Net or Variable

6.6.4 Defining Constants and Parameters

6.6.5 Defining Vectors

6.7 Operators in Verilog

6.7.1 Arithmetic Operators

6.7.2 Concatenation and Replication Operators

6.8 Data Types in VHDL

6.8.1 Signal and Variable Data Types

6.8.2 Data Values

6.8.3 Naming a Signal or Variable

6.8.4 Defining Constants

6.8.5 Defining Arrays

6.9 Operators in VHDL

6.9.1 Arithmetic Operators

6.9.2 Concatenation Operator

6.10 Application on Data Types and Operators

6.11 FPGA Building Blocks Used in Data Types and Operators

6.11.1 Implementation Details of Vector Operations

6.11.2 Implementation Details of Arithmetic Operations

6.12 Summary

6.13 Exercises

 

7 Combinational Circuits

7.1 Basic Definitions

7.1.1 Binary Variable

7.1.2 Logic Function

7.1.3 Truth Table

7.2 Logic Gates

7.2.1 The NOT Gate

7.2.2 The OR Gate

7.2.3 The AND Gate

7.2.4 The XOR Gate

7.3 Combinational Circuit Analysis

7.3.1 Logic Function Formation between Input and Output

7.3.2 Boolean Algebra

7.3.3 Gate-Level Minimization

7.4 Combinational Circuit Implementation

7.4.1 Truth Table-Based Implementation

7.4.2 Implementing One-Input Combinational Circuits

7.4.3 Implementing Two-Input Combinational Circuits

7.4.4 Implementing Three-Input Combinational Circuits

7.5 Combinational Circuit Design

7.5.1 Analyzing the Problem to Be Solved

7.5.2 Selecting a Solution Method

7.5.3 Implementing the Solution

7.6 Sample Designs

7.6.1 Home Alarm System

7.6.2 Digital Safe System

7.6.3 Car Park Occupied Slot Counting System

7.7 Applications on Combinational Circuits

7.7.1 Implementing the Home Alarm System

7.7.2 Implementing the Digital Safe System

7.7.3 Implementing the Car Park Occupied Slot Counting System

7.8 FPGA Building Blocks Used in Combinational Circuits

7.9 Summary

7.10 Exercises

 

8 Combinational Circuit Blocks

8.1 Adders

8.1.1 Half Adder

8.1.2 Full Adder

8.1.3 Adders in Verilog

8.1.4 Adders in VHDL

8.2 Comparators

8.2.1 Comparators in Verilog

8.2.2 Comparators in VHDL

8.3 Decoders

8.3.1 Decoders in Verilog

8.3.2 Decoders in VHDL

8.4 Encoders

8.4.1 Encoders in Verilog

8.4.2 Encoders in VHDL

8.5 Multiplexers

8.5.1 Multiplexers in Verilog

8.5.2 Multiplexers in VHDL

8.6 Parity Generators and Checkers

8.6.1 Parity Generators

8.6.2 Parity Checkers

8.6.3 Parity Generators and Checkers in Verilog

8.6.4 Parity Generators and Checkers in VHDL

8.7 Applications on Combinational Circuit Blocks

8.7.1 Improving the Calculator

8.7.2 Improving the Home Alarm System

8.7.3 Improving the Car Park Occupied Slot Counting System

8.8 FPGA Building Blocks Used in Combinational Circuit Blocks

8.9 Summary

8.10 Exercises

 

9 Data Storage Elements

9.1 Latches

9.1.1 SR Latch

9.1.2 D Latch

9.1.3 Latches in Verilog

9.1.4 Latches in VHDL

9.2 Flip-Flops

9.2.1 D Flip-Flop

9.2.2 JK Flip-Flop

9.2.3 T Flip-Flop

9.2.4 Flip-Flops in Verilog

9.2.5 Flip-Flops in VHDL

9.3 Register

9.4 Memory

9.5 Read-Only Memory

9.5.1 ROM in Verilog

9.5.2 ROM in VHDL

9.5.3 ROM Formation Using IP Blocks

9.6 Random Access Memory

9.7 Application on Data Storage Elements

9.8 FPGA Building Blocks Used in Data Storage Elements

9.9 Summary

9.10 Exercises

 

10 Sequential Circuits

10.1 Sequential Circuit Analysis

10.1.1 Definition of State

10.1.2 State and Output Equations

10.1.3 State Table

10.1.4 State Diagram

10.1.5 State Representation in Verilog

10.1.6 State Representation in VHDL

10.2 Timing in Sequential Circuits

10.2.1 Synchronous Operation

10.2.2 Asynchronous Operation

10.3 Shift Register as a Sequential Circuit

10.3.1 Shift Registers in Verilog

10.3.2 Shift Registers in VHDL

10.3.3 Multiplication and Division Using Shift Registers

10.4 Counter as a Sequential Circuit

10.4.1 Synchronous Counter

10.4.2 Asynchronous Counter

10.4.3 Counters in Verilog

10.4.4 Counters in VHDL

10.4.5 Frequency Division Using Counters

10.5 Sequential Circuit Design

10.6 Applications on Sequential Circuits

10.6.1 Improving the Home Alarm System

10.6.2 Improving the Digital Safe System

10.6.3 Improving the Car Park Occupied Slot Counting System

10.6.4 Vending Machine

10.6.5 Digital Clock

10.7 FPGA Building Blocks Used in Sequential Circuits

10.8 Summary

10.9 Exercises

 

11 Embedding a Soft-Core Microcontroller

11.1 Building Blocks of a Generic Microcontroller

11.1.1 Central Processing Unit

11.1.2 Arithmetic Logic Unit

11.1.3 Memory

11.1.4 Oscillator / Clock

11.1.5 General Purpose Input/Output

11.1.6 Other Blocks

11.2 Xilinx PicoBlaze Microcontroller

11.2.1 Functional Blocks of PicoBlaze

11.2.2 PicoBlaze in Verilog

11.2.3 PicoBlaze in VHDL

11.2.4 PicoBlaze Application on the Basys3 Board

11.3 Xilinx MicroBlaze Microcontroller

11.3.1 MicroBlaze as an IP Block in Vivado

11.3.2 MicroBlaze MCS Application on the Basys3 Board

11.4 Soft-Core Microcontroller Applications

11.5 FPGA Building Blocks Used in Soft-Core Microcontrollers

11.6 Summary

11.7 Exercises

 

12 Digital Interfacing

12.1 Universal Asynchronous Receiver/Transmitter

12.1.1 Working Principles of UART

12.1.2 UART in Verilog

12.1.3 UART in VHDL

12.1.4 UART Applications

12.2 Serial Peripheral Interface

12.2.1 Working Principles of SPI

12.2.2 SPI in Verilog

12.2.3 SPI in VHDL

12.2.4 SPI Application

12.3 Inter-Integrated Circuit

12.3.1 Working Principles of I2C

12.3.2 I2C in Verilog

12.3.3 I2C in VHDL

12.3.4 I2C Application

12.4 Video Graphics Array

12.4.1 Working Principles of VGA

12.4.2 VGA in Verilog

12.4.3 VGA in VHDL

12.4.4 VGA Application

12.5 Universal Serial Bus

12.5.1 USB-Receiving Module in Verilog

12.5.2 USB-Receiving Module in VHDL

12.5.3 USB Keyboard Application

12.6 Ethernet

12.7 FPGA Building Blocks Used in Digital Interfacing

12.8 Summary

12.9 Exercises

 

13 Advanced Applications

13.1 Integrated Logic Analyzer IP Core Usage

13.2 The XADC Block Usage

13.3 Adding Two Floating-Point Numbers

13.4 Calculator

13.5 Home Alarm System

13.6 Digital Safe System

13.7 Car Park Occupied Slot Counting System

13.8 Vending Machine

13.9 Digital Clock

13.10 Moving Wave Via LEDs

13.11 Translator

13.12 Air Freshener Dispenser

13.13 Obstacle-Avoiding Tank

13.14 Intelligent Washing Machine

13.15 Non-Touch Paper Towel Dispenser

13.16 Traffic Lights

13.17 Car Parking Sensor System

13.18 Body Weight Scale

13.19 Intelligent Billboard

13.20 Elevator Cabin Control System

13.21 Digital Table Tennis Game

13.22 Customer Counter

13.23 Frequency Meter

13.24 Pedometer

 

14 What Is Next?

14.1 Vivado High-Level Synthesis Platform

14.2 Developing a Project in Vivado HLS to Generate IP

14.3 Using the Generated IP in Vivado

14.4 Summary

14.5 Exercises

 

References

Index

 

 

 

Note: In an acknowledgement in the book, the authors thank Xilinx's Cathal McCabe, an AE working within the Xilinx University Program, for his guidance and assistance. They also thank Digilent for allowing them to use the Basys3 and Arty board images and sample projects in the book.

 

By Adam Taylor

 

Recently I received two different questions from engineers on how to use SPI with the Zynq SoC and Zynq UltraScale+ MPSoC. Having answered them, I thought a detailed blog post on the different ways to implement SPI would be of interest.

 

 

Image1.jpg 

 

 

 

When we use a Zynq SoC or Zynq UltraScale+ MPSoC in our design we have two options for implementing SPI interfaces:

 

 

  • Use one of the two SPI controllers within the Processing System (PS)
  • Use an AXI Quad SPI (QSPI) IP module, configured for standard SPI communication within the programmable logic (PL)

 

 

Selecting which controller to use comes down to understanding the application’s requirements. Both SPI implementations can support all four SPI modes and both can function as either a SPI master or SPI slave. However, there are some notable differences, as the table below demonstrates:

 

 

Image2.jpg 

 

 

Initially, we will examine the SPI controller integrated into the PS. We include this peripheral in our design by selecting the SPI controller in the Zynq MIO configuration tab. For this example, I will route the SPI signals to the Arty Z7 SPI connector, which requires routing them through the EMIO to the PL I/O.

 

 

Image3.jpg

 

 

Enabling the SPI and mapping to the EMIO

 

 

 

With this selected, all that remains in Vivado is to connect the I/O from the SPI ports. How we do this depends upon whether we want a master or slave SPI implementation. Looking at the available ports on the SPI controller, you will notice there are matching input (marked xxx_i) and output (marked xxx_o) ports for each SPI signal. It is very important that we connect these ports correctly for the chosen master or slave implementation; getting the connections wrong leads to hours of head scratching during software development because nothing works as expected. In addition, there is one slave-select input for use when the controller is a SPI slave and three slave-select pins for use in SPI master mode.

 

Once the I/O is correctly configured and the project built, we configure the SPI controller as either a master or a slave using the SPI configuration options within the application software. To both configure and transfer data using the PS SPI controller, we use the API provided with the BSP, which is declared in xspips.h. In this first example, we will configure the PS SPI controller as the SPI master.
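As an illustrative sketch (not the complete application), the polled-mode master initialization might look like the following. The device ID macro, prescaler, and slave-select choices here are assumptions for illustration; check xparameters.h and your slave device's requirements for the values in your own design:

#include "xparameters.h"
#include "xspips.h"

static XSpiPs Spi;   /* PS SPI controller instance */

int spi_master_init(void)
{
    /* XPAR_XSPIPS_0_DEVICE_ID assumes SPI controller 0 is enabled in the PS */
    XSpiPs_Config *Cfg = XSpiPs_LookupConfig(XPAR_XSPIPS_0_DEVICE_ID);
    if (Cfg == NULL)
        return XST_FAILURE;

    if (XSpiPs_CfgInitialize(&Spi, Cfg, Cfg->BaseAddress) != XST_SUCCESS)
        return XST_FAILURE;

    /* Master mode, with the controller driving the slave select itself */
    XSpiPs_SetOptions(&Spi, XSPIPS_MASTER_OPTION | XSPIPS_FORCE_SSELECT_OPTION);

    /* Divide the SPI reference clock down to a rate the slave can accept */
    XSpiPs_SetClkPrescaler(&Spi, XSPIPS_CLK_PRESCALE_64);

    /* Use slave select 0 of the three available in master mode */
    XSpiPs_SetSlaveSelect(&Spi, 0);

    return XST_SUCCESS;
}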

 

By default, SPI transfers are 8 bits wide. However, we can transmit larger 16- or 32-bit words as well. To transmit 8-bit words, we use the type u8 within our C application; for 16- or 32-bit transfers, we use the types u16 or u32, respectively, for the read and write buffers.

 

At first, this may appear to cause a problem, or at least generate a compiler warning, because both API functions that perform data transfers require u8 pointers for the transmit and receive buffers, as shown below:

 

 

s32 XSpiPs_Transfer(XSpiPs *InstancePtr, u8 *SendBufPtr, u8 *RecvBufPtr, u32 ByteCount);

 

s32 XSpiPs_PolledTransfer(XSpiPs *InstancePtr, u8 *SendBufPtr, u8 *RecvBufPtr, u32 ByteCount);

 

 

To address this issue when using u16 or u32 types, we need to cast the buffers to a u8 pointer as demonstrated:

 

 

XSpiPs_PolledTransfer(&SpiInstance, (u8*)&TxBuffer, (u8*)&RxBuffer, 8);
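Putting the pieces together for the 16-bit case (the buffer contents here are illustrative), note that the final argument is still a byte count, so four 16-bit words are eight bytes:

u16 TxBuffer[4] = {0x1234, 0x5678, 0x9ABC, 0xDEF0};  /* illustrative data */
u16 RxBuffer[4];

/* ByteCount is in bytes: 4 words x 2 bytes each = 8 */
XSpiPs_PolledTransfer(&SpiInstance, (u8 *)TxBuffer, (u8 *)RxBuffer, 8);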

 

 

This allows us to work with transfer sizes of 8, 16, or 32 bits. To demonstrate this, I connected the SPI master example to a Digilent Digital Discovery to capture the transmitted data, changing the data width on the fly from 8 to 16 bits in the application software using the method above.

 

 

Image4.jpg 

 

Zynq SoC PS SPI Master transmitting four 8-bit words

 

 

Image5.jpg

 

PS SPI Master transmitting four 16-bit words

 

 

The alternative to implementing a SPI interface using the Zynq PS is to implement an AXI QSPI IP core within the Zynq PL. Doing this requires setting more options in the Vivado design, which limits run-time flexibility. Within the AXI QSPI configuration dialog, we can configure the transaction width, frequency, and number of slaves. One of the most important settings here is whether the AXI QSPI IP core will act as a SPI slave or master. To enable a SPI master, you must check the enable master mode option. If this module is to operate as a slave, this option must be unchecked to ensure the SPISel input pin is present; when the SPI IP core acts as a slave, this pin must be connected to the master’s slave select.

 

 

 

Image6.jpg

 

Configuring the AXI Quad SPI

 

 

 

As with the PS SPI controller, the BSP provides an API for the AXI QSPI IP core, which we use to develop the application software. This API is declared in xspi.h. I used it to configure the AXI QSPI as a SPI slave for the second part of the example.
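A minimal polled slave configuration with this driver might look like the sketch below. The device ID macro is an assumption, and the clock polarity/phase options must match whatever SPI mode the external master uses:

#include "xparameters.h"
#include "xspi.h"

static XSpi SlaveSpi;   /* AXI Quad SPI instance in the PL */

int spi_slave_init(void)
{
    /* XPAR_SPI_0_DEVICE_ID is an assumption; check xparameters.h */
    XSpi_Config *Cfg = XSpi_LookupConfig(XPAR_SPI_0_DEVICE_ID);
    if (Cfg == NULL)
        return XST_FAILURE;

    if (XSpi_CfgInitialize(&SlaveSpi, Cfg, Cfg->BaseAddress) != XST_SUCCESS)
        return XST_FAILURE;

    /* No XSP_MASTER_OPTION, so the core acts as a slave. The polarity and
       phase options here must match the mode the external master uses. */
    XSpi_SetOptions(&SlaveSpi, XSP_CLK_ACTIVE_LOW_OPTION | XSP_CLK_PHASE_1_OPTION);

    XSpi_Start(&SlaveSpi);

    /* Polled operation: disable the core's interrupt output */
    XSpi_IntrGlobalDisable(&SlaveSpi);

    return XST_SUCCESS;
}

With this in place, a call such as XSpi_Transfer(&SlaveSpi, Buf, Buf, ByteCount) loads the transmit FIFO and completes once the external master clocks the data in and out.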

 

To demonstrate the AXI QSPI core working properly as a SPI slave once I had created the software, I used Digilent’s Digital Discovery as the SPI master, allowing data to be transferred easily between the two.

 

 

 

Image7.jpg

 

 

Transmitting and Receiving Data with the SPI slave. (Blue data is output by the Zynq SPI Slave)

 

 

 

The final design created in Vivado to support both of these examples has been uploaded to GitHub.

 

 

 

Image8.jpg 

 

 

Final example block diagram

 

 

 

 

Of course, if you are using a Xilinx FPGA in place of a Zynq SoC or Zynq UltraScale+ MPSoC, you can use a MicroBlaze soft processor with the same AXI QSPI configuration to implement a SPI interface. Just remember to correctly define it as a master or slave.

 

I hope this helps outline how we can create both master and slave SPI systems using the two different approaches.

 

 

Code is available on Github as always.

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

First Year E Book here

First Year Hardback here.

 

 

 

MicroZed Chronicles hardcopy.jpg  

 

 

 

Second Year E Book here

Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

By Adam Taylor

 

 

When we surveyed the different types of HDMI sources and sinks recently for our Zynq SoC and Zynq UltraScale+ MPSoC designs, one HDMI receiver we discussed was the ADV7611. This device receives three TMDS data streams and converts them into discrete video and audio outputs, which can then be captured and processed. The ADV7611 is a very capable but somewhat complex device that requires configuration prior to use. We are going to examine how to include one in our design.

 

 

 

Image1.jpg

 

ZedBoard HDMI Demonstration Configuration

 

 

 

To do this, we need an ADV7611. Helpfully, the FMC HDMI card provides two HDMI inputs, one of which uses an ADV7611. The second equalizes the TMDS data lanes and passes them on directly to the Zynq SoC for decoding.

 

To demonstrate how we can get this device up and running with our Zynq SoC or Zynq UltraScale+ MPSoC, we will create an example that uses the ZedBoard with the FMC HDMI card. For this example, we first need to create a hardware design in Vivado that interfaces with the ADV7611 on the FMC HDMI card. To keep this initial example simple, I will receive only the timing signals output by the ADV7611. These signals are:

 

  • Local Locked Clock (LLC) – The pixel clock.
  • HSync – Horizontal Sync, indicates the start of a new line.
  • VSync – Vertical Sync, indicates the start of a new frame.
  • Video Active – indicates that the pixel data is valid (i.e., we are not in a sync or blanking period)

 

This approach uses the detector within the VTC (Video Timing Controller) to receive the sync signals and identify the received video’s timing parameters and video mode. Once the VTC correctly identifies the video mode, we know the ADV7611 has been configured correctly. It is then a simple step to connect the received pixel data to a Video-to-AXIS IP block and use VDMA to write the received video frames into DDR memory for further processing.
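Although we will cover the bring-up software in detail later, reading back what the detector finds is straightforward. Here is a minimal polled sketch using the VTC driver (xvtc.h) from the BSP; the device ID macro depends on the instance name in your block design, so treat it as an assumption:

#include "xparameters.h"
#include "xvtc.h"
#include "xil_printf.h"

static XVtc Vtc;

int vtc_detector_start(void)
{
    /* XPAR_V_TC_0_DEVICE_ID is an assumption; check xparameters.h */
    XVtc_Config *Cfg = XVtc_LookupConfig(XPAR_V_TC_0_DEVICE_ID);
    if (Cfg == NULL)
        return XST_FAILURE;

    if (XVtc_CfgInitialize(&Vtc, Cfg, Cfg->BaseAddress) != XST_SUCCESS)
        return XST_FAILURE;

    XVtc_EnableDetector(&Vtc);   /* start measuring the incoming syncs */
    return XST_SUCCESS;
}

void vtc_report_timing(void)
{
    XVtc_Timing Timing;

    /* Read back the timing parameters the detector has measured */
    XVtc_GetDetectorTiming(&Vtc, &Timing);
    xil_printf("Detected active video: %d x %d\r\n",
               Timing.HActiveVideo, Timing.VActiveVideo);
}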

 

For this example, we need the following IP blocks:

 

  • VTC (Video Timing Controller) – Configured for detection and to receive sync signals only.
  • ILA – Connected to the sync signals so that we can see that they are toggling correctly—to aid debugging and commissioning.
  • Constant – Set to a constant 1 to drive the VTC’s clock-enable and detector-enable inputs.

 

The resulting block diagram appears below. The eagle-eyed will also notice the addition of both a GPIO output and an I2C bus from the processing system. We need these to control and configure the ADV7611.

 

 

Image2.jpg

 

 

Simple Architecture to detect the video type

 

 

Following power-up, the ADV7611 generates no sync signals or video. We must first configure the device, which requires the use of an I2C bus. We therefore need to enable one of the two I2C controllers within the Zynq PS and route its IO through the EMIO so that we can route the I2C signals (SDA and SCL) to the correct pins on the FMC connector. The ADV7611 is a complex device to configure, with multiple I2C addresses that map to different internal functions within the device, such as EDID and High-bandwidth Digital Content Protection (HDCP).
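As a sketch of what this configuration traffic looks like, a single register write over the PS I2C controller might resemble the following. The 7-bit address 0x4C (0x98 in the 8-bit convention used in the ADV7611 documentation) is the device's default IO-map address; treat the device ID macro and clock rate as assumptions for your own design:

#include "xparameters.h"
#include "xiicps.h"

#define ADV7611_IO_ADDR  0x4C   /* 7-bit form of the default 0x98 IO-map address */

static XIicPs Iic;

int iic_init(void)
{
    XIicPs_Config *Cfg = XIicPs_LookupConfig(XPAR_XIICPS_0_DEVICE_ID);
    if (Cfg == NULL)
        return XST_FAILURE;
    if (XIicPs_CfgInitialize(&Iic, Cfg, Cfg->BaseAddress) != XST_SUCCESS)
        return XST_FAILURE;
    XIicPs_SetSClk(&Iic, 100000);   /* 100-kHz serial clock */
    return XST_SUCCESS;
}

int adv7611_write_reg(u8 Reg, u8 Value)
{
    u8 Buf[2] = { Reg, Value };   /* register offset, then the data byte */

    if (XIicPs_MasterSendPolled(&Iic, Buf, 2, ADV7611_IO_ADDR) != XST_SUCCESS)
        return XST_FAILURE;

    /* Wait for the bus to go idle before the next transaction */
    while (XIicPs_BusIsBusy(&Iic))
        ;

    return XST_SUCCESS;
}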

 

We also need to be able to reset the ADV7611 following the application of power to the ZedBoard and FMC HDMI card. We use a PS GPIO pin, output via the EMIO, to do this. Using a controllable I/O pin for this function allows the application software to reset the device each time we run the program. This capability is also helpful when debugging the software application: starting from a fresh reset each time the program runs prevents a previous configuration from affecting the next.
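A sketch of that reset pulse follows, assuming the reset line is the first EMIO pin (pin 54 on Zynq-7000 devices, since MIO occupies pins 0 through 53) and that the reset is active-low; the pin number and pulse width must be adapted to your design and the ADV7611 datasheet:

#include "xparameters.h"
#include "xgpiops.h"
#include "sleep.h"

#define ADV7611_RESET_PIN  54   /* first EMIO pin; adjust to match your design */

static XGpioPs Gpio;

int adv7611_reset(void)
{
    XGpioPs_Config *Cfg = XGpioPs_LookupConfig(XPAR_XGPIOPS_0_DEVICE_ID);
    if (Cfg == NULL)
        return XST_FAILURE;
    if (XGpioPs_CfgInitialize(&Gpio, Cfg, Cfg->BaseAddr) != XST_SUCCESS)
        return XST_FAILURE;

    /* Configure the EMIO pin as an output and drive the active-low reset */
    XGpioPs_SetDirectionPin(&Gpio, ADV7611_RESET_PIN, 1);
    XGpioPs_SetOutputEnablePin(&Gpio, ADV7611_RESET_PIN, 1);

    XGpioPs_WritePin(&Gpio, ADV7611_RESET_PIN, 0);  /* assert reset */
    usleep(5000);                                   /* hold; see the datasheet for the minimum */
    XGpioPs_WritePin(&Gpio, ADV7611_RESET_PIN, 1);  /* release */

    return XST_SUCCESS;
}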

 

With the block diagram completed, all that remains is to build the design with the location constraints (identified below) to connect to the correct pins on the FMC connector for the ADV7611.

 

 

 

Image3.jpg

 

Vivado Constraints for the ADV7611 Design

 

 

 

Once Vivado generates the bit file, we are ready to begin configuring the ADV7611. Configuring the device over the I2C interface is quite involved, so we will examine the required steps in detail in the next blog. However, the image below shows one set of results from testing the completed software as it detects a VGA (640-pixel by 480-line) input:

 

 

 

Image4.jpg 

 

 

VTC output when detecting VGA input format

 

 

 

 

References:

 

https://reference.digilentinc.com/fmc_hdmi/refmanual

 

http://www.analog.com/media/en/technical-documentation/user-guides/UG-180.pdf

 

http://www.analog.com/media/en/evaluation-boards-kits/evaluation-software/ADV7611_Software_Manual.pdf

 

 

 

Code is available on Github as always.

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

 

MicroZed Chronicles hardcopy.jpg

  

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

About the Author

Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.