

Last September at the GNU Radio Conference in Boulder, Colorado, Ettus Research announced the RFNoC & Vivado Challenge for SDR (software-defined radio). Ettus’ RFNoC (RF Network on Chip) is designed to allow you to efficiently harness the latest-generation FPGAs for SDR applications without being an expert firmware or FPGA developer. Today, Ettus Research and Xilinx announced the three challenge winners.


Ettus’ GUI-based RFNoC design tool allows you to create FPGA applications as easily as you can create GNU Radio flowgraphs. This includes the ability to seamlessly transfer data between your host PC and an FPGA. It dramatically eases the task of FPGA off-loading in SDR applications. Ettus’ RFNoC is built upon Xilinx’s Vivado HLS.


Here are the three winning teams and their projects:






Finally, here’s a 5-minute video announcing the winners along with the prizes they have won:





Got a problem getting enough performance out of your processor-based embedded system? You might want to watch a 14-minute video that does a nice job of explaining how you can develop hardware accelerators directly from your C/C++ code using the Xilinx SDK.


How much acceleration do you need? If you don’t know for sure, the video gives an example of an autonomous drone with vision and control tasks that need real-time acceleration.


What are your alternatives? If you need to accelerate your code, you can:


  • Increase your processor’s clock speed, likely requiring a faster speed grade
  • Add more processor cores to share the load
  • Switch to a higher-end, code-compatible processor


Unfortunately, each of these three alternatives increases power consumption. There’s another alternative, however, that can actually cut power consumption. That alternative is based on Xilinx All Programmable Zynq SoCs and Zynq UltraScale+ MPSoCs. By moving critical code into custom hardware accelerators implemented in the programmable logic incorporated into all Zynq family members, you can relieve the processor of the associated processing burden and actually slow the processor’s clock speed, thus reducing power. It’s quite possible to cut overall power consumption using this approach.


Ah, but implementing these accelerators. Aye, there’s the rub!


It turns out that implementation of these hardware accelerators might not be as difficult as you imagine. The Xilinx SDK is already a C/C++ development environment based on familiar IDE and compiler technology. Under the hood, the SDK serves as a single cockpit for all Zynq-based development work—software and hardware. It also includes SDSoC, the piece of the puzzle you need to convert C/C++ code into acceleration hardware using a 3-step process:



  • Code profiling to identify time-consuming tasks that are critical to real-time operation
  • Software/hardware partitioning based on the profiling data
  • Software/hardware compilation based on the system partitioning
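
To make the profiling and partitioning steps concrete, here is a minimal sketch of how profiling data might guide the partitioning decision. It uses Amdahl's law to estimate the overall gain from moving selected functions into hardware. The profile figures, the function names, and the 20% threshold are all hypothetical and do not come from the video or from SDSoC.

```python
def overall_speedup(accelerated_fraction: float, accelerator_speedup: float) -> float:
    """Amdahl's law: overall speedup when only part of the run time is accelerated."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if accelerator_speedup <= 0.0:
        raise ValueError("accelerator_speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / accelerator_speedup)


def pick_candidates(profile: dict, threshold: float = 0.2) -> list:
    """Return the functions whose share of run time meets the threshold, largest first."""
    hot = [name for name, share in profile.items() if share >= threshold]
    return sorted(hot, key=lambda name: profile[name], reverse=True)


# Hypothetical profile: fraction of total run time spent in each function.
profile = {
    "optical_flow": 0.60,
    "stereo_match": 0.25,
    "control_loop": 0.10,
    "logging": 0.05,
}

candidates = pick_candidates(profile)
hardware_share = sum(profile[name] for name in candidates)
# Assume (hypothetically) that the hardware versions run 20x faster.
estimate = overall_speedup(hardware_share, 20.0)
```

The estimate shows why profiling comes first: the code left in software limits the achievable gain, so accelerating a function that accounts for little of the run time barely helps.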


One development platform, SDK, serves all members of the Zynq SoC and Zynq UltraScale+ MPSoC device families, giving you a huge price/performance range.


Here’s that 14-minute video:






When I first wrote about JTAG for EDN magazine in 1988 (Design for testability creates better products at lower cost), it was not a well-liked standard. No one wanted to dedicate three or four precious pins on an IC package (back when a lot of devices had only 40 pins); no one wanted to spend approximately 2% to 4% of the silicon die’s real estate on testability; and everyone thought that the serial test protocol was slow. Fast forward three decades. JTAG is a definitive standard and we’ve found all sorts of terrific things to do with it besides testing—downloading configurations into FPGAs and debugging designs for example.


JTAG has been an essential part of device configuration, debugging, and performance analysis in Xilinx All Programmable devices for a long, long time. When the number of configuration bits was small, JTAG-based configuration and debug felt fast. Times are a bit different now and JTAG bit rates that were once OK might now feel a bit slow.


Not anymore.


As of now, you have a new, faster alternative for JTAG-based configuration, debug, and performance analysis. That alternative is called the Xilinx SmartLynq Data Cable and it boosts default JTAG bitstream programming rates from 0.4 to 4Mbytes/sec (10x) and the maximum JTAG clock frequency from 12 to 40MHz (3.33x). That’s a lot faster.
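
As a rough illustration of what the 10x figure means in practice, the sketch below estimates programming time at the two quoted rates. The 20-Mbyte bitstream size is a hypothetical value chosen for the arithmetic, and the calculation assumes 1 Mbyte = 1,000,000 bytes and ignores any protocol overhead.

```python
def programming_time_s(bitstream_bytes: int, rate_mbytes_per_s: float) -> float:
    """Idealized transfer time in seconds, with no protocol overhead."""
    return bitstream_bytes / (rate_mbytes_per_s * 1_000_000)


BITSTREAM_BYTES = 20_000_000  # hypothetical bitstream size

old_time = programming_time_s(BITSTREAM_BYTES, 0.4)  # older default rate
new_time = programming_time_s(BITSTREAM_BYTES, 4.0)  # SmartLynq default rate
rate_ratio = old_time / new_time
clock_ratio = 40 / 12  # maximum JTAG clock frequencies, in MHz
```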








The $495 SmartLynq Data Cable is backward compatible with the Xilinx Platform Cable USB II and uses the same, standard PC4 JTAG header connection to the target board. It is compatible with the Vivado Design Suite, Labtools, and Xilinx Software Development Kit. The SmartLynq Data Cable also has some nice features not available with the Xilinx Platform Cable USB II including an Ethernet host interface. (More details available in the SmartLynq Data Cable Quick Start Guide.)


 Stop waiting. Check out the SmartLynq Data Cable today.



Blue Pearl Software has just announced Visual Verification Suite 2017.2, a suite of RTL verification tools for advanced RTL linting, constraint generation, and clock-domain crossing (CDC) analysis, plus a debug environment that directly integrates with and augments the tools included in the Xilinx Vivado Design Suite. These tools can help you find more design bugs sooner, before getting into the more time-consuming design and analysis techniques, namely simulation and synthesis.


This new release of Visual Verification Suite 2017.2 includes updates to Blue Pearl’s Analyze RTL linting (super-linting) and debug tools, Synopsys Design Constraints (SDC) file generation, and CDC analysis to accelerate RTL verification. The new release also provides built-in FPGA libraries and design rules that follow the Xilinx UltraFast Design Methodology. (Rules can be customized for design reuse and for conformance to safety standards such as STARC and DO-254.) You can download the Blue Pearl app from the Xilinx Tcl Store to integrate the Visual Verification Suite into the Vivado interactive design environment.


The Blue Pearl Visual Verification Suite consists of:


  • Blue Pearl’s Analyze RTL, which combines super-lint tools with formal verification into a single high-performance, high-capacity design checker.


  • Automatic SDC generation, which looks for false and multi-cycle paths and generates timing exception constraints in the industry-standard Synopsys Design Constraints file format.


  • A CDC Checker, which conducts a number of checks on clock signals to hunt for possible metastability by looking for problems such as missing synchronizers. (If you are not worried about metastability in large designs, be afraid. Be very afraid.)


  • A Management Dashboard, which provides real-time visibility into RTL design-rule and CDC checks to better assess schedules, risk, and overall design quality throughout the verification cycle. If you’re used to working on small Xilinx-based projects, this might not seem like a big issue. If you’re dealing with large design projects that fit into some of the newer Xilinx All Programmable devices, you know this is a significant challenge in the overall project design cycle.




Here’s a block diagram of the Blue Pearl Visual Verification Suite product flow:




Blue Pearl Visual Verification Suite Product Flow.jpg 




And here’s an excellent, 2-minute video explaining the complex interactions of timing, timing constraints, critical path timing, false paths, and multi-cycle paths and their relationship to synthesis:







The Blue Pearl Visual Verification Suite tools come wrapped in a visual environment that makes it easier for you to chase down and kill bugs as early in the design cycle as possible. I’m told by Blue Pearl that Visual Verification Suite customers say that the Blue Pearl tools save them more than two weeks of development time in an average 16-week development cycle.


For more information about the Visual Verification Suite, please contact Blue Pearl Software directly.




Note: For more information about the Xilinx UltraFast Design Methodology, see “UltraFast: Hand-picked best practices from industry experts, distilled into a potent Design Methodology” and “Xilinx UltraFast Design Methodology gets free, 2-page Quick Reference Guide that you can read…ultra fast.” You can also download the free “UltraFast Design Methodology Guide for the Vivado Design Suite” and “UltraFast Embedded Design Methodology Guide.”


Blue Pearl also has a White Paper titled “Accelerating Xilinx All Programmable FPGA and SoC Design Verification with Blue Pearl Software” that you might want to read.


Xilinx 7 series FPGAs have 50-pin I/O banks with one common supply voltage for all 50 pins. The smaller Spartan-7 FPGAs have 100 I/O pins in two I/O banks, so it might be convenient in some smaller designs (or even some not-so-small designs) to combine the I/O for configuration and DDR memories into one FPGA I/O bank (plus the dedicated configuration bank 0) if possible so that the remaining I/O bank can operate at a different I/O voltage.


It turns out, you can do this with some MIG (Memory Interface Generator) magic, a little Vivado tool fiddling, and a simple level translator for the Flash memory’s data lines.


Application note XAPP1313 titled “Spartan-7 FPGA Configuration with SPI Flash and Bank 14 at 1.35V” shows you how to do this with a 1.8V Quad SPI Flash memory and 1.35V DDR3L SDRAM. Here’s a simplified diagram of what’s going on:



XAPP1313 Figure 1.jpg 




The advantage here is that you don’t need to move up to a larger FPGA to get another I/O bank.


For step-by-step instructions, see XAPP1313.






If you have read Adam Taylor’s 200+ MicroZed Chronicles here in Xcell Daily, you already know Adam to be an expert in the design of systems based on programmable logic, Zynq SoCs, and Zynq UltraScale+ MPSoCs. But Adam has significant expertise in the development of mission-critical systems based on his aerospace engineering work. He gave a talk about this topic at the recent FPGA Kongress held in Munich and he’s kindly re-recorded his talk, combined with slides in the following 67-minute video.


Adam spends the first two-thirds of the video talking about the design of mission-critical systems in general and then spends the rest of the time talking about Xilinx-specific mission-critical design including the design tools and the Xilinx isolation design flow.


Here’s the video:







Xilinx is starting a Vivado Expert Webinar Series to help you improve your design productivity and the first one, devoted to achieving timing closure in high-speed designs, takes place on July 26. Balachander Krishnamurthy—a Senior Product Marketing Manager for Static Timing Analysis, Constraints and Pin Planning—will present the material and will provide insight into Vivado high-speed timing-closure techniques along with some helpful guidelines.


Register here.






By Adam Taylor


With the Vivado design for the Lepton thermal imaging IR camera built and the breakout board connected to the Arty Z7 dev board, the next step is to update the software so that we can receive and display images. To do this, we can also use the HDMI-out example software application as this correctly configures the board’s VDMA output. We just need to remove the test-pattern generation function and write our own FLIR control and output function as a replacement.


This function must do the following:



  1. Configure the I2C and SPI peripherals using the XIICPS and XSPI APIs provided when we generated the BSP. To ensure that we can communicate with the Lepton camera, we need to set the I2C address to 0x2A and configure the SPI for CPOL=1, CPHA=1, and master operation.
  2. Once we can communicate over the I2C interface, we need to read the status register to determine that the Lepton camera module is ready. If the camera is correctly configured and ready when we read this register, the Lepton camera will respond with 0x06.
  3. With the camera module ready, we can read out an image and store it within memory. To do this we execute several SPI reads.
  4. Having captured the image, we can move the stored image into the memory location being accessed by VDMA to display the image.



To successfully read out an image from the Lepton camera, we need to synchronize the VoSPI output to find the start of the first line in the image. The camera outputs each line as a 160-byte block (Lepton 2) or two 160-byte blocks (Lepton 3), and each block has a 2-byte ID and a 2-byte CRC. We can use this ID to capture the image, identify valid frames, and store them within the image store.
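
The synchronization idea can be sketched as follows. This is not the code from the project. It assumes a Lepton 2 packet carries the 2-byte ID and 2-byte CRC ahead of a 160-byte payload (164 bytes on the wire), that the line number sits in the low 12 bits of the ID, and that a value of 0xF in bits 11:8 of the ID marks a discard packet. Check these assumptions against the camera's datasheet before relying on them. CRC checking is left out.

```python
HEADER_BYTES = 4        # assumed: 2-byte ID followed by 2-byte CRC
PAYLOAD_BYTES = 160     # 80 pixels x 2 bytes for a Lepton 2 line
PACKET_BYTES = HEADER_BYTES + PAYLOAD_BYTES
LINES_PER_FRAME = 60


def parse_packet(packet: bytes):
    """Return (line_number, payload), or None for an assumed discard packet."""
    if len(packet) != PACKET_BYTES:
        raise ValueError("unexpected packet length")
    packet_id = int.from_bytes(packet[0:2], "big")
    if (packet_id & 0x0F00) == 0x0F00:   # assumed discard marker
        return None
    return packet_id & 0x0FFF, packet[HEADER_BYTES:]


def assemble_frame(packets):
    """Collect lines 0..59 in order; restart whenever the sequence breaks."""
    lines = []
    for packet in packets:
        parsed = parse_packet(packet)
        if parsed is None:
            continue                      # skip discard packets
        line, payload = parsed
        if line != len(lines):
            # Lost sync: keep this packet only if it starts a new frame.
            lines = [payload] if line == 0 else []
            continue
        lines.append(payload)
        if len(lines) == LINES_PER_FRAME:
            return lines
    return None                           # stream ended before a full frame
```

Restarting on any out-of-sequence ID keeps the logic simple: a frame is only stored once all 60 lines have arrived in order.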


Performing steps 3 and 4 allows us to increase the size of the displayed image on the screen. The Lepton 2 camera used for this example has a resolution of only 80 horizontal pixels by 60 vertical pixels. This image would be very small when displayed on a monitor, so we can easily scale the image to 640x480 pixels by outputting each pixel and line eight times. This scaling produces a larger image that’s easier to recognize on the screen, although it may look a little blocky.
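
The scaling described above amounts to nearest-neighbour replication. A minimal sketch follows, with the frame held as a list of rows, which is a simplification of the real frame buffer:

```python
def upscale(frame, factor=8):
    """Repeat every pixel and every line `factor` times (nearest-neighbour)."""
    scaled = []
    for row in frame:
        # Stretch the line horizontally by repeating each pixel.
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # Stretch vertically by emitting the widened line `factor` times.
        scaled.extend(list(wide_row) for _ in range(factor))
    return scaled


# Synthetic 80x60 frame standing in for a captured Lepton 2 image.
frame = [[(x + y) & 0xFF for x in range(80)] for y in range(60)]
big = upscale(frame)
```

Because each source pixel becomes an 8x8 block of identical values, the output is blocky, as noted above. No interpolation is attempted.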


However, scaling alone will not present the best image quality as we have not configured the Lepton camera module to optimize its output. To get the best quality image from the camera module, we need to use the I2C command interface to enable parameters such as AGC (automatic gain control), which affects the contrast and quality of the output image, and flat-field correction to remove pixel-to-pixel variation.


To write or read back the camera module’s settings, we need to create a data structure as shown below and write that structure into the camera module. If we are reading back the settings, we can then perform an I2C read to read back the parameters. Each 16-bit access requires two 8-bit commands:


  • Write to the command word at address 0x00 0x04.
  • Generate the command-word data formed from the Module ID, Command ID, Type, and Protection bit. This word informs the camera module which element of the camera we wish to address and if we wish to read, write, or execute the command.
  • Write the number of words to be read or written to the data-length register at address 0x00 0x06.
  • Write the data words to the data registers at addresses 0x00 0x08 to 0x00 0x26.
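
The steps above can be sketched as a small helper that builds the command word and lists the register writes. The register addresses come from the list above. The bit positions (type in bits 1:0, Command ID in bits 7:2, Module ID in bits 11:8, protection bit at bit 14) and the MSB-first byte order are assumptions that should be checked against the Lepton interface documentation. The module and command values in the example are placeholders, and the order in which the writes are issued on the bus is not addressed here.

```python
COMMAND_REG = 0x0004
DATA_LENGTH_REG = 0x0006
DATA_REG_FIRST = 0x0008
DATA_REG_LAST = 0x0026

TYPE_GET, TYPE_SET, TYPE_RUN = 0, 1, 2   # assumed encoding of the Type field


def command_word(module_id, command_id, command_type, protected=False):
    """Pack Module ID, Command ID, Type, and Protection bit (assumed layout)."""
    word = (module_id & 0xF) << 8
    word |= (command_id & 0x3F) << 2
    word |= command_type & 0x3
    if protected:
        word |= 1 << 14
    return word


def register_writes(word, data_words):
    """Map each 16-bit register address to the value to be written."""
    max_words = (DATA_REG_LAST - DATA_REG_FIRST) // 2 + 1
    if len(data_words) > max_words:
        raise ValueError("too many data words for the data registers")
    writes = {COMMAND_REG: word, DATA_LENGTH_REG: len(data_words)}
    for index, value in enumerate(data_words):
        writes[DATA_REG_FIRST + 2 * index] = value & 0xFFFF
    return writes


def to_i2c_bytes(register, value):
    """Split a 16-bit address and a 16-bit value into bytes, MSB first (assumed)."""
    return bytes([register >> 8, register & 0xFF, value >> 8, value & 0xFF])


# Placeholder example: a "set" command for module 1, command 0, two data words.
word = command_word(module_id=0x1, command_id=0x00, command_type=TYPE_SET)
writes = register_writes(word, [0x0001, 0x0000])
```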


This sequence allows us to configure the Lepton camera so that we get the best performance. When I executed the updated program, I could see the image that appears below, which shows me taking a picture of the monitor screen. The image has been scaled up by a factor of 8.






Now that we have this image on the screen, I want to integrate this design with the MiniZed dev board and configure the camera to transfer images over a wireless network.


Code is available on GitHub as always.


If you want E Book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.




  • First Year E Book here
  • First Year Hardback here.



MicroZed Chronicles hardcopy.jpg 



  • Second Year E Book here
  • Second Year Hardback here


MicroZed Chronicles Second Year.jpg 







In a free Webinar taking place on July 12, Xilinx experts will present a new design approach that unleashes the immense processing power of FPGAs using the Xilinx reVISION stack including hardware-tuned OpenCV libraries, a familiar C/C++ development environment, and readily available hardware-development platforms to develop advanced vision applications based on complex, accelerated vision-processing algorithms such as dense optical flow. Even though the algorithms are advanced, power consumption is held to just a few watts thanks to Xilinx’s All Programmable silicon.


Register here.



By Adam Taylor


Over this blog series, I have written a lot about how we can use the Zynq SoC in our designs. We have looked at a range of different applications and especially at embedded vision. However, some systems use a pure FPGA approach to embedded vision, as opposed to an SoC like the members in the Zynq family, so in this blog we are going to look at how we can get a simple HDMI input-and-output video-processing system using the Artix-7 XC7A200T FPGA on the Nexys Video Artix-7 FPGA Trainer Board. (The Artix-7 A200T is the largest member of the Artix-7 FPGA device family.)


Here’s a photo of my Nexys Video Artix-7 FPGA Trainer Board:






Nexys Video Artix-7 FPGA Trainer Board




For those not familiar with it, the Nexys Video Trainer Board is intended for teaching and prototyping video and vision applications. As such, it comes with the following I/O and peripheral interfaces designed to support video reception, processing, and generation/output:



  • HDMI Input
  • HDMI Output
  • Display Port Output
  • Ethernet
  • UART
  • USB Host
  • 512 MB of DDR SDRAM
  • Line In / Mic In / Headphone Out / Line Out
  • FMC



To create a simple image-processing pipeline, we need to implement the following architecture:







The supervising processor (in this case, a Xilinx MicroBlaze soft-core RISC processor implemented in the Artix-7 FPGA) monitors communications with the user interface and configures the image-processing pipeline as required for the application. In this simple architecture, data received over the HDMI input is converted from its parallel format of Video Data, HSync and VSync into an AXI Streaming (AXIS) format. We want to convert the data into an AXIS format because the Vivado Design Suite provides several image-processing IP blocks that use this data format. Being able to support AXIS interfaces is also important if we want to create our own image-processing functions using Vivado High Level Synthesis (HLS).
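
To show what the parallel-to-AXIS conversion produces, here is a small behavioural sketch in Python. It is not RTL and it is not the Vivado IP. It follows the AXI4-Stream video convention in which TUSER flags the first pixel of a frame and TLAST flags the last pixel of each line. The `Beat` class and the function name are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Beat:
    tdata: int    # pixel value
    tuser: bool   # start of frame: set on the first pixel only
    tlast: bool   # end of line: set on the last pixel of every line


def video_to_axis(frame):
    """Turn a frame (list of lines of pixels) into a flat list of stream beats."""
    beats = []
    for y, line in enumerate(frame):
        for x, pixel in enumerate(line):
            beats.append(Beat(tdata=pixel,
                              tuser=(x == 0 and y == 0),
                              tlast=(x == len(line) - 1)))
    return beats


# Tiny 3x2 frame for illustration.
beats = video_to_axis([[10, 11, 12], [20, 21, 22]])
```

The sync signals disappear from the data path: frame and line boundaries travel as sideband flags alongside the pixel data, which is what lets downstream AXIS blocks be chained without separate timing wires.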


The MicroBlaze processor needs to be able to support the following peripherals:



  • AXI UART – Enables communication and control of the system
  • AXI Timer – Enables the MicroBlaze to time events

  • MicroBlaze Debugging Module – Enables the debugging of the MicroBlaze

  • MicroBlaze Local Memory – Connected to DLMB and ILMB (Data & Instruction Local Memory Bus)


We’ll use the Memory Interface Generator (MIG) to create a DDR interface to the board’s SDRAM. This interface and the SDRAM create a common frame store accessible to both the image-processing pipeline and the supervising processor using an AXI interconnect.


Creating a simple image-processing pipeline requires the use of the following IP blocks:



  • DVI2RGB – HDMI input IP provided by Digilent
  • RGB2DVI – HDMI output IP provided by Digilent
  • Video In to AXI4-Stream – Converts a parallel-video input to AXI Streaming protocol (Vivado IP)
  • AXI4-Stream to Video Out – Converts AXI Streaming protocol to a parallel-video output (Vivado IP)
  • Video Timing Controller Input – Detects the incoming video parameters (Vivado IP)
  • Video Timing Controller Output – Generates the output video timing parameters (Vivado IP)
  • Video Direct Memory Access – Enables images to be written to and from the DDR SDRAM



The core of this video-processing chain is the VDMA, which we use to move the image into the DDR memory.







The diagram above demonstrates how the IP block converts from streamed data to memory-mapped data for the read and write channels. Both VDMA channels provide the ability to convert between streaming and memory-mapped data as required. The write channel supports Stream-to-Memory-Mapped conversion while the read channel provides Memory-Mapped-to-Stream conversion.
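
The two conversions can be modelled in a few lines. This is a behavioural sketch under simple assumptions: the frame store is a flat byte array, each line starts `stride` bytes after the previous one, and `hsize` and `vsize` give the line length in bytes and the number of lines. The function and parameter names are illustrative and are not the VDMA driver API.

```python
def s2mm_write(memory: bytearray, base: int, stride: int, lines) -> None:
    """Stream-to-memory-mapped: place each incoming line at base + row * stride."""
    for row, line in enumerate(lines):
        start = base + row * stride
        if len(line) > stride or start + len(line) > len(memory):
            raise ValueError("line does not fit in the frame store")
        memory[start:start + len(line)] = line


def mm2s_read(memory: bytearray, base: int, stride: int, hsize: int, vsize: int):
    """Memory-mapped-to-stream: read vsize lines of hsize bytes back out."""
    return [bytes(memory[base + row * stride: base + row * stride + hsize])
            for row in range(vsize)]


# Small worked example: two 4-byte lines stored with a 16-byte stride.
frame_store = bytearray(64)
incoming = [b"abcd", b"efgh"]
s2mm_write(frame_store, base=8, stride=16, lines=incoming)
outgoing = mm2s_read(frame_store, base=8, stride=16, hsize=4, vsize=2)
```

Keeping the stride separate from the line length is what allows lines to be padded to convenient memory boundaries without changing the image data.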


When all this is put together in Vivado to create the initial base system, we get the architecture below, which is provided by the Nexys Video HDMI example.







All that is required now is to look at the software required to configure the image-processing pipeline. I will explain that next time.




Code is available on GitHub as always.








The RISC-V open-source processor has a growing ecosystem and user community so it’s not surprising that someone would want to put one of these processors into a low-cost FPGA like a Xilinx Artix-7 device. And what could be easier than doing so using an existing low-cost dev board? Cue Digilent’s Arty Dev Board, currently on sale for $89.99 here. Normally, you’d find a copy of the Xilinx MicroBlaze soft RISC processor core inside of Arty’s Artix-7 FPGA, but a SiFive Freedom E310 microcontroller platform that combines a RISC-V processor with peripherals seems to fit just fine. That’s just what Andrew Back has done, using the no-cost Xilinx Vivado HL WebPack Edition to compile the HDL.





Digilent’s ARTY Artix-7 FPGA Dev Board



With Back’s step-by-step instructions based on SiFive's "Freedom E300 Arty FPGA Dev Kit Getting Started Guide", you can do the same pretty easily. (See “Build an open source MCU and program it with Arduino.”)



Andrew Back is an open-source advocate, Treasurer and Director of the Free and Open Source Silicon Foundation, organizer of the Wuthering Bytes technology festival and founder of the Open Source Hardware User Group.


Note: For more information on the Digilent Arty Dev Board, see “ARTY—the $99 Artix-7 FPGA Dev Board/Eval Kit with Arduino I/O and $3K worth of Vivado software. Wait, What????” and “Free Webinar on $99 Arty dev kit, based on Artix-7 FPGA, now online.”






In February, I wrote a blog detailing the use of a Xilinx Kintex-7 K325T or K410T FPGA in Keysight’s new line of high-speed AWGs (arbitrary waveform generators) and signal digitizers. (See “Kintex-7 FPGAs sweep the design of six new Keysight high-speed PXI AWGs and Digitizers.”) The six new Keysight PXI instruments in that blog included the M3100A 100MSamples/sec, 4 or 8-channel FPGA digitizer; the M3102A 500MSamples/sec, 2 or 4-channel FPGA digitizer; the M3201A 500MSamples/sec FPGA arbitrary waveform generator; the M3202A 1GSamples/sec FPGA arbitrary waveform generator; the M3300A 500MSamples/sec, 2-channel FPGA AWG/digitizer combo; and the M3302A 500MSamples/sec, 4-channel FPGA AWG/digitizer combo.


In that blog post, I wrote:



“This family of Keysight M3xxx instruments clearly demonstrates the ability to create an FPGA-based hardware platform that enables rapid development of many end products from one master set of hardware designs. In this case, the same data-acquisition and AWG block diagrams recur on the data sheets of these instruments, so you know there’s a common set of designs.”



And that’s still true. Incorporating a Xilinx All Programmable FPGA, Zynq SoC, or Zynq UltraScale+ MPSoC into your product design allows you to create a hardware platform (or platforms) that gives you a fast way to spin out new, highly differentiated products based on that platform. Keysight, realizing that the FPGA capability would be useful to its own customers as well, exposed much of the internal FPGA capabilities in these instruments through the Keysight M3602A Graphical FPGA Development Environment, which allows you to customize these instruments using off-the-shelf DSP blocks, MATLAB/Simulink, the Xilinx CORE Generator and Vivado IP cores, and the Xilinx Vivado Design Suite with either VHDL or Verilog code.






Keysight’s M3602A FPGA Block Diagram Editor




A recent Keysight email alerted me to three new application notes Keysight has published that detail the use of on-board FPGA resources to enhance the instruments for specific applications. The three app notes are:


  • FPGA Implementation of a LUT-Based Digital Pre-Distortion Using M3602A FPGA Design Environment
  • FPGA Implementation of a Digital-PLL Using M3602A FPGA Design Environment
  • FPGA Implementation of a LUT-Based Input Processing Using M3602A FPGA Design Environment



Only All Programmable devices give you this kind of high-speed hardware programmability in addition to microprocessor-based software programmability and these Keysight instruments and the M3602A Development Environment are yet one more demonstration of why that’s a very handy option for you to consider when designing your own products.


As I concluded in that February blog post (and it’s worth repeating):


“Xilinx FPGAs are inherently well-suited to this type of platform-based product design because of the All-Programmable (I/O, hardware, and software) nature of the devices. I/O programmability permits any-to-any connectivity—as is common with, for example, camera designs when you’re concerned about adapting to a range of sensors or different ADCs and DACs for digitizers and AWGs. Hardware programmability allows you to rapidly modify real-time signal-processing or motor-control algorithms—as is common with diverse designs including high-speed instrument designs and industrial controllers.”


Of course these same ideas apply to all types of products, not just AWGs and digitizers.



(You can access the three Keysight app notes here.)





Metamako decided that it needed more than one Xilinx UltraScale FPGA to deliver the low latency and high performance it wanted from its newest networking platform. The resulting design is a 1RU or 2RU box that houses one, two, or three Kintex UltraScale or Virtex UltraScale+ FPGAs, connected by “near-zero” latency links. The small armada of FPGAs means that the platform can run multiple networking applications in parallel—very quickly. This new networking platform allows Metamako to expand far beyond its traditional market—financial transaction networking—into other realms such as medical imaging, SDR (software-defined radio), industrial control, and telecom. The FPGAs are certainly capable of implementing tasks in all of these applications with extremely high performance.





Metamako’s Triple-FPGA Networking Platform




The Metamako platform offers an extensive range of standard networking features including data fan-out, scalable broadcast, connection monitoring, patching, tapping, time-stamping, and a deterministic port-to-FPGA latency of just 3nsec. Metamako also provides a developer’s kit with the platform with features that include:



  • A Simplified Stack – One device houses the FPGA(s), x86 server, and Layer 1 switch, ensuring that all hardware components work in sync.
  • Integration with existing FPGA Tools – Platform-specific adapters for programming the FPGA(s) are embedded in the Metamako device, allowing for quick and easy (remote) access to the device by the FPGA development tools.
  • Layer 1+ Crosspoint Functionality – Includes all core Metamako Layer 1 functions such as market-scalable broadcast, connection monitoring, remote patching, tapping, and timestamping.
  • SFP Agnostic – Metamako’s Layer 1+ switch is SFP agnostic, which saves FPGA developers time and effort in having to interface with lots of different types of SFPs.
  • Feature Rich – Standard enterprise features include access control, syslog, SNMP, packet stats, tcpdump, JSON RPC API, time series data, and streaming telemetry.
  • Easy Application Deployment – Metamako’s platform includes a built-in application infrastructure that allows developers to wrap applications into simple packages for deployment.



This latest networking platform from Metamako demonstrates a key attribute of Xilinx All Programmable technology: the ability to fully differentiate a product by exploiting the any-to-any connectivity and high-speed processing capabilities of Xilinx silicon using Xilinx’s development tools. No other chip technology could provide Metamako with a comparable mix of extreme connectivity, speed, and design flexibility.






You can now download the Vivado Design Suite 2017.2 HLx editions, which include many new UltraScale+ devices:


  • Kintex UltraScale+ XCKU13P
  • Zynq UltraScale+ MPSoCs XCZU7EG, XCZU7CG, and XCZU15EG
  • XA Zynq UltraScale+ MPSoCs XAZU2EG and XAZU3EG



In addition, the low-cost Spartan-7 XC7S50 FPGA has been added to the WebPack edition.


Download the latest releases of the Vivado Design Suite HLx editions here.






Last month, Xilinx Product Marketing Manager Darren Zacher presented a Webinar on the extremely popular $99 Arty Dev Kit, which is based on a Xilinx Artix-7 A35T FPGA, and it’s now online. If you’re wondering if this might be the right way for you to get some design experience with the latest FPGA development tools and silicon, spend an hour with Zacher and Arty. The kit is available from Avnet and Digilent.


Register to watch the video here.



ARTY v4.jpg 



For more information about the Arty Dev Kit, see: “ARTY—the $99 Artix-7 FPGA Dev Board/Eval Kit with Arduino I/O and $3K worth of Vivado software. Wait, What????”






Avnet has formally introduced its MiniZed dev board based on the Xilinx Zynq Z-7000S SoC with the low, low price of just $89. For this, you get a Zynq Z-7007S SoC with one ARM Cortex-A9 processor core, 512Mbytes of DDR3L SDRAM, 128Mbits of QSPI Flash, 8Gbytes of eMMC Flash memory, WiFi 802.11 b/g/n, and Bluetooth 4.1. The MiniZed board incorporates an Arduino-compatible shield interface, two Pmod connectors, and a USB 2.0 host interface for fast peripheral expansion. You’ll also find an STMicroelectronics LIS2DS12 motion and temperature sensor and an MP34DT05 digital microphone on the board. This is a low-cost dev board that packs the punch of a fast ARM Cortex-A9 processor, programmable logic, a dual-wireless communications system, and easy system expandability.


I find the software that accompanies the board equally interesting. According to the MiniZed Product Brief, the $89 price includes a voucher for an SDSoC license so you can program the programmable logic on the Zynq SoC using C or C++ in addition to Verilog or VHDL using Vivado. This is a terrific deal on a Zynq dev board, whether you’re a novice or an experienced Xilinx user.


Avnet’s announcement says that the board will start shipping in early July.


Stefan Rousseau, senior technical marketing engineer for Avnet, said, “Whether customers are developing a Linux-based system or have a simple bare metal implementation, with MiniZed, Zynq-7000 development has never been easier. Designers need only connect to their laptops with a single micro-USB cable and they are up and running. And with Bluetooth or Wi-Fi, users can also connect wirelessly, transforming a mobile phone or tablet into an on-the-go GUI.”




Here’s a photo of the MiniZed Dev board:





Avnet’s $89 MiniZed Dev Board based on a Xilinx Zynq Z-7007S SoC



And here’s a block diagram of the board:





Avnet’s $89 MiniZed Dev Board Block Diagram


By Adam Taylor


We can create very responsive design solutions using Xilinx Zynq SoC or Zynq UltraScale+ MPSoC devices, which enable us to architect systems that exploit the advantages provided by both the PS (processor system) and the PL (programmable logic) in these devices. When we work with logic designs in the PL, we can optimize performance using design techniques like pipelining and other UltraFast design methods. We can see the results of our optimization techniques using simulation and Vivado implementation results.


When it comes to optimizing the software, which runs on acceleration cores instantiated in the PS, things may appear a little more opaque. However, things are not what they might appear. We can gather statistics on our accelerated code with ease using the performance analysis capabilities built into XSDK. Using performance analysis, we can examine the performance of the software we have running on the acceleration cores and we can monitor AXI performance within the PL to ensure that the software design is optimized for the application at hand.


Using performance analysis, we can examine several aspects of our running code:


  • CPU Utilization – Percentage of non-idling CPU clock cycles
  • CPU Instructions Per Cycle – Estimated number of executed instructions per cycle
  • L1 Cache Data Miss Rate % – L1 data-cache miss rate
  • L1 Cache Access Per msec – Number of L1 data-cache accesses
  • CPU Write Instructions Stall per cycle – Estimated number of stall cycles per instruction
  • CPU Read Instructions Stall per cycle – Estimated number of stall cycles per instruction
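The metrics listed above are ratios of raw event counts. Here is a minimal sketch of that arithmetic. The counter names and sample values are hypothetical, and dividing by total cycles (rather than busy cycles) for instructions per cycle is an assumption; this is not XSDK's actual post-processing.

```python
from dataclasses import dataclass


@dataclass
class PmuSample:
    """Hypothetical raw event counts captured over one sampling interval."""
    total_cycles: int    # CPU clock cycles in the interval
    idle_cycles: int     # cycles the core spent idling
    instructions: int    # instructions executed
    l1d_accesses: int    # L1 data-cache accesses
    l1d_misses: int      # L1 data-cache misses
    interval_ms: float   # length of the sampling interval in milliseconds


def derive_metrics(s: PmuSample) -> dict:
    """Turn raw counts into ratios like those listed above."""
    busy_cycles = s.total_cycles - s.idle_cycles
    return {
        "cpu_utilization_pct": 100.0 * busy_cycles / s.total_cycles,
        "instructions_per_cycle": s.instructions / s.total_cycles,
        "l1d_miss_rate_pct": 100.0 * s.l1d_misses / s.l1d_accesses,
        "l1d_accesses_per_ms": s.l1d_accesses / s.interval_ms,
    }


# Made-up numbers, purely for illustration.
sample = PmuSample(total_cycles=1_000_000, idle_cycles=250_000,
                   instructions=600_000, l1d_accesses=200_000,
                   l1d_misses=5_000, interval_ms=10.0)
metrics = derive_metrics(sample)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A rising miss rate alongside falling instructions per cycle is the kind of pattern that suggests a data-layout or cache-usage problem rather than a compute-bound one.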


For those who may not be familiar with the concept, a stall occurs when the cache does not contain the requested data, which must then be fetched from main memory. While the data is fetched, the core can continue to process different instructions using out-of-order (OOO) execution. However, the processor will eventually run out of independent instructions and will have to wait for the information it needs. This is called a stall.
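A common first-order way to reason about the cost of stalls is to add the average stall cycles per instruction to the ideal cycles per instruction. The sketch below uses that textbook model with made-up numbers. It ignores the latency that out-of-order execution can hide, so treat the result as a pessimistic estimate rather than a prediction for any particular core.

```python
def effective_cpi(base_cpi, mem_refs_per_instr, miss_rate, miss_penalty_cycles):
    """First-order model: ideal CPI plus average memory-stall cycles per instruction."""
    stall_cycles_per_instr = mem_refs_per_instr * miss_rate * miss_penalty_cycles
    return base_cpi + stall_cycles_per_instr


# Hypothetical workload: 30% of instructions access memory, 2% of those
# accesses miss in the cache, and each miss costs 100 cycles.
cpi = effective_cpi(base_cpi=1.0, mem_refs_per_instr=0.3,
                    miss_rate=0.02, miss_penalty_cycles=100)
print(f"effective CPI: {cpi:.2f}")
```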


We can gather these stall statistics thanks to the Performance Monitor Unit (PMU) contained within each of the Zynq UltraScale+ MPSoC’s CPUs. The PMU provides six profile counters, which are configured by and post-processed by XSDK to generate the statistics above.


If we want to use the performance monitor within SDK, we need to work with a debug build and then open the Performance Monitor Perspective within XSDK. If we have not done so before, we can open the perspective as shown below:









Opening the Performance Analysis Perspective



With the performance analysis perspective open, we can debug the application as normal. However, before we click on the run icon (the debugger should be set to stop at main, as default), we need to start the performance monitor. To do that, right click on the “System Debugger on Local” symbol within the performance monitor window and click start.





Starting the Performance Analysis




Then, once we execute the program, the statistics will be gathered and we can analyse them within XSDK to determine the best optimizations for our code.


To demonstrate how we can use this technique to deliver a more optimized system, I have created a design that runs on the ZedBoard and performs AES256 Encryption on 1024 packets of information. When this code was run on the ZedBoard, the following execution statistics were collected:





Performance Graphs






Performance Counters




So far, these performance statistics only look at code executing on the PS itself. Next time, we will look at how we can use the AXI Performance Monitor with XSDK. If we wish to do this, we need to first instrument the design in Vivado.





Code is available on Github as always.


If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.




  • First Year E Book here
  • First Year Hardback here.



MicroZed Chronicles hardcopy.jpg 



  • Second Year E Book here
  • Second Year Hardback here



MicroZed Chronicles Second Year.jpg 




When someone asks where Xilinx All Programmable devices are used, I find it a hard question to answer because there’s such a very wide range of applications—as demonstrated by the thousands of Xcell Daily blog posts I’ve written over the past several years.


Now, there’s a 5-minute “Powered by Xilinx” video with clips from several companies using Xilinx devices for applications including:


  • Machine learning for manufacturing
  • Cloud acceleration
  • Autonomous cars, drones, and robots
  • Real-time 4K, UHD, and 8K video and image processing
  • VR and AR
  • High-speed networking by RF, LED-based free-air optics, and fiber
  • Cybersecurity for IIoT


That’s a huge range covered in just five minutes.


Here’s the video:






Perhaps you think DPDK (Data Plane Development Kit) is a high-speed data-movement standard that’s strictly for networking applications. Perhaps you think DPDK is an Intel-specific specification. Perhaps you think DPDK is restricted to the world of host CPUs and ASICs. Perhaps you’ve never heard of DPDK—given its history, that’s certainly possible. If any of those statements is correct, keep reading this post.


Originally, DPDK was a set of data-plane libraries and NIC (network interface controller) drivers developed by Intel for fast packet processing on Intel x86 microprocessors. That is the DPDK origin story. Last April, DPDK became a Linux Foundation Project. It lives at DPDK.org and is now processor agnostic.


DPDK consists of several main libraries that you can use to:


  • Send and receive packets while minimizing the number of CPU cycles needed (usually less than 80)
  • Develop fast packet-capture algorithms
  • Run 3rd-party fast-path stacks


So far, DPDK certainly sounds like a networking-specific development kit but, as Atomic Rules’ CTO Shep Siegel says, “If you can make your data-movement problem look like a packet-movement problem,” then DPDK might be a helpful shortcut in your development process.


Siegel knows more than a bit about DPDK because his company has just released Arkville, a DPDK-aware FPGA/GPP data-mover IP block and DPDK PMD (Poll Mode Driver) that allow Linux DPDK applications to offload server cycles to FPGA gates in tandem with the Linux Foundation’s 17.05 release of the open-source DPDK libraries. Atomic Rules’ Arkville release is compatible with Xilinx Vivado 2017.1 (the latest version of the Vivado Design Suite), which was released in April. Currently, Atomic Rules provides two sample designs:



  • Four-Port, Four-Queue 10 GbE example (Arkville + 4×10 GbE MAC)
  • Single-Port, Single-Queue 100 GbE example (Arkville + 1×100 GbE MAC)


(Atomic Rules’ example designs for Arkville were compiled with Vivado 2017.1 as well.)



These examples are data movers; Arkville is a packet conduit. This conduit presents a DPDK interface on the CPU side and AXI interfaces on the FPGA side. There’s a convenient spot in the Arkville conduit where you can add your own hardware for processing those packets. That’s where the CPU offloading magic happens.


Atomic Rules’ Arkville IP works well with all Xilinx UltraScale devices but it works especially well with Xilinx UltraScale+ All Programmable devices that provide two integrated PCIe Gen3 x16 controllers. (That includes devices in the Kintex UltraScale+ and Virtex UltraScale+ FPGA families and the Zynq UltraScale+ MPSoC device families.)




Two controllers matter because, as BittWare’s VP of Network Products Craig Lund says, “100G Ethernet is hard. It’s not clear that you can use PCIe to get [that bit rate] into a server [using one PCIe Gen3 x16 interface]. From the PCIe specs, it looks like it should be easy, but it isn’t.” If you are handling minimum-size packets, says Lund, there are lots of them: more than 14 million per second. If you’re handling big packets, then you need a lot of bandwidth. Either use case presents a throughput challenge to a single PCIe Root Complex. In practice, you really need two.
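A back-of-the-envelope sketch of the numbers behind that argument follows. It uses standard Ethernet framing overhead and the PCIe Gen3 per-lane signaling rate. The results are theoretical ceilings before PCIe protocol overhead (packet headers, flow control), not measurements of any BittWare or Atomic Rules product.

```python
# Per-frame overhead on the wire: 7-byte preamble + 1-byte start-of-frame
# delimiter + 12-byte inter-frame gap.
ETH_OVERHEAD_BYTES = 20
MIN_FRAME_BYTES = 64


def packets_per_second(line_rate_bps, frame_bytes):
    """Maximum frame rate for a given line rate and frame size."""
    bits_on_wire = (frame_bytes + ETH_OVERHEAD_BYTES) * 8
    return line_rate_bps / bits_on_wire


def pcie_gen3_raw_gbps(lanes):
    """Raw PCIe Gen3 bandwidth per direction: 8 GT/s per lane, 128b/130b encoding."""
    return 8.0 * lanes * 128 / 130


pps_10g = packets_per_second(10e9, MIN_FRAME_BYTES)
pps_100g = packets_per_second(100e9, MIN_FRAME_BYTES)
x16_gbps = pcie_gen3_raw_gbps(16)

print(f"10 GbE, 64-byte frames:  {pps_10g / 1e6:.2f} Mpackets/s")
print(f"100 GbE, 64-byte frames: {pps_100g / 1e6:.2f} Mpackets/s")
print(f"PCIe Gen3 x16 raw:       {x16_gbps:.1f} Gbps per direction")
```

A single 10 GbE port at minimum frame size already approaches 15 million packets per second, and a 100 GbE port carries ten times that. The raw Gen3 x16 figure sits only modestly above 100 Gbps before protocol overhead is subtracted, which is consistent with Lund's point that one interface leaves little headroom.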


BittWare has implemented products using the Atomic Rules Arkville IP, based on its XUPP3R PCIe card, which incorporates a Xilinx Virtex UltraScale+ VU13P FPGA. One of the many unique features of this BittWare board is that it has two PCIe Gen3 x16 ports: one available on an edge connector and the other available on an optional serial expansion port. This second PCIe Gen3 x16 port can be connected to a second PCIe slot for added bandwidth.


However, even that’s not enough, says Lund. You don’t just need two PCIe Gen3 x16 slots; you need two PCIe Gen3 Root Complexes and that means you need a 2-socket motherboard with two physical CPUs to handle the traffic. Here’s a simplified block diagram that illustrates Lund’s point:






BittWare’s XUPP3R PCIe Card has two PCIe Gen3 x16 ports: one on an edge connector and the other on an optional serial expansion port for added bandwidth




BittWare has used its XUPP3R PCIe card and the Arkville IP to develop two additional products:




Note: For more information about Atomic Rules’ IP and BittWare’s XUPP3R PCIe card, see “BittWare’s UltraScale+ XUPP3R board and Atomic Rules IP run Intel’s DPDK over PCIe Gen3 x16 @ 150Gbps.”



Arkville is a product offered by Atomic Rules. The XUPP3R PCIe card is a product offered by BittWare. Please contact these vendors directly for more information about these products.






By Adam Taylor


So far, our examination of the Zynq UltraScale+ MPSoC has focused mainly upon the PS (processing system) side of the device. However, to fully utilize the device’s capabilities we need to examine the PL (programmable logic) side also. So in this blog, we will look at the different AXI interfaces between the PS and the PL.





Zynq MPSoC Interconnect Structure




These different AXI interfaces provide a mixture of master and slave ports from the PS perspective and they can be coherent or not. The PS is the master for the following interfaces:


  1. FPD High Performance Master (HPM) – Two interfaces within the Full Power Domain.
  2. LPD High Performance Master (HPM) – One Interface within the Low Power Domain.


For the remaining interfaces the PL is the master:


  1. FPD High Performance Coherent (HPC) – Two Interfaces within the Full Power Domain. These interfaces pass through the CCI (Cache Coherent Interconnect) and provide one-way coherency from the PL to the PS.
  2. FPD High Performance (HP) – Four Interfaces within the Full Power Domain. These interfaces provide non-coherent transfers.
  3. Low Power Domain – One interface within the Low Power Domain.
  4. Accelerator Coherency Port (ACP) – One interface within the Full Power Domain. This interface provides one-way coherency (IO) allowing PL masters to snoop the APU Cache.
  5. Accelerator Coherency Extension (ACE) – One interface within the Full Power Domain. This interface provides full coherency using the CCI. For this interface, the PL master needs to have a cache within the PL.


Except for the ACE and ACP interfaces, which have a fixed data width, the remaining interfaces have a selectable data width of 32, 64, or 128 bits.
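The interfaces described above can be captured as a small lookup table, which makes it easy to shortlist candidates for a given transfer. This is only a sketch that restates the lists in this post. The field names are my own, and where the post does not state a property (for example, coherency on the LPD PL-master port) the value is left as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AxiInterface:
    name: str
    count: int                 # number of ports of this type
    domain: str                # "FPD" (full power) or "LPD" (low power)
    master: str                # which side initiates transfers: "PS" or "PL"
    coherency: Optional[str]   # "none", "one-way", "full", or None if not stated
    selectable_width: bool     # True if 32/64/128-bit selectable


INTERFACES = [
    AxiInterface("FPD HPM", 2, "FPD", "PS", None, True),
    AxiInterface("LPD HPM", 1, "LPD", "PS", None, True),
    AxiInterface("FPD HPC", 2, "FPD", "PL", "one-way", True),
    AxiInterface("FPD HP", 4, "FPD", "PL", "none", True),
    AxiInterface("LPD", 1, "LPD", "PL", None, True),
    AxiInterface("ACP", 1, "FPD", "PL", "one-way", False),
    AxiInterface("ACE", 1, "FPD", "PL", "full", False),
]


def ports(master):
    """Total number of ports for which the given side is the AXI master."""
    return sum(i.count for i in INTERFACES if i.master == master)


fully_coherent = [i.name for i in INTERFACES if i.coherency == "full"]
survive_fpd_power_down = [i.name for i in INTERFACES if i.domain == "LPD"]

print("PS-master ports:", ports("PS"))
print("PL-master ports:", ports("PL"))
print("Fully coherent:", fully_coherent)
print("Available with the FPD powered down:", survive_fpd_power_down)
```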


To support the different power domains within the Zynq MPSoC, each of the master interfaces within the PS is provided with an AXI isolation block that isolates the interface should a power domain be powered down. To protect the APU and RPU from hanging up performing an AXI access, each PS master interface also has an AXI timeout block to recover from any incorrect AXI interactions (for example, if the PL is not powered or configured).


We can use these interfaces simply within our Vivado design, where we can enable, disable, and configure the desired interface.







Once you have enabled and configured the desired interfaces, you can connect them into your design in the PL. Within the simple example in this blog post, we are going to transfer data to and from a BRAM located within the PL.






This example uses the AXI master connected to the low-power domain (LPD). Both the APU and the RPU can address the BRAM via this interface thanks to the SMMU, the Central Switch, and the Low Power Switch. Using the LPD AXI interconnect also allows the RPU to access the PL if the FPD (full-power domain) is powered down. Of course, it does increase complexity when using the APU.


This simple example performs the following steps:


  • Read 256 addresses and check that they are all zero.
  • Write a count into the 256 addresses.
  • Read back the data stored in the 256 addresses to demonstrate that the data was written correctly.
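The three steps above can be sketched against a software stand-in for the BRAM. On the real hardware these would be memory-mapped 32-bit reads and writes at the BRAM's AXI address. The `FakeBram` class and its method names are hypothetical and exist only so the logic can run on a PC; this is not the code from the post's repository.

```python
BRAM_WORDS = 256


class FakeBram:
    """Software stand-in for the PL block RAM (hypothetical helper)."""

    def __init__(self, words):
        self._mem = [0] * words   # assume the BRAM starts out cleared

    def read32(self, index):
        return self._mem[index]

    def write32(self, index, value):
        self._mem[index] = value & 0xFFFFFFFF


def run_bram_test(bram, words=BRAM_WORDS):
    # Step 1: read every address and confirm it holds zero.
    all_zero = all(bram.read32(i) == 0 for i in range(words))

    # Step 2: write an incrementing count into every address.
    for i in range(words):
        bram.write32(i, i)

    # Step 3: read back and confirm the count was stored correctly.
    readback_ok = all(bram.read32(i) == i for i in range(words))
    return all_zero, readback_ok


result = run_bram_test(FakeBram(BRAM_WORDS))
print("initially zero:", result[0], "| read-back matches:", result[1])
```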






Program Starting to read addresses for part 1





Data written to the first 256 BRAM addresses






Data read back to confirm the write



The key element in our designs is selecting the correct AXI interface for the application and data transfers at hand and ensuring that we are getting the best possible performance from the interconnect. Next time we will look at the quality of service and the AXI performance monitor.




Code is available on Github as always.




Adam Taylor’s MicroZed Chronicles Part 199: The AD9467 SDSOC Platform

by Xilinx Employee ‎05-31-2017 03:36 PM - edited ‎05-31-2017 03:54 PM (8,675 Views)


By Adam Taylor


Having got the base hardware and software designs up and running, the next step is to create an SDSoC platform so that we can use this design efficiently. The SDSoC platform allows us to implement algorithms at a much higher level using C or C++. We can therefore develop C or C++ programs using SDSoC to access the ADC sample data within the DDR memory and verify that our algorithms work correctly. Once we are sure that we have the correct algorithmic function (but not necessarily the desired performance), we can accelerate these algorithms by putting them into the Zynq SoC’s programmable logic (PL) rather seamlessly. Taking such an approach enables us to use one base design for a range of applications. Because we are developing in a higher-level language, the time taken to produce the first working demonstration is reduced.


To generate an SDSoC platform, we need a Vivado base design, the necessary software libraries, and three definition files:


  • XPFM – This is the top-level definition of the Platform – Generated by hand
  • HPFM – The hardware definition of the platform – Generated by Vivado
  • SPFM – The software definition of the platform – Generated by hand


The first thing we need to do to create the SDSoC platform (I am using version 2016.3) is to modify the design in Vivado using the UG1146 requirements for a hardware platform. This means that we need to update the concatenation block and move the used interrupts down to the least significant inputs. This frees up the remaining interrupts so that SDSoC can use them when it accelerates an algorithm using hardware. I also enabled all four FCLKs and Resets from the Zynq SoC’s PS (processing system) to the PL and instantiated the reset blocks for each of these clocks. I then followed the steps within UG1146 to create the hardware metadata to create one half of the platform. In this case, the hardware side of the SDSoC Platform makes available the AXI ACP, AXI HP2, AXI HP3, and AXI GP Master 1 connections. The other AXI interfaces are already in use by the existing AD9467 demo design.


There is one more thing we need as we create the hardware platform. Because this is a custom platform, which uses custom IP, we need to ensure that the IP is within the Vivado project for the SDSoC Platform. If it is not, then when we try to build our SDSoC platform we will get several failures in the build process because it cannot find IP information. The simplest method for preventing this problem is to use the Vivado Archive function to archive the design. Then the archived design will be extracted and used to define the SDSoC hardware platform.


To create the software platform (as we are using the ZedBoard for this example), I initially copied the software and top-level XML file from the <SDSoC Install>/platforms/zed directory, before editing them to reflect the needs of the platform:





Top Level of the ad9467_fmc_zed SDSoC Platform



These steps provided me with an SDSoC platform that I can use for development with the ZedBoard and the AD9467 FMC. My next step then was to perform some pipe cleaning to ensure that the platform functions as intended. To do this I wanted to:



  1. Build the AD9467 demo application and run it from within SDSoC with no acceleration.
  2. Create a simple acceleration example built onto the base hardware. For this I am going to use one of the matrix multiply examples.



As I did not declare a prebuilt platform, SDSoC will generate the hardware the first time we build the application. I did this to ensure that SDSoC can re-build the hardware design without any accelerations but with the custom IP blocks needed for the AD9467 demo.






Vivado Diagram as used for the AD9467 Demo application



Having built the first application successfully, I then ran it on the ZedBoard with the AD9467 FMC connected and observed the same performance as I had previously seen when using SDK. This means that I can start developing applications that use the data provided by the AD9467 within the SDSoC environment.






However, once I have finished generating and testing my algorithms in C/C++, I will want to accelerate elements of the design. That is where the second test of the platform comes in: to test that the platform is correctly defined and is therefore capable of accelerating C and C++ functions into the hardware. Within the AD9467 FMC SDSoC platform, I created an example application for acceleration using one of the predefined SDSoC examples: the mmult. This will add functionality necessary to perform the MMult within the hardware in addition to the base design we have been using for the AD9467.





Accelerating the mmult_accel function in the AD9467 FMC Zed Platform






Resultant SDSoC Vivado design, AD9467 FMC design with additional hardware for the mmult_accel function (circled in red)






MMULT results on the AD9467 FMC Zed Platform



Generating this SDSoC platform was pretty simple and it allows us to develop our applications much faster than would be the case if we were using a standard HDL based approach. We will look at how we can do signal processing with this platform in future blogs.



I have uploaded the SDSoC Platform to the following GitHub repository, which is different to the standard one due to the organization of the platform.









Adam Taylor’s MicroZed Chronicles Part 198: Building the 250Msamples/sec AD9467 FMC Card

by Xilinx Employee ‎05-30-2017 10:48 AM - edited ‎05-30-2017 10:50 AM (9,545 Views)


By Adam Taylor



Last week in this blog, I mentioned the Analog Devices AD9467 FMC and how we could use it with the Xilinx SDSoC development environment to capture data with a simple data-capture chain and then develop and accelerate the algorithm using a high-level language like C or C++.





Analog Devices AD9467 FMC and Zynq-based Avnet ZedBoard Combined




The AD9467 FMC contains the AD9467 ADC, which provides 16-bit quantization at sampling rates of up to 250Msamples/sec (MSPS). These specs allow us to use the AD9467 to sample Intermediate Frequency (IF) signals. An IF is used to move an RF carrier wave down from or up to a higher frequency for reception or transmission.
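Two quick calculations follow from those specs: the sustained data rate the capture chain must handle, and where an IF tone lands in the sampled spectrum. The sketch below works through both. The 140MHz IF is a hypothetical example value, not something taken from the AD9467 FMC design.

```python
SAMPLE_RATE_HZ = 250e6    # maximum AD9467 sampling rate
BITS_PER_SAMPLE = 16

# Sustained data rate if every sample is stored as a 16-bit word.
bytes_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 8

# Highest frequency that can be represented without aliasing.
nyquist_hz = SAMPLE_RATE_HZ / 2


def alias_frequency(f_in_hz, fs_hz):
    """Frequency at which a tone appears after sampling, folded into 0..fs/2."""
    f = f_in_hz % fs_hz
    return fs_hz - f if f > fs_hz / 2 else f


# Hypothetical IF above Nyquist: it folds back into the first Nyquist zone.
if_hz = 140e6
observed_hz = alias_frequency(if_hz, SAMPLE_RATE_HZ)

print(f"data rate: {bytes_per_second / 1e6:.0f} MB/s")
print(f"Nyquist:   {nyquist_hz / 1e6:.0f} MHz")
print(f"{if_hz / 1e6:.0f} MHz IF appears at {observed_hz / 1e6:.0f} MHz")
```

The data-rate figure is a useful sanity check when sizing the DMA path into DDR memory.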


The first thing we need to do with the AD9467 board is to work out the clocking scheme we’ll use to provide the ADC with a sample clock. We have three options:


  1. Apply an externally generated sine-wave. This option allows us to easily change the sampling frequency. However, to ensure good converter performance, we’ll need a low-jitter clock from a quality signal source.
  2. Use the on-board oscillator. This option provides a fixed 250MHz reference clock to the ADC. It has the advantage of being an on-board resource with a known good layout. However, its sampling frequency is fixed.
  3. Use the on-board AD9517—an SPI-controlled, 12-output clock generator. This option gives us the ability to set the sampling frequency as desired.


To change between the three sources, we add and remove ac coupling capacitors from the circuit to put the correct clock generator in the clock path. By default, the clock path is configured to use the external clock source.


However, before we can create an SDSoC Platform, we need to create a base design in Vivado. This base design interfaces with the AD9467 FMC and transfers the sampled data into the Zynq SoC’s PS (processing system) DDR memory using DMA. Rather helpfully, the AD9467 FMC comes with a Vivado example that we can use with the ZedBoard. This example design creates the structure to transfer samples into the PS DDR SDRAM using DMA.


To recreate this design, the first thing we need to do is download the Analog Devices GitHub repository, which contains both the shared IP elements required and the actual Vivado design example. To ensure we are using the latest possible tool chain, select the latest tool revision from GitHub and download a zip of the repository or clone the repository from here.


To build this project, we need to be using either a Linux box or, if we are using Microsoft Windows, we’ll need to download and install CYGWIN. If you are using CYGWIN, you need to make sure you have Vivado in your path.


To build the project you just need to use either a terminal or CYGWIN to navigate to the AD9467_FMC directory and execute the make file for the Zed version.





Make file running in CYGWIN to recreate the project




Once this has been recreated, we will be able to open our project in Vivado, explore the design in the block diagram, and export the design. We can then use the test application software to complete the demo.






AD9467 FMC example design




As can be seen in the above example, these steps add the FMC example into the existing Zynq base hardware design so that all the other interfaces like HDMI are still available. These additional interfaces can be very useful to us. In the diagram above, you can see the highlighted path from the AD9467 receiver IP, into a DMA IP block and then an AXI Interconnect block that connects to a Zynq HP (high-performance) AXI port. This design allows the data to move seamlessly into the PS DDR SDRAM for future processing.


Of course, to do this we need to run some software on the Zynq SoC’s ARM Cortex-A9 processor to configure the AD9467, the AD9517, and the simple internal processing pipeline. You can download the demo application example from here on GitHub. Helpfully, it comes with batch files (one for Linux, one for Windows), which are used to create the demo software application to support the Vivado design.


When we run this example on the Zynq SoC, we will find that it performs a number of tests prior to performing the first ADC sample capture.






Terminal Output from ZedBoard if the FMC is present




The samples will be stored at 0x0800_0000 within the DDR SDRAM. Using the debug facility within SDK, we can examine these values and see that they are updated when the sampling occurs.





DDR Memory location at 0x0800_0000 following power cycle






DDR Memory Location at 0x0800_0000 following the samples being captured




With this up and working, we can now think about how we can use the base platform efficiently to implement higher-level signal-processing algorithms.




Code is available on Github as always.






By Adam Taylor



So far on this journey (which is only just beginning) of looking at the Zynq UltraScale+ MPSoC, we have mostly explored the A53 processors within the Application Processing Unit (APU). However, we must not overlook the Real-Time Processing Unit (RPU), which contains two ARM Cortex-R5 32-bit RISC processors and operates within the Low Power Domain of the Zynq MPSoC’s PS (processing system).






R5 RPU Architecture



The RPU executes real-time processing applications, including safety-critical applications. As such, you can use it for applications that must comply with IEC61508 or ISO 26262. We will be looking at this capability in more detail in a future blog. To support this, the RPU can operate in two distinct modes:


  • Split or Performance – Both cores operate independently
  • Lock-Step – Both cores operate in lockstep


Of course, lock-step mode is one of the measures taken when a safety application is being implemented (see chapter 8 of the TRM for full safety and security capabilities). To provide deterministic processing times, each ARM Cortex-R5 core includes 128Kbytes of Tightly Coupled Memory (TCM) in addition to the caches and OCM (on-chip memory). How the TCMs are used depends upon the operating mode. In split mode, each processor has 128Kbytes of TCM (divided into A and B TCMs). In lock-step mode, there is one 256Kbyte TCM.





RPU in Lock Step Mode



At reset, the default setting configures the RPU to operate in lock-step mode. However, we can change between the operating modes while the processor group is in reset. We do this by updating the RPU Global Control Register SLCLAMP bit, which clamps the outputs of the redundant processors, and the SLSPLIT bit, which sets the operating mode. We cannot change the RPU’s operating mode during operation, so we need to decide upfront during the architectural phase which mode we desire for a given application.
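To make the bit manipulation concrete, here is a minimal model of how the two bits named above might be updated and decoded. The bit positions are assumptions on my part and should be checked against the device's register reference. The model operates on a plain integer rather than the real register, and any other fields (for example, TCM configuration) are passed through untouched and are not modeled.

```python
# Assumed bit positions within the RPU Global Control Register.
# Verify these against the register reference before using them on hardware.
SLSPLIT_MASK = 1 << 3   # assumed: 1 = split mode, 0 = lock-step
SLCLAMP_MASK = 1 << 4   # assumed: 1 = clamp outputs of the redundant core


def to_split_mode(reg):
    """Return the register value with split mode selected and the clamp released."""
    reg |= SLSPLIT_MASK
    reg &= ~SLCLAMP_MASK
    return reg & 0xFFFFFFFF


def to_lockstep_mode(reg):
    """Return the register value with lock-step selected and the clamp applied."""
    reg &= ~SLSPLIT_MASK
    reg |= SLCLAMP_MASK
    return reg & 0xFFFFFFFF


def decode_mode(reg):
    """Report the operating mode implied by a register value read back."""
    return "split" if reg & SLSPLIT_MASK else "lock-step"


split_value = to_split_mode(0x00000000)
lockstep_value = to_lockstep_mode(split_value)
print(f"split:     0x{split_value:08X} -> {decode_mode(split_value)}")
print(f"lock-step: 0x{lockstep_value:08X} -> {decode_mode(lockstep_value)}")
```

The same decode step is what a read-back check of the operating mode amounts to: read the register, mask the mode bit, and compare against the mode you intended.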


However, we do not have to worry about setting these bits ourselves when we use the debugger or generate a boot image; we can use those tools to configure the operating mode instead. In the rest of this blog, I want to look at how we configure the RPU operating mode in both our debug applications and boot-image generation.


The first way that we verify many of our designs is to use the System Debugger within SDK, which allows us to connect over JTAG or Ethernet and download our application. Using this method, we can of course use breakpoints and step through the code as it operates, to get to the bottom of any issues in the design. Within the debug configuration tab, we can also enable the RPU to operate in split mode if that’s the mode we want after system reset.





Debug Configuration to enable RPU Split Mode



When you download the code and run it on the Zynq MPSoC’s RPU, you will be able to see the operating mode within the debug window. This should match with your debug configuration setting.





Debug Window showing Lock-Step Mode



Once we are happy with the application, we will want to create a boot image and we will want to determine the RPU operating mode when we create that boot image. We can add the RPU elf to the FSBL, FPGA, and APU files using the boot-image dialog. To select the RPU mode, we choose the edit option and then select the destination CPU: either both ARM Cortex-R5 cores in lockstep or the ARM Cortex-R5 core we wish it to run on if we are using split mode.






Selecting the R5 Mode of operation when generating a boot image



Of course if we want to be sure we are in the correct mode in this operation, we need to read the RPU Global Control register and ensure the correct mode is selected as expected.


Now that we understand the different operating modes of the Zynq UltraScale+ MPSoC’s RPU, we can come back to these modes when we look at the security and safety capabilities provided by the Zynq MPSoC.



Code is available on Github as always.





This week at its annual NI Week conference in Austin, Texas, National Instruments (NI) announced a new FlexRIO PXIe module, the PXIe-7915, based on three Xilinx Kintex UltraScale FPGAs. NI’s PXIe FlexRIO modules serve two purposes in NI-based systems: flexible, programmable, high-speed I/O and high-speed computation (usually DSP). NI’s customers access these FlexRIO resources using the company’s LabVIEW FPGA software, part of NI’s LabVIEW graphical development environment. Thanks to the Kintex UltraScale FPGA, the new FlexRIO PXIe-7915 module contains significantly more programmable resources and delivers significantly more performance than previous FlexRIO modules, which are all based on earlier generations of Xilinx FPGAs. The set of graphs below shows the increased resources and performance delivered by the PXIe-7915 FlexRIO module in NI systems relative to previous-generation FlexRIO modules based on Xilinx Kintex-7 FPGAs:




FlexRIO UltraScale Graphs.jpg 



However, the UltraScale-based FlexRIO modules are not simply standalone products. They serve as design platforms for NI’s design engineers, who will use these platforms to develop many new, high-performance instruments. In fact, NI introduced the first two of these new instruments at NI Week 2017: the PXIe-5763 and PXIe-5764 high-speed, quad-channel, 16-bit digitizers. Here are the specs for these two new digitizers from NI:



NI FlexRIO Digitizers based on Kintex UltraScale FPGAs.jpg 



Previous digitizers in this product family employed parallel LVDS signals to couple high-speed ADCs to an FPGA. However, today’s fastest ADCs employ high-speed serial interfaces, particularly the JESD204B interface specification, necessitating a new design. The resulting new design uses the FlexRIO PXIe-7915 card as a motherboard and the JESD204B ADC card as a mezzanine board, as shown in this photo:



NI FlexRIO PXIe-5764 Digitizer.jpg 




NI’s design engineers took advantage of the pin compatibility among various Kintex UltraScale FPGAs to maximize the flexibility of their design. They can populate the FlexRIO PXIe-7915 card with a Kintex UltraScale KU035, KU040, or KU060 FPGA depending on customer requirements. This flexibility allows them to create multiple products using one board layout—a hallmark of a superior, modular platform design.


Normally, you access the programmable-logic features of a Xilinx FPGA or Zynq SoC inside of an NI product using LabVIEW FPGA, and that’s certainly still true. However, NI has added something extra in its LabVIEW 2017 release: a Xilinx Vivado Project Export feature that provides direct access to the Xilinx Vivado Design Suite tools for hardware engineers experienced with writing HDL code for programmable logic. Here’s how it works:



LabVIEW Vivado Export Design Flow.jpg 




You can export all the necessary hardware files from LabVIEW 2017 to a Vivado project that is pre-configured for your specific deployment target. Any LabVIEW signal-processing IP in the LabVIEW design is included in the export as encrypted IP cores. As an added bonus, you can use the new Xilinx Vivado Project Export on all of NI’s FlexRIO and high-speed-serial devices based on Xilinx Kintex-7 or newer FPGAs.



NI has published a White Paper describing all of this. You’ll find it here.


Please contact National Instruments directly for more information about the new FlexRIO modules and LabVIEW 2017.



Adam Taylor’s MicroZed Chronicles, Part 196: SDSoC and Levels of Abstraction

by Xilinx Employee ‎05-22-2017 09:40 AM - edited ‎05-22-2017 10:28 AM (10,172 Views)


By Adam Taylor



We have looked at SDSoC several times throughout this series. However, I recently organized and presented at the NMI FPGA Machine Vision event and during the coffee breaks and lunch, attendees showed considerable interest in SDSoC, not only for its use in the Xilinx reVISION acceleration stack but also for its use in a range of other developments. As such, I thought it would be worth some time looking at what SDSoC is and the benefits we have previously gained using it. I also want to discuss a new use case.





SDSoC Development Environment




SDSoC is an Eclipse-based, system-optimizing compiler that allows us to develop our Zynq SoC or Zynq UltraScale+ MPSoC design in its entirety using C or C++. We can then profile the application to find aspects that cause performance bottlenecks and move them into the Zynq device’s Programmable Logic (PL). SDSoC does this using HLS (High Level Synthesis) and a connectivity framework that’s transparent to the user. What this means is that we are able to develop at a higher level of abstraction and hence reduce the time to market of the product or demonstration.


To do this, SDSoC needs a hardware platform, which can be pre-defined or custom. Typically, these platforms provide only the basics within the PL: I/O interfaces and DMA transfers to and from the DDR SDRAM attached to the Zynq device’s PS (Processing System). This leaves most of the PL resources and PL/PS interconnects free for SDSoC to use when it accelerates functions.


This ability to develop at a higher level and accelerate performance by moving functions into the PL enables us to produce very flexible and responsive systems. This blog has previously looked at acceleration examples including AES encryption, matrix multiplication, and FIR Filters. The reduction in execution time has been significant in these cases. Here’s a table of these previously discussed examples:





Previous Acceleration Results with SDSoC. Blogs can be found here




To aid us in the optimization of the final application, we can use pragmas to control the HLS optimizations. We can use SDSoC’s tracing and profiling capabilities while optimizing these accelerated functions and the interaction between the PS and PL.


Here’s an example of a trace:





Results of tracing an example application

(Orange = Software, Green = Accelerated function and Blue = Transfer)



Let us take a look at a simple use case to demonstrate SDSoC’s abilities.


Frequency Modulated Continuous Wave (FMCW) RADAR is used for a number of applications that require the ability to detect objects and gauge their distance. FMCW applications make heavy use of FFT and other signal-processing techniques such as windowing, Constant False Alarm Rate (CFAR), and target velocity and range extraction. These algorithms and models are ideal for description using a high-level language such as C / C++. SDSoC can accelerate the execution of functions described this way and such an approach allows you to quickly demonstrate the application.
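To make the signal-processing chain a little more concrete, here is a minimal sketch of the range-extraction step in plain Python: window the samples, take a DFT, find the peak bin, and convert the beat frequency to range with the standard linear-chirp relation R = c·f_beat·T / (2·B). The chirp parameters (FS, N, B, T) and all function names are assumptions chosen for illustration and do not come from the demo described here. A real design would use an FFT rather than this naive DFT, and this is the kind of function that would be written in C/C++ and handed to SDSoC for acceleration.

```python
import cmath
import math

C = 299_792_458.0  # speed of light, m/s


def hann(n):
    """Hann window coefficients, used to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(samples):
    """Naive O(N^2) DFT; returns magnitudes for bins 0 .. N/2 - 1."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def beat_to_range(f_beat, bandwidth, chirp_time):
    """Range for a linear chirp: R = c * f_beat * T / (2 * B)."""
    return C * f_beat * chirp_time / (2.0 * bandwidth)


def estimate_range(samples, sample_rate, bandwidth, chirp_time):
    """Window, transform, pick the strongest non-DC bin, convert to metres."""
    n = len(samples)
    window = hann(n)
    mags = dft_magnitudes([s * w for s, w in zip(samples, window)])
    peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])  # skip DC
    f_beat = peak_bin * sample_rate / n
    return peak_bin, beat_to_range(f_beat, bandwidth, chirp_time)


# Hypothetical chirp parameters, chosen only for illustration.
FS = 1.0e6   # ADC sample rate, Hz
N = 256      # samples per chirp
B = 150.0e6  # swept bandwidth, Hz
T = N / FS   # chirp duration, s

# Synthesize the beat tone a single target would produce in bin 20.
tone = [math.cos(2 * math.pi * 20 * i / N) for i in range(N)]
peak_bin, target_range = estimate_range(tone, FS, B, T)
```

With these assumed parameters each DFT bin corresponds to c / (2·B), roughly one metre of range, so the synthetic target in bin 20 comes out at just under 20 m.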


It is possible to create a simple FMCW receive demo using a ZedBoard and an AD9467 FPGA Mezzanine Card (FMC). At the simplest level, the hardware element of the SDSoC platform needs to be able to transfer samples received from the ADC into the PS memory space and then transfer display data from the PS memory space to the display, which in most cases will be connected with DVI or HDMI interfaces.






Example SDSoC Platform for FMCW application



This platform permits development of the application within SDSoC at a higher level. It also provides a platform that we can use for several different applications, not just FMCW. Rather helpfully, the AD9467 FMC comes with a reference design that can serve as the hardware element of the SDSoC Platform. It also provides drivers, which can be used as part of the software element.


With a platform in hand, it is possible to write the application within SDSoC using C or C++, where we can make use of the acceleration libraries and stacks including matrix multiplication, math functions, and the ability to wrap bespoke HDL IP cores and use them within the development.


Developing in this manner provides a much faster development process and a more responsive solution, as it leverages the Zynq PL for inherently parallel or pipelined functions. It also makes it easier to upgrade designs. Because the majority of the development uses C or C++ and because SDSoC is a system-optimizing compiler, the application developer does not need to be an HDL specialist.




Code is available on Github as always.


If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.




  • First Year E Book here
  • First Year Hardback here.



MicroZed Chronicles hardcopy.jpg 




  • Second Year E Book here
  • Second Year Hardback here



MicroZed Chronicles Second Year.jpg 







By Adam Taylor


When I demonstrated how to boot the ZedBoard using the TFTP server, there was one aspect I did not demonstrate: configuring the Zynq SoC’s PL (programmable logic) over TFTP. It’s very simple to do. We can include the PL bin file along with the Kernel, RAM Disk, and Device Tree blob on the server and then allow U-Boot to configure the PL as it boots, just as we did for the other elements.


We can also configure the Zynq SoC’s PL at any time we want using either Linux or bare-metal applications. To do this we use the DevC (Device Configuration) interface and the PCAP (Processor Configuration Access Port) within the Zynq SoC’s PS (processing system). There are three methods through which we can configure the PL: JTAG, which is the most obvious; the PCAP under PS control; and the ICAP (Internal Configuration Access Port). It is through the DevC interface that the FSBL or U-Boot configures the PL when the device boots. The ICAP path is the least-used method and requires a configured PL prior to its use. One example where you might use the ICAP path would be to allow a MicroBlaze soft-core processor to reconfigure the PL.







When the device is running, we can replace the contents of the PL with an updated design using the same interface. All that we need to do is to have generated the new bit file and ensure that it is accessible to the program running on the ARM Cortex-A9 processors in the Zynq SoC’s PS so that they can download it via the DevC interface.


If we are using Linux, we can upload the file into the file system using FTP. We can then use the built-in DevC driver within the Linux Kernel to download the bit file.








From a command prompt, we can enter the command:



cat {filename} > /dev/xdevcfg



to download the bit file. When I did this for a simple ZedBoard design (shown below), which includes the ability to drive the LEDs connected to the PL, the “Done” LED lit. Of course, to ensure correct operation we need to have the device tree blob correctly configured to support the PL design.
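If an application rather than a shell needs to trigger the download, the same stream-the-file-into-the-device-node idea can be sketched in a few lines of Python. This is only an illustration of what the `cat` command above does: the function name, the chunk size, and the file names are assumptions, and the dry run at the end writes to a temporary file standing in for `/dev/xdevcfg` so that the sketch can be tried on a development PC.

```python
import os
import shutil
import tempfile


def load_bitstream(image_path, device_path="/dev/xdevcfg", chunk_size=64 * 1024):
    """Stream a PL image into the device node, like `cat file > device`.

    Returns the number of bytes read from the image.
    """
    with open(image_path, "rb") as src, open(device_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
        return src.tell()


# Dry run: a temporary file stands in for the device node (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "pl_image")
    fake_node = os.path.join(tmp, "xdevcfg")
    with open(image, "wb") as f:
        f.write(bytes(range(256)) * 16)
    written = load_bitstream(image, fake_node)
    with open(fake_node, "rb") as f:
        copied = f.read()
```

On the target, calling `load_bitstream()` with only the image path would write to the real device node, which normally requires root privileges.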








If we want to configure the Zynq SoC’s PL using bare-metal software, we can use a similar approach. The BSP comes with an example file that downloads a PL image using the DevC interface provided that we have the PL file loaded into the Zynq SoC’s attached DDR memory. We can access the example and include it within our design using the System.MSS file, which is provided when we generate a BSP.







To correctly use the example provided, we need to have a PL bit file loaded in the DDR Memory. For a production-ready system, we would have to store the PL configuration file within a non-volatile memory and then load it into the DDR at a known address before running the DevC example code. However, to demonstrate the concept, we can use the debugger to download the configuration file into the DDR at the desired memory location.


Within the application example, all we need to do is define the location of the configuration file and the size of the file:
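As a rough aid for choosing those two values, the sketch below converts an image size in bytes to 32-bit words and checks that the image fits inside the DDR window at the chosen load address. It assumes the example expects the size in 32-bit words, which should be confirmed against the example source in the BSP. The load address, image size, and DDR extent used here are hypothetical values for a 512Mbyte board, not figures from the example.

```python
def bitstream_words(size_bytes):
    """Convert an image size in bytes to 32-bit words, rejecting partial words."""
    if size_bytes <= 0 or size_bytes % 4:
        raise ValueError("image size must be a positive multiple of 4 bytes")
    return size_bytes // 4


def fits_in_ddr(load_addr, size_bytes, ddr_base, ddr_size):
    """True if [load_addr, load_addr + size_bytes) lies inside the DDR window."""
    return ddr_base <= load_addr and load_addr + size_bytes <= ddr_base + ddr_size


# Hypothetical values for illustration only.
LOAD_ADDR = 0x0040_0000          # where the debugger placed the image
IMAGE_BYTES = 4_045_564          # size of the PL image on disk
DDR_BASE = 0x0000_0000
DDR_SIZE = 512 * 1024 * 1024     # 512 Mbytes of DDR

size_words = bitstream_words(IMAGE_BYTES)
image_fits = fits_in_ddr(LOAD_ADDR, IMAGE_BYTES, DDR_BASE, DDR_SIZE)
```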







We have now seen how to reconfigure the PL in its entirety. We can use a similar approach to partially reconfigure regions within the PL, which we will look at in future blogs.




Code is available on Github as always.






The huge number of low-cost Arduino peripheral shields from dozens of vendors makes the Arduino form factor extremely attractive. Now you can take advantage of that shield library using the Zynq SoC with the inexpensive, €89 Trenz ArduZynq, which puts a single-core Xilinx Zynq Z-7007S SoC along with 512Mbytes of DDR3L SDRAM, 16Mbytes of SPI Flash memory, and a MicroSD card socket into an Arduino form factor.


Here’s a photo of the Trenz ArduZynq board:






€89 Trenz ArduZynq puts a single-core Xilinx Zynq Z-7007S SoC into the Arduino form factor




Here’s a pinout diagram of the ArduZynq:





Trenz ArduZynq Pinout





Hardware and software design for this board is supported by the Xilinx Vivado Design Suite HL WebPACK Edition, downloadable at no cost.



Note: For more information about the single-core Zynq SoC family, see “One-ARM Zynq family joins the Zynq parade. Now you can choose from 31 devices with one, two, four, or six ARM microprocessors.”



By Adam Taylor


One of the great things about many of the Zynq SoC’s PS (processing system) peripherals is that we can break them out via the PL (programmable logic) I/O. This capability provides us with great flexibility at the system level, as we can implement more peripherals than can be supported over the Zynq SoC’s MIO on its own. However, during the years of writing the MicroZed Chronicles, a few readers have asked me about mapping from the PS to the PL I/O using EMIO and about how to map when using PS GPIO. So in this post, I am going to address those questions and provide a nice simple reference for how to do it.


We can break out many of the PS peripherals into the PL using EMIO. The exceptions are the USB ports, the SMC (static memory controller), and the QSPI Flash controller. There may be some performance degradation when the EMIO is used. For example, the SD card controller I/O operates at 50MHz when routed to the MIO and at 25MHz when routed to the EMIO.





Peripheral and Routing to the EMIO



When we route signals to the EMIO, we will see the appropriate port appear at the top level of the Zynq IP block within Vivado. Taking the SPI controller as an example, we enable these signals by configuring the SPI to use the EMIO, which is done on the MIO configuration tab of the Zynq IP Configuration Wizard: we enable the SPI and select EMIO from the IO drop-down. This creates a SPI port at the top level of the design.





Selecting the EMIO for SPI




Resultant SPI port on the Zynq Block with port added





We can then use the standard XDC constraints file to route the I/O to any of the PL pins as we would for a normal element within the PL design.


Where it gets slightly more complicated is when we are using the GPIO and decide to extend that using EMIO. Suddenly, we need to understand GPIO banks and GPIO Numbers and IO pins.


The Zynq-7000 series provides 54 GPIO signals in two banks dedicated to the MIO (although if you use all 54 pins as GPIO, you cannot route any other peripherals through the MIO). These banks consist of a 32-bit bank 0 and a 22-bit bank 1. There are also two EMIO-only banks, both 32 bits wide. Within the EMIO, these banks provide 64 inputs, 64 outputs, and another 64 output enables that can be used as outputs, giving us a total of 192 I/O signals (64 inputs, 128 outputs).


These GPIO signals are numbered from 0 to 53 for the banks within the PS MIO and from 54 to 117 for the GPIO within the EMIO region. When we break these signals out through the EMIO, they appear as a port on the Zynq IP block. Note that bit 0 of that EMIO GPIO port is pin 54 as far as the ARM cores are concerned. These IO signals can then be routed to the PL IO as we would any other signal and tied to a specific IO pin and standard using the XDC file. The diagram below shows the relationship between the different elements:







It does get slightly confusing, however, when we use software to drive the GPIO signals within SDK. To drive the desired GPIO pin, we must use either the bank or the pin number. For the MIO GPIO signals, the pin numbers range from 0 to 53. For the EMIO signals, pin numbers range from 54 to 117. Once we understand this, and that we route the signals in the PL just like we do any other signal, we can quickly put the EMIO to use with the XGPIOPS library provided with the BSP.
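The numbering scheme is easy to capture in a small lookup. The sketch below is a plain-Python illustration of the mapping described above, not part of the BSP. The function names are made up for this example, and the EMIO banks are numbered 2 and 3 on the assumption that they simply follow MIO banks 0 and 1.

```python
# (bank, first pin, width) using the figures quoted in the text.
GPIO_BANKS = [
    (0, 0, 32),   # MIO
    (1, 32, 22),  # MIO
    (2, 54, 32),  # EMIO (bank number assumed)
    (3, 86, 32),  # EMIO (bank number assumed)
]
EMIO_FIRST_PIN = 54
EMIO_WIDTH = 64


def pin_to_bank(pin):
    """Return (bank, bit offset within the bank) for a software pin number."""
    for bank, first, width in GPIO_BANKS:
        if first <= pin < first + width:
            return bank, pin - first
    raise ValueError(f"pin {pin} is outside the range 0..117")


def emio_to_pin(emio_index):
    """Translate bit n of the EMIO GPIO port on the Zynq block to its pin number."""
    if not 0 <= emio_index < EMIO_WIDTH:
        raise ValueError(f"EMIO index {emio_index} is outside the range 0..63")
    return EMIO_FIRST_PIN + emio_index


first_emio_pin = emio_to_pin(0)                    # bit 0 of the EMIO GPIO port
first_emio_bank = pin_to_bank(first_emio_pin)
total_emio_signals = 3 * EMIO_WIDTH                # inputs + outputs + output enables
```

In software, the number returned by `emio_to_pin()` is the value that would be passed wherever the driver expects a pin number, while `pin_to_bank()` gives the pair needed for bank-wide accesses.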


Hopefully this makes the relationships among the Zynq software, the Zynq SoC’s PL, and the XDC file a little clearer for those still starting out.




Code is available on Github as always.







Face it, you use PCIe to go fast. That’s the whole point of the interface. So when you move data over your PCIe buses, you likely want to go as fast as possible. Perhaps you’d like some tips on getting maximum PCIe performance when designing with Xilinx’s most advanced FPGAs. You’re in luck: there’s a new 13-minute video that discusses that topic.


The video covers these contributors to PCIe performance:


  • Selecting the appropriate link speed and width
  • Maximum payload size
  • Largest possible transfer size
  • Enabling the maximum number of DMA channels
  • Polling versus interrupts (polling is faster)
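To get a feel for why the first two items matter, here is a back-of-the-envelope sketch in Python. It computes the raw per-direction bandwidth of a link from the per-lane transfer rate and line encoding of each PCIe generation, and then estimates how much of that bandwidth a given maximum payload size can use. The per-packet overhead of 24 bytes is an assumed round figure for headers, sequence number, and CRC, not a number from the video, and the model ignores flow control and other protocol traffic.

```python
# Per-lane transfer rate (GT/s) and line-encoding efficiency by PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
}


def link_bandwidth_gbytes(gen, lanes):
    """Raw link bandwidth per direction in Gbytes/s, after line encoding."""
    rate_gt, efficiency = PCIE_GENERATIONS[gen]
    return rate_gt * efficiency * lanes / 8.0


def payload_efficiency(max_payload_bytes, overhead_bytes=24):
    """Fraction of each packet that is payload; overhead_bytes is an assumption."""
    return max_payload_bytes / (max_payload_bytes + overhead_bytes)


gen3_x8 = link_bandwidth_gbytes(3, 8)
usable_128 = gen3_x8 * payload_efficiency(128)
usable_256 = gen3_x8 * payload_efficiency(256)
```

Under these assumptions, doubling the maximum payload size from 128 to 256 bytes raises the usable fraction of the link from about 84% to about 91%, which is why payload size appears on the list alongside link speed and width.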


The video explores a PCIe design for the KCU105 Kintex UltraScale FPGA Evaluation Kit using the Vivado Design Suite’s graphical IP Integrator (IPI) tool. The design took only about 20 to 30 minutes to create using IPI.


The video then discusses the results of various performance experiments using this design. Results like this:




PCIe results




Here’s the video:






Adam Taylor’s MicroZed Chronicles, Part 192: Pmod – What if there is no Driver?

by Xilinx Employee ‎05-08-2017 10:32 AM - edited ‎05-08-2017 10:33 AM (17,851 Views)


By Adam Taylor



We recently examined how we could use Pmods in our system. There are a lot of Pmods available from many vendors but drivers are not necessarily available for all of them. If there is no driver available, we can use the Pmod bridge in the Zynq SoC’s PL (programmable logic), which enables us to correctly map Pmod ports on our selected development board and to create our own Zynq PS (processing system) driver. If we were to explore one of the provided drivers, we would find these also use the Pmod bridge, coupled with an AXI IIC or SPI component.






Pmod AD2 PL Driver Components



In this example, I am going to be using Digilent’s DA4 octal DAC Pmod. I’ll integrate it with Digilent’s dual ADC AD2 Pmod, which we previously used in the driver example. We will develop our own driver with the Pmod bridge, generate an analog signal using the DA4 Pmod, and then receive the signal using the AD2 Pmod.






Pmod DA4 test set up



The Pmod bridge allows us to define the interface type for both the top and bottom rows of the Pmod connector. We can select GPIO, UART, IIC, or SPI. We make this selection for each of the Pmod rows in line with the Pmod we wish to drive. Selecting the desired type makes the pinout of the Pmod connector align with the standard for the interface type.


For the DA4, we need to use a SPI interface on the top row only. With this selected, we need to provide the actual SPI communication channel. As we are using the Zynq SoC, we have two options. The first would be to use an AXI SPI IP block within the PL, connected to the bridge. The second approach, and the one I am going to use, is to connect the bridge to the Zynq PS’ SPI using EMIO. This choice provides us with the ability to wire the pins from the PS SPI ports to the bridge inputs directly.


To do this we need to read the standard to ensure we can map from the SPI pins to the input pins on the bridge in the correct order (e.g. which PS SPI signal is connected to IN_0?). As these pins on the bridge represent different interface types, they are named generically. The diagram below shows how I did it for the DA4. Once we have mapped the pins for this example, we can build the project, export it to SDK, and then write the software to drive our DA4.






We can use the SPI drivers created by the BSP within SDK to drive the DA4. To interact with the DA4, the first thing we need to do is initialize the SPI controller. Once we have set the SPI Options for clock phase and master operation, we can then define a buffer and use the polled-transfer mode to transfer the required information to the DA4. A more complex driver would use an interrupt-driven approach as opposed to a polled one.
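The buffer itself is just a packed command word. The Python sketch below shows one way to build it, assuming the DAC on the DA4 takes a 32-bit frame with a 4-bit command at bits 27..24, a 4-bit channel address at bits 23..20, and a 12-bit code at bits 19..8, and that command 0x3 means "write and update". These field positions and the command code are assumptions that must be checked against the DAC data sheet, and the function names are invented for this example. On the target, the same bytes would be placed in the buffer handed to the polled transfer.

```python
CMD_WRITE_AND_UPDATE = 0x3  # assumed command code; verify against the data sheet


def dac_frame(channel, code, command=CMD_WRITE_AND_UPDATE):
    """Pack one 32-bit DAC frame, most significant byte first."""
    if not 0 <= channel < 8:
        raise ValueError("the octal DAC has channels 0..7")
    if not 0 <= code < 4096:
        raise ValueError("code must fit in 12 bits")
    word = (command << 24) | (channel << 20) | (code << 8)
    return word.to_bytes(4, "big")


def ramp_frames(channel, step):
    """Frames for a simple rising ramp on one channel."""
    return [dac_frame(channel, code) for code in range(0, 4096, step)]


full_scale = dac_frame(0, 0xFFF)
mid_scale_ch1 = dac_frame(1, 0x800)
ramp = ramp_frames(0, 256)
```

Keeping the packing in one small function makes it easy to correct the bit positions in a single place if the data sheet differs from the assumptions above.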






I have uploaded the file I created to drive the DA4 to the GitHub repository. To test it, I drove a simple ramp output and used the scope feature in the Digilent Analog Discovery module to monitor the DAC output. I received the following signal:






With this completed and the DA4 known to be working as expected, I connected the DA4 and the AD2 together so that the Zynq SoC could receive the signal:






When doing this, we need to be careful to ensure the signal output by the DA4 is within the AD2 Pmod’s operating range.


Having completed this and shown that the DA4 is working on the hardware, we now understand how we can create drivers if there is no driver available.



Code is available on Github as always.





About the Author
  • Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.