The amazing “snickerdoodle one”—a low-cost, single-board computer with wireless capability based on the Xilinx Zynq Z-7010 SoC—is once more available for purchase on the Crowd Supply crowdsourcing Web site. Shipments are already going out to existing backers and, if you missed out on the original crowdsourcing campaign, you can order one for the post-campaign price of $95. That’s still a huuuuge bargain in my book. (Note: There is a limited number of these boards available, so if you want one, now’s the time to order it.)
In addition, you can still get the “snickerdoodle black” with a faster Zynq Z-7020 SoC and more SDRAM that also includes an SDSoC software license, all for $195. Finally, snickerdoodle’s creator krtkl has added two mid-priced options: the snickerdoodle prime and snickerdoodle prime LE—also based on Zynq Z-7020 SoCs—for $145.
The krtkl snickerdoodle low-cost, single-board computer based on a Xilinx Zynq SoC
Ryan Cousins at krtkl sent me this table that helps explain the differences among the four snickerdoodle versions:
For more information about krtkl’s snickerdoodle SBC, see:
I just received an email from Dave Embedded Systems announcing that the company will be showing its new ONDA SOM (System on Module) based on Xilinx Zynq UltraScale+ MPSoCs at next week’s Embedded World 2017 in Nuremberg. Here’s a board photo:
Dave Embedded Systems ONDA SOM based on the Xilinx Zynq UltraScale+ MPSoC (Note: Facsimile Image)
And here’s a photo of the SMM’s back side showing the three 140-pin, high-density I/O connectors:
Dave Embedded Systems ONDA SOM based on the Xilinx Zynq UltraScale+ MPSoC (Back Side)
Thanks to the multiple processors and programmable logic in the Zynq UltraScale+ MPSoC, the ONDA board packs a lot of processing power into its small 90x55mm board. Dave Embedded Systems plans to offer versions of the ONDA SOM based on the Zynq UltraScale+ ZU2, ZU3, ZU4, and ZU5 MPSoCs, so there should be a wide range of price/performance points to pick from while standardizing on one uniformly sized platform.
Here’s a block diagram of the board:
Dave Embedded Systems ONDA SOM based on the Xilinx Zynq UltraScale+ MPSoC, Block Diagram
Please contact Dave Embedded Systems for more information about the ONDA SOM.
A LinkedIn blog published last month by Alfred P Neves of Wild River Technology describes a DesignCon 2017 tutorial titled “32 to 56Gbps Serial Link Analysis and Optimization Methods for Pathological Channels.” (You can get a copy of the paper here on the Wild River Web site. Registration required.) Co-authors of the turorial included Al Neves and Tim Wang Lee of Wild River Technology, Heidi Barnes and Mike Resso of Keysight, and Jack Carrel and Hong Ahn of Xilinx.
The tutorial discussed ways to test pathological channels at these nose-bleed serial speeds and those methods employed the bulletproof GTY SerDes on a Xilinx 16nm UltraScale+ FPGA for the 32Gbps transmitters and receivers as well as the Wild River ISI-32 loss platform and XTALK-32 crosstalk platform and Keysight test equipment.
Here’s a photo of the test setup showing the Xilinx UltraScale+ FPGA characterization board on the right, the Wild River test platforms on the left, and the Keysight test equipment in the background:
If you don’t want to scan the DesignCon tutorial presentation, you can also watch a free 1-hour recorded Webinar about the topic on the Keysight web site. Click here.
On Thursday, March 30, two member companies from the IIConsortium (Industrial Internet Consortium)—Cisco and Xilinx—are presenting a free, 1-hour Webinar titled “How the IIoT (Industrial Internet of Things) Makes Critical Data Available When & Where it is Needed.” The discussion will cover machine learning and how self-optimization plays a pivotal role in enhancing factory intelligence. Other IIoT topics covered in the Webinar include TSN (time-sensitive networking), real-time control, and high-performance node synchronization. The Webinar will be presented by Paul Didier, the Manufacturing Solution Architect for the IoT SW Group at Cisco Systems, and Dan Isaacs, Director of Connected Systems at Xilinx.
By Adam Taylor
Embedded vision is one of my many FPGA/SoC interests. Recently, I have been doing some significant development work with the Avnet Embedded Vision Kit (EVK) significantly (for more info on the EVK and its uses see Issues 114 to 126 of the MicroZed Chronicles). As part my development, I wanted to synchronize the EVK display output with an external source—also useful if we desire to synchronize multiple image streams.
Implementing this is straight forward provided we have the correct architecture. The main element we need is a buffer between the upstream camera/image sensor chain and the downstream output-timing and -processing chain. VDMA (Video Direct Memory Access) provides this buffer by allowing us to store frames from the upstream image-processing pipeline in DDR SDRAM and then reading out the frames into a downstream processing pipeline with different timing.
The architectural concept appears below:
VDMA buffering between upstream and downstream with external sync
For most downstream chains, we use a combination of the video timing controller (VTC) and AXI Stream to Video Out IP blocks, both provided in the Vivado IP library. These two IP blocks work together. The VTC provides output timing and generates signals such as VSync and HSync. The AXI Stream to Video Out IP Block synchronizes its incoming AXIS stream with the timing signals provided by the VTC to generate the output video signals. Once the AXI Stream to Video Out block has synchronized with these signals, it is said to be locked and it will generate output video and timing signals that we can use.
The VTC itself is capable of both detecting input video timing and generating output video timing. These can be synchronized if you desire. If no video input timing signals are available to the VTC, then the input frame sync pulse (FSYNC_IN) serves to synchronize the output timing.
Enabling Synchronization with FSYNC_IN or the Detector
If FSYNC_IN alone is used to synchronize the output, we need to use not only FSYNC_IN but also the VTC-provided frame sync out (FSYNC_OUT) and GEN_CLKEN to ensure correct synchronization. GEN_CLKEN is an input enable that allows the VTC generator output stage to be clocked.
The FSYNC_OUT pulse can be configured to occur at any point within the frame. For this application, is has been configured to be generated at the very end of the frame. This configuration can take place in the VTC re-configuration dialog within Vivado for a one-time approach or, if an AXI Lite interface is provided, it can be positioned using that during run time.
The algorithm used to synchronize the VTC to an external signal is:
Should GEN_CLK not be disabled, the VTC will continue to run freely and will generate the next frame sequence. Issuing another FSYNC_IP while this is occurring will not result in re-synchronisation but will result in the AXI Stream to Video Out IP block being unable to synchronize the AXIS video with the timing information and losing lock.
Therefore, to control the enabling of the GEN_CLKEN we need to create a simple RTL block that implements the algorithm above.
Vivado Project Demonstrating the concept
When simulated, this design resulted in the VTC synchronizing to the FSYNC_IN signal as intended. It also worked the same when I implemented it in my EVK kit, allowing me to synchronize the output to an external trigger.
Code is available on Github as always.
If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.
MRAM (magnetic RAM) maker Everspin wants to make it easy for you to connect its 256Mbit DDR3 ST-MRAM devices (and it’s soon-to-be-announced 1Gbit ST-MRAMs) to Xilinx UltraScale FPGAs, so it now provides a software script for the Vivado MIG (Memory Interface Generator) that adapts the MIG DDR3 controller to the ST-MRAM’s unique timing and control requirements. Everspin has been shipping MRAMs for more than a decade and, according to this EETimes.com article by Dylan McGrath, it’s still the only company to have shipped commercial MRAM devices.
Nonvolatile MRAM’s advantage is that it has no wearout failure, as opposed to Flash memory for example. This characteristic gives MRAM huge advantages over Flash memory in applications such as server-class enterprise storage. MRAM-based storage cards require no wear leveling and their read/write performance does not degrade over time, unlike Flash-based SSDs.
As a result, Everspin also announced its nvNITRO line of NVMe storage-accelerator cards. The initial cards, the 1Gbyte nvNITRO ES1GB and 2Gbyte nvNITRO ES2GB, deliver 1,500,000 IOPS with 6μsec end-to-end latency. When Everspin's 1Gbit ST-MRAM devices become available later this year, the card capacities will increase to 4 to 16Gbytes.
Here’s a photo of the card:
Everspin nvNITRO Storage Accelerator
If it looks familiar, perhaps you’re recalling the preview of this board from last year’s SC16 conference in Salt Lake City. (See “Everspin’s NVMe Storage Accelerator mixes MRAM, UltraScale FPGA, delivers 1.5M IOPS.”)
If you look at the photo closely, you’ll see that the hardware platform for this product is the Alpha Data ADM-PCIE-KU3 PCIe accelerator card, loaded 1 or 2Gbyte Everspin ST-MRAM DIMMs. Everspin has added its own IP to the Alpha Data card, based on a Kintex UltraScale KU060 FPGA, to create an MRAM-based NVMe controller.
As I wrote in last year’s post:
“There’s a key point to be made about a product like this. The folks at Alpha Data likely never envisioned an MRAM-based storage accelerator when they designed the ADM-PCIE-KU3 PCIe accelerator card but they implemented their design using an advanced Xilinx UltraScale FPGA knowing that they were infusing flexibility into the design. Everspin simply took advantage of this built-in flexibility in a way that produced a really interesting NVMe storage product.”
It’s still an interesting product, and now Everspin has formally announced it.
By Lei Guan, MTS Nokia Bell Labs (email@example.com)
Many wireless communications signal-processing stages, for example equalization and precoding, require linear convolution functions. Particularly, complex linear convolution will play a very important role in future-proofing massive MIMO system through frequency-dependent, spatial-multiplexing filter banks (SMFBs), which enable efficient utilization of wireless spectrum (see Figure 1). My team at Nokia Bell Labs has developed a compact, FPGA-based SMFB implementation.
Figure 1 - Simplified diagram of SMFB for Massive MIMO wireless communications
Architecturally, linear convolution shares the same structure used for discrete finite impulse response (FIR) filters, employing a combination of multiplications and additions. Direct implementation of linear convolution in FPGAs may not satisfy the user constraints regarding key DSP48 resources, even when using the compact semi-parallel implementation architecture described in “Xilinx FPGA Enables Scalable MIMO Precoding Core” in the Xilinx Xcell Journal, Issue 94.
From a signal-processing perspective, the discrete FIR filter describes the linear convolution function in the time domain. Because the linear convolution in the time domain is equivalent to multiplication in the frequency domain, an alternative algorithm—called “fast linear convolution” (FLC)—is good candidate for FPGA implementation. Unsurprisingly, such an implementation is a game of trade-offs between space and time, between silicon area and latency. In this article, we mercifully skip the math for the FLC operation (but you will find many more details in the book “FPGA-based Digital Convolution for Wireless Applications”). Instead, let’s take closer look at the multi-branch FLC FPGA core that our team created.
The design targets supplied by the system team included:
Figure 2 shows the top-level design of the resulting FLC core in the Vivado System Generator Environment. Figure 3 illustrates the simplified processing stages at the module level with four branches as an example.
Figure 2 - Top level of the FLC core in Xilinx Vivado System Generator
Figure 3 - Illustration of multi-branch FLC-core processing (using 4 branches as an example)
The multi-branch FLC-core contains the following five processing stages, isolated by registers for logic separation and timing improvement:
Figure 4 - Simple Dual-Port RAM based input data buffer and reproduce stage
Table 1 compares the performance of our FLC design and a semi-parallel solution. Our compact FLC core implemented with Xilinx UltraScale and UltraScale+ FPGAs creates a cost-effective, power-efficient, single-chip frequency dependent Massive MIMO spatial multiplexing solution for actual field trials. For more information, please contact the author.
Last month, the European AXIOM Project took delivery of its first board based on a Xilinx Zynq UltraScale+ ZU9EG MPSoC. (See “The AXIOM Board has arrived!”) The AXIOM project (Agile, eXtensible, fast I/O Module) aims at researching new software/hardware architectures for Cyber-Physical Systems (CPS).
AXIOM Project Board based on Xilinx Zynq UltraScale+ MPSoC
The board in fact presents the pinout of an Arduino Uno so you can attach an Arduino Uno-compatible shield to the board. The presence of the Arduino UNO pinout enables fast prototyping and exposes the FPGA I/O pins in a user-friendly manner.
Here are the board specs:
You can see the AXIOM board for the first time during next week’s Embedded World 2017 at the SECO UDOO Booth, at the SECO booth, and at the EVIDENCE booth.
Please contact the AXIOM Project for more information.
A simple press release last month from the UK’s U of Bristol announced a 5G Massive MIMO milestone jointly achieved by BT, the Universities of Bristol and Lund, and National Instruments (NI): serving 2Gbps to 24 users simultaneously using a 20MHz LTE channel. That’s just short of 100 bits/sec/Hz and improves upon today’s LTE system capacity by 10x. The system that achieved this latest LTE milestone is based on the same Massive MIMO SDR system based on NI USRP RIO dual-channel SDR radios that delivered 145.6 bps/Hz in 5G experiments last year. (See “Kapow! NI-based 5G Massive MIMO SDR proto system “chock full of FPGAs” sets bandwidth record: 145.6 bps/Hz in 20MHz channel.”)
According to the press release:
“Initial experiments took place in BT’s large exhibition hall and used 12 streams in a single 20MHz channel to show the real-time transmission and simultaneous reception of ten unique video streams, plus two other spatial channels demonstrating the full richness of spatial multiplexing supported by the system.
“The system was also shown to support the simultaneous transmission of 24 user streams operating with 64QAM on the same radio channel with all modems synchronising over-the-air. It is believed that this is the first time such an experiment has been conducted with truly un-tethered devices, from which the team were able to infer a spectrum efficiency of just less than 100bit/s/Hz and a sum rate capacity of circa two Gbits/s in this single 20MHz wide channel.”
The NI USRP SDRs are based on Xilinx Kintex-7 325T FPGAs. Again, quoting from the press release:
“The experimental system uses the same flexible SDR platform from NI that leading wireless researchers in industry and academia are using to define 5G. To achieve accurate, real-time performance, the researchers took full advantage of the system's FPGAs using LabVIEW Communications System Design and the recently announced NI MIMO Application Framework. As lead users, both the Universities of Bristol and Lund worked closely with NI to implement, test and debug this framework prior to its product release. It now provides the ideal foundations for the rapid development, optimization and evaluation of algorithms and techniques for massive MIMO.”
Here’s a BT video describing this latest milestone in detail:
A paper describing the superior performance of an FPGA-based, speech-recognition implementation over similar implementations on CPUs and GPUs won a Best Paper Award at FPGA 2017 held in Monterey, CA last month. The paper—titled “ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA” and written by authors from Stanford U, DeePhi Tech, Tsinghua U, and Nvidia—describes a speech-recognition algorithm using LSTM (Long Short-Term Memory) models with load-balance-aware pruning implemented on a Xilinx Kintex UltraScale+ KU060 FPGA. The implementations runs at 200MHz and draws 41W (for the FPGA board) slotted into a PCIe chassis. Compared to Core i7 CPU/Pascal Titan X GPU implementations of the same algorithm, the FPGA-based implementation delivers 43x/3x more raw performance and 40x/11.5x better energy efficiency, according to the FPGA 2017 paper. So the FPGA implementation is both faster and more energy-efficient. Pick any two.
Here’s a block diagram of the resulting LSTM speech-recognition design:
The paper describes the algorithm and implementation in detail, which probably contributed to this paper winning the conference’s Best Paper Award. This work was supported by the National Natural Science Foundation of China.
By Adam Taylor
Without a doubt, some of the most popular MicroZed Chronicles blogs I have written about the Zynq 7000 SoC explain how to use the Zynq SoC’s XADC. In this blog, we are going to look at how we can use the Zynq UltraScale+ MPSoC’s Sysmon, which replaces the XADC within the MPSoC.
The MPSoC contains not one but two Sysmon blocks. One is located within the MPSoC’s PS (processing system) and another within the MPSoC’s PL (programmable logic). The capabilities of the PL and PS Sysmon blocks are slightly different. While the processors in the MPSoC’s PS can access both Sysmon blocks through the MPSoC’s memory space, the different Sysmon blocks have different sampling rates and external interfacing abilities. (Note: the PL must be powered up before the PL Sysmon can be accessed by the MPSoC’s PS. As such, we should check the PL Sysmon control register to ensure that it is available before we perform any operations that use it.)
The PS Sysmon samples its inputs at 1Msamples/sec while the PL Sysmon has a reduced sampling rate of 200Ksamples/sec. However, the PS Sysmon does not have the ability to sample external signals. Instead, it monitors the Zynq MPSoC’s internal supply voltages and die temperature. The PL Sysmon can sample external signals and it is very similar to the Zynq SoC’s XADC, having both a dedicated VP/VN differential input pair and the ability to interface to as many as sixteen auxiliary differential inputs. It can also monitor on-chip voltage supplies and temperature.
Sysmon Architecture within the Zynq UltraScale+ MPSoC
Just as with the Zynq SoC’s XADC, we can set upper and lower alarm limits for ADC channels within both the PL and PS Sysmon in the Zynq UltraScale+ MPSoC. You can use these limits to generate an interrupt should the configured bound be exceed. We will look at exactly how we can do this in another blog once we understand the basics.
The two diagrams below show the differences between the PS and PL Sysmon blocks in the Zynq UltraScale+ MPSoC:
Zynq UltraScale+ MPSoC’s PS System Monitor (UG580)
Zynq UltraScale+ MPSoC’s PL Sysmon (UG580)
Interestingly, the Sysmone4 block in the MPSoC’s PL provides direct register access to the ADC data. This will be useful if using either the VP/VN or Aux VP/VN inputs to interface with sensors that do not require high sample rates. This arrangement permits downstream signal processing, filtering, and transfer functions to be implemented in logic.
Both MPSoC Sysmon blocks require 26 ADC clock cycles to perform a conversion. Therefore, if we are sampling at 200Ksamlpes/sec, using the PL Sysmon we require a 5.2MHz ADC clock. For the PS Sysmon to sample at 1Msamples/sec, we need to provide a 26MHz ADC clock.
We set the AMS modules’ clock within the MPSoC Clock Configuration dialog, as shown below:
Zynq UltraScale+ MPSoC’s AMS clock configuration
The eagle-eyed will notice that I have set the clock to 52MHz and not 26 MHz. This is because the PS Sysmon’s clock divisor has a minimum value of 2, so setting the clock to 52MHz results in the desired 26MHz clock. The minimum divisor is 8 for the PL Sysmon, although in this case it would need to be divided by 10 to get the desired 5.2MHz clock. You also need to pay careful attention to the actual frequency and not just the requested frequency to get the best performance. This will impact the sample rate as you may not always get the exact frequency you want—as is the case here.
Next time in the UltraZed Edition of the MicroZed Chronicles, we will look at the software required to communicate with both the PS and PL Symon in the Zynq UltraScale+ MPSoC.
Code is available on Github as always.
If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.
RFEL has supplied the UK’s Defence Science and Technology Laboratory (DSTL), an executive agency sponsored by the UK’s Ministry of Defence, with two of its Zynq-based HALO Rapid Prototype Development Systems (RPDS). DSTL will evaluate video processing algorithms using IP from RFEL and 3rd parties in real-time, interactive video trials for military users. The HAL RPDS dramatically speeds assessment of complex video-processing solutions and provides real-time prototypes while conventional software-based simulations do not provide real-time performance).
HALO Rapid Prototype Development Systems (RPDS)
HALO is a small, lightweight, real-time video-processing subsystem based on the Xilinx Zynq Z-7020 or Z-7030 SoCs. It’s also relatively low-cost. HALO is designed for fast integration of high-performance vision capabilities for extremely demanding video applications—and military video applications are some of the most demanding because things whiz by pretty quickly and mistakes are very costly. The Zynq SoC’s software and hardware programmability give the HALO RPDS the flexibility to adapt to a wide variety of video-processing applications while providing real-time response.
Here’s a block diagram of the HALO RPDS:
HALO Rapid Prototype Development Systems (RPDS) Block Diagram
As you can see from the light blue boxes at the top of this block diagram, there are already a variety of real-time, video-processing algorithms. RFEL itself offers many such cores for:
All of these video-processing functions operate in real-time because they are hardware implementations instantiated in the Zynq SoC’s PL (programmable logic). In addition, the Zynq SoC’s extensive collection of I/O peripherals and programmable I/O mean that the HALO RPDS can interface with a broad range of image and video sources and displays.(That's why we say that Zynq SoCs are All Programmable.)
DSTL procured two HALO RPDS systems to support very different video processing investigations, for diverse potential applications. One system is being used to evaluate RFEL's suite of High Definition (HD) video-stabilization IP products to create bespoke solutions. The second system is being used to evaluate 3rd-party algorithms and their performance. The flexibility and high performance of the Zynq-based HALO RPDS system means that it is now possible for DSTL to rapidly experiment with many different hardware-based algorithms. Of course, any successful candidate solutions are inherently supported on the HALO platform, so the small, lightweight HALO system provides both a prototyping platform and an implementation platform.
For previous coverage of an earlier version of RFEL’s HALO system, see “Linux + Zynq + Hardware Image Processing = Fused Driver Vision Enhancement (fDVE) for Tank Drivers.”
Today, Keysight Technologies announced its low-cost InfiniiVision 1000 X-Series oscilloscopes with 50MHz and 100MHz models that start at $449. (Apparently, this is Scope Month and Keysight is giving away 125 DSOs in March—click here for more info.) Even at that low price, these 2-channel 1000 X-Series DSOs are based on Keysight’s high-performance MegaZoom IV custom ASIC technology, which enables a high 50,000 waveforms/sec update rate.
Keysight 1000 X-Series DSO
These new scopes are intended for students and new users. In fact, the two 50MHz models have “EDU” as a model number prefix. Here’s a table listing the salient features of the four models in the 1000 X-Series family:
All four models also operate as a serial protocol analyzer, digital voltmeter, and frequency counter. The EDUX1002G and DSOX1102G models include a frequency response analyzer and function generator.
So why are you reading about this very nice, new Keysight instrument in the Xilinx Xcell Daily blog? Well, Keysight supplied one of these new DSOs to my good friend Dave Jones at eevblog.com and of course, he did a teardown; and of course, he found a Xilinx FPGA inside. That's why.
Here’s Dave Jones’ 30-minute teardown video of the new Keysight 1000 X-series DSO:
This DSO features a 2-board construction. The main board is mostly analog and a small daughter board contains the high-speed ADC, the Keysight MegaZoom IV ASIC, an STMicroelectronics SPEAR600 application processor, and a low-end Xilinx Spartan-3E XC3S500E FPGA. That’s a lot of processing power in a very inexpensive DSO family that starts at $450.
Here’s Dave’s closeup shot of that digital daughtercard showing the Xilinx FPGA positioned between the SPEAR600 processor and the Keysight MegaZoom IV ASIC (under the finned heat sink):
Now the Spartan-3E FPGA is not a particularly new device; it was announced more than a decade ago. The Spartan-3E FPGA family was designed from the start to be cost-effective, which is no doubt why Keysight used it in this DSO design. Its appearance in a just-introduced product is a testament to Xilinx’s ongoing commitment to device availability over the long term.
Please contact Keysight directly for more information about the InfiniiVision 1000 X-Series DSOs.
What could be better than a PCIe SBC (single-board computer) that combines a Xilinx Zynq SoC with an FMC connector? How about the world’s smallest FMC carrier card that also happens to be based on any one of three Xilinx Zynq SoCs (your choice of a single-core Zynq Z7012S SoC or a dual-core Z7015 or Z-7030)? That’s the description of the Berten DSP GigaExpress SBC.
The Berten DSP GigaExpress SBC, the world’s smallest FMC carrier card
The GigaExpress SBC incorporates 1Gbyte of DDR3L-1066 SDRAM for the Zynq SoC’s single- or dual-core ARM Cortex-A9 PS (Processing System) and there’s 512Mbytes of dedicated DDR3L SDRAM clocked at 333MHz for exclusive use by the Zynq PL (Programmable Logic). The PS software and PL configuration are stored in a 512Mbit QSPI Flash memory. A 1000BASE-T Ethernet interface is available on a rugged Cat5e RJ45 connector. The block diagram of the GigaExpress SBC appears below. From this diagram and the above photo, you can see that the Zynq SoC along with the various memory devices is all the digital silicon you need to implement a complete, high-performance system.
Berten DSP GigaExpress SBC Block Diagram
Please contact Berten DSP directly for more information about the GigaExpress SBC.
Today, Aldec announced its latest FPGA-based HES prototyping board—the HES-US-440—with a whopping 26M ASIC gate capacity. This board is based on the Xilinx Virtex UltraScale VU440 FPGA and it also incorporates a Xilinx Zynq Z-7100 SoC that acts as the board’s peripheral controller and host interface. The announcement includes a new release of Aldec’s HES-DVM Hardware/Software Validation Platform that enables simulation acceleration and emulation use modes for the HES-US-440 board in addition to the physical prototyping capabilities. You can also use this prototyping board directly to implement HPC (high-performance computing) applications.
Aldec HES-US-440 Prototyping Board, based on a Xilinx Virtex UltraScale VU440 FPGA
The Aldec HES-US-440 board packs a wide selection of external interfaces to ease your prototyping work including four FMC HPC connections, PCIe, USB 3.0 and USB 2.0 OTG, UART/USB bridge, QSFP+, 1Gbps Ethernet, HDMI, SATA; has on-board NAND and SPI Flash memories; and incorporates two microSD slots.
Here’s a block diagram of the HES-US-440 prototyping board:
Aldec HES-US-440 Prototyping Board Block Diagram
For more information about the Aldec HES-US-440 prototyping board and Aldec’s HES-DVM Hardware/Software Validation Platform, please contact Aldec directly.
Last September at the GNU Radio Conference (GRCon16) in Boulder, CO, Ettus Research announced its RFNoC & Vivado HLS Challenge with a $10,000 grand prize for developing “innovative and useful open-source RF Network on Chip (RFNoC) blocks that highlight the productivity and development advantage of Xilinx Vivado High-Level Synthesis (HLS) for FPGA programming using C, C++, or System C. The new RFNoC blocks generated during the challenge will add to the rapidly growing library of available open-source blocks for programming FPGAs in SDR development and production.”
Based on formal proposals, the company has now accepted seven teams for the challenge:
The final challenge competition will take place in May or June 2017 (venue to be announced) and the teams are required to submit technical papers for publication in the GRCon17 technical proceedings outlining their design’s contribution, implementation, results, and lessons learned. (GRCon17 takes place on September 11-15, 2017 in San Diego, CA.)
For more information about the challenge, see “Matt Ettus of Ettus Research wants you to win $10K. All you have to do is meet his RFNoC & Vivado HLS challenge for SDR.”
Here are four online training classes in March that cover various technical design aspects of Xilinx UltraScale and UltraScale+ FPGAs and the Zynq UltraScale+ MPSoC:
03/09/2017 Zynq UltraScale+ MPSoC for the Software Developer
03/15/2017 Serial Transceivers in UltraScale Series FPGAs/MPSoCs – Part I – Transceiver Design Methodology
03/22/2017 Serial Transceivers in UltraScale Series FPGAs/MPSoCs – Part II – Debugging Techniques and PCB Design
03/23/2017 Zynq UltraScale+ MPSoC for the System Architect
These four classes will be taught by three Xilinx Authorized Training Providers: Faster Technology, Xprosys, and Hardent. Click here for registration details.
Berten DSP’s GigaX API for the Xilinx Zynq SoC creates a high-speed, 200Mbps full-duplex communications channel between a GbE port and the Zynq SoC’s PS (programmable logic) through an attached SDRAM buffer and an AXI DMA controller IP block. Here’s a diagram to clear up what’s happening:
The software API implements IP filtering and manages TCP/UDP headers, which help you implement a variety of hardware-accelerated Ethernet systems including Ethernet bridges, programmable network nodes, and network offload appliances. Here’s a performance curve illustrating the kind of throughput you can expect:
Please contact Berten DSP directly for more information about the GigaX API.
Today, Cadence announced the Protium S1 FPGA-Based Prototyping Platform, which delivers as many as 200M ASIC gates worth of prototyping capacity per chassis for hardware/software integration, software development, system validation, and hardware regression using one to eight Xilinx Virtex UltraScale VU440 FPGAs as a foundation. That’s double the previous version of the Protium FPGA-based Prototyping Platform which had a maximum gate capacity of 100M ASIC gates. The Protium S1 combines these Virtex UltraScale FPGA boards with a complete implementation and debug software suite, permitting ultra-fast design bring-up. The new Protium S1 platform is compatible with Cadence’s Palladium platforms and SpeedBridge adapters, paving the way for a smooth transition of SoC designs from an existing emulation environment into a high-performance FPGA-based prototype.
Here’s a 4-minute Protium S1 introductory video from Cadence:
Xcell Daily has covered the Samtec FireFly mid-board interconnect system several times but now there’s a new 3.5-minute video demo of a PCIe-specific version of the FireFly optical module. In the video demo, FireFly optical PCIe modules convey PCIe signals between a host PC and a video card over 100m of optical fiber in real time. The video passed over this link works smoothly. That’s quite a feat for a small module like the FireFly and it creates new possibilities for designing distributed systems.
The PCIe-specific version of the Samtec FireFly module handles PCIe sidebands and other PCIe-specific protocols. These modules match up well with the PCIe controllers found in Xilinx UltraScale and UltraScale+ devices and many 7 series FPGAs and Zynq SoCs. As Kevin Burt of Samtec’s Optical Group explains, the mid-board design of the FireFly system allows you to locate the modules adjacent to the driving chips (FPGAs in this case), which improves signal integrity of the pcb design.
Here’s the Samtec video:
For additional coverage of the Samtec FireFly system, see:
By Adam Taylor
Having looked at how we can quickly and easily get the Zynq UltraScale+ MPSoC up and running, I now want to look at the architecture of the system in a little more detail. I am going to start with examining the processor’s global address map. I am not going to look in detail into the contents of the address map. Initially, I want to explore how it is organized so that we understand it. I want to explain how the 32-bit ARM Cortex-R5 processors in the Zynq UlraScale+ MPSoC’s RPU (Real-time Processing Unit) and the 64-bit ARM Cortex-A53 processors in the APU (Application Processing Unit) share their address spaces.
The ARM Cortex-A53 processors use a 40-bit address bus, which can address up to 1Tbyte of memory. Compare this to the 32-bit address bus of the ARM Cortex-R5 processors, which can only address a 4Gbyte address space. The Zynq UltraScale+ MPSoC architects therefore had to consider how these address spaces would work together. The solution they came up with is pretty straightforward.
The memory map of the The Zynq UltraScale+ MPSoC is organised to so that the PMU (Platform Management Unit), MIO peripherals, DDR controller, the PL (programmable logic), etc. all fall within the first 4Gbyte of addressable space so that the APU and the RPU can both address these resources. The APU has further access to the DDR and PCIe controllers and the PL up to the remaining 1Tbyte address limit. The lower 4Gbytes of address space supports 32-bit addressing for some peripherals. One example of this is the PCIe controller, which supports 32-bit addressing via a 256Mbyte address range in the lower 4Gbytes and up to 256Gbytes (using 64-bit addressing) in the full address map.
MPSoC Global Address Map
It goes without saying that the only the APU can access the address space above 4 GB. However, the more observant amongst us will have noticed that there is also what appears to be a 36-bit addressable mode as well. Using a 36-bit address, provides for faster address translation, because the table walker uses only three stages instead of four for a 40-bit address. Therefore, 36 bit addressing should be used if possible to optimize system performance.
Address translation is the role of the System Memory Management Unit (SMMU), which has been designed to transform addresses from a virtual address space to a physical address space when using a virtualized environment. The SMMU can provide the following translations if desired:
Virtual Address (VA) - > Intermediate Physical Address (IPA) -> Physical Address (PA)
Within the SMMU, these are defined as being stage one VA to IPA or stage two IPA to PA and depending upon use case we can perform only a stage one, stage two or a stage one and two translation. To understand more about the SMMU—which is a complex subject—I would recommend reading chapters 3 and 10 of the Zynq UltraScale+ MPSoC TRM (UG1085) and the ARM SMMU architecture specification.
SMMU translation schemes
Now that we understand a little more about the Zynq UltraScale+ MPSoC’s global memory map, we will look at exactly what is contained within this memory map and how we can configure and use this map with both the APU and the RPU cores over the next few blogs.
Code is available on Github as always.
If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.
The six new PXI digitizers and AWGs (arbitrary waveform generators) in the Keysight M3xxx instrument series all take advantage of the real-time processing power and immense programmability of Xilinx Kintex-7 FPGAs. All six instruments are offered with either a Kintex-7 325T or 410T FPGA. Keysight provides programming libraries for C, C++, Visual Studio, LabVIEW, MATLAB, Python, other programming languages, and the Keysight M3602A Graphical FPGA Development Environment. Thanks to the built-in FPGA programmability, you can customize these instruments’ high- and low-level design elements using off-the-shelf DSP blocks, MATLAB/Simulink, the Xilinx CORE Generator and IP cores, and the Xilinx Vivado Design Suite with either VHDL or Verilog code. Clearly, Keysight has made FPGA programmability (and dynamic reprogrammability) integral to the feature sets of these instruments.
The six new Keysight PXI instruments are:
M3100A 100MSamples/sec, 4 or 8-channel FPGA digitizer
M3102A 500Msamples/sec, 2 or 4-channel FPGA digitizer
M3201A 500MSamples/sec FPGA arbitrary waveform generator
M3202A 1GSamples/sec FPGA arbitrary waveform generator
M3300A 500MSamples/sec, 2-channel FPGA AWG/digitizer combo
M3302A 500MSamples/sec, 4-channel FPGA AWG/digitizer combo
Keysight M3302A PXI AWG and Digitizer
According to a just-published article by Martin Rowe on EDN.com:
“The FPGA-based instruments in the table come from Signadyne, acquired by Keysight in 2016. The addition of the FPGA gives digitizers the ability to perform data processing on board, relieving the system controller from that resource-intensive task… Adding an FPGA to a waveform generator lets you program waveforms with complex modulation for emulating wireless signals such as [for] multiple input multiple output (MIMO) [antennas].”
This family of Keysight M3xxx instruments clearly demonstrates the ability to create an FPGA-based hardware platform that enables rapid development of many end products from one master set of hardware designs. In this case, the same data-acquisition and AWG block diagrams recur on the data sheets of these instruments, so you know there’s a common set of designs.
Xilinx FPGAs are inherently well-suited to this type of platform-based product design because of the All-Programmable (I/O, hardware, and software) nature of the devices. I/O programmability permits any-to-any connectivity—as is common with, for example, camera designs when you’re concerned about adapting to a range of sensors or different ADCs and DACs for digitizers and AWGs. Hardware programmability allows you to rapidly modify real-time signal-processing or motor-control algorithms—as is common with diverse designs including high-speed instrument designs and industrial controllers. Software programmability is of course pervasive and is common to every embedded design. Remember, all Xilinx devices give you all three; conventional SoCs, application processors, and microcontrollers do not.
With a month left in the Indiegogo funding period, the MATRIX Voice open-source voice platform campaign stands at 289% of its modest $5000 funding goal. MATRIX Voice is the third crowdfunding project by MATRIX Labs, based on Miami, Florida. The MATRIX Voice platform is a 3.14-inch circular circuit board capable of continuous voice recognition and compatible with the latest voice-based, cognitive cloud-based services including Microsoft Cognitive Service, Amazon Alexa Voice Service, Google Speech API, Wit.ai, and Houndify. The MATRIX Voice board, based on a Xilinx Spartan-6 LX4 FPGA, is designed to plug directly onto a low-cost Raspberry Pi single-board computer or it can be operated as a standalone board. You can get one of these boards, due to be shipped in May, for as little as $45—if you’re quick. (Already, 61 of the 230 early-bird special-price boards are pledged.)
Here’s a photo of the MATRIX Voice board:
This image of the top of the MATRIX Voice board shows the locations for the seven rear-mounted MEMS microphones, seven RGB LEDs, and the Spartan-6 FPGA. The bottom of the board includes a 64Mbit SDRAM and a connector for the Raspberry Pi board.
Because this is the latest in a series of developer boards from MATRIX Labs (see last year’s project: “$99 FPGA-based Vision and Sensor Hub Dev Board for Raspberry Pi on Indiegogo—but only for the next two days!”), there’s already a sophisticated, layered software stack for the MATRIX Voice platform that include a HAL (Hardware Abstraction Layer) with the FPGA code and C++ library, an intermediate layer with a streaming interface for the sensors and vision libraries (for the Raspberry Pi camera), and a top layer with the MATRIX OS and high-level APIs. Here’s a diagram of the software stack:
And now, who better to describe this project than the originators:
National Instruments (NI) has just added two members to its growing family of USRP RIO SDRs (software-defined radios)—the USRP-2944 and USRP-2945—with the widest frequency ranges, highest bandwidth, and best RF performance in the family. The USRP-2945 features a two-stage superheterodyne architecture that achieves superior selectivity and sensitivity required for applications such as spectrum analysis and monitoring, and signals intelligence. With four receiver channels, and the capability to share local oscillators, this SDR also sets new industry price/performance benchmarks for direction-finding applications. The USRP-2944 is a 2x2 MIMO-capable SDR that features 160MHz of bandwidth per channel and a frequency range of 10 MHz to 6 GHz. This SDR operates in bands well suited to LTE and WiFi research and exploration.
NI USRP RIO Platform
Like all of its USRP RIO products, the NI USRP-2944 and USRP-2945 incorporate Xilinx Kintex-7 FPGAs for local, real-time signal processing. The Kintex-7 FPGA implements a reconfigurable LabVIEW FPGA target that incorporates DSP48 coprocessing for high-rate, low-latency applications. With the company’s LabVIEW unified design flow, researchers can create prototype designs faster and significantly shorten the time needed to achieve results.
Here’s a block diagram showing the NI USRP RIO SDR architecture:
USRP RIO Block Diagram
Adam Taylor just published an EETimes review of the Xilinx RFSoC, announced earlier this week. (See “Game-Changing RFSoCs from Xilinx”.) Taylor has a lot of experience with high-speed analog converters: he’s designed systems based on them—so his perspective is that of a system designer who has used these types of devices and knows where the potholes are—and he’s worked for a semiconductor company that made them—so he should know what to look for with a deep, device-level perspective.
Here’s the capsulized summary of his comments in EETimes:
“The ADCs are sampled at 4 Gsps (gigasamples per second), while the DACs are sampled at 6.4 Gsps, all of which provides the ability to work across a very wide frequency range. The main benefit of this, of course, is a much simpler RF front end, which reduces not only PCB footprint and the BOM cost but -- more crucially -- the development time taken to implement a new system.”
“…these devices offer many advantages beyond the simpler RF front end and reduced system power that comes from such a tightly-coupled solution.”
“These devices also bring with them a simpler clocking scheme, both at the device-level and the system-level, ensuring clock distribution while maintaining low phase noise / jitter between the reference clock and the ADCs and DACs, which can be a significant challenge.”
“These RFSoCs will also simplify the PCB layout and stack, removing the need for careful segregation of high-speed digital signals from the very sensitive RF front-end.”
“I, for one, am very excited to learn more about RFSoCs and I cannot wait to get my hands on one.”
For more information about the new Xilinx RFSoC, see “Xilinx announces RFSoC with 4Gsamples/sec ADCs and 6.4Gsamples/sec DACs for 5G, other apps. When we say ‘All Programmable,’ we mean it!” and “The New All Programmable RFSoC—and now the video.”
Adam Taylor has published nearly 200 blogs in Xcell Daily but he’s reserved some of his best advice for embedded.com. Yesterday, he published a short article titled: “How to prevent FPGA-based projects from going astray.” In this article, Taylor describes five common issues that lead design teams astray:
Learn from the best. Spend five minutes and read Adam’s new article.
If you’re still uncertain as to what System View’s Visual System Integrator hardware/software co-development tool for Xilinx FPGAs and Zynq SoCs does, the following 3-minute video should make it crystal clear. Visual System Integrator extends the Xilinx Vivado Design Suite and makes it a system-design tool for a wide variety of embedded systems based on Xilinx devices.
This short video demonstrates System View’s tool being used for a Zynq-controlled robotic arm:
For more information about System View’s Visual System Integrator hardware/software co-development tool, see:
Avnet’s new $499 UltraZed PCIe I/O carrier card for its UltraZed-EG SoM (System on Module)—based on the Xilinx Zynq UltraScale+ MPSoC—gives you easy access to the SoM’s 180 user I/O pins, 26 MIO pins from the Zynq MPSoC’s MIO, and 4 GTR transceivers from the Zynq MPSoC’s PS (Processor System) through the PCIe x1 edge connector; two Digilent PMOD connectors; an FMC LPC connector; USB and microUSB, SATA, DisplayPort, and RJ45 connectors; an LVDS touch-panel interface; a SYSMON header; pushbutton switches; and LEDs.
$499 UltraZed PCIe I/O Carrier Card for the UltraZed-EG SoM
That’s a lot of connectivity to track in your head, so here’s a block diagram of the UltraZed PCIe I/O carrier card:
UltraZed PCIe I/O Carrier Card Block Diagram
For information on the Avnet UltraZed SOM, see “Look! Up in the sky! Is it a board? Is it a kit? It’s… UltraZed! The Zynq UltraScale+ MPSoC Starter Kit from Avnet” and “Avnet UltraZed-EG SOM based on 16nm Zynq UltraScale+ MPSoC: $599.” Also, see Adam Taylor’s MicroZed Chronicles about the UltraZed:
Yesterday, Xilinx announced breakthrough RF converter technology that allows the creation of an RFSoC with multi-Gsamples/sec DACs and ADCs on the same piece of TSMC 16nm FinFET silicon as the digital programmable-logic circuitry, the microprocessors, and the digital I/O. This capability transforms the Zynq UltraScale+ MPSoC into an RFSoC that's ideal for implementing 5G and other advanced RF system designs. (See “Xilinx announces RFSoC with 4Gsamples/sec ADCs and 6.4Gsamples/sec DACs for 5G, other apps. When we say ‘All Programmable,’ we mean it!” for more information about that announcement.)
Today there’s a 4-minute video with Sr. Staff Technical Marketing Engineer Anthony Collins providing more details including an actual look at the performance of a 16nm test chip with the 12-bit, 4Gsamples/sec ADC and the 14-bit, 6.4Gsamples/sec DAC in operation.
Here’s the video:
To learn more about the All Programmable RFSoC architecture, click here or contact your friendly, neighborhood Xilinx sales representative.
By Adam Taylor
A few weeks ago we looked at how we can generate PWM signals using the Zynq SoC’s TTC (Triple Time Counter). PWM is very useful for interfacing with motor drives and for communications. What we did not look at however was how we can measure the PWM signals received by the Zynq SoC.
The Zynq SoC’s TTC (Triple Timer Counter)
We can do this using the TTC’s event counters These 16-bit counters are clocked by CPU_1x and are capable of measuring the time an input signal spends high or low. This input signal can come from either MIO or EMIO pins for the first TTC in both TTC 0 and 1 or from EMIO pins in the reaming two timers in each of the TTCs.
The event timer is very simple to use, once you enable and configure it to measure either the high or low duration of the pulse. The time updates the event count regsiter once the high or low level it has been configured to measure completes.
With a 133MHz CPU_1x clock, this 16-bit register can measure events as long as 492 microseconds before it overflows.
If the event counter does overflow and the event timer is not configured to handle this situation, the event timer will disable itself. If we have enabled overflow, the counter will roll over and continue counting while generating a event-roll-over interrupt. We can use this capability to count longer events by counting the number of times the event rolls over before arriving at the final value.
While using one event timer allows us to measure the time a signal is high or low, we can use two event timers to measure both the high and low times for the same signal: one configured to measured the high time and another to measure the low time.
To use the TTC to monitor an event, we need to ensure the TTC is enabled on the MIO configuration tab of the Zynq customization dialog:
To measure an external signal, we need to configure the TTC to use an external signal. We do this on the Clock Congifuration tab of the Zynq customization dialog:
Enabling this external source on the Zynq processing system block diagram provides input ports that we can connect to the external signal we wish to monitor. In this case I have connected both event timer inputs to the same external signal to monitor the signal’s high and low durations.
When I implemented the design targeting an Avnet ZedBoard, I broke the wave outputs and clock inputs out to the board’s PMOD connector A.
To get the software up and running I used the Servo example that we generated earlier as a base. To use the event timers, we need to set the enable bit the Event Control Timer register. Within this register, we can enable the event timer, set the appropriate signal level, and enable overflow if desired.
The TTC files provided with the BSP do not provide functions to configure or use the event timers within the TTC. However, interacting with them is straightforward. We can use the Xil_Out16 and Xil_In16 functions to configure the event timer and to read the timer value.
To enable the TTC0 timers zero and one to count opposite events, we can use the commands shown below:
Once enabled, we can then read the TTC event timers. In the case of this example, we use the code snippet below:
event = Xil_In16(0xF8001078);
event = Xil_In16(0xF800107C);
These commands read the event timer value.
When I put this all together and stimulated the external input using a 5KHz signal with a range of duty cycles, I could correctly determine the signal’s high and low times.
For example, with a 70 % duty cycle the event timer recorded a time of 15556 for the high duration and time of 6667 for a low duration of the pulse. There are 22222 CPU_1x clock cycles in a 5KHz signal. The measurement captured in the event registers total 22224 CPU_1x clock cycles or a frequency of 4999.6 Hz with the correct duty cycles for the signal received.
To ensure the most accurate conversion of clock counts into actual time measurements, we can use the #define XPAR_PS7_CORTEXA9_0_CPU_CLK_FREQ_HZ 666666687 definition provided within xparameters.h. This is either 4 or 6 times the frequency of CPU_1x.
These event timers can prove very useful in our systems, especially if we are interfacing with sensors that provide PWM outputs such as some temperature and pressure sensors.
Code is available on Github as always.