National Instruments’ (NI’s) VirtualBench All-in-One Instrument, based on the Xilinx Zynq Z-7020 SoC, combines a mixed-signal oscilloscope with protocol analysis, an arbitrary waveform generator, a digital multimeter, a programmable DC power supply, and digital I/O. The PC- or tablet-based user-interface software allows you to make all of those instruments play together as a troubleshooting symphony. That point is made very apparent in this new 3-minute video demonstrating the speed at which you can troubleshoot circuits using all of the VirtualBench’s capabilities in concert:
For more Xcell Daily blog posts about the NI VirtualBench All-in-One instrument, see:
For more information about the VirtualBench, please contact NI directly.
I’ve known this was coming for more than a week, but last night I got double what I expected. Digilent’s Web site has been teasing the new Arty Z7 Zynq SoC dev board for makers and hobbyists for a week—but with no listed price. Last night, prices appeared. That’s right, there are two versions of the board available:
Digilent Arty Z7 dev board for makers and hobbyists
Other than that, the board specs appear identical.
The first thing you’ll note from the photo is that there’s a Zynq SoC in the middle of the board. You’ll also see the board’s USB, Ethernet, Pmod, and HDMI ports. On the left, you can see double rows of tenth-inch headers in an Arduino/chipKIT shield configuration. There are a lot of ways to connect to this board, which should make it a student’s or experimenter’s dream board considering what you can do with a Zynq SoC. (In case you don’t know, there’s a dual-core ARM Cortex-A9 MPCore processor on the chip along with a hearty serving of FPGA fabric.)
Oh yeah. The Xilinx Vivado HL Design Suite WebPACK tools? Those are available at no cost. (So is Digilent’s attractive cardboard packaging, according to the Arty Z7 Web page.)
Although the Arty Z7 board has now appeared on Digilent’s Web site, the product’s Web page says the expected release date is March 27. That’s five whole days away!
As they say, operators are standing by.
Please contact Digilent directly for more Arty Z7 details.
The organizers of last week’s Embedded World show in Nuremberg gave out embedded AWARDS in three categories during the show, and MathWorks’ HDL Coder won in the tools category. (See the announcement here.) If you don’t know about this unique development tool, now is a good time to become acquainted with it. HDL Coder accepts model-based designs created using MathWorks’ MATLAB and Simulink and can generate VHDL or Verilog for all-hardware designs, or hardware and software code for designs based on a mix of custom hardware and embedded software running on a processor. That means that HDL Coder works well with Xilinx FPGAs and Zynq SoCs.
Here’s a diagram of what HDL Coder does:
You might also want to watch this detailed MathWorks video titled “Accelerate Design Space Exploration Using HDL Coder Optimizations.” (Email registration required.)
For more information about using MathWorks HDL Coder to target your designs for Xilinx devices, see:
You want to learn how to design with and use RF, right? Students from all levels and backgrounds looking to improve their RF knowledge will want to take a look at the new ADALM-PLUTO SDR USB Learning Module from Analog Devices. The $149 USB module has an RF range of 325MHz to 3.8GHz with separate transmit and receive channels and 20MHz of instantaneous bandwidth. It pairs two devices that seem made for each other: an Analog Devices AD9363 Agile RF Transceiver and a Xilinx Zynq Z-7010 SoC.
Analog Devices’ $149 ADALM-PLUTO SDR USB Learning Module
Here’s an extremely simplified block diagram of the module:
Analog Devices’ ADALM-PLUTO SDR USB Learning Module Block Diagram
However, the learning module’s hardware is of little use without training material, and Analog Devices has already created dozens of online tutorials and teaching materials for this device, covering ADS-B aircraft-position tracking, reception of NOAA and Meteor-M2 weather-satellite imagery, GSM analysis, listening to TETRA signals, and pager decoding.
I did not go to Embedded World in Nuremberg this week, but SemiWiki’s Bernard Murphy was there, and he has published his observations about three Zynq-based reference designs that he saw running in Aldec’s booth on the company’s Zynq-based TySOM embedded dev and prototyping boards.
Aldec TySOM-2 Embedded Prototyping Board
Murphy published this article titled “Aldec Swings for the Fences” on SemiWiki and wrote:
“At the show, Aldec provided insight into using the solution to model the ARM core running in QEMU, together with a MIPI CSI-2 solution running in the FPGA. But Aldec didn’t stop there. They also showed off three reference designs designed using this flow and built on their TySOM boards.
“The first reference design targets multi-camera surround view for ADAS (automotive – advanced driver assistance systems). Camera inputs come from four First Sensor Blue Eagle systems, which must be processed simultaneously in real-time. A lot of this is handled in software running on the Zynq ARM cores but the computationally-intensive work, including edge detection, colorspace conversion and frame-merging, is handled in the FPGA. ADAS is one of the hottest areas in the market and likely to get hotter since Intel just acquired Mobileye.
“The next reference design targets IoT gateways – also hot. Cloud interface, through protocols like MQTT, is handled by the processors. The gateway supports connection to edge devices using wireless and wired protocols including Bluetooth, ZigBee, Wi-Fi and USB.
“Face detection for building security, device access and identifying evil-doers is also growing fast. The third reference design is targeted at this application, using similar capabilities to those on the ADAS board, but here managing real-time streaming video as 1280x720 at 30 frames per second, from an HDR-CMOS image sensor.”
The article contains a photo of the Aldec TySOM-2 Embedded Prototyping Board, which is based on a Xilinx Zynq Z-7045 SoC. According to Murphy, Aldec developed the reference designs using its own and other design tools including the Aldec Riviera-PRO simulator and QEMU. (For more information about the Zynq-specific QEMU processor emulator, see “The Xilinx version of QEMU handles ARM Cortex-A53, Cortex-R5, Cortex-A9, and MicroBlaze.”)
Then Murphy wrote this:
“So yes, Aldec put together a solution combining their simulator with QEMU emulation and perhaps that wouldn’t justify a technical paper in DVCon. But business-wise they look like they are starting on a much bigger path. They’re enabling FPGA-based system prototype and build in some of the hottest areas in systems today and they make these solutions affordable for design teams with much more constrained budgets than are available to the leaders in these fields.”
AEye is the latest iteration of the eye-tracking technology developed by EyeTech Digital Systems. The AEye chip is based on the Zynq Z-7020 SoC. It’s located immediately adjacent to the imaging sensor, which enables compact, stand-alone systems. This technology is finding its way into diverse vision-guided systems in the automotive, AR/VR, and medical-diagnostic arenas. According to EyeTech, the Zynq SoC’s unique abilities allow the company to create products it could not build any other way.
With the advent of the reVISION stack, EyeTech is looking to expand its product offerings into machine learning, as discussed in this short, 3-minute video:
For more information about EyeTech, see:
By Adam Taylor
In looking at the Zynq UltraScale+ MPSoC’s AMS capabilities so far, we have introduced the two slightly different Sysmon blocks residing within the Zynq UltraScale+ MPSoC’s PS (processing system) and PL (programmable logic). In this blog, I am going to demonstrate how we can get the PS Sysmon up and running on both the ARM Cortex-A53 and Cortex-R5 processor cores in the Zynq UltraScale+ MPSoC’s PS. There is little difference between using the two types of processor, but I think it is important to show you how to use both.
The process to use the Sysmon is the same as it is for many of the peripherals we have looked at previously with the MicroZed Chronicles:
The function names in parentheses are those we use to perform the desired operations, provided we pass the correct parameters. In the simplest case, as in this example, we can then poll the output registers using the XSysMonPsu_GetAdcData() function. All of these functions are defined within the file xsysmonpsu.h, which is available under the Board Support Package Lib Src directory in SDK.
Examining the functions, you will notice that each of the functions used in steps 4 to 8 requires an input parameter called SysmonBlk. You must pass this parameter to the function. This parameter is how we specify which Sysmon (within the PS or the PL) we want to address. For this example, we will be specifying the PS Sysmon using XSYSMON_PS, which is also defined within xsysmonpsu.h. If we want to address the PL Sysmon, we use the XSYSMON_PL definition, which we will look at next time.
There is also another header file which is of use and that is xsysmonpsu_hw.h. Within this file, we can find the definitions required to correctly select the channels we wish to sample in the sequencer. These are defined in the format:
This simple example samples the following within the PS Sysmon:
We can use the conversion functions provided within xsysmonpsu.h to convert the raw values supplied by the ADC into temperatures and voltages. However, the PS IO banks are capable of supporting 3.3V logic. As such, the standard raw-to-voltage conversion macro is not correct for these IO banks or for the HD banks in the PL. (We will look at the different IO bank types in another blog.)
The full-scale voltage is 3V for most of the voltage conversions. However, in line with UG580 (page 43), we need to use a 6V full scale for the PS IO banks. Otherwise, we will see only half the expected value for that bank’s supply voltage. With this in mind, my example contains a conversion function at the top of the source file for these IO banks to ensure that we get the correct value.
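The correction amounts to nothing more than a different full-scale term in the transfer function. Here is a sketch of that arithmetic in Python for illustration (the driver itself provides C macros; the function name and the 16-bit transfer function here are mine):

```python
def raw_to_voltage(raw, full_scale=3.0):
    """Convert a 16-bit Sysmon ADC reading to volts.

    full_scale is 3.0 V for most supply channels, but per UG580
    the PS IO bank channels need a 6.0 V full scale.
    """
    return (raw / 65536.0) * full_scale

# A mid-scale reading interpreted two ways:
print(raw_to_voltage(0x8000, full_scale=6.0))  # 3.0 — correct for a PS IO bank
print(raw_to_voltage(0x8000))                  # 1.5 — half the true value if we
                                               # wrongly use the 3 V scale
```

Using the 3V scale on a PS IO bank channel is exactly the "half of what we are expecting" symptom described above.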
The Zynq UltraScale+ MPSoC architecture permits both the APU (the ARM Cortex-A53 processors) and the RPU (the ARM Cortex-R5 processors) to address the Sysmon. To demonstrate this, the same file was used in applications first targeting an ARM Cortex-A53 processor in the APU and then targeting the ARM Cortex-R5 processor in the RPU. I used Core 0 in both cases.
The only difference between these two cases was the need to create new applications selecting the core to be targeted and then to update the FSBL to load the correct core. (See “Adam Taylor’s MicroZed Chronicles, Part 172: UltraZed Part 3—Saying hello world and First-Stage Boot” for more information on how to do this.)
Results when using the ARM Cortex-A53 Core 0 Processor
Results when using the ARM Cortex-R5 Core 0 Processor
When I ran the same code, which is available in the GitHub repository, I received the output shown above in the terminal program, demonstrating that the code works on both the ARM Cortex-A53 and ARM Cortex-R5 cores.
Next time we will look at how we can use the PL Sysmon.
Code is available on GitHub as always.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
This week, EETimes’ Junko Yoshida published an article titled “Xilinx AI Engine Steers New Course” that gathers some comments from industry experts and from Xilinx with respect to Monday’s reVISION stack announcement. To recap, the Xilinx reVISION stack is a comprehensive suite of industry-standard resources for developing advanced embedded-vision systems based on machine learning and machine inference.
As Xilinx Senior Vice President of Corporate Strategy Steve Glaser tells Yoshida, “Xilinx designed the stack to ‘enable a much broader set of software and systems engineers, with little or no hardware design expertise, to develop intelligent vision-guided systems easier and faster.’”
“While talking to customers who have already begun developing machine-learning technologies, Xilinx identified ‘8 bit and below fixed point precision’ as the key to significantly improve efficiency in machine-learning inference systems.”
Yoshida also interviewed Karl Freund, Senior Analyst for HPC and Deep Learning at Moor Insights & Strategy, who said:
“Artificial Intelligence remains in its infancy, and rapid change is the only constant.” In this circumstance, Xilinx seeks “to ease the programming burden to enable designers to accelerate their applications as they experiment and deploy the best solutions as rapidly as possible in a highly competitive industry.”
She also quotes Loring Wirbel, a Senior Analyst at The Linley Group, who said:
“What’s interesting in Xilinx's software offering, [is that] this builds upon the original stack for cloud-based unsupervised inference, Reconfigurable Acceleration Stack, and expands inference capabilities to the network edge and embedded applications. One might say they took a backward approach versus the rest of the industry. But I see machine-learning product developers going a variety of directions in trained and inference subsystems. At this point, there's no right way or wrong way.”
There’s a lot more information in the EETimes article, so you might want to take a look for yourself.
This week at Embedded World in Nuremberg, Lynx Software Technologies is demonstrating its port of the LynxSecure Separation Kernel hypervisor to the ARM Cortex-A53 processors on the Xilinx Zynq UltraScale+ MPSoC. According to Robert Day, Vice President of Marketing at Lynx, "ARM designers are now able to run safety critical environments alongside a general purpose OS like Linux or LynxOS RTOS on the same Xilinx processor without compromising safety, security or real-time performance. Use cases include automotive systems based on environments such as AUTOSAR RTA-BSW from ETAS and avionics designs using LynxOS-178 RTOS from Lynx. Designers can match the security of air-gap hardware partitioning without incurring the cost, power and size overhead of separate hardware."
The LynxSecure port to the Zynq UltraScale+ MPSoC supports modular software architectures and tight integration with the Zynq UltraScale+ MPSoC’s FPGA fabric for hosting bare-metal applications, trusted functions, and open-source projects on a single SoC with secure partitioning. You have the option to decide which functions run in software using LynxSecure bare-metal apps and which functions you need to hardware-accelerate through the Zynq UltraScale+ MPSoC’s FPGA fabric.
The LynxSecure technology was designed to satisfy high-assurance computing requirements in support of the NIST, NSA Common Criteria, and NERC CIP evaluation processes, which are used to regulate military and industrial computing environments.
The LynxSecure Separation Kernel hypervisor provides:
Here’s a diagram of the LynxSecure Separation Kernel hypervisor architecture:
Please contact Lynx Software Technologies directly for information about the LynxSecure Separation Kernel hypervisor.
Machine learning and machine inference based on CNNs (convolutional neural networks) are the latest way to classify images and, as I wrote in Monday’s blog post about the new Xilinx reVISION announcement, “The last two years have generated more machine-learning technology than all of the advancements over the previous 45 years and that pace isn't slowing down.” (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”) The challenge now is to make the CNNs run faster while consuming less power. It would be nice to make them easier to use as well.
OK, that’s a setup. A paper published last month at the 25th International Symposium on Field Programmable Gate Arrays titled “FINN: A Framework for Fast, Scalable Binarized Neural Network Inference” describes a method to speed up CNN-based inference while cutting power consumption by reducing CNN precision in the inference machines. As the paper states:
“…a growing body of research demonstrates this approach [CNN] incorporates significant redundancy. Recently, it has been shown that neural networks can classify accurately using one- or two-bit quantization for weights and activations. Such a combination of low-precision arithmetic and small memory footprint presents a unique opportunity for fast and energy-efficient image classification using Field Programmable Gate Arrays (FPGAs). FPGAs have much higher theoretical peak performance for binary operations compared to floating point, while the small memory footprint removes the off-chip memory bottleneck by keeping parameters on-chip, even for large networks. Binarized Neural Networks (BNNs), proposed by Courbariaux et al., are particularly appealing since they can be implemented almost entirely with binary operations, with the potential to attain performance in the teraoperations per second (TOPS) range on FPGAs.”
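The core trick the quote alludes to is that a dot product of {-1, +1} vectors collapses to an XNOR followed by a popcount. A toy illustration (my own sketch, not the FINN implementation):

```python
def binarize(xs):
    """Map real values to {-1, +1} by sign (0 treated as +1)."""
    return [1 if x >= 0 else -1 for x in xs]

def binary_dot(ws, xs):
    """Dot product of two {-1, +1} vectors via XNOR + popcount.

    Encode -1 as bit 0 and +1 as bit 1. XNOR counts agreements,
    and dot = agreements - disagreements = 2*popcount(XNOR) - n.
    """
    n = len(ws)
    w_bits = sum(1 << i for i, w in enumerate(ws) if w == 1)
    x_bits = sum(1 << i for i, x in enumerate(xs) if x == 1)
    agree = bin(~(w_bits ^ x_bits) & ((1 << n) - 1)).count("1")
    return 2 * agree - n

w = binarize([0.3, -1.2, 0.7, -0.1])   # -> [1, -1, 1, -1]
x = binarize([-0.5, -0.9, 2.0, 0.4])   # -> [-1, -1, 1, 1]
print(binary_dot(w, x))                # 0 — same as sum(wi*xi) on the binarized vectors
```

In FPGA fabric, the XNOR and popcount map directly onto LUTs, which is why BNNs can reach TOPS-range throughput while keeping all weights on-chip.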
The paper then describes the techniques developed by the authors to generate BNNs and instantiate them into FPGAs. The results, based on experiment using a Xilinx ZC706 eval kit based on a Zynq Z-7045 SoC, are impressive:
“When it comes to pure image throughput, our designs outperform all others. For the MNIST dataset, we achieve an FPS which is over 48/6x over the nearest highest throughput design for our SFC-max/LFC-max designs respectively. While our SFC-max design has lower accuracy than the networks implemented by Alemdar et al., our LFC-max design outperforms their nearest accuracy design by over 6/1.9x for throughput and FPS/W respectively. For other datasets, our CNV-max design outperforms TrueNorth for FPS by over 17/8x for the CIFAR-10/SVHN datasets respectively, while achieving 9.44x higher throughput than the design by Ovtcharov et al., and 2.2x over the fastest results reported by Hegde et al. Our prototypes have classification accuracy within 3% of the other low-precision works, and could have been improved by using larger BNNs.”
There’s something even more impressive, however. This design approach to creating BNNs is so scalable that it now runs on a low-end platform—the $229 Digilent PYNQ-Z1. (Digilent’s academic price for the PYNQ-Z1 is only $65!) Xilinx Research Labs in Ireland, NTNU (the Norwegian University of Science and Technology), and the University of Sydney have released an open-source Binarized Neural Network (BNN) overlay for the PYNQ-Z1 based on the work described in the above paper.
According to Giulio Gambardella of Xilinx Research Labs, “…running on the PYNQ-Z1 (a smaller Zynq 7020), [the PYNQ-Z1] can achieve 168,000 image classifications per second with 102µsec latency on the MNIST dataset with 98.40% accuracy, and 1700 images per second with 2.2msec latency on the CIFAR-10, SVHN, and GTSRB datasets, with 80.1%, 96.69%, and 97.66% accuracy respectively, running at under 2.5W.”
Digilent PYNQ-Z1 board, based on a Xilinx Zynq Z-7020 SoC
Because the PYNQ-Z1 programming environment centers on Python and the Jupyter development environment, the package includes a number of Jupyter notebooks that demonstrate what the overlay can do through live code running on the PYNQ-Z1 board, along with equations, visualizations, explanatory text, and program results, including images.
There are also examples of this BNN in practical application:
For more information about the Digilent PYNQ-Z1 board, see “Python + Zynq = PYNQ, which runs on Digilent’s new $229 pink PYNQ-Z1 Python Productivity Package.”
Today, EEJournal’s Kevin Morris has published a review article titled “Teaching Machines to See: Xilinx Launches reVISION” following Monday’s announcement of the Xilinx reVISION stack for developing vision-guided applications. (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”)
“But vision is one of the most challenging computational problems of our era. High-resolution cameras generate massive amounts of data, and processing that information in real time requires enormous computing power. Even the fastest conventional processors are not up to the task, and some kind of hardware acceleration is mandatory at the edge. Hardware acceleration options are limited, however. GPUs require too much power for most edge applications, and custom ASICs or dedicated ASSPs are horrifically expensive to create and don’t have the flexibility to keep up with changing requirements and algorithms.
“That makes hardware acceleration via FPGA fabric just about the only viable option. And it makes SoC devices with embedded FPGA fabric - such as Xilinx Zynq and Altera SoC FPGAs - absolutely the solutions of choice. These devices bring the benefits of single-chip integration, ultra-low latency and high bandwidth between the conventional processors and the FPGA fabric, and low power consumption to the embedded vision space.”
Later on, Morris gets to the fly in the ointment:
“Oh, yeah. There’s still that ‘almost impossible to program’ issue.”
And then he gets to the solution:
“reVISION, announced this week, is a stack - a set of tools, interfaces, and IP - designed to let embedded vision application developers start in their own familiar sandbox (OpenVX for vision acceleration and Caffe for machine learning), smoothly navigate down through algorithm development (OpenCV and NN frameworks such as AlexNet, GoogLeNet, SqueezeNet, SSD, and FCN), targeting Zynq devices without the need to bring in a team of FPGA experts. reVISION takes advantage of Xilinx’s previously-announced SDSoC stack to facilitate the algorithm development part. Xilinx claims enormous gains in productivity for embedded vision development - with customers predicting cuts of as much as 12 months from current schedules for new product and update development.
“In many systems employing embedded vision, it’s not just the vision that counts. Increasingly, information from the vision system must be processed in concert with information from other types of sensors such as LiDAR, SONAR, RADAR, and others. FPGA-based SoCs are uniquely agile at handling this sensor fusion problem, with the flexibility to adapt to the particular configuration of sensor systems required by each application. This diversity in application requirements is a significant barrier for typical “cost optimization” strategies such as the creation of specialized ASIC and ASSP solutions.
“The performance rewards for system developers who successfully harness the power of these devices are substantial. Xilinx is touting benchmarks showing their devices delivering an advantage of 6x images/sec/watt in machine learning inference with GoogLeNet @batch = 1, 42x frames/sec/watt in computer vision with OpenCV, and ⅕ the latency on real-time applications with GoogLeNet @batch = 1 versus “NVidia Tegra and typical SoCs.” These kinds of advantages in latency, performance, and particularly in energy-efficiency can easily be make-or-break for many embedded vision applications.”
But don’t take my word for it, read Morris’ article yourself.
By Dr. Rajan Bedi, Spacechips
Several of my satcom ground-segment clients and I are considering Xilinx's recently announced RFSoC for future transceivers, and I want to share the benefits of this impending device. (Note: For more information on the Xilinx RFSoC, see “Xilinx announces RFSoC with 4Gsamples/sec ADCs and 6.4Gsamples/sec DACs for 5G, other apps. When we say ‘All Programmable,’ we mean it!”)
Direct RF/IF sampling and direct DAC up-conversion are currently being used very successfully in-orbit and on the ground. For example, bandpass sampling provides flexible RF frequency planning with some spacecraft by directly digitizing L- and S-band carriers to remove expensive and cumbersome superheterodyne down-conversion stages. Today, many navigation satellites directly re-construct the L-band carrier from baseband data without using traditional up-conversion. Direct RF/IF Sampling and direct DAC up-conversion have dramatically reduced the BOM, size, weight, power consumption, as well as the recurring and non-recurring costs of transponders. Software-defined radio (SDR) has given operators real scalability, reusability, and reconfigurability. Xilinx's new RFSoC will offer further hardware integration advantages for the ground segment.
The Xilinx RFSoC integrates multi-Gsamples/sec ADCs and DACs into a 16nm Zynq UltraScale+ MPSoC. At this geometry and with this technology, the mixed-signal converters draw very little power and economies of scale make it possible to add a lot of digital post-processing (Small A/Big D!) to implement functions such as DDC (digital down-conversion), DUC (digital up-conversion), AGC (automatic gain control), and interleaving calibration.
While CMOS scaling has improved ADC and DAC sample rates, which results in greater bandwidths at lower power, the transconductance of transistors and the size of the analog input/output voltage swing are reduced for analog designs, which impacts G/T at the satellite receiver. (G/T is antenna gain-to-noise-temperature, a figure of merit in the characterization of antenna performance where G is the antenna gain in decibels at the receive frequency and T is the equivalent noise temperature of the receiving system in kelvins. The receiving system’s noise temperature is the summation of the antenna noise temperature and the RF-chain noise temperature from the antenna terminals to the receiver output.)
Integrating ADCs and DACs with Xilinx's programmable MPSoC fabric shrinks physical footprint, reduces chip-to-chip latency, and completely eliminates the external digital interfaces between the mixed-signal converters and the FPGA. These external interfaces typically consume appreciable power. For parallel-I/O connections, they also need large amounts of pc board space and are difficult to route.
There will be a number of devices in the Xilinx RFSoC family, each containing different ADC/DAC combinations targeting different markets. Depending on the number of integrated mixed-signal converters, Xilinx is predicting a 55% to 77% reduction in footprint compared to current discrete implementations using JESD204B high-speed serial links between the FPGA and the ADCs and DACs, as illustrated below. Integration will also benefit clock distribution both at the device and system level.
Figure 1: RFSoC device concept (Source: Xilinx)
The RFSoC’s integrated 12-bit ADCs can each sample up to 4Gsamples/sec, which offers flexible bandwidth and RF frequency-planning options. The analog input bandwidth of each ADC appears to be 4GHz, which allows direct RF/IF sampling up to the S-band.
Direct RF/IF sampling obeys the bandpass Nyquist Theorem when oversampling at 2x the information bandwidth (or greater) and undersampling the absolute carrier frequencies. For example, the spectrum below shows a 48.5MHz L-band signal centered at 1.65GHz, digitized using an undersampling rate of 140.5Msamples/sec. The resulting oversampling ratio is 2.9 with the information located in the 24th Nyquist zone. Digitization aliases the bandpass information to the first Nyquist zone, which may or may not be baseband depending on your application. If not, the RFSoC's integrated DDC moves the alias to dc, allowing the use of a low-pass filter.
Figure 2: Direct L-Band Sampling
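The numbers in that example can be checked with the standard bandpass-sampling arithmetic. A quick Python sketch (the figures are the ones quoted above):

```python
import math

fs = 140.5e6   # undersampling rate, samples/sec
fc = 1.65e9    # L-band carrier centre frequency, Hz
bw = 48.5e6    # information bandwidth, Hz

# Which Nyquist zone (each fs/2 wide) contains the carrier?
zone = math.floor(fc / (fs / 2)) + 1          # -> 24

# Oversampling ratio relative to the information bandwidth
osr = fs / bw                                 # -> ~2.9

# Where does the carrier alias to in the first Nyquist zone?
fr = fc % fs
alias = fs - fr if fr > fs / 2 else fr        # even zones fold (spectral inversion)

print(zone, round(osr, 1), alias / 1e6)       # 24 2.9 36.0 (MHz)
```

So the 1.65GHz carrier lands at a 36MHz alias after digitization, which the DDC can then translate to dc if required.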
As the sample rate increases, the noise spectral density spreads across a wider Nyquist region with respect to the original signal bandwidth. Each time the sampling frequency doubles, the noise spectral density decreases by 3dB as the noise re-distributes across twice the bandwidth, which increases dynamic range and SNR. Understandably, operators want to exploit this processing gain! A larger oversampling ratio also moves the aliases further apart, relaxing the specification of the anti-aliasing filter. Furthermore, oversampling increases the correlation between successive samples in the time-domain, allowing the use of a decimating filter to remove some samples and reduce the interface rate between the ADC and the FPGA.
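That "3dB per doubling" processing gain is just 10·log10 of the oversampling ratio. A one-liner makes the point (illustrative only):

```python
import math

def processing_gain_db(osr):
    """SNR improvement from oversampling: 3 dB per doubling of the ratio."""
    return 10 * math.log10(osr)

print(round(processing_gain_db(2.0), 2))  # 3.01 — one doubling buys ~3 dB
print(round(processing_gain_db(2.9), 2))  # 4.62 — the 2.9x L-band example above
```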
The RFSoC’s integrated 14-bit DACs operate up to 6.4Gsamples/sec, which also offers flexible bandwidth and RF frequency-planning options.
Just like any high-frequency, wide-bandwidth mixed-signal device, designing an RFSoC into a system requires careful consideration of floorplanning, front-end/back-end component placement, routing, grounding, and analog-digital segregation to achieve the required system performance. The partitioning starts at the die and extends to the module/sub-system level, with all the analog signals (including the sampling clock) typically on one side of an ADC or DAC. Given the RFSoC's high sampling frequencies, analog inputs and outputs must be further isolated at the PCB level to prevent crosstalk between adjacent channels and clocks and to guard against digital noise.
At low carrier frequencies, the performance of an ADC or DAC is limited by its resolution and linearity (DNL/INL). However at higher signal frequencies, SNR is determined primarily by the sampling clock’s purity. For direct RF/IF applications, minimizing jitter will be key to achieving the desired performance as shown below:
Figure 3: SNR of an ideal ADC vs analog input frequency and clock jitter
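The curve family in Figure 3 follows the standard jitter-limited SNR bound, SNR = −20·log10(2π·f_in·t_j). A sketch with illustrative numbers (not figures from any datasheet):

```python
import math

def jitter_limited_snr_db(f_in_hz, jitter_s):
    """Best-case SNR of an ideal ADC limited only by aperture/clock jitter."""
    return -20 * math.log10(2 * math.pi * f_in_hz * jitter_s)

# Directly sampling a 1.65 GHz L-band carrier with various RMS clock jitters:
for tj in (1e-12, 200e-15, 50e-15):
    snr = jitter_limited_snr_db(1.65e9, tj)
    print(f"{tj * 1e15:6.0f} fs jitter -> {snr:5.1f} dB SNR ceiling")
```

Note that the bound depends on the analog input frequency, not the sample rate, which is why clock purity dominates at direct-RF carrier frequencies.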
While there are aspects of the mixed-signal processing that could be improved, judging from the early announcements and the information posted on its website, Xilinx has done a good job with the RFSoC. Although designed not specifically for satellite communication but rather for 5G MIMO and wireless backhaul, the RFSoC's ADCs and DACs have sufficient dynamic range and offer flexible RF frequency-planning options for many ground-segment OEMs.
The specification of the RFSoC's ADC will allow ground receivers to directly digitize the information broadcast at traditional satellite communication frequencies at L- and S-band as well as the larger bandwidths used by high-throughput digital payloads. Thanks to its reprogrammability, the same RFSoC-based architecture with its wideband ADCs can be re-used for other frequency plans without having to re-engineer the hardware.
The RFSoC's DAC specification will allow ground transmitters to directly construct approximately 3GHz of bandwidth up to the X-band (9.6GHz). Xilinx says that first samples of RFSoC will become available in 2018 and I look forward to designing the part into satcom systems and sharing my experiences with you.
Dr. Rajan Bedi pioneered the use of Direct RF/IF Sampling and direct DAC up-conversion for the space industry with many in-orbit satellites currently using these techniques. He was previously invited by ESA and NASA to present his work and was also part of the project teams which developed many of the ultra-wideband ADCs and DACs currently on the market. These devices are successfully operating in orbit today. Last year, his company, Spacechips, was awarded High-Reliability Product of the Year for advancing Software Defined Radio.
Spacechips provides space electronics design consultancy services to manufacturers of satellites and spacecraft around the world. The company also helps OEMs assess the benefits of COTS components and exploit the advantages of direct RF/IF sampling and direct DAC up-conversion. Prior to founding Spacechips, Dr. Bedi headed the Mixed-Signal Design Group at Airbus Defence & Space in the UK for twelve years. Rajan is the author of Out-of-this-World Design, the popular, award-winning blog on Space Electronics. He also teaches direct RF/IF sampling and direct DAC up-conversion techniques in his Mixed-Signal and FPGA courses which are offered around the world. Rajan offers a series of unique training courses, Courses for Rocket Scientists, which teach and compare all space-grade FPGAs as well as the use of COTS Xilinx UltraScale and UltraScale+ parts for implementing spacecraft IP. Rajan has designed every space-grade FPGA into satellite systems!
As part of today’s reVISION announcement of a new, comprehensive development stack for embedded-vision applications, Xilinx has produced a 3-minute video showing you just some of the things made possible by this announcement.
Here it is:
By Adam Taylor
Several times in this series, we have looked at image processing using the Avnet EVK and the ZedBoard. Along with the basics, we have examined object tracking using OpenCV running on the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PS (processing system) and using HLS with its video library to generate image-processing algorithms for the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL (programmable logic, see blogs 140 to 148 here).
Xilinx’s reVISION is an embedded-vision development stack that provides support for a wide range of frameworks and libraries often used for embedded-vision applications. Most exciting, from my point of view, is that the stack includes acceleration-ready OpenCV functions.
The stack itself is split into three layers. Once we select or define our platform, we will be mostly working at the application and algorithm layers. Let’s take a quick look at the layers of the stack:
As I mentioned above, one of the most exciting aspects of the reVISION stack is the ability to accelerate a wide range of OpenCV functions using the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL. We can group the OpenCV functions that can be hardware-accelerated in the PL into four categories:
What is very interesting about these function calls is that we can optimize them for resource usage or performance within the PL. The main optimization method is specifying the number of pixels to be processed during each clock cycle. For most accelerated functions, we can choose to process either one or eight pixels per clock. Processing more pixels per clock cycle reduces latency but increases resource utilization, while processing one pixel per clock minimizes the resource requirements at the cost of increased latency. We control the number of pixels processed per clock via the function call.
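The latency/resource trade-off can be seen with some quick arithmetic. This back-of-the-envelope sketch estimates the time to stream one 1080p frame at one versus eight pixels per clock; the 150MHz PL clock is an assumed, illustrative figure, not an SDSoC default:

```python
# Back-of-the-envelope latency for streaming one 1080p frame through
# an accelerated function. The 150 MHz PL clock is an assumed figure
# for illustration only.

def frame_cycles(width, height, pixels_per_clock):
    """Clock cycles to push one frame through the pipeline."""
    pixels = width * height
    return -(-pixels // pixels_per_clock)   # ceiling division

def frame_time_us(width, height, pixels_per_clock, clock_mhz):
    """Frame throughput time in microseconds."""
    return frame_cycles(width, height, pixels_per_clock) / clock_mhz

one = frame_time_us(1920, 1080, 1, 150.0)
eight = frame_time_us(1920, 1080, 8, 150.0)
print(f"1 pixel/clock:  {one:.0f} us per frame")    # ~13824 us
print(f"8 pixels/clock: {eight:.0f} us per frame")  # ~1728 us
```

At eight pixels per clock the frame streams through roughly eight times faster, which is exactly why the wider configuration costs correspondingly more PL resources.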
Over the next few blogs, we will look more at the reVISION stack and how we can use it. However, in the best Blue Peter tradition, the image below shows the result of running a reVISION-accelerated Harris OpenCV function within the PL.
Accelerated Harris Corner Detection in the PL
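For reference, the Harris response that the accelerated function computes is the textbook R = det(M) - k*trace(M)^2, where M is the windowed structure tensor. The following pure-Python model is my own illustrative software reference, not the accelerated reVISION implementation:

```python
# Pure-Python reference model of the Harris corner response,
# R = det(M) - k*trace(M)^2, where M is the 3x3-windowed structure
# tensor. This is an illustrative software model only, not the
# accelerated reVISION implementation.

def harris_response(img, k=0.04):
    """img: 2D list of grayscale values; returns the response map R."""
    h, w = len(img), len(img[0])
    Ix = [[0.0] * w for _ in range(h)]
    Iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central-difference image gradients.
            Ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            Iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    R = [[0.0] * w for _ in range(h)]
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            sxx = sxy = syy = 0.0
            for dy in (-1, 0, 1):       # sum the structure tensor
                for dx in (-1, 0, 1):   # over a 3x3 window
                    gx, gy = Ix[y + dy][x + dx], Iy[y + dy][x + dx]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            R[y][x] = det - k * trace * trace
    return R

# A white square on a black background: the square's corners should
# score higher than the midpoints of its edges.
img = [[0] * 10 for _ in range(10)]
for y in range(3, 7):
    for x in range(3, 7):
        img[y][x] = 255
R = harris_response(img)
print(R[3][3] > R[3][5])   # corner response beats edge response: True
```

A model like this is handy for sanity-checking the output of the hardware-accelerated version against a known-good software result.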
Code is available on GitHub as always.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
Today, Xilinx announced a comprehensive suite of industry-standard resources for developing advanced embedded-vision systems based on machine learning and machine inference. It’s called the reVISION stack and it allows design teams without deep hardware expertise to use a software-defined development flow to combine efficient machine-learning and computer-vision algorithms with Xilinx All Programmable devices to create highly responsive systems. (Details here.)
The Xilinx reVISION stack includes a broad range of development resources for platform, algorithm, and application development including support for the most popular neural networks: AlexNet, GoogLeNet, SqueezeNet, SSD, and FCN. Additionally, the stack provides library elements such as pre-defined and optimized implementations for CNN network layers, which are required to build custom neural networks (DNNs and CNNs). The machine-learning elements are complemented by a broad set of acceleration-ready OpenCV functions for computer-vision processing.
For application-level development, Xilinx supports industry-standard frameworks including Caffe for machine learning and OpenVX for computer vision. The reVISION stack also includes development platforms from Xilinx and third parties, which support various sensor types.
The reVISION development flow starts with a familiar, Eclipse-based development environment; the C, C++, and/or OpenCL programming languages; and associated compilers all incorporated into the Xilinx SDSoC development environment. You can now target reVISION hardware platforms within the SDSoC environment, drawing from a pool of acceleration-ready, computer-vision libraries to quickly build your application. Soon, you’ll also be able to use the Khronos Group’s OpenVX framework as well.
For machine learning, you can use popular frameworks including Caffe to train neural networks. Within one Xilinx Zynq SoC or Zynq UltraScale+ MPSoC, you can use Caffe-generated .prototxt files to configure a software scheduler running on one of the device’s ARM processors to drive CNN inference accelerators—pre-optimized for and instantiated in programmable logic. For computer vision and other algorithms, you can profile your code, identify bottlenecks, and then designate specific functions that need to be hardware-accelerated. The Xilinx system-optimizing compiler then creates an accelerated implementation of your code, automatically including the required processor/accelerator interfaces (data movers) and software drivers.
The Xilinx reVISION stack is the latest in an evolutionary line of development tools for creating embedded-vision systems. Xilinx All Programmable devices have long been used to develop such vision-based systems because these devices can interface to any image sensor and connect to any network—which Xilinx calls any-to-any connectivity—and they provide the large amounts of high-performance processing horsepower that vision systems require.
Initially, embedded-vision developers used the existing Xilinx Verilog and VHDL tools to develop these systems. Xilinx introduced the SDSoC development environment for HLL-based design two years ago and, since then, SDSoC has dramatically and successfully shortened development cycles for thousands of design teams. Xilinx’s new reVISION stack now enables an even broader set of software and systems engineers to develop intelligent, highly responsive embedded-vision systems faster and more easily using Xilinx All Programmable devices.
And what about the performance of the resulting embedded-vision systems? How do their performance metrics compare against systems based on embedded GPUs or the typical SoCs used in these applications? Xilinx-based systems significantly outperform the best of this group, which employ Nvidia devices. Benchmarks of the reVISION flow using Zynq SoC targets against Nvidia Tegra X1 have shown as much as:
There is huge value to having a very rapid and deterministic system-response time and, for many systems, the faster response time of a design that's been accelerated using programmable logic can mean the difference between success and catastrophic failure. For example, the figure below shows the difference in response time between a car’s vision-guided braking system created with the Xilinx reVISION stack running on a Zynq UltraScale+ MPSoC relative to a similar system based on an Nvidia Tegra device. At 65mph, the Xilinx embedded-vision system’s response time stops the vehicle 5 to 33 feet faster depending on how the Nvidia-based system is implemented. Five to 33 feet could easily mean the difference between a safe stop and a collision.
(Note: This example appears in the new Xilinx reVISION backgrounder.)
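The relationship between response latency and stopping distance is simple kinematics: extra distance equals speed times extra latency. The latency values below are illustrative, chosen to bracket the 5-to-33-foot range quoted above; they are not Xilinx's measured benchmark numbers:

```python
# Stopping-distance cost of response latency at 65 mph. The latency
# figures below are illustrative values chosen to bracket the
# 5-to-33-foot range quoted above; they are not Xilinx's benchmarks.

MPH_TO_FPS = 5280.0 / 3600.0          # 1 mph = ~1.467 ft/s

def extra_distance_ft(speed_mph, extra_latency_s):
    """Distance travelled during the extra response latency."""
    return speed_mph * MPH_TO_FPS * extra_latency_s

# 65 mph is ~95.3 ft/s, so every 100 ms of latency costs ~9.5 ft:
for ms in (50, 100, 250, 350):
    print(f"{ms:4d} ms of extra latency -> "
          f"{extra_distance_ft(65, ms / 1000.0):5.1f} ft")
```

In other words, the quoted 5-to-33-foot spread corresponds to somewhere in the region of 50 to 350 milliseconds of additional system response time.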
The last two years have generated more machine-learning technology than all of the advancements over the previous 45 years and that pace isn't slowing down. Many new types of neural networks for vision-guided systems have emerged along with new techniques that make deployment of these neural networks much more efficient. No matter what you develop today or implement tomorrow, the hardware and I/O reconfigurability and software programmability of Xilinx All Programmable devices can “future-proof” your designs whether it’s to permit the implementation of new algorithms in existing hardware; to interface to new, improved sensing technology; or to add an all-new sensor type (like LIDAR or Time-of-Flight sensors, for example) to improve a vision-based system’s safety and reliability through advanced sensor fusion.
Xilinx is pushing even further into vision-guided, machine-learning applications with the new Xilinx reVISION Stack and this announcement complements the recently announced Reconfigurable Acceleration Stack for cloud-based systems. (See “Xilinx Reconfigurable Acceleration Stack speeds programming of machine learning, data analytics, video-streaming apps.”) Together, these new development resources significantly broaden your ability to deploy machine-learning applications using Xilinx technology—from inside the cloud to the very edge.
You might also want to read “Xilinx AI Engines Steers New Course” by Junko Yoshida on the EETimes.com site.
The amazing “snickerdoodle one”—a low-cost, single-board computer with wireless capability based on the Xilinx Zynq Z-7010 SoC—is once more available for purchase on the Crowd Supply crowdsourcing Web site. Shipments are already going out to existing backers and, if you missed out on the original crowdsourcing campaign, you can order one for the post-campaign price of $95. That’s still a huuuuge bargain in my book. (Note: There is a limited number of these boards available, so if you want one, now’s the time to order it.)
In addition, you can still get the “snickerdoodle black” with a faster Zynq Z-7020 SoC and more SDRAM that also includes an SDSoC software license, all for $195. Finally, snickerdoodle’s creator krtkl has added two mid-priced options: the snickerdoodle prime and snickerdoodle prime LE—also based on Zynq Z-7020 SoCs—for $145.
The krtkl snickerdoodle low-cost, single-board computer based on a Xilinx Zynq SoC
Ryan Cousins at krtkl sent me this table that helps explain the differences among the four snickerdoodle versions:
For more information about krtkl’s snickerdoodle SBC, see:
I just received an email from Dave Embedded Systems announcing that the company will be showing its new ONDA SOM (System on Module) based on Xilinx Zynq UltraScale+ MPSoCs at next week’s Embedded World 2017 in Nuremberg. Here’s a board photo:
Dave Embedded Systems ONDA SOM based on the Xilinx Zynq UltraScale+ MPSoC (Note: Facsimile Image)
And here’s a photo of the SOM’s back side showing the three 140-pin, high-density I/O connectors:
Dave Embedded Systems ONDA SOM based on the Xilinx Zynq UltraScale+ MPSoC (Back Side)
Thanks to the multiple processors and programmable logic in the Zynq UltraScale+ MPSoC, the ONDA board packs a lot of processing power into its small 90x55mm board. Dave Embedded Systems plans to offer versions of the ONDA SOM based on the Zynq UltraScale+ ZU2, ZU3, ZU4, and ZU5 MPSoCs, so there should be a wide range of price/performance points to pick from while standardizing on one uniformly sized platform.
Here’s a block diagram of the board:
Dave Embedded Systems ONDA SOM based on the Xilinx Zynq UltraScale+ MPSoC, Block Diagram
Please contact Dave Embedded Systems for more information about the ONDA SOM.
By Adam Taylor
Embedded vision is one of my many FPGA/SoC interests. Recently, I have been doing some significant development work with the Avnet Embedded Vision Kit (EVK) (for more info on the EVK and its uses, see Issues 114 to 126 of the MicroZed Chronicles). As part of my development, I wanted to synchronize the EVK display output with an external source—also useful if we desire to synchronize multiple image streams.
Implementing this is straightforward provided we have the correct architecture. The main element we need is a buffer between the upstream camera/image-sensor chain and the downstream output-timing and -processing chain. VDMA (Video Direct Memory Access) provides this buffer by allowing us to store frames from the upstream image-processing pipeline in DDR SDRAM and then read out the frames into a downstream processing pipeline with different timing.
The architectural concept appears below:
VDMA buffering between upstream and downstream with external sync
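The buffering idea can be sketched in a few lines. This behavioural model shows how a small pool of frame buffers decouples writer and reader timing; the class and method names are invented for illustration and are not the Xilinx AXI VDMA driver API:

```python
# Behavioural model of the VDMA buffering idea: the upstream pipeline
# writes frames into a small pool of buffers in (simulated) DDR and the
# downstream pipeline reads the newest complete frame on its own timing.
# Class and method names are invented; this is not the AXI VDMA driver API.

from collections import deque

class FrameStore:
    def __init__(self, depth=3):
        self.frames = deque(maxlen=depth)   # oldest frames are dropped

    def write_frame(self, frame):           # upstream (sensor) side
        self.frames.append(frame)

    def read_frame(self):                   # downstream (display) side
        # Repeat the newest frame if the writer has not produced one since.
        return self.frames[-1] if self.frames else None

vdma = FrameStore(depth=3)
for n in range(5):                  # the camera produces frames 0..4
    vdma.write_frame(f"frame-{n}")
print(vdma.read_frame())            # prints frame-4
```

Because the reader always takes the newest complete frame, the two sides can run at different rates or phases without tearing, which is exactly the decoupling the VDMA provides in hardware.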
For most downstream chains, we use a combination of the video timing controller (VTC) and AXI Stream to Video Out IP blocks, both provided in the Vivado IP library. These two IP blocks work together. The VTC provides output timing and generates signals such as VSync and HSync. The AXI Stream to Video Out IP Block synchronizes its incoming AXIS stream with the timing signals provided by the VTC to generate the output video signals. Once the AXI Stream to Video Out block has synchronized with these signals, it is said to be locked and it will generate output video and timing signals that we can use.
The VTC itself is capable of both detecting input video timing and generating output video timing. These can be synchronized if you desire. If no video input timing signals are available to the VTC, then the input frame sync pulse (FSYNC_IN) serves to synchronize the output timing.
Enabling Synchronization with FSYNC_IN or the Detector
If FSYNC_IN alone is used to synchronize the output, we need to use not only FSYNC_IN but also the VTC-provided frame sync out (FSYNC_OUT) and GEN_CLKEN to ensure correct synchronization. GEN_CLKEN is an input enable that allows the VTC generator output stage to be clocked.
The FSYNC_OUT pulse can be configured to occur at any point within the frame. For this application, it has been configured to be generated at the very end of the frame. This configuration can take place in the VTC re-configuration dialog within Vivado for a one-time approach or, if an AXI Lite interface is provided, the pulse position can be set at run time.
The algorithm used to synchronize the VTC to an external signal is:
Should GEN_CLKEN not be disabled, the VTC will continue to run freely and will generate the next frame sequence. Issuing another FSYNC_IN while this is occurring will not result in re-synchronisation; instead, the AXI Stream to Video Out IP block will be unable to synchronize the AXIS video with the timing information and will lose lock.
Therefore, to control the enabling of GEN_CLKEN, we need to create a simple RTL block that implements the algorithm above.
Vivado Project Demonstrating the concept
When simulated, this design resulted in the VTC synchronizing to the FSYNC_IN signal as intended. It also worked the same when I implemented it in my EVK kit, allowing me to synchronize the output to an external trigger.
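The gating behaviour can also be modelled in software. This sketch assumes the sequence described above: the generator is halted at FSYNC_OUT and released again when the external FSYNC_IN arrives. It is a behavioural illustration only, not the actual RTL:

```python
# Behavioural model of the GEN_CLKEN gating described above: halt the
# VTC generator when FSYNC_OUT marks the end of frame, and release it
# when the external FSYNC_IN arrives so the next frame starts aligned.
# This illustrates the algorithm only; it is not the actual RTL.

class VtcGate:
    def __init__(self):
        self.gen_clken = True      # generator initially free-running

    def on_fsync_out(self):
        # End of frame: gate the generator clock and wait for sync.
        self.gen_clken = False

    def on_fsync_in(self):
        # External sync arrived: re-enable the generator clock.
        self.gen_clken = True

gate = VtcGate()
gate.on_fsync_out()
print("GEN_CLKEN after FSYNC_OUT:", gate.gen_clken)   # False
gate.on_fsync_in()
print("GEN_CLKEN after FSYNC_IN: ", gate.gen_clken)   # True
```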
Code is available on GitHub as always.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
Last month, the European AXIOM Project took delivery of its first board based on a Xilinx Zynq UltraScale+ ZU9EG MPSoC. (See “The AXIOM Board has arrived!”) The AXIOM project (Agile, eXtensible, fast I/O Module) aims at researching new software/hardware architectures for Cyber-Physical Systems (CPS).
AXIOM Project Board based on Xilinx Zynq UltraScale+ MPSoC
The board presents the pinout of an Arduino Uno, so you can attach an Arduino Uno-compatible shield to the board. The Arduino Uno pinout enables fast prototyping and exposes the FPGA I/O pins in a user-friendly manner.
Here are the board specs:
You can see the AXIOM board for the first time during next week’s Embedded World 2017 at the SECO UDOO booth, the SECO booth, and the EVIDENCE booth.
Please contact the AXIOM Project for more information.
By Adam Taylor
Without a doubt, some of the most popular MicroZed Chronicles blogs I have written about the Zynq-7000 SoC explain how to use the Zynq SoC’s XADC. In this blog, we are going to look at how we can use the Zynq UltraScale+ MPSoC’s Sysmon, which replaces the XADC within the MPSoC.
The MPSoC contains not one but two Sysmon blocks. One is located within the MPSoC’s PS (processing system) and another within the MPSoC’s PL (programmable logic). The capabilities of the PL and PS Sysmon blocks are slightly different. While the processors in the MPSoC’s PS can access both Sysmon blocks through the MPSoC’s memory space, the different Sysmon blocks have different sampling rates and external interfacing abilities. (Note: the PL must be powered up before the PL Sysmon can be accessed by the MPSoC’s PS. As such, we should check the PL Sysmon control register to ensure that it is available before we perform any operations that use it.)
The PS Sysmon samples its inputs at 1Msamples/sec while the PL Sysmon has a reduced sampling rate of 200Ksamples/sec. However, the PS Sysmon does not have the ability to sample external signals. Instead, it monitors the Zynq MPSoC’s internal supply voltages and die temperature. The PL Sysmon can sample external signals and it is very similar to the Zynq SoC’s XADC, having both a dedicated VP/VN differential input pair and the ability to interface to as many as sixteen auxiliary differential inputs. It can also monitor on-chip voltage supplies and temperature.
Sysmon Architecture within the Zynq UltraScale+ MPSoC
Just as with the Zynq SoC’s XADC, we can set upper and lower alarm limits for ADC channels within both the PL and PS Sysmon in the Zynq UltraScale+ MPSoC. You can use these limits to generate an interrupt should the configured bound be exceeded. We will look at exactly how we can do this in another blog once we understand the basics.
The two diagrams below show the differences between the PS and PL Sysmon blocks in the Zynq UltraScale+ MPSoC:
Zynq UltraScale+ MPSoC’s PS System Monitor (UG580)
Zynq UltraScale+ MPSoC’s PL Sysmon (UG580)
Interestingly, the SYSMONE4 block in the MPSoC’s PL provides direct register access to the ADC data. This is useful when using either the VP/VN or Aux VP/VN inputs to interface with sensors that do not require high sample rates. This arrangement permits downstream signal processing, filtering, and transfer functions to be implemented in logic.
Both MPSoC Sysmon blocks require 26 ADC clock cycles to perform a conversion. Therefore, to sample at 200Ksamples/sec using the PL Sysmon, we require a 5.2MHz ADC clock. For the PS Sysmon to sample at 1Msamples/sec, we need to provide a 26MHz ADC clock.
We set the AMS modules’ clock within the MPSoC Clock Configuration dialog, as shown below:
Zynq UltraScale+ MPSoC’s AMS clock configuration
The eagle-eyed will notice that I have set the clock to 52MHz and not 26MHz. This is because the PS Sysmon’s clock divisor has a minimum value of 2, so setting the clock to 52MHz results in the desired 26MHz ADC clock. The minimum divisor for the PL Sysmon is 8, although in this case the clock needs to be divided by 10 to get the desired 5.2MHz. You also need to pay careful attention to the actual frequency achieved, not just the requested frequency, because you may not always get exactly the frequency you ask for—as is the case here—and any difference directly affects the sample rate.
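The clock arithmetic above can be captured in a couple of helper functions; the figures follow directly from the 26-cycles-per-conversion requirement:

```python
# Sysmon clock arithmetic: one conversion takes 26 ADC clock cycles,
# and the ADC clock is the AMS reference clock divided by an integer
# divisor (minimum 2 for the PS Sysmon, 8 for the PL Sysmon).

CYCLES_PER_CONVERSION = 26

def adc_clock_hz(sample_rate):
    """ADC clock needed for a given sample rate (samples/sec)."""
    return sample_rate * CYCLES_PER_CONVERSION

def sample_rate_sps(ams_clock_hz, divisor):
    """Sample rate achieved from the AMS clock and chosen divisor."""
    return ams_clock_hz / divisor / CYCLES_PER_CONVERSION

# PS Sysmon: 52 MHz / 2 = 26 MHz ADC clock -> 1 Msample/s
print(sample_rate_sps(52e6, 2))    # 1000000.0
# PL Sysmon: 52 MHz / 10 = 5.2 MHz ADC clock -> 200 Ksamples/s
print(sample_rate_sps(52e6, 10))   # 200000.0
```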
Next time in the UltraZed Edition of the MicroZed Chronicles, we will look at the software required to communicate with both the PS and PL Sysmon in the Zynq UltraScale+ MPSoC.
Code is available on GitHub as always.
RFEL has supplied the UK’s Defence Science and Technology Laboratory (DSTL), an executive agency sponsored by the UK’s Ministry of Defence, with two of its Zynq-based HALO Rapid Prototype Development Systems (RPDS). DSTL will evaluate video-processing algorithms using IP from RFEL and 3rd parties in real-time, interactive video trials for military users. The HALO RPDS dramatically speeds assessment of complex video-processing solutions and provides real-time prototypes, whereas conventional software-based simulations do not provide real-time performance.
HALO Rapid Prototype Development Systems (RPDS)
HALO is a small, lightweight, real-time video-processing subsystem based on the Xilinx Zynq Z-7020 or Z-7030 SoCs. It’s also relatively low-cost. HALO is designed for fast integration of high-performance vision capabilities for extremely demanding video applications—and military video applications are some of the most demanding because things whiz by pretty quickly and mistakes are very costly. The Zynq SoC’s software and hardware programmability give the HALO RPDS the flexibility to adapt to a wide variety of video-processing applications while providing real-time response.
Here’s a block diagram of the HALO RPDS:
HALO Rapid Prototype Development Systems (RPDS) Block Diagram
As you can see from the light blue boxes at the top of this block diagram, there are already a variety of real-time, video-processing algorithms. RFEL itself offers many such cores for:
All of these video-processing functions operate in real-time because they are hardware implementations instantiated in the Zynq SoC’s PL (programmable logic). In addition, the Zynq SoC’s extensive collection of I/O peripherals and programmable I/O mean that the HALO RPDS can interface with a broad range of image and video sources and displays. (That's why we say that Zynq SoCs are All Programmable.)
DSTL procured two HALO RPDS systems to support very different video processing investigations, for diverse potential applications. One system is being used to evaluate RFEL's suite of High Definition (HD) video-stabilization IP products to create bespoke solutions. The second system is being used to evaluate 3rd-party algorithms and their performance. The flexibility and high performance of the Zynq-based HALO RPDS system means that it is now possible for DSTL to rapidly experiment with many different hardware-based algorithms. Of course, any successful candidate solutions are inherently supported on the HALO platform, so the small, lightweight HALO system provides both a prototyping platform and an implementation platform.
For previous coverage of an earlier version of RFEL’s HALO system, see “Linux + Zynq + Hardware Image Processing = Fused Driver Vision Enhancement (fDVE) for Tank Drivers.”
What could be better than a PCIe SBC (single-board computer) that combines a Xilinx Zynq SoC with an FMC connector? How about the world’s smallest FMC carrier card that also happens to be based on any one of three Xilinx Zynq SoCs (your choice of a single-core Zynq Z-7012S SoC or a dual-core Z-7015 or Z-7030)? That’s the description of the Berten DSP GigaExpress SBC.
The Berten DSP GigaExpress SBC, the world’s smallest FMC carrier card
The GigaExpress SBC incorporates 1Gbyte of DDR3L-1066 SDRAM for the Zynq SoC’s single- or dual-core ARM Cortex-A9 PS (Processing System) and there’s 512Mbytes of dedicated DDR3L SDRAM clocked at 333MHz for exclusive use by the Zynq PL (Programmable Logic). The PS software and PL configuration are stored in a 512Mbit QSPI Flash memory. A 1000BASE-T Ethernet interface is available on a rugged Cat5e RJ45 connector. The block diagram of the GigaExpress SBC appears below. From this diagram and the above photo, you can see that the Zynq SoC along with the various memory devices is all the digital silicon you need to implement a complete, high-performance system.
Berten DSP GigaExpress SBC Block Diagram
Please contact Berten DSP directly for more information about the GigaExpress SBC.
Today, Aldec announced its latest FPGA-based HES prototyping board—the HES-US-440—with a whopping 26M ASIC gate capacity. This board is based on the Xilinx Virtex UltraScale VU440 FPGA and it also incorporates a Xilinx Zynq Z-7100 SoC that acts as the board’s peripheral controller and host interface. The announcement includes a new release of Aldec’s HES-DVM Hardware/Software Validation Platform that enables simulation acceleration and emulation use modes for the HES-US-440 board in addition to the physical prototyping capabilities. You can also use this prototyping board directly to implement HPC (high-performance computing) applications.
Aldec HES-US-440 Prototyping Board, based on a Xilinx Virtex UltraScale VU440 FPGA
The Aldec HES-US-440 board packs a wide selection of external interfaces to ease your prototyping work including four FMC HPC connections, PCIe, USB 3.0 and USB 2.0 OTG, UART/USB bridge, QSFP+, 1Gbps Ethernet, HDMI, SATA; has on-board NAND and SPI Flash memories; and incorporates two microSD slots.
Here’s a block diagram of the HES-US-440 prototyping board:
Aldec HES-US-440 Prototyping Board Block Diagram
For more information about the Aldec HES-US-440 prototyping board and Aldec’s HES-DVM Hardware/Software Validation Platform, please contact Aldec directly.
Here are four online training classes in March that cover various technical design aspects of Xilinx UltraScale and UltraScale+ FPGAs and the Zynq UltraScale+ MPSoC:
03/09/2017 Zynq UltraScale+ MPSoC for the Software Developer
03/15/2017 Serial Transceivers in UltraScale Series FPGAs/MPSoCs – Part I – Transceiver Design Methodology
03/22/2017 Serial Transceivers in UltraScale Series FPGAs/MPSoCs – Part II – Debugging Techniques and PCB Design
03/23/2017 Zynq UltraScale+ MPSoC for the System Architect
These four classes will be taught by three Xilinx Authorized Training Providers: Faster Technology, Xprosys, and Hardent. Click here for registration details.
Berten DSP’s GigaX API for the Xilinx Zynq SoC creates a high-speed, 200Mbps full-duplex communications channel between a GbE port and the Zynq SoC’s PL (programmable logic) through an attached SDRAM buffer and an AXI DMA controller IP block. Here’s a diagram to clear up what’s happening:
The software API implements IP filtering and manages TCP/UDP headers, which help you implement a variety of hardware-accelerated Ethernet systems including Ethernet bridges, programmable network nodes, and network offload appliances. Here’s a performance curve illustrating the kind of throughput you can expect:
Please contact Berten DSP directly for more information about the GigaX API.
Xcell Daily has covered the Samtec FireFly mid-board interconnect system several times but now there’s a new 3.5-minute video demo of a PCIe-specific version of the FireFly optical module. In the video demo, FireFly optical PCIe modules convey PCIe signals between a host PC and a video card over 100m of optical fiber in real time. The video passed over this link works smoothly. That’s quite a feat for a small module like the FireFly and it creates new possibilities for designing distributed systems.
The PCIe-specific version of the Samtec FireFly module handles PCIe sidebands and other PCIe-specific protocols. These modules match up well with the PCIe controllers found in Xilinx UltraScale and UltraScale+ devices and many 7 series FPGAs and Zynq SoCs. As Kevin Burt of Samtec’s Optical Group explains, the mid-board design of the FireFly system allows you to locate the modules adjacent to the driving chips (FPGAs in this case), which improves signal integrity of the PCB design.
Here’s the Samtec video:
For additional coverage of the Samtec FireFly system, see:
By Adam Taylor
Having looked at how we can quickly and easily get the Zynq UltraScale+ MPSoC up and running, I now want to look at the architecture of the system in a little more detail, starting with the processor’s global address map. I am not going to examine the contents of the address map in detail; initially, I want to explore how it is organized so that we understand it. I want to explain how the 32-bit ARM Cortex-R5 processors in the Zynq UltraScale+ MPSoC’s RPU (Real-time Processing Unit) and the 64-bit ARM Cortex-A53 processors in the APU (Application Processing Unit) share their address spaces.
The ARM Cortex-A53 processors use a 40-bit address bus, which can address up to 1Tbyte of memory. Compare this to the 32-bit address bus of the ARM Cortex-R5 processors, which can only address a 4Gbyte address space. The Zynq UltraScale+ MPSoC architects therefore had to consider how these address spaces would work together. The solution they came up with is pretty straightforward.
The memory map of the Zynq UltraScale+ MPSoC is organised so that the PMU (Platform Management Unit), MIO peripherals, DDR controller, the PL (programmable logic), etc. all fall within the first 4Gbytes of addressable space, so both the APU and the RPU can address these resources. The APU additionally has access to the DDR and PCIe controllers and the PL up to the 1Tbyte address limit. The lower 4Gbytes of address space supports 32-bit addressing for some peripherals. One example is the PCIe controller, which supports 32-bit addressing via a 256Mbyte address range in the lower 4Gbytes and as much as 256Gbytes (using 64-bit addressing) in the full address map.
MPSoC Global Address Map
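A toy classifier makes the split concrete. The 4Gbyte and 1Tbyte boundaries come from the description above; anything finer-grained, such as which peripheral window a given address falls in, is deliberately omitted:

```python
# Toy classifier for the address-map split: the lower 4 GB is visible
# to both the APU and the 32-bit RPU; the rest of the 40-bit (1 TB)
# map is APU-only. Finer-grained region boundaries are omitted.

GB = 1 << 30

def reachable_by(addr):
    if addr < 4 * GB:
        return "APU and RPU"              # shared lower 4 GB
    elif addr < 1 << 40:
        return "APU only"                 # upper DDR/PCIe/PL windows
    else:
        raise ValueError("outside the 40-bit address map")

print(reachable_by(0x8000_0000))          # 2 GB: APU and RPU
print(reachable_by(0x10_0000_0000))       # 64 GB: APU only
```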
It goes without saying that only the APU can access the address space above 4Gbytes. However, the more observant amongst us will have noticed that there also appears to be a 36-bit addressable mode. Using a 36-bit address provides faster address translation because the table walker needs only three stages instead of the four required for a 40-bit address. Therefore, 36-bit addressing should be used where possible to optimize system performance.
Address translation is the role of the System Memory Management Unit (SMMU), which has been designed to transform addresses from a virtual address space to a physical address space when using a virtualized environment. The SMMU can provide the following translations if desired:
Virtual Address (VA) - > Intermediate Physical Address (IPA) -> Physical Address (PA)
Within the SMMU, these are defined as stage one (VA to IPA) and stage two (IPA to PA), and depending upon the use case we can perform a stage-one-only, a stage-two-only, or a combined stage-one-and-two translation. To understand more about the SMMU—which is a complex subject—I recommend reading chapters 3 and 10 of the Zynq UltraScale+ MPSoC TRM (UG1085) and the ARM SMMU architecture specification.
SMMU translation schemes
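The two-stage chaining can be illustrated with page tables reduced to dictionaries. The page mappings below are invented for illustration, and a real SMMU walks multi-level translation tables in memory; only the VA to IPA to PA chaining is shown:

```python
# VA -> IPA -> PA chaining with page tables reduced to dictionaries
# (4 KB pages). The mappings are invented for illustration; a real
# SMMU walks multi-level translation tables in memory.

PAGE = 4096

def translate(addr, table):
    """Look up one translation stage; raise on a translation fault."""
    page, offset = divmod(addr, PAGE)
    if page not in table:
        raise KeyError("translation fault at 0x%x" % addr)
    return table[page] * PAGE + offset

stage1 = {0x10: 0x40}    # stage one: guest VA page 0x10 -> IPA page 0x40
stage2 = {0x40: 0x90}    # stage two: IPA page 0x40 -> PA page 0x90

va = 0x10 * PAGE + 0x123
ipa = translate(va, stage1)      # stage-one translation
pa = translate(ipa, stage2)      # stage-two translation
print(hex(pa))                   # prints 0x90123
```

Note that the page offset passes through both stages unchanged; only the page number is remapped, which is exactly how the hypervisor-managed stage-two tables isolate guests without touching their in-page layout.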
Now that we understand a little more about the Zynq UltraScale+ MPSoC’s global memory map, we will look at exactly what is contained within this memory map and how we can configure and use this map with both the APU and the RPU cores over the next few blogs.
Code is available on GitHub as always.
Adam Taylor just published an EETimes review of the Xilinx RFSoC, announced earlier this week. (See “Game-Changing RFSoCs from Xilinx”.) Taylor has a lot of experience with high-speed analog converters: he’s designed systems based on them—so his perspective is that of a system designer who has used these types of devices and knows where the potholes are—and he’s worked for a semiconductor company that made them—so he should know what to look for with a deep, device-level perspective.
Here’s the capsulized summary of his comments in EETimes:
“The ADCs are sampled at 4 Gsps (gigasamples per second), while the DACs are sampled at 6.4 Gsps, all of which provides the ability to work across a very wide frequency range. The main benefit of this, of course, is a much simpler RF front end, which reduces not only PCB footprint and the BOM cost but -- more crucially -- the development time taken to implement a new system.”
“…these devices offer many advantages beyond the simpler RF front end and reduced system power that comes from such a tightly-coupled solution.”
“These devices also bring with them a simpler clocking scheme, both at the device-level and the system-level, ensuring clock distribution while maintaining low phase noise / jitter between the reference clock and the ADCs and DACs, which can be a significant challenge.”
“These RFSoCs will also simplify the PCB layout and stack, removing the need for careful segregation of high-speed digital signals from the very sensitive RF front-end.”
“I, for one, am very excited to learn more about RFSoCs and I cannot wait to get my hands on one.”
For more information about the new Xilinx RFSoC, see “Xilinx announces RFSoC with 4Gsamples/sec ADCs and 6.4Gsamples/sec DACs for 5G, other apps. When we say ‘All Programmable,’ we mean it!” and “The New All Programmable RFSoC—and now the video.”
If you’re still uncertain as to what System View’s Visual System Integrator hardware/software co-development tool for Xilinx FPGAs and Zynq SoCs does, the following 3-minute video should make it crystal clear. Visual System Integrator extends the Xilinx Vivado Design Suite and makes it a system-design tool for a wide variety of embedded systems based on Xilinx devices.
This short video demonstrates System View’s tool being used for a Zynq-controlled robotic arm:
For more information about System View’s Visual System Integrator hardware/software co-development tool, see:
Avnet’s new $499 UltraZed PCIe I/O carrier card for its UltraZed-EG SoM (System on Module)—based on the Xilinx Zynq UltraScale+ MPSoC—gives you easy access to the SoM’s 180 user I/O pins, 26 MIO pins from the Zynq MPSoC’s MIO, and 4 GTR transceivers from the Zynq MPSoC’s PS (Processor System) through the PCIe x1 edge connector; two Digilent PMOD connectors; an FMC LPC connector; USB and microUSB, SATA, DisplayPort, and RJ45 connectors; an LVDS touch-panel interface; a SYSMON header; pushbutton switches; and LEDs.
$499 UltraZed PCIe I/O Carrier Card for the UltraZed-EG SoM
That’s a lot of connectivity to track in your head, so here’s a block diagram of the UltraZed PCIe I/O carrier card:
UltraZed PCIe I/O Carrier Card Block Diagram
For information on the Avnet UltraZed SOM, see “Look! Up in the sky! Is it a board? Is it a kit? It’s… UltraZed! The Zynq UltraScale+ MPSoC Starter Kit from Avnet” and “Avnet UltraZed-EG SOM based on 16nm Zynq UltraScale+ MPSoC: $599.” Also, see Adam Taylor’s MicroZed Chronicles about the UltraZed: