In this 40-minute webinar, Xilinx will present a new approach that allows you to unleash the power of the FPGA fabric in Zynq SoCs and Zynq UltraScale+ MPSoCs using hardware-tuned OpenCV libraries, with a familiar C/C++ development environment and readily available hardware development platforms. OpenCV libraries are widely used for algorithm prototyping by many leading technology companies and computer vision researchers. FPGAs can achieve unparalleled compute efficiency on complex algorithms like dense optical flow and stereo vision in only a few watts of power.
This Webinar is being held on July 12. Register here.
Here’s a fairly new, 4-minute video showing a 1080p60 Dense Optical Flow demo, developed with the Xilinx SDSoC Development Environment in C/C++ using OpenCV libraries:
For related information, see Application Note XAPP1167, “Accelerating OpenCV Applications with Zynq-7000 All Programmable SoC using Vivado HLS Video Libraries.”
By Adam Taylor
In some applications, we wish to maintain the phase relationship between sampled signals. The Zynq SoC’s XADC contains two ADCs, which we can operate simultaneously in lock step to maintain the phase relationship between two sampled signals. To do this, we use the sixteen auxiliary inputs with channels 0-7 assigned to ADC A and channels 8-15 assigned to ADC B. In simultaneous mode, we can therefore perform conversions on channels 0 to 7 and, at the same time, perform conversions on channels 8 to 15.
In simultaneous mode, we can also continue to sample the on-chip parameters; however, they are not sampled simultaneously. We are unable to perform automatic calibration in simultaneous mode, but we can switch to another mode to perform calibration when needed. This should be sufficient because, for most applications, calibration is generally performed only at device power-up.
To use simultaneous mode, we first need a hardware design in Vivado that breaks out the AUX0 and AUX8 channels. On the ZedBoard and MicroZed I/O Carrier Cards, these signals are broken out to the AMS connector. This allows me to connect signal sources to the AMS header to stimulate the I/O pins. For this example, I am using a Digilent Analog Discovery module as the signal source.
The hardware design within the Zynq for this example appears below:
Writing the software in SDK for simultaneous mode is very similar to the other modes of operation we have used in the past. The only major difference is that we need to make sure we have configured the simultaneous channels in the sequencer. Once this is done and we have configured the input format we want—bipolar or unipolar, averaging, etc.—we can start the sequencer using the XSM_SEQ_MODE_SIMUL mode definition.
When I ran this on the MicroZed set up as shown above and stored 64 samples from both the AUX0 and AUX8 inputs using input signals that were offset by 180 degrees, I was able to recover the following waveform, which shows that the phase relationship is maintained:
If we want, we can also use simultaneous-conversion mode with an external analog multiplexer. All we need to do is configure the design to use the external mux as we did previously. The difference this time is that we need two external analog multiplexers because we must be able to select the two channels to convert simultaneously. Also, we need only three address bits to cover the 0-7 address range, as opposed to the four address bits we needed to address all sixteen analog inputs when we previously used sequencer mode. We use the lower three of the four available address bits.
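The addressing scheme can be sketched in a few lines of C: one 3-bit mux address selects a channel pair, with address N picking channel N on ADC A and channel N+8 on ADC B. This is purely an illustration of the scheme described above; the struct and function names are mine, not part of any Xilinx API.

```c
/* Illustrative model of simultaneous-mode external-mux addressing:
 * one 3-bit address selects the pair (N, N+8). */
typedef struct {
    int adc_a_channel;   /* channel converted by ADC A */
    int adc_b_channel;   /* channel converted by ADC B at the same time */
} mux_pair;

mux_pair select_pair(unsigned addr)
{
    mux_pair p;
    addr &= 0x7u;                 /* only the lower three address bits are used */
    p.adc_a_channel = (int)addr;       /* channels 0-7 on ADC A */
    p.adc_b_channel = (int)addr + 8;   /* channels 8-15 on ADC B */
    return p;
}
```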
At this point, the only XADC mode that we have not looked at is independent mode. This mode is like the XADC’s default (safe) mode; however, in independent mode ADC A monitors the internal on-chip parameters while ADC B samples the external inputs. Independent mode is intended to implement a monitoring mode. As such, the alarms are active, so you can use this mode to implement security and anti-tamper features in your design.
Code is available on GitHub as always.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
Plethora IIoT develops cutting‑edge solutions to Industry 4.0 challenges using machine learning, machine vision, and sensor fusion. In the video below, a Plethora IIoT Oberon system monitors power consumption, temperature, and the angular speed of three positioning servomotors in real time on a large ETXE-TAR Machining Center for predictive maintenance—to spot anomalies with the machine tool and to schedule maintenance before these anomalies become full-blown faults that shut down the production line. (It’s really expensive when that happens.) The ETXE-TAR Machining Center is center-boring engine crankshafts. This bore is the critical link between a car’s engine and the rest of the drive train including the transmission.
Plethora uses Xilinx Zynq SoCs and Zynq UltraScale+ MPSoCs as the heart of its Oberon system because these devices’ unique combination of software-programmable processors, hardware-programmable FPGA fabric, and programmable I/O allow the company to develop real-time systems that implement sensor fusion, machine vision, and machine learning in one device.
Initially, Plethora IIoT’s engineers used the Xilinx Vivado Design Suite to develop their Zynq-based designs. Then they discovered Vivado HLS, which allows you to take algorithms in C, C++, or SystemC directly to the FPGA fabric using hardware compilation. The engineers’ first reaction to Vivado HLS: “Is this real or what?” They discovered that it was real. Then they tried the SDSoC Development Environment with its system-level profiling, automated software acceleration using programmable logic, automated system connectivity generation, and libraries to speed programming. As they say in the video, “You just have to program it and there you go.”
Here’s the video:
Plethora IIoT is showcasing its Oberon system in the Industrial Internet Consortium (IIC) Pavilion during the Hannover Messe Show being held this week. Several other demos in the IIC Pavilion are also based on Zynq All Programmable devices.
I’ve written about the Zynq-based Red Pitaya several times in Xcell Daily. (See below.) Red Pitaya is a single-board, open instrumentation platform based on the Xilinx Zynq SoC, which combines a dual-core ARM Cortex-A9 MPCore processor with a heavy-duty set of peripherals and a chunk of Xilinx 7 series programmable logic. Red Pitaya packages its programmable instrumentation board with probes, power supply, and an enclosure and calls it the STEMlab. I’ve just discovered the STEMlab page on the Digi-Key site with inventory levels, so if you want to get into programmable instrumentation in a hurry, this is a good place to start.
The page lists three STEMlab starter kits:
Red Pitaya 27901 STEMlab kit with scope and logic probes
For more articles about the Zynq-based Red Pitaya, see:
By Adam Taylor
Over the length of this series, we have looked at several different development boards. One thing that is common to many of these boards: they provide one or more Pmod (Peripheral module) connections that allow us to connect small peripherals to our boards. Pmods let us expand our prototype designs into final systems. We have not looked at Pmods in much detail, but they are an important aspect of many developments. As such, it would be remiss of me not to address them.
The Pmod standard itself was developed by Digilent and is an open-source de facto standard intended to ensure wide adoption of this very useful interface. There’s a wide range of available Pmods, from D/A and A/D converters to GPS receivers and OLED displays.
Over the years, we have looked at several Zynq-based boards with at least one Pmod port. In some cases, these boards provide Pmod ports that are connected to either the Zynq SoC’s PL (programmable logic), the PS (processing system), or both. If a PS connection is used, we can use the Zynq SoC’s MIO to provide the interface. If the Pmod connection is to the PL, then we need to create our own interface to the Pmod device. Regardless of whether we use the PL or the PS, we will need a software driver to interface with it.
Various Zynq-based dev boards and their Pmod connections
You might initially think that we need to develop our own Pmod drivers from scratch, which would of course increase the time it takes to develop the application. For many Pmods, this is not the case: there is a wide range of existing drivers, for both the PL and the PS, that we can use within our designs.
The first thing we need to do is download the Digilent Vivado library. This library contains several Pmod drivers, DVI sinks and sources, and other very useful IP blocks that can accelerate our design.
Once you have downloaded this library, examine the file structure. You will notice multiple folders under the Pmods folder. Each of these folders is named for an available Pmod (e.g., Pmod_AD2, which is an ADC). Within each of these drivers, you will see file structures as shown below:
Within this structure, the folders contain:
The next step, if we wish to use these IP modules, is to include the directory as a repository within our Vivado design. We do this by selecting the project settings within our project. We can add a new repository pointing to the Digilent Vivado library we have just downloaded using the IP settings repository manager tab:
Once this is done, we should be able to see the Pmod IP cores within the Vivado IP Catalog. We can then use these IP cores within our design in the same way we use all other IP.
Once we have created our block diagram in Vivado, we can customize the Pmod IP blocks and select the Pmod Port they are connected to—assuming the board definition for the development board we are using supports that.
In the case below, which targets the new Digilent ARTY Z7 board, the AD2 Pmod is being connected to Pmod Port B:
If we are unable to find a driver for the Pmod we want to use, we can use the Pmod Bridge driver, which will enable us to create an interface to the desired Pmod with the correct pinout.
When it comes to software, all we need to do is import the files from the drivers/<Pmod_name>/src directory into our SDK project. Adding these files provides a range of drivers that we can use to interface with the Pmod PL instantiation and talk to the connected Pmod. If example code is available, we will find it under the drivers/<Pmod_name>/examples directory. When I ran the example code for the PmodAD2, it worked as expected:
This enables us to get our designs up and running even faster.
My code is available on GitHub as always.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
Like any semiconductor device, Xilinx All Programmable devices have power-supply requirements—and like any FPGA or SoC, Xilinx devices have their fair share of requirements in the power-supply department. They require several supply voltages, more or fewer depending on your I/O requirements, and these voltages must ramp up and down in a specific sequence and at specific ramp rates if the devices are to operate properly. On top of that, power-supply designs are board-specific—different for every unique PCB. Dealing with all of these supply specs is a challenging engineering problem, simply due to the number of requirements, so you might like some help tackling it.
Here’s some help.
Infineon demonstrated a reference power-supply design for Xilinx Zynq UltraScale+ MPSoCs based on its IRPS5401 Flexible Power Management Unit at APEC (the Applied Power Electronics Conference) last month. The reference design employs two IRPS5401 devices to manage and supply ten different power rails. Here’s a block diagram of the reference design:
This design is used on the Avnet UltraZed SOM, so you know that it’s already proven. (For more information about the Avnet UltraZed SOM, see “Avnet UltraZed-EG SOM based on 16nm Zynq UltraScale+ MPSoC: $599” and “Look! Up in the sky! Is it a board? Is it a kit? It’s… UltraZed! The Zynq UltraScale+ MPSoC Starter Kit from Avnet.”)
Now the UltraZed SOM measures only 2x3.5 inches (50.8x88.9mm) and the power supply consumes only a small fraction of the space on the SOM, so you know that the Infineon power-supply design must be compact.
It needs to be.
Here’s a photo of the UltraZed SOM with the power supply section outlined in yellow:
Infineon Power Supply Design on the Avnet UltraZed SOM (outlined in yellow)
Even though this power-supply design is clearly quite compact, the high level of integration inside Infineon’s IRPS5401 Flexible Power Management Unit means that you don’t need additional components to handle power-supply sequencing or ramp rates. The IRPS5401s handle that for you.
However, every Zynq UltraScale+ MPSoC PCB is different because every PCB presents different loads, capacitances, and inductances to the power supply. So you will need to tailor the sequencing and ramp times for each board design. Sounds like a major pain, right?
Well, Infineon felt your pain and offers an antidote: an Infineon software app called PowIRCenter, designed to reduce the time needed to develop the complex supply-voltage sequencing and ramp times to perhaps 15 minutes’ worth of work—which is apparently how long it took an Avnet design engineer to set the timings for the UltraZed SOM.
Here’s a 4-minute video in which Infineon Senior Product Marketing Manager Tony Ochoa walks you through the highlights of this power-supply design and the company’s PowIRCenter software:
Just remember, the Infineon IRPS5401 Flexible Power Management Unit isn’t dedicated to the Zynq UltraScale+ MPSoC. You can use it to design power supplies for the full Xilinx device range.
Note: For more information about the IRPS5401 Flexible Power Management Unit, please contact Infineon directly.
Mentor Embedded is now supporting the Android OS (plus Linux and Nucleus) on Zynq UltraScale+ MPSoCs. You can learn more in a free Webinar titled “Android in Safety Critical Designs” that’s being held on May 3 and 4. The Webinar will discuss how to use Android in safety-critical designs on the Xilinx Zynq UltraScale+ MPSoC. Register for the Webinars here.
I got a heads-up on a new, low-end dev board called the “MiniZed” coming soon from Avnet and found a pre-announcement Web page for the board. Avnet’s MiniZed is based on one of the new Zynq-7000S family members with one ARM Cortex-A9 processor. It will include both WiFi and Bluetooth RF transceivers and, according to the MiniZed Web page, will cost less than $100!
Here’s the link to the MiniZed Web page and here’s a slightly fuzzy MiniZed board photo:
Avnet MiniZed (coming soon, for less than $100)
If I’m not mistaken, that’s an Arduino header footprint and two Digilent Pmod headers on the board, which means that a lot of pretty cool shields and Pmods are already available for this board (minus the software drivers, at least for the Arduino shields).
I know you’ll want more information about the MiniZed board but I simply don’t have it. So please contact Avnet for more information or register for the info on the MiniZed Web page.
By Adam Taylor
Having introduced the Real-Time Clock (RTC) in the Xilinx Zynq UltraScale+ MPSoC, the next step is to write some simple software to set the time, get the time, and calibrate the RTC. Doing this is straightforward and aligns with how we use other peripherals in the Zynq MPSoC and Zynq-7000 SoC.
Like all Zynq peripherals, the first thing we need to do with the RTC is look up the configuration and then use it to initialize the peripheral device. Once we have the RTC initialized, we can configure and use it. We can use the functions provided in the xrtcpsu.h header file to initialize and use the RTC. All we need to do is correctly set up the driver instance and include the xrtcpsu.h header file. If you want to examine the file’s contents, you will find them within the generated BSP for the MPSoC. Under this directory, you will also find all the other header files needed for your design. Which files are available depends upon how you configured the MPSoC in Vivado (e.g. what peripherals are present in the design).
We need to use a driver instance to use the RTC within our software application. For the RTC, that’s XRtcPsu, which defines the essential information such as the device configuration, oscillator frequency, and calibration values. This instance is used in all interactions with the RTC using the functions in the xrtcpsu.h header file.
As I explained last week, the RTC counts seconds, so we will need to convert to and from values in units of seconds. The xrtcpsu.h header file contains several functions to support these conversions. To support them, we use a C structure that holds the real date prior to conversion and loading into the RTC, or that receives the resulting date following conversion from the seconds counter.
We can use the following functions to set or read the RTC (which I did in the code example available here):
By convention, the functions used to set the RTC seconds counter are based on a time epoch of 1/1/2000. If we are going to use internet time, which by a completely different convention is often based on a 1/1/1970 epoch, we will need to convert from one format to the other. The functions provided for the RTC only support years between 2000 and 2099.
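The conversion between the two epochs is a fixed offset. Here is a minimal C sketch; the offset value follows from counting the days between 1/1/1970 and 1/1/2000, and the function names are illustrative, not part of the Xilinx xrtcpsu driver API.

```c
#include <stdint.h>

/* Seconds from the Unix epoch (1/1/1970) to the RTC's 1/1/2000 epoch:
 * 30 years, 7 of them leap years (1972, 1976, ..., 1996),
 * i.e. (30*365 + 7) days * 86400 seconds/day = 946684800. */
#define RTC_EPOCH_OFFSET 946684800U

/* Convert a Unix timestamp (e.g. from internet time) to the RTC's
 * seconds counter. Returns 0 for dates before 2000, which the RTC
 * cannot represent. */
uint32_t unix_to_rtc(uint32_t unix_secs)
{
    if (unix_secs < RTC_EPOCH_OFFSET)
        return 0;
    return unix_secs - RTC_EPOCH_OFFSET;
}

/* Convert the RTC's seconds counter back to a Unix timestamp. */
uint32_t rtc_to_unix(uint32_t rtc_secs)
{
    return rtc_secs + RTC_EPOCH_OFFSET;
}
```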
In the example code, we use these functions to report the last set time before allowing the user to enter the time using a UART. Once the time has been set, the RTC is calibrated before being re-initialized. The RTC is then read once a second and the values are output over the UART, giving the image shown at the top of this blog. This output continues until the MPSoC is powered down.
To really exploit the capabilities provided by the RTC, we need to enable the interrupts. I will look at RTC interrupts in the Zynq MPSoC in the next issue of the MicroZed Chronicles, UltraZed Edition. Once we understand how interrupts work, we can look at the RTC alarms. I will also fit a battery to the UltraZed board to test its operation on battery power.
The register map with the RTC register details can be found here.
My code is available on GitHub as always.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
By Adam Taylor
So far, we have examined the FPGA hardware build for the Aldec TySOM-2 FPGA Prototyping board example in Vivado, which is a straightforward example of a simple image-processing chain. This hardware design allows an image to be received, stored in DDR SDRAM attached to the Zynq SoC’s PS, and then output to an HDMI display. What the hardware design at the Vivado level does not do is perform any face-detection functions. And to be honest, why would it?
With the input and output paths of the image-processing pipeline defined, we can use the untapped resources of the Zynq SoC’s PL and PS/PL interconnects to create the application at a higher level. We need to use SDSoC to do this, which allows us to develop our design using a higher-level language like C or C++ and then move the defined functionality from the PS into the PL—to accelerate that function.
The Vivado design we examined last week forms an SDSoC Platform, which we can use with the Linux operating system to implement the final design. The use of Linux allows us to use OpenCV within the Zynq SoC’s PS cores to support creation of the example design. If we develop with the new Xilinx reVISION stack, we can go even further and accelerate some of the OpenCV functions.
The face-detection example supplied with the TySOM-2 board implements face detection using the Pixel Intensity Comparison-based Object (PICO) detection framework developed by N. Markus et al. The PICO framework scans the image with a cascade of binary classifiers. This PICO-based approach permits more efficient implementations that do not require the computation of integral images, HOG pyramids, etc.
In this example, we need to define a frame buffer within the device tree blob to allow the Linux application to access the images stored within the Zynq SoC’s PS DDR SDRAM. The Linux application then uses “Video for Linux 2” (V4L2) to access this frame buffer and to allow further processing.
Once we get an image frame from the frame buffer, the software application can process it. The application will do the following things:
Looking at the above functions, not all of them can be accelerated into hardware. In this example, the conversion from YUV to greyscale, Sobel edge detection, and YUV-to-RGB conversion can be accelerated using the PL to increase performance.
Moving these functions into the PL is as easy as selecting the functions we wish to accelerate in hardware and then clicking on build to create the example.
Once this was completed, the algorithm ran as expected using both the PS and PL in the Zynq SoC.
Using this approach allows us to exploit both the Zynq SoC’s PL and PS for image processing without the need to implement a fixed RTL design in Vivado. In short, this ability allows us to use a known good platform design to implement image capture and display across several different applications. Meanwhile, the use of SDSoC also allows us to exploit the Zynq SoC’s PL at a higher level without the need to develop the HDL from scratch, reducing development time.
My code is available on GitHub as always.
Later this month at the NAB Show in Las Vegas, you’ll be able to see several cutting-edge video demos based on the Xilinx Zynq SoC and Zynq UltraScale+ MPSoC in the Omnitek booth (C7915). First up is an HEVC video encoder demo using the embedded, hardened video codec built into the Zynq UltraScale+ ZU7EV MPSoC on a ZCU106 eval board. (For more information about the ZCU106 board, see “New Video: Zynq UltraScale+ EV MPSoCs encode and decode 4K60p H.264 and H.265 video in real time.”)
Next up is a demo of Omnitek’s HDMI 2.0 IP core, announced earlier this year. This core consists of separate transmit and receive subsystems. The HDMI 2.0 Rx subsystem can convert an HDMI video stream (up to 4KP60) into an RGB/YUV video AXI4-Stream and places AUX data in an auxiliary AXI4-Stream. The HDMI 2.0 Tx subsystem converts an RGB/YUV video AXI4-Stream plus AUX data into an HDMI video stream. This IP features a reduced resource count (small footprint in the programmable logic) and low latency.
Finally, Omnitek will be demonstrating a new addition to its OSVP Video Processor Suite: a real-time Image Signal Processing (ISP) Pipeline Subsystem, which can create an RGB video stream from raw image-sensor outputs. The ISP pipeline includes blocks that perform image cropping, defective-pixel correction, black-level compensation, vignette correction, automatic white balancing, and Bayer filter demosaicing.
Omnitek’s Image Signal Processing (ISP) Pipeline Subsystem
Both the HDMI 2.0 and ISP Pipeline Subsystem IP are already proven on Xilinx All Programmable devices including all 7 series devices (Artix-7, Kintex-7, and Virtex-7), Kintex UltraScale and Virtex UltraScale devices, Kintex UltraScale+ and Virtex UltraScale+ devices, and Zynq-7000 SoCs and Zynq UltraScale+ MPSoCs.
By Adam Taylor
When we look at the peripherals in the Zynq UltraScale+ MPSoC’s PS (processor system), we see several which, while not identical to those in the Zynq-7000 SoC, perform a similar function (e.g., the Sysmon and the I2C controller). As would be expected, however, there are also peripherals that are brand new in the MPSoC. One of these is the Real-Time Clock (RTC), which will be the subject of my next few blogs.
The Zynq UltraScale+ MPSoC’s RTC is an interesting starting point in this examination of the on-chip PS peripherals as it can be powered from its own supply PSBATT to ensure that the RTC functions when the rest of the system is powered down. If we want that feature to work in our system design, then we need to include a battery that will provide this power over the equipment’s operating life.
The Zynq UltraScale+ MPSoC’s RTC needs an external battery to operate when the system is powered down
As shown in the figure above, the Zynq UltraScale+ MPSoC’s RTC is split into the RTC Core (dark gray rectangle) and the RTC Controller (medium gray “L”). The RTC Core resides within the battery power domain. The core contains all the counters needed to implement the timer functions and includes a tick counter driven directly by the external crystal oscillator (see clocking blog).
At the simplest level, the tick counter determines when a second has elapsed and increments a seconds counter. The operating system uses this seconds counter to determine the date from a specific reference point known to the operating system. The seconds counter is 32 bits wide, so it can count for about 136 years. If necessary, we can set the seconds counter to a known value once the low-power domain is operational.
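The 136-year figure is just arithmetic on the counter width, as this small C check shows:

```c
/* A 32-bit seconds counter wraps after 2^32 seconds.
 * Using the Julian year (365.25 days) as the conversion factor:
 * 2^32 / (365.25 * 86400) is roughly 136.1 years. */
double counter_span_years(void)
{
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;
    return 4294967296.0 / seconds_per_year;   /* 2^32 seconds */
}
```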
To ensure timing accuracy, the RTC provides a calibration register that can correct, every 16 seconds, timing errors due to static inaccuracies caused by the crystal’s frequency tolerance. At some point, your application code can determine the RTC’s timing inaccuracy against an external timing reference (a GPS-derived time, for example) and then use the computed inaccuracy to discipline the RTC by setting the calibration register.
The Zynq UltraScale+ MPSoC’s RTC incorporates a calibration register for clock-crystal compensation
The RTC can generate an interrupt once every second when it’s fully powered. (There’s no need for clock interrupts when the RTC is running on battery power because there’s no operational processor to interrupt.) The Zynq UltraScale+ MPSoC’s ARM processor that’s controlling the RTC should have this interrupt enabled so that it can correctly manage it.
During board testing and commissioning, we can use an RTC register bit to clock the counters in place of the external crystal oscillator. This is useful if we want to verify that alarms occur at set values without waiting the long time they would normally take to occur with real oscillator ticks. The alternative is to use different alarm values during testing, which requires a different build of the application software and so is not representative of the actual code.
When it comes to selecting an external crystal for the RTC, we should select either a 32768Hz or a 65536Hz crystal. If the part selected has a 20 PPM tolerance, the RTC’s calibration feature allows us to achieve better than 2 PPM if we use the 32768Hz crystal or 1 PPM if we use the 65536Hz crystal. (We get more calibration resolution with the faster crystal.)
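One plausible way to read those figures: if calibration can add or drop a single oscillator tick over the 16-second correction window, the achievable resolution is one tick out of 16 seconds’ worth of ticks. This C sketch of that assumption reproduces the numbers above (this is my back-of-the-envelope model, not the TRM’s exact calibration mechanism):

```c
/* Resolution in PPM assuming the correction granularity is one
 * oscillator tick per 16-second calibration window. */
double calibration_resolution_ppm(double crystal_hz)
{
    const double window_s = 16.0;                 /* calibration period */
    double ticks_per_window = crystal_hz * window_s;
    return 1e6 / ticks_per_window;                /* one tick, in PPM */
}
```

With a 32768Hz crystal this gives about 1.9 PPM, and with a 65536Hz crystal about 0.95 PPM, consistent with the "better than 2 PPM / 1 PPM" figures.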
We need to use the RTC Controller to access and manage the RTC Core. The controller provides the ability to control and interact with the RTC Core once the low-power domain is powered up. We also configure the interrupts and alarms within the RTC Controller. We can set an alarm to occur at any point within the 136-year range of the seconds counter.
I should also note that battery power is only required when the PS main supplies are not powered. If the main supplies are powered, then the battery does not power the RTC Core. We can use the ratio of the time the system is powered up to the time it spends powered down to correctly size the battery.
In the next blog, we will look at the software we need to write to configure, control, and calibrate the RTC.
My code is available on GitHub as always.
As a follow-on to last month’s announcement that RFEL had supplied the UK’s Defence Science and Technology Laboratory (DSTL) with two of its Zynq-based HALO Rapid Prototype Development Systems (RPDS), RFEL has now announced that DSTL has contracted with a three-company team to develop an adaptive, real-time, FPGA-based vision platform “to solve complex defence vision and surveillance problems, facilitating the rapid incorporation of best-in-class video processing algorithms while simultaneously bridging the gap between research prototypes and deployable equipment.” The three-company team includes RFEL, 4Sight Imaging, and team leader Plextek.
The press release explains, “This innovative work draws together the best aspects of two approaches to video processing: high performance, bespoke FPGA processing supporting the computationally intensive tasks, and the flexibility (but lower performance) of CPU-based processing. This heterogeneous, hybrid approach is possible by using contemporary system-on-chip (SoC) devices, such as Xilinx’s Zynq devices, that provide embedded ARM CPUs with closely coupled FPGA fabric. The use of a modular FPGA design, with generic interfaces for each module, enables FPGA functions, which are traditionally inflexible, to be dynamically re-configured under software control.”
HALO Rapid Prototype Development Systems (RPDS)
By Adam Taylor
Having introduced the Aldec TySOM-2 FPGA Prototyping Board, based on the Xilinx Zynq SoC, and the face-detection application running on it, I thought it would be a good idea to take a more detailed look at the face-detection application’s architecture.
The face-detection example uses one Blue Eagle camera, which is connected to the Aldec FMC-ADAS card. The processed frames showing the detected face are output via the TySOM-2 board’s HDMI port. What is worth pointing out is that the application running on the TySOM-2 board—face detection in this case—is enabled by the software. The Zynq PL (programmable logic) hardware design provides the capability to interface with the camera, to share video frames with the Zynq PS (processing system) through the DDR SDRAM, and to drive the display output.
Any application could be implemented—not just face detection. It could be object tracking. It could be corner detection. It could be anything. This is one of the things that makes development of image-processing systems on the Zynq so powerful. We can use the same base platform on the TySOM-2 board and customize the application in software. Of course, we can also use the Xilinx SDSoC development environment to accelerate parts of the algorithm into the TySOM-2 platform’s remaining resources to increase performance.
The Blue Eagle camera transmits the video stream over an FPD-Link III link. These links use a high-speed, bi-directional CML (Current Mode Logic) connection to transfer the image data. An FPD-Link III receiving device (a TI DS90UB914Q-Q1 FPD-Link III SER/DES) on the ADAS FMC implements the camera interface. This device is configured for the application at hand using an I2C peripheral in the Zynq SoC’s PS, and it provides video to the Zynq PL in a parallel format: the parallel data bits, HSync, VSync, and a pixel clock.
We need to process the frames and store them within the Zynq PS’ DDR SDRAM using Video DMA (Direct Memory Access) to ensure that we can access the image frames within DDR memory using the Zynq SoC’s ARM Cortex-A9 processor. We need to use several IP blocks that come as standard IP within Vivado to implement this. These IP blocks transfer data using the AXI streaming protocol--AXIS.
Therefore, the first thing needed is to convert the received video in parallel format into an AXIS stream. Once the video is in the correct format, we can use the VDMA IP block to transfer video data to and from the Zynq PS’ DDR SDRAM, where the software running on the Zynq SoC’s ARM Cortex-A9 processors can access the frames and implement the application algorithms.
Unlike previous examples we have examined, which used a single AXI High Performance (AXI HP) port, this example uses two of the Zynq SoC’s AXI HP interface ports, one in each direction. This configuration requires a slightly more complicated DMA architecture because we need two VDMA IP blocks. Within the Zynq PL, the AXI standard used for most IP blocks is AXI 4.0, while the ports on the Zynq SoC implement AXI 3.0. Therefore, we need to use an AXI Interconnect or a protocol convertor to convert between the two standards.
This use of two interfaces makes no performance difference compared to a single AXI HP interface because the S0 and S1 AXI HP ports used in this configuration are multiplexed down to the M0 port on the memory interconnect and finally connected to the S3 port on the DDR SDRAM controller. This is shown below in the interconnection diagram from UG585, the TRM for the Zynq SoC.
Once the VDMA is implemented, the design then performs color-space conversion and chroma resampling, and finally passes the stream to an on-screen display module. Once this has been completed, the video stream must be converted from AXIS back to parallel video, which can then be output to the HDMI transmitter.
With this hardware platform completed, the next step is to write the software to create the application. For this we have the choice of using SDK or using SDSoC, which adds the ability to accelerate some of the application algorithm functions using programmable logic. As this example is implemented on the Zynq Z-7100 SoC, which has a significant amount of free, on-chip programmable resources following the implementation of the base platform, we’ll be using SDSoC for this example. We will look at the software architecture next time.
My code is available on Github as always.
Xcell Daily has covered the Xilinx version of QEMU for the Zynq UltraScale+ MPSoC, the Zynq-7000 SoC, and any design that uses the Xilinx MicroBlaze 32-bit soft-core RISC processor, but now there’s a short video that gives you a good introduction to the tool in just over 10 minutes.
If you haven’t heard about QEMU, it’s a fast, software-based processor emulator and virtual emulation platform developed by the open-source community. Xilinx adopted QEMU several years ago for internal product development and, as a consequence, has developed and extended QEMU to cover the ARM Cortex-A53, -A9, and -R5 processor cores used in the Zynq UltraScale+ MPSoCs and Zynq-7000 SoCs and for the Xilinx MicroBlaze processor core as well. At the same time, Xilinx has fed these enhancements back to the open-source community. (More info here.)
You reap the benefit of this development in the rapid introduction of new Zynq devices and in your ability to download the Xilinx version of QEMU and use it for developing your own designs. The availability of QEMU for the processors used in Zynq devices and the MicroBlaze processor allows you to develop code for your design long before the application-specific hardware IP is ready. That means you can short-cut big chunks of your development cycle when the software team might otherwise be idled, waiting for a development platform. With QEMU, the development platform is as simple as a laptop PC.
That said, here’s the short video:
For additional Xcell Daily coverage of QEMU, see:
If you have a mere 14 minutes to spare, you can watch this new video that will show you how to set up the Zynq UltraScale+ MPSoC’s hardened, embedded PCIe block as a Root Port using the Vivado Design Suite. The target system is the ZCU102 eval kit (currently on sale for half price) and the video shows you how to use the PetaLinux tools to connect to a PCIe-connected NVMe SSD.
This is a fast, painless way to see a complete set of Xilinx development tools being used to create a fully operational system based on the Zynq UltraScale+ MPSoC in less than a quarter of an hour.
By Adam Taylor
So far on this journey, most of the boards we have looked at have been fitted with either the Zynq Z-7010 or Z-7020 SoC. The new Aldec TySOM 2 board comes with either a Zynq Z-7045 or Z-7100 device fitted, making it the most powerful Zynq-based board we have looked at to date, especially with the Z-7100 SoC fitted, as on the example Aldec has provided to me.
The TySOM 2 board is intended for development prototyping. As such, it provides you with a range of I/O pins, broken out on two FMC connectors that connect to 288 of the Zynq Z-7100 SoC’s 362 I/O pins and all 16 GTX lanes. It also provides some simple user peripherals, including switches and LEDs, along with an HDMI port connected to the Zynq SoC’s PL (programmable logic). Meanwhile, the Zynq PS (processing system) provides four USB 2.0 ports, Ethernet, and a USB/UART for connectivity, plus 1Gbyte of DDR memory. In short, the Aldec TySOM 2 board has everything we need to create a very powerful single-board computer.
Here’s a block diagram of the TySOM 2 board:
Aldec TySOM 2 board block diagram
Of course, there’s a range of FPGA Mezzanine Cards (FMCs) available from Aldec and other vendors to enable prototyping across a wide range of applications including vision, IIoT, and ADAS. Aldec supplied my board with the ADAS daughter board, which enables the connection of up to five cameras using FPD-Link III connections. Because FMC is an ANSI standard, a wide range of 3rd-party FMCs is available, which further widens the prototyping options to support applications such as Software Defined Radio.
As I mentioned before, the Zynq Z-7100 SoC is the most powerful Zynq device we have examined to date. So what does the Z-7100 bring to the party that we have not seen before (not including the PL’s increased logic resources)? The most obvious addition is the provision of the 16 GTX transceivers that support data rates to 12.5Gbps. You can also use these high-speed serial links to implement Gen1 (2.5 Gbps) and Gen2 (5.0 Gbps) PCIe interfaces. Multi-lane solutions are also possible. The Z-7100 can support as many as 8 lanes if we so desire.
We also gain access to high-performance I/O pins for the first time, which introduce digitally controlled, on-chip termination for better signal integrity. Zynq Z-7020 devices and below provide only High Range (HR) I/O, which handles a wider range of I/O voltages (1.2V to 3.3V), although with reduced performance. When it comes to logic resources, the Zynq Z-7100 SoC is very impressive: it gives us 444K logic cells, 2,020 DSP slices, 26.5Mbits of block RAM, and 554,800 flip-flops.
We will look in more detail at how we can use this development board over the next few weeks. However, Aldec shipped this board pre-installed with a face-detection application, which connects to a single camera using the ADAS FMC and an HDMI display. When I connected it all up and ran the application, the example sprang to life and detected my face as I moved about in front of the supplied camera:
My code is available on Github as always.
With a big part of the embedded world just catching up to 32-bit RISC processors, you may have looked at the multiple 64-bit ARM Cortex-A53 processors in the Xilinx Zynq UltraScale+ MPSoC and wondered, “Why?” It’s a fair question and one that reminds me of the debates I had with my good friend Jack Ganssle at long-past Embedded Systems Conferences. I consider Jack to be one of the world’s foremost embedded-system design experts so he has a very informed opinion about these things. (If you do not subscribe to Jack’s free Embedded Muse newsletter, you should immediately click on that link and then come back.)
Jack and I discussed the use of 8-bit versus 32-bit processors in embedded systems many years ago. I argued that you could already see designers employing all sorts of memory block-switching schemes to circumvent the 8-bit processors’ 64Kbyte address-space limitations. Why do that? Why take on the significant burden of the added software complexity to juggle these switched-memory blocks when Moore’s Law had already made 32-bit processors with their immense address spaces eminently affordable?
Well, even 32-bit processors no longer have “immense” memory spaces relative to the embedded tasks we must now tackle, and address-space considerations are a big part of why you want to think about using 64-bit processors for embedded designs. But that’s not the sole or even the main consideration for using 64-bit processors in embedded designs.
Rather than argue the points here, my intent is to alert you to a free, 1-hour Webinar being taught by Doulos titled “Shift Gear with a 64-bit ARM-Powered MPSoC.” Yes, the title could be more descriptive, so here are the ARM Cortex-A53 programmer’s model enhancements that this Webinar will cover:
The Webinar will be conducted twice on April 21 to accommodate multiple time zones worldwide. Click here for more info.
Mentor has just announced the DRS360 platform for developing autonomous driving systems based on the Xilinx Zynq UltraScale+ MPSoC. The automotive-grade DRS360 platform is already designed and tested for deployment in ISO 26262 ASIL D-compliant systems.
This platform offers comprehensive sensor-fusion capabilities for multiple cameras, radar, LIDAR, and other sensors while offering “dramatic improvements in latency reduction, sensing accuracy and overall system efficiency required for SAE Level 5 autonomous vehicles.” In particular, the DRS360 platform’s use of the Zynq UltraScale+ MPSoC permits the use of “raw data sensors,” thus avoiding the power, cost, and size penalties of microcontrollers and the added latency of local processing at the sensor nodes.
Eliminating pre-processing microcontrollers from all system sensor nodes brings many advantages to the autonomous-driving system design including improved real-time performance, significant reductions in system cost and complexity, and access to all of the captured sensor data for a maximum-resolution, unfiltered model of the vehicle’s environment and driving conditions.
Rather than try to scale lower levels of ADAS up, Mentor’s DRS360 platform is optimized for Level 5 autonomous driving, and it’s engineered to easily scale down to Levels 4, 3 and even 2. This approach makes it far easier to develop systems at the appropriate level for the system you’re developing because the DRS360 platform is already designed to handle the most complex tasks from the beginning.
If you’re working with any sort of video, there’s a new 4-minute demo video you need to see. This video shows two new Zynq UltraScale+ EV MPSoC devices working in tandem to decode and display 4K60p streaming video in both H.264 and H.265 video formats in real time. Zynq UltraScale+ EV MPSoC devices incorporate hardened, low-latency H.264 and H.265 video codecs (encode and decode). The demo employs two Xilinx ZCU106 boards in the following configuration:
The first ZCU106 extracts the 4K60p video stream from a USB stick at 60Mbps, decodes the video, and displays it on a local monitor using a DisplayPort interface. At the same time, the on-board Zynq UltraScale+ EV device re-encodes the video using the on-chip H.265 encoder, which reduces the video bit rate to 10Mbps thanks to the improved encoding efficiency of the H.265 standard. The board then transmits the resulting 10Mbps video stream over a wired Ethernet connection to a second ZCU106 board, which decodes the video and displays it on a second monitor. The entire process occurs with such low latency that it’s hard to see any delay between the two displayed video streams.
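The bit-rate arithmetic in that chain is worth spelling out. The snippet below uses only the 60Mbps and 10Mbps figures quoted for the demo:

```python
# Bit-rate arithmetic for the two-board ZCU106 transcode demo.
h264_mbps = 60     # H.264 stream pulled from the USB stick
h265_mbps = 10     # re-encoded H.265 stream sent over Ethernet

compression_gain = h264_mbps / h265_mbps              # 6x fewer bits on the wire
ethernet_bytes_per_sec = h265_mbps * 1_000_000 // 8   # 1.25 Mbytes/sec of traffic
```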
Here’s the video demo:
Hours after I posted yesterday’s blog about Siglent’s new sub-$400, Zynq-powered SDS1000-E family of 2-channel, 200MHz, 1Gsamples/sec DSOs (see “Siglent 200MHz, 1Gsample/sec SDS1000X-E Entry-Level DSO family with 14M sample points is based on Zynq SoC”), EEVblog’s Dave Jones posted a detailed, 25-minute teardown video of the very same scope, which clearly illustrates just how Siglent reached this incredibly low price point.
One way Siglent achieved this design milestone was to use one single board to implement all of the scope’s analog and digital circuitry. However, 8- or 10-layer pcbs are expensive, so Siglent needed to minimize that single board’s size and one way to do that is to really chop the component count on the board. To do that without cutting functions, you need to use the most highly integrated devices you can find, which is probably why Siglent’s design engineers selected the Xilinx Zynq Z-7020 SoC as the keystone for this DSO’s digital section. As discussed yesterday, the use of the Zynq Z-7020 SoC allowed Siglent’s design team to introduce advanced features from the company’s high-end DSOs and put them into these entry-level DSOs with essentially no increase in BOM cost.
Here’s a screen capture from Dave’s new teardown video showing you what the new Siglent DSO’s main board looks like. That’s Dave’s finger poised over the Xilinx Zynq SoC (under the heat sink), which is flanked to the left and right by the two Samsung K4B1G1646I 1Gbit (64Mx16) DDR3 SDRAM chips used for waveform capture and the display buffer—among other things.
As discussed yesterday, the Zynq SoC’s two on-chip ARM Cortex-A9 processors can easily handle the scope’s GUI and housekeeping chores. Its on-chip programmable logic implements the capture buffer, the complex digital triggering, and the high-speed computation needed for advanced waveform math and the 1M-point FFT. Finally, the Zynq SoC’s programmable I/O and SerDes transceiver pins make it easy to interface to the scope’s high-speed ADC and the DDR3 memory needed for the deep, 14M-point capture buffer and the display memory for the DSO’s beautiful color LCD with 256 intensity levels. (All this is discussed in yesterday’s Xcell Daily blog post about these new DSOs.)
Here’s a photo of that Siglent screen from one of Dave’s previous videos, where he uses a prototype of this Siglent DSO to troubleshoot and fix a malfunctioning HP 54616B DSO that had been dropped:
Note: Since sending this prototype to Dave, Siglent has apparently decided to bump the bandwidth of these DSOs to 200MHz. Just another reminder of how competitive this entry-level DSO market has become, and how the Zynq SoC's competitive advantages can be leveraged in a system-level design.
Here’s Dave’s teardown video:
Siglent’s new SDS1000X-E family of entry-level DSOs (digital storage oscilloscopes) features 200MHz of bandwidth with a 1Gsample/sec sample rate in the fastest family members, 14M sample points in all family models, 256 intensity levels, and a high-speed display update rate of 400,000 frames/sec. The new DSOs also include many advanced features not often found on entry-level DSOs including intelligent triggering, serial bus decoding and triggering, historical mode and sequence mode, a rich set of measurement and mathematical operations, and a 1M-point FFT. The SDS1000X-E DSO family is based on a Xilinx Zynq Z-7020 SoC, which has made it cost-effective for Siglent to migrate its high-end SPO (Super Fluorescent Oscilloscope) technology to this new entry-level DSO family.
Siglent’s new, entry-level SDS1000X-E DSO family is based on a Xilinx Zynq Z-7020 SoC
According to this WeChat article published in January by Siglent (Ding Yang Technology in China), the Zynq SoC “is very suitable for data acquisition, storage and digital signal processing in digital oscilloscopes.” In addition, the high-speed, high-density, on-chip interconnections between the Zynq SoC’s PS (processor system) and PL (programmable logic) “effectively solve” the traditional digital storage oscilloscope CPU and FPGA data-transmission bottlenecks, which reduces the DSO’s dead time between triggers and increases the waveform capture and display rates. According to the article, the system design employs four AXI ports operating between the Zynq PS and PL to achieve 8Gbytes/sec data transfers—“far greater than the local bus transmission rate” achievable using chip-to-chip I/O, with far lower power consumption.
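The article's 8Gbytes/sec figure is easy to reconstruct. The four 64-bit AXI ports come from the article; the 250MHz port clock below is my assumption chosen to make the arithmetic work, not a figure quoted by Siglent.

```python
# Reconstructing the article's 8 Gbytes/sec PS-PL transfer figure.
PORTS = 4                    # AXI ports between the Zynq PS and PL (from the article)
BYTES_PER_BEAT = 8           # 64-bit data path on each port (assumption)
CLOCK_HZ = 250_000_000       # per-port clock (assumption)

total_bytes_per_sec = PORTS * BYTES_PER_BEAT * CLOCK_HZ   # 8e9 bytes/sec
```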
The Zynq SoC’s combination of ARM Cortex-A9 software-driven processing and on-chip programmable logic also reduces hardware footprint and facilitates integration of high-performance processing systems into Siglent’s compact, entry-level oscilloscopes. The article also suggests that the DSO system design employs the Zynq SoC’s partial-reconfiguration capability to further reduce the parts count and the board footprint: “The PL section has 220 DSP slices and 4.9 Mb Block RAM; coupled with high throughput between the PS and PL data interfaces, we have the flexibility to configure different hardware resources for different digital signal processing.”
Further, the SDS1000X-E DSO family’s high-speed ADC uses high-speed differential-pair signaling to connect directly to the Zynq SoC’s high-speed SerDes transceivers, which guarantees “stable and reliable access” to the ADC’s 1Gbyte/sec data stream, while the Zynq SoC’s on-chip DDR3 controller operating at 1066Mtransfers/sec allows “the use of single-chip DDR3 to meet the real-time storage of the ADC output data requirements.”
Siglent has also used the Zynq SoC’s PL to implement the DSOs’ high-sensitivity, low-jitter, zero-temperature-drift digital triggering system, which includes many kinds of intelligent trigger functions such as slope, pulse width, video, timeout, runt, and patterns that can help DSO users more accurately isolate waveforms of interest. Advanced bus-protocol triggers and bus events (such as the onset of I2C bus traffic or UART-specific data) can also serve as trigger conditions, thanks to the high-speed triggering ability designed into the Zynq SoC’s PL. These intelligent triggers greatly facilitate debugging and add considerable value to the new Siglent entry-level DSOs.
Here’s a translated block diagram of the SDS1000X-E DSO family’s system design:
The new SDS1000X-E DSO family illustrates the result of selecting a Zynq SoC as the foundation for a system design. The large number of on-chip resources permit you to think outside of the box when it comes to adding features. Once you’ve selected a Zynq SoC, you no longer need to think about cramming code into the device to add features. With the Zynq SoC’s hardware, software, and I/O programmability, you can instead start thinking up new features that significantly improve the product’s competitive position in your market.
This is precisely what Siglent’s engineers were able to do. Once the Zynq SoC was included in the design, the designers of this entry-level DSO family were able to think about which high-performance features they wished to migrate to their new design.
By Adam Taylor
We have looked at the XADC several times within this series. One thing we have not examined is how to use the external analog multiplexer capability. This is an oversight on my part as it can be very useful when we are architecting our system. With the XADC, we can interface with up to 17 analog inputs: one dedicated Vp/Vn pair of inputs and sixteen auxiliary differential input pairs that share pins with the logic I/O. This means that we can sample up to 17 different analog signals along with the device’s on-chip supply voltages and temperature. This does, of course, require the use of as many as 34 I/O pins, which can be challenging on some pin-constrained devices or designs.
The use of an external multiplexer provides us with the ability to sample up to 16 analog inputs while needing only 4 I/O lines for the multiplexer address, as the Vp/Vn pair is dedicated and sits outside the multiplexer addressing. Note that we are not limited to using only the Vp/Vn pair for analog inputs; we can use any of the auxiliary inputs as well.
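The pin-count saving is easy to quantify. This is the arithmetic only, assuming all 16 channels go through one external mux feeding a single differential pair:

```python
import math

# Direct connection: 16 auxiliary differential pairs on the device I/O.
direct_pins = 16 * 2

# External mux: one shared differential pair into the XADC, plus the
# address lines needed to select 1 of 16 external channels.
mux_address_pins = math.ceil(math.log2(16))   # 4 address lines
muxed_pins = 2 + mux_address_pins             # 6 pins in total
pins_saved = direct_pins - muxed_pins         # 26 I/O pins freed for logic
```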
To demonstrate how we do this, the first thing we need is a Vivado design with the XADC set up to allow an external mux. We can do this on the ADC setup tab of the XADC wizard, where we can also select which analog inputs are being used with the external mux. If we already have a design with the XADC enabled, we can use the AXI interface to configure it.
Within the wider Vivado design, I am going to include some ILAs (Integrated Logic Analyzers) so that we can see what is happening internally. I am also going to connect the mux pins from the FPGA to the ZedBoard AMS header GPIO pins and into a logic analyzer so that we can see them changing, just as they would when driving an external mux.
Implementing this within the software is very similar to how we previously did this for the XADC. The first step is to configure the XADC as we would if we were using the internal mux capability. However, when we want to use the external mux we need to consider the information within UG480 and particularly the diagram below:
To use an external mux, we therefore need to do the following in addition to our normal approach:
Once these have been configured, we set the XADC sampling by setting the sequencer mode to continuous pass. This will then sequence the external mux pins around the inputs desired as shown below in the ILA capture when all 16 aux inputs are sampled.
The eagle-eyed will have noticed that there are 16 external inputs, which require only 4 address pins, yet the external mux address provides 5 pins. To connect these to an external multiplexer, we need to connect only the lower four bits of the address.
Just as we do when the internal mux is used, the sampled data from the conversion will be in the appropriate register and not in the Vp/Vn aux conversion register (e.g. aux 0 will be in aux 0, aux 1 in aux 1 and so on).
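The address and register mapping above can be modelled as a simple loop. The channel numbering comes from the text; the model itself is purely illustrative:

```python
# Toy model of the continuous-pass sequencer: each aux channel drives the
# lower four bits of the external mux address, and its conversion result
# lands in that channel's own status register (aux 0 -> AUX0, and so on).
sequence = []
for aux_channel in range(16):
    mux_address = aux_channel & 0b1111        # bit 4 of the 5-bit address unused
    result_register = f"AUX{aux_channel}"
    sequence.append((mux_address, result_register))
```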
An external analog mux therefore allows us to monitor nearly the same number of analog signals with a much-reduced pin count. There is also another trick we can do with the XADC, which we will look at soon.
Code is available on Github as always.
Here’s a 40-minute teardown video of a Vision Research Phantom v5 high-speed, 1024x1024-pixel, 1,000-frames/sec video camera (circa 2001) from tesla500’s YouTube video channel. His methodical teardown and excellent system-level explanation uncover a couple of “huge” Xilinx XC4020 FPGAs (circa 2000) on the timing and interface boards and Xilinx XC9500 CPLDs implementing the timing and control on the four high-speed capture-memory boards. There’s also a Hitachi SH-2 32-bit RISC processor with a hardware MAC (for DSP) on the timing board.
The XC4020 FPGAs are 3rd-generation devices that each have 784 CLBs (1560 LUTs total). They were big in their day but they’re very small now. These days, I think you could implement all of the digital timing and control circuitry in this camera including the SH-2 processor’s capabilities using the smallest single-core Zynq Z-7007S SoC—with the ARM Cortex-A9 processor in the Zynq SoC running considerably more than 20x faster than the turn-of-the-millennium SH-2 processor’s roughly 28MHz maximum clock rate.
Of course, Vision Research has moved far beyond 1000 frames/sec over the past 17 years. Its latest cameras can go 1000x faster than that, hitting 1M frames/sec when configured with the company’s FAST option (fast indeed!), while the Phantom v5 is no longer listed even on the company’s “discontinued cameras” page. Nevertheless, I found tesla500’s teardown and explanations fascinating and valuable.
Of course, Xilinx All Programmable devices have long been used to design advanced video equipment like the Vision Research Phantom v5 high-speed camera, which allows me to quickly remind you of the recent launch of the Xilinx reVISION stack for embedded-vision applications. (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”)
And now, here’s tesla500’s Vision Research Phantom v5 high-speed camera teardown video:
On April 11, the third free Webinar in Xilinx's "Precise, Predictive, and Connected Industrial IoT" series will provide insight into the role of Zynq All Programmable SoCs in the breadth of applications across the IIoT Edge and the connectivity between them. A brief summary of IIoT trends will be presented, followed by an overview of the Data Distribution Service (DDS) IIoT databus standard presented by RTI, the IIoT Connectivity Company, and a discussion of how DDS and OPC-UA target different connectivity challenges in IIoT systems.
Webinar attendees will also learn:
Xcell Daily discussed DeePhi Tech’s Zynq-based CNN acceleration processor last year in connection with the Hot Chips 2016 conference. (See “DeePhi’s Zynq-based CNN processor is faster, more energy efficient than CPUs or GPUs.”) DeePhi’s founder Song Yao appears in a new Powered by Xilinx video this week giving many more details including some fascinating information about an early customer, ZeroTech—China’s second largest drone maker.
DeePhi provides the entire stack needed to develop machine-learning applications based on neural networks including the development software, algorithms, and a neural-network processor that runs efficiently on the Xilinx Zynq SoC. This technology is particularly good for deep-learning, vision-based embedded apps such as drones, robotics, surveillance cameras, and for cloud-computing applications as well.
The video also provides more details on ZeroTech’s use of DeePhi’s machine-learning technology for object detection, pedestrian detection, and gesture recognition—all in a drone that nestles in your hand.
Song Yao explains that DeePhi’s tools provide a GPU-like development environment while taking advantage of the superior efficiency of neural networks implemented with programmable logic. In addition, DeePhi can change the neural network’s architecture to further optimize the design for specific applications.
Finally, he explains that you can use these Zynq-based implementations in applications where GPUs will simply not work due to power-consumption restrictions. In fact, last year at Hot Chips 2016 he reportedly said, “The FPGA based DPU platform achieves an order of magnitude higher energy efficiency over GPU on image recognition and speech detection.”
Here’s the new, 3-minute Powered by Xilinx video:
Last month, I blogged about a new Aldec FPGA Prototyping board—the HES-US-440—based on the “big, big, big” Xilinx Virtex UltraScale VU440 FPGA teamed with the Xilinx Zynq Z-7100 SoC. (See “Aldec selected the big, big, big Virtex UltraScale VU440 (and the Zynq SoC) for its new proto board—the HES-US-440.”) Now, Aldec’s Hardware Technical Support Manager Krzysztof Szczur has published an interesting article titled “Software Driven Test of FPGA Prototype,” which describes how you can use this prototyping board to create software-driven testbenches.
Why would you want to do that?
Because a software-driven verification methodology can shorten your development schedule by a lot, especially if you speed it up by moving from slow, software-based simulation to a much, much faster FPGA-based prototyping environment like the one provided by the Aldec HES-US-440.
And because time = money.
In fact, time >> money because you can usually find more money but there’s absolutely, positively no one out there minting time.
Note that this is true whether you’re designing an ASIC or you plan to deploy your design on a Xilinx All Programmable device.
Aldec HES-US-440 FPGA Prototyping Board Connection Diagram
Adam Taylor and Xilinx’s Sr. Product Manager for SDSoC and Embedded Vision Nick Ni have just published an article on the EE News Europe Web site titled “Machine learning in embedded vision applications.” That title’s pretty self-explanatory, but there are a few points I’d like to highlight. Then you can go read the full article yourself.
As the article states, “Machine learning spans several industry mega trends, playing a very prominent role within not only Embedded Vision (EV), but also Industrial Internet of Things (IIoT) and Cloud Computing.” In other words, if you’re designing products for any embedded market, you might well find yourself at a competitive disadvantage if you’re not adding machine-learning features to your road map.
This article closely ties machine learning to neural networks, including Feed-forward Neural Networks (FNNs), Recurrent Neural Networks (RNNs), Deep Neural Networks (DNNs), and Convolutional Neural Networks (CNNs). Neural networks are not programmed; they’re trained. Then, if they’re part of an embedded design, they’re deployed. Training is usually done using floating-point neural-network implementations but, for efficiency (power and cost), deployed neural networks can use fixed-point representations with very little or no loss of accuracy. (See “Counter-Intuitive: Fixed-Point Deep-Learning Inference Delivers 2x to 6x Better CNN Performance with Great Accuracy.”)
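To make the fixed-point idea concrete, here is a toy quantization of trained float weights into an 8-bit Q1.7 format (1 sign bit, 7 fractional bits). It illustrates why the accuracy loss is small; it is not the reVISION flow itself, and the weight values are made up.

```python
# Toy fixed-point deployment: quantize float weights to Q1.7 and measure error.
FRAC_BITS = 7
SCALE = 1 << FRAC_BITS            # 128 quantization steps per unit

def to_fixed(w):
    """Round a float in roughly [-1, 1) to an 8-bit two's-complement code."""
    return max(-128, min(127, round(w * SCALE)))

def to_float(q):
    return q / SCALE

weights = [0.731, -0.442, 0.018, -0.999, 0.505]   # made-up "trained" weights
quantized = [to_fixed(w) for w in weights]
recovered = [to_float(q) for q in quantized]
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
# Worst-case rounding error is half a step (1/256), under 0.4% of full scale.
```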
The programmable logic inside of Xilinx FPGAs, Zynq SoCs, and Zynq UltraScale+ MPSoCs is especially good at implementing fixed-point neural networks, as described in this article by Nick Ni and Adam Taylor. (Go read the article!)
Meanwhile, this is a good time to remind you of the recent Xilinx introduction of the reVISION stack for neural network development using Xilinx All Programmable devices. For more information about the Xilinx reVISION stack, see:
This article about Cyber Physical Systems on the Embedded Computing Design site led me to a new SMARC Rel. 2.0 module—the SECO SM-B71—that’s capable of carrying any one of ten Zynq UltraScale+ MPSoCs based on the common SFVC784 package pinout shared by these devices. The MPSoC device list includes the ZU2CG, ZU3CG, ZU4CG, ZU5CG family members and the ZU2EG, ZU3EG, ZU4EG, ZU5EG, ZU4EV, and ZU5EV family members with the integrated ARM Mali-400 MP2 GPU. Now that’s flexibility. The board also accommodates as much as 8Gbytes of DDR4-2400 SDRAM. Here’s a photo:
SECO SM-B71 Zynq UltraScale+ MPSoC SMARC Module
As you can see from the image, SECO is previewing this product at the moment, so please contact SECO directly for more information about the SM-B71.
A blog post from earlier this week, “Seven low-cost Zynq dev and training boards: a quick review,” prompted an email from Graham Naylor in the UK. Naylor informed me that I’d not mentioned his favorite Zynq-based board, the Trenz TE0722, in that blog post—and then he told me how he’s using the Trenz board (which is really more of a low-cost SOM rather than a training/dev board). During the day, Naylor measures neutron pulses from an ionization chamber using the Zynq-based Red Pitaya open instrumentation platform. (I’ve written many blogs about the Red Pitaya, listed below.) For fun, it appears that Naylor and colleague Pete Allwright design cave radios. If you’ve never heard of a cave radio, you’re in good company because I hadn’t either.
Naylor sent me a preprint of an article that will appear in the quarterly BCRA’s Cave Radio & Electronics Group Journal, in the June 2017 issue. (The BCRA is the British Cave Research Association.) Naylor’s and Allwright’s article, titled “Outlining the Architecture of the Nicola 3Z Cave Radio,” discusses the design of a new version of the Nicola 3 rescue radio designed to be used by cave rescue teams for underground communications.
The original Nicola 3 radio was based on a Xilinx Spartan-3E FPGA supplied on a module from OHO Elektronik. The FPGA implemented an SDR design for a radio that performs SSB modulation and demodulation using an 87kHz carrier wave. Radio transmission does not occur through the air but through the ground, using a couple of electrodes jammed into the earthen floor of the cave. (We’re in a cave, remember?) A little water poured on the earth interface helps improve transmission/reception.
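SSB generation of this kind can be sketched with the classic phasing method. The toy check below (my illustration, not the Nicola radio's actual implementation) mixes a baseband tone's I/Q components with a quadrature 87kHz carrier and confirms that only the upper sideband, at carrier-plus-tone, remains:

```python
import math

FC = 87_000.0   # carrier frequency from the article, in Hz
FM = 1_000.0    # hypothetical 1 kHz audio test tone

def ssb_upper(t):
    """Phasing-method upper-sideband SSB of a single tone at time t."""
    i = math.cos(2 * math.pi * FM * t) * math.cos(2 * math.pi * FC * t)
    q = math.sin(2 * math.pi * FM * t) * math.sin(2 * math.pi * FC * t)
    return i - q   # cosine sum identity: equals cos(2*pi*(FC+FM)*t) exactly
```

Because the subtraction cancels the lower sideband term by term, the output is a pure tone at FC + FM.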
Xilinx introduced the 90nm Spartan-3E in 2005, so the Nicola cave radio development team has upgraded the Nicola design to the Zynq Z-7010 SoC, which resides on a low-cost Trenz TE0722 SOM. Trenz sells one of these boards for just €64.00 and if you want 500 pieces, the price drops to €48.00.
Trenz TE0722 Zynq SOM
The new radio is called the Nicola 3Z. (I'm guessing "Z" is for "Zynq.") The FPGA fabric in the Zynq SoC implements the SDR functions in the Nicola 3Z radio including the SSB class D modulator, which drives an H-bridge driver for transmission; the receiver’s SSB filter, decimator, and demodulator; and an AGC block implemented on a soft-core Xilinx PicoBlaze 8-bit microcontroller, which is also instantiated in the Zynq SoC’s FPGA. There’s a second PicoBlaze instantiation on chip for housekeeping. That Zynq Z-7010 SoC may be a low-end part, but it’s plenty busy in the Nicola 3Z radio’s design.
Note: For more information about the Zynq-based Red Pitaya open instrumentation platform, see: