Here’s a 90-second video showing a Xilinx test chip with a 56Gbps PAM4 SerDes transceiver operating with plenty of SI margin and a bit error rate better than 10⁻¹² over a backplane originally designed for 28Gbps operation.
Note: This working demo employs a Xilinx test chip. The 56Gbps PAM4 SerDes is not yet incorporated into a product. Not yet.
For more information about this test chip, see “3 Eyes are Better than One for 56Gbps PAM4 Communications: Xilinx silicon goes 56Gbps for future Ethernet.”
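As a rough illustration of why a 56Gbps PAM4 link can run over a channel built for 28Gbps: PAM4 carries two bits per symbol on four amplitude levels, so the symbol rate on the wire is half the bit rate, and the four levels leave three eye openings between them. The sketch below assumes a Gray-coded mapping and normalized levels of -3, -1, +1, and +3. Both are common conventions, not details taken from the Xilinx test chip.

```python
# Minimal PAM4 sketch. The Gray mapping and the -3/-1/+1/+3 levels are
# illustrative assumptions, not details of the Xilinx test chip.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def pam4_encode(bits):
    """Map an even-length bit sequence onto PAM4 symbols, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def symbol_rate_gbaud(bit_rate_gbps, bits_per_symbol=2):
    """Symbol rate on the wire for a given bit rate."""
    return bit_rate_gbps / bits_per_symbol


symbols = pam4_encode([0, 0, 0, 1, 1, 1, 1, 0])
eyes = len(set(GRAY_PAM4.values())) - 1  # four levels leave three eyes
print(symbols, symbol_rate_gbaud(56), eyes)
```

With two bits per symbol, the 56Gbps stream toggles at 28Gbaud, which is the rate the backplane was designed around.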
With a big part of the embedded world just catching up to 32-bit RISC processors, you may have looked at the multiple 64-bit ARM Cortex-A53 processors in the Xilinx Zynq UltraScale+ MPSoC and wondered, “Why?” It’s a fair question and one that reminds me of the debates I had with my good friend Jack Ganssle at long-past Embedded Systems Conferences. I consider Jack to be one of the world’s foremost embedded-system design experts so he has a very informed opinion about these things. (If you do not subscribe to Jack’s free Embedded Muse newsletter, you should immediately click on that link and then come back.)
Jack and I discussed the use of 8-bit versus 32-bit processors in embedded systems many years ago. I argued that you could already see designers employing all sorts of memory block-switching schemes to circumvent the 8-bit processors’ 64Kbyte address-space limitations. Why do that? Why take on the significant burden of the added software complexity to juggle these switched-memory blocks when Moore’s Law had already made 32-bit processors with their immense address spaces eminently affordable?
Well, even 32-bit processors no longer have “immense” memory spaces relative to the embedded tasks we must now tackle, and address-space considerations are a big part of why you want to think about using 64-bit processors for embedded designs. But that’s not the sole or even the main consideration for using 64-bit processors in embedded designs.
Rather than argue the points here, my intent is to alert you to a free, 1-hour Webinar being taught by Doulos titled “Shift Gear with a 64-bit ARM-Powered MPSoC.” Yes, the title could be more descriptive, so here are the ARM Cortex-A53 programmer’s model enhancements that this Webinar will cover:
The Webinar will be conducted twice on April 21 to accommodate multiple time zones worldwide. Click here for more info.
Mentor has just announced the DRS360 platform for developing autonomous driving systems based on the Xilinx Zynq UltraScale+ MPSoC. The automotive-grade DRS360 platform is already designed and tested for deployment in ISO 26262 ASIL D-compliant systems.
This platform offers comprehensive sensor-fusion capabilities for multiple cameras, radar, LIDAR, and other sensors while offering “dramatic improvements in latency reduction, sensing accuracy and overall system efficiency required for SAE Level 5 autonomous vehicles.” In particular, the DRS360 platform’s use of the Zynq UltraScale+ MPSoC permits the use of “raw data sensors,” thus avoiding the power, cost, and size penalties of microcontrollers and the added latency of local processing at the sensor nodes.
Eliminating pre-processing microcontrollers from all system sensor nodes brings many advantages to the autonomous-driving system design including improved real-time performance, significant reductions in system cost and complexity, and access to all of the captured sensor data for a maximum-resolution, unfiltered model of the vehicle’s environment and driving conditions.
Rather than try to scale lower levels of ADAS up, Mentor’s DRS360 platform is optimized for Level 5 autonomous driving, and it’s engineered to easily scale down to Levels 4, 3 and even 2. This approach makes it far easier to develop a system at whatever autonomy level you’re targeting because the DRS360 platform is already designed to handle the most complex tasks from the beginning.
If you’re working with any sort of video, there’s a new 4-minute demo video you need to see. This video shows two new Zynq UltraScale+ EV MPSoC devices working in tandem to decode and display 4K60p streaming video in both H.264 and H.265 video formats in real time. Zynq UltraScale+ EV MPSoC devices incorporate hardened, low-latency H.264 and H.265 video codecs (encode and decode). The demo employs two Xilinx ZCU106 boards in the following configuration:
The first ZCU106 extracts the 4K60p video stream from a USB stick at 60Mbps, decodes the video, and displays it on a local monitor using a DisplayPort interface. At the same time, the on-board Zynq UltraScale+ EV device re-encodes the video using the on-chip H.265 encoder, which reduces the video bit rate to 10Mbps thanks to the improved encoding efficiency of the H.265 standard. The board then transmits the resulting 10Mbps video stream over a wired Ethernet connection to a second ZCU106 board, which decodes the video and displays it on a second monitor. The entire process occurs with such low latency that it’s hard to see any delay between the two displayed video streams.
Here’s the video demo:
Hours after I posted yesterday’s blog about Siglent’s new sub-$400, Zynq-powered SDS1000X-E family of 2-channel, 200MHz, 1Gsamples/sec DSOs (see “Siglent 200MHz, 1Gsample/sec SDS1000X-E Entry-Level DSO family with 14M sample points is based on Zynq SoC”), EEVblog’s Dave Jones posted a detailed, 25-minute teardown video of the very same scope, which clearly illustrates just how Siglent reached this incredibly low price point.
One way Siglent achieved this design milestone was to use one single board to implement all of the scope’s analog and digital circuitry. However, 8- or 10-layer pcbs are expensive, so Siglent needed to minimize that single board’s size and one way to do that is to really chop the component count on the board. To do that without cutting functions, you need to use the most highly integrated devices you can find, which is probably why Siglent’s design engineers selected the Xilinx Zynq Z-7020 SoC as the keystone for this DSO’s digital section. As discussed yesterday, the use of the Zynq Z-7020 SoC allowed Siglent’s design team to introduce advanced features from the company’s high-end DSOs and put them into these entry-level DSOs with essentially no increase in BOM cost.
Here’s a screen capture from Dave’s new teardown video showing you what the new Siglent DSO’s main board looks like. That’s Dave’s finger poised over the Xilinx Zynq SoC (under the heat sink), which is flanked to the left and right by the two Samsung K4B1G1646I 1Gbit (64Mx16) DDR3 SDRAM chips used for waveform capture and the display buffer—among other things.
As discussed yesterday, the Zynq SoC’s two on-chip ARM Cortex-A9 processors can easily handle the scope’s GUI and housekeeping chores. Its on-chip programmable logic implements the capture buffer, the complex digital triggering, and the high-speed computation needed for advanced waveform math and the 1M-point FFT. Finally, the Zynq SoC’s programmable I/O and SerDes transceiver pins make it easy to interface to the scope’s high-speed ADC and the DDR3 memory needed for the deep, 14M-point capture buffer and the display memory for the DSO’s beautiful color LCD with 256 intensity levels. (All this is discussed in yesterday’s Xcell Daily blog post about these new DSOs.)
Here’s a photo of that Siglent screen from one of Dave’s previous videos, where he uses a prototype of this Siglent DSO to troubleshoot and fix a malfunctioning HP 54616B DSO that had been dropped:
Note: Since sending this prototype to Dave, Siglent has apparently decided to bump the bandwidth of these DSOs to 200MHz. Just another reminder of how competitive this entry-level DSO market has become, and how the Zynq SoC's competitive advantages can be leveraged in a system-level design.
Here’s Dave’s teardown video:
Siglent’s new SDS1000X-E family of entry-level DSOs (digital storage oscilloscopes) features 200MHz of bandwidth with a 1Gsample/sec sample rate in the fastest family members, 14M sample points in all family models, 256 intensity levels, and a high-speed display update rate of 400,000 frames/sec. The new DSOs also include many advanced features not often found on entry-level DSOs including intelligent triggering, serial bus decoding and triggering, historical mode and sequence mode, a rich set of measurement and mathematical operations, and a 1M-point FFT. The SDS1000X-E DSO family is based on a Xilinx Zynq Z-7020 SoC, which has made it cost-effective for Siglent to migrate its high-end SPO (Super Phosphor Oscilloscope) technology to this new entry-level DSO family.
Siglent’s new, entry-level SDS1000X-E DSO family is based on a Xilinx Zynq Z-7020 SoC
According to this WeChat article published in January by Siglent (Ding Yang Technology in China), the Zynq SoC “is very suitable for data acquisition, storage and digital signal processing in digital oscilloscopes.” In addition, the high-speed, high-density, on-chip interconnections between the Zynq SoC’s PS (processor system) and PL (programmable logic) “effectively solve” the traditional digital storage oscilloscope CPU and FPGA data-transmission bottlenecks, which reduces the DSO’s dead time between triggers and increases the waveform capture and display rates. According to the article, the system design employs four AXI ports operating between the Zynq PS and PL to achieve 8Gbytes/sec data transfers—“far greater than the local bus transmission rate” achievable using chip-to-chip I/O, with far lower power consumption.
The Zynq SoC’s combination of ARM Cortex-A9 software-driven processing and on-chip programmable logic also reduces hardware footprint and facilitates integration of high-performance processing systems into Siglent’s compact, entry-level oscilloscopes. The article also suggests that the DSO system design employs the Zynq SoC’s partial-reconfiguration capability to further reduce the parts count and the board footprint: “The PL section has 220 DSP slices and 4.9 Mb Block RAM; coupled with high throughput between the PS and PL data interfaces, we have the flexibility to configure different hardware resources for different digital signal processing.”
Further, the SDS1000X-E DSO family’s high-speed ADC uses high-speed differential-pair signaling to connect directly to the Zynq SoC’s high-speed SerDes transceivers, which guarantees “stable and reliable access” to the ADCs’ 1Gbyte/sec data stream while the Zynq SoC’s on-chip DDR3 controller operating at 1066Mtransfers/sec allows “the use of single-chip DDR3 to meet the real-time storage of the ADC output data requirements.”
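A quick back-of-the-envelope check makes these bandwidth figures easier to compare. The sketch below assumes the single DDR3 device has a 16-bit data bus (an assumption for illustration, not something the Siglent article states) and ignores refresh and bus-turnaround overhead, so the DDR3 number is a theoretical peak rather than a sustained rate.

```python
# Back-of-the-envelope bandwidth check. The 16-bit DDR3 data bus is an
# assumption (a single x16 device); the other figures come from the article.
ADC_STREAM_BYTES_PER_SEC = 1e9       # ADC output: 1 Gbyte/sec
DDR3_TRANSFERS_PER_SEC = 1066e6      # DDR3 controller: 1066 Mtransfers/sec
DDR3_BUS_BYTES = 2                   # assumed x16 device, 2 bytes per transfer
AXI_TOTAL_BYTES_PER_SEC = 8e9        # PS-PL aggregate: 8 Gbytes/sec
AXI_PORTS = 4


def ddr3_peak_bytes_per_sec(transfers_per_sec, bus_bytes):
    """Theoretical peak DDR3 bandwidth, ignoring refresh and turnaround."""
    return transfers_per_sec * bus_bytes


ddr3_peak = ddr3_peak_bytes_per_sec(DDR3_TRANSFERS_PER_SEC, DDR3_BUS_BYTES)
headroom = ddr3_peak / ADC_STREAM_BYTES_PER_SEC
per_port = AXI_TOTAL_BYTES_PER_SEC / AXI_PORTS

print(f"DDR3 peak: {ddr3_peak / 1e9:.2f} Gbytes/sec ({headroom:.2f}x the ADC stream)")
print(f"Per AXI port: {per_port / 1e9:.1f} Gbytes/sec")
```

Under that assumption, one DDR3 device offers roughly twice the ADC’s output rate, which is consistent with the article’s claim that a single chip can keep up with real-time storage.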
Siglent has also used the Zynq SoC’s PL to implement the DSOs’ high-sensitivity, low-jitter, zero-temperature-drift digital triggering system, which includes many kinds of intelligent trigger functions such as slope, pulse width, video, timeout, runts, and patterns that can help DSO users more accurately isolate waveforms of interest. Advanced bus-protocol triggers and bus events (such as the onset of I2C bus traffic or UART-specific data) can also serve as trigger conditions, thanks to the high-speed triggering ability designed into the Zynq SoC’s PL. These intelligent triggers greatly facilitate debugging and add considerable value to the new Siglent entry-level DSOs.
Here’s a translated block diagram of the SDS1000X-E DSO family’s system design:
The new SDS1000X-E DSO family illustrates the result of selecting a Zynq SoC as the foundation for a system design. The large number of on-chip resources permits you to think outside of the box when it comes to adding features. Once you’ve selected a Zynq SoC, you no longer need to think about cramming code into the device to add features. With the Zynq SoC’s hardware, software, and I/O programmability, you can instead start thinking up new features that significantly improve the product’s competitive position in your market.
This is precisely what Siglent’s engineers were able to do. Once the Zynq SoC was included in the design, the designers of this entry-level DSO family were able to think about which high-performance features they wished to migrate to their new design.
By Adam Taylor
We have looked at the XADC several times within this series. One thing we have not examined is how to use the external analog multiplexer capability. This is an oversight on my part as it can be very useful when we are architecting our system. With the XADC we can interface with up to 17 analog inputs: one dedicated Vp/Vn pair of inputs and sixteen auxiliary differential input pairs which share pins with the logic IO. This means that we can sample up to 17 different analog signals along with the device’s on-chip supply voltages and temperatures. This does, of course, require the use of as many as 34 I/O pins, which can be challenging on some pin-constrained devices or designs.
The use of an external multiplexer provides us with the ability to sample up to 16 analog inputs. We need only 4 I/O lines for the multiplexer address as the Vp/Vn pair are dedicated and are outside of the multiplexer address. Note that we are not limited to using only the Vp/Vn pair for analog inputs. You can use any of the auxiliary inputs as well.
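The pin savings are easy to tally. This small sketch uses only the figures quoted above (two pins per differential pair, four address lines) and compares wiring every input directly against sharing one input pair through an external multiplexer. The function names are made up for illustration.

```python
# Pin-count comparison for XADC analog inputs, using the figures in the text.
PINS_PER_DIFF_PAIR = 2
MUX_ADDRESS_LINES = 4  # enough to select one of 16 external mux channels


def direct_pins(aux_pairs=16, dedicated_pairs=1):
    """Pins used when every analog input has its own differential pair."""
    return (aux_pairs + dedicated_pairs) * PINS_PER_DIFF_PAIR


def external_mux_pins(analog_pairs=1):
    """Pins used when one input pair is shared through an external mux."""
    return analog_pairs * PINS_PER_DIFF_PAIR + MUX_ADDRESS_LINES


print(direct_pins())        # all 17 inputs wired directly
print(external_mux_pins())  # 16 inputs multiplexed onto one pair
```

When the shared pair is the dedicated Vp/Vn input, only the four address lines come out of the general-purpose I/O budget.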
To demonstrate how we do this, the first thing we need is a Vivado design with the XADC set up to allow an external mux. We can do this on the ADC setup tab of the XADC wizard. We can also select which analog inputs are being used with the external mux. If we already have a design with the XADC enabled, we can use the AXI interface to configure it.
With the wider Vivado design, I am going to include some ILAs (Integrated Logic Analyzers) so that we can see what is happening internally and I am going to connect the mux pins from the FPGA to the ZedBoard AMS header GPIO pins and into a logic analyzer so that we can see they are changing as would be the case when driving an external mux.
Implementing this within the software is very similar to how we previously did this for the XADC. The first step is to configure the XADC as we would if we were using the internal mux capability. However, when we want to use the external mux we need to consider the information within UG480 and particularly the diagram below:
To use an external mux, we therefore need to do the following in addition to our normal approach:
Once these have been configured, we set the XADC sampling by setting the sequencer mode to continuous pass. This will then sequence the external mux pins around the inputs desired as shown below in the ILA capture when all 16 aux inputs are sampled.
The eagle-eyed will have noticed that 16 external inputs require only 4 address pins, but the external mux address provides 5. To connect these to an external multiplexer, we need to connect only the lower four bits of the address.
Just as we do when the internal mux is used, the sampled data from the conversion will be in the appropriate register and not in the Vp/Vn aux conversion register (e.g. aux 0 will be in aux 0, aux 1 in aux 1 and so on).
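To make the address handling concrete, here is a small model of it. It assumes the XADC presents auxiliary channel n on the 5-bit mux address as 0x10 + n, so that masking off the top bit leaves the 4-bit select for the external multiplexer. Please check that numbering against UG480 before relying on it. The helper names are hypothetical.

```python
# Sketch of external-mux address handling. Assumption: the XADC presents
# auxiliary channel n as 5-bit address 0x10 + n, so the low four bits are n.
AUX_CHANNEL_BASE = 0x10


def mux_select_bits(mux_addr):
    """Return the four address bits wired to the external multiplexer."""
    if not 0 <= mux_addr <= 0x1F:
        raise ValueError("address must fit in five bits")
    return mux_addr & 0x0F


def aux_result_index(mux_addr):
    """Auxiliary result register that holds the conversion for this address."""
    if not AUX_CHANNEL_BASE <= mux_addr <= 0x1F:
        raise ValueError("not an auxiliary channel address")
    return mux_addr - AUX_CHANNEL_BASE


for aux in (0, 1, 15):
    addr = AUX_CHANNEL_BASE + aux
    print(f"aux {aux}: mux select {mux_select_bits(addr):04b}, "
          f"result in aux {aux_result_index(addr)} register")
```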
An external analog mux therefore allows us to monitor nearly the same number of analog signals with a much-reduced pin count. There is also another trick we can do with the XADC, which we will look at soon.
Code is available on Github as always.
If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.
This short, 2-minute video shows a live Spartan-7 7S50 FPGA operating on a board, running a MicroBlaze soft processor connected to DDR3 SDRAM as a demo. The 28nm Spartan-7 device family comes in small form-factor packages—as small as 8x8mm. You design systems based on Spartan-7 devices with the Xilinx Vivado HL Design Suite tools.
These devices are available for ordering now and operators are standing by.
Today, Xilinx posted information about the new $2995 Kintex UltraScale+ KCU116 Eval Kit on Xilinx.com. If you’re looking to get into the UltraScale+ FPGAs’ GTY transceiver races—to 32.75Gbps—this is a great kit to start with. The kit includes:
Here’s a nice shot of the KCU116 board from the kit’s quickstart guide:
Kintex UltraScale+ KCU116 Eval Board
One of the key features of this board is the set of four SFP+ optical cages there on the left. Those handle 25Gbps optical modules, driven of course by four of the KU5P FPGA’s GTY transceivers.
Take a look.
Here’s a 40-minute teardown video of a Vision Research Phantom v5 high-speed, 1024x1024-pixel, 1000 frames/sec video camera (circa 2001) from tesla500’s YouTube video channel. His methodical teardown and excellent system-level explanation uncover a couple of “huge” Xilinx XC4020 FPGAs (circa 2000) on the timing and interface boards and Xilinx XC9500 CPLDs implementing the timing and control on the four high-speed capture-memory boards. There’s also a Hitachi SH-2 32-bit RISC processor with a hardware MAC (for DSP) on the timing board.
The XC4020 FPGAs are 3rd-generation devices that each have 784 CLBs (1568 LUTs total). They were big in their day but they’re very small now. These days, I think you could implement all of the digital timing and control circuitry in this camera including the SH-2 processor’s capabilities using the smallest single-core Zynq Z-7007S SoC, with the ARM Cortex-A9 processor in the Zynq SoC running considerably more than 20x faster than the turn-of-the-millennium SH-2 processor’s roughly 28MHz maximum clock rate.
Of course, Vision Research has moved far beyond 1000 frames/sec over the past 17 years. Its latest cameras can go 1000x faster than that, hitting 1M frames/sec when configured with the company’s FAST option (fast indeed!), while the Phantom v5 is no longer listed even on the company’s “discontinued cameras” page. Nevertheless, I found tesla500’s teardown and explanations fascinating and valuable.
Of course, Xilinx All Programmable devices have long been used to design advanced video equipment like the Vision Research Phantom v5 high-speed camera, which allows me to quickly remind you of the recent launch of the Xilinx reVISION stack for embedded-vision applications. (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”)
And now, here’s tesla500’s Vision Research Phantom v5 high-speed camera teardown video:
On April 11, the third, free Webinar in Xilinx's "Precise, Predictive, and Connected Industrial IoT" series will provide insight into the role of Zynq All Programmable SoCs in the breadth of applications across IIoT Edge and the connectivity between them. A brief summary of IIoT trends will be presented, followed by an overview of the Data Distribution Service (DDS) IIoT databus standard presented by RTI, the IIoT Connectivity Company, and how DDS and OPC-UA target different connectivity challenges in IIoT systems.
Webinar attendees will also learn:
Xcell Daily discussed DeePhi Tech’s Zynq-based CNN acceleration processor last year in connection with the Hot Chips 2016 conference. (See “DeePhi’s Zynq-based CNN processor is faster, more energy efficient than CPUs or GPUs.”) DeePhi’s founder Song Yao appears in a new Powered by Xilinx video this week giving many more details including some fascinating information about an early customer, ZeroTech—China’s second largest drone maker.
DeePhi provides the entire stack needed to develop machine-learning applications based on neural networks including the development software, algorithms, and a neural-network processor that runs efficiently on the Xilinx Zynq SoC. This technology is particularly good for deep-learning, vision-based embedded apps such as drones, robotics, surveillance cameras, and for cloud-computing applications as well.
The video also provides more details on ZeroTech’s use of DeePhi’s machine-learning technology for object detection, pedestrian detection, and gesture recognition—all in a drone that nestles in your hand.
Song Yao explains that DeePhi’s tools provide a GPU-like development environment while taking advantage of the superior efficiency of neural networks implemented with programmable logic. In addition, DeePhi can change the neural network’s architecture to further optimize the design for specific applications.
Finally, he explains that you can use these Zynq-based implementations in applications where GPUs will simply not work due to power-consumption restrictions. In fact, last year at Hot Chips 2016 he reportedly said, “The FPGA based DPU platform achieves an order of magnitude higher energy efficiency over GPU on image recognition and speech detection.”
Here’s the new, 3-minute Powered by Xilinx video:
Last month, I blogged about a new Aldec FPGA Prototyping board—the HES-US-440—based on the “big, big, big” Xilinx Virtex UltraScale VU440 FPGA teamed with the Xilinx Zynq Z-7100 SoC. (See “Aldec selected the big, big, big Virtex UltraScale VU440 (and the Zynq SoC) for its new proto board—the HES-US-440.”) Now, Aldec’s Hardware Technical Support Manager Krzysztof Szczur has published an interesting article titled “Software Driven Test of FPGA Prototype,” which describes how you can use this prototyping board to create software-driven testbenches.
Why would you want to do that?
Because a software-driven verification methodology can shorten your development schedule by a lot, especially if you speed it up by moving from slow, software-based simulation to a much, much faster FPGA-based prototyping environment like the one provided by the Aldec HES-US-440.
And because time = money.
In fact, time >> money because you can usually find more money but there’s absolutely, positively no one out there minting time.
Note that this is true whether you’re designing an ASIC or you plan to deploy your design on a Xilinx All Programmable device.
Aldec HES-US-440 FPGA Prototyping Board Connection Diagram
Adam Taylor and Xilinx’s Sr. Product Manager for SDSoC and Embedded Vision Nick Ni have just published an article on the EE News Europe Web site titled “Machine learning in embedded vision applications.” That title’s pretty self-explanatory, but there are a few points I’d like to highlight. Then you can go read the full article yourself.
As the article states, “Machine learning spans several industry mega trends, playing a very prominent role within not only Embedded Vision (EV), but also Industrial Internet of Things (IIoT) and Cloud Computing.” In other words, if you’re designing products for any embedded market, you might well find yourself at a competitive disadvantage if you’re not adding machine-learning features to your road map.
This article closely ties machine learning with neural networks (including Feed-forward Neural Networks (FNNs), Recurrent Neural Networks (RNNs), Deep Neural Networks (DNNs), and Convolutional Neural Networks (CNNs)). Neural networks are not programmed; they’re trained. Then, if they’re part of an embedded design, they’re deployed. Training is usually done using floating-point neural-network implementations but, for efficiency (power and cost), deployed neural networks can use fixed-point representations with very little or no loss of accuracy. (See “Counter-Intuitive: Fixed-Point Deep-Learning Inference Delivers 2x to 6x Better CNN Performance with Great Accuracy.”)
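To give a feel for what moving from floating-point training to fixed-point deployment involves, here is a minimal sketch of symmetric quantization applied to a handful of weights. The 8-bit width, the max-based scaling rule, and the sample weights are all assumptions chosen for illustration. The article does not say which quantization scheme is used in practice.

```python
# Illustrative symmetric 8-bit quantization of trained weights. The bit
# width, the scaling rule, and the sample weights are all assumptions.
def quantize(weights, bits=8):
    """Return (integer codes, scale) for symmetric fixed-point quantization."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax     # real value of one step
    if scale == 0:
        return [0] * len(weights), 1.0              # all-zero weights
    return [round(w / scale) for w in weights], scale


def dequantize(codes, scale):
    """Recover approximate floating-point weights from the integer codes."""
    return [c * scale for c in codes]


weights = [0.42, -1.27, 0.05, 0.9]
codes, scale = quantize(weights)
restored = dequantize(codes, scale)
worst = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, f"worst-case error {worst:.4f}")
```

The worst-case rounding error is bounded by half of one quantization step, which hints at why a well-scaled fixed-point network can stay close to its floating-point accuracy.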
The programmable logic inside of Xilinx FPGAs, Zynq SoCs, and Zynq UltraScale+ MPSoCs is especially good at implementing fixed-point neural networks, as described in this article by Nick Ni and Adam Taylor. (Go read the article!)
Meanwhile, this is a good time to remind you of the recent Xilinx introduction of the reVISION stack for neural network development using Xilinx All Programmable devices. For more information about the Xilinx reVISION stack, see:
This article about Cyber Physical Systems on the Embedded Computing Design site led me to a new SMARC Rel. 2.0 module—the SECO SM-B71—that’s capable of carrying any one of ten Zynq UltraScale+ MPSoCs based on the common SFVC784 package pinout shared by these devices. The MPSoC device list includes the ZU2CG, ZU3CG, ZU4CG, ZU5CG family members and the ZU2EG, ZU3EG, ZU4EG, ZU5EG, ZU4EV, and ZU5EV family members with the integrated ARM Mali-400 MP2 GPU. Now that’s flexibility. The board also accommodates as much as 8Gbytes of DDR4-2400 SDRAM. Here’s a photo:
SECO SM-B71 Zynq UltraScale+ MPSoC SMARC Module
As you can see from the image, SECO is previewing this product at the moment, so please contact SECO directly for more information about the SM-B71.
InnoRoute has just started shipping its TrustNode extensible, ultra-low-latency (2.5μsec) IPv6 OpenFlow SDN router as a pcb-level product. The design combines a 1.9GHz, quad-core Intel Atom processor running Linux with a Xilinx FPGA to implement the actual ultra-low-latency router hardware. (You’re not implementing that as a Linux app running on an Atom processor!) The TrustNode Router reference design features twelve GbE ports. Here’s a photo of the TrustNode SDN Router board:
InnoRoute TrustNode SDN Router Board with 12 GbE ports
Based on the pcb layout in the photo, it appears to me that the Xilinx FPGA implementing the 12-port SDN router is under that little black heatsink in the center of the board nearest to all of the Ethernet ports while the quad-core processor running Linux must be sitting there in the back under that great big silver heatsink with an auxiliary cooling fan, near the processor-associated USB ports and SDcard carrier.
InnoRoute’s TrustNode Web page is slightly oblique as to which Xilinx FPGA is used in this design but the description sort of winnows the field. First, the description says that you can customize InnoRoute’s TrustNode router design using the Xilinx Vivado HL Design Suite WebPACK Edition—which you can download at no cost—so we know that the FPGA must be a 28nm series 7 device or newer. Next, the description says that the design uses 134.6k LUTs, 269.2k flip-flops, and 12.8Mbits of BRAM. Finally, we see that the FPGA must be able to handle twelve Gigabit Ethernet ports.
The Xilinx FPGA that best fits this description is an Artix-7 A200.
You can use this TrustNode board to jump into the white-box SDN router business immediately, or at least as fast as you can mill and drill an enclosure and screen your name on the front. In fact, InnoRoute has kindly created a nice-looking rendering of a suggested enclosure design for you:
InnoRoute TrustNode SDN Router (rendering)
The router’s implementation as IP in an FPGA along with the InnoRoute documentation and the Vivado tools mean that you can enhance the router’s design and add your special sauce to break out of the white box. (White Box Plus? White Box Premium? White Box Platinum? Hey, I’m from marketing and I’m here to help.)
This design enhancement and differentiation are what Xilinx All Programmable devices are especially good at delivering. You are not stuck with some ASSP designer’s concept of what your customers need. You can decide. You can differentiate. And you will find that many customers are willing to pay for that differentiation.
Note: Please contact InnoRoute directly for more information on the TrustNode SDN Router.
Here’s another amazing demo video of National Instruments’ (NI’s) PXIe-5840 VST (Vector Signal Transceiver) showing real-time, DVR-like capture of 1GHz of continuous-bandwidth RF data on a 24Tbyte RAID drive. You’d want this if you needed to capture a real-time, broad-spectrum set of RF signals for subsequent, more detailed analysis. The VST captures the broad-spectrum data and simultaneously streams it to the RAID storage box. (The NI PXIe-5840 VST is based on a Xilinx Virtex-7 690T FPGA for its real-time RF-generation and -analysis capabilities.)
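For a sense of scale, here is a rough estimate of how long such a recording could run before the RAID fills. Only the 24Tbyte capacity comes from the demo. The sample rate and sample size below are assumptions made for illustration (complex I/Q samples at 1.25Gsamples/sec, 16 bits each for I and Q), so treat the result as an order-of-magnitude figure.

```python
# Rough capture-time estimate. The sample rate and sample size are
# assumptions for illustration; only the 24 Tbyte capacity is from the text.
RAID_CAPACITY_BYTES = 24e12
SAMPLE_RATE_HZ = 1.25e9        # assumed complex (I/Q) sample rate
BYTES_PER_SAMPLE = 4           # assumed 16-bit I plus 16-bit Q


def capture_seconds(capacity_bytes, sample_rate_hz, bytes_per_sample):
    """Seconds of continuous streaming that fit in the given capacity."""
    return capacity_bytes / (sample_rate_hz * bytes_per_sample)


stream_rate = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE
seconds = capture_seconds(RAID_CAPACITY_BYTES, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE)
print(f"Stream rate: {stream_rate / 1e9:.1f} Gbytes/sec")
print(f"Capture time: {seconds / 60:.0f} minutes")
```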
Here’s the 2-minute video:
For more information about the 2nd-generation NI VST, see:
A blog post from earlier this week, “Seven low-cost Zynq dev and training boards: a quick review,” prompted an email from Graham Naylor in the UK. Naylor informed me that I’d not mentioned his favorite Zynq-based board, the Trenz TE0722, in that blog post—and then he told me how he’s using the Trenz board (which is really more of a low-cost SOM rather than a training/dev board). During the day, Naylor measures neutron pulses from an ionization chamber using the Zynq-based Red Pitaya open instrumentation platform. (I’ve written many blogs about the Red Pitaya, listed below.) For fun, it appears that Naylor and colleague Pete Allwright design cave radios. If you’ve never heard of a cave radio, you’re in good company because I hadn’t either.
Naylor sent me a preprint of an article that will appear in the June 2017 issue of the BCRA’s quarterly Cave Radio & Electronics Group Journal. (The BCRA is the British Cave Research Association.) Naylor’s and Allwright’s article, titled “Outlining the Architecture of the Nicola 3Z Cave Radio,” discusses the design of a new version of the Nicola 3 rescue radio, which is intended for use by cave rescue teams for underground communications.
The original Nicola 3 radio was based on a Xilinx Spartan-3E FPGA supplied on a module from OHO Elektronik. The FPGA implemented an SDR design for a radio that performs SSB modulation and demodulation using an 87kHz carrier wave. Radio transmission does not occur through the air but through the ground using a couple of electrodes jammed into the earthen floor of the cave. (We’re in a cave, remember?) A little water poured on the earth interface helps improve transmission/reception.
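If you have not met SSB modulation before, the sketch below shows one textbook way to generate it, the phasing method, for a single audio test tone. This is not taken from the Nicola design, which may well use a different architecture. The 1kHz tone and 1MHz sample rate are hypothetical values, and a real radio would derive the quadrature audio with a Hilbert-transform filter rather than a second sine generator.

```python
import math

CARRIER_HZ = 87_000          # carrier frequency from the article
TONE_HZ = 1_000              # hypothetical audio test tone
SAMPLE_RATE_HZ = 1_000_000   # hypothetical sample rate


def ssb_upper_sideband(n):
    """Phasing-method upper-sideband sample n for a single test tone."""
    t = n / SAMPLE_RATE_HZ
    i = math.cos(2 * math.pi * TONE_HZ * t)   # in-phase audio
    q = math.sin(2 * math.pi * TONE_HZ * t)   # audio shifted by 90 degrees
    return (i * math.cos(2 * math.pi * CARRIER_HZ * t)
            - q * math.sin(2 * math.pi * CARRIER_HZ * t))


def single_tone(freq_hz, n):
    """Reference cosine at freq_hz, sampled at index n."""
    return math.cos(2 * math.pi * freq_hz * n / SAMPLE_RATE_HZ)


# The lower sideband cancels, leaving one tone at carrier + audio frequency.
worst = max(abs(ssb_upper_sideband(n) - single_tone(CARRIER_HZ + TONE_HZ, n))
            for n in range(1000))
print(f"largest deviation from an 88kHz tone: {worst:.2e}")
```

The output is a single component at the carrier frequency plus the tone frequency, with no carrier and no mirror-image sideband, which is what makes SSB attractive for a narrow, power-limited channel.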
Xilinx introduced the 90nm Spartan-3E in 2005, so the Nicola cave radio development team has upgraded the Nicola design to the Zynq Z-7010 SoC, which resides on a low-cost Trenz TE0722 SOM. Trenz sells one of these boards for just €64.00 and if you want 500 pieces, the price drops to €48.00.
Trenz TE0722 Zynq SOM
The new radio is called the Nicola 3Z. (I'm guessing "Z" is for "Zynq.") The FPGA fabric in the Zynq SoC implements the SDR functions in the Nicola 3Z radio including the SSB class D modulator, which drives an H-bridge driver for transmission; the receiver’s SSB filter, decimator, and demodulator; and an AGC block implemented on a soft-core Xilinx PicoBlaze 8-bit microcontroller, which is also instantiated in the Zynq SoC’s FPGA. There’s a second PicoBlaze instantiation on chip for housekeeping. That Zynq Z-7010 SoC may be a low-end part, but it’s plenty busy in the Nicola 3Z radio’s design.
Note: For more information about the Zynq-based Red Pitaya open instrumentation platform, see:
By Adam Taylor
At the end of the Sysmon AMS blogs, I introduced the several PLLs within the Zynq UltraScale+ MPSoC. That introduction suggests to me that it’s time to talk about the clocking architecture of the MPSoC device.
As with the original Zynq SoC, the PS (processing system) in the Zynq UltraScale+ MPSoC is the system master. So we will initially focus upon its clocking architecture. Within the PS there are three main clock inputs:
While the PS reference clock has a dedicated input pin, the PSS_ALT_REF_CLK and PSS_VIDEO_REF_CLK are input via the MIO and are enabled or disabled in Vivado by the I/O configuration customization tab. If we plan on using these clocks, we need to ensure there is no conflict with other planned use of the MIO.
Enabling the Alternate reference clock and the video clock
Once these have been enabled, we can configure them on the clock configuration input clock tab as shown below:
Internally, the PS has four clock groups that provide all the required clocks:
We’ll now focus on the MCG as this is the group with which we will have the most interaction. Within this group, we choose which of the five PLLs is used to clock the Zynq UltraScale+ MPSoC’s processors and peripherals within the LPD and FPD. We can do this via the clock configuration -> output clocks tab. Here we can configure the clocking for both the low-power and full-power domains.
To generate a PLL output frequency as closely as possible to the desired frequency, we may want to change the PLL input-clock source. We have several potential clock sources which can be used to clock each of the PLLs within the Zynq UltraScale+ MPSoC.
As mentioned above, we can use PS_REF_CLK, PS_ALT_REF_CLK, or PS_VIDEO_REF_CLK. These clocks are directly input into the PS. We can also use one of the four GT_REF_CLKs or the AUX_REF_CLK. The AUX_REF_CLK is provided from the PL, while the GT_REF_CLK is provided by the PS_GTR. The relevant PLL control register selects which of these clocks drives the PLL. These registers reside in either the CRL_APB module for low-power domain PLLs or the CRF_APB module for full-power domain PLLs.
We can select which of the four GT reference clocks is provided as the GT_REF_CLK using the Serial Input Output Unit (SIOU) module CRX_CNTRL Register.
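To illustrate why the choice of reference clock matters, here is a minimal sketch that searches for the integer PLL settings that land closest to a target frequency. It assumes a simple integer-N model (output = reference x feedback multiplier / output divider). The divider ranges, VCO limits, and the two example reference frequencies are illustrative assumptions, not values taken from the device documentation, and the function name is hypothetical.

```python
from fractions import Fraction

# Illustrative limits only (assumptions, not taken from the device documentation).
FBDIV_RANGE = range(25, 126)   # integer feedback multiplier
ODIV_RANGE = range(1, 64)      # integer output divider
VCO_MIN_HZ = 1_500_000_000
VCO_MAX_HZ = 3_000_000_000


def best_pll_setting(ref_hz, target_hz):
    """Return (error_hz, fbdiv, odiv, out_hz) with the smallest absolute error."""
    best = None
    for fbdiv in FBDIV_RANGE:
        vco_hz = ref_hz * fbdiv
        if not VCO_MIN_HZ <= vco_hz <= VCO_MAX_HZ:
            continue  # skip settings that push the VCO outside its assumed range
        for odiv in ODIV_RANGE:
            out_hz = Fraction(vco_hz, odiv)
            error_hz = abs(out_hz - target_hz)
            if best is None or error_hz < best[0]:
                best = (error_hz, fbdiv, odiv, out_hz)
    return best


if __name__ == "__main__":
    target = 297_000_000  # hypothetical target frequency
    # Hypothetical reference clocks: 27 MHz and 33.333... MHz.
    for name, ref in (("27 MHz", 27_000_000),
                      ("33.333 MHz", Fraction(100_000_000, 3))):
        error_hz, fbdiv, odiv, _ = best_pll_setting(ref, target)
        print(name, "fbdiv =", fbdiv, "odiv =", odiv, "error (Hz) =", float(error_hz))
```

Under these assumed limits, the 27MHz reference can hit the target exactly while the 33.333MHz reference cannot, which is the kind of situation where switching the PLL input-clock source helps.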
Now that we understand the Zynq UltraScale+ MPSoC’s clocking and how we set the desired frequency for each of the subsystems, we will explore the subsystems in more detail in the MicroZed Chronicles blogs that follow.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
By Adam Taylor
I just received the most interesting email from Antti at Trenz, in which he pointed out that he had designed a 500MHz radio receiver using just four resistors, four capacitors, and a Xilinx series 7 FPGA. How did he do this? He is keeping that to himself for the moment. However, Antti thinks it would be a great idea to open a challenge based upon this design to see if others in the FPGA community can explain how he achieved this. Of course, for the winners who supply the correct answer there will be Trenz goodies as prizes. (See below.)
The diagram below shows the components allowed to solve this challenge. You can redraw them if necessary to clarify the schematic. The values of R and C do not need to be optimized.
To prove that this is possible, the screen shots below show the input and the recovered signal inside the FPGA.
By this point I was thinking Delta Sigma ADC using the FPGA’s LVDS inputs. (There are papers and articles about this technique online.) However, Antti tells me this is not his solution and he was kind enough to provide a few hints for this challenge below:
Because they have so many Trenz prizes to give away to the winners, Antti has created three categories:
The closing date for entries is July 3rd. The judges will be Antti and myself (Adam Taylor).
If you want to enter your solution in any category email firstname.lastname@example.org
Image Matters’ Origami B20 module, based on a Xilinx Kintex UltraScale KU060 FPGA, is a small 94x53mm module that you can use to perform all sorts of high-speed processing. (See “Image Matters launches Origami Ecosystem for developing advanced 4K/8K video apps using the FPGA-based Origami module.”) For example, you can use it for a variety of video-compression applications using various IP compression cores including MPEG, JPEG-2000, and TICO. You can also use it for cloud-computing and neural-network applications such as image detection. The key thing is that the small Origami B20 module puts everything you need to run the FPGA on the one small module including SDRAM, Flash memory, the power supply, a backup battery, and security features (including tamper protection).
Here’s a short, 2.5-minute, Powered by Xilinx video with more information about the Origami B20 module:
In yesterday’s EETimes article titled “How will Ethernet go real-time for industrial networks?,” author Richard Wilson interviews National Instruments’ Global Technology and Marketing Director Rahman Jamal about using OPC-UA (the OPC Foundation’s Unified Architecture) and TSN (time-sensitive networking) to build industrial Ethernet networks (IIoT/Industrie 4.0) that deliver real-time response. (Yes, yes, yes, “real-time” is a loosely defined term where “real” depends on your system’s temporal reality.) As Jamal states in the interview, some constrained industrial Ethernet network topologies need no help to achieve real-time operation. In other cases and for other topologies, you need Ethernet implementations that are “heavily modified at the hardware level to achieve performance.”
One of the hardware additions that can really help is the hardware implementation of the IEEE 1588v2 PTP (Precision Time Protocol) clock-synchronization standard. PTP permits each piece of network-connected equipment to be synchronized using a 64-bit timer, which can be used for time-stamping, synchronization, control and as a common time reference to implement TSN.
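As a rough illustration of how PTP synchronizes two clocks, the sketch below applies the standard delay request-response arithmetic to four timestamps. It assumes a symmetric network path. The function name and the example timestamp values are hypothetical, and nothing here represents SoC-e’s implementation.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Estimate slave clock offset and one-way path delay, in the timestamp units.

    t1: master clock time when Sync is sent
    t2: slave clock time when Sync is received
    t3: slave clock time when Delay_Req is sent
    t4: master clock time when Delay_Req is received
    Assumes the path delay is the same in both directions.
    """
    master_to_slave = t2 - t1   # path delay + offset
    slave_to_master = t4 - t3   # path delay - offset
    offset = (master_to_slave - slave_to_master) / 2
    delay = (master_to_slave + slave_to_master) / 2
    return offset, delay


if __name__ == "__main__":
    # Hypothetical timestamps in nanoseconds: the slave clock runs 1500 ns
    # ahead of the master and the one-way path delay is 400 ns.
    offset_ns, delay_ns = ptp_offset_and_delay(
        t1=1_000_000, t2=1_001_900, t3=1_010_000, t4=1_008_900)
    print("offset:", offset_ns, "ns, delay:", delay_ns, "ns")
```

The quality of the result depends on how precisely the four timestamps are captured, which is why taking them in hardware, close to the wire, matters.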
PTP implementation is an ideal task for an IP block instantiated in programmable logic (see last year’s Xcell Daily blog post “Intelligent Gateways Make a Factory Smarter,” written by SoC-e (System on Chip engineering) founder and CEO Armando Astarloa). SoC-e has implemented just such an IEEE 1588v2 PTP IP core in a Xilinx Zynq SoC, which is the core logic device inside of the company’s CPPS-Gate40 Sensor intelligent IIoT gateway. (Note: Software PTP implementations are neither fast nor deterministic enough for many IIoT applications.)
SoC-e CPPS-Gate40 Sensor intelligent IIoT gateway
You can see the SoC-e PTP IP core in the very center of this CPPS-Gate40 block diagram:
SoC-e CPPS-Gate40 Sensor intelligent IIoT gateway block diagram
According to the SoC-e Web page, the company’s IEEE 1588v2 IP core in the CPPS-Gate40 Sensor gateway can deliver sub-microsecond network synchronization. How is such a small number possible? As Jamal says in his EETimes interview, “bit times (time on the wire) for a 64-byte frame at GigE rates is 512ns.” That’s how.
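The 512ns figure is simple arithmetic: 64 bytes is 512 bits, and each bit takes 1ns at 1Gbps. A minimal sketch of that calculation follows; the function name is hypothetical, and the calculation counts only the frame bytes themselves, not the preamble or inter-frame gap.

```python
def wire_time_ns(frame_bytes, line_rate_bps):
    """Time on the wire for the frame bytes alone, in nanoseconds."""
    return frame_bytes * 8 * 1_000_000_000 / line_rate_bps


if __name__ == "__main__":
    for label, rate_bps in (("100 Mbps", 100_000_000),
                            ("1 Gbps", 1_000_000_000)):
        print(label, wire_time_ns(64, rate_bps), "ns")
```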
With last week’s introduction of the Digilent Arty Z7 Zynq SoC training and dev board, I felt it was time to review some of the low-cost boards that occupy an outsized piece of mindshare in my head. So here are seven that have appeared previously in the Xcell Daily blog, listed in alphabetical order by vendor:
Analog Devices ADALM-PLUTO ($149): Students from all levels and backgrounds looking to improve their RF knowledge will want to take a look at the new ADALM-PLUTO SDR USB Learning Module from Analog Devices. The $149 USB module has an RF range of 325MHz to 3.8GHz with separate transmit and receive channels and 20MHz of instantaneous bandwidth. It pairs two devices that seem made for each other: an Analog Devices AD9363 Agile RF Transceiver and a Xilinx Zynq Z-7010 SoC.
Analog Devices’ $149 ADALM-PLUTO SDR USB Learning Module
Digilent ARTY Z7 ($149 to $209): The first thing you’ll note from the Arty Z7 dev board photo is that there’s a Zynq SoC in the middle of the board. You’ll also see the board’s USB, Ethernet, Pmod, and HDMI ports. On the left, you can see double rows of tenth-inch headers in an Arduino/chipKIT shield configuration. There are a lot of ways to connect to this board, which should make it a student’s or experimenter’s dream board considering what you can do with a Zynq SoC.
Digilent Arty Z7 dev board for makers and hobbyists
Digilent PYNQ-Z1 ($229): PYNQ is an open-source project that makes it easy for you to design embedded systems based on the Xilinx Zynq-7000 SoC using the Python language, associated libraries, and the Jupyter Notebook, which is a pretty nice, collaborative learning and development environment for many programming languages including Python. PYNQ allows you to exploit the benefits of programmable logic used together with microprocessors to build more capable embedded systems with superior performance when performing embedded tasks.
Digilent PYNQ-Z1 Dev Board
Krtkl’s snickerdoodle ($95 to $195): The amazing, Zynq-based “snickerdoodle one” is a low-cost, single-board computer with wireless capability based on the Xilinx Zynq Z-7010 SoC, available for purchase on the Crowd Supply crowdfunding Web site.
Krtkl’s Zynq-based, WiFi-enabled Snickerdoodle Dev Board
National Instruments myRIO ($400 to $1001): The NI myRIO hardware/software development platform for NI’s LabVIEW system design software is based on the Zynq-7010 All Programmable SoC. About the size of a small paperback book (so that it easily drops into a backpack), the NI myRIO sports ten analog inputs, six analog outputs, left and right audio channels, 40 digital I/O lines (SPI, I2C, UART, PWM, rotary encoder), an on-board, 3-axis accelerometer, and two 34-pin expansion headers.
The Zynq-based NI myRIO
National Instruments roboRIO ($435 for FRC teams): The NI roboRIO robotic controller was specifically designed for the FIRST Robotics Competition (FRC). The FRC event is a particular passion for NI’s founder, Dr. James Truchard.
NI roboRIO for FIRST Robotics Competition teams
Trenz ZynqBerry (€79.00 to €109.00): The Trenz Electronic TE0726 ZynqBerry Dev Board puts a Xilinx Zynq Z-7010 or Z-7020 SoC into a Raspberry-Pi-compatible form factor with 64Mbytes of LPDDR2 SDRAM, four USB ports (in a hub configuration), a 100Mbps Ethernet port, an HDMI port, MIPI DSI and CSI-2 connectors, a PWM digital audio jack, and 128Mbits of Flash memory for configuration and operation.
TE0726 ZynqBerry Dev Board from Trenz
By Adam Taylor
A couple of weeks ago, I talked about the Xilinx reVision stack and the support it provides for OpenVX and OpenCV. One of the most exciting things I explained was how we could accelerate several OpenCV functions (which include the OpenVX Core functions) using the Zynq SoC’s programmable logic. What I did not look at was the other significant part of the reVision stack: its support for machine learning.
Machine learning is increasingly important for embedded-vision applications because it helps systems to evolve from being vision-enabled to being vision-guided autonomous systems. Machine learning is often used for embedded-vision applications to identify and classify information contained within an image. The embedded-vision system uses these identifications and classifications to make informed decisions in real time, enabling increased interaction with the environment.
For those unfamiliar with machine learning, it is most often implemented by the creation and training of a neural network. Neural networks are modelled upon the human cerebral cortex in that each neuron receives an input, processes it, and communicates the processed signal to another neuron. Neural networks typically consist of an input layer, internal layer(s), and an output layer.
Those familiar with machine learning may have come across the term “deep learning.” This is where there are several hidden layers in the neural network, allowing more complex machine-learning algorithms to be implemented.
When working with neural networks in embedded-vision applications, we need to use a 2D network. This is where Convolutional Neural Networks (CNNs) are used. CNNs are deep-learning networks that contain several convolutional and sub-sampling layers along with a separate, fully connected network to perform the final classification. Within the convolution layer, the input image will be broken down into several overlapping smaller tiles.
The results from this convolution layer are used to create an activation map, using an activation layer in the network placed before further sub-sampling and additional stages and preceding the final, fully connected network. The exact implementation of the CNN network varies depending upon the network architecture implemented (GoogLeNet, SSD, AlexNet). However, a CNN will typically contain at least the following elements:
The weights used for each of these elements are determined via training, and one of the CNN’s advantages is the relative ease of training the network. Training requires large data sets and high-performance computers to correctly determine the weights for each stage.
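The convolution, activation, and sub-sampling stages described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 5x5 input, the 2x2 kernel weights (hand-picked here rather than trained), the choice of ReLU as the activation, and the choice of 2x2 max pooling as the sub-sampling step are all assumptions, and the function names are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over every overlapping tile of the image.

    As in most CNN frameworks, this is a cross-correlation (no kernel flip).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Activation: keep positive responses, zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Sub-sampling: keep the largest value in each non-overlapping 2x2 block."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


if __name__ == "__main__":
    # Hypothetical 5x5 image with a vertical edge between columns 1 and 2.
    image = [[0, 0, 1, 1, 1] for _ in range(5)]
    # Hand-picked kernel that responds to left-to-right intensity increases.
    kernel = [[-1, 1],
              [-1, 1]]
    activation_map = relu(conv2d_valid(image, kernel))
    print(max_pool_2x2(activation_map))  # [[2, 0], [2, 0]]
```

In a trained network, the kernel weights come from the training process rather than being chosen by hand, and many such kernels run in parallel in each convolution layer.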
To ease the development of machine-learning applications, many engineers use a framework like Caffe, which supports the implementation and training of machine learning. The use of frameworks allows us to work at a higher level and maximize reuse. Using a framework, we don’t need to start from scratch each time we develop an application.
The Xilinx reVision stack provides an integrated Caffe framework flow, which allows us to take the prototxt definition of the network and the trained weights to deploy the machine-learning application. (Note that network training is separate and distinct from deployment.) To enable this, the Xilinx reVision stack provides several hardware-accelerated functions that can be implemented within the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL (programmable logic) to create the machine-learning inference engine. The reVision stack also provides examples for a wide range of network structures, enabling us to get up and running with our machine-learning application without the need to initially compile the PL design. Once we are happy with the machine-learning application, we can then use the SDSoC flow to develop our own embedded-vision application containing the optimized machine-learning application.
Using the Zynq PL provides for an optimal implementation that delivers faster response times when interacting with the embedded-vision system environment. This is especially true as machine-learning applications are increasingly implemented using fixed-point data types such as INT8, which are ideal for implementation in DSP elements.
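To show why INT8 maps well onto integer multiply-accumulate hardware, here is a minimal sketch of symmetric fixed-point quantization followed by an integer dot product. The scale factor, the example weights and activations, and the function names are hypothetical, and this is not the reVision stack’s quantization scheme.

```python
def quantize_int8(values, scale):
    """Map real values to signed 8-bit integers, saturating at the INT8 limits."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def int8_dot(a_q, b_q):
    """Integer multiply-accumulate; the running sum needs more than 8 bits."""
    return sum(x * y for x, y in zip(a_q, b_q))


if __name__ == "__main__":
    scale = 2 ** -6                      # hypothetical scale: 1 LSB = 1/64
    weights = [0.5, -0.25, 1.0]          # hypothetical trained weights
    activations = [1.0, 0.5, -0.5]       # hypothetical layer inputs

    w_q = quantize_int8(weights, scale)
    a_q = quantize_int8(activations, scale)
    acc = int8_dot(w_q, a_q)

    # Convert the integer accumulator back to a real value for comparison.
    print(acc * scale * scale)
    print(sum(w * a for w, a in zip(weights, activations)))
```

All of the arithmetic between quantization and the final rescale is integer multiplication and addition, which is the kind of work a DSP element performs.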
Machine learning is going to be a hot area for several applications. So I will be coming back to this topic in detail as the MicroZed Chronicles progress—with some examples of course.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
The multi-GHz processing capabilities of Xilinx FPGAs never fail to amaze me, and the following video from National Instruments (NI) demonstrating the real-time signal-generation and analysis capabilities of the NI PXIe-5840 VST (Vector Signal Transceiver) is merely one more proof point. The NI VST is designed for use in a wide range of RF test systems including 5G and IoT RF applications, ultra-wideband radar prototyping, and RFIC testing. In the demo below, this 2nd-generation NI VST is generating an RF signal spanning 1.2GHz to 2.2GHz (1GHz of analog bandwidth) containing five equally spaced LTE channels. The analyzer portion of the VST is simultaneously and in real time demodulating and decoding the signal constellations in two of the five LTE channels.
The resulting analysis screen generated by NI’s LabVIEW software tells the story:
The reason that the NI PXIe-5840 VST can perform all of these feats in real time is that there’s a Xilinx Virtex-7 690T FPGA inside pulling the levers, making this happen. (NI’s 1st-generation VSTs employed Xilinx Virtex-6 FPGAs.)
Here's the 2-minute video of the NI VST demo:
Please contact National Instruments directly for more information on its VST family.
For additional blogs about NI’s line of VSTs, see:
If you’re going to strap two 12Gsamples/sec, 16-bit DACs and two 6.4Gsamples/sec, 12-bit ADCs into your VPX/AMC module, you’d better include massive real-time DSP horsepower to tame them. That’s exactly what VadaTech has done with its VPX599 and AMC599 modules by placing a Xilinx Kintex UltraScale KU115 FPGA (along with 16 or 20Gbytes of high-speed DDR4 SDRAM) on the modules’ digital carrier board mated to an FMC analog converter board.
VadaTech AMC599 ADC/DAC Module
VadaTech VPX599 ADC/DAC Module
Here’s a block diagram of the AMC599 module (the VPX599 block diagram is quite similar):
VadaTech AMC599 ADC/DAC Module Block Diagram
At these conversion rates, raw data streams to and from the host CPU are quite impractical so you must, repeat must, have on-board processing and local storage—and what other processing genie besides a Xilinx UltraScale FPGA would you trust to handle and process those sorts of extreme streams?
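A quick calculation shows the scale of the problem. The sketch below multiplies channel count, sample rate, and sample width from the converter figures quoted above to get raw payload rates; it ignores any framing or encoding overhead, and the function name is hypothetical.

```python
def raw_rate_gbps(channels, gsamples_per_sec, bits_per_sample):
    """Raw converter payload rate in Gbps, ignoring any link overhead."""
    return channels * gsamples_per_sec * bits_per_sample


if __name__ == "__main__":
    dac_gbps = raw_rate_gbps(channels=2, gsamples_per_sec=12, bits_per_sample=16)
    adc_gbps = raw_rate_gbps(channels=2, gsamples_per_sec=6.4, bits_per_sample=12)
    print("DAC side:", dac_gbps, "Gbps")
    print("ADC side:", round(adc_gbps, 1), "Gbps")
    print("Combined:", round((dac_gbps + adc_gbps) / 8, 1), "Gbytes/sec")
```

That works out to roughly 384Gbps for the DACs and about 154Gbps for the ADCs at full rate, which is why the data needs to be reduced on the module before it goes anywhere near a host.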
Please contact VadaTech directly for more information on the VPX599 and AMC599 modules.
The just-announced VICO-4 TICO SDI Converter from Village Island employs visually lossless 4:1 TICO compression to funnel a 4K60p video (on four 3G-SDI video streams or one 12G-SDI stream) onto a single 3G-SDI output stream, which reduces infrastructure costs for transport, cabling, routing, and compression in broadcast networks.
VICO-4 4:1 SDI Converter from Village Island
Here’s a block diagram of what’s going on inside of Village Island’s VICO-4 TICO SDI Converter:
And here’s a diagram showing you what broadcasters can do with this sort of box:
The reason this is even possible in a real-time broadcast environment is that the lightweight intoPIX TICO compression algorithm has very low latency (just a very few video lines) when implemented in hardware as IP. (Software-based, frame-by-frame video compression is therefore totally out of the question in an application such as this because it introduces too much delay.)
Looking at the VICO-4’s main (and only) circuit board shows one main chip implementing the 4:1 compression and signal multiplexing. And that chip is… a Xilinx Kintex UltraScale KU035 FPGA. It has plenty of on-chip programmable logic for the TICO compression IP and it has sixteen 16.3Gbps transceiver ports, more than plenty to handle the 3G- and 12G-SDI I/O required by this application.
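The bandwidth arithmetic behind the 4:1 funnel is easy to check. The sketch below uses the nominal SDI line rates (2.97Gbps for 3G-SDI and 11.88Gbps for 12G-SDI), expressed in Mbps so the math stays in integers, and treats the 4:1 ratio as applying directly to the line rate, which is a simplifying assumption. The constant and function names are hypothetical.

```python
SDI_3G_MBPS = 2_970       # nominal 3G-SDI line rate
SDI_12G_MBPS = 11_880     # nominal 12G-SDI line rate
TRANSCEIVER_MBPS = 16_300  # per-port transceiver rate quoted above


def compressed_rate_mbps(input_mbps, ratio):
    """Stream rate after compression by the given ratio (simplified model)."""
    return input_mbps / ratio


if __name__ == "__main__":
    quad_link_mbps = 4 * SDI_3G_MBPS
    print("Four 3G-SDI links carry:", quad_link_mbps, "Mbps")
    print("After 4:1 compression:", compressed_rate_mbps(SDI_12G_MBPS, 4), "Mbps")
    print("Fits in one 3G-SDI link:",
          compressed_rate_mbps(SDI_12G_MBPS, 4) <= SDI_3G_MBPS)
    print("12G-SDI fits in one transceiver port:",
          SDI_12G_MBPS <= TRANSCEIVER_MBPS)
```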
Note: Paltek is distributing Village Island’s VICO-4 board in Japan as an OEM component. The board needs 12Vdc at ~25VA.
For more information about TICO compression IP, see:
A configurable, COG (center-of-gravity), laser-line extraction algorithm allows VRmagic’s LineCam3D to resolve complex surface contours with 1/64 sub-pixel accuracy. (The actual measurement precision, which can be as small as a micrometer, depends on the optics attached to the camera.) The camera must process the captured video internally because, at its maximum 1kHz scan rate, there would be far more raw contour data than can be pumped over the camera’s GigE Vision interface. The algorithm therefore runs in real time on the camera’s internal Xilinx series 7 FPGA, which is paired with a TI DaVinci SoC to handle other processing chores and 2Gbytes of DDR3 SDRAM. The camera’s imager is a 2048x1088-pixel CMOSIS CMV2000 CMOS image sensor with a pipelined global shutter. The VRmagic LineCam3D also has a 2D imaging mode that permits the extraction of additional object information such as surface printing that would not appear on the contour scans (as demonstrated in the photo below).
Here’s a composite photo of the camera’s line-scan contour output (upper left), the original object being scanned (lower left), and the image of the object constructed from the contour scans (right):
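For readers unfamiliar with center-of-gravity line extraction, here is a minimal sketch of the general idea: for each sensor column, the laser line’s position is the intensity-weighted mean of the row indices, which lands between pixels. The threshold parameter, the function names, and the sample intensities are hypothetical, and this generic version is not VRmagic’s implementation.

```python
def cog_subpixel(column, threshold=0):
    """Intensity-weighted centroid (in rows) of one sensor column.

    Pixels at or below the threshold are ignored. Returns None when no pixel
    in the column is bright enough to locate the laser line.
    """
    weights = [max(0, v - threshold) for v in column]
    total = sum(weights)
    if total == 0:
        return None
    return sum(row * w for row, w in enumerate(weights)) / total


def quantize_to_64ths(position):
    """Round a row position to the nearest 1/64 of a pixel."""
    return round(position * 64) / 64


if __name__ == "__main__":
    symmetric = [0, 10, 30, 10, 0]   # peak centered on row 2
    skewed = [0, 10, 30, 20, 0]      # peak pulled toward row 3
    print(cog_subpixel(symmetric))                   # 2.0
    print(quantize_to_64ths(cog_subpixel(skewed)))   # 2.171875
```

Repeating this for every column of every frame reduces each captured image to one position per column, which is how the contour data shrinks enough to leave the camera over its network interface.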
In laser-triangulation measurement setups, the camera’s lens plane is not parallel to the scanned object’s image plane, which means that only a relatively small part of the laser-scanned image would normally be in focus due to limited depth of focus. To compensate for this, the LineCam3D integrates a 10° tilt-shift adapter into its rugged IP65/67 aluminum housing, to expand the maximum in-focus object height. Anyone familiar with photographic tilt-shift lenses—mainly used for architectural photography in the non-industrial world—immediately recognizes this as the Scheimpflug principle, which increases depth of focus by tilting the lens relative to both the imager plane and the subject plane. It’s fascinating that this industrial camera incorporates this ability into the camera body so that any C-mount lens can be used as a tilt-shift lens.
For more information about the LineCam3D camera, please contact VRmagic directly.
There’s a new line in the table for Spartan-7 FPGAs in the FPGA selection guide on page 2 showing an expanded-temperature range option of -40 to +125°C for all six family members. These are “Expanded Temp” -1Q devices. So if you have the need for extreme hi-temp (or low-temp) operation, you might want to check into these devices. Ask your friendly neighborhood Xilinx or Avnet sales rep.
For more information about the Spartan-7 FPGA product family, see:
National Instruments’ (NI’s) VirtualBench All-in-One Instrument, based on the Xilinx Zynq Z-7020 SoC, combines a mixed-signal oscilloscope with protocol analysis, an arbitrary waveform generator, a digital multimeter, a programmable DC power supply, and digital I/O. The PC- or tablet-based user-interface software allows you to make all of those instruments play together as a troubleshooting symphony. That point is made very apparent in this new 3-minute video demonstrating the speed at which you can troubleshoot circuits using all of the VirtualBench’s capabilities in concert:
For more Xcell Daily blog posts about the NI VirtualBench All-in-One instrument, see:
For more information about the VirtualBench, please contact NI directly.