Earlier this month, Xilinx held a developer forum in Frankfurt, Germany, where Ramine Roane, Xilinx’s Senior Director for Software and IP, discussed the growing role of Xilinx All Programmable devices in his opening remarks. Those remarks appear in a New Electronics article written by Neil Tyler titled “Resurgence of interest in FPGAs helped by new services via the Cloud.” Roane started by stating something that any design team already knows: CPU architectures are failing to meet the demands of increasing workloads because Dennard frequency and power scaling (often erroneously lumped into Moore’s Law, which is really about transistor and density scaling) essentially died several years ago after several decades of robust health. The current workaround, multicore architectures, rapidly hits its own limits in most embedded systems, where there just aren’t enough tasks to distribute across dozens of processor cores.
The article then quotes Roane:
“There are too many transistors switching at the same time and current leakage at lower geometries is hitting power constraint limits, and this is all happening at a time when workload demand is growing exponentially both in the Cloud and at the edge.”
One solution, hardware application accelerators, only makes sense if production volumes justify it. For that, you need a killer app, said Roane.
Problem: there just aren’t that many killer apps.
The current situation plays to the strengths of Xilinx All Programmable devices, which can be reconfigured for a truly wide range of applications. “They provide configurable processor sub-systems and hardware that can be reconfigured dynamically,” said Roane.
The problem, of course, is that taking advantage of the programmable hardware resources in Xilinx devices has not been as easy as it might be. In the past, you needed specialized hardware-design skills; you needed to know Verilog or VHDL; you needed to wade into possibly unfamiliar hardware waters.
Roane emphasized that things are very different today. As the article states, “Xilinx and its growing ecosystem of partners are now delivering a much richer development stack so that hardware, embedded and application software developers can program them more easily by using higher level programming options, like C, C++ and OpenCL.”
“We are now able to deliver a development stack that designers are increasingly familiar with and which is also available on the Cloud via secure cloud services platforms,” added Roane, referring to Xilinx-based cloud acceleration offerings from Amazon Web Services (AWS EC2 F1 instances) and Alibaba Cloud.
For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:
Korea-based ATUS has just published a 4-minute video of its Zynq-based CNN (convolutional neural network) performing real-time object recognition on a 416x234-pixel dashcam video stream at 46.7fps. Reliable, real-time object recognition is essential to the development of autonomous driving and ADAS systems. ATUS’ design is based on a Xilinx Zynq Z-7020 SoC running a YOLO (you only look once) object-detection system. In the video below, the system recognizes cars, trucks, buses, and pedestrians.
Without a doubt, how to use the Zynq SoC’s XADC in our designs is the topic I receive the most emails about. Just before Christmas, I received two questions that I thought would make for a pretty good blog post. The first asked how to use the AXI streaming output with DMA, while the follow-up question asked how to output the captured data for further analysis.
The AXI streaming output of the XADC is very useful when we want to create a signal-processing path, which might include filters, FFTs, or our own custom HLS processing block for example. It is also useful if we want to transfer many samples as efficiently as possible to the PS (processing system) memory for output or higher-level processing and decision making within the PS.
The XADC outputs samples on the AXI Stream for each of its channels when it is enabled. The AXI Streaming interface implements the optional TID bus, which identifies the channel currently streaming on the AXI stream data output so that downstream IP cores can correlate the AXI Stream data with an input channel. If we output only a single XADC channel, we do not need to monitor the TID signal information. However, if we are outputting multiple channels in the AXI stream, we need to pay attention to the TID information to ensure that our processing blocks use the correct samples.
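To make the TID bookkeeping concrete, here is a minimal software model in C (not HDL) of a downstream consumer that keeps only the beats belonging to one channel. The `xadc_beat` struct, the `demux_channel` function, and the channel ID values used to exercise it are hypothetical illustrations; a real design would do the equivalent filtering in PL logic.

```c
#include <stddef.h>
#include <stdint.h>

/* One AXI4-Stream beat as seen downstream of the XADC (hypothetical model):
   the 16-bit TDATA word plus the channel ID carried on the optional TID bus. */
typedef struct {
    uint16_t tdata;  /* conversion result */
    uint8_t  tid;    /* channel identifier */
} xadc_beat;

/* Copy only the beats whose TID matches `channel` into `out`.
   Returns the number of samples written (never more than out_cap). */
size_t demux_channel(const xadc_beat *beats, size_t n, uint8_t channel,
                     uint16_t *out, size_t out_cap)
{
    size_t count = 0;
    for (size_t i = 0; i < n && count < out_cap; ++i) {
        if (beats[i].tid == channel) {
            out[count++] = beats[i].tdata;
        }
    }
    return count;
}
```

With a single enabled channel every beat carries the same TID, so the filter passes everything through, which is why the single-channel case can ignore TID entirely.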
We need a more complex DMA architecture to support multi-channel XADC operation. The DMA IP Core must have its scatter-gather engine enabled to provide multi-channel support. Of course, this level of complexity is not required if we’re only using a single XADC channel.
For the following example, we will be using a single XADC output channel so that I can demonstrate these concepts simply. I will return to this example in a later blog and expand the design for multiple output channels.
We will use a DMA transfer from the PL to the PS to move XADC samples into the PS memory. However, we cannot connect the XADC and DMA AXI stream interfaces directly. That design won’t function correctly because the DMA IP Core requires the assertion of the optional AXI Stream signal TLast to signal AXI transfer completion. Unfortunately, the XADC’s AXI Streaming output does not contain this signal so we need to add a block to drive the TLast signal between the XADC and DMA.
This interface block is very simple. It should allow the user to define the size of the AXI transfer and it needs to assert the TLast signal once the AXI transfer size is reached. Rather helpfully, an IP block called Tlast_gen that implements this functionality has been provided here. All we need to do is add this IP core to the project IP repository and include it in our design.
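The counting behavior described above can be modeled in a few lines of C. This is a behavioral sketch under the assumption that the block simply counts accepted beats (cycles where TVALID and TREADY are both high) and flags every Nth one; it is not the source of the Tlast_gen IP, and the `tlast_gen` type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Behavioral model of a TLAST generator: counts accepted beats and flags
   the last beat of each packet of `pkt_len` beats. */
typedef struct {
    uint32_t pkt_len;  /* beats per packet (set at run time, e.g. via AXI GPIO) */
    uint32_t count;    /* beats accepted so far in the current packet */
} tlast_gen;

void tlast_gen_init(tlast_gen *g, uint32_t pkt_len)
{
    g->pkt_len = pkt_len;
    g->count = 0;
}

/* Call once per accepted beat (TVALID && TREADY).
   Returns the TLAST value that accompanies this beat. */
bool tlast_gen_beat(tlast_gen *g)
{
    g->count++;
    if (g->count >= g->pkt_len) {
        g->count = 0;      /* the next beat starts a new packet */
        return true;
    }
    return false;
}
```

Because the counter restarts after each TLAST, the same setting produces a TLAST at the end of every packet, which is what lets the DMA complete transfer after transfer.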
We can use an AXI GPIO in the PL to control the size of the transfer dynamically at run time.
Creating the block diagram within Vivado for this example is very simple. In fact, most of the design can be automatically created using Vivado, as shown in this video:
The final block diagram is below. I have uploaded the TCL BD description to my GitHub to enable more detailed inspection and recreation of the project.
Once the project has been completed in Vivado, we can build the design and export it to SDK to create the application.
The software application performs the following functions:
Initialize the AXI GPIO
Initialize and configure the XADC to read a single channel (VP/VN on the Zedboard)
Reset the DMA channel
Loop forever, performing the following:
Simple DMA transfer from the PL to the PS
Flush the cache at the PS transfer address to ensure that we can see the data in the PS DDR memory
Flushing the cache is very important. If we don’t flush the cache, we will not see the captured ADC values in memory when we examine the PS DDR memory.
We also need to take care when setting the number of ADC samples transferred. The output stream is 16 bits wide. The tlast_gen block counts these 16-bit (2-byte) transfers, while the DMA counts bytes, so we need to set the tlast_gen transfer size to half the number of bytes the DMA is configured to transfer. If we fail to set this correctly, we will only be able to perform the transfer once. Then the DMA will hang because the tlast signal will not be generated.
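The sizing rule above is easy to get wrong, so it can help to derive one number from the other rather than hard-coding both. The sketch below assumes 2-byte stream beats, as stated in the text; the helper names are hypothetical, and on hardware the resulting beat count would be written to the AXI GPIO that sets the tlast_gen transfer size.

```c
#include <stdbool.h>
#include <stdint.h>

#define XADC_BEAT_BYTES 2u   /* the XADC AXI stream carries 16-bit (2-byte) beats */

/* Beat count to program into the TLAST generator so that TLAST coincides
   with the last beat of a DMA transfer of dma_bytes bytes.
   Returns 0 if dma_bytes is zero or not a whole number of beats. */
uint32_t tlast_beats_for_dma(uint32_t dma_bytes)
{
    if (dma_bytes == 0 || dma_bytes % XADC_BEAT_BYTES != 0)
        return 0;
    return dma_bytes / XADC_BEAT_BYTES;
}

/* True when TLAST arrives exactly as the DMA byte count is exhausted. */
bool sizes_match(uint32_t dma_bytes, uint32_t tlast_beats)
{
    return (uint64_t)tlast_beats * XADC_BEAT_BYTES == dma_bytes;
}
```

For a 256-byte capture like the one in the memory watch window, this gives a tlast_gen size of 128 beats; setting both values to 256 is exactly the mismatch the text warns about.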
Generation of the tlast signal
When I ran this software on the ZedBoard, I could see the ADC values changing as the DMA transfer occurred in a memory watch window.
Memory watch window showing the 256-byte DMA capture
Now that we can capture the ADC values in the PS memory, we may want to extract this information for further analysis—especially if we are currently verifying or validating the design. The simplest way to do this is to write out the values over an RS-232 port. However, this can be a slow process and it requires modification to the application software.
Another method we can use is the XSCT Console within the debug view in SDK. Using XSCT we can:
Read out a memory address range
Read out a memory address range as a TCL list
Read out a memory address range to a binary file
The simplest approach is to output a memory address range. To do this, we use the command:
mrd <address> <number of words>
Reading out 256 words from the address 0x00100000
While this technique outputs the data, the output format is not the easiest for an analysis program to work with because it contains both address and data values.
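If you do capture plain `mrd` output (for example, by pasting it into a text file), a small filter can strip the addresses. This sketch assumes each line has the form `address: value` with both fields in hexadecimal; check that against the output of your XSCT version before relying on it. `parse_mrd_text` is a hypothetical helper, not part of XSCT.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Extract the data words from text shaped like "00100000:   0FA01234",
   one address/value pair per line (assumed format). Lines that do not
   match are skipped. Returns the number of words written to out. */
size_t parse_mrd_text(const char *text, uint32_t *out, size_t out_cap)
{
    size_t n = 0;
    while (*text != '\0' && n < out_cap) {
        size_t len = strcspn(text, "\n");        /* length of this line */
        char line[128];
        size_t copy = len < sizeof line - 1 ? len : sizeof line - 1;
        memcpy(line, text, copy);
        line[copy] = '\0';

        unsigned long value;
        /* Skip the address field, keep the data field. */
        if (sscanf(line, "%*lx: %lx", &value) == 1)
            out[n++] = (uint32_t)value;

        text += len;                             /* move past the line */
        if (*text == '\n')
            text++;                              /* and its newline */
    }
    return n;
}
```

Copying each line into a bounded buffer first keeps `sscanf` from running past a newline into the next address.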
We can obtain a more useful data output by requesting the data to be output as a TCL list using the command:
mrd -value -size h 0x00100000 128
Reading out a TCL list
We can then use this TCL list with a program like Microsoft Excel, MATLAB, or Octave to further analyze the captured signal:
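Before plotting, you will usually want to convert the raw words to volts. The sketch below assumes that `mrd -value` prints decimal integers, that the XADC samples VP/VN in unipolar mode with a 1V full-scale range, and that each 16-bit word holds the 12-bit conversion result MSB-justified in bits [15:4]. Verify these against your own XADC configuration; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Convert one 16-bit XADC word to volts (assumed unipolar, 1 V full scale,
   12-bit code in bits [15:4]). */
double xadc_unipolar_volts(uint16_t word)
{
    return (double)(word >> 4) / 4096.0;
}

/* Parse a whitespace-separated list of decimal integers, such as a Tcl list
   copied from the XSCT console, converting each to volts.
   Returns the number of values converted. */
size_t tcl_list_to_volts(const char *list, double *volts, size_t cap)
{
    size_t n = 0;
    while (n < cap) {
        char *end;
        long v = strtol(list, &end, 10);
        if (end == list)
            break;                               /* no further numbers */
        volts[n++] = xadc_unipolar_volts((uint16_t)v);
        list = end;
    }
    return n;
}
```

The same conversion can of course be done in a spreadsheet column; doing it in code is handy when the capture feeds further automated analysis.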
Captured Data analyzed and plotted in Excel.
Finally, if we want to download a binary file containing the memory contents we can use the command:
mrd -bin -file part233.bin 0x00100000 128
We can then read this binary file into an analysis program like Octave or MATLAB or into custom analysis software.
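Reading the binary dump in custom software mostly comes down to byte order. Assuming the Zynq SoC’s Arm cores run little-endian (the usual configuration), a raw dump of 16-bit samples can be decoded as below. This is a minimal C sketch with a hypothetical `read_le16_samples` helper; it ignores a trailing odd byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Read 16-bit little-endian samples from a raw memory dump.
   Returns the number of complete samples read (0 if the file cannot be opened). */
size_t read_le16_samples(const char *path, uint16_t *out, size_t cap)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return 0;
    size_t n = 0;
    unsigned char b[2];
    while (n < cap && fread(b, 1, 2, f) == 2)
        out[n++] = (uint16_t)(b[0] | (b[1] << 8));  /* low byte first */
    fclose(f);
    return n;
}
```

Assembling each sample from its bytes, rather than reading straight into a `uint16_t` array, keeps the decoder correct even on a big-endian analysis host.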
Hopefully by this point I have answered the questions posed and shared the answers more widely, enabling you to get your XADC designs up and running faster.
When the engineering team at MADV Technology set out in late 2015 to develop the Madventure 360, the world’s smallest, lightest, and thinnest consumer-grade 360° 4K video camera, it discovered that an FPGA was the only device capable of meeting all of its project goals. As a result, the Madventure camera relies on a Xilinx Spartan-6 FPGA to stitch and synchronize the video streams from two image sensors (one aimed to the front, one aimed to the back) while also performing additional image processing. The tiny, palm-sized camera measures only 3.1x2.6 inches and is only 0.9 inches thick. It currently sells on Amazon for $309.99, complete with selfie stick and mini tripod.
MADV Madventure 360° video camera
Now you can hear a little about how MADV Technology created this tiny video wonder in this new Powered by Xilinx video:
Photonfocus has just introduced another industrial video camera in its MV1 line: the MV1-D1280-L01-1280-G2, a 1280x1024-pixel, 85fps (948fps in burst mode) camera with a GigE interface. It implements all standard features of the MV1 platform as well as burst mode, MROI (multiple regions of interest), and binning. In burst mode, the camera’s internal 2Gbit burst memory can store image sequences for subsequent analysis. The amount of storage depends on image resolution: 250msec at 1024x124 pixels, 1000msec at 512x512 pixels. The maximum amount of stored video also varies with the size of the specified ROI.
The MV1-D1280-L01-1280-G2 1280x1024-pixel, 85fps (948fps in burst mode) industrial video camera with a GigE interface
Like many of its existing industrial video cameras, Photonfocus’ MV1-D1280-L01-1280-G2 is based on a platform design that uses a Xilinx Spartan-6 FPGA for a foundation. Use of the Spartan-6 FPGA permitted Photonfocus to create an extremely flexible vision-processing platform that serves as a common hardware foundation for several radically different types of rugged, industrial cameras in multiple camera lines. These cameras use very different imaging sensors to meet a wide variety of application requirements. The different sensors have very different sensor interfaces, which is why using the Spartan-6 FPGA—an interfacing wizard if there ever was one—as a foundation technology is such a good idea.
Here are some of the other Xilinx-based Photonfocus cameras covered previously in Xcell Daily:
Embedded-vision applications present many design challenges and a new ElectronicsWeekly.com article written by Michaël Uyttersprot, a Technical Marketing Manager at Avnet Silica, and titled “Bringing embedded vision systems to market” discusses these challenges and solutions.
First, the article enumerates several design challenges including:
Meeting hi-res image-processing demands within cost, size, and power goals
Handling a variety of image-sensor types
Handling multiple image sensors in one camera
Real-time compensation (lens correction) for inexpensive lenses
Distortion correction, depth detection, dynamic range, and sharpness enhancement
Next, the article discusses Avnet Silica’s various design offerings that help engineers quickly develop embedded-vision designs. Products discussed include:
The ISE 2018 show for Pro A/V and Broadcast equipment users and designers opens in Amsterdam in about three weeks and Xilinx will be showing several Pro A/V and broadcast technologies in its booth (#15-K222) including three all-new ones:
DisplayPort 1.4: With 8K cameras and display panels already beginning to appear, Pro AV system designers can now design products based on Xilinx UltraScale and UltraScale+ devices with 16.3Gbps GTH SerDes ports that can ingest 8K video for subsequent processing, compression, and transport.
SMPTE-2110: Macnica Technology will be demonstrating the SMPTE-2110 broadcast standard for media-over-IP transport running on its VIPA Professional PCIe Video Transport Interface card, which is based on a Xilinx Kintex UltraScale KU035 FPGA. With a mezzanine codec like intoPIX’s TICO, you can pipe 4K60 video through a standard 10G Ethernet pipe—which is great for designers of high-end video projectors, KVMs, media gateways, and video-over-IP boxes.
Macnica Technology’s VIPA Professional PCIe Video Transport Interface Card can implement SMPTE-2110 and TICO for 4K60 video transmission over 10GbE
4K, low-latency (60msec glass-to-glass) HEVC video streaming running on a Zynq UltraScale+ MPSoC, useful for designing video-conferencing systems and KVMs.
You’ll also see Omnitek demonstrating its 4K-video warping and stitching subsystem IP running on a Xilinx ZCU106 eval kit. On the ZCU106’s Zynq UltraScale+ MPSoC, the warping subsystem can create real-time image warps on video streams with images as large as 4096x2160 pixels at 60fps and the image-stitching subsystem can stitch as many as eight video streams into one 4K/UHD stream in real time.
Although Deutsche Börse Group, one of the world’s largest stock and securities exchanges, already had a packet-capture and time-stamping solution in place, a major 2017 upgrade and redesign of its co-location network added 60 Metamako K-Series networking devices, including the company’s MetaApp 32 Network Application Platform, to its data center in Frankfurt, Germany. The upgrade was a response to increasing customer demand for market fairness and precision in network-based trading. It significantly strengthens network-monitoring capabilities and gives full visibility into network transactions by capturing every packet entering and exiting Deutsche Börse Group’s network. (Metamako has published a case study of this application.)
Metamako’s MetaApp 32 is an adaptable network application platform that brings intelligence to the network edge for some of the most demanding, latency-critical networks including high-frequency trading and analytics. It is based on Xilinx Virtex-7 FPGAs, which means that the company’s Network Application Platform can run multiple networking applications in parallel—very quickly.
First, the YouTube video demonstrates the IoT design interacting with an app on a mobile phone. Then the video takes you step-by-step through the creation process using the Xilinx Vivado development environment.
The YouTuber writes:
“I implemented a web server using Python and bottle framework, which works with another C++ application. The C++ application controls my custom IPs (such as PWM) implemented in PL block. A user can control LEDs, 3-color LEDs, buttons and switches mounted on ZYBO board.”
The YouTube video’s Web page also lists the resources you need to recreate the IoT design:
A quick look at the latest product table for the Xilinx Zynq UltraScale+ RFSoC will tell you that the sample rate for the devices’ RF-class, 14-bit DAC has jumped to 6.554Gsamples/sec, up from 6.4Gsamples/sec. I asked Senior Product Line Manager Wouter Suverkropp about the change and he told me that the increase supports “…an extra level of oversampling for DOCSIS3.1 [designers]. The extra oversampling gives them 3dB processing gain and therefore simplifies the external circuits even further.”
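The 3dB figure follows from the standard oversampling argument. Assuming the quantization noise is white and spread evenly across the Nyquist band, and that the signal bandwidth B stays fixed, the processing gain at sample rate f_s and the extra gain from doubling the oversampling ratio are:

```latex
\mathrm{PG} = 10\log_{10}\!\left(\frac{f_s}{2B}\right)\ \mathrm{dB},
\qquad
\Delta\mathrm{PG}\big|_{f_s \to 2 f_s} = 10\log_{10} 2 \approx 3.01\ \mathrm{dB}
```

So a 3dB gain corresponds to a 2x increase in the oversampling ratio; the quote does not spell out exactly how the new sample rate is used to achieve it.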
Zynq UltraScale+ RFSoC Conceptual Diagram
For more information about the Zynq UltraScale+ RFSoC, see:
This design runs on an EMC² EMC2-ZU3EG Development Platform, which is based on a Xilinx Zynq UltraScale+ ZU3EG MPSoC. A VITA57.1 FMC-compatible daughter card plugged into the EMC² Development Platform provides the HDMI input/output interface. Compute-intensive tasks are implemented in hardware using standard or custom IP cores and Vivado HLS. A Xilinx LogiCORE AXI VDMA IP core provides high-bandwidth memory access between the board’s DDR4 SDRAM and the HDMI peripherals.
Sundance Multiprocessor’s Hardware-Accelerated Sobel Filtering Demo running on the Xilinx Zynq UltraScale+ MPSoC
The project’s software was developed using the Xilinx SDSoC 2017.2 development environment. It’s a bare-metal application that executes on the Zynq UltraScale+ MPSoC’s Arm Cortex-A53 processor. The Sobel filter algorithm alternately runs unaccelerated on the Arm Cortex-A53 processor and then with hardware acceleration using IP instantiated in the Zynq UltraScale+ MPSoC’s PL. The results are impressive:
Without hardware acceleration: 0.73fps
With hardware acceleration: 31fps
That’s a 42x speedup!
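For readers who haven't met the algorithm, here is a plain-C Sobel filter of the general kind the unaccelerated Cortex-A53 path would run. It is a generic sketch, not Sundance's code: it assumes an 8-bit grayscale, row-major frame, uses the common |Gx| + |Gy| approximation for gradient magnitude, and zeroes the one-pixel border. In an SDSoC flow, a self-contained function like this is the sort of candidate one would select for hardware acceleration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sobel edge filter on an 8-bit grayscale image stored row-major.
   Gradient magnitude is approximated as |Gx| + |Gy| and clamped to 255;
   the one-pixel border is written as 0. */
void sobel_u8(const uint8_t *in, uint8_t *out, int width, int height)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (x == 0 || y == 0 || x == width - 1 || y == height - 1) {
                out[y * width + x] = 0;
                continue;
            }
            const uint8_t *p = in + y * width + x;   /* centre pixel */
            int gx = -p[-width - 1] + p[-width + 1]
                     - 2 * p[-1]    + 2 * p[1]
                     - p[width - 1] + p[width + 1];
            int gy = -p[-width - 1] - 2 * p[-width] - p[-width + 1]
                     + p[width - 1] + 2 * p[width]  + p[width + 1];
            int mag = abs(gx) + abs(gy);
            out[y * width + x] = (uint8_t)(mag > 255 ? 255 : mag);
        }
    }
}
```

Every output pixel depends only on a 3x3 neighborhood, which is why the computation pipelines so well in programmable logic compared with a sequential loop on the processor.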
Here’s a simplified block diagram of the reference design’s hardware from Vivado 2017.2:
“…our FPGA implementation achieves a 10x speedup on the compute intensive part of the code, compared to an optimized parallel implementation on multicore CPU, and it delivers a 3.5x speedup at application level for the given setup.”
The University of Kaiserslautern’s Microelectronic Systems Design Research Group has been working on accelerating financial applications using FPGAs in connection with high-performance computing systems since 2010, and that research has recently migrated to cloud-based computing systems including Amazon’s EC2 F1 instance, which is based on Xilinx Virtex UltraScale+ FPGAs. The results in this white paper are based on OpenCL code and the Xilinx SDAccel development environment.
For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:
HuMANDATA has introduced its EDX-303 series of FPGA dev boards featuring the three largest members of the new Xilinx Spartan-7 FPGAs: the S50T, S75T, and S100T. One other notable device on the board: a nonvolatile Everspin MR2A16AMA35 4Mbit MRAM (magnetic RAM) directly controlled by the Spartan-7 FPGA.
The boards all measure 53x54mm and share a common block diagram:
HuMANDATA EDX-303 dev board block diagram
Here’s a photo of the board:
HuMANDATA EDX-303 dev board for the three largest Spartan-7 FPGAs
Please contact HuMANDATA directly for more information about the EDX-303 dev board series.
Xilinx has announced availability of automotive-grade Zynq UltraScale+ MPSoCs, enabling development of safety-critical ADAS and autonomous-driving systems. The 4-member Xilinx Automotive XA Zynq UltraScale+ MPSoC family is qualified according to AEC-Q100 test specifications with full ISO 26262 ASIL-C level certification and is ideally suited for various automotive platforms, delivering the right performance/watt while integrating critical functional safety and security features.
The XA Zynq UltraScale+ MPSoC family has been certified to meet ISO 26262 ASIL-C level requirements by Exida, one of the world's leading accredited certification companies specializing in automation and automotive system safety and security. The product includes a "safety island" designed for real-time processing functional safety applications that has been certified to meet ISO 26262 ASIL-C level requirements. In addition to the safety island, the device’s programmable logic can be used to create additional safety circuits tailored for specific applications such as monitors, watchdogs, or functional redundancy. These additional hardware safety blocks effectively allow ASIL decomposition and fault-tolerant architecture designs within a single device.
The design solutions we create using Zynq SoC and Zynq UltraScale+ MPSoC devices are complex embedded systems. This blog post about embedded system design will address the application challenges using both the Processing System (PS) and the Programmable Logic (PL) found in the Zynq SoC and MPSoC devices.
When it comes to commissioning these systems, we need to be able to debug interactions between the PS and PL at run time. We can use breakpoints to halt the software so that we can examine values in memory and registers. We can also use Integrated Logic Analyzers (ILA) to examine designs within the PL. What we need is a method to allow software breakpoints and PL ILAs to work together to provide maximum information about the system’s behavior.
Cross triggering allows us to do just this. Cross triggering can:
Trigger an ILA when a software breakpoint is hit
Halt the software when an ILA is triggered
Before we can use this debugging technique, we need to enable PS-PL cross triggering for debug in the customization dialog of the processing system (for either the Zynq SoC or Zynq UltraScale+ MPSoC) within Vivado.
Enabling Cross Triggering in the Zynq SoC
Enabling Cross Triggering in the Zynq UltraScale+ MPSoC
We can have as many as four cross triggers in each direction. However, for this example, I am only using one trigger in each direction.
Once enabled, the Zynq SoC/Zynq UltraScale+ MPSoC PS block within our block diagram will include additional ports named TRIGGER_IN_x and TRIGGER_OUT_x. We connect these ports to the ILAs within our PL design to create cross triggers.
Note: We do not need to connect the TRIGGER_IN and TRIGGER_OUT ports to the same ILA as I do in this example. Also, the trigger input and output actually consist of two signals, the trigger signal and a trigger acknowledge.
For this example, I am using one ILA in conjunction with a Zynq SoC PS to cross trigger in both directions:
Cross triggering between an ILA and a Zynq SoC’s PS
Enabling ILA Triggers
Once we have the hardware design completed within Vivado, the next step is to build the design and then export the hardware definition and bit file to the SDK. It is within SDK that we create the application and enable cross triggering within our debug configuration.
To enable cross triggering, we need to update the debug configuration using the debug configuration dialog where we select the “enable cross triggering” option. Once this has been selected, we also need to define the connections between the input and output triggers by creating new cross-trigger breakpoints. We do this by clicking on the button to the right of the “enable cross triggering” option.
Enabling cross triggering
This will open a dialog that allows you to connect the trigger inputs and outputs. For this example, I need to create two cross-trigger breakpoints for the one input and one output. One trigger goes from the PS to the PL and the other goes from the PL to the PS. To create these, we simply click on the create button.
Cross Trigger Breakpoint Definition
Defining the connection between the PS and PL
This will open a dialog box that enables the connections between the trigger input and outputs to be defined. For this example, I have connected:
PS to PL trigger: CPU Debug output to Fabric Trace Monitor (FTM) Inputs. This will assert triggers to all four cross-trigger signals in the PL when the CPU hits a breakpoint.
PL to PS trigger: FTM outputs to CPU Debug input. This will stop the software application when any of the four PL FTM triggers asserts.
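The routing just described amounts to a fan-out in one direction and an OR in the other. The C model below is a sketch of that logic only, under the assumption of four trigger lines per direction as stated earlier; the function names are hypothetical and nothing here corresponds to an SDK API.

```c
#include <stdbool.h>
#include <stdint.h>

#define PL_TRIGGER_MASK 0xFu   /* four cross-trigger lines in each direction */

/* PS to PL: a CPU breakpoint drives all four PS-to-PL trigger lines. */
uint8_t ps_to_pl_triggers(bool cpu_halted)
{
    return (uint8_t)(cpu_halted ? PL_TRIGGER_MASK : 0u);
}

/* PL to PS: any asserted PL-to-PS trigger line halts the CPU. */
bool pl_halts_cpu(uint8_t pl_trigger_lines)
{
    return (pl_trigger_lines & PL_TRIGGER_MASK) != 0;
}
```

The consequence for the hardware design is that it does not matter which of the four PL trigger ports an ILA is wired to: with this mapping, any of them can stop the processor, and all of them see the breakpoint.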
In this example, both the input and output cross triggers between the PS and the PL connect to the ILA. As such, we can use these triggers within the ILA to determine what action we wish to take.
We also must have a Vivado hardware manager open and connected to the ILAs within the device to successfully execute cross triggering along with SDK.
If we want the software to trigger the ILA when code execution hits a breakpoint, we set the ILA’s trigger mode to TRIG_IN_ONLY or BASIC/ADV_OR_TRIG_IN. With TRIG_IN_ONLY, the ILA triggers only on the external (cross) trigger; with the BASIC/ADV_OR_TRIG_IN modes, it triggers on either the external trigger or its own internal trigger condition. Running the application software with a breakpoint defined will result in the ILA waveform being available for inspection when the breakpoint is reached.
Should we want a logic transaction in the PL to halt the software, we can use the ILA’s trigger output. The ILA can be configured to assert its trigger output when a user-defined internal trigger condition occurs (e.g., a rising edge, or a specific value detected on a bus), or to propagate its input trigger to the output.
Setting the ILA to trigger when a software breakpoint is hit
Setting the ILA to stop software execution when the ILA triggers
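The trigger-mode choices described above can be summarized as a small truth-table model. This C sketch collapses the basic and advanced internal trigger conditions into one `internal_hit` input, and the enum labels are hypothetical stand-ins for the Vivado settings named in the text, not their exact property values.

```c
#include <stdbool.h>

/* Input trigger modes (basic and advanced collapsed into "internal"). */
typedef enum { INTERNAL_ONLY, TRIG_IN_ONLY, INTERNAL_OR_TRIG_IN } ila_trigger_mode;

/* Trigger-output options described in the text (hypothetical labels). */
typedef enum { TRIG_OUT_OFF, TRIG_OUT_INTERNAL, TRIG_OUT_PASS_TRIG_IN } ila_trig_out_mode;

/* Does the ILA capture on this cycle? trig_in is the cross trigger from the PS. */
bool ila_triggers(ila_trigger_mode m, bool internal_hit, bool trig_in)
{
    switch (m) {
    case INTERNAL_ONLY:       return internal_hit;
    case TRIG_IN_ONLY:        return trig_in;
    case INTERNAL_OR_TRIG_IN: return internal_hit || trig_in;
    }
    return false;
}

/* Level driven on the ILA trigger output, which halts the CPU via cross triggering. */
bool ila_trig_out(ila_trig_out_mode m, bool internal_hit, bool trig_in)
{
    switch (m) {
    case TRIG_OUT_OFF:          return false;
    case TRIG_OUT_INTERNAL:     return internal_hit;
    case TRIG_OUT_PASS_TRIG_IN: return trig_in;
    }
    return false;
}
```

Seen this way, the two directions are independent settings: the input mode decides when the ILA captures, and the output mode decides when the processor is stopped.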
This simple example allowed me to perform cross triggering in both directions, with ease.
Cross triggering is a useful technique that helps get to the bottom of troublesome PL/PS interaction issues and can also be of use when we wish to gather evidence for verification in critical applications.
Bitmain’s Antminer S9 Bitcoin Mining Machine uses a Zynq Z-7010 SoC as a main control processor
The Powered by Xilinx program has just published a 3-minute video containing an interview with Yingfei Li, Bitmain’s Marketing Director, and Wenguo Zhang, Bitmain’s Hardware R&D Director. In the video, Zhang explains that the Zynq Z-7010 solved multiple hidden problems with the company’s previous-generation control panel, thanks to the Zynq SoC’s dual-core Arm Cortex-A9 MPCore processor and the on-chip programmable logic.
Thanks to the success that Bitmain has had with Xilinx Zynq SoCs in its Antminer S9 Bitcoin mining machine, the company is now exploring the use of Xilinx 20nm and 16nm devices (UltraScale and UltraScale+) for future, planned AI platforms and products.
DornerWorks is one of only three Xilinx Premier Alliance Partners in North America offering design services, so the company has more than a little experience using Xilinx All Programmable devices. The company has just launched a new learn-by-email series with “interesting shortcuts or automation tricks related to FPGA development.”
The series is free but you’ll need to provide an email address to receive the lessons. I signed up and immediately received a link to the first lesson titled “Algorithm Implementation and Acceleration on Embedded Systems” written by DornerWorks’ Anthony Boorsma. It contains information about the Xilinx Zynq SoC and Zynq UltraScale+ MPSoC and the Xilinx SDSoC development environment.
Need a tiny-but-powerful SOM for your next embedded project? The iWave iW-RainboW-G28M SOM, based on a Xilinx Zynq Z-7007S, Z-7014S, Z-7010, or Z-7020 SoC, is certainly tiny: it’s a 67.6x37mm plug-in SoDIMM. With one or two Arm Cortex-A9 MPCore processors, 512Mbytes of DDR3 SDRAM, 512Mbytes of NAND Flash, Gigabit Ethernet and USB 2.0 ports, and an optional WiFi/Bluetooth module, it certainly qualifies as powerful, and it’s offered in an industrial temperature range (-40°C to +85°C).
iWave’s iW-RainboW-G28M SoDIMM SOM is based on any one of four Xilinx Zynq SoCs
iWave’s SOM design obviously takes advantage of the pin compatibility built into the Xilinx Zynq Z-7000S and Z-7000 device families.
You’ll find the announcement for the iW-RainboW-G28M SoDIMM SOM here and the data sheet here.
Please contact iWave directly for more information about the iW-RainboW-G28M SoDIMM SOM.
Providing good, filtered, reliable, and precise power to FPGAs and SoCs is an engineering challenge. If you’d like help meeting this challenge for Xilinx FPGAs, Zynq SoCs, or Zynq UltraScale+ MPSoCs in the 10W to 50W range, then Avnet and Infineon’s January 16 Webinar titled “Infineon DC/DC PMIC for FPGAs/SoCs for 10W to 50W Applications” is for you.
Here are the key takeaways you should get from this free Webinar:
Learn to simplify your BOM by replacing many of your standard regulators with one Infineon PMIC
Learn how updating to Xilinx’s newest recommendations for rail power consolidation leads to the best integration for your use case
Discover how to reuse your power design to cover your complete set of applications with FPGAs/SoCs and ASICs
Lower total solution cost through high-level integration and component reduction (35% board area reduction)
Dialog Semiconductor’s app note, AN-PM-096, titled “Power Solutions for Xilinx Spartan-7 Devices,” has a full discussion of this topic and provides a reference design for the DA9062 PMIC that consumes a mere 420mm² (20x21mm) of PCB area.
Here’s how the DA9062 PMIC’s four internal buck regulators and four internal LDO regulators match up to the power requirements of a Spartan-7 FPGA:
And here’s the pcb footprint for the DA9062 reference design:
As noted by Xilinx Senior Tech Marketing Manager for Analog and Power Delivery Cathal Murphy, the DA9062 PMIC makes a good match for the Xilinx Spartan-6 FPGA family as well.
The recent introduction of the groundbreaking Xilinx Zynq UltraScale+ RFSoC means that there are big changes in store for the way advanced RF and comms systems will be designed. With as many as 16 RF-class ADCs and DACs on one device along with a metric ton or two of other programmable resources, the Zynq UltraScale+ RFSoC makes it possible to start thinking about single-chip Massive MIMO systems. A new EDN.com article by Paul Newson, Hemang Parekh, and Harpinder Matharu titled “Realizing 5G New Radio massive MIMO systems” teases a few details for building such systems and includes this mind-tickling photo:
A sharp eye and keen memory will link that photo to a demo at last October’s Xilinx Showcase, held at the Xilinx facility in Longmont, Colorado. Here’s Xilinx’s Lee Hansen demonstrating a similar system based on the Xilinx Zynq UltraScale+ RFSoC:
For more details about the Zynq UltraScale+ RFSoC, contact your friendly neighborhood Xilinx or Avnet sales rep and see these previous Xcell Daily blog posts:
Trenz Electronic has taken the largest Zynq SoC, the Z-7100, and placed it along with 1Gbyte of DDR3 SDRAM, 4Gbytes of eMMC Flash memory, and 32Mbytes of QSPI Flash memory on a diminutive 8.5x8.5cm, industrial-grade, ready-to-ship SOM called the TE0782-02-100-2I. In addition to its dual-core Arm Cortex-A9 MPCore processor subsystem, the Zynq Z-7100 SoC provides you with a ton of programmable resources including 444K logic cells, 26.5Mbits of BRAM, 2020 DSP48 slices, and sixteen bulletproof GTX 12.5Gbps SerDes transceivers. Three 160-pin, high-speed connectors on the bottom of the SOM provide I/O connectivity between the Zynq Z-7100 SoC and the rest of your system.
Here are top and bottom photos of the Trenz Electronic TE0782-02-100-2I SOM:
Trenz Electronic TE0782-02-100-2I SOM based on a Xilinx Zynq Z-7100 SoC (top view)
Trenz Electronic TE0782-02-100-2I SOM based on a Xilinx Zynq Z-7100 SoC (bottom view)
As a Xilinx employee, I would like to contribute on the pros ... and the cons.
Let’s start with the cons: if there is a processor that suits all your needs in terms of cost/power/performance/IOs, just go for it. You won't be able to design the same thing in an FPGA at the same price.
Now, if you need some kind of glue logic around it (IOs), or your design needs multiple processors/GPUs to reach the required performance, then it's time to talk to your local FPGA dealer (preferably a Xilinx distributor!). I will try to answer a few remarks I saw throughout this thread:
FPGA/SoC: The majority of the FPGA designs I’ve seen during my career at Xilinx include some kind of processor. In pure FPGAs (Virtex/Kintex/Artix/Spartan) it is a soft processor (MicroBlaze or PicoBlaze) and in a [Zynq SoC or Zynq UltraScale+ MPSoC], it is a hard processor (dual-core Arm Cortex-A9 [for Zynq SoCs] and Quad-A53+Dual-R5 [for Zynq UltraScale+ MPSoCs]). The choice is now more complex: Processor Only, Processor with an FPGA alongside, FPGA only, or Integrated Processor/FPGA. The trend favors the latter because of all the savings it brings: PCB, power, devices, ...
Power: Pure FPGAs are making incredible progress, but if you want really low power in standby mode you should look at the Zynq UltraScale+ MPSoC, which contains many processors and particularly a Power Management Unit that can switch different regions of the processors/programmable logic on and off.
Analog: Since Virtex-5 (2006), Xilinx has included ADCs in its FPGAs, which were limited to internal parameter measurements (Voltage, Temperature, ...). [These ADC blocks are] called the System Monitor. With 7 series (2011) [devices], Xilinx included a dual 1Msamples/sec@12-bits ADC with internal/external measurement capabilities. Lately Xilinx [has] announced very high performance ADCs/DACs integrated into the Zynq UltraScale+ RFSoC: 4Gsamples/sec@12 bits ADCs / 6.5Gsamples/sec@14 bits DACs. Potential applications are Telecom (5G), Cable (DOCSIS) and Radar (Phased-Array).
Security: The bitstream that is stored in the external Flash can be encoded [encrypted]. Decoding [decrypting] is performed within the FPGA during bitstream download. Zynq-7000 SoCs and Zynq UltraScale+ MPSoCs support encoded [encrypted] bitstreams and secured boot for the processor[s].
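To make the 7 series ADC (XADC) discussion concrete, here is a minimal C++ sketch that converts raw XADC readings into engineering units. It assumes the transfer functions given in Xilinx's 7 series XADC user guide (UG480): temperature = code × 503.975 / 4096 − 273.15, and on-chip supply voltage = code × 3V / 4096, with the 12-bit result held in the 12 MSBs of a 16-bit status register. The function names are hypothetical and are not part of any Xilinx driver API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: extract the 12-bit conversion code from a 16-bit
// XADC status register value (assumes the result is MSB-justified).
inline uint16_t xadc_raw_to_code(uint16_t raw) {
    return static_cast<uint16_t>(raw >> 4);
}

// On-chip temperature sensor transfer function (7 series XADC, per UG480).
inline double xadc_code_to_celsius(uint16_t code) {
    assert(code < 4096 && "XADC codes are 12-bit");
    return (code * 503.975) / 4096.0 - 273.15;
}

// On-chip supply sensors use a 3V full-scale range (per UG480).
inline double xadc_code_to_volts(uint16_t code) {
    assert(code < 4096 && "XADC codes are 12-bit");
    return (code * 3.0) / 4096.0;
}
```

For example, a 12-bit code of 2400 from the temperature channel corresponds to roughly 22.1°C.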
Ease of Use: This is the big part of the equation. Customers need to take this into account to get the right time to market. Since 2012 and [with] 7 series devices, Xilinx introduced a new integrated tool called Vivado. Since then a number of features/new tools have been [added to Vivado]:
IP Integrator (IPI): a graphical interface to stitch IPs together and generate bitstreams for complete systems.
Vivado HLS (High Level Synthesis): a tool that allows you to generate HDL code from C/C++ code. This tool will generate IPs that can be handled by IPI.
SDSoC (Software Defined SoC): This tool allows you to design complete systems, software and hardware on a Zynq SoC/Zynq UltraScale+ MPSoC platform. This tool with some plugins will allow you to move part of your C/C++ code to programmable logic (calling Vivado HLS in the background).
SDAccel: an OpenCL (and more) implementation. Not relevant for this thread.
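Here is a sketch of the kind of C++ function Vivado HLS accepts as input: a small FIR filter annotated with HLS directives (PIPELINE, ARRAY_PARTITION, UNROLL). The tap count, names, and coefficients are hypothetical choices for illustration. An ordinary C++ compiler ignores the pragmas (possibly with an "unknown pragma" warning), so the same source can be tested in software before synthesis.

```cpp
#include <cassert>
#include <cstdint>

// Number of filter taps (hypothetical choice for illustration).
constexpr int kTaps = 4;

// One output sample per call; the shift register keeps the filter's history
// between calls, which HLS maps to registers once the array is partitioned.
int32_t fir(int16_t sample, const int16_t coeffs[kTaps]) {
#pragma HLS PIPELINE II=1
    static int16_t shift_reg[kTaps] = {0};
#pragma HLS ARRAY_PARTITION variable=shift_reg complete
    int32_t acc = 0;
    // Shift older samples down the delay line and accumulate their products.
    for (int i = kTaps - 1; i > 0; --i) {
#pragma HLS UNROLL
        shift_reg[i] = shift_reg[i - 1];
        acc += static_cast<int32_t>(shift_reg[i]) * coeffs[i];
    }
    // Insert the newest sample and add its product.
    shift_reg[0] = sample;
    acc += static_cast<int32_t>(sample) * coeffs[0];
    return acc;
}
```

With all coefficients set to 1, the filter returns a running sum of the last four samples, which makes a convenient software check before handing the function to Vivado HLS.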
There are also tools related to the MathWorks environment [MATLAB and Simulink]:
System Generator for DSP (aka SysGen): Low-level Simulink library (designed by Xilinx for Xilinx FPGAs). Allows you to program HDL code with blocks. This tool achieves even better performance (clock/area) than HDL code as each block is an instance of an IP (from register, adder, counter, multiplier up to FFT, FIR compiler, and VHLS IP). Bit-true and cycle-true simulations.
Xilinx Model Composer (XMC): available since ... yesterday! Again a Simulink blockset but based on Vivado HLS. Much faster simulations. Bit-true but not cycle-true.
All this to say that FPGA vendors have [expended] tremendous effort to make FPGAs and derivative devices easier to program. There is still a learning curve, [but it] is much shorter than it used to be…
One of life’s realities is that the most advanced semiconductor devices—including the Xilinx Zynq UltraScale+ MPSoCs—require multiple voltage supplies for proper operation. That means that you must devote a part of the system engineering effort for a product based on these devices to the power subsystem. Put another way, it’s been a long, long time since the days when a single 5V supply and a bypass capacitor were all you needed. Fortunately, there’s help. Xilinx has a number of vendor partners with ready, device-specific power-management ICs (PMICs). Case in point: Dialog Semiconductor.
If you need to power a Zynq UltraScale+ ZU3EG, ZU7EV, or ZU9CG MPSoC, you’ll want to check out Dialog’s App Note AN-PM-095 titled “Power Solutions for Xilinx Zynq Ultrascale+ ZU9EG.” This document contains reference designs for cost-optimized, PMIC-based circuits specifically targeting the power requirements for Zynq UltraScale+ MPSoCs. According to Xilinx Senior Tech Marketing Manager for Analog and Power Delivery Cathal Murphy, Dialog Semi’s PMICs can be used for low-cost power-supply designs because they generate as many as 12 power rails per device. They also switch at frequencies as high as 3MHz, which means that you can use smaller, less expensive passive devices in the design.
It also means that your overall power-management design will be smaller. For example, Dialog Semi’s power-management ref design for a Zynq UltraScale+ ZU9 MPSoC requires only 1.5in² of board space—or less for smaller devices in the MPSoC family.
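Why does a higher switching frequency shrink the passives? For a buck regulator in continuous conduction, the inductance needed to hold a given ripple current is L = Vout × (Vin − Vout) / (Vin × fsw × ΔI), so it scales as 1/fsw. Here is a minimal sketch of that textbook formula; the 5V input, 0.85V output, and 1.2A ripple current are assumed example values, not figures from Dialog's reference design.

```cpp
#include <cassert>

// Minimum inductance (henries) for a buck converter in continuous conduction:
//   L = Vout * (Vin - Vout) / (Vin * fsw * ripple)
// vin/vout in volts, fsw_hz in hertz, ripple_a = peak-to-peak ripple in amps.
double buck_inductance(double vin, double vout, double fsw_hz, double ripple_a) {
    assert(vin > vout && vout > 0.0 && fsw_hz > 0.0 && ripple_a > 0.0);
    return vout * (vin - vout) / (vin * fsw_hz * ripple_a);
}
```

With these assumed values, going from 1MHz to 3MHz cuts the required inductance by a factor of three, from roughly 0.59µH to roughly 0.2µH, and smaller inductance values generally mean physically smaller, cheaper inductors.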
You don’t need to visualize that in your head. Here’s a photo and chart supplied by Cathal:
The Dialog Semi reference design is hidden under the US 25-cent piece.
As the chart notes, these Dialog Semi PMICs have built-in power sequencing and can be obtained preprogrammed for Zynq-specific power sequences from distributors such as Avnet.
Cathal also pointed out that Dialog Semi has long been supplying PMICs to the consumer market (think smartphones and tablets) and that the power requirements for Zynq UltraScale+ MPSoCs map well into the existing capabilities of PMICs designed for this market, so you reap the benefit of the company’s volume manufacturing expertise.
Late last week, Avnet announced that it’s now offering the Aaware Sound Capture Platform paired with the MiniZed Zynq SoC development platform as a complete dev kit for voice-based cloud services including Amazon Alexa and Google Home. It’s listed on the Avnet site for $198.99. Avnet and Aaware are demonstrating the new kit at CES 2018, being held this week in Las Vegas. You’ll find them at the Eureka Park booth #50212 in the Sands Expo.
The Aaware Sound Capture Platform coupled to a Zynq-based Avnet MiniZed dev board
The Aaware Sound Capture Platform couples as many as 13 MEMS microphones (you can use fewer in a 1D linear or 2D array) with a Xilinx Zynq Z-7010 SoC to pre-filter incoming voice, delivering a clean voice data stream to local or cloud-based voice recognition hardware. The system has a built-in wake word (like “Alexa” or “OK, Google”) that triggers the unit’s filtering algorithms.
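Aaware hasn't published the details of its filtering algorithms, so the following is only an illustration of one classic microphone-array technique, delay-and-sum beamforming, and not Aaware's method. It assumes a uniform linear array, a far-field source, and delays rounded to whole samples; all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Delay-and-sum beamformer for a uniform linear microphone array.
// Each channel is shifted so that sound arriving from the steering angle
// lines up across microphones, then the aligned channels are averaged.
std::vector<double> delay_and_sum(const std::vector<std::vector<double>>& mics,
                                  double spacing_m, double angle_rad,
                                  double sample_rate_hz,
                                  double speed_of_sound = 343.0) {
    assert(!mics.empty());
    const std::size_t n = mics[0].size();
    std::vector<double> out(n, 0.0);
    for (std::size_t m = 0; m < mics.size(); ++m) {
        // Arrival delay of mic m relative to mic 0, rounded to whole samples.
        const double tau = m * spacing_m * std::sin(angle_rad) / speed_of_sound;
        const long d = std::lround(tau * sample_rate_hz);
        // Read each channel d samples later to undo its arrival delay.
        for (std::size_t t = 0; t < n; ++t) {
            const long src = static_cast<long>(t) + d;
            if (src >= 0 && src < static_cast<long>(n)) out[t] += mics[m][src];
        }
    }
    for (double& y : out) y /= static_cast<double>(mics.size());
    return out;
}
```

Sound from the steering direction adds coherently, while noise from other directions partially cancels; that directional gain is why multi-microphone arrays are used to clean up a voice stream before recognition.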
Avnet’s MiniZed dev board is usually based on a single-core Zynq Z-7007S SoC, but the MiniZed board included in this kit is actually based on a dual-core Zynq Z-7010 SoC. This board offers you outstanding wireless I/O in the form of a WiFi 802.11b/g/n module and a Bluetooth 4.1 module.
Adam Taylor has been writing about the use of Xilinx All Programmable devices for image-processing platforms for quite a while and he has wrapped up much of what he knows into a 44-minute video presentation, which appears below. Adam is presenting tomorrow at the Xilinx Developer Forum being held in Frankfurt, Germany.
My good friend Jack Ganssle has long published The Embedded Muse email newsletter and the January 2, 2018 issue (#341!) includes an extensive review of the new $759, Zynq-based Siglent SDS1204X-E 4-channel DSO. Best of all, he’s giving one of these bad boys away at the end of January. (Contest details below.)
Siglent’s Zynq-based SDS1204X-E 4-channel DSO. Photo credit: Jack Ganssle
“I'm blown away by the advanced engineering and quality of manufacturing exhibited by this and some other Chinese test equipment. Steve Leibson wrote a piece about how the unit works, and it's clear that the innovation and technology in this unit are world-class.”
In my own review of the Siglent SDS1202X-E last November, I wrote:
“Siglent’s SDS-1202X-E and SDS-1104X-E DSOs once again highlight the Zynq SoC’s flexibility and capability when used as the foundation for a product family. The Zynq SoC’s unique combination of a dual-core Arm Cortex-A9 MPCore processing subsystem and a good-sized chunk of Xilinx 7 series FPGA permits the development of truly high-performance platforms.”
Last April, I wrote:
“The new SDS1000X-E DSO family illustrates the result of selecting a Zynq SoC as the foundation for a system design. The large number of on-chip resources permit you to think outside of the box when it comes to adding features. Once you’ve selected a Zynq SoC, you no longer need to think about cramming code into the device to add features. With the Zynq SoC’s hardware, software, and I/O programmability, you can instead start thinking up new features that significantly improve the product’s competitive position in your market.
“This is precisely what Siglent’s engineers were able to do. Once the Zynq SoC was included in the design, the designers of this entry-level DSO family were able to think about which high-performance features they wished to migrate to their new design.”
All of that is equally true for the Siglent SDS1204X-E 4-channel DSO, which is further proof of just how good the Zynq SoC is when used as the foundation for an entire product family.
Now if you want to win the Siglent SDS1204X-E 4-channel DSO that Jack’s giving away at the end of January, you first need to subscribe to The Embedded Muse. The subscription is free, Jack’s an outstanding engineer and a wonderful writer, and he’s not going to sell or even give your email address to anyone else, so consider the Embedded Muse subscription a bonus for entering the drawing. After you subscribe, you can enter the contest here. (Note: It’s Jack’s contest, so if you have questions, you need to ask him.)
In the video, KORTIQ CEO Harold Weiss discusses using low-end Zynq SoCs (up to the Z-7020) and Zynq UltraScale+ MPSoCs (the ZU2 and ZU3) to create low-power solutions that deliver “just enough” performance for target industrial applications such as video processing, which requires billions of operations per second. The Zynq SoCs and Zynq UltraScale+ MPSoCs consume far less power than competing GPUs and CPUs while accelerating multiple CNN layers including convolutional layers, pooling layers, fully connected layers, and adding layers.
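For readers wondering what accelerating convolutional and pooling layers involves computationally, here is a plain C++ reference sketch of a fixed-point convolution layer with ReLU followed by 2x2 max pooling. It is a generic model with hypothetical names, not KORTIQ's implementation; a hardware accelerator in programmable logic would parallelize these same multiply-accumulate loops across DSP48 slices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Map = std::vector<std::vector<int32_t>>;

// "Valid" 2D convolution of one input channel with one k x k kernel,
// int8 weights with int32 accumulation, followed by ReLU.
Map conv2d_relu(const Map& in, const std::vector<std::vector<int8_t>>& k) {
    const std::size_t ks = k.size();
    assert(ks > 0 && in.size() >= ks && in[0].size() >= ks);
    const std::size_t oh = in.size() - ks + 1, ow = in[0].size() - ks + 1;
    Map out(oh, std::vector<int32_t>(ow, 0));
    for (std::size_t r = 0; r < oh; ++r)
        for (std::size_t c = 0; c < ow; ++c) {
            int32_t acc = 0;  // multiply-accumulate over the kernel window
            for (std::size_t i = 0; i < ks; ++i)
                for (std::size_t j = 0; j < ks; ++j)
                    acc += in[r + i][c + j] * k[i][j];
            out[r][c] = std::max<int32_t>(acc, 0);  // ReLU
        }
    return out;
}

// 2x2 max pooling with stride 2 (an odd trailing row/column is dropped).
Map maxpool2x2(const Map& in) {
    Map out(in.size() / 2,
            std::vector<int32_t>(in.empty() ? 0 : in[0].size() / 2));
    for (std::size_t r = 0; r < out.size(); ++r)
        for (std::size_t c = 0; c < out[r].size(); ++c)
            out[r][c] = std::max({in[2 * r][2 * c], in[2 * r][2 * c + 1],
                                  in[2 * r + 1][2 * c], in[2 * r + 1][2 * c + 1]});
    return out;
}
```

Using narrow integer weights rather than floating point is what lets a small device like a Z-7020 or ZU2 pack many of these multiply-accumulates into its DSP resources at low power.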
If you want your design to run at maximum speed at the lowest possible power consumption (and who does not?), then you want to run your algorithms using fixed-point hardware. With that in mind, MathWorks has just published an extensive guide to “Best Practices for Converting MATLAB Code to Fixed Point” for MATLAB-based designs with a nearly hour-long companion video.
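The MathWorks guide works in MATLAB, but the core ideas carry over to any language. Here is a minimal C++ sketch, assuming a signed 16-bit Q1.15 format, of two basic steps in a float-to-fixed conversion: quantizing with rounding and saturation, and a fixed-point multiply that rescales the wider product. It illustrates the general technique and is not taken from the MathWorks document; the names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Signed 16-bit fixed point with kFrac fractional bits (Q1.15 when kFrac = 15).
constexpr int kFrac = 15;

// Quantize a double to Q1.15 with round-to-nearest and saturation.
int16_t to_q15(double x) {
    assert(!std::isnan(x));
    const double scaled = std::round(x * (1 << kFrac));
    const double lo = std::numeric_limits<int16_t>::min();
    const double hi = std::numeric_limits<int16_t>::max();
    return static_cast<int16_t>(std::min(std::max(scaled, lo), hi));
}

double from_q15(int16_t q) { return static_cast<double>(q) / (1 << kFrac); }

// Multiply two Q1.15 values. The 32-bit product has 30 fractional bits, so add
// half an LSB, shift right by 15 (arithmetic shift, guaranteed since C++20),
// then saturate back into the int16_t range.
int16_t mul_q15(int16_t a, int16_t b) {
    const int32_t prod = static_cast<int32_t>(a) * b;
    int32_t r = (prod + (1 << (kFrac - 1))) >> kFrac;
    r = std::min<int32_t>(r, std::numeric_limits<int16_t>::max());
    r = std::max<int32_t>(r, std::numeric_limits<int16_t>::min());
    return static_cast<int16_t>(r);
}
```

Note that 1.0 is not representable in Q1.15, so it saturates to 32767, and (−1) × (−1) saturates the same way; catching range effects like these is exactly why a systematic conversion workflow is worth following.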