Displaying articles for: 04-16-2017 - 04-22-2017
Like any semiconductor device, designing with Xilinx All Programmable devices means dealing with their power-supply requirements—and like any FPGA or SoC, Xilinx devices have their fair share of requirements in the power-supply department. They require several supply voltages (how many depends on your I/O requirements), and those voltages must ramp up and down in a specific sequence and at specific ramp rates if the devices are to operate properly. On top of that, power-supply designs are board-specific—different for every unique pcb. Dealing with all of these supply specs is a challenging engineering problem simply because of the number of requirements, so you might like some help tackling it.
Here’s some help.
Infineon demonstrated a reference power supply design for Xilinx Zynq UltraScale+ MPSoCs based on its IRPS5401 Flexible Power Management Unit at APEC (the Applied Power Electronics Conference) last month. The reference design employs two IRPS5401 devices to manage and generate ten different supply rails. Here’s a block diagram of the reference design:
This design is used on the Avnet UltraZed SOM, so you know that it’s already proven. (For more information about the Avnet UltraZed SOM, see “Avnet UltraZed-EG SOM based on 16nm Zynq UltraScale+ MPSoC: $599” and “Look! Up in the sky! Is it a board? Is it a kit? It’s… UltraZed! The Zynq UltraScale+ MPSoC Starter Kit from Avnet.”)
Now the UltraZed SOM measures only 2x3.5 inches (50.8x88.9mm) and the power supply consumes only a small fraction of the space on the SOM, so you know that the Infineon power supply design must be compact.
It needs to be.
Here’s a photo of the UltraZed SOM with the power supply section outlined in yellow:
Infineon Power Supply Design on the Avnet UltraZed SOM (outlined in yellow)
Even though this power supply design is clearly quite compact, the high integration level inside Infineon’s IRPS5401 Flexible Power Management Unit means that you don’t need additional components to handle power-supply sequencing or ramp rates. The IRPS5401s handle that for you.
However, every Zynq UltraScale+ MPSoC pcb is different because every pcb presents different loads, capacitances, and inductances to the power supply. So you will need to tailor the sequencing and ramp times for each board design. Sounds like a major pain, right?
Well, Infineon felt your pain and offers an antidote: a software app called PowIRCenter, designed to reduce the time needed to develop the complex supply-voltage sequencing and ramp times to perhaps 15 minutes’ worth of work—which is apparently how long it took an Avnet design engineer to set the timings for the UltraZed SOM.
Here’s a 4-minute video where Infineon’s Senior Product Marketing Manager Tony Ochoa walks you through the highlights of this power supply design and the company’s PowIRCenter software:
Just remember, the Infineon IRPS5401 Flexible Power Management Unit isn’t dedicated solely to the Zynq UltraScale+ MPSoC. You can use it to design power supplies for the full range of Xilinx devices.
Note: For more information about the IRPS5401 Flexible Power Management Unit, please contact Infineon directly.
Today, intoPIX announced that its lightweight TICO video-compression IP cores for Xilinx FPGAs now support frame resolutions and rates up to 8K60p in addition to the previously supported HD and 4K resolutions. Currently, the compression cores support 10-bit, 4:2:2 workflows, but intoPIX also disclosed in a published table (see below) that a future release of the IP core will support 4:4:4 color sampling. The TICO compression standard simplifies the management of live and broadcast video streams over existing video network infrastructures based on SDI and Ethernet by reducing the bandwidth requirements of high-definition and ultra-high-definition video at compression ratios as large as 5:1 (visually lossless at ratios up to 4:1). TICO compression supports live video streams through its low latency—less than 1msec end-to-end.
Conveniently, intoPIX has published a comprehensive chart showing its various TICO compression IP cores and the Xilinx FPGAs that can support them. Here’s the intoPIX chart:
Note that the most cost-effective Xilinx FPGAs, including the Spartan-6 and Artix-7 families, support TICO compression at HD and even some UHD/4K video formats, while the Kintex-7, Virtex-7, and UltraScale device families support all video formats through 8K.
Please contact intoPIX for more information about these IP cores.
Mentor Embedded now supports the Android OS (plus Linux and Nucleus) on Zynq UltraScale+ MPSoCs. You can learn more in a free Webinar titled “Android in Safety Critical Designs,” being held on May 3 and 4. The Webinar will discuss how to use Android in safety-critical designs on the Xilinx Zynq UltraScale+ MPSoC. Register for the Webinars here.
I got a heads-up on a new, low-end dev board called the “MiniZed” coming soon from Avnet and found out there’s a pre-announcement Web page for the board. Avnet’s MiniZed is based on one of the new Zynq Z-7000S family members with one ARM Cortex-A9 processor. It will include both WiFi and Bluetooth RF transceivers and, according to the MiniZed Web page, will cost less than $100!
Here’s the link to the MiniZed Web page and here’s a slightly fuzzy MiniZed board photo:
Avnet MiniZed (coming soon, for less than $100)
If I’m not mistaken, that’s an Arduino header footprint and two Digilent Pmod headers on the board, which means that a lot of pretty cool shields and Pmods are already available for this board (minus the software drivers, at least for the Arduino shields).
I know you’ll want more information about the MiniZed board but I simply don’t have it. So please contact Avnet for more information or register for the info on the MiniZed Web page.
The Vivado Design Suite HLx Editions 2017.1 release is now available for download. The Vivado HL Design Edition and HL System Edition now support partial reconfiguration. Partial reconfiguration is available for the Vivado WebPACK Edition at a reduced price.
Xilinx partial reconfiguration technology allows you to swap FPGA-based functions in and out of your design on the fly, eliminating the need to fully reconfigure the FPGA and re-establish links. Partial reconfigurability gives you the ability to update feature sets in deployed systems, fix bugs, and migrate to new standards while critical functions remain active. This capability dramatically expands the flexible use of Xilinx All Programmable designs in a truly wide variety of applications.
For example, a detailed article published on the WeChat Web site by Siglent about the company’s new, entry-level SDS1000X-E DSO family—based on a Xilinx Zynq Z-7020 SoC—suggests that the new DSO family’s system design employs the Zynq SoC’s partial-reconfiguration capability to further reduce the parts count and the board footprint: “The PL section has 220 DSP slices and 4.9 Mb Block RAM; coupled with high throughput between the PS and PL data interfaces, we have the flexibility to configure different hardware resources for different digital signal processing.” (See “Siglent 200MHz, 1Gsample/sec SDS1000X-E Entry-Level DSO family with 14M sample points is based on Zynq SoC.”)
Siglent’s new, entry-level SDS1000X-E DSO family is based on a Xilinx Zynq Z-7020 SoC
In addition, the Vivado 2017.1 release includes support for the Xilinx Spartan-7 7S50 FPGA (Vivado WebPACK support will come in a later release). The Spartan-7 FPGAs are the lowest-cost devices in the 28nm Xilinx 7 series, and they’re optimized for low, low cost per I/O while delivering terrific performance per watt. Compared to Xilinx Spartan-6 FPGAs, Spartan-7 FPGAs run at half the power consumption (for comparable designs) and at 30% higher operating frequencies. The Spartan-7 7S50 FPGA is a mid-sized family member with 52,160 logic cells, 2.7Mbits of BRAM, 120 DSP slices, and 250 single-ended I/O pins. It’s a very capable FPGA. (For more information about the Spartan-7 FPGA family, see “Today, there are six new FPGAs in the Spartan-7 device family. Want to meet them?” and “Hot (and Cold) Stuff: New Spartan-7 1Q Commercial-Grade FPGAs go from -40 to +125°C!”)
Spartan-7 FPGA Family Table
As of today, Amazon Web Services (AWS) has made the FPGA-accelerated Amazon EC2 F1 compute instance generally available to all AWS customers. (See the new AWS video below and this Amazon blog post.) The Amazon EC2 F1 compute instance allows you to create custom hardware accelerators for your application using cloud-based server hardware that incorporates multiple Xilinx Virtex UltraScale+ VU9P FPGAs. Each Amazon EC2 F1 compute instance can include as many as eight FPGAs, so you can develop extremely large and capable, custom compute engines with this technology. According to the Amazon video, use of the FPGA-accelerated F1 instance can accelerate applications in diverse fields such as genomics research, financial analysis, video processing (in addition to security/cryptography and machine learning) by as much as 30x over general-purpose CPUs.
You develop using Amazon’s FPGA Developer AMI (an Amazon Machine Image within the Amazon Elastic Compute Cloud (EC2)) and the AWS Hardware Developer Kit (HDK) on Github. Once your FPGA-accelerated design is complete, you can register it as an Amazon FPGA Image (AFI) and deploy it to your F1 instance in just a few clicks. You can reuse and deploy your AFIs as many times and across as many F1 instances as you like, and you can list them in the AWS Marketplace.
The Amazon EC2 F1 compute instance reduces the time and cost needed to develop secure, FPGA-accelerated applications in the cloud, and general availability now makes access quite easy.
Here’s the new AWS video with the general-availability announcement:
The Amazon blog post announcing general availability lists several companies already using the Amazon EC2 F1 instance including:
AT&T recently announced the development of a one-of-a-kind 5G channel sounder—internally dubbed the “Porcupine” for obvious reasons—that can characterize a 5G transmission channel using 6000 angle-of-arrival measurements in 150msec, down from 15 minutes using conventional pan/tilt units. These channel measurements capture how wireless signals are affected in a given environment. For instance, channel measurements can show how objects such as trees, buildings, cars, and even people reflect or block 5G signals. The Porcupine allows measurement of 5G mmWave frequencies via drive testing, something that was simply not possible using other mmWave channel sounders. Engineers at AT&T used the mmWave Transceiver System and LabVIEW System Design Software including LabVIEW FPGA from National Instruments (NI) to develop this system.
AT&T “Porcupine” 5G Channel Sounder
NI designed the mmWave Transceiver System as a modular, reconfigurable SDR platform for 5G R&D projects. This prototyping platform offers 2GHz of real-time bandwidth for evaluating mmWave transmission systems using NI’s modular transmit and receive radio heads in conjunction with the transceiver system’s modular PXIe processing chassis.
The key to this system’s modularity is NI’s 18-slot PXIe-1085 chassis, which accepts a long list of NI processing modules as well as ADC, DAC, and RF transceiver modules. NI’s mmWave Transceiver System uses the NI PXIe-7902 FPGA module—based on a Xilinx Virtex-7 485T—for real-time processing.
NI PXIe-7902 FPGA module based on a Xilinx Virtex-7 485T
NI’s mmWave Transceiver System maps different mmWave processing tasks to multiple FPGAs in a software-configurable manner using the company’s LabVIEW System Design Software. NI’s LabVIEW relies on the Xilinx Vivado Design Suite for compiling the FPGA configurations. The FPGAs distributed in the NI mmWave Transceiver System provide the flexible, high-performance, low-latency processing required to quickly build and evaluate prototype 5G radio transceiver systems in the mmWave band—like AT&T’s Porcupine.
By Adam Taylor
Having introduced the Real-Time Clock (RTC) in the Xilinx Zynq UltraScale+ MPSoC, the next step is to write some simple software to set the time, get the time, and calibrate the RTC. Doing this is straightforward and aligns with how we use other peripherals in the Zynq MPSoC and Zynq-7000 SoC.
As with all Zynq peripherals, the first thing we need to do with the RTC is look up the configuration and then use it to initialize the peripheral device. Once we have the RTC initialized, we can configure and use it. We can use the functions provided in the xrtcpsu.h header file to initialize and use the RTC. All we need to do is correctly set up the driver instance and include the xrtcpsu.h header file. If you want to examine the file’s contents, you will find it within the generated BSP for the MPSoC, in a directory that also holds all the other header files needed for your design. Which files are available depends upon how you configured the MPSoC in Vivado (e.g., what peripherals are present in the design).
We need to use a driver instance to use the RTC within our software application. For the RTC, that’s XRtcPsu, which defines the essential information such as the device configuration, oscillator frequency, and calibration values. This instance is used in all interactions with the RTC using the functions in the xrtcpsu.h header file.
As I explained last week, the RTC counts seconds, so we will need to convert real dates to and from values in units of seconds. The xrtcpsu.h header file contains several functions to support these conversions. We’ll use a C structure to hold the date before it is converted and loaded into the RTC, and to hold the date recovered when converting back from the seconds counter.
We can use the following functions to set or read the RTC (as I did in the code example available here):
By convention, the functions that set the RTC seconds counter use an epoch starting on 1/1/2000. If we are going to use internet time, which is often based on a 1/1/1970 epoch by a completely different convention, we will need to convert from one format to the other. The functions provided for the RTC only support years between 2000 and 2099.
In the example code, we’ve used these functions to report the last set time before allowing the user to enter the time over a UART. Once the time has been set, the RTC is calibrated before being re-initialized. The RTC is then read once a second and the values are output over the UART, giving the image shown at the top of this blog. This output continues until the MPSoC is powered down.
To really exploit the capabilities provided by the RTC, we need to enable the interrupts. I will look at RTC interrupts in the Zynq MPSoC in the next issue of the MicroZed Chronicles, UltraZed Edition. Once we understand how interrupts work, we can look at the RTC alarms. I will also fit a battery to the UltraZed board to test its operation on battery power.
The register map with the RTC register details can be found here.
My code is available on Github as always.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
The MEGA65 is an open-source microcomputer modeled on the incredibly successful Commodore 64/65 circa 1982-1990. Ye olde Commodore 64 (C64)—introduced in 1982—was based on an 8-bit MOS Technology 6510 microprocessor, which was derived from the very popular 6502 processor that powered the Apple II, Atari 400/800, and many other 8-bit machines in the 1980s. The 6510 processor added an 8-bit parallel I/O port to the 6502, which no doubt dropped the microcomputer’s BOM cost a buck or two. According to Wikipedia, “The 6510 was only widely used in the Commodore 64 home computer and its variants.” Also according to Wikipedia, “For a substantial period (1983–1986), the C64 had between 30% and 40% share of the US market and two million units sold per year, outselling the IBM PC compatibles, Apple Inc. computers, and the Atari 8-bit family of computers.”
Now that is indeed a worthy computer to serve as a “Jurassic Park” candidate and therefore, the non-profit MEGA (Museum of Electronic Games & Art), “dedicated to the preservation of our digital heritage,” is supervising the physical recreation of the Commodore 64 microcomputer (mega65.org). It’s called the MEGA65 and it’s software-compatible with the original Commodore 64, only faster. (The 6510 processor emulation in the MEGA65 runs at 48MHz compared to the original MOS Technology 6510’s ~1MHz clock rate.) MEGA65 hardware designs and software are open-source (LGPL).
How do you go about recreating the hardware of a machine that’s been gone for 25 years? Fortunately, it’s a lot easier than extracting DNA from the stomach contents of ancient mosquitos trapped in amber. Considering that this blog is appearing in Xcell Daily on the Xilinx Web site, the answer’s pretty obvious: you use an FPGA. And that’s exactly what’s happening.
A few days ago, the MEGA65 team celebrated initial bringup of the MEGA65 pcb. You can read about the bringup efforts here and here is a photo of the pcb:
The first MEGA65 PCB
The MEGA65 pcb is designed to fit into the existing Commodore 65 plastic case. (The Commodore 65 was prototyped but not put into production.)
Sort of gives a new meaning to “single-chip microcomputer,” does it not? That big chip in the middle of the board is a Xilinx Artix-7 A200T. It implements the Commodore 64’s entire motherboard in one programmable logic device. Yes, that includes the RAM. The Artix-7 A200T FPGA has 13.14Mbits of on-chip block RAM. That’s more than 1.5Mbytes of RAM, or 25x more RAM than the original Commodore 64 motherboard, which used eight 4164 64Kbit, 150nsec DRAMs for RAM storage. The video’s a bit improved too: from 160x200 pixels with a maximum of four colors per 4x8 character block, or 320x200 pixels with a maximum of two colors per 8x8 character block, to a more modern 1920x1200 pixels with 12-bit color (23-bit color is planned). Funny what 35 years of semiconductor evolution can produce.
What’s the project’s progress status? Here’s a snapshot from the MEGA65 site:
MEGA65 Project Status
And here’s a video of the MEGA65 in action:
Remember, what you see and hear is running on a Xilinx Artix-7 A200T, configured to emulate an entire Commodore 64 microcomputer. Most of the code in this video was written in the Jurassic period of microcomputer development. If you’re of a certain age, these old programs should bring a chuckle or perhaps just a smile to your lips.
Note: You’ll find a MEGA65 project log by Antti Lukats here.
Basic problem: When you’re fighting aliens to save the galaxy wearing your VR headset, having a wired tether to pump the video to your headset is really going to crimp your style. Spinning around to blast that battle droid sneaking up on you from behind is just as likely to garrote you as to save your neck. What to do? How will you successfully save the galaxy?
Well, NGCodec and Celeno Communications have a demo for you in the NGCodec booth (N2635SP-A) at NAB in the Las Vegas Convention Center next week. Combine NGCodec’s low-latency H.265/HEVC “RealityCodec” video coder/decoder IP with Celeno’s 5GHz 802.11ac WiFi connection and you have a high-definition (2160x1200), high-frame-rate (90 frames/sec) wireless video connection over a 15Mbps wireless channel. This demo uses a 250:1 video compression setting to fit the video into the 15Mbps channel.
In the demo, a RealityCodec hardware instance in a Xilinx Virtex UltraScale+ VU9P FPGA on a PCIe board plugged into a PC running Windows 10 compresses generated video in real time. The PC sends the compressed, 15Mbps video stream to a Celeno 802.11ac WiFi radio, which transmits the video over a standard 5GHz 802.11ac WiFi connection. Another Celeno WiFi radio receives the compressed video stream and sends it to a second RealityCodec for decoding. The decoder hardware is instantiated in a relatively small Xilinx Kintex-7 325T FPGA. The decoded video stream feeding the VR goggles requires 6Gbps of bandwidth, which is why you want to compress it for RF transmission.
Of course, if you’re going to polish off the aliens quickly, you really need that low compression latency. Otherwise, you’re dead meat and the galaxy’s lost. A bad day all around.
Here’s a block diagram of the NAB demo:
You are never going to get past a certain performance barrier by compiling C for a software-programmable processor. At some point, you need hardware acceleration.
As an analogy: You can soup up a car all you want; it’ll never be an airplane.
Sure, you can bump the processor clock rate. You can add processor cores and distribute the tasks. Both of these approaches increase power consumption, so you’ll need a bigger and more expensive power supply; they increase heat generation, which means you will need better cooling and probably a bigger heat sink or a fan (or another fan); and all of these things increase BOM costs.
Are you sure you want to take that path? Really?
OK, you say. This blog’s from an FPGA company (actually, Xilinx is an “All Programmable” company), so you’ll no doubt counsel me to use an FPGA to accelerate these tasks and I don’t want to code in Verilog or VHDL, thank you very much.
Not a problem. You don’t need to.
You can get the benefit of hardware acceleration while coding in C or C++ using the Xilinx SDSoC development environment. SDSoC produces compiled software automatically coupled to hardware accelerators and all generated directly from your high-level C or C++ code.
That’s the subject of a new Chalk Talk video just posted on the eejournal.com Web site. Here’s one image from the talk:
This image shows three complex embedded tasks and the improvements achieved with hardware acceleration:
A beefier software processor or multiple processor cores will not get you 1000x more performance—or even 30x—no matter how you tweak your HLL code, and software coders will sweat bullets just to get a few percentage points of improvement. For such big performance leaps, you need hardware.
Here’s the 14-minute Chalk Talk video:
What do you do if you want to build a low-cost state-of-the-art, experimental SDR (software-defined radio) that’s compatible with GNURadio—the open-source development toolkit and ecosystem of choice for serious SDR research? You might want to do what Lukas Lao Beyer did. Start with the incredibly flexible, full-duplex Analog Devices AD9364 1x1 Agile RF Transceiver IC and then give it all the processing power it might need with an Artix-7 A50T FPGA. Connect these two devices on a meticulously laid out circuit board taking all RF-design rules into account and then write the appropriate drivers to fit into the GNURadio ecosystem.
Sounds like a lot of work, doesn’t it? It’s taken Lukas two years and four major design revisions to get to this point.
Well, you can circumvent all that work and get to the SDR research by signing up for a copy of Lukas’ FreeSRP board on the Crowd Supply crowd-funding site. The cost for one FreeSRP board and the required USB 3.0 cable is $420.
Lukas Lao Beyer’s FreeSRP SDR board based on a Xilinx Artix-7 A50T FPGA
With 32 days left in the Crowd Supply funding campaign period, the project has raised pledges of a little more than $12,000. That’s about 16% of the way towards the goal.
There are a lot of well-known SDR boards available, so conveniently, the FreeSRP Crowd Supply page provides a comparison chart:
If you really want to build your own, the documentation page is here. But if you want to start working with SDR, sign up and take delivery of a FreeSRP board this summer.
By Adam Taylor
So far, we have examined the FPGA hardware build for the Aldec TySOM-2 FPGA Prototyping board example in Vivado, which is a straightforward example of a simple image-processing chain. This hardware design allows an image to be received, stored in DDR SDRAM attached to the Zynq SoC’s PS, and then output to an HDMI display. What the hardware design at the Vivado level does not do is perform any face-detection functions. And to be honest, why would it?
With the input and output paths of the image-processing pipeline defined, we can use the untapped resources of the Zynq SoC’s PL and PS/PL interconnects to create the application at a higher level. We need to use SDSoC to do this, which allows us to develop our design using a higher-level language like C or C++ and then move the defined functionality from the PS into the PL—to accelerate that function.
The Vivado design we examined last week forms an SDSoC Platform, which we can use with the Linux operating system to implement the final design. The use of Linux allows us to use OpenCV within the Zynq SoC’s PS cores to support creation of the example design. If we develop with the new Xilinx reVISION stack, we can go even further and accelerate some of the OpenCV functions.
The face-detection example supplied with the TySOM-2 board implements face detection using a Pixel Intensity Comparison-based Object (PICO) detection framework developed by N. Markus et al. The PICO framework scans the image with a cascade of binary classifiers. This PICO-based approach permits more efficient implementations that do not require the computation of integral images, HOG pyramids, etc.
In this example, we need to define a frame buffer within the device tree blob to allow the Linux application to access the images stored within the Zynq SoC’s PS DDR SDRAM. The Linux application then uses “Video for Linux 2” (V4L2) to access this frame buffer and to allow further processing.
Once we get an image frame from the frame buffer, the software application can process it. The application will do the following things:
Looking at the above functions, not all of them can be accelerated into hardware. In this example, the YUV-to-greyscale conversion, Sobel edge detection, and YUV-to-RGB conversion can be accelerated in the PL to increase performance.
Moving these functions into the PL is as easy as selecting the functions we wish to accelerate in hardware and then clicking on build to create the example.
Once this is complete, the algorithm runs as expected using both the PS and PL in the Zynq SoC.
Using this approach allows us to exploit both the Zynq SoC’s PL and PS for image processing without the need to implement a fixed RTL design in Vivado. In short, this ability allows us to use a known good platform design to implement image capture and display across several different applications. Meanwhile, the use of SDSoC also allows us to exploit the Zynq SoC’s PL at a higher level without the need to develop the HDL from scratch, reducing development time.
My code is available on Github as always.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.