By Adam Taylor
In some applications, we wish to maintain the phase relationship between sampled signals. The Zynq SoC’s XADC contains two ADCs, which we can operate simultaneously in lock step to maintain the phase relationship between two sampled signals. To do this, we use the sixteen auxiliary inputs with channels 0-7 assigned to ADC A and channels 8-15 assigned to ADC B. In simultaneous mode, we can therefore perform conversions on channels 0 to 7 and, at the same time, perform conversions on channels 8 to 15.
In simultaneous mode, we can also continue to sample the on-chip parameters; however, they are not sampled simultaneously. We are unable to perform automatic calibration in simultaneous mode, but we can use another mode to perform calibration when needed. This should be sufficient because, for most applications, calibration is generally performed only when the device powers up.
To use simultaneous mode, we first need a hardware design in Vivado that breaks out the AUX0 and AUX8 channels. On the ZedBoard and MicroZed I/O Carrier Cards, these signals are broken out to the AMS connector. This allows me to connect signal sources to the AMS header to stimulate the I/O pins. For this example, I am using a Digilent Analog Discovery module as the signal source.
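The channel split described above can be captured in a couple of small helpers. This is only a sketch: the function names are hypothetical and are not part of any Xilinx driver, and the pairing of channel N with channel N+8 is assumed from the AUX0/AUX8 pair used in this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Auxiliary channels 0-7 are converted by ADC A, channels 8-15 by ADC B. */
#define AUX_CHANNELS_PER_ADC 8u

/* Hypothetical helper: true if the auxiliary channel is handled by ADC B. */
bool aux_uses_adc_b(uint8_t aux_channel)
{
    return aux_channel >= AUX_CHANNELS_PER_ADC;
}

/* Hypothetical helper: for an ADC A channel (0-7), return the ADC B channel
 * (8-15) assumed to be converted at the same instant in simultaneous mode.
 * Returns -1 if the argument is not an ADC A channel. */
int simultaneous_partner(uint8_t adc_a_channel)
{
    if (adc_a_channel >= AUX_CHANNELS_PER_ADC) {
        return -1;
    }
    return (int)adc_a_channel + (int)AUX_CHANNELS_PER_ADC;
}
```

For the example in this blog, `simultaneous_partner(0)` gives 8, matching the AUX0 and AUX8 pair.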
The hardware design within the Zynq for this example appears below:
Writing the software in SDK for simultaneous mode is very similar to the other modes of operation we have used in the past. The only major difference is that we need to make sure we have configured the simultaneous channels in the sequencer. Once this is done and we have configured the input format we want—bipolar or unipolar, averaging, etc.—we can start the sequencer using the XSM_SEQ_MODE_SIMUL mode definition.
When I ran this on the MicroZed set up as shown above and stored 64 samples from both the AUX0 and AUX8 inputs using input signals that were offset by 180 degrees, I was able to recover the following waveform, which shows that the phase relationship is maintained:
If we want, we can also use simultaneous-conversion mode with an external analog multiplexer. All we need to do is configure the design to use the external mux as we did previously. The difference this time is that we need two external analog multiplexers because we must be able to select the two channels to convert simultaneously. Also, we need only three address bits to cover the 0-7 address range, as opposed to the four address bits that we needed to address all sixteen analog inputs when we previously used sequencer mode. We use the lower three of the four available address bits.
At this point, the only XADC mode that we have not looked at is independent mode. This mode is like the XADC’s default (safe) mode; however, in independent mode, ADC A monitors the internal on-chip parameters while ADC B samples the external inputs. Independent mode is intended to implement a monitoring mode. As such, the alarms are active, so you can use this mode to implement security and anti-tamper features in your design.
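One simple way to check the phase relationship in software, rather than by eye, is to compare the two sample buffers numerically. The sketch below is not taken from the example code. It is a hypothetical helper that removes the mean from each buffer and computes the least-squares gain of one buffer relative to the other. For two equal-amplitude signals, a result near -1 suggests a 180-degree offset and a result near +1 suggests the signals are in phase.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: least-squares gain of buffer b relative to buffer a
 * after removing the mean (DC offset) of each buffer.
 * Returns 0.0 if n is zero or if buffer a contains no AC signal. */
double relative_gain(const int32_t *a, const int32_t *b, size_t n)
{
    double mean_a = 0.0;
    double mean_b = 0.0;
    double dot = 0.0;
    double energy_a = 0.0;
    size_t i;

    if (n == 0) {
        return 0.0;
    }
    for (i = 0; i < n; i++) {
        mean_a += (double)a[i];
        mean_b += (double)b[i];
    }
    mean_a /= (double)n;
    mean_b /= (double)n;

    for (i = 0; i < n; i++) {
        double da = (double)a[i] - mean_a;
        double db = (double)b[i] - mean_b;
        dot += da * db;        /* cross term between the two channels */
        energy_a += da * da;   /* AC energy of the reference channel */
    }
    if (energy_a == 0.0) {
        return 0.0;
    }
    return dot / energy_a;
}
```

With the 64 stored samples from AUX0 and AUX8, we would call `relative_gain(aux0, aux8, 64)` and expect a value close to -1 if the 180-degree offset has been preserved.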
Code is available on Github as always.
If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.
Plethora IIoT develops cutting‑edge solutions to Industry 4.0 challenges using machine learning, machine vision, and sensor fusion. In the video below, a Plethora IIoT Oberon system monitors power consumption, temperature, and the angular speed of three positioning servomotors in real time on a large ETXE-TAR Machining Center for predictive maintenance—to spot anomalies with the machine tool and to schedule maintenance before these anomalies become full-blown faults that shut down the production line. (It’s really expensive when that happens.) The ETXE-TAR Machining Center is center-boring engine crankshafts. This bore is the critical link between a car’s engine and the rest of the drive train including the transmission.
Plethora uses Xilinx Zynq SoCs and Zynq UltraScale+ MPSoCs as the heart of its Oberon system because these devices’ unique combination of software-programmable processors, hardware-programmable FPGA fabric, and programmable I/O allow the company to develop real-time systems that implement sensor fusion, machine vision, and machine learning in one device.
Initially, Plethora IIoT’s engineers used the Xilinx Vivado Design Suite to develop their Zynq-based designs. Then they discovered Vivado HLS, which allows you to take algorithms in C, C++, or SystemC directly to the FPGA fabric using hardware compilation. The engineers’ first reaction to Vivado HLS: “Is this real or what?” They discovered that it was real. Then they tried the SDSoC Development Environment with its system-level profiling, automated software acceleration using programmable logic, automated system connectivity generation, and libraries to speed programming. As they say in the video, “You just have to program it and there you go.”
Here’s the video:
Plethora IIoT is showcasing its Oberon system in the Industrial Internet Consortium (IIC) Pavilion during the Hannover Messe Show being held this week. Several other demos in the IIC Pavilion are also based on Zynq All Programmable devices.
By Adam Taylor
Over the length of this series, we have looked at several different development boards. One thing that is common to many of these boards: they provide one or more Pmod (Peripheral module) connections that allow us to connect small peripherals to our boards. Pmods expand our prototype designs to create final systems. We have not looked in much detail at Pmods but they are an important aspect of many developments. As such, it would be remiss for me not to address them.
The Pmod standard itself was developed by Digilent and is an open-source de facto standard to ensure wide adoption of this very useful interface. There’s a wide range of available Pmods, from DA/AD converters to GPS receivers and OLED displays.
Over the years, we have looked at several Zynq-based boards with at least one Pmod port. In some cases, these boards provide Pmod ports that are connected to either the Zynq SoC’s PL (programmable logic), the PS (processing system), or both. If a PS connection is used, we can use the Zynq SoC’s MIO to provide the interface. If the Pmod connection is to the PL, then we need to create our own interface to the Pmod device. Regardless of whether we use the PL or the PS, we will need a software driver to interface with it.
Various Zynq-based dev boards and their Pmod connections
That comment may initially lead you to think that we need to develop our own Pmod drivers from scratch, which would of course increase the time it takes to develop the application. For many Pmods, this is not the case. There is a wide range of existing drivers for both the PL and PS that we can use within our designs.
The first thing we need to do is download the Digilent Vivado library. This library contains several Pmod drivers and DVI sinks and sources plus other very useful IP blocks that can accelerate our design.
Once you have downloaded this library, examine the file structure. You will notice multiple folders under the Pmods folder. Each of these folders is named for an available Pmod (e.g. Pmod_AD2, which is an ADC). Within each of these driver folders, you will see a file structure as shown below:
Within this structure, the folders contain:
The next step, if we wish to use these IP modules, is to include the directory as a repository within our Vivado design. We do this by selecting the project settings within our project. We can add a new repository pointing to the Digilent Vivado library we have just downloaded using the IP settings repository manager tab:
Once this is done, we should be able to see the Pmod IP cores within the Vivado IP Catalog. We can then use these IP cores within our design in the same way we use all other IP.
Once we have created our block diagram in Vivado, we can customize the Pmod IP blocks and select the Pmod Port they are connected to—assuming the board definition for the development board we are using supports that.
In the case below, which targets the new Digilent ARTY Z7 board, the AD2 Pmod is being connected to Pmod Port B:
If we are unable to find a driver for the Pmod we want to use, we can use the Pmod Bridge driver, which will enable us to create an interface to the desired Pmod with the correct pinout.
When it comes to software, all we need to do is import the files from the drivers/<Pmod_name>/src directory to our SDK project. Adding these files will provide a range of drivers that we can use to interface with the Pmod PL instantiation and talk to the connected Pmod. If there is example code available, we will find it under the drivers/<Pmod_name>/examples directory. When I ran this example code for the PmodAD2, it worked as expected:
This enables us to get our designs up and running even faster.
My code is available on Github as always.
The Vivado Design Suite HLx Editions 2017.1 release is now available for download. The Vivado HL Design Edition and HL System Edition now support partial reconfiguration. Partial reconfiguration is available for the Vivado WebPACK Edition at a reduced price.
Xilinx partial reconfiguration technology allows you to swap FPGA-based functions in and out of your design on the fly, eliminating the need to fully reconfigure the FPGA and re-establish links. Partial reconfigurability gives you the ability to update feature sets in deployed systems, fix bugs, and migrate to new standards while critical functions remain active. This capability dramatically expands the flexible use of Xilinx All Programmable designs in a truly wide variety of applications.
For example, a detailed article published on the WeChat Web site by Siglent about the company’s new, entry-level SDS1000X-E DSO family—based on a Xilinx Zynq Z-7020 SoC—suggests that the new DSO family’s system design employs the Zynq SoC’s partial-reconfiguration capability to further reduce the parts count and the board footprint: “The PL section has 220 DSP slices and 4.9 Mb Block RAM; coupled with high throughput between the PS and PL data interfaces, we have the flexibility to configure different hardware resources for different digital signal processing.” (See “Siglent 200MHz, 1Gsample/sec SDS1000X-E Entry-Level DSO family with 14M sample points is based on Zynq SoC.”)
Siglent’s new, entry-level SDS1000X-E DSO family is based on a Xilinx Zynq Z-7020 SoC
In addition, the Vivado 2017.1 release includes support for the Xilinx Spartan-7 7S50 FPGA (Vivado WebPACK support will be in a later release). The Spartan-7 FPGAs are the lowest-cost devices in the 28nm Xilinx 7 series and they’re optimized for low, low cost per I/O while delivering terrific performance/watt. Compared to Xilinx Spartan-6 FPGAs, Spartan-7 FPGAs run at half the power consumption (for comparable designs) and with 30% more operating frequency. The Spartan-7 S50 FPGA is a mid-sized family member with 52,160 logic cells, 2.7Mbits of BRAM, 120 DSP slices, and 250 single-ended I/O pins. It’s a very capable FPGA. (For more information about the Spartan-7 FPGA family, see “Today, there are six new FPGAs in the Spartan-7 device family. Want to meet them?” and “Hot (and Cold) Stuff: New Spartan-7 1Q Commercial-Grade FPGAs go from -40 to +125°C!”)
Spartan-7 FPGA Family Table
AT&T recently announced the development of a one-of-a-kind 5G channel sounder—internally dubbed the “Porcupine” for obvious reasons—that can characterize a 5G transmission channel using 6000 angle-of-arrival measurements in 150msec, down from 15 minutes using conventional pan/tilt units. These channel measurements capture how wireless signals are affected in a given environment. For instance, channel measurements can show how objects such as trees, buildings, cars, and even people reflect or block 5G signals. The Porcupine allows measurement of 5G mmWave frequencies via drive testing, something that was simply not possible using other mmWave channel sounders. Engineers at AT&T used the mmWave Transceiver System and LabVIEW System Design Software including LabVIEW FPGA from National Instruments (NI) to develop this system.
AT&T “Porcupine” 5G Channel Sounder
NI designed the mmWave Transceiver System as a modular, reconfigurable SDR platform for 5G R&D projects. This prototyping platform offers 2GHz of real-time bandwidth for evaluating mmWave transmission systems using NI’s modular transmit and receive radio heads in conjunction with the transceiver system’s modular PXIe processing chassis.
The key to this system’s modularity is NI’s 18-slot PXIe-1085 chassis, which accepts a long list of NI processing modules as well as ADC, DAC, and RF transceiver modules. NI’s mmWave Transceiver System uses the NI PXIe-7902 FPGA module—based on a Xilinx Virtex-7 485T—for real-time processing.
NI PXIe-7902 FPGA module based on a Xilinx Virtex-7 485T
NI’s mmWave Transceiver System maps different mmWave processing tasks to multiple FPGAs in a software-configurable manner using the company’s LabVIEW System Design Software. NI’s LabVIEW relies on the Xilinx Vivado Design Suite for compiling the FPGA configurations. The FPGAs distributed in the NI mmWave Transceiver System provide the flexible, high-performance, low-latency processing required to quickly build and evaluate prototype 5G radio transceiver systems in the mmWave band—like AT&T’s Porcupine.
By Adam Taylor
Having introduced the Real-Time Clock (RTC) in the Xilinx Zynq UltraScale+ MPSoC, the next step is to write some simple software to set the time, get the time, and calibrate the RTC. Doing this is straightforward and aligns with how we use other peripherals in the Zynq MPSoC and Zynq-7000 SoC.
Like all Zynq peripherals, the first thing we need to do with the RTC is look up the configuration and then use it to initialize the peripheral device. Once we have the RTC initialized, we can configure and use it. We can use the functions provided in the xrtcpsu.h header file to initialize and use the RTC. All we need to do is correctly set up the driver instance and include the xrtcpsu.h header file. If you want to examine the file’s contents, you will find them within the generated BSP for the MPSoC. Under this directory, you will also find all the other header files needed for your design. Which files are available depends upon how you configured the MPSoC in Vivado (e.g. what peripherals are present in the design).
We need to use a driver instance to use the RTC within our software application. For the RTC, that’s XRtcPsu, which defines the essential information such as the device configuration, oscillator frequency, and calibration values. This instance is used in all interactions with the RTC using the functions in the xrtcpsu.h header file.
As I explained last week, the RTC counts the number of seconds, so we will need to convert to and from values in units of seconds. The xrtcpsu.h header file contains several functions to support these conversions. To support this, we’ll use a C structure to hold the real date prior to conversion and loading into the RTC or to hold the resultant conversion date following conversion from the seconds counter.
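To make the conversion concrete, here is a minimal sketch of turning a calendar date into a seconds count. The `DateTime` structure and the `datetime_to_seconds()` function are hypothetical stand-ins written for illustration. They are not the structure or functions from xrtcpsu.h. The sketch assumes the count starts at 1/1/2000 and only accepts years from 2000 to 2099, where every fourth year is a leap year.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical date structure for illustration (not the driver's own type). */
typedef struct {
    uint16_t year;   /* 2000..2099 */
    uint8_t  month;  /* 1..12 */
    uint8_t  day;    /* 1..31 */
    uint8_t  hour;   /* 0..23 */
    uint8_t  min;    /* 0..59 */
    uint8_t  sec;    /* 0..59 */
} DateTime;

/* Within 2000..2099, a year is a leap year when it is divisible by 4. */
static bool is_leap(uint16_t year)
{
    return (year % 4u) == 0u;
}

/* Converts a date to seconds elapsed since 2000-01-01 00:00:00.
 * Returns false if any field is out of range. */
bool datetime_to_seconds(const DateTime *dt, uint32_t *seconds)
{
    static const uint8_t days_in_month[12] =
        { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    uint32_t days = 0u;
    uint32_t month_len;
    uint16_t y;
    uint8_t m;

    if (dt->year < 2000u || dt->year > 2099u ||
        dt->month < 1u || dt->month > 12u ||
        dt->hour > 23u || dt->min > 59u || dt->sec > 59u) {
        return false;
    }
    month_len = days_in_month[dt->month - 1u];
    if (dt->month == 2u && is_leap(dt->year)) {
        month_len++;
    }
    if (dt->day < 1u || dt->day > month_len) {
        return false;
    }

    /* Whole years since 2000, then whole months in the current year. */
    for (y = 2000u; y < dt->year; y++) {
        days += is_leap(y) ? 366u : 365u;
    }
    for (m = 1u; m < dt->month; m++) {
        days += days_in_month[m - 1u];
        if (m == 2u && is_leap(dt->year)) {
            days++;
        }
    }
    days += (uint32_t)dt->day - 1u;

    *seconds = days * 86400u + (uint32_t)dt->hour * 3600u +
               (uint32_t)dt->min * 60u + (uint32_t)dt->sec;
    return true;
}
```

The largest value this can produce (the last second of 2099) is a little under 3.2 billion, so the result fits in a 32-bit unsigned count.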
We can use the following functions to set or read the RTC (which I did in the code example available here):
By convention, the functions used to set the RTC seconds counter are based on a time epoch of 1/1/2000. If we are going to use internet time, which is often based on a 1/1/1970 epoch by a completely different convention, we will need to convert from one format to the other. The functions provided for the RTC only support years between 2000 and 2099.
In the example code, we’ve used these functions to report the last set time before allowing the user to enter the time over a UART. Once the time has been set, the RTC is calibrated before being re-initialized. The RTC is then read once a second and the values are output over the UART, giving the image shown at the top of this blog. This output will continue until the MPSoC is powered down.
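Converting between the two epochs is a fixed offset. There are 10,957 days between 1/1/1970 and 1/1/2000 (30 years plus 7 leap days), which is 946,684,800 seconds. The helpers below are a sketch with hypothetical names. They are not part of the RTC driver, and they assume both counts are in UTC seconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Seconds from 1970-01-01 00:00:00 to 2000-01-01 00:00:00:
 * 10957 days * 86400 seconds per day. */
#define EPOCH_2000_OFFSET_S 946684800ull

/* Hypothetical helper: converts a 1970-based time to a 2000-based count.
 * Returns false if the time is before 2000 or does not fit in 32 bits. */
bool unix_to_rtc_seconds(uint64_t unix_s, uint32_t *rtc_s)
{
    uint64_t shifted;

    if (unix_s < EPOCH_2000_OFFSET_S) {
        return false;
    }
    shifted = unix_s - EPOCH_2000_OFFSET_S;
    if (shifted > 0xFFFFFFFFull) {
        return false;
    }
    *rtc_s = (uint32_t)shifted;
    return true;
}

/* Hypothetical helper: converts a 2000-based count back to a 1970-based time. */
uint64_t rtc_to_unix_seconds(uint32_t rtc_s)
{
    return (uint64_t)rtc_s + EPOCH_2000_OFFSET_S;
}
```

Note that this only shifts the epoch. Checking that the resulting date falls within the 2000 to 2099 range supported by the RTC functions is still up to the application.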
To really exploit the capabilities provided by the RTC, we need to enable the interrupts. I will look at RTC interrupts in the Zynq MPSoC in the next issue of the MicroZed Chronicles, UltraZed Edition. Once we understand how interrupts work, we can look at the RTC alarms. I will also fit a battery to the UltraZed board to test its operation on battery power.
The register map with the RTC register details can be found here.
My code is available on Github as always.
By Adam Taylor
Having introduced the Aldec TySOM-2 FPGA Prototyping Board, based on the Xilinx Zynq SoC, and the face detection application running on it, I thought it would be a good idea to take a more detailed examination of the face-detection application’s architecture.
The face detection example uses one Blue Eagle camera, which is connected to the Aldec FMC-ADAS card. The processed frames showing the detected face are output via the TySOM-2 board’s HDMI port. What is worth pointing out is that the application running on the TySOM-2 board, face detection in this case, is enabled by the software. The Zynq PL (programmable logic) hardware design provides the capability to interface with the camera, to share the video frames with the Zynq PS (processing system) through the DDR SDRAM, and to drive the display output.
Any application could be implemented, not just face detection. It could be object tracking. It could be corner detection. It could be anything. This is one of the things that makes development of image-processing systems on the Zynq so powerful. We can use the same base platform on the TySOM-2 board and customize the application in software. Of course, we can also use the Xilinx SDSoC development environment to further accelerate the algorithm into the TySOM-2 platform’s remaining resources to increase performance.
The Blue Eagle camera transmits the video stream using an FPD-Link III link. These links use a high-speed, bi-directional CML (Current Mode Logic) link to transfer the image data. An FPD-Link III receiving device (a TI DS90UB914Q-Q1 FPD-Link III SER/DES) is used on the ADAS FMC to implement this camera interface. This device is configured for the application in hand using the I2C peripheral in the Zynq SoC’s PS. This device provides video to the Zynq PL in a parallel format: the parallel data bits, HSync, VSync, and a pixel clock.
We need to process the frames and store them within the Zynq PS’ DDR SDRAM using Video DMA (Direct Memory Access) to ensure that we can access the image frames within DDR memory using the Zynq SoC’s ARM Cortex-A9 processor. We need to use several IP blocks that come as standard IP within Vivado to implement this. These IP blocks transfer data using the AXI streaming protocol (AXIS).
Therefore, the first thing needed is to convert the received video in parallel format into an AXIS stream. Once the video is in the correct format, we can use the VDMA IP block to transfer video data to and from the Zynq PS’ DDR SDRAM, where the software running on the Zynq SoC’s ARM Cortex-A9 processors can access the frames and implement the application algorithms.
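To illustrate what the parallel-to-AXIS conversion does, here is a small software model. It is only a sketch of the framing, not the Vivado IP itself. The `AxisBeat` structure and `frame_to_axis()` function are hypothetical, and the model assumes the usual AXI4-Stream video convention in which TUSER marks the first pixel of a frame and TLAST marks the last pixel of each line.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one AXI4-Stream video transfer (one pixel per beat). */
typedef struct {
    uint32_t tdata;  /* pixel value */
    bool     tuser;  /* start of frame: set on the first pixel only */
    bool     tlast;  /* end of line: set on the last pixel of each line */
} AxisBeat;

/* Converts a frame of width x height pixels, stored line by line, into a
 * sequence of stream beats. The out buffer must hold width * height beats.
 * Returns the number of beats written. */
size_t frame_to_axis(const uint32_t *pixels, size_t width, size_t height,
                     AxisBeat *out)
{
    size_t count = 0;
    size_t x;
    size_t y;

    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            out[count].tdata = pixels[count];
            out[count].tuser = (count == 0);        /* first pixel of frame */
            out[count].tlast = (x == width - 1);    /* last pixel of line */
            count++;
        }
    }
    return count;
}
```

In hardware, the equivalent markers are derived from the VSync and HSync timing that arrives alongside the parallel pixel data.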
Unlike previous examples we have examined, which used a single AXI High Performance (AXI HP) port, this example uses two of the Zynq SoC’s AXI HP interface ports, one in each direction. This configuration requires a slightly more complicated DMA architecture because we’ll need two VDMA IP blocks. Within the Zynq PL, the AXI standard used for most IP blocks is AXI 4.0 while the ports on the Zynq SoC implement AXI 3.0. Therefore, we need to use an AXI Interconnect or a protocol converter to convert between the two standards.
This use of two interfaces will make no performance difference when compared to a single HP AXI interface because the S0 and S1 AXI HP Ports on the Zynq SoC which are used by this configuration are multiplexed down to the M0 port on the memory interconnect and finally connected to the S3 port on the DDR SDRAM controller. This is shown below in the interconnection diagram from UG585, the TRM for the Zynq SoC.
Once the VDMA is implemented, the design then performs color-space conversion and chroma resampling, and finally passes the video to an on-screen display module. Once this has been completed, the video stream must be converted from AXIS to parallel video, which can then be output to the HDMI transmitter.
With this hardware platform completed, the next step is to write the software to create the application. For this we have the choice of using SDK or using SDSoC, which adds the ability to accelerate some of the application algorithm functions using programmable logic. As this example is implemented on the Zynq Z-7100 SoC, which has a significant amount of free, on-chip programmable resources following the implementation of the base platform, we’ll be using SDSoC for this example. We will look at the software architecture next time.
My code is available on Github as always.
If you have a mere 14 minutes to spare, you can watch this new video that will show you how to set up the Zynq UltraScale+ MPSoC’s hardened, embedded PCIe block as a Root Port using the Vivado Design Suite. The target system is the ZCU102 eval kit (currently on sale for half price) and the video shows you how to use the PetaLinux tools to connect to a PCIe-connected NVMe SSD.
This is a fast, painless way to see a complete set of Xilinx development tools being used to create a fully operational system based on the Zynq UltraScale+ MPSoC in less than a quarter of an hour.
Hardent, a Xilinx Approved Training Provider, is conducting a free 1-hour Webinar on April 18 that will discuss best practices for constraints management using the Vivado Design Suite. Topics covered will include:
By Adam Taylor
We have looked at the XADC several times within this series. One thing we have not examined is how to use the external analog multiplexer capability. This is an oversight on my part as it can be very useful when we are architecting our system. With the XADC we can interface with up to 17 analog inputs: one dedicated Vp/Vn pair of inputs and sixteen auxiliary differential input pairs, which share pins with the logic I/O. This means that we can sample up to 17 different analog signals along with the device’s on-chip supply voltages and temperatures. This does, of course, require the use of as many as 34 I/O pins, which can be challenging on some pin-constrained devices or designs.
The use of an external multiplexer provides us with the ability to sample up to 16 analog inputs. We need only 4 I/O lines for the multiplexer address as the Vp/Vn pair are dedicated and are outside of the multiplexer address. Note that we are not limited to using only the Vp/Vn pair for analog inputs. You can use any of the auxiliary inputs as well.
To demonstrate how we do this, the first thing we need is a Vivado design with the XADC set up to allow an external mux. We can do this on the ADC setup tab of the XADC wizard. We can also select which analog inputs are being used with the external mux. If we already have a design with the XADC enabled, we can use the AXI interface to configure it.
Within the wider Vivado design, I am going to include some ILAs (Integrated Logic Analyzers) so that we can see what is happening internally. I am also going to connect the mux pins from the FPGA to the ZedBoard AMS header GPIO pins and into a logic analyzer so that we can see them changing, as they would when driving an external mux.
Implementing this within the software is very similar to how we previously did this for the XADC. The first step is to configure the XADC as we would if we were using the internal mux capability. However, when we want to use the external mux we need to consider the information within UG480 and particularly the diagram below:
To use an external mux, we therefore need to do the following in addition to our normal approach:
Once these have been configured, we start the XADC sampling by setting the sequencer mode to continuous pass. The XADC will then sequence the external mux pins around the desired inputs, as shown below in the ILA capture taken when all 16 aux inputs are sampled.
The eagle-eyed will have noticed that there are 16 external inputs, which require 4 address pins, but the external mux address provides 5 pins. To connect these to an external multiplexer, we need to connect only the lower four bits of the address.
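The relationship between the 5-bit mux address and the four select lines can be modelled in a few lines of C. This is a sketch with hypothetical function names, and it assumes that the sixteen auxiliary inputs occupy mux addresses 0x10 to 0x1F, so that masking off the top bit leaves the aux input number.

```c
#include <stdint.h>

/* Assumption: aux inputs 0-15 appear as mux addresses 0x10-0x1F, so the
 * lower four bits select the external multiplexer input directly. */
#define EXT_MUX_ADDR_MASK 0x0Fu

/* Returns the 4-bit select value to wire to the external multiplexer. */
uint8_t ext_mux_select(uint8_t xadc_mux_address)
{
    return (uint8_t)(xadc_mux_address & EXT_MUX_ADDR_MASK);
}

/* Returns the state (0 or 1) of one of the four select pins, 0 to 3.
 * Any other pin number returns 0. */
uint8_t ext_mux_pin(uint8_t xadc_mux_address, unsigned pin)
{
    if (pin > 3u) {
        return 0u;
    }
    return (uint8_t)((ext_mux_select(xadc_mux_address) >> pin) & 1u);
}
```

Under that assumption, a mux address of 0x15 selects external input 5, which drives select pins 0 and 2 high and pins 1 and 3 low.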
Just as we do when the internal mux is used, the sampled data from the conversion will be in the appropriate register and not in the Vp/Vn aux conversion register (e.g. aux 0 will be in aux 0, aux 1 in aux 1 and so on).
An external analog mux therefore allows us to monitor nearly the same number of analog signals with a much-reduced pin count. There is also another trick we can do with the XADC, which we will look at soon.
Code is available on Github as always.
This short, 2-minute video shows a live Spartan-7 7S50 FPGA operating on a board, running a MicroBlaze soft processor connected to DDR3 SDRAM as a demo. The 28nm Spartan-7 device family comes in small form-factor packages—as small as 8x8mm. You design systems based on Spartan-7 devices with the Xilinx Vivado HL Design Suite tools.
These devices are available for ordering now and operators are standing by.
InnoRoute has just started shipping its TrustNode extensible, ultra-low-latency (2.5μsec) IPv6 OpenFlow SDN router as a pcb-level product. The design combines a 1.9GHz, quad-core Intel Atom processor running Linux with a Xilinx FPGA to implement the actual ultra-low-latency router hardware. (You’re not implementing that as a Linux app running on an Atom processor!) The TrustNode Router reference design features twelve GbE ports. Here’s a photo of the TrustNode SDN Router board:
InnoRoute TrustNode SDN Router Board with 12 GbE ports
Based on the pcb layout in the photo, it appears to me that the Xilinx FPGA implementing the 12-port SDN router is under that little black heatsink in the center of the board nearest to all of the Ethernet ports while the quad-core processor running Linux must be sitting there in the back under that great big silver heatsink with an auxiliary cooling fan, near the processor-associated USB ports and SDcard carrier.
InnoRoute’s TrustNode Web page is slightly oblique as to which Xilinx FPGA is used in this design but the description sort of winnows the field. First, the description says that you can customize InnoRoute’s TrustNode router design using the Xilinx Vivado HL Design Suite WebPACK Edition—which you can download at no cost—so we know that the FPGA must be a 28nm series 7 device or newer. Next, the description says that the design uses 134.6k LUTs, 269.2k flip-flops, and 12.8Mbits of BRAM. Finally, we see that the FPGA must be able to handle twelve Gigabit Ethernet ports.
The Xilinx FPGA that best fits this description is an Artix-7 A200.
You can use this TrustNode board to jump into the white-box SDN router business immediately, or at least as fast as you can mill and drill an enclosure and screen your name on the front. In fact, InnoRoute has kindly created a nice-looking rendering of a suggested enclosure design for you:
InnoRoute TrustNode SDN Router (rendering)
The router’s implementation as IP in an FPGA along with the InnoRoute documentation and the Vivado tools mean that you can enhance the router’s designs and add your special sauce to break out of the white box. (White Box Plus? White Box Premium? White Box Platinum? Hey, I’m from marketing and I’m here to help.)
This design enhancement and differentiation are what Xilinx All Programmable devices are especially good at delivering. You are not stuck with some ASSP designer’s concept of what your customers need. You can decide. You can differentiate. And you will find that many customers are willing to pay for that differentiation.
Note: Please contact InnoRoute directly for more information on the TrustNode SDN Router.
By Adam Taylor
At the end of the Sysmon AMS blogs, I introduced the several PLLs within the Zynq UltraScale+ MPSoC. That introduction suggests to me that it’s time to talk about the clocking architecture of the MPSoC device.
As with the original Zynq SoC, the PS (processing system) in the Zynq UltraScale+ MPSoC is the system master. So we will initially focus upon its clocking architecture. Within the PS there are three main clock inputs:
While the PS reference clock has a dedicated input pin, the PSS_ALT_REF_CLK and PSS_VIDEO_REF_CLK are input via the MIO and are enabled or disabled in Vivado by the I/O configuration customization tab. If we plan on using these clocks, we need to ensure there is no conflict with other planned use of the MIO.
Enabling the Alternate reference clock and the video clock
Once these have been enabled, we can configure them on the clock configuration input clock tab as shown below:
Internally, the PS has four clock groups that provide all the required clocks:
We’ll now focus on the MCG as this is the group with which we will have the most interaction. Within this group, we choose which of the five PLLs is used to clock the Zynq UltraScale+ MPSoC’s processors and peripherals within the LPD and FPD. We can do this via the clock configuration -> output clocks tab. Here we can configure the clocking for both the low-power and full-power domains.
To generate a PLL output frequency as closely as possible to the desired frequency, we may want to change the PLL input-clock source. We have several potential clock sources which can be used to clock each of the PLLs within the Zynq UltraScale+ MPSoC.
As mentioned above, we can use PS_REF_CLK, PS_ALT_REF_CLK, or PS_VIDEO_REF_CLK. These clocks are directly input into the PS. We can also use one of the four GT_REF_CLKS or the AUX_REF_CLK. This latter reference clock is provided from the PL while the former clock is provided by the PS_GTR. The relevant PLL control register selects which of these clocks drives the PLL. These registers reside in either the CRL_APB module for low-power domain PLLs or the CRF_APB module for full-power domain PLLs.
We can select which of the four GT reference clocks is provided as the GT_REF_CLK using the Serial Input Output Unit (SIOU) module CRX_CNTRL Register.
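To see why the choice of reference clock matters, consider a simplified model in which the output frequency is the reference multiplied by an integer and then divided by an integer. The sketch below searches for the closest achievable frequency. It is a hypothetical helper, the multiplier and divider ranges are supplied by the caller for illustration and are not the MPSoC’s real limits, and it ignores constraints such as the permitted VCO range.

```c
#include <stdint.h>

/* Hypothetical result type: the best multiplier/divider pair found. */
typedef struct {
    uint32_t mult;      /* integer multiplier applied to the reference */
    uint32_t div;       /* integer divider applied after multiplication */
    double   freq_hz;   /* achieved frequency */
    double   error_hz;  /* absolute difference from the target */
} PllSetting;

static double abs_diff(double a, double b)
{
    return (a > b) ? (a - b) : (b - a);
}

/* Exhaustively searches mult in [mult_min, mult_max] and div in [1, div_max]
 * for the setting closest to target_hz. mult_min should be at least 1.
 * When several settings tie, the first one found is kept. */
PllSetting best_pll_setting(double ref_hz, double target_hz,
                            uint32_t mult_min, uint32_t mult_max,
                            uint32_t div_max)
{
    PllSetting best = { 0u, 0u, 0.0, target_hz };
    uint32_t m;
    uint32_t d;

    for (m = mult_min; m <= mult_max; m++) {
        for (d = 1u; d <= div_max; d++) {
            double f = ref_hz * (double)m / (double)d;
            double err = abs_diff(f, target_hz);
            if (best.mult == 0u || err < best.error_hz) {
                best.mult = m;
                best.div = d;
                best.freq_hz = f;
                best.error_hz = err;
            }
        }
    }
    return best;
}
```

Running the search once per candidate reference clock and comparing the `error_hz` results shows which source gets closest to the desired frequency under the chosen limits.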
Now that we understand the Zynq UltraScale+ MPSoC’s clocking and how we set the desired frequency for each of the subsystems, we will explore the subsystems in more detail in the MicroZed Chronicles blogs that follow.
I’ve known this was coming for more than a week, but last night I got double what I expected. Digilent’s Web site has been teasing the new Arty Z7 Zynq SoC dev board for makers and hobbyists for a week—but with no listed price. Last night, prices appeared. That’s right, there are two versions of the board available:
Digilent Arty Z7 dev board for makers and hobbyists
Other than that, the board specs appear identical.
The first thing you’ll note from the photo is that there’s a Zynq SoC in the middle of the board. You’ll also see the board’s USB, Ethernet, Pmod, and HDMI ports. On the left, you can see double rows of tenth-inch headers in an Arduino/chipKIT shield configuration. There are a lot of ways to connect to this board, which should make it a student’s or experimenter’s dream board considering what you can do with a Zynq SoC. (In case you don’t know, there’s a dual-core ARM Cortex-A9 MPCore processor on the chip along with a hearty serving of FPGA fabric.)
Oh yeah. The Xilinx Vivado HL Design Suite WebPACK tools? Those are available at no cost. (So is Digilent’s attractive cardboard packaging, according to the Arty Z7 Web page.)
Although the Arty Z7 board has now appeared on Digilent’s Web site, the product’s Web page says the expected release date is March 27. That’s five whole days away!
As they say, operators are standing by.
Please contact Digilent directly for more Arty Z7 details.
By Adam Taylor
In looking at the Zynq UltraScale+ MPSoC’s AMS capabilities so far, we have introduced the two slightly different Sysmon blocks residing within the Zynq UltraScale+ MPSoC’s PS (processing system) and PL (programmable logic). In this blog, I am going to demonstrate how we can get the PS Sysmon up and running when we use both the ARM Cortex-A53 and Cortex-R5 processor cores in the Zynq UltraScale+ MPSoC’s PS. There is little difference between the two types of processor in this respect, but I think it is important to show you how to use both.
The process to use the Sysmon is the same as it is for many of the peripherals we have looked at previously with the MicroZed Chronicles:
The function names in parentheses are those which we use to perform the operation we desire, provided we pass the correct parameters. In the simplest case, as in this example, we can then poll the output registers using the XSysMonPsu_GetAdcData() function. All of these functions are defined within the file xsysmonpsu.h, which is available under the board Support Package Lib Src directory in SDK.
Examining the functions, you will notice that each of the functions used in steps 4 to 8 requires an input parameter called SysmonBlk. You must pass this parameter to the function. This parameter is how we select which Sysmon (within the PS or the PL) we want to address. For this example, we will be specifying the PS Sysmon using XSYSMON_PS, which is also defined within xsysmonpsu.h. If we want to address the PL, we use the XSYSMON_PL definition, which we will be looking at next time.
There is also another header file which is of use and that is xsysmonpsu_hw.h. Within this file, we can find the definitions required to correctly select the channels we wish to sample in the sequencer. These are defined in the format:
This simple example samples the following within the PS Sysmon:
We can use the conversion functions provided within xsysmonpsu.h to convert from the raw value supplied by the ADC into temperature and voltage. However, the PS IO banks are capable of supporting 3.3V logic. As such, the conversion macro from raw reading to voltage is not correct for these IO banks or for the HD banks in the PL. (We will look at different IO bank types in another blog.)
The full-scale voltage is 3V for most of the voltage conversions. However, in line with UG580 Pg43, we need to use a full scale of 6V for the PS IO. Otherwise we will see a value only half of what we are expecting for that bank’s supply voltage setting. With this in mind, my example contains a conversion function at the top of the source file to be used for these IO banks, to ensure that we get the correct value.
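The difference between the two full-scale ranges can be sketched in a few lines of C. This is a minimal illustration and not the conversion function from my source file: the helper name and the two constants are hypothetical, and it assumes the raw reading is a 16-bit code that spans the full-scale range (so the divisor is 65536).

```c
#include <assert.h>
#include <stdint.h>

/* Full-scale ranges named in the text: 3 V for most supplies, 6 V for the PS IO banks. */
#define SYSMON_FULL_SCALE_DEFAULT_V 3.0f
#define SYSMON_FULL_SCALE_PS_IO_V   6.0f

/*
 * Hypothetical helper (not part of xsysmonpsu.h): convert a raw Sysmon
 * reading to volts for a given full-scale range.
 * Assumption: the raw value is a 16-bit code covering 0..full_scale_v.
 */
static float sysmon_raw_to_volts(uint16_t raw, float full_scale_v)
{
    assert(full_scale_v > 0.0f);
    return ((float)raw * full_scale_v) / 65536.0f;
}
```

Under these assumptions, a mid-scale code of 0x8000 reads as 1.5V with the 3V range and 3.0V with the 6V range, which is the factor-of-two discrepancy described above.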
The Zynq UltraScale+ MPSoC architecture permits both the APU (the ARM Cortex-A53 processors) and the RPU (the ARM Cortex-R5 processors) to address the Sysmon. To demonstrate this, the same file was used in applications first targeting an ARM Cortex-A53 processor in the APU and then targeting the ARM Cortex-R5 processor in the RPU. I used Core 0 in both cases.
The only difference between these two cases was the need to create new applications that select the core to be targeted and then updating the FSBL to load the correct core. (See “Adam Taylor’s MicroZed Chronicles, Part 172: UltraZed Part 3—Saying hello world and First-Stage Boot” for more information on how to do this.)
Results when using the ARM Cortex-A53 Core 0 Processor
Results when using the ARM Cortex-R5 Core 0 Processor
When I ran the same code, which is available in the GitHub directory, I received the results shown above over the terminal program, which show it working on both the ARM Cortex-A53 and ARM Cortex-R5 cores.
Next time we will look at how we can use the PL Sysmon.
Code is available on Github as always.
By Adam Taylor
Embedded vision is one of my many FPGA/SoC interests. Recently, I have been doing some significant development work with the Avnet Embedded Vision Kit (EVK). (For more info on the EVK and its uses, see Issues 114 to 126 of the MicroZed Chronicles.) As part of my development, I wanted to synchronize the EVK display output with an external source, which is also useful if we desire to synchronize multiple image streams.
Implementing this is straightforward provided we have the correct architecture. The main element we need is a buffer between the upstream camera/image sensor chain and the downstream output-timing and -processing chain. VDMA (Video Direct Memory Access) provides this buffer by allowing us to store frames from the upstream image-processing pipeline in DDR SDRAM and then read out the frames into a downstream processing pipeline with different timing.
The architectural concept appears below:
VDMA buffering between upstream and downstream with external sync
For most downstream chains, we use a combination of the video timing controller (VTC) and AXI Stream to Video Out IP blocks, both provided in the Vivado IP library. These two IP blocks work together. The VTC provides output timing and generates signals such as VSync and HSync. The AXI Stream to Video Out IP Block synchronizes its incoming AXIS stream with the timing signals provided by the VTC to generate the output video signals. Once the AXI Stream to Video Out block has synchronized with these signals, it is said to be locked and it will generate output video and timing signals that we can use.
The VTC itself is capable of both detecting input video timing and generating output video timing. These can be synchronized if you desire. If no video input timing signals are available to the VTC, then the input frame sync pulse (FSYNC_IN) serves to synchronize the output timing.
Enabling Synchronization with FSYNC_IN or the Detector
If FSYNC_IN alone is used to synchronize the output, we need to use not only FSYNC_IN but also the VTC-provided frame sync out (FSYNC_OUT) and GEN_CLKEN to ensure correct synchronization. GEN_CLKEN is an input enable that allows the VTC generator output stage to be clocked.
The FSYNC_OUT pulse can be configured to occur at any point within the frame. For this application, it has been configured to be generated at the very end of the frame. This configuration can take place in the VTC re-configuration dialog within Vivado for a one-time approach or, if an AXI Lite interface is provided, it can be positioned using that interface at run time.
The algorithm used to synchronize the VTC to an external signal is:
Should GEN_CLKEN not be disabled, the VTC will continue to run freely and will generate the next frame sequence. Issuing another FSYNC_IN while this is occurring will not result in re-synchronization. Instead, the AXI Stream to Video Out IP block will be unable to synchronize the AXIS video with the timing information and will lose lock.
Therefore, to control the enabling of the GEN_CLKEN we need to create a simple RTL block that implements the algorithm above.
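As a rough illustration of that gating logic, here is a small behavioral model written in C rather than RTL. It is only one reading of the description in this post: it assumes the generator is enabled when FSYNC_IN arrives and is disabled again when FSYNC_OUT marks the end of the frame. The type and function names are hypothetical and are not taken from the project on GitHub.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the block that drives the VTC's GEN_CLKEN input. */
typedef struct {
    bool gen_clken; /* current level of GEN_CLKEN */
} sync_gate_t;

/* Start with the generator held, waiting for the first external sync. */
static void sync_gate_reset(sync_gate_t *gate)
{
    assert(gate != NULL);
    gate->gen_clken = false;
}

/*
 * Advance the model by one clock.
 * Assumed behavior: FSYNC_IN releases the generator, and FSYNC_OUT
 * (configured at the very end of the frame) holds it again so the VTC
 * cannot free-run into the next frame. FSYNC_IN takes priority if both
 * pulses arrive on the same clock.
 */
static bool sync_gate_step(sync_gate_t *gate, bool fsync_in, bool fsync_out)
{
    assert(gate != NULL);
    if (fsync_in) {
        gate->gen_clken = true;
    } else if (fsync_out) {
        gate->gen_clken = false;
    }
    return gate->gen_clken;
}
```

In this model, GEN_CLKEN stays high for exactly one frame after each external sync pulse and stays low until the next one, which matches the requirement that the VTC must not run freely between syncs.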
Vivado Project Demonstrating the concept
When simulated, this design resulted in the VTC synchronizing to the FSYNC_IN signal as intended. It also worked the same when I implemented it in my EVK kit, allowing me to synchronize the output to an external trigger.
Code is available on Github as always.
MRAM (magnetic RAM) maker Everspin wants to make it easy for you to connect its 256Mbit DDR3 ST-MRAM devices (and its soon-to-be-announced 1Gbit ST-MRAMs) to Xilinx UltraScale FPGAs, so it now provides a software script for the Vivado MIG (Memory Interface Generator) that adapts the MIG DDR3 controller to the ST-MRAM’s unique timing and control requirements. Everspin has been shipping MRAMs for more than a decade and, according to this EETimes.com article by Dylan McGrath, it’s still the only company to have shipped commercial MRAM devices.
Nonvolatile MRAM’s advantage is that it has no wearout failure, as opposed to Flash memory for example. This characteristic gives MRAM huge advantages over Flash memory in applications such as server-class enterprise storage. MRAM-based storage cards require no wear leveling and their read/write performance does not degrade over time, unlike Flash-based SSDs.
As a result, Everspin also announced its nvNITRO line of NVMe storage-accelerator cards. The initial cards, the 1Gbyte nvNITRO ES1GB and 2Gbyte nvNITRO ES2GB, deliver 1,500,000 IOPS with 6μsec end-to-end latency. When Everspin's 1Gbit ST-MRAM devices become available later this year, the card capacities will increase to 4 to 16Gbytes.
Here’s a photo of the card:
Everspin nvNITRO Storage Accelerator
If it looks familiar, perhaps you’re recalling the preview of this board from last year’s SC16 conference in Salt Lake City. (See “Everspin’s NVMe Storage Accelerator mixes MRAM, UltraScale FPGA, delivers 1.5M IOPS.”)
If you look at the photo closely, you’ll see that the hardware platform for this product is the Alpha Data ADM-PCIE-KU3 PCIe accelerator card, loaded with 1 or 2Gbyte Everspin ST-MRAM DIMMs. Everspin has added its own IP to the Alpha Data card, based on a Kintex UltraScale KU060 FPGA, to create an MRAM-based NVMe controller.
As I wrote in last year’s post:
“There’s a key point to be made about a product like this. The folks at Alpha Data likely never envisioned an MRAM-based storage accelerator when they designed the ADM-PCIE-KU3 PCIe accelerator card but they implemented their design using an advanced Xilinx UltraScale FPGA knowing that they were infusing flexibility into the design. Everspin simply took advantage of this built-in flexibility in a way that produced a really interesting NVMe storage product.”
It’s still an interesting product, and now Everspin has formally announced it.
By Adam Taylor
Without a doubt, some of the most popular MicroZed Chronicles blogs I have written about the Zynq 7000 SoC explain how to use the Zynq SoC’s XADC. In this blog, we are going to look at how we can use the Zynq UltraScale+ MPSoC’s Sysmon, which replaces the XADC within the MPSoC.
The MPSoC contains not one but two Sysmon blocks. One is located within the MPSoC’s PS (processing system) and another within the MPSoC’s PL (programmable logic). The capabilities of the PL and PS Sysmon blocks are slightly different. While the processors in the MPSoC’s PS can access both Sysmon blocks through the MPSoC’s memory space, the different Sysmon blocks have different sampling rates and external interfacing abilities. (Note: the PL must be powered up before the PL Sysmon can be accessed by the MPSoC’s PS. As such, we should check the PL Sysmon control register to ensure that it is available before we perform any operations that use it.)
The PS Sysmon samples its inputs at 1Msamples/sec while the PL Sysmon has a reduced sampling rate of 200Ksamples/sec. However, the PS Sysmon does not have the ability to sample external signals. Instead, it monitors the Zynq MPSoC’s internal supply voltages and die temperature. The PL Sysmon can sample external signals and it is very similar to the Zynq SoC’s XADC, having both a dedicated VP/VN differential input pair and the ability to interface to as many as sixteen auxiliary differential inputs. It can also monitor on-chip voltage supplies and temperature.
Sysmon Architecture within the Zynq UltraScale+ MPSoC
Just as with the Zynq SoC’s XADC, we can set upper and lower alarm limits for ADC channels within both the PL and PS Sysmon in the Zynq UltraScale+ MPSoC. You can use these limits to generate an interrupt should the configured bound be exceeded. We will look at exactly how we can do this in another blog once we understand the basics.
The two diagrams below show the differences between the PS and PL Sysmon blocks in the Zynq UltraScale+ MPSoC:
Zynq UltraScale+ MPSoC’s PS System Monitor (UG580)
Zynq UltraScale+ MPSoC’s PL Sysmon (UG580)
Interestingly, the Sysmone4 block in the MPSoC’s PL provides direct register access to the ADC data. This will be useful if using either the VP/VN or Aux VP/VN inputs to interface with sensors that do not require high sample rates. This arrangement permits downstream signal processing, filtering, and transfer functions to be implemented in logic.
Both MPSoC Sysmon blocks require 26 ADC clock cycles to perform a conversion. Therefore, if we are sampling at 200Ksamples/sec using the PL Sysmon, we require a 5.2MHz ADC clock. For the PS Sysmon to sample at 1Msamples/sec, we need to provide a 26MHz ADC clock.
We set the AMS modules’ clock within the MPSoC Clock Configuration dialog, as shown below:
Zynq UltraScale+ MPSoC’s AMS clock configuration
The eagle-eyed will notice that I have set the clock to 52MHz and not 26 MHz. This is because the PS Sysmon’s clock divisor has a minimum value of 2, so setting the clock to 52MHz results in the desired 26MHz clock. The minimum divisor is 8 for the PL Sysmon, although in this case it would need to be divided by 10 to get the desired 5.2MHz clock. You also need to pay careful attention to the actual frequency and not just the requested frequency to get the best performance. This will impact the sample rate as you may not always get the exact frequency you want—as is the case here.
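The arithmetic behind these clock settings is easy to check in C. This is a minimal sketch using only the figures quoted in this post (26 ADC clocks per conversion, a 52MHz AMS clock, and divisors of 2 and 10). The function and macro names are hypothetical and are not part of any Xilinx driver.

```c
#include <assert.h>
#include <stdint.h>

/* Both Sysmon blocks need 26 ADC clock cycles per conversion. */
#define SYSMON_CLOCKS_PER_CONVERSION 26u

/* ADC clock (Hz) required to reach a given sample rate (samples/sec). */
static uint32_t sysmon_adc_clock_hz(uint32_t sample_rate_sps)
{
    return sample_rate_sps * SYSMON_CLOCKS_PER_CONVERSION;
}

/* Sample rate (samples/sec) obtained from an AMS clock and a clock divisor. */
static uint32_t sysmon_sample_rate_sps(uint32_t ams_clock_hz, uint32_t divisor)
{
    assert(divisor != 0u);
    return ams_clock_hz / (divisor * SYSMON_CLOCKS_PER_CONVERSION);
}
```

With a 52MHz AMS clock, a divisor of 2 gives 1Msamples/sec and a divisor of 10 gives 200Ksamples/sec. Passing the actual clock frequency reported by the tools, rather than the requested one, shows the sample rate you will really get.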
Next time in the UltraZed Edition of the MicroZed Chronicles, we will look at the software required to communicate with both the PS and PL Sysmon in the Zynq UltraScale+ MPSoC.
Code is available on Github as always.
Last September at the GNU Radio Conference (GRCon16) in Boulder, CO, Ettus Research announced its RFNoC & Vivado HLS Challenge with a $10,000 grand prize for developing “innovative and useful open-source RF Network on Chip (RFNoC) blocks that highlight the productivity and development advantage of Xilinx Vivado High-Level Synthesis (HLS) for FPGA programming using C, C++, or System C. The new RFNoC blocks generated during the challenge will add to the rapidly growing library of available open-source blocks for programming FPGAs in SDR development and production.”
Based on formal proposals, the company has now accepted seven teams for the challenge:
The final challenge competition will take place in May or June 2017 (venue to be announced) and the teams are required to submit technical papers for publication in the GRCon17 technical proceedings outlining their design’s contribution, implementation, results, and lessons learned. (GRCon17 takes place on September 11-15, 2017 in San Diego, CA.)
For more information about the challenge, see “Matt Ettus of Ettus Research wants you to win $10K. All you have to do is meet his RFNoC & Vivado HLS challenge for SDR.”
If you’re still uncertain as to what System View’s Visual System Integrator hardware/software co-development tool for Xilinx FPGAs and Zynq SoCs does, the following 3-minute video should make it crystal clear. Visual System Integrator extends the Xilinx Vivado Design Suite and makes it a system-design tool for a wide variety of embedded systems based on Xilinx devices.
This short video demonstrates System View’s tool being used for a Zynq-controlled robotic arm:
For more information about System View’s Visual System Integrator hardware/software co-development tool, see:
After five years and a dozen prototypes, the Haddington Dynamics development team behind Dexter—a $3K, trainable, 5-axis robotic arm kit for personal manufacturing—launched the project on Kickstarter just yesterday and are already 41.6% of the way to meeting the overall $100K project funding goal with 28 days left in the funding period. Dexter is designed to be a personal robot arm with the ability to make a wide variety of goods. Think of Dexter as your personal robotic factory with additive (2.5D/3D printing) and subtractive (drilling and milling) capabilities.
Dexter incorporates a 6-channel motor controller but the arm itself uses five stepper motors for positioning. Adding a gripper or other end-effector to the end of the arm adds a 6th degree of freedom.
Dexter Robotic Arm 3D CAD Drawing
You need some hefty, high-performance computation to precisely coordinate five axes of motion and the current Dexter prototype employs programmable logic in the form of a Xilinx Zynq Z-7000 SoC on an Avnet MicroZed dev board for this task. (The Kickstarter page even shows an IP block diagram from the Vivado Design Suite.)
The Dexter team calls the Zynq SoC an FPGA supercomputer:
“By using a(n) FPGA supercomputer to solve the precision control problem, we were able to optimize the physical and electrical architecture of the robot to minimize the mass and therefore the power requirements. All 5 of the stepper motors are placed at strategic locations to lower the center of mass and to statically balance the arm. This way almost all of the torque of the motors is used to move the payload not the robot.”
The prototype design achieves 50-micron repeatability!
Here’s a video of the prototype Dexter robotic arm in development, including a shot of the robotic arm threading a needle:
There are several more videos on the Dexter Kickstarter page.
By Adam Taylor
We have now built a basic Zynq UltraScale+ MPSoC hardware design for the UltraZed board in Vivado that got us up and running. We’ve also started to develop software for the cores within the Zynq UltraScale+ MPSoC’s PS (processing system). The logical next step is to generate a simple “hello world” program, which is exactly what we are going to do for one of the cores in the Zynq UltraScale+ MPSoC’s APU (Application Processing Unit).
As with the Zynq Z-7000 SoC, we need three elements to create a simple bare-metal program for the Zynq UltraScale+ MPSoC:
To create a new hardware platform definition, select:
File-> New -> Other -> Xilinx – Hardware Platform Specification
Provide a project name and select the hardware definition file, which was exported from Vivado. You can find the exported file within the SDK directory if you exported it local to the project.
Creating the Hardware platform
Once the hardware platform has been created within SDK, you will see the hardware definition file open within the file viewer. Browsing through this file, you will see the address ranges of the Zynq UltraScale+ MPSoC’s ARM Cortex-A53 and Cortex-R5 processors and PMU (Platform Management Unit) cores within the design. A list of all IP within the processors’ address space appears at the very bottom of the file.
Hardware Platform Specification in SDK file browser
We then use the information provided within the hardware platform to create a BSP for our application. We create a new BSP by selecting:
File -> New -> Board Support Package
Within the create BSP dialog, we can select the processor this BSP will support, the compiler to be used, and the selected OS. In this case, we’ll use bare metal or FreeRTOS.
For this first example, we will be running the “hello world” program from the APU on processor core 0. We must be sure to target the same core as we create the BSP and application if everything is to function correctly.
Board Support Package Generation
With the BSP created, the next step is to create the application using this BSP. We can create the application in a similar manner to the BSP and hardware platform:
File -> New -> Application Project
This command opens a dialog that allows us to name the project, select the BSP, specify the processor core, and select the operating system. On the first tab of the dialog, configure these settings for APU core 0, bare metal, and the BSP just created. On the second tab of the dialog box, select the pre-existing “hello world” application.
Configuring the application
Selecting the Hello World Application
At this point, we have the application ready to run on the UltraZed dev board. We can run the application using either the debugger within SDK or we can boot the device from a non-volatile memory such as an SD card.
To boot from an SD Card, we need to first create a first-stage boot loader (FSBL). To do this, we follow the same process as we do when creating a new application. The FSBL will be based on the current hardware platform but it will have its own BSP with several specific libraries enabled.
Select File -> New -> Application Project
Enter a project name and select the core and OS to support the current build as previously done for the “hello world” application. Click the “Create New” radio button for the BSP and then on the next page, select the Zynq MP FSBL template.
Configuring the FSBL application
Selecting the FSBL template
With the FSBL created, we now need to build all our applications to create the required ELF files for the FSBL and the application. If SDK is set to build automatically, these files will have been created following the creation of the FSBL. If not, then select:
Project -> Build All
Once this process completes, the final step is to create a boot file. The Zynq UltraScale+ MPSoC boots from a file named boot.bin, created by SDK. This file contains the FSBL, FPGA programming file, and the applications. We can create this file by hand and indeed later in this series we will be doing so to examine the more advanced options. However, for the time being we can create a boot.bin by right-clicking on the “hello world” application and selecting the “Create Boot Image” option.
Creating the boot image from the file, from the hello world application
This will populate the “create boot image” dialog correctly with the FSBL, FPGA bit file, and our application—provided the elf files are available.
Boot Image Creation Dialog correctly populated
Once the boot file is created, copy the boot.bin onto a microSD card and insert it into the SD card holder on the UltraZed IOCC (I/O Carrier Card). The final step, before we apply power, is to set SW2 on the UltraZed card to boot from the SD Card. The setting for this is 1 = ON, 2 = OFF, 3 = ON, and 4 = OFF. Now switch the power on, connect to a terminal window, and you will see the program start and execute.
When I booted this on my UltraZed and IOCC combination, the following appeared in my terminal window:
Hello World Running
Next week we will look a little more at the architecture of the Zynq UltraScale+ MPSoC’s PS.
Code is available on Github as always.
The Koheron SDK and Linux distribution, based on Ubuntu 16.04, allows you to prototype working instruments for the Red Pitaya Open Instrumentation Platform, which is based on a Xilinx Zynq All Programmable SoC. The Koheron SDK outputs a configuration bitstream for the Zynq SoC along with the requisite Linux drivers, ready to run under the Koheron Linux Distribution. You build the FPGA part of the Zynq SoC design by writing the code in Verilog using the Xilinx Vivado Design Suite and assembling modules using TCL.
The Koheron Web site already includes several instrumentation examples based on the Red Pitaya including an ADC/DAC exerciser, a pulse generator, an oscilloscope, and a spectrum analyzer. The Koheron blog page documents several of these designs along with many experiments designed to be conducted using the Red Pitaya board. If you’re into Python as a development environment, there’s a Koheron Python library as well.
There’s also a quick-start page on the Koheron site if you’re in a hurry.
The Red Pitaya Open Instrumentation Platform
For more articles about the Zynq-based Red Pitaya, see:
By Adam Taylor
Like its MicroZed and PicoZed predecessors, the UltraZed-EG is a System on Module (SOM) that contains all of the necessary support functions for a complete embedded processing system. As a SOM, this module is designed to be integrated with an application-specific carrier card. In this instance, our application-specific card is the Avnet UltraZed IO Carrier Card.
The specific Zynq UltraScale+ MPSoC contained within the UltraZed SOM is the XCZU3EG-SFVA625, which incorporates a quad-core ARM Cortex-A53 APU (Application Processing Unit), dual ARM Cortex-R5 processors in an RPU (Real-Time Processing Unit), and an ARM Mali-400 GPU. Coupled with a very high performance programmable-logic array based on the Xilinx UltraScale+ FPGA fabric, suffice it to say that exploring how best to use all of these resources will keep us very, very busy. You can find the 36-page product specification for the device here.
The UltraZed SOM itself, shown in the diagram below, provides us with 2GBytes of DDR4 SDRAM, while non-volatile storage for our application(s) is provided by both dual QSPI and eMMC Flash memory. Most of the Zynq UltraScale+ MPSoC’s PS and PL I/O are broken out to one of three headers to provide maximum flexibility on the application-specific carrier card.
Avnet UltraZed-EG SOM Block Diagram
The UltraZed IO Carrier Card (IOCC) breaks out the I/O pins from the SOM to a wide variety of interface and interconnect technologies including Gigabit Ethernet, USB 2/3, UART, PMOD, Display Port, SATA, and Arduino Shield. This diverse set of I/O connections gives us wide latitude in developing all sorts of systems. The IOCC also provides a USB to JTAG interface allowing us to program and debug our system. You’ll find more information on the IOCC here.
Having introduced the UltraZed and its IOCC, it is time to write a simple “hello world” application and to generate our first Zynq UltraScale+ MPSoC design.
The first step on this journey is to make sure we have used the provided voucher to generate a license and downloaded the Design Edition of the Vivado Design Suite.
The next step is to install the board files to provide Vivado with the necessary information to create designs targeting the UltraZed SoM. You can download these files using this link. These board-definition files include information such as the actual Zynq UltraScale+ MPSoC device populated on the SoM, connections to the PS on the IOCC, and a preset configuration for the SoM. We can of course create an example without using these files, however it requires a lot more work.
Once you have downloaded the zip file, extract the contents into the following directory:
<Vivado Install Root>/data/boards/boardfiles
When this is complete, you will see that the UltraZed board definitions are now in the directory and we can now use them within our design.
I should point out here that some of the UltraZed boards (including mine) use ES1 silicon. To alert Vivado about this, we need to create an init.tcl file in the scripts directory that will enable us to use ES1 silicon. Doing so is very simple. Within the directory:
Create a file called init.tcl. Enter the line “enable_beta_device*” into this file to enable the use of ES1 silicon within your toolchain.
With this completed, we can open Vivado and create a new RTL project. After entering the project name and location, click next on the add sources, IP, and constraints tabs. This should bring you to the part selection tab. Click on boards and you should see our UltraZed IOCC board. Select that board and then finish the new project dialog.
This will create a new project.
For this project, I am just going to use the Zynq UltraScale+ MPSoC’s PS to print “hello world.” I usually like to do this with new boards to ensure that I have pipe-cleaned the tool chain. To do this, we need a hardware-definition file to export to SDK to define the hardware platform.
The first step in this sequence is within Flow Navigator. On the left-hand side of the Vivado screen, select the Create Block Diagram option. This will provide a dialog box allowing you to name your block design (or you can leave it default). Click OK and this will create a blank block diagram (in the example below mine is called design_1).
Within this block diagram, we need to add an MPSoC system. Click on the “add IP” button as indicated in the block diagram. This will bring up an IP dialog. Within the search box, type in “MPSoC” and you will see the Zynq UltraScale+ MPSoC IP block. Double click on this and it will be added to the diagram automatically.
Once the block has been added, you will notice a designer assistance notification across the top of the block diagram. For the moment, do not click on that. Instead, double click on the MPSoC IP in your block diagram and it will open up the customization screen for the MPSoC, just like any other IP block.
Looking at the customization screen, you will see it is not yet configured for the target board. For instance, the IOU block has no MIO configuration. Had we not downloaded the board definition, we would now have to configure this manually. But why do that when we can use the shortcut?
We have the board-definition files, so all we need to do to correctly configure this for the IOCC is close the customization dialog and click on the Run Block Automation notification at the top of the block diagram. This will configure the MPSoC for our use on the IOCC. Within the block automation dialog, check to make sure that the “apply pre-sets” option is selected before clicking OK.
Re-open the MPSoC IP block and you will see a different configuration of the MPSoC, one that is ready to use with our IOCC.
Do not change anything. Close the dialog box. Then, on the block diagram, connect the PL_CLK0 pin to the maxihpm0_lpd_aclk pin. Once that is complete, click on “validate” to ensure that the design has no errors.
The next step is very simple. We’ll create an RTL wrapper file for the block diagram. This will allow us to implement the design. Under the sources tab, right-click on the block diagram and select “create HDL wrapper.” When prompted, select the option that allows Vivado to manage the file for you and click OK.
To generate the bitstream, click on the “Generate Bitstream” icon on the menu bar. If you are prompted about any stages being out of date, re-run them first by clicking on “yes.”
Depending on the speed of your system, this step may take a few minutes or longer to generate the bitstream. Once completed, select the “open implementation” option. Having the implementation open allows us to export the hardware definition file to SDK where we will develop our software.
To export the hardware definition, select File-> Export->Export Hardware. Select “include bit file” and export it.
To those familiar with the original Zynq SoC, all of this should look pretty familiar.
We are now ready to write our first software program—next time.
You can find links to previous editions of the MPSoC edition here
Code is available on Github as always.
Xilinx launched the UltraFast Design Methodology more than three years ago. It’s designed to get you from project start to a successful, working design using the Xilinx Vivado Design Suite in the least amount of time using hand-picked best practices from industry experts. There’s a 364-page methodology manual titled “UltraFast Design Methodology Guide for the Vivado Design Suite” (UG949) that describes the methodology in detail and you can download and read it for free. (And you should.)
Don’t have time right now to read the 364-page version? Maybe you just want to get the ideas to see if it’s worth your time to read the full manual. (Hint: It is.)
OK, so if you’re that pressed for time there’s a 2-page Quick Reference Guide (UG1231) that you can read to see what the ideas are all about. You can download that guide here, also for free.
Note: If you’re looking for something in between take a look at “UltraFast: Hand-picked best practices from industry experts, distilled into a potent Design Methodology.”
Yesterday at Photonics West, my colleague Aaron Behman and I stopped by the Ximea booth and had a very brief conversation with Max Larin, Ximea’s CEO. Ximea makes a very broad line of industrial and scientific cameras and a lot of them are based on several generations of Xilinx FPGAs. During our conversation, Max removed a small pcb from a plastic bag and showed it to us. “This is the world’s smallest industrial camera,” he said while palming a 13x13mm board. It was one of Ximea’s MU9 subminiature USB cameras based on a 5Mpixel ON Semiconductor (formerly Aptina) MT9P031 image sensor. Ximea’s MU9 subminiature camera is available as a color or monochrome device.
Here are front and back photos of the camera pcb:
Ximea 5Mpixel MU9 subminiature USB camera
As you can see, the size of the board is fairly well determined by the 10x10mm image sensor, its bypass capacitors, and a few other electronic components mounted on the front of the board. Nearly all of the active electronics and the camera’s I/O connector are mounted on the rear. A Cypress CY7C68013 EZ-USB Microcontroller operates the camera’s USB interface and the device controlling the sensor is a Xilinx Spartan-3 XC3S50 FPGA in an 8x8mm package. FPGAs with their logic and I/O programmability are great for interfacing to image sensors and for processing the video images generated by these sensors.
Our conversation with Max Larin at Photonics West got me to thinking. I wondered, “What would I use to design this board today?” My first thought was to replace both the Spartan-3 FPGA and the USB microcontroller with a single- or dual-core Xilinx Zynq SoC, which can easily handle all of the camera’s functions including the USB interface, reducing the parts count by one “big” chip. But the Zynq SoC family’s smallest package size is 13x13mm—the same size as the camera pcb—and that’s physically just a bit too large.
The XC3S50 FPGA used in this Ximea subminiature camera is the smallest device in the Spartan-3 family. It has 1728 logic cells and 72Kbits of BRAM. That’s a lot of programmable capability in an 8x8mm package even though the Spartan-3 FPGA family first appeared way back in 2003. (See “New Spartan-3 FPGAs Are Cost-Optimized for Design and Production.”)
There are two newer Spartan FPGA families to consider when creating a design today, Spartan-6 and Spartan-7, and both device families include multiple devices in 8x8mm packages. So I decided to see how much I might pack into a more modern FPGA with the same pcb real-estate footprint.
The simple numbers from the data sheets tell part of the story. A Spartan-3 XC3S50 provides you with 1728 logic cells, 72Kbits of BRAM, and 89 I/O pins. The Spartan-6 XC6SLX4, XC6SLX9, and XC6SLX16 provide you with 3840 to 14,579 logic cells, 216 to 576Kbits of BRAM, and 106 I/O pins. The Spartan-7 XC7S6 and XC7S15 provide 6000 to 12,800 logic cells, 180 to 360Kbits of BRAM, and 100 I/O pins. So both the Spartan-6 and Spartan-7 FPGA families provide nice upward-migration paths for new designs.
However, the simple data-sheet numbers don’t tell the whole story. For that, I needed to talk to Jayson Bethurem, the Xilinx Cost Optimized Portfolio Product Line Manager, and get more of the story. Jayson pointed out a few more things.
First and foremost, the Spartan-7 FPGA family offers a 2.5x performance/watt improvement over the Spartan-6 family. That’s a significant advantage right there. The Spartan-7 FPGAs are significantly faster than the Spartan-6 FPGAs as well. Spartan-6 devices in the -1L speed grade have a 250MHz Fmax versus 464MHz for Spartan-7 -1 or -1L parts. The fastest Spartan-6 devices in the -3 speed grade have an Fmax of 400MHz (still not as fast as the slowest Spartan-7 speed grade) and the fastest Spartan-7 FPGAs, the -2 parts, have an Fmax of 628MHz. So if you feel the need for speed, the Spartan-7 FPGAs are the way to go.
I’d be remiss not to mention tools. As Jayson reminded me, the Spartan-7 family gives you entrée into the world of Vivado Design Suite tools. That means you get access to the Vivado IP catalog and Vivado’s IP Integrator (IPI) with its automated integration features. These are two major benefits.
Finally, some rather sophisticated improvements to the Spartan-7 FPGA family’s internal routing architecture means that the improved placement and routing tools in the Vivado Design Suite can pack more of your logic into Spartan-7 devices and get more performance from that logic due to reduced routing congestion. So directly comparing logic cell numbers between the Spartan-6 and Spartan-7 FPGA families from the data sheets is not as exact a science as you might assume.
The nice thing is: you have plenty of options.
For previous Xcell Daily blog posts about Ximea industrial and scientific cameras, see:
The short video below captured at the recent SDS Drives show in Germany shows two recent safety-related certifications for Xilinx development tools. The first is a TÜV SÜD certification for the Vivado Design Suite for functional-safety applications and the second is for the Xilinx MicroBlaze processor GNU compiler tool chain, certified to SIL 4.
The video also shows a Zynq SoC being used to implement a functional-safety application using two different processor architectures (an ARM Cortex-A9 and a Xilinx MicroBlaze soft processor core) running the same code. This demonstration shows the functional-safety flexibility you get when you design a Zynq SoC into your design.
Xilinx has a version of QEMU—a fast, open-source, just-in-time functional simulator—for the ARM processors in the Zynq SoC and the Zynq UltraScale+ MPSoC and for the company’s MicroBlaze soft processor core. QEMU accelerates code development by giving embedded software developers an enhanced execution environment long before hardware is available and they can continue to use QEMU as a software-development platform even after the hardware is ready. (After all, it’s a lot easier to distribute QEMU to 300 software developers than to ship hardware units to each of them.)
Although QEMU was already available through the open-source community, Xilinx has added several innovations over time to match the multi-core, heterogeneous devices available in the two distinct Zynq device families, augmented by additional MicroBlaze processors instantiated in programmable logic.
The latest version of Xilinx QEMU, available on github at https://github.com/Xilinx/qemu, includes extended features including:
Xilinx is actively developing QEMU enhancements, which means more features are on the way. Meanwhile, you’ll find the Xilinx QEMU Wiki here.
By Adam Taylor
Having looked at how we can optimize the Zynq SoC’s PS (processor system) for power during operation and when we wish for the Zynq SoC to enter sleep mode, I now want to round off our look at power-reduction techniques by looking at how we reduce power consumption within the Zynq SoC’s PL (programmable logic) using design techniques. Obviously, one of the first things we should do is enable power optimization within the implementation flow, which optimizes the design for power efficiency. However, Vivado tools can only optimize a design as presented. So let’s see what we can do to ensure that we present the best design possible.
Setting Power Optimization within Vivado
One of the first places to start is to ensure that we are familiar with the structure of the CLBs and slices used to implement our creations within the Zynq SoC’s PL. If you are not as familiar as you should be, the details of these PL components are provided in the 7 Series CLB user guide (UG474).
Each CLB contains two slices. These slices provide the LUTs (look-up tables), storage elements, etc. used to implement the logic in your design. The first thing we can do to optimize power consumption in our programmable logic design is to consider the polarity, synchronicity, and grouping of control signals to these CLBs and slices. When we talk about a control signal, we mean the clock, clock enable, set/reset, and distributed-RAM write enables used within a slice.
Storage elements in a Programmable Logic Slice
Looking at the storage elements shown above, you can see that except for the CLK control signal, which has a mux to enable its inversion, all other signals are active high. If we declare them as active low or asynchronous, we will require an extra LUT to invert the signal and additional routing resources to connect the inverter. These extra logic and routing resources increase power consumption.
Grouping of control signals relates to how a specific group of control signals—e.g. the clock, reset and clock enable—behave. Creating many different control groups within a design or module makes it more difficult for the placer to locate elements within different control groups close together. The end result will require more routing which makes timing closure more difficult and increases power consumption.
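To make the idea of control groups concrete, here is a minimal Python sketch that counts how many distinct clock/reset/enable combinations a set of registers uses. The register and net names are hypothetical and the model is deliberately simplified; it is not how Vivado reports control sets, only an illustration of why a stray active-low reset or a missing enable creates a new group.

```python
from collections import Counter
from typing import NamedTuple


class FlipFlop(NamedTuple):
    name: str
    clock: str
    reset: str   # set/reset net name, or "" if unused
    enable: str  # clock-enable net name, or "" if unused


def control_sets(flops):
    """Count how many flip-flops share each (clock, reset, enable) combination."""
    return Counter((f.clock, f.reset, f.enable) for f in flops)


# Hypothetical registers: c_reg has no enable and d_reg uses a different
# reset net, so each of them ends up in its own control group.
flops = [
    FlipFlop("a_reg", "clk", "rst", "en"),
    FlipFlop("b_reg", "clk", "rst", "en"),
    FlipFlop("c_reg", "clk", "rst", ""),
    FlipFlop("d_reg", "clk", "rst_n", "en"),
]

sets = control_sets(flops)
for combo, count in sets.items():
    print(combo, count)
print("distinct control groups:", len(sets))
```

In this toy example, four registers produce three control groups. Giving all four the same reset and enable would collapse them into one, which gives the placer more freedom to pack them close together.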
We also need to consider how we use and configure the PL’s I/O resources. For instance, we must give proper consideration to limiting drive strength and slew rate. We should also consider using the lowest I/O voltage supported by the receiving device. For example, we may be able to use reduced-swing LVDS in place of LVDS.
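As a rough illustration of why the I/O voltage matters, the sketch below uses the first-order CMOS switching-power model, in which dynamic power is proportional to C x V^2 x f. It assumes the load capacitance and toggle rate stay the same and ignores static and termination currents, so treat the result as an order-of-magnitude guide rather than a prediction. The 3.3V and 1.8V figures are example values, not taken from the design above.

```python
def dynamic_power_ratio(v_new, v_old):
    """Ratio of switching power at v_new versus v_old, assuming P ~ C * V^2 * f
    with the same load capacitance C and toggle frequency f."""
    return (v_new / v_old) ** 2


# Example voltages only: moving a single-ended output bank from 3.3V to 1.8V.
ratio = dynamic_power_ratio(1.8, 3.3)
print(f"switching power falls to about {ratio:.0%} of the original")
```

Under these assumptions, the lower-voltage bank dissipates roughly 30% of the original switching power, which is why it is worth checking what the receiving device can actually accept.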
More advanced design techniques that we can use relate to the use of hard macros within the PL and how the tools use this logic. One of the biggest savings can be achieved by using a smaller device, which clearly reduces overall power. There are two main techniques we can use to reduce the size of the required device. The first of these is resource time sharing, which uses the same on-chip logic resources for different functions at different times. A second approach is to use a common core for processing multiple inputs, if possible. However, this technique increases complexity during design capture because we must consider multiplexing and sequencing needs.
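The trade-off behind time sharing can be sketched with a small behavioral model in Python. This is not HDL and not the author's code. It compares a hypothetical design that uses one multiplier per channel with one that multiplexes a single multiplier across all channels, and it counts multipliers and clock cycles for each.

```python
def parallel_scale(samples, gains):
    """One multiplier per channel: every product is formed in the same cycle."""
    results = [s * g for s, g in zip(samples, gains)]
    multipliers = len(samples)
    cycles = 1
    return results, multipliers, cycles


def time_shared_scale(samples, gains):
    """One shared multiplier: a channel counter selects the inputs each cycle."""
    results = []
    cycles = 0
    for channel in range(len(samples)):  # models the input multiplexer select
        results.append(samples[channel] * gains[channel])  # the single multiplier
        cycles += 1
    multipliers = 1
    return results, multipliers, cycles


# Hypothetical four-channel example.
samples = [10, 20, 30, 40]
gains = [2, 3, 4, 5]

par = parallel_scale(samples, gains)
shared = time_shared_scale(samples, gains)
print("parallel:   ", par)
print("time-shared:", shared)
```

Both versions produce the same results, but the time-shared version trades four multipliers for one at the cost of four cycles plus the multiplexing and sequencing logic that the loop counter stands in for. That trade only works when the sample rate leaves enough spare clock cycles.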
Once we have completed our design, we can run the XPE tool within Vivado to estimate power consumption and predict junction temperature (very important!). Hopefully, we’ll get the power reduction we require. However, if we do not, we can perform “what if” scenarios as detailed by UG907, which also contains other low-power design techniques.
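For a feel for how power and junction temperature are linked, the sketch below applies the standard steady-state relation Tj = Ta + θJA x P. The ambient temperature, thermal resistance, and power values are placeholders chosen for illustration. Real values come from the device data sheet, the board's thermal design, and the tool's own estimate.

```python
def junction_temperature(ambient_c, theta_ja_c_per_w, power_w):
    """Steady-state junction temperature in degrees C: Tj = Ta + theta_JA * P."""
    return ambient_c + theta_ja_c_per_w * power_w


# Placeholder numbers, not taken from any data sheet.
ambient_c = 50.0   # ambient temperature, degrees C
theta_ja = 10.0    # junction-to-ambient thermal resistance, degrees C per watt
power_w = 2.5      # total estimated device power, watts

tj = junction_temperature(ambient_c, theta_ja, power_w)
print(f"estimated junction temperature: {tj:.1f} C")
```

With these placeholder values, every watt removed from the design lowers the junction temperature by 10 degrees C, which shows why a modest power saving can decide whether a design stays within its temperature grade.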
Code is available on Github as always.
All of Adam Taylor’s MicroZed Chronicles are cataloged here.
You can now download the latest version of the Vivado Design Suite HLx Editions, release 2016.4, which adds support for multiple Xilinx UltraScale+ devices including the Virtex UltraScale+ XCVU11P and XCVU13P FPGAs and board support packages for the Zynq UltraScale+ MPSoC ZCU102-ES2 and Virtex UltraScale+ VCU118-ES1 boards.
Download the latest version here.