
 

Here’s another amazing demo video of National Instruments’ (NI’s) PXIe-5840 VST (Vector Signal Transceiver) showing real-time, DVR-like capture of 1GHz of continuous-bandwidth RF data on a 24Tbyte RAID drive. You’d want this if you needed to capture a real-time, broad-spectrum set of RF signals for subsequent, more detailed analysis. The VST captures the broad-spectrum data and simultaneously streams it to the RAID storage box. (The NI PXIe-5840 VST is based on a Xilinx Virtex-7 690T FPGA for its real-time RF-generation and -analysis capabilities.)

 

Here’s the 2-minute video:

 

 

 

 

 

For more information about the 2nd-generation NI VST, see:

 

 

 

 

 

 

 

 

 

One more low-cost Zynq SoC dev board, from a reader/spelunker designing SDR cave radios

by Xilinx Employee 03-29-2017 12:30 PM - edited 03-29-2017 02:45 PM

 

A blog post from earlier this week, “Seven low-cost Zynq dev and training boards: a quick review,” prompted an email from Graham Naylor in the UK. Naylor informed me that I’d not mentioned his favorite Zynq-based board, the Trenz TE0722, in that blog post—and then he told me how he’s using the Trenz board (which is really more of a low-cost SOM than a training/dev board). During the day, Naylor measures neutron pulses from an ionization chamber using the Zynq-based Red Pitaya open instrumentation platform. (I’ve written many blogs about the Red Pitaya, listed below.) For fun, it appears that Naylor and his colleague Pete Allwright design cave radios. If you’ve never heard of a cave radio, you’re in good company because I hadn’t either.

 

Naylor sent me a preprint of an article that will appear in the June 2017 issue of the BCRA’s quarterly Cave Radio & Electronics Group Journal. (The BCRA is the British Cave Research Association.) Naylor and Allwright’s article, titled “Outlining the Architecture of the Nicola 3Z Cave Radio,” discusses the design of a new version of the Nicola 3 rescue radio, which cave rescue teams use for underground communications.

 

The original Nicola 3 radio was based on a Xilinx Spartan-3E FPGA supplied on a module from OHO Elektronik. The FPGA implemented an SDR design for a radio that performs SSB modulation and demodulation using an 87kHz carrier wave. Radio transmission does not occur through the air but through the ground, using a couple of electrodes jammed into the earthen floor of the cave. (We’re in a cave, remember?) A little water poured on the earth interface helps improve transmission/reception.

 

 

Nicola 3 Cave Radio in Use.jpg

 

 

Nicola 3 radio on test in Reservoir Hole, Mendip, UK (Photo: Mendip Cave Rescue)

 

 

Xilinx introduced the 90nm Spartan-3E in 2005, so the Nicola cave radio development team has upgraded the Nicola design to the Zynq Z-7010 SoC, which resides on a low-cost Trenz TE0722 SOM. Trenz sells one of these boards for just €64.00, and if you want 500 pieces, the price drops to €48.00.

 

 

 

Trenz TE0722 Zynq SOM.jpg

 

Trenz TE0722 Zynq SOM

 

 

The new radio is called the Nicola 3Z. (I'm guessing "Z" is for "Zynq.") The FPGA fabric in the Zynq SoC implements the SDR functions in the Nicola 3Z radio including the SSB class D modulator, which drives an H-bridge driver for transmission; the receiver’s SSB filter, decimator, and demodulator; and an AGC block implemented on a soft-core Xilinx PicoBlaze 8-bit microcontroller, which is also instantiated in the Zynq SoC’s FPGA. There’s a second PicoBlaze instantiation on chip for housekeeping. That Zynq Z-7010 SoC may be a low-end part, but it’s plenty busy in the Nicola 3Z radio’s design.

 

 

 

Note: For more information about the Zynq-based Red Pitaya open instrumentation platform, see:

 

 

 

 

By Adam Taylor

 

At the end of the Sysmon AMS blogs, I introduced the several PLLs within the Zynq UltraScale+ MPSoC. That introduction suggests to me that it’s time to talk about the clocking architecture of the MPSoC device.

 

As with the original Zynq SoC, the PS (processing system) in the Zynq UltraScale+ MPSoC is the system master, so we will initially focus upon its clocking architecture. Within the PS there are three main clock inputs:

 

  • PS Reference Clock (PSS_REF_CLK)
  • Alternate PS Reference Clock (PSS_ALT_REF_CLK)
  • Video Reference Clock (PSS_VIDEO_REF_CLK)

 

While the PS reference clock has a dedicated input pin, PSS_ALT_REF_CLK and PSS_VIDEO_REF_CLK are input via the MIO and are enabled or disabled in Vivado on the I/O Configuration customization tab. If we plan on using these clocks, we need to ensure there is no conflict with other planned uses of the MIO.

 

 

 

Image1.jpg

 

 

Enabling the Alternate reference clock and the video clock

 

 

Once these have been enabled, we can configure them on the clock configuration input clock tab as shown below:

 

 

Image2.jpg 

 

 

Internally, the PS has four clock groups that provide all the required clocks:

 

  • Main Clock Group (MCG): This group covers the Zynq UltraScale+ MPSoC’s LPD and FPD power domains. Within the MCG, we find the five PLLs in the PS (the DDR, APU, VIDEO, RPU, and IO PLLs). The first three PLLs are within the FPD while the last two are within the LPD.
  • Secure Clock Group (SCG): This group provides the clocks for the Zynq UltraScale+ MPSoC’s PMU and the CSU. It is generated internally via a ring oscillator.
  • Real Time Clock Group (RTC): This group provides the clock for the RTC and requires an external crystal attached to two dedicated Zynq UltraScale+ MPSoC PS I/O pins (PS_PADI, PS_PADO).
  • Interface Clock Group (ICG): This group consists of clocks provided to the PS via interfaces (e.g. from the PL (programmable logic) as part of the AXI transactions).

 

 

We’ll now focus on the MCG because this is the group with which we will have the most interaction. Within this group, we choose which of the five PLLs is used to clock the Zynq UltraScale+ MPSoC’s processors and peripherals within the LPD and FPD. We can do this via the Clock Configuration -> Output Clocks tab, where we can configure the clocking for both the low- and full-power domains.

 

 

Image3.jpg 

 

 

To generate a PLL output frequency as close as possible to the desired frequency, we may want to change the PLL input-clock source. Several potential clock sources can be used to clock each of the PLLs within the Zynq UltraScale+ MPSoC.

 

As mentioned above, we can use PSS_REF_CLK, PSS_ALT_REF_CLK, or PSS_VIDEO_REF_CLK. These clocks are input directly into the PS. We can also use one of the four GT_REF_CLKs or the AUX_REF_CLK; the latter is provided from the PL while the former are provided by the PS-GTR transceivers. The relevant PLL control register selects which of these clocks drives the PLL. These registers reside in the CRL_APB module for low-power domain PLLs or the CRF_APB module for full-power domain PLLs.
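As a concrete illustration, selecting a PLL’s reference clock comes down to a read-modify-write of the PRE_SRC field in that PLL’s control register. Here is a minimal sketch for the IOPLL in the LPD; the register address and field position are my reading of the Zynq UltraScale+ MPSoC register reference (UG1087) and should be verified there before use:

#include "xil_io.h"

/* CRL_APB IOPLL_CTRL register; address and PRE_SRC field per UG1087 (verify). */
#define CRL_APB_IOPLL_CTRL     0xFF5E0020U
#define PLL_CTRL_PRE_SRC_SHIFT 20U
#define PLL_CTRL_PRE_SRC_MASK  (0x7U << PLL_CTRL_PRE_SRC_SHIFT)

/* Select the IOPLL input clock (0 selects PSS_REF_CLK; other encodings
   select the GT and alternate/video reference clocks). */
static void iopll_select_source(u32 src)
{
    u32 reg = Xil_In32(CRL_APB_IOPLL_CTRL);
    reg &= ~PLL_CTRL_PRE_SRC_MASK;
    reg |= (src << PLL_CTRL_PRE_SRC_SHIFT) & PLL_CTRL_PRE_SRC_MASK;
    Xil_Out32(CRL_APB_IOPLL_CTRL, reg);
    /* A production sequence would also bypass, reset, and wait for
       re-lock before switching loads back onto the PLL. */
}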

 

We can select which of the four GT reference clocks is provided as the GT_REF_CLK using the Serial Input Output Unit (SIOU) module CRX_CNTRL Register.

 

Now that we understand the Zynq UltraScale+ MPSoC’s clocking and how we set the desired frequency for each of the subsystems, we will explore the subsystems in more detail in the MicroZed Chronicles blogs that follow.

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

 

 

By Adam Taylor

 

I just received the most interesting email from Antti at Trenz, in which he pointed out that he had designed a 500MHz radio receiver using just four resistors, four capacitors, and a Xilinx series 7 FPGA. How did he do this? He is keeping that to himself for the moment. However, Antti thinks it would be a great idea to open a challenge based upon this design to see if others in the FPGA community can explain how he achieved it. Of course, there will be Trenz goodies as prizes for the winners who supply the correct answer. (See below.)

 

The diagram below shows the components allowed to solve this challenge. You can redraw them if necessary to clarify the schematic. The values of R and C do not need to be optimized.

 

 

image1.jpg 

 

 

To prove that this is possible, the screen shots below show the input and the recovered signal inside the FPGA.

 

 

Image2.jpg

 

 

By this point I was thinking Delta Sigma ADC using the FPGA’s LVDS inputs. (There are papers and articles about this technique online.) However, Antti tells me this is not his solution and he was kind enough to provide a few hints for this challenge below:

 

 

  1. All you need to know is really in the diagram. You can print it out, take a pencil, and complete the challenge. If you are not familiar with Xilinx FPGAs, you may have to read some Xilinx data sheets.
  2. You may have to read the Challenge description many times.
  3. The minimal digital circuitry used inside of the Xilinx FPGA to demonstrate the operation of this radio receiver can be made within less than two hours with Xilinx Vivado. If you are very fast, maybe in less than one.

 

 

Because there are so many Trenz prizes to give away to the winners, Antti has created three categories:

 

  • UltraFast - This is very simple. The first person who sends an email to Antti with a working solution, or with a solution that we cannot prove to be non-working, is the winner. There is no need to calculate the values for the resistors or capacitors or to make any performance estimates. Your sole goal is to be the first to solve the Challenge correctly.

 

  • Maker DIY hacker - To win in this category, you need to provide a working solution; you must implement it; and you must provide a photo of your setup and some measurement results. Bonus points are given if you use your design for some cool application.

 

  • Ultimate Theory - To win in this category, you need to provide a working solution. You will, however, get a higher ranking if you:

 

  1. Provide some reasonable values for the resistors and capacitors, or actually calculate them and provide the calculations.
  2. Provide additional solutions with fewer components (and less performance).
  3. Provide an enhanced solution that adds a minimal number of resistors and capacitors (no other components).

 

The closing date for entries is July 3rd. The judges will be Antti and myself (Adam Taylor).

 

If you want to enter your solution in any category, email challenge@trenz.biz.

 

 

 

Image Matters’ Origami B20 module, based on a Xilinx Kintex UltraScale KU060 FPGA, is a small 94x53mm module that you can use to perform all sorts of high-speed processing. (See “Image Matters launches Origami Ecosystem for developing advanced 4K/8K video apps using the FPGA-based Origami module.”) For example, you can use it for a variety of video-compression applications using various IP compression cores including MPEG, JPEG-2000, and TICO. You can also use it for cloud-computing and neural-network applications such as image detection. The key thing is that the small Origami B20 module puts everything you need to run the FPGA on one small module, including SDRAM, Flash memory, the power supply, a backup battery, and security features (including tamper protection).

 

Here’s a short, 2.5-minute, Powered by Xilinx video with more information about the Origami B20 module:

 

 

 

 

 

In yesterday’s EETimes article titled “How will Ethernet go real-time for industrial networks?,” author Richard Wilson interviews National Instruments’ Global Technology and Marketing Director Rahman Jamal about using OPC-UA (the OPC Foundation’s Unified Architecture) and TSN (time-sensitive networking) to build industrial Ethernet networks (IIoT/Industrie 4.0) that deliver real-time response. (Yes, yes, yes, “real-time” is a loosely defined term where “real” depends on your system’s temporal reality.) As Jamal states in the interview, some constrained industrial Ethernet network topologies need no help to achieve real-time operation. In other cases and for other topologies, you need Ethernet implementations that are “heavily modified at the hardware level to achieve performance.”

 

One of the hardware additions that can really help is a hardware implementation of the IEEE 1588v2 PTP (Precision Time Protocol) clock-synchronization standard. PTP permits each piece of network-connected equipment to be synchronized using a 64-bit timer, which can be used for time-stamping, synchronization, control, and as a common time reference to implement TSN.
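To make the arithmetic concrete, here is the textbook IEEE 1588 offset-and-delay computation from one Sync/Delay_Req exchange. This is the generic protocol math, not SoC-e’s implementation, and the type and function names are illustrative:

#include <stdint.h>

typedef struct {
    int64_t t1;  /* master: Sync transmit timestamp      */
    int64_t t2;  /* slave:  Sync receive timestamp       */
    int64_t t3;  /* slave:  Delay_Req transmit timestamp */
    int64_t t4;  /* master: Delay_Req receive timestamp  */
} ptp_exchange_t;

/* Slave clock offset relative to the master, assuming a symmetric path. */
static int64_t ptp_offset_ns(const ptp_exchange_t *x)
{
    return ((x->t2 - x->t1) - (x->t4 - x->t3)) / 2;
}

/* Mean one-way path delay. */
static int64_t ptp_delay_ns(const ptp_exchange_t *x)
{
    return ((x->t2 - x->t1) + (x->t4 - x->t3)) / 2;
}

The hardware’s job is to capture t1 through t4 precisely at the wire, which is why hardware timestamping beats a purely software PTP implementation for determinism.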

 

PTP implementation is an ideal task for an IP block instantiated in programmable logic (see last year’s Xcell Daily blog post “Intelligent Gateways Make a Factory Smarter,” written by SoC-e (System on Chip engineering) founder and CEO Armando Astarloa). SoC-e has implemented just such an IEEE 1588v2 PTP IP core in a Xilinx Zynq SoC, which is the core logic device inside of the company’s CPPS-Gate40 Sensor intelligent IIoT gateway. (Note: Software PTP implementations are neither fast nor deterministic enough for many IIoT applications.)

 

 

SoC-e CPPS-Gate40 Sensor Gateway.jpg 

 

SoC-e CPPS-Gate40 Sensor intelligent IIoT gateway

 

 

 

You can see the SoC-e PTP IP core in the very center of this CPPS-Gate40 block diagram:

 

 

 

SoC-e CPPSGate40 block diagram.jpg

 

SoC-e CPPS-Gate40 Sensor intelligent IIoT gateway block diagram

 

 

 

According to the SoC-e Web page, the company’s IEEE 1588v2 IP core in the CPPS-Gate40 Sensor gateway can deliver sub-microsecond network synchronization. How is such a small number possible? As Jamal says in his EETimes interview, “bit times (time on the wire) for a 64-byte frame at GigE rates is 512ns.” (At 1Gbps, 64 bytes × 8 bits/byte × 1ns/bit = 512ns.) That’s how.

 

 

Seven low-cost Zynq dev and training boards: a quick review

by Xilinx Employee on 03-27-2017 01:40 PM

 

With last week’s introduction of the Digilent Arty Z7 Zynq SoC training and dev board, I felt it was time to review some of the low-cost boards that occupy an outsized piece of mindshare in my head. So here are seven that have appeared previously in the Xcell Daily blog, listed in alphabetical order by vendor:

 

 

 

 

Analog Devices ADALM-PLUTO ($149): Students from all levels and backgrounds looking to improve their RF knowledge will want to take a look at the new ADALM-PLUTO SDR USB Learning Module from Analog Devices. The $149 USB module has an RF range of 325MHz to 3.8GHz with separate transmit and receive channels and 20MHz of instantaneous bandwidth. It pairs two devices that seem made for each other: an Analog Devices AD9363 Agile RF Transceiver and a Xilinx Zynq Z-7010 SoC.

 

 

Analog Devices ADALM-PLUTO SDR USB Learning Module.jpg 

 

Analog Devices’ $149 ADALM-PLUTO SDR USB Learning Module

 

 

 

 

 

 

Digilent ARTY Z7 ($149 to $209): The first thing you’ll note from the Arty Z7 dev board photo is that there’s a Zynq SoC in the middle of the board. You’ll also see the board’s USB, Ethernet, Pmod, and HDMI ports. On the left, you can see double rows of tenth-inch headers in an Arduino/chipKIT shield configuration. There are a lot of ways to connect to this board, which should make it a student’s or experimenter’s dream board considering what you can do with a Zynq SoC.

 

 

Digilent Arty Z7.jpg 

 

Digilent Arty Z7 dev board for makers and hobbyists

 

 

 

Digilent PYNQ-Z1 ($229): PYNQ is an open-source project that makes it easy for you to design embedded systems using the Xilinx Zynq-7000 SoC using the Python language, associated libraries, and the Jupyter Notebook, which is a pretty nice, collaborative learning and development environment for many programming languages including Python. PYNQ allows you to exploit the benefits of programmable logic used together with microprocessors to build more capable embedded systems with superior performance when performing embedded tasks.

 

 

PYNQ-Z1.jpg

 

Digilent PYNQ-Z1 Dev Board

 

 

 

Krtkl’s snickerdoodle ($95 to $195): The amazing, Zynq-based “snickerdoodle one” is a low-cost, single-board computer with wireless capability based on the Xilinx Zynq Z-7010 SoC, available for purchase on the Crowd Supply crowdsourcing Web site.

 

 

Snickerdoodle.jpg

 

Krtkl’s Zynq-based, WiFi-enabled Snickerdoodle Dev Board

 

 

 

National Instruments myRIO ($400 to $1001): The NI myRIO hardware/software development platform for NI’s LabVIEW system design software is based on the Zynq-7010 All Programmable SoC. About the size of a small paperback book (so that it easily drops into a backpack), the NI myRIO sports ten analog inputs, six analog outputs, left and right audio channels, 40 digital I/O lines (SPI, I2C, UART, PWM, rotary encoder), an on-board 3-axis accelerometer, and two 34-pin expansion headers.

 

 

NI myRIO Board Photo.jpg

 

The Zynq-based NI myRIO

 

 

 

National Instruments RoboRIO ($435 for FRC teams): The NI roboRIO robotic controller was specifically designed for the FIRST Robotics Competition (FRC). The FRC event is a particular passion for NI’s founder, Dr. James Truchard.

 

 

 

NI roboRIO.jpg

 

NI roboRIO for First Robotics Competition teams

 

 

 

 

Trenz ZynqBerry (€79.00 to €109.00): The Trenz Electronic TE0726 ZynqBerry Dev Board puts a Xilinx Zynq Z-7010 or Z-7020 SoC into a Raspberry-Pi-compatible form factor with 64Mbytes of LPDDR2 SDRAM, four USB ports (in a hub configuration), a 100Mbps Ethernet port, an HDMI port, MIPI DSI and CSI-2 connectors, a PWM digital audio jack, and 128Mbits of Flash memory for configuration and operation.

 

 

 

Trenz ZynqBerry Dev Board.jpg 

 

TE0726 ZynqBerry Dev Board from Trenz

 

 

 

 

 

 

By Adam Taylor

 

A couple of weeks ago, I talked about the Xilinx reVISION stack and the support it provides for OpenVX and OpenCV. One of the most exciting things I explained was how we can accelerate several OpenCV functions (which include the OpenVX core functions) using the Zynq SoC’s programmable logic. What I did not look at was the other significant part of the reVISION stack: its support for machine learning.

 

Machine learning is increasingly important for embedded-vision applications because it helps systems evolve from being vision-enabled to being vision-guided autonomous systems. Machine learning is often used in embedded-vision applications to identify and classify information contained within an image. The embedded-vision system uses these identifications and classifications to make informed decisions in real time, enabling increased interaction with the environment.

 

For those unfamiliar with machine learning, it is most often implemented through the creation and training of a neural network. Neural networks are modelled on the human cerebral cortex in that each neuron receives an input, processes it, and communicates the processed signal to another neuron. Neural networks typically consist of an input layer, internal layer(s), and an output layer.

 

 

Image1.jpg

 

 

 

Those familiar with machine learning may have come across the term “deep learning.” This is where there are several hidden layers in the neural network, allowing more complex machine-learning algorithms to be implemented.

 

When working with neural networks in embedded-vision applications, we need to use a 2D network. This is where Convolutional Neural Networks (CNNs) are used. CNNs are deep-learning networks that contain several convolutional and sub-sampling layers along with a separate, fully connected network to perform the final classification. Within the convolution layer, the input image will be broken down into several overlapping smaller tiles.

 

The results from this convolution layer are used to create an activation map via an activation layer in the network, followed by further sub-sampling and additional stages that precede the final, fully connected network. The exact implementation of the CNN varies depending upon the network architecture (GoogLeNet, SSD, AlexNet). However, a CNN will typically contain at least the following elements (a toy code sketch of these elements follows the list):

 

 

  • Convolution – Identifies features within the image
  • Rectified Linear Unit (reLU) – Activation layer that creates an activation map following the convolution
  • Max Pooling – Performs sub-sampling between layers
  • Fully Connected layer – Performs the final classification
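To make these elements concrete, here is a toy, single-channel C sketch of the first three (the fully connected layer is just a matrix-vector multiply). This illustrates the math only, not reVISION code, and a real PL implementation would use fixed-point arithmetic:

#include <stddef.h>

/* Convolution: slide a ksz x ksz kernel over the image (valid region only). */
static void conv2d(const float *in, float *out, int w, int h,
                   const float *kern, int ksz)
{
    int ow = w - ksz + 1, oh = h - ksz + 1;
    for (int y = 0; y < oh; y++)
        for (int x = 0; x < ow; x++) {
            float acc = 0.0f;
            for (int ky = 0; ky < ksz; ky++)
                for (int kx = 0; kx < ksz; kx++)
                    acc += in[(y + ky) * w + (x + kx)] * kern[ky * ksz + kx];
            out[y * ow + x] = acc;
        }
}

/* reLU activation: clamp negative responses to zero to form the activation map. */
static void relu(float *map, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (map[i] < 0.0f)
            map[i] = 0.0f;
}

/* Max pooling: 2x2 sub-sampling that keeps the strongest response. */
static void maxpool2x2(const float *in, float *out, int w, int h)
{
    for (int y = 0; y < h / 2; y++)
        for (int x = 0; x < w / 2; x++) {
            float m = in[2 * y * w + 2 * x];
            if (in[2 * y * w + 2 * x + 1] > m)       m = in[2 * y * w + 2 * x + 1];
            if (in[(2 * y + 1) * w + 2 * x] > m)     m = in[(2 * y + 1) * w + 2 * x];
            if (in[(2 * y + 1) * w + 2 * x + 1] > m) m = in[(2 * y + 1) * w + 2 * x + 1];
            out[y * (w / 2) + x] = m;
        }
}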

 

 

The weights used for each of these elements are determined via training, and one of the CNN’s advantages is the relative ease of training the network. Training requires large data sets and high-performance computers to correctly determine the weights for each stage.

 

To ease the development of machine-learning applications, many engineers use a framework like Caffe, which supports the implementation and training of machine learning. The use of frameworks allows us to work at a higher level and maximize reuse. Using a framework, we don’t need to start from scratch each time we develop an application.

 

The Xilinx reVision stack provides an integrated Caffe framework flow, which allows us to take the prototext definition of the network and trained weights to deploy the machine-learning application. (Note that network training is separate and distinct from deployment.) To enable this, the Xilinx reVision stack provides several hardware-accelerated functions that can be implemented within the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL (programmable logic) to create the machine-learning inference engine. The reVision stack also provides examples for a wide range of network structures, enabling us to get up and running with our machine-learning application without the need to initially compile the PL design. Once we are happy with the machine-learning application, we can then use the SDSoC flow to develop our own embedded-vision application containing the optimized machine-learning application.

 

 

Image2.jpg 

 

 

Using the Zynq SoC’s PL provides an optimal implementation that delivers faster response times when interacting with the embedded-vision system’s environment. This is especially true as machine-learning applications are increasingly implemented using fixed-point integers such as INT8, which are ideal for implementation in DSP elements.

 

Machine learning is going to be a hot area for several applications. So I will be coming back to this topic in detail as the MicroZed Chronicles progress—with some examples of course.

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

 

MicroZed Chronicles Second Year.jpg 

 

The multi-GHz processing capabilities of Xilinx FPGAs never fail to amaze me, and the following video from National Instruments (NI) demonstrating the real-time signal-generation and analysis capabilities of the NI PXIe-5840 VST (Vector Signal Transceiver) is merely one more proof point. The NI VST is designed for use in a wide range of RF test systems including 5G and IoT RF applications, ultra-wideband radar prototyping, and RFIC testing. In the demo below, this 2nd-generation NI VST is generating an RF signal spanning 1.2GHz to 2.2GHz (1GHz of analog bandwidth) containing five equally spaced LTE channels. The analyzer portion of the VST is simultaneously and in real time demodulating and decoding the signal constellations in two of the five LTE channels.

 

The resulting analysis screen generated by NI’s LabVIEW software tells the story:

 

 

NI VST Control Screen for LTE Demo.jpg 

 

The reason that the NI PXIe-5840 VST can perform all of these feats in real time is because there’s a Xilinx Virtex-7 690T FPGA inside pulling the levers, making this happen. (NI’s 1st-generation VSTs employed Xilinx Virtex-6 FPGAs.)

 

Here's the 2-minute video of the NI VST demo:

 

 

 

 

 

 

 

Please contact National Instruments directly for more information on its VST family.

 

 

For additional blogs about NI’s line of VSTs, see:

 

 

 

 

 

 

If you’re going to strap two 12Gsamples/sec, 16-bit DACs and two 6.4Gsamples/sec, 12-bit ADCs into your VPX/AMC module, you’d better include massive real-time DSP horsepower to tame them. That’s exactly what VadaTech has done with its VPX599 and AMC599 modules by placing a Xilinx Kintex UltraScale KU115 FPGA (along with 16 or 20Gbytes of high-speed DDR4 SDRAM) on the modules’ digital carrier board mated to an FMC analog converter board.

 

 

 

VadaTech AMC599.jpg 

 

VadaTech AMC599 ADC/DAC Module

 

 

VadaTech VPX599.jpg

 

VadaTech VPX599 ADC/DAC Module

 

 

 

Here’s a block diagram of the AMC599 module (the VPX599 block diagram is quite similar):

 

 

 

VadaTech AMC599 Block Diagram.jpg
 

 

VadaTech AMC599 ADC/DAC Module Block Diagram

 

 

 

At these conversion rates, raw data streams to and from the host CPU are quite impractical so you must, repeat must, have on-board processing and local storage—and what other processing genie besides a Xilinx UltraScale FPGA would you trust to handle and process those sorts of extreme streams?

 

 

Please contact VadaTech directly for more information on the VPX599 and AMC599 modules.

 

 

 

 

 

 

The just-announced VICO-4 TICO SDI Converter from Village Island employs visually lossless 4:1 TICO compression to funnel 4K60p video (on four 3G-SDI video streams or one 12G-SDI stream) onto a single 3G-SDI output stream, which reduces infrastructure costs for transport, cabling, routing, and compression in broadcast networks.

 

 

 

Village Island VICO-4.jpg

 

 

VICO-4 4:1 SDI Converter from Village Island

 

 

 

Here’s a block diagram of what’s going on inside of Village Island’s VICO-4 TICO SDI Converter:

 

 

Village Island VICO-4 Block Diagram.jpg 

 

And here’s a diagram showing you what broadcasters can do with this sort of box:

 

 

Village Island VICO-4 Distribution Diagram.jpg

 

 

 

The reason this is even possible in a real-time broadcast environment is that the lightweight intoPIX TICO compression algorithm has very low latency (just a few video lines) when implemented in hardware as IP. (Software-based, frame-by-frame video compression is totally out of the question in an application such as this; it introduces too much delay.)

 

Looking at the VICO-4’s main (and only) circuit board shows one main chip implementing the 4:1 compression and signal multiplexing. And that chip is… a Xilinx Kintex UltraScale KU035 FPGA. It has plenty of on-chip programmable logic for the TICO compression IP and it has sixteen 16.3Gbps transceiver ports—more than enough to handle the 3G- and 12G-SDI I/O required by this application.

 

 

Village Island VICO-4 pcb.jpg 

 

 

Note: Paltek in Japan is distributing Village Island’s VICO-4 board in Japan as an OEM component. The board needs 12Vdc at ~25VA.

 

 

 

For more information about TICO compression IP, see:

 

 

 

 

 

 

 

Laser-based, industrial 3D Camera from VRmagic resolves complex surfaces with 1/64 sub-pixel accuracy

by Xilinx Employee 03-23-2017 10:48 AM - edited 03-23-2017 11:08 AM


VRmagic LineCam3D.jpg

A configurable, COG (center-of-gravity), laser-line extraction algorithm allows VRmagic’s LineCam3D to resolve complex surface contours with 1/64 sub-pixel accuracy. (The actual measurement precision, which can be as small as a micrometer, depends on the optics attached to the camera.) The camera must process the captured video internally because, at its maximum 1kHz scan rate, there would be far more raw contour data than can be pumped over the camera’s GigE Vision interface. The algorithm therefore runs in real time on the camera’s internal Xilinx series 7 FPGA, which is paired with a TI DaVinci SoC and 2Gbytes of DDR3 SDRAM to handle other processing chores. The camera’s imager is a 2048x1088-pixel CMOSIS CMV2000 CMOS image sensor with a pipelined global shutter. The VRmagic LineCam3D also has a 2D imaging mode that permits the extraction of additional object information such as surface printing that would not appear on the contour scans (as demonstrated in the photo below).
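As an illustration of the technique (generic COG peak extraction, not VRmagic’s code), the sub-pixel laser-line position in one image column is simply the intensity-weighted mean row index:

#include <stdint.h>

/* Toy center-of-gravity (COG) laser-line extraction for one image column.
   Returns the sub-pixel row position of the laser line, or -1.0f if no
   pixel exceeds the threshold. */
static float cog_line_position(const uint8_t *column, int rows,
                               uint8_t threshold)
{
    uint32_t sum = 0, weighted = 0;
    for (int r = 0; r < rows; r++) {
        if (column[r] >= threshold) {      /* ignore background pixels */
            sum      += column[r];
            weighted += (uint32_t)r * column[r];
        }
    }
    return (sum != 0) ? (float)weighted / (float)sum : -1.0f;
}

A fixed-point FPGA implementation would keep six fractional bits of the result, which corresponds to the 1/64 sub-pixel resolution quoted above.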

 

Here’s a composite photo of the camera’s line-scan contour output (upper left), the original object being scanned (lower left), and the image of the object constructed from the contour scans (right):

 

 

VRmagic LineCam3D Output.jpg 

 

In laser-triangulation measurement setups, the camera’s lens plane is not parallel to the scanned object’s image plane, which means that only a relatively small part of the laser-scanned image would normally be in focus due to limited depth of focus. To compensate for this, the LineCam3D integrates a 10° tilt-shift adapter into its rugged IP65/67 aluminum housing, to expand the maximum in-focus object height. Anyone familiar with photographic tilt-shift lenses—mainly used for architectural photography in the non-industrial world—immediately recognizes this as the Scheimpflug principle, which increases depth of focus by tilting the lens relative to both the imager plane and the subject plane. It’s fascinating that this industrial camera incorporates this ability into the camera body so that any C-mount lens can be used as a tilt-shift lens.

 

 

For more information about the LineCam3D camera, please contact VRmagic directly.

 

 

Hot (and Cold) Stuff: New Spartan-7 1Q Commercial-Grade FPGAs go from -40 to +125°C!

by Xilinx Employee 03-22-2017 05:12 PM - edited 03-22-2017 05:15 PM

 

There’s a new line in the Spartan-7 table of the FPGA selection guide (page 2) showing an expanded-temperature-range option of -40 to +125°C for all six family members. These are “Expanded Temp” -1Q devices. So if you need extreme high-temp (or low-temp) operation, you might want to check into these devices. Ask your friendly neighborhood Xilinx or Avnet sales rep.

 

 

Spartan-7 Family Table with 1Q devices.jpg
 

 

 

For more information about the Spartan-7 FPGA product family, see:

 

 

 

 

 

 

 

Scary good 3-minute troubleshooting demo of National Instruments’ VirtualBench All-in-One Instrument

by Xilinx Employee 03-22-2017 01:40 PM - edited 03-23-2017 07:14 AM

 

National Instruments’ (NI’s) VirtualBench All-in-One Instrument, based on the Xilinx Zynq Z-7020 SoC, combines a mixed-signal oscilloscope with protocol analysis, an arbitrary waveform generator, a digital multimeter, a programmable DC power supply, and digital I/O. The PC- or tablet-based user-interface software allows you to make all of those instruments play together as a troubleshooting symphony. That point is made very apparent in this new 3-minute video demonstrating the speed at which you can troubleshoot circuits using all of the VirtualBench’s capabilities in concert:

 

 

 

 

 

For more Xcell Daily blog posts about the NI VirtualBench All-in-One instrument, see:

 

 

 

 

 

For more information about the VirtualBench, please contact NI directly.

 

Arty Z7: Digilent’s new Zynq SoC trainer and dev board—available in two flavors for $149 or $209

by Xilinx Employee 03-22-2017 11:18 AM - edited 03-22-2017 04:09 PM

 

I’ve known this was coming for more than a week, but last night I got double what I expected. Digilent’s Web site has been teasing the new Arty Z7 Zynq SoC dev board for makers and hobbyists for a week—but with no listed price. Last night, prices appeared. That’s right, there are two versions of the board available:

 

  • The $149 Arty Z7-10 based on a Zynq Z-7010 SoC
  • The $209 Arty Z7-20 based on a Zynq Z-7020 SoC

 

 

Digilent Arty Z7.jpg 

 

Digilent Arty Z7 dev board for makers and hobbyists

 

 

 

Other than that, the board specs appear identical.

 

The first thing you’ll note from the photo is that there’s a Zynq SoC in the middle of the board. You’ll also see the board’s USB, Ethernet, Pmod, and HDMI ports. On the left, you can see double rows of tenth-inch headers in an Arduino/chipKIT shield configuration. There are a lot of ways to connect to this board, which should make it a student’s or experimenter’s dream board considering what you can do with a Zynq SoC. (In case you don’t know, there’s a dual-core ARM Cortex-A9 MPCore processor on the chip along with a hearty serving of FPGA fabric.)

 

Oh yeah. The Xilinx Vivado HL Design Suite WebPACK tools? Those are available at no cost. (So is Digilent’s attractive cardboard packaging, according to the Arty Z7 Web page.)

 

Although the Arty Z7 board has now appeared on Digilent’s Web site, the product’s Web page says the expected release date is March 27. That’s five whole days away!

 

As they say, operators are standing by.

 

 

Please contact Digilent directly for more Arty Z7 details.

 

 

 

MathWorks’ HDL Coder wins Embedded World AWARD in Nuremberg last week

by Xilinx Employee 03-21-2017 02:07 PM - edited 03-22-2017 05:59 AM

 

The organizers of last week’s Embedded World show in Nuremberg gave out embedded AWARDs in three categories during the show, and MathWorks’ HDL Coder won in the tools category. (See the announcement here.) If you don’t know about this unique development tool, now is a good time to become acquainted with it. HDL Coder accepts model-based designs created using MathWorks’ MATLAB and Simulink and can generate VHDL or Verilog for all-hardware designs, or hardware and software code for designs based on a mix of custom hardware and embedded software running on a processor. That means that HDL Coder works well with Xilinx FPGAs and Zynq SoCs.

 

Here’s a diagram of what HDL Coder does:

 

 

MathWorks HDL Coder.jpg 

 

 

You might also want to watch this detailed MathWorks video titled “Accelerate Design Space Exploration Using HDL Coder Optimizations.” (Email registration required.)

 

 

For more information about using MathWorks HDL Coder to target your designs for Xilinx devices, see:

 

 

 

 

 

 

 

 

 

The Sundance DSP PXIe700 module is a 3U PXIe card with an on-board Xilinx Kintex-7 FPGA (a 325T or a 410T), so it can perform nearly any signal-processing or control task you can imagine.

 

 

Sundance PXIe-700 Kintex-7 Module Photo.jpg 

 

 

Sundance PXIe700 module based on a Xilinx Kintex-7 FPGA

 

 

Here’s a block diagram of the Sundance PXIe700 module:

 

 

Sundance PXIe-700 Kintex-7 Module.jpg

 

 

Sundance PXIe700 module based on a Xilinx Kintex-7 FPGA, Block Diagram

 

 

 

Sundance provides this board with the SCom IP Core, which communicates with the host through the PCIe interface and provides the user logic instantiated in the Kintex-7 FPGA with a multichannel streaming interface to the host CPU, along with sample applications. Other IP cores, a Windows driver, DLL, and user-interface software are also available. The PXIe700 data sheet also mentions a VideoGuru toolset that can turn this hardware into a video test center for NTSC, VGA, DVI, SMPTE, GigE-Vision, and other video standards. (Contact Sundance DSP for more details.)

 

The Sundance product page also shows the PXIe700 board with a couple of Sundance FMC modules attached as example applications:

 

 

Sundance PXIe-700 With attached FMC-DAQ2p5.jpg 

 

 

Sundance PXIe700 with attached FMC-DAQ2p5 multi-Gsample/sec ADC and DAC card

 

 

 

 

Sundance PXIe-700 With attached FMC-ADC500-5.jpg

 

 

Sundance PXIe700 with attached FMC-ADC500-5 5-channel, 500Msamples/sec, 16-bit ADC card

 

 

 

 

 

 

You want to learn how to design with and use RF, right? Students from all levels and backgrounds looking to improve their RF knowledge will want to take a look at the new ADALM-PLUTO SDR USB Learning Module from Analog Devices. The $149 USB module has an RF range of 325MHz to 3.8GHz with separate transmit and receive channels and 20MHz of instantaneous bandwidth. It pairs two devices that seem made for each other: an Analog Devices AD9363 Agile RF Transceiver and a Xilinx Zynq Z-7010 SoC.

 

 

 

Analog Devices ADALM-PLUTO SDR USB Learning Module.jpg 

 

 

Analog Devices’ $149 ADALM-PLUTO SDR USB Learning Module

 

 

Here’s an extremely simplified block diagram of the module:

 

 

Analog Devices ADALM-PLUTO SDR USB Learning Module Block Diagram.jpg

 

 

Analog Devices’ ADALM-PLUTO SDR USB Learning Module Block Diagram

 

 

However, the learning module’s hardware is of little use without training material and Analog Devices has already created dozens of online tutorials and teaching materials for this device including ADS-B aircraft position, receiving NOAA and Meteor-M2 weather satellite imagery, GSM analysis, listening to TETRA signals, and pager decoding.

 

 

By Adam Taylor

 

 

So far, we have addressed how to communicate, control, and read conversions from the Sysmon block in the Zynq UltraScale+ MPSoC’s PS (processing system). However, we have not addressed the Sysmon in the Zynq UltraScale+ MPSoC’s PL (programmable logic), which provides us the ability to monitor external signals.

 

As I mentioned in my first blog looking at the Zynq UltraScale+ MPSoC’s AMS capabilities, the first thing we need to do to use the PL Sysmon is check that it is available. We do this by reading the AMS block’s combined status register, named PL_SYSMON_CONTROL_STATUS, which resides at address 0xFFA50044. Only the LSB is used within this register. If that bit is set high, then the PL Sysmon is available. Otherwise, we should not try to address it.

 

We can access this register using a simple call to read the address as below:

 

 

pl_status = Xil_In32(0xFFA50044); // pl_status is a u32
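Wrapping that read into a small helper makes the availability test explicit. A minimal sketch using the same call (only bit 0 of the register is meaningful, per the description above):

#include "xil_io.h"

#define PL_SYSMON_CONTROL_STATUS 0xFFA50044U

/* Returns 1 if the PL Sysmon is available, 0 if we should not address it. */
static int pl_sysmon_available(void)
{
    u32 pl_status = Xil_In32(PL_SYSMON_CONTROL_STATUS);
    return (int)(pl_status & 0x1U);
}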

 

 

Image1.jpg

 

 

Once we have confirmed availability, we can use the same approach we used in the previous blog to configure the PS Sysmon. The only difference is that instead of passing the XSYSMON_PS block identifier, which says we wish to use the PS Sysmon, this time we pass the XSYSMON_PL block identifier. This block identifier establishes the base address of the Sysmon we wish to use.

 

We can select the Sysmon channel we wish to monitor using the sequencer masks available within xsysmonpsu_hw.h. Most of these sequencer channels are named per their channel (e.g. VP_VN) because they are dedicated to one Sysmon or the other. The supply channels are not so named. As such, when addressing either the PL or PS Sysmon, we need to understand which voltage parameter we are monitoring. The table below shows the monitored parameter for both the PS and the PL channels.

 

 

 

Image2.jpg 

 

What is very helpful is the ability of the PL Sysmon to monitor the PS core voltages. This is especially of interest for mission-critical applications. We will talk more about this in future blogs.

 

Putting this all together in a simple application running on one of the Zynq UltraScale+ MPSoC’s ARM Cortex-A53 cores (the ARM Cortex-R5 cores can do the same), we see the following results when reading both the PL and PS Sysmons:

 

 

Image3.jpg

 

 

Mapping the supplies above against the Zynq UltraScale+ MPSoC’s data sheet and IOCC user guide, the voltages for the I/O banks show that both Sysmons are reading the correct values.

 

At this point, we have addressed one or the other of the available Sysmon blocks using its block identifier. However, we can also read back several other critical voltages via the AMS block, which is selected by the third available block identifier, XSYSMON_AMS.

 

Using this third block identifier to address the AMS block, we can read back the values of several other system-critical voltages. These voltages include the five Zynq UltraScale+ MPSoC PS PLL voltages; the DDR PLL voltage; the battery voltage; and the VCC INT, BRAM, and AUX voltages.

 

We can address these voltages within the AMS block using the correct channel number and the “get ADC data” function as shown below:

 

 

XSysMonPsu_GetAdcData(InstancePtr, XSM_CH_VCC_PSLL0, XSYSMON_AMS);
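To make that concrete, here is a minimal sketch that reads this channel and converts the raw result to volts. XSM_CH_VCC_PSLL0 and XSYSMON_AMS come from xsysmonpsu.h as used above; the XSysMonPsu_RawToVoltage() macro is the driver’s standard 3V full-scale conversion, and InstancePtr is assumed to be an already-initialized driver instance:

#include <stdio.h>
#include "xsysmonpsu.h"

/* Read the PS PLL0 supply via the AMS block identifier and print it. */
static void print_pspll0_voltage(XSysMonPsu *InstancePtr)
{
    u16 raw = XSysMonPsu_GetAdcData(InstancePtr, XSM_CH_VCC_PSLL0,
                                    XSYSMON_AMS);
    printf("VCC_PSPLL0 = %.3f V\r\n", XSysMonPsu_RawToVoltage(raw));
}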

 

 

When we are working with these channel numbers, it can get confusing as to which block identifier we should be using. It is worth remembering that channels 0-6, 8-10, and 13-37 are addressed by the PL or PS Sysmon block identifiers, while channels 38-53 are addressed only by the AMS identifier.

 

I have uploaded the code to GitHub and you can find back issues here.

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg

 

 

I did not go to Embedded World in Nuremberg this week but apparently SemiWiki’s Bernard Murphy was there and he’s published his observations about three Zynq-based reference designs that he saw running in Aldec’s booth on the company’s Zynq-based TySOM embedded dev and prototyping boards.

 

 

Aldec TySOM-2 Prototyping Board.jpg

 

Aldec TySOM-2 Embedded Prototyping Board

 

 

 

Murphy published this article titled “Aldec Swings for the Fences” on SemiWiki and wrote:

 

 

“At the show, Aldec provided insight into using the solution to model the ARM core running in QEMU, together with a MIPI CSI-2 solution running in the FPGA. But Aldec didn’t stop there. They also showed off three reference designs designed using this flow and built on their TySOM boards.

 

“The first reference design targets multi-camera surround view for ADAS (automotive – advanced driver assistance systems). Camera inputs come from four First Sensor Blue Eagle systems, which must be processed simultaneously in real-time. A lot of this is handled in software running on the Zynq ARM cores but the computationally-intensive work, including edge detection, colorspace conversion and frame-merging, is handled in the FPGA. ADAS is one of the hottest areas in the market and likely to get hotter since Intel just acquired Mobileye.

 

“The next reference design targets IoT gateways – also hot. Cloud interface, through protocols like MQTT, is handled by the processors. The gateway supports connection to edge devices using wireless and wired protocols including Bluetooth, ZigBee, Wi-Fi and USB.

 

“Face detection for building security, device access and identifying evil-doers is also growing fast. The third reference design is targeted at this application, using similar capabilities to those on the ADAS board, but here managing real-time streaming video as 1280x720 at 30 frames per second, from an HDR-CMOS image sensor.”

 

The article contains a photo of the Aldec TySOM-2 Embedded Prototyping Board, which is based on a Xilinx Zynq Z-7045 SoC. According to Murphy, Aldec developed the reference designs using its own and other design tools including the Aldec Riviera-PRO simulator and QEMU. (For more information about the Zynq-specific QEMU processor emulator, see “The Xilinx version of QEMU handles ARM Cortex-A53, Cortex-R5, Cortex-A9, and MicroBlaze.”)

 

Then Murphy wrote this:

 

“So yes, Aldec put together a solution combining their simulator with QEMU emulation and perhaps that wouldn’t justify a technical paper in DVCon. But business-wise they look like they are starting on a much bigger path. They’re enabling FPGA-based system prototype and build in some of the hottest areas in systems today and they make these solutions affordable for design teams with much more constrained budgets than are available to the leaders in these fields.”

 

 

 

Digilent says that its new $199.99 Digital Discovery—a low-cost USB instrument that combines a 24-channel, 800Msamples/sec logic analyzer; a 16-bit, 100Msamples/sec digital pattern generator; and a 100mA power supply—“was created to be the ultimate embedded development companion.” Further, “Its features and specifications were deliberately chosen to maintain a small and portable form factor, withstand use in a variety of environments, and keep costs down, while balancing the requirements of operating on USB Power.”

 

(Note: Skip to the bottom of this blog for a limited-time offer. Then come back.)

 

Here’s a photo of the Digital Discovery:

 

 

Digilent Digital Discovery Module.jpg

 

Digilent’s Digital Discovery—a combined 24-channel, 800Msamples/sec logic analyzer and 16-bit, 100Msamples/sec digital pattern generator

 

 

If that form factor looks familiar, you’re probably reminded of the company’s Analog Discovery and Analog Discovery 2 100Msamples/sec USB DSO, logic analyzer, and power supply. (See “$279 Analog Discovery 2 DSO, logic analyzer, power supply, etc. relies on Spartan-6 for programmability, flexibility.”) And just like the Analog Discovery modules, the Digilent Digital Discovery is based on a Xilinx Spartan-6 FPGA (an LX25), which becomes pretty clear when you look at the product’s board photo. The Spartan-6 FPGA is right there in the center of the board:

 

 

Digilent Digital Discovery Module Board.jpg

 

Digilent’s Digital Discovery is based on a Xilinx Spartan-6 FPGA

 

 

When I write “based on,” what I mean to say is that Digilent’s IP in the Spartan-6 FPGA pretty much implements the entire low-cost instrument—as clearly shown in the block diagram:

 

 

Digilent Digital Discovery Module Block Diagram.jpg

 

 

Digilent’s Digital Discovery Module Block Diagram

 

 

And what does this $199.99 instrument do, considering that it’s implemented using a low-cost FPGA? Here are the specs (and pretty impressive specs they are):

 

 

  • 24-channel, 800Msamples/sec* digital logic analyzer (1.2…3.3V CMOS)
  • 16-channel, 100Msamples/sec pattern generator (1.2…3.3V)
  • 16-channel virtual digital I/O including buttons, switches, and LEDs – perfect for logic training applications
  • Two input/output digital trigger signals for linking multiple instruments (1.2…3.3V CMOS)
  • A programmable power supply of 1.2…3.3V/100mA. The same voltage supplies the Logic Analyzer input buffers and the Pattern Generator input/output buffers, for keeping the logic level compatibility with the circuit under test.
  • Digital Bus Analyzers (SPI, I²C, UART, Parallel)

 

*Note: to obtain speeds of 200MS/s and higher, the High Speed Adapter must be used.

 

 

So, you may have noted that asterisk on the logic analyzer’s maximum sample rate. You need a Digital Discovery High Speed Adapter to attain the full 800Msamples/sec acquisition rate on the logic analyzer. Normally, that’s another $49.99. However, for the first 100 Digital Discovery buyers, Digilent is throwing in the high-speed adapter for free.

 

Operators are standing by.

 

 

 

 

 

Image3.jpg

AEye is the latest iteration of the eye-tracking technology developed by EyeTech Digital Systems. The AEye chip is based on the Zynq Z-7020 SoC. It’s located immediately adjacent to the imaging sensor, which creates compact, stand-alone systems. This technology is finding its way into diverse vision-guided systems in the automotive, AR/VR, and medical diagnostic arenas. According to EyeTech, the Zynq SoC’s unique abilities allow the company to create products it could not build any other way.

 

With the advent of the reVISION stack, EyeTech is looking to expand its product offerings into machine learning, as discussed in this short, 3-minute video:

 

 

 

 

 

 

For more information about EyeTech, see:

 

 

 

 

 

Next week at OFC 2017 in Los Angeles, Acacia Communications, Optelian, Precise-ITC, Spirent, and Xilinx will present the industry’s first interoperability demo supporting 200/400GbE connectivity over standardized OTN and DWDM. Putting that succinctly, the demo is all about packing more bits/λ, so that you can continue to use existing fiber instead of laying more.

 

Callite-C4 400GE/OTN Transponder IP from Precise-ITC instantiated in a Xilinx Virtex UltraScale+ VU9P FPGA will map native 200/400GbE traffic—generated by test equipment from Spirent—into 2x100 and 4x100 OTU4-encapsulated signals. The 200GbE and 400GbE standards are still in flux, so instantiating the Precise-ITC transponder IP in an FPGA allows the design to quickly evolve with the standards with no BOM or board changes. Concise translation: faster time to market with much less risk.

 

 

Precise-ITC Callite-4 IP.jpg

 

Callite-C4 400GE/OTN Transponder IP Block Diagram

 

 

 

Optelian’s TMX-2200 200G muxponder, scheduled for release later this year, will muxpond the OTU4 signals into 1x200Gbps or 2x200Gbps DP-16QAM using Acacia Communications’ CFP2-DCO coherent pluggable transceiver.

 

 

The Optelian and Precise-ITC exhibit booths at OFC 2017 are 4139 and 4141 respectively.

 

 

 

By Adam Taylor

 

In looking at the Zynq UltraScale+ MPSoC’s AMS capabilities so far, we have introduced the two slightly different Sysmon blocks residing within the Zynq UltraScale+ MPSoC’s PS (processing system) and PL (programmable logic). In this blog, I am going to demonstrate how we can get the PS Sysmon up and running on both the ARM Cortex-A53 and Cortex-R5 processor cores in the Zynq UltraScale+ MPSoC’s PS. There is little difference between using the two processor types, but I think it is important to show you how to use both.

 

The process to use the Sysmon is the same as it is for many of the peripherals we have looked at previously with the MicroZed Chronicles:

 

  1. Look Up the configuration of the Sysmon Peripheral (XSysMonPsu_LookupConfig)
  2. Initialize the Sysmon Peripheral (XSysMonPsu_CfgInitialize)
  3. Reset the Sysmon (XSysMonPsu_Reset)
  4. Set the Sequencer to safe mode while we update its configuration (XSysMonPsu_SetSequencerMode)
  5. Disable the alarms (XSysMonPsu_SetAlarmEnables)
  6. Set the Sequencer Enables for the channels we want to sample (XSysMonPsu_SetSeqChEnables)
  7. Set the ADC Clock Divisor (XSysMonPsu_SetAdcClkDivisor)
  8. Set the Sequencer Mode (XSysMonPsu_SetSequencerMode)

 

The function names in parentheses are those we use to perform each operation, provided we pass the correct parameters. In the simplest case, as in this example, we can then poll the output registers using the XSysMonPsu_GetAdcData() function. All of these functions are defined within the file xsysmonpsu.h, which is available under the Board Support Package’s libsrc directory in SDK.

 

Examining the functions, you will notice that each of the functions used in steps 4 to 8 requires an input parameter called SysmonBlk. You must pass this parameter to the function. This parameter is how we select which Sysmon (within the PS or the PL) we want to address. For this example, we will be specifying the PS Sysmon using XSYSMON_PS, which is also defined within xsysmonpsu.h. If we want to address the PL Sysmon, we use the XSYSMON_PL definition, which we will be looking at next time.

 

There is also another useful header file: xsysmonpsu_hw.h. Within this file, we can find the definitions required to correctly select the channels we wish to sample in the sequencer. These are defined in the format:

 

 

XSYSMONPSU_SEQ_CH*_<Param>_MASK

 

 

This simple example samples the following within the PS Sysmon:

 

  1. Temperature
  2. Low Power Core Supply Voltage
  3. Full Power Core Supply Voltage
  4. DDR Supply Voltage
  5. Supply voltage for PS IO banks 0 to 3
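Putting steps 1 through 8 together, here is a minimal bring-up sketch for the PS Sysmon. The device ID comes from xparameters.h and the sequencer-mode constants from xsysmonpsu.h; the channel-enable masks follow the XSYSMONPSU_SEQ_CH*_<Param>_MASK pattern shown earlier, so the exact mask names for your chosen channels should be taken from xsysmonpsu_hw.h:

#include "xparameters.h"
#include "xstatus.h"
#include "xsysmonpsu.h"

static XSysMonPsu Sysmon;

static int sysmon_ps_init(void)
{
    XSysMonPsu_Config *cfg = XSysMonPsu_LookupConfig(XPAR_XSYSMONPSU_0_DEVICE_ID);
    if (cfg == NULL)
        return XST_FAILURE;
    if (XSysMonPsu_CfgInitialize(&Sysmon, cfg, cfg->BaseAddress) != XST_SUCCESS)
        return XST_FAILURE;
    XSysMonPsu_Reset(&Sysmon);

    /* Park the sequencer in safe mode while we reconfigure it (step 4). */
    XSysMonPsu_SetSequencerMode(&Sysmon, XSM_SEQ_MODE_SAFE, XSYSMON_PS);
    XSysMonPsu_SetAlarmEnables(&Sysmon, 0x0, XSYSMON_PS);

    /* Enable the channels to sample; add the supply masks you need from
       xsysmonpsu_hw.h alongside the temperature channel. */
    XSysMonPsu_SetSeqChEnables(&Sysmon, XSYSMONPSU_SEQ_CH0_TEMP_MASK,
                               XSYSMON_PS);

    XSysMonPsu_SetAdcClkDivisor(&Sysmon, 32, XSYSMON_PS);
    XSysMonPsu_SetSequencerMode(&Sysmon, XSM_SEQ_MODE_CONTINPASS, XSYSMON_PS);
    return XST_SUCCESS;
}

With the sequencer running, XSysMonPsu_GetAdcData() can then be polled for each enabled channel as described above.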

 

We can use conversion functions provided within xsysmonpsu.h to convert the raw value supplied by the ADC into temperature and voltage. However, the PS I/O banks are capable of supporting 3.3V logic. As such, the conversion macro from raw reading to voltage is not correct for these I/O banks or for the HD banks in the PL. (We will look at the different I/O bank types in another blog.)

 

The full-scale voltage is 3V for most of the voltage conversions. However, in line with UG580 (page 43), we need to use a full scale of 6V for the PS I/O banks. Otherwise, we will see a value only half of what we are expecting for a bank’s supply voltage. With this in mind, my example contains a conversion function at the top of the source file to be used for these I/O banks, to ensure that we get the correct value.
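A minimal sketch of such a conversion, assuming the driver’s usual 16-bit raw format (the stock XSysMonPsu_RawToVoltage() macro scales by a 3V full scale; here we scale by 6V instead for the PS I/O banks):

/* Convert a raw 16-bit Sysmon reading to volts for the PS I/O banks,
   which need a 6V full scale instead of the usual 3V (UG580, page 43). */
static float SysmonPsu_RawToExtVoltage(u16 AdcData)
{
    return ((float)AdcData * 6.0f) / 65536.0f;
}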

 

The Zynq UltraScale+ MPSoC architecture permits both the APU (the ARM Cortex-A53 processors) and the RPU (the ARM Cortex-R5 processors) to address the Sysmon. To demonstrate this, the same file was used in applications first targeting an ARM Cortex-A53 processor in the APU and then targeting the ARM Cortex-R5 processor in the RPU. I used Core 0 in both cases.

 

The only difference between these two cases was the need to create new applications that select the core to be targeted and then updating the FSBL to load the correct core. (See “Adam Taylor’s MicroZed Chronicles, Part 172: UltraZed Part 3—Saying hello world and First-Stage Boot” for more information on how to do this.)

 

 

 

Image1.jpg

 

Results when using the ARM Cortex-A53 Core 0 Processor

 

 

 

Image2.jpg

 

Results when using the ARM Cortex-R5 Core 0 Processor

 

 

 

When I ran the same code, which is available in the GitHub directory, I received the output shown above in the terminal program, demonstrating that the example works on both the ARM Cortex-A53 and ARM Cortex-R5 cores.

 

Next time we will look at how we can use the PL Sysmon.

 

 

 

Code is available on Github as always.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg

 

 

 

 

EETimes’ Junko Yoshida with some expert help analyzes this week’s Xilinx reVISION announcement

by Xilinx Employee 03-15-2017 01:25 PM - edited 03-22-2017 07:20 AM

 

Image3.jpg

This week, EETimes’ Junko Yoshida published an article titled “Xilinx AI Engine Steers New Course” that gathers some comments from industry experts and from Xilinx with respect to Monday’s reVISION stack announcement. To recap, the Xilinx reVISION stack is a comprehensive suite of industry-standard resources for developing advanced embedded-vision systems based on machine learning and machine inference.

 

(See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”)

 

As Xilinx Senior Vice President of Corporate Strategy Steve Glaser tells Yoshida, “Xilinx designed the stack to ‘enable a much broader set of software and systems engineers, with little or no hardware design expertise to develop, intelligent vision guided systems easier and faster.’

 

Yoshida continues:

 

While talking to customers who have already begun developing machine-learning technologies, Xilinx identified ‘8 bit and below fixed point precision’ as the key to significantly improve efficiency in machine-learning inference systems.

 

 

Yoshida also interviewed Karl Freund, Senior Analyst for HPC and Deep Learning at Moor Insights & Strategy, who said:

 

Artificial Intelligence remains in its infancy, and rapid change is the only constant.” In this circumstance, Xilinx seeks “to ease the programming burden to enable designers to accelerate their applications as they experiment and deploy the best solutions as rapidly as possible in a highly competitive industry.

 

 

She also quotes Loring Wirbel, a Senior Analyst at The Linley group, who said:

 

What’s interesting in Xilinx's software offering, [is that] this builds upon the original stack for cloud-based unsupervised inference, Reconfigurable Acceleration Stack, and expands inference capabilities to the network edge and embedded applications. One might say they took a backward approach versus the rest of the industry. But I see machine-learning product developers going a variety of directions in trained and inference subsystems. At this point, there's no right way or wrong way.

 

 

There’s a lot more information in the EETimes article, so you might want to take a look for yourself.

 

 

 

 

Next week at the OFC Optical Networking and Communication Conference & Exhibition in Los Angeles, Xilinx will be in the Ethernet Alliance booth demonstrating the industry’s first standards-based, multi-vendor 400GE network. A 400GE MAC and PCS instantiated in a Xilinx Virtex UltraScale+ VU9P FPGA will be driving a Finisar 400GE CFP8 optical module, which in turn will communicate with a Spirent 400G test module over a fiber connection.

 

In addition, Xilinx will be demonstrating:

 

 

PAM4 Eye.jpg

 

 

  • The world’s first complete FlexE 1.0 solution showcasing bonding, sub-rating and channelization on UltraScale+ FPGAs.

 

  • LLDP packet snooping on transport line cards to allow SDN controllers to build network topology maps, which aid data-center network automation.

 

  • Optical technology abstraction in DCI transport.

 

If you’re visiting OFC, be sure to stop by the Xilinx booth (#1809).

 

 

 

PLDA has announced the XpressRICH4-AXI PCIe 4.0 configurable IP block, which bridges an on-chip AXI bus to PCIe 4.0. The IP block complies with the PCI Express Base 4.0r7 specification and supports endpoint, root-port, and dual-mode configurations. The IP supports Xilinx Virtex-7, Virtex UltraScale, and Kintex UltraScale devices and can be used for ASIC designs as well.

 

Here’s a block diagram of the core:

 

 

PLDA XpressRICH4-AXI PCIe 4 IP.jpg

 

 

PLDA XpressRICH4-AXI PCIe 4.0 configurable IP Block Diagram

 

 

Please contact PLDA directly for more information about this IP.

 

 

 

This week at Embedded World in Nuremberg, Lynx Software Technologies is demonstrating its port of the LynxSecure Separation Kernel hypervisor to the ARM Cortex-A53 processors on the Xilinx Zynq UltraScale+ MPSoC. According to Robert Day, Vice President of Marketing at Lynx, "ARM designers are now able to run safety critical environments alongside a general purpose OS like Linux or LynxOS RTOS on the same Xilinx processor without compromising safety, security or real-time performance. Use cases include automotive systems based on environments such as AUTOSAR RTA-BSW from ETAS and avionics designs using LynxOS-178 RTOS from Lynx. Designers can match the security of air-gap hardware partitioning without incurring the cost, power and size overhead of separate hardware."

 

The LynxSecure port to the Zynq UltraScale+ MPSoC supports modular software architectures and tight integration with the Zynq UltraScale+ MPSoC’s FPGA fabric for hosting bare-metal applications, trusted functions, and open-source projects on a single SoC with secure partitioning. You decide which functions run in software as LynxSecure bare-metal apps and which functions to hardware-accelerate in the Zynq UltraScale+ MPSoC’s FPGA fabric.

 

The LynxSecure technology was designed to satisfy high-assurance computing requirements in support of the NIST, NSA Common Criteria, and NERC CIP evaluation processes, which are used to regulate military and industrial computing environments.

 

The LynxSecure Separation Kernel hypervisor provides:

 

  • Safety & Security
  • Domain Isolation
  • Trusted Execution Environments
  • Reference Monitor Plugins (e.g., firewalls, IDS, encryption, guards)

 

 

Here’s a diagram of the LynxSecure Separation Kernel hypervisor architecture:

 

 

 

LynxSecure Architecture.jpg 

 

 

 

Please contact Lynx Software Technologies directly for information about the LynxSecure Separation Kernel hypervisor.

 

 

Zynq + PYNQ + Python + BNNs: Machine inference does not get any easier… or faster

by Xilinx Employee ‎03-14-2017 03:10 PM - edited ‎03-15-2017 10:25 AM (3,788 Views)

 

Machine learning and machine inference based on CNNs (convolutional neural networks) are the latest way to classify images and, as I wrote in Monday’s blog post about the new Xilinx reVISION announcement, “The last two years have generated more machine-learning technology than all of the advancements over the previous 45 years and that pace isn't slowing down.” (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”) The challenge now is to make the CNNs run faster while consuming less power. It would be nice to make them easier to use as well.

 

OK, that’s the setup. A paper published last month at the 25th International Symposium on Field Programmable Gate Arrays titled “FINN: A Framework for Fast, Scalable Binarized Neural Network Inference” describes a method to speed up CNN-based inference while cutting power consumption by reducing CNN precision in the inference machines. As the paper states:

 

…a growing body of research demonstrates this approach [CNN] incorporates significant redundancy. Recently, it has been shown that neural networks can classify accurately using one- or two-bit quantization for weights and activations.  Such a combination of low-precision arithmetic and small memory footprint presents a unique opportunity for fast and energy-efficient image classification using Field Programmable Gate Arrays (FPGAs). FPGAs have much higher theoretical peak performance for binary operations compared to floating point, while the small memory footprint removes the off-chip memory bottleneck by keeping parameters on-chip, even for large networks. Binarized Neural Networks (BNNs), proposed by Courbariaux et al., are particularly appealing since they can be implemented almost entirely with binary operations, with the potential to attain performance in the teraoperations per second (TOPS) range on FPGAs.
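The paper’s point about binary operations is easy to see in code: when weights and activations are constrained to ±1 and packed one per bit, a 64-element multiply-accumulate collapses to an XNOR plus a population count. Here is a minimal C sketch of that general technique, using GCC’s __builtin_popcountll; it illustrates the arithmetic trick, not the FINN implementation itself.

```c
#include <stdint.h>
#include <stdio.h>

/* Dot product of two 64-element binarized vectors. Each bit encodes
   +1 (set) or -1 (clear). XNOR marks positions where the operands
   agree; each agreement contributes +1 and each disagreement -1,
   so dot = matches - (64 - matches) = 2 * matches - 64. */
static int binary_dot64(uint64_t weights, uint64_t activations)
{
    uint64_t agree = ~(weights ^ activations);  /* bitwise XNOR */
    int matches = __builtin_popcountll(agree);  /* count agreements */
    return 2 * matches - 64;
}

int main(void)
{
    uint64_t w = 0xF0F0F0F0F0F0F0F0ULL;
    uint64_t a = 0xFF00FF00FF00FF00ULL;

    /* Exactly half of the 64 bit positions agree, so the result is 0. */
    printf("dot = %d\n", binary_dot64(w, a));
    return 0;
}
```

In FPGA fabric, the XNOR and popcount map straight onto LUTs and compact adder trees, which is how BNNs can reach the TOPS-range throughput the paper describes.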

 

The paper then describes the techniques developed by the authors to generate BNNs and instantiate them in FPGAs. The results, based on experiments using a Xilinx ZC706 eval kit built around a Zynq Z-7045 SoC, are impressive:

 

When it comes to pure image throughput, our designs outperform all others. For the MNIST dataset, we achieve an FPS which is over 48/6x over the nearest highest throughput design [1] for our SFC-max/LFC-max designs respectively. While our SFC-max design has lower accuracy than the networks implemented by Alemdar et al., our LFC-max design outperforms their nearest accuracy design by over 6/1.9x for throughput and FPS/W respectively. For other datasets, our CNV-max design outperforms TrueNorth for FPS by over 17/8x for CIFAR-10 / SVHN datasets respectively, while achieving 9.44x higher throughput than the design by Ovtcharov et al., and 2.2x over the fastest results reported by Hegde et al. Our prototypes have classification accuracy within 3% of the other low-precision works, and could have been improved by using larger BNNs.

 

There’s something even more impressive, however. This design approach to creating BNNs is so scalable that it’s now on a low-end platform—the $229 Digilent PYNQ-Z1. (Digilent’s academic price for the PYNQ-Z1 is only $65!) Xilinx Research Labs in Ireland, NTNU (Norwegian U. of Science and Technology), and the U. of Sydney have released an open-source Binarized Neural Network (BNN) Overlay for the PYNQ-Z1 based on the work described in the above paper.

 

According to Giulio Gambardella of Xilinx Research Labs, “…running on the PYNQ-Z1 (a smaller Zynq Z-7020), [the BNN overlay] can achieve 168,000 image classifications per second with 102µsec latency on the MNIST dataset with 98.40% accuracy, and 1,700 images per second with 2.2msec latency on the CIFAR-10, SVHN, and GTSRB datasets, with 80.1%, 96.69%, and 97.66% accuracy respectively, running at under 2.5W.”

 

 

PYNQ-Z1.jpg

 

Digilent PYNQ-Z1 board, based on a Xilinx Zynq Z-7020 SoC

 

 

 

Because the PYNQ-Z1 programming environment centers on Python and the Jupyter development environment, the package includes a number of Jupyter notebooks that demonstrate what the overlay can do through live code running on the PYNQ-Z1 board, along with equations, visualizations, explanatory text, and program results, including images.

 

There are also examples of this BNN in practical application:

 

 

 

 

For more information about the Digilent PYNQ-Z1 board, see “Python + Zynq = PYNQ, which runs on Digilent’s new $229 pink PYNQ-Z1 Python Productivity Package.”

 

 

 

 

Today, EEJournal’s Kevin Morris published a review article titled “Teaching Machines to See: Xilinx Launches reVISION” following Monday’s announcement of the Xilinx reVISION stack for developing vision-guided applications. (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”)

 

Morris writes:

 

But vision is one of the most challenging computational problems of our era. High-resolution cameras generate massive amounts of data, and processing that information in real time requires enormous computing power. Even the fastest conventional processors are not up to the task, and some kind of hardware acceleration is mandatory at the edge. Hardware acceleration options are limited, however. GPUs require too much power for most edge applications, and custom ASICs or dedicated ASSPs are horrifically expensive to create and don’t have the flexibility to keep up with changing requirements and algorithms.

 

That makes hardware acceleration via FPGA fabric just about the only viable option. And it makes SoC devices with embedded FPGA fabric - such as Xilinx Zynq and Altera SoC FPGAs - absolutely the solutions of choice. These devices bring the benefits of single-chip integration, ultra-low latency and high bandwidth between the conventional processors and the FPGA fabric, and low power consumption to the embedded vision space.

 

Later on, Morris gets to the fly in the ointment:

 

Oh, yeah. There’s still that “almost impossible to program” issue.

 

And then he gets to the solution:

 

reVISION, announced this week, is a stack - a set of tools, interfaces, and IP - designed to let embedded vision application developers start in their own familiar sandbox (OpenVX for vision acceleration and Caffe for machine learning), smoothly navigate down through algorithm development (OpenCV and NN frameworks such as AlexNet, GoogLeNet, SqueezeNet, SSD, and FCN), targeting Zynq devices without the need to bring in a team of FPGA experts. reVISION takes advantage of Xilinx’s previously-announced SDSoC stack to facilitate the algorithm development part. Xilinx claims enormous gains in productivity for embedded vision development - with customers predicting cuts of as much as 12 months from current schedules for new product and update development.

 

In many systems employing embedded vision, it’s not just the vision that counts. Increasingly, information from the vision system must be processed in concert with information from other types of sensors such as LiDAR, SONAR, RADAR, and others. FPGA-based SoCs are uniquely agile at handling this sensor fusion problem, with the flexibility to adapt to the particular configuration of sensor systems required by each application. This diversity in application requirements is a significant barrier for typical “cost optimization” strategies such as the creation of specialized ASIC and ASSP solutions.

 

The performance rewards for system developers who successfully harness the power of these devices are substantial. Xilinx is touting benchmarks showing their devices delivering an advantage of 6x images/sec/watt in machine learning inference with GoogLeNet @batch = 1, 42x frames/sec/watt in computer vision with OpenCV, and ⅕ the latency on real-time applications with GoogLeNet @batch = 1 versus “NVidia Tegra and typical SoCs.” These kinds of advantages in latency, performance, and particularly in energy-efficiency can easily be make-or-break for many embedded vision applications.

 

 

But don’t take my word for it; read Morris’ article yourself.

 

 

 

 

About the Author
Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.