Displaying articles for: 03-26-2017 - 04-01-2017
This short, 2-minute video shows a live Spartan-7 7S50 FPGA operating on a board, running a MicroBlaze soft processor connected to DDR3 SDRAM as a demo. The 28nm Spartan-7 device family comes in small form-factor packages—as small as 8x8mm. You design systems based on Spartan-7 devices with the Xilinx Vivado HL Design Suite tools.
These devices are available for ordering now and operators are standing by.
Today, Xilinx posted information about the new $2995 Kintex UltraScale+ KCU116 Eval Kit on Xilinx.com. If you’re looking to get into the UltraScale+ FPGAs’ GTY transceiver races—to 32.75Gbps—this is a great kit to start with. The kit includes:
Here’s a nice shot of the KCU116 board from the kit’s quickstart guide:
Kintex UltraScale+ KCU116 Eval Board
One of the key features of this board are the four SFP+ optical cages there on the left. Those handle 25Gbps optical modules, driven of course by four of the KU5P FPGA’s GTY transceivers.
Take a look.
Here’s a 40-minute teardown video of a Vision Research Phantom v5 high-speed high-speed, 1024x1024-pixel, 1000frames/sec video camera (circa 2001) from tesla500’s YouTube video channel. His methodical teardown and excellent system-level explanation uncovers a couple of “huge” Xilinx XC4020 FPGAs (circa 2000) on the timing and interface boards and Xilinx XC9500 CPLDs implementing the timing and control on the four high-speed capture-memory boards. There’s also a Hitachi SH-2 32-bit RISC processor with a hardware MAC (for DSP) on the timing board.
The XC4020 FPGAs are 3rd-generation devices that each have 784 CLBs (1560 LUTs total). They were big in their day but they’re very small now. These days, I think you could implement all of the digital timing and control circuitry in this camera including the SH-2 processor’s capabilities using the smallest single-core Zynq Z-7007S SoC—with the ARM Cortex-A9 processor in the Zynq SoC running considerably more than 20x faster than the turn-of-the-millennium SH-2 processor’s roughly 28MHz maximum clock rate.
Of course, Vision Research has moved far beyond 1000 frames/sec over the past 17 years. Its latest cameras can go 1000x faster than that, hitting 1M frames/sec when configured with the company’s FAST option (fast indeed!), while the Phantom v5 is no longer listed even on the company’s “discontinued cameras” page. Nevertheless, I found tesla500’s teardown and explanations fascinating and valuable.
Of course, Xilinx All Programmable devices have long been used to design advanced video equipment like the Vision Research Phantom v5 high-speed camera. Which allows me to quickly remind you of the recent launch of the Xilinx reVISION stack launch for embedded-vision applications. (See “Xilinx reVISION stack pushes machine learning for vision-guided applications all the way to the edge.”)
And now, here’s tesla500’s Vision Research Phantom v5 high-speed camera teardown video:
On April 11, the third, free Webinar in Xilinx's "Precise, Predictive, and Connected Industrial IoT" series will provide insight into the role of Zynq All Programmable SoCs in the breath of applications across IIoT Edge and the connectivity between them. A brief summary of IIoT trends will be presented, followed by an overview of the Data Distribution Service (DDS) IIoT databus standard presented by RTI, the IIoT Connectivity Company, and how DDS and OPC-UA target different connectivity challenges in IIoT systems.
Webinar attendees will also learn:
Xcell Daily discussed DeePhi Tech’s Zynq-based CNN acceleration processor last year in connection with the Hot Chips 2016 conference. (See “DeePhi’s Zynq-based CNN processor is faster, more energy efficient than CPUs or GPUs.”) DeePhi’s founder Song Yao appears in a new Powered by Xilinx video this week giving many more details including some fascinating information about an early customer, ZeroTech—China’s second largest drone maker.
DeePhi provides the entire stack needed to develop machine-learning applications based on neural networks including the development software, algorithms, and a neural-network processor that runs efficiently on the Xilinx Zynq SoC. This technology is particularly good for deep-learning, vision-based embedded apps such as drones, robotics, surveillance cameras, and for cloud-computing applications as well.
The video also provides more details on ZeroTech’s use of DeePhi’s machine-learning technology for object detection, pedestrian detection, and gesture recognition—all in a drone that nestles in your hand.
Song Yao explains that DeePhi’s tools provide a GPU-like development environment while taking advantage of the superior efficiency of neural networks implemented with programmable logic. In addition, DeePhi can change the neural network’s architecture to further optimize the design for specific applications.
Finally, he explains that you can use these Zynq-based implementations in applications where GPUs will simply not work due to power-consumption restrictions. In fact, last year at Hot Chips 2016 he reportedly said, “The FPGA based DPU platform achieves an order of magnitude higher energy efficiency over GPU on image recognition and speech detection.”
Here’s the new, 3-minute Powered by Xilinx video:
Last month, I blogged about a new Aldec FPGA Prototyping board—the HES-US-440—based on the “big, big, big” Xilinx Virtex UltraScale VU440 FPGA teamed with the Xilinx Zynq Z-7100 SoC. (See “Aldec selected the big, big, big Virtex UltraScale VU440 (and the Zynq SoC) for its new proto board—the HES-US-440.”) Now, Aldec’s Hardware Technical Support Manager Krzysztof Szczur has published an interesting article titled “Software Driven Test of FPGA Prototype,” which describes how you can use this prototyping board to create software-driven testbenches.
Why would you want to do that?
Because a software-driven verification methodology can shorten your development schedule by a lot, especially if you speed it up by moving from slow, software-based simulation to a much, much faster FPGA-based prototyping environment like the one provided by the Aldec HES-US-440.
And because time = money.
In fact, time >> money because you can usually find more money but there’s absolutely, positively no one out there minting time.
Note that this is true whether you’re designing an ASIC or you plan to deploy your design on a Xilinx All Programmable device.
Aldec HES-US-440 FPGA Prototpying Board Connection Diagram
Adam Taylor and Xilinx’s Sr. Product Manager for SDSoC and Embedded Vision Nick Ni have just published an article on the EE News Europe Web site titled “Machine learning in embedded vision applications.” That title’s pretty self-explanatory, but there are a few points I’d like to highlight. Then you can go read the full article yourself.
As the article states, “Machine learning spans several industry mega trends, playing a very prominent role within not only Embedded Vision (EV), but also Industrial Internet of Things (IIoT) and Cloud Computing.” In other words, if you’re designing products for any embedded market, you might well find yourself at a competitive disadvantage if you’re not adding machine-learning features to your road map.
This article closely ties machine learning with neural networks (including Feed-forward Neural Networks (FNNs), Recurrent Neural Networks (RNNs), and Deep Neural Networks (DNNs), and Convolutional Neural Networks (CNNs)). Neural networks are not programmed; they’re trained. Then, if they’re part of an embedded design, they’re deployed. Training is usually done using floating-point neural-network implementations but, for efficiency (power and cost), deployed neural networks can use fixed-point representations with very little or no loss of accuracy. (See “Counter-Intuitive: Fixed-Point Deep-Learning Inference Delivers 2x to 6x Better CNN Performance with Great Accuracy.”)
The programmable logic inside of Xilinx FPGAs, Zynq SoCs, and Zynq UltraScale+ MPSoCs is especially good at implementing fixed-point neural networks, as described in this article by Nick Ni and Adam Taylor. (Go read the article!)
Meanwhile, this is a good time to remind you of the recent Xilinx introduction of the reVISION stack for neural network development using Xilinx All Programmable devices. For more information about the Xilinx reVISION stack, see:
This article about Cyber Physical Systems on the Embedded Computing Design site led me to a new SMARC Rel. 2.0 module—the SECO SM-B71—that’s capable of carrying any one of ten Zynq UltraScale+ MPSoCs based on the common SFVC784 package pinout shared by these devices. The MPSoC device list includes the ZU2CG, ZU3CG, ZU4CG, ZU5CG family members and the ZU2EG, ZU3EG, ZU4EG, ZU5EG, ZU4EV, and ZU5EV family members with the integrated ARM Mali-400 MP2 GPU. Now that’s flexibility. The board also accommodates as much as 8Gbytes of DDR4-2400 SDRAM. Here’s a photo:
SECO SM-B71 Zynq UltraScale+ MPSoC SMARC Module
As you can see from the image, SECO is previewing this product at the moment, so please contact SECO directly for more information about the SM-B71.
InnoRoute has just started shipping its TrustNode extensible, ultra-low-latency (2.5μsec) IPv6 OpenFlow SDN router as a pcb-level product. The design combines a 1.9GHz, quad-core Intel Atom processor running Linux with a Xilinx FPGA to implement the actual ultra-low-latency router hardware. (You’re not implementing that as a Linux app running on an Atom processor!) The TrustNode Router reference design features twelve GbE ports. Here’s a photo of the TrustNode SDN Router board:
InnoRoute TrustNode SDN Router Board with 12 GbE ports
Based on the pcb layout in the photo, it appears to me that the Xilinx FPGA implementing the 12-port SDN router is under that little black heatsink in the center of the board nearest to all of the Ethernet ports while the quad-core processor running Linux must be sitting there in the back under that great big silver heatsink with an auxiliary cooling fan, near the processor-associated USB ports and SDcard carrier.
InnoRoute’s TrustNode Web page is slightly oblique as to which Xilinx FPGA is used in this design but the description sort of winnows the field. First, the description says that you can customize InnoRoute’s TrustNode router design using the Xilinx Vivado HL Design Suite WebPACK Edition—which you can download at no cost—so we know that the FPGA must be a 28nm series 7 device or newer. Next, the description says that the design uses 134.6k LUTs, 269.2k flip-flops, and 12.8Mbits of BRAM. Finally, we see that the FPGA must be able to handle twelve Gigabit Ethernet ports.
The Xilinx FPGA that best fits this description is an Artix-7 A200.
You can use this TrustNode board to jump into the white-box SDN router business immediately, or at least as fast as you can mill and drill an enclosure and screen your name on the front. In fact, InnoRoute has kindly created a nice-looking rendering of a suggested enclosure design for you:
InnoRoute TrustNode SDN Router (rendering)
The router’s implementation as IP in an FPGA along with the InnoRoute documentation and the Vivado tools mean that you can enhance the router’s designs and add your special sauce to break out of the white box. (White Box Plus? White Box Permium? White Box Platinum? Hey, I’m from marketing and I’m here to help.)
This design enhancement and differentiation are what Xilinx All Programmable devices are especially good at delivering. You are not stuck with some ASSP designer’s concept of what your customers need. You can decide. You can differentiate. And you will find that many customers are willing to pay for that differentiation.
Note: Please contact InnoRoute directly for more information on the TrustNode SDN Router.
Here’s another amazing demo video of National Instrument’s (NI’s) PXIe-5840 VST (Vector Signal Transceiver) showing real-time, DVR-like capture of 1GHz of continuous-bandwidth RF data on a 24Tbyte RAID drive. You’d want this if you needed to capture a real-time, broad-spectrum set of RF signals for subsequent, more detailed analysis. The VST captures the broad-spectrum data and simultaneously streams it to the RAID storage box. (The NI PXIe-5840 VST is based on a Xilinx Virtex-7 690T FPGA for its real-time RF-generation and –analysis capabilities.)
Here’s the 2-minute video:
For more information about the 2nd-generation NI VST, see:
A blog post from earlier this week, “Seven low-cost Zynq dev and training boards: a quick review,” prompted an email from Graham Naylor in the UK. Naylor informed me that I’d not mentioned his favorite Zynq-based board, the Trenz TE0722, in that blog post—and then he told me how he’s using the Trenz board (which is really more of a low-cost SOM rather than a training/dev board). During the day, Naylor measures neutron pulses from an ionization chamber using the Zynq-based Red Pitaya open instrumentation platform. (I’ve written many blogs about the Red Pitaya, listed below.) For fun, it appears that Naylor and colleague Pete Allwright design cave radios. If you’ve never heard of a cave radio, you’re in good company because I hadn’t either.
Naylor sent me a preprint of an article that will appear in the quarterly BCRA’s Cave Radio & Electronics Group Journal, in the June 2017 issue. (The BCRA is the British Cave Research Association.) Naylor’s and Allwright’s article, titled “Outlining the Architecture of the Nicola 3Z Cave Radio,” discusses the design of a new version of the Nicola 3 rescue radio designed to be used by cave rescue teams for underground communications.
The original Nicola 3 radio was based on a Xilinx Spartan-3E FPGA supplied on a module from OHO Elektronik. The FPGA implemented an SDR design for a radio that performs SSB modulation and demodulation using an 87KHz carrier wave. Radio transmission does not occur through the air but through the ground using a couple of electrodes jammed into the earthen floor of the cave. (We’re in a cave, remember?) A little water poured on the earth interface helps improve transmission/reception.
Xilinx introduced the 90nm Spartan-3E in 2005, so the Nicola cave radio development team has upgraded the Nicola design to the Zynq Z-7010 SoC, which resides on a low-cost Trenz TE0722 SOM. Trenz sells one of these boards for just €64.00 and if you want 500 pieces, the price drops to €48.00.
Trenz TE0722 Zynq SOM
The new radio is called the Nicola 3Z. (I'm guessing "Z" is for "Zynq.") The FPGA fabric in the Zynq SoC implements the SDR functions in the Nicola 3Z radio including the SSB class D modulator, which drives an H-bridge driver for transmission; the receiver’s SSB filter, decimator, and demodulator; and an AGC block implemented on a soft-core Xilinx PicoBlaze 8-bit microcontroller, which is also instantiated in the Zynq SoC’s FPGA. There’s a second PicoBlaze instantiation on chip for housekeeping. That Zynq Z-7010 SoC may be a low-end part, but it’s plenty busy in the Nicola 3Z radio’s design.
Note: For more information about the Zynq-based Red Pitaya open instrumentation platform, see:
By Adam Taylor
At the end of the Sysmon AMS blogs I had introduced the several PLLs within the Zynq UltraScale+ MPSoC. This introduction suggests to me that it’s time to talk about the clocking architecture of the MPSoC Device.
As with the original Zynq SoC, the PS (processing system) in the Zynq UltraScale+ MPSoC is the system master. So we will initially focus upon its clocking architecture. Within the PS there are three main clock inputs:
While the PS reference clock has a dedicated input pin, the PSS_ALT_REF_CLK and PSS_VIDEO_REF_CLK are input via the MIO and are enabled or disabled in Vivado by the I/O configuration customization tab. If we plan on using these clocks, we need to ensure there is no conflict with other planned use of the MIO.
Enabling the Alternate reference clock and the video clock
Once these have been enabled, we can configure them on the clock configuration input clock tab as shown below:
Internally, the PS has four clock groups that provide all the required clocks:
We’ll now focus on the MCG as this is the group with which we will have the most interaction. Within this group, we choose which of the five PLLs is used to clock the Zynq UltraScale+ MPSoC’s processors and peripherals within the LPD and FPD. We can do this via the clock configuration -> output clocks tab. Here we can configure the domains clocking for both the low and full power domains.
To generate a PLL output frequency as closely as possible to the desired frequency, we may want to change the PLL input-clock source. We have several potential clock sources which can be used to clock each of the PLLs within the Zynq UltraScale+ MPSoC.
As mentioned above we can use PS_REF_CLK, PS_ALT_REF_CLK, or PS_VIDEO_REF_CLK. These clocks are directly input into the PS. We can also use one of the four GT_REF_CLKS or the AUX_REF_CLK. This latter reference clock is provided from the PL while the former clock is provided by the PS_GTR. The relevant PLL control register selects which of these clocks drives the PLL. These registers reside in either the CRL_APB module for low-power domain PLLs or CRF_APB module for high-power domain PLLs.
We can select which of the four GT reference clocks is provided as the GT_REF_CLK using the Serial Input Output Unit (SIOU) module CRX_CNTRL Register.
Now that we understand the Zynq UltraScale+ MPSoC’s clocking and how we set the desired frequency for each of the subsystems, we will explore the subsystems in more detail in the MicroZed Chronicles blogs that follow.
If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.
By Adam Taylor
I just received the most interesting email from Antti at Trenz, in which he pointed out that he had designed a 500MHz radio receiver using just four resistors, four capacitors, and a Xilinx series 7 FPGA. How did he do this? He is keeping that to himself for the moment. However, Antti thinks it would a great idea to open a challenge based upon this design to see if others in the FPGA community can explain how he achieved this. Of course, for the winners who supply the correct answer there will be Trenz goodies as prizes. (See below.)
The diagram below shows the components allowed to solve this challenge. You can redraw them if necessary to clarify the schematic. The values of R and C do not need to be optimized.
To prove that this is possible, the screen shots below show the input and the recovered signal inside the FPGA.
By this point I was thinking Delta Sigma ADC using the FPGA’s LVDS inputs. (There are papers and articles about this technique online.) However, Antti tells me this is not his solution and he was kind enough to provide a few hints for this challenge below:
Because they have so many Trenz prizes to give away to the winners, Antti has created three categories:
The closing date for entries is July 3rd. The judges will be Antti and myself (Adam Taylor).
If you want to enter your solution in any category email email@example.com
Image Matters’ Origami B20 module, based on a Xilinx Kintex UltraScale KU060 FPGA, is a small 94x53mm module that you can use to perform all sorts of high-speed processing. (See “Image Matters launches Origami Ecosystem for developing advanced 4K/8K video apps using the FPGA-based Origami module.”) For example, you can use it for a variety of video-compression applications using various IP compression cores including MPEG, JPEG-2000, and TICO. You can also use it for cloud-computing and neural-network applications such as image detection. The key thing is that the small Origami B20 module puts everything you need to run the FPGA on the one small module including SDRAM, Flash memory, the power supply, a backup battery, and security features (including tamper protection).
Here’s a short, 2.5-minute, Powered by Xilinx video with more information about the Origami B20 module:
In yesterday’s EETimes article titled “How will Ethernet go real-time for industrial networks?,” author Richard Wilson interviews National Instruments’ Global Technology and Marketing Director Rahman Jamal about using OPC-UA (the OPC Foundation’s Unified Architecture) and TSN (time-sensitive networking) to build industrial Ethernet networks (IIoT/Industrie 4.0) that deliver real-time response. (Yes, yes, yes, “real-time” is a loosely defined term where “real” depends on your system’s temporal reality.) As Jamal states in the interview, some constrained industrial Ethernet network topologies need no help to achieve real-time operation. In other cases and for other topologies, you need Ethernet implementations that are “heavily modified at the hardware level to achieve performance.”
One of the hardware additions that can really help is the hardware implementation of the IEEE 1588v2 PTP (Precision Time Protocol) clock-synchronization standard. PTP permits each piece of network-connected equipment to be synchronized using a 64-bit timer, which can be used for time-stamping, synchronization, control and as a common time reference to implement TSN.
PTP implementation is an ideal task for an IP block instantiated in programmable logic (see last year’s Xcell Daily blog post “Intelligent Gateways Make a Factory Smarter,” written by SoC-e (System on Chip engineering) founder and CEO Armando Astarloa). SoC-e has implemented just such an IEEE 1588v2 PTP IP core in a Xilinx Zynq SoC, which is the core logic device inside of the company’s CPPS-Gate40 Sensor intelligent IIoT gateway. (Note: Software PTP implementations are neither fast nor deterministic enough for many IIoT applications.)
SoC-e CPPS-Gate40 Sensor intelligent IIoT gateway
You can see the SoC-e PTP IP core in the very center of this CPPS-Gate40 block diagram:
SoC-e CPPS-Gate40 Sensor intelligent IIoT gateway block diagram
According to the SoC-e Web page, the company’s IEEE 1588v2 IP core in the CPPS-Gate40 Sensor gateway can deliver sub-microsecond network synchronization. How is such a small number possible? As Jamal says in his EETimes’ interview, “bit times (time on the wire) for a 64-byte frame at GigE rates is 512ns.” That’s how.
With last week’s introduction of the Digilent Arty Z7 Zynq SoC training and dev board, I felt it was time to review some of the low-cost boards that occupy an outsized piece of mindshare in my head. So here are seven that have appeared previously in the Xcell Daily blog, listed in alphabetical order by vendor:
Analog Devices ADALM-PLUTO ($149): Students from all levels and backgrounds looking to improve their RF knowledge will want to take a look at the new ADALM-PLUTO SDR USB Learning Module from Analog Devices. The $149 USB module has an RF range of 325MHz to 3.8GHz with separate transmit and receive channels and 20MHz of instantaneous bandwidth. It pairs two devices that seem made for each other: an Analog Devices AD9363 Agile RF Transceiver and a Xilinx Zynq Z-7010 SoC.
Analog Devices’ $149 ADALM-PLUTO SDR USB Learning Module
Digilent ARTY Z7 ($149 to $209): The first thing you’ll note from the Arty Z7 dev board photo is that there’s a Zynq SoC in the middle of the board. You’ll also see the board’s USB, Ethernet, Pmod, and HDMI ports. On the left, you can see double rows of tenth-inch headers in an Arduino/chipKIT shield configuration. There are a lot of ways to connect to this board, which should make it a student’s or experimenter’s dream board considering what you can do with a Zynq SoC.
Digilent Arty Z7 dev board for makers and hobbyists
Digilent PYNQ-Z1 ($229): PYNQ is an open-source project that makes it easy for you to design embedded systems using the Xilinx Zynq-7000 SoC using the Python language, associated libraries, and the Jupyter Notebook, which is a pretty nice, collaborative learning and development environment for many programming languages including Python. PYNQ allows you to exploit the benefits of programmable logic used together with microprocessors to build more capable embedded systems with superior performance when performing embedded tasks.
Digilent PYNQ-Z1 Dev Board
Krtkl’s snickerdoodle ($95 to $195): The amazing, Zynq-based “snickerdoodle one” is a low-cost, single-board computer with wireless capability based on the Xilinx Zynq Z-7010 SoC, available for purchase on the Crowd Supply crowdsourcing Web site.
Krtkl’s Zynq-based, WiFi-enabled Snickerdoodle Dev Board
National Instruments myRIO ($400 to $1001): The NI myRIO hardware/software development platform for NI’s LabVIEW system design software is based on the Zynq-7010 All Programmable SoC. About the size of a small paperback book (so that it easily drops into a backpack), the NI myRIO sports ten analog inputs, six analog outputs, left and right audio channels, 40 digital I/O lines (SPI, I2C, UARD, PWM, rotary encoder) and an on-board, 3-axis accelerometer, and two 34-pin expansion headers.
The Zynq-based NI myRIO
National Instruments RoboRIO ($435 for FRC teams): The NI roboRIO robotic controller was specifically designed for the FIRST Robotics Competition (FRC). The FRC event is a particular passion for NI’s founder, Dr. James Truchard.
NI roboRIO for First Robotics Competition teams
Trenz ZynqBerry (€79.00 to €109.00): The Trenz Electronic TE0726 ZynqBerry Dev Board puts a Xilinx Zynq Z-7010 or Z-7020 SoC into a Rasberry-Pi-compatible form factor with 64Mbytes of LPDDR2 SDRAM, four USB ports (in a hub configuration), a 100Mbps Ethernet port, an HDMI port, MIPI DSI and CSI-2 connectors, a PWM digital audio jack, and 128Mbits of Flash memory for configuration and operation.
TE0726 ZynqBerry Dev Board from Trenz
By Adam Taylor
A couple of weeks ago, I talked about the Xilinx reVision stack and the support it provides for OpenVX and OpenCV. One of the most exciting things I explained was about how we could accelerate several OpenCV functions (which include the OpenVX Core functions) using the Zynq SoC’s programmable logic. What I did not look at was the other significant part of the reVision stack and its support for machine learning.
Machine learning is increasing important for embedded-vision applications because it helps systems to evolve from being vision-enabled to being vision-guided autonomous systems. Machine learning is often used for embedded-vision applications to identify and classify information contained within an image. The embedded-vision system uses these identifications and classifications to make informed decisions in real time, enabling increased interaction with the environment.
For those unfamiliar with machine learning it is most often implemented by the creation and training of a neural network. Neural networks are modelled upon the human cerebral cortex in that each neuron receives an input, processes it, and communicates the processed signal it to another neuron. Neural networks typically consist of an input layer, internal layer(s), and an output layer.
Those familiar with machine learning may have come across the term “deep learning.” This is where there are several hidden layers in the neural network, allowing more complex machine-learning algorithms to be implemented.
When working with neural networks in embedded-vision applications, we need to use a 2D network. This is where Convolutional Neural Networks (CNNs) are used. CNNs are deep-learning networks that contain several convolutional and sub-sampling layers along with a separate, fully connected network to perform the final classification. Within the convolution layer, the input image will be broken down into several overlapping smaller tiles.
The results from this convolution layer are used to create an activation map, using an activation layer in the network placed before further sub-sampling and additional stages and preceding the final, fully connected network. The exact implementation of the CNN network varies depending upon the network architecture implemented (GoogLeNet, SSD, AlexNet). However, a CNN will typically contain at least the following elements:
The weights used for each of these elements are determined via training, and one of the CNN’s advantages is the relative ease of training the network. Training requires large data sets and high-performance computers to correctly determine the weights for each stage.
To ease the development of machine-learning applications, many engineers use a framework like Caffe, which supports the implementation and training of machine learning. The use of frameworks allows us to work at a higher level and maximize reuse. Using a framework, we don’t need to start from scratch each time we develop an application.
The Xilinx reVision stack provides an integrated Caffe framework flow, which allows us to take the prototext definition of the network and trained weights to deploy the machine-learning application. (Note that network training is separate and distinct from deployment.) To enable this, the Xilinx reVision stack provides several hardware-accelerated functions that can be implemented within the Zynq SoC’s or Zynq UltraScale+ MPSoC’s PL (programmable logic) to create the machine-learning inference engine. The reVision stack also provides examples for a wide range of network structures, enabling us to get up and running with our machine-learning application without the need to initially compile the PL design. Once we are happy with the machine-learning application, we can then use the SDSoC flow to develop our own embedded-vision application containing the optimized machine-learning application.
Using the Zynq PL provides for an optimal implementation that delivers faster response times when interacting with the embedded-vision system environment. This is especially true as machine learning applications are increasingly implemented using fixed-point integers like INT8, which are ideal for implementation in DSP elements.
Machine learning is going to be a hot area for several applications. So I will be coming back to this topic in detail as the MicroZed Chronicles progress—with some examples of course.
If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.