
 

Thirty years ago, my friends and co-workers Jim Reyer and KB and I would drive to downtown Denver for a long lunch at a dive Mexican bar officially known as “The Brewery Bar II.” But the guy who owned it, the guy who was always perched on a stool inside the door to meet and seat customers, was named Abe, so we called these trips “Abe’s runs.” This week, I found myself in downtown Denver again at the SC17 supercomputer conference at the Colorado Convention Center. The Brewery Bar II is still in business and only 15 blocks from the convention center, so on a fine, sunny day, I set out on foot for one more Abe’s run.

 

I arrived about 45 minutes later.

 

 

 

Abes Front.jpg 

 

 

 

I walked in the front door and 30 years instantly evaporated. I couldn’t believe it, but the place didn’t look any different. The same rickety tables. The same neon signs on the wall. The same bar. The same weird red, flocked wallpaper. It was all the same, except my friends weren’t there with me and Abe wasn’t sitting on a stool. I already knew that he’d passed away many years ago.

Also the same was the crowded state of the place at lunch time. The waitress (they don’t have servers at Abe’s) told me there were no tables available but I could eat at the bar. I took a place at the end of the bar and sat next to a guy typing on a laptop. That wasn’t the same as it was 30 years ago.

 

 

Abes 1 small.jpg 

 

 

 

The bartender came up and asked me what I wanted to drink. I said I’d not been in for more than 25 years and asked if they still served “Tinys.” A Tiny is Abe’s-speak for a large beer. He said “Of course,” so I ordered a Tiny ice tea. (Not the Long Island variety.)

 

Then he asked me what I wanted to eat. There’s only one response for that at Abe’s, and since they still understood what a Tiny was, I answered without ever touching a menu: “One special relleno, green, with sour cream as a neutron moderator.” He asked me if I wanted the green chile hot, mild, or half and half. Thirty years ago, I’d have ordered hot. My digestive system now has three more decades’ worth of mileage on it, so I ordered half and half. Good thing. The chile’s hotness still registered a 6 or 7 on the Abe’s 1-to-10 scale.

 

After I ordered, the guy with the laptop next to me said, “The rellenos are still as good as they were 25 years ago.” Indeed, that’s what he was eating. Abe’s hot rellenos had broken the ice, and so we started talking. The laptop guy’s name was Scott and he maintains cellular antenna installations on towers and buildings. His company owns a lot of cell tower sites in the Denver area.

 

Scott is very familiar with the changes taking place in cellular infrastructure and cell-site ownership, particularly with the imminent arrival of the latest 5G gear. He told me that the electronics are migrating up the towers to be as near the antennas as possible. “All that goes up there now is 48V power and a fiber,” he said. Scott is also familiar with the migration of the electronics directly into the antennas.

 

It turns out that Scott is also a ham radio operator, so we talked about equipment. He’s familiar with and has repaired just about everything that’s been on the market going back to tube-based gear but he was especially impressed with the new all-digital Icom rig he now uses most of the time. Scott’s not an engineer, but hams know a ton about electronics, so we started discussing all sorts of things. He’s especially interested in the newer LDMOS power FETs. So much so that he’s lost interest in using high-voltage transmitter tubes. "Why mess with high voltage when I can get just as far with 50V?" he mused.

 

I was wearing my Xilinx shirt from the SC17 conference, so I took the opportunity to start talking about the very relevant Xilinx Zynq UltraScale+ RFSoC, which is finding its way into a lot of 5G infrastructure equipment. Scott hadn’t heard about it, which really isn’t surprising considering how new it is, but after I described it he said he looked forward to maybe finding one in his next ham rig.

 

The special relleno, green with sour cream, arrived and one bite immediately took me back three decades again. The taste had not changed one morsel. Scott and I continued to talk for an hour. Sadly, the relleno didn’t last nearly that long.

 

Scott and I left Abe's together. He got into his truck and I started the 15-block walk back to the convention center. The conversation and the food formed one of those really remarkable time bubbles you sometimes stumble into—and always at Abe’s.

 

 

During a gala black-tie ceremony held on November 15 at The Brewery in central London, the Xilinx Zynq UltraScale+ RFSoC won the top spot among the six competitors in the IET Innovation Awards’ Communications Category—although you might not figure that out from reading the citation on the E&T Magazine Web site:

 

  • Communications: Xilinx, for its single-chip 5G antenna interface device that dramatically reduces the size, power and complexity of traditional antenna structures.

 

Note: E&T is the IET's award-winning monthly magazine and associated website.

 

 

RFSoC IET Award Communications Category.jpg 

 

 

Xilinx’s Giles Peckham (center) accepts the IET Innovation Award in the Communications Category for the Zynq UltraScale+ RFSoC from Professor Will Stewart (IET Communications Policy Panel, on left) and Rick Edwards (Awards emcee, television presenter, and writer/comic, on right). Photo courtesy of IET.

 

 

 

Classifying the Xilinx Zynq UltraScale+ RFSoC device family (with its integrated multi-gigasample/sec RF ADCs and DACs, soft-decision forward error correction (SD-FEC) IP blocks, UltraScale architecture programmable logic fabric, and Arm Cortex-A53/Cortex-R5 multi-core processing subsystem) as an “antenna interface device,” even a “Massive-MIMO Antenna Interface” device, sort of shortchanges the RFSoC in my opinion. Thanks to the devices’ extremely high integration level, the Zynq UltraScale+ RFSoC is a category killer for the many, many applications that need “high-speed analog in, high-speed analog out, digital processing in the middle” capabilities, though it most assuredly will also reduce the size, power, and complexity of traditional antenna structures, as the IET Innovation Awards literature cites. As this award suggests, there’s simply no other device like the Zynq UltraScale+ RFSoC on the market. (If you drill down on the IET Innovation Awards Web page, you’ll find that the Zynq UltraScale+ RFSoC was indeed Xilinx’s entry in the Communications category this year.)

 

 

RFSoC Conceptual Diagram.jpg

Zynq UltraScale+ RFSoC Conceptual Diagram

 

 

 

 

The UK-based IET is one of the world’s largest engineering institutions, with more than 168,000 members in 150 countries, so winning one of the IET’s annual Innovation Awards is an honor not to be taken lightly. This year, the Communications category of the IET Innovation Awards was sponsored by GCHQ (Government Communications Headquarters), the UK’s intelligence and security organization responsible for providing signals intelligence and information assurance to the UK’s government and armed forces.

 

For more information about the IET Innovation Awards and to see all of the various categories, click here for an animated brochure.

 

 

 

For more information about the Zynq UltraScale+ RFSoC, see:

 

 

 

 

 

 

 

 

 

 

 

According to an announcement released today:

 

“Xilinx, Inc. (XLNX) and Huawei Technologies Co., Ltd. today jointly announced the North American debut of the Huawei FPGA Accelerated Cloud Server (FACS) platform at SC17. Powered by Xilinx high performance Virtex UltraScale+ FPGAs, the FACS platform is differentiated in the marketplace today.

 

“Launched at the Huawei Connect 2017 event, the Huawei Cloud provides FACS FP1 instances as part of its Elastic Compute Service. These instances enable users to develop, deploy, and publish new FPGA-based services and applications through easy-to-use development kits and cloud-based EDA verification services. Both expert hardware developers and high-level language users benefit from FP1 tailored instances suited to each development flow.

 

"...The FP1 demonstrations feature Xilinx technology which provides a 10-100x speed-up for compute intensive cloud applications such as data analytics, genomics, video processing, and machine learning. Huawei FP1 instances are equipped with up to eight Virtex UltraScale+ VU9P FPGAs and can be configured in a 300G mesh topology optimized for performance at scale."

 

 

Huawei’s FP1 FPGA accelerated cloud service is available on the Huawei Public Cloud today. To register for the public beta, click here.

 

 

 

One of several demos in the Xilinx booth during this week’s SC17 conference in Denver was a working demo of the CCIX (Cache Coherent Interconnect for Accelerators) protocol, which simplifies the design of offload accelerators for hyperscale data centers by providing low-latency, high-bandwidth, fully coherent access to server memory. The demo shows L2 switching acceleration using an FPGA to offload a host processor. The CCIX protocol manages a hardware cache in the FPGA, which is coherently linked to the host processor’s memory. Cache updates take place in the background, without software intervention, through the CCIX protocol. If cache entries are invalidated in the host memory, the CCIX protocol automatically invalidates the corresponding cache entries in the FPGA’s memory.

 

Senior Staff Design Engineer Sunita Jain gave Xcell Daily a 3-minute explanation of the demo, which shows a 4.5x improvement in packet transfers using CCIX versus software-controlled transfers:

 

 

 

 

 

One thing to note about this demo: although the CCIX standard calls for using the PCIe protocol as a transport layer running at 25Gbps/lane (faster than PCIe Gen4), this demo demonstrates only the CCIX protocol itself and uses the significantly slower PCIe Gen1 for the transport layer.

 

For more information about the CCIX protocol as discussed in Xcell Daily, see:

 

  

 

 

  

 

 

 

This week at SC17 in Denver, Everspin was showing some impressive performance numbers for the MRAM-based nvNITRO NVMe Accelerator Card that the company introduced earlier this year. As discussed in a previous Xcell Daily blog post, the nvNITRO NVMe Accelerator Card is based on the company’s non-volatile ST-MRAM chips and a Xilinx Kintex UltraScale KU060 FPGA implements the MRAM controller and the board’s PCIe Gen3 x8 host interface. (See “Everspin’s new MRAM-based nvNITRO NVMe card delivers Optane-crushing 1.46 million IOPS (4Kbyte, mixed 70/30 read/write).”)

 

The target application of interest at SC17 was high-frequency trading, where every microsecond shaved off of system response time directly adds dollars to the bottom line, so the ROI on a product like the nvNITRO NVMe Accelerator Card, which cuts transaction times, is easy to calculate.

 

 

 

Everspin nvNITRO NVMe card.jpg 

 

 

Everspin MRAM-based nvNITRO NVMe Accelerator Card

 

 

 

It turns out that a common thread, and one of the bottlenecks, in high-frequency trading applications is the Apache Log4j event-logging utility. Incoming packets arrive at a variable rate (the traffic is bursty), and the Log4j logging utility needs to keep up with the highest possible burst rates to ensure that every event is logged. Piping these events directly into SSD storage sets a low ceiling on the burst rate that a system can handle. Inserting an nvNITRO NVMe Accelerator Card as a circular buffer in series with the incoming event stream, as shown below, boosts Log4j performance by 9x.
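To make the buffering idea concrete, here is a minimal single-producer/single-consumer ring-buffer sketch in C (my own illustration of the concept, not Everspin code): bursty events are absorbed at memory speed on the write side while a consumer drains them toward slower bulk storage at its own pace.

#include <stdint.h>
#include <string.h>

#define RING_SLOTS  4096             /* power of two keeps wraparound simple */
#define EVENT_BYTES 256

/* One slot per log event. In the nvNITRO arrangement this array would
   live in MRAM, so an acknowledged write is already persistent. */
static uint8_t ring[RING_SLOTS][EVENT_BYTES];
static uint32_t head, tail;          /* producer writes at head, consumer reads at tail */

/* Producer: absorb a bursty event at memory speed; returns 0 if full.
   (Unsynchronized sketch: assumes one producer and one consumer.) */
int ring_put(const uint8_t *event, size_t len)
{
    if (head - tail == RING_SLOTS)
        return 0;                    /* buffer full; burst exceeded capacity */
    memcpy(ring[head % RING_SLOTS], event, len < EVENT_BYTES ? len : EVENT_BYTES);
    head++;
    return 1;
}

/* Consumer: drain one event toward the (slower) SSD at its own pace */
int ring_get(uint8_t *event_out)
{
    if (tail == head)
        return 0;                    /* empty */
    memcpy(event_out, ring[tail % RING_SLOTS], EVENT_BYTES);
    tail++;
    return 1;
}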

 

 

 

Everspin nvNITRO Circular Buffer.jpg 

 

 

 

Proof of efficacy appears in the chart below, which shows the much lower latency and much better determinism provided by the nvNITRO card:

 

 

 

Everspin nvNITRO Latency Reduction.jpg 

 

 

 

One more thing of note: As you can see by one of the labels on the board in the photo above, Everspin’s nvNITRO card is now available as Smart Modular Technologies’ MRAM NVM Express Accelerator Card. Click here for more information.

 

  

 

Ryft is one of several companies now offering FPGA-accelerated applications based on Amazon’s AWS EC2 F1 instance. Ryft was at SC17 in Denver this week with a sophisticated, cloud-based data-analytics demo built on machine learning and deep learning. The demo classified 50,000 images from one data file using a neural network, merged the classified image files with log data from another file to create a super metadata file, and then provided fast image retrieval using many criteria, including image classification, a watch-list match (“look for a gun” or “look for a truck”), or geographic location using the Google Earth database. The entire demo used geographically separated servers containing the files, working in conjunction with Amazon’s AWS Cloud. The point of this demo was to show Ryft’s ability to provide “FPGAs as a Service” (FaaS) in an easy-to-use manner with any neural network of your choice, any framework (Caffe, TensorFlow, MXNet), and the popular RESTful API.

 

This was a complex, live demo and it took Ryft’s VP of Products Bill Dentinger six minutes to walk me through the entire thing, even moving as quickly as possible. Here’s the 6-minute video of Bill giving a very clear explanation of the demo details:

 

 

 

 

Note: Ryft does a lot of work with US government agencies and as of November 15 (yesterday), Amazon’s AWS EC2 F1 instance based on Xilinx Virtex UltraScale+ FPGAs is available on GovCloud. (See “Amazon’s FPGA-accelerated AWS EC2 F1 instance now available on Amazon’s GovCloud—as of today.”)

 

Amazon’s FPGA-accelerated AWS EC2 F1 instance now available on Amazon’s GovCloud—as of today


 

“Amazon EC2 F1 instances are now available in the AWS GovCloud (US) region.” Amazon posted the news on its AWS Web site today and the news was announced by Amazon’s Senior Director Business Development and Product Gadi Hutt during his introductory speech at a special half-day Amazon AWS EC2 F1 instance dev lab held at SC17 in Denver the same morning. According to the Amazon Web page, “With this launch, F1 instances are now available in four AWS regions, specifically US East (N. Virginia), US West (Oregon), EU (Ireland) and AWS GovCloud (US).”

 

Nearly 100 developers attended the lab and listened to Hutt’s presentation along with two AWS F1 instance customers, Ryft and NGCodec. The presentations were followed by a 2-hour hands-on lab.

 

 

Amazons Gadi Hutt presents to an AWS EC2 F1 Lab at SC17.jpg

 

Amazon's Gadi Hutt presents to an AWS EC2 F1 hands-on lab at SC17

 

 

 

The Amazon EC2 F1 compute instance allows you to create custom hardware accelerators for your application using cloud-based server hardware that incorporates multiple Xilinx Virtex UltraScale+ VU9P FPGAs. Each Amazon EC2 F1 compute instance can include as many as eight FPGAs, so you can develop extremely large and capable custom acceleration engines with this technology. According to Amazon, use of the FPGA-accelerated F1 instance can accelerate applications in diverse fields such as genomics research, financial analysis, and video processing (in addition to security/cryptography and machine learning) by as much as 30x over general-purpose CPUs.

 

For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:

 

 

 

 

 

 

 

Xilinx demos Virtex UltraScale+ FPGA VCU1525 Acceleration Development Kit at SC17 in Denver


 

This week, if you were in the Xilinx booth at SC17, you would have seen demos of the new Virtex UltraScale+ FPGA VCU1525 Acceleration Development Kit (available in actively and passively cooled versions). Both versions are based on Xilinx Virtex UltraScale+ VU9P FPGAs with 64Gbytes of on-board DDR4 SDRAM.  

 

 

 

Xilinx VCU1525 Active.jpg 

 

Xilinx Virtex UltraScale+ FPGA VCU1525 Acceleration Development Kit, actively cooled version

 

 

 

Xilinx VCU1525_Passive_Photshopped.jpg 

 

 

Xilinx Virtex UltraScale+ FPGA VCU1525 Acceleration Development Kit, passively cooled version

 

 

 

Xilinx had several VCU1525 Acceleration Development Kits running various applications in its booth at SC17. Here’s a short 90-second video showing two running applications—edge-to-cloud video analytics and machine learning—narrated by Xilinx Senior Engineering Manager Khang Dao:

 

 

 

 

Note: For more information about the Xilinx Virtex UltraScale+ FPGA VCU1525 Acceleration Development Kit, contact your friendly neighborhood Xilinx or Avnet sales representative.

 

 

Back in August, I wrote about a series of GigE 3D imaging sensors based on Spartan-6 FPGAs from Carnegie Robotics. (See “Carnegie Robotics’ FPGA-based GigE 3D cameras help robots sweep mines from a battlefield, tend corn, and scrub floors.”) That blog post mentioned that Carnegie Robotics had teamed with GPS maker Swift Navigation to work on autonomous robots that would employ the 3D and positioning-system sensors from the two companies. That post also mentioned that the photo of Swift Navigation’s centimeter-accurate Piksi Multi multi-band, multi-constellation GNSS (global navigation satellite system) receiver clearly showed that the receiver is based on a Zynq Z-7020 SoC.

 

Now, Swift Navigation has just appeared in the latest “Powered by Xilinx” video. In this video, Swift Navigation’s CEO and Founder Timothy Harris describes his company’s use of the Zynq SoC in the Piksi Multi. The Zynq SoC’s programmable logic processes the incoming signals from multiple global-positioning satellite constellations on multiple frequencies and performs measurements on those signals that are normally performed by dedicated hardware. Then the Zynq SoC’s dual-core Arm Cortex-A9 MPCore processor calculates a physical position from those measurements.

 

The advantages that hardware and software programmability confer on Swift Navigation’s Piksi Multi include the ability to quickly adapt the GNSS module to specific customer requirements and the ability to update, upgrade, and add features to the module via over-the-air transmissions. These capabilities give Swift Navigation a competitive advantage over designs that employ dedicated hardware.

 

Here’s the video:

 

 

 

 

By Adam Taylor

 

Being able to see internal software variables in our Zynq-based embedded systems in real time is extremely useful during system bring-up and for debugging. I often use an RS-232 terminal during commissioning to report important information like register values from my designs and have often demonstrated that technique in previous blog posts. Information about variable values in a running system provides a measure of reassurance that the design is functioning as intended and, as we progress through the engineering lifecycle, provides verification that the system is continuing to work properly. In many cases we will develop custom test software that reports the status of key variables and processes to help prove that the design functions as intended.

 

This approach works well and has for decades, since the earliest days of embedded design. However, a better solution now presents itself for Zynq-based systems: one that allows us to read the contents of the processor memory and extract the information we need without impacting the target’s operation and without adding a single line of code to the running target. It’s called μC/Probe and it’s from Micrium, the same company that has long offered the µC/OS RTOS for a wide variety of processors including the Xilinx Zynq SoC and Zynq UltraScale+ MPSoC.

 

Micrium’s μC/Probe tool allows us to create custom graphical user interfaces that display the memory contents of interest in our system designs. With this capability, we can create a virtual dashboard that provides control and monitoring of key system parameters, and we can do this very simply by dragging and dropping indicator, display, and control components onto the dashboard and associating them with variables in the target memory. In this manner it is possible to both read and write memory locations using the dashboard.

 

When it comes to using Micrium’s μC/Probe tool with our Zynq solution, we have choices regarding interfacing:

 

 

  • Use a Segger J Link JTAG Pod. In this case, the target system requires no additional code unless we wish to use an advanced μC/Probe feature such as an oscilloscope.

 

  • Use RS-232, USB, or TCP/IP. In this case we do not need to use JTAG. However we do need to add some code to the target embedded system. Micrium supplies sample code for us to use.

 

 

For this first example, I am going to use a Segger J Link JTAG pod to create a simple example and demonstrate the capabilities.  However, the second interface option proves useful for Zynq-based boards that lack a separate JTAG header and instead use a USB-to-JTAG device or if you do not have a J Link probe. We will look at using Micrium’s μC/Probe tool with the second interface option in a later blog.

 

Of course, the first thing we need to do is create a test application and determine the parameters to observe. The Zynq SoC’s XADC is perfect for this because it provides quantized values of the device temperature and voltage rails. These are ideal parameters to monitor during an embedded system’s test, qualification, and validation so we will use these parameters in this blog.

 

The test application example will merely read these values in a continuous loop. That’s a very simple program to write (see MicroZed Chronicles 7 and 8 for more information on how to do this). To understand which variables we can monitor or interact with using μC/Probe, we need to know that the tool reads in and parses the ELF file produced by SDK to get pointers to the memory values of interest. To ensure that μC/Probe can properly read and parse the ELF file, the file needs to contain debugging data in the DWARF format. That means that within SDK, we need to set the compile option -gdwarf-2 to ensure that we use the appropriate version of DWARF. Failure to use this switch will result in μC/Probe being unable to read and parse the generated ELF.
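For reference, here is a minimal sketch of such a continuous-read loop built on the XSysMon driver that SDK supplies for the Zynq SoC’s XADC. Treat it as an outline rather than the exact code used here: the XPAR_SYSMON_0_DEVICE_ID macro comes from xparameters.h and may be named differently in your BSP, and printf with floats requires the appropriate standalone BSP settings.

#include <stdio.h>
#include "xparameters.h"
#include "xsysmon.h"

/* File-scope variables so their symbols and DWARF debug info land in
   the ELF where uC/Probe can find and watch them */
static XSysMon SysMon;
volatile float Temperature;
volatile float VccInt;

int main(void)
{
    /* Look up and initialize the XADC (System Monitor) driver */
    XSysMon_Config *Cfg = XSysMon_LookupConfig(XPAR_SYSMON_0_DEVICE_ID);
    XSysMon_CfgInitialize(&SysMon, Cfg, Cfg->BaseAddress);

    /* Continuously refresh the watched variables so they are always
       current when uC/Probe reads their memory locations over JTAG */
    while (1) {
        Temperature = XSysMon_RawToTemperature(XSysMon_GetAdcData(&SysMon, XSM_CH_TEMP));
        VccInt = XSysMon_RawToVoltage(XSysMon_GetAdcData(&SysMon, XSM_CH_VCCINT));
        printf("Temp: %.2f C, VCCINT: %.3f V\r\n", Temperature, VccInt);
    }
    return 0;
}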

 

We set this compile switch in the C/C++ build settings for the application, as shown below:

 

 

 

Image1.jpg 

 

 

Setting the correct DWARF information in Xilinx SDK

 

 

With the ELF file properly created, I made a bootable SD card image of the application and powered on the MicroZed. To access the memory, I connected the Segger J Link, which uses a Xilinx adaptor cable to mate with the MicroZed board’s JTAG connector.

 

 

 

Image2.jpg

 

 

MicroZed with the Segger J Link

 

 

 

All I needed to do now was to create the virtual dashboard. Within μC/Probe, I loaded the ELF by clicking on the ELF button in the symbol browser. Once loaded, we can see a list of all the symbols that can be used on μC/Probe’s virtual dashboard.

 

 

 

Image3.jpg 

 

 

ELF loaded, and available symbols displayed

 

 

 

For this example, which monitors the Zynq SoC’s XADC internal signals, including device temperature and the supply voltages, I added six Numeric Indicators and one Angular Gauge Quadrant. Adding graphical elements to the data screen is very simple. All you need to do is find the display element you desire in the toolbox, drag it onto the data screen, and drop it in place.

 

 

 

Image4.jpg 

 

 

Adding a graphical element to the display

 

 

 

To display information from the running Zynq SoC on μC/Probe’s Numeric Indicators and on the Gauge, I needed to associate each indicator and gauge with a variable in memory. We use the Symbol viewer to do this. Select the variable you want and drag it onto the display indicator as shown below.

 

 

 

Image5.jpg

 

 

Associating a variable with a display element

 

 

 

If you need to scale the display to use the full variable range or otherwise customize it, hold the mouse over the appropriate display element and select the properties editor icon on the right.  The properties editor lets you scale the range, enter a simple transfer function, or increase the number of decimal places if desired.

 

 

Image6.jpg

 

 

Formatting a Display Element

 

 

 

Once I’d associated all the Numeric Indicators and the Gauge with appropriate variables but before I could run the project and watch the XADC values in real time, one final thing remained: I needed to inform the project how I wished to communicate with the target and select the target processor. For this example, I used Segger’s J Link probe.

 

 

 

Image7.jpg

 

 

Configuring the communication with the target

 

 

 

With this complete, I clicked “run” and recorded the following video of the XADC data being captured and displayed by μC/Probe.

 

 

 

 

 

 

All of this was pretty simple and very easy to do. Of course, this short example has just scratched the surface of the capabilities of Micrium’s μC/Probe tool. It is possible to implement advanced features such as oscilloscopes, bridges to Microsoft Excel, and communication with the target using terminal windows or more advanced interfaces like USB. In the next blog we will look at how we can use some of these advanced features to create a more in-depth and complex virtual dashboard.

 

I think I am going to be using Micrium’s μC/Probe tool in many blogs going forward where I want to interact with the Zynq as well.

 

 

 

You can find the example source code on GitHub.

 

 

Adam Taylor’s Web site is http://adiuvoengineering.com/.

 

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

First Year E-Book here

 

First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg  

 

 

 

 

Second Year E-Book here

 

Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

 

 

 

Digilent has announced a major upgrade to the Zynq-based Zybo dev board, now called the Zybo Z7. The original board was based on a Xilinx Zynq Z-7010 SoC with its integrated Arm Cortex-A9 MPCore processors running at 650MHz. The new Zybo Z7-10 and Z7-20 dev boards are based on the Zynq Z-7010 and Z-7020 SoCs respectively, and the processors now run at 667MHz. The Zybo Z7-10 sells for $199 (currently, you can get a voucher for the Xilinx SDSoC development environment for $10 more) and the Zybo Z7-20 board, with triple the programmable logic resources, sells for $299 (and currently includes the SDSoC voucher).

 

 

Digilent Zybo Z7-20 board.jpg 

 

Digilent Zybo Z7-20 Dev Board based on Zynq Z-7020 SoC

 

 

In addition to the faster processors, the Zybo Z7 incorporates several other upgrades over the original Zybo dev board. SDRAM capacity has increased from 512Mbytes on the original Zybo board to 1Gbyte on the Zybo Z7. The new boards now have two HDMI ports to support “bump-in-the-wire” HDMI applications. Both boards also include a connector with a MIPI CSI-2 interface for video camera connections. You can plug a Raspberry Pi Camera Module directly into this connector and Digilent also plans to offer a camera module for this port.

 

Here’s a video explaining some of the highlights of the new Zybo Z7.

 

 

 

 

 

Note: For more information about the Zybo Z7 dev board, please contact Digilent directly.

 

 

Mercury Systems recently announced the BuiltSAFE GS Multi-Core Renderer, which runs on the multi-core Arm Cortex-A53 processor inside Xilinx Zynq UltraScale+ MPSoCs. The BuiltSAFE GS Multi-Core Renderer, a high-performance, small-footprint OpenGL library designed to render highly complex 3D graphics in safety-critical embedded systems, is certifiable to DO-178C at the highest design assurance level (DAL-A) as well as to the highest Automotive Safety Integrity Level (ASIL D). Because it runs on the CPU, the Multi-Core Renderer’s performance scales up with more CPU cores, and it can run on Zynq UltraScale+ CG MPSoC variants that do not include the Arm Mali-400 GPU.

 

According to Mercury’s announcement:

 

“Hardware certification requirements (DO-254/ED80) present huge challenges when using a graphics-processing unit (GPU), and the BuiltSAFE GS Multi-Core Renderer is the ideal solution to this problem. It uses a deterministic, processor architecture-independent model optimized for any multicore-based platform to maximize performance and minimize power usage. All of the BuiltSAFE Graphics Libraries use industry standard OpenGL API specifications that are compatible with most new and legacy applications, but it can also be completely tailored to meet any customer requirements.”

 

 

 

Mercury Systems BuiltSAFE 3D Renderer.jpg 

 

 

 

Please contact Mercury Systems for more information about the BuiltSAFE GS Multi-Core Renderer.

 

 

 

 

The new Mellanox Innova-2 Adapter Card teams the company’s ConnectX-5 Ethernet controller with a Xilinx Kintex UltraScale+ KU15P FPGA to accelerate computing, storage, and networking in data centers. According to the announcement, “Innova-2 is based on an efficient combination of the state-of-the-art ConnectX-5 25/40/50/100Gb/s Ethernet and InfiniBand network adapter with Xilinx UltraScale FPGA accelerator.” The adapter card has a PCIe Gen4 host interface.

 

 

 

Mellanox Innova-2 Adapter Card.jpg 

 

Mellanox’s Innova-2 PCIe Adapter Card

 

 

 

Key features of the card include:

 

  • Dual-port 25Gbps Ethernet via SFP cages
  • TLS/SSL, IPsec crypto offloads
  • Mellanox ConnectX-5 Ethernet controller and Xilinx Kintex UltraScale+ FPGA for either “bump-on-the-wire” or “look-aside” acceleration
  • Low-latency RDMA and RDMA over Converged Ethernet (RoCE)
  • OVS and Erasure Coding offloads
  • Mellanox PeerDirect communication acceleration
  • End-to-end QoS and congestion control
  • Hardware-based I/O virtualization

 

 

Innova-2 is available in multiple, pre-programmed configurations for security applications with encryption acceleration such as IPsec or TLS/SSL. According to Mellanox, Innova-2 boosts performance by 6x for security applications while reducing total cost of ownership by 10x when compared to alternatives.

 

Innova-2 enables SDN and virtualized acceleration and offloads for Cloud infrastructure. The on-board programmable resources allow deep-learning training and inferencing applications to achieve faster performance and better system utilization by offloading algorithms into the card’s Kintex UltraScale+ FPGA and the ConnectX acceleration engines.

 

The adapter card is also available as an unprogrammed card, open for customers’ specific applications. Mellanox provides configuration and management tools to support the Innova-2 Adapter Card across Windows, Linux, and VMware distributions.

 

Please contact Mellanox directly for more information about the Innova-2 Adapter Card.

 

 

 

 

XIMEA adds 8K imaging to its line of xiB PCIe cameras using CMOSIS CMV50000 sensor


 

XIMEA has announced an 8K version of its existing xiB series of PCIe embedded-vision cameras. The new camera, called the CB500, incorporates a CMOSIS CMV50000 sensor with 47.6Mpixel (7920x6004) resolution at 12-bit conversion depth. The camera is available in color or monochrome versions and can stream 30fps in 8-bit/pixel transport mode (22fps in 12-bit/pixel transport mode). Both versions employ a 20Gbps PCIe Gen2 x4 system interface.

 

 

 

Ximea XiB PCIe Camera.jpg 

 

 

Ximea 8K, 47.6Mpixel CB500 xiB embedded-vision camera with PCIe interface

 

 

Like many XIMEA cameras, the CB500 relies on the programmability of a Xilinx FPGA to accommodate the differing interface and processing requirements of its sensor and system interface. In the case of the CB500, the FPGA is an Artix-7 A75T.

 

 

For information about the XIMEA CB500 8K camera, please contact XIMEA directly.

 

 

For more information about other XIMEA embedded-vision cameras based on Xilinx All Programmable devices, see:

 

 

 

 

 

 

 

William Wong, Technology Editor for ElectronicDesign.com, just published an article titled “Hypervisors Step Up Security for Arm Cortex-A” and the first item he discusses is Lynx Software Technologies’ LynxSecure Separation Kernel Hypervisor running on the Xilinx Zynq UltraScale+ MPSoC. Wong writes, “The 64-bit, Arm Cortex-A ARMv8 architecture supports virtual machines (VMs), but it requires hypervisor software to deliver this functionality.”

 

LynxSecure 6.0, the latest version of the company’s Separation Kernel Hypervisor, was just announced late last month. The initial port of this new hypervisor to the ARM architecture targets the multiple Arm Cortex-A53 processors in the Zynq Ultrascale+ MPSoC. (Previous versions only supported the x86 microprocessor architecture.) LynxSecure uses the Arm Cortex-A53 processors’ MMU, SMMU, and virtualization capabilities found on Armv8 processors to fully isolate operating systems and applications. It allows access only to the devices allocated to these applications and operating systems.

 

According to Lynx, it chose the Zynq UltraScale+ MPSoC as the first porting target for the LynxSecure hypervisor “for its broad market applicability, early customer interest, and the long-term relationship between Lynx and Xilinx.”

 

For more information about the LynxSecure Separation Kernel Hypervisor, see this brochure or contact Lynx Software Technologies directly.

 

 

 

Accolade’s new Flow-Shunting feature for its FPGA-based ANIC network adapters lets you more efficiently drive packet traffic through existing 10/40/100GE data center networks by offloading host servers. It does this by eliminating the processing and/or storage of unwanted traffic flows, as identified by the properly configured Xilinx UltraScale FPGA on the ANIC adapter. By offloading servers and reducing storage requirements, flow shunting can deliver operational cost savings throughout the data center.

 

The new Flow Shunting feature is a subset of the existing Flow Classification capabilities built into the FPGA-based Advanced Packet Processor in the company’s ANIC network adapters. (The company has written a technology brief explaining the capability.) Here’s a simplified diagram of what’s happening inside of the ANIC adapter:

 

 

Accolade Flow Shunting.jpg 

 

 

The Advanced Packet Processor in each ANIC adapter performs a series of packet-processing functions including flow classification (outlined in red). The flow classifier inspects each packet, determines whether each packet is part of a new flow or an existing flow, and then updates the associated lookup table (LUT)—which resides in a DRAM bank—with the flow classification. The LUT has room to store as many as 32 million unique IP flow entries. Each flow entry includes standard packet-header information (source/destination IP, protocol, etc.) along with flow metadata including total packet count, byte count, and the last time a packet was seen. The same flow entry tracks information about both flow directions to maintain a bi-directional context. With this information, the ANIC adapter can take specific actions on an individual flow. Actions might include forwarding, dropping, or re-directing packets in each flow.

 

These operations form the basis for flow shunting, which permits each application to decide from which flow(s) it does and does not want to receive data traffic. Intelligent, classification-based flow shunting allows an application to greatly reduce the amount of data it must analyze or handle, which frees up server CPU resources for more pressing tasks.
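For illustration only (this is my sketch of the concept in C, not Accolade’s actual data structure), a flow entry of the kind described above might look like the following, keyed by a normalized 5-tuple so that both directions of a connection land in the same entry:

#include <stdint.h>

/* Action the adapter applies to every packet matching this flow */
typedef enum { FLOW_FORWARD, FLOW_DROP, FLOW_REDIRECT } flow_action_t;

/* One of up to 32 million entries in the DRAM-resident lookup table.
   Addresses and ports are stored in a canonical (sorted) order so both
   directions of a connection hash to the same entry, preserving the
   bi-directional context described above. */
typedef struct {
    uint32_t      ip_lo, ip_hi;       /* the two IPv4 endpoint addresses, ordered */
    uint16_t      port_lo, port_hi;   /* ports paired with the ordered addresses */
    uint8_t       protocol;           /* TCP, UDP, ... */
    uint64_t      packet_count;       /* flow metadata, updated on every packet */
    uint64_t      byte_count;
    uint64_t      last_seen;          /* timestamp of the most recent packet */
    flow_action_t action;             /* the shunting decision for this flow */
} flow_entry_t;

/* Per-packet classifier step once the entry is found (or created):
   refresh the flow metadata, then report the action to apply. */
static flow_action_t classify_packet(flow_entry_t *e, uint32_t pkt_bytes, uint64_t now)
{
    e->packet_count++;
    e->byte_count += pkt_bytes;
    e->last_seen = now;
    return e->action;
}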

 

 

For more information about Accolade’s UltraScale-based ANIC network adapters, see “Accolade 3rd-gen, dual-port, 100G PCIe Packet Capture Adapter employs UltraScale FPGAs to classify 32M unique flows at once.”

 

 

 

Today, Microsoft, Mocana, Infineon, Avnet, and Xilinx jointly introduced a highly integrated, high-assurance IIoT (industrial IoT) system based on the Microsoft Azure Cloud and Microsoft’s Azure IoT Device SDK and Azure IoT Edge runtime package, Mocana’s IoT Security Platform, Infineon’s OPTIGA TPM (Trusted Platform Module) 2.0 security cryptocontroller chip, and the Avnet UltraZed-EG SOM based on the Xilinx Zynq UltraScale+ EG MPSoC.

 

The Mocana IoT Security Platform stack looks like this:

 

 

Mocana IoT Security Platform.jpg 

 

Mocana IoT Security Platform stack

 

 

 

Here’s a photo of the dev board that combines all of these elements:

 

 

 

Infineon, Avnet, Xilinx, Microsoft, Mocana IIoT Board.jpg

 

 

 

The Avnet UltraZed-EG SOM appears in the lower left and the Infineon OPTIGA TPM 2.0 security chip resides on a Pmod carrier plugged into the top of the board.

 

If you’re interested in learning more about this highly integrated IIoT hardware/software solution, click here.

 

 

 

The November/December 2017 issue of the ARRL’s QEX magazine carries an article written by Stefan Scholl (DC9ST) titled “The Panoradio: A Modern Software Defined Radio with Direct Sampling.” This article describes the implementation of an open-source software-defined radio (SDR) based on an Avnet Zedboard—which in turn is based on a Xilinx Zynq Z-7020 SoC—and an Analog Devices AD9467-FMC-250EBZ board based on the 16-bit, 250Msamples/sec AD9467 ADC.

 

 

 

Panoradio.jpg 

 

 

Stefan Scholl’s Panoradio SDR is based on a Zedboard (the green board on the left, with a Zynq Z-7020 SoC) and an AD9467-FMC-250EBZ ADC board (the blue board on the right)

 

 

 

The Panoradio’s features include:

 

  • 0 to 100 MHz direct-sampling reception
  • Direct sampling of 70 cm (425 to 440 MHz) signals
  • Three independent, zoomable waterfall displays (100 MHz to 6.1 kHz bandwidth)
  • Two independent audio receivers (22 kHz bandwidth) with Weaver SSB demodulation
  • Standalone operation (no PC needed)
  • Runs a full Linux stack including demodulation software (e.g. Fldigi)

 

Beyond the comprehensive design, Scholl’s article contains one of the most concise arguments for the adoption of SDRs that I’ve seen:

 

“The extensive use of digital signal processing has many advantages over analog circuits: Analog processing is often limited by the laws of physics, that can hardly be overcome. Digital processing is limited only by circuit complexity—a better performance (sensitivity, dynamic range, spurs, agility, etc.) is achieved by more complex calculations and larger bit widths. Since semiconductor technology has continuously advanced following Moore’s Law, very complex systems can be built today and it is possible to achieve extraordinary accuracy and performance for digital signals with comparatively little effort.”

 

Scholl then lists numerous SDR advantages:

 

  • Digital FIR filters can be built with virtually any filter response.
  • Mixers and amplifiers implemented with digital multipliers do not introduce spurs, harmonics, or unwanted IMD (inter-modulation distortion). Gain imperfection or other parasitic behaviors are also absent.
  • Digital oscillators based on direct digital synthesis (DDS) achieve extremely high spectral purity with virtually no spurs or harmonics. (The DDS technique is compact enough to sketch in a few lines of code; see the example after this list.)
  • Digital oscillator frequency can change instantaneously without phase discontinuity.
  • DSP is impervious to component aging effects, impedance mismatch, and a variety of EMC issues that plague analog circuitry.
  • Only quantization noise is present and can be made arbitrarily low by increasing bit widths.
  • SDRs are easily copied and can be backed up prior to making changes so that failed experiments can be easily reversed.
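Here is a minimal behavioral sketch in C of the phase-accumulator DDS mentioned above (my illustration of the general technique, not Scholl’s implementation). It also shows why retuning causes no phase discontinuity: only the increment changes, never the accumulated phase.

#include <stdint.h>
#include <stdio.h>
#include <math.h>    /* compile with -lm */

#define LUT_BITS 10
#define LUT_SIZE (1 << LUT_BITS)
#define PI 3.14159265358979323846

/* Sine lookup table; in an FPGA DDS this would live in block RAM */
static int16_t sine_lut[LUT_SIZE];

int main(void)
{
    for (int i = 0; i < LUT_SIZE; i++)
        sine_lut[i] = (int16_t)(32767.0 * sin(2.0 * PI * i / LUT_SIZE));

    /* 32-bit phase accumulator: f_out = f_clk * tuning_word / 2^32.
       Retuning just changes the increment, so a new frequency takes
       effect on the next sample with no phase discontinuity. */
    uint32_t phase = 0;
    const uint32_t tuning_word = 42949673u;   /* ~f_clk/100, e.g. 1 MHz at a 100 MHz clock */

    for (int n = 0; n < 16; n++) {            /* emit a few samples */
        int16_t sample = sine_lut[phase >> (32 - LUT_BITS)];
        printf("%d\n", sample);
        phase += tuning_word;                 /* wraps modulo 2^32 by design */
    }
    return 0;
}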

 

 

The Panoradio’s DSP section consumes less than 50% of the Zynq Z-7020 SoC’s PL (programmable logic) resources. As Scholl writes: “This quite low utilization would allow for even more complex DSP.” Note that this is the resource utilization in a low-end Zynq SoC. There are several Zynq SoC family members with significantly more PL resources available.

 

At the end of the article, Scholl writes: “Interestingly, the bottleneck is not the FFTs or the communication with the IP cores, but the drawing routines for the waterfall plots. The Zynq does not have any graphics acceleration core, which could speed up the drawing process. However, Xilinx has realized this bottleneck and included graphics acceleration in the Zynq successors: the UltraScale MPSoC, and the Zynq UltraScale+.”

 

(Actually, there’s just one device family here, the Zynq UltraScale+ MPSoC, with Mali-400 GPUs in the EG and EV variants, but Scholl will no doubt be even more interested in the new Zynq UltraScale+ RFSoCs, which incorporate 4Gsamples/sec ADCs and 6.4Gsamples/sec DACs—perfect for SDR applications.)

 

For more information on the Zynq UltraScale+ RFSoC, see:

 

 

 

 

 

 

 

LittleGP-30 based on Spartan-6 FPGA emulates world’s oldest personal computer: the LGP-30 from 1956


 

Back in the early days of computing when only immense, big-iron computers roamed the earth, Stanley Frankel created the revolutionary LGP-30 desk computer. Back when the sale of one, two, or maybe three units per computer design was the norm, LibraScope built and shipped more than 500 LGP-30s from 1956 through the early 1960s. It proved to be a very durable design for the time. This tiny titan used only 113 vacuum tubes and 1450 newfangled germanium diodes (from the discard pile at Hughes Aircraft) to create a working computer. The LGP-30’s main memory and even its three 32-bit CPU registers were stored on its magnetic rotating-drum memory.

 

 

LGP-30 ad.jpg 

 

 

This ad for Stan Frankel’s LGP-30 personal computer appeared in the Proceedings of the IRE in April, 1959.

 

 

 

The LGP-30 was a significant milestone in computer history. It was the first computer to be used as a process-control machine due to its “low” $27,000 cost. John Kemeny and Thomas Kurtz at Dartmouth College used an LGP-30 during the early 1960s to develop several simplified programming languages designed for undergraduate study: DARSIMCO (Dartmouth Simplified Code), DART, ALGOL 30, SCALP (Self-Contained ALGOL Processor), and DOPE (Dartmouth Oversimplified Programming Experiment). They called the successor to these languages the Beginner’s All-purpose Symbolic Instruction Code (BASIC) but by the time they developed BASIC, they’d graduated to General Electric GE-225 and Datanet-30 computers.

 

Now, Jürgen Müller has developed a timing-faithful miniature replica of the LGP-30 called the LittleGP-30. It’s based on a Xilinx Spartan-6 LX9 FPGA, which recreates the LGP-30’s CPU and its rotating magnetic drum, reads the user controls, and drives the displays.

 

 

LittleGP-30 Computer.jpg 

 

 

Jürgen Müller’s LittleGP-30, an FPGA-based miniature replica of the 1950s-era LGP-30 computer

 

 

 

You might well be expecting to see a big panel of blinking lights, as was common for computers of that era. However, the original LGP-30 used lighted pushbuttons to show the machine’s operational status and the internal machine state appeared on the front panel on an oscilloscope display. Müller has recreated the lighted pushbuttons using LED-backlit tactile switches and an LCD recreates the oscilloscope display. The 3-board LittleGP-30 uses a low-cost Numato MIMAS FPGA board based on a Spartan-6 LX9 FPGA. A custom control/display board with the LittleGP-30’s switches, LCD, a rotary encoder, and an HDMI port for displaying the entire contents of the machine’s emulated drum memory plugs into the Numato MIMAS board. A third circuit board on the top serves as a rather realistic reproduction of the LGP-30’s front panel—in miniature of course. Although it’s not currently a kit, Müller has posted a 42-page LittleGP-30 manual online.

 

Müller’s LittleGP-30 Web page also contains links to original LGP-30 paper-tape ASCII images, more than a dozen original software and hardware manuals, and a history of the LGP-30 computer that’s at the Computermuseum der Fakultät Informatik in Stuttgart. (The Computer History Museum in Mountain View also has an LGP-30 on display.)

 

Note: You can find out more about the history of Stan Frankel and the LGP-30 here.

 

 

By Adam Taylor

 

 

So far, all of my image-processing examples have used only one sensor and produce one video stream within the Zynq SoC or Zynq UltraScale+ MPSoC PL (programmable logic). However, if we want to work with multiple sensors or overlay information like telemetry on a video frame, we need to do some video mixing.

 

Video mixing merges several different video streams together to create one output stream. In our designs we can use this merged video stream in several ways:

 

  1. Tile together multiple video streams to be displayed on a larger display. For example, stitching multiple images into a 4K display.
  2. Blend together multiple image streams as vertical layers to create one final image. For example, adding an overlay or performing sensor fusion.

 

To do this within our Zynq SoC or Zynq UltraScale+ MPSoC system, we use the Video Mixer IP core, which comes with the Vivado IP library. This IP core mixes as many as eight image streams plus a final logo layer. The image streams are provided to the core via AXI Streaming or AXI memory-mapped inputs. You can select which one on a stream-by-stream basis. The IP Core’s merged-video output uses an AXI Stream.

 

To demonstrate how we can use the video mixer, I am going to update the MiniZed FLIR Lepton project to use the 10-inch touch display and merge in a second video stream generated by a TPG (test pattern generator). Using the 10-inch touch display gives me a larger screen to demonstrate the concept. This screen has been sitting in my office for a while now, so it’s time it became useful.

 

Upgrading to the 10-inch display is easy. All we need to do in the Vivado design is increase the pixel clock frequency (fabric clock 2) from 33.33MHz to 71.1MHz. Along with adjusting the clock frequency, we also need to set the ALI3 controller block to 71.1MHz.

 

Now include a video mixer within the MiniZed Vivado design. Enable layer one and select a streaming interface with global alpha control enabled. Enabling a layer’s global alpha control allows the video mixer to blend that layer with the layer beneath it on a pixel-by-pixel basis. With this setting, pixels are merged according to the defined alpha value rather than simply overriding the pixels on the layer beneath. The alpha value for each layer ranges from 0 (transparent) to 1 (opaque), and each layer’s alpha value is defined within an 8-bit register.
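If the blending arithmetic is unfamiliar, this is the standard per-channel alpha composite, sketched here in C with the alpha scaled into the 8-bit register range described above (an illustration of the math, not the IP core’s RTL):

#include <stdint.h>

/* Blend one 8-bit color channel of a layer pixel over the pixel on the
   layer beneath. alpha = 0 is fully transparent (the lower layer shows
   through untouched); alpha = 255 is fully opaque (the layer overrides). */
static inline uint8_t alpha_blend(uint8_t fg, uint8_t bg, uint8_t alpha)
{
    /* out = alpha*fg + (1 - alpha)*bg, with alpha scaled to 0..255;
       adding 127 rounds to nearest instead of truncating */
    return (uint8_t)((fg * alpha + bg * (255 - alpha) + 127) / 255);
}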

 

 

Image1.jpg 

 

 

Insertion of the Video Mixer and Video Test Pattern Generator

 

 

 

Image2.jpg

  

Enabling layer 1 for AXI streaming and global alpha blending

 

 

The FLIR camera provides the first image stream. However, we need a second image stream for this example, so we’ll instantiate a video TPG core and connect its output to the video mixer’s layer 1 input. For the video mixer and test pattern generator, be sure to use the high-speed video clock used in the image-processing chain. Build the design and export it to SDK.

 

We use the xv_mix.h API to configure the video mixer from SDK. This API provides the functions needed to control the video mixer.

 

The principle of the mixer is simple. There is a master layer and you declare the vertical and horizontal size of this layer using the API. For this example using the 10-inch display, we set the size to 1280 pixels by 800 lines. We can then fill this image space using the layers, either tiling or overlapping them as desired for our application.
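As a rough outline of what that configuration code looks like, here is a sketch against the layer-2 video mixer driver (xv_mix_l2.h, which wraps xv_mix.h). The function and macro names follow that driver, but signatures and constants vary between Vivado releases, so verify everything against the driver headers in your generated BSP:

#include "xparameters.h"
#include "xv_mix_l2.h"   /* layer-2 driver wrapping xv_mix.h */

/* Sketch: place the TPG stream in a 200x200 window at the top-left of
   the 1280x800 master layer and give it a mid-range alpha so it blends
   with the FLIR layer beneath it. XPAR_V_MIX_0_DEVICE_ID is the usual
   xparameters.h name for the first mixer instance; check your BSP. */
void configure_mixer(XV_Mix_l2 *Mixer)
{
    XVidC_VideoWindow Win;

    XVMix_Initialize(Mixer, XPAR_V_MIX_0_DEVICE_ID);

    Win.StartX = 0;     /* layer position, referenced from the top left */
    Win.StartY = 0;
    Win.Width  = 200;   /* layer size within the master layer */
    Win.Height = 200;
    XVMix_SetLayerWindow(Mixer, XVMIX_LAYER_1, &Win, 0);

    /* Alpha is written to the layer's 8-bit register: 0 = transparent,
       full scale = opaque; 128 blends the two layers roughly equally */
    XVMix_SetLayerAlpha(Mixer, XVMIX_LAYER_1, 128);

    XVMix_LayerEnable(Mixer, XVMIX_LAYER_1);
    XVMix_Start(Mixer);
}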

 

Each layer has an alpha register to control blending along with X and Y origin registers and height and width registers. These registers tell the mixer how it should create the final image. Positional location for a layer that does not fill the entire display area is referenced from the top left of the display. Here’s an illustration:

 

 

 

Image3.jpg 

 

Video Mixing Layers, concept. Layer 7 is a reduced-size image in this example.

 

 

To demonstrate the effects of layering in action, I used the test pattern generator to create a 200x200-pixel checkerboard pattern with the video mixer’s TPG layer alpha set to opaque so that it overrides the FLIR image. Here’s what that looks like:

 

 

 

Image4.jpg

 

FLIR and Test Pattern Generator layers merged; the test pattern has the higher alpha.

 

 

 

Then I set the alpha to a lower value, enabling merging of the two layers:

 

 

 

Image5.jpg 

 

FLIR and Test Pattern Generator layers merged; the test pattern alpha is lower.

 

 

 

We can also use the video mixer to tile images as shown below. I added three more TPGs to create this image.

 

 

 

Image6.jpg 

 

Four tiled video streams using the mixer

 

 

The video mixer is a good tool to have in our toolbox when creating image-processing or display solutions. It is very useful if we want to merge the outputs of multiple cameras working in different parts of the electromagnetic spectrum. We’ll look at this sort of thing in future blogs.

 

 

You can find the example source code on GitHub.

 

Adam Taylor’s Web site is http://adiuvoengineering.com/.

 

If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.

 

 

 

First Year E-Book here

First Year Hardback here.

 

 

  

MicroZed Chronicles hardcopy.jpg 

 

 

Second Year E-Book here

Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

Programmable logic is proving to be an excellent, flexible implementation medium for neural networks, and it gets faster and faster as you go from floating-point to fixed-point representation—making it ideal for embedded AI and machine-learning applications. The latest proof point is a recently published paper written by Yufeng Hao and Steven Quigley in the Department of Electronic, Electrical and Systems Engineering at the University of Birmingham, UK. The paper, titled “The implementation of a Deep Recurrent Neural Network Language Model on a Xilinx FPGA,” describes a successful implementation and training of a fixed-point Deep Recurrent Neural Network (DRNN) using the Python programming language; the Theano math library and framework for multi-dimensional arrays; the open-source, Python-based PYNQ development environment; the Digilent PYNQ-Z1 dev board; and the Xilinx Zynq Z-7020 SoC on the PYNQ-Z1 board. Using a Python DRNN hardware-acceleration overlay, the two-person team achieved 20GOPS of processing throughput for an NLP (natural language processing) application, outperforming earlier FPGA-based implementations by factors ranging from 2.75x to 70.5x.

 

Most of the paper discusses NLP and the LM (language model), “which is involved in machine translation, voice search, speech tagging, and speech recognition.” The paper then discusses the implementation of a DRNN LM hardware accelerator using Vivado HLS and Verilog to synthesize a custom overlay for the PYNQ development environment. The resulting accelerator contains five Process Elements (PEs) capable of delivering 20 GOPS in this application. Here’s a block diagram of the design:

 

 

 

PYNQ DRNN Block Diagram.jpg

 

DRNN Accelerator Block Diagram

 

 

 

There are plenty of deep technical details embedded in this paper but this one sentence sums up the reason for this blog post about the paper: “More importantly, we showed that a software and hardware joint design and simulation process can be useful in the neural network field.” This statement is doubly true considering that the PYNQ-Z1 dev board sells for $229.

 

 

 

Today, Xilinx announced plans to invest $40M to expand research and development engineering work in Ireland on artificial intelligence and machine learning for strategic markets including cloud computing, embedded vision, IIoT (industrial IoT), and 5G wireless communications. The company already has active development programs in these categories and today’s announcement signals an acceleration of development in these fields. The development was formally announced in Dublin today by The Tánaiste (Deputy Prime Minister of Ireland) and Minister for Business, Enterprise and Innovation, Frances Fitzgerald T.D., and by Kevin Cooney, Senior Vice President, Chief Information Officer and Managing Director EMEA, Xilinx Inc. The new investment is supported by the Irish government through IDA Ireland.

 

Xilinx first established operations in Dublin in 1995. Today, the company employs 350 people at its EMEA headquarters in Citywest, Dublin, where it operates a research, product development, engineering, and an IT center along with centralized supply, finance, legal, and HR functions. Xilinx also has R&D operations in Cork, which the company established in 2001.

 

 

Xilinx Ireland.jpg 

 

Xilinx’s Ireland Campus

 

I’ve written several times about Amazon’s AWS EC2 F1 instance, a cloud-based acceleration service based on multiple Xilinx Virtex UltraScale+ VU9P FPGAs. (See “AWS makes Amazon EC2 F1 instance hardware acceleration based on Xilinx Virtex UltraScale+ FPGAs generally available.”) The VINEYARD Project, a pan-European effort to significantly increase the performance and energy efficiency of data centers by leveraging the advantages of hardware accelerators, is using Amazon’s EC2 F1 instance to develop Apache Spark accelerators. VINEYARD project coordinator Christoforos Kachris from ICCS/NTU gave a presentation on “Hardware Acceleration of Apache Spark on Energy Efficient FPGAs” at SPARK Summit 2017, and a video of his presentation appears below.

 

Kachris’ presentation details experiments on accelerating machine-learning (ML) applications running on the Apache Spark cluster-computing framework by developing hardware-accelerated IP. The central idea is to create ML libraries that programs can invoke seamlessly, simply by calling the appropriate library function; no other program changes are needed to get the benefit of hardware acceleration. Raw data passes from a Spark Worker through a pipe, a Python API, and a C API to the FPGA acceleration IP and returns to the Spark Worker over a similar, reverse path.

 

The VINEYARD development team first prototyped the idea by creating a small model of the AWS EC2 F1 cloud-based system using four Digilent PYNQ-Z1 dev boards networked together via Ethernet and the Python-based, open-source PYNQ software development environment. Digilent’s PYNQ-Z1 dev boards are based on Xilinx Zynq Z-7020 SoCs. Even this small prototype doubled performance relative to a Xeon server.

 

Having proved the concept, the VINEYARD development team scaled up to the AWS EC2 F1 and achieved a 3x to 10x performance improvement (cost normalized against an AWS instance with non-accelerated servers).

 

Here’s the 26-minute video presentation:

 

 

 

The Xilinx PYNQ Hackathon wrap video just went live.


 

Twelve student and industry teams competed for 30 straight hours in the Xilinx Hackathon 2017 competition in early October and the 3-minute wrap video just appeared on YouTube. The video shows a lot of people having a lot of fun with the Zynq-based Digilent PYNQ-Z1 dev board and Python-based PYNQ development environment:

 

 

 

 

In the end, the prizes:

 

 

 

  • The “Murphy’s Law” prize for dealing with insurmountable circumstances went to Team Harsh Constraints.

 

  • The “Best Use of Programmable Logic” prize went to Team Caffeine.

 

  • The “Runner Up” prize went to Team Snapback.

 

  • The “Grand Prize” went to Team Questionable.

 

 

For detailed descriptions of the Hackathon entries, see “12 PYNQ Hackathon teams competed for 30 hours, inventing remote-controlled robots, image recognizers, and an air keyboard.”

 

 

And a special “Thanks!” to Sparkfun for supplying much of the Hackathon hardware. Sparkfun is headquartered just down the road from the Xilinx facility in Longmont, Colorado.

 

 

 

 

 

The smart factories envisioned by Industrie 4.0 are not in our future. They’re here now, and they rely heavily on networked equipment. Everything needs to be networked. In the past, industrial and factory applications relied on an entire zoo of different, incompatible networking “standards,” only some of which were actually standards. Today, the networking standard for Industrie 4.0 and IIoT applications is pretty well understood to be Ethernet—and that’s a problem because Ethernet is not deterministic. In the world of factory automation, non-deterministic networks are a “bad thing.”

 

Enter TSN (time-sensitive networking).

 

TSN is a set of IEEE 802 substandards—extensions to Ethernet—that enable deterministic Ethernet communications. Xilinx’s Michael Zapke recently published an article about TSN in the EBV Blog titled “Time-Sensitive Networking (TSN): Converging networks for Industry 4.0.” After discussing TSN briefly, Zapke’s article veers to practical matters and discusses implementing TSN protocols within the Xilinx Zynq UltraScale+ MPSoC’s PS (Processing System) and PL (Programmable Logic) using Xilinx’s 1G/100M TSN Subsystem LogiCORE IP, because you need both hardware and software to make TSN protocols work correctly.
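
Much of TSN’s determinism comes from the time-aware shaper of IEEE 802.1Qbv: per-traffic-class transmission “gates” open and close on a repeating schedule synchronized across the network (via IEEE 802.1AS), so time-critical frames always find the wire reserved for them. Here’s a small, purely conceptual Python sketch of such a gate control list; the cycle and window values are invented for illustration and have nothing to do with the Xilinx TSN IP’s internals:

```python
# Conceptual model of an IEEE 802.1Qbv gate control list.
# Times are in microseconds; the schedule values are invented
# for illustration only.

CYCLE_US = 1000  # schedule repeats every 1 ms, network-synchronized

# (window start, window end, traffic classes allowed to transmit)
GATE_CONTROL_LIST = [
    (0,   200,  {7}),               # reserved: class 7 control traffic
    (200, 300,  {6, 5}),            # audio/video classes
    (300, 1000, {4, 3, 2, 1, 0}),   # best-effort traffic
]

def open_classes(now_us):
    """Return the traffic classes whose gates are open at time now_us."""
    t = now_us % CYCLE_US
    for start, end, classes in GATE_CONTROL_LIST:
        if start <= t < end:
            return classes
    return set()

# A class-7 control frame is guaranteed the wire at the top of each cycle:
print(open_classes(50))      # -> {7}
print(open_classes(1250))    # -> {6, 5}
```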

 

Note: If you’re attending the SPS IPC Drives trade fair late this month in Nuremberg, you can see this IP in action.

 

 

Green Hills recently announced availability of its safety-critical, secure INTEGRITY RTOS for Xilinx Zynq UltraScale+ MPSoCs. The INTEGRITY RTOS has an 18-year history of use in safety-critical avionics, industrial, medical, and automotive applications. The RTOS uses the Zynq UltraScale+ MPSoC’s hardware memory protection to isolate and protect embedded applications, and it creates secure partitions that guarantee the resources each task needs to run correctly.

 

The company’s comprehensive announcement covers the release of:

 

 

  • INTEGRITY RTOS, for safety/security critical software in industrial, avionics, medical, automotive, and railway applications. This release makes full use of the Zynq UltraScale+ MPSoC’s Arm 64-bit Cortex-A53 processor cores including the NEON SIMD instructions and hardware virtualization.

 

  • Green Hills C/C++ Optimizing Compilers and 64-bit toolchain.

 

  • MULTI 64-bit IDE and debugger.

 

  • Integrated code-quality tools including a MISRA C/C++ code-quality adherence checker and DoubleCheck static analyzer.

 

  • Multicore run-control, board bring-up, low-level debugging, and real-time trace debugging with Green Hills’ Probe and SuperTrace Probe, based on Arm’s CoreSight trace IP, which is incorporated into all Zynq UltraScale+ MPSoCs.

 

  • Embedded Cryptographic Toolkit with FIPS 140-2 compliant services that secure embedded devices through secure boot, secure data storage, secure networking (SSL, TLS, IPSec, SSH), and digitally signed secure OTA firmware updates.

 

  • Cloud-based Device Lifecycle Management (DLM) that manages secure credentials throughout the manufacturing supply chain, even over untrusted networks.

 

  • INTEGRITY Multivisor secure virtualization extension, an optional future feature that safely runs guest operating systems such as Linux and Android alongside real-time critical applications.

 

 

 

Please contact Green Hills directly for more information about the INTEGRITY RTOS and associated Green Hills products.

 

 

 

 

Just this month, IEEE Spectrum magazine ran an article written by Morgen E. Peck titled “Why the Biggest Bitcoin Mines Are in China.” The article primarily discusses Bitmain, a company that claims to have made 70% of the bitcoin-mining rigs in the world. The company’s current top-of-the-line bitcoin-mining machine, the one that Bitmain says is the most powerful bitcoin miner in the world, is called the Antminer S9. Each Antminer S9 incorporates 189 of Bitmain’s 16nm BM1387 ASICs and can execute 14 terahashes per second in its fastest speed grade.

 

 

Bitmain Antminer S9.jpg

 

 

Bitmain’s Antminer S9, the world’s most powerful bitcoin miner

 

 

 

According to the IEEE Spectrum article, FPGAs were used to mine bitcoin until about 2013. After that, ASICs took over the heavy-duty task of running the bitcoin SHA-256 hashing algorithm as fast as possible. That’s because bitcoin mining is a race, and the winner is the first to compute and register a hash that meets the difficulty criteria established by the Bitcoin network. The first miner to register such a hash owns the resulting bitcoin, currently worth a little more than $6000 according to the exchange rate posted at the bottom of the Web page for the Antminer S9 mining machine.
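
If you’ve never looked at what the race actually computes, here’s a tiny Python illustration of the inner loop: double SHA-256 over a candidate block header, repeated with different nonces until the result falls below a difficulty target. The header bytes and target below are toy values for illustration, not real Bitcoin parameters:

```python
# Toy illustration of the bitcoin proof-of-work inner loop: double
# SHA-256 over a candidate header, retried with a new nonce until the
# hash falls below the difficulty target. Header and target are toy
# values, not real Bitcoin data.
import hashlib

header = b"example block header"
target = 2**240  # toy difficulty: hash must start with ~16 zero bits

nonce = 0
while True:
    candidate = header + nonce.to_bytes(4, "little")
    digest = hashlib.sha256(hashlib.sha256(candidate).digest()).digest()
    if int.from_bytes(digest, "big") < target:
        break
    nonce += 1

print(f"nonce {nonce} -> {digest.hex()}")
```

An ASIC like the BM1387 does nothing but run this double-hash loop in massively parallel silicon, which is why it displaced FPGAs and CPUs for this single-purpose job.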

 

So if the days of FPGA-based bitcoin mining are over, why cover Bitmain in an Xcell Daily blog? Because something needs to manage and direct the 189 ASICs in each Antminer S9 mining machine and that something is a Xilinx Zynq Z-7010 SoC according to Bitmain’s Antminer S9 Web page. Among the features that Bitmain liked about the Zynq SoC is that it “supports Gigabit Ethernet to ensure that mined blocks are submitted instantly” because finding the right hash is one thing, but being the first to register it is everything.


According to Yin Qi, Megvii’s chief exec, his company is developing a “brain” for visual computing. Beijing-based Megvii develops some of the most advanced image-recognition and AI technology in the world. The company’s Face++ facial-recognition algorithms run on the cloud and in edge devices such as the MegEye-C3S security camera, which runs Face++ algorithms locally and can capture more than 100 facial images in each 1080P video frame at 30fps.

 

 

MegEye-C3S Security Camera.jpg 

 

 

MegEye-C3S Facial-Recognition Camera based on Megvii’s Face++ technology

 

 

 

In its early days, Megvii ran its algorithms on GPUs, but quickly discovered the high cost and power disadvantages of GPU acceleration. The company switched to the Xilinx Zynq SoC and is able to run deep convolution on the Zynq SoC’s programmable logic while quantitative analysis runs simultaneously on the Zynq SoC’s Arm Cortex-A9 processors. The heterogeneous processing resources of the Zynq SoC allow Megvii to optimize the performance of its recognition algorithms for lowest cost and minimum power consumption in edge equipment such as the MegEye-C3S camera.

 

 

MegEye-C3S Security Camera exploded diagram.jpg 

 

MegEye-C3S Facial-Recognition Camera exploded diagram showing Zynq SoC (on right)

 

 

Here’s a 5-minute video where Megvii’s Sam Xie, GM of Branding and Marketing, and Jesson Liu, Megvii’s hardware leader, explain how their company has been able to attract more than 300,000 developers to the Face++ platform and how the Xilinx Zynq SoC has aided the company in developing the most advanced recognition products in the cloud and on the edge:


Xilinx has a terrific tool designed to get you from product definition to working hardware quickly. It’s called SDSoC. Digilent has a terrific dev board to get you up and running with the Zynq SoC quickly. It’s the low-cost Arty Z7. A new blog post by Digilent’s Alex Wong titled “Software Defined SoC on Arty Z7-20, Xilinx ZYNQ evaluation board,” posted on RS Online’s DesignSpark site, gives you a detailed, step-by-step tutorial on using SDSoC with the Digilent Arty Z7. In particular, the focus is on the ease of moving functions from software running on the Zynq SoC’s Arm Cortex-A9 processors into the Zynq SoC’s programmable hardware using Vivado HLS, which is embedded in SDSoC, so that you get the performance benefit of hardware-based task execution.

 

 

 

Digilent Arty Z7.jpg

 

Digilent’s Arty Z7 dev board


More about Watchdogs: You need them. They’re in the Zynq UltraScale+ MPSoC. So use them.

by Xilinx Employee on 10-30-2017 10:31 AM

 

Today’s blog post from Adam Taylor, “Adam Taylor’s MicroZed Chronicles Part 222, UltraZed Edition Part 20: Zynq Watchdogs,” discusses the three watchdogs incorporated into the Zynq UltraScale+ MPSoC’s PS (processing system). Adam does a terrific job of describing the watchdog capabilities in the Zynq UltraScale+ MPSoC and how you can use them to bulletproof your firmware and software against crashes. Meanwhile, I think a bit more background on watchdogs in general might be of help.

 

In my opinion, the best general article about watchdogs and their uses is “Great Watchdog Timers For Embedded Systems,” by Jack Ganssle. Jack wrote this article in 2011 and updated it in 2016. The article starts with a disaster. The Clementine spacecraft, launched in 1994, successfully mapped the moon and then was lost en route to near-Earth asteroid Geographos. Evidence suggests that the spacecraft’s dual-redundant processor’s software threw a floating-point exception, locked up, and essentially exhausted the fuel for the spacecraft’s maneuvering thrusters while putting the spacecraft into an 80 RPM spin from which it could not be recovered. Although the spacecraft’s 1750 microprocessor had a hardware watchdog, it was not used.

 

 

NASA Clementine.jpg

 

Clementine spacecraft. Photo credit: NASA/JPL

 

 

 

You do not want this sort of thing happening to your design.

 

Later in the article, Jack writes, “The best watchdog is one that doesn't rely on the processor or its software. It's external to the CPU, shares no resources, and is utterly simple, thus devoid of latent defects.”

 

That describes the three hardware watchdogs in the Zynq UltraScale+ MPSoC.
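
For readers new to the concept, the contract between application and watchdog is simple: the application must positively prove it is alive by “kicking” the watchdog within every timeout window, and a missed kick forces a reset. Here’s a minimal, purely software Python sketch of that pattern; it’s a stand-in for illustration only, since the whole point of the Zynq UltraScale+ MPSoC’s hardware watchdogs is that they bite without relying on software at all:

```python
# Minimal software illustration of the watchdog kick/timeout contract.
# A real hardware watchdog (like the three in the Zynq UltraScale+
# MPSoC's PS) does this in silicon and forces a reset; here a timer
# thread stands in for the hardware and just reports the failure.
import threading
import time

class Watchdog:
    def __init__(self, timeout_s, on_expire):
        self._timeout = timeout_s
        self._on_expire = on_expire
        self._timer = None

    def kick(self):
        """Application calls this periodically to prove it is alive."""
        if self._timer:
            self._timer.cancel()
        self._timer = threading.Timer(self._timeout, self._on_expire)
        self._timer.daemon = True
        self._timer.start()

def reset_system():
    print("WATCHDOG EXPIRED: resetting system")

wd = Watchdog(timeout_s=1.0, on_expire=reset_system)
wd.kick()
for _ in range(3):      # healthy main loop: kicks arrive on time
    time.sleep(0.5)
    wd.kick()
time.sleep(2.0)         # simulated hang: no kicks, so the watchdog bites
```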

 

For more excellent watchdog advice, read Adam Taylor’s blog post from today and read Jack Ganssle’s treatise on watchdogs.

 

About the Author
Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.