

Spectrum’s FPGA-based M4x series of 3U PXIe Arbitrary Waveform Generators (AWGs) generate waveforms based on 16-bit samples delivered at maximum rates from 625Msamples/sec to 1.25G samples/sec with bandwidths to 400MHz and programmable output levels from ±200 mV to ±4 V (±5 V for 625 MS/s models). There are five models in the M4x series with 200MHz or 400MHz bandwidths and 1, 2, or 4 channels.
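As a rough illustration of what 16-bit samples at these rates mean in practice, the sketch below builds one period of a sine wave as 16-bit sample codes. It is not Spectrum's driver API: the function name `sine_samples`, the signed full-scale value of 32767, and the 10 MHz test tone are all assumptions chosen for the example.

```python
import math

def sine_samples(freq_hz, sample_rate_hz, num_samples, full_scale=32767):
    """Return signed 16-bit sample codes for a sine wave (illustrative only)."""
    # Phase advance per sample, in radians.
    step = 2.0 * math.pi * freq_hz / sample_rate_hz
    return [round(full_scale * math.sin(step * n)) for n in range(num_samples)]

# One period of a 10 MHz tone at 1.25 GS/s spans 1.25e9 / 10e6 = 125 samples.
samples = sine_samples(10e6, 1.25e9, 125)
```

At 1.25 Gsamples/sec, each 16-bit sample is two bytes, so a single channel consumes 2.5 Gbytes/sec of sample data, which is why on-board sample memory and the FIFO mode matter.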





Spectrum’s M4x series of PXIe Arbitrary Waveform Generators (AWGs)




The generators feature five operating modes:


  • Single-shot
  • Loop
  • FIFO
  • Gating replay
  • Multiple replay



Spectrum provides OS drivers for 32-bit and 64-bit Windows and Linux and code examples for C/C++, LabVIEW, MATLAB, LabWindows/CVI, IVI, .NET, Delphi, Visual Basic, and Python. Drivers are compatible with earlier products so existing Spectrum customers can use the same software developed a decade ago for the company’s 20Msamples/sec generators to control these new M4x AWGs. Spectrum also offers its SBench6-Pro software that lets users control all of the AWG's operating modes and hardware settings from a simple GUI.


Here’s a block diagram of Spectrum’s M4x AWG hardware:






Spectrum M4x series PXIe Arbitrary Waveform Generator Block Diagram




Note that the digital section of this instrument is largely implemented using one FPGA, which happens to be a Xilinx Virtex-6 XC6VLX130T. This FPGA implements the PCIe Gen2 x4 bus interface, the sample-memory interface, the sample-sequencing and trigger logic, and the interface to the high-speed DACs.


Please contact Spectrum directly for additional information about these products.


By Adam Taylor


Over the last few weeks, and indeed last year, we have looked at the Xilinx SDSoC Development Environment in detail. However, one area we have not examined is SDSoC’s performance-monitoring and trace capabilities.


Performance monitoring allows us to examine the performance of the processors executing applications within our system. We also can see the performance of the AXI interconnect used as part of the Zynq SoC’s PL acceleration in considerable detail. This feature allows us to understand the interaction between the PS and the PL. Tracing capability, which requires more detail, will be the focus of another blog.


We enable the AXI performance monitor using SDSoC’s Project overview. On the right-hand side under options, there is a tick box labeled Insert AXI Performance Monitoring. Checking this box and then cleaning the build, prior to a complete re-build of the project with the active configuration set to SDDebug, tells SDSoC to insert AXI performance-monitoring blocks into the design.






For this example, I will use one of the demo applications and target the ZedBoard. I am going to run the matrix multiplication example and target a bare-metal solution. We can monitor the AXI performance using both standalone code and Linux.


Once the application is built, we need to connect the ZedBoard to our development PC using both the UART and the JTAG connectors.


To run the examples on our target board, we will be using an approach that differs from what we have done before—i.e. we are not going to copy the generated files on to a SD Card. Instead we are going to use SDSoC’s Debugger. We will also be using two new perspectives within SDSoC: the debugger perspective (which should be familiar to those of us who have used Xilinx SDK previously), and the performance analysis perspective.


The first thing we need to do with the files we’ve generated is to create a debug configuration for the elf file. Within SDSoC, under project explorer, open the folder for the project we have just compiled, expand the SDDebug folder, and select the elf file for the project.







Right click on this selection and select Debug As -> Debug Configurations. Create a new Xilinx SDSoC Application as configured in the image below.






Selecting the Debug Configuration




On the application tab, check the stop program at entry option. This will prevent the program from running the moment it is downloaded and will allow us to control when the program executes.






SDSoC Debug Configuration







Ensuring the program waits at entry



With this complete, click on debug. The bit file will be loaded and the application downloaded and held at the entry point, awaiting our command. You may see a dialog asking you to switch to the debug view. Click yes and you will see that the application has been loaded and is paused.





Program downloaded and awaiting execution



If we want to execute the program, we can click the resume button (or hit F8) as shown above. However if we do that, we will not obtain the performance data. If we want the performance data, we need to open the performance analysis perspective by clicking on the open perspective button.





We can also select Window -> Perspective -> Open Perspective -> Other.







Selecting the Performance Analysis Perspective



This will open the performance analysis perspective. However, before we can obtain the performance analysis, we need to define the underlying hardware. This is very simple to do under the Performance Session Manager. Just select run.





The Performance Session Manager settings



This will open a dialog box that allows us to define the clock rate and the APM (AXI Performance Monitor) information. This information resides in the following directory:










Defining the APM slots in the design



Once this is completed, we can run the program and capture the information of interest within the performance graphs. These performance graphs relate to either the PS or the APM performance. I captured the following when I ran the program:







Result of the APM Performance Analysis Graph







Result from the APM Performance Analysis Counters




Performance analysis allows us to examine the performance of our system in more depth, which helps us better understand the interaction between the Zynq SoC’s PS and PL.
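To make the counter readings easier to interpret, here is a minimal sketch of how a byte count and an elapsed cycle count could be turned into a throughput figure. The function name, its parameters, and the example numbers are hypothetical; they are not taken from the SDSoC tool or from the capture above.

```python
def axi_throughput_mbytes_per_s(byte_count, elapsed_cycles, clock_hz):
    """Convert a byte counter and a cycle counter into Mbytes/sec (illustrative)."""
    if elapsed_cycles <= 0:
        raise ValueError("elapsed_cycles must be positive")
    seconds = elapsed_cycles / clock_hz   # measurement window in seconds
    return byte_count / seconds / 1e6     # bytes/sec scaled to Mbytes/sec

# Hypothetical reading: 4096 bytes moved in 1024 cycles of a 100 MHz clock.
example = axi_throughput_mbytes_per_s(4096, 1024, 100e6)
```

With those made-up numbers the result is about 400 Mbytes/sec. The clock rate entered in the dialog box matters for exactly this reason: without it, cycle counts cannot be converted into time.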




Code is available on Github as always.


If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.




  • First Year E Book here
  • First Year Hardback here.







  • Second Year E Book here
  • Second Year Hardback here








All of Adam Taylor’s MicroZed Chronicles are cataloged here.










Agnisys IDesignSpec and ISequenceSpec can generate synthesizable HDL for Vivado from plain-text specs

by Xilinx Employee ‎10-20-2016 10:23 AM - edited ‎10-20-2016 03:49 PM (921 Views)



Here’s the system engineer’s dream: Write a concise specification for a system, feed it to the right tool, and get a working design out of the other end of the tool. As I said, it’s a dream. But Agnisys seems bound and determined to make this dream a reality. The company offers two design tools—IDesignSpec and ISequenceSpec—that perform this feat for system and test designs. Here’s a diagram of the IDesignSpec flow:



Agnisys IDesignSpec Design Flow



Note that IDesignSpec accepts specifications in a variety of text-centric formats including IP-XACT and XML and emits a number of files including synthesizable Verilog or VHDL, which slips right into the Xilinx Vivado HLx Design Suite.


Now this may sound complex and I’m sure that whatever’s going on under the hood is indeed complicated; but perhaps your job isn’t, as demonstrated in the following 5-minute demo video:






Contact Agnisys directly for more information about IDesignSpec and ISequenceSpec.









Adam Taylor, the prolific author of the long-running MicroZed Chronicles series, has just made a 4-minute video showing how he got the Raspberry Pi Camera running with the Trenz Electronic ZynqBerry dev board based on the Xilinx Zynq Z-7000 SoC. After just four minutes, he’s got HD video from the camera running through the ZynqBerry board and onto a video monitor. Take a look:





For more info on the ZynqBerry board, see:













Xilinx introduced the UltraFast Design Methodology, which uses hand-picked best design practices to speed you on your way to a successful design, nearly two years ago. If you’re not yet using the UltraFast Design Methodology but are curious as to why you might want to board this train, here’s an 11-minute video that tells all:






The most requested piece of IP for Xilinx All Programmable devices is DMA for PCIe and the Xilinx DMA for PCIe subsystem is now included—free—in the Xilinx Vivado HL Design Suite. This DMA block supports as many as four PCIe Gen3 x16 channels with transfer sizes as large as 256Mbytes (and infinitely long transfers with linked descriptors). The DMA engine in the IP supports scatter/gather operations.
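The idea of linked descriptors can be sketched as follows: a transfer larger than the per-descriptor limit is split into pieces, and each piece points to the next. This is a conceptual model only. The `Descriptor` fields, the `build_chain` helper, the addresses, and the reading of "256Mbytes" as 256 MiB are assumptions for illustration, not the register layout of the Xilinx IP.

```python
from dataclasses import dataclass
from typing import List, Optional

# Per-descriptor limit from the text, interpreted here as 256 MiB (assumption).
MAX_DESCRIPTOR_BYTES = 256 * 1024 * 1024

@dataclass
class Descriptor:
    src: int                   # source address of this piece
    dst: int                   # destination address of this piece
    length: int                # bytes moved by this descriptor
    next_index: Optional[int]  # index of the next descriptor, None ends the chain

def build_chain(src: int, dst: int, total_bytes: int,
                max_bytes: int = MAX_DESCRIPTOR_BYTES) -> List[Descriptor]:
    """Split one long transfer into a linked list of bounded descriptors."""
    chain: List[Descriptor] = []
    offset = 0
    while offset < total_bytes:
        length = min(max_bytes, total_bytes - offset)
        chain.append(Descriptor(src + offset, dst + offset, length, None))
        offset += length
    # Link each descriptor to its successor; the last one keeps None.
    for i in range(len(chain) - 1):
        chain[i].next_index = i + 1
    return chain

# A hypothetical 600 MiB transfer needs three descriptors: 256 + 256 + 88 MiB.
chain = build_chain(0x1000_0000, 0x8000_0000, 600 * 1024 * 1024)
```

In a scatter/gather setup, the source and destination addresses in each descriptor need not be contiguous as they are here; the contiguous layout just keeps the example short.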


As of the new 2016.3 release of Vivado, this PCIe DMA IP core supports PCIe Gen3 x16 operation in Xilinx UltraScale+ FPGAs and MPSoCs. There’s beta support for Artix-7 and Kintex-7 FPGAs and the Zynq Z-7000 SoC as well in this latest Vivado release.


Here’s a 14-minute video to give you all of the technical details and a demo:






Real-time video analytics that can recognize 1000 object classes at frame-rate speeds need heavy-duty processing, so TeraDeep pulled a really big gun from the rack: a Micron Pico Computing AC-510 accelerator card based on a Xilinx Kintex UltraScale KU060 FPGA connected to Micron’s Hybrid Memory Cube (HMC). TeraDeep’s press release says:


“For low-latency applications such as video analytics, where quick recognition and tracking of fast-moving objects is critical, the graphical processing units (GPUs) used in conventional systems are at a disadvantage. TeraDeep instead uses an FPGA-based architecture that offers faster analytics at half the power, making it an ideal candidate for on-premise appliances.


The first version of the company's solution is an FPGA-based PCIe board that achieves a four-time lower latency compared with the latest GPUs.”


A quick trip to the TeraDeep home page will show you a series of animated GIFs where you can see the real-time recognition in action.






TeraDeep will be demonstrating this technology in Micron's booth at the Supercomputing 2016 Conference on November 14-17 in Salt Lake City, Utah.









Late last week, specifically October 14, marked the rollout of the OpenCAPI standard for high-speed, coherent interconnect among processor, memory, hardware accelerators, and I/O devices in data center environments. The new OpenCAPI Consortium—which includes AMD, Dell EMC, Google, Hewlett Packard Enterprise, IBM, Mellanox Technologies, Micron, NVIDIA, and Xilinx—made the announcement.


Why OpenCAPI? The consortium lists two solid reasons:


  1. Hardware acceleration will become as commonplace as microprocessors.
  2. Existing system interfaces cannot meet current, much less future, requirements for coherent connection of accelerators.


Thus, concluded the initial group of OpenCAPI Consortium members, there’s a need for a new and open standard.


Here are some use cases already predicted for OpenCAPI:






OpenCAPI Use Cases



OpenCAPI uses 25Gbps signaling and a low-latency protocol for high-speed signaling among various system components. Attached devices operate natively within the application’s user space using the OpenCAPI standard. The protocol is designed to be ISA-agnostic.


If the “CAPI” part of the OpenCAPI standard looks familiar, that’s because it is. OpenCAPI is an evolved, open version of the CAPI protocol. For previous Xcell Daily blogs about the CAPI standard, see:









The folks at Hackaday are holding a SuperConference in Pasadena, California on November 5 and 6 and Digilent’s Sam Bobrowicz is running a 4-hour, hands-on, $79 workshop starting at noon on Saturday titled “FPGAs: Beyond Digital Logic with Microblaze and Arty.” (The registration fee includes a $99 Digilent ARTY board, so the workshop’s a bargain!) Sam’s going to be teaching advanced FPGA applications using the Arty board and Xilinx’s Vivado Design Suite. Participants will use a Xilinx MicroBlaze soft-core processor along with a library of pre-built IP blocks to design a custom microcontroller and implement it inside a Xilinx Artix-7 FPGA. Graphical design tools and a standard C-programming environment will be used. This workshop will not involve writing HDL.


Register Here.



The $99 Digilent ARTY dev board






For more information about the ARTY board, see: ARTY—the $99 Artix-7 FPGA Dev Board/Eval Kit with Arduino I/O and $3K worth of Vivado software. Wait, What????





Programmable logic control of power electronics—where to start? What dev boards to use?

by Xilinx Employee ‎10-18-2016 10:24 AM - edited ‎10-18-2016 10:28 AM (1,726 Views)


A great new blog post on the ELMG Web site discusses three entry-level dev boards you can use to learn about controlling power electronics with FPGAs. (This post follows a Part 1 post that discusses the software you can use—namely Xilinx Vivado HLS and SDSoC—to develop power-control FPGA designs.)


And what are those three boards? They should be familiar to any Xcell Daily reader:



The $99 Digilent ARTY dev board (Artix-7 FPGA)






The Avnet ZedBoard (Zynq Z-7000 SoC)








The Avnet MicroZed SOM (Zynq Z-7000 SoC)








Who is ELMG? They’ve spent the last 25 years developing digitally controlled power converters for motor drives, industrial switch-mode power supplies, reactive power compensation, medium-voltage systems, power-quality systems, motor starters, appliances, and telecom switch-mode power supplies.



For more information about the ARTY board, see: ARTY—the $99 Artix-7 FPGA Dev Board/Eval Kit with Arduino I/O and $3K worth of Vivado software. Wait, What????



For more information about the MicroZed and the ZedBoard, see the 150+ blog posts in Adam Taylor’s MicroZed Chronicles.




Successful timing closure with programmable logic demands that you master constraints. If you’d like to sharpen those skills quickly, for free, then there’s a timing constraints Webinar for you. (Actually, it’s two Webinars to accommodate worldwide time zones.) The two Webinars take place on Friday, October 21 and they’re being taught by Doulos, an exceptional Xilinx authorized training provider.


This Webinar will teach you how to make best use of timing exceptions and simple I/O constraints. Here’s an outline:



  • Review of basic XDC timing constraints and static timing analysis in the Vivado Design Suite
  • Multicycle Paths
  • False Paths
  • Exclusive Clock Groups
  • Asynchronous Clock Groups
  • Minimum and Maximum Delay Exceptions
  • Constraining Clock Domain Crossing Circuitry
  • Path Segmentation
  • Constraint priority



Sound good? Register here.



Accolade’s 3rd-gen, dual-port, 100G ANIC-200Ku PCIe Lossless Packet Capture Adapter can classify packets in 32 million flows simultaneously—enabled by Xilinx UltraScale FPGAs—while dissipating a mere 50W. The board features two CFP4 optical adapter cages and can time-stamp packets with 4nsec precision. You can directly link two ANIC-200Ku Packet Capture Adapters with a direct-attach cable to handle lossless, aggregated traffic flows at 200Gbps.


Applications for the adapter include:


  • Passive and Inline Network Monitoring
  • Network Security and Forensics
  • In-Line Deep Packet Inspection (DPI)
  • Network Test and Measurement
  • Network Probes
  • Video Stream Monitoring
  • High Frequency Trading (HFT)
  • Application Performance Monitoring (APM)
  • High Performance Computing (HPC)




Accolade’s 3rd-gen, dual-port, 100G ANIC-200Ku PCIe Lossless Packet Capture Adapter


Want a fast, hands-on intro to the Xilinx Zynq Z-7000 SoC? Want it all in less than one day? Wish granted.


Next week on Wednesday, October 26 at ARM TechCon in Santa Clara, California, krtkl’s CTO Jamil Weatherbee and Chief Design Officer Russell Bush will be teaching a 3-hour course on the Xilinx Zynq Z-7000 SoC based on the company’s incredibly successful, low-cost snickerdoodle dev board.


For lots more information on the snickerdoodle, see:











krtkl’s Zynq-based snickerdoodle dev board



Today, Xilinx announced that Baidu, China’s leading Internet search provider, is using Xilinx FPGAs to accelerate machine learning applications in their data centers located in China. Baidu’s Jian Ouyang discussed this work earlier this year at the Hot Chips conference held in Cupertino, California. (See “Baidu Takes FPGA Approach to Accelerating SQL at Scale” on the Nextplatform.com Web site. The Baidu Hot Chips paper is titled: “SDA: Software-Defined Accelerator for general-purpose big data analysis system.”) According to the Nextplatform.com article, “…Baidu sits on over an exabyte of data, processes around 100 petabytes per day, updates 10 billion Web pages daily, and handles over a petabyte of log updates every 24 hours.”


The Nextplatform.com article reports that Baidu developed its own FPGA board based on a Xilinx Kintex UltraScale KU115 FPGA paired with 8 to 32Gbytes of DDR4-2400 SDRAM. These boards automatically handle key SQL functions on demand. The article also contains two slides from Jian Ouyang’s Hot Chips presentation showing performance gains from the FPGA board. In one case, TPC-DS query3 runs 25x faster than the same function compiled from C++ and running in software. Terasort showed an 8x improvement. These are substantial performance gains and when you’re processing 10 billion Web pages a day, those sorts of numbers add up to big savings in data center capital expenses and energy costs.


Today’s press release also says that Baidu and Xilinx “are collaborating to further expand volume deployment of FPGA-based accelerated platforms.”






By Adam Taylor


Having gotten the Trenz Electronic ZynqBerry to say “hello world” in my previous blog post, our next step is to make the ZynqBerry board run a demo example and to see the acceleration we can get from using the programmable hardware in the on-board Zynq Z-7000 SoC. We will target Linux for our example because many of you will be using this OS with the ZynqBerry board.


The first thing we want to do is create a new project. Unlike last week where we created a default platform, we will use a local project so that we can access the example files with greater ease.


To do this, I have created a new directory called ZynqBerry within my working area and copied the SDSoC platform definition we created last week underneath it.





This is what we would do if we had created our own, custom logic platform. Under this directory we will see a sample directory that contains a number of examples. We can use these examples to demonstrate the power of SDSoC and the hardware acceleration we can achieve.


To create a new project, we again click file -> new -> SDSoC Application.






This will open the dialog box as we saw last week when we created the “hello world” program. However, this time we are going to use the local project option, not a predefined platform. As such, after we have entered the project name and selected the operating system, we select “other” instead of selecting one of the pre-defined platforms and then navigate to the location of the local project.






You will see the platform name. Update this to show the path and the name. In this example, it becomes te0726_m_sdsoc. (Note: TE0726 is the model number for the Trenz Electronic ZynqBerry board.) SDSoC will then create the project for you and you’ll see it in the SDSoC project overview. Under the Hardware Functions, you will see a pre-selected C Function to be accelerated with the name of mmult_accel.


When the project is built, this function will be moved into the Zynq SoC’s programmable logic instead of being run as a subroutine in the ARM cores. This is achieved using the connectivity framework within SDSoC that seamlessly integrates accelerated functions into the application. This is shown on the overview tab of the project overview.


If we want to see more about the platform and the resources we have available, we can click on the platform tab at the bottom:






To generate the example, we click on project -> build project, which will compile the application and build the necessary hardware. It may take a little time to perform the high-level synthesis and generate the bit file in Vivado.


Once this is complete we need to flash the Boot.bin file to the on-board QSPI Flash memory and copy the kernel image and the elf of the application to the SD card as we did last week.


After doing all of that and powering up the board, we can connect to the ZynqBerry using a terminal program like PuTTY over a serial link. Configure the link as 8, n, 1, and 115200 bps. At the log on prompt, log on as root. Now we can run the executable file generated by SDSoC.


However, to run the example we first need to mount the SD card because it is not mounted automatically. We can do this using the command below in the terminal window to mount the SD card into the mnt directory:



mount /dev/mmcblk0p1 /mnt/



With that completed, we can execute the program and see how much hardware acceleration we get using the command:






When I ran this on the ZynqBerry sitting on my desk, I achieved the following results showing a 33x speed improvement, which is not bad at all.
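For readers who want to see what is being measured, the sketch below shows a plain software matrix multiply and the ratio used to express a speed improvement. It is written in Python purely for illustration; the SDSoC example itself is C code, and the names `mmult` and `speedup` and the cycle counts in the comment are assumptions, not output from the ZynqBerry run.

```python
def mmult(a, b):
    """Reference matrix product of two lists of rows (software-only version)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

def speedup(sw_cycles, hw_cycles):
    """Ratio of software run time to hardware-accelerated run time."""
    return sw_cycles / hw_cycles

# Made-up cycle counts: 3300 software cycles vs. 100 accelerated cycles is 33x.
product = mmult([[1, 2], [3, 4]], [[5, 6], [7, 8]])
ratio = speedup(3300, 100)
```

The triple-nested loop structure is what makes matrix multiplication a good acceleration target: the inner multiply-accumulate operations are independent enough to be unrolled and pipelined in programmable logic.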







Note: We have looked at SDSoC before. For more SDSoC examples and usage information, please see the MicroZed Chronicles parts 85 to 103 at www.microzedchronciles.com





Code is available on Github as always.











I’ve written blog posts about the Xilinx Zynq UltraScale+ MPSoC FPGA ZCU102 Evaluation Kit before and have been asked by anxious designers, “When can I order one?” Now I have the answer:




This board’s clearly the way to get going now. It has everything you need to start designing with the Zynq UltraScale+ MPSoC now including a node-locked license for the Design Edition of the Vivado HL Design Suite.






The Xilinx Zynq UltraScale+ MPSoC FPGA ZCU102 Evaluation Kit, currently available for $2495




I could give you a long, long list of the hardware on the board, but I think this image is more useful:






The Xilinx Zynq UltraScale+ MPSoC FPGA ZCU102, board detail



Note that the current price for the Xilinx Zynq UltraScale+ MPSoC FPGA ZCU102 Evaluation Kit is $2495. The list price shown on the Web page is $4995 but there’s currently a strike-through line superimposed on that number with the lower price adjacent. I’m told the lower price is in effect for now and that’s a bargain if you’re looking for a fast way to jumpstart your Zynq UltraScale+ MPSoC design.




The news on DDR memory failures is not good according to FuturePlus Systems’ co-founder Barbara Aichinger speaking at this week’s Memcon 2016 event in Santa Clara, California. Memory errors continue to plague the industry and they are proving to be expensive. According to one paper co-authored by Facebook:


“…servers were flagged for memory repair if they had more than 100 correctable errors per week…” and “Under our more aggressive/proactive repair policy, we find that on the average around 46% of servers that have errors end up being repaired each month.”


Aichinger then did the math for the Memcon audience:


  • About 2% of Facebook’s servers have a memory failure every month.
  • 46% of those servers with monthly memory failures get a memory DIMM swap.
  • Assuming Facebook has on the order of 100,000 servers, that’s 920 DIMM swaps per month.
  • There are 720 hours in a month.
  • Therefore, “Facebook is swapping out DIMMs [in its servers] every hour of every day of every month all year long!”
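The arithmetic behind that conclusion is easy to reproduce. The short script below uses only the figures from the list above; the 100,000-server count is the assumption stated in the talk, and the variable names are mine.

```python
servers = 100_000                           # assumed fleet size from the talk
failing_per_month = servers * 2 // 100      # about 2% have a memory failure
swaps_per_month = failing_per_month * 46 // 100   # 46% of those get a DIMM swap
hours_per_month = 30 * 24                   # 720 hours in a 30-day month

swaps_per_hour = swaps_per_month / hours_per_month

print(failing_per_month, swaps_per_month, round(swaps_per_hour, 2))
```

That works out to 2,000 failing servers, 920 swaps, and roughly 1.28 DIMM swaps per hour, which is where the "every hour of every day" remark comes from.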



Two years ago, Aichinger’s Memcon 2014 topic was Row Hammer in DDR3. (See “Unexplained memory errors in your DDR3 design? Maybe it’s “Row Hammer.” Yet another thing to worry about” for more information about DDR Row Hammer.) Back then, the story was that row hammer would not be a problem with the new, yet-to-be-seen DDR4 SDRAM.


Wrong, wrong, wrong.


Aichinger said this week that DDR4 SDRAM also exhibits row-hammer failures. Want proof? Check out this White Paper from SGI titled “The Row Hammer Effect: Enhancing Memory RAS.” Worse, says Aichinger, error rates are vendor dependent.


Certainly, one way to induce memory errors in all kinds of DRAM including DDR4 SDRAMs is to violate memory protocols. All DDR specs including JEDEC’s DDR4 spec contain rules, many rules, about event ordering. For example:


  • Do not activate a bank that’s already open
  • Do not precharge a bank that’s closed
  • Do not read or write a page that’s not open
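These three ordering rules can be expressed as a tiny state tracker that remembers which row is open in each bank. This is a toy model for illustration only: the class name `BankStateChecker` and its methods are hypothetical, it ignores all timing parameters, and it covers only the three rules listed above rather than the full JEDEC rule set.

```python
class BankStateChecker:
    """Track open rows per bank and record ordering-rule violations."""

    def __init__(self):
        self.open_rows = {}     # bank number -> currently open row
        self.violations = []    # human-readable violation messages

    def activate(self, bank, row):
        if bank in self.open_rows:
            # Rule 1: do not activate a bank that is already open.
            self.violations.append(f"ACT to open bank {bank}")
            return              # keep the previously open row
        self.open_rows[bank] = row

    def precharge(self, bank):
        if bank not in self.open_rows:
            # Rule 2: do not precharge a bank that is closed.
            self.violations.append(f"PRE to closed bank {bank}")
            return
        del self.open_rows[bank]

    def access(self, bank, row, kind="read"):
        if self.open_rows.get(bank) != row:
            # Rule 3: do not read or write a page that is not open.
            self.violations.append(f"{kind} to unopened row {row} in bank {bank}")

checker = BankStateChecker()
checker.activate(0, 5)
checker.access(0, 5)        # legal: row 5 is open in bank 0
checker.activate(0, 6)      # violation: bank 0 is already open
checker.precharge(1)        # violation: bank 1 is closed
checker.access(2, 1, "write")   # violation: nothing is open in bank 2
```

A real audit tool observes the command bus rather than being called by software, but the bookkeeping it has to do per bank is conceptually similar.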


Timing violations are yet another way to cause memory errors.


Row hammer is a way to intentionally cause a memory error by repeatedly activating a row in the SDRAM. Repeated activation causes charge leakage in adjacent rows. Leak enough charge into a victim row and you can flip bits in that row even though you’re not explicitly accessing it. (And perhaps you cannot access that particular row because of memory management policies, so intentional row hammer is a form of hacking.)
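The mechanism can be pictured with a very simple counting model: tally how often each row is activated within one refresh window and flag the physical neighbors of any row that crosses a threshold. Everything here is illustrative. The function name, the threshold value, and the assumption that only the two immediately adjacent rows are affected are mine; real disturbance thresholds depend on the specific device and vendor.

```python
from collections import Counter

def rows_at_risk(activations, threshold, num_rows):
    """Return sorted victim rows adjacent to heavily activated rows (toy model)."""
    counts = Counter(activations)   # activations per row in one refresh window
    victims = set()
    for row, count in counts.items():
        if count >= threshold:
            # Only immediate neighbors are considered in this simplified model.
            for neighbor in (row - 1, row + 1):
                if 0 <= neighbor < num_rows:
                    victims.add(neighbor)
    return sorted(victims)

# Hypothetical trace: row 7 is hammered 1000 times, row 3 is touched 10 times.
at_risk = rows_at_risk([7] * 1000 + [3] * 10, threshold=500, num_rows=16)
```

With this made-up trace, rows 6 and 8 are flagged even though neither was ever accessed, which is the essence of the problem.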






This is your SDRAM on Row Hammer. Any questions?




Aichinger is standing atop this particular soap box at the moment because she’s working with a JEDEC Task Group to produce a protocol-checks document for auditing SDRAM use. At Memcon, Aichinger proposed a compliance audit that can determine whether or not JEDEC specifications are being met in a specific application.


Aichinger’s company, FuturePlus Systems, makes DIMM interposers that are pretty handy for this purpose. The interposers allow you to connect high-speed DSOs and logic analyzers directly to DIMM sockets so that you can monitor memory activity at speed to detect protocol violations caused by a number of issues including:


  • BIOS programming errors
  • Incorrect SPD programming (the SPD is an on-DIMM EEPROM with timing info about the specific SDRAMs on the DIMM)
  • Memory controller violations
  • Row hammer exploits


FuturePlus Systems also offers a self-contained piece of test equipment called the FS2800 DDR Detective that can perform this type of audit on DDR3, LPDDR3, DDR4, and LPDDR4 SDRAMs at speeds to DDR4-3200. (You’ll find a wealth of information about these memory-error problems on FuturePlus Systems’ www.ddrdetective.com and from the company’s blog.)


So where does that leave you? Where does it leave your design? Well, if you’re using an on-chip memory controller like the ones available in Xilinx Zynq Z-7000 SoCs and Zynq UltraScale+ MPSoCs, you’re going to want to be really careful about how you program these memory controllers and you might well want to perform an audit to ensure that you’re operating the DDR memory correctly. I’m sure FuturePlus Systems would be happy to sell or rent a DDR Detective to you for an audit.


Beyond that, you may well want a special memory controller for your application’s specialized needs. If so, you’ll either need to design an ASIC or develop the controller using the programmable logic in a Xilinx All Programmable device. The programmable-logic route will get you to the finish line much faster and with more flexibility.





During the hot August days at NI Week in Austin earlier this year, Damon Bohls showed me a new feature called Smart Capture in the latest release of the VirtualBench software for the company’s groundbreaking VirtualBench All-in-One Instrument. I shot a video of his demo but the software had not yet been released so I could not post the video. Bohls, a Senior Software Engineer at NI, wrote to me today and told me that the software was now released. So now, here’s the 3-minute video demo:






NI’s VirtualBench instruments pack multiple test-and-measurement instruments into a compact, bench-top unit and the company’s latest and greatest version, the VB-8034, incorporates:


  • A 4-channel, 350MHz DSO (digital sampling oscilloscope) operating at 1.5G samples/sec
  • 34-channel, 350MHz logic analyzer
  • 20MHz/125Msamples/sec sine/square/ramp/triangle/dc/arb signal generator (14-bit resolution)
  • 5½-digit DMM with Vdc, Vac, Idc, Iac, resistance, continuity, and diode measurements
  • 3-channel programmable power supply (0-6V @ 3A, 0-25V @ 1A, 0 to -25V @ 1A)
  • 8 channels of digital I/O (5V-compatible input, 3.3V output)



VirtualBench instruments are soft—reprogrammable on both software and hardware levels thanks in part to the All Programmable Xilinx silicon used in the products’ design. The VB-8034 VirtualBench instrument incorporates both a Zynq Z-7020 SoC and a Kintex-7 FPGA in its design. (See “Dave Jones tears down an NI VB-8034 VirtualBench enhanced All-in-One instrument, finds Zynq SoC and Kintex-7 FPGA.”) The extreme reprogrammability of this design allows NI to upgrade and add features to the product at will. One such feature is the new Smart Capture.


Smart Capture allows you to set trigger events that tell the VirtualBench to automatically capture data on its DSO when a waveform is stable and a trigger event occurs. Bohls demonstrates how you might use such a feature in the above video.


More information about the VirtualBench 16.0 software upgrade is available on the NI Web site here.


For more insight into how NI’s engineers used Xilinx All Programmable devices to augment the VB-8034’s feature set, see “Want to know how NI implemented the FPGA-based digital phosphor in its new VirtualBench All-in-One Instrument? Here’s how.”




The $649 Avnet Zynq Transceiver Evaluation Kit (AZTEK) combines an Avnet PicoZed 7015 SoM (system on module), based on a Xilinx Zynq Z-7015 SoC, with a PicoZed FMC V2 Carrier Card. The result is a low-cost experimentation platform for harnessing high-speed, multi-Gbps transceivers to transmit large amounts of data at high communication rates or over long distances using either fiber or coax. The PicoZed Carrier Card breaks out the PicoZed SoM’s I/O including four high-speed transceivers, which terminate in a PCIe Gen2 x1 card-edge interface, an LPC FMC connector, an SFP+ cage, and a general-purpose SMA interface. Breaking out the Zynq SoC’s high-speed transceivers this way allows you to experiment and prototype with several different standard data-transmission protocols.







The $649 Avnet Zynq Transceiver Evaluation Kit (AZTEK)



So that you’re ready to go out of the box, the kit also contains appropriate software and a number of compatible high-speed transmission media including:





Today, Xilinx announced open order entry for production 16nm Xilinx UltraScale+ devices. Here’s the text of the official announcement:


“Xilinx, Inc. (NASDAQ:XLNX) today announced it has reached a significant production milestone for its 16nm UltraScale+™ portfolio ahead of schedule. Less than a year after first ship of all devices, open order entry for production devices is available this quarter. The Xilinx® UltraScale+ portfolio is the only FinFET-based programmable technology available at 14nm or 16nm in the industry. The portfolio includes Kintex®, Virtex® UltraScale+ FPGAs and Zynq® UltraScale+ MPSoCs.”


“This production milestone further stretches our greater than one year lead in 16nm product delivery," said Moshe Gavrielov, president & CEO at Xilinx. "I'm proud of Xilinx's three consecutive generations of development and operational excellence." 





Today, BittWare announced the first customer shipments of two PCIe boards based on 16nm Xilinx Virtex UltraScale+ FPGAs: the XUPP3R and the XUPPL4. The BittWare XUPP3R is a 3/4-length PCIe board based on a Xilinx Virtex UltraScale+ XCVU9P FPGA that supports PCIe Gen4 x8 or Gen3 x16 and offers four front panel QSFP28 cages, each supporting 4 lanes of up to 25 Gbps with support for 10/25/40/100 GbE. The board also incorporates four DIMM sockets that accommodate as much as 256Gbytes of DDR4 SDRAM or BittWare’s high-performance QDR DRAM DIMMs. For even higher memory performance, you can also order the board populated with an optional 2Gbyte Hybrid Memory Cube (HMC) module.



BittWare XUPP3R UltraScale Plus Board.jpg



BittWare’s XUPP3R PCIe board based on a Xilinx Virtex UltraScale+ XCVU9P FPGA




The XUPPL4 is a low-profile PCIe board based on a Xilinx Virtex UltraScale+ XCVU3P FPGA that supports PCIe Gen4 x8 or Gen3 x16 and offers two front panel QSFP28 cages, each supporting 10/25/40/100 GbE. The board accommodates as much as 32Gbytes of DDR4 SDRAM.



BittWare XUPPL4 UltraScale Plus Board.jpg



BittWare’s XUPPL4 PCIe board based on a Xilinx Virtex UltraScale+ XCVU3P FPGA




The CCIX Consortium, which is developing a specification for Cache Coherent Interconnect for Accelerators (CCIX), announced today that it has tripled in size to 22 member companies and that the Release1 specification covering the physical, data-link, and protocol layers is now available to the consortium’s members. The CCIX Consortium has chosen PCIe for the specification’s first transport layer, supporting several standard PCIe line rates (2.5, 8, and 16Gbps) with an additional high-speed 25Gbps option. However, the CCIX coherency protocol is actually agnostic of the link layer. At the existing PCIe line rates, this choice leverages the existing PCIe ecosystem including silicon, connectors, chip- and board-level design IP, and software. The faster 25Gbps rate will require something different.


CCIX simplifies the design of offload accelerators for hyperscale data centers by providing low-latency, high-bandwidth, fully coherent access to server memory. The specification employs a subset of full coherency protocols and is ISA-agnostic, meaning that the specification’s protocols are independent of the attached processors’ architecture and instruction sets. Full coherency is unique to the CCIX specification. It permits accelerators to cache processor memory and processors to cache accelerator memory.


The 22 member companies in the CCIX Consortium now include:


  • AMD *
  • Amphenol
  • ARM *
  • Arteris
  • Avery Design Systems
  • Broadcom Limited
  • Atos
  • Cadence Design Systems
  • Cavium
  • Huawei *
  • IBM *
  • Integrated Device Technology
  • Keysight Technologies
  • Mellanox Technologies *
  • Micron Technology
  • NetSpeed Systems
  • Qualcomm Technologies *
  • Red Hat
  • Synopsys
  • Teledyne LeCroy
  • TSMC
  • Xilinx *


Note: The * by the company name denotes a founding member.



CCIX is designed to provide coherent interconnection between server processors and hardware accelerators or memory, and among the hardware accelerators themselves, as shown below:



CCIX Configurations.jpg



Sample CCIX Configurations



Typical applications for such accelerated systems include:


  • In-memory database processing
  • Data-center search
  • Intelligent network acceleration
  • Machine/Deep Learning
  • High-performance computing/supercomputing
  • 4G and 5G base stations
  • Mobile edge computing
  • Video analytics
  • Embedded computing



Contact the CCIX Consortium for more information about joining and getting access to the Release1 specification.




The Xilinx SDSoC Development Environment allows you to use C, C++, or SystemC to create hardware-accelerated code on platforms based on the Xilinx Zynq-7000 SoC. Today, there are more than 25 boards and SoMs from Xilinx and third-party vendors listed as targets for SDSoC. Click here for the current list.


Also, see today’s blog post by Adam Taylor to see how he adapted SDSoC to the low-cost ZynqBerry from Trenz Electronic: “Adam Taylor’s MicroZed Chronicles Part 151: ZynqBerry & SDSoC.”



Trenz ZynqBerry Dev Board.jpg



Trenz Electronic ZynqBerry Dev Board






The well-studied N-Queens combinatorial puzzle yielded an answer for 26 queens in 2009 and has resisted further expansion to Q(27) for six years. (See “Solving the N-Queens Puzzle for 27 Queens using FPGAs.”) Now, a team led by Dr. Thomas B. Preusser at Technische Universität Dresden has cracked the puzzle for Q(27), again using the massive parallelism of FPGAs to reach an answer on September 19:



Q(27) = 234,907,967,154,122,528



Preusser’s team accomplished this feat by using Xilinx Vivado HL Design Suite tools to synthesize problem solvers that they instantiated into Xilinx 7 series FPGAs on the KC705 Kintex-7 FPGA Eval Kit and the VC707 Virtex-7 FPGA Eval Kit. (See the team’s GitHub page for details and code.)


Compared to previous synthesis results, the team found that the Vivado HL Design Suite was able to synthesize 67% more solvers running 7% faster than before for the KC705 board and 56% more solvers running 2% faster than before for the VC707 board. After synthesis, it still took “slightly more than a year” to compute a solution for Q(27). That’s an indication of just how difficult the calculation is.
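To make the counting problem concrete, here is a minimal Python sketch of a bitmask backtracking counter for small boards. It is not the Dresden team's FPGA solver design (their code is on the team's GitHub page); the function name count_queens and the row-by-row search are illustrative assumptions chosen for brevity.

```python
def count_queens(n: int) -> int:
    """Count the ways to place n non-attacking queens on an n x n board."""
    full = (1 << n) - 1  # one bit per column; all bits set means every row is filled

    def place(cols: int, diag_l: int, diag_r: int) -> int:
        # cols, diag_l, diag_r mark the columns attacked in the current row.
        if cols == full:
            return 1
        total = 0
        free = full & ~(cols | diag_l | diag_r)
        while free:
            bit = free & -free  # lowest free column
            free ^= bit
            total += place(cols | bit,
                           ((diag_l | bit) << 1) & full,
                           (diag_r | bit) >> 1)
        return total

    return place(0, 0, 0)


if __name__ == "__main__":
    for size in range(1, 9):
        print(size, count_queens(size))
```

For the standard 8x8 board, this sketch counts 92 solutions. The search space grows very quickly with board size, which is why a software approach like this one is only practical for small values of N.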








Adam Taylor’s MicroZed Chronicles Part 151: ZynqBerry & SDSoC

by Xilinx Employee ‎10-10-2016 10:12 AM - edited ‎10-11-2016 05:46 PM (1,656 Views)


By Adam Taylor


(Note: This is a re-written and improved version of the original blog post that appeared on October 10.)



We are currently looking at Trenz Electronic’s ZynqBerry, which is intended for the maker end of the market. With this blog, I am going to go right back to the beginning to demonstrate how we can create a “hello world” program for this board. However, I want to do this with a twist. I want to use SDSoC to create the “hello world” program. SDSoC allows people not familiar with FPGA design to benefit from the Zynq SoC’s PL (programmable logic) performance without the need to be FPGA experts.


The first thing we need to do is create an SDSoC platform. The platform tells SDSoC what resources are available to it within the Zynq SoC and it configures the processor in terms of peripherals and memory addresses.


To get started, we need to download the demo SDSoC application from the Trenz Wiki. I downloaded the file including the pre-built directory. With the Zip file downloaded, I extracted the files to a directory. (Note: Use a short directory name for the extraction so that the provided TCL scripts run properly.) The directory contains a number of folders and batch files and looks like this:



Image 1.jpg 



The first step in creating a platform is to define the hardware side of it. We start this by running the following scripts:


  • Run design_basic_settings.cmd
  • Run Vivado_create_project_guimode.cmd


To define an SDSoC Platform, we need to define both the software and hardware environments. Two XML files provide these definitions. One XML file describes the hardware (clocks, AXI Ports, and available interrupts) and the other file defines the software libraries.


To define the hardware platform, we need to use the Xilinx Vivado HL Design Suite, which is why we created the project using the cmd files. Trenz provides all of the information we need to define the software under the settings/SDSOC folder in an XML file called sdsoc_sw.pfm.


With the project open in Vivado, we need to run a few provided scripts using the TCL command line. Enter and execute the following:



  • TE::ADV::beta_util_sdsoc_project -check_only
  • TE::ADV::beta_util_sdsoc_project


These commands create a new SDSoC folder in the directory’s top level. Under the SDSoC directory, you will see another directory with the project name; in this case it’s te0726_m_sdsoc. You will now find within this directory all of the files we need to create an SDSoC Platform.



Image 2.jpg 


Resultant SDSoC folder



Now it’s time to co-locate the files in a directory so that we can use it within SDSoC. For this example, because the platform is very simple and hence very flexible, I am going to create a generic platform within the SDSoC installation. This way we can reuse it for different projects on the same platform. The catch with this approach is that we cannot easily see the provided demo applications. I will address that next time.


We can define a new SDSoC platform by moving the directory te0726_m_sdsoc and its contents that we just created in Vivado under our <SDSOC Installation Path>/platforms/ as shown below.



Image 3.jpg 


With this complete, we are now ready to create our first application. For those who want to skip these steps and use the predefined SDSoC platform that I have created, you can download the zip file from my GitHub. Just unzip the file into your SDSoC platforms directory.


Now, when we fire up SDSoC we will be able to create an application running on our newly defined ZynqBerry platform.


Selecting File -> New -> SDSoC application will bring up the dialog box shown below. From the platform box, we should be able to select the platform that we just created. As this is just our first example, we will stick to using the bare-metal OS.



Image 4.jpg



Once we’ve created the project, you will see a project overview that defines the platform. We can use this overview to select the functions we want to accelerate.



Image 5.jpg



For now, we will write a simple program that loops “hello world.”


Create a C file under the source folder and add the code shown below (again, you can download this from GitHub if you want):



Image 6.jpg


The final step is to generate the boot files. To do this, click on Project -> Build Project. This may take a few minutes. The resulting programming files will be located under the SD Card directory within the debug or release folders (depending upon what you set in the SDSoC project overview) within your project.


This is where the ZynqBerry configuration differs a little from the approaches that we have used before. The ZynqBerry uses the Zynq Z-7010 SoC in the CLG225 footprint. In this package, it is not possible to boot directly from the SD Card. The ZynqBerry addresses this by using a QSPI Flash device that contains the FSBL (first-stage boot loader) bit file and, if you’re using Linux, a second-stage boot loader like U-Boot. The application, OS, and Linux file system sit on the SD Card, which is accessed by either the FSBL or the second-stage boot loader.


This means that for our example, we need to use the Flash programmer to program the boot.bin file onto the QSPI Flash memory. We cannot do this within SDSoC. However, we can use the Xilinx SDK in Vivado to do this. We need to use a hardware definition that’s available under the prebuilt\hardware\te0726_m directory.


We can import this definition within SDK using File -> New -> Other and selecting hardware platform specification in the dialog box:




Image 7.jpg 



Image 8.jpg 


We want to select the HDF file, which is the hardware definition file. Clicking on finish will import this file into the workspace.


From the Xilinx tools option within SDK, select program flash and ensure that the ZynqBerry is connected over the micro USB port:



Image 9.jpg 



For the image file, browse to the boot.bin file created by the SDSoC build. While you’re burning this file into the QSPI Flash memory, copy the other bin file (named <project>.elf.bin) to the SD Card.


Once the Flash programming is complete, turn off the ZynqBerry and insert the SD Card before repowering it. When I did this for the application in hand, my terminal window showed the following:



Image 10.jpg 


We now have a simple SDSoC platform that we can use with the ZynqBerry board, although we do need to re-program the QSPI Flash memory and the SD Card for each new application.




Code is available on GitHub as always.


If you want e-book or hardback versions of previous MicroZed Chronicles blogs, you can get them below.




  • First Year E Book here
  • First Year Hardback here.



Image 10.jpg 




  • Second Year E Book here
  • Second Year Hardback here




Image 12.jpg 




All of Adam Taylor’s MicroZed Chronicles are cataloged here.








OKI IDS in Japan is an advanced consulting and design house with expertise in many applications including the design of machine- and embedded-vision systems. This week at the Xilinx Embedded Software Community Conference held at Xilinx in San Jose, Wataru Takahashi from OKI IDS described the implementation of a real-time, object-detection system based on the Harris Corner algorithm. A customer had implemented a vision-processing application on a notebook PC connected to a USB Webcam and then asked OKI IDS to implement the same application on a Zynq-based Xilinx ZC706 Eval Kit connected to an HD video camera with the intent of developing a high-performance embedded-vision system.


The customer’s existing prototype application processed 640x480-pixel video in software running on a PC at approximately 10 frames/sec. The same code was recompiled and run on the Zynq SoC’s dual-core ARM Cortex-A9 MPCore processor on the ZC706 board. The resulting code processed 1280x720-pixel HD video at only 2-3 frames/sec—far too slow for commercial use. However, by analyzing the vision-processing sequence and by using the Xilinx SDSoC Development Environment to move critical vision-processing tasks (including the Harris Corner algorithm) into custom hardware implemented with the Zynq SoC’s programmable logic, OKI IDS chopped the per-frame processing time from 400msec to 78msec and boosted the processing rate to approximately 15 frames/sec, thus providing an acceptable proof of concept for the customer.


OKI IDS’ next step was to move the design from the Xilinx Zynq-7000 SoC to the faster Xilinx Zynq UltraScale+ MPSoC, which has more on-chip processing resources. To do so, OKI IDS switched from Xilinx SDSoC to the Xilinx Vivado HL Design Suite and Vivado HLS and used the AuvizCV vision libraries to implement the Harris Corner algorithm. The result: double the performance—30 frames/sec.
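For readers unfamiliar with the Harris Corner algorithm mentioned above, here is a minimal pure-Python sketch of the textbook corner-response calculation. It is not the OKI IDS or AuvizCV implementation. The function name harris_response, the unweighted 3x3 window, the central-difference gradients, and the constant k = 0.04 are all assumptions made for illustration; production implementations typically use Sobel gradients and Gaussian weighting.

```python
def harris_response(img, k=0.04):
    """Return a grid of Harris responses for a 2-D list of gray levels.

    Positive values suggest corners, negative values suggest edges,
    and values near zero suggest flat regions.
    """
    h, w = len(img), len(img[0])
    # Central-difference gradients; border pixels are left at zero.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0

    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Accumulate the structure tensor over an unweighted 3x3 window.
            sxx = sxy = syy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx = ix[y + dy][x + dx]
                    gy = iy[y + dy][x + dx]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            resp[y][x] = det - k * trace * trace
    return resp


if __name__ == "__main__":
    # Synthetic test image: a bright square in the lower-right quadrant.
    image = [[1.0 if (x >= 4 and y >= 4) else 0.0 for x in range(8)]
             for y in range(8)]
    r = harris_response(image)
    print("corner:", r[4][4], "edge:", r[6][4], "flat:", r[1][1])
```

Every pixel needs the same small, fixed set of multiply-accumulate operations on its neighborhood, which is the kind of regular, data-parallel workload that maps well onto programmable logic.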


Here’s a block diagram of the OKI IDS vision-processing algorithm adapted to the Zynq UltraScale+ MPSoC’s APU (application processing unit) and PL (programmable logic) on a Xilinx ZCU102 Eval Kit:



OKI IDS Harris Corner Zynq MPSoC demo.jpg 


OKI IDS Harris Corner Video-Processing Demo on a Zynq UltraScale+ MPSoC




You can now download and evaluate this demo from the OKI IDS Web site. Click here.


For additional information about this demo, see “Free Download! Zynq-based Harris Corner Detection Video Demo posted by OKI IDS.”





The krtkl Snickerdoodle dev board (“A palm-sized, reconfigurable Linux computer that connects to the real world: ARM + FPGA + Wi-Fi + Bluetooth + 154 I/O”) is a very successful, low-cost dev board for the Xilinx Zynq-7000 SoC that appeared a while ago on the CrowdSupply crowd-funding site. The $195 Snickerdoodle Black version, due to ship this month, is krtkl’s amped-up version with Dual-band 2.4GHz 802.11n 2x2 MIMO Wi-Fi, Bluetooth Classic & BLE, and a copperHead heat sink. Now there’s one more component that will ship with the Snickerdoodle Black board and it’s a big one: a license for the Xilinx SDSoC Development Environment that allows you to develop hardware-accelerated embedded designs for the Zynq SoC using C, C++, or SystemC. (See the announcement here.)


It all ships to you for $195.


Considering that the usual price for SDSoC is $995, that’s a real bargain.


Join the more than 3000 people who have ordered this cool dev board. Order yours here.


For more information about krtkl’s Snickerdoodle, see:









Krtkl’s Snickerdoodle Dev Board for the Xilinx Zynq-7000 SoC






MATRIX Labs bills the $99 MATRIX Creator dev board for the Raspberry Pi, listed on Indiegogo, as a “hardware bombshell.” A more precise description would be an FPGA-accelerated sensor hub sporting a massive array of on-board sensors. It’s a one-stop shop for prototyping IoT and industrial IoT (IIoT) devices using the Raspberry Pi board as a base.


Here’s a top-and-bottom photo of the board:


MATRIX Creator Dev Board.jpg


MATRIX Creator dev board for the Raspberry Pi



Note: That square hole in the center of the board allows the Raspberry Pi’s Camera Module to peek through.


Here’s a detailed list of the various components on the MATRIX Creator board and its capabilities:



MATRIX Creator Dev Board Components.jpg



MATRIX Creator dev board Components and Capabilities




Note that one of those components is a Xilinx Spartan-6 LX4 FPGA, which makes a very fine low-cost sensor hub capable of operating in real time. No doubt you’d like to see how the FPGA fits into this board. MATRIX Labs has that covered with this block diagram:



MATRIX Creator Dev Board Block Diagram.jpg



MATRIX Creator dev board Block Diagram




MATRIX Labs has also developed supporting tools and software for the MATRIX Creator dev board including MATRIX OS, MATRIX CV (a computer-vision library), and MATRIX CLI (a sensor-hub application). More software is already being developed.


Unlike many crowd-funded projects, MATRIX Creator is already shipping so you’re assured of getting a board, according to MATRIX Labs, but you only have two days left in the funding period. So check it out now.



MATRIX Creator: IoT Computer Vision Dev Board



This week, National Instruments (NI) announced a technology demonstration of a test system for 76-81GHz automotive radar, targeting ADAS (Advanced Driver Assistance Systems) applications. The system is based on the company’s mmWave front-end technology and its PXIe-5840 2nd-generation vector signal transceiver (VST), introduced earlier this year, which combines a 6.5GHz RF vector signal generator and a 6.5GHz vector signal analyzer in a 2-slot PXIe module. (See “NI launches 2nd-Gen 6.5GHz Vector Signal Transceiver with 5x the instantaneous bandwidth, FPGA programmability.”) The ADAS Test Solution combines NI’s banded, frequency-specific upconverters and downconverters for the 76–81GHz radar band with the 2nd-generation VST’s 1GHz of real-time bandwidth.


The PXIe-5840 VST gets its real-time signal-analysis capabilities from a Xilinx Virtex-7 690T FPGA.



NI PXIe-5840 2nd-generation VST.jpg



National Instruments PXIe-5840 2nd-generation vector signal transceiver (VST)




A novel development by Zhao Tian, Kevin Wright, and Xia Zhou at Dartmouth College encodes data streams on sparse, ultrashort pulses and transmits these pulses optically using low-cost, visible LED luminaires designed for room illumination. (See “The DarkLight Rises: Visible Light Communication in the Dark”) The optical light pulses are hundreds of nanoseconds long so they’re far too short—by four or five orders of magnitude—to be perceived as light by human vision. However, they’re long enough to be captured by an inexpensive photodiode and are therefore useful for digital communications—albeit slow communications, on the order of 1.6 to 1.8 kbits/sec. Nevertheless, that rate meets a number of low-speed communications needs including many IoT and industrial IoT requirements.


Signal encoding employs OPPM (overlapping pulse position modulation), implemented in a Xilinx Artix-7 A35T FPGA on a $149 Digilent Basys 3 FPGA Trainer Board.
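As a rough illustration of the idea behind OPPM, the Python sketch below encodes a symbol as the start position of a pulse that spans several time slots, so adjacent pulse positions overlap. This is a generic, textbook-style formulation, not the Dartmouth team's FPGA design. The function names and the slot and width values in the demo are hypothetical and are not taken from the DarkLight paper.

```python
from typing import List


def oppm_positions(n_slots: int, width: int) -> int:
    """Number of distinct pulse start positions in one symbol frame."""
    return n_slots - width + 1


def oppm_bits_per_symbol(n_slots: int, width: int) -> int:
    """Whole bits that fit in one frame (floor of log2 of the positions)."""
    return oppm_positions(n_slots, width).bit_length() - 1


def oppm_encode(symbol: int, n_slots: int, width: int) -> List[int]:
    """Return a frame of 0/1 slots with one pulse starting at slot `symbol`."""
    if not 0 <= symbol < oppm_positions(n_slots, width):
        raise ValueError("symbol does not fit in this frame")
    frame = [0] * n_slots
    for slot in range(symbol, symbol + width):
        frame[slot] = 1
    return frame


def oppm_decode(frame: List[int]) -> int:
    """Recover the symbol as the index of the first lit slot."""
    return frame.index(1)


if __name__ == "__main__":
    # Hypothetical frame: 8 slots, pulse 3 slots wide.
    for sym in range(oppm_positions(8, 3)):
        frame = oppm_encode(sym, 8, 3)
        print(sym, frame, oppm_decode(frame))
```

With non-overlapping pulses of the same width, an 8-slot frame would hold only two positions. Allowing overlap raises that to six in this hypothetical setup, which is how the overlapping variant carries more bits per frame without shortening the pulse.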



Digilent Basys 3 Artix-7 FPGA Trainer Board.jpg


Digilent Basys 3 Artix-7 FPGA Trainer Board



Here’s a short 1-minute video giving you an ultrashort overview of the project:





About the Author
  • Be sure to join the Xilinx LinkedIn group to get an update for every new Xcell Daily post! Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.