


Last week, the Mycroft Mark II Privacy-Centric Open Voice Assistant Kickstarter Project, which is based on Aaware’s far-field Sound Capture Platform and the Xilinx Zynq UltraScale+ MPSoC, hit 300% funding on Kickstarter. Today, the pledge level hit 400%—$200k—with 1235 backers. There are still 18 days left in the funding campaign; still time for you to get in on this very interesting, multi-talented smart speaker and low-cost, open-source Zynq UltraScale+ MPSoC development platform.




Mycroft Mark II Smart Speaker.jpg 




Meanwhile, there seems to be a new pledge level that I don’t recall: a $179 level that includes a 1080p video camera. That’s in addition to the touch screen and voice input, which gives the Mycroft Mark II an even more interesting user interface. There are only a limited number of $179 pledge options, with 177 remaining as of the posting of this blog.


In addition, Fast Company has published an article on the Mycroft Mark II Kickstarter project titled “Can Mycroft’s Privacy-Centric Voice Assistant Take On Alexa And Google?” Be sure to take a look.



For more information about the Mycroft Mark II Open Voice Assistant, see:






For more information about Aaware’s far-field Sound Capture Platform, see:





There is no PCIe Gen5—yet—but there’s a 32Gbps/lane future out there and TE Connectivity demonstrated that future at this week’s DesignCon 2018. The demo’s real purpose was to show the capabilities of TE Connectivity’s Sliver connector system which includes card-edge and cabled connectors. In the demo at DesignCon, four channels carry 32Gbps data streams through surface-mount and right-angle connectors to create a mockup of a future removable-storage device. Those 32Gbps data streams are generated, transmitted, and received by bulletproof Xilinx UltraScale+ GTY transceivers operating reliably at a theoretical PCIe Gen5’s 32Gbps/lane data rate despite 35dB of loss through the demo system.
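To put the 35dB loss figure in perspective, here is a rough sketch that converts it to linear ratios and totals the four-lane throughput. It assumes the 35dB figure is an ordinary channel insertion loss expressed with the usual 20·log10 voltage convention; the function names are illustrative and are not taken from the demo.

```python
# Convert a channel loss in dB to linear ratios (standard dB definitions).
def db_to_amplitude_ratio(loss_db: float) -> float:
    """Voltage (amplitude) attenuation factor for a loss given in dB."""
    return 10 ** (loss_db / 20)


def db_to_power_ratio(loss_db: float) -> float:
    """Power attenuation factor for a loss given in dB."""
    return 10 ** (loss_db / 10)


LOSS_DB = 35.0         # loss through the demo system, from the post
LANES = 4              # channels in the demo, from the post
RATE_GBPS = 32.0       # per-lane data rate, from the post

amplitude_ratio = db_to_amplitude_ratio(LOSS_DB)
power_ratio = db_to_power_ratio(LOSS_DB)
aggregate_gbps = LANES * RATE_GBPS

print(f"Amplitude attenuated by a factor of about {amplitude_ratio:.1f}")
print(f"Power attenuated by a factor of about {power_ratio:.0f}")
print(f"Aggregate demo throughput: {aggregate_gbps:.0f} Gbps")
```

Under that assumption, the receiver sees a signal whose amplitude is roughly 56 times smaller than what the transmitter launched, which is why transceiver equalization matters at these rates.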


Here’s a 1-minute video of the demo:







Xilinx demonstrated its 56Gbps PAM4 SerDes test chip nearly a year ago (see “3 Eyes are Better than One for 56Gbps PAM4 Communications: Xilinx silicon goes 56Gbps for future Ethernet”) and this week at DesignCon 2018 in Santa Clara, Samtec used that chip to demo its high-speed ExaMAX backplane connectors working on an actual backplane. The demo setup included a Xilinx board with the PAM4 test chip driving a pair of coaxial cables connected to a paddle card plugged into one end of a backplane, which was populated with 14 ExaMAX connectors. A second paddle card at the other end received the PAM4 signals and conveyed them back to the Xilinx board via a second set of coaxial cables. Here’s a photo of the setup:



Samtec ExaMAX PAM4 56Gbps backplane demo.jpg 




In the following short video, Samtec’s Ralph Page describes the demo and mentions the nice eyes and clear data levels, as seen on the Xilinx demo software screen positioned above the demo boards. He also mentions the BER: 5.29×10⁻⁸. That’s the error rate before adding the error-reducing capabilities of a FEC, which can drop the error rate by perhaps another ten orders of magnitude or more.
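A quick back-of-the-envelope sketch shows what that BER means in practice. It assumes the full 56Gbps is the raw bit rate, that bit errors are independent, and that the FEC improvement is exactly ten orders of magnitude (the post says "perhaps"); the helper names are illustrative only.

```python
LINE_RATE_BPS = 56e9        # PAM4 line rate from the post
PRE_FEC_BER = 5.29e-8       # BER quoted in the video
FEC_GAIN_ORDERS = 10        # assumption: exactly ten orders of magnitude


def errors_per_second(ber: float, rate_bps: float) -> float:
    """Average number of bit errors per second at a given BER and bit rate."""
    return ber * rate_bps


def mean_seconds_between_errors(ber: float, rate_bps: float) -> float:
    """Average time between bit errors, in seconds."""
    return 1.0 / (ber * rate_bps)


pre_fec_errors = errors_per_second(PRE_FEC_BER, LINE_RATE_BPS)
post_fec_ber = PRE_FEC_BER / 10 ** FEC_GAIN_ORDERS
post_fec_gap_days = mean_seconds_between_errors(post_fec_ber, LINE_RATE_BPS) / 86400

print(f"Pre-FEC: about {pre_fec_errors:.0f} bit errors per second")
print(f"Post-FEC BER (assumed): {post_fec_ber:.2e}")
print(f"Post-FEC: about one bit error every {post_fec_gap_days:.0f} days")
```

Under these assumptions, the link goes from roughly 3,000 bit errors per second before FEC to about one error every 39 days after it.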


Samtec’s demo points to a foreseeable future where you will be able to develop large backplanes with screamingly fast performance using PAM4 SerDes transceivers.



Here’s the 46-second demo video:






The 2-minute video below shows you an operational Xilinx Virtex UltraScale+ XCVU37P FPGA, which is enhanced with co-packaged HBM (high-bandwidth memory) DRAM using Xilinx’s well-proven, 3rd-generation 3D manufacturing process. (Xilinx started shipping 3D FPGAs way back in 2011, starting with the Virtex-7 2000T and we’ve been shipping these types of devices ever since.)


This video was made on the very first day of silicon bringup for the device and it is already operating at full speed (460Gbytes/sec), error-free, over 32 channels. The Virtex UltraScale+ XCVU37P is one big All Programmable device with:


  • 2852K System Logic Cells
  • 9Mbits of BRAM
  • 270Mbits of UltraRAM
  • 9024 DSP48E2 slices
  • 8Gbytes of integrated HBM DRAM
  • 96 32.75Gbps GTY SerDes transceivers
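The figures above invite some simple arithmetic. The sketch below divides the quoted 460Gbytes/sec HBM bandwidth across the 32 channels mentioned in the post and totals the raw line rate of the 96 GTY transceivers. It assumes bandwidth is spread evenly across channels and counts one direction only; the function names are illustrative.

```python
HBM_BANDWIDTH_GBYTES_PER_S = 460.0   # full-speed HBM bandwidth, from the post
HBM_CHANNELS = 32                    # channels exercised in the video
GTY_COUNT = 96                       # GTY transceivers on the XCVU37P
GTY_RATE_GBPS = 32.75                # per-transceiver line rate


def per_channel_bandwidth(total: float, channels: int) -> float:
    """Bandwidth per channel, assuming an even split across channels."""
    return total / channels


def aggregate_serdes_gbps(count: int, rate_gbps: float) -> float:
    """Raw aggregate SerDes line rate in one direction."""
    return count * rate_gbps


hbm_per_channel = per_channel_bandwidth(HBM_BANDWIDTH_GBYTES_PER_S, HBM_CHANNELS)
serdes_total = aggregate_serdes_gbps(GTY_COUNT, GTY_RATE_GBPS)

print(f"HBM bandwidth per channel: {hbm_per_channel} Gbytes/sec")
print(f"Aggregate GTY line rate (one direction): {serdes_total} Gbps")
```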



Whatever your requirements, whatever your application, chances are this extremely powerful FPGA will deliver all of the heavy lifting (processing, memory, and I/O) that you need.


Here’s the video:






For more information about the Virtex UltraScale+ HBM-enhanced device family, see “Xilinx Virtex UltraScale+ FPGAs incorporate 32 or 64Gbits of HBM, delivers 20x more memory bandwidth than DDR.”





Mycroft AI’s Mycroft Mark II Open Voice Assistant, which is based on Aaware’s far-field Sound Capture Platform and the Xilinx Zynq UltraScale+ MPSoC, is a Kickstarter project initiated last Friday. (See “New Kickstarter Project: The Mycroft Mark II open-source Voice Assistant is based on Aaware’s Sound Capture Platform running on a Zynq UltraScale+ MPSoC.”) The Mycroft Mark II project was fully funded in an astonishingly short seven hours, guaranteeing that the project would proceed. After only four days, the project has reached 300% of its $50,000 pledge goal. As of this writing, 935 backers have pledged $150,801 so the project is most definitely a “go” and the project team is currently developing stretch goals to extend the project’s scope.


Here are two reasons you might want to participate in this Kickstarter campaign:


  • The Mycroft Mark II is a hands-free, privacy-oriented, open-source smart speaker with a touch screen. It has advanced far-field voice recognition and multiple wake words for voice-based cloud services such as Amazon’s Alexa and Google Home, courtesy of Aaware’s technology. (See “Looking to turbocharge Amazon’s Alexa or Google Home? Aaware’s Zynq-based kit is the tool you need.”) The finished smart speaker requires a pledge of $129 (or $299 for three of them) but the dev kit version of the Mycroft Mark II requires a pledge of only $99, which is cheap as dev kits go. (Note: there are only 88 of these kits left, as of this writing.)


  • You could look at the Mycroft Mark II as a general-purpose, $99 Zynq UltraScale+ MPSoC open-source dev kit with a touch screen that’s also been enabled for voice control, which you can use as a platform for a variety of IIoT, cloud computing, or embedded projects. That in itself is a very attractive offer. As the Mycroft Mark II Kickstarter project page says: “The Mark II has special features that make hacking and customizing easy, not to mention thorough documentation and a community to lean on when building. Support for our community is central to the Mycroft mission.” That’s a lot for a sub-$100 dev kit, don’t you think?



Mycroft Mark II Smart Speaker Xray Diagram.jpg 


Mycroft Mark II Voice Assistant Xray Diagram




Xcell Daily has covered the FPGA-accelerated AWS EC2 F1 instances from Amazon Web Services several times. The AWS EC2 F1 instances allow AWS customers to develop accelerated code in C, C++, OpenCL, Verilog, or VHDL and run it on Amazon servers augmented with hardware accelerator cards based on multiple Xilinx Virtex UltraScale+ VU9P FPGAs. (See below.)


A new AWS case study titled “Xilinx Speeds Testing Time, Increases Developer Productivity Using AWS” turns the tables. It discusses Xilinx’s use of AWS services to speed development of Xilinx development software such as the Vivado and SDx development environments. Xilinx employs extensive regression testing when developing new releases of these complex tools and the resulting demand spikes called for more “elastic” server resources. (Amazon’s “EC2” designation stands for “Elastic Compute Cloud.”)


As the case study states:



“Xilinx addressed its infrastructure-scaling problem by migrating to a high-performance computing (HPC) cluster running on Amazon Web Services (AWS). ‘We evaluated several cloud providers and chose AWS because it had the best tools and most mature solution,’” says [Ambs] Kesavan, [software engineering and DevOps director at Xilinx].





For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:












Everspin’s nvNITRO NVMe Storage Accelerator is a persistent-memory PCIe storage card for cloud and data-center applications that delivers up to 1.46 million IOPS for random 4Kbyte mixed 70/30 read/write operations. It’s based on Everspin’s STT-MRAM (spin-transfer torque magnetic RAM) chips and uses a Xilinx Kintex UltraScale KU060 FPGA to implement the MRAM controller and the board’s PCIe Gen3 x8 host interface. Everspin has just published an nvNITRO application note titled “Accelerating Fintech Applications with Lossless and Ultra-Low Latency Synchronous Logging using nvNITRO” that details the use of the nvNITRO Storage Accelerator to speed cloud-based financial transactions. The application note explores how Everspin nvNITRO technology can improve FinTech (Financial Technology) performance without creating additional compliance risks.
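To relate the IOPS figure to the card's host interface, the rough sketch below converts 1.46 million 4Kbyte operations per second into payload bandwidth and compares it with the raw capacity of a PCIe Gen3 x8 link. It assumes "4Kbyte" means 4096 bytes and ignores PCIe packet and protocol overhead beyond the 128b/130b line encoding, so the real headroom is smaller than shown. The names are illustrative.

```python
IOPS = 1.46e6                 # from the post
BLOCK_BYTES = 4096            # assumption: "4Kbyte" means 4096 bytes
GEN3_TRANSFERS_PER_S = 8e9    # PCIe Gen3 signals at 8 GT/s per lane
GEN3_ENCODING = 128 / 130     # PCIe Gen3 uses 128b/130b encoding
LANES = 8                     # x8 host interface, from the post


def payload_bytes_per_second(iops: float, block_bytes: int) -> float:
    """Payload bandwidth implied by an IOPS figure and a block size."""
    return iops * block_bytes


def pcie_gen3_bytes_per_second(lanes: int) -> float:
    """Raw one-direction Gen3 link bandwidth after line encoding only."""
    return lanes * GEN3_TRANSFERS_PER_S * GEN3_ENCODING / 8


payload = payload_bytes_per_second(IOPS, BLOCK_BYTES)
link = pcie_gen3_bytes_per_second(LANES)
utilization = payload / link

print(f"Payload bandwidth: {payload / 1e9:.2f} Gbytes/sec")
print(f"PCIe Gen3 x8 raw bandwidth: {link / 1e9:.2f} Gbytes/sec")
print(f"Share of raw link bandwidth: {utilization:.0%}")
```

Because the workload is a 70/30 read/write mix and PCIe is full duplex, the traffic is split across both directions of the link; the single-direction comparison here is deliberately conservative.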


If you haven’t looked deeply into the intricacies of financial trading transactions, the app note starts with a clarifying block diagram, which shows the multiple layers built into the transaction process:




Everspin Financial Transaction Diagram.jpg 




The diagram shows many opportunities for accelerating transactions, which is important because in this market, microseconds translate into millions of dollars gained or lost.


If you’re developing cloud-based systems and acceleration is important, whether or not you’re developing FinTech applications, take a few minutes to read the Everspin app note.




For more information about the Everspin nvNITRO Storage Accelerator, see “Everspin’s new MRAM-based nvNITRO NVMe card delivers Optane-crushing 1.46 million IOPS (4Kbyte, mixed 70/30 read/write).”



Please contact Everspin for more information about the nvNITRO Storage Accelerator.





Earlier this month at the Xilinx Developers Forum (XDF) in Frankfurt, Huawei’s Principal Hardware Architect Craig Davies gave a half-hour presentation about Huawei Cloud’s FaaS (FPGAs as a Service). His primary mission: to enlist new Huawei Cloud partners to expand the company’s FACS (FPGA Accelerated Cloud Server) FaaS ecosystem. (Huawei announced the FACS offering at HUAWEI CONNECT 2017 last September, see “Huawei bases new, accelerated cloud service and FPGA Accelerated Cloud Server on Xilinx Virtex UltraScale+ FPGAs.”)


Huawei’s FACS cloud offering is based on a PCIe server card that incorporates a Xilinx Virtex UltraScale+ VU9P FPGA. (Huawei also offers the board for on-premise installations.) In addition to the hardware, Huawei offers three major development tools for FACS:



  • An SDAccel-based shell that offers fast, easy development. SDAccel is Xilinx’s development environment for C, C++, and OpenCL. This shell also provides access to Xilinx’s Vivado development environment.


  • A DPDK shell for high-performance applications. Intel originally developed DPDK as a packet-processing framework for accelerated server systems and Huawei’s implementation can support throughputs as high as 12Gbytes/sec.



  • A Professional Simulation Platform that encapsulates more than two decades of Huawei’s FPGA development experience.



With these offerings, Davies said, Huawei is looking to add partners to expand its ecosystem and is particularly interested in talking to companies that offer:


  • Accelerator IP
  • Design services
  • Solution providers
  • Domain expertise


There’s a Huawei Cloud Marketplace that serves as an outlet for FACS applications. The company is also welcoming end users to try the service.


Here’s a video of Davies’ 32-minute presentation at XDF:







Amazon’s Senior Director of Business Development and Product, Gadi Hutt, gave an in-depth presentation at the recent Xilinx Developers Forum in Frankfurt, Germany where he detailed the specifics, advantages, and the nuts-and-bolts “how to” with respect to using the FPGA-based AWS EC2 F1 instances to accelerate your business.


First, Hutt gave one of the most succinct definitions of “the cloud” I’ve heard: “the on-demand delivery of compute, storage, networking, etc. services.” This definition is free of the niggling details such as hardware, networking, power, and cooling that you are now free to ignore.


Then Hutt listed the advantages of cloud-based services:


  • Agility and speed of innovation
  • Cost savings
  • Elasticity: scale up or down quickly, as needed
  • Breadth of functionality
  • Go global in minutes


From there, Hutt provided a deep explanation of the steps you need to take to distribute cloud-based services globally. He also quoted a Gartner estimate, which said that AWS (Amazon Web Services) has more compute capacity than all of the other cloud providers combined. Certainly, this Gartner report puts AWS far in the upper right corner of the Gartner Magic Quadrant for Cloud Infrastructure as a Service, Worldwide.


Using AWS allows your company to “get out of IT” and focus on providing specialized services where you can add value, said Hutt. “You can focus on your core business,” he continued.


Then he turned to the specifics of the AWS EC2 F1 instances, which are based on multiple Xilinx Virtex UltraScale+ VU9P FPGAs. Two of the many points Hutt made include:




  • SDAccel results are comparable to manually optimized implementations using HDL



“There’s pretty good maturity in the software ecosystem, today,” Hutt observed.


One of Hutt’s conclusions with respect to AWS EC2 F1 instances:



“There’s a tremendous opportunity for FPGAs to shine in a number of areas.”



If you’re interested in FPGA-based cloud acceleration, here’s the 48-minute video with Gadi Hutt’s full presentation at XDF:







For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:











Earlier this month, Xilinx held a developer’s forum in Frankfurt, Germany and Xilinx’s Senior Director for Software and IP Ramine Roane discussed the growing role of Xilinx All Programmable devices in his opening remarks, which appear in a New Electronics article written by Neil Tyler titled “Resurgence of interest in FPGAs helped by new services via the Cloud.” Roane started by stating something that any design team already knows: CPU architectures are failing to meet the demand of increasing workloads because Dennard frequency and power scaling (often erroneously lumped into Moore’s Law, which is really about transistor and density scaling) essentially died several years ago after several decades of robust health. The current workaround, multicore architectures, rapidly hits its own limits in most embedded systems where there just aren’t enough tasks to distribute to dozens of processor cores.


The article then quotes Roane:



“There are too many transistors switching at the same time and current leakage at lower geometries is hitting power constraint limits, and this is all happening at a time when workload demand is growing exponentially both in the Cloud and at the edge.”



One solution, hardware application accelerators, only makes sense if the production volumes are justified. For that, you need a killer app, said Roane.



Problem: there just aren’t that many killer apps.



The current situation plays to the strengths of Xilinx All Programmable devices, which can be reconfigured for a truly wide range of applications. “They provide configurable processor sub-systems and hardware that can be reconfigured dynamically,” said Roane.


The problem, of course, is that taking advantage of the programmable hardware resources in Xilinx devices has not been as easy as it might be. In the past, you needed specialized hardware-design skills; you needed to know Verilog or VHDL; you needed to wade into possibly unfamiliar hardware waters.


Roane emphasized that things are very different today. As the article states, “Xilinx and its growing ecosystem of partners are now delivering a much richer development stack so that hardware, embedded and application software developers can program them more easily by using higher level programming options, like C, C++ and OpenCL.”



“We are now able to deliver a development stack that designers are increasingly familiar with and which is also available on the Cloud via secure cloud services platforms,” added Roane, referring to Xilinx-based cloud acceleration offerings from Amazon Web Services (AWS EC2 F1 instances) and Alibaba Cloud.





For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:










For more information about the Xilinx-based Alibaba Cloud F2 offering in Xcell Daily, see:





Javier Alejandro Varela and Professor Dr-Ing Norbert Wehn of the University of Kaiserslautern’s Microelectronic Systems Design Research Group have just published a White Paper titled “Running Financial Risk Management Applications on FPGA in the Amazon Cloud” and the last sentence in the White Paper’s abstract reads:



“…our FPGA implementation achieves a 10x speedup on the compute intensive part of the code, compared to an optimized parallel implementation on multicore CPU, and it delivers a 3.5x speedup at application level for the given setup.”



The University of Kaiserslautern’s Microelectronic Systems Design Research Group has been working on accelerating financial applications using FPGAs in connection with high-performance computing systems since 2010 and that research has recently migrated to cloud-based computing systems including Amazon’s EC2 F1 Instance, which is based on Xilinx Virtex UltraScale+ FPGAs. The results in this White Paper are based on using OpenCL code and the Xilinx SDAccel development environment.
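The gap between the 10x kernel speedup and the 3.5x application-level speedup quoted in the abstract is what Amdahl's law would predict. The sketch below is an illustration only: it assumes a simple Amdahl model in which everything outside the accelerated code (including any data movement) runs at its original speed. The White Paper may account for the difference differently, and the function names are hypothetical.

```python
def amdahl_speedup(fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when `fraction` of runtime is sped up by `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


def accelerated_fraction(app_speedup: float, kernel_speedup: float) -> float:
    """Invert Amdahl's law: runtime fraction implied by the two speedups."""
    return (1.0 - 1.0 / app_speedup) / (1.0 - 1.0 / kernel_speedup)


KERNEL_SPEEDUP = 10.0   # compute-intensive part, from the abstract
APP_SPEEDUP = 3.5       # application level, from the abstract

fraction = accelerated_fraction(APP_SPEEDUP, KERNEL_SPEEDUP)
ceiling = 1.0 / (1.0 - fraction)   # limit as the kernel speedup grows without bound

print(f"Implied accelerated share of original runtime: {fraction:.1%}")
print(f"Check: Amdahl speedup at 10x kernel = {amdahl_speedup(fraction, KERNEL_SPEEDUP):.2f}x")
print(f"Application-level ceiling under this model: {ceiling:.2f}x")
```

Under this model, the accelerated code accounts for a little under 80% of the original runtime, and even an infinitely fast kernel would cap the application-level speedup at just under 5x.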




For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:







Looking to turbocharge Amazon’s Alexa or Google Home? Aaware’s Zynq-based kit is the tool you need.

by Xilinx Employee ‎01-04-2018 02:13 PM - edited ‎01-05-2018 12:34 PM (24,550 Views)


How do you get reliable, far-field voice recognition; robust, directional voice recognition in the presence of strong background noise; and multiple wake words for voice-based cloud services such as Amazon’s Alexa and Google Home? Aaware has an answer with its $199, Zynq-based Far-Field Development Platform. (See “13 MEMS microphones plus a Zynq SoC gives services like Amazon’s Alexa and Google Home far-field voice recognition clarity.”) A new Powered by Xilinx Demo Shorts video gives you additional info and another demo. (That’s a Zynq-based krtkl snickerdoodle processing board in the video.)





Looking for a quick explanation of the FPGA-accelerated AWS EC2 F1 instance? Here’s a 3-minute video

by Xilinx Employee ‎12-19-2017 10:45 AM - edited ‎12-19-2017 10:49 AM (26,176 Views)


The AWS EC2 F1 compute instance allows you to create custom hardware accelerators for your application using cloud-based server hardware that incorporates multiple Xilinx Virtex UltraScale+ VU9P FPGAs. Several companies now list applications for FPGA-accelerated AWS EC2 F1 instances in the AWS Marketplace in application categories including:



  • Video processing
  • Data analytics
  • Genomics
  • Machine Learning



Here’s a 3-minute video overview recorded at the recent SC17 conference in Denver:







High-frequency trading is all about speed, which explains why Aldec’s new reconfigurable HES-HPC-HFT-XCVU9P PCIe card for high-frequency trading (HFT) apps is powered by a Xilinx Virtex UltraScale+ VU9P FPGA. That’s about as fast as you can get with any sort of reprogrammable or reconfigurable technology. The Virtex UltraScale+ FPGA directly connects to all of the board’s critical, high-speed interface ports—Ethernet, QSFP, and PCIe x16—and implements the communications protocols for those standard interfaces as well as the memory control and interface for the board’s three QDR-II+ memory modules. Consequently, there’s no time-consuming chip-to-chip interconnection. Picoseconds count in HFT applications, so the FPGA’s ability to implement all of the card’s logic is a real competitive advantage for Aldec. The new FPGA accelerator is extremely useful for implementing time-sensitive trading strategies such as Market Making, Statistical Arbitrage, and Algorithmic Trading and is compatible with 1U and larger trading systems.



Aldec HES-HPC-HFT-XCVU9P PCIe card .jpg 



Aldec’s HES-HPC-HFT-XCVU9P PCIe card for high-frequency trading apps—Powered by a Xilinx Virtex UltraScale+ FPGA





Here’s a block diagram of the board:




Aldec HES-HPC-HFT-XCVU9P PCIe card block diagram.jpg



Aldec’s HES-HPC-HFT-XCVU9P PCIe card block diagram




Please contact Aldec directly for more information about the HES-HPC-HFT-XCVU9P PCIe card.




An article titled “Living on the Edge” by Farhad Fallah, one of Aldec’s Application Engineers, on the New Electronics Web site recently caught my eye because it succinctly describes why FPGAs are so darn useful for many high-performance, edge-computing applications. Here’s an example from the article:


“The benefits of Cloud Computing are many-fold… However, there are a few disadvantages to the cloud too, the biggest of which is that no provider can guarantee 100% availability.”


There’s always going to be some delay when you ship data to the cloud for processing. You will need to wait for the answer. The article continues:


“Edge processing needs to be high-performance and in this respect an FPGA can perform several different tasks in parallel.”


The article then continues to describe a 4-camera ADAS demo based on Aldec’s TySOM-2-7Z100 prototyping board that was shown at this year’s Embedded Vision Summit held in Santa Clara, California. (The TySOM-2-7Z100 proto board is based on the Xilinx Zynq Z-7100 SoC—the largest member of the Zynq SoC family.)





Aldec TySOM-2-Z100 Prototyping Board.jpg 


Aldec’s TySOM-2-7Z100 prototyping board




Then the article describes the significant performance boost that the Zynq SoC’s FPGA fabric provides:


“The processing was shared between a dual-core ARM Cortex-A9 processor and FPGA logic (both of which reside within the Zynq device) and began with frame grabbing images from the cameras and applying an edge detection algorithm (‘edge’ here in the sense of physical edges, such as objects, lane markings etc.). This is a computational-intensive task because of the pixel-level computations being applied (i.e. more than 2 million pixels). To perform this task on the ARM CPU a frame rate of only 3 per second could have been realized, whereas in the FPGA 27.5 fps was achieved.”


That’s nearly a 10x performance boost thanks to the on-chip FPGA fabric. Could your application benefit similarly?





Eideticom’s NoLoad (NVMe Offload) platform uses FPGA-based acceleration on PCIe FPGA cards and in cloud-based FPGA servers to provide storage and compute acceleration through standardized NVMe and NVMe over Fabrics protocols. The NoLoad product itself is a set of IP that implements the NoLoad accelerator. The company is offering Hardware Eval Kits that target FPGA-based PCIe cards from Nallatech (the 250S FlashGT+ Card, based on a Xilinx Kintex UltraScale+ KU15P FPGA) and the Alpha Data ADM-PCIE-9V3, which is based on a Xilinx Virtex UltraScale+ VU3P FPGA.


The NoLoad platform allows networked systems to share FPGA acceleration resources across the network fabric. For example, Eideticom offers an FPGA-accelerated Reed-Solomon Erasure Coding engine that can supply codes to any storage facility on the network.


Here’s a 6-minute video that explains the Eideticom NoLoad offering with a demo from the Xilinx booth at the recent SC17 conference:






For more information about the Nallatech 250S+ SSD accelerator, see “Nallatech 250S+ SSD accelerator boosts storage speed of four M.2 NVMe drives using Kintex UltraScale+ FPGA.”



For more information about the Alpha Data ADM-PCIE-9V3, see “Blazing new Alpha Data PCIe Accelerator card sports Virtex UltraScale+ VU3P FPGA, 4x 100GbE ports, 16Gbytes of DDR4 SDRAM.”


There was a live AWS EC2 F1 application-acceleration Developer’s Workshop during Amazon’s re:Invent 2017 last month. If you couldn’t make it, don’t worry. It’s now online and you can run through it in about two hours (I’m told). This workshop teaches you how to develop accelerated applications using the AWS F1 OpenCL flow and the Xilinx SDAccel development environment for the AWS EC2 F1 platform, which uses Xilinx Virtex UltraScale+ FPGAs as high-performance hardware accelerators.


The architecture of the AWS EC2 F1 platform looks like this:



AWS EC2 F1 Architecture.jpg 


AWS EC2 F1 Architecture




This developer workshop is divided into four modules. Amazon recommends that you complete each module before proceeding to the next.


  1. Connecting to your F1 instance 
    You will start an EC2 F1 instance based on the FPGA developer AMI and connect to it using a remote desktop client. Once connected, you will confirm you can execute a simple application on F1.
  2. Experiencing F1 acceleration 
    AWS F1 instances are ideal to accelerate complex workloads. In this module you will experience the potential of F1 by using FFmpeg to run both a software implementation and an F1-optimized implementation of an H.265/HEVC encoder.
  3. Developing and optimizing F1 applications with SDAccel 
    You will use the SDAccel development environment to create, profile and optimize an F1 accelerator. The workshop focuses on the Inverse Discrete Cosine Transform (IDCT), a compute intensive function used at the heart of all video codecs.
  4. Wrap-up and next steps 
    Explore next steps to continue your F1 experience after the re:Invent 2017 Developer Workshop.
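For readers unfamiliar with the IDCT mentioned in module 3, here is a minimal software reference of the textbook transform in plain Python. It is a sketch for orientation only and is not the workshop's code, which targets the FPGA through SDAccel. It assumes the orthonormal DCT-II/DCT-III pair and a separable row-then-column 2-D transform; all function names are illustrative.

```python
import math


def _scale(k: int, n: int) -> float:
    """Orthonormal scaling factor for coefficient index k of an n-point transform."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(samples):
    """Forward orthonormal DCT-II, included only to demonstrate the round trip."""
    n = len(samples)
    return [
        _scale(k, n) * sum(
            samples[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        for k in range(n)
    ]


def idct_1d(coeffs):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(
            _scale(k, n) * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]


def idct_2d(block):
    """Separable 2-D IDCT: transform every row, then every column."""
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# A DC-only 8x8 coefficient block decodes to a flat block of pixels.
dc_block = [[0.0] * 8 for _ in range(8)]
dc_block[0][0] = 800.0
pixels = idct_2d(dc_block)

# Round trip on a single row of samples.
row = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
recovered = idct_1d(dct_1d(row))
print([round(value, 6) for value in recovered])
```

Each output sample is a sum of N multiply-accumulate operations and every sample is independent of the others, which is the kind of regular, parallel arithmetic that maps well onto FPGA fabric.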



Access the online AWS EC2 F1 Developer’s Workshop here.



For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:









Accolade’s new ANIC-200Kq Flow Classification and Filtering Adapter brings packet processing, storage optimization, and scalable Flow Classification at 100GbE through two QSFP28 optical cages. Like the company’s ANIC-200Ku Lossless Packet Capture adapter introduced last year, the ANIC-200Kq board is based on a Xilinx UltraScale FPGA so it’s able to run a variety of line-speed packet-processing algorithms including the company’s new “Flow Shunting” feature.




Accolade ANIC-200Kq Flow Classification and Filtering Adapter.jpg 


Closeup view of the QSFP28 ports on Accolade’s ANIC-200Kq Flow Classification and Filtering Adapter




The new ANIC-200Kq adapter differs from the older ANIC-200Ku adapter in its optical I/O ports. The ANIC-200Kq adapter incorporates two QSFP28 optical cages and the ANIC-200Ku adapter incorporates two CFP2 cages. Both the QSFP28 and CFP2 interfaces accept SR4 and LR4 modules. The QSFP28 optical cages put Accolade’s ANIC-200Kq adapter squarely in the 25, 40, and 100GbE arenas, providing data center architects with additional architectural flexibility when designing their optical networks. For this reason, QSFP28 is fast becoming the universal form factor for new data center installations.



For more information in Xcell Daily about Accolade’s fast Flow Classification and Filtering Adapters, see:







The upcoming Xilinx Developer Forum in Frankfurt, Germany on January 9 will feature a hands-on Developer Lab titled “Accelerating Applications with FPGAs on AWS.” During this afternoon session, you’ll gain valuable hands-on experience with the FPGA-accelerated AWS EC2 F1 instance and hear from a special guest speaker from Amazon Web Services. Attendance is limited on a first-come, first-served basis, so you must register here.



For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:










Netcope’s NP4, a cloud-based programming tool, allows you to specify networking behavior using declarations written in the P4 network-specific, high-level programming language for the company’s high-performance, programmable Smart NICs based on Xilinx Virtex UltraScale+ and Virtex-7 FPGAs. The programming process involves the following steps:


  1. Write the P4 code.
  2. Upload your code to the NP4 cloud.
  3. Wait for the application to autonomously translate your P4 code into VHDL and synthesize the FPGA configuration.
  4. Download the firmware bitstream and upload it to the FPGA on your Netcope NIC.


Netcope calls NP4 its “Firmware as a Service” offering. If you are interested in trying NP4, you can request free trial access to the cloud service here.



Netcope NFB-200G2QL Programmable NIC.jpg


Netcope Technologies’ NFB-200G2QL 200G Ethernet Smart NIC based on a Virtex UltraScale+ FPGA




For more information about Netcope and P4 in Xcell Daily, see:




For more information about Netcope’s FPGA-based NICs in Xcell Daily, see:








Karl Freund’s article titled “Amazon AWS And Xilinx: A Progress Report” appeared on Forbes.com today. Freund is a Moor Insights & Strategy Senior Analyst for Deep Learning and High-Performance Computing (HPC). He describes Amazon’s FPGA-based AWS EC2 F1 instance offering this way:



“…the cloud leader [Amazon] is laying the foundation to simplify FPGA adoption by creating a marketplace for accelerated applications built on Xilinx [Virtex UltraScale+] FPGAs.”



Freund then discusses what’s happened since Amazon announced its AWS EC2 F1 instance a year ago. Here are his seven highlights:


  1. “AWS has now deployed the F1 instances to four regions, with more to come…”

  2. “To support the Asian markets, where AWS has limited presence, Xilinx has won over support from the Alibaba and Huawei cloud operations.” (Well, that one’s not really about Amazon, but let’s keep it in anyway, shall we?)

  3. “Xilinx has launched a global developer outreach program, and has already trained over 1,000 developers [on the use of AWS EC2 F1] at three Xilinx Developer Forums—with more to come.”

  4. “Xilinx has recently released a Machine Learning (ML) Amazon Machine Instance (AMI), bringing the Xilinx Reconfigurable Acceleration Stack (announced last year) for ML Inference to the AWS cloud.”

  5. “Xilinx partner Edico Genome recently achieved a Guinness World Record for decoding human genomes, analyzing 1000 full human genomes on 1000 F1 instances in 2 hours, 25 minutes; a remarkable 100-fold improvement in performance…”

  6. “AWS has added support for Xilinx SDAccel programming environment to all AWS regions for solution developers…”

  7. “Xilinx partner Ryft has built an impressive analytic platform on F1, enabling near-real-time analytics by eliminating data preparation bottlenecks…”



The rest of Freund’s article discusses Ryft’s AWS Marketplace offering in more detail and concludes with this:



“…at least for now, Amazon.com, Huawei, Alibaba, Baidu, and Tencent have all voted for Xilinx.”




For extensive Xcell Daily coverage about the AWS EC2 F1 instance, see:















Titan IC’s newest addition to the AWS Marketplace based on the FPGA-accelerated AWS EC2 F1 instance is the Hyperion F1 10G RegEx File Scan, a high-performance file-search and file-scanning application that can process 1Tbyte of data with as many as 800,000 user-defined regular expressions in less than 15 minutes. The Hyperion F1 10G RegEx File Scan application leverages the processing power of the AWS EC2 F1 instance’s multiple Xilinx Virtex UltraScale+ VU9P FPGAs to speed the scanning of files using complex pattern and string matching, attaining a throughput as high as 10Gbps.
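The two headline figures are consistent with each other, as a quick calculation shows. The sketch below assumes a decimal terabyte (10^12 bytes) and a sustained 10Gbps scan rate with no other overhead; the function name is illustrative.

```python
DATA_BYTES = 1e12          # assumption: 1Tbyte taken as 10^12 bytes
THROUGHPUT_BPS = 10e9      # peak scan throughput, from the post


def scan_time_seconds(data_bytes: float, throughput_bps: float) -> float:
    """Time to stream a data set through the scanner at a sustained bit rate."""
    return data_bytes * 8 / throughput_bps


seconds = scan_time_seconds(DATA_BYTES, THROUGHPUT_BPS)
minutes = seconds / 60

print(f"Scan time at 10Gbps: {seconds:.0f} s ({minutes:.1f} minutes)")
```

At a sustained 10Gbps, 1Tbyte streams through in 800 seconds, or a little over 13 minutes, which fits within the quoted 15-minute figure.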


Here’s a block diagram showing the Hyperion F1 10G RegEx File Scan application running in an AWS EC2 f1.2xlarge instance:




Titan IC Hyperion F1 10G RegEx File Scan on AWS EC2 F1.jpg 




You can get more details about this application here in the AWS Marketplace.





For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:









This month at SC17 in Denver, Nallatech was showing its new 250S+ high-performance SSD-accelerator PCIe card, which uses a Xilinx Kintex UltraScale+ KU15P FPGA to implement an NVMe SSD controller/accelerator and the board’s PCIe Gen4 x8 interface. You can plug SSD cards or NVMe cables into the card’s four M.2 NVMe slots so you can control as many as four on- or off-board drives with one card. The card comes in 3.84Tbyte and 6.4Tbyte versions with on-board M.2 NVMe SSDs and can control a drive array as large as 25.6Tbytes using NVMe cables.



Nallatech 250S NVMe accelerator card.jpg 


Nallatech 250S+ NVMe SSD accelerator card based on a Xilinx Kintex UltraScale+ FPGA





Nallatech 250S NVMe accelerator card with NVMe cables.jpg 


Nallatech 250S+ NVMe SSD accelerator card with NVMe cables




Here are the card’s specs:



Nallatech 250S NVMe accelerator card specs.jpg 



And here’s a block diagram of the Nallatech 250S+ NVMe accelerator card:




Nallatech 250S NVMe accelerator card block diagram.jpg 



As you can see, the Kintex UltraScale+ FPGA implements the entire logic design on the card, driving the PCIe connector, managing the four attached NVMe SSDs, directly controlling and operating the card’s on-board DDR4-2400 SDRAM cache, and even implementing the card's JTAG interface.



For more information about the Nallatech 250S+ NVMe accelerator card, please contact Nallatech directly.





When Xcell Daily last looked at Netcope Technologies’ NFB-200G2QL FPGA-based 200G Ethernet Smart NIC with its cool NASA-scoop heat sink in August, it had broken industry records for 100GbE performance with a throughput of 148.8M packets/sec on DPDK (the Data Plane Development Kit)—the theoretical maximum for 64-byte packets over 100GbE. (See “Netcope breaks 100GbE record @ 148.8M packets/sec (the theoretical max) with NFB-100G2Q FPGA-based NIC, then goes faster at 200GbE.”) At the time, all Netcope would say was that the NFB-200G2QL PCIe card was “based on a Xilinx Virtex UltraScale+ FPGA.” Well, Netcope was at SC17 in Denver earlier this month and has been expanding the capabilities of the board. It’s now capable of sending or receiving packets at a 200Gbps line rate with zero packet loss, still using “the latest Xilinx FPGA chip Virtex UltraScale+,” which I was told at Netcope’s SC17 booth is a Xilinx Virtex UltraScale+ VU7P FPGA.
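The 148.8M packets/sec figure follows from standard Ethernet framing. Here is a minimal sketch of the arithmetic, assuming the usual per-frame wire overhead of an 8-byte preamble (including the start-of-frame delimiter) and a 12-byte inter-frame gap; the function and constant names are illustrative.

```python
PREAMBLE_BYTES = 8          # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle time between frames, in byte times


def max_packets_per_second(line_rate_bps: float, frame_bytes: int) -> float:
    """Theoretical packet rate for back-to-back frames of a given size."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return line_rate_bps / wire_bits


print(f"{max_packets_per_second(100e9, 64) / 1e6:.1f}M packets/sec")  # 148.8M
print(f"{max_packets_per_second(200e9, 64) / 1e6:.1f}M packets/sec")  # 297.6M
```

The second line shows the corresponding ceiling at a 200Gbps line rate for the same minimum-size packets.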




Netcope NFB-200G2QL Programmable NIC.jpg 



Netcope Technologies’ NFB-200G2QL 200G Ethernet Smart NIC based on a Virtex UltraScale+ FPGA




One trick to doing this: using two PCIe Gen3 x16 slots to get packets to/from the server CPU(s). Why two slots? Because Netcope discovered that its 200G Smart NIC PCIe card could transfer about 110Gbps worth of packets over one PCIe Gen3 x16 slot and the theoretical maximum traffic throughput for one such slot is 128Gbps. That means 200Gbps will not pass through the eye of this 1-slot needle. Hence the need for two PCIe slots, which will carry the 200Gbps worth of packets with a comfortable margin. Where’s that second PCIe Gen3 interface coming from? Over a cable attached to the Smart NIC board and implemented in the board’s very same Xilinx Virtex UltraScale+ VU7P FPGA, of course. The company has written a White Paper describing this technique titled “Overcoming the Bandwidth Limitations of PCI Express.”
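The slot arithmetic can be sketched in a few lines. The 110Gbps measured figure comes from the text above; the 8GT/s per-lane rate and the 128b/130b line coding are standard PCIe Gen3 parameters, and the helper names are hypothetical.

```python
import math

GEN3_GT_PER_SEC = 8.0            # raw transfer rate per PCIe Gen3 lane
ENCODING_EFFICIENCY = 128 / 130  # Gen3 uses 128b/130b line coding


def slot_raw_gbps(lanes: int) -> float:
    """Raw signaling rate of a Gen3 slot, before any overhead."""
    return GEN3_GT_PER_SEC * lanes


def slots_needed(target_gbps: float, per_slot_gbps: float) -> int:
    """Smallest number of slots whose combined rate meets the target."""
    return math.ceil(target_gbps / per_slot_gbps)


raw = slot_raw_gbps(16)
print(f"raw x16 rate: {raw:.0f} Gbps")                               # 128 Gbps
print(f"after line coding: {raw * ENCODING_EFFICIENCY:.1f} Gbps")    # 126.0 Gbps

MEASURED_PER_SLOT_GBPS = 110.0   # figure quoted for one Gen3 x16 slot
print(slots_needed(200.0, MEASURED_PER_SLOT_GBPS))                   # 2
print(f"margin: {2 * MEASURED_PER_SLOT_GBPS - 200.0:.0f} Gbps")      # 20 Gbps
```

Even the raw 128Gbps ceiling falls well short of 200Gbps, so no amount of tuning would let a single Gen3 x16 slot carry the full line rate.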


And yes, there’s a short video showing this Netcope sorcery as well:







In the short video below, Xilinx Product Marketing Manager Kamran Khan demonstrates GoogLeNet running at 10K images/sec on Amazon’s AWS EC2 F1 using eight Virtex UltraScale+ FPGAs in an f1.16xlarge configuration. The same video also shows open-source, deep-learning app DeepDetect running in real time, classifying images from a Webcam’s real-time video stream.





For more information about Amazon’s AWS EC2 F1 instance in Xcell Daily, see:









Last week at SC17 in Denver, BittWare announced its TeraBox 1432D 1U FPGA server box, a modified Dell PowerEdge C4130 with a new front panel that exposes 32 100GbE QSFP ports from as many as four of the company’s FPGA accelerator cards. (That’s a total front-panel I/O bandwidth of 3.2Tbps!) The new 1U box doubles the I/O rack density with respect to the company’s previous 4U offering.



BittWare TeraBox 1432D.jpg 



BittWare’s TeraBox 1432D 1U FPGA Server Box exposes 32 100GbE QSFP ports on its front panel





The TeraBox 1432D server box can be outfitted with four of the company’s XUPP3R boards, which are based on Xilinx Virtex UltraScale+ FPGAs (VU7P, VU9P, or VU11P) and can be fitted for eight QSFPs each. (That’s four QSFP cages on the board and four more QSFPs on a daughter card connected to the XUPP3R board via a cable to an FMC connector.) This configuration underscores the extreme I/O density and capability of Virtex UltraScale+ FPGAs.
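For readers who want to check the front-panel numbers, here is the arithmetic as a tiny sketch. All values come from the text; the variable names are illustrative.

```python
BOARDS_PER_BOX = 4        # XUPP3R boards in one TeraBox 1432D
QSFPS_ON_BOARD = 4        # QSFP cages on each board
QSFPS_ON_DAUGHTER = 4     # additional QSFPs on each board's daughter card
GBPS_PER_PORT = 100       # each QSFP port carries 100GbE

ports = BOARDS_PER_BOX * (QSFPS_ON_BOARD + QSFPS_ON_DAUGHTER)
total_tbps = ports * GBPS_PER_PORT / 1000

print(ports)                   # 32
print(f"{total_tbps} Tbps")    # 3.2 Tbps
```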




BittWare TeraBox 1432D Detail.jpg


BittWare TeraBox 1432D interior detail



The new BittWare TeraBox 1432D will be available Q1 2018 with the XUPP3R FPGA accelerator board. According to the announcement, BittWare will also release the Xilinx UltraScale+ VU13P-based XUPVV4 in 2018. This new board will also fit in the TeraBox 1432D.


Here’s a 3-minute video from SC17 with a walkthrough of the TeraBox 1432D 1U FPGA server box by BittWare's GM and VP of Network Products Craig Lund:







According to an announcement released today:


“Xilinx, Inc. (XLNX) and Huawei Technologies Co., Ltd. today jointly announced the North American debut of the Huawei FPGA Accelerated Cloud Server (FACS) platform at SC17. Powered by Xilinx high performance Virtex UltraScale+ FPGAs, the FACS platform is differentiated in the marketplace today.


“Launched at the Huawei Connect 2017 event, the Huawei Cloud provides FACS FP1 instances as part of its Elastic Compute Service. These instances enable users to develop, deploy, and publish new FPGA-based services and applications through easy-to-use development kits and cloud-based EDA verification services. Both expert hardware developers and high-level language users benefit from FP1 tailored instances suited to each development flow.


"...The FP1 demonstrations feature Xilinx technology which provides a 10-100x speed-up for compute intensive cloud applications such as data analytics, genomics, video processing, and machine learning. Huawei FP1 instances are equipped with up to eight Virtex UltraScale+ VU9P FPGAs and can be configured in a 300G mesh topology optimized for performance at scale."



Huawei’s FP1 FPGA accelerated cloud service is available on the Huawei Public Cloud today. To register for the public beta, click here.




One of the several demos in the Xilinx booth during this week’s SC17 conference in Denver was a working demo of the CCIX (Cache Coherent Interconnect for Accelerators) protocol, which simplifies the design of offload accelerators for hyperscale data centers by providing low-latency, high-bandwidth, fully coherent access to server memory. The demo shows L2 switching acceleration using an FPGA to offload a host processor. The CCIX protocol manages a hardware cache in the FPGA, which is coherently linked to the host processor’s memory. Cache updates take place in the background without software intervention through the CCIX protocol. If cache entries are invalidated in the host memory, the CCIX protocol automatically invalidates the corresponding cache entries in the FPGA’s memory.
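To make the invalidation behavior concrete, here is a toy software model of a cache that stays coherent with host memory. It is only a sketch of the general idea: CCIX does this in hardware, and the class names, the write-invalidate policy, and the sample L2 table entry are all assumptions made for illustration, not details of the CCIX specification or the Xilinx demo.

```python
class HostMemory:
    """Host-side table; notifies attached caches when an entry changes."""

    def __init__(self):
        self._data = {}
        self._caches = []

    def attach(self, cache):
        self._caches.append(cache)

    def read(self, addr):
        return self._data.get(addr)

    def write(self, addr, value):
        self._data[addr] = value
        # Coherence step: any cached copy is now stale, so invalidate it.
        for cache in self._caches:
            cache.invalidate(addr)


class AcceleratorCache:
    """Accelerator-side cache that fills on a miss and drops stale lines."""

    def __init__(self, memory):
        self._memory = memory
        self._lines = {}
        self.misses = 0
        memory.attach(self)

    def lookup(self, addr):
        if addr not in self._lines:
            self.misses += 1
            self._lines[addr] = self._memory.read(addr)
        return self._lines[addr]

    def invalidate(self, addr):
        self._lines.pop(addr, None)


memory = HostMemory()
cache = AcceleratorCache(memory)

mac = "aa:bb:cc:dd:ee:01"         # hypothetical L2 forwarding-table key
memory.write(mac, "port 3")
print(cache.lookup(mac))           # port 3 (miss, filled from host memory)
print(cache.lookup(mac))           # port 3 (hit, no host access)
memory.write(mac, "port 7")        # host update invalidates the cached line
print(cache.lookup(mac))           # port 7 (miss again, refetched)
print(cache.misses)                # 2
```

The accelerator-side code never polls the host or flushes anything explicitly, which is the "without software intervention" property described above.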


Senior Staff Design Engineer Sunita Jain gave Xcell Daily a 3-minute explanation of the demo, which shows a 4.5x improvement in packet transfers using CCIX versus software-controlled transfers:






There’s one thing to note about this demo. Although the CCIX standard calls for using the PCIe protocol as a transport layer at 25Gbps/lane, which is faster than PCIe Gen4, this demo only demonstrates the CCIX protocol and is using the significantly slower PCIe Gen1 for the transport layer.


For more information about the CCIX protocol as discussed in Xcell Daily, see:









This week at SC17 in Denver, Everspin was showing some impressive performance numbers for the MRAM-based nvNITRO NVMe Accelerator Card that the company introduced earlier this year. As discussed in a previous Xcell Daily blog post, the nvNITRO NVMe Accelerator Card is based on the company’s non-volatile ST-MRAM chips, and a Xilinx Kintex UltraScale KU060 FPGA implements the MRAM controller and the board’s PCIe Gen3 x8 host interface. (See “Everspin’s new MRAM-based nvNITRO NVMe card delivers Optane-crushing 1.46 million IOPS (4Kbyte, mixed 70/30 read/write).”)


The target application of interest at SC17 was high-frequency trading, where every microsecond you can shave off of system response times directly adds dollars to the bottom line, so the ROI on a product like the nvNITRO NVMe Accelerator Card that cuts transaction times is easy to calculate.




Everspin nvNITRO NVMe card.jpg 



Everspin MRAM-based nvNITRO NVMe Accelerator Card




It turns out that a common thread and one of the bottlenecks for high-frequency trading applications is the use of the Apache Log4j event-logging utility. Incoming packets arrive at a variable rate (the traffic is bursty), and the Log4j logging utility needs to keep up with the highest possible burst rates to ensure that every event is logged. Piping these events directly into SSD storage sets a low limit on the burst rate that a system can handle. Inserting an nvNITRO NVMe Accelerator Card as a circular buffer in series with the incoming event stream as shown below boosts Log4j performance by 9x.




Everspin nvNITRO Circular Buffer.jpg 
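The burst-absorbing role of a buffer placed ahead of slower storage can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of Everspin's card or of Log4j: a bounded FIFO stands in for the circular buffer, the backing store drains a fixed number of events per tick, and the traffic pattern and capacities are invented for the example.

```python
from collections import deque


def simulate(arrivals, drain_per_tick, capacity):
    """Return how many events are dropped for a given buffer capacity."""
    buffer = deque()
    dropped = 0
    for burst in arrivals:
        # Events arrive in a burst; anything that does not fit is lost.
        for event in range(burst):
            if len(buffer) < capacity:
                buffer.append(event)
            else:
                dropped += 1
        # The slower backing store drains a fixed number of events per tick.
        for _ in range(min(drain_per_tick, len(buffer))):
            buffer.popleft()
    return dropped


# Hypothetical bursty traffic: 8 events/tick on average, 40 at the peaks.
arrivals = [0, 0, 40, 0, 0, 0, 40, 0, 0, 0]

print(simulate(arrivals, drain_per_tick=10, capacity=10))  # 60 dropped
print(simulate(arrivals, drain_per_tick=10, capacity=64))  # 0 dropped
```

The drain rate is identical in both runs and exceeds the average arrival rate; only the buffer depth decides whether the peaks are logged or lost.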




Proof of efficacy appears in the chart below, which shows the much lower latency and much better determinism provided by the nvNITRO card:




Everspin nvNITRO Latency Reduction.jpg 




One more thing of note: As you can see from one of the labels on the board in the photo above, Everspin’s nvNITRO card is now available as Smart Modular Technologies’ MRAM NVM Express Accelerator Card. Click here for more information.




Ryft is one of several companies now offering FPGA-accelerated applications based on Amazon’s AWS EC2 F1 instance. Ryft was at SC17 in Denver this week with a sophisticated, cloud-based data analytics demo based on machine learning and deep learning that classified 50,000 images from one data file using a neural network, merged the classified image files with log data from another file to create a super metadata file, and then provided fast image retrieval using many criteria including image classification, a watch-list match (“look for a gun” or “look for a truck”), or geographic location using the Google Earth database. The entire demo made use of geographically separated servers containing the files used in conjunction with Amazon’s AWS Cloud. The point of this demo was to show Ryft’s ability to provide “FPGAs as a Service” (FaaS) in an easy-to-use manner using any neural network of your choice, any framework (Caffe, TensorFlow, MXNet), and the popular RESTful API.


This was a complex, live demo and it took Ryft’s VP of Products Bill Dentinger six minutes to walk me through the entire thing, even moving as quickly as possible. Here’s the 6-minute video of Bill giving a very clear explanation of the demo details:





Note: Ryft does a lot of work with US government agencies and as of November 15 (yesterday), Amazon’s AWS EC2 F1 instance based on Xilinx Virtex UltraScale+ FPGAs is available on GovCloud. (See “Amazon’s FPGA-accelerated AWS EC2 F1 instance now available on Amazon’s GovCloud—as of today.”)