Today in China, we were honored to host 1,280 developers, customers and partners at XDF Beijing – the second Xilinx Developer Forum (XDF) for 2018. At the event, our CEO, Victor Peng, gave a keynote where we officially launched Versal and Alveo in China. In addition, executives from three of China’s most influential companies – Huawei, Alibaba and Inspur – took to the stage to disclose how they are leveraging Xilinx technology to improve and accelerate key data center workloads, like AI inference.
Huawei’s Xiaohua Zhang, vice president of IT Intelligent Computing Business, followed Victor’s keynote to discuss Huawei’s broad business scope – from phones to cloud computing to servers. Zhang explained that Huawei has a strategy to harmonize its online and offline platforms across its technologies by working with partners such as Xilinx to develop an ecosystem. As such, Huawei announced that it will now integrate and deploy the Xilinx Alveo accelerator card as part of its product portfolio. The goal is to create even more value for its expanding data center and AI business. And with 20 years of working together, Zhang said that he looks to Xilinx and Huawei to jointly help enable its unified ecosystem and provide comprehensive support to customers and partners in China.
Xiaohua Zhang, Huawei
Next to the stage was Jeff Zhang, FPGA research and development director at Aliyun, an Alibaba company. Zhang said that Aliyun believes that as cloud computing and AI move forward, FPGAs are uniquely positioned to enable greater capabilities and deliver greater performance and value than CPUs and GPUs. Aliyun then announced that based on their own conservative estimates, they believe that FPGAs have delivered a 40 percent improvement in total cost of ownership. Zhang repeated that this is a conservative number – 40 percent! He said that as far as he is concerned, there is no need to prove further the value of FPGAs. He then encouraged the audience to leverage the power of FPGAs as-a-service (FaaS) so that they can benefit from the capabilities of FPGAs like greater functionality, as well as application security that Aliyun has uniquely enabled, and see for themselves.
Jeff Zhang, Aliyun, an Alibaba company
Li Jin, vice president of Inspur Group, was the last to come to the stage. With 57 percent of the server market, Inspur is the largest server vendor in China. The company saw its sales increase by a whopping 600 percent in 2017 – the largest growth of any server vendor in the world. Jin noted that Inspur servers running Xilinx FPGAs provide the agility and efficiency that are uniquely suited to its AI customers. He stated that the industry needs better AI frameworks, performance and data pathways, and that Inspur sees the value in enabling migration to FPGAs to meet the AI challenge of the future. As such, Inspur announced that it is qualifying the Alveo U200 and U250 accelerator cards for key server platforms, including the Inspur NF5280M5 general purpose server, the Inspur NF5468M5 AI server and the GX4 supercomputer extension box, to deliver the performance and agility that its customers need.
Li Jin, Inspur Group
All in all, it was a fascinating series of presentations and a peek into how some of the largest data center companies and luminaries in China are taking advantage of the power of Xilinx. The materials from XDF Beijing are available on our Xilinx.cn website.
By Daniel Eaton, Xilinx Senior Manager, Market Development, Accelerated Computing
At XDF Beijing today, we were thrilled that Amazon Web Services (AWS) announced that Amazon EC2 F1 FPGA instances are now available for preview in the AWS China (Beijing) region, operated by Sinnet. This news follows a similar disclosure made at XDF Silicon Valley announcing F1 expansion into Frankfurt, London, and Sydney. In the last year, customer demand has prompted the growth of F1 from a single region in the U.S. to eight regions worldwide.
AWS is the world’s largest cloud computing service provider, where individuals, companies and governments can access reliable, scalable and inexpensive cloud computing services, available throughout 18 geographic regions across the globe.
F1 is a class of instances powered by Xilinx FPGAs to create fast, custom hardware accelerations for workloads including data analytics, AI inference, video and image processing, genomics and more.
Additionally, at XDF Silicon Valley, Twitch, the largest and fastest growing live streaming video service in North America, announced that the AWS F1 server platform enabled them to build the industry’s first broadcast-quality, live streaming platform using the new VP9 video encoding format. F1 provided 30X greater performance than a CPU implementation, and Twitch found no GPU or ASIC solution that met its requirements.
AWS users can leverage pre-built F1 accelerated applications from the AWS Marketplace. Today there are 36 FPGA accelerated applications and libraries available for F1 (App directory). Software developers can use OpenCL and C/C++ to build custom accelerators (github), and hardware developers can leverage F1 to develop and deploy FPGA accelerations using Verilog and VHDL (github).
Want to learn more? For software and hardware developers interested in F1, check out the new self-paced tutorial just launched by Xilinx.
Gadi Hutt, senior director of business development and product for Amazon Web Services,
speaking at the AWS Global Summit in New York City.
At today’s Xilinx Developer Forum in San Jose, Calif., our CEO, Victor Peng, was joined by AMD CTO Mark Papermaster for a Guinness. But not the kind that comes in a pint – the kind that comes in a record book.*
The companies revealed that AMD and Xilinx have been jointly working to connect AMD EPYC CPUs and the new Xilinx Alveo line of acceleration cards for high-performance, real-time AI inference processing. To back it up, they revealed a world-record* inference throughput of 30,000 images per second!
The impressive system, which will be featured in the Alveo ecosystem zone at XDF today, leverages two AMD EPYC 7551 server CPUs with their industry-leading PCIe connectivity, along with eight of the freshly-announced Xilinx Alveo U250 acceleration cards. The inference performance is powered by Xilinx ML Suite, which allows developers to optimize and deploy accelerated inference and supports numerous machine learning frameworks such as TensorFlow. The benchmark was performed on GoogLeNet*, a widely used convolutional neural network.
AMD and Xilinx have shared a common vision around the evolution of computing to heterogeneous system architecture and have a long history of technical collaboration. Both companies have optimized drivers and tuned the performance for interoperability between AMD EPYC CPUs with Xilinx FPGAs. We are also collaborating with others in the industry on cache coherent interconnect for accelerators (the CCIX Consortium – pronounced “see-six”), focused on enabling cache coherency and shared memory across multiple processors.
AMD EPYC is the perfect CPU platform for accelerating artificial intelligence and high-performance computing workloads. With 32 cores, 64 threads, 8 memory channels with up to 2 TB of memory per socket, and 128 PCIe lanes coupled with the industry’s first hardware-embedded x86 server security solution, EPYC is designed to deliver the memory capacity, bandwidth, and processor cores to efficiently run memory-intensive workloads commonly seen with AI and HPC. With EPYC, customers can collect and analyze larger data sets much faster, helping them significantly accelerate complex problems.
Xilinx and AMD see a bright future in their technology collaboration. Our roadmaps are strongly aligned, pairing the high-performance AMD EPYC server and graphics processors with Xilinx acceleration platforms across the Alveo accelerator cards, as well as the forthcoming Versal portfolio.
So, raise a pint to the future of AI inference and innovation for heterogeneous computing platforms. And don’t forget to stop by and see the system in action in the Alveo ecosystem zone at the Fairmont hotel.
(Full disclosure – this has not yet been verified by Guinness themselves, but we hope to make a trip to Dublin soon!)
*running a batch size of 1 and Int8 precision.
This morning at the Xilinx Developer Forum (XDF) in San Jose, Calif., Victor Peng, our CEO, was joined on stage by Dr. Yueshi Shen, principal research engineer at Twitch. During the five-minute exchange, Twitch revealed that it has selected Xilinx FPGAs to enable the industry’s first broadcast-quality, live streaming platform using the new video-encoding format, VP9.
Twitch is the largest and fastest growing live streaming video platform in North America and the first to offer a free and interactive network for watching gaming and eSports content. Dr. Shen and his team were tasked with ensuring a great live streaming experience for the millions of viewers and publishers in the Twitch community. He said it is critical to deliver broadcast-quality video, with no buffering and super-low latency.
To meet the needs of Twitch’s demanding viewer base, Dr. Shen’s team implemented a Xilinx-powered solution using the new VP9 encoding standard developed with encoder IP from Xilinx application partner NGCodec. VP9 is an open source video coding format developed by Google. It was initially used to power YouTube, but it is gaining momentum due to it being royalty-free and offering users the ability to reduce streaming bit-rates while maintaining high visual quality. NGCodec’s VP9 encoder implementation offers users further value by speeding up the encoding process while maintaining the compression efficiency of other slower implementations such as LibVPX.
Dr. Shen said his team looked at many options, including CPUs, which couldn’t handle the 60 frames-per-second encoding requirement. In fact, server-class CPUs could only handle 4 frames per second. His team discovered there were not any GPU or ASIC solutions that would fit the bill either. However, the server implementation of AWS F1 FPGA instances, based on Xilinx UltraScale+ FPGAs, gave them the ability to implement a solution delivering an impressive 120 frames per second on a single FPGA, representing 30X greater performance than a CPU implementation.
Dr. Shen explained that his team was asked to do something innovative – something that hadn’t yet been done in the industry. He said that Twitch established a design goal to deliver a 25 percent or more reduction in bitrate while keeping the video at broadcast quality – which they did! All of this needed to be done in real time to support live broadcasting.
For Twitch, the experience needs to be immersive and interactive. The company’s viewership is growing very fast. Fortunately, Xilinx technology enables companies like Twitch to innovate faster, which is key to staying ahead of their growing workload.
Twitch has built out its data center to support more than the peak demand, reaching beyond three million simultaneous users requiring a whopping 18 terabytes per second – a massive bandwidth number, roughly the same as the bandwidth required to stream last summer’s World Cup!
Dr. Shen said that he and his company are very pleased with the results and wanted to share them with the tech community. "We strongly recommend the entire streaming industry to seriously consider VP9, FPGAs, and the AWS F1." Check out Twitch today, and appreciate the complexity of how this was all made possible.
Connect. Learn. Share. Those are three key activities of XDF. XDF stands for the Xilinx Developer Forum and we are thrilled to be holding our second annual XDF this coming October in San Jose, Calif., with two other regional XDFs following in Beijing and Frankfurt.
This year’s XDF will be bigger and better than ever before. It will be held at the beautiful Fairmont San Jose in the heart of Silicon Valley on Oct. 1 and 2 and is designed for those that have used Xilinx products before as well as those who are new or just curious about Xilinx’s adaptable and intelligent technology. XDF will offer more than 50 technical sessions, 20 hours of hands-on developer labs and tutorials, plus over 40 exhibitor demos.
Attendees can learn more about upcoming products and platforms like the forthcoming “Project Everest,” and hear directly from Xilinx’s president and CEO, Victor Peng, who will deliver the XDF keynote on day 2.
There will be hands-on developer labs for cloud and edge applications as well as tutorials and developer deep dives on a variety of products and technologies. We will cover edge-to-cloud machine learning tools, libraries and framework support including ML Suite and new edge and endpoint solutions from our new DeePhi team. We will also offer insight into embedded system software, edge software development, cloud software development, development through frameworks, and of course, hardware design.
If you’re looking to meet and connect with experts, check out the developer hangouts, hosted by Amazon Web Services and Avnet where you can spend time with cloud and edge developers with the knowhow to help you tackle your next design. Don’t forget to grab a snack while you’re there!
Featured speakers include Ford Research, who will share information about how it has developed technology for the car maker’s advanced driver assistance systems (ADAS) using SDSoC. Arm will discuss how it is bringing the benefits of Cortex-M processors to Xilinx developers. And SK Telecom (SKT) will explain how it deployed Xilinx FPGAs to support its automatic speech recognition (ASR) service running its Virtual Personal Assistant, NUGU. SKT achieved up to five times higher performance in automatic speech recognition applications compared to the service delivered by GPUs. More importantly, Xilinx technology delivered 16 times better performance-per-watt.
Check out our amazing line up and plan your visit by reviewing this agenda. Register to secure your spot at www.xilinx.com/xdf. It’s free to attend! And if you can’t make it, check out XDF Beijing on Oct. 16 and XDF Frankfurt on Dec. 10.
With the continuing advance of cloud computing, artificial intelligence, the Internet of Things and other drivers of computing demand, enterprises managing huge exabyte-scale data centers need to move more and more data faster and faster. One secret to handling that traffic growth is to bring new data acceleration capabilities closer to the data itself.
That’s the concept advanced by Manish Muthal, the Vice President of Data Center Marketing here at Xilinx, in a keynote address he delivered Aug. 8 at the 2018 Flash Memory Summit held at the Santa Clara Convention Center in Silicon Valley.
A new class of data acceleration platforms is needed to enable tomorrow's exabyte-scale data centers, Manish said in his address. He identified a number of “brick walls” that he said have stymied the performance of data delivery across networks over the years as the amount of data, the number of applications and the demand for faster speeds have grown exponentially.
Among the brick walls have been a limit on the number of individual transistors that a data center could support, the limitations of single-thread processor performance, limits on radio frequencies, the growing electrical power demands on a network and limitations on the number of cores that could be supported per machine.
As each of these brick walls arose, Manish said Xilinx overcame them by developing heterogeneous computing architectures that delivered data acceleration, such as its FPGAs, MPSoCs and its new Adaptive Compute Acceleration Platform (ACAP).
“We now need to move to a paradigm of heterogeneous computing architectures,” he said in his keynote. “These accelerators are going to play a key role to help us scale performance in a cost- and power-efficient manner.”
Moving compute acceleration closer to data is key to the success of emerging and next generation cloud data centers, he said. These accelerators will need to be easy to deploy and manage and must be highly adaptable to the ever-changing workloads within cloud environments.
Initially, heterogeneous computing was used to enable machine learning and training or traditional applications such as database management or video, Manish explained. But as demand for more compute capability grows, data centers need to offer “adaptable acceleration.” Besides database and video, adaptable acceleration is also needed to do big data analytics, financial risk modeling and genomics, as well as storage networking and security.
Xilinx took the opportunity at this year’s Flash Memory Summit to exhibit its next-generation flash storage solutions across ecosystems, partners, and customers.
View Manish’s full keynote
If you are interested in learning more about storage acceleration or any other type of Xilinx-powered technology, we encourage you to join us at the Xilinx Developer Forum coming up on Oct. 1 and 2, 2018, at the San Jose Fairmont in downtown San Jose. Hope to see you there!
Visit the upcoming Hot Chips 2018 industry conference in Silicon Valley, and you’ll see how dominant Xilinx is becoming in the market for high-performance chips and platforms.
At the conference, which runs Aug. 19-21 at the Flint Center for the Performing Arts in Cupertino, Calif., Xilinx will share details around a key element of our new adaptive compute acceleration platform (ACAP) and deliver three talks on artificial intelligence, including one from newly acquired DeePhi Tech. The program also includes an important keynote address by our CEO, Victor Peng. Peng’s topic, Adaptable Intelligence: the Next Computing Era, will reveal the next best thing in tech and Xilinx’s key role in delivering it. He’ll present this important message at 11:45 a.m. PT, on Tuesday, Aug. 21.
Earlier that same day, Juanjo Noguera, engineering director for the Xilinx Architecture Group, will introduce the audience to a key component of the company’s new ACAP platform, which was introduced only in March, and delivers a highly integrated multi-core compute system that can be programmed at the hardware and software levels to adapt to the needs of a wide range of applications and workloads.
Noguera’s talk is titled HW/SW Programmable Engine: Increased Compute Density Architecture for Project Everest. This will be a first peek at one of the novel heterogeneous components of the forthcoming Everest product family built on the new ACAP platform. It will provide orders of magnitude better performance and other improvements over what’s available now. His presentation will be delivered at 9:45 a.m. PT on Tuesday, Aug. 21.
In another Xilinx presentation, Rahul Nimaiyar, director of Data Center and IP Solutions, will describe the deep neural network (DNN) processor for Xilinx FPGAs that is currently available for use in the Amazon Web Services (AWS) F1 instance. His talk Xilinx Tensor Processor: An Inference Engine, Network Compiler + Runtime for Xilinx FPGAs, will be presented at 4 p.m. PT on Tuesday, Aug. 21.
Attendees can also expect a talk from the newest member of the Xilinx family, Beijing-based DeePhi Tech, which Xilinx acquired on July 18. Xilinx was impressed with DeePhi’s industry-leading capabilities in deep compression for machine learning, and system-level network optimization. DeePhi will give a presentation titled The Evolution of Deep Learning Accelerators Upon the Evolution of Deep Learning Algorithms, at 3:30 p.m. PT, also on Tuesday, Aug. 21.
Also at Hot Chips, Michaela Blott, principal engineer for Xilinx Research, will share her insights from the forefront of Xilinx research in a tutorial on architectures for deep neural nets called Deep Learning and Computer Architectures. This takes place at 2 p.m. on Sunday, Aug. 19.
By Willard Tu, Xilinx Senior Director, Automotive
While we haven’t always been loudly honking our own car horn as we should, Xilinx has a strong pedigree in automotive. For more than 12 years, we’ve shipped over 40 million cumulative units to automakers and Tier 1 automotive suppliers. In the majority of recent deployments, Xilinx devices are being used by Tier 1s to provide processing power for the camera and sensor systems they are developing for advanced driver-assistance systems (ADAS) and autonomous vehicles.
But why FPGA for these systems? We get that question often, so I’ve pulled together a list of five key reasons why automakers and Tier 1s are choosing Xilinx FPGA-based technology for their camera- and sensor-based driving systems.
1 - Ability to easily customize and differentiate – ASICs and GPUs are one-size-fits-all solutions. Because of the programmable nature of FPGAs, automakers can customize their chips to run proprietary image-processing algorithms, for example, enabling features that differentiate their models from the competition. Do you want a driver-monitoring camera that tracks both the driver’s eyes and head position? No problem. And what happens if your imager changes from 4MP to 8MP? Again, not a problem because you’ll be readily able to customize.
2 - Open-box architecture – Other market solutions are effectively a “black box,” which doesn’t enable a Tier 1 or OEM to know what other capabilities are inside of it. With Xilinx technology, automotive OEMs see exactly what they get, and can customize their system to meet changing regulatory conditions, compliance with functional safety standards, etc. Automotive OEMs tell us they need to know what they are getting. The black-box design keeps them in the dark on this.
3 - Ability to position anywhere – The FPGA architecture is inherently thermally efficient, enabling OEMs to locate the devices anywhere in or on the vehicle: inside of the ADAS central processing unit, inside of the car or even on the windshield. It doesn’t matter.
4 - Scalability – Since it is possible to readily re-program and add processing power to Xilinx’s SoCs, it is easy for a Tier 1 or OEM to scale their systems to meet needs for increased complexity, speed or capabilities. The architecture enables them to add more programmable fabric as demanded by applications.
5 - Adaptability – The automotive industry today is moving quickly and imposing ever-changing requirements on automakers. For example, the European NCAP (New Car Assessment Programme) organization sets standards for safety—encompassing features like lane-keeping and auto-braking—that are updated every few months. Traditional chip architectures take one to two years to design and get to market (and even that is aggressive). OEMs and Tier 1s can adjust Xilinx devices on the fly, to meet changing needs.
We are proud of our steady growth in the automotive market, which is the result of these unique benefits of our flexible, programmable FPGA and SoC technology.
But the differentiators don’t stop with the five listed above. Watch this space for an upcoming post where we share two exciting, emerging automotive applications that are possible only with FPGA.
A unique combination of benefits makes Xilinx devices increasingly the choice of automotive Tier 1 suppliers and OEMs.
Xilinx, as one of the creators of field-programmable gate array (FPGA) technology for integrated-circuit design, has long embraced high-level synthesis (HLS) as an automated design process that interprets a desired behavior in order to create hardware that delivers that behavior. Xilinx has just introduced a book that clearly explains the process of creating an optimized hardware design using HLS.
The book, “Parallel Programming for FPGAs,” by Stephen Neuendorffer, Principal Engineer at Xilinx, together with Ryan Kastner from UCSD and Janarbek Matai from Cognex, is a practical guide for anyone interested in building FPGA systems. It is of particular value to students in advanced undergraduate and graduate courses. But it can also be useful for system designers and embedded programmers already on the job.
The book assumes the reader has a working knowledge of C/C++ programming – which is like assuming someone knows how to drive a car with an automatic transmission – and assumes familiarity with other basic computer architecture concepts. The book also includes a significant amount of sample code. Any reader of the book is strongly encouraged to fire up Vivado HLS and try the sample code out for themselves. Free licenses are available through Vivado WebPack Edition, or a free 30-day trial of Vivado System Edition.
The book also includes several textbook-like features that make it particularly valuable in a classroom setting. For instance, it asks questions within each chapter that challenge readers and help solidify their understanding of the material as they read along. There are also associated projects that were developed and used in an HLS class taught at the University of California at San Diego (UCSD). UCSD will make the files for these projects available to instructors upon request. Each project is more or less associated with one chapter in the book and includes reference designs targeting FPGA boards that are distributed through the Xilinx University Program.
As you might expect, the complexity of each project increases as you read along, which means that the book is intended to be read sequentially. Using this approach, the reader can see, for example, how the optimizations of the HLS approach are directly applicable to a specific application. And each application further explains how to write HLS code. However, there are drawbacks to the teach-by-example approach. First off, most applications require some additional background to give the reader a better understanding of the computation being performed. Truly understanding the computation often requires an extensive discussion of the mathematical background of the application. That may be off-putting to a reader who just wants to understand the basics of HLS, but Neuendorffer believes that such a deep understanding is essential to mastering the code restructuring needed to achieve the best design.
Although the chapters in “Parallel Programming for FPGAs” are arranged to be read sequentially and grow in complexity as the reader moves along, a more advanced HLS user can read an individual chapter if he or she only cares to understand a particular application domain. For example, a reader interested in generating a hardware accelerated sorting engine can skip ahead to Chapter 10 without necessarily having to read all of the previous chapters.
Xilinx strongly embraces HLS as an effective design process for developing FPGA integrated circuits to build hardware that works smartly and effectively in the fields of automotive, aircraft, satellite and other emerging technology. “Parallel Programming for FPGAs” will be an effective and essential guide for developing such products going forward. Keep it within reach on the desk in your lab.
Matrix-vector multiplication architecture with a particular choice of array partitioning and pipelining.
The pipelining registers have been elided and the behavior is shown at right.
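To give a feel for what that caption describes, here is a minimal sketch of a matrix-vector multiply written for high-level synthesis. It is not taken from the book: the function name `matrix_vector`, the type alias `BaseType` and the size `SIZE = 8` are assumptions made for illustration. The `ARRAY_PARTITION` and `PIPELINE` directives are Vivado HLS pragmas. Which arrays to partition and where to pipeline is just one possible choice, and an ordinary C++ compiler simply ignores the pragmas (possibly with an unknown-pragma warning), so the function can be checked in software first.

```cpp
// Illustrative sketch only; names and sizes are assumptions, not the book's code.
constexpr int SIZE = 8;   // assumed matrix/vector dimension
using BaseType = int;     // assumed element type

// Computes V_Out = M * V_In for a SIZE x SIZE matrix.
void matrix_vector(BaseType M[SIZE][SIZE], BaseType V_In[SIZE], BaseType V_Out[SIZE]) {
  // Split each row of M and all of V_In into individual registers so that
  // every element needed for one dot product can be read in the same cycle.
#pragma HLS ARRAY_PARTITION variable=M dim=2 complete
#pragma HLS ARRAY_PARTITION variable=V_In complete

  for (int i = 0; i < SIZE; i++) {
    // Ask the tool to start a new row every clock cycle (initiation interval 1).
#pragma HLS PIPELINE II=1
    BaseType sum = 0;
    for (int j = 0; j < SIZE; j++) {
      sum += M[i][j] * V_In[j];   // one multiply-accumulate per column
    }
    V_Out[i] = sum;
  }
}
```

With the column dimension of `M` and all of `V_In` fully partitioned, the tool can fetch a whole row and the whole vector at once. Pipelining the outer loop causes Vivado HLS to unroll the inner loop, so the eight multiplies can run in parallel and, resources permitting, a new row can enter the pipeline each clock cycle.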
By Dale Hitt, Director of Strategic Market Development at Xilinx
We are honored that Xilinx ML Suite received the 2018 Vision Product of the Year award for the best cloud technology at the Embedded Vision Summit this week in Santa Clara, California.
Xilinx ML Suite enables developers to easily integrate accelerated machine learning (ML) inference into their current applications. What is particularly innovative about the Xilinx ML Suite is that cloud users of ML inference can easily achieve more than an order of magnitude better performance and cost savings over a CPU-based infrastructure without custom development.
Traditional datacenter processors have not been able to keep up with compute-intensive workloads running in today’s cloud, such as machine learning, genomics, and video transcoding. The ML Suite delivers a dramatic improvement in machine learning inference performance with uniquely adaptable Xilinx technology.
Xilinx ML Suite already works on major cloud platforms such as Amazon EC2 F1 in numerous regions in the US and Europe. It supports popular machine learning frameworks such as Caffe, MXNet, and TensorFlow, as well as Python and RESTful APIs. Applications that utilize the ML Suite can be deployed in both cloud and on-premises environments.
In sum, Xilinx ML Suite delivers low-latency, high-throughput, and power-efficient machine learning inference for real world applications.
Xilinx’s Nick Ni (right) accepts the Cloud Technology Vision Product of the Year Award
from Jeff Bier, founder of the Embedded Vision Alliance.
Photo courtesy of EVA.