This blog post is excerpted from the keynote presentation by Salil Raje, EVP and GM of the Xilinx Data Center Group, given March 24, 2021, at Xilinx Adapt: Data Center. To see Salil's keynote on-demand, along with a great slate of presentations by industry experts, you can register and view the content here.
Most of us are still meeting with our co-workers via online video conferencing after the paradigm shift caused by the COVID-19 pandemic. You probably don't think much about what it takes to stream all the content and feeds from your meetings. But if you're a data center operator, you probably haven't been getting a lot of sleep over the past year, worrying about how to handle the unprecedented pandemic-driven surge in video traffic.
Not only that, but data centers these days must handle an explosion of unstructured data from a broad range of workloads like video conferencing, streaming content, online gaming, and e-commerce. Many of these applications are very sensitive to latency and are also subject to ever-evolving standards for compression, encryption, and database architectures.
This has forced data centers to scale up their infrastructure to meet the performance and latency requirements of a variety of demanding workloads, while at the same time trying to minimize cost and power consumption. That’s proving to be very difficult, and it’s forcing data center operators to rethink their current architecture and explore new configurations that are inherently more scalable and efficient.
Currently, most data centers have racks with fixed sets of resources, combining SSDs, CPUs, and accelerators in a single server. While this ensures a high-bandwidth connection between compute and storage, it is very inefficient in terms of resource utilization, since every server carries a fixed ratio of storage to compute. Because workloads require different mixes of compute and storage, islands of unused resources are left stranded in each server.
A new architecture is emerging that promises to make a dramatic improvement in resource utilization. It's known as "composable infrastructure." Composable infrastructure entails decoupling resources from individual servers, pooling them together, and making them accessible from anywhere. Composable infrastructures enable the provisioning of workloads with just the right amount of resources, and rapid reconfiguration via software.
A composable architecture with pools of CPUs, SSDs, and accelerators that are networked together and controlled by a standards-based provisioning framework promises vastly improved data center resource efficiency. In such an architecture, different workloads might have different compute, storage, and acceleration requirements, and those resources will be assigned accordingly with no wasted hardware. That all sounds great in theory, but in practice, there is one big issue: latency.
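To make the utilization argument concrete, here is a minimal sketch of the pooling idea in Python. The pool names, capacities, and the `compose` helper are all hypothetical illustrations, not any real provisioning API: workloads with very different resource mixes draw from the same shared pools instead of each claiming a fixed-ratio server.

```python
from dataclasses import dataclass

@dataclass
class ResourcePool:
    """A disaggregated pool of one resource type (CPUs, SSDs, or accelerators)."""
    name: str
    capacity: int
    allocated: int = 0

    def claim(self, amount: int) -> bool:
        # Grant the request only if enough unallocated units remain.
        if self.allocated + amount > self.capacity:
            return False
        self.allocated += amount
        return True

def compose(pools: dict, request: dict) -> bool:
    """Carve out exactly the resources a workload asks for, from shared pools."""
    claimed = []
    for kind, amount in request.items():
        if pools[kind].claim(amount):
            claimed.append((kind, amount))
        else:
            # Roll back partial claims so no resources are leaked.
            for k, a in claimed:
                pools[k].allocated -= a
            return False
    return True

pools = {
    "cpu": ResourcePool("cpu", capacity=64),
    "ssd": ResourcePool("ssd", capacity=32),
    "fpga": ResourcePool("fpga", capacity=16),
}

# A compute-heavy workload and a storage-heavy workload draw different
# mixes from the same pools -- no fixed per-server ratio, so no stranded
# islands of unused hardware.
assert compose(pools, {"cpu": 16, "ssd": 2, "fpga": 8})
assert compose(pools, {"cpu": 4, "ssd": 20, "fpga": 0})
```

In a fixed-server design, the storage-heavy request above would strand unused CPUs in every box it touched; here both requests succeed against one shared inventory.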
The Latency Challenge
As you disaggregate resources and move them farther apart, you incur longer delays and reduced bandwidth due to the network traffic between CPUs and SSDs, or between CPUs and accelerators. Unless you have some way to reduce that traffic and interconnect the resources efficiently, disaggregation can be severely limiting. That's where FPGAs play three major roles in solving the latency challenge:
FPGAs act as adaptable accelerators that can be customized for each workload for maximum performance.
FPGAs can also bring compute closer to the data, thereby reducing latency and minimizing the required bandwidth.
The adaptable, intelligent fabric of FPGAs enables efficient pooling of resources without incurring excessive delays.
The first significant advantage for FPGA-based compute accelerators is dramatically improved performance for workloads that are in high demand these days. In video transcoding use cases for live streaming applications, FPGA solutions typically outperform x86 CPUs by 30x, which helps data center operators meet the huge increase in the number of simultaneous streams. Another example is in the critical field of genomic sequencing. A recent Xilinx genomics customer found that our FPGA-based accelerator delivered the answer 90 times faster than a CPU, helping medical researchers test DNA samples in a fraction of the time it once took.
Moving Compute Closer to Data
The second key advantage for FPGAs in a composable data center is the ability to bring adaptable compute close to the data, whether at rest or in motion. Xilinx FPGAs used in SmartSSD computational storage devices accelerate functions like high-speed search, parsing, compression, and encryption that are usually performed by a CPU. This not only frees the CPU for more complex tasks but also reduces the traffic between the CPU and the SSDs, thereby cutting down on bandwidth consumption and reducing latency.
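A toy model makes the bandwidth saving easy to see. The record sizes and selectivity below are made-up numbers for illustration, not SmartSSD measurements: when the filter runs where the data lives, only the matches cross the CPU-to-storage link.

```python
# Toy model of computational storage: run a filter where the data lives
# and ship only the matches, instead of shipping every record to the CPU.

records = [{"id": i, "flag": i % 100 == 0} for i in range(10_000)]
RECORD_BYTES = 64  # assumed fixed record size, purely for illustration

def host_side_filter(data):
    # Conventional path: every record crosses the CPU<->SSD link first,
    # then the CPU discards the ones that don't match.
    bytes_moved = len(data) * RECORD_BYTES
    return [r for r in data if r["flag"]], bytes_moved

def in_storage_filter(data):
    # Computational-storage path: the device filters locally, so only
    # matching records ever cross the link.
    matches = [r for r in data if r["flag"]]
    return matches, len(matches) * RECORD_BYTES

hits_host, bytes_host = host_side_filter(records)
hits_dev, bytes_dev = in_storage_filter(records)

assert hits_host == hits_dev            # same answer either way
print(bytes_host // bytes_dev)          # link traffic shrinks by the selectivity
```

With 1-in-100 selectivity, the link carries 100x less data; the same answer arrives with far less bandwidth consumed and less time spent waiting on the interconnect.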
Similarly, our FPGAs are now used in SmartNICs like our new Alveo SN1000 to accelerate data in motion with wire-speed packet processing, compression, and crypto services as well as the ability to adapt to custom switching requirements for a particular data center or customer.
When you combine an FPGA's adaptable compute acceleration with low-latency connectivity, you can go a step further in the composable data center. You can assign a compute-heavy workload to a cluster of accelerators that are interconnected by an adaptable intelligent fabric, creating a high-performance computer on demand.
Of course, none of this is possible if you can't program the compute accelerators, SmartSSDs, and SmartNICs with the optimal acceleration algorithms, and then provision them in the right numbers for each workload. For that task, we have built a comprehensive software stack that leverages domain-specific industry frameworks like TensorFlow and FFmpeg, which work in conjunction with our Vitis development platform. We also see a role for higher-level provisioning frameworks like Redfish to help with intelligent resource allocation.
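As a rough sketch of what Redfish-style provisioning looks like, the snippet below builds a request body for composing a system out of pooled resource blocks. The property names follow the Redfish ComputerSystem/ResourceBlock model, but the URIs and the exact payload shape are illustrative assumptions, not a drop-in request for any particular Redfish service.

```python
import json

def build_compose_request(name: str, resource_block_uris: list) -> dict:
    """Build a Redfish-style request body for a composed system.

    A Redfish composition request references the resource blocks (CPUs,
    drives, accelerators) that should be bound into one logical system.
    This payload is illustrative only; consult a real service's schema
    before use.
    """
    return {
        "Name": name,
        "Links": {
            "ResourceBlocks": [{"@odata.id": uri} for uri in resource_block_uris]
        },
    }

# Hypothetical resource-block URIs for a transcode node that pairs a CPU
# block with an FPGA accelerator block.
body = build_compose_request(
    "video-transcode-node",
    [
        "/redfish/v1/CompositionService/ResourceBlocks/CPU1",
        "/redfish/v1/CompositionService/ResourceBlocks/FPGA3",
    ],
)
print(json.dumps(body, indent=2))
```

In a live deployment this body would be POSTed to the service's systems collection; when the workload finishes, the composed system is deleted and its blocks return to the free pool.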
The Future is Now
The promise of the composable data center is an exciting change, and Xilinx devices and accelerator cards are key building blocks of this new, more efficient architecture. With rapid reconfigurability, low latency, and a flexible architecture that can adapt to changing workloads, Xilinx is well-positioned to be a major player in this evolution.