Reported by Thomas Bollaert, Senior Director, Software Acceleration Tools
Abhishek Ranjan and Mohit Kumar from BigZetta Systems gave a very interesting presentation about the Apache Hadoop MapReduce framework and how they were able to accelerate it using Amazon Web Services (AWS) F1 instances.
MapReduce is at the core of almost all tools in the Hadoop ecosystem and is the preferred paradigm for solving big-data problems such as machine learning, distributed databases, analytics, image/video processing, and speech recognition. All of these applications leverage MapReduce. While several MapReduce frameworks exist, Hadoop is the number one choice because of its resilience, scalability, and stability. Companies like Netflix, LinkedIn, Uber, Pinterest, and Facebook rely on Hadoop for their big-data applications. According to Gartner, AWS has sold more Hadoop capacity and hosted more Hadoop instances than all other commercial players combined. Given all this, accelerating Hadoop on AWS looks like a great opportunity for a company like BigZetta.
Hadoop has two processing steps, map() and reduce(). Between these two steps, the Hadoop framework needs to sort, compress, and merge large quantities of data. BigZetta's analysis found that these intermediate steps can take longer than the map() and reduce() steps themselves. In other words, the framework itself can consume more CPU cycles than the actual computation. So BigZetta embarked on a mission to accelerate the core functions of the Hadoop framework.
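To make the split between user code and framework work concrete, here is a minimal single-process sketch of a word count in Python. It is not Hadoop code and not BigZetta's implementation. The function names (map_words, sort_run, merge_runs, reduce_counts, word_count) are hypothetical and chosen only for illustration, and the compression step is left out. It only shows where the sort and merge steps sit between map() and reduce().

```python
import heapq
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """map(): emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def sort_run(pairs):
    """Framework step: sort one mapper's output by key."""
    return sorted(pairs, key=itemgetter(0))


def merge_runs(runs):
    """Framework step: merge several sorted runs into one sorted stream."""
    return heapq.merge(*runs, key=itemgetter(0))


def reduce_counts(sorted_pairs):
    """reduce(): sum the counts for each distinct word."""
    return {
        word: sum(count for _, count in group)
        for word, group in groupby(sorted_pairs, key=itemgetter(0))
    }


def word_count(splits):
    """Run map -> sort -> merge -> reduce over a list of input splits."""
    runs = []
    for split in splits:
        pairs = [pair for line in split for pair in map_words(line)]
        runs.append(sort_run(pairs))
    return reduce_counts(merge_runs(runs))


if __name__ == "__main__":
    # Two hypothetical input splits, each handled by its own "mapper".
    splits = [["the quick brown fox", "the lazy dog"], ["the dog barks"]]
    print(word_count(splits))
```

In this toy version, map_words and reduce_counts are trivial, while sort_run and merge_runs touch every intermediate pair. That mirrors, in spirit, the observation that the framework steps can dominate the run time.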
SDAccel to Optimize Data Movement: 10X on Sort and 6X on Merge
BigZetta used SDAccel to write C models of the sort() and merge() functions. They were able to try hundreds of different architectures and configurations in a couple of weeks. This would have been impossible with a traditional RTL development flow.
BigZetta also used the analysis tools built into SDAccel to quickly optimize data movement in their application. They achieved a 10X speedup on sort and a 6X speedup on merge. Overall, their application runs 2.4X faster end-to-end on the WordCount benchmark.
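The gap between the per-function speedups and the end-to-end number is what Amdahl's law would predict: only the accelerated fraction of the run time shrinks. The sketch below computes an overall speedup from per-stage time fractions. The function name overall_speedup and the fractions in the example are hypothetical placeholders, not BigZetta's measured profile; only the 10X and 6X factors come from the presentation.

```python
def overall_speedup(fractions, speedups):
    """Amdahl's law for several stages.

    fractions: share of the original run time spent in each stage (sums to 1).
    speedups:  acceleration factor applied to each stage (1.0 = unchanged).
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    # New run time, relative to the original, is the sum of the shrunken stages.
    new_time = sum(f / s for f, s in zip(fractions, speedups))
    return 1.0 / new_time


if __name__ == "__main__":
    # Hypothetical profile: sort, merge, everything else (illustration only).
    fractions = [0.4, 0.3, 0.3]
    # 10X on sort and 6X on merge as reported; the rest is left unaccelerated.
    speedups = [10.0, 6.0, 1.0]
    print(f"{overall_speedup(fractions, speedups):.2f}X")
```

Whatever the real profile is, the unaccelerated share of the run time puts a ceiling on the end-to-end gain, which is why a 10X kernel does not translate into a 10X application.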
Their experience with SDAccel was very positive and was key in helping them achieve great results in a short amount of time.
What impressed me the most, however, is that BigZetta was incorporated in May 2018 and already has a working product today. In less than six months, they turned their initial idea into a working, sellable FPGA product on the AWS Marketplace. That’s a radical change from traditional FPGA-based product development cycles, and a great example of how FPGAs in the cloud are a game changer for many businesses. Innovators no longer need to worry about sourcing FPGAs, designing boards, getting them manufactured, managing inventory, etc. With the cloud, all of this is taken care of automatically. FPGAs are available on demand and scale at will.
With AWS, FPGAs are available on-demand. With SDAccel, FPGAs can easily be programmed from C/C++. That’s a winning combination that will allow many more innovators like BigZetta to take advantage of FPGAs in the cloud in the very near future.