We have entered the data-centric era.
This is an era in which organizations make decisions based on the real-time analysis of massive amounts of data. It is driven by the explosion of data being created all around us. Every day, more mobile and smart devices, more sensors, and more applications create volumes of data far beyond anything seen previously.
This means scientists can make new discoveries and cure diseases faster, law enforcement can better spot threats and keep us safer, and businesses can make better decisions and get products to market faster. In the data-centric era, organizations with the most data, and the ability to turn that data into actionable information, have a significant advantage over the competition.
In the data-centric era, the Internet of Things (IoT), log data, sensors, and more are creating previously unimagined volumes of data. New GPU-based processing methods break that data into parallel streams, enabling high-speed analytics and AI applications to identify patterns within it.
Collecting all this data, then processing and analyzing it, and finally storing it for future use is what powers the data-centric era. The biggest challenge now lies between the creation of that data and its conversion into actionable information.
There are two parts to this challenge. The first is collecting all the data being created: as more and more data is generated, it must be stored somewhere. Storage platforms in this new world need to ingest ever-growing volumes of data at ever-increasing speeds, making high write performance critical.
The need for high-speed ingest is problematic for traditional data storage platforms, which have long been designed for read performance. In fact, most storage vendors don’t even publish their write performance specifications. Given that write performance is often only 10-20% of read performance, who can blame them?
The second challenge comes after ingest: the same system must then deliver the data to compute systems fast enough to enable real-time processing and analysis.
Modern compute systems leverage the power of GPUs to process massive volumes of data in parallel, delivering faster time to results. To reach their potential, these systems need to be fed data fast enough to keep all of the GPU cores fully utilized; idle GPUs provide no benefit. This is why NVIDIA CTO Michael Kagan was quoted at SC20 as saying, “High performance computing requires high performance IO.”
High-performance IO means a data storage solution that can deliver both high throughput and ultra-low latency. Many storage systems can provide high throughput through sheer brute force: combine enough systems in a scale-out architecture and you can drive massive aggregate bandwidth. Doing so with a small, efficient footprint, however, is far more difficult.
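A back-of-envelope sizing sketch makes the GPU-feeding point concrete. All figures below (per-GPU ingest rate, per-node storage bandwidth) are hypothetical assumptions for illustration, not measurements from any particular system.

```python
# Hypothetical sizing: how many GPUs can a given storage system keep fed,
# and what GPU utilization results when the storage side is undersized?

def gpus_fed(storage_gbps: float, per_gpu_gbps: float) -> int:
    """Number of GPUs a given aggregate storage bandwidth can saturate."""
    return int(storage_gbps // per_gpu_gbps)

def gpu_utilization(storage_gbps: float, num_gpus: int, per_gpu_gbps: float) -> float:
    """Fraction of time the GPUs spend computing rather than waiting on IO."""
    demand = num_gpus * per_gpu_gbps
    return min(1.0, storage_gbps / demand)

# Assumed figures: each GPU consumes 8 GB/s; one storage node delivers 40 GB/s.
print(gpus_fed(40, 8))             # one node keeps 5 GPUs busy
print(gpu_utilization(40, 16, 8))  # 16 GPUs on one node sit mostly idle
```

The arithmetic is trivial, but it shows why brute-force scale-out works: doubling storage nodes doubles the GPUs you can feed, at the cost of footprint and efficiency.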
The need for low latency further complicates the challenge. Low-latency IO is the key to getting the most out of GPU-based computing platforms. To this end, NVIDIA created Magnum IO GPUDirect Storage (GDS), which uses RDMA to move data between fabric-attached storage and GPU memory with ultra-low latency. Traditional storage systems, designed for bandwidth rather than low latency, often struggle to meet this need.
The data-centric era is upon us. It is rife with both opportunities and challenges. Those who leverage the next generation of technologies to solve those challenges and harness the power of data will usher in the next great wave of innovation. Those who do not risk being left behind.
Read this article to discover how a compact, high-performing, ultra-low-latency storage platform can help you get ahead in the data-centric world.