Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) are changing how we work, play, and live. These transformative technologies will have a profound impact on every aspect of the world around us.
Accessing and ingesting massive volumes of data is the key to unlocking the potential of these technologies.
AI, ML, and DL work by analyzing vast amounts of data and identifying patterns that no human could see. The ability to analyze these mountains of data has been enabled by GPUs, which process data streams in parallel, making them far more efficient at this work than CPUs. What is now needed is a storage solution that can feed the GPUs with extremely high performance and ultra-low latency.
GPU-based systems, such as those from NVIDIA, typically come with NVMe SSDs that can deliver data to the GPUs at high speed. As direct-attached storage, these SSDs provide high throughput and ultra-low latency. The challenge with this model is capacity.
The value an AI model can deliver is directly proportional to the volume of data it can analyze in a given timeframe. The more data it can use, and the faster it can use it, the better the model will be. Internal NVMe storage checks the box for speed, but it cannot meet the capacity required to optimize outcomes: you can only put so many drives in a server. Only an external storage solution can scale to meet the capacity requirements of any dataset.
For AI initiatives to be successful, there is a need for an external storage solution that can start small and seamlessly grow to exascale capacity, while also providing the same performance as direct attached NVMe SSDs.
To match the performance of internal NVMe SSDs, an external storage system must deliver on two key metrics: throughput and latency.
Throughput is the amount of data that can be transferred between a given server and storage system per unit of time. It is measured in gigabytes per second (GB/s), and there are two sides to consider. The first is read performance, the number storage vendors typically use to highlight their products.
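As a back-of-the-envelope illustration of what these units mean for an AI pipeline (the function names below are ours, purely for illustration), throughput is just bytes moved divided by elapsed time, and it directly determines how long a dataset takes to stream:

```python
def throughput_gbps(bytes_transferred: float, seconds: float) -> float:
    """Throughput in decimal GB/s, the unit storage vendors quote."""
    return bytes_transferred / 1e9 / seconds

def transfer_seconds(dataset_bytes: float, gbps: float) -> float:
    """Time to stream a dataset at a given sustained throughput."""
    return dataset_bytes / 1e9 / gbps
```

At a sustained 90GB/s, for example, a 500GB dataset streams in roughly 5.6 seconds; at 5GB/s, the same read takes well over a minute and a half.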
Different AI solutions will use data from a number of different sources and that data can be stored as block, file, or object data. When comparing storage solutions, it is critical to evaluate them on the same criteria.
Pavilion uniquely delivers industry-leading performance for all three data types simultaneously.
Getting it “Write”
The second key metric is write performance. AI solutions are often used to find patterns in real time: intrusion detection for cybersecurity, financial analytics for market trends, facial recognition for law enforcement, and many other uses.
This involves collecting massive volumes of data from where it is created and making it available to the GPUs for processing. The gating factor in this is the data ingest (write) performance of the storage solution.
Storage solutions have historically been tuned for read performance, and as a result few offer the write performance required for modern AI. In fact, very few storage vendors even publish their write performance.
It is fair to say that if a vendor will not tell you how they perform on a particular metric, it is probably because they do not perform very well. (Read more about the importance of normalizing vendor performance numbers in this Blocks and Files article and this Pavilion blog, where we provide a tool customers can use to evaluate vendor claims on an apples-to-apples basis.)
Organizations saddled with the poor write performance of legacy storage systems end up having to filter the datasets their AI models analyze, which leads to missed trends and lost opportunities.
Pavilion's purpose-built architecture delivers high performance for both read and write operations. Specifically, the Pavilion HyperParallel Data Platform™ writes data at up to 90GB/s, faster than most competitors' read performance, let alone their unpublished write numbers.
GPU-based systems have revolutionized what can be done with AI by parallelizing processing.
To get the most out of their GPUs, organizations need a storage solution that delivers both high throughput and low latency. Without low latency, the GPUs must wait for data, and when performing real-time analytics they can miss critical data.
Pavilion delivers unrivaled ultra-low latency of as little as 100µs for reads and 25µs for writes. Latency that low means Pavilion delivers scalable, disaggregated storage with performance that is the same as or better than internal NVMe SSDs.
Pavilion uses NVMe-oF and RoCE to deliver latency this low, and it is measured at the host, so it includes network latency. Most storage vendors measure latency only within their array, excluding the network. Pavilion is not just faster; it is faster over a greater distance.
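For intuition about why host-side measurement matters, here is a simplified sketch (our own illustration, not Pavilion's benchmark methodology) that times small reads from the application's point of view, so every layer underneath the application, including the network when the storage is remote, is counted. Production benchmarking would use a tool such as fio with direct I/O; this version may be served from the OS page cache:

```python
import os
import statistics
import time

def read_latency_us(path: str, block_size: int = 4096, samples: int = 100) -> float:
    """Median host-side latency (in µs) for small reads from a file.

    Timing at the host captures the full round trip the application
    experiences (OS, drivers, and network for remote storage), which is
    what actually gates a GPU data pipeline.
    """
    size = os.path.getsize(path)
    latencies = []
    with open(path, "rb", buffering=0) as f:  # unbuffered at the Python level
        for i in range(samples):
            # Pseudo-random offsets to avoid purely sequential access.
            offset = (i * 7919 * block_size) % max(size - block_size, 1)
            f.seek(offset)
            start = time.perf_counter_ns()
            f.read(block_size)
            latencies.append((time.perf_counter_ns() - start) / 1000)
    return statistics.median(latencies)
```

Comparing numbers produced this way against an array vendor's internally measured latency is exactly the apples-to-oranges problem described above: the host-side figure includes the network hop, the array-side figure does not.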
Performance at Scale
In addition to matching the performance of local NVMe SSDs, the Pavilion HyperParallel Data Platform provides unlimited, linear scaling of performance and capacity. Each Pavilion system supports up to 2PB of capacity and up to 120GB/s of throughput.
This means that two Pavilion systems deliver up to 4PB of capacity and 240GB/s of performance. Five systems offer 10PB and 600GB/s of throughput. Performance and capacity grow linearly with each array.
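The scaling arithmetic above can be captured in a couple of lines (using the per-array figures quoted here; the function is our own illustration):

```python
def scaled_specs(arrays: int,
                 capacity_pb_per_array: float = 2.0,
                 throughput_gbps_per_array: float = 120.0) -> tuple[float, float]:
    """Linear scaling: total (capacity in PB, throughput in GB/s)
    grows proportionally with the number of arrays."""
    return (arrays * capacity_pb_per_array,
            arrays * throughput_gbps_per_array)
```

So two arrays yield (4PB, 240GB/s) and five yield (10PB, 600GB/s), matching the figures above.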
Pavilion uniquely delivers the highest-performance, lowest-latency solution for AI, ML, DL, Big Data analytics, and a host of other applications. The Pavilion HyperParallel Data Platform offers industry-leading performance for block, file, and object workloads simultaneously. With the ability to provide the same or better performance than direct-attached NVMe SSDs and unlimited scale, the Pavilion HyperParallel Data Platform is the best choice for every modern application.