Getting the Write Storage for Your Data

The explosion in the rate of data creation has exposed a fundamental flaw in most data storage systems: poor write performance. As data creation accelerates, driven by the Internet of Things (IoT), sensors, log data, smart devices, and countless other sources, all of that data must first be collected onto a storage system before it can be sent to processing systems for analysis. If your storage platform has poor write performance, and most do, then the data your organization relies on to make decisions is likely incomplete.

What happens when data is created faster than the storage system can write it? Typically, the data gets dropped: some form of sampling keeps part of the data and discards the rest.

Incomplete data means that when your organization makes a data-driven decision, and today all decisions should be data driven, you could be missing the key data point that would have prevented a wrong decision. Every discarded data point is an opportunity for error: a financial services company could miss a better trade or fail to detect fraud, a security team could catch an intrusion too late, law enforcement could miss a credible threat, and a life sciences team could miss a key marker, impacting trials for a new drug. The list goes on. In today's environment, where data determines outcomes, information cannot be dropped.

Ingesting this growing volume of data requires storage systems with high write performance. Yet most storage systems have not been designed that way: storage arrays have always been built for high read performance.

Storage array vendors produce extremely high performance numbers by focusing on reads. They typically place some amount of DRAM in the array to act as a cache, and when they run the tests behind their published numbers, they perform sequential reads from that cache. As a result, the read performance most vendors publish does not reflect what the array will do under load; it reflects what the cache can do.
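
To see why cache-centric benchmarks overstate real-world performance, consider a simple effective-throughput model. This is an illustrative sketch, not a measurement of any particular array; the bandwidth figures and hit ratios below are assumptions chosen for demonstration.

```python
def effective_read_bw(hit_ratio: float, cache_bw_gbs: float, media_bw_gbs: float) -> float:
    """Blend cache and backend bandwidth by cache hit ratio.

    Assumes a simple linear model: requests served from DRAM cache run
    at cache speed, while misses run at the speed of the drives behind it.
    """
    return hit_ratio * cache_bw_gbs + (1.0 - hit_ratio) * media_bw_gbs

# A benchmark doing sequential reads from cache sees ~100% hits:
print(effective_read_bw(1.00, cache_bw_gbs=80.0, media_bw_gbs=10.0))  # 80.0 GB/s

# A real workload whose working set dwarfs the cache sees far fewer hits:
print(effective_read_bw(0.30, cache_bw_gbs=80.0, media_bw_gbs=10.0))  # 31.0 GB/s
```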

Write numbers can be even worse. On a write, data is typically staged in the cache, broken down into the blocks that will be striped across each drive, parity is calculated, and only then is the data actually written.
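
As a rough illustration of the extra work described above, here is a minimal sketch of a parity-protected (RAID-5-style) full-stripe write: the data is chunked into per-drive blocks and an XOR parity block is computed before anything reaches the drives. The stripe geometry and block size are assumptions for demonstration and do not describe any specific vendor's implementation.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-sized blocks together to produce a parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def full_stripe_write(data: bytes, data_drives: int = 4, block_size: int = 4096) -> list[bytes]:
    """Chunk data into per-drive blocks, pad the stripe, and append parity.

    Returns every block that must be written: one per data drive plus one
    parity block, i.e. extra I/O and CPU work on top of the raw data itself.
    """
    stripe_bytes = data_drives * block_size
    assert len(data) <= stripe_bytes, "one stripe at a time in this sketch"
    padded = data.ljust(stripe_bytes, b"\x00")  # pad out a partial stripe
    blocks = [padded[i:i + block_size] for i in range(0, stripe_bytes, block_size)]
    blocks.append(xor_blocks(blocks))  # parity drive gets the XOR of all data blocks
    return blocks

stripe = full_stripe_write(b"sensor readings " * 1000)
print(f"{len(stripe)} blocks written for {len(stripe) - 1} blocks of data")  # 5 for 4
```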

All of these steps are why most storage arrays have poor write performance, and why almost no storage vendor publishes write numbers. For most products, write performance is typically less than 20% of read performance.
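
The classic RAID-5 small-write penalty suggests why the figure lands in this range. An update smaller than a full stripe forces a read-modify-write: read the old data block, read the old parity, compute new parity, then write both back, four physical I/Os per logical write. The quick model below is illustrative only; real arrays add cache flushes, metadata updates, and lock contention on top of it.

```python
def rmw_physical_ios(logical_writes: int) -> int:
    """Physical I/Os for RAID-5 sub-stripe updates (read-modify-write).

    Each logical write costs four physical I/Os: read old data, read old
    parity, write new data, write new parity, where
    new_parity = old_parity XOR old_data XOR new_data.
    """
    return logical_writes * 4

ios = rmw_physical_ios(1000)
print(f"1000 logical writes -> {ios} physical I/Os")
# A 4x amplification caps write throughput near 25% of read throughput
# before counting parity math and cache-flush overhead, consistent with
# products landing below the 20% mark.
print(f"ceiling: {1000 / ios:.0%} of read I/O capacity")  # 25%
```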

The Pavilion HyperParallel Data Platform does not suffer from this problem because it uses a fundamentally different architecture than traditional storage arrays: it is a cacheless system. Instead of using expensive DRAM or storage-class memory (SCM) to artificially inflate performance, it leverages the power of multiple controllers to consistently deliver unrivaled read and write performance. Rather than one controller managing data in cache, multiple controllers handle reads and writes in real time.

The result is that Pavilion delivers up to 120GB/s of read performance and up to 90GB/s of write performance per system for block storage. File performance is equally impressive, with up to 90GB/s for reads and up to 56GB/s for writes. For object storage, Pavilion delivers up to 80GB/s for reads and 35GB/s for writes.

This is unmatched read and write performance, for each data type, from a single system. When additional systems are added to scale out, performance scales linearly across systems as a result of the unique Pavilion architecture.

As the rate of data creation continues to increase, the ability to ingest data quickly has become critical in every environment. If data cannot be written quickly enough, it can be lost, compromising results. Pavilion uniquely solves this problem.

To learn more about how Pavilion delivers unmatched read and write performance, view the Pavilion performance solution brief.