Don’t Be Fooled: Off-the-Shelf SSDs Make a Solid Design Choice in Storage Systems

Storage system vendors have chosen to integrate flash in two ways: by incorporating standard off-the-shelf SSDs, or by designing their own flash modules and controllers. Many of the early all-flash array pioneers, like Violin and TMS, designed their own custom flash modules for what were very sound reasons at the time. The choice between the two approaches revolves around several criteria, most notably Performance, Time-to-Market, and Cost. I explore all three below as they relate to this choice.

Performance

Many vendors chose to build their own flash controllers before NVMe products were available, because SCSI-based protocols were inefficient and standard SSDs lacked support for PCIe. NVMe SSDs now let you attach a standard SSD over PCIe without the SCSI protocol overhead.

Another stated justification for custom flash modules is that an in-house-built controller can improve performance by avoiding the flash management that an SSD’s own flash controller performs. The bulk of this flash management is referred to as ‘garbage collection’, which must be performed to reclaim space held by deleted data in NAND flash, as well as to account for flash cell data aging.

Flash management needs to occur somewhere, so deciding to do it in your own controller just moves the work to a different processing domain. The added benefit stated by vendors is that they can strategically schedule garbage collection (GC) activity in order to avoid impacting user workloads.

This would have been a big advantage with SSDs of prior generations, which were notorious for “write cliffs” where write performance would drop significantly once GC kicked in. However, today’s enterprise SSDs, particularly NVMe SSDs, minimize this problem: they use sophisticated algorithms to schedule background tasks so that performance stays consistent whether GC is running or not. Also, for cases where a vendor wants more control, support for Advanced Background Operations (ABO) is starting to emerge in NVMe SSDs, allowing the system to schedule these operations directly.

Time-To-Market

The SSD vendors have an army of engineers working on flash controller research and development, with multiple teams working in parallel to incorporate the latest NAND lithographies and technologies into standard form-factor products. Every new type of flash needs to be heavily characterized, integrated into the controller design, and qualified extensively. This takes a great deal of work and engineering investment. Also, at vertically-integrated vendors the controller engineering teams work with the flash engineering teams within the same company, which allows for close collaboration and rapid development cycles.

If a storage vendor decides to take this on themselves, they are most likely not going to be able to keep up with the latest memory technology advancements, and they will consume engineering resources on problems that are already solved elsewhere. Most storage system vendors consider their core competency to be designing systems and great end-user features, not designing flash controllers. They therefore choose to invest resources where they add the most value and differentiation, which tends to be at the system level. Investing in flash controller design, when the same capability can be acquired easily in the form of standard SSDs, just detracts from that in the end.

Cost

It would seem that a storage system vendor building their own flash controllers would be able to offer lower prices to end customers, since by buying raw NAND directly they have cut out the “middle man” in the form of the SSD vendor. However, it is more likely that pricing to the end customer won’t be much different, since the vertically-integrated SSD vendors (Samsung, Toshiba, Western Digital, Intel, Micron) all manufacture their own flash for their SSDs and mass-produce their products, which drives costs down dramatically.

If a storage vendor buys their own flash, they will not be buying it in enough volume to lower costs to their end users significantly compared with a vendor that buys SSDs from a vertically-integrated manufacturer who mass-produces hundreds of thousands of SSDs per year. In addition, the flash controllers in an initial custom design would typically be FPGAs, which when purchased in lower volumes are much more expensive than embedded SSD controllers.

Also, by rapidly incorporating the latest flash into a design, a vendor can be on the leading edge of the cost curve. By using standard SSDs, a vendor can incorporate lower-priced flash technology in their products much faster than vendors who can’t adapt their custom designs as quickly, and can thus offer lower costs to the market while those competitors are still re-working their flash controllers to handle the new flash.

Another cost-related factor is endurance. Vendors also claim that they need to design their own flash modules to increase the endurance of the flash. This is primarily a cost saving for the vendor, not the customer, since most vendors warrant the system for unlimited writes and will replace a worn-out flash module under warranty. In reality, this type of write-induced wear-out rarely occurs in the real world. SSD endurance is very good, and the amount of endurance required has historically been over-estimated, which is why endurance requirements for SSDs keep falling with each generation (remember SLC?). Most workloads don’t come close to requiring even the highest-endurance-rated SSDs, but those are available if needed (10+ drive writes per day, or DWPD).
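To make the endurance argument concrete, here is a rough sketch of the arithmetic behind a drive-writes-per-day rating. Every number in it (drive capacity, warranty length, daily write volume) is a hypothetical chosen for illustration, not a measurement, and the function names are my own. It also assumes writes are spread evenly across drives and ignores RAID overhead and write amplification.

```python
# Sketch: relate a drive-writes-per-day rating to total writes, and
# estimate the rating a workload actually needs.
# All numeric inputs are hypothetical, not measured values.

DAYS_PER_YEAR = 365


def rated_tbw(capacity_tb: float, dwpd: float, warranty_years: float) -> float:
    """Terabytes written over the warranty period implied by a rating."""
    return capacity_tb * dwpd * DAYS_PER_YEAR * warranty_years


def required_dwpd(daily_writes_tb: float, drive_count: int,
                  capacity_tb: float) -> float:
    """Drive writes per day each drive needs for a given array workload.

    Simplification: writes are spread evenly across all drives, and
    RAID/parity overhead and write amplification are ignored.
    """
    return daily_writes_tb / (drive_count * capacity_tb)


if __name__ == "__main__":
    # Hypothetical 3.84 TB drive rated at 1 drive write per day, 5 years.
    print(f"Rated writes: {rated_tbw(3.84, 1.0, 5.0):.0f} TB")
    # Hypothetical array: 72 such drives absorbing 100 TB of writes per day.
    print(f"Required rating: {required_dwpd(100.0, 72, 3.84):.2f}")
```

Under these assumed numbers, an array absorbing 100 TB of writes every day needs only about 0.36 drive writes per day from each drive, well under even a modest 1-per-day rating. Real workloads and data layouts will differ, so treat this as a back-of-the-envelope check only.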

Pavilion’s Approach: Use Commodity SSDs

Pavilion has designed a data storage system that packs 72 standard-format NVMe SSDs into a single 4U Chassis, which means we can offer over 200 TB per Rack-Unit with today’s highest-density 2.5″ SSDs. Pavilion’s performance density is also second-to-none, which should debunk claims that you need to design custom flash modules in order to get high performance.
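The density figure follows from simple arithmetic, sketched below. The 72-drive and 4U numbers come from the paragraph above; the 15.36 TB drive capacity is my assumption for illustration (the text does not name a specific drive), and the result is raw capacity before any data protection or formatting overhead.

```python
# Sketch: capacity density of a chassis filled with standard SSDs.
# The drive capacity passed in is an assumed value, not taken from
# any product specification.


def drives_per_rack_unit(drive_count: int, chassis_height_u: int) -> float:
    """Number of drives per rack unit of chassis height."""
    return drive_count / chassis_height_u


def capacity_per_rack_unit_tb(drive_count: int, chassis_height_u: int,
                              drive_capacity_tb: float) -> float:
    """Raw terabytes per rack unit, before any protection overhead."""
    per_u = drives_per_rack_unit(drive_count, chassis_height_u)
    return per_u * drive_capacity_tb


def min_drive_capacity_tb(target_tb_per_u: float, drive_count: int,
                          chassis_height_u: int) -> float:
    """Smallest drive capacity that reaches a target raw density."""
    return target_tb_per_u / drives_per_rack_unit(drive_count, chassis_height_u)


if __name__ == "__main__":
    # 72 drives in 4U, with a hypothetical 15.36 TB drive.
    print(f"{capacity_per_rack_unit_tb(72, 4, 15.36):.2f} TB per rack unit")
    print(f"{min_drive_capacity_tb(200.0, 72, 4):.2f} TB per drive needed for 200 TB/U")
```

With 18 drives per rack unit, any drive larger than about 11.1 TB clears the 200 TB per rack unit mark; the assumed 15.36 TB drive would give roughly 276 TB raw per rack unit.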

Leveraging standard SSDs also allows us to aggressively ride the NAND flash innovation curve by relying on the SSD vendors’ fast adoption of new types of flash. For example, we qualified NVMe Optane-based SSDs in our platform very rapidly, and moving from 2D planar to 3D NAND also took only a matter of weeks. The same exercise could take a year or more for a vendor designing their own flash controllers, if they are doing it right. This means that we can adopt cheaper forms of flash, pass the cost advantages on to our customers quickly, and direct our engineering resources elsewhere.

As a result, we have optimized Cost, Time-To-Market, and Performance all at the same time by leveraging standard SSDs in our system, allowing us to build a storage platform unlike any the market has seen up until now.