Motivation Behind Designing the Pavilion Platform

Over the course of two decades, I have been deeply involved in developing products spanning domains such as data networking, storage area networking, PCIe interconnects, IO virtualization, flash controllers, and all-flash array systems. My journey took me through companies such as Cisco/Andiamo, Aprius (acquired by FusionIO), and Violin Memory, and I am proud to have been part of teams that developed and shipped innovative products for the data center.

Around the time I was at Aprius, the idea of sharing IO on a compute node gained credence when it became evident that every type of IO (network, storage, etc.) was designed for peak bandwidth, and the size and power consumption of the interconnects increased overall cost and complexity. Two standards, SRIOV and MRIOV, emerged with the idea of sharing IO on a single host or across multiple hosts, respectively. Of these, MRIOV did not succeed because the economic drivers were not sufficient for the IO vendors. SRIOV became successful along with the advent of server virtualization, where each VM could get an independent share of the IO. Despite the failure of MRIOV, the problem of sharing IO across servers remained, and we figured out a way to build technology that used these developments in the ecosystem to effectively share IO devices (SSDs) across multiple servers.

While we were developing the IOV system with the intent of sharing all IO, flash technology was taking off. Early SSD implementations simply replaced rotating media with NAND flash to maintain connection compatibility. The disk-drive form factor and existing interface allowed IT vendors to seamlessly substitute an SSD for a magnetic disk drive. SSDs certainly provided faster random access and higher data transfer rates than electromechanical hard disk drives (HDDs), but the host interface could not fully exploit the performance of NAND. To bring out the raw performance of NAND, some vendors started using PCIe directly, as it offered scalable bandwidth, deterministic latency, and a guaranteed-delivery protocol. However, this led to PCIe card form factors that created serviceability problems and required custom drivers implementing proprietary flash access protocols.

To address this issue, two industry standards for PCIe flash devices emerged – Non-Volatile Memory Host Controller Interface (NVMHCI, a predecessor of NVMe) and Small Computer System Interface (SCSI) Express. The latter did not gain traction because the SCSI protocol limited the performance of NAND flash. NVMHCI/NVMe was developed by a consortium of 80+ vendors including Intel, Samsung, Sandisk, Dell, Seagate, and others. It was designed from the ground up to capitalize on the low latency and parallelism of PCIe SSDs. As these protocols were evolving, we recognized the potential to pair the NVMe queues on the SSD with RoCEE queues on Ethernet to provide remote access to NVMe SSDs over the network. This early insight gave us a head start, and we worked independently on developing technologies that eventually became NVMe-oF.

As media technology, networking bandwidth, and host protocols advanced, it became evident that the storage controller was the bottleneck preventing servers from realizing the full performance of NVM media. This brought clarity: the CPU-centric storage array that had served the industry for 20+ years had to be completely overhauled. To deliver the full benefit of media performance, so that customers get maximum value from their investments in high-performance servers, networks, and media, Sundar (co-founder) and I put together a small founding team and set out to build a fabric-centric array, seeding the birth of Pavilion Data Systems.