Faster Cryo-EM Results From High-Performance Storage

Cryo-EM is changing the landscape for drug discovery.  With millions invested in microscopes, powerful servers, and GPU technology, one common denominator is slowing the rate of discovery: data storage.

While it is common to have NVMe SSDs inside servers for image acquisition, the massive files generated by Cryo-EM quickly saturate these drives.  Copies of the files are made and the data is moved to centralized shared storage for iterative processing.  If a GPU-based system such as an NVIDIA DGX A100 is used, the problem compounds.  A DGX A100 has a limited amount of internal NVMe capacity, meaning data must be copied and moved (again) from acquisition storage to GPU processing, then copied back and forth to shared storage for researcher processing and modeling.

This complex process takes weeks, even months.  Inefficient data management wastes precious microscope and researcher cycles.  With small data sets always on the move, it is also difficult to assure the fidelity of the ultimate result.

Pavilion offers a better way.  Unlike traditional all-flash storage arrays, Pavilion’s HyperParallel Data Platform has the storage speed and storage capacity to serve as an image acquisition target, a GPU accelerator, and a secure, shared file system for processing and modeling.

With unrivaled write speeds of up to 90GB/s, Pavilion can ingest data from multiple microscopes simultaneously. Likewise, GPUs can read data at up to 120GB/s, the fastest on the market. Plus, Pavilion is the performance leader for NVIDIA Magnum IO GPUDirect Storage, enabling unmatched low latency to dramatically speed data movement.  For shared storage, the platform offers up to 2.2PB in a 4RU form factor, which means this NVMe all-flash array has the best performance density available in the market.
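
To put those figures in perspective, here is a rough back-of-envelope sketch in Python.  It is illustrative only: the 10 TB data set size is a hypothetical example, the helper names are made up for this sketch, decimal units are assumed (1 TB = 1000 GB), and the quoted peak rates are assumed to hold for the entire transfer, which real workloads may not achieve.

```python
# Back-of-envelope estimates based on the peak figures quoted above.
# Assumptions (not from the source): decimal units, peak rates sustained
# for the whole transfer, and a hypothetical 10 TB data set.

GB_PER_TB = 1000  # decimal units: 1 TB = 1000 GB


def transfer_seconds(size_tb: float, rate_gb_per_s: float) -> float:
    """Best-case time in seconds to move size_tb terabytes at rate_gb_per_s."""
    if rate_gb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_tb * GB_PER_TB / rate_gb_per_s


def per_rack_unit(total: float, rack_units: int) -> float:
    """Spread a total (capacity, throughput, ...) evenly over rack units."""
    if rack_units <= 0:
        raise ValueError("rack_units must be positive")
    return total / rack_units


dataset_tb = 10.0  # hypothetical data set size, chosen only for illustration

write_s = transfer_seconds(dataset_tb, 90.0)   # quoted peak write rate
read_s = transfer_seconds(dataset_tb, 120.0)   # quoted peak read rate
pb_per_ru = per_rack_unit(2.2, 4)              # quoted 2.2PB in 4RU

print(f"Ingest {dataset_tb:.0f} TB at 90 GB/s: about {write_s:.0f} s")
print(f"Read {dataset_tb:.0f} TB at 120 GB/s: about {read_s:.0f} s")
print(f"Capacity per rack unit: {pb_per_ru:.2f} PB")
```

These are best-case numbers.  Protocol overhead, network links, and the number of clients will all affect what a given lab actually sees.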

Pavilion’s HyperParallel File System supports SMB for image acquisition, as well as NFS v3 and v4, NFS over RDMA, pNFS, Gluster, and application plug-ins for Spark and Hadoop.  The system also supports the S3 object storage protocol for archival and hybrid cloud configurations. Additionally, when configured as block storage, Pavilion can work with external file systems like IBM Spectrum Scale and Quantum StorNext.

Most importantly, due to the unique architecture of Pavilion, block, file, and object protocols can operate concurrently, each with industry-leading performance, all in the same data storage system.  Customers can scale out across systems in a linear fashion with a global namespace.

Removing storage as the bottleneck in a Cryo-EM workflow lets researchers accomplish more work in less time.  Consolidating storage into a single, highly scalable system makes large data sets easier to access and manage.  Gone are the days of making copies of data, moving data, and waiting a long time to retrieve images and models from archives.  The ultimate result is faster discovery with higher fidelity.

To learn more about how Pavilion accelerates Cryo-EM workflows, please read Pavilion’s Cryo-EM Solution Brief.