Rapid advances in microscopes, detectors, and algorithms are changing the scale and pace of Cryo-EM workflows. For example, accelerating sub-3 Angstrom resolution reconstructions requires substantial GPU/CPU and storage resources to process hundreds of thousands of raw particle images.
There are fundamental data storage bottlenecks in standard Cryo-EM data solutions.
During image acquisition, NAS and server-based SSDs fill quickly. Copies are made, and data must be moved to centralized shared storage for iterative processing.
For researchers and scientists, lengthy delays for data ingest from microscopes, plus copy/move operations to ingest servers with Direct-Attached Storage (DAS), can consume 8 hours a day or more. During that time, microscopes are unavailable for creating micrographs, and research stops while data is copied, moved, and checksummed. Over a year, as many as 120 days of science can be lost to inadequate Cryo-EM data storage infrastructure design.
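Much of that lost time comes from re-reading the entire data set at every hop to verify integrity. A minimal shell sketch of a single copy-and-checksum hop (file names, sizes, and paths are hypothetical stand-ins; real movie stacks are far larger) illustrates the pattern that repeats at each stage:

```shell
# Stand-in for a raw micrograph on the microscope's local storage (hypothetical name/size)
mkdir -p /tmp/scope /tmp/ingest
dd if=/dev/zero of=/tmp/scope/micrograph_0001.mrc bs=1M count=4 2>/dev/null

# Checksum at the source before the transfer
(cd /tmp/scope && sha256sum micrograph_0001.mrc > micrograph_0001.mrc.sha256)

# Copy to the ingest server's DAS (stand-in for the network copy)
cp /tmp/scope/micrograph_0001.mrc /tmp/scope/micrograph_0001.mrc.sha256 /tmp/ingest/

# Verify after the transfer; every hop repeats this full read of the data
(cd /tmp/ingest && sha256sum -c micrograph_0001.mrc.sha256)
```

Each hop reads the full data set at least twice (once per checksum pass) on top of the copy itself, which is why eliminating hops recovers so much microscope and researcher time.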
If GPU servers like an NVIDIA DGX A100 are used, the problem compounds. A DGX A100 has minimal internal NVMe capacity, meaning data must be copied and moved (again) from acquisition storage to GPU-local storage for processing, then copied and moved back and forth to shared storage for researcher processing and modeling. With data sets constantly on the move, it is also difficult to ensure the fidelity of the final result.
Pavilion's enterprise-proven architecture and HyperParallel File System support SMB for image acquisition; NFSv3, NFSv4, NFS over RDMA, and pNFS; Gluster; and application plug-ins for Spark and Hadoop. The system also supports the S3 object storage protocol for archival and hybrid cloud configurations. When configured as block storage, Pavilion can work with external file systems like IBM Spectrum Scale, BeeGFS, and Quantum StorNext.
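As a sketch of how each stage of the workflow could reach the same namespace over the protocol that suits it, the commands below show representative client-side mounts and an S3 transfer. Hostnames, export paths, share names, ports, and bucket names are hypothetical; consult the system's documentation for actual endpoints:

```shell
# Acquisition workstation: SMB share for micrograph ingest (hypothetical host/share)
mount -t cifs //storage.example.org/acquisition /mnt/acquisition -o user=scope

# GPU cluster node: NFS over RDMA for processing
# (proto=rdma needs an RDMA-capable NIC; 20049 is the conventional NFS/RDMA port)
mount -t nfs -o vers=3,proto=rdma,port=20049 storage.example.org:/export/cryoem /mnt/cryoem

# Archive / hybrid cloud: S3 protocol via the AWS CLI pointed at the array's endpoint
aws s3 cp /mnt/cryoem/maps/final_map.mrc s3://cryoem-archive/ \
    --endpoint-url https://storage.example.org:9000
```

Because every client mounts the same consolidated namespace, the acquisition-to-processing-to-archive path needs no intermediate copies.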
With industry-leading performance density (2PB in 4RU) and standards-based networking, Pavilion is easy to deploy and operate. Additionally, the system plugs in seamlessly as a performance storage tier alongside existing investments in NAS or parallel file systems within a defined workflow.
Researchers can accomplish more work in less time by removing data storage as the bottleneck in the Cryo-EM workflow. Consolidating storage into a single, highly scalable system lets large data sets be accessed and managed in place. Gone are the days of making copies of data, moving data, and enduring long waits to retrieve images and models from archives. The result is faster discoveries with higher fidelity. To learn more, check out the latest Cryo-EM Solution Brief and White Paper.