Real-world challenges of Cryo-EM in the cloud – Part 2

Recently we shared a blog about the challenges of running Cryo-EM workloads in the cloud. Cloud-based solutions can help small organizations scale data processing and help community research projects share data. However, Cryo-EM at scale faces fundamental challenges in data acquisition, transfer, and processing that demand high-performance on-premises infrastructure the cloud cannot deliver cost-effectively.

Data Acquisition and Transfer:

New experiments need storage and data services before computational activities can begin. Inadequate compute and data services are a common cause of delays and analysis backlogs at Cryo-EM facilities, wasting days or weeks of valuable research time. Moreover, many of these workflow tasks are iterative and must be performed several times to produce an accurate result, which further compounds the problem.

For researchers and scientists, lengthy delays for data ingest from microscopes, and copy/move operations to ingest servers with Direct-Attached Storage (DAS), can consume 8 hours per day or more. During that time microscopes are unavailable, and processing and research stop while data copy/move and checksum operations run. As a result, inadequate storage infrastructure design can halt as many as ten days of science per month.
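To make the cost concrete: the copy/move-plus-checksum step described above typically means reading every byte at least three times (once to copy, twice to verify). A minimal sketch of that pattern, in Python with only the standard library (the function names here are illustrative, not any facility's actual tooling):

```python
import hashlib
import shutil
from pathlib import Path

CHUNK = 1024 * 1024  # read in 1 MiB chunks to bound memory use


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, streaming chunk by chunk."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(src: Path, dst: Path) -> bool:
    """Copy src to dst, then re-read both sides and compare digests.

    Every byte is read three times in total, which is why this step
    dominates ingest time on slow storage.
    """
    shutil.copy2(src, dst)
    return sha256_of(src) == sha256_of(dst)
```

On a DAS-backed ingest server, each of those three passes is bounded by local disk bandwidth, which is how a day's worth of microscope movies turns into many hours of copying before analysis can begin.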

With Pavilion, nothing stops your science!

Offering up to 50GB/sec of write performance, Pavilion keeps data ingest from multiple microscopes uninhibited. With up to 75GB/sec of read performance, processing can start immediately while data is still being ingested.

Packing 2PB of usable NVMe capacity into a 4RU form factor, Pavilion delivers a scale-up and scale-out HyperOS™ file system that achieves near-linear performance as more systems are added to the cluster.

Once data is available on a high-performance filesystem, compute resources can be deployed, and the work of scientific analysis and molecular structure assembly begins. Using multi-core CPUs and GPUs where relevant, a large compute complex is put to work to solve these structural biology problems. Once complete and checked for accuracy, the newly produced data must be copied to an object storage location and associated with the original datasets. This critical final step ensures data lineage and experiment integrity. Developing and maintaining a custom tool for it adds complexity to the process and introduces risk in the copy back to object storage.
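The lineage step above amounts to recording, alongside the derived result, which original datasets it came from and a checksum for later verification. A minimal sketch of such a record, assuming a simple JSON sidecar manifest (this is an illustration of the idea, not Pavilion's or any facility's actual tool):

```python
import hashlib
import json
from pathlib import Path


def write_lineage_manifest(result: Path, source_ids: list[str],
                           manifest: Path) -> dict:
    """Write a JSON manifest tying a derived result to its source datasets.

    The SHA-256 digest lets the copy back to object storage be verified
    end to end; source_ids are the identifiers of the original
    microscope datasets (hypothetical naming).
    """
    digest = hashlib.sha256(result.read_bytes()).hexdigest()
    record = {
        "result": result.name,
        "sha256": digest,
        "derived_from": source_ids,
    }
    manifest.write_text(json.dumps(record, indent=2))
    return record
```

When the object store supports per-object metadata, the same record could instead be attached directly to the uploaded object, avoiding a separate sidecar file.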

Pavilion’s HyperOS supports file and object data simultaneously in the same system. Data can be moved from file to object and back with ease and data assurance. Data can be tiered to existing file system storage, newer low-cost HDD-based platforms, or seamlessly distributed to the cloud.

Conclusion

Researchers desire an environment in which scientific experiments and activities are not hampered by IT operations. Furthermore, scientists are constantly finding new techniques and tools to improve the capabilities and accuracy of Cryo-EM outcomes, and they continually look to their IT counterparts to optimize the execution of these extremely data-intensive workflows. Data movement and copying are responsible for many elongated workflow timelines and research backlogs. Coupled with availability and scalability issues, these delays leave many Cryo-EM facilities struggling to fully exploit their microscope investments.

Pavilion removes these bottlenecks, delivers unprecedented outcomes, and reduces the costs associated with cloud implementations, all while dramatically simplifying Cryo-EM workflows. To learn more, check out our latest white paper.