By Dr Farrah Blades, Postdoctoral Research Fellow, Institute for Molecular Bioscience (IMB), UQ
Farrah explains her involvement with RCC's Bunya Open OnDemand project.
In mid-2021, I was starting my first postdoctoral research project. During the COVID lockdowns, I began learning cryogenic electron microscopy (Cryo-EM), a technique I had studied during my bachelor's degree but had never put into practice.
Cryo-EM is commonly used to determine protein structures at near-atomic resolution. It involves freezing the protein in vitreous ice, exposing it to an electron beam, detecting how the electrons are scattered by the protein, and reconstructing the protein's structure from the detected electrons.
While this may sound straightforward, the raw data collected from the microscopes regularly reach 5 TB in size, and they need a lot of processing before we can interpret the protein's structure.
Processing Cryo-EM data is interesting in itself. Many of its steps rely on visual user input, such as selecting good data or judging whether a step has worked well or needs adjusting. At the same time, it often requires analysing a very large volume of data and depends on high-performance CPUs and GPUs, high memory density and very fast storage, all combined. From a single data set, the specialised software may need to classify tens of millions of individual protein particles.
At The University of Queensland (UQ), the traditional way to process Cryo-EM data was to combine commercially bought workstations for the visual steps with the Wiener GPU HPC for the heavy computation, sending data to Wiener via the command line. However, our group did not have a commercially bought workstation or any other suitable substitute for the visual side of the data processing.
UQ had previously built an e-desktop platform, the Characterisation Virtual Laboratory (CVL), to help researchers with a similar problem. CVL has served UQ's Lattice Lightsheet Microscopy suite well for many years; however, it had not been an optimal tool for the Cryo-EM community. Many of the highly specialised Cryo-EM applications needed a custom deployment environment to work on CVL's infrastructure, and Cryo-EM workloads have exceptionally large resource requirements and data sizes.
Working with RCC Operations Engineer Dr Edan Scriven, I did manage to get the Cryo-EM tools running on CVL, but because of the platform's architecture and the limited capacity of its desktops, biologists still needed to learn the command line to process their large data sets.
CVL was never going to replace workstations, and I was determined to help users process their data in a way that was better than what they were accustomed to. That was when some other Cryo-EM experts, members of the Centre for Microscopy and Microanalysis (CMM) and I met with RCC Director Jake Carroll to restate the unique needs of the Cryo-EM community.
It was apparent from the beginning that Jake and his team had already recognised the need for a powerful and flexible virtual desktop at UQ. They fully took on board our feedback about the new GPU hardware to be bought for Bunya and about how we would make this happen.
The project was also accelerated by two factors: current GPU prices have put powerful new-generation GPUs out of reach for most researchers, and cheaper, RTX-style GPUs can no longer handle the data sizes we are dealing with.
As the Bunya supercomputer was being planned, it was essential to purchase graphics-capable GPUs and to design the architecture so that a virtual desktop could communicate with the rest of the supercomputer, without users needing to write scripts or submit large jobs from the command line. This would give users a seamless experience.
RCC Research Computing Systems Engineer Sarah Walters soon joined the project and has really driven the build and deployment of the Open OnDemand (OOD) desktop, called onBunya. It has become a more powerful virtual desktop than I could have imagined. It is highly flexible in terms of the silicon you can put beneath it, and the OOD desktops can indeed communicate with the rest of the Bunya HPC.
Edan and I have spent around six months getting the Cryo-EM data processing software to run seamlessly on onBunya, and I am really happy with the result. Over the coming year, I will teach others in the Cryo-EM community how to use this new resource.
Hopefully, onBunya will remove the need for off-site server subscriptions (I know this has already happened in one case) and for expensive workstations that quickly become outdated.
I believe onBunya gives UQ a huge advantage in the structural biology field. Researchers will wait less time between collecting their data and publishing it, which will make us more competitive for funding in the future.
It has also been exciting to watch people outside the Cryo-EM community use the onBunya desktops and be really impressed by their performance.
Overall, this new resource may be the thing that keeps structural biology on an upward trajectory at UQ.
As we all firmly believe that open-access code accelerates academic research, we hope to package the onBunya desktop code so that other universities can deploy it on their own resources.
The whole experience has been very rewarding: in the space of two years, I went from being someone who had never seen a command terminal to being involved in extremely computationally demanding projects.
This project was only possible because of the trust and lines of communication built between RCC and biologists at UQ. I hope to continue working with the RCC team to find other computational needs we can fulfil for researchers and to keep pushing for better.