RCC is working with UQ's Centre for Microscopy and Microanalysis (CMM) to deploy a research data management (RDM) solution to help its researchers handle large amounts of data from scientific instruments.
RCC and CMM are using Clowder, a customisable and scalable data management framework for structured and unstructured data developed by the US National Center for Supercomputing Applications (NCSA).
Clowder enables users to share, annotate, organise and analyse large collections of data sets.
Data from CMM’s instruments, such as microscopes and X-ray and mass spectrometry instruments, will be ingested directly into the Clowder-based system, called Pitschi (Particle Imaging depoT using Storage CacHing Infrastructure).
Project lead and RCC Systems Programmer Dr Hoang Nguyen said the development team evaluated a number of different platforms. “We decided to use Clowder as it can be customised to UQ storage and compute infrastructure, i.e. UQRDM, MeDiCI and UQ HPCs.
“Clowder also addresses the problem of data preservation in a world with growing amounts of digital data, much of which is not properly curated,” said Hoang.
The team has modified Clowder so that it maps seamlessly to CMM’s instrument booking system and to data storage on UQ Research Data Manager (UQRDM), making it easier to share and reuse data across campus and between institutions.
They have also created a user-friendly image browser for navigating quickly through image and document data sets, and have enabled the harvesting of metadata during data ingestion.
CMM Data Informatics Manager Dr Rubbiya Ali, who worked closely with Hoang on the project, said: “We spent a lot of time working on pre-requirements, such as good network hardware, the protocols for high-speed communication between the repository and UQRDM, and then the ingestion of the data into the repository.
“With the help of RCC, we are now very close to achieving a complete data solution, Pitschi, for CMM instruments, which allows raw data to be streamed into the UQRDM and then into CVL@Wiener for researchers to do further 2D and 3D image processing.”
Pitschi is likely to be available in mid-2021. It is configured to integrate with UQ’s new Research Infrastructure Management System (RIMS).
Pitschi is part of the ARDC-funded national Australian Characterisation Commons at Scale (ACCS) project.
The ACCS will deliver a rich ecosystem of computing systems, data repositories, workflows, and services, connected with instruments for researchers who use characterisation techniques or imaging collections, and facility scientists who run instruments.
The ACCS is expected to lower the burden of operating services, and deliver broad capability across four nodes: UQ, Monash University, University of Sydney and University of Western Australia.
Pitschi meets ARDC’s project requirement of adhering to FAIR (Findable, Accessible, Interoperable, Reusable) data principles. FAIR data bring many advantages, such as achieving maximum impact from research and enabling new research questions to be answered.