RCC HPCs aid evolutionary discovery to rewrite textbooks

19 Dec 2019
Professors Bernie and Sandie Degnan with Sir David Attenborough on Australia's Heron Island.

A significant University of Queensland evolutionary discovery published in Nature was aided by the Research Computing Centre’s (RCC) high-performance computers (HPCs).

Researchers in the Degnan Marine Genomics Labs used the Awoonga and FlashLite HPCs in their work on the Nature paper, published in June this year.

The paper upends biologists’ century-old understanding of the evolutionary history of animals.

Using new technology to investigate how multi-celled animals developed, the researchers uncovered a surprising result.

Senior co-author Professor Bernie Degnan, Director of UQ’s Centre for Marine Science, said the findings challenge a long-standing idea: that multi-celled animals evolved from a single-celled ancestor resembling a modern sponge cell known as a choanocyte.

“We’ve proposed that the first multicellular animals probably weren’t like the modern-day sponge cells, but were more like a collection of convertible cells,” Professor Degnan said.

“The great-great-great-grandmother of all cells in the animal kingdom, so to speak, was probably quite similar to a stem cell.

“This is somewhat intuitive as animals have many cell types that are used in very different ways—from neurons to muscles—and cell flexibility has been critical to animal evolution from the start.”

The team sequenced all of the genes expressed in individual cells, allowing the researchers to compare cell types over evolutionary time.

Fellow senior co-author Professor Sandie Degnan said this meant they could tease out the evolutionary history of individual cell types by searching for the ‘signatures’ of each type.

“Many biologists for many decades believed the existing theory to be true, as sponge choanocytes look so much like single-celled choanoflagellates, the organisms considered to be the closest living relatives of animals,” she said.

“But their transcriptome signatures simply don’t match, meaning that these perhaps aren’t the core building blocks of animal life that we originally thought they were.

“This single-cell sequencing technology has only been in use for the last few years, but it’s helped us finally address an age-old question, discovering something completely contrary to what anyone had ever proposed.

“We’re taking a core theory of evolutionary biology and turning it on its head,” she said.

“Now we have an opportunity to re-imagine the steps that gave rise to the first animals, the underlying rules that turned single cells into multicellular animal life.”

Professor Bernie Degnan said he hoped the revelation would help us understand our own condition, including our own stem cells and cancer.

The Degnan Marine Genomics Labs’ HPC usage

Researchers in the Degnan Marine Genomics Labs used two RCC HPCs: Awoonga, a conventional HPC cluster, and FlashLite, which is designed for data-intensive applications.

PhD student Xueyan Xiang initially used Awoonga simply because the research data she needed was first made available to her on the HPC. “It was easy just to carry on using it!” she said.

As it turned out, Awoonga has a very good range of bioinformatics software, as well as RStudio, a free development environment for R, the open-source statistical programming language.

Awoonga also suited her need to run a large number of jobs.

Meanwhile, postdoctoral fellow Dr Haojing Shao found FlashLite to be the best fit for his work because it has plenty of main memory (RAM) and its resources were available when he needed them.

“My research benefits from FlashLite. I could run any program at any time without any queueing time. It is time-saving,” said Dr Shao.

He used Canu, a single-molecule sequence assembler, on FlashLite to run a very memory-intensive assembly of the genome of Xestospongia bergquistia, a barrel sponge.

He also used FlashLite for BLAST searches. BLAST is a bioinformatics tool for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotide sequences of DNA or RNA.

QFAB Database and Systems Administrator Nick Rhodes, who is embedded within the Degnan Marine Genomics Labs one day per week, guided Dr Shao, Ms Xiang and other Degnan Labs researchers through the complex terrain of high-performance computing.

“The researchers are a bunch of very talented biologists whose computer skills often don’t extend to the Linux command line,” said Mr Rhodes.

“When they get into the analyses, I install and upgrade the software but also advise on the computational requirements—and if necessary direct them to HPC resources, such as QRIScloud, Awoonga and FlashLite.”

The group has Nectar virtual machines (VMs) attached to storage on QRIScloud (QCIF’s cloud compute service, which is free for UQ researchers), but these VMs are not suitable for genome assembly.

“I spent a lot of time moving data (and results) between group resources and HPCs. It’s also important to facilitate timely data transfer between the authors; recent lab members are now in the USA, UK, Mexico, Austria and Chile, as well as Australia,” said Mr Rhodes.

Other important work in the Degnan Labs includes elucidating the sea squirt genome for biotechnological applications and analysing the crown-of-thorns starfish (COTS) genome for biocontrol technologies.

COTS feed on coral and, in plague proportions, can devastate hard coral communities. The Degnan Labs are looking to develop biological agents to control COTS in infested regions of the Great Barrier Reef.

Researchers:

Professor Bernie Degnan, Professor Sandie Degnan, Ms Xueyan Xiang and Dr Haojing Shao
Centre for Marine Science
School of Biological Sciences
University of Queensland

Research community:

Marine Science

Resources used:
  • QRISdata: 30 TB frequent-access storage
  • QRIScloud:
    • 52 vCPUs
    • 208 GB RAM
    • 4 TB volume storage
  • QFAB’s high-memory node
    • 1 TB
  • Awoonga HPC
    • General-purpose compute for long-running queries using proprietary or turnkey software not available on group virtual machines.
  • FlashLite HPC
    • 4 TB frequent-access storage (/30days/)
    • 400 GB (/90days/)
