Major AI and visualisation upgrades ahead for HPC Bunya

31 July 2024

A major upgrade to The University of Queensland's supercomputer, Bunya, is expected to deliver even higher performance across multiple scientific research domains, with a focus on multi-GPU training, large language models (LLMs) and generative AI (Gen-AI).

The upgrade will also bring a significant array of new visual computing capabilities and usher in a new era of direct-to-chip liquid cooling, which cools supercomputer components more efficiently.

The upgrade, named Bunya Phase 3.0 (BP3), is currently being implemented and is expected to be available to researchers in August 2024.

BP3 is eagerly anticipated because it gives thousands of UQ researchers access to one of the most powerful supercomputing platforms among the Group of Eight (Go8) universities.

RCC Director Jake Carroll said Bunya's uptake, popularity and utilisation amongst the University’s researcher community has become “extremely significant.”

“RCC must scale Bunya’s resources to meet research demands as they come. We have the agility and model of procurement to allow for that due to UQ's strategic decisions to invest in these foundational and underpinning capabilities, strengthening Tier-2 supercomputing structures and governance for the institution, state and broader sector,” said Jake.

BP3 will deliver a new class of NVIDIA H100 GPUs for even higher performance machine learning and artificial intelligence training scenarios.

This new version of the H100 will enable peer-to-peer memory and data transfers directly between GPUs at extremely high speed and very low latency, delivering even higher general accelerated compute performance.

Another new class of GPUs — NVIDIA A16 GPUs — will also be installed to greatly expand the number of visually interactive desktops available for researchers using onBunya, a recently introduced platform for new HPC users and those needing to run graphical user interface (GUI) applications on Bunya. (Read our article about onBunya.)

Overall, Phase 3.0 will add the following to Bunya:

  • a significant fleet of the latest NVLink-connected, liquid-cooled NVIDIA H100 GPUs for massive AI and machine learning workloads (see the section on liquid cooling below)
  • a new set of NVIDIA L40S GPUs for best-in-class FP32 performance
  • an extensive array of new NVIDIA A16 visualisation GPUs
  • almost 2,000 extra cutting-edge AMD Genoa CPU cores.

Bunya is refreshed and upgraded every year to bring new, cutting-edge technologies to UQ researchers. This shortens research time to discovery, gives UQ agility in technology leadership, and provides broad economies of scale.

RCC will announce when BP3 has become operational.

The upgrade was preceded by:

Bunya is UQ’s first heterogeneous supercomputer, with all capabilities consolidated into a very large, scalable platform.

Liquid cooling

Bunya Phase 3.0 (BP3) is unique in being UQ’s first hardware to use liquid to cool GPUs.

RCC Director Jake Carroll said the more efficient liquid-assisted air cooling technique is the beginning of a better, more sustainable path to heat-capture and ultimately, a smaller carbon footprint in supercomputing.

“The new GPUs in BP3 are so energy dense and hot, we need a better way to transfer heat away from them. At 700W per GPU and with four of them in a single node, air alone is no longer the right answer. It’s a pretty big deal for us to be moving to this type of liquid cooling technology and spells the direction of our industry,” said Jake.

“This new cooling technology means we’re not using as much power in the Data Centre to do the same job. It takes more energy to use fans to cool something than it does with liquid.

“This is step one for us in liquid cooling. We will take it further over time, with more complete heat-capture, better integration and more efficiency as liquid cooling technologies mature, become ubiquitous and standards settle. It is a generational shift for our industry.”
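As a rough illustration of the heat density Jake describes, the figures quoted above (700W per GPU, four GPUs per node) imply the following GPU power draw per node. This counts the GPUs only; CPUs, memory, networking and other node components would add to it. Essentially all of this electrical power ends up as heat that the cooling system must remove.

```latex
P_{\text{GPU,\,node}} = 4 \times 700\ \text{W} = 2{,}800\ \text{W} = 2.8\ \text{kW}
```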

The move to liquid cooling responds to the growing demands that AI, machine learning and other accelerated supercomputing workloads are placing on research computing at UQ.
