Major upgrade for HPC Bunya goes live

13 December 2024

A significant hardware upgrade for the University of Queensland supercomputer Bunya, named Bunya Phase 3.0 (BP3), went live on Monday, 23 September 2024.

BP3 is delivering even higher performance across multiple scientific research domains with a focus on multi-GPU training, large language models (LLMs) and generative AI (Gen-AI).

All Bunya users have automatic access to almost all of the new Bunya nodes. The exception is the new four-way NVLink-connected, liquid-cooled NVIDIA H100 SXM5 GPU nodes for massive AI and machine learning workloads, which researchers need to apply to access.

These new nodes feature four NVLink interconnects between the GPUs, enabling the GPUs to share memory at more than 900GB per second.

RCC Director Jake Carroll said: “For researchers, this means that memory intensive codes, very large AI training workloads and tasks that require tight coupling between each GPU benefit significantly, decreasing time to solution and increasing overall efficiency of each job run. It can be thought of as having a cohesive mesh of 320GB of GPU memory available to feed each of the four GPUs.”
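The 320GB figure can be checked with simple arithmetic. The sketch below assumes each H100 SXM5 GPU carries 80GB of memory, a per-GPU figure the article does not state; the constant and function names are illustrative only.

```python
# Assumption: 80 GB per H100 SXM5 GPU (not stated in the article,
# which only gives the 320 GB total for the four GPUs in a node).
GPUS_PER_NODE = 4
MEMORY_PER_GPU_GB = 80


def pooled_gpu_memory_gb(gpus: int = GPUS_PER_NODE,
                         per_gpu_gb: int = MEMORY_PER_GPU_GB) -> int:
    """Return the combined GPU memory of one node's NVLink-connected GPUs, in GB."""
    return gpus * per_gpu_gb


if __name__ == "__main__":
    print(f"Pooled GPU memory per node: {pooled_gpu_memory_gb()} GB")
```

Under that assumption, four GPUs at 80GB each give the 320GB described in the quote.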

Researchers are being asked to apply for access to the H100 SXM5 nodes if they can demonstrate that their code will (a) make efficient use of the GPU, and (b) use large amounts of GPU memory in combination with high utilisation of GPU cores. To request access, researchers may contact the RCC Service Desk (rcc-support@uq.edu.au).

As well as the H100 SXM5 nodes, RCC has installed an extensive array of new NVIDIA A16 visualisation GPUs into Bunya for researchers using onBunya, a platform for new HPC users and those needing to run graphical user interface (GUI) applications on Bunya. (Read our article about onBunya.)

The A16 GPUs enable RCC to offer researchers many accelerated graphical desktops, with rendering, 4K video decoding, wireframe mesh acceleration and a smooth workstation experience inside the supercomputer.

“This is particularly useful for in-situ workloads, real-time feedback, interactive experiments where intervention is required, or where visual characterisation is important,” said Jake.

RCC Senior Developer and Consultation Manager Dr Marlies Hankel said the A16 GPUs will help with the demand for graphically accelerated desktops. “This in turn will free up the other GPUs for other work and in turn help with the demand there,” said Marlies.

A new set of NVIDIA L40S GPUs, which offer best-in-class FP32 performance, has also been added to Bunya. The L40S GPUs expand on the capabilities of the L40 GPUs, providing an even more powerful balance of machine learning and AI capability, inference performance and graphical rendering.

“This is an ideal workhorse device where the research requires a blend of many capabilities (such as AI inference and training; molecular dynamics; and CAD, CAM, and GIS) in the one onBunya desktop environment,” said Jake.

“The L40 and the L40S have been found to be even faster than NVIDIA’s dedicated A100 GPUs for machine learning and AI workloads.

“Lastly, RCC has added more than 1,500 cutting-edge AMD Genoa CPU cores as we scale Bunya in all directions to cater to the widest range of research workloads, including traditional software which is not GPU accelerated yet,” said Jake.

As well as delivering a significant array of new visual computing capabilities, BP3 ushers in a new era of using direct-to-chip liquid cooling technology to more efficiently cool supercomputer components.

Read our previous article about BP3, including more information about the direct-to-chip liquid cooling technology.

BP3 provides hundreds of UQ researchers with access to one of the most powerful supercomputing platforms among Australia’s Group of Eight (Go8) universities.
