UQ trials advanced AI hardware in sector first

13 December 2024

The University of Queensland is the first in Australia’s education sector to trial Lenovo and chipmaker AMD’s next-generation GPU hardware for the most demanding artificial intelligence and high-performance computing applications.

Lenovo, a multinational technology company, has provided its ThinkSystem SR685a v3 server, featuring the eight-way AMD Instinct MI300X 192GB 750W Accelerator platform, to UQ’s Research Computing Centre (UQ RCC) for trial and testing in a university scientific research setting.

UQ School of Electrical Engineering and Computer Science Senior Lecturer Dr Shekhar “Shakes” Chandra is the University’s first researcher to test the hardware with RCC’s assistance.

Dr Chandra is an imaging expert who leads a strong research team focused on deep learning and AI, working on medical image analysis and on signal and image processing applied to many areas of science and medicine.

“For researchers like me looking to create new models that mimic human vision, the AMD MI300X technology is a game changer because instead of waiting four days to know if your model is working correctly, we can now tell how good it is in several hours, and we can also work on far larger and more sophisticated models than we’ve ever been able to before,” said Dr Chandra. (Read more about Dr Chandra's experience with this new technology further below.)

UQ RCC Director Jake Carroll said a broad range of data- and HPC-intensive research domains — including, but not limited to, artificial intelligence, astrophysics, climate and weather, computational chemistry, computational fluid dynamics, earth science, genomics, geophysics and molecular dynamics — will all potentially benefit from access to the advanced technology.

“This system is unusual as it has extremely high GPU memory density. At 192GB of HBM3 per GPU and with eight of those tightly linked together, this system has the potential to enable Large Language Model (LLM) work in generative AI with many billions of parameters, easily,” said Mr Carroll.

“Simultaneously, the AMD MI300X GPU is flexible because it also provides excellent single and double precision performance, enabling acceleration for traditional HPC codes in ways we’ve not had as much opportunity to understand and explore yet.

“Diversity in the GPU ecosystem is key for a healthy supercomputing future. It is my hope to take leading capabilities and technologies like these and get them in the hands of our researchers fast, enabling new paths for exploration that were not possible previously.”
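Mr Carroll's point about GPU memory density can be made concrete with some rough arithmetic. The Python sketch below is a weights-only estimate under stated assumptions: decimal gigabytes, and common per-parameter sizes (4 bytes for FP32, 2 for BF16, 1 for FP8) that are standard numeric formats rather than anything specified in the article. It ignores memory for activations, optimizer state and caches, and the names are illustrative.

```python
# Rough, weights-only estimate of how many model parameters fit in the
# aggregate GPU memory of one eight-GPU MI300X node.
# Assumptions (not from the article): decimal gigabytes (1 GB = 10**9 bytes),
# and no memory reserved for activations, optimizer state or caches.

GPUS_PER_NODE = 8
HBM_PER_GPU_GB = 192  # HBM3 per MI300X, as quoted in the article

# Bytes needed to store one parameter at common numeric precisions (assumed formats).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def total_hbm_bytes(gpus: int = GPUS_PER_NODE, gb_per_gpu: int = HBM_PER_GPU_GB) -> int:
    """Aggregate GPU memory of the node, in bytes."""
    return gpus * gb_per_gpu * 10**9


def max_params_billions(precision: str) -> float:
    """Upper bound on parameter count (in billions) whose weights alone fit in memory."""
    return total_hbm_bytes() / BYTES_PER_PARAM[precision] / 10**9


if __name__ == "__main__":
    print(f"Total HBM3: {total_hbm_bytes() / 10**9:.0f} GB")
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: up to ~{max_params_billions(precision):.0f} billion parameters (weights only)")
```

Under these assumptions the node holds 1,536 GB of HBM3, enough for the weights of a model with roughly 768 billion parameters in BF16. Training needs several times more memory per parameter for gradients and optimizer state, so the practical model size for training is considerably smaller.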

Sinisa Nikolic, Lenovo’s Asia-Pacific Director of HPC, AI and Cloud Solution Providers, said the company’s collaboration with UQ marks a significant step in accelerating research capabilities across Australia’s education sector.

“With Lenovo’s ThinkSystem SR685a v3 server powered by AMD's MI300X accelerators, we’re bringing advanced AI and HPC resources directly into the hands of researchers, empowering them to tackle data-intensive challenges across disciplines,” said Mr Nikolic.

“This initiative not only enables pioneering work in fields like genomics, climate science, and physics, but also underscores our commitment to driving innovation with sustainable, energy-efficient technologies that align with the evolving needs of modern research institutions.”

The Lenovo/AMD system combines eight MI300X accelerators in a single platform, delivering the fast acceleration, large memory and input/output bandwidth needed to handle the huge datasets behind intensive AI tasks such as generative AI and LLMs.

Machine learning and LLM workloads have become highly data intensive, often requiring jobs to be split across multiple GPUs. AMD Instinct accelerators help accommodate large models through shared memory and caches.

The platform is also designed for power efficiency, interoperability with other systems, and scalability.

Mr Carroll said: “With data centre economics and sustainability now a critical factor in the design and implementation of GPU/AI enabled supercomputing, understanding power and energy efficiency is now top of mind to almost all research and advanced computing centres around the world.”

According to Mr Carroll, this technology is being used by several AI NeoCloud providers, technology giants such as Microsoft and Meta, and other “hyperscale customers.”


UQ researcher tests AMD MI300X

By Dr Shekhar “Shakes” Chandra, Senior Lecturer, UQ School of Electrical Engineering and Computer Science (EECS)

As imaging researchers, we use many computer vision techniques and need to train vision models. These techniques are very computationally intensive because they involve scanning and searching images for just the right information to make sense of their content (called features), training models on those features, and then using them to make decisions at runtime, i.e., when we need to deploy these techniques in practice.

This [AMD MI300X] hardware makes an incredible difference to how much of the computation can be done in parallel when training our models. Usually we have access to one to three graphics processing units (GPUs) at once; GPUs are processors like CPUs, but they can do many more things in parallel. This new hardware gives us access to eight GPUs in a single node, which is simply incredible.

To give an example, one of the standard benchmarks for imaging and computer vision is the ImageNet dataset, which consists of about 1.4 million images of varying resolution across 1,000 categories. It currently takes four days to train a simple, well-established model called ResNet on a single GPU, but with this new hardware it takes a mere 5.5 hours: 17.5 times faster! As models get larger and more sophisticated, this training time grows rapidly.
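The speedup figure can be checked with simple arithmetic. The minimal Python sketch below uses only the two timings quoted in the text; the variable and function names are illustrative.

```python
# Back-of-the-envelope check of the ResNet/ImageNet speedup quoted above.
# Both input figures (four days on one GPU, 5.5 hours on the eight-GPU
# MI300X node) come from the article; nothing else is assumed.

HOURS_PER_DAY = 24


def speedup(baseline_hours: float, new_hours: float) -> float:
    """Ratio of old to new wall-clock training time."""
    return baseline_hours / new_hours


baseline_hours = 4 * HOURS_PER_DAY  # four days = 96 hours
mi300x_node_hours = 5.5

if __name__ == "__main__":
    ratio = speedup(baseline_hours, mi300x_node_hours)
    print(f"Baseline: {baseline_hours} h, MI300X node: {mi300x_node_hours} h")
    print(f"Speedup: {ratio:.2f}x")  # 96 / 5.5 = 17.45..., i.e. roughly 17.5x
```

Four days is 96 hours, and 96 / 5.5 ≈ 17.45, which rounds to the 17.5× quoted. The baseline is a single GPU while the new figure uses the whole eight-GPU node, so the ratio describes the node as a whole rather than a per-GPU improvement.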

The speed of the [AMD MI300X] technology will make creating new state-of-the-art models so much easier.
