Three RCC staff attended the world’s largest international supercomputing conference last month in the United States.
RCC Director Jake Carroll, Senior Research Computing Systems Engineer Ashley Wright, and Research Systems Projects and Delivery Manager Sarah Walters were all fortunate to be at SC24 in Atlanta, Georgia from 17–22 November.
Jake spoke at the IBM Storage Scale User Group meeting that was co-located at SC24. UQ and IBM Australia are partners in a Centre of Excellence and Innovation. The Centre supports researchers working across a wide range of disciplines including, but not limited to, health, life sciences, manufacturing and environmental sciences.
SC24 was Jake’s eighth time attending the conference. He has written a report about SC24, below, covering his highlights of the conference, general technology trends, and innovations that will be implemented at the University of Queensland.
SC24, also known as the International Conference for High Performance Computing, Networking, Storage, and Analysis, had a record attendance of almost 18,000, which surpassed the previous record by several thousand. It also featured the largest show floor ever, with more than 500 vendors and organisations.
SC24: “Everything is revolving around LLMs”
By RCC Director Jake Carroll
SC24 was vibrant, exciting, inspiring and completely re-invented.
The growing interest in supercomputing is coming from corporate enterprises that, as they attempt to navigate the AI revolution, are grappling with the problems we have been working on for many years. The reality is, AI has always been a supercomputing problem.
There were far too many innovative ideas and products at SC24 to count, but in the first instance, RCC will be implementing local, offline, fully sovereign Large Language Models (LLMs) for UQ researchers by the end of this year. See my highlights below for more information about these LLMs.
Highlights
- The rise and maturation of diverse cooling solutions to deal with the ever-increasing thermal design power (TDP) of the technologies we use in digital research infrastructure (DRI) now takes up a substantial part of every infrastructure discussion, and much of the real estate on the SC show floor. Open rack standards for both direct liquid-to-chip (DLC) cooling and power delivery are becoming prevalent, and we have a much better understanding of what the future holds in data centre design.
- Fully offline, local Large Language Models (LLMs) are a hot topic. Every institution wants sovereign AI capability – and there is a new expectation that institutional research computing centres will provide “Inference as a Service” to researchers.
- The difficulties and challenges of training these LLMs at scale are being openly discussed by companies like Meta, Microsoft and Google. Hardware reliability, scale and efficiency are all difficult. It is a grand challenge to make it all work.
- The rise of alternative GPU technologies. Companies like AMD, SambaNova, Cerebras and others are becoming very serious contenders in accelerated computing – their technologies are competitive, compelling and differentiated. The GPU ecosystem is growing, and we will once again have choice and options.
- The networking world is once again looking at standards, with challengers to InfiniBand slowly but surely becoming more credible. The Ultra Ethernet Consortium is becoming well established, and we will soon see a new breed of interconnect technology ready for testing.
- Everyone is on the power efficiency bandwagon. Everyone is worried. Everyone should be worried. Many techniques for reducing power consumption are being discussed – but it is still early days and there are more questions than answers.
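On the “Inference as a Service” point above: local inference servers commonly expose sovereign, on-premises models behind an OpenAI-compatible HTTP interface, so researchers can query institutional hardware without data leaving the campus. As a minimal sketch – the endpoint URL and model name here are hypothetical placeholders, not RCC’s actual service – a request payload in that common format might be built like this:

```python
import json

# Hypothetical values -- substitute whatever the institution's
# inference service actually exposes.
INFERENCE_URL = "http://localhost:8000/v1/chat/completions"
MODEL_NAME = "local-llm"


def build_chat_request(prompt: str, max_tokens: int = 256) -> str:
    """Build a JSON payload in the OpenAI-compatible chat format that
    many local inference servers accept."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)


# The request itself would be sent with any HTTP client; because the
# server runs locally, both the prompt and the model's response stay
# on institutional infrastructure.
body = build_chat_request("Summarise the FAIR data principles.")
```

The appeal of the OpenAI-compatible shape is that existing client tooling works unchanged against the sovereign deployment.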
General trends
- Everything is revolving around LLMs and, more generally, AI – but that isn’t the whole story. The rise of all the supporting infrastructure is critical, and more careful attention is being paid to networking, storage and workflows.
- Investment in data management is growing. The importance of single-namespace data storage management, trends in building scalable data fabrics, and the value of lowering data movement friction are slowly but surely being understood by the industry and broader community. Data generation, and the requirement to store it, has grown so large that we have little choice but to be much more intelligent about our data movement architectures.
- Investment in data governance is growing. FAIR is no longer only about data being Findable, Accessible, Interoperable and Reusable – it is now critical that FAIR data is actually found and accessed. If it isn’t, what did we really achieve with all that effort?
- Provenance of data and workflows is gaining traction in the scientific research community as we practise open science and scientific reproducibility atop digital research infrastructure.