Read all about SC16 by those who were there

8 Dec 2016
Prof. David Abramson (behind the lectern) speaking at the DDN Storage booth at SC16.

Three RCC staff and one RCC-sponsored partner who last month attended SC16, the world’s largest supercomputing conference, in Salt Lake City, Utah, U.S., have written reports about their experience.

The conference, held 13–18 November this year, is centred on the practice, research and outcomes of scientific computing. With more than 10,000 attendees and hundreds of lectures, papers and briefings, it is the highlight of the year for staff, researchers and industry professionals focused on high-performance and research computing.

RCC's Prof. David Abramson, Dr Minh Dinh and Dr Jin Chao have all written about their experience at SC16, as has Jake Carroll, Senior IT Manager (Research) at UQ’s Queensland Brain Institute. Read the reports below.

 

Trip reports

Jake Carroll (QBI)

After a wonderful week of rapid learning at the University of California San Diego and the San Diego Supercomputer Center, I said goodbye to my new friends and colleagues and headed for Salt Lake City, Utah, for the 29th Annual Supercomputing Conference (known as SC16). There, RCC colleagues and I would carry on discussions of ‘where to next’ and further develop our new long-distance parallel filesystems infrastructure with colleagues from across the globe, as well as with the engineering and technical leads inside the companies that make the switching, filesystems and hardware platforms possible.

This year, I was one of two RCC partners sponsored to attend. I was very fortunate to be offered this opportunity, as it gives me a unique chance to interact with the leaders of the HPC industry, the cutting edge of computer science, and the global community practising research computing in the research-intensive parts of the university, so that I can ‘bring home’ valuable lessons, industry direction and innovations to build upon. In conjunction with RCC, this helps us set the pace, continue to innovate and allow UQ’s research and supercomputing to flourish in new and dynamic ways.

Focal points and industry trends at this year’s conference were wide-ranging, but several key themes are being heavily investigated by both the technology industry and computer scientists.

Deep learning is here to stay

Deep learning is becoming an ever larger part of the path to exascale computing platforms. The majority of the large government laboratories in the U.S. have now adopted strategies geared towards exascale, ramping up by way of alternative computing architectures.

The path and the choices made are proving divisive and contentious in industry, as two clearly different options have emerged: NVIDIA’s GPU technologies (the Pascal and Volta series of deep-learning-oriented compute acceleration platforms) and Intel’s Knights Landing (KNL) many-core x86 processors with native vectorisation.

Further, and perhaps more interesting than the hardware itself, a new paradigm in supercomputing is emerging, whereby deep learning and FCNs (fully convolutional networks) are being used to assess the size, scale and predicted efficiency of large scheduled supercomputing jobs and workloads before they run. This helps determine how well an in-silico experiment is likely to work before precious run-time, wall-time, CPU cores, power and other resources are consumed.
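As a rough illustration of the idea, the sketch below trains a model on historical job-accounting records and predicts the efficiency of a queued job before it is scheduled. It is a minimal sketch only: the features, the toy efficiency values and the use of a simple gradient-boosted regressor (rather than the fully convolutional networks mentioned above) are assumptions made purely for illustration, not any vendor’s actual implementation.

    # Hypothetical sketch: predict the efficiency of a queued HPC job from
    # historical accounting data before it is allowed to run. The systems
    # discussed at SC16 use deep learning (e.g. FCNs); a gradient-boosted
    # regressor stands in here purely for illustration.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    # Toy historical records: [nodes requested, walltime requested (h), input size (GB)]
    X_history = np.array([
        [4, 2, 50],
        [16, 8, 400],
        [64, 24, 900],
        [8, 4, 120],
    ])
    # Observed efficiency of each past job (useful work / resources consumed, 0-1)
    y_efficiency = np.array([0.85, 0.60, 0.35, 0.75])

    model = GradientBoostingRegressor().fit(X_history, y_efficiency)

    # A newly submitted job: 32 nodes, 12 h walltime, 600 GB of input
    predicted = model.predict(np.array([[32, 12, 600]]))[0]

    # A scheduler could flag low-efficiency jobs before they consume
    # run-time, wall-time and power.
    if predicted < 0.5:
        print(f"Predicted efficiency {predicted:.2f}: review before scheduling")
    else:
        print(f"Predicted efficiency {predicted:.2f}: worth running")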

As a result of the excellent discussions, executive briefings and one-on-one meetings with key industry R&D heads, CTOs and engineering leads afforded to us as conference attendees, UQ now understands many of the frameworks, strategies and techniques needed to implement some of these next-generation approaches on as-yet-unbuilt, next-generation accelerated compute infrastructure.

Memory technologies and general memory architecture are going to change, and change soon

Our in-depth briefings have shown us that the proverbial writing is on the wall for technologies such as DDR4 and other RAM mainstays. The next generation of technologies we were shown by consortia such as the Gen-Z group of partners points to a clear path towards much faster, more programmatically accessible and extensible memory architectures.

Technologies such as UQ RCC’s/QCIF’s HPC ‘FlashLite’ are a contemporary precursor to such ideas, with storage-class memory, NVMe and HBM-based memory subsystems in ‘chains of cache’ becoming a reality.

Many of the discussions with CTOs at Dell, IBM, SGI, Cray and others pointed towards a very different future in memory management and performance, and in the use of open-systems architectures for I/O in compute- and memory-bound applications.

As per usual, parallel filesystems were the talk of the show — but things are changing rapidly and constantly in this space

With more and more performance being exacted (and expected) from parallel filesystems every day (such as UQ’s MeDiCI network and the FlashLite/Tinaroo filesystems, which run IBM Spectrum Scale), a lot of effort in industry and academia is going into making these platforms deliver where they need to.

UQ is at the forefront of many of these concepts, implementing acceleration techniques “in memory” and “in storage” to deliver higher I/O throughput and lower-latency read and write operations to compute engines, designed to eliminate I/O-wait scenarios.

In discussions with key vendors, UQ has been able to make inroads into technologies such as IME (DDN’s Infinite Memory Engine) and IBM’s LROC and HAWC, allowing research workloads on our high-performance computing facilities to run unencumbered by I/O wait scenarios.

Finally, there was a lot of discussion about dynamically expandable and collapsible filesystem namespaces to contend with the massive data sets and expectations of modern in-silico experimentation and ‘campaign’-style storage requirements. In simple terms, sizing and scaling a filesystem for a specific, very large job or workload is no longer a difficult or inflexible undertaking: a namespace can be created dynamically, then collapsed when an experiment is over. This saves systems administrators time, gives researchers instant capability and avoids a difficult infrastructure burden in the upkeep or funding of such endeavours. It is an exciting development, and UQ will likely beta test some of these capabilities in the near future, stemming from our discussions with key R&D personnel.
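To make that lifecycle concrete, here is a purely hypothetical sketch of how a ‘campaign’ namespace might be managed around a single experiment: created on demand, used for the life of the workload, then collapsed. The provision_namespace and collapse_namespace helpers are invented placeholders, not calls into any real storage product or API.

    # Hypothetical sketch of 'campaign'-style storage: a filesystem namespace
    # is created for one large experiment and collapsed when it finishes.
    # provision_namespace() and collapse_namespace() are invented placeholders.
    from contextlib import contextmanager

    def provision_namespace(name: str, capacity_tb: int) -> str:
        # Placeholder: a real system would call the storage platform's API
        # to carve out a dynamically expandable namespace.
        print(f"Provisioning namespace '{name}' ({capacity_tb} TB)")
        return f"/campaign/{name}"

    def collapse_namespace(path: str) -> None:
        # Placeholder: data would be archived or discarded and space reclaimed.
        print(f"Collapsing namespace at {path}")

    @contextmanager
    def campaign_storage(name: str, capacity_tb: int):
        path = provision_namespace(name, capacity_tb)
        try:
            yield path
        finally:
            collapse_namespace(path)

    # Researchers get instant capability for the lifetime of one experiment only.
    with campaign_storage("imaging-run-42", capacity_tb=500) as scratch:
        print(f"Running workload against {scratch}")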
 

Overall, SC16 was vibrant, inspiring and full of content that will be of significant value to UQ’s research computing over the medium to long term. Our interactions with our international counterparts are becoming stronger and more valuable every year.

It has been a wonderful experience, full of learning and potential, and I’d like to thank the RCC for their support in facilitating my trip and the many interactions this year. We’re certainly ending the year with technological highlights and some excellent outcomes.

 

Prof. David Abramson (RCC)

The IEEE Supercomputing Conference, otherwise known as ‘SC[insert year here]’, is huge: it usually attracts 10,000–15,000 delegates and consists of an extensive technical program, a number of outreach activities and a large show floor. First-time attendees are usually staggered by the show floor, where every supercomputing vendor in the world (even the smaller companies) has a booth demonstrating its latest products.

I gave two talks at SC16 — one as an invited keynote speaker in the WORKS workshop (Workflows in Support of Large-Scale Science), entitled “Using Scientific Workflows for Science and Engineering Optimisation”. This talk described RCC’s Nimrod/OK environment that merges the Nimrod parameter sweep tools and the Kepler workflow engine.

I also gave a talk at the DDN booth on the show floor, entitled “Caches All the Way Down: Infrastructure for Data Science”. This talk described the MeDiCI data fabric and the HPC FlashLite cluster at UQ, and how these support data-intensive applications.

I also attended a number of invited and keynote talks at the conference. These included: Dr Katharine Frase on “Cognitive Computing: How Can We Accelerate Human Decision Making, Creativity and Innovation Using Techniques from Watson and Beyond?”; Dr Charlie Catlett (who spoke in our RCC seminar series last year) on “Understanding Cities through Computation, Data Analytics, and Measurement”; Janice Coen on “Advances and Challenges in Wildland Fire Monitoring and Prediction”; and Kristin Persson on “The Materials Project – A Google of Materials”.

I attended an excellent panel entitled “HPC and Precision Medicine: Researchers Are on the Brink of Finding Cures to Cancer and Other Deadly Diseases within This Generation but Only with the Power of HPC”.

These sessions were just a small fraction of the technical program, which can be seen on the SC16 website.

I met with a number of supercomputing vendors and discussed their latest products and our plans at UQ. These included DDN, IBM, Dell, SGI, Cray and Huawei.

I was present for the annual National Center for Supercomputing Applications (NCSA) International Affiliates Meeting, and learned of a number of interesting developments at NCSA. These included presentations on the National Data Service, the Midwest Big Data Hub, Research and Education Grand Challenges at NCSA, SAVI: Global Initiative to Enhance @scale and distributed Computing and Analysis Technologies (GECAT), and the new XSEDE annual conference format, PEARC17 (Practice and Experience in Advanced Research Computing).

It was a pleasure to hear my friend Bill Gropp’s talk where he accepted the Ken Kennedy Award for highly influential contributions to the programmability of high-performance parallel and distributed computers. Bill has also given talks in our RCC seminar series on a number of occasions.

It was a particular delight to see our high school students from the John Monash Science School (in Melbourne), the Queensland Academy for Science Mathematics and Technology and Brisbane’s Faith Lutheran College attending the conference, and learning so much. We acknowledge SGI and ScaleMP, who provided financial support, and SGI, who also provided a booth tour.

The conference was graced with a fine dusting of snow on the last night as a reminder that winter was coming to the northern hemisphere.

 

Dr Jin Chao (RCC)

The Supercomputing (SC) conference is the most important event for the high performance computing (HPC) community, and is held annually in the U.S.A. This year, SC16 was held 13–18 November in Salt Lake City, Utah.

Each year’s topics typically cover the most significant research problems, solutions for challenging technical issues, visions for future direction, and awards for important achievements. The conference program consists of exhibitions, plenary talks, panel discussions, workshops, and technical paper presentations.

This year’s conference theme was “HPC Matters”. Plenary talks reviewed the impact of HPC on discovering new medicines to make our lives better, and on accelerating research into the next generation of artificial intelligence.

I attended many sessions based on my interests, including energy efficiency of parallel computing, scalable storage systems, reproducibility of extreme scale computing, and large-scale data analysis.

On the power challenge for exascale computing, Thomas Theis presented hardware technologies, such as the tunnel field-effect transistor (TFET), that have the potential to reduce power consumption by a factor of 100 without sacrificing performance. However, such devices still have reliability issues that must be solved before practical use.

My RCC colleague Dr Minh Dinh delivered a talk about our research on verifying exascale computing results in the Extreme-Scale Programming Tools (ESPT) workshop, which attracted significant interest from the audience.

This year, the ACM Gordon Bell Prize was awarded to a Chinese team for “10M-Core Scalable Fully-Implicit Solver for Nonhydrostatic Atmospheric Dynamics”. This was the first time in the prize’s 30-year history that China had won this prestigious award, and it suggests that China will play an important role in HPC in the coming decades.

I learnt a lot from this conference, including the most recent research progress in important domains such as reducing HPC power consumption, accelerating data analysis using HPC facilities, improving the use of cloud computing in scientific computing, and tackling uncertainty in extreme-scale computing. I believe this new knowledge will benefit my research and help me produce outcomes that are useful for RCC and UQ researchers.

 

Dr Minh Dinh (RCC)

This was my second time attending SC (the last time was back in 2011, in Seattle) and, as much as I thought I knew what it was all about, I was still absolutely overwhelmed.

This year’s theme was ‘HPC Matters’, with a focus on HPC solutions for big data and deep learning problems. We had a plenary session that discussed how HPC matters in supporting ‘Precision Medicine’ and a keynote session on the impacts of Cognitive Computing.

In the Extreme-Scale Programming Tools workshop, I delivered a talk about RCC’s latest effort in using statistics to support runtime verification of large-scale scientific codes.

I also attended several talks in this workshop, which discussed issues and solutions for I/O in HPC. I was introduced to the unum floating-point format, presented as part of an analysis technique that could help verify the accuracy of floating-point values in scientific computation.

In this year’s SC technical program, my interests were in-situ processing and energy-aware computing. As a result, I was drawn to the In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualisation (ISAV-16) workshop and the Energy Efficient Supercomputing (E2SC-16) workshop.

A talk in E2SC-16 caught my attention because it described an energy observation model very similar to the approach we are taking at RCC as part of our energy optimisation project.

The exhibition area of SC16 was fun, exciting and full of participants, with many booths from technology vendors and institutions around the world. I was particularly impressed with the Intel team and their guided traveller tour. I was also captivated by the ParaView booth, as I was interested in ParaView Catalyst, software that enables in-situ analysis and visualisation of large-scale scientific applications.

Finally, regarding the ‘wilder’ side of the SC conference, the Mellanox party was fun, with food, drink and the stand-up comedian Gabriel ‘Fluffy’ Iglesias. Mellanox’s CEO presented the company’s latest products, including a smart switch with only 90 ns latency, a smart network adapter with 200 Gb/s speed and 600 ns end-to-end latency, and a 200 Gb/s network cable.

Overall, SC16 was fun, exciting, engaging and, of course, super crowded. A tip for first-time attendees (and even second-time attendees like me): be very clear about your interests and what you are after before entering the venue, because there is so much to learn, so many people to interact with and so much material to digest.
