Key elements of creating a sustainable infrastructure environment for HPC systems
Tabitha K. Samuel, Group Lead, HPC Operations group, National Institute for Computational Sciences, University of Tennessee
Managing and effectively administering an HPC system is a complex affair. Several pieces have to work together seamlessly to make for a reliable and secure, yet easy-to-use HPC system. The field of HPC system administration has made immense strides in the past few years.
In this talk I will focus on some key areas of administering an HPC system: configuration management, monitoring, intrusion detection and intrusion prevention systems, and advances in authentication mechanisms.
The National Institute for Computational Sciences (NICS) at the University of Tennessee is one of the leading high-performance computing centers of excellence in the United States. NICS’s flagship computing system, Kraken, the first academic computer to break the petaflop barrier, enabled researchers in numerous scientific arenas, from climate to materials science to astrophysics, to achieve breakthroughs not yet possible on other resources.
In this talk, I will also draw upon real-world implementation examples from the administration of Kraken and other supercomputers subsequently deployed at NICS.
Tabitha K. Samuel is the Group Lead for the HPC Operations group at the National Institute for Computational Sciences, University of Tennessee. The group manages all aspects of filesystems, cluster management, networking, security, infrastructure and systems programming for the center.
Her responsibilities also include contributing to current and future planning for the center, proposal writing and submission, and budget and project management.
Her research interests are in the areas of grid software, high-performance computing system data analysis, and user experience and engagement.
Ms. Samuel earned her Master's degree in Computer Science from the University of Tennessee, Knoxville, where her research focus was parallel computing for data mining applications.