Analysing Big Data using Workflows: from fighting wildfires to helping patients
Dr Ilkay Altintas, Chief Data Science Officer, San Diego Supercomputer Center, University of California San Diego
A growing number of applications require the processing of streaming data, often characterised as Big Data due to the volume, velocity and/or variety of the data involved. Such applications are driving an ever-greater need for dynamic capabilities in computing.
Over the last decade, scientific workflows and dataflow systems have emerged as a successful model for big data processing, especially in scenarios where a scalable and reusable integration of streaming data, analytical tools and computational infrastructure is needed.
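As a rough illustration of the dataflow model behind such systems (a toy Python sketch only, not code from Kepler or any production workflow engine), a workflow composes independent, reusable processing stages over a stream of records; the stage names and data here are invented for illustration:

```python
# Toy dataflow pipeline: each stage is a generator that consumes a stream
# and emits a new one, so stages can be composed and reused independently.

def source(records):
    """Emit raw records as a stream."""
    for r in records:
        yield r

def clean(stream):
    """Filter out malformed readings (here: non-positive values)."""
    for r in stream:
        if r["value"] > 0:
            yield r

def aggregate(stream, window=3):
    """Compute a rolling mean over a fixed-size window."""
    buf = []
    for r in stream:
        buf.append(r["value"])
        if len(buf) == window:
            yield sum(buf) / window
            buf.pop(0)

# Compose the stages into a workflow by chaining the generators.
readings = [{"value": v} for v in [2.0, -1.0, 4.0, 6.0, 8.0]]
result = list(aggregate(clean(source(readings))))
# The invalid reading (-1.0) is dropped before aggregation.
```

Because the stages only communicate through the stream, the same composition pattern scales from a single process to distributed execution, which is the property workflow systems exploit.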
Emerging heterogeneous computing architectures and cloud technologies are enabling workflows to be utilised as a scalable and reproducible programming model for data streaming and steering within dynamic data-driven applications.
This talk will summarise the varying and changing scalability requirements of distributed workflows, as influenced by Big Data and heterogeneous computing architectures, including our ongoing research on end-to-end performance prediction and scheduling for workflow-driven applications.
Dr Ilkay Altintas is the Chief Data Science Officer at the San Diego Supercomputer Center (SDSC), University of California San Diego, where she is also the Founder and Director of the Workflows for Data Science Center of Excellence.
Since joining SDSC in 2001, she has worked on different aspects of scientific workflows as a principal investigator and in other leadership roles across a wide range of cross-disciplinary projects.
She is a co-initiator of, and an active contributor to, the open-source Kepler Scientific Workflow System, and an author of publications on computational data science at the intersection of scientific workflows, provenance, distributed computing and big data, with applications across many scientific domains.