Automating Predictive Modelling & Knowledge Discovery

Date: 24th April 2018
Speaker: Ioannis Tsamardinos from the University of Crete

The Data Science Institute would like to thank Ioannis Tsamardinos from the University of Crete, who presented 'Automating Predictive Modelling and Knowledge Discovery' as part of our Advanced Data Analytics Seminar Series.

Bio: Ioannis Tsamardinos, Ph.D., is a Professor in the Computer Science Department of the University of Crete (UoC). He obtained his Ph.D. in 2001 from the Intelligent Systems Program of the University of Pittsburgh. He then served on the faculty of the Department of Biomedical Informatics at Vanderbilt University until 2006, when he returned to Greece. His research interests lie in Machine Learning, Data Science, and Bioinformatics, particularly variable selection, causal discovery, and the automation of machine learning. He has mostly applied such methods to Bioinformatics and Biomedical Informatics. Ioannis Tsamardinos has over 100 international refereed publications in journals, conferences and edited volumes, more than 6000 citations in Google Scholar, and 2 US patents. He has been awarded the ERC Consolidator Grant and the Greek national grant on research excellence ARISTEIA II.

Abstract: There is an enormous, constantly increasing need for data analytics (collectively meaning machine learning, statistical modeling, pattern recognition, and data mining applications) across a wide range of domains, including biological, biomedical, and business applications.

The primary bottleneck in the application of machine learning is the shortage of expert human analyst time; hence there is a pressing need to automate machine learning, and specifically predictive and diagnostic modeling. In this talk, we present the scientific and algorithmic problems that arise when trying to automate this process: choosing an appropriate combination of algorithms for preprocessing, transformations, imputation of missing values, and predictive modeling; tuning the hyper-parameter values of those algorithms; and estimating predictive performance and producing confidence intervals. In addition, we present the problem of feature selection and how it fits within an automated analysis pipeline, arguing that feature selection is the main tool for knowledge discovery in this context.