For challenging prediction problems in areas like medical diagnosis and text or image recognition, non-parametric machine learning methods such as deep neural networks, support vector machines, boosting methods, or random forests are typically used. These models can make highly accurate predictions, but their runtime does not scale well with an increasing number of data points. For a given prediction problem, it is often unclear which modeling approach is optimal. Parallelization is one obvious way to reduce the time required for computing on large datasets. In addition, the complex configurations of these models should be tuned by intelligent methods. Because of their high computational cost, only black-box optimizers that can cope with a severely restricted budget of evaluations are applicable.
For a given large-scale dataset, the optimal model should be selected from a large space of potential methods, preprocessing options, and algorithmic settings. To achieve this, modern and efficient optimizers like Iterated-Racing and model-based optimization (MBO) should be used in highly parallel versions, which will be implemented and evaluated in this project. The aim is a highly parallel, fully automatic model selection engine that runs on the LRZ HPC systems.
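The core idea of budget-restricted, parallel black-box model selection can be sketched with a toy example. This is a minimal illustration only, not the project's implementation: the `evaluate` function is a hypothetical stand-in for an expensive model fit (e.g. cross-validated error of one candidate configuration), and plain random search stands in for the more sophisticated Iterated-Racing or MBO strategies mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def evaluate(config):
    """Hypothetical stand-in for an expensive training run.

    In the real setting this would fit a model with the given
    configuration and return a cross-validated loss to minimize.
    """
    x, y = config["x"], config["y"]
    return (x - 0.3) ** 2 + (y - 0.7) ** 2

def parallel_random_search(budget=16, workers=4, seed=0):
    """Spend a fixed evaluation budget, evaluating candidates in parallel."""
    rng = random.Random(seed)
    candidates = [{"x": rng.random(), "y": rng.random()}
                  for _ in range(budget)]
    # Evaluate all candidate configurations concurrently; the total
    # number of evaluations stays within the fixed budget.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        losses = list(pool.map(evaluate, candidates))
    best_idx = min(range(budget), key=lambda i: losses[i])
    return candidates[best_idx], losses[best_idx]

best_config, best_loss = parallel_random_search()
```

Smarter optimizers differ from this sketch mainly in how they propose the next candidates (racing out poor configurations early, or fitting a surrogate model of the loss surface), while the budget-limited, parallel evaluation loop stays the same.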
Coordinator(s): Prof. Dr. Bernd Bischl
Staff: Janek Thomas