Kernel methods provide a mathematically rigorous way of learning; however, they are usually inefficient on large amounts of data due to poor scaling in the number of data points. Furthermore, they are flat models, in the sense that they consist of only a single linear combination of non-linear functions. Another drawback is that they do not allow for end-to-end learning, since the model learning is decoupled from the data-representation learning. In contrast, Neural Network techniques are able to make use of such large amounts of data and computational resources and combine the representation learning with the model learning.
Based on a recent Representer Theorem for Deep Kernel learning [1], we examine different setups and optimization strategies for Deep Kernels, including some theoretical analysis. We show that, even with simple kernel functions, the Deep Kernel approach leads to setups similar to Neural Networks but with optimizable activation functions. A combination of optimization and regularization approaches from both Kernel methods and Deep Learning yields improved accuracy in comparison to flat kernel models. Furthermore, the proposed approach easily scales to large amounts of training data in high dimension, which is important from the application point of view. Preliminary results on a fluid dynamics application (with a dataset of up to 17 million data points in 30 dimensions) show favorable results compared to standard Deep Learning methods [2].
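The layered structure described above can be illustrated with a minimal sketch: each layer is itself a linear combination of kernel evaluations, so the inner layer plays the role of an optimizable activation map feeding a flat outer kernel model. This is a hypothetical illustration only; the Gaussian kernel choice, center counts, and dimensions are assumptions, not the concrete setup of the referenced works.

```python
import numpy as np

def gaussian_kernel(x, centers, gamma=1.0):
    # pairwise Gaussian kernel k(x, c) = exp(-gamma * ||x - c||^2)
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))       # 100 samples in 30 dimensions

# inner layer: kernel expansion at 20 centers, mapped to a 10-dim
# learned representation (centers and coefficients would be trained)
C1 = rng.normal(size=(20, 30))       # inner centers (illustrative values)
W1 = rng.normal(size=(20, 10))       # inner coefficients
H = gaussian_kernel(X, C1) @ W1      # shape (100, 10)

# outer layer: a flat kernel model on the learned representation
C2 = rng.normal(size=(15, 10))       # outer centers
w2 = rng.normal(size=(15,))          # outer coefficients
y_hat = gaussian_kernel(H, C2) @ w2  # shape (100,) predictions
```

In an actual Deep Kernel model all centers and coefficients would be optimized jointly, e.g. by gradient descent, which is what makes the inner layer an optimizable activation function rather than a fixed one.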
[1] B. Bohn, C. Rieger, M. Griebel: "A Representer Theorem for Deep Kernel Learning", Journal of Machine Learning Research 20.64 (2019): 1-32.
[2] T. Wenzel, G. Santin, B. Haasdonk: "Deep Kernel Networks: Analysis and Comparison", Preprint, University of Stuttgart, in preparation.