The Summer School was organized by the SFB 876 research center at TU Dortmund, Germany, from 29 September to 2 October 2014. It focused on machine learning applications using multi-core parallel optimization under limited-resource constraints. The target audience consisted mostly of PhD students from different research areas who use ML in their work. On the first day, Céline Robardet gave a presentation on graph theory and how it can be applied to mining huge datasets. Next, the Streams framework was presented. This framework allows one to create a "pipe" and feed it any data, including numerical series with random distributions or even images. A two-session lecture on k-means clustering followed. We learned that k-means can be formulated as a matrix factorization X = W · H, where the main optimization problem is to find the matrices W and H under defined constraints, which is not a trivial task. Such a factorization can be beneficial when it comes to compressing the storage needed for the matrix.

At the beginning of the second day, Jian-Jia Chen presented results on scheduling methodology to reduce the overall time of client-server execution. This becomes relevant when so-called "thin clients" need to execute a set of granular tasks both remotely on a powerful server and locally; task execution then has to be planned according to deadlines and required responses. Finally, a lecture on privacy in learning using information theory was given. It was shown that, by means of entropy and related metrics, it is possible to deduce private information from anonymized datasets.
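The k-means factorization mentioned above can be sketched in a few lines. This is a minimal illustration, not the lecture's actual implementation: W is a binary assignment matrix (one 1 per row) and H holds the centroids, and Lloyd's algorithm alternates between fixing H and fixing W. The function name and the simple deterministic initialization are my own choices for the sketch.

```python
import numpy as np

def kmeans_factorize(X, k, iters=20):
    """k-means written as a matrix factorization X ~ W @ H:
    W is a binary assignment matrix (one 1 per row) and H holds
    the k centroids. Lloyd's algorithm alternates between fixing
    H (assignment step) and fixing W (centroid update step)."""
    n = len(X)
    H = X[:: max(n // k, 1)][:k].astype(float)  # simple deterministic init
    for _ in range(iters):
        # assignment: W[i, j] = 1 iff centroid j is closest to X[i]
        dists = ((X[:, None, :] - H[None, :, :]) ** 2).sum(axis=-1)
        W = np.zeros((n, k))
        W[np.arange(n), dists.argmin(axis=1)] = 1.0
        # update: each centroid becomes the mean of its assigned points
        counts = np.maximum(W.sum(axis=0), 1.0)
        H = (W.T @ X) / counts[:, None]
    return W, H

# two well-separated clusters of 5 points each
X = np.vstack([np.zeros((5, 2)), 10.0 * np.ones((5, 2))])
W, H = kmeans_factorize(X, k=2)
```

The compression benefit is visible in the shapes: instead of storing the full n × d matrix X, one stores the n × k binary W and the small k × d H.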
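The entropy argument from the privacy lecture can be illustrated with a toy sketch (my own example, not the lecturer's): computing the Shannon entropy of a quasi-identifier column in an "anonymized" table. Values of the entropy near log2(n) indicate that quasi-identifier combinations are nearly unique, so individual records are easier to single out.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Toy "anonymized" records: (zip code, birth year) as a quasi-identifier pair.
records = [("44227", 1990), ("44227", 1990), ("44225", 1985), ("44227", 1991)]
h = shannon_entropy(records)
# Entropy close to log2(len(records)) means most quasi-identifier
# combinations are unique, hence a high re-identification risk.
```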
The third day was devoted to investigating the advantages and disadvantages of multi-core systems. In particular, it was shown that many cores running at a lower clock speed can execute tasks with less power consumption than a single fast core; some limitations concerning efficiency and cache size were also discussed. Then, Rich Caruana from Microsoft presented an extensive study of different ML models, including ensemble classifiers with boosting and bagging, reporting the accuracy of a huge number of combinations of classifiers and datasets. Deep learning was also examined in terms of efficiency and model complexity. The conclusion was that there is no need to build complex models in order to achieve better and faster results; model compression is a trade-off. The day finished with explanations of the Streams framework together with useful usage examples. The fourth day was dedicated to applying ML to astroparticle detection, first using Android phones and then simulating the setup in the Streams framework.
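The power argument for many slower cores can be illustrated with a common first-order model (an assumption on my part, not the lecture's exact derivation): dynamic CMOS power is P = C · V² · f, and since supply voltage scales roughly linearly with frequency, P grows roughly as f³.

```python
# First-order dynamic power model: P proportional to f^3
# (from P = C * V^2 * f with V scaling roughly linearly with f).
def dynamic_power(f, c=1.0):
    """Hypothetical dynamic power at normalized frequency f; c is a
    technology constant that cancels out in the comparison below."""
    return c * f ** 3

# One core at frequency 1.0 vs. 4 cores at 0.25 each
# (same aggregate throughput under ideal parallelization):
single = dynamic_power(1.0)
multi = 4 * dynamic_power(0.25)
ratio = multi / single  # 4 * (1/4)^3 = 1/16
```

Under these idealized assumptions the four slow cores use only 1/16 of the power, which is the intuition behind the lecture's claim; real hardware limits (static leakage, caches, imperfect parallelism) erode this advantage.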
The main outcome of attending the summer school, for me, is a broad understanding of data stream mining and the corresponding mechanisms for parallel optimization in machine learning. The view on model compression in neural networks for Big Data was also uniquely useful. The attendance was beneficial since I got multiple ideas for my current work on access control as a data stream mining problem. Moreover, the networking was important as well, bringing new international connections. Finally, people asked a lot about the COINS research school because of the t-shirt, yet nobody had known that it existed.