PhD Position: Temporal Decision Trees
Starting date: September, 2015.
Keywords: Machine learning, Time series, Classification, Decision trees
Context and objectives:
Time series (i.e. sequences of data points consisting of successive measurements made over a time interval) are observed in a growing number of domains, such as medicine (quantified self), biology (gene profiles), and engineering (energy consumption). Classifying time series consists of building a method which, in the training step, identifies similarities between time series belonging to the same category of phenomena (or class). Then, using the knowledge learned during the training step, the classification system is able to infer the class of new, previously unseen time series.
Classifying time series is a major challenge. Some approaches parametrize the time series, for example by projecting them into a vector space, and then apply classical classification methods to the projected variables. Others work directly on the time series, searching for an accurate and interpretable classifier. Decision trees are well-known, efficient methods that provide understandable results. In previous work [Douzal, Amblard 2012], we extended classical decision trees to deal with temporal variables by introducing a new temporal split criterion based on a behavior/value similarity measure between series, this similarity being defined by three parameters. Our proposal achieved interesting results on simulated and real datasets, but some improvements remain to be made:
• Search optimization. Currently, the algorithm explores all possible candidates. It would be interesting to reformulate the problem in order to obtain an analytical solution.
• Improvement of the algorithm’s time and space complexities.
• Reduction of the variance. Decision trees suffer from stability problems. Many methods have been proposed to reduce the variance of the classifier: pruning, boosting the training set, random forests, etc. New methods, based on introducing variability into the split criteria and/or using surrogate splits, will be explored.
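To make the "explore all possible candidates" point concrete, here is a minimal Python sketch of an exhaustive split search at a single tree node. It is an illustration under stated assumptions, not the criterion of [Douzal, Amblard 2012]: the function names (`dissimilarity`, `gini`, `best_split`), the single tuning parameter `k` (the similarity described above has three parameters, which are not detailed here), and the use of Gini impurity are all hypothetical choices made for the example.

```python
import itertools
import numpy as np


def dissimilarity(x, y, k):
    """Hypothetical behavior/value dissimilarity (an assumption for this sketch).

    k = 0 reduces to the plain Euclidean distance; larger k gives more
    weight to whether the two series rise and fall together.
    """
    dx, dy = np.diff(x), np.diff(y)
    denom = np.linalg.norm(dx) * np.linalg.norm(dy)
    # Correlation of successive differences, in [-1, 1]; 0 if a series is flat.
    behavior = float(dx @ dy / denom) if denom > 0 else 0.0
    value = np.linalg.norm(x - y)
    return 2.0 / (1.0 + np.exp(k * behavior)) * value


def gini(labels):
    """Gini impurity of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_split(series, labels, ks=(0.0, 1.0, 2.0)):
    """Exhaustively try every pair of reference series and every k.

    Each series is sent to the side of the reference it is closest to.
    Returns (weighted impurity, i, j, k) for the best candidate, or None.
    """
    n = len(series)
    best = None
    pairs = itertools.combinations(range(n), 2)
    for (i, j), k in itertools.product(pairs, ks):
        d_i = np.array([dissimilarity(s, series[i], k) for s in series])
        d_j = np.array([dissimilarity(s, series[j], k) for s in series])
        left = d_i <= d_j
        if left.all() or not left.any():
            continue  # degenerate split: one side is empty
        score = (left.sum() * gini(labels[left])
                 + (~left).sum() * gini(labels[~left])) / n
        if best is None or score < best[0]:
            best = (score, i, j, k)
    return best
```

With n series and m parameter values, this sketch evaluates on the order of n² · m candidates, each requiring 2n dissimilarity computations, which illustrates why an analytical solution and lower time and space complexities are listed as objectives.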
Requirements: The candidate must be knowledgeable about (or strongly motivated to learn) statistics, machine learning, and applied mathematics, and must have good development skills in programming languages such as Python, C, or R.
Location: The PhD student will work in the AMA team (http://ama.liglab.fr/) of the LIG laboratory (https://www.liglab.fr), in Grenoble (France). AMA is a leading group in Machine Learning and Data Analysis with over 24 researchers (including PhD students), covering several aspects of Machine Learning from theory to applications, including statistical learning, data mining, and cognitive science. Grenoble is a town in the Alps, not far from Lyon and Geneva.
Funding: This PhD thesis is part of the IKATS project, a research and development project funded by the French government within the PIA program. The partners are LIG, CSSI, AIRBUS, and EDF R&D.
Application: The application should include a description of research interests and past experience, a CV, degrees and grades, and relevant publications, if any. Candidates are encouraged to provide letter(s) of recommendation and contact information for referees.
———–
[Douzal, Amblard 2012] Douzal-Chouakria A. and Amblard C. Classification trees for time series. Pattern Recognition, 45(3):1076-1091, 2012.