Georgios Balikas - Mining and Learning from Multilingual Text Collections using Topic Models and Word Embeddings

12:00

Friday

Oct

2017

Thesis defence

Place:

IMAG Building Amphitheatre

Organized by:

Georgios Balikas

Speaker:

Georgios Balikas

Jury members:

Cyril Goutte, Senior Researcher at the National Research Council Canada, reviewer
Gaël Dias, Professor at the Université de Caen, reviewer
Laurent Besacier, Professor at the Université Grenoble Alpes, examiner
Patrick Gallinari, Professor at the Université Pierre et Marie Curie, examiner
Guillaume Vernat, Researcher, Coffreo, examiner
Massih-Reza Amini, Professor at the Université Grenoble Alpes, thesis supervisor

In this thesis, we focus on learning text representations based on the distributional hypothesis, which states that linguistic items with similar distributions should have similar meanings. In the first part of the thesis, we consider probabilistic topic models for monolingual and bilingual text corpora. We identify some of the limitations of such models, for instance the fact that they do not account for text structure, and we propose ways to alleviate them. The second part of the thesis focuses on word embeddings, that is, continuous word representations learned with neural networks. We investigate different settings of text classification and document retrieval problems. We propose algorithms that benefit from the expressiveness of word embeddings, using either deep neural networks or a reformulation of the optimal transport problem.