In this thesis we focus on learning text representations based on the distributional hypothesis, which states that linguistic items with similar distributions should have similar meanings. In the first part of the thesis, we consider probabilistic topic models for monolingual and bilingual text corpora. We identify some of the limitations of such models, for instance the fact that they do not account for text structure, and we propose ways to alleviate them. The second part of the thesis focuses on word embeddings, that is, continuous word representations learned with neural networks. We investigate different settings of text classification and document retrieval problems. We propose algorithms that benefit from the expressiveness of word embeddings, either using deep neural networks or a reformulation of the optimal transport problem.