Bo Li - Mesurer et améliorer la qualité des corpus comparables

12:00

Tuesday, June 26, 2012

Thesis defence

Place:

Campus - Amphi MJK

Organized by:

Bo Li

Speaker:

Bo Li

On Tuesday, June 26, I will defend my dissertation, entitled "Measuring and Improving Comparable Corpus Quality" (in French: Mesurer et améliorer la qualité des corpus comparables). The defense will take place at 2:00 PM in LIG-Maison Jean Kuntzmann. The jury members are:

M. Laurent Besacier, PR, Université de Grenoble (Chair)
M. Emmanuel Morin, PR, Université de Nantes (Reviewer)
M. Pierre Zweigenbaum, DR, CNRS (Reviewer)
M. Jian-Yun Nie, PR, Université de Montréal (Examiner)
M. Jacques Savoy, PR, Université de Neuchâtel (Examiner)
M. Eric Gaussier, PR, Université de Grenoble (Thesis advisor)
M. Jean-Pierre Chevallet, MCF, Université de Grenoble (Thesis co-advisor)

The presentation will cover two main areas: natural language processing and information retrieval. A short abstract of my thesis is included at the bottom of this email. You are cordially invited to attend the defense and the mini-reception held right afterwards on the ground floor (RDC) of the same building.

Unlike previous studies exploiting comparable corpora, the work presented in this thesis aims to enhance the quality of a comparable corpus in order to improve the performance of the NLP tasks that exploit it. This approach is advantageous because it can work with any existing algorithm that makes use of comparable corpora. We concentrate on the following aspects: (1) We propose a comparability measure to quantify the degree of comparability of comparable corpora. This measure is developed within a simple probabilistic framework and correlates well with gold-standard comparability levels. (2) Using the proposed comparability measure, we develop two methods to improve the quality of any given comparable corpus. The effectiveness of the methods is confirmed in terms of both the comparability scores and the quality of the bilingual lexicons extracted from the enhanced comparable corpora. (3) Finally, the extracted lexicons are used to enhance a novel information-based CLIR model.