Over the last few decades, many scientists have been concerned with the rapid extinction of languages. Faced with this alarming decline of the world’s linguistic heritage, action is urgently needed to enable fieldwork linguists, at the very least, to document languages by providing them with innovative collection tools, and to enable them to describe these languages. Machine assistance could be of great help in such a task.
This is what we propose in this work, focusing on three pillars of linguistic fieldwork: collection, transcription and analysis.
Recordings are essential, since they are the source material, the starting point of the descriptive work. Speech recordings are also valuable objects for the documentation of a language. The growing proliferation of smartphones and other interactive voice mobile devices offers new opportunities to fieldwork linguists and researchers in language documentation. Field recordings should also include ethnolinguistic material, which is particularly valuable for documenting traditions and ways of life. However, large data collections require well-organized repositories to access the content, with efficient file naming and metadata conventions. Thus, we have developed Lig-Aikuma, a free Android app running on various mobile phones and tablets. The app aims to record speech for language documentation in an innovative way. It includes smart generation and handling of speaker metadata as well as respeaking and parallel audio data mapping. Lig-Aikuma proposes a range of speech collection modes (recording, respeaking, translation and elicitation) and offers the possibility to share recordings between users. Through these modes, parallel corpora are built, such as “under-resourced speech - well-resourced speech”, “speech - image” and “speech - video”, which are also of great interest for speech technologies, especially for unsupervised learning.
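To make the idea of file naming, metadata handling and parallel audio data mapping more concrete, the sketch below shows one possible convention for organizing such a repository. It is a minimal illustration under assumed conventions, not Lig-Aikuma’s actual file format: the language code, speaker identifiers and JSON sidecar layout are hypothetical.

```python
# Illustrative sketch of file naming and metadata conventions for field
# recordings; the scheme below is hypothetical, not Lig-Aikuma's actual format.
import json
from datetime import datetime
from pathlib import Path

def make_recording_name(lang_iso, speaker_id, mode, when=None):
    """Build a file stem such as 'und_spk03_respeaking_20170415T1030'."""
    when = when or datetime.now()
    return f"{lang_iso}_{speaker_id}_{mode}_{when:%Y%m%dT%H%M}"

def write_metadata(wav_path, speaker, mode, source_wav=None):
    """Write a JSON sidecar next to the recording, linking respoken or
    translated audio back to its source recording (parallel data mapping)."""
    meta = {
        "file": wav_path.name,
        "mode": mode,                      # recording / respeaking / translation / elicitation
        "speaker": speaker,                # id, gender, age, native language, ...
        "source": source_wav.name if source_wav else None,
    }
    sidecar = wav_path.with_suffix(".json")
    sidecar.write_text(json.dumps(meta, indent=2, ensure_ascii=False))
    return sidecar

if __name__ == "__main__":
    repo = Path("corpus")
    repo.mkdir(exist_ok=True)
    # An original recording and a respoken version by another speaker.
    original = repo / (make_recording_name("und", "spk03", "recording") + ".wav")
    respoken = repo / (make_recording_name("und", "spk07", "respeaking") + ".wav")
    write_metadata(original, {"id": "spk03", "gender": "F"}, "recording")
    write_metadata(respoken, {"id": "spk07", "gender": "M"}, "respeaking",
                   source_wav=original)
```

The key point illustrated here is that each respoken or translated file carries an explicit pointer to its source, which is what turns a pile of recordings into a parallel corpus usable by speech technologies.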
After the data collection step, the fieldwork linguist transcribes these data. Nonetheless, this cannot currently be done for the whole collection, since the task is tedious and time-consuming. We propose to use automatic techniques to help the fieldwork linguist take advantage of the entire speech collection. Along these lines, automatic speech recognition (ASR) is a way to produce transcripts of the recordings with decent quality.
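To give a concrete sense of how transcript quality can be quantified, the sketch below computes the word error rate (WER), the standard metric for comparing an ASR hypothesis against a reference transcript. The implementation is a plain word-level edit distance; the example sentences are invented.

```python
# Minimal word error rate (WER) computation, the standard metric used to
# assess ASR transcript quality; the example strings below are invented.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic programming table for word-level edit distance
    # (substitutions, insertions and deletions all cost 1).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

if __name__ == "__main__":
    print(wer("the speaker tells a story", "the speaker tell story"))  # 0.4
```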
Once the transcripts are obtained (and corrected), the linguist can analyze the data. In order to analyze the entire collection, we consider the use of forced alignment methods. We demonstrate that such techniques can lead to a fine-grained evaluation of linguistic features. In return, we show that modeling specific features may lead to improvements of the ASR systems.
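As an illustration of the kind of fine-grained analysis forced alignment enables, the sketch below computes mean phone durations from alignment output. It assumes a CTM-style format ("utterance channel start duration phone", one line per aligned unit), as produced by common ASR toolkits; the phone labels and durations are invented.

```python
# Illustrative analysis of forced-alignment output: mean duration per phone,
# assuming a CTM-style format "utt channel start duration phone" (one line
# per aligned unit). The labels and timings below are invented examples.
from collections import defaultdict

def mean_phone_durations(ctm_lines):
    durations = defaultdict(list)
    for line in ctm_lines:
        if not line.strip():
            continue
        utt, channel, start, dur, phone = line.split()[:5]
        durations[phone].append(float(dur))
    return {p: sum(ds) / len(ds) for p, ds in durations.items()}

if __name__ == "__main__":
    ctm = [
        "utt01 1 0.00 0.12 b",
        "utt01 1 0.12 0.21 a",
        "utt01 1 0.33 0.09 t",
        "utt02 1 0.00 0.18 a",
    ]
    for phone, dur in sorted(mean_phone_durations(ctm).items()):
        print(f"{phone}: {dur * 1000:.0f} ms")
```

Such duration statistics are one example of a linguistic feature (e.g., vowel length) that can be measured over the whole collection once alignments are available, and that can in turn be modeled to improve the ASR systems.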