Michel Vacher
Emmanuel Vincent (Inria Nancy - Grand Est)
Location: Amphi F018, UFR IM²AG (Bâtiment F), 60 avenue de la Chimie.
Emmanuel Vincent's website
Speech recognition remains a challenging goal in everyday environments involving multiple background sources and reverberation. The popular "pipeline" approach involves two steps: (1) separating the target speech signal from the noise, and (2) applying a conventional speech recognizer to the enhanced signal.
In the first part of my talk, I will present a statistical modeling framework for audio source separation which makes it possible to jointly exploit various pieces of information about the sources and the environment. I will provide sound examples for the separation of speech vs. noise.
In the second part of the talk, I will argue that the "pipeline" approach yields suboptimal results because errors made in the first step propagate to the second. I will introduce the uncertainty handling framework, which aims to replace the deterministic signal passing through the pipeline with a full posterior distribution quantifying the confidence or uncertainty in each part of the separated signal. I will show some results obtained within that framework.
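To make the idea of passing a posterior distribution rather than a point estimate more concrete, here is a minimal sketch of one common way uncertainty can be used at the recognizer side. It assumes, purely for illustration, that the separation step outputs a Gaussian posterior (mean and variance) for each feature and that the acoustic model is a single Gaussian; under those assumptions the expected likelihood is obtained by adding the feature variance to the model variance. The function names and toy numbers are hypothetical and are not taken from the talk.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log-density of a 1-D Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def uncertainty_decoding_loglik(feat_mean, feat_var, model_mean, model_var):
    """Log of the expected model likelihood under a Gaussian feature posterior.

    Assumption (illustrative): the enhanced feature is distributed as
    N(feat_mean, feat_var). Integrating N(x; model_mean, model_var) over
    that posterior gives N(feat_mean; model_mean, model_var + feat_var).
    """
    return gaussian_logpdf(feat_mean, model_mean, model_var + feat_var)


# Toy 1-D example: the enhanced feature lies far from the model mean.
model_mean, model_var = 0.0, 1.0
estimate = 3.0

# Point estimate treated as exact (zero uncertainty): the conventional pipeline.
confident = uncertainty_decoding_loglik(estimate, 0.0, model_mean, model_var)

# Same estimate, but flagged as unreliable by the separation step.
uncertain = uncertainty_decoding_loglik(estimate, 4.0, model_mean, model_var)

print(f"log-likelihood, estimate treated as exact: {confident:.3f}")
print(f"log-likelihood, estimate with uncertainty: {uncertain:.3f}")
```

In this toy case the mismatch is penalized less when the estimate is marked as uncertain, so unreliable parts of the separated signal weigh less in the recognizer's decision.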