Radu Patrice Horaud (Ph.D. ’81) holds the position of director of research at Inria. Previously, he was a postdoctoral researcher at SRI International, Menlo Park, CA (1982-1984) and a CNRS researcher (1984-1998). For the past 10 years, Radu and his collaborators have developed a multidisciplinary research program at the crossroads of computer vision, audio signal processing, machine learning, and robotics. Radu has coordinated several collaborative European projects and was awarded two ERC grants: an advanced grant (2014-2019) and a proof-of-concept grant (2018-2019).
In this talk, I will give an overview of the research carried out by the Perception team (Inria and Laboratoire Jean Kuntzmann) over the past five years. I will start by stating the scientific challenges of fusing audio and visual data, in contrast to other data fusion paradigms. I will discuss audio-visual alignment and audio-visual tracking in the context of multiple users interacting with a robot or, more generally, with an intelligent agent. I will emphasize the complementary roles played by visual and audio perception, and I will address in detail the problems associated with fusing these two modalities in unrestricted settings, such as interaction with a robot in a complex environment. Finally, I will discuss the challenges of combining multimodal perception with speech communication and with robot control.