James L Crowley - Put That There: 30 Years of Research on Multimodal Interaction

14:00

Thursday

Mar

2019

Keynote Speech

Organized by:

The LIG Keynote team: Nicolas Peltier, Renaud Lachaize, Dominique Vaufreydaz

Speaker:

James Crowley (GrenobleINP / LIG)

James L. Crowley is a Professor at Grenoble Polytechnique Institute (Grenoble INP), at the Univ. Grenoble Alpes, where he teaches courses in Computer Vision, Machine Learning and Artificial Intelligence. He directs the Pervasive Interaction research group at INRIA Grenoble Rhône-Alpes Research Center in Montbonnot, France.
Over the last 35 years, Professor Crowley has made a number of fundamental contributions to computer vision, robotics and multimodal interaction. These include early innovations in scale-invariant computer vision, localization and mapping for mobile robots, appearance-based techniques for computer vision, and visual perception for human-computer interaction.
His current research concerns context-aware observation and modeling of human activity, Ambient Intelligence, and new forms of Human-Computer Interaction based on machine perception.

Humans interact with the world using five major senses: sight, hearing, touch, smell, and taste. Almost all interaction with the environment is naturally multimodal, as audio, tactile or paralinguistic cues provide confirmation for physical actions and spoken commands. Multimodal interaction seeks to fully exploit these parallel channels for perception and action to provide robust, natural interaction.
Richard Bolt’s "Put That There" (1980) provided an early paradigm that demonstrated the power of multimodality and helped attract researchers from a variety of disciplines to study a new approach to computing that moves beyond desktop graphical user interfaces (GUIs). This led to a series of workshops on Perceptual User Interfaces, to the organization of the 1st ICMI in Beijing in 1996, and eventually to the creation of the ACM Transactions on Interactive Intelligent Systems in 2011.
In this talk, I will look back to the origins of the scientific community of multimodal interaction and review some of the more salient results that have emerged over the last 30 years, including results in machine perception, system architecture, and human-computer interaction. I will illustrate these with demonstrations of multimodal interaction with smart environments, constructed in Grenoble in the period 1990 to 2010.
Recently, a number of game-changing technologies such as deep learning, cloud computing, and planetary-scale data collection have emerged to provide robust solutions to historically hard problems. As a result, scientific understanding of multimodal interaction has taken on new relevance as construction of practical systems becomes feasible. I will discuss the impact of these new technologies and the opportunities and challenges that they raise. I will conclude with a discussion of the importance of convergence with cognitive science and cognitive systems to provide foundations for intelligent, human-centered interactive systems that learn and fully understand humans and human-to-human social interaction, in order to provide services that surpass the abilities of the most intelligent human servants.