Discrete versus Continuous Image Retrieval Matching Models.
Applications to Mobile Scene Recognition with Sensor Fusion.
Duration: 3 years
Gross salary: 1700 euros/month
This PhD thesis takes place within the GUIMUTEIC project (http://www.guimuteic.fr), which aims to provide an augmented tourist visit. The project will design glasses embedding a small screen, a camera, and other sensors. This wearable device is meant to help users locate themselves in a museum and to detect what they are looking at. The system must therefore recognize the scene from the image and sensor signal flows. This thesis is about image retrieval with sensor data fusion.
Among the possible solutions to scene recognition, one can use an Image Retrieval System (IRS) that searches a scene database for the image most similar to the one taken by the mobile device. Most Image Retrieval systems are based on the transformation of images into visual words [1]. This "visual Bag Of Words (BOW)" solution enables the reuse of matching functions coming from the textual Information Retrieval (IR) domain. One important basic hypothesis of IR is the existence of a discrete and finite set of words (index terms) that are extracted from documents. Most IR systems do not take into account the uncertainty related to term identification. When applied to images, continuous computed features have to be transformed into discrete visual words. Other approaches [2,3] directly exploit untransformed image features to compute a visual distance between images. In that case, IR matching models cannot be reused directly; instead, specific fast algorithms are proposed [4].
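The contrast between the two families can be illustrated with a minimal Python sketch. It is not the model to be developed in the thesis: the 2-D descriptors, the two-centroid codebook, and the helper names (nearest_word, bow_histogram, cosine, set_distance) are all assumptions made for illustration. The discrete route assigns each descriptor to its nearest visual word and compares histograms with a cosine measure, as in textual IR; the continuous route compares raw descriptors directly.

```python
import math
from collections import Counter

def nearest_word(descriptor, codebook):
    """Hard assignment: index of the codebook centroid closest to the descriptor."""
    return min(range(len(codebook)), key=lambda k: math.dist(descriptor, codebook[k]))

def bow_histogram(descriptors, codebook):
    """Discrete representation: counts of visual words in one image."""
    return Counter(nearest_word(d, codebook) for d in descriptors)

def cosine(h1, h2):
    """Cosine similarity between two visual-word histograms."""
    dot = sum(h1[w] * h2[w] for w in h1.keys() & h2.keys())
    n1 = math.sqrt(sum(v * v for v in h1.values()))
    n2 = math.sqrt(sum(v * v for v in h2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def set_distance(a, b):
    """Continuous comparison: mean distance from each descriptor of a to its nearest in b."""
    return sum(min(math.dist(x, y) for y in b) for x in a) / len(a)

# Hypothetical toy data: a two-word codebook and three tiny "images".
codebook = [(0.0, 0.0), (1.0, 1.0)]
query = [(0.1, 0.0), (0.9, 1.0)]
img_a = [(0.0, 0.1), (1.0, 0.9)]      # descriptors close to those of the query
img_b = [(0.45, 0.45), (0.55, 0.55)]  # descriptors near the boundary between words

for name, img in (("img_a", img_a), ("img_b", img_b)):
    sim = cosine(bow_histogram(query, codebook), bow_histogram(img, codebook))
    print(name, "BoW cosine:", sim, "continuous distance:", set_distance(query, img))
```

In this toy case both images produce the same visual-word histogram as the query, so the discrete model cannot rank them, while the continuous distance clearly prefers img_a. This is one concrete form of the term identification uncertainty mentioned above.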
This thesis will therefore study the differences between these two approaches, in order to better understand the pros and cons of each. More specifically, the work will express both approaches in a single theoretical model, so as to identify the meta-parameters that influence retrieval efficiency.
One outcome of this work will be a new theoretical fused IR model that describes both the discrete and continuous indexing approaches. This theoretical model will then be instantiated into an effective IR matching model. Moreover, this matching model will be tailored to include context sensors (GPS location, magnetometer, accelerometer, etc.) and to fit on a mobile device for real-time scene recognition.
The work plan is the following:
- Year 1: State of the art in Image Retrieval using Information Retrieval and Bag of Words approaches, and comparison with direct feature-based approaches. Analysis of sensor fusion with image retrieval. Definition of portable device limitations. Acquisition of a corpus from GUIMUTEIC partners and first proposal.
- Year 2: Proposal of the theoretical model, and derivation of an operational matching function for mobile-based image retrieval with sensor fusion. Experiments on the test collections built during the first year.
- Year 3: Final adaptation of the matching model. Final integration on the real wearable GUIMUTEIC device, and on-site experiments (Pont du Gard, Musée de Lyon).
Requirements for applicants: candidates must have a Master's degree in Computer Science (with a research report), or equivalent, and some knowledge of Information Retrieval, machine learning and image processing. Experience with C/C++/Java programming, mobile device programming, prototyping and experimentation is appreciated. Fluency in English is essential.
Submission of applications: Applications (CV, motivation letter, internship report, transcripts with marks for the last two years, reference letters and/or referee contacts) must be sent to: Jean-Pierre Chevallet (Jean-Pierre.Chevallet@imag.fr) and Philippe Mulhem (LIG, Philippe.Mulhem@imag.fr).
References
[1] J. Sivic, A. Zisserman, "Video Google: A text retrieval approach to object matching in videos", in: International Conference on Computer Vision (ICCV), 2003.
[2] O. Chum, J. Philbin, A. Zisserman, "Near duplicate image detection: min-hash and tf-idf weighting", in: British Machine Vision Conference (BMVC), September 2008.
[3] A. Torralba, R. Fergus, Y. Weiss, "Small codes and large image databases for recognition", in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-8, 23-28 June 2008.
[4] M. Norouzi, A. Punjani, D. J. Fleet, "Fast search in Hamming space with multi-index hashing", in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3108-3115, 16-21 June 2012.