SLIDE stands for ScaLable Information Discovery and Exploitation. Our approach to research is data-driven: we develop algorithms and infrastructures for large-scale analytics, data linkage and ontology-based data access, crowd data sourcing, and crowdsourced application evaluation. Our research often starts with acquiring data from different domains, including user demographics, behavioral and opinion data on the social Web, user data on health and well-being, user and program traces, and data center monitoring.
SLIDE combines efficient large-scale data processing with pattern mining algorithms to extract value from data. We work at the intersection of knowledge representation and reasoning, data mining, and data management.
We work on a variety of application domains such as the semantic and social Web, health and well-being, data center monitoring, and embedded systems.
Our research targets two types of users: experts who are interested in data mining and value extraction from large volumes of data, and novice users who are interested in finding information and receiving content recommendations.
Our data mining and data processing axis (D for Discovery in SLIDE) covers pattern mining algorithms, parallel mining on infrastructures such as MapReduce and Spark and on multicore processors, and social media data analysis. Our models and algorithms combine data mining with multi-dimensional indexing to discover information from raw data. Our applications for expert users enable advanced data exploration such as interactive search and ontology-based exploration. In this area, we also develop a data preparation framework (algebra and algorithms) for cleaning and transforming large volumes of raw data into usable data. We are also developing a crowdsourcing framework that optimizes data acquisition. Finally, we develop extensions of Datalog to express and infer data linkage across multiple sources.
Our exploitation axis (E for Exploitation in SLIDE) covers the development of distributed join algorithms. We combine data partitioning and placement with traditional join algorithms to design efficient data processing techniques on parallel and distributed infrastructures. We also develop ontology-based data access algorithms that allow analysts to explore large volumes of data using high-level concepts. Our applications for novice users are based on information retrieval and recommendation algorithms, ranging from the search for relevant and diverse results to the definition and implementation of new recommendation semantics that take into account social networks and various similarity functions between users.
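To make the idea of combining data partitioning with a traditional join concrete, here is a minimal single-process sketch in Python. It is an illustration under our own assumptions, not the team's algorithms: the function names (`hash_partition`, `local_hash_join`, `partitioned_join`) and the sample relations are hypothetical. Both relations are hash-partitioned on the join key, so matching rows land in the same partition, and each pair of partitions is then joined with a classic in-memory hash join. On a real cluster, each partition pair would be placed on a different node.

```python
from collections import defaultdict


def hash_partition(rows, key_index, num_partitions):
    """Split rows (tuples) into partitions according to the hash of the join key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key_index]) % num_partitions].append(row)
    return partitions


def local_hash_join(left, right, left_key, right_key):
    """Classic hash join: build an index on the left rows, probe it with the right rows."""
    index = defaultdict(list)
    for row in left:
        index[row[left_key]].append(row)
    return [l + r for r in right for l in index.get(r[right_key], [])]


def partitioned_join(left, right, left_key, right_key, num_partitions=4):
    """Partition both inputs on the join key, then join each partition pair locally."""
    left_parts = hash_partition(left, left_key, num_partitions)
    right_parts = hash_partition(right, right_key, num_partitions)
    result = []
    # In a distributed setting, each (lp, rp) pair would be processed on its own node.
    for lp, rp in zip(left_parts, right_parts):
        result.extend(local_hash_join(lp, rp, left_key, right_key))
    return result


# Hypothetical sample data: (user_id, name) and (user_id, page).
users = [(1, "ann"), (2, "bob"), (3, "cy")]
visits = [(1, "home"), (1, "search"), (3, "home"), (4, "cart")]
joined = partitioned_join(users, visits, left_key=0, right_key=0)
```

Because both sides use the same partitioning function, no row ever needs to be compared with a row from another partition, which is what makes the local joins independent of each other.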
Many of our applications are evaluated using methods borrowed from the information retrieval and machine learning domains. We are also exploring crowdsourcing for application evaluation. One of our recent axes is the design and implementation of models and algorithms for the efficient acquisition of data and the evaluation of applications via crowdsourcing. We focus on assigning tasks to workers in a crowdsourcing context while optimizing for human factors such as worker expertise and availability.
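As a rough illustration of task assignment driven by expertise and availability, the sketch below uses a simple greedy heuristic in Python. This is a stand-in chosen for brevity, not the models or algorithms developed by the team. The data layout is an assumption: each worker has a hypothetical `expertise` score per skill and a `capacity` that models availability as the number of tasks the worker can still take.

```python
def assign_tasks(tasks, workers):
    """Greedily assign each task to the available worker with the highest expertise.

    tasks:   list of (task_id, required_skill) pairs
    workers: dict mapping worker name to
             {"expertise": {skill: score}, "capacity": int}
    Returns a dict mapping task_id to a worker name, or None if nobody is available.
    """
    remaining = {name: w["capacity"] for name, w in workers.items()}
    assignment = {}
    for task_id, skill in tasks:
        # Availability constraint: only workers with spare capacity are candidates.
        candidates = [name for name in workers if remaining[name] > 0]
        if not candidates:
            assignment[task_id] = None
            continue
        # Expertise criterion: pick the candidate with the best score for this skill.
        best = max(candidates, key=lambda n: workers[n]["expertise"].get(skill, 0.0))
        assignment[task_id] = best
        remaining[best] -= 1
    return assignment


# Hypothetical workers and tasks.
workers = {
    "w1": {"expertise": {"nlp": 0.9, "vision": 0.2}, "capacity": 1},
    "w2": {"expertise": {"nlp": 0.5, "vision": 0.8}, "capacity": 2},
}
tasks = [("t1", "nlp"), ("t2", "nlp"), ("t3", "vision"), ("t4", "vision")]
assignment = assign_tasks(tasks, workers)
```

A greedy pass like this depends on the order of the tasks and gives no optimality guarantee; it only shows how expertise and availability enter the decision.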
Our most recent axis is data ethics. In particular, we develop models and algorithms for privacy in the social Web, and we have started exploring fairness and transparency in crowdsourcing.