Project Description
Interoperability of heterogeneous data is a critical problem faced by every modern enterprise that is concerned with data analysis, data migration, and data evolution. The fundamental goal in data interoperability is to facilitate and make transparent to end-users the extraction of information from multiple heterogeneous data sources that reside in different locations. At the heart of achieving data interoperability is the design and management of schema mappings. A schema mapping is a specification of the relationship between two database schemas. Schema mappings are the essential building blocks in specifying how data from different sources are to be integrated into a unified format or exchanged (i.e., translated) into a different format.
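As a concrete illustration of the definition above, here is a minimal sketch of one schema mapping and the target data it produces. The relation names (Emp, Works, Dept), the rule itself, and the function name are hypothetical and chosen only for illustration; they are not taken from the project's tools. The rule says that every source fact Emp(name, dept) requires target facts Works(name, k) and Dept(k, dept) for some unknown department identifier k, which the sketch represents as a fresh placeholder value (a "labeled null").

```python
from itertools import count


def exchange_emp_to_target(emp_rows):
    """Apply the illustrative rule
         Emp(n, d) -> exists k: Works(n, k) and Dept(k, d)
    to a source instance and return the two target relations.
    """
    fresh = count(1)          # generator of fresh placeholder ids
    works, dept = set(), set()
    # Sort the source rows so that placeholder numbering is deterministic.
    for name, dept_name in sorted(emp_rows):
        k = f"N{next(fresh)}"  # labeled null: an unknown department id
        works.add((name, k))
        dept.add((k, dept_name))
    return works, dept


# Hypothetical source instance with two employees.
source = {("Alice", "Sales"), ("Bob", "Sales")}
works, dept = exchange_emp_to_target(source)
```

The sketch creates a separate placeholder for each source fact, so Alice and Bob are not assumed to share a department identifier even though both work in Sales; the rule as written does not force them to.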
The intellectual merit of this project is the development of a solid foundation and a suite of techniques and tools for designing, understanding, and managing schema mappings. Earlier foundational work on schema mappings has mainly focused on the semantics and algorithmic issues of some of the basic operators for manipulating schema mappings, with emphasis on the composition operator and the inverse operator. While the composition operator is well understood by now, much more remains to be done in the study of the inverse operator. One of the main goals of this project is to investigate in depth the inverse operator and also the difference operator, which remains largely unexplored to date. This project addresses several fundamental questions for the inverse and the difference operators, including the following: What is the right semantics for these two operators? What is the exact language for expressing these operators? Are there efficient algorithms for computing the result of the inverse operator and the difference operator?
A parallel goal of this project is the development of a set of concepts and techniques for optimizing schema mappings and transforming more complex schema mappings into simpler, yet equivalent, ones.
The final main goal of this project is to study the problem of using data examples to explain and illustrate schema mappings. The design of schema mappings between two schemas is known to be one of the most costly and time-consuming tasks in achieving data interoperability. Prior studies have suggested that (familiar) data examples can be extremely powerful aids in designing schema mappings. This project addresses the following questions: What is the right notion or notions of illustrative data examples for schema mappings? How easy or difficult is it to compute small examples for illustrating schema mappings? How can one illustrate large and complex networks of schema mappings with data examples? How can one depict the similarities and differences among multiple schema mappings?
The broader impact of this project is the development of human resources in science and engineering through the teaching, mentoring, and research training of graduate and undergraduate students on the foundational and system development work of this project. Information about publications, course material, and software prototypes and tools developed through this project can be found at this project web page.
This research project is funded by the American Recovery and Reinvestment Act of 2009 (Public Law 111-5) under NSF grant IIS-0905276.
Faculty
Participants
Publications
-
Learning Schema Mappings
Balder ten Cate, Victor Dalmau, Phokion G. Kolaitis
Proceedings of the 15th International Conference on Database Theory (ICDT 2012). Best Paper Award.
-
On the Data Complexity of Consistent Query Answering
Balder ten Cate, Gaelle Fontaine, Phokion G. Kolaitis
Proceedings of the 15th International Conference on Database Theory (ICDT 2012).
-
A Dichotomy in the Complexity of Consistent Query Answering for Queries with Two Atoms
Phokion G. Kolaitis and Enela Pema
Information Processing Letters, 112(3), pp. 77-85, 2012.
-
Characterizing Schema Mappings via Data Examples
Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, and Wang-Chiew Tan
ACM Transactions on Database Systems, 36(4):23, 2011. Invited paper from PODS 2010.
-
EIRENE: Interactive Design and Refinement of Schema Mappings via Data Examples
Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, and Wang-Chiew Tan
Proceedings of the VLDB Endowment (PVLDB), 4(12), pp. 1414-1417, 2011.
-
Designing and Refining Schema Mappings via Data Examples
Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, and Wang-Chiew Tan
Proceedings of the ACM SIGMOD Conference, pp. 133-144, 2011.
-
Probabilistic Data Exchange
Ronald Fagin, Benny Kimelfeld, and Phokion G. Kolaitis
Journal of the Association for Computing Machinery (JACM), 58(4), 2011.
-
Guarded Negation
Vince Barany, Balder ten Cate, Luc Segoufin
Proceedings of the 38th International Colloquium on Automata, Languages and Programming (ICALP 2011), pp. 356-367, 2011.
-
On the Tractability and Intractability of Consistent Conjunctive Query Answering
Enela Pema
EDBT/ICDT Ph.D. Workshop, 2011.
-
Schema Mapping Evolution Through Composition and Inversion
Ronald Fagin, Phokion G. Kolaitis, Lucian Popa, and Wang-Chiew Tan
Book chapter in "Schema Matching and Mapping", pp. 191-222, Springer, 2011.
-
Unary Negation
Balder ten Cate and Luc Segoufin
Proceedings of the International Symposium on Theoretical Aspects of Computer Science (STACS), LIPIcs, vol. 9, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2011.
-
On the equivalence of distributed systems with queries and communication
Serge Abiteboul, Balder ten Cate, and Yannis Katsis
Proceedings of the International Conference on Database Theory (ICDT), pp. 126-137, 2011.
-
MapMerge: Correlating Independent Schema Mappings
Bogdan Alexe, Mauricio Hernandez, Lucian Popa, and Wang Chiew Tan
Proceedings of the VLDB Endowment (PVLDB), vol. 3, pp. 81-92, 2010.
-
Database Constraints and Homomorphism Dualities
Balder ten Cate, Phokion G. Kolaitis, and Wang Chiew Tan
Proceedings of Principles and Practice of Constraint Programming (CP), pp. 475-490, 2010.
-
Structural Characterizations of Schema-Mapping Languages
Balder ten Cate and Phokion G. Kolaitis
Communications of the Association for Computing Machinery (CACM), vol. 53, no. 1, pp. 101-110, 2010.
-
Characterizing Schema Mappings via Data Examples
Bogdan Alexe, Phokion G. Kolaitis, and Wang-Chiew Tan
Proceedings of the Symposium on Principles of Database Systems (PODS), pp. 261-272, 2010.
-
Probabilistic Data Exchange
Ronald Fagin, Benny Kimelfeld, and Phokion G. Kolaitis
Proceedings of the International Conference on Database Theory (ICDT), pp. 76-88, 2010.