Project Description
Interoperability of heterogeneous data is a critical problem for every modern enterprise concerned with data analysis, data migration, and data evolution. The fundamental goal of data interoperability is to facilitate the extraction of information from multiple heterogeneous data sources that reside in different locations, and to make this extraction transparent to end users. At the heart of achieving data interoperability is the design and management of schema mappings. A schema mapping is a specification of the relationship between two database schemas. Schema mappings are the essential building blocks for specifying how data from different sources are to be integrated into a unified format or exchanged (i.e., translated) into a different format.
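To make the notion of a schema mapping concrete, here is a minimal sketch, assuming mappings are specified by source-to-target tuple-generating dependencies (s-t tgds), a specification language widely studied in the schema-mapping literature, and that data are exchanged by the naive chase. The relation names Emp, Works, and Dept, and the helpers `matches` and `chase`, are hypothetical illustrations, not code from this project. Labeled nulls are represented, by assumption, as strings prefixed with `_N`.

```python
from itertools import count
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]          # (relation, terms); terms starting with "?" are variables
Instance = Dict[str, Set[Tuple[str, ...]]]  # relation name -> set of tuples
Tgd = Tuple[List[Atom], List[Atom]]         # (body over the source, head over the target)


def is_var(term: str) -> bool:
    return term.startswith("?")


def matches(body: List[Atom], inst: Instance, env: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every variable assignment that maps all body atoms into inst."""
    if not body:
        yield env
        return
    (rel, terms), rest = body[0], body[1:]
    for tup in inst.get(rel, set()):
        ext = dict(env)
        ok = True
        for t, v in zip(terms, tup):
            if is_var(t):
                ok = ext.setdefault(t, v) == v  # bind t, or check an existing binding
            else:
                ok = t == v                     # constants must match exactly
            if not ok:
                break
        if ok:
            yield from matches(rest, inst, ext)


def chase(source: Instance, mapping: List[Tgd]) -> Instance:
    """Naive chase: fire each tgd once per body match, inventing labeled nulls for existential variables."""
    target: Instance = {}
    fresh = count()
    for body, head in mapping:
        for env in matches(body, source, {}):
            nulls: Dict[str, str] = {}  # existential variable -> its null for this trigger
            for rel, terms in head:
                row = []
                for t in terms:
                    if not is_var(t):
                        row.append(t)
                    elif t in env:
                        row.append(env[t])
                    else:
                        if t not in nulls:
                            nulls[t] = f"_N{next(fresh)}"
                        row.append(nulls[t])
                target.setdefault(rel, set()).add(tuple(row))
    return target


# Hypothetical mapping: Emp(n, d) -> exists i . Works(n, i) and Dept(i, d)
mapping = [([("Emp", ("?n", "?d"))],
            [("Works", ("?n", "?i")), ("Dept", ("?i", "?d"))])]
source = {"Emp": {("alice", "sales"), ("bob", "sales")}}
target = chase(source, mapping)
print(target)
```

The resulting target instance contains two Works tuples and two Dept tuples, with each employee paired with its own labeled null for the unknown department identifier; which null goes to which employee depends on set iteration order. A production data-exchange system would typically go further, for example by computing a smaller (core) solution, which this sketch does not attempt.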
The intellectual merit of this project is the development of a solid foundation and a suite of techniques and tools for designing, understanding, and managing schema mappings. Earlier foundational work on schema mappings has focused mainly on the semantics and algorithmic issues of some of the basic operators for manipulating schema mappings, with emphasis on the composition operator and the inverse operator. While the composition operator is by now well understood, much more remains to be done in the study of the inverse operator. One of the main goals of this project is to investigate in depth the inverse operator, as well as the difference operator, which remains largely unexplored to date. This project addresses several fundamental questions about the inverse and difference operators, including the following: What is the right semantics for these two operators? What is the exact language needed to express them? Are there efficient algorithms for computing the result of the inverse operator and of the difference operator? A parallel goal of this project is the development of a set of concepts and techniques for optimizing schema mappings and for transforming more complex schema mappings into simpler, yet equivalent, ones.
The final main goal of this project is to study the problem of using data examples to explain and illustrate schema mappings. The design of schema mappings between two schemas is known to be one of the most costly and time-consuming tasks in achieving data interoperability, and prior studies have suggested that (familiar) data examples can be extremely powerful aids in designing schema mappings. This project addresses the following questions: What is the right notion (or notions) of illustrative data examples for schema mappings? How easy or difficult is it to compute small examples that illustrate schema mappings? How can one illustrate large and complex networks of schema mappings with data examples? How can one depict the similarities and differences among multiple schema mappings?
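One simple reading of a data example, adopted here only as an assumption (the project itself asks which notion is the right one), is a pair (I, J) of a source instance and a target instance; the example is consistent with a mapping when (I, J) satisfies every dependency of the mapping. The sketch below checks satisfaction for mappings given as s-t tgds and shows a single data example that separates two hypothetical mappings, M1 and M2, which is one way a data example can depict the difference between two mappings. All names are illustrative.

```python
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]          # (relation, terms); terms starting with "?" are variables
Instance = Dict[str, Set[Tuple[str, ...]]]  # relation name -> set of tuples
Tgd = Tuple[List[Atom], List[Atom]]         # (body over the source, head over the target)


def homs(atoms: List[Atom], inst: Instance, env: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield each extension of env that maps every atom into inst."""
    if not atoms:
        yield env
        return
    (rel, terms), rest = atoms[0], atoms[1:]
    for tup in inst.get(rel, set()):
        ext = dict(env)
        # Variables are bound (or checked against an earlier binding); constants must match.
        if all(ext.setdefault(t, v) == v if t.startswith("?") else t == v
               for t, v in zip(terms, tup)):
            yield from homs(rest, inst, ext)


def satisfies(source: Instance, target: Instance, mapping: List[Tgd]) -> bool:
    """True if every match of a tgd body into source extends to a match of its head into target."""
    return all(
        next(homs(head, target, env), None) is not None
        for body, head in mapping
        for env in homs(body, source, {})
    )


# Two hypothetical mappings that differ in how departments are represented.
M1 = [([("Emp", ("?n", "?d"))], [("Works", ("?n", "?i")), ("Dept", ("?i", "?d"))])]
M2 = [([("Emp", ("?n", "?d"))], [("Works", ("?n", "?d"))])]

# A single data example (I, J), named src and tgt here.
src = {"Emp": {("alice", "sales")}}
tgt = {"Works": {("alice", "d1")}, "Dept": {("d1", "sales")}}

print(satisfies(src, tgt, M1))  # True: d1 witnesses the existential department id
print(satisfies(src, tgt, M2))  # False: tgt has no tuple Works(alice, sales)
```

Because existential variables may be witnessed by any value in the target, the identifier d1 suffices for M1, while M2 demands the literal department name; the example is therefore a positive example for M1 and a negative one for M2.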
The broader impact of this project is the development of human resources in science and engineering through the teaching, mentoring, and research training of graduate and undergraduate students in the foundational and system-development work of this project. Information about publications, course material, and software prototypes and tools developed through this project can be found on this project web page.
This research project is funded by the American Recovery and Reinvestment Act of 2009 (Public Law 111-5) under NSF grant IIS-0905276.
Publications

Efficient Querying of Inconsistent Databases with Binary Integer Programming
Phokion G. Kolaitis, Enela Pema, and Wang-Chiew Tan
Proceedings of the VLDB Endowment (PVLDB), 6(6), pages 397-408, 2013.
A New Framework for Designing Schema Mappings
Bogdan Alexe and Wang-Chiew Tan
Proceedings of In Search of Elegance in the Theory and Practice of Computation, 2013. 
Data Integration and Data Exchange: It's Really About Time
Mary Roth and Wang-Chiew Tan
Proceedings of the Conference on Innovative Data Systems Research (CIDR), 2013.
Schema Mappings and Data Examples
Balder ten Cate, Phokion G. Kolaitis, and Wang-Chiew Tan
Proceedings of the 16th International Conference on Extending Database Technology (EDBT) (Tutorial), 2013.
Learning Schema Mappings
Balder ten Cate, Victor Dalmau, and Phokion G. Kolaitis
Proceedings of the 15th International Conference on Database Theory (ICDT 2012). Best Paper Award. 
On the Data Complexity of Consistent Query Answering
Balder ten Cate, Gaelle Fontaine, and Phokion G. Kolaitis
Proceedings of the 15th International Conference on Database Theory (ICDT 2012). 
Local Transformations and Conjunctive-Query Equivalence
Ronald Fagin and Phokion G. Kolaitis
Proceedings of the 31st ACM Symposium on Principles of Database Systems (PODS 2012), pages 179-190.
A Dichotomy in the Complexity of Consistent Query Answering for Queries with Two Atoms
Phokion G. Kolaitis and Enela Pema
Information Processing Letters, 112(3), pages 77-85, 2012
MapMerge: Correlating Independent Schema Mappings
Bogdan Alexe, Mauricio A. Hernandez, Lucian Popa, and Wang-Chiew Tan
VLDB Journal, 21(2), pages 191-211, 2012. Invited paper from VLDB 2010.
Characterizing Schema Mappings via Data Examples
Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, and Wang-Chiew Tan
ACM Transactions on Database Systems (TODS), 36(4):23, 2011. Invited paper from PODS 2010.
EIRENE: Interactive Design and Refinement of Schema Mappings via Data Examples
Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, and Wang-Chiew Tan
Proceedings of the VLDB Endowment (PVLDB), 4(12), pages 1414-1417, 2011.
Designing and Refining Schema Mappings via Data Examples
Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, and Wang-Chiew Tan
Proceedings of the ACM SIGMOD Conference, pages 133-144, 2011.
Probabilistic Data Exchange
Ronald Fagin, Benny Kimelfeld, and Phokion G. Kolaitis
Journal of the Association for Computing Machinery (JACM), 58(4), 2011. 
Reverse Data Exchange: Coping with Nulls
Ronald Fagin, Phokion G. Kolaitis, Lucian Popa, and Wang-Chiew Tan
ACM Transactions on Database Systems (TODS), 36(2):11, 2011. Invited paper from PODS 2009. 
Characterizing EF over Infinite Trees and Modal Logic on Transitive Graphs
Balder ten Cate and Alessandro Facchini
Mathematical Foundations of Computer Science 2011 (MFCS 2011), LNCS 6907, pages 290-302, 2011.
Guarded Negation
Vince Barany, Balder ten Cate, and Luc Segoufin
Proceedings of the 38th International Colloquium on Automata, Languages and Programming (ICALP 2011), pages 356-367, 2011.
On the Tractability and Intractability of Consistent Conjunctive Query Answering
Enela Pema
EDBT/ICDT Ph.D. Workshop, 2011. 
Schema Mapping Evolution Through Composition and Inversion
Ronald Fagin, Phokion G. Kolaitis, Lucian Popa, and Wang-Chiew Tan
Book Chapter in "Schema Matching and Mapping", pages 191-222, Springer, 2011.
Unary Negation
Balder ten Cate and Luc Segoufin
Proceedings of the International Symposium on Theoretical Aspects of Computer Science (STACS), LIPIcs, vol. 9, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2011.
On the equivalence of distributed systems with queries and communication
Serge Abiteboul, Balder ten Cate, and Yannis Katsis
Proceedings of the International Conference on Database Theory (ICDT), pages 126-137, 2011.
MapMerge: Correlating Independent Schema Mappings
Bogdan Alexe, Mauricio Hernandez, Lucian Popa, and Wang-Chiew Tan
Proceedings of the VLDB Endowment (PVLDB), vol. 3, pages 81-92, 2010.
Database Constraints and Homomorphism Dualities
Balder ten Cate, Phokion G. Kolaitis, and Wang-Chiew Tan
Proceedings of Principles and Practice of Constraint Programming (CP), pages 475-490, 2010.

Structural Characterizations of Schema-Mapping Languages
Balder ten Cate and Phokion G. Kolaitis
Communications of the Association for Computing Machinery (CACM), vol. 53, no. 1, pages 101-110, 2010
Characterizing Schema Mappings via Data Examples
Bogdan Alexe, Phokion G. Kolaitis, and Wang-Chiew Tan
Proceedings of the Symposium on Principles of Database Systems (PODS), pages 261-272, 2010
Probabilistic Data Exchange
Ronald Fagin, Benny Kimelfeld, and Phokion G. Kolaitis
Proceedings of the International Conference on Database Theory (ICDT), pages 76-88, 2010
Software prototypes
Two prototype systems, EQUIP and EIRENE, were produced. EQUIP is a system for efficiently computing the consistent answers of queries over inconsistent databases. EIRENE is an interactive system for designing schema mappings via data examples. The code for these systems will be made available upon request. The papers related to EQUIP and EIRENE are below.

Efficient Querying of Inconsistent Databases with Binary Integer Programming
Phokion G. Kolaitis, Enela Pema, and Wang-Chiew Tan
Proceedings of the VLDB Endowment (PVLDB), 6(6), pages 397-408, 2013.
EIRENE: Interactive Design and Refinement of Schema Mappings via Data Examples
Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, and Wang-Chiew Tan
Proceedings of the VLDB Endowment (PVLDB), 4(12), pages 1414-1417, 2011.
Designing and Refining Schema Mappings via Data Examples
Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, and Wang-Chiew Tan
Proceedings of the ACM SIGMOD Conference, pages 133-144, 2011.
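EQUIP computes the consistent answers of queries over inconsistent databases. Consistent answers are commonly defined via repairs: a tuple is a consistent answer if the query returns it on every repair of the database. The sketch below makes that definition concrete for a single relation with a primary key, where a repair keeps exactly one tuple from each group of tuples sharing a key value. It is a brute-force illustration, not EQUIP's method: it enumerates all repairs and therefore takes exponential time in general, whereas, according to the title of the PVLDB 2013 paper listed above, EQUIP relies on binary integer programming. The relation Emp, the query `in_hr`, and the helper names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Row = Tuple[str, ...]
Query = Callable[[FrozenSet[Row]], Set[Row]]


def repairs(rows: Set[Row], key_len: int) -> List[FrozenSet[Row]]:
    """Repairs w.r.t. a primary key on the first key_len columns: keep exactly one row per key value."""
    groups: Dict[Row, List[Row]] = {}
    for row in rows:
        groups.setdefault(row[:key_len], []).append(row)
    # product() over zero groups yields one empty choice, so there is always at least one repair.
    return [frozenset(choice) for choice in product(*groups.values())]


def consistent_answers(rows: Set[Row], key_len: int, query: Query) -> Set[Row]:
    """Tuples returned by the query on every repair (brute force, exponential in general)."""
    answers = [query(repair) for repair in repairs(rows, key_len)]
    return set.intersection(*answers)


# Hypothetical Emp(name, dept) relation whose key, name, is violated by alice.
emp = {("alice", "sales"), ("alice", "hr"), ("bob", "hr")}


def in_hr(db: FrozenSet[Row]) -> Set[Row]:
    return {(name,) for name, dept in db if dept == "hr"}


print(len(repairs(emp, 1)))               # 2
print(consistent_answers(emp, 1, in_hr))  # {('bob',)}
```

Alice appears in the hr answer only in the repair that keeps ("alice", "hr"), so she is not a consistent answer, while Bob, whose tuple survives in every repair, is.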