We use cookies to distinguish you from other users and to provide you with a better experience on our websites. Close this message to accept cookies or find out how to manage your cookie settings.
To save content items to your account,
please confirm that you agree to abide by our usage policies.
If this is the first time you use this feature, you will be asked to authorise Cambridge Core to connect with your account.
Find out more about saving content to .
To save content items to your Kindle, first ensure [email protected]
is added to your Approved Personal Document E-mail List under your Personal Document Settings
on the Manage Your Content and Devices page of your Amazon account. Then enter the ‘name’ part
of your Kindle email address below.
Find out more about saving to your Kindle.
Note you can select to save to either the @free.kindle.com or @kindle.com variations.
‘@free.kindle.com’ emails are free but can only be saved to your device when it is connected to wi-fi.
‘@kindle.com’ emails can be delivered even when you are not connected to wi-fi, but note that service fees apply.
As we know by now, solutions in data exchange are not unique; indeed, if a source instance has a solution under a relational mapping M, then it has infinitely many of them. But if many solutions exist, which one should we materialize? To answer this question, we must be able to distinguish good solutions from others. As we already mentioned in Section 4.2, good solutions in data exchange are usually identified with the most general solutions (see Example 4.3), whichinturn can be characterized as the universal solutions. We will see in this chapter that universal solutions admit different equivalent characterizations.
The existence of solutions does not in general imply the existence of universal solutions (in fact, checking whether universal solutions exist is an undecidable problem). However, this negative theoretical result does not pose serious problems in data exchange. Recall from the previous chapter that the class of mappings M = (Rs, Rt, Σst, Σt), such that Σt consists of a set of egds and a weakly acyclic set of tgds, is particularly well-behaved for data exchange. Indeed, for this class the existence of solutions – that is undecidable in general – can be checked in polynomial time, and, in case a solution exists, at least one solution can be efficiently computed. We will see in this chapter that for this class, the existence of solutions implies the existence of universal solutions. Furthermore, when one exists, it can be efficiently computed.
In this chapter we shall study data exchange for XML documents. XML itself was invented as a standard for data exchange on the Web, albeit under a different interpretation of the term “data exchange”. In the Web context, it typically refers to a common, flexible format that everyone agrees on, and that, therefore, facilitates the transfer of data between different sites and applications. When we speak of data exchange, we mean transforming databases under different schemas with respect to schema mapping rules, and querying the exchanged data.
XML documents and schemas
In this section we review the basic definitions regarding XML. Note that a simple example was already shown in Chapter 1. XML documents have a hierarchical structure, usually abstracted as a tree. An example is shown in Figure 10.1. This document contains information about rulers of European countries. Its structure is represented by a labeled tree; in this example, the labels are europe, country, and ruler. In the XML context, these are referred to as element types. We assume that the labels come from a finite labeling alphabet and correspond, roughly, to relation names from the classical relational setting.
The root of the tree is labeled europe, and it has two children that are labeled country. These have data values, given in parentheses: the first one is Scotland, and the second one is England. Each country in turn has a set of rulers.
The ultimate goal of data exchange is to answer queries over the target data in a way consistent with the source data. In this chapter we focus on queries that return sets of tuples of data values. The reason for this restriction is twofold:
• first, for such queries the notion of certain answers is easy to define; and
• second, already for these queries we shall be able to identify features that need to be ruled out in order to ensure tractability of query answering.
We deal with a query language based on tree patterns, as it forms a natural analog of (unions of) conjunctive queries. We show that the problem of finding certain answers is in CONP, just like for conjunctive queries in the relational case. We then identify several features specific to XML, such as disjunction in DTDs or the horizontal order, that make the query answering problem intractable. Finally, we give a polynomial query answering algorithm that computes certain answers for unions of conjunctive queries that only use vertical navigation, restricted to schema mappings with nested-relational DTDs.
The query answering problem
Just like in the relational case, we study conjunctive queries and their unions. As we have learned, it is convenient to work with tree patterns, which have very similar expressivity to conjunctive queries. Thus, for querying XML documents we use the same language as for the dependencies: tree patterns with equalities and inequalities, to capture the analog of relational conjunctive queries (with inequalities).
XML schema mappings are expressed in terms of tree patterns, a simple query language for XML trees that plays the role analogous to that of conjunctive queries for relational databases. In this chapter we define the syntax and semantics of tree patterns, introduce a classification based on the features they use, and determine the complexity of basic computational problems. Then we do the same for XML schema mappings.
Tree patterns: classification and complexity
In XML trees information can be represented by means of data values, as well as the structure of the tree. Let us return to the example shown in Figure 10.1. The edge between the node storing Scotland and the node storing Mary I represents the fact that Mary I ruled Scotland. The value Charles I appearing twice informs us that Charles I ruled both Scotland and England. The node storing Mary I coming directly after the node storing James V corresponds to the fact that Mary I succeeded James V on the throne of Scotland. This already suggests the querying features needed to extract information from XML trees: child, next sibling, and their transitive closures: descendant, following sibling. We call these four features axes. It is also necessary to compare data values stored in different nodes.
In the light of PART TWO, a natural query language for XML trees is the family of conjunctive queries over XML trees viewed as databases over two sorts of objects: tree nodes, and data values. Relations in such representations include child, next sibling, and relations associating attribute values with nodes.
So far, our default semantics for answering queries in data exchange was the certain answers semantics. But is it always right to use it? After all, we have seen that even in the simplest possible, copying mappings, that essentially say “copy the source to the target”, some relational algebra queries cannot be answered (see Proposition 7.16 in Section 7.6). This is rather counter-intuitive behavior: one should expect that, in a copying mapping, relations simply change name but not content, and queries should be answerable over a copy of the source.
The standard certain answers semantics presents us with a host of problems:
• some queries are not rewritable even under the simplest mappings;
• computing answers to relational algebra queries is in general an undecidable problem; and
• computing certain answers could be intractable even for very simple and practically relevant classes of queries (e.g., conjunctive queries with inequalities, or SQL queries admitting inequalities in the where clause, as shown in Theorem 7.4).
Should we always follow the certain answers semantics? Is it really sacred in data exchange? Or perhaps we can find an alternative data exchange semantics that avoids the problems we listed above?
In a way, there is no fully satisfactory answer to these questions. Every semantics one presents is going to work well in some scenarios, and exhibit problems in others, and thus the choice of the semantics depends on the precise interpretation of what the mapping rules actually mean. But alternative semantics have been proposed, and we present several here that attempt to solve the problems above.
Classical computable model theory is most naturally concerned with countable domains. There are, however, several methods – some old, some new – that have extended its basic concepts to uncountable structures. Unlike in the classical case, however, no single dominant approach has emerged, and different methods reveal different aspects of the computable content of uncountable mathematics. This book contains introductions to eight major approaches to computable uncountable mathematics: descriptive set theory; infinite time Turing machines; Blum-Shub-Smale computability; Sigma-definability; computability theory on admissible ordinals; E-recursion theory; local computability; and uncountable reverse mathematics. This book provides an authoritative and multifaceted introduction to this exciting new area of research that is still in its early stages. It is ideal as both an introductory text for graduate and advanced undergraduate students and a source of interesting new approaches for researchers in computability theory and related areas.
Not all the systems mentioned in this book have been shown to be complete, only the ones for which a method has been described for converting trees into proofs. In this section, a more powerful strategy for showing completeness will be presented that applies to a wider range of propositional modal logics. It is a version of the so-called Henkin or canonical model technique, which is widely used in logic. This method is more abstract than the method of Chapter 8, and it is harder to adapt to systems that include quantifiers and identity, but a serious student of modal logic should become familiar with it. The fundamental idea on which the method is based is the notion of a maximally consistent set. Maximally consistent sets play the role of possible worlds. They completely describe the facts of a world by including either A or ~A (but never both) for each sentence A in the language.
The Lindenbaum Lemma
A crucial step in demonstrating completeness with such maximally consistent sets is to prove the famous Lindenbaum Lemma. To develop that result, some concepts and notation need to be introduced. When M is a (possibly) infinite set of sentences, ‘M, A’ indicates the result of adding A to the set M, and ‘M ⊢S C’ indicates that there is a finite list H formed from some of the members of M, such that H ⊢S C. Set M′ is an extension of M, provided that every member of M is a member of M′. We say that set M is consistent in S iff M ⊬S ⊥. M is maximal iff for every sentence A, either A or ~A is in M. M is maximally consistent for S (or mc for short) iff M is both maximal and consistent in S. When it is clear from the context what system is at issue, or if the results being discussed are general with respect to S, the subscript ‘S’ on ‘⊢’ will be dropped, and we will use ‘consistent’ in place of ‘consistent for S’. It should be remembered, however, that what counts as an mc set depends on the system S. We are now ready to state the Lindenbaum Lemma.
Lindenbaum Lemma. Every consistent set has an mc extension.