In this chapter, we discuss the typing of semistructured data. Typing is the process of describing, with a set of declarative rules or constraints called a schema, a class of XML documents, and of verifying that a given document is valid for that class (we also say that the document is valid against the type defined by the schema). Typing is used, for instance, to define a specific XML vocabulary (XHTML, MathML, RDF, etc.), with its own specific structure and content, for a given application.
We first present motivations and discuss the kind of typing that is needed. XML data typing is typically based on finite-state automata. Therefore, we recall basic notions of automata, first on words, then on ranked trees, and finally on unranked trees (i.e., essentially on XML). We also present the two main practical languages for describing XML types, DTDs and XML Schema, both of which are endorsed by the W3C. We then briefly describe alternative schema languages with their key features. In the last section, we discuss the typing of graph data.
One can also consider the issue of “type checking a program,” that is, verifying that if the input is of a proper input type, the program produces an output that is of a proper output type. In Section 3.5, we provide references to works on program type checking in the context of XML.
MOTIVATING TYPING
Perhaps the main difference with typing in relational systems is that typing is not compulsory for XML.
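To make the notion of validity concrete, here is a minimal sketch of validating a document against a DTD. The tiny catalog vocabulary and the use of the third-party lxml library are illustrative assumptions, not something prescribed by the chapter.

```python
from io import StringIO
from lxml import etree  # third-party library; an assumption for this sketch

# A tiny DTD: a catalog contains one or more products, each with a name and a price.
dtd = etree.DTD(StringIO("""
<!ELEMENT catalog (product+)>
<!ELEMENT product (name, price)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT price (#PCDATA)>
"""))

valid_doc = etree.fromstring(
    "<catalog><product><name>Book</name><price>10</price></product></catalog>")
invalid_doc = etree.fromstring(
    "<catalog><product><name>Book</name></product></catalog>")

print(dtd.validate(valid_doc))    # True: the document is valid against the type
print(dtd.validate(invalid_doc))  # False: the required <price> element is missing
```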
Mashups are Web applications that integrate and combine data from multiple Web sources to present them in a new way to a user. This chapter shows two different ways to construct mashup applications in practice: Yahoo! Pipes, a graphical user interface for building mashups, and XProc, a W3C language for describing workflows of transformations over XML documents. Pros and cons of either approach will be made clear as one follows the indicated steps. The goal will be to present information about news events, each event being accompanied by its localization displayed on a map. For that purpose, we integrate three sources of information:
A Web feed about current events in the world, in RSS format (e.g., CNN's top stories at http://rss.cnn.com/rss/edition.rss). Any such RSS feed is fine, though English is preferable to ensure precision of the geolocalization. (A minimal sketch of fetching and parsing such a feed appears after this list.)
A geolocalization service. We use information from the GeoNames geographical database, and specifically their RSS to Geo RSS converter, whose API is described at http://www.geonames.org/rss-to-georss-converter.html.
A mapping service. We use Yahoo! Maps.
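As a rough illustration of the first step, the sketch below fetches an RSS feed and prints the title and link of each news item using only the Python standard library; the feed URL is the CNN example mentioned above, and the geolocalization and mapping steps are not shown.

```python
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "http://rss.cnn.com/rss/edition.rss"

# Fetch the feed and parse it as XML.
with urllib.request.urlopen(FEED_URL) as response:
    tree = ET.parse(response)

# In RSS 2.0, news events are <item> elements under <channel>.
for item in tree.getroot().findall("./channel/item"):
    title = item.findtext("title")
    link = item.findtext("link")
    print(title, "-", link)
```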
YAHOO! PIPES: A GRAPHICAL MASHUP EDITOR
Yahoo! Pipes allows creating simple mashup applications (each simply called a pipe) using a graphical interface based on the construction of a pipeline of boxes connected to each other, each box performing a given operation (fetching information, annotating it, reorganizing it, etc.) until the final output of the pipeline is produced. It can be used by nonprogrammers, though defining complex mashups still requires skill and experience with the platform.
This chapter proposes an introduction to recommendation techniques and suggests some exercises and projects. We do not present a recommendation system in particular but rather focus on the general methodology. As an illustrative example, we will use the MovieLens data set to construct movie recommendations.
The chapter successively introduces recommendation, user-based collaborative filtering, and item-based collaborative filtering. It discusses different parameterizations of these methods and evaluates their results with respect to the quality of the data set. We show how to generate recommendations using SQL queries on the MovieLens data set. Finally, we suggest some projects for students who want to investigate the realm of recommendation systems further.
INTRODUCTION TO RECOMMENDATION SYSTEMS
Given a set of ratings of items by a set of users, a recommendation system produces a list of items for a particular user, possibly in a given context. Such systems are widely used in Web applications. For example, content sites like Yahoo! Movies (movies), Zagat (restaurants), Library Thing (books), Pandora (music), and Stumble Upon (Web sites) suggest a list of items of interest by predicting the ratings of their users. E-commerce sites such as Amazon (books) or Netflix (movies) use recommendations to suggest new products to their users and construct bundle sales. Usually, they exploit the recent browsing history as a limited context. Finally, advertisement companies need to find a list of advertisements targeted for their users. Some of them, like Google AdSense, rely more on the context (e.g., keywords) than on an estimation of the user's taste based on her/his recent browsing history.
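As a toy illustration of rating prediction, the sketch below implements user-based collaborative filtering on a hand-made ratings table (hypothetical data, not the MovieLens data set): a user's rating for an unseen item is predicted as a similarity-weighted average of the ratings given by other users.

```python
from math import sqrt

ratings = {  # user -> {movie: rating}; purely illustrative data
    "alice": {"Matrix": 5, "Titanic": 1, "Alien": 4},
    "bob":   {"Matrix": 4, "Titanic": 2, "Alien": 5, "Amelie": 1},
    "carol": {"Matrix": 1, "Titanic": 5, "Amelie": 4},
}

def cosine(u, v):
    """Cosine similarity computed over the movies rated by both users."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][m] * ratings[v][m] for m in common)
    nu = sqrt(sum(ratings[u][m] ** 2 for m in common))
    nv = sqrt(sum(ratings[v][m] ** 2 for m in common))
    return dot / (nu * nv)

def predict(user, movie):
    """Similarity-weighted average of the ratings given to `movie` by other users."""
    pairs = [(cosine(user, other), r[movie])
             for other, r in ratings.items()
             if other != user and movie in r]
    total = sum(sim for sim, _ in pairs)
    return sum(sim * rating for sim, rating in pairs) / total if total else None

print(predict("alice", "Amelie"))  # predicted rating for a movie alice has not seen
```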
In the large-scale file systems presented in Chapter 14, search operations are based on a sequential scan that accesses the whole data set. When it comes to finding a specific object, typically a tiny part of the data volume, direct access is much more efficient than a linear scan. The object is directly obtained using its physical address, which may simply be the offset of the object's location with respect to the beginning of the file, or possibly a more sophisticated addressing mechanism.
An index on a collection C is a structure that maps the key of each object in C to its (physical) address. At an abstract level, it can be viewed as a set of pairs (k,a), called entries, where k is a key and a the address of an object. For the purpose of this chapter, an object is seen as raw (unstructured) data, its structure being of concern to the client application only. You may want to think, for instance, of a relational tuple, an XML document, a picture, or a video file. It may be the case that the key uniquely determines the object, as for keys in the relational model.
An index, as we consider it here, supports at least the following operations, which we hereafter call the dictionary operations:
Insertion insert(k,a),
Deletion delete(k),
Key search search(k): a.
If the keys can be linearly ordered, an index may also support range queries of the form range(k1,k2), which retrieve all the keys (and their addresses) in that range.
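The following in-memory sketch illustrates the dictionary operations and range search. The class and its internals are purely illustrative, since a real index must of course be organized for efficient access to secondary storage.

```python
import bisect

class SimpleIndex:
    """A toy index mapping keys to addresses, with sorted keys for range queries."""

    def __init__(self):
        self._addr = {}   # key -> address of the object
        self._keys = []   # keys kept in sorted order

    def insert(self, k, a):
        if k not in self._addr:
            bisect.insort(self._keys, k)
        self._addr[k] = a

    def delete(self, k):
        if k in self._addr:
            del self._addr[k]
            self._keys.remove(k)

    def search(self, k):
        return self._addr.get(k)          # the address a, or None if absent

    def range(self, k1, k2):
        lo = bisect.bisect_left(self._keys, k1)
        hi = bisect.bisect_right(self._keys, k2)
        return [(k, self._addr[k]) for k in self._keys[lo:hi]]

idx = SimpleIndex()
idx.insert("bach", 1024)
idx.insert("mozart", 2048)
idx.insert("verdi", 4096)
print(idx.search("mozart"))   # 2048
print(idx.range("a", "n"))    # [('bach', 1024), ('mozart', 2048)]
```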
In previous chapters, we presented algorithms for evaluating XPath queries on XML documents in Ptime with respect to the combined size of the XML data and of the query. In this context, the entire document is assumed to fit within the main memory. However, very large XML documents may not fit in the memory available to the query processor at runtime. Since access to disk-resident data is orders of magnitude slower than access to the main memory, this dramatically changes the problem. When this is the case, the goal, performance-wise, is not so much to reduce the algorithmic complexity of query evaluation as to design methods that reduce the number of disk accesses needed to evaluate a given query. The topic of this chapter is the efficient processing of queries over disk-resident XML documents.
We will make extensive use of depth-first tree traversals in this chapter. We briefly recall two classical definitions (a short code sketch follows them):
preorder: To traverse a nonempty binary tree in preorder, perform the following operations recursively at each node, starting with the root node: (1) Visit the root, (2) traverse the left subtree, (3) traverse the right subtree.
postorder: To traverse a nonempty binary tree in postorder, perform the following operations recursively at each node, starting with the root node: (1) Traverse the left subtree, (2) traverse the right subtree, (3) visit the root.
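The short sketch below, on binary trees represented as nested tuples, makes the two definitions concrete.

```python
# Binary trees are represented as nested tuples (value, left, right),
# with None for an empty subtree.

def preorder(node, visit):
    """Visit the root, then traverse the left subtree, then the right subtree."""
    if node is None:
        return
    value, left, right = node
    visit(value)
    preorder(left, visit)
    preorder(right, visit)

def postorder(node, visit):
    """Traverse the left subtree, then the right subtree, then visit the root."""
    if node is None:
        return
    value, left, right = node
    postorder(left, visit)
    postorder(right, visit)
    visit(value)

tree = ("a", ("b", None, None), ("c", ("d", None, None), None))
preorder(tree, print)   # a b c d
postorder(tree, print)  # b d c a
```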
Figure 4.1 illustrates the issues raised by the evaluation of path queries on disk-resident XML documents.
This is a first course in propositional modal logic, suitable for mathematicians, computer scientists and philosophers. Emphasis is placed on semantic aspects, in the form of labelled transition structures, rather than on proof theory. The book covers all the basic material - propositional languages, semantics and correspondence results, proof systems and completeness results - as well as some topics not usually covered in a modal logic course. It is written from a mathematical standpoint. To help the reader, the material is covered in short chapters, each concentrating on one topic. These are arranged into five parts, each with a common theme. An important feature of the book is its many exercises, for which an extensive set of solutions is provided.
This book describes the mathematical aspects of the semantics of programming languages. The main goals are to provide formal tools to assess the meaning of programming constructs in both a language-independent and a machine-independent way, and to prove properties about programs, such as whether they terminate, or whether their result is a solution of the problem they are supposed to solve. In order to achieve this the authors first present, in an elementary and unified way, the theory of certain topological spaces that have proved of use in the modelling of various families of typed lambda calculi considered as core programming languages and as meta-languages for denotational semantics. This theory is known as Domain Theory, and was founded as a subject by Scott and Plotkin. One of the main concerns is to establish links between mathematical structures and more syntactic approaches to semantics, often referred to as operational semantics, which is also described. This dual approach has the double advantage of motivating computer scientists to do some mathematics and of interesting mathematicians in unfamiliar application areas from computer science.
Information is a central topic in computer science, cognitive science and philosophy. In spite of its importance in the 'information age', there is no consensus on what information is, what makes it possible, and what it means for one medium to carry information about another. Drawing on ideas from mathematics, computer science and philosophy, this book addresses the definition and place of information in society. The authors, observing that information flow is possible only within a connected distribution system, provide a mathematically rigorous, philosophically sound foundation for a science of information. They illustrate their theory by applying it to a wide range of phenomena, from file transfer to DNA, from quantum mechanics to speech act theory.
The lecture courses in this work are derived from the SERC 'Logic for IT' Summer School and Conference on Proof Theory held at Leeds University. The contributions come from acknowledged experts and comprise expository and research articles; put together in this book they form an invaluable introduction to proof theory that is aimed at both mathematicians and computer scientists.
Coinduction is a method for specifying and reasoning about infinite data types and automata with infinite behaviour. In recent years, it has come to play an ever more important role in the theory of computing. It is studied in many disciplines, including process theory and concurrency, modal logic and automata theory. Typically, coinductive proofs demonstrate the equivalence of two objects by constructing a suitable bisimulation relation between them. This collection of surveys is aimed at both researchers and Master's students in computer science and mathematics and deals with various aspects of bisimulation and coinduction, with an emphasis on process theory. Seven chapters cover the following topics: history, algebra and coalgebra, algorithmics, logic, higher-order languages, enhancements of the bisimulation proof method, and probabilities. Exercises are also included to help the reader master new material.
Grammars of natural languages can be expressed as mathematical objects, similar to computer programs. Such a formal presentation of grammars facilitates mathematical reasoning with grammars (and the languages they denote), as well as computational implementation of grammar processors. This book presents one of the most commonly used grammatical formalisms, Unification Grammars, which underlies contemporary linguistic theories such as Lexical-Functional Grammar (LFG) and Head-driven Phrase Structure Grammar (HPSG). The book provides a robust and rigorous exposition of the formalism that is both mathematically well-founded and linguistically motivated. While the material is presented formally, and much of the text is mathematically oriented, a core chapter of the book addresses linguistic applications and the implementation of several linguistic insights in unification grammars. Dozens of examples and numerous exercises (many with solutions) illustrate key points. Graduate students and researchers in both computer science and linguistics will find this book a valuable resource.
The history of bisimulation is well documented in earlier chapters of this book. In this chapter we will look at a major non-trivial extension of the theory of labelled transition systems: probabilistic transition systems. There are many possible extensions of theoretical and practical interest: real-time, quantitative, independence, spatial and many others. Probability is the best theory we have for handling uncertainty in all of science, not just computer science. It is not an idle extension made for the purpose of exploring what is theoretically possible. Non-determinism is, of course, important, and arises in computer science because sometimes we just cannot do any better or because we lack quantitative data from which to make quantitative predictions. However, one does not find any use of non-determinism in a quantitative science like physics, though it appears in sciences like biology where we have not yet reached a fundamental understanding of the nature of systems.
When we do have data or quantitative models, it is far preferable to analyse uncertainty probabilistically. A fundamental reason that we want to use probabilistic reasoning is that if we merely reported what is possible and then insisted that no bad things were possible, we would trust very few system designs in real life. For example, we would never trust a communication network, a car, an aeroplane, or an investment bank, nor would we ever take any medication! In short, only very few idealised systems ever meet purely logical specifications. We need to know the ‘odds’ before we trust any system.
We introduce bisimulation and coinduction roughly following the way that led to their discovery in Computer Science. Thus the general topic is the semantics of concurrent languages (or systems), in which several activities, the processes, may run concurrently. Central questions are: what is, mathematically, a process? And what does it mean that two processes are ‘equal’? We seek notions of process and process equality that are both mathematically and practically interesting. For instance, the notions should be amenable to effective techniques for proving equalities, and the equalities themselves should be justifiable, according to the way processes are used.
We hope that the reader will find this way of proceeding helpful for understanding the meaning of bisimulation and coinduction. The emphasis on processes is also justified by the fact that concurrency remains today the main application area for bisimulation and coinduction.
We compare processes and functions in Section 1.1. We will see that processes do not fit the input/output schema of functions. A process has an interactive behaviour, and it is essential to take this into account. We formalise the idea of behaviour in Section 1.2 via labelled transition systems (LTSs), together with notations and terminology for them. We discuss the issue of equality between behaviours in Section 1.3. We first try to reuse notions of equality from Graph Theory and Automata Theory; the failure of these attempts leads us to propose bisimilarity in Section 1.4. We introduce the reader to the bisimulation proof method through a number of examples.
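As a small foretaste, and under our own simplifying assumptions rather than material from the chapter, the sketch below represents a finite LTS as a set of labelled transitions and computes bisimilarity as a greatest fixpoint, removing pairs of states that violate the transfer property.

```python
# A finite LTS given as (state, label, state) transitions: two toy vending machines.
transitions = {
    ("p0", "coin", "p1"), ("p1", "coffee", "p0"),                        # machine P
    ("q0", "coin", "q1"), ("q1", "coffee", "q2"), ("q2", "coin", "q1"),  # machine Q
}
states = {s for (s, _, _) in transitions} | {t for (_, _, t) in transitions}

def steps(s):
    """All (label, target) pairs reachable from state s in one transition."""
    return {(a, t) for (src, a, t) in transitions if src == s}

def bisimilarity():
    """Naive greatest fixpoint: start from all pairs, remove violating pairs."""
    rel = {(p, q) for p in states for q in states}
    changed = True
    while changed:
        changed = False
        for (p, q) in list(rel):
            # Transfer property: every move of p is matched by q, and vice versa.
            p_matched = all(any(a == b and (p2, q2) in rel for (b, q2) in steps(q))
                            for (a, p2) in steps(p))
            q_matched = all(any(a == b and (p2, q2) in rel for (a, p2) in steps(p))
                            for (b, q2) in steps(q))
            if not (p_matched and q_matched):
                rel.discard((p, q))
                changed = True
    return rel

print(("p0", "q0") in bisimilarity())  # True: the two machines are bisimilar
```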
This book is about bisimulation and coinduction. It is the companion book of the volume An Introduction to Bisimulation and Coinduction, by Davide Sangiorgi (Cambridge University Press, 2011), which deals with the basics of bisimulation and coinduction, with an emphasis on labelled transition systems, processes, and other notions from the theory of concurrency.
In the present volume, we have collected a number of chapters, by different authors, on several advanced topics in bisimulation and coinduction. These chapters either treat specific aspects of bisimulation and coinduction in great detail (including their history, algorithmics, enhanced proof methods, and logic), or generalise the basic notions of bisimulation and coinduction to different or more general settings, such as coalgebra, higher-order languages, and probabilistic systems. Below we briefly summarise the chapters in this volume.
The origins of bisimulation and coinduction, by Davide Sangiorgi
In this chapter, the origins of the notions of bisimulation and coinduction are traced back to different fields, notably computer science, modal logic, and set theory.
An introduction to (co)algebra and (co)induction, by Bart Jacobs and Jan Rutten
Here the notions of bisimulation and coinduction are explained in terms of coalgebras. These mathematical structures generalise all kinds of infinite data structures and automata, including streams (infinite lists), deterministic and probabilistic automata, and labelled transition systems. Coalgebras are formally dual to algebras, and it is this duality that is used to put both induction and coinduction into a common perspective.