A parser is a piece of software capable of resolving a string into tokens and then checking whether the string belongs to a particular (formal) language. Constructing a parser from scratch is an interesting problem. A more interesting problem, however, is that of constructing a particular parser from other (predefined) parsers rather than from scratch. This problem can be solved by using parser builders. In the end, some of these parser builders have to be constructed from scratch, but all complex parsers can be built from the parser builders that parse their components. Scala includes a rich library for building parsers using parser builders. In this chapter we first give an overview of some relevant notions, then we describe the library, and finally we use this library to construct an interpreter for a simple programming language.
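The following minimal, hand-rolled Scala sketch makes the idea of building parsers from smaller parsers concrete. It is not the API of Scala's parser library. The names `Parser`, `Parsers`, `char`, `literal`, `many`, `many1`, `accepts` and `parseAll` are hypothetical. For brevity, the sketch works directly on characters instead of first splitting the input into tokens.

In this sketch, a parser is a function that either fails or returns a result together with the unconsumed input. Sequencing (`~`) and alternation (`|`) combine parsers into larger ones.

```scala
// Hypothetical sketch of parser builders (not Scala's library API).
// A parser consumes a prefix of the input: None means failure,
// Some((result, rest)) means success with `rest` left unconsumed.
final case class Parser[+A](run: String => Option[(A, String)]) {
  def map[B](f: A => B): Parser[B] =
    Parser(in => run(in).map { case (a, rest) => (f(a), rest) })

  def flatMap[B](f: A => Parser[B]): Parser[B] =
    Parser(in => run(in).flatMap { case (a, rest) => f(a).run(rest) })

  // Sequencing: run this parser, then `that` on the remaining input.
  def ~[B](that: => Parser[B]): Parser[(A, B)] =
    flatMap(a => that.map(b => (a, b)))

  // Alternation: try this parser; if it fails, try `that` on the same input.
  def |[B >: A](that: => Parser[B]): Parser[B] =
    Parser(in => run(in).orElse(that.run(in)))
}

object Parsers {
  // The only parser written "from scratch": one character satisfying `p`.
  def char(p: Char => Boolean): Parser[Char] =
    Parser(in => if (in.nonEmpty && p(in.head)) Some((in.head, in.tail)) else None)

  def literal(c: Char): Parser[Char] = char(_ == c)

  // Zero or more repetitions; stops if `p` fails or consumes nothing.
  def many[A](p: Parser[A]): Parser[List[A]] =
    Parser { in =>
      p.run(in) match {
        case Some((a, rest)) if rest.length < in.length =>
          many(p).run(rest).map { case (as, r) => (a :: as, r) }
        case _ => Some((Nil, in))
      }
    }

  // One or more repetitions.
  def many1[A](p: Parser[A]): Parser[List[A]] =
    (p ~ many(p)).map { case (a, as) => a :: as }

  // Everything below is built only from other parsers.
  val number: Parser[Int] = many1(char(_.isDigit)).map(_.mkString.toInt)
  val term: Parser[Int] = number | (literal('-') ~ number).map { case (_, n) => -n }
  val sum: Parser[Int] =
    (term ~ many(literal('+') ~ term)).map { case (n, rest) => n + rest.map(_._2).sum }

  // A string belongs to the language iff the parser consumes all of it.
  def accepts[A](p: Parser[A], s: String): Boolean = p.run(s).exists(_._2.isEmpty)
  def parseAll[A](p: Parser[A], s: String): Option[A] = p.run(s).collect { case (a, "") => a }
}
```

Only `char` inspects the input directly. `number`, `term` and `sum` are composed entirely from other parsers. For example, `Parsers.parseAll(Parsers.sum, "1+-2+10")` yields `Some(9)`, while `Parsers.accepts(Parsers.sum, "1+")` is `false`.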
Language parsers
Given an alphabet (i.e., a set of symbols or characters), the closure of this alphabet is the set of all strings that consist of symbols drawn from this alphabet. For example, the digits 0 and 1 form an alphabet, and the closure of this alphabet consists of “numbers” like 000, 101, 11, etc. A language can be considered a subset of the closure of an alphabet (the closure includes the empty string). The language that is the closure of an alphabet is not particularly interesting, since it is too large.
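The definitions above can be explored with a small Scala sketch. The closure of an alphabet is infinite, so the hypothetical helper `closureUpTo` only enumerates its strings up to a length bound. As one possible choice, a language is represented by a membership predicate. The example language (binary numerals without redundant leading zeros) is our own illustration, not one taken from the text.

```scala
object Closure {
  // All strings of length exactly n over the alphabet (|alphabet|^n of them).
  def stringsOfLength(alphabet: List[Char], n: Int): List[String] =
    if (n == 0) List("")
    else for { s <- stringsOfLength(alphabet, n - 1); c <- alphabet } yield s + c

  // The closure is infinite, so enumerate it only up to a maximum length.
  // Length 0 contributes the empty string.
  def closureUpTo(alphabet: List[Char], maxLen: Int): List[String] =
    (0 to maxLen).toList.flatMap(n => stringsOfLength(alphabet, n))

  // Hypothetical example language over {0, 1}: binary numerals with no
  // redundant leading zero ("0", "1", "10", "11", "100", ...).
  def isBinaryNumeral(s: String): Boolean =
    s == "0" || (s.nonEmpty && s.head == '1' && s.forall(c => c == '0' || c == '1'))
}
```

Filtering the truncated closure with the predicate selects exactly the strings of the chosen language up to that length. The language is a proper subset of the closure.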
Most “real” programs have a graphical user interface (GUI for short). Therefore, any serious modern programming language should provide tools for GUI programming. Scala provides such tools through the scala.swing package, which is an interface to Java's JFC/Swing packages. The ambitious goal of this chapter is to teach you how to build GUI applications in Scala.
“Hello World!” again!
Typically, most introductory texts on programming are written without any coverage of GUI programming. In addition, advanced texts on programming cover GUI programming only as a marginal or optional topic. The truth is that most “useful” applications have a graphical user interface that allows users to interact with the application. This implies that GUI programming is more common than programming textbooks “assume.” Most texts exclude GUI programming because it is assumed to be significantly harder than ordinary application programming. This is not true. GUI programming certainly differs from conventional application programming, but being different does not make a methodology more difficult.
Creating simple GUI applications with Scala is relatively easy. However, one has to compile the source code of the application, as it is not straightforward to create runnable GUI scripts. When compiling even a very simple GUI application, the Scala compiler will generate a number of .class files. This implies that, to run the application, one needs to have all these files in a particular directory.
The expression problem, also known as the extensibility problem, refers to the situation where we need to extend the data types of a program as well as the operations on them, with two constraints: (a) we do not want to modify existing code and (b) we want to be able to resolve types statically. Thus, the essence of the expression problem lies in the type-safe incremental modification to both the data types and their corresponding operations, without recompilation and with the support/use of static typing.
At the heart of the expression problem is the Separation of Concerns principle. Since its inception about forty years ago by Edsger Wybe Dijkstra, the Separation of Concerns principle has been elevated to one of the cornerstones of software engineering. In plain words, it states that when tackling a problem we have to identify the different concerns that apply to it and then try to separate them. By separating the concerns, we produce untangled, clearer code, thus reducing software complexity and increasing maintainability.
Of course, separation of concerns is only half the truth. We can identify our concerns and successfully separate them, but at some point we will need to recombine them: after all, they are parts of the original problem.
So, what exactly do we separate and then recombine in the expression problem? Data and operations are two different dimensions. Incremental modifications to these dimensions should be done independently and in an extensible way.
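One well-known way to address the expression problem in Scala uses type classes. It is shown here only as an illustrative sketch, not necessarily the approach developed in this chapter. Data variants are independent case classes, and each operation is a type class. The names `Num`, `Add`, `Neg`, `Eval`, `Show` and `Ops` are hypothetical.

The sketch adds both a new operation (`Show`) and a new variant (`Neg`) without modifying existing code. An expression whose type lacks an instance for some operation is rejected at compile time.

```scala
// Original code: two data variants and one operation (evaluation).
final case class Num(value: Int)
final case class Add[L, R](left: L, right: R)

trait Eval[E] { def eval(e: E): Int }
object Eval {
  implicit val numEval: Eval[Num] = new Eval[Num] { def eval(e: Num): Int = e.value }
  implicit def addEval[L, R](implicit l: Eval[L], r: Eval[R]): Eval[Add[L, R]] =
    new Eval[Add[L, R]] { def eval(e: Add[L, R]): Int = l.eval(e.left) + r.eval(e.right) }
}

// Extension 1, a new operation (pretty-printing): the code above is untouched.
trait Show[E] { def show(e: E): String }
object Show {
  implicit val numShow: Show[Num] = new Show[Num] { def show(e: Num): String = e.value.toString }
  implicit def addShow[L, R](implicit l: Show[L], r: Show[R]): Show[Add[L, R]] =
    new Show[Add[L, R]] {
      def show(e: Add[L, R]): String = s"(${l.show(e.left)} + ${r.show(e.right)})"
    }
}

// Extension 2, a new data variant (negation) supporting both existing operations.
// Its instances live in its own companion, so they are found without imports.
final case class Neg[E](inner: E)
object Neg {
  implicit def negEval[E](implicit ev: Eval[E]): Eval[Neg[E]] =
    new Eval[Neg[E]] { def eval(e: Neg[E]): Int = -ev.eval(e.inner) }
  implicit def negShow[E](implicit sh: Show[E]): Show[Neg[E]] =
    new Show[Neg[E]] { def show(e: Neg[E]): String = "-" + sh.show(e.inner) }
}

// Entry points: these compile only if every variant in `e` supports the operation.
object Ops {
  def eval[E](e: E)(implicit ev: Eval[E]): Int = ev.eval(e)
  def show[E](e: E)(implicit sh: Show[E]): String = sh.show(e)
}
```

For example, `Ops.show(Add(Num(1), Neg(Num(2))))` gives `"(1 + -2)"`, and `Ops.eval` on the same value gives `-1`. In this encoding, the shape of an expression is recorded in its static type (here `Add[Num, Neg[Num]]`). That is what lets the compiler check that each operation is defined for every variant used.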
Continuing our investigation of paths from Chapter 8, we now turn to the actual file abstraction. As a historical note, complete operating systems have been built on this abstraction, Plan 9 being one that used it quite extensively. Plan 9 was designed by, among others, Ken Thompson and Rob Pike, who also designed UTF-8. Throughout the book, we use files as they are defined and implemented by a standard Java Development Kit distribution, for an obvious reason: they take a pragmatic approach and at the same time have proved very successful for everyday programming. In this chapter, we will try to tickle a few of our brain neurons around this design.
The Java platform already provides a file abstraction, via the java.io.File class. Although the official online documentation states that File is
An abstract representation of file and directory pathnames,
the implementation plays the dual role of handling both generic paths and their physical counterparts in the native file system. We have already treated path representation and composition. The most common mapping of paths to underlying system resources is that related to native files.
So, we will develop here a small library for accessing files. But, instead of just providing a Scala wrapper around Java's File, we will abstract away common operations. The goal is for the design to accommodate not only the native file system but also a variety of other virtual file systems.
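Here is a minimal sketch of such a design. It uses our own hypothetical names (`VFile`, `NativeFile`, `MemFile`, `MemDir`, `VFileOps`) rather than the chapter's actual API. A small trait captures the common operations. One implementation delegates to `java.io.File`, and another is a purely in-memory virtual file system. Code written against the trait, such as `fileNames`, works with either.

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files

// Hypothetical abstraction: a node of some (native or virtual) file system.
trait VFile {
  def name: String
  def isDirectory: Boolean
  def children: List[VFile] // empty for plain files
  def readText(): String    // fails for directories
}

// Native files, delegating to java.io.File.
final class NativeFile(underlying: File) extends VFile {
  def name: String = underlying.getName
  def isDirectory: Boolean = underlying.isDirectory
  def children: List[VFile] =
    Option(underlying.listFiles()) // null if not a directory or on I/O error
      .map(_.toList.sortBy(_.getName))
      .getOrElse(Nil)
      .map(f => new NativeFile(f))
  def readText(): String =
    new String(Files.readAllBytes(underlying.toPath), StandardCharsets.UTF_8)
}

// A purely in-memory virtual file system.
final case class MemFile(name: String, text: String) extends VFile {
  def isDirectory: Boolean = false
  def children: List[VFile] = Nil
  def readText(): String = text
}

final case class MemDir(name: String, children: List[VFile]) extends VFile {
  def isDirectory: Boolean = true
  def readText(): String = throw new UnsupportedOperationException(s"$name is a directory")
}

object VFileOps {
  // Names of all plain files below `f`, depth first, for any VFile implementation.
  def fileNames(f: VFile): List[String] =
    if (f.isDirectory) f.children.flatMap(fileNames) else List(f.name)
}
```

Adding another virtual file system, for example one backed by a zip archive, would only require another `VFile` implementation. `VFileOps` would stay unchanged.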
The constructs that have been presented in the previous chapter are enough for the creation of simple software systems. It is also possible to create very complex software systems with these constructs alone, but the design and implementation processes would be really difficult. Fortunately, Scala provides many advanced features and constructs that facilitate programming as a mental activity. In this chapter we will describe most of these advanced features, while a few others, like parsing combinators and actors, will be presented thoroughly in later chapters.
Playing with trees
In the previous chapter we presented many important data types, but we did not mention trees, which form a group of data types that have many uses. Also, from the discussion so far, it is not clear whether Scala provides a facility for the construction of recursive data types, that is, data types that are defined in terms of themselves. For example, a binary tree is a typical example of a recursively defined data structure that can be defined as follows.
Definition 1 Given the type node, a binary tree over the type node is defined in the following way.
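Whatever the exact wording of Definition 1, a recursive type such as a binary tree is commonly encoded in Scala as a sealed trait with one case per constructor. A tree is either empty, or a node holding a value and two subtrees. The names `Tree`, `Empty` and `Branch` are our own choice for this sketch. Functions over the type follow its recursive structure via pattern matching.

```scala
// A binary tree over element type A: either empty, or a branch with two subtrees.
sealed trait Tree[+A]
case object Empty extends Tree[Nothing]
final case class Branch[+A](left: Tree[A], value: A, right: Tree[A]) extends Tree[A]

object Tree {
  // Number of values stored in the tree.
  def size[A](t: Tree[A]): Int = t match {
    case Empty           => 0
    case Branch(l, _, r) => size(l) + 1 + size(r)
  }

  // Length of the longest path from the root to an empty subtree.
  def depth[A](t: Tree[A]): Int = t match {
    case Empty           => 0
    case Branch(l, _, r) => 1 + math.max(depth(l), depth(r))
  }

  // Values in left-to-right (in-order) sequence.
  def inorder[A](t: Tree[A]): List[A] = t match {
    case Empty           => Nil
    case Branch(l, v, r) => inorder(l) ::: v :: inorder(r)
  }
}
```

Because `Tree` is sealed, the compiler can warn when a pattern match forgets one of the cases.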
It is quite probable that most of us are not consciously aware of an ever-appearing design pattern, which goes far beyond the design patterns in the normal sense of [24]. This pattern has to do with how we organize our data and, sometimes as a consequence, how we access these data. What we are talking about is the hierarchical data organization pattern, which we can summarize as: Hierarchies are everywhere!
A file system is the canonical example of hierarchical organization. Its structure is a collection of files and directories, with the directories playing the role of containers for other files and/or directories. The Unix tradition has more to say about files, since the file system “pattern” has been extended to support use cases other than the traditional ones. For example, in Linux, /proc is a special mounted file system which can be used to view some kernel configuration and parameters. In fact, normal file system I/O calls can be used to write data into this special file system, so that kernel and driver parameters can be changed at runtime.
XML advocates will feel pleased to recognize that XML has been promoting such hierarchical organization. We are not sure how many of them were aware of the real essence of the general “Hierarchies are everywhere” pattern mentioned above, but the pattern itself is ubiquitous. Strangely enough, hierarchical databases have not survived, but probably XML strikes back on their behalf.
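The same shape recurs in file systems, /proc and XML documents. A hypothetical generic node type (`HNode`, our own name) with a label and a list of children is enough to model any of them. Path resolution is then a fold over the path components, much like looking up `usr/local/bin` in a directory tree.

```scala
// Hypothetical generic hierarchy: every node has a label and zero or more children.
final case class HNode(label: String, children: List[HNode] = Nil) {
  // Resolve a slash-separated path such as "usr/local/bin" relative to this node.
  // Empty components (from leading or doubled slashes) are ignored.
  def resolve(path: String): Option[HNode] =
    path.split('/').filter(_.nonEmpty).foldLeft(Option(this)) { (acc, part) =>
      acc.flatMap(_.children.find(_.label == part))
    }

  // Labels of all leaf nodes, depth first.
  def leaves: List[String] =
    if (children.isEmpty) List(label) else children.flatMap(_.leaves)
}
```

The same `HNode` could equally describe an XML document, with element names as labels and nested elements as children.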
Today's computers have multi-core processors (i.e., integrated circuits to which two or more processors have been attached), which, in principle, allow the concurrent execution of computer instructions. In other words, today's computers are able to perform two or more tasks at the same time. Concurrent programming refers to the design and implementation of programs that consist of interacting computational processes that should be executed in parallel. Moreover, concurrent programming is not only the next logical step in software development but the next necessary step. Thus, all modern programming languages must provide constructs and libraries that ease the construction of concurrent programs. Scala allows users to design and implement concurrent programs using threads, mailboxes, or actors. Unfortunately, programming with threads is a cumbersome task. For this reason, concurrent applications in Scala are usually implemented using the actor model of programming.
Programming with threads: an overview
Roughly, a process is a program loaded into memory that is being executed. A thread, also known as a lightweight process, is a basic unit of processor utilization. A process may include more than one thread, whereas a traditional process includes only one. Threads can communicate effectively, but since they share a process's resources (for example, memory and open files), their communication is not without problems. Each Scala program has at least one thread, while several other “system” threads take care of events in GUI applications, input and output, etc.
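The following sketch illustrates both points using plain JVM threads rather than actors. Several threads run concurrently inside one process and communicate through shared memory. The names `ThreadsDemo` and `countWithThreads` are hypothetical.

The shared counter is a `java.util.concurrent.atomic.AtomicInteger`. A plain `var` incremented from several threads could lose updates, which is one concrete example of the communication problems mentioned above.

```scala
import java.util.concurrent.atomic.AtomicInteger

object ThreadsDemo {
  // Starts `nThreads` threads that each increment a shared counter `perThread` times.
  def countWithThreads(nThreads: Int, perThread: Int): Int = {
    // Shared state: all threads see the same object. incrementAndGet is atomic,
    // whereas `count += 1` on a shared var would be a data race.
    val counter = new AtomicInteger(0)
    val threads = (1 to nThreads).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = for (_ <- 1 to perThread) counter.incrementAndGet()
      })
    }
    threads.foreach(_.start()) // the workers now run concurrently with this thread
    threads.foreach(_.join())  // wait for every worker before reading the result
    counter.get()
  }
}
```

Even this small example needs explicit start, join and atomic updates. That bookkeeping illustrates why thread programming is described above as cumbersome.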
XML, the eXtensible Markup Language, is an industry standard for document markup. XML has been adopted in many fields that include software, physics, chemistry, finance, law, etc. XML is used to represent data that are interchanged between different operating systems, and many configuration files are XML files. The widespread use of XML dictated the design and implementation of tools capable of handling XML content. Scala is a modern programming language, and so it includes a standard library for the manipulation of XML documents. This library, which was designed and implemented by Burak Emir, is the subject of this chapter.
What is XML?
A markup is an annotation to text that describes how it is to be structured, laid out, or formatted. Markups are specified using tags that are usually enclosed in angle brackets. XML is a meta-markup language, that is, a language that can be used to define a specific set of tags that are suitable for a particular task. For example, one can define tags for verses, stanzas, and strophes in order to express poems in XML. When a specific set of tags is used to describe entities of a particular kind, then this set is called an XML application. For example, if one precisely specifies tags suitable to describe poems and uses them only for this purpose, then the resulting set of tags is an XML application.
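To illustrate the idea of an XML application, here is a made-up set of poem tags (`poem`, `stanza`, `verse`) processed from Scala. This sketch deliberately uses the JDK's built-in DOM parser (`javax.xml.parsers.DocumentBuilderFactory`) rather than Scala's XML library discussed in this chapter, so that it runs without extra dependencies. In recent Scala versions, that library ships as a separate module. The object name `PoemXml` and its methods are hypothetical.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory

object PoemXml {
  // A made-up "XML application" for poems: <poem> holds <stanza>s, which hold <verse>s.
  val sample: String =
    """<poem title="Sample">
      |  <stanza>
      |    <verse>first verse</verse>
      |    <verse>second verse</verse>
      |  </stanza>
      |  <stanza>
      |    <verse>third verse</verse>
      |  </stanza>
      |</poem>""".stripMargin

  // Parse a string into a DOM document using the JDK's standard parser.
  private def parse(xml: String): org.w3c.dom.Document =
    DocumentBuilderFactory.newInstance().newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))

  // Text of every <verse> element, in document order.
  def verses(xml: String): List[String] = {
    val nodes = parse(xml).getElementsByTagName("verse")
    (0 until nodes.getLength).map(i => nodes.item(i).getTextContent).toList
  }

  // Number of <stanza> elements.
  def stanzaCount(xml: String): Int = parse(xml).getElementsByTagName("stanza").getLength

  // Value of the root element's title attribute.
  def title(xml: String): String = parse(xml).getDocumentElement.getAttribute("title")
}
```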
Scala is a relatively new programming language that was designed by Martin Odersky and released in 2003. The distinguishing features of Scala include a seamless integration of functional programming features into an otherwise object-oriented language. Scala owes its name to its ability to scale, that is, it is a language that can grow by providing an infrastructure that allows the introduction of new constructs and data types. In addition, Scala is a concurrent programming language, which makes it a tool for today as well as tomorrow! Scala is a compiled language. Its compiler produces bytecode for the Java Virtual Machine, thus allowing the (almost) seamless use of Java tools and constructs from within Scala. The language has been used to rewrite Twitter's back-end services. In addition, almost all of Foursquare's infrastructure has been coded in Scala. Scala is also used by several other companies worldwide (for example, Siemens and Sony Pictures Imageworks).
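Here is a small taste of these features, written for this overview and relying only on the standard library. It shows three things:

- an ordinary class with a method (the object-oriented side);
- higher-order functions over an immutable list (the functional side);
- `java.util.ArrayList` and `java.lang.Math` used directly from Scala code.

The names `Account` and `ScalaTaste` are hypothetical.

```scala
// Object-oriented side: a class with fields and a method returning an updated copy.
final case class Account(owner: String, balance: Int) {
  def deposit(amount: Int): Account = copy(balance = balance + amount)
}

object ScalaTaste {
  // Functional side: filter and map take functions as arguments.
  def richOwners(accounts: List[Account], threshold: Int): List[String] =
    accounts.filter(_.balance >= threshold).map(_.owner)

  // Java side: a Java collection class, instantiated and filled from Scala.
  def toJavaList(xs: List[String]): java.util.ArrayList[String] = {
    val out = new java.util.ArrayList[String]()
    xs.foreach(x => out.add(x))
    out
  }

  // Java side: a static Java method used inside a Scala fold.
  // Returns Int.MinValue for an empty list.
  def largestBalance(accounts: List[Account]): Int =
    accounts.map(_.balance).foldLeft(Int.MinValue)((a, b) => java.lang.Math.max(a, b))
}
```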
Who should read this book?
The purpose of this book is twofold: first to teach the basics of Scala and then to show how Scala can be used to develop real applications. Unlike other books on Scala, this one does not assume any familiarity with Java. In fact, no previous knowledge of Java is necessary to read this book, though some knowledge of Java would be beneficial, especially in the chapter on GUI applications.
Let X and Y be two finite disjoint sets of elements over some ordered type and of combined size greater than k. Consider the problem of computing the kth smallest element of X ∪ Y. By definition, the kth smallest element of a set is one for which there are exactly k elements smaller than it, so the zeroth smallest is the smallest. How long does such a computation take?
The answer depends, of course, on how the sets X and Y are represented. If they are both given as sorted lists, then O(|X| + |Y|) steps are sufficient. The two lists can be merged in linear time and the kth smallest can be found at position k in the merged list in a further O(k) steps. In fact, the total time is O(k) steps, since only the first k + 1 elements of the merged list need be computed. But if the two sets are given as sorted arrays, then, as we show below, the time can be reduced further to O(log |X| + log |Y|) steps. This bound depends on arrays having a constant-time access function. The same bound is attainable if both X and Y are represented by balanced binary search trees, despite the fact that two such trees cannot be merged in less than linear time.
The fast algorithm is another example of divide and conquer, and the proof that it works hinges on a particular relationship between merging and selection.
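Both approaches can be sketched in Scala. The names `KthSmallest`, `byMerge` and `byHalving` are hypothetical. The divide-and-conquer variant below is one standard formulation, not necessarily the one derived in the text. As above, it assumes that the elements are distinct and that k is 0-based.

- `byMerge` walks both sorted sequences in merge order and stops after k + 1 elements, so it takes O(k) steps.
- `byHalving` compares the middle elements of the two remaining ranges and discards roughly half of one range per step. It relies on constant-time indexing and takes O(log |X| + log |Y|) steps.

```scala
import scala.annotation.tailrec

object KthSmallest {
  // Baseline: visit elements in merge order, stopping after k + 1 of them: O(k).
  def byMerge[A](xs: IndexedSeq[A], ys: IndexedSeq[A], k: Int)(implicit ord: Ordering[A]): A = {
    require(0 <= k && k < xs.length + ys.length, "k out of range")
    @tailrec def go(i: Int, j: Int, remaining: Int): A = {
      // Take from xs when ys is exhausted or xs has the smaller next element.
      val takeX = j >= ys.length || (i < xs.length && ord.lt(xs(i), ys(j)))
      if (remaining == 0) (if (takeX) xs(i) else ys(j))
      else if (takeX) go(i + 1, j, remaining - 1)
      else go(i, j + 1, remaining - 1)
    }
    go(0, 0, k)
  }

  // Divide and conquer on sorted arrays: O(log |X| + log |Y|) steps.
  def byHalving[A](xs: IndexedSeq[A], ys: IndexedSeq[A], k: Int)(implicit ord: Ordering[A]): A = {
    require(0 <= k && k < xs.length + ys.length, "k out of range")
    // Invariant: the answer is the rank-th smallest of xs[xl, xr) together with ys[yl, yr).
    @tailrec def go(xl: Int, xr: Int, yl: Int, yr: Int, rank: Int): A =
      if (xl == xr) ys(yl + rank)
      else if (yl == yr) xs(xl + rank)
      else {
        val i = (xr - xl) / 2 // offset of the middle of the x range
        val j = (yr - yl) / 2 // offset of the middle of the y range
        val xm = xs(xl + i)
        val ym = ys(yl + j)
        if (i + j < rank) {
          // The smaller middle element has at most i + j < rank smaller elements,
          // so it and everything before it in its range can be discarded.
          if (ord.lt(xm, ym)) go(xl + i + 1, xr, yl, yr, rank - i - 1)
          else go(xl, xr, yl + j + 1, yr, rank - j - 1)
        } else {
          // The larger middle element has at least i + j + 1 > rank smaller elements,
          // so it and everything after it in its range can be discarded.
          if (ord.lt(xm, ym)) go(xl, xr, yl, yl + j, rank)
          else go(xl, xl + i, yl, yr, rank)
        }
      }
    go(0, xs.length, 0, ys.length, k)
  }
}
```

The two comments in `byHalving` carry the key counting argument. Within the remaining ranges, the smaller of the two middle elements has at most i + j smaller elements, while the larger has at least i + j + 1. Whichever side of k the sum i + j falls on, one block can therefore be discarded without losing the answer. Each discarded block is at least half of its range, which gives the logarithmic bound.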