The Internet and the World Wide Web (also known as the WWW or simply the Web) are often considered synonyms by non-technical users. This confusion stems from the fact that the WWW is at the origin of the explosion in Internet use, providing a very user-friendly interface to access the almost infinite wealth of information available on the Internet. The WWW, though, is a rather different network, in the sense that it is made only from a specific software protocol which allows access to data scattered over the physical Internet. In other words, it is a virtual network that lives only as a sort of software map linking different data objects. Nevertheless, the Web finds a natural representation as a graph, and it is a stunning example of an evolving network. New Web pages appear and disappear at an impressive rate, and the dynamics of links is even faster. Indeed, the fact that we are dealing with virtual objects makes Web dynamics almost free from the physical constraints acting on the Internet. Any individual or institution can create at will new Web pages with any number of links to other documents, and each page can be pointed to by an unlimited number of other pages.
The Web is not the only virtual network present on the Internet. Users' interactions and new media for information sharing can also be mapped onto graph-like structures. The graph of e-mail acquaintances of Internet users is a well-defined example of a social network hosted by the Internet.
Routers have multiple interfaces, each one corresponding to a physical connection with another peering router. By definition, each interface is associated with a different IP address. As discussed in Chapter 3, path probing discovers router interfaces. In particular, each probing path records a single interface for each traversed router, generally the one from which the packet arrived at the router. It is therefore possible that probes coming from different monitors, even if directed to the same destination, enter the same intermediate router through different interfaces. Thus, each time a router is probed from a different perspective, its interfaces are registered as separate routers. A simple example of the difference between the physical router connectivity and the one registered by probing paths is depicted in Figure A1.1. In this example a network of four physical routers is connected through six different interfaces. The graphs resulting from path probes forwarded from opposite directions are different. By merging these different views, a graph with a false connectivity is obtained, and the resolution of interfaces (router aliases) becomes indispensable to reconstruct the physical router topology.
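To make the aliasing effect concrete, here is a minimal sketch under assumed data: the IP addresses, the two probe paths, and the alias table below are hypothetical, and real alias resolution relies on dedicated measurement techniques rather than on a ready-made map.

```python
# Hypothetical example: merging interface-level probe paths with and without
# alias resolution. Addresses and the alias map are illustrative only.

# Two probe paths toward the same destination, seen from opposite monitors.
path_from_monitor_A = ["10.0.0.1", "10.0.1.1", "10.0.2.1"]
path_from_monitor_B = ["10.0.2.2", "10.0.1.2", "10.0.0.2"]

# Assumed alias map: interfaces belonging to the same physical router.
alias_map = {
    "10.0.0.1": "R1", "10.0.0.2": "R1",
    "10.0.1.1": "R2", "10.0.1.2": "R2",
    "10.0.2.1": "R3", "10.0.2.2": "R3",
}

def build_graph(paths, resolve=None):
    """Collect undirected edges between consecutive hops, optionally
    collapsing interface aliases onto their physical router."""
    nodes, edges = set(), set()
    for path in paths:
        hops = [resolve(ip, ip) if resolve else ip for ip in path]
        nodes.update(hops)
        for u, v in zip(hops, hops[1:]):
            if u != v:
                edges.add(frozenset((u, v)))
    return nodes, edges

paths = [path_from_monitor_A, path_from_monitor_B]
raw_nodes, raw_edges = build_graph(paths)                            # 6 "routers"
real_nodes, real_edges = build_graph(paths, resolve=alias_map.get)   # 3 routers
print(len(raw_nodes), "interface-level nodes vs.", len(real_nodes), "physical routers")
```

Without the alias map, the two views appear as disjoint chains of six distinct nodes; with it, they collapse onto the same three physical routers.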
We have seen in the previous chapter that the graphs representing the physical layout of the large-scale Internet look like a haphazard set of points and lines, with the result that they are of little help in finding any quantitative characterization or hidden pattern underlying the network fabric. The intricate appearance of these graphs, however, corresponds to the large-scale heterogeneity of the Internet and prompts us to use statistical analysis as the proper tool for a useful mathematical characterization of this system. Indeed, in large heterogeneous systems, large-scale regularities cannot be found by looking at local elements or properties. Similarly, the study of a single router's connectivity or history will not allow us to understand the behavior of the Internet as a whole. In other words, we must abandon local descriptions in favor of a large-scale statistical characterization, taking into account the aggregate properties of the many interacting units that compose the Internet.
The statistical description of Internet maps finds its natural framework in graph theory and the basic topological measures customarily used in this field. Here we shall focus on some metrics such as the shortest path length, the clustering coefficient, and the degree distribution, which provide a basic and robust characterization of Internet maps. The statistical features of these metrics provide evidence of the small-world and scale-free properties of the Internet. These two properties are prominent concepts in the characterization of complex networks, expressing in concise mathematical terms the hidden regularities of the Internet's structure.
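As a brief illustration of these metrics, the sketch below computes them on a synthetic scale-free graph with the networkx library; the toy topology and its parameters are our own choice and stand in for an actual Internet map.

```python
# Illustrative computation of the three basic metrics on a toy graph.
import networkx as nx
from collections import Counter

# A synthetic scale-free graph standing in for an Internet map.
G = nx.barabasi_albert_graph(n=1000, m=2, seed=1)

# Average shortest path length: mean hop distance over all vertex pairs.
avg_l = nx.average_shortest_path_length(G)

# Clustering coefficient: average fraction of a vertex's neighbors that
# are themselves connected to each other.
avg_c = nx.average_clustering(G)

# Degree distribution P(k): fraction of vertices having degree k.
degree_counts = Counter(d for _, d in G.degree())
P_k = {k: c / G.number_of_nodes() for k, c in sorted(degree_counts.items())}

print(f"<l> = {avg_l:.2f}, <c> = {avg_c:.3f}")
print("P(k) for the smallest degrees:",
      {k: round(P_k[k], 3) for k in list(P_k)[:4]})
```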
The Internet is a technological infrastructure aimed at favoring data exchange and reachability. The World Wide Web can be used to extract information from distant places with a few mouse clicks, and Internet protocols forward messages to far away computers, efficiently routing them along the intricate network fabric. This extreme efficiency, however, can also serve negative purposes, such as the spreading of computer viruses. Computer viruses have a long history, dating from the 1980s and before, becoming newly and sadly famous after each new outbreak, which eventually causes losses worth millions of dollars in computer equipment and downtime (Suplee, 2000). Their ever-increasing threat has therefore stimulated a growing interest in the scientific community and the economic world, translated in the latter case into an antivirus software business that moves millions of dollars worldwide every year.
Computer virus studies have been carried out for a long time, based mainly on an analogy with biological epidemiology (Murray, 1988). In particular, most studies have focused on the statistical epidemiology approach, aimed at modeling and forecasting the global incidence and evolution of computer virus epidemics in the computer world. The final goal of this approach is the development of large-scale safety and control policies for the protection of the Internet. Puzzlingly enough, however, computer viruses observed in the wild exhibit peculiar characteristics that are difficult to explain within the usual epidemic spreading framework.
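As a hedged sketch of the kind of model used in this approach, the code below integrates the homogeneous susceptible-infected-susceptible (SIS) equation commonly taken as a baseline in computer-virus epidemiology; the spreading and recovery rates are arbitrary illustrative values, not parameters drawn from the text.

```python
# Mean-field SIS model: d(rho)/dt = beta * rho * (1 - rho) - mu * rho,
# where rho is the fraction of infected hosts, beta the spreading rate,
# and mu the recovery rate. Parameter values are illustrative only.

def sis_prevalence(beta, mu, rho0=0.01, dt=0.01, steps=5000):
    """Integrate the SIS equation with a simple Euler scheme and return
    the final prevalence (fraction of infected hosts)."""
    rho = rho0
    for _ in range(steps):
        rho += dt * (beta * rho * (1.0 - rho) - mu * rho)
    return rho

# Above the epidemic threshold (beta/mu > 1) the infection persists at a
# finite prevalence; below it, the infection dies out exponentially.
print("beta/mu = 2.0 ->", round(sis_prevalence(beta=0.4, mu=0.2), 3))   # ~0.5
print("beta/mu = 0.5 ->", round(sis_prevalence(beta=0.1, mu=0.2), 4))   # ~0.0
```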
For the majority of people the word “Internet” means access to an e-mail account and the ability to mine data through any one of the most popular public web search engines. The Internet, however, is much more than that. In simple terms, it is a physical system that can be defined as a collection of independently administered computer networks, each one of them (providers, academic and governmental institutions, private companies, etc.) having its own administration, rules, and policies. There is no central authority overseeing the growth of this network of networks, where new connection lines (links) and computers (nodes) are being added on a daily basis. Therefore, while conceived by human design, the Internet can be considered as a prominent example of a self-organized system that combines human associative capabilities and technical skills.
The exponential growth of this network has led many researchers to realize that a scientific understanding of the Internet is necessarily related to the mathematical and physical characterization of its structure. Drawing a map of the Internet's physical architecture is the natural starting point for this enterprise, and various research projects have been devoted to collecting data on Internet nodes and their physical connections. The result of this effort has been the construction of graph-like representations of large portions of the Internet. The statistical analysis of these maps has highlighted, to the surprise of many, a very complex and heterogeneous topology with statistical fluctuations extending over many length scales.
The natural framework for a correct mathematical description of complex networks is graph theory. The origins of graph theory can be traced back to Euler's pioneering solution of the Königsberg bridges problem (Euler, 1736), and the field has now reached a maturity in which a wealth of results of immediate applicability are useful for the understanding of real complex networks. In this appendix we shall provide a cursory introduction to the main definitions and concepts of graph theory that are useful for the analysis of real networks. The main sources followed are the books by Chartrand and Lesniak (1986), Bollobás (1998), and Bollobás (1985), as well as the review articles by Albert and Barabási (2002), Dorogovtsev and Mendes (2002), and Newman (2003), which cover more recent aspects.
Graphs and subgraphs
An undirected graph G is defined by a pair of sets G = (V, E), where V is a non-empty countable set of elements, called vertices or nodes, and E is a set of unordered pairs of different vertices, called edges or links. Throughout the book a vertex is referred to by its order i in the set V. The edge (i, j) joins the vertices i and j, which are said to be adjacent or connected. The total number of vertices in the graph (the cardinality of the set V) is denoted as N, the size of the graph.
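A minimal data-structure sketch of this definition (our own illustration; vertices are indexed by their order i in V, as in the text):

```python
# An undirected graph G = (V, E) stored as an adjacency list.
class UndirectedGraph:
    def __init__(self, n_vertices):
        self.N = n_vertices                              # size of the graph, |V|
        self.adj = {i: set() for i in range(n_vertices)}

    def add_edge(self, i, j):
        """Add the unordered edge (i, j); stored in both directions."""
        if i == j:
            raise ValueError("edges join pairs of *different* vertices")
        self.adj[i].add(j)
        self.adj[j].add(i)

    def adjacent(self, i, j):
        return j in self.adj[i]

# Example: a triangle (0, 1, 2) plus a pendant vertex 3 attached to 2.
G = UndirectedGraph(4)
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    G.add_edge(i, j)
print(G.N, G.adjacent(0, 1), len(G.adj[2]))   # 4, True, 3
```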
The characterization of how routers, computers, and physical links interconnect with each other in the global Internet is a very difficult task due to several key features of network development. A first one is the Internet's size and continued growth. The Internet is growing exponentially, and its size has already increased by five orders of magnitude since its birth. In other words, the Internet is a large-scale object whose global properties cannot, in general, be inferred from local ones. A second difficulty is the intrinsic heterogeneity of the Internet; that is, it is composed of networks engineered with large technical and administrative diversity. The different networking technologies are merged together by the TCP/IP architecture which, while providing connectivity, does not imply uniform behavior. Moreover, networks range from small local campuses to large transcontinental backbone providers. This difference in size is reflected in different administrative policies that make routing through the Internet a highly unpredictable and heterogeneous phenomenon. Also very important is the fact that the Internet is a self-organizing system, whose properties cannot be traced back to any blueprint or chart. It evolves and drastically changes over time according to evolutionary principles dictated by the interplay between cooperation (the network has to work efficiently) and competition (providers wish to earn money). This means that routers and links are added by competing entities according to local economic and technical constraints, leading to a very intricate physical structure that does not comply with any globally optimized plan.
While the scope of this book is to look at the Internet as a self-organizing complex system and to study its large-scale properties by using a statistical approach, a general knowledge of how the Internet works is necessary to identify the main elements forming the Internet, as well as their respective interactions. To give a physical analogy, if we want to understand the properties of a certain material we first have to know the atomic elements it is composed of, and how they interact. Similarly, it is impossible to approach the Internet without first having some hint of what a router is and how it communicates with its peers.
In this chapter we provide a brief description of the different elements, both hardware and software, at the basis of the Internet's functioning, to allow non-expert readers to familiarize themselves with the general mechanisms that make it possible to transfer data from host to host. These mechanisms will turn out to be very relevant for understanding problems related to measurement infrastructures and other communication processes taking place in the Internet. Our exploration of the Internet's workings will be necessarily non-technical and, needless to say, the expert reader can freely skip this chapter and use it later on for convenient reference.
Physical description
In providing a general picture of the Internet, the starting point is the concept of a computer network. A network is any set of computers – usually referred to as hosts – connected in such a way that each one of them can inter-operate with all the others.
The Internet is composed of thousands of different elements – both at the hardware and software level – which are naturally susceptible to errors, malfunctioning, or other kinds of failures, such as power outages, hardware problems, or software errors (Paxson, 1997; Labovitz, Ahuja, and Jahanian, 1999). Needless to say, the Internet is also subject to malicious attacks. The most common of those are denial-of-service attacks, which encompass a broad set of attacks aimed at a diversity of Internet services, such as the consumption of limited resources or the physical destruction of network components (C.E.R. Team, 2001). Given so many open chances for errors and failures, it might sometimes be surprising that the Internet functions at all.
The design of a computer network resilient to local failures (either random malfunctions or intentional attacks) was indeed one of the main motivations for the original study of distributed networks by Paul Baran (1964). Considering the worst possible scenario of an enemy attack directed towards the nodes of a nationwide computer network, Baran analyzed the “survivability” (defined as the average fraction of surviving nodes capable of communicating with any other surviving node) of the several network designs available at that time. His conclusion was that the optimal network, from the survivability point of view, was a mesh-like graph with a sufficient amount of redundancy in the paths between vertices. Even in the case of a severe enemy strike depleting a large number of components, such a network topology would ensure connectivity among the surviving computers, diverting communications along the ensemble of alternative paths.
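A rough sketch of this survivability measure under assumed conditions is given below: it removes a random fraction of nodes and reports what fraction of the survivors can still reach one another (via the largest connected component), comparing a redundant toy mesh with a centralized star. Both topologies and parameters are our own illustration, not Baran's original designs.

```python
# Survivability sketch: average fraction of surviving nodes that remain
# mutually reachable after random node removal. Topologies are toy examples.
import random
import networkx as nx

def survivability(G, fraction_removed, trials=200, seed=0):
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        H = G.copy()
        k = int(fraction_removed * H.number_of_nodes())
        H.remove_nodes_from(rng.sample(list(H.nodes()), k))
        if H.number_of_nodes() == 0:
            scores.append(0.0)
            continue
        giant = max(nx.connected_components(H), key=len)
        scores.append(len(giant) / H.number_of_nodes())
    return sum(scores) / len(scores)

mesh = nx.grid_2d_graph(10, 10)   # redundant, mesh-like topology (100 nodes)
star = nx.star_graph(99)          # centralized topology (100 nodes)
print("mesh:", round(survivability(mesh, 0.2), 2),
      "star:", round(survivability(star, 0.2), 2))
```

The redundant mesh keeps almost all surviving nodes mutually reachable, while the centralized design collapses whenever its hub happens to be among the removed nodes.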
In previous chapters we have presented an X-ray picture of the Internet's topological skeleton and the consequences that the revealed structure has on several properties of this network. There is, however, another dimension to add to the Internet picture, which refers to the quantitative characterization of the actual flow of data and the transmission capacity of the various elements forming the Internet. Physical links have a large heterogeneity in their data transmission capacity. Also, the traffic load is highly variable on different connections and at different times; that is, a full characterization of the network cannot leave aside the interplay between the topological structure, the detailed physical properties of each element, and the traffic generated by users.
The measurement and modeling of link traffic and performance in LANs or WANs has been a major area of activity in recent years. These studies pointed out the failure of the Poisson traffic model, providing empirical evidence for traffic self-similarity. The modeling of these properties finds a natural framework in the vast mathematical theory of self-similar stochastic processes, and intense research activity has focused both on understanding the origin of the scale-free nature of traffic and on its implications at the performance level. Measurements of link traffic imply the accumulation of very long time series of traffic loads and usually refer to a specific link or to very small networks.
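As a hedged illustration of one standard diagnostic, the aggregated-variance test, the sketch below applies it to a synthetic Poisson trace; a real self-similar trace would show a much slower decay of the variance (roughly m^(2H-2) with Hurst parameter H > 1/2), but the trace and bin sizes here are illustrative only.

```python
# Aggregated-variance diagnostic for traffic self-similarity on a synthetic
# Poisson trace. For Poisson-like traffic the variance of the m-aggregated
# series decays as 1/m; self-similar traces decay much more slowly.
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(lam=100, size=2**16)   # packet counts per time bin

def aggregated_variance(x, m):
    """Variance of the series averaged over non-overlapping blocks of size m."""
    n_blocks = len(x) // m
    blocks = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
    return blocks.var()

for m in (1, 4, 16, 64, 256):
    print(f"m = {m:4d}   var = {aggregated_variance(counts, m):8.3f}")
# The variance drops roughly by a factor of 4 each time m grows by 4
# (slope -1 on a log-log plot); measured Internet traces decay far more slowly.
```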
The problem of searching in complex networks is a very relevant issue, with a large number of practical applications. As the most obvious example of this problem we can consider the World Wide Web. It is an immense data depot that, however, acquires practical interest and value only if it is possible to locate the particular information one is looking for. Probably the first instance in which searching in complex networks was considered is the sociological context. In a now famous experiment, Milgram (1967) randomly selected people in a town in the Midwest and asked them to send a letter to a target person living in Boston. The rule was that letters could not be sent directly, but had to be passed from person to person, only between people known on a first-name basis, until reaching the target. The unexpected result of this experiment was that the letters that arrived at the target traveled through short chains of acquaintances, with an average length of only six intermediate hops. The conclusion that can be drawn from this result is not only that short paths exist between the vertices of a social network (the small-world property), but also that these paths can be efficiently found in order to transmit information. The very same scenario turns out to be very relevant also for technological networks such as the Internet.
In this chapter we will review the main searching strategies that can be applied in order to retrieve information from the vertices of a network, focusing especially on the Internet and the virtual networks hosted by it.
The Internet is such a rapidly evolving system that sometimes the effort put into its study might appear as exhausting and futile as Sisyphus' task. Indeed, one cannot help but ask the question: would the picture and results obtained today still be valid for the Internet of tomorrow?
The scientific community has just started to partially understand the complex properties of present Internet maps, as new and more ambitious mapping projects are being developed, and new kinds of networks are making their appearance, posing new theoretical and experimental challenges. For instance, P2P and ad hoc networks define a new class of dynamical systems for which new types of monitoring infrastructures and different modeling frameworks need to be developed.
Another rapidly changing aspect of the Internet is related to traffic load and performance. Traffic on the Internet is steadily increasing, driven by users' demands, along with the available bandwidth of communication lines. Unfortunately, we have not yet achieved a proper theoretical understanding of the interplay and mutual feedback between topological features and traffic load. This makes it very difficult to forecast whether the existing infrastructures, as well as new networks joining the Internet, will reorganize in view of this interplay, providing a different global picture of the Internet.
As if all that were not enough, new technical improvements, at the software, hardware, and protocol levels, are constantly changing many of the mechanisms that rule the fundamental processes that make the Internet work at the “microscopic” level.
In the formulation of statistical mechanics we are concerned with the theoretical prediction of the thermodynamic properties of a physical system containing a large number of particles, such as electrons, atoms, and molecules, by means of statistical averages performed over an appropriately prepared statistical sample. There are many different ways to prepare the sample. J. Willard Gibbs (1901) coined the name ensemble for the statistical sample, and three different ensembles are introduced, i.e., the microcanonical, canonical, and grand canonical ensembles.
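As a compact reminder (standard notation, not introduced in the excerpt above), the ensemble average of an observable A over a statistical sample with microstate probabilities P_s reads
\[
\langle A \rangle \;=\; \sum_{s} P_s \, A_s , \qquad \sum_{s} P_s = 1 ,
\]
and the three ensembles differ only in how the probabilities P_s are assigned.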
We can choose for the physical system any thermodynamic system, such as a single-component gas, liquid, or solid, as well as a mixture of many components, as long as the system is in a condition of thermodynamic equilibrium. In order to establish a fundamental principle of statistical mechanics, however, we naturally choose as simple a system as possible, such as a one-component dilute gas made up of structureless monatomic molecules. We then extend the formulation step by step to more complicated systems. In this chapter the formulation of the three Gibbs ensembles will be developed.
The microcanonical ensemble is a collection of identical replicas of a given physical system, which here is a gas made up of noninteracting structureless particles. First, the system is assumed to be contained in a box of volume V, the number of particles is equal to N, and the total energy lies in a narrow range between E and E + dE.
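In standard notation (not part of the excerpt above), all accessible microstates in this energy shell are taken as equally probable, and the thermodynamics then follows from Boltzmann's entropy:
\[
P_s \;=\; \frac{1}{\Omega(E, V, N)} \quad \text{for } E \le E_s \le E + dE ,
\qquad
S(E, V, N) \;=\; k_{\mathrm{B}} \ln \Omega(E, V, N) ,
\]
where \(\Omega(E, V, N)\) counts the microstates of the N-particle system in the volume V whose energy lies in the shell.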