1 Introduction
Distributed systems involve interacting processes. Usually, programmers write one programme per process and then compose those programmes in parallel. These programmes contain send and receive expressions which transmit data between processes. Predicting how the composition of programmes based on this method is challenging, so it is easy to write code that deadlocks, or gets stuck because patterns of sends and receives do not match. Session types (Honda, Reference Honda1993; Honda et al., Reference Honda, Vasconcelos and Kubo1998) can be used to describe the patterns of sends and receives in a programme, offering a foundation for static analyses aimed at preventing communication mismatches and deadlocks (Caires & Pfenning, Reference Caires and Pfenning2010; Wadler, Reference Wadler2012; DeYoung et al., Reference DeYoung, Caires, Pfenning and Toninho2012; Dardha et al., Reference Dardha, Giachino and Sangiorgi2012; Honda et al., Reference Honda, Yoshida and Carbone2016; Scalas & Yoshida, Reference Scalas and Yoshida2019). Working with session types enables the programmer to ensure the communications in their system follow compatible send/receive patterns.
Alternatively, developers can use a choreographic language to programme the interactions that they wish to take place in the system directly from a global viewpoint (Montesi, Reference Montesi2023). Choreographic programming (Montesi, Reference Montesi2013) is a programming paradigm based on this idea with particularly well-explored foundations (Cruz-Filipe & Montesi, Reference Cruz-Filipe and Montesi2020; Montesi, Reference Montesi2023) and promising developments (see, e.g., Carbone & Montesi, Reference Carbone and Montesi2013; López et al., Reference López, Nielson and Nielson2016; Dalla Preda et al., Reference Dalla Preda, Gabbrielli, Giallorenzo, Lanese and Mauro2017; Giallorenzo et al., Reference Giallorenzo, Montesi, Peressotti, Richter, Salvaneschi and Weisenburger2021; Cruz-Filipe et al., Reference Cruz-Filipe, Graversen, Lugovic, Montesi and Peressotti2022; Hirsch & Garg, Reference Hirsch and Garg2022; Jongmans & van den Bos, Reference Jongmans and van den Bos2022). In this paradigm, a programmer writes one programme as a choreography, which is then compiled to a programme for each process that is guaranteed to be correct by construction. Unlike session types, which only allow local code to be checked against them, choreographies compile to the local code itself. The syntax of choreographic programming languages is typically inspired by security protocol notation (Needham & Schroeder, Reference Needham and Schroeder1978), where send and receive commands are written together as part of atomic instructions for expressing communication. This has two key advantages. First, it gives programmers the power to express the desired communication flow among processes, but without the burden of manually coding send and receive actions. Second, it ensures that there is no mismatch which can cause deadlock, a property that has become known as deadlock-freedom by design (Carbone & Montesi, Reference Carbone and Montesi2013).
To see the power of this, consider the (in)famous bookseller example—a recurring example in the literature of choreographic programming and session types (Carbone & Montesi, Reference Carbone and Montesi2013; Honda et al., Reference Honda, Yoshida and Carbone2016;Montesi, Reference Montesi2023). wants to buy a book from . To this end, sends the title of the book—say, “The Importance of Being Earnest”—to , who then sends back the price. then can compare the price with its budget and based on the result informs that they want to buy the book if it is within their budget or informs them that they do not want to buy the book otherwise. We can describe this via the following choreography:
In Listing (1.1), as in all choreographic programmes, computation takes place among multiple processes communicating via message passing. Values are located at processes; for example, in the first line of the choreography, the title of the book is initially located at . The function communicates a value from the process to the process . It takes a local value at and returns a local value at . Footnote 1 Thus, x represents the string “The Importance of Being Earnest” at the process , while y represents the price at the process . Finally, we check locally if the book’s price is in ’s budget. Either way, we use function to send a label from to representing ’s choice to either proceed with the purchase or not. Either way, the choreography returns the dummy value () at .
While most of the early work on choreographies focused on simple lower-order imperative programming like in the example above, recent work has shown how to develop higher-order choreographic programming languages. These languages allow a programmer to write deadlock-free code using the usual abstractions of higher-order programming, such as objects (Giallorenzo et al., Reference Giallorenzo, Montesi and Peressotti2023) and higher-order functions (Hirsch & Garg, Reference Hirsch and Garg2022; Cruz-Filipe et al., Reference Cruz-Filipe, Graversen, Lugovic, Montesi and Peressotti2022).
For instance, Listing (1.1) bakes in the title and the value of the book. However, we may want to use this code whenever wants to buy any book, and let use any local function to decide whether to buy the book at a price.
Note the type of the function buyAtPrice?: it takes as input not just an integer, but an integer at ; similarly, it returns a Boolean at . Moreover, the arrow is annotated with a set of processes, which in this case is empty (). Other than those processes named in the input and output types of the function, these are the only processes who may participate in the computation of that function. Since that set is empty here, no other process may participate in the function—i.e., buyAtPrice? is local to . (Sometimes we wish for other processes to participate in the computation of a function, as we will see in Example 3.)
However, not every function with an annotation is local. For instance, is a function compatible with type for any type . Despite the fact that is clearly not local, only and are involved in the communication, leading to the annotation. Similarly, just because the input and output of a function are at different locations does not mean that the function involves communication. For instance, it might be a constant function: the choreography has the same type of a communication of an integer from .
A programmer using a higher-order choreographic language, like a programmer using any higher-order programming language, can write a programme once and use it in a large number of situations. For instance, by supplying different values of title and buyAtPrice?, the choreography in Listing (1.2) can be used to buy several different titles and can determine if they are willing to buy the book at the price using any method they desire.
While the move from first-order programming to higher-order programming is significant, previous work on the theoretical foundations of higher-order choreographic programming still did not account for other forms of abstraction (Hirsch & Garg, Reference Hirsch and Garg2022; Cruz-Filipe et al., Reference Cruz-Filipe, Graversen, Lugovic, Montesi and Peressotti2022). In particular, they did not allow for polymorphism, where programmes can abstract over types as well as data, allowing them to operate in many more settings; nor did they allow for delegation, where one process can ask another process to act in its stead.
These forms of abstraction are relatively standard: delegation is an important operation in concurrent calculi, and polymorphism is vital to modern programming. In choreographic programming, however, another form of abstraction becomes natural: abstraction over processes. Current higher-order choreographic languages require that code mention concrete process names. However, we often want to write more-generic code, allowing the same code to run on many processes. For example, Listing (1.2) allows to decide whether to buy a book from using any local function buyAtPrice?. It would be more natural to write as a book-selling service which different clients could interact with in the same way to buy a book.
In this paper, we tackle three new features for choreographic languages. Firstly, we show that abstraction over processes is a type of polymorphism, which we refer to as process polymorphism. Secondly, we extend Chor $\lambda$ —a simply-typed functional choreographic language—with polymorphism, including process polymorphism, and call this new language PolyChor $\lambda$ . Thirdly, we add the ability to communicate distributed values such as functions. This gives us the ability to delegate (i.e., to send code to another process, which that process is then expected to run), giving a clean language to study all three forms of abstraction.
Let us examine the bookseller service in our extended language:
This programme allows a process named B to connect with to buy a book. B then provides a string title and a decision function $\textsf{buyAtPrice?}$ . Thus, we no longer have to write a separate function for every process which may want to buy a book from .
While this addition may appear simple, it poses some unique theoretical challenges. First, the goal of a choreographic language is to compile a global programme to one local programme per process. However, since B does not represent any particular process, it is unclear how to compile the polymorphic code above. We solve this problem via a simple principle: each process knows its identity. With this principle in place, we can compile the code to a conditional in each process: one option to run if they take the role of B, and the other to run if they do not.
Notably, each process chooses dynamically which interpretation of the code to run. This flexibility is important, since we may want to allow different processes to occupy B’s place dynamically. For instance, we can imagine a situation where work together to buy a particularly expensive book: perhaps they compare bank accounts, and whoever has more money buys the book for them to share. This can be achieved in our system with Listing (1.4), where seller_service is the name of the choreography from Listing (1.3):
Here sends its bank balance, $\textsf{bank_balance}_1$ to , who compares the received value with its own balance, $\textsf{bank\_balance}_2$ . If has the larger balance, then it informs will be buying the book by means of the label “Me.” then sends the book title to , which allows to initiate the seller_service choreography using a $\textsf{buyAtPrice?}$ function that checks whether the price is less than ’s bank balance. If has the larger balance, then again informs of who will be performing the role of buyer for the rest of the protocol, “You” and “Them” respectively. Then, enters the seller_service choreography with similar input to the first case, except the title and $\textsf{buyAtPrice?}$ are now located at .
A related challenge shows up in the operational semantics of our extended language. Languages like PolyChor $\lambda$ generally have operational semantics which match the semantics of the compiled code by allowing out-of-order execution: redices in different processes might be reduced in any order. However, care must be taken with process polymorphism, since it may not be clear whether two redices are in the same or different processes.
In addition to type and process polymorphism, PolyChor $\lambda$ is the first choreographic language to allow the communication of distributed values: values not located entirely at the sender. These values include full choreographies described by distributed functions, which can be used to model delegation. To see how process polymorphism and communication of distributed values enables delegation, consider Figure 1. Here, when a buyer asks for a book, the seller first checks whether it is in stock. If it is, the sale continues as normal. If not, the seller delegates to a second seller, which may sell the book to the buyer.
In more detail, after ascertaining that the book is not in stock, informs B and that the rest of the choreography will be executed by in the place of using two selections with label “Delegate.” Then, sends first the rest of the choreography to , followed the title of the requested book. uses its own lookup function to execute the code in Listing (1.2). Both and B need to be informed that the delegation is happening, since B needs to know that it should interact with rather than .
In general, delegation poses a challenge: the third-party processes involved in a communicated value (processes that are neither the sender nor the receiver, such as B above) might need to change who they are going to interact with by swapping names (for instance, swapping above). As we will see, this challenge is relevant for both the type system and projection operation of PolyChor $\lambda$ . For typing, the combination of process polymorphism and distributed value communication can make it difficult to statically determine where data are located. For projection, we need to ensure that the third-party processes involved in a communicated value perform the required changes to process names in the right places during execution.
Structure of the Paper.
We begin in Section 2 by examining the system model of PolyChor $\lambda$ . We then proceed with the following contributions:
-
In Section 3, we describe the PolyChor $\lambda$ language in detail. This language includes both type polymorphism and process polymorphism. We develop both a type system and kind system and an operational semantics for PolyChor $\lambda$ .
-
In Section 4, we describe the local network language used to describe the distributed implementation. We also detail how to obtain this implementation via endpoint projection, which compiles PolyChor $\lambda$ programmes to a programme for each process.
-
In Section 5, we describe the main theorem of this paper, the correctness of endpoint projection with respect to our operational semantics. Because of the dynamic nature of process polymorphism, this requires significant reasoning compared to previous works on choreographies.
-
In Section 6, we demonstrate how our theory can be used to model an extended example where an edge computer can delegate tasks to an external server.
Finally, we discuss related work in Section 7 and conclude in Section 8.
2 System model
We begin by discussing the assumptions we make about how PolyChor $\lambda$ programmes will be run. These assumptions are as light as possible, allowing for PolyChor $\lambda$ to be run in many different scenarios. In particular, we assume that we have a fixed set of processes, which can communicate via messages. These processes can each be described by a polymorphic $\lambda$ -calculus, similar to System F $\omega$ , but with the addition of communication primitives.
2.1 Processes
We assume that there is a fixed set of process names et cetera. These processes can represent nodes in a distributed system, system processes, threads, or more. Process polymorphism allows us to refer to processes using type variables, which may go in or out of scope. Despite this, the set of physically-running processes remains the same.
We assume every process knows its identity. Thus, every process can choose what code to run on the basis of its identity. This assumption is reasonable for many practical settings, for instance it is common for nodes in distributed systems to know their identity. This capability is essential to our strategy for enabling process polymorphism.
2.2 Communication
We assume that processes communicate via synchronous message passing. Thus, if sends a message to , then does not continue until has received the message. Moreover, we assume that message passing is instantaneous and certain, so messages do not get lost.
Processes can receive two kinds of messages: values of local programmes (described below) and labels describing choices made during a computation. These are used to ensure that different processes stay in lock-step with each other.
2.3 Local programmes
We assume that processes run a local language described in Section 4. This is a functional language extended with communication features, similar to the language GV (Gay & Vasconcelos, Reference Gay and Vasconcelos2010; Wadler, Reference Wadler2012; Lindley & Morris, Reference Lindley and Morris2015). Even more related to our work is FST (System F with Session Types) Lindley & Morris (Reference Lindley, Morris, Gay and Ravara2017), an extension of GV with polymorphism. As it does not have our communication of distributed values, they can base their types on System F rather that System F $\omega$ .
Endpoint projection translates PolyChor $\lambda$ into this “Network Process” language. We have thus further extended GV with features required for our endpoint projection mechanism. For instance, in the local language described in Section 4 we provide an expression form, which allows a process to choose which code to run based on its identity. Despite these extensions, the language should feel familiar to any reader familiar with polymorphic $\lambda$ -calculi.
3 The polymorphic Chor $\lambda$ language
We now turn to our first major contribution: the design of the polymorphic, choreographic $\lambda$ -calculus, and PolyChor $\lambda$ . This calculus extends the choreographic $\lambda$ -calculus Chor $\lambda$ of Cruz-Filipe et al. (Reference Cruz-Filipe, Graversen, Lugovic, Montesi and Peressotti2022) with both type and, more importantly, process polymorphism. We begin by describing the features that PolyChor $\lambda$ shares with the base Chor $\lambda$ before describing the new features. The syntax of PolyChor $\lambda$ can be found in Figure 2.
Syntax Inherited from Chor $\lambda$ . Since choreographic programmes describe the behaviour of an entire communicating network of processes, we need to reason about where terms are located. In other words, we need to know which processes store the data denoted by a term. Terms of base type, like integers, are stored by exactly one process. This is represented in our type system by matching base types with a process name. For example, integers stored by the process are represented by the type . Values of this type also mark the process which stores them, so a value (read “the integer 5 at ”) has type . In Figure 2, the only base types are , but it is easy to extend the language with other base types, such as the types used in the introduction. We will continue to freely use other base types in our examples.
While base types are located on just one process, data of more-complex types may involve multiple processes. For instance, the term involves both data stored by . This is still recorded in the type: the term above has type . In addition to base types and product types, PolyChor $\lambda$ also has sum types (written ), along with their normal introduction and elimination forms. Note that products and coproducts in PolyChor $\lambda$ may not represent a product or coproduct at the local level, since each component may be at a different process. For instance, we can represent distributed Booleans as . Matching on a value with this type will cause both to make the same choice.
Functions are treated more unusually: while we have standard and application forms, we also allow functions to be defined mutually recursively with each other. In order to do so, any PolyChor $\lambda$ choreography is associated with a list, D, of bindings of functions to function variables f, which are also expressions. A function variable can then during execution be instantiated with its definition according to this list. As we will see in Section 3.3, PolyChor $\lambda$ terms are evaluated in a context which associates each function variable with a term. Note that, while in the original Chor $\lambda$ types were mutually recursive in a similar way, in PolyChor $\lambda$ we do not support recursive types. To see why, note that we syntactically restrict many types to type values. This prevents us having to reason about processes denoted by arbitrary terms—e.g., we cannot send to the “process” but we can write which, due to our call-by-value semantics, will force the type to reduce to before Y gets instantiated. As we will see in Section 4, allowing communication between arbitrary types would make endpoint projection difficult. However, since recursive types cannot necessarily reduce to a type value, they cannot be used in many parts of the type system.
Function types are also more specific than their usual construction in $\lambda$ -calculus: they are written . Here, is a set of process names and type variables denoting additional participants in the function which do not have either the input or output. Thus, if wants to communicate an integer to directly (without intermediaries), then she should use a function of type . However, if she is willing to use the process as an intermediary, then she should use a function of type . We will use when projecting to determine that the function in question and any uses thereof must be part of the local code of .
In order to allow values to be communicated between processes, we provide the primitive communication function . This function takes a value of type and returns the corresponding value at . As mentioned in the introduction, most choreographic languages provide a communication term modelled after the “Alice-and-Bob” notation of cryptographic protocols. For instance, might represent sending 5 to . This is easily recovered by applying the function . For example, the term represents sending a message containing 5 to : it evaluates to and has type .
Finally, consider the following, where M has type :
Clearly, needs to know which branch is taken, since he needs to store a different return value in each branch. However, only knows which whether M evaluates to (here are used to denote that a value is either the right or left part of a sum and annotated with the type of the other part of the sum to ensure type principality). Thus, this choreography cannot correspond to any network programme. Using the terminology found in the literature of choreographic languages, we might say that the choreography is unrealisable because there is insufficient knowledge of choice (Castagna et al., Reference Castagna, Dezani-Ciancaglini and Padovani2012; Montesi, Reference Montesi2023).
In order to enable programmes where a process’s behaviour differs depending on other processes data, such as how behaved differently depending on ’s data, we provide terms. These allow one process to tell another which branch has been taken, preventing knowledge from “appearing out of nowhere.” For instance, we can extend the programme above to:
This represents the same programme as above, except whether the left or the right branch has been taken. Unlike the previous version of this example, it does represent a (deadlock-free) network programme. In general, we allow arbitrary labels to be sent by terms, so semantically-meaningful labels can be chosen.
While and both transfer information between two processes, they differ in what information they transfer. moves a value, e.g., as an integer or a function, from the sender to the receiver. on the other hand uses a label to inform the receiver of a choice made by the sender. Some choreographic languages combine the two, so both a label and a value is communicated at the same time, but like most choreographic languages PolyChor $\lambda$ keeps the two separate.
Syntax Additions over Chor $\lambda$ .
In order to achieve (both type and process) polymorphism in PolyChor $\lambda$ , we add several features based on System F $\omega$ (Girard, Reference Girard1972). In particular, we add kinds and universal types along with type abstraction and application. From System F $\omega,$ we inherit the kind , which is the kind of types. We additionally inherit the kind which represents functions from types to types.
Moreover, we inherit type-level functions from System F $\omega$ . These represent the definition of type constructors. We also have type-level function application . Since types contain computation, we also define type values, which are simply types without application.
We use type-level functions for two primary purposes. First, we can use it to denote types which depend on process names, such as . Second, we use type-level functions to type communications, as we will see in Section 3.1.
Note that the base types , like local values, are syntactically restricted to only allow type values as subterms. This allows us to use a type variable to compute the location of a value dynamically, but not arbitrary terms, which would make it much harder to tell at time of projection where the value is located. Thus, we can write to compute the location of an integer dynamically ( has to reduce to a type value before X can be instantiated), but we cannot write directly. This way, our projected calculus can tell when instantiating X (at runtime) whether it gets instantiated as . It would be more complicated to create runtime checks for whether Y gets instantiated as a function type that outputs or not.
In addition to the kinds of System F $\omega$ , we also have the kind of process names. Thus, process names are types, but they cannot be used to type any terms.
Additionally, we have Without kinds , which represents types of kind which do not mention any of the processes in the set . We also refer to this kind as having a restriction of the processes in . Since we restrict the types that can be communicated based on which processes they contain, as we will see soon, the Without kind can be used to define polymorphic functions which contain communication. For instance, the term
defines a function which, given distinct processes X and Y, causes X to send 5 to Y. As we will see in Section 3.2, restricting the processes involved in a type (and therefore the term being typed) is essential for typing communications. In particular, we need to ensure that a sender never tries to send something located at the receiver. Moreover, we need to ensure that every part of the communicated value located at the sender actually gets moved to the receiver, even if its location is an uninstantiated type variable.
In the rest of this section, we explore the semantics of PolyChor $\lambda$ . First, we look at its static semantics, both in the form of typing and kinding. Second, we describe its operational semantics. Throughout, we will continue to give intuitions based on the concurrent interpretation of PolyChor $\lambda$ , though the semantics we give here does not correspond directly to that interpretation.
3.1 Typing
We now turn to the type system for PolyChor $\lambda$ . As before, our type system builds on that for Chor $\lambda$ . Here, we focus on the rules that are new in this work. Thus, we focus on rules related to polymorphism, and those that have had to change due to polymorphism.
Typing judgements for PolyChor $\lambda$ have the form , where $\Theta$ is the set of process names—either names in or type variables with kind —used in M or the type of M. The typing environment $\Gamma$ is a list associating variables and function names to their types and type variables and process names to their kinds. We sometimes refer to the pair $\Theta;\Gamma$ as a typing context.
Selected rules for our type system can be found in Figure 3. The full collection of rules are given in Appendix A. Again, many of the rules are inherited directly from Chor $\lambda$ (Cruz-Filipe et al., Reference Cruz-Filipe, Graversen, Lugovic, Montesi and Peressotti2022); we thus focus on the rules that have changed due to our additions. Many, if not most, of these rules are inspired by System F $\omega$ . However, the addition of the kind of processes and Without kinds—i.e., kinds of the form —also lead to some changes.
The rules give types to values of base types. Here, we have to ensure that the location of the term is a process. Intuitively, then, we want the location to have kind . However, it might be a Without kind—that is, it might be of the form . In this case, our subkinding system (which you can find details about in Section 3.2) still allows us to apply the rule.
We express function application and abstraction via the rules, respectively. The application rule is largely standard—the only addition is the addition of a set on the function type, as discussed earlier. The abstraction rule , on the other hand, is more complicated. First, it ensures that the argument type, , has kind . Then, it ensures that every element in the set decorating the arrow is a process name—i.e., that it has kind . Finally, it checks that, in an extended environment, the body of the function has the output type . As is usual, this extended environment gives a type to the argument. However, it restricts the available process names to those in the set and those mentioned in the types .
There are two ways that a type can mention a process: it can either name it directly, or it can name it via a type variable. Thus, in the rule we allow the free variables of to remain in the process context, computing them using the (standard) free-type-variable function where both bind X. However, we must also identify the involved processes in a type, which we write and compute as follows:
The involved processes of other types are defined homomorphically.
The communication primitives are typed with , respectively. A term behaves as M, where the process informs the process that the $\ell$ branch has been taken, as we saw earlier. Thus, the entire term has type if M does. Moreover, must be processes.
The rule types terms. So far we have been simplifying the type used in for readability. We have been using to denote the input type, but as it turns out to type correctly, we have to complicate things a little. Intuitively, a term represents communicating the parts of M on . Thus, we require that be a type transformer requiring a process. Moreover, cannot be mentioned in otherwise, not every part of the type of M on in our example above would transfer to . For this, we use the following notion of mentioned processes:
Again, with other types being defined homomorphically. The difference between involved and mentioned processes is subtle. If there is no polymorphism, they are the same, but when dealing with polymorphism with restriction they are opposites: involved processes includes every process not in the restriction (the variable could be instantiated as something involving those processes and thus they may be involved), while mentioned names includes the processes mentioned in the restriction. Mentioned names is used only when typing . If we have such a type-level function, , and two type values which are not and will not be instantiated to anything mentioned in then we can type as a function from . Since this is direct communication, no intermediaries are necessary and we can associate this arrow with the empty set .
It is worth noting at this point that the communication rule inspired our use of System F $\omega$ rather than plain System F, which lacks type-level computation. In Chor $\lambda$ and other previous choreographic languages, communicated values must be local to the sender. In PolyChor $\lambda$ , this would mean not allowing the communicated type to include type variables or processes other than the sender. Since we are introducing the idea of using communication as a means of delegation, we have slackened that restriction. This means that PolyChor $\lambda$ programmes can communicate larger choreographies whose type may involve other processes, and importantly other type variables. We see this in the delegation example Listing (1.5), where we have the communication . Adding in the required type annotation (which we had suppressed in the introduction), this becomes . Note that this still leaves us with a free type variable B, representing the unknown process that is telling to interact with! Since we cannot ban free type variables in communicated types, we must create a typing system that can handle them, and this requires type-level computation.
To see why this led us to type-level computation, consider the alternative. In Chor $\lambda$ and other choreographic works, we would have a type communication using process substitution instead of communication. The annotated programme would then be . When applied to a programme of appropriate type, the result would have type
Note that, because B is a type variable, it was ignored by the substitution. If B is later instantiated as , then we must substitute B with in the output type. Thus, we need some mechanism to delay this substitution; rather than use a mechanism like explicit substitutions, we instead reached for the standard tool of System F $\omega$ . The communication winds up instead being written as with X being instantiated as in the input type and in the output type. This seemed more elegant and less ad-hoc; moreover, it adds features which a real-world implementation of PolyChor $\lambda$ would want anyway. To ensure that B does not get instantiated incorrectly, we use our Without kinds. Rule requires that both are restricted on B, which, thanks to our restrictions being symmetric, means that B cannot be instantiated as either of them. The Without kinds here prevent nonsensical typings of where, in the type, part of the output does not get moved from the sender to the receiver. This can happen if a type variable present in the type of the communicated value is instantiated during execution before the communication takes place, but has not yet been instantiated when we type the choreography. Where it not for the restrictions imposed by Without kinds, we would allow the choreography
to be typed as , which implies that part of the function is still at after the communication is executed. This is not what will happen when actually executing the choreography, so the type is wrong. The Without kinds ensure that the choreography cannot be typed: the kind of B must be , and therefore, it cannot be instantiated as .
Returning now to the typing rules of Figure 3, we next have the rules, which type universal quantification. The rule is completely standard, while the others are 4 cases of what to do with a type abstraction. Each of these rules have a different definition for the typing context of M, depending on the kind of X. As is standard, we check if the body of the function has the right type when the parameter X has kind . But first, if X is a process as in , then we need to extend $\Theta$ with X. In addition, we must further manipulate the context in order to ensure that the types whose kinds are restricted on X correspond to the restriction on the kind of X.
First, the new type variable X may shadow a previously defined X. Thus, we need to remove X from any Without kinds already in the context. We do this using the following operation :
We define $+$ on other kinds homomorphically and extend this to contexts as usual:
Furthermore, in if X itself has a Without kind—that is, X’s kind tells us it cannot be any of the processes in —then we need to symmetrically add a restriction on X to every type in . Otherwise, we would not be able to use the roles in in any place where we cannot use X, even though we know X will not be instantiated with them. We do this with the operation , which we define as follows:
With these operations in place, we can now fully understand how to type the type abstractions. When is actually a Without kind, then we must handle both shadowing and symmetrical restrictions. However, when it is not a Without kind, we must only handle shadowing. We show an example where every possible complication
Example 1 (Typing complex type abstractions). Consider the following choreography, which takes a process A and sends an integer communication with A from :
That A has a Without kind and the fact that A is a process means that we will need to use Rule when typing M. In order to illustrate the necessity of shadowing, we will include an unnecessary process in our environment. Setting , we start with the following judgement:
We need to take into account both that A is a process and that it has a Without kind in order to make the choreography typeable. First, we shadow, obtaining the following:
so we get rid of any restrictions on previous variables called A. We then add the new symmetric restrictions necessary for typing the communication, as follows:
Continuing on, we can abbreviate . Finally, we add A to the environment and $\Theta$ (writing $\Theta' = \Theta \cup \{A\}$ ), giving:
where . Because of the restrictions in Rule , N would not be typable if we had not made sure to add the symmetric restrictions. We will furthermore see in Section 3.2 that adding A to the set process names is also necessary when kinding it with the kind.
On the other hand, although the rule looks bigger at first glance, it is much simpler to use Rule .
Example 2 (Typing simple type abstractions). Consider the following type abstraction, which takes a type A and applies a variable of that type to a function which also returns something of the same type:
We can type this as
Since we have no shadowing, the only way we have to manipulate our environment when entering the type abstraction is to add to the environment, giving us
Rules are for cases of middling complexity. In Rule , we have to add the type variable to $\Theta$ , as in . However, since we have no restrictions, we do not need to consider symmetric conflict. In Rule , we do consider symmetric conflicts, but do not add to $\Theta$ (since we are not dealing with a process).
The final addition to our type system is the rule . This is another standard rule from System F $\omega$ ; it tells us that we are allowed to compute in types. More specifically, it tells us that we can replace a type with an equivalent type, using the following equivalence:
In addition to the rules in Figure 3 for typing choreographies, our type system needs one more rule for typing the definitions of our recursive functions. We also add an extra judgement of the form $\Theta; \Gamma \vdash D$ where $\Theta; \Gamma$ is a typing context as before, and D is a set of definitions for function variables—i.e., $D = \{f_1 = M_1, \ldots f_n = M_n\}$ . We write D(f) for the term associated with f in D. The only rule for this judgement is , which says that a set of definitions is well-formed if every variable in D is associated with a type in $\Gamma$ , and the body of f in D can be given be given type in the context $\emptyset; \Gamma$ . We require that the body of f can be typed with an empty set of roles because they are global predefined functions, and as such they should not be local to any one process.
3.2 Kinding
We finish our discussion of the static semantics of PolyChor $\lambda$ by looking at our kinding system. Our kinding system uses only one judgement, , which says that in the typing context $\Theta; \Gamma$ , the type has kind . You can find the rules of our kinding system in Figure 4. These are mostly directly inherited from System F $\omega$ . However, we must account for and Without kinds.
For instance, the rules check that the type representing which process is storing the data indeed has the kind . Similarly, ensures that all of the types in the set of possible intermediaries are processes. The rule for type variables, , ensures that if a type variable X is assigned kind , then X must also be in $\Theta$ .
One of the biggest differences between our kinding system and that of System F $\omega$ , however, is the rule which tells us that our system enjoys subkinding. The subkinding rules come from the subset ordering on Without kinds. We also consider any kind equivalent to the same kind restricted on the empty set due to and .
The rules for subkinding are as follows:
Lemma 1. Let be a type. If there exists a typing context $\Theta;\Gamma$ such that then there exists a unique type value such that .
Proof The existence of follows from induction on and its uniqueness from induction on .
Lemma 2 (Type restriction). Let be a type. If there exists a typing context $\Theta;\Gamma$ such that .
Proof Follows from kinding rules.
Theorem 1 (Kindable types). Let M be a choreography and be a type such that . Then .
Proof Follows from induction on the derivation of and the kinding rules.
We also find that types have the same kinds as their equivalent type values. Due to $\beta$ -expansion, a kindable type can be equivalent to an unkindable type, but not an unkindable type value.
Theorem 2 (Kind Preservation). Let be a type. If there exists a typing context $\Theta;\Gamma$ such that for any type value such that .
Proof Follows from the kinding and type equivalence rules. The only way that a kindable type can be equivalent to a type which is not kindable is when we have types and such that . In that case, if we use the rule to create an unkindable with an extra application. However, this unkindable type is not a type value, and in fact, we must also use the same rule to remove this new type application before we get to a type value.
Example 3. We return to the delegation example (Listing (1.5)) and try to type it. As B appears free in the type of a value, F, being communicated between , B must actually have the Without kind . The choreography therefore gets the type
This type shows both the input, output, and involved roles of the choreography.
3.3 Operational semantics
Finally, we consider the operational semantics of PolyChor $\lambda$ . In practice, the semantics of a choreographic language can be used to simulate a choreography and check if it specifies the expected collective behaviour. Its key role, however, is to prove properties about the projected local code. Specifically, we are going to prove that the projected code is compliant to the choreography (an operational correspondence result) and that as a result it is deadlock-free. The semantics of PolyChor $\lambda$ are mostly a standard call-by-value reduction semantics for a typed $\lambda$ calculus. However, the reduction semantics must also carry a set D of function definitions. Only a few rules are unusual or must be modified; those can be found in Figure 5. You can find the rest of the rules in Appendix B.
The rules come from System F $\omega$ . The rule is similar to ordinary CBV $\beta$ reduction, but tells us how to reduce a type abstraction applied to a type value, but with the caveat that if we do not have a type value we must use type equivalence to get one before reducing. The rule tells us that we can reduce a type function applied to any argument.
The rule allows us to reduce function names by looking up their definition in the set D.
Finally, we have the rules for communication. The rule says that acts as a no-op, as we stated earlier. While this may seem redundant, such terms are vital for projection, as we will see in the next section. More importantly, the rule tells us how we represent communication at the choreography level: via substitution of roles. This also helps explain some of the restrictions in . Since we replace all mentions of with in V, we cannot allow other mentions of in the type transformer of V. Otherwise, there could be some mentions of P which should not be replaced during communication, which we do not model. Unlike when typing , when executing a communication we know (since we only consider choreographies without free variables) that any type variables in or V have already been instantiated and as such do we do not need to consider how to substitute variables which may later be instantiated to .
It may be surprising to learn that our semantics are simply call-by-value reduction semantics, especially for those readers familiar with choreographies. After all, choreographies are supposed to represent concurrent programmes, and so multiple redices should be available at any time. Indeed, previous works on choreographic programming (e.g., Carbone & Montesi, Reference Carbone and Montesi2013; Cruz-Filipe & Montesi, Reference Cruz-Filipe and Montesi2020; Hirsch & Garg, Reference Hirsch and Garg2022) provided a semantics with out-of-order execution, so that the operational semantics of the choreographies matched with all possible reductions in the concurrent interpretation. We use these simpler semantics, without out-of-order execution, instead. In exchange, our result in Section 5 will be weaker: we only promise that any value which the choreography can reduce to, so can the concurrent interpretation.
To see why we chose to obtain this weaker result, consider the choreography
Here we have a function f which needs to be instantiated with a distributed pair. is ready to feed its part of the argument into f and start computing the result, while and are still working on computing their part of the argument. There are two ways we could interpret PolyChor $\lambda$ concurrently: we can synchronise when all processes enter a function or we can allow to enter the function early. We take the second, more practical, route. However, this means it is not possible to reflect at least one evaluation order into the semantics of the choreography without banning distributed values or allowing us to somehow call a single value in multiple steps. This insight led to us adopting the weaker guarantee discussed above.
As is standard for call-by-value $\lambda$ -calculi, we are able to show that our type system is sound with respect to our operational semantics, as expressed in the following two theorems:
Theorem 3 (Type Preservation). Let M be a choreography and D a function mapping containing every function in M. If there exists a typing context $\Theta;\Gamma$ such that and $\Theta;\Gamma\vdash D$ , then for any M’ such that $M\rightarrow_{D} M'$ .
Proof Follows from the typing and semantic rules and Theorem 2.
Theorem 4 (Progress). Let M be a closed choreography and D a function mapping containing every function in M. If there exists a typing context $\Theta;\Gamma$ such that and $\Theta;\Gamma\vdash D$ , then either $M=V$ or there exists M’ such that $M\rightarrow_{D} M'$ .
Proof Follows from the typing and semantic rules.
4 Endpoint projection
We now proceed to the most important result for any choreographic programming language: endpoint projection. Endpoint projection gives a concurrent interpretation to our language PolyChor $\lambda$ by translating it to a parallel composition of programmes, one for each process. In order to define endpoint projection, though, we must define our process language, which we refer to as a local language. The syntax of the local language can be found in Figure 6. There you can also find the syntax of local transition labels and network transition labels, both of which will be described when we describe the operational semantics of networks.
As in PolyChor $\lambda$ , our local language inherits much of its structure from System F $\omega$ . In particular, we have products, sums, functions, universal quantification, and types, along with their corresponding terms. In fact, some types look more like standard System F $\omega$ than PolyChor $\lambda$ : function types do not need a set of processes which may participate in the function, and base types no longer need a location.
However, not everything is familiar; we have introduced new terms and new types. The terms allow terms to send and receive values, respectively. We also split terms into two terms: an offer term which allows to choose how this term will evolve. We represent such choices using choice terms of the form . This term informs the process represented by that it should reduce to its subterm labelled by $\ell$ , and then itself reduces to the term L. While these are unusual pieces of a polymorphic language like System F $\omega$ , they are familiar from process languages like $\pi$ calculus. We also add undefined types and terms, written , respectively. These represent terms which are ill-defined; we use them to represent data which does not exist on some process , but which needs to be written structurally in ’s programme. For instance, is the result of sending a value without process polymorphism. We also use it as the input of are functions which require an input. More generally, if a process participates in a function but the input and/or output is located elsewhere, we will use to represent that input and/or output. The type is only used for the term .
We also include a more unusual feature: explicit substitutions of processes. The term is a function which, when applied, replaces the role denoted by with that denoted by in its argument. This function allows us to represent the view of communication according to third parties: the roles simply change, without any mechanism necessary. For instance, imagine that wants to tell to communicate an integer to . She can do this by sending the function . In PolyChor $\lambda$ , this corresponds to the choreography
In order to project this choreography, we need to be able to project the communication function above even when it is not applied to any arguments. This is where we use explicit substitutions: we project the communication function to .
Finally, we introduce our unique feature: terms and their corresponding type. These represent the ability of a process to know its own identity and to take actions based on that knowledge. Process polymorphism requires an instantiation of a process variable at process to be accompanied by a conditional determining whether the variable has been instantiated as or as some other process may interact with. In particular, the term reduces to $B_1$ if the term is run by the process denoted by , and $B_2$ otherwise. Since $B_1$ and $B_2$ may have different types, we provide types of the form , which represent either the type (if typing a term on the process denoted by (otherwise). These terms form a backbone of endpoint projection for PolyChor $\lambda$ : every term binding a process gets translated to include an term. For instance, consider projecting the choreography
to some process . Depending on the argument to which this function is applied, should behave very differently: if it is applied to itself, it should receive something from . However, if it is applied to any other term, it should do nothing. We therefore project the choreography above to the following programme for :
Note that the construct is necessary for process polymorphism in general, unless process variables cannot be instantiated to the process they are located at. It, and the combinatorial explosion caused by having multiple process abstractions, is not caused by the choreographic language but instead the choreographic language hides it and lets programmers avoid explicitly describing both sides of the separately.
Note that we do not have a kinding system for local programmes. In fact, we do not check the types of local programmes at all. However, because types have computational content, we need to project them as well. In order to preserve that computational content, we again use an equivalence of types which corresponds to $\beta,\eta$ -equivalence. However, in order to accommodate types, we must index that equivalence with a process. Then, we have two rules regarding types:
We use these equivalence rules with process annotation to ensure that processes only use equivalences indexed with their own name and do not pick the wrong branch of an type. This way we project the type which is equivalent to everywhere else.
Now that we have seen the syntax of the programmes which run on each process, we can look at whole networks:
Definition 1. A network $\mathcal{N}$ is a finite map from a set of processes to local programmes. We often write for the network where process has behaviour $L_i$ .
The parallel composition of two networks $\mathcal{N}$ and $\mathcal{N}'$ with disjoint domains, $\mathcal{N} \mid \mathcal{N}'$ , simply assigns to each process its behaviour in the network defining it. Any network is equivalent to a parallel composition of networks with singleton domain, as suggested by the syntax above.
We now consider the operational semantics of local programmes and networks. These are given via labelled-transition systems; the syntax of both sorts of label can be found in Figure 6. The network transitions are labelled with where is the set of involved processes (either one for a local action or two for a synchronisation). The local transitions have more options for labels. The label $\tau$ denotes a normal local computation. We use the process name as a label for an action which can only take place at . The label denotes sending the value L to , leaving L’ after the send—we will explain what a label left behind after the send does when we discuss the semantics of local communication in detail. The label is the dual: it denotes receiving L’ from , with L being the value the receiver had before receiving. Again, we explain the semantics of receiving in detail later. Finally, the label denotes sending a label $\ell$ to , while the label denotes receiving the label $\ell$ from .
Selected rules for both operational semantics can be found in Figures 7 and 8. As before, transitions are indexed by a set of function definitions. Function variables reduce by looking up their definition in . Since this transition involves no communication, it is labelled with the empty transition, $\tau$ .
Perhaps surprisingly, undefined arguments to functions do not immediately cause the application to be undefined. To see why, think about choreographies of the form where some process is involved in both M and N. We project this to an application on Q of the form . Note that because we know that N has type , the projection has type and eventually evaluates to . Thus, if immediately evaluated to , the process could not participate in M, as they need to do! We therefore allow this to evaluate to . However, when the function is also undefined, we evaluate this to with the empty label $\tau$ , as you can see in the rules
As mentioned earlier, the explicit substitutions are functions which, when applied, perform the requested substitution in the value to which they are applied. This is implemented in the rule .
The terms are given meaning via the rules . The rule says that the term can evaluate to $L_1$ with label , while the rule says that it can instead reduce to $L_2$ with label where . We will see later that in the network semantics, we only allow transitions labelled with the process performing the transition.
Choice and offer terms reduce via the rules . The first, , tells us that a choice term simply reduces to its continuation with a transition label indicating the choice that has been made. The second, , tells us that an offer term can reduce to any continuation, with a transition label indicating the label of the continuation it reduced to. We will see later that the semantics of networks only allows the offer term to reduce to the continuation chosen by a matching choice term.
Finally, the terms are given meaning via , respectively. However, these rules behave somewhat-differently than might be expected: rather than acting as a plain send and receive, they behave more like a swap of information.
In a plain send, the sender would not have any information after the send—perhaps the term would come with a continuation, but this would not be related to the send. Moreover, the receiver would not provide any information, but merely receive the information from the sender. However, when sending a choreography with process polymorphism, the sender may need to participate in the continuation, depending on how polymorphic functions are applied. For instance, consider the following choreography, where sends a polymorphic function to , and the resulting polymorphic function is applied to :
The polymorphic function that results from the above is as follows:
Applying this to leads to a programme where receives from . Since needs to participate in this programme, must have a programme remaining after sending the polymorphic function to .
While this explains why terms cannot simply, for instance, return unit, it does not explain why terms swap results. To see this, consider what happens when a term is sent from a process to another process . We know from our type system that is not mentioned in the type of the term being sent, and we know that after the send all mentions of are changed to mentions of . Hence, after the send, ’s version of the term should be the view of a process not involved in the term. This is exactly what ’s version of the term is before the send. Thus, behaving as swaps leads to the correct behaviour.
Example 4 (Send And Receive). We now show the local projection (formalised in Section 4.1) and desired behaviour of
This choreography generates the network:
Using our semantics, we get the following reductions:
Now that we have discussed the semantics of local programmes, we discuss the semantics of networks. Each transition in the network semantics has a silent label indexed with the processes participating in that reduction: , where consists of either one process name (for local actions at that process) or two process names (for interactions involving these two processes). We treat as a set, implicitly allowing for exchange.
For instance, the rule describes communication. Here, one local term must reduce with a label, while another reduces with a label. These labels must match, in the sense that the value received by the must be the value sent by the —though with the receiver in place of the sender—and vice-versa. Then, a network in which the local terms are associated with the appropriate processes, and , can reduce with the label . Similarly, the rules reduces matching choice and select terms, resulting in the label .
While describe communication, the rest of the rules describe how a single process’s term can evolve over time in a network. Particularly interesting is , which says that a term can reduce only according to the process it is associated with. We can see here that the resulting label is , indicating that this reduction step only involves .
The rules tells us how to lift steps with an empty label $\tau$ . Such steps make no assumptions about the network, and so such terms can be associated with any process . When such a reduction takes place in a network, we label the resulting transition .
Finally, the rule says that if one part of a network can reduce with a label , then the entire network can reduce with that same label. This allows the other rules, which assume minimal networks, to be applied in larger networks.
In the future. we will use $\to^*$ and $\to^+$ to denote respectively a sequence and a sequence of at least one action with arbitrary labels.
4.1 Projection
We can now define the endpoint projection (EPP) of choreographies. This describes a single process’s view of the choreography; the concurrent interpretation of a choreography is given by composing the projection to every process in parallel. Endpoint projection to a particular process is defined as a recursive function over typing derivations . For readability, however, we write it as a recursive function over the term M, and use the notation to refer to the types assigned to any term N in the implicit typing derivation. Similarly, we use to refer to the kind of a type in the implicit typing derivation. We write to denote the projection of the term M (implicitly a typing derivation for M, proving that it has some type) to the process .
Intuitively, projection translates a choreography term to its corresponding local behaviour. For example, a communication action projects to a send (for the sender), a receive (for the receiver), a substitution (for the other processes in the type of the value being communicated) or an empty process (for the remaining processes). However, this is more complicated for statements. For instance, consider the following choreography, which matches on a sum type which is either an integer on or a unit on . If it is an integer, then receives that integer from Alice and the choreography returns the integer now located at . Otherwise, the choreography returns the default value 42 also located at of which branch she has taken using terms.
Imagine projecting this to ’s point of view. He does not have any of the information in the sum, so he cannot participate in choosing which branch of the expression gets evaluated. Instead, he has to wait for to tell him which branch he is in. If we naïvely translate just the first branch of the case expression, waits for to send him the label Just and then waits for to send him an integer. Similarly, in the second branch waits for to send him the label Nothing before returning the default value 42. Somehow, we need to combine these so that waits for either label, and then takes the corresponding action.
We do this by merging ’s local programmes for each branch (Carbone et al., Reference Carbone, Honda and Yoshida2012; Honda et al., Reference Honda, Yoshida and Carbone2016; Cruz-Filipe & Montesi, Reference Cruz-Filipe and Montesi2020). Merging is a partial function which combines two compatible local programmes, combining choice statements. In other words, the key property of merging is:
Merging is defined homomorphically on other terms, though it is undefined on incompatible terms. Thus, for example, is undefined.
We can then use this to project the choreography above to as:
where represents a part of the choreography executed by .
Definition 2. The EPP of a choreography M for process is defined by the rules in Figures 9, 10 and 11.
To project a network from a choreography, we therefore project the choreography for each process and combine the results in parallel: .
Intuitively, projecting a choreography to a process that is not involved in it returns a . More-complex choreographies, though, may involve processes that are not shown in their type. This explains the first clause for projecting an application: even if does not appear in the type of M, it may participate in interactions inside M. A similar observation applies to the projection of , where merging is also used.
Selections and communications follow the intuition given above, with one interesting detail: self-selections are ignored, and self-communications project to the identity function. This is different from many other choreography calculi, where self-communications are not allowed—we do not want to impose this in PolyChor $\lambda$ , since we have process polymorphism and therefore do not want to place unnecessary restrictions on which processes a choreography can be instantiated with.
Any process must prepare two behaviours for a process abstraction : one for when X is instantiated with itself, and one for when X is instantiated with another process. To do this, we use terms, which allow to use its knowledge of its identity to select which behaviour takes place. (This also holds when X has a Without kind, as long as the base kind is , is excluded from the type of X and does not participate in M then we simply project to .) However, type abstractions terms if is not a kind of processes, since cannot instantiate X.
When projecting an application, we may project both the function and its argument, either one but not the other, or neither. While it may seem simple—just project both sides, and get rid of any that come up—it turns out to be somewhat complicated. In order to ensure every process performs actions in the same order and avoid communication mismatches, we must project an abstraction for any process involved in the computation, even if they do not have the input (Cruz-Filipe et al., Reference Cruz-Filipe, Graversen, Lugovic, Montesi and Peressotti2023, Example 6). To see why this causes complications, consider . When M gets projected to , it becomes . However, applying M to an argument—say, —needs to lead to a function application on ! Thus, we project this to , allowing Q to instantiate its function. We use the type system to identify the cases where we need to keep and those where we should only project the function part of an (type) application.
Type applications work a bit differently. Since there is no chance of communication happening while computing a type, we can project only the body of a type abstraction without the actual abstraction to when we know the argument is not located at . In addition, we do not have a case for projecting only the argument, since the context surrounding a type abstraction will not expect a type.
In general, projecting a type yields at any process not used in that type. We use the restrictions on kinds to avoid projecting type variables and type abstractions when we know we do not need to and project all process names to themselves, but otherwise the projection of type constructs is similar to that of corresponding process terms.
Finally, to execute a projected choreography, we need to project the set of definitions of choreographic functions to a set of definitions of local functions. Since these functions are all parametrised over every involved process, this is as simple as projecting the definitions onto an arbitrarily chosen process name.
Note that function names always get projected everywhere. This means that if we have a function which does not terminate when applied to some value in any process, then it diverges when applied to that value in the choreography and in every other process.
Example 5. We will now show how to project the bookseller service example Equation (1.5). As in that example we use as syntactic sugar for for some as syntactic sugar for for some . We project for and get the following process:
Here we can see that if the buyer B turns out to be itself, then all the communications become identity functions, and the seller does not inform itself of its choice. Otherwise, we get a function which, after being instantiated with a buyer, also needs to be instantiated with two s representing values existing at B. It then waits for B to send a title, returns the price of this title, and waits for B to decide whether to buy or not. It might seem strange to have a function parametric on two values that are located at B and will therefore here be instantiated with s, but this example actually illustrates why when projecting we cannot in cases like this remove the first two $\lambda$ s from the process without causing a deadlock. Consider that is syntactic sugar for . Here we need to have the abstraction on y even though it gets instantiated as sends the result of . If instead we only had , then the first part of the application would not be a value, and would be waiting for B to choose between $\textsf{Buy}$ and $\textsf{Quit}$ while B has the abstraction on y and therefore considers the first part of the application a function which must wait to be instantiated. B therefore expects to receive the result of , and we get a deadlock in our system. This is why we never want to project a value to a non-value term and need to keep any abstractions guarding a part of the choreography involving .
5 The correctness of endpoint projection
We now show that there is a close correspondence between the executions of choreographies and of their projections. Intuitively, this correspondence states that a choreography can execute an action if, and only if, its projection can execute the same action, and both transition to new terms in the same relation. However, this is not completely true: if a choreography M reduces by rule , then the result has fewer branches than the network obtained by performing the corresponding reduction in the projection of C.
In order to capture this we revert to the branching relation (Cruz-Filipe & Montesi, Reference Cruz-Filipe and Montesi2020; Montesi, Reference Montesi2023), defined by $M \sqsupseteq N$ iff . Intuitively, if $M\sqsupseteq N$ , then M offers the same and possibly more behaviours than N. This notion extends to networks by defining $\mathcal{N}\sqsupseteq\mathcal{N}'$ to mean that, for any role .
Using this, we can show that the EPP of a choreography can do all that (completeness) and only what (soundness) the original choreography does. For traditional imperative choreographic languages, this correspondence takes the form of one action in the choreography corresponding to one action in the projected network. We instead have a correspondence between one action in the choreography and multiple actions in the network due to allowing choreographies to manipulate distributed values in one action such as in independently take the first part of the pair.
Theorem 5 (Completeness). Given a closed choreography M, if $M\rightarrow_{D} M'$ , $\Theta;\Gamma\vdash D$ , , and is defined, then there exists network $\mathcal{N}$ and choreography M” such that: .
Proof We prove this by structural induction on $M\rightarrow_{D} M'$ . We take advantage of the fact that type values project to at processes not involved in them, while choreographic values correspondingly project to at processes not involved in their type. See Appendix C for full details.
Theorem 6 (Soundness). Given a closed choreography M and a function mapping D, if , $\Theta;\Gamma\vdash D$ , and for some network $\mathcal N$ , then there exist a choreography M’ and a network $\mathcal N'$ such that: $M\rightarrow^*_{D} M'$ , .
Proof We prove this by structural induction on M in the accompanying technical report, taking advantage of the fact that thanks to projecting function names everywhere, a choreography that diverges at one role diverges at every role. See Appendix 12 for full details.
From Theorems 3 to 6, we get the following corollary, which states that a network derived from a well-typed closed choreography can continue to reduce until all roles contain only local values.
Corollary 1. Given a closed choreography M and a function environment D containing all the function names of M, if $\Theta;\Gamma\vdash M:T$ and $\Theta;\Gamma\vdash D$ , then: whenever for some network $\mathcal{N}$ , either there exists such that or .
6 Case study
We now show how our language can be used in an extended example (Figure 12). This example involves three processes: a client , an edge computer , and a server . Intuitively, wants to request that does some computation. However, may not have the resources to perform the computation; in this case, it will forward the request to . Whenever receives a request, then must first perform an authentication protocol. Whether or not the task is outsourced to logs the request.
Here we assume the following data:
-
A task (of type
-
For each of , a local function Compute which executes a task
-
An authentication choreography $\textsf{Authenticate}$ between and a another process . This choreography takes a key AuthKey and checks if the holder of that key is authorised to run a task on .
-
A key $\textsf{AuthKey}$ for
-
A logging choreography LogRequest involving two roles, provocatively called and . This choreography takes a client, a task, and the result of executing the task (at ) as input. It then creates a log entry at .
-
A local function HandleHere, which uses to determine whether it can handle a task locally. If HandleHere returns false, then the task must be shipped to . Unlike other data, this is represented as input to the choreography.
The choreography begins with sending the task to ; we call the resulting task x. (Note that x is the name of the task on , not the name of the task on then checks whether it can handle x using $\textsf{HandleHere}$ . If so, that it is computing the task. After performing the task, sends the result to . It furthermore informs so that it knows that it needs to log the task.
If cannot handle the task, then it again informs then makes a decision on whether has authorisation to request a task from . To make this decision, sends an authentication protocol to . Because communications swap the sender and the receiver in the communicated value, we write this authentication protocol with playing the role of the client. The protocol is therefore parameterised on the authenticator. Once receives the authentication protocol, it can instantiate the authenticator as finally sends the (now complete) protocol to ; running this protocol will have send its key to , possibly among other actions required for authentication. If the authentication procedure succeeds, then informs can then send the task to , who computes it and returns the result to . If authentication fails, then informs of this and the task fails, resulting in a 0 on . Either way, we finish the choreography by logging the task and its result using the function $\textsf{LogRequest}$ .
For to send an authentication protocol which it is itself involved with requires a bit of trickery. Usually, we would expect every part of the sent value located at to be moved to the receiver (first and then after another communication ) but obviously that would mean cannot be involved. We therefore send an authentication protocol that is parametric on the authenticator, , and only instantiate after the first communication from to has taken place.
Projecting this protocol to leads to the following code:
We see that the second case gets projected as an application of a new abstraction on the new variable y, with ’s part of the condition as the right side of the application. Since the condition contains a delegation, we get some process substitutions representing values with unknown locations being communicated between other processes. Because of the in the second branch, none of the substituted processes are ever reached. Therefore, these substitutions do nothing. Since is not involved in ’s decision to delegate to (or not), we do not see any of the code involved in the decision here. Instead, we get a straightforward offer as the result of merging the projection of each branch of the involved conditional.
We now show the projection for :
Note that we treat the second case almost the same as in , except that is involved in both communications of the delegation. Since the condition of the first case is located at , it gets projected as a case. Keep in mind that since we model communication as an exchange, what will actually be executed at after the delegation takes place is the right branch of the in the projection of .
Finally, we show the projection for :
Here we finally see the projection of what actually wants to do in order to authenticate. We also see that in the case where gets instantiated as the same process it is communicating with, which would be if the protocol did not get communicated before is instantiated, the communication gets replaced by an identity function .
7 Related work
7.1 Choreographies
Choreographies are inspired by the “Alice-and-Bob” notation for security protocols by Needham & Schroeder (Reference Needham and Schroeder1978), which included a term for expressing a communication from a participant to another. The same idea inspired later several graphical notations for modelling interactions among processes, including sequence diagrams and message sequence charts (International Telecommunication Union, 1996; Object Management Group, 2017).
A systematic introduction to theory of choreographic languages and their historical development can be found in Montesi (Reference Montesi2023). We recap and discuss relevant recent developments. The first sophisticated languages for expressing choreographies were invented to describe interactions between web services. The Web Services Choreography Description Language (WS-CDL) by The World Wide Web Consortium (W3C) (2004) is a choreographic language which describes the expected observable interactions between web services from a global point of view (Zongyan et al., Reference Zongyan, Xiangpeng, Chao and Hongli2007). Carbone et al. (Reference Carbone, Honda and Yoshida2012) later formalised endpoint projection for a theory of choreographies based on WS-CDL, and in particular introduced the merging operator (which we adjusted and extended to our setting). This inspired more work on choreographies and projection and eventually the birth of choreographic programming—where choreographies are programmes compiled to executable code—and the first choreographic programming language, Chor (Montesi, Reference Montesi2013). As choreographic programming languages became more complex, Cruz-Filipe & Montesi (Reference Cruz-Filipe and Montesi2020) developed a core calculus of choreographies (CC). Montesi (Reference Montesi2023) revisited and generalised CC in his text on foundations of choreographic languages. Cruz-Filipe et al. (Reference Cruz-Filipe, Montesi and Peressotti2021) then formalised this new version and its properties in the Coq theorem prover (The Coq Development Team, 2004). Later, Pohjola et al. (Reference Pohjola, Gómez-Londoño, Shaker and Norrish2022) developed a certified end-to-end compiler from another variation of CC to CakeML by using the HOL theorem prover.
One of the primary design goals of all of choreographic programming languages is deadlock-freedom by design (Carbone & Montesi, Reference Carbone and Montesi2013)—the operational correspondence between the choreography and the distributed network ensures deadlock-freedom for the network. PolyChor $\lambda$ achieves this goal. Montesi (Reference Montesi2023) discusses restrictions for a procedural imperative choreographic language in order to obtain a stronger liveness property (starvation-freedom). The idea is to prove that processes eventually get involved in transitions at the choreographic level and then leverage the correctness of endpoint projection to obtain the same result about choreography projections. This idea might work for PolyChor $\lambda$ as well, but whether and how the technical devices for starvation-freedom in Montesi (Reference Montesi2023) can be adapted to PolyChor $\lambda$ is not clear due to the different nature of our language (functional instead of imperative). Alternatively, one could adapt static analyses for lock-freedom—like that in Kobayashi (Reference Kobayashi2006)—to choreographies. We leave explorations of liveness properties other than deadlock-freedom in PolyChor $\lambda$ to future work.
The first choreographic language with limited process polymorphism was Procedural Choreographies (PC) (Reference Cruz-Filipe and MontesiCruz-Filipe & Montesi, 2017b ). In PC, a choreography comes with an environment of predefined procedures parametric on process names which may be called by the choreography. These procedures have a number of limitations compared to the process polymorphism of PolyChor $\lambda$ : they cannot contain any free processes, they cannot be partially instantiated, and they are lower-order—that is, they must be defined in the environment rather than as part of a larger choreography. These limitations allow the projection function to know how the procedure will be instantiated, whereas in PolyChor $\lambda$ we may need to compute the processes involved first. This has major implications for projection: in PC, it is easy to tell when projecting a procedure call which processes are involved and therefore need a local copy of the call. However, PolyChor $\lambda$ ’s full process polymorphism allows the function and process names to each be enclosed in a context. While this allows greater flexibility for programmers, it forces us to project a process-polymorphic functions to every process and let each process determine at runtime whether it is involved.
Recently, there has been a fair amount of interest in higher-order and functional programming for choreographies (Hirsch & Garg, Reference Hirsch and Garg2022; Cruz-Filipe et al., Reference Cruz-Filipe, Graversen, Lugovic, Montesi and Peressotti2022; Shen et al., Reference Shen, Kashiwa and Kuper2023; Giallorenzo et al., Reference Giallorenzo, Montesi and Peressotti2023). The first higher-order choreographic programming language, Choral (Giallorenzo et al., Reference Giallorenzo, Montesi and Peressotti2023) is an object-oriented choreographic language compiled to Java code. Thus, Choral choreographies can depend on other choreographies, allowing programmers to reuse code. Choral was also the first choreographic language to treat as a first-class function.
While Choral gave a practical implementation of higher-order choreographies, it did not establish their theoretical properties. Two different—but independently developed—works filled this gap, including Chor $\lambda$ , the basis of PolyChor $\lambda$ . Chor $\lambda$ is a functional choreographic calculus based on the $\lambda$ -calculus. In this work, we extended Chor $\lambda$ with type and process polymorphism and the ability to send non-local values such as choreographies. Chor $\lambda$ , and hence PolyChor $\lambda$ , provides a core language for higher-order choreographies, thus allowing us to establish their properties. Since the original Chor $\lambda$ has parametric procedures like PC and Choral, it lacks PolyChor $\lambda$ ’s property that a choreography diverging in one process must diverge in every process. This forces Chor $\lambda$ to have both a complex notion of out-of-order execution and a more lax correspondence between actions in the network and the choreography.
The other work establishing the theoretical properties of higher-order choreographic programming is Pirouette (Hirsch & Garg, Reference Hirsch and Garg2022), which is also a functional choreographic programming language based on simply-typed $\lambda$ calculus. Unlike Chor $\lambda$ (and thus PolyChor $\lambda$ ), Pirouette does not allow processes to send messages written in Pirouette. Instead, it takes inspiration from lower-order choreographic programming languages in which (the computations to produce) messages are written in their own separate language. Like other choreographic languages (Cruz-Filipe et al., Reference Cruz-Filipe, Montesi and Peressotti2021; Montesi, Reference Montesi2023), Pirouette’s design is parametrised by the language for writing messages. Thus, Pirouette can describe communication patterns between processes that draw from a large swath of languages for their local computations. Nevertheless, this design means that Pirouette fundamentally cannot allow programmes to send choreographic functions, unlike PolyChor $\lambda$ .
Moreover, unlike Chor $\lambda$ and PolyChor $\lambda$ , Pirouette forces every process to synchronise when applying a function. This allows Pirouette to establish a bisimulation relation with its network programming language, a result formalised in Coq. This result allows a traditional—and verified—proof of deadlock-freedom by construction. However, this constant synchronisation would be a bottleneck in real-world systems; PolyChor $\lambda$ ’s choice to obtain a weaker—but strong-enough—connection between the languages allows it to avoid this high cost.
7.2 Concurrent functional programming
Functional concurrent programming has a long history, starting with attempts to parallelise executions of functional programmes (Burton, Reference Burton1987). The first language for functional programming with communications on channels was Facile (Giacalone et al., Reference Giacalone, Mishra and Prasad1989) which, unlike later choreographic languages, had an abstraction over process IDs very similar to process polymorphism. A more recent work, which more closely resembles choreographic programming, is Links (Cooper et al., Reference Cooper, Lindley, Wadler and Yallop2006), with the RPC calculus (Cooper & Wadler, Reference Cooper and Wadler2009) as its core language. Links and the RPC calculus, like choreographies, allow a programmer to describe programmes where different parts of the computation takes place at different locations and then compile it to separate code for each location. Interestingly, though Links has explicit communication, in the RPC calculus the top level does not, and communications are created only when projecting a function located at a different process. Moreover, the RPC calculus does not feature multiple threads of computation; instead, on communication the single thread of computation moves to a new location while other locations block. The RPC calculus was later extended with location polymorphism, very similar to our and Facile’s process polymorphism (Choi et al., Reference Choi, Cheney, Fowler and Lindley2020). However, as the RPC calculus only deals with systems of 2 processes, a client and a server, they project a process abstraction as a pair and then the location as picking the correct part of this pair. This solution creates a simpler network language but is not suitable for a framework with an arbitrary number of participants such as PolyChor $\lambda$ . Moreover, the RPC calculus—like PolyChor $\lambda$ but unlike traditional choreographic languages—does not have out-of-order execution at the top level.
Session types were applied to a concurrent functional calculus with asynchronous communication by Gay & Vasconcelos (Reference Gay and Vasconcelos2010). Though initially this language did not guarantee deadlock-freedom, only runtime safety, later versions of GV (Wadler, Reference Wadler2012; Lindley & Morris, Reference Lindley and Morris2015) did. Jacobs et al. (Reference Jacobs, Balzer and Krebbers2022) extended GV with global types (Honda et al., Reference Honda, Yoshida and Carbone2016), which generalise session types to protocols with multiple participants. Similarly to choreographic programming, global types offer a global viewpoint on interactions. However, they are intended as specifications and thus cannot express computation. Global types are typically projected onto local types, which manually-written programmes can later be checked against. In choreographic programming, by contrast, choreographies are projected directly to programmes. Some works mix the approaches (e.g., Scalas et al., Reference Scalas, Dardha, Hu and Yoshida2017): given a global type, a compiler produces typestate-oriented libraries (Aldrich et al., Reference Aldrich, Sunshine, Saini and Sparks2009) that help the users with following the global type correctly (but not with performing the right computations at the right time).
Session types have also been used to study global higher-order programming outside of functional settings. Mostrous & Yoshida (Reference Mostrous and Yoshida2007) describe the challenges associated with obtaining subject reduction when sessions can pass other sessions between them. Based on this, Mostrous & Yoshida (Reference Mostrous and Yoshida2009) define the higher-order $\pi$ -Calculus with asynchronous sessions, a calculus combining elements of the $\pi$ -calculus and $\lambda$ -calculus.
Castellani et al. (Reference Castellani, Dezani-Ciancaglini, Giannini and Horne2020) propose a notion of session types with delegation. They write delegation by enclosing a part of the global type in brackets. During the execution of such a part, one process acts on another’s behalf by temporarily taking its name. This means that they do not need to inform other participants in the delegated computation that delegation is happening. However, nested delegations can cause deadlocks.
ML5 (Murphy VII et al., Reference Murphy VII, Crary and Harper2007; Licata & Harper, Reference Licata and Harper2010) is a functional concurrent programming language based on the semantics of modal logic. However, instead of the and terms of choreographic languages, they have a primitive get[w] M, which makes another process w evaluate M and return the result. Since M may include other gets, this construct gives ML5 something resembling PolyChor $\lambda$ ’s ability to send a full choreography. However, the result of evaluating this “choreography” must be at the receiver and then returned to the sender.
Multitier programming languages, like ScalaLoci (Weisenburger & Salvaneschi, Reference Weisenburger and Salvaneschi2020), offer another paradigm for describing global views of distributed systems. Like Choral, ScalaLoci is built on top of an existing object-oriented language: in this case, Scala. In ScalaLoci and other tierless languages, an object describes a whole system containing multiple processes and functions. Differently from choreographic programming, multitier programming does not allow for modelling the intended flow of communications. Rather, communication happens implicitly through remote function calls and the concrete protocol to be followed is largely left to be decided by a runtime middleware. For a more detailed comparison of choreographic and multitier languages, see the work of Giallorenzo et al. (Reference Giallorenzo, Montesi, Peressotti, Richter, Salvaneschi and Weisenburger2021).
8 Conclusion
In this paper, we presented PolyChor $\lambda$ , the first higher-order choreographic programming language with process polymorphism. PolyChor $\lambda$ has a type and kind system based on System F $\omega$ , but extended such that process names are types of kind . Moreover, we showed how to obtain a concurrent interpretation of PolyChor $\lambda$ programmes in a process language by using a new construct corresponding by the ability of a process to know its identity. We found that this construct was necessary if process variables are able to be instantiated as the process they are located at, but using a choreographic language abstracts from this necessity. Our explorations of process polymorphism also allowed PolyChor $\lambda$ to describe a communication of a non-local value from as sending the part of the message owned by . These non-local values include full choreographies, creating a simple and flexible way to describe delegation by communicating a distributed function describing the delegated task. This innovation required a new notion of communication as an exchange in which the delegator rather than being left with an empty value after sending a choreography is left with a function which will allow it to potentially take part in the delegated task, e.g., by receiving a result at the end.
Process polymorphism fills much of the gap between previous works on the theory of higher-order choreographies and practical languages. However, there is still more work to do. For instance, currently PolyChor $\lambda$ does not support recursive types.
Our current results rely on types normalising to a type value, which recursive types may not do. System F $\omega$ does not have our restriction of type abstractions only being instantiated with type values. However, PolyChor $\lambda$ needs to ensure that communications are only undertaken between processes, rather than complicated type expressions resulting in processes. Thus, we need to treat our type system as call-by-name, leading to the restriction above.
In order to support recursive types, we would need to either make endpoint projection capable of projecting to a possibly-nonterminating description of a process, or limit recursive types ability to make type computations fail to terminate.
Furthermore, one can imagine allowing processes to send types and process names as well as values. This would, for example, allow us to programme a server to wait to receive the name of a client which it will have to interact with. Since this form of delegation is common in practice, understanding how to provide this capability in a choreographic language, while retaining the guarantees of choreographic programming, would enable programmers to apply their usual programming patterns to choreographic code.
We project local type despite lacking a typing system for local processes. Our unusual network communication semantics have made it difficult to define local typing rules for , and we therefore leave local typing (or alternatively type erasure) as future work.
Certain, instant, and synchronous communication is convenient for theoretical study, but such assumptions do not match real-world distributed systems. Reference Cruz-Filipe and MontesiCruz-Filipe & Montesi (2017a ) model asynchronous communication in choreographies via runtime terms representing messages in transit. We could adapt this method to PolyChor $\lambda$ by having the communication primitive reduce in two steps: first to a runtime term and then to the delivered value. However, this extension would be nontrivial, since it is not obvious how to represent messages in transit when those messages are non-local values such as choreographies. In addition, the way we represent a communication at the local level (swapping values rather than only moving a value from sender to receiver) might require additional machinery (e.g., new runtime terms) to capture its asynchronous execution.
We also leave practical implementation of PolyChor $\lambda$ ’s new features to future work. This could be achieved by extending Choral (Giallorenzo et al., Reference Giallorenzo, Montesi and Peressotti2023), the original inspiration for Chor $\lambda$ . Communication primitives in Choral are user-defined—not fixed by any middleware or compiler—so it is possible to define new communication abstractions involving multiple roles. However, we need to manipulate roles at runtime in our local semantics, while the Choral compiler erases roles when projection code to Java. We may be able to overcome this issue by reifying roles in projected code or by using reflection.
While these gaps between theory and practice persist, process polymorphism in PolyChor $\lambda$ brings us much closer to realistic choreographic languages for distributed systems. Choreographic programmes promise to provide easier and cleaner concurrent and distributed programming with better guarantees. With higher-order choreographic programming and process polymorphism, the fulfilment of that promise is nearly within reach.
Acknowledgements
We thank the anonymous reviewers for their useful comments.
Funding
This work was supported by Villum Fonden (F.M., grant numbers 29518, 50079) and Independent Research Fund Denmark (F.M., grant number 0135-00219).
Conflicts of interest
The authors report no conflict of interest.
A Full PolyChor $\lambda$ typing rules
B Full PolyChor $\lambda$ operational semantics
C Proof of Theorem 5
In the foregoing, we use L to denote local expressions and U to denote local values in order to make the proofs more readable, as we will be switching back and forth between layers a lot.
Before we can prove completeness, we need a few lemmas. First, we show that choreographic values always project to local values.
Lemma 3. For any choreographic value V and process , if then is either a value or undefined.
Proof Straightforward from the projection rules.
We then prove the same for type values.
Lemma 4. Given a type value , if then for any process .
Proof Straightforward from the projection rules.
We then show that type values are projected to at uninvolved processes.
Lemma 5. Given a type value , for any process .
Proof Straightforward from induction on .
And similarly, we show that choreographic values project to at processes not involved in their type.
Lemma 6. Given a value V, if then for any process , we have is undefined.
Proof Follows from Lemmas 3 and 5 and the projection rules.
Finally, we show that equivalent types are projected to equivalent local types.
Lemma 7. Given a closed type and , then for any process , .
Proof We prove this by structural induction on . All but one case follow by simple induction.
We also need to prove that performing a substitution before and after projection yield the same result.
Lemma 8. Given a choreography M with a free variable x, a value V, and a type such that and is defined, we get .
Proof If then by definition and by Lemma 6, then and since we use $\alpha$ -conversion when substituting, we can guarantee that anywhere it gets substituted into M, meaning the projection will always be the same as it does not depend on context, only on syntax and type. We therefore get .
We are now ready to prove completeness.
Proof [Proof of Theorem 5] We prove this by structural induction on $M\rightarrow_{D} M'$ .
-
-
-
-
Assume $M=V~N$ , $M'=V~N'$ , and $N\rightarrow_{D} N'$ . This is similar to the previous case, using Lemma 3 to ensure every process can use Rule .
-
-
-
Assume and $M'=N'[x':= V]$ . This case is similar to the previous.
-
-
-
Assume , $N\rightarrow_{D} N''$ , and . Then, the result follows from simple induction.
-
Assume , $N\rightarrow_{D} N'$ , and . Then, the result follows from simple induction.
-
Assume and $M'=V$ . Then for any process such that or and for any other process , we have . We define $\mathcal{N}=[\![ M']\!]$ and the result follows.
-
Assume and $M'=V'$ . Then, the case is similar to the previous.
-
Assume $M=f$ and $M'=D(f)$ . Then, the result follows from the definition of $[\![ D]\!]$ .
D Proof of Theorem 6
As with completeness, we need some ancillary lemmas before we can prove soundness. For this, we need a notion of removing processes from a network.
Definition 3. Given a network .
First we show that actions in a network do not affect the roles not mentioned in the transition label.
Lemma 9. For any process and network $\mathcal{N}$ , if and .
Proof Straightforward from the network semantics.
Then we show that removing processes from a network does not prevent it from performing actions involving different processes.
Lemma 10. For any set of processes and network $\mathcal{N}$ , if then .
Proof Straightforward from the network semantics.
We finally show that if the projection of a choreographic type is equivalent to a local type value, then the original choreographic type is equivalent to a choreographic type value.
Lemma 11 Given a closed type and process then there exist a type such that: .
Proof We prove this by structural induction on . All but one case follows from simple induction.
We are then ready to prove soundness.
Proof [Proof of Theorem 6]. We prove this by structural induction on M.
-
Assume $M=V$ . Then for any process
-
-
-
-
Assume . This case is similar to the previous.
-
Assume . Then, the result follows from simple induction.
-
-
Assume . This is similar to the case above.
-
Assume $M=f$ . Then for any process . We therefore have some process such that and . We then define the required choreography $M'=D(f)$ and network and the result follows.
Discussions
No Discussions have been published for this article.