1. Introduction
Comparing behavior of programs is one of the fundamental activities in verification and analysis of programs, and many concepts, techniques, and formal systems have been proposed for this purpose, such as product program construction (Barthe et al., Reference Barthe, Crespo, Kunz, Butler and Schulte2011, Reference Barthe, D’Argenio and Rezk2004; Zaks and Pnueli, Reference Zaks, Pnueli, Cuellar, Maibaum and Sere2008), relational Hoare logic (Benton, Reference Benton2004), higher-order relational refinement types (Barthe et al., Reference Barthe, Gaboardi, Arias, Hsu, Roth and Strub2015), and so on.
Several recent relational program logics integrate compositional reasoning about relational properties of programs and over-approximating quantitative differences between computational effects of programs; the latter is done in the style of effect system (Lucassen and Gifford, Reference Lucassen and Gifford1988). One successful logic of this kind is Barthe et al.’s approximate probabilistic relational Hoare logic (apRHL for short) designed for verifying differential privacy of probabilistic programs (Barthe et al., Reference Barthe, Köpf, Olmedo and Béguelin2012). A judgment of apRHL is of the form $ c\sim_{\epsilon,\delta}c':\Phi\Rightarrow\Psi $ , and its intuitive meaning is that for any pair of initial states $ (\rho,\rho') $ related by $ \Phi $ , the $ \epsilon $ -distance between two probability distributions of final states $ [\![ c]\!]\rho $ and $ [\![ c']\!]{\rho'} $ is below $ \delta $ , and final states are related by $ \Psi $ . Another relational program logic that measures differences between computational effects of programs is Çiçek et al.’s RelCost (Çiçek et al., Reference Çiçek, Barthe, Gaboardi, Garg and Hoffmann2017). The target of the reasoning is a higher-order functional programming language equipped with cost counting effect. When we derive a judgment $ \Delta; \Psi; \Gamma \vdash M_1 \ominus M_2 \precsim n \colon \Phi $ in RelCost, its sound semantics ensures that the difference of cost counts by $ M_1 $ and $ M_2 $ is bound by n.
A high-level view on these relational program logics is that they integrate the feature of measuring quantitative differences between computational effects into relational program logic. We aim to mathematically formulate this perspective in this paper. The central concept in our formulation is a divergence on a monad. It specifies quantitative differences between computational effects modeled by a monad, and it may be seen as a generalization of various statistical divergences, such as the Kullback–Leibler divergence and the total variation distance, to general computational effects. We then present a construction of the semantics of a relational program logic that integrates the measurement of quantitative differences between computational effects from a divergence on a monad. This construction works with general monads and divergences on them and can be applied to derive the semantics of the approximate relational program logic for differential privacy as an instance. The detail of the development of our formulation is as follows:
• We introduce a structure called the divergence on a monad for measuring quantitative differences between computational effects in Section 4. Divergences on monads are a mild generalization of Barthe and Olmedo’s composable divergences on probability distributions, such as the Kullback–Leibler divergence and the total variation distance on probability distributions (Barthe and Olmedo, Reference Barthe and Olmedo2013). We visit examples of divergences on monads in Section 5 and give a method to transfer divergences on a monad to those on another monad through monad opfunctors.
• We study mathematical aspects of divergences on monads in Section 6. We first reformulate them as structures in the category of divergences. We show that divergences on monads can be lifted to the category of divergences as monads, and that these liftings yield relative monads corresponding to the original divergences on monads. We next introduce the concept of generatedness of divergences on monads. It is a direct generalization of the same concept studied for differential privacy by Balle et al. (Reference Balle, Barthe, Gaboardi, Hsu, Sato, Chiappa and Calandra2020). We then associate them to the concept of quantitative equational theories studied by Mardare et al. (Reference Mardare, Panangaden and Plotkin2016). We show that unconditional quantitative equational theories on a set X of variables bijectively correspond to X-generated pseudometrics on free monads.
• The key structure to integrate divergences on monads and relational program logics is something called the graded strong relational liftings of a monad. We present a general construction of such liftings from divergences on monads in Section 7. This construction makes it possible to give semantics of relational program logics with quantitative measurement of differences between computational effects for various monads and divergences on them.
• We introduce a generic relational program logic (called acRL) over Moggi’s computational metalanguage (the simply typed lambda calculus with monadic types) in Section 8. Inside acRL, we can use graded strong relational liftings constructed from divergences on a monad and reason about relational properties of programs together with quantitative differences of computational effects. To illustrate how the reasoning works in acRL, we instantiate it with the computational metalanguage having effectful operations for continuous random sampling (Section 9) and cost counting operation (Section 10).
2. Preliminaries
We assume basic knowledge about category theory (Mac Lane, Reference Mac Lane1998) and Moggi’s model of computational effects (Moggi, Reference Moggi1991). The definition of monad (Mac Lane, Reference Mac Lane1998, Chapter VI) and Kleisli category (Mac Lane, Reference Mac Lane1998, Section VI.5) are omitted.
In this paper, a Cartesian category (CC for short) is specified by a category $ {\mathbb{C}} $ with a designated terminal object 1 and a binary product functor $ (\times)\colon {\mathbb{C}}^2\to{\mathbb{C}} $ . The associated pairing operation and projection morphisms are denoted by $ \langle-,-\rangle,\pi_1, \pi_2 $ , respectively. The unique morphism to the terminal object is denoted by $ !_I:I\to 1 $ . A Cartesian closed category (CCC for short) is a CC $ ({\mathbb{C}},1,(\times)) $ with a specified exponential functor $ (\Rightarrow)\colon {\mathbb{C}}^{\mathrm{op}}\times{\mathbb{C}}\to{\mathbb{C}} $ . The associated evaluation morphism and currying operation is denoted by $ \mathrm{ev},\lambda(-) $ , respectively. There are plenty of examples of C(C)Cs. For the models of probabilistic computation, we will use the CC Meas of measurable spaces and the CCC QBS of quasi-Borel spaces (Heunen et al., Reference Heunen, Kammar, Staton and Yang2017), which we briefly recall in Section 2.1.
Let $ ({\mathbb{C}},1,(\times)) $ be a CC. A global element of $ I\in{\mathbb{C}} $ is a morphism of type $ 1\rightarrow I $ . For a category $ {\mathbb{C}} $ , we define the functor $ U^{\mathbb{C}}:{\mathbb{C}}\to{\bf Set} $ by $ U^{\mathbb{C}}={\mathbb{C}}(1,-) $ . When $ {\mathbb{C}} $ is obvious, $ U^{\mathbb{C}} $ is denoted by U. Morphisms in $ {\mathbb{C}} $ act on global elements by the composition. To emphasize this action, we introduce a dedicated notation $ (\mathbin{\bullet}) $ whose type is $ {\mathbb{C}}(I,J)\times UI\to UJ $ . Of course, $ f\mathbin{\bullet} x\triangleq f\circ x=(Uf)(x) $ . We also define the partial application of a binary morphism $ f \colon I \times J \rightarrow K $ to a global element $ i\in UI $ by $ f_i \triangleq f \circ \langle i \circ !_J, {\rm id}_J \rangle:J\to K $ . When $ {\mathbb{C}} $ is a CCC, there is an evident isomorphism $ \lfloor{-}\rfloor\colon U(I\Rightarrow J)\cong{\mathbb{C}}(I,J) $ . We write $ \lceil{-}\rceil $ for its inverse.
A monad $ (T,\eta,\mu) $ on a category $ {\mathbb{C}} $ determines the operation $ (-)^\sharp:{\mathbb{C}}(I,TJ)\to{\mathbb{C}}(TI,TJ) $ called Kleisli extension. It is defined by $ f^\sharp\triangleq\mu_J\circ Tf $ . A monad may be equivalently given by a Kleisli triple (Moggi, Reference Moggi1991, Definition 1.2) that axiomatizes the triple $ (T,\eta,(-)^\sharp) $ . A strong monad on a CC $ ({\mathbb{C}},1,(\times)) $ is a pair of a monad $ (T,\eta,\mu) $ and a natural transformation $ \theta_{I,J}\colon I\times TJ\to T(I\times J) $ called strength. It should satisfy four axioms; see (Moggi, Reference Moggi1991, Definition 3.2) for detail.
A Cartesian category with a strong monad (CC-SM for short) is a pair of a CC and a strong monad on it. A Cartesian closed category with a strong monad (CCC-SM for short) is similarly defined. In a CC-SM $ ({\mathbb{C}},1,(\times),T,\eta,\mu,\theta) $ , the application of the strength to a global element can be expressed by the unit and the Kleisli extension of T (Moggi, Reference Moggi1991, Proof of Proposition 3.4):
We will use this fact in Proposition 7, Proposition 17, and Theorem 39.
2.1 Measurable spaces and quasi-Borel spaces
Measurable spaces. For the treatment of continuous probability distributions, we employ the category Meas of measurable spaces and measurable functions. For a measurable space I, we write $ |I| $ and $ \Sigma_I $ for the underlying set and $ \sigma $ -algebra of I, respectively. The category Meas is a (well-pointed) CC, and it has all small limits and small colimits that are strictly preserved by the forgetful functor $ |{-}| \colon {\bf Meas} \to {\bf Set} $ , which is naturally isomorphic to the global element functor $ {\bf Meas}(1,-) $ .
Standard Borel spaces. A standard Borel space is a special measurable space $ (|\Omega|, \Sigma_\Omega) $ whose $ \sigma $ -algebra $ \Sigma_\Omega $ is the coarsest one including the topology $ \sigma_\Omega $ of a Polish space $ (|\Omega|, \sigma_\Omega) $ . In particular, the real line $ {\mathbb{R}} $ forms a standard Borel space. In fact, a measurable space $ \Omega $ is standard Borel if and only if there are $ \gamma \colon \Omega \to {\mathbb{R}} $ and $ \gamma' \colon {\mathbb{R}} \to \Omega $ in Meas forming a section–retraction pair, that is, $ \gamma' \circ \gamma = {\rm id}_\Omega $ . For example, [0,1], $ [0,\infty] $ , $ {\mathbb{N}} $ , and $ {\mathbb{R}}^k $ ( $ k \in {\mathbb{N}} $ ) are standard Borel.
The Giry monad. We recall the Giry monad G (Giry, Reference Giry and Banaschewski1982). For every measurable space I, G I is the measurable space given by the following data: the underlying set $ |G I| $ is the set of all probability measures over I, and the $ \sigma $ -algebra is the coarsest one induced by functions $ \mathrm{ev}_A \colon |G I | \to [0,1] $ ( $ A \in \Sigma_X $ ) defined by $ \mathrm{ev}_A(\mu) = \mu(A) $ . The unit $ \eta_I \colon I \to G I $ assigns to each $ x \in I $ the Dirac measure $ \mathbf{d}_x $ centered at x. For every $ f \colon I \to G J $ , the Kleisli extension $ f^\sharp \colon G I \to G J $ is given by $ (f^\sharp(\mu))(A) = \int_x f(x)(A)~d\mu(x) $ for each $ \mu \in G I $ . By $ G_{s} $ , we mean the subprobabilistic variant of G (called sub-Giry monad), where the underlying set $ |G_{s} I| $ of $ G_{s} I $ is relaxed to the set of subprobaility measures over I.
The Giry monad G (resp. the sub-Giry monad $ G_{s} $ ) carries a (commutative) strength $ \theta_{I,J} \colon I \times G J \to G (I \times J) $ over the CC $ ({\bf Meas},1,(\times)) $ . It computes the product of measures ( $ (x, \mu) \mapsto \mathbf{d}_x \otimes \mu $ ). Therefore, $ ({\bf Meas},G) $ and $ ({\bf Meas},G_{s}) $ are (well-pointed) CC-SMs.
Quasi-Borel spaces. The category Meas is not suitable for the semantics of higher-order programming languages since it is not Cartesian closed (Aumann, Reference Aumann1961) Footnote 1. For the treatment of higher-order probabilistic programs with continuous distributions, we employ the Cartesian closed category QBS of quasi-Borel spaces and morphisms between them, together with the probability monad P on QBS (Heunen et al., Reference Heunen, Kammar, Staton and Yang2017). A quasi-Borel space is a pair $ I = (|I|,M_I) $ of a set $ |I| $ and a subset $ M_I $ of the function space $ {\mathbb{R}} \Rightarrow |I| $ satisfying
(1) for $ \alpha \in M_I $ and a measurable function $ f \colon {\mathbb{R}} \to {\mathbb{R}} $ , $ \alpha \circ f \in M_I $ .
(2) for any $ x \in I $ , $ (\lambda r \in {\mathbb{R}}.x) \in M_I $ .
(3) for all $ P \colon {\mathbb{R}} \to {\mathbb{N}} $ and a family $ \{ \alpha_i \}_{ i \in {\mathbb{N}}} $ of functions $ \alpha_i \in M_I $ , $ (\lambda r \in {\mathbb{R}}. \alpha_{P(r)}(r) ) \in M_I $ .
A morphism $ f \colon (|I|,M_I) \to (|J|,M_J) $ is a function $ f \colon |I| \to |J| $ such that $ f \circ \alpha \in M_J $ holds for all $ \alpha \in M_I $ . The category QBS is a (well-pointed) CCC and has all countable products and coproducts that are strictly preserved by the forgetful functor $ |{-}| \colon {\bf QBS} \to {\bf Set} $ . It is naturally isomorphic to the global element functor $ {\bf QBS}(1,-) $ .
Connection to measurable spaces: An adjunction We can convert measurable spaces and quasi-Borel spaces using an adjunction $ L \dashv K \colon {\bf Meas} \to {\bf QBS} $ . They are given by:
For any standard Borel space $ \Omega \in {\bf Meas} $ , we have $ LK\Omega = \Omega $ . The right adjoint K is full-faithful when restricted to the standard Borel spaces (Heunen et al., Reference Heunen, Kammar, Staton and Yang2017, Proposition 15-(2)]. The right adjoint K preserves countable coproducts and function spaces (if exists) of standard Borel spaces (Heunen et al., Reference Heunen, Kammar, Staton and Yang2017, Proposition 19].
Probability Measures and the Probability Monad. A probability measure on a quasi-Borel space I is a pair $ (\alpha,\mu) \in M_I \times G {\mathbb{R}} $ . We introduce an equivalence relation $ \sim_I $ over probability measures on I by:
Using this, we introduce a probability monad P on QBS as follows:
• On objects, we define $ P:{\bf Obj}({\bf QBS})\to{\bf Obj}({\bf QBS}) $ by
\begin{align*} |P(I)| &\triangleq (M_I \times G {\mathbb{R}}) / \sim_I, & M_{P(I)} \triangleq \{ \lambda r. [(\alpha,g(r))]_{\sim_I} \mid \alpha \in M_I, g \in {\bf Meas}({\mathbb{R}},G {\mathbb{R}})\}. \end{align*}• The unit is defined by $ \eta_I (x) \triangleq [\lambda r.x,\mu]_{\sim_I} $ for an arbitrary $ \mu \in G {\mathbb{R}} $ .
• The Kleisli extension of $ f \colon I \to P(J) $ is defined by $ f^\sharp [\alpha,\mu]_{\sim_I} \triangleq [\beta, g^\sharp \mu] $ where there are $ \beta \in M_J $ and $ g \in {\bf Meas}({\mathbb{R}},G{\mathbb{R}}) $ satisfying $ f \circ \alpha = \lambda r \in {\mathbb{R}}. [\beta,g(r)]_{\sim_J} $ by definition of $ M_{P(J)} $ .
The monad P is (commutative) strong with respect to the C(C)C $ ({\bf QBS},1,(\times)) $ .
2.2 Category of binary relations
We define the category $ {\bf BRel}({\mathbb{C}}) $ of binary relations over $ {\mathbb{C}} $ -objects. This category is equivalent to subscones of $ {\mathbb{C}}^2 $ (Mitchell and Scedrov, Reference Mitchell and Scedrov1992). It offers an underlying category for relational reasoning about programs interpreted in $ {\mathbb{C}} $ .
• An object in $ {\bf BRel}({\mathbb{C}}) $ is a triple $ (I_1,I_2,R) $ where $ R\subseteq UI_1\times UI_2 $ .
• A morphism from $ (I_1,I_2,R) $ to $ (J_1,J_2,S) $ in $ {\bf BRel}({\mathbb{C}}) $ is a pair of $ {\mathbb{C}} $ -morphisms $ f_1\colon I_1\rightarrow J_1 $ and $ f_2\colon I_2\rightarrow J_2 $ such that for any $ (i_1,i_2)\in R $ , we have $ (f_1\mathbin{\bullet} i_1,f_2\mathbin{\bullet} i_2)\in S $ .
When X is a name of a $ {\bf BRel}({\mathbb{C}}) $ -object, by $ X_1,X_2 $ we mean its first and second component, and by $ R_X $ we mean its third component, so $ X=(X_1,X_2,R_X) $ . By $ (x_1,x_2)\in X $ , we mean $ (x_1,x_2)\in R_X $ . For objects $ X,Y\in{\bf BRel}({\mathbb{C}}) $ and a morphism $ (f_1,f_2)\colon (X_1,X_2)\to (Y_1,Y_2) $ in $ {\mathbb{C}}^2 $ , by
we mean that $ (f_1,f_2)\in{\bf BRel}({\mathbb{C}})(X,Y) $ , that is, for any $ (x_1,x_2)\in X $ , we have $ (f_1\mathbin{\bullet} x_1,f_2\mathbin{\bullet} x_2)\in Y $ . We say that $ X\in{\bf BRel}({\mathbb{C}}) $ is an endorelation (over I) if $ X_1=X_2(=I) $ .
We next define the forgetful functor $ p_{{\mathbb{C}}}:{\bf BRel}({\mathbb{C}})\to{{\mathbb{C}}}^2 $ by:
For $ (I_1,I_2)\in{\mathbb{C}}^2 $ , by $ {\bf BRel}({\mathbb{C}})_{(I_1,I_2)} $ we mean the complete boolean algebra $ \{X\in{\bf BRel}({\mathbb{C}}) \mid X_1=I_1\wedge X_2=I_2\} $ with the order given by $ X\le Y\iff R_X\subseteq R_Y $ .
When $ {\mathbb{C}} $ is a C(C)C, so is $ {\bf BRel}({\mathbb{C}}) $ (Mitchell and Scedrov, Reference Mitchell and Scedrov1992, Proposition 4.3). We specify a terminal object, a binary product functor and an exponential functor (in case $ {\mathbb{C}} $ is a CCC) on $ {\bf BRel}({\mathbb{C}}) $ by:
3. Divergences on Objects
We introduce the concept of divergence on objects in a CC $ {\mathbb{C}} $ . Major differences between divergences and metrics are threefold: (1) divergences are defined over objects in $ {\mathbb{C}} $ , (2) no axiom is imposed on divergences, and (3) divergences take values in a partially ordered monoid called divergence domain, which we define below.
Definition 1. A divergence domain $ {\mathcal{Q}}=(Q,\le,0,(+)) $ is a partially ordered commutative monoid whose poset part is a complete lattice.
The monoid addition $ (+) $ is only required to be monotone; no interaction with the sup/inf is required. We reserve the letter $ {\mathcal{Q}} $ to denote a general divergence domain. Examples of divergence domains are as follows:
Here, $ \mathbin{\bar{+}} $ is an extension of the addition by $ r \mathbin{\bar{+}} (-\infty) = (-\infty) \mathbin{\bar{+}} r=-\infty $ .
Definition 2. Let $ {\mathbb{C}} $ be a CC. A $ {\mathcal{Q}} $ -divergence on an object $ I\in{\mathbb{C}} $ is a function $ d\colon (UI)^2\to {\mathcal{Q}} $ . For two $ {\mathcal{Q}} $ -divergences d,d’ on I, by $ d\preceq_I d' $ , we mean $ \forall{x_1,x_2\in UI}~.~d'(x_1,x_2)\le d(x_1,x_2) $ .
The binary relation $ \preceq_I $ is a partial order on the set of $ {\mathcal{Q}} $ -divergences on I.
A suitable notion of morphismetween $ {\mathbb{C}} $ -objects with divergences is nonexpansive morphism.
Definition 3. Let $ {\mathbb{C}} $ be a CC. We define the category $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ of $ {\mathcal{Q}} $ -divergences on $ {\mathbb{C}} $ -objects and nonexpansive morphisms between them by the following data.
• An object is a pair (I,d) of an object $ I\in{\mathbb{C}} $ and a $ {\mathcal{Q}} $ -divergence d on I.
• A morphism from (I,d) to (J,e) is a $ {\mathbb{C}} $ -morphism $ f\colon I\to J $ such that for any $ x_1,x_2\in UI $ , $ e(f\mathbin{\bullet} x_1,f\mathbin{\bullet} x_2)\le d(x_1,x_2) $ holds.
For an object $ X\in{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ , by $ d_X $ we mean its $ {\mathcal{Q}} $ -divergence part. We also define the forgetful functor $ V_{{\mathcal{Q}},{\mathbb{C}}}:{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})\to{{\mathbb{C}}} $ by $ V_{{\mathcal{Q}},{\mathbb{C}}}(I,d)\triangleq I $ and $ V_{{\mathcal{Q}},{\mathbb{C}}}(f)\triangleq f $ .
We remark that the forgetful functor $ V_{{\mathcal{Q}},{\bf Set}}:{\bf Div}_{{\mathcal{Q}}}({{\bf Set}})\to{{\bf Set}} $ is a (Grothendieck) fibration, and the functor $ \overline U\colon {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})\to{\bf Div}_{{\mathcal{Q}}}({{\bf Set}}) $ defined by $ \overline U(I,d)\triangleq (UI,d) $ and $ \overline U(f)\triangleq f $ makes the following commutative square a pullback in CAT (the large category of categories and functors between them):
Therefore, this pullback diagram asserts that $ V_{{\mathcal{Q}},{\mathbb{C}}}:{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})\to{{\mathbb{C}}} $ arises from the change-of-base (Jacobs, Reference Jacobs1999, Lemma 1.5.1) of the fibration $ V_{{\mathcal{Q}},{\bf Set}} $ along the global section functor $ U:{\mathbb{C}}\to{\bf Set} $ .
4. Divergences on Monads
We introduce the concept of divergence on monad as a quantitative measure of difference between computational effects. This mildly generalizes Barthe and Olmedo’s composable divergences on probability distributions (Barthe and Olmedo, Reference Barthe and Olmedo2013). Divergences on monads are defined upon two extra data called grading monoid and basic endorelation.
Definition 4. A grading monoid is a partially ordered monoid $ (M,\le,1,(\cdot)) $ .
Definition 5. A basic endorelation is a functor $ E\colon {\mathbb{C}}\to{\bf BRel}({\mathbb{C}}) $ such that for each $ I\in{\mathbb{C}} $ , E I is an endorelation.
Grading monoids will be used when formulating $ (\varepsilon,\delta) $ -differential privacy as a divergence on a monad. Basic endorelations specify which global elements are regarded as identical. Any CC $ {\mathbb{C}} $ has at least two basic endorelations given by equality relations and total relations:
Other examples of basic endorelations can be found in concrete categories.
• The category $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ of $ {\mathcal{Q}} $ -divergences on $ {\mathbb{C}} $ -objects has a basic relation $ E_\delta $ parameterized by $ \delta\in {\mathcal{Q}} $ . It collects all pairs of global elements whose divergence is bound by $ \delta $ . That is, $ E_\delta(I,d)\triangleq ((I,d),(I,d),\{(x_1,x_2) \mid d(x_1,x_2)\le\delta\}) $ .
• The category Pre of preorders and monotone functions has the basic endorelation $ E_{eq} $ collecting equivalent global elements: $ E_{eq}(I,\le)\triangleq ((I,\le),(I,\le),\{(x,y) \mid x\le y\wedge y\le x\}) $ .
Definition 6. Let $ ({\mathbb{C}},1,(\times),T,\eta,\mu,\theta) $ be a CC-SM, $ {\mathcal{Q}}=(Q,\le,0,(+)) $ be a divergence domain, $ (M,\le,1,(\cdot)) $ be a grading monoid and $ E\colon {\mathbb{C}}\to{\bf BRel}({\mathbb{C}}) $ be a basic endorelation. An E-relative M-graded $ {\mathcal{Q}} $ -divergence (when $ M=1 $ , we drop “M-graded”) on the monad T is a doubly indexed family of $ {\mathcal{Q}} $ -divergences $ {\mathsf\Delta} = \{{\mathsf\Delta}^{m}_{I} \colon (U(TI))^2 \to {\mathcal{Q}}\}_{m\in M,I\in{\mathbb{C}}} $ satisfying the following conditions:
• (Monotonicity) For any $ m\le m' $ in M and $ I\in{\mathbb{C}} $ ,
• (E-unit reflexivity) For any $ I\in{\mathbb{C}} $ ,
\begin{equation*} \sup_{(x_1,x_2) \in E I}{\mathsf\Delta}^{1}_{I}(\eta_I\mathbin{\bullet} x_1,\eta_I\mathbin{\bullet} x_2)\le 0. \end{equation*}• (E-composability) For any $ m_1,m_2\in M $ , $ I,J\in{\mathbb{C}} $ , $ c_1,c_2\in U(TI) $ and $ f_1,f_2\colon I\to TJ $ ,
\begin{equation*} {\mathsf\Delta}^{m_1\cdot m_2}_{J}(f_1^\sharp\mathbin{\bullet} c_1,f_2^\sharp\mathbin{\bullet} {c_2}) \le {\mathsf\Delta}^{m_1}_{I}(c_1,c_2) + \sup_{(x_1,x_2) \in E I}{\mathsf\Delta}^{m_2}_{J}(f_1\mathbin{\bullet} x_1,f_2\mathbin{\bullet} x_2). \end{equation*}
We write $ {\bf Div}(T, E,M,{\mathcal{Q}}) $ for the collection of E-relative M-graded $ {\mathcal{Q}} $ -divergences on T. We introduce a partial order $ \preceq $ (reusing the notation for the partial order between divergences in Definition 2) on $ {\bf Div}(T, E,M,{\mathcal{Q}}) $ by:
The E-composability condition is a generalization of the composability of differential privacy stated as Theorem 1 in Barthe and Olmedo (Reference Barthe and Olmedo2013). What is new in the present paper is that 1) we introduce a condition on the monad unit (E-unit reflexivity), and that 2) the sup computed in E-unit reflexivity and E-composability scans global elements related by E, while Barthe and Olmedo (Reference Barthe and Olmedo2013) only considers the case where $ E=\operatorname{Eq} $ . We will later show that both E-unit reflexivity and E-composability play an important role when connecting divergences, relational liftings of T, and the monad structure of T – these conditions are necessary and sufficient to construct strong graded relational liftings of T satisfying fundamental property with respect to divergences (Proposition 2).
We end this section by stating an interaction between the strength of T and divergences on T.
Proposition 7. Let $ ({\mathbb{C}},T) $ be a CC-SM, $ E:{\mathbb{C}}\to{\bf BRel}({\mathbb{C}}) $ be a basic endorelation, $ (M, \leq, 1, (\cdot)) $ be a grading monoid, and $ {\mathcal{Q}} $ be a divergence domain. Suppose also that $ EI \mathbin{\dot\times} EJ \le E(I \times J) $ holds for all $ I,J \in {\mathbb{C}} $ . Then each divergence $ {\mathsf\Delta}\in{\bf Div}(T, E,M,{\mathcal{Q}}) $ satisfies: for all $ m\in M,(x_1,x_2) \in EI $ and $ c_1,c_2 \in U(TJ) $ ,
5. Examples of Divergences on Monads
5.1 Cost difference for deterministic computations
We introduce examples of divergence on the cost count monad $ T \triangleq {\mathbb{N}} \times - $ on Set (which is isomorphic to the writer monad over a single alphabet $ \{*\} $ ). The divergence is inspired by the work on relational cost analysis (Çiçek et al., Reference Çiçek, Barthe, Gaboardi, Garg and Hoffmann2017; Radiček et al., Reference Radiček, Barthe, Gaboardi, Garg and Zuleger2017) measuring the difference of costs between two programs by subtraction. Through these examples we also discuss the roles of the E-unit reflexivity and E-composability conditions.
The unit and Kleisli extension of the cost count monad T are defined by:
This monad T can be used to record the cost incurred by deterministic computations. For instance, consider the quicksort algorithm $ \mathsf{qsort} $ and the insertion sort algorithm $ \mathsf{isort} $ , both of which are modified so that they tick a count whenever they compare two elements to be sorted. These two modified sort programs are interpreted as functions $ [\![ \mathsf{qsort}]\!],[\![ \mathsf{isort}]\!]\colon {\mathbb{N}}^*\to T({\mathbb{N}}^*) $ so that the first component of $ [\![ \mathsf{qsort}]\!](x) $ and that of $ [\![ \mathsf{isort}]\!](x) $ report the number of comparisons performed during sorting x.
We first define an $ {\mathcal{N}} $ -divergence $ \mathsf{C}_I $ on TI, for each $ I\in{\bf Set} $ , by:
This divergence $ \mathsf{C}_I $ computes the difference of costs between two computations $ (i,x),(j,y) \in TI $ , ignoring their return values. The family $ \mathsf C={\mathsf{C}_I}_{I \in {\bf Set}} $ forms a $ \operatorname{Top} $ -relative $ {\mathcal{N}} $ -divergence on T. The $ \operatorname{Top} $ -unit reflexivity of $ \mathsf{C} $ means that the difference of costs between pure computations is zero:
The $ \operatorname{Top} $ -composability of $ \mathsf{C} $ says that we can limit the cost difference of two runs of programs $ f^\sharp(i,x) $ and $ g^\sharp(j,y) $ by the sum of cost difference of the preceding computations (i,x),(j,y) and that of two programs $ f,g\colon I\to TJ $ . The latter is measured by taking the sup of cost difference of f(x) and g(y), where (x,y) range over the basic correlation $ \operatorname{Top} I $ .
We remark that $ \mathsf C $ is not an $ \operatorname{Eq} $ -relative $ {\mathcal{N}} $ -divergence on T because the $ \operatorname{Eq} $ -composability fails: If $ I = \{x,y,z\} $ , $ J = \{v,w\} $ and $ f \colon I \to \mathsf C J $ is defined by $ f(x)=(0,w) $ , $ f(y)=(1,w) $ and $ f(z) = (0,v) $ , then we have $ \mathsf{C}_I((0,x),(0,y)) = 0 $ and $ \sup_{(x,y) \in \operatorname{Eq} I} \mathsf{C}_J(f(x),f(y)) = 0 $ , but we have $ \mathsf{C}_J (f^\sharp(0,x),f^\sharp(0,y)) = \mathsf{C}_J((0,w),(1,w)) = 1 $ .
Alternatively, we may consider the following $ {\mathcal{N}} $ -divergence $ \mathsf{C}'_I $ on TI for each $ I\in{\bf Set} $ :
This divergence is sensitive on return values of computations. When return values of two computations agree, $ \mathsf C' $ measures the cost difference as done in $ \mathsf{C} $ , but when they do not agree, the cost difference is judged as $ \infty $ . This divergence is an $ \operatorname{Eq} $ -relative $ {\mathcal{N}} $ -divergence on T.
5.2 Cost difference for nondeterministic computations
We may model the cost counting effect and nondeterministic choice by the monad $ P(\mathbb{N} \times -) $ on Set, where P is the powerset monad. We extend the divergence on the cost count monad in the previous section to this combined monad as follows. For two results of nondeterministic computations $ A, B \in P(\mathbb{N} \times I) $ with cost counting effects, the least upper bound of the difference $ i - j $ for all possible choices of $ (i,x) \in A $ and $ (j,y) \in B $ forms the divergence on $ P(\mathbb{N} \times -) $ :
If either A or B is empty, we fail to get an information of costs. We then have $ \mathsf{NCI}_I(A,B) = -\infty $ . Similary, we can define the divergence $ \mathsf{NC} $ in Table 1 measuring the absolute difference of costs.
5.3 Divergences for differential privacy
Differential privacy (DP for short) is a quantitative definition of privacy of randomized queries in databases. DP is based on the idea of noise-adding anonymization against background knowledge attacks. In the study of DP, a query is modeled by a measurable function $ c \colon I \to G J $ , where I and J are measurable spaces of inputs and outputs, respectively, and G J is the measurable space of all probability measures over J; here, G itself refers to the Giry monad (Giry, Reference Giry and Banaschewski1982); see also Section 2.1).
Definition 8. Differential privacy, (Dwork et al., Reference Dwork, McSherry, Nissim and Smith2006). Let $ c \colon I \to G J $ be a morphism in Meas, representing a randomized query. The query c satisfies $ (\varepsilon,\delta) $ -differential privacy ( $ \epsilon,\delta\ge 0 $ are reals) if for any adjacent datasets $ (d_1,d_2) \in R_{\mathrm{adj}} $ Footnote 2, the following holds
To express this definition in terms of divergence on monad, we introduce a doubly indexed family of $ {\mathcal{R}}^+ $ -divergence $ \mathsf{DP}=\{\mathsf{DP}_{J}^{\varepsilon}\}_{\varepsilon\in[0,\infty],J\in{\bf Meas}} $ on G J by:
Then the query $ c \colon I \to G J $ satisfies $ (\varepsilon,\delta) $ -DP if and only if
The pair $ (\varepsilon,\delta) $ indicates the difference between output probability distributions $ c(d_1) $ and $ c(d_2) $ of the query c for given datasets $ d_1 $ and $ d_2 $ . Intuitively, the parameter $ \varepsilon $ is an upper bound of the ratio $ \Pr[c(d_1) = s]/\Pr[c(d_2) = s] $ of probabilities which indicates the leakage of privacy. If $ \varepsilon $ is large, attackers can distinguish the datasets $ d_1 $ and $ d_2 $ from the outputs of the query c. The parameter $ \delta $ is the probability of failure of privacy protection.
The family $ \mathsf{DP} $ forms an $ \operatorname{Eq} $ -relative $ {\mathcal{R}}^+ $ -graded $ {\mathcal{R}}^+ $ -divergence on the Giry monad G (Sato et al., Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019, Lemma 6). This is proved by extending the composability of the divergence for DP on discrete probability distributions shown as Lemmas 3 and 6 in Barthe et al. (Reference Barthe, Köpf, Olmedo and Béguelin2012) and Proposition 5 in Barthe andOlmedo (Reference Barthe and Olmedo2013), based on the composition theorem of DP (Dwork and Roth, Reference Dwork and Roth2013, Section 3.5).
The conditions in Definition 6 on $ \mathsf{DP} $ corresponds to the following basic properties of DP:
• (monotonicity) The monotonicity of $ \mathsf{DP} $ corresponds to weakening the differential privacy of queries: if c satisfies $ (\varepsilon,\delta) $ -DP and $ \varepsilon \leq \varepsilon' $ and $ \delta \leq \delta' $ holds, then c satisfies $ (\varepsilon',\delta') $ -DP.
• ( $ \operatorname{Eq} $ -unit reflexivity) The $ \operatorname{Eq} $ -unit reflexivity of $ \mathsf{DP} $ implies $ \mathsf{DP}_{J}^{0}(\eta_J \circ h(x),\eta_J \circ h(x)) = 0 $ for any measurable function $ h \colon I \to J $ and $ x \in I $ . This, together with the composability below, ensures the robustness of DP of a query $ c\colon I\to G J $ with respect to deterministic postprocessing:
(2) \begin{equation} \label{eq:dpostpro} \forall{h:J\to K}~.~ {\textit{c}}\,\text{is $(\epsilon,\delta)$-DP} \implies \text{$G h\circ c$ is $(\epsilon,\delta)$-DP}. \end{equation}In fact, the divergence $ \mathsf{DP} $ is reflexive: we have $ \mathsf{DP}_{J}^{0}(\mu,\mu) = 0 $ for every $ \mu \in G J $ . Therefore, $ h\colon J\to K $ and G h in (2) can be replaced by $ h\colon J\to G K $ and $ h^\sharp $ ; the replaced condition states the robustness of DP of a query with respect to probabilistic postprocessing.• ( $ \operatorname{Eq} $ -composability) The $ \operatorname{Eq} $ -composability of $ \mathsf{DP} $ corresponds to the known property of DP called the sequential composition theorem (Dwork and Roth, Reference Dwork and Roth2013). If $ c_1 \colon I \to G J' $ and $ c_2 \colon J' \to G J $ are $ (\varepsilon_1,\delta_1) $ -DP and $ (\varepsilon_2,\delta_2) $ -DP, respectively, then the sequential composition $ c_2^\sharp\circ c_1 \colon I \to G J $ of the queries $ c_1 $ and $ c_2 $ is $ (\varepsilon_1+\varepsilon_2,\delta_1+\delta_2) $ -DP.
A non-example: Pointwise differential privacy. We stated above that a parameter $ (\varepsilon, \delta) $ of DP intuitively gives an upper bound of the probability ratio $ \Pr[c(d_1) = s]/\Pr[c(d_2) = s] $ and the probability of failure of privacy protection. However, strictly speaking, there is a gap between the definition of $ (\varepsilon,\delta) $ -DP and this intuition of $ \varepsilon $ and $ \delta $ . Pointwise differential privacy in Prasad and Smith (Reference Prasad and Smith2014, Definition 3.2) and Hall (Reference Hall2012, Proposition 1.2.3) is a finer definition of DP that is faithful to this intuition.
Definition 9. A measurable function $ c \colon I \to G J $ (regarded as a query) is pointwise $ (\varepsilon,\delta) $ -differentially private if whenever $ d_1 $ and $ d_2 $ are adjacent, there exists a measurable subset $ A \in \Sigma_J $ satisfying $ \Pr[ c(d_1) \notin A] \leq \delta $ and the following inequality:
which is equivalent toFootnote 3
To express this definition in terms of divergence on monad, we introduce a doubly indexed family of $ {\mathcal{R}}^+ $ -divergences $ \mathsf{pwDP}= \{\mathsf{pwDP}_{}^{\varepsilon}J\}_{\varepsilon\in {\mathcal{R}}^+,J\in{\bf Meas}} $ called pointwise indistinguishability:
Then, $ c \colon I \to G J $ is pointwise $ (\varepsilon,\delta) $ -differentially private if and only if
The family $ \mathsf{pwDP} $ is obviously reflexive: $ \mathsf{pwDP}_{J}^{\varepsilon}(\mu,\mu) = 0 $ holds for any $ \mu \in G J $ and $ \varepsilon \geq 0 $ . Hence, it is $ \operatorname{Eq} $ -unit reflexive. However, it is not $ \operatorname{Eq} $ -composable. We let $ I = \{0,1,2\} $ and $ J = \{0,1\} $ be discrete spaces, and let $ \alpha = \exp(\varepsilon) $ . We define two probability distributions $ \mu_1,\mu_2 \in G I $ by:
We then have $ \mathsf{pwDP}_{I}^{\varepsilon}(\mu_1,\mu_2) = 1 / 10 $ with $ A = \{1,2\} $ because
Next, we define $ f \colon I \to G J $ by:
We then have
Hence, we obtain $ \mathsf{pwDP}_{J}^{\varepsilon}(f^\sharp(\mu_1),f^\sharp(\mu_2)) = 82 / 100 $ with $ A = \{1\} $ because
By the reflexivity of $ \mathsf{pwDP} $ , we have $ \sup_{(x,y) \in \operatorname{Eq} I} \mathsf{pwDP}_{J}^{0} (f(x),f(y)) = 0 $ . Therefore, we have obtained a counterexample to the $ \operatorname{Eq} $ -composability of $ \mathsf{pwDP} $ :
Various relaxations of differential privacy. Since the seminal work on DP by Dwork et al. (Reference Dwork, McSherry, Nissim and Smith2006), various relaxations of differential privacy have been proposed: Rényi DP (Mironov, Reference Mironov2017), zero-concentrated DP (Bun and Steinke, Reference Bun and Steinke2016), and truncated zero-concentrated DP (Bun et al., Reference Bun, Dwork, Rothblum and Steinke2018). They give tighter bounds of differential privacy. These relaxations of differential privacy can be expressed by suitable divergences on the Giry monad G and sub-Giry monad $ G_{s} $ ; see Table 2 for their definitions. Therefore, $ \alpha,w\in(1,\infty) $ are nongrading parameters for $ \mathsf{Re} $ and $ \mathsf{tCDP} $ . Each row of the table represents that $ {\mathsf\Delta} $ is an $ \operatorname{Eq}{} $ -relative $ {\mathcal{Q}} $ - (resp. $ {\mathcal{Q}}_s $ -) divergences on G (resp. $ G_{s} $ ), and the definition of $ {\mathsf\Delta}^{}_{I}(\mu_1,\mu_2) $ follows.
5.4 Statistical divergences and composablity of f-divergences
Apart from differential privacy, various distances between (sub-)probability distributions are introduced in probability theory. They are called statistical divergences. Examples include total variation distance $ \mathsf{TV} $ , Hellinger distance $ \mathsf{HD} $ , Kullback–Leibler divergence $ \mathsf{KL} $ , and $ \chi^2 $ -divergence $ \mathsf{Chi} $ ; they are defined in Table 3. These statistical divergences are $ \operatorname{Eq} $ -relative divergences on the Giry monad G (and $ G_{s} $ for $ \mathsf{TV} $ ); see the same table for their divergence domains. Question marks in the column of $ {\mathcal{Q}}_s $ means that we do not know with which monoid structure the $ \operatorname{Eq} $ -composability holds. We remark that these divergences are also reflexive, that is, $ {\mathsf\Delta}(c,c)=0 $ . $ \operatorname{Eq} $ -composability of these divergences in discrete form are proved in Barthe and Olmedo (Reference Barthe and Olmedo2013) and Olmedo (Reference Olmedo2014). Later, Sato et al. (Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019) extends their results to the composability of divergences in continuous form.
Each of four divergences in Table 3 can be expressed as an f-divergence $ {}^{f}\mathsf{Div} $ (Csiszár, Reference Csiszár1963, Reference Csiszár1967; Morimoto, Reference Morimoto1963):
Here, f is a parameter called weight function and has to be a convex function $ f \colon [0,\infty) \to {\mathbb{R}} $ , continuous at 0 and satisfying $ \lim_{x\to +0}xf(x)=0 $ . To support general $ \mu_1,\mu_2 \in G_{s} I $ , we suppose $ a f (0/a) = a f^\ast(0) $ for $ a \in [0,\infty) $ where $ f^\ast(0) \triangleq \lim_{x \to \infty} f(x)/x $ (see also Liese and Vajda (Reference Liese and Vajda2006, Definition 2)). Weight functions for four divergences $ \mathsf{TV}, \mathsf{KL},\mathsf{HD},\mathsf{Chi} $ are in Table 4. In fact, $ \mathsf{DP}^\varepsilon $ is also an f-divergence with weight function $ f(t)=\max (0,t-\exp(\varepsilon)) $ ; see Barthe and Olmedo (Reference Barthe and Olmedo2013, Proposition 2). We also remark that Rényi divergence $ {}^{\alpha}\mathsf{Re}_{ }{} $ of order $ \alpha $ is the logarithm of the f-divergence with weight function $ f(t) = t^\alpha $ .
f-divergences have several nice properties such as reflexivity, postprocessing inequality, joint-convexity, duality, and continuity (Csiszár, Reference Csiszár1967; Liese and Vajda, Reference Liese and Vajda2006). However, the $ \operatorname{Eq} $ -composability of f-divergences is not guaranteed in general. Here, we provide a sufficient condition for the $ \operatorname{Eq} $ -composability of $ {}^{f}\mathsf{Div} $ over a specific form of divergence domain.
Proposition 10. Let $ \gamma\geq 0 $ be a nonnegative real number, $ {\mathcal{R}}^+_{\gamma} =([0,\infty],\le,0,\lambda{(p,q)}~.~p+q+\gamma pq) $ be the divergence domain, and f be a weight function such that $ f \geq 0 $ and $ f(1) = 0 $ . If there exists $ \alpha, \beta, \beta' \in {\mathbb{R}} $ such that, for all $ x,y,z,w \in [0,1] $ , the following hold
then $ {}^{f}\mathsf{Div} $ is an $ \operatorname{Eq}{} $ -relative $ {\mathcal{R}}^+_{\gamma} $ -divergence on the Giry monad G. When $ \alpha = 0 $ and $ \beta,\beta' \in [0,1] $ , G can be replaced with the sub-Giry monad $ G_{s} $ .
The proof of this proposition generalizes and integrates the proofs given in Olmedo (Reference Olmedo2014, Section 5.A.2). This proposition is applicable to prove the composability of divergences in Table 3 by choosing suitable parameters; see Table 4.
5.5 Divergences on the probability monad on QBS via monad opfunctors
We have seen various divergences on the Giry monad G. It would be nice if they are transferred to the probability monad P on QBS (Section 2.1). For this, we first develop a generic method for transferring divergences on monads.
Let $ ({\mathbb{C}},S) $ and $ (\mathbb D,T) $ be two CC-SMs. A monad opfunctor (Street, Reference Street1972, Section 4) is a functor $ p \colon {\mathbb{C}} \to \mathbb D $ together with a natural transformation $ \lambda\colon p\circ S \to T\circ p $ making the following diagrams commute:
Proposition 11. Let $ ({\mathbb{C}},S),(\mathbb D,T) $ be two CC-SMs, $ (p\colon {\mathbb{C}}\to\mathbb D, \lambda\colon p\circ S\to T\circ p) $ be a monad opfunctor and assume that $ U^\mathbb D \circ p = U^{\mathbb{C}} $ holds, and basic endorelations $ F \colon {\mathbb{C}} \to {\bf BRel}({\mathbb{C}}) $ and $ E \colon \mathbb D \to {\bf BRel}(\mathbb D) $ satisfy $ R_{FpI} = R_{EI} $ for all $ I \in {\mathbb{C}} $ (we here use $ U^\mathbb D\circ p = U^{\mathbb{C}} $ ). Then for any $ {\mathsf\Delta}\in{\bf Div}(T, E,M,{\mathcal{Q}}) $ , the following doubly indexed family of $ {\mathcal{Q}} $ -divergences $ \langle{p,\lambda}\rangle^*{\mathsf\Delta} =\{ (\langle{p,\lambda}\rangle^*{\mathsf\Delta}{ })^{m}_{I} \}_{m\in M,I\in{\mathbb{C}}} $ on SI is an F-relative M-graded $ {\mathcal{Q}} $ -divergence on S:
The left adjoint $ L \colon {\bf QBS} \to {\bf Meas} $ of the adjunction $ L \dashv K \colon {\bf Meas}\rightarrow{\bf QBS} $ and the natural transformation $ l \colon LP \Rightarrow G L $ defined by $ l_X([\alpha,\mu]_{\sim_X})= \mu(\alpha^{-1}(-)) $ forms a monad opfunctor from the probability monad P on QBS to the Giry monad G on Meas (Heunen et al., Reference Heunen, Kammar, Staton and Yang2017, Prop. 22 (3)). Through this monad opfunctor (L,l), we can convert $ \operatorname{Eq}{} $ -divergences on G to those on P. This conversion can be applied to all the statistical divergences in Table 2 and 3.
In addition, for any standard Borel space, we can view such converted divergences $ \langle{L,l}\rangle^*{\mathsf\Delta} $ as the same thing as the original $ {\mathsf\Delta} $ . When $ \Omega \in {\bf Meas} $ is standard Borel, we have an equality $ L K \Omega=\Omega $ , and $ l_{K\Omega} $ is an isomorphism. Therefore, we obtain an isomorphism $ l_{K\Omega}\colon L P K \Omega \cong G L K \Omega= G\Omega $ (Heunen et al., Reference Heunen, Kammar, Staton and Yang2017, Prop. 22 (4)). A concrete description of its inverse is $ l_{K\Omega}^{-1}\mathbin{\bullet}\mu= [\gamma',\mu(\gamma^{-1}(-))]_{\sim_{K \Omega}} $ , where $ \gamma' \colon {\mathbb{R}} \to \Omega $ and $ \gamma \colon \Omega \to {\mathbb{R}} $ are a section–retraction pair (i.e. $ \gamma' \circ \gamma = \mathrm{id}_\Omega $ ) that exists for any standard Borel $ \Omega $ .
Theorem 12. For any $ {\mathsf\Delta}\in{\bf Div}(G,\operatorname{Eq},{\mathcal{Q}},M) $ and standard Borel $ \Omega \in {\bf Meas} $ ,
5.6 Divergences on state monads
The state monad $ T_S\triangleq S \Rightarrow (- \times S) $ with a state space S is used to represent programs that update the state. We construct divergences on $ T_S $ using divergences $ d_S $ on the state space S in several ways.
5.6.1 Lipschitz constant on states
We first consider the state monad $ T_S $ on Set. We also consider a function $ d_S \colon S^2 \to [0,\infty] $ satisfying $ d_S(s,s) = 0 $ . The following $ {\mathcal{R}}^\times $ -divergence $ {\mathsf\Delta}^{\mathsf{lip},d_S}_I(f_1,f_2) $ on $ T_SI $ measures how much the function pair $ (\pi_2\circ f_1, \pi_2\circ f_2) $ extends the distance between two states before updated. In short, $ {\mathsf\Delta}^{\mathsf{lip},d_S} $ measures the Lipschitz constant on state transformers.
Proposition 13. The family $ {\mathsf\Delta}^{\mathsf{lip},d_S}=\{{\mathsf\Delta}^{\mathsf{lip},d_S}_I\}_{I\in{\bf Set}} $ of $ {\mathcal{R}}^\times $ -divergences on $ T_SI $ defined by:
is a $ \operatorname{Top} $ -relative $ {\mathcal{R}}^\times $ -divergence on $ T_S $ .
For state transformers $ f_1,f_2 \in T_S I $ , their state-updating part is given as functions $ \pi_2\circ f_1, \pi_2\circ f_2 \in S \Rightarrow S $ . When $ f_1 = f_2 = g $ , $ {\mathsf\Delta}^{\mathsf{lip},d_S}_I(g,g) $ is exactly the Lipschitz constant of $ \pi_2\circ g $ .
5.6.2 Distance between state transformers with the same inputs
Suppose that the function $ d_S $ also satisfies the triangle inequality. The following $ {\mathcal{R}}^+ $ -divergence $ {\mathsf\Delta}^{\mathsf{met},d_S}_{I}(f_1,f_2) $ on $ T_SI $ estimates the distance between updated states after the state transformers $ f_1 $ and $ f_2 $ are applied to the same input.
Proposition 14. Suppose that the function $ d_S $ also satisfy the triangle inequality. The family $ {\mathsf\Delta}^{\mathsf{met},d_S}=\{{\mathsf\Delta}^{\mathsf{met},d_S}_I\}_{I\in{\bf Set}} $ of $ {\mathcal{R}}^+ $ -divergences on $ T_SI $ defined by:
is an $ \operatorname{Eq} {} $ -relative $ {\mathcal{R}}^+ $ -divergence on $ T_S $ .
5.6.3 Sup-metric on the state monad on the category of generalized ultrametric spaces
The category $ \mathbf{Gum} $ of generalized ([0,1]-valued) ultrametric spacesFootnote 4 and nonexpansive functions is Cartesian closed (Rutten, Reference Rutten1996, Section 2.2). We consider the state monad $ T_S = S \Rightarrow (- \times S) $ on $ \mathbf{Gum} $ for a fixed space $ (S,d_S)\in\mathbf{Gum} $ . From the definition of exponential objects in $ \mathbf{Gum} $ , $ T_S(I,d_I) $ consists of the set of nonexpansive state transformers with the sup-metric between them. In fact, the metric part of all $ T_S(I,d_I) $ forms a divergence on $ T_S $ .
Proposition 15. The family $ \{ d_{T_S I} \colon (T_S(I,d_I))^2 \to [0,1] \}_{(I,d_I) \in \mathbf{Gum}} $ consisting of the metric part of the spaces $ T_S(I,d_I) $ , given by:
forms an $ \operatorname{Eq} {} $ -relative $ ([0,1],\leq,\max,0) $ -divergence on $ T_S $ .
In the category $ \mathbf{Gum} $ , instead of $ \operatorname{Eq} $ , there is another basic endorelation $ \mathsf{Dist}_0 $ :
By modifying the divergence $ d_{T_S (-)} $ , we obtain a $ \mathsf{Dist}_0 $ -relative $ ([0,1],\leq,\max,0) $ -divergence as below:
Proposition 16. The following forms a $ \mathsf{Dist}_0 {} $ -relative $ ([0,1],\leq,\max,0) $ -divergence on $ T_S $ :
5.7 Combining divergence with cost
In Section 5.2, we have introduced divergences on the monad $ P({\mathbb{N}}\times -) $ modeling nondeterministic choice and cost counting. These divergences are based on the distance/subtractions of costs represented by natural numbers. In this section, we provide an alternative divergences on the combination of a general computational effect T and cost counting. The basic idea is the following: given two computations $ c_1,c_2\in T(N\times I) $ , we discard the value part of $ c_i $ by $ T\pi_1:T(N\times I)\to T(N) $ and measure their difference by the divergence assumed on T.
Let $ ({\mathbb{C}},T) $ be a CC-SM and $ {\mathsf\Delta}\in{\bf Div}(T,\operatorname{Eq} ,1,{\mathcal{Q}}) $ be a divergence and $ (N,1_N\colon 1\to N,(\star)\colon N\times N\to N) $ be a monoid object in $ {\mathbb{C}} $ for cost counting. Then the composite $ T(N \times - ) $ of the monad T and the monoid action monad $ N \times (-) $ again carries a monad structure. We now define a family $ {\mathsf C}({\mathsf\Delta}, N) = \{ {\mathsf C}({\mathsf\Delta},N)_{I} \colon (U(T(N\times I)))^2 \to {\mathcal{Q}} \}_{I \in {\mathbb{C}}} $ of $ {\mathcal{Q}} $ -divergences by:
Proposition 17. The family $ {\mathsf C}({\mathsf\Delta},N) $ is an $ \operatorname{Eq} $ -relative $ {\mathcal{Q}} $ -divergence on $ T(N\times -) $ .
For example, the divergence $ {\mathsf C}(\mathsf{KL}, {{\mathbb{R}}}) $ on the composite monad $ G({\mathbb{R}} \times -) $ on Meas describes Kullback–Leibler divergence between distributions of costs in the probabilistic computations with real-valued costs. Intuitively, the side condition $ \mathsf{KL}_{{\mathbb{R}} \times I} (\mu_1, \mu_2) \leq \mathsf{KL}_{{\mathbb{R}}} (G \pi_1 \mathbin{\bullet} \mu_1, G \pi_1 \mathbin{\bullet} \mu_2) $ in the definition of $ {\mathsf C}(\mathsf{KL}, {{\mathbb{R}}}) $ means that the difference between $ \mu_1 $ and $ \mu_2 $ lies only in the costs.
5.8 Preorders on monads
To explore the generality of our framework, we look at the case where the divergence domain is $ {\mathcal{B}}=(\{0\ge 1\},1,\times) $ ; here, $ \times $ is the numerical multiplication. We identify an indexed family $ {\mathsf\Delta} = \{{\mathsf\Delta}^{}_{I} \colon (U(TI))^2 \to {\mathcal{B}} \}_{I \in {\mathbb{C}}} $ of $ {\mathcal{B}} $ -divergences and a family of adjacency relations $ \tilde {\mathsf\Delta} (1) I \triangleq \{ (c_1, c_2) \mid {\mathsf\Delta}^{}_{I} (c_1, c_2) \leq 1 \}_{I\in{\bf Set}} $ .
We point out a connection between $ \operatorname{Eq} $ -relative $ {\mathcal{B}} $ -divergences and preorders on monads studied in Katsumata and Sato (Reference Katsumata, Sato and Pfenning2013) and Sato (Reference Sato, Jacobs, Silva and Staton2014). A preorder on a monad T on Set assigns a preorder $ \sqsubseteq_I $ on TI for each $ I\in{\bf Set} $ , and this assignment satisfies:
• (Substitutivity) For any function $ f\colon I\to TJ $ and $ c_1,c_2\in TI $ , $ c_1\sqsubseteq _Ic_2 $ implies $ f^\sharp (c_1)\sqsubseteq_J f^\sharp (c_2) $ .
• (Congruence) For any function $ f_1,f_2\colon I\to TJ $ , if $ f_1(x)\sqsubseteq_Jf_2(x) $ holds for any $ x\in I $ , then $ f_1^\sharp (c)\sqsubseteq_Jf_2^\sharp (c) $ holds for any $ c\in TI $ .
Proposition 18. A preorder on a monad T on Set bijectively corresponds to an $ \operatorname{Eq} $ -relative $ {\mathcal{B}} $ -divergence $ {\mathsf\Delta} $ on T such that each $ \tilde{\mathsf\Delta}(1)I $ is a preorder.
For a preorder $ \sqsubseteq $ on a monad T on Set, by $ {\mathsf\Delta}^{\sqsubseteq}_{ }{} $ we mean the divergence corresponding to $ \sqsubseteq $ by Proposition 18 (in fact, we have $ \widetilde{{\mathsf\Delta}^{\sqsubseteq}_{ }{}}(1)I = {\sqsubseteq_I} $ and $ \widetilde{{\mathsf\Delta}^{\sqsubseteq}_{ }{}}(0)I = TI \times TI $ for all set I).
6. Properties of Divergences on Monads
6.1 Divergences on monads as structures in $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $
In this section, we examine divergences on monads from the view point of the monoidal structure of $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ . For any CC $ {\mathbb{C}} $ , the category $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ has a symmetric monoidal structure, whose unit and tensor product are given by:
The coherence isomorphisms of this symmetric monoidal structure are inherited from the Cartesian monoidal structure on $ {\mathbb{C}} $ . Moreover, $ V_{{\mathcal{Q}},{\mathbb{C}}}:{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})\to{{\mathbb{C}}} $ becomes a symmetric strict monoidal functor of type $ ({\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}),{\bf I},(\otimes))\to ({\mathbb{C}},1,(\times)) $ .
6.1.1 Enrichments of Kleisli categories induced by divergences
Let $ ({\mathbb{C}},T) $ be a CC-SM. We first show that a nongraded divergence on a monad T attaches a $ {\bf Div}_{{\mathcal{Q}}}({{\bf Set}}) $ - enrichment on the Kleisli category $ {\mathbb{C}}_T $ of T. Attaching an enrichment to an ordinary category is formulated as follows.
Definition 19. A $ {\bf Div}_{{\mathcal{Q}}}({{\bf Set}}) $ -enrichment of a category $ \mathbb D $ is a family $ \{ d_{I, J}:\mathbb D(I,J)^2\to{\mathcal{Q}} \}_{I, J \in \mathbb D} $ of $ {\mathcal{Q}} $ -divergences on the homset $ \mathbb D (I, J) $ such that the following inequalities hold
Such an enrichment determines a $ {\bf Div}_{{\mathcal{Q}}}({{\bf Set}}) $ -enriched category $ \mathbb D^d $ , whose object collection and homobjects are given by:
The identity and composition morphisms of $ \mathbb D^d $ :
are inherited from $ \mathbb D $ ; they are guaranteed to be nonexpansive by conditions (3) and (4).
We characterize $ {\bf Div}_{{\mathcal{Q}}}({{\bf Set}}) $ -enrichments of $ \mathbb D $ as $ {\bf Div}_{{\mathcal{Q}}}({{\bf Set}}) $ -enriched categories whose change-of-enriching-category along $ V_{{\mathcal{Q}},{\bf Set}} $ coincides with $ \mathbb D $ .
Proposition 20. Let $ \mathbb D $ be a category. There is a bijective correspondence between 1) a $ {\bf Div}_{{\mathcal{Q}}}({{\bf Set}}) $ -enrichment $ \{d_{I,J}\}_{I,J\in\mathbb D} $ of $ \mathbb D $ and 2) a $ {\bf Div}_{{\mathcal{Q}}}({{\bf Set}}) $ -enriched category $ \mathbb E $ such that the change-of-enriching-category of $ \mathbb E $ by $ V_{{\mathcal{Q}},{\bf Set}}:{\bf Div}_{{\mathcal{Q}}}({{\bf Set}})\to{{\bf Set}} $ is equal to $ \mathbb D $ .
We relate conditions (3) and (4) with the unit reflexivity and composability conditions in the definition of divergence on monad (Definition 6).
Theorem 21. Let $ ({\mathbb{C}}, T) $ be a CC-SM, $ E:{\mathbb{C}}\to{\bf BRel}({\mathbb{C}}) $ be a basic endorelation such that $ R_{E 1} \neq \emptyset $ Footnote 5, $ {\mathcal{Q}} $ be a divergence domain and $ {\mathsf\Delta}=\{{\mathsf\Delta}^{}_{I}:(U(TI))^2\to{\mathcal{Q}} \}_{I\in{\mathbb{C}}} $ be a family of $ {\mathcal{Q}} $ -divergences on TI. Define a family $ d^{\mathsf\Delta}=\{ d^{\mathsf\Delta}_{I, J}:{\mathbb{C}}_T(I,J)^2\to{\mathcal{Q}} \}_{I, J \in {\mathbb{C}}} $ of $ {\mathcal{Q}} $ -divergences on the homset $ {\mathbb{C}}_T (I, J) $ of the Kleisli category $ {\mathbb{C}}_T $ by:
Then $ d^{\mathsf\Delta} $ is a $ {\bf Div}_{{\mathcal{Q}}}({{\bf Set}}) $ -enrichment of $ {\mathbb{C}}_T $ if and only if $ {\mathsf\Delta} $ is an E-relative $ {\mathcal{Q}} $ -divergence on T.
6.1.2 Internalizing divergences in $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $
We seek a further characterization of the $ {\mathcal{Q}} $ -divergence (5) given to each homset of $ {\mathbb{C}}_T $ . Under a strengthened assumption, we relate it with the closed structure with respect to the tensor product of $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ . This allows us to internalize divergences on monads as structures in $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ .
Let $ ({\mathbb{C}},T) $ be a CCC-SM and $ {\mathcal{Q}}=(Q,\le,0,(+)) $ be a divergence domain whose monoid operation $ (+) $ preserves the largest element $ \top\in{\mathcal{Q}} $ , that is, $ x+\top=\top $ . A consequence of this strengthened assumption is the following:
Lemma 22. Let $ X\in{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ be an object such that its $ {\mathcal{Q}} $ -divergence $ d_X $ takes values in $ \{0,\top\}\subseteq{\mathcal{Q}} $ . Define a functor $ X\multimap(-):{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})\to{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $
here $ \lfloor{-}\rfloor:U(I\Rightarrow J)\to{\mathbb{C}}(I,J) $ is the bijection given in Section 2. Then it is a left adjoint to the functor $ (-)\otimes X:{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})\to{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ tensoring with X. Moreover, $ V_{{\mathcal{Q}},{\mathbb{C}}}:{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})\to{{\mathbb{C}}} $ is a map of adjunction (Mac Lane, Reference Mac Lane1998, Section IV.7) in the following sense:
The $ {\mathcal{Q}} $ -divergence $ d_{X\multimap Y} $ in (6) is similar to the sup part of the composability condition in Definition 6. We exploit this similarity to express the unit reflexivity and composability conditions of divergence on monad (Definition 6) using the internal hom functor $ X\multimap(-) $ . First, we define the uncurried bind morphism $ \mathrm{ub}_{I,J}:TI\times(I\Rightarrow TJ)\rightarrow TJ $ by:
Next, for a basic endorelation $ E:{\mathbb{C}}\to{\bf BRel}({\mathbb{C}}) $ , we define the functor $ :{E}^{\mathcal{Q}}{\mathbb{C}}\to{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ by:
Theorem 23. Let $ ({\mathbb{C}},T) $ be a CCC-SM, $ (M,\le,1,(\cdot)) $ be a grading monoid, $ {\mathcal{Q}}=(Q,\le,0,(+)) $ be a divergence domain whose monoid operation $ (+) $ satisfies $ x+\top=\top $ , and $ E:{\mathbb{C}}\to{\bf BRel}({\mathbb{C}}) $ be a basic endorelation. Let $ {\mathsf\Delta}=\{{\mathsf\Delta}^{m}_{I}:U(TI)^2\to{\mathcal{Q}}\}_{m\in M,I\in{\mathbb{C}}} $ be a doubly indexed family of $ {\mathcal{Q}} $ -divergences on TI, regarded as a family of $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ -objects. Then
(1) $ {\mathsf\Delta} $ satisfies the E-unit reflexivity condition if and only if for any $ I\in{\mathbb{C}} $ , the following nonexpansivity holds on the global element $ \lceil{\eta_I}\rceil:1\to I\Rightarrow TI $ corresponding to the monad unit:
\begin{equation*} \lceil{\eta_I}\rceil\in{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) ({\bf I}, {E}^{\mathcal{Q}}I\multimap {\mathsf\Delta}^{1}_{I}). \end{equation*}-
(2) $ {\mathsf\Delta} $ satisfies the E-composablity condition if and only if for any $ I,J\in{\mathbb{C}} $ and $ m,n\in M $ , the following nonexpansivity holds on the uncurried bind morphism $ \mathrm{ub}_{I,J}:TI\times(I\Rightarrow TJ)\rightarrow TJ $ :
\begin{equation*} \mathrm{ub}_{I,J}\in{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})({\mathsf\Delta}^{m}_{I}\otimes ( {E}^{\mathcal{Q}}I\multimap{\mathsf\Delta}^{n}_{J}), {\mathsf\Delta}^{m\cdot n}_{J}). \end{equation*}
Azevedo de Amorim et al. (Reference Azevedo de Amorim, Gaboardi, Hsu and Katsumata2019) formalized families of composable divergences as parameterized assignments in weakly closed monoidal refinements. Roughly speaking, they adopted the equivalence (2) of Theorem 23 as the definition of parameterized assignment. However, divergence on monads and parameterized assignments are built on slightly different categorical foundations, and their generalities are incomparable. Notable differences from parameterized assignments are: 1) divergences on monads are defined with respect to basic endorelations, and 2) the underlying category of divergences on monads is any CCs, while parameterized assignments requires a closed structure on their underlying category.
6.1.3 Divergences on monads and divergence liftings of monads
We next relate graded divergences on monads and monad-like structures on the category $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ of $ {\mathcal{Q}} $ -divergences on $ {\mathbb{C}} $ -objects. What we mean by monad-like structures is graded divergence liftings of monads on $ {\mathbb{C}} $ , which we introduce below. It is a graded monad on $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ (Katsumata, Reference Katsumata2014; Smirnov, Reference Smirnov2008) whose unit and multiplication are inherited from a monad on $ {\mathbb{C}} $ .
Definition 24. Let $ ({\mathbb{C}}, T) $ be a CC-SM, $ (M,\le,1,(\cdot)) $ be a grading monoid and $ {\mathcal{Q}} $ be a divergence domain. An M-graded $ {\mathcal{Q}} $ -divergence lifting of T is a mapping $ \dot{T} : M \times{\bf Obj}({\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) )\rightarrow{\bf Obj}({\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})) $ such that (below V stands for the forgetful functor $ V_{{\mathcal{Q}},{\mathbb{C}}}:{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})\to{{\mathbb{C}}} $ )
(1) $ V (\dot{T} m X) = T (V X) $
(2) $ m \leq n $ implies $ \dot{T} m X \preceq_{T(VX)} \dot{T} n X $
(3) $ \eta_{V X} \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) (X, \dot{T} 1 X) $
(4) $ \mu_{V X} \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) (\dot{T} m (\dot{T} n X), \dot{T} (m \cdot n) X) $ .
Let $ E : {\mathbb{C}} \rightarrow {\bf BRel}({\mathbb{C}}) $ be a basic endorelation. We say that an M-graded $ {\mathcal{Q}} $ -divergence lifting $ \dot{T} $ of T is E-strong if the strength $ \theta $ of T satisfies
We write $ {\bf SGDLift}(T, E,M,{\mathcal{Q}}) $ for the collection of E-strong M-graded $ {\mathcal{Q}} $ -divergence liftings of T. We introduce a partial order $ \preceq $ (reusing the notation for the partial order between divergences in Definition 2) on $ {\bf SGDLift}(T, E,M,{\mathcal{Q}}) $ by:
We will later see a similar concept of strong graded relational lifting of monad in Definition 37. Divergence liftings and relational liftings are actually instances of a common general definition of strong graded lifting of monad (Katsumata, Reference Katsumata2014), but in this paper we omit this general definition.
In the following theorem, we show that every divergence on a monad can be expressed as the composite of a graded divergence lifting and the divergence corresponding to a basic endorelation.
Theorem 25. Let $ ({\mathbb{C}}, T) $ be a CC-SM, $ (M,\le,1,(\cdot)) $ be a grading monoid, $ {\mathcal{Q}} $ be a divergence domain, and $ E : {\mathbb{C}} \rightarrow {\bf BRel}({\mathbb{C}}) $ be a basic endorelation. For any $ {\mathsf\Delta} \in {\bf Div}(T, E,M,{\mathcal{Q}}) $ , define a mapping $ [{\mathsf\Delta}]:M\times{\bf Obj}({\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}))\to{\bf Obj}({\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})) $ by:
where
Then $ [{\mathsf\Delta}] $ is an M-graded $ {\mathcal{Q}} $ -divergence lifting such that $ {\mathsf\Delta}^m_I = [{\mathsf\Delta}] m ( {E}^{\mathcal{Q}}I) $ .
When $ M = 1 $ , Theorem 25 implies that the assignment $ I\mapsto {\mathsf\Delta}_I $ extends to the $ {E}^{\mathcal{Q}} $ - relative monad $ [{\mathsf\Delta}]\circ :{E}^{\mathcal{Q}}{\mathbb{C}}\to{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ in the sense of Altenkirch et al. (Reference Altenkirch, Chapman and Uustalu2015).
When we strengthen the assumptions on $ ({\mathbb{C}}, T) $ and $ {\mathcal{Q}} $ as done in Section 6.1.2, we obtain a sharper correspondence between divergences on monads and strong graded divergence liftings of monads.
Theorem 26. Let $ ({\mathbb{C}}, T) $ be a CCC-SM, M be a grading monoid, $ {\mathcal{Q}} $ be a divergence domain such that $ (+) $ satisfies $ x+\top=\top $ and $ E : {\mathbb{C}} \rightarrow {\bf BRel}({\mathbb{C}}) $ be a basic endorelation. Then there exists an adjunction between partial orders:
where $ \langle \dot{T}\rangle m I \triangleq \dot{T} m ( {E}^{\mathcal{Q}}I) $ .
6.2 Generation of divergences
It has been shown that DP can be interpreted as hypothesis testing (Kairouz et al., Reference Kairouz, Oh and Viswanath2015; Wasserman and Zhou, Reference Wasserman and Zhou2010). Given a query $ c\colon I\to G J $ and adjacent datasets $ (d_1,d_2) \in R_{\mathrm{adj}}\subseteq I^2 $ , we consider the following hypothesis testing with the null and alternative hypotheses:
For any rejection region $ S \in \Sigma_J $ , the Type I and Type II errors are then represented by $ \Pr[c(d_1) \in S] $ and $ \Pr[c(d_2) \notin S] $ , respectively. Kairouz et al. (Reference Kairouz, Oh and Viswanath2015) showed that c is $ (\varepsilon,\delta) $ -DP if and only if for any adjacent datasets $ (d_1,d_2) \in R_{\mathrm{adj}}\subseteq I^2 $ , the pair of Type I error and Type II error lands in the privacy region $ R(\varepsilon,\delta) $ :
They also showed that this is equivalent to the testing using probabilistic decision rules (Kairouz et al., Reference Kairouz, Oh and Viswanath2015, Corollary 2.3):
Later Balle et al. (Reference Balle, Barthe, Gaboardi, Hsu, Sato, Chiappa and Calandra2020) generalized this probabilistic variant of hypothesis testing to general statistical divergences and arrived at a notion of k-generatedness of statistical divergences ( $ k \in {\mathbb{N}} \cup \{\infty\} $ ). Following their generalization, we introduce the concept of $ \Omega $ -generatedness of divergences on monads.
Definition 27. Let $ \Omega \in {\mathbb{C}} $ . A divergence $ {\mathsf\Delta}\in{\bf Div}(T, E,M,{\mathcal{Q}}) $ is $ \Omega $ -generated if for any $ m \in M $ , $ I\in{\mathbb{C}} $ and $ c_1,c_2 \in U(TI) $ ,
An equivalent definition of $ {\mathsf\Delta} \in {\bf Div}(T, E,M,{\mathcal{Q}}) $ being $ \Omega $ -generated is: the following holds for any $ m\in M,I\in{\mathbb{C}},c_1,c_2\in U(TI),v\in{\mathcal{Q}} $ :
Here, $ \tilde{\mathsf\Delta}(m,v)\Omega $ is the binary relation $ \{ (c_1, c_2) \mid {\mathsf\Delta}^{m}_{\Omega} (c_1, c_2) \leq v \} $ ; see also (14). For an $ \Omega $ -generated divergence $ {\mathsf\Delta} $ , its component $ {\mathsf\Delta}^{m}_{\Omega} $ at $ \Omega $ is an essential part that determines all components $ {\mathsf\Delta}^{m}_{I} $ of $ {\mathsf\Delta} $ . When a divergence is shown to be $ \Omega $ -generated, the calculation of the codensity lifting $ T^{[ {{\mathsf\Delta}}]} $ given in Section 7 will be simplified (Section 7.1).
We illustrate $ \Omega $ -generatedness of various divergences. First, we show the $ \Omega $ -generatedness of divergences on the Giry monad G in Tables 2 and 3.
• Divergence $ \mathsf{DP} $ is generated over the two-point discrete space 2 (Balle et al., Reference Balle, Barthe, Gaboardi, Hsu, Sato, Chiappa and Calandra2020, Section B.7). The binary relation $ (\widetilde{\mathsf{DP}}(\varepsilon,\delta)2) $ coincides with the privacy region $ R(\varepsilon,\delta) $ .
• Divergence $ \mathsf{TV} $ is also generated over 2 (Balle et al., Reference Balle, Barthe, Gaboardi, Hsu, Sato, Chiappa and Calandra2020, Section C.1).
• Divergences $ \mathsf{Re}^\alpha $ , $ \mathsf{Chi} $ , $ \mathsf{HD} $ , and $ \mathsf{KL} $ are generated over the countably infinite discrete space $ \mathbb{N} $ . In contrast, they are not N-generated for every finite discrete space N (Balle et al., Reference Balle, Barthe, Gaboardi, Hsu, Sato, Chiappa and Calandra2020, Sections B.5 and B.9).
On the sub-Giry monad $ G_{s} $ , the divergence $ \mathsf{DP} $ is 1-generated, and the total variation distance $ \mathsf{TV} $ is 2-generated.
Proposition 28. The divergence $ \mathsf{DP} \in {\bf Div}(G_{s},\operatorname{Eq}, {\mathcal{R}}^+,{\mathcal{R}}^+) $ is 1-generated.
Proposition 29. The divergence $ \mathsf{TV} \in {\bf Div}(G_{s},\operatorname{Eq}, 1,{\mathcal{R}}^+) $ is not 1-generated but 2-generated.
$ \Omega $ -Generatedness of preorders on monads. We relate $ \Omega $ -generatedness of divergences and preorders on monads studied in Katsumata and Sato (Reference Katsumata, Sato and Pfenning2013). Let T be a monad on Set and $ \Omega $ be a set. Katsumata and Sato (Reference Katsumata, Sato and Pfenning2013) introduced the concept of congruent and substitutive preorders on $ T\Omega $ as those satisfying:
• (Substitutivity) For any function $ f\colon \Omega\to T\Omega $ and $ c_1,c_2\in T\Omega $ , $ c_1 \leq c_2 $ implies $ f^\sharp(c_1) \leq f^\sharp(c_2) $ .
• (Congruence) For any function $ f_1,f_2\colon J\to T\Omega $ , if $ f_1(x) \leq f_2(x) $ holds for any $ x\in J $ , then $ f_1^\sharp (c) \leq f_2^\sharp (c) $ holds for any $ c\in T\Omega $ .
For instance, any component of a preorder on T at $ \Omega $ forms a congruent and substitutive preorder on $ T\Omega $ . We write $ \mathbf{CSPre}(T,\Omega) $ for the set of all congruent and substitutive preorders on $ T\Omega $ , and $ \mathbf{Pre}(T) $ for the collection of all preorders on T. Katsumata and Sato (Reference Katsumata, Sato and Pfenning2013) gave a construction $ [-]^\Omega\colon \mathbf{CSPre}(T,\Omega)\to\mathbf{Pre}(T) $ of preorders on T from congruent and substitutive preorders on $ T\Omega $ :
The constructed preorders on T are $ \Omega $ -generated in the following sense:
Proposition 30. For any $ {\leq} \in \mathbf{CSPre}(T,\Omega) $ , the $ {\mathcal{B}} $ -divergence $ {\mathsf\Delta}^{[\leq]^\Omega}_{ }{} $ corresponding to the preorder $ {[\leq]}^\Omega $ on T is $ \Omega $ -generated (see Proposition 18 for the correspondence).
Applying this proposition, we can determine $ \Omega $ -generatedness of preorders on monads:
• If the monad T has a rank $ \alpha $ , the construction $ [-]^\alpha $ is bijective (Katsumata and Sato, Reference Katsumata, Sato and Pfenning2013, Theorem 7). Hence for such a monad, each preorder on T corresponds to an $ \alpha $ -generated $ {\mathcal{B}} $ -divergence.
• For the subprobability distribution monad $ D_s $ on Set, Sato (Reference Sato, Jacobs, Silva and Staton2014) identified all preorders on $ D_s $ : there are 41 preorders on $ D_s $ . Among them, 25 preorders are 1-generated, while 16 preorders are 2-generated (Sato, Reference Sato, Jacobs, Silva and Staton2014, Proposition 6.3).
6.3 An adjunction between quantitative equational theories and divergences
Mardare et al. (Reference Mardare, Panangaden and Plotkin2016) introduced a concept of quantitative equational theory as an algebraic presentation of monads on the category of pseudometric spaces. A quantitative equational theory is an equational theory with indexed equations $ {t}=_{\varepsilon}{u} $ having the axioms of pseudometric spaces, plus suitable axioms reflecting properties of quantitative algebras. A quantitative equational theory determines a pseudometric on the set of $ \Sigma $ -terms.
Consider a set $ \Sigma $ of function symbols of finite arity. If n is the arity of a function $ f \in \Sigma $ , we write $ f \colon n \in \Sigma $ . Let $ \Omega $ be a set of variables, and let $ T_\Sigma \Omega $ be the $ \Sigma $ -term algebra over $ \Omega $ . For $ f \colon n \in \Sigma $ and $ t_1,\ldots,t_n \in T_\Sigma \Omega $ , we write $ f(t_1,\ldots,t_n) $ for the term obtained by applying f to $ t_1,\ldots,t_n $ . The construction $ \Omega \mapsto T_\Sigma \Omega $ forms a (strong) monad on Set whose unit is given by $ \eta_\Omega (x) = x $ , and whose Kleisli extension $ h^\sharp \colon T_\Sigma I \to T_\Sigma \Omega $ of function $ h \colon I \to T_\Sigma \Omega $ is given inductively by:
A substitution of $ \Sigma $ -terms over $ \Omega $ is a function $ \sigma \colon \Omega \to T_\Sigma \Omega $ . For $ t \in T_\Sigma \Omega $ , we call $ \sigma^\sharp(t) $ the substitution of $ \sigma $ to t. We define the set of indexed equations of terms by:
Here, the index $ \varepsilon $ runs over nonnegative rational numbers. A conditional quantitative equation is a judgment of the following form:
the left-hand side of turnstile ( $ \vdash $ ) is called hypothesis and the right-hand side conclusion. By $ \mathbb E(T_\Sigma \Omega ) $ , we mean the set of conditional quantitative equations. For any countable subset $ \Gamma $ of $ \mathbb V (T_\Sigma \Omega ) $ and any substitution $ \sigma \colon \Omega \to T_\Sigma \Omega $ , we define $ \sigma(\Gamma) \triangleq \{ {\sigma^\sharp(t)}=_{\varepsilon} {\sigma^\sharp(u)}{ } \mid {t}=_{\varepsilon}{u} \in \Gamma \} $ .
Definition 31. Quantitative Equational Theory (Mardare et al., Reference Mardare, Panangaden and Plotkin2016, Definition 2.1). A quantitative equational theory (QET for short) of type $ \Sigma $ over $ \Omega $ is a set $ U \subseteq \mathbb E(T_\Sigma \Omega ) $ closed under the rules summarized as Figure 1. We write $ {\bf QET}(\Sigma ,\Omega ) $ for the set of QETs of type $ \Sigma $ over $ \Omega $ . We regard it as a poset $ ({\bf QET}(\Sigma ,\Omega ),\subseteq) $ by the set inclusion order. Given a set $ U_0 $ of conditional quantitative equations of type $ \Sigma $ over $ \Omega $ , by $ \overline{U_0}^{\mathrm{QET}({\Sigma},{ })} \Omega $ we mean the least QET of type $ \Sigma $ over $ \Omega $ including $ U_0 $ .
The goal of this section is to establish an adjunction between the poset of quantitative equational theories and the poset of pseudometricsFootnote 6 on free-algebra monads on Set. More specifically, we construct the following adjunction and isomorphism between posets:
which are subsequently defined (Definition 32 for CSPMet, Definition 33 for PMet and equations (9)–(12) for morphisms). The poset in the middle is that of congruent and substitutive pseudometrics, which are a quantitative analog of congruent and substitutive preorders appeared in Section 6.2.
Definition 32. Let T be a monad on Set and $ \Omega \in {\bf Set} $ . A congruent and substitutive pseudometric (CS-PMet for short) on $ T\Omega $ is a pseudometric $ d \colon (T\Omega )^2 \to {\mathcal{R}}^+ $ on $ T\Omega $ satisfying
• (Substitutivity) For all functions $ f\colon \Omega \to T\Omega $ and $ c_1,c_2\in T\Omega $ , $ d(f^\sharp(c_1),f^\sharp(c_2)) \leq d(c_1,c_2) $ .
• (Congruence) For all sets I, functions $ f_1,f_2 \colon I\to T\Omega $ and $ c \in TI $ , $ d(f_1^\sharp(c),f_2^\sharp(c)) \leq \sup_{i \in I} d(f_1(i),f_2(i)) $ .
By $ {\bf CSPMet}(T, \Omega ) $ , we mean the set of CS-PMets on $ T\Omega $ . We then make it into a poset $ ({\bf CSPMet}(T, \Omega ),\preceq) $ where $ \preceq $ is the restriction of the partial order $ \preceq_{T\Omega} $ in Definition 2 to CS-PMets.
Definition 33. Let T be a monad on Set and $ \Omega \in {\bf Set} $ . By $ {\bf PMet}(T,\Omega ) $ we mean the collection of $ \Omega $ -generated $ \operatorname{Eq} $ -relative $ {\mathcal{R}}^+ $ -divergences $ {\mathsf\Delta} $ on T such that each component $ {\mathsf\Delta}^{}_{I} $ is a pseudometric. We restrict the partial order $ \preceq $ on $ {\bf Div}(T,\operatorname{Eq},1,{\mathcal{R}}^+) $ to $ {\bf PMet}(T,\Omega ) $ .
We next introduce monotone functions appearing in (8):
Proposition 34. The functions $ D[-],U[-],\mathrm{Gen},(-)_\Omega $ defined as (9)-(12) are all well-defined monotone functions having types given in (8).
That D[U] is a pseudometric is shown in the beginning of Mardare et al. (Reference Mardare, Panangaden and Plotkin2016, Section 5). Here, we additionally show that it enjoys congruence and substitutivity of Definition 32. The function Gen is taken from the right-hand side of the definition of $ \Omega $ -generatedness (Definition 27). The function $ (-)_\Omega $ simply extracts the $ \Omega $ component of a given divergence.
Theorem 35. For any set $ \Sigma $ of function symbols with finite arity and set $ \Omega $ , the following holds for the monotone functions in (8):
(1) Gen is the inverse of $ (-)_\Omega $ .
(2) We have an adjunction $ U[-]\dashv D[-] $ satisfying $ D[U[-]] = {\rm id} $ :
(13)
Intuitively, the right adjoint $ D[-] $ extracts the pseudometric on $ T_\Sigma \Omega $ from a given QET. The left adjoint $ U[-] $ constructs the least QET containing all information of a given pseudometric on $ T_\Sigma \Omega $ . The range of $ U[-] $ is characterized as the set of unconditional QETs of type $ \Sigma $ over $ \Omega $ defined below (see also Mardare et al. (Reference Mardare, Panangaden and Plotkin2017, Section 3)):
Therefore, the adjunction (13) cuts down to the following isomorphism between posets, stating that unconditional QETs of type $ \Sigma $ over $ \Omega $ are equivalent to $ \Omega $ -generated pseudometrics on $ T_\Sigma $ :
Theorem 36. $ (\mathbf{UQET}(\Sigma ,\Omega ),\subseteq) \cong({\bf CSPMet}(T_\Sigma , \Omega ),\preceq) \cong({\bf PMet}(T_\Sigma ,\Omega ),\preceq) $ .
7 Graded Strong Relational Liftings for Divergences
We have introduced the concept of divergence on monad for measuring quantitative difference between two computational effects. To integrate this concept with relational program logic, we employ a semantic structure called graded strong relational lifting of monad. It is introduced for the semantics of approximate probabilistic relational Hoare logic for the verification of differential privacy (Barthe et al., Reference Barthe, Köpf, Olmedo and Béguelin2012), then later used in various program logics (Barthe et al., Reference Barthe, Gaboardi, Arias, Hsu, Kunz and Strub2014, Reference Barthe, Gaboardi, Arias, Hsu, Roth and Strub2015; Barthe and Olmedo, Reference Barthe and Olmedo2013; Sato, Reference Sato2016; Sato et al., Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019). Independently, it is also introduced as a semantic structure for effect system (Katsumata, Reference Katsumata2014). Liftings introduced in the study of differential privacy are designed to satisfy a special property called fundamental property (Barthe et al., Reference Barthe, Köpf, Olmedo and Béguelin2012, Theorem 1): when we supply the equivalence relation to the lifting, it returns the adjacency relation of the divergence. This special property is the key to express the differential privacy of probabilistic programs in relational program logics.
In this paper, we present a general construction of graded strong relational liftings from divergences on monads. First, we recall its definition (Gaboardi et al., Reference Gaboardi, Katsumata, Orchard, Sato and Yoshida2021; Katsumata, Reference Katsumata2014).
Definition 37. Let $ ({\mathbb{C}},T) $ be a CC-SM and $ (M, \leq, 1, (\cdot)) $ be a grading monoid. An M-graded strong relational lifting $ \dot{T} $ of T is a mapping $ \dot{T} : M \times {\bf Obj}({\bf BRel}({\mathbb{C}})) \rightarrow {\bf Obj}({\bf BRel}({\mathbb{C}})) $ satisfying the following conditions:
(1) $ p_{{\mathbb{C}}}(\dot{T} m X)=(T X_1, T X_2) $ , and $ m \leq m' $ implies $ \dot{T} m X \leq \dot{T} m' X $ .
(2) $ (\eta_{X_1}, \eta_{X_2}) : X \mathbin{\dot\rightarrow} \dot{T} 1 (X) $ .
(3) $ (f_1, f_2) : X \mathbin{\dot\rightarrow} \dot{T} m (Y) $ implies $ (f^\sharp_1, f^\sharp_2) : \dot{T} m' X \mathbin{\dot\rightarrow} \dot{T} (m \cdot m') Y $ .
(4) $ (\theta_{X_1, Y_1}, \theta_{X_2, Y_2}) : X \mathbin{\dot\times} \dot{T} m Y \mathbin{\dot\rightarrow} \dot{T} m (X \mathbin{\dot\times} Y) $ .
Our interest is in the graded strong relational lifting that carries the information of a given divergence $ {\mathsf\Delta}\in{\bf Div}(T, E,M,{\mathcal{Q}}) $ . We formulate such liftings by the following fundamental property. First, we define the adjacency relation of $ {\mathsf\Delta} $ by:
Remark that $ \tilde{\mathsf\Delta} $ is monotone on m and v.
Definition 38. We say that an $ M\times {\mathcal{Q}} $ -graded strong relational lifting $ \dot T $ of T satisfies the fundamental property with respect to $ {\mathsf\Delta}\in{\bf Div}(T, E,M,{\mathcal{Q}}) $ if the following holds
Theorem 39. Let $ ({\mathbb{C}},T) $ be a CC-SM, $ (M, \leq, 1, (\cdot)) $ be a grading monoid, $ {\mathcal{Q}}=(Q,\le,0,(+)) $ be a divergence domain and $ {\mathsf\Delta} = \{{\mathsf\Delta}^{m}_{I} \colon (U(TI))^2\to {\mathcal{Q}} \}_{m\in M,I\in{\mathbb{C}}} $ be a doubly indexed family of $ {\mathcal{Q}} $ -divergences on TI satisfying monotonicity on m (Definition 6). Define the following mapping $ T^{[{\mathsf\Delta}]}:(M\times{\mathcal{Q}})\times{\bf Obj}({\bf BRel}({\mathbb{C}}))\to{\bf Obj}({\bf BRel}({\mathbb{C}})) $ :
(1) The mapping $ T^{[{\mathsf\Delta}]} $ is an $ M \times {\mathcal{Q}} $ -graded strong relational lifting of T.
(2) Let $ E:{\mathbb{C}}\to{\bf BRel}({\mathbb{C}}) $ be a basic endorelation. Then,
(S)(C)
Intuitively, $ T^{[ \mathsf\Delta]} $ is a graded version of the codensity lifting (Katsumata et al., Reference Katsumata, Sato and Uustalu2018) of T along the specific fibration $ p_{{\mathbb{C}}}:{\bf BRel}({\mathbb{C}})\to{{\mathbb{C}}}^2 $ . We extend the codensity lifting with the grading mechanism in the same way as the graded $ \top\top $ -lifting in Katsumata (Reference Katsumata2014). The graded codensity lifting is also a generalization of the graded relational lifting for DP given in Sato (Reference Sato2016).
Proof. (Proof of (1)) Proving conditions 1–3 of graded strong relational lifting (Definition 37) are routine generalization of Katsumata et al. (Reference Katsumata, Sato and Uustalu2018) and Katsumata (Reference Katsumata2014, Section 5); thus omitted here (see Lemma 50 in appendix).
Condition 4 of Definition 37 needs a special attention because in general codensity lifting does not automatically lift strength. The current setting works because of our particular choice of the category of binary relations over $ {\mathbb{C}} $ . We prove condition 4 as follows. Since $ f_i \mathbin{\bullet} j = f \mathbin{\bullet} \langle i, j \rangle $ holds for any $ j\in UJ $ , we have the equivalence:
From this, condition 3 (law of graded Kleisli extension), and the equation (1) on the strength of a CC-SM, we prove condition 4 from condition 2 (unit law): for all $ m \in M $ and $ v \in {\mathcal{Q}} $ , we have
(Proof of (2)-(S)) We show the equivalence of $ {\mathsf\Delta} $ being E-unit-reflexive and the implication:
We suppose that the above implication holds. We fix $ I \in {\mathbb{C}} $ . Let $ (i,j) \in EI $ . By instantiating the whole implication with $ m=1,v=0,c=\eta_I\mathbin{\bullet} i,c'=\eta_I\mathbin{\bullet} j $ , the middle part of (15) becomes
which is trivially true. Therefore, we conclude $ {\mathsf\Delta}^{m}_{I}(\eta_I\mathbin{\bullet} i,\eta_I\mathbin{\bullet} j)\le 0 $ for any $ (i,j) \in EI $ , that is, E-unit reflexivity holds.
Conversely, we suppose that $ {\mathsf\Delta} $ satisfies the unit reflexivity. We take I,m,v,c,c’ of appropriate type and assume the middle part of (15). By instantiating it with $ J=I,m'=1,v'=0,k=l=\eta_I $ , we conclude $ {\mathsf\Delta}^{m}_{I}(c,c')\le v $ .
(Proof of (2)-(C)) We show the equivalence of $ {\mathsf\Delta} $ being E-composable and the implication $ \forall{I \in {\mathbb{C}},m\in M,v\in {\mathcal{Q}}}~.~\tilde{\mathsf\Delta} (m,v) I\le T^{[ {\mathsf\Delta}]} (m,v)(EI) $ as follows:
The first two equivalences are obtained by expanding the definitions of $ {\bf BRel}({\mathbb{C}}) $ , $ T^{[ {\mathsf\Delta}]} {} $ , and $ \tilde{\mathsf\Delta} $ , and the last two equivalences hold because $ {\mathcal{Q}} $ is a divergence domain.
The construction of $ T^{[ {\mathsf\Delta}]} {} $ gives the greatest relational lifting of T with the fundamental property.
Proposition 40. Let $ ({\mathbb{C}},T) $ be a CC-SM, $ E:{\mathbb{C}}\to{\bf BRel}({\mathbb{C}}) $ be a basic endorelation, $ (M, \leq, 1, (\cdot)) $ be a grading monoid, $ {\mathcal{Q}} $ be a divergence domain, $ {\mathsf\Delta}\in{\bf Div}(T, E,M,{\mathcal{Q}}) $ be a divergence, and $ \dot T $ be an $ M \times {\mathcal{Q}} $ -graded relational lifting satisfying the fundamental property with respect to $ {\mathsf\Delta} $ . Then, for all $ m \in M $ , $ q \in {\mathcal{Q}} $ , and $ X \in {\bf BRel}({\mathbb{C}}) $ , the following inequality holds
7.1 Simplifying codensity liftings by $ \Omega $ -generatedness of divergences
We show that the calculation of the codensity lifting $ T^{[ {\mathsf\Delta}]} {} $ can be simplified when $ {\mathsf\Delta} $ is $ \Omega $ -generated. For an object $ I\in{\mathbb{C}} $ , we define $ T^{[ {\mathsf\Delta}],I} $ by:
The original calculation of $ T^{[{\mathsf\Delta} ]} {} $ is a large intersection $ T^{[ {\mathsf\Delta}]} = \bigwedge_{I\in {\mathbb{C}}} T^{[{\mathsf\Delta} ], I} $ where I runs over all $ {\mathbb{C}} $ -objects, while if $ {\mathsf\Delta} $ is $ \Omega $ -generated, the parameter I can be fixed to $ \Omega $ .
Proposition 41. For any $ \Omega $ -generated divergence $ {\mathsf\Delta}\in{\bf Div}(T, E,M,{\mathcal{Q}}) $ , we have $ T^{[ {\mathsf\Delta}]} {} = T^{[{\mathsf\Delta}], \Omega} $ .
Proof. We show the equivalence $ T^{[{\mathsf\Delta} ]} X = T^{[{\mathsf\Delta}], \Omega} X $ for each $ X \in {\bf BRel}({\mathbb{C}}) $ .
( $ \leq $ ) Immediate from $ T^{[{\mathsf\Delta} ]} = \bigwedge_{I\in {\mathbb{C}}} T^{[ {\mathsf\Delta}], I} $ .
( $ \geq $ ) By the $ \Omega $ -generatedness of $ {\mathsf\Delta} $ , for all $ I \in {\mathbb{C}} $ and $ c'_1,c'_2 \in U(TI) $ , we have
Therefore, for any $ (c_2,c_2) \in U(TX_1) \times U(TX_2) $ , we have
This completes the proof.
For example, the generatedness of $ \mathsf{DP} $ shown in Section 6.2 implies that $ G^{[ {\mathsf{DP}}]} = G^{[\mathsf{DP}], 2} $ and $ G_{s}^{[ {\mathsf{DP}}]} = G_{s}^{[ \mathsf{DP}], 1} $ . In fact, the simplification $ G_{s}^{[ \mathsf{DP}, 1]} $ is equal to the $ ({\mathcal{R}}^+)^2 $ -graded relational lifting $ G_{s}^{\top\top} $ for DP given in Sato (Reference Sato2016, Section 2.2), which is defined by, for each $ (X_1,X_2,R_X) \in {\bf BRel}({\bf Meas}) $ :
For detail, see the proof of equalities (†) and (‡) in the proof of Theorem 2.2 (iv) in Sato (Reference Sato2016).
7.2 Two lifting approaches: Codensity and coupling
We briefly compare two lifting approaches: graded codensity lifting and coupling-based lifting, and the latter of which is employed in Barthe et al. (Reference Barthe, Köpf, Olmedo and Béguelin2012), Barthe and Olmedo (Reference Barthe and Olmedo2013), Barthe et al. (Reference Barthe, Gaboardi, Arias, Hsu, Kunz and Strub2014), Barthe et al. (Reference Barthe, Gaboardi, Arias, Hsu, Roth and Strub2015), and Sato et al. (Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019).
We compare the role of the unit reflexivity and composability in each lifting approaches. Consider the CCC-SM $ ({\bf Set},D) $ , where D is the probability distribution monad. Given an $ \operatorname{Eq} $ -relative M-graded $ {\mathcal{Q}} $ -divergence $ {\mathsf\Delta} $ on D, the coupling-based graded lifting is defined by:
where $ p_i\colon R_X\rightarrow X_i $ is the projection ( $ i=1,2 $ ) from the binary relation. The pair $ (\mu_1,\mu_2) $ of probability distributions collected in the right-hand side of (16) is called a coupling.
The fundamental property $ \dot D^{\mathsf\Delta}(\operatorname{Eq} I)=\tilde{\mathsf\Delta}(m,v)I $ immediately follows from the definition of $ \dot D^{\mathsf\Delta} $ , while the composability and unit reflexivity of $ {\mathsf\Delta} $ are used to make $ \dot D^{{\mathsf\Delta}} $ a strong $ M\times {\mathcal{Q}} $ -graded lifting (Barthe and Olmedo, Reference Barthe and Olmedo2013, Proposition 9). On the other hand, the codensity graded lifting $ D^{[ {{\mathsf\Delta}}]} $ is always an $ M\times {\mathcal{Q}} $ -graded lifting; this does not rely on the unit reflexivity and composability of $ {\mathsf\Delta} $ (Proposition 1). These properties are used to show that $ D^{[ {{\mathsf\Delta}}]} $ satisfies the fundamental property (Proposition 2).
The coupling-based lifting (16) can be naturally generalized to any Set-monad T. However, at this momen, we do not know how to generalize the coupling technique to any CC-SM $ ({\mathbb{C}},T) $ . As the prior study by Sato et al. (Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019) pointed out, there is already a difficulty in extending it to the CC-SM $ ({\bf Meas},G) $ , where G is the Giry monad.
We illustrate how the problem arises. Let $ X \in {\bf BRel}({\bf Meas}) $ . We would like to pick two probability measures over $ R_X $ as couplings, but $ R_X $ is merely a set. We therefore equip it with the subspace $ \sigma $ -algebra of $ X_1\times X_2 $ , and let $ H_X $ be the derived measurable space (hence $ |H_X|=R_X $ ). We write $ p_i:H_X \rightarrow X_i $ for measurable projections ( $ i=1,2 $ ). We then define a candidate $ M\times {\mathcal{Q}} $ -graded lifting of G by:
We now verify that $ \dot{G} $ also lifts the Kleisli extension of G, that is,
Let $ (f, g)\colon Y\mathbin{\dot\rightarrow} \dot{G} (m',v') X $ be pair of measurable functions. Then for each $ (x,y) \in R_Y $ , we have $ (f \mathbin{\bullet} x, g \mathbin{\bullet} y) \in R_{\dot G(m,v) X} $ . Therefore, there exists $ (\mu^{(x,y)}_1,\mu^{(x,y)}_2) \in (UG H_X)^2 $ such that $ G\pi_1 \mathbin{\bullet} \mu^{(x,y)}_1 = f \mathbin{\bullet} x $ and $ G\pi_2 \mathbin{\bullet} \mu^{(x,y)}_2 = g \mathbin{\bullet} y $ . Using the axiom of choice, we turn this relationship into functions $ \mu_1,\mu_2\colon R_Y \to UG H_X $ . If they were measurable functions of type $ H_Y \rightarrow G H_X $ , then from the composability of $ {\mathsf\Delta} $ , we would have $ {\mathsf\Delta}^{mm'}_{H_X}(\mu^\sharp_1 \mathbin{\bullet} w_1, \mu^\sharp_2 \mathbin{\bullet} w_2) \leq v + v' $ for $ w_1, w_2 \in U(G H_Y) $ such that $ {\mathsf\Delta}^{m'}_{H_Y}(w_1, w_2) \leq v' $ . This gives $ (f^\sharp, g^\sharp) \colon \dot{G} (m,v) Y\mathbin{\dot\rightarrow} \dot{G} (mm', v+v') X $ . However, in general, ensuring the measurability of $ \mu_1,\mu_2 $ is not possible, especially because they are picked up by the axiom of choice. A solution given in Sato et al. (Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019) is to use the category $ {\bf Span}({\bf Meas}) $ of spans that guarantees the existence of good measurable functions $ h_1, h_2 \colon H_Y \rightarrow G H_X $ .
8. Approximate Computational Relational Logic
We introduce a program logic called approximate computational relational logic (acRL for short). It is a combination of Moggi’s computational metalanguage and a relational refinement type system (Barthe et al., Reference Barthe, Gaboardi, Arias, Hsu, Roth and Strub2015). The strong graded relational lifting of a monad constructed from a divergence will be used to relationally interpret monadic types, and gradings give upper bounds of divergences between computational effects caused by two programs. acRL is similar to the relational refinement type system HOARe2 (Barthe et al., Reference Barthe, Gaboardi, Arias, Hsu, Roth and Strub2015), which is designed for verifying differential privacy of probabilistic programs. Compared to HOARe2, acRL supports general monads and divergences, while it does not support dependent products nor nontermination.
The relational logic acRL adopts the extensional approach (Nielson and Nielson, Reference Nielson and Nielson2007, Chapter 9.2):
• Relational assertions between contexts $ \Gamma $ and $ \Delta $ are defined as binary relations between $ U[\![ \Gamma]\!] $ and $ U[\![ \Delta]\!] $ , or equivalently $ {\bf BRel}({\mathbb{C}}) $ -objects $ \phi $ such that $ p_{{\mathbb{C}}}(\phi)=([\![ \Gamma]\!],[\![ \Delta]\!]) $ . Logical connectives and quantifications are defined as operations on such $ {\bf BRel}({\mathbb{C}}) $ -objects. This is in contrast to the standard design of logic where assertions are defined by a BNF.
• Let $ \Gamma \vdash M : \tau $ and $ \Delta \vdash N : \sigma $ be well-typed terms, $ \phi $ be a relational assertion between $ \Gamma,\Delta $ , and $ \psi $ be an assertion between $ \tau, \sigma $ . The main concern of acRL is the statement “ $ \forall (\gamma, \delta) \in \phi . ([\![ M]\!] \mathbin{\bullet} \gamma, [\![ N]\!] \mathbin{\bullet} \delta) \in \psi $ ” (equivalently $ ([\![ M]\!],[\![ N]\!]):\phi\mathbin{\dot\rightarrow}\psi $ ). In this section, this statement is denoted by $ \phi \vdash (M, M') : \psi $ .
• Inference rules of the logic consists of the facts about the statement $ \phi \vdash (M, M') : \psi $ . We remark that in the standard logic, proving these facts corresponds to the soundness of inference rules.
8.1 Moggi’s computational metalanguage
8.1.1 Syntax of the computational metalanguage
For the higher-order programming language, we adopt Moggi’s computational metalanguage (Moggi, Reference Moggi1991).It is an extension of the simply typed lambda calculus with monadic types. For a set B, we define the set $ {\bf Typ}(B) $ of types over B by the first BNF in Figure 2. We then define the set $ {\bf Typ}_1(B) $ of first-order types to be the subset of $ {\bf Typ}(B) $ consisting only of $ b,1,\times,+ $ .
We next introduce computational signatures for specifying constants in the computational metalanguage. A computational signature is a tuple $ (B,\Sigma_v,\Sigma_e) $ where B is a set of base types, and $ \Sigma_v $ and $ \Sigma_e $ are functions whose range is $ {\bf Typ}_1(B)^2 $ . The domains of $ \Sigma_v,\Sigma_e $ are sets of value operation symbols and effectful operation symbols and are denoted by $ O_v $ and $ O_e $ , respectively. These functions assign input and output types to these operations.
Fix a countably infinite set V of variables. A context is a function from a finite subset of V to $ {\bf Typ}(B) $ ; contexts are often denoted by capital Greek letters $ \Gamma,\Delta $ . For contexts $ \Gamma,\Delta $ such that $ {\rm dom}(\Gamma)\cap{\rm dom}(\Delta)=\emptyset $ , by $ \Gamma,\Delta $ we mean the join of $ \Gamma $ and $ \Delta $ .
The set of raw terms is defined by the second BNF in Figure 2. The type system of the computational metalanguage has judgments of the form $ \Gamma\vdash M:\tau $ , where $ \Gamma $ is a context, M is a raw term, and $ \tau $ is a type. It adopts the standard rules for products, coproducts, implications, and monadic types; see for example, Moggi (Reference Moggi1991). The typing rules for value operations and effectful operations are given by:
A simultaneous substitution $ \theta $ from $ \Gamma $ to $ \Gamma' $ (written $ \Gamma \vdash \theta \colon \Gamma' $ ) is a function $ \theta $ from the set $ {\rm dom}(\Gamma') $ to raw terms that assigns to each variable $ x\in{\rm dom}(\Gamma') $ a well-typed raw term $ \Gamma\vdash\theta(x):\Gamma'(x) $ . The application of $ \theta $ to a term $ \Gamma'\vdash M:\tau $ is denoted by $ M\theta $ , which has a typing $ \Gamma\vdash M\theta:\tau $ . For disjoint contexts $ \Gamma_i $ ( $ i=1,2 $ ), we define the projection substitutions $ \Gamma_1,\Gamma_2\vdash\pi_{i}^{\Gamma_1,\Gamma_2}:\Gamma_i $ by $ \pi_{i}^{\Gamma_1,\Gamma_2}(x)=x $ .
8.1.2 Categorical semantics of the computational metalanguage
The interpretation of the computational metalanguage over a computational signature $ (B,\Sigma_v,\Sigma_e) $ is given by the data specified by Figure 3.
We first inductively extend the interpretation of base types to all types using the bi-Cartesian closed structure and the monad. Next, for each context $ \Gamma $ , we fix a product diagram $ ([\![ \Gamma]\!],\{\pi_x:[\![ \Gamma]\!]\to[\![ \Gamma(x)]\!]\}_{x\in{\rm dom}(\Gamma)}) $ ; when $ {\rm dom}(\Gamma)=\{x\} $ , we assume that $ [\![ \Gamma]\!]=[\![ \Gamma(x)]\!] $ with $ \pi_x={\rm id} $ . Lastly, we interpret a typing derivation of $ \Gamma\vdash M:\tau $ as a morphism $ [\![ M]\!]:[\![ \Gamma]\!]\to[\![ \tau]\!] $ in the standard way, using the interpretations of operations given in Figure 3. We further extend this to the interpretation of each simultaneous substitution $ \Gamma\vdash\theta:\Gamma' $ as a morphisms $ [\![ \theta]\!]:[\![ \Gamma]\!]\to[\![ \Gamma']\!] $ .
8.2 Approximate relational computational logic
8.2.1 Relational logic in external form
A relational assertion $ \phi $ between disjoint contexts $ \Gamma $ and $ \Delta $ is a binary relation between $ U[\![ \Gamma]\!] $ and $ U[\![ \Delta]\!] $ . Such a relational assertion is denoted by $ {}^{\Gamma}_{\Delta}\vdash \phi $ . We identify it as a $ {\bf BRel}({\mathbb{C}}) $ -object $ \phi $ such that $ p_{{\mathbb{C}}}(\phi)=([\![ \Gamma]\!],[\![ \Delta]\!]) $ . Similarly, a relational assertion between types $ \tau $ and $ \sigma $ is defined to be a relational assertion $ {}^{u:\tau}_{d:\sigma}\vdash \phi $ ; here u and d are reserved and fixed variables, respectively.
Relational assertions between contexts $ \Gamma $ and $ \Delta $ carry a boolean algebra structure $ \wedge,\vee,\neg $ given by the set-intersection, set-union, and set-complement (see the boolean algebra $ {\bf BRel}({\mathbb{C}})_{([\![ \Gamma]\!],[\![ \Delta]\!])} $ in Section 2.2). The pseudo-complement $ \phi\Rightarrow\psi $ is defined to be $ \neg \phi\vee\psi $ . For $ {}^{\Gamma,x:\tau}_{\Delta,y:\sigma}\vdash \phi $ , by $ {}^{\Gamma}_{\Delta}\vdash \forall^{x}_{y}~.~\phi $ and $ {}^{\Gamma}_{\Delta}\vdash \exists^{x}_{y}~.~\phi $ we mean the relational assertions defined by the following equivalence:
The boolean algebra structure and the above quantifier operations allow us to interpret first-order logical formulas as relational assertions; we omit its detail here. In addition to these standard logical connectives, we will use graded strong relational lifting $ T^{[{\mathsf\Delta}]} $ to form relational assertions. That is, for any basic endorelation $ E:{\mathbb{C}}\to{\bf BRel}({\mathbb{C}}) $ , grading monoid M, divergence domain $ {\mathcal{Q}} $ and divergence $ {\mathsf\Delta}\in{\bf Div}(T, E,M,{\mathcal{Q}}) $ , we obtain a relational assertion $ {}^{u:\mathtt{T} \tau}_{d:\mathtt{T} \sigma}\vdash T^{[{\mathsf\Delta}]}(m,v)\phi $ from any $ {}^{u:\tau}_{d:\sigma}\vdash \phi $ , $ m\in M $ and $ v\in {\mathcal{Q}} $ .
For substitutions $ \Gamma\vdash\theta:\Gamma',\Delta\vdash{\theta'}:\Delta' $ and an assertion $ {}^{\Gamma}_{\Delta}\vdash \phi $ , by $ {}^{\Gamma'}_{\Delta'}\vdash \phi[{\theta};{\theta'}] $ , we mean the relational assertion $ \{(\gamma,\delta) \mid ([\![ \theta]\!]\mathbin{\bullet}\gamma,[\![ \theta']\!]\mathbin{\bullet}\delta)\in\phi\} $ . For disjoint context pairs $ \Gamma,\Gamma' $ and $ \Delta,\Delta' $ and relational assertions $ {}^{\Gamma}_{\Delta}\vdash \phi $ and $ {}^{\Gamma'}_{\Delta'}\vdash \psi $ , by the juxtaposition $ {}^{\Gamma,\Gamma'}_{\Delta,\Delta'}\vdash \phi,\psi $ , we mean the relational assertion $ {}^{\Gamma,\Gamma'}_{\Delta,\Delta'}\vdash \phi[{\pi_1^{\Gamma,\Gamma'}};{\pi_1^{\Delta,\Delta'}}]\wedge \psi[{\pi_2^{\Gamma,\Gamma'}};{\pi_2^{\Delta,\Delta'}}] $ .
8.2.2 Inference rules for acRL
For well-typed computational metalanguage terms $ \Gamma\vdash M:\tau $ and $ \Delta\vdash N:\sigma $ , and relational assertions $ {}^{\Gamma}_{\Delta}\vdash \phi $ and $ {}^{u:\tau}_{d:\sigma}\vdash \psi $ , by the judgment:
we mean the inclusion $ \phi\subseteq \psi[{[M/u]};{[N/d]}] $ of binary relations. This is equivalent to $ ([\![ M]\!], [\![ N]\!]):\phi\mathbin{\dot\rightarrow}\psi $ . We show basic facts about judgments $ \phi\vdash(M,N)\colon \psi $ .
Proposition 42.
(1) $ \phi\vdash(M,N)\colon {\psi} $ and $ [\![ M]\!]=[\![ M']\!] $ and $ [\![ N]\!]=[\![ N']\!] $ implies $ \phi\vdash{}(M',N')\colon \psi $ .
(2) $ \phi\vdash(M,N)\colon \psi $ and $ \phi'\subseteq \phi $ and $ \psi\subseteq \psi' $ implies $ \phi'\vdash{}(M,N)\colon \psi' $ .
(3) $ \phi\vdash(M,N)\colon {T^{[{\mathsf\Delta}]}(m,v)\psi} $ and $ m\le n $ and $ v\le w $ and $ \psi\le \psi' $
(4) implies $ \phi\vdash(M,N)\colon {T^{[{\mathsf\Delta}]}(n,w){\psi'}} $ .
(5) $ \phi\vdash{}( M,N)\colon \psi $ implies $ \phi\vdash{}(\mathop{\mathtt{ret}}(M),\mathop{\mathtt{ret}}(N))\colon T^{[{\mathsf\Delta}]}(1,0){\psi} $ .
(6) $ \phi\vdash(M,N)\colon {T^{[{\mathsf\Delta}]}(m,v)\psi} $ and $ \phi,\psi[{[x/u]};{[x'/d]}]\vdash{}(M',N')\colon {T^{[{\mathsf\Delta}]}(n,w)\rho} $
(7) implies $ \phi\vdash{}(\mathop{\mathtt{let}}{x=M}\mathbin{\mathtt{in}}M',\mathop{\mathtt{let}}{x'=N}\mathbin{\mathtt{in}}N')\colon T^{[{\mathsf\Delta}]}(m\cdot n,v\cdot w)\rho $ .
We next establish relational judgments on effectful operations. We present a convenient way to establish such judgments using the fundamental property of the graded relational lifting $ T^{[{\mathsf\Delta}]} $ .
Proposition 43. For any $ c\in O_e $ such that $ \Sigma_e(c)=(b,b') $ , relational assertion $ {}^{u:b}_{d:b}\vdash \phi $ and $ m\in M $ , putting $ v= \sup \{{\mathsf\Delta}^{m}_{[\![ b']\!]} ([\![ c]\!] \mathbin{\bullet} x, [\![ c]\!] \mathbin{\bullet} y) \mid (x, y) \in \phi \} $ , we have $ \phi\vdash{}(c(u),c(d))\colon T^{[{\mathsf\Delta}]}(m,v)(E[\![ b']\!]) $ .
Proof. Take an arbitrary pair $ (x, y) \in \phi $ . We have $ {\mathsf\Delta}^{m}_{[\![ b']\!]} ([\![ c]\!] \mathbin{\bullet} x, [\![ c]\!] \mathbin{\bullet} y) \leq v $ by definition of v. Thanks to the fundamental property of $ T^{[{\mathsf\Delta} ]} $ (Theorem 39), it is equivalent to $ ([\![ c]\!] \mathbin{\bullet} x, [\![ c]\!] \mathbin{\bullet} y) \in T^{[{\mathsf\Delta}]} (m,v) (E [\![ b']\!]) $ .
9. Case Study I: Higher-Order Probabilistic Programs
We represent a higher-order probabilistic programming language with sampling commands from continuous distributions as a computational metalanguage. For now, we assume that the language supports sampling from Gaussian distributions and Laplace distributions. This computational metalanguage is specified by the computational signature:
where $ \Sigma_v $ is some chosen signature for value operations over reals. We interpret this computational metalanguage by filling Figure 3 as follows:
(1) for the CCC-SM, we take $ ({\mathbb{C}},T)=({\bf QBS},P) $ (see Section 2.1),
(2) for the interpretation $ [\![ \mathtt{R}]\!] $ of $ \mathtt{R} $ , we take the quasi-Borel space $ K{\mathbb{R}} $ associated with the standard Borel space $ {\mathbb{R}} $ , where $ K \colon {\bf Meas} \to {\bf QBS} $ is defined in Section 2.1,
(3) the interpretation of value operations is given as expected (we omit it here); for example when $ \Sigma_v $ contains the real number addition operator $ + $ as type $ (\mathtt{R} \times \mathtt{R},\mathtt{R}) $ , its interpretation is the QBS morphism $ [\![ +]\!](x,y)=x + y: [\![ \mathtt{R} \times \mathtt{R}]\!]\rightarrow[\![ \mathtt{R}]\!] $ ,
(4) for the interpretation of effectful operations, we put
\begin{align*} [\![ \mathtt{norm}]\!](x,\sigma) &=[\mathrm{id},\mathcal{N}(x,\sigma^2)]_{\sim_{K{\mathbb{R}}}}, & [\![ \mathtt{lap}]\!](x,\lambda) &=[\mathrm{id},\mathrm{Lap}(x,\lambda)]_{\sim_{K{\mathbb{R}}}}. \end{align*}
Here, $ \mathcal{N}(x,\sigma^2) \in G{\mathbb{R}} $ is the Gaussian distribution with mean x and variance $ \sigma^2 $ . $ \mathrm{Lap}(x,\lambda) \in G{\mathbb{R}} $ is the Laplacian distribution with mean x and variance $ 2\lambda^2 $ Footnote 7. Every probability (Borel-)measure $ \mu \in G {\mathbb{R}} $ on $ {\mathbb{R}} $ can be converted to the probability measure $ [\mathrm{id},\mu]_{\sim_{K{\mathbb{R}}}} \in P K {\mathbb{R}} $ on the quasi-Borel space $ K {\mathbb{R}} $ (see Section 5.5).
9.1 A relational logic for differential privacy
To formulate differential privacy and its relaxations in the quasi-Borel setting, we convert statistical divergences $ {\mathsf\Delta} $ on the Giry monad G in Table 2 to $ \operatorname{Eq}{} $ -relative divergences $ \langle{L,l}\rangle^*{\mathsf\Delta} $ on the probability monad P on QBS by the construction in Section 5.5. Then, we construct the graded relational lifting $ P^{[ {\langle{L,l}\rangle^*{\mathsf\Delta}}]} $ by Theorem 39. Using this, as an instantiation of acRL, we build a relational logic reasoning about differential privacy and its relaxations, supporting higher-order programs and continuous random samplings. Basic proof rules can be given by Proposition 42.
For effectful operations, we import basic proof rules on noise-adding mechanisms given in prior studies (Bun et al., Reference Bun, Dwork, Rothblum and Steinke2018; Dwork et al., Reference Dwork, McSherry, Nissim and Smith2006; Dwork and Roth, Reference Dwork and Roth2013; Mironov, Reference Mironov2017) via Theorem 12 and Proposition 43. For example, consider the $ \operatorname{Eq} $ -relative $ {\mathcal{R}}^+ $ -graded $ {\mathcal{R}}^+ $ -divergence $ {\mathsf\Delta} = \langle{L,l}\rangle^*\mathsf{DP} $ on P. Proposition 43 with an effectful operation $ c=\mathtt{lap} $ and a relational assertion (below we identify global elements in $ K{\mathbb{R}} $ and real numbers):
together with Theorem 12 and the prior result (Dwork et al., Reference Dwork, McSherry, Nissim and Smith2006, Example 1) yields the following judgment:
By letting $ \mathrm{diff}_r $ be the relational assertion $ {}^{u:\mathtt{R}}_{d:\mathtt{R}}\vdash \{(x,y) \mid |x-y|\le r\} $ , the above judgment is equivalent to:
This rule corresponds to the rule [LapGen] of the program logic apRHL+ (Barthe et al., Reference Barthe, Grégoire, Hsu and Strub2017) for differential privacy. For another example, by the reflexivity of $ \mathsf{DP} $ , $ \langle{L,l}\rangle^*\mathsf{DP} $ is also reflexive. Hence, by letting $ \mathrm{succ}_r $ be the relational assertion $ {}^{u:\mathtt{R}}_{d:\mathtt{R}}\vdash \{(x,y) \mid y=x+r\} $ , we obtain the following judgments:
The judgment (18) corresponds to [LapNull] of apRHL+. Similarly, the following judgments about the DP, Rényi-DP, and zero-concentrated DP of the Gaussian mechanism can be derived as (20)–(22):
In (20), we require $ \sigma \geq \max ((1+\sqrt{3})/2,~\sqrt{2 \log (0.66/\delta)}/\epsilon) $ . The derivation is done via Proposition 43, Theorem12, and prior studies (Bun and Steinke, Reference Bun and Steinke2016;Mironov, Reference Mironov2017; Sato, Reference Sato2016).
10 Case Study II: Probabilistic Programs with Costs
We further extend the computational signature $ \mathcal C $ in the previous section with an effectful operation $ \mathtt{tick} $ such that $ \Sigma_e(\mathtt{tick})=(\mathtt{R},1) $ . The intention of $ \mathtt{tick}(r) $ is to increase cost counter by r during executionFootnote 8. To interpret this extended metalanguage, we fill Figure 3 as follows:
(1) for the CCC-SM, we take $ ({\mathbb{C}},T) = ({\bf QBS},P_{c}) $ where $ P_{c}\triangleq P(K{\mathbb{R}}\times-) $ is the monad for modeling probabilistic choice and cost counting (see Section 5.7).
(2) interpretation of $ b \in B $ is the same as Section 9,
(3) interpretation of value operations is also the same as Section 9,
(4) for the interpretation of effectful operations, put
\begin{align*} [\![ \mathtt{norm}]\!](x,\sigma) &= [(0,\mathrm{id}),\mathcal{N}(x,\sigma^2)]_{\sim_{K{\mathbb{R}} \times K{\mathbb{R}}}},\\ [\![ \mathtt{lap}]\!](x,\lambda) &= [(0,\mathrm{id}),\mathrm{Lap}(x,\lambda)]_{\sim_{K{\mathbb{R}} \times K{\mathbb{R}}}},\\ [\![ \mathtt{tick}]\!] (r) &= \eta^P_{K{\mathbb{R}} \times [\![ 1]\!]} (r,\ast) = [\mathrm{const}(r,\ast), \mu]_{\sim_{K{\mathbb{R}} \times 1}}. \end{align*}
We derive a closed term $ \mathtt{ntick} \colon \mathtt{R} \Rightarrow \mathtt{R} \Rightarrow \mathtt{T} 1 $ for ticking with a cost sampled from Gaussian distribution:
The term $ \mathtt{ntick}~s~r $ adds cost counter by a random value sampled from the Gaussian distribution $ \mathtt{norm}(r,s^2) $ .
10.1 Relational reasoning on probabilistic costs
We convert the total valuation distance $ \mathsf{TV}\in{\bf Div}(G,\operatorname{Eq},1,{\mathcal{R}}^+) $ to the divergence $ {\mathsf\Delta}_{c} \triangleq {\mathsf C}(\langle{L,l}\rangle^*\mathsf{TV} ,{K{\mathbb{R}}} ) \in {\bf Div}(P_{c},\operatorname{Eq},1,{\mathcal{R}}^+) $ on $ P_c $ by Propositions 11 and 17. We also prove basic facts on effectful operations. First, the following relational judgments on $ \mathtt{tick} $ can be easily given:
Remark that $ \operatorname{Eq} 1 = \top $ and $ [\![ \mathtt{tick}(0)]\!] = [\![ \mathop{\mathtt{ret}}(\ast)]\!] $ holds. Next, in the similar way as (18), by the reflexivity of $ \mathsf{TV} $ , we have the reflexivity of $ \langle {L,l}\rangle^*\mathsf{TV} $ , and we obtain, for each real number constant $ \sigma $ and $ \lambda $ :
We also directly verify the following judgment on $ \mathtt{ntick} $ using Theorem 12 and Proposition 43:
10.1.1 An example of relational reasoning
We give examples of verification of difference (of distributions) of costs between two runs of a probabilistic program whose output and cost depend on the input. We consider the following program:
It first samples a real number x from the Laplacian distribution centered at the input r, call the (possibly effectful) closure t with r and return $ x-r $ . Since the return type of t is T1, it can only probabilistically tick the counter. We show that the following two judgments in acRL:
In judgment (A), we pass the tick operation $ t = \lambda x.\mathtt{tick}(x) $ itself to $ M~0 $ and $ M~1 $ . By the fundamental property of $ T^{[ {{\mathsf\Delta}_{c}}]} $ , the difference of costs between two runs of $ M~0~t $ and $ M~1~t $ is 1, because each of these programs reports cost 0 and 1 deterministically. In contrast, in judgment (B), we pass to $ M~0 $ and $ M~1 $ the probabilistic tick function $ t' = \mathtt{ntick}(2) $ that ticks a real number sampled from the Gaussian distribution with variance $ 2^2 = 4 $ . Therefore, the cost reported by the runs of programs $ M~0~t' $ and $ M~1~t' $ follow the Gaussian distributions $ \mathcal{N}(0,4) $ and $ \mathcal{N}(1,4) $ , whose difference by $ \mathsf{TV} $ is bounded by $ 0.20 $ .
We first show (A). By (23) and 2 of Proposition 42, we have
By (26), and 4, 5 of Proposition 42, we obtain
By (24), (27), and 1 and 5 of Proposition 42 again, we conclude (A).
To show (B), it suffices to replace (26) by the following judgment proved by (25), the inequality $ \Pr_{r \sim \mathcal{N}(0,4)}[|r| < 0.5] \leq 0.20 $ and 2 of Proposition 42:
The rest of proof is the same as (A).
11. Related Work
This work is inspired by relational Hoare logics for verifying differential privacy of probabilistic programs, as summarized in Table 5. Composable divergences employed in these logics include the one for the standard DP, plus its recent relaxations, such as, Rényi DP, zero-concentrated DP, and truncated-concentrated DP (Bun et al., Reference Bun, Dwork, Rothblum and Steinke2018; Bun and Steinke, Reference Bun and Steinke2016; Mironov, Reference Mironov2017).
The key semantic structure in these logics is graded relational liftings of the probability distribution monad. Barthe et al. gave a graded relational lifting of the probability distribution monad based on couplings (Barthe et al., Reference Barthe, Köpf, Olmedo and Béguelin2012). Since then, coupling-based liftings have been refined and used in several works (Barthe et al., Reference Barthe, Gaboardi, Arias, Hsu, Kunz and Strub2014, Reference Barthe, Gaboardi, Grégoire, Hsu and Strub2016; Barthe and Olmedo, Reference Barthe and Olmedo2013; Sato et al., Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019). They can be systematically constructed from composable divergences on the probability distribution monad (Barthe and Olmedo, Reference Barthe and Olmedo2013). One advantage of coupling-based liftings is that, to relate two probability distributions, it suffices to exhibit a coupling; this is exploited in the mechanized verification of differential privacy of programs (Reference Albarghouthi and HsuAlbarghouthi andHsu, 2018a,b). These coupling-based liftings, however, are developed upon discrete probability distributions, and measure-theoretic probability distributions, such as Gaussian or Cauchy distributions, were not supported until the work by Sato et al. (Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019).
The relational Hoare logic for differential privacy that supports sampling from continuous probability measures is given in the study by Sato (Reference Sato2016). In his work, the graded relational lifting for $ (\epsilon,\delta) $ -DP is constructed by a technique called codensity lifting (Katsumata et al., Reference Katsumata, Sato and Uustalu2018), which does not rely on the existence of coupling. It has been an open question (Sato et al., Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019, Section VIII) how to extend the codensity lifting technique to support various relaxations of differential privacy. Theorem 39 of this paper answers to this question. Later, coupling-based liftings has also been extended to support sampling from continuous probability measures (Sato et al., Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019). This extension is achieved by redefining binary relations as spans of measurable functions. Comparison of these approaches is in Section 7.2.
Verification of differential privacy in functional programming languages has also been pursued (Azevedo de Amorim et al., Reference Azevedo de Amorim, Gaboardi, Hsu and Katsumata2019; Barthe et al., Reference Barthe, Gaboardi, Arias, Hsu, Roth and Strub2015; Gaboardi et al., Reference Gaboardi, Haeberlen, Hsu, Narayan and Pierce2013; Reed and Pierce, Reference Reed and Pierce2010). Reed and Pierce (Reference Reed and Pierce2010) introduced a linear functional programming language with graded comonadic types that supports reasoning about $ \epsilon $ -differential privacy of probabilistic programs. Later, Gaboardi et al. (Reference Gaboardi, Haeberlen, Hsu, Narayan and Pierce2013) strengthened the Reed–Pierce type system with dependent types. A category-theoretic account of the Reed–Pierce type system is given by Azevedo de Amorim et al.
(2019), where general $ (\epsilon,\delta) $ -differential privacy is also supported. These works basically regard types as metric spaces, allowing us to reason about sensitivity of programs with respect to inputs. A coupling-based lifting is also employed in a relational model of a higher-order probabilistic programming language (Barthe et al., Reference Barthe, Gaboardi, Arias, Hsu, Roth and Strub2015).
The study by Azevedo de Amorim et al. (Reference Azevedo de Amorim, Gaboardi, Hsu and Katsumata2019) gives a categorical definition of composable divergences in a general framework called weakly closed refinements of symmetric monoidal closed categories (Azevedo de Amorim et al., Reference Azevedo de Amorim, Gaboardi, Hsu and Katsumata2019, Definition 1). A comparison is given in Section 6.1.2.
Mardare et al. (Reference Mardare, Panangaden and Plotkin2016) introduced a quantitative refinement of algebraic theory called quantitative equational theory and studied variety theorem for quantitative algebras. Bacci et al. (Reference Bacci, Mardare, Panangaden, Plotkin, Gadducci and Silva2021) discussed tensor products of quantitative equational theories. QETs and divergences on monads share the common interest of measuring quantitative difference between computational effects. Divergences on monads are derived as a generalization of the composability condition of statistical divergences studied by Barthe and Olmedo (Reference Barthe and Olmedo2013). To make a precise connection between these two concepts, in Section 6.3, we have given an adjunction between QETs of type $ \Sigma $ over $ \Omega $ and $ \Omega $ -generated divergences on the free monad $ T_\Sigma $ . The adjunction cuts down to the isomorphism between unconditional QETs of type $ \Sigma $ over $ \Omega $ and $ \Omega $ -generated divergences on $ T_\Sigma $ .
In this paper, we have constructed strong graded relational liftings of monads from divergences on monads. The concept of relational lifting and its fibrational generalization are also key technical concepts in the categorical studies of bisimulation (Balan et al., Reference Balan, Kurz and Velebil2019; Baldan et al., Reference Baldan, Bonchi, Kerstan and König2018; Hermida and Jacobs, Reference Hermida and Jacobs1995; Katsumata et al., Reference Katsumata, Sato and Uustalu2018; Kurz and Velebil, Reference Kurz and Velebil2016; Sprunger et al., Reference Sprunger, Katsumata, Dubut and Hasuo2021). It remains to be seen if the relational liftings obtained in this paper, as well as the divergence liftings in Section 6.1.3, have applications in the coalgebraic study of bisimulations.
Metric-like spaces are used in several recent papers on semantics of programming languages and systems. Gavazzo (Reference Gavazzo2018) studied a quantitative refinement of Abramsky’s applicative bisimilarity for the Reed–Pierce type system. He introduced a monadic operational semantics of the type system and formalized the concept of quantitative applicative bisimilarity using monads that are lifted to the category of quantale-valued relations. Bonchi et al. (Reference Bonchi, König and Petrisan2018) also used metric-like spaces to study bisimulations and up-to techniques in the category of quantale-valued relations. In this paper, we are interested in relational program verification of effectful programs, and we carry it out in the relational category $ {\bf BRel}({\mathbb{C}}) $ rather than $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ . The quantitative difference of computational effects measured by a divergence $ {\mathsf\Delta} $ is represented by the binary relation $ \tilde{\mathsf\Delta}(v) $ that relates two computational effects whose distance is bound by v.
The $ \mathsf{RelCost} $ system by Çiçek et al. (Reference Çiçek, Barthe, Gaboardi, Garg and Hoffmann2017) is a formal system for reasoning about relational properties of higher-order functional programs and measuring cost difference of programs. It consists of two subsystems: the relational refinement type system that can measure cost difference of programs, and the unary logic that can estimate lower and upper bounds of cost (i.e. cost intervals) of programs. We expect a connection between the semantics of the former system and the graded relational lifting constructed from the divergence $ \mathsf{NCI} $ on $ P({\mathbb{N}} \times -) $ (or its variant) in Section 5.2. On the other hand, identifying a semantic structure behind the latter system is not the scope of this paper, and we leave it to future work to identify it and relate it with divergences on monads.
Acknowledgements
Tetsuya Sato carried out this research under the support by JST ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603) and JSPS KAKENHI Grant Number 20K19775, Japan. Shin-ya Katsumata carried out this research under the support by JST ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603) and JSPS KAKENHI Grant Number 18H03204, Japan. The authors are grateful to Ichiro Hasuo providing the opportunity of collaborating in that project. The authors are grateful to Satoshi Kura, Justin Hsu, Marco Gaboardi, Borja Balle, and Gilles Barthe for fruitful discussions and Kazuki Matsuoka for helpful comments on the manuscript.
Appendix A. Proofs for Section 4 (Divergences on Monads)
Proof. (Proof of Proposition 7) Let $ m \in M, (x_1, x_2) \in E I, c_1, c_2 \in U (T J) $ . Below $ \eta $ stands for the $ I \times J $ -component $ \eta_{I \times J} : I \times J \rightarrow T (I \times J) $ of the unit of T. From the composability of $ {\mathsf\Delta} $ and $ \eta_x \mathbin{\bullet} y = \eta \mathbin{\bullet} \langle x, y \rangle $ , we have
Since $ E I \mathbin{\dot\times} E J \leq E (I \times J) $ , we have $ (\langle x_1, y_1 \rangle, \langle x_2, y_2 \rangle) \in E (I \times J) $ for any $ (y_1, y_2) \in E J $ . Therefore,
holds by the E-unit reflexivity. From $ (\eta_x)^\sharp \mathbin{\bullet} c = \theta \mathbin{\bullet} \langle x, c \rangle $ by (1), we obtain
Appendix B. Proofs for Section 5 (Examples of Divergences on Monads)
Proposition 44. The family $ \mathsf{C}' = \{{\mathsf{C}'_I} \colon (\mathbb{N} \times I)^2 \to {\mathcal{N}} \}_{I \in {\bf Set}} $ of $ {\mathcal{N}} $ -divergences defined by:
is a $ \operatorname{Eq} $ -relative $ {\mathcal{N}} $ -divergence on the monad $ \mathbb{N} \times - $ .
Proof. The monotonicity of $ \mathsf{C}' $ is obvious.
We show the $ \operatorname{Eq} $ -unit reflexivity of $ \mathsf{C}' $ . For all $ (x,y) \in \operatorname{Eq} I $ (that is, $ x = y \in I $ ), we have
We show the $ \operatorname{Eq} $ -composability of $ \mathsf{C}' $ . Let $ (i,x), (j,y) \in \mathbb{N} \times I $ and $ f,g \colon I \to \mathbb{N} \times J $ . We write $ f(z) = (i_z,f_z) $ and $ g(z) = (j_z,g_z) $ for each $ z \in Z $ .
• If $ x = y $ and $ x_z = y_z $ for all $ z \in I $ , we have
\begin{align*} \mathsf{C}'_J (f^\sharp(i,x),g^\sharp(j,y)) &= \mathsf{C}'_J((i + i_x,f_{x}),(j + j_x,g_{x}))\\ &= |(i + i_x) - (j + j_x)|\leq |i - j | + |i_x - j_x| \\ &\leq \mathsf{C}'_I((i,x),(j,y)) + \sup_{(x,y) \in \operatorname{Eq} I (\iff x = y \in I)}\mathsf{C}'_J(f(x),g(y)) \end{align*}• If $ x \neq y $ or $ f_z \neq g_z $ for some $ z \in I $ , we have
\[ \mathsf{C}'_J (f^\sharp(i,x),g^\sharp(j,y)) \leq \infty = \mathsf{C}'_I((i,x),(j,y)) + \sup_{(x,y) \in \operatorname{Eq} I (\iff x = y \in I)}\mathsf{C}'_J(f(x),g(y)). \]
This completes the proof.
Proposition 45. The family $ \mathsf{NC} = \{\mathsf{NC}_I \colon (P(\mathbb{N} \times I))^2 \to {\mathcal{N}} \}_{I \in {\bf Set}} $ of $ {\mathcal{N}} $ -divergences defined by:
is a $ \operatorname{Top} $ -relative $ {\mathcal{N}} $ -divergence on the monad $ P(\mathbb{N} \times - ) $ .
Proof. The monotonicity of $ \mathsf{NC} $ is obvious.
We show the $ \operatorname{Top} $ -unit reflexivity of $ \mathsf{NC} $ . For all $ (x,y) \in \operatorname{Top} I $ (that is, $ x,y \in I $ ), we have
We show the $ \operatorname{Top} $ -composability of $ \mathsf{NC} $ . For all $ f,g \colon I \to P(\mathbb{N} \times J ) $ and $ A,B \in P(\mathbb{N} \times I ) $ , we have
This completes the proof.
Proposition 46. The family $ \mathsf{NCI} = \{\mathsf{NCI}_I \colon (P(\mathbb{N} \times I))^2 \to {\mathcal{N}} \}_{I \in {\bf Set}} $ of $ {\mathcal{Z}} $ -divergences defined by:
is a $ \operatorname{Top} $ -relative $ {\mathcal{Z}} $ -divergence on the monad $ P(\mathbb{N} \times - ) $ .
Proof. The monotonicity of $ \mathsf{NCI} $ is obvious.
We show the $ \operatorname{Top} $ -unit reflexivity of $ \mathsf{NCI} $ . For all $ (x,y) \in \operatorname{Top} I $ (that is, $ x,y \in I $ ), we have
We show the $ \operatorname{Top} $ -composability of $ \mathsf{NCI} $ . For all $ f,g \colon I \to P(\mathbb{N} \times J ) $ and $ A,B \in P(\mathbb{N} \times I ) $ , we have
This completes the proof.
Proof. (Proof of Proposition 10) We recall the continuity of f-divergence $ {{}^{f}\mathsf{Div}} $ in Liese and Vajda (Reference Liese and Vajda2006, Theorem 16) and Sato et al. (Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019, Theorem 3):
Here, a measurable partition of I is a finite family $ \{B_i\}_{i = 0}^n $ of measurable subsets $ B_i \in \Sigma_I $ satisfying $ i \neq j \implies B_i \cap B_j = \emptyset $ and $ \bigcup_{i = 0}^n B_i = I $ .
We have the $ \operatorname{Eq}{} $ -unit reflexivity because the reflexivity $ {}^{f}\mathsf{Div}_{I} (\mu,\mu) = 0 $ is obtained from $ f(1) = 0 $ . We show the $ \operatorname{Eq}{} $ -composability. To show this, we prove a bit stronger statement.
Consider three positive weight functions $ f, f_1, f_2 \geq 0 $ with $ f(1) = f_1(1) = f_2(1) = 0 $ . Assume that there are some $ \alpha, \beta, \beta' \in {\mathbb{R}} $ satisfying the following conditions:
Let $ \mu_1, \mu_2 \in G_{s} I $ , and $ h, k \colon I \to G_{s} J $ . We suppose that at least one of the following two conditions holds
(1) $ \mu_1(I) = \mu_2(I) = 1 $ and $ \forall x \in I.~h(x)(J) = k(x)(J) = 1 $ ,
(2) $ \alpha = 0 $ and $ \beta,\beta \in [0,1] $ .
We then prove the composability in the sense of Olmedo (Reference Olmedo2014, Definition 5.2):
We fix a measurable partition $ \{A_i\}_{i = 0}^n $ of J. For each $ 0 \leq i \leq n $ , by definition of the $ \sigma $ -algebra $ \Sigma_{G J} $ , the functions $ h(-)(A_i) \colon I \to [0,1] $ and $ k(-)(A_i) \colon I \to [0,1] $ are measurable, and hence we have two monotone increasing sequences $ \{h^i_l\}_{l=0}^\infty $ and $ \{k^i_l\}_{l=0}^\infty $ of (nonnegative) simple functionsFootnote 9 that converge uniformly to $ h(-)(A_i) $ and $ k(-)(A_i) $ , respectively. Since the sequences $ \{h^i_l\}_{l=0}^\infty $ and $ \{k^i_l\}_{l=0}^\infty $ converge uniformly and are bounded above, we obtain
We remark that the above computation is consistent even if $ k^\sharp(\mu_2)(A_i) = 0 $ for some $ 0 \leq i \leq n $ . We here recall $ 0f(a/0) = a f^\ast(0) $ for any $ a \in [0,\infty) $ . If $ k^\sharp(\mu_2)(A_i) = 0 $ , then $ \int_X k^i_l~d\mu_2 = 0 $ for all $ l \in {\mathbb{N}} $ because it is a nonnegative monotone increasing sequence whose limit is 0. Then $ (\int_X k^i_l~d\mu_2 ) f\left( (\int_X h^i_l~d\mu_1)/(\int_X k^i_l~d\mu_2)\right) = (\int_X h^i_l~d\mu_1)f^\ast(0) $ holds for all $ l \in {\mathbb{N}} $ . It is monotonically increasing, and converges to $ (h^\sharp(\mu_1)(A_i))f^\ast(0) $ .
We thus show the following inequality:
We fix $ l \in {\mathbb{N}} $ . We can write $ h^i_l = \sum_{j = 0}^m \alpha^i_{j}\chi_{B_j} $ and $ k^i_l = \sum_{j = 0}^m \beta^i_{j}\chi_{B_j} $ with some coefficients $ \alpha^i_{j},\beta^i_{j} \in [0,1] $ ( $ 0 \leq j \leq m $ ) and measurable partition $ \{B_j\}_{j = 0}^m $ of I.
By Jensen’s inequality and the second inequality of (A’), we calculate as follows:
We will evaluate the three expressions $ V_1,V_2,V_3 $ .
First, we obtain $ \lim_{l \to \infty} V_1 \leq {{}^{f_1}\mathsf{Div}}_I(\mu_1,\mu_2) $ as follows:
Here, the first inequality is given from the nonnegativity of each $ \mu_2(B_j) f_1\left(\frac{\mu_1(B_j)}{\mu_2(B_j)}\right) $ derived from $ 0 \leq f_1 $ ; the equality is given by definition of $ \alpha_{j}^{i} $ and $ \beta_{j}^{i} $ ; the second inequality can be given by the continuity of $ {{}^{f_1}\mathsf{Div}} $ ; the last inequality is derived by $ \beta h(x)(J) + (1-\beta) k(x)(J) \in [0,1] $ from the assumption that either $ \beta \in [0,1] $ or $ h(x)(J) = k(x)(J) $ for all $ x \in I $ holds.
Second, we obtain $ \lim_{l \to \infty} V_2 \leq \gamma\cdot{{}^{f_1}\mathsf{Div}}_I(\mu_1,\mu_2) \cdot \sup_{x \in I}{{}^{f_2}\mathsf{Div}}_J(h(x),k(x)) $ as follows:
Here, the first inequality is proved by the nonnegativity of each
which is derived from the first inequality of (A’); the first equality is given by definition of $ \alpha_{j}^{i} $ and $ \beta_{j}^{i} $ and the countable additivity of $ \mu_1 $ and $ \mu_2 $ ; the second inequality is given by the continuity of $ {{}^{f_1}\mathsf{Div}} $ and $ 0 \leq \gamma $ ; the last inequality is derived by $ \beta' \mu_1(I) + (1-\beta') \mu_2(I) \in [0,1] $ from the assumption that either $ \beta' \in [0,1] $ or $ \mu_1(I) = \mu_2(I) $ holds.
We prove the third inequality. We recall that the sequences $ \{h^i_l(x)\}_{l = 0}^\infty $ and $ \{k^i_l(x)\}_{l = 0}^\infty $ are monotone increasing at each $ x \in I $ . Since $ f_2 $ is convex, by Jensen’s inequality, the sequence $ \left\{\sum_{i = 0}^n k^i_l(x) f_2 \left({h^i_l(x)}/{k^i_l(x)}\right)\right\}_{l = 0}^\infty $ is monotone increasing at each $ x \in I $ . Then, the sequence $ \left\{ \sup_{x \in I} \sum_{i = 0}^n k^i_l(x) f_2 \left({h^i_l(x)}/{k^i_l(x)}\right)\right\}_{l = 0}^\infty $ is monotone increasing, because $ \sum_{i = 0}^n k^i_{l + 1}(x) f_2 \left({h^i_{l + 1}(x)}/{k^i_{l + 1}(x)}\right) $ is always greater than $ \sum_{i = 0}^n k^i_l(x) f_2 \left({h^i_l(x)}/{k^i_l(x)}\right) $ for all $ l \in {\mathbb{N}} $ and $ x \in I $ . Hence, from the continuity of $ {}^{f_2}\mathsf{Div} $ , we obtain
Finally, we obtain $ \lim_{l \to \infty} V_3 = 0 $ as follows:
Here, the last equality is derived from the assumption that either $ \alpha = 0 $ or $ h(x)(J) = k(x)(J) $ for any $ x \in I $ holds.
From the above evaluations of $ V_1 $ , $ V_2 $ , and $ V_3 $ , we obtain the inequality (29). Since the measurable partition $ \{A_i\}_{i = 0}^n $ of J is arbitrary, we conclude (28) by the continuity of $ {}^{f}\mathsf{Div} $ . This completes the proof.
Parameters for Proposition 10 for for weight functions of $ \mathsf{TV} $ , $ \mathsf{KL} $ , $ \mathsf{HD} $ , and $ \mathsf{Chi} $ are shown in Table 4. Below, we check the conditions in Proposition 10.
• For the weight function $ f(t) = |t - 1| / 2 $ of $ \mathsf{TV} $ , the tuple $ (\gamma,\alpha,\beta,\beta') = (0,0,1,0) $ satisfies for all $ x,y,z,w \in [0,1] $ , we have
\begin{align*} 0 & \leq w + xf({z}/{x}),\\ xy f(zw/xy) & =| {zw} - xy | / 2 \leq (| {zw} - {wx}| + |{xw} - xy |) /2 = wxf({z}/{x}) + xf({w}/{y}). \end{align*}• For the weight function $ f(t) = t\log(t) - t + 1 $ of $ \mathsf{KL} $ , the tuple $ (\gamma,\alpha,\beta,\beta') = (0,-1,1,1) $ satisfies for all $ x,y,z,w \in [0,1] $ , we have
\begin{align*} 0 & \leq z + x f(z/x), \\ xyf(zw/xy) &= xy((zw/xy)\log(zw/xy) - zw/xy + 1)\\ &= zw\log(w/y) + zw\log(z/x) - zw + xy \\ & = xw((z/x)\log(z/x) {-} z/x {+} 1) {+} zy((w/y)\log(w/y) {-} w/y + 1) {-} (x-z)(w-y). \end{align*}• For the weight function $ f(t) = (\sqrt{t} - 1)^2 / 2 $ of $ \mathsf{HD} $ , the tuple $ (\gamma,\alpha,\beta,\beta') = (0,-1/4,1/2,1/2) $ satisfies for all $ x,y,z,w \in [0,1] $ ,
\begin{align*} 0 & \leq (z + x) / 2 + x f(z/x), \\ xyf(zw/xy) &= (zw + xy)/2 -\sqrt{xyzw} \\ & = (zw + xy)/2 - ((x + z) - (\sqrt{x} - \sqrt{z})^2)((y + w) - (\sqrt{y} - \sqrt{w})^2)/ 4 \\ & = (zw + xy)/2 - ((x + z) - 2 x f(z/x))((y + w) - 2 yf(w/y))/ 4\\ & = (zw + xy)/2 - ((x + z)/2 - x f(z/x))((y + w)/2 - yf(w/y))\\ & \leq (y + w)/ 2 \cdot x f(z/x) + (x + z)/2 \cdot y f(w/y) + (zw + xy)/2 - (x + z)(y + w)/4\\ & = (y + w)/ 2 \cdot x f(z/x) + (x + z)/2 \cdot y f(w/y) - (x - z)(w - y)/4. \end{align*}• For the weight function $ f(t) = ({t} - 1)^2 $ of $ \mathsf{Chi} $ , The tuple $ (\gamma,\alpha,\beta,\beta') = (1,-2,2,2) $ satisfies for all $ x,y,z,w \in [0,1] $ ,
\begin{align*} 0 & \leq z^2/x = (2z - x) + ((z/x) - 1)(z - x) = (2z - x) + x f(z/x), \\ xy f(zw/xy) &= z^2w^2/xy - 2zw + xy \\ &= (xf(z/x) + 2z - x)(yf(w/y) + 2w - y) - 2zw + xy \\ &=(2w - y) xf(z/x) + (2z - x)yf(w/y) + xyf(z/x)f(w/y) -2(x-z)(w-y). \end{align*}
Proof. (Proof of Proposition 11)
We first show the monotonicity of $ \langle{p,\lambda}\rangle^*{\mathsf\Delta} $ . Assume $ m \leq m' $ . From the monotonicity of the original $ {\mathsf\Delta} $ , we obtain for each $ \nu_1,\nu_2 \in U^{\mathbb{C}} (SI)) $ ,
Second, we show the F-unit reflexivity of $ \langle{p,\lambda}\rangle^*{\mathsf\Delta} $ . For $ FI = (I,I,R_{FI}) $ , we have $ EpI = (p = (pI,pI,R_{FI}) $ for all $ I \in {\mathbb{C}} $ . We can calculate for all $ (x,y) \in R_F $ ,
Finally, we show the $ {F} $ -composability of $ \langle{p,\lambda}\rangle^*{\mathsf\Delta} $ . For all $ J \in {\mathbb{C}} $ , $ c_1,c_2 \in U^{\mathbb{C}} S I $ , and $ f_1,f_2 \colon I \to S J $ we can calculate
To prove the equality ( $ \ast $ ), we calculate
This completes the proof.
Proof. (Proof of Proposition 13)
It suffices to show $ \operatorname{Top} $ -unit reflexivity and $ \operatorname{Top} $ -composability:
Here, $ F_1,F_2 \colon I \to T_S J $ and $ f_1,f_2 \in T_S I $ .
Proof. (Proof of Proposition 14)
It suffices to show $ \operatorname{Eq} $ -unit reflexivity and $ \operatorname{Eq} $ -composability:
Here, $ F_1,F_2 \colon I \to T_S J $ and $ f_1,f_2 \in T_S I $ . Without loss of generality, we may assume $ \pi_1 f_1 = \pi_1 f_2 $ holds and $ \pi_2 f_1 $ and $ \pi_2 f_2 $ are nonexpansive, and for every $ x \in I $ , $ \pi_1 F_1(x) = \pi_1 F_2(x) $ holds and $ \pi_2 F_1(x) $ and $ \pi_2 F_2(x) $ are nonexpansive.
Proof. (Proof of Proposition 15) We first show the $ \operatorname{Eq} {} $ -unit reflexivity of $ d^{T_S (-)} $ . For any $ s \in S $ , we calculate
We next show the $ \operatorname{Eq} {} $ -composability of $ d^{T_S (-)} $ . For any $ f_1,f_2 \in T_S (I,d_I) $ and nonexpansive functions $ F_1,F_2 \colon (I,d_I) \to T_S (J,d_J) $ , we compute
We note here that the nonexpansivity of $ F_2 \colon (I,d_I) \to (S,d_S) \Rightarrow (S,d_S) \times (J,d_J) $ is equivalent to the one of its uncurrying $ \overline{F_2} \colon (S,d_S) \times (I,d_I) \to (S,d_S) \times (J,d_J) $ .
Proof. (Proof of Proposition 16) We first show the $ \mathsf{Dist}_0 {} $ -unit reflexivity of $ {\mathsf\Delta}^{\mathsf{Dist}_0} $ . For $ (x_1,x_2) \in \mathsf{Dist}_0 (I,d_I) $ (i.e. $ d_I (x_1,x_2) = 0 $ ), we calculate
Next, we show the $ \mathsf{Dist}_0 {} $ -composability of $ {\mathsf\Delta}^{\mathsf{Dist}_0} $ . For any $ f_1,f_2 \in T_S (I,d_I) $ and nonexpansive functions $ F_1,F_2 \colon (I,d_I) \to T_S (J,d_J) $ , we compute
This completes the proof.
Proof. (Proof of Proposition 17) Since $ M = 1 $ , the monotonicity of $ {\mathsf C}({\mathsf\Delta}, N) $ is obvious.
We first show the $ \operatorname{Eq} $ -unit reflexivity of $ {\mathsf C}({\mathsf\Delta}, N) $ . We recall that for all $ x \in UI $ , we have
Hence,
We next show the $ \operatorname{Eq} $ -composability of $ {\mathsf C}({\mathsf\Delta}, N) $ . For any $ f \colon I \to T(N \times I) $ , we define $ h_f \colon N \times I \to T(N) $ by $ h_f = T(\star) \circ \theta_{N,N} \circ (\mathrm{id}_N \times (T \pi_1 \circ f)) $ . Then, we have $ T \pi_1 \mathbin{\bullet} f^{\sharp(T(N \times I))}\mathbin{\bullet} \nu = h_f^{\sharp T}\mathbin{\bullet} \nu $ for any $ \nu \in U(T(N \times I)) $ . First, for all $ m,n \in UN $ , we have
From this and the equality (1), we can calculate as follows:
From the assumption $ {\mathsf\Delta}^{}_{N \times I} (c_1, c_2) \leq {\mathsf C}({\mathsf\Delta}, N)_{I} (c_1, c_2) $ , the $ \operatorname{Eq} $ -unit reflexivity and $ \operatorname{Eq} $ -composability of the original divergence $ {\mathsf\Delta} $ , we obtain the $ \operatorname{Eq} $ -composability of $ {\mathsf C}({\mathsf\Delta}, N) $ as follows:
This completes the proof.
Proof. (Proof of Proposition 18)
We consider a preorder $ \sqsubseteq $ on a monad T. We define the $ {\mathcal{B}} $ -divergence $ {\mathsf\Delta}^{\sqsubseteq}_{ }{} $ on TI by:
Each $ \tilde{\mathsf\Delta}(1)I $ is a preorder because $ \tilde{\mathsf\Delta}(1)I = {\sqsubseteq_I} $ holds for each I.
The $ \operatorname{Eq} $ -unit reflexivity of $ {\mathsf\Delta}^{\sqsubseteq}_{ }{} $ is derived from the reflexivity of $ \sqsubseteq $ . For all set I and $ c \in TI $ ,
Since $ \sqsubseteq $ is a preorder on T, for any $ I,J\in{\bf Set} $ , $ c_1,c_2 \in TI $ and $ f,g \colon I \to TJ $ ,
Hence, we have the $ \operatorname{Eq} $ -composability:
Conversely, we consider an $ \operatorname{Eq} $ -relative $ {\mathcal{B}} $ -divergence $ {\mathsf\Delta} $ on T such that each $ \tilde {\mathsf\Delta} (1) I $ is a preorder. We show that the family $ \sqsubseteq^{\mathsf\Delta} = \{ \sqsubseteq^{\mathsf\Delta}_I \}_{I \in {\bf Set}} $ defined by $ {\sqsubseteq^{\mathsf\Delta}_I} \triangleq \tilde {\mathsf\Delta} (1) I $ forms a preorder on monad T.
Each component $ \sqsubseteq^{\mathsf\Delta}_I $ of $ \sqsubseteq^{\mathsf\Delta} $ at set I is a preorder on the set TI. We here note that the divergence $ {\mathsf\Delta} $ must be reflexive (i.e. $ {\mathsf\Delta}^{}_{I}(c,c) \leq 1 $ for all $ I \in {\bf Set},c \in TI $ ) because of the reflexivity of $ \sqsubseteq^{\mathsf\Delta}_I $ :
From the reflexivity and $ \operatorname{Eq} $ -composability of $ {\mathsf\Delta} $ , we have for all $ c_1,c_2,c \in TI $ and $ f,g \colon I \to TJ $ ,
They are equivalent to the substitutivity and congruence of $ \sqsubseteq^{\mathsf\Delta} $ , respectively:
Finally, the above conversions $ {\mathsf\Delta}^{(-)} $ and $ \sqsubseteq^{(-)} $ are mutually inverse:
This completes the proof.
Appendix C. Proofs for Section 5 (Properties of Divergences on Monads)
Proof. (Proof of Theorem 21) First, it is easy to see that the inequality (3) is equivalent to $ {\mathsf\Delta} $ satisfying E-unit reflexivity.
We next show that the inequality (4) is equivalent to $ {\mathsf\Delta} $ satisfying E-composability.
(only if) Since $ U 1 = \{ {\rm id}_1 \} $ , we have $ R_{E 1} = \{ ({\rm id}_1, {\rm id}_1) \} $ . Therefore, it holds $ d^{\mathsf\Delta}_{1, J} (c_1, c_2) = {\mathsf\Delta}^{}_{J} (c_1, c_2) $ . By letting $ I = 1 $ in the inequality (4), we obtain the E-composability:
(if) From the E-composability, for any $ f_1, f_2 : I \to T J $ and $ g_1, g_2 : J \to T K $ and $ (x_1, x_2) \in E I $ , we have
Next, for any $ (x_1, x_2) \in E I $ , we have $ {\mathsf\Delta}^{}_{J} (f_1 \mathbin{\bullet} x_1, f_2 \mathbin{\bullet} x_2) \leq d^{\mathsf\Delta}_{I, J} (f_1, f_2) $ . Thus, by monotonicity of $ (+) $ we have
By discharging $ (x_1, x_2) \in E I $ , we conclude
Proof. (Proof of Lemma 22) We write V for $ V_{{\mathcal{Q}},{\mathbb{C}}} $ . From $ x+\top=\top $ , we obtain the following equivalence (below $ z_1,z_2\in U(VZ) $ and $ x_1,x_2\in U(VX) $ ):
This shows that the currying bijection $ \lambda:{\mathbb{C}}(VZ\times VX,VY)\to {\mathbb{C}}(VZ,VX\Rightarrow VY) $ restricts to the one from $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})(Z\otimes X,Y) $ to $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})(Z,X\multimap Y) $ , showing that $ V_{{\mathcal{Q}},{\mathbb{C}}} $ is a map of adjunction.
Proof. (Proof of Theorem 25) $ [{\mathsf\Delta}] $ is a graded variant of the codensity lifting performed along the fibration $ V_{{\mathcal{Q}},{\mathbb{C}}}:{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})\to{{\mathbb{C}}} $ (Katsumata et al., Reference Katsumata, Sato and Uustalu2018, see also Definition 37). Proving that it is a graded lifting of T is routine. We show $ {\mathsf\Delta}^m_I = [{\mathsf\Delta}] m ( {E}^{\mathcal{Q}}I) $ . The direction $ [{\mathsf\Delta}] m ( {E}^{\mathcal{Q}}I) \preceq_{TI} {\mathsf\Delta}^m_I $ is easy. We show the converse. From the composability of $ {\mathsf\Delta} $ , for any $ c_1, c_2 \in U (T I) $ , $ J \in {\mathbb{C}} $ , $ n \in M $ and $ f \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) ( {E}^{\mathcal{Q}}I, {\mathsf\Delta}^n_J) $ , we have
Next, the nonexpansivity of f is equivalent to
Therefore, we conclude $ {\mathsf\Delta}^{m \cdot n}_J (f^\sharp \mathbin{\bullet} c_1, f^\sharp \mathbin{\bullet} c_2) \leq {\mathsf\Delta}^m_I (c_1, c_2) $ . By discharging J, n, f, we conclude the inequality $ {\mathsf\Delta}^m_I\preceq_{TI} [{\mathsf\Delta}] m ( {E}^{\mathcal{Q}}I) $ .
Proof. (Proof of Theorem 26) Let $ {\mathsf\Delta} \in {\bf Div}(T, E,M,{\mathcal{Q}}) $ . We have already shown that $ [{\mathsf\Delta}] $ is an M-graded $ {\mathcal{Q}} $ -divergence lifting of T. We show that $ [{\mathsf\Delta}] $ is E-strong (this proof does not need the closedness of $ {\mathbb{C}} $ ). Let $ X \triangleq (I, d) \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ and $ J \in {\mathbb{C}} $ be objects. We first rewrite the goal:
In the step $\mathbin{\stackrel{\dagger}{\iff}}$ , we use the equality (1). To show this goal, we proceed as follows. Let $ x_1, x_2 \in U I, c_1, c_2 \in U (T J), K \in {\mathbb{C}}, n \in M $ and $ f \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) (X \otimes {E}^{\mathcal{Q}}J, {\mathsf\Delta}^n_K) $ . First, from the composability of $ {\mathsf\Delta} $ , we obtain
We look at summands of the right-hand side. First, we have $ {\mathsf\Delta}^m_J (c_1, c_2) \leq d_{[{\mathsf\Delta}] m ( {E}^{\mathcal{Q}}J)} (c_1, c_2) $ by Theorem 25. Next, from the nonexpansivity of f, for any $ x_1,x_2\in UI,y_1, y_2 \in U J $ , we have
Because $ x+\top=\top $ , we obtain
Therefore, we obtain the goal:
Next, let $ \dot{T} \in {\bf SGDLift}(T, E,M,{\mathcal{Q}}) $ . We show $ \langle \dot{T}\rangle \in {\bf Div}(T, E,M,{\mathcal{Q}}) $ .
The unit law of $ \dot{T} $ immediately entails
Next, under the assumption on $ ({\mathbb{C}}, T) $ and $ {\mathcal{Q}} $ , in $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ the functor $ (-) \otimes {E}^{\mathcal{Q}}I $ has a right adjoint $ {E}^{\mathcal{Q}}I \multimap (-) $ above the adjunction $ (-) \times I \dashv I \Rightarrow (-) $ (Lemma 22). Therefore, each component of the uncurried bind morphism ub given in (7) are nonexpansive morphisms in $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ :
Therefore, we conclude
We also easily have monotonicity: $ \langle \dot{T}\rangle m ( {E}^{\mathcal{Q}}I) \leq \langle \dot{T}\rangle n ( {E}^{\mathcal{Q}}I) $ for $ m \leq n $ by condition 1 of graded divergence lifting. We thus conclude that $ \langle \dot{T}\rangle m {E}^{\mathcal{Q}}I \in {\bf Div}(T, E,M,{\mathcal{Q}}) $ .
We finally show $ \dot{T} \preceq [\langle \dot{T}\rangle] $ . Let $ c_1, c_2 \in U (T I) $ . We show
Let $ n \in M, J \in {\mathbb{C}}, f \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) (X, \dot{T} n ( {E}^{\mathcal{Q}} J)) $ . Since $ \dot{T} $ is an M-graded $ {\mathcal{Q}} $ -divergence lifting of T, we obtain
This implies the inequality $ d_{\dot{T} (m \cdot n) ( {E}^{\mathcal{Q}}J)} (f^\sharp (c_1), f^\sharp (c_2)) \leq d_{\dot{T} m X} (c_1, c_2) $ in $ {\mathcal{Q}} $ . By taking the sup for n, J, f, we obtain the inequality (32).
Proof. (Proof of Proposition 28) We write $ |1| = \{ \ast \} $ . We first check the measurable isomorphism $ G_{s} 1 \cong [0,1] $ . The measurable functions $ \mathrm{ev}_{\{\ast\}} \colon G_{s} 1 \to [0,1] $ ( $ \nu \mapsto \nu (\ast) $ ) and the function $ H \colon | [0,1]| \to |G_{s} 1| $ ( $ r \mapsto r \cdot \mathbf{d}_{\ast} $ ) are mutually inverse. For any (Borel-)measurable $ U \in \Sigma_{[0,1]} $ , we have $ H^{-1}(\mathrm{ev}_{\{\ast\}}^{-1} (U)) = U $ and $ H^{-1}(\mathrm{ev}_{\emptyset}^{-1} (U)) = [0,1] $ if $ 0 \in U $ and $ H^{-1}(\mathrm{ev}_{\emptyset}^{-1} (U)) = \emptyset $ otherwise. Since all generators of $ \Sigma_{G_{s} 1} $ are $ \mathrm{ev}_{\{\ast\}}^{-1}(U) $ and $ \mathrm{ev}_{\emptyset}^{-1} (U) $ where $ U \in \Sigma_{[0,1]} $ , we conclude the measurability of $ H \colon [0,1] \to G_{s} 1 $ . Thus, $ f \colon I \to [0,1] $ corresponds bijectively to $ H \circ f \colon I \to G_{s} 1 $ , and
We then obtain, for all $ I \in {\bf Meas} $ , $ \nu_1,\nu_2 \in G_{s} I $
The first inequality is given by $ \nu(S) = \int_I \chi_S d\nu $ where $ \chi_S \colon I \to [0,1] $ is the indicator function of S defined by $ \chi_S(x) = 1 $ when $ x \in S $ and $ \chi_S(x) = 0 $ otherwise. The last inequality is given by the data processing inequality which is given by the reflexivity and $ \operatorname{Eq}{} $ -composability of $ \mathsf{DP} $ .
Proof. (Proof of Proposition 29) We first prove that $ \mathsf{TV} $ is not 1-generated. We write $ |2| = \{0,1\} $ . We define $ \nu_1,\nu_2 \in G_{s} 2 $ by:
Then the total variation distance between them is calculated by:
On the other hand, for any $ f \colon 2 \to G_{s}1 $ , we have
This implies that $ \mathsf{TV} $ is not 1-generated.
Next, we prove that $ \mathsf{TV} $ is 2-generated. From the data processing inequality $ \mathsf{TV} $ which is given by the reflexivity and $ \operatorname{Eq}{} $ -composability of $ \mathsf{TV} $ , we obtain for any $ \nu_1,\nu_2 \in G_{s} I $ ,
We show that the above inequality becomes the equality for some g.
We fix $ \nu_1,\nu_2 \in G_{s} I $ , a base measure $ \mu $ over I satisfying the absolute continuity $ \nu_1,\nu_2 \ll \mu $ and the Radon–Nikodym derivatives (density functions) $ \frac{d\nu_1}{d\mu}, \frac{d\nu_2}{d\mu} $ of $ \nu_1,\nu_2 $ with respect to $ \mu $ , respectively.
Let $ A = (\frac{d\nu_1}{d\mu} - \frac{d\nu_2}{d\mu})^{-1}([0,\infty)) $ and $ B = I \setminus A $ . We define $ g \colon I \to G_{s}2 $ by $ g(x) = \mathbf{d}_0 $ if $ x \in B $ and $ g(x) = \mathbf{d}_1 $ otherwise. Then for any $ \nu \in G_{s} I $ , we have
Similarly, we have $ (g^\sharp \nu)(\{1\}) = \nu(B) $ . Therefore, we obtain
We then conclude that $ \Delta^{\mathrm{TV}} $ is 2-generated.
Proof. (Proof of Proposition 30) For all set J and $ c_1,c_2 \in TJ $ , we have
This implies that $ {\mathsf\Delta}^{[\leq]^\Omega}_{ }{} $ is $ \Omega $ -generated.
Lemma 47. For any $ U\in{\bf QET}(\Sigma,\Omega),t,u\in T_\Sigma\Omega $ and $ \varepsilon\in\mathbb Q^+ $ , we have
Proof. ( $ \implies $ ) Assume $ D[U](t,u) \leq \varepsilon $ . We first fix an arbitrary $ \varepsilon' \in \mathbb Q^+ $ such that $ \varepsilon < \varepsilon' $ . By the definition of D[U], there is $ \varepsilon^\ast \in \mathbb Q^+ $ such that $ D[U](t,u) \leq \varepsilon^\ast < \varepsilon' $ and $ \emptyset \vdash {t}=_{\varepsilon^\ast}{u} \in U $ . Here $ (\varepsilon - \varepsilon^\ast) \in \mathbb Q^+ $ and $ \varepsilon' = \varepsilon^\ast + (\varepsilon' - \varepsilon^\ast) $ . Therefore, by (Max) and (cut), we conclude $ \emptyset \vdash {t}=_{\varepsilon'}{u} \in U $ . Since $ \varepsilon' $ is arbitrary, we obtain $ \left\{\emptyset \vdash {t}=_{\varepsilon'}{u} \mathrel{}\middle|\mathrel{} \varepsilon' \in \mathbb Q^+, \varepsilon < \varepsilon' \right\} \subseteq U $ . Hence, by (Arch) and (cut), we have $ \emptyset \vdash {t}=_{\varepsilon}{u} \in U $ . ( $ \impliedby $ ) Obvious.
Lemma 48. For any $ U \in {\bf QET}(\Sigma ,\Omega ) $ , the function $ D[U] \colon (T_\Sigma \Omega )^2 \to {\mathcal{R}}^+ $ defined by (9) is a CS-PMet on $ T_\Sigma \Omega $ .
Proof. That D[U] is a pseudometric is shown in the beginning of (Mardare et al., Reference Mardare, Panangaden and Plotkin2016, Section 5).
We check the substitutivity. Let $ t,u \in T_\Sigma \Omega $ and $ h \colon \Omega \to T_\Sigma \Omega $ . By (Subst), we have
Since $ \varepsilon $ is arbitrary, we conclude the substitutivity as follows:
We check the congruence. Let $ t \in T_\Sigma I $ and $ h_1,h_2 \colon I \to T_\Sigma \Omega $ . By unfolding the structure of t and applying (Nonexp) and (Cut) repeatedly, we have the implication:
From (33) and Lemma 47, we conclude the congruence as follows:
Here, the last equality follows from the density of $ \mathbb Q^+ $ .
The monotonicity of $ D[-] \colon ({\bf QET}(\Sigma ,\Omega ),\subseteq) \to ({\bf CSPMet}(T_\Sigma , \Omega ),\preceq) $ is shown as:
Lemma 49. Let T be a monad on Set and $ \Omega \in {\bf Set} $ . For any $ d \in {\bf CSPMet}(T, \Omega ) $ , the family $ \mathrm{Gen}(d) = \{\mathrm{Gen}(d)_I \colon (T I)^2 \to {\mathcal{R}}^+ \}_{I\in{\bf Set}} $ defined by (11) is an $ \Omega $ -generated $ \operatorname{Eq} $ -relative $ {\mathcal{R}}^+ $ -divergence on T where each $ \mathrm{Gen}(d)_I $ is a pseudometric.
Proof. From the reflexivity of d, we have the reflexivity of $ \mathrm{Gen}(d)_I $ : for each $ c \in TI $ ,
Hence, the $ \operatorname{Eq} $ -unit reflexivity of $ \mathrm{Gen}(d) $ follows. From the symmetry of d, we have the symmetry of $ \mathrm{Gen}(d)_I $ : for each $ c_1,c_2 \in TI $ ,
From the triangle inequality of d, we have the triangle inequality of $ \mathrm{Gen}(d)_I $ : for all $ c_1,c_2,c_3 \in TI $ ,
Using the reflexivity, congruence, and substitutivity of d and the triangle inequality of $ \mathrm{Gen}(d)_I $ , we next show the composability. Let $ c_1,c_2 \in TI $ and $ f_1,f_2 \colon I \to TJ $ . We obtain
We show the $ \Omega $ -generatedness of $ \mathrm{Gen}(d) $ :
The step (*) uses the equality $ \left\{l \mathrel{}\middle|\mathrel{} l \colon I \to T\Omega \right\} = \left\{ h^\sharp \circ k \mathrel{}\middle|\mathrel{} h \colon \Omega \to T\Omega , k \colon I \to T\Omega \right\} $ .
Gen is indeed a monotone function of type $ ({\bf CSPMet}(T , \Omega ),\preceq) \to ({\bf PMet}(T , \Omega ),\preceq) $ because
Proof. (Proof of Theorem 34) This is proved by Lemma 47, 48, 49.
Proof. (Proof of Theorem 35) Let $ d \in {\bf CSPMet}(T_\Sigma , \Omega ) $ . We show $ (\mathrm{Gen}(d))_\Omega = d $ . Let $ t,u \in T_\Sigma \Omega $ . From the substitutivity of d, we have
On the other hand, $ d(t,u)=d(\eta_\Omega^\sharp(t),\eta_\Omega^\sharp(u)) $ . Therefore, we conclude
Let $ {\mathsf\Delta} \in {\bf PMet}(T_\Sigma , \Omega ) $ . We show $ \mathrm{Gen}({\mathsf\Delta}_\Omega ) = {\mathsf\Delta} $ . By the $ \Omega $ -generatedness of $ {\mathsf\Delta} $ , we have, for all sets I and $ t,u \in T_\Sigma I $ ,
We show the adjointness: $ U[d] \subseteq V \iff d \preceq D[V] $ for any $ V \in {\bf QET}(\Sigma ,\Omega ) $ and $ d \in {\bf CSPMet}(T_\Sigma , \Omega ) $ .
The step (*) uses Lemma 47.
Let $ d \in {\bf CSPMet}(T_\Sigma , \Omega ) $ . We finally show $ D[U[d]] = d $ . By the adjunction $ U[-]\dashv D[-] $ , we have $ d\preceq D[U[d]] $ . We thus show $ D[U[d]]\preceq d $ . We unfold this goal:
Since U[d] is the least QET including $ \left\{ \emptyset \vdash {t}=_{\varepsilon}{u} \mathrel{}\middle|\mathrel{} d(t,u) \leq \varepsilon \right\} $ , it suffices to have a QET $ V \in {\bf QET}(\Sigma ,\Omega ) $ such that
Inspired from the definition of models of QET (Bacci et al., Reference Bacci, Mardare, Panangaden, Plotkin, Gadducci and Silva2021), we define V as follows:
By the substitutivity of d and the definition of V, we obtain, for all $ t,u \in T_\Sigma \Omega $ and $ \varepsilon \in \mathbb Q^+ $ ,
We check that V satisfies all rules of QET:
(Ref) From the reflexivity of d, we have $ {\emptyset \vdash {t}=_{0}{t}} \in V $ :
(Sym) From the symmetry of d, we have $ {\{{t}=_{\varepsilon}{u} \} \vdash {u}=_{\varepsilon}{t}} \in V $ :
(Tri) From the triangle inequality of d, we have $ {\{{t}=_{\varepsilon}{u}, {u}=_{\varepsilon'}{v} \} \vdash {t}=_{\varepsilon+\varepsilon'}{v}} \in V $ :
(Max) Since $ (\mathbb Q^+,\leq,0,+) $ is a preordered monoid where the unit 0 is the least element, we have $ {\{{t}=_{\varepsilon}{u} \} \vdash {t}=_{\varepsilon+\varepsilon'}{u}} \in V $ :
(Arch) From the density of $ \mathbb Q^+ $ , we have
It is equivalent to $ {\left \{{t}=_{\varepsilon'}{u} \mathrel{}\middle|\mathrel{} \varepsilon < \varepsilon' \right\} \vdash {t}=_{\varepsilon}{u}} \in V $ :
(Nonexp) Let $ f \colon n \in \Sigma $ be a function and $ \left\{ {t_i}=_{\varepsilon}{u_i}\mathrel{}\middle|\mathrel{} 1 \leq i \leq n \right\} \subseteq \mathbb V (T_\Sigma \Omega ) $ be a set of quantitiative equations. Let $ I = \{1,\ldots,n\} $ . We then take $ t_f = f(1,\ldots,n) \in T_{\Sigma } I $ . We define $ t, s : I \rightarrow T_{\Sigma } \Omega $ by $ t(i) = t_i $ and $ s(i) = s_i $ ( $ i \in I $ ). We now fix an arbitrary $ \sigma \colon \Omega \to T_{\Sigma } \Omega $ , and assume $ d (\sigma^\sharp(t_i), \sigma^\sharp(s_i)) \leq \varepsilon $ for all $ i \in I $ . Then this asserts $ \sup_{i \in I} d (\sigma^\sharp(t (i)), \sigma^\sharp(s (i))) \leq \varepsilon $ . From the congruence of d, we obtain
Since $ \sigma $ is arbitrary, we conclude $ \left\{ {t_i}=_{\varepsilon}{u_i}\mathrel{}\middle|\mathrel{} 1 \leq i \leq n \right\} \vdash {f (t_1,\ldots,t_n)}=_{\varepsilon}{f (s_1,\ldots,s_n)} \in V $ .
(Subst) Immediate by definition of V:
(Cut) Assume $ \Gamma' \vdash{t}=_{\varepsilon}{u} $ and $ \Gamma \vdash \psi \in V $ holds for all $ \psi \in \Gamma' $ . Fix an arbitrary $ \sigma \colon \Omega \to T_\Sigma \Omega $ , and assume $ \left(\forall{{t'}=_{\varepsilon''}{u''} \in\Gamma}~.~d(\sigma^\sharp(t''),\sigma^\sharp(u'')) \leq \varepsilon'' \right) $ . By definition of V, for any $ {t'}=_{\varepsilon'}{u'}\in \Gamma' $ , we obtain $ d(\sigma^\sharp(t'),\sigma^\sharp(u')) \leq \varepsilon' $ from $ \Gamma \vdash {t'}=_{\varepsilon'}{u'} \in V $ . Hence, by definition of V again, we obtain $ d(\sigma^\sharp(t),\sigma^\sharp(u)) \leq \varepsilon $ from $ \Gamma' \vdash{t}=_{\varepsilon}{u} \in V $ . Since $ \sigma $ is arbitrary, we conclude $ \Gamma \vdash{t}=_{\varepsilon}{u} \in V $ .
(Assumpt) Assume $ {t}=_{\varepsilon}{u} \in \Gamma $ . Fix an arbitrary $ \sigma \colon \Omega \to T_\Sigma \Omega $ . Regardless of the value of $ d(\sigma^\sharp(t),\sigma^\sharp(u)) $ , the following predicare is true:
because the premise $ (\forall{{t'}=_{\varepsilon'}{u'} \in\Gamma}~.~d(\sigma^\sharp(t'),\sigma^\sharp(u')) \leq \varepsilon') $ is false whenever $ d(\sigma^\sharp(t),\sigma^\sharp(u)) > \varepsilon $ . Since $ \sigma $ is arbitrary, we conclude $ \Gamma \vdash{t}=_{\varepsilon}{u} \in V $ .
Proof. (Proof of Theorem 36) It is clear that U[d] is an unconditional QET from its definition. Therefore, we take arbitrary $ V\in \mathbf{UQET}(\Sigma ,\Omega ) $ and show $ U[D[V]]=V $ . We assume $ V = \overline{S}^{\mathrm{QET}({\Sigma},\Omega)} $ for some $ S \subseteq \{ \emptyset \vdash{t}=_{\varepsilon}{u} \mid t,u \in T_\Sigma \Omega , \varepsilon \in \mathbb Q^+ \} $ . The adjunction $ U[-]\dashv D[-] $ implies $ U[D[V]] \subseteq V $ . We thus show $ V \subseteq U[D[V]] $ . For any $ t,u\in T_\Sigma\Omega $ and $ \epsilon\in\mathbb Q^+ $ , we have
Here, (*) uses Lemma 47. From the monotonicity of the closure $ \overline{(-)}^{\mathrm{QET}({\Sigma},\Omega)} $ , we conclude
Appendix D. Proofs for Section 7 (Graded Strong Relational Liftings for Divergences)
Lemma 50. Let $ ({\mathbb{C}},T) $ be a CC-SM and $ {\mathsf\Delta} = \{{\mathsf\Delta}^{m}_{I} \colon (U(TI))^2\to {\mathcal{Q}} \}_{m\in M,I\in{\mathbb{C}}} $ be a doubly indexed family of $ {\mathcal{Q}} $ -divergences on TI satisfying monotonicity on m (Definition 6). Then $ T^{[{\mathsf\Delta}]} $ is an $ M \times {\mathcal{Q}} $ -graded relational lifting of T (satisfies conditions 1–3 of Definition 37).
Proof. (Condition 1) We first show that $ (\mathrm{id}_{TX_1},\mathrm{id}_{TX_2}) \in {\bf BRel}({\mathbb{C}}) (T^{[{\mathsf\Delta}]} (m,v) X,T^{[{\mathsf\Delta}]} (n,w) X) $ for all X whenever $ m \leq n $ and $ v \leq w $ . From the monotonicity of $ {\mathsf\Delta} $ , for all $ I \in {\mathbb{C}} $ , $ c'_1, c'_2 \in U(TI) $ , $ n' \in M $ , $ w' \in {\mathcal{Q}} $ , we have
Therefore, for any $ (c_1,c_2) \in T^{[{\mathsf\Delta}]} (m,v) X $ , we obtain $ (c_1,c_2)\in T^{[{\mathsf\Delta}]} (n,w) X $ as follows:
(Condition 2) We next show $ (\eta_{X_1},\eta_{X_2}) :X\mathbin{\dot\rightarrow}T^{[{\mathsf\Delta}]} (1,0) X $ . From the definition of morphisms in $ {\bf BRel}({\mathbb{C}}) $ , for all $ (x_1,x_2) \in X $ , we have $ (\eta_{X_1} \mathbin{\bullet} x_1, \eta_{X_2} \mathbin{\bullet} x_2) \in T^{[{\mathsf\Delta}]} (1,0) X $ as follows:
(Condition 3) Finally, we show that $ (f_1^\sharp,f_2^\sharp) : T^{[{\mathsf\Delta}]} (n,w) X\mathbin{\dot\rightarrow}T^{[{\mathsf\Delta}]} (n\cdot m,w +v) Y $ holds for any $ (f_1,f_2) : X\mathbin{\dot\rightarrow}T^{[{\mathsf\Delta}]} (m,v) Y $ and $ (n,w)\in M \times {\mathcal{Q}} $ . For all $ (f_1,f_2) : X\mathbin{\dot\rightarrow}T^{[{\mathsf\Delta}]} (m,v) Y $ , we have
For all $ (c_1,c_2) \in T^{[{\mathsf\Delta}]} (n,w) X $ , we have
We here fix $ (f_1,f_2) : X\mathbin{\dot\rightarrow}T^{[{\mathsf\Delta}]} (m,v) Y $ . We show $ (f_1^\sharp,f_2^\sharp) \colon T^{[{\mathsf\Delta}]} (n,w) X \mathbin{\dot\rightarrow} T{\mathsf\Delta} (n \cdot m,w + v) Y $ . We also fix $ I\in{\mathbb{C}} $ , $ n''\in M $ , $ w''\in{\mathcal{Q}} $ and $ (k_1, k_2) :Y\mathbin{\dot\rightarrow} \tilde{\mathsf\Delta} (n'', w'') I $ . From (a), we obtain
Therefore, by instantiating (b) with $ (n',w') = (m \cdot n'',v + w'') $ and $ (l_1,l_2) = (k_1^\sharp \circ f_1,k_2^\sharp \circ f_2) $ , for all $ (c_1,c_2) \in T^{[{\mathsf\Delta}]} (n,w) X $ , we have
Since $ (c_1,c_2) \in T^{[{\mathsf\Delta}]} (n,w) X $ , $ I\in{\mathbb{C}} $ , $ n''\in M $ , $ w''\in{\mathcal{Q}} $ and $ (k_1, k_2) :Y\mathbin{\dot\rightarrow} \tilde{\mathsf\Delta} (n'', w'') I $ are arbitrary, we conclude $ (f_1^\sharp,f_2^\sharp) \colon T^{[{\mathsf\Delta}]} (n,w) X \mathbin{\dot\rightarrow} T{\mathsf\Delta} (n \cdot m,w + v) $ as follows:
This completes the proof.
Proof. (Proof of Proposition 40) Since $ \dot T $ lifts the Kleisli extension (Condition 3 of Definition 37), and satisfy the fundamental property, we obtain
Appendix E. Proofs for Section 9 (Case Study I: Higher-Order Probabilistic Programs)
Lemma 51. The mapping
forms a measurable function of type $ {\mathbb{R}} \times {\mathbb{R}} \to G{\mathbb{R}} $ .
Proof. We show that for all $ A \in \Sigma_{\mathbb{R}} $ , the mapping $ f_A(x,\sigma) = \mathcal{N}(x,\sigma^2)(A) $ forms a measurable function of type $ {\mathbb{R}} \times {\mathbb{R}}_{\neq 0} \to [0,1] $ where $ {\mathbb{R}}_{\neq 0} $ is the subspace of $ {\mathbb{R}} $ whose underlying set is $ \{r\in{\mathbb{R}} \mid r \neq 0 \} $ . We have
The mapping $ h(x,\sigma,r) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - r)^2}{\sigma^2}\right) $ forms a continuous function of type $ {\mathbb{R}} \times {\mathbb{R}}_{\neq 0} \times {\mathbb{R}} \to {\mathbb{R}} $ ; hence, it is uniformly continuous on the compact set $ I_1 \times I_2 \times [k,k+1] $ where $ I_1 $ and $ I_2 $ are arbitrary closed intervals in $ {\mathbb{R}} $ and $ {\mathbb{R}}_{\neq 0} $ , respectively. Then, for all $ 0 < \varepsilon $ , there exists $ 0 < \delta $ such that $ |h(x,\sigma,r) - h(x',\sigma',r') |<\varepsilon $ holds wherever $ |x - x'| + |\sigma - \sigma'| + |r -r '| < \delta $ . Hence, for all $ 0 < \varepsilon $ , there is $ 0 < \delta $ such that whenever $ |x - x'| + |\sigma - \sigma'|< \delta $ ,
Since the closed intervals $ I_1 $ and $ I_2 $ are arbitrary, we conclude that the function $ f_{A \cap [k,k+1]} \colon {\mathbb{R}} \times {\mathbb{R}}_{\neq 0} \to [0,1] $ is continuous, hence measurable. Hence, the mapping $ f_A = \sum_{k \in \mathbb{Z}} f_{A \cap [k,k+1]} $ is measurable. Since A is arbitrary and $ f_A(x,\sigma^2) = \mathrm{ev}_A \circ \mathcal{N}(x,\sigma^2) $ , the mapping $ g(x,\sigma) \triangleq \mathcal{N}(x,\sigma^2) $ forms a measurable function of type $ {\mathbb{R}} \times {\mathbb{R}}_{\neq 0} \to G{\mathbb{R}} $ . Next, it is obvious that the mapping $ g'(x,\sigma) \triangleq \mathbf{d}_{x} $ forms a measurable function of type $ {\mathbb{R}} \times \{0\} \to G{\mathbb{R}} $ . Finally, the following mapping g” forms a measurable function of type $ {\mathbb{R}} \times {\mathbb{R}} \to ({\mathbb{R}} \times {\mathbb{R}}_{\neq 0}+ {\mathbb{R}} \times \{0\}) $ :
Let $ A \in \Sigma_{{\mathbb{R}} \times {\mathbb{R}}_{\neq 0}+ {\mathbb{R}} \times \{0\}} $ . By definition of g”, we have $ (g'')^{-1}(A) = \iota_1^{-1}(A) \cup \iota_2^{-1}(A) $ . Since the coprojections are measurable, we have $ \iota_1^{-1}(A) \in \Sigma_{{\mathbb{R}} \times {\mathbb{R}}_{\neq 0}} $ and $ \iota_2^{-1}(A) \in \Sigma_{{\mathbb{R}} \times \{0\}} $ . Since $ {\mathbb{R}}_{\neq 0} $ and $ \{0\} $ are measurable subsets of $ {\mathbb{R}} $ , we have $ \Sigma_{{\mathbb{R}} \times {\mathbb{R}}_{\neq 0}},\Sigma_{{\mathbb{R}} \times \{0\}} \subseteq \Sigma_{{\mathbb{R}}\times{\mathbb{R}}} $ . Thus, $ (g'')^{-1}(A) \in \Sigma_{{\mathbb{R}} \times {\mathbb{R}}} $ . Since A is arbitrary, g” is measurable. Therefore, we conclude the measurability of the composition $ [g,g'] \circ g'' \colon {\mathbb{R}} \times {\mathbb{R}} \to G {\mathbb{R}} $ , which is exactly the mapping in the statement.
Corollary 52. $ [\![ \mathtt{norm}]\!] \in {\bf QBS}(K{\mathbb{R}}\times K{\mathbb{R}},PK{\mathbb{R}}) $ .
Lemma 53. The mapping
forms a measurable function of type $ {\mathbb{R}} \times {\mathbb{R}} \to G{\mathbb{R}} $ .
Proof. We have, for all $ A \in \Sigma_{\mathbb{R}} $ ,
The density function $ h(x,\lambda,r) = \frac{1}{2\lambda} \exp\left(-\frac{|x - r|}{\lambda}\right) $ is continuous function of type $ {\mathbb{R}} \times {\mathbb{R}}_{0\leq} \times {\mathbb{R}} \to {\mathbb{R}} $ where $ {\mathbb{R}}_{0\leq} $ is the subspace of $ {\mathbb{R}} $ whose underlying set is $ \{r\in {\mathbb{R}} \mid 0 \leq r\} $ . The measurability of $ \mathrm{Lap}(x,\lambda) $ is proved in the same way as $ \mathcal{N}(x,\sigma^2) $ . The rest of proof is the same routine as Lemma 51.
Corollary 54. $ [\![ \mathtt{lap}]\!] \in {\bf QBS}(K{\mathbb{R}}\times K{\mathbb{R}},PK{\mathbb{R}}) $ .