# Difference between revisions of "Wasserstein metric"

Chirvasitua (talk | contribs) |
m (AUTOMATIC EDIT (latexlist): Replaced 35 formulas out of 35 by TEX code with an average confidence of 2.0 and a minimal confidence of 2.0.) |
||

Line 1: | Line 1: | ||

+ | <!--This article has been texified automatically. Since there was no Nroff source code for this article, | ||

+ | the semi-automatic procedure described at https://encyclopediaofmath.org/wiki/User:Maximilian_Janisch/latexlist | ||

+ | was used. | ||

+ | If the TeX and formula formatting is correct, please remove this message and the {{TEX|semi-auto}} category. | ||

+ | |||

+ | Out of 35 formulas, 35 were replaced by TEX code.--> | ||

+ | |||

+ | {{TEX|semi-auto}}{{TEX|done}} | ||

''Vasershtein metric'' | ''Vasershtein metric'' | ||

The "Wasserstein metric" has a colourful history with several quite different fields of applications. It also has various historical sources. | The "Wasserstein metric" has a colourful history with several quite different fields of applications. It also has various historical sources. | ||

− | The term "Vasershtein distance" appeared for the first time in [[#References|[a3]]]. For probability measures | + | The term "Vasershtein distance" appeared for the first time in [[#References|[a3]]]. For probability measures $P$, $Q$ (cf. also [[Probability measure|Probability measure]]) on a [[Metric space|metric space]] $( U , d )$, L.N. Vasershtein [[#References|[a10]]] had introduced the metric $\operatorname {l} _ { 1 } ( P , Q ) = \operatorname { inf } \{ \mathsf{E} d ( X , Y ) \}$, where the infimum is with respect to all random variables $X$, $Y$ with distributions $P$, $Q$. His work was very influential in [[Ergodic theory|ergodic theory]] in connection with generalizations of the [[Ornstein isomorphism theorem|Ornstein isomorphism theorem]] (see [[#References|[a5]]]). In the English literature the Russian name was pronounced typically as "Wasserstein" and the notation $W ( P , Q )$ is common for $\mathbf{l} _ { 1 } ( P , Q )$. |

− | The minimal | + | The minimal $L_1$-metric $\mathbf{l}_{1}$ had been introduced and investigated already in 1940 by L.V. Kantorovich for compact metric spaces [[#References|[a6]]]. This work was motivated by the classical Monge transportation problem. Subsequently, the transportation distance was generalized to general cost functionals; the special case $c ( x , y ) = d ^ { p } ( x , y )$ leads to the minimal $L _ { p }$-metric $\text{l} _ { p } ( P , Q ) = \operatorname { inf } \{ \| d ( X , Y ) \| _ { p } \}$, with $\| \cdot \| p$ the usual $L _ { p }$-norm. The famous Kantorovich–Rubinshtein theorem [[#References|[a8]]] gives a dual representation of $\mathbf{l}_{1}$ in terms of a Lipschitz metric: |

− | + | \begin{equation*} \operatorname{l} _ { 1 } ( P , Q ) = \operatorname{ sup } \left\{ \int f d ( P - Q ) : \operatorname { Lip } f \leq 1 \right\}. \end{equation*} | |

− | From this point of view, the notion of a Kantorovich metric or minimal | + | From this point of view, the notion of a Kantorovich metric or minimal $L_1$-metric (or minimal $L _ { p }$-metric) seems historically to be also appropriate. |

− | In fact, in 1914, C. Gini, while introducing a "simple index of dissimilarity" , also defined the | + | In fact, in 1914, C. Gini, while introducing a "simple index of dissimilarity" , also defined the $\mathbf{l}_{1}$ metric in a discrete setting on the real line and T. Salvemini (the discrete case, [[#References|[a9]]]) and G. Dall'Aglio (the general case, [[#References|[a2]]]) proved the basic representation |

− | + | \begin{equation*} \operatorname { l } _ { p } ^ { p } ( P , Q ) = \int _ { 0 } ^ { 1 } | F ^ { - 1 } ( u ) - G ^ { - 1 } ( u ) | ^ { p } d u , p \geq 1, \end{equation*} | |

− | where | + | where $F$, $G$ are the distribution functions of $P$, $Q$. Gini had given this formula for empirical distributions and $p = 1,2$. This influential work initiated a lot of work on measures with given marginals in the Italian School of probability, while M. Fréchet [[#References|[a4]]] explicitly dealt with metric properties of these distances. |

− | C.L. Mallows [[#References|[a7]]] introduced the | + | C.L. Mallows [[#References|[a7]]] introduced the $I_2$-metric in a statistical context. He used its properties for proving a [[Central limit theorem|central limit theorem]] and proved the representation above. Based on Mallows work, P.J. Bickel and D.A. Freedman [[#References|[a1]]] described topological properties and investigated applications to statistical problems such as the bootstrap (cf. also [[Bootstrap method|Bootstrap method]]). They introduced the notion of a Mallows metric for $I_2$. This notion is used mainly in the statistics literature and in some literature on algorithms. |

− | So, the minimal | + | So, the minimal $L _ { p }$-metric $\text{I} _ { p }$ was invented historically several times from different perspectives. Maybe historically the name Gini–Dall'Aglio–Kantorovich–Vasershtein–Mallows metric would be correct for this class of metrics. For simplicity reasons it seems preferable to use the notion of a minimal $L _ { p }$-metric, and write it as $\mathbf{l} _ { p } ( P , Q )$. |

====References==== | ====References==== | ||

− | <table>< | + | <table><tr><td valign="top">[a1]</td> <td valign="top"> P.J. Bickel, D.A. Freedman, "Some asymptotic theory for the bootstrap" ''Ann. Statist.'' , '''9''' (1981) pp. 1196–1217</td></tr><tr><td valign="top">[a2]</td> <td valign="top"> G. Dall'Aglio, "Sugli estremi dei momenti delle funzioni di ripartizione doppia" ''Ann. Scuola Norm. Sup. Pisa Cl. Sci.'' , '''3''' : 1 (1956) pp. 33–74</td></tr><tr><td valign="top">[a3]</td> <td valign="top"> R.L. Dobrushin, "Definition of a system of random variables by conditional distributions" ''Teor. Verojatnost. i Primenen.'' , '''15''' (1970) pp. 469–497 (In Russian)</td></tr><tr><td valign="top">[a4]</td> <td valign="top"> M. Fréchet, "Les tableaux de corréllation dont les marges sont données" ''Ann. Univ. Lyon Ser. A'' , '''20''' (1957) pp. 13–31</td></tr><tr><td valign="top">[a5]</td> <td valign="top"> R.M. Gray, D.L. Neuhoff, P.C. Shields, "A generalization to Ornstein's $d$-distance with applications to information theory" ''Ann. Probab.'' , '''3''' (1975) pp. 315–328</td></tr><tr><td valign="top">[a6]</td> <td valign="top"> L.V. Kantorovich, "On one effective method of solving certain classes of extremal problems" ''Dokl. Akad. Nauk USSR'' , '''28''' (1940) pp. 212–215</td></tr><tr><td valign="top">[a7]</td> <td valign="top"> C.L. Mallows, "A note on asymptotic joint normality" ''Ann. Math. Stat.'' , '''43''' (1972) pp. 508–515</td></tr><tr><td valign="top">[a8]</td> <td valign="top"> L.V. Kantorovich, G.Sh. Rubinstein, "On a space of completely additive functions" ''Vestnik Leningrad Univ., Ser. Mat. Mekh. i Astron.'' , '''13''' : 7 (1958) pp. 52–59 (In Russian)</td></tr><tr><td valign="top">[a9]</td> <td valign="top"> T. Salvemini, "Sul calcolo degli indici di concordanza tra due caratteri quantitativi" ''Atti della VI Riunione della Soc. Ital. di Statistica, Roma'' (1943)</td></tr><tr><td valign="top">[a10]</td> <td valign="top"> L.N. Wasserstein, "Markov processes over denumerable products of spaces describing large systems of automata" ''Probl. Inform. Transmission'' , '''5''' (1969) pp. 47–52</td></tr></table> |

## Latest revision as of 16:55, 1 July 2020

*Vasershtein metric*

The "Wasserstein metric" has a colourful history with several quite different fields of applications. It also has various historical sources.

The term "Vasershtein distance" appeared for the first time in [a3]. For probability measures $P$, $Q$ (cf. also Probability measure) on a metric space $( U , d )$, L.N. Vasershtein [a10] had introduced the metric $\operatorname {l} _ { 1 } ( P , Q ) = \operatorname { inf } \{ \mathsf{E} d ( X , Y ) \}$, where the infimum is with respect to all random variables $X$, $Y$ with distributions $P$, $Q$. His work was very influential in ergodic theory in connection with generalizations of the Ornstein isomorphism theorem (see [a5]). In the English literature the Russian name was pronounced typically as "Wasserstein" and the notation $W ( P , Q )$ is common for $\mathbf{l} _ { 1 } ( P , Q )$.

The minimal $L_1$-metric $\mathbf{l}_{1}$ had been introduced and investigated already in 1940 by L.V. Kantorovich for compact metric spaces [a6]. This work was motivated by the classical Monge transportation problem. Subsequently, the transportation distance was generalized to general cost functionals; the special case $c ( x , y ) = d ^ { p } ( x , y )$ leads to the minimal $L _ { p }$-metric $\text{l} _ { p } ( P , Q ) = \operatorname { inf } \{ \| d ( X , Y ) \| _ { p } \}$, with $\| \cdot \| p$ the usual $L _ { p }$-norm. The famous Kantorovich–Rubinshtein theorem [a8] gives a dual representation of $\mathbf{l}_{1}$ in terms of a Lipschitz metric:

\begin{equation*} \operatorname{l} _ { 1 } ( P , Q ) = \operatorname{ sup } \left\{ \int f d ( P - Q ) : \operatorname { Lip } f \leq 1 \right\}. \end{equation*}

From this point of view, the notion of a Kantorovich metric or minimal $L_1$-metric (or minimal $L _ { p }$-metric) seems historically to be also appropriate.

In fact, in 1914, C. Gini, while introducing a "simple index of dissimilarity" , also defined the $\mathbf{l}_{1}$ metric in a discrete setting on the real line and T. Salvemini (the discrete case, [a9]) and G. Dall'Aglio (the general case, [a2]) proved the basic representation

\begin{equation*} \operatorname { l } _ { p } ^ { p } ( P , Q ) = \int _ { 0 } ^ { 1 } | F ^ { - 1 } ( u ) - G ^ { - 1 } ( u ) | ^ { p } d u , p \geq 1, \end{equation*}

where $F$, $G$ are the distribution functions of $P$, $Q$. Gini had given this formula for empirical distributions and $p = 1,2$. This influential work initiated a lot of work on measures with given marginals in the Italian School of probability, while M. Fréchet [a4] explicitly dealt with metric properties of these distances.

C.L. Mallows [a7] introduced the $I_2$-metric in a statistical context. He used its properties for proving a central limit theorem and proved the representation above. Based on Mallows work, P.J. Bickel and D.A. Freedman [a1] described topological properties and investigated applications to statistical problems such as the bootstrap (cf. also Bootstrap method). They introduced the notion of a Mallows metric for $I_2$. This notion is used mainly in the statistics literature and in some literature on algorithms.

So, the minimal $L _ { p }$-metric $\text{I} _ { p }$ was invented historically several times from different perspectives. Maybe historically the name Gini–Dall'Aglio–Kantorovich–Vasershtein–Mallows metric would be correct for this class of metrics. For simplicity reasons it seems preferable to use the notion of a minimal $L _ { p }$-metric, and write it as $\mathbf{l} _ { p } ( P , Q )$.

#### References

[a1] | P.J. Bickel, D.A. Freedman, "Some asymptotic theory for the bootstrap" Ann. Statist. , 9 (1981) pp. 1196–1217 |

[a2] | G. Dall'Aglio, "Sugli estremi dei momenti delle funzioni di ripartizione doppia" Ann. Scuola Norm. Sup. Pisa Cl. Sci. , 3 : 1 (1956) pp. 33–74 |

[a3] | R.L. Dobrushin, "Definition of a system of random variables by conditional distributions" Teor. Verojatnost. i Primenen. , 15 (1970) pp. 469–497 (In Russian) |

[a4] | M. Fréchet, "Les tableaux de corréllation dont les marges sont données" Ann. Univ. Lyon Ser. A , 20 (1957) pp. 13–31 |

[a5] | R.M. Gray, D.L. Neuhoff, P.C. Shields, "A generalization to Ornstein's $d$-distance with applications to information theory" Ann. Probab. , 3 (1975) pp. 315–328 |

[a6] | L.V. Kantorovich, "On one effective method of solving certain classes of extremal problems" Dokl. Akad. Nauk USSR , 28 (1940) pp. 212–215 |

[a7] | C.L. Mallows, "A note on asymptotic joint normality" Ann. Math. Stat. , 43 (1972) pp. 508–515 |

[a8] | L.V. Kantorovich, G.Sh. Rubinstein, "On a space of completely additive functions" Vestnik Leningrad Univ., Ser. Mat. Mekh. i Astron. , 13 : 7 (1958) pp. 52–59 (In Russian) |

[a9] | T. Salvemini, "Sul calcolo degli indici di concordanza tra due caratteri quantitativi" Atti della VI Riunione della Soc. Ital. di Statistica, Roma (1943) |

[a10] | L.N. Wasserstein, "Markov processes over denumerable products of spaces describing large systems of automata" Probl. Inform. Transmission , 5 (1969) pp. 47–52 |

**How to Cite This Entry:**

Wasserstein metric.

*Encyclopedia of Mathematics.*URL: http://encyclopediaofmath.org/index.php?title=Wasserstein_metric&oldid=50083