Wasserstein metric

Vasershtein metric

The "Wasserstein metric" has a colourful history with several quite different fields of applications. It also has various historical sources.

The term "Vasershtein distance" appeared for the first time in [a3]. For probability measures $P$, $Q$ (cf. also Probability measure) on a metric space $( U , d )$, L.N. Vasershtein [a10] had introduced the metric $\operatorname {l} _ { 1 } ( P , Q ) = \operatorname { inf } \{ \mathsf{E} d ( X , Y ) \}$, where the infimum is with respect to all random variables $X$, $Y$ with distributions $P$, $Q$. His work was very influential in ergodic theory in connection with generalizations of the Ornstein isomorphism theorem (see [a5]). In the English literature the Russian name was pronounced typically as "Wasserstein" and the notation $W ( P , Q )$ is common for $\mathbf{l} _ { 1 } ( P , Q )$.

The minimal $L_1$-metric $\ell_1$ had already been introduced and investigated in 1940 by L.V. Kantorovich for compact metric spaces [a6]. This work was motivated by the classical Monge transportation problem. Subsequently, the transportation distance was generalized to general cost functionals; the special case $c(x, y) = d^p(x, y)$ leads to the minimal $L_p$-metric $\ell_p(P, Q) = \inf\{\| d(X, Y) \|_p\}$, with $\|\cdot\|_p$ the usual $L_p$-norm. The famous Kantorovich–Rubinshtein theorem [a8] gives a dual representation of $\ell_1$ in terms of a Lipschitz metric:

\begin{equation*} \ell_1(P, Q) = \sup \left\{ \int f \, d(P - Q) : \operatorname{Lip} f \leq 1 \right\}. \end{equation*}
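Since $P - Q$ in the toy example above is supported on finitely many points, the supremum over $1$-Lipschitz functions is itself a small linear program, and it returns the same value as the transportation problem, as the duality theorem asserts. A hedged sketch, reusing the same illustrative data:

```python
# Sketch of the Kantorovich-Rubinshtein dual for the same toy measures:
# maximize  sum_k f(z_k) (P - Q)({z_k})  subject to  |f(z_k) - f(z_l)| <= |z_k - z_l|.
import numpy as np
from scipy.optimize import linprog

z = np.array([0.0, 0.5, 1.0, 1.5, 2.0])         # union of the two supports
signed = np.array([0.5, -0.6, 0.3, -0.4, 0.2])  # (P - Q) mass at each point

K = len(z)
rows, rhs = [], []
for k in range(K):
    for l in range(K):
        if k != l:
            row = np.zeros(K)
            row[k], row[l] = 1.0, -1.0          # f_k - f_l <= |z_k - z_l|
            rows.append(row)
            rhs.append(abs(z[k] - z[l]))

# linprog minimizes, so use -(signed . f); pin f(z_0) = 0, since the objective
# is invariant under adding a constant to f.
bounds = [(0.0, 0.0)] + [(None, None)] * (K - 1)
res = linprog(-signed, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=bounds)
print("dual value =", -res.fun)   # agrees with the primal value (about 0.5)
```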

From this point of view, the notion of a Kantorovich metric or minimal $L_1$-metric (or minimal $L_p$-metric) also seems historically appropriate.

In fact, in 1914, C. Gini, while introducing a "simple index of dissimilarity", also defined the $\ell_1$-metric in a discrete setting on the real line; T. Salvemini (the discrete case, [a9]) and G. Dall'Aglio (the general case, [a2]) proved the basic representation

\begin{equation*} \ell_p^p(P, Q) = \int_0^1 | F^{-1}(u) - G^{-1}(u) |^p \, du, \quad p \geq 1, \end{equation*}

where $F$, $G$ are the distribution functions of $P$, $Q$. Gini gave this formula for empirical distributions and $p = 1, 2$. This influential work initiated a great deal of work on measures with given marginals in the Italian school of probability, while M. Fréchet [a4] explicitly dealt with the metric properties of these distances.
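For empirical distributions with equal sample sizes the quantile functions $F^{-1}$, $G^{-1}$ are step functions, so the integral above reduces to an average over order statistics. A small numerical sketch (the samples are assumed for illustration):

```python
# Sketch of the quantile representation for empirical measures: with equal
# sample sizes, l_p(P_n, Q_n)^p is the mean of |x_(i) - y_(i)|^p over the
# sorted samples.  The data are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)   # sample from P
y = rng.normal(0.5, 1.0, size=1000)   # sample from Q

def minimal_lp(x, y, p=1):
    xs, ys = np.sort(x), np.sort(y)               # empirical quantiles
    return np.mean(np.abs(xs - ys) ** p) ** (1.0 / p)

# For these samples both values are close to the mean shift 0.5; for p = 1 the
# result also matches scipy.stats.wasserstein_distance(x, y).
print(minimal_lp(x, y, p=1))
print(minimal_lp(x, y, p=2))
```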

C.L. Mallows [a7] introduced the $\ell_2$-metric in a statistical context. He used its properties to prove a central limit theorem and established the representation above. Based on Mallows' work, P.J. Bickel and D.A. Freedman [a1] described topological properties and investigated applications to statistical problems such as the bootstrap (cf. also Bootstrap method). They introduced the notion of a Mallows metric for $\ell_2$. This notion is used mainly in the statistics literature and in some of the literature on algorithms.

So, the minimal $L_p$-metric $\ell_p$ was invented historically several times and from different perspectives. Historically, the name Gini–Dall'Aglio–Kantorovich–Vasershtein–Mallows metric would perhaps be correct for this class of metrics. For simplicity it seems preferable to use the notion of a minimal $L_p$-metric and to write it as $\ell_p(P, Q)$.

References

[a1] P.J. Bickel, D.A. Freedman, "Some asymptotic theory for the bootstrap" Ann. Statist. , 9 (1981) pp. 1196–1217
[a2] G. Dall'Aglio, "Sugli estremi dei momenti delle funzioni di ripartizione doppia" Ann. Scuola Norm. Sup. Pisa Cl. Sci. , 3 : 1 (1956) pp. 33–74
[a3] R.L. Dobrushin, "Definition of a system of random variables by conditional distributions" Teor. Verojatnost. i Primenen. , 15 (1970) pp. 469–497 (In Russian)
[a4] M. Fréchet, "Les tableaux de corrélation dont les marges sont données" Ann. Univ. Lyon Ser. A , 20 (1957) pp. 13–31
[a5] R.M. Gray, D.L. Neuhoff, P.C. Shields, "A generalization of Ornstein's $\bar{d}$-distance with applications to information theory" Ann. Probab. , 3 (1975) pp. 315–328
[a6] L.V. Kantorovich, "On one effective method of solving certain classes of extremal problems" Dokl. Akad. Nauk USSR , 28 (1940) pp. 212–215
[a7] C.L. Mallows, "A note on asymptotic joint normality" Ann. Math. Stat. , 43 (1972) pp. 508–515
[a8] L.V. Kantorovich, G.Sh. Rubinstein, "On a space of completely additive functions" Vestnik Leningrad Univ., Ser. Mat. Mekh. i Astron. , 13 : 7 (1958) pp. 52–59 (In Russian)
[a9] T. Salvemini, "Sul calcolo degli indici di concordanza tra due caratteri quantitativi" Atti della VI Riunione della Soc. Ital. di Statistica, Roma (1943)
[a10] L.N. Wasserstein, "Markov processes over denumerable products of spaces describing large systems of automata" Probl. Inform. Transmission , 5 (1969) pp. 47–52
How to Cite This Entry:
Wasserstein metric. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Wasserstein_metric&oldid=32291
This article was adapted from an original article by L. Rueshendorff (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.