Namespaces
Variants
Actions

Difference between revisions of "Vapnik-Chervonenkis class"

From Encyclopedia of Mathematics
Jump to: navigation, search
m (tex encoded by computer)
Line 1: Line 1:
 +
<!--
 +
v1100201.png
 +
$#A+1 = 140 n = 1
 +
$#C+1 = 140 : ~/encyclopedia/old_files/data/V110/V.1100020 Vapnik\ANDChervonenkis class,
 +
Automatically converted into TeX, above some diagnostics.
 +
Please remove this comment and the {{TEX|auto}} line below,
 +
if TeX found to be correct.
 +
-->
 +
 +
{{TEX|auto}}
 +
{{TEX|done}}
 +
 
''Vapnik–Červonenkis class''
 
''Vapnik–Červonenkis class''
  
Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100201.png" /> be a set, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100202.png" /> a collection of subsets of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100203.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100204.png" /> a finite subset of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100205.png" />. Then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100206.png" /> is said to shatter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100207.png" /> if for every subset <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100208.png" /> of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100209.png" /> there is a set <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002010.png" /> in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002011.png" /> with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002012.png" />. If there is a largest, finite <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002013.png" /> such that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002014.png" /> shatters at least one set of cardinality <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002015.png" />, then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002016.png" /> is called a Vapnik–Chervonenkis class, or VC class, of sets and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002017.png" /> its Vapnik–Chervonenkis index.
+
Let $  S $
 +
be a set, $  {\mathcal C} $
 +
a collection of subsets of $  S $
 +
and $  F $
 +
a finite subset of $  S $.  
 +
Then $  {\mathcal C} $
 +
is said to shatter $  F $
 +
if for every subset $  A $
 +
of $  F $
 +
there is a set $  C $
 +
in $  {\mathcal C} $
 +
with $  A = C \cap F $.  
 +
If there is a largest, finite $  k $
 +
such that $  {\mathcal C} $
 +
shatters at least one set of cardinality $  k $,  
 +
then $  {\mathcal C} $
 +
is called a Vapnik–Chervonenkis class, or VC class, of sets and $  S ( {\mathcal C} ) = k $
 +
its Vapnik–Chervonenkis index.
  
Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002018.png" /> be the number of different sets <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002019.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002020.png" />. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002021.png" /> be the maximum of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002022.png" /> over all sets <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002023.png" /> of cardinality <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002024.png" />. Thus, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002025.png" /> is a Vapnik–Chervonenkis class if and only if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002026.png" /> for some finite <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002027.png" />, and then for all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002028.png" />. Sauer's lemma says that
+
Let $  \Delta  ^  {\mathcal C}  ( F ) $
 +
be the number of different sets $  C \cap F $
 +
for $  C \in {\mathcal C} $.  
 +
Let $  m  ^  {\mathcal C}  ( n ) $
 +
be the maximum of $  \Delta  ^  {\mathcal C}  ( F ) $
 +
over all sets $  F $
 +
of cardinality $  n $.  
 +
Thus, $  {\mathcal C} $
 +
is a Vapnik–Chervonenkis class if and only if $  m  ^  {\mathcal C}  ( n ) < 2  ^ {n} $
 +
for some finite $  n $,  
 +
and then for all $  n > S ( {\mathcal C} ) $.  
 +
Sauer's lemma says that
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002029.png" /></td> </tr></table>
+
$$
 +
m  ^  {\mathcal C}  ( n ) \leq  \sum _ {j = 0 } ^ { {S }  ( {\mathcal C} ) } \left ( \begin{array}{c}
 +
n \\
 +
j
 +
\end{array}
 +
\right ) .
 +
$$
  
Thus, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002030.png" /> is either always <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002031.png" /> or, for a Vapnik–Chervonenkis class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002032.png" />, it is bounded above by a polynomial in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002033.png" /> of degree <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002034.png" />. (This is the so-called Vapnik–Chervonenkis property: if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002035.png" /> for large <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002036.png" />, then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002037.png" /> is bounded by a polynomial.)
+
Thus, $  m  ^  {\mathcal C}  ( n ) $
 +
is either always $  2  ^ {n} $
 +
or, for a Vapnik–Chervonenkis class $  {\mathcal C} $,  
 +
it is bounded above by a polynomial in $  n $
 +
of degree $  S ( {\mathcal C} ) $.  
 +
(This is the so-called Vapnik–Chervonenkis property: if $  m  ^  {\mathcal C}  ( n ) < 2  ^ {n} $
 +
for large $  n $,  
 +
then $  m  ^  {\mathcal C}  ( n ) $
 +
is bounded by a polynomial.)
  
Vapnik–Chervonenkis classes have turned out to be useful in computer science (learning theory [[#References|[a1]]]), [[Probability theory|probability theory]] and [[Mathematical statistics|mathematical statistics]] [[#References|[a6]]], because certain probability limit theorems hold uniformly over them under suitable measurability conditions. One such sufficient measurability condition is that there exist a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002038.png" />-algebra <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002039.png" /> of subsets of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002040.png" />, including <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002041.png" />, and a mapping <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002042.png" /> from a complete separable [[Metric space|metric space]] <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002043.png" /> onto <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002044.png" /> such that the set of pairs <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002045.png" /> with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002046.png" /> is product-measurable in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002047.png" />. A VC class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002048.png" /> satisfying this last condition is called a VCM class. While VC, but not VCM, classes can be shown to exist using the [[Axiom of choice|axiom of choice]], the VC classes usually encountered in applications are VCM.
+
Vapnik–Chervonenkis classes have turned out to be useful in computer science (learning theory [[#References|[a1]]]), [[Probability theory|probability theory]] and [[Mathematical statistics|mathematical statistics]] [[#References|[a6]]], because certain probability limit theorems hold uniformly over them under suitable measurability conditions. One such sufficient measurability condition is that there exist a $  \sigma $-
 +
algebra $  {\mathcal S} $
 +
of subsets of $  S $,  
 +
including $  {\mathcal C} $,  
 +
and a mapping $  Y $
 +
from a complete separable [[Metric space|metric space]] $  U $
 +
onto $  {\mathcal C} $
 +
such that the set of pairs $  ( x,u ) $
 +
with $  x \in Y ( u ) $
 +
is product-measurable in $  S \times U $.  
 +
A VC class $  {\mathcal C} $
 +
satisfying this last condition is called a VCM class. While VC, but not VCM, classes can be shown to exist using the [[Axiom of choice|axiom of choice]], the VC classes usually encountered in applications are VCM.
  
Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002049.png" /> be a probability measure on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002050.png" /> and let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002051.png" /> be independent coordinates with distribution <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002052.png" />, specifically, on a countable Cartesian product of copies of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002053.png" />. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002054.png" /> be the sum of the point masses <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002055.png" /> at <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002056.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002057.png" />; it is called an empirical measure for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002058.png" /> (cf. also [[Empirical process|Empirical process]]). Then the [[Law of large numbers|law of large numbers]] for empirical measures holds uniformly over any VCM class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002059.png" />, meaning that the supremum for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002060.png" /> of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002061.png" /> approaches zero almost surely as <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002062.png" /> becomes large [[#References|[a7]]]. This can be improved to a uniform [[Law of the iterated logarithm|law of the iterated logarithm]], meaning that for any VCM class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002063.png" />, with probability <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002064.png" />,
+
Let $  {\mathsf P} $
 +
be a probability measure on $  ( S, {\mathcal S} ) $
 +
and let $  X _ {1} ,X _ {2} , \dots $
 +
be independent coordinates with distribution $  {\mathsf P} $,  
 +
specifically, on a countable Cartesian product of copies of $  ( S, {\mathcal S}, {\mathsf P} ) $.  
 +
Let $  {\mathsf P} _ {n} $
 +
be the sum of the point masses $  {1 / n } $
 +
at $  X _ {i} $
 +
for $  i = 1 \dots n $;  
 +
it is called an empirical measure for $  {\mathsf P} $(
 +
cf. also [[Empirical process|Empirical process]]). Then the [[Law of large numbers|law of large numbers]] for empirical measures holds uniformly over any VCM class $  {\mathcal C} $,  
 +
meaning that the supremum for $  C \in {\mathcal C} $
 +
of $  | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } | $
 +
approaches zero almost surely as $  n $
 +
becomes large [[#References|[a7]]]. This can be improved to a uniform [[Law of the iterated logarithm|law of the iterated logarithm]], meaning that for any VCM class $  {\mathcal C} $,  
 +
with probability $  1 $,
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002065.png" /></td> </tr></table>
+
$$
 +
{\lim\limits }  \sup  _ {n \rightarrow \infty } n ^ {1/2 }  \sup  _ {C \in {\mathcal C} } {
 +
\frac{\left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } \right | }{( 2 { \mathop{\rm log} } { \mathop{\rm log} } n ) ^ {1/2 } }
 +
} =
 +
$$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002066.png" /></td> </tr></table>
+
$$
 +
=  
 +
\sup  _ {A \in {\mathcal C} } ( {\mathsf P} ( A ) ( 1 - {\mathsf P} ( A ) ) ) ^ {1/2 } .
 +
$$
  
Moreover, a [[Central limit theorem|central limit theorem]] holds uniformly: if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002067.png" /> is any VCM class, and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002068.png" /> assigns to sets in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002069.png" /> jointly normal (Gaussian) random variables with mean zero and covariances <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002070.png" />, then for any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002071.png" /> there is a sufficiently large <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002072.png" /> such that for every <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002073.png" />, there exists a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002074.png" /> with
+
Moreover, a [[Central limit theorem|central limit theorem]] holds uniformly: if $  {\mathcal C} $
 +
is any VCM class, and $  G _  {\mathsf P}  $
 +
assigns to sets in $  {\mathcal C} $
 +
jointly normal (Gaussian) random variables with mean zero and covariances $  {\mathsf E} G _  {\mathsf P}  ( A ) G _  {\mathsf P}  ( B ) = {\mathsf P} ( A \cap B ) - {\mathsf P} ( A ) P ( B ) $,  
 +
then for any $  \epsilon > 0 $
 +
there is a sufficiently large $  m $
 +
such that for every $  n \geq  m $,  
 +
there exists a $  G _  {\mathsf P}  $
 +
with
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002075.png" /></td> </tr></table>
+
$$
 +
\sup  _ {A \in {\mathcal C} } \left | {n ^ {1/2 } ( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) - G _  {\mathsf P}  ( A ) } \right | < \epsilon
 +
$$
  
on an event with probability at least <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002076.png" />. For the uniform central limit theorem to hold for each probability measure <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002077.png" /> on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002078.png" />, the VC property is also necessary.
+
on an event with probability at least $  1 - \epsilon $.  
 +
For the uniform central limit theorem to hold for each probability measure $  {\mathsf P} $
 +
on $  ( S, {\mathcal S} ) $,  
 +
the VC property is also necessary.
  
VC classes can be generated as follows. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002079.png" /> be a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002080.png" />-dimensional [[Vector space|vector space]] of real-valued functions on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002081.png" />. For each <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002082.png" />, let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002083.png" /> be the set where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002084.png" />. Then the class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002085.png" /> of all sets <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002086.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002087.png" /> is a VC class with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002088.png" />. For example, the set of all ellipsoids in a Euclidean space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002089.png" /> is a VCM class for each <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002090.png" />. Also, let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002091.png" /> be a VC class and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002092.png" /> a finite integer. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002093.png" /> be the union of all Boolean algebras of sets (cf. [[Boolean algebra|Boolean algebra]]), each generated by at most <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002094.png" /> sets in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002095.png" />. Then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002096.png" /> is a VC class. For example, the set of all convex polytopes with at most <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002097.png" /> faces in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002098.png" /> is a VC class for each <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002099.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020100.png" />. Classes of projections of positivity sets of polynomials of bounded degree, and some other related classes, are also VC [[#References|[a4]]].
+
VC classes can be generated as follows. Let $  V $
 +
be a $  k $-
 +
dimensional [[Vector space|vector space]] of real-valued functions on $  S $.  
 +
For each $  f \in V $,  
 +
let $  { \mathop{\rm pos} } ( f ) $
 +
be the set where $  f > 0 $.  
 +
Then the class $  {\mathcal C} $
 +
of all sets $  { \mathop{\rm pos} } ( f ) $
 +
for $  f \in V $
 +
is a VC class with $  S ( {\mathcal C} ) = k $.  
 +
For example, the set of all ellipsoids in a Euclidean space $  \mathbf R  ^ {d} $
 +
is a VCM class for each $  d $.  
 +
Also, let $  {\mathcal C} $
 +
be a VC class and $  m $
 +
a finite integer. Let $  {\mathcal D} $
 +
be the union of all Boolean algebras of sets (cf. [[Boolean algebra|Boolean algebra]]), each generated by at most $  m $
 +
sets in $  {\mathcal C} $.  
 +
Then $  {\mathcal D} $
 +
is a VC class. For example, the set of all convex polytopes with at most $  m $
 +
faces in $  \mathbf R  ^ {d} $
 +
is a VC class for each $  m $
 +
and $  d $.  
 +
Classes of projections of positivity sets of polynomials of bounded degree, and some other related classes, are also VC [[#References|[a4]]].
  
The class of all finite sets in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020101.png" /> and the class of all closed convex sets are not VC classes.
+
The class of all finite sets in $  \mathbf R  ^ {d} $
 +
and the class of all closed convex sets are not VC classes.
  
The notion of VC class extends in different ways to a class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020102.png" /> of real functions on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020103.png" />. The subgraph of a function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020104.png" /> is the set
+
The notion of VC class extends in different ways to a class $  {\mathcal F} $
 +
of real functions on $  S $.  
 +
The subgraph of a function $  f $
 +
is the set
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020105.png" /></td> </tr></table>
+
$$
 +
\left \{ {( s,x ) } : {0 \leq  x \leq  f ( s )  \textrm{ or  }  f ( s ) \leq  x \leq  0 } \right \}
 +
$$
  
in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020106.png" />. Then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020107.png" /> is called a VC subgraph class if the collection of all subgraphs of functions in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020108.png" /> is a VC class in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020109.png" />; it is called a VC major class if the class of all sets <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020110.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020111.png" /> and real <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020112.png" /> is a VC class in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020113.png" />.
+
in $  S \times \mathbf R $.  
 +
Then $  {\mathcal F} $
 +
is called a VC subgraph class if the collection of all subgraphs of functions in $  {\mathcal F} $
 +
is a VC class in $  S \times \mathbf R $;  
 +
it is called a VC major class if the class of all sets $  \{ {s \in S } : {f ( s ) > x } \} $
 +
for $  f \in {\mathcal F} $
 +
and real $  x $
 +
is a VC class in $  S $.
  
The above probability limit theorems extend to these and larger classes of functions, with suitable measurability and boundedness. Neither the VC subgraph nor VC major property implies the other. For a uniformly bounded, suitably measurable family of functions, the uniform central limit property for all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020114.png" /> appears not to be equivalent to any VC-type combinatorial property.
+
The above probability limit theorems extend to these and larger classes of functions, with suitable measurability and boundedness. Neither the VC subgraph nor VC major property implies the other. For a uniformly bounded, suitably measurable family of functions, the uniform central limit property for all $  {\mathsf P} $
 +
appears not to be equivalent to any VC-type combinatorial property.
  
For a probability measure <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020115.png" /> and two events <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020116.png" />, let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020117.png" />. For a totally bounded metric space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020118.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020119.png" />, let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020120.png" /> be the maximum number of points of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020121.png" /> all at distance at least <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020122.png" /> from each other. For any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020123.png" /> there is a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020124.png" /> such that for every VCM class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020125.png" /> with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020126.png" /> and any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020127.png" />,
+
For a probability measure $  {\mathsf P} $
 +
and two events $  A,B $,  
 +
let $  d _ {1, {\mathsf P} }  ( A,B ) = {\mathsf E} | {1 _ {A} - 1 _ {B} } | $.  
 +
For a totally bounded metric space $  ( T,d ) $
 +
and  $  \epsilon > 0 $,  
 +
let $  D ( \epsilon,T,d ) $
 +
be the maximum number of points of $  T $
 +
all at distance at least $  \epsilon $
 +
from each other. For any $  m $
 +
there is a $  K _ {m} < \infty $
 +
such that for every VCM class $  {\mathcal C} $
 +
with $  S ( {\mathcal C} ) = m $
 +
and any $  {\mathsf P} $,
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020128.png" /></td> </tr></table>
+
$$
 +
D ( \epsilon, {\mathcal C},d _ {1, {\mathsf P} }  ) \leq  K _ {m} \epsilon ^ {- m } ,
 +
$$
  
[[#References|[a3]]]. There is a universal constant <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020129.png" /> such that for every VCM class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020130.png" /> and any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020131.png" />,
+
[[#References|[a3]]]. There is a universal constant $  K $
 +
such that for every VCM class $  {\mathcal C} $
 +
and any $  M < \infty $,
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020132.png" /></td> </tr></table>
+
$$
 +
{ \mathop{\rm Pr} } \left \{  \sup  _ {A \in {\mathcal C} } \left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) } \right | > M \right \} \leq
 +
$$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020133.png" /></td> </tr></table>
+
$$
 +
\leq 
 +
KM ^ {2S ( {\mathcal C} ) - 1 } { \mathop{\rm exp} } ( - 2M  ^ {2} ) ,
 +
$$
  
 
[[#References|[a5]]].
 
[[#References|[a5]]].
  
Every VC class is included in a maximal class with the same VC index. If <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020134.png" /> is a maximal VC class of index <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020135.png" />, then for any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020136.png" /> the set of symmetric differences <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020137.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020138.png" /> has a tree-like partial ordering by inclusion, and conversely, such an ordering implies <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020139.png" /> [[#References|[a2]]]. For index greater than <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020140.png" /> no such structure is known (1996).
+
Every VC class is included in a maximal class with the same VC index. If $  {\mathcal C} $
 +
is a maximal VC class of index $  1 $,  
 +
then for any $  A \in {\mathcal C} $
 +
the set of symmetric differences $  ( B \setminus  A ) \cup ( A \setminus  B ) $
 +
for $  B \in {\mathcal C} $
 +
has a tree-like partial ordering by inclusion, and conversely, such an ordering implies $  S ( {\mathcal C} ) = 1 $[[#References|[a2]]]. For index greater than $  1 $
 +
no such structure is known (1996).
  
 
A general reference on VC classes of sets and functions, also from the viewpoint of probability and statistics, is [[#References|[a6]]], Sect. 2.6.
 
A general reference on VC classes of sets and functions, also from the viewpoint of probability and statistics, is [[#References|[a6]]], Sect. 2.6.

Revision as of 08:27, 6 June 2020


Vapnik–Červonenkis class

Let $ S $ be a set, $ {\mathcal C} $ a collection of subsets of $ S $ and $ F $ a finite subset of $ S $. Then $ {\mathcal C} $ is said to shatter $ F $ if for every subset $ A $ of $ F $ there is a set $ C $ in $ {\mathcal C} $ with $ A = C \cap F $. If there is a largest, finite $ k $ such that $ {\mathcal C} $ shatters at least one set of cardinality $ k $, then $ {\mathcal C} $ is called a Vapnik–Chervonenkis class, or VC class, of sets and $ S ( {\mathcal C} ) = k $ its Vapnik–Chervonenkis index.

Let $ \Delta ^ {\mathcal C} ( F ) $ be the number of different sets $ C \cap F $ for $ C \in {\mathcal C} $. Let $ m ^ {\mathcal C} ( n ) $ be the maximum of $ \Delta ^ {\mathcal C} ( F ) $ over all sets $ F $ of cardinality $ n $. Thus, $ {\mathcal C} $ is a Vapnik–Chervonenkis class if and only if $ m ^ {\mathcal C} ( n ) < 2 ^ {n} $ for some finite $ n $, and then for all $ n > S ( {\mathcal C} ) $. Sauer's lemma says that

$$ m ^ {\mathcal C} ( n ) \leq \sum _ {j = 0 } ^ { {S } ( {\mathcal C} ) } \left ( \begin{array}{c} n \\ j \end{array} \right ) . $$

Thus, $ m ^ {\mathcal C} ( n ) $ is either always $ 2 ^ {n} $ or, for a Vapnik–Chervonenkis class $ {\mathcal C} $, it is bounded above by a polynomial in $ n $ of degree $ S ( {\mathcal C} ) $. (This is the so-called Vapnik–Chervonenkis property: if $ m ^ {\mathcal C} ( n ) < 2 ^ {n} $ for large $ n $, then $ m ^ {\mathcal C} ( n ) $ is bounded by a polynomial.)

Vapnik–Chervonenkis classes have turned out to be useful in computer science (learning theory [a1]), probability theory and mathematical statistics [a6], because certain probability limit theorems hold uniformly over them under suitable measurability conditions. One such sufficient measurability condition is that there exist a $ \sigma $- algebra $ {\mathcal S} $ of subsets of $ S $, including $ {\mathcal C} $, and a mapping $ Y $ from a complete separable metric space $ U $ onto $ {\mathcal C} $ such that the set of pairs $ ( x,u ) $ with $ x \in Y ( u ) $ is product-measurable in $ S \times U $. A VC class $ {\mathcal C} $ satisfying this last condition is called a VCM class. While VC, but not VCM, classes can be shown to exist using the axiom of choice, the VC classes usually encountered in applications are VCM.

Let $ {\mathsf P} $ be a probability measure on $ ( S, {\mathcal S} ) $ and let $ X _ {1} ,X _ {2} , \dots $ be independent coordinates with distribution $ {\mathsf P} $, specifically, on a countable Cartesian product of copies of $ ( S, {\mathcal S}, {\mathsf P} ) $. Let $ {\mathsf P} _ {n} $ be the sum of the point masses $ {1 / n } $ at $ X _ {i} $ for $ i = 1 \dots n $; it is called an empirical measure for $ {\mathsf P} $( cf. also Empirical process). Then the law of large numbers for empirical measures holds uniformly over any VCM class $ {\mathcal C} $, meaning that the supremum for $ C \in {\mathcal C} $ of $ | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } | $ approaches zero almost surely as $ n $ becomes large [a7]. This can be improved to a uniform law of the iterated logarithm, meaning that for any VCM class $ {\mathcal C} $, with probability $ 1 $,

$$ {\lim\limits } \sup _ {n \rightarrow \infty } n ^ {1/2 } \sup _ {C \in {\mathcal C} } { \frac{\left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } \right | }{( 2 { \mathop{\rm log} } { \mathop{\rm log} } n ) ^ {1/2 } } } = $$

$$ = \sup _ {A \in {\mathcal C} } ( {\mathsf P} ( A ) ( 1 - {\mathsf P} ( A ) ) ) ^ {1/2 } . $$

Moreover, a central limit theorem holds uniformly: if $ {\mathcal C} $ is any VCM class, and $ G _ {\mathsf P} $ assigns to sets in $ {\mathcal C} $ jointly normal (Gaussian) random variables with mean zero and covariances $ {\mathsf E} G _ {\mathsf P} ( A ) G _ {\mathsf P} ( B ) = {\mathsf P} ( A \cap B ) - {\mathsf P} ( A ) P ( B ) $, then for any $ \epsilon > 0 $ there is a sufficiently large $ m $ such that for every $ n \geq m $, there exists a $ G _ {\mathsf P} $ with

$$ \sup _ {A \in {\mathcal C} } \left | {n ^ {1/2 } ( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) - G _ {\mathsf P} ( A ) } \right | < \epsilon $$

on an event with probability at least $ 1 - \epsilon $. For the uniform central limit theorem to hold for each probability measure $ {\mathsf P} $ on $ ( S, {\mathcal S} ) $, the VC property is also necessary.

VC classes can be generated as follows. Let $ V $ be a $ k $- dimensional vector space of real-valued functions on $ S $. For each $ f \in V $, let $ { \mathop{\rm pos} } ( f ) $ be the set where $ f > 0 $. Then the class $ {\mathcal C} $ of all sets $ { \mathop{\rm pos} } ( f ) $ for $ f \in V $ is a VC class with $ S ( {\mathcal C} ) = k $. For example, the set of all ellipsoids in a Euclidean space $ \mathbf R ^ {d} $ is a VCM class for each $ d $. Also, let $ {\mathcal C} $ be a VC class and $ m $ a finite integer. Let $ {\mathcal D} $ be the union of all Boolean algebras of sets (cf. Boolean algebra), each generated by at most $ m $ sets in $ {\mathcal C} $. Then $ {\mathcal D} $ is a VC class. For example, the set of all convex polytopes with at most $ m $ faces in $ \mathbf R ^ {d} $ is a VC class for each $ m $ and $ d $. Classes of projections of positivity sets of polynomials of bounded degree, and some other related classes, are also VC [a4].

The class of all finite sets in $ \mathbf R ^ {d} $ and the class of all closed convex sets are not VC classes.

The notion of VC class extends in different ways to a class $ {\mathcal F} $ of real functions on $ S $. The subgraph of a function $ f $ is the set

$$ \left \{ {( s,x ) } : {0 \leq x \leq f ( s ) \textrm{ or } f ( s ) \leq x \leq 0 } \right \} $$

in $ S \times \mathbf R $. Then $ {\mathcal F} $ is called a VC subgraph class if the collection of all subgraphs of functions in $ {\mathcal F} $ is a VC class in $ S \times \mathbf R $; it is called a VC major class if the class of all sets $ \{ {s \in S } : {f ( s ) > x } \} $ for $ f \in {\mathcal F} $ and real $ x $ is a VC class in $ S $.

The above probability limit theorems extend to these and larger classes of functions, with suitable measurability and boundedness. Neither the VC subgraph nor VC major property implies the other. For a uniformly bounded, suitably measurable family of functions, the uniform central limit property for all $ {\mathsf P} $ appears not to be equivalent to any VC-type combinatorial property.

For a probability measure $ {\mathsf P} $ and two events $ A,B $, let $ d _ {1, {\mathsf P} } ( A,B ) = {\mathsf E} | {1 _ {A} - 1 _ {B} } | $. For a totally bounded metric space $ ( T,d ) $ and $ \epsilon > 0 $, let $ D ( \epsilon,T,d ) $ be the maximum number of points of $ T $ all at distance at least $ \epsilon $ from each other. For any $ m $ there is a $ K _ {m} < \infty $ such that for every VCM class $ {\mathcal C} $ with $ S ( {\mathcal C} ) = m $ and any $ {\mathsf P} $,

$$ D ( \epsilon, {\mathcal C},d _ {1, {\mathsf P} } ) \leq K _ {m} \epsilon ^ {- m } , $$

[a3]. There is a universal constant $ K $ such that for every VCM class $ {\mathcal C} $ and any $ M < \infty $,

$$ { \mathop{\rm Pr} } \left \{ \sup _ {A \in {\mathcal C} } \left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) } \right | > M \right \} \leq $$

$$ \leq KM ^ {2S ( {\mathcal C} ) - 1 } { \mathop{\rm exp} } ( - 2M ^ {2} ) , $$

[a5].

Every VC class is included in a maximal class with the same VC index. If $ {\mathcal C} $ is a maximal VC class of index $ 1 $, then for any $ A \in {\mathcal C} $ the set of symmetric differences $ ( B \setminus A ) \cup ( A \setminus B ) $ for $ B \in {\mathcal C} $ has a tree-like partial ordering by inclusion, and conversely, such an ordering implies $ S ( {\mathcal C} ) = 1 $[a2]. For index greater than $ 1 $ no such structure is known (1996).

A general reference on VC classes of sets and functions, also from the viewpoint of probability and statistics, is [a6], Sect. 2.6.

References

[a1] A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth, "Learnability and the Vapnik–Chervonenkis dimension" JACM , 6 (1989) pp. 929–965
[a2] R.M. Dudley, "The structure of some Vapnik–Červonenkis classes" , Proc. Berkeley Conf.in honor of J. Neyman and J. Kiefer , 2 , Wadsworth (1985) pp. 495–508
[a3] D. Haussler, "Sphere packing numbers for subsets of the Boolean -cube with bounded Vapnik–Chervonenkis dimension" J. Combin. Th. A , 69 (1995) pp. 217–232
[a4] G. Stengle, J. Yukich, "Some new Vapnik–Chervonenkis classes" Ann. Statist. , 17 (1989) pp. 1441–1446
[a5] M. Talagrand, "Sharper bounds for Gaussian and empirical processes" Ann. Probab. , 22 (1994) pp. 28–76
[a6] A. van der Vaart, J. Wellner, "Weak convergence and empirical processes" , Springer (1996)
[a7] V.N. Vapnik, A.Ya. Červonenkis, "On the uniform convergence of frequencies of occurrence of events to their probabilities" Th. Probab. Appl. , 16 (1971) pp. 264–280
[a8] R.M. Dudley, "Central limit theorems for empirical measures" Ann. of Probab. , 6 (1978) pp. 899–929
[a9] R.M. Dudley, "Universal Donsker classes and metric entropy" Ann. of Probab. , 15 (1987) pp. 1306–1326
[a10] D. Pollard, "Convergence of stochastic processes" , Springer (1984)
How to Cite This Entry:
Vapnik-Chervonenkis class. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Vapnik-Chervonenkis_class&oldid=49110
This article was adapted from an original article by R.M. DudleyJ.H.J. Einmahl (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article