Difference between revisions of "Vapnik-Chervonenkis class"

Revision as of 08:27, 6 June 2020

Vapnik–Červonenkis class

Let $ S $ be a set, $ {\mathcal C} $ a collection of subsets of $ S $ and $ F $ a finite subset of $ S $. Then $ {\mathcal C} $ is said to shatter $ F $ if for every subset $ A $ of $ F $ there is a set $ C $ in $ {\mathcal C} $ with $ A = C \cap F $. If there is a largest, finite $ k $ such that $ {\mathcal C} $ shatters at least one set of cardinality $ k $, then $ {\mathcal C} $ is called a Vapnik–Chervonenkis class, or VC class, of sets and $ S ( {\mathcal C} ) = k $ its Vapnik–Chervonenkis index.

Let $ \Delta ^ {\mathcal C} ( F ) $ be the number of different sets $ C \cap F $ for $ C \in {\mathcal C} $. Let $ m ^ {\mathcal C} ( n ) $ be the maximum of $ \Delta ^ {\mathcal C} ( F ) $ over all sets $ F $ of cardinality $ n $. Thus, $ {\mathcal C} $ is a Vapnik–Chervonenkis class if and only if $ m ^ {\mathcal C} ( n ) < 2 ^ {n} $ for some finite $ n $, and then for all $ n > S ( {\mathcal C} ) $. Sauer's lemma says that

$$ m ^ {\mathcal C} ( n ) \leq \sum _ {j = 0 } ^ { {S } ( {\mathcal C} ) } \left ( \begin{array}{c} n \\ j \end{array} \right ) . $$

Thus, $ m ^ {\mathcal C} ( n ) $ is either always $ 2 ^ {n} $ or, for a Vapnik–Chervonenkis class $ {\mathcal C} $, it is bounded above by a polynomial in $ n $ of degree $ S ( {\mathcal C} ) $. (This is the so-called Vapnik–Chervonenkis property: if $ m ^ {\mathcal C} ( n ) < 2 ^ {n} $ for large $ n $, then $ m ^ {\mathcal C} ( n ) $ is bounded by a polynomial.)

Vapnik–Chervonenkis classes have turned out to be useful in computer science (learning theory [a1]), probability theory and mathematical statistics [a6], because certain probability limit theorems hold uniformly over them under suitable measurability conditions. One such sufficient measurability condition is that there exist a $ \sigma $- algebra $ {\mathcal S} $ of subsets of $ S $, including $ {\mathcal C} $, and a mapping $ Y $ from a complete separable metric space $ U $ onto $ {\mathcal C} $ such that the set of pairs $ ( x,u ) $ with $ x \in Y ( u ) $ is product-measurable in $ S \times U $. A VC class $ {\mathcal C} $ satisfying this last condition is called a VCM class. While VC, but not VCM, classes can be shown to exist using the axiom of choice, the VC classes usually encountered in applications are VCM.

Let $ {\mathsf P} $ be a probability measure on $ ( S, {\mathcal S} ) $ and let $ X _ {1} ,X _ {2} , \dots $ be independent coordinates with distribution $ {\mathsf P} $, specifically, on a countable Cartesian product of copies of $ ( S, {\mathcal S}, {\mathsf P} ) $. Let $ {\mathsf P} _ {n} $ be the sum of the point masses $ {1 / n } $ at $ X _ {i} $ for $ i = 1 \dots n $; it is called an empirical measure for $ {\mathsf P} $( cf. also Empirical process). Then the law of large numbers for empirical measures holds uniformly over any VCM class $ {\mathcal C} $, meaning that the supremum for $ C \in {\mathcal C} $ of $ | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } | $ approaches zero almost surely as $ n $ becomes large [a7]. This can be improved to a uniform law of the iterated logarithm, meaning that for any VCM class $ {\mathcal C} $, with probability $ 1 $,

$$ {\lim\limits } \sup _ {n \rightarrow \infty } n ^ {1/2 } \sup _ {C \in {\mathcal C} } { \frac{\left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } \right | }{( 2 { \mathop{\rm log} } { \mathop{\rm log} } n ) ^ {1/2 } } } = $$

$$ = \sup _ {A \in {\mathcal C} } ( {\mathsf P} ( A ) ( 1 - {\mathsf P} ( A ) ) ) ^ {1/2 } . $$

Moreover, a central limit theorem holds uniformly: if $ {\mathcal C} $ is any VCM class, and $ G _ {\mathsf P} $ assigns to sets in $ {\mathcal C} $ jointly normal (Gaussian) random variables with mean zero and covariances $ {\mathsf E} G _ {\mathsf P} ( A ) G _ {\mathsf P} ( B ) = {\mathsf P} ( A \cap B ) - {\mathsf P} ( A ) P ( B ) $, then for any $ \epsilon > 0 $ there is a sufficiently large $ m $ such that for every $ n \geq m $, there exists a $ G _ {\mathsf P} $ with

$$ \sup _ {A \in {\mathcal C} } \left | {n ^ {1/2 } ( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) - G _ {\mathsf P} ( A ) } \right | < \epsilon $$

on an event with probability at least $ 1 - \epsilon $. For the uniform central limit theorem to hold for each probability measure $ {\mathsf P} $ on $ ( S, {\mathcal S} ) $, the VC property is also necessary.

VC classes can be generated as follows. Let $ V $ be a $ k $- dimensional vector space of real-valued functions on $ S $. For each $ f \in V $, let $ { \mathop{\rm pos} } ( f ) $ be the set where $ f > 0 $. Then the class $ {\mathcal C} $ of all sets $ { \mathop{\rm pos} } ( f ) $ for $ f \in V $ is a VC class with $ S ( {\mathcal C} ) = k $. For example, the set of all ellipsoids in a Euclidean space $ \mathbf R ^ {d} $ is a VCM class for each $ d $. Also, let $ {\mathcal C} $ be a VC class and $ m $ a finite integer. Let $ {\mathcal D} $ be the union of all Boolean algebras of sets (cf. Boolean algebra), each generated by at most $ m $ sets in $ {\mathcal C} $. Then $ {\mathcal D} $ is a VC class. For example, the set of all convex polytopes with at most $ m $ faces in $ \mathbf R ^ {d} $ is a VC class for each $ m $ and $ d $. Classes of projections of positivity sets of polynomials of bounded degree, and some other related classes, are also VC [a4].

The class of all finite sets in $ \mathbf R ^ {d} $ and the class of all closed convex sets are not VC classes.

The notion of VC class extends in different ways to a class $ {\mathcal F} $ of real functions on $ S $. The subgraph of a function $ f $ is the set

$$ \left \{ {( s,x ) } : {0 \leq x \leq f ( s ) \textrm{ or } f ( s ) \leq x \leq 0 } \right \} $$

in $ S \times \mathbf R $. Then $ {\mathcal F} $ is called a VC subgraph class if the collection of all subgraphs of functions in $ {\mathcal F} $ is a VC class in $ S \times \mathbf R $; it is called a VC major class if the class of all sets $ \{ {s \in S } : {f ( s ) > x } \} $ for $ f \in {\mathcal F} $ and real $ x $ is a VC class in $ S $.

The above probability limit theorems extend to these and larger classes of functions, with suitable measurability and boundedness. Neither the VC subgraph nor VC major property implies the other. For a uniformly bounded, suitably measurable family of functions, the uniform central limit property for all $ {\mathsf P} $ appears not to be equivalent to any VC-type combinatorial property.

For a probability measure $ {\mathsf P} $ and two events $ A,B $, let $ d _ {1, {\mathsf P} } ( A,B ) = {\mathsf E} | {1 _ {A} - 1 _ {B} } | $. For a totally bounded metric space $ ( T,d ) $ and $ \epsilon > 0 $, let $ D ( \epsilon,T,d ) $ be the maximum number of points of $ T $ all at distance at least $ \epsilon $ from each other. For any $ m $ there is a $ K _ {m} < \infty $ such that for every VCM class $ {\mathcal C} $ with $ S ( {\mathcal C} ) = m $ and any $ {\mathsf P} $,

$$ D ( \epsilon, {\mathcal C},d _ {1, {\mathsf P} } ) \leq K _ {m} \epsilon ^ {- m } , $$

[a3]. There is a universal constant $ K $ such that for every VCM class $ {\mathcal C} $ and any $ M < \infty $,

$$ { \mathop{\rm Pr} } \left \{ \sup _ {A \in {\mathcal C} } \left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) } \right | > M \right \} \leq $$

$$ \leq KM ^ {2S ( {\mathcal C} ) - 1 } { \mathop{\rm exp} } ( - 2M ^ {2} ) , $$

[a5].

Every VC class is included in a maximal class with the same VC index. If $ {\mathcal C} $ is a maximal VC class of index $ 1 $, then for any $ A \in {\mathcal C} $ the set of symmetric differences $ ( B \setminus A ) \cup ( A \setminus B ) $ for $ B \in {\mathcal C} $ has a tree-like partial ordering by inclusion, and conversely, such an ordering implies $ S ( {\mathcal C} ) = 1 $[a2]. For index greater than $ 1 $ no such structure is known (1996).

A general reference on VC classes of sets and functions, also from the viewpoint of probability and statistics, is [a6], Sect. 2.6.

References

[a1]	A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth, "Learnability and the Vapnik–Chervonenkis dimension" JACM , 6 (1989) pp. 929–965
[a2]	R.M. Dudley, "The structure of some Vapnik–Červonenkis classes" , Proc. Berkeley Conf.in honor of J. Neyman and J. Kiefer , 2 , Wadsworth (1985) pp. 495–508
[a3]	D. Haussler, "Sphere packing numbers for subsets of the Boolean -cube with bounded Vapnik–Chervonenkis dimension" J. Combin. Th. A , 69 (1995) pp. 217–232
[a4]	G. Stengle, J. Yukich, "Some new Vapnik–Chervonenkis classes" Ann. Statist. , 17 (1989) pp. 1441–1446
[a5]	M. Talagrand, "Sharper bounds for Gaussian and empirical processes" Ann. Probab. , 22 (1994) pp. 28–76
[a6]	A. van der Vaart, J. Wellner, "Weak convergence and empirical processes" , Springer (1996)
[a7]	V.N. Vapnik, A.Ya. Červonenkis, "On the uniform convergence of frequencies of occurrence of events to their probabilities" Th. Probab. Appl. , 16 (1971) pp. 264–280
[a8]	R.M. Dudley, "Central limit theorems for empirical measures" Ann. of Probab. , 6 (1978) pp. 899–929
[a9]	R.M. Dudley, "Universal Donsker classes and metric entropy" Ann. of Probab. , 15 (1987) pp. 1306–1326
[a10]	D. Pollard, "Convergence of stochastic processes" , Springer (1984)

How to Cite This Entry:
Vapnik-Chervonenkis class. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Vapnik-Chervonenkis_class&oldid=49110

This article was adapted from an original article by R.M. DudleyJ.H.J. Einmahl (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article

Navigation

Tools

Namespaces

Variants

Views

Actions

Difference between revisions of "Vapnik-Chervonenkis class"

Revision as of 08:27, 6 June 2020

References

@@ Line 1: / Line 1: @@
+<!--
+v1100201.png
+$#A+1 = 140 n = 1
+$#C+1 = 140 : ~/encyclopedia/old_files/data/V110/V.1100020 Vapnik\ANDChervonenkis class,
+Automatically converted into TeX, above some diagnostics.
+Please remove this comment and the {{TEX|auto}} line below,
+if TeX found to be correct.
+-->
+{{TEX|auto}}
+{{TEX|done}}
 ''Vapnik–Červonenkis class''
-Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100201.png" /> be a set, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100202.png" /> a collection of subsets of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100203.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100204.png" /> a finite subset of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100205.png" />. Then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100206.png" /> is said to shatter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100207.png" /> if for every subset <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100208.png" /> of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100209.png" /> there is a set <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002010.png" /> in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002011.png" /> with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002012.png" />. If there is a largest, finite <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002013.png" /> such that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002014.png" /> shatters at least one set of cardinality <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002015.png" />, then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002016.png" /> is called a Vapnik–Chervonenkis class, or VC class, of sets and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002017.png" /> its Vapnik–Chervonenkis index.
+Let  $  S $
+be a set,  $  {\mathcal C} $
+a collection of subsets of  $  S $
+and  $  F $
+a finite subset of  $  S $.
+Then  $  {\mathcal C} $
+is said to shatter  $  F $
+if for every subset  $  A $
+of  $  F $
+there is a set  $  C $
+in  $  {\mathcal C} $
+with  $  A = C \cap F $.
+If there is a largest, finite  $  k $
+such that  $  {\mathcal C} $
+shatters at least one set of cardinality  $  k $,
+then  $  {\mathcal C} $
+is called a Vapnik–Chervonenkis class, or VC class, of sets and  $  S ( {\mathcal C} ) = k $
+its Vapnik–Chervonenkis index.
-Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002018.png" /> be the number of different sets <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002019.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002020.png" />. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002021.png" /> be the maximum of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002022.png" /> over all sets <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002023.png" /> of cardinality <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002024.png" />. Thus, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002025.png" /> is a Vapnik–Chervonenkis class if and only if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002026.png" /> for some finite <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002027.png" />, and then for all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002028.png" />. Sauer's lemma says that
+Let  $  \Delta  ^  {\mathcal C}  ( F ) $
+be the number of different sets  $  C \cap F $
+for  $  C \in {\mathcal C} $.
+Let  $  m  ^  {\mathcal C}  ( n ) $
+be the maximum of  $  \Delta  ^  {\mathcal C}  ( F ) $
+over all sets  $  F $
+of cardinality  $  n $.
+Thus,  $  {\mathcal C} $
+is a Vapnik–Chervonenkis class if and only if  $  m  ^  {\mathcal C}  ( n ) < 2  ^ {n} $
+for some finite  $  n $,
+and then for all  $  n > S ( {\mathcal C} ) $.
+Sauer's lemma says that
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002029.png" /></td> </tr></table>
+$$
+m  ^  {\mathcal C}  ( n ) \leq  \sum _ {j = 0 } ^ { {S }  ( {\mathcal C} ) } \left ( \begin{array}{c}
+n \\
+ j
+\end{array}
+ \right ) .
+$$
-Thus, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002030.png" /> is either always <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002031.png" /> or, for a Vapnik–Chervonenkis class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002032.png" />, it is bounded above by a polynomial in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002033.png" /> of degree <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002034.png" />. (This is the so-called Vapnik–Chervonenkis property: if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002035.png" /> for large <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002036.png" />, then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002037.png" /> is bounded by a polynomial.)
+Thus,  $  m  ^  {\mathcal C}  ( n ) $
+is either always  $  2  ^ {n} $
+or, for a Vapnik–Chervonenkis class  $  {\mathcal C} $,
+it is bounded above by a polynomial in  $  n $
+of degree  $  S ( {\mathcal C} ) $.
+(This is the so-called Vapnik–Chervonenkis property: if  $  m  ^  {\mathcal C}  ( n ) < 2  ^ {n} $
+for large  $  n $,
+then  $  m  ^  {\mathcal C}  ( n ) $
+is bounded by a polynomial.)
-Vapnik–Chervonenkis classes have turned out to be useful in computer science (learning theory [[#References|[a1]]]), [[Probability theory|probability theory]] and [[Mathematical statistics|mathematical statistics]] [[#References|[a6]]], because certain probability limit theorems hold uniformly over them under suitable measurability conditions. One such sufficient measurability condition is that there exist a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002038.png" />-algebra <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002039.png" /> of subsets of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002040.png" />, including <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002041.png" />, and a mapping <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002042.png" /> from a complete separable [[Metric space|metric space]] <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002043.png" /> onto <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002044.png" /> such that the set of pairs <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002045.png" /> with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002046.png" /> is product-measurable in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002047.png" />. A VC class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002048.png" /> satisfying this last condition is called a VCM class. While VC, but not VCM, classes can be shown to exist using the [[Axiom of choice|axiom of choice]], the VC classes usually encountered in applications are VCM.
+Vapnik–Chervonenkis classes have turned out to be useful in computer science (learning theory [[#References|[a1]]]), [[Probability theory|probability theory]] and [[Mathematical statistics|mathematical statistics]] [[#References|[a6]]], because certain probability limit theorems hold uniformly over them under suitable measurability conditions. One such sufficient measurability condition is that there exist a  $  \sigma $-
+algebra  $  {\mathcal S} $
+of subsets of  $  S $,
+including  $  {\mathcal C} $,
+and a mapping  $  Y $
+from a complete separable [[Metric space|metric space]]  $  U $
+onto  $  {\mathcal C} $
+such that the set of pairs  $  ( x,u ) $
+with  $  x \in Y ( u ) $
+is product-measurable in  $  S \times U $.
+A VC class  $  {\mathcal C} $
+satisfying this last condition is called a VCM class. While VC, but not VCM, classes can be shown to exist using the [[Axiom of choice|axiom of choice]], the VC classes usually encountered in applications are VCM.
-Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002049.png" /> be a probability measure on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002050.png" /> and let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002051.png" /> be independent coordinates with distribution <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002052.png" />, specifically, on a countable Cartesian product of copies of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002053.png" />. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002054.png" /> be the sum of the point masses <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002055.png" /> at <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002056.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002057.png" />; it is called an empirical measure for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002058.png" /> (cf. also [[Empirical process|Empirical process]]). Then the [[Law of large numbers|law of large numbers]] for empirical measures holds uniformly over any VCM class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002059.png" />, meaning that the supremum for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002060.png" /> of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002061.png" /> approaches zero almost surely as <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002062.png" /> becomes large [[#References|[a7]]]. This can be improved to a uniform [[Law of the iterated logarithm|law of the iterated logarithm]], meaning that for any VCM class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002063.png" />, with probability <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002064.png" />,
+Let  $  {\mathsf P} $
+be a probability measure on  $  ( S, {\mathcal S} ) $
+and let  $  X _ {1} ,X _ {2} , \dots $
+be independent coordinates with distribution  $  {\mathsf P} $,
+specifically, on a countable Cartesian product of copies of  $  ( S, {\mathcal S}, {\mathsf P} ) $.
+Let  $  {\mathsf P} _ {n} $
+be the sum of the point masses  $  {1 / n } $
+at  $  X _ {i} $
+for  $  i = 1 \dots n $;
+it is called an empirical measure for  $  {\mathsf P} $(
+cf. also [[Empirical process|Empirical process]]). Then the [[Law of large numbers|law of large numbers]] for empirical measures holds uniformly over any VCM class  $  {\mathcal C} $,
+meaning that the supremum for  $  C \in {\mathcal C} $
+of  $  | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } | $
+approaches zero almost surely as  $  n $
+becomes large [[#References|[a7]]]. This can be improved to a uniform [[Law of the iterated logarithm|law of the iterated logarithm]], meaning that for any VCM class  $  {\mathcal C} $,
+with probability  $  1 $,
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002065.png" /></td> </tr></table>
+$$
+{\lim\limits }  \sup  _ {n \rightarrow \infty } n ^ {1/2 }  \sup  _ {C \in {\mathcal C} } {
+\frac{\left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } \right | }{( 2 { \mathop{\rm log} } { \mathop{\rm log} } n ) ^ {1/2 } }
+ } =
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002066.png" /></td> </tr></table>
+$$
+=
+ \sup  _ {A \in {\mathcal C} } ( {\mathsf P} ( A ) ( 1 - {\mathsf P} ( A ) ) ) ^ {1/2 } .
+$$
-Moreover, a [[Central limit theorem|central limit theorem]] holds uniformly: if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002067.png" /> is any VCM class, and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002068.png" /> assigns to sets in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002069.png" /> jointly normal (Gaussian) random variables with mean zero and covariances <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002070.png" />, then for any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002071.png" /> there is a sufficiently large <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002072.png" /> such that for every <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002073.png" />, there exists a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002074.png" /> with
+Moreover, a [[Central limit theorem|central limit theorem]] holds uniformly: if  $  {\mathcal C} $
+is any VCM class, and  $  G _  {\mathsf P}  $
+assigns to sets in  $  {\mathcal C} $
+jointly normal (Gaussian) random variables with mean zero and covariances  $  {\mathsf E} G _  {\mathsf P}  ( A ) G _  {\mathsf P}  ( B ) = {\mathsf P} ( A \cap B ) - {\mathsf P} ( A ) P ( B ) $,
+then for any  $  \epsilon > 0 $
+there is a sufficiently large  $  m $
+such that for every  $  n \geq  m $,
+there exists a  $  G _  {\mathsf P}  $
+with
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002075.png" /></td> </tr></table>
+$$
+ \sup  _ {A \in {\mathcal C} } \left | {n ^ {1/2 } ( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) - G _  {\mathsf P}  ( A ) } \right | < \epsilon
+$$
-on an event with probability at least <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002076.png" />. For the uniform central limit theorem to hold for each probability measure <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002077.png" /> on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002078.png" />, the VC property is also necessary.
+on an event with probability at least  $  1 - \epsilon $.
+For the uniform central limit theorem to hold for each probability measure  $  {\mathsf P} $
+on  $  ( S, {\mathcal S} ) $,
+the VC property is also necessary.
-VC classes can be generated as follows. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002079.png" /> be a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002080.png" />-dimensional [[Vector space|vector space]] of real-valued functions on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002081.png" />. For each <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002082.png" />, let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002083.png" /> be the set where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002084.png" />. Then the class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002085.png" /> of all sets <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002086.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002087.png" /> is a VC class with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002088.png" />. For example, the set of all ellipsoids in a Euclidean space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002089.png" /> is a VCM class for each <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002090.png" />. Also, let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002091.png" /> be a VC class and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002092.png" /> a finite integer. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002093.png" /> be the union of all Boolean algebras of sets (cf. [[Boolean algebra|Boolean algebra]]), each generated by at most <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002094.png" /> sets in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002095.png" />. Then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002096.png" /> is a VC class. For example, the set of all convex polytopes with at most <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002097.png" /> faces in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002098.png" /> is a VC class for each <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002099.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020100.png" />. Classes of projections of positivity sets of polynomials of bounded degree, and some other related classes, are also VC [[#References|[a4]]].
+VC classes can be generated as follows. Let  $  V $
+be a  $  k $-
+dimensional [[Vector space|vector space]] of real-valued functions on  $  S $.
+For each  $  f \in V $,
+let  $  { \mathop{\rm pos} } ( f ) $
+be the set where  $  f > 0 $.
+Then the class  $  {\mathcal C} $
+of all sets  $  { \mathop{\rm pos} } ( f ) $
+for  $  f \in V $
+is a VC class with  $  S ( {\mathcal C} ) = k $.
+For example, the set of all ellipsoids in a Euclidean space  $  \mathbf R  ^ {d} $
+is a VCM class for each  $  d $.
+Also, let  $  {\mathcal C} $
+be a VC class and  $  m $
+a finite integer. Let  $  {\mathcal D} $
+be the union of all Boolean algebras of sets (cf. [[Boolean algebra|Boolean algebra]]), each generated by at most  $  m $
+sets in  $  {\mathcal C} $.
+Then  $  {\mathcal D} $
+is a VC class. For example, the set of all convex polytopes with at most  $  m $
+faces in  $  \mathbf R  ^ {d} $
+is a VC class for each  $  m $
+and  $  d $.
+Classes of projections of positivity sets of polynomials of bounded degree, and some other related classes, are also VC [[#References|[a4]]].
-The class of all finite sets in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020101.png" /> and the class of all closed convex sets are not VC classes.
+The class of all finite sets in  $  \mathbf R  ^ {d} $
+and the class of all closed convex sets are not VC classes.
-The notion of VC class extends in different ways to a class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020102.png" /> of real functions on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020103.png" />. The subgraph of a function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020104.png" /> is the set
+The notion of VC class extends in different ways to a class  $  {\mathcal F} $
+of real functions on  $  S $.
+The subgraph of a function  $  f $
+is the set
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020105.png" /></td> </tr></table>
+$$
+\left \{ {( s,x ) } : {0 \leq  x \leq  f ( s )  \textrm{ or  }  f ( s ) \leq  x \leq  0 } \right \}
+$$
-in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020106.png" />. Then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020107.png" /> is called a VC subgraph class if the collection of all subgraphs of functions in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020108.png" /> is a VC class in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020109.png" />; it is called a VC major class if the class of all sets <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020110.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020111.png" /> and real <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020112.png" /> is a VC class in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020113.png" />.
+in  $  S \times \mathbf R $.
+Then  $  {\mathcal F} $
+is called a VC subgraph class if the collection of all subgraphs of functions in  $  {\mathcal F} $
+is a VC class in  $  S \times \mathbf R $;
+it is called a VC major class if the class of all sets  $  \{ {s \in S } : {f ( s ) > x } \} $
+for  $  f \in {\mathcal F} $
+and real  $  x $
+is a VC class in  $  S $.
-The above probability limit theorems extend to these and larger classes of functions, with suitable measurability and boundedness. Neither the VC subgraph nor VC major property implies the other. For a uniformly bounded, suitably measurable family of functions, the uniform central limit property for all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020114.png" /> appears not to be equivalent to any VC-type combinatorial property.
+The above probability limit theorems extend to these and larger classes of functions, with suitable measurability and boundedness. Neither the VC subgraph nor VC major property implies the other. For a uniformly bounded, suitably measurable family of functions, the uniform central limit property for all  $  {\mathsf P} $
+appears not to be equivalent to any VC-type combinatorial property.
-For a probability measure <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020115.png" /> and two events <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020116.png" />, let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020117.png" />. For a totally bounded metric space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020118.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020119.png" />, let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020120.png" /> be the maximum number of points of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020121.png" /> all at distance at least <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020122.png" /> from each other. For any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020123.png" /> there is a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020124.png" /> such that for every VCM class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020125.png" /> with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020126.png" /> and any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020127.png" />,
+For a probability measure  $  {\mathsf P} $
+and two events  $  A,B $,
+let  $  d _ {1, {\mathsf P} }  ( A,B ) = {\mathsf E} | {1 _ {A} - 1 _ {B} } | $.
+For a totally bounded metric space  $  ( T,d ) $
+and  $  \epsilon > 0 $,
+let  $  D ( \epsilon,T,d ) $
+be the maximum number of points of  $  T $
+all at distance at least  $  \epsilon $
+from each other. For any  $  m $
+there is a  $  K _ {m} < \infty $
+such that for every VCM class  $  {\mathcal C} $
+with  $  S ( {\mathcal C} ) = m $
+and any  $  {\mathsf P} $,
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020128.png" /></td> </tr></table>
+$$
+D ( \epsilon, {\mathcal C},d _ {1, {\mathsf P} }  ) \leq  K _ {m} \epsilon ^ {- m } ,
+$$
-[[#References|[a3]]]. There is a universal constant <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020129.png" /> such that for every VCM class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020130.png" /> and any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020131.png" />,
+[[#References|[a3]]]. There is a universal constant  $  K $
+such that for every VCM class  $  {\mathcal C} $
+and any  $  M < \infty $,
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020132.png" /></td> </tr></table>
+$$
+{ \mathop{\rm Pr} } \left \{  \sup  _ {A \in {\mathcal C} } \left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) } \right | > M \right \} \leq
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020133.png" /></td> </tr></table>
+$$
+\leq
+KM ^ {2S ( {\mathcal C} ) - 1 } { \mathop{\rm exp} } ( - 2M  ^ {2} ) ,
+$$
 [[#References|[a5]]].
-Every VC class is included in a maximal class with the same VC index. If <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020134.png" /> is a maximal VC class of index <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020135.png" />, then for any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020136.png" /> the set of symmetric differences <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020137.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020138.png" /> has a tree-like partial ordering by inclusion, and conversely, such an ordering implies <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020139.png" /> [[#References|[a2]]]. For index greater than <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020140.png" /> no such structure is known (1996).
+Every VC class is included in a maximal class with the same VC index. If  $  {\mathcal C} $
+is a maximal VC class of index  $  1 $,
+then for any  $  A \in {\mathcal C} $
+the set of symmetric differences  $  ( B \setminus  A ) \cup ( A \setminus  B ) $
+for  $  B \in {\mathcal C} $
+has a tree-like partial ordering by inclusion, and conversely, such an ordering implies  $  S ( {\mathcal C} ) = 1 $[[#References|[a2]]]. For index greater than  $  1 $
+no such structure is known (1996).
 A general reference on VC classes of sets and functions, also from the viewpoint of probability and statistics, is [[#References|[a6]]], Sect. 2.6.