Difference between revisions of "Vapnik-Chervonenkis class"
Ulf Rehmann (talk | contribs) m (moved Vapnik–Chervonenkis class to Vapnik-Chervonenkis class: ascii title) |
m (fix tex) |
||
(One intermediate revision by one other user not shown) | |||
Line 1: | Line 1: | ||
+ | <!-- | ||
+ | v1100201.png | ||
+ | $#A+1 = 140 n = 1 | ||
+ | $#C+1 = 140 : ~/encyclopedia/old_files/data/V110/V.1100020 Vapnik\ANDChervonenkis class, | ||
+ | Automatically converted into TeX, above some diagnostics. | ||
+ | Please remove this comment and the {{TEX|auto}} line below, | ||
+ | if TeX found to be correct. | ||
+ | --> | ||
+ | |||
+ | {{TEX|auto}} | ||
+ | {{TEX|done}} | ||
+ | |||
''Vapnik–Červonenkis class'' | ''Vapnik–Červonenkis class'' | ||
− | Let | + | Let $ S $ |
+ | be a set, $ {\mathcal C} $ | ||
+ | a collection of subsets of $ S $ | ||
+ | and $ F $ | ||
+ | a finite subset of $ S $. | ||
+ | Then $ {\mathcal C} $ | ||
+ | is said to shatter $ F $ | ||
+ | if for every subset $ A $ | ||
+ | of $ F $ | ||
+ | there is a set $ C $ | ||
+ | in $ {\mathcal C} $ | ||
+ | with $ A = C \cap F $. | ||
+ | If there is a largest, finite $ k $ | ||
+ | such that $ {\mathcal C} $ | ||
+ | shatters at least one set of cardinality $ k $, | ||
+ | then $ {\mathcal C} $ | ||
+ | is called a Vapnik–Chervonenkis class, or VC class, of sets and $ S ( {\mathcal C} ) = k $ | ||
+ | its Vapnik–Chervonenkis index. | ||
− | Let | + | Let $ \Delta ^ {\mathcal C} ( F ) $ |
+ | be the number of different sets $ C \cap F $ | ||
+ | for $ C \in {\mathcal C} $. | ||
+ | Let $ m ^ {\mathcal C} ( n ) $ | ||
+ | be the maximum of $ \Delta ^ {\mathcal C} ( F ) $ | ||
+ | over all sets $ F $ | ||
+ | of cardinality $ n $. | ||
+ | Thus, $ {\mathcal C} $ | ||
+ | is a Vapnik–Chervonenkis class if and only if $ m ^ {\mathcal C} ( n ) < 2 ^ {n} $ | ||
+ | for some finite $ n $, | ||
+ | and then for all $ n > S ( {\mathcal C} ) $. | ||
+ | Sauer's lemma says that | ||
− | + | $$ | |
+ | m ^ {\mathcal C} ( n ) \leq \sum _ {j = 0 } ^ { {S } ( {\mathcal C} ) } \left ( \begin{array}{c} | ||
+ | n \\ | ||
+ | j | ||
+ | \end{array} | ||
+ | \right ) . | ||
+ | $$ | ||
− | Thus, | + | Thus, $ m ^ {\mathcal C} ( n ) $ |
+ | is either always $ 2 ^ {n} $ | ||
+ | or, for a Vapnik–Chervonenkis class $ {\mathcal C} $, | ||
+ | it is bounded above by a polynomial in $ n $ | ||
+ | of degree $ S ( {\mathcal C} ) $. | ||
+ | (This is the so-called Vapnik–Chervonenkis property: if $ m ^ {\mathcal C} ( n ) < 2 ^ {n} $ | ||
+ | for large $ n $, | ||
+ | then $ m ^ {\mathcal C} ( n ) $ | ||
+ | is bounded by a polynomial.) | ||
− | Vapnik–Chervonenkis classes have turned out to be useful in computer science (learning theory [[#References|[a1]]]), [[Probability theory|probability theory]] and [[Mathematical statistics|mathematical statistics]] [[#References|[a6]]], because certain probability limit theorems hold uniformly over them under suitable measurability conditions. One such sufficient measurability condition is that there exist a | + | Vapnik–Chervonenkis classes have turned out to be useful in computer science (learning theory [[#References|[a1]]]), [[Probability theory|probability theory]] and [[Mathematical statistics|mathematical statistics]] [[#References|[a6]]], because certain probability limit theorems hold uniformly over them under suitable measurability conditions. One such sufficient measurability condition is that there exist a $ \sigma $- |
+ | algebra $ {\mathcal S} $ | ||
+ | of subsets of $ S $, | ||
+ | including $ {\mathcal C} $, | ||
+ | and a mapping $ Y $ | ||
+ | from a complete separable [[Metric space|metric space]] $ U $ | ||
+ | onto $ {\mathcal C} $ | ||
+ | such that the set of pairs $ ( x,u ) $ | ||
+ | with $ x \in Y ( u ) $ | ||
+ | is product-measurable in $ S \times U $. | ||
+ | A VC class $ {\mathcal C} $ | ||
+ | satisfying this last condition is called a VCM class. While VC, but not VCM, classes can be shown to exist using the [[Axiom of choice|axiom of choice]], the VC classes usually encountered in applications are VCM. | ||
− | Let | + | Let $ {\mathsf P} $ |
+ | be a probability measure on $ ( S, {\mathcal S} ) $ | ||
+ | and let $ X _ {1} ,X _ {2} , \dots $ | ||
+ | be independent coordinates with distribution $ {\mathsf P} $, | ||
+ | specifically, on a countable Cartesian product of copies of $ ( S, {\mathcal S}, {\mathsf P} ) $. | ||
+ | Let $ {\mathsf P} _ {n} $ | ||
+ | be the sum of the point masses $ {1 / n } $ | ||
+ | at $ X _ {i} $ | ||
+ | for $ i = 1 \dots n $; | ||
+ | it is called an empirical measure for $ {\mathsf P} $( | ||
+ | cf. also [[Empirical process|Empirical process]]). Then the [[Law of large numbers|law of large numbers]] for empirical measures holds uniformly over any VCM class $ {\mathcal C} $, | ||
+ | meaning that the supremum for $ C \in {\mathcal C} $ | ||
+ | of $ | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } | $ | ||
+ | approaches zero almost surely as $ n $ | ||
+ | becomes large [[#References|[a7]]]. This can be improved to a uniform [[Law of the iterated logarithm|law of the iterated logarithm]], meaning that for any VCM class $ {\mathcal C} $, | ||
+ | with probability $ 1 $, | ||
− | + | $$ | |
+ | {\lim\limits } \sup _ {n \rightarrow \infty } n ^ {1/2 } \sup _ {C \in {\mathcal C} } { | ||
+ | \frac{\left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } \right | }{( 2 { \mathop{\rm log} } { \mathop{\rm log} } n ) ^ {1/2 } } | ||
+ | } = | ||
+ | $$ | ||
− | + | $$ | |
+ | = | ||
+ | \sup _ {A \in {\mathcal C} } ( {\mathsf P} ( A ) ( 1 - {\mathsf P} ( A ) ) ) ^ {1/2 } . | ||
+ | $$ | ||
− | Moreover, a [[Central limit theorem|central limit theorem]] holds uniformly: if | + | Moreover, a [[Central limit theorem|central limit theorem]] holds uniformly: if $ {\mathcal C} $ |
+ | is any VCM class, and $ G _ {\mathsf P} $ | ||
+ | assigns to sets in $ {\mathcal C} $ | ||
+ | jointly normal (Gaussian) random variables with mean zero and covariances $ {\mathsf E} G _ {\mathsf P} ( A ) G _ {\mathsf P} ( B ) = {\mathsf P} ( A \cap B ) - {\mathsf P} ( A ) P ( B ) $, | ||
+ | then for any $ \epsilon > 0 $ | ||
+ | there is a sufficiently large $ m $ | ||
+ | such that for every $ n \geq m $, | ||
+ | there exists a $ G _ {\mathsf P} $ | ||
+ | with | ||
− | + | $$ | |
+ | \sup _ {A \in {\mathcal C} } \left | {n ^ {1/2 } ( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) - G _ {\mathsf P} ( A ) } \right | < \epsilon | ||
+ | $$ | ||
− | on an event with probability at least | + | on an event with probability at least $ 1 - \epsilon $. |
+ | For the uniform central limit theorem to hold for each probability measure $ {\mathsf P} $ | ||
+ | on $ ( S, {\mathcal S} ) $, | ||
+ | the VC property is also necessary. | ||
− | VC classes can be generated as follows. Let | + | VC classes can be generated as follows. Let $ V $ |
+ | be a $ k $- | ||
+ | dimensional [[Vector space|vector space]] of real-valued functions on $ S $. | ||
+ | For each $ f \in V $, | ||
+ | let $ { \mathop{\rm pos} } ( f ) $ | ||
+ | be the set where $ f > 0 $. | ||
+ | Then the class $ {\mathcal C} $ | ||
+ | of all sets $ { \mathop{\rm pos} } ( f ) $ | ||
+ | for $ f \in V $ | ||
+ | is a VC class with $ S ( {\mathcal C} ) = k $. | ||
+ | For example, the set of all ellipsoids in a Euclidean space $ \mathbf R ^ {d} $ | ||
+ | is a VCM class for each $ d $. | ||
+ | Also, let $ {\mathcal C} $ | ||
+ | be a VC class and $ m $ | ||
+ | a finite integer. Let $ {\mathcal D} $ | ||
+ | be the union of all Boolean algebras of sets (cf. [[Boolean algebra|Boolean algebra]]), each generated by at most $ m $ | ||
+ | sets in $ {\mathcal C} $. | ||
+ | Then $ {\mathcal D} $ | ||
+ | is a VC class. For example, the set of all convex polytopes with at most $ m $ | ||
+ | faces in $ \mathbf R ^ {d} $ | ||
+ | is a VC class for each $ m $ | ||
+ | and $ d $. | ||
+ | Classes of projections of positivity sets of polynomials of bounded degree, and some other related classes, are also VC [[#References|[a4]]]. | ||
− | The class of all finite sets in | + | The class of all finite sets in $ \mathbf R ^ {d} $ |
+ | and the class of all closed convex sets are not VC classes. | ||
− | The notion of VC class extends in different ways to a class | + | The notion of VC class extends in different ways to a class $ {\mathcal F} $ |
+ | of real functions on $ S $. | ||
+ | The subgraph of a function $ f $ | ||
+ | is the set | ||
− | + | $$ | |
+ | \left \{ {( s,x ) } : {0 \leq x \leq f ( s ) \textrm{ or } f ( s ) \leq x \leq 0 } \right \} | ||
+ | $$ | ||
− | in | + | in $ S \times \mathbf R $. |
+ | Then $ {\mathcal F} $ | ||
+ | is called a VC subgraph class if the collection of all subgraphs of functions in $ {\mathcal F} $ | ||
+ | is a VC class in $ S \times \mathbf R $; | ||
+ | it is called a VC major class if the class of all sets $ \{ {s \in S } : {f ( s ) > x } \} $ | ||
+ | for $ f \in {\mathcal F} $ | ||
+ | and real $ x $ | ||
+ | is a VC class in $ S $. | ||
− | The above probability limit theorems extend to these and larger classes of functions, with suitable measurability and boundedness. Neither the VC subgraph nor VC major property implies the other. For a uniformly bounded, suitably measurable family of functions, the uniform central limit property for all | + | The above probability limit theorems extend to these and larger classes of functions, with suitable measurability and boundedness. Neither the VC subgraph nor VC major property implies the other. For a uniformly bounded, suitably measurable family of functions, the uniform central limit property for all $ {\mathsf P} $ |
+ | appears not to be equivalent to any VC-type combinatorial property. | ||
− | For a probability measure | + | For a probability measure $ {\mathsf P} $ |
+ | and two events $ A,B $, | ||
+ | let $ d _ {1, {\mathsf P} } ( A,B ) = {\mathsf E} | {1 _ {A} - 1 _ {B} } | $. | ||
+ | For a totally bounded metric space $ ( T,d ) $ | ||
+ | and $ \epsilon > 0 $, | ||
+ | let $ D ( \epsilon,T,d ) $ | ||
+ | be the maximum number of points of $ T $ | ||
+ | all at distance at least $ \epsilon $ | ||
+ | from each other. For any $ m $ | ||
+ | there is a $ K _ {m} < \infty $ | ||
+ | such that for every VCM class $ {\mathcal C} $ | ||
+ | with $ S ( {\mathcal C} ) = m $ | ||
+ | and any $ {\mathsf P} $, | ||
− | + | $$ | |
+ | D ( \epsilon, {\mathcal C},d _ {1, {\mathsf P} } ) \leq K _ {m} \epsilon ^ {- m } , | ||
+ | $$ | ||
− | [[#References|[a3]]]. There is a universal constant | + | [[#References|[a3]]]. There is a universal constant $ K $ |
+ | such that for every VCM class $ {\mathcal C} $ | ||
+ | and any $ M < \infty $, | ||
− | + | $$ | |
+ | { \mathop{\rm Pr} } \left \{ \sup _ {A \in {\mathcal C} } \left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) } \right | > M \right \} \leq | ||
+ | $$ | ||
− | + | $$ | |
+ | \leq | ||
+ | KM ^ {2S ( {\mathcal C} ) - 1 } { \mathop{\rm exp} } ( - 2M ^ {2} ) , | ||
+ | $$ | ||
[[#References|[a5]]]. | [[#References|[a5]]]. | ||
− | Every VC class is included in a maximal class with the same VC index. If | + | Every VC class is included in a maximal class with the same VC index. If $ {\mathcal C} $ |
+ | is a maximal VC class of index $ 1 $, | ||
+ | then for any $ A \in {\mathcal C} $ | ||
+ | the set of symmetric differences $ ( B \setminus A ) \cup ( A \setminus B ) $ | ||
+ | for $ B \in {\mathcal C} $ | ||
+ | has a tree-like partial ordering by inclusion, and conversely, such an ordering implies $ S ( {\mathcal C} ) = 1 $[[#References|[a2]]]. For index greater than $ 1 $ | ||
+ | no such structure is known (1996). | ||
A general reference on VC classes of sets and functions, also from the viewpoint of probability and statistics, is [[#References|[a6]]], Sect. 2.6. | A general reference on VC classes of sets and functions, also from the viewpoint of probability and statistics, is [[#References|[a6]]], Sect. 2.6. | ||
====References==== | ====References==== | ||
− | <table><TR><TD valign="top">[a1]</TD> <TD valign="top"> A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth, "Learnability and the Vapnik–Chervonenkis dimension" ''JACM'' , '''6''' (1989) pp. 929–965</TD></TR><TR><TD valign="top">[a2]</TD> <TD valign="top"> R.M. Dudley, "The structure of some Vapnik–Červonenkis classes" , ''Proc. Berkeley Conf.in honor of J. Neyman and J. Kiefer'' , '''2''' , Wadsworth (1985) pp. 495–508</TD></TR><TR><TD valign="top">[a3]</TD> <TD valign="top"> D. Haussler, "Sphere packing numbers for subsets of the Boolean | + | <table><TR><TD valign="top">[a1]</TD> <TD valign="top"> A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth, "Learnability and the Vapnik–Chervonenkis dimension" ''JACM'' , '''6''' (1989) pp. 929–965</TD></TR><TR><TD valign="top">[a2]</TD> <TD valign="top"> R.M. Dudley, "The structure of some Vapnik–Červonenkis classes" , ''Proc. Berkeley Conf.in honor of J. Neyman and J. Kiefer'' , '''2''' , Wadsworth (1985) pp. 495–508</TD></TR><TR><TD valign="top">[a3]</TD> <TD valign="top"> D. Haussler, "Sphere packing numbers for subsets of the Boolean $n$-cube with bounded Vapnik–Chervonenkis dimension" ''J. Combin. Th. A'' , '''69''' (1995) pp. 217–232</TD></TR><TR><TD valign="top">[a4]</TD> <TD valign="top"> G. Stengle, J. Yukich, "Some new Vapnik–Chervonenkis classes" ''Ann. Statist.'' , '''17''' (1989) pp. 1441–1446</TD></TR><TR><TD valign="top">[a5]</TD> <TD valign="top"> M. Talagrand, "Sharper bounds for Gaussian and empirical processes" ''Ann. Probab.'' , '''22''' (1994) pp. 28–76</TD></TR><TR><TD valign="top">[a6]</TD> <TD valign="top"> A. van der Vaart, J. Wellner, "Weak convergence and empirical processes" , Springer (1996)</TD></TR><TR><TD valign="top">[a7]</TD> <TD valign="top"> V.N. Vapnik, A.Ya. Červonenkis, "On the uniform convergence of frequencies of occurrence of events to their probabilities" ''Th. Probab. Appl.'' , '''16''' (1971) pp. 264–280</TD></TR><TR><TD valign="top">[a8]</TD> <TD valign="top"> R.M. Dudley, "Central limit theorems for empirical measures" ''Ann. of Probab.'' , '''6''' (1978) pp. 899–929</TD></TR><TR><TD valign="top">[a9]</TD> <TD valign="top"> R.M. Dudley, "Universal Donsker classes and metric entropy" ''Ann. of Probab.'' , '''15''' (1987) pp. 1306–1326</TD></TR><TR><TD valign="top">[a10]</TD> <TD valign="top"> D. Pollard, "Convergence of stochastic processes" , Springer (1984)</TD></TR></table> |
Latest revision as of 12:43, 19 February 2021
Vapnik–Červonenkis class
Let $ S $ be a set, $ {\mathcal C} $ a collection of subsets of $ S $ and $ F $ a finite subset of $ S $. Then $ {\mathcal C} $ is said to shatter $ F $ if for every subset $ A $ of $ F $ there is a set $ C $ in $ {\mathcal C} $ with $ A = C \cap F $. If there is a largest, finite $ k $ such that $ {\mathcal C} $ shatters at least one set of cardinality $ k $, then $ {\mathcal C} $ is called a Vapnik–Chervonenkis class, or VC class, of sets and $ S ( {\mathcal C} ) = k $ its Vapnik–Chervonenkis index.
Let $ \Delta ^ {\mathcal C} ( F ) $ be the number of different sets $ C \cap F $ for $ C \in {\mathcal C} $. Let $ m ^ {\mathcal C} ( n ) $ be the maximum of $ \Delta ^ {\mathcal C} ( F ) $ over all sets $ F $ of cardinality $ n $. Thus, $ {\mathcal C} $ is a Vapnik–Chervonenkis class if and only if $ m ^ {\mathcal C} ( n ) < 2 ^ {n} $ for some finite $ n $, and then for all $ n > S ( {\mathcal C} ) $. Sauer's lemma says that
$$ m ^ {\mathcal C} ( n ) \leq \sum _ {j = 0 } ^ { {S } ( {\mathcal C} ) } \left ( \begin{array}{c} n \\ j \end{array} \right ) . $$
Thus, $ m ^ {\mathcal C} ( n ) $ is either always $ 2 ^ {n} $ or, for a Vapnik–Chervonenkis class $ {\mathcal C} $, it is bounded above by a polynomial in $ n $ of degree $ S ( {\mathcal C} ) $. (This is the so-called Vapnik–Chervonenkis property: if $ m ^ {\mathcal C} ( n ) < 2 ^ {n} $ for large $ n $, then $ m ^ {\mathcal C} ( n ) $ is bounded by a polynomial.)
Vapnik–Chervonenkis classes have turned out to be useful in computer science (learning theory [a1]), probability theory and mathematical statistics [a6], because certain probability limit theorems hold uniformly over them under suitable measurability conditions. One such sufficient measurability condition is that there exist a $ \sigma $- algebra $ {\mathcal S} $ of subsets of $ S $, including $ {\mathcal C} $, and a mapping $ Y $ from a complete separable metric space $ U $ onto $ {\mathcal C} $ such that the set of pairs $ ( x,u ) $ with $ x \in Y ( u ) $ is product-measurable in $ S \times U $. A VC class $ {\mathcal C} $ satisfying this last condition is called a VCM class. While VC, but not VCM, classes can be shown to exist using the axiom of choice, the VC classes usually encountered in applications are VCM.
Let $ {\mathsf P} $ be a probability measure on $ ( S, {\mathcal S} ) $ and let $ X _ {1} ,X _ {2} , \dots $ be independent coordinates with distribution $ {\mathsf P} $, specifically, on a countable Cartesian product of copies of $ ( S, {\mathcal S}, {\mathsf P} ) $. Let $ {\mathsf P} _ {n} $ be the sum of the point masses $ {1 / n } $ at $ X _ {i} $ for $ i = 1 \dots n $; it is called an empirical measure for $ {\mathsf P} $( cf. also Empirical process). Then the law of large numbers for empirical measures holds uniformly over any VCM class $ {\mathcal C} $, meaning that the supremum for $ C \in {\mathcal C} $ of $ | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } | $ approaches zero almost surely as $ n $ becomes large [a7]. This can be improved to a uniform law of the iterated logarithm, meaning that for any VCM class $ {\mathcal C} $, with probability $ 1 $,
$$ {\lim\limits } \sup _ {n \rightarrow \infty } n ^ {1/2 } \sup _ {C \in {\mathcal C} } { \frac{\left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } \right | }{( 2 { \mathop{\rm log} } { \mathop{\rm log} } n ) ^ {1/2 } } } = $$
$$ = \sup _ {A \in {\mathcal C} } ( {\mathsf P} ( A ) ( 1 - {\mathsf P} ( A ) ) ) ^ {1/2 } . $$
Moreover, a central limit theorem holds uniformly: if $ {\mathcal C} $ is any VCM class, and $ G _ {\mathsf P} $ assigns to sets in $ {\mathcal C} $ jointly normal (Gaussian) random variables with mean zero and covariances $ {\mathsf E} G _ {\mathsf P} ( A ) G _ {\mathsf P} ( B ) = {\mathsf P} ( A \cap B ) - {\mathsf P} ( A ) P ( B ) $, then for any $ \epsilon > 0 $ there is a sufficiently large $ m $ such that for every $ n \geq m $, there exists a $ G _ {\mathsf P} $ with
$$ \sup _ {A \in {\mathcal C} } \left | {n ^ {1/2 } ( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) - G _ {\mathsf P} ( A ) } \right | < \epsilon $$
on an event with probability at least $ 1 - \epsilon $. For the uniform central limit theorem to hold for each probability measure $ {\mathsf P} $ on $ ( S, {\mathcal S} ) $, the VC property is also necessary.
VC classes can be generated as follows. Let $ V $ be a $ k $- dimensional vector space of real-valued functions on $ S $. For each $ f \in V $, let $ { \mathop{\rm pos} } ( f ) $ be the set where $ f > 0 $. Then the class $ {\mathcal C} $ of all sets $ { \mathop{\rm pos} } ( f ) $ for $ f \in V $ is a VC class with $ S ( {\mathcal C} ) = k $. For example, the set of all ellipsoids in a Euclidean space $ \mathbf R ^ {d} $ is a VCM class for each $ d $. Also, let $ {\mathcal C} $ be a VC class and $ m $ a finite integer. Let $ {\mathcal D} $ be the union of all Boolean algebras of sets (cf. Boolean algebra), each generated by at most $ m $ sets in $ {\mathcal C} $. Then $ {\mathcal D} $ is a VC class. For example, the set of all convex polytopes with at most $ m $ faces in $ \mathbf R ^ {d} $ is a VC class for each $ m $ and $ d $. Classes of projections of positivity sets of polynomials of bounded degree, and some other related classes, are also VC [a4].
The class of all finite sets in $ \mathbf R ^ {d} $ and the class of all closed convex sets are not VC classes.
The notion of VC class extends in different ways to a class $ {\mathcal F} $ of real functions on $ S $. The subgraph of a function $ f $ is the set
$$ \left \{ {( s,x ) } : {0 \leq x \leq f ( s ) \textrm{ or } f ( s ) \leq x \leq 0 } \right \} $$
in $ S \times \mathbf R $. Then $ {\mathcal F} $ is called a VC subgraph class if the collection of all subgraphs of functions in $ {\mathcal F} $ is a VC class in $ S \times \mathbf R $; it is called a VC major class if the class of all sets $ \{ {s \in S } : {f ( s ) > x } \} $ for $ f \in {\mathcal F} $ and real $ x $ is a VC class in $ S $.
The above probability limit theorems extend to these and larger classes of functions, with suitable measurability and boundedness. Neither the VC subgraph nor VC major property implies the other. For a uniformly bounded, suitably measurable family of functions, the uniform central limit property for all $ {\mathsf P} $ appears not to be equivalent to any VC-type combinatorial property.
For a probability measure $ {\mathsf P} $ and two events $ A,B $, let $ d _ {1, {\mathsf P} } ( A,B ) = {\mathsf E} | {1 _ {A} - 1 _ {B} } | $. For a totally bounded metric space $ ( T,d ) $ and $ \epsilon > 0 $, let $ D ( \epsilon,T,d ) $ be the maximum number of points of $ T $ all at distance at least $ \epsilon $ from each other. For any $ m $ there is a $ K _ {m} < \infty $ such that for every VCM class $ {\mathcal C} $ with $ S ( {\mathcal C} ) = m $ and any $ {\mathsf P} $,
$$ D ( \epsilon, {\mathcal C},d _ {1, {\mathsf P} } ) \leq K _ {m} \epsilon ^ {- m } , $$
[a3]. There is a universal constant $ K $ such that for every VCM class $ {\mathcal C} $ and any $ M < \infty $,
$$ { \mathop{\rm Pr} } \left \{ \sup _ {A \in {\mathcal C} } \left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) } \right | > M \right \} \leq $$
$$ \leq KM ^ {2S ( {\mathcal C} ) - 1 } { \mathop{\rm exp} } ( - 2M ^ {2} ) , $$
[a5].
Every VC class is included in a maximal class with the same VC index. If $ {\mathcal C} $ is a maximal VC class of index $ 1 $, then for any $ A \in {\mathcal C} $ the set of symmetric differences $ ( B \setminus A ) \cup ( A \setminus B ) $ for $ B \in {\mathcal C} $ has a tree-like partial ordering by inclusion, and conversely, such an ordering implies $ S ( {\mathcal C} ) = 1 $[a2]. For index greater than $ 1 $ no such structure is known (1996).
A general reference on VC classes of sets and functions, also from the viewpoint of probability and statistics, is [a6], Sect. 2.6.
References
[a1] | A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth, "Learnability and the Vapnik–Chervonenkis dimension" JACM , 6 (1989) pp. 929–965 |
[a2] | R.M. Dudley, "The structure of some Vapnik–Červonenkis classes" , Proc. Berkeley Conf.in honor of J. Neyman and J. Kiefer , 2 , Wadsworth (1985) pp. 495–508 |
[a3] | D. Haussler, "Sphere packing numbers for subsets of the Boolean $n$-cube with bounded Vapnik–Chervonenkis dimension" J. Combin. Th. A , 69 (1995) pp. 217–232 |
[a4] | G. Stengle, J. Yukich, "Some new Vapnik–Chervonenkis classes" Ann. Statist. , 17 (1989) pp. 1441–1446 |
[a5] | M. Talagrand, "Sharper bounds for Gaussian and empirical processes" Ann. Probab. , 22 (1994) pp. 28–76 |
[a6] | A. van der Vaart, J. Wellner, "Weak convergence and empirical processes" , Springer (1996) |
[a7] | V.N. Vapnik, A.Ya. Červonenkis, "On the uniform convergence of frequencies of occurrence of events to their probabilities" Th. Probab. Appl. , 16 (1971) pp. 264–280 |
[a8] | R.M. Dudley, "Central limit theorems for empirical measures" Ann. of Probab. , 6 (1978) pp. 899–929 |
[a9] | R.M. Dudley, "Universal Donsker classes and metric entropy" Ann. of Probab. , 15 (1987) pp. 1306–1326 |
[a10] | D. Pollard, "Convergence of stochastic processes" , Springer (1984) |
Vapnik-Chervonenkis class. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Vapnik-Chervonenkis_class&oldid=23097