Vapnik-Chervonenkis class
Vapnik–Červonenkis class
Let $ S $ be a set, $ {\mathcal C} $ a collection of subsets of $ S $ and $ F $ a finite subset of $ S $. Then $ {\mathcal C} $ is said to shatter $ F $ if for every subset $ A $ of $ F $ there is a set $ C $ in $ {\mathcal C} $ with $ A = C \cap F $. If there is a largest, finite $ k $ such that $ {\mathcal C} $ shatters at least one set of cardinality $ k $, then $ {\mathcal C} $ is called a Vapnik–Chervonenkis class, or VC class, of sets and $ S ( {\mathcal C} ) = k $ its Vapnik–Chervonenkis index.
Let $ \Delta ^ {\mathcal C} ( F ) $ be the number of different sets $ C \cap F $ for $ C \in {\mathcal C} $. Let $ m ^ {\mathcal C} ( n ) $ be the maximum of $ \Delta ^ {\mathcal C} ( F ) $ over all sets $ F $ of cardinality $ n $. Thus, $ {\mathcal C} $ is a Vapnik–Chervonenkis class if and only if $ m ^ {\mathcal C} ( n ) < 2 ^ {n} $ for some finite $ n $, and then for all $ n > S ( {\mathcal C} ) $. Sauer's lemma says that
$$ m ^ {\mathcal C} ( n ) \leq \sum _ {j = 0 } ^ { {S } ( {\mathcal C} ) } \left ( \begin{array}{c} n \\ j \end{array} \right ) . $$
Thus, $ m ^ {\mathcal C} ( n ) $ is either always $ 2 ^ {n} $ or, for a Vapnik–Chervonenkis class $ {\mathcal C} $, it is bounded above by a polynomial in $ n $ of degree $ S ( {\mathcal C} ) $. (This is the so-called Vapnik–Chervonenkis property: if $ m ^ {\mathcal C} ( n ) < 2 ^ {n} $ for large $ n $, then $ m ^ {\mathcal C} ( n ) $ is bounded by a polynomial.)
Vapnik–Chervonenkis classes have turned out to be useful in computer science (learning theory [a1]), probability theory and mathematical statistics [a6], because certain probability limit theorems hold uniformly over them under suitable measurability conditions. One such sufficient measurability condition is that there exist a $ \sigma $- algebra $ {\mathcal S} $ of subsets of $ S $, including $ {\mathcal C} $, and a mapping $ Y $ from a complete separable metric space $ U $ onto $ {\mathcal C} $ such that the set of pairs $ ( x,u ) $ with $ x \in Y ( u ) $ is product-measurable in $ S \times U $. A VC class $ {\mathcal C} $ satisfying this last condition is called a VCM class. While VC, but not VCM, classes can be shown to exist using the axiom of choice, the VC classes usually encountered in applications are VCM.
Let $ {\mathsf P} $ be a probability measure on $ ( S, {\mathcal S} ) $ and let $ X _ {1} ,X _ {2} , \dots $ be independent coordinates with distribution $ {\mathsf P} $, specifically, on a countable Cartesian product of copies of $ ( S, {\mathcal S}, {\mathsf P} ) $. Let $ {\mathsf P} _ {n} $ be the sum of the point masses $ {1 / n } $ at $ X _ {i} $ for $ i = 1 \dots n $; it is called an empirical measure for $ {\mathsf P} $( cf. also Empirical process). Then the law of large numbers for empirical measures holds uniformly over any VCM class $ {\mathcal C} $, meaning that the supremum for $ C \in {\mathcal C} $ of $ | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } | $ approaches zero almost surely as $ n $ becomes large [a7]. This can be improved to a uniform law of the iterated logarithm, meaning that for any VCM class $ {\mathcal C} $, with probability $ 1 $,
$$ {\lim\limits } \sup _ {n \rightarrow \infty } n ^ {1/2 } \sup _ {C \in {\mathcal C} } { \frac{\left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } \right | }{( 2 { \mathop{\rm log} } { \mathop{\rm log} } n ) ^ {1/2 } } } = $$
$$ = \sup _ {A \in {\mathcal C} } ( {\mathsf P} ( A ) ( 1 - {\mathsf P} ( A ) ) ) ^ {1/2 } . $$
Moreover, a central limit theorem holds uniformly: if $ {\mathcal C} $ is any VCM class, and $ G _ {\mathsf P} $ assigns to sets in $ {\mathcal C} $ jointly normal (Gaussian) random variables with mean zero and covariances $ {\mathsf E} G _ {\mathsf P} ( A ) G _ {\mathsf P} ( B ) = {\mathsf P} ( A \cap B ) - {\mathsf P} ( A ) P ( B ) $, then for any $ \epsilon > 0 $ there is a sufficiently large $ m $ such that for every $ n \geq m $, there exists a $ G _ {\mathsf P} $ with
$$ \sup _ {A \in {\mathcal C} } \left | {n ^ {1/2 } ( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) - G _ {\mathsf P} ( A ) } \right | < \epsilon $$
on an event with probability at least $ 1 - \epsilon $. For the uniform central limit theorem to hold for each probability measure $ {\mathsf P} $ on $ ( S, {\mathcal S} ) $, the VC property is also necessary.
VC classes can be generated as follows. Let $ V $ be a $ k $- dimensional vector space of real-valued functions on $ S $. For each $ f \in V $, let $ { \mathop{\rm pos} } ( f ) $ be the set where $ f > 0 $. Then the class $ {\mathcal C} $ of all sets $ { \mathop{\rm pos} } ( f ) $ for $ f \in V $ is a VC class with $ S ( {\mathcal C} ) = k $. For example, the set of all ellipsoids in a Euclidean space $ \mathbf R ^ {d} $ is a VCM class for each $ d $. Also, let $ {\mathcal C} $ be a VC class and $ m $ a finite integer. Let $ {\mathcal D} $ be the union of all Boolean algebras of sets (cf. Boolean algebra), each generated by at most $ m $ sets in $ {\mathcal C} $. Then $ {\mathcal D} $ is a VC class. For example, the set of all convex polytopes with at most $ m $ faces in $ \mathbf R ^ {d} $ is a VC class for each $ m $ and $ d $. Classes of projections of positivity sets of polynomials of bounded degree, and some other related classes, are also VC [a4].
The class of all finite sets in $ \mathbf R ^ {d} $ and the class of all closed convex sets are not VC classes.
The notion of VC class extends in different ways to a class $ {\mathcal F} $ of real functions on $ S $. The subgraph of a function $ f $ is the set
$$ \left \{ {( s,x ) } : {0 \leq x \leq f ( s ) \textrm{ or } f ( s ) \leq x \leq 0 } \right \} $$
in $ S \times \mathbf R $. Then $ {\mathcal F} $ is called a VC subgraph class if the collection of all subgraphs of functions in $ {\mathcal F} $ is a VC class in $ S \times \mathbf R $; it is called a VC major class if the class of all sets $ \{ {s \in S } : {f ( s ) > x } \} $ for $ f \in {\mathcal F} $ and real $ x $ is a VC class in $ S $.
The above probability limit theorems extend to these and larger classes of functions, with suitable measurability and boundedness. Neither the VC subgraph nor VC major property implies the other. For a uniformly bounded, suitably measurable family of functions, the uniform central limit property for all $ {\mathsf P} $ appears not to be equivalent to any VC-type combinatorial property.
For a probability measure $ {\mathsf P} $ and two events $ A,B $, let $ d _ {1, {\mathsf P} } ( A,B ) = {\mathsf E} | {1 _ {A} - 1 _ {B} } | $. For a totally bounded metric space $ ( T,d ) $ and $ \epsilon > 0 $, let $ D ( \epsilon,T,d ) $ be the maximum number of points of $ T $ all at distance at least $ \epsilon $ from each other. For any $ m $ there is a $ K _ {m} < \infty $ such that for every VCM class $ {\mathcal C} $ with $ S ( {\mathcal C} ) = m $ and any $ {\mathsf P} $,
$$ D ( \epsilon, {\mathcal C},d _ {1, {\mathsf P} } ) \leq K _ {m} \epsilon ^ {- m } , $$
[a3]. There is a universal constant $ K $ such that for every VCM class $ {\mathcal C} $ and any $ M < \infty $,
$$ { \mathop{\rm Pr} } \left \{ \sup _ {A \in {\mathcal C} } \left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) } \right | > M \right \} \leq $$
$$ \leq KM ^ {2S ( {\mathcal C} ) - 1 } { \mathop{\rm exp} } ( - 2M ^ {2} ) , $$
[a5].
Every VC class is included in a maximal class with the same VC index. If $ {\mathcal C} $ is a maximal VC class of index $ 1 $, then for any $ A \in {\mathcal C} $ the set of symmetric differences $ ( B \setminus A ) \cup ( A \setminus B ) $ for $ B \in {\mathcal C} $ has a tree-like partial ordering by inclusion, and conversely, such an ordering implies $ S ( {\mathcal C} ) = 1 $[a2]. For index greater than $ 1 $ no such structure is known (1996).
A general reference on VC classes of sets and functions, also from the viewpoint of probability and statistics, is [a6], Sect. 2.6.
References
[a1] | A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth, "Learnability and the Vapnik–Chervonenkis dimension" JACM , 6 (1989) pp. 929–965 |
[a2] | R.M. Dudley, "The structure of some Vapnik–Červonenkis classes" , Proc. Berkeley Conf.in honor of J. Neyman and J. Kiefer , 2 , Wadsworth (1985) pp. 495–508 |
[a3] | D. Haussler, "Sphere packing numbers for subsets of the Boolean -cube with bounded Vapnik–Chervonenkis dimension" J. Combin. Th. A , 69 (1995) pp. 217–232 |
[a4] | G. Stengle, J. Yukich, "Some new Vapnik–Chervonenkis classes" Ann. Statist. , 17 (1989) pp. 1441–1446 |
[a5] | M. Talagrand, "Sharper bounds for Gaussian and empirical processes" Ann. Probab. , 22 (1994) pp. 28–76 |
[a6] | A. van der Vaart, J. Wellner, "Weak convergence and empirical processes" , Springer (1996) |
[a7] | V.N. Vapnik, A.Ya. Červonenkis, "On the uniform convergence of frequencies of occurrence of events to their probabilities" Th. Probab. Appl. , 16 (1971) pp. 264–280 |
[a8] | R.M. Dudley, "Central limit theorems for empirical measures" Ann. of Probab. , 6 (1978) pp. 899–929 |
[a9] | R.M. Dudley, "Universal Donsker classes and metric entropy" Ann. of Probab. , 15 (1987) pp. 1306–1326 |
[a10] | D. Pollard, "Convergence of stochastic processes" , Springer (1984) |
Vapnik-Chervonenkis class. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Vapnik-Chervonenkis_class&oldid=51622