Let be a set, a collection of subsets of and a finite subset of . Then is said to shatter if for every subset of there is a set in with . If there is a largest, finite such that shatters at least one set of cardinality , then is called a Vapnik–Chervonenkis class, or VC class, of sets and its Vapnik–Chervonenkis index.
Let be the number of different sets for . Let be the maximum of over all sets of cardinality . Thus, is a Vapnik–Chervonenkis class if and only if for some finite , and then for all . Sauer's lemma says that
Thus, is either always or, for a Vapnik–Chervonenkis class , it is bounded above by a polynomial in of degree . (This is the so-called Vapnik–Chervonenkis property: if for large , then is bounded by a polynomial.)
Vapnik–Chervonenkis classes have turned out to be useful in computer science (learning theory [a1]), probability theory and mathematical statistics [a6], because certain probability limit theorems hold uniformly over them under suitable measurability conditions. One such sufficient measurability condition is that there exist a -algebra of subsets of , including , and a mapping from a complete separable metric space onto such that the set of pairs with is product-measurable in . A VC class satisfying this last condition is called a VCM class. While VC, but not VCM, classes can be shown to exist using the axiom of choice, the VC classes usually encountered in applications are VCM.
Let be a probability measure on and let be independent coordinates with distribution , specifically, on a countable Cartesian product of copies of . Let be the sum of the point masses at for ; it is called an empirical measure for (cf. also Empirical process). Then the law of large numbers for empirical measures holds uniformly over any VCM class , meaning that the supremum for of approaches zero almost surely as becomes large [a7]. This can be improved to a uniform law of the iterated logarithm, meaning that for any VCM class , with probability ,
Moreover, a central limit theorem holds uniformly: if is any VCM class, and assigns to sets in jointly normal (Gaussian) random variables with mean zero and covariances , then for any there is a sufficiently large such that for every , there exists a with
on an event with probability at least . For the uniform central limit theorem to hold for each probability measure on , the VC property is also necessary.
VC classes can be generated as follows. Let be a -dimensional vector space of real-valued functions on . For each , let be the set where . Then the class of all sets for is a VC class with . For example, the set of all ellipsoids in a Euclidean space is a VCM class for each . Also, let be a VC class and a finite integer. Let be the union of all Boolean algebras of sets (cf. Boolean algebra), each generated by at most sets in . Then is a VC class. For example, the set of all convex polytopes with at most faces in is a VC class for each and . Classes of projections of positivity sets of polynomials of bounded degree, and some other related classes, are also VC [a4].
The class of all finite sets in and the class of all closed convex sets are not VC classes.
The notion of VC class extends in different ways to a class of real functions on . The subgraph of a function is the set
in . Then is called a VC subgraph class if the collection of all subgraphs of functions in is a VC class in ; it is called a VC major class if the class of all sets for and real is a VC class in .
The above probability limit theorems extend to these and larger classes of functions, with suitable measurability and boundedness. Neither the VC subgraph nor VC major property implies the other. For a uniformly bounded, suitably measurable family of functions, the uniform central limit property for all appears not to be equivalent to any VC-type combinatorial property.
For a probability measure and two events , let . For a totally bounded metric space and , let be the maximum number of points of all at distance at least from each other. For any there is a such that for every VCM class with and any ,
[a3]. There is a universal constant such that for every VCM class and any ,
Every VC class is included in a maximal class with the same VC index. If is a maximal VC class of index , then for any the set of symmetric differences for has a tree-like partial ordering by inclusion, and conversely, such an ordering implies [a2]. For index greater than no such structure is known (1996).
A general reference on VC classes of sets and functions, also from the viewpoint of probability and statistics, is [a6], Sect. 2.6.
|[a1]||A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth, "Learnability and the Vapnik–Chervonenkis dimension" JACM , 6 (1989) pp. 929–965|
|[a2]||R.M. Dudley, "The structure of some Vapnik–Červonenkis classes" , Proc. Berkeley Conf.in honor of J. Neyman and J. Kiefer , 2 , Wadsworth (1985) pp. 495–508|
|[a3]||D. Haussler, "Sphere packing numbers for subsets of the Boolean -cube with bounded Vapnik–Chervonenkis dimension" J. Combin. Th. A , 69 (1995) pp. 217–232|
|[a4]||G. Stengle, J. Yukich, "Some new Vapnik–Chervonenkis classes" Ann. Statist. , 17 (1989) pp. 1441–1446|
|[a5]||M. Talagrand, "Sharper bounds for Gaussian and empirical processes" Ann. Probab. , 22 (1994) pp. 28–76|
|[a6]||A. van der Vaart, J. Wellner, "Weak convergence and empirical processes" , Springer (1996)|
|[a7]||V.N. Vapnik, A.Ya. Červonenkis, "On the uniform convergence of frequencies of occurrence of events to their probabilities" Th. Probab. Appl. , 16 (1971) pp. 264–280|
|[a8]||R.M. Dudley, "Central limit theorems for empirical measures" Ann. of Probab. , 6 (1978) pp. 899–929|
|[a9]||R.M. Dudley, "Universal Donsker classes and metric entropy" Ann. of Probab. , 15 (1987) pp. 1306–1326|
|[a10]||D. Pollard, "Convergence of stochastic processes" , Springer (1984)|
Vapnik-Chervonenkis class. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Vapnik-Chervonenkis_class&oldid=17559