# Vapnik-Chervonenkis class

Vapnik–Červonenkis class

Let $S$ be a set, ${\mathcal C}$ a collection of subsets of $S$ and $F$ a finite subset of $S$. Then ${\mathcal C}$ is said to shatter $F$ if for every subset $A$ of $F$ there is a set $C$ in ${\mathcal C}$ with $A = C \cap F$. If there is a largest, finite $k$ such that ${\mathcal C}$ shatters at least one set of cardinality $k$, then ${\mathcal C}$ is called a Vapnik–Chervonenkis class, or VC class, of sets and $S ( {\mathcal C} ) = k$ its Vapnik–Chervonenkis index.

Let $\Delta ^ {\mathcal C} ( F )$ be the number of different sets $C \cap F$ for $C \in {\mathcal C}$. Let $m ^ {\mathcal C} ( n )$ be the maximum of $\Delta ^ {\mathcal C} ( F )$ over all sets $F$ of cardinality $n$. Thus, ${\mathcal C}$ is a Vapnik–Chervonenkis class if and only if $m ^ {\mathcal C} ( n ) < 2 ^ {n}$ for some finite $n$, and then for all $n > S ( {\mathcal C} )$. Sauer's lemma says that

$$m ^ {\mathcal C} ( n ) \leq \sum _ {j = 0 } ^ { {S } ( {\mathcal C} ) } \left ( \begin{array}{c} n \\ j \end{array} \right ) .$$

Thus, $m ^ {\mathcal C} ( n )$ is either always $2 ^ {n}$ or, for a Vapnik–Chervonenkis class ${\mathcal C}$, it is bounded above by a polynomial in $n$ of degree $S ( {\mathcal C} )$. (This is the so-called Vapnik–Chervonenkis property: if $m ^ {\mathcal C} ( n ) < 2 ^ {n}$ for large $n$, then $m ^ {\mathcal C} ( n )$ is bounded by a polynomial.)

Vapnik–Chervonenkis classes have turned out to be useful in computer science (learning theory [a1]), probability theory and mathematical statistics [a6], because certain probability limit theorems hold uniformly over them under suitable measurability conditions. One such sufficient measurability condition is that there exist a $\sigma$- algebra ${\mathcal S}$ of subsets of $S$, including ${\mathcal C}$, and a mapping $Y$ from a complete separable metric space $U$ onto ${\mathcal C}$ such that the set of pairs $( x,u )$ with $x \in Y ( u )$ is product-measurable in $S \times U$. A VC class ${\mathcal C}$ satisfying this last condition is called a VCM class. While VC, but not VCM, classes can be shown to exist using the axiom of choice, the VC classes usually encountered in applications are VCM.

Let ${\mathsf P}$ be a probability measure on $( S, {\mathcal S} )$ and let $X _ {1} ,X _ {2} , \dots$ be independent coordinates with distribution ${\mathsf P}$, specifically, on a countable Cartesian product of copies of $( S, {\mathcal S}, {\mathsf P} )$. Let ${\mathsf P} _ {n}$ be the sum of the point masses ${1 / n }$ at $X _ {i}$ for $i = 1 \dots n$; it is called an empirical measure for ${\mathsf P}$( cf. also Empirical process). Then the law of large numbers for empirical measures holds uniformly over any VCM class ${\mathcal C}$, meaning that the supremum for $C \in {\mathcal C}$ of $| {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } |$ approaches zero almost surely as $n$ becomes large [a7]. This can be improved to a uniform law of the iterated logarithm, meaning that for any VCM class ${\mathcal C}$, with probability $1$,

$${\lim\limits } \sup _ {n \rightarrow \infty } n ^ {1/2 } \sup _ {C \in {\mathcal C} } { \frac{\left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } \right | }{( 2 { \mathop{\rm log} } { \mathop{\rm log} } n ) ^ {1/2 } } } =$$

$$= \sup _ {A \in {\mathcal C} } ( {\mathsf P} ( A ) ( 1 - {\mathsf P} ( A ) ) ) ^ {1/2 } .$$

Moreover, a central limit theorem holds uniformly: if ${\mathcal C}$ is any VCM class, and $G _ {\mathsf P}$ assigns to sets in ${\mathcal C}$ jointly normal (Gaussian) random variables with mean zero and covariances ${\mathsf E} G _ {\mathsf P} ( A ) G _ {\mathsf P} ( B ) = {\mathsf P} ( A \cap B ) - {\mathsf P} ( A ) P ( B )$, then for any $\epsilon > 0$ there is a sufficiently large $m$ such that for every $n \geq m$, there exists a $G _ {\mathsf P}$ with

$$\sup _ {A \in {\mathcal C} } \left | {n ^ {1/2 } ( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) - G _ {\mathsf P} ( A ) } \right | < \epsilon$$

on an event with probability at least $1 - \epsilon$. For the uniform central limit theorem to hold for each probability measure ${\mathsf P}$ on $( S, {\mathcal S} )$, the VC property is also necessary.

VC classes can be generated as follows. Let $V$ be a $k$- dimensional vector space of real-valued functions on $S$. For each $f \in V$, let ${ \mathop{\rm pos} } ( f )$ be the set where $f > 0$. Then the class ${\mathcal C}$ of all sets ${ \mathop{\rm pos} } ( f )$ for $f \in V$ is a VC class with $S ( {\mathcal C} ) = k$. For example, the set of all ellipsoids in a Euclidean space $\mathbf R ^ {d}$ is a VCM class for each $d$. Also, let ${\mathcal C}$ be a VC class and $m$ a finite integer. Let ${\mathcal D}$ be the union of all Boolean algebras of sets (cf. Boolean algebra), each generated by at most $m$ sets in ${\mathcal C}$. Then ${\mathcal D}$ is a VC class. For example, the set of all convex polytopes with at most $m$ faces in $\mathbf R ^ {d}$ is a VC class for each $m$ and $d$. Classes of projections of positivity sets of polynomials of bounded degree, and some other related classes, are also VC [a4].

The class of all finite sets in $\mathbf R ^ {d}$ and the class of all closed convex sets are not VC classes.

The notion of VC class extends in different ways to a class ${\mathcal F}$ of real functions on $S$. The subgraph of a function $f$ is the set

$$\left \{ {( s,x ) } : {0 \leq x \leq f ( s ) \textrm{ or } f ( s ) \leq x \leq 0 } \right \}$$

in $S \times \mathbf R$. Then ${\mathcal F}$ is called a VC subgraph class if the collection of all subgraphs of functions in ${\mathcal F}$ is a VC class in $S \times \mathbf R$; it is called a VC major class if the class of all sets $\{ {s \in S } : {f ( s ) > x } \}$ for $f \in {\mathcal F}$ and real $x$ is a VC class in $S$.

The above probability limit theorems extend to these and larger classes of functions, with suitable measurability and boundedness. Neither the VC subgraph nor VC major property implies the other. For a uniformly bounded, suitably measurable family of functions, the uniform central limit property for all ${\mathsf P}$ appears not to be equivalent to any VC-type combinatorial property.

For a probability measure ${\mathsf P}$ and two events $A,B$, let $d _ {1, {\mathsf P} } ( A,B ) = {\mathsf E} | {1 _ {A} - 1 _ {B} } |$. For a totally bounded metric space $( T,d )$ and $\epsilon > 0$, let $D ( \epsilon,T,d )$ be the maximum number of points of $T$ all at distance at least $\epsilon$ from each other. For any $m$ there is a $K _ {m} < \infty$ such that for every VCM class ${\mathcal C}$ with $S ( {\mathcal C} ) = m$ and any ${\mathsf P}$,

$$D ( \epsilon, {\mathcal C},d _ {1, {\mathsf P} } ) \leq K _ {m} \epsilon ^ {- m } ,$$

[a3]. There is a universal constant $K$ such that for every VCM class ${\mathcal C}$ and any $M < \infty$,

$${ \mathop{\rm Pr} } \left \{ \sup _ {A \in {\mathcal C} } \left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) } \right | > M \right \} \leq$$

$$\leq KM ^ {2S ( {\mathcal C} ) - 1 } { \mathop{\rm exp} } ( - 2M ^ {2} ) ,$$

[a5].

Every VC class is included in a maximal class with the same VC index. If ${\mathcal C}$ is a maximal VC class of index $1$, then for any $A \in {\mathcal C}$ the set of symmetric differences $( B \setminus A ) \cup ( A \setminus B )$ for $B \in {\mathcal C}$ has a tree-like partial ordering by inclusion, and conversely, such an ordering implies $S ( {\mathcal C} ) = 1$[a2]. For index greater than $1$ no such structure is known (1996).

A general reference on VC classes of sets and functions, also from the viewpoint of probability and statistics, is [a6], Sect. 2.6.

#### References

 [a1] A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth, "Learnability and the Vapnik–Chervonenkis dimension" JACM , 6 (1989) pp. 929–965 [a2] R.M. Dudley, "The structure of some Vapnik–Červonenkis classes" , Proc. Berkeley Conf.in honor of J. Neyman and J. Kiefer , 2 , Wadsworth (1985) pp. 495–508 [a3] D. Haussler, "Sphere packing numbers for subsets of the Boolean -cube with bounded Vapnik–Chervonenkis dimension" J. Combin. Th. A , 69 (1995) pp. 217–232 [a4] G. Stengle, J. Yukich, "Some new Vapnik–Chervonenkis classes" Ann. Statist. , 17 (1989) pp. 1441–1446 [a5] M. Talagrand, "Sharper bounds for Gaussian and empirical processes" Ann. Probab. , 22 (1994) pp. 28–76 [a6] A. van der Vaart, J. Wellner, "Weak convergence and empirical processes" , Springer (1996) [a7] V.N. Vapnik, A.Ya. Červonenkis, "On the uniform convergence of frequencies of occurrence of events to their probabilities" Th. Probab. Appl. , 16 (1971) pp. 264–280 [a8] R.M. Dudley, "Central limit theorems for empirical measures" Ann. of Probab. , 6 (1978) pp. 899–929 [a9] R.M. Dudley, "Universal Donsker classes and metric entropy" Ann. of Probab. , 15 (1987) pp. 1306–1326 [a10] D. Pollard, "Convergence of stochastic processes" , Springer (1984)
How to Cite This Entry:
Vapnik-Chervonenkis class. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Vapnik-Chervonenkis_class&oldid=49110
This article was adapted from an original article by R.M. DudleyJ.H.J. Einmahl (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article