|
|
Line 1: |
Line 1: |
| + | <!-- |
| + | v1100201.png |
| + | $#A+1 = 140 n = 1 |
| + | $#C+1 = 140 : ~/encyclopedia/old_files/data/V110/V.1100020 Vapnik\ANDChervonenkis class, |
| + | Automatically converted into TeX, above some diagnostics. |
| + | Please remove this comment and the {{TEX|auto}} line below, |
| + | if TeX found to be correct. |
| + | --> |
| + | |
| + | {{TEX|auto}} |
| + | {{TEX|done}} |
| + | |
| ''Vapnik–Červonenkis class'' | | ''Vapnik–Červonenkis class'' |
| | | |
− | Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100201.png" /> be a set, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100202.png" /> a collection of subsets of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100203.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100204.png" /> a finite subset of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100205.png" />. Then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100206.png" /> is said to shatter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100207.png" /> if for every subset <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100208.png" /> of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v1100209.png" /> there is a set <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002010.png" /> in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002011.png" /> with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002012.png" />. If there is a largest, finite <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002013.png" /> such that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002014.png" /> shatters at least one set of cardinality <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002015.png" />, then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002016.png" /> is called a Vapnik–Chervonenkis class, or VC class, of sets and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002017.png" /> its Vapnik–Chervonenkis index. | + | Let $ S $ |
| + | be a set, $ {\mathcal C} $ |
| + | a collection of subsets of $ S $ |
| + | and $ F $ |
| + | a finite subset of $ S $. |
| + | Then $ {\mathcal C} $ |
| + | is said to shatter $ F $ |
| + | if for every subset $ A $ |
| + | of $ F $ |
| + | there is a set $ C $ |
| + | in $ {\mathcal C} $ |
| + | with $ A = C \cap F $. |
| + | If there is a largest, finite $ k $ |
| + | such that $ {\mathcal C} $ |
| + | shatters at least one set of cardinality $ k $, |
| + | then $ {\mathcal C} $ |
| + | is called a Vapnik–Chervonenkis class, or VC class, of sets and $ S ( {\mathcal C} ) = k $ |
| + | its Vapnik–Chervonenkis index. |
| | | |
− | Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002018.png" /> be the number of different sets <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002019.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002020.png" />. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002021.png" /> be the maximum of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002022.png" /> over all sets <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002023.png" /> of cardinality <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002024.png" />. Thus, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002025.png" /> is a Vapnik–Chervonenkis class if and only if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002026.png" /> for some finite <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002027.png" />, and then for all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002028.png" />. Sauer's lemma says that | + | Let $ \Delta ^ {\mathcal C} ( F ) $ |
| + | be the number of different sets $ C \cap F $ |
| + | for $ C \in {\mathcal C} $. |
| + | Let $ m ^ {\mathcal C} ( n ) $ |
| + | be the maximum of $ \Delta ^ {\mathcal C} ( F ) $ |
| + | over all sets $ F $ |
| + | of cardinality $ n $. |
| + | Thus, $ {\mathcal C} $ |
| + | is a Vapnik–Chervonenkis class if and only if $ m ^ {\mathcal C} ( n ) < 2 ^ {n} $ |
| + | for some finite $ n $, |
| + | and then for all $ n > S ( {\mathcal C} ) $. |
| + | Sauer's lemma says that |
| | | |
− | <table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002029.png" /></td> </tr></table>
| + | $$ |
| + | m ^ {\mathcal C} ( n ) \leq \sum _ {j = 0 } ^ { {S } ( {\mathcal C} ) } \left ( \begin{array}{c} |
| + | n \\ |
| + | j |
| + | \end{array} |
| + | \right ) . |
| + | $$ |
| | | |
− | Thus, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002030.png" /> is either always <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002031.png" /> or, for a Vapnik–Chervonenkis class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002032.png" />, it is bounded above by a polynomial in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002033.png" /> of degree <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002034.png" />. (This is the so-called Vapnik–Chervonenkis property: if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002035.png" /> for large <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002036.png" />, then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002037.png" /> is bounded by a polynomial.) | + | Thus, $ m ^ {\mathcal C} ( n ) $ |
| + | is either always $ 2 ^ {n} $ |
| + | or, for a Vapnik–Chervonenkis class $ {\mathcal C} $, |
| + | it is bounded above by a polynomial in $ n $ |
| + | of degree $ S ( {\mathcal C} ) $. |
| + | (This is the so-called Vapnik–Chervonenkis property: if $ m ^ {\mathcal C} ( n ) < 2 ^ {n} $ |
| + | for large $ n $, |
| + | then $ m ^ {\mathcal C} ( n ) $ |
| + | is bounded by a polynomial.) |
| | | |
− | Vapnik–Chervonenkis classes have turned out to be useful in computer science (learning theory [[#References|[a1]]]), [[Probability theory|probability theory]] and [[Mathematical statistics|mathematical statistics]] [[#References|[a6]]], because certain probability limit theorems hold uniformly over them under suitable measurability conditions. One such sufficient measurability condition is that there exist a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002038.png" />-algebra <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002039.png" /> of subsets of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002040.png" />, including <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002041.png" />, and a mapping <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002042.png" /> from a complete separable [[Metric space|metric space]] <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002043.png" /> onto <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002044.png" /> such that the set of pairs <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002045.png" /> with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002046.png" /> is product-measurable in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002047.png" />. A VC class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002048.png" /> satisfying this last condition is called a VCM class. While VC, but not VCM, classes can be shown to exist using the [[Axiom of choice|axiom of choice]], the VC classes usually encountered in applications are VCM. | + | Vapnik–Chervonenkis classes have turned out to be useful in computer science (learning theory [[#References|[a1]]]), [[Probability theory|probability theory]] and [[Mathematical statistics|mathematical statistics]] [[#References|[a6]]], because certain probability limit theorems hold uniformly over them under suitable measurability conditions. One such sufficient measurability condition is that there exist a $ \sigma $- |
| + | algebra $ {\mathcal S} $ |
| + | of subsets of $ S $, |
| + | including $ {\mathcal C} $, |
| + | and a mapping $ Y $ |
| + | from a complete separable [[Metric space|metric space]] $ U $ |
| + | onto $ {\mathcal C} $ |
| + | such that the set of pairs $ ( x,u ) $ |
| + | with $ x \in Y ( u ) $ |
| + | is product-measurable in $ S \times U $. |
| + | A VC class $ {\mathcal C} $ |
| + | satisfying this last condition is called a VCM class. While VC, but not VCM, classes can be shown to exist using the [[Axiom of choice|axiom of choice]], the VC classes usually encountered in applications are VCM. |
| | | |
− | Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002049.png" /> be a probability measure on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002050.png" /> and let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002051.png" /> be independent coordinates with distribution <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002052.png" />, specifically, on a countable Cartesian product of copies of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002053.png" />. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002054.png" /> be the sum of the point masses <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002055.png" /> at <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002056.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002057.png" />; it is called an empirical measure for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002058.png" /> (cf. also [[Empirical process|Empirical process]]). Then the [[Law of large numbers|law of large numbers]] for empirical measures holds uniformly over any VCM class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002059.png" />, meaning that the supremum for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002060.png" /> of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002061.png" /> approaches zero almost surely as <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002062.png" /> becomes large [[#References|[a7]]]. This can be improved to a uniform [[Law of the iterated logarithm|law of the iterated logarithm]], meaning that for any VCM class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002063.png" />, with probability <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002064.png" />, | + | Let $ {\mathsf P} $ |
| + | be a probability measure on $ ( S, {\mathcal S} ) $ |
| + | and let $ X _ {1} ,X _ {2} , \dots $ |
| + | be independent coordinates with distribution $ {\mathsf P} $, |
| + | specifically, on a countable Cartesian product of copies of $ ( S, {\mathcal S}, {\mathsf P} ) $. |
| + | Let $ {\mathsf P} _ {n} $ |
| + | be the sum of the point masses $ {1 / n } $ |
| + | at $ X _ {i} $ |
| + | for $ i = 1 \dots n $; |
| + | it is called an empirical measure for $ {\mathsf P} $( |
| + | cf. also [[Empirical process|Empirical process]]). Then the [[Law of large numbers|law of large numbers]] for empirical measures holds uniformly over any VCM class $ {\mathcal C} $, |
| + | meaning that the supremum for $ C \in {\mathcal C} $ |
| + | of $ | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } | $ |
| + | approaches zero almost surely as $ n $ |
| + | becomes large [[#References|[a7]]]. This can be improved to a uniform [[Law of the iterated logarithm|law of the iterated logarithm]], meaning that for any VCM class $ {\mathcal C} $, |
| + | with probability $ 1 $, |
| | | |
− | <table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002065.png" /></td> </tr></table>
| + | $$ |
| + | {\lim\limits } \sup _ {n \rightarrow \infty } n ^ {1/2 } \sup _ {C \in {\mathcal C} } { |
| + | \frac{\left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } \right | }{( 2 { \mathop{\rm log} } { \mathop{\rm log} } n ) ^ {1/2 } } |
| + | } = |
| + | $$ |
| | | |
− | <table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002066.png" /></td> </tr></table>
| + | $$ |
| + | = |
| + | \sup _ {A \in {\mathcal C} } ( {\mathsf P} ( A ) ( 1 - {\mathsf P} ( A ) ) ) ^ {1/2 } . |
| + | $$ |
| | | |
− | Moreover, a [[Central limit theorem|central limit theorem]] holds uniformly: if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002067.png" /> is any VCM class, and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002068.png" /> assigns to sets in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002069.png" /> jointly normal (Gaussian) random variables with mean zero and covariances <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002070.png" />, then for any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002071.png" /> there is a sufficiently large <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002072.png" /> such that for every <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002073.png" />, there exists a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002074.png" /> with | + | Moreover, a [[Central limit theorem|central limit theorem]] holds uniformly: if $ {\mathcal C} $ |
| + | is any VCM class, and $ G _ {\mathsf P} $ |
| + | assigns to sets in $ {\mathcal C} $ |
| + | jointly normal (Gaussian) random variables with mean zero and covariances $ {\mathsf E} G _ {\mathsf P} ( A ) G _ {\mathsf P} ( B ) = {\mathsf P} ( A \cap B ) - {\mathsf P} ( A ) P ( B ) $, |
| + | then for any $ \epsilon > 0 $ |
| + | there is a sufficiently large $ m $ |
| + | such that for every $ n \geq m $, |
| + | there exists a $ G _ {\mathsf P} $ |
| + | with |
| | | |
− | <table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002075.png" /></td> </tr></table>
| + | $$ |
| + | \sup _ {A \in {\mathcal C} } \left | {n ^ {1/2 } ( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) - G _ {\mathsf P} ( A ) } \right | < \epsilon |
| + | $$ |
| | | |
− | on an event with probability at least <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002076.png" />. For the uniform central limit theorem to hold for each probability measure <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002077.png" /> on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002078.png" />, the VC property is also necessary. | + | on an event with probability at least $ 1 - \epsilon $. |
| + | For the uniform central limit theorem to hold for each probability measure $ {\mathsf P} $ |
| + | on $ ( S, {\mathcal S} ) $, |
| + | the VC property is also necessary. |
| | | |
− | VC classes can be generated as follows. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002079.png" /> be a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002080.png" />-dimensional [[Vector space|vector space]] of real-valued functions on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002081.png" />. For each <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002082.png" />, let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002083.png" /> be the set where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002084.png" />. Then the class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002085.png" /> of all sets <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002086.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002087.png" /> is a VC class with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002088.png" />. For example, the set of all ellipsoids in a Euclidean space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002089.png" /> is a VCM class for each <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002090.png" />. Also, let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002091.png" /> be a VC class and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002092.png" /> a finite integer. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002093.png" /> be the union of all Boolean algebras of sets (cf. [[Boolean algebra|Boolean algebra]]), each generated by at most <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002094.png" /> sets in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002095.png" />. Then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002096.png" /> is a VC class. For example, the set of all convex polytopes with at most <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002097.png" /> faces in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002098.png" /> is a VC class for each <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v11002099.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020100.png" />. Classes of projections of positivity sets of polynomials of bounded degree, and some other related classes, are also VC [[#References|[a4]]]. | + | VC classes can be generated as follows. Let $ V $ |
| + | be a $ k $- |
| + | dimensional [[Vector space|vector space]] of real-valued functions on $ S $. |
| + | For each $ f \in V $, |
| + | let $ { \mathop{\rm pos} } ( f ) $ |
| + | be the set where $ f > 0 $. |
| + | Then the class $ {\mathcal C} $ |
| + | of all sets $ { \mathop{\rm pos} } ( f ) $ |
| + | for $ f \in V $ |
| + | is a VC class with $ S ( {\mathcal C} ) = k $. |
| + | For example, the set of all ellipsoids in a Euclidean space $ \mathbf R ^ {d} $ |
| + | is a VCM class for each $ d $. |
| + | Also, let $ {\mathcal C} $ |
| + | be a VC class and $ m $ |
| + | a finite integer. Let $ {\mathcal D} $ |
| + | be the union of all Boolean algebras of sets (cf. [[Boolean algebra|Boolean algebra]]), each generated by at most $ m $ |
| + | sets in $ {\mathcal C} $. |
| + | Then $ {\mathcal D} $ |
| + | is a VC class. For example, the set of all convex polytopes with at most $ m $ |
| + | faces in $ \mathbf R ^ {d} $ |
| + | is a VC class for each $ m $ |
| + | and $ d $. |
| + | Classes of projections of positivity sets of polynomials of bounded degree, and some other related classes, are also VC [[#References|[a4]]]. |
| | | |
− | The class of all finite sets in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020101.png" /> and the class of all closed convex sets are not VC classes. | + | The class of all finite sets in $ \mathbf R ^ {d} $ |
| + | and the class of all closed convex sets are not VC classes. |
| | | |
− | The notion of VC class extends in different ways to a class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020102.png" /> of real functions on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020103.png" />. The subgraph of a function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020104.png" /> is the set | + | The notion of VC class extends in different ways to a class $ {\mathcal F} $ |
| + | of real functions on $ S $. |
| + | The subgraph of a function $ f $ |
| + | is the set |
| | | |
− | <table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020105.png" /></td> </tr></table>
| + | $$ |
| + | \left \{ {( s,x ) } : {0 \leq x \leq f ( s ) \textrm{ or } f ( s ) \leq x \leq 0 } \right \} |
| + | $$ |
| | | |
− | in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020106.png" />. Then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020107.png" /> is called a VC subgraph class if the collection of all subgraphs of functions in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020108.png" /> is a VC class in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020109.png" />; it is called a VC major class if the class of all sets <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020110.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020111.png" /> and real <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020112.png" /> is a VC class in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020113.png" />. | + | in $ S \times \mathbf R $. |
| + | Then $ {\mathcal F} $ |
| + | is called a VC subgraph class if the collection of all subgraphs of functions in $ {\mathcal F} $ |
| + | is a VC class in $ S \times \mathbf R $; |
| + | it is called a VC major class if the class of all sets $ \{ {s \in S } : {f ( s ) > x } \} $ |
| + | for $ f \in {\mathcal F} $ |
| + | and real $ x $ |
| + | is a VC class in $ S $. |
| | | |
− | The above probability limit theorems extend to these and larger classes of functions, with suitable measurability and boundedness. Neither the VC subgraph nor VC major property implies the other. For a uniformly bounded, suitably measurable family of functions, the uniform central limit property for all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020114.png" /> appears not to be equivalent to any VC-type combinatorial property. | + | The above probability limit theorems extend to these and larger classes of functions, with suitable measurability and boundedness. Neither the VC subgraph nor VC major property implies the other. For a uniformly bounded, suitably measurable family of functions, the uniform central limit property for all $ {\mathsf P} $ |
| + | appears not to be equivalent to any VC-type combinatorial property. |
| | | |
− | For a probability measure <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020115.png" /> and two events <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020116.png" />, let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020117.png" />. For a totally bounded metric space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020118.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020119.png" />, let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020120.png" /> be the maximum number of points of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020121.png" /> all at distance at least <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020122.png" /> from each other. For any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020123.png" /> there is a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020124.png" /> such that for every VCM class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020125.png" /> with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020126.png" /> and any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020127.png" />, | + | For a probability measure $ {\mathsf P} $ |
| + | and two events $ A,B $, |
| + | let $ d _ {1, {\mathsf P} } ( A,B ) = {\mathsf E} | {1 _ {A} - 1 _ {B} } | $. |
| + | For a totally bounded metric space $ ( T,d ) $ |
| + | and $ \epsilon > 0 $, |
| + | let $ D ( \epsilon,T,d ) $ |
| + | be the maximum number of points of $ T $ |
| + | all at distance at least $ \epsilon $ |
| + | from each other. For any $ m $ |
| + | there is a $ K _ {m} < \infty $ |
| + | such that for every VCM class $ {\mathcal C} $ |
| + | with $ S ( {\mathcal C} ) = m $ |
| + | and any $ {\mathsf P} $, |
| | | |
− | <table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020128.png" /></td> </tr></table>
| + | $$ |
| + | D ( \epsilon, {\mathcal C},d _ {1, {\mathsf P} } ) \leq K _ {m} \epsilon ^ {- m } , |
| + | $$ |
| | | |
− | [[#References|[a3]]]. There is a universal constant <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020129.png" /> such that for every VCM class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020130.png" /> and any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020131.png" />, | + | [[#References|[a3]]]. There is a universal constant $ K $ |
| + | such that for every VCM class $ {\mathcal C} $ |
| + | and any $ M < \infty $, |
| | | |
− | <table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020132.png" /></td> </tr></table>
| + | $$ |
| + | { \mathop{\rm Pr} } \left \{ \sup _ {A \in {\mathcal C} } \left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) } \right | > M \right \} \leq |
| + | $$ |
| | | |
− | <table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020133.png" /></td> </tr></table>
| + | $$ |
| + | \leq |
| + | KM ^ {2S ( {\mathcal C} ) - 1 } { \mathop{\rm exp} } ( - 2M ^ {2} ) , |
| + | $$ |
| | | |
| [[#References|[a5]]]. | | [[#References|[a5]]]. |
| | | |
− | Every VC class is included in a maximal class with the same VC index. If <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020134.png" /> is a maximal VC class of index <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020135.png" />, then for any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020136.png" /> the set of symmetric differences <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020137.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020138.png" /> has a tree-like partial ordering by inclusion, and conversely, such an ordering implies <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020139.png" /> [[#References|[a2]]]. For index greater than <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/v/v110/v110020/v110020140.png" /> no such structure is known (1996). | + | Every VC class is included in a maximal class with the same VC index. If $ {\mathcal C} $ |
| + | is a maximal VC class of index $ 1 $, |
| + | then for any $ A \in {\mathcal C} $ |
| + | the set of symmetric differences $ ( B \setminus A ) \cup ( A \setminus B ) $ |
| + | for $ B \in {\mathcal C} $ |
| + | has a tree-like partial ordering by inclusion, and conversely, such an ordering implies $ S ( {\mathcal C} ) = 1 $[[#References|[a2]]]. For index greater than $ 1 $ |
| + | no such structure is known (1996). |
| | | |
| A general reference on VC classes of sets and functions, also from the viewpoint of probability and statistics, is [[#References|[a6]]], Sect. 2.6. | | A general reference on VC classes of sets and functions, also from the viewpoint of probability and statistics, is [[#References|[a6]]], Sect. 2.6. |
Vapnik–Červonenkis class
Let $ S $
be a set, $ {\mathcal C} $
a collection of subsets of $ S $
and $ F $
a finite subset of $ S $.
Then $ {\mathcal C} $
is said to shatter $ F $
if for every subset $ A $
of $ F $
there is a set $ C $
in $ {\mathcal C} $
with $ A = C \cap F $.
If there is a largest, finite $ k $
such that $ {\mathcal C} $
shatters at least one set of cardinality $ k $,
then $ {\mathcal C} $
is called a Vapnik–Chervonenkis class, or VC class, of sets and $ S ( {\mathcal C} ) = k $
its Vapnik–Chervonenkis index.
Let $ \Delta ^ {\mathcal C} ( F ) $
be the number of different sets $ C \cap F $
for $ C \in {\mathcal C} $.
Let $ m ^ {\mathcal C} ( n ) $
be the maximum of $ \Delta ^ {\mathcal C} ( F ) $
over all sets $ F $
of cardinality $ n $.
Thus, $ {\mathcal C} $
is a Vapnik–Chervonenkis class if and only if $ m ^ {\mathcal C} ( n ) < 2 ^ {n} $
for some finite $ n $,
and then for all $ n > S ( {\mathcal C} ) $.
Sauer's lemma says that
$$
m ^ {\mathcal C} ( n ) \leq \sum _ {j = 0 } ^ { {S } ( {\mathcal C} ) } \left ( \begin{array}{c}
n \\
j
\end{array}
\right ) .
$$
Thus, $ m ^ {\mathcal C} ( n ) $
is either always $ 2 ^ {n} $
or, for a Vapnik–Chervonenkis class $ {\mathcal C} $,
it is bounded above by a polynomial in $ n $
of degree $ S ( {\mathcal C} ) $.
(This is the so-called Vapnik–Chervonenkis property: if $ m ^ {\mathcal C} ( n ) < 2 ^ {n} $
for large $ n $,
then $ m ^ {\mathcal C} ( n ) $
is bounded by a polynomial.)
Vapnik–Chervonenkis classes have turned out to be useful in computer science (learning theory [a1]), probability theory and mathematical statistics [a6], because certain probability limit theorems hold uniformly over them under suitable measurability conditions. One such sufficient measurability condition is that there exist a $ \sigma $-
algebra $ {\mathcal S} $
of subsets of $ S $,
including $ {\mathcal C} $,
and a mapping $ Y $
from a complete separable metric space $ U $
onto $ {\mathcal C} $
such that the set of pairs $ ( x,u ) $
with $ x \in Y ( u ) $
is product-measurable in $ S \times U $.
A VC class $ {\mathcal C} $
satisfying this last condition is called a VCM class. While VC, but not VCM, classes can be shown to exist using the axiom of choice, the VC classes usually encountered in applications are VCM.
Let $ {\mathsf P} $
be a probability measure on $ ( S, {\mathcal S} ) $
and let $ X _ {1} ,X _ {2} , \dots $
be independent coordinates with distribution $ {\mathsf P} $,
specifically, on a countable Cartesian product of copies of $ ( S, {\mathcal S}, {\mathsf P} ) $.
Let $ {\mathsf P} _ {n} $
be the sum of the point masses $ {1 / n } $
at $ X _ {i} $
for $ i = 1 \dots n $;
it is called an empirical measure for $ {\mathsf P} $(
cf. also Empirical process). Then the law of large numbers for empirical measures holds uniformly over any VCM class $ {\mathcal C} $,
meaning that the supremum for $ C \in {\mathcal C} $
of $ | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } | $
approaches zero almost surely as $ n $
becomes large [a7]. This can be improved to a uniform law of the iterated logarithm, meaning that for any VCM class $ {\mathcal C} $,
with probability $ 1 $,
$$
{\lim\limits } \sup _ {n \rightarrow \infty } n ^ {1/2 } \sup _ {C \in {\mathcal C} } {
\frac{\left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } \right | }{( 2 { \mathop{\rm log} } { \mathop{\rm log} } n ) ^ {1/2 } }
} =
$$
$$
=
\sup _ {A \in {\mathcal C} } ( {\mathsf P} ( A ) ( 1 - {\mathsf P} ( A ) ) ) ^ {1/2 } .
$$
Moreover, a central limit theorem holds uniformly: if $ {\mathcal C} $
is any VCM class, and $ G _ {\mathsf P} $
assigns to sets in $ {\mathcal C} $
jointly normal (Gaussian) random variables with mean zero and covariances $ {\mathsf E} G _ {\mathsf P} ( A ) G _ {\mathsf P} ( B ) = {\mathsf P} ( A \cap B ) - {\mathsf P} ( A ) P ( B ) $,
then for any $ \epsilon > 0 $
there is a sufficiently large $ m $
such that for every $ n \geq m $,
there exists a $ G _ {\mathsf P} $
with
$$
\sup _ {A \in {\mathcal C} } \left | {n ^ {1/2 } ( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) - G _ {\mathsf P} ( A ) } \right | < \epsilon
$$
on an event with probability at least $ 1 - \epsilon $.
For the uniform central limit theorem to hold for each probability measure $ {\mathsf P} $
on $ ( S, {\mathcal S} ) $,
the VC property is also necessary.
VC classes can be generated as follows. Let $ V $
be a $ k $-
dimensional vector space of real-valued functions on $ S $.
For each $ f \in V $,
let $ { \mathop{\rm pos} } ( f ) $
be the set where $ f > 0 $.
Then the class $ {\mathcal C} $
of all sets $ { \mathop{\rm pos} } ( f ) $
for $ f \in V $
is a VC class with $ S ( {\mathcal C} ) = k $.
For example, the set of all ellipsoids in a Euclidean space $ \mathbf R ^ {d} $
is a VCM class for each $ d $.
Also, let $ {\mathcal C} $
be a VC class and $ m $
a finite integer. Let $ {\mathcal D} $
be the union of all Boolean algebras of sets (cf. Boolean algebra), each generated by at most $ m $
sets in $ {\mathcal C} $.
Then $ {\mathcal D} $
is a VC class. For example, the set of all convex polytopes with at most $ m $
faces in $ \mathbf R ^ {d} $
is a VC class for each $ m $
and $ d $.
Classes of projections of positivity sets of polynomials of bounded degree, and some other related classes, are also VC [a4].
The class of all finite sets in $ \mathbf R ^ {d} $
and the class of all closed convex sets are not VC classes.
The notion of VC class extends in different ways to a class $ {\mathcal F} $
of real functions on $ S $.
The subgraph of a function $ f $
is the set
$$
\left \{ {( s,x ) } : {0 \leq x \leq f ( s ) \textrm{ or } f ( s ) \leq x \leq 0 } \right \}
$$
in $ S \times \mathbf R $.
Then $ {\mathcal F} $
is called a VC subgraph class if the collection of all subgraphs of functions in $ {\mathcal F} $
is a VC class in $ S \times \mathbf R $;
it is called a VC major class if the class of all sets $ \{ {s \in S } : {f ( s ) > x } \} $
for $ f \in {\mathcal F} $
and real $ x $
is a VC class in $ S $.
The above probability limit theorems extend to these and larger classes of functions, with suitable measurability and boundedness. Neither the VC subgraph nor VC major property implies the other. For a uniformly bounded, suitably measurable family of functions, the uniform central limit property for all $ {\mathsf P} $
appears not to be equivalent to any VC-type combinatorial property.
For a probability measure $ {\mathsf P} $
and two events $ A,B $,
let $ d _ {1, {\mathsf P} } ( A,B ) = {\mathsf E} | {1 _ {A} - 1 _ {B} } | $.
For a totally bounded metric space $ ( T,d ) $
and $ \epsilon > 0 $,
let $ D ( \epsilon,T,d ) $
be the maximum number of points of $ T $
all at distance at least $ \epsilon $
from each other. For any $ m $
there is a $ K _ {m} < \infty $
such that for every VCM class $ {\mathcal C} $
with $ S ( {\mathcal C} ) = m $
and any $ {\mathsf P} $,
$$
D ( \epsilon, {\mathcal C},d _ {1, {\mathsf P} } ) \leq K _ {m} \epsilon ^ {- m } ,
$$
[a3]. There is a universal constant $ K $
such that for every VCM class $ {\mathcal C} $
and any $ M < \infty $,
$$
{ \mathop{\rm Pr} } \left \{ \sup _ {A \in {\mathcal C} } \left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) } \right | > M \right \} \leq
$$
$$
\leq
KM ^ {2S ( {\mathcal C} ) - 1 } { \mathop{\rm exp} } ( - 2M ^ {2} ) ,
$$
[a5].
Every VC class is included in a maximal class with the same VC index. If $ {\mathcal C} $
is a maximal VC class of index $ 1 $,
then for any $ A \in {\mathcal C} $
the set of symmetric differences $ ( B \setminus A ) \cup ( A \setminus B ) $
for $ B \in {\mathcal C} $
has a tree-like partial ordering by inclusion, and conversely, such an ordering implies $ S ( {\mathcal C} ) = 1 $[a2]. For index greater than $ 1 $
no such structure is known (1996).
A general reference on VC classes of sets and functions, also from the viewpoint of probability and statistics, is [a6], Sect. 2.6.
References
[a1] | A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth, "Learnability and the Vapnik–Chervonenkis dimension" JACM , 6 (1989) pp. 929–965 |
[a2] | R.M. Dudley, "The structure of some Vapnik–Červonenkis classes" , Proc. Berkeley Conf.in honor of J. Neyman and J. Kiefer , 2 , Wadsworth (1985) pp. 495–508 |
[a3] | D. Haussler, "Sphere packing numbers for subsets of the Boolean -cube with bounded Vapnik–Chervonenkis dimension" J. Combin. Th. A , 69 (1995) pp. 217–232 |
[a4] | G. Stengle, J. Yukich, "Some new Vapnik–Chervonenkis classes" Ann. Statist. , 17 (1989) pp. 1441–1446 |
[a5] | M. Talagrand, "Sharper bounds for Gaussian and empirical processes" Ann. Probab. , 22 (1994) pp. 28–76 |
[a6] | A. van der Vaart, J. Wellner, "Weak convergence and empirical processes" , Springer (1996) |
[a7] | V.N. Vapnik, A.Ya. Červonenkis, "On the uniform convergence of frequencies of occurrence of events to their probabilities" Th. Probab. Appl. , 16 (1971) pp. 264–280 |
[a8] | R.M. Dudley, "Central limit theorems for empirical measures" Ann. of Probab. , 6 (1978) pp. 899–929 |
[a9] | R.M. Dudley, "Universal Donsker classes and metric entropy" Ann. of Probab. , 15 (1987) pp. 1306–1326 |
[a10] | D. Pollard, "Convergence of stochastic processes" , Springer (1984) |