Difference between revisions of "User:Richard Pinch/sandbox-13"
(→Markov inequality: more) |
(→Monad: move) |
||
(17 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | = | + | =Mann–Whitney test= |
− | A | + | A statistical test for testing the hypothesis $ H _ {0} $ |
− | + | of homogeneity of two samples $ X _ {1} \dots X _ {n} $ | |
+ | and $ Y _ {1} \dots Y _ {m} $, | ||
+ | all $ m + n $ | ||
+ | elements of which are mutually independent and have continuous distributions. This test, suggested by H.B. Mann and D.R. Whitney [[#References|[1]]], is based on the statistic | ||
+ | |||
+ | $$ | ||
+ | U = W - | ||
+ | \frac{1}{2} | ||
+ | m ( m + 1 ) = \ | ||
+ | \sum _ { i=1 } ^ { n } \ | ||
+ | \sum _ { j=1 } ^ { m } | ||
+ | \delta _ {ij} , | ||
$$ | $$ | ||
− | \ | + | |
+ | where $ W $ | ||
+ | is the statistic of the [[Wilcoxon test|Wilcoxon test]] intended for testing the same hypothesis, equal to the sum of the ranks of the elements of the second sample among the pooled order statistics (cf. [[Order statistic|Order statistic]]), and | ||
+ | |||
+ | $$ | ||
+ | \delta _ {ij} = \ | ||
+ | \left \{ | ||
+ | \begin{array}{ll} | ||
+ | 1 & \textrm{ if } X _ {i} < Y _ {j} , \\ | ||
+ | 0 & \textrm{ otherwise } . \\ | ||
+ | \end{array} | ||
+ | |||
+ | \right .$$ | ||
+ | |||
+ | Thus, $ U $ | ||
+ | counts the number of cases when the elements of the second sample exceed elements of the first sample. It follows from the definition of $ U $ | ||
+ | that if $ H _ {0} $ | ||
+ | is true, then | ||
+ | |||
+ | $$ \tag{* } | ||
+ | {\mathsf E} U = | ||
+ | \frac{nm}{2} | ||
+ | ,\ \ | ||
+ | {\mathsf D} U = | ||
+ | \frac{n m ( n + m + 1 ) }{12} | ||
+ | , | ||
$$ | $$ | ||
− | + | ||
− | + | and, in addition, this statistic has all the properties of the Wilcoxon statistic $ W $, | |
+ | including asymptotic normality with parameters (*). | ||
+ | |||
+ | ====References==== | ||
+ | <table><TR><TD valign="top">[1]</TD> <TD valign="top"> H.B. Mann, D.R. Whitney, "On a test whether one of two random variables is statistically larger than the other" ''Ann. Math. Stat.'' , '''18''' (1947) pp. 50–60</TD></TR></table> | ||
+ | |||
+ | ====Comments==== | ||
+ | Instead of Mann–Whitney test, the phrase $U$-test is also used. | ||
+ | |||
+ | ==Wilcoxon test== | ||
+ | |||
+ | A [[Non-parametric test|non-parametric test]] of the homogeneity of two samples $ X _ {1} \dots X _ {n} $ | ||
+ | and $ Y _ {1} \dots Y _ {m} $. | ||
+ | The elements of the samples are assumed to be mutually independent, with continuous distribution functions $ F( x) $ | ||
+ | and $ G( x) $, | ||
+ | respectively. The hypothesis to be tested is $ F( x)= G( x) $. | ||
+ | Wilcoxon's test is based on the [[Rank statistic|rank statistic]] | ||
+ | |||
+ | $$ \tag{* } | ||
+ | W = s ( r _ {1} ) + \dots + s ( r _ {m} ), | ||
$$ | $$ | ||
− | + | ||
+ | where $ r _ {j} $ | ||
+ | are the ranks of the random variables $ Y _ {j} $ | ||
+ | in the common series of order statistics of $ X _ {i} $ | ||
+ | and $ Y _ {j} $, | ||
+ | while the function $ s( r) $, | ||
+ | $ r = 1 \dots n + m $, | ||
+ | is defined by a given permutation | ||
+ | |||
+ | $$ | ||
+ | \left( | ||
+ | \begin{array}{cccc} | ||
+ | 1 & 2 & \cdots & m+n \\ | ||
+ | s(1) & s(2) & \cdots & s(m+n) | ||
+ | \end{array} | ||
+ | \right)\ , | ||
$$ | $$ | ||
+ | where $ s( 1) \dots s( n+ m) $ | ||
+ | is one of the possible rearrangements of the numbers $ 1 \dots n + m $. | ||
+ | The permutation is chosen so that the power of Wilcoxon's test for the given alternative is highest. The statistical distribution of $ W $ | ||
+ | depends only on the size of the samples and not on the chosen permutation (if the homogeneity hypothesis is true). If $ n \rightarrow \infty $ | ||
+ | and $ m \rightarrow \infty $, | ||
+ | the random variable $ W $ | ||
+ | has an asymptotically-normal distribution. This variant of the test was first proposed by F. Wilcoxon in 1945 for samples of equal sizes and was based on the special case $ s( r) \equiv r $( | ||
+ | cf. [[Rank sum test|Rank sum test]]; [[Mann–Whitney test|Mann–Whitney test]]). See also [[Van der Waerden test|van der Waerden test]]; [[Rank test|Rank test]]. | ||
+ | |||
+ | ====References==== | ||
+ | <table> | ||
+ | <TR><TD valign="top">[1]</TD> <TD valign="top"> F. Wilcoxon, "Individual comparison by ranking methods" ''Biometrics'' , '''1''' : 6 (1945) pp. 80–83</TD></TR> | ||
+ | <TR><TD valign="top">[2]</TD> <TD valign="top"> L.N. Bol'shev, N.V. Smirnov, "Tables of mathematical statistics" , ''Libr. math. tables'' , '''46''' , Nauka (1983) (In Russian) (Processed by L.S. Bark and E.S. Kedrova)</TD></TR> | ||
+ | <TR><TD valign="top">[3]</TD> <TD valign="top"> B.L. van der Waerden, "Mathematische Statistik" , Springer (1957)</TD></TR> | ||
+ | </table> | ||
+ | |||
+ | ====Comments==== | ||
+ | |||
+ | ====References==== | ||
+ | <table> | ||
+ | <TR><TD valign="top">[a1]</TD> <TD valign="top"> E.L. Lehmann, "Testing statistical hypotheses" , Wiley (1986)</TD></TR> | ||
+ | </table> | ||
+ | |||
+ | |||
+ | |||
+ | =Vapnik-Chervonenkis class= | ||
+ | ''Vapnik–Červonenkis class'' | ||
+ | |||
+ | Let $ S $ | ||
+ | be a set, $ {\mathcal C} $ | ||
+ | a collection of subsets of $ S $ | ||
+ | and $ F $ | ||
+ | a finite subset of $ S $. | ||
+ | Then $ {\mathcal C} $ | ||
+ | is said to shatter $ F $ | ||
+ | if for every subset $ A $ | ||
+ | of $ F $ | ||
+ | there is a set $ C $ | ||
+ | in $ {\mathcal C} $ | ||
+ | with $ A = C \cap F $. | ||
+ | If there is a largest, finite $ k $ | ||
+ | such that $ {\mathcal C} $ | ||
+ | shatters at least one set of cardinality $ k $, | ||
+ | then $ {\mathcal C} $ | ||
+ | is called a Vapnik–Chervonenkis class, or VC class, of sets and $ S ( {\mathcal C} ) = k $ | ||
+ | its Vapnik–Chervonenkis index. | ||
− | + | Let $ \Delta ^ {\mathcal C} ( F ) $ | |
− | + | be the number of different sets $ C \cap F $ | |
+ | for $ C \in {\mathcal C} $. | ||
+ | Let $ m ^ {\mathcal C} ( n ) $ | ||
+ | be the maximum of $ \Delta ^ {\mathcal C} ( F ) $ | ||
+ | over all sets $ F $ | ||
+ | of cardinality $ n $. | ||
+ | Thus, $ {\mathcal C} $ | ||
+ | is a Vapnik–Chervonenkis class if and only if $ m ^ {\mathcal C} ( n ) < 2 ^ {n} $ | ||
+ | for some finite $ n $, | ||
+ | and then for all $ n > S ( {\mathcal C} ) $. | ||
+ | Sauer's lemma says that | ||
− | = | + | $$ |
− | + | m ^ {\mathcal C} ( n ) \leq \sum _ {j = 0 } ^ { {S } ( {\mathcal C} ) } \left ( \begin{array}{c} | |
− | + | n \\ | |
+ | j | ||
+ | \end{array} | ||
+ | \right ) . | ||
$$ | $$ | ||
− | |||
− | |||
− | |||
− | + | Thus, $ m ^ {\mathcal C} ( n ) $ | |
+ | is either always $ 2 ^ {n} $ | ||
+ | or, for a Vapnik–Chervonenkis class $ {\mathcal C} $, | ||
+ | it is bounded above by a polynomial in $ n $ | ||
+ | of degree $ S ( {\mathcal C} ) $. | ||
+ | (This is the so-called Vapnik–Chervonenkis property: if $ m ^ {\mathcal C} ( n ) < 2 ^ {n} $ | ||
+ | for large $ n $, | ||
+ | then $ m ^ {\mathcal C} ( n ) $ | ||
+ | is bounded by a polynomial.) | ||
+ | |||
+ | Vapnik–Chervonenkis classes have turned out to be useful in computer science (learning theory [[#References|[a1]]]), [[Probability theory|probability theory]] and [[Mathematical statistics|mathematical statistics]] [[#References|[a6]]], because certain probability limit theorems hold uniformly over them under suitable measurability conditions. One such sufficient measurability condition is that there exist a $ \sigma $- | ||
+ | algebra $ {\mathcal S} $ | ||
+ | of subsets of $ S $, | ||
+ | including $ {\mathcal C} $, | ||
+ | and a mapping $ Y $ | ||
+ | from a complete separable [[Metric space|metric space]] $ U $ | ||
+ | onto $ {\mathcal C} $ | ||
+ | such that the set of pairs $ ( x,u ) $ | ||
+ | with $ x \in Y ( u ) $ | ||
+ | is product-measurable in $ S \times U $. | ||
+ | A VC class $ {\mathcal C} $ | ||
+ | satisfying this last condition is called a VCM class. While VC, but not VCM, classes can be shown to exist using the [[Axiom of choice|axiom of choice]], the VC classes usually encountered in applications are VCM. | ||
+ | |||
+ | Let $ {\mathsf P} $ | ||
+ | be a probability measure on $ ( S, {\mathcal S} ) $ | ||
+ | and let $ X _ {1} ,X _ {2} , \dots $ | ||
+ | be independent coordinates with distribution $ {\mathsf P} $, | ||
+ | specifically, on a countable Cartesian product of copies of $ ( S, {\mathcal S}, {\mathsf P} ) $. | ||
+ | Let $ {\mathsf P} _ {n} $ | ||
+ | be the sum of the point masses $ {1 / n } $ | ||
+ | at $ X _ {i} $ | ||
+ | for $ i = 1 \dots n $; | ||
+ | it is called an empirical measure for $ {\mathsf P} $( | ||
+ | cf. also [[Empirical process|Empirical process]]). Then the [[Law of large numbers|law of large numbers]] for empirical measures holds uniformly over any VCM class $ {\mathcal C} $, | ||
+ | meaning that the supremum for $ C \in {\mathcal C} $ | ||
+ | of $ | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } | $ | ||
+ | approaches zero almost surely as $ n $ | ||
+ | becomes large [[#References|[a7]]]. This can be improved to a uniform [[Law of the iterated logarithm|law of the iterated logarithm]], meaning that for any VCM class $ {\mathcal C} $, | ||
+ | with probability $ 1 $, | ||
− | + | $$ | |
+ | {\lim\limits } \sup _ {n \rightarrow \infty } n ^ {1/2 } \sup _ {C \in {\mathcal C} } { | ||
+ | \frac{\left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } \right | }{( 2 { \mathop{\rm log} } { \mathop{\rm log} } n ) ^ {1/2 } } | ||
+ | } = | ||
$$ | $$ | ||
− | \ | + | |
+ | $$ | ||
+ | = | ||
+ | \sup _ {A \in {\mathcal C} } ( {\mathsf P} ( A ) ( 1 - {\mathsf P} ( A ) ) ) ^ {1/2 } . | ||
$$ | $$ | ||
− | |||
− | + | Moreover, a [[Central limit theorem|central limit theorem]] holds uniformly: if $ {\mathcal C} $ | |
− | + | is any VCM class, and $ G _ {\mathsf P} $ | |
+ | assigns to sets in $ {\mathcal C} $ | ||
+ | jointly normal (Gaussian) random variables with mean zero and covariances $ {\mathsf E} G _ {\mathsf P} ( A ) G _ {\mathsf P} ( B ) = {\mathsf P} ( A \cap B ) - {\mathsf P} ( A ) P ( B ) $, | ||
+ | then for any $ \epsilon > 0 $ | ||
+ | there is a sufficiently large $ m $ | ||
+ | such that for every $ n \geq m $, | ||
+ | there exists a $ G _ {\mathsf P} $ | ||
+ | with | ||
− | + | $$ | |
− | A | + | \sup _ {A \in {\mathcal C} } \left | {n ^ {1/2 } ( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) - G _ {\mathsf P} ( A ) } \right | < \epsilon |
− | |||
$$ | $$ | ||
− | \mathsf | + | |
+ | on an event with probability at least $ 1 - \epsilon $. | ||
+ | For the uniform central limit theorem to hold for each probability measure $ {\mathsf P} $ | ||
+ | on $ ( S, {\mathcal S} ) $, | ||
+ | the VC property is also necessary. | ||
+ | |||
+ | VC classes can be generated as follows. Let $ V $ | ||
+ | be a $ k $- | ||
+ | dimensional [[Vector space|vector space]] of real-valued functions on $ S $. | ||
+ | For each $ f \in V $, | ||
+ | let $ { \mathop{\rm pos} } ( f ) $ | ||
+ | be the set where $ f > 0 $. | ||
+ | Then the class $ {\mathcal C} $ | ||
+ | of all sets $ { \mathop{\rm pos} } ( f ) $ | ||
+ | for $ f \in V $ | ||
+ | is a VC class with $ S ( {\mathcal C} ) = k $. | ||
+ | For example, the set of all ellipsoids in a Euclidean space $ \mathbf R ^ {d} $ | ||
+ | is a VCM class for each $ d $. | ||
+ | Also, let $ {\mathcal C} $ | ||
+ | be a VC class and $ m $ | ||
+ | a finite integer. Let $ {\mathcal D} $ | ||
+ | be the union of all Boolean algebras of sets (cf. [[Boolean algebra|Boolean algebra]]), each generated by at most $ m $ | ||
+ | sets in $ {\mathcal C} $. | ||
+ | Then $ {\mathcal D} $ | ||
+ | is a VC class. For example, the set of all convex polytopes with at most $ m $ | ||
+ | faces in $ \mathbf R ^ {d} $ | ||
+ | is a VC class for each $ m $ | ||
+ | and $ d $. | ||
+ | Classes of projections of positivity sets of polynomials of bounded degree, and some other related classes, are also VC [[#References|[a4]]]. | ||
+ | |||
+ | The class of all finite sets in $ \mathbf R ^ {d} $ | ||
+ | and the class of all closed convex sets are not VC classes. | ||
+ | |||
+ | The notion of VC class extends in different ways to a class $ {\mathcal F} $ | ||
+ | of real functions on $ S $. | ||
+ | The subgraph of a function $ f $ | ||
+ | is the set | ||
+ | |||
+ | $$ | ||
+ | \left \{ {( s,x ) } : {0 \leq x \leq f ( s ) \textrm{ or } f ( s ) \leq x \leq 0 } \right \} | ||
$$ | $$ | ||
− | + | in $ S \times \mathbf R $. | |
+ | Then $ {\mathcal F} $ | ||
+ | is called a VC subgraph class if the collection of all subgraphs of functions in $ {\mathcal F} $ | ||
+ | is a VC class in $ S \times \mathbf R $; | ||
+ | it is called a VC major class if the class of all sets $ \{ {s \in S } : {f ( s ) > x } \} $ | ||
+ | for $ f \in {\mathcal F} $ | ||
+ | and real $ x $ | ||
+ | is a VC class in $ S $. | ||
+ | |||
+ | The above probability limit theorems extend to these and larger classes of functions, with suitable measurability and boundedness. Neither the VC subgraph nor VC major property implies the other. For a uniformly bounded, suitably measurable family of functions, the uniform central limit property for all $ {\mathsf P} $ | ||
+ | appears not to be equivalent to any VC-type combinatorial property. | ||
+ | |||
+ | For a probability measure $ {\mathsf P} $ | ||
+ | and two events $ A,B $, | ||
+ | let $ d _ {1, {\mathsf P} } ( A,B ) = {\mathsf E} | {1 _ {A} - 1 _ {B} } | $. | ||
+ | For a totally bounded metric space $ ( T,d ) $ | ||
+ | and $ \epsilon > 0 $, | ||
+ | let $ D ( \epsilon,T,d ) $ | ||
+ | be the maximum number of points of $ T $ | ||
+ | all at distance at least $ \epsilon $ | ||
+ | from each other. For any $ m $ | ||
+ | there is a $ K _ {m} < \infty $ | ||
+ | such that for every VCM class $ {\mathcal C} $ | ||
+ | with $ S ( {\mathcal C} ) = m $ | ||
+ | and any $ {\mathsf P} $, | ||
− | + | $$ | |
+ | D ( \epsilon, {\mathcal C},d _ {1, {\mathsf P} } ) \leq K _ {m} \epsilon ^ {- m } , | ||
$$ | $$ | ||
− | \ | + | |
+ | [[#References|[a3]]]. There is a universal constant $ K $ | ||
+ | such that for every VCM class $ {\mathcal C} $ | ||
+ | and any $ M < \infty $, | ||
+ | |||
+ | $$ | ||
+ | { \mathop{\rm Pr} } \left \{ \sup _ {A \in {\mathcal C} } \left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) } \right | > M \right \} \leq | ||
$$ | $$ | ||
− | + | $$ | |
− | + | \leq | |
− | + | KM ^ {2S ( {\mathcal C} ) - 1 } { \mathop{\rm exp} } ( - 2M ^ {2} ) , | |
− | |||
− | |||
− | |||
− | |||
$$ | $$ | ||
+ | [[#References|[a5]]]. | ||
+ | |||
+ | Every VC class is included in a maximal class with the same VC index. If $ {\mathcal C} $ | ||
+ | is a maximal VC class of index $ 1 $, | ||
+ | then for any $ A \in {\mathcal C} $ | ||
+ | the set of symmetric differences $ ( B \setminus A ) \cup ( A \setminus B ) $ | ||
+ | for $ B \in {\mathcal C} $ | ||
+ | has a tree-like partial ordering by inclusion, and conversely, such an ordering implies $ S ( {\mathcal C} ) = 1 $[[#References|[a2]]]. For index greater than $ 1 $ | ||
+ | no such structure is known (1996). | ||
+ | |||
+ | A general reference on VC classes of sets and functions, also from the viewpoint of probability and statistics, is [[#References|[a6]]], Sect. 2.6. | ||
+ | |||
+ | ==Vapnik-Chervonenkis dimension== | ||
+ | |||
+ | ''Vapnik–Červonenkis dimension, VC-dimension'' | ||
+ | |||
+ | Let $H = (V_H,E_H)$ be a [[hypergraph]]. The Vapnik–Chervonenkis dimension of $H$ is the largest cardinality of a subset $F$ of $V_H$ that is scattered by $E_H$, i.e. such that for all $A \subseteq F$ there is an $E \in E_H$ with $A = F \cap E$. Thus, it is the same as the index of a [[Vapnik–Chervonenkis class]]. It is usually denoted by $\mathrm{VC}(H)$. | ||
+ | |||
+ | Computing the Vapnik–Chervonenkis dimension is $\mathcal{NP}$-hard (cf. also [[NP|$\mathcal{NP}$]]) for many classes of hypergraphs, [[#References|[b1]]], [[#References|[b2]]]. | ||
− | + | The Vapnik–Chervonenkis dimension plays an important role in learning theory, especially in probably approximately correct ([[PAC]]) learning. Thus, learnability of classes of $\{0,1\}$-valued functions is equivalent to finiteness of the Vapnik–Chervonenkis dimension, [[#References|[b3]]]. | |
− | |||
− | |||
+ | For the role of the Vapnik–Chervonenkis dimension in [[neural network]]s, see, e.g., [[#References|[b4]]], [[#References|[b5]]]. | ||
+ | The independence number of a hypergraph $H$ is the maximal cardinality of a subset $A$ of $V_H$ that does not contain any $E \in E_H$ (see also [[Graph, numerical characteristics of a]]). This notion is closely related with $\mathrm{VC}(H)$, [[#References|[b6]]], [[#References|[b7]]]. | ||
− | |||
− | |||
− | |||
− | |||
==References== | ==References== | ||
− | + | <table> | |
+ | <TR><TD valign="top">[a1]</TD> <TD valign="top"> A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth, "Learnability and the Vapnik–Chervonenkis dimension" ''JACM'' , '''6''' (1989) pp. 929–965</TD></TR> | ||
+ | <TR><TD valign="top">[a2]</TD> <TD valign="top"> R.M. Dudley, "The structure of some Vapnik–Červonenkis classes" , ''Proc. Berkeley Conf.in honor of J. Neyman and J. Kiefer'' , '''2''' , Wadsworth (1985) pp. 495–508</TD></TR> | ||
+ | <TR><TD valign="top">[a3]</TD> <TD valign="top"> D. Haussler, "Sphere packing numbers for subsets of the Boolean $n$-cube with bounded Vapnik–Chervonenkis dimension" ''J. Combin. Th. A'' , '''69''' (1995) pp. 217–232</TD></TR> | ||
+ | <TR><TD valign="top">[a4]</TD> <TD valign="top"> G. Stengle, J. Yukich, "Some new Vapnik–Chervonenkis classes" ''Ann. Statist.'' , '''17''' (1989) pp. 1441–1446</TD></TR><TR><TD valign="top">[a5]</TD> <TD valign="top"> M. Talagrand, "Sharper bounds for Gaussian and empirical processes" ''Ann. Probab.'' , '''22''' (1994) pp. 28–76</TD></TR><TR><TD valign="top">[a6]</TD> <TD valign="top"> A. van der Vaart, J. Wellner, "Weak convergence and empirical processes" , Springer (1996)</TD></TR> | ||
+ | <TR><TD valign="top">[a7]</TD> <TD valign="top"> V.N. Vapnik, A.Ya. Červonenkis, "On the uniform convergence of frequencies of occurrence of events to their probabilities" ''Th. Probab. Appl.'' , '''16''' (1971) pp. 264–280</TD></TR> | ||
+ | <TR><TD valign="top">[a8]</TD> <TD valign="top"> R.M. Dudley, "Central limit theorems for empirical measures" ''Ann. of Probab.'' , '''6''' (1978) pp. 899–929</TD></TR><TR><TD valign="top">[a9]</TD> <TD valign="top"> R.M. Dudley, "Universal Donsker classes and metric entropy" ''Ann. of Probab.'' , '''15''' (1987) pp. 1306–1326</TD></TR> | ||
+ | <TR><TD valign="top">[a10]</TD> <TD valign="top"> D. Pollard, "Convergence of stochastic processes" , Springer (1984)</TD></TR> | ||
+ | <TR><TD valign="top">[b1]</TD> <TD valign="top"> E. Kranakis, D. Krizanc, B. Ruf, J. Urrutia, G. Wöginger, "The VC-dimension of set systems defined by graphs" ''Discr. Appl. Math.'' , '''77''' : 3 (1997) pp. 237–257</TD></TR> | ||
+ | <TR><TD valign="top">[b2]</TD> <TD valign="top"> C.H. Papadimitriou, M. Yannakakis, "On limited nondeterminism and the complexity of VC-dimension" ''J. Comput. Syst. Sci.'' , '''53''' : 2 (1996) pp. 161–170</TD></TR> | ||
+ | <TR><TD valign="top">[b3]</TD> <TD valign="top"> S. Ben-David, N. Cesa-Bianchi, D. Haussler, P.M. Long, "Characterizations of learnability of $\{0,\ldots,n\}$-valued functions" ''J. Comput. Syst. Sci.'' , '''50''' : 1 (1995) pp. 74–86 {{DOI|10.1006/jcss.1995.1008}} {{ZBL|0827.68095}}</TD></TR> | ||
+ | <TR><TD valign="top">[b4]</TD> <TD valign="top"> S.B. Holden, "Neural networks and the VC-dimension" J.G. McWhirter (ed.) , ''Mathematics in Signal Processing'' , '''III''' , Oxford Univ. Press (1994) pp. 73–84</TD></TR> | ||
+ | <TR><TD valign="top">[b5]</TD> <TD valign="top"> W. Maass, "Perspectives of current research about the complexity of learning on neural nets" V. Roychowdhury (ed.) et al. (ed.) , ''Theoretical Advances in Neural Computation and Learning'' , Kluwer Acad. Publ. (1994) pp. 295–336</TD></TR> | ||
+ | <TR><TD valign="top">[b6]</TD> <TD valign="top"> D.Q. Naiman, H.P. Wynn, "Independence number and the complexity of families of sets" ''Discr. Math.'' , '''154''' (1996) pp. 203–216</TD></TR> | ||
+ | <TR><TD valign="top">[b7]</TD> <TD valign="top"> J. Pach, P.K. Agarwal, "Combinatorial geometry" , Wiley/Interscience (1995) pp. 247–254</TD></TR> | ||
+ | </table> | ||
+ | |||
=Spanning set= | =Spanning set= | ||
''generating set'', ''for a module $M$ over a ring $R$'' | ''generating set'', ''for a module $M$ over a ring $R$'' | ||
− | A subset $S$ of $M$ such that every element of $M$ can be written as a finite linear combination $\sum_{i=1}^k r_i s_i$ with $r_i \in R$ and $s_i \in S$. | + | A subset $S$ of $M$ such that every element of $M$ can be written as a finite linear combination $\sum_{i=1}^k r_i s_i$ with $r_i \in R$ and $s_i \in S$: a set $S$ such that $M$ is the [[linear span]] of $S$. |
==References== | ==References== | ||
Line 85: | Line 357: | ||
A measure of dissimilarity between [[word]]s over some alphabet in terms of the number of elementary "edit" operations required to. turn one word into another. | A measure of dissimilarity between [[word]]s over some alphabet in terms of the number of elementary "edit" operations required to. turn one word into another. | ||
− | + | Examples include | |
* ''[[Hamming distance]]'' between words of the same length. An edit operation consists of ''substitution'': replacing one letter in a given position by another letter in the same position. | * ''[[Hamming distance]]'' between words of the same length. An edit operation consists of ''substitution'': replacing one letter in a given position by another letter in the same position. | ||
Line 309: | Line 581: | ||
<table> | <table> | ||
<TR><TD valign="top">[1]</TD> <TD valign="top"> S. MacLane, "Categories for the working mathematician" , Springer (1971). ISBN 0-387-98403-8</TD></TR> | <TR><TD valign="top">[1]</TD> <TD valign="top"> S. MacLane, "Categories for the working mathematician" , Springer (1971). ISBN 0-387-98403-8</TD></TR> | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
</table> | </table> | ||
Latest revision as of 18:44, 3 July 2021
Mann–Whitney test
A statistical test for testing the hypothesis $ H _ {0} $ of homogeneity of two samples $ X _ {1} \dots X _ {n} $ and $ Y _ {1} \dots Y _ {m} $, all $ m + n $ elements of which are mutually independent and have continuous distributions. This test, suggested by H.B. Mann and D.R. Whitney [1], is based on the statistic
$$ U = W - \frac{1}{2} m ( m + 1 ) = \ \sum _ { i=1 } ^ { n } \ \sum _ { j=1 } ^ { m } \delta _ {ij} , $$
where $ W $ is the statistic of the Wilcoxon test intended for testing the same hypothesis, equal to the sum of the ranks of the elements of the second sample among the pooled order statistics (cf. Order statistic), and
$$ \delta _ {ij} = \ \left \{ \begin{array}{ll} 1 & \textrm{ if } X _ {i} < Y _ {j} , \\ 0 & \textrm{ otherwise } . \\ \end{array} \right .$$
Thus, $ U $ counts the number of cases when the elements of the second sample exceed elements of the first sample. It follows from the definition of $ U $ that if $ H _ {0} $ is true, then
$$ \tag{* } {\mathsf E} U = \frac{nm}{2} ,\ \ {\mathsf D} U = \frac{n m ( n + m + 1 ) }{12} , $$
and, in addition, this statistic has all the properties of the Wilcoxon statistic $ W $, including asymptotic normality with parameters (*).
References
[1] | H.B. Mann, D.R. Whitney, "On a test whether one of two random variables is statistically larger than the other" Ann. Math. Stat. , 18 (1947) pp. 50–60 |
Comments
Instead of Mann–Whitney test, the phrase $U$-test is also used.
Wilcoxon test
A non-parametric test of the homogeneity of two samples $ X _ {1} \dots X _ {n} $ and $ Y _ {1} \dots Y _ {m} $. The elements of the samples are assumed to be mutually independent, with continuous distribution functions $ F( x) $ and $ G( x) $, respectively. The hypothesis to be tested is $ F( x)= G( x) $. Wilcoxon's test is based on the rank statistic
$$ \tag{* } W = s ( r _ {1} ) + \dots + s ( r _ {m} ), $$
where $ r _ {j} $ are the ranks of the random variables $ Y _ {j} $ in the common series of order statistics of $ X _ {i} $ and $ Y _ {j} $, while the function $ s( r) $, $ r = 1 \dots n + m $, is defined by a given permutation
$$ \left( \begin{array}{cccc} 1 & 2 & \cdots & m+n \\ s(1) & s(2) & \cdots & s(m+n) \end{array} \right)\ , $$ where $ s( 1) \dots s( n+ m) $ is one of the possible rearrangements of the numbers $ 1 \dots n + m $. The permutation is chosen so that the power of Wilcoxon's test for the given alternative is highest. The statistical distribution of $ W $ depends only on the size of the samples and not on the chosen permutation (if the homogeneity hypothesis is true). If $ n \rightarrow \infty $ and $ m \rightarrow \infty $, the random variable $ W $ has an asymptotically-normal distribution. This variant of the test was first proposed by F. Wilcoxon in 1945 for samples of equal sizes and was based on the special case $ s( r) \equiv r $( cf. Rank sum test; Mann–Whitney test). See also van der Waerden test; Rank test.
References
[1] | F. Wilcoxon, "Individual comparison by ranking methods" Biometrics , 1 : 6 (1945) pp. 80–83 |
[2] | L.N. Bol'shev, N.V. Smirnov, "Tables of mathematical statistics" , Libr. math. tables , 46 , Nauka (1983) (In Russian) (Processed by L.S. Bark and E.S. Kedrova) |
[3] | B.L. van der Waerden, "Mathematische Statistik" , Springer (1957) |
Comments
References
[a1] | E.L. Lehmann, "Testing statistical hypotheses" , Wiley (1986) |
Vapnik-Chervonenkis class
Vapnik–Červonenkis class
Let $ S $ be a set, $ {\mathcal C} $ a collection of subsets of $ S $ and $ F $ a finite subset of $ S $. Then $ {\mathcal C} $ is said to shatter $ F $ if for every subset $ A $ of $ F $ there is a set $ C $ in $ {\mathcal C} $ with $ A = C \cap F $. If there is a largest, finite $ k $ such that $ {\mathcal C} $ shatters at least one set of cardinality $ k $, then $ {\mathcal C} $ is called a Vapnik–Chervonenkis class, or VC class, of sets and $ S ( {\mathcal C} ) = k $ its Vapnik–Chervonenkis index.
Let $ \Delta ^ {\mathcal C} ( F ) $ be the number of different sets $ C \cap F $ for $ C \in {\mathcal C} $. Let $ m ^ {\mathcal C} ( n ) $ be the maximum of $ \Delta ^ {\mathcal C} ( F ) $ over all sets $ F $ of cardinality $ n $. Thus, $ {\mathcal C} $ is a Vapnik–Chervonenkis class if and only if $ m ^ {\mathcal C} ( n ) < 2 ^ {n} $ for some finite $ n $, and then for all $ n > S ( {\mathcal C} ) $. Sauer's lemma says that
$$ m ^ {\mathcal C} ( n ) \leq \sum _ {j = 0 } ^ { {S } ( {\mathcal C} ) } \left ( \begin{array}{c} n \\ j \end{array} \right ) . $$
Thus, $ m ^ {\mathcal C} ( n ) $ is either always $ 2 ^ {n} $ or, for a Vapnik–Chervonenkis class $ {\mathcal C} $, it is bounded above by a polynomial in $ n $ of degree $ S ( {\mathcal C} ) $. (This is the so-called Vapnik–Chervonenkis property: if $ m ^ {\mathcal C} ( n ) < 2 ^ {n} $ for large $ n $, then $ m ^ {\mathcal C} ( n ) $ is bounded by a polynomial.)
Vapnik–Chervonenkis classes have turned out to be useful in computer science (learning theory [a1]), probability theory and mathematical statistics [a6], because certain probability limit theorems hold uniformly over them under suitable measurability conditions. One such sufficient measurability condition is that there exist a $ \sigma $- algebra $ {\mathcal S} $ of subsets of $ S $, including $ {\mathcal C} $, and a mapping $ Y $ from a complete separable metric space $ U $ onto $ {\mathcal C} $ such that the set of pairs $ ( x,u ) $ with $ x \in Y ( u ) $ is product-measurable in $ S \times U $. A VC class $ {\mathcal C} $ satisfying this last condition is called a VCM class. While VC, but not VCM, classes can be shown to exist using the axiom of choice, the VC classes usually encountered in applications are VCM.
Let $ {\mathsf P} $ be a probability measure on $ ( S, {\mathcal S} ) $ and let $ X _ {1} ,X _ {2} , \dots $ be independent coordinates with distribution $ {\mathsf P} $, specifically, on a countable Cartesian product of copies of $ ( S, {\mathcal S}, {\mathsf P} ) $. Let $ {\mathsf P} _ {n} $ be the sum of the point masses $ {1 / n } $ at $ X _ {i} $ for $ i = 1 \dots n $; it is called an empirical measure for $ {\mathsf P} $( cf. also Empirical process). Then the law of large numbers for empirical measures holds uniformly over any VCM class $ {\mathcal C} $, meaning that the supremum for $ C \in {\mathcal C} $ of $ | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } | $ approaches zero almost surely as $ n $ becomes large [a7]. This can be improved to a uniform law of the iterated logarithm, meaning that for any VCM class $ {\mathcal C} $, with probability $ 1 $,
$$ {\lim\limits } \sup _ {n \rightarrow \infty } n ^ {1/2 } \sup _ {C \in {\mathcal C} } { \frac{\left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( C ) } \right | }{( 2 { \mathop{\rm log} } { \mathop{\rm log} } n ) ^ {1/2 } } } = $$
$$ = \sup _ {A \in {\mathcal C} } ( {\mathsf P} ( A ) ( 1 - {\mathsf P} ( A ) ) ) ^ {1/2 } . $$
Moreover, a central limit theorem holds uniformly: if $ {\mathcal C} $ is any VCM class, and $ G _ {\mathsf P} $ assigns to sets in $ {\mathcal C} $ jointly normal (Gaussian) random variables with mean zero and covariances $ {\mathsf E} G _ {\mathsf P} ( A ) G _ {\mathsf P} ( B ) = {\mathsf P} ( A \cap B ) - {\mathsf P} ( A ) P ( B ) $, then for any $ \epsilon > 0 $ there is a sufficiently large $ m $ such that for every $ n \geq m $, there exists a $ G _ {\mathsf P} $ with
$$ \sup _ {A \in {\mathcal C} } \left | {n ^ {1/2 } ( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) - G _ {\mathsf P} ( A ) } \right | < \epsilon $$
on an event with probability at least $ 1 - \epsilon $. For the uniform central limit theorem to hold for each probability measure $ {\mathsf P} $ on $ ( S, {\mathcal S} ) $, the VC property is also necessary.
VC classes can be generated as follows. Let $ V $ be a $ k $- dimensional vector space of real-valued functions on $ S $. For each $ f \in V $, let $ { \mathop{\rm pos} } ( f ) $ be the set where $ f > 0 $. Then the class $ {\mathcal C} $ of all sets $ { \mathop{\rm pos} } ( f ) $ for $ f \in V $ is a VC class with $ S ( {\mathcal C} ) = k $. For example, the set of all ellipsoids in a Euclidean space $ \mathbf R ^ {d} $ is a VCM class for each $ d $. Also, let $ {\mathcal C} $ be a VC class and $ m $ a finite integer. Let $ {\mathcal D} $ be the union of all Boolean algebras of sets (cf. Boolean algebra), each generated by at most $ m $ sets in $ {\mathcal C} $. Then $ {\mathcal D} $ is a VC class. For example, the set of all convex polytopes with at most $ m $ faces in $ \mathbf R ^ {d} $ is a VC class for each $ m $ and $ d $. Classes of projections of positivity sets of polynomials of bounded degree, and some other related classes, are also VC [a4].
The class of all finite sets in $ \mathbf R ^ {d} $ and the class of all closed convex sets are not VC classes.
The notion of VC class extends in different ways to a class $ {\mathcal F} $ of real functions on $ S $. The subgraph of a function $ f $ is the set
$$ \left \{ {( s,x ) } : {0 \leq x \leq f ( s ) \textrm{ or } f ( s ) \leq x \leq 0 } \right \} $$
in $ S \times \mathbf R $. Then $ {\mathcal F} $ is called a VC subgraph class if the collection of all subgraphs of functions in $ {\mathcal F} $ is a VC class in $ S \times \mathbf R $; it is called a VC major class if the class of all sets $ \{ {s \in S } : {f ( s ) > x } \} $ for $ f \in {\mathcal F} $ and real $ x $ is a VC class in $ S $.
The above probability limit theorems extend to these and larger classes of functions, with suitable measurability and boundedness. Neither the VC subgraph nor VC major property implies the other. For a uniformly bounded, suitably measurable family of functions, the uniform central limit property for all $ {\mathsf P} $ appears not to be equivalent to any VC-type combinatorial property.
For a probability measure $ {\mathsf P} $ and two events $ A,B $, let $ d _ {1, {\mathsf P} } ( A,B ) = {\mathsf E} | {1 _ {A} - 1 _ {B} } | $. For a totally bounded metric space $ ( T,d ) $ and $ \epsilon > 0 $, let $ D ( \epsilon,T,d ) $ be the maximum number of points of $ T $ all at distance at least $ \epsilon $ from each other. For any $ m $ there is a $ K _ {m} < \infty $ such that for every VCM class $ {\mathcal C} $ with $ S ( {\mathcal C} ) = m $ and any $ {\mathsf P} $,
$$ D ( \epsilon, {\mathcal C},d _ {1, {\mathsf P} } ) \leq K _ {m} \epsilon ^ {- m } , $$
[a3]. There is a universal constant $ K $ such that for every VCM class $ {\mathcal C} $ and any $ M < \infty $,
$$ { \mathop{\rm Pr} } \left \{ \sup _ {A \in {\mathcal C} } \left | {( {\mathsf P} _ {n} - {\mathsf P} ) ( A ) } \right | > M \right \} \leq $$
$$ \leq KM ^ {2S ( {\mathcal C} ) - 1 } { \mathop{\rm exp} } ( - 2M ^ {2} ) , $$
[a5].
Every VC class is included in a maximal class with the same VC index. If $ {\mathcal C} $ is a maximal VC class of index $ 1 $, then for any $ A \in {\mathcal C} $ the set of symmetric differences $ ( B \setminus A ) \cup ( A \setminus B ) $ for $ B \in {\mathcal C} $ has a tree-like partial ordering by inclusion, and conversely, such an ordering implies $ S ( {\mathcal C} ) = 1 $[a2]. For index greater than $ 1 $ no such structure is known (1996).
A general reference on VC classes of sets and functions, also from the viewpoint of probability and statistics, is [a6], Sect. 2.6.
Vapnik-Chervonenkis dimension
Vapnik–Červonenkis dimension, VC-dimension
Let $H = (V_H,E_H)$ be a hypergraph. The Vapnik–Chervonenkis dimension of $H$ is the largest cardinality of a subset $F$ of $V_H$ that is scattered by $E_H$, i.e. such that for all $A \subseteq F$ there is an $E \in E_H$ with $A = F \cap E$. Thus, it is the same as the index of a Vapnik–Chervonenkis class. It is usually denoted by $\mathrm{VC}(H)$.
Computing the Vapnik–Chervonenkis dimension is $\mathcal{NP}$-hard (cf. also $\mathcal{NP}$) for many classes of hypergraphs, [b1], [b2].
The Vapnik–Chervonenkis dimension plays an important role in learning theory, especially in probably approximately correct (PAC) learning. Thus, learnability of classes of $\{0,1\}$-valued functions is equivalent to finiteness of the Vapnik–Chervonenkis dimension, [b3].
For the role of the Vapnik–Chervonenkis dimension in neural networks, see, e.g., [b4], [b5].
The independence number of a hypergraph $H$ is the maximal cardinality of a subset $A$ of $V_H$ that does not contain any $E \in E_H$ (see also Graph, numerical characteristics of a). This notion is closely related with $\mathrm{VC}(H)$, [b6], [b7].
References
[a1] | A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth, "Learnability and the Vapnik–Chervonenkis dimension" JACM , 6 (1989) pp. 929–965 |
[a2] | R.M. Dudley, "The structure of some Vapnik–Červonenkis classes" , Proc. Berkeley Conf.in honor of J. Neyman and J. Kiefer , 2 , Wadsworth (1985) pp. 495–508 |
[a3] | D. Haussler, "Sphere packing numbers for subsets of the Boolean $n$-cube with bounded Vapnik–Chervonenkis dimension" J. Combin. Th. A , 69 (1995) pp. 217–232 |
[a4] | G. Stengle, J. Yukich, "Some new Vapnik–Chervonenkis classes" Ann. Statist. , 17 (1989) pp. 1441–1446 |
[a5] | M. Talagrand, "Sharper bounds for Gaussian and empirical processes" Ann. Probab. , 22 (1994) pp. 28–76 |
[a6] | A. van der Vaart, J. Wellner, "Weak convergence and empirical processes" , Springer (1996) |
[a7] | V.N. Vapnik, A.Ya. Červonenkis, "On the uniform convergence of frequencies of occurrence of events to their probabilities" Th. Probab. Appl. , 16 (1971) pp. 264–280 |
[a8] | R.M. Dudley, "Central limit theorems for empirical measures" Ann. of Probab. , 6 (1978) pp. 899–929 |
[a9] | R.M. Dudley, "Universal Donsker classes and metric entropy" Ann. of Probab. , 15 (1987) pp. 1306–1326 |
[a10] | D. Pollard, "Convergence of stochastic processes" , Springer (1984) |
[b1] | E. Kranakis, D. Krizanc, B. Ruf, J. Urrutia, G. Wöginger, "The VC-dimension of set systems defined by graphs" Discr. Appl. Math. , 77 : 3 (1997) pp. 237–257 |
[b2] | C.H. Papadimitriou, M. Yannakakis, "On limited nondeterminism and the complexity of VC-dimension" J. Comput. Syst. Sci. , 53 : 2 (1996) pp. 161–170 |
[b3] | S. Ben-David, N. Cesa-Bianchi, D. Haussler, P.M. Long, "Characterizations of learnability of $\{0,\ldots,n\}$-valued functions" J. Comput. Syst. Sci. , 50 : 1 (1995) pp. 74–86 DOI 10.1006/jcss.1995.1008 Zbl 0827.68095 |
[b4] | S.B. Holden, "Neural networks and the VC-dimension" J.G. McWhirter (ed.) , Mathematics in Signal Processing , III , Oxford Univ. Press (1994) pp. 73–84 |
[b5] | W. Maass, "Perspectives of current research about the complexity of learning on neural nets" V. Roychowdhury (ed.) et al. (ed.) , Theoretical Advances in Neural Computation and Learning , Kluwer Acad. Publ. (1994) pp. 295–336 |
[b6] | D.Q. Naiman, H.P. Wynn, "Independence number and the complexity of families of sets" Discr. Math. , 154 (1996) pp. 203–216 |
[b7] | J. Pach, P.K. Agarwal, "Combinatorial geometry" , Wiley/Interscience (1995) pp. 247–254 |
Spanning set
generating set, for a module $M$ over a ring $R$
A subset $S$ of $M$ such that every element of $M$ can be written as a finite linear combination $\sum_{i=1}^k r_i s_i$ with $r_i \in R$ and $s_i \in S$: a set $S$ such that $M$ is the linear span of $S$.
References
- P. M. Cohn, "Classic Algebra" Wiley (2000) ISBN 047187731X
Edit distance
A measure of dissimilarity between words over some alphabet in terms of the number of elementary "edit" operations required to. turn one word into another.
Examples include
- Hamming distance between words of the same length. An edit operation consists of substitution: replacing one letter in a given position by another letter in the same position.
- Lee distance between words of the same length over the alphabet $\mathbf{Z}/m$. An edit operation consists of replacing one letter $i \pmod m$ by $i\pm 1 \pmod m$.
- Levenshtein distance between strings. An edit operation consists of deleting one character or inserting one character. A substitution can be obtained as a deletion following by an insertion, but may be considered as another elementary operation.
References
Star-semiring
$*$-semiring
A semiring with a unary operator $*$.
A Conway semiring satisfies the properties $$ (x+y)^* = (x^*y)^*x^* $$ and $$ (xy)^* = 1 + x(yx)^*y \ . $$
In a Conway semiring $C$ the $*$ operator extends to the matrix semiring over $C$.
References
Jean Berstel, Christophe Reutenauer, "Noncommutative rational series with applications", Encyclopedia of Mathematics and its Applications 137, Cambridge (2011) ISBN 978-0-521-19022-0 Zbl 1250.68007
Club
Let $\gamma$ be a limit ordinal. A subset $a \subset \gamma$ is unbounded in $\gamma$ if $\sup a = \gamma$; it is closed in $\gamma$ if it is closed in the order topology on $\gamma$: that is, for any limit $\mu < \gamma$ with $\sup(a\cap\mu) = \mu$ then $\mu \in a$. A club (closed unbounded) subset $a$ is one which is both closed and unbounded in $\gamma$.
If the cofinality $\kappa$ of $\gamma$ is greater than $\omega$, then the intersection of any family of fewer than $\kappa$ clubs is again a club.
References
- Kenneth Kunen, "Set Theory", Studies in Logic 34, College Publications (2013) ISBN 978-1-84890-050-9. Zbl 1262.03001
Weakly compact cardinal
A cardinal $\mathfrak{k}$ is weakly compact if for any 2-colouring of the edges of a complete graph on a vertex set of cardinality $\mathfrak{k}$ there is a homogeneous subgraph of cardinality $\mathfrak{k}$.
Weakly compact cardinals are inaccessible: that is, their existence is independent of ZFC.
Every weakly compact cardinal is a Mahlo cardinal.
References
- Erdős, Paul; Tarski, Alfred "On some problems involving inaccessible cardinals", Essays on the foundations of mathematics, Magnes Press (1961) pp. 50–82, MR0167422. Zbl 0212.32502
Weakly compact operator
An operator $T : X \rightarrow Y$ on Banach spaces is weakly compact if $T$ sends bounded subsets of $X$ into weakly compact subsets of $Y$.
See: Grothendieck space, Dunford–Pettis property, Dunford–Pettis operator.
Grothendieck universe
A type of set in which all of mathematics can be performed. Formally, a set $U$ with the properties:
- $x \in U$ and $y \in x$ implies $y \in U$: that is, $U$ is a transitive set;
- If $x, y \in U$ then the doubleton $\{x,y\} \in U$;
- If $x \in U$ then the power set $\mathcal{P}(x) \in U$;
- If $\{x_\alpha\}_{\alpha \in y}$ is a family of elements of $U$ indexed by $y$, and $y \in U$, then the union $\cup_{\alpha \in y} x_\alpha \in U$.
The empty set is a Grothendieck universe, as is the set $V_\omega$ of hereditarily finite sets.
The existence of a non-trivial Grothendieck universe is equivalent to a large cardinal axiom: the existence of strongly inaccessible cardinals.
References
- Thomas Streicher, "Universes in Toposes" in Crosilla, Laura and Schuster, Peter (edd.) From sets and types to topology and analysis. Towards practicable foundations for constructive mathematics. Clarendon Press (2005). pp. 78–90. ISBN 978-0-19-856651-9 Zbl 1092.03038
Hereditarily finite set
A finite set all of whose elements are finite sets.
Large cardinal axiom
Strongly inaccessible cardinal
Inaccessible cardinal
Finite type
For an algebraic structure, an alternative term for finitely generated.
A sheaf of modules $\def\cF{ {\mathcal F}}\cF$ over a sheaf of rings $\cO$ is a sheaf of finite type if it is locally generated over $\cO$ by a finite number of sections.
A Riemann surface $M$ is of finite type if it can be imbedded in a compact Riemann surface $\tilde M$ such that $\tilde M \setminus M$ consists of finitely many points. Cf. also Riemann surfaces, classification of and (the references to) Double of a Riemann surface.
Local properties of a group
In group theory, if $\mathcal{P}$ is a property of groups, then a group $G$ is said to be locally $\mathcal{P}$ if every finitely-generated subgroup of $G$ has the property $\mathcal{P}$. The term was introduced by A.G. Kurosh.
See
- Locally cyclic group
- Locally finite group
- Locally free group
- Locally nilpotent group
- Locally solvable group
The term "Locally normal group" does not fit this paradigm.
References
- B. Chandler, W. Magnus, "The History of Combinatorial Group Theory: A Case Study in the History of Ideas", Springer (2012) ISBN 1-4613-9487-2
- A.G. Kurosh, Group theory. (Теория групп) (Russian) OGIZ, Moskva-Leningrad (1944) Zbl 0061.02101
Separable partial order
A partially ordered set $(X,{<})$ is Cantor separable if no strictly increasing linearly ordered subset has a least upper bound: $$ a_1 < a_2 < \cdots < b $$ implies there exists $c$ with $$ a_1 < a_2 < \cdots < c < b \ . $$ It is duBois–Reymond separable if a strictly increasing sequence can be separated from a decreasing sequence of upper bounds: $$ a_1 < a_2 < \cdots < \cdots < b_2 < b_1 $$ implies there exists $c$ with $$ a_1 < a_2 < \cdots < c < \cdots < b_2 < b_1 \ . $$
References
- R.C. Walker, "The Stone–Čech compactification", Springer (1974)
Heron triangle
A triangle for which the lengths of the sides and the area are expressible by integers. Named after Heron (1st century A.D.), who studied triangles with side lengths $13,14,15$ and $5,12,13$, the areas of which are 84 and 30, respectively.
The Pythagorean triangles are special cases (cf. Pythagorean numbers). In this case the area is a congruent number.
The Heron formula for the area $S$ of a triangle in terms of its sides $a$, $b$ and $c$ and semi-perimeter $p=(a+b+c)/2$ is $$S=\sqrt{p(p-a)(p-b)(p-c)},$$ so Heron triangles correspond to integer solutions to $$ s^2 = p(p-a)(p-b)(p-c) \ . $$
References
Trace monoid
Let $A$ be an alphabet with an irreflexive symmetric relation $I$ called independence. The complementary relation $I = A \times A \setminus I$ is the "dependence" relation. Such an alphabet is a concurrence or dependency alphabet. The free monoid on $A$ modulo the relations $ab=ba$ when $a,b \in I$ is the trace monoid on $(A,D)$. The elements of a trace monoid are "traces" and the subsets are the "trace languages".
Trace monoids are used to model concurrency in computer languages.
References
- Diekert, Volker; Rozenberg, Grzegorz (edd) "The Book Of Traces" (World Scientific, 1995) ISBN 981-02-2058-8
Trace-class operator
An operator $T$ on a Hilbert space $H$ with complete orthonormal set $(e_n)$ for which the sum $\sum_n \langle Tx_n , x_n \rangle$ is finite. For such operators, the trace is defined to be the value of this sum. The set of trace-class operators on $H$ coincides with the set of squares of Hilbert-Schmidt operators. The trace-class operators are precisely the Schatten class for $p=1$.
References
- Retherford, J. R. "Hilbert space: Compact operators and the trace theorem" London Mathematical Society Student Texts 27. (Cambridge University Press, 1993) ISBN 0-521-42933-1. Zbl 0783.47031
Schatten class
Schatten ideal
A class of operators on a Hilbert space. Let $T$ be an operator with singular values $\sigma_n$. For $1 \le p < \infty$ we say that $T$ is in the Schatten $p$-class if the sequence $(\sigma_n)$ is in $\ell_p$: that is, if $\sum_n |\sigma_n|^p$ converges, and then the $p$-root of the value is the Schatten $p$-norm of $T$. The Schatten classes form ideals of the operator algebra.
The Schatten $2$-class is precisely the Hilbert–Schmidt operators. The Schatten $1$-class is the trace-class operators.
References
- Retherford, J. R. "Hilbert space: Compact operators and the trace theorem" London Mathematical Society Student Texts 27. (Cambridge University Press, 1993) ISBN 0-521-42933-1. Zbl 0783.47031
- Schatten, Robert. "Norm ideals of completely continuous operators" Ergebnisse der Mathematik und ihrer Grenzgebiete. Neue Folge, 27 . (Springer-Verlag, 1960) ISBN 0090.09402
Singular value
The singular values of a complex matrix $A$ are the eigenvalues of $A^*A$, or equivalently of $AA^*$. The singular value decomposition of $A$ is the expression $A=U\Sigma V$, with $U$ a unitary $(m\times n)$-matrix, $V$ a unitary $(n\times n)$-matrix and $\Sigma$ of the form $$ \Sigma = \begin{pmatrix} {\mathcal D} & 0\\ 0 & 0\end{pmatrix}, $$ where ${\mathcal D}$ is diagonal with entries the singular values $s_1,\dots,s_k$ of $A$ and $k$ the rank of $A$.
In the case of a closed operator $A$ on a Hilbert space, then $A^*$ is a positive operator and the singular values of $A$ are the spectrum of $A^*A$.
Span
Span may refer to
- Linear hull, also called linear span or span
- Span (category theory)
Span (category theory)
A diagram in a category of the form $$ \begin{array}{ccccc} & & C & & \\ & f \swarrow & & \searrow g & \\ A & & & & B \end{array} $$
Two spans with arrows $(f,g)$ and $(f',g')$ are equivalent if for all $D,p,q$ the diagrams $$ \begin{array}{ccccc} & & C & & \\ & f \swarrow & & \searrow g & \\ A & & & & B \\ & p \searrow & & \swarrow q \\ & & D & & \\ \end{array} \ \ \text{and}\ \ \begin{array}{ccccc} & & C & & \\ & f' \swarrow & & \searrow g' & \\ A & & & & B \\ & p \searrow & & \swarrow q \\ & & D & & \\ \end{array} $$ either both commute or both do not commute.
A pushout is the colimit of a span.
References
[1] | S. MacLane, "Categories for the working mathematician" , Springer (1971). ISBN 0-387-98403-8 |
Ostrowski representation
MSC 11A67
Let $[a_1,a_2,\ldots]$ be the partial quotients of an infinite continued fraction and $(c_n)$ the corresponding continuants, $c_0 = 1$, $c_1 = a_1$ and $c_{n+1} = a_{n+1} c_n + c_{n-1}$. An Ostrowski representation of $N$ is $$ N = \sum_{k=0}^n x_{k+1} c_k $$ where $0 \le x_k \le a_k$ and if $x_k = a_k$ then $x_{k-1} = 0$. Every positive integer $M$ has a unique Ostrowski representation.
When the $a_n$ are all equal to $1$, the $c_n$ are the Fibonacci numbers, and the Ostrowski representation is just the Zeckendorf representation. The addition of two $n$-digit numbers in Ostrowski representation based on a given continued fraction can be computed by three linear passes over the input.
References
- Philipp Hieronymi; Alonza Terry jun. "Ostrowski numeration systems, addition, and finite automata" Notre Dame J. Formal Logic 59 (2018) 215-232 Zbl 06870290
Dirac comb
A sum of Dirac delta-functions supported on a locally finite point set.
References
- Michael Baake, Uwe Gromm; "Aperiodic order", vol.1, Encyclopedia of Mathematics and its Applications 149 (Cambridge, 2013) ISBN 978-0-521-86991-1 Zbl 1295.37001
- Marjorie Senechal; "Quasicrystals and geometry" (Cambridge, 1995) ISBN 0-521-57541-9 Zbl 0828.52007
Delone set
Delaunay set, $(r,R)$-set
A subset $D$ of $\mathbf{R}^n$ which is both discrete: there exists $r>0$ such that the balls of radius $r$ centred on points of $D$ are disjoint; and relatively dense: there exists $R$ such that the balls of radius $R$ centred on points of $D$ cover $\mathbf{R}^n$.
See also Covering and packing.
The spectrum of a Delone set $D$ is defined as the Fourier transform $$ \hat\gamma(s) = \lim_{T\to\infty} \frac{1}{(2T)^n} \sum_{d \in D_T} \exp(-2\pi i\, d\cdot s) $$ where $D_T = D \cap [-T,T]^n$. The spectrum $\hat\gamma$ is a positive measure and has a Lebesgue decomposition into a sum of discrete and continuous measures. If the discrete measure is supported on a countably infinite set $S$, then $D$ is said to satisfy the diffraction condition.
References
- Marjorie Senechal; "Quasicrystals and geometry" (Cambridge, 1995) ISBN 0-521-57541-9 Zbl 0828.52007
SIS model
A simple model in mathematical epidemiology which reduces to the logistic equation. Assume that the population falls into two subgroups, "susceptible" ($S$) and "infected" ($I$), with susceptible members being infected at a rate proportional to the number of infected, and infected members recovering and returning to the susceptible subgroup at a constant rate. We therefore have $$ S' = - \beta S \cdot I + \alpha I $$ and $$ I' = \beta S \cdot I - \alpha I $$ Since $S+I = N$ is constant, we have $I' = r I (1- k^{-1} I)$ where $r = \beta N - \alpha$ is the growth rate and $k = r / \beta$. The basic reproduction number $R_0 = \beta N / \alpha$. If $R_0 < 1$, so that $r<0$, then $I$ decreases to zero. Otherwise we have the explicit solution $$ I(t) = \frac{ k B e^{rt} }{ 1 + B e^{rt} } $$ where $B= I(0) / (k - I(0))$ and $I(t)$ tends to $k$ as $t \to \infty$
Reference
- Maia Martcheva, "An Introduction to Mathematical Epdidemiology" Texts in Applied Mathematics 41 (Springer, 2015) ISBN 978-1-4899-7611-6 Zbl 1333.92006
Pregroup
A pregroup generalises the notion of a free group with amalgamation. A pregroup is a partially ordered monoid $(M,{\cdot},1,{\ge})$ with left and right adjoint maps $L$ and $R$ satisfying $$ x^L \cdot x \ge 1 \ge x \cdot x^L $$ $$ x \cdot x^R \ge 1 \ge x^R \cdot x $$ It follows that $x^{LR} = x = x^{RL}$
A partially ordered group is a pregroup, with both adjoint maps being group inversion.
References
- J. Stallings, "Group theory and three-dimensional manifolds" Yale Univ. Monogr. 4 (1971) Zbl 0241.57001
Freiman homomorphism
A map $\phi$ defined on a subset $A$ of an additive group $G$ to a group $H$ such that for $a_1,a_2,a_3,a-4 \in A$ $$ a_1 + a_2 = a_3 + a_4 \ \Rightarrow\ \phi(a_1) + \phi(a_2) = \phi(a_3) + \phi(a_4) $$
Clearly an affine map $x \mapsto \psi(x) + b$ with $\psi$ a group homomorphism and $b \in H$ is a Frieman homomorphism. A notable problem in additive combinatorics is to find conditions on $A$ that require every Freiman homomorphism to be affine.
More generally, a Freiman homomorphism of order $k$ satisfies the corresponding property for $k$-tuples with equal sums.
References
- Melvyn B. Nathanson, "Additive Number Theory: Inverse Problems and the Geometry of Sumsets", Graduate Texts in Mathematics 165 (Springer, 1996) ISBN 0-387-94655-1
- Terence Tao, Van H. Vu; "Additive Combinatorics", Cambridge Studies in Advanced Mathematics 105 (Cambridge University Press, 2006)ISBN 1-1394-5834-5
- David J. Grynkiewicz, "Structural Additive Theory", Developments in Mathematics 30 (Springer, 2013) ISBN 3-319-00416-6
Richard Pinch/sandbox-13. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Richard_Pinch/sandbox-13&oldid=51573