# Multi-dimensional statistical analysis

multivariate statistical analysis

The branch of mathematical statistics devoted to mathematical methods for constructing optimal designs for the collection, systematization and processing of multivariate statistical data, directed towards clarifying the nature and the structure of the correlations between the components of the multivariate attribute in question, and intended for obtaining scientific and practical inferences. By a multivariate attribute is meant a $p$-dimensional vector $\mathbf x = ( x _ {1}, \dots, x _ {p} ) ^ \prime$ of components (features, variables) $x _ {1}, \dots, x _ {p}$. Each component may be quantitative, that is, measuring on some fixed scale the degree of manifestation of the studied property of an object; ordinal, that is, allowing the objects being analyzed to be ordered by the degree to which they manifest the studied property; or classifying (nominal), that is, allowing the collection of objects being investigated, which does not lend itself to ordering, to be separated into classes that are homogeneous relative to the analyzed property. The results of measuring these components,

$$\tag{1 } \{ \mathbf x _ {\cdot i } \} _ {1} ^ {n} = \ \{ ( x _ {1i} \dots x _ {p i } ) ^ \prime \} _ {1} ^ {n}$$

for each of $n$ objects of a collection, form a sequence of multivariate observations, or an initial ensemble of multivariate data, for conducting a multivariate statistical analysis. A significant part of multivariate statistical analysis involves the situation in which $\mathbf x$ is interpreted as a multivariate random variable, and the corresponding sequence of observations (1) is a sample from a population. In this case the choice of a method for processing the initial statistical data and the analysis of their properties is carried out on the basis of assumptions regarding the nature of the multivariate (joint) probability distribution law ${\mathsf P} ( \mathbf x )$.

The content of multivariate statistical analysis can be conventionally divided into three basic subdivisions: the multivariate statistical analysis of multivariate distributions and their basic characteristics; the multivariate statistical analysis of the nature and structure of the correlations between the components of the multivariate attribute being investigated; and the multivariate statistical analysis of the geometric structure of the set of multi-dimensional observations being investigated.

## Multivariate statistical analysis of multivariate distributions and their fundamental characteristics.

This branch covers only situations in which the observations (1) being processed have a probabilistic nature, that is, can be interpreted as a sample from a corresponding population. The basic problems of this branch are: the statistical estimation, for the multivariate distributions in question, of their fundamental numerical characteristics and parameters; the investigation of the properties of the statistical estimators used; and the investigation of the probability distributions of a number of statistics that are used to construct statistical tests for the verification of various hypotheses on the nature of the multi-dimensional data being analyzed. The fundamental results are related to the particular case when the attribute in question, $\mathbf x$, is subject to a multivariate normal law $N _ {p} ( \pmb\mu , \mathbf V )$, with density function $f ( \mathbf x \mid \pmb\mu , \mathbf V )$ given by

$$\tag{2} f ( \mathbf x \mid \pmb\mu , \mathbf V ) = \frac{1}{( 2 \pi ) ^ {p/2} | \mathbf V | ^ {1/2} } \exp \left \{ - \frac{1}{2} ( \mathbf x - \pmb\mu ) ^ \prime \mathbf V ^ {-1} ( \mathbf x - \pmb\mu ) \right \} ,$$

where $\pmb\mu = ( \mu _ {1}, \dots, \mu _ {p} ) ^ \prime$ is the vector of mathematical expectations (cf. Mathematical expectation) of the components of $\mathbf x$, that is, $\mu _ {i} = {\mathsf E} x _ {i}$, $i = 1, \dots, p$, and $\mathbf V = \| v _ {ij} \| _ {i , j = 1 } ^ {p}$ is the covariance matrix of $\mathbf x$, that is, $v _ {ij} = {\mathsf E} ( x _ {i} - \mu _ {i} ) ( x _ {j} - \mu _ {j} )$ is the covariance of the $i$-th and $j$-th components of $\mathbf x$ (the non-degenerate case $\mathop{\rm rank} \mathbf V = p$ is considered; in case $\mathop{\rm rank} \mathbf V = p ^ \prime < p$, all the results remain true, but in a subspace of smaller dimension $p ^ \prime$ on which the probability distribution of $\mathbf x$ is concentrated).
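As an illustration, the density (2) can be evaluated numerically. The following is a minimal sketch in Python with NumPy, assuming a non-degenerate $\mathbf V$; the function name `mvn_density` is hypothetical and chosen only for this example.

```python
import numpy as np

def mvn_density(x, mu, V):
    """Evaluate the N_p(mu, V) density of formula (2) at the point x.

    Assumes V is symmetric positive definite (the non-degenerate case).
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    V = np.asarray(V, dtype=float)
    p = mu.shape[0]
    d = x - mu
    # Quadratic form (x - mu)' V^{-1} (x - mu), computed by solving V z = d
    # rather than forming the inverse explicitly.
    quad = d @ np.linalg.solve(V, d)
    norm = (2.0 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(V))
    return np.exp(-0.5 * quad) / norm
```

For a diagonal $\mathbf V$ the value returned coincides with the product of the corresponding univariate normal densities, which gives a simple check of the implementation.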

Thus, if (1) is a sequence of independent observations, forming a random sample from $N _ {p} ( \pmb\mu , \mathbf V )$, then the maximum-likelihood estimators for the parameters $\pmb\mu$ and $\mathbf V$ in (2) are, respectively, the statistics (see [1], [2])

$$\tag{3} \widehat{\pmb\mu} = \frac{1}{n} \sum _ {i=1} ^ {n} \mathbf x _ {\cdot i }$$

and

$$\tag{4} \widehat{\mathbf V} = \frac{1}{n} \sum _ {i=1} ^ {n} ( \mathbf x _ {\cdot i } - \widehat{\pmb\mu} ) ( \mathbf x _ {\cdot i } - \widehat{\pmb\mu} ) ^ \prime ,$$

where the random vector $\widehat{\pmb\mu}$ is subject to the $p$-dimensional normal law $N _ {p} ( \pmb\mu , \mathbf V / n )$ and is statistically independent of $\widehat{\mathbf V}$, and the joint distribution of the elements of the matrix $\widehat{\mathbf Q} = n \widehat{\mathbf V}$ is described by the so-called Wishart distribution (see [4]) with density

$$w ( \widehat{\mathbf Q} \mid \mathbf V ; n ) = \frac{| \widehat{\mathbf Q} | ^ {( n - p - 2 ) / 2 } \exp \{ - \mathop{\rm tr} ( \mathbf V ^ {-1} \widehat{\mathbf Q} ) / 2 \} }{2 ^ {( n - 1 ) p / 2 } \pi ^ {p ( p - 1 ) / 4 } | \mathbf V | ^ {( n - 1 ) / 2 } \prod _ {j=1} ^ {p} \Gamma ( ( n - j ) / 2 ) }$$

if $\widehat{\mathbf Q}$ is positive definite, and 0 otherwise.
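The estimators (3) and (4) and the matrix $\widehat{\mathbf Q}$ are straightforward to compute. The sketch below assumes that the sample (1) is stored as a $p \times n$ array whose $i$-th column is $\mathbf x _ {\cdot i}$; this storage convention and the name `mle_mean_cov` are assumptions made for the example.

```python
import numpy as np

def mle_mean_cov(X):
    """Maximum-likelihood estimators for a p x n data matrix X.

    Column i of X is the observation x_{.i}.
    Returns (mu_hat, V_hat, Q_hat) with Q_hat = n * V_hat.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    mu_hat = X.mean(axis=1)            # formula (3)
    D = X - mu_hat[:, None]            # centred observations
    Q_hat = D @ D.T                    # sum of outer products
    V_hat = Q_hat / n                  # formula (4), divisor n (not n - 1)
    return mu_hat, V_hat, Q_hat
```

Note the divisor $n$ in $\widehat{\mathbf V}$: this is the maximum-likelihood estimator, which corresponds to `np.cov(X, bias=True)` in NumPy.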

Within this scheme, the distribution and moments of sampling characteristics of multivariate random variables such as the coefficients of paired, partial and multiple correlations, the generalized variance (i.e., the statistic $| \widehat{\mathbf V} |$) and the generalized Hotelling $T ^ {2}$-statistic (cf. Hotelling $T ^ {2}$-distribution and [5]) have been investigated. In particular (see [1]), if the sample covariance matrix $\mathbf S _ {n}$ is defined as the estimator $\widehat{\mathbf V}$ made "unbiased", namely:

$$\tag{5} \mathbf S _ {n} = \frac{n}{n-1} \widehat{\mathbf V} ,$$

then the distribution of $\sqrt n ( | \mathbf S _ {n} | / | \mathbf V | - 1 )$ tends to $N _ {1} ( 0 , 2 p )$ as $n \rightarrow \infty$, and the random variables

$$\tag{6} \frac{n - p }{p ( n - 1 ) } T ^ {2} = \frac{n - p }{p ( n - 1 ) } n ( \widehat{\pmb\mu} - \pmb\mu ) ^ \prime \mathbf S _ {n} ^ {-1} ( \widehat{\pmb\mu} - \pmb\mu )$$

and

$$\tag{7} \frac{n _ {1} + n _ {2} - p - 1 }{( n _ {1} + n _ {2} - 2 ) p } \widetilde{T} ^ {2} = \frac{n _ {1} + n _ {2} - p - 1 }{( n _ {1} + n _ {2} - 2 ) p } \frac{n _ {1} n _ {2} }{n _ {1} + n _ {2} } ( \widehat{\pmb\mu} _ {n _ {1} } - \widehat{\pmb\mu} _ {n _ {2} } ) ^ \prime \mathbf S _ {n _ {1} + n _ {2} } ^ {-1} ( \widehat{\pmb\mu} _ {n _ {1} } - \widehat{\pmb\mu} _ {n _ {2} } )$$

have the Fisher $F$-distribution with degrees of freedom $( p , n - p )$ and $( p , n _ {1} + n _ {2} - p - 1 )$, respectively. In (7), $n _ {1}$ and $n _ {2}$ are the sizes of two independent samples of the form (1) taken from the same population $N _ {p} ( \pmb\mu , \mathbf V )$, $\widehat{\pmb\mu} _ {n _ {i} }$ and $\mathbf S _ {n _ {i} }$ being estimators of the form (3) and (4)–(5), constructed with respect to the $i$-th sample, and

$$\mathbf S _ {n _ {1} + n _ {2} } = \ \frac{1}{n _ {1} + n _ {2} - 2 } [ ( n _ {1} - 1 ) \mathbf S _ {n _ {1} } + ( n _ {2} - 1 ) \mathbf S _ {n _ {2} } ]$$

is the common sample covariance matrix constructed with respect to the estimators $\mathbf S _ {n _ {1} }$ and $\mathbf S _ {n _ {2} }$.
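The scaled statistics (6) and (7) can be computed directly from the data. The following sketch assumes $p \times n$ NumPy arrays with observations in columns; the function names are hypothetical. Comparing the returned values with quantiles of the $F$-distribution is left to the reader's preferred statistical library.

```python
import numpy as np

def sample_cov(X):
    """Unbiased sample covariance matrix S_n of (5) for a p x n array X."""
    D = X - X.mean(axis=1, keepdims=True)
    return D @ D.T / (X.shape[1] - 1)

def one_sample_F(X, mu0):
    """Left-hand side of (6) with mu replaced by the hypothesized value mu0."""
    p, n = X.shape
    d = X.mean(axis=1) - mu0
    T2 = n * (d @ np.linalg.solve(sample_cov(X), d))
    return (n - p) / (p * (n - 1)) * T2

def two_sample_F(X1, X2):
    """Left-hand side of (7), using the pooled covariance matrix."""
    p, n1 = X1.shape
    n2 = X2.shape[1]
    S = ((n1 - 1) * sample_cov(X1) + (n2 - 1) * sample_cov(X2)) / (n1 + n2 - 2)
    d = X1.mean(axis=1) - X2.mean(axis=1)
    T2 = n1 * n2 / (n1 + n2) * (d @ np.linalg.solve(S, d))
    return (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2
```

For $p = 1$ both quantities reduce to the squares of the familiar one-sample and two-sample Student $t$-statistics.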

## Multivariate statistical analysis of the nature and structure of correlations between the components of the multivariate attribute in question.

This branch unifies the ideas and results used in such methods and models of multivariate statistical analysis as multiple regression; multivariate analysis of variance (dispersion analysis) and covariance analysis; factor analysis; the method of principal components; and the analysis of canonical correlations. The results of this branch may be conventionally divided into two basic types.

1) The construction of best (in a specified sense) statistical estimators for the parameters of these models and the analysis of their properties (more precisely, and in a probabilistic formulation, of their distribution laws, confidence regions, etc.). Thus, let the multivariate attribute $\mathbf x$ be interpreted as a vector-valued random variable subject to the $p$-dimensional normal distribution $N _ {p} ( \pmb\mu , \mathbf V )$, and let it be partitioned into two subvectors $\mathbf x ^ {(1)}$ and $\mathbf x ^ {(2)}$ of dimensions $q$ and $p - q$, respectively. This defines a corresponding partition of the expectation vector $\pmb\mu$ and of the theoretical and sample covariance matrices $\mathbf V$ and $\widehat{\mathbf V}$, namely:

$$\pmb\mu = \left ( \begin{array}{c} \pmb\mu ^ {(1)} \\ \pmb\mu ^ {(2)} \end{array} \right ) ,\ \ \mathbf V = \left ( \begin{array}{cc} \mathbf V _ {11} &\mathbf V _ {12} \\ \mathbf V _ {21} &\mathbf V _ {22} \\ \end{array} \right ) \ \textrm{ and } \ \ \widehat{\mathbf V} = \left ( \begin{array}{cc} \widehat{\mathbf V} _ {11} &\widehat{\mathbf V} _ {12} \\ \widehat{\mathbf V} _ {21} &\widehat{\mathbf V} _ {22} \\ \end{array} \right ) .$$

Then (see [1], [2]) the conditional distribution of the subvector $\mathbf x ^ {(1)}$ (under the condition that the second subvector $\mathbf x ^ {(2)}$ takes a fixed value) will also be normal, $N _ {q} ( \pmb\mu ^ {(1)} + \mathbf B ( \mathbf x ^ {(2)} - \pmb\mu ^ {(2)} ) , \pmb\Sigma )$. Here the maximum-likelihood estimators $\widehat{\mathbf B}$ and $\widehat{\pmb\Sigma}$ of the matrices of regression coefficients $\mathbf B$ and covariances $\pmb\Sigma$, in this classical multivariate model of multiple regression

$$\tag{8} {\mathsf E} ( \mathbf x ^ {(1)} \mid \mathbf x ^ {(2)} ) = \pmb\mu ^ {(1)} + \mathbf B ( \mathbf x ^ {(2)} - \pmb\mu ^ {(2)} ) ,$$

will be the mutually independent statistics

$$\widehat{\mathbf B} = \widehat{\mathbf V} _ {12} \widehat{\mathbf V} _ {22} ^ {-1} \ \ \textrm{ and } \ \ \widehat{\pmb\Sigma} = \widehat{\mathbf V} _ {11} - \widehat{\mathbf V} _ {12} \widehat{\mathbf V} _ {22} ^ {-1} \widehat{\mathbf V} _ {21} ,$$

respectively. Here the distribution of $\widehat{\mathbf B}$ is the normal law $N _ {q ( p - q ) } ( \mathbf B , \mathbf V _ {\mathbf B } )$, and $n \widehat{\pmb\Sigma}$ has the Wishart distribution with parameters $\pmb\Sigma$ and $n - ( p - q )$ (the elements of $\mathbf V _ {\mathbf B }$ are given in terms of the elements of $\mathbf V$).
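The estimators $\widehat{\mathbf B}$ and $\widehat{\pmb\Sigma}$ can be obtained from the blocks of the sample covariance matrix. The sketch below assumes a $p \times n$ data array whose first $q$ rows hold $\mathbf x ^ {(1)}$ and whose remaining rows hold $\mathbf x ^ {(2)}$; the function name `regression_mle` and this row ordering are assumptions of the example.

```python
import numpy as np

def regression_mle(X, q):
    """Estimate B and Sigma in model (8) from a p x n data array X.

    Rows 0..q-1 of X form x^(1); rows q..p-1 form x^(2).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    D = X - X.mean(axis=1, keepdims=True)
    V = D @ D.T / n                       # V_hat of formula (4)
    V11 = V[:q, :q]
    V21, V22 = V[q:, :q], V[q:, q:]
    # B_hat = V12 V22^{-1}. Since V22 is symmetric and V12' = V21,
    # B_hat' solves V22 B' = V21, which avoids an explicit inverse.
    B_hat = np.linalg.solve(V22, V21).T   # shape q x (p - q)
    Sigma_hat = V11 - B_hat @ V21         # V11 - V12 V22^{-1} V21
    return B_hat, Sigma_hat
```

When $\mathbf x ^ {(1)}$ is an exact linear function of $\mathbf x ^ {(2)}$ in the sample, $\widehat{\mathbf B}$ recovers the slope and $\widehat{\pmb\Sigma}$ vanishes.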

The basic results on the construction of estimators of parameters and on the investigation of their properties in models of factor analysis, of principal components and of canonical correlations are related to the analysis of the probabilistic-statistical properties of the eigenvalues (characteristic values) and eigenvectors of the various covariance matrices.

In schemes not falling within the limits of the classical normal model or even within the limits of any probabilistic model, the basic results are concerned with the construction of algorithms (and the investigation of their properties) for calculating estimators of parameters which are best from the point of view of some exogenously given functional of the quality (or adequacy) of the model.

2) The construction of statistical tests for the verification of various hypotheses on the structure of the correlations being investigated. Within the limits of a multivariate normal model (sequences of observations of the form (1) are interpreted as random samples from the corresponding multivariate normal population) statistical tests have been constructed for testing, for example, the following hypotheses.

I) The hypothesis $\pmb\mu = \pmb\mu ^ {*}$, i.e., that the expectation of the variables studied is equal to a specific vector $\pmb\mu ^ {*}$; this is tested via the Hotelling $T ^ {2}$-statistic by substituting $\pmb\mu = \pmb\mu ^ {*}$ in (6).

II) The hypothesis $\pmb\mu ^ {(1)} = \pmb\mu ^ {(2)}$ of equality of the expectation vectors in two populations (with identical but unknown covariance matrices), based on two samples; this is tested via the statistic $\widetilde{T} ^ {2}$ of (7) (see [7]).

III) The hypothesis $\pmb\mu ^ {(1)} = \dots = \pmb\mu ^ {(k)} = \pmb\mu$ of equality of the expectation vectors in several populations (with identical but unknown covariance matrices), based on samples from them; this is tested via the statistic

$$U _ {p , k - 1 , n - k } = \frac{\left | \sum _ {j=1} ^ {k} \sum _ {i=1} ^ {n _ {j} } ( \mathbf x _ {\cdot i } ^ {(j)} - \widehat{\pmb\mu} ^ {(j)} ) ( \mathbf x _ {\cdot i } ^ {(j)} - \widehat{\pmb\mu} ^ {(j)} ) ^ \prime \right | }{\left | \sum _ {j=1} ^ {k} \sum _ {i=1} ^ {n _ {j} } ( \mathbf x _ {\cdot i } ^ {(j)} - \widehat{\pmb\mu} ) ( \mathbf x _ {\cdot i } ^ {(j)} - \widehat{\pmb\mu} ) ^ \prime \right | } ,$$

in which $\mathbf x _ {\cdot i } ^ {(j)}$ is the $i$-th $p$-dimensional observation in a sample of size $n _ {j}$, representing the $j$-th population, and $\widehat{\pmb\mu} ^ {(j)}$ and $\widehat{\pmb\mu}$ are estimators of the form (3), constructed separately with respect to each of the samples and with respect to the joint sample of size $n = n _ {1} + \dots + n _ {k}$, respectively.
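The statistic $U _ {p , k-1 , n-k}$ is the ratio of the determinant of the within-sample scatter matrix to that of the total scatter matrix. A minimal sketch, assuming each sample is given as a $p \times n _ {j}$ NumPy array (the name `wilks_U` is hypothetical):

```python
import numpy as np

def wilks_U(samples):
    """Ratio |W| / |T| for a list of p x n_j arrays (columns are observations).

    W is the scatter about the individual sample means,
    T is the scatter about the mean of the joint sample.
    """
    pooled = np.hstack(samples)
    p = pooled.shape[0]
    W = np.zeros((p, p))
    for X in samples:
        D = X - X.mean(axis=1, keepdims=True)   # deviations from mu_hat^(j)
        W += D @ D.T
    G = pooled - pooled.mean(axis=1, keepdims=True)  # deviations from mu_hat
    T = G @ G.T
    return np.linalg.det(W) / np.linalg.det(T)
```

The value lies between 0 and 1 and equals 1 when all sample means coincide; values near 0 indicate that most of the total scatter is due to differences between the sample means.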

IV) The hypotheses $\pmb\mu ^ {(1)} = \dots = \pmb\mu ^ {(k)} = \pmb\mu$ and $\mathbf V _ {1} = \dots = \mathbf V _ {k} = \mathbf V$ of equivalence of several normal populations, based on samples $\{ \mathbf x _ {\cdot i } ^ {(j)} \} _ {i=1} ^ {n _ {j} }$, $j = 1, \dots, k$, from them; this is tested via the statistic

$$\lambda = \frac{\prod _ {j=1} ^ {k} | n _ {j} \widehat{\mathbf V} _ {j} | ^ {( n _ {j} - 1 ) / 2 } }{\left | \sum _ {j=1} ^ {k} \sum _ {i=1} ^ {n _ {j} } ( \mathbf x _ {\cdot i } ^ {(j)} - \widehat{\pmb\mu} ) ( \mathbf x _ {\cdot i } ^ {(j)} - \widehat{\pmb\mu} ) ^ \prime \right | ^ {( n - k ) / 2 } } ,$$

in which the $\widehat{\mathbf V} _ {j}$ are estimators of the form (4) constructed separately with respect to the observations from the $j$-th sample, $j = 1, \dots, k$.

V) The hypothesis of mutual independence of the subvectors $\mathbf x ^ {(1)}, \dots, \mathbf x ^ {(m)}$ of dimensions $p _ {1}, \dots, p _ {m}$, respectively, into which the initial $p$-dimensional vector $\mathbf x$ has been partitioned, $p _ {1} + \dots + p _ {m} = p$; this is tested via the statistic

$$\pmb\psi = \frac{| n \widehat{\mathbf V} | }{\prod _ {i=1} ^ {m} | n _ {i} \widehat{\mathbf V} _ {i} | } ,$$

in which $\widehat{\mathbf V}$ and $\widehat{\mathbf V} _ {i}$ are sample covariance matrices of the form (4) for the vector $\mathbf x$ and its subvectors $\mathbf x ^ {(i)}$, respectively.
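The independence statistic of hypothesis V can be sketched as follows. The code assumes that all subvectors are observed on the same $n$ objects, so that the scalar factors in the numerator and denominator cancel and the statistic reduces to $| \widehat{\mathbf V} | / \prod _ {i} | \widehat{\mathbf V} _ {i} |$; this reading, the contiguous ordering of the subvectors in the rows, and the name `independence_psi` are assumptions of the example.

```python
import numpy as np

def independence_psi(X, sizes):
    """|V_hat| divided by the product of the determinants of its diagonal blocks.

    X is a p x n data array; sizes = (p_1, ..., p_m) with sum p gives the
    dimensions of the consecutive subvectors of x.
    Assumes every subvector is observed on the same n objects.
    """
    X = np.asarray(X, dtype=float)
    D = X - X.mean(axis=1, keepdims=True)
    V = D @ D.T / X.shape[1]              # V_hat of formula (4)
    psi = np.linalg.det(V)
    start = 0
    for p_i in sizes:
        block = V[start:start + p_i, start:start + p_i]   # V_hat_i
        psi /= np.linalg.det(block)
        start += p_i
    return psi
```

For $p = 2$ and $p _ {1} = p _ {2} = 1$ the value is $1 - r ^ {2}$, where $r$ is the sample correlation coefficient, so it equals 1 for uncorrelated components and 0 for perfectly correlated ones.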

## Multivariate statistical analysis of the geometric structure of the set of multi-dimensional observations being investigated.

This branch unifies notions and results of models and schemes such as discriminant analysis, mixtures of probability distributions, cluster analysis, taxonomy, and multi-dimensional scaling. The key in all of these schemes is a notion of distance (measure of proximity, measure of similarity) between the elements being analyzed. Here the elements being analyzed may be either the real objects, for each of which the values of the components of $\mathbf x$ are recorded (then, in the geometrical representation, the $i$-th object is a point $\mathbf x _ {\cdot i } = ( x _ {1i}, \dots, x _ {p i } ) ^ \prime$ in the corresponding $p$-dimensional space), or the variables $\mathbf x _ {l \cdot }$, $l = 1, \dots, p$, themselves (then the $l$-th variable is a point $\mathbf x _ {l \cdot } = ( x _ {l1}, \dots, x _ {ln} )$ in the corresponding $n$-dimensional space).

The methods and results of discriminant analysis (see [1], [2], [7]) are directed to the solution of the following problem. Suppose that a specific number $k \geq 2$ of populations is known to exist and that a sample from each of them (a "training sample") is available. It is required to construct, on the basis of the training samples, the best (in a specified sense) classification rule, which allows one to assign a new element (an observation $\mathbf x$) to its population when the investigator does not know in advance to which population the element belongs. Usually, a classification rule means a sequence of actions: the calculation of a scalar function of the variables in question, based on which a decision is taken on assigning the element to one of the classes (the construction of a discriminant function); an ordering of the variables themselves according to their degree of informativeness from the point of view of a proper assignment of elements to classes; and a calculation of the corresponding probabilities of the errors in the classification.
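One classical instance of such a rule, for $k = 2$, is the linear discriminant function based on the pooled sample covariance matrix. The sketch below is only an illustration of that particular choice, assuming two normal populations with a common covariance matrix and equal prior probabilities and misclassification costs; the function names are hypothetical.

```python
import numpy as np

def fit_linear_discriminant(X1, X2):
    """Fit w and c so that x is assigned to population 1 when w'x > c.

    X1, X2 are p x n_1 and p x n_2 training samples (columns are observations).
    Assumes a common covariance matrix and equal priors and costs.
    """
    n1, n2 = X1.shape[1], X2.shape[1]
    m1, m2 = X1.mean(axis=1), X2.mean(axis=1)
    D1, D2 = X1 - m1[:, None], X2 - m2[:, None]
    S = (D1 @ D1.T + D2 @ D2.T) / (n1 + n2 - 2)   # pooled covariance
    w = np.linalg.solve(S, m1 - m2)               # discriminant direction
    c = w @ (m1 + m2) / 2                         # midpoint threshold
    return w, c

def classify(x, w, c):
    """Return 1 or 2, the index of the population to which x is assigned."""
    return 1 if w @ np.asarray(x, dtype=float) > c else 2
```

The magnitudes of the components of `w` (for standardized variables) give one rough indication of the informativeness of the individual variables; estimating the error probabilities requires further assumptions or a hold-out sample.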

The problem of analysis of a mixture of probability distributions (see [7]) most often (but not always) also arises in connection with the investigation of the "geometric structure" of some population. Here the idea of the $r$-th homogeneous class is formalized with the help of a population described by some (as a rule, unimodal) distribution law ${\mathsf P} ( \mathbf x \mid \pmb\theta _ {r} )$, so that the distribution of the general population from which the sample (1) is extracted is described by a mixture of distributions of the form

$${\mathsf P} ( \mathbf x ) = \sum _ {r=1} ^ {k} \pi _ {r} {\mathsf P} ( \mathbf x \mid \pmb\theta _ {r} ) ,$$

where $\pi _ {r}$ is the a priori probability (the specific weight of the elements) of the $r$-th class in the general population. The problem is to give a "good" statistical estimation (with respect to a sample $\{ \mathbf x _ {\cdot i } \} _ {1} ^ {n}$) of the unknown parameters $\pmb\theta _ {r}$, $\pi _ {r}$, and sometimes even $k$. This, in particular, allows one to reduce the problem of the classification of the elements to a scheme of discriminant analysis, although in this case training samples are absent.
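Once the parameters have been estimated, the reduction to classification amounts to computing the a posteriori probabilities $\pi _ {r} {\mathsf P} ( \mathbf x \mid \pmb\theta _ {r} ) / {\mathsf P} ( \mathbf x )$ of the classes. The sketch below illustrates this for the simplest case of univariate normal components, which is an assumption of the example (the source does not fix the form of the components); the estimation of the parameters themselves is not shown.

```python
import numpy as np

def posterior_class_probs(x, weights, means, variances):
    """A posteriori class probabilities for a univariate normal mixture.

    weights[r] = pi_r, and component r is N(means[r], variances[r]).
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    dens = np.exp(-0.5 * (x - means) ** 2 / variances) / np.sqrt(2 * np.pi * variances)
    joint = weights * dens          # pi_r * P(x | theta_r)
    return joint / joint.sum()      # joint.sum() is the mixture density P(x)
```

Assigning $x$ to the class with the largest a posteriori probability is then the analogue of a discriminant rule constructed without training samples.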

The methods and results of cluster analysis (classification, taxonomy, pattern recognition "without a teacher", see [2], [6], [7]) are directed to the solution of the following problem. The geometric structure of the set of elements to be analyzed is given either by the coordinates of the corresponding points (that is, by the matrix $\| x _ {ij} \|$, $i = 1, \dots, p$, $j = 1, \dots, n$), or by geometric characteristics of their mutual disposition, for example, by the matrix of pairwise distances $\| \rho _ {ij} \| _ {i , j = 1 } ^ {n}$. It is required to partition the set of elements being investigated into a comparatively small (known in advance or not) number of classes, so that the elements of a class are at a small distance from each other, while different classes are, as far as possible, sufficiently far from each other and cannot themselves be partitioned into subsets equally far from each other.
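Many concrete algorithms fit this general formulation. As one illustration, the sketch below implements the widely used $k$-means iteration (alternating assignment to the nearest centre and recomputation of the centres) for data given by coordinates. The choice of $k$-means, the Euclidean distance, and the requirement that the caller supply the initial centres are all assumptions of the example rather than something prescribed by the source.

```python
import numpy as np

def k_means(X, centres, n_iter=100):
    """Partition the columns of the p x n array X into k classes.

    centres is a p x k array of initial class centres supplied by the caller.
    Returns (labels, centres), where labels[j] is the class index of column j.
    """
    X = np.asarray(X, dtype=float)
    centres = np.array(centres, dtype=float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centre (n x k).
        d2 = ((X[:, :, None] - centres[:, None, :]) ** 2).sum(axis=0)
        labels = d2.argmin(axis=1)
        new_centres = centres.copy()
        for r in range(centres.shape[1]):
            members = labels == r
            if members.any():               # keep the old centre of an empty class
                new_centres[:, r] = X[:, members].mean(axis=1)
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres
```

The result depends on the initial centres and on the number of classes, which reflects the remark above that this number may or may not be known in advance.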

The problem of multi-dimensional scaling (see [6]) arises when the set of elements being investigated is given via a matrix of mutual distances $\| \rho _ {ij} \| _ {i , j = 1 } ^ {n}$; it consists of attributing to each of the elements a given number $p$ of coordinates so that the structure of the mutual distances between the elements, measured using these auxiliary coordinates, differs on the average as little as possible from the given one. It should be noted that the basic results and methods of cluster analysis and multi-dimensional scaling have usually been developed without any assumptions regarding the probabilistic nature of the initial data.
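The simplest way to obtain such coordinates is classical (metric) scaling, which double-centres the matrix of squared distances and uses its leading eigenvectors. The sketch below shows this variant; it is not the iterative nonmetric procedure of [6], and the function name `classical_mds` is hypothetical.

```python
import numpy as np

def classical_mds(rho, p):
    """Return a p x n array of coordinates reproducing the distances rho.

    rho is a symmetric n x n matrix of mutual distances.
    Uses classical (metric) scaling, not the nonmetric method of [6].
    """
    rho = np.asarray(rho, dtype=float)
    n = rho.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (rho ** 2) @ J                # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:p]             # indices of the p largest
    lam = np.clip(vals[idx], 0.0, None)          # guard against small negatives
    return (vecs[:, idx] * np.sqrt(lam)).T
```

If the given distances are exactly Euclidean distances between points of a $p$-dimensional space, the recovered configuration reproduces them exactly (up to rotation, reflection and translation); otherwise it is only an approximation.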

## The merits of multivariate statistical analysis in practice.

These consist mainly in addressing the following three problems.

### The problem of statistical investigation of dependence between the variables being analyzed.

Suppose that the set of recorded statistical variables $\mathbf x$ partitions, according to the meaning of these variables and the final aim of investigation, into a $q$-dimensional subvector $\mathbf x ^ {(1)}$ of (dependent) variables to be predicted and a $( p - q )$-dimensional subvector $\mathbf x ^ {(2)}$ of predicting (independent) variables. Then it can be said that the problem is to determine, on the basis of a sample (1), a $q$-dimensional vector-valued function $f ( \mathbf x ^ {(2)} )$ from the class of acceptable decisions $F$, which would give the best, in a specific sense, approximation of the behaviour of the subvector $\mathbf x ^ {(1)}$. Depending on the concrete form of the functional of the quality of the approximation and the nature of the variables being analyzed, one arrives at some scheme of multiple regression, variance, covariance, or confluence analysis [8].

### The problem of classifying elements.

This problem, in a general (non-rigorous) formulation, is to partition the whole set of elements (objects or variables) being analyzed, represented statistically as a matrix $\| x _ {ij} \|$, $i = 1, \dots, p$, $j = 1, \dots, n$, or a matrix $\| \rho _ {ij} \|$, $i , j = 1, \dots, n$, into a comparatively small number of homogeneous (in a specified sense) groups [7]. Depending on the nature of the a priori information and the concrete form of the functional giving the criteria for the quality of the classification, one arrives at some scheme of discriminant analysis, cluster analysis (taxonomy, pattern recognition "without a teacher"), or splitting mixtures of distributions.

### The problem of lowering the dimension of the factor space being investigated and the selection of the most informative variables.

This consists of defining a set of a comparatively small number $m \ll p$ of variables $\mathbf z = ( z _ {1}, \dots, z _ {m} ) ^ \prime$ in the class of admissible transformations $Z ( \mathbf x )$ of the initial variables $\mathbf x = ( x _ {1}, \dots, x _ {p} ) ^ \prime$ for which some exogenously given measure of informativity for an $m$-dimensional system of tests attains its least upper bound (see [7]). Concretization of the functional giving the measure of self-informativity (that is, aimed at a maximal preservation of the information contained in the statistical ensemble (1) relative to the initial attributes themselves) results, in particular, in various schemes of factor analysis and principal components, and in the method of extremal grouping of tests. Functionals giving a measure of external informativity, that is, aimed at extracting from (1) maximum information relative to certain other variables or phenomena not directly contained in $\mathbf x$, lead to different methods of selecting the most informative variables in schemes of statistical research into dependences and discriminant analysis.
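For the method of principal components, the admissible transformations are orthogonal linear ones and a common measure of self-informativity is the share of the total variance retained by $\mathbf z$. Under these assumptions (which are one particular concretization, not the only one), a minimal sketch is as follows; the name `principal_components` is hypothetical.

```python
import numpy as np

def principal_components(X, m):
    """Project the p x n data array X onto its first m principal components.

    Returns (Z, A, explained): Z is the m x n array of new variables,
    A is the p x m matrix of leading eigenvectors of V_hat, and explained
    is the share of the total variance retained by Z.
    """
    X = np.asarray(X, dtype=float)
    D = X - X.mean(axis=1, keepdims=True)
    V = D @ D.T / X.shape[1]                 # V_hat of formula (4)
    vals, vecs = np.linalg.eigh(V)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:m]       # indices of the m largest
    A = vecs[:, order]
    Z = A.T @ D                              # new variables z = A'(x - mu_hat)
    explained = vals[order].sum() / vals.sum()
    return Z, A, explained
```

The variances of the rows of `Z` are the $m$ largest eigenvalues of $\widehat{\mathbf V}$, which links this construction to the eigenvalue problems mentioned among the mathematical tools of the subject.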

## Fundamental mathematical tools in multivariate statistical analysis.

These consist of special methods of the theory of systems of linear equations and matrices (methods for solving ordinary and generalized eigenvalue and eigenvector problems; ordinary inversion and pseudo-inversion of matrices; procedures for the diagonalization of matrices; etc.) and certain optimization algorithms (methods of coordinate-wise descent, conjugate gradients, branch-and-bound, various versions of random search and stochastic approximation, etc.).

#### References

[1] T.W. Anderson, "An introduction to multivariate statistical analysis", Wiley (1958)

[2] M.G. Kendall, A. Stuart, "The advanced theory of statistics", 3, Griffin (1983)

[3] L.N. Bol'shev, Bull. Int. Stat. Inst., 43 (1969) pp. 425–441

[4] J. Wishart, Biometrika, 20A (1928) pp. 32–52

[5] H. Hotelling, "The generalization of Student's ratio", Ann. Math. Statist., 2 (1931) pp. 360–378

[6] J.B. Kruskal, "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis", Psychometrika, 29 (1964) pp. 1–27

[7] S.A. Aivazyan, V.M. Bukhshtaber, I.S. Yenyukov, L.D. Meshalkin, "Applied statistics: classification and reduction of dimensionality", Moscow (1989) (In Russian)

[8] S.A. Aivazyan, I.S. Yenyukov, L.D. Meshalkin, "Applied statistics: study of relationships", Moscow (1985) (In Russian)