
Distance-weighted mean

From Encyclopedia of Mathematics
Revision as of 15:51, 4 December 2012 by Boris Tsirelson (talk | contribs) (→‎See also: Robust statistics)

The distance-weighted mean is a measure of central tendency and a special case of the weighted mean, in which the weighting coefficient for each data point is computed as the inverse of the sum of distances between that data point and the other data points[1]. Thus, central observations in a dataset get the highest weights, while values in the tails of a distribution are downweighted. In other words, data points close to other data points carry more weight than isolated data points.

An important property of the distance-weighted mean is that computing the weighting coefficients does not require the mean or any other parameter of the original distribution as input, because each value in the dataset is weighted in relation to the entire data array.

Calculation

The weighting coefficient for $x_i$ is proportional to the inverse of the sum of distances between $x_i$ and the other data points:

\begin{equation*}\bar{x} = \frac{ \sum_{i=1}^n w_i x_i}{\sum_{i=1}^n w_i}\;\;\;\text{where}\;\;\;w_i = \frac{k}{\sum_{j=1}^n |x_i-x_j|},\end{equation*}

and $k$ is any positive number. Because $k$ appears in every weight, it cancels in the ratio and does not change the value of the mean; it only rescales the weights, which helps avoid computational problems caused by large distances between data points (and thus large sums in the denominator). It is often convenient to set $k$ equal to $n$ (the number of data points) or to $n-1$. In the latter case, each unnormalized weighting coefficient is the inverse of the mean distance between the respective data point and the other data points.
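
The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `distance_weighted_mean`, the default `k=1.0`, and the handling of the degenerate case in which all values coincide (every sum of distances is zero, so the weights are undefined) are assumptions made here, not part of the cited definition.

```python
def distance_weighted_mean(xs, k=1.0):
    """Distance-weighted mean of a sequence of numbers.

    Each weight is k divided by the sum of absolute distances from
    that point to every point in the data (the j = i term is zero).
    """
    xs = [float(x) for x in xs]
    if not xs:
        raise ValueError("at least one data point is required")
    if k <= 0:
        raise ValueError("k must be positive")

    # Sum of distances from each point to all points in the dataset.
    dist_sums = [sum(abs(xi - xj) for xj in xs) for xi in xs]

    # Assumption: a zero sum can only occur when all values are
    # identical, in which case that common value is returned.
    if any(d == 0.0 for d in dist_sums):
        return xs[0]

    weights = [k / d for d in dist_sums]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)
```

Since $k$ multiplies both the numerator and the denominator, calling the function with different positive values of `k` should give the same result up to rounding error.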

Example

Consider a simple numerical example: a dataset of four observations, $x_1 = 5$, $x_2 = 6$, $x_3 = 8$, $x_4 = 12$ ($n = 4$). Taking $k = 1$, the weighting coefficients are:

\[ w_1 = \frac{1}{\left| {x_1-x_2} \right| + \left| {x_1-x_3} \right| + \left| {x_1-x_4} \right|} = \frac{1}{\left| {5-6} \right| + \left| {5-8} \right| + \left| {5-12} \right|} = \frac{1}{11}, \]

\[ w_2 = \frac{1}{\left| {x_2-x_1} \right| + \left| {x_2-x_3} \right| + \left| {x_2-x_4} \right|} = \frac{1}{\left| {6-5} \right| + \left| {6-8} \right| + \left| {6-12} \right|} = \frac{1}{9}, \]

\[ w_3 = \frac{1}{\left| {x_3-x_1} \right| + \left| {x_3-x_2} \right| + \left| {x_3-x_4} \right|} = \frac{1}{\left| {8-5} \right| + \left| {8-6} \right| + \left| {8-12} \right|} = \frac{1}{9}, \]

\[ w_4 = \frac{1}{\left| {x_4-x_1} \right| + \left| {x_4-x_2} \right| + \left| {x_4-x_3} \right|} = \frac{1}{\left| {12-5} \right| + \left| {12-6} \right| + \left| {12-8} \right|} = \frac{1}{17}. \]

The distance-weighted mean is:

\[ \mathrm{DWM} = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4}{w_1 + w_2 + w_3 + w_4} \approx 7.3. \]
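
Substituting the weights computed above and bringing the fractions to the common denominator $11 \cdot 9 \cdot 17 = 1683$ makes the arithmetic explicit:

\[ \mathrm{DWM} = \frac{\frac{5}{11} + \frac{6}{9} + \frac{8}{9} + \frac{12}{17}}{\frac{1}{11} + \frac{1}{9} + \frac{1}{9} + \frac{1}{17}} = \frac{(765 + 2618 + 1188)/1683}{(153 + 374 + 99)/1683} = \frac{4571}{626} \approx 7.30. \]

For comparison, the arithmetic mean of the same data is $(5 + 6 + 8 + 12)/4 = 7.75$, so the distance-weighted mean lies closer to the cluster of values 5, 6, 8 than to the isolated value 12.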

Comparison to other measures of central tendency

The distance-weighted mean is less sensitive to outliers than the arithmetic mean and many other measures of central tendency. It can be regarded as an alternative to the trimmed mean and the Winsorized mean. Its main advantage is that it does not require a definite decision about which values, if any, must be deleted as outliers. This matters in empirical studies where no data point can be identified as an outlier with confidence.
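
The reduced sensitivity to outliers can be illustrated with a small experiment. The sketch below is only an illustration under stated assumptions: the helper `distance_weighted_mean` (with $k = 1$) and the contaminated dataset, in which the value 12 from the example above is replaced by 100, are hypothetical choices made here and do not come from the cited source.

```python
from statistics import mean, median


def distance_weighted_mean(xs):
    """Distance-weighted mean with k = 1 (k cancels in the ratio)."""
    sums = [sum(abs(xi - xj) for xj in xs) for xi in xs]
    # Assumption: if all values coincide, return the common value.
    if any(s == 0 for s in sums):
        return float(xs[0])
    weights = [1.0 / s for s in sums]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


clean = [5, 6, 8, 12]          # dataset from the Example section
contaminated = [5, 6, 8, 100]  # hypothetical: 12 replaced by an extreme value

# Print the arithmetic mean, the median and the distance-weighted mean.
for name, data in [("clean", clean), ("contaminated", contaminated)]:
    print(name, mean(data), median(data), round(distance_weighted_mean(data), 2))
```

On these data the arithmetic mean moves from 7.75 to 29.75, while the distance-weighted mean moves from about 7.30 to about 16.07. The extreme value still pulls the estimate, but by less than half as much as it pulls the arithmetic mean.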


See also

Robust statistics

Distance-weighted standard deviation

Distance-weighted standard score

References

  1. Yury S. Dodonov, Yulia A. Dodonova. Robust measures of central tendency: weighting as a possible alternative to trimming in response-time data analysis. Psikhologicheskie Issledovaniya, 5(19).
How to Cite This Entry:
Distance-weighted mean. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Distance-weighted_mean&oldid=29077