Difference between revisions of "Distance-weighted mean"
Revision as of 06:47, 4 December 2012
The distance-weighted mean is a measure of central tendency, a special case of the weighted mean in which the weighting coefficient for each data point is computed as the inverse of the sum of distances between this data point and the other data points[1]. Thus, central observations in a dataset get the highest weights, while values in the tails of a distribution are downweighted. In other words, data points close to other data points carry more weight than isolated data points.
An important property of the distance-weighted mean is that computing the weighting coefficients does not require the mean or any other parameter of the original distribution as input, because each value in the dataset is weighted in relation to the entire data array.
Calculation
The weighting coefficient for xi is inversely proportional to the sum of distances between xi and the other data points:
\begin{equation*}\bar{x} = \frac{ \sum_{i=1}^n w_i x_i}{\sum_{i=1}^n w_i}\;\;\;\text{where}\;\;\;w_i = \frac{k}{\sum_{j=1}^n |x_i-x_j|}.\end{equation*}
Here k is any positive number. The coefficient k is used to avoid computational problems caused by large magnitudes of distances between data points (and thus large sums in the denominator). Because k appears in every weight, it cancels in the ratio and does not change the value of the mean. In most cases, it is useful to set k equal to n (the number of data points) or to n - 1. In the latter case, each nonstandardized weighting coefficient is the inverse of the mean distance between the respective data point and the other data points.
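The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `distance_weighted_mean`, the sample values, and the handling of a dataset in which all values coincide (where every sum of distances is zero) are assumptions that are not part of the definition.

```python
def distance_weighted_mean(xs, k=1.0):
    """Distance-weighted mean with w_i = k / sum_j |x_i - x_j|."""
    xs = list(xs)
    if not xs:
        raise ValueError("need at least one data point")
    if k <= 0:
        raise ValueError("k must be positive")
    # Sum of absolute distances from each point to every point;
    # the j = i term contributes zero.
    dist_sums = [sum(abs(xi - xj) for xj in xs) for xi in xs]
    # Assumption (not from the article): if all values coincide, every
    # distance sum is zero, so the common value is returned directly.
    if all(d == 0 for d in dist_sums):
        return xs[0]
    weights = [k / d for d in dist_sums]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


sample = [1.0, 2.0, 4.0, 20.0]  # illustrative values only
n = len(sample)
dwm_k1 = distance_weighted_mean(sample)             # k = 1
dwm_kn = distance_weighted_mean(sample, k=n)        # k = n
dwm_kn1 = distance_weighted_mean(sample, k=n - 1)   # k = n - 1
print(dwm_k1, dwm_kn, dwm_kn1)
```

The three printed values should agree up to floating-point rounding, since k scales the numerator and the denominator by the same factor.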
Example
Consider a simple numerical example of a dataset consisting of four observations: x1 = 2, x2 = 3, x3 = 5, x4 = 12 (n = 4). With k = 1, the weighting coefficients for xi are:
\[ w_1 = \frac{1}{\left| {x_1-x_2} \right| + \left| {x_1-x_3} \right| + \left| {x_1-x_4} \right|} = \frac{1}{\left| {2-3} \right| + \left| {2-5} \right| + \left| {2-12} \right|} = \frac{1}{14}, \]
\[ w_2 = \frac{1}{\left| {x_2-x_1} \right| + \left| {x_2-x_3} \right| + \left| {x_2-x_4} \right|} = \frac{1}{\left| {3-2} \right| + \left| {3-5} \right| + \left| {3-12} \right|} = \frac{1}{12}, \]
\[ w_3 = \frac{1}{\left| {x_3-x_1} \right| + \left| {x_3-x_2} \right| + \left| {x_3-x_4} \right|} = \frac{1}{\left| {5-2} \right| + \left| {5-3} \right| + \left| {5-12} \right|} = \frac{1}{12}, \]
\[ w_4 = \frac{1}{\left| {x_4-x_1} \right| + \left| {x_4-x_2} \right| + \left| {x_4-x_3} \right|} = \frac{1}{\left| {12-2} \right| + \left| {12-3} \right| + \left| {12-5} \right|} = \frac{1}{26}. \]
The distance-weighted mean is:
\[ \mathrm{DWM} = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4}{w_1 + w_2 + w_3 + w_4} = \frac{694}{151} \approx 4.6. \]
For comparison, the arithmetic mean of the same data is 5.5.
Comparison to other measures of central tendency
The distance-weighted mean is less sensitive to outliers than the arithmetic mean and many other measures of central tendency. It can be regarded as an alternative to the trimmed mean and the Winsorized mean. The main advantage of the distance-weighted estimator is that it does not require a definite decision on whether some values must be deleted as outliers, which is extremely important in empirical studies where no data point can be identified as an outlier with confidence.
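The behaviour described here can be explored with a small experiment. The sketch below is only an illustration under stated assumptions: the data are hypothetical, the helper names are invented for this example, k is fixed at 1, and the trimmed mean simply drops the single smallest and single largest value. It moves one extreme value further from the rest of the sample and records how each estimate responds.

```python
from statistics import mean


def distance_weighted_mean(xs):
    """Weighted mean with w_i = 1 / sum_j |x_i - x_j| (k = 1)."""
    dist_sums = [sum(abs(xi - xj) for xj in xs) for xi in xs]
    # Assumption: if all values coincide, return the common value.
    if all(d == 0 for d in dist_sums):
        return xs[0]
    weights = [1.0 / d for d in dist_sums]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


def trimmed_mean(xs):
    """Mean after dropping the single smallest and single largest value."""
    ordered = sorted(xs)
    return mean(ordered[1:-1])


base = [4.0, 5.0, 6.0, 7.0]  # hypothetical core of the sample
rows = []
for extreme in (8.0, 20.0, 100.0, 1000.0):
    data = base + [extreme]
    rows.append((extreme, mean(data), trimmed_mean(data),
                 distance_weighted_mean(data)))

for extreme, am, tm, dwm in rows:
    print(f"largest value {extreme:7.1f}: "
          f"mean={am:8.3f}  trimmed={tm:6.3f}  DWM={dwm:7.3f}")
```

In this setup the trimmed mean discards the extreme value outright, whereas the distance-weighted mean keeps every observation and still moves with the extreme value, only more slowly than the arithmetic mean does.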
See also
Distance-weighted standard deviation
Distance-weighted standard score
References
[1] Yury S. Dodonov & Yulia A. Dodonova, "Robust measures of central tendency: weighting as a possible alternative to trimming in response-time data analysis".
Distance-weighted mean. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Distance-weighted_mean&oldid=29070