Distance-weighted mean

From Encyclopedia of Mathematics
Latest revision as of 16:32, 12 July 2023

The distance-weighted mean is a measure of central tendency, a special case of the weighted mean, in which the weighting coefficient for each data point is the inverse of the sum of distances between that data point and the other data points[1]. Thus, central observations in a dataset receive the highest weights, while values in the tails of the distribution are downweighted. In other words, data points that lie close to other data points carry more weight than isolated data points.

An important property of the distance-weighted mean is that computing the weighting coefficients does not require the mean or any other parameter of the original distribution as input, because each value in the dataset is weighted in relation to the entire data array.

Calculation

The weighting coefficient for x<sub>i</sub> is proportional to the inverse of the sum of distances between x<sub>i</sub> and the other data points:

\begin{equation*}\bar{x} = \frac{ \sum_{i=1}^n w_i x_i}{\sum_{i=1}^n w_i}\;\;\;\text{where}\;\;\;w_i = \frac{k}{\sum_{j=1}^n |x_i-x_j|}.\end{equation*}

where k is any positive number. The coefficient k is used to avoid computational problems caused by large magnitudes of distances between data points (and thus large sums in the denominator). In most cases it is useful to set k equal to n (the number of data points) or to n − 1; in the latter case, each nonstandardized weighting coefficient is the inverse mean distance between the respective data point and the other data points.
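The calculation above can be sketched in plain Python (a minimal illustration; the function name `distance_weighted_mean` and the default k = n − 1 are choices made here, following the suggestion above):

```python
def distance_weighted_mean(xs, k=None):
    """Distance-weighted mean: each x_i is weighted by k divided by the
    sum of absolute distances from x_i to every other data point."""
    n = len(xs)
    if k is None:
        k = n - 1  # then each weight is the inverse *mean* distance
    weights = []
    for i, xi in enumerate(xs):
        # Sum of distances from x_i to all other points (assumed nonzero,
        # i.e. the data are not all identical).
        d = sum(abs(xi - xj) for j, xj in enumerate(xs) if j != i)
        weights.append(k / d)
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)

print(distance_weighted_mean([5, 6, 8, 12]))  # ~7.30
```

Note that the choice of k cancels in the final ratio, so it affects only the magnitude of the intermediate weights, not the resulting mean.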

Example

Consider a simple numerical example of a dataset consisting of four observations: x<sub>1</sub> = 5, x<sub>2</sub> = 6, x<sub>3</sub> = 8, x<sub>4</sub> = 12 (n = 4). The weighting coefficients (taking k = 1) for the x<sub>i</sub> are:

\[ w_1 = \frac{1}{\left| {x_1-x_2} \right| + \left| {x_1-x_3} \right| + \left| {x_1-x_4} \right|} = \frac{1}{\left| {5-6} \right| + \left| {5-8} \right| + \left| {5-12} \right|} = \frac{1}{11}, \]

\[ w_2 = \frac{1}{\left| {x_2-x_1} \right| + \left| {x_2-x_3} \right| + \left| {x_2-x_4} \right|} = \frac{1}{\left| {6-5} \right| + \left| {6-8} \right| + \left| {6-12} \right|} = \frac{1}{9}, \]

\[ w_3 = \frac{1}{\left| {x_3-x_1} \right| + \left| {x_3-x_2} \right| + \left| {x_3-x_4} \right|} = \frac{1}{\left| {8-5} \right| + \left| {8-6} \right| + \left| {8-12} \right|} = \frac{1}{9}, \]

\[ w_4 = \frac{1}{\left| {x_4-x_1} \right| + \left| {x_4-x_2} \right| + \left| {x_4-x_3} \right|} = \frac{1}{\left| {12-5} \right| + \left| {12-6} \right| + \left| {12-8} \right|} = \frac{1}{17}. \]

The distance-weighted mean is:

\[ \mathrm{DWM} = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4}{w_1 + w_2 + w_3 + w_4} \approx 7.3. \]
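The arithmetic above can be checked with exact rational arithmetic (a quick verification sketch using Python's `fractions` module; the weights use k = 1, as in the displayed formulas):

```python
from fractions import Fraction

xs = [5, 6, 8, 12]

# w_i = 1 / sum_j |x_i - x_j|  (taking k = 1; any k gives the same mean)
ws = [Fraction(1, sum(abs(xi - xj) for j, xj in enumerate(xs) if j != i))
      for i, xi in enumerate(xs)]
print(ws)  # [Fraction(1, 11), Fraction(1, 9), Fraction(1, 9), Fraction(1, 17)]

dwm = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
print(float(dwm))  # ~7.30, in line with the 7.3 quoted above
```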

Comparison to other measures of central tendency

The distance-weighted mean is less sensitive to outliers than the arithmetic mean and many other measures of central tendency. It can be regarded as an alternative to the trimmed mean and the Winsorized mean. The main advantage of the distance-weighted estimator is that it does not require a definite judgment about whether some values must be deleted as outliers, which is extremely important in empirical studies where no data point can be identified as an outlier with confidence.
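The reduced sensitivity can be illustrated numerically (a sketch: the dataset and the outlier value 120 are invented for this comparison, and `dwm` simply re-implements the formula above with k = 1):

```python
from statistics import mean

def dwm(xs):
    # Distance-weighted mean with k = 1 (the choice of k cancels
    # in the ratio and does not affect the result).
    ws = [1 / sum(abs(xi - xj) for j, xj in enumerate(xs) if j != i)
          for i, xi in enumerate(xs)]
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

clean = [5, 6, 8, 12]
spoiled = [5, 6, 8, 120]   # last observation replaced by a gross outlier

print(mean(clean), dwm(clean))      # 7.75 vs ~7.30
print(mean(spoiled), dwm(spoiled))  # 34.75 vs ~18.1
```

The outlier still pulls the estimate upward (the distance-weighted mean is not bounded the way a trimmed mean is), but considerably less than it pulls the arithmetic mean.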


See also

Distribution fitting by using the mean of distances of empirical data

Distance-weighted standard deviation

Distance-weighted standard score

References

  1. Yury S. Dodonov and Yulia A. Dodonova, "Robust measures of central tendency: weighting as a possible alternative to trimming in response-time data analysis", Psikhologicheskie Issledovaniya, 5(19), 2011. http://psystudy.com/files/Dodonov_Dodonova_psystudy_ru_2011_5(19)en.pdf


How to Cite This Entry:
Distance-weighted mean. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Distance-weighted_mean&oldid=29071