
Distribution fitting by using the mean of distances of empirical data

From Encyclopedia of Mathematics

Latest revision as of 16:30, 12 July 2023

The method described here was proposed by Yury S. Dodonov (2020).

On the one hand, for any given set of n values $x_1, x_2, \ldots, x_n$ with $x_i \in \mathbb{R}$, we can calculate the mean of distances or the mean square of distances from $x_i$ to the other points using, respectively,

$$ \tag{1} \Psi .1 = \frac{{\sum\limits_{j = 1}^n {\left| {{x_i} - {x_j}} \right|} }}{{{\rm{n}} - 1}} $$

or

$$ \tag{2} \Psi .2 = \frac{{\sum\limits_{j = 1}^n {{{\left( {{x_i} - {x_j}} \right)}^2}} }}{{{\rm{n}} - 1}}. $$
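As a minimal sketch of (1) and (2), the empirical quantities can be computed for every sample point as follows. The function names psi1 and psi2 are illustrative and are not taken from the reference.

```python
def psi1(xs):
    """Mean of distances, eq. (1), evaluated at every point of the sample."""
    n = len(xs)
    # The j = i term contributes zero, so the sum runs over all j
    # while the divisor is n - 1, exactly as in eq. (1).
    return [sum(abs(xi - xj) for xj in xs) / (n - 1) for xi in xs]


def psi2(xs):
    """Mean square of distances, eq. (2), evaluated at every point of the sample."""
    n = len(xs)
    return [sum((xi - xj) ** 2 for xj in xs) / (n - 1) for xi in xs]


data = [1.0, 2.0, 4.0]
print(psi1(data))  # [2.0, 1.5, 2.5]
print(psi2(data))  # [5.0, 2.5, 6.5]
```

Both functions take O(n^2) time, which is adequate for an illustration on small samples.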

On the other hand, for a theoretical distribution with parameters $\theta$ there is a corresponding functional dependency, which can be written (for the mean of distances) as

$$ \tag{3} \left\{ \begin{array}{l} \Phi .1\left( {x\,;\;\theta ,\;L,\;U} \right) = \frac{{\int\limits_0^{U - x} {\Omega \left( {t;\;x,\;\theta} \right)dt - \int\limits_{L - x}^0 {\Omega \left( {t;\;x,\;\theta} \right)dt} } }}{{CDF\left( {U;\;\theta } \right) - CDF\left( {L;\;\theta } \right)}}\\ L \le x \le U \end{array} \right. $$

where CDF is the cumulative distribution function, PDF is the probability density function, $\theta$ denotes the distribution parameters, L and U are the lower and upper boundaries, respectively, and

$$ \tag{4} \Omega \left( {t;\;x,\;\theta } \right) = t \cdot PDF\left( {t;\;x,{\theta _{x\to \left( {t + x} \right)}}} \right) $$

For the mean square of distances, the general functional dependency takes the following form:

$$ \tag{5} \left\{ \begin{array}{l} \Phi .2\left( {x\,;\;\theta ,\;L,\;U} \right) = \frac{{\int\limits_L^U {\left[ {{{\left( {x - t} \right)}^2} \cdot PDF\left( {t;\theta } \right)} \right]dt} }}{{CDF\left( {U;\;\theta } \right) - CDF\left( {L;\;\theta } \right)}}\\ L \le x \le U \end{array} \right. $$
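For distributions without a convenient closed form, (3) and (5) can be evaluated numerically. The Python sketch below is one way to do this, assuming that the subscript in (4) means the density is evaluated at the point t + x (as equation (8) does for the normal case). The composite Simpson rule and all function names are choices made for this illustration only.

```python
def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] using an even number n of subintervals."""
    if a == b:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def phi1(x, pdf, cdf, L, U):
    """Numerical version of eq. (3): mean of distances from x on [L, U]."""
    def omega(t):
        return t * pdf(t + x)  # eq. (4), density evaluated at t + x
    num = simpson(omega, 0.0, U - x) - simpson(omega, L - x, 0.0)
    return num / (cdf(U) - cdf(L))


def phi2(x, pdf, cdf, L, U):
    """Numerical version of eq. (5): mean square of distances from x on [L, U]."""
    num = simpson(lambda t: (x - t) ** 2 * pdf(t), L, U)
    return num / (cdf(U) - cdf(L))


# Check on the uniform distribution on [0, 1], where both integrals are elementary.
def uniform_pdf(t):
    return 1.0


def uniform_cdf(t):
    return min(max(t, 0.0), 1.0)


print(phi1(0.5, uniform_pdf, uniform_cdf, 0.0, 1.0))  # analytic value: 0.25
print(phi2(0.5, uniform_pdf, uniform_cdf, 0.0, 1.0))  # analytic value: 1/12
```

For the uniform case the integrands are polynomials of degree at most two, so Simpson's rule reproduces the analytic values up to rounding error.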

Thus, the general fitting algorithm consists of two steps. First, the mean of distances or the mean square of distances from each ${x}_{i}$ to the other points is calculated using (1) or (2). Then the distribution parameters are recovered by fitting (3) or (5) to these values with a conventional regression method.
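The text does not fix a particular regression criterion. As one possible reading (an assumption made here, not a statement from the reference), the second step could be an ordinary least-squares fit over the sample points, where $\Psi.k(x_i)$ is shorthand introduced here for (1) or (2) evaluated at $x_i$:

$$ \hat{\theta} = \arg\min_{\theta} \sum\limits_{i = 1}^{n} \left[ \Psi.k\left( x_i \right) - \Phi.k\left( x_i;\;\theta,\;L,\;U \right) \right]^2, \qquad k \in \{1, 2\}. $$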


Example for normal distribution

Probability density and cumulative distribution functions for a Gaussian distribution are defined by:

$$ \tag{6} PD{F_N} = \frac{1}{{\sigma \sqrt {2\pi } }}{e^{\frac{{ - {{\left( {x - \mu } \right)}^2}}}{{2{\sigma ^2}}}}} $$

$$ \tag{7} CD {F_N} = \frac{1}{2}\left[ {1 + erf\left( {\frac{ {x - \mu } }{ {\sigma \sqrt 2 } }} \right)} \right] $$

For the mean of distances according to (6) and (4):

$$ \tag{8} {\Omega _N}\left( {t;\;x,\;\mu ,\;\sigma } \right) = t \cdot PDF\left( {t;\;x,{\theta _{x \to \left( {t + x} \right)}}} \right) = \frac{t}{{\sigma \sqrt {2\pi } }}{e^{\frac{{ - {{\left( {\left( {t + x} \right) - \mu } \right)}^2}}}{{2{\sigma ^2}}}}} $$

Then, according to (3), (7) and (8):

$$ \tag{9} \left\{ \begin{array}{l} \Phi {.1_N}\left( {x\,;\;\mu ,\;\sigma ,\;L,\;U} \right) = \frac{{\int\limits_0^{U - x} {\frac{t}{{\sigma \sqrt {2\pi } }}{e^{\frac{{ - {{\left( {\left( {t + x} \right) - \mu } \right)}^2}}}{{2{\sigma ^2}}}}}dt - \int\limits_{L - x}^0 {\frac{t}{{\sigma \sqrt {2\pi } }}{e^{\frac{{ - {{\left( {\left( {t + x} \right) - \mu } \right)}^2}}}{{2{\sigma ^2}}}}}dt} } }}{{CD{F_N}\left( {U;\;\mu ,\;\sigma } \right) - CD{F_N}\left( {L;\;\mu ,\;\sigma } \right)}}\\ L \le x \le U \end{array} \right. $$

and finally:

$$ \tag{10} \left\{ \begin{array}{l}\Phi {.1_N} \left( {x\,;\;\mu ,\;\sigma ,\;L,\;U} \right) = \\ = \frac{ {\left( {x - \mu } \right)\left( {2erf\left( {\frac{ {x - \mu } }{ {\sigma \sqrt 2 } }} \right) - erf\left( {\frac{ {U - \mu } }{ {\sigma \sqrt 2 } }} \right) - erf\left( {\frac{ {L - \mu } }{ {\sigma \sqrt 2 } }} \right)} \right) + \frac{ {2\sigma } }{ {\sqrt {2\pi } } }\left( {2 {e^ {\frac{ { - { {\left( {x - \mu } \right)} ^2} } }{ {2 {\sigma ^2} } }} } - {e^ {\frac{ { - { {\left( {U - \mu } \right)} ^2} } }{ {2 {\sigma ^2} } }} } - {e^ {\frac{ { - { {\left( {L - \mu } \right)} ^2} } }{ {2 {\sigma ^2} } }} } } \right)} }{ {erf\left( {\frac{ {U - \mu } }{ {\sigma \sqrt 2 } }} \right) - erf\left( {\frac{ {L - \mu } }{ {\sigma \sqrt 2 } }} \right)} }\\L \le x \le U\end{array} \right. $$
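As a sanity check on (10), the following Python sketch evaluates the closed form and compares it with a direct midpoint-rule quadrature of the mean absolute distance under the truncated normal distribution. The function names phi1_normal and phi1_numeric are illustrative and do not come from the reference or its R appendix.

```python
import math


def phi1_normal(x, mu, sigma, L, U):
    """Closed form (10) for a normal distribution truncated to [L, U]."""
    def erf_at(v):
        return math.erf((v - mu) / (sigma * math.sqrt(2.0)))

    def gauss_at(v):
        return math.exp(-((v - mu) ** 2) / (2.0 * sigma ** 2))

    num = (x - mu) * (2.0 * erf_at(x) - erf_at(U) - erf_at(L))
    num += (2.0 * sigma / math.sqrt(2.0 * math.pi)) * (
        2.0 * gauss_at(x) - gauss_at(U) - gauss_at(L)
    )
    return num / (erf_at(U) - erf_at(L))


def phi1_numeric(x, mu, sigma, L, U, n=20000):
    """Midpoint-rule estimate of the same quantity, used only for comparison."""
    h = (U - L) / n
    norm = sigma * math.sqrt(2.0 * math.pi)
    num = 0.0
    den = 0.0
    for k in range(n):
        t = L + (k + 0.5) * h
        p = math.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2)) / norm
        num += abs(t - x) * p  # integrand of the mean absolute distance
        den += p               # mass of the truncated density on [L, U]
    return num / den


closed = phi1_normal(0.5, 0.0, 1.0, -2.0, 3.0)
approx = phi1_numeric(0.5, 0.0, 1.0, -2.0, 3.0)
print(closed, approx)  # the two values should agree to several decimal places
```

With very wide bounds and x equal to the mean, the closed form reduces to the known mean absolute deviation of a normal distribution, $\sigma\sqrt{2/\pi}$, which gives a second independent check.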

For the mean square of distances according to (5), (6) and (7):

$$ \tag{11} \left\{ \begin{array}{l} \Phi {.2_N}\left( {x\,;\;\mu ,\;\sigma ,\;L,\;U} \right) = \frac{{\int\limits_L^U {\left[ {\frac{{{{\left( {x - t} \right)}^2}}}{{\sigma \sqrt {2\pi } }}{e^{\frac{{ - {{\left( {t - \mu } \right)}^2}}}{{2{\sigma ^2}}}}}} \right]dt} }}{{CD{F_N}(U;\mu ,\;\sigma ) - CD{F_N}(L;\mu ,\;\sigma )}}\\ L \le x \le U \end{array} \right. $$

and finally:

$$ \tag{12} \left\{ \begin{array}{l}\Phi {.2_N} \left( {x\,;\;\mu ,\;\sigma ,\;L,\;U} \right) = \\ = \frac{ {\left( { { {\left( {\mu - x} \right)} ^2} + {\sigma ^2} } \right)\left( {erf\left( {\frac{ {U - \mu } }{ {\sigma \sqrt 2 } }} \right) - erf\left( {\frac{ {L - \mu } }{ {\sigma \sqrt 2 } }} \right)} \right) + \frac{ {2\sigma } }{ {\sqrt {2\pi } } }\left( { {e^ {\frac{ { - { {\left( {L - \mu } \right)} ^2} } }{ {2 {\sigma ^2} } }} } \left( {L + \mu - 2x} \right) - {e^ {\frac{ { - { {\left( {U - \mu } \right)} ^2} } }{ {2 {\sigma ^2} } }} } \left( {\mu + U - 2x} \right)} \right)} }{ {erf\left( {\frac{ {U - \mu } }{ {\sigma \sqrt 2 } }} \right) - erf\left( {\frac{ {L - \mu } }{ {\sigma \sqrt 2 } }} \right)} }\\L \le x \le U\end{array} \right. $$
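When $L \to -\infty$ and $U \to +\infty$, the error-function terms in (12) tend to 1 and -1 and the exponential terms vanish, so (12) reduces to $(x - \mu)^2 + \sigma^2$. In that special case $\Psi.2 - x^2 \approx -2\mu x + \mu^2 + \sigma^2$ is linear in x, and the regression step becomes an ordinary straight-line fit. The Python sketch below implements (12) and this unbounded special case; it is an illustration under these assumptions, not the R code of the appendix, and the names phi2_normal and fit_normal_unbounded are hypothetical.

```python
import math
import random


def phi2_normal(x, mu, sigma, L, U):
    """Closed form (12) for a normal distribution truncated to [L, U]."""
    def erf_at(v):
        return math.erf((v - mu) / (sigma * math.sqrt(2.0)))

    def gauss_at(v):
        return math.exp(-((v - mu) ** 2) / (2.0 * sigma ** 2))

    d_erf = erf_at(U) - erf_at(L)
    num = ((mu - x) ** 2 + sigma ** 2) * d_erf
    num += (2.0 * sigma / math.sqrt(2.0 * math.pi)) * (
        gauss_at(L) * (L + mu - 2.0 * x) - gauss_at(U) * (mu + U - 2.0 * x)
    )
    return num / d_erf


def fit_normal_unbounded(xs):
    """Estimate (mu, sigma) from eq. (2) assuming no truncation (illustrative)."""
    n = len(xs)
    # Step 1: empirical mean square of distances, eq. (2).
    psi2 = [sum((xi - xj) ** 2 for xj in xs) / (n - 1) for xi in xs]
    # Step 2: straight-line least squares of psi2 - x^2 against x.
    ys = [p - xi * xi for p, xi in zip(psi2, xs)]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    mu = -slope / 2.0                                  # slope = -2 * mu
    sigma = math.sqrt(max(intercept - mu * mu, 0.0))   # intercept = mu^2 + sigma^2
    return mu, sigma


random.seed(0)
sample = [random.gauss(5.0, 2.0) for _ in range(500)]
print(fit_normal_unbounded(sample))  # estimates should be near (5, 2)
```

For finite L and U the dependency (12) is no longer linear in the parameters, so a nonlinear regression routine would be needed in place of the straight-line fit.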


Reference

Dodonov, Yury (2020): Distribution fitting by using the mean of distances of empirical data. figshare. Journal contribution. https://doi.org/10.6084/m9.figshare.13014410

Appendix. R code for fitting normal distribution

Available at https://doi.org/10.6084/m9.figshare.13042745

How to Cite This Entry:
Distribution fitting by using the mean of distances of empirical data. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Distribution_fitting_by_using_the_mean_of_distances_of_empirical_data&oldid=50916