Distribution fitting by using the mean of distances of empirical data
Revision as of 07:53, 18 October 2020
The method described here was proposed by Yury S. Dodonov (2020).
On the one hand, for any given set of $n$ values $x_1, x_2, \ldots, x_n$ with $x_i \in \mathbb{R}$, we can calculate the mean of distances or the mean square of distances from $x_i$ to the other points using, respectively,
(1)
$
\Psi .1 = \frac{{\sum\limits_{j = 1}^n {\left| {{x_i} - {x_j}} \right|}
}}{{{\rm{n}} - 1}}
$
or
(2)
$
\Psi .2 = \frac{{\sum\limits_{j = 1}^n {{{\left( {{x_i} - {x_j}} \right)}^2}}
}}{{{\rm{n}} - 1}}
$.
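Equations (1) and (2) can be evaluated directly from the data. The following Python sketch is only an illustration (the name `mean_distances` is hypothetical, not from the source); it uses the plain double loop implied by the formulas and returns both quantities for every point.

```python
def mean_distances(xs):
    """Return (psi1, psi2): for each x_i, the mean distance (1) and the
    mean square of distances (2) to the other n - 1 points."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    psi1, psi2 = [], []
    for xi in xs:
        # The j = i term is zero, so summing over all j matches (1) and (2).
        psi1.append(sum(abs(xi - xj) for xj in xs) / (n - 1))
        psi2.append(sum((xi - xj) ** 2 for xj in xs) / (n - 1))
    return psi1, psi2


psi1, psi2 = mean_distances([0.0, 1.0, 3.0])
print(psi1)  # [2.0, 1.5, 2.5]
print(psi2)  # [5.0, 2.5, 6.5]
```

The double loop costs $O(n^2)$ operations, which is adequate for moderate sample sizes.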
On the other hand, for a theoretical distribution the corresponding quantity is a function of $x$; for the mean of distances this functional dependency can be written as
(3)
$
\left\{ \begin{array}{l}
\Phi .1\left( {x\,;\;\theta ,\;L,\;U} \right) = \frac{{\int\limits_0^{U - x}
{\Omega \left( {t;\;x,\;\theta} \right)dt - \int\limits_{L - x}^0 {\Omega \left(
{t;\;x,\;\theta} \right)dt} } }}{{CDF\left( {U;\;\theta } \right) - CDF\left(
{L;\;\theta } \right)}}\\
L \le x \le U
\end{array} \right.
$
where CDF is the cumulative distribution function, $\theta$ denotes the distribution parameters, $L$ and $U$ are the lower and upper boundaries, respectively, and
(4)
$
\Omega \left( {t;\;x,\;\theta } \right) = t \cdot PDF\left( {t;\;x,{\theta _{x\to \left( {t + x} \right)}}} \right)
$
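Here PDF is the probability density function, and the subscript $x \to (t + x)$ on $\theta$ means that the density is evaluated at the shifted point $t + x$. With this substitution the numerator of (3) equals $\int_L^U |x - s|\, PDF(s;\theta)\, ds$, so (3) is the expected distance from $x$ to a value drawn from the distribution truncated to $[L, U]$, the theoretical counterpart of (1). For the mean square of distances, the general functional dependency takes the following form: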
(5)
$
\left\{ \begin{array}{l}
\Phi .2\left( {x\,;\;\theta ,\;L,\;U} \right) = \frac{{\int\limits_L^U {\left[
{{{\left( {x - t} \right)}^2} \cdot PDF\left( {t;\theta } \right)} \right]dt}
}}{{CDF\left( {U;\;\theta } \right) - CDF\left(
{L;\;\theta } \right)}}\\
L \le x \le U
\end{array} \right.
$
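Equations (3) to (5) can be checked numerically for any distribution whose PDF and CDF are available. The sketch below is an illustration, not the author's code: it assumes composite Simpson's rule as the quadrature, the names `simpson`, `phi1` and `phi2` are hypothetical, and Python's `statistics.NormalDist` is used only as a convenient test distribution.

```python
from statistics import NormalDist


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule for the integral of f over [a, b]."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def phi1(x, pdf, cdf, lower, upper):
    """Eq. (3): mean distance from x under the distribution truncated to [lower, upper]."""

    def omega(t):  # Eq. (4): t times the density evaluated at the shifted point t + x
        return t * pdf(t + x)

    num = simpson(omega, 0.0, upper - x) - simpson(omega, lower - x, 0.0)
    return num / (cdf(upper) - cdf(lower))


def phi2(x, pdf, cdf, lower, upper):
    """Eq. (5): mean square of distances from x under the truncated distribution."""
    num = simpson(lambda t: (x - t) ** 2 * pdf(t), lower, upper)
    return num / (cdf(upper) - cdf(lower))


std = NormalDist(0.0, 1.0)
# With very wide bounds the truncation is negligible, so for X ~ N(0, 1) these
# approach E|X| = sqrt(2/pi) and E(1 - X)^2 = 2, respectively.
print(phi1(0.0, std.pdf, std.cdf, -12.0, 12.0))
print(phi2(1.0, std.pdf, std.cdf, -12.0, 12.0))
```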
Thus, the general fitting algorithm has two steps. First, calculate the mean of distances (1) or the mean square of distances (2) from each $x_i$ to the other points. Second, recover the distribution parameters by fitting (3) or (5), respectively, to these values with a conventional regression method.
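The two steps can be sketched as follows. The source only specifies "a conventional regression method", so this sketch assumes ordinary least squares between (1) (or (2)) and a user-supplied model of (3) (or (5)), minimised with a simple derivative-free compass search. All names here (`compass_search`, `fit_by_mean_distances`) are hypothetical.

```python
def compass_search(f, p0, step=0.5, tol=1e-8, max_evals=200_000):
    """Minimise f with a simple compass (pattern) search: probe +/- step along
    each coordinate, accept the first improvement, halve step when none is found."""
    p = list(p0)
    fp = f(p)
    evals = 1
    while step > tol and evals < max_evals:
        improved = False
        for i in range(len(p)):
            for delta in (step, -step):
                q = p.copy()
                q[i] += delta
                fq = f(q)
                evals += 1
                if fq < fp:
                    p, fp, improved = q, fq, True
                    break
        if not improved:
            step /= 2
    return p


def fit_by_mean_distances(xs, model, theta0, squared=False):
    """Step 1: Eq. (1), or Eq. (2) if squared=True, for every x_i.
    Step 2: least-squares fit of model(x, theta), which should implement
    Eq. (3) (or Eq. (5) when squared=True), to those values."""
    n = len(xs)
    power = 2 if squared else 1
    psi = [sum(abs(xi - xj) ** power for xj in xs) / (n - 1) for xi in xs]

    def sse(theta):
        return sum((p - model(x, theta)) ** 2 for x, p in zip(xs, psi))

    return compass_search(sse, theta0)


# Toy call with a constant model, only to show the call shape.
print(fit_by_mean_distances([0.0, 1.0, 3.0], lambda x, th: th[0], [0.0]))
```

For the normal example, (10) or (12) could be passed as `model`, with $L$ and $U$ held fixed. The compass search only keeps the sketch dependency-free; `scipy.optimize.curve_fit` or `scipy.optimize.least_squares` would be the more usual choice.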
Example for the normal distribution
The probability density and cumulative distribution functions of the Gaussian distribution are defined by:
(6)
$PD{F_N} = \frac{1}{{\sigma \sqrt {2\pi } }}{e^{\frac{{ - {{\left( {x - \mu } \right)}^2}}}{{2{\sigma ^2}}}}}$
(7)
$CD{F_N} = \frac{1}{2}\left[ {1 + erf\left( {\frac{{x - \mu }}{{\sigma \sqrt 2 }}} \right)} \right]$
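In code, (6) and (7) map directly onto the standard-library exponential and error functions. The sketch below (function names are illustrative) cross-checks them against Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def pdf_normal(x, mu, sigma):
    """Eq. (6): Gaussian probability density function."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def cdf_normal(x, mu, sigma):
    """Eq. (7): Gaussian cumulative distribution function written with erf."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


# Cross-check against the standard library's implementation.
ref = NormalDist(mu=1.0, sigma=2.0)
print(pdf_normal(0.3, 1.0, 2.0), ref.pdf(0.3))
print(cdf_normal(0.3, 1.0, 2.0), ref.cdf(0.3))
```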
For the mean of distances, substituting (6) into (4) gives:
(8)
${\Omega _N}\left( {t;\;x,\;\mu ,\;\sigma } \right) = t \cdot PDF\left( {t;\;x,{\theta _{x \to \left( {t + x} \right)}}} \right) = \frac{t}{{\sigma \sqrt {2\pi } }}{e^{\frac{{ - {{\left( {\left( {t + x} \right) - \mu } \right)}^2}}}{{2{\sigma ^2}}}}}$
Substituting (7) and (8) into (3) then gives:
(9)
$\left\{ \begin{array}{l}
\Phi {.1_N}\left( {x\,;\;\mu ,\;\sigma ,\;L,\;U} \right) = \frac{{\int\limits_0^{U - x} {\frac{t}{{\sigma \sqrt {2\pi } }}{e^{\frac{{ - {{\left( {\left( {t + x} \right) - \mu } \right)}^2}}}{{2{\sigma ^2}}}}}dt - \int\limits_{L - x}^0 {\frac{t}{{\sigma \sqrt {2\pi } }}{e^{\frac{{ - {{\left( {\left( {t + x} \right) - \mu } \right)}^2}}}{{2{\sigma ^2}}}}}dt} } }}{{CD{F_N}\left( {U;\;\mu ,\;\sigma } \right) - CD{F_N}\left( {L;\;\mu ,\;\sigma } \right)}}\\
L \le x \le U
\end{array} \right.$
and finally:
(10)
$\left\{ \begin{array}{l}
\Phi {.1_N}\left( {x\,;\;\mu ,\;\sigma ,\;L,\;U} \right) = \\
= \frac{{\left( {x - \mu } \right)\left( {2erf\left( {\frac{{x - \mu }}{{\sigma \sqrt 2 }}} \right) - erf\left( {\frac{{U - \mu }}{{\sigma \sqrt 2 }}} \right) - erf\left( {\frac{{L - \mu }}{{\sigma \sqrt 2 }}} \right)} \right) + \frac{{2\sigma }}{{\sqrt {2\pi } }}\left( {2{e^{\frac{{ - {{\left( {x - \mu } \right)}^2}}}{{2{\sigma ^2}}}}} - {e^{\frac{{ - {{\left( {U - \mu } \right)}^2}}}{{2{\sigma ^2}}}}} - {e^{\frac{{ - {{\left( {L - \mu } \right)}^2}}}{{2{\sigma ^2}}}}}} \right)}}{{erf\left( {\frac{{U - \mu }}{{\sigma \sqrt 2 }}} \right) - erf\left( {\frac{{L - \mu }}{{\sigma \sqrt 2 }}} \right)}}\\
L \le x \le U
\end{array} \right.$
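A complete fit for the normal case can be sketched by combining (1) with the closed form (10). Several assumptions are not stated in the source and are made here only for illustration: the test data are 200 evenly spaced quantiles of $N(10, 2)$ rather than real observations, $L$ and $U$ are set to the sample minimum and maximum, $\sigma$ is searched on a log scale to keep it positive, and a simple compass search stands in for the "conventional regression method". All function names are hypothetical.

```python
import math
from statistics import NormalDist


def phi1_normal(x, mu, sigma, lower, upper):
    """Closed form (10): mean distance from x for a normal law truncated to [lower, upper]."""
    ef = lambda v: math.erf((v - mu) / (sigma * math.sqrt(2)))
    ex = lambda v: math.exp(-((v - mu) ** 2) / (2 * sigma**2))
    den = ef(upper) - ef(lower)
    if den <= 0:  # degenerate parameters (e.g. tiny sigma far outside [lower, upper])
        return math.inf
    num = (x - mu) * (2 * ef(x) - ef(upper) - ef(lower)) + (
        2 * sigma / math.sqrt(2 * math.pi)
    ) * (2 * ex(x) - ex(upper) - ex(lower))
    return num / den


def compass_search(f, p0, step=0.5, tol=1e-8):
    """Derivative-free minimiser: probe +/- step on each coordinate, halve step on failure."""
    p, fp = list(p0), f(p0)
    while step > tol:
        improved = False
        for i in range(len(p)):
            for delta in (step, -step):
                q = p.copy()
                q[i] += delta
                fq = f(q)
                if fq < fp:
                    p, fp, improved = q, fq, True
                    break
        if not improved:
            step /= 2
    return p


# Hypothetical data: 200 evenly spaced quantiles of N(10, 2).
xs = [NormalDist(10.0, 2.0).inv_cdf((k + 0.5) / 200) for k in range(200)]
n = len(xs)
lower, upper = min(xs), max(xs)  # assumption: boundaries taken from the sample
psi1 = [sum(abs(a - b) for b in xs) / (n - 1) for a in xs]  # step 1, Eq. (1)


def sse(p):  # step 2: residual sum of squares, with p = [mu, log(sigma)]
    mu, sigma = p[0], math.exp(p[1])
    return sum((v - phi1_normal(x, mu, sigma, lower, upper)) ** 2 for x, v in zip(xs, psi1))


mu_hat, log_sigma_hat = compass_search(sse, [sum(xs) / n, 0.0])
print(mu_hat, math.exp(log_sigma_hat))  # expected to land near 10 and 2
```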
For the mean square of distances, substituting (6) and (7) into (5) gives:
(11)
$\left\{ \begin{array}{l}
\Phi {.2_N}\left( {x\,;\;\mu ,\;\sigma ,\;L,\;U} \right) = \frac{{\int\limits_L^U {\left[ {\frac{{{{\left( {x - t} \right)}^2}}}{{\sigma \sqrt {2\pi } }}{e^{\frac{{ - {{\left( {t - \mu } \right)}^2}}}{{2{\sigma ^2}}}}}} \right]dt} }}{{CD{F_N}(U;\mu ,\;\sigma ) - CD{F_N}(L;\mu ,\;\sigma )}}\\
L \le x \le U
\end{array} \right.$
and finally:
(12)
$\left\{ \begin{array}{l}
\Phi {.2_N}\left( {x\,;\;\mu ,\;\sigma ,\;L,\;U} \right) = \\
= \frac{{\left( {{{\left( {\mu - x} \right)}^2} + {\sigma ^2}} \right)\left( {erf\left( {\frac{{U - \mu }}{{\sigma \sqrt 2 }}} \right) - erf\left( {\frac{{L - \mu }}{{\sigma \sqrt 2 }}} \right)} \right) + \frac{{2\sigma }}{{\sqrt {2\pi } }}\left( {{e^{\frac{{ - {{\left( {L - \mu } \right)}^2}}}{{2{\sigma ^2}}}}}\left( {L + \mu - 2x} \right) - {e^{\frac{{ - {{\left( {U - \mu } \right)}^2}}}{{2{\sigma ^2}}}}}\left( {\mu + U - 2x} \right)} \right)}}{{erf\left( {\frac{{U - \mu }}{{\sigma \sqrt 2 }}} \right) - erf\left( {\frac{{L - \mu }}{{\sigma \sqrt 2 }}} \right)}}\\
L \le x \le U
\end{array} \right.$
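The closed form (12) can be checked against direct numerical integration of (11). In this sketch the parameter values ($\mu = 1$, $\sigma = 2$, $L = -2$, $U = 5$) are arbitrary illustrative choices, composite Simpson's rule is an assumed stand-in for any quadrature routine, and the function names are hypothetical.

```python
import math


def phi2_normal(x, mu, sigma, lower, upper):
    """Closed form (12): mean square of distances from x, normal law truncated to [lower, upper]."""
    ef = lambda v: math.erf((v - mu) / (sigma * math.sqrt(2)))
    ex = lambda v: math.exp(-((v - mu) ** 2) / (2 * sigma**2))
    den = ef(upper) - ef(lower)
    num = ((mu - x) ** 2 + sigma**2) * den + (2 * sigma / math.sqrt(2 * math.pi)) * (
        ex(lower) * (lower + mu - 2 * x) - ex(upper) * (mu + upper - 2 * x)
    )
    return num / den


def phi2_numeric(x, mu, sigma, lower, upper, n=2000):
    """Eq. (11) evaluated with composite Simpson's rule (n must be even)."""
    pdf = lambda t: math.exp(-((t - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
    cdf = lambda t: 0.5 * (1 + math.erf((t - mu) / (sigma * math.sqrt(2))))
    f = lambda t: (x - t) ** 2 * pdf(t)
    h = (upper - lower) / n
    inner = sum((4 if k % 2 else 2) * f(lower + k * h) for k in range(1, n))
    return (f(lower) + f(upper) + inner) * h / 3 / (cdf(upper) - cdf(lower))


for x in (-2.0, 0.5, 3.0, 5.0):
    print(x, phi2_normal(x, 1.0, 2.0, -2.0, 5.0), phi2_numeric(x, 1.0, 2.0, -2.0, 5.0))
```

The two columns should agree to quadrature accuracy at every $x$ in $[L, U]$, including the endpoints.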
Reference
Dodonov, Yury (2020): Distribution fitting by using the mean of distances of empirical data. figshare. Journal contribution. https://doi.org/10.6084/m9.figshare.13014410
Appendix. R code for fitting normal distribution
Distribution fitting by using the mean of distances of empirical data. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Distribution_fitting_by_using_the_mean_of_distances_of_empirical_data&oldid=50918