Generalized quasi-likelihood
Copyright notice
This article, Generalized Quasi-likelihood (GQL) Inference, was adapted from an original article by Brajendra C. Sutradhar, which appeared in StatProb: The Encyclopedia Sponsored by Statistics and Probability Societies. The original article (StatProb source: http://statprob.com/encyclopedia/GeneralizedQuasiLikelihoodGQLInferences.html) is copyrighted by the author(s). The article has been donated to the Encyclopedia of Mathematics, and its further issues are under a Creative Commons Attribution Share-Alike License. All pages from StatProb are contained in the Category StatProb.
2020 Mathematics Subject Classification: Primary: 62F10; Secondary: 62H20
Generalized Quasi-likelihood (GQL) Inference*
by Brajendra C. Sutradhar
Memorial University
Email address: bsutradh@mun.ca
QL Estimation for Independent Data.
For $i=1,\ldots,K,$ let $Y_i$ denote the response variable for the $i$th individual, and let $x_i=(x_{i1},\ldots,x_{iv},\ldots,x_{ip})'$ be the associated $p$-dimensional covariate vector. Also, let $\beta$ be the $p$-dimensional vector of regression effects of $x_i$ on $y_i.$ Suppose further that the responses are collected from $K$ independent individuals. If the probability distribution of $Y_i$ is not known, the well-known likelihood approach cannot be used to estimate the underlying regression parameter $\beta.$ Suppose instead that only the first two moments of the data are known, that is, the mean and the variance functions of the response variable $Y_i$ for all $i=1,\ldots,K.$ Suppose also that, for a known functional form $a(\cdot)$, these moments are given by $$ E[Y_i]=a'(\theta_i)\;\mbox{and}\; \mbox{var}[Y_i]=a''(\theta_i), \tag{1}$$
where, for a link function $h(\cdot),$ $\theta_i=h(x'_i\beta),$ and $a'(\theta_i)$ and $a''(\theta_i)$ are the first- and second-order derivatives of $a(\theta_i)$ with respect to $\theta_i.$ To estimate the regression parameter vector $\beta$ in this independence setup, Wedderburn (1974) (see also McCullagh (1983)) proposed solving the so-called quasi-likelihood (QL) estimating equation $$ \sum^K_{i=1}\left[\frac{\partial a'(\theta_{i})}{\partial \beta}\frac{(y_{i}-a'(\theta_{i}))}{a''(\theta_i)}\right]=0. \tag{2}$$
Let $\hat{\beta}_{QL}$ be the QL estimator of $\beta$ obtained from (2). This estimator is known to be consistent and highly efficient. For Poisson and binary data, for example, $\hat{\beta}_{QL}$ coincides with the maximum likelihood (ML) estimator and is therefore optimal.
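Equation (2) generally has no closed-form solution and is solved iteratively. The sketch below uses Fisher scoring, a Newton-type iteration in the spirit of the Gauss-Newton method named in the title of Wedderburn (1974). It is an illustration under stated assumptions, not the cited authors' code. The function names `ql_fit` and `solve_linear`, the zero starting value and the tolerance are hypothetical choices. The arguments `a1` and `a2` stand for $a'(\cdot)$ and $a''(\cdot)$ in (1), and `h` and `dh` stand for the link $h(\cdot)$ and its derivative.

```python
import math


def solve_linear(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[r]] + [float(b[r])] for r in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def ql_fit(X, y, a1, a2, h=lambda eta: eta, dh=lambda eta: 1.0,
           tol=1e-10, max_iter=100):
    """Solve the QL equation (2) by Fisher scoring (hypothetical helper).

    a1, a2 : a'(theta) and a''(theta), i.e. mean and variance as functions of theta
    h, dh  : link theta = h(x' beta) and its derivative (identity link by default)
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        score = [0.0] * p                     # left-hand side of (2)
        info = [[0.0] * p for _ in range(p)]  # sum of (dmu/dbeta)(dmu/dbeta)' / var
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            theta = h(eta)
            mu, var = a1(theta), a2(theta)
            dmu = var * dh(eta)               # d a'(theta_i) / d eta_i by the chain rule
            for j in range(p):
                score[j] += dmu * xi[j] * (yi - mu) / var
                for k in range(p):
                    info[j][k] += dmu * dmu * xi[j] * xi[k] / var
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

For example, `ql_fit([[1.0]] * 4, [1, 2, 3, 6], math.exp, math.exp)` fits an intercept-only Poisson model. By (4) below, its solution is $\log 3$, the log of the sample mean. Passing the logistic function and $\mu(1-\mu)$ as `a1` and `a2` gives the binary case instead.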
Illustration for the Poisson case: For Poisson data, one uses
$$
a(\theta_{i})=\exp(\theta_{i})
\tag{3}$$
with the identity link function $h(\cdot),$ that is, $\theta_{i}=x'_{i}\beta.$ This gives the mean and variance functions $$\mbox{var}(Y_{i})=a''(\theta_{i})=E(Y_i)=a'(\theta_{i})=\mu_{i}\;\mbox{(say)}=\exp(x'_{i}\beta),$$ so that $\partial\mu_i/\partial\beta=\mu_i x_i$. Equation (2) then yields the QL estimating equation for $\beta$: $$ \sum^K_{i=1}x_i(y_i-\mu_i)=0. \tag{4}$$
Since the Poisson density is $f(y_i|x_i)=\frac{1}{y_i!}\exp[y_i\log(\mu_i)-\mu_i],$ with $\mu_i=\exp(\theta_i)=\exp(x'_i\beta),$ the log-likelihood function of $\beta$ has the form $\log L(\beta)=-\sum^K_{i=1}\log(y_i!)+\sum^K_{i=1}[y_{i}\theta_{i}-a(\theta_{i})].$ This yields the likelihood equation for $\beta$: $$ \frac{\partial \log L}{\partial \beta}=\sum^K_{i=1}[y_{i}-a'(\theta_{i})]\frac{\partial \theta_{i}}{\partial \beta}=\sum^K_{i=1}x_i(y_i-\mu_i)=0, \tag{5}$$
which is the same as the QL estimating equation (4). Thus, if the likelihood function were known, then the ML estimate of $\beta$ would be the same as the QL estimate $\hat{\beta}_{QL}.$
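The equivalence of (4) and (5) can be checked numerically. The sketch below differentiates the Poisson log-likelihood by central differences and compares the result with the QL estimating function in (4). The helper names, the data and the evaluation point $\beta$ are illustrative assumptions only.

```python
import math


def poisson_loglik(beta, X, y):
    """log L(beta) = sum_i [y_i theta_i - exp(theta_i) - log(y_i!)], theta_i = x_i' beta."""
    total = 0.0
    for xi, yi in zip(X, y):
        theta = sum(b * v for b, v in zip(beta, xi))
        total += yi * theta - math.exp(theta) - math.lgamma(yi + 1)
    return total


def ql_function(beta, X, y):
    """Left-hand side of (4): sum_i x_i (y_i - mu_i) with mu_i = exp(x_i' beta)."""
    out = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        mu = math.exp(sum(b * v for b, v in zip(beta, xi)))
        for j, xij in enumerate(xi):
            out[j] += xij * (yi - mu)
    return out


def numerical_gradient(f, beta, step=1e-6):
    """Central-difference approximation to the gradient of a scalar function f."""
    grad = []
    for j in range(len(beta)):
        up, down = list(beta), list(beta)
        up[j] += step
        down[j] -= step
        grad.append((f(up) - f(down)) / (2.0 * step))
    return grad


X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # illustrative covariates
y = [1, 1, 3, 4]                                      # illustrative counts
beta = [0.2, 0.3]                                     # arbitrary evaluation point
score_ml = numerical_gradient(lambda b: poisson_loglik(b, X, y), beta)
score_ql = ql_function(beta, X, y)
print(score_ml)
print(score_ql)
```

The two printed vectors agree up to finite-difference error at any $\beta$. Hence the roots of (4) and (5) coincide.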
Illustration for the binary case: For the binary data, one uses
$$
a'(\theta_{i})=\frac{\exp(\theta_{i})}{1+\exp(\theta_{i})}=\mu_i\;\mbox{and}\;a''(\theta_i)=\mu_i(1-\mu_i),
\tag{6}$$
with $\theta_i=x'_i\beta.$ Since $\partial\mu_i/\partial\beta=\mu_i(1-\mu_i)x_i$, the QL estimating equation (2) for binary data reduces to the same form (4) as in the Poisson case. The only difference is that now $\mu_i= \frac{\exp(\theta_{i})}{1+\exp(\theta_{i})},$ whereas in the Poisson case $\mu_i=\exp(\theta_i).$
For ML estimation in the binary case, the binary density is $f(y_i|x_i)={\mu_i}^{y_i}(1-\mu_i)^{1-y_i}.$ Writing the log-likelihood function as $\log L(\beta)=\sum^K_{i=1}y_i\log\mu_i+\sum^K_{i=1}(1-y_i)\log(1-\mu_i)=\sum^K_{i=1}[y_i\theta_i-\log(1+\exp(\theta_i))],$ one obtains the same likelihood estimating equation as in (5). Here, however, $\mu_i= \frac{\exp(x'_i\beta)}{1+\exp(x'_i\beta)}$ under the binary model. Since the QL estimating equation (4) is the same as the ML estimating equation (5), the ML and QL estimates of $\beta$ are also the same for binary data.
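A similar numerical check for binary data, again a sketch with hypothetical helper names and illustrative data, confirms two facts. First, the log-likelihood $\sum_i[y_i\log\mu_i+(1-y_i)\log(1-\mu_i)]$ equals $\sum_i[y_i\theta_i-a(\theta_i)]$ with $a(\theta)=\log(1+\exp(\theta))$. Second, its gradient matches the QL estimating function $\sum_i x_i(y_i-\mu_i)$ of (4).

```python
import math


def logistic(theta):
    """mu = exp(theta) / (1 + exp(theta))."""
    return 1.0 / (1.0 + math.exp(-theta))


def loglik_mu_form(beta, X, y):
    """sum_i [y_i log mu_i + (1 - y_i) log(1 - mu_i)]."""
    total = 0.0
    for xi, yi in zip(X, y):
        mu = logistic(sum(b * v for b, v in zip(beta, xi)))
        total += yi * math.log(mu) + (1 - yi) * math.log(1.0 - mu)
    return total


def loglik_theta_form(beta, X, y):
    """sum_i [y_i theta_i - a(theta_i)] with a(theta) = log(1 + exp(theta))."""
    total = 0.0
    for xi, yi in zip(X, y):
        theta = sum(b * v for b, v in zip(beta, xi))
        total += yi * theta - math.log1p(math.exp(theta))
    return total


def ql_function(beta, X, y):
    """Left-hand side of (4) with the logistic mean."""
    out = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        mu = logistic(sum(b * v for b, v in zip(beta, xi)))
        for j, xij in enumerate(xi):
            out[j] += xij * (yi - mu)
    return out


X = [[1.0, -1.0], [1.0, 0.0], [1.0, 0.5], [1.0, 2.0]]  # illustrative covariates
y = [0, 1, 0, 1]                                       # illustrative binary responses
beta = [0.1, -0.4]                                     # arbitrary evaluation point
h = 1e-6
grad = []  # central-difference gradient of the log-likelihood
for j in range(len(beta)):
    up, down = list(beta), list(beta)
    up[j] += h
    down[j] -= h
    grad.append((loglik_mu_form(up, X, y) - loglik_mu_form(down, X, y)) / (2 * h))
print(loglik_mu_form(beta, X, y), loglik_theta_form(beta, X, y))
print(grad, ql_function(beta, X, y))
```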
GQL Estimation: A Generalization of QL Estimation to Correlated Data.
In contrast to the independence setup, we now consider $y_i$ as a vector of $T$ repeated binary or count responses collected from the $i$th individual, for all $i=1,\ldots,K.$ Let $y_i=(y_{i1},\ldots,y_{it},\ldots,y_{iT})',$ where $y_{it}$ represents the response recorded at time $t$ for the $i$th individual. Also, let $x_{it}=(x_{it1},\ldots,x_{itv},\ldots,x_{itp})'$ be the $p$-dimensional covariate vector corresponding to the scalar $y_{it},$ and let $\beta$ be the $p$-dimensional vector of regression effects of $x_{it}$ on $y_{it}$ for all $i=1,\ldots,K,$ and all $t=1,\ldots,T.$ Let $\mu_{it}$ and $\sigma_{itt}$ be the mean and the variance of $Y_{it},$ that is, $\mu_{it}=E[Y_{it}]$ and $\mbox{var}[Y_{it}]=\sigma_{itt}.$ Note that both $\mu_{it}$ and $\sigma_{itt}$ are functions of $\beta.$ When the variance is a function of the mean, however, it suffices to estimate the $\beta$ involved in the mean function, treating the $\beta$ involved in the variance function as known. Further, the $T$ repeated responses of an individual are likely to be correlated. The estimate of $\beta$ obtained by ignoring these correlations solves the QL estimating equation based on the independence assumption, $$ \sum^K_{i=1}\sum^T_{t=1}\left[\frac{\partial \mu_{it}}{\partial \beta}\frac{(y_{it}-\mu_{it})}{\sigma_{itt}}\right]=0, \tag{7}$$
This estimate will be consistent but inefficient. To remedy this inefficiency, Sutradhar (2003) proposed a generalization of the QL estimation approach, in which $\beta$ is obtained by solving the GQL estimating equation $$ \sum^K_{i=1} \frac{\partial \mu'_i}{\partial \beta}{\Sigma_i}^{-1}(\rho )(y_i-\mu_i)=0, \tag{8}$$
where $\mu_i=(\mu_{i1},\ldots,\mu_{it},\ldots,\mu_{iT})'$ is the mean vector of $Y_i,$ and $\Sigma_i(\rho)$ is the covariance matrix of $Y_i$ that can be expressed as $ \Sigma_i(\rho )=A^{\frac{1}{2}}_iC_i(\rho )A^{\frac{1}{2}}_i$, with $A_i=\mbox{diag}[\sigma_{i11},\ldots,\sigma_{itt},\ldots,\sigma_{iTT}]$ and $C_i(\rho)$ as the correlation matrix of $Y_i,$ $\rho$ being a correlation index parameter.
Note that using the GQL estimating equation (8) requires the structure of the correlation matrix $C_i(\rho)$ to be known, but this structure is usually unknown in practice. To overcome this difficulty, Sutradhar (2003) suggested a general stationary autocorrelation structure given by $$ C_i(\rho)=\left[ \begin{array}{ccccc} 1 & \rho_1 & \rho_2 & \cdots & \rho_{T-1} \\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{T-2} \\ \vdots & \vdots & \vdots && \vdots \\ \rho_{T-1} & \rho_{T-2} & \rho_{T-3} & \cdots & 1 \\ \end{array} \right] , \tag{9}$$
(see also Sutradhar and Das (1999, Section 3)), for all $i=1,\ldots,K.$ Here, for $\ell=1,\ldots,T-1,$ $\rho_\ell$ represents the lag $\ell$ autocorrelation. The lag correlations can be estimated consistently by the method of moments. For $\ell =|u-t|$, $u\neq t$, $u, t=1,\ldots ,T$, the moment estimator of the lag $\ell$ autocorrelation $\rho_{\ell}$ is $$ \hat{\rho}_\ell = \frac{\sum^K_{i=1}\sum^{T-\ell}_{t=1}\tilde{y}_{it}\tilde{y}_{i,t+\ell} /[K(T-\ell )]}{\sum^K_{i=1}\sum^T_{t=1}\tilde{y}^2_{it}/(KT)} , \tag{10}$$
(Sutradhar and Kovacevic (2000, eqn. (2.18)), Sutradhar (2003)), where $\tilde{y}_{it}$ is the standardized residual, defined as $ \tilde{y}_{it}=(y_{it}-\mu_{it})/\{\sigma_{itt} \}^{\frac{1}{2}}$.
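Equation (10) translates directly into code. The sketch below assumes balanced data stored as $K\times T$ nested lists. The function names are hypothetical.

```python
import math


def standardized_residuals(y, mu, var):
    """tilde y_it = (y_it - mu_it) / sigma_itt^{1/2}; all arguments are K x T nested lists."""
    return [[(yt - mt) / math.sqrt(vt) for yt, mt, vt in zip(yi, mi, vi)]
            for yi, mi, vi in zip(y, mu, var)]


def lag_correlations(r):
    """Moment estimates (10) of rho_1, ..., rho_{T-1} from K x T standardized residuals."""
    K, T = len(r), len(r[0])
    denom = sum(v * v for ri in r for v in ri) / (K * T)  # sum of squares / (K T)
    rho = []
    for lag in range(1, T):
        # sum over individuals and the T - lag available pairs at this lag
        num = sum(ri[t] * ri[t + lag] for ri in r for t in range(T - lag))
        rho.append(num / (K * (T - lag)) / denom)
    return rho


print(lag_correlations([[1.0, -1.0, 1.0], [1.0, 1.0, -1.0]]))
```

For the two residual vectors in the example, the squared residuals average to 1, the lag-1 cross products sum to $-2$ over $K(T-1)=4$ pairs, and the lag-2 cross products cancel. Formula (10) therefore gives $\hat\rho_1=-0.5$ and $\hat\rho_2=0$.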
The GQL estimating equation (8) for $\beta$ and the moment estimator (10) of $\rho_\ell$ are solved iteratively until convergence. The final estimate of $\beta$ obtained from this iterative process is referred to as the GQL estimate of $\beta$ and is denoted by $\hat{\beta}_{GQL}.$ This estimator is consistent for $\beta$ and highly efficient. The ML estimator would be fully efficient, but it is impossible or extremely complex to obtain in the correlated-data setup.
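The alternation between (8) and (10) can be sketched for the Poisson case with log link, $\mu_{it}=\exp(x'_{it}\beta)$ and $\sigma_{itt}=\mu_{it}$. The sketch assumes balanced data, a Fisher-scoring update for (8) with $\rho$ held fixed within each step, and the stationary structure (9) for $C_i(\rho)$. The function names and the zero starting value are hypothetical. The code also includes no safeguard for estimated correlations that would make $C_i(\rho)$ singular or non-positive-definite.

```python
import math


def solve(A, B):
    """Solve A Z = B (A: n x n, B: n x m) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[r]] + [float(v) for v in B[r]] for r in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def lag_correlations(r):
    """Moment estimates (10) of rho_1, ..., rho_{T-1} from K x T standardized residuals."""
    K, T = len(r), len(r[0])
    denom = sum(v * v for ri in r for v in ri) / (K * T)
    return [sum(ri[t] * ri[t + l] for ri in r for t in range(T - l)) / (K * (T - l)) / denom
            for l in range(1, T)]


def gql_poisson(X, y, tol=1e-10, max_iter=100):
    """Iterate between the GQL equation (8) and the moment estimator (10).

    X[i][t] is the covariate vector of y[i][t]. Returns (beta_hat, rho_hat), where
    rho_hat is computed at the beta value preceding the final (negligible) update.
    """
    K, T, p = len(y), len(y[0]), len(X[0][0])
    beta = [0.0] * p
    rho = [0.0] * (T - 1)
    for _ in range(max_iter):
        mu = [[math.exp(sum(b * v for b, v in zip(beta, X[i][t]))) for t in range(T)]
              for i in range(K)]
        resid = [[(y[i][t] - mu[i][t]) / math.sqrt(mu[i][t]) for t in range(T)]
                 for i in range(K)]
        rho = lag_correlations(resid)  # update rho by (10) at the current beta
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for i in range(K):
            # Sigma_i = A_i^{1/2} C(rho) A_i^{1/2} with C(rho) the Toeplitz matrix (9)
            sigma = [[math.sqrt(mu[i][t] * mu[i][u]) * (1.0 if t == u else rho[abs(t - u) - 1])
                      for u in range(T)] for t in range(T)]
            # row t of (d mu_i' / d beta)' is mu_it * x_it'
            dt = [[mu[i][t] * X[i][t][j] for j in range(p)] for t in range(T)]
            z = solve(sigma, [dt[t] + [y[i][t] - mu[i][t]] for t in range(T)])
            for j in range(p):
                score[j] += sum(dt[t][j] * z[t][p] for t in range(T))
                for k in range(p):
                    info[j][k] += sum(dt[t][j] * z[t][k] for t in range(T))
        # one Fisher-scoring step for (8) with rho held fixed
        step = [row[0] for row in solve(info, [[s] for s in score])]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta, rho
```

As a check, consider $T=2$ with an intercept-only design. The vector $(1,1)'$ is then an eigenvector of $C_i(\rho)$, so (8) reduces to $\sum_i\sum_t(y_{it}-\mu_{i\cdot})=0$ whatever the value of $\hat\rho_1$. For example, `gql_poisson([[[1.0], [1.0]]] * 5, [[2, 3], [1, 1], [3, 4], [0, 2], [4, 5]])` returns $\hat\beta_{GQL}=\log 2.5$, the log of the overall mean.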
Regarding the generality of the stationary autocorrelation matrix $C_i(\rho)$ in (9), one may show that this matrix represents the correlations of many stationary dynamic models. These include the stationary autoregressive model of order 1 (AR(1)), the stationary moving average model of order 1 (MA(1)), and the stationary equi-correlation (EQC) model. For example, consider the stationary AR(1) model given by $$ y_{it}=\rho * y_{i,t-1}+d_{it}, \tag{11}$$
(McKenzie (1988), Sutradhar (2003)) where it is assumed that for given $y_{i,t-1}$, $\rho * y_{i,t-1}$ denotes the so-called binomial thinning operation (McKenzie, 1988). That is, $$ \rho * y_{i,t-1} = \sum^{y_{i,t-1}}_{j=1}b_j(\rho ) = z_{i,t-1}, {\mbox{say}}, \tag{12}$$
where the $b_j(\rho)$ are independent Bernoulli variables with $\Pr [b_j(\rho )=1]=\rho$ and $\Pr [b_j(\rho )=0]=1-\rho$. Furthermore, it is assumed in (11) that $y_{i1}$ follows the Poisson distribution with mean parameter $\mu_{i\cdot},$ that is, $y_{i1}\sim Poi(\mu_{i\cdot}).$ Here $\mu_{i\cdot}=\exp(x'_{i\cdot}\beta),$ with a stationary covariate vector $x_{i\cdot}$ such that $x_{it}=x_{i\cdot}$ for all $t=1,\ldots,T.$ Also, in (11), $d_{it} \sim Poi(\mu_{i\cdot}(1-\rho ))$ and is independent of $z_{i,t-1}.$ This model yields the mean, variance and autocorrelations of the data shown in Table 1. The table also lists the MA(1) and EQC models and their basic properties, including their correlation structures.
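A small simulation sketch of the Poisson AR(1) model (11)-(12) makes the binomial thinning operation concrete. For simplicity, it assumes a stationary mean $\mu_{i\cdot}=\mu$ common to all individuals. Poisson variates are drawn with Knuth's multiplication method rather than a library routine. The helper names, seed, $\mu=2$, $\rho=0.5$, $K=20000$ and $T=4$ are illustrative choices.

```python
import math
import random


def poisson_draw(lam, rng):
    """Poisson(lam) draw by Knuth's multiplication method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def thin(n, rho, rng):
    """Binomial thinning rho * n: sum of n independent Bernoulli(rho) variables b_j."""
    return sum(1 for _ in range(n) if rng.random() < rho)


def simulate_ar1(mu, rho, T, rng):
    """One series: y_1 ~ Poi(mu), y_t = rho * y_{t-1} + d_t with d_t ~ Poi(mu (1 - rho))."""
    y = [poisson_draw(mu, rng)]
    for _ in range(1, T):
        y.append(thin(y[-1], rho, rng) + poisson_draw(mu * (1.0 - rho), rng))
    return y


def corr(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(1)
data = [simulate_ar1(2.0, 0.5, 4, rng) for _ in range(20000)]  # K = 20000, T = 4


def col(t):
    """Responses of all individuals at time index t (0-based)."""
    return [yi[t] for yi in data]


print(sum(col(3)) / len(data))  # close to mu = 2
print(corr(col(0), col(1)))     # close to rho = 0.5
print(corr(col(0), col(2)))     # close to rho**2 = 0.25
```

Because thinning a $Poi(\mu)$ count gives a $Poi(\rho\mu)$ count, adding an independent $Poi(\mu(1-\rho))$ innovation keeps every $Y_{it}$ distributed as $Poi(\mu)$. The sample lag correlations should therefore approach $\rho^{\ell}$, as listed for AR(1) in Table 1.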
Table 1. A class of stationary correlation models for longitudinal count data and their basic properties.
Model | Dynamic relationship | Mean, variance and correlations |
---|---|---|
AR(1) | $y_{it}=\rho * y_{i,t-1}+d_{it},\ t=2,\ldots$; $y_{i1}\sim Poi(\mu_{i\cdot})$; $d_{it} \sim Poi(\mu_{i\cdot}(1-\rho )),\ t=2,\ldots$ | $E[Y_{it}]=\mu_{i\cdot}$; $\mbox{var}[Y_{it}]=\mu_{i\cdot}$; $\mbox{corr}[Y_{it},Y_{i,t+\ell}]=\rho_{\ell}=\rho^{\ell}$ |
MA(1) | $y_{it}=\rho * d_{i,t-1}+d_{it},\ t=2,\ldots$; $y_{i1}=d_{i1} \sim Poi(\mu_{i\cdot}/(1+\rho))$; $d_{it} \sim Poi(\mu_{i\cdot}/(1+\rho )),\ t=2,\ldots$ | $E[Y_{it}]=\mu_{i\cdot}$; $\mbox{var}[Y_{it}]=\mu_{i\cdot}$; $\mbox{corr}[Y_{it},Y_{i,t+\ell}]=\rho_{\ell}= \left\{ \begin{array}{ll} \frac{\rho}{1+\rho} & \mbox{for } \ell=1\\ 0 & \mbox{otherwise} \end{array} \right.$ |
EQC | $y_{it}=\rho * y_{i1}+d_{it},\ t=2,\ldots$; $y_{i1}\sim Poi(\mu_{i\cdot})$; $d_{it} \sim Poi(\mu_{i\cdot}(1-\rho )),\ t=2,\ldots$ | $E[Y_{it}]=\mu_{i\cdot}$; $\mbox{var}[Y_{it}]=\mu_{i\cdot}$; $\mbox{corr}[Y_{it},Y_{i,t+\ell}]=\rho_{\ell}=\rho$ |
It is clear from Table 1 that the correlation structures of all three processes can be represented by $C_i(\rho)$ in (9). Following Qaqish (2003), one may construct similar, though different, dynamic models for repeated binary data whose correlation structures are also represented by $C_i(\rho).$ Thus, suppose count or binary data follow an autocorrelation model of this type. The regression vector can then be estimated consistently and efficiently by solving the GQL estimating equation (8) based on the general autocorrelation matrix (9), with the lag correlations estimated consistently by (10).
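The claim that all three processes in Table 1 share the form (9) can be illustrated by building $C_i(\rho)$ from each model's lag correlations. The sketch below uses hypothetical function names and hypothetical model labels `"AR1"`, `"MA1"` and `"EQC"`.

```python
def lag_correlation(model, rho, lag):
    """rho_ell from Table 1 for the count models 'AR1', 'MA1' and 'EQC' (lag >= 1)."""
    if model == "AR1":
        return rho ** lag
    if model == "MA1":
        return rho / (1.0 + rho) if lag == 1 else 0.0
    if model == "EQC":
        return rho
    raise ValueError(f"unknown model: {model}")


def stationary_corr_matrix(rho_lags):
    """C(rho) of (9): unit diagonal and entry (t, u) equal to rho_{|u - t|}."""
    T = len(rho_lags) + 1
    return [[1.0 if t == u else rho_lags[abs(t - u) - 1] for u in range(T)]
            for t in range(T)]


def model_corr_matrix(model, rho, T):
    """T x T correlation matrix implied by one of the Table 1 models."""
    return stationary_corr_matrix([lag_correlation(model, rho, l) for l in range(1, T)])


for name in ("AR1", "MA1", "EQC"):
    print(name, model_corr_matrix(name, 0.5, 4)[0])  # first row of each matrix
```

Only the vector of lag correlations differs between the models; the Toeplitz layout of (9) is common to all three. This is why the single GQL equation (8) covers them without knowing which model generated the data.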
[* Reprinted with permission from Lovric, Miodrag (2011), International
Encyclopedia of Statistical Science. Heidelberg: Springer Science & Business
Media, LLC]
References
[1] McCullagh, P. (1983). Quasilikelihood functions. Annals of Statistics 11, 59-67.
[2] McKenzie, E. (1988). Some ARMA models for dependent sequences of Poisson counts. Advances in Applied Probability 20, 822-835.
[3] Qaqish, B. F. (2003). A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations. Biometrika 90, 455-463.
[4] Sutradhar, B. C. (2003). An overview on regression models for discrete longitudinal responses. Statistical Science 18, 377-393.
[5] Sutradhar, B. C. & Das, K. (1999). On the efficiency of regression estimators in generalized linear models for longitudinal data. Biometrika 86, 459-465.
[6] Sutradhar, B. C. & Kovacevic, M. (2000). Analyzing ordinal longitudinal survey data: Generalized estimating equations approach. Biometrika 87, 837-848.
[7] Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalised linear models, and the Gauss-Newton method. Biometrika 61, 439-447.
Generalized quasi-likelihood. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Generalized_quasi-likelihood&oldid=37747