Cox regression model
A regression model introduced by D.R. Cox [a4] that has subsequently proved to be one of the most useful and versatile statistical models, in particular with regard to applications in survival analysis (cf. also Regression analysis).
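Let $T_1, \dots, T_n$ be stochastically independent, strictly positive random variables (cf. also Random variable), to be thought of as the failure times of $n$ different items, such that $T_i$ has hazard function $\alpha_i(t)$ (i.e.

$$\mathbf{P}(T_i > t) = \exp\left(-\int_0^t \alpha_i(s)\,ds\right)$$

for $t > 0$) of the form

$$\alpha_i(t) = \alpha(t)\, e^{\beta^{\mathsf T} z_i(t)}.$$

Here, $\alpha$ is an unknown hazard function, the baseline hazard obtained if $z_i \equiv 0$, and $\beta = (\beta_1, \dots, \beta_p)^{\mathsf T}$ is a vector of $p$ unknown regression parameters. The $z_i(t) = (z_{i1}(t), \dots, z_{ip}(t))^{\mathsf T}$ denote known non-random vectors of possibly time-dependent covariates, e.g. individual characteristics of a patient referring to age, sex, method of treatment as well as physiological and other measurements.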
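To make the proportional-hazards form concrete, the following Python sketch evaluates $\alpha_i(t)$ for two items with time-fixed covariates. The Weibull-type baseline $\alpha(t) = 1.5\,t^{0.5}$, the parameter values and all function names are hypothetical choices made only for illustration. The sketch shows that the ratio of the two hazards does not depend on $t$ (the baseline cancels), and that numerically integrating the hazard reproduces $\mathbf{P}(T_i > t)$.

```python
import math

def weibull_baseline(t, shape=1.5):
    """Hypothetical baseline hazard alpha(t) = shape * t**(shape - 1),
    a Weibull form chosen only for illustration."""
    return shape * t ** (shape - 1)

def cox_hazard(t, z, beta, baseline=weibull_baseline):
    """alpha_i(t) = alpha(t) * exp(beta^T z) for a time-fixed covariate vector z."""
    linear_predictor = sum(b * x for b, x in zip(beta, z))
    return baseline(t) * math.exp(linear_predictor)

def survival(t, z, beta, baseline=weibull_baseline, steps=10_000):
    """P(T_i > t) = exp(-int_0^t alpha_i(s) ds), integral by the midpoint rule."""
    h = t / steps
    integral = h * sum(cox_hazard((k + 0.5) * h, z, beta, baseline) for k in range(steps))
    return math.exp(-integral)

beta = [0.7, -0.2]                   # hypothetical regression parameters
z1, z2 = [1.0, 2.0], [0.0, 2.0]      # two items differing only in the first covariate

for t in (0.5, 1.0, 2.0):
    # The baseline alpha(t) cancels, so the ratio is exp(0.7) at every t.
    print(t, cox_hazard(t, z1, beta) / cox_hazard(t, z2, beta))

# Closed form for this baseline: P(T > t) = exp(-t**1.5 * exp(beta^T z)).
print(survival(1.3, z1, beta), math.exp(-1.3 ** 1.5 * math.exp(0.3)))
```

The time-invariance of the hazard ratio holds only because the covariates here are time-fixed; with time-dependent $z_i(t)$ the ratio would change with $t$.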
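The parameter vector $\beta$ is estimated by maximizing the partial likelihood [a5]

$$C(\beta) = \prod_{j=1}^{n} \frac{e^{\beta^{\mathsf T} z_{\xi_j}(\tau_j)}}{\sum_{i \in R_j} e^{\beta^{\mathsf T} z_i(\tau_j)}}, \tag{a1}$$

where $\tau_1 < \dots < \tau_n$ are the $T_i$ ordered according to size, $\xi_j = i$ if it is item $i$ that fails at time $\tau_j$, and $R_j$ denotes the set of items $i$ still at risk, i.e. not yet failed, immediately before $\tau_j$. With this setup, the $j$th factor in $C(\beta)$ describes the conditional distribution of $\xi_j$ given $\tau_1, \dots, \tau_j$ and $\xi_1, \dots, \xi_{j-1}$.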
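As an illustration of (a1), here is a minimal Python sketch that evaluates $\log C(\beta)$ and maximizes it by Newton-Raphson. It assumes, for simplicity, a single time-fixed scalar covariate per item, all items observed to fail, and distinct failure times; the data and function names are hypothetical. Since the second derivative of $\log C(\beta)$ is minus a weighted variance of the covariate over the risk sets, $\log C$ is concave in $\beta$ and Newton-Raphson is a natural choice.

```python
import math

def log_partial_likelihood(times, z, beta):
    """log C(beta) from (a1), assuming every item is observed to fail, all failure
    times are distinct and each item has one time-fixed scalar covariate z[i]."""
    order = sorted(range(len(times)), key=lambda i: times[i])  # items in order tau_1 < ... < tau_n
    total = 0.0
    for j, failed in enumerate(order):
        risk_set = order[j:]  # R_j: items not yet failed just before tau_j
        total += beta * z[failed] - math.log(sum(math.exp(beta * z[i]) for i in risk_set))
    return total

def score_and_information(times, z, beta):
    """First derivative U(beta) of log C(beta) and minus its second derivative I(beta)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    score = information = 0.0
    for j, failed in enumerate(order):
        weights = [math.exp(beta * z[i]) for i in order[j:]]
        covs = [z[i] for i in order[j:]]
        s0 = sum(weights)
        mean = sum(w * c for w, c in zip(weights, covs)) / s0  # weighted mean of z over R_j
        second = sum(w * c * c for w, c in zip(weights, covs)) / s0
        score += z[failed] - mean
        information += second - mean ** 2  # weighted variance of z over R_j
    return score, information

def fit_beta(times, z, beta=0.0, tol=1e-10, max_iter=50):
    """Maximise log C(beta) by Newton-Raphson (log C is concave in beta)."""
    for _ in range(max_iter):
        score, information = score_and_information(times, z, beta)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta

times = [2.0, 5.0, 1.0, 7.0, 3.0, 4.0]  # hypothetical failure times
z = [1, 0, 1, 0, 0, 1]                  # hypothetical binary covariate (e.g. treatment group)
beta_hat = fit_beta(times, z)
print(beta_hat, log_partial_likelihood(times, z, beta_hat))
```

A quick sanity check: at $\beta = 0$ every factor of (a1) reduces to $1/|R_j|$, so here $\log C(0) = -\log 6!$. If the covariate perfectly separated early from late failures, the maximum would not be attained at a finite $\beta$ and the iteration would drift off; the example data avoid this.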
For many applications it is natural to allow for, e.g., censorings (cf. also Errors, theory of) or truncations (the removal of an item from observation through other causes than failure) as well as random covariate processes $Z_i$. Formally this may be done by introducing the counting processes

$$N_i(t) = 1_{\{T_i \le t,\ I_i(T_i) = 1\}}$$

registering the failures if they are observed, where $I_i$ is a $\{0,1\}$-valued stochastic process with $I_i(t) = 1$ if item $i$ is at risk (under observation) just before time $t$. If $\mathcal{F}_t$ denotes the $\sigma$-algebra for everything observed (failures, censorings, covariate values, etc.) on the time interval $[0, t]$, it is then required that $N_i$ have $\mathcal{F}_t$-intensity process

$$\lambda_i(t) = \alpha(t)\, e^{\beta^{\mathsf T} Z_i(t)}\, I_i(t), \tag{a2}$$

i.e. $M_i(t) = N_i(t) - \int_0^t \lambda_i(s)\,ds$ defines an $\mathcal{F}_t$-martingale (cf. also Martingale), while intuitively, for small $h > 0$, the conditional probability given the past that item $i$ will fail during the interval $(t, t+h]$ is approximately $\lambda_i(t)\,h$, provided $i$ is at risk at time $t$.

For $\beta$ known, (a2) is then an example of Aalen's multiplicative intensity model [a1], with the integrated baseline hazard $A(t) = \int_0^t \alpha(s)\,ds$ estimated, for any $t$, by

$$\hat A(t) = \int_0^t \frac{dN(s)}{\sum_{i=1}^{n} e^{\beta^{\mathsf T} Z_i(s-)}\, I_i(s)}, \tag{a3}$$

writing $N = \sum_{i=1}^{n} N_i$, and where $Z_i(s-)$ signifies that the covariate values just before the observed failure times should be used (by definition, $I_i(s)$ already refers to the time just before $s$).

Since in practice $\beta$ is unknown, in (a3) one of course has to replace $\beta$ by the estimator $\hat\beta$, still obtained by maximizing the partial likelihood (a1), but now replacing $n$ by the random number of observed failures, replacing $z_i(\tau_j)$ by $Z_i(\tau_j-)$, and using $R_j = \{i : I_i(\tau_j) = 1\}$, with $\tau_j$ now the $j$th observed failure time. (Note that, in contrast to the situation with non-random covariates described above, the factors in $C(\beta)$ no longer have an interpretation as conditional distributions.)
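The two-step procedure just described (first $\hat\beta$ from the partial likelihood over observed failures, then $\hat A$ from (a3) with $\hat\beta$ plugged in, often called the Breslow estimator) can be sketched in Python as follows. This is a minimal sketch under simplifying assumptions: right-censoring only, so that $I_i(t) = 1$ exactly when the observed time of item $i$ is $\ge t$; distinct failure times; and one time-fixed scalar covariate per item. The data and function names are hypothetical.

```python
import math

def risk_sets(times, events):
    """List of (failing item, R_j) for the observed failures in time order.

    Assumes right-censoring only and distinct failure times, so that
    I_i(tau_j) = 1 exactly when times[i] >= tau_j."""
    failures = sorted((times[i], i) for i in range(len(times)) if events[i] == 1)
    return [(i, [k for k in range(len(times)) if times[k] >= t]) for t, i in failures]

def fit_beta(times, events, z, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson for the partial likelihood (a1), with the product taken over
    observed failures only and one time-fixed scalar covariate z[i] per item."""
    sets = risk_sets(times, events)
    for _ in range(max_iter):
        score = information = 0.0
        for failed, r in sets:
            weights = [math.exp(beta * z[k]) for k in r]
            s0 = sum(weights)
            m1 = sum(w * z[k] for w, k in zip(weights, r)) / s0
            m2 = sum(w * z[k] ** 2 for w, k in zip(weights, r)) / s0
            score += z[failed] - m1
            information += m2 - m1 ** 2
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta

def cumulative_baseline_hazard(t, times, events, z, beta):
    """Estimator (a3) of A(t): each observed failure tau_j <= t contributes
    dN = 1 divided by sum_{i in R_j} exp(beta * z_i)."""
    return sum(
        1.0 / sum(math.exp(beta * z[k]) for k in r)
        for failed, r in risk_sets(times, events)
        if times[failed] <= t
    )

times = [2.0, 5.0, 1.0, 7.0, 3.0, 4.0, 6.0]  # hypothetical observed times
events = [1, 0, 1, 1, 1, 1, 0]               # 1 = failure observed, 0 = censored
z = [1, 0, 1, 0, 0, 1, 1]                    # hypothetical binary covariate
beta_hat = fit_beta(times, events, z)
for t in (1.0, 3.0, 7.0):
    print(t, cumulative_baseline_hazard(t, times, events, z, beta_hat))
```

With $\beta = 0$ the estimator (a3) reduces to the Nelson-Aalen estimator $\sum_{\tau_j \le t} 1/|R_j|$, which gives a simple check of the code. Handling ties or time-dependent covariates $Z_i(\tau_j-)$ would require further bookkeeping not shown here.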
Using central limit theorems for martingales (cf. also Central limit theorem: Martingale), conditions may be given for consistency and asymptotic normality of the estimators $\hat\beta$ and $\hat A$; see [a3].
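For orientation only, the following display sketches the form such a result takes for $\hat\beta$; the precise regularity conditions are those of [a3] and are not reproduced here, and the symbols $\mathcal{I}(\beta)$ (observed information) and $\Sigma$ (a positive-definite limiting information matrix) are introduced just for this summary:

$$\mathcal{I}(\beta) = -\frac{\partial^2 \log C(\beta)}{\partial\beta\,\partial\beta^{\mathsf T}}, \qquad \sqrt{n}\,\bigl(\hat\beta - \beta\bigr) \xrightarrow{\ d\ } N_p\bigl(0, \Sigma^{-1}\bigr), \qquad \frac{1}{n}\,\mathcal{I}(\hat\beta) \xrightarrow{\ P\ } \Sigma.$$

In practice this is used by treating $\hat\beta$ as approximately $N_p\bigl(\beta, \mathcal{I}(\hat\beta)^{-1}\bigr)$, which is the basis of the Wald statistics mentioned in the next paragraph.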
It is of particular interest to be able to test for the effect of one or more covariates, i.e. to test hypotheses of the form $\beta_k = 0$ for one or more given values of $k$, $1 \le k \le p$. Such tests include likelihood-ratio tests derived from the partial likelihood (cf. also Likelihood-ratio test) and Wald test statistics based on the asymptotic normality of $\hat\beta$. A thorough discussion of these tests in particular, and of the Cox regression model in general, is contained in [a2], Sect. VII.2; [a2], Sect. VII.3, presents methods for checking the proportional hazards structure assumed in (a2).
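For the simplest case $p = 1$, both tests of $\beta = 0$ can be sketched directly from the partial likelihood. The Python sketch below assumes right-censored data with distinct failure times and one time-fixed scalar covariate, and refers both statistics to the $\chi^2_1$ distribution, using $\mathbf{P}(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$. The likelihood-ratio statistic is $2(\log C(\hat\beta) - \log C(0))$ and the Wald statistic is $\hat\beta^2\,\mathcal{I}(\hat\beta)$, with $\mathcal{I}$ minus the second derivative of $\log C$. Data and function names are hypothetical.

```python
import math

def risk_sets(times, events):
    """(failing item, R_j) for each observed failure in time order, assuming right-censoring
    only and distinct failure times; R_j = {i : times[i] >= tau_j}."""
    failures = sorted((times[i], i) for i in range(len(times)) if events[i] == 1)
    return [(i, [k for k in range(len(times)) if times[k] >= t]) for t, i in failures]

def loglik_score_info(times, events, z, beta):
    """log C(beta), its derivative and minus its second derivative, for one scalar covariate."""
    loglik = score = information = 0.0
    for failed, r in risk_sets(times, events):
        weights = [math.exp(beta * z[k]) for k in r]
        s0 = sum(weights)
        m1 = sum(w * z[k] for w, k in zip(weights, r)) / s0
        m2 = sum(w * z[k] ** 2 for w, k in zip(weights, r)) / s0
        loglik += beta * z[failed] - math.log(s0)
        score += z[failed] - m1
        information += m2 - m1 ** 2
    return loglik, score, information

def fit_beta(times, events, z, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson maximisation of log C(beta)."""
    for _ in range(max_iter):
        _, score, information = loglik_score_info(times, events, z, beta)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta

def chi2_1_sf(x):
    """Upper tail P(X > x) of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))

def wald_and_lr(times, events, z):
    """Likelihood-ratio and Wald tests of the hypothesis beta = 0."""
    beta_hat = fit_beta(times, events, z)
    loglik_hat, _, info_hat = loglik_score_info(times, events, z, beta_hat)
    loglik_0, _, _ = loglik_score_info(times, events, z, 0.0)
    lr = 2.0 * (loglik_hat - loglik_0)
    wald = beta_hat ** 2 * info_hat  # equals (beta_hat / se)^2 with se = info_hat ** -0.5
    return {"beta_hat": beta_hat, "se": 1.0 / math.sqrt(info_hat),
            "lr": lr, "lr_p": chi2_1_sf(lr), "wald": wald, "wald_p": chi2_1_sf(wald)}

times = [2.0, 5.0, 1.0, 7.0, 3.0, 4.0, 6.0]  # hypothetical observed times
events = [1, 0, 1, 1, 1, 1, 0]               # 1 = failure observed, 0 = censored
z = [1, 0, 1, 0, 0, 1, 1]                    # hypothetical binary covariate
print(wald_and_lr(times, events, z))
```

With so few observations the $\chi^2_1$ approximation is of course only indicative; the sketch is meant to show how the statistics are formed, not to give reliable $p$-values.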
Refinements of the model (a2) include models for handling, e.g., stratified data, Markov chains with regression structures for the transition intensities, etc. It should be emphasized that these models, including (a2), are only partially specified, in the sense that (a2) alone says little about the distributions of the $Z_i$ or $I_i$. This, in particular, makes it extremely difficult to use the models for, e.g., the prediction of survival times.
References
[a1] O.O. Aalen, "Nonparametric inference for a family of counting processes", Ann. Statist., 6 (1978), pp. 701–726
[a2] P.K.A. Andersen, Ø. Borgan, R.D. Gill, N. Keiding, "Statistical models based on counting processes", Springer (1993)
[a3] P.K.A. Andersen, R.D. Gill, "Cox's regression model for counting processes: A large sample study", Ann. Statist., 10 (1982), pp. 1100–1120
[a4] D.R. Cox, "Regression models and life-tables (with discussion)", J. Royal Statist. Soc. B, 34 (1972), pp. 187–220
[a5] D.R. Cox, "Partial likelihood", Biometrika, 62 (1975), pp. 269–276
Cox regression model. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Cox_regression_model&oldid=16071