Survival analysis has been widely used in medical science ec

2019-12-02

Survival analysis has been widely used in medical science, economics, finance, and social science, among others. In many studies, the primary outcomes or responses in survival data are subject to censoring. The Cox model [7], [8] is the most commonly used regression model for survival data, and the partial likelihood method has become a standard approach to parameter estimation and statistical inference. Recently, variable selection and parameter estimation in Cox regression models have been considered by various authors (see, e.g., [4], [9], [14], [18], [19], [30]). Huang et al. [15] studied the penalized partial likelihood with the $L_1$-penalty for the Cox model with high-dimensional covariates. Yan and Huang [28] proposed the adaptive group Lasso in a Cox regression model with time-varying coefficients. However, they have not considered varying-coefficient models.

In this paper, we propose a new feature screening procedure for ultrahigh-dimensional varying-coefficient Cox models. It is distinguished from SIS procedures [11], [32] in that the proposed procedure is based on the joint partial likelihood of potentially important features, rather than the marginal partial likelihood of individual features. Xu and Chen [27] proposed a joint screening procedure and showed its advantage over SIS procedures in the context of generalized linear models. Yang et al. [29] extended the procedures in [27] to Cox models. This work further extends the joint screening strategy and develops a feature screening procedure for varying-coefficient Cox models, which are natural extensions of Cox models and can be useful for exploring nonlinear interaction effects between a primary covariate and other covariates.

The asymptotic properties of the proposed procedure are studied systematically. It is technically challenging to establish its sure screening property.
The techniques used in [29] and other works related to SIS procedures cannot be applied to the present setting. We first develop Hoeffding's inequality for a sequence of martingale differences and then establish a concentration inequality for the score function of a partial likelihood. Based on the concentration inequality, we prove the sure screening property of the proposed joint screening procedure. We also conduct simulation studies to assess the finite-sample performance of the proposed procedure and compare it with existing sure screening procedures for ultrahigh-dimensional survival data. The proposed methodology is demonstrated through an empirical analysis of a genomic data set.

The rest of this paper is organized as follows. In Section 2, we propose a new feature screening procedure for the varying-coefficient Cox model, develop an algorithm to carry it out, and demonstrate the ascent property of the proposed algorithm. We also study the sampling property of the proposed procedure and establish its sure screening property. In Section 3, we present numerical comparisons and an empirical analysis of a real data set. Discussion is in Section 4. Technical proofs are in the Appendix.
New feature screening procedure for varying-coefficient Cox model

Let $T$ be the survival time, and let $\mathbf{x}$ and $u$ be a $p$-dimensional covariate vector and a univariate covariate, respectively. Throughout this paper, we consider the varying-coefficient Cox proportional hazards model given by

$$h(t \mid \mathbf{x}, u) = h_0(t) \exp\{\mathbf{x}^\top \boldsymbol{\alpha}(u)\}, \tag{1}$$

where $h_0(t)$ is an unspecified baseline hazard function and $\boldsymbol{\alpha}(u) = (\alpha_1(u), \ldots, \alpha_p(u))^\top$ consists of the unknown nonparametric coefficient functions. It is assumed that the support of $u$ is finite and denoted by $[a, b]$.

In survival data analysis, survival times are subject to a censoring time $C$. Denote the observed time by $Z = \min\{T, C\}$ and the event indicator by $\delta = I(T \le C)$. It is assumed throughout this paper that the censoring mechanism is noninformative. That is, given $\mathbf{x}$ and $u$, $T$ and $C$ are conditionally independent.

Suppose that $\{(\mathbf{x}_i, u_i, Z_i, \delta_i) : i = 1, \ldots, n\}$ is a random sample from model (1). Let $t_1^0 < \cdots < t_N^0$ be the ordered observed failure times. Let $(j)$ be the label for the subject failing at time $t_j^0$, so that the covariates associated with the $N$ failures are $\mathbf{x}_{(1)}, \ldots, \mathbf{x}_{(N)}$ and $u_{(1)}, \ldots, u_{(N)}$. Denote the risk set right before time $t_j^0$ by $R_j = \{i : Z_i \ge t_j^0\}$. The partial likelihood function [8] of the random sample is