Consistent bootstrap procedures for post-model-selection inference


smslee - Posted on 01 December 2015

Project Description: 

Statistical inference is often carried out under a model pre-selected by a data-driven procedure, as if this model were the “correct” model. The prevalence of this practice in applied work necessitates a careful evaluation of the effects of model selection on conclusions drawn from standard statistical inference. Recent studies of the distributions of post-model-selection estimators and predictors have afforded better insight into the issue: these distributions typically take a complicated form that departs from the “classical” Gaussian protocol, which points to the possible invalidity of standard post-model-selection inference. The problem is particularly acute in regression studies, where inference is often based on a “final” model derived from a data-driven variable selection procedure, without taking into account the uncertainty induced by the selection process. This calls into question the validity of standard regression inference tools such as t-tests and ANOVA, from which misleading conclusions may be drawn. It is therefore important to develop accurate estimates of the “complicated” distributions of post-model-selection estimators, a task for which the conventional asymptotic approach has been found to be particularly ill-equipped. Standard resampling procedures such as the paired and residual bootstraps do not provide a satisfactory solution either, as they often turn out to be inconsistent. This project aims to develop a more sustainable bootstrap-based procedure for estimating the distributions of post-model-selection estimators, one that suffers less from the aforementioned difficulties. The proposed procedure is novel and can be viewed as an adaptive modification of existing bootstrap methods. Its performance will be investigated theoretically and empirically under settings where estimation of the distribution is found to be difficult.
The project, on successful completion, will deliver a more reliable inference scheme for post-model-selection regression analysis, which will find useful applications in a wide spectrum of quantitative studies.
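
For illustration only, the following is a minimal sketch of the two standard resampling schemes named above (the paired and residual bootstraps), applied to a toy post-model-selection least squares estimator. The selection rule (a t-test on a single candidate regressor with cutoff `t_crit`), all function names, and the simulated data are assumptions made for this example. They are not taken from the project, and the sketch is not the proposed adaptive procedure.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def post_selection_estimate(X, y, t_crit=2.0):
    """Hypothetical selection rule: keep the last column of X only if its
    t-statistic exceeds t_crit in absolute value, then refit by least squares.
    Returns a full-length coefficient vector (0.0 for a dropped regressor)."""
    n, p = X.shape
    beta, resid = ols(X, y)
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_last = beta[-1] / np.sqrt(cov[-1, -1])
    if abs(t_last) > t_crit:
        return beta
    beta_small, _ = ols(X[:, :-1], y)
    return np.append(beta_small, 0.0)

def paired_bootstrap(X, y, n_boot, rng):
    """Resample (x_i, y_i) pairs; repeat selection and fitting on each resample."""
    n = len(y)
    out = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        out[b] = post_selection_estimate(X[idx], y[idx])
    return out

def residual_bootstrap(X, y, n_boot, rng):
    """Resample centred full-model residuals with X held fixed;
    repeat selection and fitting on each synthetic response."""
    n = len(y)
    beta_full, resid = ols(X, y)
    resid = resid - resid.mean()
    out = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        y_star = X @ beta_full + rng.choice(resid, size=n, replace=True)
        out[b] = post_selection_estimate(X, y_star)
    return out

# Simulated data (assumed design): x2 has a weak effect, so the selection
# step sometimes keeps it and sometimes drops it.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 1.0 * x1 + 0.1 * x2 + rng.normal(size=n)

paired = paired_bootstrap(X, y, 500, rng)
residual = residual_bootstrap(X, y, 500, rng)
print("share of paired replicates dropping x2:  ", np.mean(paired[:, 2] == 0.0))
print("share of residual replicates dropping x2:", np.mean(residual[:, 2] == 0.0))
```

Each row of `paired` and `residual` is one bootstrap replicate of the post-selection coefficient vector, so the rows together form the naive bootstrap estimate of the estimator's distribution. As the description notes, estimates of this kind often turn out to be inconsistent, which is the difficulty the project sets out to address.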

Research Project Details
Project Duration: 
01/2016 to 12/2018
Project Significance: 
The project addresses an important issue in regression inference: consistent estimation of the sampling distributions of post-model-selection least squares estimators. Practitioners have long undertaken regression analyses in the mistaken belief that a model pre-selected by a data-driven procedure is indeed the true model, thus arriving at inference results that are often unreliable. The complexity of the distributions of post-model-selection least squares estimators, which are asymptotically markedly non-Gaussian, has only recently been exposed and recognized as a serious problem. Our proposed bootstrap procedure aims to construct consistent estimates of such distributions, on the basis of which credible statistical inference about the regression parameters can be drawn. We believe that the procedure will correct a common fallacy held by regression practitioners and provide them with a satisfactory solution that may find important applications in standard regression studies.
Remarks: 
Large-scale simulations