
RISK MINIMIZATION FOR TIME SERIES BINARY CHOICE WITH VARIABLE SELECTION

Published online by Cambridge University Press:  05 March 2010

Wenxin Jiang*
Affiliation:
Northwestern University
Martin A. Tanner
Affiliation:
Northwestern University
*Address correspondence to Wenxin Jiang, Department of Statistics, Northwestern University, Evanston, IL 60208, U.S.A.; e-mail: [email protected].

Abstract

This paper considers the problem of predicting binary choices by selecting from a possibly large set of candidate explanatory variables, which can include both exogenous variables and lagged dependent variables. We consider risk minimization with the risk function being the predictive classification error. We study the convergence rates of empirical risk minimization in both the frequentist and Bayesian approaches. The Bayesian treatment uses a Gibbs posterior constructed directly from the empirical risk instead of using the usual likelihood-based posterior. Therefore, these approaches do not require a correctly specified probability model. We show that the proposed methods have near-optimal performance relative to a class of linear classification rules with selected variables. Such results in classification are obtained in a framework of dependent data with strong mixing.
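
As a rough illustration of the two ingredients named in the abstract, the sketch below computes the empirical classification error of a linear rule and turns it into Gibbs weights proportional to exp(-n * psi * risk). It is a minimal sketch, not the authors' procedure: the finite candidate set, the uniform prior over candidates, the inverse-temperature value `psi`, and the toy data are all assumptions made for illustration. A zero coefficient stands for an excluded variable.

```python
import math

def empirical_risk(beta, X, y):
    """Fraction of observations misclassified by the linear rule 1{x'beta > 0}."""
    errors = 0
    for x_t, y_t in zip(X, y):
        score = sum(b * v for b, v in zip(beta, x_t))
        pred = 1 if score > 0 else 0
        errors += (pred != y_t)
    return errors / len(y)

def gibbs_weights(candidates, X, y, psi=1.0):
    """Normalized weights proportional to exp(-n * psi * risk).

    Assumes a uniform prior over a finite candidate list (hypothetical choice).
    """
    n = len(y)
    log_w = [-n * psi * empirical_risk(b, X, y) for b in candidates]
    m = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(l - m) for l in log_w]
    total = sum(w)
    return [v / total for v in w]

# Toy series: column 0 is an exogenous variable, column 1 is the lagged
# outcome y_{t-1} (with an assumed initial value of 0).
X = [(1.0, 0), (-0.5, 1), (2.0, 0), (-1.5, 1), (0.3, 0), (-0.2, 1)]
y = [1, 0, 1, 0, 1, 0]

# Each candidate selects one variable by zeroing the other coefficient.
candidates = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]

risks = [empirical_risk(b, X, y) for b in candidates]
weights = gibbs_weights(candidates, X, y, psi=1.0)
```

No likelihood enters the weights, only the misclassification rate, which is why a construction of this kind does not rely on a correctly specified probability model.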

Type
ARTICLES
Copyright
Copyright © Cambridge University Press 2010

