
A SMOOTH TEST FOR THE EQUALITY OF DISTRIBUTIONS

Published online by Cambridge University Press:  30 July 2012

Anil K. Bera
Affiliation:
University of Illinois at Urbana-Champaign
Aurobindo Ghosh
Affiliation:
Singapore Management University
Zhijie Xiao*
Affiliation:
Boston College
*Address correspondence to Zhijie Xiao, Dept. of Economics, Boston College, Chestnut Hill, MA 02467; e-mail: [email protected].

Abstract

The two-sample version of the celebrated Pearson goodness-of-fit problem has been a topic of extensive research, and several tests like the Kolmogorov-Smirnov and Cramér-von Mises have been suggested. Although these tests perform fairly well as omnibus tests for comparing two probability density functions (PDFs), they may have poor power against specific departures such as in location, scale, skewness, and kurtosis. We propose a new test for the equality of two PDFs based on a modified version of the Neyman smooth test using empirical distribution functions minimizing size distortion in finite samples. The suggested test can detect the specific directions of departure from the null hypothesis. Specifically, it can identify deviations in the directions of mean, variance, skewness, or tail behavior. In a finite sample, the actual probability of type-I error depends on the relative sizes of the two samples. We propose two different approaches to deal with this problem and show that, under appropriate conditions, the proposed tests are asymptotically distributed as chi-squared. We also study the finite sample size and power properties of our proposed test. As an application of our procedure, we compare the age distributions of employees with small employers in New York and Pennsylvania with group insurance before and after the enactment of the “community rating” legislation in New York. It has been conventional wisdom that if community rating is enforced (where the group health insurance premium does not depend on age or any other physical characteristics of the insured), then the insurance market will collapse, since only older or less healthy patients would prefer group insurance. We find that there are significant changes in the age distribution in the population in New York owing mainly to a shift in location and scale.
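The idea behind the test can be illustrated with a minimal sketch. It is not the authors' exact statistic: it is a naive two-sample smooth statistic that assumes the reference sample x is much larger than the sample y, so the estimation error in the empirical distribution function is ignored, and it applies neither of the two sample-size corrections mentioned in the abstract. The function names, the default of k = 4 components, and the continuity adjustment (rank + 0.5)/(m + 1) are assumptions made for illustration. Each y is mapped through the empirical distribution function of x, and the transformed values are scored with Legendre polynomials that are orthonormal on [0, 1]. Under equal distributions the sum of squared components is approximately chi-squared with k degrees of freedom, and the first four components respond to departures in location, scale, skewness, and kurtosis, respectively.

```python
import bisect
import math
import random


def legendre_scores(u, k):
    """Return [pi_1(u), ..., pi_k(u)], Legendre polynomials orthonormal on [0, 1]."""
    z = 2.0 * u - 1.0                 # map [0, 1] onto [-1, 1]
    p_prev, p_curr = 1.0, z           # P_0(z) and P_1(z)
    scores = []
    for j in range(1, k + 1):
        scores.append(math.sqrt(2 * j + 1) * p_curr)
        # Bonnet recurrence: (j + 1) P_{j+1} = (2j + 1) z P_j - j P_{j-1}
        p_prev, p_curr = p_curr, ((2 * j + 1) * z * p_curr - j * p_prev) / (j + 1)
    return scores


def smooth_two_sample(x, y, k=4):
    """Naive smooth statistic for H0: x and y share one distribution.

    Assumption (not from the paper): len(x) >> len(y), so no correction
    for the relative sample sizes is applied.
    """
    xs = sorted(x)
    m, n = len(xs), len(y)
    totals = [0.0] * k
    for yi in y:
        # Empirical CDF of x evaluated at yi, kept strictly inside (0, 1).
        u = (bisect.bisect_right(xs, yi) + 0.5) / (m + 1.0)
        for j, score in enumerate(legendre_scores(u, k)):
            totals[j] += score
    components = [t / math.sqrt(n) for t in totals]
    statistic = sum(c * c for c in components)
    return statistic, components


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # large reference sample
    y = [rng.gauss(1.0, 1.0) for _ in range(200)]    # location-shifted sample
    stat, comps = smooth_two_sample(x, y)
    # Compare stat with the chi-squared(4) 5% critical value, about 9.488.
    print("statistic:", stat)
    print("components (location, scale, skewness, kurtosis):", comps)
```

A large first component relative to the others would point to a location shift, which is the kind of directional diagnosis that an omnibus statistic such as Kolmogorov-Smirnov does not provide.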

Type
Miscellanea
Copyright
Copyright © Cambridge University Press 2012 


Footnotes

The authors thank the co-editor, two referees, and Oliver Linton, Peter Phillips, Yanqing Fan, and seminar participants at the Tinbergen Institute, University of Amsterdam, University of Maryland, International Symposium on Econometrics of Specification Tests in 30 Years at Xiamen University, and other conferences for helpful comments. The usual disclaimers apply.
