On the Optimality of Sample-Based Estimatesof the Expectation of theEmpirical Minimizer* **

Peter L. Bartlett; Shahar Mendelson; Petra Philips

doi:10.1051/ps:2008036

On the Optimality of Sample-Based Estimatesof the Expectation of theEmpirical Minimizer* **

Published online by Cambridge University Press: 29 October 2010

Peter L. Bartlett ,

Shahar Mendelson and

Petra Philips

Show author details

Peter L. Bartlett*: Affiliation:
Computer Science Division and Department of Statistics, 367 Evans Hall #3860, University of California, Berkeley, CA, 94720-3860, USA
Shahar Mendelson: Affiliation:
Centre for Mathematics and its Applications (CMA), The Australian National University Canberra, Canberra, ACT, 0200, Australia Department of Mathematics, Technion I.I.T., Haifa, 32000, Israel
Petra Philips: Affiliation:
Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, 72076, Germany
*: Corresponding author: [email protected]

Article contents

Abstract
Footnotes
References

Get access

Abstract

We study sample-based estimates of the expectation of the functionproduced by the empirical minimization algorithm. We investigate theextent to which one can estimate the rate of convergence of theempirical minimizer in a data dependent manner. We establish threemain results. First, we provide an algorithm that upper bounds theexpectation of the empirical minimizer in a completelydata-dependent manner. This bound is based on a structural resultdue to Bartlett and Mendelson, which relates expectations to sampleaverages. Second, we show that these structural upper bounds can beloose, compared to previous bounds. In particular, we demonstrate aclass for which the expectation of the empirical minimizer decreasesas O(1/n) for sample size n, although the upper bound based onstructural properties is Ω(1). Third, we show that thislooseness of the bound is inevitable: we present an example thatshows that a sharp bound cannot be universally recovered fromempirical data.

Keywords

Error bounds; empirical minimization;data-dependent complexity

Type: Research Article
Information: ESAIM: Probability and Statistics , Volume 14 , 2010 , pp. 315 - 337

DOI: https://doi.org/10.1051/ps:2008036 [Opens in a new window]
Copyright: © EDP Sciences, SMAI, 2010

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Article purchase

Temporarily unavailable

Footnotes

This work was supported in part by National Science Foundation Grant 0434383.

This work was supported in part by the Australian Research Council Discovery Grant DP0559465.

References

Bartlett, P.L. and Mendelson, S., Empirical minimization. Probab. Theory Relat. Fields 135 (2006) 311–334. CrossRef

Bartlett, P.L. and Wegkamp, M.H., Classification with a reject option using a hinge loss. J. Machine Learn. Res. 9 (2008) 1823–1840.

Bartlett, P.L., Bousquet, O. and Mendelson, S., Local Rademacher Complexities. Ann. Statist. 33 (2005) 1497–1537. CrossRef

Bartlett, P.L., Jordan, M.I. and McAuliffe, J.D., Convexity, classification, and risk bounds. J. Am. Statist. Assoc. 101 (2006) 138–156. CrossRef

Blanchard, G., Lugosi, G. and Vayatis, N., On the rate of convergence of regularized boosting classifiers. J. Mach. Learn. Res. 4 (2003) 861–894.

Boucheron, S., Lugosi, G. and Massart, P., Concentration inequalities using the entropy method. Ann. Probab. 31 (2003) 1583–1614.

O. Bousquet, Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms. Ph.D. thesis, École Polytechnique, 2002.

R.M. Dudley, Uniform Central Limit Theorems, Cambridge University Press (1999).

Haussler, D., Sphere Packing Numbers for Subsets of the Boolean n-cube with Bounded Vapnik-Chervonenkis Dimension. J. Combin. Theory Ser. A 69 (1995) 217–232. CrossRef

T. Klein, Une inégalité de concentration gauche pour les processus empiriques. C. R. Math. Acad. Sci. Paris 334 (2002) 501–504. CrossRef

V. Koltchinskii, Local Rademacher Complexities and Oracle Inequalities in Risk Minimization. Ann. Statist. 34 (2006).

V. Koltchinskii and D. Panchenko, Rademacher processes and bounding the risk of function learning. High Dimensional Probability, Vol. II (2000) 443–459.

M. Ledoux, The Concentration of Measure Phenomenon, volume 89 of Mathematical Surveys and Monographs. American Mathematical Society (2001).

Lee, W.S., Bartlett, P.L. and Williamson, R.C., The Importance of Convexity in Learning with Squared Loss. IEEE Trans. Informa. Theory 44 (1998) 1974–1980.

G. Lugosi and N. Vayatis, On the Bayes-risk consistency of regularized boosting methods (with discussion), Ann. Statist. 32 (2004) 30–55.

Lugosi, G. and Wegkamp, M., Complexity regularization via localized random penalties. Ann. Statist. 32 (2004) 1679–1697.

Massart, P., The constants in Talagrand's concentration inequality for empirical processes. Ann. Probab. 28 (2000) 863–884. CrossRef

P. Massart, Some applications of concentration inequalities to statistics. Ann. Fac. Sci. Toulouse Math. IX (2000) 245–303.

Massart, P. and Nédélec, E., Risk bounds for statistical learning. Ann. Statist. 34 (2006) 2326–2366. CrossRef

Mendelson, S., Improving the sample complexity using global data. IEEE Trans. Inform. Theory 48 (2002) 1977–1991. CrossRef

S. Mendelson, A few notes on Statistical Learning Theory. In Proc. of the Machine Learning Summer School, Canberra 2002, S. Mendelson and A. J. Smola (Eds.), LNCS 2600. Springer (2003).

Rio, E., Inégalités de concentration pour les processus empiriques de classes de parties. Probab. Theory Relat. Fields 119 (2001) 163–175. CrossRef

Rudelson, M. and Vershynin, R., Combinatorics of random processes and sections of convex bodies. Ann. Math. 164 (2006) 603–648. CrossRef

Talagrand, M., Sharper Bounds for Gaussian and Empirical Processes. Ann. Probab. 22 (1994) 20–76.

Talagrand, M., New concentration inequalities in product spaces. Inventiones Mathematicae 126 (1996) 505–563. CrossRef

B. Tarigan and S.A. Van de Geer, Adaptivity of support vector machines with $\ell_1$

penalty. Technical Report MI 2004-14, University of Leiden (2004).

Tsybakov, A., Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 (2004) 135–166. CrossRef

Van de Geer, S.A., A new approach to least squares estimation, with applications. Ann. Statist. 15 (1987) 587–602. CrossRef

S.A. Van de Geer, Empirical Processes in M-Estimation, Cambridge University Press (2000).

A. van der Vaart and J. Wellner, Weak Convergence and Empirical Processes. Springer (1996).

Vapnik, V.N. and Chervonenkis, A.Y., On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16 (1971) 264–280. CrossRef

Article contents

On the Optimality of Sample-Based Estimatesof the Expectation of theEmpirical Minimizer* **

Abstract

Keywords

Access options

Article purchase

Temporarily unavailable

Footnotes

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests