
Valid Generalisation from Approximate Interpolation

Published online by Cambridge University Press: 12 September 2008

Martin Anthony
Affiliation:
Department of Mathematics, The London School of Economics and Political Science, Houghton Street, London WC2A 2AE, UK
Peter Bartlett
Affiliation:
Department of Systems Engineering, Research School of Information Sciences and Engineering, Australian National University, Canberra 0200, Australia
Yuval Ishai
Affiliation:
Department of Computer Science, Technion, Haifa 32000, Israel
John Shawe-Taylor
Affiliation:
Computer Science Department, Royal Holloway, University of London, Egham Hill, Egham, Surrey TW20 0EX, UK

Abstract

Let H and C be sets of functions from domain X to ℝ. We say that H validly generalises C from approximate interpolation if and only if for each η > 0 and ε, δ ∈ (0,1) there is m₀(η, ε, δ) such that for any function t ∈ C and any probability distribution P on X, if m > m₀ then with Pᵐ-probability at least 1 − δ, a sample x = (x₁, x₂, …, xₘ) ∈ Xᵐ satisfies

for all h ∈ H: |h(xᵢ) − t(xᵢ)| < η (1 ≤ i ≤ m) implies P{x ∈ X : |h(x) − t(x)| < η} ≥ 1 − ε.

We find conditions that are necessary and sufficient for H to validly generalise C from approximate interpolation, and we obtain bounds on the sample length m₀(η, ε, δ) in terms of various parameters describing the expressive power of H.
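To make the definition concrete, here is a minimal Python sketch. It is illustrative only and not from the paper: the affine class H, the target t(x) = x, the uniform distribution P on [0,1], and the parameter values are assumptions chosen for the example. It draws a sample, finds every h ∈ H that η-approximately interpolates t on the sample, and Monte-Carlo-estimates whether each such h is also η-close to t on all but an ε-fraction of the domain.

```python
import random

# Illustrative setup (assumptions for this sketch, not from the paper):
# domain X = [0,1], distribution P = uniform on [0,1], target t(x) = x,
# and a small hypothesis class H of affine functions a*x + b.
def t(x):
    return x

H = [lambda x, a=a, b=b: a * x + b
     for a in (0.9, 1.0, 1.1) for b in (-0.05, 0.0, 0.05)]

eta, eps = 0.1, 0.05   # approximation parameter eta, accuracy parameter eps
m = 200                # sample length

random.seed(0)
sample = [random.random() for _ in range(m)]   # i.i.d. sample from P

def interpolates(h):
    """True if h eta-approximately interpolates t on the sample."""
    return all(abs(h(x) - t(x)) < eta for x in sample)

def error(h, n=100_000):
    """Monte Carlo estimate of P{x : |h(x) - t(x)| >= eta}."""
    pts = [random.random() for _ in range(n)]
    return sum(abs(h(x) - t(x)) >= eta for x in pts) / n

# The sample witnesses the guarantee for this (t, P) if every
# eta-approximate interpolant of t on it has error at most eps.
for h in H:
    if interpolates(h):
        print(f"estimated error {error(h):.4f} (want <= {eps})")
```

Valid generalisation asks for more than this single experiment shows: the implication must hold with probability at least 1 − δ over samples of some fixed length m₀(η, ε, δ), uniformly over all targets t ∈ C and all distributions P.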

Type
Research Article
Copyright
Copyright © Cambridge University Press 1996

