Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction
- Part one Pattern Classification with Binary-Output Neural Networks
- 2 The Pattern Classification Problem
- 3 The Growth Function and VC-Dimension
- 4 General Upper Bounds on Sample Complexity
- 5 General Lower Bounds on Sample Complexity
- 6 The VC-Dimension of Linear Threshold Networks
- 7 Bounding the VC-Dimension using Geometric Techniques
- 8 Vapnik-Chervonenkis Dimension Bounds for Neural Networks
- Part two Pattern Classification with Real-Output Networks
- Part three Learning Real-Valued Functions
- Part four Algorithmics
- Appendix 1 Useful Results
- Bibliography
- Author index
- Subject index
7 - Bounding the VC-Dimension using Geometric Techniques
Published online by Cambridge University Press: 26 February 2010
Summary
Introduction
Results in the previous chapter show that the VC-dimension of the class of functions computed by a network of linear threshold units with W parameters is no larger than a constant times W log W. These results cannot immediately be extended to networks of sigmoid units (with continuous activation functions), since the proofs involve counting the number of distinct outputs of all linear threshold units in the network as the input varies over m patterns, and a single sigmoid unit has an infinite number of output values. In this chapter and the next we derive bounds on the VC-dimension of certain sigmoid networks, including networks of units having the standard sigmoid activation function σ(α) = 1/(1 + e^(−α)). Before we begin this derivation, we study an example that shows that the form of the activation function is crucial.
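To make the obstacle concrete, the following Python sketch (not from the text; the pattern dimension, parameter ranges, and sample counts are arbitrary choices for illustration) compares a single linear threshold unit with a single standard sigmoid unit on a fixed set of m input patterns. As the parameters vary, the threshold unit realizes only a small finite set of output vectors, which is what the counting arguments of the previous chapter exploit, whereas the sigmoid unit produces an essentially new output vector for every parameter setting.

```python
# Illustration (not from the text): on m fixed patterns, a threshold unit
# realizes finitely many output vectors as its parameters vary; a sigmoid
# unit's output vector ranges over a continuum.
import math
import random

random.seed(0)
m, n = 5, 2                                        # m input patterns in R^n
patterns = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]

def preactivation(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def threshold(w, b, x):
    return 1 if preactivation(w, b, x) >= 0 else 0

def sigmoid(w, b, x):
    return 1.0 / (1.0 + math.exp(-preactivation(w, b, x)))

threshold_outputs, sigmoid_outputs = set(), set()
for _ in range(20000):                             # sample many parameter settings
    w = [random.gauss(0, 3) for _ in range(n)]
    b = random.gauss(0, 3)
    threshold_outputs.add(tuple(threshold(w, b, x) for x in patterns))
    sigmoid_outputs.add(tuple(sigmoid(w, b, x) for x in patterns))

print(len(threshold_outputs), "distinct threshold output vectors on", m, "patterns")
print(len(sigmoid_outputs), "distinct sigmoid output vectors on", m, "patterns")
```

The threshold unit's count stays bounded by the growth function of halfspaces, while the sigmoid unit's count grows with the number of parameter draws, so the argument that bounds the VC-dimension by counting distinct outputs does not carry over directly.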
The Need for Conditions on the Activation Functions
One might suspect that if we construct networks of sigmoid units with a well-behaved activation function, they will have finite VC-dimension. For instance, perhaps it suffices if the activation function is sufficiently smooth, bounded, and monotonically increasing. Unfortunately, the situation is not so simple. The following result shows that there is an activation function that has all of these properties, and even has its derivative monotonically increasing to the left of zero and decreasing to the right (so it is convex to the left of zero and concave to the right), and yet is such that a two-layer network having only two computation units in the first layer, each with this activation function, has infinite VC-dimension.
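The actual construction appears later in the chapter. As a hedged illustration of how an oscillating ingredient can already force infinite VC-dimension, the Python sketch below checks the classical fact that the one-parameter class of functions x ↦ sgn(sin(wx)) shatters the points 2^(−1), …, 2^(−m) for every m. This is not the chapter's activation function (which is smooth, bounded, and increasing), but it shows the kind of behaviour that mild regularity conditions alone cannot rule out.

```python
# Hedged sketch (not the chapter's construction): the class
# {x -> sgn(sin(w x)) : w in R} shatters {2^-1, ..., 2^-m} for any m.
import itertools
import math

m = 8                                              # number of points to shatter
xs = [2.0 ** (-i) for i in range(1, m + 1)]        # the points x_i = 2^{-i}

for labels in itertools.product((0, 1), repeat=m):
    # Choose the frequency w so that sin(w * xs[i]) < 0 exactly when labels[i] == 1.
    w = math.pi * (1 + sum(y * 2 ** (i + 1) for i, y in enumerate(labels)))
    realized = tuple(1 if math.sin(w * x) < 0 else 0 for x in xs)
    assert realized == labels                      # every labelling is realized

print(f"all {2 ** m} labellings of the {m} points are realized by sgn(sin(w x))")
```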
- Type: Chapter
- Information: Neural Network Learning: Theoretical Foundations, pp. 86–107
- Publisher: Cambridge University Press
- Print publication year: 1999