Asymptotic statistical theory of overtraining and cross-validation

Shun-ichi Amari, Noboru Murata, Klaus-Robert Müller, Michael Finke, Howard Hua Yang

Research output: Contribution to journal › Article

235 Citations (Scopus)

Abstract

A statistical theory of overtraining is proposed. The analysis treats general realizable stochastic neural networks, trained with the Kullback-Leibler divergence, in the asymptotic case of a large number of training examples. It is shown that the asymptotic gain in generalization error from early stopping is small, even when the optimal stopping time is known. Considering cross-validation stopping, we answer the question: in what ratio should the examples be divided between training and cross-validation sets to obtain optimum performance? Although cross-validated early stopping is useless in the asymptotic region, it does decrease the generalization error in the nonasymptotic region. Our large-scale simulations, performed on a CM5, are in good agreement with the analytical findings.
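
As a concrete illustration of the stopping rule analyzed in the paper, the sketch below runs cross-validated early stopping on a toy realizable regression task. The validation fraction of roughly 1/sqrt(2m) for a model with m parameters follows the asymptotically optimal split ratio the paper derives; the toy data, the gradient-descent model, the learning rate, and the patience rule are illustrative assumptions, not the authors' experimental setup.

# Illustrative sketch (not the authors' code): cross-validated early stopping
# on a toy linear regression model. The validation fraction r ~ 1/sqrt(2m)
# follows the asymptotically optimal split derived in the paper; everything
# else (data, model, learning rate, patience) is an assumption for the demo.
import numpy as np

rng = np.random.default_rng(0)

m = 50                      # number of parameters
n = 2000                    # total number of examples
r = 1.0 / np.sqrt(2 * m)    # fraction held out for cross-validation stopping
n_val = int(r * n)

# Toy realizable task: targets generated by a true weight vector plus noise.
w_true = rng.normal(size=m)
X = rng.normal(size=(n, m))
y = X @ w_true + 0.5 * rng.normal(size=n)

X_tr, y_tr = X[n_val:], y[n_val:]
X_va, y_va = X[:n_val], y[:n_val]

w = np.zeros(m)             # arbitrary starting point
lr = 0.01
best_err, best_w, patience = np.inf, w.copy(), 0

for epoch in range(500):
    # One full-batch gradient step on the training squared error.
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= lr * grad

    # Stop once the cross-validation error has stopped improving.
    val_err = np.mean((X_va @ w - y_va) ** 2)
    if val_err < best_err:
        best_err, best_w, patience = val_err, w.copy(), 0
    else:
        patience += 1
        if patience >= 10:
            break

print(f"stopped at epoch {epoch}, validation MSE {best_err:.4f}")

Consistent with the paper's asymptotic result, when n is large relative to m the early-stopped solution is barely better than training to convergence; the benefit of cross-validation stopping shows up mainly in the nonasymptotic regime.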

Original language: English
Pages (from-to): 985-996
Number of pages: 12
Journal: IEEE Transactions on Neural Networks
Volume: 8
Issue number: 5
DOIs: 10.1109/72.623200
Publication status: Published - 1997
Externally published: Yes

Keywords

  • Asymptotic analysis
  • Cross-validation
  • Early stopping
  • Generalization
  • Overtraining
  • Stochastic neural networks

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Theoretical Computer Science
  • Electrical and Electronic Engineering
  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Hardware and Architecture

Cite this

Asymptotic statistical theory of overtraining and cross-validation. / Amari, Shun-ichi; Murata, Noboru; Müller, Klaus-Robert; Finke, Michael; Yang, Howard Hua.

In: IEEE Transactions on Neural Networks, Vol. 8, No. 5, 1997, p. 985-996.

Research output: Contribution to journal › Article

Amari, Shun-ichi ; Murata, Noboru ; Müller, Klaus-Robert ; Finke, Michael ; Yang, Howard Hua. / Asymptotic statistical theory of overtraining and cross-validation. In: IEEE Transactions on Neural Networks. 1997 ; Vol. 8, No. 5. pp. 985-996.
@article{3fefb73fe97646c7a37f246f9dd9807a,
title = "Asymptotic statistical theory of overtraining and cross-validation",
abstract = "A statistical theory of overtraining is proposed. The analysis treats general realizable stochastic neural networks, trained with the Kullback-Leibler divergence, in the asymptotic case of a large number of training examples. It is shown that the asymptotic gain in generalization error from early stopping is small, even when the optimal stopping time is known. Considering cross-validation stopping, we answer the question: in what ratio should the examples be divided between training and cross-validation sets to obtain optimum performance? Although cross-validated early stopping is useless in the asymptotic region, it does decrease the generalization error in the nonasymptotic region. Our large-scale simulations, performed on a CM5, are in good agreement with the analytical findings.",
keywords = "Asymptotic analysis, Cross-validation, Early stopping, Generalization, Overtraining, Stochastic neural networks",
author = "Amari, {Shun-ichi} and Noboru Murata and M{\"u}ller, {Klaus-Robert} and Michael Finke and Yang, {Howard Hua}",
year = "1997",
doi = "10.1109/72.623200",
language = "English",
volume = "8",
pages = "985--996",
journal = "IEEE Transactions on Neural Networks",
issn = "1045-9227",
publisher = "IEEE",
number = "5",

}

TY - JOUR

T1 - Asymptotic statistical theory of overtraining and cross-validation

AU - Amari, Shun-ichi

AU - Murata, Noboru

AU - Müller, Klaus-Robert

AU - Finke, Michael

AU - Yang, Howard Hua

PY - 1997

Y1 - 1997

N2 - A statistical theory of overtraining is proposed. The analysis treats general realizable stochastic neural networks, trained with the Kullback-Leibler divergence, in the asymptotic case of a large number of training examples. It is shown that the asymptotic gain in generalization error from early stopping is small, even when the optimal stopping time is known. Considering cross-validation stopping, we answer the question: in what ratio should the examples be divided between training and cross-validation sets to obtain optimum performance? Although cross-validated early stopping is useless in the asymptotic region, it does decrease the generalization error in the nonasymptotic region. Our large-scale simulations, performed on a CM5, are in good agreement with the analytical findings.

AB - A statistical theory of overtraining is proposed. The analysis treats general realizable stochastic neural networks, trained with the Kullback-Leibler divergence, in the asymptotic case of a large number of training examples. It is shown that the asymptotic gain in generalization error from early stopping is small, even when the optimal stopping time is known. Considering cross-validation stopping, we answer the question: in what ratio should the examples be divided between training and cross-validation sets to obtain optimum performance? Although cross-validated early stopping is useless in the asymptotic region, it does decrease the generalization error in the nonasymptotic region. Our large-scale simulations, performed on a CM5, are in good agreement with the analytical findings.

KW - Asymptotic analysis

KW - Cross-validation

KW - Early stopping

KW - Generalization

KW - Overtraining

KW - Stochastic neural networks

UR - http://www.scopus.com/inward/record.url?scp=0031236925&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0031236925&partnerID=8YFLogxK

U2 - 10.1109/72.623200

DO - 10.1109/72.623200

M3 - Article

VL - 8

SP - 985

EP - 996

JO - IEEE Transactions on Neural Networks

JF - IEEE Transactions on Neural Networks

SN - 1045-9227

IS - 5

ER -