Dynamics of the adaptive natural gradient descent method for soft committee machines

Masato Inoue, Hyeyoung Park, Masato Okada

Research output: Contribution to journal › Article

Abstract

The adaptive natural gradient descent (ANGD) method realizes natural gradient descent (NGD) without requiring knowledge of the input distribution of the learning data, and it reduces the computational cost from cubic to quadratic order. However, no performance analysis of ANGD has been reported. We have developed a statistical-mechanical theory of the dynamics of a simplified version of ANGD for soft committee machines in on-line learning; this theory yields deterministic learning dynamics expressed through a few order parameters, even though ANGD intrinsically maintains a large approximated Fisher information matrix. Numerical results obtained from this theory were consistent with simulations, not only for the learning curve but also for learning failure. Using this theory, we numerically evaluated the efficiency of ANGD and found that it generally performs as well as NGD. We also identified the key condition affecting the learning plateau in ANGD.
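
A minimal sketch of the kind of on-line ANGD update described above may help fix ideas. It is not the authors' code: it assumes the standard soft-committee-machine setup with activation g(u) = erf(u/sqrt(2)) and squared error, and a rank-one adaptive Fisher estimate G <- (1-eps)*G + eps*grad*grad^T whose inverse is propagated with the Sherman-Morrison identity; keeping the estimate in inverse form is what brings the per-step cost down from cubic to quadratic in the number of weights. All parameter values (N, K, eta, eps) are illustrative.

import numpy as np
from scipy.special import erf

rng = np.random.default_rng(0)

N, K, M = 50, 2, 2          # input dimension; student / teacher hidden units
eta, eps = 0.1 / N, 0.01    # learning rate; Fisher estimation rate (illustrative)

B = rng.standard_normal((M, N))          # fixed teacher weights
W = 0.01 * rng.standard_normal((K, N))   # student weights, K*N parameters in total
G_inv = np.eye(K * N)                    # adaptive estimate of the inverse Fisher matrix

def output(V, x):
    # Soft committee machine: sum of erf(v_i . x / sqrt(2)) over hidden units.
    return erf(V @ x / np.sqrt(2.0)).sum()

def grad(W, x, delta):
    # Gradient of the squared error 0.5 * delta^2 w.r.t. the flattened weights,
    # using d/du erf(u / sqrt(2)) = sqrt(2 / pi) * exp(-u^2 / 2).
    u = W @ x
    return delta * np.outer(np.sqrt(2.0 / np.pi) * np.exp(-0.5 * u**2), x).ravel()

for t in range(20000):
    x = rng.standard_normal(N)            # fresh example each step (on-line learning)
    delta = output(W, x) - output(B, x)
    dl = grad(W, x, delta)

    # Rank-one update G <- (1 - eps) * G + eps * dl dl^T, kept in inverse form
    # via the Sherman-Morrison identity: O((K*N)^2) work per step, versus the
    # O((K*N)^3) cost of inverting G directly.
    v = G_inv @ dl
    G_inv = (G_inv - np.outer(v, v) * (eps / ((1.0 - eps) + eps * (dl @ v)))) / (1.0 - eps)

    # Natural gradient step with the estimated inverse Fisher matrix.
    W -= eta * (G_inv @ dl).reshape(K, N)

In the statistical-mechanical analysis, the individual weights never need to be tracked: for large N the dynamics close on a few order parameters (in such analyses, typically the overlaps W W^T / N and W B^T / N), which is what makes the learning curves deterministic even though the algorithm itself carries the large matrix G_inv.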

Original language: English
Number of pages: 1
Journal: Physical Review E - Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics
Volume: 69
Issue number: 5
DOI: 10.1103/PhysRevE.69.056120
Publication status: Published - 2004 Jan 1
Externally published: Yes

Fingerprint

  • Gradient descent method
  • Learning
  • Learning curves
  • Fisher information matrix
  • Order parameters
  • Performance analysis
  • Plateaus
  • Numerical results
  • Costs

ASJC Scopus subject areas

  • Statistical and Nonlinear Physics
  • Statistics and Probability
  • Condensed Matter Physics

Cite this

@article{3659783a74084bed83c7a03fcf5b8b4c,
  title = "Dynamics of the adaptive natural gradient descent method for soft committee machines",
  abstract = "Adaptive natural gradient descent (ANGD) method realizes natural gradient descent (NGD) without needing to know the input distribution of learning data and reduces the calculation cost from a cubic order to a square order. However, no performance analysis of ANGD has been done. We have developed a statistical-mechanical theory of the simplified version of ANGD dynamics for soft committee machines in on-line learning; this method provides deterministic learning dynamics expressed through a few order parameters, even though ANGD intrinsically holds a large approximated Fisher information matrix. Numerical results obtained using this theory were consistent with those of a simulation, with respect not only to the learning curve but also to the learning failure. Utilizing this method, we numerically evaluated ANGD efficiency and found that ANGD generally performs as well as NGD. We also revealed the key condition affecting the learning plateau in ANGD.",
  author = "Masato Inoue and Hyeyoung Park and Masato Okada",
  year = "2004",
  month = "1",
  day = "1",
  doi = "10.1103/PhysRevE.69.056120",
  language = "English",
  volume = "69",
  journal = "Physical Review E - Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics",
  issn = "1063-651X",
  publisher = "American Physical Society",
  number = "5",
}

TY  - JOUR
T1  - Dynamics of the adaptive natural gradient descent method for soft committee machines
AU  - Inoue, Masato
AU  - Park, Hyeyoung
AU  - Okada, Masato
PY  - 2004/1/1
Y1  - 2004/1/1
N2  - Adaptive natural gradient descent (ANGD) method realizes natural gradient descent (NGD) without needing to know the input distribution of learning data and reduces the calculation cost from a cubic order to a square order. However, no performance analysis of ANGD has been done. We have developed a statistical-mechanical theory of the simplified version of ANGD dynamics for soft committee machines in on-line learning; this method provides deterministic learning dynamics expressed through a few order parameters, even though ANGD intrinsically holds a large approximated Fisher information matrix. Numerical results obtained using this theory were consistent with those of a simulation, with respect not only to the learning curve but also to the learning failure. Utilizing this method, we numerically evaluated ANGD efficiency and found that ANGD generally performs as well as NGD. We also revealed the key condition affecting the learning plateau in ANGD.
AB  - Adaptive natural gradient descent (ANGD) method realizes natural gradient descent (NGD) without needing to know the input distribution of learning data and reduces the calculation cost from a cubic order to a square order. However, no performance analysis of ANGD has been done. We have developed a statistical-mechanical theory of the simplified version of ANGD dynamics for soft committee machines in on-line learning; this method provides deterministic learning dynamics expressed through a few order parameters, even though ANGD intrinsically holds a large approximated Fisher information matrix. Numerical results obtained using this theory were consistent with those of a simulation, with respect not only to the learning curve but also to the learning failure. Utilizing this method, we numerically evaluated ANGD efficiency and found that ANGD generally performs as well as NGD. We also revealed the key condition affecting the learning plateau in ANGD.
UR  - http://www.scopus.com/inward/record.url?scp=85036432834&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85036432834&partnerID=8YFLogxK
U2  - 10.1103/PhysRevE.69.056120
DO  - 10.1103/PhysRevE.69.056120
M3  - Article
AN  - SCOPUS:85036432834
VL  - 69
JO  - Physical Review E - Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics
JF  - Physical Review E - Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics
SN  - 1063-651X
IS  - 5
ER  -