On-line learning theory of soft committee machines with correlated hidden units - Steepest gradient descent and natural gradient descent

Masato Inoue, Hyeyoung Park, Masato Okada

Research output: Contribution to journal › Article

19 Citations (Scopus)

Abstract

The permutation symmetry of the hidden units in multilayer perceptrons causes the saddle structure and plateaus of the learning dynamics in gradient learning methods. The correlation of the weight vectors of hidden units in a teacher network is thought to affect this saddle structure, resulting in a prolonged learning time, but this mechanism is still unclear. In this paper, we discuss it with regard to soft committee machines and on-line learning using statistical mechanics. Conventional gradient descent needs more time to break the symmetry as the correlation of the teacher weight vectors rises. On the other hand, no plateaus occur with natural gradient descent regardless of the correlation for the limit of a low learning rate. Analytical results support these dynamics around the saddle point.
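The setting described in the abstract can be illustrated with a minimal sketch, which is not the authors' code or analysis. It assumes two hidden units, an erf-type activation, teacher weight vectors of unit norm with a prescribed overlap `corr`, and an on-line update scaled by `eta / n`; all function names and parameter values are hypothetical. Only conventional steepest gradient descent is shown; natural gradient descent is not implemented here.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def g(u):
    # Assumed hidden-unit activation (erf-type sigmoid).
    return math.erf(u / SQRT2)

def g_prime(u):
    # Derivative of g with respect to u.
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * u * u)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def make_teacher(n, corr, rng):
    # Two unit-norm teacher vectors whose inner product equals corr.
    a = unit([rng.gauss(0.0, 1.0) for _ in range(n)])
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    proj = dot(a, b)
    b = unit([bi - proj * ai for ai, bi in zip(a, b)])  # orthogonal to a
    s = math.sqrt(1.0 - corr * corr)
    return [a, [corr * ai + s * bi for ai, bi in zip(a, b)]]

def output(weights, x):
    # Soft committee machine: hidden-to-output weights are fixed to 1.
    return sum(g(dot(w, x)) for w in weights)

def sgd_step(student, teacher, x, eta):
    # One on-line steepest-descent step on the squared error for example x.
    n = len(x)
    err = output(teacher, x) - output(student, x)
    for w in student:
        delta = eta / n * err * g_prime(dot(w, x))
        for i in range(n):
            w[i] += delta * x[i]

def mean_error(student, teacher, samples):
    # Empirical estimate of the generalization error on fixed samples.
    total = sum(0.5 * (output(teacher, x) - output(student, x)) ** 2
                for x in samples)
    return total / len(samples)

def run(n=20, corr=0.5, eta=0.5, steps=2000, seed=0):
    rng = random.Random(seed)
    teacher = make_teacher(n, corr, rng)
    # Nearly symmetric small initial student weights (assumption).
    student = [[rng.gauss(0.0, 1e-3) for _ in range(n)] for _ in range(2)]
    test = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(500)]
    before = mean_error(student, teacher, test)
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sgd_step(student, teacher, x, eta)
    after = mean_error(student, teacher, test)
    return teacher, before, after

if __name__ == "__main__":
    teacher, before, after = run()
    print("teacher overlap:", dot(teacher[0], teacher[1]))
    print("error before/after:", before, after)
```

Recording `mean_error` at intervals for several values of `corr` would be one way to look at how long the error stays near a plateau under steepest descent in this toy setup; the sketch makes no claim about reproducing the paper's results.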

Original language: English
Pages (from-to): 805-810
Number of pages: 6
Journal: Journal of the Physical Society of Japan
Volume: 72
Issue number: 4
DOI: 10.1143/JPSJ.72.805
Publication status: Published - Apr 2003
Externally published: Yes

Keywords

  • Natural gradient descent
  • Perceptron
  • Plateau
  • Saddle
  • Singularity
  • Soft committee machine

ASJC Scopus subject areas

  • Physics and Astronomy (all)

Cite this

On-line learning theory of soft committee machines with correlated hidden units - Steepest gradient descent and natural gradient descent. / Inoue, Masato; Park, Hyeyoung; Okada, Masato.

In: Journal of the Physical Society of Japan, Vol. 72, No. 4, 04.2003, p. 805-810.

@article{376f52a929b5488e8ef38ecf41fb115c,
title = "On-line learning theory of soft committee machines with correlated hidden units - Steepest gradient descent and natural gradient descent",
abstract = "The permutation symmetry of the hidden units in multilayer perceptrons causes the saddle structure and plateaus of the learning dynamics in gradient learning methods. The correlation of the weight vectors of hidden units in a teacher network is thought to affect this saddle structure, resulting in a prolonged learning time, but this mechanism is still unclear. In this paper, we discuss it with regard to soft committee machines and on-line learning using statistical mechanics. Conventional gradient descent needs more time to break the symmetry as the correlation of the teacher weight vectors rises. On the other hand, no plateaus occur with natural gradient descent regardless of the correlation for the limit of a low learning rate. Analytical results support these dynamics around the saddle point.",
keywords = "Natural gradient descent, Perceptron, Plateau, Saddle, Singularity, Soft committee machine",
author = "Masato Inoue and Hyeyoung Park and Masato Okada",
year = "2003",
month = "4",
doi = "10.1143/JPSJ.72.805",
language = "English",
volume = "72",
pages = "805--810",
journal = "Journal of the Physical Society of Japan",
issn = "0031-9015",
publisher = "Physical Society of Japan",
number = "4",
}

TY  - JOUR
T1  - On-line learning theory of soft committee machines with correlated hidden units - Steepest gradient descent and natural gradient descent
AU  - Inoue, Masato
AU  - Park, Hyeyoung
AU  - Okada, Masato
PY  - 2003/4
Y1  - 2003/4
N2  - The permutation symmetry of the hidden units in multilayer perceptrons causes the saddle structure and plateaus of the learning dynamics in gradient learning methods. The correlation of the weight vectors of hidden units in a teacher network is thought to affect this saddle structure, resulting in a prolonged learning time, but this mechanism is still unclear. In this paper, we discuss it with regard to soft committee machines and on-line learning using statistical mechanics. Conventional gradient descent needs more time to break the symmetry as the correlation of the teacher weight vectors rises. On the other hand, no plateaus occur with natural gradient descent regardless of the correlation for the limit of a low learning rate. Analytical results support these dynamics around the saddle point.
AB  - The permutation symmetry of the hidden units in multilayer perceptrons causes the saddle structure and plateaus of the learning dynamics in gradient learning methods. The correlation of the weight vectors of hidden units in a teacher network is thought to affect this saddle structure, resulting in a prolonged learning time, but this mechanism is still unclear. In this paper, we discuss it with regard to soft committee machines and on-line learning using statistical mechanics. Conventional gradient descent needs more time to break the symmetry as the correlation of the teacher weight vectors rises. On the other hand, no plateaus occur with natural gradient descent regardless of the correlation for the limit of a low learning rate. Analytical results support these dynamics around the saddle point.
KW  - Natural gradient descent
KW  - Perceptron
KW  - Plateau
KW  - Saddle
KW  - Singularity
KW  - Soft committee machine
UR  - http://www.scopus.com/inward/record.url?scp=0038323312&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0038323312&partnerID=8YFLogxK
U2  - 10.1143/JPSJ.72.805
DO  - 10.1143/JPSJ.72.805
M3  - Article
AN  - SCOPUS:0038323312
VL  - 72
SP  - 805
EP  - 810
JO  - Journal of the Physical Society of Japan
JF  - Journal of the Physical Society of Japan
SN  - 0031-9015
IS  - 4
ER  -