Transport analysis of infinitely deep neural network

Sho Sonoda, Noboru Murata

Research output: Contribution to journal › Article

Abstract

We investigated the feature map inside deep neural networks (DNNs) by tracking the transport map. We are interested in the role of depth (why do DNNs perform better than shallow models?) and in the interpretation of DNNs (what do the intermediate layers do?). Despite the rapid development of their applications, DNNs remain analytically unexplained because the hidden layers are nested and the parameters are not faithful. Inspired by the integral representation of shallow NNs, which is the continuum limit of the width, i.e., the number of hidden units, we developed the flow representation and transport analysis of DNNs. The flow representation is the continuum limit of the depth, i.e., the number of hidden layers, and it is specified by an ordinary differential equation (ODE) with a vector field. We interpret an ordinary DNN as a transport map, or an Euler broken-line approximation of the flow. Technically speaking, a dynamical system is a natural model for the nested feature maps. In addition, it opens a way to a coordinate-free treatment of DNNs by avoiding their redundant parametrization. Following Wasserstein geometry, we analyze a flow in three aspects: as a dynamical system, a continuity equation, and a Wasserstein gradient flow. A key finding is that we specified a series of transport maps of the denoising autoencoder (DAE), which is a cornerstone of the development of deep learning. Starting from the shallow DAE, this paper develops three topics: the transport map of the deep DAE, the equivalence between the stacked DAE and the composition of DAEs, and the development of the double continuum limit, i.e., the integral representation of the flow representation. As partial answers to the research questions, we found that deeper DAEs converge faster and extract better features; in addition, a deep Gaussian DAE transports mass so as to decrease the Shannon entropy of the data distribution. We expect that further investigation of these questions will lead to the development of interpretable and principled alternatives to DNNs.
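As a concrete illustration of the Euler broken-line view described in the abstract, the sketch below composes residual steps x ← x + h·v(x, t), i.e., the forward-Euler discretization of the ODE dx/dt = v(x, t) that underlies the flow representation. This is a minimal toy example, not the authors' code: the vector field `vector_field`, the step size, and the layer count are hypothetical placeholders.

```python
import numpy as np

def vector_field(x, t):
    """Toy vector field v(x, t); a hypothetical stand-in for the field that
    drives the flow representation dx/dt = v(x, t)."""
    return -np.tanh(x) * (1.0 + t)

def euler_flow(x0, step=0.1, num_layers=10):
    """Euler broken-line approximation of the flow dx/dt = v(x, t).

    Each update x <- x + step * v(x, t) plays the role of one hidden-layer
    (residual-style) feature map, so composing num_layers of them yields a
    transport map approximating the flow at time step * num_layers.
    """
    x = np.asarray(x0, dtype=float)
    for layer in range(num_layers):
        t = layer * step
        x = x + step * vector_field(x, t)
    return x

# Push a small batch of points through the depth-10 "network".
x0 = np.random.randn(5, 3)
xL = euler_flow(x0, step=0.1, num_layers=10)
print(xL.shape)  # (5, 3): each input point is transported by the toy flow
```

Under this view, a deeper network (more steps with a smaller step size) traces the continuous-time flow more closely, which is the sense in which the paper speaks of an "infinitely deep" neural network.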

Original language: English
Pages (from-to): 1-52
Number of pages: 52
Journal: Journal of Machine Learning Research
Volume: 20
Publication status: Published - 2019 Feb 1

Keywords

  • Backward heat equation
  • Continuum limit
  • Denoising autoencoder
  • Flow representation
  • Representation learning
  • Ridgelet analysis
  • Wasserstein geometry

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence

Cite this

Transport analysis of infinitely deep neural network. / Sonoda, Sho; Murata, Noboru.

In: Journal of Machine Learning Research, Vol. 20, 01.02.2019, p. 1-52.

Research output: Contribution to journal › Article

@article{8efadca94d3a419d853fd58f4e957cd7,
title = "Transport analysis of infinitely deep neural network",
abstract = "We investigated the feature map inside deep neural networks (DNNs) by tracking the transport map. We are interested in the role of depth (why do DNNs perform better than shallow models?) and in the interpretation of DNNs (what do the intermediate layers do?). Despite the rapid development of their applications, DNNs remain analytically unexplained because the hidden layers are nested and the parameters are not faithful. Inspired by the integral representation of shallow NNs, which is the continuum limit of the width, i.e., the number of hidden units, we developed the flow representation and transport analysis of DNNs. The flow representation is the continuum limit of the depth, i.e., the number of hidden layers, and it is specified by an ordinary differential equation (ODE) with a vector field. We interpret an ordinary DNN as a transport map, or an Euler broken-line approximation of the flow. Technically speaking, a dynamical system is a natural model for the nested feature maps. In addition, it opens a way to a coordinate-free treatment of DNNs by avoiding their redundant parametrization. Following Wasserstein geometry, we analyze a flow in three aspects: as a dynamical system, a continuity equation, and a Wasserstein gradient flow. A key finding is that we specified a series of transport maps of the denoising autoencoder (DAE), which is a cornerstone of the development of deep learning. Starting from the shallow DAE, this paper develops three topics: the transport map of the deep DAE, the equivalence between the stacked DAE and the composition of DAEs, and the development of the double continuum limit, i.e., the integral representation of the flow representation. As partial answers to the research questions, we found that deeper DAEs converge faster and extract better features; in addition, a deep Gaussian DAE transports mass so as to decrease the Shannon entropy of the data distribution. We expect that further investigation of these questions will lead to the development of interpretable and principled alternatives to DNNs.",
keywords = "Backward heat equation, Continuum limit, Denoising autoencoder, Flow representation, Representation learning, Ridgelet analysis, Wasserstein geometry",
author = "Sho Sonoda and Noboru Murata",
year = "2019",
month = "2",
day = "1",
language = "English",
volume = "20",
pages = "1--52",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Microtome Publishing",

}
