Speech coding based on a multi-layer neural network

Shigeo Morishima, Hiroshi Harashima, Yasuo Katayama

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

The authors present a speech-compression scheme based on a three-layer perceptron in which the number of units in the hidden layer is reduced. The input and output layers have the same number of units so that the network learns an identity mapping. Speech coding is realized by scalar or vector quantization of the hidden-layer outputs. Analysis of the weighting coefficients shows that speech coding based on a three-layer neural network is speaker-independent. The transform coding is obtained automatically through back-propagation training. The relation between compression ratio and SNR (signal-to-noise ratio) is investigated. The bit allocation and the optimum number of hidden-layer units needed to realize a specific bit rate are given. The analysis of the weighting coefficients also shows that speech coding based on a neural network is a transform coding similar to the Karhunen-Loeve transform. The characteristics of a five-layer neural network are examined as well. Because the five-layer network can realize nonlinear mappings, it is shown to be more effective than the three-layer network.
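
In modern terms, the scheme described in the abstract is a bottleneck autoencoder: the network is trained by back propagation to reproduce each speech frame at its output, and the activations of the narrow hidden layer are quantized and transmitted as the code. The sketch below is not the authors' implementation; the frame length, number of hidden units, learning rate, sigmoid hidden activation, linear output layer, and uniform scalar quantizer are illustrative assumptions chosen only to make the idea concrete.

import numpy as np

class BottleneckCoder:
    """Three-layer perceptron trained as an identity map on speech frames;
    the narrow hidden layer provides the code that is quantized and sent."""

    def __init__(self, n_in=16, n_hid=4, lr=0.05, bits=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_hid, n_in))   # input -> hidden
        self.W2 = rng.normal(0.0, 0.1, (n_in, n_hid))   # hidden -> output
        self.lr, self.bits = lr, bits

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_step(self, x):
        """One back-propagation step toward reproducing frame x at the output."""
        h = self._sigmoid(self.W1 @ x)               # hidden-layer outputs
        y = self.W2 @ h                              # reconstructed frame
        err = y - x                                  # identity-mapping error
        grad_h = (self.W2.T @ err) * h * (1.0 - h)   # error back-propagated to hidden layer
        self.W2 -= self.lr * np.outer(err, h)
        self.W1 -= self.lr * np.outer(grad_h, x)
        return float(np.mean(err ** 2))

    def encode(self, x):
        """Uniform scalar quantization of the hidden-layer outputs."""
        h = self._sigmoid(self.W1 @ x)
        return np.round(h * (2 ** self.bits - 1)).astype(int)

    def decode(self, code):
        h = code / (2 ** self.bits - 1)
        return self.W2 @ h

# Toy usage with random frames standing in for windowed speech samples.
rng = np.random.default_rng(1)
frames = rng.uniform(-1.0, 1.0, size=(2000, 16))
coder = BottleneckCoder()
for _ in range(20):
    for frame in frames:
        coder.train_step(frame)

x = frames[0]
x_hat = coder.decode(coder.encode(x))
snr = 10 * np.log10(np.sum(x ** 2) / np.sum((x - x_hat) ** 2))
print(f"reconstruction SNR for one frame: {snr:.1f} dB")

With these illustrative numbers the bit-rate arithmetic is simple: 16-sample frames of 8 kHz speech give 500 frames per second, and 4 hidden units at 4 bits each give 16 bits per frame, i.e. 8 kbit/s against 64 kbit/s for 8-bit PCM of the same signal. The paper's actual frame sizes, bit allocations, and resulting SNR figures are those reported in the text, not these assumed values.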

Original language: English
Title of host publication: Conference Record - International Conference on Communications
Publisher: Publ by IEEE
Pages: 429-433
Number of pages: 5
Volume: 2
Publication status: Published - 1990
Externally published: Yes
Event: IEEE International Conference on Communications - ICC '90 Part 2 (of 4) - Atlanta, GA, USA
Duration: 1990 Apr 16 - 1990 Apr 19

Other

Other: IEEE International Conference on Communications - ICC '90 Part 2 (of 4)
City: Atlanta, GA, USA
Period: 90/4/16 - 90/4/19

Fingerprint

  • Speech coding
  • Multilayer neural networks
  • Neural networks
  • Network layers
  • Vector quantization
  • Backpropagation
  • Signal to noise ratio

ASJC Scopus subject areas

  • Media Technology

Cite this

Morishima, S., Harashima, H., & Katayama, Y. (1990). Speech coding based on a multi-layer neural network. In Conference Record - International Conference on Communications (Vol. 2, pp. 429-433). Publ by IEEE.
