Very low bit rate speech coding based on a phoneme recognition

Shigeo Morishima, Hiroshi Harashima

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Summary form only given, as follows. A new speech compression technique for voice storage or voice mail is presented. Basically the coding scheme of this system is stochastic coding (CELP), but the results of phoneme recognition and segmentation are utilized as the standard for vector quantization (VQ) codebook selection and voiced-unvoiced control. The recognition process is performed using the heuristic knowledge to decide nine phonemes. Codebooks for both PARCOR coefficients and excitations for each phoneme are trained by a 75 spoken word sequence that includes all the VCV patterns. The phoneme code number is quantized at the beginning of each segment to select the optimum codebooks and strategies for that segment. This scheme can be categorized as multiple-stage VQ. Thus the size of each codebook is very small and the length of each segment is very long. Very-low-bit-rate coding with high quality can be realized, and a special procedure can be performed to increase the intelligibility. In the case where the average bit rate is 860 b/s, the experimental results show that the average segmental SNR is 6.30 dB, and a subjective test indicates good intelligibility and phoneme clarity.
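The abstract describes two things that a small example can make concrete: switching the VQ codebook by phoneme class at the start of each segment, and scoring the result by average segmental SNR. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the dictionary layout of `codebooks`, the squared-error distance, and the toy values are all hypothetical; the abstract does not specify the distance measure, frame length, or codebook sizes.

```python
import math


def nearest_codeword(vector, codebook):
    """Return the index of the codeword with the smallest squared error.

    Squared Euclidean distance is an assumption; the abstract does not
    name the distortion measure.
    """
    def sq_err(codeword):
        return sum((v - c) ** 2 for v, c in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))


def encode_segment(phoneme_id, frames, codebooks):
    """Encode one segment with the codebook selected by its phoneme class.

    The phoneme id is emitted once per segment; each frame then costs only
    an index into that phoneme's (small) codebook.
    """
    codebook = codebooks[phoneme_id]
    indices = [nearest_codeword(frame, codebook) for frame in frames]
    return phoneme_id, indices


def decode_segment(phoneme_id, indices, codebooks):
    """Look the indices back up in the same phoneme-specific codebook."""
    codebook = codebooks[phoneme_id]
    return [codebook[i] for i in indices]


def segmental_snr_db(reference, decoded, frame_len):
    """Average the per-frame SNR (in dB) over all full frames.

    Frames with zero signal energy or zero error are skipped so the
    logarithm stays defined; that handling is a choice made for this sketch.
    """
    snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(x * x for x in ref)
        error = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal > 0 and error > 0:
            snrs.append(10 * math.log10(signal / error))
    return sum(snrs) / len(snrs) if snrs else float("nan")


# Toy data: two phoneme classes, each with its own two-entry codebook.
codebooks = {
    0: [[0.0, 0.0], [1.0, 1.0]],
    1: [[5.0, 5.0], [9.0, 9.0]],
}
phoneme_id, indices = encode_segment(1, [[4.0, 6.0], [8.5, 9.5]], codebooks)
decoded_frames = decode_segment(phoneme_id, indices, codebooks)
```

Because the phoneme class narrows the search to one small codebook, each frame index needs few bits, which is the mechanism the abstract credits for the low bit rate.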

Original language: English
Title of host publication: IEEE 1988 Int Symp on Inf Theory Abstr of Pap
Place of Publication: New York, NY, USA
Publisher: Publ by IEEE
Pages: 71-72
Number of pages: 2
Volume: 25 n 13
Publication status: Published - 1988
Externally published: Yes

Fingerprint

Speech coding
Vector quantization

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Morishima, S., & Harashima, H. (1988). Very low bit rate speech coding based on a phoneme recognition. In IEEE 1988 Int Symp on Inf Theory Abstr of Pap (Vol. 25 n 13, pp. 71-72). New York, NY, USA: Publ by IEEE.

@inproceedings{48e9c65a4e48412e9eae4679d516874b,
title = "Very low bit rate speech coding based on a phoneme recognition",
abstract = "Summary form only given, as follows. A new speech compression technique for voice storage or voice mail is presented. Basically the coding scheme of this system is stochastic coding (CELP), but the results of phoneme recognition and segmentation are utilized as the standard for vector quantization (VQ) codebook selection and voiced-unvoiced control. The recognition process is performed using the heuristic knowledge to decide nine phonemes. Codebooks for both PARCOR coefficients and excitations for each phoneme are trained by a 75 spoken word sequence that includes all the VCV patterns. The phoneme code number is quantized at the beginning of each segment to select the optimum codebooks and strategies for that segment. This scheme can be categorized as multiple-stage VQ. Thus the size of each codebook is very small and the length of each segment is very long. Very-low-bit-rate coding with high quality can be realized, and a special procedure can be performed to increase the intelligibility. In the case where the average bit rate is 860 b/s, the experimental results show that the average segmental SNR is 6.30 dB, and a subjective test indicates good intelligibility and phoneme clarity.",
author = "Shigeo Morishima and Hiroshi Harashima",
year = "1988",
language = "English",
volume = "25 n 13",
pages = "71--72",
booktitle = "IEEE 1988 Int Symp on Inf Theory Abstr of Pap",
publisher = "Publ by IEEE",

}

TY  - GEN
T1  - Very low bit rate speech coding based on a phoneme recognition
AU  - Morishima, Shigeo
AU  - Harashima, Hiroshi
PY  - 1988
Y1  - 1988
AB  - Summary form only given, as follows. A new speech compression technique for voice storage or voice mail is presented. Basically the coding scheme of this system is stochastic coding (CELP), but the results of phoneme recognition and segmentation are utilized as the standard for vector quantization (VQ) codebook selection and voiced-unvoiced control. The recognition process is performed using the heuristic knowledge to decide nine phonemes. Codebooks for both PARCOR coefficients and excitations for each phoneme are trained by a 75 spoken word sequence that includes all the VCV patterns. The phoneme code number is quantized at the beginning of each segment to select the optimum codebooks and strategies for that segment. This scheme can be categorized as multiple-stage VQ. Thus the size of each codebook is very small and the length of each segment is very long. Very-low-bit-rate coding with high quality can be realized, and a special procedure can be performed to increase the intelligibility. In the case where the average bit rate is 860 b/s, the experimental results show that the average segmental SNR is 6.30 dB, and a subjective test indicates good intelligibility and phoneme clarity.
UR  - http://www.scopus.com/inward/record.url?scp=0024124089&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0024124089&partnerID=8YFLogxK
M3  - Conference contribution
VL  - 25 n 13
SP  - 71
EP  - 72
BT  - IEEE 1988 Int Symp on Inf Theory Abstr of Pap
PB  - Publ by IEEE
CY  - New York, NY, USA
ER  -