An automatic labeling technique for known speech samples is proposed to construct a fine speech database. A word (or sentence) is represented by a phonetic network that covers the acoustic variation found in utterances of that word (or sentence). An input speech sample is segmented using its parameter pattern dynamics and labeled with the optimal sequence of phonetic labels (called APSEG) by matching the segment sequence to the generated phonetic network using constrained dynamic programming. The feasibility of the method is confirmed by applying it to a word set containing 53 city names.
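The matching step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the network layout, the toy 0/1 cost function, the `max_span` constraint (each network node may absorb at most that many consecutive segments), and all names such as `label_segments` are assumptions made for illustration only.

```python
from math import inf

def label_segments(segments, network, starts, finals, cost, max_span=2):
    """Align a segment sequence to the cheapest path through a phonetic network.

    segments : list of per-segment observations (hypothetical representation)
    network  : dict node -> (label, [successor nodes])
    starts   : nodes where a path may begin
    finals   : nodes where a path may end
    cost     : function (segment, label) -> float, assumed local distance
    max_span : assumed constraint, a node absorbs 1..max_span segments
    """
    n = len(segments)
    # best[i][node] = (cost, backpointer) once `node` has absorbed segments up to i
    best = [dict() for _ in range(n + 1)]
    best[0][None] = (0.0, None)
    for j in range(n):
        for prev, (c_prev, _) in list(best[j].items()):
            succs = starts if prev is None else network[prev][1]
            for node in succs:
                label = network[node][0]
                c = c_prev
                # let `node` absorb segments j+1 .. i, within the span constraint
                for i in range(j + 1, min(j + max_span, n) + 1):
                    c += cost(segments[i - 1], label)
                    if node not in best[i] or c < best[i][node][0]:
                        best[i][node] = (c, (j, prev))
    ends = [(best[n][f][0], f) for f in finals if f in best[n]]
    if not ends:
        return inf, []
    total, node = min(ends, key=lambda e: e[0])
    # trace back to recover one label per segment
    labels, i = [], n
    while node is not None:
        j, prev = best[i][node][1]
        labels[:0] = [network[node][0]] * (i - j)
        i, node = j, prev
    return total, labels

# Toy network: "a" then "k", ending in either a voiced or a devoiced "i".
network = {
    "a1": ("a", ["k1"]),
    "k1": ("k", ["i1", "I1"]),
    "i1": ("i", []),
    "I1": ("i_devoiced", []),
}
toy_cost = lambda seg, label: 0.0 if seg == label else 1.0

total, labels = label_segments(
    ["a", "a", "k", "i"], network, ["a1"], {"i1", "I1"}, toy_cost
)
print(total, labels)
```

In this sketch the alternative final nodes stand in for the acoustic variation that the network is meant to cover, and the span limit plays the role of the constraint on the dynamic programming; the constraints actually used in the paper are not stated in the abstract.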
|Journal||Denshi Gijutsu Sogo Kenkyusho Iho/Bulletin of the Electrotechnical Laboratory|
|Publication status||Published - 1988|