Multiple index combination for Japanese spoken term detection with optimum index selection based on OOV-region classifier

Naoyuki Kanda, Katsutoshi Itoyama, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

This paper proposes a novel index combination method for spoken term detection. The outputs of four different recognizers (word, syllable, word-syllable, and fragment recognizers) are combined into a single confusion network. Because combining indices increases the index size, a novel index-selection method is then applied to suppress that growth. Two methods are proposed to reduce the index size: (1) arc selection and (2) unit selection, both of which are based on an out-of-vocabulary (OOV) region classifier score. Experiments on 39 hours of Japanese lecture recordings showed that the index-selection method reduced the index size of the best confusion network by 22% while maintaining its high accuracy. Compared with the best phoneme-based index from a single recognizer, the proposed method achieved relative error reductions of 25.0% and 14.8% for in-vocabulary (IV) and OOV queries, respectively, without increasing the index size.
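The abstract does not give the selection rules, so the following is only a minimal sketch of how score-based index selection on a confusion network might look. Everything in it is an assumption made for illustration: the `Arc` and `Slot` types, the two-way `word`/`subword` split, the `threshold` and `max_arcs` parameters, and the toy scores are hypothetical and are not taken from the paper. The idea shown is that a slot judged likely to be an OOV region keeps only subword arcs, any other slot keeps only word arcs (a stand-in for unit selection), and each slot then keeps only its highest-posterior arcs (a stand-in for arc selection).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Arc:
    label: str        # recognized unit, e.g. a word or a syllable
    unit: str         # "word" or "subword" (hypothetical two-way split)
    posterior: float  # posterior probability of the arc within its slot


@dataclass
class Slot:
    arcs: List[Arc]
    oov_score: float  # hypothetical classifier score in [0, 1]; higher = more likely OOV


def index_size(network: List[Slot]) -> int:
    """Index size measured here simply as the total number of arcs."""
    return sum(len(slot.arcs) for slot in network)


def select_index(network: List[Slot], threshold: float = 0.5, max_arcs: int = 2) -> List[Slot]:
    """Illustrative selection rule (an assumption, not the paper's algorithm)."""
    selected = []
    for slot in network:
        # Unit selection stand-in: choose the unit type from the OOV score.
        preferred = "subword" if slot.oov_score >= threshold else "word"
        # Fall back to all arcs if the slot has none of the preferred type.
        candidates = [a for a in slot.arcs if a.unit == preferred] or list(slot.arcs)
        # Arc selection stand-in: keep only the highest-posterior arcs.
        best = sorted(candidates, key=lambda a: a.posterior, reverse=True)[:max_arcs]
        selected.append(Slot(arcs=best, oov_score=slot.oov_score))
    return selected


# Toy two-slot network with made-up labels and scores.
network = [
    Slot(oov_score=0.1, arcs=[
        Arc("kyoto", "word", 0.7), Arc("kyodo", "word", 0.2),
        Arc("kyo", "subword", 0.6), Arc("to", "subword", 0.3),
    ]),
    Slot(oov_score=0.9, arcs=[
        Arc("okuno", "word", 0.3), Arc("o", "subword", 0.5),
        Arc("ku", "subword", 0.4), Arc("no", "subword", 0.1),
    ]),
]

pruned = select_index(network)
print(index_size(network), "->", index_size(pruned))
```

In this toy setup, the first slot keeps its word arcs and the second keeps its two strongest subword arcs. The numbers say nothing about the 22% reduction reported in the abstract.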

Original language: English
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Pages: 8540-8544
Number of pages: 5
DOIs: https://doi.org/10.1109/ICASSP.2013.6639332
Publication status: Published - 2013 Oct 18
Externally published: Yes
Event: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC
Duration: 2013 May 26 to 2013 May 31

Other

Other: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
City: Vancouver, BC
Period: 13/5/26 to 13/5/31

Fingerprint

Classifiers

Keywords

  • keyword spotting
  • out-of-vocabulary detection
  • Spoken term detection

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Kanda, N., Itoyama, K., & Okuno, H. G. (2013). Multiple index combination for Japanese spoken term detection with optimum index selection based on OOV-region classifier. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (pp. 8540-8544). [6639332] https://doi.org/10.1109/ICASSP.2013.6639332

@inproceedings{12c62e030cc149a8832391736960e414,
title = "Multiple index combination for Japanese spoken term detection with optimum index selection based on OOV-region classifier",
abstract = "In this paper, a novel index combination method for spoken term detection is proposed. In our method, outputs from four different recognizers (word, syllable, word-syllable, and fragment recognizer) are combined into one confusion network. A novel index-selection method for the multiple index-combination method is then used to suppress the increase of the index size. Two methods are proposed to reduce index size: (1) arc selection and (2) unit selection, both of which are based on an OOV-region classifier score. Experimental results with 39 hours of Japanese lecture recordings showed that the index-selection method achieved a 22{\%} reduction of index size of the best confusion network while maintaining its high accuracy. Compared with the best phoneme-based index from a single recognizer, the proposed method achieved a 25.0{\%} and 14.8{\%} relative error reduction for IV and OOV queries without increasing the index size.",
keywords = "keyword spotting, out-of-vocabulary detection, Spoken term detection",
author = "Naoyuki Kanda and Katsutoshi Itoyama and Okuno, {Hiroshi G.}",
year = "2013",
month = "10",
day = "18",
doi = "10.1109/ICASSP.2013.6639332",
language = "English",
isbn = "9781479903566",
pages = "8540--8544",
booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

}

TY - GEN

T1 - Multiple index combination for Japanese spoken term detection with optimum index selection based on OOV-region classifier

AU - Kanda, Naoyuki

AU - Itoyama, Katsutoshi

AU - Okuno, Hiroshi G.

PY - 2013/10/18

Y1 - 2013/10/18

AB - In this paper, a novel index combination method for spoken term detection is proposed. In our method, outputs from four different recognizers (word, syllable, word-syllable, and fragment recognizer) are combined into one confusion network. A novel index-selection method for the multiple index-combination method is then used to suppress the increase of the index size. Two methods are proposed to reduce index size: (1) arc selection and (2) unit selection, both of which are based on an OOV-region classifier score. Experimental results with 39 hours of Japanese lecture recordings showed that the index-selection method achieved a 22% reduction of index size of the best confusion network while maintaining its high accuracy. Compared with the best phoneme-based index from a single recognizer, the proposed method achieved a 25.0% and 14.8% relative error reduction for IV and OOV queries without increasing the index size.

KW - keyword spotting

KW - out-of-vocabulary detection

KW - Spoken term detection

UR - http://www.scopus.com/inward/record.url?scp=84890503364&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84890503364&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2013.6639332

DO - 10.1109/ICASSP.2013.6639332

M3 - Conference contribution

SN - 9781479903566

SP - 8540

EP - 8544

BT - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

ER -