Audio style transfer in non-native speech recognition

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Current automatic speech recognition (ASR) systems achieve over 90-95% accuracy, depending on the methodology applied and the datasets used. However, accuracy drops significantly when an ASR system is used by a non-native speaker of the language being recognized, mainly because of specific pronunciation features. At the same time, labeled datasets of non-native speech are extremely limited, both in size and in the number of languages for which they exist, which makes it difficult to train sufficiently accurate ASR systems targeted at non-native speakers. Therefore, a different method is necessary. In this paper, we suggest an alternative approach to the problem that employs so-called style transfer methodology. Style transfer, used until now mainly in the graphical domain, could help address the problem of recognizing non-native speech. Another advantage is that a style transfer algorithm could be compatible with existing ASR systems, so it would not be necessary to train new systems, which can be difficult and time-consuming.
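The abstract leaves open how style transfer would be carried over from images to speech. As a hedged illustration only, the sketch below transplants the objective of Gatys-style neural style transfer for images to a time-frequency feature map: a content term that keeps the time-aligned features of the non-native utterance, and a style term that matches the Gram-matrix statistics of a native-speech recording. The feature-map representation and all names (`gram_matrix`, `content_loss`, `style_loss`, `total_loss`, `alpha`, `beta`) are assumptions for illustration, not the paper's method. A real system would compute these losses on neural-network features and minimize them with gradient descent in an autodiff framework.

```python
from typing import List

# Assumed input representation (not from the paper): rows are channels
# (e.g. frequency bins or learned filters), columns are time frames.
FeatureMap = List[List[float]]


def gram_matrix(features: FeatureMap) -> FeatureMap:
    """Channel-by-channel correlations averaged over time frames.

    Summing over time discards *when* each pattern occurs and keeps *which*
    channels tend to be active together, which Gatys-style transfer treats
    as the "style" of the input.
    """
    channels, frames = len(features), len(features[0])
    return [
        [sum(features[i][t] * features[j][t] for t in range(frames)) / frames
         for j in range(channels)]
        for i in range(channels)
    ]


def _mse(a: FeatureMap, b: FeatureMap) -> float:
    """Mean squared element-wise difference of two equally shaped matrices."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def content_loss(generated: FeatureMap, content: FeatureMap) -> float:
    """Penalizes changes to time-aligned features (roughly, what is said)."""
    return _mse(generated, content)


def style_loss(generated: FeatureMap, style: FeatureMap) -> float:
    """Penalizes differences in Gram matrices (roughly, how it is said)."""
    return _mse(gram_matrix(generated), gram_matrix(style))


def total_loss(generated: FeatureMap, content: FeatureMap, style: FeatureMap,
               alpha: float = 1.0, beta: float = 1.0) -> float:
    """Weighted objective; alpha and beta are hypothetical tuning weights."""
    return alpha * content_loss(generated, content) + beta * style_loss(generated, style)


if __name__ == "__main__":
    utterance = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
    reordered = [[3.0, 2.0, 1.0], [0.0, 1.0, 0.0]]  # same frames, reversed in time
    print(style_loss(utterance, reordered))    # 0.0: the Gram matrix ignores frame order
    print(content_loss(utterance, reordered))  # 1.3333333333333333
```

The time averaging inside the Gram matrix is what would make this kind of objective a candidate for accent conversion: the style term compares global co-occurrence statistics, so a native-speech recording could, in principle, serve as the style target without being time-aligned with, or containing the same words as, the non-native utterance.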

Original language: English
Title of host publication: Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2018
Editors: Ryszard S. Romaniuk, Maciej Linczuk
Publisher: SPIE
ISBN (Electronic): 9781510622036
DOIs: https://doi.org/10.1117/12.2501495
Publication status: Published - 2018 Jan 1
Externally published: Yes
Event: Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2018 - Wilga, Poland
Duration: 2018 Jun 3 to 2018 Jun 10

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 10808
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2018
Country: Poland
City: Wilga
Period: 2018 Jun 3 to 2018 Jun 10

Keywords

  • Artificial intelligence
  • Deep learning
  • Machine learning
  • Non-native speaker
  • Speech recognition
  • Style transfer

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering

Cite this

Radzikowski, K. P. (2018). Audio style transfer in non-native speech recognition. In R. S. Romaniuk, & M. Linczuk (Eds.), Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2018 [1080839] (Proceedings of SPIE - The International Society for Optical Engineering; Vol. 10808). SPIE. https://doi.org/10.1117/12.2501495

@inproceedings{a33f4025ce1b4110b2b6c61d1305c2c9,
title = "Audio style transfer in non-native speech recognition",
abstract = "Current automatic speech recognition (ASR) systems achieve over 90-95{\%} accuracy, depending on the methodology applied and the datasets used. However, accuracy drops significantly when an ASR system is used by a non-native speaker of the language being recognized, mainly because of specific pronunciation features. At the same time, labeled datasets of non-native speech are extremely limited, both in size and in the number of languages for which they exist, which makes it difficult to train sufficiently accurate ASR systems targeted at non-native speakers. Therefore, a different method is necessary. In this paper, we suggest an alternative approach to the problem that employs so-called style transfer methodology. Style transfer, used until now mainly in the graphical domain, could help address the problem of recognizing non-native speech. Another advantage is that a style transfer algorithm could be compatible with existing ASR systems, so it would not be necessary to train new systems, which can be difficult and time-consuming.",
keywords = "Artificial intelligence, Deep learning, Machine learning, Non-native speaker, Speech recognition, Style transfer",
author = "Radzikowski, {Kacper Pawel}",
year = "2018",
month = "1",
day = "1",
doi = "10.1117/12.2501495",
language = "English",
series = "Proceedings of SPIE - The International Society for Optical Engineering",
publisher = "SPIE",
editor = "Romaniuk, {Ryszard S.} and Maciej Linczuk",
booktitle = "Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2018",

}

TY - GEN

T1 - Audio style transfer in non-native speech recognition

AU - Radzikowski, Kacper Pawel

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Current automatic speech recognition (ASR) systems achieve over 90-95% accuracy, depending on the methodology applied and the datasets used. However, accuracy drops significantly when an ASR system is used by a non-native speaker of the language being recognized, mainly because of specific pronunciation features. At the same time, labeled datasets of non-native speech are extremely limited, both in size and in the number of languages for which they exist, which makes it difficult to train sufficiently accurate ASR systems targeted at non-native speakers. Therefore, a different method is necessary. In this paper, we suggest an alternative approach to the problem that employs so-called style transfer methodology. Style transfer, used until now mainly in the graphical domain, could help address the problem of recognizing non-native speech. Another advantage is that a style transfer algorithm could be compatible with existing ASR systems, so it would not be necessary to train new systems, which can be difficult and time-consuming.

AB - Current automatic speech recognition (ASR) systems achieve over 90-95% accuracy, depending on the methodology applied and the datasets used. However, accuracy drops significantly when an ASR system is used by a non-native speaker of the language being recognized, mainly because of specific pronunciation features. At the same time, labeled datasets of non-native speech are extremely limited, both in size and in the number of languages for which they exist, which makes it difficult to train sufficiently accurate ASR systems targeted at non-native speakers. Therefore, a different method is necessary. In this paper, we suggest an alternative approach to the problem that employs so-called style transfer methodology. Style transfer, used until now mainly in the graphical domain, could help address the problem of recognizing non-native speech. Another advantage is that a style transfer algorithm could be compatible with existing ASR systems, so it would not be necessary to train new systems, which can be difficult and time-consuming.

KW - Artificial intelligence

KW - Deep learning

KW - Machine learning

KW - Non-native speaker

KW - Speech recognition

KW - Style transfer

UR - http://www.scopus.com/inward/record.url?scp=85056259045&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85056259045&partnerID=8YFLogxK

U2 - 10.1117/12.2501495

DO - 10.1117/12.2501495

M3 - Conference contribution

AN - SCOPUS:85056259045

T3 - Proceedings of SPIE - The International Society for Optical Engineering

BT - Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2018

A2 - Romaniuk, Ryszard S.

A2 - Linczuk, Maciej

PB - SPIE

ER -