Efficient learning for spoken language understanding tasks with word embedding based pre-training

Yi Luan, Shinji Watanabe, Bret Harsham

Research output: Contribution to journal › Article

13 Citations (Scopus)

Abstract

Spoken language understanding (SLU) tasks such as goal estimation and intention identification from users' commands are essential components in spoken dialog systems. In recent years, neural network approaches have shown great success in various SLU tasks. However, one major difficulty of SLU is that the annotation of collected data can be expensive. Often this results in insufficient data being available for a task. The performance of a neural network trained in low-resource conditions is usually inferior because of over-training. To improve the performance, this paper investigates the use of unsupervised training methods with large-scale corpora based on word embedding and latent topic models to pre-train the SLU networks. In order to capture long-term characteristics over the entire dialog, we propose a novel Recurrent Neural Network (RNN) architecture. The proposed RNN uses two sub-networks to model the different time scales represented by word and turn sequences. The combination of pre-training and RNN gives us an 18% relative error reduction compared to a baseline system.
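The abstract describes a hierarchical RNN in which one sub-network runs over the words within each turn and a second sub-network runs over the sequence of turns, with the word embeddings initialized from unsupervised pre-training on a large corpus. The following is a minimal sketch of how such a two-time-scale model could be assembled; PyTorch, the GRU cells, the layer sizes, and the class and variable names are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (not the paper's exact model): a two-time-scale RNN for
# dialog-level goal/intent estimation. A word-level GRU encodes each turn
# into a vector; a turn-level GRU tracks context across the whole dialog;
# the final state feeds a classifier. The embedding layer can be initialized
# from unsupervised word vectors (e.g., word2vec) and fine-tuned on the task.
import torch
import torch.nn as nn


class HierarchicalSLUNet(nn.Module):
    def __init__(self, vocab_size, embed_dim, word_hidden, turn_hidden,
                 num_classes, pretrained_embeddings=None):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        if pretrained_embeddings is not None:
            # Unsupervised pre-training step: copy word vectors learned on a
            # large unlabeled corpus, then fine-tune them during SLU training.
            self.embed.weight.data.copy_(pretrained_embeddings)
        self.word_rnn = nn.GRU(embed_dim, word_hidden, batch_first=True)
        self.turn_rnn = nn.GRU(word_hidden, turn_hidden, batch_first=True)
        self.classifier = nn.Linear(turn_hidden, num_classes)

    def forward(self, dialog):
        # dialog: LongTensor of shape (num_turns, max_words) for one dialog.
        embedded = self.embed(dialog)                       # (turns, words, embed_dim)
        _, word_state = self.word_rnn(embedded)             # (1, turns, word_hidden)
        turn_vectors = word_state.squeeze(0).unsqueeze(0)   # (1, turns, word_hidden)
        _, turn_state = self.turn_rnn(turn_vectors)         # (1, 1, turn_hidden)
        return self.classifier(turn_state.squeeze(0))       # (1, num_classes)


# Toy usage: a 3-turn dialog, 5 word IDs per turn, 20-way goal classification.
model = HierarchicalSLUNet(vocab_size=1000, embed_dim=64, word_hidden=128,
                           turn_hidden=128, num_classes=20)
dialog = torch.randint(1, 1000, (3, 5))
logits = model(dialog)
print(logits.shape)  # torch.Size([1, 20])
```

Treating each turn as a batch element for the word-level GRU and then re-batching the resulting turn vectors as a single sequence is one straightforward way to realize the two time scales; the paper's actual sub-network types, sizes, and training details are given in the original text.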

Original language: English
Pages (from-to): 1398-1402
Number of pages: 5
Journal: Unknown Journal
Volume: 2015-January
Publication status: Published - 2015
Externally published: Yes


Keywords

  • Fine-tuning
  • Goal estimation
  • Recurrent neural networks
  • Semantic embedding
  • Spoken language understanding

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

Efficient learning for spoken language understanding tasks with word embedding based pre-training. / Luan, Yi; Watanabe, Shinji; Harsham, Bret.

In: Unknown Journal, Vol. 2015-January, 2015, p. 1398-1402.

Research output: Contribution to journal › Article

@article{365796f387684fb9a78662287b532be5,
title = "Efficient learning for spoken language understanding tasks with word embedding based pre-training",
abstract = "Spoken language understanding (SLU) tasks such as goal estimation and intention identification from user's commands are essential components in spoken dialog systems. In recent years, neural network approaches have shown great success in various SLU tasks. However, one major difficulty of SLU is that the annotation of collected data can be expensive. Often this results in insufficient data being available for a task. The performance of a neural network trained in low resource conditions is usually inferior because of over-training. To improve the performance, this paper investigates the use of unsupervised training methods with large-scale corpora based on word embedding and latent topic models to pre-train the SLU networks. In order to capture long-term characteristics over the entire dialog, we propose a novel Recurrent Neural Network (RNN) architecture. The proposed RNN uses two sub-networks to model the different time scales represented by word and turn sequences. The combination of pre-training and RNN gives us a 18{\%} relative error reduction compared to a baseline system.",
keywords = "Fine-tuning, Goal estimation, Recurrent neural networks, Semantic embedding, Spoken language understanding",
author = "Yi Luan and Shinji Watanabe and Bret Harsham",
year = "2015",
language = "English",
volume = "2015-January",
pages = "1398--1402",
journal = "Nuclear Physics A",
issn = "0375-9474",
publisher = "Elsevier",

}

TY - JOUR

T1 - Efficient learning for spoken language understanding tasks with word embedding based pre-training

AU - Luan, Yi

AU - Watanabe, Shinji

AU - Harsham, Bret

PY - 2015

Y1 - 2015

N2 - Spoken language understanding (SLU) tasks such as goal estimation and intention identification from user's commands are essential components in spoken dialog systems. In recent years, neural network approaches have shown great success in various SLU tasks. However, one major difficulty of SLU is that the annotation of collected data can be expensive. Often this results in insufficient data being available for a task. The performance of a neural network trained in low resource conditions is usually inferior because of over-training. To improve the performance, this paper investigates the use of unsupervised training methods with large-scale corpora based on word embedding and latent topic models to pre-train the SLU networks. In order to capture long-term characteristics over the entire dialog, we propose a novel Recurrent Neural Network (RNN) architecture. The proposed RNN uses two sub-networks to model the different time scales represented by word and turn sequences. The combination of pre-training and RNN gives us a 18% relative error reduction compared to a baseline system.

AB - Spoken language understanding (SLU) tasks such as goal estimation and intention identification from user's commands are essential components in spoken dialog systems. In recent years, neural network approaches have shown great success in various SLU tasks. However, one major difficulty of SLU is that the annotation of collected data can be expensive. Often this results in insufficient data being available for a task. The performance of a neural network trained in low resource conditions is usually inferior because of over-training. To improve the performance, this paper investigates the use of unsupervised training methods with large-scale corpora based on word embedding and latent topic models to pre-train the SLU networks. In order to capture long-term characteristics over the entire dialog, we propose a novel Recurrent Neural Network (RNN) architecture. The proposed RNN uses two sub-networks to model the different time scales represented by word and turn sequences. The combination of pre-training and RNN gives us a 18% relative error reduction compared to a baseline system.

KW - Fine-tuning

KW - Goal estimation

KW - Recurrent neural networks

KW - Semantic embedding

KW - Spoken language understanding

UR - http://www.scopus.com/inward/record.url?scp=84959108247&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84959108247&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84959108247

VL - 2015-January

SP - 1398

EP - 1402

JO - Nuclear Physics A

JF - Nuclear Physics A

SN - 0375-9474

ER -