Semi-supervised discourse relation classification with structural learning

Hugo Hernault, Danushka Bollegala, Mitsuru Ishizuka

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

The corpora available for training discourse relation classifiers are annotated using a general set of discourse relations. However, certain applications require custom discourse relations, and creating a new annotated corpus with a new relation taxonomy is time-consuming and costly. We address this problem by proposing a semi-supervised approach to discourse relation classification based on Structural Learning. First, we solve a set of auxiliary classification problems using unlabeled data. Second, we use the learned classifiers to extend the feature vectors on which a discourse relation classifier is trained. By defining a relevant set of auxiliary classification problems, we show that the proposed method improves accuracy and F-score by at least 50% on the RST Discourse Treebank and the Penn Discourse Treebank when small training sets of ca. 1000 training instances are used. This makes the approach attractive for training discourse relation classifiers in domains where only a small amount of labeled training data is available.
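
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which auxiliary problems or learners were used, so the choices below are assumptions made only for illustration. Here each auxiliary problem predicts one hidden feature from the remaining ones (so its labels come from the unlabeled data itself), the auxiliary predictors are ridge regressors, and the shared structure is taken from a singular value decomposition of their stacked weights. The names learn_shared_structure, extend_features, aux_feature_ids, h and ridge are hypothetical.

```python
import numpy as np


def learn_shared_structure(X_unlabeled, aux_feature_ids, h=2, ridge=1.0):
    """Step 1: solve auxiliary problems on unlabeled data.

    Each auxiliary problem predicts one chosen feature from all the others
    (an assumed choice of auxiliary task). Returns a projection matrix
    theta of shape (h, d).
    """
    n, d = X_unlabeled.shape
    W = np.zeros((d, len(aux_feature_ids)))
    for k, j in enumerate(aux_feature_ids):
        y = X_unlabeled[:, j]          # auxiliary "label": value of feature j
        X_masked = X_unlabeled.copy()
        X_masked[:, j] = 0.0           # hide the target feature from the predictor
        # Closed-form ridge regression: w = (X^T X + ridge * I)^-1 X^T y
        A = X_masked.T @ X_masked + ridge * np.eye(d)
        W[:, k] = np.linalg.solve(A, X_masked.T @ y)
    # The top-h left singular vectors of W span the structure shared by
    # the auxiliary predictors (requires h <= min(d, number of problems)).
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :h].T


def extend_features(X, theta):
    """Step 2: append the h projected features theta @ x to each row x."""
    return np.hstack([X, X @ theta.T])


# Toy data standing in for binary feature vectors of discourse units.
rng = np.random.default_rng(0)
X_unlabeled = rng.integers(0, 2, size=(200, 6)).astype(float)
X_labeled = rng.integers(0, 2, size=(10, 6)).astype(float)

theta = learn_shared_structure(X_unlabeled, aux_feature_ids=[0, 1, 2], h=2)
X_extended = extend_features(X_labeled, theta)
print(X_extended.shape)  # (10, 8): 6 original features plus 2 shared ones
```

The extended vectors would then be passed to any supervised classifier trained on the small labeled set. The number of auxiliary problems and the dimension h are tuning choices in this sketch, not values taken from the paper.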

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 340-352
Number of pages: 13
Volume: 6608 LNCS
Edition: PART 1
DOIs: https://doi.org/10.1007/978-3-642-19400-9_27
Publication status: Published - 2011
Externally published: Yes
Event: 12th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2011 - Tokyo
Duration: 2011 Feb 20 to 2011 Feb 26

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 6608 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 12th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2011
City: Tokyo
Period: 2011 Feb 20 to 2011 Feb 26

Fingerprint

Classifiers
Classifier
Taxonomies
Classification Problems
Discourse
Learning
Taxonomy
Feature Vector
Training

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Hernault, H., Bollegala, D., & Ishizuka, M. (2011). Semi-supervised discourse relation classification with structural learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (PART 1 ed., Vol. 6608 LNCS, pp. 340-352). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6608 LNCS, No. PART 1). https://doi.org/10.1007/978-3-642-19400-9_27

@inproceedings{56c488329d9e494ba8dd445a276c70cb,
title = "Semi-supervised discourse relation classification with structural learning",
abstract = "The corpora available for training discourse relation classifiers are annotated using a general set of discourse relations. However, for certain applications, custom discourse relations are required. Creating a new annotated corpus with a new relation taxonomy is a time-consuming and costly process. We address this problem by proposing a semi-supervised approach to discourse relation classification based on Structural Learning. First, we solve a set of auxiliary classification problems using unlabeled data. Second, the learned classifiers are used to extend feature vectors to train a discourse relation classifier. By defining a relevant set of auxiliary classification problems, we show that the proposed method brings improvement of at least 50{\%} in accuracy and F-score on the RST Discourse Treebank and Penn Discourse Treebank, when small training sets of ca. 1000 training instances are employed. This is an attractive perspective for training discourse relation classifiers on domains where little amount of labeled training data is available.",
author = "Hugo Hernault and Danushka Bollegala and Mitsuru Ishizuka",
year = "2011",
doi = "10.1007/978-3-642-19400-9_27",
language = "English",
isbn = "9783642193996",
volume = "6608 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
number = "PART 1",
pages = "340--352",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
edition = "PART 1",

}

TY - GEN

T1 - Semi-supervised discourse relation classification with structural learning

AU - Hernault, Hugo

AU - Bollegala, Danushka

AU - Ishizuka, Mitsuru

PY - 2011

Y1 - 2011

N2 - The corpora available for training discourse relation classifiers are annotated using a general set of discourse relations. However, for certain applications, custom discourse relations are required. Creating a new annotated corpus with a new relation taxonomy is a time-consuming and costly process. We address this problem by proposing a semi-supervised approach to discourse relation classification based on Structural Learning. First, we solve a set of auxiliary classification problems using unlabeled data. Second, the learned classifiers are used to extend feature vectors to train a discourse relation classifier. By defining a relevant set of auxiliary classification problems, we show that the proposed method brings improvement of at least 50% in accuracy and F-score on the RST Discourse Treebank and Penn Discourse Treebank, when small training sets of ca. 1000 training instances are employed. This is an attractive perspective for training discourse relation classifiers on domains where little amount of labeled training data is available.

AB - The corpora available for training discourse relation classifiers are annotated using a general set of discourse relations. However, for certain applications, custom discourse relations are required. Creating a new annotated corpus with a new relation taxonomy is a time-consuming and costly process. We address this problem by proposing a semi-supervised approach to discourse relation classification based on Structural Learning. First, we solve a set of auxiliary classification problems using unlabeled data. Second, the learned classifiers are used to extend feature vectors to train a discourse relation classifier. By defining a relevant set of auxiliary classification problems, we show that the proposed method brings improvement of at least 50% in accuracy and F-score on the RST Discourse Treebank and Penn Discourse Treebank, when small training sets of ca. 1000 training instances are employed. This is an attractive perspective for training discourse relation classifiers on domains where little amount of labeled training data is available.

UR - http://www.scopus.com/inward/record.url?scp=79952275230&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79952275230&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-19400-9_27

DO - 10.1007/978-3-642-19400-9_27

M3 - Conference contribution

SN - 9783642193996

VL - 6608 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 340

EP - 352

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -