Towards semi-supervised classification of discourse relations using feature correlations

Hugo Hernault, Danushka Bollegala, Mitsuru Ishizuka

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Two of the main corpora available for training discourse relation classifiers are the RST Discourse Treebank (RST-DT) and the Penn Discourse Treebank (PDTB), both of which are based on the Wall Street Journal corpus. Most recent work on discourse relation classification has employed fully-supervised methods on these corpora. However, certain discourse relations have little labeled data, which results in low classification performance for their associated classes. In this paper, we attempt to tackle this problem by employing a semi-supervised method for discourse relation classification. The proposed method is based on the analysis of feature co-occurrences in unlabeled data. This information is then used to extend the feature vectors during training. The proposed method is evaluated on both RST-DT and PDTB, where it significantly outperformed baseline classifiers. We believe that the proposed method is a first step towards improving classification performance, particularly for discourse relations lacking annotated data.
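The abstract describes the method only at a high level: feature co-occurrences are collected from unlabeled data and then used to extend feature vectors during training. The Python sketch below illustrates one way such an extension could work. It is not the authors' published algorithm: representing instances as sets of binary feature names, scoring feature pairs with pointwise mutual information (PMI), keeping only positive-PMI features, and the `top_k` cut-off are all assumptions, and every function name is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_stats(unlabeled):
    """Count single-feature and pairwise feature occurrences over unlabeled data.

    Assumption for this sketch: each instance is a set of binary feature names.
    """
    single = Counter()
    pair = Counter()
    for feats in unlabeled:
        feats = set(feats)
        single.update(feats)
        # Pairs are stored with their members in sorted order so (a, b) == (b, a).
        for a, b in combinations(sorted(feats), 2):
            pair[(a, b)] += 1
    return single, pair, len(unlabeled)


def pmi(a, b, single, pair, n):
    """PMI of features a and b (an assumed correlation score); -inf if never co-occurring."""
    key = (a, b) if a < b else (b, a)
    joint = pair.get(key, 0)
    if joint == 0:
        return float("-inf")
    return math.log((joint * n) / (single[a] * single[b]))


def extend_vector(feats, single, pair, n, top_k=2):
    """Hypothetical feature-vector extension.

    Original features get weight 1.0. For each original feature, up to top_k
    unseen features with positive PMI are added, weighted by that PMI; a feature
    reached from several originals accumulates the sum of its scores.
    """
    extended = {f: 1.0 for f in feats}
    for f in feats:
        candidates = []
        for g in single:
            if g in feats:
                continue
            score = pmi(f, g, single, pair, n)
            if score > 0:
                candidates.append((score, g))
        candidates.sort(reverse=True)  # highest PMI first
        for score, g in candidates[:top_k]:
            extended[g] = extended.get(g, 0.0) + score
    return extended
```

For example, with a toy unlabeled set such as `[{"because", "result"}, {"because", "result", "cause"}, {"since", "cause"}, {"but", "however"}]`, calling `extend_vector({"since"}, *cooccurrence_stats(data))` adds `"cause"` with weight log 2. A sparse labeled instance therefore gains a feature it never contained, which is the intuition behind using unlabeled co-occurrences for classes with little labeled data.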

Original language: English
Title of host publication: Proceedings of the SIGDIAL 2010 Conference: 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Pages: 55-58
Number of pages: 4
Publication status: Published - 2010
Externally published: Yes
Event: 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL 2010 - Tokyo
Duration: 2010 Sep 24 to 2010 Sep 25

Other

Other: 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL 2010
City: Tokyo
Period: 10/9/24 to 10/9/25

Fingerprint

Supervised Classification
Classifiers
Classifier
Discourse
Feature Vector
Baseline

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
  • Modelling and Simulation

Cite this

Hernault, H., Bollegala, D., & Ishizuka, M. (2010). Towards semi-supervised classification of discourse relations using feature correlations. In Proceedings of the SIGDIAL 2010 Conference: 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 55-58)

Towards semi-supervised classification of discourse relations using feature correlations. / Hernault, Hugo; Bollegala, Danushka; Ishizuka, Mitsuru.

Proceedings of the SIGDIAL 2010 Conference: 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 2010. p. 55-58.

Hernault, H, Bollegala, D & Ishizuka, M 2010, Towards semi-supervised classification of discourse relations using feature correlations. in Proceedings of the SIGDIAL 2010 Conference: 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue. pp. 55-58, 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL 2010, Tokyo, 10/9/24.
Hernault H, Bollegala D, Ishizuka M. Towards semi-supervised classification of discourse relations using feature correlations. In Proceedings of the SIGDIAL 2010 Conference: 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 2010. p. 55-58
Hernault, Hugo ; Bollegala, Danushka ; Ishizuka, Mitsuru. / Towards semi-supervised classification of discourse relations using feature correlations. Proceedings of the SIGDIAL 2010 Conference: 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 2010. pp. 55-58
@inproceedings{04b0045543904e629281a4ee103db652,
title = "Towards semi-supervised classification of discourse relations using feature correlations",
author = "Hugo Hernault and Danushka Bollegala and Mitsuru Ishizuka",
year = "2010",
language = "English",
isbn = "9781932432855",
pages = "55--58",
booktitle = "Proceedings of the SIGDIAL 2010 Conference: 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue",

}

TY - GEN

T1 - Towards semi-supervised classification of discourse relations using feature correlations

AU - Hernault, Hugo

AU - Bollegala, Danushka

AU - Ishizuka, Mitsuru

PY - 2010

Y1 - 2010

UR - http://www.scopus.com/inward/record.url?scp=84857740734&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84857740734&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84857740734

SN - 9781932432855

SP - 55

EP - 58

BT - Proceedings of the SIGDIAL 2010 Conference: 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue

ER -