A semi-supervised approach to improve classification of infrequent discourse relations using feature vector extension

Hugo Hernault*, Danushka Bollegala, Mitsuru Ishizuka

*Corresponding author for this work

Research output

31 Citations (Scopus)

Abstract

Several recent discourse parsers have employed fully supervised machine learning approaches. These methods require human annotators to create an extensive training corpus beforehand, which is a time-consuming and costly process. On the other hand, unlabeled data is abundant and cheap to collect. In this paper, we propose a novel semi-supervised method for discourse relation classification based on the analysis of co-occurring features in unlabeled data, which is then used to extend the feature vectors given to a classifier. Our experimental results on the RST Discourse Treebank corpus and the Penn Discourse Treebank indicate that the proposed method brings a significant improvement in classification accuracy and macro-average F-score when small training datasets are used. For instance, with training sets of ca. 1000 labeled instances, the proposed method brings improvements in accuracy and macro-average F-score of up to 50% compared to a baseline classifier. We believe that the proposed method is a first step towards detecting low-occurrence relations, which is useful for domains with a lack of annotated data.
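The idea of extending feature vectors with co-occurrence information from unlabeled data can be illustrated with a minimal sketch. This is not the paper's exact procedure: the raw pair counts, the `top_k` cutoff, the `ext:` prefix, and the function names `cooccurrence_counts` and `extend_features` are all assumptions made for illustration, and the abstract does not specify how co-occurrence is weighted or how many features are added.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(unlabeled):
    """Count how often each pair of features appears in the same unlabeled instance."""
    counts = defaultdict(Counter)
    for features in unlabeled:
        # Deduplicate and sort so each unordered pair is counted once per instance.
        for a, b in combinations(sorted(set(features)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def extend_features(features, counts, top_k=2):
    """Return the original features plus up to top_k co-occurring features.

    Added features carry an "ext:" prefix (a hypothetical convention) so a
    classifier can tell them apart from features observed in the instance.
    """
    present = set(features)
    scores = Counter()
    for f in present:
        for other, n in counts.get(f, {}).items():
            if other not in present:
                scores[other] += n
    # Rank by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    extra = [name for name, _ in ranked[:top_k]]
    return sorted(present) + ["ext:" + name for name in extra]


# Toy unlabeled instances, each a bag of feature names (invented for illustration).
unlabeled = [
    ["but", "however"],
    ["but", "however", "although"],
    ["because", "since"],
]
counts = cooccurrence_counts(unlabeled)
extended = extend_features(["but"], counts, top_k=1)
```

In this toy setup, a sparse labeled instance containing only `but` gains the related feature `however`, which is the kind of enrichment that may help when few labeled examples of a relation are available.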

Original language: English
Title of host publication: EMNLP 2010 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Pages: 399-409
Number of pages: 11
Publication status: Published - 2010
Externally published: Yes
Event: Conference on Empirical Methods in Natural Language Processing, EMNLP 2010 - Cambridge, MA
Duration: 9 Oct 2010 to 11 Oct 2010

Other

Other: Conference on Empirical Methods in Natural Language Processing, EMNLP 2010
City: Cambridge, MA
Period: 9 Oct 2010 to 11 Oct 2010

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
