A sequential model for discourse segmentation

Hugo Hernault, Danushka Bollegala, Mitsuru Ishizuka

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Citations (Scopus)

Abstract

Identifying discourse relations in a text is essential for various tasks in Natural Language Processing, such as automatic text summarization, question-answering, and dialogue generation. The first step of this process is segmenting a text into elementary units. In this paper, we present a novel model of discourse segmentation based on sequential data labeling. Namely, we use Conditional Random Fields to train a discourse segmenter on the RST Discourse Treebank, using a set of lexical and syntactic features. Our system is compared to other statistical and rule-based segmenters, including one based on Support Vector Machines. Experimental results indicate that our sequential model outperforms current state-of-the-art discourse segmenters, with an F-score of 0.94. This performance level is close to the human agreement F-score of 0.98.
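
The sequential-labeling formulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the B/C label scheme, the feature names, and the convention of ignoring the first token when scoring boundaries are all assumptions made for illustration, and the CRF training step itself is omitted.

```python
from typing import Dict, List, Tuple


def edus_to_labels(edus: List[List[str]]) -> Tuple[List[str], List[str]]:
    """Flatten a list of elementary units into tokens and per-token labels.

    Assumed label scheme: 'B' for the first token of a unit, 'C' otherwise.
    """
    tokens: List[str] = []
    labels: List[str] = []
    for edu in edus:
        for i, tok in enumerate(edu):
            tokens.append(tok)
            labels.append("B" if i == 0 else "C")
    return tokens, labels


def token_features(tokens: List[str], pos_tags: List[str], i: int) -> Dict[str, str]:
    """Hypothetical lexical and syntactic features for token i."""
    last = len(tokens) - 1
    return {
        "word": tokens[i].lower(),
        "pos": pos_tags[i],
        "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
        "prev_pos": pos_tags[i - 1] if i > 0 else "<s>",
        "next_word": tokens[i + 1].lower() if i < last else "</s>",
        "next_pos": pos_tags[i + 1] if i < last else "</s>",
    }


def boundary_f_score(gold: List[str], pred: List[str]) -> float:
    """F-score over predicted 'B' positions.

    Assumption: position 0 is skipped because it is trivially a boundary.
    """
    gold_b = {i for i, lab in enumerate(gold) if lab == "B" and i > 0}
    pred_b = {i for i, lab in enumerate(pred) if lab == "B" and i > 0}
    if not gold_b and not pred_b:
        return 1.0
    true_pos = len(gold_b & pred_b)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_b)
    recall = true_pos / len(gold_b)
    return 2 * precision * recall / (precision + recall)


# Toy sentence split into three units (invented example, not from the treebank).
edus = [
    ["The", "bank", "said"],
    ["it", "would", "close"],
    ["because", "profits", "fell", "."],
]
tokens, gold = edus_to_labels(edus)
pos_tags = ["DT", "NN", "VBD", "PRP", "MD", "VB", "IN", "NNS", "VBD", "."]
features = [token_features(tokens, pos_tags, i) for i in range(len(tokens))]

# A prediction that misses the boundary before "because".
pred = ["B", "C", "C", "B", "C", "C", "C", "C", "C", "C"]
print(round(boundary_f_score(gold, pred), 3))  # 0.667
```

In a full system, the feature dictionaries and label sequences would be passed to a CRF toolkit for training; the scoring function above only shows how a boundary-level F-score could be computed under the stated assumptions.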

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 315-326
Number of pages: 12
Volume: 6008 LNCS
DOIs: https://doi.org/10.1007/978-3-642-12116-6_26
Publication status: Published - 2010
Externally published: Yes
Event: 11th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2010 - Iasi
Duration: 2010 Mar 21 to 2010 Mar 27

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6008 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 11th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2010
City: Iasi
Period: 2010 Mar 21 to 2010 Mar 27

Fingerprint

Segmentation
Syntactics
Labeling
Support vector machines
Processing
Conditional Random Fields
Model
Question Answering
Summarization
Natural Language
Support Vector Machine
Discourse
Unit
Experimental Results
Text

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Hernault, H., Bollegala, D., & Ishizuka, M. (2010). A sequential model for discourse segmentation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6008 LNCS, pp. 315-326). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6008 LNCS). https://doi.org/10.1007/978-3-642-12116-6_26

A sequential model for discourse segmentation. / Hernault, Hugo; Bollegala, Danushka; Ishizuka, Mitsuru.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 6008 LNCS 2010. p. 315-326 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6008 LNCS).

Hernault, H, Bollegala, D & Ishizuka, M 2010, A sequential model for discourse segmentation. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 6008 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6008 LNCS, pp. 315-326, 11th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2010, Iasi, 10/3/21. https://doi.org/10.1007/978-3-642-12116-6_26
Hernault H, Bollegala D, Ishizuka M. A sequential model for discourse segmentation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 6008 LNCS. 2010. p. 315-326. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-642-12116-6_26
Hernault, Hugo ; Bollegala, Danushka ; Ishizuka, Mitsuru. / A sequential model for discourse segmentation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 6008 LNCS 2010. pp. 315-326 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{c42bc5e0bc1946f6a572c43a7c1c0fc6,
title = "A sequential model for discourse segmentation",
abstract = "Identifying discourse relations in a text is essential for various tasks in Natural Language Processing, such as automatic text summarization, question-answering, and dialogue generation. The first step of this process is segmenting a text into elementary units. In this paper, we present a novel model of discourse segmentation based on sequential data labeling. Namely, we use Conditional Random Fields to train a discourse segmenter on the RST Discourse Treebank, using a set of lexical and syntactic features. Our system is compared to other statistical and rule-based segmenters, including one based on Support Vector Machines. Experimental results indicate that our sequential model outperforms current state-of-the-art discourse segmenters, with an F-score of 0.94. This performance level is close to the human agreement F-score of 0.98.",
author = "Hugo Hernault and Danushka Bollegala and Mitsuru Ishizuka",
year = "2010",
doi = "10.1007/978-3-642-12116-6_26",
language = "English",
isbn = "3642121152",
volume = "6008 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "315--326",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - A sequential model for discourse segmentation

AU - Hernault, Hugo

AU - Bollegala, Danushka

AU - Ishizuka, Mitsuru

PY - 2010

Y1 - 2010

N2 - Identifying discourse relations in a text is essential for various tasks in Natural Language Processing, such as automatic text summarization, question-answering, and dialogue generation. The first step of this process is segmenting a text into elementary units. In this paper, we present a novel model of discourse segmentation based on sequential data labeling. Namely, we use Conditional Random Fields to train a discourse segmenter on the RST Discourse Treebank, using a set of lexical and syntactic features. Our system is compared to other statistical and rule-based segmenters, including one based on Support Vector Machines. Experimental results indicate that our sequential model outperforms current state-of-the-art discourse segmenters, with an F-score of 0.94. This performance level is close to the human agreement F-score of 0.98.

UR - http://www.scopus.com/inward/record.url?scp=78650444122&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78650444122&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-12116-6_26

DO - 10.1007/978-3-642-12116-6_26

M3 - Conference contribution

AN - SCOPUS:78650444122

SN - 3642121152

SN - 9783642121159

VL - 6008 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 315

EP - 326

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -