From rhetorical structures to document structure: Shallow pragmatic analysis for document engineering

Gersende Georg, Hugo Hernault, Marc Cavazza, Helmut Prendinger, Mitsuru Ishizuka

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

In this paper, we extend previous work on the automatic structuring of medical documents using content analysis. Our long-term objective is to take advantage of specific rhetorical markers encountered in specialized medical documents (clinical guidelines) to automatically structure free text according to its role in the document. This should make it possible to generate multiple views of the same document depending on the target audience, to produce document summaries, and to facilitate knowledge extraction from text. We have established in previous work that the structure of clinical guidelines could be refined through the identification of a limited set of deontic operators. We now propose to extend this approach by analyzing the text delimited by these operators using Rhetorical Structure Theory (RST). The emphasis on causality and time in RST proves a powerful complement to the recognition of deontic structures while retaining the same philosophy of high-level recognition of sentence structure, which can be converted into application-specific mark-ups. Throughout the paper, we illustrate our findings through results produced by the automatic processing of English guidelines for the management of hypertension and Alzheimer disease.
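
As a rough illustration of what "text delimited by deontic operators" can mean in practice, the following minimal Python sketch splits a sentence around the first deontic marker it finds. It is not the authors' implementation: the marker list, the function name split_on_deontic, and the example sentence are all hypothetical and chosen only for illustration, since the abstract does not enumerate the operators used.

```python
import re

# Hypothetical marker list, for illustration only; the abstract does not
# give the paper's actual set of deontic operators.
# Longer markers come first so "should not" wins over "should".
DEONTIC_MARKERS = ["should not", "must not", "should", "must", "is recommended", "may"]

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(m) for m in DEONTIC_MARKERS) + r")\b",
    re.IGNORECASE,
)


def split_on_deontic(sentence):
    """Return (front, operator, back) for the first marker found, else None."""
    match = _PATTERN.search(sentence)
    if match is None:
        return None
    front = sentence[: match.start()].strip()
    back = sentence[match.end():].strip()
    return front, match.group(1).lower(), back


if __name__ == "__main__":
    # Invented example sentence, not taken from any guideline.
    example = "Blood pressure should be measured at each visit."
    print(split_on_deontic(example))
```

The two text spans returned on either side of the operator are the kind of segments that a later stage, such as an RST parser, could analyze for causal or temporal relations.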

Original language: English
Title of host publication: DocEng'09 - Proceedings of the 2009 ACM Symposium on Document Engineering
Pages: 185-192
Number of pages: 8
DOIs: https://doi.org/10.1145/1600193.1600235
Publication status: Published - 2009
Externally published: Yes
Event: 9th ACM Symposium on Document Engineering, DocEng'09 - Munich
Duration: 2009 Sep 15 to 2009 Sep 18

Other

Other: 9th ACM Symposium on Document Engineering, DocEng'09
City: Munich
Period: 2009 Sep 15 to 2009 Sep 18

Keywords

  • Medical document processing
  • Natural language processing

ASJC Scopus subject areas

  • Computer Science Applications
  • Software

Cite this

Georg, G., Hernault, H., Cavazza, M., Prendinger, H., & Ishizuka, M. (2009). From rhetorical structures to document structure: Shallow pragmatic analysis for document engineering. In DocEng'09 - Proceedings of the 2009 ACM Symposium on Document Engineering (pp. 185-192) https://doi.org/10.1145/1600193.1600235

From rhetorical structures to document structure : Shallow pragmatic analysis for document engineering. / Georg, Gersende; Hernault, Hugo; Cavazza, Marc; Prendinger, Helmut; Ishizuka, Mitsuru.

DocEng'09 - Proceedings of the 2009 ACM Symposium on Document Engineering. 2009. p. 185-192.

Georg, G, Hernault, H, Cavazza, M, Prendinger, H & Ishizuka, M 2009, From rhetorical structures to document structure: Shallow pragmatic analysis for document engineering. in DocEng'09 - Proceedings of the 2009 ACM Symposium on Document Engineering. pp. 185-192, 9th ACM Symposium on Document Engineering, DocEng'09, Munich, 09/9/15. https://doi.org/10.1145/1600193.1600235

Georg G, Hernault H, Cavazza M, Prendinger H, Ishizuka M. From rhetorical structures to document structure: Shallow pragmatic analysis for document engineering. In DocEng'09 - Proceedings of the 2009 ACM Symposium on Document Engineering. 2009. p. 185-192 https://doi.org/10.1145/1600193.1600235

Georg, Gersende ; Hernault, Hugo ; Cavazza, Marc ; Prendinger, Helmut ; Ishizuka, Mitsuru. / From rhetorical structures to document structure : Shallow pragmatic analysis for document engineering. DocEng'09 - Proceedings of the 2009 ACM Symposium on Document Engineering. 2009. pp. 185-192

@inproceedings{5d057f271bca4a3b8541614374ff674c,
title = "From rhetorical structures to document structure: Shallow pragmatic analysis for document engineering",
abstract = "In this paper, we extend previous work on the automatic structuring of medical documents using content analysis. Our long-term objective is to take advantage of specific rhetorical markers encountered in specialized medical documents (clinical guidelines) to automatically structure free text according to its role in the document. This should make it possible to generate multiple views of the same document depending on the target audience, to produce document summaries, and to facilitate knowledge extraction from text. We have established in previous work that the structure of clinical guidelines could be refined through the identification of a limited set of deontic operators. We now propose to extend this approach by analyzing the text delimited by these operators using Rhetorical Structure Theory (RST). The emphasis on causality and time in RST proves a powerful complement to the recognition of deontic structures while retaining the same philosophy of high-level recognition of sentence structure, which can be converted into application-specific mark-ups. Throughout the paper, we illustrate our findings through results produced by the automatic processing of English guidelines for the management of hypertension and Alzheimer disease.",
keywords = "Medical document processing, Natural language processing",
author = "Gersende Georg and Hugo Hernault and Marc Cavazza and Helmut Prendinger and Mitsuru Ishizuka",
year = "2009",
doi = "10.1145/1600193.1600235",
language = "English",
isbn = "9781605585758",
pages = "185--192",
booktitle = "DocEng'09 - Proceedings of the 2009 ACM Symposium on Document Engineering",

}

TY - GEN

T1 - From rhetorical structures to document structure

T2 - Shallow pragmatic analysis for document engineering

AU - Georg, Gersende

AU - Hernault, Hugo

AU - Cavazza, Marc

AU - Prendinger, Helmut

AU - Ishizuka, Mitsuru

PY - 2009

Y1 - 2009

N2 - In this paper, we extend previous work on the automatic structuring of medical documents using content analysis. Our long-term objective is to take advantage of specific rhetorical markers encountered in specialized medical documents (clinical guidelines) to automatically structure free text according to its role in the document. This should make it possible to generate multiple views of the same document depending on the target audience, to produce document summaries, and to facilitate knowledge extraction from text. We have established in previous work that the structure of clinical guidelines could be refined through the identification of a limited set of deontic operators. We now propose to extend this approach by analyzing the text delimited by these operators using Rhetorical Structure Theory (RST). The emphasis on causality and time in RST proves a powerful complement to the recognition of deontic structures while retaining the same philosophy of high-level recognition of sentence structure, which can be converted into application-specific mark-ups. Throughout the paper, we illustrate our findings through results produced by the automatic processing of English guidelines for the management of hypertension and Alzheimer disease.

AB - In this paper, we extend previous work on the automatic structuring of medical documents using content analysis. Our long-term objective is to take advantage of specific rhetorical markers encountered in specialized medical documents (clinical guidelines) to automatically structure free text according to its role in the document. This should make it possible to generate multiple views of the same document depending on the target audience, to produce document summaries, and to facilitate knowledge extraction from text. We have established in previous work that the structure of clinical guidelines could be refined through the identification of a limited set of deontic operators. We now propose to extend this approach by analyzing the text delimited by these operators using Rhetorical Structure Theory (RST). The emphasis on causality and time in RST proves a powerful complement to the recognition of deontic structures while retaining the same philosophy of high-level recognition of sentence structure, which can be converted into application-specific mark-ups. Throughout the paper, we illustrate our findings through results produced by the automatic processing of English guidelines for the management of hypertension and Alzheimer disease.

KW - Medical document processing

KW - Natural language processing

UR - http://www.scopus.com/inward/record.url?scp=70450265536&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70450265536&partnerID=8YFLogxK

U2 - 10.1145/1600193.1600235

DO - 10.1145/1600193.1600235

M3 - Conference contribution

AN - SCOPUS:70450265536

SN - 9781605585758

SP - 185

EP - 192

BT - DocEng'09 - Proceedings of the 2009 ACM Symposium on Document Engineering

ER -