Extraction of lexical bundles used in natural language processing articles

Chooi Ling Goh, Yves Lepage

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Lexical bundles are indispensable for fluent academic writing. They may not constitute complete structural units, but they occur very frequently in academic conversations, conference presentations, and scientific articles. This paper shows how to collect a large database of lexical bundles from articles in the Natural Language Processing (NLP) domain. We first collect highly frequent N-grams from the ACL-ARC collection of NLP articles and then classify them as true or false lexical bundles using machine learning models trained on a set of manually checked bundles. In a verification experiment, our best model achieves an accuracy of 76%. Using this model, we extract more than 18,000 lexical bundles from the ACL-ARC corpus, which we publicly release.
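
The first step described in the abstract, collecting highly frequent N-grams as candidate bundles, could be sketched roughly as follows. This is only an illustration under stated assumptions and not the authors' implementation: the function name `frequent_ngrams`, the whitespace tokenization, the N-gram lengths (3 to 5), the frequency threshold, and the toy corpus are all hypothetical choices that do not come from the paper. The subsequent classification into true or false bundles is not shown.

```python
from collections import Counter


def frequent_ngrams(sentences, n_min=3, n_max=5, min_count=2):
    """Count word N-grams and keep those seen at least min_count times.

    All parameters are illustrative assumptions, not values from the paper.
    """
    counts = Counter()
    for sentence in sentences:
        # Naive lowercase whitespace tokenization (an assumption).
        tokens = sentence.lower().split()
        for n in range(n_min, n_max + 1):
            # Slide a window of length n over the token sequence.
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    # Keep only the N-grams that reach the frequency threshold.
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Toy stand-in for a corpus of NLP articles (hypothetical data).
toy_corpus = [
    "in this paper we propose a new method",
    "in this paper we describe a parser",
    "the results show that in this paper we improve accuracy",
]
candidates = frequent_ngrams(toy_corpus)
```

On a real corpus, the resulting candidates would then be passed to a classifier trained on manually checked bundles, as the abstract describes.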

Original language: English
Title of host publication: 2019 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 223-228
Number of pages: 6
ISBN (Electronic): 9781728152929
Publication status: Published - 2019 Oct
Event: 11th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2019 - Bali, Indonesia
Duration: 2019 Oct 12 to 2019 Oct 13

Publication series

Name: 2019 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2019

Conference

Conference: 11th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2019
Country: Indonesia
City: Bali
Period: 19/10/12 to 19/10/13

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Information Systems
  • Health Informatics
  • Education
  • Communication
