Supervised Learning of Keyphrase Extraction Utilizing Prior Summarization

Tingyi Liu, Mizuho Iwaihara*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Keyphrase extraction is the task of selecting a set of phrases that best represent a given document. It is used in document indexing and categorization, making it one of the core technologies of digital libraries. Supervised keyphrase extraction based on pretrained language models is advantageous because of their contextualized text representations. In this paper, we show an adaptation of the pretrained language model BERT to keyphrase extraction, called BERT Keyphrase-Rank (BK-Rank), based on a cross-encoder architecture. However, the accuracy of BK-Rank alone suffers when documents contain a large number of candidate phrases, especially in long documents. Based on the notion that keyphrases are more likely to occur in representative sentences of the document, we propose a new approach called Keyphrase-Focused BERT Summarization (KFBS), which extracts important sentences as a summary, from which BK-Rank can more easily find keyphrases. KFBS is trained by distant supervision, in which sentences lexically similar to the keyphrase set are chosen as positive samples. Our experimental results show that the combination of KFBS + BK-Rank outperforms the compared baseline methods on four well-known benchmark collections, especially on long documents.
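The distant-supervision idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how lexical similarity is measured or how many sentences are labeled positive, so the token-overlap score, the `top_k` cutoff, and the function names `tokenize`, `keyphrase_overlap`, and `label_sentences` are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyphrase_overlap(sentence, keyphrases):
    """Fraction of distinct keyphrase tokens that also occur in the sentence.

    Assumption: simple token overlap stands in for the lexical similarity
    mentioned in the abstract, which does not name a specific measure.
    """
    key_tokens = {tok for phrase in keyphrases for tok in tokenize(phrase)}
    if not key_tokens:
        return 0.0
    sent_tokens = set(tokenize(sentence))
    return len(key_tokens & sent_tokens) / len(key_tokens)


def label_sentences(sentences, keyphrases, top_k=2):
    """Label the top_k highest-overlap sentences as positive (1), others 0.

    Sentences with zero overlap are never labeled positive.
    Assumption: a fixed top_k cutoff; the real selection rule may differ.
    """
    scores = [keyphrase_overlap(s, keyphrases) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    positives = {i for i in ranked[:top_k] if scores[i] > 0}
    return [1 if i in positives else 0 for i in range(len(sentences))]
```

Labels produced this way could serve as training targets for a sentence-level summarizer, whose selected sentences are then passed to a keyphrase ranker. The gold keyphrases are only needed at training time.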

Original language: English
Title of host publication: Towards Open and Trustworthy Digital Societies - 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Proceedings
Editors: Hao-Ren Ke, Chei Sian Lee, Kazunari Sugiyama
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 157-166
Number of pages: 10
ISBN (Print): 9783030916688
Publication status: Published - 2021
Event: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021 - Virtual, Online
Duration: 2021 Dec 1 → 2021 Dec 3

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13133 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021
City: Virtual, Online
Period: 21/12/1 → 21/12/3

Keywords

  • Document indexing
  • Extractive summarization
  • Keyphrase extraction
  • Pretrained language model
  • Supervised learning

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
