Supervised Learning of Keyphrase Extraction Utilizing Prior Summarization

Tingyi Liu, Mizuho Iwaihara*

*Corresponding author for this work

Research output: Conference contribution

Abstract

Keyphrase extraction is the task of selecting a set of phrases that best represent a given document. Keyphrase extraction is used in document indexing and categorization, making it one of the core technologies of digital libraries. Supervised keyphrase extraction based on pretrained language models benefits from the contextualized text representations these models provide. In this paper, we present an adaptation of the pretrained language model BERT to keyphrase extraction, called BERT Keyphrase-Rank (BK-Rank), based on a cross-encoder architecture. However, the accuracy of BK-Rank alone suffers when documents contain a large number of candidate phrases, especially in long documents. Based on the notion that keyphrases are more likely to occur in representative sentences of the document, we propose a new approach called Keyphrase-Focused BERT Summarization (KFBS), which extracts important sentences as a summary, from which BK-Rank can more easily find keyphrases. KFBS is trained by distant supervision: sentences lexically similar to the keyphrase set are chosen as positive samples. Our experimental results show that the combination of KFBS + BK-Rank outperforms the compared baseline methods on four well-known benchmark collections, especially on long documents.
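The abstract describes the distant supervision for KFBS only at a high level: sentences lexically similar to the gold keyphrase set become positive training samples, and the resulting summary narrows the candidates passed to BK-Rank. The Python sketch below illustrates one possible reading of those two steps. The token-overlap similarity, the `top_k` cutoff, the regex tokenizer, and the function names `lexical_similarity`, `distant_labels`, and `candidates_in_summary` are hypothetical assumptions, not the paper's definitions; the BERT-based KFBS classifier and the BK-Rank cross-encoder are not reproduced here.

```python
import re
from typing import List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokenization (assumed preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_similarity(sentence: str, keyphrases: List[str]) -> float:
    """Hypothetical measure: fraction of distinct keyphrase tokens found in the sentence."""
    kp_tokens: Set[str] = {t for kp in keyphrases for t in tokenize(kp)}
    if not kp_tokens:
        return 0.0
    return len(kp_tokens & set(tokenize(sentence))) / len(kp_tokens)


def distant_labels(sentences: List[str], keyphrases: List[str], top_k: int = 2) -> List[int]:
    """Mark the top_k sentences most similar to the keyphrase set as positive (1), others 0."""
    scores = [lexical_similarity(s, keyphrases) for s in sentences]
    # sorted() is stable, so ties keep their original sentence order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    positives = set(ranked[:top_k])
    return [1 if i in positives else 0 for i in range(len(sentences))]


def candidates_in_summary(candidates: List[str], summary: List[str]) -> List[str]:
    """Keep only candidate phrases whose token sequence occurs contiguously in a summary sentence."""
    summary_tokens = [tokenize(s) for s in summary]
    kept = []
    for cand in candidates:
        c = tokenize(cand)
        n = len(c)
        if n and any(
            toks[i:i + n] == c
            for toks in summary_tokens
            for i in range(len(toks) - n + 1)
        ):
            kept.append(cand)
    return kept


if __name__ == "__main__":
    sentences = [
        "We propose a keyphrase extraction method based on BERT.",
        "Experiments were run on four datasets.",
        "Sentence summarization reduces the number of candidate phrases.",
    ]
    gold = ["keyphrase extraction", "BERT", "summarization"]
    labels = distant_labels(sentences, gold, top_k=2)
    print(labels)  # [1, 0, 1]
    summary = [s for s, y in zip(sentences, labels) if y == 1]
    print(candidates_in_summary(
        ["keyphrase extraction", "four datasets", "candidate phrases"], summary
    ))  # ['keyphrase extraction', 'candidate phrases']
```

In this toy example, the sentence about datasets shares no tokens with the keyphrase set, so it is labeled negative, and the candidate "four datasets" drops out before ranking. This mirrors, under the stated assumptions, why ranking over a summary can be easier than ranking over every candidate in a long document.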

Original language: English
Host publication title: Towards Open and Trustworthy Digital Societies - 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Proceedings
Editors: Hao-Ren Ke, Chei Sian Lee, Kazunari Sugiyama
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 157-166
Number of pages: 10
ISBN (Print): 9783030916688
DOI
Publication status: Published - 2021
Event: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021 - Virtual, Online
Duration: 1 Dec 2021 - 3 Dec 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13133 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021
City: Virtual, Online
Period: 21/12/1 - 21/12/3

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
