Supervised Learning of Keyphrase Extraction Utilizing Prior Summarization

Tingyi Liu, Mizuho Iwaihara*


Research output: Conference contribution

2 Citations (Scopus)


Keyphrase extraction is the task of selecting a set of phrases that best represent a given document. It is used in document indexing and categorization, making it one of the core technologies of digital libraries. Supervised keyphrase extraction based on pretrained language models is advantageous through its contextualized text representations. In this paper, we show an adaptation of the pretrained language model BERT to keyphrase extraction, called BERT Keyphrase-Rank (BK-Rank), based on a cross-encoder architecture. However, the accuracy of BK-Rank alone suffers when documents contain a large number of candidate phrases, especially in long documents. Based on the notion that keyphrases are more likely to occur in representative sentences of the document, we propose a new approach called Keyphrase-Focused BERT Summarization (KFBS), which extracts important sentences as a summary, from which BK-Rank can more easily find keyphrases. KFBS is trained by distant supervision, such that sentences lexically similar to the keyphrase set are chosen as positive samples. Our experimental results show that the combination of KFBS + BK-Rank shows superior performance over the compared baseline methods on four well-known benchmark collections, especially on long documents.
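
The distant-supervision step described above can be illustrated with a minimal sketch. The abstract does not specify how lexical similarity is measured, so the token-overlap score, the 0.5 threshold, and the names `tokens`, `label_sentences`, and `select_summary` are all assumptions made for illustration only; they are not the authors' implementation.

```python
import re


def tokens(text):
    """Lowercase alphanumeric tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def label_sentences(sentences, keyphrases, threshold=0.5):
    """Assign 1 (positive) or 0 (negative) to each sentence.

    Assumed similarity: the fraction of keyphrase-set tokens that also
    occur in the sentence. The measure and threshold are hypothetical.
    """
    key_vocab = {t for kp in keyphrases for t in tokens(kp)}
    labels = []
    for sentence in sentences:
        sent_vocab = set(tokens(sentence))
        score = len(key_vocab & sent_vocab) / len(key_vocab) if key_vocab else 0.0
        labels.append(1 if score >= threshold else 0)
    return labels


def select_summary(sentences, labels):
    """Keep only the sentences labeled positive, in document order."""
    return [s for s, y in zip(sentences, labels) if y == 1]


# Toy document and gold keyphrases (invented for this example).
sentences = [
    "Keyphrase extraction supports document indexing in digital libraries.",
    "The weather was pleasant during the conference.",
    "We fine-tune a language model for keyphrase extraction.",
]
keyphrases = ["keyphrase extraction", "digital libraries", "language model"]

labels = label_sentences(sentences, keyphrases)
summary = select_summary(sentences, labels)
```

In this sketch, the positive sentences would serve as training targets for a sentence-level summarizer, and a ranker such as BK-Rank would then score candidate phrases drawn only from the selected summary rather than from the full document.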

Title of host publication: Towards Open and Trustworthy Digital Societies - 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Proceedings
Editors: Hao-Ren Ke, Chei Sian Lee, Kazunari Sugiyama
Publisher: Springer Science and Business Media Deutschland GmbH
Publication status: Published - 2021
Event: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021 - Virtual, Online
Duration: 1 Dec 2021 to 3 Dec 2021


Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13133 LNCS


Conference: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021
City: Virtual, Online

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

