Duplicate Bug Report Detection by Using Sentence Embedding and Fine-tuning

Haruna Isotani, Hironori Washizaki, Yoshiaki Fukazawa, Tsutomu Nomoto, Saori Ouji, Shinobu Saito

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Industrial software maintenance devotes much time and effort to finding duplicate bug reports. In this paper, we propose an automated duplicate bug report detection system to improve software maintenance efficiency. Our system detects duplicate reports by vectorizing the contents of each report item with deep-learning-based sentence embedding and computing the similarity of whole reports from the similarities of their item vectors. Sentence-BERT fine-tuned on report texts is used for sentence embedding. Finally, in industrial experiments that compare performance against existing methods, we verify that combining item-by-item processing with Sentence-BERT fine-tuned on reports detects duplicate bug reports effectively.
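
The item-wise comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `embed` function is a hashed bag-of-words stand-in for the fine-tuned Sentence-BERT encoder, the item names (`title`, `steps`) are hypothetical, and averaging the per-item cosine similarities is an assumption, since the abstract does not state how item similarities are combined.

```python
import hashlib
import math

DIM = 64  # hypothetical vector size for the stand-in encoder


def embed(text):
    """Stand-in encoder: hashed bag-of-words, L2-normalized.

    Placeholder only; the paper uses a fine-tuned Sentence-BERT model here.
    """
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def cosine(u, v):
    # Inputs are L2-normalized (or all-zero), so the dot product is the cosine.
    return sum(a * b for a, b in zip(u, v))


def report_similarity(report_a, report_b):
    """Mean of per-item cosine similarities over items present in both reports.

    Assumption: an unweighted mean; the abstract does not specify the combination.
    """
    items = [name for name in report_a if name in report_b]
    if not items:
        return 0.0
    scores = [cosine(embed(report_a[name]), embed(report_b[name])) for name in items]
    return sum(scores) / len(scores)


# Hypothetical reports, each split into named items.
new_report = {"title": "App crashes on login", "steps": "Open app and tap login"}
old_report = {"title": "Crash when logging in", "steps": "Open app then tap login button"}
score = report_similarity(new_report, old_report)
```

In practice, a new report would be scored against every stored report and the highest-scoring candidates shown to a maintainer; swapping `embed` for a real sentence encoder leaves the rest of the sketch unchanged.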

Original language: English
Title of host publication: Proceedings - 2021 IEEE International Conference on Software Maintenance and Evolution, ICSME 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 535-544
Number of pages: 10
ISBN (Electronic): 9781665428828
Publication status: Published - 2021
Event: 37th IEEE International Conference on Software Maintenance and Evolution, ICSME 2021 - Luxembourg City, Luxembourg
Duration: 2021 Sep 27 to 2021 Oct 1

Publication series

Name: Proceedings - 2021 IEEE International Conference on Software Maintenance and Evolution, ICSME 2021

Conference

Conference: 37th IEEE International Conference on Software Maintenance and Evolution, ICSME 2021
Country/Territory: Luxembourg
City: Luxembourg City
Period: 21/9/27 to 21/10/1

Keywords

  • BERT
  • Bug reports
  • duplicate detection
  • information retrieval
  • natural language processing
  • sentence embedding

ASJC Scopus subject areas

  • Software
  • Safety, Risk, Reliability and Quality
