Improving Syntactical Clone Detection Methods through the Use of an Intermediate Representation

Pedro M. Caldeira, Kazunori Sakamoto, Hironori Washizaki, Yoshiaki Fukazawa, Takahisa Shimada

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Detecting type-3 and type-4 clones remains difficult. Current methods are complex, both conceptually and computationally, and using them requires substantial implementation effort. Instead of creating yet another method, it might be more productive to combine the simplicity of syntactic approaches with the abstraction provided by intermediate representations (IRs). To this end, we devised a C-like IR based on LLVM and ran NiCad on it (LLNiCad). To establish whether an IR can improve the clone detection capabilities of syntactic approaches, we compared NiCad and LLNiCad on three open source projects taken from Krutz's benchmark and on a subset of Google Code Jam (GCJ) solutions. In our results, LLNiCad consistently achieves a higher F1-score than NiCad. Across all clone types in Krutz's benchmark, LLNiCad's F1-score is 37% higher than NiCad's, with both better precision and recall. For type-4 clones in our GCJ benchmark, LLNiCad's F1-score also exceeds that of CCCD (a semantic clone detector) by 44%. These findings suggest that IRs are beneficial for clone detection and that they have a larger impact on type-3 and type-4 clones.
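
For readers unfamiliar with the metric, the F1-score compared above is the harmonic mean of precision and recall. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `f1_score` and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall).

    The arguments are counts of clone pairs: correctly reported,
    wrongly reported, and missed. Returns 0.0 when the score is undefined.
    """
    reported = true_positives + false_positives   # pairs the detector reported
    actual = true_positives + false_negatives     # pairs in the reference set
    if reported == 0 or actual == 0:
        return 0.0
    precision = true_positives / reported
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not results from the paper).
example = f1_score(true_positives=30, false_positives=10, false_negatives=30)
```

With these illustrative counts, precision is 0.75 and recall is 0.5, so a detector that reports few false clones but misses half of the real ones is still penalized by the combined score.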

Original language: English
Title of host publication: IWSC 2020 - Proceedings of the 2020 IEEE 14th International Workshop on Software Clones
Editors: Hitesh Sajnani, Chaiyong Ragkhitwetsagul
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 8-14
Number of pages: 7
ISBN (Electronic): 9781728162690
DOIs
Publication status: Published - 2020 Feb
Event: 14th IEEE International Workshop on Software Clones, IWSC 2020 - London, Canada
Duration: 2020 Feb 18 → …

Publication series

Name: IWSC 2020 - Proceedings of the 2020 IEEE 14th International Workshop on Software Clones

Conference

Conference: 14th IEEE International Workshop on Software Clones, IWSC 2020
Country: Canada
City: London
Period: 20/2/18 → …

ASJC Scopus subject areas

  • Software
  • Safety, Risk, Reliability and Quality
