Sonnekalb, Tim and Gruner, Bernd and Brust, Clemens-Alexander and Mäder, Patrick (2022) Generalizability of Code Clone Detection on CodeBERT. In: 37th IEEE/ACM International Conference on Automated Software Engineering, ASE 2022. ACM. ASE 2022, 2022-10-10 - 2022-10-14, Michigan, USA. ISBN 978-145039475-8.
Abstract
Transformer networks such as CodeBERT already achieve very good results for code clone detection on benchmark datasets, so one could assume that this task has already been solved. However, code clone detection is not a trivial task, and semantic code clones in particular are difficult to detect. By evaluating two different subsets of Java code clones from BigCloneBench, we show that the generalizability of CodeBERT is limited: we observe a significant drop in F1 score when we evaluate on code snippets and functionality IDs different from those used for model building.
| Item URL in elib: | https://elib.dlr.de/144942/ |
|---|---|
| Document Type: | Conference or Workshop Item (Speech) |
| Title: | Generalizability of Code Clone Detection on CodeBERT |
| Authors: | Sonnekalb, Tim; Gruner, Bernd; Brust, Clemens-Alexander; Mäder, Patrick |
| Date: | 10 October 2022 |
| Journal or Publication Title: | 37th IEEE/ACM International Conference on Automated Software Engineering, ASE 2022 |
| Refereed publication: | Yes |
| Open Access: | Yes |
| Gold Open Access: | No |
| In SCOPUS: | No |
| In ISI Web of Science: | Yes |
| Publisher: | ACM |
| ISBN: | 978-145039475-8 |
| Status: | Published |
| Keywords: | clone detection, transformer networks, bigclonebench, machine learning on code |
| Event Title: | ASE 2022 |
| Event Location: | Michigan, USA |
| Event Type: | International Conference |
| Event Start Date: | 10 October 2022 |
| Event End Date: | 14 October 2022 |
| HGF - Research field: | Aeronautics, Space and Transport |
| HGF - Program: | Space |
| HGF - Program Themes: | Space System Technology |
| DLR - Research area: | Space |
| DLR - Program: | R SY - Space System Technology |
| DLR - Research theme (Project): | R - Intelligent analysis and methods for safe software development, D - short study [DAT], D - short study [KIZ] |
| Location: | Jena, Köln-Porz, Oberpfaffenhofen |
| Institutes and Institutions: | Institute of Data Science; Institute of Data Science > Data Analysis and Intelligence |
| Deposited By: | Sonnekalb, Tim |
| Deposited On: | 05 Dec 2022 10:46 |
| Last Modified: | 24 Apr 2024 20:44 |