Druskat, Stephan and Chue Hong, Neil P. and Buzzard, Sammie and Konovalov, Olexandr and Kornek, Patrick (2024) Don't mention it: An approach to assess challenges to using software mentions for citation and discoverability research. [Other]
Official URL: http://arxiv.org/abs/2402.14602
Abstract
Datasets collecting software mentions from scholarly publications can potentially be used for research into the software used in published research, as well as into the practice of software citation. Recently, new software mention datasets with different characteristics have been published. We present an approach to assess the usability of such datasets for research on research software. Our approach includes sampling and data preparation, manual annotation for quality and mention characteristics, and annotation analysis. We applied it to two software mention datasets for evaluation based on qualitative observation. In doing so, we identified challenges to using the selected datasets for research. The main issues concern the structure of the datasets, the quality of the extracted mentions (54% and 23% of mentions, respectively, do not refer to software), and software accessibility. While one dataset does not provide links to mentioned software at all, the other does so in a way that can impede quantitative research endeavors: (1) Links may come from different sources, each pointing to different software for the same mention. (2) The quality of the automatically retrieved links is generally poor (in our sample, 65.4% link the wrong software). (3) Links exist only for a small subset (in our sample, 20.5%) of mentions, which may lead to skewed or disproportionate samples. However, the greatest challenge, and the underlying issue in working with software mention datasets, is the still suboptimal practice of software citation: Software should not be mentioned, it should be cited following the software citation principles.
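The annotation-analysis step boils down to counting shares within the manually annotated sample. The following is a minimal sketch in Python, not the authors' actual tooling. It assumes a hypothetical per-mention record with the fields `is_software`, `link` and `link_correct`; none of these names come from the paper or from either dataset. It also assumes that the wrong-link share is computed over linked mentions only, while the other two shares are computed over the whole sample.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotatedMention:
    # Hypothetical annotation record for one sampled mention (field names are assumptions).
    is_software: bool                    # annotator judged the mention to refer to software
    link: Optional[str]                  # link supplied by the dataset, or None if absent
    link_correct: Optional[bool] = None  # annotator judgement; only meaningful when link is set


def share(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place; 0.0 when the denominator is zero."""
    return round(100 * numerator / denominator, 1) if denominator else 0.0


def summarize(sample: list[AnnotatedMention]) -> dict[str, float]:
    """Compute illustrative quality shares for an annotated sample."""
    total = len(sample)
    not_software = sum(1 for m in sample if not m.is_software)
    linked = [m for m in sample if m.link is not None]
    wrong_links = sum(1 for m in linked if m.link_correct is False)
    return {
        "not_software_pct": share(not_software, total),    # over all sampled mentions
        "linked_pct": share(len(linked), total),           # over all sampled mentions
        "wrong_link_pct": share(wrong_links, len(linked)),  # over linked mentions only (assumed)
    }


if __name__ == "__main__":
    # Made-up toy sample for illustration; these are not data from the paper.
    toy = [
        AnnotatedMention(True, "https://example.org/a", True),
        AnnotatedMention(True, "https://example.org/b", False),
        AnnotatedMention(False, None),
        AnnotatedMention(True, None),
    ]
    print(summarize(toy))
```

On the toy sample, one of four mentions is not software (25.0), two of four carry a link (50.0), and one of those two links is wrong (50.0). Keeping the denominators explicit in the comments matters because a share such as "wrong links" changes meaning depending on whether it is taken over linked mentions or over all mentions.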
Item URL in elib: https://elib.dlr.de/202972/
Document Type: Other
Additional Information: 2nd revision of a submission to PeerJ Computer Science. Original submission withdrawn due to impracticalities of examining a sufficient sample size.
Title: Don't mention it: An approach to assess challenges to using software mentions for citation and discoverability research
Authors: Druskat, Stephan; Chue Hong, Neil P.; Buzzard, Sammie; Konovalov, Olexandr; Kornek, Patrick
Date: 2024
Journal or Publication Title: ArXiv
Refereed publication: Yes
Open Access: Yes
DOI: 10.48550/arXiv.2402.14602
Number of Pages: 17
Status: Published
Keywords: Software citation, empirical software engineering, datasets
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program Themes: Space System Technology
DLR - Research area: Raumfahrt
DLR - Program: R SY - Space System Technology
DLR - Research theme (Project): R - Tasks SISTEC
Location: Berlin-Adlershof
Institutes and Institutions: Institute of Software Technology > Intelligent and Distributed Systems; Institute of Software Technology
Deposited By: Druskat, Stephan
Deposited On: 26 Feb 2024 10:08
Last Modified: 26 Feb 2024 10:08