
Don't mention it: An approach to assess challenges to using software mentions for citation and discoverability research

Druskat, Stephan and Chue Hong, Neil P. and Buzzard, Sammie and Konovalov, Olexandr and Kornek, Patrick (2024) Don't mention it: An approach to assess challenges to using software mentions for citation and discoverability research. [Other]


Official URL: http://arxiv.org/abs/2402.14602

Abstract

Datasets that collect software mentions from scholarly publications can potentially be used for research into the software used in published research, as well as into the practice of software citation. Recently, new software mention datasets with different characteristics have been published. We present an approach to assess the usability of such datasets for research on research software. Our approach includes sampling and data preparation, manual annotation for quality and mention characteristics, and annotation analysis. We applied it to two software mention datasets for evaluation based on qualitative observation. In doing so, we identified challenges to using the selected datasets for research. The main issues concern the structure of the dataset, the quality of the extracted mentions (54% and 23% of mentions, respectively, do not refer to software), and software accessibility. While one dataset does not provide links to mentioned software at all, the other does so in a way that can impede quantitative research endeavors: (1) Links may come from different sources, and each may point to different software for the same mention. (2) The quality of the automatically retrieved links is generally poor (in our sample, 65.4% link to the wrong software). (3) Links exist only for a small subset of mentions (in our sample, 20.5%), which may lead to skewed or disproportionate samples. However, the greatest challenge and underlying issue in working with software mention datasets is the still suboptimal practice of software citation: Software should not be mentioned; it should be cited following the software citation principles.
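
The analysis step named in the abstract (share of mentions that are not software, share of mentions with a link, share of links pointing to the wrong software) can be illustrated with a minimal Python sketch. This is not the authors' code. The record fields, function names, and toy records below are hypothetical, and the sketch assumes that the wrong-link share is computed over linked mentions only, which the abstract does not state explicitly.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotatedMention:
    """One manually annotated software mention (all fields are hypothetical)."""
    text: str
    is_software: bool                    # annotator judgment: does the mention refer to software?
    link: Optional[str] = None           # link supplied by the dataset, if any
    link_correct: Optional[bool] = None  # annotator judgment; None when there is no link


def draw_sample(mentions, size, seed=0):
    """Draw a reproducible simple random sample without replacement."""
    rng = random.Random(seed)
    return rng.sample(mentions, min(size, len(mentions)))


def summarize(sample):
    """Return three shares as fractions in [0, 1] for an annotated sample."""
    if not sample:
        raise ValueError("sample must not be empty")
    linked = [m for m in sample if m.link is not None]
    wrong = [m for m in linked if m.link_correct is False]
    return {
        "not_software": sum(not m.is_software for m in sample) / len(sample),
        "with_link": len(linked) / len(sample),
        # Assumption: the denominator is the number of linked mentions.
        "wrong_link": len(wrong) / len(linked) if linked else 0.0,
    }


# Toy records invented for illustration; they are not taken from either dataset.
toy = [
    AnnotatedMention("ExampleTool", True, "https://example.org/exampletool", True),
    AnnotatedMention("ExampleLib", True, "https://example.org/other", False),
    AnnotatedMention("Figure 2", False),
    AnnotatedMention("ExampleStats", True),
]
stats = summarize(toy)
```

Keeping the link judgment separate from the software judgment lets the three shares be reported independently, which matches how the abstract reports them.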

Item URL in elib: https://elib.dlr.de/202972/
Document Type: Other
Additional Information: 2nd revision of a submission to PeerJ Computer Science. Original submission withdrawn due to impracticalities of examining a sufficient sample size.
Title: Don't mention it: An approach to assess challenges to using software mentions for citation and discoverability research
Authors:
Authors | Institution or Email of Authors | Author's ORCID iD | ORCID Put Code
Druskat, Stephan | UNSPECIFIED | https://orcid.org/0000-0003-4925-7248 | 154027141
Chue Hong, Neil P. | EPCC, University of Edinburgh, Edinburgh, UK | https://orcid.org/0000-0002-8876-7606 | UNSPECIFIED
Buzzard, Sammie | School of Earth and Environmental Sciences, Cardiff University, Cardiff, United Kingdom | UNSPECIFIED | UNSPECIFIED
Konovalov, Olexandr | School of Computer Science, University of St Andrews, St Andrews, United Kingdom | https://orcid.org/0000-0001-5299-3292 | UNSPECIFIED
Kornek, Patrick | School of Computer Science, University of St Andrews, St Andrews, United Kingdom | UNSPECIFIED | UNSPECIFIED
Date: 2024
Journal or Publication Title: ArXiv
Refereed publication: Yes
Open Access: Yes
DOI: 10.48550/arXiv.2402.14602
Number of Pages: 17
Status: Published
Keywords: Software citation, empirical software engineering, datasets
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program Themes: Space System Technology
DLR - Research area: Space
DLR - Program: R SY - Space System Technology
DLR - Research theme (Project): R - Tasks SISTEC
Location: Berlin-Adlershof
Institutes and Institutions: Institute of Software Technology > Intelligent and Distributed Systems; Institute of Software Technology
Deposited By: Druskat, Stephan
Deposited On: 26 Feb 2024 10:08
Last Modified: 26 Feb 2024 10:08
