Druskat, Stephan and Chue Hong, Neil P. and Buzzard, Sammie and Konovalov, Olexandr and Kornek, Patrick (2024) Don't mention it: An approach to assess challenges to using software mentions for citation and discoverability research. [Other publication]
Official URL: http://arxiv.org/abs/2402.14602
Abstract
Datasets collecting software mentions from scholarly publications can potentially be used for research into the software that has been used in the published research, as well as into the practice of software citation. Recently, new software mention datasets with different characteristics have been published. We present an approach to assess the usability of such datasets for research on research software. Our approach includes sampling and data preparation, manual annotation for quality and mention characteristics, and annotation analysis. We applied it to two software mention datasets for evaluation based on qualitative observation. In doing so, we identified challenges to using the selected datasets for research. The main issues concern the structure of the dataset, the quality of the extracted mentions (54% and 23% of mentions, respectively, do not refer to software), and software accessibility. While one dataset does not provide links to mentioned software at all, the other does so in a way that can impede quantitative research: (1) Links may come from different sources, each pointing to different software for the same mention. (2) The quality of the automatically retrieved links is generally poor (in our sample, 65.4% link to the wrong software). (3) Links exist only for a small subset of mentions (in our sample, 20.5%), which may lead to skewed or disproportionate samples. However, the greatest challenge, and the issue underlying the others, is the still suboptimal practice of software citation: Software should not be mentioned; it should be cited following the software citation principles.
elib URL of this record: | https://elib.dlr.de/202972/ |
---|---|
Document type: | Other publication |
Additional information: | Second revision of a submission to PeerJ Computer Science. The original submission was withdrawn because examining a sufficient sample size proved impractical. |
Title: | Don't mention it: An approach to assess challenges to using software mentions for citation and discoverability research |
Authors: | Druskat, Stephan; Chue Hong, Neil P.; Buzzard, Sammie; Konovalov, Olexandr; Kornek, Patrick |
Date: | 2024 |
Published in: | arXiv |
Refereed publication: | Yes |
Open Access: | Yes |
DOI: | 10.48550/arXiv.2402.14602 |
Number of pages: | 17 |
Status: | Published |
Keywords: | Software citation, empirical software engineering, datasets |
HGF research field: | Aeronautics, Space and Transport |
HGF program: | Space |
HGF program topic: | Space System Technology |
DLR focus area: | Space |
DLR research area: | R SY - Space System Technology |
DLR subarea (project, undertaking): | R - SISTEC tasks |
Location: | Berlin-Adlershof |
Institutes & facilities: | Institute for Software Technology > Intelligent and Distributed Systems; Institute for Software Technology |
Deposited by: | Druskat, Stephan |
Deposited on: | 26 Feb 2024 10:08 |
Last modified: | 26 Feb 2024 10:08 |