Hu, Xuke and Elßner, Tobias and Zheng, Shiyu and Ngonidzashe Serere, Helen and Kersten, Jens and Klan, Friederike and Qiu, Qinjun (2024) DLRGeoTweet: A comprehensive social media geocoding corpus featuring fine-grained places. Information Processing and Management. Elsevier. doi: 10.1016/j.ipm.2024.103742. ISSN 0306-4573.
This archive cannot provide the full text.
Abstract
Every day, many short text messages are posted on social media in response to real-world events, providing a valuable resource for domains such as emergency response and traffic management. Since users rarely attach exact coordinates to their posts, accurately recognizing and resolving fine-grained place names, such as home addresses and Points of Interest, is crucial for understanding the precise locations of critical events, such as rescue requests. This task, known as geoparsing, involves toponym recognition followed by toponym resolution (also called geocoding). However, existing social media datasets for evaluating geoparsing approaches often lack sufficient fine-grained place names with associated geo-coordinates, which makes it difficult to evaluate, compare, and train geocoding methods for such locations. The absence of supportive annotation tools compounds this challenge. To address these gaps, we implemented a lightweight Python tool built on Nominatim. Using this tool, we annotated a comprehensive X (formerly Twitter) geocoding corpus called DLRGeoTweet. The corpus underwent a rigorous cross-validation process to ensure its quality. It includes a total of 7,364 tweets and 12,510 places, of which 6,012 are fine-grained. It comprises two global datasets covering worldwide events and three local datasets related to local events such as the 2017 Hurricane Harvey. The annotation process spanned more than ten months and required approximately 1,000 person-hours. We then evaluated 15 recent and representative geocoding approaches, many of them deep learning-based, on DLRGeoTweet. The results highlight the inherent challenges of resolving fine-grained places accurately. To comply with Twitter's rules and regulations, the corpus is available upon request.
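To illustrate the kind of lookup a Nominatim-based annotation tool relies on, the following is a minimal Python sketch, not the authors' tool. It builds a request URL for the public Nominatim search endpoint and extracts candidate coordinates from a response. The function names `build_search_url` and `parse_candidates` are hypothetical, and the sample response is a hand-written illustration with approximate coordinates, not real API output.

```python
from urllib.parse import urlencode

# Public Nominatim search endpoint; a self-hosted instance would use another host.
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(place_name, limit=5):
    """Build a Nominatim search URL for a free-text place name (hypothetical helper)."""
    params = {"q": place_name, "format": "jsonv2", "limit": limit}
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


def parse_candidates(results):
    """Turn a decoded Nominatim JSON result list into (name, lat, lon) tuples.

    Nominatim returns `lat` and `lon` as strings, so they are converted to floats.
    """
    return [(r["display_name"], float(r["lat"]), float(r["lon"])) for r in results]


# Hand-written sample shaped like a Nominatim response (illustrative values only).
sample_response = [
    {"display_name": "Brandenburg Gate, Berlin, Germany", "lat": "52.5163", "lon": "13.3777"},
]

url = build_search_url("Brandenburg Gate, Berlin")
candidates = parse_candidates(sample_response)
print(url)
print(candidates)
```

In an annotation workflow, an annotator would typically review the returned candidates and select the one that matches the place mentioned in the tweet. Real requests to the public endpoint should respect its usage policy, including an identifying User-Agent header and a low request rate.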
elib URL of this record: | https://elib.dlr.de/208939/
---|---
Document type: | Journal article
Title: | DLRGeoTweet: A comprehensive social media geocoding corpus featuring fine-grained places
Authors: | Hu, Xuke; Elßner, Tobias; Zheng, Shiyu; Ngonidzashe Serere, Helen; Kersten, Jens; Klan, Friederike; Qiu, Qinjun
Date: | 1 July 2024
Published in: | Information Processing and Management
Peer-reviewed publication: | Yes
Open Access: | Yes
Gold Open Access: | No
In SCOPUS: | Yes
In ISI Web of Science: | Yes
DOI: | 10.1016/j.ipm.2024.103742
Publisher: | Elsevier
ISSN: | 0306-4573
Status: | Published
Keywords: | Annotated Twitter corpus; Geoparsing; Geocoding; Toponym resolution; Toponym disambiguation; Fine-grained places
HGF - Research field: | Aeronautics, Space and Transport
HGF - Program: | Space
HGF - Program theme: | Space System Technology
DLR - Research area: | Space
DLR - Research field: | R SY - Space System Technology
DLR - Sub-area (project): | R - Big Data und KI für die Entscheidungsunterstützung, D - OpenSearch@DLR
Location: | Jena
Institutes and institutions: | Institute of Data Science > Data Acquisition and Mobilisation
Deposited by: | Hu, Xuke
Deposited on: | 19 Dec 2024 10:47
Last modified: | 19 Dec 2024 10:47