
Gaussian Processes for One-class and Binary Classification of Crisis-related Tweets

Kersten, Jens and Bongard, Jan and Klan, Friederike (2022) Gaussian Processes for One-class and Binary Classification of Crisis-related Tweets. In: 19th International Conference on Information Systems for Crisis Response and Management, ISCRAM 2022, pp. 664-673. ISCRAM 2022, 22-25 May 2022, Tarbes, France. ISBN 978-828427099-9. ISSN 2411-3387.


Official URL: http://idl.iscram.org/files/jenskersten/2022/2446_JensKersten_etal2022.pdf

Abstract

The Twitter Stream API makes it possible to develop (near) real-time methods and applications that detect and monitor the impacts of crisis events and their changes over time. As various related studies have demonstrated, the content of individual tweets or even entire thematic trends can be used to support disaster management, fill information gaps, augment the results of satellite-based workflows, and extend and improve disaster management databases. Given the sheer volume of incoming tweets, the small number of crisis-relevant tweets must be identified automatically and presented in a manageable way. Current approaches for identifying crisis-related content focus on supervised models that decide on the relevance of each tweet individually. Although supervised models can efficiently process the high number of incoming tweets, they have to be extensively pre-trained. Furthermore, these models do not capture the history of already processed messages. During a crisis, various unique sub-events can occur that are unlikely to be covered by the respective supervised model and its training data. Unsupervised learning both takes tweets from the past into account and adapts more readily, which in turn allows customization to the specific needs of different disasters. From a practical point of view, the drawbacks of unsupervised methods are higher computational costs and the potential need for user interaction to interpret the results. To enhance the limited generalization capabilities of pre-trained models and to speed up and guide unsupervised learning, we propose a combination of both concepts. Successive clustering of incoming tweets semantically aggregates the stream data, whereas pre-trained models identify potentially crisis-relevant clusters.
Besides identifying potentially crisis-related content based on semantically aggregated clusters, this approach offers a sound foundation for visualizations and for further related tasks, such as event detection and the extraction of detailed information about the temporal or spatial development of events. Our work focuses on analyzing the entire freely available Twitter stream by combining an interval-based semantic clustering with a supervised machine learning model for identifying crisis-related messages. The stream is divided into intervals, for example of one hour, and each tweet is projected into a numerical vector using state-of-the-art sentence embeddings. The embeddings are then grouped by a parametric Chinese Restaurant Process clustering. At the end of each interval, a pre-trained feed-forward neural network decides whether a cluster contains crisis-related tweets. With a further developed concept of cluster chains and central centroids, crisis-related clusters of different intervals can be linked in a topic-related and even subtopic-related manner. Initial results show that the hybrid approach can significantly improve the results of pre-trained supervised methods. This is especially true for categories in which the supervised model could not be sufficiently pre-trained due to missing labels. In addition, the semantic clustering of tweets offers a flexible and customizable procedure, resulting in a practical summary of topic-specific stream content.
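The interval-based pipeline described in the abstract might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a simple threshold-based incremental clustering stands in for the Chinese Restaurant Process clustering, the tweet embeddings are toy 2-D vectors instead of real sentence embeddings, and `is_crisis_related` is a hypothetical callable standing in for the pre-trained feed-forward network. All function names, the `Cluster` class, and the threshold value 0.8 are invented for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class Cluster:
    """Keeps a running sum so the centroid can be updated incrementally."""

    def __init__(self, first: Vector) -> None:
        self.total = list(first)  # component-wise sum of all members
        self.size = 1

    @property
    def centroid(self) -> List[float]:
        return [t / self.size for t in self.total]

    def add(self, vec: Vector) -> None:
        self.total = [t + v for t, v in zip(self.total, vec)]
        self.size += 1


def cluster_interval(embeddings: Sequence[Vector],
                     threshold: float = 0.8) -> List[Cluster]:
    """Assign each embedding to its most similar cluster, or open a new one.

    Assumption: this greedy threshold rule replaces the CRP clustering
    named in the abstract; it is not the same algorithm.
    """
    clusters: List[Cluster] = []
    for vec in embeddings:
        best = max(clusters, key=lambda c: cosine(vec, c.centroid), default=None)
        if best is not None and cosine(vec, best.centroid) >= threshold:
            best.add(vec)
        else:
            clusters.append(Cluster(vec))
    return clusters


def flag_clusters(clusters: Sequence[Cluster],
                  is_crisis_related: Callable[[List[float]], bool]) -> List[Cluster]:
    """Keep the clusters whose centroid the (hypothetical) classifier accepts."""
    return [c for c in clusters if is_crisis_related(c.centroid)]


def link_chains(previous: Sequence[Cluster], current: Sequence[Cluster],
                threshold: float = 0.8) -> List[Tuple[int, int]]:
    """Link each current cluster to its most similar previous cluster.

    Returns (previous_index, current_index) pairs for links whose centroid
    similarity reaches the threshold.
    """
    links = []
    for j, cur in enumerate(current):
        sims = [cosine(p.centroid, cur.centroid) for p in previous]
        if sims and max(sims) >= threshold:
            links.append((sims.index(max(sims)), j))
    return links


# Toy usage: two one-hour intervals with made-up 2-D "embeddings".
hour_1 = cluster_interval([(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)])
hour_2 = cluster_interval([(0.95, 0.05), (0.1, 0.9)])
flagged = flag_clusters(hour_1, lambda centroid: centroid[0] > 0.5)
chains = link_chains(hour_1, hour_2)
```

Classifying one centroid per cluster rather than every tweet is a choice made for this sketch; the abstract only states that the network decides per cluster at the end of each interval, and how the cluster is represented to the network is left open here.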

elib URL of this record: https://elib.dlr.de/187871/
Document type: Conference contribution (talk)
Title: Gaussian Processes for One-class and Binary Classification of Crisis-related Tweets
Authors:
Author | Institution or e-mail address | Author ORCID iD | ORCID Put Code
Kersten, Jens | jens.kersten (at) dlr.de | https://orcid.org/0000-0002-4735-7360 | not specified
Bongard, Jan | jan.bongard (at) dlr.de | https://orcid.org/0000-0001-9453-7391 | not specified
Klan, Friederike | Friederike.Klan (at) dlr.de | https://orcid.org/0000-0002-1856-7334 | not specified
Date: May 2022
Published in: 19th International Conference on Information Systems for Crisis Response and Management, ISCRAM 2022
Peer-reviewed publication: Yes
Open Access: Yes
Gold Open Access: No
In SCOPUS: Yes
In ISI Web of Science: No
Page range: pp. 664-673
Editors:
Editor | Institution and/or e-mail address | Editor ORCID iD | ORCID Put Code
Karray, Hedi | INP-ENIT, France | not specified | not specified
De Nicola, Antonio | ENEA, Italy | not specified | not specified
Matta, Nada | UTT, France | not specified | not specified
Purohit, Hemant | George Mason University, USA | not specified | not specified
Series name: ISCRAM 2022 Conference Proceedings
ISSN: 2411-3387
ISBN: 978-828427099-9
Status: published
Keywords: Twitter, Natural Disasters, Supervised and Unsupervised Learning, Information Overload Reduction
Event title: ISCRAM 2022
Event location: Tarbes, France
Event type: international conference
Event date: 22-25 May 2022
Organizer: National School of Engineers of Tarbes and ISCRAM Organisation
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program theme: Space System Technology
DLR - Research area: Space
DLR - Program: R SY - Space System Technology
DLR - Research theme (project): R - Research on Citizen Science Methods, R - Environment, Health and Big Data
Location: Jena
Institutes and institutions: Institute of Data Science > Data Acquisition and Mobilisation
Deposited by: Kersten, Dr.-Ing. Jens
Deposited on: 07 Nov 2022 13:39
Last modified: 05 Feb 2024 13:37
