Twitter Stream Clustering for the Identification and Contextualization of Event-related Tweets

Bongard, Jan Heinrich (2020) Twitter Stream Clustering for the Identification and Contextualization of Event-related Tweets. Master's thesis, Friedrich-Schiller-Universität Jena / DLR Institute of Data Science.

This is the latest version of this record.

PDF - accessible only within DLR (8MB)

Abstract

During a crisis, social media data can provide near-real-time information about current events and their development. Twitter data in particular offers high information value because of its large number of users and the easy access to its stream API. Given the sheer volume of data, the small share of crisis-relevant tweets has to be filtered out, processed, and presented to users in a manageable way. Current approaches focus on pre-trained supervised models that decide on the relevance of each tweet. However, supervised models depend heavily on the training data used and are only partially transferable to new event types. Compute-intensive unsupervised methods first reduce the data stream by filtering on keywords, hashtags, or named entities, so that much information is excluded before it has been analyzed. In contrast to these approaches, this work analyzes the stream's complete content by combining an interval-based non-parametric Chinese Restaurant Process clustering with a pre-trained feed-forward neural network. The stream is divided into intervals, and each tweet is projected into a numerical vector by pre-trained sentence embeddings. The embeddings are then grouped by clustering. At the end of each interval, the pre-trained neural network decides whether a cluster contains crisis-relevant tweets. With a further developed concept of cluster chains and central centroids, crisis-relevant clusters from different intervals can be synchronized. This hybrid methodology is validated in three steps. First, it is evaluated how well the selected clustering can distinguish tweets from crisis-relevant categories, and the result is compared to a parametric cluster approach. Second, the applicability of the proposed method to the Twitter stream is evaluated quantitatively. Finally, the information content of several clusters is visualized and examined in a qualitative validation.
It is shown that the hybrid approach can significantly improve the results of pre-trained supervised methods. This is especially true for categories in which the supervised methods could not be sufficiently pre-trained due to missing labels. In addition, the semantic grouping of tweets offers a flexible and customizable procedure, resulting in an overview of topic-specific stream content. The proposed method can distinguish between parallel events and sub-events. With the proposed hybrid method, the automated detection and analysis of crisis-relevant tweets can be improved.
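
The interval-based clustering and the linking of clusters across intervals described in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the similarity-weighted seating rule, the cosine threshold `min_sim`, and the function names `crp_cluster` and `link_chains` are assumptions made for illustration, and the sentence-embedding and neural-network relevance steps are left out. Embeddings are taken to be plain lists of floats for one interval.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def crp_cluster(embeddings, alpha=1.0, rng=None):
    """Assign each embedding of one interval to a cluster in a single pass.

    An existing cluster k is chosen with weight n_k * max(sim_k, 0), where
    n_k is its size and sim_k the cosine similarity to its centroid. A new
    cluster is opened with weight alpha. The similarity term is an
    assumption of this sketch, not part of the plain CRP prior.
    """
    rng = rng or random.Random(0)
    clusters = []  # each cluster is a list of member vectors
    labels = []
    for vec in embeddings:
        weights = [len(c) * max(cosine(vec, centroid(c)), 0.0) for c in clusters]
        weights.append(alpha)  # last slot: open a new cluster
        total = sum(weights)
        choice = len(clusters)  # default: new cluster (also used if total == 0)
        if total > 0.0:
            threshold = rng.random() * total
            cumulative = 0.0
            for idx, w in enumerate(weights):
                cumulative += w
                if threshold < cumulative:
                    choice = idx
                    break
        if choice == len(clusters):
            clusters.append([])
        clusters[choice].append(vec)
        labels.append(choice)
    return labels, [centroid(c) for c in clusters]


def link_chains(prev_centroids, new_centroids, min_sim=0.8):
    """Map each new cluster index to its most similar previous cluster, or None.

    Hypothetical chaining rule: a link is made only if the cosine similarity
    of the two centroids reaches min_sim.
    """
    links = {}
    for j, new_c in enumerate(new_centroids):
        best, best_sim = None, min_sim
        for i, old_c in enumerate(prev_centroids):
            sim = cosine(new_c, old_c)
            if sim >= best_sim:
                best, best_sim = i, sim
        links[j] = best
    return links
```

In this sketch, a larger `alpha` opens new clusters more readily, while `min_sim` controls how easily a cluster in a new interval is treated as the continuation of an earlier one. A relevance classifier would be applied to each cluster before linking, so that only crisis-relevant clusters form chains.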

elib URL of this record: https://elib.dlr.de/140299/
Document type: Thesis (Master's thesis)
Title: Twitter Stream Clustering for the Identification and Contextualization of Event-related Tweets
Authors:
Author | Institution or e-mail address | Author ORCID iD | ORCID Put Code
Bongard, Jan Heinrich | jan.bongard (at) dlr.de | not specified | not specified
Date: 2020
Published in: Twitter Stream Clustering for the Identification and Contextualization of Event-related Tweets
Refereed publication: Yes
Open Access: No
Number of pages: 67
Status: published
Keywords: Crisis informatics, semantic clustering, Twitter stream, supervised and unsupervised methods
Institution: Friedrich-Schiller-Universität Jena / DLR Institute of Data Science
Department: Department of Geography / Citizen Science
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program theme: no assignment
DLR - Research area: Space
DLR - Program: R - no assignment
DLR - Research theme (project): R - no assignment, R - Research on Citizen Science Methods, R - QS-Projekt_04 Big-Data-Plattform
Location: Jena
Institutes and institutions: Institute of Data Science > Citizen Science
Deposited by: Kersten, Dr.-Ing. Jens
Deposited on: 12 Jan 2021 10:15
Last modified: 30 Nov 2021 14:27

Available versions of this record

  • Twitter Stream Clustering for the Identification and Contextualization of Event-related Tweets. (deposited 12 Jan 2021 10:15) [currently displayed]
