Hamm, Andreas and Odrowski, Simon (2021) Term-community-based topic detection with variable resolution. [Other publication]
A newer version of this record is available.
PDF (552 kB)
Official URL: https://arxiv.org/abs/2103.13550
Abstract
Network-based procedures for topic detection in huge text collections offer an intuitive alternative to probabilistic topic models. We present in detail a method that is especially designed with the requirements of domain experts in mind. Like similar methods, it employs community detection in term co-occurrence graphs, but it is enhanced by including a resolution parameter that can be used for changing the targeted topic granularity. We also establish a term ranking and use semantic word-embedding for presenting term communities in a way that facilitates their interpretation. We demonstrate the application of our method with a widely used corpus of general news articles and show the results of detailed social-sciences expert evaluations of detected topics at various resolutions. A comparison with topics detected by Latent Dirichlet Allocation is also included. Finally, we discuss factors that influence topic interpretation.
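The core idea named in the abstract, community detection in a term co-occurrence graph with a resolution parameter, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cooccurrence_graph`, `term_communities`), the toy documents, and the choice of a simplified single-level greedy modularity optimisation (the local-moving step known from Louvain-style methods, with resolution `γ` scaling the null-model term) are all assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(docs):
    """Weighted term graph: edge weight = number of documents containing both terms."""
    graph = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def term_communities(docs, resolution=1.0, max_sweeps=100):
    """Greedy local moving on modularity with a resolution parameter.

    Illustrative only: a single-level heuristic without the aggregation
    step of full Louvain/Leiden, and not the procedure from the paper.
    """
    graph = cooccurrence_graph(docs)
    degree = {term: sum(nbrs.values()) for term, nbrs in graph.items()}
    two_m = sum(degree.values())            # twice the total edge weight
    community = {term: term for term in graph}  # start with singletons
    total = dict(degree)                    # summed degree per community label

    for _ in range(max_sweeps):
        moved = False
        for term in sorted(graph):          # fixed order keeps the result deterministic
            old = community[term]
            total[old] -= degree[term]      # take the term out of its community
            links = defaultdict(float)      # edge weight from term to each community
            for nbr, weight in graph[term].items():
                links[community[nbr]] += weight

            def gain(label):
                # Modularity gain (up to a constant factor) of joining `label`.
                return links.get(label, 0.0) - resolution * total[label] * degree[term] / two_m

            # The current community comes first so that ties do not cause moves.
            candidates = [old] + sorted(label for label in links if label != old)
            best = max(candidates, key=gain)
            total[best] += degree[term]
            community[term] = best
            moved = moved or best != old
        if not moved:
            break

    groups = defaultdict(list)
    for term, label in community.items():
        groups[label].append(term)
    return sorted(sorted(terms) for terms in groups.values())


# Hypothetical toy corpus: each document is already reduced to its terms.
docs = [
    ["election", "vote", "party"],
    ["election", "vote", "party"],
    ["goal", "match", "team"],
    ["goal", "match", "team"],
    ["party", "team"],
]

coarse = term_communities(docs, resolution=1.0)
fine = term_communities(docs, resolution=3.0)
print(len(coarse), coarse)
print(len(fine), fine)
```

On this toy corpus, resolution 1.0 yields two term communities (a politics group and a sports group), while resolution 3.0 splits off the bridging terms `party` and `team`, giving four. Raising the resolution therefore targets a finer topic granularity, which is the effect the abstract describes.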
| Field | Value |
|---|---|
| elib URL of this record | https://elib.dlr.de/141587/ |
| Document type | Other publication |
| Title | Term-community-based topic detection with variable resolution |
| Authors | Hamm, Andreas; Odrowski, Simon |
| Date | 25 March 2021 |
| Published in | arXiv.org |
| Peer-reviewed publication | No |
| Open Access | Yes |
| Number of pages | 23 |
| Status | Published |
| Keywords | Text mining, Natural language processing, Topic modeling, Term ranking, Community detection, Corpus analysis, Word embeddings |
| HGF - Research field | no assignment |
| HGF - Program | no assignment |
| HGF - Program topic | no assignment |
| DLR - Focus area | no assignment |
| DLR - Research area | no assignment |
| DLR - Sub-area (project) | no assignment |
| Location | Köln-Porz |
| Institutes and Institutions | Think Tank |
| Deposited by | Hamm, Dr. Andreas |
| Deposited on | 29 Mar 2021 12:39 |
| Last modified | 29 Mar 2021 12:39 |
Available versions of this record
- Term-community-based topic detection with variable resolution. (deposited 29 Mar 2021 12:39) [Currently displayed]