Term-Community-Based Topic Detection with Variable Resolution

Hamm, Andreas and Odrowski, Simon (2021) Term-Community-Based Topic Detection with Variable Resolution. Information, 12 (6). Multidisciplinary Digital Publishing Institute (MDPI). doi: 10.3390/info12060221. ISSN 2078-2489.

Official URL: https://www.mdpi.com/2078-2489/12/6/221

Abstract

Network-based procedures for topic detection in huge text collections offer an intuitive alternative to probabilistic topic models. We present in detail a method that is especially designed with the requirements of domain experts in mind. Like similar methods, it employs community detection in term co-occurrence graphs, but it is enhanced by including a resolution parameter that can be used for changing the targeted topic granularity. We also establish a term ranking and use semantic word-embedding for presenting term communities in a way that facilitates their interpretation. We demonstrate the application of our method with a widely used corpus of general news articles and show the results of detailed social-sciences expert evaluations of detected topics at various resolutions. A comparison with topics detected by Latent Dirichlet Allocation is also included. Finally, we discuss factors that influence topic interpretation.
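
The role of a resolution parameter in community detection on a term co-occurrence graph can be illustrated with a small sketch. This is not the authors' implementation: the toy documents, the document-level co-occurrence weighting, the function names `cooccurrence_graph` and `modularity`, and the use of a resolution-weighted modularity score are all assumptions made for illustration. The sketch only scores two hand-made candidate partitions rather than searching for communities.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(docs):
    """Weighted term graph: edge weight = number of documents containing both terms."""
    edges = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            edges[(a, b)] += 1
    return edges


def modularity(edges, partition, resolution=1.0):
    """Resolution-weighted modularity:
    Q = sum over communities c of [ w_in(c) / m - resolution * (deg(c) / (2m))^2 ]
    where m is the total edge weight (assumed formulation, not taken from the paper).
    """
    m = sum(edges.values())
    community_of = {term: i for i, comm in enumerate(partition) for term in comm}
    internal = Counter()  # edge weight inside each community
    degree = Counter()    # summed weighted degree of each community
    for (a, b), w in edges.items():
        ca, cb = community_of[a], community_of[b]
        degree[ca] += w
        degree[cb] += w
        if ca == cb:
            internal[ca] += w
    return sum(
        internal[c] / m - resolution * (degree[c] / (2 * m)) ** 2
        for c in range(len(partition))
    )


# Hypothetical toy corpus: each document is a list of already extracted terms.
docs = [
    ["election", "vote", "party"],
    ["election", "vote", "party"],
    ["vote", "party", "tax"],
    ["budget", "tax", "deficit"],
    ["budget", "tax", "deficit"],
    ["match", "goal", "team"],
    ["match", "goal", "team"],
]
edges = cooccurrence_graph(docs)

# Two candidate topic granularities for the same graph.
coarse = [
    {"election", "vote", "party", "budget", "tax", "deficit"},
    {"match", "goal", "team"},
]
fine = [
    {"election", "vote", "party"},
    {"budget", "tax", "deficit"},
    {"match", "goal", "team"},
]

for gamma in (0.2, 1.0):
    q_coarse = modularity(edges, coarse, gamma)
    q_fine = modularity(edges, fine, gamma)
    preferred = "coarse" if q_coarse > q_fine else "fine"
    print(f"resolution={gamma}: coarse={q_coarse:.3f} fine={q_fine:.3f} -> {preferred}")
```

In this toy setting a low resolution value scores the coarse two-topic partition higher, while a higher value favours the finer three-topic partition, which is the kind of granularity control the abstract describes. A real pipeline would hand the graph to a community detection algorithm instead of comparing fixed partitions.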

Item URL in elib: https://elib.dlr.de/142499/
Document Type: Article
Title: Term-Community-Based Topic Detection with Variable Resolution
Authors:
Hamm, Andreas; Institution or Email: unspecified; ORCID iD: https://orcid.org/0000-0001-5854-851X; ORCID Put Code: unspecified
Odrowski, Simon; Institution or Email: unspecified; ORCID iD: https://orcid.org/0000-0001-9050-1711; ORCID Put Code: unspecified
Date: 23 May 2021
Journal or Publication Title: Information
Refereed publication: Yes
Open Access: Yes
Gold Open Access: Yes
In SCOPUS: Yes
In ISI Web of Science: Yes
Volume: 12
DOI: 10.3390/info12060221
Publisher: Multidisciplinary Digital Publishing Institute (MDPI)
ISSN: 2078-2489
Status: Published
Keywords: text mining; natural language processing; topic modeling; term ranking; community detection; corpus analysis; word embeddings
HGF - Research field: other
HGF - Program: other
HGF - Program Themes: other
DLR - Research area: no assignment
DLR - Program: no assignment
DLR - Research theme (Project): no assignment
Location: Köln-Porz
Institutes and Institutions: Think Tank
Deposited By: Hamm, Dr. Andreas
Deposited On: 31 May 2021 15:21
Last Modified: 23 Oct 2023 09:52
