Human-in-the-loop: Towards Label Embeddings for Measuring Classification Difficulty

Hechinger, Katharina and Koller, Christoph and Zhu, Xiao Xiang and Kauermann, Göran (2025) Human-in-the-loop: Towards Label Embeddings for Measuring Classification Difficulty. Statistical Modelling. SAGE Publications. doi: 10.1177/1471082X251371796. ISSN 1471-082X.

PDF - Preprint version (submitted draft)
9MB

Official URL: https://doi.org/10.1177/1471082X251371796

Abstract

Uncertainty in machine learning models is a timely and vast field of research. In supervised learning, uncertainty can already occur in the first stage of the training process, the annotation phase. This scenario is particularly evident when some instances cannot be definitively classified. In other words, there is inevitable ambiguity in the annotation step and hence not necessarily a single "ground truth" associated with each instance. The main idea of this work is to drop the assumption of a ground truth label and instead embed the annotations into a multidimensional space. This embedding is derived from the empirical distribution of annotations in a Bayesian setup, modeled via a Dirichlet-Multinomial framework. We estimate the model parameters and posteriors using a stochastic Expectation Maximisation algorithm with Markov Chain Monte Carlo steps. The methods developed in this paper readily extend to various situations where multiple annotators independently label instances. To showcase its generality, we apply the proposed approach to three benchmark datasets for image classification and Natural Language Inference, where multiple annotations per instance are available. Besides the embeddings, we can investigate the resulting correlation matrices, which reflect the semantic similarities of the original classes very well for all three exemplary datasets.
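
To make the Dirichlet-Multinomial idea in the abstract more concrete, the following is a minimal sketch of the standard conjugate update for a single instance: annotator votes are treated as Multinomial counts, and a Dirichlet prior yields a posterior mean over class probabilities. It is not the authors' estimation procedure (the paper uses a stochastic EM algorithm with MCMC steps to estimate the parameters); here the concentration parameters `alpha` are simply assumed to be fixed, and the function name `posterior_mean` and the vote counts are hypothetical.

```python
def posterior_mean(counts, alpha):
    """Posterior mean of class probabilities for one instance.

    counts: number of annotator votes per class (Multinomial counts).
    alpha:  Dirichlet concentration parameters, assumed known and fixed
            here (an illustrative simplification, not the paper's method).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    total = sum(counts) + sum(alpha)
    # Conjugacy: Dirichlet(alpha) prior + Multinomial counts
    # gives a Dirichlet(alpha + counts) posterior.
    return [(c + a) / total for c, a in zip(counts, alpha)]


# Hypothetical votes from 10 annotators over 3 classes.
votes = [
    [10, 0, 0],  # unanimous annotation
    [5, 4, 1],   # ambiguous annotation
]
alpha = [1.0, 1.0, 1.0]  # uniform prior, chosen only for illustration

embeddings = [posterior_mean(v, alpha) for v in votes]
for v, e in zip(votes, embeddings):
    print(v, [round(p, 3) for p in e])
```

In this toy setting the unanimous instance keeps most of its mass on one class, while the ambiguous instance spreads it across classes, which is the kind of information a single "ground truth" label would discard.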

elib URL of this record:

https://elib.dlr.de/202329/

Document type:

Journal article

Title:

Human-in-the-loop: Towards Label Embeddings for Measuring Classification Difficulty

Authors:

Authors	Institution or email address	Author ORCID iD	ORCID Put Code
Hechinger, Katharina	katharina.hechinger (at) stat.uni-muenchen.de	NOT SPECIFIED	NOT SPECIFIED
Koller, Christoph	Christoph.Koller (at) dlr.de	NOT SPECIFIED	NOT SPECIFIED
Zhu, Xiao Xiang	xiaoxiang.zhu (at) tum.de	NOT SPECIFIED	NOT SPECIFIED
Kauermann, Göran	goeran.kauermann (at) lmu.de	NOT SPECIFIED	NOT SPECIFIED

Date:

2025

Published in:

Statistical Modelling

Peer-reviewed publication:

Open Access:

Gold Open Access:

No

In SCOPUS:

In ISI Web of Science:

DOI:

10.1177/1471082X251371796

Publisher:

SAGE Publications

ISSN:

1471-082X

Status:

Published

Keywords:

Annotation Uncertainty, Multiple Labels, Label Variation, Stochastic EM Algorithm, Dirichlet-Multinomial Model, Classification and Clustering

HGF - Research field:

Aeronautics, Space and Transport

HGF - Program:

Space

HGF - Program topic:

Earth Observation

DLR - Focus area:

Space

DLR - Research area:

R EO - Earth Observation

DLR - Sub-area (project, undertaking):

R - Artificial Intelligence

Location:

Oberpfaffenhofen

Institutes & Facilities:

Remote Sensing Technology Institute (Institut für Methodik der Fernerkundung) > EO Data Science

Deposited by:

Koller, Christoph

Deposited on:

18 Nov 2025 13:34

Last modified:

18 Nov 2025 14:20
