
The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation

Diallo, Diaoulé and Dworatzyk, Katharina and Jentzsch, Sophie Freya and Schütt, Peer and Theis, Sabine and Hecking, Tobias (2025) The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation. IEEE Access. IEEE - Institute of Electrical and Electronics Engineers. doi: 10.1109/ACCESS.2025.3628500. ISSN 2169-3536.

PDF - Publisher's version (published version), 1 MB

Official URL: https://ieeexplore.ieee.org/document/11224465

Abstract

Controlling the behavior of large language models (LLMs) at inference time is essential for aligning outputs with human abilities and safety requirements. Activation steering provides a lightweight alternative to prompt engineering and fine-tuning by directly modifying internal activations to guide generation. This research advances the literature in three directions. First, while previous work demonstrated the technical feasibility of steering emotional tone using automated classifiers, this paper presents the first human evaluation of activation steering concerning the emotional tone of LLM outputs, collecting over 7,000 crowd-sourced ratings from 190 participants recruited via Prolific. These ratings assess both perceived emotional intensity and overall text quality. Second, we find strong alignment between human and model-based quality ratings (mean r = 0.776, range 0.157–0.985), indicating that automatic scoring can serve as a proxy for perceived quality. Moderate steering strengths (λ ≈ 0.15) reliably amplify target emotions while preserving comprehensibility, with the strongest effects for disgust (partial η² = 0.616) and fear (partial η² = 0.540), and minimal effects for surprise (partial η² = 0.042). Finally, upgrading from Alpaca to LLaMA-3 yielded more consistent steering, with significant effects across emotions and strengths (all p < 0.001). Inter-rater reliability was high (ICC = 0.71–0.87), underscoring the robustness of the findings. These findings support activation-based control as a scalable method for steering LLM behavior across affective dimensions.
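
To make the idea of "directly modifying internal activations" concrete, the following is a minimal sketch only, not the authors' implementation. It assumes that steering is an additive shift of a layer's hidden states by a style vector scaled by the steering strength λ; the abstract does not state the exact formulation. The function name `steer`, the toy dimensions, and the random vectors are all hypothetical.

```python
import numpy as np


def steer(hidden, style_vector, lam):
    """Shift hidden states along a style direction (assumed additive form).

    hidden:       array of shape (tokens, hidden_dim), toy stand-in for
                  one layer's activations
    style_vector: array of shape (hidden_dim,), hypothetical emotion direction
    lam:          steering strength (the abstract reports lambda of about 0.15
                  as a moderate setting)
    """
    hidden = np.asarray(hidden, dtype=float)
    style_vector = np.asarray(style_vector, dtype=float)
    if hidden.shape[-1] != style_vector.shape[-1]:
        raise ValueError("style vector must match the hidden dimension")
    # Broadcasting adds the same scaled vector at every token position.
    return hidden + lam * style_vector


# Toy data: 4 token positions, hidden size 8 (illustrative values only).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))
style_vector = rng.normal(size=8)

steered = steer(hidden, style_vector, lam=0.15)
```

With `lam=0.0` the activations are returned unchanged, so under this assumed form the steering strength interpolates smoothly between the unmodified model and a strongly steered one.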

elib URL of the record: https://elib.dlr.de/218629/
Document type: Journal article
Title: The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation
Authors:
Author | Institution or e-mail address | Author ORCID iD | ORCID Put Code
Diallo, Diaoulé | diaoule.diallo (at) dlr.de | https://orcid.org/0000-0001-9226-0050 | 197149504
Dworatzyk, Katharina | Katharina.Dworatzyk (at) dlr.de | https://orcid.org/0000-0002-4927-1464 | 197149506
Jentzsch, Sophie Freya | Sophie.Jentzsch (at) dlr.de | https://orcid.org/0000-0001-6217-8814 | not specified
Schütt, Peer | peer.schuett (at) dlr.de | https://orcid.org/0000-0002-6513-5235 | not specified
Theis, Sabine | sabine.theis (at) dlr.de | https://orcid.org/0000-0002-3422-3734 | 197149507
Hecking, Tobias | Tobias.Hecking (at) dlr.de | https://orcid.org/0000-0003-0833-7989 | 197149508
Date: 3 November 2025
Published in: IEEE Access
Peer-reviewed publication: Yes
Open Access: Yes
Gold Open Access: Yes
In SCOPUS: Yes
In ISI Web of Science: Yes
DOI: 10.1109/ACCESS.2025.3628500
Publisher: IEEE - Institute of Electrical and Electronics Engineers
ISSN: 2169-3536
Status: published
Keywords: Activation engineering, controllable text generation, emotion control, human evaluation, large language models, style vectors
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program topic: Space System Technology
DLR - Research area: Space
DLR - Research theme: R SY - Space System Technology
DLR - Subarea (project): R - Collaboration of Aviation Operators and AI Systems
Location: Köln-Porz
Institutes and facilities: Institute of Software Technology > Intelligent and Distributed Systems
Institute of Software Technology
Deposited by: Diallo, Diaoulé
Deposited on: 17 Nov 2025 12:25
Last modified: 17 Nov 2025 12:25
