Diallo, Diaoulé and Dworatzyk, Katharina and Jentzsch, Sophie Freya and Schütt, Peer and Theis, Sabine and Hecking, Tobias (2025) The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation. IEEE Access. IEEE - Institute of Electrical and Electronics Engineers. doi: 10.1109/ACCESS.2025.3628500. ISSN 2169-3536.
PDF - Publisher's version (published version), 1MB
Official URL: https://ieeexplore.ieee.org/document/11224465
Abstract
Controlling the behavior of large language models (LLMs) at inference time is essential for aligning outputs with human abilities and safety requirements. Activation steering provides a lightweight alternative to prompt engineering and fine-tuning by directly modifying internal activations to guide generation. This research advances the literature in three significant directions. First, while previous work demonstrated the technical feasibility of steering emotional tone using automated classifiers, this paper presents the first human evaluation of activation steering concerning the emotional tone of LLM outputs, collecting over 7,000 crowd-sourced ratings from 190 participants via Prolific. These ratings assess both perceived emotional intensity and overall text quality. Second, we find strong alignment between human and model-based quality ratings (mean r = 0.776, range 0.157–0.985), indicating that automatic scoring can proxy perceived quality. Moderate steering strengths (λ ≈ 0.15) reliably amplify target emotions while preserving comprehensibility, with the strongest effects for disgust (ηp² = 0.616) and fear (ηp² = 0.540), and minimal effects for surprise (ηp² = 0.042). Finally, upgrading from Alpaca to LLaMA-3 yielded more consistent steering with significant effects across emotions and strengths (all p < 0.001). Inter-rater reliability was high (ICC = 0.71–0.87), underscoring the robustness of the findings. These findings support activation-based control as a scalable method for steering LLM behavior across affective dimensions.
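The abstract describes activation steering only in words: internal activations are modified at inference time, with a strength λ of about 0.15. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the common additive form h' = h + λ·v and assumes the style vector v is a mean difference between activations for target-style and other text. The function names `steer` and `mean_difference_vector` and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def mean_difference_vector(
    target_acts: Sequence[Sequence[float]],
    other_acts: Sequence[Sequence[float]],
) -> List[float]:
    """Assumed construction: mean(target activations) - mean(other activations)."""
    def column_means(rows: Sequence[Sequence[float]]) -> List[float]:
        # Average each hidden dimension over all recorded activation vectors.
        return [sum(col) / len(rows) for col in zip(*rows)]

    target_mean = column_means(target_acts)
    other_mean = column_means(other_acts)
    return [t - o for t, o in zip(target_mean, other_mean)]


def steer(hidden: Sequence[float], style_vector: Sequence[float], lam: float) -> List[float]:
    """Assumed additive steering: return hidden + lam * style_vector."""
    if len(hidden) != len(style_vector):
        raise ValueError("hidden state and style vector must have the same length")
    return [h + lam * v for h, v in zip(hidden, style_vector)]


# Toy activations standing in for one layer's hidden states (made-up values).
fear_acts = [[1.0, 3.0], [3.0, 5.0]]
neutral_acts = [[0.0, 1.0], [2.0, 1.0]]

style = mean_difference_vector(fear_acts, neutral_acts)
steered = steer([0.5, -0.5], style, lam=0.15)
```

With `lam=0.0` the hidden state is returned unchanged, so in this sketch λ acts as a dial between the unsteered model and progressively stronger steering. In a real model the addition would be applied to a layer's hidden states during the forward pass rather than to a plain list.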
| elib URL of the record: | https://elib.dlr.de/218629/ |
|---|---|
| Document type: | Journal article |
| Title: | The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation |
| Authors: | Diallo, Diaoulé; Dworatzyk, Katharina; Jentzsch, Sophie Freya; Schütt, Peer; Theis, Sabine; Hecking, Tobias |
| Date: | 3 November 2025 |
| Published in: | IEEE Access |
| Peer-reviewed publication: | Yes |
| Open Access: | Yes |
| Gold Open Access: | Yes |
| In SCOPUS: | Yes |
| In ISI Web of Science: | Yes |
| DOI: | 10.1109/ACCESS.2025.3628500 |
| Publisher: | IEEE - Institute of Electrical and Electronics Engineers |
| ISSN: | 2169-3536 |
| Status: | Published |
| Keywords: | Activation engineering, controllable text generation, emotion control, human evaluation, large language models, style vectors |
| HGF - Research field: | Aeronautics, Space and Transport |
| HGF - Program: | Space |
| HGF - Program theme: | Space System Technology |
| DLR - Research area: | Space |
| DLR - Program: | R SY - Space System Technology |
| DLR - Research theme (project): | R - Collaboration of Aviation Operators and AI Systems |
| Location: | Köln-Porz |
| Institutes and institutions: | Institute for Software Technology > Intelligent and Distributed Systems; Institute for Software Technology |
| Deposited by: | Diallo, Diaoulé |
| Deposited on: | 17 Nov 2025 12:25 |
| Last modified: | 17 Nov 2025 12:25 |