Markus, Jost (2024) Steering Large Language Models towards Political Ideologies on Prompt-Level. Masterarbeit, Bielefeld University.
PDF (17 MB), accessible only within DLR
Abstract
Large Language Models (LLMs) achieve state-of-the-art performance on a variety of Natural Language Processing (NLP) tasks and are employed in a diverse range of applications. However, studies provide evidence that LLMs follow particular political ideologies. If the ideologies within these models can also be easily manipulated, there is a risk that LLMs will be used as political tools to promote certain ideologies. In this thesis, we go one step further and investigate the extent to which LLMs are able to adopt ideological biases at the prompt level. The aim of this work is to assess the robustness of LLMs against ideological manipulation. To this end, we employ cost-effective methods, namely Prompt Engineering and Retrieval Augmented Generation (RAG), to steer three LLMs (ChatGPT, Mixtral, and Qwen1.5) towards a target ideology. Within these methods, we ask the LLM to answer political statements from two multiple-choice ideology tests, the Wahl-O-Mat and the Political Compass Test (PCT). To demonstrate that the models can adopt political views found in complex political systems beyond those analyzed in existing work, we utilize a dataset of manifestos from the most popular German political parties to inject biased contexts from four distinct ideologies. Our findings reveal that the tested models adapted even to indirect, related ideological contexts without any model training. Nevertheless, the models showed only limited susceptibility to randomly selected ideological triggers. The thesis identifies the risks associated with the deliberate and unintentional introduction of political bias at the prompt level and provides future directions for the development of fair AI practices. Code for the framework built to measure political ideologies is available at: https://github.com/j0st/PoliticalLLM.
For an online demo, please check out https://huggingface.co/spaces/jost/PoliticalLLM
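The RAG-based steering described above boils down to retrieving an ideologically biased passage (e.g. from a party manifesto) and prepending it to a multiple-choice ideology-test item before querying the model. The following is a minimal sketch of that prompt-construction step, not the thesis's actual implementation; the snippet store, the naive keyword retriever, and all names are hypothetical placeholders for a real embedding-based retriever over the manifesto dataset.

```python
# Hypothetical stand-in for retrieved manifesto passages (real system: vector search).
MANIFESTO_SNIPPETS = {
    "climate": "Our party demands an immediate phase-out of fossil fuels.",
    "economy": "We stand for lower taxes and less market regulation.",
}


def retrieve(statement: str) -> str:
    """Naive keyword retrieval: return the snippet whose topic appears in the statement."""
    for topic, snippet in MANIFESTO_SNIPPETS.items():
        if topic in statement.lower():
            return snippet
    return ""


def build_prompt(statement: str) -> str:
    """Prepend the retrieved biased context to a multiple-choice test item."""
    context = retrieve(statement)
    return (
        f"Context: {context}\n"
        f"Statement: {statement}\n"
        "Answer with one of: agree / neutral / disagree."
    )


prompt = build_prompt("The economy should be deregulated.")
```

The resulting prompt would then be sent to each model (ChatGPT, Mixtral, Qwen1.5), and the aggregated answers scored against the Wahl-O-Mat or PCT answer keys to position the model ideologically.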
elib URL of the record: https://elib.dlr.de/205192/
Document type: University thesis (Master's thesis)
Additional information: Supervised by Roxanne El Baff and Oliver Bensch from IVS-ISS
Title: Steering Large Language Models towards Political Ideologies on Prompt-Level
Authors: Markus, Jost
Date: April 2024
Published in: Steering Large Language Models towards Political Ideologies on Prompt-Level
Open Access: No
Number of pages: 78
Status: published
Keywords: LLM, LLM-impersonation, RAG, prompt-engineering, LLM-ideology
Institution: Bielefeld University
Department: Interdisziplinäre Medienwissenschaft (Interdisciplinary Media Studies)
HGF research field: no assignment
HGF programme: no assignment
HGF programme topic: no assignment
DLR focus area: no assignment
DLR research field: no assignment
DLR subject area (project): no assignment
Location: Oberpfaffenhofen
Institutes & facilities: Institut für Softwaretechnologie > Intelligente und verteilte Systeme
Deposited by: El Baff, Roxanne
Deposited on: 12 Sep 2024 08:39
Last modified: 16 Sep 2024 13:23