Jailbreak

From Rice Wiki

Revision as of 01:48, 19 June 2024 by Rice (talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Jailbreaking is a classification of attacks that attempts to defeat LLMs' safety-tuning (usually to avoid inappropriate output) by the model provider. Examples include

Manipulating a chatbot into swearing or committing illegal acts
A chatbot divulges personal identifiable information from its training data
A user bypasses the system prompt to make a chatbot transfer money from a bank account

Retrieved from "http://ricefriedegg.com:80/mediawiki/index.php?title=Jailbreak&oldid=941"

LLM security