Jailbreak: Difference between revisions
From Rice Wiki
(Created page with "Category:LLM security '''Jailbreaking''' is a classification of attacks that attempts to defeat LLMs' safety-tuning (usually to avoid inappropriate output) by the model provider.") |
No edit summary |
||
Line 1: | Line 1: | ||
[[Category:LLM security]] | [[Category:LLM security]] | ||
'''Jailbreaking''' is a classification of attacks that attempts to defeat LLMs' safety-tuning (usually to avoid inappropriate output) by the model provider. | '''Jailbreaking''' is a classification of attacks that attempts to defeat LLMs' safety-tuning (usually to avoid inappropriate output) by the model provider. Examples include | ||
* Manipulating a chatbot into swearing or committing illegal acts | |||
* A chatbot divulges personal identifiable information from its training data | |||
* A user bypasses the system prompt to make a chatbot transfer money from a bank account |
Latest revision as of 01:48, 19 June 2024
Jailbreaking is a classification of attacks that attempts to defeat LLMs' safety-tuning (usually to avoid inappropriate output) by the model provider. Examples include
- Manipulating a chatbot into swearing or committing illegal acts
- A chatbot divulges personal identifiable information from its training data
- A user bypasses the system prompt to make a chatbot transfer money from a bank account