Jailbreak: Difference between revisions

Latest revision as of 01:48, 19 June 2024

Jailbreaking is a classification of attacks that attempts to defeat LLMs' safety-tuning (usually to avoid inappropriate output) by the model provider. Examples include

Manipulating a chatbot into swearing or committing illegal acts
A chatbot divulges personal identifiable information from its training data
A user bypasses the system prompt to make a chatbot transfer money from a bank account

@@ Line 1: / Line 1: @@
 [[Category:LLM security]]
-'''Jailbreaking''' is a classification of attacks that attempts to defeat LLMs' safety-tuning (usually to avoid inappropriate output) by the model provider.
+'''Jailbreaking''' is a classification of attacks that attempts to defeat LLMs' safety-tuning (usually to avoid inappropriate output) by the model provider. Examples include
+* Manipulating a chatbot into swearing or committing illegal acts
+* A chatbot divulges personal identifiable information from its training data
+* A user bypasses the system prompt to make a chatbot transfer money from a bank account

Anonymous

Search

Jailbreak: Difference between revisions

Namespaces

More

Page actions

Latest revision as of 01:48, 19 June 2024

Navigation

Navigation

Wiki tools

Wiki tools

Anonymous

Search

Jailbreak: Difference between revisions

Latest revision as of 01:48, 19 June 2024

Navigation

Wiki tools

Page tools

Categories