Revision as of 20:57, 23 May 2024

A prompt injection attack involves a user injecting a malicious instruction in an LLM-integrated application, in which user input was intended to act as only data.

Vulnerability

Prompt injection exploits the single-channel nature of LLM's, where user prompts and system prompts are simply concatenated together and processed.

Attacks

Naive attack simply inject additional instruction
Ignore attacks tells the model to ignore previous instructions
Escape character attacks "deletes" previous instructions
Escape separation attacks "adds space" between system and user
Completion attacks fakes a response to trick the model into thinking a new query is beginning.

Defense strategies

StruQ rejects all user instructions
Instruction hierarchy rejects user instructions that are misaligned with the system prompt

Difference from jailbreaking

Jailbreak is a similar but different classification of attacks. Instead of attacking the application, it targets the model to produce inappropriate output.

@@ Line 14: / Line 14: @@
 * Escape character attacks "deletes" previous instructions
 * Escape separation attacks "adds space" between system and user
+* Completion attacks fakes a response to trick the model into thinking a new query is beginning.
 = Defense strategies =

Anonymous

Search

Prompt injection: Difference between revisions

Namespaces

More

Page actions

Revision as of 20:57, 23 May 2024

Contents

Vulnerability

Attacks

Defense strategies

Difference from jailbreaking

Navigation

Navigation

Wiki tools

Wiki tools

Anonymous

Search

Prompt injection: Difference between revisions

Revision as of 20:57, 23 May 2024

Vulnerability

Attacks

Defense strategies

Difference from jailbreaking

Navigation

Wiki tools

Page tools

Categories