Prompt injection: Difference between revisions
From Rice Wiki
No edit summary |
|||
Line 8: | Line 8: | ||
Prompt injection exploits the single-channel nature of LLM's, where user prompts and system prompts are simply concatenated together and processed. | Prompt injection exploits the single-channel nature of LLM's, where user prompts and system prompts are simply concatenated together and processed. | ||
= Attacks = | |||
* Naive attack simply inject additional instruction | |||
* Ignore attacks tells the model to ignore previous instructions | |||
* Escape character attacks "deletes" previous instructions | |||
= Defense strategies = | = Defense strategies = |
Revision as of 20:55, 23 May 2024
A prompt injection attack involves a user injecting a malicious
instruction in an LLM-integrated application, in which user input was
intended to act as only data.
Vulnerability
Prompt injection exploits the single-channel nature of LLM's, where user prompts and system prompts are simply concatenated together and processed.
Attacks
- Naive attack simply inject additional instruction
- Ignore attacks tells the model to ignore previous instructions
- Escape character attacks "deletes" previous instructions
Defense strategies
- StruQ rejects all user instructions
- Instruction hierarchy rejects user instructions that are misaligned with the system prompt
Difference from jailbreaking
Jailbreak is a similar but different classification of attacks. Instead of attacking the application, it targets the model to produce inappropriate output.