Prompt injection: Difference between revisions
From Rice Wiki
No edit summary |
No edit summary |
||
Line 1: | Line 1: | ||
[[Category: | [[Category:LLM security]] | ||
A '''prompt [[injection]]''' attack involves a user injecting a malicious | A '''prompt [[injection]]''' attack involves a user injecting a malicious | ||
Line 12: | Line 12: | ||
* [[StruQ]] rejects all user instructions | * [[StruQ]] rejects all user instructions | ||
* [[Instruction hierarchy]] rejects user instructions that are misaligned with the system prompt | * [[Instruction hierarchy]] rejects user instructions that are misaligned with the system prompt | ||
= Difference from jailbreaking = | |||
[[Jailbreak]] is a similar but different classification of attacks. Instead of attacking the application, it targets the model to produce inappropriate output. |
Revision as of 20:47, 23 May 2024
A prompt injection attack involves a user injecting a malicious
instruction in an LLM-integrated application, in which user input was
intended to act as only data.
Vulnerability
Prompt injection exploits the single-channel nature of LLM's, where user prompts and system prompts are simply concatenated together and processed.
Defense strategies
- StruQ rejects all user instructions
- Instruction hierarchy rejects user instructions that are misaligned with the system prompt
Difference from jailbreaking
Jailbreak is a similar but different classification of attacks. Instead of attacking the application, it targets the model to produce inappropriate output.