Prompt injection: Difference between revisions
From Rice Wiki
No edit summary |
No edit summary |
||
Line 2: | Line 2: | ||
A '''prompt [[injection]]''' attack involves a user injecting a malicious | A '''prompt [[injection]]''' attack involves a user injecting a malicious | ||
instruction in an LLM-integrated application, in which user input was | instruction in an [[LLM-integrated application]], in which user input was | ||
intended to act as only data. | intended to act as only data. | ||
= Vulnerability = | |||
Prompt injection exploits the single-channel nature of LLM's, where user prompts and system prompts are simply concatenated together and processed. | |||
= Defense strategies = | = Defense strategies = | ||
* [[StruQ]] rejects all user instructions | * [[StruQ]] rejects all user instructions | ||
* [[Instruction hierarchy]] rejects user instructions that are misaligned with the system prompt | * [[Instruction hierarchy]] rejects user instructions that are misaligned with the system prompt |
Revision as of 20:28, 23 May 2024
A prompt injection attack involves a user injecting a malicious
instruction in an LLM-integrated application, in which user input was
intended to act as only data.
Vulnerability
Prompt injection exploits the single-channel nature of LLM's, where user prompts and system prompts are simply concatenated together and processed.
Defense strategies
- StruQ rejects all user instructions
- Instruction hierarchy rejects user instructions that are misaligned with the system prompt