Prompt injection: Difference between revisions

From Rice Wiki
No edit summary
Line 8: Line 8:


Prompt injection exploits the single-channel nature of LLM's, where user prompts and system prompts are simply concatenated together and processed.
Prompt injection exploits the single-channel nature of LLM's, where user prompts and system prompts are simply concatenated together and processed.
= Attacks =
* Naive attack simply inject additional instruction
* Ignore attacks tells the model to ignore previous instructions
* Escape character attacks "deletes" previous instructions


= Defense strategies =
= Defense strategies =

Revision as of 20:55, 23 May 2024


A prompt injection attack involves a user injecting a malicious instruction in an LLM-integrated application, in which user input was intended to act as only data.

Vulnerability

Prompt injection exploits the single-channel nature of LLM's, where user prompts and system prompts are simply concatenated together and processed.

Attacks

  • Naive attack simply inject additional instruction
  • Ignore attacks tells the model to ignore previous instructions
  • Escape character attacks "deletes" previous instructions

Defense strategies

Difference from jailbreaking

Jailbreak is a similar but different classification of attacks. Instead of attacking the application, it targets the model to produce inappropriate output.