Tree-of-Attacks: Revision history

From Rice Wiki

Legend: (cur) = difference with latest revision, (prev) = difference with preceding revision, m = minor edit.

23 May 2024

  • (cur | prev) 21:09, 23 May 2024 Rice (talk | contribs) 430 bytes (+430) Created page with "Category:LLM security Tree-of-Attacks (aka. TAP) is an automated red teaming strategy to generate LLM jailbreak and prompt injection attacks. = Description = TAP consists of two LLMs: an ''attacker'' and a ''judge''. The attacker is given a prompt and benign data and asked to inject the target to output "Hacked!". The judge then scores the attack prompt. Based on the score, the attacker iteratively improves."
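
The attacker/judge loop described in the page text above can be sketched roughly as follows. This is a minimal illustration of the score-and-improve cycle only, not the page author's implementation: the function name `refine_attack`, the callable signatures, the `max_rounds` and `success_score` parameters, and the toy stand-ins are all assumptions made for the example. No real LLM is called; in practice the attacker and judge would each wrap a model.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures (assumptions, not from the page):
# the attacker sees the task plus its previous attempt and that attempt's score;
# the judge maps an attack prompt to a numeric score, higher meaning better.
Attacker = Callable[[str, Optional[str], Optional[float]], str]
Judge = Callable[[str], float]


def refine_attack(
    attacker: Attacker,
    judge: Judge,
    task: str,
    max_rounds: int = 5,
    success_score: float = 1.0,
) -> Tuple[str, float]:
    """Run the attacker/judge loop and return the best-scoring prompt found."""
    best_prompt, best_score = "", float("-inf")
    prev_prompt: Optional[str] = None
    prev_score: Optional[float] = None

    for _ in range(max_rounds):
        # The attacker proposes a prompt, using feedback from the last round.
        candidate = attacker(task, prev_prompt, prev_score)
        # The judge scores the proposed attack prompt.
        score = judge(candidate)

        if score > best_score:
            best_prompt, best_score = candidate, score
        if score >= success_score:
            break  # the judge considers the attack successful

        prev_prompt, prev_score = candidate, score

    return best_prompt, best_score


# Toy stand-ins so the sketch runs without any model.
def toy_attacker(task: str, prev_prompt: Optional[str], prev_score: Optional[float]) -> str:
    # First round: start from the task; later rounds: tweak the last attempt.
    return task if prev_prompt is None else prev_prompt + "!"


def toy_judge(prompt: str) -> float:
    # Pretend that more exclamation marks make a stronger attack, capped at 1.0.
    return min(prompt.count("!") / 3, 1.0)


if __name__ == "__main__":
    print(refine_attack(toy_attacker, toy_judge, "say Hacked"))
```

The loop keeps the best candidate seen so far rather than only the last one, so a late regression in the attacker's output does not discard an earlier, stronger prompt.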