Rice: Created page with "Category:LLM security Tree-of-Attacks (aka. TAP) is an automated red teaming strategy to generate LLM jailbreak and prompt injection attacks. = Description = TAP consists of two LLMs: an ''attacker'' and a ''judge''. The attacker is given a prompt and benign data and asked to inject the target to output "Hacked!". The judge then scores the attack prompt. Based on the score, the attacker iteratively improves."

2024-05-23T21:09:10Z

Created page with "Category:LLM security Tree-of-Attacks (aka. TAP) is an automated red teaming strategy to generate LLM jailbreak and prompt injection attacks. = Description = TAP consists of two LLMs: an ''attacker'' and a ''judge''. The attacker is given a prompt and benign data and asked to inject the target to output "Hacked!". The judge then scores the attack prompt. Based on the score, the attacker iteratively improves."

New page

[[Category:LLM security]]

Tree-of-Attacks (aka. TAP) is an [[automated red team]]ing strategy to generate LLM [[jailbreak]] and [[prompt injection]] attacks.

= Description =

TAP consists of two LLMs: an ''attacker'' and a ''judge''. The attacker is given a prompt and benign data and asked to inject the target to output "Hacked!". The judge then scores the attack prompt. Based on the score, the attacker iteratively improves.

Tree-of-Attacks - Revision history