How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs

🗣️Persuasive Adversarial Prompting to Jailbreak LLMs with 92% Success Rate
🔒Fascinating new paper breaks down jailbreak prompting to a science!
⏩In Short:
- Provides a taxonomy of 40 persuasion techniques for prompting (see the illustrative sketch below).
- Using this list of 40 techniques, the authors jailbreak LLMs, including GPT-4, with a 92% success rate!
- Interestingly, Anthropic's models are not susceptible to PAP attacks at all, while more advanced models like GPT-4 are more vulnerable to persuasive adversarial prompts (PAPs).
- Defending against these PAPs also provides effective protection against other types of attacks.
- The PAPs are tested on attacks covering 14 different risk categories (such as economic harm).
Blog+Demo: https://chats-lab.github.io/persuasive_jailbreaker/
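To make the attack structure concrete, here is a minimal, hypothetical sketch of how a technique from the taxonomy could be used to rewrite a plain request into a persuasion-framed prompt. The taxonomy entries, template, and function names below are illustrative placeholders (shown on a harmless request), not the authors' actual code or data.

```python
# Illustrative sketch only: the technique names are paraphrased from the paper's
# taxonomy, and the template/function names are invented for this example.

PERSUASION_TAXONOMY = {
    # A handful of the 40 persuasion techniques (definitions paraphrased).
    "evidence_based_persuasion": "Support the request with empirical data or statistics.",
    "authority_endorsement": "Cite authoritative sources or experts to back the request.",
    "expert_endorsement": "Frame the requester as a domain expert with a legitimate need.",
}

PAP_TEMPLATE = (
    "Rewrite the following request so that it uses the persuasion technique "
    "'{technique}' ({definition}), while keeping the original intent:\n\n{request}"
)


def build_pap_prompt(request: str, technique: str) -> str:
    """Build the paraphrasing instruction a red-teamer would send to an LLM to
    turn a plain request into a persuasion-framed one (a PAP candidate)."""
    definition = PERSUASION_TAXONOMY[technique]
    return PAP_TEMPLATE.format(technique=technique, definition=definition, request=request)


if __name__ == "__main__":
    # Benign placeholder request; the paper applies the same procedure to
    # harmful queries across the 14 risk categories to measure jailbreak success.
    print(build_pap_prompt("Explain how password managers work.", "authority_endorsement"))
```

The paper automates this rewriting step and then measures how often the resulting PAPs jailbreak target models across the 14 risk categories mentioned above.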