Adversarial Captcha for Breaking MLLM-Powered AI Agents (arxiv.org)
3 points by bron123 4 months ago | 2 comments


We introduce the Adversarial Confusion Attack as a new mechanism for protecting websites from MLLM-powered AI agents. Embedding these "Adversarial CAPTCHAs" into web content pushes models into systematic decoding failures, ranging from confident hallucinations to full incoherence. The perturbations disrupt all white-box models we test and transfer to proprietary systems like GPT-5 in the full-image setting. Technically, the attack uses projected gradient descent (PGD) to maximize next-token entropy across a small ensemble of surrogate MLLMs.
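For intuition, here is a minimal sketch of the optimization the abstract describes: PGD on the image pixels that *maximizes* the mean next-token entropy over a surrogate ensemble, instead of minimizing a loss. Everything here is a hypothetical stand-in, not the paper's setup: `ToySurrogate` is a toy image-to-logits module playing the role of an MLLM, and `eps`, `alpha`, and `steps` are placeholder hyperparameters.

```python
# Hedged sketch: entropy-maximizing PGD over a surrogate ensemble.
# ToySurrogate is a hypothetical stand-in for an MLLM's image -> next-token
# logits pathway; the real attack would use actual MLLM surrogates.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToySurrogate(nn.Module):
    """Toy placeholder: maps an image to next-token logits."""

    def __init__(self, vocab_size: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, vocab_size),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(x)  # (B, vocab_size) next-token logits


def next_token_entropy(logits: torch.Tensor) -> torch.Tensor:
    # Shannon entropy of the model's next-token distribution.
    logp = F.log_softmax(logits, dim=-1)
    return -(logp.exp() * logp).sum(dim=-1).mean()


def adversarial_captcha(image, surrogates, eps=8 / 255, alpha=2 / 255, steps=40):
    """L_inf PGD that ascends (maximizes) mean entropy over the ensemble."""
    delta = torch.zeros_like(image).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        adv = (image + delta).clamp(0, 1)
        # Average next-token entropy across all surrogate models.
        loss = torch.stack([next_token_entropy(m(adv)) for m in surrogates]).mean()
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # ascend: make models less certain
            delta.clamp_(-eps, eps)             # project back into the L_inf ball
            delta.grad.zero_()
    return (image + delta).detach().clamp(0, 1)


if __name__ == "__main__":
    ensemble = [ToySurrogate() for _ in range(3)]
    for m in ensemble:
        m.eval()
    clean = torch.rand(1, 3, 64, 64)
    adv = adversarial_captcha(clean, ensemble)
    print("perturbation L_inf:", (adv - clean).abs().max().item())
```

The sign of the update is the only real departure from standard PGD: gradient *ascent* on entropy flattens the next-token distribution, which is what would push a model from confident decoding toward incoherence.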


Interesting! CAPTCHAs were built to prevent bots from spamming. I wonder if there's a need for a CAPTCHA-style mechanism to block LLM/AI-generated slop.



