Researchers claim that leading image-editing AI models can be jailbroken through rasterized text and visual cues, allowing prohibited edits to bypass safety filters and succeed in up to 80.9% of cases.
As LLMs and diffusion models power more applications, their safety alignment becomes critical. Our research shows that even minimal downstream fine-tuning can weaken safeguards, raising a key question ...
A PyTorch implementation of a distributional alignment loss based on the Cauchy-Schwarz (CS) divergence with Kernel Density Estimation (KDE). This module provides a plug-and-play loss function for ...
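Since the blurb only names the technique, here is a minimal sketch of what a CS-divergence loss with Gaussian KDE can look like in PyTorch. The function names (`gaussian_kernel_matrix`, `cs_divergence_loss`) and the fixed scalar bandwidth `sigma` are illustrative assumptions, not the module's actual API; the estimator follows the standard Parzen-window form of the empirical CS divergence.

```python
import torch

def gaussian_kernel_matrix(a: torch.Tensor, b: torch.Tensor, sigma: float) -> torch.Tensor:
    """Pairwise Gaussian kernel values between rows of `a` and `b`."""
    sq_dists = torch.cdist(a, b, p=2).pow(2)           # (N, M) squared Euclidean distances
    return torch.exp(-sq_dists / (2.0 * sigma ** 2))   # Gaussian (RBF) kernel

def cs_divergence_loss(x: torch.Tensor, y: torch.Tensor,
                       sigma: float = 1.0, eps: float = 1e-10) -> torch.Tensor:
    """Empirical Cauchy-Schwarz divergence between samples x ~ p and y ~ q,
    with both densities estimated by Gaussian KDE (Parzen windows):

        D_CS(p, q) = -log( <p, q>^2 / (<p, p> <q, q>) )

    Each inner product of densities is approximated by the mean of the
    corresponding pairwise kernel matrix. D_CS >= 0, and it is 0 iff p = q.
    """
    k_xy = gaussian_kernel_matrix(x, y, sigma).mean()  # cross term, estimates <p, q>
    k_xx = gaussian_kernel_matrix(x, x, sigma).mean()  # self term,  estimates <p, p>
    k_yy = gaussian_kernel_matrix(y, y, sigma).mean()  # self term,  estimates <q, q>
    # Log-domain form of -log(k_xy^2 / (k_xx * k_yy)); more numerically
    # stable than forming the ratio directly when kernel means are small.
    return -2.0 * torch.log(k_xy + eps) + torch.log(k_xx + eps) + torch.log(k_yy + eps)
```

A hypothetical usage, aligning one batch of features to a target batch; the loss is differentiable, so it can be dropped into a training loop like any other PyTorch criterion:

```python
x = torch.randn(128, 16, requires_grad=True)  # features from the model's distribution
y = torch.randn(128, 16)                      # samples from the target distribution
loss = cs_divergence_loss(x, y, sigma=1.0)
loss.backward()
```

One caveat worth noting: a fixed bandwidth is the simplest choice, and strictly speaking the cross term under Gaussian KDE corresponds to a convolved bandwidth of `sigma * sqrt(2)`; a real module would likely expose bandwidth selection (e.g., the median heuristic) as a parameter.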