The GRP‑Obliteration technique reveals that even mild prompts can reshape internal safety mechanisms, raising oversight concerns as enterprises increasingly fine‑tune open‑weight models with ...
As LLMs and diffusion models power more applications, their safety alignment becomes critical. Our research shows that even minimal downstream fine‑tuning can weaken safeguards, raising a key question ...
That helpful “Summarize with AI” button? It might be secretly manipulating what your AI recommends. Microsoft security researchers have discovered a growing trend of AI memory poisoning attacks used ...