No, of course I can! Refusal Mechanisms Can Be Exploited Using Harmless Fine-Tuning Data
Abstract
Summary
Refusal mechanisms in LLMs can be exploited through harmless fine-tuning data.