Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach
Abstract
Summary
This paper analyzes the limitations of existing jailbreak defenses in a narrow domain and proposes a new transcript-classifier approach.
