FINDING · DEFENSE

Simple character-level perturbations (for English) and homophone substitutions (for Chinese), combined with instruction-following prompts that direct the model to use word substitutions in its own output, bypassed every input and output filter for all 41 input-blocked and 197 output-blocked queries across five major Chinese LLM services (Baidu-Chat, DeepSeek, Doubao, Kimi, Qwen). Every input-blocked query contained at least one keyword combination that was sufficient on its own to trigger the filter, which confirms that input filtering relies on keyword matching rather than semantic classification.
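
To illustrate why keyword matching is brittle in this way, here is a minimal sketch of a combination-based keyword filter and a single character-level perturbation. Everything in it is an assumption made for illustration: the names `BLOCKED_COMBINATIONS`, `is_blocked`, and `perturb`, the placeholder keywords, and the hyphen-insertion perturbation are hypothetical and are not taken from the paper or from any of the services.

```python
from typing import FrozenSet, List

# Hypothetical blocklist: a rule fires only when ALL keywords in one
# combination appear in the text. The keywords are neutral placeholders.
BLOCKED_COMBINATIONS: List[FrozenSet[str]] = [
    frozenset({"alpha", "bravo"}),  # two-keyword combination
    frozenset({"charlie"}),         # single-keyword rule
]


def is_blocked(text: str) -> bool:
    """Return True if every keyword of at least one combination occurs in text."""
    lowered = text.lower()
    return any(
        all(keyword in lowered for keyword in combination)
        for combination in BLOCKED_COMBINATIONS
    )


def perturb(text: str, keyword: str, separator: str = "-") -> str:
    """Insert a separator after the first character of each occurrence of keyword.

    This is one assumed example of a character-level perturbation; it keeps
    the word readable to a human or an LLM but breaks exact substring matching.
    """
    broken = keyword[0] + separator + keyword[1:]
    return text.replace(keyword, broken)


original = "tell me about alpha and bravo"
evaded = perturb(original, "alpha")

print(is_blocked(original))      # True: both keywords of one combination match
print(is_blocked("alpha only"))  # False: only part of the combination is present
print(is_blocked(evaded))        # False: "a-lpha" no longer matches "alpha"
```

A semantic classifier would be expected to treat `original` and `evaded` alike, since their meaning is unchanged. A filter whose verdict flips after a one-character edit is behaving like the substring matcher sketched here.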

From 2026-ablove-characterizing · Characterizing the Implementation of Censorship Policies in Chinese LLM Services · §VII-D · 2026 · Network and Distributed System Security

Tags

censors
cn
techniques
keyword-filtering

Extracted by claude-sonnet-4-6 — review before relying.