FINDING · DETECTION

A 598-term sensitive-keyword blacklist (sourced from Wikipedia and China Digital Times) achieved only 53% classification accuracy on Weibo censorship, below the 66% achieved by punctuation features alone. Blacklisted terms appeared in 31 of 152 uncensored posts (about 20%) versus 60 of 192 censored posts (about 31%), so roughly two-thirds of censored posts contained no listed keyword at all. In this pilot sample, that indicates keywords are not the primary driver of platform censorship decisions.
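The arithmetic behind these counts can be checked with a short sketch. The four counts come from the finding above; everything else is illustrative. In particular, the "flag a post as censored if it contains any blacklisted term" rule is a hypothetical baseline, and whether the reported 53% was computed that way is an assumption, not something stated here.

```python
# Counts taken from the finding above.
CENSORED_TOTAL = 192
CENSORED_WITH_KEYWORD = 60
UNCENSORED_TOTAL = 152
UNCENSORED_WITH_KEYWORD = 31

# Share of each class that contains at least one blacklisted term.
censored_hit_rate = CENSORED_WITH_KEYWORD / CENSORED_TOTAL
uncensored_hit_rate = UNCENSORED_WITH_KEYWORD / UNCENSORED_TOTAL

# Share of censored posts a keyword-only detector cannot see at all.
censored_miss_rate = 1 - censored_hit_rate

# Hypothetical baseline (an assumption, not the paper's stated method):
# predict "censored" whenever any blacklisted term is present.
true_positives = CENSORED_WITH_KEYWORD
true_negatives = UNCENSORED_TOTAL - UNCENSORED_WITH_KEYWORD
rule_accuracy = (true_positives + true_negatives) / (
    CENSORED_TOTAL + UNCENSORED_TOTAL
)

print(f"keyword present in censored posts:   {censored_hit_rate:.3f}")
print(f"keyword present in uncensored posts: {uncensored_hit_rate:.3f}")
print(f"censored posts with no keyword:      {censored_miss_rate:.3f}")
print(f"keyword-presence rule accuracy:      {rule_accuracy:.3f}")
```

On these counts the hypothetical rule is right on 181 of 344 posts, which lands close to the reported 53%, and it misses 132 of the 192 censored posts.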

From 2018-ng-detecting — Detecting Censorable Content on Sina Weibo: A Pilot Study · §4.2, §6, Table 1 · 2018 · Hellenic Conference on Artificial Intelligence

Implications

Do not model platform censorship risk using published keyword blacklists alone; they capture at most a weak secondary signal and will miss the majority of removal decisions.
Homophone or synonym substitution for known sensitive terms addresses only the weakest feature; stylistic rewriting toward a formal, objective register provides stronger evasion.

Tags