FINDING · EVALUATION

Multi-word Chinese phrases used as search seeds discover qualitatively different censored sites than individual English words do: the phrase 'Chinese human rights violation' surfaces Chinese activist homepages and culture-specific outlets, while its individual constituent words return only well-known Western media. TF-IDF scoring against a Chinese corpus ranks culturally rare phrases (e.g., '自由亚洲电台', Radio Free Asia) as high-signal seeds and discards common filler phrases.
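The TF-IDF ranking described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `rank_seed_phrases`, the toy corpus, the smoothed IDF formula, and the assumption that pages are already segmented into phrases are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def rank_seed_phrases(page_phrases, corpus_docs, top_k=2):
    """Rank phrases from one page by TF-IDF against a reference corpus.

    page_phrases: list of phrases extracted from a page (segmentation is
                  assumed to have happened upstream; hypothetical input).
    corpus_docs:  list of sets, one set of phrases per reference document.
    Returns the top_k phrases, highest score first.
    """
    term_counts = Counter(page_phrases)
    total = len(page_phrases)
    n_docs = len(corpus_docs)

    scores = {}
    for phrase, count in term_counts.items():
        # Document frequency: how many corpus documents contain the phrase.
        df = sum(1 for doc in corpus_docs if phrase in doc)
        # Smoothed IDF (an assumed variant): a phrase present in every
        # corpus document scores 0, a phrase absent from the corpus scores highest.
        idf = math.log((1 + n_docs) / (1 + df))
        scores[phrase] = (count / total) * idf

    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data (invented for illustration, not taken from the paper).
page = ["自由亚洲电台", "报道", "我们", "的", "自由亚洲电台", "的"]
corpus = [
    {"我们", "的", "报道"},
    {"的", "我们"},
    {"的", "报道", "天气"},
    {"的", "我们", "今天"},
]

seeds = rank_seed_phrases(page, corpus, top_k=2)
print(seeds)
```

In this toy setup the phrase that never appears in the reference corpus ranks first, while the filler word that appears in every corpus document scores zero and falls outside the top results, which mirrors the rare-versus-common distinction the finding describes.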

From 2018-hounsel-automatically · Automatically Generating a Large, Culture-Specific Blocklist for China · §3.1–3.2 · 2018 · Free and Open Communications on the Internet

Implications

Tags

censors
cn
techniques
keyword-filtering
measurement-platform

Extracted by claude-sonnet-4-6 — review before relying.