2024-ahmed-extended
findings extracted from this paper
-
The paper proposes a black-box methodology for detecting censorship bias in LLMs. It compares responses to identical prompts written in Simplified and Traditional Chinese. Because the two are scripts for the same language, converting a prompt between them is close to a character-level mapping rather than a translation, which controls for translation quality. Meanwhile, Simplified Chinese training data is disproportionately sourced from mainland China's censored internet, so systematic divergence between the two sets of responses points to censorship absorbed during training. Each prompt is repeated ten times, and each response is scored on a 0-to-1 scale for similarity to censored text. The scorer is an XLM-RoBERTa classifier fine-tuned to distinguish Baidu Baike (censored) from Chinese Wikipedia (uncensored).
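A minimal sketch of how the repeated-prompt comparison could be aggregated. It assumes a hypothetical `scorer` callable standing in for the fine-tuned XLM-RoBERTa classifier, which maps one response to a score in [0, 1]. The per-script mean gap and the permutation test are illustrative analysis choices, not the paper's documented procedure. With ten repetitions per script, each list would hold ten scores.

```python
import random
from statistics import mean
from typing import Callable, Sequence

# Hypothetical stand-in for the paper's fine-tuned XLM-RoBERTa classifier:
# maps one model response to a censorship-similarity score in [0, 1].
Scorer = Callable[[str], float]


def score_all(responses: Sequence[str], scorer: Scorer) -> list[float]:
    """Score every repeated response and reject out-of-range scores."""
    scores = [scorer(r) for r in responses]
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scorer must return values in [0, 1]")
    return scores


def script_gap(simp: Sequence[float], trad: Sequence[float]) -> float:
    """Mean Simplified score minus mean Traditional score.

    Positive values mean the Simplified Chinese responses look more like
    censored text than the Traditional Chinese responses do.
    """
    return mean(simp) - mean(trad)


def permutation_p_value(
    simp: Sequence[float], trad: Sequence[float], n_iter: int = 10_000, seed: int = 0
) -> float:
    """Two-sided permutation test on the gap (an assumed analysis choice).

    Shuffles the script labels and counts how often a random split yields a
    gap at least as large in magnitude as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(script_gap(simp, trad))
    pooled = list(simp) + list(trad)
    k = len(simp)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        if abs(script_gap(pooled[:k], pooled[k:])) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (n_iter + 1)
```

A gap near zero with a large p-value would suggest that the two scripts elicit similar content. A large positive gap that rarely occurs under shuffled labels would be consistent with the censorship-bias hypothesis.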
-
Of 326 websites known to adhere to CCP censorship laws, including Chinese government sites and state media, 325 were found indexed in Common Crawl, a dataset commonly used to train major LLMs including GPT-3. Only the official government site of Macao (www.gov.mo) was absent, indicating that LLM training corpora are broadly contaminated with CCP-censored content.
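The coverage figure reduces to a simple check over the domain list. The sketch below assumes a hypothetical `is_indexed` predicate. In practice it might be backed by a lookup against a Common Crawl URL index, but that wiring is not shown and is not taken from the paper.

```python
from typing import Callable, Iterable


def crawl_coverage(
    domains: Iterable[str], is_indexed: Callable[[str], bool]
) -> tuple[float, list[str]]:
    """Return the fraction of domains found in the crawl and the missing ones.

    `is_indexed` is a hypothetical predicate: True if the crawl contains
    pages from that domain.
    """
    domains = list(domains)
    if not domains:
        raise ValueError("no domains given")
    missing = [d for d in domains if not is_indexed(d)]
    return (len(domains) - len(missing)) / len(domains), missing
```

With the paper's counts, 325 of 326 sites indexed corresponds to a coverage of roughly 99.7%, and the missing list would contain only `www.gov.mo`.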
-
Because LLMs such as ChatGPT (over 100 million weekly active users) reflect CCP information-control requirements when prompted in Simplified Chinese, they effectively export China's domestic censorship to diaspora communities and other Chinese speakers outside mainland China. This extends the reach of information manipulation beyond any jurisdiction where Chinese censorship law applies.
-
Exploratory testing of GPT-3.5 Turbo showed marked divergence between responses to Simplified and Traditional Chinese prompts on politically sensitive topics. Simplified Chinese responses glossed over or omitted details on Tiananmen Square casualties, Uyghur genocide allegations, Taiwan's sovereignty status, and Xi Jinping's human rights record. Traditional Chinese responses described the same topics in substantially more critical and detailed terms.