FINDING · EVALUATION
A maximum entropy named entity extraction (NEE) model trained on Chinese-language Wikipedia achieved 89.63% recall and 83.44% specificity for person names, 96.3% recall and 69.80% specificity for place names, and 87.56% recall and 88.40% specificity for organization names. Despite a person-name precision of only 0.42%, the system reduces the number of words requiring censorship probes by nearly an order of magnitude while retaining nearly 90% of true named entities.
From 2011-espinoza-automated — Automated Named Entity Extraction for Tracking Censorship of Current Events · §4.1 · 2011 · Free and Open Communications on the Internet
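The combination of high recall, moderate specificity, and very low precision follows from class imbalance: true entities are rare among candidate words, so even a small false-positive rate swamps the true positives. A minimal sketch with hypothetical counts (not taken from the paper) shows how these metrics co-occur and why the flagged set is still far smaller than the full corpus:

```python
def metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return recall, specificity, precision

# Hypothetical corpus: 1,000 true person names among 1,000,000 candidate words.
tp, fn = 896, 104            # ~89.6% recall, near the reported figure
fp = 165_000                 # false positives among the 999,000 non-entities
tn = 999_000 - fp            # ~83.5% specificity

recall, specificity, precision = metrics(tp, fp, tn, fn)
print(f"recall={recall:.2%} specificity={specificity:.2%} "
      f"precision={precision:.2%}")

# Only the flagged words (tp + fp) need censorship probes,
# instead of all 1,000,000 candidates.
flagged = tp + fp
print(f"probe set shrinks from 1,000,000 to {flagged} words")
```

Under these illustrative counts, precision lands near 0.5% even though recall and specificity are both high, and the probe set shrinks by roughly a factor of six; the exact reduction depends on the corpus's entity density.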
Implications
- Automated corpus-driven keyword generation can replace manual list curation for censorship measurement, enabling continuous broad probing that keeps pace with current events.
- Maximum entropy NEE trained on Wikipedia is viable for low-resource target languages (Arabic, Farsi, Spanish), making this approach extensible to censorship monitoring in additional regions.
Tags
Extracted by claude-sonnet-4-6 — review before relying.