Brown et al. (2023) detected DNS-based censorship in the Satellite and OONI datasets by combining supervised ML models, trained on expert-labeled data, with unsupervised models that establish a baseline of 'normal' behavior. The combination achieved high true-positive rates for both known and new instances of DNS censorship. The extended abstract proposes this hybrid supervised/unsupervised approach as a template for its LLM-based system.
From 2024-gao-extended — Extended Abstract: Leveraging Large Language Models to Identify Internet Censorship through Network Data
· §2 Related Works
· 2024
· Free and Open Communications on the Internet
Implications
A hybrid supervised + unsupervised architecture for DNS censorship detection — labeled known-censorship events for supervised training, unlabeled traffic for anomaly baselines — provides a practical blueprint for production censorship-alerting systems.
DNS blocking detectors should cross-reference Satellite and OONI datasets to validate findings and reduce false positives when classifying new censorship events.
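The two implications above can be illustrated with a minimal sketch. It is not the models used by Brown et al.; it swaps in deliberately simple stand-ins (a nearest-centroid classifier for the supervised part, a per-feature z-score baseline for the unsupervised part). The `Probe` features, the `HybridDetector` class, the `confirmed` helper, the threshold of 3.0, and all sample values are assumptions made for illustration. Applying one detector to both sources is also a simplification, since Satellite and OONI collect measurements differently.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Probe:
    """One DNS measurement reduced to two hypothetical numeric features."""
    ip_mismatch: float   # share of answers that differ from a control resolver
    failure_rate: float  # share of queries that timed out or returned errors


def centroid(probes):
    """Mean feature vector of a group of probes."""
    return (mean(p.ip_mismatch for p in probes),
            mean(p.failure_rate for p in probes))


def sq_dist(probe, point):
    """Squared Euclidean distance from a probe to a feature-space point."""
    return ((probe.ip_mismatch - point[0]) ** 2
            + (probe.failure_rate - point[1]) ** 2)


class HybridDetector:
    """Toy hybrid detector (illustrative only, not Brown et al.'s models)."""

    def __init__(self, censored, uncensored, unlabeled, z_threshold=3.0):
        # Supervised part: centroids learned from expert-labeled probes.
        self.c_pos = centroid(censored)
        self.c_neg = centroid(uncensored)
        # Unsupervised part: baseline of 'normal' from unlabeled traffic.
        self.mu = centroid(unlabeled)
        self.sigma = (pstdev(p.ip_mismatch for p in unlabeled),
                      pstdev(p.failure_rate for p in unlabeled))
        self.z_threshold = z_threshold  # assumed cutoff, not from the paper

    def known_pattern(self, probe):
        """True if the probe is closer to the labeled-censorship centroid."""
        return sq_dist(probe, self.c_pos) < sq_dist(probe, self.c_neg)

    def anomalous(self, probe):
        """True if any feature deviates strongly from the baseline."""
        values = (probe.ip_mismatch, probe.failure_rate)
        for x, m, s in zip(values, self.mu, self.sigma):
            z = abs(x - m) / s if s > 0 else (0.0 if x == m else float("inf"))
            if z > self.z_threshold:
                return True
        return False

    def flag(self, probe):
        """Flag known censorship patterns or departures from the baseline."""
        return self.known_pattern(probe) or self.anomalous(probe)


def confirmed(detector, satellite_probe, ooni_probe):
    """Cross-reference: report only when both sources are flagged."""
    return detector.flag(satellite_probe) and detector.flag(ooni_probe)


# Made-up sample data, for illustration only.
detector = HybridDetector(
    censored=[Probe(0.9, 0.1), Probe(0.8, 0.2)],
    uncensored=[Probe(0.0, 0.05), Probe(0.1, 0.0)],
    unlabeled=[Probe(0.0, 0.0), Probe(0.1, 0.05),
               Probe(0.05, 0.1), Probe(0.0, 0.05)],
)
known = Probe(0.85, 0.1)   # resembles the labeled censorship examples
novel = Probe(0.0, 0.9)    # unlike any labeled example, far from the baseline
normal = Probe(0.05, 0.05)
```

In this sketch the `novel` probe is missed by the supervised check but caught by the baseline check, which is the motivation for pairing the two. Requiring agreement in `confirmed` trades some recall for fewer false positives; a production system would need to match Satellite and OONI records by domain, vantage point, and time window before comparing them.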