The authors propose using ICLab as the primary dataset for their LLM-based censorship detection system because of its semantic richness across all network-stack levels, then cross-referencing the results against OONI and Censored Planet to reduce false negatives. The paper explicitly notes that ICLab lacks the scale and geographic coverage of OONI and Censored Planet but offers richer per-measurement context, which suits LLM feature learning.
From 2024-gao-extended — Extended Abstract: Leveraging Large Language Models to Identify Internet Censorship through Network Data
· §4 Proposed Future Works
· 2024
· Free and Open Communications on the Internet
Implications
Multi-platform cross-referencing (ICLab + OONI + Censored Planet) is the recommended validation strategy for LLM-based censorship detection to prevent platform-specific blind spots from producing false negatives.
Circumvention tool designers building automated blocking-detection pipelines should weight ICLab-style deep per-connection metadata more heavily than data from high-coverage but shallow platforms when training classifiers, and use the latter only to validate geographic breadth.
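
The cross-referencing strategy in the first implication can be illustrated with a minimal Python sketch. It is not the paper's implementation, which is only proposed as future work. The `Verdict` record, the `cross_reference` function, the platform labels, and the placeholder country codes and domains are all hypothetical; none of them correspond to the real ICLab, OONI, or Censored Planet data formats. The sketch assumes each platform's measurements have already been reduced to a per-(country, domain) blocked/not-blocked verdict, and it treats a block reported by any platform as a positive so that a miss by the primary platform is surfaced rather than hidden.

```python
from collections import defaultdict
from typing import NamedTuple


class Verdict(NamedTuple):
    """Hypothetical, already-normalized verdict from one measurement platform."""
    platform: str   # assumed labels: "iclab", "ooni", "censored_planet"
    country: str    # placeholder country code
    domain: str
    blocked: bool


def cross_reference(verdicts, primary="iclab"):
    """Merge per-platform verdicts for each (country, domain) pair.

    A pair counts as blocked if ANY platform reports a block, which
    trades some precision for fewer false negatives.
    """
    flagged_by = defaultdict(set)  # (country, domain) -> platforms reporting a block
    seen = set()
    for v in verdicts:
        key = (v.country, v.domain)
        seen.add(key)
        if v.blocked:
            flagged_by[key].add(v.platform)

    report = {}
    for key in seen:
        platforms = flagged_by.get(key, set())
        report[key] = {
            "blocked": bool(platforms),
            "sources": sorted(platforms),
            # True when a secondary platform saw a block the primary did not.
            "missed_by_primary": bool(platforms) and primary not in platforms,
        }
    return report


# Made-up sample data for illustration only.
sample = [
    Verdict("iclab", "XX", "example.org", True),
    Verdict("ooni", "XX", "example.org", True),
    Verdict("iclab", "YY", "example.net", False),
    Verdict("censored_planet", "YY", "example.net", True),
    Verdict("ooni", "ZZ", "example.com", False),
]
report = cross_reference(sample)
```

Entries with `missed_by_primary` set are the candidate false negatives that an ICLab-only classifier would overlook. Taking the union of platforms is one possible design choice; a stricter pipeline could instead require agreement from two or more sources before flagging a block.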