Rule-based censorship detection systems rely on regular expressions predefined by human experts and fail to adapt to evolving censorship techniques, leading to false negatives and poor scalability as data volumes grow. In contrast, learning-based models are described as thriving on large data volumes and as offering a contextual understanding that rule-based systems lack.
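The brittleness of static fingerprints can be illustrated with a minimal sketch. The signature and block-page strings below are hypothetical, not taken from the paper or from any real measurement platform: a fixed regex matches a known block page but silently misses a trivially altered variant.

```python
import re

# Hypothetical static fingerprint for a censor's block page
# (illustrative only; not a real OONI/ICLab/Censored Planet signature).
BLOCKPAGE_RE = re.compile(r"<title>Access Denied</title>", re.IGNORECASE)

known_blockpage = "<html><title>Access Denied</title></html>"
# The censor tweaks the markup slightly; the meaning is unchanged.
evolved_blockpage = "<html><title>Access&nbsp;Denied</title></html>"

print(BLOCKPAGE_RE.search(known_blockpage) is not None)    # detected
print(BLOCKPAGE_RE.search(evolved_blockpage) is not None)  # false negative
```

A learning-based classifier that conditions on page context rather than exact byte patterns would be expected to handle such variants without a hand-written rule update.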
From 2024-gao-extended — Extended Abstract: Leveraging Large Language Models to Identify Internet Censorship through Network Data
· §1 Introduction
· 2024
· Free and Open Communications on the Internet
Implications
Measurement platforms (OONI, ICLab, Censored Planet) should invest in ML-augmented pipelines rather than purely rule-based fingerprinting to reduce false-negative rates as censors evolve.
Circumvention tool operators monitoring for their own blocking should adopt adaptive anomaly detectors rather than static signatures to keep pace with changing censor behavior.