Of 21.8 billion raw measurements, approximately 7% (1.5 billion) were initially flagged as blocked; iterative HTML clustering and DBSCAN image clustering then removed ~500 million false positives, leaving ~1 billion confirmed blocked measurements. The clustering process formed 457 new response clusters, of which 308 were confirmed blockpages and 149 were false positives, with Cloudflare bot-checks being a notable source of false positives in HTTPS measurements.
From 2020-raman-censored — Censored Planet: An Internet-wide, Longitudinal Censorship Observatory
· §5.1.3, §5.1.4
· 2020
· Computer and Communications Security
Implications
When instrumenting circumvention infrastructure for blocking detection, apply multi-step response clustering before acting on blocked-connection signals. In this dataset roughly a third of raw block flags (~500 million of 1.5 billion) were false positives, so raw flags overstated confirmed blocking by about 50%.
Treat CDN-side bot checks (Cloudflare, Akamai) as a failure class distinct from censor-induced resets; otherwise these false positives will inflate apparent blocking rates for HTTPS-based transports.
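
The two implications above can be sketched as a small classifier that never treats a raw block flag as confirmed censorship and that keeps CDN bot checks in their own bucket. This is a minimal illustration, not the paper's pipeline: it substitutes simple substring matching for the HTML clustering and DBSCAN image clustering described in the source, and the names `Outcome`, `classify`, `summarize`, and both marker lists are hypothetical placeholders.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    CENSOR_BLOCK = "censor_block"
    CDN_BOT_CHECK = "cdn_bot_check"
    UNCONFIRMED = "unconfirmed"  # flagged, but matches no known fingerprint


# Hypothetical fingerprints, for illustration only. A real deployment would
# derive these from clustered and manually reviewed responses.
BLOCKPAGE_MARKERS = ("access to this site has been blocked",)
BOT_CHECK_MARKERS = ("checking your browser", "attention required")


def classify(flagged: bool, body: str) -> Outcome:
    """Classify one measurement; a raw flag alone never confirms blocking."""
    if not flagged:
        return Outcome.OK
    text = body.lower()
    # Test bot-check markers first so CDN challenges are not counted as censorship.
    if any(marker in text for marker in BOT_CHECK_MARKERS):
        return Outcome.CDN_BOT_CHECK
    if any(marker in text for marker in BLOCKPAGE_MARKERS):
        return Outcome.CENSOR_BLOCK
    return Outcome.UNCONFIRMED


def summarize(measurements):
    """Return per-class counts, the raw flagged total, and the confirmed total."""
    counts = Counter(classify(flagged, body) for flagged, body in measurements)
    flagged_total = sum(n for outcome, n in counts.items() if outcome is not Outcome.OK)
    confirmed = counts[Outcome.CENSOR_BLOCK]
    return counts, flagged_total, confirmed


if __name__ == "__main__":
    sample = [
        (False, "<html>hello</html>"),
        (True, "Access to this site has been blocked by order of the regulator"),
        (True, "Checking your browser before accessing the site"),
        (True, "<html>connection error</html>"),
    ]
    counts, flagged_total, confirmed = summarize(sample)
    print(f"flagged={flagged_total} confirmed={confirmed}")
    for outcome, n in counts.items():
        print(outcome.value, n)
```

Keeping `UNCONFIRMED` separate from `CENSOR_BLOCK` means that only fingerprint-matched responses count toward a blocking rate, while the remainder can be queued for clustering or manual review.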