FINDING · DETECTION

ICLab's semi-automated block page discovery combines clustering of HTML tag-frequency vectors with locality-sensitive hashing (LSH) of page text. It identified 48 previously unknown block page signatures from 13 countries: 15 through structural (tag-frequency) clustering across 5 countries and 33 through textual-similarity clustering across 8 countries. The system is seeded with 308 manually verified regular expressions and sorts candidate clusters by their URL-to-country ratio (the largest ratio observed was 286) to prioritize them for manual review, so detection no longer depends solely on brittle, hand-maintained regex lists.

From 2020-niaki-iclab · ICLab: A Global, Longitudinal Internet Censorship Measurement Platform · §IV-C · 2020 · Symposium on Security & Privacy
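
The discovery pipeline summarized above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not ICLab's code: every function name, the greedy cosine-similarity clustering with a 0.95 threshold, the use of MinHash over word 3-shingles as the LSH family, the 32-hash/8-band split, and the ratio definition (distinct URLs divided by distinct countries per cluster, highest first) are assumptions made for the example.

```python
# Hypothetical sketch of block page candidate discovery; all names and parameters are assumptions.
import hashlib
import math
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser


def unmatched(pages, known_regexes):
    """Keep only pages that no seed regex already recognizes as a block page."""
    compiled = [re.compile(r, re.IGNORECASE) for r in known_regexes]
    return {pid: html for pid, html in pages.items()
            if not any(c.search(html) for c in compiled)}


class _TagCounter(HTMLParser):
    """Counts start tags; the resulting Counter is the page's structural vector."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def tag_vector(html):
    parser = _TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_by_structure(pages, threshold=0.95):
    """Greedy clustering (assumed): a page joins the first cluster whose seed vector is similar enough."""
    clusters = []  # (seed vector, member page ids)
    for pid, html in pages.items():
        vec = tag_vector(html)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(pid)
                break
        else:
            clusters.append((vec, [pid]))
    return [members for _, members in clusters]


def minhash(text, num_perm=32, k=3):
    """MinHash signature over word k-shingles, one seeded BLAKE2b hash per permutation."""
    words = text.lower().split()
    shingles = {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}
    return [min(int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")
                for s in shingles)
            for seed in range(num_perm)]


def cluster_by_text(texts, bands=8):
    """LSH banding: pages sharing any band bucket are merged with union-find."""
    parent = {pid: pid for pid in texts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    buckets = defaultdict(list)
    for pid, text in texts.items():
        sig = minhash(text)
        rows = len(sig) // bands
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(pid)
    for members in buckets.values():
        for other in members[1:]:
            parent[find(other)] = find(members[0])
    groups = defaultdict(list)
    for pid in texts:
        groups[find(pid)].append(pid)
    return list(groups.values())


def rank_for_review(clusters, meta):
    """Order clusters by distinct URLs per distinct country (assumed definition), largest first."""
    def ratio(members):
        urls = {meta[p][0] for p in members}
        countries = {meta[p][1] for p in members}
        return len(urls) / len(countries)
    return sorted(clusters, key=ratio, reverse=True)
```

In use, one would filter responses through `unmatched` with the seed regexes, run both clustering passes over what remains, and hand the `rank_for_review` output to an analyst. The intuition assumed here is that a cluster served in place of many different URLs within only a few countries looks more like a national block page than like an ordinary site.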

Implications

Tags

techniques
measurement-platform · packet-injection

Extracted by claude-sonnet-4-6 — review before relying.