FINDING · EVALUATION

Of the 5,478 crawled domains, 152 (approximately 2.8%) deployed active bot-detection measures (CAPTCHA delivery or perimeter protection) that blocked automated OpenWPM crawling entirely. The authors note that this disproportionately excludes untrustworthy sites, which biases the training dataset toward well-resourced trustworthy outlets and limits recall on the untrustworthy class.
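The quoted share can be checked with a few lines of arithmetic. This is a minimal sketch using only the two counts stated in the finding. The variable names are illustrative, and treating every non-blocked domain as still crawlable is an assumption, not something the source states.

```python
# Counts taken directly from the finding.
blocked_domains = 152
crawled_domains = 5_478

# Fraction of crawled domains whose bot detection blocked the crawl.
blocked_share = blocked_domains / crawled_domains

# Assumption: every domain that was not blocked remained crawlable.
remaining_domains = crawled_domains - blocked_domains

print(f"Blocked share: {blocked_share:.1%}")
print(f"Domains not blocked: {remaining_domains}")
```

The overall share is small, but the finding's concern is which class the blocked domains fall in, and the source gives no per-class breakdown to compute that from.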

From 2025-sivan-sevilla-probing · Probing the third-party infrastructure of digital news on the Web · §4 Limitations · 2025 · Free and Open Communications on the Internet

Implications

Tags

censors
generic
techniques
measurement-platform

Extracted by claude-sonnet-4-6 — review before relying.