FINDING · EVALUATION

Data gaps severely degrade D-LDA accuracy. Erasing every other month reduced the corpus from 4,577 to 1,919 documents, and the model no longer detected the 'Religion-motivated killing,' 'Religious websites,' 'Muslim Violence,' and 'Homicide' topics at all. Erasing one in three months (1,479 documents) caused further topic loss, and even removing a single random month altered topic evolution trajectories. For 25% of pages, the gap between Wayback Machine snapshots and ICLab observations exceeds one year.
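
The gap experiments described above can be sketched as simple month-level filters over a dated corpus. This is a minimal illustration, not the paper's code: the function names, the (date, text) document layout, the toy corpus, and the choice of which months get erased (every k-th month counting from the first) are all assumptions made for the example.

```python
import random
from collections import defaultdict
from datetime import date


def group_by_month(docs):
    """Map (year, month) -> list of documents; docs are (date, text) pairs."""
    buckets = defaultdict(list)
    for day, text in docs:
        buckets[(day.year, day.month)].append((day, text))
    return dict(buckets)


def drop_every_kth_month(docs, k):
    """Erase every k-th month in chronological order.

    k=2 erases every other month, k=3 erases one month in three.
    Which phase is erased (2nd, 4th, ... here) is an assumption.
    """
    buckets = group_by_month(docs)
    months = sorted(buckets)
    kept = [m for i, m in enumerate(months) if (i + 1) % k != 0]
    return [doc for m in kept for doc in buckets[m]]


def drop_random_month(docs, seed=0):
    """Erase one month chosen at random (seeded for repeatability)."""
    buckets = group_by_month(docs)
    months = sorted(buckets)
    victim = random.Random(seed).choice(months)
    return [doc for m in months if m != victim for doc in buckets[m]]


def share_with_gap_over(gaps_days, threshold_days=365):
    """Fraction of pages whose observation gap exceeds the threshold."""
    return sum(g > threshold_days for g in gaps_days) / len(gaps_days)


# Toy corpus (hypothetical): two documents per month for one year.
corpus = [(date(2020, m, d), f"doc {m}-{d}")
          for m in range(1, 13) for d in (1, 15)]

half = drop_every_kth_month(corpus, 2)    # every other month erased
thirds = drop_every_kth_month(corpus, 3)  # one month in three erased
minus_one = drop_random_month(corpus)     # a single random month erased
```

Each reduced corpus would then be fed to the same dynamic topic model as the full corpus, and the detected topics and their trajectories compared against the full-corpus run.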

From 2022-waheed-darwin-s · Darwin's Theory of Censorship: Analysing the Evolution of Censored Topics with Dynamic Topic Models · §3.3, Figures 4–5, Tables 2–3 · 2022 · Workshop on Privacy in the Electronic Society

Tags

censors
in
techniques
measurement-platform

Extracted by claude-sonnet-4-6 — review before relying.