2025-berke-unique-whose-web
findings extracted from this paper
-
Simulating a shift from a 0% male to a 100% male sample changes Shannon entropy estimates by more than 10% for User-Agent (downward) and by more than 68% for WebGL Renderer (upward). This suggests that prior large fingerprinting studies likely misrepresent real-world risk because of uncontrolled male bias: Panopticlick (83.6–94.2% unique) reached participants predominantly through tech-oriented channels, AmIUnique reported 90% unique desktop fingerprints, and a directly comparable study reported 76.5% male participants.
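The effect of sample composition on entropy can be illustrated with a minimal sketch. It assumes the simulation amounts to reweighting the male and female value distributions by a chosen male share; the paper's actual resampling procedure may differ. The function names and the toy values below are hypothetical.

```python
import math
from collections import Counter


def distribution(values):
    """Empirical probability of each attribute value in a group."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


def shannon_entropy(probs):
    """Shannon entropy in bits of a {value: probability} mapping."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def mixed_entropy(male_values, female_values, male_share):
    """Entropy of an attribute when males make up `male_share` of the sample.

    Assumption: the sample is modeled as a weighted mixture of the two
    per-group distributions, not as the paper's exact simulation.
    """
    p_m = distribution(male_values)
    p_f = distribution(female_values)
    mixed = {
        v: male_share * p_m.get(v, 0.0) + (1 - male_share) * p_f.get(v, 0.0)
        for v in set(p_m) | set(p_f)
    }
    return shannon_entropy(mixed)


# Hypothetical toy data, not taken from the study.
male = ["gpu-a", "gpu-b", "gpu-c", "gpu-d"]
female = ["gpu-a", "gpu-a", "gpu-a", "gpu-b"]

h_female_only = mixed_entropy(male, female, 0.0)
h_male_only = mixed_entropy(male, female, 1.0)
relative_change = (h_male_only - h_female_only) / h_female_only
```

In this toy case the all-male sample has higher entropy than the all-female one, which mirrors the direction reported for WebGL Renderer; the magnitudes are illustrative only.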
-
Approximately 60% of users in the 8,400-participant US dataset had a unique overall browser fingerprint when 13 standard attributes were combined, matching FingerprintJS's advertised 60% accuracy. Fingerprinting risk followed strictly monotonic trends: uniqueness increased with age (the 65+ group was most at risk) and decreased with income (the group with household income under $25,000 was at greatest risk). Males had more unique overall fingerprints, but females had higher uniqueness on the passively fingerprintable attributes (User-Agent, Languages).
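As a rough illustration of how such a uniqueness rate can be computed, the sketch below treats a fingerprint as the tuple of selected attribute values and counts the users whose tuple occurs exactly once. The attribute names and records are hypothetical stand-ins for the 13 attributes in the dataset.

```python
from collections import Counter


def unique_fraction(records, attributes):
    """Share of records whose combined attribute tuple appears exactly once."""
    fingerprints = [tuple(r.get(a) for a in attributes) for r in records]
    if not fingerprints:
        return 0.0
    counts = Counter(fingerprints)
    unique = sum(1 for fp in fingerprints if counts[fp] == 1)
    return unique / len(fingerprints)


# Hypothetical toy records, not taken from the study.
records = [
    {"user_agent": "UA-1", "languages": "en-US", "screen": "1920x1080"},
    {"user_agent": "UA-1", "languages": "en-US", "screen": "1920x1080"},
    {"user_agent": "UA-1", "languages": "es-US", "screen": "1920x1080"},
    {"user_agent": "UA-2", "languages": "en-US", "screen": "1366x768"},
]

overall = unique_fraction(records, ["user_agent", "languages", "screen"])
ua_only = unique_fraction(records, ["user_agent"])
```

Running the same function on a per-group subset of the records (for example, one age bracket) gives the kind of group-level comparison described above.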
-
A simple three-hidden-layer MLP trained on only 13 standard browser attributes achieves AUROC above 0.5 (better than chance) for every tested demographic group: gender 0.663–0.679, age 55+ 0.644, Hispanic ethnicity 0.60, Asian race 0.698, Black race 0.677, and high-income bracket 0.617. Because the model used only attributes already collected by mainstream fingerprinting scripts (e.g., FingerprintJS), richer real-world attribute sets would likely yield substantially higher demographic inference accuracy.
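For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen member of the group receives a higher model score than a randomly chosen non-member, so 0.5 corresponds to chance. The sketch below computes it directly from that definition; it is not the paper's evaluation code, and the labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison: P(score_pos > score_neg), ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs: 1 = member of the demographic group.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
value = auroc(labels, scores)
```

The pairwise loop is quadratic and only suitable for small examples; a rank-based implementation would be the usual choice for a dataset of several thousand users.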
-
The User-Agent and Accept-Language browser attributes are transmitted in HTTP request headers, enabling passive server-side fingerprinting without JavaScript execution or any browser-detectable signal. In the 8,400-user dataset, Hispanic users make up only 11% of the sample but more than 45% of the users with 'es-US' as their Languages value, which substantially reduces their anonymity set size relative to the general population.
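A minimal sketch of the passive case follows, assuming a server that simply hashes the two header values it receives with every request. The function names, the hashing scheme, and the toy records are hypothetical; the second helper shows how a group's share among users with a given attribute value can be computed.

```python
import hashlib


def passive_fingerprint(headers):
    """Hash the User-Agent and Accept-Language request headers.

    Header names are matched case-insensitively, as HTTP requires.
    No client-side script is involved, so the browser sees no signal.
    """
    normalized = {k.lower(): v for k, v in headers.items()}
    ua = normalized.get("user-agent", "")
    lang = normalized.get("accept-language", "")
    return hashlib.sha256(f"{ua}\n{lang}".encode("utf-8")).hexdigest()


def group_share(records, attribute, value, group_key, group):
    """Share of `group` among the records whose `attribute` equals `value`."""
    matching = [r for r in records if r.get(attribute) == value]
    if not matching:
        return 0.0
    return sum(1 for r in matching if r.get(group_key) == group) / len(matching)


# Hypothetical request headers and survey records, not taken from the study.
fp_a = passive_fingerprint({"User-Agent": "UA-1", "Accept-Language": "es-US,es;q=0.9"})
fp_b = passive_fingerprint({"user-agent": "UA-1", "accept-language": "es-US,es;q=0.9"})

users = [
    {"languages": "es-US", "hispanic": True},
    {"languages": "es-US", "hispanic": False},
    {"languages": "en-US", "hispanic": False},
    {"languages": "en-US", "hispanic": False},
]
share = group_share(users, "languages", "es-US", "hispanic", True)
```

When a group's share among holders of a value far exceeds its share of the whole sample, that value both narrows the anonymity set and leaks the demographic attribute.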
-
Screen resolution (572 distinct values, 4.5% unique, entropy 5.51), WebGL Unmasked Renderer (654 distinct values, 3.2% unique, entropy 6.833), and User-Agent (434 distinct values, 2.8% unique, entropy 4.613) are simultaneously the most uniquely identifying individual attributes and the strongest demographic predictors by normalized mutual information across all five demographic categories tested (gender, age, income, Hispanic ethnicity, race).
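The two quantities in this finding, per-attribute entropy and normalized mutual information with a demographic label, can be sketched as below. The normalization used here (dividing by the geometric mean of the two entropies) is an assumption, since several conventions exist and these notes do not state which one the paper uses. The toy values are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def normalized_mutual_information(xs, ys):
    """I(X;Y) / sqrt(H(X) * H(Y)), with I(X;Y) = H(X) + H(Y) - H(X,Y).

    Assumption: geometric-mean normalization; other normalizations
    (min, max, arithmetic mean) give different values between 0 and 1.
    """
    h_x, h_y = entropy(xs), entropy(ys)
    h_xy = entropy(list(zip(xs, ys)))
    mutual_information = h_x + h_y - h_xy
    denom = math.sqrt(h_x * h_y)
    return mutual_information / denom if denom > 0 else 0.0


# Hypothetical toy data: attribute values paired with a demographic label.
renderer = ["gpu-a", "gpu-a", "gpu-b", "gpu-b"]
fully_predictive = ["m", "m", "f", "f"]
uninformative = ["m", "f", "m", "f"]

nmi_high = normalized_mutual_information(renderer, fully_predictive)
nmi_zero = normalized_mutual_information(renderer, uninformative)
```

An attribute with many distinct values tends to score high on entropy, while a high normalized mutual information indicates that its values also cluster by demographic group; the finding above is that the same three attributes rank high on both.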