Without chunk-based padding, an XGBoost classifier identifies the target website (Tranco top-100) from covert data-chunk sizes with 91% accuracy. Padding into 64 KB chunks already lowers accuracy to 64%; 2 MB chunks lower it to 12% at a 21.3% bandwidth overhead; and 16 MB chunks bring it close to random guessing (1% for 100 sites) at a 480.3% overhead. The results show a monotonic tradeoff between fingerprinting resistance and bandwidth overhead.
From 2026-kamali-huma — Huma: Censorship Circumvention via Web Protocol Tunneling with Deferred Traffic Replacement
· §V-C, Figure 4
· 2026
· Network and Distributed System Security
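The overhead side of this tradeoff follows from rounding arithmetic. The sketch below assumes each covert payload is padded up to the next whole multiple of the chunk size (at least one chunk). The payload sizes are made up for illustration and are not the paper's data, and `padded_size`, `bandwidth_overhead`, and `observable_sizes` are hypothetical helper names. It shows why larger chunks both shrink the set of sizes an observer can see and inflate the bytes sent.

```python
import math

KB, MB = 1024, 1024 * 1024


def padded_size(payload_len: int, chunk_size: int) -> int:
    """Round a payload up to a whole number of fixed-size chunks (at least one)."""
    n_chunks = max(1, math.ceil(payload_len / chunk_size))
    return n_chunks * chunk_size


def bandwidth_overhead(payload_lens: list[int], chunk_size: int) -> float:
    """Extra bytes sent because of padding, as a fraction of the unpadded total."""
    raw = sum(payload_lens)
    padded = sum(padded_size(n, chunk_size) for n in payload_lens)
    return (padded - raw) / raw


def observable_sizes(payload_lens: list[int], chunk_size: int) -> list[int]:
    """Distinct on-the-wire sizes a passive observer could use as features."""
    return sorted({padded_size(n, chunk_size) for n in payload_lens})


# Hypothetical per-site covert payload sizes in bytes; NOT measurements from the paper.
payloads = [300 * KB, 1_200 * KB, 2_500 * KB, 5 * MB, 9 * MB]

for chunk in (64 * KB, 2 * MB, 16 * MB):
    print(f"{chunk // KB:>6} KB chunks: "
          f"overhead={bandwidth_overhead(payloads, chunk):.1%}, "
          f"distinct sizes={len(observable_sizes(payloads, chunk))}")
```

With these illustrative sizes, 64 KB chunks leave all five payloads distinguishable at well under 1% overhead, 2 MB chunks leave four distinct sizes at about 34% overhead, and 16 MB chunks map everything to a single size at roughly 347% overhead. The paper's figures come from real site traffic, so the exact numbers differ; only the direction of the tradeoff carries over.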
Implications
Pad and chunk SP-to-DW payloads at a configurable size (2 MB is a practical sweet spot: 12% fingerprinting accuracy for 21.3% overhead); expose this as a tunable parameter so operators can trade bandwidth for protection based on their threat model.
Never transmit variable-length covert payloads without fixed-size padding: even modest chunk sizes give substantial fingerprinting resistance (64 KB chunks already cut accuracy from 91% to 64%), so any padding is strictly better than none.
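A minimal sketch of the recommended pad-and-chunk step, with the chunk size exposed as a tunable parameter that defaults to the 2 MB point above. The framing (an 8-byte length prefix followed by zero padding) and the names `pad_and_chunk`, `reassemble`, and `DEFAULT_CHUNK_SIZE` are assumptions for illustration, not Huma's actual wire format. It also assumes the framed bytes are encrypted before transmission, so the zero padding is not visible to an observer.

```python
import struct

# Hypothetical default: the 2 MB tradeoff point suggested above.
DEFAULT_CHUNK_SIZE = 2 * 1024 * 1024

# Hypothetical framing: 8-byte big-endian length prefix so the receiver can strip padding.
HEADER = struct.Struct("!Q")


def pad_and_chunk(payload: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE) -> list[bytes]:
    """Frame as length prefix + payload + zero padding, then split into equal-size chunks.

    Assumption: the chunks are encrypted downstream, so the zero padding is not observable.
    """
    if chunk_size <= HEADER.size:
        raise ValueError("chunk_size must exceed the header size")
    framed = HEADER.pack(len(payload)) + payload
    framed += b"\x00" * ((-len(framed)) % chunk_size)  # pad to a whole number of chunks
    return [framed[i:i + chunk_size] for i in range(0, len(framed), chunk_size)]


def reassemble(chunks: list[bytes]) -> bytes:
    """Inverse of pad_and_chunk: join chunks and use the length prefix to drop padding."""
    framed = b"".join(chunks)
    (length,) = HEADER.unpack_from(framed, 0)
    return framed[HEADER.size:HEADER.size + length]


# Usage: every chunk on the wire has the same size regardless of the payload length.
msg = b"covert response body"
chunks = pad_and_chunk(msg, chunk_size=64 * 1024)
assert all(len(c) == 64 * 1024 for c in chunks)
assert reassemble(chunks) == msg
```

Keeping the chunk size a parameter rather than a constant lets an operator move along the tradeoff in the finding, for example toward 16 MB under a stronger adversary or toward 64 KB when bandwidth is scarce.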