FreeWave's modem generates audio whose packet-length distribution has dramatically lower variance than that of human speech, even when the audio is transmitted through Skype's variable-bit-rate encoder. Figure 9 shows that English and Portuguese speech samples produce high-variance packet-length sequences, while modem audio produces a narrow, nearly constant distribution. This difference gives a censor a reliable passive classifier for modem-over-VoIP traffic, and the content mismatch persists even with perfect emulation of the VoIP protocol framing.
From 2013-geddes-cover — Cover Your ACKs: Pitfalls of Covert Channel Censorship Circumvention
· §6, Figure 9
· 2013
· Computer and Communications Security
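The variance gap described above could be tested with a very small passive check. The following is a minimal sketch, not the paper's method: the function names, the use of the coefficient of variation as the statistic, and the threshold value are all assumptions made for illustration, and the two sample sequences are synthetic rather than measurements from Figure 9.

```python
from statistics import mean, pstdev


def length_cv(packet_lengths):
    """Coefficient of variation (std / mean) of a packet-length sequence."""
    mu = mean(packet_lengths)
    if mu == 0:
        return 0.0
    return pstdev(packet_lengths) / mu


def looks_like_modem(packet_lengths, cv_threshold=0.05):
    """Flag a flow whose packet lengths are nearly constant.

    cv_threshold is a hypothetical value chosen for this sketch; a real
    classifier would calibrate it on labelled traffic.
    """
    return length_cv(packet_lengths) < cv_threshold


# Synthetic packet lengths in bytes, for illustration only.
speech_like = [62, 118, 95, 140, 71, 133, 88, 126, 59, 110]
modem_like = [120, 121, 119, 120, 122, 120, 119, 121, 120, 120]

print("speech-like flagged:", looks_like_modem(speech_like))
print("modem-like flagged:", looks_like_modem(modem_like))
```

Normalising by the mean makes the statistic comparable across flows with different average packet sizes; raw variance would work equally well if the cover codec's typical packet size is known.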
Implications
Mimicking a cover protocol's packet framing is insufficient: the payload content must also produce the same statistical fingerprint (packet-size distribution, entropy, inter-arrival variance) as legitimate cover-protocol payloads. For VoIP covers, this means the audio codec output itself must resemble speech, not just the headers.
Systems that inject arbitrary data into a VBR-encoded media stream should apply traffic shaping so that packet lengths match the distribution observed for natural speech before transmitting, or should use a CBR codec cover where payload entropy is less discriminative.
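The shaping idea in the last point can be sketched as drawing each outgoing packet length from an empirical sample of speech packet lengths and padding the final packet. This is an illustrative sketch under strong assumptions: the helper name shape_payload is hypothetical, the sender is assumed to control packet sizes directly (which may not hold when the data passes through a codec), and the speech lengths in the usage lines are invented.

```python
import random


def shape_payload(payload, speech_lengths, rng=None, pad_byte=b"\x00"):
    """Split payload into packets whose sizes are sampled from speech_lengths.

    speech_lengths: observed packet lengths (bytes) from real speech flows.
    The last packet is padded up to its sampled size; a real scheme would
    also need framing so the receiver can strip the padding.
    """
    if not speech_lengths or min(speech_lengths) <= 0:
        raise ValueError("speech_lengths must contain positive lengths")
    if rng is None:
        rng = random.Random()

    packets = []
    offset = 0
    while offset < len(payload):
        target = rng.choice(speech_lengths)  # sample the empirical distribution
        chunk = payload[offset:offset + target]
        offset += len(chunk)
        packets.append(chunk + pad_byte * (target - len(chunk)))
    return packets


# Hypothetical speech packet lengths; not taken from the paper.
observed = [60, 90, 120, 140]
packets = shape_payload(b"x" * 500, observed, random.Random(0))
print([len(p) for p in packets])
```

Sampling independently per packet matches only the marginal length distribution; it does not reproduce correlations between consecutive packets, which a censor could also examine.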