2026-mathews-tracing-chain-deep
Tracing the Chain: Deep Learning for Stepping-Stone Intrusion Detection
arXiv: 2604.08800
findings extracted from this paper
At operationally realistic base rates (1 million connection pairs per hour, of which only 10 are true stepping-stone chains), a detector with a 1% FPR generates approximately 10,000 false alarms per hour even if it correctly flags all 10 intrusions, roughly 1,000 false alarms for every true detection. This makes classical statistical methods, which cannot reach FPR ≪ 10⁻², operationally unusable; deep learning methods must target FPR ≤ 10⁻³ to be viable.
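The base-rate arithmetic behind this finding can be reproduced in a few lines. The sketch below is illustrative only: the function name `alarm_budget` and the perfect-recall default (`tpr=1.0`) are assumptions made for the example, not anything defined in the paper.

```python
def alarm_budget(pairs_per_hour, true_chains, fpr, tpr=1.0):
    """Return (false_alarms, true_alarms, precision) for one hour of traffic.

    Hypothetical helper: assumes every connection pair that is not a true
    stepping-stone chain is benign and is misflagged with probability `fpr`.
    """
    benign = pairs_per_hour - true_chains
    false_alarms = benign * fpr
    true_alarms = true_chains * tpr
    precision = true_alarms / (true_alarms + false_alarms)
    return false_alarms, true_alarms, precision


# Compare the 1% operating point with the 10^-3 target from the finding.
for fpr in (1e-2, 1e-3):
    fa, ta, prec = alarm_budget(1_000_000, 10, fpr)
    print(f"FPR={fpr:g}: {fa:,.0f} false alarms, {ta:.0f} true alarms, "
          f"precision={prec:.4f}")
```

Under these assumptions, tightening the FPR from 10⁻² to 10⁻³ cuts the hourly false alarms from about 10,000 to about 1,000, which still leaves about 100 false alarms for each true detection.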
ESPRESSO achieves only TPR 0.132 at FPR ≤ 10⁻³ in network-mode for DNS-tunneled traffic, far below the TPR 0.992 it reaches for SSH traffic at the same threshold. The paper attributes this to the polling-based communication mechanism of dnscat2, which disrupts the timing patterns that interval-based flow correlation relies on.
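For readers unfamiliar with the "TPR at FPR ≤ 10⁻³" operating point, the following is a minimal sketch of how such a figure is commonly computed from correlation scores. It is not the paper's evaluation code; the function name `tpr_at_fpr` and the convention that a pair is flagged when its score is strictly above the threshold are assumptions.

```python
def tpr_at_fpr(pos_scores, neg_scores, max_fpr):
    """Return (tpr, fpr, threshold) at the loosest threshold with FPR <= max_fpr.

    pos_scores: scores of truly correlated flow pairs.
    neg_scores: scores of uncorrelated flow pairs.
    A pair is flagged when its score is strictly greater than the threshold.
    """
    neg_sorted = sorted(neg_scores, reverse=True)
    # Number of false positives we can tolerate within the FPR budget.
    allowed = int(max_fpr * len(neg_sorted))
    # Using the (allowed + 1)-th highest negative score as the threshold means
    # at most `allowed` negatives score strictly above it, even with ties.
    threshold = neg_sorted[allowed] if allowed < len(neg_sorted) else float("-inf")
    tp = sum(1 for s in pos_scores if s > threshold)
    fp = sum(1 for s in neg_scores if s > threshold)
    return tp / len(pos_scores), fp / len(neg_scores), threshold


# Toy data: 2,000 uncorrelated pairs with evenly spread scores, 4 true pairs.
negatives = [i / 2000 for i in range(2000)]
positives = [0.9999, 0.9996, 0.5, 0.2]
tpr, fpr, thr = tpr_at_fpr(positives, negatives, max_fpr=1e-3)
print(f"TPR={tpr:.3f} at FPR={fpr:.4f} (threshold={thr:.4f})")
```

Because the threshold is fixed by the negative scores alone, a protocol whose true pairs score only slightly above uncorrelated pairs will show a low TPR at this operating point even if its overall ROC curve looks reasonable.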
ESPRESSO, a deep learning flow correlator combining a transformer backbone with time-aligned interval features and online triplet mining, achieves TPR >0.99 at FPR ≤ 10⁻³ for SSH, SOCAT, and ICMP stepping-stone traffic in network-mode detection, versus DCF's TPR of 0.320–0.956 across those same protocols at the same threshold. On the harder mixed-protocol dataset in network-mode, ESPRESSO achieves TPR 0.748 at FPR ≤ 10⁻³, more than double DCF's 0.334.
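The finding names online triplet mining without giving details, so the sketch below shows only one common variant (batch-hard mining with cosine distance) to illustrate the general idea. The mining strategy, the distance function, the margin value, and all function names are assumptions; ESPRESSO's actual training procedure may differ.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def batch_hard_triplet_loss(inflow_emb, outflow_emb, margin=0.5):
    """Mean triplet loss using the hardest in-batch negative for each anchor.

    inflow_emb[i] and outflow_emb[i] are assumed to be embeddings of the two
    sides of the same chain (the positive pair); every outflow_emb[j] with
    j != i is treated as a negative for anchor i.
    """
    losses = []
    for i, anchor in enumerate(inflow_emb):
        d_pos = cosine_distance(anchor, outflow_emb[i])
        # "Online" mining: negatives are picked from the current batch, and
        # the hardest one is the non-matching outflow closest to the anchor.
        d_neg = min(
            cosine_distance(anchor, outflow_emb[j])
            for j in range(len(outflow_emb))
            if j != i
        )
        losses.append(max(0.0, d_pos - d_neg + margin))
    return sum(losses) / len(losses)


# Matching pairs aligned: positives are closer than negatives, so loss is 0.
print(batch_hard_triplet_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]]))
# Matching pairs swapped: every positive is farther than its negative.
print(batch_hard_triplet_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]]))
```

In a real training loop the embeddings would come from the model and the loss would be computed with an autodiff framework; plain Python lists are used here only to keep the example self-contained.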
Ablation experiments show that replacing ESPRESSO's transformer backbone with a CNN ('Modified DCF') while retaining time-aligned interval features achieves performance competitive with the full ESPRESSO model across most protocols (e.g., SOCAT network-mode pAUC 0.997 vs. 0.989 at FPR ≤ 10⁻³), demonstrating that the time-interval feature representation—not the transformer architecture—is the primary driver of correlation accuracy.
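To make "time-aligned interval features" concrete, here is a minimal sketch under a loud assumption: that the feature is a per-window packet count, with both flows binned against the same start time so that window k covers the same wall-clock interval in each. The paper's actual feature set is not described in this summary, and `interval_counts` and its parameters are hypothetical.

```python
def interval_counts(timestamps, window=0.5, duration=5.0, start=0.0):
    """Count packets falling in each fixed-width window.

    Window k covers [start + k * window, start + (k + 1) * window).
    Packets outside [start, start + duration) are ignored.
    """
    n_bins = int(round(duration / window))
    counts = [0] * n_bins
    for t in timestamps:
        k = int((t - start) // window)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


# Two flows observed on a shared clock: the outgoing flow echoes the incoming
# one with a small forwarding delay, so their per-window counts line up.
incoming = [0.10, 0.20, 0.60, 1.40]
outgoing = [0.13, 0.24, 0.65, 1.45]
print(interval_counts(incoming, window=0.5, duration=2.0))
print(interval_counts(outgoing, window=0.5, duration=2.0))
```

A representation of this kind can be fed to either a CNN or a transformer, which is consistent with the ablation's observation that the backbone matters less than the features.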
A systematic robustness evaluation found that ESPRESSO is highly robust to packet padding alone but that even modest artificial timing jitter causes significant performance degradation, identifying timing-based perturbations as the primary vulnerability of correlation-based stepping-stone (and by extension, anonymity-network) detectors.
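The two perturbation families in this finding can be sketched as simple transformations of a packet trace. The block size, the uniform delay distribution, and the function names below are illustrative assumptions rather than the settings used in the paper's robustness evaluation.

```python
import random


def pad_sizes(sizes, block=512):
    """Round each packet size up to the next multiple of `block` bytes.

    Padding changes packet sizes only; timestamps are untouched.
    """
    return [((s + block - 1) // block) * block for s in sizes]


def add_jitter(timestamps, max_delay, rng):
    """Delay each packet by a uniform random amount in [0, max_delay] seconds.

    The result is re-sorted so the trace stays in arrival order.
    """
    return sorted(t + rng.uniform(0.0, max_delay) for t in timestamps)


rng = random.Random(0)  # fixed seed so the example is repeatable
sizes = [60, 512, 513, 1400]
times = [0.00, 0.48, 0.52, 1.49]
print(pad_sizes(sizes))
print(add_jitter(times, max_delay=0.05, rng=rng))
```

Padding leaves every timestamp in place, whereas jitter can move a packet that sits near a window boundary into the next window. For a detector built on timing intervals, that difference is one plausible intuition for why the two perturbations affect it so differently.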