2026-he-trafficmoe-heterogeneity-aware-mixture
TrafficMoE: Heterogeneity-aware Mixture of Experts for Encrypted Traffic Classification
arXiv: 2603.29520
findings extracted from this paper
Routing-guided conditional aggregation (CA) that dynamically weights header versus payload contributions using per-sample MoE routing probabilities outperforms static fusion on all six datasets, demonstrating that the relative discriminative utility of headers versus payloads varies by application type — and that classifiers can adaptively shift reliance to whichever modality is less obfuscated.
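The weighting step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each branch's routing probabilities are summarized into a single per-sample scalar, which is then normalized into fusion weights over the header and payload representations.

```python
import numpy as np

def conditional_aggregate(h_header, h_payload, p_header, p_payload):
    """Routing-guided conditional aggregation (sketch).

    h_header, h_payload: (batch, dim) branch representations.
    p_header, p_payload: (batch,) per-sample routing mass for each branch
    (hypothetical summary statistic; the paper's exact routing signal may differ).
    """
    z = p_header + p_payload + 1e-9           # per-sample normalizer
    w_h = (p_header / z)[:, None]             # header fusion weight
    w_p = (p_payload / z)[:, None]            # payload fusion weight
    # samples whose routing favors the header branch lean on headers, and
    # vice versa -- this is the adaptive shift between modalities
    return w_h * h_header + w_p * h_payload   # (batch, dim) fused representation
```

A sample routed almost entirely to the header branch thus contributes (nearly) only its header representation to the classifier, which is how reliance shifts toward the less obfuscated modality.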
Explicitly disentangling packet headers (structured, low-entropy) from encrypted payloads (high-entropy, stochastic) into separate MoE branches yields consistent gains across six datasets: 86.85% F1 on 120-class TLS 1.3 traffic (CSTNET-TLS), 97.88% F1 on USTC-TFC2016 malware/benign flows, and 92.65% F1 on imbalanced IoT traffic (CIC-IoT2022), demonstrating that headers and payloads carry fundamentally different and independently exploitable discriminative signals.
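A toy sketch of the dual-branch idea, under loose assumptions: each branch is a sparse top-k MoE layer with its own expert pool, so header tokens and payload tokens never share experts. The gating rule and expert form (plain linear maps) are simplifications, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k_moe(x, w_gate, experts, k=2):
    """Sparse MoE layer: route each token to its top-k experts.

    x: (tokens, dim); w_gate: (dim, n_experts); experts: list of (dim, dim)
    weight matrices (a toy stand-in for expert FFNs)."""
    logits = x @ w_gate                                   # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -k:]             # top-k expert indices
    gate = np.exp(logits - logits.max(-1, keepdims=True))
    gate /= gate.sum(-1, keepdims=True)                   # softmax routing probs
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in top[t]:
            out[t] += gate[t, e] * (x[t] @ experts[e])    # weighted expert sum
    return out

# separate expert pools: header tokens and payload tokens are disentangled
dim, n_exp = 8, 4
header_experts = [rng.normal(size=(dim, dim)) for _ in range(n_exp)]
payload_experts = [rng.normal(size=(dim, dim)) for _ in range(n_exp)]
```

Keeping the pools disjoint lets the header experts specialize on low-entropy structured fields while the payload experts adapt to high-entropy encrypted bytes.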
Pretraining on 30 GB of unlabeled mixed traffic via masked language modeling (ISCX-VPN2016 NonVPN, CICIDS2017, WIDE backbone), then fine-tuning, enables TrafficMoE to classify VPN application traffic at 88.72% F1 and VPN service traffic at 92.61% F1, exceeding all fully supervised and prior pretraining baselines even though the pretraining corpus contains no labels from those domains.
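The pretraining objective is BERT-style masked modeling over raw traffic bytes. A minimal sketch of the corruption step, assuming a 256-value byte vocabulary with one extra `[MASK]` slot (the mask rate and special-token convention here are assumptions, not the paper's exact recipe):

```python
import numpy as np

MASK_ID = 256  # extra vocabulary slot beyond byte values 0-255 (assumption)

def mask_bytes(seq, mask_rate=0.15, rng=None):
    """Masked-byte pretraining corruption: hide a fraction of bytes so the
    model can be trained to reconstruct them from context (MLM over traffic)."""
    rng = rng or np.random.default_rng(0)
    seq = np.asarray(seq)
    mask = rng.random(seq.shape) < mask_rate   # which positions to hide
    corrupted = np.where(mask, MASK_ID, seq)   # replace hidden bytes with [MASK]
    return corrupted, mask                     # model predicts seq[mask]
```

The loss is then cross-entropy on the masked positions only, which is what lets the model learn from unlabeled captures before any fine-tuning.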
TrafficMoE achieves 97.65% accuracy and F1-score on the ISCX-Tor2016 dataset, substantially outperforming all baselines including the best pretraining-based competitor FlowletFormer (91.16% F1), by separately modeling protocol headers and encrypted payloads via dual-branch sparse Mixture-of-Experts rather than treating them as a unified byte stream.
An uncertainty-aware filtering (UF) mechanism quantifies per-token reliability via Shannon entropy of the cross-modal header–payload attention matrix, finding that encrypted payloads still contain low-entropy tokens with stable cross-modal alignment that serve as reliable classification anchors — demonstrating that nominally randomized byte streams retain exploitable low-entropy structure.
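The entropy criterion above can be sketched directly. This assumes a row-stochastic payload-to-header attention matrix and a simple keep-the-lowest-entropy-fraction rule; the actual thresholding in the paper may differ.

```python
import numpy as np

def token_reliability(attn_row):
    """Shannon entropy of one payload token's attention distribution over
    header tokens; lower entropy = more peaked, stable cross-modal alignment."""
    p = attn_row / attn_row.sum()
    return -np.sum(p * np.log(p + 1e-12))

def filter_tokens(attn, keep_ratio=0.5):
    """Uncertainty-aware filtering (sketch): keep the keep_ratio fraction of
    payload tokens with the lowest attention entropy as classification anchors.

    attn: (n_payload_tokens, n_header_tokens) cross-modal attention weights."""
    ent = np.array([token_reliability(row) for row in attn])
    k = max(1, int(len(ent) * keep_ratio))
    return np.argsort(ent)[:k]  # indices of the most reliable payload tokens
```

A token that attends sharply to one header position gets low entropy and is kept; a token attending near-uniformly (as fully randomized ciphertext tends to) gets high entropy and is filtered out.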