The component-aware binary splitting algorithm (CompAwareBinSplit) requires an average of 35.47 messages per article to isolate a sensitive keyword combination, 10.3% of the 342.72 required by the previously used algorithm. It is also the only evaluated algorithm that correctly handles overlapping keyword components and multiple co-occurring combinations.
From 2019-xiong-efficient — An Efficient Method to Determine which Combination of Keywords Triggered Automatic Filtering of a Message
· §5.4, §6, Table 1
· 2019
· Free and Open Communications on the Internet
Implications
Adopt CompAwareBinSplit (open-sourced at citizenlab/censored-keyword-isolation) for enumerating keywords that are filtered server-side on WeChat and similar platforms; the roughly 10x reduction in messages (342.72 down to 35.47 per article) is critical given the risk of account bans and the cost of phone-number registration.
The algorithm's overlap-aware inner loop (advancing from index i+1 rather than j) is necessary to correctly detect components that share characters; any reimplementation should replicate this design detail.
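To give a rough feel for oracle-driven binary splitting in general, here is a minimal Python sketch. It is not CompAwareBinSplit and does not reproduce the i+1 inner loop: it is a plain character-level search that repeatedly binary-searches for the shortest prefix that still triggers filtering. The names `FILLER`, `mask`, `isolate`, and `make_toy_oracle` are hypothetical. The toy oracle stands in for sending a message and observing whether it is filtered, and assumes (a) filtering fires when every component of a combination appears as a substring and (b) some neutral filler character exists that is part of no keyword.

```python
from typing import Callable, Dict, List, Set, Tuple

FILLER = "\u00b7"  # assumption: a neutral character that appears in no keyword


def mask(text: str, keep: Set[int]) -> str:
    """Replace every character whose index is not in `keep` with FILLER."""
    return "".join(ch if i in keep else FILLER for i, ch in enumerate(text))


def isolate(text: str, is_censored: Callable[[str], bool]) -> Set[int]:
    """Return a minimal set of character indices that still triggers filtering.

    Assumes the oracle is monotone: revealing more characters never turns a
    filtered message into an unfiltered one.
    """
    if not is_censored(text):
        raise ValueError("text does not trigger filtering")
    needed: Set[int] = set()
    hi = len(text)  # invariant: keeping range(hi) | needed is filtered
    while not is_censored(mask(text, needed)):
        lo, up = 1, hi
        # Binary search for the smallest prefix length that, together with
        # the indices already known to be needed, is still filtered.
        while lo < up:
            mid = (lo + up) // 2
            if is_censored(mask(text, set(range(mid)) | needed)):
                up = mid
            else:
                lo = mid + 1
        needed.add(lo - 1)  # the last character of that prefix is required
        hi = lo - 1
    return needed


def make_toy_oracle(components: List[str]) -> Tuple[Callable[[str], bool], Dict[str, int]]:
    """Hypothetical stand-in for the platform: counts queries ("messages")."""
    calls = {"n": 0}

    def is_censored(text: str) -> bool:
        calls["n"] += 1
        return all(c in text for c in components)

    return is_censored, calls


if __name__ == "__main__":
    oracle, calls = make_toy_oracle(["ab", "bc"])  # components share "b"
    hit = isolate("xxabcxx", oracle)
    print(sorted(hit), "queries:", calls["n"])
```

Because this sketch works one character at a time, it spends about one binary search per character of the combination and returns only character positions; it does not say where one component ends and the next begins. Recovering component boundaries, especially when components share characters as in the toy example, is the part the paper's component-aware design addresses.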