FINDING · EVALUATION

Category-level analysis of 100 statements across 5 sensitive content categories found that the moderation gap between WebUI and API varies widely by topic. Sexuality showed the largest gap: for Gemini, per the GPT-4o judge, the WebUI was 7.0× more likely than the API to moderate a response. Political ideology followed at 2.0×. Hate speech came in at 1.0×, which is parity rather than a gap. Miscellaneous offensive topics showed the inverse pattern (0.3×, meaning the API was moderated more often than the WebUI). Religious content was moderated in the WebUI but not at all through the API, so no finite ratio can be stated for it. The pattern suggests that public-facing WebUIs prioritize reputational risk management for high-scrutiny categories.

From 2026-lipphardt-dual — Dual Standards: Examining Content Moderation Disparities Between API and WebUI Interfaces in Large Language Models · §4.5, Figure 8 · 2026 · Free and Open Communications on the Internet
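
The ratios in the finding above can be read as WebUI moderation counts divided by API moderation counts for the same set of statements. The following is a minimal sketch of that arithmetic, not the paper's analysis code. The function names (`moderation_ratio`, `describe`) and the counts in `example_counts` are hypothetical, chosen only to reproduce the shape of the reported pattern. It assumes both interfaces received the same statements per category, so a ratio of counts equals a ratio of rates.

```python
from typing import Optional


def moderation_ratio(webui_moderated: int, api_moderated: int) -> Optional[float]:
    """Return the WebUI/API moderation ratio.

    Returns None when the API moderated nothing, because the ratio is
    undefined in that case (the situation described for religious content).
    """
    if webui_moderated < 0 or api_moderated < 0:
        raise ValueError("counts must be non-negative")
    if api_moderated == 0:
        return None
    return webui_moderated / api_moderated


def describe(ratio: Optional[float]) -> str:
    """Turn a ratio into a short label in the style used by the finding."""
    if ratio is None:
        return "undefined (no API moderation)"
    if ratio > 1:
        return f"WebUI more moderated ({ratio:.1f}x)"
    if ratio < 1:
        return f"API more moderated ({ratio:.1f}x)"
    return "parity (1.0x)"


# Hypothetical (webui_moderated, api_moderated) counts, NOT the paper's data.
example_counts = {
    "sexuality": (7, 1),
    "political ideology": (4, 2),
    "hate speech": (3, 3),
    "miscellaneous offensive": (1, 3),
    "religion": (2, 0),
}

if __name__ == "__main__":
    for category, (webui, api) in example_counts.items():
        print(f"{category}: {describe(moderation_ratio(webui, api))}")
```

Returning `None` for a zero API count, rather than infinity, keeps the undefined case explicit so it cannot be silently averaged or ranked alongside the finite ratios.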

Implications

Because moderation gaps are category-specific (most pronounced for sexuality and political ideology), information-access disparities are not uniform. Circumvention-tool operators should therefore test their LLM-assisted moderation systems against all sensitive categories, not only the most obvious ones.
