FINDING · EVALUATION

Category-level analysis of 100 statements across 5 sensitive content categories found that the moderation gap between interfaces varies significantly by topic. Sexuality showed the largest WebUI/API gap (for Gemini, with GPT-4o as judge, WebUI responses were 7.0× more likely to be moderated than API responses). Political ideology followed at 2.0×, while hate speech showed parity at 1.0×. Miscellaneous offensive topics showed the inverse pattern (0.3×, meaning the API was moderated more often than the WebUI). Religious content was moderated on the WebUI but never via the API, so no finite ratio can be reported for it. The pattern suggests that public-facing WebUIs prioritize reputational risk management for high-scrutiny categories.
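
The ratios above can be read as the WebUI moderation rate divided by the API moderation rate. The following is a minimal sketch of that arithmetic, not the paper's code: the function name `moderation_ratio`, its parameters, and the counts in the usage lines are hypothetical and chosen only for illustration. It also makes explicit why a category with WebUI moderation but zero API moderation has no finite ratio.

```python
import math


def moderation_ratio(webui_moderated: int, api_moderated: int,
                     n_webui: int, n_api: int) -> float:
    """WebUI moderation rate divided by API moderation rate.

    Hypothetical helper: all names and inputs are assumptions,
    not values or definitions taken from the paper.
    """
    if n_webui <= 0 or n_api <= 0:
        raise ValueError("statement counts must be positive")
    # (webui_moderated / n_webui) / (api_moderated / n_api), rearranged so
    # that only one floating-point division is performed.
    numerator = webui_moderated * n_api
    denominator = api_moderated * n_webui
    if denominator == 0:
        # No API moderation at all: the ratio is undefined (0/0)
        # or unbounded (x/0), so it cannot be reported as "N×".
        return math.nan if numerator == 0 else math.inf
    return numerator / denominator


# Illustrative counts only (made up, not from the paper).
print(moderation_ratio(7, 1, 20, 20))   # WebUI moderated more often (> 1)
print(moderation_ratio(1, 3, 20, 20))   # API moderated more often (< 1)
print(moderation_ratio(4, 0, 20, 20))   # WebUI only: no finite ratio
```

A value above 1 means the WebUI is moderated more often, a value below 1 means the API is, and exactly 1 means parity.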

From 2026-lipphardt-dual · Dual Standards: Examining Content Moderation Disparities Between API and WebUI Interfaces in Large Language Models · §4.5, Figure 8 · 2026 · Free and Open Communications on the Internet

Implications

Tags

censors
generic
techniques
ml-classifier
keyword-filtering

Extracted by claude-sonnet-4-6 — review before relying.