An empirical study of 100 sensitive statements tested on Gemini (2.5 Flash) and
ChatGPT (GPT-5) found that WebUI interfaces are systematically more restrictive than
their API counterparts. With GPT-4o as the judge, WebUI responses were moderated 18% of the
time vs. 9% (Gemini API) and 13% (ChatGPT API). A DeBERTa classifier flagged 82% of
WebUI responses as moderated vs. 58% of API responses. Depending on the judge, the Gemini
WebUI:API ratio ranged from 2.0:1 (GPT-4o) to 7.0:1 (Claude), and the ChatGPT ratio from
1.4:1 (GPT-4o) to 15.6:1 (Claude). Neither Google nor OpenAI discloses these interface-specific policies.
From 2026-lipphardt-dual: Dual Standards: Examining Content Moderation Disparities Between API and WebUI Interfaces in Large Language Models · §4.3, Table 2 · 2026 · Free and Open Communications on the Internet
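The WebUI:API ratios reported for the GPT-4o judge follow directly from the moderation rates quoted in the summary. The sketch below reproduces that arithmetic. It assumes the 18% WebUI figure applies to both the Gemini and ChatGPT WebUIs, as the wording of the summary suggests; the function and variable names are illustrative and do not come from the paper.

```python
def webui_to_api_ratio(webui_rate: float, api_rate: float) -> float:
    """Return how many times more often the WebUI is moderated than the API."""
    if api_rate <= 0:
        raise ValueError("api_rate must be positive")
    return webui_rate / api_rate


# Moderation rates in percent, as judged by GPT-4o (figures from the summary).
# Assumption: the single 18% WebUI figure is shared by both products.
WEBUI_RATE = 18.0
API_RATES = {"Gemini": 9.0, "ChatGPT": 13.0}

# Round to one decimal place, matching the precision used in the summary.
ratios = {
    model: round(webui_to_api_ratio(WEBUI_RATE, api_rate), 1)
    for model, api_rate in API_RATES.items()
}

for model, ratio in ratios.items():
    print(f"{model} WebUI:API = {ratio}:1")
```

Under that assumption the computed values match the reported GPT-4o ratios of 2.0:1 for Gemini and 1.4:1 for ChatGPT. The Claude-judged ratios cannot be reproduced this way because the summary does not give the underlying rates.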
Implications
Researchers studying LLM-based censorship detection or using LLMs as evaluation judges must report which interface (API vs. WebUI) was used; results are not interchangeable between interfaces.
Circumvention-tool developers who use LLM APIs to evaluate the sensitivity of content should expect the API to be less filtered than the WebUI, so API-based measurements will systematically under-report the moderation that WebUI users encounter.
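One way to act on these implications is to record the interface alongside every measurement and to refuse to pool results across interfaces. The following is a minimal sketch of that idea; the `Interface` enum, the `ModerationResult` record, and the `moderation_rate` helper are hypothetical names introduced here for illustration, not part of the study's tooling.

```python
from dataclasses import dataclass
from enum import Enum


class Interface(Enum):
    API = "api"
    WEBUI = "webui"


@dataclass(frozen=True)
class ModerationResult:
    """One judged response; the interface is a required field (hypothetical schema)."""
    model: str
    interface: Interface
    statement_id: int
    moderated: bool


def moderation_rate(results: list[ModerationResult]) -> float:
    """Fraction of responses judged moderated, for a single interface only."""
    if not results:
        raise ValueError("no results given")
    interfaces = {r.interface for r in results}
    if len(interfaces) != 1:
        # API and WebUI results are not interchangeable, so do not average them.
        raise ValueError("results mix interfaces; compute one rate per interface")
    return sum(r.moderated for r in results) / len(results)


# Toy data for illustration only; these are not measurements from the study.
api_results = [
    ModerationResult("example-model", Interface.API, 1, False),
    ModerationResult("example-model", Interface.API, 2, True),
]
webui_results = [
    ModerationResult("example-model", Interface.WEBUI, 1, True),
    ModerationResult("example-model", Interface.WEBUI, 2, True),
]

print(moderation_rate(api_results))
print(moderation_rate(webui_results))
```

Making the interface a required field means a result cannot be stored without it, and the guard in `moderation_rate` turns an accidental mix of API and WebUI data into an explicit error rather than a silently blended rate.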