Use the corpus
The corpus is designed to be useful in several ways, sorted from least to most setup. Pick whichever fits your workflow.
1. Browse this site
The simplest mode. Every paper has a stable URL: /papers/<id>/. Tag indexes (censors, techniques, defenses) let you walk the field by axis. The whole site rebuilds from the YAML on every push to main; whatever you see here matches the source repo.
2. Read the YAML directly
Every paper is a small YAML file in corpus/papers/. The JSON schema documents every field. The taxonomy documents the controlled-vocabulary IDs that tag fields use. If you're building your own tooling on top of the corpus, this is the most boring, most stable interface — clone the repo, walk the directory.
git clone https://github.com/getlantern/circumvention-corpus
cd circumvention-corpus
ls corpus/papers/ # one YAML per paper
yq '.censors' corpus/papers/2023-wu-fully-encrypted-detect.yaml
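For tooling that only needs an inventory of the corpus, walking the directory can be sketched without a YAML parser. The sketch below assumes that each file stem is the paper id and that it begins with a four-digit year, as in the example filename above. Both are assumptions drawn from that one example, not from the schema, and index_papers_by_year is a hypothetical helper name.

```python
from collections import defaultdict
from pathlib import Path


def index_papers_by_year(papers_dir):
    """Group paper ids (file stems) by the year prefix in the filename.

    Assumption: files are named like 2023-wu-fully-encrypted-detect.yaml.
    Files without a four-digit year prefix are collected under "unknown".
    """
    by_year = defaultdict(list)
    for path in sorted(Path(papers_dir).glob("*.yaml")):
        paper_id = path.stem
        prefix = paper_id.split("-", 1)[0]
        year = prefix if len(prefix) == 4 and prefix.isdigit() else "unknown"
        by_year[year].append(paper_id)
    return dict(by_year)
```

From the repo root, index_papers_by_year("corpus/papers") returns a dict mapping each year to a sorted list of ids. Reading fields such as censors would still need a YAML parser or yq.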
3. Run the MCP server (recommended)
The most powerful mode: an LLM can query the corpus on demand and compose its results with whatever else it knows. The corpus ships its own MCP server in Go — single binary, zero non-stdlib runtime deps, reads the YAMLs at startup.
Install
git clone https://github.com/getlantern/circumvention-corpus
cd circumvention-corpus
go build -o corpus-mcp ./cmd/corpus-mcp/
# Optional: put it on your PATH so MCP clients can launch it by name.
sudo mv corpus-mcp /usr/local/bin/
Or, if you only want to run the binary without managing a checkout:
go install github.com/getlantern/circumvention-corpus/cmd/corpus-mcp@latest
Register with Claude Code
claude mcp add -s user circumvention-corpus \
/usr/local/bin/corpus-mcp -- --corpus $HOME/code/circumvention-corpus
Replace the --corpus path with wherever you cloned the repo. Verify with claude mcp list; it should show ✓ Connected.
Register with Claude Desktop
Edit your Claude Desktop config:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%/Claude/claude_desktop_config.json
{
"mcpServers": {
"circumvention-corpus": {
"command": "/usr/local/bin/corpus-mcp",
"args": ["--corpus", "/Users/you/code/circumvention-corpus"]
}
}
}
Restart Claude Desktop; the server's tools become available in your conversations.
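If the config file already lists other MCP servers, hand-editing risks clobbering them. As a minimal sketch, the entry above could be merged in programmatically. The register_server helper is hypothetical and not part of the corpus repo; the key names and binary path are the ones shown in the JSON above.

```python
import json
from pathlib import Path


def register_server(config_path, corpus_dir, binary="/usr/local/bin/corpus-mcp"):
    """Add or update the circumvention-corpus entry, keeping other settings."""
    path = Path(config_path)
    # Start from the existing config if the file exists and is non-empty.
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["circumvention-corpus"] = {
        "command": binary,
        "args": ["--corpus", str(corpus_dir)],
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with the config path for your platform and the directory where you cloned the repo, then restart Claude Desktop as described above.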
Register with Cursor / VS Code Copilot / other MCP clients
Any MCP-compliant client takes a stdio-launched binary. The shape:
{
"circumvention-corpus": {
"command": "/usr/local/bin/corpus-mcp",
"args": ["--corpus", "/path/to/circumvention-corpus"]
}
}
For VS Code: drop the above into .vscode/mcp.json under a "servers" key. For Cursor: add it via Settings → MCP → Add new MCP server.
What the MCP server exposes
Four tools, designed to compose:
- search_papers: Keyword + tag-filter search. Filters: censors, techniques, defenses, year_min, year_max, venue, core_only. Returns ranked records with abstract, tags, and team notes.
- get_paper: Full record for a single paper id. Use after search_papers when the agent needs the full notes / references / metadata.
- list_taxonomy: Returns the controlled vocabulary so the agent knows the canonical IDs to filter on. Especially useful as the first call in a session, because it gives the model a mental model of the field's structure.
- find_related: Papers that share tags with a given paper. mode=same_technique (default), same_censor, or same_defense.
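An MCP client normally issues these calls for you, but it can help to see the shape of one. In MCP, a tool invocation is a JSON-RPC 2.0 request with method tools/call, sent after the client's initialize handshake; over stdio each message is a single line of JSON. The sketch below builds such a request for search_papers. The build_tool_call helper is hypothetical, the censor id is a placeholder, and whether censors takes a list or a single string is an assumption; the tool schema the server reports is authoritative.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build one JSON-RPC 2.0 tools/call request as a single-line string."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Placeholder values: look up real taxonomy IDs with list_taxonomy first.
request = build_tool_call(
    1,
    "search_papers",
    {"censors": ["<censor-id>"], "year_min": 2023, "core_only": True},
)
```

The same helper works for get_paper, list_taxonomy, and find_related by changing the tool name and arguments.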
Example questions the MCP makes easy:
- "Find every paper that evaluates a defense against the GFW's fully-encrypted-traffic detector."
- "What did anyone publish about Iran's censorship in 2024-2025?"
- "For my new protocol design: which papers should I read about active probing?"
- "Show me the citation neighborhood of 2023-wu-fully-encrypted-detect."
4. Public MCP HTTPS endpoint
Not yet live. A read-only HTTPS endpoint at corpus.lantern.io/mcp is on the roadmap so other circumvention-tool teams can plug the corpus into their AI assistants without running anything locally. When it lands, point your MCP client at the HTTPS URL instead of a local binary.
5. Build something on top
The schema is CC0. The metadata is CC0. Build whatever you want with it — your own UI, a notification system that pings you when papers tagged with a specific technique appear, a sister index for a different region. The whole point of having a structured-metadata layer is that the data outlives whatever interface we put on top of it.
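As one illustration, the core of a notification system could be a snapshot diff run after each git pull. This is a sketch under stated assumptions, not a feature of the corpus: find_new_papers and the state file are hypothetical, and paper ids are taken to be the YAML file stems. Filtering by a specific technique would additionally require parsing each new file's tags with a YAML parser.

```python
import json
from pathlib import Path


def find_new_papers(papers_dir, state_file):
    """Return ids that appeared since the last run, then update the state file."""
    state_path = Path(state_file)
    if state_path.exists():
        seen = set(json.loads(state_path.read_text(encoding="utf-8")))
    else:
        seen = set()  # First run: every paper counts as new.
    current = {p.stem for p in Path(papers_dir).glob("*.yaml")}
    new_ids = sorted(current - seen)
    state_path.write_text(json.dumps(sorted(current)), encoding="utf-8")
    return new_ids
```

Keep the state file outside corpus/papers/ so it is not mistaken for a paper record.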