Use the corpus

The corpus can be used in several ways, listed roughly from least to most setup. Pick whichever fits your workflow.

1. Browse this site

The simplest mode. Every paper has a stable URL: /papers/<id>/. Tag indexes (censors, techniques, defenses) let you walk the field by axis. The whole site rebuilds from the YAML on every push to main; whatever you see here matches the source repo.

2. Read the YAML directly

Every paper is a small YAML file in corpus/papers/. The JSON schema documents every field. The taxonomy documents the controlled-vocabulary IDs that tag fields use. If you're building your own tooling on top of the corpus, this is the most boring, most stable interface — clone the repo, walk the directory.

git clone https://github.com/getlantern/circumvention-corpus
cd circumvention-corpus
ls corpus/papers/                       # one YAML per paper
yq '.censors' corpus/papers/2023-wu-fully-encrypted-detect.yaml
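For tooling that only needs to enumerate papers, the directory listing alone goes a long way. The Python sketch below assumes that a paper's id is its YAML filename without the extension, and that ids begin with a four-digit year; both are inferred from the `2023-wu-fully-encrypted-detect` example rather than from a documented guarantee. The helper names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def paper_ids(papers_dir):
    """List paper ids, assuming id == YAML filename without the extension."""
    return sorted(p.stem for p in Path(papers_dir).glob("*.yaml"))


def paper_url(paper_id):
    """Site path for a paper, following the /papers/<id>/ pattern."""
    return f"/papers/{paper_id}/"


def group_by_year(ids):
    """Group ids by their leading year; ids without a 4-digit prefix go under None."""
    groups = defaultdict(list)
    for pid in ids:
        prefix = pid.split("-", 1)[0]
        # Assumption: a 4-digit numeric prefix is the publication year.
        year = int(prefix) if len(prefix) == 4 and prefix.isdecimal() else None
        groups[year].append(pid)
    return dict(groups)
```

Calling `group_by_year(paper_ids("corpus/papers"))` from the repo root would give a year-indexed view of the corpus. Reading tag fields such as `censors` still requires a YAML parser, which this sketch deliberately leaves out.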

3. Run the MCP server (recommended)

The most powerful mode: an LLM can query the corpus on demand and compose its results with whatever else it knows. The corpus ships its own MCP server in Go — single binary, zero non-stdlib runtime deps, reads the YAMLs at startup.

Install

git clone https://github.com/getlantern/circumvention-corpus
cd circumvention-corpus
go build -o corpus-mcp ./cmd/corpus-mcp/

# Optional: put it on your PATH so MCP clients can launch it by name.
sudo mv corpus-mcp /usr/local/bin/

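Or, if you only want the binary without building it from a checkout (note that go install places it in $GOBIN, or $GOPATH/bin if GOBIN is unset, rather than /usr/local/bin):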

go install github.com/getlantern/circumvention-corpus/cmd/corpus-mcp@latest

Register with Claude Code

claude mcp add -s user circumvention-corpus -- \
  /usr/local/bin/corpus-mcp --corpus "$HOME/code/circumvention-corpus"

Replace the --corpus path with wherever you cloned the repo. Verify with claude mcp list; it should show ✓ Connected.

Register with Claude Desktop

Edit your Claude Desktop config:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "circumvention-corpus": {
      "command": "/usr/local/bin/corpus-mcp",
      "args": ["--corpus", "/Users/you/code/circumvention-corpus"]
    }
  }
}

Restart Claude Desktop; the server's tools become available in your conversations.
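If the config file already lists other MCP servers, the new entry has to be merged in rather than pasted over them. A minimal Python sketch of that merge follows; `register_server` is a hypothetical helper, not something the corpus ships, and it assumes the file is either absent, empty, or a JSON object.

```python
import json
from pathlib import Path


def register_server(config_path, name, command, args):
    """Add or update one entry under "mcpServers", leaving other keys untouched."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    # Treat a missing or empty file as an empty config object.
    config = json.loads(text) if text.strip() else {}
    config.setdefault("mcpServers", {})[name] = {
        "command": command,
        "args": list(args),
    }
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `register_server(path, "circumvention-corpus", "/usr/local/bin/corpus-mcp", ["--corpus", "/Users/you/code/circumvention-corpus"])` would produce the entry shown above while preserving any servers already configured.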

Register with Cursor / VS Code Copilot / other MCP clients

Any MCP client that supports the stdio transport can launch the binary. The entry has this shape:

{
  "circumvention-corpus": {
    "command": "/usr/local/bin/corpus-mcp",
    "args": ["--corpus", "/path/to/circumvention-corpus"]
  }
}

For VS Code: drop the above into .vscode/mcp.json under a "servers" key. For Cursor: add it via Settings → MCP → Add new MCP server.

What the MCP server exposes

Four tools, designed to compose:

search_papers: Keyword + tag-filter search. Filters: censors, techniques, defenses, year_min, year_max, venue, core_only. Returns ranked records with abstract, tags, and team notes.
get_paper: Full record for a single paper id. Use after search_papers when the agent needs the full notes / references / metadata.
list_taxonomy: Returns the controlled vocabulary so the agent knows the canonical IDs to filter on. Especially useful as the first call in a session, since it gives the model a map of the field's structure.
find_related: Papers that share tags with a given paper. mode = same_technique (default), same_censor, or same_defense.
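On the wire, an MCP client invokes each of these tools with a JSON-RPC 2.0 `tools/call` request. The Python sketch below builds one such request for search_papers using the filter names listed above. The censor ID "iran" and the choice of list-valued filters are assumptions made for illustration; list_taxonomy and the JSON schema are the authority on actual IDs and argument types.

```python
import json


def tool_call(request_id, tool, arguments):
    """Serialize one MCP tools/call request as a single line of JSON-RPC 2.0."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps without indent emits no raw newlines, which suits the
    # newline-delimited framing of the stdio transport.
    return json.dumps(message)


# Hypothetical arguments: "iran" as a censor ID and list-valued filters are guesses.
iran_request = tool_call(
    1,
    "search_papers",
    {"censors": ["iran"], "year_min": 2024, "year_max": 2025},
)
```

A real client performs the MCP `initialize` handshake before sending tool calls; MCP-aware assistants handle all of this themselves, so the sketch is mainly useful for debugging or for writing a custom client.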

Example questions the MCP makes easy:

"Find every paper that evaluates a defense against the GFW's fully-encrypted-traffic detector."
"What did anyone publish about Iran's censorship in 2024-2025?"
"For my new protocol design: which papers should I read about active probing?"
"Show me the citation neighborhood of 2023-wu-fully-encrypted-detect."

4. Public MCP HTTPS endpoint

Not yet live. A read-only HTTPS endpoint at corpus.lantern.io/mcp is on the roadmap so other circumvention-tool teams can plug the corpus into their AI assistants without running anything locally. When it lands, point your MCP client at the HTTPS URL instead of a local binary.

5. Build something on top

The schema is CC0. The metadata is CC0. Build whatever you want with it — your own UI, a notification system that pings you when papers tagged with a specific technique appear, a sister index for a different region. The whole point of having a structured-metadata layer is that the data outlives whatever interface we put on top of it.
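As a starting point for the notification idea, the Python sketch below reports paper files that have appeared since the previous run. It assumes one YAML file per paper in corpus/papers/ with the id as the filename stem; `new_paper_ids` and the state file are hypothetical. Filtering by a specific technique tag would additionally require parsing each YAML file, which is omitted here.

```python
import json
from pathlib import Path


def new_paper_ids(papers_dir, state_file):
    """Return ids not seen on the previous run, then record the current set."""
    state = Path(state_file)
    # The state file holds a JSON list of ids seen last time (absent on first run).
    seen = set(json.loads(state.read_text(encoding="utf-8"))) if state.exists() else set()
    current = {p.stem for p in Path(papers_dir).glob("*.yaml")}
    state.write_text(json.dumps(sorted(current)), encoding="utf-8")
    return sorted(current - seen)
```

Run after each `git pull`, this returns every paper on the first invocation and only the additions afterwards; the returned ids can then be fed to whatever delivers the notification.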