▶ benchmark · reproducible · integrity-first

Measured, not estimated.

Every token-cost claim on visitportal.dev is produced by Anthropic's count_tokens API. If the measurement disagrees with the pitch, we update the pitch. Never the other way.

Canonical run — tokens-matrix-v1

48 cells · 48 ok · seed 42 · mode count_tokens_only

Started 2026-04-19T08:05:10.635Z, finished 2026-04-19T08:05:29.466Z. Full raw JSON: packages/bench/results/tokens-matrix-v1.json.

Summary

Median input tokens per turn, by tool count, across the matrix:

| Tool count | MCP (median input tokens) | Portal | MCP : Portal |
|-----------:|--------------------------:|-------:|-------------:|
| 10         | 1,956                     | 172    | 11.4×        |
| 50         | 7,343                     | 172    | 42.7×        |
| 100        | 13,929                    | 172    | 81.0×        |
| 400        | 54,677                    | 172    | 317.9×       |

▸ MCP scales linearly at ~137 tokens per preloaded tool in this simulation. Portal stays flat — the manifest is loaded on visit, not preloaded into every turn. Tokenizer parity across Sonnet 4.5 and Opus 4.5 confirmed (byte-identical counts for the same prompt + tool list).
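The linear relationship is easy to check from the medians above. A minimal sketch (names are ours, not the bench harness's) that fits a least-squares line through the measured points; the slope lands near the ~137 tokens/tool figure quoted above:

```typescript
// Medians from the tokens-matrix-v1 table: [toolCount, mcpInputTokens].
const MEDIANS: Array<[number, number]> = [
  [10, 1_956],
  [50, 7_343],
  [100, 13_929],
  [400, 54_677],
];

// Portal's per-turn cost is flat: the manifest is loaded on visit.
const PORTAL_TOKENS = 172;

// Ordinary least-squares slope: marginal tokens per preloaded tool.
function tokensPerTool(points: Array<[number, number]>): number {
  const n = points.length;
  const sx = points.reduce((a, [x]) => a + x, 0);
  const sy = points.reduce((a, [, y]) => a + y, 0);
  const sxx = points.reduce((a, [x]) => a + x * x, 0);
  const sxy = points.reduce((a, [x, y]) => a + x * y, 0);
  return (n * sxy - sx * sy) / (n * sxx - sx * sx);
}

const slope = tokensPerTool(MEDIANS);
// Slope comes out ≈135 by least squares, consistent with the ~137/tool
// headline figure (which reads the marginal cost off the larger cells).
console.log(`${slope.toFixed(0)} tokens per preloaded tool`);
```

Under this model, projecting a 1,000-tool deployment is `intercept + slope × 1000` for MCP versus a constant 172 for Portal; the gap only widens.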

Chart

[Grouped bar chart: MCP preloaded schemas vs Portal on-visit — median input tokens per turn, by tool count, measured via Anthropic's count_tokens API. MCP: 2.0k @ 10 tools, 7.3k @ 50, 14k @ 100, 55k @ 400. Portal: 172 at every tool count. Legend: MCP (schemas preloaded) · Portal (manifest on visit).]

Reproduce it

export ANTHROPIC_API_KEY=sk-ant-...
pnpm install
BENCH_MODE=count_tokens_only pnpm --filter @visitportal/bench bench
# 48 cells against Anthropic's count_tokens API in ~20s, ~$0.10 total

The bench harness is in packages/bench/. Scenarios live in packages/bench/src/harness/bench.ts; the MCP tool-schema simulator is in packages/bench/src/mcp-simulator.ts; the tasks we measure against are in packages/bench/src/tasks/definitions.ts.

Methodology — what we can and can't claim

The simulator generates plausible MCP tool schemas across seven domains (filesystem, github, search, database, http, communication, knowledge), derived from seed tools scraped from the modelcontextprotocol/servers repo. Mean description length ~112 chars; every tool has 1–6 params.
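For illustration only, here is the shape of one simulated tool under the constraints above (flat object schema, 1–6 params, description near the ~112-char mean). The tool name, fields, and interface name are invented for this sketch; the real definitions come from packages/bench/src/mcp-simulator.ts:

```typescript
// Hypothetical type sketching what the simulator emits. Note the schema is
// deliberately flat: no $ref / oneOf / allOf nesting (see Methodology).
interface SimTool {
  name: string;
  description: string; // mean length ~112 chars across the generated corpus
  input_schema: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
}

const example: SimTool = {
  name: "github_list_pull_requests",
  description:
    "List open pull requests for a repository, optionally filtered by base branch, author, and label. Returns summaries.",
  input_schema: {
    type: "object",
    properties: {
      repo: { type: "string", description: "owner/name of the repository" },
      base: { type: "string", description: "filter by base branch" },
      author: { type: "string", description: "filter by PR author login" },
    },
    required: ["repo"],
  },
};

console.log(Object.keys(example.input_schema.properties).length, "params");
```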

Can claim: for a plausibly-shaped multi-server MCP deployment of N tools, preloaded schema consumes X tokens per turn on Sonnet 4.5 / Opus 4.5, measured by count_tokens. Determinism: same seed → byte-identical tools → byte-identical token counts.
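The determinism claim hinges on a seedable generator. A minimal sketch of the idea, assuming a mulberry32-style PRNG (the actual generator in packages/bench/src/mcp-simulator.ts may differ; names here are ours):

```typescript
// mulberry32: a common 32-bit seedable PRNG. Same seed => same stream.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// The seven simulator domains named in the methodology section.
const DOMAINS = [
  "filesystem", "github", "search", "database",
  "http", "communication", "knowledge",
] as const;

// Hypothetical name generator: one draw from the seeded stream per tool.
function toolName(rng: () => number, i: number): string {
  return `${DOMAINS[Math.floor(rng() * DOMAINS.length)]}_tool_${i}`;
}

// Same seed (42, as in the canonical run) => byte-identical tool lists,
// which is what makes the token counts byte-identical too.
const a = mulberry32(42);
const b = mulberry32(42);
const namesA = Array.from({ length: 5 }, (_, i) => toolName(a, i));
const namesB = Array.from({ length: 5 }, (_, i) => toolName(b, i));
console.log(JSON.stringify(namesA) === JSON.stringify(namesB)); // true
```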

Cannot claim: that every specific real-world deployment is exactly this shape. Real MCP sometimes emits deeply nested JSON Schema ($ref, oneOf, allOf) which we skip — so our MCP number is a conservative lower bound. Full disclosure in packages/bench/METHODOLOGY.md.