Plugins Are Wasting Your Tokens, Thanks Vercel

I was debugging something completely unrelated to web development. Exploring the Claude Code source code, actually. And I noticed something strange: every single prompt I sent was triggering skill injections from a Vercel plugin I had installed. Documentation about SWR, shadcn, AI SDK, sign-in-with-vercel. None of it relevant. All of it eating my context window.
So I did what any engineer would do. I traced it.
The Problem: Lexical Matching in a Semantic World
The Vercel plugin for Claude Code uses a UserPromptSubmit hook that fires on every message you send. It pattern-matches keywords in your prompt text against skill names. The matching is lexical, not semantic. It has no idea what you are actually doing.
Here is what that looks like in practice. I typed a prompt containing the word "vercel" while discussing Claude Code's own architecture. The debug output:
```
"sign-in-with-vercel" matched: lexical recall (raw 223.7, capped +8.0, source: lexical)
```
The word "vercel" appeared in my prompt, so the plugin injected documentation about Vercel's OAuth sign-in flow. I was reading TypeScript source code. Other false positives across the session: "feature flagged" triggered vercel-flags. "Architecture" triggered nextjs and v0-dev. "SWR" got auto-suggested alongside other web-stack skills. The matcher doesn't understand context. It sees keywords and fires.
The Cost: 48,000 Tokens of Nothing
I tracked every injection across the session. Six turns. Twelve skill injections. Not a single one was relevant to the task at hand.
Turn 1: ai-sdk, shadcn — roughly 12K tokens.
Turn 2: chat-sdk, vercel-flags — 8K tokens.
Turn 3: swr — 3K tokens.
Turn 4: nextjs, v0-dev — 16K tokens.
Turn 5: verification, runtime-cache — 6K tokens.
Turn 6: sign-in-with-vercel, vercel-agent — 3K tokens.
Total: approximately 48,000 tokens of Vercel documentation injected into a session about Claude Code's own source code.
But the token cost is only half the story. Each injection comes with a mandatory directive: "You must run the Skill() tool." That forces two additional tool calls per turn. Tool calls that load even more irrelevant context. Tool calls that cost time and compute. Across six turns, that is twelve wasted tool invocations.
Why This Matters Beyond One Plugin
This is not a hit piece on Vercel. Their plugin is one of the most comprehensive Claude Code integrations out there, and when you are actually building a Next.js app on Vercel, the skill injections are genuinely useful. The problem is architectural, and it applies to every plugin in the ecosystem.
AI coding assistants have finite context windows. Every token you waste on irrelevant documentation is a token that cannot be used for your actual code, your actual conversation, your actual reasoning. When plugins inject aggressively, they are silently degrading the quality of every response you get. The model has less room to think. Less room to hold your codebase in memory. Less room to plan multi-step changes.
And you have no idea it is happening. There is no "context window usage" meter. No notification that says "a plugin just consumed 16K of your 200K token budget." It just silently fills up, and your responses get a little worse, and you don't know why.
What Plugin Architecture Should Look Like
After tracing this, I wrote down seven things that need to change. Not just for this plugin, but for the plugin model in general.
First, semantic matching over lexical. The word "vercel" appearing in a prompt about Claude Code internals is not the same as the word "vercel" appearing in a prompt about deploying a Next.js app. At minimum, check if the project actually uses the technology before injecting anything. A quick heuristic: does the project have a package.json with next? A vercel.json? A .vercel directory? If none exist, skip injection entirely.
Second, offer skills instead of mandating them. The current "You must run the Skill() tool" directive is extremely expensive. Let the model decide if the skill is relevant. Present it as available, not required.
Third, enforce a session-level budget. The current system caps at 3 skills per hook invocation with an 18KB budget. But across a session, 12 or more unique skills can get injected totaling 48K tokens. There should be a session-level cap of maybe 3 to 5 total skills unless the user explicitly asks for more.
Fourth, project-type gating. Detect the project type on session start. Is this a Next.js project? Enable Next.js skills. Is it deployed on Vercel? Enable deployment skills. Is it neither? Inject nothing.
Fifth, make plugins opt-in per project. A global install that fires everywhere is the wrong default. Let users declare "this project uses Vercel" in their configuration. Without that signal, default to off.
Sixth, lighter payloads. Each skill currently injects 3 to 10K tokens of comprehensive documentation. For injection purposes, a 200 to 300 token summary with a "run the full skill for details" escape hatch would cut the waste by roughly 95 percent while still being useful when the skill is relevant.
Seventh, deduplicate the boilerplate. Every single turn in my session got the same preamble: "MANDATORY: Your training data for these libraries is OUTDATED and UNRELIABLE." That same warning appeared seven times. Once per session is enough.
The Bigger Picture
We are in the early days of plugin ecosystems for AI coding tools. Claude Code has plugins and MCP servers. Cursor has rules and context files. Windsurf has its own approach. Every one of them is going to face this same problem: how do you inject relevant context without drowning the model in noise?
The answer is not "inject everything and let the model sort it out." Context windows are large but not infinite, and every wasted token has a real cost in response quality, latency, and actual dollars. The answer is relevance-first injection. Prove you are needed before you inject. Default to silence, not noise.
The core issue with the current approach is a philosophical one: the plugin assumes every session is a Vercel development session and acts accordingly. It should assume nothing and require evidence of relevance before injecting a single token.
Your context window is precious real estate. Treat it that way.
