# I Validated 15 Popular MCP Servers. Most Have the Same Blind Spot.
March 25, 2026 · 9 min read
MCP tool definitions are how LLMs decide which tools to call, what arguments to pass, and whether a tool is safe to run without asking. Bad definitions mean tools get ignored, called with wrong arguments, or run destructive operations without confirmation.
I audited 15 popular MCP servers — 75k+ combined GitHub stars — checking how their tool definitions hold up against the WildRun MCP Validator. The top-line scores are surprisingly good. But the same gap keeps appearing.
## The Scorecard
| Server | Stars | Score | Grade | Annotations? |
|---|---|---|---|---|
| mcp-server-kubernetes | 1.4k | 97 | A | Yes |
| exa-mcp-server | 4.1k | 97 | A | Yes (3 hints) |
| apple-docs-mcp | 1.2k | 97 | A | Yes |
| n8n-mcp-server | 1.6k | 94 | A | No |
| mcp-server-browserbase | 3.2k | 94 | A | No |
| DesktopCommanderMCP | 5.8k | 94 | A | No |
## Expanded Audit (March 25 Update)
After the initial benchmark, I audited 9 more servers. The pattern got clearer with scale: well-maintained servers by large orgs tend to have annotations. Everything else doesn't.
| Server | Stars | Annotations? |
|---|---|---|
| mcp-chrome | 10.9k | No — 37 tools, incl. chrome_inject_script, chrome_javascript |
| BrowserMCP | 6.2k | No — browser_click, browser_type unannotated |
| mcp-atlassian | 4.7k | Yes — destructiveHint on Jira/Confluence writes |
| magic-mcp | 4.6k | No |
| notion-mcp-server | 4.1k | Yes — auto-derived from HTTP methods |
| mcp-server-cloudflare | 3.6k | Yes — per-service annotations (D1, Workers, KV) |
| mcp-obsidian | 3.1k | No — DeleteFile has no destructiveHint |
| dbhub | 2.4k | Yes — read/write distinguished |
| google_workspace_mcp | 1.9k | No — send_gmail_message has no destructiveHint |
Scores based on tool name format, description quality, inputSchema validation, parameter descriptions, annotations, and outputSchema. Validated using wildrunai.com/tools/mcp-validator.
## The Annotation Gap
Every server nails the basics: descriptive names, good descriptions, proper inputSchema with typed properties. But only 7 of the 15 use MCP annotations. The eight that don't include servers that execute arbitrary JavaScript (mcp-chrome), send real emails (google_workspace_mcp), delete notes (mcp-obsidian), run shell commands (DesktopCommander), and delete workflows (n8n). In every case, the LLM gets no hint about the risk level.
Here's what good looks like. From mcp-server-kubernetes:
```json
{
  "name": "kubectl_delete",
  "description": "Delete Kubernetes resources...",
  "annotations": {
    "destructiveHint": true  // ← LLM knows to confirm
  },
  "inputSchema": { ... }
}
```

And from exa-mcp-server:
```json
{
  "name": "web_search_exa",
  "annotations": {
    "readOnlyHint": true,      // ← Safe to run anytime
    "destructiveHint": false,  // ← Explicitly non-destructive
    "idempotentHint": true     // ← Same input = same output
  }
}
```

Now compare with n8n-mcp-server:
```json
{
  "name": "delete_workflow",
  "description": "Delete a workflow in n8n",
  // ← No annotations at all
  // ← LLM doesn't know this is destructive
  "inputSchema": { ... }
}
```

Our validator catches this: “delete_workflow sounds destructive — add annotations: { destructiveHint: true } so LLMs handle with care.”
## Nobody Uses Output Schemas
Zero of the six servers in the initial benchmark define outputSchema. This is optional, but it matters for tool chaining. When an LLM calls tool A and needs to pass the result to tool B, it needs to know what A returns. Without an output schema, the LLM has to guess.
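To see the chaining benefit concretely, here's a minimal, stdlib-only Python sketch that checks a tool result against a declared output schema before handing it to the next tool. A real client would use a full JSON Schema validator; this type map covers only the basics:

```python
# Sketch: validating a tool result against a declared outputSchema
# before chaining. Minimal type check only; not a full JSON Schema
# implementation.
TYPE_MAP = {"object": dict, "array": list, "string": str,
            "number": (int, float), "boolean": bool}

def matches_schema(result: dict, schema: dict) -> bool:
    if not isinstance(result, TYPE_MAP.get(schema.get("type", "object"), dict)):
        return False
    props = schema.get("properties", {})
    # Each declared property, when present, must have the declared type.
    return all(
        key not in result or isinstance(result[key], TYPE_MAP[spec["type"]])
        for key, spec in props.items()
    )

schema = {"type": "object",
          "properties": {"items": {"type": "array"},
                         "kind": {"type": "string"}}}

print(matches_schema({"items": [], "kind": "PodList"}, schema))  # True
print(matches_schema({"items": "oops"}, schema))                 # False
```

With no schema declared, the caller has nothing to check against and silently passes malformed output downstream.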
A simple addition makes a real difference:
```json
{
  "name": "kubectl_get",
  "outputSchema": {
    "type": "object",
    "description": "Kubernetes resource(s) in JSON format",
    "properties": {
      "items": { "type": "array" },
      "kind": { "type": "string" }
    }
  }
}
```

## What the Best Servers Do
Three patterns separate the 97s from the 94s:
- Every parameter has a description. Not just a type. Not just a name. `"description": "Type of resource to get (e.g., pods, deployments, services)"` tells the LLM what values are valid.
- Annotations on every tool. The K8s server marks reads as `readOnlyHint: true` and deletes as `destructiveHint: true`. Exa goes further with `idempotentHint` on search tools.
- Descriptions explain when to use the tool, not just what it does. Exa's `web_search_exa` description includes “Best for: Finding current information” and “Query tips: describe the ideal page, not keywords.” This prevents misuse.
## Common Mistakes That Drop Your Score
To show what a bad tool definition looks like, here's a synthetic example that scores 66/100 (C grade):
```json
{
  "tools": [
    {
      "name": "do_thing",           // ← Vague name
      "description": "Does thing",  // ← Useless description
      "inputSchema": {
        "type": "object",
        "properties": {
          "id": { "type": "string" },   // ← No description
          "data": { "type": "object" }  // ← No description
        }
      }
    },
    {
      "name": "delete-all"  // ← No description at all
                            // ← No destructiveHint
    }
  ]
}
```

The validator flags 12 issues: missing descriptions, undescribed parameters, no annotations, and a destructive-sounding tool with no safety hint. Every one of these makes the LLM more likely to use the tool incorrectly — or skip it entirely.
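A few of these checks are easy to reproduce yourself. Here's a toy Python lint pass over a single tool definition; the rules and messages are illustrative, and the real validator applies a much larger rule set:

```python
# Sketch: a handful of the checks above as a tiny lint pass.
# Rules and wording are illustrative, not the validator's actual output.
def lint_tool(tool: dict) -> list[str]:
    issues = []
    if not tool.get("description"):
        issues.append(f'{tool["name"]}: missing description')
    for pname, spec in tool.get("inputSchema", {}).get("properties", {}).items():
        if "description" not in spec:
            issues.append(f'{tool["name"]}: parameter "{pname}" has no description')
    if "annotations" not in tool:
        issues.append(f'{tool["name"]}: no annotations')
    if any(w in tool["name"].lower() for w in ("delete", "drop", "purge")) \
            and not tool.get("annotations", {}).get("destructiveHint"):
        issues.append(f'{tool["name"]}: destructive-sounding but no destructiveHint')
    return issues

for issue in lint_tool({"name": "delete-all"}):
    print(issue)
```

Running it on the `delete-all` tool above surfaces three of the twelve flags: no description, no annotations, and a destructive-sounding name with no safety hint.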
## Validate Your Own Server
Three ways to check your tool definitions:
- Web validator: Paste your `tools/list` JSON response at wildrunai.com/tools/mcp-validator
- GitHub Action (free): Add to your CI pipeline to catch regressions on every PR:

  ```yaml
  - uses: wildrunai/mcp-validate-action@v1
    with:
      file: ./tools.json
      min-score: 80
  ```

- API: POST your tool JSON to https://wildrunai.com/api/tools/mcp-validator for programmatic validation.
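For the API route, here's a stdlib-only Python sketch. The endpoint URL is the one listed above, but the request body shape (`{"tools": [...]}`) and any response fields are assumptions on my part, so check the API docs for the real contract before relying on this:

```python
# Sketch: building a validation request with the stdlib. The body shape
# {"tools": [...]} is an assumption, not a documented contract.
import json
import urllib.request

def build_request(tools: list[dict]) -> urllib.request.Request:
    body = json.dumps({"tools": tools}).encode()
    return urllib.request.Request(
        "https://wildrunai.com/api/tools/mcp-validator",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request([{"name": "do_thing", "description": "Does thing"}])
print(req.get_method(), req.full_url)
# Send with: urllib.request.urlopen(req)
```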
## The Takeaway
Popular MCP servers are well-built. The basics — names, descriptions, schemas — are solid across the board. But annotations are where the ecosystem has a blind spot. Only 7 of 15 servers tested tell LLMs which tools are destructive. And almost nobody uses output schemas.
These aren't cosmetic issues. Annotations prevent LLMs from running send_gmail_message or chrome_inject_script without asking. Output schemas enable reliable tool chaining. Both are in the MCP spec. Both are easy to add. Both make your server safer and more useful.
Check your server. Fix the warnings. It takes 30 seconds.