What is the MCP Servers Tool Catalog dataset?

A catalog of 922 public Model Context Protocol (MCP) servers and the 9,922 tools that the 359 responding servers expose at runtime. We spawned each server as a child process over stdio and sent it a JSON-RPC tools/list call - the same handshake any MCP host (Claude Desktop, Cursor, Cline) uses. The result is two tables: 922 rows in servers.jsonl (one per package, including failures with a classified status code) and 9,922 rows in tools.jsonl (one per tool returned by a server that responded, each with the full JSON Schema for its input).

How is this different from existing MCP server lists?

Existing MCP catalogs (modelcontextprotocol/servers, awesome-mcp-servers, mcp.so) are curated lists of READMEs - they tell you what a server claims to do. This dataset captures what each server actually does when you spawn it: the exact tool names, the exact input schemas, and a classified reason for each failure. No scraping. No human summarization. The data is the response to a tools/list call.

How often is the catalog refreshed?

Monthly, via a GitHub Actions cron that runs on the 1st of each month at 03:00 UTC. The workflow re-validates that every npm package still exists, re-spawns each server, re-captures tools/list, diffs the result against the previous snapshot, and pushes a fresh Parquet bundle to HuggingFace. The diff is published as a changelog in the GitHub repo.

Why did 563 of 922 servers fail introspection?

Most MCP servers expect environment variables, CLI args, or service credentials at spawn time - they crash without them. We classify every non-ok server into one of 16 status buckets (init_timeout, needs_env_var, needs_cli_args, npm_install_generic, broken_install, needs_slack_token, needs_aws_creds, etc.) by pattern-matching stderr. The classifications are themselves useful: they tell you what setup each server expects before you wire it into an agent.
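A minimal sketch of stderr classification by pattern matching. The regular expressions, their ordering, and the `unclassified` fallback are assumptions made for illustration; the actual patterns live in the pipeline's classifier and are not reproduced on this page.

```python
import re

# Hypothetical rules: (status bucket, pattern). Vendor-specific buckets come
# first so a Slack or AWS error is not swallowed by the generic env-var rule.
RULES = [
    ("needs_slack_token", re.compile(r"SLACK_(BOT_)?TOKEN", re.I)),
    ("needs_aws_creds", re.compile(r"AWS_(ACCESS_KEY_ID|SECRET_ACCESS_KEY)|could not load credentials", re.I)),
    ("needs_env_var", re.compile(r"environment variable|is not set", re.I)),
    ("needs_cli_args", re.compile(r"\busage:|missing required argument", re.I)),
    ("npm_install_generic", re.compile(r"npm ERR!", re.I)),
]


def classify(stderr: str, timed_out: bool = False) -> str:
    """Map a failed spawn to a status bucket; the first matching rule wins."""
    if timed_out:
        return "init_timeout"
    for status, pattern in RULES:
        if pattern.search(stderr):
            return status
    # Placeholder for anything the illustrative rules miss; not a catalog status.
    return "unclassified"
```

Checking rules in a fixed order keeps the result deterministic when one stderr message matches several patterns.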

What formats is the catalog available in?

Four files on HuggingFace: servers.jsonl (UTF-8 JSON-lines, one record per server), tools.jsonl (one record per tool), and their Parquet equivalents (servers.parquet and tools.parquet). The JSON-Schema input field on each tool is preserved as a nested JSON value in tools.parquet - readable directly in DuckDB or pandas with no extra parsing.

How do I load the MCP catalog in Python?

Via the HuggingFace datasets library: from datasets import load_dataset; ds = load_dataset('automatelab/mcp-servers-tool-catalog', 'tools', split='train'). Or read the Parquet file directly with pandas: import pandas as pd; df = pd.read_parquet('tools.parquet'). DuckDB can query it in-process: SELECT server_name, tool_name FROM read_parquet('tools.parquet') WHERE tool_description ILIKE '%search%'.

Free · CC-BY-4.0

MCP Servers Tool Catalog

Name: MCP Servers Tool Catalog
Creator: AutomateLab
License: https://creativecommons.org/licenses/by/4.0/

922 public Model Context Protocol servers spawned over stdio. 359 introspected successfully. 9,922 real tool schemas captured. Not scraped from READMEs - this is the verbatim response to a JSON-RPC tools/list call, the same handshake every MCP host uses. Refreshed monthly.

View on HuggingFace · Source on GitHub

What you get

Two tables and a monthly diff. Every record is the output of code that actually ran - no human curation, no LLM summarization between source and dataset.

922 npm packages attempted

359 servers that responded to tools/list

9,922 tool schemas captured

16 classified failure categories

What each record contains

servers.jsonl - one row per attempted package (922 rows), regardless of whether introspection succeeded. The status field is the key column: ok means the server responded to tools/list; anything else is one of 16 classified failure reasons.

Field	Type	Notes
`name`	string	Short identifier from the curated seed list (e.g. `brave-search`, `stripe`).
`package`	string	npm package name. What `npx` would resolve to.
`status`	string	`ok` on success. On failure: `init_timeout`, `needs_env_var`, `needs_cli_args`, `npm_install_generic`, `broken_install`, or one of 11 vendor-specific buckets (`needs_slack_token`, `needs_aws_creds`, etc.).
`elapsed_s`	number	Wall-clock seconds from spawn to terminal state. Useful for spotting cold-start sluggishness.
`tool_count`	number	Number of tools returned by `tools/list`. `0` on failure.
`server_info`	object	The `serverInfo` block from the MCP `initialize` response: name and version self-reported by the server.
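A short sketch of reading servers.jsonl records and using the status column. The three sample records are made up to match the field list above (package names and values are placeholders), and representing `server_info` as null on failure is an assumption.

```python
import json
from collections import Counter

# Illustrative records only; real rows come from servers.jsonl.
SAMPLE_JSONL = """\
{"name": "brave-search", "package": "example-brave-pkg", "status": "needs_env_var", "elapsed_s": 1.8, "tool_count": 0, "server_info": null}
{"name": "stripe", "package": "example-stripe-pkg", "status": "ok", "elapsed_s": 4.2, "tool_count": 3, "server_info": {"name": "stripe", "version": "0.0.0"}}
{"name": "example-slow", "package": "example-slow-pkg", "status": "init_timeout", "elapsed_s": 120.0, "tool_count": 0, "server_info": null}
"""


def load_records(text: str) -> list:
    """Parse JSON-lines text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def status_counts(records: list) -> Counter:
    """Count servers per status value."""
    return Counter(r["status"] for r in records)


def setup_hints(records: list) -> dict:
    """Map each non-ok server name to its classified failure status."""
    return {r["name"]: r["status"] for r in records if r["status"] != "ok"}
```

For the real file, the same functions apply to the text of servers.jsonl read from disk.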

tools.jsonl - one row per (server, tool) pair on successful introspections (9,922 rows). Each tool record carries the verbatim inputSchema the server returned - usable as the parameter schema of an OpenAI / Anthropic function-calling tool definition with no transformation.

Field	Type	Notes
`server_name`	string	Foreign key into `servers.jsonl`.
`package`	string	Denormalized npm package for convenience.
`tool_name`	string	The tool's machine name (e.g. `brave_web_search`). What an LLM would call.
`tool_description`	string	The description string the server reported. This is the text an LLM reads to decide whether to use the tool.
`input_schema`	object (JSON Schema)	The verbatim `inputSchema` from `tools/list`. Compatible with Anthropic `tool.input_schema` and OpenAI `function.parameters`.

What makes it different

Introspected, not scraped

Existing MCP catalogs (modelcontextprotocol/servers, awesome-mcp-servers, mcp.so) list READMEs. We spawn each server and capture what it actually responds with. The schemas are real. The descriptions are what the LLM will see at runtime.

Failures are first-class data

The 563 servers that didn't introspect aren't dropped - they're classified. needs_env_var, needs_cli_args, init_timeout, and 13 other buckets tell you what each server expects before it'll boot.

Drop-in for function calling

The input_schema field is a valid JSON Schema as defined by the MCP spec. Pass it straight into Anthropic tool.input_schema or OpenAI function.parameters with no transform.
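As a sketch of what "no transform" means in practice: the schema passes through unchanged, and only the small wrapper each API expects is added around it. The sample row below is illustrative (its description and schema are invented, not taken from the catalog), and the function names are hypothetical.

```python
def to_anthropic_tool(row: dict) -> dict:
    """Wrap a tools.jsonl row as an Anthropic tool definition."""
    return {
        "name": row["tool_name"],
        "description": row["tool_description"],
        "input_schema": row["input_schema"],  # passed through unchanged
    }


def to_openai_tool(row: dict) -> dict:
    """Wrap a tools.jsonl row as an OpenAI function-calling tool definition."""
    return {
        "type": "function",
        "function": {
            "name": row["tool_name"],
            "description": row["tool_description"],
            "parameters": row["input_schema"],  # passed through unchanged
        },
    }


# Illustrative row shaped like a tools.jsonl record; values are made up.
SAMPLE_ROW = {
    "server_name": "brave-search",
    "package": "example-brave-pkg",
    "tool_name": "brave_web_search",
    "tool_description": "Search the web.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}
```

The resulting dictionaries go into the `tools` list of a request to the respective API.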

Monthly auto-refresh

A GitHub Actions cron re-runs the full pipeline on the 1st of every month. Each refresh produces a diff (new servers, removed servers, changed tool schemas) committed to the repo as a changelog. Older snapshots stay reproducible.
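One way such a diff could be computed from two snapshots, as a sketch rather than the repo's actual diff code. It assumes servers are keyed by `name`, tools by the (`server_name`, `tool_name`) pair, and that two schemas are equal when their key-sorted JSON serializations match.

```python
import json


def index_tools(tool_rows: list) -> dict:
    """Map (server_name, tool_name) to a canonical JSON string of input_schema."""
    return {
        (r["server_name"], r["tool_name"]): json.dumps(r["input_schema"], sort_keys=True)
        for r in tool_rows
    }


def diff_snapshots(prev_servers, curr_servers, prev_tools, curr_tools) -> dict:
    """Report new servers, removed servers, and tools whose schema changed."""
    prev_names = {s["name"] for s in prev_servers}
    curr_names = {s["name"] for s in curr_servers}
    prev_idx, curr_idx = index_tools(prev_tools), index_tools(curr_tools)
    # Only tools present in both snapshots can have a *changed* schema.
    changed = sorted(
        key for key in prev_idx.keys() & curr_idx.keys()
        if prev_idx[key] != curr_idx[key]
    )
    return {
        "new_servers": sorted(curr_names - prev_names),
        "removed_servers": sorted(prev_names - curr_names),
        "changed_tool_schemas": changed,
    }
```

Serializing with sorted keys means a server that merely reorders schema properties between runs is not reported as changed.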

How introspection works

The pipeline is small enough to read in one sitting - the whole thing is in the GitHub repo. For each candidate package, the introspector:

Validates the package still exists on npm. A separate validate_npm.py pass catches packages that vanished between snapshots.
Spawns npx -y <package> with stdio piped and a 120-second timeout. Each server runs in its own child process; up to 8 run in parallel.
Sends a JSON-RPC initialize with protocol version 2024-11-05 and standard client capabilities. Same handshake Claude Desktop sends.
Sends tools/list and reads the response. Each entry in the returned tools array becomes one tool row, with its fields kept verbatim.
On failure: classifies stderr by pattern matching into one of 16 status buckets. Vendor-specific tokens (needs_slack_token, needs_aws_creds) get their own bucket so you can filter for them.
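The handshake in the steps above can be sketched as the JSON-RPC messages written to the child's stdin, one JSON object per line as the MCP stdio transport requires. The `clientInfo` values and the empty `capabilities` object are placeholders, and the helper names are hypothetical; this is not the introspector's actual code.

```python
import json

PROTOCOL_VERSION = "2024-11-05"


def initialize_request(request_id: int = 1) -> dict:
    """First message: negotiate protocol version and announce the client."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},  # assumption: no optional client capabilities
            "clientInfo": {"name": "example-introspector", "version": "0.0.0"},
        },
    }


def initialized_notification() -> dict:
    """Sent after the initialize response arrives; notifications carry no id."""
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


def tools_list_request(request_id: int = 2) -> dict:
    """Ask the server for its tool list."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def frame(message: dict) -> bytes:
    """Encode one message as a single newline-terminated UTF-8 line."""
    return (json.dumps(message) + "\n").encode("utf-8")


def extract_tools(response_line: str) -> list:
    """Pull the tools array out of a tools/list response line."""
    response = json.loads(response_line)
    if "error" in response:
        raise RuntimeError(f"tools/list failed: {response['error']}")
    return response["result"]["tools"]
```

A caller would write `frame(...)` for each message to the child's stdin and pass each line read from its stdout to `json.loads` or `extract_tools`.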

Concurrency is configurable (CONCURRENCY=8 in CI). The whole run takes about 3 hours of runner time on a clean machine. The transport is the official MCP stdio transport - we don't touch the HTTP transport because most published servers ship stdio-only.
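A sketch of how bounded concurrency with a per-server timeout might look using asyncio. The `introspect_one` coroutine is a hypothetical stand-in for the spawn-and-handshake step, and mapping a timeout to `init_timeout` is an assumption about how the pipeline records it.

```python
import asyncio

CONCURRENCY = 8   # matches the CI setting described above
TIMEOUT_S = 120   # per-server limit


async def introspect_all(packages, introspect_one,
                         concurrency=CONCURRENCY, timeout_s=TIMEOUT_S):
    """Run introspect_one(package) for every package, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def run(package):
        async with semaphore:
            try:
                return await asyncio.wait_for(introspect_one(package), timeout=timeout_s)
            except asyncio.TimeoutError:
                # Assumed mapping: a server that never answers is an init_timeout.
                return {"package": package, "status": "init_timeout", "tool_count": 0}

    # gather returns results in the same order as the input packages.
    return await asyncio.gather(*(run(p) for p in packages))
```

The semaphore caps how many child processes exist at once, so raising or lowering the limit trades runner memory against total wall-clock time.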

Three ways to access it

HuggingFace datasets API - load servers or tools as a separate split

# Install first: pip install datasets
from datasets import load_dataset

# tools split: one row per (server, tool)
tools = load_dataset("automatelab/mcp-servers-tool-catalog", "tools", split="train")

# all tools whose description mentions "search" (a missing description counts as empty)
search_tools = tools.filter(lambda r: "search" in (r["tool_description"] or "").lower())
print(len(search_tools), "search-related MCP tools")

Parquet + pandas - join servers and tools locally

# Install first: pip install pandas pyarrow
import pandas as pd

servers = pd.read_parquet("servers.parquet")
tools = pd.read_parquet("tools.parquet")

# Servers that failed with a needs_* status (env vars, CLI args, credentials), counted per status
needs_setup = servers[servers["status"].str.startswith("needs_")]
print(needs_setup.groupby("status").size().sort_values(ascending=False))

DuckDB SQL - cross-table queries with no setup

SELECT s.name, COUNT(t.tool_name) AS tools
FROM read_parquet('servers.parquet') s
LEFT JOIN read_parquet('tools.parquet') t USING (package)
WHERE s.status = 'ok'
GROUP BY s.name
ORDER BY tools DESC
LIMIT 10;

Full schema and monthly diff history: dataset card on HuggingFace. Pipeline source, classifier logic, and monthly cron: AutomateLab-tech/mcp-tool-catalog on GitHub.

FAQ

What is the license?

The pipeline code is MIT. The dataset rows on HuggingFace are CC-BY-4.0. The introspected tool descriptions and schemas come from each individual MCP server's source - attribute the original server author when you redistribute, and check their license if you're using a specific server's tools downstream. We're cataloging public-facing JSON-RPC responses, not the server source code itself.

How is this different from modelcontextprotocol/servers or awesome-mcp-servers?

Those are curated lists - human-maintained README directories of MCP servers. They tell you what exists. This is a catalog of the runtime behavior of those servers: the tool names they actually expose, the input schemas they actually accept, and a classified reason for the ones that don't boot cleanly. If you're building an agent and want to pick the right MCP server for a job, the README tells you the pitch and this dataset tells you the API.

Why did so many servers fail to introspect?

Most MCP servers expect environment variables (API tokens, database URLs), CLI arguments, or a working credential chain before they'll respond to initialize. A clean-room spawn with no env doesn't satisfy them. We classify each failure - needs_env_var, needs_cli_args, needs_slack_token, etc. - so the failure list is itself a useful "what setup does this server need" lookup.

Can I add a server to the catalog?

Yes. Edit servers-final.validated.json in the GitHub repo with a one-line entry ({"name": "...", "package": "@scope/pkg"}) and open a PR. The next monthly refresh picks it up. We accept any public npm package that ships an MCP server. validate_npm.py automatically rejects package names that don't exist on npm.
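Before opening a PR, a package name can be checked against the public npm registry. The sketch below is one way to do that and is not the logic of validate_npm.py, which is not shown on this page; it assumes the registry answers 404 for unknown packages and that the slash in a scoped name is URL-encoded.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

REGISTRY = "https://registry.npmjs.org"


def registry_url(package: str) -> str:
    """Build the registry metadata URL; '@scope/pkg' becomes '@scope%2Fpkg'."""
    return f"{REGISTRY}/{quote(package, safe='@')}"


def package_exists(package: str, timeout_s: float = 10.0) -> bool:
    """Return True if the registry has metadata for the package, False on 404."""
    try:
        with urlopen(registry_url(package), timeout=timeout_s) as response:
            return response.status == 200
    except HTTPError as err:
        if err.code == 404:
            return False
        raise  # other HTTP errors (rate limits, outages) are not a verdict
```

Calling `package_exists("@scope/pkg")` needs network access; other errors are re-raised so a registry outage is not mistaken for a missing package.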

Why JSONL and Parquet for the same data?

JSONL is human-readable and diff-friendly - the rolling state in the repo is JSONL so each monthly commit is a meaningful changelog. Parquet is what you want for analysis - it's 5-10x smaller, preserves the nested input_schema object as a real struct instead of a string, and the HuggingFace dataset viewer renders it directly.

Does the catalog include Python or Go MCP servers?

Not yet - the introspector currently spawns npm packages only. Python MCP servers (typically uvx-runnable) are a planned addition; the bottleneck is environment isolation, not transport (the JSON-RPC layer is identical). Watch the GitHub repo for the multi-runtime PR.

Need this wired into an agent pipeline?

We use this catalog to pick the right MCP server for jobs at AutomateLab - and to build retrieval over tool schemas so an agent can find the right tool from thousands of options. If you want it integrated into your own agent harness, MCP host, or fine-tuning run, we can scope and build it.

Get in touch