# n8n Nodes Catalog
A structured, machine-readable catalog of 524 n8n nodes: the operations each one supports, its credential requirements, and its properties schema. Extracted monthly from n8n source, it is the node-level metadata layer that lets an AI agent reason about which nodes to use - without guessing from stale training data.
## What each record contains
524 records, one per node, with 13 fields extracted from n8n source - covering both packages/nodes-base (431 nodes) and packages/@n8n/nodes-langchain (93 nodes). All fields are locked to what the extraction script produces.
| Field | Type | Notes |
|---|---|---|
| node_name | string | Internal identifier (e.g. slack, airtable). Matches INodeTypeDescription.name. |
| display_name | string | Human-readable label shown in the n8n UI. |
| categories | list[string] | Category tags from the node's codex file. Examples: Communication, AI, Data & Storage. |
| subcategories | list[string] | Subcategory leaf values, flattened from codex.subcategories. |
| group | list[string] | n8n execution group: input, output, or transform. |
| version | string | Explicit version for single-version nodes; defaultVersion for multi-version nodes. |
| description | string | One-line description from INodeTypeDescription.description. |
| credentials_required | list[string] | Credential type names from the node's credentials array. Empty for trigger and core nodes. |
| operations_supported | list[string] | Values from the operation property options; falls back to resource options. Empty for nodes without a resource/operation picker. |
| properties_schema | string (JSON) | Compact top-level property descriptors: [{"name":"...","displayName":"...","type":"..."}]. Serialized as a JSON string. |
| source_package | string | nodes-base or @n8n (for nodes-langchain nodes). |
| source_file_path | string | Repo-relative path to the primary .node.ts file. |
| github_permalink | string | Permanent GitHub link at the exact tag the record was extracted from. |
Parquet note: list fields (categories, credentials_required, etc.) are stored as JSON strings in the Parquet file. Parse with json.loads().
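For example, a minimal sketch of parsing the list fields out of one Parquet-backed record (the field values here are hypothetical stand-ins for a real row):

```python
import json

# a record as it might appear after reading the Parquet file;
# list fields arrive as JSON strings, not native lists (values are hypothetical)
record = {
    "node_name": "slack",
    "categories": '["Communication"]',
    "credentials_required": '["slackOAuth2Api"]',
}

# decode the JSON-string fields back into Python lists
categories = json.loads(record["categories"])
credentials = json.loads(record["credentials_required"])

print(categories)    # a plain Python list after decoding
```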
## What makes it different
### Fills the gap below workflow datasets
Existing n8n datasets on HuggingFace catalog workflow examples - they teach models how to assemble nodes. None catalog what each node is. This dataset is the metadata layer underneath: what a node does, what it accepts, and what it requires.
### Agent tooling at inference time
An agent building an n8n workflow can load this dataset as context to pick the right node, validate operation names, and check credential requirements - before generating a single line of workflow JSON.
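As a sketch of that validation step, an agent-side helper might look like this (the in-memory catalog row and operation names below are hypothetical stand-ins for real dataset records):

```python
import json

# toy in-memory catalog; in practice these rows come from the dataset
# (node and operation names here are hypothetical)
catalog = [
    {
        "node_name": "slack",
        "operations_supported": '["post", "update"]',
        "credentials_required": '["slackOAuth2Api"]',
    },
]

def validate_step(node_name: str, operation: str):
    """Check a proposed workflow step against the catalog
    before emitting any workflow JSON."""
    for rec in catalog:
        if rec["node_name"] == node_name:
            ops = json.loads(rec["operations_supported"])
            creds = json.loads(rec["credentials_required"])
            return operation in ops, creds
    return False, []  # unknown node: reject the step

ok, creds = validate_step("slack", "post")
print(ok, creds)  # True ['slackOAuth2Api']
```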
### Monthly auto-update from source
The extraction pipeline runs monthly against the latest n8n release. The github_permalink field anchors every record to the tag it was extracted from, so older rows remain stable across updates.
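The shape of such a permalink can be sketched as follows - note that the tag string and file path below are hypothetical examples, not values from the dataset:

```python
# hypothetical tag and path; each real record carries its own github_permalink
tag = "n8n@1.70.0"
path = "packages/nodes-base/nodes/Slack/Slack.node.ts"

# a tag-anchored GitHub blob URL stays valid even after newer releases ship
permalink = f"https://github.com/n8n-io/n8n/blob/{tag}/{path}"
print(permalink)
```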
### Queryable in one line
"What n8n nodes support OAuth2?" is currently a docs-browsing exercise. With this dataset it is a single pandas filter or DuckDB query. Two formats: nodes.json and nodes.parquet.
## Three ways to access it
1. **HuggingFace datasets API** - works directly in Python, no file download required.

   ```bash
   pip install datasets
   ```

   ```python
   from datasets import load_dataset
   import json

   ds = load_dataset("automatelab/n8n-nodes-catalog", split="train")

   # nodes requiring OAuth2
   oauth = [r for r in ds if "oAuth2Api" in json.loads(r["credentials_required"])]
   print([r["display_name"] for r in oauth])
   ```

2. **Parquet + pandas** - best for local analysis and filtering.

   ```bash
   pip install pandas pyarrow
   ```

   ```python
   import pandas as pd
   import json

   df = pd.read_parquet("nodes.parquet")
   df["ops"] = df["operations_supported"].apply(json.loads)

   # all Slack nodes and their operations
   slack = df[df["node_name"].str.contains("slack", case=False)]
   print(slack[["display_name", "ops"]])
   ```
3. **DuckDB SQL** - count nodes by category with no setup beyond the Parquet file.

   ```sql
   SELECT category, COUNT(*) AS node_count
   FROM (
       SELECT unnest(from_json(categories, '["VARCHAR"]')) AS category
       FROM read_parquet('nodes.parquet')
   )
   GROUP BY category
   ORDER BY node_count DESC;
   ```
Full schema, methodology, and sample queries: dataset card on HuggingFace. Extraction script and monthly update pipeline: source on GitHub. Deep dive on AI-agent use cases: the companion post.
## FAQ
### What is the license?

### How is this different from other n8n datasets on HuggingFace?

### Why Parquet and not just CSV?

### Does it cover all n8n nodes?

It covers packages/nodes-base (431 nodes) and packages/@n8n/nodes-langchain (93 nodes). Not included: credentials definitions, utility modules, the core workflow engine, and EE-only nodes that don't follow the standard descriptor pattern.

### Can I use this for LLM fine-tuning?
## Need this wired into an agent pipeline?
We use this catalog to power n8n agent tooling at AutomateLab. If you want it integrated into your own workflow-building agent, retrieval pipeline, or fine-tuning run, we can help scope and build it.
Get in touch