# n8n Nodes Catalog
A structured, machine-readable catalog of 524 n8n nodes: the operations each one supports, its credential requirements, and its properties schema. Extracted monthly from n8n source, it is the node-level metadata layer that lets an AI agent reason about which nodes to use - without guessing from stale training data.
## What each record contains
524 records, one per node, with 13 fields extracted from n8n source - covering both packages/nodes-base (431 nodes) and packages/@n8n/nodes-langchain (93 nodes). All fields are locked to what the extraction script produces.
| Field | Type | Notes |
|---|---|---|
| node_name | string | Internal identifier (e.g. slack, airtable). Matches INodeTypeDescription.name. |
| display_name | string | Human-readable label shown in the n8n UI. |
| categories | list[string] | Category tags from the node's codex file. Examples: Communication, AI, Data & Storage. |
| subcategories | list[string] | Subcategory leaf values, flattened from codex.subcategories. |
| group | list[string] | n8n execution group: input, output, or transform. |
| version | string | Explicit version for single-version nodes; defaultVersion for multi-version nodes. |
| description | string | One-line description from INodeTypeDescription.description. |
| credentials_required | list[string] | Credential type names from the node's credentials array. Empty for trigger and core nodes. |
| operations_supported | list[string] | Values from the operation property options; falls back to resource options. Empty for nodes without a resource/operation picker. |
| properties_schema | string (JSON) | Compact top-level property descriptors: [{"name":"...","displayName":"...","type":"..."}]. Serialized as a JSON string. |
| source_package | string | nodes-base or @n8n (for nodes-langchain nodes). |
| source_file_path | string | Repo-relative path to the primary .node.ts file. |
| github_permalink | string | Permanent GitHub link at the exact tag the record was extracted from. |
Parquet note: list fields (categories, credentials_required, etc.) are stored as JSON strings in the Parquet file. Parse with json.loads().
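For example, a minimal sketch of parsing the list fields out of one Parquet-backed record (the field values here are hypothetical stand-ins for a real row):

```python
import json

# a record as it might appear after reading the Parquet file;
# list fields arrive as JSON strings, not native lists (values are hypothetical)
record = {
    "node_name": "slack",
    "categories": '["Communication"]',
    "credentials_required": '["slackOAuth2Api"]',
}

# decode the JSON-string fields back into Python lists
categories = json.loads(record["categories"])
credentials = json.loads(record["credentials_required"])

print(categories)    # a plain Python list after decoding
```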
## What makes it different
### Fills the gap below workflow datasets
Existing n8n datasets on HuggingFace catalog workflow examples - they teach models how to assemble nodes. None catalog what each node is. This dataset is the metadata layer underneath: what a node does, what it accepts, and what it requires.
### Agent tooling at inference time
An agent building an n8n workflow can load this dataset as context to pick the right node, validate operation names, and check credential requirements - before generating a single line of workflow JSON.
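As a sketch of that validation step, an agent-side helper might look like this (the in-memory catalog row and operation names below are hypothetical stand-ins for real dataset records):

```python
import json

# toy in-memory catalog; in practice these rows come from the dataset
# (node and operation names here are hypothetical)
catalog = [
    {
        "node_name": "slack",
        "operations_supported": '["post", "update"]',
        "credentials_required": '["slackOAuth2Api"]',
    },
]

def validate_step(node_name: str, operation: str):
    """Check a proposed workflow step against the catalog
    before emitting any workflow JSON."""
    for rec in catalog:
        if rec["node_name"] == node_name:
            ops = json.loads(rec["operations_supported"])
            creds = json.loads(rec["credentials_required"])
            return operation in ops, creds
    return False, []  # unknown node: reject the step

ok, creds = validate_step("slack", "post")
print(ok, creds)  # True ['slackOAuth2Api']
```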
### Monthly auto-update from source
The extraction pipeline runs monthly against the latest n8n release. The github_permalink field anchors every record to the tag it was extracted from, so older rows remain stable across updates.
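The shape of such a permalink can be sketched as follows - note that the tag string and file path below are hypothetical examples, not values from the dataset:

```python
# hypothetical tag and path; each real record carries its own github_permalink
tag = "n8n@1.70.0"
path = "packages/nodes-base/nodes/Slack/Slack.node.ts"

# a tag-anchored GitHub blob URL stays valid even after newer releases ship
permalink = f"https://github.com/n8n-io/n8n/blob/{tag}/{path}"
print(permalink)
```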
### Queryable in one line
"What n8n nodes support OAuth2?" is currently a docs-browsing exercise. With this dataset it is a single pandas filter or DuckDB query. Two formats: nodes.json and nodes.parquet.
## Three ways to access it
1. **HuggingFace datasets API** - works directly in Python, no file download required.

   ```bash
   pip install datasets
   ```

   ```python
   from datasets import load_dataset
   import json

   ds = load_dataset("automatelab/n8n-nodes-catalog", split="train")

   # nodes requiring OAuth2
   oauth = [r for r in ds if "oAuth2Api" in json.loads(r["credentials_required"])]
   print([r["display_name"] for r in oauth])
   ```

2. **Parquet + pandas** - best for local analysis and filtering.

   ```bash
   pip install pandas pyarrow
   ```

   ```python
   import pandas as pd
   import json

   df = pd.read_parquet("nodes.parquet")
   df["ops"] = df["operations_supported"].apply(json.loads)

   # all Slack nodes and their operations
   slack = df[df["node_name"].str.contains("slack", case=False)]
   print(slack[["display_name", "ops"]])
   ```
3. **DuckDB SQL** - count nodes by category with no setup beyond the Parquet file.

   ```sql
   SELECT category, COUNT(*) AS node_count
   FROM (
       SELECT unnest(from_json(categories, '["VARCHAR"]')) AS category
       FROM read_parquet('nodes.parquet')
   )
   GROUP BY category
   ORDER BY node_count DESC;
   ```
Full schema, methodology, and sample queries: dataset card on HuggingFace. Extraction script and monthly update pipeline: source on GitHub. Deep dive on AI-agent use cases: the companion post.
## FAQ
### What is the license?

### How is this different from other n8n datasets on HuggingFace?

### Why Parquet and not just CSV?

### Does it cover all n8n nodes?

It covers packages/nodes-base (431 nodes) and packages/@n8n/nodes-langchain (93 nodes). Not included: credentials definitions, utility modules, the core workflow engine, and EE-only nodes that don't follow the standard descriptor pattern.

### Can I use this for LLM fine-tuning?
## Need this wired into an agent pipeline?
We use this catalog to power n8n agent tooling at AutomateLab. If you want it integrated into your own workflow-building agent, retrieval pipeline, or fine-tuning run, we can help scope and build it.
Get in touch