A shared vocabulary for agentic data pipelines and agentic data engineering.
A data pipeline that knows why it exists, heals itself when it fails, and is operated by cooperating AI agents. It implements the three tenets of intent awareness, self-healing, and AI orchestration.
The business purpose and operational objective a pipeline exists to serve.
A pipeline capable of autonomously detecting, diagnosing, and repairing failures.
A system coordinated by cooperating AI agents rather than static workflows alone, with an orchestrating agent directing specialist agents across ingestion, transformation, data quality, and repair.
The continuous cycle through which a self-healing pipeline operates: Observe pipeline health, Diagnose root cause, Repair the failure, and Learn from the outcome to resolve the same class of problem faster next time.
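As an illustration only, the Observe, Diagnose, Repair, Learn cycle described above can be sketched as a small Python class. Everything here is an assumption made for the example: the class name `SelfHealingLoop`, the metric names `row_count` and `null_ratio`, the thresholds, and the failure classes are hypothetical, and a real system would use agent reasoning rather than hard-coded rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class SelfHealingLoop:
    """Toy Observe -> Diagnose -> Repair -> Learn cycle (all names hypothetical)."""

    # Maps a diagnosed failure class to a repair action that returns True on success.
    repairs: Dict[str, Callable[[], bool]]
    # Learn step: counts how often each failure class has been repaired.
    learned: Dict[str, int] = field(default_factory=dict)

    def observe(self, metrics: Dict[str, float]) -> Optional[str]:
        # Observe: return a symptom name if a health check fails, else None.
        if metrics.get("row_count", 0) == 0:
            return "empty_batch"
        if metrics.get("null_ratio", 0.0) > 0.2:  # threshold is an assumption
            return "high_null_ratio"
        return None

    def diagnose(self, symptom: str) -> str:
        # Diagnose: map a symptom to a root-cause class (hard-coded stand-in).
        causes = {"empty_batch": "upstream_delay", "high_null_ratio": "schema_drift"}
        return causes.get(symptom, "unknown")

    def run_once(self, metrics: Dict[str, float]) -> str:
        symptom = self.observe(metrics)
        if symptom is None:
            return "healthy"
        cause = self.diagnose(symptom)
        action = self.repairs.get(cause)
        if action is None:
            return "escalated"  # no known repair for this failure class
        if action():  # Repair: run the action registered for this cause
            # Learn: remember that this failure class was resolved.
            self.learned[cause] = self.learned.get(cause, 0) + 1
            return "repaired"
        return "escalated"


loop = SelfHealingLoop(repairs={"upstream_delay": lambda: True})
```

Calling `loop.run_once(...)` with a metrics dictionary runs one pass of the cycle and reports whether the pipeline was healthy, repaired, or escalated for attention.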
A reference framework defining the architectural requirements of an agentic data pipeline system.
The emerging discipline of defining intent, supervising agents, and building data systems for human and machine consumers.
A property of a pipeline that knows the purpose it serves: the business outcome it supports, the consumers it delivers to, and the quality and freshness standards it must meet.
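One possible way to make this property concrete is to record the purpose as structured data that the pipeline can check itself against. The sketch below is a minimal illustration, not a prescribed schema: the class `PipelineIntent`, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Tuple


@dataclass(frozen=True)
class PipelineIntent:
    """Hypothetical record of what a pipeline is for and what it must deliver."""

    business_outcome: str         # the outcome the pipeline supports
    consumers: Tuple[str, ...]    # who or what the data is delivered to
    max_staleness: timedelta      # freshness standard
    min_completeness: float       # quality standard, as a fraction from 0.0 to 1.0

    def is_met(self, staleness: timedelta, completeness: float) -> bool:
        # The intent is met only if both the freshness and quality standards hold.
        return (
            staleness <= self.max_staleness
            and completeness >= self.min_completeness
        )


# Example values are invented for illustration.
intent = PipelineIntent(
    business_outcome="daily revenue reporting",
    consumers=("finance_dashboard", "forecasting_agent"),
    max_staleness=timedelta(hours=1),
    min_completeness=0.99,
)
```

With a record like this, a run that delivers data 30 minutes old at 99.5% completeness satisfies `intent.is_met(...)`, while a run that is two hours stale does not.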
Persistent operational knowledge retained across incidents and workflow adaptations.
The continuous monitoring of pipeline behavior, performance, schema integrity, and data quality.
The AI agent responsible for coordinating specialist agents across a pipeline: deciding when to intervene, which specialist to deploy, and how to respond to changing conditions. The orchestrating agent is the decision-making layer of an agentic data pipeline.
An AI agent with a defined role within the pipeline — such as ingestion, transformation, data quality, or repair — deployed by the orchestrating agent to handle a specific class of task. Specialist agents reason and act rather than follow static instructions.
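The relationship between an orchestrating agent and its specialist agents can be sketched roughly as a dispatcher. This is a deliberately simplified stand-in: real agents reason about context, whereas the hypothetical `Orchestrator` below uses fixed rules, and the event kinds and role names are assumptions chosen for the example.

```python
from typing import Callable, Dict, Optional

# A specialist is modeled as a function from a task description to a result.
Specialist = Callable[[str], str]


class Orchestrator:
    """Hypothetical rule-based stand-in for an orchestrating agent."""

    # Assumed mapping from event kind to the specialist role that handles it.
    ROLE_FOR_EVENT = {
        "new_source": "ingestion",
        "quality_alert": "data_quality",
        "task_failed": "repair",
    }

    def __init__(self, specialists: Dict[str, Specialist]) -> None:
        self.specialists = specialists

    def choose_role(self, event: Dict[str, str]) -> Optional[str]:
        # Decide whether to intervene and, if so, which specialist role to deploy.
        return self.ROLE_FOR_EVENT.get(event.get("kind", ""))

    def handle(self, event: Dict[str, str]) -> str:
        role = self.choose_role(event)
        if role is None:
            return "no action"
        specialist = self.specialists.get(role)
        if specialist is None:
            return f"escalate: no {role} specialist available"
        return specialist(event.get("detail", ""))


orchestrator = Orchestrator(
    specialists={
        "repair": lambda detail: f"repair agent handled: {detail}",
        "data_quality": lambda detail: f"quality agent handled: {detail}",
    }
)
```

Passing an event such as `{"kind": "task_failed", "detail": "load step timed out"}` to `orchestrator.handle` routes it to the repair specialist, while an event with no matching rule results in no action.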
The six-layer architecture of an agentic data pipeline system: Intent, Agents, Execution, Data Systems, Observability, and Memory. Each layer has a distinct responsibility; together they form a system that can observe its own state, understand what went wrong, fix it, and retain what it learned.
This vocabulary is part of the effort to define the category of agentic data pipelines and establish a shared language for the field.