What is Context Engineering?
The discipline of designing, building, and maintaining the structured context that AI agents need to make reliable decisions in production.
Context engineering is to AI agents what data engineering is to analytics: the infrastructure that determines whether outputs are reliable or random.
The problem context engineering solves
Every AI agent operates within a context window — the information available when it makes a decision. Most teams treat this window as a prompt. They optimize the instruction and hope the model figures out the rest.
In demos, this works. In production, it collapses. The agent encounters stale data, contradictory rules, expired permissions, and decisions it made three steps ago that it can no longer recall. The prompt was fine. The context was broken.
Context engineering addresses the root cause: the information supply chain feeding the model is unmanaged. No freshness guarantees. No structural validation. No trace of why a particular piece of information was selected over another.
Where the term comes from
The term gained traction in mid-2025 when Shopify CEO Tobi Lutke and former OpenAI researcher Andrej Karpathy both endorsed it publicly. Karpathy described context engineering as the art and science of filling the context window with just the right information for the next step. Lutke called it a more accurate label for what production teams actually do when building AI systems.
The distinction matters. As Simon Willison noted, unlike "prompt engineering" — which people dismiss as typing things into a chatbot — "context engineering" has an inferred definition much closer to its intended meaning. It signals systems work, not wordsmithing.
The mental model
Think of the LLM as a CPU and its context window as RAM. The context engineer acts as the operating system — loading working memory with just the right code and data for each task. The quality of the output depends entirely on what is loaded into that window.
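The analogy can be made concrete with a toy loader. Everything below (the item fields, the greedy strategy, the function name) is an illustrative assumption, not a real API: the "operating system" packs the highest-relevance items into a fixed token budget, and anything that does not fit stays out of working memory.

```python
# Hypothetical sketch of the OS analogy: pack the highest-relevance
# context items into a fixed token budget ("RAM") for one step.

def load_window(items, budget):
    """Greedily load the highest-relevance items that fit the budget."""
    window, used = [], 0
    for item in sorted(items, key=lambda i: i["relevance"], reverse=True):
        if used + item["tokens"] <= budget:
            window.append(item["text"])
            used += item["tokens"]
    return window

items = [
    {"text": "task instruction",     "tokens": 50,   "relevance": 1.0},
    {"text": "pricing policy v3",    "tokens": 400,  "relevance": 0.8},
    {"text": "full product catalog", "tokens": 5000, "relevance": 0.3},
]
print(load_window(items, budget=1000))  # the catalog does not make the cut
```

A real loader would score relevance with embeddings and count tokens with the model's tokenizer; the budget discipline is the point of the analogy.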
Context engineering vs prompt engineering
Prompt engineering is one layer of context engineering. It focuses on the instruction — the "what to do" part of the context window. Context engineering manages everything else: what information reaches the model, how it is structured, whether it is still valid, and how decisions are traced back to their inputs.
| Dimension | Prompt Engineering | Context Engineering |
|---|---|---|
| Scope | Single LLM call | Full information supply chain |
| Focus | Instruction quality | Information architecture |
| Temporal awareness | None — static text | Freshness, validity windows, expiry |
| State management | Stateless | Cross-step, cross-session state |
| Traceability | None | Decision traces, provenance |
| Failure mode | Bad output | Silent degradation at scale |
The five pillars of context engineering
Context engineering is not a single technique. It is a discipline built on five interconnected practices:
01. Context Selection
Determining what information reaches the model for each decision. Not everything in the knowledge base is relevant. Not everything relevant is valid right now. Context selection is the gatekeeper — choosing what goes in and what stays out based on the current task, the agent's state, and the decision being made.
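A selection gate can be sketched in a few lines. The field names and filter rules here are hypothetical; the point is that relevance to the current task and validity right now are both checked before anything enters the window.

```python
# Illustrative selection gate: only items that match the current task
# AND are valid at decision time pass into the context window.

def select_context(candidates, task, now):
    return [
        c for c in candidates
        if task in c["tasks"]                       # relevant to this task
        and c["valid_from"] <= now < c["valid_to"]  # valid right now
    ]

candidates = [
    {"id": "refund-policy",     "tasks": {"refund"},   "valid_from": 0, "valid_to": 100},
    {"id": "old-refund-policy", "tasks": {"refund"},   "valid_from": 0, "valid_to": 10},
    {"id": "shipping-rates",    "tasks": {"shipping"}, "valid_from": 0, "valid_to": 100},
]
selected = select_context(candidates, task="refund", now=50)
print([c["id"] for c in selected])  # the expired and off-task items stay out
```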
02. Context Structuring
Organizing selected information so the model can reason over it effectively. Raw documents dumped into a prompt are not structured context. Structuring means providing explicit relationships, hierarchies, constraints, and metadata that reduce ambiguity and enable deterministic reasoning.
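The difference between a raw dump and structured context can be sketched as a rendering step. The schema below (fact types, constraint lists) is an assumption for illustration: each fact carries an explicit type and its constraints, rather than being buried in prose.

```python
# Illustrative sketch: render each fact with explicit type, subject,
# and constraints so relationships are unambiguous to the model.

def render_structured(facts):
    lines = []
    for f in facts:
        lines.append(f"[{f['type']}] {f['subject']} -> {f['value']}")
        for c in f.get("constraints", []):
            lines.append(f"  constraint: {c}")
    return "\n".join(lines)

facts = [
    {"type": "rule", "subject": "discount.max", "value": "15%",
     "constraints": ["applies: EU customers only",
                     "requires: manager approval above 10%"]},
    {"type": "fact", "subject": "customer.region", "value": "EU"},
]
print(render_structured(facts))
```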
03. Temporal Management
Ensuring every piece of context carries validity windows, freshness guarantees, and expiration rules. A pricing rule from last quarter should not silently influence today's decision. Temporal management makes time a first-class property of every context element.
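Making time a first-class property can be as simple as attaching an expiry to every element. The class and field names below are assumptions for illustration, not a standard API.

```python
# Hedged sketch: every context element carries its own validity window,
# and expired elements are rejected before they reach the model.
from datetime import datetime, timedelta

class ContextElement:
    def __init__(self, content, fetched_at, ttl):
        self.content = content
        self.expires_at = fetched_at + ttl

    def is_fresh(self, now):
        return now < self.expires_at

price_rule = ContextElement("Q4 pricing: 10% discount",
                            fetched_at=datetime(2025, 10, 1),
                            ttl=timedelta(days=90))
now = datetime(2026, 1, 15)
print(price_rule.is_fresh(now))  # the Q4 rule has expired by mid-January
```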
04. State Continuity
Maintaining coherent agent state across steps, tool calls, and sessions. When an agent approves a request in step 3, that decision must persist through steps 4 through 12. State continuity prevents the drift that causes agents to contradict themselves or lose track of their own actions.
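One minimal implementation pattern, sketched with hypothetical names, is an append-only decision log that is re-rendered into every later step's context:

```python
# Illustrative sketch: an append-only state log that carries earlier
# decisions into every later step's context window.

class AgentState:
    def __init__(self):
        self.decisions = []

    def record(self, step, decision):
        self.decisions.append({"step": step, "decision": decision})

    def as_context(self):
        return "\n".join(f"step {d['step']}: {d['decision']}"
                         for d in self.decisions)

state = AgentState()
state.record(3, "approved refund REQ-881")
state.record(5, "notified customer")
# At step 12 the agent still sees its own step-3 decision:
print(state.as_context())
```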
05. Decision Traceability
Recording not just what the agent decided, but why — what context was available, what was selected, what was excluded, and what rules applied. Traceability transforms AI agents from black boxes into auditable systems.
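A trace record can capture exactly those four things. The schema below is a hypothetical example, not a standard format:

```python
# Hypothetical trace record: what was available, what was selected,
# what was excluded, which rules applied, and the resulting decision.
import json

def make_trace(available, selected, rules_applied, decision):
    return {
        "available": [a["id"] for a in available],
        "selected":  [s["id"] for s in selected],
        "excluded":  [a["id"] for a in available if a not in selected],
        "rules_applied": rules_applied,
        "decision": decision,
    }

available = [{"id": "policy-v3"}, {"id": "policy-v2"}]
selected = [available[0]]
trace = make_trace(available, selected, ["latest-version-wins"], "approve")
print(json.dumps(trace, indent=2))
```

Persisting one such record per decision is what turns "why did the agent do that?" from archaeology into a lookup.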
Context engineering in production: emerging frameworks
Several teams building production agents have published their context engineering approaches. Three stand out:
Anthropic's Three Categories
Anthropic's Applied AI team categorizes context engineering into static context (system prompts, tool definitions, few-shot examples), dynamic context retrieval (just-in-time loading, progressive disclosure), and long-horizon task management (compaction, structured note-taking, sub-agent architectures). Their guiding principle: find the smallest set of high-signal tokens that maximize the likelihood of a desired outcome.
Manus's Production Patterns
The Manus team, after rebuilding their agent framework four times, published six production-tested patterns:
- Design around KV-cache hit rates (stable prefixes, append-only contexts)
- Mask tools instead of removing them, to preserve the cache
- Externalize context to the file system as unlimited persistent storage
- Manipulate attention through recitation (agents maintain todo files to prevent goal drift across 50+ tool calls)
- Preserve error evidence so models learn implicitly
- Avoid few-shot brittleness through structured variation
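Two of these, file-system externalization and recitation, can be sketched together. This is a loose illustration under assumed names, not Manus's actual code: notes live on disk rather than in the window, and the todo list is re-appended at the tail of the context where recent-token attention is strongest.

```python
# Illustrative sketch of externalization + recitation (not Manus code).
import os
import tempfile

workdir = tempfile.mkdtemp()

def write_note(name, text):
    """Externalize: the file system as persistent context storage."""
    with open(os.path.join(workdir, name), "w") as f:
        f.write(text)

def read_note(name):
    with open(os.path.join(workdir, name)) as f:
        return f.read()

write_note("todo.md", "- [x] fetch order\n- [ ] issue refund\n- [ ] email customer")

def build_context(history):
    # Recitation: append the current plan AFTER the long history,
    # so the goal sits in the attention-fresh tail of the context.
    return history + "\n\n## Current plan\n" + read_note("todo.md")

ctx = build_context("...many tool calls of history...")
print(ctx.endswith(read_note("todo.md")))  # plan is the last thing the model reads
```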
LangChain's Four Operations
LangChain identifies four core context operations: write (persist information), select (retrieve the right context), compress (summarize to fit the window), and isolate (give sub-agents clean context windows). They introduced Middleware in LangChain 1.0 specifically for programmatic context control.
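The four operations can be mapped onto toy functions. This is an illustrative sketch, not LangChain's API; in particular, `compress` here is a truncation placeholder where a real system would summarize with a model.

```python
# Illustrative mapping of the four operations (not the LangChain API).

store = {}

def write(key, value):             # write: persist information
    store[key] = value

def select(keys):                  # select: retrieve the right context
    return {k: store[k] for k in keys if k in store}

def compress(text, limit):         # compress: fit the window (placeholder)
    return text if len(text) <= limit else text[:limit] + "...[compacted]"

def isolate(task, context):        # isolate: a clean window for a sub-agent
    return {"task": task, "context": dict(context)}

write("user_prefs", "prefers email contact")
sub = isolate("draft reply", select(["user_prefs"]))
print(sub["context"])
```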
The industry signal
Gartner declared 2026 the year of context, positioning context engineering as critical infrastructure for enterprise AI. Their finding: 4 out of 5 organizations increased AI investments in 2026, yet only 1 in 5 shows measurable ROI — a gap they attribute to fragmented context across documentation, tribal knowledge, and disconnected tools.
The Model Context Protocol (MCP) — now governed by the Agentic AI Foundation under the Linux Foundation — has become the universal connector standard with adoption from Anthropic, OpenAI, Google, and Microsoft. Martin Fowler's team has published detailed analyses of context engineering for coding agents, signaling the concept has reached mainstream software engineering.
The context graph: context engineering in practice
A context graph is the primary data structure for implementing context engineering. It captures not just facts and relationships (like a knowledge graph), but also applicability rules, temporal validity, exceptions, provenance, and decision traces.
Where prompt engineering optimizes the instruction and RAG optimizes the retrieval, context engineering via context graphs optimizes the entire decision substrate — the structured environment within which AI agents reason and act.
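A single node of such a graph might look like the following. The field names are assumptions for illustration, not a standard schema; the point is what a plain knowledge-graph triple omits: applicability rules, exceptions, a validity window, and provenance.

```python
# Illustrative context-graph node: a fact plus the metadata that makes
# it safe to use in a decision. Field names are hypothetical.

node = {
    "fact": ("enterprise_plan", "max_seats", 500),
    "applies_when": {"region": "EU", "contract_type": "annual"},
    "exceptions": ["legacy-2023 contracts keep 300 seats"],
    "valid": {"from": "2026-01-01", "to": "2026-12-31"},
    "provenance": {"source": "pricing-db", "fetched": "2026-03-02"},
}

def applicable(node, situation):
    """A node applies only when every applicability condition matches."""
    return all(situation.get(k) == v for k, v in node["applies_when"].items())

print(applicable(node, {"region": "EU", "contract_type": "annual"}))  # True
print(applicable(node, {"region": "US", "contract_type": "annual"}))  # False
```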
Who needs context engineering
Context engineering becomes critical when AI agents move from single-turn assistants to multi-step, tool-using, decision-making systems deployed in production:
- Teams running multi-step agents that use tools and modify state
- Organizations requiring audit trails for AI-driven decisions
- Products where agent decisions affect revenue, compliance, or user safety
- Systems where context changes over time (pricing, permissions, policies)
- Platforms orchestrating multiple agents that share state
Frequently asked questions
Is context engineering just a new name for prompt engineering?
No. Prompt engineering is a subset of context engineering. It focuses on the instruction within a single LLM call. Context engineering manages the entire information supply chain across the full agent lifecycle — selection, structuring, temporal validity, state continuity, and decision traceability.
Do I need a context graph to do context engineering?
Not necessarily, but a context graph is the most effective implementation pattern. You can practice elements of context engineering with structured prompts, metadata enrichment, and state management systems. A context graph formalizes these practices into a coherent data structure.
How is context engineering related to RAG?
RAG (Retrieval-Augmented Generation) is a retrieval technique. It answers "what documents are relevant?" Context engineering asks a broader set of questions: "Is this information still valid? Does this rule apply to this specific situation? What did the agent decide three steps ago?" RAG is one input to context engineering, not a replacement for it.
When should I start investing in context engineering?
When your agents move from demos to production. The moment agents make decisions that affect real users, real money, or real compliance requirements, unmanaged context becomes a liability. Most teams discover this through silent failures — agents that produce plausible but wrong outputs because their context was stale, incomplete, or contradictory.