Context Engineering in 2026: From Karpathy's Tweet to Production Infrastructure

Patrick Joubert · 7 min read

context-engineering · context-graph · production-reliability · agent-architecture · decision-infrastructure

In June 2025, Andrej Karpathy killed prompt engineering in a tweet.

He called it "context engineering," describing it as "the delicate art and science of filling the context window with just the right information for the next step." Shopify CEO Tobi Lutke followed: "the art of providing all the context for the task to be plausibly solvable by the LLM." Simon Willison predicted the term would stick because, unlike "prompt engineering," its inferred meaning actually matches the real work.

They were right. A year later, "context engineering" is everywhere. Conference talks, job titles, blog posts, frameworks. Gartner declared 2026 "the year of context." The term won.

But winning the naming debate is not the same as solving the problem. Everyone agrees you need to fill the context window with the right information. Almost nobody has built the system that actually does it. The term is everywhere. The implementation is nowhere.

What everyone says

Three frameworks have emerged to define what context engineering means in practice.

Anthropic's Applied AI team breaks it into three categories: static context (system prompts, tools, few-shot examples), dynamic context retrieval (just-in-time loading, progressive disclosure), and long-horizon task management (compaction, structured note-taking, sub-agent architectures). Their guiding principle: find "the smallest set of high-signal tokens that maximize the likelihood of some desired outcome."

The Manus team, after rebuilding their agent framework four times, distilled six production patterns: design around KV-cache hit rates, mask tools instead of removing them, externalize context to the file system, manipulate attention through recitation, preserve error evidence, and avoid few-shot brittleness.

LangChain proposed four core operations: write, select, compress, and isolate. They introduced middleware in LangChain 1.0 specifically for context engineering control.
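The four operations can be sketched as plain functions. This is an illustrative sketch, not LangChain's actual middleware API; the function names and the dict-backed store are assumptions made for clarity:

```python
def write(store: dict, key: str, value: str) -> None:
    """Persist context outside the model's window (e.g. a scratchpad)."""
    store[key] = value

def select(store: dict, keys: list[str]) -> list[str]:
    """Pull only the entries relevant to the next step."""
    return [store[k] for k in keys if k in store]

def compress(entries: list[str], max_chars: int) -> str:
    """Shrink context to fit; crude truncation stands in for summarization."""
    return "\n".join(entries)[:max_chars]

def isolate(store: dict, scope_prefix: str) -> dict:
    """Partition context so a sub-agent sees only its own scope."""
    return {k: v for k, v in store.items() if k.startswith(scope_prefix)}
```

The point of the sketch is that all four are operations *on* context; none of them says anything about what a unit of context *is*.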

These are serious, experience-driven frameworks. They represent real lessons from real production systems. And they all share the same blind spot.

The question nobody is asking

Every framework describes operations on context. None of them describes the structure of context itself.

This is like documenting every possible SQL query without ever defining a schema. You can SELECT, INSERT, UPDATE, and DELETE all day long. Without a schema, you have no constraints, no relationships, no integrity guarantees. You have data. You do not have a database.

Context engineering without a context model is sophisticated prompt stuffing.

Karpathy's CPU/RAM analogy is useful but incomplete. If the LLM is the CPU and the context window is the RAM, then context engineering is the operating system that decides what to load into memory. But where is the file system? Who manages:

Temporal validity. Is this fact still true? A product price retrieved 48 hours ago may have changed. A customer segment analysis from last quarter may be obsolete. An API schema cached at startup may have been deprecated. Without temporal validation, agents make decisions on stale context and have no mechanism to detect it.

Provenance. Where did this information come from? Was it retrieved from a verified source, generated by another agent, or inferred from partial data? When two facts contradict each other, provenance determines which one to trust. Without it, agents treat all context as equally reliable.

Dependencies. If this fact changes, what else is affected? A customer's credit score change impacts their loan eligibility, which impacts the recommended products, which impacts the email campaign they should receive. Without dependency tracking, agents update one fact and leave downstream decisions built on the old value.

Scope. Is this information relevant to this specific decision? A customer service agent resolving a billing dispute does not need the customer's browsing history from six months ago. A pricing agent does not need the full product catalog. Without scope binding, agents operate on bloated context where signal drowns in noise.

The answer to all four questions, in most production systems today, is: nobody manages this. And that is why $684 billion invested in AI in 2025 delivered less than 20% of its intended business value.

The context graph as architectural answer

A context graph is context engineering with a schema.

Not a concept. Not a framework. A data structure with four properties that map directly to the four problems above.

State nodes. Each fact in the graph is a node that carries its source, its creation timestamp, its validity window, and its scope. A node is not "the customer's credit score is 720." A node is "the customer's credit score is 720, sourced from Experian API, retrieved at 2026-04-03T14:22:00Z, valid until 2026-04-10T14:22:00Z, scoped to financial-product-eligibility."
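The node shape described above can be written down directly. The field names and the `frozen` dataclass are illustrative choices, not a standard:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ContextNode:
    """One fact plus the metadata that makes it governable."""
    node_id: str
    value: object            # e.g. 720
    source: str              # provenance, e.g. "experian-api"
    retrieved_at: datetime
    valid_until: datetime    # end of the temporal validity window
    scope: str               # e.g. "financial-product-eligibility"

    def is_valid(self, now: datetime) -> bool:
        return now < self.valid_until

    def in_scope(self, decision_scope: str) -> bool:
        return self.scope == decision_scope

# The credit-score example from the text:
score = ContextNode(
    node_id="customer-42.credit-score",
    value=720,
    source="experian-api",
    retrieved_at=datetime(2026, 4, 3, 14, 22, tzinfo=timezone.utc),
    valid_until=datetime(2026, 4, 10, 14, 22, tzinfo=timezone.utc),
    scope="financial-product-eligibility",
)
```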

Dependency edges. Relationships between nodes are explicit, weighted, and directional. When the credit score node updates, the graph knows which downstream nodes (loan eligibility, interest rate, product recommendations) need revalidation. This is not inference. This is structure.
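The revalidation walk is an ordinary graph traversal. A minimal sketch using the credit-score chain from the text (the edge map and node names are illustrative):

```python
from collections import deque

# Directed dependency edges: when the key node changes,
# every node reachable from it needs revalidation.
DOWNSTREAM = {
    "credit-score": ["loan-eligibility"],
    "loan-eligibility": ["interest-rate", "product-recommendations"],
    "product-recommendations": ["email-campaign"],
}

def nodes_to_revalidate(changed: str, edges: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk: every node downstream of a changed fact."""
    stale, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for downstream in edges.get(node, []):
            if downstream not in stale:
                stale.add(downstream)
                queue.append(downstream)
    return stale
```

A credit-score update flags the eligibility, rate, recommendation, and campaign nodes in one pass; nothing downstream is left standing on the old value.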

Temporal validity. Before any decision, the graph validates that every node in the decision path is within its validity window. Expired nodes trigger retrieval, not hallucination. The agent does not decide on stale context because the graph will not let it.
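Enforced at query time, the check is small: collect every expired node on the decision path and refuse to decide until they are re-retrieved. A sketch under the same assumed node shape:

```python
from datetime import datetime, timedelta, timezone

def expired_nodes(path: list[tuple[str, datetime]], now: datetime) -> list[str]:
    """Return the node ids on a decision path whose validity window has
    closed; the caller must re-retrieve these before the agent may decide."""
    return [node_id for node_id, valid_until in path if valid_until <= now]

now = datetime.now(timezone.utc)
path = [
    ("credit-score", now + timedelta(days=7)),      # still valid
    ("product-catalog", now - timedelta(hours=1)),  # expired: retrieve, don't guess
]
```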

Decision trace. Every decision the agent makes points back to the specific nodes and edges that produced it. When an agent recommends a financial product, the trace shows exactly which context nodes informed that recommendation, when they were retrieved, and from where. This is not logging. This is architectural traceability.
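Because the trace is structural, audit queries become one-liners rather than log archaeology. A sketch with illustrative record and field names:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class DecisionTrace:
    """Links a decision to the exact context nodes that produced it."""
    decision: str
    node_ids: list[str]      # which context nodes informed the decision
    recorded_at: datetime

def decisions_using(traces: list[DecisionTrace], node_id: str) -> list[str]:
    """Audit query: which decisions were built on a given context node?
    If that node turns out to be wrong or stale, these need review."""
    return [t.decision for t in traces if node_id in t.node_ids]

trace = DecisionTrace(
    decision="recommend: fixed-rate-loan",
    node_ids=["customer-42.credit-score", "customer-42.loan-eligibility"],
    recorded_at=datetime(2026, 4, 3, 14, 25, tzinfo=timezone.utc),
)
```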

These are not theoretical properties. They solve the concrete problems that context engineering frameworks describe but do not address:

| Problem | Without context graph | With context graph |
| --- | --- | --- |
| Context drift across steps | Undetected, compounds silently | Each step validates node validity |
| Contradictory information | Last-in wins, no resolution logic | Provenance-weighted conflict resolution |
| Scope pollution | All context loaded, signal buried | Scope-bound traversal, relevant nodes only |
| Decision auditability | Log scraping, post-hoc reconstruction | Native trace from decision to source nodes |
| Stale data in long-running agents | No expiry mechanism | Temporal validity enforced at query time |

Why now

Three forces are converging in 2026 that make the context graph not just useful but necessary.

MCP solved the plumbing. The Model Context Protocol, now governed by the Agentic AI Foundation with 97 million monthly SDK downloads, has become the universal connector. Every major model provider has adopted it. Agents can connect to any tool, any data source, any service. The transport layer is solved. But transport without governance is a pipe that carries contaminated water. MCP tells the agent what tools are available. A context graph tells the agent which information is valid, relevant, and trustworthy for the decision at hand.

Gartner made context an enterprise priority. Their 2026 predictions are explicit: 50% of agent deployments will fail from insufficient governance. 50% of business decisions will be augmented or automated by AI agents by 2027. Physical AI will generate 10x more data than digital AI by 2029. The enterprise has finally understood that the model is not the bottleneck. Context is. And the budgets are shifting accordingly.

Production failure rates forced the conversation. Multi-agent systems show failure rates between 41% and 87% across major frameworks. These are not prototype failures. These are production incidents. Teams that deployed agents in 2025 spent 2026 debugging context problems they could not see, because their architectures had no mechanism for context integrity. The pattern is consistent: agents do not fail because they reason badly. They fail because they reason correctly on broken context.

What comes next

Three shifts are already underway.

Distributed context graphs. Today, context graphs operate within a single agent or organization. As multi-agent systems mature, agents will need to share validated context across organizational boundaries. A procurement agent negotiating with a supplier agent needs to trust the supplier's context, not just its outputs. Distributed context graphs with shared validation protocols will become the standard for inter-agent communication.

Context graph as protocol. MCP standardized tool access. The next standardization wave will be context structure. A common format for exchanging context nodes, with temporal metadata, provenance, and scope bindings, will emerge as the interoperability layer above MCP. The organizations building this infrastructure now will define the standard.

Regulatory demand for decision traceability. The EU AI Act requires explainability for high-risk AI systems. Financial regulators require audit trails for automated decisions. Healthcare authorities require provenance tracking for clinical AI. Every one of these requirements maps directly to a context graph property. The organizations that built context graphs for reliability will discover they also built compliance infrastructure. The organizations that did not will discover they cannot retrofit it.

The line is clear

Prompt engineering was how you talked to a model. Context engineering is how you build the system around it.

A context graph is that system.

Not because it is the trendiest architecture. Because it is the only one that provides temporal validity, provenance tracking, dependency management, and decision traceability as structural properties rather than afterthoughts.

The organizations that will run reliable agents at scale in 2027 are not the ones with the best models. They are the ones with the best context infrastructure. And context infrastructure, when you strip away the buzzwords and the conference talks, is a graph.

Cite this memo

Patrick Joubert. (2026). "Context Engineering in 2026: From Karpathy's Tweet to Production Infrastructure." The Context Graph. https://thecontextgraph.co/memos/context-engineering-2026-from-tweet-to-infrastructure
