Context Engineering
January 14, 2025
Enterprise AI spending surged to $13.8 billion in 2024, a sixfold increase from the prior year. Organizations deployed 11 times more models into production. RAG adoption jumped from 31% to 51%. Context windows expanded to 200K tokens, then 1 million. And yet, 21% of failed AI pilots cited data privacy concerns as the root cause. Another 26% failed on implementation costs. A staggering 97% of organizations that experienced an AI-related security incident lacked proper AI access controls.
The pattern is unmistakable. The models are not the bottleneck. The bottleneck is context: how enterprises assemble, govern, and deliver the right information to AI systems at inference time, without violating privacy guarantees, security boundaries, safety requirements, or compliance obligations. This is the defining challenge of enterprise AI in 2025, and it is fundamentally an engineering problem, not a model selection problem.
We are entering the era of context engineering.
From Prompt Engineering to Context Engineering
Prompt engineering dominated the discourse in 2023. Teams hired prompt engineers. Companies published prompt libraries. The assumption was that crafting the right instruction would unlock model capability. That assumption was correct for demos and insufficient for production.
The distinction is architectural. Prompt engineering operates on the instruction layer: how you tell the model what to do. Context engineering operates on the information layer: what the model is allowed to know and use when it does it. A perfectly crafted prompt cannot compensate for missing or incorrect context. A mediocre prompt with precisely relevant context will outperform an elegant prompt with irrelevant context every time.
Context engineering encompasses everything that determines the information available to an AI system at inference time: the system prompt, conversation history, retrieved documents from RAG pipelines, tool schemas and function definitions, memory systems (episodic, semantic, procedural), knowledge graph relationships, runtime metadata, and dynamic state management. It is the art and science of filling the context window with exactly the right information for the next step.
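To make the scope concrete, here is a minimal sketch of a per-request context package as a data structure. It is illustrative only: the field names are assumptions that mirror the components listed above, not a standard schema.

```python
from dataclasses import dataclass, field

@dataclass
class ContextPackage:
    """Everything the model is allowed to know for one inference step."""
    system_prompt: str
    history: list[dict] = field(default_factory=list)           # conversation turns
    retrieved_chunks: list[str] = field(default_factory=list)   # RAG pipeline output
    tool_schemas: list[dict] = field(default_factory=list)      # tool/function definitions
    memory: dict = field(default_factory=dict)                  # episodic/semantic/procedural
    graph_facts: list[str] = field(default_factory=list)        # knowledge graph relationships
    runtime_metadata: dict = field(default_factory=dict)        # user, tenant, jurisdiction, time
```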
For enterprises, this is where the problem gets hard. Enterprise knowledge is fragmented across legacy databases, SaaS platforms, CRM systems, document repositories, internal wikis, ticketing systems, communication tools, and regulatory filing systems. Each source has its own access control model, data format, retention policy, and compliance posture. Assembling context means crossing all of these boundaries simultaneously, and doing so safely, securely, privately, and compliantly.
Why Context Is the Competitive Moat
The model layer is commoditizing rapidly. Enterprises now routinely deploy three or more foundation models, routing by use case. OpenAI market share declined from 50% to 34% in 2024 while Anthropic doubled from 12% to 24%. Open-source models hold 19% of enterprise deployments and climbing. The model is becoming interchangeable. What is not interchangeable is the context architecture that feeds it.
Consider the production evidence. RAG with proper metadata filtering and reranking reduces hallucination rates from 19% to approximately 2%. Microsoft Research's GraphRAG achieves 80% accuracy on complex multi-hop queries where traditional RAG scores 50%. Hybrid retrieval combining dense vector embeddings with sparse BM25 representations shows 15-30% precision improvements over either approach alone.
These are not marginal gains. They are the difference between a system that enterprise users trust and one they abandon after the first incorrect answer. And every one of these improvements is a context engineering improvement, not a model improvement. The same model produces dramatically different results depending on the quality, relevance, and structure of the context it receives.
The organizations that win the enterprise AI race will not be those with access to the best model. Every enterprise has access to the best models. The winners will be those that build the most robust context infrastructure: the systems that assemble, govern, and deliver the right information to the right model at the right time, under the right constraints.
The Enterprise Context Architecture
A production-grade enterprise context architecture has four layers, each with its own engineering challenges and compliance implications.
Layer 1: Ingestion and Embedding
Enterprise content must be chunked, encoded into dense vector representations via embedding models, and indexed in a vector database. This is the foundation of any retrieval pipeline. But in an enterprise context, ingestion is where compliance obligations first attach.
Every document ingested carries metadata: its access control list, its data classification (public, internal, confidential, restricted), its regulatory jurisdiction (GDPR, HIPAA, CCPA), its retention schedule, and its provenance chain. This metadata must be preserved through the embedding process and attached to every vector in the index. A vector without metadata is an ungoverned vector, and ungoverned vectors are how enterprises leak sensitive information through AI systems.
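As a rough sketch, the record stored for each chunk might look like the following. The field names are illustrative assumptions; the point is that governance metadata travels with the vector itself, not just with the source document.

```python
from dataclasses import dataclass, field

@dataclass
class ChunkRecord:
    chunk_id: str
    text: str
    embedding: list[float]                        # dense vector from the embedding model
    source_doc_id: str                            # provenance link to the source document
    acl: set[str] = field(default_factory=set)    # groups permitted to read the source
    classification: str = "internal"              # public | internal | confidential | restricted
    jurisdictions: tuple[str, ...] = ()           # e.g. ("GDPR",) or ("HIPAA",)
    retention_until: str | None = None            # ISO-8601 retention deadline
    ingested_at: str = ""                         # extraction timestamp for the audit chain

record = ChunkRecord(
    chunk_id="doc-42#003",
    text="...chunk text...",
    embedding=[0.12, -0.08, 0.33],
    source_doc_id="doc-42",
    acl={"legal-team", "compliance"},
    classification="confidential",
    jurisdictions=("GDPR",),
    retention_until="2031-12-31",
    ingested_at="2025-01-14T09:00:00Z",
)
```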
Vector embeddings themselves present a security surface. Research has demonstrated that embeddings can be partially inverted to reconstruct source data. This means the vector index is not a safe abstraction layer. It is a derivative data store that inherits the sensitivity classification of its source documents. Enterprises that treat their vector databases as less sensitive than their document repositories are making a category error with real compliance consequences.
Layer 2: Retrieval and Assembly
When a user submits a query, the retrieval layer must find the most relevant context from the index and assemble it into a coherent context package for the model. Production-grade retrieval in 2025 is multi-stage (a sketch of the fusion and reordering steps follows this list):
- Hybrid search. Combining dense vector similarity (semantic matching) with sparse BM25 representations (exact keyword matching) via Reciprocal Rank Fusion. This dual approach catches both semantically similar and lexically exact matches, addressing the known weaknesses of pure vector search on domain-specific terminology.
- Metadata filtering. Before or during retrieval, filtering by document classification, regulatory jurisdiction, department, date range, and, critically, the requesting user's access permissions. This is where role-based access control (RBAC) must be enforced at the vector level, not just the document level.
- Reranking. A second-stage model (cross-encoder or dedicated reranker) rescores initial retrieval results for precision, pushing the most relevant chunks to the top and filtering noise.
- Chunk reordering. Placing the most relevant context at the beginning and end of the assembled context window, mitigating the well-documented "lost in the middle" problem where models underperform on information positioned in the center of long contexts.
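As a minimal sketch of two of these stages, the following combines Reciprocal Rank Fusion over two ranked lists of chunk IDs (assumed to come from separate dense and BM25 retrievers) with an interleaving heuristic for the "lost in the middle" reordering. The constant k=60 is the value commonly used with RRF; everything else is an illustrative assumption.

```python
def rrf_fuse(dense_ranked: list[str], sparse_ranked: list[str], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score each chunk by its summed reciprocal ranks."""
    scores: dict[str, float] = {}
    for ranking in (dense_ranked, sparse_ranked):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def reorder_for_long_context(ranked: list[str]) -> list[str]:
    """Put the strongest chunks at the edges of the window, weaker ones in the middle."""
    front: list[str] = []
    back: list[str] = []
    for i, chunk_id in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk_id)
    return front + back[::-1]   # best chunk first, second-best last

fused = rrf_fuse(["c1", "c2", "c3"], ["c2", "c4", "c1"])
print(reorder_for_long_context(fused))
```

In production the fused list would pass through the reranker before reordering; the sketch omits that stage because it requires a model.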
The critical compliance requirement at this layer is access control propagation. When an AI system retrieves a document fragment for a user's query, it must inherit and enforce the access permissions of the original document. Most RAG implementations in 2024 did not properly enforce document-level RBAC through the retrieval pipeline. This is the single most common compliance failure in enterprise AI deployments, and it is a context engineering problem.
Layer 3: Knowledge Graph Augmentation
Traditional vector-based RAG excels at "needle in a haystack" retrieval: finding the specific chunk that answers a specific question. It fails at thematic, cross-document reasoning: "What are the compliance risks across all our vendor contracts?" or "How do our data processing agreements interact with the new EU AI Act requirements?"
Knowledge graphs address this by capturing entity-relationship structures extracted from enterprise content. GraphRAG, introduced by Microsoft Research in 2024, extracts entity-relationship graphs from raw text, builds community hierarchies, generates summaries for these communities, and leverages these structures during retrieval. The result is a 3.4x improvement on enterprise benchmarks for complex queries and 72-83% comprehensiveness on global thematic questions.
For compliance-sensitive enterprises, knowledge graphs offer a critical advantage: provenance. Every node and edge in the graph carries metadata about its source document, extraction timestamp, and confidence score. When an AI system makes a claim, the knowledge graph provides a full audit trail from the claim back through the graph traversal to the original source documents. This lineage tracking is not optional in regulated industries. It is a requirement.
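A hedged sketch of what provenance-carrying graph edges can look like; the entity names, fields, and relations are illustrative assumptions, not GraphRAG's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    subject: str
    relation: str
    obj: str
    source_doc_id: str    # which document the relation was extracted from
    extracted_at: str     # extraction timestamp
    confidence: float     # extractor confidence score

path = [
    Edge("AcmeCorp", "party_to", "Contract-17", "doc-contract-17", "2025-01-10T00:00:00Z", 0.97),
    Edge("Contract-17", "governed_by", "GDPR", "doc-dpa-04", "2025-01-10T00:00:00Z", 0.91),
]

def provenance_trail(path: list[Edge]) -> list[str]:
    """Map every claim back to its source documents for the audit trail."""
    return [f"{e.subject} -{e.relation}-> {e.obj} (source={e.source_doc_id}, conf={e.confidence})"
            for e in path]

print("\n".join(provenance_trail(path)))
```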
Layer 4: Context Orchestration
The orchestration layer decides what context to assemble for each inference request. This is the control plane of the context architecture. It must make real-time decisions about which retrieval strategies to invoke, which knowledge graph subgraphs to traverse, what memory to include, and what constraints to enforce.
In an enterprise setting, a single query might require context from documents governed by GDPR (EU data residency), HIPAA (protected health information), internal confidentiality classifications, and contractual data processing agreements simultaneously. The orchestration layer must enforce per-document, per-field compliance rules in real time during retrieval, before the context reaches the model.
This is the layer where most enterprises fail. They build RAG pipelines that retrieve relevant content but do not enforce governance at the retrieval boundary. The result is AI systems that produce accurate, helpful answers that also constitute compliance violations.
Privacy: The First Constraint
Privacy is not a feature of enterprise AI. It is a precondition. Every technique for assembling context must preserve the privacy of the underlying data, the individuals represented in that data, and the organizational boundaries that separate different data domains.
Differential Privacy in Context Pipelines
Differential privacy (DP) is the only technique with provable mathematical bounds on privacy leakage. It works by adding calibrated noise to data or model updates, ensuring that the output of any computation does not reveal whether any specific individual's data was included in the input. The privacy guarantee is controlled by epsilon, the privacy budget: lower epsilon means stronger privacy, more noise, and less utility.
In context engineering, differential privacy applies at multiple points. During embedding generation, DP mechanisms can ensure that individual documents cannot be reconstructed from their vector representations. During retrieval, DP can bound the information leakage from query patterns. During aggregation, DP can ensure that summarized context does not reveal protected attributes of the source data.
The trade-off is real. More noise improves privacy but reduces the precision of retrieval results. Enterprises must calibrate their epsilon budgets based on the sensitivity of the data domain, the regulatory requirements, and the acceptable performance degradation. There is no universal setting. Context engineering requires domain-specific privacy calibration.
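For intuition, here is a minimal sketch of the Laplace mechanism, one standard DP building block, applied to a vector of values. It assumes the released quantity's end-to-end L1 sensitivity is bounded by `sensitivity`; calibrating that bound for a real embedding or aggregation pipeline is the hard, domain-specific part.

```python
import random

def laplace_mechanism(values: list[float], sensitivity: float, epsilon: float) -> list[float]:
    """Add Laplace(0, sensitivity/epsilon) noise to each value."""
    scale = sensitivity / epsilon
    def noise() -> float:
        # Laplace(0, scale) sampled as the difference of two exponential draws
        return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return [v + noise() for v in values]

vec = [0.12, -0.08, 0.33]
for eps in (0.1, 1.0, 10.0):   # lower epsilon: stronger privacy, more noise, less utility
    print(eps, laplace_mechanism(vec, sensitivity=1.0, epsilon=eps))
```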
Federated Context Assembly
Federated learning trains models across distributed data sources without centralizing the raw data. In the context engineering paradigm, this extends to federated context assembly: building the context package from distributed sources without moving sensitive data across organizational boundaries.
A production case study in contract review demonstrated 94.2% clause classification accuracy using federated learning combined with differential privacy and secure multi-party computation, achieving 96% improvement in reconstruction resistance. The training overhead was approximately 26%, a meaningful but manageable cost for the privacy guarantees achieved.
For enterprises operating across jurisdictions, federated approaches solve a fundamental constraint: GDPR requires that EU personal data remain within EU data residency boundaries, but AI systems may need context that spans jurisdictions. Federated context assembly allows the model to benefit from global context without physically moving data across regulatory boundaries. The context comes to the model in processed form, never as raw data.
Confidential Computing for Context
Trusted Execution Environments (TEEs) provide hardware-isolated memory regions where code and data are encrypted at rest, in transit, and during computation. Intel SGX, AMD SEV-SNP, AWS Nitro Enclaves, and NVIDIA H100 confidential GPUs all provide TEE capabilities.
Azure made confidential VMs with NVIDIA H100 GPUs generally available in 2024, combining AMD SEV-SNP CPU isolation with confidential computing primitives in the H100 GPU. All code and data, including encryption keys, prompts, and completions, remains encrypted in CPU memory and during CPU-GPU PCIe bus transfer. The cloud provider never sees the plaintext.
This is transformative for enterprise context engineering. It means an organization can use its most sensitive documents to ground AI systems without those documents ever being visible to the infrastructure provider. The context pipeline runs inside a TEE. The model runs inside a TEE. The assembled context never exists in plaintext outside the hardware enclave. For enterprises in regulated industries (healthcare, financial services, legal, defense), this is the architecture that makes AI-grounded-on-sensitive-data possible at all.
The Hybrid Privacy Architecture
No single privacy technique is sufficient. The strongest enterprise privacy posture combines all three: federated learning to keep raw data local, differential privacy to bound statistical inference risk, and trusted execution environments to protect computation on untrusted infrastructure. This layered approach addresses data movement risks (federated), inference risks (DP), and infrastructure trust risks (TEE) simultaneously.
Worldwide spending on privacy-enhancing technologies reached $3.17 billion in 2024. The investment trajectory suggests that privacy-preserving context engineering is not a niche concern. It is becoming the default architecture for enterprise AI.
Security: Protecting the Context Pipeline
The average data breach cost reached $4.88 million in 2024. AI automation reduced breach costs by $2.2 million when properly implemented, but one in five organizations reported a breach due to shadow AI, and those breaches cost $670,000 more on average. The context pipeline is a high-value attack surface, and enterprises must secure it accordingly.
Vector Index Security
Vector databases are derivative data stores. They contain encoded representations of enterprise content. Embedding inversion attacks can partially reconstruct source text from vector representations. This means vector indices must be protected with the same access controls, encryption, and audit logging as the source document repositories.
Concretely, this requires: encryption at rest for all vector storage, encryption in transit between the retrieval service and the model, access control lists (ACLs) attached to individual vectors (not just the index), audit logging of every retrieval query with the requesting user identity, and regular rotation of embedding models and re-encoding of the index to limit the window for inversion attacks.
Access Control Propagation
This is the most critical and most commonly neglected security requirement in enterprise AI. When User A queries an AI system, the retrieval pipeline must enforce User A's access permissions at every step. If User A does not have access to Document X in the source system, User A must not receive context derived from Document X in the AI response. This seems obvious. In practice, most enterprise RAG deployments in 2024 used a single service account for retrieval, effectively granting all users access to all indexed content.
Proper access control propagation requires: integration with the enterprise identity provider (SAML, OIDC), mapping user identities to source system permissions, enforcing those permissions as filters at retrieval time, and caching permission checks to maintain acceptable latency. This is an engineering-intensive requirement, but it is non-negotiable. An AI system that surfaces confidential information to unauthorized users is a data breach, regardless of whether a human chose to retrieve that information or an AI did.
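A hedged sketch of what retrieval-time enforcement can look like. The directory lookup, group names, and cache policy are illustrative assumptions; a real deployment resolves identities through the enterprise IdP and mirrors the source systems' permission models.

```python
from functools import lru_cache

@lru_cache(maxsize=4096)   # cache permission checks to keep retrieval latency acceptable
def groups_for(user_id: str) -> frozenset[str]:
    # Stand-in for a SAML/OIDC-backed directory lookup.
    directory = {"alice": frozenset({"legal-team"}), "bob": frozenset({"sales"})}
    return directory.get(user_id, frozenset())

def authorized(user_id: str, chunk_acl: frozenset[str]) -> bool:
    """The user may see a chunk only if they share a group with its source ACL."""
    return bool(groups_for(user_id) & chunk_acl)

def retrieve_for_user(user_id: str, candidates: list[dict]) -> list[dict]:
    """Drop every chunk whose source document the requesting user cannot read."""
    return [c for c in candidates if authorized(user_id, frozenset(c["acl"]))]

chunks = [{"chunk_id": "doc-42#003", "acl": {"legal-team"}},
          {"chunk_id": "doc-99#001", "acl": {"sales"}}]
print(retrieve_for_user("alice", chunks))   # only the legal-team chunk survives
```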
Context Leakage Prevention
Context leakage occurs when information retrieved for one user's query becomes accessible to another user. This happens through several mechanisms: cached responses that are served across users, model state that persists between sessions, embedding caches that do not enforce user-level isolation, and logging systems that capture full context payloads.
Preventing context leakage requires user-level session isolation (no shared model state between users), cache partitioning by user identity and permission level, PII scrubbing in logging pipelines, and regular purging of ephemeral context stores. In agentic architectures where the AI makes multiple tool calls per task, each call may access different data sources with different access controls. The context isolation boundary must be maintained across the entire agent execution graph.
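One concrete piece of this: partition the response cache so an entry is keyed by the user and a snapshot of their permissions, never by the query alone. A minimal sketch, with the key layout as an assumption:

```python
import hashlib

def cache_key(user_id: str, permission_groups: frozenset[str], query: str) -> str:
    """Identity and permission snapshot in the key prevent cross-user cache hits."""
    perms = ",".join(sorted(permission_groups))
    return hashlib.sha256(f"{user_id}|{perms}|{query}".encode()).hexdigest()

print(cache_key("alice", frozenset({"legal-team"}), "summarize contract 17"))
```

Including the permission snapshot means a revoked permission changes the key, so a response cached under broader access rights can never be served after those rights are withdrawn.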
Compliance: The Regulatory Context
The compliance landscape for enterprise AI is complex and accelerating. Every time an AI system assembles context from multiple data sources, it potentially crosses compliance boundaries. The context orchestration layer must enforce per-document, per-field compliance rules in real time.
The EU AI Act
The EU AI Act entered into force on August 1, 2024. It is the first comprehensive AI-specific regulation with global reach. Its four-tier risk classification system categorizes AI applications as unacceptable (prohibited), high-risk, limited risk, or minimal risk. High-risk AI systems, which include most enterprise applications that make decisions affecting individuals, require documented risk management systems, robust data governance measures, detailed technical documentation, automatic logging, human oversight, and accuracy, robustness, and cybersecurity safeguards.
Penalties are severe: up to 35 million euros or 7% of global turnover for deploying prohibited systems. The phased implementation begins with prohibited practices taking effect in February 2025, general-purpose AI model obligations in August 2025, and full applicability in August 2026. Enterprises building context architectures today must design for compliance that will be enforced within months.
For context engineering specifically, the EU AI Act's data governance requirements mean that every piece of context assembled for a high-risk AI system must be traceable, auditable, and governed. Training data provenance, retrieval source attribution, and decision-making transparency are regulatory requirements, not optional best practices.
GDPR and the Right to Erasure
GDPR's maximum fines of 20 million euros or 4% of annual global turnover are well known. Less discussed is the specific tension between GDPR requirements and enterprise AI context architectures.
The right to erasure (Article 17) requires that personal data be deleted upon request. In a vector database, this means not just deleting the source document but identifying and removing every vector chunk derived from that document, every knowledge graph node extracted from it, every cached retrieval result that includes it, and every model state that was influenced by it during fine-tuning. This is technically demanding: an embedding does not announce which document it was derived from, so enterprises need robust metadata linkage between source documents and their derivative representations across the entire context pipeline.
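A minimal sketch of what that linkage enables: an erasure cascade keyed on `source_doc_id`. The in-memory dictionaries stand in for the vector database, graph store, and retrieval cache; the interfaces are assumptions, the linkage is the point.

```python
vectors = {"doc-42#001": {"source_doc_id": "doc-42"},
           "doc-7#001": {"source_doc_id": "doc-7"}}
graph_edges = [{"subject": "AcmeCorp", "relation": "party_to", "source_doc_id": "doc-42"}]
retrieval_cache = {"key-1": {"source_doc_ids": {"doc-42", "doc-7"}}}

def erase_document(doc_id: str) -> None:
    """Cascade a right-to-erasure request through every derivative store."""
    for chunk_id in [c for c, m in vectors.items() if m["source_doc_id"] == doc_id]:
        del vectors[chunk_id]                       # derived chunk vectors
    graph_edges[:] = [e for e in graph_edges if e["source_doc_id"] != doc_id]  # extracted facts
    for key in [k for k, v in retrieval_cache.items() if doc_id in v["source_doc_ids"]]:
        del retrieval_cache[key]                    # cached results that included the document

erase_document("doc-42")
print(vectors, graph_edges, retrieval_cache)        # no trace of doc-42 remains
```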
Purpose limitation (Article 5) constrains how retrieved data can be used. Context assembled for customer support cannot be repurposed for marketing analytics without separate legal basis. Data minimization (Article 5) conflicts directly with the intuition that more context produces better AI outputs. Enterprises must retrieve the minimum necessary context for each query, not the maximum available context.
HIPAA and Protected Health Information
Any AI system accessing Protected Health Information requires Business Associate Agreements with all AI infrastructure providers. Major hospital systems report spending $300,000 to $500,000 to properly vet and implement a single complex AI algorithm. The context pipeline must ensure that PHI is never transmitted to model providers without appropriate contractual and technical safeguards. TEE-based inference and on-premises model deployment are the primary architectural responses.
SOC 2 and Audit Trails
SOC 2 Type II has become table stakes for enterprise AI vendors. Meeting its five trust service criteria in practice means encrypted storage, quarterly key rotation, and immutable audit trails. For context engineering, that requirement translates into logging every retrieval query, every context assembly decision, and every model interaction in an immutable, queryable audit store. The audit trail must answer: what context was assembled, from which sources, under which access permissions, for which user, at what time, and what was the model's response.
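A minimal sketch of an audit record that can answer those questions, using a hash chain for tamper evidence; the field names are illustrative assumptions, not the SOC 2 criteria themselves.

```python
import hashlib
import json
import time

def audit_record(user_id: str, query: str, sources: list[str],
                 permissions: set[str], response: str, prev_hash: str) -> dict:
    record = {
        "ts": time.time(),
        "user": user_id,                      # for which user
        "query": query,
        "context_sources": sources,           # what context, from which sources
        "permissions": sorted(permissions),   # under which access permissions
        "response_digest": hashlib.sha256(response.encode()).hexdigest(),
        "prev": prev_hash,                    # hash chain makes tampering evident
    }
    record["hash"] = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
    return record

genesis = "0" * 64
r1 = audit_record("alice", "summarize contract 17", ["doc-42#003"], {"legal-team"}, "...", genesis)
r2 = audit_record("alice", "follow-up", ["doc-42#004"], {"legal-team"}, "...", r1["hash"])
```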
This is not just a compliance checkbox. It is the foundation for debugging, incident response, and continuous improvement of the context pipeline. An enterprise that cannot trace how a specific AI response was generated from source data through retrieval to output cannot defend that response to regulators, auditors, or customers.
Safety: Guardrails on the Context Pipeline
Safety in context engineering means ensuring that the assembled context does not cause the AI system to produce harmful, biased, or misleading outputs. This goes beyond model-level safety (content filtering, toxicity detection) to the safety of the context itself.
Context Poisoning
If an attacker can inject malicious content into the enterprise knowledge base, that content will be retrieved and surfaced to users through the AI system. This is the context-layer analogue of training data poisoning, but it operates at inference time and can be executed against RAG systems without any access to the model itself. Defenses include integrity verification on ingested content, anomaly detection on newly added documents, source reputation scoring, and human review gates for sensitive knowledge domains.
Context Staleness and Temporal Safety
Enterprise knowledge changes. Policies are updated. Regulations evolve. Product specifications are revised. An AI system that retrieves outdated context may produce responses that were correct last quarter but are wrong today. Temporal validation in the context pipeline, ensuring that retrieved content reflects the current state of enterprise knowledge, is a safety requirement. This requires version-aware retrieval, TTL (time-to-live) metadata on indexed content, and automated re-ingestion pipelines triggered by source document changes.
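A minimal sketch of freshness checks at retrieval time, assuming each chunk carries a TTL and the version of the document it was cut from; the thresholds and field names are illustrative.

```python
from datetime import datetime, timezone

def is_fresh(chunk_meta: dict, current_versions: dict[str, int]) -> bool:
    """Exclude chunks whose TTL has lapsed or that predate the current document version."""
    now = datetime.now(timezone.utc)
    if datetime.fromisoformat(chunk_meta["expires_at"]) <= now:
        return False   # TTL lapsed: the chunk should be re-ingested before reuse
    return chunk_meta["version"] == current_versions[chunk_meta["source_doc_id"]]

meta = {"source_doc_id": "policy-7", "version": 3, "expires_at": "2026-01-01T00:00:00+00:00"}
print(is_fresh(meta, {"policy-7": 3}))   # True until the TTL lapses or version 4 ships
```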
Bias Propagation Through Context
AI systems inherit biases not just from their training data but from the context they receive at inference time. If the enterprise knowledge base disproportionately represents certain perspectives, demographics, or outcomes, the AI system will reflect those biases in its responses. Context-level bias auditing, analyzing the distribution and representation of the indexed knowledge base, is a necessary complement to model-level bias detection. This is particularly critical in regulated domains like lending, hiring, and insurance where biased outcomes carry legal liability.
Agentic AI Multiplies the Context Challenge
Everything above becomes harder with agentic AI. A chatbot needs context for a single response. An agent needs context maintained and evolved across multi-step reasoning chains, tool invocations, and state transitions. Agentic architectures represent 12% of enterprise AI deployments and are the fastest-growing category. Gartner predicts 15% of day-to-day work decisions will be made autonomously by agentic AI by 2028.
The context challenges compound:
- Multi-step permission propagation. When an agent calls a database tool, then a document retrieval tool, then a communication tool in sequence, each step may have different access control requirements. The context orchestration layer must enforce permissions at every boundary, not just at the initial query (a sketch of per-step enforcement follows this list).
- Context accumulation and leakage. As an agent executes a multi-step task, it accumulates context from each step. Information retrieved in step 3 should not influence the access control decisions in step 7 unless the permission model explicitly allows it. This requires fine-grained context isolation within the agent's execution graph.
- Audit trail complexity. A single agentic task may involve dozens of retrieval operations, tool calls, and intermediate reasoning steps. The audit trail must capture not just the final output but the full chain of context assembly decisions. This is an order-of-magnitude increase in logging volume and complexity compared to single-turn interactions.
- Tool standardization. Anthropic launched the Model Context Protocol (MCP) in November 2024 as an open standard for connecting AI systems to data sources and tools. Described as "USB-C for AI," MCP addresses the fragmentation problem where every tool integration requires custom engineering. For enterprise context engineering, MCP provides a standardized interface for context delivery, but the governance layer on top of MCP (who can access what through which tool) remains the enterprise's responsibility.
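As a minimal sketch of per-step enforcement, the loop below checks a policy table at every tool boundary and audits each crossing. The tool names and policy mapping are illustrative assumptions; a real deployment would derive both from the enterprise's governance layer.

```python
POLICY: dict[str, set[str]] = {
    "query_database": {"analysts"},
    "fetch_document": {"legal-team"},
    "send_email": {"managers"},
}

def run_agent_step(user_groups: set[str], tool: str, call) -> object:
    """Enforce the requesting user's permissions at every tool boundary."""
    allowed = POLICY.get(tool, set())
    if not user_groups & allowed:
        raise PermissionError(f"{tool}: denied for groups {sorted(user_groups)}")
    result = call()
    print(f"audit: {tool} executed under groups {sorted(user_groups & allowed)}")
    return result

run_agent_step({"analysts"}, "query_database", lambda: "rows...")
```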
Only one in five companies has a mature governance model for autonomous AI agents. Gartner warns that over 40% of agentic AI projects will fail by 2027 because legacy systems cannot support modern AI execution demands. The common failure mode will not be model capability. It will be context governance: the inability to safely, securely, and compliantly assemble the context that agents need to operate autonomously.
The Data Architecture Foundation
Context engineering does not exist in isolation. It requires a data architecture that supports governed, auditable, real-time access to enterprise knowledge. Two architectural patterns are converging:
- Data Fabric. An automated, metadata-driven architecture providing unified and seamless access, sharing, and governance across data silos. Knowledge graphs and semantic layers form the connective tissue that makes data understandable across systems. Data fabric provides the substrate on which context engineering operates.
- Data Mesh. A decentralized approach that shifts data ownership to domain experts, treating data as products with clear contracts, SLAs, and quality guarantees. Data mesh provides the organizational model that ensures context sources are well-governed at their origin.
Gartner's 2024 survey found 22% of enterprises adopted data fabric, 26% adopted data mesh, and 13% use both. The prediction is that by 2028, 80% of autonomous data products supporting AI-ready data will emerge from complementary fabric-mesh architectures. Enterprises investing in context engineering without investing in their underlying data architecture will hit a ceiling. The context pipeline can only deliver what the data architecture makes accessible and governable.
What Happens Next
The trajectory for 2025 and beyond is clear along several dimensions.
Models will continue to improve and commoditize. Context windows will expand further. Inference costs will continue to fall. But bigger context windows will not solve the enterprise context problem, because the problem is not token capacity. It is governance, access control, compliance, and trust. Filling a 2-million-token context window with ungoverned, unfiltered enterprise data is not a solution. It is a liability.
Agentic AI will drive a step function increase in context engineering complexity. Autonomous agents making multi-step decisions across enterprise systems will require context governance that operates at machine speed, enforcing permissions, compliance rules, and safety constraints at every step without human review in the loop. This is an infrastructure problem that most enterprises have not begun to address.
Regulation will accelerate. The EU AI Act's phased implementation will create compliance deadlines throughout 2025 and 2026. Other jurisdictions will follow with their own frameworks. Enterprises that build context architectures without governance will face retroactive compliance obligations that are expensive to satisfy after the fact. Building governance into the context pipeline from the start is cheaper than bolting it on later.
Privacy-preserving computation will become the default, not the exception. Confidential computing, federated learning, and differential privacy will move from research papers and proof-of-concepts to standard enterprise architecture patterns. The $3.17 billion invested in privacy-enhancing technologies in 2024 is early-stage spending. The projected growth to $28.4 billion within the decade reflects the recognition that privacy-preserving AI is not optional.
The enterprises that will capture the most value from AI are not those racing to deploy the newest model or expand their inference capacity. They are the ones investing in context infrastructure: the retrieval pipelines, knowledge graphs, governance frameworks, privacy-preserving computation layers, and audit systems that make it possible to safely, securely, privately, and compliantly assemble the context that AI systems need to be useful.
Context is the product. Context is the moat. Context is the work. Everything else is plumbing.