
The AI Context Layer Won’t Build Itself

My response to a16z’s article

Ido Bronstein
March 18th, 2026

The recent a16z article on the data context layer for AI agents struck a chord with us at Upriver. As a company building AI agents for data teams, we couldn't agree more: data context is everything. Connecting LLMs to data warehouses alone will not unlock reliable AI agents. 

I see the a16z article as an important step in framing the problem. But the real challenge now is not recognizing the need for a context layer - it’s understanding how that layer will actually be built and maintained inside organizations.

The 3 key points I strongly agree with:

  1. Context is critical for agents. Connecting an LLM to a warehouse is not enough. Agents will simply fail without data context. They must understand entities, relationships, and business definitions to operate reliably.

  2. The "wall" organizations hit with data agents. When deploying agents to real environments, teams quickly realize the problem isn't the model - it's the missing organizational context around the data. Context becomes the bottleneck. The detailed example in the article captures this well. It clearly shows why context is critical, even for a seemingly simple question like: “What was revenue growth last quarter?”

  3. Semantic vs. context layer. It’s important to understand the distinction between a semantic layer and a context layer, and this was very well articulated in the article. The semantic layer organizes data. The context layer captures how the business actually works - entities, relationships, operational logic, and institutional knowledge.
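To make the distinction concrete, here is a minimal sketch in Python (all names and fields are illustrative assumptions, not any particular product's API). The semantic layer's unit is a computable metric; the context layer's unit is a human-authored description of how the business works:

```python
from dataclasses import dataclass, field

# Semantic layer: organizes data. Its unit is a named, computable metric.
@dataclass
class Metric:
    name: str
    sql: str  # how to compute it from the warehouse

# Context layer: captures how the business works. Its units are entities,
# relationships, and institutional knowledge a query alone cannot express.
@dataclass
class Entity:
    name: str
    definition: str                          # business meaning, written by a human
    source_tables: list = field(default_factory=list)

@dataclass
class Relationship:
    source: str
    target: str
    meaning: str                             # e.g. "a Customer places Orders"

revenue = Metric("revenue",
                 "SELECT SUM(amount) FROM payments WHERE status = 'settled'")

customer = Entity(
    "Customer",
    definition="A billable account with at least one settled payment",
    source_tables=["crm.accounts", "billing.payments"],
)
places = Relationship("Customer", "Order", "a Customer places Orders")
```

Note that `revenue.sql` says nothing about *why* only settled payments count; that reasoning lives in `customer.definition` and the relationships around it, which is exactly the knowledge an agent needs.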

Where I think the conversation should go further:

Automation cannot build the context layer

The discussion around automated context construction is where I see a gap. While automation can be useful for bootstrapping a context layer, it cannot be the primary approach.

Automation can only look at patterns from the past. It can analyze historical queries, schemas, and logs, but a business is a constantly evolving system.

And here’s the fundamental problem with inferring context from historical metadata: if the context layer only captures meaning after patterns have already formed, then by definition there was a period when it didn’t reflect reality. Any decisions made during that gap were based on context you couldn’t trust.

The most important context often comes from decisions about the present - defining new business entities, changing metric interpretations, redefining system relationships, and introducing new product concepts. These cannot be inferred from query history; they must be intentionally defined by humans.

Human refinement vs. human ownership

The term “human refinement” in the article understates the role humans will play here. Humans will not just refine the context layer; they will own and guide it.

Organizations will need members of their data teams to define business entities, the relationships between those entities, and mappings between those concepts and raw data. Human ownership must be embedded directly in the workflow.
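One way to picture ownership embedded directly in the workflow is a minimal sketch like the following (the registry, function, and field names are all hypothetical): every definition entering the context layer must carry an accountable human owner, or it is rejected.

```python
# Hypothetical sketch: a context-layer registry that refuses unowned definitions,
# so human ownership is enforced by the workflow itself rather than by convention.
CONTEXT_REGISTRY: dict = {}

def register_definition(name: str, definition: str, mapping: dict, owner: str) -> None:
    """Add a business definition to the context layer; reject unowned entries."""
    if not owner:
        raise ValueError(f"definition '{name}' must have a human owner")
    CONTEXT_REGISTRY[name] = {
        "definition": definition,  # the business meaning, authored by a human
        "mapping": mapping,        # how the concept connects to raw data
        "owner": owner,            # who is accountable for keeping it current
    }

register_definition(
    "active_customer",
    definition="A customer with a settled payment in the last 90 days",
    mapping={"table": "billing.payments", "filter": "status = 'settled'"},
    owner="data-team@example.com",
)
```

The design choice here is that ownership is a required input, not optional metadata: a definition without an owner is exactly the kind of entry that drifts out of date silently.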

The rise of the “Knowledge Engineer”:

This shift will lead to the emergence of a new role: an evolution of today’s data engineers and analysts, focused not just on pipelines or dashboards but on modeling organizational knowledge for AI systems and humans alike.

I call this role the Knowledge Engineer. Their job is to translate how the business actually works into structured context that AI agents can understand and use. As companies become increasingly AI-driven, I believe this role will become inevitable in the modern data team.

The real question:

So the key challenge is not:

“How do we automatically extract entities and relationships from historical queries?”

The real question is:

How do we enable humans to continuously evolve business definitions within the context layer, quickly and safely, at the speed the agentic era demands?

Right now, this workflow is extremely slow. Defining a business concept can be relatively straightforward, but translating it into something usable - for both agents and humans - is much harder.

Raw data is messy by nature. It’s full of inconsistent naming conventions, fragmented formats, and misalignment across source systems. Making it usable often requires heavy transformations and complex computations. This all needs to be done efficiently, not recalculated every time an agent interacts with it.
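To make that friction concrete, here is a minimal sketch (all source systems and field names are hypothetical) of the kind of transformation a single entity can demand when the same “customer” arrives from two systems with inconsistent naming and formats:

```python
# Hypothetical sketch: two source systems describe the same customer with
# different field names, casings, and ID types. The transformation maps both
# onto the one shape the context layer defines for a Customer.
def normalize_customer(record: dict, source: str) -> dict:
    """Map a raw source record onto the Customer shape the context layer defines."""
    if source == "crm":
        # The CRM exports camelCase fields and lowercase country codes
        return {"customer_id": str(record["customerId"]),
                "country": record["countryCode"].upper()}
    if source == "billing":
        # Billing exports snake_case fields and numeric IDs as strings
        return {"customer_id": str(record["cust_id"]),
                "country": record["country_code"].upper()}
    raise ValueError(f"unknown source: {source}")
```

Even this toy version hints at the real cost: every new source system, renamed field, or format change means more code like this to write, test, and keep in sync with the business definition.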

In practice, this pushes humans into deep technical work: building and maintaining data pipelines that are slow to implement and difficult to evolve.

The result? The context layer never becomes the living, trusted system it needs to be. Instead, it ends up static, outdated, and slow to adapt.

What we’re building at Upriver:

This is exactly the problem we are focused on solving. 

Our belief is simple: Organizations will only be able to build a true context layer if data teams are empowered to define business knowledge directly. That’s why we are building agents for data teams.

The goal of our agents is not to replace the human defining the context. Instead, it is to remove the technical friction required to implement it.

Our agents handle the heavy technical work across the data stack, allowing data teams to focus on understanding the business, defining entities and relationships, and connecting those concepts to real data.

In other words, enabling the Knowledge Engineer inside every organization. Because in the end, the context layer cannot be generated automatically; it must be authored by humans and operationalized by machines.

Join the conversation:

I welcome your thoughts on this perspective. Reach out to me on LinkedIn to continue the discussion. And if you want to see how we’re approaching this at Upriver, request a demo.

Meet Your AI-Ready Data Engineering Copilot

Book a demo to see how Upriver lets your team stop firefighting – and start building