The conversations around AI coding agents lately have been focused almost entirely on application code. How to optimize React components. How to scaffold API endpoints. How to refactor microservices. There’s real progress on how to structure codebases so agents can navigate, build, and verify their changes autonomously - and it's excellent work.
But data teams are being left out of the conversation.
At the Coding Agents Conference in March 2026, Shrivu Shankar (VP AI at Abnormal Security) presented a workshop on "Optimizing Codebases for Agents" and his core insight was blunt: "Your agent isn't broken. Your codebase is."
He introduced a three-dimension scorecard - rules/config, file organization, test/verification - and showed that agent performance depends far more on how the codebase is structured than how clever your prompts are.
That insight applies directly to data platforms. But data platforms have characteristics that make agent readiness both harder and more important than what the codebase framework covers. You can't just apply the same three dimensions to a dbt project and call it a day. If you do, your agent will simply fail. Data platforms need their own framework.
This post explains why. Stay tuned for our next post, which provides the data-platform-specific framework itself - five dimensions, a practical scorecard, and concrete quick wins you can act on immediately.
Why data platforms break AI agents
Data platforms aren't "codebases with SQL." They have four properties that make agent success fundamentally harder.
The metadata problem
In application development, there's a saying that good code documents itself. A well-named function like calculate_monthly_revenue() tells the agent most of what it needs to know. Even without a clear name, the logic in it usually reveals the answer. In data, meaning is hidden. A column called rev_3 might represent net revenue after refunds, or something else entirely. The definition is tribal knowledge - locked in someone’s head, maybe even someone who left the company eight months ago. Agents don't necessarily have visibility into how this column is calculated. Software agents can read the source of truth. Data agents often can't, because the source of truth was never written down.
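One way to make that tribal knowledge agent-readable is a machine-readable data dictionary. Here is a minimal sketch in Python; the column names, formula, and owner are hypothetical, not from any real schema:

```python
# Hypothetical data dictionary an agent could query before touching a column.
# Every name and definition here is illustrative.
DATA_DICTIONARY = {
    "orders.rev_3": {
        "definition": "Net revenue after refunds, in USD",
        "formula": "gross_amount - refund_amount",
        "owner": "finance-data@example.com",
    },
}

def describe_column(qualified_name: str) -> str:
    """Return the written-down definition, or flag missing metadata."""
    entry = DATA_DICTIONARY.get(qualified_name)
    if entry is None:
        return f"UNDOCUMENTED: {qualified_name} has no recorded definition"
    return f"{qualified_name}: {entry['definition']} ({entry['formula']})"
```

The point is not the data structure - dbt YAML descriptions or a catalog tool serve the same role - but that the definition exists somewhere an agent can read it instead of guessing.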
The multi-tool problem
An application codebase mostly contains the information an agent needs to do its job, and application agents usually operate within one system. Data agents don’t.
A data platform consists of:
+ a warehouse (Snowflake, BigQuery, Databricks)
+ an orchestrator (Airflow, dbt Cloud, Dagster)
+ a transformation layer (dbt, SQLMesh)
+ a BI tool (Looker, Tableau, Metabase)
Each with its own configuration, its own conventions, and its own failure modes. The data agent needs to reason across all of them to actually work. Even when connecting MCPs and skills, guiding the agent to look at the right tool at the right time is harder than it sounds.
The blast radius problem
A single staging model can feed dozens of downstream models. These feed dashboards, reverse ETL syncs, and ML features. Modify that staging model incorrectly and everything downstream is affected - but nothing visibly breaks. The downstream models still build. The dashboards still render. The numbers just quietly shift.
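Computing a change's blast radius is just a reachability walk over the lineage graph. A minimal sketch, with a toy graph whose model names are made up:

```python
from collections import deque

# Toy lineage graph: edges point downstream. All model names are illustrative.
LINEAGE = {
    "stg_orders": ["int_order_items", "fct_orders"],
    "int_order_items": ["fct_orders"],
    "fct_orders": ["dash_revenue", "ml_ltv_features"],
}

def blast_radius(model: str) -> set:
    """Everything downstream of `model` - the set a change can silently affect."""
    seen, queue = set(), deque([model])
    while queue:
        for child in LINEAGE.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

In practice this graph comes from your transformation tool's manifest (dbt's `manifest.json`, for example); the value is surfacing it to the agent before it edits a staging model, not after the numbers drift.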
Someone discovers the problem two weeks later when a monthly reconciliation surfaces a discrepancy. Application codebases have their own version of this (a bad API change can cascade too), but data platforms are uniquely prone to silent failures - changes that propagate without producing errors, where the only signal that something went wrong is a number that looks slightly off to someone who knows what it should be.
The verification gap
Application agents can run tests and get a clear signal: pass or fail. Data agents face a harder problem: verifying not just syntactic correctness (does the SQL compile?) but semantic correctness (does this query actually calculate net revenue the way our CFO defines it?). A query can return a perfectly plausible number and be completely wrong. Without tests that encode business logic, there is no definition of what "correct" means. And the agents have no way to know.
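What "a test that encodes business logic" means in practice: pin the definition itself, not just the syntax. A minimal sketch - the net-revenue rule below (gross minus refunds, excluding test orders) is a hypothetical stand-in for whatever your finance team's definition actually is:

```python
def net_revenue(rows):
    """Net revenue as finance defines it: gross minus refunds,
    excluding test orders. The rule itself is illustrative."""
    return sum(r["gross"] - r["refund"] for r in rows if not r["is_test"])

rows = [
    {"gross": 100.0, "refund": 10.0, "is_test": False},
    {"gross": 50.0, "refund": 0.0, "is_test": True},  # excluded by definition
]

# A semantic test pins the business rule, not just that the query runs:
assert net_revenue(rows) == 90.0
```

With a check like this in the suite, an agent that changes the calculation gets a hard failure instead of a plausible-looking wrong number.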
These four properties mean that when agents fail on data platforms, they fail differently than they do on application code. And they fail in predictable ways.
6 common AI agent failures in data engineering
When agents work on data platforms that aren’t ready, they fail in predictable ways. Here are six anti-patterns to watch out for, adapted from the eight codebase anti-patterns in Shankar's original framework, filtered for data relevance and extended with data-specific failure modes.
The "looks right" trap
The agent writes a query that returns a plausible number. It joins orders to users and calculates average revenue per user. The number looks reasonable - $47.50. But it joined on the wrong grain, double-counting users who placed multiple orders. No test catches this because the only tests are not_null and unique on the primary key. This is the data equivalent of "tests pass but the feature is wrong" - and the most dangerous anti-pattern because the output is indistinguishable from a correct result.
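The grain mistake is easy to see on toy data. A sketch in Python (the same logic the agent's SQL would express; the numbers are made up):

```python
users = [{"user_id": 1}, {"user_id": 2}]
orders = [
    {"user_id": 1, "amount": 40.0},
    {"user_id": 1, "amount": 60.0},
    {"user_id": 2, "amount": 50.0},
]

# Wrong grain: averaging over joined rows counts user 1 twice,
# so this is really average revenue per *order*.
per_order = sum(o["amount"] for o in orders) / len(orders)  # 50.0

# Right grain: aggregate to one row per user first, then average.
totals = {}
for o in orders:
    totals[o["user_id"]] = totals.get(o["user_id"], 0.0) + o["amount"]
per_user = sum(totals.values()) / len(users)  # 75.0
```

Both numbers are plausible; only a test that asserts the output grain (one row per user) or a known benchmark value would distinguish them.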
Metric drift
The agent calculates "churn rate" using its own definition - users who haven't logged in for 30 days divided by total users. Your company defines churn as canceled subscriptions in the trailing 90 days divided by active users at the start of the period. Both are reasonable definitions. Only one matches what your CFO reports to the board. Without a semantic layer or machine-readable glossary, every agent-written metric is a potential source of inconsistency across your organization.
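To make the drift concrete, here are both definitions computed over the same toy user set (all fields and dates are invented for illustration):

```python
from datetime import date, timedelta

TODAY = date(2026, 3, 1)  # fixed date so the example is deterministic

users = [
    {"last_login": TODAY - timedelta(days=45), "canceled_90d": False, "active_at_start": True},
    {"last_login": TODAY - timedelta(days=5),  "canceled_90d": True,  "active_at_start": True},
    {"last_login": TODAY - timedelta(days=90), "canceled_90d": False, "active_at_start": True},
    {"last_login": TODAY - timedelta(days=1),  "canceled_90d": False, "active_at_start": True},
]

# Agent's definition: inactive for 30+ days / total users
agent_churn = sum(1 for u in users if (TODAY - u["last_login"]).days >= 30) / len(users)

# Finance definition: cancellations in trailing 90 days / active at period start
finance_churn = (sum(1 for u in users if u["canceled_90d"])
                 / sum(1 for u in users if u["active_at_start"]))

# Same data, two defensible answers: 0.5 vs 0.25
```

Neither number is "wrong" in isolation - which is exactly why the canonical definition has to live somewhere the agent can read it.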
Schema contamination
The agent needs to build a new mart and looks for existing patterns. It finds a legacy staging model with an old naming convention, deprecated columns, and a join pattern that was replaced six months ago. It copies that pattern faithfully into the new mart - "pattern pollution" from the codebase framework, data edition. The agent can't distinguish a current best practice from a historical artifact nobody got around to deleting.
The mega-CTE
The agent is asked to build a new model but can't find the right intermediate model to build on - because the project is poorly organized, or the names aren't greppable, or there's no header comment explaining what each model does. So it writes a 15-CTE model from scratch, reimplementing joins and business logic that already exist in three other files. The model works, but the duplicated logic will drift as the originals are updated.
Unguarded DDL
The agent encounters an error and decides to start fresh with CREATE OR REPLACE TABLE - against production. This is fundamentally an access-control problem (the agent shouldn't have DDL permissions on prod), but it's also a guardrails problem. Without hooks that block destructive operations, the agent treats production the same as development, and there's nothing in the environment to tell it otherwise.
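A guardrail hook can be as simple as a pre-execution check that refuses destructive statements against production targets. A minimal sketch - the statement list and the `prod` naming convention are assumptions you'd adapt to your own warehouse:

```python
import re

# Statements we consider destructive; extend for your warehouse's dialect.
DESTRUCTIVE = re.compile(
    r"\b(drop\s+table|truncate\s+table|create\s+or\s+replace\s+table)\b",
    re.IGNORECASE,
)

def guard(sql: str, target: str) -> None:
    """Raise before a destructive statement reaches a production target.
    Targets prefixed 'prod' are treated as production - an assumed convention."""
    if target.startswith("prod") and DESTRUCTIVE.search(sql):
        raise PermissionError(f"Blocked destructive DDL against {target}: {sql[:60]}")

guard("SELECT 1", "prod_analytics")  # reads pass through untouched
```

Real setups would put this in the agent's tool layer or a warehouse proxy, backed by actual role-based permissions - the regex is the sketch, not the security model.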
Test deletion
A data test fails after the agent's change. Instead of investigating and fixing the model, the agent removes the failing test. The build passes. The PR looks clean. And nobody notices that a critical business-rule check just disappeared - until the next month-end close surfaces numbers that don't reconcile. Without protected baseline tests, the agent is grading its own homework.
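"Protected baseline tests" can be enforced with a CI step that diffs the PR's test inventory against a committed baseline and fails if anything disappeared. A minimal sketch, with hypothetical test identifiers:

```python
# Baseline of tests that may never be removed; identifiers are illustrative.
BASELINE_TESTS = {
    "fct_orders:revenue_reconciles",
    "fct_orders:no_future_dates",
}

def removed_baseline_tests(tests_in_pr: set) -> list:
    """Return baseline tests missing from the PR; non-empty should fail CI."""
    return sorted(BASELINE_TESTS - tests_in_pr)
```

In a dbt project, `tests_in_pr` could be parsed from the compiled manifest; the key design choice is that the baseline lives outside the files the agent edits, so passing CI requires fixing the model rather than deleting the check.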
The root cause: It’s not the AI agent
None of these failures are about the agent being “bad”. Every one of them is a rational response to an environment that doesn’t provide the information or guardrails the agent needs. The agent writes SQL in the wrong dialect because nothing told it which dialect to use. It duplicates logic because it can't find the existing logic. It ships wrong numbers because no test encodes what “right” means. It drops a production table because nothing prevents it.
The problem isn’t the agent. It’s the platform.
Why this matters now
Three forces are converging that make this urgent.
Data complexity has outpaced humans
The modern data stack has more dependencies, more tools, more moving parts than any team can manually understand. Operational overhead is growing faster than headcount. Hiring more data engineers doesn’t scale, and even when you can hire, onboarding takes months because the knowledge needed to operate the platform is tribal - locked in people’s heads, not written down anywhere a new engineer (or an agent) can access.
Generic AI agents lack context in data environments
Organizations are investing heavily in AI agents across the business, but agents are only as good as the context they operate on. In data engineering, that context is scattered across warehouses, orchestrators, and code repos that no single agent can see. The data isn’t clean enough, documented enough, or connected enough to power reliable AI, so generic agents fall short. The result: AI pilots in data engineering produce unreliable results - not because the AI is bad, but because nothing gives it a trustworthy understanding of the environment, the right connections, and the right orchestration.
Data teams are already feeling the pain
Observability is in place, but it isn’t enough - it detects problems but doesn't solve them. Engineers are experimenting with AI coding tools but hitting the context wall, spending fifteen to twenty minutes explaining their environment before the agent can do anything useful. Leadership is asking for AI-ready data infrastructure, but the foundation isn’t there. There is a missing capability in the stack, and the teams that feel it most know it.
Context has become the bottleneck for AI adoption in data engineering. The problem isn’t the agent, it’s that nothing feeds it a trustworthy, living understanding of the environment.
What comes next
The original codebase readiness framework needs to be adapted and extended for data platforms. We’ve done that - building a five-dimension framework tailored for data: agent configuration, schema and metadata readiness, pipeline organization, data testing, and observability and lineage. In our next post, we walk through each dimension with before-and-after examples, provide a scored rubric you can run with your team, and give concrete quick wins that each take less than an hour.
The teams that will get the most out of AI agents aren’t the ones with the best prompts - they’re the ones whose platforms are structured so agents can navigate schemas, understand business context, verify their own work, and reason about downstream impact.
AI success in data engineering starts with platform readiness - not just smarter agents.
Want to see how ready your data platform is for AI agents? In our next post, we share a simple scoring framework and quick wins you can implement in under an hour. If you're already exploring AI agents, you can also reach out to see how we help teams make their data platforms agent-ready.

