Data Teams Aren’t Dying. They’re Finally Becoming What We Hoped They’d Be.
AI is making data teams smaller, sharper, and harder to fake your way into
I’ve been thinking a lot about the impact AI is going to have on data teams. Not in the abstract “AI will replace everyone” sense, but in the very practical sense of how data work actually gets done day to day.
I’m more optimistic than most, with one important caveat.
I do think data teams are going to look better than they ever have. I also think they’re going to be smaller.
In fact, I think data teams are starting to resemble much more closely what I hoped they would be when I entered this world nearly ten years ago. Small groups of people who deeply understand the business, work closely with stakeholders, and use data to drive real decisions.
Back then, the appeal wasn’t building endless pipelines or debugging transformations. It was leverage. The idea that a handful of people, armed with the right tools and judgment, could have an outsized impact.
For a long time, reality didn’t quite match that vision. Data teams grew large, work became fragmented, and much of the effort went into maintenance, translation, and busywork rather than insight.
AI might be what finally closes that gap.
But it comes with a tradeoff.
As the work becomes more leveraged, the bar inevitably goes up. There will be fewer seats at the table and more competition for them. And that competition will be less about who can grind through basic tasks and more about who truly excels at the craft.
That’s uncomfortable. It’s also unavoidable.
Why data teams got bloated (and why that’s changing)
For years, data teams became bloated not because businesses needed that much analytical insight, but because maintaining the data stack itself required so much ongoing effort.
A large share of the work was necessary, but low leverage:
Keeping pipelines from breaking
Rewriting transformations as schemas changed
Debugging downstream failures
Rebuilding reports after upstream changes
Headcount grew to support maintenance, not to create value. Over time, the size of the data team became more about keeping the lights on than about helping the business make better decisions.
Coding skill became the bottleneck. And because it was the bottleneck, it became the primary signal of value inside data teams.
This wasn’t irrational. Writing production-quality SQL, Python, and dbt models used to be genuinely expensive. Debugging was slow. Deployments were fragile. Context switching was painful. The people who could grind through that work were incredibly valuable.
But two things have changed, dramatically.
First, AI has collapsed the cost of writing and iterating on code. What used to take hours of careful implementation and debugging can now be done in a fraction of the time. The bottleneck is no longer typing. It’s knowing what to build and why.
Second, the data tooling ecosystem has quietly absorbed a huge amount of the pain that once justified large teams.
Orchestration and deployment no longer require bespoke infrastructure. You can buy something like Dagster, write a modest amount of Python, and deploy it serverlessly.
Event instrumentation no longer means hand-rolled tracking plans and brittle customer code. Drop Segment into an app and start emitting structured, standardized events.
Extract-and-load pipelines used to mean Fivetran. Even that assumption is changing. Tools like dlt make it possible to build and deploy custom pipelines with far less effort, often by working directly with AI inside frameworks designed for it.
Reverse ETL used to be bespoke glue code. Now it’s as simple as pointing modeled datasets at Census or Hightouch and flipping a switch.
Taken together, a massive amount of what once required custom engineering effort is now automated, abstracted away, or dramatically easier to build.
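To make the extract-and-load piece concrete: the pattern that tools like dlt automate (pull records from a source, infer a schema, land them in a warehouse) is simple enough to sketch in a few lines. This is a deliberately naive, stdlib-only illustration with made-up data, not how any particular tool implements it; real tools add incremental loading, schema evolution, and retries on top of this core loop.

```python
import sqlite3

# Hypothetical source data, standing in for an API response.
records = [
    {"id": 1, "plan": "pro", "mrr": 99.0},
    {"id": 2, "plan": "free", "mrr": 0.0},
]

def load(rows, table, con):
    """Naive extract-and-load: infer columns from the first row, then insert."""
    cols = list(rows[0])
    con.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(cols)})")
    con.executemany(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({', '.join('?' * len(cols))})",
        [tuple(r[c] for c in cols) for r in rows],
    )

con = sqlite3.connect(":memory:")  # stand-in for the warehouse
load(records, "subscriptions", con)
print(con.execute("SELECT COUNT(*) FROM subscriptions").fetchone()[0])  # → 2
```

The point is less the code than the economics: when this loop is this cheap to write and AI can generate most of it, "build a custom pipeline" stops being a reason to add headcount.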
And yet, many data teams still operate as if none of this has changed.
That’s where the confusion crept in.
We started optimizing for the production of clean, clever code because that's what was historically hard, and we mistook that output for impact.
Producing elegant pipelines and models is not the same thing as helping the business make better decisions.
They’re related. But they’re not the same.
When value became invisible
Outside of the data team, almost no one cared how elegant your SQL was, how clean your dbt models were, or how clever your abstractions were.
That stuff mostly mattered to us.
From the rest of the company’s perspective, data work happened quietly in the background. Pipelines ran. Models refreshed. Dashboards updated. And yet, it was often unclear how any of this translated into better decisions or real outcomes.
This created a familiar tension. Data teams felt constantly busy, while everyone else quietly wondered what the data team was actually spending its time on.
Part of the issue is that data work looks a lot like engineering, but produces something fundamentally different.
A software engineer ships a feature. You can see it, click it, and play with it. A data engineer ships a pipeline. That pipeline produces datasets, which then get handed off to data scientists, who may or may not turn them into analysis that actually changes behavior.
By the time value shows up, it’s several steps removed from the original work.
That invisibility wasn’t a failure of the role. It was a failure of the interface.
And that’s where AI changes the story.
What self-serve analytics actually looks like now
When people hear “self-serve analytics,” they often picture dashboards no one trusts, metric sprawl, and endless debates about definitions.
That’s not what this is.
In an AI-native data stack, self-serve analytics starts to look less like reporting and more like product development.
A real business problem gets identified. The company needs a new data asset. Maybe it’s product usage, costs, funnel behavior, or something else that doesn’t exist cleanly yet.
The data engineer’s job is to get the raw data into the warehouse. Sources are wired up, schemas are understood, and data lands reliably. This work still matters, but it’s faster and more predictable than it used to be thanks to modern tooling and AI-assisted development.
Then the analytics engineer steps in.
Their job isn’t just to transform data. It’s to shape meaning. They define the grain. They encode business logic. They decide which joins are valid, which metrics are safe, and which questions the data can and cannot answer.
What comes out of that work is not just a set of tables. It’s a real data asset.
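"Defining the grain" and "deciding which metrics are safe" can sound abstract, so here is a minimal sketch of what those contracts look like in practice. The table and column names are hypothetical, and in a real stack these checks would live in dbt tests or a semantic layer rather than ad hoc Python; the logic is the same either way.

```python
from collections import Counter

# Hypothetical modeled table: one row per (account_id, month) — that is the grain.
fct_revenue = [
    {"account_id": "a1", "month": "2024-01", "mrr": 100.0},
    {"account_id": "a1", "month": "2024-02", "mrr": 120.0},
    {"account_id": "a2", "month": "2024-01", "mrr": 50.0},
]

def assert_grain(rows, keys):
    """Fail loudly if two rows share the declared grain — the kind of
    contract an analytics engineer encodes so downstream use stays safe."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    dupes = [k for k, n in counts.items() if n > 1]
    assert not dupes, f"grain violated for: {dupes}"

def total_mrr(rows, month):
    """A 'safe' metric: summing is only valid because the grain
    guarantees no account is double-counted within a month."""
    return sum(r["mrr"] for r in rows if r["month"] == month)

assert_grain(fct_revenue, ["account_id", "month"])
print(total_mrr(fct_revenue, "2024-01"))  # → 150.0
```

Encoding the grain as an explicit, enforced property is what turns a pile of tables into an asset other people (and AI) can trust.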
And here’s the key shift. That asset is no longer released quietly to a small group of analysts. It’s shipped to the entire company.
The analytics engineer effectively launches the dataset. They share it. They document it. They explain how to explore it. And because AI understands the underlying data models, anyone can start asking questions immediately in a conversational, guided, and trustworthy environment.
People don’t need to know SQL. They don’t need to understand the entire schema. They reason through questions with an AI that already knows the structure, the definitions, and the constraints.
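One way to picture how an AI can "already know the structure, the definitions, and the constraints" is as a machine-readable contract published alongside the dataset. The shape below is purely illustrative (the field names and the `validate_request` helper are invented for this sketch, not any vendor's API), but it shows the guardrail idea: reject questions the model was never designed to answer before any SQL runs.

```python
# Hypothetical "semantic contract" an AI assistant could be grounded in:
# declared metrics, allowed dimensions, and the model's known caveats.
contract = {
    "dataset": "fct_revenue",
    "grain": ["account_id", "month"],
    "metrics": {"mrr": "sum of monthly recurring revenue"},
    "dimensions": ["plan", "month", "region"],
    "caveats": ["no data before 2023-01", "refunds excluded"],
}

def validate_request(metric, group_by):
    """Check a requested question against the contract before generating a query."""
    if metric not in contract["metrics"]:
        return False, f"unknown metric: {metric}"
    bad = [d for d in group_by if d not in contract["dimensions"]]
    if bad:
        return False, f"dimension(s) not in the model: {bad}"
    return True, "ok"

print(validate_request("mrr", ["plan"]))   # → (True, 'ok')
print(validate_request("churn", ["plan"])) # rejected: metric not declared
```

The conversational layer on top can vary; what makes the environment "guided and trustworthy" is that the analytics engineer's definitions and constraints are explicit enough for software to enforce.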
This is where the work of data engineers and analytics engineers becomes visible in a way it rarely was before.
Instead of pipelines running silently in the background, they build something people actually interact with. You can see it being used. You can see questions being asked. You can see decisions being made because of it.
They are no longer just keeping the lights on. They are shipping a product the entire organization uses.
This flow also eliminates a massive amount of busywork.
Today, many data teams spend huge amounts of time building bespoke dashboards. Someone asks for a view. A dashboard gets created. Then come the follow-ups. Change this cut. Add that filter. Reformat this chart. Break it out by one more dimension.
In an AI-native self-serve world, most of that simply goes away.
Instead of building hundreds of one-off dashboards, teams build a small number of high-quality data assets and let people explore them dynamically with AI. Exploration happens through conversation and iteration, not endless dashboard revisions.
Dashboards don’t disappear. They just get a much clearer job.
They become the place for core reporting. The source of truth for the metrics that matter most. The canonical place everyone goes to understand what the number is.
Not the place where every ad hoc question turns into a permanent artifact.
That shift alone saves enormous amounts of time and makes the work of data engineers and analytics engineers far more visible and valued.
Why the data scientist role is actually better now
Once self-serve analytics works the way it should, something important happens. Data scientists get their time back.
They’re no longer the default interface for every metric, report, or one-off question. The constant stream of “can you pull this number?” requests starts to fade.
That’s not because the business needs less insight. It’s because the right questions are being answered closer to where decisions are made.
What this gives back to data scientists is focus.
Instead of spending time on basic analytics and reactive support, data scientists get to work on problems that actually require depth. Ambiguity, tradeoffs, and uncertainty. The questions where there isn’t an obvious SQL query or dashboard waiting at the end.
AI compounds this shift.
With code generation no longer the bottleneck, data scientists can explore ideas dramatically faster. Hypotheses that once took days to test can now be explored in hours. Entire branches of analysis that felt too expensive before suddenly become feasible.
The role becomes more exploratory, more creative, and more intellectually demanding. It starts to look much closer to research and problem solving than production analytics.
In other words, the job begins to resemble what many people hoped data science would be in the first place.
What this means for data scientists (and why the bar is higher)
There’s a flip side to all of this.
As AI removes friction from execution, it exposes what actually matters. And that raises the bar.
Data scientists are no longer differentiated by their ability to write code from scratch or debug pipelines under pressure. AI can do much of that now, quickly and competently.
What it cannot do on its own is decide what to do.
The real differentiator becomes ideas and judgment:
Knowing which problems are worth solving
Understanding which methodologies apply to which situations
Recognizing the assumptions hiding inside a model
Knowing when results don’t pass the sniff test
There is an enormous and growing toolkit available to data scientists. Causal inference methods, experimental designs, forecasting techniques, Bayesian models, and more.
AI has knowledge of all of them.
But knowledge isn’t the same as understanding.
AI can surface methods, write code, and explain tradeoffs, but only if it’s guided by well-framed questions. Framing those questions correctly requires deep, internalized understanding of both the domain and the methods.
That’s where great data scientists stand out.
They know how to translate messy business problems into precise analytical questions. They know which assumptions matter and which don’t. They know how to push AI in the right direction, challenge its output, and iterate intelligently.
In an AI-native world, data scientists aren’t replaced by models. They become the people who unlock models.
You’re graded less on how much code you can produce and more on how well you can think. On how deeply you understand the landscape of methods available. On how effectively you can use AI as a force multiplier for that understanding.
The bar isn’t lower. It’s higher.
There will be fewer roles. More competition. And less room for people whose value comes from pushing basic work around.
But for those who truly care about the craft, this is the best version of the job we’ve had yet.
It’s harder. It’s sharper. And it finally looks like the role many of us signed up for in the first place.


It's about shoveling content into the agent. At the business level, the agent needs:
1. Data - all of it. As pre-cleaned / prepared as feasible.
2. Prose - explanatory, about the business (and data)
3. Prose - procedural, about known operating patterns
And the 'agent' part is just Claude Code in a file system with code execution and a library of utils. Anything else makes it worse.
Nice writeup! Really nailed the dynamics of how data teams have both felt so busy and also struggled to justify their business impact in recent years.
One additional benefit we've gotten on the data science devex: analysis execution has gotten dramatically faster. Apple silicon, GPU-accelerated libraries like XGBoost, fast local engines like DuckDB, and the near-zero friction of spinning up burst compute on tools like Modal all mean that sophisticated analyses now run orders of magnitude faster, locally or with minimal infra work. So in addition to fewer one-off stakeholder requests on the frontend, we can also reclaim time on the backend simply by waiting less for things to run.
I also liked your commentary on dashboards:
> “They become the place for core reporting… not the place where every ad hoc question turns into a permanent artifact.”
I've long believed that dashboard tools are the wrong place for ad hoc analysis, so this statement resonated. The open question for me here is: how does the core actually expand in this world?
I imagine expansion topics mostly emerge from second- and third-order questions data scientists explore with their newfound time. But another known friction point in our field is turning a one-off analysis into a reusable, production-grade artifact (think notebook --> dbt models + dashboard/rETL sync). Does that deployment work also get largely solved by AI coding tools, or is there still some deployment pain that needs to be solved by someone with knowledge of both the company's data stack and the net-new analysis content that needs to be deployed?