Why Opinionated, Code-First Systems Are the Future of Data
How AI amplifies strong abstractions instead of replacing them
Most problems in the data space can be broken down into two fundamentally different challenges. The first is building, deploying, and hosting environments where work can happen. The second is handling the complex, domain-specific logic required to actually do the work.
For a long time, these two problems were tightly coupled. If you wanted to solve meaningful problems with data, you also had to build and maintain the infrastructure that made that work possible. In practice, that meant many data engineering teams spent the bulk of their time standing up environments, wiring orchestration, managing compute, and keeping pipelines alive. That coupling made data work expensive, slow, and inaccessible for many teams, not because the insights were hard to find, but because the cost of supporting the underlying systems was so high.
What’s changing now is that AI combined with opinionated, code-first tools is finally breaking that coupling. To make this concrete, I’ll use a familiar example throughout this article: the classic ETL process.
Problem 1: Building, Deploying, and Hosting Environments
Historically, just getting an environment stood up to run data workflows was non-trivial. Entire blog posts, conference talks, and internal docs were written about how hard it was to get Airflow running reliably, often followed by an even longer detour into Kubernetes. Teams would spend weeks or months just trying to get a scheduler deployed, workers scaled correctly, logs centralized, retries behaving, and permissions wired up. All of this happened before a single line of business logic was written. By the time the environment was stable, a huge amount of engineering energy had already been spent on infrastructure rather than on solving actual data problems. This is where an enormous amount of data team effort quietly disappeared.
Tools like Dagster have fundamentally shifted this baseline by tackling both structure and deployment at the same time. Dagster is opinionated about how data work should be represented: assets, jobs, schedules, and sensors are first-class concepts with clearly defined relationships. That structure forces teams to be explicit about dependencies, freshness, and ownership, rather than letting orchestration logic sprawl across ad hoc DAGs.
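Dagster's actual API is richer than this, but the core of the abstraction can be sketched in a few lines of framework-free Python: dependencies between assets are declared up front, and execution order falls out of the graph rather than being wired by hand. The asset names here are invented for illustration, and this is not Dagster's real API.

```python
# An illustrative, framework-free sketch of asset-based orchestration:
# each asset declares what it depends on, so materialization order is
# derived from the graph instead of ad hoc DAG wiring.
from graphlib import TopologicalSorter

# Map each asset to its explicit upstream dependencies.
ASSETS = {
    "raw_orders": set(),
    "cleaned_orders": {"raw_orders"},
    "daily_revenue": {"cleaned_orders"},
}

def materialization_order(assets: dict) -> list:
    # static_order() yields assets so every dependency runs first.
    return list(TopologicalSorter(assets).static_order())

print(materialization_order(ASSETS))
```

Because the dependencies are explicit in the declaration, questions like "what must rerun if `raw_orders` changes?" become graph queries instead of tribal knowledge.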
At the same time, Dagster dramatically lowers the operational burden of running this system. With a serverless deployment model, you can define everything in code while getting orchestration, observability, retries, and state awareness out of the box. As a data scientist, not a platform engineer, I can operate production-grade workflows without needing to own the underlying infrastructure.
For the vast majority of workflows, Dagster’s serverless deployment model is more than sufficient. Most pipelines don’t need massive amounts of compute or memory, and the simplicity of a managed, serverless environment is a huge win.
But there are always edge cases. Large historical backfills, unusually wide tables, or memory-intensive workloads can push beyond what a default serverless environment is designed to handle.
This is where Modal fits naturally into the picture. Modal allows jobs defined in Dagster to execute with effectively unlimited compute and memory, on demand. Instead of redesigning your system or standing up bespoke infrastructure, those edge cases become configuration choices. The environment stays simple for 90% of workloads, while still scaling cleanly when you need it to.
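To make "edge cases as configuration choices" concrete, here is a framework-agnostic Python sketch. The runner functions, threshold, and job names are invented stand-ins, not Modal's real API; the point is only that routing a heavy job to bigger compute is a per-job setting, not a redesign.

```python
# A sketch of "edge cases as configuration": each job declares its
# resource needs, and a dispatcher routes oversized jobs to on-demand
# compute. All names here are hypothetical illustrations.
from dataclasses import dataclass
from typing import Callable

@dataclass
class JobConfig:
    name: str
    memory_gb: int = 2  # default serverless-sized footprint

def run_locally(job: Callable, cfg: JobConfig) -> str:
    return f"{cfg.name}: ran in the default serverless environment"

def run_on_heavy_compute(job: Callable, cfg: JobConfig) -> str:
    return f"{cfg.name}: offloaded to on-demand compute ({cfg.memory_gb} GB)"

def dispatch(job: Callable, cfg: JobConfig, threshold_gb: int = 8) -> str:
    # The routing decision is configuration, not infrastructure work.
    runner = run_on_heavy_compute if cfg.memory_gb > threshold_gb else run_locally
    return runner(job, cfg)

print(dispatch(lambda: None, JobConfig("daily_sync")))
print(dispatch(lambda: None, JobConfig("backfill_2019", memory_gb=64)))
```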
Taken together, Dagster and Modal largely solve the first problem. The environment is easy to stand up, observable by default, and capable of scaling without deep infrastructure expertise.
Problem 2: Handling Complex Job Logic
Once the environment exists, you still need to define what the job actually does. In the case of ETL, this logic is deceptively complex. You need to understand third-party APIs, manage state across runs, handle schema drift, implement incremental loading, deal with rate limits and retries, ensure idempotency, and support backfills without corrupting downstream data.
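Two of those concerns, state across runs and idempotency, can be made concrete with a deliberately minimal sketch. Here an incremental cursor is persisted between runs and writes are merged by primary key, so re-running the pipeline never duplicates data. `fetch_page` is a stand-in for a real third-party API, and the whole thing is an illustration of the pattern, not production code.

```python
# A minimal sketch of incremental, idempotent loading: a cursor
# persisted across runs plus merge-by-primary-key writes.
import json
import os
import tempfile

STATE_FILE = os.path.join(tempfile.gettempdir(), "pipeline_state.json")

def load_cursor() -> int:
    try:
        with open(STATE_FILE) as f:
            return json.load(f)["last_updated"]
    except FileNotFoundError:
        return 0  # first run: start from the beginning

def save_cursor(value: int) -> None:
    with open(STATE_FILE, "w") as f:
        json.dump({"last_updated": value}, f)

def fetch_page(since: int) -> list:
    # Stand-in for an API call; returns records updated after `since`.
    source = [{"id": 1, "updated": 10}, {"id": 2, "updated": 20}]
    return [r for r in source if r["updated"] > since]

def run_pipeline(destination: dict) -> None:
    cursor = load_cursor()
    for record in fetch_page(since=cursor):
        # Merge keyed on id: re-runs overwrite rather than duplicate.
        destination[record["id"]] = record
        cursor = max(cursor, record["updated"])
    save_cursor(cursor)
```

Running `run_pipeline` twice loads each record exactly once: the second run sees the saved cursor and fetches nothing new. Every real ingestion tool layers retries, schema handling, and backfill support on top of exactly this skeleton.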
For a long time, tools like Fivetran succeeded by owning this entire layer. They encoded the logic for common systems and ran it inside infrastructure you didn’t have to think about.
But the key insight is that this logic, while complex, is also highly standardized. The way one company pulls data from Stripe or Salesforce looks remarkably similar to how every other company does it. That makes it an ideal candidate for strong, opinionated abstractions.
This is exactly what tools like dlt provide. dlt doesn’t ask you to invent ingestion patterns. It encodes best practices around extraction, state management, incremental loading, retries, and schema evolution. The solution space is intentionally narrow. There are fewer clever ways to do things, and far more correct ones.
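This is not dlt's real API, but a toy sketch of its central idea: ingestion behavior is declared (a primary key, a write disposition, a cursor column) rather than invented per pipeline, and a single generic loader does the rest. The narrowness is the feature.

```python
# A toy sketch of declarative ingestion: the caller states *what*
# (key, disposition, cursor); the generic loader owns *how*.
def load(records, table, *, primary_key, write_disposition="merge",
         cursor=None, state=None):
    state = state if state is not None else {}
    for r in records:
        if cursor and r[cursor] <= state.get("cursor", -1):
            continue  # incremental: skip rows already loaded
        if write_disposition == "merge":
            table[r[primary_key]] = r  # upsert keyed on primary key
        else:  # "append"
            table[len(table)] = r
        if cursor:
            state["cursor"] = max(state.get("cursor", -1), r[cursor])
    return state
```

Because the loader owns the mechanics, every pipeline that uses it inherits correct incremental and merge behavior for free. That is precisely the kind of constraint that makes a codebase predictable, for humans and for AI.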
Where AI Actually Fits
This is the part that often gets misunderstood. The promise of AI in data is not that it can invent bespoke systems from scratch. That path leads to fragile pipelines that are hard to debug and even harder to maintain.
The real promise is that AI can operate inside opinionated systems and assemble proven patterns extremely quickly. You effectively create a constrained space where AI is exceptionally effective when you combine:
a well-defined execution environment (Dagster + Modal)
strong opinions about how jobs should be structured (dlt)
Inside a tool like Cursor, AI isn’t guessing. It’s inheriting the opinions baked into these tools and composing them correctly. It can craft pipelines that follow dlt’s patterns, register them cleanly in Dagster, and configure them to run at scale in Modal, all within the context of an existing codebase.
You’re not letting AI “go its own way.” You’re letting it stand on top of the expertise of the people who built these tools.
The Bigger Shift
This pattern extends far beyond ETL. We’re moving toward a world where:
environments are easy to build, deploy, and scale,
complex logic is captured in opinionated libraries, and
AI dramatically lowers the activation energy required to glue those pieces together.
In that world, building a new data pipeline starts to feel less like bespoke engineering and more like structured composition. You still have code. You still have control. But you no longer need to be an expert in every layer of the stack to be effective.
That’s not vibe coding. That’s AI amplifying opinionated, code-first systems.
What This Means for Tool Builders
The natural conclusion of all of this is that our attention should shift toward the people building tools that fit cleanly into this framework.
The next generation of essential data infrastructure won’t be defined by how much complexity it hides behind a UI. It will be defined by how well it encodes hard-earned opinion and how easily that opinion can be expressed in code.
If you’re building a tool for the data ecosystem, the key question is no longer “Can we make this no-code?” It’s:
Can we do one important job, in a very opinionated way, inside an environment that AI can reliably operate within?
The most valuable tools will:
Solve a narrow but painful problem
Make strong, explicit choices about how that problem should be solved
Expose those choices through simple, composable interfaces
Be easy for AI to reason about with minimal context and documentation
In this world, documentation stops being something only humans read. It becomes part of the interface AI uses to compose systems on our behalf. Clear abstractions, predictable patterns, and explicit constraints aren’t just good engineering practices. They’re what make a tool legible to AI in the first place.
This is how a tool becomes essential again. Not by owning an entire workflow end to end, but by becoming the obvious building block that shows up everywhere because it does one thing exceptionally well.
Build opinionated, code-first tools that slot cleanly into scalable environments, and let AI handle the glue.
A Bet on Dashboards
My bet is that this pattern extends well beyond ETL. There’s still a massive opportunity in the data space to solve the problem of building, hosting, and deploying dashboards in the same way.
For years, dashboards have largely lived in one of two worlds. On one end, rigid BI tools, where you’re constrained by whatever UI components and interaction patterns the tool happens to support. On the other, bespoke web apps, which offer flexibility but are fragile, expensive to build, and rarely worth the long-term maintenance cost. Both approaches break down quickly.
With traditional BI tools, the moment you want to do anything moderately complex, you hit a wall. Running a regression, simulating outcomes, generating predictions, and then visualizing those results are all far harder than they should be. These workflows are trivial in Python, but awkward or impossible in most dashboarding tools.
Dashboards also fundamentally lack context. They don’t know anything about your business by default. They don’t understand which metrics matter, how they’re defined, or how they relate to one another. To fix that, teams start wiring up semantic layers and governance frameworks, which quickly become entire systems of their own.
Tools like Streamlit already hint at a different future: dashboards as code, built with clear opinions, running in managed environments, and capable of executing complex logic directly.
What’s been missing is everything else this article has described.
If the environment for deploying and scaling dashboards is solved, and the primitives for building them are opinionated and composable, AI becomes incredibly effective here too, especially when those tools understand the surrounding data context. A dashboarding tool that knows about your code-first dbt transformation layer, the models that already exist, the metrics you’ve defined, and the grain of your data can let AI handle not just layout and interaction, but also the complex SQL and analytical logic underneath.
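As a purely hypothetical sketch of what "context-aware" dashboarding could look like, imagine metric definitions living in code, with SQL composed from them rather than hand-written per chart. Every name, table, and column below is invented; no existing tool's API is being described.

```python
# Hypothetical sketch: metric definitions as code, queries composed
# from them. The tool, not each analyst (or the AI), decides how a
# metric query is shaped -- the solution space stays narrow.
METRICS = {
    "mrr": {"table": "fct_subscriptions", "expr": "sum(amount)"},
    "active_users": {"table": "fct_events", "expr": "count(distinct user_id)"},
}

def metric_query(metric: str, grain: str = "month") -> str:
    m = METRICS[metric]
    return (
        f"select date_trunc('{grain}', occurred_at) as {grain}, "
        f"{m['expr']} as {metric} from {m['table']} group by 1 order by 1"
    )

print(metric_query("mrr"))
```

A registry like this is exactly the kind of explicit, constrained interface that makes a dashboarding tool legible to AI: asking for "weekly active users" becomes a lookup and a parameter, not freehand SQL.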
At that point, dashboards stop being thin UI layers over static queries. They become first-class analytical artifacts, written in code and deeply connected to the rest of your data stack. And because that code lives inside opinionated systems with strong defaults, it becomes accessible far beyond traditional technical roles. It’s entirely plausible that a non-technical stakeholder could open a tool like Cursor, ask a well-formed question, and have AI assemble a high-quality dashboard or answer a simple analytical question by composing existing models, metrics, and visualizations correctly.
The leverage doesn’t come from everyone learning to code. It comes from code-first systems being legible enough that AI can do the heavy lifting on their behalf.
Solve the environment problem. Encode strong opinions about how the work should be done. Make those opinions easy to express in code. Let AI do the assembly.
That’s the future I’m betting on.

