Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 6 min | 1146 words

You've read the story a hundred times: “I told Codex (or Claude, or Antigravity, etc.) to build me a full app to run my business, and 30 minutes later, it’s done.” These stories usually celebrate the new ecosystem and the ability to build complex systems without having to dive into the details.

The benchmarks celebrate "one-shotting" entire applications, as if that's the relevant metric. I think this is the wrong framing entirely, mostly because I care very little about disposable software, the stuff that you stop using after a few days or a week. I work on projects whose lifetime is measured in decades.

AI agent-driven development isn't about the ability to use a one-shot prompt to generate a full-blown app that matches exactly what the user wants. That is a nice trick, but nothing more, because after you generate the application, you need to maintain it, add features (and ensure stability over time), fix bugs, and adjust what you have.

The process of using AI agents to build long-lived applications is distinctly different from what I see people bandying about. I want to dedicate this post to discussing some aspects of using AI agents to accelerate development in long-lived software projects.

Code quality only matters in the long run

The key difference between one-off work and long-lived systems is that we don’t care about code quality at all for the one-off stuff. It's a throwaway artifact. Run it, get your answer, move on. I am usually not even going to look at the code that was generated; I certainly don’t care how it is structured.

If I need to make any changes, or have to come back to it in six months, it is usually easier to just regenerate the whole thing from scratch rather than trying to maintain or evolve it.

When you're talking about an application that will live for a decade or more - or worse, an existing application with decades of accumulated effort baked into it - what happens then? The calculus changes completely. How do you even begin to bring AI into that kind of system?

It turns out that proper software architecture becomes more relevant, not less.

Software architecture as context management for AI

Think about what good software architecture actually gives you: components, layers, clear boundaries, and well-defined responsibilities. The traditional justification is that this lets you make small, careful, targeted changes. You know where to go, and you can change one thing. You slowly evolve things over time. Your changes don't break ten others because not everything is intermingled.

Now think about how an AI operates on a codebase. It works within a context window. That constraint isn't unique to AI; people work under it too. There is only so much you can keep in your head, and proper architecture means separating concerns so you can work with just the relevant details in mind.

When your architecture is clean, the AI can focus on exactly the right piece of the system. When it isn't, you're either feeding the AI irrelevant noise or hiding the context it actually needs from it.

Good architecture, it turns out, is also a good AI interface. And the reason this works is the same as for people: it reduces the cognitive load you have to carry while understanding and modifying the system. For AI, we call it the context window. For people, it is cognitive load. Different terms, same concept.
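To make that concrete, here is a minimal sketch of a boundary that keeps the required context small. Every name in it (`InvoiceStore`, `InMemoryInvoiceStore`, `apply_discount`) is hypothetical and made up for illustration; none of it comes from RavenDB. The point is only that the billing rule depends on a narrow interface, so a person or an agent changing that rule needs to read the interface and nothing about how storage works.

```python
from typing import Protocol


class InvoiceStore(Protocol):
    """The boundary: the only thing the billing rule knows about storage."""

    def total_for(self, customer_id: str) -> float: ...


class InMemoryInvoiceStore:
    """One hypothetical implementation that lives behind the boundary."""

    def __init__(self, invoices: dict[str, list[float]]) -> None:
        self._invoices = invoices

    def total_for(self, customer_id: str) -> float:
        # Unknown customers simply have no invoices.
        return sum(self._invoices.get(customer_id, []))


def apply_discount(store: InvoiceStore, customer_id: str, rate: float) -> float:
    """Billing rule: depends on the interface, not on a concrete store."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return store.total_for(customer_id) * (1.0 - rate)


store = InMemoryInvoiceStore({"acme": [100.0, 50.0]})
discounted = apply_discount(store, "acme", 0.1)
```

A change to the discount rule touches `apply_discount` alone, and a change to storage touches only the store class. Either diff is small enough to review with the interface as the only shared context.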

Beyond the mechanical benefits, good architecture gives you two things that I think are underappreciated in this conversation.

The first is structural comprehension. You don't need to have every line of a large codebase in your head. But you do need a genuine mental model of how data flows, how components relate, and where things live. That's only possible if the architecture actually reflects the system's intent.

When using AI to generate code, you need to have a proper understanding of the flow of the system. That allows you to look at a pull request and understand the changes, their intent, and how they fit into the greater whole. Without that, you can't meaningfully review the code. You're just rubber-stamping diffs you don't have a hope of understanding.

The second is that the work has shifted. We're moving from "how do I write this code?" to "how do I review all of this code?". Nobody is going to meaningfully maintain 30,000 lines a day of dense AI code. At that point, the codebase has escaped human comprehension, and you've lost the game. This isn’t your project anymore, and sooner or later, you’ll face the Big Decision.

Turtles all the way down

I hear the proposed solution constantly: "I have an agent that writes the code, an agent that tests it, an agent that reviews the reviews, and so on." This is, I think, genuinely insane for anything that matters.

We already have evidence from the field that this doesn’t work. Amazon has had production failures from AI-generated code produced through exactly these kinds of layered-AI pipelines. Microsoft's aggressive approach to AI integration has shown what happens when AI-generated code enters production with minimal meaningful human oversight.

In both of those cases, the “proper oversight” was also provided by AI. And the end result wasn’t encouraging for this pattern of behavior. For critical systems that carry real consequences, "AI supervising AI" is not a thing.

AI works when you treat it as a tool in your hands, not as an autonomous system you've delegated to. An engineer who understands architecture and can look at a diff and say "this is right" or "this is wrong, and here's why" is much more capable with AI than without it.

An engineer who has offloaded comprehension to the machine is flying blind; worse, they are flying very fast directly into a cliff wall.

What should you do about it?

When we treat AI agents as a tool, it turns out that not all that much needs to change. The current processes you have in place (CI/CD, testing, review cycles, etc.) are all about being able to generate trust in the new code being written. Whether a human wrote it or a GPU did is less interesting.

At the same time, we have decades of experience building big systems. We know that a Big Ball of Mud isn’t sustainable. We know that proper architecture means breaking the system into digestible chunks. Yes, with AI you can throw everything together, and it will sort of work for a surprisingly long time. Until it doesn’t.

With a proper architecture, the scope you need to keep track of is inherently limited. That allows you to evolve over time and make changes that are inherently limited in scope (thus, reviewable, actionable, etc.).

“The more things change, the more they stay the same.” It is a nice saying, but it also carries a fundamental truth. Using AI doesn’t absolve us from the realities on the ground, after all.

time to read 5 min | 813 words

Like everyone else, we have been using AI in various forms for a while now, from asking ChatGPT to write a function to asking it to explain an error, then graduating to running it on our code in the IDE, and finally to full-blown independent coding assistants.

Recently, we shifted into a much higher gear, rolling it out across most of the teams at RavenDB. I want to talk specifically about what that looks like in practice in real production software.

RavenDB is a mature codebase, with about 18 years of history behind it. The core team is a few dozen developers working on this full-time. We also care very deeply about correctness, performance, and maintainability.

With all the noise about Claude, Codex, and their ilk recently, we decided to run some experiments to see how we can leverage them to help us build RavenDB.

The numbers that got my attention

We started with features that were relatively self-contained — ambitious enough to be real work, but isolated enough that an AI agent could take them end-to-end without stepping on core aspects of RavenDB.

The first one was estimated at about a month of work for a senior developer. We completed it in two days. To be fair, a significant portion of that time was spent learning how to work effectively with Claude as an agent, learning the ropes and the right discipline and workflows, not just the task itself.

The second was estimated at roughly three months for an initial version. It was delivered in about a week. And we didn't just hit the target — we significantly exceeded the planned feature set.

In terms of efficiency, we are talking about a proper leap from what we previously could expect.
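As a rough back-of-the-envelope sketch of those ratios: the estimates above are stated in months, days, and weeks, so turning them into a speedup factor needs assumptions. The figures of 21 working days per month and 5 per week are my assumptions for illustration, not numbers from the projects themselves, and the result is only as precise as the original estimates.

```python
# Assumptions for illustration only; the post gives estimates, not day counts.
WORKING_DAYS_PER_MONTH = 21
WORKING_DAYS_PER_WEEK = 5


def speedup(estimated_days: float, actual_days: float) -> float:
    """Ratio of estimated effort to actual elapsed working days."""
    return estimated_days / actual_days


# Feature 1: estimated at about a month, completed in two days.
first = speedup(1 * WORKING_DAYS_PER_MONTH, 2)

# Feature 2: estimated at roughly three months, delivered in about a week.
second = speedup(3 * WORKING_DAYS_PER_MONTH, 1 * WORKING_DAYS_PER_WEEK)

print(f"feature 1: ~{first:.1f}x, feature 2: ~{second:.1f}x")
```

Under these assumptions, both features land at roughly an order of magnitude faster than the estimate.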

This isn't vibe coding

I want to be direct about something: this is not "prompt it and ship it." There is a discipline required here. The AI can move very fast, explore a lot of ground, and generate code that looks right, but isn’t. Code ownership and engineering responsibility don't go away; they become much more demanding.

I personally sat and read 30,000 lines of code. I had to understand what was there, push back on decisions, redirect the approach, and enforce the standards that RavenDB has built up over many years.

Those 30,000 lines of code didn’t appear out of thin air. They were the final result of a lot of planning, back and forth with the agent, and incremental steps in the right direction (along with many in the wrong one).

To be fair, 30,000 lines of code sounds like a lot, right? About 60% of that is actually tests, and about half of the remaining code is boilerplate infrastructure that we need to have, but isn’t really interesting.

The juicy parts are only around 5,000 lines or so.
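The breakdown is easy to check with a few lines of arithmetic. This is a sketch that takes the rounded percentages above at face value; the variable names are mine.

```python
total_lines = 30_000

# About 60% of the total is tests.
test_lines = total_lines * 60 // 100

# About half of what remains is boilerplate infrastructure.
remaining_lines = total_lines - test_lines
boilerplate_lines = remaining_lines // 2

# Whatever is left is the code that carries the actual logic.
core_lines = remaining_lines - boilerplate_lines

print(test_lines, boilerplate_lines, core_lines)
```

The rounded percentages give about 6,000 lines of core code, which is in line with the "around 5,000 or so" figure given that every input is approximate.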

In many respects, this isn’t prompt-and-go but feels a lot more like a pair programming session on steroids.

What AI agents give you is the ability to explore the problem space cheaply and quickly. After we had something built, I had a different idea about how to go about implementing it. So I asked it to do that, and it gave me something that I could actually explore.

Being able to evaluate multiple different approaches to a solution is crazy valuable. It is transformative for architectural decisions.

On top of that, using a coding agent to take on all the boilerplate meant that I was able to focus on the “fun parts”, the pieces that actually add the most value, not everything else that I need to do to get to that part.

What this means going forward

AI agents are going to amplify your existing engineering culture, for better or worse.

A lot of the cost of writing good software is going to move from actually writing code to reviewing it. For many people, the act of writing the code was also the part where they thought about it most deeply.

Now the thinking part moves either upfront, to the planning phase, or to the end, when you look at the pull request. Previously, when reading a pull request, you could reasonably expect to see code that had already been reasoned about and properly tamed.

Now, in some cases, this is the first time that a human is actually going to properly walk through the whole thing. To ensure proper quality, you also need to shift a lot of your focus to that part.

The bottleneck for good software is going to be the review cycle, the architectural approach, and an experienced team that can actually evaluate the output and ensure consistently high quality.

Without that, you can go very fast, but just generating code quickly is a losing proposition. You’ll go very fast directly into a painful collision with a wall.

We are still settling down and trying to properly understand the best approach to take, but I have to say that this experiment was a major success.
