
I spent $25,507 on AI coding agents in twelve weeks

A twelve-week receipt from real agentic engineering, and what the numbers actually mean for your own spend.

Akshat Jain · 8 min read · from a 14-page talk
The takeaway

Cheaper tokens do not lower your AI bill. They raise how much you do, so the question is not “can I afford a session” but “is the work this session produces worth more than it costs.”

The number, said plainly

Akshat Jain spent $25,507 on AI coding agents over twelve weeks. That is the whole headline. Not a projection, not a pitch for a tool, just what the meter read after three months of building.

Two agents did the work: Codex and Claude Code. Between them they moved 34.2 billion tokens and produced about 2,250 commits. That is the raw shape of it. Big numbers, but numbers you can hold.

Before going further, the honest caveats, because they change how much weight any of this carries. These are ccusage estimates derived from model pricing, not a bill from a provider. Commits are a rough proxy for output, not a clean measure of value. And the repo has a second contributor, so not every line traces back to one person and one agent. Keep all three in mind as you read.

$25,507 over twelve weeks. Not a projection. What the meter read.

  • $25,507 spent over twelve weeks
  • Two agents: Codex and Claude Code
  • 34.2 billion tokens, about 2,250 commits

Where the money actually went

The first instinct on seeing a number like this is to imagine it scattered across experiments, side quests, and half-finished toys. It was not. 95% of the spend went into a single codebase: the product he was building.

And almost all of that was production engineering. Feature work. QA. Onboarding flows. Deploys. Debugging. Not prompt-tinkering, not chatting with a model about ideas. The unglamorous middle of shipping software, done over and over.

If you are trying to reason about your own spend, this is the first useful divide. Money spent exploring is different from money spent building the thing you already decided to build. This was overwhelmingly the second kind.

95% went into one codebase. Almost all of it was the unglamorous middle of shipping software.

  • 95% of spend went into one product codebase
  • Almost all of it was production engineering
  • Feature work, QA, onboarding, deploys, debugging

Spend concentrates, and a few runs dominate

Here is the part that surprises most people. The cost was not spread evenly across the work. It clumped. Of 596 sessions, the 20 largest accounted for 85% of all tokens. A small handful of runs ate almost the entire budget.

The single biggest run makes the point on its own. A $5,203 Codex session did exactly one job: open every page of the app in a browser and test it. That one session was 19% of the whole bill. One run, nearly a fifth of three months of spending, just to walk the product like a user and check that it worked.

This is worth sitting with. The expensive thing was not writing code. It was thorough, exhaustive verification. The agent did the kind of QA pass a human tester would do, except it did all of it, every page, without getting bored or cutting corners. That completeness costs tokens, and the tokens cost money.

One session, $5,203, did one job: open every page and test it. That was 19% of the whole bill.

  • 596 sessions total; the 20 largest were 85% of tokens
  • Biggest run: a $5,203 Codex browser-QA session
  • That one session alone was 19% of the bill
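
To check whether your own spend clumps the same way, a top-k share is enough. The sketch below is a minimal illustration, not the method behind the figures above: `top_k_share` is a hypothetical helper and the `sessions` list is invented data. Only the two dollar figures come from the article. On those rounded figures the biggest run works out to about 20% of the total, slightly above the 19% quoted; the difference may reflect rounding or a different denominator in the underlying ccusage estimates.

```python
def top_k_share(values, k):
    """Return the fraction of the total contributed by the k largest values."""
    total = sum(values)
    if total == 0:
        return 0.0
    largest = sorted(values, reverse=True)[:k]
    return sum(largest) / total


# Hypothetical per-session token counts, invented for illustration only.
sessions = [900, 50, 20, 10, 10, 5, 3, 2]
share_top_1 = top_k_share(sessions, 1)
share_top_2 = top_k_share(sessions, 2)

# Dollar figures quoted in the article (ccusage estimates, not a provider bill).
TOTAL_SPEND_USD = 25_507
BIGGEST_RUN_USD = 5_203
biggest_run_share = BIGGEST_RUN_USD / TOTAL_SPEND_USD

print(f"top 1 of {len(sessions)} sessions: {share_top_1:.0%} of tokens")
print(f"biggest run share of spend: {biggest_run_share:.1%}")
```

Run against a real export of per-session costs, the same function shows quickly whether a handful of long runs is driving the bill.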

What the work looked like underneath

Strip away the dollar figure and look at the actual shape of the activity, and it looks much more like ordinary engineering than like a chatbot conversation. By volume: 106,000 shell commands, 36,000 sed edits, 18,000 git calls, 15,000 ripgrep searches, and a couple thousand browser-QA passes.

Read that list again. Running commands. Editing files with sed. Committing with git. Searching the codebase with ripgrep. Checking the live app in a browser. These are the same moves a developer makes by hand, just done at a volume no person would sustain.

This is what people miss when they picture AI coding as one clever answer to one clever prompt. Real agentic engineering is mostly grinding: search, edit, run, check, commit, repeat, thousands of times. The intelligence shows up in the loop, not in a single brilliant reply.

Real agentic engineering is mostly grinding: search, edit, run, check, commit, repeat, thousands of times.

  • 106,000 shell commands
  • 36,000 sed edits, 18,000 git calls
  • 15,000 ripgrep searches, a couple thousand browser-QA passes

The unit economics, and the trap inside them

Per unit of output, the work was cheap. About $11 per commit. Under two cents per net line of code. By those numbers, an individual commit or an individual line feels almost free, which is exactly the feeling that drives the total up.

That is the trap, and it has a name. As each token got cheaper, he used far more of them. Weekly usage grew roughly 7,800x between March and May. The price per unit fell, and consumption rose so much faster that the bill climbed anyway. This is Jevons Paradox: the old observation that when a resource becomes cheaper or more efficient to use, total consumption of it often rises rather than falls.

So the cheap unit cost is not the reassuring fact it looks like. It is the engine. When each commit costs $11, you do not do fewer commits to save money. You do more of them, because each one is so affordable, and the aggregate is what shows up on the statement. Cheaper tokens did not produce a smaller bill. They produced a much larger appetite.

The price per unit fell, and consumption rose so much faster that the bill climbed anyway.

  • About $11 per commit
  • Under two cents per net line of code
  • Weekly usage grew roughly 7,800x from March to May
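
The unit figures follow from the headline totals, and the trap is plain multiplication: the bill is price times quantity, so it rises whenever usage grows faster than price falls. The sketch below is a rough check under stated assumptions. The three totals come from the article; the blended cost per million tokens and the implied line count are derived here, not quoted; and the before/after prices and volumes in the `bill` example are hypothetical.

```python
# Totals quoted in the article (ccusage estimates).
TOTAL_SPEND_USD = 25_507
TOTAL_TOKENS = 34.2e9
COMMITS = 2_250  # "about 2,250"

# Derived unit costs.
cost_per_commit = TOTAL_SPEND_USD / COMMITS
cost_per_million_tokens = TOTAL_SPEND_USD / (TOTAL_TOKENS / 1e6)

# "Under two cents per net line" implies at least this many net lines.
min_net_lines = TOTAL_SPEND_USD / 0.02


def bill(price_per_unit, units):
    """Total spend is unit price times units consumed."""
    return price_per_unit * units


# Hypothetical numbers: the unit price halves while usage grows tenfold.
before = bill(1.00, 100)
after = bill(0.50, 1_000)

print(f"cost per commit: ${cost_per_commit:.2f}")
print(f"blended cost per million tokens: ${cost_per_million_tokens:.2f}")
print(f"implied net lines: at least {min_net_lines:,.0f}")
print(f"bill before: ${before:.0f}, after: ${after:.0f}")
```

In the hypothetical case the price drops by half and the bill still grows fivefold, which is the same dynamic at a much smaller scale.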

What a normal person should take from this

First, do not anchor on the total. $25,507 is what twelve weeks of full-time, near-continuous agentic engineering on one product costs at today's prices. If you are dabbling, fixing a bug here, generating a script there, your number will look nothing like this. The headline is a ceiling for a specific, intense way of working, not a typical monthly bill.

Second, expect your spend to concentrate. A few big, thorough runs, especially exhaustive QA passes that walk the whole app, can dwarf everything else. If you want to control cost, that is where to look first. Watch the long sessions, not the short ones. One careful browser-test run was roughly a fifth of the entire bill here.

Third, and most important, plan for Jevons Paradox in your own usage. As the tools get cheaper, your instinct will be to do more, not to save. That is not a flaw to fix; it is the dynamic to expect. The right question is never “can I afford this session.” It is whether the work a session produces is worth more than what it costs. If a $5,203 run finds the bugs across every page of your product before your users do, that may be a bargain. If it does not, no per-line price makes it cheap. Judge the output, not the unit price, and the economics stay legible.

The right question is never “can I afford this session.” It is whether the work is worth more than what it costs.

  • The total reflects intense full-time use, not a typical bill
  • Watch the few long runs; they dominate the cost
  • Judge sessions by output value, not by per-line price
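
One way to make "worth more than it costs" concrete is a break-even count. This is a minimal sketch under an assumption you would need to supply yourself: `break_even_bugs` is a hypothetical helper, and the $500 cost of a bug that reaches users is an invented figure. Only the $5,203 session cost comes from the article.

```python
import math


def break_even_bugs(session_cost_usd, cost_per_escaped_bug_usd):
    """Smallest number of caught bugs at which the session pays for itself."""
    if cost_per_escaped_bug_usd <= 0:
        raise ValueError("cost per escaped bug must be positive")
    return math.ceil(session_cost_usd / cost_per_escaped_bug_usd)


SESSION_COST_USD = 5_203        # the browser-QA run from the article
COST_PER_ESCAPED_BUG_USD = 500  # hypothetical: support, fixes, lost users

needed = break_even_bugs(SESSION_COST_USD, COST_PER_ESCAPED_BUG_USD)
print(f"bugs the run must catch to break even: {needed}")
```

The useful part is the estimate you are forced to write down. If you cannot put even a rough value on what a long run is supposed to produce, that is a signal to scope it before starting it.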
