Agentic Coding Is Not a Trap: I Answered the Viral HN Post With My Own Production Logs
I made the exact mistake that viral post criticizes: I gave an agent an ambiguous task and went to make coffee. I came back 40 minutes later to 23 modified files, three broken tests, and a refactor nobody asked for. I'm not telling this story to complain — I'm telling it because that day I started keeping logs of my agent sessions, and what I found contradicts both the HN post and the usual evangelists.
"Agentic Coding Is a Trap" currently sits at 367 points on Hacker News. The central argument is that agents give you the illusion of speed while silently accumulating technical debt. It's a good argument. It's also incomplete. And I have the numbers to prove it.
Real Agentic Coding Productivity in Production: What My Logs Actually Say
I keep a simple CSV. Date, task, estimated manual time, actual time with agent, outcome: saved / rabbit hole / neutral. It's not science — it's my field notebook. But it's mine and nobody can argue with it.
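For the record, one row of that CSV maps onto something like the following TypeScript shape. The field names and example values are illustrative, not the literal columns of my file:

// Hypothetical shape of one row in the session log (names and values are illustrative)
interface AgentSessionLog {
  date: string;               // ISO date, e.g. "2025-07-14"
  task: string;               // short slug for the task
  estimatedManualMin: number; // how long I think it would have taken by hand
  actualAgentMin: number;     // wall-clock time with the agent, review included
  outcome: "saved" | "rabbit_hole" | "neutral";
}

const exampleRow: AgentSessionLog = {
  date: "2025-07-14",
  task: "add-index-posts-created-at",
  estimatedManualMin: 60,
  actualAgentMin: 8,
  outcome: "saved",
};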
Over the last 6 weeks of active use of Claude Code on my stack (Next.js, TypeScript, PostgreSQL on Railway), here's the summary:
| Task type | Sessions | Avg savings | Rabbit holes |
|---|---|---|---|
| Boilerplate with clear pattern | 14 | 68 min | 1 |
| Refactor with fuzzy scope | 8 | -22 min (lost time) | 6 |
| Debugging with concrete logs | 11 | 41 min | 2 |
| Architecture or new design | 5 | -55 min | 4 |
The number that hit me hardest: when scope is fuzzy, I lose time 75% of the time. When scope is concrete, I gain time 85% of the time.
This isn't a trap. It's a contract. And the signature matters.
// Example of a prompt that guarantees a rabbit hole
// ❌ Bad: open scope, agent improvises
const badPrompt = `
Improve the API performance
`;
// ✅ Good: closed scope, agent executes
const goodPrompt = `
The GET /api/posts endpoint is taking >800ms according to these logs:
[2025-07-14 23:41:02] GET /api/posts 834ms
[2025-07-14 23:41:45] GET /api/posts 912ms
Add an index on posts.created_at and measure the EXPLAIN ANALYZE before and after.
Do not touch the users schema. Do not refactor anything outside this file.
`;
The difference between those two prompts isn't subtle copywriting — it's the difference between an agent that executes and one that improvises.
The Pattern That Separates Savings From Rabbit Holes
After categorizing 38 sessions, the pattern is brutal and simple: the agent delivers when you already know what you want but haven't written it yet. The agent fails when you don't know what you want either.
That's not a failure of the agent. It's a failure of the contract.
The HN post is right about one thing: agents amplify what you give them. Feed them ambiguity, they return ambiguity multiplied across 23 modified files. Feed them precision, they return speed.
What the post doesn't say — and this is where I have genuine friction with it — is that the problem isn't agentic coding itself. It's that most people arrive at the agent without having solved the problem in their own head first. That's a process problem, not a tool problem.
When I implemented specsmaxxing with YAML for my agents, rabbit holes dropped from 52% to 21% in three weeks. I didn't change the agent. I changed the contract.
# specs/task-add-post-index.yaml
# This is what I sign before sending the agent to work
task: add-index-posts-created-at
scope:
  allowed_files:
    - prisma/schema.prisma
    - prisma/migrations/
  forbidden_files:
    - "**/*.test.ts"
    - src/app/api/users/
success_criteria:
  - EXPLAIN ANALYZE shows Bitmap Index Scan on posts_created_at_idx
  - p95_response_time < 300ms
  - zero broken tests
rollback:
  - prisma migrate reset --skip-seed if anything explodes
context: |
  Railway PostgreSQL 15.4
  posts table: 47k rows, ~200/day growth
  No active partitioning
With that spec, the agent took 8 minutes to generate the migration, the index, and the documented EXPLAIN ANALYZE. Without it, on a similar task three weeks earlier, I spent 90 minutes — including 40 undoing what it had done.
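If you want to enforce that contract mechanically, a pre-flight check is a few lines of code. A minimal sketch, assuming the yaml npm package and the field names from the spec above; this is the idea, not my full tooling:

// check-spec.ts: refuse to start a session if the spec has blank clauses (sketch)
import { readFileSync } from "node:fs";
import { parse } from "yaml";

interface TaskSpec {
  task?: string;
  scope?: { allowed_files?: string[]; forbidden_files?: string[] };
  success_criteria?: string[];
  rollback?: string[];
}

const specPath = process.argv[2] ?? "specs/task-add-post-index.yaml";
const spec = parse(readFileSync(specPath, "utf8")) as TaskSpec;

const problems: string[] = [];
if (!spec.task) problems.push("missing task name");
if (!spec.scope?.allowed_files?.length) problems.push("scope.allowed_files is empty");
if (!spec.success_criteria?.length) problems.push("no success_criteria defined");
if (!spec.rollback?.length) problems.push("no rollback plan");

if (problems.length > 0) {
  console.error(`Spec ${specPath} is not ready to sign:\n- ${problems.join("\n- ")}`);
  process.exit(1);
}
console.log(`Spec ${specPath} looks complete. Contract ready to sign.`);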
The Incident That Ties Everything Together: When the Agent Deleted My DB
I've written about this before — the agent that deleted my production database. That incident taught me something the HN post touches on sideways but never develops: the risk in agentic coding isn't the quality of the generated code, it's the reach you give the agent.
The agent didn't delete the DB because it's bad. It deleted it because I didn't set explicit limits. The contract I signed had a blank clause on "what you're allowed to touch." And it filled in that clause with its own judgment.
Since that day, every agent session in production has three hardcoded restrictions:
#!/bin/bash
# verify-before-agent.sh
# Pre-session script: I run this every single time before releasing the agent
set -euo pipefail  # stop at the first failed step; don't half-run the ritual
echo "=== PRE-SESSION AGENT CHECK ==="
# 1. DB snapshot before anything
echo "Generating pre-session backup..."
# With Barman configured since the migration I documented
# See: /blog/barman-postgresql-backup-produccion-migracion-pgbackrest-railway
barman switch-wal main && barman backup main
# 2. Mandatory git branch
BRANCH="agent/$(date +%Y%m%d-%H%M)"
git checkout -b "$BRANCH"
echo "Working branch: $BRANCH"
# 3. Explicit forbidden files list in the prompt
echo "Remember to include in the prompt:"
echo " - DO NOT modify: .env, prisma/schema.prisma (migrations only)"
echo " - DO NOT execute: DROP, TRUNCATE, DELETE without WHERE"
echo " - DO NOT touch: src/lib/auth/"
echo "=== READY TO SIGN THE CONTRACT ==="
Three steps, two minutes. Since I implemented this: zero destructive incidents in 6 weeks.
The Gotchas the HN Post Doesn't Mention (That I Learned the Hard Way)
1. The agent optimizes for "looking correct", not "being correct"
When you hand it debugging with an incomplete stack trace, the agent builds a narrative that explains the symptoms. Sometimes it nails it. Sometimes it sends you three hours down a dead end chasing a problem that doesn't exist. The fix: whenever you can, give it complete logs, not interpreted symptoms.
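One way to make that mechanical is to paste the raw tail of the log file into the prompt instead of your summary of it. A minimal sketch, with a hypothetical log path:

// build-debug-prompt.ts: hand the agent the raw log tail, not my interpretation of it
import { readFileSync } from "node:fs";

const LOG_PATH = "logs/api.log"; // hypothetical path, adjust to your setup
const TAIL_LINES = 50;

const tail = readFileSync(LOG_PATH, "utf8")
  .trimEnd()
  .split("\n")
  .slice(-TAIL_LINES)
  .join("\n");

const debugPrompt = `
These are the last ${TAIL_LINES} lines of the API log, unedited:

${tail}

Find the cause of the failing requests in this window.
Do not guess beyond what the logs show; if data is missing, tell me what to capture next.
`;

console.log(debugPrompt);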
2. Green tests don't mean the agent understood
I've seen this three times: the agent modifies the tests to make them pass instead of fixing the code. Not out of malice — the success criterion I gave it was simply "make the tests not fail." A more honest success criterion:
// ❌ Criterion the agent can hack around
const weakCriterion = `Fix the code so the tests pass`;

// ✅ Criterion with no shortcuts
const strictCriterion = `
Fix the logic in calculateDiscount() so the result is
mathematically correct for these inputs:
- calculateDiscount(100, 0.1) === 90
- calculateDiscount(0, 0.5) === 0
- calculateDiscount(-50, 0.1) must throw Error
You cannot modify *.test.ts files
`;
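Turned into executable form, that criterion is three assertions the agent can't negotiate with. A minimal sketch using node:assert; the file and module names are assumptions for illustration:

// calculateDiscount.check.ts: the success criterion as assertions the agent can't edit
import assert from "node:assert/strict";
import { calculateDiscount } from "./calculateDiscount"; // assumed module under test

assert.equal(calculateDiscount(100, 0.1), 90);    // 10% off 100 is 90
assert.equal(calculateDiscount(0, 0.5), 0);       // discounting zero stays zero
assert.throws(() => calculateDiscount(-50, 0.1)); // negative amounts must throw

console.log("calculateDiscount meets the spec");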
3. Technical debt is real but not inevitable
The HN post is right that agents can generate debt. What it doesn't say is that the debt is directly proportional to the time you put into the spec. In my YAML-spec sessions, post-session technical debt (measured by TODO comments, code without explicit typing, and broken abstractions) was 60% lower than in spec-less sessions.
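"Measured" here means something crude but repeatable: counting markers in the files the session touched. A rough sketch of that kind of count, using TODO comments and explicit any types as proxies, with the file list passed in as arguments (for example from git diff --name-only):

// debt-scan.ts: crude post-session debt signal (counts, not judgment)
import { readFileSync } from "node:fs";

// In practice the arguments are the files the session touched (git diff --name-only)
const touchedFiles = process.argv.slice(2);

let todos = 0;
let anyTypes = 0;

for (const file of touchedFiles) {
  const source = readFileSync(file, "utf8");
  todos += (source.match(/\/\/\s*TODO/g) ?? []).length;
  anyTypes += (source.match(/:\s*any\b/g) ?? []).length;
}

console.log(`TODO comments: ${todos}, explicit any types: ${anyTypes}`);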
4. The model matters, but less than you think
I tested the same prompts against Kimi K2.6, Claude, and GPT-5.5. The difference in results between models with a clear spec was small. The difference without a spec was enormous. The model is the horse — the spec is the rider.
5. Agentic coding on live production has a different risk threshold
Sending an agent against development code is not the same as sending it against active infrastructure. I learned this during the DDoS monitoring incident on Canonical — I was tempted to use an agent to tweak my Railway configs on the fly. I didn't. There are contexts where the agent's speed is exactly the danger.
FAQ: Agentic Coding in Real Production
Is agentic coding worth it if I already have a fast flow without agents?
Depends on how much boilerplate you repeat. If your flow is already efficient and the code you write is mostly complex business logic, the agent probably won't save you much. Where it clearly wins is repetitive tasks with a known pattern: migrations, CRUD endpoints, dependency configuration. If that's not your bottleneck, don't force it.
How do you measure whether an agent saved you time or stole it?
I use a manual CSV: start timestamp, end timestamp, estimate of how long I would have taken by hand, qualitative outcome. It's not precise, but after 30 sessions it gives you real patterns. The key is logging it in the moment, not at the end of the day when you can't remember properly.
What happens when the agent does something you didn't ask for?
This is the most dangerous and the easiest to prevent: an explicit scope file in the prompt. "You cannot modify X, you cannot execute Y, if you need to touch Z ask me before doing it." The agent respects those limits with surprising consistency when they're written clearly.
Is agent-generated code maintainable long-term?
In my experience: code with a clear spec is maintainable. Code without a spec is exactly what the HN post describes — works today, hurts in three months. Output quality is a direct function of input quality. Same as with any junior developer.
What tools do I use in my agentic coding stack?
Claude Code as the primary agent, specs in YAML (I detailed the system here), git branches per session, pre-session backup with Barman (I migrated from pgbackrest here), and the CSV logs. Nothing exotic. All on Railway and Next.js.
Does the authorship debate around agent-generated code change anything in my workflow?
Yes, and I keep it front of mind. When I thought through who signs the code that Claude Code writes, my practical conclusion was: git blame with context. Every agent session commit carries the message "agent: [task-name] — spec: specs/task-name.yaml". That way I know what was mine and what was the agent's, and I can audit any technical decision.
My Final Take on the HN Post and Agentic Coding in General
"Agentic Coding Is a Trap" describes a real phenomenon: when you use an agent as a substitute for your own thinking, the result is accelerated garbage. That's true. But the conclusion that it's a trap is too easy.
The uncomfortable thing my own logs show is this: the agent isn't the variable. I am. When I arrive with the problem solved in my head and the spec written, the agent is the most powerful tool I've touched in 32 years in tech. When I arrive with the problem half-resolved hoping the agent will help me figure it out, it sends me straight into the wall, every time.
It's not a trap. It's a contract. And like any contract, it protects you or destroys you depending on whether you read it before you signed.
What changed my day-to-day wasn't the agent itself — it was the pre-session ritual: spec, branch, backup, explicit scope. Two minutes up front that save hours of undoing later. If someone tells me agentic coding is a trap, my question is: how much time did you spend on the spec before you sent the agent to work?
The answer, in 90% of cases, is "none."
Original source: Hacker News