
How I built a self-auditing editorial pipeline with AI

The README.md in the repo literally says: "Juanchi portfolio landing. Automatically synced with your v0.app deployments." Two lines. Vercel badge. Nothing else.

That became a lie in the first month. What actually runs at commit f49b4d1a522a89df7927b5796ef4144ab35ba704 is something else: a system that ingests real repositories, builds a structured code brief, runs content through a numeric quality gate, and automatically rejects or rewrites if it doesn't hit the threshold. All inside Next.js, all on Railway. No Vercel in the middle.

The question I kept running into while building this wasn't "what does the system do?" It was: at what point does an automatic editorial pipeline have better judgment than the person who built it? That's the uncomfortable thing I want to be honest about here.

The problem that made me build this

I was generating content with AI and shipping it. Fast, consistent, polished — and completely interchangeable with what any other developer produces using the same model and the same prompt. No stance, no technical scar tissue, nothing that actually justified me signing it.

The cost wasn't just quality. It was something closer to identity. If every post can come from any Claude instance without specific context, why does juanchi.dev exist as a brand at all?

The answer was: you need a gate. Something that catches generic content before it reaches production. But building that gate is where it gets genuinely weird — because you end up with an AI auditing another AI, and that loop has its own failure modes.

What the repo's editorial scope reveals

Before writing a single line of this post, the pipeline analyzed 907 files in the repo and selected 30 to build editorial context. Not randomly: each file gets an assigned role — entrypoint, domain_logic, data_model, tests, risk_or_security, operations, configuration, documentation.

That's the CodeScopeBrief. The idea is that what reaches the generator isn't a random dump of the repo but a deliberate selection by architectural function. Of 92 available entrypoint files, 4 were selected. Of 426 domain_logic files, 4. Of 121 test files, 3. Fixed budget, balanced coverage.
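To make the "fixed budget, balanced coverage" idea concrete, here is a minimal sketch of budgeted selection by role. It is not the repo's CodeScopeBrief code: the role names come from the list above, while `ScopedFile`, `selectByRole`, and the `relevance` ranking signal are hypothetical names introduced for illustration.

```typescript
// Hypothetical sketch of budgeted, role-based file selection.
// Role names come from the post; every other name is an assumption.
type FileRole =
  | "entrypoint"
  | "domain_logic"
  | "data_model"
  | "tests"
  | "risk_or_security"
  | "operations"
  | "configuration"
  | "documentation"

interface ScopedFile {
  path: string
  role: FileRole
  relevance: number // assumed ranking signal; higher means more relevant
}

function selectByRole(
  files: ScopedFile[],
  budget: Partial<Record<FileRole, number>>,
): ScopedFile[] {
  // Group candidates by their assigned role.
  const byRole = new Map<FileRole, ScopedFile[]>()
  for (const file of files) {
    const bucket = byRole.get(file.role) ?? []
    bucket.push(file)
    byRole.set(file.role, bucket)
  }

  // Keep only the top-ranked files of each role, up to that role's budget.
  // Roles without a budget entry contribute nothing.
  const selected: ScopedFile[] = []
  byRole.forEach((bucket, role) => {
    const limit = budget[role] ?? 0
    const ranked = bucket.slice().sort((a, b) => b.relevance - a.relevance)
    selected.push(...ranked.slice(0, limit))
  })
  return selected
}
```

With a budget like `{ entrypoint: 4, domain_logic: 4, tests: 3 }`, a role with 426 candidates and a role with 92 candidates both contribute at most 4 files, so no single role can crowd out the rest of the context.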

What gets excluded matters too: three files were blocked by the secrets scanner before ever reaching editorial selection. Not manual intervention — the pipeline detected Anthropic keys and a GitHub PAT and cut them automatically. That's the behavior you want when processing repositories, including your own, which sometimes have a .env committed by accident.
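A scan of that kind can be sketched with a handful of regular expressions. This is an assumption-heavy illustration, not the scanner in the repo: the patterns rely on publicly known token prefixes (`sk-ant-` for Anthropic keys, `ghp_` and `github_pat_` for GitHub tokens), and `SECRET_PATTERNS` and `scanForSecrets` are hypothetical names.

```typescript
// Hypothetical pre-selection secrets scan.
// The patterns are assumptions based on publicly known key prefixes;
// the real scanner may use different or stricter rules.
interface SecretPattern {
  name: string
  regex: RegExp
}

const SECRET_PATTERNS: SecretPattern[] = [
  { name: "anthropic_api_key", regex: /sk-ant-[A-Za-z0-9_-]{20,}/ },
  { name: "github_classic_pat", regex: /ghp_[A-Za-z0-9]{36}/ },
  { name: "github_fine_grained_pat", regex: /github_pat_[A-Za-z0-9_]{20,}/ },
]

interface ScanResult {
  path: string
  blocked: boolean
  matches: string[] // names of the patterns that matched, never the secret itself
}

function scanForSecrets(path: string, content: string): ScanResult {
  const matches = SECRET_PATTERNS
    .filter((pattern) => pattern.regex.test(content))
    .map((pattern) => pattern.name)
  return { path, blocked: matches.length > 0, matches }
}
```

A pattern-based scan like this cannot tell a live key from a key-shaped string in a fixture, so it errs on the side of blocking.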

The central technical decision: numeric score as contract

In lib/editorial/editor-service.ts there are three constants that are the actual heart of the system:

```typescript
// lib/editorial/editor-service.ts

export const EDITORIAL_GATE_MIN_SCORE = 81
export const EDITORIAL_REWRITE_MIN_SCORE = 65
export const EDITORIAL_REWRITE_MAX_ROUNDS = 3
```

The logic: score above 81, it passes. Between 65 and 81, the system retries generation up to 3 times. Below 65 after 3 rounds, it throws EditorialGateBlockedError and the post doesn't exist.
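As a decision function, that logic could look roughly like the sketch below. It rests on assumptions the constants alone don't settle: the thresholds are treated as inclusive, a score under the rewrite minimum blocks immediately, and anything still under the gate once the rounds are spent is blocked too. `decide` and `runGate` are hypothetical names, and the real editor-service.ts may resolve those edge cases differently.

```typescript
// Sketch of the gate decision under stated assumptions (see lead-in).
const EDITORIAL_GATE_MIN_SCORE = 81
const EDITORIAL_REWRITE_MIN_SCORE = 65
const EDITORIAL_REWRITE_MAX_ROUNDS = 3

type GateDecision = "accept" | "rewrite" | "block"

// `round` is the number of rewrite rounds already spent (0 for the first draft).
function decide(score: number, round: number): GateDecision {
  if (score >= EDITORIAL_GATE_MIN_SCORE) return "accept"
  const roundsLeft = round < EDITORIAL_REWRITE_MAX_ROUNDS
  if (roundsLeft && score >= EDITORIAL_REWRITE_MIN_SCORE) return "rewrite"
  return "block"
}

// Walks a sequence of scores (draft, rewrite 1, rewrite 2, ...) and returns
// the final decision, the score that produced it, and the rounds spent.
function runGate(scores: number[]): {
  decision: GateDecision
  score: number
  rounds: number
} {
  let round = 0
  for (const score of scores) {
    const decision = decide(score, round)
    if (decision !== "rewrite") return { decision, score, rounds: round }
    round += 1
  }
  throw new Error("ran out of scores while a rewrite was still pending")
}
```

Under these assumptions, a draft scoring 70, then 74, then 83 is accepted after two rewrite rounds, while a draft stuck at 70 is blocked once the third round is spent.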

```typescript
// lib/editorial/editor-service.ts

export class EditorialGateBlockedError extends Error {
  constructor(
    public readonly reviewId: string,
    public readonly score: number,
  ) {
    super(`Editorial gate blocked content with score ${score} (review ${reviewId})`)
    this.name = "EditorialGateBlockedError"
  }
}
```

What I think works here: the error carries the score in the payload. It's not a rejection boolean — it's auditable evidence. You can build a dashboard of how many posts were blocked and at what average score they failed. That makes the gate observable rather than a black box.
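The dashboard idea reduces to a small aggregation over caught errors. The sketch below is hypothetical: `BlockedRecord`, `asBlockedRecord`, and `summarizeBlocked` are names invented here, and the error is recognized by its `name` and payload fields rather than by importing the class, so the block stays self-contained.

```typescript
// Hypothetical aggregation over caught gate errors.
interface BlockedRecord {
  reviewId: string
  score: number
}

// Extracts the auditable payload from a caught error, or returns null if the
// error is not shaped like EditorialGateBlockedError.
function asBlockedRecord(err: unknown): BlockedRecord | null {
  if (!(err instanceof Error) || err.name !== "EditorialGateBlockedError") return null
  const candidate = err as Error & { reviewId?: unknown; score?: unknown }
  const { reviewId, score } = candidate
  if (typeof reviewId !== "string" || typeof score !== "number") return null
  return { reviewId, score }
}

// Summarizes how many posts were blocked and at what average score.
function summarizeBlocked(records: BlockedRecord[]): {
  count: number
  averageScore: number | null
} {
  if (records.length === 0) return { count: 0, averageScore: null }
  const total = records.reduce((sum, record) => sum + record.score, 0)
  return { count: records.length, averageScore: total / records.length }
}
```

Because the score travels with the error, the caller that catches it can log a record without re-querying the review.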

What genuinely worries me: 81 is a number I chose. I have no public evidence this threshold correlates with quality as perceived by actual readers. It's my own judgment hardcoded as contract. It holds as long as the evaluating model and the generating model stay consistent with each other — change one without recalibrating the other, and the threshold becomes noise.

The review workflow as a state machine

lib/editorial/revision-workflow.ts handles something subtler than the gate: the lifecycle of content after generation. It includes a function that calculates readTime from word count:

```typescript
// lib/editorial/revision-workflow.ts

function readTimeFor(content: string) {
  const words = content.trim().split(/\s+/).filter(Boolean).length
  // 200 words/minute is the standard I used; revisable
  return Math.max(1, Math.ceil(words / 200))
}
```

Simple. But look at what isGeneratedContent actually validates: not just that es.title and es.slug exist, but also en.title, en.slug, and en.content. The system is bilingual by contract: content generated only in Spanish, with no English translation, is invalid and doesn't persist.
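A type guard with that shape could look like the sketch below. It is an illustration, not the function from the repo: the field names follow the ones listed above, but requiring es.content as well, rejecting empty strings, and the names `LocalizedContent` and `isGeneratedContentSketch` are assumptions.

```typescript
// Hypothetical bilingual-by-contract validation.
interface LocalizedContent {
  title: string
  slug: string
  content: string
}

interface GeneratedContent {
  es: LocalizedContent
  en: LocalizedContent
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0
}

function isLocalized(value: unknown): value is LocalizedContent {
  if (typeof value !== "object" || value === null) return false
  const record = value as Record<string, unknown>
  return (
    isNonEmptyString(record["title"]) &&
    isNonEmptyString(record["slug"]) &&
    isNonEmptyString(record["content"])
  )
}

// Both locales must be complete; one language alone is rejected.
function isGeneratedContentSketch(value: unknown): value is GeneratedContent {
  if (typeof value !== "object" || value === null) return false
  const record = value as Record<string, unknown>
  return isLocalized(record["es"]) && isLocalized(record["en"])
}
```

Running the guard before persistence means a Spanish-only generation fails at the boundary instead of producing a half-published post.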

I made that decision before writing the first post on juanchi.dev: either you publish in both languages or you don't publish. The cost is real — it doubles token cost on every generation. The benefit is reach: posts that can cross to Dev.to in English without manual translation afterward.

Crons that stopped working and how I fixed it

The workflow .github/workflows/awesome-crons.yml has a comment that tells the actual story:

```yaml
# Scheduled Awesome jobs run as Railway cron services. The previous GitHub
# schedule called juanchi.dev through Cloudflare and was blocked by managed
# challenge 403 before reaching Next.js.
```

I had GitHub Actions crons calling the app endpoint. Cloudflare was blocking them with 403 because a curl User-Agent without specific headers triggers the managed challenge. The fix was moving crons to Railway directly — Railway has internal access to the app without going through Cloudflare. GitHub Actions stayed only as a manual trigger via workflow_dispatch.

The dispatcher in app/api/admin/awesome/run/[job]/route.ts is deliberate: a single endpoint with in-memory rate limiting (RATE_LIMIT_MS = 60_000) and a job registry by name:

```typescript
// app/api/admin/awesome/run/[job]/route.ts

const JOBS: Record<string, JobFn> = {
  "repo-sync": (ctx) => runRepoSync(ctx),
  "series-publish": (ctx) => runSeriesPublish(ctx, { force: true }),
  discovery: (ctx) => runDiscoveryJob(ctx),
}

const lastRun = new Map<string, number>()
const RATE_LIMIT_MS = 60_000
```

The in-memory rate limit has a known problem I haven't solved yet: if Railway restarts the service, the map clears and you can fire the same job twice within 60 seconds. For a personal blog, that risk is acceptable. For something with expensive side effects in production, you'd need to persist the last-run timestamp in a database.
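A restart-safe variant mostly means moving the timestamp behind a store interface. The sketch below is a hypothetical shape for that, not code from the repo: `LastRunStore`, `isAllowed`, `tryAcquire`, and `createMemoryStore` are invented names, and the in-memory store exists only so the example runs on its own. A database-backed store would implement the same two methods.

```typescript
// Hypothetical rate limit with a pluggable last-run store.
const RATE_LIMIT_MS = 60_000

interface LastRunStore {
  get(job: string): Promise<number | undefined>
  set(job: string, timestampMs: number): Promise<void>
}

// Pure check: has enough time passed since the last recorded run?
function isAllowed(lastRunMs: number | undefined, nowMs: number): boolean {
  return lastRunMs === undefined || nowMs - lastRunMs >= RATE_LIMIT_MS
}

// Records the run and returns true if the job may start, false otherwise.
async function tryAcquire(
  store: LastRunStore,
  job: string,
  nowMs: number = Date.now(),
): Promise<boolean> {
  const last = await store.get(job)
  if (!isAllowed(last, nowMs)) return false
  await store.set(job, nowMs)
  return true
}

// In-memory stand-in for demonstration; it has the same restart problem
// as the Map in the route, so it is not the fix itself.
function createMemoryStore(): LastRunStore {
  const data = new Map<string, number>()
  return {
    get: async (job) => data.get(job),
    set: async (job, timestampMs) => {
      data.set(job, timestampMs)
    },
  }
}
```

One caveat with this shape: the read and the write are separate steps, so two requests arriving at the same instant could both pass. A database version would likely want a single conditional update instead.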

The discovery job is the only one dispatched with queueMicrotask, because it can take minutes and you can't hold the HTTP connection open while it runs. The rest respond synchronously, within the maxDuration = 60 limit the platform enforces.
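The fire-and-forget pattern can be sketched in a few lines. This is an illustration under assumptions, not the route's code: `dispatchInBackground` and `DispatchResult` are hypothetical names, and the 202 status is a choice made for this sketch. It also assumes a long-running Node process that outlives the response; in an environment that freezes the process once the response is sent, the job would not finish.

```typescript
// Hypothetical fire-and-forget dispatch using queueMicrotask.
type BackgroundJob = () => Promise<void>

interface DispatchResult {
  status: number
  body: { job: string; mode: "background" }
}

function dispatchInBackground(
  job: string,
  run: BackgroundJob,
  onError: (err: unknown) => void,
): DispatchResult {
  // The callback runs after the current synchronous code finishes,
  // so the result below is built before the job starts.
  queueMicrotask(() => {
    // Nobody awaits this promise, so errors must be handled here or
    // they surface as unhandled rejections.
    run().catch(onError)
  })
  return { status: 202, body: { job, mode: "background" } }
}
```

The `onError` callback matters more than it looks: with no caller awaiting the job, it is the only place a failure can be logged.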

What the data model reveals about the product

The baseline migration prisma/migrations/20260421000000_baseline_existing_schema/migration.sql has enums that tell the full story:

EditorialReviewStatus: ACCEPTED, REWRITTEN, BLOCKED, APPROVED, REJECTED — the gate's full lifecycle.
CuratedVerdict: GEM, WORTH_TRYING, MEH, HYPE, DEAD — the tool curation system.
VideoStatus: DRAFT → APPROVED → AUDIO_READY → RENDERING → RENDERED → PUBLISHED → DISCARDED — a complete video pipeline.
PromptVersionSource: seed, auto_tune, admin, rollback — prompt versioning with rollback.

That last enum is the one I find most interesting. The system can change prompts automatically (auto_tune), an admin can override (admin), and if something breaks, you can revert (rollback). It's version control applied to AI instructions — which is exactly what you need when the prompt is part of the product and not just an implementation detail.
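One way to model that lifecycle is an append-only history where a rollback is itself a new version. The sketch below is hypothetical: the four source values come from the enum above, while `PromptVersion`, `restoredFrom`, and the helper functions are assumptions, and the real schema may store this differently.

```typescript
// Hypothetical append-only prompt history with rollback.
type PromptVersionSource = "seed" | "auto_tune" | "admin" | "rollback"

interface PromptVersion {
  version: number
  source: PromptVersionSource
  text: string
  restoredFrom?: number // assumed field: which version a rollback restored
}

function seedHistory(text: string): PromptVersion[] {
  return [{ version: 1, source: "seed", text }]
}

function currentVersion(history: PromptVersion[]): PromptVersion {
  const latest = history[history.length - 1]
  if (latest === undefined) throw new Error("empty prompt history")
  return latest
}

// Adds a new version authored by the auto-tuner or an admin.
function appendVersion(
  history: PromptVersion[],
  source: "auto_tune" | "admin",
  text: string,
): PromptVersion[] {
  const next = currentVersion(history).version + 1
  return [...history, { version: next, source, text }]
}

// A rollback never deletes history: it appends a copy of the target text.
function rollbackTo(history: PromptVersion[], toVersion: number): PromptVersion[] {
  const target = history.find((entry) => entry.version === toVersion)
  if (target === undefined) throw new Error(`unknown prompt version ${toVersion}`)
  const next = currentVersion(history).version + 1
  return [
    ...history,
    { version: next, source: "rollback", text: target.text, restoredFrom: toVersion },
  ]
}
```

Keeping rollbacks as new entries preserves the audit trail: you can see that an auto_tune change was tried, when it was reverted, and to what.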

The honest limit of the pipeline

The secrets scanner blocked three files — including lib/repo-ingestion/__tests__/context-builder.test.ts and lib/repo-ingestion/__tests__/file-policy.test.ts. Those tests are precisely what would verify the ingestion logic works correctly. I couldn't read them.

That means there's a part of this pipeline I'm describing without having seen its test suite. It might surface edge cases I'm not accounting for. I'm declaring that because it's the right behavior: when the scanner blocks, the honest move is to say so, not invent what might be inside.

The other limit — the one I keep coming back to — is that the 81 threshold for EDITORIAL_GATE_MIN_SCORE is my own judgment without any external validation. It works as long as evaluator and generator stay on the same calibration. It's the kind of technical debt that doesn't show up until the model changes versions.

The question this leaves me with

I built a system that can reject my own content. That's exactly what I wanted — judgment that doesn't bend when I'm in a rush or tempted to ship something mediocre.

But the system audits against a score I calibrated myself. If that calibration is off, I'm blocking good posts or approving bad ones with equal confidence and no way to tell the difference.

The practical next move is to instrument the gate: log every score alongside the resulting published content, then build a manual correlation between score and actual reader engagement after the fact. Without that feedback loop, 81 is a bet disguised as a criterion.
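A first pass at that correlation could be a plain Pearson coefficient over logged pairs. The sketch below is illustrative only: `GateSample` and its `engagement` field are hypothetical, and which reader metric counts as engagement is left open.

```typescript
// Hypothetical score-vs-engagement correlation over logged samples.
interface GateSample {
  score: number // gate score logged when the post was published
  engagement: number // assumed reader metric, whatever ends up being tracked
}

// Pearson correlation coefficient in [-1, 1], or null when it is undefined
// (fewer than two samples, or no variation in one of the variables).
function pearson(samples: GateSample[]): number | null {
  const n = samples.length
  if (n < 2) return null
  const meanScore = samples.reduce((sum, s) => sum + s.score, 0) / n
  const meanEngagement = samples.reduce((sum, s) => sum + s.engagement, 0) / n

  let covariance = 0
  let scoreVariance = 0
  let engagementVariance = 0
  for (const sample of samples) {
    const dx = sample.score - meanScore
    const dy = sample.engagement - meanEngagement
    covariance += dx * dy
    scoreVariance += dx * dx
    engagementVariance += dy * dy
  }
  if (scoreVariance === 0 || engagementVariance === 0) return null
  return covariance / Math.sqrt(scoreVariance * engagementVariance)
}
```

One limit to keep in mind when reading the result: only posts that cleared the gate get published, so the samples cover just the top of the score range, and a narrow range tends to weaken any measured correlation.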

How would you measure whether the quality gate is actually calibrated? The answer I give myself right now doesn't fully convince me.
