How I built a self-auditing editorial pipeline with AI
The README.md in the repo literally says: "Juanchi portfolio landing. Automatically synced with your v0.app deployments." Two lines. Vercel badge. Nothing else.
That became a lie in the first month. What actually runs at commit f49b4d1a522a89df7927b5796ef4144ab35ba704 is something else: a system that ingests real repositories, builds a structured code brief, runs content through a numeric quality gate, and automatically rejects or rewrites if it doesn't hit the threshold. All inside Next.js, all on Railway. No Vercel in the middle.
The question I kept running into while building this wasn't "what does the system do?" It was: at what point does an automatic editorial pipeline have better judgment than the person who built it? That's the uncomfortable thing I want to be honest about here.
The problem that made me build this
I was generating content with AI and shipping it. Fast, consistent, polished — and completely interchangeable with what any other developer produces using the same model and the same prompt. No stance, no technical scar tissue, nothing that actually justified me signing it.
The cost wasn't just quality. It was something closer to identity. If every post can come from any Claude instance without specific context, why does juanchi.dev exist as a brand at all?
The answer was: you need a gate. Something that catches generic content before it reaches production. But building that gate is where it gets genuinely weird — because you end up with an AI auditing another AI, and that loop has its own failure modes.
What the repo's editorial scope reveals
Before writing a single line of this post, the pipeline analyzed 907 files in the repo and selected 30 to build editorial context. Not randomly: each file gets an assigned role — entrypoint, domain_logic, data_model, tests, risk_or_security, operations, configuration, documentation.
That's the CodeScopeBrief. The idea is that what reaches the generator isn't a random dump of the repo but a deliberate selection by architectural function. Of 92 available entrypoint files, 4 were selected. Of 426 domain_logic files, 4. Of 121 test files, 3. Fixed budget, balanced coverage.
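A minimal sketch of what role-budgeted selection could look like, assuming each candidate file already carries a role and some ranking signal. Only the role names come from the pipeline; `ScopedFile`, `relevance`, `selectByRole`, and the budget shape are hypothetical names for illustration, not the repo's API.

```typescript
// Hypothetical sketch: role names are from the post, everything else is assumed.
type FileRole =
  | "entrypoint" | "domain_logic" | "data_model" | "tests"
  | "risk_or_security" | "operations" | "configuration" | "documentation"

interface ScopedFile {
  path: string
  role: FileRole
  relevance: number // assumed ranking signal; higher means more useful as context
}

// Keep at most `budget[role]` files per role, best-ranked first.
// Roles without a budget entry contribute no files.
function selectByRole(
  files: ScopedFile[],
  budget: Partial<Record<FileRole, number>>,
): ScopedFile[] {
  const taken = new Map<FileRole, number>()
  const selected: ScopedFile[] = []
  const ranked = [...files].sort((a, b) => b.relevance - a.relevance)
  for (const file of ranked) {
    const used = taken.get(file.role) ?? 0
    if (used < (budget[file.role] ?? 0)) {
      selected.push(file)
      taken.set(file.role, used + 1)
    }
  }
  return selected
}
```

The point of the per-role cap is that a repo with hundreds of domain_logic files can't crowd out the handful of tests or configuration files the generator also needs to see.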
What gets excluded matters too: three files were blocked by the secrets scanner before ever reaching editorial selection. Not manual intervention — the pipeline detected Anthropic keys and a GitHub PAT and cut them automatically. That's the behavior you want when processing repositories, including your own, which sometimes have a .env committed by accident.
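As a rough illustration of that kind of check, here is a pattern-based scanner sketch. It assumes detection by token prefix only: the `sk-ant-` and `ghp_` / `github_pat_` prefixes are the publicly documented ones, while the length bounds, `SECRET_RULES`, `scanForSecrets`, and `isBlocked` are assumptions for the example and not the scanner the pipeline actually runs.

```typescript
// Hypothetical sketch: prefix-based detection only, not the repo's real scanner.
interface SecretRule {
  name: string
  pattern: RegExp
}

const SECRET_RULES: SecretRule[] = [
  // Anthropic API keys start with "sk-ant-"; the minimum length here is an assumption.
  { name: "anthropic_key", pattern: /sk-ant-[A-Za-z0-9_-]{20,}/ },
  // Classic GitHub personal access tokens: "ghp_" followed by 36 alphanumerics.
  { name: "github_pat", pattern: /ghp_[A-Za-z0-9]{36}/ },
  // Fine-grained GitHub tokens use the "github_pat_" prefix; length bound is assumed.
  { name: "github_fine_grained_pat", pattern: /github_pat_[A-Za-z0-9_]{20,}/ },
]

// Returns the names of every rule that matched; an empty array means no hit.
function scanForSecrets(content: string): string[] {
  return SECRET_RULES
    .filter((rule) => rule.pattern.test(content))
    .map((rule) => rule.name)
}

// A file with any hit is excluded before editorial selection sees it.
function isBlocked(content: string): boolean {
  return scanForSecrets(content).length > 0
}
```

Blocking the whole file rather than redacting the match is the conservative choice: test fixtures that contain key-shaped strings get excluded too, which is the trade-off behind the blocked test files mentioned later.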
The central technical decision: numeric score as contract
In lib/editorial/editor-service.ts there are three constants that are the actual heart of the system:
// lib/editorial/editor-service.ts
export const EDITORIAL_GATE_MIN_SCORE = 81
export const EDITORIAL_REWRITE_MIN_SCORE = 65
export const EDITORIAL_REWRITE_MAX_ROUNDS = 3
The logic: score above 81, it passes. Between 65 and 81, the system retries generation up to 3 times. Below 65 after 3 rounds, it throws EditorialGateBlockedError and the post doesn't exist.
// lib/editorial/editor-service.ts
export class EditorialGateBlockedError extends Error {
  constructor(
    public readonly reviewId: string,
    public readonly score: number,
  ) {
    super(`Editorial gate blocked content with score ${score} (review ${reviewId})`)
    this.name = "EditorialGateBlockedError"
  }
}
What I think works here: the error carries the score in the payload. It's not a rejection boolean; it's auditable evidence. You can build a dashboard of how many posts were blocked and at what average score they failed. That makes the gate observable rather than a black box.
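To make the control flow concrete, here is a sketch of how the three constants and the error could fit together. It rests on assumptions the post doesn't settle: a score equal to the gate minimum passes, a score under the rewrite floor blocks immediately, and a draft still under the gate after the last round blocks. `runGate`, `Evaluate`, and `Rewrite` are hypothetical names, and the error class is restated with explicit fields so the block stands alone.

```typescript
// Sketch only: the loop semantics are assumptions, not the repo's actual implementation.
const EDITORIAL_GATE_MIN_SCORE = 81
const EDITORIAL_REWRITE_MIN_SCORE = 65
const EDITORIAL_REWRITE_MAX_ROUNDS = 3

class EditorialGateBlockedError extends Error {
  readonly reviewId: string
  readonly score: number
  constructor(reviewId: string, score: number) {
    super(`Editorial gate blocked content with score ${score} (review ${reviewId})`)
    this.name = "EditorialGateBlockedError"
    this.reviewId = reviewId
    this.score = score
  }
}

type Evaluate = (draft: string) => number                // hypothetical evaluator call
type Rewrite = (draft: string, round: number) => string  // hypothetical rewrite call

function runGate(reviewId: string, draft: string, evaluate: Evaluate, rewrite: Rewrite): string {
  let current = draft
  let score = evaluate(current)
  // Rewrite only while the score sits in the retry band and rounds remain.
  for (
    let round = 1;
    round <= EDITORIAL_REWRITE_MAX_ROUNDS &&
      score >= EDITORIAL_REWRITE_MIN_SCORE &&
      score < EDITORIAL_GATE_MIN_SCORE;
    round++
  ) {
    current = rewrite(current, round)
    score = evaluate(current)
  }
  if (score >= EDITORIAL_GATE_MIN_SCORE) return current
  // The last observed score travels with the error, so the rejection stays auditable.
  throw new EditorialGateBlockedError(reviewId, score)
}
```

A caller can catch the error and record `reviewId` and `score`, which is all a blocked-posts dashboard would need.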
What genuinely worries me: 81 is a number I chose. I have no public evidence this threshold correlates with quality as perceived by actual readers. It's my own judgment hardcoded as contract. It holds as long as the evaluating model and the generating model stay consistent with each other — change one without recalibrating the other, and the threshold becomes noise.
The review workflow as state machine
lib/editorial/revision-workflow.ts handles something subtler than the gate: the lifecycle of content after generation. It includes a function that calculates readTime from word count:
// lib/editorial/revision-workflow.ts
function readTimeFor(content: string) {
  const words = content.trim().split(/\s+/).filter(Boolean).length
  // 200 words/minute is the standard I used; revisable
  return Math.max(1, Math.ceil(words / 200))
}
Simple. But look at what isGeneratedContent, the validator in the same workflow, actually checks: not just that es.title and es.slug exist, but also en.title, en.slug, and en.content. The system is bilingual by contract: generate only in Spanish with no English translation and the content is invalid, so it never persists.
I made that decision before writing the first post on juanchi.dev: either you publish in both languages or you don't publish. The cost is real — it doubles token cost on every generation. The benefit is reach: posts that can cross to Dev.to in English without manual translation afterward.
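A hedged sketch of what a bilingual validity check like isGeneratedContent could look like as a TypeScript type guard. The field names follow the description above; the `LocalizedPost` and `GeneratedContent` shapes, the non-empty-string rule, and the requirement that es.content also be present are assumptions for the example.

```typescript
// Hypothetical shapes: field names follow the post, exact repo types are assumed.
interface LocalizedPost {
  title: string
  slug: string
  content: string
}

interface GeneratedContent {
  es: LocalizedPost
  en: LocalizedPost
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0
}

function isLocalizedPost(value: unknown): value is LocalizedPost {
  if (typeof value !== "object" || value === null) return false
  const post = value as Record<string, unknown>
  return (
    isNonEmptyString(post["title"]) &&
    isNonEmptyString(post["slug"]) &&
    isNonEmptyString(post["content"])
  )
}

// Both locales must be complete; a Spanish-only payload is rejected.
function isGeneratedContent(value: unknown): value is GeneratedContent {
  if (typeof value !== "object" || value === null) return false
  const candidate = value as Record<string, unknown>
  return isLocalizedPost(candidate["es"]) && isLocalizedPost(candidate["en"])
}
```

Running the guard before persistence means a half-translated generation fails at the boundary instead of showing up later as a post with a missing English slug.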
Crons that stopped working and how I fixed them
The workflow .github/workflows/awesome-crons.yml has a comment that tells the actual story:
# Scheduled Awesome jobs run as Railway cron services. The previous GitHub
# schedule called juanchi.dev through Cloudflare and was blocked by managed
# challenge 403 before reaching Next.js.
I had GitHub Actions crons calling the app endpoint. Cloudflare was blocking them with 403 because a curl User-Agent without specific headers triggers the managed challenge. The fix was moving crons to Railway directly, since Railway has internal access to the app without going through Cloudflare. GitHub Actions stayed only as a manual trigger via workflow_dispatch.
The dispatcher in app/api/admin/awesome/run/[job]/route.ts is deliberate: a single endpoint with in-memory rate limiting (RATE_LIMIT_MS = 60_000) and a job registry by name:
// app/api/admin/awesome/run/[job]/route.ts
const JOBS: Record<string, JobFn> = {
  "repo-sync": (ctx) => runRepoSync(ctx),
  "series-publish": (ctx) => runSeriesPublish(ctx, { force: true }),
  discovery: (ctx) => runDiscoveryJob(ctx),
}
const lastRun = new Map<string, number>()
const RATE_LIMIT_MS = 60_000
The in-memory rate limit has a known problem I haven't solved yet: if Railway restarts the service, the map clears and you can fire the same job twice within 60 seconds. For a personal blog, that risk is acceptable. For something with expensive side effects in production, you'd need to persist the last-run timestamp in a database.
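One way to leave room for that later is to put the timestamp behind a small store interface, so the in-memory map can be swapped for a database-backed one. This is a sketch under assumptions: `LastRunStore`, `InMemoryLastRunStore`, and `tryAcquire` are hypothetical names, and the interface is kept synchronous for brevity where a real database store would be async and would need an atomic check-and-set.

```typescript
// Sketch: the store interface is hypothetical; only RATE_LIMIT_MS mirrors the route file.
const RATE_LIMIT_MS = 60_000

interface LastRunStore {
  get(job: string): number | undefined
  set(job: string, timestampMs: number): void
}

// Same behavior as the current Map: state is lost when the process restarts.
class InMemoryLastRunStore implements LastRunStore {
  private readonly runs = new Map<string, number>()
  get(job: string): number | undefined {
    return this.runs.get(job)
  }
  set(job: string, timestampMs: number): void {
    this.runs.set(job, timestampMs)
  }
}

// Returns true and records the run if `job` may start at `nowMs`; false if rate limited.
function tryAcquire(store: LastRunStore, job: string, nowMs: number): boolean {
  const last = store.get(job)
  if (last !== undefined && nowMs - last < RATE_LIMIT_MS) return false
  store.set(job, nowMs)
  return true
}
```

Passing `nowMs` in rather than calling `Date.now()` inside keeps the limiter deterministic in tests, and creating a fresh `InMemoryLastRunStore` reproduces the restart problem in one line.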
The discovery job is the only one that fires with queueMicrotask, because it can take minutes and you can't hold the HTTP connection open while it runs. The rest respond synchronously within the maxDuration = 60 limit the platform enforces.
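The fire-and-forget shape can be sketched like this. `dispatchInBackground` and `Accepted` are hypothetical names and the 202 status is an assumption about how the route answers; `queueMicrotask` itself is the standard global available in Node and browsers.

```typescript
// Sketch: names and the 202 response shape are assumptions, not the repo's route handler.
type BackgroundJob = () => Promise<void>

interface Accepted {
  status: 202
  job: string
}

function dispatchInBackground(job: string, run: BackgroundJob): Accepted {
  // The callback runs after the current synchronous code finishes, so the
  // caller can send its response without waiting for the job.
  queueMicrotask(() => {
    // Nothing awaits this promise, so failures have to be handled here.
    run().catch((error) => console.error(`job ${job} failed`, error))
  })
  return { status: 202, job }
}
```

This only works because the process keeps running after the response is sent, which is the case for a long-lived container; on a platform that freezes or kills the handler once it returns, the job could be cut off mid-run.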
What the data model reveals about the product
The baseline migration prisma/migrations/20260421000000_baseline_existing_schema/migration.sql has enums that tell the full story:
EditorialReviewStatus (ACCEPTED, REWRITTEN, BLOCKED, APPROVED, REJECTED): the gate's full lifecycle.
CuratedVerdict (GEM, WORTH_TRYING, MEH, HYPE, DEAD): the tool curation system.
VideoStatus (DRAFT → APPROVED → AUDIO_READY → RENDERING → RENDERED → PUBLISHED → DISCARDED): a complete video pipeline.
PromptVersionSource (seed, auto_tune, admin, rollback): prompt versioning with rollback.
That last enum is the one I find most interesting. The system can change prompts automatically (auto_tune), an admin can override (admin), and if something breaks, you can revert (rollback). It's version control applied to AI instructions — which is exactly what you need when the prompt is part of the product and not just an implementation detail.
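A small sketch of how those four sources could drive an append-only prompt history. Only the source values come from the schema; `PromptVersion`, `PromptHistory`, and the choice to model a rollback as a new version that copies an older text are assumptions for illustration.

```typescript
// Hypothetical sketch: only the four source values come from the schema.
type PromptVersionSource = "seed" | "auto_tune" | "admin" | "rollback"

interface PromptVersion {
  version: number
  text: string
  source: PromptVersionSource
}

class PromptHistory {
  private readonly versions: PromptVersion[]
  private latest: PromptVersion

  constructor(seedText: string) {
    this.latest = { version: 1, text: seedText, source: "seed" }
    this.versions = [this.latest]
  }

  current(): PromptVersion {
    return this.latest
  }

  // Automatic tuning and admin overrides both append a new version.
  update(text: string, source: "auto_tune" | "admin"): PromptVersion {
    return this.append(text, source)
  }

  // A rollback re-publishes an earlier text as a new version; history is never deleted.
  rollbackTo(version: number): PromptVersion {
    const target = this.versions.find((v) => v.version === version)
    if (!target) throw new Error(`unknown prompt version ${version}`)
    return this.append(target.text, "rollback")
  }

  private append(text: string, source: PromptVersionSource): PromptVersion {
    this.latest = { version: this.latest.version + 1, text, source }
    this.versions.push(this.latest)
    return this.latest
  }
}
```

Keeping rollbacks as new rows means the history shows both what the auto-tuner tried and when someone decided to undo it.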
The honest limit of the pipeline
The secrets scanner blocked three files — including lib/repo-ingestion/__tests__/context-builder.test.ts and lib/repo-ingestion/__tests__/file-policy.test.ts. Those tests are precisely what would verify the ingestion logic works correctly. I couldn't read them.
That means there's a part of this pipeline I'm describing without having seen its test suite. It might surface edge cases I'm not accounting for. I'm declaring that because it's the right behavior: when the scanner blocks, the honest move is to say so, not invent what might be inside.
The other limit — the one I keep coming back to — is that the 81 threshold for EDITORIAL_GATE_MIN_SCORE is my own judgment without any external validation. It works as long as evaluator and generator stay on the same calibration. It's the kind of technical debt that doesn't show up until the model changes versions.
The question this leaves me with
I built a system that can reject my own content. That's exactly what I wanted — judgment that doesn't bend when I'm in a rush or tempted to ship something mediocre.
But the system audits against a score I calibrated myself. If that calibration is off, I'm blocking good posts or approving bad ones with equal confidence and no way to tell the difference.
The practical next move is to instrument the gate: log every score alongside the resulting published content, then build a manual correlation between score and actual reader engagement after the fact. Without that feedback loop, 81 is a bet disguised as a criterion.
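A first pass at that correlation could be as simple as a Pearson coefficient over logged pairs. This is a sketch with made-up field names: `GateObservation` and the `engagement` metric are assumptions, and no real data from the site is implied.

```typescript
// Sketch: the observation shape and engagement metric are assumptions.
interface GateObservation {
  score: number      // editorial gate score logged at publish time
  engagement: number // any per-post reader metric, e.g. read-through rate
}

// Pearson correlation coefficient in [-1, 1].
// Returns NaN for fewer than two observations or when either series has zero variance.
function pearson(observations: GateObservation[]): number {
  const n = observations.length
  if (n < 2) return NaN
  const meanX = observations.reduce((sum, o) => sum + o.score, 0) / n
  const meanY = observations.reduce((sum, o) => sum + o.engagement, 0) / n
  let cov = 0
  let varX = 0
  let varY = 0
  for (const o of observations) {
    const dx = o.score - meanX
    const dy = o.engagement - meanY
    cov += dx * dy
    varX += dx * dx
    varY += dy * dy
  }
  return cov / Math.sqrt(varX * varY)
}
```

One caveat worth keeping in mind: blocked posts never collect engagement, so this only measures how scores behave above the gate. It can say whether an 88 outperforms an 82, not whether 81 was the right place to draw the line.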
How would you measure whether the quality gate is actually calibrated? The answer I give myself right now doesn't fully convince me.