Blog | Juanchi.dev

Tag: research-driven-agents

1 article

How They Broke the Top AI Agent Benchmarks — and What That Says About My Stack

I read the paper that exploded on HN about how top AI agent benchmarks get shattered. The problem isn't the models — it's that we're measuring the wrong things and building on sand. Worst part: I recognized the same patterns in my own agents.

7 min · 519