Broken AI Agent Benchmarks: What It Says About Your Stack | Juanchi.dev