§ Independent · Reproducible · Evidence-led

AI coding, reviewed by engineers who ship.

Independent reviews of Claude, GPT-5, Gemini, Cursor, Aider, and Windsurf. Battle-tested prompts and hands-on guides written by working engineers — not benchmarks pulled from a blog post.

Scored on 14 real tasks · Updated weekly · Every result reproducible
6 guides
6 tested prompts
6 AI tools tracked
4 cheatsheets
2 weekly trends
§ AI TASK MATRIX

14 tasks, all the tools, scored

Full matrix →
§ 01 Scaffold · Spin up a typed React SPA · upcoming
§ 02 Refactor · Cross-file, 60k+ lines · 2 guides
§ 03 Test-gen · Unit + E2E from a brief · 2 guides
§ 04 Debug · Trace and fix a race · 1 guide
§ 05 Schema · Design SQL for multi-tenant · 2 guides
§ 06 Migration · Prisma diff → up/down · upcoming
§ 07 Review · Find seeded bugs in a PR · 1 guide
§ 08 Docs · Typed JSDoc from source · upcoming
§ 09 API · Typed client from OpenAPI · upcoming
§ 10 Agent · 15-step autonomous fix · 4 guides
§ 11 Frontend · Page from a Figma brief · upcoming
§ 12 Perf · p99 regression hunt · upcoming
§ 13 Security · Audit a dep tree · upcoming
§ 14 Data · SQL from natural language · upcoming
§ LATEST

Most recent guides & reviews

All guides → · Reviews →
APR 24
React useEffect cleanup function: when, why, and 4 patterns
When and why React useEffect needs a cleanup function, the 4 patterns that cover 95% of cases, plus what changed in React 18 Strict Mode (effect…
post · 6 min
APR 23
Long-context evals keep diverging from reality: the 1M-token number nobody earns
Vendor 1M-context numbers keep outperforming my production RAG task by 30+ points. The three reasons the benchmarks lie, and what I trust instead.
analysis · 3 min
APR 23
Cursor 3 ships parallel agents: what changes in my pipeline, and what does not
Cursor 3 shipped parallel Composer 2 agents and a background agent on April 2, 2026. Two tests moved in my pipeline, four did not. The 90-second…
analysis · 2 min
APR 23
RAG defaults 2026 cheatsheet: copy, paste, ship
The RAG parameter defaults that moved my top-1 accuracy from 74% to 91% in 2026. Chunk size, overlap, rerank, hybrid BM25, and the 2 flags people…
cheatsheet · 2 min
APR 23
Cursor 3 shortcuts and settings cheatsheet
The 18 Cursor 3 keyboard shortcuts and 6 settings that changed since 2.x. Composer, parallel agents, tab-complete, and the bindings they moved.
cheatsheet · 2 min
APR 23
Claude Opus 4.7 tool calling cheatsheet: the 7 settings that make tool use reliable
The 7 settings that move Claude Opus 4.7 tool-call reliability from 94% to 99.2%. Adaptive thinking, tool_choice, disable_parallel_tool_use, stop_sequences, and the sampling params you must now…
cheatsheet · 3 min