Research & Analysis

What we're learning

Benchmarks, model comparisons, methodology, and insights from testing what LLMs actually build.

24 February 2026
I gave three LLMs the same brief and asked them to build a website. Here's what happened.
Same brief. Same constraints. Three models. Opus 4.6, Sonnet 4.5, and GPT-5.2 each built a complete website from a single client spec. The results are revealing.
comparison, opus, sonnet, gpt, research