Overview
On February 24, 2025, Anthropic released Claude 3.7 Sonnet — the first production model to make “extended thinking” a first-class, user-facing feature.
Unlike o1/o3’s internal chain-of-thought reasoning (hidden from users), Claude 3.7 Sonnet’s thinking process was visible and configurable. Users could set a “thinking budget” — from 1,000 to 64,000 tokens — and watch the model reason through complex problems in real time.
The “10x Developer” Moment
Claude 3.7 Sonnet’s standout capability was agentic coding at human scale:
- Could plan and execute a multi-step refactoring across a 50,000-line codebase
- Could write, run, and debug tests in a single conversation
- Maintained context across dramatically longer interactions than previous models
On SWE-bench Verified (the authoritative version of the coding benchmark), Claude 3.7 Sonnet scored 62.3% — a significant jump from Claude 3.5 Sonnet’s 49%.
Significance
Claude 3.7 Sonnet’s release had two lasting impacts:
- “Thinking budget” became a standard feature across AI products — users learned to allocate compute for complex tasks just as they allocated memory for running programs
- It established Anthropic’s identity as “the coding model” — even as OpenAI competed on benchmarks, Anthropic competed on developer experience and code quality