Overview
Between late 2024 and early 2025, AI systems made a qualitative leap, from conversational assistants that answer questions to autonomous agents that take actions. Three landmark releases defined this transition:
- October 22, 2024: Anthropic released Claude Computer Use in public beta, the first commercially available API allowing AI to see a screen and control a keyboard and mouse
- January 23, 2025: OpenAI launched Operator, an AI agent that could autonomously browse the web, fill forms, place orders, and complete multi-step tasks
- February 2, 2025: OpenAI released Deep Research, an agent that conducts long-running autonomous research tasks (typically 5 to 30 minutes each), synthesizing dozens to hundreds of sources into analyst-grade reports
Together, these marked the emergence of agentic AI as a mainstream product category.
Claude Computer Use
On October 22, 2024, Anthropic made Claude’s computer use capability available via API in public beta. This enabled AI to:
- Take screenshots and interpret what’s on screen
- Move a cursor and click on elements
- Type into text fields
- Navigate applications and websites
- Execute sequences of actions to complete goals
Unlike browser-automation tools such as Selenium and Playwright, which drive a page through its DOM and scripted selectors, Claude Computer Use operated at the visual interface level, the same way a human interacts with a computer. In principle, this made it applicable to any application without custom integration.
Early demonstrations included: filling in forms, writing and running code in a terminal, navigating file systems, and completing multi-step tasks across multiple applications.
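The observe-decide-act cycle described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not Anthropic's API: `Action`, `FakeScreen`, `scripted_policy`, and `run_agent` are hypothetical names, the "screenshot" is a text summary rather than pixels, and the scripted policy stands in for the model call that would choose the next action.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a real desktop; records what the agent did."""
    focused: bool = False
    typed: str = ""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would receive pixels; here the observation is a text summary.
        return f"focused={self.focused} typed={self.typed!r}"

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.focused = True
        elif action.kind == "type" and self.focused:
            self.typed += action.text
        self.log.append(action.kind)


def scripted_policy(observation: str) -> Action:
    """Hypothetical policy; a real system would send the screenshot to a model."""
    if "focused=False" in observation:
        return Action("click", x=100, y=200)
    if "typed=''" in observation:
        return Action("type", text="hello")
    return Action("done")


def run_agent(screen: FakeScreen, policy: Callable[[str], Action], max_steps: int = 10) -> int:
    """Observe, decide, act; stop on "done" or when the step budget runs out."""
    for step in range(max_steps):
        action = policy(screen.screenshot())
        if action.kind == "done":
            return step
        screen.apply(action)
    return max_steps


screen = FakeScreen()
steps = run_agent(screen, scripted_policy)
print(steps, screen.typed, screen.log)
```

The step budget (`max_steps`) is one simple way to bound an agent that never reaches its goal; a production system would likely need richer stopping and recovery rules.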
OpenAI Operator
On January 23, 2025, OpenAI launched Operator for US Pro subscribers. Powered by a new model called Computer-Using Agent (CUA) — combining GPT-4o vision with reinforcement learning — Operator could:
- Browse any website autonomously
- Handle login flows, shopping carts, form submissions
- Book restaurants, order groceries, fill out applications
- Recover from errors and try alternative approaches
Key benchmarks (task success rates reported by OpenAI at launch): OSWorld 38.1% (human baseline: ~72%); WebArena 58.1%.
Operator represented the first time a major AI company shipped an autonomous web agent as a consumer product. Its limitations were also instructive: it struggled with CAPTCHAs, complex multi-page workflows, and tasks requiring real-world judgment. Operator was eventually merged into a unified “ChatGPT agent” on July 17, 2025.
OpenAI Deep Research
On February 2, 2025, OpenAI released Deep Research — an agentic tool designed for long-horizon knowledge tasks. Given a research question, it would:
- Decompose the question into sub-queries
- Autonomously browse and read dozens to hundreds of web sources
- Synthesize findings into a structured, cited report
- Complete tasks in 5–30 minutes
Powered by a version of o3 optimized for web browsing, Deep Research represented a new category: AI as research analyst. Its reports were positioned as comparable to work that would take a skilled human researcher hours or days.
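The decompose, read, and synthesize steps listed above can be sketched as a small pipeline. This is an illustrative toy, not OpenAI's implementation: `CORPUS`, `STOP_WORDS`, `decompose`, `search`, and `synthesize` are hypothetical, the in-memory corpus stands in for live web browsing, and keyword overlap stands in for model-driven reading.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory "web": URL -> page text. A real agent would browse live sites.
CORPUS: Dict[str, str] = {
    "https://example.com/a": "Operator launched for US Pro subscribers.",
    "https://example.com/b": "Deep Research produces cited reports.",
    "https://example.com/c": "MCP is an open standard for tool access.",
}
STOP_WORDS = {"what", "is", "and", "the", "a", "an", "for"}


def decompose(question: str) -> List[str]:
    """Split a question into sub-queries (naive: one per ' and '-separated clause)."""
    return [part.strip(" ?") for part in question.split(" and ") if part.strip(" ?")]


def search(sub_query: str) -> List[Tuple[str, str]]:
    """Return (url, text) pairs whose text shares a keyword with the sub-query."""
    keywords = {w.strip("?.,").lower() for w in sub_query.split()} - STOP_WORDS
    hits = []
    for url, text in CORPUS.items():
        words = {w.strip(".,").lower() for w in text.split()}
        if keywords & words:
            hits.append((url, text))
    return hits


def synthesize(question: str, findings: Dict[str, List[Tuple[str, str]]]) -> str:
    """Assemble a report in which every finding carries a numbered citation."""
    sources: List[str] = []
    lines = [f"Report: {question}"]
    for sub_query, hits in findings.items():
        lines.append(f"* {sub_query}")
        for url, text in hits:
            if url not in sources:
                sources.append(url)
            lines.append(f"  {text} [{sources.index(url) + 1}]")
    lines.append("Sources:")
    lines += [f"[{i}] {url}" for i, url in enumerate(sources, start=1)]
    return "\n".join(lines)


question = "What is Operator and what is MCP?"
sub_queries = decompose(question)
findings = {q: search(q) for q in sub_queries}
report = synthesize(question, findings)
print(report)
```

Keeping the source list separate from the findings is what lets each claim in the report point back to a specific page, which is the property the "cited report" bullet describes.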
The MCP Infrastructure Layer
Underpinning the agentic ecosystem was the Model Context Protocol (MCP), released by Anthropic on November 25, 2024. MCP is an open standard that allows AI models to:
- Connect to data sources (databases, file systems, APIs) through standardized connectors (MCP servers) that expose tools, resources, and prompts
- Maintain state across multi-step task sequences
- Compose multiple tools in a single workflow
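As a rough illustration of composing tools through a uniform call interface, the sketch below dispatches JSON-RPC-style `tools/call` requests (the method name MCP uses, with `name` and `arguments` parameters) to local functions. Everything else is an assumption made for illustration: the `TOOLS` registry, the `tool` decorator, the `handle` and `call` helpers, and the simplified `{"value": ...}` result shape are hypothetical and do not use any MCP SDK or transport.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical local tool registry; a real MCP server would expose tools over a transport.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a function under a tool name."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register


@tool("read_file")
def read_file(path: str) -> str:
    # Stand-in for a file-system connector; returns canned text instead of touching disk.
    fake_files = {"/notes.txt": "alpha beta gamma"}
    return fake_files[path]


@tool("word_count")
def word_count(text: str) -> int:
    return len(text.split())


def handle(request: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch a JSON-RPC-style 'tools/call' request to the named tool."""
    if request.get("method") != "tools/call":
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    params = request["params"]
    fn = TOOLS.get(params["name"])
    if fn is None:
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32602, "message": f"Unknown tool: {params['name']}"}}
    result = fn(**params.get("arguments", {}))
    # Simplified result shape for this sketch, not the real MCP result schema.
    return {"jsonrpc": "2.0", "id": request["id"], "result": {"value": result}}


def call(request_id: int, name: str, **arguments: Any) -> Any:
    """Build a request, round-trip it through JSON to mimic the wire, and unwrap the result."""
    request = {"jsonrpc": "2.0", "id": request_id, "method": "tools/call",
               "params": {"name": name, "arguments": arguments}}
    response = handle(json.loads(json.dumps(request)))
    return response["result"]["value"]


# Compose two tools in one workflow: the output of the first feeds the second.
text = call(1, "read_file", path="/notes.txt")
count = call(2, "word_count", text=text)
print(count)
```

Because every tool is reached through the same request shape, the calling side needs no tool-specific glue, which is the property that makes composition cheap.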
By March 2026, MCP had crossed 97 million installs. The Linux Foundation announced it would take MCP under open governance, signaling its transition from a single-vendor project to foundational AI infrastructure, analogous to HTTP for the web.
Why This Matters
The shift from conversational to agentic AI is arguably the most significant change in how humans interact with AI systems since the public launch of ChatGPT. Key implications:
New failure modes: Agentic AI systems can cause real-world consequences — sending emails, making purchases, executing code — that may be difficult to reverse. Safety research shifted from “preventing harmful outputs” to “preventing harmful actions.”
Accelerated economic disruption: A copilot could help a developer; an agent could act as a developer, lawyer, researcher, or analyst. The economic displacement potential expanded from augmentation to substitution.
Trust architecture: Agentic AI requires new frameworks that define which agents hold which permissions, how actions are audited, and when humans stay in the loop. Enterprise adoption depends on solving authorization, auditability, and scope limitation.
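One way to picture these three concerns together is a gate that every agent action must pass through. The sketch below is a minimal illustration, not an established framework: `Policy`, `Gate`, and `approve_small_purchases` are hypothetical names, and the approval rule is an arbitrary example of a human-in-the-loop decision.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Policy:
    allowed: Set[str]             # actions the agent may take on its own
    needs_approval: Set[str]      # actions that require a human decision


@dataclass
class Gate:
    policy: Policy
    approver: Callable[[str, Dict[str, str]], bool]
    audit_log: List[Dict[str, str]] = field(default_factory=list)

    def request(self, agent: str, action: str, details: Dict[str, str]) -> bool:
        """Decide whether an action may run, and record the decision either way."""
        if action in self.policy.allowed:
            decision = "allowed"
        elif action in self.policy.needs_approval:
            decision = "approved" if self.approver(action, details) else "rejected"
        else:
            decision = "denied"   # default-deny anything outside the declared scope
        self.audit_log.append({"agent": agent, "action": action, "decision": decision})
        return decision in ("allowed", "approved")


def approve_small_purchases(action: str, details: Dict[str, str]) -> bool:
    """Hypothetical stand-in for a human reviewer: approve purchases up to 50."""
    return action == "make_purchase" and float(details.get("amount", "0")) <= 50.0


gate = Gate(Policy({"read_page"}, {"make_purchase", "send_email"}), approve_small_purchases)
results = [
    gate.request("shopper", "read_page", {"url": "https://example.com"}),
    gate.request("shopper", "make_purchase", {"amount": "20"}),
    gate.request("shopper", "make_purchase", {"amount": "500"}),
    gate.request("shopper", "delete_files", {}),
]
decisions = [entry["decision"] for entry in gate.audit_log]
print(results, decisions)
```

Logging denied and rejected requests as well as successful ones matters for auditability: the record shows what the agent attempted, not only what it was permitted to do.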
The prompt injection threat: Agents that browse the web are vulnerable to adversarial web content designed to redirect their behavior. This is an attack surface that conventional web security controls were not designed to cover.