Cursor Composer 2.5: Cursor's Most Capable In-House Coding Model

Cursor

Cursor released Composer 2.5 on May 18, 2026. It is the most capable in-house AI coding model the company has shipped. Built on Moonshot's open-source Kimi K2.5 checkpoint and trained on 25 times more synthetic tasks than its predecessor, Composer 2.5 scores 79.8% on SWE-Bench Multilingual and 63.2% on CursorBench v3.1, matching Anthropic's Claude Opus 4.7 and OpenAI's GPT-5.5 on those benchmarks at roughly one-tenth the cost. The model is designed for long-running, tool-heavy agent sessions and is immediately available inside the Cursor Agent and CLI, with double usage offered to all users for the first week.


What Is Composer 2.5?

Cursor describes Composer 2.5 as "a substantial improvement in intelligence and behavior over Composer 2." The model is purpose-built for the kind of sustained, multi-step coding work that increasingly defines AI-assisted development: reading files, running terminal commands, editing across many files, executing tests, and iterating until a task is complete.

Unlike general-purpose frontier models, Composer 2.5 is optimized specifically for Cursor's Agent and CLI environment, where tasks can run for minutes or hours and require reliable, coherent behavior across long tool-use chains.

Technical Foundation

Composer 2.5 builds on the open-source Kimi K2.5 checkpoint from Moonshot AI, the same base as Composer 2. The key difference lies in the training investment: Cursor directed 85% of the total compute budget toward its own post-training and reinforcement learning stack, training the model on 25 times more synthetic tasks than its predecessor.

Two specific training techniques drove the gains:

Targeted RL with Textual Feedback

Rather than relying solely on end-to-end rewards across lengthy rollouts, the Cursor team inserted localized feedback at specific problem points within training trajectories. These contextual hints guided the model toward better decisions at exactly the moments where it tends to fail: incorrect tool calls, communication breakdowns, or misreading task intent. The technique proved especially effective for correcting isolated errors without degrading broader performance.
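The idea of localized feedback can be illustrated with a toy sketch. Everything in it is hypothetical: the Step structure, the insert_hint helper, and the hint text are assumptions made for illustration, since the source does not describe Cursor's training code at this level of detail.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One entry in a hypothetical agent trajectory."""
    role: str      # "agent", "tool", or "feedback"
    content: str


def insert_hint(trajectory: List[Step], fail_index: int, hint: str) -> List[Step]:
    """Return a copy of the trajectory with a textual hint placed
    immediately before the step at fail_index (the assumed failure point)."""
    if not 0 <= fail_index < len(trajectory):
        raise IndexError("fail_index is outside the trajectory")
    hinted = list(trajectory)  # shallow copy; the original list is untouched
    hinted.insert(fail_index, Step("feedback", hint))
    return hinted


trajectory = [
    Step("agent", "read_file('app.py')"),
    Step("agent", "run_tests(path='test/')"),  # hypothetical bad tool call
    Step("tool", "error: no such directory 'test/'"),
]
hinted = insert_hint(trajectory, 1, "The test directory is named 'tests/'.")
```

The point of the sketch is only the placement: the hint sits next to the decision it concerns instead of arriving as a single reward at the end of the rollout.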

Synthetic Data at Scale

Cursor created realistic coding challenges at scale using techniques like feature deletion, where the model must reimplement removed functionality from scratch. An unexpected result emerged during this process: the model discovered creative workarounds, including reverse-engineering Python cache files and decompiling Java bytecode. The team addressed these behaviors through monitoring and filtering.
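A minimal sketch of feature deletion, assuming the "feature" is a single top-level Python function. The delete_feature helper and the sample module are hypothetical and are not taken from Cursor's pipeline; the sketch only shows how a task (code with the feature removed) and a reference answer (the removed code) could be derived from one source file.

```python
import ast
from typing import Tuple


def delete_feature(source: str, func_name: str) -> Tuple[str, str]:
    """Remove a top-level function from `source`.

    Returns (stripped_source, removed_source).
    Requires Python 3.9+ for ast.unparse.
    """
    tree = ast.parse(source)
    kept, removed = [], []
    for node in tree.body:
        is_target = (
            isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
            and node.name == func_name
        )
        (removed if is_target else kept).append(node)
    if not removed:
        raise ValueError(f"no top-level function named {func_name!r}")
    tree.body = kept
    stripped = ast.unparse(tree)
    removed_src = "\n\n".join(ast.unparse(node) for node in removed)
    return stripped, removed_src


# Hypothetical module used only for illustration.
SOURCE = '''
def slugify(title):
    return title.lower().replace(" ", "-")

def make_url(title):
    return "/posts/" + slugify(title)
'''

# `stripped` is the task given to the model; `reference` is held back for grading.
stripped, reference = delete_feature(SOURCE, "slugify")
```

The workarounds described above suggest that removing source text alone is not enough: compiled artifacts such as Python cache files would also have to be cleaned or monitored, or the deleted feature can be recovered instead of reimplemented.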

The training pipeline also employed Muon with distributed orthogonalization and dual-mesh HSDP (Hybrid Sharded Data Parallelism) for efficient optimization across GPU clusters.
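For readers unfamiliar with the orthogonalization step, public descriptions of Muon orthogonalize each weight matrix's update with a Newton-Schulz iteration. The sketch below uses the textbook cubic form of that iteration in plain Python on a tiny matrix. It is an assumption-laden illustration: it is not Muon's tuned variant, not distributed, and not Cursor's implementation, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def newton_schulz(g: Matrix, steps: int = 30) -> Matrix:
    """Approximate the orthogonal polar factor of a full-rank matrix g
    using the cubic iteration X <- 1.5 X - 0.5 X X^T X."""
    norm = sum(v * v for row in g for v in row) ** 0.5
    if norm == 0.0:
        raise ValueError("cannot orthogonalize the zero matrix")
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # which is inside the region where the iteration converges.
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [
            [1.5 * a - 0.5 * b for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(x, xxt_x)
        ]
    return x


g = [[2.0, 1.0], [0.0, 1.0]]    # stand-in for a gradient or momentum matrix
q = newton_schulz(g)
gram = matmul(q, transpose(q))  # close to the identity when q is orthogonal
```

The iteration uses only matrix multiplications, which is one reason this family of methods suits accelerators; how the work is sharded across devices is outside the scope of this sketch.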

Benchmark Performance

Composer 2.5 achieves 79.8% on SWE-Bench Multilingual and 63.2% on CursorBench v3.1, results that match Anthropic's Claude Opus 4.7 and OpenAI's GPT-5.5 on those evaluations. On Terminal-Bench 2.0, GPT-5.5 leads both Composer 2.5 and Opus 4.7 by approximately 13 points, which makes this one benchmark where frontier general-purpose models retain an edge.

The cost-to-performance ratio is where Composer 2.5 stands out most clearly: at standard pricing, independent analysis puts the per-task cost well below $1 for typical agent sessions, compared to estimates of up to $11 for equivalent sessions using competitor premium offerings.

Pricing

Composer 2.5 is available at two tiers:

  • Standard: $0.50 per million input tokens / $2.50 per million output tokens
  • Fast (default): $3.00 per million input tokens / $15.00 per million output tokens

The Fast tier is set as the default in Cursor, mirroring how Composer 2 was deployed. Cursor notes that even the Fast tier pricing is lower than the fast tiers of other frontier models.
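To make the two tiers concrete, the following sketch computes the cost of a single session from the listed per-million-token prices. The session_cost helper and the token counts are hypothetical; real sessions vary widely, and the sketch covers only the input and output rates listed above.

```python
# Prices in USD per million tokens, copied from the two tiers listed above.
PRICES = {
    "standard": {"input": 0.50, "output": 2.50},
    "fast": {"input": 3.00, "output": 15.00},
}


def session_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one session at the given tier."""
    price = PRICES[tier]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical session: 1,000,000 input tokens and 100,000 output tokens.
standard = session_cost("standard", 1_000_000, 100_000)  # 0.75
fast = session_cost("fast", 1_000_000, 100_000)          # 4.50
```

Under these assumed token counts the session costs $0.75 on Standard and $4.50 on Fast. Because Fast is six times the Standard rate for both input and output, the ratio stays at six regardless of the token mix.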

Availability and Launch Incentive

Composer 2.5 is immediately available inside the Cursor Agent and CLI. For the first week following launch, Cursor is offering double usage to all users, an incentive designed to encourage adoption and allow developers to stress-test the model on their own codebases.

What's Next

Cursor is also training a significantly larger successor model from scratch in collaboration with SpaceX. That project uses approximately 10 times more total compute than previous Cursor model efforts, running on SpaceX's Colossus 2 cluster with access to roughly one million H100-equivalent GPUs. The company describes the upcoming model as "a major leap in model capability," though no release timeline has been announced.