o1

Frontier Reasoning | OpenAI | Released September 2024

OpenAI o1 introduced a new paradigm: test-time compute scaling. Rather than simply increasing parameters, o1 "thinks" before answering — spending compute on chain-of-thought reasoning traces before producing a response. This yields dramatically improved performance on hard math, scientific reasoning, and competitive programming at the cost of slower response times.

Context Window
128,000 tokens
License
Proprietary
Capabilities
  • Chain-of-thought reasoning (internal)
  • PhD-level science and math
  • Competitive programming
  • Multi-step logical deduction
  • Complex coding and debugging
Pricing
Input: $15.00/1M tokens | Output: $60.00/1M tokens. Significantly more expensive than GPT-4o; the premium is justified mainly for hard reasoning tasks.
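The per-request cost at these rates can be sketched as below. This is a hypothetical helper, not an official SDK function; note that o1's hidden reasoning tokens are billed as output tokens, so output counts can far exceed the visible response.

```python
# Hypothetical cost estimator using the listed o1 rates (USD per 1M tokens).
O1_INPUT_PER_M = 15.00
O1_OUTPUT_PER_M = 60.00

def estimate_cost(input_tokens: int, output_tokens: int,
                  in_rate: float = O1_INPUT_PER_M,
                  out_rate: float = O1_OUTPUT_PER_M) -> float:
    """Return the USD cost of one request at per-1M-token rates."""
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

# A 2,000-token prompt with 10,000 billed output tokens (including hidden
# reasoning tokens): 0.03 + 0.60 = $0.63.
print(round(estimate_cost(2_000, 10_000), 2))  # → 0.63
```

The same arithmetic makes the GPT-4o comparison concrete: at o1's rates, a reasoning-heavy request can cost several times more than the visible response length alone would suggest.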
Benchmarks
AIME 2024: 83.3% | MATH: 94.8% | GPQA Diamond: 78.3% | Codeforces Rating: 1807 (96th percentile)
o1 opens a new product category: reasoning models. A response from Anthropic is expected, and Google's Gemini 1.5 thinking mode is a partial analog. On hard reasoning tasks, o1's benchmark performance is the current undisputed frontier.
Related Models
  • GPT-4o
  • o1-mini
  • o1-preview