o1

Frontier Reasoning | by OpenAI | Released September 2024

OpenAI o1 introduced a new paradigm: test-time compute scaling. Rather than improving only by adding parameters, o1 "thinks" before answering. It spends compute generating an internal chain-of-thought reasoning trace before it produces a response. This yields large gains on hard math, scientific reasoning, and competitive programming, at the cost of slower response times.
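o1's internal reasoning procedure is not public. As a rough illustration of the general idea that extra inference compute can buy accuracy, the sketch below uses a different, well-known technique: sample several independent answers and take a majority vote (self-consistency). It assumes a toy model in which each sample is independently correct with probability p and every wrong sample counts against the majority. `majority_vote_accuracy` is a hypothetical helper for illustration, not part of any OpenAI API.

```python
from math import comb


def majority_vote_accuracy(p: float, k: int) -> float:
    """Probability that a strict majority of k independent samples is correct.

    Toy assumption (hypothetical): each sample is correct with probability p,
    and all incorrect samples count against the majority (two-outcome model).
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd number so there are no ties")
    need = k // 2 + 1  # smallest strict majority
    # Sum the binomial probabilities of getting `need` or more correct samples.
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(need, k + 1))


# Spend more inference compute (more samples) on the same question.
for k in (1, 5, 15, 63):
    print(f"k={k:>2}: accuracy={majority_vote_accuracy(0.6, k):.3f}")
```

Under these assumptions, accuracy rises with k only when p > 0.5; when p < 0.5, voting amplifies the error instead. Real wrong answers usually spread across many different values, so plurality voting often does better than this pessimistic model suggests.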

Context Window: 128,000 tokens
License Model: Proprietary
Benchmark Status: Verified
  • Chain-of-thought reasoning (internal)
  • PhD-level science and math
  • Competitive programming
  • Multi-step logical deduction
  • Complex coding and debugging
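Input: $15.00/1M tokens | Output: $60.00/1M tokens. The hidden reasoning tokens are billed as output tokens, so a request can cost well more than its visible response suggests. Significantly more expensive than GPT-4o; best reserved for hard reasoning tasks where the accuracy gain justifies the cost.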
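A minimal sketch of per-request cost at the listed prices. Because o1's hidden reasoning tokens are billed as output tokens, they belong in the output term alongside the visible answer. The function name `request_cost` and the token counts in the example are hypothetical; in practice the counts come from the usage information the API returns for each request.

```python
# Listed o1 prices in USD per 1M tokens.
INPUT_PRICE_PER_M = 15.00
OUTPUT_PRICE_PER_M = 60.00


def request_cost(input_tokens: int, visible_output_tokens: int, reasoning_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Reasoning tokens are not shown in the response but are billed at the
    output rate, so they are added to the visible output tokens.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * INPUT_PRICE_PER_M + billed_output * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical request: short prompt, short answer, long hidden reasoning trace.
print(f"${request_cost(2_000, 500, 4_000):.2f}")
```

With 2,000 input tokens, 500 visible output tokens, and 4,000 reasoning tokens, this estimate comes to $0.30, five times the $0.06 it would be if only the visible output were billed.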
AIME 2024: 83.3% | MATH: 94.8% | GPQA Diamond: 78.3% | Codeforces Rating: 1807 (96th percentile)
o1 opens a new product category: reasoning models. A response from Anthropic is expected, and Google's Gemini 1.5 thinking mode is a partial analog. On hard reasoning benchmarks, o1 currently sets the frontier.
Related Models
  • GPT-4o
  • o1-mini
  • o1-preview