OpenAI o3

In late December 2024, OpenAI unveiled o3, a new AI model that advances the way AI systems process information. Unlike the GPT series, o3 uses an innovative approach to problem solving that closely mirrors human cognitive processes.

Model name

OpenAI’s model naming shows advances in capabilities and design. The “o” series, starting with o1, highlights models that emphasize advanced reasoning and problem-solving skills, a departure from the “GPT” nomenclature. While GPT-4o (“o” for “omni”) focuses on multimodal features, processing text, images and audio, the o3 model focuses on reasoning and analysis tasks.

Technical innovation

OpenAI o3

O3 implements “test-time computation,” which allows it to spend long periods of time exploring solutions, similar to human thinking. It works in two modes: high computation for maximum performance and low computation for efficiency. Even in low computation mode, o3 demonstrates capabilities that exceed average human criteria. Benchmark performances of the model include:

  • 87,5% accuracy on ARC-AGI benchmark in high compute mode
  • 25,2% accuracy on the Frontier Math benchmark, solving research-level math problems

76% accuracy on ARC-AGI in low compute mode, setting a new baseline for efficient AI performance

  • These measures represent a significant advance in AI problem-solving capabilities.

What is OpenAI o3?

OpenAI considers that the o1 and oXNUMX models o3 are at the forefront of LLM development. As a reasoning model, o3 is designed to handle more complex tasks than existing model types, such as GPT-4o. The o3 model uses a process called simulated reasoning, which allows the model to pause and reflect on its internal thought processes before responding. Simulated reasoning goes beyond chain-of-thought (CoT) prompting to provide a more advanced, integrated, autonomous approach to self-analysis and reflection on model output. Simulated reasoning mimics human reasoning by identifying patterns and drawing conclusions based on those patterns.

What can OpenAI o3 do?

As a transformer-based model, it can handle common LLM activities including knowledge-based answering, summarization, and text generation. The o3 model has advanced capabilities in several areas :

  • Advanced reasoning. The model is capable of step-by-step logical reasoning and can handle complex tasks requiring detailed analysis.
  • Programming and Coding. The o3 model is very proficient in coding, achieving an accuracy of 71,7% on SWE-bench Verified, a benchmark that consists of real software tasks, marking a 20% improvement over the o1 model.
  • Mathematics. Users can perform complex mathematical operations with the model with a capability that surpasses o1. OpenAI reported that o3 achieved 96,7% accuracy on the American Invitational Mathematics Examination (AIME), compared to 83,3% for o1.
  • Science. The o3 model will also be useful for scientific research. According to OpenAI, the model achieved 87,7% accuracy on GPQA Diamond, a benchmark that tests PhD-level scientific questions.
  • Self-fact-checking. O3 can self-fact-check, improving the accuracy of its answers.
  • Adaptability to general artificial intelligence. Among the major advances claimed by OpenAI for o3, there is the performance on the ARC-AGI benchmark.

OpenAI or 3-mini

OpenAI o3-mini

On January 31, 2025, OpenAI released o3-mini for all ChatGPT users (including the free tier) and some API users. O3-mini offers three levels of reasoning effort: low, medium, and high. The free version uses medium. The more computationally intensive variant is called o3-mini-high and is available to paid subscribers.

OpenAI o3-mini is the newest and most cost-effective model in their reasoning series. This model pushes the boundaries of what small models can achieve, delivering exceptional STEM capabilities – with particular strength in science, math, and coding – while maintaining the low cost and low latency of OpenAI o1-mini.

Developers can choose between three reasoning effort options – low, medium, and high – to optimize for their specific use cases. This flexibility allows o3-mini to “think harder” when tackling complex challenges or prioritize speed when latency is an issue.

Access and availability of OpenAI o3 and o3-mini

The initial version of the o3 model was restricted and limited, primarily used for public safety testing, requiring potential users to request access. As of February 3, 2025, the base o3 model is only available as part of the OpenAI Deep Search service, which is initially exclusive to ChatGPT Pro users. The o3-mini model became generally available on January 31, 2025. It is accessible through several channels:

  • Access Chat GPT :
    • Free users have limited access to the o3-mini model with rate restrictions. To access it, free plan users can select “Reason” in the message composer or regenerate a response. This is the first time a reasoning model has been made available to free users in ChatGPT.
    • ChatGPT Plus users get access to the o3-mini model with a limit of 150 messages per day. As part of the upgrade, OpenAI is tripling the throughput limit for Plus and Team users from 50 messages per day with o1-mini to 150 messages per day with o3-mini.
    • ChatGPT Pro users have unlimited access to the o3-mini model. Pro users also have the ability to select o3-mini-high in the template selector for a smarter version that takes a little longer to generate responses.
  • API Access: The o3-mini model is available via the API for developers with an initial pricing of $1,10 per million input tokens and $4,40 per million output tokens. OpenAI o3-mini is rolling out to the Chat Completion API, Assistants API, and Batch API starting January 31, 2025, to select developers in API usage tiers 3-5.

In ChatGPT, o3-mini uses average reasoning effort to balance speed and accuracy. All paid users also have the option to select o3-mini-high in the template selector for a smarter version that takes a little longer to generate responses.

Security techniques

The o3 model incorporates a security technique called deliberative alignment, which uses the model’s reasoning to evaluate the security implications of user requests. This approach allows the model to analyze prompts and identify hidden intentions, improving the accuracy of rejecting dangerous content and avoiding unnecessary rejections of safe content. On February 6, 2025, OpenAI announced an update to improve the transparency of the thinking process in its o3-mini model.

Impact

The introduction of the o3 model signifies an evolution towards AI systems capable of handling complex reasoning and problem-solving tasks. Its improved performance and innovative features make it a valuable tool for various applications, from coding to scientific research.