Image Source: The Indian Express
OpenAI has released its most sophisticated AI models to date—o3 and o4-mini—describing them as a step up in reasoning, problem-solving, and tool use. These models can draw on the full range of ChatGPT's tools, including web search, coding, math, image analysis, and image generation, making them the most flexible and "agentic" AIs OpenAI has ever developed. Early testers and OpenAI itself point to their strength on hard tasks, with notable gains in coding, math, and visual reasoning. They are also faster and cheaper to run, setting new benchmarks in academic and real-world applications.
But there's a catch: these smarter models are also more likely to "hallucinate"—fabricating facts, actions, or sources. Internal OpenAI tests and independent researchers found that o3 and o4-mini hallucinate more frequently than earlier models, in some cases doubling previous rates. On OpenAI's PersonQA benchmark, for example, o3's hallucination rate was roughly twice that of its predecessor, and o4-mini fared even worse. Experts suspect that the reinforcement learning methods used to boost reasoning may inadvertently amplify these errors, leading the models to make more claims overall—both accurate and inaccurate.
Despite these hurdles, OpenAI remains optimistic, committing to further research on hallucinations while continuing to push the limits of AI capability. Google and Anthropic are also joining the fray with next-generation models.
Sources: TechCrunch, OpenAI, PYMNTS