Flutterby™! : OpenAI's models becoming less stable?


OpenAI's models becoming less stable?

2025-04-19 01:14:01.150164+02 by Dan Lyke 2 comments

OpenAI’s new reasoning AI models hallucinate more.

In its technical report for o3 and o4-mini, OpenAI writes that “more research is needed” to understand why hallucinations are getting worse as it scales up reasoning models. O3 and o4-mini perform better in some areas, including tasks related to coding and math. But because they “make more claims overall,” they’re often led to make “more accurate claims as well as more inaccurate/hallucinated claims,” per the report.
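(As a back-of-the-envelope aside: the "make more claims overall" bit is largely arithmetic. If a model emits more claims per answer, the absolute counts of both accurate and hallucinated claims climb even when the error rate barely moves. A quick Python sketch with made-up numbers, nothing below is from the report:)

    # Hypothetical numbers for illustration only; none of this comes from OpenAI's report.
    # The point: a model that emits more claims per answer racks up more correct claims
    # AND more hallucinated claims, even if its error rate changes only a little.

    def claim_counts(total_claims: int, hallucination_rate: float) -> tuple[int, int]:
        """Split a claim count into (accurate, hallucinated) at a given error rate."""
        hallucinated = round(total_claims * hallucination_rate)
        return total_claims - hallucinated, hallucinated

    # A terser model: fewer claims per answer.
    print(claim_counts(total_claims=100, hallucination_rate=0.15))   # (85, 15)

    # A chattier "reasoning" model: three times the claims, slightly worse rate.
    print(claim_counts(total_claims=300, hallucination_rate=0.20))   # (240, 60)

    # More accurate claims in absolute terms, and four times as many hallucinations.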

It's interesting that we're using terms like "reasoning" in conjunction with machines "hallucinating". Like, when I see a person on the street ranting at the sky, I don't think of their behavior as connected to "reasoning".

A careful read of this article also demonstrates all of the ways in which OpenAI has managed to define success for itself...

[ related topics: Invention and Design Software Engineering Mathematics Artificial Intelligence ]

comments in ascending chronological order:

#Comment Re: OpenAI's models becoming less stable? made: 2025-04-19 01:48:59.559398+02 by: Dan Lyke

Asa Dotzler @asadotzler.com observes:

In reality, they're always hallucinating, because they don't actually know anything and can't discern fact from fiction, but now the useful hallucinations are decreasing and the dangerous ones increasing.

#Comment Re: OpenAI's models becoming less stable? made: 2025-04-19 01:54:31.458357+02 by: brainopener

Here in the future when I see a person on the street ranting at the sky, I wonder if it's a wireless headset.

So my suggestion is to make sure that Bluetooth is disabled on OpenAI servers.



