OpenAI explains why AI hallucinations occur: three solutions to fix the evaluation myth

BlockTempo

OpenAI released a report on the hallucination phenomenon in large language models, pointing out biases in current evaluation mechanisms and suggesting solutions. (Background: Meta's Zuckerberg is busy: after pay packages above a hundred million dollars, three AI geniuses have left within two months) (Additional background: a16z's latest AI top-100 list is out: Grok jumps into the top 4 within a year, and Chinese applications break through globally)

OpenAI published a study earlier this week on the 'hallucination' phenomenon in large language models, arguing that current training and evaluation methods push models toward 'confident guessing' rather than admitting they don't know, which is what produces hallucinations, and proposing a way forward.

Core of the report: evaluation methods push models to guess

The OpenAI research team found that much of today's model training relies on assessment questions in multiple-choice format. A model can score points by guessing luckily, whereas answering 'I don't know' earns nothing. (This is intuitive: on a multiple-choice exam you fill in an answer even when you don't know it, because there is at least a chance of being right.)

The report uses the SimpleQA test as an example, comparing the older o4-mini with the newer gpt-5-thinking-mini: the former has slightly higher accuracy but a 'hallucination rate' of 75%; the latter often chooses to abstain but makes far fewer errors. OpenAI further notes that most developers focus on raising overall accuracy while ignoring that 'confident errors' hurt users far more than an admission of uncertainty does.

The research team summarized the root of the problem in one sentence: 'Standard training and evaluation procedures reward models for guessing rather than admitting limitations when uncertain.' In other words, hallucinations stem not from insufficient hardware or parameter scale but from scoring rules that push models toward high-risk strategies.
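To make the incentive concrete, here is a minimal Python sketch (our illustration, not code from the OpenAI report) of the expected score a model faces on a single binary-graded question:

```python
# Minimal illustration (not from the OpenAI report) of why 0/1 grading
# rewards guessing: abstaining always scores zero, while a guess with any
# nonzero chance of being right has positive expected value.

def expected_scores(p_correct: float) -> dict[str, float]:
    """Expected score on one question under standard 0/1 grading."""
    return {
        "guess": p_correct * 1.0 + (1.0 - p_correct) * 0.0,  # lucky guesses still score
        "abstain": 0.0,                                      # "I don't know" never scores
    }

# Even a 1-in-10 shot makes guessing the rational strategy:
print(expected_scores(0.10))  # {'guess': 0.1, 'abstain': 0.0}
```

Optimized against enough benchmarks graded this way, a model that always guesses will outscore one that honestly abstains, which is precisely the 'confident guessing' the report describes.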
Improving accuracy alone cannot cure hallucinations

The report dismantles five common misconceptions in the industry, the two most important being: first, that simply making a model larger or feeding it more data will eliminate hallucinations; second, that hallucinations are an unavoidable side effect. OpenAI's response: the real world is full of information gaps, and a model of any scale will encounter 'data-sparse' questions; the key is whether the model is allowed to abstain. The report also stresses that smaller models can sometimes recognize their own knowledge gaps more easily, and that by adjusting evaluation criteria, granting partial credit for 'humble responses' while penalizing 'confident errors' more heavily, full-sized models can likewise reduce hallucinations. OpenAI suggests the industry shift from 'correct-answer rates' to 'reliability indicators', for example by making the confidence attached to errors a headline KPI, so that models are encouraged to stay conservative under uncertainty.

Fintech scenario: trust gaps amplify risk

For Wall Street and Silicon Valley, hallucination is not an abstract academic issue but a variable that directly affects market decisions. Quant funds, investment banks, and cryptocurrency trading platforms increasingly rely on LLMs for text analysis, sentiment interpretation, and even automated reporting. If a model hallucinates details of a company's financial report or a contract's terms, the erroneous content can be amplified rapidly through trading algorithms and lead to heavy losses. Regulators and corporate risk-control departments are therefore starting to track 'model honesty' indicators. Several brokerages have already added an 'uncertainty response rate' to their internal acceptance tests, letting models return a preset 'need more data' response in unknown territory. The shift means that even the best-performing AI solutions will struggle to gain acceptance in financial markets if they cannot attach credibility labels to their output.

Next steps: from high-score competition to honesty engineering

Finally, the path OpenAI recommends is to rewrite assessment specifications: first, impose heavy penalties on confidently incorrect answers; second, grant partial credit for appropriately expressed uncertainty; third, require models to return verifiable reference sources (a sketch of such a rubric follows below). The research team argues that this forces models to learn risk management during training, much like the 'preserve capital first' principle in portfolio theory.
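As an illustration of how such a rubric might look inside an evaluation harness, the sketch below grades a single response under all three rules. The penalty weights, the abstention credit, and the `source` field are our assumptions for demonstration, not OpenAI's specification:

```python
# Hedged sketch of a grading rule in the spirit of the report's three
# suggestions: penalize confident wrong answers, give partial credit for
# abstaining, and require a verifiable source for full marks.
# All weights and field names here are illustrative assumptions.

from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Response:
    answer: str | None         # None means the model abstained
    source: str | None = None  # a citation the grader could verify

def grade(resp: Response, gold: str,
          wrong_penalty: float = 2.0, abstain_credit: float = 0.25) -> float:
    if resp.answer is None:
        return abstain_credit                    # rule 2: reward honest uncertainty
    if resp.answer.strip().lower() != gold.strip().lower():
        return -wrong_penalty                    # rule 1: confident errors cost the most
    return 1.0 if resp.source else 0.5           # rule 3: full credit needs a source

# Under this rubric, answering beats abstaining only when the model's
# confidence p satisfies p - (1 - p) * wrong_penalty > abstain_credit,
# i.e. p > 0.75 with the defaults above.
print(grade(Response("Paris", source="https://example.org"), "Paris"))  # 1.0
print(grade(Response(None), "Paris"))                                   # 0.25
print(grade(Response("Lyon"), "Paris"))                                 # -2.0
```

The exact numbers matter less than the shape of the rule: as the penalty for confident errors rises, so does the confidence threshold at which answering is worthwhile, which is how such a rubric teaches a model when to stop.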
For developers, this means competition will no longer be simply about model size but about who can judge accurately, within a limited computational budget, when to stop; for investors and regulators, the new indicators also offer more intuitive risk-control benchmarks. As 'humility' becomes the new norm, the AI ecosystem is shifting from score-oriented to trust-oriented.

This article, 'OpenAI explains why AI hallucinations occur: three solutions to fix the evaluation myth', was first published by BlockTempo, the most influential blockchain news media.