AI-Optimized Harness Puts Haiku 4.5 First on Terminal Benchmark: Junyang Lin Says This Is the "Environment Design" Turn He Predicted

BlockBeatNews

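According to 1M AI News, researchers from Stanford, MIT, and the Korean game company KRAFTON have released Meta-Harness, a method that lets AI automatically optimize the harness: the execution scaffolding that wraps a model and drives an agent's actions, covering prompt design, tool calls, and context management. Instead of relying on a hand-written harness, Meta-Harness has a coding agent read the code, execution logs, and scores of earlier candidate harnesses and iterate on them automatically.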
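The article does not spell out the paper's exact procedure, so the following is only a minimal sketch of the kind of outer loop it describes, under loudly stated assumptions: a harness is reduced to a small config dict rather than real code, the coding agent is replaced by a hypothetical stub `propose`, and `evaluate` is a toy scoring function standing in for a benchmark run. All names (`Candidate`, `evaluate`, `propose`, `meta_harness_search`) are illustrative, not the authors' API.

```python
"""Hypothetical sketch of a Meta-Harness-style search loop (not the paper's code).

Assumptions: a "harness" is a small config dict, the coding agent is a stub,
and the benchmark is a toy scoring function.
"""
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    harness: dict      # stand-in for the harness source code
    logs: list[str]    # execution traces produced during evaluation
    score: float       # toy benchmark score (higher is better)


def evaluate(harness: dict) -> tuple[float, list[str]]:
    """Toy benchmark: rewards ~3 retries and a context budget up to 32k tokens."""
    retries = harness["retries"]
    budget = harness["context_tokens"]
    score = 0.5 - 0.05 * abs(retries - 3) + min(budget, 32_000) / 320_000
    logs = [f"retries={retries} context_tokens={budget} score={score:.3f}"]
    return score, logs


def propose(history: list[Candidate], rng: random.Random) -> dict:
    """Stub for the coding agent. A real agent would read the code, logs, and
    score of every past candidate; this stub only uses scores and tweaks the
    best candidate so far."""
    best = max(history, key=lambda c: c.score)
    new = dict(best.harness)
    knob = rng.choice(["retries", "context_tokens"])
    if knob == "retries":
        new["retries"] = max(0, new["retries"] + rng.choice([-1, 1]))
    else:
        new["context_tokens"] = max(1_000, new["context_tokens"] + rng.choice([-4_000, 4_000]))
    return new


def meta_harness_search(seed_harness: dict, iterations: int, seed: int = 0) -> Candidate:
    rng = random.Random(seed)
    score, logs = evaluate(seed_harness)
    history = [Candidate(seed_harness, logs, score)]
    for _ in range(iterations):
        harness = propose(history, rng)                   # proposer sees the full history
        score, logs = evaluate(harness)                   # run the candidate on the benchmark
        history.append(Candidate(harness, logs, score))   # keep code, logs, and score
    return max(history, key=lambda c: c.score)


if __name__ == "__main__":
    best = meta_harness_search({"retries": 0, "context_tokens": 8_000}, iterations=30)
    print(best.harness, round(best.score, 3))
```

The design point the sketch tries to mirror is that every candidate's code, logs, and score are kept in `history` and handed to the proposer, rather than only the current best; a real coding agent would use the logs to diagnose failures, which this stub does not.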

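On the terminal-task benchmark TerminalBench-2, Meta-Harness lifts Claude Haiku 4.5 to a 37.6% pass rate, ahead of Goose (35.5%) and Claude Code (27.5%), ranking first among all reported Haiku 4.5 harnesses. With Claude Opus 4.6 it reaches a 76.4% pass rate, ranking second.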
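The gaps between these Haiku 4.5 harnesses are differences in percentage points, which is not the same as relative improvement. A minimal calculation using only the pass rates reported above (the dictionary and function names are illustrative):

```python
# TerminalBench-2 pass rates (%) with Claude Haiku 4.5, as reported in the article.
haiku_45_pass_rates = {
    "Meta-Harness": 37.6,
    "Goose": 35.5,
    "Claude Code": 27.5,
}


def gap_pp(a: str, b: str) -> float:
    """Absolute difference in percentage points between two harnesses."""
    return round(haiku_45_pass_rates[a] - haiku_45_pass_rates[b], 1)


def relative_gain_pct(a: str, b: str) -> float:
    """Relative improvement of harness a over harness b, in percent."""
    diff = haiku_45_pass_rates[a] - haiku_45_pass_rates[b]
    return round(diff / haiku_45_pass_rates[b] * 100, 1)


for rival in ("Goose", "Claude Code"):
    print(f"Meta-Harness vs {rival}: "
          f"{gap_pp('Meta-Harness', rival):+.1f} pp "
          f"({relative_gain_pct('Meta-Harness', rival):+.1f}% relative)")
```

The roughly 10-point gap over Claude Code corresponds to a relative gain of about 37% over its 27.5% pass rate.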

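Junyang Lin, former technical lead of Tongyi Qianwen (Qwen), reposted the paper authors' post and commented that "model plus harness" now outweighs "model alone," that agent performance is significantly affected by the design and quality of the harness, and that "I do think this is the right direction." In a long post published on March 27 (since deleted), Lin had predicted that environment design would grow from a side project into a genuine startup category. Meta-Harness supports that view with experimental data: with the same model, switching to an AI-optimized harness can change the score by as much as 10 percentage points.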
