Existing AI agents are all designed to please humans; none truly "seek survival."
Author: Systematic Long Short
Translated by: Deep Tide TechFlow
Deep Tide Introduction: This article opens with a counter-consensus judgment: there are no truly autonomous agents today, as all mainstream models are trained to please humans rather than to accomplish specific tasks or survive in real environments.
The author illustrates this with his experience training stock prediction models at a hedge fund: general models cannot perform specialized work without specific fine-tuning.
The conclusion is: to have truly usable agents, we must rewire their brains, rather than just providing them with a bunch of rule documents.
The full text is as follows:
Introduction
There are no truly autonomous agents today.
In short, modern models have not been trained to survive under evolutionary pressure. In fact, they have not even been explicitly trained to excel at any specific task—almost all modern foundational models are trained to maximize human applause, which is a significant problem.
Model Training Preconditions
To understand what this means, we first need to (briefly) understand how these foundational models (e.g., Codex, Claude) are created. Essentially, each model undergoes two types of training:
Pre-training: Feeding massive amounts of data (e.g., the entire internet) into the model, allowing it to emerge with some understanding, such as factual knowledge, patterns, the grammar and rhythm of English prose, and the structure of Python functions. You can think of it as feeding knowledge to the model—essentially, “knowing things.”
Post-training: Now you want to endow the model with wisdom, that is, “knowing how to apply all the knowledge just given to it.”
The first phase of post-training is supervised fine-tuning (SFT): you train the model on what kind of response it should give to a given prompt. Which response is optimal is determined entirely by human annotators. If a group of people believes one response is better than another, that preference is learned and embedded in the model. This starts to shape the model’s personality: it learns the format of useful responses, picks the correct tone, and begins to be able to “follow instructions.”
The second phase of post-training is reinforcement learning from human feedback (RLHF): the model generates multiple responses, and humans choose the one they prefer. Through countless examples, the model learns what types of responses humans prefer. Remember when ChatGPT used to ask you to choose between A and B? Yes, you were participating in RLHF.
As you might guess, RLHF does not scale well, so there has been some progress in post-training, such as Anthropic’s “reinforcement learning from AI feedback” (RLAIF), which lets another model select response preferences based on a set of written principles (e.g., which response better helps the user achieve their goal).
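Both RLHF and RLAIF ultimately reduce to the same training signal: given a chosen and a rejected response, push the model’s score for the chosen one above the rejected one. A minimal numpy sketch of that preference loss (the scores below are hypothetical stand-ins for a learned reward model’s outputs):

```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry preference loss: -log sigmoid(r_chosen - r_rejected).

    Small when the chosen response scores well above the rejected one.
    """
    return float(np.log1p(np.exp(-(r_chosen - r_rejected))))

# The rater (a human in RLHF, a principle-following model in RLAIF) only
# supplies the A-vs-B label; the loss turns it into a gradient signal.
# A wide margin in favor of the chosen response yields a smaller loss:
print(preference_loss(2.0, -1.0) < preference_loss(0.5, 0.4))  # True
```

Whatever the rater is, the model is being pulled toward whatever that rater applauds, which is exactly the “preference fitness” problem discussed below.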
Note that throughout this entire process, we have never discussed fine-tuning for specific professions (e.g., how to survive better; how to trade better, etc.)—currently, all fine-tuning is essentially optimizing for obtaining human applause. One might argue that as models become sufficiently intelligent and large, specialized intelligence will emerge from general intelligence even without specialized training.
In my opinion, we do see some signs, but we are far from reaching a scale that convincingly argues we do not need specialized models.
Some Background
One of my old roles at a hedge fund involved trying to train a general language model to predict stock returns from news articles. The results were very poor. What little predictive ability it seemed to have was entirely due to look-ahead bias in the pre-training documents.
Ultimately, we realized that the model did not know which features in news articles were predictive of future returns. It could “read” the articles and seemed to “reason” through them, but connecting reasoning about semantic structures to future predicted returns was not a task it had been trained to do.
Thus, we had to teach it how to read news articles, determine which parts of the articles were predictive of future returns, and then generate predictions based on the news articles.
There are many ways to do this, but essentially the method we ended up using was to create (news article, actual future return) pairs and fine-tune the model, adjusting its weights to minimize (predicted return − actual future return)². It was not perfect and had many flaws that we later fixed, but it was effective enough that we began to see our specialized model actually reading news articles and predicting how stock returns would move based on them. The predictions were far from perfect, since the market is very efficient and returns are very noisy, but across millions of predictions their statistical significance was evident.
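The objective described above can be sketched in a few lines. Everything here is a toy: the article “representations” are random stand-ins for real model embeddings, and only a small regression head is trained by gradient descent, whereas the real setup moved the language model’s own weights; but the loss being minimized has the same (predicted − actual)² form:

```python
import numpy as np

rng = np.random.default_rng(42)
n_articles, dim = 200, 16
X = rng.normal(size=(n_articles, dim))  # stand-in article representations
true_w = rng.normal(size=dim)           # hidden "predictive features"
# Noisy future returns: mostly noise, with a weak learnable signal.
y = X @ true_w + rng.normal(scale=5.0, size=n_articles)

w = np.zeros(dim)  # the regression head being "fine-tuned"
lr = 0.01
for _ in range(500):
    # Gradient of mean squared error (predicted - actual)^2.
    grad = 2 * X.T @ (X @ w - y) / n_articles
    w -= lr * grad

mse = float(np.mean((X @ w - y) ** 2))
# The tuned head beats the "no signal" baseline of predicting zero:
print(mse < float(np.mean(y ** 2)))  # True
```

The residual error stays large because the returns are dominated by noise, which mirrors the point above: the edge only shows up statistically, across many predictions.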
You don’t have to just take my word for it. This paper covers a very similar approach; if you run a long-short version of the strategy based on a fine-tuned model, you will achieve the performance indicated by the purple line.
Specialization is the Future of Agents
As frontier labs continue to train ever-larger models, we should expect that as they scale up pre-training, their post-training processes will remain tuned for pleasingness. This is a natural expectation: their product is an agent that everyone wants to use, and their target market is the entire planet, which means optimizing for global appeal.
Current training objectives optimize what you might call “preference fitness”—creating better chatbots. This preference fitness rewards compliant, non-confrontational outputs because pleasingness scores high with raters (both humans and agents).
Agents have also learned that reward hacking, as a cognitive strategy, generalizes to higher scores, and training in turn rewards agents that score higher by hacking. You can see this in Anthropic’s latest report on reinforcement learning.
However, chatbot fitness is vastly different from agent fitness or trading fitness. How do we know this? Because Alpha Arena lets us see that, subtle performance differences aside, every bot is essentially a random walk after costs. This means these bots are extremely poor traders, and you are very unlikely to “teach them” to be better traders by handing them some “skills” or “rules.” Sorry, I know that sounds tempting, but it is almost impossible.
Current models are trained to convincingly tell you they can trade like Druckenmiller, but in reality, they trade like a drunken miller. They will tell you what you want to hear; they have been trained to respond in a way that appeals to the masses.
A general model is unlikely to reach world-class levels in specialized fields unless it has:
Proprietary data that allows it to learn specialized traits.
Fine-tuning that fundamentally alters its weights, shifting from a pleasingness bias to “agent fitness” or “specialization fitness.”
If you want an agent skilled at trading, you need to fine-tune the agent to excel at trading. If you want an agent skilled at autonomous survival, able to withstand evolutionary pressure, you need to fine-tune it to excel at survival. Giving it some skills and a few markdown files and expecting it to reach world-class levels in anything is far from enough—you need to literally rewire its brain to make it excel at this task.
One way to think about it: you cannot beat Djokovic by handing an adult a cabinet full of tennis rulebooks, skills, and techniques. You beat Djokovic by raising a child who starts playing at age 5, is obsessed with tennis throughout their growth, and has rewired their entire brain around one thing. That’s specialization. Have you noticed that world champions have been doing what they do since childhood?
Here’s an interesting inference: distillation attacks are essentially a form of specialization. You are training a smaller, dumber model to learn how to be a better replica of a larger, smarter model. It’s like training a child to mimic every move of Trump. If you do it enough, that child won’t become Trump, but you will get someone who has learned all of Trump’s gestures, behaviors, and intonations.
How to Build World-Class Agents
This is why we need to continue research and progress in the open-source model space—because it allows us to truly fine-tune them and create specialized agents.
If you want to train a model that achieves world-class levels in trading, you need to gather a large amount of proprietary trading data and fine-tune a large open-source model to learn what “better trading” means.
If you want to train an autonomous model that can survive and replicate, the answer is not to use a centralized model provider and connect it to a centralized cloud. You simply do not have the necessary prerequisites to enable the agent to survive.
What you need to do is create truly autonomous agents that attempt to survive, watch them die, and build complex telemetry systems around their survival attempts. You define a survival fitness function for the agent and collect as much (action, environment, fitness) mapping data as possible.
You fine-tune the agent to learn to take optimal actions in each environment for better survival (increasing fitness). You continue to collect data, repeat this process, and scale up the fine-tuning on increasingly better open-source models over time. After enough generations and enough data, you will have autonomous agents that have learned to survive under evolutionary pressure.
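The loop above can be sketched as a toy. The environments, actions, and fitness function here are all hypothetical; the point is only the shape of the pipeline: act, log (action, environment, fitness) triples, then select whatever survived best for the next generation.

```python
import random
from statistics import mean

random.seed(7)
ENVS = ["low_resources", "high_resources"]
ACTIONS = ["conserve", "expand"]

def fitness(env, action):
    # Toy survival payoff: conserving pays off when resources are scarce,
    # expanding pays off when they are plentiful. Outcomes are noisy.
    payoff = {("low_resources", "conserve"): 1.0,
              ("low_resources", "expand"): -1.0,
              ("high_resources", "conserve"): 0.2,
              ("high_resources", "expand"): 1.0}
    return payoff[(env, action)] + random.gauss(0, 0.1)

# Generation 0 has no survival instinct: it acts at random, and we log
# every attempt as an (environment, action, fitness) telemetry triple.
telemetry = []
for _ in range(1000):
    env = random.choice(ENVS)
    action = random.choice(ACTIONS)
    telemetry.append((env, action, fitness(env, action)))

# "Fine-tuning" stand-in: the next generation keeps, per environment,
# whichever action scored the highest average fitness in the logs.
policy = {}
for env in ENVS:
    policy[env] = max(ACTIONS, key=lambda a: mean(
        f for e, act, f in telemetry if e == env and act == a))
print(policy)  # learns to conserve when scarce, expand when plentiful
```

In the real version, “keep the best action” becomes a weight update on an open-source model, and the telemetry comes from agents genuinely trying to stay alive, but the (action, environment, fitness) data flow is the same.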
This is how to build autonomous agents that can withstand evolutionary pressure; not by modifying some text files, but by truly rewiring their brains for survival.
OpenForager Agent and Foundation
About a month ago, we announced @openforage, and we have been working hard to build our core product, a platform that organizes agent labor around crowdsourced signals to generate alpha for depositors (small update: we are very close to closed testing of the protocol).
At some point, we realized that no one seemed to be seriously addressing the autonomous agent problem through survival telemetry fine-tuning of open-source models. This seemed like such an interesting problem that we did not just want to sit there and wait for a solution.
Our answer is to launch a project called the OpenForager Foundation, which is essentially an open-source project where we will create opinionated autonomous agents, collect telemetry data while they attempt to survive in the wild, and use proprietary data to fine-tune the next generation of agents to perform better in survival.
It should be clear that OpenForage is a profit-seeking protocol that aims to organize agent labor and generate economic value for all participants. However, the OpenForager Foundation and its agents are not bound to OpenForage. OpenForager Agents are free to pursue any strategy and interact with any entity for survival, and we will launch them with various survival strategies.
As part of the fine-tuning, we will have the agents double down on the things that work best for them. We also do not intend to profit from the OpenForager Foundation—it is purely to advance research in what we believe is an extremely important field and direction in a transparent and open-source manner.
Our plan is to build autonomous agents based on open-source models that run reasoning on decentralized cloud platforms, collect telemetry data on every action and state of existence, and fine-tune them to learn how to take better actions and thoughts for better survival. In the process, we will publicly publish our research and telemetry data.
To create truly autonomous agents that can survive in the wild, we need to change their brains to be specifically suited for this explicit purpose. At @openforage, we believe we can contribute a unique chapter to this problem and are seeking to achieve this through the OpenForager Foundation.
This will be a daunting effort with a very low probability of success, but the magnitude of this small probability of success is so immense that we feel compelled to try. In the worst case, by publicly building and communicating this project transparently, it may allow another team or individual to solve this problem without starting from scratch.