Finally, someone has explained the state of GPT clearly! This OpenAI heavyweight's latest speech has gone viral, and he really is the genius Musk once hand-picked

Source: Qubit

Following the release of Windows Copilot, the buzz at the Microsoft Build conference was ignited by a speech.

In it, former Tesla AI director Andrej Karpathy argued that Tree of Thoughts is analogous to AlphaGo's Monte Carlo Tree Search (MCTS)!

Netizens exclaimed: this is the most detailed and interesting guide yet to using large language models and GPT-4!

In addition, Karpathy revealed that thanks to scaled-up training and data, LLaMA 65B is "significantly more powerful than GPT-3 175B", and he introduced Chatbot Arena, the anonymous large-model battle arena:

Claude scores between ChatGPT 3.5 and ChatGPT 4.

Netizens said Karpathy's talks are always excellent, and this time, as usual, the content did not disappoint.

Also going viral alongside the speech was a set of notes a Twitter user compiled from it: 31 notes in total, with more than 3,000 reposts:

So, what was specifically mentioned in this much-watched speech?

How to train a GPT assistant?

Karpathy's speech this time is mainly divided into two parts.

In Part One, he talked about how to train a "GPT assistant".

Karpathy mainly describes the four training stages of an AI assistant: pre-training, supervised fine-tuning, reward modeling, and reinforcement learning.

Each stage requires a dataset.

The pre-training stage demands enormous computing resources and the collection of massive datasets: a base model is trained on a large unsupervised dataset.

Karpathy illustrated this with further examples:

In the supervised fine-tuning stage, a smaller labeled dataset is used to fine-tune the base model with supervised learning, producing an assistant model that can answer questions.
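
To make these two stages concrete, here is a minimal sketch (not from the speech itself): pre-training and supervised fine-tuning both minimize the same next-token prediction loss, and only the data changes. The model name, example data, and hyperparameters below are illustrative placeholders.

```python
# Minimal sketch: pre-training and supervised fine-tuning share the next-token
# prediction objective; SFT just swaps in curated prompt/response pairs.
# Model name, data, and learning rate are illustrative, not from the talk.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# For SFT, each example is a prompt followed by an ideal human-written response.
sft_examples = [
    "Q: Why is the sky blue?\n"
    "A: Air molecules scatter short (blue) wavelengths of sunlight more strongly.",
]

model.train()
for text in sft_examples:
    batch = tokenizer(text, return_tensors="pt")
    # Causal LM loss: predict each token from the tokens before it.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```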

He also showed the evolutionary lineage of various models; many readers have probably seen this "evolution tree" diagram before.

Karpathy believes the best open-source models at the moment are Meta's LLaMA series (since OpenAI has not open-sourced anything about GPT-4).

One thing needs to be pointed out clearly here: a base model is not an assistant model.

A base model can answer questions, but its answers are not reliable; it is the assistant model that should be used for answering questions. An assistant model obtained by supervised fine-tuning of the base model will outperform the base model at generating responses and understanding the structure of text.

Reinforcement learning is another critical stage in training language models.

With high-quality, human-labeled data, a reward model can be trained to act as a loss function that scores the model's outputs. Reinforcement learning then raises the probability of tokens that earn positive rewards and lowers the probability of those that earn negative ones.

For creative tasks, human judgment is crucial to improving AI models, and adding human feedback makes training more effective.

After reinforcement learning from human feedback, an RLHF model is obtained.
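
As a rough sketch of the reward-modeling step described above, assuming the standard pairwise comparison loss (the scores below are toy values rather than outputs of a real scalar-head model):

```python
# Toy sketch of the reward-model loss: for a human preference between two
# completions of the same prompt, train the model so the preferred ("chosen")
# completion scores higher. Real rewards come from a transformer with a scalar
# head; here they are placeholder tensors.
import torch
import torch.nn.functional as F

reward_chosen = torch.tensor([1.3], requires_grad=True)    # r(prompt, chosen)
reward_rejected = torch.tensor([0.2], requires_grad=True)  # r(prompt, rejected)

# Pairwise ranking loss: -log sigmoid(r_chosen - r_rejected)
loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
loss.backward()

# In the RL stage, this reward model scores sampled completions, and the policy
# is updated (e.g. with PPO) to raise the probability of high-reward outputs
# and lower the probability of low-reward ones.
```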

Once the models are trained, the next question is how to use them effectively to solve problems.

How to use the model better?

In Part Two, Karpathy focuses on prompting strategies, fine-tuning, the rapidly growing tool ecosystem, and future extensions.

Karpathy gave specific examples to illustrate:

When we write an article, we run through a lot of mental activity and keep asking ourselves whether what we are saying is correct. For GPT, all of this is just a sequence of tokens.

Prompting can make up for this cognitive gap.

Karpathy went on to explain how chain-of-thought prompting works.

For reasoning problems, if you want the Transformer to perform better, you need to let it process the information step by step rather than throwing a very complicated problem at it all at once.

If you give it a few examples, it will imitate their template, and the final generated result will be better.
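
For illustration (the wording and the questions are ours, not from the talk), a few-shot chain-of-thought prompt can look like this:

```python
# A few-shot chain-of-thought prompt: the worked example shows the step-by-step
# template the model should imitate for the new question.
few_shot_cot_prompt = """\
Q: A shop sells pens at 3 dollars each. How much do 4 pens cost?
A: Let's think step by step. Each pen costs 3 dollars. 4 pens cost 4 * 3 = 12 dollars. The answer is 12.

Q: A train travels 60 km per hour for 2.5 hours. How far does it go?
A: Let's think step by step."""

# This string would be sent to the model; the trailing "Let's think step by
# step." nudges it to write out intermediate reasoning before the final answer.
print(few_shot_cot_prompt)
```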

The model can only answer within its token sequence, and if what it generates is wrong, you can prompt it to regenerate.

If you don't ask it to check, it won't check itself.
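
A small sketch of that idea, with a hypothetical `generate` function standing in for whatever text-completion call is being used: the glue code explicitly asks the model to review its own answer and regenerate if needed.

```python
# Ask the model to check itself: it won't do so unless the prompt demands it.
# `generate` is a placeholder for any prompt-in, text-out LLM call.
def answer_with_check(generate, question, max_retries=2):
    answer = generate(f"Question: {question}\nAnswer:")
    for _ in range(max_retries):
        verdict = generate(
            f"Question: {question}\nProposed answer: {answer}\n"
            "Did the answer fully meet the request? Reply OK or REDO."
        )
        if verdict.strip().upper().startswith("OK"):
            break
        # The model flagged a problem, so ask it to regenerate.
        answer = generate(f"Question: {question}\nGive a corrected answer:")
    return answer
```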

This brings up System 1 and System 2.

Daniel Kahneman, the Nobel laureate in economics, proposed in "Thinking, Fast and Slow" that human cognition comprises two subsystems, System 1 and System 2: System 1 relies mainly on intuition, while System 2 handles logical analysis.

In layman's terms, System 1 is the fast, automatic process, and System 2 is the slow, deliberative one.

This is also echoed in the recently popular paper "Tree of Thoughts".

"Tree of Thoughts" means not simply giving a single answer to a question, but using Python glue code to string many prompts together: the model has to maintain multiple prompts in parallel and run a tree search algorithm to decide which prompts to expand.

Karpathy thinks this line of thinking is very similar to AlphaGo:

When AlphaGo plays Go, it has to decide where to place the next stone. Initially it learned by imitating humans. On top of that, it runs a Monte Carlo tree search, which produces strategies with many possible branches; it evaluates many candidate moves and keeps only the better ones. I think this is sort of the equivalent of AlphaGo.
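
A minimal sketch of that "Python glue code plus tree search" idea, assuming hypothetical `propose` and `score` functions that wrap LLM calls (an illustration only, not the paper's reference implementation):

```python
import heapq

# Keep several partial chains of thought alive, score them, and expand only the
# most promising ones: a beam-style tree search over prompts.
# `propose(question, partial)` returns candidate next thoughts (an LLM call);
# `score(question, partial)` rates a partial solution (an LLM call or heuristic).
def tree_of_thoughts_search(question, propose, score, beam_width=3, depth=3):
    frontier = [(-score(question, ""), "")]  # negate scores so smallest = best
    for _ in range(depth):
        candidates = []
        for _neg_score, partial in frontier:
            for thought in propose(question, partial):
                expanded = partial + thought
                heapq.heappush(candidates, (-score(question, expanded), expanded))
        # Prune: keep only the best few partial solutions, as in a game tree.
        frontier = heapq.nsmallest(beam_width, candidates)
    return min(frontier)[1]  # highest-scoring chain of thought
```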

In this regard, Karpathy also mentioned AutoGPT:

I don't think it works very well at the moment, and I don't recommend it for practical use. I just think that over time we might be able to take inspiration from where it's going.

Second, there is another handy trick: retrieval-augmented generation (RAG) and effective prompting.

The content of the context window is the Transformer's working memory at runtime; if you can put task-related information into the context, it will perform very well, because it has immediate access to that information.

In short, relevant data can be indexed so that the model can access it efficiently.

The Transformer performs better when it also has a primary document to refer to.
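
As a toy illustration of the indexing-and-retrieval idea (a bag-of-words similarity standing in for real embedding models; the documents and question are made up):

```python
# Retrieval-augmented generation in miniature: index documents, retrieve the
# most relevant ones for the question, and paste them into the prompt so they
# sit in the model's working memory (the context window).
from collections import Counter
import math

documents = [
    "The context window is the model's working memory at runtime.",
    "Fine-tuning adjusts model weights; prompting does not.",
]

def embed(text):
    return Counter(text.lower().split())  # toy bag-of-words "embedding"

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, k=1):
    q = embed(question)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

question = "What is the context window?"
notes = "\n".join(retrieve(question))
prompt = f"Use the following notes to answer.\nNotes:\n{notes}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```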

Finally, Karpathy briefly discussed constrained prompting and fine-tuning of large language models. Large language models can be improved through constrained prompting and fine-tuning: constrained prompting enforces templates on the model's output, while fine-tuning adjusts the model's weights to improve performance.
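
A small sketch of what constrained prompting can look like in practice; the template and field names are chosen purely for illustration:

```python
# Constrained prompting: the prompt pins the output to a fixed template (here a
# small JSON object), and the glue code rejects anything that does not match.
import json

def build_prompt(text):
    return (
        "Extract the person's name and age from the text below.\n"
        'Respond with JSON only, exactly of the form {"name": "...", "age": 0}.\n'
        f"Text: {text}"
    )

def parse_constrained_output(raw):
    data = json.loads(raw)                    # must be valid JSON
    assert set(data) == {"name", "age"}       # must match the template's keys
    assert isinstance(data["age"], int)
    return data

# A model reply that respects the template parses cleanly; anything else fails fast.
print(parse_constrained_output('{"name": "Ada", "age": 36}'))
```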

I recommend using large language models in low-stakes applications, always combining them with human supervision, treating them as a source of inspiration and advice, and thinking of them as copilots rather than fully autonomous agents.

About Andrej Karpathy

Andrej Karpathy's first job after finishing his PhD was researching computer vision at OpenAI.

Later, Musk, one of OpenAI's co-founders, took a liking to Karpathy and poached him for Tesla. That episode also contributed to Musk falling out with OpenAI and eventually stepping away from it. At Tesla, Karpathy headed projects such as Autopilot and FSD.

In February of this year, seven months after leaving Tesla, Karpathy joined OpenAI again.

Recently, he tweeted that there is currently a great deal of interest in building an open-source large language model ecosystem, which looks a bit like an early sign of a Cambrian explosion.

Portal: [1] speech video; [2] "Tree of Thoughts" paper

