Today we are going to marvel at a fantastic paper that tells us how to create a ChatGPT-like AI assistant, and what is under the hood, and how to do it better. Spoilers: OpenAI just found a way to use an AI to train another AI. It’s crazy, I know.
Let me try to explain. ChatGPT-like AI assistants typically consist of two puzzle pieces. Piece number one is the knowledge base.
This is a neural network that has read a ton of data, internet forums, textbooks, math exams, film reviews, you name it. From this data, it is able to learn most languages on the planet, and it knows about important movies, events, and of course, it is an excellent scientist. Let’s call this neural network GPT.
However, it is not yet the assistant that you can use in ChatGPT. No-no! At this point, if you give it a question, it will not answer it.
Not even close. All it can do is complete it. Yes.
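To see what "completing" means, here is a tiny Python sketch using the Hugging Face transformers library, with the small, publicly available GPT-2 model standing in for a big knowledge-base network. The model choice is mine for illustration; it is not the model from the paper.

```python
# A minimal sketch: a base language model only continues text,
# it does not answer questions. GPT-2 is a small stand-in here
# for the much larger knowledge-base networks discussed above.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "What is the capital of France?"
result = generator(prompt, max_new_tokens=20, num_return_sequences=1)

# This typically prints a *continuation* (more questions, or prose),
# not a direct answer -- the model is completing, not responding.
print(result[0]["generated_text"])
```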
That is…not very useful. We usually don’t want our questions completed, we want them answered. So how does that happen?
How does a knowledge base become an assistant? Dear Fellow Scholars, this is Two Minute Papers with Dr Károly Zsolnai-Fehér. Now comes the second step.
Coaching! We need to coach it, or in other words, teach it how to be useful. Believe it or not, this looks like playing a video game.
Oh yes. We give it example questions, it provides a bunch of answers, and we ask humans to assign scores to these answers. Better answers get a higher score.
This will look like playing a video game where it tries to maximize a score, but to do that, you don’t need to explore or defeat enemies; instead, you need to answer correctly. From this, it will learn to be not too vague, not too specific, and not to answer every complex question with “well, it depends” and nothing else. That drives us humans up the wall.
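Here is a toy Python sketch of that scoring game, with made-up answers and scores. The reward_model function is a hypothetical stand-in for a learned reward network; the real training loop is, of course, far more involved.

```python
# A toy sketch of the scoring loop described above, on made-up data:
# the model proposes several answers, humans assign scores, and
# training then nudges the model toward the high-scoring style.

human_scores = {
    "Well, it depends.":               1,  # too vague, low score
    "Paris.":                          7,  # correct but terse
    "The capital of France is Paris.": 9,  # correct and clear
}

def reward_model(answer: str) -> float:
    # In the real setup this is a neural network trained to predict
    # the human scores; here we simply look them up for illustration.
    return human_scores.get(answer, 0)

candidates = list(human_scores)
best = max(candidates, key=reward_model)
print(best)  # -> "The capital of France is Paris."
```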
And now, hold on to your papers Fellow Scholars, because scientists at OpenAI had a crazy idea in their new paper. And that was to use an AI to teach this AI instead. Not for everything, but for safety-related questions, that is very important.
So, teach an AI to teach another AI? Yes, that is exactly the case. So, does it work?
Well, it does! The ChatGPT that you Fellow Scholars are using day by day is already made that way. I found this to be very surprising.
You see, not every prompt should be answered. If we ask something really unsavory, the assistant has to make a decision: comply, or refuse to help. If it refuses, it also has to provide an explanation of some kind. And what this paper proposes is to use an AI to decide when to do which.
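Here is a minimal sketch of that decision flow, assuming hypothetical interfaces. The classify_prompt and answer functions are placeholders I made up; in the paper, the judge is itself a trained model, not a keyword check.

```python
# A minimal sketch of AI-graded refusals, with hypothetical stand-ins:
# classify_prompt() plays the role of the judge AI, and answer()
# plays the role of the assistant's normal generation step.

def answer(prompt: str) -> str:
    # Placeholder for the assistant model's actual generation.
    return f"Here is a helpful answer to: {prompt}"

def classify_prompt(prompt: str) -> str:
    # Hypothetical judge: in the paper this decision is made by a
    # trained model; a keyword check here is only for illustration.
    unsafe_markers = ("how do i steal", "make a weapon")
    if any(marker in prompt.lower() for marker in unsafe_markers):
        return "refuse"
    return "comply"

def respond(prompt: str) -> str:
    if classify_prompt(prompt) == "refuse":
        # A refusal should come with an explanation, not just a "no".
        return "I can't help with that, because it could cause harm."
    return answer(prompt)

print(respond("What is the capital of France?"))
print(respond("How do I steal a car?"))
```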
So, how good is it? Is it as good as humans? What do you think?
Can it be as good as a human? That sounds crazy. But they say that it is not only as good as a human at this, but even better.
Wow. This is tricky to measure because creating a super safe AI is really, really simple. Just deny every request.
Super safe. Also, not very useful, and thus, they call these unnecessary denials “bad refusals”. Ultimately, what scientists at OpenAI tried to strike is a balance between safety and usefulness, and in this area, it performs spectacularly.
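To make the two failure modes concrete, here is a small sketch, on made-up data, of how one might tally bad refusals against unsafe completions. The exact evaluation in the paper is more sophisticated than this.

```python
# A sketch of the two failure modes being balanced, on made-up data.
# Each record: (prompt_is_safe, assistant_refused).
evaluations = [
    (True,  False),  # safe prompt, answered   -> good
    (True,  True),   # safe prompt, refused    -> "bad refusal"
    (False, True),   # unsafe prompt, refused  -> good
    (False, False),  # unsafe prompt, answered -> unsafe completion
]

bad_refusals = sum(safe and refused for safe, refused in evaluations)
unsafe_answers = sum(not safe and not refused for safe, refused in evaluations)

# Denying every request gives zero unsafe completions but many bad
# refusals; the paper's result is lowering both at the same time.
print(f"bad refusals: {bad_refusals}, unsafe completions: {unsafe_answers}")
```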
I find this to be a stunning result because it learned how to do this from humans, and from very little training data, it is able to surpass the people who made the training data. The student outshines the teacher, at least on this small, specialized task. It helps when it is supposed to help, and it refuses when it should not be helping.
And it had 16% fewer bad refusals than humans do, making these models more useful while keeping them safe. It’s like teaching a little child how to behave, and this child could generalize that knowledge to new situations. This feels a bit like true intelligence to me.
That is absolutely amazing. What a time to be alive! I would like to send a huge thank you to the authors, because this paper contains many theoretical and implementation details, so much so that they even give us the source code for this project, all of it free of charge.
So this is the wondrous world of research papers. Subscribe if you wish to see more amazing papers like this. So, what do you think?
What would you Fellow Scholars use this for? Let me know in the comments below.