We are living in the age of AI, and it is crazy how far we have come, but what is coming is perhaps even more incredible. You see, back in the day we talked about OpenAI’s GPT-2, which could complete your sentences. An amazing feat at the time, but today, not so impressive.
But, over time, we got to GPT-4, which is so much smarter and is a proper AI assistant with hundreds of millions of users all around the world. And our question is: where are we going? Is it just going to keep improving?
Is creating a superintelligent AI possible? And what do video games of all things have to do with that? Well, most experts I hear about are saying that we have a key puzzle piece missing for achieving superintelligence: and that is reasoning.
You see, today’s AIs are neural network-based techniques that can learn to recognize a mug, but not the way a human recognizes a mug. Many modern techniques require thousands and thousands, if not hundreds of thousands, of images of a mug to understand what it is.
To a human, you just show them one, and they understand the concept. That is an example of reasoning. Learning from very little data.
So, how do video games come into the picture? Well, one way they can help is this: if a computer game is nearly as detailed as reality, we can just drop an AI in there and let it learn for as long as it wishes. Infinite self-driving car scenarios, infinite worlds for robots to train in, you name it.
That is kind of starting to work today, so that is fantastic. Two, AIs can play computer games themselves: DeepMind’s early AI played and mastered Atari Breakout, and then many other Atari games. And it gets better: today they can even start out in a Minecraft world and rack up a respectable number of achievements.
In these games, the AI makes a series of actions with the controller, and if it gets a good score at the end, it can credit that score to the whole chain of actions it just performed. If the score is not great, it learns that this chain of actions was not very successful. This technique is often referred to as reinforcement learning, and it can be combined with a neural network that looks at and understands the video stream of the game itself.
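If you would like to see that credit assignment in code, here is a tiny, hypothetical sketch of the idea. This is not DeepMind’s actual Atari algorithm (that was deep Q-learning on raw pixels); the toy environment and policy below are made up purely to show how one end-of-game score can teach a whole chain of actions.

```python
# Toy sketch of end-of-episode credit assignment: one final score is credited
# to the entire chain of actions. Hypothetical environment and policy, not
# DeepMind's actual method.
import math
import random

ACTIONS = ["left", "right", "fire"]
preferences = {a: 0.0 for a in ACTIONS}   # the "policy": one number per action

def pick_action():
    # Sample an action; a higher preference means it is chosen more often.
    weights = [math.exp(preferences[a]) for a in ACTIONS]
    return random.choices(ACTIONS, weights=weights, k=1)[0]

def play_episode(env_step, steps=100):
    """Play one game: remember every action taken, return the chain and the final score."""
    chain, score = [], 0.0
    for _ in range(steps):
        action = pick_action()
        chain.append(action)
        score += env_step(action)
    return chain, score

def learn_from_episode(chain, score, baseline, lr=0.001):
    # The key step: every action in the chain is credited (or blamed)
    # with the same end-of-episode result, good or bad.
    advantage = score - baseline
    for action in chain:
        preferences[action] += lr * advantage

def toy_env(action):
    # Toy game: "fire" earns more reward on average than moving around.
    return 1.0 if action == "fire" else random.uniform(-0.2, 0.6)

baseline = 0.0
for episode in range(300):
    chain, score = play_episode(toy_env)
    learn_from_episode(chain, score, baseline)
    baseline = 0.9 * baseline + 0.1 * score   # running average of past scores

print(preferences)   # "fire" should end up with the highest preference
```

After a few hundred toy games, the action that tends to produce higher scores ends up with the highest preference, even though no individual step was ever graded on its own.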
Please remember this, because it might be the key to creating an incredibly capable AI. So how do we do that? I would like to ask for a little more of your patience, but I promise we’ll get there soon.
So, this video game thing still does not necessarily help AIs to reason. Computer simulations create tons of data for them to learn from. Remember, reasoning is learning from very little data.
And current AIs are not too good at that. Until now. OpenAI released their o1 system, which is not GPT-5.
It is not just the next ChatGPT in line, it is a completely different kind of system. So, what is different? And where are we going with this?
Dear Fellow Scholars, This is Two Minute Papers with Dr Károly Zsolnai-Fehér. Well, when you ask a question, previous methods give you an answer almost immediately. However, the new o1 takes a little time to think.
Here is my friend, Dr Jim Fan’s FANtastic visualization of it. Okay, but does that help? Is that a good thing?
It surely does help us humans: a tough exam takes not several seconds, but often several hours. If we take longer, we do better. But machine intelligence is not human intelligence.
So, is thinking longer good for them too? Look. It is not good…it is fantastic.
If we let it think for longer, the results can go from 20% accuracy to as much as 80% accuracy. At a great cost, by the way, since the compute axis is logarithmic. But that is a groundbreaking observation.
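To get a feel for why more thinking time can pay off, here is a small, simplified simulation. OpenAI has not published exactly how o1 spends its extra compute, so this toy uses majority voting over repeated attempts, which is not their method, only an illustration of the scaling behavior.

```python
# A simplified stand-in for "thinking longer helps": majority voting over
# repeated reasoning attempts. Purely illustrative of why more inference-time
# compute can lift accuracy; not o1's actual mechanism.
import random
from collections import Counter

def one_attempt(p_correct=0.35):
    """One reasoning attempt: right with probability p, otherwise a random wrong answer."""
    if random.random() < p_correct:
        return "right"
    return random.choice(["wrong_a", "wrong_b", "wrong_c"])

def accuracy_with_n_attempts(n, trials=10_000):
    hits = 0
    for _ in range(trials):
        votes = Counter(one_attempt() for _ in range(n))
        if votes.most_common(1)[0][0] == "right":
            hits += 1
    return hits / trials

for n in (1, 4, 16, 64):   # each row spends 4x the "thinking" of the previous one
    print(f"{n:>3} attempts -> {accuracy_with_n_attempts(n):.1%} accuracy")
```

Each row spends four times the compute of the previous one; in this toy, accuracy climbs steadily while the cost explodes, which is exactly the logarithmic-axis behavior mentioned above.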
But it doesn’t help with everything. If you ask a simple question, it might not make a difference. However, if you want to break a cryptographic cipher, solve tough brain teasers, or do market research, it is fantastic.
They just showcased three excellent examples of that, and one not-so-excellent one. These are real new use cases that can help people in the real world. And yet, they are almost irrelevant; I will tell you why.
One, here it helps with expanding your company within Europe. And…you see that it is thinking. Now, of course, this is not the same kind of thinking as we humans do; it is a slightly anthropomorphic, a humanified term if you will.
And it talks about market entry strategy, risk analysis, financial planning. What is incredible is that if there is some part that you like, for instance, Berlin sounds good, you just ask a follow-up question and go deeper. Two, it can also help you code up a web app for your users to create their profiles.
It thinks for only a couple of seconds, and you get a full table of contents and a piece of code. Then you can even specialize it to your infrastructure; for instance, if you use Microsoft’s Azure cloud, it can help with that too. Three, it can also help you make calculations about a financial instrument called covered calls, which I highly recommend in case you are looking to lose a ton of money really quickly.
And four, if you are a Fellow Scholar, just ask how to make dogs a little healthier, what nutrients are important, and what it recommends we do in terms of research when creating new kinds of dog food. I have to say I am a little less impressed by this one, because the task is not that specific, so general information can do a lot of heavy lifting here. So o1 is absolutely incredible; it can help quantum physicists with their research and do many other things the regular ChatGPT AI can only dream of.
So, let’s pop the question, can this o1 thing reason? And can it lead us to some kind of superintelligence? No.
At least, that is what scientists at Apple, yes, Apple, say. They say not so fast. These systems can’t reason.
Why? Well, take some relatively simple tasks and add more details to a question that seem relevant but are in fact completely irrelevant, basically junk. What does that do to current AIs? It shouldn’t change anything, of course; it’s irrelevant information.
But that’s not what happens…look. It throws them off pretty reliably. This applies to previous AIs, but then, interestingly, it also applies to the new o1 system.
Or you can change a few numbers or names in the same tasks, and the AIs suddenly perform a bit worse. Not what you would expect from an intelligent being. So they say these AIs can’t really reason.
They are just complex pattern matching machines. They do that really well, but ultimately, they are just matching patterns. Now there is a philosophical question whether what humans are doing is anything more than "complex pattern matching".
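To make that number-and-name perturbation concrete, here is a toy version of the idea, not Apple’s actual benchmark code: the same simple word problem, regenerated with different surface details while the underlying logic stays identical.

```python
# Toy illustration of perturbing a word problem's names and numbers.
# Not Apple's evaluation code; just shows the kind of variation described above.
import random

TEMPLATE = ("{name} picks {a} apples in the morning and {b} apples in the "
            "afternoon. How many apples does {name} have in total?")
NAMES = ["Oliver", "Mia", "Sofia", "Liam"]

def make_variant(seed):
    rng = random.Random(seed)
    a, b = rng.randint(5, 60), rng.randint(5, 60)
    name = rng.choice(NAMES)
    question = TEMPLATE.format(name=name, a=a, b=b)
    return question, a + b   # the correct answer travels with the question

for seed in range(3):
    question, answer = make_variant(seed)
    print(question, "->", answer)

# An AI that truly reasons should get every variant right; a pattern matcher
# tuned to one particular phrasing may stumble when the surface details change.
```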
Sometimes I think the question shouldn’t be whether machines can think, but whether humans can think. So, is this where things fizzle out? We won’t get to more intelligent systems, ever?
Is it impossible? Well, not so fast. Don’t despair, Fellow Scholars.
Here is a big reason why it might still happen. And here is the video game thing I promised. You see, OpenAI’s o1 is not just a neural network, it is also a reinforcement learning system.
We saw earlier that it builds up a chain of thoughts… And then, at the end, we can tell the AI whether it did well, or it did not. But…wait a minute. Do you see where this is going?
That is exactly like a video game. Remember, in a video game, you make a series of actions, and you get a score. And if you got a high score, then you can look at the whole series of actions, and it essentially becomes new training data for the AI that it can learn from.
Over time, we say we liked this answer and we didn’t like that one, and it can assign scores to its chains of thought and get tons and tons of data on how to improve. Data to improve reasoning. It sees the whole thing as trying to solve a little video game.
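Here is a minimal, hypothetical sketch of that loop. OpenAI has not published o1’s training procedure, so the names and the toy "model" below are purely illustrative of the idea: score the final answer, and if the score is good, keep the whole chain of thought as new training data.

```python
# Hypothetical sketch of "chain of thought as a video game" credit assignment.
# Not OpenAI's actual method; only mirrors the idea described above.

def reward(answer, reference):
    """The 'game score': 1 if the final answer matches the reference, else 0."""
    return 1.0 if answer == reference else 0.0

def collect_good_chains(generate, problems):
    """`generate` stands in for the model: question -> (chain_of_thought, answer)."""
    kept = []
    for question, reference in problems:
        chain, answer = generate(question)
        if reward(answer, reference) > 0:
            # The score is credited to the entire chain: it becomes new
            # training data, just like a winning series of moves in a game.
            kept.append({"question": question, "chain": chain, "answer": answer})
    return kept

# Toy stand-in model so the sketch actually runs: it "reasons" about adding two numbers.
def toy_generate(question):
    a, b = question
    chain = [f"first take {a}", f"then add {b}", f"the sum is {a + b}"]
    return chain, a + b

problems = [((2, 3), 5), ((10, 7), 17), ((1, 1), 3)]   # the last reference is wrong on purpose
print(collect_good_chains(toy_generate, problems))      # keeps only the two correct chains
```

Train on those kept chains, generate again with the slightly smarter model, and repeat.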
It’s like the earlier AlphaGo AI, which simulated many actions into the future during a game, and if they led to a winning position, it learned from the whole chain. So it gets smarter, and if it gets smarter, it will create better actions, and then it learns from those better actions and gets even smarter, creating a potential flywheel effect. We don’t know if that is exactly what is going to happen; that is how research works. But that seems to be OpenAI’s grand plan.
And here, ultimately, OpenAI’s o1 doesn’t learn how to play a game, it learns to be intelligent and helpful instead. And this also means something really surprising: we keep looking at these results, but they are completely irrelevant. Why?
Because this is the foundation of a new kind of system that will be able to learn faster from its own mistakes over time. This is the worst version of o1 you’ll ever see. And that is what might lead to an incredibly intelligent AI in the future.
And we probably won’t have to wait for long, the pace of progress in AI research is staggering. It is probably learning and getting better right as you are watching this video. And it cannot wait to help us with our Scholarly problems.
What a time to be alive!