“The rabbit is ready to eat”. If I ask you to make me a drawing based on this sentence, what would you draw? You have, at least, two options: a rabbit sitting at a table ready to receive its meal or a cooked rabbit ready to be served as a meal.
When you hear this sentence and have to draw it, your brain needs to identify the meaning of the words and their context, based on your knowledge of the culture and the language in which the sentence was said, and then match that meaning to images it already knows or can imagine: a cooked rabbit, or a humanlike rabbit sitting at the table. Then you choose one and start drawing. If you've ever used an artificial intelligence to create images from a text prompt, like Midjourney, DALL-E 2 or the app Lensa, this is the process they are trying to mimic.
And in a matter of seconds, they can come up with results that are more and more similar to what a human being would make. But how do these artificial intelligences arrive at those results? (The revolution of images created by Artificial Intelligence) Let's start by thinking about what we see when we look at an image on a computer screen.
If you zoom in closely on this image, you'll see thousands of little squares, as if it were decomposed into many pieces. These pieces are what we call pixels. This is more or less the way an algorithm understands an image: as pieces of colour, red, green and blue.
And here I must say that I'm greatly simplifying something that involves advanced applied maths: the process we call machine learning. To machine-learning algorithms, every pixel is a number, which corresponds to a kind of coordinate, a position in an imaginary space. These algorithms encode anything this way, including text.
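To make that idea a bit more concrete, here is a minimal Python sketch of how an image looks to an algorithm: just a grid of numbers. It assumes the NumPy library, and the tiny 2x2 "image" and its colour values are invented purely for illustration.

```python
import numpy as np

# A tiny 2x2 "image": every pixel is three numbers (red, green, blue) from 0 to 255.
image = np.array([
    [[255, 255, 255], [200, 180, 160]],   # a white pixel and a light brown one
    [[120,  90,  60], [  0,   0,   0]],   # a dark brown pixel and a black one
], dtype=np.uint8)

# To a machine-learning algorithm this is not a picture at all, just a list of
# 12 numbers: a kind of coordinate, a position in an imaginary space.
coordinates = image.astype(np.float32).flatten() / 255.0
print(coordinates)
```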
Going back to our rabbit example, when you write a sentence in one of these programmes, it is as if the programme transformed the sentence into a numeric sequence that it invented itself, found its equivalent in another numeric sequence representing an image, and decoded the result into a form we can recognize: an illustration or a photo. But it's even more complicated than that: this type of artificial intelligence works like a complex network of neurons, trained on databases of billions of images taken from the internet, accompanied by descriptive texts. In its learning process, this network, which is made of complex mathematical functions, maps every piece of these images and texts, establishing relationships between them.
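Here is a toy Python sketch of just the first step, turning a sentence into numbers. The tiny vocabulary below is made up for this example; it is not how Midjourney, Stable Diffusion or DALL-E 2 actually encode text.

```python
# A toy "vocabulary" invented for this example; the real programmes learn their
# own numeric codes from billions of text-image pairs.
vocab = {"the": 0, "rabbit": 1, "is": 2, "ready": 3, "to": 4, "eat": 5}

def encode(sentence):
    """Turn a sentence into the kind of numeric sequence a model works with."""
    return [vocab[word] for word in sentence.lower().split()]

print(encode("The rabbit is ready to eat"))  # [0, 1, 2, 3, 4, 5]
```

In the real programmes, both the text numbers and the image numbers are far richer than this, and the network learns relationships between them rather than using a fixed look-up table.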
These relationships help the algorithm know, for example, that the word "rabbit" corresponds to the image of a rabbit. But programmes like Midjourney, Stable Diffusion, which Lensa uses, or DALL-E 2 don't store every image of a rabbit they've been trained on, just as our brains are not a big file of every photo and illustration we've seen in our lives, and have other ways of memorizing.
What some of them do is repeat, thousands of times, a process equivalent to drastically lowering the quality of these images and then predicting how to rebuild them correctly in high quality. It is as if you took the antenna out of your TV and, from the static on the screen, guessed and rebuilt the image in high definition, pixel by pixel. Other programmes, while mapping and analysing the pieces of the images, keep a kind of essence of the rabbit.
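Going back to the TV-static idea for a moment, here is a minimal Python sketch of that degrade-then-rebuild training trick, assuming NumPy. The random 8x8 "image" stands in for a real training picture, and the real programmes use learned neural networks rather than this toy code to do the rebuilding.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_static(image, strength):
    """Degrade an image by mixing it with random 'TV static'."""
    static = rng.normal(0.0, 1.0, size=image.shape)
    return (1.0 - strength) * image + strength * static

clean = rng.random((8, 8, 3))            # stands in for one training image
noisy = add_static(clean, strength=0.8)  # what the model sees while practising

# During training, the network repeatedly practises predicting `clean`
# when it is only shown `noisy`, pixel by pixel.
```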
Keeping only that essence allows them to use less memory and work faster. To do that, they determine what we call latent characteristics: the coordinates that will allow them to build the image of a rabbit. Or an image of anything.
These latent characteristics are non-visible, meaning they go beyond the things we can observe. They are not easily measurable traits like "has pointy ears, is furry, eats carrots, is not a hare". And this is also where the big innovation of these algorithms lies, and the part that is hard to explain, even for those who created them.
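As a rough illustration of what "latent coordinates" means, here is a toy Python sketch, again assuming NumPy. In the real programmes the compression is done by a deep neural network learned from data, not by the random matrix used here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a 64x64 colour image, flattened into 12,288 numbers.
image = rng.random(64 * 64 * 3)

# A toy "encoder": in the real programmes this mapping is learned, not random.
encoder = rng.normal(size=(8, image.size))

latent = encoder @ image  # just 8 numbers: the image's latent coordinates
print(latent)
# None of these 8 coordinates means "pointy ears" or "furry" on its own;
# they only make sense to the model that learned how to decode them.
```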
The people who programmed these artificial intelligences decided which mathematical formulas they would use to learn and which databases they would be trained on. But they don't know exactly what the systems will have learned at the end. In other words, we can't tell what the algorithm has determined, mathematically, to be the essence of the rabbit.
Or which pixels, organized in a certain way, will form the image of a "rabbit ready to eat". What we do know is that the algorithms learn concepts that reflect the biases and prejudices of our society, which are present in the databases they are trained on. This is how Mike Cook, an Artificial Intelligence researcher at King's College London, explains it: We tend to think about AI and talk about AI as if it's an AI from the movies, where it's very rational and detached from human problems and things like that.
So if an AI tells us something, we think that it's telling us in a very unopinionated, objective way, whereas in actual fact it has inherited all of the biases and problems that were in its training data, because its training data came from people like you and me and the things that we put on the internet. In a little over a year and a half, we went from artificial intelligences that produced low-quality images to programmes that can make short videos, animation and 3D images from text prompts. This quick evolution means that this technology could give superpowers to professionals in very different industries.
Imagine, for example, an architect creating a building with materials that still haven't been invented. Or scientists obtaining the image of a protein that we don't have in our bodies, in order to synthesize it in a lab and create a new medicine – some are already trying. But in a very short time, we went from closed models, used by a few people, to open ones, which had over a million users in their first three months.
All of this, of course, comes with its problems. For example, text prompts with the concept of "beautiful woman", or with specific races like "Asian woman", usually produce more images with nudity, because of the predominance of pornography on the internet. Similarly, people of colour have been complaining about the difficulty the AIs have in creating images with their features.
And in the creative industries, which are feeling the immediate impact of this technology, protests have already started. Many artists say that when their work and their style are used to train artificial intelligence and generate images, it is equivalent to theft. Some stock image archives have promised to compensate them financially, but many works have already been used without consent.
And there is an even more serious problem: disinformation and hate speech. It would be possible to create fake photos and videos for political ends, harassment or bullying. Some programmes try to avoid the misuse of AI by creating filters, but others leave moderation to users, which hasn't been working very well.
This debate leads us to an alarming vision of our near future: if artificial intelligence can already create images that good and that fast, how can we know whether a recording was really made by someone with a camera in real life? Mike Cook, AI researcher, thinks we should still be optimistic, even when it seems hard to do so. There have been some interesting movements from places like the European Union to propose legislation to restrict AI, but it's still moving way, way slower than the tech industry is, and because the tech industry is concerned about progress more than anything else, we're seeing speed come first.
And that's what is causing all of these problems. When things are going this fast, we're not, like, applying the brakes. We're not thinking about the dangers, and it's that mid-period, while society adjusts to all of these changes, that's going to be quite dangerous. That said, I think it's important to be optimistic and not give up.
AI should be something that we feel confident talking about, and we shouldn't mind if we don't know everything. We should discuss it with each other, decide how we want it to change our lives, and ask for that from our governments and politicians, so that they can make changes and pass laws that make us safer. What is certain is that we are in the middle of a technological revolution that started in the creative industries but will spread to many others. And, like all technological revolutions, it is in our hands.
It will cause a lot of conflict and probably some losses. But if we know how to steer it, it promises to open up the door to a future that, a little while ago, we could only imagine. To watch more videos like this, check our YouTube channel, our social media (Instagram, TikTok, Twitter and Facebook) and our website, bbcbrasil.com. Thank you and see you next time. Bye bye!