If we have a photo that is almost perfect, but we wish to delete something, image inpainting techniques already exist to help us with that. Like this. However, what if we don’t want to remove this dog, only place it somewhere else?
Of course, that is still trouble. Look, we could inpaint the part that we deleted, but the new part, of course, does not look convincing at all. So, is that impossible?
Nope, this new AI paper promises that it can do exactly that. I would very much like to see that. Let’s have a look.
Dear Fellow Scholars, this is Two Minute Papers with Dr Károly Zsolnai-Fehér. I am especially interested in this because previous techniques could attempt it, but…ouch. Not great.
When we wish to move this puppy, they don’t appear to understand the relationship between this good boy and its reflection. So, little doggie, time to move! And now, the new technique…goodness.
This is fantastic. If I were given these images, I would be hard pressed to say whether they were changed. Even the reflection moved to its new place. Great job! Maybe if you look very closely at the splash, you might find out about the trick.
This is a wonderful collaboration between several institutes that you see here. And I wonder what the key idea is here? The key is that we have diffusion-based text-to-image models. These can generate images for us, which is great, but not all of them give us fine-grained control.
Some of the ones in newer papers, however, can do that. We can highlight regions with these blobs and say I would like to see a cat here, a rock and a cloud there. Loving it.
So, our problem is simple, just move the blob, right? Well, let’s see…nope, not quite. Fellow Scholars, do you see the problem here?
The problem is that not only the blob changes, the whole image changes. That is too intrusive. So, this amazing new paper found something really interesting.
Let’s use a previous technique and ask for a rabbit, and a cat. There we go, we got the rabbit, and we got a…well, that is many things, but a cat it is not. Perhaps a hybrid of a rabbit and a cat.
And oh yes, therein lies the problem. The information of the rabbit blob leaked into the cat blob, and the cat has been rabbitified. So, what does the new technique do?
Now hold on to your papers, Fellow Scholars, and let’s ask for one rabbit and one cat, and there we go. Less leakage. Finally, the blobs are now independent.
Loving it. And that is one of the key ideas that helps us move these objects in an already existing photo. Luckily, there are lots of examples shown in the paper.
And I really like how it understands that some of the surroundings have to change when moving the object, for instance, the shadow has to go too, but everything else has to remain the same. It is a really tricky problem, and this is a fantastic leap forward. Is it perfect?
No. Clearly not perfect. But when showing the results to a bunch of humans, the new technique has a significantly higher win rate against previous methods.
And in some cases, you can also move not just the cat, but the rock too. Two objects. I can imagine a future paper where every single object in the image is recognized, which we can already do quite reliably, and then being able to move each of them would be great.
You know what else would be great? Rotations! Let’s see if it is any good at it.
Well, unfortunately it is not too good at that. So what about resizing? Also not the best.
Small adjustments are kind of okay, but if we go bigger, these artifacts appear. And finally, the most hilarious one. When two objects are moved too close to each other, what happens?
Oh my, the dog just absorbed that piglet. What a pity. So, not perfect.
The paper also contains a ton more comparisons against previous techniques. It is super fun, so make sure to check it out in the description. It works on a variety of images, which is always a good sign that something would work well in practice. And just imagine what we will be capable of two more papers down the line. I would like to make a prediction for that: as you move the puppy, you will see real-time updates.
Goodness, that would be fantastic. What a time to be alive! So, what do you think?
What would you Fellow Scholars use this for? Let me know in the comments below.