Way back in the 80s, I noticed that sometimes, when an elephant called a member of her family, one individual would answer, and everybody else ignored the calling animal. And then she would call again, and a different elephant would sort of lift her head up and rumble very loudly. This is Joyce Poole.
She's been studying African elephants and their communication for 50 years. Then I started to think, well, okay, so maybe they have a way of directing a call to a specific individual. But we had no way of detecting that.
Decades later, she partnered up with Mickey Pardo, who designed a study around her observations. I went out into the field. I recorded calls with careful behavioral observations.
So we knew who made each call, we knew who the call was addressed to, we knew the context of the call...
They encoded the acoustic information from the recordings into a long string of numbers, along with the data Mickey had collected about the calls. They fed nearly 500 calls like this into a statistical model. And when given the acoustic structure of a new call, the model could predict who the receiver was, much better than chance.
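To make that recipe concrete, here is a minimal sketch of this kind of analysis, assuming Python with scikit-learn, random numbers standing in for real acoustic measurements, and a random-forest classifier; the study's actual features and model pipeline are not reproduced here:

```python
# Hypothetical sketch: predict a call's receiver from acoustic features.
# The features and receiver IDs below are random stand-ins, so this model
# will only score near chance; with real acoustic measurements, the
# question is whether held-out accuracy beats the chance baseline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

n_calls, n_features = 500, 30                    # ~500 calls, acoustic features
X = rng.normal(size=(n_calls, n_features))       # stand-in acoustic vectors
receivers = rng.integers(0, 17, size=n_calls)    # stand-in receiver IDs

clf = RandomForestClassifier(n_estimators=300, random_state=0)
accuracy = cross_val_score(clf, X, receivers, cv=5).mean()

chance = np.bincount(receivers).max() / n_calls  # always-guess-majority baseline
print(f"cross-validated accuracy: {accuracy:.2f} vs. chance: {chance:.2f}")
```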
In other words, evidence suggesting African savanna elephants give each other names. When we posted it on Facebook, somebody wrote back, said that the Earth just shifted a little bit. And I think that's true.
This is just one example of how machine learning is decoding complexities in animal communication that humans can't detect. And now some AI researchers want to take the next step: large language models, like the ones that power chatbots, but built for interspecies communication. “Can we talk a little bit about love?” “There is still much to be learned about whales.” When researchers study animal communication, they usually employ a few methods: recording their vocalizations, observing and documenting the behavior and context around those sounds, and sometimes doing a playback to measure the animal's response. All of these areas are already being impacted by AI.
Recordings from the field don't usually sound like this. They often sound like this: Multiple animals vocalizing on top of one another in a noisy environment. This is known as the cocktail party problem, and it's a common issue in the field of animal research.
But machine learning solved a similar problem in human speech recognition. AI researchers trained a model called Deep Karaoke on lots of music tracks where the instruments and vocals had been recorded separately, as well as on the fully mixed tracks, until it was able to separate out the instruments and vocals in new music clips. Recently, AI researchers have had some success applying similar algorithms to animal sound recordings.
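As an illustration of the masking idea behind that kind of separation, here is a minimal sketch, assuming PyTorch; the tiny network, the frame shapes, and the random stand-in spectrograms are all assumptions for readability, not the published Deep Karaoke architecture:

```python
# Hypothetical sketch of mask-based source separation: a network learns
# to predict a 0-to-1 mask over a spectrogram frame of the mixture, so
# that mask * mixture approximates one isolated source.
import torch
import torch.nn as nn

N_FREQ = 513  # frequency bins per STFT frame (e.g. n_fft = 1024)

class MaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_FREQ, 256), nn.ReLU(),
            nn.Linear(256, N_FREQ), nn.Sigmoid(),  # mask values in [0, 1]
        )

    def forward(self, mix_frame):
        return self.net(mix_frame)

model = MaskNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Training pairs: magnitude-spectrogram frames of the mixture and of the
# isolated source (random stand-ins here for separately recorded tracks).
mix = torch.rand(64, N_FREQ)
isolated = mix * torch.rand(64, N_FREQ)

for step in range(200):
    est = model(mix) * mix            # apply the predicted mask to the mixture
    loss = nn.functional.mse_loss(est, isolated)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The hope described above is that this same recipe carries over to noisy group recordings of animals.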
Which means you can take a clip of a group of macaque monkeys and single out one discernible call. Researchers could also start using AI for the playbacks they do in the field. You may have seen AI models that can be trained on lots of examples of a sound recording and then generate another unique version of it.
AI researchers are starting to develop similar models for animal recordings. These are all types of “supervised learning.” That means the model gets trained on lots of examples labeled by humans.
And in the elephant name study, researchers were able to feed a model their observations, which, along with the sound data, helped them detect something in elephant calls that they couldn't detect through observation alone. You need to annotate a lot of data. Yossi Yovel trained a statistical model on 15,000 Egyptian fruit bat vocalizations, and it was able to identify the emitter of each call, the context of the call, the behavioral response to it, and who the call was addressed to.
And we annotated them manually. You know, I'm already saying this is a restriction of the study, because maybe we're missing something, we’re humans, we’re not bats. And that's the problem with supervised learning models.
They are limited by what we humans already know about animal communication in order to label the training data. And we don't know a lot. That's why some AI researchers say self-supervised models hold the most potential for decoding animal communication.
This is how natural language processing models like ChatGPT are trained. Instead of learning from human-labeled examples, they are trained on a large amount of unlabeled data, which they sort according to patterns and categories they detect all on their own. In the case of ChatGPT, the model learned from all the books, websites, social media feeds, and anything else that could be scraped from the internet, and came to its own conclusions about how language works.
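The training signal itself is simple to state: predict the next token of raw, unlabeled data. A minimal sketch, assuming PyTorch and a toy recurrent model standing in for a real transformer:

```python
# Hypothetical sketch of self-supervised next-token training: the only
# "label" is the next symbol in the unlabeled sequence itself.
import torch
import torch.nn as nn

VOCAB, DIM = 100, 32  # toy vocabulary and model size (assumptions)

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.rnn = nn.GRU(DIM, DIM, batch_first=True)
        self.out = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.out(hidden)  # a score for every possible next token

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

tokens = torch.randint(0, VOCAB, (8, 20))  # unlabeled sequences
logits = model(tokens[:, :-1])             # predict each position's successor
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1)
)
opt.zero_grad()
loss.backward()
opt.step()
```

Nothing in that loop needs a human to say what any sequence means, which is exactly why the approach is attractive for animal calls we can't yet interpret.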
Every language has a shape that AI discovers. This is Aza Raskin. He co-founded the Earth Species Project, one of a few organizations that want to build models like this for animal communication.
What he means by language having a shape is that language models are built out of relationships among words. Words that mean similar things are placed near each other, and words that share a relationship share a distance and direction. So, man is to king as woman is to queen.
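A toy illustration of that geometry, assuming hand-made two-dimensional vectors (real models learn hundreds of dimensions from text; no one chooses the coordinates by hand):

```python
# Hypothetical sketch: relationships as directions in a vector space.
# "Royalty" is one direction; adding it to "man" gives "king", and
# king - man + woman should land nearest to queen.
import numpy as np

royal = np.array([0.5, 0.5])
vec = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([0.0, 1.0]),
    "king":  np.array([1.0, 0.0]) + royal,
    "queen": np.array([0.0, 1.0]) + royal,
    "dog":   np.array([-1.0, -0.5]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

target = vec["king"] - vec["man"] + vec["woman"]
candidates = [w for w in vec if w not in {"king", "man", "woman"}]
print(max(candidates, key=lambda w: cosine(target, vec[w])))  # -> queen
```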
So this is the shape of all those relationships among the English language's 10,000 most common words, visualized here by the Earth Species Project. Flattened out, it looks something like this. Something really miraculous happened in 2017: researchers discovered that you could take the shape of any one language and match it to the shape of any other language, and the point that is “dog” ends up in the same spot.
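The geometric core of that discovery can be sketched in a few lines. The real 2017 systems (work such as Conneau et al.'s word translation without parallel data) have to find the mapping with no dictionary at all; the simplified sketch below assumes we already know which points match and just solves for the rotation, the orthogonal Procrustes problem, to show that one shape can be turned onto another:

```python
# Hypothetical sketch: two embedding "shapes" that differ only by a
# rotation can be aligned exactly, so "dog" in one space lands on
# "dog" in the other.
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(1000, 50))            # word vectors for language A
R_true, _ = np.linalg.qr(rng.normal(size=(50, 50)))
Y = X @ R_true                             # language B: same shape, rotated

# Orthogonal Procrustes: the rotation W minimizing ||X @ W - Y||.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

print(np.allclose(X @ W, Y))               # -> True
```

With real languages the shapes only roughly match, so the alignment is approximate, which hints at why animal communication is a much harder target.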
This idea, that similar words can be located in other languages in roughly the same place, is what gives the Earth Species Project hope we could do a version of this for animal communication. To do a translation without needing any examples, without needing a Rosetta stone. This is complicated though, because we know that animals don't just communicate with sound, but with other senses, too.
But Aza points out that we can learn from the fact that image generation models like DALL-E and Midjourney are built on the same large language model structure used for text. It turns out, behind the scenes, it's again these kinds of shapes. There's the shape that represents sound, the shape that represents images.
Those two shapes get aligned, and now you can translate between images and text. Their expectation is that the places where nonhuman animals' communication lines up with ours will tell us even more about what we have in common. Dolphins look in mirrors and recognize themselves.
Elephants too. That's a kind of self-awareness. One concern with this plan is related to a step in self-supervised learning called validation, meaning humans still need to refine these models by grading them on their answers.
How would we do that with a form of communication so foreign to our own? We also might have overly high expectations of this overlap, or of the capacity for having a conversation with a nonhuman animal in a shared language, about shared experiences. “Hey Kurt, how are you doing, dude?” “So I'm about to translate that into a meow.” “We said hi. Hi. Hi. Hi. You know, next time you want to say, how are you?” I do not think that humans should be considered more important than other species, but that doesn't mean there's no usefulness in distinguishing between language, which is this very specific behavior that, at least based on what we currently know, seems to be unique to humans, and other forms of communication. In order to build these models, the first step is collecting a lot more data on animal sounds than exists right now. And so I'm actually at the moment building up a database with all the individual calls.
So, close to 10,000 records in that, which is very small, actually. Around the world, animal researchers are in the midst of a massive data-collection effort, tagging and recording animals with video, sound, and spatial data to feed these data-hungry models. Time will tell whether AI will facilitate true interspecies communication.
But researchers hope that discoveries along the way will continue to have an impact on our appreciation and protection of the species we share the planet with. We're not the only ones on the planet who can communicate, who care about one another, who have thoughts about the past and about the future. They also have a right to be here and a reason for being here.