3 reasons why audio deepfake may be better than we think

Zuzanna Kwiatkowska
Oct 27, 2021 · 5 min read

Recently I stumbled upon an article describing multiple situations in which thieves used voice cloning technology to steal money from companies. The scenario was always similar: the thieves found one of the company's executives, recorded their voice, and then used it to generate a real-time deepfake while calling an employee authorized to make the transfer.

And the scariest thing is that this isn't the only malicious use of voice deepfakes. Blackmailing people with their deepfaked voices or deepfaking politicians to put controversial statements in their mouths – these crimes are just as serious as the first one.

Even the name itself, deepfake, gives off a vibe of something negative, something not real and potentially unnatural to us. We fear it to the extent that Kaggle's Deepfake Detection Challenge offered a total prize pool of 1 million dollars.

Does this mean deepfakes are all bad and we should avoid them? Today I am going to write about 3 cases where deepfake technology actually looks promising.

Giving voice to those who can't have it

ALS, or Amyotrophic Lateral Sclerosis, is a disease of the nervous system that causes the loss of control over your muscles. The disease progresses over time, taking away people's ability to move and eat, but also to talk, creating a speech impairment that makes it hard to communicate freely.

Tim Shaw, a former NFL player, was diagnosed with ALS shortly after his 30th birthday. For a professional athlete, everything changed at that moment, and it would keep changing as the disease progressed.

This is when Google and DeepMind came into action. They helped through Project Euphonia, a project that aims to improve speech recognition and recreate the voices of people with medical conditions. They decided to use existing speech synthesis technologies to generate Tim's voice from his past interviews and recordings, made before the disease caused major speech impairments. Moreover, they created a tool that maps his current speech to text, opening up the possibility of translating his current voice into his old voice in real time.

A similar technology is being developed by Rolls-Royce in their AI agent called Quips, which relies on the idea of voice banking: building a database of a person's voice samples from before the disease progresses, to be used later for voice generation. Rolls-Royce claims that Quips will not only be able to generate speech, but also generate it with the right accent and other subtle features that are unique to how a person communicates.
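To make the voice-banking idea a bit more concrete, here is a minimal sketch of how such a bank could be organized: a folder of WAV recordings plus an index with transcripts and tags, collected while the person can still speak. This is only an illustration of the concept, not how Project Euphonia or Quips actually work – the folder layout, the metadata fields and the add_sample helper are all made up.

```python
import json
import wave
from datetime import date
from pathlib import Path

BANK_DIR = Path("voice_bank")          # made-up location for the banked samples
INDEX_FILE = BANK_DIR / "index.json"   # catalogue used later to train a voice model


def add_sample(wav_path: str, transcript: str, tags: list[str]) -> None:
    """Copy a WAV recording into the bank and store its transcript and tags."""
    BANK_DIR.mkdir(exist_ok=True)
    entries = json.loads(INDEX_FILE.read_text()) if INDEX_FILE.exists() else []

    # Measure how much speech this sample adds to the bank.
    with wave.open(wav_path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()

    destination = BANK_DIR / Path(wav_path).name
    destination.write_bytes(Path(wav_path).read_bytes())

    entries.append({
        "file": destination.name,
        "transcript": transcript,
        "tags": tags,                        # e.g. accent, catchphrase, emotion
        "duration_seconds": round(duration, 2),
        "recorded_on": date.today().isoformat(),
    })
    INDEX_FILE.write_text(json.dumps(entries, indent=2))


# Example: bank everyday phrases early, while speech is still unaffected.
# add_sample("good_morning.wav", "Good morning, everyone!", ["greeting", "warm"])
```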

Having a voice that matches your gender identity

Over the last few years, so-called "skins" have become a widespread phenomenon in the gaming community. Skins allow people to modify their in-game appearance to better represent their personality or to create a completely new character they identify with more.

Unfortunately, another widespread phenomenon is not nearly as inclusive and shows the darker side of the gaming community: online harassment. And one of the groups reporting a higher risk of being harassed online is the LGBTQ+ community.

A company called Modulate aims to change that. They proposed creating a voice skin that would allow people from the community to use their preferred voice while playing online. But it doesn't stop there. Many community members have asked whether the technology could be used outside of gaming to help with their dysphoria, or even to increase their privacy and security by shielding behind a modulated voice.

Although the technology still seems to have many drawbacks, it definitely shows major potential for the future.

Coping with death

Another promising, yet somewhat dangerous, technology resembles a Black Mirror episode. In fact, it is a Black Mirror episode: a young woman faces the loss of her boyfriend and decides to upload all of his texts, recordings and photos to a company that promises to recreate his whole personality. Although in the episode he is fully "brought back to life", some of the technologies shown there already exist.

But can speech synthesis alone be enough in this case? After all, it's also about the personality, the inside jokes and the phrases people used that give you the experience of talking with them.

This is the problem Eugenia Kuyda wished to solve after her friend's death. Together with his friends and family, she gathered his text messages and used them to train a neural network to mimic his behaviour in the form of a chatbot. Although she said the bot resembled more of "a shadow of a person" to her, it was still a proof of concept for the technology, and it could potentially be merged with speech synthesis.
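As a rough illustration of what "training a neural network on someone's text messages" can look like in practice, here is a minimal sketch that fine-tunes a small GPT-2 model on an exported chat history using the Hugging Face libraries. This is not Kuyda's actual setup – the file name, model choice and training settings are assumptions just to show the shape of the approach.

```python
# Minimal sketch: fine-tune a small causal language model on exported chat
# messages so it imitates one person's writing style. Assumes messages.txt
# holds one message per line; model and hyperparameters are illustrative only.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# One exported chat message per line.
dataset = load_dataset("text", data_files={"train": "messages.txt"})["train"]


def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)


tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM objective

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="chat-clone", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()

# Ask the fine-tuned model for a reply in the learned style.
prompt = "hey, how was your day?"
inputs = tokenizer(prompt, return_tensors="pt")
reply = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9,
                       pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(reply[0], skip_special_tokens=True))
```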

So should we try to revive our loved ones like that? I think that, if unsupervised, it may be really dangerous for people's mental health. On the other hand, used together with therapy, it could help people find closure after a sudden death or even cope with traumas caused by others.

Conclusions

What are your thoughts on deepfake technology? Do you think it is doing more harm than good, or that it has the potential to solve many of our problems? Whatever stand we take on that, I like to think we should be aware of both sides of the coin.

If you want to read posts like this daily, covering AI from both the technical and business angles, join me on Twitter – I'll work hard to earn that one click of yours -> https://twitter.com/aiflavours