Thanks to a web demo of a new AI engine called Koe Recast, you can transform up to 20 seconds of your voice into different styles, including anime characters, deep male narrators, ASMR whispers, and more. It’s an eye-opening preview of a potential commercial product currently undergoing private alpha testing.
Koe Recast emerged recently from a Texas-based developer named Asara Near, who is working independently on a desktop app that aims to let people change their voice in real time through other apps like Zoom and Discord. “My goal is to help people express themselves in any way that makes them happier,” Near said in a brief interview with Ars.
Several demos on Koe’s website show altered clips of Mark Zuckerberg talking about augmented reality in a female voice, a deep male narrator voice, and a high-pitched anime voice, all powered by Recast.
This kind of AI-powered, realistic voice-conversion technology isn’t new. Google made waves with similar technology in 2018, and audio deepfakes of famous people have caused controversy for several years now. But seeing this capability in an independent startup funded by one person (“So far, I’ve fully funded this project,” says Near) shows how far AI voice synthesis technology has come, and perhaps hints at how close voice-conversion capabilities are to widespread adoption through an open source or low-cost release.
When asked what specific type of AI powers Recast’s voice-switching capabilities under the hood, Near withheld specifics but described in general terms how it works: “We can go in and change the characteristics of the voice in the embedding space we’ve created, so the goal is to modify the parts of the audio that correspond to the speaker’s personal style or timbre while keeping intact the parts that correspond to spoken content, such as prosody and speech. This allows us to change someone’s voice style to any other style, including their perceived gender, age, ethnicity, etc.”
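Near did not share implementation details, so the following is only a toy sketch of the general idea he describes: an utterance is represented by separate “content” and “style” parts, and conversion changes the style while leaving the content alone. Everything here is hypothetical, including the `VoiceEmbedding` class, the `blend` and `convert` helpers, and the made-up numbers. A real system would also need neural networks to encode audio into such an embedding and to decode it back into sound, which this sketch omits.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class VoiceEmbedding:
    """Hypothetical disentangled embedding of one utterance (an assumption)."""
    content: Vector  # what was said: words, prosody
    style: Vector    # who it sounds like: timbre, speaker identity


def blend(a: Vector, b: Vector, amount: float) -> Vector:
    """Linearly interpolate from a (amount=0.0) to b (amount=1.0)."""
    if len(a) != len(b):
        raise ValueError("style vectors must have the same length")
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0.0 and 1.0")
    return tuple((1.0 - amount) * x + amount * y for x, y in zip(a, b))


def convert(source: VoiceEmbedding, target_style: Vector,
            amount: float = 1.0) -> VoiceEmbedding:
    """Move the style toward target_style; the content is left untouched."""
    return replace(source, style=blend(source.style, target_style, amount))


# Made-up numbers purely for illustration.
utterance = VoiceEmbedding(content=(0.2, -1.0, 0.7), style=(1.0, 0.0))
narrator_style = (0.0, 1.0)

converted = convert(utterance, narrator_style)             # full style swap
halfway = convert(utterance, narrator_style, amount=0.5)   # partial shift
```

In this toy version, `converted` carries the narrator’s style with the original content, while `halfway` sits between the two styles, which hints at how a single embedding space could support many output voices.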
Recast supports 10 different voices, and more are being rolled out. “It’s currently undecided whether we will offer existing voices of celebrities or other well-known figures,” said Near.
However, offering the voices of celebrities (or imitating living non-celebrities) could raise ethical and legal questions. When asked about the potential misuse of Recast, Near replied, “As with any technology, there can be both positives and negatives, but I think the majority of humanity consists of people who are great and will benefit greatly from this.” Near also points out that Recast includes a Terms of Service policy that prohibits illegal and hateful use.
As for the release process, Near is pursuing commercial options but is not ruling out an open source release, which could have an impact similar to Stable Diffusion’s by putting realistic audio deepfakes in the hands of many without hard restrictions. “We’re exploring several monetization strategies,” said Near. “If the profit models I’m thinking of don’t work out, then open-sourcing this technology could be an option in the future.”
As deep learning continues to erode the 20th-century concept (or, as some might say, “illusion”) of media as a fixed and accurate record of reality, we are looking at a near future in which a digital representation of a living human voice, like photos and videos, will be one more thing you can’t take at face value without substantial trust in the source. However, this technology could also empower many people who might otherwise be discriminated against while doing business, or simply having fun, online.