This startup gives your speech a new 'human-realistic' AI voice

TNW
June 21, 2023

Ioanna is a writer at TNW. She covers the full spectrum of the European tech ecosystem, with a particular interest in startups, sustainability, green tech, AI, and EU policy. With a background in the humanities, she has a soft spot for social impact-enabling technologies.

From virtual assistants to voiceovers for audiobooks, AI voice generation has emerged as a rapidly growing field — and it’s no wonder that companies are rushing to tap into the technology’s potential.

Among them is Valencia-based Voicemod. The startup has developed an AI voice changer and soundboard software that enables instant speech-to-speech conversion. Unlike most of its competitors, the company claims that it transforms voices in real time and with low latency, enabling users to converse as they would in real life.

According to Jaime Bosch, Voicemod’s CEO and co-founder, the company trains its AI model using publicly available data sets and professional voice actors, which results in a broad pool of vocal expressions, pitches, tones, and emotions. Through machine learning techniques, the model learns to understand, analyse, and predict a person’s speech patterns and intricacies.

“When a user speaks into our software or application, their voice input is processed in real time,” Bosch told TNW. “Our AI model then applies the learned patterns and transformations to the input, allowing for instant voice conversion.”

Voicemod mainly targets the entertainment industry, including gamers, streamers, content creators, and VTubers on platforms ranging from Discord and Twitch to Zoom and WhatsApp.

To further address the increasing user demand for self-expression, pseudonymity, and creativity online, the startup is now launching the so-called “AI Humans” collection alongside the 100 voice options already in its portfolio. Although Voicemod already offers human voice filters, the new collection is slated to be the company’s most human-realistic to date.

Credit: Voicemod

Trained on recordings from voice actors, AI Humans consists of 20 sonic avatars which range in personality, gender, and age. The personas include Joe, an 80-year-old male voice with a “raspy, sardonic tone” and Jennifer, a 25-year-old female voice, featuring an “energetic and friendly” character. Users can also customize the pitch of each persona, changing the perception of the voice’s gender and age.

The video below can give you an idea of how these characters sound:

“AI voices offer exciting opportunities for industries looking to cultivate creative exploration and self-expression, enhance personalization, and foster inclusivity in digital spaces,” Bosch said.

But despite the positive impact AI voice generation can have, the technology is associated with numerous risks as well. These include misuse, fraud, impersonation, and even voice theft, which especially affects professional voice actors.

According to Bosch, Voicemod is actively working to mitigate these risks. For example, it’s developing a watermarking technology to help platforms identify and track AI-generated voices, and it has implemented measures to protect the intellectual property of the voice actors it’s working with.

Bosch believes that AI will become “a tool” for these professionals. “Something that is perhaps missed in these discussions is that behind every use of real-time voice AI, the use-case that Voicemod is targeting, is a human who is effectively driving the AI,” he told TNW.

Voicemod already counts over 40 million desktop downloads. In the future, it plans to launch on mobile as well, and reach millions of monthly active users. It’s also working on B2B partnerships with gaming companies and VR headset platforms.

The software is available for free, with the option for a paid PRO version which unlocks additional features and content.

Source: TNW