In recent years, the amount of time an AI tool needs to clone someone’s voice has been steadily decreasing. It used to take minutes, but now it takes just seconds.
OpenAI, the Microsoft-backed company behind the viral generative AI chatbot ChatGPT, recently disclosed that its own voice-cloning technology needs only 15 seconds of audio to reproduce a person’s voice. In a post on its website, OpenAI shared a small-scale preview of a model called Voice Engine, which it has been developing since the end of 2022.
Voice Engine works from a sample of at least 15 seconds of a person speaking. The user can then input text to create what OpenAI describes as “emotive and realistic” speech that “closely resembles the original speaker.” OpenAI says it is taking a “cautious and informed approach to a broader release due to the potential for improper use of synthetic voices,” adding that it wants to “initiate a dialogue on the responsible deployment of synthetic voices and how society can adapt to these new capabilities.” It further stated: “Based on these conversations and the results of these small-scale tests, we will make a more informed decision regarding whether and how to deploy this technology on a large scale.”
One type of misuse that OpenAI refers to is a scam that some criminals are already carrying out using similar technology that has been publicly available for some time. It involves cloning a person’s voice and then calling a friend or relative of that person to trick them into making a bank transfer. There are also concerns about how such technology might be used in the upcoming presidential election, an issue highlighted by a recent prominent incident in which a robocall using a clone of President Joe Biden’s voice told people not to vote in the January New Hampshire primary.
Another worry is how this rapidly advancing technology will affect the livelihoods of voice actors. They fear that they will increasingly be asked to sign away the rights to their voices so that AI can create a synthetic version, with compensation for such a contract likely to be significantly lower than if the actor performed the job in person.
Considering more positive applications of the technology, OpenAI suggests that it could be used to offer reading assistance to non-readers and children using natural-sounding and emotive voices that “represent a wider range of speakers than what is achievable with preset voices,” as well as for instant translation of videos and podcasts, a feature that Spotify is already trialing. It could also be used to help patients who are gradually losing their voice due to illness to continue communicating using what sounds like their own voice.
OpenAI has posted examples of the AI-generated audio alongside the reference audio on its website, and we think you’ll agree that they are remarkable.