OpenAI Pauses Release of Groundbreaking Voice Cloning Tool Because It’s Too Good

By Shubhendu Vatsa - News

Published: 13 Apr 2024, Last Updated: 6 June 2024

In the past year or so, AI assistants and chatbots have become increasingly common, slowly integrating themselves into our daily lives. But the honeymoon phase seems to be ending sooner than expected. Deepfakes have already emerged as a significant concern, raising questions about the potential for manipulated media to erode public trust and distort the truth.

Now, another potential challenge has surfaced: the rise of voice cloning. OpenAI, the company behind the massively popular ChatGPT chatbot, has developed a groundbreaking voice cloning tool but, in a surprising move, has decided against releasing it widely, citing the very real risks it poses.

Dubbed Voice Engine, OpenAI’s text-to-voice generation platform has been in development for roughly two years and can generate a “natural-sounding” copy of someone’s voice using just a 15-second audio sample. The company claims the output “closely resembles the original speaker,” and its blog post emphasizes the model’s ability to “create emotive and realistic voices” from that single short sample.

Furthermore, Voice Engine generations will be watermarked in a way that allows OpenAI to trace the origin of any generated audio. This will likely not be a traditional watermark but rather a tracking mechanism embedded within the generation process; however, the company didn’t reveal any details.

Additionally, OpenAI’s current partnerships require “explicit and informed consent from the original speaker,” and the company does not allow “developers to build ways for individual users to create their own voices.” Even with these safeguards, the company has opted to hold off on a wider release due to concerns about potential misuse, and it seeks to minimize the threat of damaging misinformation, especially in a global year of elections.

This decision comes despite Voice Engine’s impressive ability to not only generate natural-sounding, cloned voices from short audio samples but also read text prompts on command, even in multiple languages. As highlighted in their blog post, “These small-scale deployments are helping to inform our approach, safeguards, and thinking about how Voice Engine could be used for good across various industries.”

OpenAI CEO Sam Altman | Image: Getty

The company further added, “We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities. Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.” For now, OpenAI said, “We are choosing to preview but not widely release this technology at this time,” adding that it aims “to bolster societal resilience against the challenges brought by ever more convincing generative models.”

The company also outlined some immediate steps, including “phasing out voice-based authentication as a security measure for accessing bank accounts and other sensitive information.” OpenAI also called for the exploration of “policies to protect the use of individuals’ voices in AI” and “educating the public in understanding the capabilities and limitations of AI technologies, including the possibility of deceptive AI content.”