TMTPost -- OpenAI is starting to release its cutting-edge voice assistant for ChatGPT to all paid users.
OpenAI said its Advanced Voice Mode (AVM) is rolling out to all Plus and Team users in the ChatGPT application over the course of the week, and Edu and Enterprise subscribers will get access to the audio feature next week. AVM will first hit the U.S. market. It is not yet available in the European Union, the U.K., Switzerland, Iceland, Norway and Liechtenstein, OpenAI said. Plus and Team users will see a notification in the app once they have access to AVM.
In a post on the social media platform X, formerly Twitter, OpenAI said AVM can say “Sorry I’m late” in over 50 languages. The post included a video showing a user asking the voice assistant to apologize to her grandmother for keeping her waiting for so long. In the video, the artificial intelligence (AI) assistant first summarized what the user wanted to say in English, and then, after the user pointed out that her grandmother speaks only Mandarin, repeated it in standard Mandarin.
Compared with the earlier voice assistant, OpenAI added the ability to store Custom Instructions and Memory for the behaviors the user wants it to exhibit. OpenAI said it also improved conversational speed, smoothness and accents in select foreign languages. The company also revamped AVM's design: the feature is now represented by a blue animated sphere, instead of the animated black dots the startup showcased in May.
Moreover, AVM delivers five new voices: Arbor, Maple, Sol, Spruce, and Vale, bringing ChatGPT’s total number of voices to nine. Meanwhile, OpenAI dropped the controversial voice named Sky, which sounded similar to Scarlett Johansson. The actress said she was “shocked” and “angered” that one of the AVM voices first unveiled in May allegedly recreated her voice without her consent. OpenAI said that month it would pause the use of Sky.
The rollout came four months after OpenAI first released the AI feature with the launch of its flagship model GPT-4o. The startup showed that users can ask the GPT-4o-powered ChatGPT a question and interrupt it while it is answering. The model delivers “real-time” responsiveness and can even pick up on nuances in a user’s voice, generating responses in “a range of different emotive styles,” OpenAI says. The company originally planned to launch AVM in late June, but delayed the launch by a month to late July, saying it needed time to reach its safety and reliability standards.
OpenAI rolled out AVM to a limited number of paid Plus users in late July, with a more limited set of capabilities to start, since OpenAI is still working on video and screen-sharing features. For example, the chatbot cannot yet access a computer-vision feature that would let it offer spoken feedback on a person’s dance moves simply by using their smartphone’s camera. The available version is also unable to impersonate how other people speak. OpenAI also said it had added new filters to ensure the software can spot and refuse some requests to generate music or other forms of copyrighted audio.