Can AI Feel Now? OpenAI Gives ‘Voice & Emotion’ To Its New Model GPT-4o, How To Access It?
Launched in May this year, GPT-4o has a context window of 128K tokens and a knowledge cut-off of October 2023. It is markedly better at vision and audio understanding than previous models

With the release of GPT-4o (“o” for “omni”), OpenAI has offered a glimpse into the future of intelligent computing. The moment the latest large language model launched, demo videos began flooding social media platforms. Its human-like voice assistant has left many in awe, drawing comparisons to ‘Samantha’, the artificial intelligence operating system from the 2013 film ‘Her’.

In a blog post, OpenAI said: “It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to a human response time in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API”.
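For developers, the model is reachable through OpenAI’s official SDKs. Below is a minimal sketch of a Chat Completions call using the openai Python package; the prompt text and printed output are illustrative, and an API key is assumed to be set in the OPENAI_API_KEY environment variable.

```python
# Minimal sketch: calling GPT-4o via the Chat Completions API with
# OpenAI's official Python SDK (pip install openai). The prompt is
# illustrative; OPENAI_API_KEY must be set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise GPT-4o in one sentence."},
    ],
)

print(response.choices[0].message.content)
```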

It should be noted that there is a common misconception that ChatGPT and GPT are the same thing. It is an easy mistake to make: the names are similar, both come from the same company, and both concern AI. The key distinction is that ChatGPT is an application driven by GPT models, not an AI model itself. ChatGPT uses an underlying GPT model, the actual AI language model, to generate conversational replies interactively.
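To make the distinction concrete, a “ChatGPT-like” application can be thought of as a loop that collects user messages, keeps the conversation history, and sends it to the underlying GPT model on every turn. The sketch below is a hypothetical toy, not OpenAI’s actual ChatGPT implementation.

```python
# Toy illustration of the app-vs-model distinction: the "application"
# is the loop and history management; the "model" is what the API call
# invokes. Hypothetical sketch, not how ChatGPT is actually built.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("You: ")
    if user_input.lower() in {"quit", "exit"}:
        break
    history.append({"role": "user", "content": user_input})
    reply = client.chat.completions.create(model="gpt-4o", messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    print("Assistant:", answer)
```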

What is GPT-4o?

Launched in May this year, GPT-4o has a context window of 128K tokens and a knowledge cut-off of October 2023, and it is markedly better at vision and audio understanding than previous models. In the earlier voice pipeline, the main intelligence source, GPT-4, could not directly perceive things such as tone, multiple speakers, or background noise, and it could not express emotions like laughter or singing. GPT-4o takes a new approach: a single model trained to handle text, vision, and audio together.
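On the API side, the vision part of this multimodality is exposed by mixing text and image parts inside a single user message. A hedged sketch follows; the image URL is a placeholder.

```python
# Sketch of GPT-4o's vision input via the Chat Completions API: a user
# message whose content is a list of text and image parts. The image
# URL below is a placeholder, not a real asset.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```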

Soon after the GPT-4o launch, many people tried the new model, drawn especially by the ‘emotional’, more human-like voice behind the AI assistant. However, many users on the OpenAI Developer Forum and Reddit raised concerns about overall availability and about access to the voice mode on smartphones and PCs.

OpenAI, however, said: “GPT-4o’s text and image capabilities are starting to roll out (May 13) in ChatGPT. We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits. We’ll roll out a new version of Voice Mode with GPT-4o in alpha within ChatGPT Plus in the coming weeks.”

How to Access GPT-4o?

According to OpenAI, GPT-4o will be available in ChatGPT and the API as a text and vision model initially. GPT-4o will be available in ChatGPT Free, Plus, and Team and in the Chat Completions API, Assistants API, and Batch API.
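Of these, the Batch API is the least familiar to most users: requests are written to a JSONL file, uploaded, and processed asynchronously against the Chat Completions endpoint. A rough sketch, with illustrative request contents and IDs:

```python
# Sketch of submitting GPT-4o requests through the Batch API: write
# requests to a JSONL file, upload it, then queue a batch. The request
# body and custom_id here are illustrative.
import json
from openai import OpenAI

client = OpenAI()

# Each JSONL line is one Chat Completions request addressed to gpt-4o.
requests = [
    {
        "custom_id": "req-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "Say hello."}],
        },
    }
]

with open("batch_input.jsonl", "w") as f:
    for r in requests:
        f.write(json.dumps(r) + "\n")

batch_file = client.files.create(
    file=open("batch_input.jsonl", "rb"), purpose="batch"
)
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)
```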

Free users will be automatically assigned GPT-4o. If GPT-4o is unavailable, free-tier users will default to GPT-3.5. However, free-tier access comes with limits on advanced features, including data analysis, file uploads, browsing, discovering and using GPTs, and vision capabilities.
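An API client could mirror this fallback behaviour by trying GPT-4o first and dropping to GPT-3.5 Turbo when the stronger model is unavailable. A minimal sketch; the error handling shown is illustrative, not prescribed by OpenAI.

```python
# Hedged sketch of a client-side fallback mirroring the free-tier
# behaviour described above: try GPT-4o first, fall back to GPT-3.5
# Turbo on rate limits or API errors.
from openai import OpenAI, APIError, RateLimitError

client = OpenAI()

def ask(prompt: str) -> str:
    for model in ("gpt-4o", "gpt-3.5-turbo"):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except (RateLimitError, APIError):
            continue  # stronger model unavailable; try the next one
    raise RuntimeError("No model available")

print(ask("Explain context windows in one line."))
```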

What are OpenAI’s GPT Models?

Despite being developed by the same company, the GPT models differ from one another in speed, performance, application, cost, efficacy, token size (the unit of text the model processes, e.g., a word, character, or subword) and parameter count (a measure of the model’s overall complexity).
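Token size can be made concrete with OpenAI’s open-source tiktoken tokenizer, which pairs GPT-4o with the o200k_base encoding. A short sketch, assuming the tiktoken library is installed:

```python
# Small sketch of what "token size" means in practice: counting the
# tokens a piece of text occupies. GPT-4o uses the o200k_base encoding
# in the tiktoken library (pip install tiktoken).
import tiktoken

encoding = tiktoken.get_encoding("o200k_base")
text = "GPT-4o has a context window of 128K tokens."
tokens = encoding.encode(text)

print(len(tokens), "tokens")  # number of tokens the text occupies
print(tokens[:5])             # first few token IDs
```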

OpenAI’s GPT-3 led the way for AI language models, while GPT-3.5 builds on its foundation with better accuracy and contextual understanding. Choosing between them depends on particular requirements: GPT-3 serves as a general-purpose solution, whereas GPT-3.5 shines in intricate, tailored settings. As an advancement over GPT-3, GPT-3.5 uses deep learning to produce human-like text with heightened precision and reduced bias.

With the GPT-4 launch last year, OpenAI took another step forward, solving difficult problems with greater accuracy than previous models thanks to broader general knowledge and advanced reasoning capabilities. Now, with GPT-4o, the latest generation is more powerful than its predecessors in speed, performance, applications, and efficiency.

Who Will Benefit?

Speaking about the language models, Amit Prasad, Founder and CEO of SatNav Technologies, said that until very recently, most features people now see in ChatGPT and other AI tools were considered science fiction and depicted only in movies. But rapid developments in the field now offer opportunities to use them freely in daily work, both personally and professionally.

“Earlier versions of GPTs were still learning and often gave wrong answers; some answers used flowery language, with elaborate sentences and disclaimers that were not logically tenable. Later, they were cleaned up and became more precise. With the latest announcement of GPT-4o, and the much-needed recent upgrade to reduce verbose content in ChatGPT’s answers, AI innovation goes to a whole new level and should be warmly embraced by businesses,” he noted.

Ajay Goyal, co-founder and CEO, Erekrut, said: “The latest addition to the series, GPT-4o, represents a further evolution in AI language models, and it will build upon the advancements of its predecessors, potentially offering improved performance and new features”.

Goyal believes that OpenAI’s GPT models represent significant advances in AI language processing and can benefit a wide range of users across industries. According to him, all of these models, including GPT-4o, will prove helpful to developers and researchers, businesses such as customer-service operations, content creators such as bloggers, educators, and the general public.

Meanwhile, Prasad said that at the lower end of the pyramid, GPT-4o will help speed up certain levels of tasks and their output. For example, a visual query window goes much further than the chat window introduced earlier.

“At the higher end of the pyramid, innovative ideas relevant to one’s individual business need to be thought of, and models developed that can intelligently aid processes critical to an organisation’s functioning. Those who move quickly will have a quantum leap over competitors who don’t, with the latter even risking being wiped out,” he added.
