OpenAI employees and acolytes were all over Twitter in the lead-up to Monday’s spring update from the company, so much so that the breathless hype was practically unavoidable. “The world will change forever,” one tweet promised. OpenAI CEO Sam Altman teased that what was coming felt like “magic” to him. And this was all on top of rumors that the ChatGPT maker is working on a Google Search rival, and that OpenAI is teaming up with Apple for a voice assistant.
In reality, while there was a plethora of technical announcements that the company unveiled during its rather brisk live-streamed event (such as the release of a new desktop version of ChatGPT), one overarching big reveal stood out to me. It’s that the new GPT-4o model makes OpenAI’s already impressive chatbot feel and sound so much more, dare I say it, human.
Among other things, ChatGPT can now detect emotion in both the user’s voice and their facial expression, just like a human can. It also makes unprompted jokes, the way a person trying to keep a conversation light would, and it lets you interrupt a response, so you no longer have to confine yourself to the stilted my-turn, your-turn dynamic of a conversation with a chatbot.
I’m blown away by GPT-4o.
Realtime + multimodal + desktop app.
You’ll have an AI teammate on your device that’s able to help you with anything you’re working on – and it runs 2x faster and costs 50% less than before.
OpenAI doesn’t make AI models.
They make magic.
— Mckay Wrigley (@mckaywrigley) May 13, 2024
To get a sense of what I mean about OpenAI making ChatGPT feel more human, check out this video the company posted in which the new GPT-4o model interacts via the camera with a cute dog. If you had your eyes closed, you’d think this was a real person fawning over a cute puppy, when in fact it’s an AI model that’s learned how to express relevant and appropriate emotion, in addition to making the same observations we would when we meet a cute dog for the first time.
“GPT-4o (‘o’ for ‘omni’) is a step towards much more natural human-computer interaction,” OpenAI explains about the update. “It accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation … GPT-4o is especially better at vision and audio understanding compared to existing models.”
That last part really speaks to the magic I was alluding to above. During the event on Monday, for example, ChatGPT read a bedtime story (with plenty of whimsy, emotion, and drama added to the narration). In a conversation, it repeats thoughts back to the user for purposes of clarity, and adds hmmms and pauses, just like a human.
Does the world need a chatbot to read bedtime stories with the same level of emotion that your parents did when you were young? Honestly, no — but having said that, it makes the technology much more approachable, similar to the way Apple adding handles to those early candy-colored Macintosh computers was unnecessary but at the time kind of delightful.
OpenAI’s voice assistant also now has some pretty impressive abilities when it comes to live translation, to the point that this part of the demo actually took my breath away. It was fast and didn’t miss a beat in translating between English and Italian, summarizing what each speaker said for the other in real time.
This is what I mean about the big reveal, as I see it, from Monday’s OpenAI event. The more the company makes its technology feel delightful, the greater the uptake on ChatGPT is going to be. Be honest: When’s the last time a company released a product or software update that delighted you and felt like magic?