OpenAI’s big ChatGPT event is over, and I can safely say the company severely downplayed it when it said on Twitter that it would “demo some ChatGPT and GPT-4 updates.” Sam Altman’s teaser that it would be new stuff “we think people will love,” and his remark that it “feels like magic to me,” better describe what OpenAI managed to pull off with the GPT-4o update for ChatGPT.
As rumored, GPT-4o is a faster, multimodal update that can handle voice, images, and live video. It’ll also let you interrupt it while it’s speaking, and it can detect the tone of the user’s voice.
The key detail in OpenAI’s tweet was correct, however: this would be a live demo of ChatGPT’s new powers. And that’s really the big deal here. GPT-4o appears to be able to do what Google had to fake with Gemini in early December, when it tried to show off similar features.
Google staged the early Gemini demos to make it seem that Gemini could listen to human voices in real time while also analyzing the contents of pictures or live video. That was mind-blowing tech that Google was proposing. However, in the days that followed, we learned that Gemini could not do any of that. The demos were sped up for the sake of presenting the results, and prompts were typed rather than spoken.
Yes, Gemini was successful at delivering the expected results. There’s no question about that. But the demo Google ultimately showed us was fake. That was a problem in my book, considering that one of the main issues with generative AI products is the risk of incorrect answers and hallucinations.
Fast-forward to mid-May, and OpenAI has the technology ready to offer the kind of interaction with AI that Google faked. We just saw it demonstrated live on stage. ChatGPT, powered by the new GPT-4o model, was able to interact with various speakers simultaneously and adapt to their voice prompts in real time.
GPT-4o was able to look at images and live video to offer answers to questions based on what it had just seen. It helped with math problems and coding. It then translated a conversation between two people speaking different languages in real time.
Yes, these features were probably rehearsed and optimized over and over before the event. But OpenAI also took prompts from X for GPT-4o to try during the event.
Plus, I do expect issues with GPT-4o once it rolls out to users. Nothing is perfect. It might have problems handling voice, picture, and video requests. It might not be as fast as it was in the live demos from OpenAI’s event. But things will get better. The point is that OpenAI feels confident enough in the technology to demo it live.
I have no doubt that Gemini 1.5 (or later versions) will manage to match GPT-4o. And I think Google’s I/O event on Tuesday might even feature demos similar to OpenAI’s. Also, I don’t think GPT-4 was ready back in December to offer the features that OpenAI just demoed today.
However, this episode highlights a big difference between the two companies. OpenAI went ahead with a live demo once it had the technology ready. Google, meanwhile, had to fake a presentation to make Gemini seem more powerful than it was.
If you missed the ChatGPT Spring Update event, you can rewatch it below. More GPT-4o demos are available at this link.