Watching Anthropic’s Claude 3.5 Sonnet code a video game blew my mind

There’s so much progress in artificial intelligence right now that it feels like, with every new model, some new feature or capability has gone from seemingly impossible to completely possible.

That’s what this latest release feels like. Today, in a blog post, Anthropic announced Claude 3.5 Sonnet, its latest large language model (LLM) that the company says “raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations, with the speed and cost of our mid-tier model, Claude 3 Sonnet.”

The company says that Claude 3.5 Sonnet is faster, cheaper, and smarter than its predecessors, a combination it calls “ideal for complex tasks such as context-sensitive customer support and orchestrating multi-step workflows.”

Anthropic also highlights the model’s coding ability in its announcement: “In an internal agentic coding evaluation, Claude 3.5 Sonnet solved 64% of problems, outperforming Claude 3 Opus which solved 38%. Our evaluation tests the model’s ability to fix a bug or add functionality to an open source codebase, given a natural language description of the desired improvement. When instructed and provided with the relevant tools, Claude 3.5 Sonnet can independently write, edit, and execute code with sophisticated reasoning and troubleshooting capabilities. It handles code translations with ease, making it particularly effective for updating legacy applications and migrating codebases.”

What can you do with Claude 3.5 Sonnet?

Of course, what people like me really want to know is what we can do with it. The first demo the company showed off was the model’s ability to help a writer brainstorm a fictional story’s characters, plot points, twists, and resolution. “It shows marked improvement in grasping nuance, humor, and complex instructions, all while writing with a natural tone,” says Anthropic.

In the second demo, the company showed off an example where a user had the assistant break down images of graphs and charts and manipulate the data into a presentation. Anthropic says that, for the new model, “Improvements are most noticeable in tasks requiring visual reasoning, like interpreting charts, graphs, or transcribing text from imperfect images.”

What’s even more impressive is that you can now see what you’re creating as you go, thanks to the company’s new Artifacts feature. The company says, “When a user asks Claude to generate content like code snippets, text documents, or website designs, these Artifacts appear in a dedicated window alongside their conversation. This creates a dynamic workspace where they can see, edit, and build upon Claude’s creations in real-time, seamlessly integrating AI-generated content into their projects and workflows.”

Anthropic describes this preview feature as marking Claude’s evolution from a conversational AI to a collaborative work environment, and as just the beginning of a broader vision for Claude.ai, which the company says will soon expand to support team collaboration. In the near future, according to Anthropic, teams and eventually entire organizations will be able to securely centralize their knowledge, documents, and ongoing work in one shared space, with Claude serving as an on-demand teammate.

The video game demo blew my mind

The demo that really made me do a double-take, however, was the one where someone used the model and the Artifacts feature to create an actual video game. The user simply told Claude to create an 8-bit character and some additional elements, and then combine all of that into a simple game. After only a few prompts, a platformer video game came out on the other side.

I tried my own hand at creating a video game with Claude, and sure enough, it spat out a ton of Python code for a basic platformer and even worked with me to add new game dynamics, characters, and more. It even rewrote the code in Swift, so I could try running it in Xcode and potentially bring it to the iPhone, iPad, Mac, and Apple TV.

This is an unbelievable tool for people like me who have ideas but no idea how to make them a reality through coding. Of course, it’s not going to do everything, but being able to play around with this in a conversational way unlocks things I wouldn’t have dreamed of just a year ago. I’m definitely going to be playing with this even more.

Claude 3.5 Sonnet is now available for free on Claude.ai and the Claude iOS app. Claude Pro and Team plan subscribers can also access it, with significantly higher rate limits than free users.
