Many AI companies are racing towards bringing AI agents to the market. They are also introducing new models all the time. Anthropic today announced an upgraded Claude 3.5 Sonnet as well as Claude 3.5 Haiku and computer use, which enables developers to use computers like people do.
Introducing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. We’re also introducing a new capability in beta: computer use.
Developers can now direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking, and typing text. pic.twitter.com/ZlywNPVIJP
— Anthropic (@AnthropicAI) October 22, 2024
As you can see in the benchmarks, the new Sonnet model beats GPT-4o, Gemini 1.5 Pro, and other comparable models. It only loses to Gemini 1.5 Pro in math problem solving. In every other area, this model has surpassed the others. As you may have noticed, the o1 model not included. A lot of people who use AI for coding are going to like this new model:
Early customer feedback suggests the upgraded Claude 3.5 Sonnet represents a significant leap for AI-powered coding. GitLab, which tested the model for DevSecOps tasks, found it delivered stronger reasoning (up to 10% across use cases) with no added latency, making it an ideal choice to power multi-step software development processes.
As we can see in SWE-bench’s leaderboard, Claude 3.5 Sonnet scored 49% without complex scaffolding.
Best of all, you can now use Clause to navigate computers. It will be able to use tools and software programs that human use. Developers can use this AI to build and test software. As the company explains:
Developers can integrate this API to enable Claude to translate instructions (e.g., “use data from my computer and online to fill out this form”) into computer commands (e.g. check a spreadsheet; move the cursor to open a web browser; navigate to the relevant web pages; fill out a form with the data from those pages; and so on).
This is not perfect right now, so it will be improved over the next few months. Spam, misinformation, or fraud will be handled with new classifiers.
[HT]