AI News in May 2024

May 17, 2024

Juan Carlos Quintero, Founder

Big AI week. Probably the biggest AI week of the year, but that's a big statement, so don't hold me to it. AI never sleeps :)

Let's take a quick look at the main AI news of May so far.

OpenAI released GPT-4o

GPT-4o is OpenAI's newest flagship model. It can reason across audio, vision, and text in real time.

The “o” stands for “omni”: it's the first OpenAI model to combine text, audio, and vision natively in a single model.

GPT-4o was built from the ground up to accept any combination of text, audio, image, and video as input and to generate any combination of text, audio, and image as output.

It matches GPT-4 Turbo performance on English text and code, with significant improvement on text in non-English languages, while also being 2x faster and 50% cheaper in the API.

Regarding the multimodal capabilities, the most remarkable highlight of the release was the following note from OpenAI:

Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.

GPT-4o is poised to become a game-changer.

Google I/O 2024

Google announced many new products and features at the Google I/O 2024 conference.

Google I/O is Google's main developer conference, where the company makes its major announcements of the year.

This year it was incredible. The number of releases and new products was outstanding.

There were so many announcements during the event that it is impossible to mention them all.

A new in-depth blog post is coming, but for now, let's take a look at the top ones:

Ask Photos: send a query inside the Photos app, and the AI will look through your gallery and create a results page.
Search with Video: record a video and send it to Gemini to ask questions about it. It will analyze the footage in real time and can provide helpful insights about it.
Imagen 3: the new best-in-class image generation model from Google.
Veo: a new video generation model that will compete with Sora. It can create 1080p videos with high accuracy.
Project Astra: a new universal AI agent with vision capabilities that will help you make sense of the environment around you in real time.
Gems: create custom AI assistants in Gemini for any task or persona.
Agents: AI agents that can execute actions, update websites, etc. (not released yet).

Anthropic released Claude in Europe

Anthropic has launched its Claude chatbot and subscription plans across the European Union.

Anthropic is backed by Amazon and valued at USD 18.4B, making it one of OpenAI's biggest competitors.

The release follows the European launch of the Claude API earlier this year, which allows developers to integrate Anthropic’s AI models into their own applications, websites, or services.

Claude 3 is Anthropic's flagship model.

It comes in three different versions:

Haiku

The fastest, most compact model for near-instant responsiveness.

It's meant to provide seamless AI experiences that mimic human interactions.

Potential use cases include customer interactions and cost-saving tasks such as inventory management and extracting knowledge from unstructured data.

Sonnet

It aims to strike a balance between intelligence and speed.

Possible use cases include data processing, such as RAG or search over vast amounts of knowledge, and time-saving tasks like code generation, quality control, and parsing text from images.

Opus

The most intelligent model, with best-in-market performance on highly complex tasks.

It's best suited for task automation, R&D, and strategy work such as advanced analysis of charts and graphs, financials and market trends, and financial forecasting.

Claude is available at claude.ai

Voiceflow released Workflows

Voiceflow is the platform we use to build our agents.

This is a big release.

It changes the way you work with your agents, and for the better.

The focus is now on the knowledge and data your agent is built on.

The canvas remains, but it is now one part of the different flows the agent supports rather than the main thing, which in retrospect makes more sense.

Two key features of Voiceflow are flexibility and collaboration.

Workflows takes these two to the next level.

Really exciting release.
