by Lucas Mearian

Senior Reporter

Google turbocharges its genAI engine with Gemini 1.5

news analysis

Feb 15, 2024 · 5 mins

Artificial Intelligence, Augmented Reality, Chatbots

Only a week after releasing Gemini 1.0, Google has pushed out for testing its latest multimodal AI model; it offers long-context understanding that can accept more than one million tokens.

Credit: Google

Only a week after releasing its latest generative artificial intelligence (genAI) model, Google on Thursday unveiled that model’s successor, Gemini 1.5. The company boasts that the new version bests the earlier version on almost every front.

Gemini 1.5 is a multimodal AI model now ready for early testing. Compared with OpenAI’s popular ChatGPT, Google said, Gemini 1.5 lets users feed a much larger amount of information into its query engine to get more accurate responses.

(OpenAI also announced a new AI model today: Sora, a text-to-video model that can generate complex video scenes with multiple characters, specific types of motion, and accurate details of the subject and background “while maintaining visual quality and adherence to the user’s prompt.” The model understands not only what the user asked for in the prompt, but also how those things exist in the physical world.)

Image: A movie scene generated by Sora.

Google’s Gemini models are the industry’s only native, multimodal large language models (LLMs); both Gemini 1.0 and Gemini 1.5 can ingest and generate content through text, images, audio, video and code prompts. For example, user prompts in the Gemini model can be in the form of JPEG, WEBP, HEIC or HEIF images.

“Both OpenAI and Gemini recognize the importance of multi-modality and are approaching it in different ways. Let us not forget that Sora is a mere preview/limited availability model and not something that will be generally available in the near-term,” said Arun Chandrasekaran, a Gartner distinguished vice president analyst.

OpenAI’s Sora will compete with start-ups such as text-to-video model maker Runway AI, he said.

Gemini 1.0, first announced in December 2023, was released last week. With that move, Google said it had reconstructed and renamed its Bard chatbot.

Gemini has the flexibility to run on everything from data centers to mobile devices.

Though ChatGPT 4, OpenAI’s latest LLM, is multimodal, it only offers a couple of modalities such as images and text or text to video, according to Chirag Dekate, a Gartner vice president analyst.

“Google is seizing its role as the leader as an AI cloud provider. They’re no longer playing catch up. Others are,” Dekate said. “If you’re a registered user of Google Cloud, today you can access more than 132 models. Its breadth of models is insane.”

“Media and entertainment will be the vertical industry that may be early adopters of models like these, while business functions such as marketing and design within technology companies and enterprises could also be early adopters,” Chandrasekaran said.

Currently, OpenAI is working on its next-generation GPT 5; that model is likely to also be multimodal. Dekate, however, argued that GPT 5 will consist of many smaller models cobbled together, and won’t be natively multimodal. That will likely result in a less-efficient architecture.

The first Gemini 1.5 model Google has offered for early testing is Gemini 1.5 Pro, which the company described as “a mid-size multimodal model optimized for scaling across a wide-range of tasks.” The model performs at a similar level to Gemini 1.0 Ultra, its largest model to date, but requires vastly fewer GPU cycles, the company said.

Gemini 1.5 Pro also introduces an experimental feature in long-context understanding, meaning it allows developers to prompt the engine with up to 1 million context tokens.

Developers can sign up for a Private Preview of Gemini 1.5 Pro in Google AI Studio.

Google AI Studio is the fastest way to build with Gemini models and enables developers to integrate the Gemini API in their applications. It’s available in 38 languages across more than 180 countries and territories.

Image: A comparison between Gemini 1.5 and other AI models in terms of token context windows.

Google’s Gemini model was built from the ground up to be multimodal, and doesn’t consist of multiple parts layered atop one another, as competitors’ models do. Google calls Gemini 1.5 “a mid-size multimodal model” optimized for scaling across a wide range of tasks; while it performs at a similar level to 1.0 Ultra, it does so by applying many smaller models under one architecture for specific tasks.

Google achieves the same performance in a smaller LLM by using an increasingly popular framework known as “Mixture of Experts,” or MoE. Based on two key architecture elements, MoE layers a combination of smaller neural networks together and runs a series of neural-network routers that dynamically drive query outputs.

“Depending on the type of input given, MoE models learn to selectively activate only the most relevant expert pathways in its neural network. This specialization massively enhances the model’s efficiency,” Demis Hassabis, CEO of Google DeepMind, said in a blog post. “Google has been an early adopter and pioneer of the MoE technique for deep learning through research such as Sparsely-Gated MoE, GShard-Transformer, Switch-Transformer, M4 and more.”
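The routing idea Hassabis describes can be illustrated with a toy example. The sketch below is not Google’s implementation; it is a minimal illustration of top-k expert routing in which the experts, the router weights and the `moe_forward` function are all hypothetical stand-ins. A router scores every expert for a given input, only the highest-scoring experts are actually run, and their outputs are blended according to the router’s confidence.

```python
import math
import random

def softmax(scores):
    """Convert raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale):
    """Hypothetical 'expert': a tiny function standing in for a sub-network."""
    return lambda x: [scale * v for v in x]

def moe_forward(x, router_weights, experts, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their outputs."""
    # Router: one linear score per expert.
    scores = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    probs = softmax(scores)
    # Keep only the top_k experts; the others are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(x)
    for i in chosen:
        gate = probs[i] / norm  # renormalized weight for this expert
        expert_out = experts[i](x)
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, chosen

# Illustrative setup: four toy experts and random router weights (assumptions).
random.seed(0)
experts = [make_expert(s) for s in (0.5, 1.0, 2.0, 4.0)]
router_weights = [[random.uniform(-1, 1) for _ in range(3)] for _ in experts]

output, chosen = moe_forward([1.0, 2.0, 3.0], router_weights, experts, top_k=2)
print("experts used:", chosen, "of", len(experts))
```

In this toy version, only two of the four experts do any work for a given input, which is the source of the efficiency gain: the total parameter count can grow while the compute spent per query stays roughly flat.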

The MoE architecture allows a user to input an enormous amount of information but enables that input to be processed with vastly fewer compute cycles in the inference stage. It can then deliver what Dekate called “hyper-accurate responses.”

“Their competitors are struggling to keep up, but their competitors don’t have DeepMind or the GPU [capacity] Google has to deliver results,” Dekate said.

With the new long-context understanding feature, Gemini 1.5 has a context window of up to 1 million tokens, meaning a user can type in a single sentence or upload several books’ worth of information to the chatbot interface and receive back a targeted, accurate response. By comparison, Gemini 1.0 had a 32,000-token context window.

Rival LLMs are typically limited to context windows of about 10,000 tokens, with the exception of GPT 4, which can accept up to 125,000 tokens.

Natively, Gemini 1.5 Pro comes with a standard 128,000 token context window. Google, however, is allowing a limited group of developers and enterprise customers to try it in private preview with a context window of up to 1 million tokens via AI Studio and Vertex AI; it will grow from there, Google said.
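To give a rough sense of what these window sizes mean in practice, the sketch below checks whether a prompt would fit in each of the Gemini context windows cited in this article. It does not call any Google API. The `CHARS_PER_TOKEN` value is an assumed rule of thumb (about four characters per token for English text) rather than a property of Gemini’s tokenizer, and the function names are hypothetical.

```python
# Context window sizes as reported in this article (in tokens).
CONTEXT_WINDOWS = {
    "Gemini 1.0": 32_000,
    "Gemini 1.5 Pro (standard)": 128_000,
    "Gemini 1.5 Pro (private preview)": 1_000_000,
}

# Assumption: a rough heuristic for English text; real tokenizers vary.
CHARS_PER_TOKEN = 4

def estimate_tokens(text):
    """Estimate the token count of a text from its character length."""
    return len(text) // CHARS_PER_TOKEN

def models_that_fit(n_tokens):
    """Return the names of the windows large enough to hold n_tokens."""
    return [name for name, window in CONTEXT_WINDOWS.items() if n_tokens <= window]

# A stand-in for a long document: 100,000 short words (500,000 characters).
prompt = "word " * 100_000
needed = estimate_tokens(prompt)
print(needed, "estimated tokens fit in:", models_that_fit(needed))
```

Under these assumptions, a 500,000-character document comes to an estimated 125,000 tokens, which would overflow Gemini 1.0’s window but fit within both the standard and private-preview windows of Gemini 1.5 Pro.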

“As we roll out the full one-million token context window, we’re actively working on optimizations to improve latency, reduce computational requirements and enhance the user experience,” Hassabis said.

Senior Reporter Lucas Mearian covers AI in the enterprise, Future of Work issues, healthcare IT and FinTech.