Why diffusion for text? While the AI research community has explored diffusion-based text generation for years, applying it to large models has remained a challenge. DiffusionGemma changes this by shifting how models use hardware. The trade-off with traditional models: Most language models act like a typewriter, generating one token at a time from left to…
For centuries, the scientific method has been the greatest engine of human progress. At Google, our mission is deeply rooted in building tools to accelerate it. We believe that a new era of discovery won’t come from narrow, specialized models, but general agents that empower researchers across every scientific field. That’s why we are introducing…
Last year, Nano Banana brought Gemini's intelligence to image generation and editing. Since then, it’s helped millions of people restore old photos, design from sketches and visualize ideas in ways that weren’t possible before. From the start, we built Gemini to be natively multimodal, and now we’re taking the next step.…
Street View: ground your worlds in real places. When creating imaginative worlds in Project Genie, you can now also base them on real places. Just tap the Maps pin to choose a place in the U.S. and optionally select a style for your world, like “Desert Sands” or “Stone Age.” Then, describe your character —…
The Asia-Pacific region is a global engine for economic growth, but it's also highly vulnerable to climate change. While green technologies are gaining momentum, a recent report shows they aren’t scaling fast enough to keep up with the region’s rising environmental risks. To help innovators tackle these environmental challenges, we’re launching an inaugural Google DeepMind…
Today, we’re introducing Gemini 3.1 Flash TTS, the latest text-to-speech model that delivers improved controllability, expressivity and quality, empowering developers, enterprises and everyday users to build the next generation of AI-speech applications. Starting today, 3.1 Flash TTS is rolling out. Improved speech quality and controllability: We’ve improved the overall speech quality of Gemini 3.1…
In a breakthrough powered by AlphaFold, scientists have mapped the structure of the large protein that gives “bad cholesterol” its form – a discovery that could help transform how researchers and clinicians treat the world’s leading cause of death. The race to reveal a key protein behind heart disease has long been both an important…
Introducing D4RT, a unified AI model for 4D scene reconstruction and tracking across space and time. Anytime we look at the world, we perform an extraordinary feat of memory and prediction. We see and understand things as they are at a given moment in time, as they were a moment ago, and how they are…
Today, Veo is getting more expressive, with improvements that help you create more fun, creative, high-quality videos based on ingredient images, built directly for the mobile format. We’re excited to bring new creative possibilities for everyone from casual storytellers to professional filmmakers. We’re releasing: Improvements to Veo 3.1 Ingredients to Video, our capability that lets…
Large language models (LLMs) are increasingly becoming a primary source for information delivery across diverse use cases, so it’s important that their responses are factually accurate. In order to continue improving their performance on this industry-wide challenge, we have to better understand the types of use cases where models struggle to provide an accurate response…