Google Just Released Gemma 4: A Free, Open-Source AI Model Anyone Can Download and Use

Google released Gemma 4 on April 3, 2026. For the first time, Gemma is licensed under Apache 2.0, and it is free to download and run on any Android device, laptop GPU, or cloud infrastructure: no subscription, no data sent to Google, and no royalties for commercial use.

Let’s take a closer look at what Gemma 4 is, how it differs from Gemini, what the Apache 2.0 license actually means for developers, which of its 4 model sizes fits which use case, why this matters for privacy and enterprise use, and where and how you can get Gemma 4.

What is Gemma 4?

Gemma 4 is a free, open-source AI model built by Google that runs directly on your device. No internet required, no subscription, and no data shared with anyone. Gemma 4 is capable of understanding text, images, video, and audio to answer questions, write code, and complete complex tasks autonomously.

Gemma vs. Gemini: 

Both are built from the same Google DeepMind research, but everything else is different.

| | Gemma 4 | Gemini |
|---|---|---|
| Access | Download and run locally | Google's servers only |
| Internet Required | No | Yes |
| Cost | Free | Free tier + paid subscription |
| Data Sharing | None: stays on your device | Passes through Google's servers |
| Modify or Redistribute | Yes: Apache 2.0 | No |
| Built Into Google Products | No | Yes |
| Ownership | Yours once downloaded | Google's |

Most people know that Gemini is Google’s AI assistant built into Search, Gmail, Google Docs, and Android. Gemini is a proprietary product. You access it through Google’s services, your conversations pass through Google’s servers, and advanced features require a paid subscription.

Here’s how Gemma is different in 3 fundamental ways:

  • First, it runs locally: on your device, offline, without an internet connection. 
  • Second, nothing you do with it is shared with Google: no chats, no uploaded files, no generated outputs. 
  • Third, it costs nothing to use, modify, or build products with. 

Both Gemma and Gemini are built from the same underlying AI research at Google DeepMind, but one is a subscription service, and the other is a free tool you own entirely once you download it.

What Apache 2.0 Actually Means and Why It Matters

Apache 2.0 is a permissive free-software license created and maintained by the Apache Software Foundation (ASF). It lets anyone use, modify, and sell a product commercially; the main obligations are crediting the original creator and retaining the license and notice files when redistributing.

Previous versions of Gemma were technically "open", meaning you could download the model, but they shipped under a custom Google license that restricted what you could do with them commercially. Developers criticised this as too limiting for real-world use.

Gemma 4 changes this entirely. Apache 2.0 is one of the most permissive open-source licenses in existence. Under Apache 2.0, any developer can:

  • Download Gemma 4 for free
  • Modify it however they want
  • Build a commercial product with it
  • Sell that product
  • Redistribute the model
  • All without paying Google a single dollar in royalties

The main requirement is attribution: you must credit Google as the source and retain the license notices when redistributing. For developers who want to build AI-powered applications without subscription costs or data-sharing obligations, this is the most significant policy change in Gemma's history.

What Gemma 4 Can Actually Do

Gemma 4 is not just a text chatbot. It processes text, images at variable resolutions, video, and audio, making it a fully multimodal model capable of tasks that go well beyond answering questions.

Its 6 core capabilities are:

  1. Advanced reasoning: Multi-step logic, mathematical problem solving, and deep analysis across all model sizes
  2. Offline code generation: Writing, completing, and debugging code without requiring a cloud connection
  3. Agentic workflows: Autonomous multi-step task planning with function calling and structured output, enabling AI agents that take actions rather than just answering questions
  4. Vision and audio processing: Reading charts, interpreting images, transcribing speech, and analysing video natively on the smaller E2B and E4B models
  5. Extended context: Processing up to 256,000 tokens in a single session on the larger models, equivalent to approximately 200,000 words of input
  6. Multilingual fluency: Trained on more than 140 languages, covering the vast majority of global language needs
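Capability 3, function calling with structured output, typically means the model emits a JSON tool call that your code executes before feeding the result back into the conversation. Here is a minimal sketch of that dispatch step, with the model's output hard-coded as a stand-in; the JSON shape and the `get_weather` tool are illustrative assumptions, not a documented Gemma 4 API:

```python
import json

# A stub tool the "agent" is allowed to call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # placeholder; a real tool would hit an API

TOOLS = {"get_weather": get_weather}

# In a real agentic loop this JSON would be the model's structured output;
# it is hard-coded here to show only the dispatch step (assumption).
model_output = '{"tool": "get_weather", "arguments": {"city": "Berlin"}}'

call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["arguments"])
# `result` would then be appended to the chat and sent back to the model
```

The point of the structured format is that dispatch is a mechanical lookup: no parsing of free-form prose is needed to decide which action to take.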

4 Model Sizes: Which One Is Right for Which Task

Gemma 4 ships in 4 sizes, each designed for a different hardware environment and use case. Choosing the right size depends on what device you are running it on and what you need it to do.

| Model | Parameters | Best For | Memory Needed (16-bit) | Memory Needed (4-bit) |
|---|---|---|---|---|
| Gemma 4 E2B | Effective 2B | Mobile, browser, and offline apps | 9.6 GB | 3.2 GB |
| Gemma 4 E4B | Effective 4B | Edge devices, Pixel phones, Chrome | 15 GB | 5 GB |
| Gemma 4 31B | Dense 31B | Complex enterprise tasks, local servers | 58.3 GB | 17.4 GB |
| Gemma 4 26B A4B | MoE 26B | High-speed reasoning, cloud inference | 48 GB | 15.6 GB |

A note on the 31B model and quantisation: 

At full 16-bit precision, the 31B model requires 58.3 GB of GPU memory, far beyond consumer hardware. At 4-bit quantisation, it drops to 17.4 GB, within reach of a high-end consumer GPU like an RTX 4090. Quantisation reduces memory requirements by compressing the model's numerical precision, with a small trade-off in output quality. For most developer use cases, 4-bit quantisation on the 31B delivers server-grade performance on accessible hardware.
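The quantisation arithmetic is easy to sanity-check: weight memory is roughly parameters × bits ÷ 8. A minimal estimator follows; note the published figures in the table above include runtime overhead (KV cache, activations) that this back-of-envelope formula ignores, which is an assumption on my part about the gap:

```python
# Rough GPU-memory estimate for model weights alone:
# parameters * bits / 8 bytes, reported in GiB.
def weight_memory_gib(params: float, bits: int) -> float:
    return params * bits / 8 / 2**30

full = weight_memory_gib(31e9, 16)   # ~57.7 GiB at 16-bit precision
quant = weight_memory_gib(31e9, 4)   # ~14.4 GiB at 4-bit quantisation
```

Halving the bit width halves the memory, which is why dropping from 16-bit to 4-bit cuts the footprint to roughly a quarter.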

A note on the 26B MoE model: 

Despite having 26 billion parameters, this model only activates 4 billion of them for each response, dramatically reducing the computational cost of each generation. The full 26B must be loaded into memory for fast routing, but active inference runs at 4B speed. This makes it the most efficient option for high-volume applications where speed and cost matter.
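The routing idea can be shown in a few lines: every expert's weights sit in memory, but only the highest-scoring expert runs for a given token. A toy top-1 router in Python; the sizes and the argmax routing are illustrative assumptions, not Gemma 4's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts = 8, 4

# Each "expert" is just a weight matrix here. All experts are resident
# in memory, mirroring how the full 26B must be loaded for routing.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    scores = x @ router                  # one router logit per expert
    chosen = int(np.argmax(scores))      # top-1 routing: pick one expert
    return experts[chosen] @ x, chosen   # only that expert's weights run

token = rng.standard_normal(d_model)
out, expert_id = moe_forward(token)
```

Compute per token scales with the one chosen expert, not with the total parameter count, which is why the 26B A4B model can generate at roughly 4B speed.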

Why This Matters for Privacy and Enterprise Use

Because Gemma 4 runs entirely on local or private infrastructure, it is the first Google AI model genuinely suitable for organisations with strict data privacy requirements. Under GDPR in Europe, HIPAA in healthcare, or national data sovereignty laws in any jurisdiction, sending user data to a third-party cloud AI service creates compliance obligations. Gemma 4 eliminates this. The model runs inside your own environment, your data never leaves, and Google has no visibility into its use.

Google has made Gemma 4 available across its Sovereign Cloud offerings, including air-gapped and on-premises deployments for government and defence use cases. For enterprise buyers, this is the clearest path to deploying frontier AI capability while maintaining complete data control.

Where and How to Get Gemma 4

Gemma 4 is available to download today from 4 platforms at no cost: Kaggle, Hugging Face, Ollama, and Google AI Studio. Developers building on Google Cloud can deploy it through Vertex AI, Cloud Run with NVIDIA RTX PRO 6000 GPUs, or Google Kubernetes Engine with vLLM for high-throughput production workloads. The 26B MoE model will be available as fully managed and serverless on Vertex AI’s Model Garden within the coming days.

How Gemma 4 Compares to the Competition

The open-source AI model market includes Meta Llama 4, Mistral, Qwen, and DeepSeek, all capable models competing for developer adoption. Gemma 4 differentiates itself across 4 specific dimensions.

| | Gemma 4 | Meta Llama 4 | Mistral | Qwen | DeepSeek |
|---|---|---|---|---|---|
| License | Apache 2.0 | Custom (restrictive) | Apache 2.0 | Apache 2.0 | MIT |
| Context Window | 256K | 128K | 128K | 128K | 128K |
| Native Video + Audio | Yes | No | No | No | No |
| Google Cloud Integration | Yes | No | No | No | No |
| Runs Locally | Yes | Yes | Yes | Yes | Yes |
| Free Commercial Use | Yes | Limited | Yes | Yes | Yes |

Final Takeaway

Gemma 4 is Google’s clearest statement yet that open AI is a serious strategic priority. Not a side project. Apache 2.0 licensing removes the last practical barrier to commercial adoption. Local execution removes the privacy barrier for regulated industries. 

4 model sizes remove the hardware barrier for developers at every scale. Anyone can download it today, build with it tomorrow, and ship a product with it next week. All without paying Google anything or sharing a single byte of user data. That is what genuinely open AI looks like.

From AI models and developer tools to the technology decisions shaping how software gets built, our newsletter covers every release worth knowing about. Subscribe and stay ahead.

Join the IT Horizon Community

Stay connected with a community of curious minds following the ideas, breakthroughs, and disruptions shaping our digital future. Join the conversation.
