Google released Gemma 4 on April 3, 2026. It is the first Gemma licensed under Apache 2.0, and it is free to download on any Android device, laptop GPU, or cloud infrastructure without a subscription, without sending data to Google, and without paying royalties for commercial use.
Let’s take a closer look at what Gemma 4 is, how it differs from Gemini, what the Apache 2.0 license actually means for developers, which of its 4 model sizes fits which use case, why this matters for privacy and enterprise use, and where and how you can get Gemma 4.
What is Gemma 4?
Gemma 4 is a free, open-source AI model built by Google that runs directly on your device. No internet required, no subscription, and no data shared with anyone. Gemma 4 is capable of understanding text, images, video, and audio to answer questions, write code, and complete complex tasks autonomously.
Gemma vs. Gemini:
Both are built from the same Google DeepMind research, but everything else is different.
| | Gemma 4 | Gemini |
|---|---|---|
| Access | Download and run locally | Google’s servers only |
| Internet Required | No | Yes |
| Cost | Free | Free tier + paid subscription |
| Data Sharing | None: stays on your device | Passes through Google’s servers |
| Modify or Redistribute | Yes: Apache 2.0 | No |
| Built Into Google Products | No | Yes |
| Ownership | Yours once downloaded | Google’s |
Most people know that Gemini is Google’s AI assistant built into Search, Gmail, Google Docs, and Android. Gemini is a proprietary product. You access it through Google’s services, your conversations pass through Google’s servers, and advanced features require a paid subscription.
Here’s how Gemma is different in 3 fundamental ways:
- First, it runs locally: on your device, offline, without an internet connection.
- Second, nothing you do with it is shared with Google: no chats, no uploaded files, no generated outputs.
- Third, it costs nothing to use, modify, or build products with.
Both Gemma and Gemini are built from the same underlying AI research at Google DeepMind, but one is a subscription service, and the other is a free tool you own entirely once you download it.
What Apache 2.0 Actually Means and Why It Matters
Apache 2.0 is a permissive free software license created and maintained by the Apache Software Foundation (ASF). It lets anyone use, modify, and commercially sell products built on the licensed work; the main obligations are keeping the license text and attribution notices intact, and it also grants users an explicit patent license from contributors.
Previous versions of Gemma were technically “open,” meaning you could download the model, but they used a custom Google license that restricted what you could do with it commercially. Developers criticised this as too limiting for real-world use.
Gemma 4 changes this entirely. Apache 2.0 is one of the most permissive open-source licenses in existence. Under Apache 2.0, any developer can:
- Download Gemma 4 for free
- Modify it however they want
- Build a commercial product with it
- Sell that product
- Redistribute the model
All of this without paying Google a single dollar in royalties. The main requirement is attribution: you must keep the license and notice files intact and credit Google as the source. For developers who want to build AI-powered applications without subscription costs or data-sharing obligations, this is the most significant policy change in Gemma’s history.
What Gemma 4 Can Actually Do
Gemma 4 is not just a text chatbot. It processes text, images at variable resolutions, video, and audio, making it a fully multimodal model capable of tasks that go well beyond answering questions.
Its 6 core capabilities are:
- Advanced reasoning: Multi-step logic, mathematical problem solving, and deep analysis across all model sizes
- Offline code generation: Writing, completing, and debugging code without requiring a cloud connection
- Agentic workflows: Autonomous multi-step task planning with function calling and structured output, enabling AI agents that take actions rather than just answering questions
- Vision and audio processing: Reading charts, interpreting images, transcribing speech, and analysing video natively on the smaller E2B and E4B models
- Extended context: Processing up to 256,000 tokens in a single session on the larger models, equivalent to approximately 200,000 words of input
- Multilingual fluency: Trained on more than 140 languages, covering the vast majority of global language needs
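The agentic pattern described above (the model emits a structured tool call, the host executes it, and the result feeds the next step) can be sketched in pure Python. The model call here is a stub standing in for any local Gemma 4 runtime, and the JSON tool-call format is an illustrative assumption, not Gemma’s actual schema:

```python
import json

# Registry of tools the agent is allowed to call.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda text: text.upper(),
}

def fake_model(prompt: str) -> str:
    """Stand-in for a local model: returns a JSON tool call.
    A real runtime would produce this via constrained decoding."""
    if "sum" in prompt:
        return json.dumps({"tool": "add", "args": {"a": 2, "b": 3}})
    return json.dumps({"tool": "upper", "args": {"text": prompt}})

def run_agent(prompt: str):
    """One agent step: parse the model's structured output and
    dispatch to the named tool with its arguments."""
    call = json.loads(fake_model(prompt))
    tool = TOOLS[call["tool"]]
    return tool(**call["args"])

print(run_agent("What is the sum of 2 and 3?"))  # 5
print(run_agent("hello"))                        # HELLO
```

The point of the pattern is that the model only decides *which* tool to call with *which* arguments; the host code stays in control of what actually executes.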
4 Model Sizes: Which One Is Right for Which Task
Gemma 4 ships in 4 sizes, each designed for a different hardware environment and use case. Choosing the right size depends on what device you are running it on and what you need it to do.
| Model | Parameters | Best For | Memory Needed (16-bit) | Memory Needed (4-bit) |
|---|---|---|---|---|
| Gemma 4 E2B | Effective 2B | Mobile, browser, and offline apps | 9.6 GB | 3.2 GB |
| Gemma 4 E4B | Effective 4B | Edge devices, Pixel phones, Chrome | 15 GB | 5 GB |
| Gemma 4 31B | Dense 31B | Complex enterprise tasks, local servers | 58.3 GB | 17.4 GB |
| Gemma 4 26B A4B | MoE 26B | High-speed reasoning, cloud inference | 48 GB | 15.6 GB |
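The size decision mostly reduces to "what is the largest model that fits in my memory budget?" A minimal helper, using the 4-bit figures from the table above (the thresholds are from this article, not an official sizing tool):

```python
# 4-bit memory requirements in GB, taken from the table above.
MODELS_4BIT = {
    "E2B": 3.2,
    "E4B": 5.0,
    "26B A4B": 15.6,
    "31B": 17.4,
}

def largest_fitting_model(available_gb: float):
    """Return the largest model whose 4-bit footprint fits the
    given memory budget, or None if nothing fits."""
    fitting = [(mem, name) for name, mem in MODELS_4BIT.items()
               if mem <= available_gb]
    return max(fitting)[0:2][1] if fitting else None

print(largest_fitting_model(8))   # E4B
print(largest_fitting_model(24))  # 31B
print(largest_fitting_model(2))   # None
```

In practice you would also leave headroom for the KV cache and activations, so treat these thresholds as lower bounds rather than exact fits.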
A note on the 31B model and quantisation:
At full 16-bit precision, the 31B model requires 58.3GB of GPU memory, beyond consumer hardware. At 4-bit quantisation, it drops to 17.4GB, within reach of a high-end consumer GPU like an RTX 4090. Quantisation reduces memory requirements by compressing the model’s numerical precision, with a small trade-off in output quality. For most developer use cases, 4-bit quantisation on the 31B delivers server-grade performance on accessible hardware.
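The rule of thumb behind those numbers is simple: raw weight storage is parameters times bits per parameter. A quick sketch (the published table figures differ slightly from these raw values because real deployments add runtime overhead such as activations and the KV cache):

```python
def raw_weight_gb(params_billion: float, bits: int) -> float:
    """Raw weight storage in GB: parameter count times bits per
    parameter, converted from bits to gigabytes."""
    return params_billion * 1e9 * bits / 8 / 1e9

# The 31B model as a back-of-the-envelope estimate:
print(raw_weight_gb(31, 16))  # 62.0 GB at full precision
print(raw_weight_gb(31, 4))   # 15.5 GB at 4-bit
```

Quantising from 16-bit to 4-bit cuts the weight footprint by 4x, which is exactly what moves the 31B model from datacenter hardware into consumer-GPU range.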
A note on the 26B MoE model:
Despite having 26 billion parameters, this model only activates 4 billion of them for each response, dramatically reducing the computational cost of each generation. The full 26B must be loaded into memory for fast routing, but active inference runs at 4B speed. This makes it the most efficient option for high-volume applications where speed and cost matter.
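The mixture-of-experts idea can be sketched in a few lines: a router scores every expert, but only the top-k highest-scoring experts actually run, so compute scales with k rather than with the total expert count. This toy version is purely illustrative and does not reflect Gemma’s actual architecture:

```python
# Toy mixture-of-experts routing: every expert is scored,
# but only the top-k experts execute.
EXPERTS = {
    "math":  lambda x: x * 2,
    "text":  lambda x: x + 1,
    "code":  lambda x: x - 1,
    "logic": lambda x: x * x,
}

def moe_forward(x: float, scores: dict, k: int = 2) -> float:
    """Run only the k highest-scoring experts and blend their
    outputs, weighted by their normalised router scores."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    total = sum(scores[name] for name in top)
    return sum(scores[name] / total * EXPERTS[name](x) for name in top)

# Router strongly prefers "math" and "logic"; the other two
# experts stay in memory but cost no compute for this input.
out = moe_forward(3.0, {"math": 0.6, "logic": 0.3, "text": 0.05, "code": 0.05})
print(out)  # weighted blend of the two active experts
```

That is the trade-off the article describes: all experts must be resident in memory so the router can choose among them, but each token only pays for the active subset.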
Why This Matters for Privacy and Enterprise Use
Because Gemma 4 runs entirely on local or private infrastructure, it is the first Google AI model genuinely suitable for organisations with strict data privacy requirements. Under GDPR in Europe, HIPAA in healthcare, or national data sovereignty laws in any jurisdiction, sending user data to a third-party cloud AI service creates compliance obligations. Gemma 4 eliminates this. The model runs inside your own environment, your data never leaves, and Google has no visibility into its use.
Google has made Gemma 4 available across its Sovereign Cloud offerings, including air-gapped and on-premises deployments for government and defence use cases. For enterprise buyers, this is the clearest path to deploying frontier AI capability while maintaining complete data control.
Where and How to Get Gemma 4
Gemma 4 is available to download today from 4 platforms at no cost: Kaggle, Hugging Face, Ollama, and Google AI Studio. Developers building on Google Cloud can deploy it through Vertex AI, Cloud Run with NVIDIA RTX PRO 6000 GPUs, or Google Kubernetes Engine with vLLM for high-throughput production workloads. The 26B MoE model will be available as fully managed and serverless on Vertex AI’s Model Garden within the coming days.
How Gemma 4 Compares to the Competition
The open-source AI model market includes Meta Llama 4, Mistral, Qwen, and DeepSeek, all capable models competing for developer adoption. Gemma 4 differentiates itself along 4 specific dimensions.
| | Gemma 4 | Meta Llama 4 | Mistral | Qwen | DeepSeek |
|---|---|---|---|---|---|
| License | Apache 2.0 | Custom (restrictive) | Apache 2.0 | Apache 2.0 | MIT |
| Context Window | 256K | 128K | 128K | 128K | 128K |
| Native Video + Audio | Yes | No | No | No | No |
| Google Cloud Integration | Yes | No | No | No | No |
| Runs Locally | Yes | Yes | Yes | Yes | Yes |
| Free Commercial Use | Yes | Limited | Yes | Yes | Yes |
Final Takeaway
Gemma 4 is Google’s clearest statement yet that open AI is a serious strategic priority. Not a side project. Apache 2.0 licensing removes the last practical barrier to commercial adoption. Local execution removes the privacy barrier for regulated industries.
4 model sizes remove the hardware barrier for developers at every scale. Anyone can download it today, build with it tomorrow, and ship a product with it next week. All without paying Google anything or sharing a single byte of user data. That is what genuinely open AI looks like.
From AI models and developer tools to the technology decisions shaping how software gets built, our newsletter covers every release worth knowing about. Subscribe and stay ahead.