
Empowering app developers: Fine-tuning Gemma 3 for mobile with Tunix in Google Colab

Thursday, December 11, 2025

In the rapidly evolving world of AI models for mobile devices, a persistent challenge remains: how to bring state-of-the-art (SOTA) LLMs to smartphones without compromising privacy or requiring app developers to be machine learning engineers.

Today, we are excited to share how Cactus, a startup building a next-generation inference engine for mobile devices, fine-tuned the open-source Gemma 3 model. By leveraging Tunix, the LLM post-training library in the JAX ML ecosystem, Cactus achieved this entirely on Google Colab's free tier.

The Challenge: Making Small Models "Expert"

For app developers, running Large Language Models (LLMs) in the cloud isn't always an option, due to privacy regulations (such as GDPR) and latency requirements. The solution lies in running models locally on the device. However, most smartphones globally lack specialized NPUs (Neural Processing Units), meaning developers need highly efficient, smaller models.

While compact models like Gemma (270M or 1B parameters) are incredibly efficient, they are often "generalists." To be useful for specific mobile applications—such as a medical imaging assistant or a legal document analyzer—they need to be fine-tuned to become domain experts.

The problem? Most app developers are not ML infrastructure experts. Setting up complex training pipelines, managing dependencies, and navigating steep learning curves creates too much friction.

The Solution: SFT via Tunix on Google Colab

To solve this, Cactus created a simplified, low-friction workflow: a Python script that uses Tunix's Supervised Fine-Tuning (SFT) APIs inside a Colab notebook.

1. The Engine: Tunix

Cactus utilized Tunix, Google's lightweight and modular LLM post-training library, which supports both SFT and leading RL algorithms and runs natively on TPUs. Tunix strips away the complexity of heavy frameworks, offering a simplified path to SFT.
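To make the mechanics concrete, the sketch below shows what a single SFT step looks like in plain JAX and Optax: next-token cross-entropy computed only over the response tokens. This is an illustration of the underlying technique, not the Tunix API; the toy model, names, and hyperparameters are all assumptions.

```python
# Minimal sketch of one supervised fine-tuning (SFT) step in plain JAX/Optax.
# Illustrative only -- not the Tunix API. The toy model and shapes are assumptions.
import jax
import jax.numpy as jnp
import optax

VOCAB, DIM = 256, 64  # toy vocabulary and embedding size

def init_params(key):
    k1, k2 = jax.random.split(key)
    return {
        "embed": jax.random.normal(k1, (VOCAB, DIM)) * 0.02,  # token embeddings
        "head": jax.random.normal(k2, (DIM, VOCAB)) * 0.02,   # LM output head
    }

def logits_fn(params, tokens):
    # Stand-in for the real transformer: embed tokens, project to vocab.
    h = params["embed"][tokens]            # [batch, seq, DIM]
    return h @ params["head"]              # [batch, seq, VOCAB]

def sft_loss(params, tokens, loss_mask):
    # Next-token cross-entropy, masked so only response tokens are trained on.
    logits = logits_fn(params, tokens[:, :-1])
    targets = tokens[:, 1:]
    ce = optax.softmax_cross_entropy_with_integer_labels(logits, targets)
    mask = loss_mask[:, 1:]
    return (ce * mask).sum() / mask.sum()

optimizer = optax.adamw(learning_rate=1e-4)

@jax.jit
def train_step(params, opt_state, tokens, loss_mask):
    loss, grads = jax.value_and_grad(sft_loss)(params, tokens, loss_mask)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state, loss

params = init_params(jax.random.PRNGKey(0))
opt_state = optimizer.init(params)
# tokens: prompt + response ids; loss_mask: 1.0 on response tokens only.
tokens = jnp.zeros((2, 16), dtype=jnp.int32)
mask = jnp.concatenate([jnp.zeros((2, 8)), jnp.ones((2, 8))], axis=1)
params, opt_state, loss = train_step(params, opt_state, tokens, mask)
```

Masking the prompt tokens out of the loss is what turns generic next-token training into instruction tuning: the model is only graded on the answers it should learn to produce.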

2. The Access: Google Colab Free Tier

Accessibility was a key requirement. Instead of requiring developers to set up complex cloud billing and project IDs immediately, the workflow operates entirely within a Google Colab notebook. By utilizing Colab's free tier, developers can:

  • Load the Gemma 3 model.
  • Upload their specific dataset (e.g., medical data or customer service logs), formatted as in the sketch after this list.
  • Run an SFT job using Tunix.
  • Export the weights for conversion.
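The most hands-on of these steps is the dataset upload. Here is a rough sketch of what that preparation can look like; the JSONL layout, field names, and chat template are assumptions, not requirements of Tunix or Cactus.

```python
# Sketch of preparing an uploaded dataset for SFT in a Colab notebook.
# The JSONL field names ("prompt", "response") are assumptions.
import json

def load_sft_examples(path):
    """Read a JSONL file of {"prompt": ..., "response": ...} records."""
    examples = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            examples.append((record["prompt"], record["response"]))
    return examples

def to_training_text(prompt, response):
    # Simple Gemma-style turn template; a real notebook would use the
    # model's own chat template and tokenizer.
    return (f"<start_of_turn>user\n{prompt}<end_of_turn>\n"
            f"<start_of_turn>model\n{response}<end_of_turn>")

# Example: a domain-specific record, e.g. from uploaded medical Q&A data.
sample = {"prompt": "What does an elevated CRP level indicate?",
          "response": "C-reactive protein rises with inflammation, so ..."}
print(to_training_text(sample["prompt"], sample["response"]))
```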

3. The Deployment: Cactus

Once tuned, the model is converted into the Cactus graph format. This allows the now-specialized Gemma 3 model to be deployed directly into a Flutter or native mobile app with just a few lines of code, running efficiently on a wide range of smartphone hardware.
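The conversion itself runs through Cactus tooling, but the hand-off from the notebook is simply a file of tuned weights. Below is a minimal, hypothetical sketch of that export step for a JAX parameter pytree; the flat .npz layout is an assumption, and the real Cactus converter defines its own input format.

```python
# Sketch of exporting tuned JAX parameters for an external converter.
# The output layout is an assumption -- the Cactus converter defines its own format.
import jax
import numpy as np

def export_params(params, path):
    """Flatten a JAX parameter pytree and save it as a single .npz file."""
    flat, _ = jax.tree_util.tree_flatten_with_path(params)
    arrays = {jax.tree_util.keystr(k): np.asarray(v) for k, v in flat}
    np.savez(path, **arrays)

# export_params(tuned_params, "gemma3_tuned.npz")
# The .npz file is then fed to the (separate) Cactus conversion tooling.
```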

Why This Matters

"Our users are app developers, not ML engineers," explains Henry Ndubuaku, co-founder of Cactus. "They want to pick a model, upload data, and click 'tune.' By using Tunix and Colab, we can give them a 'clone-and-run' experience that removes the intimidation factor from fine-tuning."

This workflow represents the "lowest hanging fruit" in democratizing AI:

  • No complex local environment setup.
  • No upfront infrastructure costs.
  • A high-performance, JAX-native library (Tunix) for tuning a leading OSS model (Gemma).

What's Next?

While the Colab notebook provides an immediate, accessible solution, Cactus is exploring a full GUI-based portal for fine-tuning and quantizing LLMs, with Google Cloud TPUs as the backend compute. This would allow scalable training of larger models and even more seamless integration into the mobile development lifecycle.

Get Started

Ready to turn your mobile app into an AI powerhouse? Check out the Tunix SFT Notebook for Cactus and start fine-tuning Gemma 3 for your device today.

You can also explore the Tunix sample scripts, documentation, and repository on GitHub.