opensource.google.com

Jaspr: Why web development in Dart might just be a good idea

Wednesday, April 15, 2026

Jaspr, the open source web framework, is built on Dart

Most developers know Dart as the language that powers Flutter, the multi-platform app framework. But the Dart ecosystem has so much more to offer. For example: Jaspr, a web framework that provides a familiar Flutter-like experience, but is made for building fast, SEO-friendly, and dynamic websites natively in Dart.

Dart on the web is not a new idea. Initially, Dart was designed to run natively in browsers, similar to JavaScript. Google even developed AngularDart, a pure-Dart version of the popular JS framework. And although this is no longer supported, it resulted in some surprisingly powerful web tooling for Dart. Back in 2016, teams at Google chose Dart for its strong type safety and excellent development experience, and it has only improved since then.

However, all of this was unknown to me when I started building Jaspr in 2022. As a web developer who had transitioned to Flutter, I had grown to love Dart and wanted to explore using it for web development. So Jaspr started as a personal challenge: What would a modern web framework look like if it was built entirely in Dart?

Creating Jaspr as an open source project has been one of the most challenging, but also rewarding journeys of my career. Starting out as a solo maintainer is definitely hard work, but it comes with absolute creative freedom. I can explore unconventional ideas, design APIs exactly how I envision them, and integrate modern features seen in other frameworks. All without being slowed down by processes or roadmaps. I poured more than three years of late nights and weekends into the framework. That dedication finally paid off in a way I had never imagined: Google selected Jaspr to completely rebuild and power the official Dart and Flutter websites.

Architecture & design

To understand how Jaspr actually works, let's look at its underlying design. Jaspr is primarily targeted at Flutter developers venturing into web development. Having a clearly defined niche like this greatly helped me shape the framework and prioritize features, while not getting spread too thin as a maintainer.

One of Jaspr's core design principles is that it should look and feel familiar to Flutter developers while relying on native web technologies like HTML and CSS. This sets it apart from Flutter itself, which since 2021 can also target the web but optimizes for rendering consistency between platforms instead: it relies fully on the Canvas API for rendering, which comes at the cost of slower loading times and weaker SEO. Jaspr is therefore the missing piece for Flutter developers wanting to build fast, optimized websites with great SEO.

The result is a syntax that is remarkably close to Flutter's, with functionality much closer to something like React: an efficient, DOM-based rendering algorithm.

Example: Jaspr component | Flutter widget | React component

As you can see, Jaspr's StatelessComponent mirrors Flutter's StatelessWidget, but constructs HTML similar to React with JSX. Jaspr also provides a type-safe API for writing CSS rules directly in Dart.

Client-side rendering is only one aspect of what Jaspr can do. Jaspr is built as a full-stack general purpose framework supporting both Server-Side Rendering (SSR) and Static Site Generation (SSG). In the JavaScript ecosystem, you usually find a hard split between rendering libraries (React, Vue) and meta-frameworks (Next, Nuxt, Astro). Jaspr combines these concepts into one versatile and coherent framework.

In order to achieve this wide range of features with the limited resources I had, I naturally had to make compromises. Since I didn't want to limit the quality of any feature, my strategy focuses more on limiting features to what's important. I also learned to prioritize simple solutions and to design APIs that are flexible and composable.

For instance, I built jaspr_content as a plugin for developing content-driven sites from Markdown and other sources, similar to Astro or VitePress. It provides all the core features needed to build massive documentation websites, and instead of serving every use case out of the box, it is flexible and open enough to be fully customizable. In fact, jaspr_content is what currently powers the new flutter.dev and dart.dev documentation, which contain over 3,900 pages.

Tooling and developer experience

In my opinion, a framework is only as good as its tooling, and this is where Dart truly shines and has provided Jaspr developers with a great developer experience. For example, Flutter is known for its stateful hot-reload, enabling you to swap out code instantly without losing client-side state. But hot-reload is actually a Dart feature, enabled by its unique compiler architecture.

For browser development, the dartdevc compiler performs modular and incremental compilation to JavaScript. It supports stateful hot-reload and provides a seamless debugging experience. By cleverly leveraging source-maps, you can step through native Dart code right in the browser, complete with breakpoints, value inspection, and runtime expression evaluation.

Debugging Jaspr / Dart code using Chrome DevTools

For production builds, Dart uses the dart2js compiler to generate a heavily optimized, tree-shaken JavaScript bundle, or the newer dart2wasm compiler for even better runtime performance through WebAssembly. On the server side, Dart's JIT compiler provides that same hot-reload and debugging capabilities, while its AOT compiler compiles your server code to optimized, platform-specific, native binaries for production environments.

Jaspr builds on top of these and other capabilities, for example by giving developers full-stack debugging, custom lints and code assists, and something I call component scopes. This is a neat editor feature that adds inline hints to your components, showing whether they are rendered on the server, the client, or both. When building full-stack apps, this makes it much easier to reason about which platform APIs or libraries you can safely use in a specific file. I'm also working on more features to make the full-stack development experience even smoother: for example, a full-stack hot-reload where, on any server-side change, whether updating code or editing a Markdown file, the new pre-rendered HTML is "hot-reloaded" into the page while keeping all client-side state. Features like these are only possible because of Jaspr's approach of combining server- and client-side rendering in one framework.

Impact and outlook

Last year, Google selected Jaspr for the Dart and Flutter websites, including dart.dev, flutter.dev, and docs.flutter.dev (repo), which together serve over a million monthly active users. The sites were migrated from JS- and Python-based static site generators to Jaspr and jaspr_content, resulting in a unified setup with less context switching and an easier contribution experience. The move to Jaspr also streamlined the development of brand-new interactive tutorials on dart.dev/learn and docs.flutter.dev/learn. For me, this is not only an incredible show of trust in Jaspr's capabilities, but also a great way to dogfood Jaspr at scale; it allowed me to invest more time and resources into improving Jaspr.

With AI constantly shifting the scope of software development, I believe the concept of being a strict "domain expert" (a purely mobile or purely web developer) will matter less. However, developers and teams will increasingly value coherent tech stacks to reduce context-switching and leverage unified tooling. Just as React Native became massively popular because it allowed web developers to reuse their skills for mobile (or for companies to "reuse" their developers), Jaspr is a great option for teams working with both Flutter and the web. Apart from using existing skills, Jaspr and Flutter projects can also share up to 100% of their business logic, models, and validation code.

Dart's type safety and high-quality tooling position it well for modern web development. Jaspr evolved to be the missing piece, a cohesive framework with modern features and a great development experience.

I personally see Jaspr as an antithesis to the trend of AI causing everyone to converge onto the same stack, especially in web development. While this also has some benefits, I believe there is immense value in exploring alternative ecosystems. This can push boundaries, surface new ideas, and keep our industry vibrant.

If there's one takeaway from my journey, it's this: Don't be afraid to build the tools you want to use. You never know where that codebase will take you, and it can be incredibly rewarding.

If you're a Dart or Flutter developer curious about building websites with the skills you already have, there's never been a better time to start. Try out Jaspr now on its online playground (which is also built with Jaspr!) or by following the Jaspr quickstart.

Learn more about Flutter's migration in We rebuilt Flutter's websites with Dart and Jaspr.

Oh, and if you're wondering where the name "Jaspr" came from — it's named after my dog, Jasper. If you ever find yourself wandering around jaspr.site and want to Meet Jasper, keep an eye out… you just might find a little easter egg tribute to him.

Leveraging CPU memory for faster, cost-efficient TPU LLM training

Friday, April 10, 2026

Intel Xeon 6 Processor

Host offloading with JAX on Intel® Xeon® processors

As Large Language Models (LLMs) continue to scale into the hundreds of billions of parameters, device memory capacity has become a big limiting factor in training, as intermediate activations from every layer in the forward pass are needed in the backward pass. To reduce device memory pressure, these activations can be rematerialized during the backward pass, trading memory for recomputation. While rematerialization enables larger models to fit within limited device memory, it significantly increases training time and cost.

Intel® Xeon® processors (5th and 6th Gen) with Advanced Matrix Extensions (AMX) enable practical host offloading of selected memory- and compute-intensive components in JAX training workflows. This approach can help teams train larger models, relieve accelerator memory pressure, improve end-to-end throughput, and reduce total cost of ownership—particularly on TPU-based Google Cloud instances.

By publishing these results and implementation details, Google and Intel aim to promote transparency and share practical guidance with the community. This post describes how to enable activation offloading for JAX on TPU platforms and outlines considerations for building scalable, cost-aware hybrid CPU–accelerator training workflows.

Figure 1. Google Cloud TPU Pod commonly used in LLM training.

Host offloading

Traditional LLM training is usually done on device accelerators alone. However, modern host machines have a much larger memory capacity than accelerators (512GB or more) and can offer extra compute power, such as the AMX matrix units on Intel® Xeon® Scalable Processors. Leveraging host resources can be a great alternative to rematerialization. Host offloading selectively moves computation or data between host and device to optimize performance and memory usage.

Host memory offloading keeps frequently-accessed tensors on the device and spills the rest to CPU memory as an extra level of cache. Activation offloading transfers activations computed on-device in the forward pass to the host, stores them in the host memory, and brings them back to the device in the backward pass for gradient computation. This unlocks the ability to train larger models, use bigger batch sizes, and improve throughput.

Figure 2: Memory offloading during forward and backward pass

In this blog post, we provide a practical guide to offload activations through JAX to efficiently train larger models on TPUs with an Intel® Xeon® Scalable Processor.

Enabling memory offloading in JAX

JAX offers multiple strategies for offloading activations, model parameters, and optimizer states to the host. Users can call checkpoint_name() to label a tensor for checkpointing. The snippet below shows how to label a tensor x:

from jax.ad_checkpoint import checkpoint_name

def layer(x, w):
  w1, w2 = w
  x = checkpoint_name(x, "x")  # label this activation as "x"
  y = x @ w1
  return y @ w2, None

Users can then provide a policy from jax.checkpoint_policies to select the appropriate memory optimization strategy for intermediate values. There are three strategies:

  1. Recomputing during backward pass (default behavior)
  2. Storing on device
  3. Offloading to host memory after forward pass and loading back during backward pass
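
Each of these strategies corresponds to a policy in jax.checkpoint_policies. A minimal sketch of the mapping, reusing the "x" label from the snippet above (the policy names come from JAX's public API; the offloading policy is shown in full in the next snippet):

```python
from jax import checkpoint_policies as cp

# 1. Recompute during the backward pass: the default behavior,
#    equivalent to passing no policy at all (or cp.nothing_saveable).
recompute_all = cp.nothing_saveable

# 2. Store on device: keep only the activations labeled "x" in device memory.
store_on_device = cp.save_only_these_names("x")

# 3. Offload to host memory after the forward pass:
#    cp.save_and_offload_only_these_names, shown in full below.
```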

The code below moves x from device to pinned host memory after the forward pass.

from jax import checkpoint_policies as cp

policy = cp.save_and_offload_only_these_names(
  names_which_can_be_saved=[],         # No values stored on device
  names_which_can_be_offloaded=["x"],  # Offload activations labeled "x"
  offload_src="device",                # Move from device memory
  offload_dst="pinned_host"            # To pinned host memory
)

Measuring Host Offloading Benefits on TPU v5p

We examined host offloading with JAX on TPUs for both fine-tuning and training workloads. All our experiments were run on Google Cloud Platform, using a single v5p-8 TPU instance with a single-host 4th Gen Intel® Xeon® Scalable Processor.

Fine-tuning PaliGemma2: Using the base PaliGemma2 28B model for vision-language tasks, we fine-tuned the attention layers of the language model (Gemma2 27B) while keeping all other parameters frozen. During fine-tuning, we set the LLM sequence length to 256 and the batch size to 256.

The default checkpoint policy is nothing_saveable, which does not keep any activations on-device during the forward pass. The activations are rematerialized during the backward pass for gradient computation. While this approach reduces memory pressure on the TPU, it increases compute time. To apply host offloading, we offload the Q, K, and V projection activations using save_and_offload_only_these_names. These activations are transferred to host memory (D2H) during the forward pass and fetched back during the backward pass (H2D), so the device neither stores nor recomputes them. Figure 3 shows a 10% reduction in training time from host offloading. This translates directly into a similar reduction in TPU core-hours, yielding meaningful cost savings. The complete fine-tuning recipe is available at [JAX host offloading].

Figure 3: (Top) Training time comparison between full rematerialization and host offloading.
(Bottom) Memory analysis with and without host offloading.
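
The Q/K/V labeling described above can be sketched as follows. The label names and shapes here are illustrative, not the exact ones from the fine-tuning recipe:

```python
import jax.numpy as jnp
from jax.ad_checkpoint import checkpoint_name

def qkv_projections(x, wq, wk, wv):
    # Label each projection output so that a policy such as
    # save_and_offload_only_these_names(names_which_can_be_offloaded=
    # ["query_proj", "key_proj", "value_proj"], ...) offloads exactly these.
    q = checkpoint_name(x @ wq, "query_proj")
    k = checkpoint_name(x @ wk, "key_proj")
    v = checkpoint_name(x @ wv, "value_proj")
    return q, k, v

x = jnp.ones((2, 16))
wq = wk = wv = jnp.ones((16, 16))
q, k, v = qkv_projections(x, wq, wk, wv)
```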

Training Llama2-13B using MaxText: MaxText offers several rematerialization strategies that can be specified in the training configuration file. We used the policy remat_policy: 'qkv_proj_offloaded' to offload the Q, K, and V projection activations. Figure 4 shows a ~5% reduction in per-step training time compared to fully rematerializing all activations (remat_policy: 'full').

Figure 4: MaxText Llama2-13B training statistics with and without host offloading.
The step time was 5% faster with host offloading.
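
In MaxText, switching strategies is a one-line configuration change. A hypothetical excerpt of the training config, using only the policy names quoted above:

```yaml
# MaxText training config (excerpt)
remat_policy: 'qkv_proj_offloaded'   # offload Q/K/V projections to the host
# remat_policy: 'full'               # instead: fully rematerialize all activations
```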

When to offload activations

Activation offloading is beneficial when the time to transfer activations between host and device is lower than the time to recompute them. The timing depends on multiple factors such as PCIe bandwidth, model size, batch size, sequence length, activation tensor sizes, compute capabilities of the device, etc. An additional factor is how much the data movement can be overlapped with computation to keep the device busy. Figure 5 demonstrates an efficient overlap of the device-to-host transfer with compute during the backward pass in PaliGemma2 28B training.

Figure 5: A JAX trace of PaliGemma2 training viewed in Perfetto.
Host-to-device memory transfers overlap effectively with compute during the backward pass.

Smaller model variants such as PaliGemma2 3B and 9B did not see benefits from host offloading because it is faster to rematerialize all tensors than to transfer them to and from the host. Therefore, identifying the appropriate workload and offloading policy is crucial to realizing performance gains from host offloading.

Call to Action

If you train on TPUs and are limited by device memory, consider evaluating activation offloading. Start by labeling candidate activations (for example, Q/K/V projections) and compare step time, memory headroom, and overall cost across representative workloads.

In our experiments, we observed up to ~10% improvement in end-to-end training time for larger workloads, which can reduce total cost of ownership (TCO) by shortening time-to-train or enabling the same workload on smaller instances.

Acknowledgments

Emilio Cota and Karlo Basioli from Google, and Eugene Zhulenev (formerly at Google).

Celebrate A2April!

Thursday, April 9, 2026

Happy 1st Birthday to A2A! Join the community in celebrating the first anniversary of the A2A protocol and its recent 1.0 release. April 9th marks the official birthday, and we're celebrating all month long with #A2April. To help you celebrate, we've used Gemini to make a party hat.

Use the template and instructions below to create your commemorative party hat.

Assembly Instructions

  1. Print: Print this document on heavy cardstock for the best results.
  2. Cut: Carefully cut along the solid outer border of the semi-circle template.
  3. Fold: Gently curve the template into a cone shape, overlapping the "Glue/Tape Tab" underneath the opposite edge.
  4. Secure: Use double-sided tape or a glue stick along the tab to hold the cone shape.
  5. Finish: Punch two small holes on opposite sides of the base and thread through an elastic string or ribbon to secure the hat to your head.

Party Hat Visualization

Make sure to print in landscape mode

Ways to Celebrate

  • Social Media: Share a photo of yourself wearing your hat with the tag #A2April to help generate that social media buzz.
  • Blog Series: Keep an eye out for the upcoming A2April blog series featuring quotes from the team and stories from the open source community.
  • Community Quotes: If you're using A2A in production, reach out to us via social media and share your story for the birthday post.

Kubernetes goes AI-First: Unpacking the new AI conformance program

Monday, April 6, 2026

As AI workloads move from experimental notebooks into massive production environments, the industry is rallying around a new standard to ensure these workloads remain portable, reliable, and efficient.

At the heart of this shift is the launch of the Certified Kubernetes AI Conformance program.

This initiative represents a significant investment in common, accessible, industry-wide standards, ensuring that the benefits of AI-first Kubernetes are available to everyone.

How Kubernetes is Evolving for an AI-First World

Traditional Kubernetes was built for stateless, cloud-first applications. However, AI workloads introduce unique complexities that standard conformance doesn't fully cover:

  • Specific Hardware Demands: AI models require precise control over accelerators like GPUs and TPUs.
  • Networking and Latency: Inference and distributed training require low-latency networking and specialized configurations.
  • Stateful Nature: Unlike traditional web apps, AI often relies on complex, stateful data pipelines.

The AI Conformance program acts as a superset of standard Kubernetes conformance. To be AI-conformant, a platform must first pass all standard Kubernetes tests and then meet additional requirements specifically for AI.

Key Pillars of the AI Conformance Program

The Kubernetes AI Conformance program is being driven in the open. This cross-company effort is led by industry experts Janet Kuo (Google), Mario Fahlandt (Kubermatic GmbH), Rita Zhang (Microsoft), and Yuan Tang (Red Hat). This program is a collaborative effort within the open source ecosystem, involving multiple organizations and individuals. By developing this program in the open, the community ensures the standard is built on trust and directly addresses the diverse needs of the global ecosystem. The program establishes a verified set of capabilities that platforms across the industry, like Google Kubernetes Engine (GKE) and Azure Kubernetes Service (AKS), are already adopting.

Dynamic Resource Allocation (DRA)

DRA is the cornerstone of the new standard. It shifts resource allocation from simple accelerator quantity to fine-grained hardware control via attributes. For data scientists, this means they can now request specific hardware based on characteristics such as memory capacity or specialized capabilities, ensuring the environment perfectly matches the model's needs.

All-or-Nothing Scheduling

Distributed training jobs often face "deadlocks" where some pods start while others wait for resources, wasting expensive GPU time. AI Conformance mandates support for solutions like Kueue, allowing developers to ensure a job only begins when all required resources are available, improving cluster efficiency.
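
As a sketch of how all-or-nothing admission looks with Kueue: a Job is labeled with a queue name and created suspended, and Kueue unsuspends it only once quota for every pod is available. The queue name, image, and resource counts below are hypothetical:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: distributed-training
  labels:
    kueue.x-k8s.io/queue-name: team-queue   # hypothetical LocalQueue name
spec:
  suspend: true        # Kueue admits (unsuspends) the Job only when
                       # quota for all four pods is available at once
  parallelism: 4
  completions: 4
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: registry.example.com/trainer:latest
        resources:
          requests:
            nvidia.com/gpu: "1"
```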

Intelligent Autoscaling for AI Workloads

Conformant clusters must support Horizontal Pod Autoscaling (HPA) based on custom AI metrics, such as GPU or TPU utilization, rather than just standard CPU/memory. This allows clusters to scale up for heavy inference demand and scale down to save costs when idle.
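
A sketch of such an autoscaler, assuming a per-pod accelerator-utilization metric exposed through a custom metrics adapter (the metric name and target value below are hypothetical):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-server
  minReplicas: 1
  maxReplicas: 8
  metrics:
  - type: Pods
    pods:
      metric:
        name: gpu_duty_cycle        # hypothetical accelerator-utilization metric
      target:
        type: AverageValue
        averageValue: "70"
```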

Standardized Observability for High Performance

To manage AI at scale, you need deep visibility. The program requires platforms to expose rich accelerator performance metrics directly, enabling teams to monitor inference latency, throughput, and hardware health in a standardized way.

What's Next?

The launch of AI Conformance is just the beginning. As we head further into 2026, the community is adding automated testing for certification and expanding the standard to include more advanced inference patterns and stricter security requirements.

The ultimate goal? Making "AI-readiness" an inherent, invisible part of the Kubernetes standard.

To get involved and help shape the future of AI on Kubernetes, consider joining AI Conformance in Open Source Kubernetes. We welcome diverse perspectives, as your expertise and feedback are crucial to building a robust and inclusive standard for all.

Gemma 4: Expanding the Gemmaverse with Apache 2.0

Thursday, April 2, 2026

For over 20 years, Google has maintained an unwavering commitment to the open-source community. Our belief has been simple: open technology is good for our company, good for our users, and good for our world. This commitment to fostering collaborative learning and rigorous testing has consistently proven more effective than pursuing isolated improvements. It's been our approach ever since the 2005 launch of Google Summer of Code, and through our open-sourcing of Kubernetes, Android, and Go, and it remains central to our ongoing, daily work alongside maintainers and organizations.

Today, we are taking a significant step forward in that journey. Since its first launch, the community has downloaded Gemma models over 400 million times and built a vibrant universe of over 100,000 inspiring variants, known in the community as the Gemmaverse.

The release of Gemma 4 under the Apache 2.0 license — our most capable open models, ranging from edge-device sizes up to 31B parameters — provides cutting-edge AI models for this community of developers. The industry-standard Apache license broadens the horizon for Gemma 4's applicability and usefulness, providing well-understood terms for modification, reuse, and further development.

A long legacy of open research

We are committed to making helpful, accessible AI technology and research so that everyone can innovate and grow. That's why many of our innovations are freely available, easy to deploy, and useful to developers across the globe. We have a long history of making our foundational machine-learning research, including word2vec, Jax, and the seminal Transformers paper, publicly available for anyone to use and study.

We accelerated this commitment last year. By sharing models that interpret complex genomic data and identify tumor variants, we contributed to the "magic cycle" of research breakthroughs that translate into real-world impact. This week, however, marks a pivotal moment — Gemma 4 models are the first in the Gemmaverse to be released under the OSI-approved Apache 2.0 license.

Empowering developers and researchers to deliver breakthrough innovations

Since we first launched Gemma in 2024, the community of early adopters has grown into a vast ecosystem of builders, researchers, and problem solvers. Gemma is already supporting sovereign digital infrastructure, from automating state licensing in Ukraine to scaling Project Navarasa across India's 22 official languages. And we know that developers need autonomy, control, and clarity in licensing for further AI innovation to reach its full potential.

Gemma 4 brings three essential elements of free and open-source software directly to the community:

  • Autonomy: By letting people build on and modify the Gemma 4 models, we are empowering researchers and developers with the freedom to advance their own breakthrough innovations however they see fit.
  • Control: We understand that many developers require precise control over their development and deployment environments. Gemma 4 allows for local, private execution that doesn't rely on cloud-only infrastructure.
  • Clarity: By applying the industry-standard Apache 2.0 license terms, we are providing clarity about developers' rights and responsibilities so that they can build freely and confidently from the ground up without the need to navigate prescriptive terms of service.

Building together to drive real-world impact

Gemma 4, as a release, is an invitation. Whether you are a scientific researcher exploring the language of dolphins, an industry developer building the next generation of open AI agents, or a public institution looking to provide more effective, efficient, and localized services to your citizens, Google is excited to continue building with you. The Gemmaverse is your playground, and with Apache 2.0, the possibilities are more boundless than ever.

We can't wait to see what you build.
