Posts from 2025

Announcing Magika 1.0: now faster, smarter, and rebuilt in Rust

Thursday, November 6, 2025

Announcing Magika 1.0: now faster, smarter, and rebuilt in Rust

Elie Bursztein, Julien Cretin, and Yanick Fratantonio, Applied Cybersecurity Research

Early last year, we open sourced Magika, Google's AI-powered file type detection system. Magika has seen great adoption by open source communities since that alpha release, with over one million monthly downloads. Today, we are happy to announce the release of Magika 1.0, a first stable version that introduces new features and a host of major improvements since last announcement. Here are the highlights:

Expanded file type support for more than 200 types (up from ~100).
A brand-new, high-performance engine rewritten from the ground up in Rust.
A native Rust command-line client for maximum speed and security.
Improved accuracy for challenging text-based formats like code and configuration files.
A revamped Magika Python and TypeScript module for even easier integrations.

Smarter Detection: Doubling Down on File Types

Magika 1.0 now identifies more than 200 content types, doubling the number of file-types supported from the initial release. This isn't just about a bigger number; it unlocks far more granular and useful identification, especially for specialized, modern file types.

Some of the notable new file types detected include:

Data Science & ML: We've added support for formats such as Jupyter Notebooks (ipynb), Numpy arrays (npy, npz), PyTorch models (pytorch), ONNX (onnx) files, Apache Parquet (parquet), and HDF5 (h5).
Modern Programming & Web: The model now recognizes dozens of languages and frameworks. Key additions include Swift (swift), Kotlin (kotlin), TypeScript (typescript), Dart (dart), Solidity (solidity), Web Assembly (wasm), and Zig (zig).
DevOps & Configuration: We've expanded detection for critical infrastructure and build files, such as Dockerfiles (dockerfile), TOML (toml), HashiCorp HCL (hcl), Bazel (bazel) build files, and YARA (yara) rules.
Databases & Graphics: We also added support for common formats like SQLite (sqlite) databases, AutoCAD (dwg, dxf) drawings, Adobe Photoshop (psd) files, and modern web fonts (woff, woff2).
Enhanced Granularity: Magika is now smarter at differentiating similar formats that might have been grouped together. For example, it can now distinguish:
- JSONL (jsonl) vs. generic JSON (json)
- TSV (tsv) vs. CSV (csv)
- Apple binary plists (applebplist) from regular XML plists (appleplist)
- C++ (cpp) vs. C (c)
- JavaScript (javascript) vs. TypeScript(typescript)

Expanding Magika's detection capabilities introduced two significant technical hurdles: data volume and data scarcity.

First, the scale of the data required for training was a key consideration. Our training dataset grew to over 3TB when uncompressed, which required an efficient processing pipeline. To handle this, we leveraged our recently released SedPack dataset library. This tool allows us to stream and decompress this large dataset directly to memory during training, bypassing potential I/O bottlenecks and making the process feasible.

Second, while common file types are plentiful, many of the new, specialized, or legacy formats presented a data scarcity challenge. It is often not feasible to find thousands of real-world samples for every file type. To overcome this, we turned to generative AI. We leveraged Gemini to create a high-quality, synthetic training set by translating existing code and other structured files from one format to another. This technique, combined with advanced data augmentation, allowed us to build a robust training set, ensuring Magika performs reliably even on file types for which public samples are not readily available.

The complete list of all 200+ supported file types is available in our revamped documentation.

Under the Hood: A High-Performance Rust Engine

We completely rewrote Magika's core in Rust to provide native, fast, and memory-safe content identification. This engine is at the heart of the new Magika native command line tool that can safely scan hundreds of files per second.

Output of the new Magika Rust based command line tool

Magika is able to identify hundreds of files per second on a single core and easily scale to thousands per second on modern multi-core CPUs thanks to the use of the high-performance ONNX Runtime for model inference and Tokio for asynchronous parallel processing, For example, as visible in the chart below, on a MacBook Pro (M4), Magika processes nearly 1,000 files per second.

Getting Started

Ready to try it out? Getting started with the native command-line client is as simple as typing a single command line:

On Linux and MacOS: curl -LsSf https://securityresearch.google/magika/install.sh | sh
On Windows (PowerShell): powershell -ExecutionPolicy ByPass -c "irm https://securityresearch.google/magika/install.ps1 | iex"

Alternatively, the new Rust command-line client is also included in the magika python package, which you can install with: pipx install magika.

For developers looking to integrate Magika as a library into their own applications in Python, JavaScript/TypeScript, Rust, or other languages, head over to our comprehensive developer documentation to get started.

What's next

We're incredibly excited to see what you will build using Magika's enhanced file detection capabilities.

We invite you to join the community:

Try Magika: Install it and run it on your files, or try it out in our web demo.
Integrate Magika into your software: Visit our documentation to get started.
Give us a star on GitHub to show your support.
Report issues or suggest new file types you'd like to see by opening a feature request.
Contribute new features and bindings by opening a pull request.

Thank you to everyone who has contributed, provided feedback, and used Magika over the past year. We can't wait to see what the future holds.

Acknowledgements

Magika's continued success was made possible by the help and support of many people, including: Ange Albertini, Loua Farah, Francois Galilee, Giancarlo Metitieri, Alex Petit-Bianco, Kurt Thomas, Luca Invernizzi, Lenin Simicich, and Amanda Walker.

This Week in Open Source #11

Friday, October 31, 2025

This Week in Open Source for October 31, 2025

A look around the world of open source

by Daryl Ducharme & amanda casari, Google Open Source

Happy Halloween. Here is your treat in the form of news and events from the world of open source.

Upcoming Events

November 10 - 13: Kubecon NA is coming to Atlanta, Georgia along with Cloud Native Con. It brings together adopters and technologists from leading open source and cloud native communities.
December 5 - 7: PyLadiesCon is happening online and in multiple languages across many timezones. This event is dedicated to empowerment, learning, and diversity within the Python community!
December 8-10: Open Source Summit Japan is happening in Tokyo. Open Source Summits are The Linux Foundation's premier event for open source developers and contributors around the world. If you can make it to Japan there are many sessions to learn from.

Open Source Reads and Links

A new breed of analyzers - AI-powered code analyzers have recently found many real, useful bugs in curl that earlier tools missed. They scanned all source variations without a build and reported high-quality issues like memory leaks and protocol faults. The curl team fixed dozens of them and now works with the reporters to keep improving security.
A national recognition; but science and open source are bitter victories - Gaël Varoquaux received France's national order of merit for his work in science, open source, and AI. He celebrates how open tools and collective effort changed the world but warns that economic power can turn those tools to harmful ends. He urges building a collective narrative and economic ambition so science and free software serve a better future for our children. (disponible en français aussi)
If Open Source Stops Being Global, It Stops Being Open - Geopolitics is pushing technology toward national control. Open source preserves sovereignty because code is user-controlled and global. Should governments buy and support global open source? If it stops being global, does it stop being open?
Vibe Coding Is the New Open Source—in the Worst Way Possible - Developers are using AI-generated "vibe coding" like they used open source, but it can hide insecure or outdated code. AI often produces inconsistent, hard-to-trace code that increases software supply-chain risk. That danger hits small, vulnerable groups hardest and could create widespread security failures.
New Open Source Tool from Angular Scores Vibe Code Quality - One of the Angular developers took up the challenge [of evaluating the best LLM for Angular] and vibe-coded a prototype tool that could test how well vibe code works with Angular. That early experiment led to the creation of an open source tool that tests LLM-generated code for frontend development considerations, such as following best practices for a framework, using accessibility best practices and identifying security problems. Called Web Codegen Scorer, the tool is designed to test all of these in vibe-coded applications.

What spooky open source events and news are you being haunted by? Let us know on our @GoogleOSS X account. We will share some of the best on our next This Week in Open Source post.

Building the future with Blockly at Raspberry Pi Foundation

Tuesday, October 28, 2025

Building the future with Blockly at the Raspberry Pi Foundation

By Rachel Fenichel, Blockly

Today we're announcing that Blockly, Google's open source library for drag-and-drop programming, is moving to the stewardship of the Raspberry Pi Foundation on November 10, 2025.

Since its creation at Google in 2011, Blockly has grown from a passion project to a standard for visual programming. Educational platforms such as Scratch, MakeCode, and LEGO Education use Blockly to remove barriers to entry into the world of programming. Blockly's move to the Raspberry Pi Foundation reflects close alignment with its education-focused mission.

The Raspberry Pi Foundation is one of the world's leading non-profits dedicated to advancing computing education. This move is designed to sustain Blockly's long-term stability and continued innovation as a foundational tool for block-based coding and computer science worldwide.

We are delighted that the Raspberry Pi Foundation will be the new home for Blockly, the world's leading open source library for visual programming. We are committed to maintaining Blockly as an open source project and look forward to working collaboratively with the amazing community of developers and educators to increase its reach and impact in the years to come.

– Philip Colligan, Chief Executive at Raspberry Pi Foundation

Blockly's growth, evolution, and success rest on a foundation of support and investment in open source software from Google over many years. Google.org's support for Blockly's future at Raspberry Pi Foundation strengthens the ecosystem built on block-based coding, fostering greater innovation and expanding access to computational thinking for people around the world.

Looking forward, I'm excited for our future collaborations with the Foundation's world-class research, learning and product teams. We are committed to Blockly's ongoing development, including both feature development and support. Blockly will continue to be free and open source, and existing projects do not need to change anything about how they use Blockly.

To learn more about the transition and read the FAQ, visit blockly.com

2024 Open Source Contributions: A Year in Review

Tuesday, October 14, 2025

2024 Open Source Contributions: A Year in Review

By Sophia Vargas & Daryl Ducharme, Google Open Source

Open source is a critical part of Google with many upstream projects and communities contributing to our infrastructure, products, and services. Within the Open Source Programs Office (OSPO), we continue to focus on investing in the sustainability of open source communities and expanding access to open source opportunities for contributors around the world. As participants in this global ecosystem, our goal with this report is to provide transparency and to report our work within and around open source communities.

In 2024 roughly 10% of Alphabet's full-time workforce actively contributed to open source projects. This percentage has remained roughly consistent over the last five years, indicating that our open source contribution has remained proportional to the size of Alphabet over time. Over the last 5 years, Google has released more than 8000 open source projects, features, libraries, SDKs, datasets, sample code and more. In 2024 alone, we launched more than 700 projects across a wide range of domains: from the Earth Agent Dataset Explorer to Serberus to CQL.

Most open source projects we contribute to are outside of Alphabet

In 2024, employees from Alphabet interacted with more than 19,000 public repositories on GitHub. Over the last six years, more than 78% of the non-personal GitHub repositories receiving Alphabet contributions were outside of Google-managed organizations. Our top external projects (by number of unique contributors at Alphabet) include both Google-initiated projects as well as community-led projects.

In addition to Alphabet employees supporting external projects, in 2024 Alphabet-led projects received contributions from more than 150,000 non-Alphabet employees (unique GitHub accounts not affiliated with Alphabet).

A year of open-source AI and Gemma

As part of the focus on AI in 2024, Google's OSPO supported and actively participated in multiple community efforts, such as the OSI's Open Source AI definition initiative. We continued to release projects with open-source licenses, including AI models and projects, and will continue to be precise in making clear distinctions between "open source" and "open models", as shown in Deep Mind's blog posts about models.

Speaking of AI models — the Gemma team collaborated with the community in every launch. For instance, they shared model weights early with partners like Hugging Face, llama.cpp, mlx. This collaborative approach helped increase Gemma's distribution across many frameworks.

This community spirit is also reflected in projects like GAIA, where a Gemma model was fine-tuned for Portuguese in collaboration with the University of Goias. This collaboration enabled Brazilian government institutions to start using the model, demonstrating the real-world impact of open-source AI. The success of projects like Gemma and Gaia underscores a key theme from our research efforts in 2024: the creation and curation of large, high-quality datasets and open-source tools to accelerate innovation and empower researchers worldwide.

Open data is key to research

The Google Research team created and curated many large, high-quality open access datasets. These serve as the foundation for developing more accurate and equitable AI models. Many of the research projects discussed in their blog from 2024 are also committed to an open science framework, with a strong emphasis on releasing open-source tools, models, and datasets to the broader research community. This collaborative approach accelerates innovation and empowers researchers worldwide.

This commitment is demonstrated through the application of AI and machine learning to tackle complex challenges in various scientific domains. From mapping the human brain in neuroscience to advancing climate science with NeuralGCM and improving healthcare with open foundation models, AI is being used for social good. To foster collaboration and accelerate this research, many projects, including AutoBNN, and NeuralGCM, are made publicly available to the research community.

A key part of making data accessible is the development of new tools and standards. The Croissant metadata format, for example, makes datasets more accessible and usable for machine learning. By focusing on the creation of high-quality datasets, the development of open-source tools, and the application of AI to scientific research, Google Research is helping to build a more open and collaborative research ecosystem.

Investing in the next generation of open source contributors

As a longstanding consumer and contributor to open source projects, we believe it is vital to continue funding both established communities as well as invest in the next generation of contributors to ensure the sustainability of open source ecosystems. In 2024, OSPO provided $2.0M in sponsorships and membership fees to more than 40 open source projects and organizations. Note that this value only represents OSPO's financial contribution; other teams across Alphabet also directly fund open source work. In addition, we continue to support our longstanding programs:

In its 20th year, Google Summer of Code (GSoC) enabled more than 1200 individuals to contribute to 195 organizations. Over the lifetime of this program, more than 21,000 individuals from 123 countries have contributed to more than 1,000 open source organizations across the globe.

Timothy Jordan's keynote from All Things Open highlighting GSoC's amazing 20 years in open source.
In its sixth and final year, Google Season of Docs provided direct grants to 11 open source projects to improve open source project documentation. Each organization also created a case study to help other open source projects learn from their experience.
In its final year, the Google Open Source Peer Bonus Program gave awards to 130 non-Alphabet contributors from the broader open source community representing 35 different countries.

Our open source work will continue to grow and evolve to support the changing needs of our communities. Thank you to our colleagues and community members who continue to dedicate personal and professional time supporting the open source ecosystem. Follow our work at opensource.google.

Rising to Meet the Security Challenge

The integrity of the open source software supply chain is essential for the entire ecosystem. In 2024, attacks on the software supply chain continued to increase. We have been working closely with the community and package managers who are rising to meet the challenge in response to this growing threat.

Our efforts have been focused on making it easier for developers to secure their software and for consumers to verify the integrity of the packages they use. A significant achievement in 2024 Googlers contributed to, was the integration of Sigstore into PyPI, the Python Package Index, a major step forward in securing the Python ecosystem. This is part of a broader movement to adopt cryptographic signing for all public packages.

Alongside these initiatives, we continue to support and contribute to efforts like SLSA (Supply chain Levels for Software Artifacts) to establish a common framework for ensuring supply chain security. We also invested in credential scanning across public package registries to help prevent accidental credential leaks, another common vector for attack.

Beyond securing individual packages, we're also focused on providing visibility into the security of running workloads. This year, we introduced new software supply chain security insights into the Google Kubernetes Engine (GKE) Security Posture dashboard. By surfacing these potential risks directly in the GKE dashboard, we empower teams to take immediate, actionable steps to strengthen their security posture from development through to production.

Securing the open source supply chain requires a collective effort, and we are committed to continuing our work with the community to build a more secure future for everyone.

Appendix: About this data

This report features metrics provided by many teams and programs across Alphabet. In regards to the code and code-adjacent activities data, we wanted to share more details about the derivation of those metrics.

Data sources: These data represent the activities of Alphabet employees on public repositories hosted on GitHub and our internal production Git service Git-on-Borg. These sources represent a subset of open source activity currently tracked by Google OSPO.
- GitHub: We continue to use GitHub Archive as the primary source for GitHub data, which is available as a public dataset on BigQuery. Alphabet activity within GitHub is identified by self-registered accounts, which we estimate underreports actual activity.
- Git-on-Borg: This is a Google managed git service which hosts some of our larger, long running open source projects such as Android and Chromium. While we continue to develop on this platform, most of our open source activity has moved to GitHub to increase exposure and encourage community growth.
Business and personal: Activity on GitHub reflects a mixture of Alphabet projects, third-party projects, experimental efforts, and personal projects. Our metrics report on all of the above unless otherwise specified.
Alphabet contributors: Please note that unless additional detail is specified, activity counts attributed to Alphabet open source contributors will include our full-time employees as well as our extended Alphabet community (temps, vendors, contractors, and interns). In 2024, full time employees at Alphabet represented more than 95% of our open source contributors.
GitHub Accounts: For counts of GitHub accounts not affiliated with Alphabet, we cannot assume that one account is equivalent to one person, as multiple accounts could be tied to one individual or bot account.
*Active counts: Where possible, we will show ‘active users' defined by logged activity (excluding ‘WatchEvent') within a specified timeframe (a month, year, etc.) and ‘active repositories' and ‘active projects' as those that have enough activity to meet our internal active-project criteria and have not been archived.

Special thanks

This post is a testament to the collaborative spirit across Google. We thank amanda casari, Anna Eilering, Erin McKean, Shane Glass, April Knassi, and Mary Radomile from the Open Source Programs Office; Andrew Helton and Christian Howard from Google Research; Gus Martins and Omar Sansiviero from the Gemma team; and Nicky Ringland for her contributions on open-source security. Our gratitude also goes out to all open-source contributors and maintainers, both inside and outside of Google.

Apache Iceberg 1.10: Maturing the V3 spec, the REST API and Google contributions

Wednesday, September 24, 2025

Apache Iceberg 1.10: Maturing the V3 spec, the REST API and Google contributions

by Talat Uyarer, BigQuery Core

The Apache Iceberg 1.10.0 release just dropped. I've been scrolling through the release notes and community analysis, and it's a dense, significant release. You can (and should) read the full release notes, but I want to pull out the "gradients" I see—the directions the community is pushing that signal what's next for the data lakehouse.

Next-Gen Engines Have Arrived

Let's jump straight to the headline news: next-generation engine support. Version 1.10.0 delivers deep, native optimizations for both Apache Spark and Apache Flink, ensuring Iceberg is ready for the future.

For Apache Spark users, the biggest news is full compatibility with the forthcoming Spark 4.0. The release also gets much smarter about table maintenance. The compute_partition_stats procedure now supports incremental refresh, eliminating wasteful recalculations by reusing existing stats and saving massive amounts of compute. For streaming, a critical fix for Spark Structured Streaming converts the maxRecordPerMicrobatch limit to a "soft cap," resolving a common production issue where a single large file could stall an entire data stream.

Apache Flink users get an equally massive upgrade with full Flink 2.0 support. This is accompanied by a new dynamic sink, which is a huge quality-of-life improvement. This feature dramatically streamlines streaming ingestion by automatically handling schema evolution from the input stream and propagating those changes directly to the Iceberg table. It even supports "fan-out" capabilities, letting it create new tables on the fly as new data types appear in the stream, removing a huge layer of operational friction.

Hardening the Core for Speed and Stability

Beyond the big engine updates, 1.10.0 is all about hardening the core for stability and speed. A key part of this is the growing adoption of Deletion Vectors. This V3 feature is now ready for prime time and radically improves the performance of row-level updates by avoiding costly read-modify-write operations.

Speaking of core logic, the compaction code for Spark and Flink has been refactored to share the same underlying logic. This is a fantastic sign of health—it means less duplicated effort, fewer divergent bugs, and a more stable core for everyone, regardless of your engine.

With deletion vectors leading the charge, the rest of the V3 spec is also moving from "on the horizon" to "ready to use." The spec itself is now officially "closed," and we're seeing its most powerful features land, like row lineage for fine-grained traceability and the new variant type for flexibly handling semi-structured data.

The REST Catalog is Ready for Prime Time

For me, the most significant strategic shift in this release is the battle-hardening of the REST Catalog. For years, the de facto standard was the clunky, monolithic Hive Metastore. The REST Catalog spec is the future — a simple, open HTTP protocol that finally decouples compute from metadata.

The 1.10.0 notes are full of REST improvements, but one is critical: a fix for the client that prevents retrying commits after 5xx server errors. This sounds boring, but it's not. When a commit call fails, it's impossible for the client to know if the mutation actually committed or not before the error. Retrying in that ambiguous state could lead to conflicting operations and potential table corruption. This fix is about making the REST standard stable enough for mission-critical production.

Google Cloud and the Open Lakehouse

This industry-wide standardization on a stable REST API is foundational to Google Cloud's BigLake strategy and where our new contributions come in. We're thrilled to have contributed two key features to the 1.10.0 release.

The first is native BigQuery Metastore Catalog support. This isn't just another Hive-compatible API; it's a native implementation that allows you to use the battle-tested, serverless, and globally-replicated BigQuery metadata service as your Iceberg catalog.

The second contribution is the new Google AuthManager. This plugs directly into the REST Catalog ecosystem, allowing Iceberg to authenticate using standard Google credentials. You can now point your open source Spark job (running on GKE, Dataproc, or anywhere) directly at your BigLake-managed tables via the open REST protocol, using standard Google auth.

This is the whole philosophy behind our BigLake REST Catalog. It's our fully-managed, open-standard implementation of the Iceberg REST protocol. This means you get a single source of truth, managing all your Iceberg tables with BigQuery's governance, fine-grained security, and metadata. It also means true interoperability, letting you use BigQuery to analyze your data, or open source Spark, Flink, and Trino to access the exact same tables via the open REST API. And critically, it means no lock-in—you're just talking to an open standard.

You can read more about our managed BigLake REST Catalog service here.

This Week in Open Source #10

Friday, September 19, 2025

This Week in Open Source for 09/19/2025

A look around the world of open source

by Daryl Ducharme & amanda casari, Google Open Source

As we enter the Autumn of 2025, AI is still on the top of everyone's mind in tech and the world of open source is no different. This week we delve into various facets of AI's impact on the open source world, from its presence in the Linux kernel and the need for official policy, to the discussion around copyright in AI and how it affects open source licenses. We also explore ways companies can actively support open source, the challenges federated networks like Mastodon face with age verification laws, and the emerging concept of spec-driven development with AI as a design tool.

Upcoming Events

September 23 - 27: Nerderarla 2025 is happening in Buenos Aires. It is a 100% free, world-class event in Latin America with high-quality content in science and technology.
September 29 - 30: Git Merge 2025 celebrates 20 years of Git in Sunnyvale, California.
October 2 - 3: Monktoberfest is happening in Portland, Maine. The only conference focused on how craft, technology and social come together. It's one of the most unique events in the industry.
October 12 - 14: All Things Open 2025 is happening in Raleigh, North Carolina. The largest open source / tech / web conference on the U.S. east coast will feature many talks, including some from 4 different Googlers on varying topics — from creating mentorship programs, to security, to kubernetes and how open source already has solutions to your data problems that you may be trying to solve with AI.

Open Source Reads and Links

[Article] AI is creeping into the Linux kernel - and official policy is needed ASAP - Coverage and larger discussion about genAI tooling and development on the Linux kernel. Kicked off from OSSNA 2025 talk from Sasha Levin.
[Blog] The copyright crisis in AI that open source can't ignore - Many AI companies have been ignoring copyright to create their AI models. This sets a precedent that could lead to ignoring of open source licenses. What can open source do? There are some AI companies still taking a principled stance. What should they do?
[Blog] 4 ways your company can support open source right now - Your company not only uses, but relies upon, open source software. Supporting open source software is an investment in your company and can give you a voice in the future of those projects. So here are four lenses to look through for how you can support the software that is supporting your company.
[Article] Mastodon says it doesn't 'have the means' to comply with age verification laws - Age verification laws are starting to affect social media use in different locales. For single entities that run social networks, you can create a single solution that is best for your company's values and hopefully your users. What about federated networks of varying sizes? How are they supposed to address these laws? This is both a sign of the problems with federated networks and the importance of them to keep the internet open.
[Blog] Spec-driven development with AI: Get started with a new open source toolkit - We're moving from "code is the source of truth" to "intent is the source of truth." With AI the specification becomes the source of truth and determines what gets built. Is this a new best practice for using AI as a design tool?

What exciting open source events and news are you hearing about? Let us know on our @GoogleOSS X account.

Introducing Kotlin FHIR: A new library to bring FHIR to Multiplatform

Tuesday, September 16, 2025

Build once, deploy everywhere: multiplatform FHIR development on Android, iOS, and Web

by Jing Tang, Google Research

The mission of Google's Open Health Stack team is to accelerate digital health innovation by providing developers everywhere with critical building blocks for next-generation healthcare applications. Expanding its existing components, the team has released Kotlin FHIR (currently in alpha), a new open-source library now available on GitHub. It implements the HL7® FHIR® data model on Kotlin Multiplatform (KMP), enabling developers to build FHIR apps and tools for Android, iOS, and Web simultaneously.

Demo app launched as Android, iOS, and Web applications

Tools to support health data exchange using a modern standard

HL7® FHIR® (Fast Healthcare Interoperability Resources) is a global interoperability standard for healthcare data exchange. It enables healthcare systems to exchange data freely and securely, improving efficiency and transparency while reducing integration costs. Over the years, it has seen rapidly growing adoption, and its use has been mandated by health regulations in an increasing number of countries.

Since March 2023, the Open Health Stack team at Google has introduced a number of tools to support FHIR development. For example, the Android FHIR SDK helps developers to build offline capable FHIR-native apps that can help community health workers carry out data collection tasks in remote communities. With FHIR Data Pipes, developers can build analytics solutions more easily to generate critical insights for large healthcare programmes more easily. Today, apps powered by these tools are used by health workers covering over 75 million people across Sub-Saharan Africa, South and Southeast Asia.

A leap forward to multiplatform development

In low-resource settings, it is imperative to develop apps that can reach as many patients as possible at a low development cost. However, a lack of infrastructure and tooling often hinders this goal. For example, Kotlin Multiplatform (KMP) is a new and exciting technology rapidly gaining traction, but existing FHIR libraries are not suitable for KMP development due to their platform-specific dependencies. Consequently, developing FHIR apps on KMP has not been possible, causing developers to miss out on a significant opportunity to scale their solutions.

Introducing Kotlin FHIR. It is a modern and lightweight implementation of FHIR data models designed for use on KMP with no platform-specific dependencies. With Kotlin FHIR, developers can build FHIR apps once, and deploy them to Android, iOS, Web, and other platforms.

"Any library that helps implementers use FHIR is my favourite, but I'm particularly thrilled to see a new library from the awesome Open Health Stack team.

– Grahame Grieve, Creator of FHIR, Product Director at HL7

Modern, lightweight, and sustainable

Kotlin FHIR uses KotlinPoet to generate the FHIR data model directly from the specification. This ensures that the library is complete and maintainable. The data model classes it generates are minimalist to provide the best usability for developers: it has everything that you need, nothing less and nothing more. It uses modern language features in Kotlin such as sealed interfaces to ensure type-safety and to give developers the best coding experience. It supports all the FHIR versions: R4, R4B and R5, and will be updated when new FHIR versions are released.

The library is currently in alpha, but has received positive feedback from the developer community. To try FHIR multiplatform development using the library, head to the repository.

Beyond data models: more on multiplatform

Our mission is to empower the digital health community, and the Kotlin FHIR library is our latest step in that effort. But handling the FHIR data model on KMP is just the beginning. Rich features provided by the Android FHIR SDK libraries will also be needed on KMP. This is a collaborative effort, and we invite the FHIR community to join us in defining and building the cross-platform tools you need most. To learn more about how you can get involved, head to the Open Health Stack developer site.

Kubernetes 1.34 is available on GKE!

Wednesday, September 10, 2025

by Benjamin Elder & Pradeep Varadharajan, Google Kubernetes Engine

Kubernetes 1.34 is now available in the Google Kubernetes Engine (GKE) Rapid Channel in just 5 days after OSS release! For more information about the content of Kubernetes 1.34, read the official Kubernetes 1.34 Release Notes and the specific GKE 1.34 Release Notes.

Kubernetes Enhancements in 1.34:

The Kubernetes 1.34 release, themed 'Of Wind & Will', symbolizing the winds that shaped the platform, delivers a fresh gust of enhancements. These updates, shaped by both ambitious goals and the steady effort of contributors, continue to propel Kubernetes forward.

Below are some of the Kubernetes 1.34 features that you can use today in production GKE clusters.

DRA Goes GA

The Kubernetes Dynamic Resource Allocation (DRA) APIs are now GA. This is a huge step in the evolution of Kubernetes to stay the undisputed platform for AI/ML workloads. DRA improves Kubernetes' ability to select, configure, allocate, and share GPUs, TPUs, NICs and other specialized hardware. For more information about using DRA in GKE, see About dynamic resource allocation in GKE. You can use DRA now with self-installed drivers and can expect more improvements in upcoming releases.

The Prioritized list and Admin access features have been promoted to beta and will be enabled by default, and the kubelet API has been updated to report status on resources allocated through DRA.

KYAML

We've all been there: a stray space or an unquoted string in a YAML file leads to frustrating debugging sessions. The infamous "Norway Bug" is a classic example of how YAML's flexibility can sometimes be a double-edged sword. 1.34 introduces support for KYAML, a safer and less ambiguous subset of YAML, specifically designed for Kubernetes and helps avoid these common pitfalls.

KYAML is fully compatible with existing YAML parsers but enforces stricter rules making your configurations more predictable and less prone to whitespace errors. This is a game-changer for anyone using templating tools like Helm, where managing indentation can be a headache.

To start using KYAML, simply update your client to 1.34+ and set the environment variable KUBECTL_KYAML=true to enable use of -o kyaml. For more details, check out KEP-5925.

Pod-level resource requests and limits

With the promotion of Pod-level resource requests and limits to beta (and on-by-default), you can now define resource requests and limits at the pod level instead of the container level. This simplifies resource allocation, especially for multi-container Pods, by allowing you to set a total resource budget that all containers within the Pod share. When both pod-level and container-level resources are defined, the pod-level settings take precedence, giving you a clear and straightforward way to manage your Pod's resource consumption.

Improved Traffic Distribution for Services

The existing PreferClose setting for traffic distribution in Services has been a source of ambiguity. To provide clearer and more precise control over how traffic is routed, KEP-3015 deprecates PreferClose and introduces two new, more explicit values:

PreferSameZone is equivalent to the existing PreferClose.
PreferSameNode prioritizes sending traffic to endpoints on the same node as the client. This is particularly useful for scenarios like node-local DNS caches, where you want to minimize latency by keeping traffic on the same node whenever possible.

This feature is now beta in 1.34, with its feature gate enabled by default.

Ordered Namespace Deletion for Enhanced Security

When a namespace is deleted, the order in which its resources are terminated has, until now, been unpredictable. This can lead to security flaws, , such as a NetworkPolicy being removed before the Pods it was protecting, leaving them temporarily exposed. With this enhancement, Kubernetes introduces a structured deletion process for namespaces, ensuring secure and predictable resource removal by enforcing a deletion order that respects dependencies, removing Pods before other resources.
This feature was introduced in Kubernetes v1.33 and became stable in v1.34.

Graceful Shutdowns Made Easy

Ensuring a graceful shutdown for your applications is crucial for zero-downtime deployments. Kubernetes v1.29 introduced a "Sleep" for containers' PreStop and PostStart lifecycle hooks, offering a simple approach to managing graceful shutdowns. This feature allows a container to wait for the specified duration before it's terminated, giving it time to finish in-flight requests and ensuring a clean handoff during rolling updates.
Note: Specifying a negative or zero sleep duration will result in an immediate return, effectively acting as a no-op (added in v1.32).
This feature graduated to stable in v1.34.

Streaming List Responses

Large Kubernetes clusters can push the API server to its limits when dealing with large LIST responses that can consume gigabytes of memory. Streaming list responses address this by changing how the API server handles these requests.

Instead of buffering the entire list in memory, it streams the response object by object, improving performance and substantially reducing memory pressure on the API server. This feature is now GA and is automatically enabled for JSON and Protobuf responses with no client-side changes.

Resilient Watch Cache Initialization

The watch caching layer in the Kubernetes apiserver maintains an eventually consistent cache of cluster state. However, if it needs to be re-initialized, it can potentially lead to a thundering herd of requests that can overload the entire control plane. The Resilient Watch Cache Initialization feature, now stable, ensures clients and controllers can reliably establish watches.

Previously, when the watch cache was initializing, incoming watch and list requests would hang, consuming resources and potentially starving the API server. With this enhancement, such requests are now intelligently handled: watches and most list requests are rejected with a 429, signaling clients to back off, while simpler get requests are delegated directly to etcd.

In-Place Pod Resize Gets Even Better

In-place pod resize, which allows you to change a Pod's resource allocation without a disruptive restart, remains in Beta, but continues to improve in v1.34. You can now decrease memory limits with a best-effort protection against triggering the OOM killer. Additionally, resizes are now prioritized, and retrying deferred resizes is more responsive to resources being released. A ResizeCompleted event provides a clear signal when a resize completes, and includes a summary of the new resource requirements.

MutatingAdmissionPolicy Gets to Beta

MutatingAdmissionPolicy, working as a declarative, in-process alternative to mutating admission webhooks, goes to Beta in Kubernetes 1.34.

Mutating admission policies use the Common Expression Language (CEL) to declare mutations to resources. Mutations can be defined either with an apply configuration that is merged using the server side apply merge strategy, or a JSON patch. This feature is highly configurable, enabling policy authors to define policies that can be parameterized and scoped to resources as needed by cluster administrators.

Acknowledgements

As always, we want to thank all the Googlers that provide their time, passion, talent and leadership to keep making Kubernetes the best container orchestration platform. We would like to mention especially Googlers who helped drive some of the open source features mentioned in this blog: Tim Allclair, Natasha Sarkar, Jordan Liggitt, Marek Siarkowicz, Wojciech Tyczyński, Tim Hockin, Benjamin Elder, Antonio Ojea, Gaurav Ghildiyal, Rob Scott, John Belamaric, Morten Torkildsen, Yu Liao, Cici Huang, Joe Betz, and Dixita (Dixi) Narang.

And thank the many Googlers who helped bring 1.34 to GKE!

This Week in Open Source #9

Friday, September 5, 2025

This Week in Open Source for 09/05/2025

A look around the world of open source
by Daryl Ducharme, amanda casari & Shane Glass, Google Open Source

Upcoming Events

September 5-7: NixCon 2025 is happening in Switzerland. It is the annual conference for the Nix and NixOS community where Nix enthusiasts learn, share, and connect with others.
September 9: Kubernetes Community Day 2025 SF Bay Area event, the ultimate gathering for cloud native enthusiasts! This full-day event, sponsored by the Cloud Native Computing Foundation (CNCF), is packed with insightful cloud native talks and unparalleled opportunities for community networking.
September 11 - 14: ASF Community over Code is happening in Minneapolis, Minnesota. It is for ASF members, committers, and open source developers from around the world, focusing on Search, Big Data, Internet of Things, Community, Geospatial, Financial Tech, and many other topics. Google Open Source's own Stephanie Taylor will be giving a talk on cultivating contributors through mentorship.
September 12 - 16: PyCon AU 2025 is happening in Narrm/Melbourne. It is the national conference for the Python programming community, bringing together professional, student and enthusiast developers, sysadmins and operations folk, students, educators, scientists, statisticians, and many others besides, all with a love for working with Python.
September 23 - 27: Nerderarla 2025 is happening in Buenos Aires. It is a 100% free, world-class event in Latin America with high-quality content in science and technology.
September 29 - 30: Git Merge 2025 celebrates 20 years of Git in Sunnyvale, California.
October 2 - 3: Monktoberfest is happening in Portland, Maine. The only conference focused on how craft, technology and social come together. It's one of the most unique events in the industry.

Open Source Reads and Links

[Blog Post] Gemini CLI comes to Zed - Built to be extensible, the Gemini CLI is showing itself as a boon to developers and contributors are giving it many new features. Open source itself, it is a perfect fit for the open source code editor Zed.
[Article] Why Morgan Stanley open sourced its app development tool - Compliance and security are huge issues in the financial sector. Even though open source has historically been a hard sell in that sector, Morgan Stanley saw the value in opening up their Common Architecture Language Model (CALM) and a desire for industry-wide standards helped it gain traction.
[Blog Post] What the EU's Cyber Resilience Act Means for Open Source - During the Open Source Summits in both North America and Europe, there were many discussions about what needs to happen amongst open source maintainers and enterprises with regards to the CRA.
[Blog Post] The Cyber Resilience Act: A Five Alarm Fire - On paper, the CRA makes sense. The world of software, after all, cannot be built indefinitely on a foundation that includes Munroe's "project thanklessly maintained by a single developer from Nebraska since 2003" as a load bearing component. Change was and is necessary, as are new incentives – and penalties.The question isn't, therefore, whether or not something like the CRA is necessary and inevitable. The question is whether or not the CRA as it is written today is the appropriate tool for the job. And after multiple briefings on the subject, it seems safe to say that the jury is still very much out on that subject.
[Blog Post] How much energy does Google's AI use? We did the math. - Concerned about the energy use of AI? The open access technical paper titled Measuring the environmental impact of delivering AI at Google Scale is available.

What exciting open source events and news are you hearing about? Let us know on our @GoogleOSS X account.

This Week in Open Source #8

Friday, August 15, 2025

This Week in Open Source for 08/15/2025

A look around the world of open source
by Daryl Ducharme & amanda casari, Google Open Source

Upcoming Events

August 14-16: Open Source Festival 2025 (OSCAFest'25) is happening in Lagos, Nigeria. It uses community to help integrate the act of open source contribution to African developers whilst strongly advocating the movement of free and open source software.
August 25-27: Open Source Summit Europe (OSSEU) is happening in Amsterdam, Netherlands. It is the premier event for the open source community to collaborate, share information, solve problems, and gain knowledge, furthering open source innovation and ensuring a sustainable open source ecosystem. Many Googlers will be there giving talks along with so many others.
September 5-7: NixCon 2025 is happening in Switzerland. It is the annual conference for the Nix and NixOS community where Nix enthusiasts learn, share, and connect with others.
September 9: Kubernetes Community Day 2025 SF Bay Area event, the ultimate gathering for cloud native enthusiasts! This full-day event, sponsored by the Cloud Native Computing Foundation (CNCF), is packed with insightful cloud native talks and unparalleled opportunities for community networking.
September 12 - 16: PyCon AU 2025 is happening in Narrm/Melbourne. It is the national conference for the Python programming community, bringing together professional, student and enthusiast developers, sysadmins and operations folk, students, educators, scientists, statisticians, and many others besides, all with a love for working with Python.

Open Source Reads and Links

[Article] Google Brings the A2A Protocol to More of Its Cloud - Last month, Google transferred the A2A protocol to the Linux Foundation and we are still continuing to improve it. Be it updating the spec, integrating it into Cloud Run and GKE we are still happy to see excitement about the future of this protocol.
[Book] OSPO Book - Open Source Programs Offices are an important part of connecting open source communities to your company (if we do say so ourselves). If you are an open source enthusiast who thinks they can start one in their company, here is a good guide from CNCF. There's also a github repo for it.
[Analysis] The RedMonk Programming Language Rankings: January 2025 - Redmonk's regular analysis of programming languages. Trends are remaining mostly steady across languages, which is an interesting trend of itself!
[Blog] One Event at a Time: Funding Your Community the Realistic Way - Great writeup, from a PSF Board member, advising event organizers in the Python community on developing responsible and sustainable funding plans for their community events.
Python Software Foundation News: The PSF has paused our Grants Program - The PSF is temporarily pausing their Grants Program after reaching their 2025 grant budget cap earlier than expected. While they know how important this program is to many in the community, this is a necessary step to protect both the future of the program and the short- and long-term sustainability of the PSF. (If this moves you immediately to donate to the PSF, we welcome your contributions via our donations page).

What exciting open source events and news are you hearing about? Let us know on our @GoogleOSS X account.

Google Summer of Code 2025: Contributor Statistics

Thursday, August 14, 2025

The Numbers Are In: A Deep Dive into GSoC 2025 Stats

Google Summer of Code (GSoC) is an online global program that introduces students and beginner developers to open source software development. For our 21st year of the program we welcomed 1280 Contributors from 68 countries who are coding for 185 Mentoring Organizations.

With the coding period starting June 2nd, GSoC contributors are focused on their 2025 projects alongside their Mentors and the thriving open source communities they are working with. We are excited to share some statistics about the accepted contributors in this year's program.

Accepted GSoC Contributors

92.32% are participating in their first GSoC
43.04% had not contributed to open source before GSoC 2025
89.02% are enrolled in an academic program

An infographic titled Google Summer of Code 2025: The numbers are in!. The image provides the following statistics:

Proposals: 23,000+ proposals were received from 15,000+ individual applicants, representing 130 countries.

Applicants: Over 96% of applicants were applying to GSoC for the first time.

Contributors: 89% of GSoC 2025 contributors are enrolled in an academic program.

Mentorship: The program has 2,100+ mentors from 75 countries and involves 185 open-source organizations.

Project Size: A bar chart shows the project size distribution:

Large (~350 hours): 54%

Medium (~175 hours): 42%

Small (~90 hours): 4%

Projects

53.68% of projects were large (~350 hours), 41.54% medium (~175 hours), 4% (~90 hour) projects
Currently, 77.9% of projects are the standard 12 weeks in length, with 18.3% extending their projects between 14-22 weeks.

Proposals

We got a whopping 15,240 applicants submitting proposals (an increase of 130% of our previous high - a new record!) from 130 countries. These folks submitted 23,559 proposals, a 159% increase over last year!

96.55% applied to GSoC for the first time in 2025

Registrations

We had a record 98,698 people registering from 172 countries for the 2025 program, an increase of 124.4% over the previous high.

Mentors

This summer, 185 open-source organizations are participating in GSoC. Their projects are supported by over 2,100 mentors from 75 countries. These dedicated volunteers guide new contributors, helping them hone their skills.

Many of these mentors are highly experienced. Almost two-thirds have mentored GSoC contributors for four or more years.

A big thank you for being part of this wonderful community and for helping to spread the word about GSoC, which offers an invaluable opportunity for all the individuals beginning their journey in Open Source. We'll keep you updated with future entries about GSoC 2025, stay tuned!

–by Stephanie Taylor, Mary Radomile & Lucila Ortiz, Google Open Source Team

This Week in Open Source #7

Friday, August 8, 2025

This Week in Open Source for 08/08/2025

A look around the world of open source
by Daryl Ducharme, Google Open Source

Upcoming Events

August 14-16: Open Source Festival 2025 (OSCAFest'25) is happening in Lagos, Nigeria. It uses community to help integrate the act of open source contribution to African developers whilst strongly advocating the movement of free and open source software.
August 25-27: Open Source Summit Europe (OSSEU) is happening in Amsterdam, Netherlands. It is the premier event for the open source community to collaborate, share information, solve problems, and gain knowledge, furthering open source innovation and ensuring a sustainable open source ecosystem. Many Googlers will be there giving talks along with so many others.
September 5-7: NixCon 2025 is happening in Switzerland. It is the annual conference for the Nix and NixOS community where Nix enthusiasts learn, share, and connect with others.

Open Source Reads and Links

The Asymmetry of Open Source - Open source software projects need funding, but users are not obligated to pay for them. Companies should invest in open source to maintain quality and avoid issues, while hobbyists can contribute without financial pressure. Proper boundaries and mutual responsibility between companies and developers are essential for a healthy open source ecosystem. How do we find and set those boundaries?
Linux Foundation Announces Intent to Form Developer Relations Foundation - The Linux Foundation has created the Developer Relations Foundation which aims to unify best practices and enhance the role of developer relations in technology. The DRF will focus on collaboration and shared knowledge. Having an open source organization behind this, helps to make sure DevRel is always of service to developers along with whoever is employing them.
5 tips to get started on accessibility - Not exactly open source and yet super important. So important to the open source community that All Things Open posted it on their site. Accessibility (A11y) is always useful. The more it gets used properly, the more useful it is for everyone.
Bringing open source development to Trust and Safety - Ever open source champion, former Googler and now COO at Roost, Anne Bertucio discusses how some teams still have a difficult time understanding open source. The standards that they are used to don't always occur within the transparent world of open source. This means, bringing open source to those teams requires understanding where they are coming from and discussing its limitations as well as its benefits.
How we made JSON.stringify more than twice as fast - One of the beautiful things about open source is the transparency in projects. Google's Chromium V8 engine is no exception. This walk through of the technical structuring that led to a faster JSON.stringify is a great way to learn some approaches to solving software bottlenecks that you may not have thought of. With it being open source, you can also visit the repository and follow along with the history of these code changes.

What exciting open source events and news are you hearing about? Let us know on our @GoogleOSS X account.

What's new in Apache Iceberg v3?

Thursday, August 7, 2025

A Deeper Dive into Apache Iceberg V3: How New Designs Are Solving Core Data Lake Challenges

The Next Chapter for Apache Iceberg: Welcoming the Iceberg V3 Spec
by Talat Uyarer, BigQuery Managed Iceberg & Shane Glass, Google Open Source Programs Office

The data community has long grappled with the challenge of how to bring database-like agility to petabyte-scale datasets stored in open cloud storage. The trade-off has often been between the scalability of data lakes and the performance and ease-of-use of traditional data warehouses. Executing fine-grained updates or evolving table schemas on massive tables often required slow, expensive, and disruptive operations.

The Apache Iceberg project is taking on this challenge. Early versions introduced a revolutionary metadata layer that brought reliability and ACID transactions to data lakes. However, certain operations still presented performance bottlenecks at scale.

With the ratification of the V3 specification, the Apache Iceberg community has introduced new designs that directly address these core issues. These advancements represent a significant leap forward in the mission to build an open and high-performance data lakehouse architecture. Let's explore the technical details of these solutions.

More Efficient Row-Level Transactions with Deletion Vectors

A primary challenge for data lakes has been handling row-level deletes efficiently. Previous approaches, like positional delete files, were a clever solution but could lead to performance degradation at query time when a reader had to reconcile many small delete files against large data files.

The Iceberg V3 spec introduces binary deletion vectors, a more performant and scalable architecture. The core idea is to attach a bitmap to each data file, where each bit corresponds to a row, marking it as deleted or not.

When a query engine reads a data file, it also reads its corresponding deletion vector. As it scans rows, it can check the bitmap with minimal overhead and skip rows marked for deletion. This design is made exceptionally efficient through the use of Roaring bitmaps. This data structure is ideal for this task because it can compress sparse sets of integers—like the positions of deleted rows—into a tiny footprint.

The practical difference is profound:

Previous Model (Positional Deletes): A query might involve reading a central log of deletes, like deletes.avro, containing tuples of (file_path, row_position).
V3 Model (Deletion Vectors): Each data file (e.g., file_A.parquet) is paired with a small, efficient sidecar file (e.g., file_A.puffin) containing a Roaring bitmap of its deleted rows.

This change localizes delete information, streamlines the read path, and dramatically improves the performance of workloads that rely on frequent Change Data Capture (CDC) or row-level updates.

Simplified Schema Evolution with Default Column Values

Another common operational hurdle in managing large tables has been schema evolution. Adding a column to a table with billions of rows traditionally required a "backfill"—a costly and time-consuming job to rewrite all existing data files to add the new column.

Iceberg V3 eliminates this friction with default column values. This feature allows a default value to be specified directly in the table's metadata when a column is added.

ALTER TABLE events ADD COLUMN version INT DEFAULT 1;

This operation is instantaneous because it only modifies metadata. No data files are touched. When a query engine encounters an older data file without the version column, it consults the table schema, finds the default value, and seamlessly populates it in the query results on the fly. This simple but powerful mechanism makes schema evolution a fast, non-disruptive operation, allowing data models to evolve quickly.

Improved Query Engine Compatibility with Enhanced Data Types and Lineage

Beyond these headline features, V3 broadens the capabilities of Iceberg to support more advanced use cases:

Row-Level Lineage: For robust auditing and reliable CDC pipelines, V3 formalizes the tracking of row history. By embedding metadata about when a row was added or last modified, Iceberg tables can now provide a clear lineage, simplifying data governance and enabling more efficient downstream data replication.
Rich Data Types: V3 closes the gap with traditional databases by introducing a more expressive type system. This includes a VARIANT type for handling semi-structured data like JSON, native GEOMETRY and GEOGRAPHY types for advanced geospatial analysis, support for nanosecond-precision timestamps with the new timestamp_ns and timestamptz_ns data types, a significant increase from the previous microsecond limit.

Building the Future of the Open Data Lakehouse

These V3 features—deletion vectors, default values, row lineage, and richer types—are more than just individual improvements. Together, they represent a cohesive step toward a new paradigm where the lines between the data lake and the data warehouse are erased. They enable faster, more efficient, and more flexible data operations than previously thought possible.

This progress is a testament to the collaborative spirit of the Apache Iceberg community. At Google, we are proud to contribute to and support open-source projects like Iceberg that are defining the future of data architecture. We are excited to see the innovative applications the community will build on this powerful new foundation.

Want to get started with Iceberg? Check out this blog post to learn more about how Google Cloud's managed Iceberg offering, BigLake tables for Apache Iceberg in BigQuery, makes building Iceberg-native lakehouses easier by maximizing performance without sacrificing governance.