This Week in Open Source #13

Friday, January 23, 2026

This Week in Open Source for January 23, 2026

A look around the world of open source

Can you believe we're already wrapping up the first month of the year? The open source ecosystem is buzzing with activity, from the upcoming community gatherings at FOSDEM in Brussels to new conversations around AI standards and cloud flexibility.

Google Open Source believes that "a community is a garden, not a building". It requires constant tending to thrive. This week, we're looking at how we can all contribute to that growth—whether it's by securing the software supply chain, standardizing AI agents, or simply learning from the legends of our field like Linus Torvalds.

Dive in to see what's happening this week in open source!

Upcoming Events

  • January 29: CHAOSScon Europe 2026 is co-located with FOSDEM in Brussels, Belgium. The conference focuses on open source project health, CHAOSS updates, and use cases, and offers hands-on workshops for developers, community managers, project managers, and anyone interested in measuring open source project health. It also shares insights from the CHAOSS context working groups, including OSPOs, University Open Source, and Open Source in Science and Research.
  • January 31 - February 1: FOSDEM 2026 is happening at the Université Libre de Bruxelles in Brussels, Belgium. It is a free event for software developers to meet, share ideas and collaborate. Every year, thousands of developers of free and open source software from all over the world gather at the event in Brussels.
  • February 24 - 25: The Linux Foundation Member Summit is happening in Napa, California. It is the annual gathering for Linux Foundation members that fosters collaboration, innovation, and partnerships among the leading projects and organizations working to drive digital transformation with open source technologies.
  • March 5 - 8: SCALE 23x is happening in Pasadena, California. It is North America's largest community-run open source conference and includes four days of sessions, workshops, and community activities focused on open source, security, DevOps, cloud native, and more.
  • March 9 - 10: FOSSASIA Summit 2026 is happening in Bangkok, Thailand. It will be a two-day hybrid event that showcases the latest in open technologies, fostering collaboration across enterprises, developers, educators, and communities.

Open Source Reads and Links

  • [Article] The state of trusted open source - This review of the state of trusted open source report highlights many statistics. One of the most interesting: vulnerabilities most often hide in the smaller dependencies of the larger projects we might be focused on. What does this mean for your approach to security? How should various open source communities deal with this?
  • [Blog] Software Heritage Archive recognized as a digital public good - As the Software Heritage Archive celebrates its 10th anniversary, the Archive has scaled to protect over 27 billion unique source files, even solving the "2PB problem" by deploying protocols that compressed 78TB of graph data into a 3TB research dataset. This ensures that humanity's executable history remains a global commons rather than a proprietary secret, aligning with our belief at Google that Code is for today, Open Source is forever.
  • [Blog] Agent Definition Language: The open standard AI agents have been missing - The Agent Definition Language (ADL) creates a clear, shared way to describe AI agents so they work well across different systems. This helps teams understand what agents do, how they behave, and how to govern them safely. As an open standard, ADL makes AI agents easier to build, review, and share in the open-source community.
  • [Blog] AI Agent Engineering in Go with the Google ADK - AI, agents, and the related protocols touch on many open source projects. This post gives you a hands-on technical walkthrough of the Agent Starter Pack. By following it, you'll learn how to build, test, and securely deploy a Go AI agent using Google Cloud services.
  • [Article] How Kubernetes Broke the AWS Cloud Monopoly - Before Kubernetes, companies felt locked into AWS because of its unique APIs. Kubernetes allowed apps to run on any cloud, giving users more choice and helping other cloud providers grow. This has made multi-cloud the way forward for many enterprises. Are you utilizing a multi-cloud strategy? Has Kubernetes helped you get there?
  • [Article] Even Linux Creator Linus Torvalds is Using AI to Code in 2026 - Opinions vary on where, and whether, AI is useful. One place it has shown the greatest benefit is as a tool for writing code. It seems Linus Torvalds has started to use it to assist with part of his AudioNoise side project. What a good way to find out how AI can best work for oneself. How have you been using AI with your code?

What exciting open source events and news are you hearing about? Let us know on our @GoogleOSS X account or our new @opensource.google Bluesky account.

A JSON schema package for Go

Wednesday, January 21, 2026

JSON Schema is a specification for describing JSON values that has become a critical part of LLM infrastructure. We recently released github.com/google/jsonschema-go/jsonschema, a comprehensive JSON Schema package for Go. We use it in the official Go SDK for MCP and expect it to become the canonical JSON Schema package for Google's Go SDKs that work with LLMs.

JSON Schema has been around for many years. Why are we doing this now, and what do LLMs have to do with it?

JSON is a flexible way to describe values. A JSON value can be null, a string, a number, a boolean, a list of values, or a mapping from strings to values. In programming language terms, JSON is dynamically typed. For example, a JSON array can contain a mix of strings, numbers, or any other JSON value. That flexibility can be quite powerful, but sometimes it's useful to constrain it. Think of JSON Schema as a type system for JSON, although its expressiveness goes well beyond typical type systems. You can write a JSON schema that requires all array elements to be strings, as you could in a typical programming language type system, but you can also constrain the length of the array or insist that its first three elements are strings of length at least five while the remaining elements are numbers.
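
As a concrete illustration, here is one way to write that last constraint in the 2020-12 draft of the specification: prefixItems pins down the first three elements by position, and items then applies to every element after them.

{
  "type": "array",
  "prefixItems": [
    {"type": "string", "minLength": 5},
    {"type": "string", "minLength": 5},
    {"type": "string", "minLength": 5}
  ],
  "items": {"type": "number"}
}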

The ability to describe the shape of JSON values like that has always been useful, but it is vital when trying to coax JSON values out of LLMs, whose output is notoriously hard to constrain. JSON Schema provides an expressive and precise way to tell an LLM how its JSON output should look. That's particularly useful for generating inputs to tools, which are usually ordinary functions with precise requirements on their input. It also turns out to be useful to describe a tool's output to the LLM. So frameworks like MCP use JSON Schema to specify both the inputs to and outputs from tools. JSON Schema has become the lingua franca for defining structured interactions with LLMs.

Requirements for a JSON Schema package

Before writing our own package, we took a careful look at the existing JSON Schema packages; we didn't want to reinvent the wheel. But we couldn't find one that had all the features that we felt were important:

  1. Schema creation: A clear, easy-to-use Go API to build schemas in code.
  2. Serialization: A way to convert a schema to and from its JSON representation.
  3. Validation: A way to check whether a given JSON value conforms to a schema.
  4. Inference: A way to generate a JSON Schema from an existing Go type.

We looked at a number of existing packages, but it didn't seem feasible to cobble together what we needed from multiple packages, so we decided to write our own.

A Tour of jsonschema-go

A simple, open Schema struct

At the core of the package is a straightforward Go struct that directly represents the JSON Schema specification. This open design means you can create complex schemas by writing a struct literal:

var schema = &jsonschema.Schema{
  Type:        "object",
  Description: "A simple person schema",
  Properties: map[string]*jsonschema.Schema{
    "name": {Type: "string"},
    "age": {Type: "integer", Minimum: jsonschema.Ptr(0.0)},
  },
  Required: []string{"name"},
}

A Schema will marshal to a valid JSON value representing the schema, and any JSON value representing a schema can be unmarshaled into a Schema.

The Schema struct has fields for all of the standard JSON Schema keywords defined in popular specification drafts. To handle additional keywords not present in the specification, Schema includes an Extra field of type map[string]any.
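
Here's a minimal sketch of that round trip, reusing the schema literal from above; only the standard library's encoding/json is assumed:

// Marshal the Schema to its JSON representation.
data, err := json.Marshal(schema)
if err != nil {
  return err
}

// Any JSON value representing a schema unmarshals back into a Schema.
var roundTrip jsonschema.Schema
if err := json.Unmarshal(data, &roundTrip); err != nil {
  return err
}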

Validation and resolution

Before using a schema to validate JSON values, the schema itself must be validated, and its references to other schemas must be followed so that those schemas can themselves be checked. We call this process resolution. Calling Resolve on a Schema returns a jsonschema.Resolved, an opaque representation of a valid schema optimized for validation. Resolved.Validate accepts almost any value that can be obtained from calling json.Unmarshal: null, basic types like strings and numbers, []any, and map[string]any. It returns an error describing all the ways in which the value fails to satisfy the schema.

// Resolve checks the schema itself and follows its references.
rs, err := schema.Resolve(nil)
if err != nil {
  return err
}
// Validate the kind of value that json.Unmarshal produces.
err = rs.Validate(map[string]any{"name": "John Doe", "age": 20})
if err != nil {
  fmt.Printf("validation failed: %v\n", err)
}

Originally, Validate accepted a Go struct. We removed that feature because it is not possible to validate some schemas against a struct. For example, if a struct field has a non-pointer type, there is no way to determine whether the corresponding key was present in the original JSON, so there is no way to enforce the required keyword.

Inference from Go types

While it's always possible to create a schema by constructing a Schema value, it's often convenient to create one from a Go value, typically a struct. This operation, which we call inference, is provided by the functions For and ForType. Here is For in action:

type Person struct {
    Name string `json:"name" jsonschema:"person's full name"`
    Age  int    `json:"age,omitzero"`
}

schema, err := jsonschema.For[Person](nil)

/* schema is:
{
    "type": "object",
    "required": ["name"],
    "properties": {
        "age":  {"type": "integer"},
        "name": {
            "type": "string",
            "description": "person's full name"
        }
    },
    "additionalProperties": false
}
*/

For gets information from struct field tags. As this example shows, it uses the name in the json tag as the property name, and interprets omitzero or omitempty to mean that a field is optional. It also looks for a jsonschema tag to get property descriptions. (We considered adding support for other keywords to the jsonschema tag as some other packages do, but that quickly gets complicated. We left an escape hatch in case we decide to support other keywords in the future.)

ForType works the same way, but takes a reflect.Type. It's useful when the type is known only at runtime.
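
A quick sketch, assuming ForType accepts the same options argument as For (nil here) along with the reflect.Type:

// Person is the struct from the previous example; its type is
// discovered at runtime via the standard library's reflect package.
t := reflect.TypeOf(Person{})

schema, err := jsonschema.ForType(t, nil)
if err != nil {
  return err
}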

A foundation for the Go community

By providing a high-quality JSON Schema package, we aim to strengthen the entire Go ecosystem for AI applications (and, indeed, any application that needs to validate JSON). This library is already a critical dependency for Google's own AI SDKs, and we're committed to its long-term health. We welcome external contributions, whether they are bug reports, bug fixes, performance enhancements, or support for additional JSON Schema drafts. Before beginning work, please file an issue on our issue tracker.

Mentor Org Applications for Google Summer of Code 2026 open through Feb 3

Monday, January 19, 2026

Attention open source enthusiasts! Mentoring organization applications for Google Summer of Code (GSoC) 2026 are officially open. This is your opportunity to guide students and developers early in their careers. The application window begins today, Monday, January 19th, and will remain open until February 3, 2026, at 18:00 UTC.

To find more information about the process of becoming a mentor organization, please review our official GSoC site. We also recommend consulting the Mentor Guide and the GSoC Organization Admin Tips, as both provide tips for preparing your community and strengthening your application.

GSoC welcomes a wide variety of open source projects working in AI/ML, security, cloud, development tools, science, medicine, data, media, and more! For 2026, we are looking for even more innovative projects working on Artificial Intelligence/Machine Learning and Security.

Requirements for GSoC Mentoring Organizations:

  1. An established open source project with at least 18 months of history
  2. Software produced and released under an Open Source Initiative (OSI)-approved license
  3. A robust community with members who are enthusiastic and prepared to mentor GSoC participants
  4. An active project characterized by regular engagement, rather than infrequent contributions
  5. A comprehensive list of Project Ideas (refer to the mentor guide for best practices)
  6. A clear grasp of GSoC objectives and program rules
  7. A high-quality application that provides a detailed explanation of your project and its specific goals

2026 Mentoring Organizations will be announced on February 19 at 18:00 UTC*.

For first-time organizations interested in participating, we strongly suggest getting a referral from an experienced organization that thinks your project is a good fit.

Google Summer of Code: Organizations Apply

Please visit the GSoC site for even more information on how to apply, and review the detailed timeline for important deadlines this year. We also recommend reading the help page on our website for easy access to the most important resources for applicants.

We look forward to seeing your organization applications and learning more about your communities!

*Interested GSoC Contributor? After mentoring organizations are announced you can (and should!) begin researching each organization and reviewing project ideas to find the community that fits your interests. GSoC contributor applications are open from March 16-31.

Explore public datasets with Apache Iceberg & BigLake

Wednesday, January 14, 2026

A vintage-style illustration titled THE PUBLIC DATASETS OF APACHE ICEBERG shows a man in a boat named BigLake Explorer viewing a large iceberg.

The promise of the Open Data Lakehouse is simple: your data should not be locked into a single engine. It should be accessible, interoperable, and built on open standards. Today, we are taking a major step forward in making that promise a reality for developers, data engineers, and researchers everywhere.

We are thrilled to announce the availability of high-quality Public Datasets served via the Apache Iceberg REST Catalog. Hosted on Google Cloud's BigLake, these datasets are available for read-only access to anyone with a Google Cloud account.

Whether you are using Apache Spark, Trino, Flink, or BigQuery, you can now connect to a live, production-grade Iceberg Catalog and start querying data immediately. No copying files, no managing storage buckets. Just configure your catalog and query.

How to Access Public Datasets

This initiative is designed to be engine-agnostic. We provide the storage and the catalog; you bring the compute. This allows you to benchmark different engines, test new Iceberg features, or simply explore interesting data without setting up infrastructure or finding data to ingest.

How to Connect with Apache Spark

You can connect to the public dataset using any standard Spark environment (local, Google Cloud Dataproc, or other vendors). You only need to point your Iceberg catalog configuration to our public REST endpoint.

Prerequisites:

  • A Google Cloud Project (for authentication).
  • Standard Google Application Default Credentials (ADC) set up in your environment.

Spark Configuration:

Use the following configuration flags when starting your Spark Shell or SQL session. This configures a catalog named bqms (BigQuery Metastore) pointing to our public REST endpoint.

PROJECT_ID=<YOUR_PROJECT_ID>

  spark-sql \
    --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.10.0,org.apache.iceberg:iceberg-gcp-bundle:1.10.0 \
    --conf spark.hadoop.hive.cli.print.header=true \
    --conf spark.sql.catalog.bqms=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.bqms.type=rest \
    --conf spark.sql.catalog.bqms.uri=https://biglake.googleapis.com/iceberg/v1/restcatalog \
    --conf spark.sql.catalog.bqms.warehouse=gs://biglake-public-nyc-taxi-iceberg \
    --conf spark.sql.catalog.bqms.header.x-goog-user-project=$PROJECT_ID \
    --conf spark.sql.catalog.bqms.rest.auth.type=google \
    --conf spark.sql.catalog.bqms.io-impl=org.apache.iceberg.gcp.gcs.GCSFileIO \
    --conf spark.sql.catalog.bqms.header.X-Iceberg-Access-Delegation=vended-credentials \
    --conf spark.sql.defaultCatalog=bqms

Note: Replace <YOUR_PROJECT_ID> with your actual Google Cloud Project ID. This is required for the REST Catalog to authenticate your quota usage, even for free public access.
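
Once the shell starts, a quick way to confirm the connection is to list the tables in the namespace that the sample queries below use:

SHOW TABLES IN bqms.public_data;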

Exploring the Data: Sample Queries

Once connected, you have full SQL access to the datasets. We are launching with the classic NYC Taxi dataset, modeled as an Iceberg table to showcase partitioning and metadata capabilities.

1. The "Hello World" of Analytics

This query aggregates millions of records to find the average fare and trip distance by passenger count. It demonstrates how Iceberg efficiently scans data files without needing to list directories.

SELECT 
    passenger_count,
    COUNT(1) AS num_trips,
    ROUND(AVG(total_amount), 2) AS avg_fare,
    ROUND(AVG(trip_distance), 2) AS avg_distance
FROM 
    bqms.public_data.nyc_taxicab
WHERE 
    data_file_year = 2021
    AND passenger_count > 0
GROUP BY 
    passenger_count
ORDER BY 
    num_trips DESC;

What this demonstrates:

  • Partition Pruning: The query filters on data_file_year, allowing the engine to skip scanning data from other years entirely.
  • Vectorized Reads: Engines like Spark can process the Parquet files efficiently in batches.

2. Time Travel: Auditing Data History

One of Iceberg's most powerful features is Time Travel. You can query the table as it existed at a specific point in the past.

-- Compare the row count of the current version vs. a specific snapshot
SELECT 
    'Current State' AS version, 
    COUNT(*) AS count 
FROM bqms.public_data.nyc_taxicab
UNION ALL
SELECT 
    'Past State' AS version, 
    COUNT(*) AS count 
FROM bqms.public_data.nyc_taxicab VERSION AS OF 2943559336503196801;

Description:

This query allows you to audit changes. By querying the history metadata table (e.g., SELECT * FROM bqms.public_data.nyc_taxicab.history), you can find snapshot IDs and "travel back" to see how the dataset grew over time.
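
For instance, here is a sketch of that snapshot lookup, using the column names from Iceberg's standard history metadata table:

SELECT made_current_at, snapshot_id, parent_id, is_current_ancestor
FROM bqms.public_data.nyc_taxicab.history
ORDER BY made_current_at;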

Coming Soon: An Iceberg V3 Playground

We are not just hosting static data; we are building a playground for the future of Apache Iceberg. We plan to release new datasets specifically designed to help you test Iceberg V3 Spec features.

Start Building Today

The goal of these public datasets is to lower the barrier to entry. You don't need to manage infrastructure to learn Iceberg; you just need to connect. Whether you are a data analyst, data scientist, data engineer, or data enthusiast, today you can:

  • Use BigQuery (via BigLake) to query these tables directly using SQL, combining them with your private data.
  • Test your OSS engine (e.g., Spark, Trino, or Flink) configurations against a live REST Catalog.

Start building an open, managed, and high-performance Iceberg lakehouse to enable advanced analytics and data science with https://cloud.google.com/biglake today!

Happy Querying!

This Week in Open Source #12

Friday, January 9, 2026

This Week in Open Source for January 9, 2026

A look around the world of open source

Here we are at the beginning of a new year. What will it bring to the open source world? What new projects will be started? What should we be focusing on? What is your open source resolution for 2026? One of ours is to better connect with various open source communities on social media. We've gotten off to a big start by launching an official Google Open Source account on Bluesky. Already, we are enjoying the community there.

Upcoming Events

  • January 21 - 23: Everything Open 2026 is happening in Canberra, Australia. Everything Open is a conference focused on open technologies, including Linux, open source software, open hardware and open data, and the communities that surround them. The conference provides technical deep-dives as well as updates from industry leaders and experts on a wide array of topics from these areas.
  • January 29: CHAOSScon Europe 2026 is co-located with FOSDEM in Brussels, Belgium. The conference focuses on open source project health, CHAOSS updates, and use cases, and offers hands-on workshops for developers, community managers, project managers, and anyone interested in measuring open source project health. It also shares insights from the CHAOSS context working groups, including OSPOs, University Open Source, and Open Source in Science and Research.
  • January 31 - February 1: FOSDEM 2026 is happening at the Université Libre de Bruxelles in Brussels, Belgium. It is a free event for software developers to meet, share ideas and collaborate. Every year, thousands of developers of free and open source software from all over the world gather at the event in Brussels.
  • February 24 - 25: The Linux Foundation Member Summit is happening in Napa, California. It is the annual gathering for Linux Foundation members that fosters collaboration, innovation, and partnerships among the leading projects and organizations working to drive digital transformation with open source technologies.

Open Source Reads and Links

  • [Talk] State of the Source at ATO 2025: State of the "Open" AI - At the end of last year, the Open Source Initiative published a summary of Gabriel Toscano's talk at All Things Open. In the talk, he discusses how AI models call themselves "open" but often lack the legal or technical freedoms that true open source requires. An analysis of ~20,000 Hugging Face models found that Apache 2.0 and MIT licenses are common, but many models have no license or use restrictive custom terms. The study warns that inconsistent labeling and mutable restrictions muddy openness, and urges clearer licensing and platform checks.
  • [Article] The Reality of Open Source: More Puppies, Less Beer - Bitnami's removal of popular containers last year shows that open source can suddenly change and disrupt users. Organizations must evaluate who funds and maintains each open source component, not just the code. Plan for business continuity, supply-chain visibility, and the ability to fork or replace critical components.
  • [Blog] The Open Source Community and U.S. Public Policy - The Open Source Initiative is increasing its U.S. policy work to ensure open source developers are part of technology and AI rulemaking. Since policymakers often lack deep knowledge of open source, the community must explain how shared code differs from deployed systems. Joining groups like the Open Policy Alliance helps nonprofits engage and influence policy.
  • [Article] Pebble, the e-ink smartwatch that refuses to die, just went fully open source - Pebble, the e-ink smartwatch with a tumultuous history, is making a move sure to please the DIY enthusiasts that make up the bulk of its fans: Its entire software stack is now fully open source, and key hardware design files are available too.
  • [Article] Forget Predictions: Tech Leaders' Actual 2026 Resolutions - We want to know your open source resolutions and perhaps these resolutions from some tech leaders (open source and otherwise) can point you in a direction. Their plans run the gamut of securing and managing AI responsibly, reducing noise in security data, and creating healthier tech habits. The common theme is intentional, measurable change over speculation.
  • [Paper] Everything is Context: Agentic File System Abstraction for Context Engineering - GenAI systems may produce inaccurate or misleading outputs due to limited contextual awareness and evolving data sources. Thus mechanisms are needed to govern how persistent knowledge transitions into bounded context in a traceable, verifiable, and human-aware manner, ensuring that human judgment and knowledge are embedded within the system's evolving context for reasoning and evaluation.

    The paper proposes using a file-system abstraction based on the open-source AIGNE framework to manage all types of context for generative AI agents. This unified infrastructure makes context persistent, traceable, and governed so agents can read, write, and version memory, tools, and human input.

What exciting open source events and news are you hearing about? Let us know on our @GoogleOSS X account or our new @opensource.google Bluesky account.
