The Apache Iceberg project has just released version 1.11.0, and a lot has happened since the last release.
Iceberg 1.11.0 adds support for Apache Spark 4.1 and Apache Flink 2.1, the latest releases of the two engines, and makes both the default build targets.
The rest of the changes are more structural. The REST catalog learns to plan scans server-side, shifting metadata work off the query engine. A new partition statistics scan API gives optimizers a clean, supported way to read a table's shape. Built-in table encryption arrives with envelope encryption and Google Cloud KMS support. And integration with Google's GCS Analytics Core library speeds up Iceberg reads from Google Cloud Storage.
Let's take a look at some of the biggest changes.
Spark & Flink Updates
As Spark and Flink move forward, the 1.11.0 release keeps pace with support for new versions of both.
- Spark 4.1 & DSv2 Migration: The headline feature that Spark 4.1 unlocks is MERGE INTO with automatic schema evolution. Spark's newer MERGE syntax accepts a WITH SCHEMA EVOLUTION clause, so a MERGE whose source carries columns the target table lacks can add those columns to the table within the same statement, with no separate ALTER TABLE round trip. Beyond the version bump, the 1.11 Spark connector also modernizes against Spark's newer DataSource V2 APIs and adds an asynchronous micro-batch planner that speeds up Structured Streaming.
- Flink Ecosystem Updates: Initial work for Flink 2.1 support has landed in the core repository, continuing Iceberg's commitment to first-class, low-latency streaming sinks. The centerpiece of the Flink work is the DynamicIcebergSink, an experimental sink that breaks the old one-sink-per-table model: a single sink routes each record to a table chosen at runtime, creating tables on demand and evolving their schemas and partition specs on the fly as the input changes, including dropping columns once you opt in with dropUnusedColumns. Beyond the DynamicIcebergSink, Flink also gains support for the V3 spec's nanosecond-precision timestamp, variant, and unknown types.
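To make the routing behavior concrete, here is a toy, in-memory model of it in Python. It is not the DynamicIcebergSink API (which is Java and runs inside a Flink job); the DynamicRouter and Table classes, the route callback, and the drop_unused_columns flag are hypothetical names chosen for illustration, and a schema is reduced to a list of column names.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy stand-in for an Iceberg table: a name, its columns, and its rows."""
    name: str
    columns: list
    rows: list = field(default_factory=list)


class DynamicRouter:
    """Hypothetical sketch of per-record routing; NOT the DynamicIcebergSink API."""

    def __init__(self, route, drop_unused_columns=False):
        self.route = route  # record -> table name, decided at runtime per record
        self.drop_unused_columns = drop_unused_columns  # mirrors the opt-in behavior
        self.tables = {}

    def write(self, record):
        name = self.route(record)
        table = self.tables.get(name)
        if table is None:
            # Create the table on first sight, using the record's fields as its schema.
            table = self.tables[name] = Table(name, list(record))
        else:
            self._evolve(table, record)
        table.rows.append({col: record.get(col) for col in table.columns})

    def _evolve(self, table, record):
        # Add any column the incoming record carries that the table lacks.
        for col in record:
            if col not in table.columns:
                table.columns.append(col)
        # Drop columns missing from the record only when explicitly enabled.
        if self.drop_unused_columns:
            table.columns = [c for c in table.columns if c in record]


router = DynamicRouter(route=lambda r: "events_" + r["type"])
router.write({"type": "click", "user": "a"})
router.write({"type": "click", "user": "b", "page": "/home"})
router.write({"type": "view", "user": "c"})
print(router.tables["events_click"].columns)  # ['type', 'user', 'page']
```

In this sketch existing rows are never rewritten when a column is added, which mirrors how Iceberg schema evolution adds columns without rewriting data files.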
Server-side scan planning
In previous versions of Iceberg, the client handled the heavy lifting of scan planning. The engine's driver would traverse the table's metadata tree, fetching manifest lists and manifests from object storage and filtering data files against the query's partition predicates. Iceberg 1.11.0 shifts this computational burden into the catalog through server-side scan planning.
Instead of traversing manifests itself, the engine submits a single POST …/plan request describing the scan, and the REST catalog returns the matching FileScanTasks.
The API is designed to scale with the size of the scan: small scans return results immediately, longer-running plans return a plan-id for the client to poll, and very large plans are split into plan-tasks that clients fetch in parallel through POST …/tasks.
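The client side of this flow can be sketched as a small loop over the three response shapes described above. This is a hypothetical Python sketch, not a real client library: submit_plan, poll_plan, and fetch_tasks are injected callables standing in for the HTTP calls, and the response keys (status, plan-id, plan-tasks, file-scan-tasks) are modeled on the REST catalog specification but should be treated as assumptions here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def plan_scan(submit_plan, poll_plan, fetch_tasks, poll_interval=0.5, max_workers=8):
    """Collect file scan tasks from a server-side plan (hypothetical client sketch).

    submit_plan(): sends the scan description (the POST …/plan call), returns a dict.
    poll_plan(plan_id): re-checks an asynchronous plan, returns a dict.
    fetch_tasks(plan_task): fetches one plan-task (the POST …/tasks call), returns a dict.
    """
    response = submit_plan()
    # Slow plans come back as "submitted" with a plan-id that the client polls.
    while response.get("status") == "submitted":
        time.sleep(poll_interval)
        response = poll_plan(response["plan-id"])
    if response.get("status") != "completed":
        raise RuntimeError(f"scan planning did not complete: {response.get('status')}")

    file_tasks = list(response.get("file-scan-tasks", []))
    pending = list(response.get("plan-tasks", []))
    # Large plans are split into plan-tasks fetched in parallel; a fetched
    # plan-task may itself return further plan-tasks, so loop until none remain.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            results = list(pool.map(fetch_tasks, pending))
            pending = []
            for result in results:
                file_tasks.extend(result.get("file-scan-tasks", []))
                pending.extend(result.get("plan-tasks", []))
    return file_tasks


# In-memory fakes standing in for the catalog, for illustration only.
task_responses = {
    "t1": {"file-scan-tasks": ["f1"], "plan-tasks": ["t3"]},
    "t2": {"file-scan-tasks": ["f2"]},
    "t3": {"file-scan-tasks": ["f3"]},
}
tasks = plan_scan(
    submit_plan=lambda: {"status": "submitted", "plan-id": "p1"},
    poll_plan=lambda plan_id: {"status": "completed", "file-scan-tasks": ["f0"],
                               "plan-tasks": ["t1", "t2"]},
    fetch_tasks=task_responses.__getitem__,
    poll_interval=0,
)
print(tasks)  # ['f0', 'f1', 'f2', 'f3']
```

Injecting the transport keeps the control flow testable: the fakes above exercise the polling path and a plan-task that fans out into another plan-task.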
Built-in table encryption
As data lakes increasingly serve as the central hub for sensitive PII and financial data, relying solely on bucket-level storage encryption is no longer enough. Iceberg 1.11.0 introduces built-in table encryption, bringing fine-grained, KMS-backed security directly to the table level.
This provides data platform teams with robust capabilities for security and compliance:
- Zero-Trust Storage Security: Even if a malicious actor gains direct access to your underlying object storage bucket, the data remains unreadable without access to the keys held in your KMS.
- Total Index Protection: It isn't just the raw data that is protected; Iceberg encrypts the manifest lists as well, preventing attackers from inferring sensitive information from table statistics.
- Tamper-Proof Data: Authentication tags produced by AES-GCM detect unauthorized modifications, protecting data integrity.
- Effortless Key Rotation: Keys are rotated automatically as they age, satisfying strict compliance mandates without requiring you to rewrite massive datasets.
Iceberg achieves this using envelope encryption with a three-tier key hierarchy. A table master key lives securely in your KMS and never touches Iceberg storage. This master key wraps key-encryption keys (KEKs), which are stored safely inside the table metadata. Finally, each KEK wraps a unique, per-file data-encryption key (DEK).
Every data file and manifest list is then encrypted with AES-GCM under its own unique DEK. Because the KMS is only asked to wrap and unwrap small keys, bulk encryption and decryption happen locally, which keeps the performance overhead low for Iceberg workloads.
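The three-tier hierarchy can be sketched in a few dozen lines of Python. To keep it runnable with only the standard library, seal and open_sealed are a toy encrypt-then-MAC construction (an HMAC-SHA256 keystream plus an HMAC tag) standing in for AES-GCM; they are not AES-GCM and must not be used to protect real data. ToyKms, encrypt_file, decrypt_file, and the metadata keys are hypothetical names, and the sketch uses a single KEK without modeling rotation.

```python
import hashlib
import hmac
import secrets

# WARNING: seal/open_sealed are a toy stand-in for AES-GCM so this sketch runs on
# the standard library alone. Do not use them to protect real data.


def _subkeys(key):
    # Derive separate encryption and MAC keys from one key.
    return (hmac.new(key, b"enc", hashlib.sha256).digest(),
            hmac.new(key, b"mac", hashlib.sha256).digest())


def _xor_keystream(key, nonce, data):
    # XOR data with an HMAC-SHA256 counter-mode keystream (same call encrypts and decrypts).
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key, plaintext):
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(12)
    body = _xor_keystream(enc_key, nonce, plaintext)
    tag = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def open_sealed(key, blob):
    enc_key, mac_key = _subkeys(key)
    nonce, body, tag = blob[:12], blob[12:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication tag mismatch: data was modified")
    return _xor_keystream(enc_key, nonce, body)


class ToyKms:
    """Stand-in for a KMS: master keys stay inside; callers can only wrap/unwrap."""

    def __init__(self):
        self._master_keys = {}

    def create_key(self, key_id):
        self._master_keys[key_id] = secrets.token_bytes(32)

    def wrap(self, key_id, key):
        return seal(self._master_keys[key_id], key)

    def unwrap(self, key_id, wrapped):
        return open_sealed(self._master_keys[key_id], wrapped)


def encrypt_file(kms, table_meta, data):
    # Tier 2: a KEK, stored in table metadata only in wrapped form.
    if "wrapped-kek" not in table_meta:
        table_meta["wrapped-kek"] = kms.wrap(table_meta["master-key-id"], secrets.token_bytes(32))
    kek = kms.unwrap(table_meta["master-key-id"], table_meta["wrapped-kek"])
    # Tier 3: a fresh DEK per file, wrapped by the KEK and kept with the file's entry.
    dek = secrets.token_bytes(32)
    return {"wrapped-dek": seal(kek, dek), "ciphertext": seal(dek, data)}


def decrypt_file(kms, table_meta, encrypted):
    kek = kms.unwrap(table_meta["master-key-id"], table_meta["wrapped-kek"])
    dek = open_sealed(kek, encrypted["wrapped-dek"])
    return open_sealed(dek, encrypted["ciphertext"])


kms = ToyKms()
kms.create_key("table-master-key")  # tier 1: never leaves the KMS
meta = {"master-key-id": "table-master-key"}
encrypted = encrypt_file(kms, meta, b"card=4111-1111")
print(decrypt_file(kms, meta, encrypted))  # b'card=4111-1111'
```

Because each file's body is encrypted under its own DEK and only small wrapped keys sit above it, rotating a wrapping key in an envelope scheme touches those small keys rather than the encrypted file bodies.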
File Format API
Historically, Iceberg's format-handling code was tightly coupled, growing organically around Parquet, Avro, and ORC. Adding a new format or enforcing consistent feature support (like V3 default values or new column types) across all formats meant duplicating complex engine-specific switch/case code paths.
Iceberg 1.11.0 introduces the finalized File Format API, bringing a consistent API to reading and writing all of these file formats.
Instead of each engine hardcoding how it reads and writes every format, the architecture introduces:
- FormatModel: A standardized implementation defining how a file format handles reader/writer construction and its specific capabilities.
- FormatModelRegistry: A central registry from which query engines look up the appropriate read and write builders.
This API (which is already seeing adoption across other Apache Iceberg implementations) is a significant code cleanup for the project, and it opens the door to more file formats over time.
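A minimal Python analogue of the registry pattern follows. Iceberg's File Format API is Java and its builders are much richer; the FormatModel fields, the get method, and keying models by (file format, object model) are assumptions made for this sketch, and jsonl is a made-up format used only to show that adding a format means registering one more model.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class FormatModel:
    """Hypothetical analogue of a format model: how one file format reads and
    writes one in-memory object model, plus the features it supports."""
    file_format: str
    object_model: str
    reader: Callable
    writer: Callable
    capabilities: FrozenSet[str] = frozenset()


class FormatModelRegistry:
    """Central lookup that engines query instead of hardcoding per-format code."""

    def __init__(self):
        self._models: Dict[Tuple[str, str], FormatModel] = {}

    def register(self, model):
        key = (model.file_format, model.object_model)
        if key in self._models:
            raise ValueError(f"a model is already registered for {key}")
        self._models[key] = model

    def get(self, file_format, object_model):
        try:
            return self._models[(file_format, object_model)]
        except KeyError:
            raise ValueError(f"no format model for {file_format}/{object_model}") from None


# Adding a format means registering one more model; engine code stays unchanged.
registry = FormatModelRegistry()
registry.register(FormatModel(
    file_format="jsonl",
    object_model="dict",
    reader=lambda text: [json.loads(line) for line in text.splitlines()],
    writer=lambda rows: "\n".join(json.dumps(row) for row in rows),
    capabilities=frozenset({"default-values"}),
))

# Engine side: look up the model, then use its reader/writer and check capabilities.
model = registry.get("jsonl", "dict")
payload = model.writer([{"id": 1}, {"id": 2}])
print(model.reader(payload))  # [{'id': 1}, {'id': 2}]
print("default-values" in model.capabilities)  # True
```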
Moreover, this new interface makes it easier to implement Column Families, enabling vertical partitioning of storage. This lets teams perform targeted updates or rewrites on isolated columns, such as recalculating vector embeddings, while leaving the rest of the table's data undisturbed.
SQL UDF Specification
1.11.0 includes the SQL UDF specification, which adds a brand new metadata format for both Scalar and Table Functions:
- Immutable Versioning and Rollback: UDF metadata is written as self-contained, versioned JSON files stored directly in the object store. If a data engineer deploys a buggy UDF update, administrators can atomically roll back to a previous version.
- Standardized Types: Parameter and return types map directly to Iceberg's JSON type representations, accommodating complex nested maps and structs as well as the V3 variant type.
- Engine-Specific Execution: Each SQL UDF can carry a separate implementation for each engine, allowing users to take advantage of engine-specific functionality in their UDFs.
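The versioning and rollback idea can be sketched with plain Python dictionaries serialized to JSON. The field names here (versions, current-version-id, representations) are hypothetical, loosely in the spirit of Iceberg's view metadata, and are not taken from the UDF specification. Each operation returns new metadata instead of mutating the old copy, standing in for writing a new immutable metadata file.

```python
import copy
import json


def new_udf(name, parameters, return_type):
    # Hypothetical top-level layout; not the field names from the UDF specification.
    return {"name": name, "parameters": parameters, "return-type": return_type,
            "versions": [], "current-version-id": None}


def add_version(meta, representations):
    """Return new metadata with one more version; existing versions are left untouched."""
    updated = copy.deepcopy(meta)
    version_id = len(updated["versions"]) + 1
    updated["versions"].append({"version-id": version_id, "representations": representations})
    updated["current-version-id"] = version_id
    return updated


def rollback(meta, version_id):
    """Return new metadata whose current version points at an earlier version."""
    if not any(v["version-id"] == version_id for v in meta["versions"]):
        raise ValueError(f"unknown version {version_id}")
    updated = copy.deepcopy(meta)
    updated["current-version-id"] = version_id
    return updated


def body_for(meta, engine):
    # Each version carries one SQL body per engine.
    current = next(v for v in meta["versions"] if v["version-id"] == meta["current-version-id"])
    return current["representations"][engine]


base = new_udf("add_tax", [{"name": "price", "type": "double"}], "double")
v1 = add_version(base, {"spark": "price * 1.2", "trino": "price * 1.2"})
v2 = add_version(v1, {"spark": "price * 12", "trino": "price * 12"})  # buggy deploy
v3 = rollback(v2, version_id=1)
print(body_for(v3, "spark"))  # price * 1.2
metadata_json = json.dumps(v3, indent=2)  # content of the new metadata file
```

In Iceberg's table model, commits are made atomic by swapping the catalog's pointer to the current metadata file; this sketch reduces that step to returning a new dictionary.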
GCS Analytics Core Library Integration
For Google Cloud customers, version 1.11.0 delivers substantial throughput gains by embedding the GCS Analytics Core library into GCSFileIO (Issue #14326, PR #14333).
This integration introduces Footer Prefetching, which reads and caches the tail of each object up front so that Parquet's footer-length check and footer read are served from memory rather than from separate network requests. Combined with threaded VectoredIO for concurrent multi-range reads and dedicated caching for small objects under 1 MB, these enhancements reduce persistent I/O bottlenecks. Initial benchmarks indicate that these improvements can reduce Parquet metadata parsing latency and boost overall record processing throughput, helping high-scale Spark, Flink, and Trino workloads run more efficiently on Google Cloud Storage.
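Footer prefetching relies on the Parquet layout: a file ends with the serialized footer, a 4-byte little-endian footer length, and the magic bytes PAR1. A naive reader needs one request for the last 8 bytes and another for the footer; prefetching the tail turns that into one. The Python sketch below models this with an in-memory object that counts range requests. CountingObject, PrefetchingReader, and the 64 KiB prefetch window are hypothetical and do not reflect the GCS Analytics Core library's API or defaults; the 1 MiB whole-object threshold mirrors the small object caching described above.

```python
import struct


class CountingObject:
    """In-memory stand-in for a remote object that counts range requests."""

    def __init__(self, data):
        self.data = data
        self.requests = 0

    def size(self):
        return len(self.data)

    def read_range(self, start, end):
        self.requests += 1
        return self.data[start:end]


class PrefetchingReader:
    """Hypothetical sketch of footer prefetching plus small-object caching."""
    SMALL_OBJECT_LIMIT = 1 << 20  # objects under 1 MiB are fetched whole

    def __init__(self, obj, prefetch_bytes=64 * 1024):
        self.obj = obj
        self.size = obj.size()
        # One request up front: the whole object if small, otherwise just its tail.
        if self.size < self.SMALL_OBJECT_LIMIT:
            self.cache_start = 0
        else:
            self.cache_start = max(0, self.size - prefetch_bytes)
        self.cache = obj.read_range(self.cache_start, self.size)

    def read(self, start, end):
        # Serve from the cached tail when possible; otherwise go to the network.
        if start >= self.cache_start:
            return self.cache[start - self.cache_start:end - self.cache_start]
        return self.obj.read_range(start, end)


def read_parquet_footer(reader):
    # The last 8 bytes hold the footer length (little-endian uint32) and "PAR1".
    footer_len, magic = struct.unpack("<I4s", reader.read(reader.size - 8, reader.size))
    if magic != b"PAR1":
        raise ValueError("not a Parquet file")
    return reader.read(reader.size - 8 - footer_len, reader.size - 8)


footer = b"fake-thrift-footer"
data = b"PAR1" + b"x" * 2_000_000 + footer + struct.pack("<I", len(footer)) + b"PAR1"
obj = CountingObject(data)
print(read_parquet_footer(PrefetchingReader(obj)), obj.requests)  # b'fake-thrift-footer' 1
```

If the footer is larger than the prefetch window, the reader falls back to a second request, so the window size trades a slightly larger first read against the chance of a follow-up request.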
Getting Started with 1.11.0
We are excited to be part of the Apache Iceberg community and to innovate together. As a compliant Iceberg REST Catalog, Lakehouse for Apache Iceberg (formerly BigLake) already supports version 1.11.0.
To upgrade your environment, update your build dependencies to version 1.11.0. Remember to review your deployment runtimes to ensure compatibility with the new JDK 17 baseline, and test your workloads if you are transitioning from Spark 3.4.
For a full breakdown of every bug fix, contributor attribution, and dependency bump, check out the official Apache Iceberg releases page.