Introducing Ephemeral Containers

Friday, January 21, 2022

Ephemeral containers in Kubernetes started with a simple question: is it feasible to run a service on Kubernetes without bundling a Linux distribution userland with every binary?

It was early 2016. Kubernetes had just released version 1.2, and my SRE team was evaluating using Google Kubernetes Engine for internal workloads. Docker and Kubernetes examples always seemed to build images on top of Linux distributions like Debian or CentOS, but our build system produced a binary with its minimum set of library dependencies, so that's what I wanted to deploy as a container image.

This minimal container image worked fine, but only if I never made a mistake. Since the container image had no shell to use with kubectl exec, I had to log into the node with administrator privileges to interactively troubleshoot any problems. This produced an unfortunate debugging experience and was unacceptable from a security perspective.

What's more, kubectl exec had changed little from docker exec even though Kubernetes introduced new abstractions such as a Pod, where multiple containers share resources. How should Kubernetes native troubleshooting work?

Debugging on Borg

Providing userspace utilities for cluster applications wasn't a new problem for Google. Google's existing cluster orchestration system, Borg, provides a common userland for processes. Rather than including system and debugging utilities with the application binary, Borg provides a basic set of userland utilities that applications can expect in their runtime environment. Another team maintains and updates these utilities independent from application binaries.

There are downsides to this approach: Updates to the common utilities can take weeks or months to roll out, application owners can't specify which utilities they need, and the utilities needed at run time may be completely different from the ones needed at debug time. We could do better for Kubernetes.

Extensibility for Kubernetes

I wanted a solution that felt native for Kubernetes and gave users the freedom to customize to their use case, but I was still new to Kubernetes. I reached out to SIG Node and discovered a welcoming, helpful open source community.

Together we considered different ways of deploying tools to a Pod at debugging time. Implementing the feature entirely on the client side would be easiest, but solutions such as copying binaries into the running container image didn't make debugging feel like a feature of the platform. Kubernetes deploys binaries using containers, so it's natural to use containers for troubleshooting as well.

Existing container types were tied to the Pod lifecycle, though. Containers and Init Containers run when a Pod starts, and neither may be added after a Pod is created. For administrative actions we needed a lifecycle more like kubectl exec. We needed a new type of container: the Ephemeral Container.

What are Ephemeral Containers?

Ephemeral containers are a new type of container that are part of the Kubernetes core API. An Ephemeral Container may be added to an existing Pod for administrative actions like debugging, it runs until it exits, and it won't be restarted. An ephemeral container runs within the Pod's existing resource allocation and shares common container namespaces.

How are Ephemeral Containers used?

Here are some debugging scenarios that are made easier using ephemeral containers.

Troubleshooting Clusters

I run a service named "apples" that consists of a Go binary running in a distroless container image. One of its pods is suddenly having trouble connecting to a backend service, but since it's a distroless image I can't use kubectl exec to troubleshoot:

% kubectl exec -it apples-57bcf49487-ddmpn -- sh OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "sh": executable file not found in $PATH: unknown

We can use kubectl debug to add an ephemeral container and test the backend service:

% kubectl debug -it --image=busybox apples-57bcf49487-ddmpn -- sh Defaulting debug container name to debugger-5wvgc. / # ps ax PID USER TIME COMMAND 1 65535 0:00 /pause 7 root 0:00 /app 19 root 0:00 sh 26 root 0:00 ps ax / # wget -S -O - http://bananas:8080 Connecting to bananas:8080 ( HTTP/1.1 500 Internal Server Error wget: server returned error: HTTP/1.1 500 Internal Server Error

Technical Support

To make this easier to detect next time, I'll add this check to my operation team's autodiagnose script. The ops team doesn't have access to attach to production pods, but they have access to run the autodiagnose image which attaches its logs to a bug report:

% kubectl debug apples-57bcf49487-ddmpn --


What's Next for Ephemeral Containers?

Ephemeral Containers are available as a beta feature in Kubernetes 1.23, but we still have lots of work to polish the rough edges and improve kubectl debug to support more debugging journeys, such as configuring the container security context to allow attaching a debugger.

Try out Ephemeral Containers and let us know in the Ephemeral Containers and kubectl debug enhancements how they work for you.

Contributor Experience

Working with the Kubernetes community has been incredibly rewarding. When I started I didn't know nearly enough to contribute something like this, but I discovered a community that works hard to welcome contributions at all levels.

I want to thank the community, and especially Dawn Chen, Yu-Ju Hong, Jordan Liggitt‎, Clayton Coleman, Maciej Szulik, Tim Hockin‎, for providing the support and guidance that made this feature possible.

Kubernetes will welcome your contribution as well! See for how to get started.

By Lee Verberne, Site Reliability Engineer – Google Cloud Platform

Life after Season of Docs

Tuesday, January 18, 2022

My journey to technical writing involved a long, windy, and non-linear career path. Before I became a technical writer, I spent years working in finance jobs with a stint of teaching in between. Seeking a career change, I went back to university where I came across the Technical Writing certificate program. It was the perfect fit for my skills and interests.

One of Google’s technical writers, Nicole Yap, visited my class to talk about her career, and introduce Season of Docs—a program that brings technical writers and open source projects together to work on open source documentation. My interest was piqued as it seemed like a great opportunity for a new graduate. With no real world experience of writing documentation, I applied and was accepted into Season of Docs. I worked with Oppia, an online learning platform for a 3-month project, where I created a user guide with video tutorials.

During that time, I had to quickly become familiar with many new concepts:
  • Open source philosophy
  • Writing docs-as-code
  • Command-line basics
  • Submitting and amending pull requests on GitHub, and much more!
In the course of the Season of Docs program, I got my first full time job as a technical writer at a software company in Toronto. Juggling the demands of the project and my new job was challenging, but I was grateful for the experience as I could transfer the skills I learned to the new role. 

Opening doors to new experiences

I had such a positive experience working with my mentors1 at Oppia that we mutually agreed to extend our relationship. Over the next year, I continued to work with Oppia in different capacities—copywriting, editing, helping write math lessons—while getting to know the network of international volunteers who contribute to this incredible organization.

I also had the opportunity to present a talk at a Write the Docs Toronto meetup which was a great way to plug Season of Docs, and demonstrate what I had learnt during the program. There was quite a bit of interest from the audience as many hadn’t even heard of the program before.

My Season of Docs experience also helped me with my day job as a technical writer. After experiencing the steep learning curve with Oppia, I was able to hit the ground running with learning the new job processes at the software company. I was also able to fall back on my Season of Docs experience as I created marketing and technical videos in my new job as well.

A new opportunity

At the start of 2021, I had the opportunity to apply for a technical writing position at Google. I had the notion that a company like Google would require years of tech writing experience before they would even consider my application, but that turned out not to be true. I’ve been a technical writer at Google for four months now, and it still feels a bit surreal!

As a newcomer in the tech world, I find that everything I learned during the Season of Docs program has come in handy in helping me understand my job a little better. Getting into Season of Docs as a new entrant to the field of technical writing was a confidence-booster for me, and the path it led to has been challenging yet gratifying. I’m excited to continue learning every single day from the sea of talent around me.

By Audrey Tavares – Google Cloud

  1. The current Season of Docs program format does not have a defined mentor role, but technical writers in the program work closely with project contributors to learn open source skills.  

DeepNull: an open-source method to improve the discovery power of genetic association studies

Friday, January 14, 2022

In our paper “DeepNull models non-linear covariate effects to improve phenotypic prediction and association power,” we proposed a new method, DeepNull, to model the complex relationship between covariate effects on phenotypes to improve Genome-wide association studies (GWAS) results. We have released DeepNull as open source software, with a Colab notebook tutorial for its use.

Human Genetics 101

Each individual’s genetic data carries health information such as why certain individuals have a lower risk of developing skin cancer compared to others or why certain drugs differ in effectiveness between individuals. Genetic data is encoded in the human genome—a DNA sequence—composed of a 3 billion long chain built from four possible nucleotides (A, C, G, and T). Only a small subset of the genome (~4-5 million positions) varies between two individuals. One of the goals of genetic studies is to detect variants that are associated with different phenotypes (e.g., risk of diseases such as Glaucoma or observed phenotypic values such as high-density lipoprotein (HDL), low-density lipoproteins (LDL), height, etc).

Genome-wide association studies

GWAS are used to associate genetic variants with complex traits and diseases. To more accurately determine an association strength between genotype and phenotype, the interactions between phenotypes (such as age and sex) and principal components (PCs) of genotypes, must be adjusted for as covariates. Covariate adjustment in GWAS can increase precision and correct for confounding. In the linear model setting, adjustment for a covariate will improve precision (i.e., statistical power) if the distribution of the phenotype differs across levels of the covariate. For example, when performing GWAS on height, males and females have different means. All state of the art methods (e.g., BOLT-LMM, regenie) perform GWAS assuming that the effect of genotypes and covariates to phenotype is linear and additive. However, we know that the assumption of linear and additive contributions of covariates often does not reflect underlying biology, so we sought a method to more comprehensively model and adjust for the interactions between phenotypes for GWAS.

DeepNull method overview

We proposed a new method, DeepNull, to relax the linear assumption of covariate effects on phenotypes. DeepNull trains a deep neural network (DNN) to predict phenotype using all covariates in a 5-fold cross-validation. After training the DeepNull model, we make phenotype predictions for all individuals and add this prediction as one additional covariate in the association test. Major advantages of DeepNull are its simplicity to use and that it requires only a minimal change to existing GWAS pipeline implementations. In other words, to use DeepNull, we just need to add one additional covariate, which is computed by DeepNull, to the existing pipeline to perform GWAS.

DeepNull improves statistical power

We simulated data under different genetic architectures (genetic conditions) to first check that DeepNull controls type I error and then compare DeepNull statistical power with current state of the art methods (hereafter referred to as “Baseline”). First, we simulated data under genetic architectures where covariates have a linear effect on phenotype and observed that both Baseline and DeepNull have tight control of type I error. It is interesting that DeepNull power does not decrease compared to Baseline under a setting in which covariates have only a linear effect on phenotype. Next, we simulated data under genetic architectures where covariates have non-linear effects on phenotype. Both Baseline and DeepNull have tight control of type I error while DeepNull increases the statistical power depending on the genetic architecture. We observed that for certain genetic architectures, DeepNull increases the statistical power up to 20%. Below, we compare the -log p-value of test statistics computed from DeepNull versus Baseline for Apolipoprotein B (ApoB) levels obtained from UK Biobank:
Figure 1. Significance level comparison of DeepNull vs Baseline. X-axis is the -log p-value of Baseline and Y-axis is the -log p-value of DeepNull. The orange dots indicate variants that are significant for Baseline but not significant for DeepNull and green dots indicate variants that are significant for DeepNull but not significant for Baseline.

DeepNull improves phenotype prediction

We applied DeepNull to predict phenotypes by utilizing polygenic risk score (PRS) and existing covariates such as age and sex. We considered 10 phenotypes obtained from UK Biobank. We observed that DeepNull on average increased the phenotype prediction (R2 where R is Pearson correlation) by 23%. More strikingly, in the case of Glaucoma, referral probability that is computed from the fundus images (Phene et al. Ophthalmology 2019, Alipanahi et al AJHG 2021), DeepNull improves the phenotype prediction by 83.4% and in the case of LDL, DeepNull improves the phenotype prediction by 40.3%. The summary of DeepNull results versus Baseline are shown in figure 2 below:


Figure 2. DeepNull improves phenotype prediction compared to Baseline. The Y-axis is the R2 where R is the Pearson’s correlation between true and predicted value of phenotypes. Phenotypic abbreviations: alkaline phosphatase (ALP), alanine aminotransferase (ALT), aspartate aminotransferase (AST), apolipoprotein B (ApoB), glaucoma referral probability (GRP), LDLcholesterol (LDL), sex hormone-binding globulin (SHBG), and triglycerides (TG).


We proposed a new framework, DeepNull, that can model the nonlinear effect of covariates on phenotypes when such nonlinearity exists. We show that DeepNull can substantially improve phenotype prediction. In addition, we show that DeepNull achieves results similar to a standard GWAS when the effect of covariate on the phenotype is linear and can significantly outperform a standard GWAS when the covariate effects are nonlinear. DeepNull is open source and is available for download from GitHub or installation via PyPI.

By Farhad Hormozdiari and Andrew Carroll – Genomics team in HealthAI


This blog summarizes the work of the following Google contributors, who we would like to thank: Zachary R. McCaw, Thomas Colthurst, Ted Yun, Nick Furlotte, Babak Alipanahi, and Cory Y. McLean. In addition, we would like to thank Alkes Price, Babak Behsaz, and Justin Cosentino for their invaluable comments and suggestions.