opensource.google.com

Modernizing Oracle operations with Kubernetes and El Carro

Thursday, May 13, 2021

Google Cloud is releasing El Carro, an open source tool to help you transform and modernize your Oracle database operations. El Carro implements the Kubernetes operator pattern to deliver automation for provisioning and ongoing operations like backups, patching, and high availability for databases running in hybrid and multi-cloud environments. And it does so using the same declarative syntax that DevOps teams are using to manage applications. With El Carro, users can choose to modernize and transform their database operations in place and benefit from a consistent management experience and hybrid and multi-cloud portability. El Carro is released under the Apache License 2.0, so you are free to use it in any Kubernetes environment and you stay in control.

Containers and Kubernetes deliver portability on standardized infrastructure, and today Oracle supports databases running in containers; it has also released container build files, images, and Helm charts to simplify provisioning. What is missing for the next level of integration is support for lifecycle operations and an extension of the Kubernetes API to the primitives needed for database management.

In addition, fully managed or autonomous services for Oracle may not offer everything you need: features such as Active Data Guard, Multitenant, and In-Memory, as well as specific parameters/flags, versions, and patch levels. DBAs also find themselves locked out of many roles, including sysadmin and root. These restrictions make many cloud architects fall back to lifting and shifting Oracle databases onto infrastructure-as-a-service offerings and miss out on opportunities to modernize and transform database operations. And with transactional databases growing in number and criticality, organizations are struggling to deliver innovation and modernization. Engineers are already busy keeping up with sprawl and mundane operational tasks while adhering to strict change management processes.

How do we solve this database operations gap?

El Carro solves this. It is built with scalability in mind, using the same container orchestration infrastructure, Kubernetes, that powers many businesses and is a top choice for modern architectures. Its open API allows you to manage your database configurations as declarative code, enabling CI/CD or GitOps workflows for auditability and control mechanisms. El Carro automates many database lifecycle operations, like backups, replication, and patching. And, when it distributes databases on the nodes of a cluster, it is aware of the priority and resource requirements of each database to optimize tight packing while respecting quality of service. Lastly, it helps DBAs by delivering automation without restrictions and leaving DBAs in full control over their systems. You can choose to let the operator drive for you, but you can also take over the steering wheel yourself at any time.

Because Kubernetes is now the standard for portable infrastructure automation and orchestration, engineers appreciate how Kubernetes abstracts complex problems into manageable infrastructure as code. Kubernetes can scale from small projects to large projects that support the infrastructure that powers Google products and services for billions of users around the world. Moreover, Google pioneers the next generation of infrastructure as code that we refer to as Configuration as Data to declaratively establish a contract between developer intent and the runtime operation. According to the Cloud Native Survey 2020, two-thirds of respondents were either already running stateful workloads in production or were considering doing so within the next 12 months. We expect that datastores are going to drive the next wave of enterprise Kubernetes adoption.

A number of open source operators for databases, such as PostgreSQL, MySQL, and many others, have been released, are actively maintained by the community, and are popular among developers and architects looking for a hands-off approach to manage databases with their applications. El Carro extends the list of database operators to include Oracle.

What are we building with El Carro for Oracle databases?

The operator pattern emerged in late 2016 as an extension of the Kubernetes API and control loop aimed at automating more complicated and application-specific tasks that are beyond the native Kubernetes objects.

El Carro implements a custom resource definition (CRD), which is tailored to database management. Users set and change attributes of the custom resource using the Kubernetes API the same way they do for built-in objects such as pods, deployments, or services. The El Carro controller observes changes to the CRs and compares the declared state with the current reality in the cluster, then makes the necessary changes. Those changes could either affect the Kubernetes resources used by the database such as persistent volumes or the pod itself, or may result in issuing calls via SQL or command line tools to the database to create and modify users or other database objects.

Here’s a look at how this works:
El Carro Architecture

The diagram above shows how the major components of a database managed by the El Carro Operator interact with each other. The controller watches the custom resources for any changes made by admins. It creates and manages the cluster resources that make up the actual database deployment: persistent volumes for filesystems and data, a pod to run containers with the actual database, and a daemon that allows the controller to securely run SQL commands on the database. And lastly, a service makes SQL*Net connections available to applications and end users that can either run in the same Kubernetes cluster or outside of it.
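The compare-and-converge step described above can be illustrated with a minimal sketch. It is not El Carro's actual code: the `InstanceSpec` and `Action` types, the `reconcile` function, and the parameter and user names are all hypothetical, and a real controller would read the observed state from the cluster and the database rather than from a struct.

```go
package main

import (
	"fmt"
	"sort"
)

// InstanceSpec is a hypothetical, simplified stand-in for the state declared
// in a database custom resource. It is NOT El Carro's actual API type.
type InstanceSpec struct {
	Parameters map[string]string // instance parameters
	Users      []string          // database users
}

// Action is a hypothetical description of one change the controller would make.
type Action struct {
	Kind   string // "SetParameter" or "CreateUser"
	Target string
	Value  string
}

// reconcile compares the declared state with the observed state and returns
// the actions needed to converge. It returns no actions when both agree.
func reconcile(desired, observed InstanceSpec) []Action {
	var actions []Action

	// Parameters: set any that are missing or differ from the declared value.
	names := make([]string, 0, len(desired.Parameters))
	for name := range desired.Parameters {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	for _, name := range names {
		want := desired.Parameters[name]
		if got, ok := observed.Parameters[name]; !ok || got != want {
			actions = append(actions, Action{Kind: "SetParameter", Target: name, Value: want})
		}
	}

	// Users: create any declared user that does not exist yet.
	existing := make(map[string]bool, len(observed.Users))
	for _, u := range observed.Users {
		existing[u] = true
	}
	for _, u := range desired.Users {
		if !existing[u] {
			actions = append(actions, Action{Kind: "CreateUser", Target: u})
		}
	}
	return actions
}

func main() {
	desired := InstanceSpec{
		Parameters: map[string]string{"open_cursors": "500"},
		Users:      []string{"app", "reporting"},
	}
	observed := InstanceSpec{
		Parameters: map[string]string{"open_cursors": "300"},
		Users:      []string{"app"},
	}
	for _, a := range reconcile(desired, observed) {
		fmt.Printf("%s %s %s\n", a.Kind, a.Target, a.Value)
	}
}
```

Running the loop a second time against a converged state yields no actions, which is the property that lets a controller re-run it safely on every change.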

At release time, the El Carro Operator can provision Oracle Database 12c Enterprise Edition and 18c Express Edition. It manages instance parameters, pluggable databases, and users. You can take and restore backups using either RMAN or storage snapshots, and we are working to add more features.

How can you get involved with El Carro?

In the development process, we collaborated with users and partners in the Oracle community to help us validate the approach. "Pythian has helped Oracle users to automate and optimize the operations of their mission-critical systems for over 20 years," says Simon Pane, principal consultant at Pythian. "We are excited about the possibilities that El Carro brings to users on their cloud modernization journeys. We are proud to work with the community on a vision for the future of database management."

Sean Scott covers Docker for databases on his blog oraclesean.com, and says: "There are many benefits to running Oracle databases in containers. Adding Kubernetes orchestration introduces new opportunities to bring the DevOps and Oracle communities together."

You can try out El Carro today. Follow the quick start guide and try out provisioning of instances, databases, and users. Import data via Data Pump, manage instance parameters, choose between different methods for backups, and try out a restore. Have a look at how we integrate with external logging and monitoring solutions. Reach out via our Google group and leave feedback on what features you would like to see next, or even create your own patch and pull request on GitHub.

By Bjoern Rost - Product Manager and Boris Dali - Team Lead, Engineering

Season of Docs announces participating organizations for 2021

Friday, April 16, 2021

Season of Docs has announced the participating open source organizations for 2021! You can view the list of participating organizations on the website.

During the documentation development phase, which runs from now until November 16, 2021, each accepted organization will work with the technical writer they hire to complete their documentation project.

For more information about the documentation development phase, visit the organization administrator guide on the website.

What is Season of Docs?

Season of Docs supports documentation in open source by:
  • Providing funds to open source organizations to use for documentation projects
  • Providing guides and support for open source organizations to help them understand their documentation needs
  • Collecting data from open source organizations to better understand documentation impact
  • Publishing case studies from open source organizations to share best practices

Season of Docs seeks to empower open source organizations to understand their documentation needs, to create documentation to fill those needs, to measure the effect and impact of their documentation, and, in the spirit of open source, share what they've learned to help guide other projects. Season of Docs also seeks to bring more technical writers into open source through funding their work with open source projects and organizations.

How do I take part in Season of Docs as a technical writer?

Technical writers interested in working with accepted open source organizations can share their contact information via the Season of Docs GitHub repository, or they may submit a statement of interest directly to the organizations. Technical writers do not need to submit a formal application through Season of Docs. We recommend technical writers reach out to organizations before submitting a statement of interest to discuss the project they’ll be working on and gain a better understanding of the organization.

Organizations must hire technical writers by May 17, 2021 at 18:00 UTC, so technical writers should begin reaching out as soon as possible.

Will technical writers be paid while working with organizations accepted into Season of Docs?

Yes. Participating organizations will transfer funds directly to the technical writer. Technical writers should review the organization's proposed project budgets and discuss their compensation and payment schedule with the organization prior to hiring. Check out our technical writer payment process guide for more details.

If you have any questions about the program, please email us at season-of-docs@google.com.

General timeline

May 17

Technical writer hiring deadline

June 16

Organization administrators begin reporting on their project status via monthly evaluations.

November 30

Organization administrators submit their case study and final project evaluation.

December 14

Google publishes the 2021 case studies and aggregate project data.

May 2, 2022

Organizations begin to participate in post-program followup surveys.

See the full timeline for details.

Care to join us?

Explore the Season of Docs website at g.co/seasonofdocs to learn more about the program. Use our logo and other promotional resources to spread the word. Examine the timeline, check out the FAQ, and reach out to organizations now!

By Kassandra Dhillon and Erin McKean, Google Open Source Programs Office

Actuating Google Production: How Google’s Site Reliability Engineering Team Uses Go

Tuesday, April 13, 2021

Google runs a small number of very large services. Those services are powered by a global infrastructure covering everything a developer needs: storage systems, load balancers, network, logging, monitoring, and much more. Nevertheless, it is not a static system—it cannot be. Architecture evolves, new products and ideas are created, new versions must be rolled out, configs pushed, database schema updated, and more. We end up deploying changes to our systems dozens of times per second.

Because of this scale and critical need for reliability, Google pioneered Site Reliability Engineering (SRE), a role that many other companies have since adopted. “SRE is what you get when you treat operations as if it’s a software problem. Our mission is to protect, provide for, and progress the software and systems behind all of Google’s public services with an ever-watchful eye on their availability, latency, performance, and capacity.” - Site Reliability Engineering (SRE).

Go Gopher logo
Credit to Renee French for the Go Gopher

In 2013-2014, Google’s SRE team realized that our approach to production management was not cutting it anymore in many ways. We had advanced far beyond shell scripts, but our scale had so many moving pieces and complexities that a new approach was needed. We determined that we needed to move toward a declarative model of our production, called "Prodspec," driving a dedicated control plane, called "Annealing."

When we started those projects, Go was just becoming a viable option for critical services at Google. Most engineers were more familiar with Python and C++, either of which would have been a valid choice. Nevertheless, Go captured our interest. The appeal of novelty was certainly a factor, of course. But, more importantly, Go promised a sweet spot between performance and readability that neither of the other languages was able to offer. We started a small experiment with Go for some initial parts of Annealing and Prodspec. As the projects progressed, those initial parts written in Go found themselves at the core. We were happy with Go: its simplicity grew on us, the performance was there, and concurrency primitives would have been hard to replace.

At no point was there ever a mandate or requirement to use Go, but we had no desire to return to Python or C++. Go grew organically in Annealing and Prodspec. It was the right choice, and thus is now our language of choice. Now the majority of Google production is managed and maintained by our systems written in Go.

The power of having a simple language in those projects is hard to overstate. There have been cases where some feature was indeed missing, such as the ability to enforce in the code that some complex structure should not be mutated. But for each one of those cases, there have undoubtedly been tens or hundreds of cases where the simplicity helped.

For example, Annealing impacts a wide variety of teams and services, meaning that we relied heavily on contributions from across the company. The simplicity of Go made it possible for people outside our team to see why some part or another was not working for them, and often provide fixes or features themselves. This allowed us to grow quickly.

Prodspec and Annealing are in charge of some quite critical components. Go’s simplicity means that the code is easy to follow, whether it is to spot bugs during review or when trying to determine exactly what happened during a service disruption.

Go performance and concurrency support have also been key for our work. As our model of production is declarative, we tend to manipulate a lot of structured data, which describes what production is and what it should be. We have large services so the data can grow large, often making purely sequential processing not efficient enough.

We are manipulating this data in many ways and many places. It is not a matter of having a smart person come up with a parallel version of our algorithm. It is a matter of casual parallelism, finding the next bottleneck and parallelising that code section. And Go enables exactly that.
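As an illustration of what such casual parallelism can look like, here is a minimal sketch that takes a sequential loop over records and parallelises just that section with goroutines and a `sync.WaitGroup`. The `validate` function and the record contents are hypothetical stand-ins and are not taken from Prodspec or Annealing.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// validate is a hypothetical stand-in for some per-record work on
// structured production data.
func validate(record string) bool {
	return strings.TrimSpace(record) != ""
}

// validateSequential is the original, purely sequential version.
func validateSequential(records []string) []bool {
	results := make([]bool, len(records))
	for i, r := range records {
		results[i] = validate(r)
	}
	return results
}

// validateParallel is the same loop with one goroutine per record.
// Each goroutine writes only to its own slice index, so no mutex is needed.
func validateParallel(records []string) []bool {
	results := make([]bool, len(records))
	var wg sync.WaitGroup
	for i, r := range records {
		wg.Add(1)
		go func(i int, r string) {
			defer wg.Done()
			results[i] = validate(r)
		}(i, r)
	}
	wg.Wait() // block until every record has been processed
	return results
}

func main() {
	records := []string{"service-a", "  ", "service-b"}
	fmt.Println(validateSequential(records))
	fmt.Println(validateParallel(records))
}
```

The change from the sequential to the parallel version touches only the loop body, which is the point: the surrounding code and the function signature stay the same. For very large inputs, a bounded pool of workers may be preferable to one goroutine per record.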

As a result of our success with Go, we now use Go for all new development on Prodspec and Annealing. In addition to the SRE team, engineering teams across Google have adopted Go in their development process. Read about how the Core Data Solutions, Firebase Hosting, and Chrome teams use Go to build fast, reliable, and efficient software at scale.

By Pierre Palatin, Site Reliability Engineer