MySQL to Cloud Spanner via HarbourBridge

Today we’re announcing that HarbourBridge—an open source toolkit that automates much of the manual work of evaluating and assessing Cloud Spanner—supports migrations from MySQL, in addition to existing support for PostgreSQL. This provides a zero-configuration path for MySQL users to try out Cloud Spanner. HarbourBridge bootstraps early stages of migration, and helps get you to the meaty issues as quickly as possible.

Core capabilities

At its core, HarbourBridge provides an automated workflow for loading the contents of an existing MySQL or PostgreSQL database into Spanner. It requires zero configuration—no manifests or data maps to write. Instead, it imports the source database, builds a Spanner schema, creates a new Spanner database populated with data from the source database, and generates a detailed assessment report. HarbourBridge can either import dump files (from mysqldump or pg_dump) or directly connect to the source database. It is intended for loading databases up to a few tens of GB for evaluation purposes, not full-scale migrations.

Bootstrap early-stage migration

HarbourBridge bootstraps early-stage migration to Spanner by using an existing MySQL or PostgreSQL source database to quickly get you running on Spanner. It generates an assessment report with an overall migration-fitness score for Spanner, a table-by-table analysis of type mappings and a list of features used in the source database that aren't supported by Spanner.

View HarbourBridge as a way to get up and running quickly, so you can focus on critical things like tuning performance and getting the most out of Spanner. You will need to tweak and enhance what HarbourBridge produces—more on that later.

Getting started

HarbourBridge can be used with the Cloud Spanner Emulator, or directly with a Cloud Spanner instance. The Emulator is a local, in-memory emulation of Spanner that implements the same APIs as Cloud Spanner’s production service, and allows you to try out Spanner’s functionality without creating a GCP Project. The HarbourBridge README contains a step-by-step quick-start guide for using the tool with a Cloud Spanner instance.

Together, HarbourBridge and the Cloud Spanner Emulator provide a lightweight, open source toolchain to experiment with Cloud Spanner. Moreover, when you want to proceed to performance testing and tuning, switching to a production Cloud Spanner instance is a simple configuration change.

To get started on using HarbourBridge with the Emulator, follow the Emulator instructions. In particular, start the Emulator using Docker and configure the SPANNER_EMULATOR_HOST environment variable (this tells the Cloud Spanner Client libraries to use the Emulator).

Next, install Go and configure the GOPATH environment variable if they are not already part of your environment. Now you can download and install HarbourBridge using

GO111MODULE=on \
go get github.com/cloudspannerecosystem/harbourbridge

It should be installed as $GOPATH/bin/harbourbridge. To use HarbourBridge on a MySQL database, run mysqldump and pipe its output to HarbourBridge

mysqldump <opts> db | $GOPATH/bin/harbourbridge -driver=mysqldump

where <opts> are the standard options you pass to mysqldump or mysql to specify host, port, etc., and db is the name of the database to dump.
Similarly, to use HarbourBridge on a PostgreSQL database, run

pg_dump <opts> db | $GOPATH/bin/harbourbridge -driver=pg_dump

See the Troubleshooting guide if you run into any issues. In addition to creating a new Spanner database with data from the source database, HarbourBridge also generates a schema file, the assessment report, and a bad data file (if any data is dropped). See Files generated by HarbourBridge.

Sample dump files

If you don’t have ready access to a MySQL or PostgreSQL database, the HarbourBridge github repository has some samples. The files cart.mysqldump and cart.pg_dump contain mysqldump and pg_dump output for a very basic shopping cart application (just two tables, one for products and one for user carts). The files singers.mysqldump and singers.pg_dump contain mysqldump and pg_dump output for a version of the Cloud Spanner singers example. To use HarbourBridge on cart.mysqldump, download the file locally and run

$GOPATH/bin/harbourbridge -driver=mysqldump < cart.mysqldump

Next steps

The schema created by HarbourBridge provides a starting point for evaluation of Spanner. While it preserves much of the core structure of your MySQL or PostgreSQL schema, data types will be mapped based on the types supported by Spanner, and unsupported features will be dropped e.g. functions, sequences, procedures, triggers and views. See the assessment report as well as HarbourBridge’s Schema conversion documentation for details.

To test Spanner’s performance, you will need to switch from the Emulator to a Cloud Spanner instance. The HarbourBridge quick-start guide provides details of how to set up a Cloud Spanner instance. To have HarbourBridge use your Cloud Spanner instance instead of the Emulator, simply unset the SPANNER_EMULATOR_HOST environment variable (see the Emulator documentation for context).

To optimize your Spanner performance, carefully review choices of primary keys and indexes—see Keys and indexes. Note that HarbourBridge preserves primary keys from the source database but drops all other indexes. This means that the out-of-the-box performance you get from the schema created by HarbourBridge can be significantly impacted. If this is the case, add appropriate Secondary indexes. In addition, consider using Interleaved tables to optimize table layout and improve the performance of joins.

Recap

HarbourBridge is an open source toolkit for evaluating and assessing Cloud Spanner using an existing MySQL or PostgreSQL database. It automates many of the manual steps so that you can quickly get to important design, evaluation and performance issues, such as. refining choice of primary keys, tuning of indexes, and other optimizations.

We encourage you to try out HarbourBridge, send feedback, file issues, fork and modify the codebase, and send PRs for fixes and new functionality. We have big plans for HarbourBridge, including the addition of user-guided schema conversion (to customize type mappings and provide a guided exploration of indexing, primary key choices, and use of interleaved tables), as well as support for more databases. HarbourBridge is part of the Cloud Spanner Ecosystem, owned and maintained by the Cloud Spanner user community. It is not officially supported by Google as part of Cloud Spanner.

By Nevin Heintze, Cloud Spanner

opensource.google.com

Google Open Source Blog