Writing Backward-Compatible Schema Migrations - How and Why?

ยท

0 min read

Your app is running happily on its own without major glitches. What happens when business requirements changed and you are now required to update your database schema?

In this article, I'll be discussing several caveats that demand our attention when making schema changes in an app which is already been up and serving users. I'd also share some techniques I'd used to eliminate those risks.

The concepts and ideas of making backward-compatible schema changes should be universal regardless of programming languages and frameworks, I'll be using Python Django to illustrate my points.

I'll be starting off with a simple scenario, which you are required to update a schema by renaming name to first_name and adding a last_name column as shown below:

backward compatible schema(1).png

You may be thinking: "Well, isn't this a straight forward renaming and column addition? How difficult could it be"?. Yes, it is merely a simple schema change, but no, It is much more than that!

Making the schema changes recklessly leads you to paralysing bugs

Risks When Renaming Columns

Unless you are renaming a column right after creating it, or the column hasn't been deployed to production yet, you are most likely walking on a thin line. The first thing that needs to be solved is references to the column. If you are using statically typed language and an ORM, lucky you! This will most likely be caught during compilation time. However, in my situation, I was using Python with Django, referencing errors may not be caught if test coverage is insufficient.

So, what can you do to eliminate the potential of referencing error when renaming a column?

Naive Solution: Do A Global Search And Replace

Screenshot from 2020-03-28 16-29-09.png

In most code editors and IDE, you can trigger a global search (or find all) using the keyboard shortcut Ctrl/Cmd + Shift + F. This may allow you to quickly replace all references to name in the codebase but will only work if:

  • the table profile hasn't been created in the production environment
  • the app is self-contained, be it a mobile or desktop app
  • it is a web service but is not consumed by any other app yet

Why?

  1. If your app is providing web services, you will most likely be using Blue/Green deployment. If you are not familiar with this deployment strategy, it is basically used to ensure there's no downtime when switching from older to a newer version of your app. With it, it is very likely that, for a brief moment, both older and newer versions are running simultaneously. In that case, users may experience weird or even fatal errors during the transition phase.

  2. If your app is exposing APIs for other applications (frontend, third party apps, etc), it is almost impossible to ensure all consumer apps are updated to reflect the changes made in the schema at the same time. Thus, the existence of multiple versions will most probably be happening too.

Those problems are not easily caught during testing, but is a prevalent problem.

So, what is the better solution here?

A Better Approach: Ensuring Backward-Compatibility

You may have guessed it, a better solution is to ensure schema changes are backward compatible. But, how?

What's better than showing some code? The following snippet is a Python Django ORM model representing our profiles table:

import uuid

from django.db import models


class Profile(models.Model):
    pid = models.CharField(primary_key=True, default=uuid.uuid4)
    preferred_name = models.CharField(max_length=256)
    name = models.CharField(max_length=256)

Remember, your task is to rename name to first_name, and to add a new column named last_name.

First, let's rename the column from name to first_name:

class Profile(models.Model):
    # ...
    first_name = models.CharField(max_length=256)   # Renamed from `name`

To make sure any references made to name still works, you can leverage class property here:

class Profile(models.Model):
    # ...
    first_name = models.CharField(max_length=256)  # Renamed from `name`

    @property
    def name(self):
        return self.first_name

    @name.setter
    def name(self, value):
        self.first_name = value

By adding a class property name, along with a setter that proxies to your new column name first_name. Any code attempting to access or update name should still work. Personally, I'd also suggest you emit a warning message to inform the developer of such schema change:

# ...
import logging

logger = logging.getLogger(__name__)

class Profile(models.Model):
    # ...
    first_name = models.CharField(max_length=256)  # Renamed from `name`

    @property
    def name(self):
        logger.warning("The property `name` is renamed to `first_name`.")
        return self.first_name

    @name.setter
    def name(self, value):
        # Emit similar warning message
        self.first_name = value

Some of you may also notice the warning mechanism can be better implemented using a decorator, but that is beyond the scope of this article.

Up to this point, your code should work harmoniously with referrers pointing to both name and first_name. Hooray ๐ŸŽ‰ ? Not so soon... ๐Ÿ˜…

Risks When Adding New Columns

Don't forget, you are also adding a new column last_name to your profiles table. Adding a new column is relatively less complex than modifying existing ones. However, it is often being neglected that any new columns being added to an existing table should be nullable in most circumstances.

Why?

  • Existing code consumers won't have any knowledge of a new column is added. Any attempt to create a new record/row in the table would result in ConstraintError/not-null violation if the new column is not nullable (NOT NULL)

Let's go ahead and add the new column:

class Profile(models.Model):
    # ...
    first_name = models.CharField(max_length=256)  # Renamed from `name`
    last_name = models.CharField(max_length=256, null=True)  # New, nullable column

    @property
    def name(self):
        return self.first_name

    @name.setter
    def name(self, value):
        self.first_name = value

Yay! With such implementation, you won't break any clients ignorant of the added column.

To mitigate the loss of the null constraint checker in your database, you can implement similar logic in your application code instead.

Summary

To sum up, when making schema changes, we should always:

  • think about existing code consumers and team members working on the same codebase
  • ensure changes are gradually done and all code users are well-informed
  • ensure code references to both old and new column names are supported
  • ensure new columns added to any existing table are nullable

I hope this article helps. Thumb-ups and claps would definitely drive me further. Have fun hacking!