Writing Backward-Compatible Schema Migrations - How and Why?
Your app is running happily on its own without major glitches. What happens when business requirements changed and you are now required to update your database schema?
In this article, I'll be discussing several caveats that demand our attention when making schema changes in an app which is already been up and serving users. I'd also share some techniques I'd used to eliminate those risks.
The concepts and ideas of making backward-compatible schema changes should be universal regardless of programming languages and frameworks, I'll be using Python Django to illustrate my points.
I'll be starting off with a simple scenario, which you are required to update a schema by renaming
first_name and adding a
last_name column as shown below:
You may be thinking: "Well, isn't this a straight forward renaming and column addition? How difficult could it be"?. Yes, it is merely a simple schema change, but no, It is much more than that!
Making the schema changes recklessly leads you to paralysing bugs
Risks When Renaming Columns
Unless you are renaming a column right after creating it, or the column hasn't been deployed to production yet, you are most likely walking on a thin line. The first thing that needs to be solved is references to the column. If you are using statically typed language and an ORM, lucky you! This will most likely be caught during compilation time. However, in my situation, I was using Python with Django, referencing errors may not be caught if test coverage is insufficient.
So, what can you do to eliminate the potential of referencing error when renaming a column?
Naive Solution: Do A Global Search And Replace
In most code editors and IDE, you can trigger a global search (or find all) using the keyboard shortcut
Ctrl/Cmd + Shift + F. This may allow you to quickly replace all references to
name in the codebase but will only work if:
- the table
profilehasn't been created in the production environment
- the app is self-contained, be it a mobile or desktop app
- it is a web service but is not consumed by any other app yet
If your app is providing web services, you will most likely be using Blue/Green deployment. If you are not familiar with this deployment strategy, it is basically used to ensure there's no downtime when switching from older to a newer version of your app. With it, it is very likely that, for a brief moment, both older and newer versions are running simultaneously. In that case, users may experience weird or even fatal errors during the transition phase.
If your app is exposing APIs for other applications (frontend, third party apps, etc), it is almost impossible to ensure all consumer apps are updated to reflect the changes made in the schema at the same time. Thus, the existence of multiple versions will most probably be happening too.
Those problems are not easily caught during testing, but is a prevalent problem.
So, what is the better solution here?
A Better Approach: Ensuring Backward-Compatibility
You may have guessed it, a better solution is to ensure schema changes are backward compatible. But, how?
What's better than showing some code? The following snippet is a Python Django ORM model representing our
import uuid from django.db import models class Profile(models.Model): pid = models.CharField(primary_key=True, default=uuid.uuid4) preferred_name = models.CharField(max_length=256) name = models.CharField(max_length=256)
Remember, your task is to rename
first_name, and to add a new column named
First, let's rename the column from
class Profile(models.Model): # ... first_name = models.CharField(max_length=256) # Renamed from `name`
To make sure any references made to
name still works, you can leverage class property here:
class Profile(models.Model): # ... first_name = models.CharField(max_length=256) # Renamed from `name` def name(self): return self.first_name def name(self, value): self.first_name = value
By adding a class property
name, along with a setter that proxies to your new column name
first_name. Any code attempting to access or update
name should still work. Personally, I'd also suggest you emit a warning message to inform the developer of such schema change:
# ... import logging logger = logging.getLogger(__name__) class Profile(models.Model): # ... first_name = models.CharField(max_length=256) # Renamed from `name` def name(self): logger.warning("The property `name` is renamed to `first_name`.") return self.first_name def name(self, value): # Emit similar warning message self.first_name = value
Some of you may also notice the warning mechanism can be better implemented using a decorator, but that is beyond the scope of this article.
Up to this point, your code should work harmoniously with referrers pointing to both
first_name. Hooray 🎉 ? Not so soon... 😅
Risks When Adding New Columns
Don't forget, you are also adding a new column
last_name to your
profiles table. Adding a new column is relatively less complex than modifying existing ones. However, it is often being neglected that any new columns being added to an existing table should be nullable in most circumstances.
- Existing code consumers won't have any knowledge of a new column is added. Any attempt to create a new record/row in the table would result in
not-null violationif the new column is not nullable (
Let's go ahead and add the new column:
class Profile(models.Model): # ... first_name = models.CharField(max_length=256) # Renamed from `name` last_name = models.CharField(max_length=256, null=True) # New, nullable column def name(self): return self.first_name def name(self, value): self.first_name = value
Yay! With such implementation, you won't break any clients ignorant of the added column.
To mitigate the loss of the null constraint checker in your database, you can implement similar logic in your application code instead.
To sum up, when making schema changes, we should always:
- think about existing code consumers and team members working on the same codebase
- ensure changes are gradually done and all code users are well-informed
- ensure code references to both old and new column names are supported
- ensure new columns added to any existing table are nullable
I hope this article helps. Thumb-ups and claps would definitely drive me further. Have fun hacking!