Migrating data from PlanetScale
Posted on 19 March, 2024
As previously posted here and here, PlanetScale are pulling their free tier.
I've already written about the process of changing the codebase to work with a different provider, but that still left a question: how to get my existing data out of PlanetScale and into Supabase?
Dumping PlanetScale
The only "easy" way to get data out of PlanetScale that I could find was using their CLI tool, pscale,
to create a MySQL dump. There are other (generic) ways to get data out of a MySQL database that may
be better, but none that I could get to work with PlanetScale.
This means having their CLI tooling installed on your local machine (or a virtual machine that you have control over) and then running:
pscale database dump <database name> <branch name>
That gives you a folder in your current working directory that contains two SQL files for each table
in your database — one containing the schema, and one with all the rows set out as an INSERT
instruction.
Recreating on Supabase
I'd already set up my tables in Supabase as part of changing the codebase (I use dbdiagram.io to set up schema when using SQL databases, so I was able to grab a Postgres version of the schema from there). So "all" I needed was to get my data in.
Whilst I might have been able to use the second SQL file in the dump folder (copying the whole
INSERT query and pasting it into the Supabase web console), reading through the Supabase docs it
seemed a much safer bet to convert them into CSVs and then use Supabase's "import from CSV" function.
Reading a file in, converting it, and outputting in a different format — feels like a task for Python to me.
Not using Pandas
Some quick googling suggested that using Pandas to read in an SQL file and output a CSV would be "pretty easy", but there were some potential hiccups with the PlanetScale dump which meant I felt more confident going "old-skool" — hand-writing a conversion function using RegEx and Python's CSV module.
My concept was:
- Read the file into memory.
- For each line:
  - remove the trailing closing bracket and any extraneous characters
  - remove the first open bracket and any preceding characters
  - strip any quote marks (WARNING!)
- Write the data out as a CSV.
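Those steps can be sketched in Python against a few made-up dump lines (the table and values below are hypothetical, just illustrating the format of a PlanetScale row dump):

```python
import re

# a made-up three-line fragment of a row-dump file (hypothetical table/values)
dump_lines = [
    "INSERT INTO `users` (`id`,`name`,`active`) VALUES",
    "(1,'Alice',1),",
    "(2,'Bob',0);",
]

rows = []
for line in dump_lines:
    # 1. drop the trailing closing bracket plus any " VALUES", "," or ";" detritus
    temp = re.split(r"\)( VALUES)?,?;?$", line)[0]
    # 2. keep only the content after the first opening bracket
    content = temp[temp.find("(") + 1:]
    # 3. split on commas and strip any bounding quote marks
    rows.append([re.sub(r'^["\'`]|["\'`]$', "", c) for c in content.split(",")])

print(rows)
# [['id', 'name', 'active'], ['1', 'Alice', '1'], ['2', 'Bob', '0']]
```

The first row comes out as the column headers and every subsequent row as data, which is exactly the shape `csv.writer` wants.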
When used from the command line the function will check that it's been given an SQL file or a folder, and warn if neither is true. If the input is a folder it'll cycle through every non-schema SQL file within that folder.
Use example for a single file:
python3 sql_dump_to_csv.py ./lsersa_booking.users.00001.sql
And for a folder:
python3 sql_dump_to_csv.py ./pscale_dump_lsersa_booking_main/
The output does need checking — particularly if there are any text fields in the database
that may actually contain commas (like an address). In that instance there'll be an extra column in
the CSV (which doesn't have a header) — I simply opened the CSV in a spreadsheet and eye-balled
it. The only table this affected was my users table (which had some commas in address fields), and
it was visually extremely obvious where columns had gotten out of sync.
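To illustrate the failure mode (the line below is made up), an embedded comma inside a quoted address field produces one cell too many, because the split happens before the bounding quotes are stripped:

```python
import re

# hypothetical dump line whose address field contains a comma
line = "(7,'1 High Street, London','jo@example.com'),"
content = re.split(r"\)( VALUES)?,?;?$", line)[0]
cells = [re.sub(r'^["\'`]|["\'`]$', "", c)
         for c in content[content.find("(") + 1:].split(",")]
print(cells)  # four cells for what should be a three-column table
```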
Full code:
# sql_dump_to_csv.py
import re
import csv


def remove_bounding_quotes(text):
    return re.sub(r'^["\'`]|["\'`]$', "", text)


def convert_to_csv(filepath):
    # open the file at the path and read it in
    sql_file = open(filepath, 'r')
    lines = sql_file.readlines()
    # create an empty list
    csv_lines = []
    # read lines in - we only want content between brackets on each line
    for line in lines:
        # get rid of the final closing bracket and any other detritus
        temp = re.split(r"\)( VALUES)?,?;?$", line)[0]
        # find the first opening bracket
        bracket_pos = temp.find("(")
        # commit everything after it
        csv_lines.append(temp[bracket_pos + 1:])
    # break down into elements
    csv_lines = list(map(lambda line: line.split(","), csv_lines))
    # remove opening and closing quotes - single, double and `
    csv_lines = list(map(lambda line: list(map(remove_bounding_quotes, line)), csv_lines))
    # in the first row, swap expiresAt for expires_at
    csv_lines[0] = list(map(lambda cell: 'expires_at' if cell == 'expiresAt' else cell, csv_lines[0]))
    # save to the same path but ending .csv
    csv_file = open(re.sub(r'\.sql$', '.csv', filepath), 'w')
    csv_writer = csv.writer(csv_file)
    csv_writer.writerows(csv_lines)
    csv_file.close()
    sql_file.close()


def is_sql_dump_file(filepath):
    # we want sql files that aren't schemas (naming convention is digits or 'schema')
    if re.search(r'\.sql$', filepath):
        return not re.search(r'schema\.sql$', filepath)
    return False


def final_message():
    print("Complete. Please check output for escaped special characters")


if __name__ == "__main__":
    import sys
    import os

    filepath = sys.argv[1]
    is_sql_file = re.search(r'\.sql$', filepath)
    is_directory = os.path.isdir(filepath)
    if is_sql_file:
        convert_to_csv(filepath)
        final_message()
    elif is_directory:
        # find sql files that aren't schemas
        all_files = [f for f in os.listdir(filepath) if os.path.isfile(os.path.join(filepath, f))]
        sql_files = list(filter(is_sql_dump_file, all_files))
        # run a conversion for each
        for sql_file in sql_files:
            convert_to_csv(os.path.join(filepath, sql_file))
        final_message()
    else:
        # not an sql file and not a directory, stop
        print("Not an SQL file or folder")
Once I'd generated my CSVs for each table, actually importing the data into Supabase was very easy. I'd been unsure whether Supabase would cope with PlanetScale's boolean treatment (storing as a 1/0 integer), or if date formats would be an issue, but both were accepted just fine. The only thing I had to be aware of was which tables had foreign key constraints, and therefore work out the order in which I should import the tables.