Making changes to model code

Guidelines for writing model code

No length limits on database columns

Don’t put any length limits on your database columns (for example sqlalchemy.Column(sqlalchemy.Unicode(30), ...)). These can cause painful database migrations.

Always use sqlalchemy.UnicodeText() with no length limit as the type for text columns in the database (you can also use sqlalchemy.Text() if you’re sure the column will never receive non-ASCII characters).

When necessary validate the lengths of strings in Python code instead. This can be done using SQLAlchemy validators in model code.

View callables for HTML forms should also use Colander schemas to validate user input, in addition to any validation done in the model code, because Colander supports returning per-field errors to the user.

Creating a database migration script

If you’ve made any changes to the database schema (for example: added or removed a SQLAlchemy ORM class, or added, removed or modified a sqlalchemy.Column on an ORM class) then you need to create a database migration script that can be used to upgrade the production database from the previous to your new schema.

We use Alembic to create and run migration scripts. See the Alembic docs (and look at existing scripts in h/migrations/versions) for details, but the basic steps to create a new migration script for h are:

  1. Create the revision script by running alembic revision, for example:

    alembic -c conf/alembic.ini revision -m "add the foobar table"
    

    This will create a new script in h/migrations/versions/.

  2. Edit the generated script, fill in the upgrade() and downgrade() methods.

    See https://alembic.readthedocs.io/en/latest/ops.html#ops for details.

    Note

    Not every migration should have a downgrade() method. For example if the upgrade removes a max length constraint on a text field, so that values longer than the previous max length can now be entered, then a downgrade that adds the constraint back may not work with data created using the updated schema.

  3. Stamp your database.

    Before running any upgrades or downgrades you need to stamp the database with its current revision, so Alembic knows which migration scripts to run:

    alembic -c conf/alembic.ini stamp <revision_id>
    

    <revision_id> should be the revision corresponding to the version of the code that was present when the current database was created. The will usually be the down_revision from the migration script that you’ve just generated.

  4. Test your upgrade() function by upgrading your database to the most recent revision. This will run all migration scripts newer than the revision that your db is currently stamped with, which usually means just your new revision script:

    alembic -c conf/alembic.ini upgrade head
    

    After running this command inspect your database’s schema to check that it’s as expected, and run h to check that everything is working.

    Note

    You should make sure that there’s some repesentative data in the relevant columns of the database before testing upgrading and downgrading it. Some migration script crashes will only happen when there’s data present.

  5. Test your downgrade() function:

    alembic -c conf/alembic.ini downgrade -1
    

    After running this command inspect your database’s schema to check that it’s as expected. You can then upgrade it again:

    alembic -c conf/alembic.ini upgrade +1
    

Batch deletes and updates in migration scripts

It’s important that migration scripts don’t lock database tables for too long, so that when the script is run on the production database concurrent database transactions from web requests aren’t held up.

An SQL DELETE command acquires a FOR UPDATE row-level lock on the rows that it selects to delete. An UPDATE acquires a FOR UPDATE lock on the selected rows if the update modifies any columns that have a unique index on them that can be used in a foreign key. While held this FOR UPDATE lock prevents any concurrent transactions from modifying or deleting the selected rows.

So if your migration script is going to DELETE or UPDATE a large number of rows at once and committing that transaction is going to take a long time (longer than 100ms) then you should instead do multiple DELETEs or UPDATEs of smaller numbers of rows, committing each as a separate transaction. This will allow concurrent transactions to be sequenced in-between your migration script’s transactions.

For example, here’s some Python code that deletes all the rows that match a query in batches of 25:

query = <some sqlalchemy query>
query = query.limit(25)
while True:
    if query.count() == 0:
        break
    for row in query:
        session.delete(row)
    session.commit()

Separate data and schema migrations

It’s easier for deployment if you do data migrations (code that creates, updates or deletes rows) and schema migrations (code that modifies the database schema, for example adding a new column to a table) in separate migration scripts instead of combining them into one script. If you have a single migration that needs to modify some data and then make a schema change, implement it as two consecutive migration scripts instead.

Don’t import model classes into migration scripts

Don’t import model classes, for example from h.api.models import Annotation, in migration scripts. Instead copy and paste the Annotation class into your migration script.

This is because the script needs the schema of the Annotation class as it was at a particular point in time, which may be different from the schema in h.api.models.Annotation when the script is run in the future.

The script’s copy of the class usually only needs to contain the definitions of the primary key column(s) and any other columns that the script uses, and only needs the name and type attributes of these columns. Other attributes of the columns, columns that the script doesn’t use, and methods can usually be left out of the script’s copy of the model class.

Troubleshooting migration scripts

(sqlite3.OperationalError) near “ALTER”

SQLite doesn’t support ALTER TABLE. To get around this, use Alembic’s batch mode.

Cannot add a NOT NULL column with default value NULL

If you’re adding a column to the model with nullable=False then when the database is upgraded it needs to insert values into this column for each of the already existing rows in the table, and it can’t just insert NULL as it normally would. So you need to tell the database what default value to insert here.

default= isn’t enough (that’s only used when the application is creating data, not when migration scripts are running), you need to add a server_default= argument to your add_column() call.

See the existing migration scripts for examples.