SQLAthanor - Serialization / De-serialization for SQLAlchemy

SQLAthanor

Serialization/De-serialization Support for the SQLAlchemy Declarative ORM

Branch Unit Tests
latest Build Status (Travis CI) Code Coverage Status (Codecov) Documentation Status (ReadTheDocs)
v.0.4 Build Status (Travis CI) Code Coverage Status (Codecov) Documentation Status (ReadTheDocs)
v.0.3 Build Status (Travis CI) Code Coverage Status (Codecov) Documentation Status (ReadTheDocs)
v.0.2 Build Status (Travis CI) Code Coverage Status (Codecov) Documentation Status (ReadTheDocs)
v.0.1 Build Status (Travis CI) Code Coverage Status (Codecov) Documentation Status (ReadTheDocs)
develop Build Status (Travis CI) Code Coverage Status (Codecov) Documentation Status (ReadTheDocs)

SQLAthanor is a Python library that extends SQLAlchemy’s fantastic Declarative ORM to provide easy-to-use record serialization/de-serialization with support for:

The library works as a drop-in extension - change one line of existing code, and it should just work. Furthermore, it has been extensively tested on Python 2.7, 3.4, 3.5, 3.6, and 3.7 using SQLAlchemy 0.9 and higher.


Installation

To install SQLAthanor, just execute:

$ pip install sqlathanor

Dependencies


Why SQLAthanor?

Odds are you’ve used SQLAlchemy before. And if you haven’t, why on earth not? It is hands down the best relational database toolkit and ORM available for Python, and has helped me quickly write code for many APIs, software platforms, and data science projects. Just look at some of these great features.

As its name suggests, SQLAlchemy focuses on the problem of connecting your Python code to an underlying relational (SQL) database. That’s a super hard problem, especially when you consider the complexity of abstraction, different SQL databases, different SQL dialects, performance optimization, etc. It ain’t easy, and the SQLAlchemy team has spent years building one of the most elegant solutions out there.

But as hard as Pythonically communicating with a database is, in the real world with microservices, serverless architectures, RESTful APIs and the like we often need to do more with the data than read or write from/to our database. In almost all of the projects I’ve worked on over the last two decades, I’ve had to:

  • hand data off in some fashion (serialize) for another program (possibly written by someone else in another programming language) to work with, or
  • accept and interpret data (de-serialize) received from some other program (possibly written by someone else in another programming language).

Python objects (pickled or not) are great, but they’re rarely the best way of transmitting data over the wire, or communicating data between independent applications. Which is where formats like JSON, CSV, and YAML come in.

So when writing many Python APIs, I found myself writing methods to convert my SQLAlchemy records (technically, model instances) into JSON or creating new SQLAlchemy records based on data I received in JSON. So after writing similar methods many times over, I figured a better approach would be to write the serialization/de-serialization code just once, and then re-use it across all of my various projects.

Which is how SQLAthanor came about.

It adds simple methods like to_json(), new_from_csv(), and update_from_csv() to your SQLAlchemy declarative models and provides powerful configuration options that give you tons of flexibility.

Key SQLAthanor Features

  • Easy to adopt: Just tweak your existing SQLAlchemy import statements and you’re good to go.
  • With one method call, convert SQLAlchemy model instances to:
    • CSV records
    • JSON objects
    • YAML objects
    • dict objects
  • With one method call, create or update SQLAlchemy model instances from:
    • dict objects
    • CSV records
    • JSON objects
    • YAML objects
  • Decide which serialization formats you want to support for which models.
  • Decide which columns/attributes you want to include in their serialized form (and pick different columns for different formats, too).
  • Default validation for de-serialized data for every SQLAlchemy data type.
  • Customize the validation used when de-serializing particular columns to match your needs.
  • Works with Declarative Reflection and the Automap Extension.
  • Programmatically generate Declarative Base Models from serialized data.
  • Programmatically create SQLAlchemy Table objects from serialized data.

SQLAthanor vs Alternatives

Since serialization and de-serialization are common problems, there are a variety of alternative ways to serialize and de-serialize your SQLAlchemy models. Obviously, I’m biased in favor of SQLAthanor. But it might be helpful to compare SQLAthanor to some commonly-used alternatives:

Adding your own custom serialization/de-serialization logic to your SQLAlchemy declarative models is a very viable strategy. It’s what I did for years, until I got tired of repeating the same patterns over and over again, and decided to build SQLAthanor instead.

But of course, implementing custom serialization/de-serialization logic takes a bit of effort.

Tip

When to use it?

In practice, I find that rolling my own solution is great when it’s a simple model with very limited business logic. It’s a “quick-and-dirty” solution, where I’m trading rapid implementation (yay!) for less flexibility/functionality (boo!).

Considering how easy SQLAthanor is to configure / apply, however, I find that I never really roll my own serialization/de-serialization approach when working SQLAlchemy models any more.

The Marshmallow library and its Marshmallow-SQLAlchemy extension are both fantastic. However, they have one major architectural difference to SQLAthanor and several more minor differences:

The biggest difference is that by design, they force you to maintain two representations of your data model. One is your SQLAlchemy model class, while the other is your Marshmallow schema (which determines how your model is serialized/de-serialized). Marshmallow-SQLAlchemy specifically tries to simplify this by generating a schema based on your model class, but you still need to configure, manage, and maintain both representations - which as your project gets more complex, becomes non-trivial.

SQLAthanor by contrast lets you configure serialization/deserialization in your SQLAlchemy model class definition. You’re only maintaining one data model representation in your Python code, which is a massive time/effort/risk-reduction.

Other notable differences relate to the API/syntax used to support non-JSON formats. I think Marshmallow uses a non-obvious approach, while with SQLAthanor the APIs are clean and simple. Of course, on this point, YMMV.

Tip

When to use it?

Marshmallow has one advantage over SQLAthanor: It can serialize/de-serialize any Python object, whether it is a SQLAlchemy model class or not. SQLAthanor only works with SQLAlchemy.

As a result, it may be worth using Marshmallow instead of SQLAthanor if you expect to be serializing / de-serializing a lot of non-SQLAlchemy objects.

The Colander library and the ColanderAlchemy extension are both great, but they have a similar major architectural difference to SQLAthanor as Marshmallow/Marshmallow-SQLAlchemy:

By design, they force you to maintain two representations of your data model. One is your SQLAlchemy model class, while the other is your Colander schema (which determines how your model is serialized/de-serialized). ColanderAlchemy tries to simplify this by generating a schema based on your model class, but you still need to configure, manage, and maintain both representations - which as your project gets more complex, becomes non-trivial.

SQLAthanor by contrast lets you configure serialization/deserialization in your SQLAlchemy model class definition. You’re only maintaining one data model representation in your Python code, which is a massive time/effort/risk-reduction.

A second major difference is that, again by design, Colander is designed to serialize/de-serialize Python objects to a set of Python primitives. Since neither JSON, CSV, or YAML are Python primitives, you’ll still need to serialize/de-serialize Colander’s input/output to/from its final “transmissable” form. Once you’ve got a Python primitive, this isn’t difficult - but it is an extra step.

Tip

When to use it?

Colander has one advantage over SQLAthanor: It can serialize/de-serialize any Python object, whether it is a SQLAlchemy model class or not. SQLAthanor only works with SQLAlchemy.

As a result, it may be worth using Colander instead of SQLAthanor if you expect to be serializing / de-serializing a lot of non-SQLAlchemy objects.

pandas is one of my favorite analytical libraries. It has a number of great methods that adopt a simple syntax, like read_csv() or to_csv() which de-serialize / serialize data to various formats (including SQL, JSON, CSV, etc.).

So at first blush, one might think: Why not just use pandas to handle serialization/de-serialization?

Well, pandas isn’t really a serialization alternative to SQLAthanor. More properly, it is an ORM alternative to SQLAlchemy itself.

I could write (and have written) a lot on the subject, but the key difference is that pandas is a “lightweight” ORM that focuses on providing a Pythonic interface to work with the output of single SQL queries. It does not support complex relationships between tables, or support the abstracted definition of business logic that applies to an object representation of a “concept” stored in your database.

SQLAlchemy is specifically designed to do those things.

So you can think of pandas as being a less-abstract, “closer to bare metal” ORM - which is what you want if you want very efficient computations, on relatively “flat” (non-nested/minimally relational) data. Modification or manipulation of the data can be done by mutating your pandas DataFrame without too much maintenance burden because those mutations/modifications probably don’t rely too much on complex abstract business logic.

SQLAthanor piggybacks on SQLAlchemy’s business logic-focused ORM capabilities. It is designed to allow you to configure expected behavior once and then re-use that capability across all instances (records) of your data. And it’s designed to play well with all of the other complex abstractions that SQLAlchemy supports, like relationships, hybrid properties, reflection, or association proxies.

pandas serialization/de-serialization capabilities can only be configured “at use-time” (in the method call), which leads to a higher maintenance burden. SQLAthanor’s serialization/de-serialization capabilities are specifically designed to be configurable when defining your data model.

Tip

When to use it?

The decision of whether to use pandas or SQLAlchemy is a complex one, but in my experience a good rule of thumb is to ask yourself whether you’re going to need to apply complex business logic to your data.

The more complex the business logic is, the more likely SQLAlchemy will be a better solution. And if you are using SQLAlchemy, then SQLAthanor provides great and easy-to-use serialization/de-serialization capabilities.


Hello, World and Basic Usage

SQLAthanor is a drop-in replacement for the SQLAlchemy Declarative ORM and parts of the SQLAlchemy Core.

1. Import SQLAthanor

Since SQLAthanor is a drop-in replacement, you should import it using the same elements as you would import from SQLAlchemy:

The code below is a pretty standard set of import statements when working with SQLAlchemy and its Declarative ORM.

They’re provided for reference below, but do not make use of SQLAthanor and do not provide any support for serialization or de-serialization:

from sqlalchemy.ext.declarative import declarative_base, as_declarative
from sqlalchemy import Column, Integer, String          # ... and any other data types

# The following are optional, depending on how your data model is designed:
from sqlalchemy.orm import relationship
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.ext.associationproxy import association_proxy

To import SQLAthanor, just replace the relevant SQLAlchemy imports with their SQLAthanor counterparts as below:

from sqlathanor import declarative_base, as_declarative
from sqlathanor import Column
from sqlathanor import relationship             # This import is optional, depending on
                                                # how your data model is designed.

from sqlalchemy import Integer, String          # ... and any other data types

# The following are optional, depending on how your data model is designed:
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.ext.associationproxy import association_proxy

Tip

Because of its many moving parts, SQLAlchemy splits its various pieces into multiple modules and forces you to use many import statements.

The example above maintains this strategy to show how SQLAthanor is a 1:1 drop-in replacement. But obviously, you can import all of the items you need in just one import statement:

from sqlathanor import declarative_base, as_declarative, Column, relationship

SQLAthanor is designed to work with Flask-SQLAlchemy too! However, you need to:

  1. Import the FlaskBaseModel class, and then supply it as the model_class argument when initializing Flask-SQLAlchemy.
  2. Initialize SQLAthanor on your db instance using initialize_flask_sqlathanor.
from sqlathanor import FlaskBaseModel, initialize_flask_sqlathanor
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////tmp/test.db'

db = SQLAlchemy(app, model_class = FlaskBaseModel)
db = initialize_flask_sqlathanor(db)

And that’s it! Now SQLAthanor serialization functionality will be supported by:

  • Flask-SQLAlchemy’s db.Model
  • Flask-SQLAlchemy’s db.relationship()
  • Flask-SQLAlchemy’s db.Column

See also

For more information about working with Flask-SQLAlchemy, please review their detailed documentation.

As the examples provided above show, importing SQLAthanor is very straightforward, and you can include it in an existing codebase quickly and easily. In fact, your code should work just as before. Only now it will include new functionality to support serialization and de-serialization.

The table below shows how SQLAlchemy classes and functions map to their SQLAthanor replacements:

SQLAlchemy Component SQLAthanor Analog

declarative_base()

from sqlalchemy.ext.declarative import declarative_base

declarative_base()

from sqlathanor import declarative_base

@as_declarative

from sqlalchemy.ext.declarative import as_declarative

@as_declarative

from sqlathanor import as_declarative

Column

from sqlalchemy import Column

Column

from sqlathanor import Column

relationship()

from sqlalchemy import relationship

relationship()

from sqlathanor import relationship

ext.automap.automap_base()

from sqlalchemy.ext.automap import automap_base

automap.automap_base()

from sqlathanor.automap import automap_base

2. Declare Your Models

Now that you have imported SQLAthanor, you can just declare your models the way you normally would, even using the exact same syntax.

But now when you define your model, you can also configure serialization and de-serialization for each attribute using two approaches:

  • The Meta Configuration approach lets you define a single __serialization__ attribute on your model that configures serialization/de-serialization for all of your model’s columns, hybrid properties, association proxies, and properties.
  • The Declarative Configuration approach lets you supply additional arguments to your attribute definitions that control whether and how they are serialized, de-serialized, or validated.

Note

explicit is better than implicit

PEP 20 - The Zen of Python

By default, all columns, relationships, association proxies, and hybrid properties will not be serialized. In order for a column, relationship, proxy, or hybrid property to be serializable to a given format or de-serializable from a given format, you will need to explicitly enable serialization/deserialization.

from sqlathanor import declarative_base, Column, relationship, AttributeConfiguration

from sqlalchemy import Integer, String
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.ext.associationproxy import association_proxy

BaseModel = declarative_base()

class User(BaseModel):
  __tablename__ = 'users'

  __serialization__ = [AttributeConfiguration(name = 'id',
                                              supports_csv = True,
                                              csv_sequence = 1,
                                              supports_json = True,
                                              supports_yaml = True,
                                              supports_dict = True,
                                              on_serialize = None,
                                              on_deserialize = None),
                       AttributeConfiguration(name = 'addresses',
                                              supports_json = True,
                                              supports_yaml = (True, True),
                                              supports_dict = (True, False),
                                              on_serialize = None,
                                              on_deserialize = None),
                       AttributeConfiguration(name = 'hybrid',
                                              supports_csv = True,
                                              csv_sequence = 2,
                                              supports_json = True,
                                              supports_yaml = True,
                                              supports_dict = True,
                                              on_serialize = None,
                                              on_deserialize = None)]
                       AttributeConfiguration(name = 'keywords',
                                              supports_csv = False,
                                              supports_json = True,
                                              supports_yaml = True,
                                              supports_dict = True,
                                              on_serialize = None,
                                              on_deserialize = None)]
                       AttributeConfiguration(name = 'python_property',
                                              supports_csv = (False, True),
                                              csv_sequence = 3,
                                              supports_json = (False, True),
                                              supports_yaml = (False, True),
                                              supports_dict = (False, True),
                                              on_serialize = None,
                                              on_deserialize = None)]

  id = Column('id',
              Integer,
              primary_key = True)

  addresses = relationship('Address',
                           backref = 'user')

  _hybrid = 1

  @hybrid_property
  def hybrid(self):
      return self._hybrid

  @hybrid.setter
  def hybrid(self, value):
      self._hybrid = value

  @hybrid.expression
  def hybrid(cls):
    return False

  keywords = association_proxy('keywords', 'keyword')

  @property
  def python_property(self):
    return self._hybrid * 2

As you can see, we’ve added a __serialization__ attribute to your standard model. The __serialization__ attribute takes a list of AttributeConfiguration instances, where each configures the serialization and de-serialization of a model attribute.

from sqlathanor import declarative_base

BaseModel = declarative_base()

class User(BaseModel):
  __tablename__ = 'users'

  id = Column("id",
              Integer,
              primary_key = True,
              autoincrement = True,
              supports_csv = True,
              csv_sequence = 1,
              supports_json = True,
              supports_yaml = True,
              supports_dict = True,
              on_serialize = None,
              on_deserialize = None)

  name = Column("name",
                Text,
                supports_csv = True,
                csv_sequence = 2,
                supports_json = True,
                supports_yaml = True,
                supports_dict = True,
                on_serialize = None,
                on_deserialize = None)

  email = Column("email",
                 Text,
                 supports_csv = True,
                 csv_sequence = 3,
                 supports_json = True,
                 supports_yaml = True,
                 supports_dict = True,
                 on_serialize = None,
                 on_deserialize = validators.email)

   password = Column("password",
                     Text,
                     supports_csv = (True, False),
                     csv_sequence = 4,
                     supports_json = (True, False),
                     supports_yaml = (True, False),
                     supports_dict = (True, False),
                     on_serialize = None,
                     on_deserialize = my_custom_password_hash_function)

As you can see, we’ve just added some (optional) arguments to the Column constructor. Hopefully these configuration arguments are self-explanatory.

Both the Meta and the Declarative configuration approaches use the same API for configuring serialization and de-serialization. While there are a lot of details, in general, the configuration arguments are:

  • supports_<format> determines whether that attribute is included when serializing or de-serializing the object to the <format> indicated.

    Tip

    If you give these options one value, it will either enable (True) or disable (False) both serialization and de-serialization, respectively.

    But you can also supply a tuple with two values, where the first value controls whether the attribute supports the format when inbound (de-serialization) or whether it supports the format when outbound (serialization).

    In the example above, the password attribute will not be included when serializing the object (outbound). But it will be expected / supported when de-serializing the object (inbound).

  • on_serialize indicates the function or functions that are used to prepare an attribute for serialization. This can either be a single function (that applies to all serialization formats) or a dict where each key corresponds to a format and its value is the function to use when serializing to that format.

    Tip

    If on_serialize is left as None, then SQLAthanor will apply a default on_serialize function based on the attribute’s data type.

  • on_deserialize indicates the function or functions that are used to validate or pre-process an attribute when de-serializing. This can either be a single function (that applies to all formats) or a dict where each key corresponds to a format and its value is the function to use when de-serializing from that format.

    Tip

    If on_deserialize is left as None, then SQLAthanor will apply a default on_deserialize function based on the attribute’s data type.

3. Serialize Your Model Instance

So now let’s say you have a model instance and want to serialize it. It’s super easy:

# Get user with id == 123 from the database
user = User.query.get(123)

# Serialize the user record to a JSON string.
serialized_version = user.to_json()
# Get user with id == 123 from the database
user = User.query.get(123)

# Serialize the user record to a CSV string.
serialized_version = user.to_csv()
# Get user with id == 123 from the database
user = User.query.get(123)

# Serialize the user record to a YAML string.
serialized_version = user.to_yaml()
# Get user with id == 123 from the database
user = User.query.get(123)

# Serialize the user record to a Python dict.
serialized_version = user.to_dict()

That’s it! Of course, the serialization methods all support a variety of other (optional!) options to fine-tune their behavior (CSV formatting, relationship nesting, etc.).

4. De-serialize a Model Instance

Now let’s say you receive a User object in serialized form and want to create a proper Python User object. That’s easy, too:

# EXAMPLE 1: Create a new User from a JSON string called "deserialized_object".
user = User.new_from_json(deserialized_object)

# EXAMPLE 2: Update an existing "user" instance from a JSON
# string called "deserialized_object".
user.update_from_json(updated_object)
# EXAMPLE 1: Create a new User from a CSV string called "deserialized_object".
user = User.new_from_csv(deserialized_object)

# EXAMPLE 2: Update an existing "user" instance from a CSV
# string called "deserialized_object".
user.update_from_csv(updated_object)
# EXAMPLE 1: Create a new User from a YAML string called "deserialized_object".
user = User.new_from_json(deserialized_object)

# EXAMPLE 2: Update an existing "user" instance from a YAML
# string called "deserialized_object".
user.update_from_yaml(updated_object)
# EXAMPLE 1: Create a new User from a dict called "deserialized_object".
user = User.new_from_dict(deserialized_object)

# EXAMPLE 2: Update an existing "user" instance from a dict called
# "deserialized_object".
user.update_from_dict(updated_object)

That’s it! Of course, all the de-serialization functions have additional options to fine-tune their behavior as needed. But that’s it.


Questions and Issues

You can ask questions and report issues on the project’s Github Issues Page


Contributing

We welcome contributions and pull requests! For more information, please see the Contributor Guide


Testing

We use TravisCI for our build automation and ReadTheDocs for our documentation.

Detailed information about our test suite and how to run tests locally can be found in our Testing Reference.


License

SQLAthanor is made available under an MIT License.