thunk juice

build the -Wall

Fun with marshmallow-select


The fun starts here

My original announcement of marshmallow-select was unfortunately short on examples of what could be done with it. Let’s fix that.

Consider a server application with a minimal user model (ommitting pks & relationships)

class User(Base):
    first_name = Column(String(100))
    last_name = Column(String(100))
    email = Column(String(100))

such that users have zero or more images associated with their account

class Image(Base):
    url = Column(String(100))
    user_id = Column(Integer, ForeignKey('user.id'))

and can like other user’s images[1]

class Like(Base):
    user_id = Column(Integer, ForeignKey('user.id'))
    image_id = Column(Integer, ForeignKey('image.id'))

We’ll call the resulting app Shitbook.

Now consider what sorts of subsets of these objects you might want to show to users in various Shitbook views.

To start with, we’ll use a combination of marshmallow-select’s ModelSchema to define a set of “Shallow” schemas containing all the direct fields of our models, and two-way nesting from plain old marshmallow to define a set of “Complete” schemas.[2]

class UserSchema(ShallowUserSchema):
    images = List(Nested('ImageSchema'))
    likes = List(Nested('LikeSchema'))


class ImageSchema(ShallowImageSchema):
    user = Nested('UserSchema')
    users_who_like = List(Nested('UserSchema'))


class LikeSchema(Schema):
    user = Nested('UserSchema')
    image = Nested('ImageSchema')

Note that these schemas cannot actually be used. For one thing, they are infinitely recursive. For another, all relationships are lazily loaded. Attempting to serialize any linked object with them will result in python rapidly beating your database and itself to death. As we’ll see, this doesn’t actually matter.[3]

A details view

You’d probably have some sort of single-user view (think “user profile page”) in which you showed all the user’s fields, all his images, and all his likes. But you probably wouldn’t want to show all subfields of each image and like. Let’s say that you want to show just all the urls from the user’s own images, and all the urls from the images that user has liked. Or, to put it another way:

class ImageForUserDetailSchema(ImageSchema):
    class Meta:
        fields = ['id', 'url']

class LikeForUserDetailSchema(LikeSchema):
    image = Nested(ImageForUserDetailSchema)

    class Meta:
        fields = ['id', 'image']

class UserDetailSchema(UserSchema):
    images = List(Nested(ImageForUserDetailSchema))
    likes = List(Nested(LikeForUserDetailSchema))

Now, if you’re crazy like we are at Distribute, and you’re planning to generating typescript interface definitions and backend method calls from swagger, you’ll want to verify that you can get correct swagger output from apispec.

spec = APISpec(
    title='detail',
    version='0.0.0',
    plugins=[
        'apispec.ext.marshmallow',
    ],
)
spec.definition('UserDetail', schema=UserDetailSchema)
spec.definition('ImageForUserDetail', schema=ImageForUserDetailSchema)
spec.definition('LikeForUserDetail', schema=LikeForUserDetailSchema)

which yields the definitions


{
    "ImageForUserDetail": {
        "properties": {
            "id": {
                "format": "int32",
                "type": "integer"
            },
            "url": {
                "maxLength": 100,
                "type": "string",
                "x-nullable": true
            }
        },
        "type": "object"
    },
    "LikeForUserDetail": {
        "properties": {
            "id": {
                "format": "int32",
                "type": "integer"
            },
            "image": {
                "$ref": "#/definitions/ImageForUserDetail"
            }
        },
        "type": "object"
    },
    "UserDetail": {
        "properties": {
            "email": {
                "maxLength": 100,
                "type": "string",
                "x-nullable": true
            },
            "first_name": {
                "maxLength": 100,
                "type": "string",
                "x-nullable": true
            },
            "id": {
                "format": "int32",
                "type": "integer"
            },
            "images": {
                "items": {
                    "$ref": "#/definitions/ImageForUserDetail"
                },
                "type": "array"
            },
            "last_name": {
                "maxLength": 100,
                "type": "string",
                "x-nullable": true
            },
            "likes": {
                "items": {
                    "$ref": "#/definitions/LikeForUserDetail"
                },
                "type": "array"
            }
        },
        "type": "object"
    }
}

Looks fine.

Now let’s fetch and serialize a user according to the UserDetailSchema, first naively

qry = session.query(User).filter(User.id==uid)
data = qry.first()
out = OutUserDetailSchema().dump(data).data

With a running total query counter, we see

begin example: user detail
fetching user detail data
queries: 1
serializing user detail data
queries: 4

To clarify, that means a total of 4 queries: 1 for the original fetch, and 3 during serialization. You can see the same basic thing in the test.

Now we filter first

qry = session.query(User).filter(User.id==uid)
sf = SchemaFilter(UserDetailSchema(), unlazify=True)
qry = sf(qry)
data = qry.first()
out = OutUserDetailSchema().dump(data).data

which yields

begin example: user detail
fetching user detail data
queries: 1
serializing user detail data
queries: 1

In other words, all data necessary to produce the message defined by the schema was retrieved in the initial query.

A list view

You’d probably also have some kind of user-list view which shows just the user’s name and an image. For this, we extend the image model to

class Image(Base):
    url = Column(String(100))
    user_id = Column(Integer, ForeignKey('user.id'))
    is_default = Column(Boolean, default=False)

and define an additional relationship

class User(Base):
    ...
    default_image = relationship(
        'Image',
        uselist=False,
        primaryjoin=(
            "and_(User.id==Image.user_id, "
            "Image.is_default==True)"
        )
    )

Our schemas are[4]

class ImageForUserListEltSchema(ImageSchema):
    class Meta:
        fields = ['id', 'url']


class UserListEltSchema(UserSchema):
    default_image = Nested(ImageForUserListEltSchema)

    class Meta:
        fields = ['id', 'first_name', 'default_image']

Once again, with 2 users in our db, we generate 3 queries unfiltered, and 1 query when SchemaFiltering on UserListEltSchema. Moreover, If we subsequently execute

session.query(Image).get(0).id
session.query(Image).get(1).id

No additional queries are produced. But then if we continue with

session.query(Image).get(2).id

a query is produced, because we fetched only those images which were linked to a user we were fetching via the default_image relationship.

So hopefully you now see that neat stuff can be done. I’m also going to be making the API more convenient (unlazify will be the default, and be configurable), and improving the performance by allowing the query’s projection to be computed at application boot time instead of for every query when executed (and also making computing the projection much faster). So don’t touch that dial.


  1. And also, in this example, their own. But this isn’t a real application. ↩︎

  2. marchmallow-sqlalchemy does not allow you to generate infinitely-recursive schemas. I think this is in an effort to be helpful, rather than just to avoid infinite recursion within its own code. ↩︎

  3. Note that it is not strictly required to define such an infinitely-recursive schema. Any old legacy schema setup you have around the codebase will do. But since the primary use-case for marshmallow-select is avoiding N+1 query bugs, it helps to start with a limiting case in which recursive N+1 queries are guaranteed. ↩︎

  4. alert readers will notice that ImageForUserListEltSchema and ImageForUserDetailSchema are identical. I could use a common schema, but the point is that you could use a different subset of the fields of Image for the two different views. ↩︎

Posted June 5, 2017