build the -Wall
My original announcement of marshmallow-select was unfortunately short on examples of what could be done with it. Let’s fix that.
Consider a server application with a minimal user model (ommitting pks & relationships)
class User(Base):
first_name = Column(String(100))
last_name = Column(String(100))
email = Column(String(100))
such that users have zero or more images associated with their account
class Image(Base):
url = Column(String(100))
user_id = Column(Integer, ForeignKey('user.id'))
and can like other user’s images[1]
class Like(Base):
user_id = Column(Integer, ForeignKey('user.id'))
image_id = Column(Integer, ForeignKey('image.id'))
We’ll call the resulting app Shitbook.
Now consider what sorts of subsets of these objects you might want to show to users in various Shitbook views.
To start with, we’ll use a combination of
marshmallow-select’s
ModelSchema
to define a set of “Shallow” schemas containing all the
direct fields of our models, and
two-way nesting
from plain old
marshmallow
to define a set of “Complete” schemas.[2]
class UserSchema(ShallowUserSchema):
images = List(Nested('ImageSchema'))
likes = List(Nested('LikeSchema'))
class ImageSchema(ShallowImageSchema):
user = Nested('UserSchema')
users_who_like = List(Nested('UserSchema'))
class LikeSchema(Schema):
user = Nested('UserSchema')
image = Nested('ImageSchema')
Note that these schemas cannot actually be used. For one thing, they are infinitely recursive. For another, all relationships are lazily loaded. Attempting to serialize any linked object with them will result in python rapidly beating your database and itself to death. As we’ll see, this doesn’t actually matter.[3]
You’d probably have some sort of single-user view (think “user profile page”) in which you showed all the user’s fields, all his images, and all his likes. But you probably wouldn’t want to show all subfields of each image and like. Let’s say that you want to show just all the urls from the user’s own images, and all the urls from the images that user has liked. Or, to put it another way:
class ImageForUserDetailSchema(ImageSchema):
class Meta:
fields = ['id', 'url']
class LikeForUserDetailSchema(LikeSchema):
image = Nested(ImageForUserDetailSchema)
class Meta:
fields = ['id', 'image']
class UserDetailSchema(UserSchema):
images = List(Nested(ImageForUserDetailSchema))
likes = List(Nested(LikeForUserDetailSchema))
Now, if you’re crazy like we are at Distribute, and you’re planning to generating typescript interface definitions and backend method calls from swagger, you’ll want to verify that you can get correct swagger output from apispec.
spec = APISpec(
title='detail',
version='0.0.0',
plugins=[
'apispec.ext.marshmallow',
],
)
spec.definition('UserDetail', schema=UserDetailSchema)
spec.definition('ImageForUserDetail', schema=ImageForUserDetailSchema)
spec.definition('LikeForUserDetail', schema=LikeForUserDetailSchema)
which yields the definitions
{
"ImageForUserDetail": {
"properties": {
"id": {
"format": "int32",
"type": "integer"
},
"url": {
"maxLength": 100,
"type": "string",
"x-nullable": true
}
},
"type": "object"
},
"LikeForUserDetail": {
"properties": {
"id": {
"format": "int32",
"type": "integer"
},
"image": {
"$ref": "#/definitions/ImageForUserDetail"
}
},
"type": "object"
},
"UserDetail": {
"properties": {
"email": {
"maxLength": 100,
"type": "string",
"x-nullable": true
},
"first_name": {
"maxLength": 100,
"type": "string",
"x-nullable": true
},
"id": {
"format": "int32",
"type": "integer"
},
"images": {
"items": {
"$ref": "#/definitions/ImageForUserDetail"
},
"type": "array"
},
"last_name": {
"maxLength": 100,
"type": "string",
"x-nullable": true
},
"likes": {
"items": {
"$ref": "#/definitions/LikeForUserDetail"
},
"type": "array"
}
},
"type": "object"
}
}
Looks fine.
Now let’s fetch and serialize a user according to the
UserDetailSchema
, first naively
qry = session.query(User).filter(User.id==uid)
data = qry.first()
out = OutUserDetailSchema().dump(data).data
With a running total query counter, we see
begin example: user detail
fetching user detail data
queries: 1
serializing user detail data
queries: 4
To clarify, that means a total of 4 queries: 1 for the original fetch, and 3 during serialization. You can see the same basic thing in the test.
Now we filter first
qry = session.query(User).filter(User.id==uid)
sf = SchemaFilter(UserDetailSchema(), unlazify=True)
qry = sf(qry)
data = qry.first()
out = OutUserDetailSchema().dump(data).data
which yields
begin example: user detail
fetching user detail data
queries: 1
serializing user detail data
queries: 1
In other words, all data necessary to produce the message defined by the schema was retrieved in the initial query.
You’d probably also have some kind of user-list view which shows just the user’s name and an image. For this, we extend the image model to
class Image(Base):
url = Column(String(100))
user_id = Column(Integer, ForeignKey('user.id'))
is_default = Column(Boolean, default=False)
and define an additional relationship
class User(Base):
...
default_image = relationship(
'Image',
uselist=False,
primaryjoin=(
"and_(User.id==Image.user_id, "
"Image.is_default==True)"
)
)
Our schemas are[4]
class ImageForUserListEltSchema(ImageSchema):
class Meta:
fields = ['id', 'url']
class UserListEltSchema(UserSchema):
default_image = Nested(ImageForUserListEltSchema)
class Meta:
fields = ['id', 'first_name', 'default_image']
Once again, with 2 users in our db, we generate 3 queries unfiltered,
and 1 query when SchemaFilter
ing on UserListEltSchema
. Moreover,
If we subsequently execute
session.query(Image).get(0).id
session.query(Image).get(1).id
No additional queries are produced. But then if we continue with
session.query(Image).get(2).id
a query is produced, because we fetched only those images which were
linked to a user we were fetching via the default_image
relationship.
So hopefully you now see that neat stuff can be done. I’m also going
to be making the API more convenient (unlazify
will be the default,
and be configurable), and improving the performance by allowing the
query’s projection to be computed at application boot time instead of
for every query when executed (and also making computing the
projection much faster). So don’t touch that dial.
And also, in this example, their own. But this isn’t a real application. ↩︎
marchmallow-sqlalchemy does not allow you to generate infinitely-recursive schemas. I think this is in an effort to be helpful, rather than just to avoid infinite recursion within its own code. ↩︎
Note that it is not strictly required to define such an infinitely-recursive schema. Any old legacy schema setup you have around the codebase will do. But since the primary use-case for marshmallow-select is avoiding N+1 query bugs, it helps to start with a limiting case in which recursive N+1 queries are guaranteed. ↩︎
alert readers will notice that ImageForUserListEltSchema
and ImageForUserDetailSchema
are identical. I could use a common
schema, but the point is that you could use a different subset of
the fields of Image
for the two different views. ↩︎