Many-To-Many Relationships In FastAPI
In this tutorial, I cover multiple strategies for handling many-to-many relationships using FastAPI with SQLAlchemy and pydantic. I assume you already know what these things are and how to use them in a basic sense.
The code for this tutorial can be found on github.
Intro
I’m working on a project in FastAPI and I needed a way to store and serve data with a many-to-many relationship. What I thought would take ~15 minutes turned into a week-long affair of investigative research and experimentation.
Aside
My recent week was one of those emotional roller coasters that every programmer experiences. You desperately want to accomplish some goal, but minutes turn into hours and hours into days and you still don’t have a solution. You face incredible frustration - borderline depression - where you’re reading undocumented source code at 1AM that raises more questions than it answers. But then, usually after a good sleep and some time away, you figure it out. And as a result of the struggle, you’re way smarter than you were a week ago. I’m sure a lot of you can relate.
About this tutorial
In this tutorial, I’ll not only show you how to handle many-to-many relationships, but I’ll walk you through my personal discovery process (with less pain, of course). This means we’ll see some dead ends and non-ideal solutions, but some of those dead ends introduce really powerful techniques that could be useful elsewhere. Trust me - it’s worth it.
Setup
The best tutorials start with a motivating example..
Suppose we want to build a REST API to provide information about books and authors. A book can have multiple authors and an author can have multiple books. This is a classic data relationship known as “many-to-many”. To store such data in a SQL database, you need three tables.
- books: {id, title}
- authors: {id, name}
- book_authors: {book_id, author_id}
If we want to know which books were written by author 1, we join authors to book_authors to books, filtered where authors.id = 1.
# books written by author 1
SELECT temp.* FROM (
    SELECT
        books.*,
        book_authors.author_id
    FROM books INNER JOIN book_authors ON books.id = book_authors.book_id
) AS temp
INNER JOIN authors ON temp.author_id = authors.id
WHERE authors.id = 1
Here, book_authors is what’s known as a “junction table” or an “association table”.
Modeling this kind of data in SQLAlchemy and FastAPI actually isn’t all that hard. The real struggle starts when you want to include extra data in the association table. For example, books usually have a blurb about each author that discusses their contribution to the book. A book about pollution might have two authors with blurbs like
- “Harry is a scientist who wrote chapters 1-3.”
- “Susan is an economist who wrote chapter 4.”
In this case, blurb is specific to each (book, author) pair, and so it should be stored as a column in the book_authors junction table. It turns out, modeling and serving this data is kind of tricky if you’re anal about data structures like I am. But before we dive into the specifics, let’s get our feet wet by implementing a FastAPI application for serving many-to-many data without extra data.
Many-to-many without extra data
In this section, we’ll attempt to implement a FastAPI app called Bookipedia which stores data on books and their authors. It’s the simpler form of the problem discussed above; here our junction table is simply a mapping between books and authors without any extra information. That is, we want to build the app around these three tables:
Our app will be a very simple read-only API with four endpoints:
GET /books
GET /books/{id}
GET /authors
GET /authors/{id}
When a user requests a specific book like GET /books/2, they should receive a JSON response like
{
    id: 2,
    title: 'Foo',
    authors: [
        {id: 0, name: 'bob'},
        {id: 1, name: 'sue'}
    ]
}
Challenge
Before I show you my solution to this problem, I strongly encourage you to attempt to build this yourself. You learn more when you struggle :)
Solution
See the complete code on github.
SQLAlchemy
First let’s set up our SQLAlchemy models. My code below is based on this section of the SQLAlchemy docs, except I include the additional stuff necessary to create an in-memory SQLite database. (Check out my tutorial on SQLAlchemy if the below code is super confusing.)
"""
FastAPI app called 'Bookipedia' that serves information about books and their authors. A simple example of a
"many-to-many" relationship *without* extra data.
"""
from sqlalchemy import create_engine, Column, Integer, String, Table, ForeignKey
from sqlalchemy.orm import declarative_base, relationship
# Make the engine
engine = create_engine("sqlite+pysqlite:///:memory:", future=True, echo=True,
connect_args={"check_same_thread": False})
# Make the DeclarativeMeta
Base = declarative_base()
# Declare Classes / Tables
book_authors = Table('book_authors', Base.metadata,
Column('book_id', ForeignKey('books.id'), primary_key=True),
Column('author_id', ForeignKey('authors.id'), primary_key=True)
)
class Book(Base):
__tablename__ = 'books'
id = Column(Integer, primary_key=True)
title = Column(String, nullable=False)
authors = relationship("Author", secondary="book_authors", back_populates='books')
class Author(Base):
__tablename__ = 'authors'
id = Column(Integer, primary_key=True)
name = Column(String, nullable=False)
books = relationship("Book", secondary="book_authors", back_populates='authors')
# Create the tables in the database
Base.metadata.create_all(engine)
Notes
- We define the book_authors table using SQLAlchemy’s Table() class (“imperative style”), and we define the books and authors tables using inheritance (“declarative style”). We could’ve used imperative style only or declarative style only, but opted for a mix of both. (This is weird to me, but it’s what they do in the docs..)
- In the junction table, we declared the primary key as the pair of columns (book_id, author_id). Because of this, every (book_id, author_id) pair in the table must be unique.
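To see that uniqueness constraint in action, here’s a minimal standalone sketch (the table is rebuilt with SQLAlchemy Core only, without the ORM models, purely for illustration) showing that a duplicate (book_id, author_id) pair is rejected:

```python
from sqlalchemy import create_engine, Column, Integer, MetaData, Table
from sqlalchemy.exc import IntegrityError

engine = create_engine("sqlite+pysqlite:///:memory:", future=True)
metadata = MetaData()

# Same composite primary key as book_authors above
book_authors = Table('book_authors', metadata,
    Column('book_id', Integer, primary_key=True),
    Column('author_id', Integer, primary_key=True)
)
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(book_authors.insert().values(book_id=1, author_id=1))

# A second insert of the same (book_id, author_id) pair violates the primary key
duplicate_rejected = False
try:
    with engine.begin() as conn:
        conn.execute(book_authors.insert().values(book_id=1, author_id=1))
except IntegrityError:
    duplicate_rejected = True
```

The database itself enforces the pairing, so the application code never needs to check for duplicates.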
Now let’s insert some data. Here we create two books and three authors, mapping authors 1 & 2 to book 1 and authors 1 & 3 to book 2.
# Insert data
from sqlalchemy.orm import Session

with Session(bind=engine) as session:
    book1 = Book(title="Dead People Who'd Be Influencers Today")
    book2 = Book(title="How To Make Friends In Your 30s")
    author1 = Author(name="Blu Renolds")
    author2 = Author(name="Chip Egan")
    author3 = Author(name="Alyssa Wyatt")
    book1.authors = [author1, author2]
    book2.authors = [author1, author3]
    session.add_all([book1, book2, author1, author2, author3])
    session.commit()
Now let’s run some queries on the data. If you’re feeling ambitious, try solving these problems on your own.
- Get the book with id 1 and print its title

with Session(bind=engine) as session:
    b1 = session.query(Book).where(Book.id == 1).one()
    print(b1.title)
- Print the name of each of its authors

with Session(bind=engine) as session:
    b1 = session.query(Book).where(Book.id == 1).one()
    for a in b1.authors:
        print(a.name)
Hold up. In the last example, it looks like SQLAlchemy executed two separate SQL queries: one when we fetched the book and another when we accessed its authors. (Make sure you set echo=True when instantiating the database engine to see this.)
This is commonly called the “N+1 SELECTs” problem. It happens because SQLAlchemy lazily loads objects related to an instance. This is a helpful default behavior, but if you know ahead of time that you’ll need those related objects, tell SQLAlchemy to fetch them in the first query; otherwise you might suffer performance issues. To fix this, just modify the query using joinedload() as follows:
from sqlalchemy.orm import joinedload

with Session(bind=engine) as session:
    b1 = session.query(Book).\
        options(joinedload(Book.authors)).\
        where(Book.id == 1).one()
    print(b1.title)
You can read more about data loading strategies here.
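For instance, selectinload() is an alternative eager-loading strategy that emits one extra SELECT ... WHERE id IN (...) query instead of a JOIN, which can be more efficient for collections. Here’s a minimal standalone sketch (models condensed from above, with made-up demo data):

```python
from sqlalchemy import create_engine, Column, Integer, String, Table, ForeignKey
from sqlalchemy.orm import declarative_base, relationship, Session, selectinload

engine = create_engine("sqlite+pysqlite:///:memory:", future=True)
Base = declarative_base()

book_authors = Table('book_authors', Base.metadata,
    Column('book_id', ForeignKey('books.id'), primary_key=True),
    Column('author_id', ForeignKey('authors.id'), primary_key=True)
)

class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    authors = relationship("Author", secondary=book_authors)

class Author(Base):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)

Base.metadata.create_all(engine)

with Session(bind=engine) as session:
    session.add(Book(title="Demo", authors=[Author(name="A"), Author(name="B")]))
    session.commit()

with Session(bind=engine) as session:
    # One SELECT for books, then one SELECT ... WHERE book_id IN (...) for authors
    b1 = session.query(Book).options(selectinload(Book.authors)).first()
    names = sorted(a.name for a in b1.authors)
```

Either strategy avoids the N+1 problem; which one is faster depends on your data and database.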
pydantic
Now let’s set up pydantic models which we can use as response models in our FastAPI path operation functions. (Recall, pydantic models give us a way to document API input and output, validate data, and serialize data to/from JSON.)
from typing import List
from pydantic import BaseModel

class AuthorBase(BaseModel):
    id: int
    name: str

    class Config:
        orm_mode = True

class BookBase(BaseModel):
    id: int
    title: str

    class Config:
        orm_mode = True

class BookSchema(BookBase):
    authors: List[AuthorBase]

class AuthorSchema(AuthorBase):
    books: List[BookBase]
Notes
- It might seem like overkill to set up four models. Why not just two - Author and Book? Try it and you’ll quickly realize you can’t avoid a circular dependency issue whereby Author depends on Book and Book depends on Author.
Let’s try populating a BookSchema and serializing it to JSON.
with Session(bind=engine) as session:
    b1 = session.query(Book).options(joinedload(Book.authors)).first()
    b1_schema = BookSchema.from_orm(b1)
    print(b1_schema.json())
{
    "id": 1,
    "title": "Dead People Who'd Be Influencers Today",
    "authors": [
        {
            "id": 1,
            "name": "Blu Renolds"
        },
        {
            "id": 2,
            "name": "Chip Egan"
        }
    ]
}
FastAPI
Finally, let’s set up our FastAPI app and path operation functions.
"""
FastAPI app called 'Bookipedia' that serves information about books and their authors. A simple example of a
"many-to-many" relationship *without* extra data.
"""
# <SQLAlchemy code not shown...>
# <pydantic code not shown...>
from fastapi import FastAPI, Depends
app = FastAPI(title="Bookipedia")
def get_db():
db = Session(bind=engine)
try:
yield db
finally:
db.close()
@app.get("/books/{id}", response_model=BookSchema)
async def get_book(id: int, db: Session = Depends(get_db)):
db_book = db.query(Book).options(joinedload(Book.authors)).\
where(Book.id == id).one()
return db_book
@app.get("/books", response_model=List[BookSchema])
async def get_books(db: Session = Depends(get_db)):
db_books = db.query(Book).options(joinedload(Book.authors)).all()
return db_books
@app.get("/authors/{id}", response_model=AuthorSchema)
async def get_author(id: int, db: Session = Depends(get_db)):
db_author = db.query(Author).options(joinedload(Author.books)).\
where(Author.id == id).one()
return db_author
@app.get("/authors", response_model=List[AuthorSchema])
async def get_authors(db: Session = Depends(get_db)):
db_authors = db.query(Author).options(joinedload(Author.books)).all()
return db_authors
Notes
- We didn’t set up any error handling for the sake of brevity, so when you test the endpoints, make sure you pass in a valid Book id and Author id. (SQLite auto-increments ids starting from 1.)
- One thing that’s a little bit mysterious here is how FastAPI converts our SQLAlchemy model instances into JSON. For example, in the get_book() path operation function, we populate and return a Book instance, yet somehow FastAPI serves a JSON serialization of that instance. What happens under the hood is more complicated than this, but in essence it’s:
schema_book = BookSchema.from_orm(db_book)
schema_book.json()
Many-to-many with extra data
At this point, I’d like to make a seemingly benign tweak to the Bookipedia app we built in the last section. I want to add a blurb field to the book_authors table so that we can store blurbs about each author for each book.
Now when a user requests a specific book like GET /books/2, they should receive a JSON response like
{
    id: 2,
    title: 'Foo',
    authors: [
        {id: 0, name: 'bob', blurb: 'this is a blurb'},
        {id: 1, name: 'sue', blurb: 'this is also a blurb'}
    ]
}
Our seemingly benign tweak to the data turns out to wreak havoc (or at least, it did for me). Again, I challenge you to attempt solving this yourself..
SQLAlchemy Association Object Pattern
The first thing we need to do is add a blurb field to our SQLAlchemy model. We’ll change our declaration of book_authors from this..
book_authors = Table('book_authors', Base.metadata,
    Column('book_id', ForeignKey('books.id'), primary_key=True),
    Column('author_id', ForeignKey('authors.id'), primary_key=True)
)
to this
class BookAuthor(Base):
    __tablename__ = 'book_authors'
    book_id = Column(ForeignKey('books.id'), primary_key=True)
    author_id = Column(ForeignKey('authors.id'), primary_key=True)
    blurb = Column(String, nullable=False)
We leave the Book and Author models unchanged (for now).
class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    authors = relationship("Author", secondary="book_authors", back_populates='books')

class Author(Base):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    books = relationship("Book", secondary="book_authors", back_populates='authors')
This is described in the SQLAlchemy Docs as the association object pattern. We can insert the same data as before but with blurbs like this.
# Insert data
from sqlalchemy.orm import Session

with Session(bind=engine) as session:
    book1 = Book(title="Dead People Who'd Be Influencers Today")
    book2 = Book(title="How To Make Friends In Your 30s")
    author1 = Author(name="Blu Renolds")
    author2 = Author(name="Chip Egan")
    author3 = Author(name="Alyssa Wyatt")
    session.add_all([book1, book2, author1, author2, author3])
    session.commit()

    book_author1 = BookAuthor(book_id=book1.id, author_id=author1.id, blurb="Blue wrote chapter 1")
    book_author2 = BookAuthor(book_id=book1.id, author_id=author2.id, blurb="Chip wrote chapter 2")
    book_author3 = BookAuthor(book_id=book2.id, author_id=author1.id, blurb="Blue wrote chapters 1-3")
    book_author4 = BookAuthor(book_id=book2.id, author_id=author3.id, blurb="Alyssa wrote chapter 4")
    session.add_all([book_author1, book_author2, book_author3, book_author4])
    session.commit()
Next let’s tweak our pydantic models to include an Optional blurb field.
from typing import List, Optional
from pydantic import BaseModel

class AuthorBase(BaseModel):
    id: int
    name: str
    blurb: Optional[str]

    class Config:
        orm_mode = True

class BookBase(BaseModel):
    id: int
    title: str
    blurb: Optional[str]

    class Config:
        orm_mode = True

class BookSchema(BookBase):
    authors: List[AuthorBase]

class AuthorSchema(AuthorBase):
    books: List[BookBase]
Here we add blurb as an Optional field to both AuthorBase and BookBase. If we fetch an author and his related books, we expect each related book to have blurb populated. If we fetch a book and its related authors, we expect each author to have blurb populated.
Let’s see what happens if we attempt to fetch the book with id 1 and use it to populate the BookSchema pydantic model we set up earlier.
with Session(bind=engine) as session:
    db_book = session.query(Book).\
        options(joinedload(Book.authors)).\
        where(Book.id == 1).one()
    schema_book = BookSchema.from_orm(db_book)
    print(schema_book.json())
{
    "id": 1,
    "title": "Dead People Who'd Be Influencers Today",
    "blurb": null,
    "authors": [
        {
            "id": 1,
            "name": "Blu Renolds",
            "blurb": null
        },
        {
            "id": 2,
            "name": "Chip Egan",
            "blurb": null
        }
    ]
}
No error, but all the blurbs come back null. This shouldn’t be too surprising.. db_book is a Book instance which points to a collection of Author instances, but blurb is a field of BookAuthor, not Author. In other words, db_book.authors[0].blurb doesn’t exist.
Association Proxy
See the complete code on github.
My first solution to this problem was to take advantage of SQLAlchemy’s Association Proxy (thanks to the hint from “van” here).
Step 1: Adjust SQLAlchemy Relationships
In the Author model, make books point to BookAuthor instead of Book. In other words, change this
class Author(Base):
    ...
    books = relationship("Book", secondary="book_authors", back_populates='authors')
to this
class Author(Base):
    ...
    books = relationship("BookAuthor")
Then give BookAuthor an author field, which points to the Author model like
class BookAuthor(Base):
    ...
    author = relationship("Author")
Do essentially the same thing for the Book model. I’ll include back_populates as well, so my updated models look like this.
class BookAuthor(Base):
    __tablename__ = 'book_authors'
    book_id = Column(ForeignKey('books.id'), primary_key=True)
    author_id = Column(ForeignKey('authors.id'), primary_key=True)
    blurb = Column(String, nullable=False)
    book = relationship("Book", back_populates="authors")
    author = relationship("Author", back_populates="books")

class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    authors = relationship("BookAuthor", back_populates="book")

class Author(Base):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    books = relationship("BookAuthor", back_populates="author")
Now when we fetch a book, we can access its first author’s blurb and name as follows
with Session(bind=engine) as session:
    db_book = session.query(Book).first()
    db_book.authors[0].blurb
    db_book.authors[0].author.name
Important: Notice that to get the name of the author, we can’t simply call db_book.authors[0].name. That’s because db_book.authors[0] is a BookAuthor instance and BookAuthor doesn’t have a name attribute. It does, however, have an author attribute. So we can fetch the author’s name via db_book.authors[0].author.name.
Step 2: Create Association Proxies
Suppose we fetch a book and attempt to populate our existing BookSchema with it.
with Session(bind=engine) as session:
    db_book = session.query(Book).options(
        joinedload(Book.authors).options(joinedload(BookAuthor.author))
    ).first()
    schema_book = BookSchema.from_orm(db_book)
As it stands, we’ll get an error like this.
pydantic.error_wrappers.ValidationError: 4 validation errors for BookSchema
authors -> 0 -> id
  field required (type=value_error.missing) …
That’s because our BookSchema pydantic model attempts to populate the related authors' ids using book.authors[0].id, book.authors[1].id, etc. But as we discussed a minute ago, the Author attributes need to be accessed like book.authors[0].author.id or book.authors[0].author.name. To overcome this, we can place association proxies in BookAuthor for each Author attribute we want to be directly accessible from the BookAuthor model. For example,
from sqlalchemy.ext.associationproxy import association_proxy

class BookAuthor(Base):
    __tablename__ = 'book_authors'
    book_id = Column(ForeignKey('books.id'), primary_key=True)
    author_id = Column(ForeignKey('authors.id'), primary_key=True)
    blurb = Column(String, nullable=False)
    book = relationship("Book", back_populates="authors")
    author = relationship("Author", back_populates="books")

    # proxies
    author_name = association_proxy(target_collection='author', attr='name')
    book_title = association_proxy(target_collection='book', attr='title')
Now when we fetch db_book (as above) we can directly access a related author’s name via db_book.authors[0].author_name.
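Here’s a runnable condensation of steps 1-2 (models trimmed to the essentials; the data and names are made up for the demo):

```python
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.orm import declarative_base, relationship, Session
from sqlalchemy.ext.associationproxy import association_proxy

engine = create_engine("sqlite+pysqlite:///:memory:", future=True)
Base = declarative_base()

class BookAuthor(Base):
    __tablename__ = 'book_authors'
    book_id = Column(ForeignKey('books.id'), primary_key=True)
    author_id = Column(ForeignKey('authors.id'), primary_key=True)
    blurb = Column(String, nullable=False)
    book = relationship("Book", back_populates="authors")
    author = relationship("Author", back_populates="books")
    # proxy: BookAuthor.author_name reads through to BookAuthor.author.name
    author_name = association_proxy('author', 'name')

class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    authors = relationship("BookAuthor", back_populates="book")

class Author(Base):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    books = relationship("BookAuthor", back_populates="author")

Base.metadata.create_all(engine)

with Session(bind=engine) as session:
    session.add(BookAuthor(book=Book(title="Demo"),
                           author=Author(name="Blu Renolds"),
                           blurb="wrote it"))
    session.commit()
    first_author_name = session.query(Book).first().authors[0].author_name
```

The proxy lets us read authors[0].author_name without the extra .author hop.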
Step 3: pydantic Aliases
With the tweaks we made in step 2 above, we have the data we need in the correct objects to populate our pydantic models. …but we have some name mismatches. BookSchema’s authors expect an id attribute and a name attribute, but our Book’s authors have an author_id attribute and an author_name attribute. (We have similar name mismatches for Author’s books.) Fortunately, pydantic supports Field aliases. Here’s a working implementation.
from typing import List, Optional
from pydantic import BaseModel, Field

class AuthorBase(BaseModel):
    id: int = Field(alias='author_id')
    name: str = Field(alias='author_name')
    blurb: Optional[str]

    class Config:
        orm_mode = True
        allow_population_by_field_name = True

class BookBase(BaseModel):
    id: int = Field(alias='book_id')
    title: str = Field(alias='book_title')
    blurb: Optional[str]

    class Config:
        orm_mode = True
        allow_population_by_field_name = True

class BookSchema(BookBase):
    authors: List[AuthorBase]

class AuthorSchema(AuthorBase):
    books: List[BookBase]
Without making any changes to the FastAPI code we set up in the previous section, our app works :). Here’s a screenshot from the swagger docs, after I GET /books/1.
Looking at this response, two annoying issues stand out:
- blurb exists as an attribute of book (although it’s null, which is good). Remember, blurb is a note about the author of a book, so it should only be an attribute of an author’s books or a book’s authors. It shouldn’t exist as a top-level attribute. To fix this, pass response_model_exclude={'blurb'} into the path operation decorators.
- Our alias names (e.g. author_name and author_id) are being used in the JSON response instead of our preferred names ('name' and 'id'). To fix this, pass response_model_by_alias=False into the path operation decorators.
Now the get_book() function looks like this
@app.get("/books/{id}", response_model=BookSchema,
         response_model_exclude={'blurb'}, response_model_by_alias=False)
async def get_book(id: int, db: Session = Depends(get_db)):
    db_book = db.query(Book).options(joinedload(Book.authors)).\
        where(Book.id == id).one()
    return db_book
And the response from GET /books/1 looks like this
Drawbacks
- It’s awkward and un-pythonic that db_book.authors actually returns a collection of BookAuthors, not true Authors.
- It’s inconvenient to add every little author and book attribute as a proxy onto the BookAuthor model.
- Frankly, this entire approach is a bit complex. There must be a simpler solution…
View Only Property
See the complete code on github.
Another strategy is to create a view-only property of our Book model called authors which simply executes a SQL statement that returns the authors for a book with their corresponding blurbs. (Similarly, we could create a books attribute of Author.) Here’s an implementation of this.
from sqlalchemy import text
from sqlalchemy.orm import object_session

class BookAuthor(Base):
    __tablename__ = 'book_authors'
    book_id = Column(ForeignKey('books.id'), primary_key=True)
    author_id = Column(ForeignKey('authors.id'), primary_key=True)
    blurb = Column(String, nullable=False)

class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)

    @property
    def authors(self):
        # raw SQL must be wrapped in text() for 2.0-style (future=True) engines
        s = text("""
            SELECT temp.* FROM (
                SELECT
                    authors.*,
                    book_authors.blurb,
                    book_authors.book_id
                FROM authors INNER JOIN book_authors ON authors.id = book_authors.author_id
            ) AS temp
            INNER JOIN books ON temp.book_id = books.id
            WHERE books.id = :bookid
        """)
        result = object_session(self).execute(s, {'bookid': self.id}).fetchall()
        return result

class Author(Base):
    ...
(Relevant section of the SQLAlchemy docs, here.)
With this technique, our pydantic models and path operation functions can remain extremely simple (as they were in the many-to-many without extra data section). You can check out the full code for this here.
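Here’s a self-contained condensation of the approach you can actually run (models trimmed, demo data made up; note the text() wrapper, which 2.0-style engines require for raw SQL):

```python
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey, text
from sqlalchemy.orm import declarative_base, Session, object_session

engine = create_engine("sqlite+pysqlite:///:memory:", future=True)
Base = declarative_base()

class BookAuthor(Base):
    __tablename__ = 'book_authors'
    book_id = Column(ForeignKey('books.id'), primary_key=True)
    author_id = Column(ForeignKey('authors.id'), primary_key=True)
    blurb = Column(String, nullable=False)

class Author(Base):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)

class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)

    @property
    def authors(self):
        # view-only: run a query through this instance's session
        s = text("""
            SELECT authors.*, book_authors.blurb
            FROM authors
            JOIN book_authors ON authors.id = book_authors.author_id
            WHERE book_authors.book_id = :bookid
        """)
        return object_session(self).execute(s, {'bookid': self.id}).fetchall()

Base.metadata.create_all(engine)

with Session(bind=engine) as session:
    session.add_all([Book(title="Demo"), Author(name="Blu Renolds")])
    session.commit()
    session.add(BookAuthor(book_id=1, author_id=1, blurb="wrote it"))
    session.commit()
    rows = session.query(Book).first().authors
```

Each row carries the author columns plus blurb, which is exactly the flat shape our simple pydantic schemas expect.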
Drawbacks
- The related collections are read-only, so we can’t modify them if we wanted to.
- This setup suffers from the “N+1 SELECTs” problem. (There may be a way around this that I’m not aware of.) For example, see what happens when we fetch all authors.
The response looks good..
But notice the four separately executed SELECT statements!
pydantic Custom GetterDict
See the complete code on github.
The premise behind the third solution is to modify our pydantic models so that they can read our SQLAlchemy models, which are misaligned in structure. This is based on the discussion here and palderosa’s solution here.
1. SQLAlchemy Models
We’ll start by setting up SQLAlchemy models with an association object pattern.
# Declare Classes / Tables
class BookAuthor(Base):
    __tablename__ = 'book_authors'
    book_id = Column(ForeignKey('books.id'), primary_key=True)
    author_id = Column(ForeignKey('authors.id'), primary_key=True)
    blurb = Column(String, nullable=False)
    book = relationship("Book", back_populates="authors")
    author = relationship("Author", back_populates="books")

class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    authors = relationship("BookAuthor", back_populates="book")

class Author(Base):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    books = relationship("BookAuthor", back_populates="author")
2. Pydantic Models
Next we’ll set up our pydantic models (schemas) just as we did in the many-to-many without extra data section
from typing import List
from pydantic import BaseModel

class AuthorBase(BaseModel):
    id: int
    name: str

    class Config:
        orm_mode = True

class BookBase(BaseModel):
    id: int
    title: str

    class Config:
        orm_mode = True

class BookSchema(BookBase):
    authors: List[AuthorBase]

class AuthorSchema(AuthorBase):
    books: List[BookBase]
Recall the current issue - if we fetch a book from the database, pydantic attempts to populate the book’s authors' id and name with calls like db_book.authors[0].id and db_book.authors[1].name, but the correct way to access these attributes is db_book.authors[0].author.id and db_book.authors[0].author.name. If only we could customize the way in which pydantic gets these attributes..
The trick here is to create a custom GetterDict and provide it as a Config attribute named getter_dict. This is discussed in the pydantic docs here. Here’s an implementation with our models.
from typing import List, Any
from pydantic import BaseModel
from pydantic.utils import GetterDict

class BookAuthorGetter(GetterDict):
    def get(self, key: str, default: Any = None) -> Any:
        if key in {'id', 'name'}:
            return getattr(self._obj.author, key)
        else:
            return super(BookAuthorGetter, self).get(key, default)

class BookAuthorSchema(BaseModel):
    id: int
    name: str
    blurb: str

    class Config:
        orm_mode = True
        getter_dict = BookAuthorGetter

class BookSchema(BaseModel):
    id: int
    title: str
    authors: List[BookAuthorSchema]

    class Config:
        orm_mode = True

class AuthorBookGetter(GetterDict):
    def get(self, key: str, default: Any = None) -> Any:
        if key in {'id', 'title'}:
            return getattr(self._obj.book, key)
        else:
            return super(AuthorBookGetter, self).get(key, default)

class AuthorBookSchema(BaseModel):
    id: int
    title: str
    blurb: str

    class Config:
        orm_mode = True
        getter_dict = AuthorBookGetter

class AuthorSchema(BaseModel):
    id: int
    name: str
    books: List[AuthorBookSchema]

    class Config:
        orm_mode = True
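The mechanism is easiest to see stripped of pydantic entirely. Here’s a plain-Python sketch of what a GetterDict-style wrapper does (the class and sample objects are hypothetical, purely for illustration):

```python
from types import SimpleNamespace

class FlatteningGetter:
    """Redirect 'id'/'name' lookups to obj.author; everything else to obj."""
    def __init__(self, obj):
        self._obj = obj

    def get(self, key, default=None):
        if key in {'id', 'name'}:
            return getattr(self._obj.author, key, default)
        return getattr(self._obj, key, default)

# A fake BookAuthor row: blurb lives on the row, id/name live on .author
row = SimpleNamespace(blurb='wrote chapter 1',
                      author=SimpleNamespace(id=1, name='Blu Renolds'))
g = FlatteningGetter(row)
flattened = {k: g.get(k) for k in ('id', 'name', 'blurb')}
```

pydantic does the same thing: when orm_mode kicks in, every field lookup goes through the configured getter, so the nested association object presents a flat interface.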
Custom JSON Serializer
See the complete code on github.
My final solution is to set up pydantic models that exactly mimic our SQL models, but insert a custom JSON serialization mechanism so that the pydantic models are serialized to JSON the way we desire. Let’s see how this works for a book and its authors.
First, the SQLAlchemy models. Again, it’s the association object pattern we’ve seen before.
class BookAuthor(Base):
    __tablename__ = 'book_authors'
    book_id = Column(ForeignKey('books.id'), primary_key=True)
    author_id = Column(ForeignKey('authors.id'), primary_key=True)
    blurb = Column(String, nullable=False)
    book = relationship("Book", back_populates="authors")
    author = relationship("Author", back_populates="books")

class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    authors = relationship("BookAuthor", back_populates="book")

class Author(Base):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    books = relationship("BookAuthor", back_populates="author")
Next, the pydantic models.
from typing import List, Optional
from pydantic import BaseModel

class RelatedAuthorSchema(BaseModel):
    id: int
    name: str

    class Config:
        orm_mode = True

class BookAuthorSchema(BaseModel):
    blurb: str
    author: Optional[RelatedAuthorSchema]

    class Config:
        orm_mode = True

class BookSchema(BaseModel):
    id: int
    title: str
    authors: List[BookAuthorSchema]

    class Config:
        orm_mode = True
This time we set up three models that mimic our database models.
So, how does a pydantic model get serialized to JSON? Well, BaseModel has a .dict() method that converts a pydantic model instance to a dict, and then it has a .json() method that converts that dict to JSON. (See here for details.)
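For example, on a toy model (names made up for the demo) the two methods behave like this:

```python
from typing import List
from pydantic import BaseModel

class Inner(BaseModel):
    id: int

class Outer(BaseModel):
    title: str
    items: List[Inner]

m = Outer(title='t', items=[Inner(id=1)])
as_dict = m.dict()   # nested python dict, one level per nested model
as_json = m.json()   # a JSON string built from that dict
```

Keep this two-step pipeline in mind; it’s the hook we’ll exploit below.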
pydantic allows us to configure a custom JSON serializer via the json_dumps config attribute as mentioned here, but unfortunately FastAPI ignores this. Instead, FastAPI relies on the .dict() method to create a dictionary object, which it then serializes to JSON on its own. So, our strategy is to extend the .dict() method to rearrange the dictionary the way we want it.
First, let’s see what .dict() produces on its own.
with Session(bind=engine) as session:
    # Fetch the first book
    db_book = session.query(Book).first()

    # convert to BookSchema and print .dict() serialization
    schema_book = BookSchema.from_orm(db_book)
    print(schema_book.dict())
{
    'id': 1,
    'title': "Dead People Who'd Be Influencers Today",
    'authors': [
        {
            'blurb': 'Blue wrote chapter 1',
            'author': {
                'id': 1,
                'name': 'Blu Renolds'
            }
        },
        {
            'blurb': 'Chip wrote chapter 2',
            'author': {
                'id': 2,
                'name': 'Chip Egan'
            }
        }
    ]
}
The goal here is to move id and name one level up in this nested dictionary. Here’s how we can extend the .dict() method to make this happen.
class BookSchema(BaseModel):
    id: int
    title: str
    authors: List[BookAuthorSchema]

    def dict(self, **kwargs):
        data = super(BookSchema, self).dict(**kwargs)
        for a in data['authors']:
            a['id'] = a['author']['id']
            a['name'] = a['author']['name']
            del a['author']
        return data

    class Config:
        orm_mode = True
Now the same print(schema_book.dict()) produces
{
    'id': 1,
    'title': "Dead People Who'd Be Influencers Today",
    'authors': [
        {
            'blurb': 'Blue wrote chapter 1',
            'id': 1,
            'name': 'Blu Renolds'
        },
        {
            'blurb': 'Chip wrote chapter 2',
            'id': 2,
            'name': 'Chip Egan'
        }
    ]
}
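The same rearrangement would apply on the author side. Here’s the core dict-flattening logic as a standalone sketch (a hypothetical AuthorSchema counterpart, with sample data hard-coded so you can see the transformation):

```python
def flatten_books(data):
    """Hoist each book's id/title out of the nested 'book' dict."""
    for b in data['books']:
        b['id'] = b['book']['id']
        b['title'] = b['book']['title']
        del b['book']
    return data

# What AuthorSchema.dict() would produce before flattening
nested = {'id': 1, 'name': 'Blu Renolds',
          'books': [{'blurb': 'wrote chapter 1',
                     'book': {'id': 1, 'title': 'Demo'}}]}
flat = flatten_books(nested)
```

In the real schema, this function body would live inside an overridden .dict() method, exactly as BookSchema does above.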
Drawbacks
- Unfortunately, the auto-generated documentation provided by pydantic + FastAPI is now incorrect. If you look closely at the swagger docs, you’ll see the Example Value | Schema sections are inaccurate.
Why is this so hard?
While I was struggling with this problem, I kept wondering why SQLAlchemy was making my life more difficult than it needed to be, but then I had an epiphany. You see, not all many-to-many relationships are the same. The key insight came when I thought about a different toy example..
Suppose we wanted to build an app called Moviepedia which provides information about actors and movies. Similar to Bookipedia, we’d store our data in three tables like
- movies
- actors
- characters
Notice I called the junction table “characters” and not “movie_actors”, wink wink.. The key insight here is that one actor can play multiple characters in the same movie. In other words, the same (actor_id, movie_id) pair can occur multiple times in the characters table (and for this reason, characters should have a dedicated id primary key column). Here’s an example of what the data might look like
When a user requests info about movie 2, GET /movies/2, the response should look something like this
{
    id: 2,
    title: 'Austin Powers',
    actors: [
        {
            id: 0,
            name: 'Mike Myers',
            characters: [
                {character: 'Austin Powers'},
                {character: 'Doctor Evil'}
            ]
        },
        {
            id: 1,
            name: 'sue',
            characters: [
                {character: 'blah'}
            ]
        }
    ]
}
Notice the three levels of data (movie > actors > characters) are necessary in this scenario because a (movie, actor) pair can have multiple characters. This is distinctly different from Bookipedia, where a (book, author) pair can only have one blurb, and thus the three levels of nested data (books > authors > blurbs) can be flattened into just two levels (books > authors).
In other words, SQLAlchemy by default steers us towards three levels of nested JSON data because it supports the generic case, where the junction table can have multiple entries for the same (table A, table B) pair.