Many-To-Many Relationships In FastAPI
In this tutorial, I cover multiple strategies for handling many-to-many relationships using FastAPI with SQLAlchemy and pydantic. I assume you already know what these things are and how to use them in a basic sense.
The code for this tutorial can be found on github.
Intro
I’m working on a project in FastAPI and I needed a way to store and serve data with a many-to-many relationship. What I thought would take ~15 minutes turned into a week-long affair of investigative research and experimentation.
Aside
My recent week was one of those emotional roller coasters that every programmer experiences. You desperately want to accomplish some goal, but minutes turn into hours and hours into days and you still don’t have a solution. You face incredible frustration - borderline depression - where you’re reading undocumented source code at 1AM that raises more questions than it answers. But then, usually after a good sleep and some time away, you figure it out. And as a result of the struggle, you’re way smarter than you were a week ago. I’m sure a lot of you can relate.
About this tutorial
In this tutorial, I’ll not only show you how to handle many-to-many relationships, but I’ll walk you through my personal discovery process (with less pain, of course). This means we’ll see some dead ends and non-ideal solutions, but some of those dead ends introduce really powerful techniques that could be useful elsewhere. Trust me - it’s worth it.
Setup
The best tutorials start with a motivating example..
Suppose we want to build a REST API to provide information about books and authors. A book can have multiple authors and an author can have multiple books. This is a classic data relationship known as “many-to-many”. To store such data in a SQL database, you need three tables.
- books: {id, title}
- authors: {id, name}
- book_authors: {book_id, author_id}
If we want to know which books were written by author 1, we join authors to book_authors to books, filtered where authors.id = 1.
# books written by author 1
SELECT temp.* FROM (
    SELECT
        books.*,
        book_authors.author_id
    FROM books INNER JOIN book_authors ON books.id = book_authors.book_id
) AS temp
INNER JOIN authors ON temp.author_id = authors.id
WHERE authors.id = 1
Here, book_authors is what’s known as a “junction table” or an “association table”.
Modeling this kind of data in SQLAlchemy and FastAPI actually isn’t all that hard. The real struggle starts when you want to include extra data in the association table. For example, books usually have a blurb about each author that discusses their contribution to the book. A book about pollution might have two authors with blurbs like
- “Harry is a scientist who wrote chapters 1-3.”
- “Susan is an economist who wrote chapter 4.”
In this case, blurb is specific to each (book, author) pair, and so it should be stored as a column in the book_authors junction table. It turns out, modeling and serving this data is kind of tricky if you’re anal about data structures like I am. But before we dive into the specifics, let’s get our feet wet by implementing a FastAPI application for serving many-to-many data without extra data.
Many-to-many without extra data
In this section, we’ll attempt to implement a FastAPI app called Bookipedia which stores data on books and their authors. It’s the simpler form of the problem discussed above; here our junction table is simply a mapping between books and authors without any extra information. That is, we want to build the app around these three tables:
Our app will be a very simple read-only API with four endpoints:
GET /books
GET /books/{id}
GET /authors
GET /authors/{id}
When a user requests a specific book like GET /books/2, they should receive a JSON response like
{
    id: 2,
    title: 'Foo',
    authors: [
        {id: 0, name: 'bob'},
        {id: 1, name: 'sue'}
    ]
}
Challenge
Before I show you my solution to this problem, I strongly encourage you to attempt to build this yourself. You learn more when you struggle :)
Solution
See the complete code on github.
SQLAlchemy
First let’s set up our SQLAlchemy models. My code below is based on this section of the SQLAlchemy docs, except I include the additional stuff necessary to create an in-memory SQLite database. (Check out my tutorial on SQLAlchemy if the below code is super confusing.)
"""
FastAPI app called 'Bookipedia' that serves information about books and their authors. A simple example of a
"many-to-many" relationship *without* extra data.
"""
from sqlalchemy import create_engine, Column, Integer, String, Table, ForeignKey
from sqlalchemy.orm import declarative_base, relationship
# Make the engine
engine = create_engine("sqlite+pysqlite:///:memory:", future=True, echo=True,
connect_args={"check_same_thread": False})
# Make the DeclarativeMeta
Base = declarative_base()
# Declare Classes / Tables
book_authors = Table('book_authors', Base.metadata,
Column('book_id', ForeignKey('books.id'), primary_key=True),
Column('author_id', ForeignKey('authors.id'), primary_key=True)
)
class Book(Base):
__tablename__ = 'books'
id = Column(Integer, primary_key=True)
title = Column(String, nullable=False)
authors = relationship("Author", secondary="book_authors", back_populates='books')
class Author(Base):
__tablename__ = 'authors'
id = Column(Integer, primary_key=True)
name = Column(String, nullable=False)
books = relationship("Book", secondary="book_authors", back_populates='authors')
# Create the tables in the database
Base.metadata.create_all(engine)
Notes
- We define the book_authors table using SQLAlchemy’s Table() class (“imperative style”), and we define the books and authors tables using inheritance (“declarative style”). We could’ve used imperative style only or declarative style only, but opted for a mix of both. (This is weird to me, but it’s what they do in the docs..)
- In the junction table, we declared the primary key as the pair of columns (book_id, author_id). Because of this, every (book_id, author_id) pair in the table must be unique.
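To see that uniqueness constraint in action, here’s a minimal standalone sketch (the table is rebuilt with SQLAlchemy Core only, without the ORM models, purely for illustration) showing that a duplicate (book_id, author_id) pair is rejected:

```python
from sqlalchemy import create_engine, Column, Integer, MetaData, Table
from sqlalchemy.exc import IntegrityError

engine = create_engine("sqlite+pysqlite:///:memory:", future=True)
metadata = MetaData()

# Same composite primary key as book_authors above
book_authors = Table('book_authors', metadata,
    Column('book_id', Integer, primary_key=True),
    Column('author_id', Integer, primary_key=True)
)
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(book_authors.insert().values(book_id=1, author_id=1))

# A second insert of the same (book_id, author_id) pair violates the primary key
duplicate_rejected = False
try:
    with engine.begin() as conn:
        conn.execute(book_authors.insert().values(book_id=1, author_id=1))
except IntegrityError:
    duplicate_rejected = True
```

The database itself enforces the pairing, so the application code never needs to check for duplicates.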
Now let’s insert some data. Here we create two books and three authors, mapping authors 1 & 2 to book 1 and authors 1 & 3 to book 2.
# Insert data
from sqlalchemy.orm import Session

with Session(bind=engine) as session:
    book1 = Book(title="Dead People Who'd Be Influencers Today")
    book2 = Book(title="How To Make Friends In Your 30s")
    author1 = Author(name="Blu Renolds")
    author2 = Author(name="Chip Egan")
    author3 = Author(name="Alyssa Wyatt")
    book1.authors = [author1, author2]
    book2.authors = [author1, author3]
    session.add_all([book1, book2, author1, author2, author3])
    session.commit()
Now let’s run some queries on the data. If you’re feeling ambitious, try solving these problems on your own.
- Get the book with id 1 and print its title

with Session(bind=engine) as session:
    b1 = session.query(Book).where(Book.id == 1).one()
    print(b1.title)
- Print the name of each of its authors

with Session(bind=engine) as session:
    b1 = session.query(Book).where(Book.id == 1).one()
    for a in b1.authors:
        print(a.name)
Hold up. In the last example, it looks like SQLAlchemy executed two separate SQL queries: one when we fetched the book and another when we accessed its authors. (Make sure you set echo=True when instantiating the database engine to see this.)
This is commonly called the “N+1 SELECTs” problem. It happens because SQLAlchemy lazily loads objects related to an instance. This is a helpful default behavior, but if you know ahead of time that you’ll need those related objects, tell SQLAlchemy to fetch them in the first query; otherwise you might suffer performance issues. To fix this, just modify the query using joinedload() as follows:
from sqlalchemy.orm import joinedload

with Session(bind=engine) as session:
    b1 = session.query(Book).\
        options(joinedload(Book.authors)).\
        where(Book.id == 1).one()
    print(b1.title)
You can read more about data loading strategies here.
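For instance, selectinload() is an alternative eager-loading strategy that emits one extra SELECT ... WHERE id IN (...) query instead of a JOIN, which can be more efficient for collections. Here’s a minimal standalone sketch (models condensed from above, with made-up demo data):

```python
from sqlalchemy import create_engine, Column, Integer, String, Table, ForeignKey
from sqlalchemy.orm import declarative_base, relationship, Session, selectinload

engine = create_engine("sqlite+pysqlite:///:memory:", future=True)
Base = declarative_base()

book_authors = Table('book_authors', Base.metadata,
    Column('book_id', ForeignKey('books.id'), primary_key=True),
    Column('author_id', ForeignKey('authors.id'), primary_key=True)
)

class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    authors = relationship("Author", secondary=book_authors)

class Author(Base):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)

Base.metadata.create_all(engine)

with Session(bind=engine) as session:
    session.add(Book(title="Demo", authors=[Author(name="A"), Author(name="B")]))
    session.commit()

with Session(bind=engine) as session:
    # One SELECT for books, then one SELECT ... WHERE book_id IN (...) for authors
    b1 = session.query(Book).options(selectinload(Book.authors)).first()
    names = sorted(a.name for a in b1.authors)
```

Either strategy avoids the N+1 problem; which one is faster depends on your data and database.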
pydantic
Now let’s set up pydantic models which we can use as response models in our FastAPI path operation functions. (Recall, pydantic models give us a way to document API input and output, validate data, and serialize data to/from JSON.)
from typing import List
from pydantic import BaseModel

class AuthorBase(BaseModel):
    id: int
    name: str

    class Config:
        orm_mode = True

class BookBase(BaseModel):
    id: int
    title: str

    class Config:
        orm_mode = True

class BookSchema(BookBase):
    authors: List[AuthorBase]

class AuthorSchema(AuthorBase):
    books: List[BookBase]
Notes
- It might seem like overkill to set up four models. Why not just two - Author and Book? Try it and you’ll quickly realize you can’t avoid a circular dependency issue whereby Author depends on Book and Book depends on Author.
Let’s try populating a BookSchema and serializing it to JSON.
with Session(bind=engine) as session:
    b1 = session.query(Book).options(joinedload(Book.authors)).first()
    b1_schema = BookSchema.from_orm(b1)
    print(b1_schema.json())
{
    "id": 1,
    "title": "Dead People Who'd Be Influencers Today",
    "authors": [
        {
            "id": 1,
            "name": "Blu Renolds"
        },
        {
            "id": 2,
            "name": "Chip Egan"
        }
    ]
}
FastAPI
Finally, let’s set up our FastAPI app and path operation functions.
"""
FastAPI app called 'Bookipedia' that serves information about books and their authors. A simple example of a
"many-to-many" relationship *without* extra data.
"""
# <SQLAlchemy code not shown...>
# <pydantic code not shown...>
from fastapi import FastAPI, Depends
app = FastAPI(title="Bookipedia")
def get_db():
db = Session(bind=engine)
try:
yield db
finally:
db.close()
@app.get("/books/{id}", response_model=BookSchema)
async def get_book(id: int, db: Session = Depends(get_db)):
db_book = db.query(Book).options(joinedload(Book.authors)).\
where(Book.id == id).one()
return db_book
@app.get("/books", response_model=List[BookSchema])
async def get_books(db: Session = Depends(get_db)):
db_books = db.query(Book).options(joinedload(Book.authors)).all()
return db_books
@app.get("/authors/{id}", response_model=AuthorSchema)
async def get_author(id: int, db: Session = Depends(get_db)):
db_author = db.query(Author).options(joinedload(Author.books)).\
where(Author.id == id).one()
return db_author
@app.get("/authors", response_model=List[AuthorSchema])
async def get_authors(db: Session = Depends(get_db)):
db_authors = db.query(Author).options(joinedload(Author.books)).all()
return db_authors
Notes
- We didn’t set up any error handling for the sake of brevity, so when you test the endpoints, make sure you pass in a valid Book id and Author id. (SQLite auto-increments ids starting from 1.)
- One thing that’s a little bit mysterious here is how FastAPI converts our SQLAlchemy model instances into JSON. For example, in the get_book() path operation function, we populate and return a Book instance, yet somehow FastAPI serves a JSON serialization of that instance. What happens under the hood is more complicated than this, but in essence it’s:
schema_book = BookSchema.from_orm(db_book)
schema_book.json()
Many-to-many with extra data
At this point, I’d like to make a seemingly benign tweak to the Bookipedia app we built in the last section. I want to add a blurb field to the book_authors table so that we can store blurbs about each author for each book.
Now when a user requests a specific book like GET /books/2, they should receive a JSON response like
{
    id: 2,
    title: 'Foo',
    authors: [
        {id: 0, name: 'bob', blurb: 'this is a blurb'},
        {id: 1, name: 'sue', blurb: 'this is also a blurb'}
    ]
}
Our seemingly benign tweak to the data turns out to wreak havoc (or at least, it did for me). Again, I challenge you to attempt solving this yourself..
SQLAlchemy Association Object Pattern
The first thing we need to do is add a blurb field to our SQLAlchemy model. We’ll change our declaration of book_authors from this..
book_authors = Table('book_authors', Base.metadata,
    Column('book_id', ForeignKey('books.id'), primary_key=True),
    Column('author_id', ForeignKey('authors.id'), primary_key=True)
)
to this
class BookAuthor(Base):
    __tablename__ = 'book_authors'
    book_id = Column(ForeignKey('books.id'), primary_key=True)
    author_id = Column(ForeignKey('authors.id'), primary_key=True)
    blurb = Column(String, nullable=False)
We leave the Book and Author models unchanged (for now).
class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    authors = relationship("Author", secondary="book_authors", back_populates='books')

class Author(Base):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    books = relationship("Book", secondary="book_authors", back_populates='authors')
This is described in the SQLAlchemy Docs as the association object pattern. We can insert the same data as before but with blurbs like this.
# Insert data
from sqlalchemy.orm import Session

with Session(bind=engine) as session:
    book1 = Book(title="Dead People Who'd Be Influencers Today")
    book2 = Book(title="How To Make Friends In Your 30s")
    author1 = Author(name="Blu Renolds")
    author2 = Author(name="Chip Egan")
    author3 = Author(name="Alyssa Wyatt")
    session.add_all([book1, book2, author1, author2, author3])
    session.commit()

    book_author1 = BookAuthor(book_id=book1.id, author_id=author1.id, blurb="Blue wrote chapter 1")
    book_author2 = BookAuthor(book_id=book1.id, author_id=author2.id, blurb="Chip wrote chapter 2")
    book_author3 = BookAuthor(book_id=book2.id, author_id=author1.id, blurb="Blue wrote chapters 1-3")
    book_author4 = BookAuthor(book_id=book2.id, author_id=author3.id, blurb="Alyssa wrote chapter 4")
    session.add_all([book_author1, book_author2, book_author3, book_author4])
    session.commit()
Next let’s tweak our pydantic models to include an Optional blurb field.
from typing import List, Optional
from pydantic import BaseModel

class AuthorBase(BaseModel):
    id: int
    name: str
    blurb: Optional[str]

    class Config:
        orm_mode = True

class BookBase(BaseModel):
    id: int
    title: str
    blurb: Optional[str]

    class Config:
        orm_mode = True

class BookSchema(BookBase):
    authors: List[AuthorBase]

class AuthorSchema(AuthorBase):
    books: List[BookBase]
Here we add blurb as an Optional field to both AuthorBase and BookBase. If we fetch an author and his related books, we expect each related book to have blurb populated. If we fetch a book and its related authors, we expect each author to have blurb populated.
Let’s see what happens if we attempt to fetch the book with id 1 and use it to populate the BookSchema pydantic model we set up earlier.
with Session(bind=engine) as session:
    db_book = session.query(Book).\
        options(joinedload(Book.authors)).\
        where(Book.id == 1).one()
    schema_book = BookSchema.from_orm(db_book)
    print(schema_book.json())
{
    "id": 1,
    "title": "Dead People Who'd Be Influencers Today",
    "blurb": null,
    "authors": [
        {
            "id": 1,
            "name": "Blu Renolds",
            "blurb": null
        },
        {
            "id": 2,
            "name": "Chip Egan",
            "blurb": null
        }
    ]
}
No error, but all the blurbs come back null. This shouldn’t be too surprising.. db_book is a Book instance which points to a collection of Author instances, but blurb is a field of BookAuthor, not Author. In other words, db_book.authors[0].blurb doesn’t exist.
Association Proxy
See the complete code on github.
My first solution to this problem was to take advantage of SQLAlchemy’s Association Proxy (thanks to the hint from “van” here).
Step 1: Adjust SQLAlchemy Relationships
In the Author model, make books point to BookAuthor instead of Book. In other words, change this
class Author(Base):
    ...
    books = relationship("Book", secondary="book_authors", back_populates='authors')
to this
class Author(Base):
    ...
    books = relationship("BookAuthor")
Then give BookAuthor an author field, which points to the Author model like
class BookAuthor(Base):
    ...
    author = relationship("Author")
Do essentially the same thing for the Book model. I’ll include back_populates as well, so my updated models look like this.
class BookAuthor(Base):
    __tablename__ = 'book_authors'
    book_id = Column(ForeignKey('books.id'), primary_key=True)
    author_id = Column(ForeignKey('authors.id'), primary_key=True)
    blurb = Column(String, nullable=False)
    book = relationship("Book", back_populates="authors")
    author = relationship("Author", back_populates="books")

class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    authors = relationship("BookAuthor", back_populates="book")

class Author(Base):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    books = relationship("BookAuthor", back_populates="author")
Now when we fetch a book, we can access its first author’s blurb and name as follows
with Session(bind=engine) as session:
    db_book = session.query(Book).first()
    db_book.authors[0].blurb
    db_book.authors[0].author.name
Important: Notice that to get the name of the author, we can’t simply call db_book.authors[0].name. That’s because db_book.authors[0] is a BookAuthor instance and BookAuthor doesn’t have a name attribute. It does, however, have an author attribute. So we can fetch the author’s name via db_book.authors[0].author.name.
Step 2: Create Association Proxies
Suppose we fetch a book and attempt to populate our existing BookSchema with it.
with Session(bind=engine) as session:
    db_book = session.query(Book).options(
        joinedload(Book.authors).options(joinedload(BookAuthor.author))
    ).first()
    schema_book = BookSchema.from_orm(db_book)
As it stands, we’ll get an error like this.
pydantic.error_wrappers.ValidationError: 4 validation errors for BookSchema
authors -> 0 -> id
  field required (type=value_error.missing) …
That’s because our BookSchema pydantic model attempts to populate the related authors' ids using book.authors[0].id, book.authors[1].id, etc. But as we discussed a minute ago, the Author attributes need to be accessed like book.authors[0].author.id or book.authors[0].author.name. To overcome this, we can place association proxies in BookAuthor for each Author attribute we want to be directly accessible from the BookAuthor model. For example,
from sqlalchemy.ext.associationproxy import association_proxy

class BookAuthor(Base):
    __tablename__ = 'book_authors'
    book_id = Column(ForeignKey('books.id'), primary_key=True)
    author_id = Column(ForeignKey('authors.id'), primary_key=True)
    blurb = Column(String, nullable=False)
    book = relationship("Book", back_populates="authors")
    author = relationship("Author", back_populates="books")

    # proxies
    author_name = association_proxy(target_collection='author', attr='name')
    book_title = association_proxy(target_collection='book', attr='title')
Now when we fetch db_book (as above) we can directly access a related author’s name via db_book.authors[0].author_name.
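Here’s a runnable condensation of steps 1-2 (models trimmed to the essentials; the data and names are made up for the demo):

```python
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.orm import declarative_base, relationship, Session
from sqlalchemy.ext.associationproxy import association_proxy

engine = create_engine("sqlite+pysqlite:///:memory:", future=True)
Base = declarative_base()

class BookAuthor(Base):
    __tablename__ = 'book_authors'
    book_id = Column(ForeignKey('books.id'), primary_key=True)
    author_id = Column(ForeignKey('authors.id'), primary_key=True)
    blurb = Column(String, nullable=False)
    book = relationship("Book", back_populates="authors")
    author = relationship("Author", back_populates="books")
    # proxy: BookAuthor.author_name reads through to BookAuthor.author.name
    author_name = association_proxy('author', 'name')

class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    authors = relationship("BookAuthor", back_populates="book")

class Author(Base):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    books = relationship("BookAuthor", back_populates="author")

Base.metadata.create_all(engine)

with Session(bind=engine) as session:
    session.add(BookAuthor(book=Book(title="Demo"),
                           author=Author(name="Blu Renolds"),
                           blurb="wrote it"))
    session.commit()
    first_author_name = session.query(Book).first().authors[0].author_name
```

The proxy lets us read authors[0].author_name without the extra .author hop.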
Step 3: pydantic Aliases
With the tweaks we made in step 2 above, we have the data we need in the correct objects to populate our pydantic models. …but we have some name mismatches. BookSchema’s authors expect an id attribute and a name attribute, but our Book’s authors have an author_id attribute and an author_name attribute. (We have similar name mismatches for Author’s books.) Fortunately, pydantic supports Field aliases. Here’s a working implementation.
from typing import List, Optional
from pydantic import BaseModel, Field

class AuthorBase(BaseModel):
    id: int = Field(alias='author_id')
    name: str = Field(alias='author_name')
    blurb: Optional[str]

    class Config:
        orm_mode = True
        allow_population_by_field_name = True

class BookBase(BaseModel):
    id: int = Field(alias='book_id')
    title: str = Field(alias='book_title')
    blurb: Optional[str]

    class Config:
        orm_mode = True
        allow_population_by_field_name = True

class BookSchema(BookBase):
    authors: List[AuthorBase]

class AuthorSchema(AuthorBase):
    books: List[BookBase]
Without making any changes to the FastAPI code we set up in the previous section, our app works :). Here’s a screenshot from the swagger docs, after I GET /books/1.
Looking at this response, two annoying issues stand out:
- blurb exists as an attribute of book (although it’s null, which is good). Remember, blurb is a note about the author of a book, so it should only be an attribute of an author’s books or a book’s authors. It shouldn’t exist as a top-level attribute. To fix this, pass response_model_exclude={'blurb'} into the path operation decorators.
- Our alias names (e.g. author_name and author_id) are being used in the JSON response instead of our preferred names ('name' and 'id'). To fix this, pass response_model_by_alias=False into the path operation decorators.
Now the get_book() function looks like this
@app.get("/books/{id}", response_model=BookSchema,
         response_model_exclude={'blurb'}, response_model_by_alias=False)
async def get_book(id: int, db: Session = Depends(get_db)):
    db_book = db.query(Book).options(joinedload(Book.authors)).\
        where(Book.id == id).one()
    return db_book
And the response from GET /books/1 looks like this
Drawbacks
- It’s awkward and un-pythonic that db_book.authors actually returns a collection of BookAuthors, not true Authors.
- It’s inconvenient to add every little author and book attribute as a proxy onto the BookAuthor model.
- Frankly, this entire approach is a bit complex. There must be a simpler solution…
View Only Property
See the complete code on github.
Another strategy is to create a view-only property of our Book model called authors which simply executes a SQL statement that returns the authors for a book with their corresponding blurbs. (Similarly, we could create a books attribute of Author.) Here’s an implementation of this.
from sqlalchemy import text
from sqlalchemy.orm import object_session

class BookAuthor(Base):
    __tablename__ = 'book_authors'
    book_id = Column(ForeignKey('books.id'), primary_key=True)
    author_id = Column(ForeignKey('authors.id'), primary_key=True)
    blurb = Column(String, nullable=False)

class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)

    @property
    def authors(self):
        # raw SQL must be wrapped in text() for 2.0-style (future=True) engines
        s = text("""
            SELECT temp.* FROM (
                SELECT
                    authors.*,
                    book_authors.blurb,
                    book_authors.book_id
                FROM authors INNER JOIN book_authors ON authors.id = book_authors.author_id
            ) AS temp
            INNER JOIN books ON temp.book_id = books.id
            WHERE books.id = :bookid
        """)
        result = object_session(self).execute(s, {'bookid': self.id}).fetchall()
        return result

class Author(Base):
    ...
(Relevant section of the SQLAlchemy docs, here.)
With this technique, our pydantic models and path operation functions can remain extremely simple (as they were in the many-to-many without extra data section). You can check out the full code for this here.
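Here’s a self-contained condensation of the approach you can actually run (models trimmed, demo data made up; note the text() wrapper, which 2.0-style engines require for raw SQL):

```python
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey, text
from sqlalchemy.orm import declarative_base, Session, object_session

engine = create_engine("sqlite+pysqlite:///:memory:", future=True)
Base = declarative_base()

class BookAuthor(Base):
    __tablename__ = 'book_authors'
    book_id = Column(ForeignKey('books.id'), primary_key=True)
    author_id = Column(ForeignKey('authors.id'), primary_key=True)
    blurb = Column(String, nullable=False)

class Author(Base):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)

class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)

    @property
    def authors(self):
        # view-only: run a query through this instance's session
        s = text("""
            SELECT authors.*, book_authors.blurb
            FROM authors
            JOIN book_authors ON authors.id = book_authors.author_id
            WHERE book_authors.book_id = :bookid
        """)
        return object_session(self).execute(s, {'bookid': self.id}).fetchall()

Base.metadata.create_all(engine)

with Session(bind=engine) as session:
    session.add_all([Book(title="Demo"), Author(name="Blu Renolds")])
    session.commit()
    session.add(BookAuthor(book_id=1, author_id=1, blurb="wrote it"))
    session.commit()
    rows = session.query(Book).first().authors
```

Each row carries the author columns plus blurb, which is exactly the flat shape our simple pydantic schemas expect.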
Drawbacks
- The related collections are read-only, so we can’t modify them if we wanted to.
- This setup suffers from the “N+1 SELECTs” problem. (There may be a way around this that I’m not aware of.) For example, see what happens when we fetch all authors.
The response looks good..
But notice the four separately executed SELECT statements!
pydantic Custom GetterDict
See the complete code on github.
The premise behind the third solution is to modify our pydantic models so that they can read our SQLAlchemy models, which are misaligned in structure. This is based on the discussion here and palderosa’s solution here.
1. SQLAlchemy Models
We’ll start by setting up SQLAlchemy models with an association object pattern.
# Declare Classes / Tables
class BookAuthor(Base):
    __tablename__ = 'book_authors'
    book_id = Column(ForeignKey('books.id'), primary_key=True)
    author_id = Column(ForeignKey('authors.id'), primary_key=True)
    blurb = Column(String, nullable=False)
    book = relationship("Book", back_populates="authors")
    author = relationship("Author", back_populates="books")

class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    authors = relationship("BookAuthor", back_populates="book")

class Author(Base):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    books = relationship("BookAuthor", back_populates="author")
2. Pydantic Models
Next we’ll set up our pydantic models (schemas) just as we did in the many-to-many without extra data section
from typing import List
from pydantic import BaseModel

class AuthorBase(BaseModel):
    id: int
    name: str

    class Config:
        orm_mode = True

class BookBase(BaseModel):
    id: int
    title: str

    class Config:
        orm_mode = True

class BookSchema(BookBase):
    authors: List[AuthorBase]

class AuthorSchema(AuthorBase):
    books: List[BookBase]
Recall the current issue - if we fetch a book from the database, pydantic attempts to populate the book’s authors' id and name with calls like db_book.authors[0].id and db_book.authors[1].name, but the correct way to access these attributes is db_book.authors[0].author.id and db_book.authors[0].author.name. If only we could customize the way in which pydantic gets these attributes..
The trick here is to create a custom GetterDict and provide it as a Config attribute named getter_dict. This is discussed in the pydantic docs here. Here’s an implementation with our models.
from typing import List, Any
from pydantic import BaseModel
from pydantic.utils import GetterDict

class BookAuthorGetter(GetterDict):
    def get(self, key: str, default: Any = None) -> Any:
        if key in {'id', 'name'}:
            return getattr(self._obj.author, key)
        else:
            return super(BookAuthorGetter, self).get(key, default)

class BookAuthorSchema(BaseModel):
    id: int
    name: str
    blurb: str

    class Config:
        orm_mode = True
        getter_dict = BookAuthorGetter

class BookSchema(BaseModel):
    id: int
    title: str
    authors: List[BookAuthorSchema]

    class Config:
        orm_mode = True

class AuthorBookGetter(GetterDict):
    def get(self, key: str, default: Any = None) -> Any:
        if key in {'id', 'title'}:
            return getattr(self._obj.book, key)
        else:
            return super(AuthorBookGetter, self).get(key, default)

class AuthorBookSchema(BaseModel):
    id: int
    title: str
    blurb: str

    class Config:
        orm_mode = True
        getter_dict = AuthorBookGetter

class AuthorSchema(BaseModel):
    id: int
    name: str
    books: List[AuthorBookSchema]

    class Config:
        orm_mode = True
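The mechanism is easiest to see stripped of pydantic entirely. Here’s a plain-Python sketch of what a GetterDict-style wrapper does (the class and sample objects are hypothetical, purely for illustration):

```python
from types import SimpleNamespace

class FlatteningGetter:
    """Redirect 'id'/'name' lookups to obj.author; everything else to obj."""
    def __init__(self, obj):
        self._obj = obj

    def get(self, key, default=None):
        if key in {'id', 'name'}:
            return getattr(self._obj.author, key, default)
        return getattr(self._obj, key, default)

# A fake BookAuthor row: blurb lives on the row, id/name live on .author
row = SimpleNamespace(blurb='wrote chapter 1',
                      author=SimpleNamespace(id=1, name='Blu Renolds'))
g = FlatteningGetter(row)
flattened = {k: g.get(k) for k in ('id', 'name', 'blurb')}
```

pydantic does the same thing: when orm_mode kicks in, every field lookup goes through the configured getter, so the nested association object presents a flat interface.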
Custom JSON Serializer
See the complete code on github.
My final solution is to set up pydantic models that exactly mimic our SQL models, but insert a custom JSON serialization mechanism so that the pydantic models are serialized to JSON the way we desire. Let’s see how this works for a book and its authors.
First, the SQLAlchemy models. Again, it’s the association object pattern we’ve seen before.
class BookAuthor(Base):
    __tablename__ = 'book_authors'
    book_id = Column(ForeignKey('books.id'), primary_key=True)
    author_id = Column(ForeignKey('authors.id'), primary_key=True)
    blurb = Column(String, nullable=False)
    book = relationship("Book", back_populates="authors")
    author = relationship("Author", back_populates="books")

class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    authors = relationship("BookAuthor", back_populates="book")

class Author(Base):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    books = relationship("BookAuthor", back_populates="author")
Next, the pydantic models.
from typing import List, Optional
from pydantic import BaseModel

class RelatedAuthorSchema(BaseModel):
    id: int
    name: str

    class Config:
        orm_mode = True

class BookAuthorSchema(BaseModel):
    blurb: str
    author: Optional[RelatedAuthorSchema]

    class Config:
        orm_mode = True

class BookSchema(BaseModel):
    id: int
    title: str
    authors: List[BookAuthorSchema]

    class Config:
        orm_mode = True
This time we set up three models that mimic our database models.
So, how does a pydantic model get serialized to JSON? Well, BaseModel has a .dict() method that converts a pydantic model instance to a dict, and then it has a .json() method that converts that dict to JSON. (See here for details.)
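For example, on a toy model (names made up for the demo) the two methods behave like this:

```python
from typing import List
from pydantic import BaseModel

class Inner(BaseModel):
    id: int

class Outer(BaseModel):
    title: str
    items: List[Inner]

m = Outer(title='t', items=[Inner(id=1)])
as_dict = m.dict()   # nested python dict, one level per nested model
as_json = m.json()   # a JSON string built from that dict
```

Keep this two-step pipeline in mind; it’s the hook we’ll exploit below.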
pydantic allows us to configure a custom JSON serializer via the json_dumps config attribute as mentioned here, but unfortunately FastAPI ignores this. Instead, FastAPI relies on the .dict() method to create a dictionary object, which it then serializes to JSON on its own. So, our strategy is to extend the .dict() method to rearrange the dictionary the way we want it.
First, let’s see what .dict() produces on its own.
with Session(bind=engine) as session:
    # Fetch the first book
    db_book = session.query(Book).first()

    # convert to BookSchema and print .dict() serialization
    schema_book = BookSchema.from_orm(db_book)
    print(schema_book.dict())
{
    'id': 1,
    'title': "Dead People Who'd Be Influencers Today",
    'authors': [
        {
            'blurb': 'Blue wrote chapter 1',
            'author': {
                'id': 1,
                'name': 'Blu Renolds'
            }
        },
        {
            'blurb': 'Chip wrote chapter 2',
            'author': {
                'id': 2,
                'name': 'Chip Egan'
            }
        }
    ]
}
The goal here is to move id and name one level up in this nested dictionary. Here’s how we can extend the .dict() method to make this happen.
class BookSchema(BaseModel):
    id: int
    title: str
    authors: List[BookAuthorSchema]

    def dict(self, **kwargs):
        data = super(BookSchema, self).dict(**kwargs)
        for a in data['authors']:
            a['id'] = a['author']['id']
            a['name'] = a['author']['name']
            del a['author']
        return data

    class Config:
        orm_mode = True
Now the same print(schema_book.dict()) produces
{
    'id': 1,
    'title': "Dead People Who'd Be Influencers Today",
    'authors': [
        {
            'blurb': 'Blue wrote chapter 1',
            'id': 1,
            'name': 'Blu Renolds'
        },
        {
            'blurb': 'Chip wrote chapter 2',
            'id': 2,
            'name': 'Chip Egan'
        }
    ]
}
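The same rearrangement would apply on the author side. Here’s the core dict-flattening logic as a standalone sketch (a hypothetical AuthorSchema counterpart, with sample data hard-coded so you can see the transformation):

```python
def flatten_books(data):
    """Hoist each book's id/title out of the nested 'book' dict."""
    for b in data['books']:
        b['id'] = b['book']['id']
        b['title'] = b['book']['title']
        del b['book']
    return data

# What AuthorSchema.dict() would produce before flattening
nested = {'id': 1, 'name': 'Blu Renolds',
          'books': [{'blurb': 'wrote chapter 1',
                     'book': {'id': 1, 'title': 'Demo'}}]}
flat = flatten_books(nested)
```

In the real schema, this function body would live inside an overridden .dict() method, exactly as BookSchema does above.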
Drawbacks
- Unfortunately, the auto-generated documentation provided by pydantic + FastAPI is now incorrect. If you look closely at the swagger docs, you’ll see the Example Value | Schema sections are inaccurate.
Why is this so hard?
While I was struggling with this problem, I kept wondering why SQLAlchemy was making my life more difficult than it needed to be, but then I had an epiphany. You see, not all many-to-many relationships are the same. The key insight came when I thought about a different toy example..
Suppose we wanted to build an app called Moviepedia which provides information about actors and movies. Similar to Bookipedia, we’d store our data in three tables like
- movies
- actors
- characters
Notice I called the junction table “characters” and not “movie_actors”, wink wink.. The key insight here is that one actor can play multiple characters in the same movie. In other words, the same (actor_id, movie_id) pair can occur multiple times in the characters table (and for this reason, characters should have a dedicated id primary key column). Here’s an example of what the data might look like
When a user requests info about movie 2, GET /movies/2, the response should look something like this
{
    id: 2,
    title: 'Austin Powers',
    actors: [
        {
            id: 0,
            name: 'Mike Myers',
            characters: [
                {character: 'Austin Powers'},
                {character: 'Doctor Evil'}
            ]
        },
        {
            id: 1,
            name: 'sue',
            characters: [
                {character: 'blah'}
            ]
        }
    ]
}
Notice the three levels of data (movie > actors > characters) are necessary in this scenario because a (movie, actor) pair can have multiple characters. This is distinctly different from Bookipedia, where a (book, author) pair can only have one blurb, and thus the three levels of nested data (books > authors > blurbs) can be flattened into just two levels (books > authors).
In other words, SQLAlchemy by default steers us towards three levels of nested JSON data because it supports the generic case, where the junction table can have multiple entries for the same (table A, table B) pair.