10 Tips For Adding SQLAlchemy To FastAPI

Introduction

The popularity of FastAPI has been steadily growing over the last few years. It is climbing the top 25 most popular web frameworks and technologies in the Stack Overflow Developer Survey 2024, right behind Django and Flask. In an age where AI models need to be wrapped in Python APIs at every company, it is no surprise there is a lot of interest in the framework.

1
2
3
4
5
6


from fastapi import FastAPI
app = FastAPI()

@app.get("/")
def root():
    return "As this is all you need to get started!"

The minimalism might tempt you to believe there is not a lot of room for error. But complexity is incremental. Eventually these APIs will have to be updated, integrated, and maintained, so you’re better off having read The Art of Computer Programming than The Art of Statistics. I’ve experienced this first hand: thinking about the poor excuse for an ETL pipeline I hacked together years ago makes me shiver to this day.

Whether you use FastAPI in a large web app or a small ML inference service, chances are want to track metrics, save information or add some users at some point. Therefore, I want to share some learnings having introduced SQLAlchemy to FastAPI several times in the past. I bundled these learnings into 10 practical tips for integrating a relational database into your FastAPI application. The principles can be seen in action in my FastAPI Template Project.

FastAPI and DDD

SQLAlchemy provides a core framework for communicating with a relational database and an ORM. Everyone Python developer using a relational database outside Django is probably familiar with the project. When you decided to integrate with FastAPI however, it is important to keep clear separations of concern. In this section, I introduce a few abstractions we have found to be helpful for both maintaining and testing the code.

1. Don’t put Database Models in your endpoints

SQLAlchemy ORM uses what is called the data mapper pattern. The data mapper pattern is great for its flexibility, and attempts to bridge the Object–relational impedance mismatch. But there are disadvantages to using these ORM models everywhere in your application, in particular your API. What if you want to keep the endpoints simple to users, but have a highly normalized data model, or want to hide fields to users? You might end up adding awkward logic to create contextual awareness of the fields. Or worse: your API will be hard to use. Hence, you should consider evolving your API and database schema separately.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13


class TodoInDB(SQL_BASE):  # type: ignore
    __tablename__ = "todo"

    id = Column(Integer, primary_key=True, autoincrement=True)
    key = Column(String(length=128), nullable=False, unique=True)
    value = Column(String(length=128), nullable=False)
    done = Column(Boolean, default=False)


class Todo(BaseModel):
    key: str
    value: str
    done: bool = False

In particular, you should not try to create a class that is both a Pydantic BaseModel and a declarative base from SQLAlchemy. Pydantic is above all an input validation library, so it should be used to validate user input in the API. Above, we see an example of a model for which both a Pydantic BaseModel is created for the API and a SQLAlchemy model. You only need to translate between these objects when passing the boundary between your domain layer and database layer.

2. The Repositories Pattern

Being able to swap components in your software is a good indicator that it is modular. And although you are probably not switching databases anytime soon, being able to swap actual persistence with in-memory structures both allows you to postpone introducing new entities (or even a whole database) and make it easy to write unit tests. This requires creating an interface around the storage layer and all query logic. Enter the Repository Pattern.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19


class TodoRepository:
    def save(self, todo: Todo):
        raise NotImplementedError()

    def get_by_key(self, key: str) -> Todo | None:
        raise NotImplementedError()

class SQLTodoRepository(TodoRepository):
    def __init__(self, session: Session):
        self._session: Session = session

    def save(self, todo: Todo):
        self._session.add(TodoInDB(key=todo.key, value=todo.value))

    def get_by_key(self, key: str) -> Todo | None:
        instance = self._session.query(TodoInDB).filter(TodoInDB.key == key).first()

        if instance:
            return Todo(key=instance.key, value=instance.value, done=instance.done)

The Repository Pattern can be found in books such as Domain Driven Design. For our purpose, repositories are intermediaries that store the entities in our domain model. A repository typically has CRUD methods such as create(), get() or update(). The signatures of these methods consist of domain models and query parameters to filter or paginate the response. They do not expose how they persist entities, only which entities they persist.

The interface of a repository can be implemented using SQLAlchemy, but also just by using a dictionary of objects to use in unit tests.

3. Filter Models

One pattern I personally found to be very helpful is the use of Filter Models. Since you often want to query your database using several filters that can be used through the API as well, collecting them in a separate BaseModel allows you to add or remove filter parameters and change defaults without changing a Repository’s interface. It also keeps the interface clean as it reduces the amount of arguments to the repository methods.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25


class TodoFilter(BaseModel):
    limit: int = 1
    key_contains: str | None = None
    value_contains: str | None = None
    done: bool | None = None

class SQLTodoRepository:
    ...

    def get(self, todo_filter: TodoFilter) -> List[Todo]:
        query = self._session.query(TodoInDB)

        if todo_filter.key_contains is not None:
            query = query.filter(TodoInDB.key.contains(todo_filter.key_contains))

        if todo_filter.value_contains is not None:
            query = query.filter(TodoInDB.value.contains(todo_filter.value_contains))

        if todo_filter.done is not None:
            query = query.filter(TodoInDB.done == todo_filter.done)

        if todo_filter.done is not None:
            query = query.limit(todo_filter.limit)

        return [Todo(key=todo.key, value=todo.value, done=todo.done) for todo in query]

Still, this is Python, not Java. It is not uncommon to just choose for a larger amount keyword arguments for the get method in this case when it is clear these are use for filtering. From the developers perspective, it might be a bit more tedious to import a whole TodoFilter object and instantiate it, than to just pass a set of kwargs for which I get completion for function arguments from my editor instead of fields on a class.

Managing SQLAlchemy scopes

FastAPI and SQLAlchemy both have their own documentation on introducing a database, FastAPI focussing on using Depends(), and managing connections, sessions and transactions. But the former shows a rather succinct setup, where the latter is a detailed explanation of SQLAlchemy and all its internals. I will show a complete yet simple setup that should do the trick for basic APIs. This leverages SQLAlchemy’s different scopes without coupling implementation details to your domain.

4. Re-use database connections

As every documentation page will tell you, a big anti-pattern is not re-using database connections. They are expensive to set up, so reconnecting for every transaction introduces significant overhead. The FastAPI documentation shows you how you could approach this using global variables. Another way to deal with this is caching the function that initialises the engine (the object managing the connection pool).

1
2
3


@lru_cache(maxsize=None)
def get_engine():
    return create_engine(os.getenv("DB_STRING"), pool_pre_ping=True)

This way you re-use the engine between request and only pay for the connections on the first one. More FastAPI-specific implementations create and close these objects using lifespan events, or using the new lifespan kwarg to the FastAPI app object:

1
2
3
4
5
6


@asynccontextmanager
async def lifespan(app: FastAPI):
   engine = create_engine(os.getenv("DB_STRING"), pool_pre_ping=True)
   ...

app = FastAPI(lifespan=lifespan)

However, for SQLAlchemy this does not mean that you pay for connections only on startup, as the documentation explains:

All SQLAlchemy pool implementations have in common that none of them “pre create” connections - all implementations wait until first use before creating a connection.

5. One session per request

The second layer of bookkeeping is the managing of sessions. Sessions are not expensive to create. They form a layer between the database that performs transactions, identity mapped objects etc. SQLAlchemy has its own documentation on sessions, where they define the transaction scope and the session scope. In short, in the case of APIs it is good practice to create one session per request.

To manage a sessions life cycle, we can leverage FastAPI’s Depends by creating an iterator.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


def create_todo_repository() -> Iterator[TodoRepository]:
    session = sessionmaker(bind=get_engine())()
    todo_repository = SQLTodoRepository(session)

    try:
        yield todo_repository
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()

The FastAPI endpoint to create a Todo would then simply read:

1
2
3


@app.post("/{key}")
def create(key: str, value: str, todo_repository: TodoRepository = Depends(create_todo_repository)):
    ...

Every time the endpoint is called, a repository is instantiated with a new session that is automatically closed after the request ends. Even when an exception occurs, we rollback the session.

6. Transactions using context managers

The last thing to manage is individual transactions. Sometimes, one transaction per request offers enough granularity. In that case, committing the session right after yielding the todo_repository does the trick. When you want to commit transactions in the service layer, repositories need a way to construct transactions, without introducing a leaky abstraction by exposing the session. Luckily, context managers are a great solution. You’ve probably seen this being used in other session, connection, or cursor implementations.

To make a context manager out of a repository, you need to implement an __enter__() and an __exit__() method. We then need to commit the session when exiting the context. The __exit__() even allows you to handle any exceptions raised in the current context, so you can safely rollback any pending transactions before exiting.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19


class SQLTodoRepository:
    def __init__(self, session):
        self._session: Session = session

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value: str, exc_traceback: str) -> None:
        if exc_type is not None:
            self._session.rollback()
            return

        try:
            self._session.commit()
        except DatabaseError:
            self._session.rollback()
            raise

    ...

Your endpoint turns into:

1
2
3
4
5
6


@app.post("/{key}")
def create(key: str, value: str, todo_repository: TodoRepository = Depends(create_todo_repository)):
    with todo_repository as repo:
        repo.create(key, value)

    ...

7. Use Alembic

You are most likely going to change your database schema at one point. Alembic has been the standard schema management solution for SQLAlchemy. (They were also developed by the same author!) These just go hand-in-hand.

8. Test your repositories

The great thing about repositories is that they are easy to test. You should check if they save and return the right entities, perform the right filtering and they do not accept invalid values. If you test your repositories well, it gives you a lot of comfort writing more involved business logic since you know that you are at least using the right entities. If you have in-memory implementations (and perhaps add ContractTests for your repositories as well), you can even swap the SQL implementations for these lightweight test doubles to unit test business logic.

While we are at it, we might as well test our migrations by using Alembic in our test setup. This ensures you do not forget to generate and commit your migrations, since your tests will fail without them.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51


@pytest.fixture
def todo_repository():
    time.sleep(1)
    alembicArgs = ["--raiseerr", "upgrade", "head"]
    alembic.config.main(argv=alembicArgs)

    engine = get_engine(os.getenv("DB_STRING"))
    session = sessionmaker(bind=engine)()

    yield SQLTodoRepository(session)

    session.close()

    sessionmaker(bind=engine, autocommit=True)().execute(
        ";".join([f"TRUNCATE TABLE {t} CASCADE" for t in SQL_BASE.metadata.tables.keys()])
    )

@pytest.mark.integration
def test_repository(todo_repository: SQLTodoRepository):
    with todo_repository as r:
        r.save(Todo(key="testkey", value="testvalue"))

    todo = r.get_by_key("testkey")
    assert todo.value == "testvalue"

    with pytest.raises(IntegrityError):
        with todo_repository as r:
            r.save(Todo(key="testkey", value="not allowed: unique todo keys!"))

    with pytest.raises(DataError):
        with todo_repository as r:
            r.save(Todo(key="too long", value=129 * "x"))

@pytest.mark.integration
def test_repository_filter(todo_repository: SQLTodoRepository):
    with todo_repository as r:
        r.save(Todo(key="testkey", value="testvalue"))
        r.save(Todo(key="abcde", value="v"))

    todos = r.get(TodoFilter(key_contains="test"))
    assert len(todos) == 1
    assert todos[0].value == "testvalue"

    todos = r.get(TodoFilter(key_contains="abcde"))
    assert len(todos) == 1
    assert todos[0].value == "v"

    assert len(r.get(TodoFilter(key_contains="e"))) == 2
    assert len(r.get(TodoFilter(key_contains="e", limit=1))) == 1
    assert len(r.get(TodoFilter(value_contains="v"))) == 2
    assert len(r.get(TodoFilter(done=True))) == 0

Before adding database to FastAPI

Before diving into SQLAlchemy, here are two tips that help make the right considerations when you have not bitten the bullet yet about introducing SQLAlchemy.

9. Make sure your FastAPI app needs a relational database

Everything is a trade-off. And although a big part of the web runs on relational databases, PostgreSQL is not the answer to everything. Does your inference endpoint only need a trained model that updates its version once in a while? Then it is probably enough to add the version number to the models filename. (Python wheels is one of the many examples of how data can be managed largely by a naming convention.) Want to cache some calculations to speed up response times? Check out the functools caching decorators before adding Redis. Even when authentication can be reduced to hashed environment variables if it is only for your launching customer.

On the other hand, you might have performance requirements that can not be met by a relational database that are optimized for OLTP. For OLAP use-cases such as analytics dashboards on large amounts of data, there may be more suitable database technologies (check out Google BigQuery, Amazon Redshift, Snowflake, or even DuckDB!). The amount of reads and writes, the query patterns, and ACID requirements should all be considered. The data field develops quickly and has shown us that there is no one size fits all.

10. Spend some time on your data model

If you postponed adding a relational database long enough, you should take another critical look at your data model. Of course, if you manage the data yourself and it is not mission-critical, you get away with a lot. But the moment you deploy to production and start storing other people’s data, migrations can become a real pain. Classics include creating a nullable column that should have been mandatory, or decreasing VARCHAR lengths. And how normalized should your schema be? Do you create that extra table and sacrifice some query performance, or do you accept the duplication?

Architecture is what’s hard to change, and your database schema is a crucial part of it. If you want to avoid common data modeling mistakes, check out my blog post about that as well.

Conclusion

SQLAlchemy and FastAPI are both great Python packages that get you up and running with minimal setup. In this post we listed several aspects to take into account as you add a relational database to your API and your project matures. But as mentioned above, everything is a trade-off. You might find turning on Pydantic’s ORM mode a lot easier to work with and never run into needing more flexibility on both your API and database mapping. Maybe you want to leave out the context manager and just set autocommit=True while creating a new session. In the end, you know the requirements of your application best.

Having said that, hopefully these tips will help you build a great data API, or at least give food for thought regarding the design principles. If you want to see all the code in this post in action, you can check out my FastAPI Template Project on GitHub and read my blog post. If you think we missed important tips or you disagree on the ones listed, feel free to reach out! We are always happy to discuss and improve on ideas.

Donny Peeters
February 5, 2023
- 13 minutes - 2607 words