How to build a Python backend? (Part 1: internal architecture)

So you want to create a full-featured web application, and you’re wondering whether you should use a large framework like Django or something more minimal like Flask. But what if you really need something in the middle? What if you want something simpler than Django because your frontend uses a technology like React or Angular? What if you need more than just the Web API you can build with Flask, because your app handles complex business logic and/or interacts with other systems asynchronously?

This article describes how we’ve built this kind of backend service with the following principles in mind:

  • Easy to maintain architecture
  • Ready-for-production
  • Taking full advantage of asyncio for the API, business logic, and interaction with third-party systems

Prerequisites: having a recent Python (≥ 3.9) installed with pip should be enough 😃

1. Domain-Driven Design

First, let’s talk about architecture!

There is a lot to learn from architecture design when you want to build real-world applications. If you look at most of the code examples provided by Flask or FastAPI, you get a very simple application: a REST API with a single handler per endpoint. In real applications you want to separate your business logic from the API calls, so you can interact with the app through other channels such as a GraphQL API or RabbitMQ messages. You also need to deal with one or more storage systems (a database, a caching layer, an object storage service, a secret store) and with more complex systems like cloud provider APIs, Kubernetes, etc.

To properly implement separation of concerns and abstract interactions with other systems, the concepts of Domain-Driven Design (DDD) provide a nice toolbox to look into. The Architecture Patterns with Python book (available online here) is a gold mine for understanding how to implement a DDD architecture in Python. It provides tons of step-by-step examples for every concept, so you can understand why you should or shouldn’t apply them. It is a must-read, and most of what is presented here is based on this book.

So we’ll walk you through an architecture composed of 3 layers: Domain, Application, and Infrastructure. The Domain layer defines the data structures as plain Python objects: the business objects. The Application layer holds the brain of the app: the business logic. Finally, the Infrastructure layer is the “arms and legs” of our app: the part that interacts with the external world (HTTP API, database, file system, servomotors, etc.).

So let’s create our application’s skeleton with the wonderful poetry:

mkdir myapp
cd myapp
pip install poetry
poetry init
mkdir -p myapp/application
mkdir myapp/domain
mkdir myapp/infrastructure

You should have something like:

β”œβ”€β”€ myapp
β”‚   β”œβ”€β”€ application
β”‚   β”œβ”€β”€ domain
β”‚   └── infrastructure
└── pyproject.toml

1.1 Domain

The Domain layer is a model representation of the services. It really is the core of our services, and it must be able to evolve fast. This layer doesn’t depend on any other layer (following the dependency inversion principle) and imports no external libraries (apart from justified exceptions, it consists only of plain Python code).

A domain object is a dataclass defining a business object. Most of the methods of these dataclasses are helpers manipulating the dataclass’ state. Some of these classes are abstract classes, implemented by classes from the Infrastructure layer.

Methods of these classes can return Domain objects, states (“something went wrong”, “no problem here”, “only steps 1 and 3 worked”…), or nothing.

The general rule is to put as much logic as possible there.

For example, here is an object that represents an entry in our todo app. And yes, our example will be a todo app! (as we all do ^^).

import uuid
from datetime import datetime
from dataclasses import dataclass, field


@dataclass
class TodoEntry:
    id: str
    created_at: datetime
    content: str
    tags: set[str] = field(default_factory=set)

    @classmethod
    def create_from_content(cls, content: str) -> "TodoEntry":
        return cls(id=str(uuid.uuid4()), created_at=datetime.utcnow(), content=content)

    def set_tag(self, tag: str) -> None:
        self.tags.add(tag)

Did you notice that we make heavy use of Python type hints? They are a really good way to get something working quickly and with confidence. We strongly advise you to use them and to enforce them in the CI, so you won’t have surprises at execution time.

1.2 Infrastructure

The Infrastructure layer manages all the interactions with external systems: database, file system, network, APIs, etc.

These services act as “wrappers” around external dependencies so that they can be used within the Application layer. This pattern goes by many names (Adapter, Delegation, Facade…), but the main idea is to abstract the underlying infrastructure and provide an internal interface around it. It means that any technology you use from the outside world is loosely coupled to your application. The great advantage of this pattern is that it makes switching from one tool to another easy, often in less than a day!

1.2.1 The repository pattern

This is also where we find Repositories. The repository pattern is simply a class abstracting object persistence. It provides at least add and get functions, giving us a single way to store and retrieve data from storage systems. We can start with a Pickle file storage until we hit performance limitations signaling that it’s time to switch to an SQL database or something else. This switch spares us from changing a single line of code in our Application or Domain layer.

For example, here is a ‘Todo entries’ repository using the pickle module to serialize objects into files:

import pickle
from dataclasses import dataclass
from pathlib import Path

from myapp.domain.todo import TodoEntry
from myapp.domain.todo_entry_repository import ITodoEntryRepository


class TodoEntryNotFound(Exception):
    pass


@dataclass
class TodoEntryPickleRepository(ITodoEntryRepository):
    storage_dir: str

    def get(self, entry_id: str) -> TodoEntry:
        try:
            entry: TodoEntry
            with open(Path(self.storage_dir) / entry_id) as entry_file:
                entry = pickle.load(entry_file)
            return entry
        except Exception:
            raise TodoEntryNotFound()

    def add(self, entry: TodoEntry) -> None:
        with open(Path(self.storage_dir) / entry.id) as entry_file:
            pickle.dump(entry, entry_file)

Note that we implement an abstract class in the Domain layer. This allows us to import the repository interface from the Application layer without knowing what the actual implementation is.
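The article never shows this interface, so here is a minimal sketch of what myapp/domain/todo_entry_repository.py might contain. The add and get methods come from the repository pattern description above, and get_all is inferred from its use in TodoService; the inlined TodoEntry is a stand-in for the real domain object so the snippet runs on its own:

```python
import abc
from dataclasses import dataclass, field
from typing import Optional


# Minimal stand-in for the domain object (see myapp/domain/todo.py).
@dataclass
class TodoEntry:
    id: str
    content: str
    tags: set[str] = field(default_factory=set)


class ITodoEntryRepository(abc.ABC):
    """Abstract persistence interface, defined in the Domain layer.

    Concrete implementations (Pickle files, SQL, ...) live in the
    Infrastructure layer; the Application layer only sees this interface.
    """

    @abc.abstractmethod
    def add(self, entry: TodoEntry) -> None:
        """Persist a new entry."""

    @abc.abstractmethod
    def get(self, entry_id: str) -> TodoEntry:
        """Retrieve a single entry by id."""

    @abc.abstractmethod
    def get_all(self, search: Optional[str] = None) -> list[TodoEntry]:
        """Retrieve all entries, optionally filtered by a search string."""
```

Any class subclassing ITodoEntryRepository and implementing these three methods can be swapped in without touching the Domain or Application layers.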

1.3 Application

Now that we have the Domain that contains the business object as well as our Repository to manage persistence of this object in the Infrastructure layer, we need to glue them together with our business logic.
The Application layer contains all the services provided by the application, using the Domain structures and the Infrastructure as a backend.

These Application services “orchestrate” the Domain’s structures and the Infrastructure services so that they work together harmoniously.

Application data should not be modified here; that is the job of the Domain classes’ methods. Instead, the Application layer catches exceptions and calls those methods to apply the right business rules.

For example we can have a TodoService like this one:

from dataclasses import dataclass
from typing import Optional

from myapp.domain.todo import TodoEntry
from myapp.domain.todo_entry_repository import ITodoEntryRepository


@dataclass
class TodoService:
    todo_repository: ITodoEntryRepository

    def add_entry(self, content: str) -> str:
        entry = TodoEntry.create_from_content(content)
        self.todo_repository.add(entry)
        return entry.id

    def add_tag(self, entry_id: str, tag: str) -> None:
        entry = self.todo_repository.get(entry_id)
        entry.set_tag(tag)

    def get_all(self, search: Optional[str] = None) -> list[TodoEntry]:
        return self.todo_repository.get_all(search)

Wait! When was this todo_repository created, and by whom? It’s now time to talk about dependency injection.

1.4 Dependency injection

The goal of dependency injection is to avoid creating objects everywhere, or passing them to every function in some kind of Context melting pot. To do so, we define all Infrastructure services in one single place. We can then easily inject these services as dependencies of the Application services, either as a singleton default value (e.g. a database connection) or as a one-time object from a factory (e.g. an HTTP request handler).
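The core idea can be shown in plain Python before bringing in any library: a service receives its dependencies through its constructor instead of creating them itself, and a single “composition root” builds and wires everything. The names below (Repository, Service) are purely illustrative:

```python
from dataclasses import dataclass


class Repository:
    """Stands in for any infrastructure service (database, API client, ...)."""

    def get_all(self) -> list[str]:
        return ["entry"]


@dataclass
class Service:
    # The dependency is injected, not constructed inside the service,
    # so tests can pass a fake Repository instead.
    repository: Repository


# Composition root: the one place where objects are created and wired.
service = Service(repository=Repository())
print(service.repository.get_all())  # ['entry']
```

A dependency injection library automates exactly this wiring, plus configuration and lifetime management (singleton vs factory).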

The Dependency Injector library is well designed and provides everything you need to define all your services, inject them and even load configurations.

Let’s install it:

poetry add dependency_injector

We can now create an application container in a container.py file. What’s beautiful about this pattern is that, in one simple file, you describe how to instantiate all internal objects with their configurations and dependencies. For example, you can see that the repository is a Singleton (instantiated only once) and has a storage_dir configuration entry. Our todo service is a Factory (re-instantiated for every usage) and depends on the repository.

from dependency_injector import providers, containers

from myapp.application.todo_service import TodoService
from myapp.infrastructure.database.todo_entry_repository import TodoEntryPickleRepository


class ApplicationContainer(containers.DeclarativeContainer):
    configuration = providers.Configuration()

    todo_entry_repository = providers.Singleton(
        TodoEntryPickleRepository,
        storage_dir=configuration.storage_dir
    )

    todo_service = providers.Factory(
        TodoService,
        todo_entry_repository
    )

2. Web API

Now that we have our base application, we need to create an API. FastAPI is a really nice library that helps you create your API endpoints, route them, serialize and deserialize the API objects (called models), and even generate interactive documentation pages.

Let’s add it to our dependencies with:

poetry add fastapi

A proper way to organize your API is to separate it into controllers, one per group of endpoints. So let’s create a centralized setup file to aggregate common configuration and dependency injection for all controllers, in infrastructure/api/setup.py.

from fastapi import FastAPI

from myapp.container import ApplicationContainer
from myapp.infrastructure.api import todo_controller


def setup(app: FastAPI, container: ApplicationContainer) -> None:

    # Add other controllers here
    app.include_router(todo_controller.router)

    # Inject dependencies
    container.wire(
        modules=[
            todo_controller,
        ]
    )

And the controller for the /todo endpoints:

from dataclasses import asdict
from typing import Optional

from dependency_injector.wiring import Provide
from fastapi import APIRouter

from myapp.application.todo_service import TodoService
from myapp.container import ApplicationContainer
from myapp.infrastructure.api.todo_schema import TodoEntrySchema

todo_service: TodoService = Provide[ApplicationContainer.todo_service]

router = APIRouter(
    prefix="/todo",
    tags=["Todo"],
    responses={404: {"description": "Not found"}},
)


@router.get("/", response_model=list[TodoEntrySchema])
async def list_todos(search: Optional[str] = None) -> list[TodoEntrySchema]:
    todo_entries = todo_service.get_all(search)
    return [TodoEntrySchema(**asdict(todo_entry)) for todo_entry in todo_entries]

@router.post("/")
async def add_todo(content: str) -> str:
    return todo_service.add_entry(content)

Here is the todo schema used for serialization. Note that we use a different object than the internal TodoEntry from the Domain layer, because we want to decouple it from the external API. This way, you can change your API wording and hide internal values that are not useful to users. The schema is based on the Pydantic model, which uses Python’s built-in typing. As advertised in the FastAPI documentation, it comes with plenty of advantages: static analysis with Mypy, useful IDE autocompletion, easy debugging, and so on.

from pydantic import BaseModel


class TodoEntrySchema(BaseModel):
    id: str
    content: str
    tags: list[str]

3. Test it!

Note that, for the sake of simplicity, we have kept the testing part out of the way so far… shame on us! Yet this is one of the main reasons we split our code this way! Keep in mind that all the components can easily be tested both separately and in context. Let us give you an example. Our API calls the application service, which calls the repository, and then returns a todo list converted from the domain object.

Here is a simple test for our repository in myapp/infrastructure/database/test_todo_entry_repository.py:

from tempfile import TemporaryDirectory

import pytest

from myapp.container import ApplicationContainer
from myapp.domain.todo import TodoEntry
from myapp.infrastructure.database.todo_entry_repository import TodoEntryPickleRepository


@pytest.fixture()
def repository():
    with TemporaryDirectory() as tmp_dir:
        container = ApplicationContainer()

        container.configuration.storage_dir.from_value(tmp_dir)
        yield container.todo_entry_repository()


def test_add_and_get(repository: TodoEntryPickleRepository):
    entry = TodoEntry.create_from_content("test")
    repository.add(entry)
    assert entry == repository.get(entry.id)

If you don’t know Pytest fixtures: a fixture is a generator that can be used as an input of a test function. It’s very useful when you have a context to manage. Here, our repository uses a local directory to store our objects, so we have to create the directory before the test and delete it with all its content even if the test fails. To do so, we use tempfile.TemporaryDirectory, which creates the directory when we enter its context and wipes it when we leave. With our fixture, every test function that takes a repository input gets a fresh storage directory, handed over with yield; the code after the yield only runs once the test finishes, at which point the context is closed and the directory is deleted.

Let’s install pytest in the development environment (so it doesn’t end up in our final package) and run it. With poetry, you can run poetry shell to get a new shell with the virtual environment activated, where all your dependencies are available, including tools like pytest:

poetry add --dev pytest
poetry shell
pytest

With this test, we find out that the repository does not open the files in “write bytes” mode, but in “read utf-8” mode, which is the default in Python. So we’ve added mode="wb" in the add function and mode="rb" in the get function.

4. Check them all!

A good practice when you’re coding is to do some formatting and static analysis of your code before committing it to your Git repository. This is called linting. You can apply formatting, sort the imports, check for inconsistencies and errors in the code, and even check for security issues or dead code.

Here is a set of tools we use for that, all packed in a simple lint.sh script that we’ll run before committing (you’ll need to use slightly different options to run it in the CI, but that’s another story).

echo "-- Checking import sorting"
isort .

echo "-- Checking python formating"
black .

echo "-- Checking python with static checking"
flake8

echo "-- Checking type annotations"
mypy ./myapp --ignore-missing-imports

echo "-- Checking for dead code"
vulture ./myapp

echo "-- Checking security issues"
bandit -r ./myapp

To install all these, use poetry add --dev ... and poetry shell just like you did for pytest.

Here is an example of this script’s output, from mypy:

myapp/infrastructure/database/todo_entry_repository.py:36: error: Unsupported operand types for in ("Optional[str]" and "str")

Here is the code:

def get_all(self, search: Optional[str]) -> list[TodoEntry]:
    entries: list[TodoEntry] = []
    for entry_file_path in Path(self.storage_dir).iterdir():
        with open(entry_file_path, mode="rb") as entry_file:
            entry: TodoEntry = pickle.load(entry_file)
            if search in entry.content or search in entry.tags:
                entries.append(entry)
    return entries

Hmm, we have an optional search parameter here, but we never check whether it is None before using the in operator on it! We have to handle the None case, so let’s replace our if with:

if search:
    if search in entry.content or search in entry.tags:
        entries.append(entry)
else:
    entries.append(entry)

And now mypy is happy! 😃

❯ mypy ./myapp --ignore-missing-imports
Success: no issues found in 16 source files
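As a side note, the same check can be collapsed into a single guard; mypy accepts it too, because or short-circuits: search is only used as a str after the is None test has failed. A small self-contained sketch (matches is an illustrative helper, not part of the app):

```python
from typing import Optional


def matches(search: Optional[str], content: str, tags: set[str]) -> bool:
    # None means "no filter": every entry matches.
    # After `search is None` is False, mypy narrows search to str.
    return search is None or search in content or search in tags


print(matches(None, "buy milk", set()))       # True
print(matches("milk", "buy milk", set()))     # True
print(matches("urgent", "buy milk", set()))   # False
```

Whether the flat guard or the nested if reads better is a matter of taste; both satisfy the type checker.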

This is only an example of what static checks can bring you. Use them extensively and you’ll drastically reduce the amount of bugs in production!

5. Application main

Now that we have a full application, we need to glue it all together in a main application function. So we’ll create an app.py file at the root of our source code, loading the configuration and launching the app’s main process.

import logging
from pathlib import Path

import uvicorn
from fastapi import FastAPI

from myapp.container import ApplicationContainer
from myapp.infrastructure.api.setup import setup
from myapp import __version__


def init() -> FastAPI:
    container = ApplicationContainer()

    # Setup logging
    container.configuration.log_level.from_env("TODO_APP_LOG_LEVEL", "INFO")

    str_level = container.configuration.log_level()
    numeric_level = getattr(logging, str_level.upper(), None)
    if not isinstance(numeric_level, int):
        raise ValueError("Invalid log level: %s" % str_level)
    logging.basicConfig(level=numeric_level)
    logger = logging.getLogger(__name__)
    logger.info("Logging level is set to %s" % str_level.upper())

    # init Database
    container.configuration.storage_dir.from_env("TODOAPP_STORAGE_DIR", "/tmp/todoapp")
    Path(container.configuration.storage_dir()).mkdir(parents=True, exist_ok=True)

    # Init API and attach the container
    app = FastAPI()
    app.extra["container"] = container

    # Do setup and dependencies wiring
    setup(app, container)

    # TODO add other initialization here

    return app


def start() -> None:
    """Start application"""
    logger = logging.getLogger(__name__)
    logger.info(f"My TODO app version: {__version__}")
    app = init()
    uvicorn.run(
        app, host="0.0.0.0", port=8080,
    )


if __name__ == "__main__":
    start()

You can see that we finally defined how to load the configuration, in this case using environment variables, but you can use any common configuration format supported by the configuration provider, like Yaml or Ini.
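For instance, the same two settings could come from a (hypothetical) config.yaml file, loaded with container.configuration.from_yaml("config.yaml") in place of the from_env calls; note that the Yaml loader of Dependency Injector relies on PyYAML being installed:

```yaml
# config.yaml — hypothetical equivalent of the environment variables above
log_level: INFO
storage_dir: /tmp/todoapp
```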

6. Run it!

At last, we can run our application! We’re now confident it’s ready to see some real user requests 😃

python -m myapp.app

And in another terminal:

❯ curl localhost:8080/todo/
[]
❯ curl -H "Content-Type: application/json" -d '{"content": "my first entry"}' localhost:8080/todo/
"8b122755-530e-43c7-ae84-362c17a37fc5"
❯ curl localhost:8080/todo/
[{"id":"8b122755-530e-43c7-ae84-362c17a37fc5","content":"my first entry","tags":[]}]

It works! But the best part is the OpenAPI interactive documentation, available at http://127.0.0.1:8080/docs
There you can see the documentation and query the backend directly 😃

7. What’s next?

We still have a lot to show you: introducing the Unit of Work to keep data changes consistent through transactions, using the SQLAlchemy ORM and Alembic to automate database migrations, adding an internal Message Bus to properly manage synchronous and asynchronous processing, doing application testing with mocks, etc.

But this post is way too long already…

Full source code for the example we discussed in this post can be found here: https://github.com/RyaxTech/example-app

The original idea and first implementation of this architecture came from Maxime Arriaza at Ryax Technologies, and we really thank him for that! 🫶

In a future article, we’ll cover another level of the architecture, with a discussion of microservices. How to split your services? Who is responsible for what? How to manage inter-service communication? Let’s stay in touch!

A word on the author

Michael Mercier

Lead Software Engineer at Ryax Technologies. Michael is a Computer Science PhD and R&D engineer, with a wide IT infrastructure expertise in multiple contexts: Cloud, High Performance Computing, and Big Data.