Relationships

Your data naturally has connections to one another. Students have teachers, offices have employees, etc. Relationships are what give a relational database like Postgres its name.

Iceaxe has support for relationships via foreign keys. Let's take the case of wanting to assign each Employee to an Office of record. We define two separate tables for each objects, and specify that the object_id key of Employees should link back to the office. This is a one-to-many relationship since one office can have many employees.

from iceaxe import TableBase, Field

class Office(TableBase):
    id: int = Field(primary_key=True)
    name: str

class Employee(TableBase):
    id: int = Field(primary_key=True)
    name: str
    office_id: int = Field(foreign_key="office.id")

When new data is inserted into the database, Postgres will validate that the office_id that you've provided does in fact exist in the database. If not, our underlying connection will raise an error.

office = Office(name="HQ")
await conn.insert([office])

valid_employee = Employee(name="Alice", office_id=office.id)
invalid_employee = Employee(name="Bob", office_id=11111111)

await conn.insert([valid_employee]) # Works
await conn.insert([invalid_employee]) # Raises an error

How do we retrieve the office that an employee is associated with? We can use the join method to specify that we want to pull in the office data. This executes one efficient SQL query to get both objects. The resulting objects in Python are typehinted as you expect.

from iceaxe import select

query = select((Employee, Office)).join(
    Office,
    Employee.office_id == Office.id,
).where(
    Employee.name == "Alice",
)
results = await conn.exec(query)

alice, office = results[0]
print(f"{alice.name} (office.name)") # Alice (HQ)

Many-to-many relationships

Many-to-many relationships are a bit more complex. Let's say we have a many-to-many relationship between Person and FavoriteColor. A person can have many favorite colors, and a color can be a favorite of many people.

This calls for a third table, PersonFavoriteColor, that links the two tables together. It acts as the primary lookup table for the relationship. In theory this relationship table can have additional fields - there's nothing unique to it being a many-to-many relationship - but in this case we'll keep it simple.

from iceaxe import TableBase, Field

class Person(TableBase):
    id: int = Field(primary_key=True)
    name: str

class FavoriteColor(TableBase):
    id: int = Field(primary_key=True)
    color: str

class PersonFavoriteColor(TableBase):
    person_id: int = Field(foreign_key="person.id", primary_key=True)
    color_id: int = Field(foreign_key="favorite_color.id", primary_key=True)

To query the favorite colors of a person, we can use the join method like before. Except instead of a direct equality comparison between the two tables we'll want to use the bridge table as the central point of the join.

query = select((Person, FavoriteColor)).join(
    PersonFavoriteColor,
    Person.id == PersonFavoriteColor.person_id,
).join(
    FavoriteColor,
    PersonFavoriteColor.color_id == FavoriteColor.id,
).where(
    Person.name == "Alice",
)

This is a more complicated query than before, so let's consider the actual SQL that's being run:

SELECT person.*, favorite_color.*
FROM person
JOIN person_favorite_color ON person.id = person_favorite_color.person_id
JOIN favorite_color ON person_favorite_color.color_id = favorite_color.id
WHERE person.name = 'Alice'

A plain JOIN like this is the same as specifying an INNER JOIN. Since SQL fetches results in a row based format, we'll receive |Person|*|Color| rows here. In other words, if Alice has 3 favorite colors, we'll get 3 rows back alongside 3 copies of Alice's data. When cast into Python objects, we'll get get the same:

[
    (Person(name="Alice"), FavoriteColor(color="Red")),
    (Person(name="Alice"), FavoriteColor(color="Green")),
    (Person(name="Alice"), FavoriteColor(color="Blue")),
]

Sometimes, that's okay if you're fetching a small amount of data. You can always just group by the person to get a more compact result:

from itertools import groupby

results = await conn.exec(query)
grouped = groupby(sorted(results, key=lamda x: x.id), lambda x: x.id)
for person_id, person_colors in grouped:
    person, colors = person_colors[0]
    print(f"{person.name} likes {', '.join(color.color for person, color in person_colors)}")

Alternatively, you can separate out your queries: one query to fetch the people and one query to fetch the colors for all matching IDs:

query = select(Person).where(Person.name == "Alice")
people = await conn.exec(query)

query = select((PersonFavoriteColor.person_id, FavoriteColor)).join(
    PersonFavoriteColor,
    FavoriteColor.id == PersonFavoriteColor.color_id,
).where(
    PersonFavoriteColor.person_id.in_([person.id for person in people]),
)

Comparison with other ORMs

In some other ORMs that you might have used, there is an explicit notion of defining and accessing child models right from the parent object. The syntax might be something like:

class Parent(Base):
    children = relationship("Child")

class Child(Base):
    parent_id = Column(Integer, ForeignKey("parent.id"))
    parent = relationship("Parent")

When you access parent.children, these ORMs will automatically issue a new request to the database to pull the dependent children. Internally this usually translates to the following SQL:

# SELECT * FROM parent WHERE id = ?
parent_obj = db_session.get(Parent, parent_id)

# SELECT * FROM child WHERE parent_id = ?
for child in parent_obj.children:
    print(child)

This works naturally in syncronous code but isn't as straightforward in async logic. When should the async ORM issue the request to fetch the children? Should it be done when the parent object is created? When the children attribute is accessed? If so, children would need to switch to an awaitable type. Will the transaction be held for the duration of both fetches? There are typically a host of options that govern each ORM's behavior.

Plus, if you're dealing with multiple parent objects, it's common to unintentionally issue a large number of queries to the database when you iterate over the parent objects. This is known as the N+1 query problem.

Inline with Iceaxe's philosophy that SQL queries are better explicit than implicit, we don't support this implicit relationship syntax. Instead we support foreign keys and joins directly in the query. If you want to implement something similar at the class level, it's pretty easy to do so:

from iceaxe import TableBase, Field, select

class Parent(TableBase):
    id: int = Field(primary_key=True)
    name: str

    async def children(self, conn: DBConnection):
        query = select(Child).where(Child.parent_id == self.id)
        return await conn.exec(query)

But buyer beware: this will issue a new query to the database every time. Think about if this is really the right approach for your use case. Normally it's better to default to explicit joins in your queries.