Data for Trees in Python

Starter code: below

datatypes.py: starts out blank
io_util.py: already contains code
main.py: contains starter code
functions.py: contains stubs that will hold our UPGMA functions
Newick.R: contains code that we will use to visualize the trees that we build

Dataclasses

{You are probably getting tired of writing constructors and __repr__() methods.}

{The @dataclass decorator generates both for us, and its field declarations with type annotations resemble the way other languages declare fields. Let’s show how this works for a rectangle and a circle. This material would fit better at the start of the preceding code along, so feel free to move it there.}

import math
from dataclasses import dataclass

@dataclass
class Rectangle:
    """
    Represents a rectangle in 2D space.

    Attributes:
        width (float): The width of the rectangle (default: 1.0).
        height (float): The height of the rectangle (default: 1.0).
        rotation (float): Rotation angle in degrees (default: 0.0).
        x1 (float): X-coordinate of the lower-left corner (default: 0.0).
        y1 (float): Y-coordinate of the lower-left corner (default: 0.0).
    """

    width: float = 1.0
    height: float = 1.0
    rotation: float = 0.0
    x1: float = 0.0
    y1: float = 0.0

    def area(self) -> float:
        """Return the area of the rectangle."""
        return self.width * self.height


@dataclass
class Circle:
    """
    Represents a circle in 2D space.

    Attributes:
        radius (float): The radius of the circle (default: 1.0).
        x1 (float): X-coordinate of the circle center (default: 0.0).
        y1 (float): Y-coordinate of the circle center (default: 0.0).
    """

    radius: float = 1.0
    x1: float = 0.0
    y1: float = 0.0

    def area(self) -> float:
        """Return the area of the circle."""
        # Use math.pi rather than approximating pi as 3.0.
        return math.pi * self.radius ** 2

{Calling this in main}

def main():
    # Custom rectangle
    r = Rectangle(width=3.0, height=4.0)
    print("Custom rectangle area:", r.area())

    # Circle with a custom radius; its center keeps the default (0.0, 0.0)
    c = Circle(radius=5.0)
    print("Circle area:", c.area())

    # Printing r and c uses the __repr__() that @dataclass generates
    print(r)
    print(c)
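
{To see what @dataclass saves us, here is a minimal side-by-side sketch. The class names ManualCircle and AutoCircle are hypothetical and exist only for this comparison; they are not part of the starter code.}

```python
from dataclasses import dataclass


class ManualCircle:
    """Hand-written version: constructor and repr spelled out by us."""

    def __init__(self, radius: float = 1.0, x1: float = 0.0, y1: float = 0.0):
        self.radius = radius
        self.x1 = x1
        self.y1 = y1

    def __repr__(self) -> str:
        return f"ManualCircle(radius={self.radius}, x1={self.x1}, y1={self.y1})"


@dataclass
class AutoCircle:
    """Dataclass version: __init__, __repr__, and __eq__ are generated."""

    radius: float = 1.0
    x1: float = 0.0
    y1: float = 0.0


manual = ManualCircle(radius=5.0)
auto = AutoCircle(radius=5.0)

print(manual)  # ManualCircle(radius=5.0, x1=0.0, y1=0.0)
print(auto)    # AutoCircle(radius=5.0, x1=0.0, y1=0.0)

# The dataclass also compares objects field by field; the hand-written
# class falls back to identity comparison because it defines no __eq__.
print(auto == AutoCircle(radius=5.0))      # True
print(manual == ManualCircle(radius=5.0))  # False
```

{The generated __eq__ is a bonus that the hand-written class would need yet another method to match.}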

Establishing data for trees

{Next, we will apply this to datatypes.py}

DistanceMatrix = list[list[float]]
"""A two-dimensional list of floats representing pairwise distances between species."""

{We will also provide support for a Tree, which only needs to be a list of Node objects, so we use a type alias. The alias refers to Node by name, so it must appear after the Node declaration; we therefore put it at the bottom of the file.}

Tree = list[Node]
"""A list of Node objects representing a phylogenetic tree structure."""

{Next, we adapt the idea from the previous code along to define a Node, using our dataclass idea. Here is what we would like to do.}

from dataclasses import dataclass

@dataclass
class Node:
    """
    Represents a node in a phylogenetic tree.

    Attributes:
        num (int): Numeric identifier for the node (e.g., index in a tree list).
        age (float): Age (or height) of the node, typically half the distance between clusters.
        label (str): Label of the node, usually the species name for leaves.
        child1: The first child node, or None if this node is a leaf.
        child2: The second child node, or None if this node is a leaf.
    """

    num: int = 0
    age: float = 0.0
    label: str = ""
    child1: Node = None
    child2: Node = None

{Let’s try it!}

def main():
    v = Node(num=2, age=3.0, label="New Node")
    print(v)

{Unfortunately, when we run our code, we get the following error.}

NameError: name 'Node' is not defined. Did you mean: 'None'?

STOP: Why do you think there is an issue here?

{The problem arises because we’re essentially defining a “recursive” Node object. Python evaluates the annotation on child1 while the body of the class is still running, and at that point the name Node has not been defined yet, which is why we see a NameError.}

{To fix this, we will import Self from the typing module (available in Python 3.11 and later). The type, then, is “Self | None”, which says that child1 can hold either an object of the same class (Self) or None. Finally, = None indicates that the default value is None, which is good, as it means that a node has no children set by default when it is created.}

from dataclasses import dataclass
from typing import Self

@dataclass
class Node:
    """
    Represents a node in a phylogenetic tree.

    Attributes:
        num (int): Numeric identifier for the node (e.g., index in a tree list).
        age (float): Age (or height) of the node, typically half the distance between clusters.
        label (str): Label of the node, usually the species name for leaves.
        child1 (Self | None): The first child node, or None if this node is a leaf.
        child2 (Self | None): The second child node, or None if this node is a leaf.
    """

    num: int = 0
    age: float = 0.0
    label: str = ""
    child1: Self | None = None
    child2: Self | None = None

{We are now ready to implement UPGMA}
