Decision trees are very common algorithms in the world of development. On paper, they are quite simple. You chain if-else statements nested within each other. The problem arises when the tree starts to grow. If you're not careful, it becomes complex to read and, therefore, difficult to maintain.
In this article, we'll see how we implemented decision trees at Wecasa to make them as readable and maintainable as possible.
Before we go any further, let's take some time to define what a Composite is.
What Links Composite and Binary Decision Tree?
Composite is a structural design pattern. It is a way of arranging objects in a tree structure, which allows us to have a logical hierarchy.
When we talk about a decision tree, we can represent it as boxes that can contain boxes, which can contain boxes, which can contain… a final result.
It's precisely this recursive aspect that makes the decision tree so powerful.
Let's take an example. We want to create a decision tree to calculate the amount of the promotion a customer is eligible for on our e-commerce site.
- In input, our tree receives a User object with all its attributes.
- In output, our tree returns a number.
Each node in our tree has a condition. You can notice by the color code that we have three types of Composites here: Creation Date, Zip Code, and whether the user has already made a purchase. Now that our example is set, let's see how this translates into code!
Firstly, we need tools !
In our implementation of the decision tree, we will need three classes:
- The Node Class: The base class for all other classes. Its purpose is to encapsulate the common logic for all different Nodes in my tree.
class Node
attr_accessor :left_node, :right_node
def call(user)
raise NotImplementedError
end
private
# this method is used to ease composites definition
def call_with_condition(user)
if yield
left_node.call(user)
else
right_node.call(user)
end
end
end
- The Leaf Class: Its purpose is to end the tree. You can find them as bottom nodes without any child nodes. The only job they have to accomplish is to return the value they encapsulate.
class Leaf < Node
attr_accessor :value
def initialize(value)
self.value = value
end
def call(_)
value
end
end
- The Composite Classes: Their role is to contain the condition to guide between the success branch and the failure branch, to proceed to the next step in the algorithm. They also contain a method to set arguments. We will see later why this method is useful.
class CreationDateComposite < Node
attr_accessor :days
def initialize(left_node, right_node)
self.left_node = left_node
self.right_node = right_node
end
def call(user)
call_with_condition(user) do
self.days.ago > user.created_at
end
end
def set_days(days)
self.days = days
self
end
end
class ZipCodeComposite < Node
[...]
end
class HaveBoughtComposite < Node
[...]
end
I only show CreationDateComposite
so you understand the logic, no need to show the rest.
How We Finally Manage to Create a Binary Decision Tree
Thanks to all the classes we have built, we will now create our first complete tree. Here is the proposed implementation:
class PromotionTree
# We use Singleton because we don't need to rebuild the tree
# everytime we call the class
include Singleton
TREE = CreationDateComposite.new(
ZipCodeComposite.new(
BoughtComposite.new(
Leaf.new(60),
Leaf.new(150)
),
BoughtComposite.new(
Leaf.new(30),
Leaf.new(100)
)
).set_zip_code('75'),
BoughtComposite.new(
Leaf.new(30),
ZipCodeComposite.new(
Leaf.new(75),
ZipCodeComposite.new(
Leaf.new(78),
Leaf.new(0)
).set_zip_code('78')
).set_zip_code('75')
)
).set_days(7).freeze
def call(user)
TREE.call(user)
end
end
And now we can simply use the tree :
user = User.last
PromotionTree.instance.call(user) # => 60 🎉
That being explained, this implementation offers several benefits:
- No if-else mixed in my business logic: By encapsulating conditions in separate objects, we have a clear separation of responsibilities. Each condition is handled by a specific composite, making the code more readable, elegant and maintainable.
- Centralization of conditions in separate objects: This allows easy reuse and modification of conditions without touching the main application logic. Conditions can be changed or extended by creating new composites or modifying existing ones thanks to OOP.
- Ease of testing: Each component of the decision tree can be tested in isolation, simplifying the unit testing process. Tests can focus on specific conditions without having to simulate the entire decision tree.
- Extensibility: Adding new conditions or modifying existing conditions is simple and straightforward. Just create new composite nodes and insert them into the existing decision tree.
- It's easy to build a new tree: Let's say tomorrow I want to build a new kind of tree. I always have all my Nodes ready to be used.
But apart from this, I can see some downsides :
- The setup cost is High: I think it's not that pertinent to use this if you setup this kind of architecture if you don't plan to create other trees.
- It uses recursion: The limit of your tree will be the limit of your Stack Size. So if your tree is bigger than your allowed stack size, you won't be able to use it. Fortunately, this value is quite big : 10867 for my setup. But it can depend on your setup !
- You need to get into the habit to read it quickly: At first it's kinda disturbing. The fact that we need to read on the top the Composite, and the bottom the value it used is quite weird. But don't be afraid of this syntax, I will publish another article to show you how I implemented a DSL for Tree building based on this architecture.
Conclusion
Using Composite to implement decision trees allows us to structure our conditions in a clear and maintainable way. Rather than getting lost in a maze of nested if-else statements, we have a logical hierarchy of decisions, each node playing a well-defined role.
Thank you for reading. I invite you to subscribe so you don't miss the next articles that will be released.
Top comments (0)