Alejandro Sosa
Comparative Benchmarking: ILP, A*, and Branch and Bound Algorithms in High-Throughput Scenarios

In this blog post, we will compare the performance of three algorithms used in a recent personal project: Integer Linear Programming (ILP), a local implementation built on the A* search algorithm, and an optimized solution using the Branch and Bound algorithm. All three were tested against the same dataset; the ILP and Branch and Bound implementations shared the same workload, while the A* implementation ran a reduced one due to performance constraints.

Disclaimer: While I will not delve into the project's specific code details, I will share some insights from it. The codebase is not intended for public disclosure, so this post sticks to benchmarks, pseudocode, and general observations.

Benchmark Results

Here are the benchmark results for all three algorithms:

```
goos: linux
goarch: amd64
pkg: github.com/sosalejandro/<my-project>/<my-package>/pkg
cpu: 13th Gen Intel(R) Core(TM) i7-13700HX

BenchmarkGenerateReportILP-24                            724       1694029 ns/op       30332 B/op        181 allocs/op
BenchmarkGenerateReportILPParallel-24                   6512        187871 ns/op       34545 B/op        184 allocs/op
BenchmarkGenerateReportLocal-24                            2     851314106 ns/op    559466456 B/op   7379756 allocs/op
BenchmarkBranchGenerateReportLocal-24                 101449         12106 ns/op       29932 B/op        165 allocs/op
BenchmarkGenerateReportLocalParallel-24                    3     349605952 ns/op    559422440 B/op   7379837 allocs/op
BenchmarkBranchGenerateReportLocalParallel-24         120543         10755 ns/op       29933 B/op        165 allocs/op
PASS
coverage: 81.4% of statements
ok      github.com/sosalejandro/<my-project>/<my-package>/pkg   11.121s
```

Workload Configuration

All algorithms were tested using the same set of data, but the workload (i.e., the number of times each item is processed) differed between the implementations.

ILP and Branch and Bound Implementation Workload:

```go
plan := []Plan{
    {ID: "1", Times: 100},
    {ID: "2", Times: 150},
    {ID: "3", Times: 200},
    {ID: "8", Times: 50},
    {ID: "9", Times: 75},
    {ID: "10", Times: 80},
    {ID: "11", Times: 90},
    {ID: "12", Times: 85},
    {ID: "13", Times: 60},
    {ID: "14", Times: 110},
}
```

A* Implementation Workload:

```go
plan := []Plan{
    {ID: "1", Times: 1},
    {ID: "2", Times: 1},
    {ID: "3", Times: 5},
    {ID: "8", Times: 5},
    {ID: "9", Times: 5},
    {ID: "10", Times: 5},
    {ID: "11", Times: 9},
    {ID: "12", Times: 5},
    {ID: "13", Times: 5},
    {ID: "14", Times: 5},
}
```

Workload Analysis

To understand the impact of these workloads on the benchmark results, let's calculate the total number of iterations (i.e., the sum of the Times values) for each implementation.

Total Iterations:

  • ILP and Branch and Bound Implementations: 100 + 150 + 200 + 50 + 75 + 80 + 90 + 85 + 60 + 110 = 1000
  • A* Implementation: 1 + 1 + 5 + 5 + 5 + 5 + 9 + 5 + 5 + 5 = 46

Workload Ratio:

ILP Iterations / A* Iterations = 1000 / 46 ≈ 21.74

This means the ILP and Branch and Bound implementations are handling approximately 21.74 times more iterations compared to the A* implementation.

Performance Comparison

Let's break down the benchmark results in relation to the workload differences.

| Benchmark | Runs | ns/op | B/op | allocs/op | Total Time (ns) |
| --- | ---: | ---: | ---: | ---: | ---: |
| BenchmarkGenerateReportILP-24 | 724 | 1,694,029 | 30,332 | 181 | ≈ 1,226,476,996 |
| BenchmarkGenerateReportILPParallel-24 | 6,512 | 187,871 | 34,545 | 184 | ≈ 1,223,415,952 |
| BenchmarkBranchGenerateReportLocal-24 | 101,449 | 12,106 | 29,932 | 165 | ≈ 1,228,141,594 |
| BenchmarkGenerateReportLocal-24 | 2 | 851,314,106 | 559,466,456 | 7,379,756 | ≈ 1,702,628,212 |
| BenchmarkGenerateReportLocalParallel-24 | 3 | 349,605,952 | 559,422,440 | 7,379,837 | ≈ 1,048,817,856 |
| BenchmarkBranchGenerateReportLocalParallel-24 | 120,543 | 10,755 | 29,933 | 165 | ≈ 1,296,439,965 |

(Total Time ≈ Runs × ns/op; all benchmarks converge near Go's ~1 s default benchmark budget.)

Observations

  1. Execution Time per Operation:

    • BenchmarkGenerateReportILP-24 vs BenchmarkBranchGenerateReportLocal-24: Branch and Bound is 99.29% faster than ILP, reducing execution time from 1,694,029 ns/op to 12,106 ns/op.
    • BenchmarkGenerateReportILP-24 vs BenchmarkGenerateReportLocal-24: ILP is 99.80% faster than Local, reducing execution time from 851,314,106 ns/op to 1,694,029 ns/op.
    • BenchmarkGenerateReportILPParallel-24 vs BenchmarkBranchGenerateReportLocalParallel-24: Branch and Bound Parallel is 94.28% faster than ILP Parallel, reducing execution time from 187,871 ns/op to 10,755 ns/op.
    • BenchmarkGenerateReportILPParallel-24 vs BenchmarkGenerateReportLocalParallel-24: ILP Parallel is 99.95% faster than Local Parallel, reducing execution time from 349,605,952 ns/op to 187,871 ns/op.
  2. Memory Allocations:

    • ILP implementations: a slight increase in memory usage and allocations when running in parallel.
    • Branch and Bound implementations: the lowest memory usage and allocation counts of the three.
    • A* implementations: extremely high memory allocations (roughly 559 MB and 7.4 million allocations per operation), leading to inefficient resource utilization.
  3. Throughput:

    • The ILP and Branch and Bound benchmarks each process approximately 21.74 times more iterations per operation than A*, yet still run orders of magnitude faster.
    • The A* implementation struggles with throughput despite its significantly lighter workload, which points to inefficient memory usage and implementation rather than workload size.

Impact of Varying Workload on Performance

Given that the ILP and Branch and Bound benchmarks process 21.74 times more iterations per operation than A*, this difference in workload shapes each algorithm's measured performance and efficiency:

  • ILP and Branch Algorithms: As these handle a greater throughput, they are optimized for higher workloads. Despite handling more operations, they maintain faster execution times. This suggests they are not only computationally efficient but also well-suited for high-throughput scenarios.

  • Local Algorithm: With a smaller throughput and higher execution time, this algorithm is less efficient in handling increased workloads. If scaled to the same throughput as ILP or Branch, its execution time would increase significantly, indicating it’s not ideal for high-throughput cases.

In scenarios where workload is increased, ILP and Branch would outperform Local due to their ability to manage higher throughput efficiently. Conversely, if the workload were reduced, the Local algorithm might perform closer to ILP and Branch but would still likely lag due to fundamental differences in algorithmic efficiency.

Algorithm Overview

To provide a clearer understanding of how each algorithm approaches problem-solving, here's a general overview of their mechanisms and methodologies.

Integer Linear Programming (ILP)

Purpose:

ILP is an optimization technique used to find the best outcome (such as maximum profit or lowest cost) in a mathematical model whose requirements are represented by linear relationships. It is particularly effective for problems that can be expressed in terms of linear constraints and a linear objective function.

General Workflow:

  1. Define Variables:

    Identify the decision variables that represent the choices to be made.

  2. Objective Function:

    Formulate a linear equation that needs to be maximized or minimized.

  3. Constraints:

    Establish linear inequalities or equalities that the solution must satisfy.

  4. Solve:

    Utilize an ILP solver to find the optimal values of the decision variables that maximize or minimize the objective function while satisfying all constraints.

Pseudocode:

```
function ILP_Solve(parameters):
    variables = define_decision_variables(parameters)
    objective = define_objective_function(variables)
    constraints = define_constraints(parameters, variables)

    solver = initialize_ILP_solver()
    solver.set_objective(objective)
    for constraint in constraints:
        solver.add_constraint(constraint)

    result = solver.solve()
    return result
```
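To make the workflow concrete, here is a toy integer program solved by brute-force enumeration in Go. This is emphatically not how real solvers work (they use techniques such as simplex with branch-and-cut, and the project itself calls a C++ ILP module), but the shape of the problem — decision variables, a linear objective, and linear constraints — is the same:

```go
package main

import "fmt"

// solveToy maximizes 5x + 4y subject to 6x + 4y <= 24 and x + 2y <= 6
// over non-negative integers, by brute-force enumeration of the feasible box.
// Real ILP solvers handle this via LP relaxation plus branching, not enumeration.
func solveToy() (bestX, bestY, bestObj int) {
	bestObj = -1
	for x := 0; x <= 4; x++ { // 6x <= 24 bounds x at 4
		for y := 0; y <= 3; y++ { // x + 2y <= 6 bounds y at 3
			if 6*x+4*y <= 24 && x+2*y <= 6 {
				if obj := 5*x + 4*y; obj > bestObj {
					bestX, bestY, bestObj = x, y, obj
				}
			}
		}
	}
	return
}

func main() {
	x, y, obj := solveToy()
	fmt.Printf("x=%d y=%d objective=%d\n", x, y, obj) // x=4 y=0 objective=20
}
```

Note that the integer optimum (x=4, y=0) differs from the fractional LP optimum — exactly the gap that makes ILP harder than plain linear programming.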

A* Algorithm (Local Implementation)

Purpose:

A* is a pathfinding and graph traversal algorithm known for its performance and accuracy. It efficiently finds the shortest path between nodes by combining features of uniform-cost search and pure heuristic search.

General Workflow:

  1. Initialization:

    Start with an initial node and add it to the priority queue.

  2. Loop:

    • Remove the node with the lowest cost estimate from the priority queue.
    • If it's the goal node, terminate.
    • Otherwise, expand the node by exploring its neighbors.
    • For each neighbor, calculate the new cost and update the priority queue accordingly.
  3. Termination:

    The algorithm concludes when the goal node is reached or the priority queue is empty (indicating no path exists).

Pseudocode:

```
function AStar(initial_state, goal_state, heuristic):
    open_set = PriorityQueue()
    open_set.push(initial_state, priority=heuristic(initial_state))

    came_from = {}
    g_score = map with default value infinity
    g_score[initial_state] = 0

    while not open_set.is_empty():
        current = open_set.pop()

        if current == goal_state:
            return reconstruct_path(came_from, current)

        for neighbor in current.neighbors:
            tentative_g_score = g_score[current] + cost(current, neighbor)
            if tentative_g_score < g_score[neighbor]:
                came_from[neighbor] = current
                g_score[neighbor] = tentative_g_score
                f_score = tentative_g_score + heuristic(neighbor)
                open_set.push(neighbor, priority=f_score)

    return failure
```

Branch and Bound Algorithm

Purpose:

Branch and Bound is an optimization algorithm that systematically explores the solution space. It divides the problem into smaller subproblems (branching) and uses bounds to eliminate subproblems that cannot produce better solutions than the current best (bounding).

General Workflow:

  1. Initialization:

    Start with an initial solution and set the best known solution.

  2. Branching:

    At each node, divide the problem into smaller subproblems.

  3. Bounding:

    Calculate an optimistic estimate (upper bound) of the best possible solution in each branch.

  4. Pruning:

    Discard branches where the upper bound is worse than the best known solution.

  5. Search:

    Recursively explore remaining branches using depth-first or best-first search.

  6. Termination:

    When all branches have been pruned or explored, the best known solution is optimal.

Pseudocode:

```
function BranchAndBound():
    best_solution = None
    best_score = negative_infinity
    initial_state = create_initial_state()
    stack = [initial_state]

    while stack is not empty:
        current_state = stack.pop()
        current_score = evaluate(current_state)

        if current_score > best_score:
            best_score = current_score
            best_solution = current_state

        if can_branch(current_state):
            branches = generate_branches(current_state)
            for branch in branches:
                upper_bound = calculate_upper_bound(branch)
                if upper_bound > best_score:
                    stack.push(branch)
    return best_solution
```

Comparative Analysis

| Feature | ILP Implementation | Local (A*) Implementation | Branch and Bound Implementation |
| --- | --- | --- | --- |
| Optimization Approach | Formulates the problem as a set of linear equations and inequalities to find the optimal solution. | Searches through possible states using heuristics to find the most promising path to the goal. | Systematically explores and prunes the solution space to find optimal solutions efficiently. |
| Scalability | Handles large-scale problems efficiently by leveraging optimized solvers. | Performance can degrade with increasing problem size due to the exhaustive nature of state exploration. | Efficient for combinatorial problems, with pruning reducing the search space significantly. |
| Development Time | Faster implementation as it relies on existing ILP solvers and libraries. | Requires more time to implement, especially when dealing with complex state management and heuristics. | Moderate development time, balancing complexity and optimization benefits. |
| Flexibility | Highly adaptable to various linear optimization problems with clear constraints and objectives. | Best suited for problems where pathfinding to a goal is essential, with heuristic guidance. | Effective for a wide range of optimization problems, especially combinatorial ones. |
| Performance | Demonstrates superior performance in handling a higher number of iterations with optimized memory usage. | While effective for certain scenarios, struggles with high memory allocations and longer execution times under heavy workloads. | Shows significant performance improvements over ILP and A* with optimized memory usage and faster execution times. |
| Developer Experience | Improves developer experience by reducing the need for extensive coding and optimization efforts. | May require significant debugging and optimization to achieve comparable performance levels. | Balances performance with manageable development effort, leveraging existing strategies for optimization. |
| Integration | Currently integrates a C++ ILP module with Golang, facilitating efficient computation despite cross-language usage. | Fully implemented within Golang, but may face limitations in performance and scalability without optimizations. | Implemented in Golang, avoiding cross-language integration complexities and enhancing performance. |

Implications for Server Performance

  • Scalability:

    • The Branch and Bound and ILP Parallel implementations both demonstrate excellent scalability, efficiently handling a large number of concurrent requests with reduced latency.
    • The A* implementation is unsuitable for high-load environments due to performance limitations.
  • Resource Utilization:

    • Branch and Bound Implementations utilize resources efficiently, with low memory consumption and fast execution times.
    • ILP Parallel effectively utilizes multi-core CPUs, providing high throughput with manageable memory consumption.
    • A* Implementations consume excessive memory, potentially leading to resource exhaustion.

Workload Impact on Performance

The workload differences influence the performance of the algorithms:

  • Branch and Bound Implementation handles the same workload as the ILP implementation efficiently, providing fast execution times and low memory usage, making it suitable for scaling.

  • ILP Implementation handles a larger workload efficiently due to optimized solvers.

  • A* Implementation struggles with performance due to high execution times and memory usage.

Conclusion

As an extra comparison, an optimized Branch and Bound solution was benchmarked against the same workload as the ILP algorithm, and it significantly improved on both the ILP and A* algorithms in performance and resource utilization.

The Branch and Bound-based BenchmarkBranchGenerateReportLocalParallel function showcases exceptional performance improvements, making it highly suitable for server environments demanding high concurrency and efficient resource management.

By focusing on leveraging the strengths of the Branch and Bound approach and optimizing it for the specific problem, we can ensure that the project remains both performant and scalable, capable of handling increasing demands with ease.

Final Thoughts

Balancing performance, scalability, and developer experience is crucial for building robust applications. The Branch and Bound approach has proven to be the most efficient in the current setup, offering substantial performance gains with reasonable development effort.

By continuously profiling, optimizing, and leveraging the strengths of each algorithmic approach, we can maintain a high-performance, scalable, and developer-friendly system.
