Jake Z.

Posted on Aug 10, 2020 • Originally published at algodaily.com

Recursive Backtracking For Combinatorial, Path Finding, and Sudoku Solver Algorithms

#webdev #tutorial #beginners #interview

This lesson was originally published at https://algodaily.com, where I maintain a technical interview course and write think-pieces for ambitious developers.

Backtracking Made Simple

Backtracking is a very important concept in computer science and is used in many applications. Generally, we use it when all possible solutions of a problem need to be explored. It is also often employed to identify solutions that satisfy a given criterion, also called a constraint.

In this tutorial, I will discuss this technique and demonstrate it. We'll achieve understanding through a few simple examples involving enumerating all solutions or enumerating solutions that satisfy a certain constraint.

Let's start on the next step!

Backtracking and Depth First Search

In very simple terms, we can think of backtracking as building and exploring a search tree in a depth-first manner. Each path from the root node to a leaf node represents a candidate solution that can be evaluated. So as we traverse each path, we're testing a solution. In the diagram below, A -> B -> D is one possible solution.

If the candidate path qualifies as a working solution, it is kept as an alternative. Either way, the search continues in a depth-first manner: once a path has been evaluated, the algorithm backtracks (goes back a step and explores another path from the previous point) to explore other tree branches and find more solutions.
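To make the "go back a step" idea concrete, here is a minimal sketch that walks a small tree depth first and records every root-to-leaf path. The tree shape and the names `Tree`, `collectPaths`, and `allPaths` are my own assumptions for illustration; they are not taken from the diagram.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical tree for illustration: each node maps to its list of children.
using Tree = std::map<char, std::vector<char>>;

// Depth-first walk that records every root-to-leaf path as a string.
void collectPaths(const Tree& tree, char node, std::string& path,
                  std::vector<std::string>& out)
{
    path.push_back(node); // extend the candidate by one step
    auto it = tree.find(node);
    if (it == tree.end() || it->second.empty()) {
        out.push_back(path); // leaf: the candidate is complete
    } else {
        for (char child : it->second)
            collectPaths(tree, child, path, out);
    }
    path.pop_back(); // backtrack: undo the last step
}

// Convenience wrapper: returns all root-to-leaf paths in depth-first order.
std::vector<std::string> allPaths(const Tree& tree, char root)
{
    std::string path;
    std::vector<std::string> out;
    collectPaths(tree, root, path, out);
    return out;
}
```

For a tree where A has children B and C, B has children D and E, and C has child F, `allPaths(tree, 'A')` yields "ABD", "ABE", and "ACF". The single `pop_back` at the end of `collectPaths` is the backtracking step.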

Efficiency Gains

For constraint satisfaction problems, the search tree is "pruned" by abandoning branches of the tree that would not lead to a potential solution. Thus, we're constantly cutting down the search time and making it more efficient than an exhaustive or complete search. Let's now jump straight into how all of this is done via examples you might see on interview day.

Combinatorial Problem: Finding N Combinations

As a first problem, let's use a very simple problem from combinatorics: can you find all possible combinations of N items from a set?

In other words, given a set {1, 2, 3, 4, 5} and an N value of 3, we'd be looking for all combinations/subsets of length/size 3. In this case, they would be {1, 2, 3}, {1, 2, 4}, and so on.

Note that the ordering is not important in a combination, so {1, 2, 3} and {3, 2, 1} are considered the same thing.

Let's now look at the pseudo-code for this N-combination problem:

routine: combos
input: set
output: display N combinations
assumption: position of the first item is zero and result set is empty at start

base case:
1. If all combinations starting with items in positions <= (size-N) have been printed, stop

recursive case:
Combos(set,result)
1. Repeat for each item i in the set:
    a. Put the item i in the result set
    b. if the result set has N items, display it
        else
        recursively call combos with (the input set without item i) and (the result set)
    c. Remove the item i from result set

Implementation of Combinatorial Solution

The diagram below shows how this pseudo code works for an input set {1, 2, 3, 4} and N=3.

Notice how the search tree is built from {} (empty set), to {1} to {1, 2} to {1, 2, 3}.

When {1, 2, 3} is found, the algorithm backtracks to {1, 2} to find all combinations starting with {1, 2}. Once that is finished the method backtracks to {1} to find other combinations starting with 1.

In this case, the entire search tree is not stored, but is instead built implicitly. Paths that cannot lead to any more combinations are abandoned. The method is elegant and its C++ implementation is shown here.

Notice how, in base case 2 of the code, the exploration stops early once too few items remain past the current index to complete a combination of size N. So in the tree above, the branches starting at {3} and {4} won't be explored. This is what makes the algorithm efficient.

#include <iostream>
#include <vector>
#include <string>

using namespace std;

// helper: prints the vector
void printVector(vector<int>& arr)
{
    cout << "\n";
    for (int i = 0; i < arr.size(); ++i)
        cout << arr[i] << " ";
    cout << "\n";
}

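// helper function:
// prints all possible combinations of N numbers from a set
// result holds the items chosen so far; ind is the first index still available
void combosN(vector<int>& set, int N, vector<int>& result, int ind)
{
    int size = set.size();
    // base case 1: no items left to choose from
    if (ind >= size)
        return;
    for (int i = ind; i < size; ++i) {
        // base case 2: too few items remain from position i onward
        // to bring the result up to N items, so stop exploring
        int needed = N - (int)result.size();
        if (size - i < needed)
            break;
        result.push_back(set[i]);
        if ((int)result.size() == N)
            printVector(result); // print the result and don't go further
        else // recursive case
            combosN(set, N, result, i + 1);
        result.pop_back(); // backtrack: remove the item and try the next one
    }
}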

// To be called by user: all possible combinations of N numbers from a set
void combosN(vector<int>& set, int N)
{
    vector<int> result;
    combosN(set, N, result, 0);
}

int main() {
  vector<int> v = {1, 2, 3, 4};
  combosN(v, 3);
}
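If you want to check the results rather than read them off the console, a variant that returns the combinations can be handy. The sketch below is one way to do that; the names `collectCombos` and `combos` are hypothetical, and the early loop exit is my own way of expressing the "not enough items left" cut-off.

```cpp
#include <vector>

// Collects (rather than prints) every combination of n items from `items`.
// `start` is the first index still available; `current` is the partial pick.
void collectCombos(const std::vector<int>& items, int n, int start,
                   std::vector<int>& current,
                   std::vector<std::vector<int>>& out)
{
    if (static_cast<int>(current.size()) == n) {
        out.push_back(current);
        return;
    }
    int size = static_cast<int>(items.size());
    int needed = n - static_cast<int>(current.size());
    // Stop as soon as too few items remain to complete the combination.
    for (int i = start; i < size && size - i >= needed; ++i) {
        current.push_back(items[i]);
        collectCombos(items, n, i + 1, current, out);
        current.pop_back(); // backtrack
    }
}

// Convenience wrapper: returns all combinations of size n (none if n <= 0).
std::vector<std::vector<int>> combos(const std::vector<int>& items, int n)
{
    std::vector<int> current;
    std::vector<std::vector<int>> out;
    if (n > 0)
        collectCombos(items, n, 0, current, out);
    return out;
}
```

Calling `combos({1, 2, 3, 4, 5}, 3)` returns 10 combinations, starting with {1, 2, 3} and ending with {3, 4, 5}.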

Combinatorial Problem With A Constraint: Finding N Combinations with Sum < S

Let's now add a constraint to our N combinations problem! The constraint is that only sets whose sum is less than S (a given parameter) should be printed.

All we need to do is modify the combosN code so that a partial combination is abandoned as soon as its sum reaches S; no combinations that extend it are generated. This pruning is safe as long as the items are non-negative, since adding items can then only increase the sum. If the array is also sorted, the search becomes even more efficient: once one item breaks the constraint, every later item will too.

We've illustrated backtracking via arrays to keep things simple. This technique would work really well for unordered linked lists, where random access to elements is not possible.

The tree below shows the abandoned paths {3, 10} and {5, 8}.

// helper: returns the sum of the items in the vector
int sum(const vector<int>& arr)
{
    int total = 0;
    for (int x : arr)
        total += x;
    return total;
}

// prints every subset whose sum is less than target (items assumed non-negative).
// Rest is the same as the combosN function
void combosNConstraint(vector<int>& arr, vector<int>& subsets, int ind, int target)
{
    if (ind == arr.size())
        return;
    for (int i = ind; i < arr.size(); ++i) {
        subsets.push_back(arr[i]);
        // do a recursive call only if constraint is satisfied
        if (sum(subsets) < target) {
            printVector(subsets);
            combosNConstraint(arr, subsets, i + 1, target);
        }
        subsets.pop_back(); // backtrack
    }
}
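Here is a hedged sketch that combines both requirements from the heading: exactly N items and a sum strictly below S. It sorts the input first so that a whole run of siblings can be skipped with one `break`. The names `collectBelow` and `combosBelow` are hypothetical, and the sketch assumes all items are non-negative.

```cpp
#include <algorithm>
#include <vector>

// Sketch: combinations of exactly n items whose sum is strictly below limit.
// Assumes every item is non-negative, so a partial sum can only grow.
void collectBelow(const std::vector<int>& sorted, int n, int limit, int start,
                  int sum, std::vector<int>& current,
                  std::vector<std::vector<int>>& out)
{
    if (static_cast<int>(current.size()) == n) {
        out.push_back(current);
        return;
    }
    for (int i = start; i < static_cast<int>(sorted.size()); ++i) {
        // Input is sorted ascending: if this item breaks the constraint,
        // every later item does too, so abandon the rest of the branch.
        if (sum + sorted[i] >= limit)
            break;
        current.push_back(sorted[i]);
        collectBelow(sorted, n, limit, i + 1, sum + sorted[i], current, out);
        current.pop_back(); // backtrack
    }
}

// Takes the items by value so the caller's vector is not reordered.
std::vector<std::vector<int>> combosBelow(std::vector<int> items, int n, int limit)
{
    std::sort(items.begin(), items.end());
    std::vector<int> current;
    std::vector<std::vector<int>> out;
    if (n > 0)
        collectBelow(items, n, limit, 0, 0, current, out);
    return out;
}
```

With the made-up input {3, 5, 8, 10}, N = 2 and S = 13, this returns only {3, 5} and {3, 8}. The branches {3, 10} and {5, 8} both reach 13 and are cut before any recursion happens.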

Enumerating Paths Through a Square Grid

Our next combinatorial problem is that of printing all possible paths from a start location to a target location.

Suppose we have a rectangular grid with a robot placed at some starting cell. It then has to find all possible paths that lead to the target cell. The robot is only allowed to move up or to the right. Thus, the next state is generated by doing either an "up move" or a "right move".

Backtracking comes to our rescue again. Here is the pseudo-code that allows the enumeration of all paths through a square grid:

routine: enumeratePaths
input: Grid m*n
output: Display all paths
assumption: result is empty to begin with

Base case 1:
1. If target is reached then print the path
Base case 2:
2. If the current cell is outside the grid then stop

Recursive case:
1. Add the current cell to path
2. Invoke enumeratePaths to find all paths that are possible by doing an "up" move
3. Invoke enumeratePaths to find all paths that are possible by doing a "right" move
4. Remove the current cell from path

Square Grid Implementation

To see how the previous pseudo-code works, I have taken an example of a 3x3 grid and shown the left half of the tree. You can see that from each cell there are only two moves possible, i.e., up or right.

The leaf node represents the goal/target cell. Each branch of the tree represents a path. If the goal is found (base case 1), then the path is printed. If instead, base case 2 holds true (i.e., the cell is outside the grid), then the path is abandoned and the algorithm backtracks to find an alternate path.

Note: only a few backtrack moves are shown in the figure. However, after finding the goal cell, the system again backtracks to find other paths. This continues until all paths are exhaustively searched and enumerated.

The code attached is a simple C++ implementation of enumerating all paths through an m * n grid.

// helper recursive routine
void enumeratePaths(int rows, int cols, vector<int>& path, int r, int c) {
  // base case 2: the cell is outside the grid, so abandon this path
  if (r >= rows || c >= cols)
    return;

  // add the current cell to the path (cells are numbered row by row)
  path.push_back(c + cols * r);

  // base case 1: target reached, so print the path
  if (r == rows - 1 && c == cols - 1) {
    printVector(path);
  } else {
    // row up
    enumeratePaths(rows, cols, path, r + 1, c);
    // column right
    enumeratePaths(rows, cols, path, r, c + 1);
  }

  // backtrack: remove the current cell from the path
  path.pop_back();
}
// to be called by user
void enumeratePathsMain(int rows, int cols) {
  vector<int> path;
  enumeratePaths(rows, cols, path, 0, 0);
}

Find Path Through a Maze

We can extend the prior problem to find the path through the maze. You can think of this problem as the grid problem, but with an added constraint. The constraint is that some cells of the maze are not accessible at all, so the robot cannot step into them.

Let's call these inaccessible cells "pits", where the robot is forbidden to enter. The paths that go through these cells should then be abandoned early in the search. The pseudo-code thus remains the same with one additional base case, which is to stop if the cell is a forbidden cell.

routine: enumerateMaze
input: Grid m * n
output: Display all paths
assumption: result is empty to begin with

Base case 1:
1. If target is reached then print the path
Base case 2:
2. If the current cell is outside the maze then stop
Base case 3:
3. If the cell is a pit then stop

Recursive case:
1. Add the current cell to path
2. Invoke enumerateMaze to find all paths that are possible by doing an "up" move
3. Invoke enumerateMaze to find all paths that are possible by doing a "right" move
4. Remove the current cell from path

The figure below shows how paths are enumerated through a maze with pits. I have not shown all the backtracking moves, but the ones shown give a fairly good idea of how things are working. Basically, the algorithm backtracks to either a previous cell to find new paths, or backtracks from a pit to find new paths.

The C++ code attached is an implementation of enumerating all paths through a maze, which is represented as a binary 2D array. The main function that we can call is enumerateMazeMain and you can add a function to initialize the maze differently. The main recursive function translated from the above pseudo-code is the enumerateMaze function.

class mazeClass {
    vector<vector<int> > maze;

    void enumerateMaze(vector<int>& path, int r, int c)
    {
        int rows = maze.size();
        int cols = maze[0].size();
        // base case 2: outside the maze, so abandon this path
        if (r >= rows || c >= cols)
            return;
        // base case 3: the cell is a pit (0), so abandon this path
        if (!maze[r][c])
            return;

        // add the current cell to the path (cells are numbered row by row)
        path.push_back(c + cols * r);

        // base case 1: target reached, so print the path
        if (r == rows - 1 && c == cols - 1) {
            printVector(path);
        } else {
            // row up
            enumerateMaze(path, r + 1, c);
            // column right
            enumerateMaze(path, r, c + 1);
        }

        // backtrack: remove the current cell from the path
        path.pop_back();
    }

public:
    // set up the maze.  Change arrmaze to define your own
    void mazeInitialize()
    {
        // 1 = accessible cell, 0 = pit
        int arrmaze[] = {
            1, 1, 1, 1,
            1, 0, 1, 1,
            1, 0, 0, 1,
            1, 1, 1, 1
        };
        vector<int> temp;

        int ind = 0;
        for (int i = 0; i < 4; i++) {
            temp.clear();
            for (int j = 0; j < 4; ++j) {
                temp.push_back(arrmaze[ind]);
                ind++;
            }
            maze.push_back(temp);
        }
    }

    // main function to call from outside
    void enumerateMazeMain()
    {
        vector<int> path;
        if (maze.size() == 0)
            mazeInitialize();
        enumerateMaze(path, 0, 0);
    }
};

// to call this function use:
// mazeClass m;
// m.enumerateMazeMain();
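If you'd rather pass a maze in and get the paths back, a free-function version might look like the sketch below. The aliases `Grid` and `Path` and the functions `walk` and `mazePaths` are hypothetical names of my own; the encoding (1 for an open cell, 0 for a pit) follows the class above.

```cpp
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<int>>;    // 1 = open cell, 0 = pit
using Path = std::vector<std::pair<int, int>>; // sequence of (row, col)

// Explores the two allowed moves from (r, c) and records complete paths.
void walk(const Grid& maze, int r, int c, Path& path, std::vector<Path>& out)
{
    int rows = static_cast<int>(maze.size());
    int cols = static_cast<int>(maze[0].size());
    if (r >= rows || c >= cols) return; // outside the maze
    if (maze[r][c] == 0) return;        // pit: abandon this branch
    path.push_back({r, c});
    if (r == rows - 1 && c == cols - 1) {
        out.push_back(path);            // target reached
    } else {
        walk(maze, r + 1, c, path, out); // next row
        walk(maze, r, c + 1, path, out); // next column
    }
    path.pop_back();                    // backtrack
}

// Returns every path from the top-left cell to the bottom-right cell.
std::vector<Path> mazePaths(const Grid& maze)
{
    Path path;
    std::vector<Path> out;
    if (!maze.empty() && !maze[0].empty())
        walk(maze, 0, 0, path, out);
    return out;
}
```

For the 4 x 4 maze defined in mazeInitialize, this sketch finds three paths, each visiting seven cells.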

Solving Sudoku

The last example in this tutorial is coming up with a solution to one of my favorite combinatorial games, Sudoku, via backtracking!

Sudoku is a classic example of a problem with constraints, which can be solved via backtracking. It works like magic! To simplify the problem, let's use an easier version of the sudoku game.

We can model the game as an N * N grid, with each cell holding a number from 1 to N.

The rule is not to repeat the same number in a column or row. The initial sudoku board has numbers in some cells, and the rest are empty. The goal of the game is to fill the empty cells with numbers from 1 to N so that the constraints are satisfied. Let us now look at how backtracking can be used to solve a given Sudoku board.

Routine: solve
Input: Sudoku board
Rule: No repetition of a number in the same row or column
Assumption: The initial board configuration is according to Sudoku rules

Base case:
1. If all empty places are filled, return success
2. If all candidates are tried and none leaves the board valid, return failure

Recursive case (returns success or failure):
1. Choose an empty cell
2. For each candidate number i in the range 1..N
    a. Place the candidate i in the empty cell
    b. Check if the board is valid with candidate i.
        If the board is valid then
        {   i. result = invoke the solve routine on the next empty cell
            ii. If result is true then stop and return success
        }
        else
            Continue with the next candidate as given in step 2
3. Clear the cell and return failure (no candidate works for this cell)

Results

It's pretty awesome that we can actually find a solution to Sudoku via a simple backtracking routine. Let's see this routine in action on a simple 4 x 4 board as shown in the figure below. There are three empty cells. We can see that all combinations of numbers are tried.

Once an invalid board configuration is found, the entire branch is abandoned, the algorithm backtracks, and a new candidate is tried. The C++ implementation is provided. You can add your own public function to initialize the board differently.

class sudoku {
    vector<vector<int> > board;

    void Initialize()
    {
        // -1 marks an empty cell
        int arrBoard[] = {
             2,  1,  3,  4,
             1,  3, -1,  2,
            -1,  2, -1,  3,
            -1, -1,  2,  1
        };
        vector<int> temp;

        int ind = 0;
        for (int i = 0; i < 4; i++) {
            temp.clear();
            for (int j = 0; j < 4; ++j) {
                temp.push_back(arrBoard[ind]);
                ind++;
            }
            board.push_back(temp);
        }
    }
    // set (r,c) to (0,-1) when calling first time
    // will search for the next empty slot row wise, starting just after (r,c);
    // on success, (r,c) is updated to that slot
    bool findNextEmpty(int& r, int& c)
    {
        for (int i = r; i < board.size(); ++i) {
            // in the starting row, resume from the column after c
            int initj = (i == r) ? c + 1 : 0;
            for (int j = initj; j < board[i].size(); ++j) {
                if (board[i][j] == -1) {
                    r = i;
                    c = j;
                    return true;
                }
            }
        }
        return false;
    }

    // check if the number candidate valid at cell (r,c)
    bool checkValid(int candidate, int r, int c)
    {
        bool valid = true;
        // check column
        for (int i = 0; i < board.size() && valid; ++i) {
            if ((i != r) && (board[i][c] == candidate))
                valid = false;
        }

        // check row
        for (int j = 0; j < board[0].size() && valid; ++j) {
            if ((j != c) && (board[r][j] == candidate))
                valid = false;
        }
        return valid;
    }

    // recursive implementation
    bool solve(int r, int c)
    {
        bool success = false;

        // base case: no more empty slots
        if (!findNextEmpty(r, c))
            return true;

        // nxn is size of board
        int n = board.size();
        for (int i = 1; i <= n; ++i) {
            board[r][c] = i;
            if (checkValid(i, r, c)) {
                success = solve(r, c); // solve for next empty slot
            }
            if (success)
                break;
            else
                board[r][c] = -1; // try the next candidate for same slot
        }
        return success;
    }

public:
    void print()
    {
        for (int i = 0; i < board.size(); ++i) {
            for (int j = 0; j < board[i].size(); ++j)
                cout << board[i][j] << " ";
            cout << "\n";
        }
        cout << "\n";
    }

public:
    bool solve()
    {
        Initialize();
        return solve(0, -1);
    }
};
// how to use:
// sudoku s;
// s.solve();
// s.print();
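The class above hard-codes its board. As a rough sketch of the same idea with the board passed in, here is a compact free-function version. The names `Board`, `canPlace`, `solveFrom`, and `solveBoard` are hypothetical, and it uses the same simplified rule (no repeats in a row or column) and the same -1 marker for empty cells.

```cpp
#include <vector>

using Board = std::vector<std::vector<int>>; // -1 marks an empty cell

// True if `value` does not already appear in row r or column c.
bool canPlace(const Board& b, int r, int c, int value)
{
    int n = static_cast<int>(b.size());
    for (int k = 0; k < n; ++k)
        if (b[r][k] == value || b[k][c] == value)
            return false;
    return true;
}

// Fills empty cells in row-major order starting at flat index `pos`.
bool solveFrom(Board& b, int pos)
{
    int n = static_cast<int>(b.size());
    if (pos == n * n) return true;                   // every cell is filled
    int r = pos / n, c = pos % n;
    if (b[r][c] != -1) return solveFrom(b, pos + 1); // given cell: skip it
    for (int v = 1; v <= n; ++v) {
        if (canPlace(b, r, c, v)) {
            b[r][c] = v;                             // try candidate v
            if (solveFrom(b, pos + 1)) return true;
            b[r][c] = -1;                            // backtrack
        }
    }
    return false;                                    // no candidate fits here
}

// Solves the board in place; returns false if no valid filling exists.
bool solveBoard(Board& b) { return solveFrom(b, 0); }
```

Run on the 4 x 4 board from Initialize, `solveBoard` returns true and fills the board with the rows 2 1 3 4, 1 3 4 2, 4 2 1 3, and 3 4 2 1.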

Take Away Lesson

Backtracking is a very important principle that every software engineer should be aware of, especially for interviews. You should use it when you need to enumerate all solutions of a problem. Take advantage of it in scenarios where the solutions required have to satisfy a given constraint.

But before applying backtracking blindly to a problem, think of other possible solutions and consider how you can optimize your code. As always, work things out on a piece of paper with a pen (or with pseudocode!), rather than directly jumping into code.

Quiz Time

Given this pseudo-code for the N combinations problem:

base case:
1. If all combinations starting with items in positions <= (size-N) have been printed, stop

recursive case:
Combos(set, result)
1. Repeat for each item i in the set:
    a. Put the item i in the result set
    b. if the result set has N items, display it
        else
        recursively call combos with (the input set without item i) and (the result set)
    c. Remove the item i from result set

What should you change in the code above if all possible combinations of any size are to be displayed?

Change step a of Combos routine with N items in set
Change step b of Combos and display the set unconditionally
Remove step c of Combos
None of these options

Solution: Change step b of Combos and display the set unconditionally
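To see what that answer looks like in practice, here is a minimal sketch in which step b records the result unconditionally and then keeps extending it. The names `collectAll` and `allCombos` are hypothetical, and the combinations are collected into a vector instead of being displayed.

```cpp
#include <vector>

// Records every non-empty combination of `items`, of any size.
void collectAll(const std::vector<int>& items, int start,
                std::vector<int>& current,
                std::vector<std::vector<int>>& out)
{
    for (int i = start; i < static_cast<int>(items.size()); ++i) {
        current.push_back(items[i]);            // step a: add the item
        out.push_back(current);                 // step b: record unconditionally
        collectAll(items, i + 1, current, out); // keep extending the result
        current.pop_back();                     // step c: remove the item
    }
}

// Convenience wrapper: returns all non-empty combinations.
std::vector<std::vector<int>> allCombos(const std::vector<int>& items)
{
    std::vector<int> current;
    std::vector<std::vector<int>> out;
    collectAll(items, 0, current, out);
    return out;
}
```

For the set {1, 2, 3} this produces seven combinations in the order {1}, {1, 2}, {1, 2, 3}, {1, 3}, {2}, {2, 3}, {3}.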

Question 2

For the path problem through a grid, how many possible paths are there for a 5x5 grid if the start position is (0,0) and the goal is (4,4)?

64
70
32
None of the above

Solution: 70
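You can check this answer with the same two-move recursion used for enumerating paths, only counting instead of printing. The function name `countPaths` is hypothetical. The count also matches the closed form: a path on a 5x5 grid consists of 8 moves, 4 of which are "up" moves, giving C(8, 4) = 70.

```cpp
// Counts paths from (r, c) to the far corner of a rows x cols grid,
// where each move goes one row up or one column right.
long countPaths(int rows, int cols, int r = 0, int c = 0)
{
    if (r >= rows || c >= cols) return 0;         // outside the grid
    if (r == rows - 1 && c == cols - 1) return 1; // reached the goal
    return countPaths(rows, cols, r + 1, c)       // "up" move
         + countPaths(rows, cols, r, c + 1);      // "right" move
}
```

Here `countPaths(5, 5)` evaluates to 70, and `countPaths(3, 3)` gives 6 for the 3x3 grid used in the earlier walkthrough.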

Top comments (2)

Djamaile • Aug 10 '20 • Edited

What program did you use for the drawings? Nice article! Always good the freshen up on these topics.

Jake Z. • Aug 11 '20

Hey Dhamaile,

Excalidraw.io :-) But really, any diagramming software works the same. It's more about choosing the right color pairings and communicating the correct message.
