Prince Raj

Optimizing Code for Performance: A Case Study in Prefix Arrays

In competitive programming and algorithm design, even small changes in code can yield massive performance gains. In this article, we analyze a simple yet significant optimization in a problem solution: we'll dissect two versions of a function that determines whether subarrays of a given array are "special," as defined by a specific condition. One version executes in 435ms, while a slightly modified version runs in 4ms. Let's uncover why.


The Problem Statement

The problem involves determining whether subarrays of an integer array are "special." A subarray is special if every pair of consecutive elements has alternating parity (i.e., one is even and the other is odd). For example, [4, 3, 6] is special, while [4, 3, 1] is not, since 3 and 1 are both odd. Given an array nums and a set of queries [u, v] (where u and v are the inclusive bounds of a subarray), the task is to answer, for each query, whether that subarray is special.


The Solution: Prefix Arrays

Both implementations leverage a prefix array (pref) to preprocess the data for efficient query answering. The array is built so that pref[i] holds the number of adjacent parity alternations among the first i elements of nums. A subarray [u, v] contains v - u adjacent pairs, so it is special exactly when pref[v+1] - pref[u+1] == v - u; each query is answered with this single difference, avoiding a re-scan of the subarray.
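
Before comparing the two submitted versions, here is a minimal standalone sketch of the idea on a small, made-up input (the array and query values below are illustrative, not from the original problem):

#include <iostream>
#include <vector>
using namespace std;

int main() {
    // Illustrative input: parities are even, odd, odd, even.
    vector<int> nums = {4, 3, 1, 6};

    // pref[i] = number of adjacent alternating pairs among the first i elements.
    int n = nums.size();
    vector<int> pref(n + 1, 0);
    for (int i = 1; i < n; i++)
        pref[i + 1] = pref[i] + (nums[i] % 2 != nums[i - 1] % 2);
    // pref is now {0, 0, 1, 1, 2}.

    // Subarray [u, v] has v - u adjacent pairs; it is special iff all of them
    // alternate, i.e. pref[v + 1] - pref[u + 1] == v - u.
    int u = 0, v = 2;
    cout << boolalpha << (pref[v + 1] - pref[u + 1] == v - u) << '\n'; // false: 3 and 1 are both odd

    u = 2; v = 3;
    cout << (pref[v + 1] - pref[u + 1] == v - u) << '\n'; // true: 1 (odd) then 6 (even)
    return 0;
}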

First Implementation: The Slower Version

class Solution {
public:
    vector<bool> isArraySpecial(vector<int>& nums, vector<vector<int>>& queries) {
        int n = nums.size();
        vector<int> pref(n+1, 0);
        for (int i = 1; i < n; i++) {
            pref[i+1] = pref[i];
            if (nums[i] % 2 != nums[i-1] % 2) {
                pref[i+1]++;
            }
        }

        vector<bool> ans;
        for (auto& query : queries) {
            int u = query[0], v = query[1];
            if (u == v){
                ans.push_back(true);
                continue;
            }
            ans.push_back(pref[v+1] - pref[u+1] == v - u);
        }

        return ans;
    }
};

Second Implementation: The Faster Version

class Solution {
public:
    vector<bool> isArraySpecial(vector<int>& nums, vector<vector<int>>& queries) {
        int n = nums.size();
        vector<int> pref(n+1, 0);
        for (int i = 1; i < n; i++) 
            pref[i+1] = pref[i] + (nums[i] % 2 != nums[i-1] % 2);

        vector<bool> ans;
        for (auto& query : queries) {
            int u = query[0], v = query[1];
            if (u == v){
                ans.push_back(true);
                continue;
            }
            ans.push_back(pref[v+1] - pref[u+1] == v - u);
        }

        return ans;
    }
};
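
For completeness, both classes are invoked the same way. Here is a small hypothetical driver (the input values are made up for illustration; it assumes one of the Solution classes above is defined in the same file):

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> nums = {4, 3, 1, 6};
    vector<vector<int>> queries = {{0, 2}, {2, 3}, {1, 1}};

    Solution sol;
    vector<bool> ans = sol.isArraySpecial(nums, queries);

    cout << boolalpha;
    for (size_t i = 0; i < ans.size(); i++)
        cout << "query " << i << ": " << ans[i] << '\n';
    // Expected output: false, true, true
    return 0;
}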

Key Differences

1. Incremental Assignment in Prefix Calculation

  • Slow Version:
  pref[i+1] = pref[i];
  if (nums[i] % 2 != nums[i-1] % 2) {
      pref[i+1]++;
  }

Here, the prefix array is updated in two steps:

  • First, the value from the previous prefix index is copied.
  • Then, the value is conditionally incremented.

  • Fast Version:
  pref[i+1] = pref[i] + (nums[i] % 2 != nums[i-1] % 2);

In this version, the prefix array is updated in a single step: the comparison evaluates to a bool, which implicitly converts to 0 or 1 when added to pref[i], so no if statement is needed.


2. Compiler Optimization and CPU Pipelines

The expression (nums[i] % 2 != nums[i-1] % 2) in the fast version is added directly to the previous prefix value, avoiding branching logic. Modern CPUs rely heavily on branch prediction and instruction pipelining, and conditional statements (like if) risk branch mispredictions that stall the pipeline. The fast version removes the branch from the prefix-computation loop entirely, which also gives the compiler more freedom to emit straight-line code.
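
The same transformation applies to any "count when a condition holds" loop. Below is a minimal illustration of the pattern (generic example code, not taken from the original solution). Because a C++ comparison yields a bool that converts to 0 or 1, the branchless form gives the optimizer an easy path to branch-free machine code, though the exact output always depends on the compiler and flags:

#include <cstddef>
#include <vector>

// Branchy form: the conditional increment may compile to a data-dependent branch.
int countEvensBranchy(const std::vector<int>& a) {
    int cnt = 0;
    for (std::size_t i = 0; i < a.size(); i++) {
        if (a[i] % 2 == 0) {
            cnt++;
        }
    }
    return cnt;
}

// Branchless form: the comparison produces 0 or 1, which is simply added.
int countEvensBranchless(const std::vector<int>& a) {
    int cnt = 0;
    for (std::size_t i = 0; i < a.size(); i++)
        cnt += (a[i] % 2 == 0);
    return cnt;
}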


3. Memory Access Patterns

In the slow version:

  • pref[i+1] is written unconditionally, and may then be read and written again by the conditional increment, so the loop can touch the prefix array several times per iteration.

In the fast version:

  • pref[i] is read once and pref[i+1] is written once per iteration.

Although this may seem minor, repeated memory accesses add up in tight loops, especially for large arrays.
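
Taking the same idea one step further (a variation sketched here for illustration, not one of the two submitted versions), the running count can be kept in a local variable so the loop never reads back from pref at all:

#include <vector>

// Prefix build that keeps the running count in a local variable (a sketch of a
// further variation, not the submitted code).
std::vector<int> buildPref(const std::vector<int>& nums) {
    int n = nums.size();
    std::vector<int> pref(n + 1, 0);
    int cur = 0;                           // running count, likely held in a register
    for (int i = 1; i < n; i++) {
        cur += (nums[i] % 2 != nums[i - 1] % 2);
        pref[i + 1] = cur;                 // one store per iteration; pref is never read in the loop
    }
    return pref;
}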


Why the Huge Difference in Execution Time?

Time Complexity

Both versions have the same theoretical time complexity:

  • Preprocessing: O(n)
  • Query Answering: O(q), where q is the number of queries.

However, constant factors matter in real-world performance:

  1. The fast version avoids branching and redundant memory operations, minimizing execution overhead.
  2. The slow version introduces branch misprediction penalties and additional operations per loop iteration.


Empirical Results

  • Slow Version: 435ms
  • Fast Version: 4ms


This 100x improvement is a testament to the power of micro-optimizations, especially in competitive programming contexts.


Lessons Learned

  1. Avoid Conditional Statements in Tight Loops: Whenever possible, replace if-based logic with arithmetic expressions to exploit the CPU's efficiency with branchless code.

  2. Minimize Memory Access: Reducing redundant memory reads and writes can significantly boost performance.

  3. Understand Your Tools: A deep understanding of how compilers and CPUs optimize code can help write more efficient solutions.

  4. Profile Your Code: Use profiling tools to identify bottlenecks and focus your optimization efforts where they matter most (a minimal timing sketch follows this list).
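
As a bare-bones starting point for that last lesson, a std::chrono micro-benchmark is often enough to separate a 4ms loop from a 435ms one. Treat the snippet below as a rough sketch (the input size is arbitrary, and a real profiler or a dedicated benchmarking library will give more reliable numbers):

#include <chrono>
#include <iostream>
#include <random>
#include <vector>
using namespace std;
using namespace std::chrono;

int main() {
    // Arbitrary test size; adjust to whatever you want to measure.
    const int n = 1'000'000;
    mt19937 rng(42);
    vector<int> nums(n);
    for (int& x : nums) x = rng() % 100;

    auto start = steady_clock::now();

    // Place the code you want to time here, e.g. the prefix-array build.
    vector<int> pref(n + 1, 0);
    for (int i = 1; i < n; i++)
        pref[i + 1] = pref[i] + (nums[i] % 2 != nums[i - 1] % 2);

    auto elapsed = duration_cast<milliseconds>(steady_clock::now() - start);
    cout << "prefix build took " << elapsed.count() << " ms"
         << " (checksum " << pref[n] << ")\n";   // print a result so the loop isn't optimized away
    return 0;
}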


Conclusion

In programming, small details often make a big difference. The journey from 435ms to 4ms illustrates how thoughtful optimizations, such as avoiding branches and reducing memory operations, can transform code performance. For competitive programmers, these insights are not just valuable—they're essential.
