DEV Community

Jérôme Parent-Lévesque for Potloc

Posted on

Generalized Raking for Survey Weighting

In the world of surveys, it is very common that our acquired responses need to be weighted in order to achieve a sample that is representative of some target population. This process of weighting simply consists of assigning a weight (a.k.a. factor) to each respondent, and calculating all survey results as a weighted sum of respondents.

For example, we might have surveyed 100 male respondents and 150 female respondents but were targeting a male / female ratio of 48% / 52%. In this simple case, we could achieve the target ratio by weighting the male responses by a factor of 0.48 / (100 / (100 + 150)) = 1.2 and weighting the female responses by 0.52 / (150 / (100 + 150) = 0.867.
The technical term for this method of computing weights is Post-Stratification.

However, in a more complex scenario, where we have many different measurable demographic targets, how can we determine weights for all the survey respondents?

Raking

At Potloc, it is very common that our clients desire survey populations matching a lot of such targets. For example, we might have targets looking like this:

  • 42% male
  • 58% female
  • 20% students
  • 80% non-students
  • 15% dog owners
  • ...

In this setting, weights cannot be calculated using a simple ratio as in the male/female example shown above. Here, we instead need to rely on more involved algorithms, notably a process called Raking.

Iterative Proportional Fitting

One common approach to solve the problem of finding good weights that will satisfy our demographic targets is Iterative Proportional Fitting. In this method, weights for each respondents are computed for a single target at a time using Post-Stratification. By iteratively computing this for each target and repeating a few times, the weights end up converging to values that satisfy our targets.

Great! Problem solved!

...but what if we could do even better? 🤔

Generalized Raking

Beyond satisfying the demographic targets, the most desirable property for the weights is that they should be as close as possible to 1. Indeed, weights that are really large mean that those respondents' responses will count for a lot more than the "average" respondent in our survey results. For example, a respondent with a weight of 10 will count for 10 times more than the average respondent, and 100 times more than a respondent with weight 0.1 . Similarly, small weights mean that some responses will have very little impact on the final results.

Unfortunately, Iterative Proportional Fitting does nothing to encourage weights to be close to 1, which leads to sub-optimal weights. This is where Generalized Raking, an algorithm introduced by Deville et al. (1992), comes into play.

Note: This is where we get into the more mathematical part of this blog post 🤓. Don't care about this part? No worries! Simply skip to the next section!

The authors of this paper formulated the weighting problem as a constrained optimization method where the objective is that the weights are as close to one as possible and where the constraint is that the targets are matched. Mathematically this looks like this:

arg minw  G(x)    s.t.  XTw=TG(x)=x(log(x)1)+1 \argmin_{w} \; G(x) \;\; \text{s.t.} \; X^T w = T \newline G(x) = x(\log(x) - 1) + 1

where G(x)G(x) is the raking function which encourages weights to be close to 1, ww is the vector of weights, TT is the vector of targets (in absolute numbers, not percentages) and XX is the (numRespondents×numTargets)(\text{numRespondents} \times \text{numTargets}) matrix of responses. The matrix XX is binary where cells are filled with a '1' if the respondent belongs to the target category and '0' otherwise.

In other words, this is saying that we want to optimize the weights to be as close to 1 as possible while satisfying the target constraints. This is achieved by minimizing the function G(x)G(x) which looks like this (notice the global minimum at x=1x=1 !):
image

The Generalized Raking Algorithm

While it is possible to solve this optimization problem using general methods such as Sequential Least Squares Programming, the authors of Generalized Raking have devised a more efficient and robust algorithm for this specific problem:

  1. Initialize variables
    • A (numRespondents×1)(\text{numRespondents} \times 1) vector ww to ones
    • A (numTargets×1)(\text{numTargets} \times 1) vector λ\lambda to zeros
    • A (numRespondents×numRespondents)(\text{numRespondents} \times \text{numRespondents}) square matrix HH to the Identity matrix
  2. While the weights have not converged
    1. λ=λ+(XTHX)1(TXTw)\lambda = \lambda + (X^T H X)^{-1} (T - X^T w)
    2. w=G1(Xλ)w = G^{-1}(X \lambda)
    3. H=diag(G1(Xλ))H = \text{diag}({G^{-1}}'(X \lambda))

Here, G1(x)G^{-1}(x) is the inverse of the derivative of the raking function, i.e. exe^x .
G1{G^{-1}}' is it's derivative, in this case also exe^x .

Implementation

While there are many implementations of this algorithm in R, we were not able to find one in Ruby that could play well with our codebase and be easily maintainable.
We therefore decided to make our own and to share it here for anyone looking for something similar. We started by making an implementation in python with the popular numpy library:

import numpy as np

def raking_inverse(x):
  return np.exp(x)

def d_raking_inverse(x):
  return np.exp(x)

def graking(X, T, max_steps=500, tolerance=1e-6):
  # Based on algo in (Deville et al., 1992) explained in detail on page 37 in
  # https://orca.cf.ac.uk/109727/1/2018daviesgpphd.pdf

  # Initialize variables - Step 1
  n, m = X.shape
  L = np.zeros(m) # Lagrange multipliers (lambda)
  w = np.ones(n) # Our weights (will get progressively updated)
  H = np.eye(n)
  success = False

  for step in range(max_steps):
    L += np.dot(np.linalg.pinv(np.dot(np.dot(X.T, H), X)), (T - np.dot(X.T, w))) # Step 2.1
    w = raking_inverse(np.dot(X, L)) # Step 2.2
    H = np.diag(d_raking_inverse(np.dot(X, L))) # Step 2.3

    # Termination condition:
    loss = np.max(np.abs(np.dot(X.T, w) - T) / T)
    if loss < tolerance:
        success = True
        break

  if not success: raise Exception("Did not converge")
  return w
Enter fullscreen mode Exit fullscreen mode

Ruby Implementation

After validating the algorithm in python, we then proceeded to replicate it in Ruby. For this, we had to find an equivalent to numpy which we found in Numo. Numo is an awesome library for vector and matrix operations, and its linalg sub-library was perfect for us as we needed to compute a matrix pseudo-inverse. This allowed us to translate the code to Ruby almost line by line:

require "numo/narray"
require "numo/linalg"

def raking_inverse(x)
  Numo::NMath.exp(x)
end

def d_raking_inverse(x)
  Numo::NMath.exp(x)
end

def graking(X, T, max_steps: 500, tolerance: 1e-6)
  # Based on algo in (Deville et al., 1992) explained in detail on page 37 in
  # https://orca.cf.ac.uk/109727/1/2018daviesgpphd.pdf

  # Initialize variables - Step 1
  n, m = X.shape
  L = Numo::DFloat.zeros(m)
  w = Numo::DFloat.ones(n)
  H_diag = Numo::DFloat.ones(n)

  success = false

  max_steps.times do
    L += Numo::Linalg.pinv((X.transpose * H_diag).dot(X)).dot(T - X.transpose.dot(w)) # Step 2.1
    w = raking_inverse(X.dot(L)) # Step 2.2
    H_diag = d_raking_inverse(X.dot(L)) # Step 2.3

    # Termination condition:
    loss = ((T - X.transpose.dot(w)).abs / T).max
    if loss < tolerance
      success = true
      break
    end
  end

  raise StandardError, "Did not converged" unless success
  w
end
Enter fullscreen mode Exit fullscreen mode

You may have noticed that the code doesn't quite match exactly the algorithm described above, notably steps 2.1 and 2.3. This is because we have found it to be vastly faster with Numo to store the sparse matrix HH as a flat vector h_matrix_diagonal since it only contains values on the diagonal. As a result, the step of taking the product XTHX^T H can be rewritten as X.Transpose * h_matrix_diagonal, making use of Numo's implicit broadcasting.

In practice, we optimize this code a bit further by exiting early whenever possible (for example if our loss becomes NaN) and by allowing to pass as input an initial value for the vector lambdas if we believe to have an initialisation value better than the default.

With these few lines of code, we are now able to support complex survey weighting scenarios while having all of our code in our beautiful Ruby monolith 🎉

Interested in what we do at Potloc? Come join us! We are hiring 🚀

References

Discussion (0)