In the world of surveys, it is very common that our acquired responses need to be weighted in order to achieve a sample that is representative of some target population. This process of weighting simply consists of assigning a weight (a.k.a. factor) to each respondent, and calculating all survey results as a weighted sum of respondents.
For example, we might have surveyed 100 male respondents and 150 female respondents but were targeting a male / female ratio of 48% / 52%. In this simple case, we could achieve the target ratio by weighting the male responses by a factor of
0.48 / (100 / (100 + 150)) = 1.2 and weighting the female responses by
0.52 / (150 / (100 + 150) = 0.867.
The technical term for this method of computing weights is Post-Stratification.
However, in a more complex scenario, where we have many different measurable demographic targets, how can we determine weights for all the survey respondents?
At Potloc, it is very common that our clients desire survey populations matching a lot of such targets. For example, we might have targets looking like this:
- 42% male
- 58% female
- 20% students
- 80% non-students
- 15% dog owners
In this setting, weights cannot be calculated using a simple ratio as in the male/female example shown above. Here, we instead need to rely on more involved algorithms, notably a process called Raking.
One common approach to solve the problem of finding good weights that will satisfy our demographic targets is Iterative Proportional Fitting. In this method, weights for each respondents are computed for a single target at a time using Post-Stratification. By iteratively computing this for each target and repeating a few times, the weights end up converging to values that satisfy our targets.
Great! Problem solved!
...but what if we could do even better? 🤔
Beyond satisfying the demographic targets, the most desirable property for the weights is that they should be as close as possible to 1. Indeed, weights that are really large mean that those respondents' responses will count for a lot more than the "average" respondent in our survey results. For example, a respondent with a weight of 10 will count for 10 times more than the average respondent, and 100 times more than a respondent with weight 0.1 . Similarly, small weights mean that some responses will have very little impact on the final results.
Unfortunately, Iterative Proportional Fitting does nothing to encourage weights to be close to 1, which leads to sub-optimal weights. This is where Generalized Raking, an algorithm introduced by Deville et al. (1992), comes into play.
Note: This is where we get into the more mathematical part of this blog post 🤓. Don't care about this part? No worries! Simply skip to the next section!
The authors of this paper formulated the weighting problem as a constrained optimization method where the objective is that the weights are as close to one as possible and where the constraint is that the targets are matched. Mathematically this looks like this:
where is the raking function which encourages weights to be close to 1, is the vector of weights, is the vector of targets (in absolute numbers, not percentages) and is the matrix of responses. The matrix is binary where cells are filled with a '1' if the respondent belongs to the target category and '0' otherwise.
In other words, this is saying that we want to optimize the weights to be as close to 1 as possible while satisfying the target constraints. This is achieved by minimizing the function
which looks like this (notice the global minimum at
While it is possible to solve this optimization problem using general methods such as Sequential Least Squares Programming, the authors of Generalized Raking have devised a more efficient and robust algorithm for this specific problem:
- Initialize variables
- A vector to ones
- A vector to zeros
- A square matrix to the Identity matrix
- While the weights have not converged
is the inverse of the derivative of the raking function, i.e.
is it's derivative, in this case also .
While there are many implementations of this algorithm in R, we were not able to find one in Ruby that could play well with our codebase and be easily maintainable.
We therefore decided to make our own and to share it here for anyone looking for something similar. We started by making an implementation in python with the popular
import numpy as np def raking_inverse(x): return np.exp(x) def d_raking_inverse(x): return np.exp(x) def graking(X, T, max_steps=500, tolerance=1e-6): # Based on algo in (Deville et al., 1992) explained in detail on page 37 in # https://orca.cf.ac.uk/109727/1/2018daviesgpphd.pdf # Initialize variables - Step 1 n, m = X.shape L = np.zeros(m) # Lagrange multipliers (lambda) w = np.ones(n) # Our weights (will get progressively updated) H = np.eye(n) success = False for step in range(max_steps): L += np.dot(np.linalg.pinv(np.dot(np.dot(X.T, H), X)), (T - np.dot(X.T, w))) # Step 2.1 w = raking_inverse(np.dot(X, L)) # Step 2.2 H = np.diag(d_raking_inverse(np.dot(X, L))) # Step 2.3 # Termination condition: loss = np.max(np.abs(np.dot(X.T, w) - T) / T) if loss < tolerance: success = True break if not success: raise Exception("Did not converge") return w
After validating the algorithm in python, we then proceeded to replicate it in Ruby. For this, we had to find an equivalent to
numpy which we found in Numo. Numo is an awesome library for vector and matrix operations, and its
linalg sub-library was perfect for us as we needed to compute a matrix pseudo-inverse. This allowed us to translate the code to Ruby almost line by line:
require "numo/narray" require "numo/linalg" def raking_inverse(x) Numo::NMath.exp(x) end def d_raking_inverse(x) Numo::NMath.exp(x) end def graking(X, T, max_steps: 500, tolerance: 1e-6) # Based on algo in (Deville et al., 1992) explained in detail on page 37 in # https://orca.cf.ac.uk/109727/1/2018daviesgpphd.pdf # Initialize variables - Step 1 n, m = X.shape L = Numo::DFloat.zeros(m) w = Numo::DFloat.ones(n) H_diag = Numo::DFloat.ones(n) success = false max_steps.times do L += Numo::Linalg.pinv((X.transpose * H_diag).dot(X)).dot(T - X.transpose.dot(w)) # Step 2.1 w = raking_inverse(X.dot(L)) # Step 2.2 H_diag = d_raking_inverse(X.dot(L)) # Step 2.3 # Termination condition: loss = ((T - X.transpose.dot(w)).abs / T).max if loss < tolerance success = true break end end raise StandardError, "Did not converged" unless success w end
You may have noticed that the code doesn't quite match exactly the algorithm described above, notably steps 2.1 and 2.3. This is because we have found it to be vastly faster with Numo to store the sparse matrix
as a flat vector
h_matrix_diagonal since it only contains values on the diagonal. As a result, the step of taking the product
can be rewritten as
X.Transpose * h_matrix_diagonal, making use of Numo's implicit broadcasting.
In practice, we optimize this code a bit further by exiting early whenever possible (for example if our loss becomes
NaN) and by allowing to pass as input an initial value for the vector
lambdas if we believe to have an initialisation value better than the default.
With these few lines of code, we are now able to support complex survey weighting scenarios while having all of our code in our beautiful Ruby monolith 🎉
Interested in what we do at Potloc? Come join us! We are hiring 🚀