DEV Community

Cover image for Combining two datasets into one
Jake McLelland
Jake McLelland

Posted on

Combining two datasets into one

The other day I had a client request that we add square pegs to a round hole. We needed to return two, very different objects into a single collection. This happens all the time in this industry, of course. And it's obviously not a big deal to add a new object that can represent the good bits from both objects as one conjoined unit.

But, one of the more interesting challenges that resulted was that I needed a search method that:

  1. Definitely returns objects from both distinct data sets
  2. Still honors an overall take limiter from the client application.

In other words, if the client app request only 10 records, but there are some of both data sets, how do you limit the final results so that there are definitely some of both (if possible) and yet the overall return stops at the requested limit?

If there are no results in the first dataset, but some in the other, then clearly you can just use the requested take count.

It gets tricky, though, when there are a few in both, especially when there are a bunch in one but only one or two in the other.

Let's say the client app requested 10 records, one dataset returns 9, and the other 1. As a human, I read that and it's a no brainer. We can take all of both, it's less then 10, all good. But what if there are 6 of one and 42 of the other? Or thousands in both?

I'm sure there's probably a more sophisticated way to handle this, either mathmatically, using AI, or something else more clever. But this is what I came up with.

👇 Run this in dotnetfiddle

public static (int leftTakeCount, int rightTakeCount) GetTakeCountsFromTwoSourcesToMeetOneTotalQuota(int take, int leftCount, int rightCount)
{
    int halfTake = take / 2; // since it's an int here, the decimal will just truncate
    int totalCount = leftCount + rightCount;

    bool totalCountAboveTake = totalCount > take;

    bool leftAboveHalfTake = leftCount > halfTake;
    bool shouldReduceLeft = totalCountAboveTake && leftAboveHalfTake;

    bool rightAboveHalfTake = rightCount > halfTake;
    bool shouldReduceRight = totalCountAboveTake && rightAboveHalfTake;

    // When both counts are above the requested take count and also above the halfTake, then we can just chop both down to the half count
    bool canReduceBothToHalftake = totalCountAboveTake & leftAboveHalfTake & rightAboveHalfTake;
    int leftReductionToHalfTake = canReduceBothToHalftake ? leftCount - halfTake : 0;
    int rightReductionToHalfTake = canReduceBothToHalftake ? rightCount - halfTake : 0;

    // When one of the counts is below halfTake, we need to calculate the difference so that we'll end up with some of both
    int leftReductionToDifference = (shouldReduceLeft & !canReduceBothToHalftake) ? leftCount - (take - rightCount) : 0;
    int rightReductionToDifference = (shouldReduceRight & !canReduceBothToHalftake) ? rightCount - (take - leftCount) : 0;

    // Since we used booleans above, one of these will end up as 0. So get the other one
    int maxLeftReduction = Math.Max(leftReductionToHalfTake, leftReductionToDifference);
    int maxRightReduction = Math.Max(rightReductionToHalfTake, rightReductionToDifference);

    int finalLeftReduction = shouldReduceLeft ? maxLeftReduction : 0;
    int finalRightReduction = shouldReduceRight ? maxRightReduction : 0;

    int leftTakeCount = leftCount - finalLeftReduction;
    int rightTakeCount = rightCount - finalRightReduction;

    return (leftTakeCount, rightTakeCount);
}
Enter fullscreen mode Exit fullscreen mode

Here, I assume that if the sum of both counts is less than or equal to the requested take amount, then we can just use all of both. But if the sum is more than that, we need to intelligently reduce the larger side to leave room for the smaller side. And if there's just a bunch in both sets, we can just cut the requested take value in half and return that many of both values.

The result is a tuple that gives me the take value of the first or 'left' dataset separately from the second or 'right' dataset. That allows me to limit my database query for both sets so that the total number of records is whatever the client requested.

One obvious issue that I don't have time to address here is that since I'm using an int as the halfTake value and therefore disposing of the remainder decimal values, if you request an uneven number like 25, then the halfTake will end up one rounding up to the nearest int, and you'll end up with one extra record.

But, since I control the client app, . . . I'm not going to do that, and just not worry about that for right now. Besides, my consuming service method actually has to sort these values later, so I can just lop off the extras with another .Take(request.Take) statement in the service before returning it to the client.

Top comments (0)