Recently I ran into a situation where I wanted to calculate the mean of a set of unknown size. My first, naive idea was to average the current mean with each new value:
```python
xs = [3, 7, 6]  # Assuming we don't know the length
mean = xs[0]
n = 1
while n < len(xs):
    # Naive (wrong) update: average the running "mean" with the next value
    mean = (mean + xs[n]) / 2
    n += 1
```
This quickly turns out to be wrong:
(3 + 7 + 6)/3 = 5.3333 (the actual mean)
(3/2 + 7/2)/2 + 6/2 = 5.5 (the naive result)
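A quick way to see the discrepancy is to run both versions side by side. This is a minimal sketch; the function names `naive_running_average` and `true_mean` are hypothetical. Averaging with the new value always gives that value a weight of 1/2, so for [3, 7, 6] the weights end up 1/4, 1/4, 1/2 instead of 1/3 each.

```python
def naive_running_average(xs):
    # The first idea: repeatedly average the current "mean" with the next value
    mean = xs[0]
    for x in xs[1:]:
        mean = (mean + x) / 2
    return mean


def true_mean(xs):
    # Reference value: sum divided by count
    return sum(xs) / len(xs)


xs = [3, 7, 6]
print(naive_running_average(xs))  # 5.5
print(true_mean(xs))
```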
To find the actual relation between the current mean and the i-th value, I compared the mean of 2 values with the mean of 3 values:
(3 + 7)/2 = 3/2 + 7/2
(3 + 7 + 6)/3 = (3 + 7)/3 + 6/3
From here, the mean of 3 values can be rewritten in terms of the mean of 2 values:
(3 + 7)/3 + 6/3 =
= (3 + 7)/2 * 2/3 + 6/3 =
= (3/2 + 7/2)*2/3 + 6/3
Here the general pattern appears (x is indexed from 0, so x(i-1) is the i-th value):
mean(i) = mean(i-1) * (i-1)/i + x(i-1)/i
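The same pattern can be derived for a general i (with i ≥ 2) directly from the definition of the mean. Here \bar{x}_i stands for mean(i) and x_k is zero-indexed as above; the second line is only an algebraic rearrangement of the same update, not a different formula.

```latex
\begin{aligned}
\bar{x}_i &= \frac{1}{i}\sum_{k=0}^{i-1} x_k
           = \frac{i-1}{i}\cdot\frac{1}{i-1}\sum_{k=0}^{i-2} x_k + \frac{x_{i-1}}{i}
           = \bar{x}_{i-1}\,\frac{i-1}{i} + \frac{x_{i-1}}{i} \\
\bar{x}_i &= \bar{x}_{i-1} + \frac{x_{i-1} - \bar{x}_{i-1}}{i}
\end{aligned}
```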
This gives the correct algorithm for computing the mean iteratively:
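```python
xs = [3, 7, 6]  # Assuming we don't know the length
mean = xs[0]
n = 1  # number of values averaged so far
while n < len(xs):
    n += 1
    # mean(n) = mean(n-1) * (n-1)/n + x(n-1)/n
    mean = mean * (n - 1) / n + xs[n - 1] / n
```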
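Since the whole point is that the length isn't known in advance, the same update can be wrapped so it consumes any iterable, including a generator, and yields the mean after each value. This is a minimal sketch; `running_means` and `stream` are hypothetical names. Starting from mean = 0 works because the first update gives 0 * 0/1 + x/1 = x.

```python
from typing import Iterable, Iterator


def running_means(values: Iterable[float]) -> Iterator[float]:
    """Yield the mean after each value using mean(i) = mean(i-1)*(i-1)/i + x/i.

    Works on any iterable, including generators whose length is unknown.
    """
    mean = 0.0
    for i, x in enumerate(values, start=1):
        # i = number of values seen so far, x = the i-th value
        mean = mean * (i - 1) / i + x / i
        yield mean


def stream():
    # Hypothetical data source whose length is not known up front
    yield from [3, 7, 6]


for m in running_means(stream()):
    print(m)
```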
While iterative calculation is unnecessary in this example, I found it really handy when implementing a segment-growing algorithm, where I decide which pixels to add to a segment based on the segment's current mean value.
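A segment-growing loop of that kind might look roughly like the sketch below. It is not the original implementation: the grayscale image as nested lists, the 4-connected neighbourhood, the breadth-first order, and the rule "add a pixel if it is within `max_diff` of the current mean" are all assumptions, and `grow_segment` is a hypothetical name. The incremental update means the mean never has to be recomputed over the whole segment.

```python
from collections import deque


def grow_segment(image, seed, max_diff):
    """Grow a segment from `seed` (row, col), adding 4-connected neighbours
    whose value is within `max_diff` of the segment's current mean.
    The acceptance rule is an assumption for illustration."""
    rows, cols = len(image), len(image[0])
    r0, c0 = seed
    segment = {seed}
    mean = float(image[r0][c0])
    n = 1
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or (nr, nc) in segment:
                continue
            value = image[nr][nc]
            if abs(value - mean) <= max_diff:
                segment.add((nr, nc))
                n += 1
                # Incremental mean update, no re-summing of the segment
                mean = mean * (n - 1) / n + value / n
                frontier.append((nr, nc))
    return segment, mean


image = [
    [10, 11, 50],
    [12, 10, 52],
    [11, 49, 51],
]
segment, mean = grow_segment(image, (0, 0), max_diff=5)
print(sorted(segment), mean)
```

Because the mean changes as pixels are added, the result can depend on the visiting order; breadth-first is just one reasonable choice.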
A reader's comment:
The usual way is to keep a running count and a running sum of the numbers seen so far; at any point, the sum divided by the count is the mean.
Other statistics, such as the standard deviation, can be computed incrementally in a similar way.
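A minimal sketch of the comment's suggestion: the mean comes from a running count and sum, and the standard deviation uses Welford's online algorithm, a well-known incremental method that is swapped in here (the comment does not name a specific technique). It avoids keeping a running sum of squares, which can lose precision through cancellation. The class name `RunningStats` is hypothetical.

```python
import math


class RunningStats:
    """Running count/sum for the mean, Welford's update for the variance."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self._mean = 0.0  # Welford's running mean
        self._m2 = 0.0    # sum of squared deviations from the running mean

    def add(self, x):
        self.count += 1
        self.total += x
        delta = x - self._mean
        self._mean += delta / self.count
        # Uses the deviation from both the old and the updated mean
        self._m2 += delta * (x - self._mean)

    def mean(self):
        # The comment's approach: sum divided by count
        return self.total / self.count

    def stdev(self):
        # Sample standard deviation (divides by n - 1); needs at least two values
        return math.sqrt(self._m2 / (self.count - 1))


stats = RunningStats()
for x in [3, 7, 6]:
    stats.add(x)
print(stats.mean(), stats.stdev())
```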