DEV Community

Kyle Pena
Kyle Pena

Posted on • Edited on

Derivation of Welford's Algorithm

This will be somewhat out of context if you're coming here first. It's really a footnote in a much longer series of blog posts on summarizing data distributions in computing environments where storage is at a premium.

In that blog post, I wanted to explain how to derive Welford's Algorithm for the recurrence relation for the second central moment, and found the explanations I could find a little lacking (at least for me).

I found this helpful post to be a great starting point, but the algebra part of the post skipped over so many steps that I couldn't follow it.

So I worked it out in greater detail. I'm hoping this is helpful to other mere mortals who, like myself, couldn't quite connect the dots.

Notation:

  • y represents a new observation.
  • μ' is the updated mean (the mean after incorporating y).
  • M2M_2 is the second central moment - well, not quite. Technically you'd have to divide by N to get the central moment. But we'll call it the central moment here.
  • N is the number of observations including y.

First, some work on the mean update formula:

μ=μ+yμN\mu^\prime = \mu + \frac{y - \mu}{N}
Nμ=Nμ+yμN\mu^\prime = N\mu + y - \mu
(N1)μ+μ=(N1)μ+y(N-1)\mu^\prime + \mu^\prime = (N-1)\mu + y
(N1)μ(N1)μ=yμ(N-1)\mu^\prime - (N-1)\mu = y - \mu^\prime
(N1)μ(N1)μ=μy(N-1)\mu - (N-1)\mu^\prime = \mu^\prime - y

Now, the main derivation:

M2M2=1N(yiμ)21N1(yiμ)2M_2^\prime - M_2 = \sum_1^N{(y_i - \mu^\prime)^2} - \sum_1^{N-1}{(y_i - \mu)^2}
=(yμ)2+1N1(yiμ)2(yiμ)2= (y - \mu^\prime)^2 + \sum_1^{N-1}{(y_i - \mu^\prime)^2 - (y_i - \mu)^2}
=(yμ)2+1N1(yi22yiμ+μ2)(yi22yiμ+μ2)= (y - \mu^\prime)^2 + \sum_1^{N-1}{(y_i^2 - 2y_i\mu^\prime +\mu^{\prime 2}) - (y_i^2 - 2y_i\mu + \mu^2)}
=(yμ)2+1N12yiμ+2yiμ+(μ2μ2))= (y - \mu^\prime)^2 + \sum_1^{N-1}{-2y_i\mu^\prime + 2y_i\mu + (\mu^{\prime 2} - \mu^2))}
=(yμ)2+1N12yi(μμ)+(μμ)(μ+μ)= (y - \mu^\prime)^2 + \sum_1^{N-1}{-2y_i(\mu^\prime - \mu) + (\mu^\prime - \mu)(\mu^\prime + \mu)}
=(yμ)2+1N1(μμ)(2yi+μ+μ)= (y - \mu^\prime)^2 + \sum_1^{N-1}{(\mu^\prime - \mu)(-2y_i + \mu^\prime + \mu)}
=(yμ)2+(μμ)1N1(2yiμμ)= (y - \mu^\prime)^2 + (\mu - \mu^\prime) \sum_1^{N-1}{(2y_i - \mu^\prime - \mu)}
=(yμ)2+(μμ)[2(N1)μ(N1)μ(N1)μ]= (y - \mu^\prime)^2 + (\mu - \mu^\prime) [ 2(N-1)\mu - (N-1)\mu^\prime - (N-1)\mu ]
=(yμ)2+(μμ)[(N1)μ(N1)μ]= (y - \mu^\prime)^2 + (\mu - \mu^\prime) [ (N-1)\mu - (N-1)\mu^\prime ]

Substitute the mean update term we worked out earlier.

=(yμ)2+(μμ)(μy)= (y - \mu^\prime)^2 + (\mu - \mu^\prime) (\mu^\prime - y)
=(yμ)2+(μμ)(yμ)= (y - \mu^\prime)^2 + (\mu^\prime - \mu) (y - \mu^\prime)
=(yμ)(yμ+μμ)= (y - \mu^\prime)(y - \mu^\prime + \mu^\prime - \mu)
=(yμ)(yμ)= (y - \mu^\prime)(y - \mu)

This equals the difference M2M2M_2^\prime - M_2 , and so the complete recurrence relation for the second central moments is:

M2=M2+(yμ)(yμ)M_2^\prime = M_2 + (y - \mu^\prime)(y - \mu)

And thus the recurrence relation for the corrected sample variance based on the second central moment is:

σ2=1N1[M2+(yμ)(yμ)]\sigma^{\prime 2} = \frac{1}{N-1} [ M_2 + (y - \mu^\prime)(y - \mu) ]

Top comments (0)