## Part 3 — Training Models

In Parts 1 and 2 of this series we built our models by inspection. This was a good way to introduce the concepts involved, but in practice we’ll want to “train” our models using the data itself. This means setting the states’ output probabilities and transition probabilities from the observations. The more observations we have, the better.

The technique we’ll use is called **Viterbi alignment**. First we set the initial values for the transition and output probabilities and then we iterate in a series of refinements until we converge. That is, until the output probabilities no longer change with a new iteration. Let’s dig a little further into this process.

First we set the initial values:

- Roughly divide each observation into the number of states in the model, e.g. for 3 states, divide each observation into 3 parts of similar size.
- Get the average number of data points (frames) per observation.
- Divide the average number of frames per observation by the number of states to get the average frames per state (the time we expect to remain in a state).
- Set $P(\text{transition out of each state}) = 1/(\text{average frames per state})$.
- Set $P(\text{self transition}) = 1 - P(\text{transition out of each state})$. We’ll now refer to transitions out and to the same state simply as $P(transitions)$. Initially all states will have the same probabilities.
- Calculate the mean and standard deviation for each state.

Let’s go over those steps with a real example. Let’s say we have three observation sequences. Like in our previous examples, we’re tracking only one feature: the change in the position of one hand on the y axis.

###### Figure 1. Initial values

Our three observations in *Figure 1* have 16, 5, and 14 frames respectively. That gives us $(16+5+14)/3 \approx 11.7$, or approximately 12 average frames per observation. Assuming we’re still working with 3 states in our model, 12 avg frames/3 states = 4 avg frames (data points) per state. If we’re spending 4 frames in each state on average, then we should expect to escape each state $1/4$ of the time. As a result, the self transition probability is $3/4$. So $P(\text{transition out})=0.25$ and $P(\text{self transition})=0.75$.
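The arithmetic for the initial transition probabilities can be checked in a few lines of Python. This is only a sketch: the function name `initial_transition_probs` and the `round_avg` flag are made up for illustration. The flag mirrors the walkthrough, which rounds the average of about 11.7 frames to 12 before dividing by the number of states.

```python
def initial_transition_probs(frame_counts, n_states, round_avg=True):
    """Return (P(transition out), P(self transition)), shared by every state.

    round_avg=True mirrors the walkthrough, which rounds the average
    number of frames per observation to a whole number (11.67 -> 12).
    """
    avg_frames_per_obs = sum(frame_counts) / len(frame_counts)
    if round_avg:
        avg_frames_per_obs = round(avg_frames_per_obs)
    avg_frames_per_state = avg_frames_per_obs / n_states
    p_out = 1.0 / avg_frames_per_state
    return p_out, 1.0 - p_out


# Frame counts of the three observations in the example.
p_out, p_self = initial_transition_probs([16, 5, 14], n_states=3)
print(p_out, p_self)  # 0.25 0.75
```

Without the rounding step the exit probability comes out at $9/35 \approx 0.257$ instead of $0.25$, which is close enough for an initial guess that later iterations will refine anyway.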

Let’s divide each observation into similarly sized thirds (for three states).

###### Figure 2. Dividing each observation into 3 states
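One simple way to produce such a rough split in code is integer division on the frame index. The helper `split_evenly` below is hypothetical, and the exact cut points in the figure may differ by a frame; any split into contiguous chunks of similar size works as a starting point.

```python
def split_evenly(n_frames, n_states):
    """Label each frame with a 0-based state index, using contiguous
    chunks of similar size. Assumes n_frames >= n_states."""
    return [i * n_states // n_frames for i in range(n_frames)]


# The shortest observation in the example has 5 frames.
print(split_evenly(5, 3))  # [0, 0, 1, 1, 2]
```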

Now calculate the mean and standard deviation for each state using the data points for each state as we just defined them.

###### Figure 3. Gaussians for each state
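Computing those Gaussians means pooling every frame assigned to a state, across all observations, and taking the mean and standard deviation of each pool. The sketch below uses made-up data and a hypothetical `state_gaussians` helper. It also assumes the population standard deviation (`pstdev`); the post does not say whether the population or sample formula was used.

```python
from statistics import mean, pstdev


def state_gaussians(observations, labels, n_states):
    """Pool the frames assigned to each state across all observations
    and return one (mean, std) pair per state."""
    pooled = [[] for _ in range(n_states)]
    for obs, lab in zip(observations, labels):
        for value, state in zip(obs, lab):
            pooled[state].append(value)
    # Assumption: population standard deviation.
    return [(mean(vals), pstdev(vals)) for vals in pooled]


# Hypothetical sequence and labels, not the data from the figures.
observations = [[4, 6, 0, 2, -5, -7]]
labels = [[0, 0, 1, 1, 2, 2]]
for state, (mu, sigma) in enumerate(state_gaussians(observations, labels, 3), 1):
    print(f"S{state}: mean={mu}, std={sigma}")
```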

Now comes the iteration. We’ll see if moving the boundaries between the states will lead to a better explanation of the data. We’re looking to see if we can get Gaussians with a lower variance. We’ll converge once we no longer need to move any boundaries and none of the output probabilities (mean and standard deviation) change.

This is the process at a high level. While not converged:

- Move the state boundaries in the observation sequences whenever a frame is closer to the neighboring state than to its current state as measured by the distance to the mean (in standard deviations).
- Recalculate $P(transitions)$ for each state ($1/\text{average frames}$).
- Recalculate output probabilities for each state (mean and standard deviation).
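The loop above can be sketched in plain Python. Everything here is an illustration under stated assumptions rather than a reference implementation: the function names, the `MIN_STD` floor, the `max_iters` cap, the use of the population standard deviation, and the rule that every state keeps at least one frame are all choices made for this sketch. Because the transition probabilities do not influence where the boundaries go, the sketch computes them once at the end instead of on every pass.

```python
from statistics import mean, pstdev

MIN_STD = 1e-6  # assumed floor so a constant-valued state never divides by zero


def edges(obs, b):
    """Turn boundary indices into segment edges: state s covers obs[e[s]:e[s+1]]."""
    return [0] + list(b) + [len(obs)]


def fit_gaussians(observations, bounds, n_states):
    """Pool the frames assigned to each state and return [(mean, std), ...]."""
    pooled = [[] for _ in range(n_states)]
    for obs, b in zip(observations, bounds):
        e = edges(obs, b)
        for s in range(n_states):
            pooled[s].extend(obs[e[s]:e[s + 1]])
    return [(mean(v), max(pstdev(v), MIN_STD)) for v in pooled]


def dist(value, gaussian):
    """Distance from a state's mean, measured in standard deviations."""
    mu, sigma = gaussian
    return abs(value - mu) / sigma


def shift_boundaries(obs, b, gaussians):
    """Slide each boundary while the frame beside it prefers the other state."""
    b = list(b)
    for k in range(len(b)):
        lo = edges(obs, b)[k] + 1       # state k keeps at least one frame
        hi = edges(obs, b)[k + 2] - 1   # state k+1 keeps at least one frame
        left, right = gaussians[k], gaussians[k + 1]
        start = b[k]
        # First frame of state k+1 is closer to state k: move boundary right.
        while b[k] < hi and dist(obs[b[k]], left) < dist(obs[b[k]], right):
            b[k] += 1
        if b[k] == start:
            # Last frame of state k is closer to state k+1: move boundary left.
            while b[k] > lo and dist(obs[b[k] - 1], right) < dist(obs[b[k] - 1], left):
                b[k] -= 1
    return b


def train(observations, n_states, max_iters=100):
    """Return (bounds, gaussians, p_out) once the boundaries stop moving."""
    # Initial guess: contiguous chunks of similar size.
    bounds = [[len(o) * s // n_states for s in range(1, n_states)]
              for o in observations]
    for _ in range(max_iters):
        gaussians = fit_gaussians(observations, bounds, n_states)
        new_bounds = [shift_boundaries(o, b, gaussians)
                      for o, b in zip(observations, bounds)]
        if new_bounds == bounds:
            break  # converged: no boundary moved
        bounds = new_bounds
    gaussians = fit_gaussians(observations, bounds, n_states)
    # P(transition out) = 1 / average number of frames spent in the state.
    p_out = []
    for s in range(n_states):
        frames = [edges(o, b)[s + 1] - edges(o, b)[s]
                  for o, b in zip(observations, bounds)]
        p_out.append(1.0 / mean(frames))
    return bounds, gaussians, p_out


# Hypothetical single-feature sequence, not the data from the figures.
obs = [5, 6, 5, 6, 5, 0, 1, 0, 1, -5, -6, -5]
bounds, gaussians, p_out = train([obs], n_states=3)
print(bounds)  # [[5, 9]]
```

On this toy sequence the even split starts at boundaries `[4, 8]`, both boundaries move one frame to the right in the first pass, and the second pass changes nothing, so the loop stops.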

Let’s apply these steps to our example. On the first sequence, look at the boundary between states 1 and 2. The $5$ is much closer to the mean of state 1 $(5.4)$ than to the mean of state 2 $(-0.6)$, as measured in standard deviations, so that boundary should move to the right, moving the $5$ to state 1.

###### Figure 4. The boundary between S1 and S2 moves to the right

Let’s keep going. What about the $2$ that is now at the new boundary between states 1 and 2? As shown in Figure 5, the $2$ is a little more than $1$ standard deviation away from the mean of S1 and less than $1$ standard deviation from the mean of S2, so it’s still better in the second state (for now). By the time we finish this iteration the mean and standard deviation for the states will have changed as data points move to another state.

###### Figure 5. Determining if the boundary needs to move

Let’s see if any other boundaries move in this iteration. The $-1$ is not going to move: it’s way closer to $-0.6$ than to $-5$. In the second observation, the $-7$ is closer to the mean of S3, so that boundary moves.

In the last observation, the $3$ is closer to the mean of S1 and the $-5$ is right at the mean of S3, so those move.

How about the $-3$ in S2 of the last observation? It’s $2.4/3.1=0.774$ standard deviations away from the mean of state 2 and $2/2.6=0.769$ standard deviations away from the mean of state 3. It’s a close call, but state 3 is marginally nearer, so we’ll move it to state 3.
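The comparison for the $-3$ is easy to reproduce. The helper name `std_distance` is made up for this sketch; the means and standard deviations are the ones quoted in the walkthrough.

```python
def std_distance(value, mean, std):
    """Distance from a state's mean, measured in standard deviations."""
    return abs(value - mean) / std


# Values quoted in the walkthrough: S2 has mean -0.6 and std 3.1,
# S3 has mean -5 and std 2.6.
d_s2 = std_distance(-3, -0.6, 3.1)
d_s3 = std_distance(-3, -5.0, 2.6)
print(round(d_s2, 3), round(d_s3, 3))  # 0.774 0.769
print("move to S3" if d_s3 < d_s2 else "stay in S2")  # move to S3
```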

We’re done moving boundaries so let’s recalculate everything. First the transition probabilities:

###### New transition probabilities

And then the new output probabilities (Gaussians):

###### New output probabilities

We’ve reached the end of the first iteration! Let’s start the second iteration and see if we need to move any boundaries given our new mean and std dev for the outputs.

In the first observation, the $2$ is now closer to S1 than to S2 so it moves to S1.

The $-1$ stays in S2 in both observations 1 and 3. No more moves needed so let’s recalculate everything. First the transition probabilities:

###### New transition probabilities

And then the output probabilities (Gaussians):

###### New output probabilities

We finished the second iteration! In the next iteration none of the boundaries are going to move and none of the output probabilities are going to change, so we’ve converged! We have now initialized the values for each state of our HMM. We:

- Assigned a state to each time frame in each of our observations.
- Calculated the resulting averages for all values assigned to each state.
- Calculated the average amount of time we expect to stay in each state.

Here’s our final result:
