# More Training

After my last post on WalkSafe and machine-learning classification of running, I've spent a lot of time personally testing WalkSafe in real-world scenarios. I've been mostly impressed with the classifier's performance, but there's been something in the back of my mind telling me I could do better.

I was experiencing a number of false positives (driving slowly looked like Running, for example, and walking fast looked like Running), so I decided to retrain my neural network to better generalize to unseen conditions and improve on the overall classification performance from my last article.

# Three Big Gains

## 1. Normalize

The first and biggest gain came when I realized that I was feeding raw speeds (15 m/s, for example) into the neural network, and that it might perform better on data in the 0-1 range. So, I set up a simple routine to normalize/denormalize the data against a `MAX_SPEED` constant. Basically, I took the raw speed points and did this for every point:

`const inputSpeed = rawSpeed / MAX_SPEED`

For my app, I've decided to use `33 m/s` as the max speed, which is roughly 75 mph or 110 kph.
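A minimal sketch of that pair of routines (clamping out-of-range speeds to 1 is my assumption, not something the original pipeline necessarily does):

```
const MAX_SPEED = 33; // m/s, roughly 75 mph / 110 kph

// Map a raw GPS speed into the 0-1 range the network trains on.
// (Clamping speeds above MAX_SPEED is an assumption for illustration.)
function normalizeSpeed(rawSpeed) {
	return Math.min(rawSpeed, MAX_SPEED) / MAX_SPEED;
}

// Map a 0-1 network value back into m/s.
function denormalizeSpeed(value) {
	return value * MAX_SPEED;
}
```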

I did try experimenting with bucketing speeds (e.g. "snapping to a grid" by rounding to the nearest 2 m/s), as well as averaging speeds together (collapsing two readings into one). Both were attempts to get the network to generalize better to unseen data. However, testing with datasets the network had never seen (and even recall tests) showed that bucketing and averaging both produced significant DROPS in performance (recall and generalization), so those techniques were discarded.
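For reference, the discarded experiments looked roughly like this (my reconstruction from the description above; the exact grid size and pairing scheme are assumptions):

```
// Bucketing: "snap to a grid" by rounding each speed to the nearest 2 m/s.
function bucketSpeed(speed, gridSize = 2) {
	return Math.round(speed / gridSize) * gridSize;
}

// Averaging: collapse each pair of adjacent readings into one.
// An unpaired final reading is averaged with itself (i.e. kept as-is).
function averagePairs(speeds) {
	const out = [];
	for (let i = 0; i < speeds.length; i += 2) {
		const a = speeds[i], b = speeds[i + 1] ?? speeds[i];
		out.push((a + b) / 2);
	}
	return out;
}
```

Both passes ran over the raw speeds before ngram construction, and neither survives in the current pipeline.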

## 2. Training Set Structure

Another gain, albeit somewhat smaller, came from changing the way I loaded my training data.

Originally, I loaded all the data from ~8 separate CSV files, then concatenated all those points into a single array, and finally made ngrams out of that array of points.

This had the unintended effect of creating ngrams that spanned two separate data sets: when one set ended and the next was concatenated onto the end, an ngram could straddle the boundary between them.

Therefore, in order not to "confuse" the network by feeding it training data that was not real, I changed the loading process to something like this:

```
const csvData = [
	getCsv('file1.csv'),
	getCsv('file2.csv'),
	getCsv('file3.csv'),
];

const trainingData = csvData
	.map(lerpData)   // see #3, "Fill in the Gaps", below
	.map(makeNgrams) // from the last article: [1,2,3,4] into [[1,2],[3,4]]
	.reduce((list, ngrams) => list.concat(ngrams), []);
```

The end result is still a giant set of training data points in `trainingData`, but the points from the different data sets aren't concatenated together until after each set has been properly transformed on its own.
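For context, `makeNgrams` (from the last article) chunks a flat list of points into fixed-size sequences. A minimal version might look like the sketch below; the chunk size of 2 matches the `[1,2,3,4]` into `[[1,2],[3,4]]` example, and the exact implementation is my reconstruction:

```
// Chunk a single recording's points into fixed-size ngrams.
// Because each CSV file is chunked separately, no ngram can span two recordings.
function makeNgrams(points, size = 2) {
	const ngrams = [];
	for (let i = 0; i + size <= points.length; i += size) {
		ngrams.push(points.slice(i, i + size));
	}
	return ngrams;
}
```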

## 3. Fill in the Gaps

The second-largest fundamental generalization and classification gain came when I realized that there were gaps in the GPS speed readings, which, of course, is to be expected in any real-world collection scenario. However, I concluded that training the network on a speed transition of `1 m/s` → `5 m/s` without any context as to how fast that transition happened would deprive it of valuable contextual information that could aid classification.

In order to capture this concept of time, I decided to normalize the inputs so that every input into the network represented a finite set of timestamps with a fixed interval between each one. (Before, inputs were NOT guaranteed to have a fixed interval between them.)

In order to deliver this "fixed interval" guarantee, I used a very simple technique: linear interpolation.

Thanks to mattdesl on GitHub, I've found this `lerp` function (MIT licensed) useful in a number of my projects and have reused it many times. Here it is in its entirety:

```
// https://github.com/mattdesl/lerp/blob/master/index.js
function lerp(v0, v1, t) {
	return v0 * (1 - t) + v1 * t;
}
```
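A couple of quick checks show how `t` acts as a percentage of the distance between the two values (`lerp` repeated here so the snippet stands alone):

```
// https://github.com/mattdesl/lerp/blob/master/index.js (MIT)
function lerp(v0, v1, t) {
	return v0 * (1 - t) + v1 * t;
}

const halfway = lerp(1, 5, 0.5); // 3: halfway between 1 m/s and 5 m/s
const start   = lerp(1, 5, 0);   // 1: t=0 returns the first value
const end     = lerp(1, 5, 1);   // 5: t=1 returns the second value
```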

The entirety of my lerping routine to normalize my data is shown below, in hopes that perhaps someone else might find it useful.

In short, it takes a set of points that look like `{speed: 1.5, timestamp: '2019-09-26 02:53:02'}`, and if two adjacent points are more than 1 second apart, it interpolates the speeds between them in 1-second steps.

The list returned from this routine is "guaranteed" to contain data at 1-second intervals, so every point fed into the neural network is exactly 1 second from its neighbors. This allows the network to better capture the idea of "speed of change" in the readings.

```
function lerpRawData(rawData) {
	const lerped = [];
	rawData.forEach((row, idx) => {
		const speed = parseFloat(row.speed);
		if (idx === rawData.length - 1) {
			// At the end, don't lerp
			lerped.push({ ...row });
			return;
		}

		// Already checked if we're at the end, so this doesn't need a check
		const nextIdx = idx + 1,
			nextRow = rawData[nextIdx],
			thisTime = new Date(row.timestamp).getTime(),
			nextTime = new Date(nextRow.timestamp).getTime(),
			nextSpeed = parseFloat(nextRow.speed),
			delta = nextTime - thisTime;

		// Step between the two timestamps in 1000ms steps
		// and lerp the speeds between the timestamps based on percent distance
		for (let time = thisTime; time < nextTime; time += 1000) {
			const progress = (time - thisTime) / delta;
			const interSpeed = lerp(speed, nextSpeed, progress);
			const interTimestamp = new Date(time);
			const d = {
				...row,
				timestamp: interTimestamp,
				speed: interSpeed,
				progress, // just for debugging
			};

			// Just for debugging
			if (time > thisTime && time < nextTime)
				d._lerped = true;

			lerped.push(d);
		}
	});
	return lerped;
}
```
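To make the effect concrete, here's the interpolation step in isolation for two readings 4 seconds apart. This is a stripped-down version of the inner loop above, not the full routine:

```
function lerp(v0, v1, t) {
	return v0 * (1 - t) + v1 * t;
}

// Two readings 4 seconds apart: 1 m/s, then 5 m/s.
const thisTime = 0, nextTime = 4000; // ms
const speed = 1, nextSpeed = 5;      // m/s
const delta = nextTime - thisTime;

const filled = [];
for (let time = thisTime; time < nextTime; time += 1000) {
	filled.push(lerp(speed, nextSpeed, (time - thisTime) / delta));
}
// filled is [1, 2, 3, 4]; the 5 m/s reading itself gets pushed
// when the loop runs for the *next* pair of points
```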

## 4. Hidden Layers

I know the headline said three big gains, but it's worth mentioning here that an additional hidden layer appeared to aid in generalization as well. My hidden layer setup now looks like this:

`hiddenLayers: [ inputSize * 2, inputSize * 1.5 ]`

This produces a network similar to this hackish pseudocode:

```
inputSize = 4
[ *, *, *, * ]             # inputs (ngram size)
[ *, *, *, *, *, *, *, * ] # hidden layer 1 (inputSize * 2 = 8)
[ *, *, *, *, *, * ]       # hidden layer 2 (inputSize * 1.5 = 6)
[ *, *, *, * ]             # outputs (4 classes)
```
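In config form, that hidden-layer setup looks like the sketch below. I'm assuming brain.js here (where the `hiddenLayers` option name comes from); adjust for your library of choice:

```
// Hypothetical network config; brain.js is my assumption based on
// the hiddenLayers option name, and the inputSize value is illustrative.
const inputSize = 4; // ngram size

const netConfig = {
	hiddenLayers: [inputSize * 2, inputSize * 1.5], // resolves to [8, 6]
};

// e.g. const net = new brain.NeuralNetwork(netConfig);
```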

## Conclusion

With these tweaks, my network now has slightly reduced recall across the board, but it exhibits consistently improved generalization: accuracy on unseen data is now reliably above 85%.
