Originally Posted on Skiplist blog
Skiplist is JavaScript all the way down.
It is like a zombie virus. The language has taken over everything. And my hand has a bite. Do I embrace my inner JavaScript zombie, become what I have always feared, or should I chop it off while I still have the chance?
This week I optimized an in-memory cache lookup. The customer dataset was a few orders of magnitude larger than anticipated, so we had to refactor the in-memory cache's data structure.
The initial version of the cache was pair programmed and driven via TDD. I like to embrace my inner agile technical coach when I can. It may only be a coincidence, but it was easy to refactor the implementation details such that lookups now occur in constant time. Some technical details of the optimization trick are described below.
The source for the examples below can be found on GitHub.
Declarative Programming
Imperative code tells the how. Declarative code tells the what.
Let's look at an example, gathering the ages of three people:
const people = [
  { id: 1, name: "Jason", age: 38 },
  { id: 2, name: "Justin", age: 34 },
  { id: 3, name: "Josh", age: 33 }
];

// imperative
const agesImperative = [];
for (let person of people) {
  agesImperative.push(person.age);
}

// declarative
const agesDeclarative = people.map(person => person.age);
JavaScript offers a few built-in declarative helper functions, such as map, filter, reduce, and find.
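For instance, reusing the people array from above, each of these replaces a hand-rolled loop (a quick illustration of my own):

people.map(person => person.name);                       // ["Jason", "Justin", "Josh"]
people.filter(person => person.age < 35);                // Justin's and Josh's records
people.reduce((total, person) => total + person.age, 0); // 105
people.find(person => person.name === "Josh");           // { id: 3, name: "Josh", age: 33 }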
The indoctrinated claim: declarative code is expressive, elegant, and functional... "clean". I agree; not caring how the sausage is made lets you enjoy it that much more! Yet there are times when the how is important.
Using Find to Search for a Value
What about a similar scenario, where you are looking up a person by id in a list of a million entries?
const idToFind = 1000000;
const person = people.find(person => person.id === idToFind);
The above statement is clean: find the first person whose id is 1000000. In contrast, the imperative approach to the same linear search is about half a dozen more lines of code. Simple is awesome. Simple is clean. But Big O notation tells us linear search is literally the worst. We sacrifice performance for cleanliness, a trade-off I personally will pick 99.8% of the time. #empatheticprogramming
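For reference, that imperative version looks something like this:

let person = null;
for (let candidate of people) {
  if (candidate.id === idToFind) {
    person = candidate;
    break;
  }
}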
If the keys are unique and our data set is of a manageable size, we can increase performance by turning our list of people into a map of people by id, then performing a hash lookup, O(1), on the id! We pay, at worst, a one-time O(n) arrangement step, and then get an O(1) lookup on each record.
Code Example
As good stewards of software craftsmanship, let's start off with a failing JavaScript unit test to assert the expected behavior.
const assert = require('assert');
const Search = require("./search");

describe('Search', function () {
  const people = [];

  before(() => {
    people.push({id: 1, name: "Jason", age: 38});
    people.push({id: 2, name: "Justin", age: 34});
    people.push({id: 3, name: "Josh", age: 33});
  });

  it('should return the correct element', function () {
    const expectedName = "Justin";
    const search = new Search(people);

    const person = search.find(2);

    assert.equal(person.name, expectedName);
  });
});
Imperative
At this point, we have a red test. Let's implement our first approach, a traditional imperative search using a for loop.
class Search {
  constructor(people) {
    this.people = people;
  }

  find(id) {
    for (let person of this.people) {
      if (person.id === id) {
        return person;
      }
    }
  }
}

module.exports = Search;
I set up a test harness to evaluate performance.
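The harness itself is not shown in the post; here is a minimal sketch of one, assuming a people-generating helper and fixed iteration counts (both are my assumptions, not the original harness):

const Search = require("./search");

// Generate a list of people with sequential ids (assumed helper).
function makePeople(count) {
  const people = [];
  for (let i = 1; i <= count; i++) {
    people.push({ id: i, name: "Person " + i, age: i % 100 });
  }
  return people;
}

// Time repeated lookups of random ids, then report average and total.
function benchmark(recordCount, iterations) {
  const search = new Search(makePeople(recordCount));
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    search.find(Math.floor(Math.random() * recordCount) + 1);
  }
  const total = Date.now() - start;
  console.log(">>> Average time for find for " + recordCount + " records: " + (total / iterations).toFixed(3) + " ms");
  console.log(">>> Total time for find for " + recordCount + " records: " + total + " ms");
}

benchmark(3, 2000);
benchmark(1000000, 1000);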
// performance output:
// >>> Average time for find for 3 records: 0.001 ms
// >>> Total time for find for 3 records: 2 ms
// >>> Average time for find for 1000000 records: 2.617 ms
// >>> Total time for find for 1000000 records: 2906 ms
Declarative
We have a green test that asserts the behavior and a performance test harness, so we are free to move about the cabin (refactor the internals of the find method)! Moving from imperative to declarative looks like:
// ...
  find(id) {
    return this.people.find(person => person.id === id);
  }
// ...
// performance output:
// >>> Average time for find for 3 records: 0.001 ms
// >>> Total time for find for 3 records: 2 ms
// >>> Average time for find for 1000000 records: 2.356 ms
// >>> Total time for find for 1000000 records: 2690 ms
Our search time on a million records is relatively unchanged. With the move from imperative to declarative, we gain code cleanliness. The code now "tells the what" in such a way that swapping in a different data structure, like a map, is easier to conceptualize. We have a reduction in cognitive load.
Optimized
Finally, what if we were performing this search within a nested loop over a large collection (this never happens!)? Two and a half milliseconds per lookup, repeated across even a few hundred records, could easily degrade the customer experience. So, let's look at our example using a map. A JavaScript array is an object underneath, so we can use it as an associative array, mapping the id as the key to the person object.
class Search {
  constructor(people) {
    const peopleMap = [];
    people.forEach(person => peopleMap[person.id] = person);
    this.people = peopleMap;
  }

  find(id) {
    return this.people[id];
  }
}

module.exports = Search;
// performance output:
// >>> Average time for find for 3 records: 0.001 ms
// >>> Total time for find for 3 records: 2 ms
// >>> Average time for find for 1000000 records: 0 ms
// >>> Total time for find for 1000000 records: 302 ms
Conclusion
I think my problem with JavaScript is not that I don't like it; I hate that I like it. I am scarred by memories of pre-browser-standardization JavaScript web development (IE6, circa 2005, with ActiveX). I respect its current position within the development community and look forward to finding a common platform option at every layer of a solution.
Top comments (6)
Nice, you have built an Index, transforming a Search operation into a Lookup.
As a note, you cannot trust/use peopleMap.length, because most likely the IDs will not be incremental starting from 0, so it will become a "sparse array" and the length will be max(id) + 1. Also, if the first element of the map is a String and not a Number, the map will be treated as an Object and the length property will not exist. In this case you have to be careful to not use String (text) properties for the index when the storage is an array (like in your case).
For simplicity I would make const peopleMap = [] an object, and rename it to improve the readability: const peopleByID = {}. If you use numeric keys I think it will still be stored as a sparse structure, so [] or {} does not matter much, but it will force the dev to NOT use .length and the other Array functions. Multiple indexes can be added in the same way, one for each column to be searched. If the values are not unique, the value under each key will be a list of elements.
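A sketch of that suggestion, with a unique index by id and, following the commenter's multiple-index idea, a non-unique index by name (the names peopleByID and peopleByName are illustrative):

class Search {
  constructor(people) {
    // Unique index: id -> person.
    this.peopleByID = {};
    // Non-unique index: name -> list of people.
    this.peopleByName = {};
    people.forEach(person => {
      this.peopleByID[person.id] = person;
      if (!this.peopleByName[person.name]) {
        this.peopleByName[person.name] = [];
      }
      this.peopleByName[person.name].push(person);
    });
  }

  find(id) {
    return this.peopleByID[id];
  }

  findByName(name) {
    return this.peopleByName[name] || [];
  }
}

module.exports = Search;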
I have found rare occasions in practice for this kind of optimization; usually the data is too big or too expensive (hard to keep up to date) to cache locally, and it is usually stored in a key-value store or directly in the DB. But if you do have this situation, where you perform multiple searches over the same data set, the trade-off between the extra memory for the index and the CPU savings is worth it.
The data set did become quite large; I had to add --max-old-space-size=4096 as a Node parameter! I try to store only reference data in this cache, but even that is a lot. The purpose of the cache is to limit the network IO of a data conversion: more memory overhead, less IO. But you are absolutely right, if the data cache became unmanageable, I would push the data out of memory into a local key/value store.
I like the idea of using an object instead of a list!
Fixed! Thanks for the correction.
This particular optimization happened on the back end of an Express server. The same client has a react front end and a react native mobile application!