As much as many of us may like this post about 7-killer-one-liners, we all know that its shuffle one-liner does not look very promising compared with the "correct" way, Fisher-Yates and its variants:
const shuffleArray = (arr) => arr.sort(() => Math.random() - 0.5)
But how bad can it be? Basically, it depends on the sorting algorithm. Engines ship some hybrid sort (V8 uses TimSort nowadays; introsort-style mixtures of quick sort, insertion sort and heap sort are also common), and the random comparator makes the result hard to predict analytically. So, let's do some experiments instead.
First, the shuffle function:
declare global {
  interface Array<T> {
    shuffle: () => T[]
  }
}

Array.prototype.shuffle = function <T>(this: T[]) {
  return this.sort(() => Math.random() - 0.5)
}

// make this file a module, so that `declare global` is allowed
export {}
And now we can:
const experiment = (N: number, times?: number) => {
  times = times ?? N ** 2
  const original = [...Array(N).keys()]
  const samples = Array.from(Array(times), () => [...original].shuffle())
}
We now have plenty of shuffled samples, but how do we assess them? Here we calculate the frequency with which each number appears at each position.
const NumberPosition = (numbers: number[], samples: number[][]) => {
  return numbers.map(
    n => samples.map(sample => [n, sample.indexOf(n)] as const)
    // (n, k) => samples.map(sample => [sample[k], k] as const)
  ).flat(1)
}
const experiment = (N: number, times?: number) => {
  times = times ?? N ** 2
  const original = [...Array(N).keys()]
  const samples = Array.from(Array(times), () => [...original].shuffle())
  const pairs = NumberPosition(original, samples)
}
Both methods work: the first looks up each number's position with indexOf, while the commented-out one reads off which number sits at each position. The former seems more "understandable", and we don't care about performance at all.
Now we count the pairs. We need a Map<[number, number], number> for that, but there is a problem:
const m = new Map<[number, number], number>()
m.set([0, 0], 1)
m.set([0, 0], 2)
console.log(m)
> Map(2) { [ 0, 0 ] => 1, [ 0, 0 ] => 2 }
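This happens because Map compares object keys by reference, and every [0, 0] literal is a brand-new array. A minimal alternative sketch (not what we use below) would serialize each pair into a string key:
const m2 = new Map<string, number>()
const keyFor = ([n, k]: readonly [number, number]) => `${n},${k}`
m2.set(keyFor([0, 0]), 1)
m2.set(keyFor([0, 0]), 2)
console.log(m2)
> Map(1) { '0,0' => 2 }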
To make things cool, we instead use a pool, which is a [number, number][][], to keep one canonical reference per pair:
const map = new Map<readonly [number, number], number>()
const pool = original.map(
  n => original.map((_, k) => [n, k] as const)
)
const keyOf = (pair: readonly [number, number]) =>
  pool[pair[0]][pair[1]]
for (const pair of pairs) {
  const key = keyOf(pair)
  map.set(key, (map.get(key) ?? 0) + 1)
}
All these as const are there for a reason: without them, TypeScript infers number[] instead of readonly [number, number], and the key types no longer line up.
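A quick illustration (a hypothetical snippet, not part of the script):
const plain = [0, 0]           // inferred as number[]
const fixed = [0, 0] as const  // inferred as readonly [0, 0]
const demo = new Map<readonly [number, number], number>()
demo.set(fixed, 1)   // fine
// demo.set(plain, 1) // error: number[] is not assignable to readonly [number, number]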
Now we have the stats, and we sort them by count:
return Array.from(map.entries())
  .sort(([, a], [, b]) => b - a)
Now the whole script looks like:
declare global {
  interface Array<T> {
    shuffle: () => T[]
  }
}

Array.prototype.shuffle = function <T>(this: T[]) {
  return this.sort(() => Math.random() - 0.5)
}

const experiment = (N: number, times?: number) => {
  times = times ?? N ** 2
  const original = [...Array(N).keys()]
  const samples = Array.from(Array(times), () => [...original].shuffle())
  const pairs = original.map(
    n => samples.map(sample => [n, sample.indexOf(n)] as const)
    // (n, k) => samples.map(sample => [sample[k], k] as const)
  ).flat(1)
  const map = new Map<readonly [number, number], number>()
  const pool = original.map(n => original.map((_, k) => [n, k] as const))
  const keyOf = (pair: readonly [number, number]) => pool[pair[0]][pair[1]]
  for (const pair of pairs) {
    const key = keyOf(pair)
    map.set(key, (map.get(key) ?? 0) + 1)
  }
  return Array.from(map.entries()).sort(([, a], [, b]) => b - a)
}

export { }
So now let's try something easy:
console.table(experiment(3, 65536))
and the result:
┌─────────┬──────────┬───────┐
│ (index) │ 0 │ 1 │
├─────────┼──────────┼───────┤
│ 0 │ [ 1, 1 ] │ 45117 │
│ 1 │ [ 2, 2 ] │ 32746 │
│ 2 │ [ 0, 0 ] │ 28609 │
│ 3 │ [ 0, 2 ] │ 24666 │
│ 4 │ [ 2, 0 ] │ 24632 │
│ 5 │ [ 1, 0 ] │ 12295 │
│ 6 │ [ 0, 1 ] │ 12261 │
│ 7 │ [ 2, 1 ] │ 8158 │
│ 8 │ [ 1, 2 ] │ 8124 │
└─────────┴──────────┴───────┘
[1, 1] appears 45117 times and [2, 2] 32746 times, versus only 8124 for [1, 2] and 8158 for [2, 1]. That means some elements are much more likely to stay where they originally were: 45117/65536 ≈ 0.69 for element 1 staying at position 1, where a fair shuffle would give 1/3. Not a very good one.
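To put a number on it, here is a small hypothetical helper (names are my own) that divides each observed count by the uniform expectation times/N; for a fair shuffle every pair should score close to 1:
// ratio of observed count to the uniform expectation times/N
const biasRatios = (
  stats: (readonly [readonly [number, number], number])[],
  times: number,
  N: number
) => stats.map(([pair, count]) => [pair, count / (times / N)] as const)

// biasRatios(experiment(3, 65536), 65536, 3)
// => roughly [ [ [ 1, 1 ], 2.07 ], ..., [ [ 1, 2 ], 0.37 ] ]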
Let's try a larger array. For larger ones we only care about the first few and last few records, so let's apply a filter:
const endN = 4
console.table(
  experiment(40, 100000)
    .filter((_, k, a) => k < endN || a.length - k < endN)
)
┌─────────┬────────────┬──────┐
│ (index) │ 0 │ 1 │
├─────────┼────────────┼──────┤
│ 0 │ [ 0, 0 ] │ 7031 │
│ 1 │ [ 0, 1 ] │ 6308 │
│ 2 │ [ 30, 39 ] │ 4650 │
│ 3 │ [ 3, 0 ] │ 4624 │
│ 4 │ [ 1, 37 ] │ 772 │
│ 5 │ [ 1, 38 ] │ 579 │
│ 6 │ [ 1, 39 ] │ 378 │
└─────────┴────────────┴──────┘
About 10 times better: the top frequency is now roughly 0.07, meaning there is a probability of about 0.07 that 0 stays at position 0. A fair shuffle would give 1/40 = 0.025, so it is still noticeably biased.
Things are kept near where they were, which is typical of insertion sort: that is what the hybrid sorts fall back to when N is low.
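To see why, here is a sketch of my own (assuming the engine runs plain insertion sort on such a short slice) of insertion sort driven by a coin-flip comparator; an element moves k positions only if k flips in a row go its way, so large displacements are exponentially unlikely:
// insertion sort with a coin-flip comparator: each element keeps
// moving left only while successive flips succeed, so a displacement
// of k steps has probability about (1/2) ** k
const coinFlipInsertionSort = (arr: number[]): number[] => {
  const a = [...arr]
  for (let i = 1; i < a.length; i++) {
    let j = i
    // move a[j] left while the "comparator" (a fair coin) says so
    while (j > 0 && Math.random() < 0.5) {
      const t = a[j - 1]
      a[j - 1] = a[j]
      a[j] = t
      j--
    }
  }
  return a
}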
And a larger one, 1000. I have to run fewer iterations (down to 10000), or there is not enough memory for Node.js to use.
┌─────────┬──────────────┬────┐
│ (index) │ 0 │ 1 │
├─────────┼──────────────┼────┤
│ 0 │ [ 441, 0 ] │ 55 │
│ 1 │ [ 0, 4 ] │ 53 │
│ 2 │ [ 315, 1 ] │ 52 │
│ 3 │ [ 0, 3 ] │ 52 │
│ 4 │ [ 252, 2 ] │ 49 │
│ 5 │ [ 0, 10 ] │ 48 │
│ 6 │ [ 0, 13 ] │ 48 │
│ 7 │ [ 63, 4 ] │ 47 │
│ 8 │ [ 0, 9 ] │ 47 │
│ 9 │ [ 189, 3 ] │ 46 │
│ 10 │ [ 190, 999 ] │ 1 │
│ 11 │ [ 134, 999 ] │ 1 │
│ 12 │ [ 887, 999 ] │ 1 │
│ 13 │ [ 946, 999 ] │ 1 │
│ 14 │ [ 63, 999 ] │ 1 │
│ 15 │ [ 632, 999 ] │ 1 │
│ 16 │ [ 883, 999 ] │ 1 │
│ 17 │ [ 71, 999 ] │ 1 │
│ 18 │ [ 889, 999 ] │ 1 │
└─────────┴──────────────┴────┘
Not much data, but a stable pattern. 55/10000 by itself is not a big deal, but a 55:1 ratio between the most and least frequent pairs is still bad.
Finally, let's try a real Fisher-Yates and see how good it is. It swaps each element with a uniformly random element at or before it, which makes every permutation equally likely:
Array.prototype.shuffle = function <T>(this: T[]) {
  for (let i = this.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [this[i], this[j]] = [this[j], this[i]]
  }
  return this
}
You can tell from the above that I don't like semis, but I have to keep this one :-); without it, the next line would parse as Math.floor(...)[this[i], this[j]] = ..., indexing into the number instead of starting a new destructuring assignment.
and the result for 3:
┌─────────┬──────────┬──────┐
│ (index) │ 0 │ 1 │
├─────────┼──────────┼──────┤
│ 0 │ [ 2, 0 ] │ 3370 │
│ 1 │ [ 1, 2 ] │ 3369 │
│ 2 │ [ 0, 2 ] │ 3360 │
│ 3 │ [ 2, 1 ] │ 3359 │
│ 4 │ [ 0, 1 ] │ 3344 │
│ 5 │ [ 1, 0 ] │ 3334 │
│ 6 │ [ 1, 1 ] │ 3297 │
│ 7 │ [ 0, 0 ] │ 3296 │
│ 8 │ [ 2, 2 ] │ 3271 │
└─────────┴──────────┴──────┘
Looks good: the counts sum to 30000, so each of the 9 pairs should appear about 3333 times, and all of them are within roughly 2% of that.
and 40
┌─────────┬────────────┬──────┐
│ (index) │ 0 │ 1 │
├─────────┼────────────┼──────┤
│ 0 │ [ 39, 11 ] │ 2638 │
│ 1 │ [ 11, 11 ] │ 2636 │
│ 2 │ [ 38, 34 ] │ 2634 │
│ 3 │ [ 4, 36 ] │ 2633 │
│ 4 │ [ 20, 21 ] │ 2348 │
│ 5 │ [ 27, 25 ] │ 2348 │
│ 6 │ [ 32, 20 ] │ 2345 │
└─────────┴────────────┴──────┘
and 100
┌─────────┬────────────┬──────┐
│ (index) │ 0 │ 1 │
├─────────┼────────────┼──────┤
│ 0 │ [ 74, 70 ] │ 2168 │
│ 1 │ [ 55, 2 ] │ 2167 │
│ 2 │ [ 68, 74 ] │ 2164 │
│ 3 │ [ 50, 20 ] │ 2157 │
│ 4 │ [ 35, 54 ] │ 1830 │
│ 5 │ [ 3, 92 ] │ 1823 │
│ 6 │ [ 27, 69 ] │ 1794 │
└─────────┴────────────┴──────┘
The GC becomes unhappy when I increase the size further, due to the memory limit, and I am not willing to make the code GC-friendly :), but this is enough.
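If I were, a more memory-friendly sketch (hypothetical, names my own) would tally counts sample by sample instead of materializing every sample and pair first:
// tally (number, position) pairs one sample at a time, so only the
// counts table stays alive; encoding each pair as n * N + k also
// avoids the tuple-key pool entirely
const experimentStreaming = (N: number, times: number) => {
  const counts = new Map<number, number>()
  const original = [...Array(N).keys()]
  for (let t = 0; t < times; t++) {
    const sample = [...original].shuffle()
    sample.forEach((n, k) => {
      const key = n * N + k
      counts.set(key, (counts.get(key) ?? 0) + 1)
    })
  }
  return Array.from(counts.entries())
    .map(([key, count]) => [[Math.floor(key / N), key % N], count] as const)
    .sort(([, a], [, b]) => b - a)
}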