DEV Community

Pacharapol Withayasakpunt
Pacharapol Withayasakpunt

Posted on

Deserializable serialization of Anything, other than YAML. Also, how-to-hash.

Because yaml.load is actually dangerousLoad, and is as potentially dangerous as eval.

There is also another method in Python, pickle, and it can be potentially as dangerous as YAML, while being more concise, and harder to edit than YAML. Sadly, I know no such library in Node / JavaScript.

So, safer method is actually JSON, which is highly editable, and you can provide how-to on serialization / deserialization.

Also, in Python, serializing a non-serializable will throw an Error (while in JavaScript, will mostly be defaulted to {}, except for BigInt, for some reasons)

In Python, how-to is https://www.polvcode.dev/post/2019/09/custom-json, but I haven't done it for a while.

In JavaScript, it is that JSON.stringify has replacer and JSON.parse has reviver.

How to identify typeof Anything

Firstly, you have know all custom types, if you want to serialize it, which is mostly easily done by instanceof, but you cannot serialize with instanceof...

So, I have identify it, first with typeof, but second step is manual. (By identifying what is serializable, otherwise it is not.)

// These are what are returned from typeof

export type TypeNativeSerializable = 'string' | 'number' | 'boolean'
export type TypeNativeNonSerializable = 'bigint' | 'symbol' | 'undefined' | 'function' | 'object'

// These are what aren't returned, but do have syntactic meaning. Have to be derived.

export type TypeExtra = 'Null' | 'Array' | 'Named' | 'Constructor' | 'NaN' | 'Infinity
Enter fullscreen mode Exit fullscreen mode

And, I identify what will be hard to serialize, with some meanings.

  const specialTypes: Record<TypeExtra | TypeNativeNonSerializable, any[]> = {
    Null: [null],
    NaN: [NaN],
    Named: [new NamedClassWithMethods(), new NamedClassWithoutMethods()],
    Infinity: [Infinity, -Infinity],
    Array: [new Array(5)],
    Constructor: [NamedClassWithMethods, NamedClassWithoutMethods, Array, NamedArray],
    bigint: [BigInt(900719925474099133333332)],
    symbol: [Symbol('hello')],
    undefined: [undefined],
    object: [{ a: 1 }],
    function: [function fnLiteral (a: any) { return a }, (b: any) => b]
  }
Enter fullscreen mode Exit fullscreen mode

And now, type identification -- the hard topic here is how-to-check-if-a-javascript-function-is-a-constructor...

Serialization

To cut things short, I have identified how-to-serialize for most native objects in my library. There is no dependencies, and works in both browser and Node (but I haven't polyfill / shim for older browsers yet.)

GitHub logo patarapolw / any-serialize

Serialize any JavaScript objects, as long as you provides how-to. I have already provided Date, RegExp and Function.

But I disable undefined serialization by default (i.e. undefined is excluded by default), but you might want to enable it. (I did so in my test.)

Most serialization is done by both .toString() and caching typeof Object.

RegExp objects are a little special. .toString() is hard to reconstruct, so I use RegExp#source and RegExp#flags instead.

Hashing

There are some problematic topics here.

  • JSON.stringify doesn't reliably sort keys.
  • You cannot supply both function replacer and sorter to JSON.stringify
  • How do I hash functions and classes
  • Symbols should always be unique.
  • Key collision

I have provided how to sort keys without a library, via JSON.stringify, with Array as the second argument. Only that you have to cache all Object keys, including nested.

    const clonedObj = this.deepCloneAndFindAndReplace([obj])[0]

    const keys = new Set<string>()
    const getAndSortKeys = (a: any) => {
      if (a) {
        if (typeof a === 'object' && a.constructor.name === 'Object') {
          for (const k of Object.keys(a)) {
            keys.add(k)
            getAndSortKeys(a[k])
          }
        } else if (Array.isArray(a)) {
          a.map((el) => getAndSortKeys(el))
        }
      }
    }
    getAndSortKeys(clonedObj)
    return this.stringifyFunction(clonedObj, Array.from(keys).sort())
Enter fullscreen mode Exit fullscreen mode

I also deepCloneAndFindAndReplace the Object here, both to "supply both function replacer and sorter to JSON.stringify" and to prevent original Object modification on replace.

How do I hash functions and classes

For functions, I replace all whitespaces, but a proper and better way is probably to minify to stringified code. (Didn't put in my code to avoid adding dependencies.)

export const FullFunctionAdapter: IRegistration = {
  key: 'function',
  toJSON: (_this) => _this.toString().trim().replace(/\[native code\]/g, ' ').replace(/[\t\n\r ]+/g, ' '),
  fromJSON: (content: string) => {
    // eslint-disable-next-line no-new-func
    return new Function(`return ${content}`)()
  }
}
Enter fullscreen mode Exit fullscreen mode

For classes, you will need to objectify it.

/**
 * https://stackoverflow.com/questions/34699529/convert-javascript-class-instance-to-plain-object-preserving-methods
 */
export function extractObjectFromClass (o: any, exclude: string[] = []) {
  const content = {} as any

  Object.getOwnPropertyNames(o).map((prop) => {
    const val = o[prop]
    if (['constructor', ...exclude].includes(prop)) {
      return
    }
    content[prop] = val
  })

  return o
}
Enter fullscreen mode Exit fullscreen mode

It is possible to hash without a library. You just need to know the code.

/**
 * https://stackoverflow.com/questions/7616461/generate-a-hash-from-string-in-javascript
 *
 * https://stackoverflow.com/a/52171480/9023855
 *
 * @param str
 * @param seed
 */
export function cyrb53 (str: string, seed = 0) {
  let h1 = 0xdeadbeef ^ seed; let h2 = 0x41c6ce57 ^ seed
  for (let i = 0, ch; i < str.length; i++) {
    ch = str.charCodeAt(i)
    h1 = Math.imul(h1 ^ ch, 2654435761)
    h2 = Math.imul(h2 ^ ch, 1597334677)
  }
  h1 = Math.imul(h1 ^ h1 >>> 16, 2246822507) ^ Math.imul(h2 ^ h2 >>> 13, 3266489909)
  h2 = Math.imul(h2 ^ h2 >>> 16, 2246822507) ^ Math.imul(h1 ^ h1 >>> 13, 3266489909)
  return 4294967296 * (2097151 & h2) + (h1 >>> 0)
}
Enter fullscreen mode Exit fullscreen mode
  • Preventing key collision
  • Symbols are always unique

This is as simple as Math.random().toString(36).substr(2), but you might use a proper UUID.

Deserialization isn't always safe, but it isn't required for hashing

In the end, it is the same as YAML and pickle, so you have to choose what to be deserialize properly.

I exclude Function deserialization by default, by removing fromJSON method.

export const WriteOnlyFunctionAdapter: IRegistration = {
  ...FullFunctionAdapter,
  fromJSON: null
}
Enter fullscreen mode Exit fullscreen mode

If you are only needing it for MongoDB, you don't need a library at all

Because the code is here. https://gist.github.com/patarapolw/c9fc59e71695ce256b442f36b93fd2dc

const cond = {
  a: new Date(),
  b: /regexp/gi
}

const r = JSON.stringify(cond, function (k, v) {
  const v0 = this[k]
  if (v0) {
    if (v0 instanceof Date) {
      return { $date: v0.toISOString() }
    } else if (v0 instanceof RegExp) {
      return { $regex: v0.source, $options: v0.flags }
    }
  }
  return v
})

console.log(r)

console.log(JSON.parse(r, (_, v) => {
  if (v && typeof v === 'object') {
    if (v.$date) {
      return new Date(v.$date)
    } else if (v.$regex) {
      return new RegExp(v.$regex, v.$options)
    }
  }
  return v
}))
Enter fullscreen mode Exit fullscreen mode

Summary

This library has no dependencies, and tested for most native objects.

GitHub logo patarapolw / any-serialize

Serialize any JavaScript objects, as long as you provides how-to. I have already provided Date, RegExp and Function.

The demo is here https://patarapolw.github.io/any-serialize/, and tested that this can be hashed.

const obj = {
  a: new Date(),
  r: /^hello /gi,
  f: (a, b) => a + b,
  s: new Set([1, 1, 'a']),
  c: new XClass(),
  miscell: [
    NaN,
    Infinity,
    BigInt(900719925474099133333332),
    function fnLiteral (a) { return a }
  ]
}
Enter fullscreen mode Exit fullscreen mode

Top comments (0)