Fran C. for Factorial

Posted on Mar 16, 2020

Some reasons to avoid Hashes in Ruby and some alternatives to them.

#ruby

So, hashes, uh?

What's good about them?

They are easy to build
They are cheap (short-term cheap at least)
They help interfacing with shapeless data (like JSON) in an easy way.
They are good to build simple/cheap key-value stores in memory.

What's not so good about them?

They are shapeless (in a bad way, you never know what you expect from them)
They are mutable (in a bad way as well, how do you know nobody else will mutate this or nobody has already mutated it?)
They are dummy (unlike objects, they know nothing, they have no way to interface with except for Enumerable)
They are impossible to type (🤷‍♂️)
They need a lot of defensive code with nil checks. (even when they are getting better at this with things like hash.dig(:key1, :key2) it is not enough and you always need extra checks here and there, which makes the code difficult to read and to change).

So, what should we do with them?

I propose that we all agree on a small set of guidelines to help us answer the question "Should I use a Hash for this?". TL;DR

Avoid using Hashes as input to methods. Try to have shaped inputs.
Avoid using Hashes as return values. Try to have shaped return values.
If you receive a Hash, isolate it. Build a layer to interface with instead of the Hash.
- Get the data directly if it is a small hash.
- Build objects if it has an heterogeneous structure.

1. Avoid using hashes as input to methods.

There's no easy way to know what's inside params here..

MyInteractor.new(
  author: author,
  params: params
).call

Instead use this:

MyInteractor.new(
  author: author,
  value: params[:value],
  question_id: params[:question_id]
).call

You now have an explicit knowledge of the input for that interactor which is:

simpler to use (you don't need a Hash now, only values which can come from any object)
Intention revealing
Typable
Easier to document

Try to avoid substituting the Hash with a T::Struct or a plain Struct if you don't use sorbet. You will be adding an extra layer and having very little in return. You avoid the hash, which is good, but you add an extra layer and win very little in return. That does not justify a new layer.

Instead, type the parameters you're receiving individually with Sorbet. If you don't use sorbet, just avoid the struct and pass them individually.

You might think:

But I have A TON of parameters and a Hash is a nice and convenient way to
make my object easier to use!

It isn't. You're sacrificing a lot in exchange for almost nothing and your code will suffer for it.

Another reason to avoid this is that it can imply security holes. There's a very simple example which is mass assignment model attributes from the controller params.

# NOTE: This is not how our code works, this is an example of how NOT to do it
current_user = current_company
  .users
  .find(session[:current_user])

current_access.update!(params[:user])

Can you spot the issue? It is subtle but it is there. An attacker could just send a company_id and inject themselves on any company 😱. Not good.

This kind of issue is what made Rails end up adding the strong parameters API: https://guides.rubyonrails.org/action_controller_overview.html#strong-parameters but that isn't enough. Strong parameters are focused on avoiding mass assignment. But you massively pass params from a controller into any object as parameters you're exposed to this kind of problem as well.

In short: Be as explicit as possible with contracts. Pass unfurled parameters as a preference and type as much as you can.

2. Avoid using hashes as return values.

If whoever uses the code you write doesn't know what to expect from your return value, they'll have a hard time using you. Make it easy for them.

If you need a dummy object, use a T::Struct instead (or a Struct). They are very good for transferring data, they have a defined shape and they are highly typable.

3. If you receive a Hash, isolate it.

Let's give an example, you get a Hash from reading a YAML file, a JSON response from an API or even on a controller action.

This is, by far, the hardest to solve among all the cases because each one will need a different solution.

There are however some guidelines you can follow which help a bit:

1. Isolating hashes: Get their values

If you only need to get some keys from the Hash, e.g. reading the params on a controller. You're good to go. Slice the keys you need and then read them.

class MyController
  def create
    MyCreateInteractor.new(
      author: current_access,
      value: create_params[:value],
      description: create_params[:description]
    ).call.unwrap!
  end

  def create_params
    params.slice(:value, :description)
  end
end

Soon we will be able to use sorbet for this, which will give use shaped and typed params. Until that moment arrives, we need to deal with it.

2. Isolating hashes: Build a defensive layer

If the Hash has layers with more hashes inside of it and you need to extract data from it through Enumerable methods (find, map, select...) then wrap that hash on an object and interface with that object instead.

There's an example of this on our PdfParser library. Its configuration is a YAML file with a shape like this:

---
some_data_table:
  min_x: 55
  min_y: 111
  max_x: 532
  max_y: 167
  transform:
    type: table
    arguments:
      - name: column1
        transform: text
        max_x: 214
      - name: column2
        transform: text
        max_x: 372
      - name: column3
        transform: money_es
        max_x: 532

some_numeric_field:
  min_x: 217
  min_y: 81
  max_x: 265
  max_y: 99
  transform: money_es

Interfacing with a hash like this is problematic because you don't know what to expect. Have a look at the transform for example. On the numeric field, it only has a name. But on the table; it has a type, a list of arguments and more nested transforms.

Let's first understand the problem:

This file represents the format for a PDF file. A format has many different fields inside (some_numeric_field and some_data_table for example) in this case.
Each field has boundaries to where it is found on the PDF
Each field defines a transformation to be applied to it after reading the text on it. E.g.: money_es means that it is money formatted like 1.000,00 and it has to be transformed into a BigDecimal

How did we get over this? Interfacing with a Hash like this could have been really complex so we started by separating the parts that change from the parts that stay the same.

All fields have boundaries
All fields have a name
All fields have a transform

What is the changing part? The transform can be different per field so we defined something like this:

Field = Struct.new(:name, :boundaries, :transform)
Box = Struct.new(:min_x, :min_y, :max_x, :max_y)
Transform = Struct.new(:type, :arguments) do
  # @param args [Hash]
  # @option args [String, Hash] :transform the name of the transform or a
  #   Hash describing it
  # @option :transform [String] :type the name of the transform
  # @option :transform [Hash] :arguments a hash with the arguments for the
  #   transform
  def self.from_hash(args)
    type, arguments =
      case args[:transform]
      when String
        [args[:transform], nil]
      else
        [
          args[:transform][:type],
          args[:transform][:arguments]
        ]
      end

    new(type, arguments)
  end

With that we build objects from reading the file into a Hash like this:


# name is the key of the YAML definition for a field and arguments is a hash
# with its content.
read_file.map do |name, arguments|
  Field.new(
    name,
    Transform.from_hash(arguments),
    Boundaries.new(
      arguments[:min_x].to_i,
      arguments[:min_y].to_i,
      arguments[:max_x].to_i,
      arguments[:max_y].to_i
    )
  )
end

We only read from the hash in 1 place (when parsing the configuration).
We have a specific initializer for the case in which we need to pass a hash around (Transform.from_hash) and we use a case...when to map it into objects.

It is indeed more work but saves you from a lot of pain later on. Nobody will know this all came from a Hash, nobody will try to read a shapeless object and nobody will try to mutate it.

What are Hashes good for?

They are good as in-memory key-value stores. But even in that case, build a layer to isolate yourself from it. As an example:

We use this object to store an in-memory cache of the hierarchical relations between employees and their managers. Fetching them is costly and because of our permissions architecture, we do it a lot.

class HierarchyCache
  def initialize
    @cache = {}
  end

  # If a key has value, return it
  # if it doesn't store the return value of `block` and return the value
  def fetch(key, employee, &block)
    value = get(key, employee)

    return value unless value.nil?

    set(key, employee, block.call)
  end

  # Get the value stored on a key
  def get(key, employee)
    @cache.dig(employee.id, key)
  end

  # Sore a value for a key
  def set(key, employee, value)
    new_cache_entry = { key => value }

    if @cache[employee.id]
      @cache[employee.id].merge!(new_cache_entry)
    else
      @cache[employee.id] = new_cache_entry
    end

    get(key, employee)
  end
end

We use a hash here but we take care of not letting anybody know we do and most importantly, we don't leak how the hash is structured and we provide a programmatic API to interface it.

Summary (AKA TL;DR)

Avoid using Hashes as input to methods. Try to have shaped inputs.
Avoid using Hashes as return values. Try to have shaped return values.
If you receive a Hash, isolate it. Build a layer to interface with instead of the Hash.
- Get the data directly if it is a small hash.
- Build objects if it has a heterogeneous structure.
Use them as long as nobody can call [] or any other Enumerable method on them.

Last, but not least, don't serialize things into hashes if you intend to use them later on. If you find yourself using [] on different layers of your code check if you can enforce any of the previous points.

Summarized even more

Don't use them to build contracts (as inputs/outputs). No problem as intermediate values (as long as you isolate them well enough).

Top comments (3)

Augusts Bautra • Apr 22

Thanks for the writeup. Some additions:

Look for opportunities to use Set objects when inclusion in some group will be checked. It's as fast as hash key lookup.
Combat mutability by applying .freeze to hashes, especially ones in constants.
Consider using gem json-schema to specify the structure of JSON structures where Structs just won't cut it.

Ilya Sher • Mar 16 '20

But I have A TON of parameters

Red flag: bad design with high probability. Typically, functions/methods should not take many parameters.

Fran C. • Mar 16 '20

Agree 100% but... (you know, there's always a "but") Sometimes that's not something you can control or not as much as you would like to.

Something I forgot to add is that using a Hash for this doesn't fix the problem which is, as you pointed out, a design problem. It would only hide it and make it harder to fix.

DEV Community