Enums are a very useful concept. An enum is like a locked list of choices: only a few specific values are allowed, and nothing else. Enums work well anywhere you need a limited, known list of values, like:
- status (with values like "pending," "active," and "archived")
- role (with values like "admin," "user," and "guest")
- difficulty_level (with values like "easy," "medium," and "hard")
See docs on Rails' enum macro.
Enums allow more choices of values than booleans, and are more constrained than strings.
Using string columns for enum values is known to be too permissive. Eventually letter-case and stray-whitespace problems creep into the dataset. Sidestep these issues by using the appropriate data type: an enum!
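To make the string problem concrete, here is a minimal plain-Ruby sketch (no Rails involved) of how loosely written strings fragment what should be a single status. The sample values, the `ALLOWED_STATUSES` list and the helper names are all made up for illustration.

```ruby
# Hypothetical values as they might accumulate in a free-form string column.
RAW_STATUSES = ["pending", "Pending", "pending ", " PENDING", "active"].freeze

# The closed list of values the application actually means to support.
ALLOWED_STATUSES = %w[pending active archived].freeze

# Counting raw values treats every spelling variant as its own status.
def raw_counts(values)
  values.tally
end

# Stripping whitespace and downcasing reveals how many statuses were meant.
def normalized_counts(values)
  values.map { |value| value.strip.downcase }.tally
end

# With a closed list, anything that is not an exact member is rejected.
def rejected(values)
  values.reject { |value| ALLOWED_STATUSES.include?(value) }
end

puts raw_counts(RAW_STATUSES).size            # 5 distinct spellings
puts normalized_counts(RAW_STATUSES).inspect  # only two statuses were meant
puts rejected(RAW_STATUSES).inspect           # the three sloppy variants
```

Five spellings collapse into two intended statuses once normalized, which is the kind of cleanup an enum makes unnecessary.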
Let me present three approaches to defining enums in Rails.
Integer-based Enums (easy)
Integer-based enums are easy to define, use and extend:
# in migration
create_table :jobs do |t|
  t.integer :status, null: false, default: 0
end
# in model
enum status: { pending: 0, completed: 1, errored: 2 }
Adding a new possible status is easy: add new key-value pairs, but be sure not to change the existing mappings.
enum status: { pending: 0, completed: 1, errored: 2, processing: 3 }
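Why the existing mappings must stay put can be shown without Rails. The sketch below resolves already-stored integers against three versions of the mapping; `STORED_VALUES`, the `ALPHABETIZED` "tidy-up" and `labels_for` are hypothetical names for illustration only.

```ruby
# Integers already written to the database for three hypothetical jobs.
STORED_VALUES = [0, 1, 2].freeze

ORIGINAL = { pending: 0, completed: 1, errored: 2 }.freeze

# Safe: the new status takes an unused integer, old pairs are untouched.
APPENDED = { pending: 0, completed: 1, errored: 2, processing: 3 }.freeze

# Unsafe: a well-meant alphabetical re-sort reassigns every integer.
ALPHABETIZED = { completed: 0, errored: 1, pending: 2 }.freeze

# Resolve stored integers back to labels, similar to what reading an
# enum attribute does.
def labels_for(mapping, values)
  values.map { |value| mapping.key(value) }
end

p labels_for(ORIGINAL, STORED_VALUES)     # [:pending, :completed, :errored]
p labels_for(APPENDED, STORED_VALUES)     # [:pending, :completed, :errored]
p labels_for(ALPHABETIZED, STORED_VALUES) # [:completed, :errored, :pending]
```

The rows never changed, yet after the re-sort every job silently reports a different status.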
You can even skip some integers to have subgroups. Here we're placing errors in the 90s, and leaving the integers in between (3-90) free for possible additions.
enum status: {
  pending: 0, processing: 1, completed: 2,
  errored_hard: 91,
  errored_with_retry: 92
}
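One possible payoff of such subgroups is that group membership becomes a simple range check. A plain-Ruby sketch, assuming every error status is kept within 90..99; `ERROR_RANGE` and `error_statuses` are hypothetical names, not part of Rails:

```ruby
STATUSES = {
  pending: 0, processing: 1, completed: 2,
  errored_hard: 91,
  errored_with_retry: 92
}.freeze

# Assumption for this sketch: all error statuses live in the 90s.
ERROR_RANGE = (90..99).freeze

# Pick the status names whose integer falls inside the error subgroup.
def error_statuses(mapping)
  mapping.select { |_name, value| ERROR_RANGE.cover?(value) }.keys
end

p error_statuses(STATUSES) # [:errored_hard, :errored_with_retry]
```

At the database level the same grouping is a plain range comparison on the integer column.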
DB-level Enums (hard-er)
Postgres supports database-level enum definitions. This approach is easier to read (queries have human-readable values, not cryptic integers), but harder to maintain: changing values requires a database migration, not just a code change.
# in migration
create_enum :job_status, ["pending", "completed", "errored"]
create_table :jobs do |t|
  t.enum :status, null: false, default: "pending", enum_type: "job_status"
end
# in model
enum status: { pending: "pending", completed: "completed", errored: "errored" }
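The model mapping for a database-level enum repeats every value as both key and value. As a small plain-Ruby sketch, it could be derived from a single list instead; `JOB_STATUSES` and `STATUS_MAPPING` are hypothetical constants, and the result uses string keys rather than the symbol keys of the literal above.

```ruby
# Single source of truth for the labels (hypothetical constant).
JOB_STATUSES = %w[pending completed errored].freeze

# Build a hash where each label maps to itself.
STATUS_MAPPING = JOB_STATUSES.to_h { |value| [value, value] }.freeze

p STATUS_MAPPING
```

Keeping this list identical to the values passed to create_enum remains a manual step in this sketch.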
String Enums (discouraged)
If you need the flexibility of permitting new values without changes to code, such as user-defined types, and are OK with taking on the dataset pollution risk, then string enums can be an option.
This basically means using a plain string column, so there are very few native database-level constraints on the values users can write. I recommend adding CHECK constraints (for example, allowing only lowercase Latin letters and underscores) to keep some semblance of data integrity at the database level, plus a dynamic validation in app code so that forms can show validation errors.
# in migration
create_table :jobs do |t|
  t.string :status, null: false, default: "pending"
end
# in model, just define a validation
validate :validate_status_in_supported_list

def validate_status_in_supported_list
  return unless status_changed?

  # here the dynamic source of allowed values can be anything - database, remote requests, file read etc.
  allowed_statuses = SomeSource.allowed_statuses
  return if allowed_statuses.include?(status)

  errors.add(:status, :inclusion)
end
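The "lowercase Latin letters and underscores" rule suggested for the CHECK constraint can also be sketched in plain Ruby, for instance as an extra guard in app code. `STATUS_FORMAT` and `well_formed_status?` are hypothetical names, and the database constraint itself is not shown here.

```ruby
# Mirrors a rule like "only lowercase Latin letters and underscores".
# \A and \z anchor the whole string, so embedded newlines do not slip through.
STATUS_FORMAT = /\A[a-z_]+\z/

def well_formed_status?(value)
  value.is_a?(String) && value.match?(STATUS_FORMAT)
end

p well_formed_status?("pending")      # true
p well_formed_status?("errored_hard") # true
p well_formed_status?("Pending")      # false, uppercase letter
p well_formed_status?("pending ")     # false, trailing space
p well_formed_status?("")             # false, empty string
```

A format check like this only limits the shape of values; whether a value is actually supported is still decided by the dynamic validation above.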
Top comments (9)
I usually encourage using string-based enums unless there's a very good reason not to. I really don't like integer-based ones, because they are hard to reason about without having the application code at hand (for example, and usually, when querying the database directly).
DB-based enums are a nice middle ground, but they require a migration every time you add a new value to the enum (which I don't like), and down migrations are especially complicated.
Same here (also read that the Rails Core Team regrets integer-based enums; but don't quote me on that).
I like defining them like this too
enum :status, %w[enabled disabled].index_by(&:itself), default: "enabled"
I've been burnt too many times by fat-fingered developers/users managing to get bad data into the DB with update_all(state: "oopsie"), haha. Integer enums somewhat limit that.
Do they, really? What prevents these devs from doing update_all(state: 12) when you only have 5 valid states? I would even argue that when they update to complted it's somewhat easier to guess what they had in mind than with a pure wrong number.
I disagree. I'm convinced update_all(state: 12) for no such enum defined will raise an ArgumentError, whereas update_all(state: "word") opens the possibility of a developer misremembering something. Basically, the opaqueness of the integer enum becomes a sort of feature in that you have to open the code and see what values are defined, rather than winging it. :)
I completely agree with you
The only application for integer-based enums is when I really want to order things. Difficulty is a good example of it. If I want to sort by difficulty using the db, then it's okay.
But in real life my first intent will be to make a string-based enum and say that the order of my hash is the source of truth.
Difficulty is a nice counterpoint. Although what would you do if you now need to introduce a new enum value not at the end? Let's say you have:
... but now you need novice between easy and medium. Do you migrate all the data to have ... or perhaps you use the old trick? ;)
I took the example of Difficulty, but of course if it has to change I'll definitely go string-based!
Nono, sorting integer enums is a hack. I believe semantic ordering comes into play here, I believe I covered it in dev.to/epigene/til-custom-order-wi...