This was originally posted on my personal blog.
Ruby comes distributed with a vast standard library. We usually use only a fraction of it. Everyone knows date, set, json, maybe even csv. But there are many more things hidden there.
Some time ago a discussion somewhere (Reddit perhaps?) prompted me to take a deep plunge into the Ruby stdlib, and in this post I describe what I found. Not all of these things were new to me; some of them were simply forgotten. I chose the ones I found most entertaining, interesting or standing out in any other way.
While reading through those and asking yourself "why would I use it in a web app?", please bear in mind that Ruby was not designed to be a language powering one of the most important web frameworks in history. Things listed here are more suitable for system scripting etc.
Parsing command-line options
These days when we write a Ruby script it usually comes as a Rake task. But even if it's a standalone file, it is usually steered in a similar way: via environment variables or just by positional parameters accessed via the ARGV array. However, in stdlib we can find two libraries for handling more complex input.
GetoptLong
One of them is GetoptLong. Let's see it in action:
require 'getoptlong'

opts = GetoptLong.new(
  [ '--url', GetoptLong::REQUIRED_ARGUMENT ],
  [ '--count', '-c', GetoptLong::OPTIONAL_ARGUMENT ],
  [ '--verbose', GetoptLong::NO_ARGUMENT ]
)

opts.each do |option, value|
  p [option, value]
end
As you see, I defined three options:
- url, which requires an argument
- count, whose argument is optional
- verbose, which serves as a flag
After that there is code that prints the name and value of each option. So when I test it with ruby getoptlong.rb -c 5 --verbose --url http://github.com I get:
["--count", "5"]
["--verbose", ""]
["--url", "http://github.com"]
There are a few interesting quirks here. For example, if I omit url entirely, nothing happens. Only if I use it as a flag (ruby getoptlong.rb --url) do I get an exception. Also, if I use an option that is not defined, it throws an error as well.
You can find docs for GetoptLong here.
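The quirks described above can be sketched in a few lines. This is only an illustration: the parse helper is hypothetical, it overwrites ARGV to simulate different command lines, and it sets quiet so that GetoptLong does not print its own error messages.

```ruby
require 'getoptlong'

# Hypothetical helper: simulates a command line by replacing ARGV,
# then returns the parsed options or the class name of the error raised.
def parse(argv)
  ARGV.replace(argv)
  opts = GetoptLong.new(['--url', GetoptLong::REQUIRED_ARGUMENT])
  opts.quiet = true # keep GetoptLong from writing to stderr

  result = {}
  opts.each { |option, value| result[option] = value }
  result
rescue GetoptLong::MissingArgument, GetoptLong::InvalidOption => e
  { error: e.class.name }
end

p parse(['--url', 'http://github.com']) # option with its argument
p parse([])                             # option omitted: no error
p parse(['--url'])                      # option used as a flag
p parse(['--nope'])                     # undefined option
```

Rescuing these two exception classes is one way to turn the errors into a friendlier message in a real script.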
OptionParser
This solution is much more robust and advanced. Let's see it in action with a similar example:
require 'optparse'

OptionParser.new do |opts|
  opts.banner = 'OptionParser example script'

  opts.on('--url URL') do |url|
    puts "url: #{url}"
  end

  opts.on('-c N', '--count N') do |n|
    puts "#{n} times"
  end

  opts.on('--verbose') do
    puts 'Verbose mode ON'
  end

  opts.on('-h', '--help') do
    puts opts
  end
end.parse!
The code is much more idiomatic here. The result is as expected. Behaviour regarding extra options etc. is the same as with GetoptLong. One thing we get for (almost) free here is a help message. Try it with ruby optparse.rb -h:
OptionParser example script
        --url URL
    -c, --count N
        --verbose
    -h, --help
But there's much more to OptionParser than that - coercing types, something called conventions etc. Read more in the docs.
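As a taste of type coercion, here is a minimal sketch. It assumes a hypothetical options hash for collecting the results and parses a hard-coded argument list instead of ARGV, so it can be run as is. Passing Integer as an extra argument to on asks OptionParser to convert the value before it reaches the block.

```ruby
require 'optparse'

# Hypothetical container for parsed values, with defaults.
options = { count: 1, verbose: false }

parser = OptionParser.new do |opts|
  # Integer makes OptionParser convert "5" into 5 before calling the block.
  opts.on('-c N', '--count N', Integer) { |n| options[:count] = n }
  opts.on('--verbose') { options[:verbose] = true }
end

# parse returns whatever is left after the options have been consumed.
rest = parser.parse(%w[-c 5 --verbose file.txt])

p options
p rest
```

If the value cannot be converted (for example -c abc), OptionParser raises OptionParser::InvalidArgument instead of calling the block.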
Simple persistent key-value store
When we, Ruby developers, think about a key-value store, we usually think of some kind of server-based solution, such as Redis or Riak. However, when writing a simple application it's usually more reasonable to use an embedded store. Lately, RocksDB from Facebook became famous as one such solution. But with Ruby, we are lucky to have an embedded key-value store right in the standard library.
And, there's more... It's not one KV store. It's three of them: DBM, GDBM and SDBM. They are really similar to one another, so I will only quickly outline differences:
- DBM relies on what's installed on your system. It can use many things under the hood and most of the time it will be incompatible between different machines (or even on the same machine when the system configuration changes). Therefore it's not well-suited for persistent storage but is good for temporary applications.
- GDBM is based on one particular implementation of a KV store called, not surprisingly, GDBM. The aforementioned DBM may, in some cases, choose to use GDBM as its underlying storage. It should be compatible between different systems.
- SDBM's code, contrary to previous ones, is shipped with Ruby, so it should be same for all machines.
How do we use it? For example with SDBM (because we don't need to install anything extra to have it):
require 'sdbm'

SDBM.open 'fruits' do |db|
  db['apple'] = 'fruit'
  db['pear'] = 'fruit'
  db['carrot'] = 'vegetable'
  db['tomato'] = 'vegetable'

  db.update('peach' => 'fruit', 'tomato' => 'fruit')

  db.each do |key, value|
    puts "Key: #{key}, Value: #{value}"
  end
end
This creates two files in the current directory. fruits.dir is empty (I really don't know why), but the real data is in fruits.pag. You can peek into it with hexdump -C fruits.pag:
00000000 08 00 fb 03 f6 03 f2 03 ed 03 e7 03 de 03 d8 03 |................|
00000010 cf 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000003c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 76 |...............v|
000003d0 65 67 65 74 61 62 6c 65 74 6f 6d 61 74 6f 76 65 |egetabletomatove|
000003e0 67 65 74 61 62 6c 65 63 61 72 72 6f 74 66 72 75 |getablecarrotfru|
000003f0 69 74 70 65 61 72 66 72 75 69 74 61 70 70 6c 65 |itpearfruitapple|
00000400
The data is actually there.
Usefulness of this solution is probably quite limited. You can use it when you want to persist some state between script runs. Or when you really care about memory. Having some big hashes loaded in RAM all the time can slow down your program. With (S/G)DBM you can dump data which is unused for a while to disk and pick it up later when you need it.
Persisting a whole object hierarchy with PStore
Speaking of persisting... In examples above we could only use strings. That's ok in many cases, but not always. What if you want to save part of your application state - with objects, their states, and relations?
Ruby stdlib has you covered! PStore is exactly what you are looking for. In this example we are going to create some very simple Finite-State-Machine-like structure with states connected via named edges to each other:
class State
  def initialize(name)
    @name = name
    @edges = {}
  end

  def connect_to(word, state)
    @edges[word] = state
  end

  def traverse(indent = 0)
    tab = " " * indent
    puts "#{tab}State #{@name}:"
    @edges.each do |word, state|
      puts "#{tab} '#{word}':"
      state.traverse(indent + 4)
    end
  end
end
The traverse method simply displays connections from the start node to the end (watch out, we don't handle loops!). So now let's create some structure and traverse it:
s0 = State.new('start')
s1 = State.new('first')
s2 = State.new('second')
s3 = State.new('third')
s4 = State.new('fourth')
s0.connect_to('aa', s1)
s0.connect_to('aaa', s2)
s1.connect_to('b', s3)
s3.connect_to('c', s4)
s2.connect_to('d', s4)
s0.traverse
What we got is:
State start:
 'aa':
    State first:
     'b':
        State third:
         'c':
            State fourth:
 'aaa':
    State second:
     'd':
        State fourth:
Now let's save it using PStore to a file on disk:
require "pstore"

storage = PStore.new('fsm.pstore')
storage.transaction do
  storage['start'] = s0
end
And then, in a different script, we load and traverse:
class State
  # omitting, definition same as above
end

require "pstore"

storage = PStore.new('fsm.pstore')
storage.transaction do
  start = storage['start']
  start.traverse
end
And the output is exactly the same! If you're curious, like me, you can peek into the fsm.pstore file using hexdump again:
00000000 04 08 7b 06 49 22 0a 73 74 61 72 74 06 3a 06 45 |..{.I".start.:.E|
00000010 54 6f 3a 0a 53 74 61 74 65 07 3a 0a 40 6e 61 6d |To:.State.:.@nam|
00000020 65 49 22 0a 73 74 61 72 74 06 3b 00 54 3a 0b 40 |eI".start.;.T:.@|
00000030 65 64 67 65 73 7b 07 49 22 07 61 61 06 3b 00 54 |edges{.I".aa.;.T|
00000040 6f 3b 06 07 3b 07 49 22 0a 66 69 72 73 74 06 3b |o;..;.I".first.;|
00000050 00 54 3b 08 7b 06 49 22 06 62 06 3b 00 54 6f 3b |.T;.{.I".b.;.To;|
00000060 06 07 3b 07 49 22 0a 74 68 69 72 64 06 3b 00 54 |..;.I".third.;.T|
00000070 3b 08 7b 06 49 22 06 63 06 3b 00 54 6f 3b 06 07 |;.{.I".c.;.To;..|
00000080 3b 07 49 22 0b 66 6f 75 72 74 68 06 3b 00 54 3b |;.I".fourth.;.T;|
00000090 08 7b 00 49 22 08 61 61 61 06 3b 00 54 6f 3b 06 |.{.I".aaa.;.To;.|
000000a0 07 3b 07 49 22 0b 73 65 63 6f 6e 64 06 3b 00 54 |.;.I".second.;.T|
000000b0 3b 08 7b 06 49 22 06 64 06 3b 00 54 40 13 |;.{.I".d.;.T@.|
000000be
Useful? Perhaps not, but maybe? I can see the potential to save a state of some simple game this way, for example.
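As a rough sketch of that game-state idea: the Player struct and the file name below are made up for illustration, and the store lives in a temporary directory so the example cleans up after itself. It also shows that transaction accepts true to open the store read-only.

```ruby
require 'pstore'
require 'tmpdir'

# Hypothetical game state, just enough to have nested data.
Player = Struct.new(:name, :inventory)

loaded = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'game.pstore')

  # Write in a regular (read-write) transaction.
  PStore.new(path).transaction do |store|
    store[:player] = Player.new('hero', %w[sword map])
  end

  # Read it back through a fresh PStore, as a second script would.
  # Passing true makes the transaction read-only.
  PStore.new(path).transaction(true) do |store|
    loaded = store[:player]
  end
end

puts "#{loaded.name} carries #{loaded.inventory.join(', ')}"
```

Just like in the two-script example above, the class of the stored object has to be defined before the data is loaded.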
Observer pattern
Ruby's Observable was actually covered in the first (?) book from which I learned Ruby back in 2008 (?). So it's not new to me, but it's worth a reminder that we have such a thing built in. It can actually make the code cleaner in some cases.
To illustrate how it works, I'm going to implement yet another FizzBuzz (it will be a bit incorrect though, because it will print the number every time):
require 'observer'

class Incrementor
  include Observable

  def initialize
    @number = 0
  end

  def runto(num)
    loop do
      @number += 1
      changed # note this!
      print "#{@number} "
      notify_observers(@number)
      puts ""
      break if @number >= num
    end
  end
end

class FizzObserver
  def update(num)
    print "Fizz" if num % 3 == 0
  end
end

class BuzzObserver
  def update(num)
    print "Buzz" if num % 5 == 0
  end
end

inc = Incrementor.new
inc.add_observer(FizzObserver.new)
inc.add_observer(BuzzObserver.new)
inc.runto(30)
If you run this code, you'll see it works. There are just two things to remember: call changed to indicate that the object has changed, and call notify_observers when you want to emit new values.
Why useful? You can abstract some things (such as logging) outside of your main class. Note, however, that abusing it will lead to callback hell, which would be hard to debug and understand. Just like ActiveRecord callbacks.
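To make the logging idea a bit more concrete, here is a small sketch. Importer and MemoryLogger are hypothetical names invented for this example; the logger collects lines in an array instead of printing them, so the main class does not need to know anything about logging.

```ruby
require 'observer'

# Hypothetical worker: does its job and announces each finished step.
class Importer
  include Observable

  def import(rows)
    rows.each do |row|
      changed # without this call notify_observers sends nothing
      notify_observers(:imported, row)
    end
  end
end

# Hypothetical observer: keeps log lines in memory.
class MemoryLogger
  attr_reader :lines

  def initialize
    @lines = []
  end

  # Called by notify_observers with the same arguments it received.
  def update(event, row)
    @lines << "#{event}: #{row}"
  end
end

logger = MemoryLogger.new
importer = Importer.new
importer.add_observer(logger)
importer.import(%w[a b])

puts logger.lines
```

Swapping MemoryLogger for an object that writes to a file would not require touching Importer at all.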
DRb
DRb or dRuby is a real gem in the standard library. Described simply as a "distributed object system for Ruby", it can give you a lot of fun. To see it live, I decided to go with something really useful: a service that prints a random number from 0 to @max_num every @interval seconds. Here is the code, with DRb included:
require 'drb/drb'

class RandomService
  def initialize
    set_max_num(100)
    set_interval(1)
  end

  def run
    while @should_stop.nil?
      puts rand(@max_num)
      sleep(@interval)
    end
  end

  def set_max_num(num)
    @max_num = num
  end

  def set_interval(time)
    @interval = time
  end

  def stop!
    @should_stop = true
  end
end

service = RandomService.new
DRb.start_service('druby://localhost:9394', service)
service.run
The class itself is really straightforward and I'm not going into details about it. The only (hopefully) unfamiliar thing here is the call to DRb, where we wrap our service in the dRuby protocol. Basically, what it does is expose our interface on localhost on port 9394. With that in mind, I recommend starting the service and splitting your terminal in two (iTerm can do it on Mac, I recommend Tilix for Linux).
Now that we have our little service running, fire up irb in the second terminal and type:
irb(main):001:0> require 'drb/drb'
=> true
irb(main):002:0> service = DRbObject.new_with_uri('druby://localhost:9394')
=> #<DRb::DRbObject:0x007fd51a8072c0 @uri="druby://localhost:9394", @ref=nil>
When it's done, you can start to play by calling methods on service. Decrease the interval to 0.1, set max_num to 1000, whatever you want. Finally, stop the show by running service.stop!. Everything you do is reflected immediately in the service, which runs in a completely different process in a different terminal! Needless to say, you can also do it over the network, if you wish.
You may think right now that this is just a nice toy. But I've actually seen things like that used in practice. Probably the most notable example was an IRC bot where, from a Ruby console, you could do many things, from temporarily adding admins to an array usually populated on start (so, no downtime for a restart required!) to defining completely new methods and commands to test them out before actually putting them in the code. I can also imagine exposing such an interface to, for example, manipulate the size of some worker pool etc. Actually, the sky is pretty much the limit here.
Other
There are many more things in stdlib. I'm going to mention a few of them, but without such detailed descriptions.
tsort
I had a bit of trouble understanding what tsort is really for. What it does is a topological sorting of directed acyclic graphs. If this sounds pretty specific, that's because it is. This kind of sorting is mostly useful in dependency sorting, when you have a graph of dependencies (A depends on B and C, B depends on D, E depends on A) and you need to determine an order of installing those dependencies, so that every item has its dependencies already installed when being installed.
There is a great article by Lawson Kurtz explaining how it's used in Bundler.
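The dependency example from above can be sketched with TSort.tsort, which takes two callables: one that yields every node and one that yields the children of a given node. The deps hash is an assumption made for this sketch (C and D are taken to have no dependencies of their own).

```ruby
require 'tsort'

# Each key depends on the items listed in its value.
deps = {
  'A' => %w[B C],
  'B' => %w[D],
  'C' => [],
  'D' => [],
  'E' => %w[A]
}

# TSort needs to know how to walk all nodes and the children of one node.
each_node  = ->(&block) { deps.each_key(&block) }
each_child = ->(node, &block) { deps[node].each(&block) }

# Dependencies always come before the things that depend on them.
order = TSort.tsort(each_node, each_child)

puts order.join(' -> ')
```

If the graph contained a cycle, tsort would raise TSort::Cyclic instead of returning an order.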
Math
Some math-related classes in the Ruby standard library:
- Matrix has methods for matrix operations, such as (but not limited to) conjugate, determinant, eigensystem, inverse and many more (see the docs)
- Prime represents an infinite set of all prime numbers. You don't need to implement the sieve of Eratosthenes yourself!
- [sidenote] I was surprised that there is no Complex class in stdlib, especially after I learned that it used to be there, but was removed. It turns out that it actually made it to core (so it is automatically required). Check this out by firing up your irb and writing (2 + 3i) * (-6i) (spoiler: it won't be a NameError because of an undefined i)
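A quick tour of all three, as a sketch only: the matrix values are arbitrary and picked so the results are easy to check by hand.

```ruby
require 'matrix'
require 'prime'

m = Matrix[[1, 2], [3, 4]]

det = m.determinant # 1*4 - 2*3
inv = m.inverse     # entries come back as Rationals

first_primes = Prime.first(5) # Prime is enumerable, so first works
is_prime     = Prime.prime?(97)

# Complex needs no require; the i suffix builds an imaginary literal.
product = (2 + 3i) * (-6i)

puts "determinant:  #{det}"
puts "m * inverse:  #{m * inv}"
puts "first primes: #{first_primes.inspect}"
puts "97 prime?     #{is_prime}"
puts "product:      #{product}"
```

Multiplying m by its inverse is a cheap sanity check: the result should be the identity matrix.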
abbrev
This is probably more of a toy than a really useful tool, but in case you need it, it's there. The Abbrev module has one method, abbrev, that takes a list of strings and returns the possible non-ambiguous abbreviations. For example:
require 'abbrev'
Abbrev.abbrev(%w[ruby rubic russia])
#=> {"ruby"=>"ruby", "rubic"=>"rubic", "rubi"=>"rubic", "russia"=>"russia", "russi"=>"russia", "russ"=>"russia", "rus"=>"russia"}
So, you know you can't use ru as an abbreviation.
zlib
Last but not least, there is zlib. To quote:
Zlib is designed to be a portable, free, general-purpose, legally unencumbered – that is, not covered by any patents – lossless data-compression library for use on virtually any computer hardware and operating system.
For me, it sounds quite good. Compared to gzip:
The zlib format was designed to be compact and fast for use in memory and on communications channels. The gzip format was designed for single-file compression on file systems, has a larger header than zlib to maintain directory information, and uses a different, slower check method than zlib.
So zlib could actually be a good choice to reduce overhead when you send something over the network. To check it, I took Pride and Prejudice from Gutenberg and checked how it can be compressed:
require 'zlib'
source = File.read('path/to/pride-and-prejudice.txt')
compressed = Zlib::Deflate.deflate(source)
decompressed = Zlib::Inflate.inflate(compressed)
puts "Source size: #{source.bytesize}"
puts "Compressed: #{compressed.bytesize}"
puts "Decompressed: #{decompressed.bytesize}"
puts "Compression: #{(1 - (compressed.bytesize.to_f / source.bytesize)).round(4)}"
The result was:
Source size: 724725
Compressed: 260549
Decompressed: 724725
Compression: 0.6405
I'd say that's pretty impressive!
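The header difference from the quote can also be seen directly. This is a small sketch that uses a made-up repeated sentence as input (not the book itself) and compresses it once in zlib format and once in gzip format with Zlib.gzip.

```ruby
require 'zlib'

# Made-up sample input; any reasonably long string would do.
text = 'It is a truth universally acknowledged. ' * 200

deflated = Zlib::Deflate.deflate(text) # zlib format
gzipped  = Zlib.gzip(text)             # gzip format, same algorithm inside

puts "Source: #{text.bytesize} bytes"
puts "zlib:   #{deflated.bytesize} bytes"
puts "gzip:   #{gzipped.bytesize} bytes"

# Both formats are lossless, so both restore the original text.
restored_from_zlib = Zlib::Inflate.inflate(deflated)
restored_from_gzip = Zlib.gunzip(gzipped)
```

The gap between the two sizes is a fixed handful of bytes of header and trailer, so it matters mostly when you send many small payloads.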
More?
Yes! There is more hidden in Ruby stdlib. Have I missed something? Do you think something is even more interesting? Let me know.