DEV Community

kojix2

Easy machine learning with Ruby using Rumale

What is Rumale

A powerful library for machine learning written in pure Ruby!
Rumale was created by @yoshoku.
https://github.com/yoshoku/rumale

Rumale (Ruby machine learning) is a machine learning library in Ruby. Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python. Rumale supports Linear / Kernel Support Vector Machine, Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine, Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor classifier, K-Means, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering, Multidimensional Scaling, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.

Install

# daru and rdatasets are only needed for the dataset example below
gem install rumale daru rdatasets

Prepare a dataset

require 'rumale'
require 'daru'
require 'rdatasets'

# Load the iris dataset as a Daru::DataFrame
iris = RDatasets.load(:datasets, :iris)

# Labels: encode the species names as integers (Numo::Int32, shape [150])
iris_labels = iris['Species'].to_a
encoder = Rumale::Preprocessing::LabelEncoder.new
labels = encoder.fit_transform(iris_labels)

# Samples: convert the four feature columns from Daru to NArray
# (Numo::DFloat, shape [150, 4])
samples = Numo::DFloat[*iris[0..3].to_matrix.to_a]
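If it is not obvious what the label encoding step produces, here is a minimal plain-Ruby sketch of the idea. It is not Rumale's implementation, and it assumes the distinct labels are numbered in sorted order; the short `species` array is made-up example data.

```ruby
# Plain-Ruby illustration of label encoding (not Rumale's code).
# Assumption: distinct labels are numbered in sorted order.
species = %w[setosa setosa versicolor virginica versicolor]

classes  = species.uniq.sort              # distinct labels, sorted
index_of = classes.each_with_index.to_h   # label => integer id
encoded  = species.map { |name| index_of[name] }

# Decoding maps the integer ids back to the original names.
decoded = encoded.map { |id| classes[id] }

p encoded
p decoded == species
```

The integer ids are what the classifiers are trained on; the list of classes is what lets you turn predictions back into species names.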

Classification models

# Linear support vector machine
model = Rumale::LinearModel::SVC.new(
  reg_param: 0.0001,
  fit_bias: true,
  max_iter: 3000,
  random_seed: 1
)

Various classifiers

model = Rumale::Tree::DecisionTreeClassifier.new(random_seed: 1)
model = Rumale::Ensemble::RandomForestClassifier.new(random_seed: 1)
model = Rumale::NearestNeighbors::KNeighborsClassifier.new(n_neighbors: 5)
model = Rumale::NaiveBayes::GaussianNB.new
# etc...

Cross validation

# Stratified K-fold splitter (keeps class proportions similar in every fold)
kf = Rumale::ModelSelection::StratifiedKFold.new(
  n_splits: 5,
  random_seed: 1
)

cv = Rumale::ModelSelection::CrossValidation.new(
  estimator: model,
  splitter: kf
)
report = cv.perform(samples, labels)
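To show what "stratified" means here, the sketch below splits sample indices into folds so that every fold gets roughly the same share of each class. It is a plain-Ruby illustration only: `stratified_folds` is a hypothetical helper, not part of Rumale, and the real splitter may assign samples differently.

```ruby
# Plain-Ruby illustration of stratified splitting (not Rumale's code).
# stratified_folds is a hypothetical helper written for this example.
def stratified_folds(labels, n_splits)
  folds = Array.new(n_splits) { [] }
  # Group sample indices by class, then deal each class's indices
  # round-robin into the folds.
  labels.each_index.group_by { |i| labels[i] }.each_value do |indices|
    indices.each_with_index { |idx, k| folds[k % n_splits] << idx }
  end
  folds
end

# Toy labels: 3 classes with 10 samples each.
labels = [0] * 10 + [1] * 10 + [2] * 10
folds = stratified_folds(labels, 5)

folds.each_with_index do |test_ids, k|
  counts = test_ids.map { |i| labels[i] }.tally
  puts "fold #{k}: #{counts}"
end
```

With this toy data each of the 5 folds receives two samples per class. In cross validation, each fold is used once as the test set while the remaining folds are used for training.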

Result

scores = report[:test_score]
puts scores.sum / scores.size
# 0.9466666666666667
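The mean alone hides how much the score varies between folds, so it can help to print the standard deviation as well. A small sketch, using made-up per-fold scores in place of `report[:test_score]`:

```ruby
# Hypothetical per-fold scores; in the article these would come from
# report[:test_score].
scores = [0.9, 1.0, 0.9, 1.0, 0.95]

mean     = scores.sum / scores.size
variance = scores.sum { |s| (s - mean)**2 } / scores.size
std      = Math.sqrt(variance)

puts format('accuracy: %.3f +/- %.3f', mean, std)
```

A large spread between folds usually means the estimate depends heavily on how the data happened to be split.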

Learning and Predicting

# Learning
model.fit(samples, labels)

# Predicting
# predict accepts a 2D NArray (here Numo::DFloat with shape [150, 4])
p model.predict(samples).to_a
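Once you have the predictions as a Ruby array, you can compare them with the true labels yourself. The sketch below uses a hypothetical `accuracy` helper and made-up label arrays; it is plain Ruby and does not call Rumale.

```ruby
# accuracy is a hypothetical helper written for this example.
def accuracy(truth, predicted)
  raise ArgumentError, 'length mismatch' unless truth.size == predicted.size

  correct = truth.zip(predicted).count { |t, y_hat| t == y_hat }
  correct.to_f / truth.size
end

# Made-up labels standing in for labels.to_a and model.predict(samples).to_a
truth     = [0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 2, 2, 2]

puts accuracy(truth, predicted)
```

Keep in mind that the snippet above predicts on the same samples the model was fitted on, so accuracy measured that way tends to look better than the cross validation score.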

Save and load models

# Save a model
File.binwrite("model.dat", Marshal.dump(model))

# Load a model
model = Marshal.load(File.binread("model.dat"))
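The same save and load round trip works for any object that Marshal can serialize. Here is a self-contained sketch with a stand-in object, so you can try it without training anything; `ToyModel` and the temporary file name are made up for this example.

```ruby
require 'tmpdir'

# ToyModel is a hypothetical stand-in for a trained model.
ToyModel = Struct.new(:weights, :bias)
model = ToyModel.new([0.5, -1.2, 3.0], 0.1)

# Write the serialized bytes, read them back, then clean up.
path = File.join(Dir.tmpdir, "toy_model_#{Process.pid}.dat")
File.binwrite(path, Marshal.dump(model))
restored = Marshal.load(File.binread(path))
File.delete(path)

puts restored == model
```

Only call `Marshal.load` on files you created yourself or otherwise trust, because loading untrusted Marshal data can execute arbitrary code. The class of the saved object also has to be defined (for Rumale models, `require 'rumale'`) before loading.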

Enjoy!
