First of all, I would like to thank y'all for the good reception and support you gave my last article, Exploring Genetic Algorithms with Ruby. It really meant the world to me. I know I said I was going to write consistently on a weekly basis, but August started off pretty rough, so I needed to push this new article back a bit. Also, I am a "do it well or not at all" guy: if I'm not satisfied with something I wrote, I'll never publish it. So the wait is worth it, at least for my peace of mind.
As for this entry, it's a new one in what I'm now calling my "Ruby is waaaay more than web development" article series, in which I'm trying to raise awareness in the community that the Ruby language covers a lot more ground than web development and the Ruby on Rails framework. In this post we'll talk about Ruby and machine learning.
Proper Introduction: A Ruby Revolution in ML.
In the ever-evolving landscape of machine learning, a few programming languages have traditionally held the limelight – Python, R, and Julia, to name a few. Yet, nestled in the shadows, the Ruby programming language has been quietly proving its mettle in this domain. This article embarks on a journey to showcase the potential of Ruby in machine learning, highlighting disruptive gems that defy convention, offering illustrative code snippets, and delving into the depths of Ruby's internals for a comprehensive understanding.
While Ruby might not be the first language that springs to mind when it comes to machine learning, its attributes make a compelling case for its inclusion in the ML toolkit. Ruby's readability, elegant syntax, and developer-friendly environment pave the way for creative problem-solving. As we journey through this article, we will unveil how Ruby can step up as a worthy contender in the realm of machine learning.
Unveiling Disruptive Gems for ML.
1. Nyaplot: Visualizing Insights:
Nyaplot is a versatile gem for data visualization. It offers a collection of plots and visualization tools that are particularly helpful when exploring datasets and gaining insights. Nyaplot's interactive visualizations provide an intuitive understanding of data patterns and distributions.
require 'nyaplot'
# Generate some sample data
data = {
  feature1: [1, 2, 3, 4, 5],
  feature2: [5, 4, 3, 2, 1]
}
# Create a scatter plot; show renders it inline in an IRuby notebook
plot = Nyaplot::Plot.new
plot.add(:scatter, data[:feature1], data[:feature2])
plot.show
2. Numo::GSL: Integration with GNU Scientific Library:
Numo::GSL seamlessly integrates Ruby with the GNU Scientific Library, enabling advanced mathematical and statistical computations. This gem is ideal for researchers who require specialized numerical functions for their machine learning projects.
require 'numo/gsl'
# Compute Bessel function
result = Numo::GSL::Sf.bessel_Jn(2, 3.0)
puts "Bessel function result: #{result}"
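To see what a special function like bessel_Jn actually computes, here is a minimal pure-Ruby sketch of the Bessel function of the first kind using its textbook power series. The helper names and the 30-term cutoff are assumptions for illustration (the cutoff works well for small |x|); this is not necessarily how GSL evaluates the function internally.

```ruby
# Minimal pure-Ruby sketch: Bessel function of the first kind via its power series
#   J_n(x) = sum over k >= 0 of (-1)^k / (k! * (k + n)!) * (x / 2)^(2k + n)
# The 30-term cutoff is an assumption that is accurate for small |x|.

# Integer factorial; (1..0) is empty, so factorial(0) returns the initial value 1
def factorial(m)
  (1..m).reduce(1, :*)
end

# Hypothetical helper: sums the first `terms` terms of the series
def bessel_jn_series(n, x, terms: 30)
  (0...terms).sum do |k|
    sign = k.even? ? 1.0 : -1.0
    sign / (factorial(k) * factorial(k + n)) * (x / 2.0)**(2 * k + n)
  end
end

puts "Series J_2(3.0): #{bessel_jn_series(2, 3.0)}"
```

The value should agree closely with Numo::GSL::Sf.bessel_Jn(2, 3.0) from the snippet above, approximately 0.486091.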
3. RubyFann: Neural Networks with FANN:
RubyFann leverages the Fast Artificial Neural Network (FANN) library to implement neural networks. This gem empowers developers to create, train, and utilize neural networks for various tasks, from classification to regression.
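require 'ruby-fann'
# Build a simple neural network: 2 inputs, one hidden layer of 3 neurons, 1 output
fann = RubyFann::Standard.new(num_inputs: 2, hidden_neurons: [3], num_outputs: 1)
train = RubyFann::TrainData.new(inputs: [[0, 0], [0, 1], [1, 0], [1, 1]], desired_outputs: [[0], [1], [1], [0]])
fann.train_on_data(train, 1000, 100, 0.01) # max epochs, epochs between reports, desired MSE
p fann.run([0, 1]) # network output for the input [0, 1]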
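RubyFann hides the arithmetic inside FANN's C code. As a rough illustration of what a 2-3-1 network like the one above computes in a single forward pass, here is a pure-Ruby sketch. The weights and biases are hypothetical hand-picked values, and the logistic sigmoid is an assumed activation function, not necessarily FANN's default.

```ruby
# Hypothetical hand-picked parameters for a 2-3-1 network
HIDDEN_WEIGHTS = [[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]] # one row of 2 input weights per hidden neuron
HIDDEN_BIASES  = [0.1, -0.2, 0.05]
OUTPUT_WEIGHTS = [0.7, -0.5, 0.9]                       # one weight per hidden neuron
OUTPUT_BIAS    = -0.1

# Logistic sigmoid activation (an assumption for this sketch)
def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# A single neuron: weighted sum of its inputs plus bias, passed through the activation
def neuron(weights, inputs, bias)
  sigmoid(weights.zip(inputs).sum { |w, x| w * x } + bias)
end

# Forward pass: inputs -> 3 hidden activations -> 1 output in (0, 1)
def forward(inputs)
  hidden = HIDDEN_WEIGHTS.zip(HIDDEN_BIASES).map { |weights, bias| neuron(weights, inputs, bias) }
  neuron(OUTPUT_WEIGHTS, hidden, OUTPUT_BIAS)
end

[[0, 0], [0, 1], [1, 0], [1, 1]].each do |inputs|
  puts "#{inputs.inspect} -> #{forward(inputs).round(4)}"
end
```

Training (what train_on_data does) amounts to repeatedly nudging these weights to reduce the error between forward's output and the desired outputs.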
Real-life Application: Sentiment Analysis.
Let's dive into a real-life application where Ruby shines: sentiment analysis of customer reviews. NLTK (the Natural Language Toolkit) is a Python library, so in this scenario the tokenization, TF-IDF vectorization, and train/test split are written in plain Ruby, with Numo::NArray for the numerical work and Nyaplot for visualization.
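require 'numo/narray'
require 'nyaplot'
# Sample dataset of customer reviews and their sentiment labels
reviews = [
  "The product is amazing and exceeded my expectations!",
  "Not satisfied with the quality. Will not buy again.",
  "Excellent customer service. Very happy with my purchase.",
  "The packaging was damaged, but the product itself is good."
]
sentiments = [1, 0, 1, 0] # 1 for positive, 0 for negative
# Tokenize: lowercase each review and keep runs of letters and apostrophes
tokenized_reviews = reviews.map { |review| review.downcase.scan(/[a-z']+/) }
# Vectorize with TF-IDF: tf = term count / review length, idf = log(N / document frequency)
vocabulary = tokenized_reviews.flatten.uniq.sort
doc_freq = vocabulary.to_h { |word| [word, tokenized_reviews.count { |doc| doc.include?(word) }] }
tfidf_rows = tokenized_reviews.map do |doc|
  counts = doc.tally
  vocabulary.map { |word| counts.fetch(word, 0).fdiv(doc.size) * Math.log(reviews.size.fdiv(doc_freq[word])) }
end
# Split the dataset into training and testing sets (20% held out, rounded up to one review)
indices = (0...reviews.size).to_a.shuffle(random: Random.new(42))
test_count = (reviews.size * 0.2).ceil
test_idx, train_idx = indices.first(test_count), indices.drop(test_count)
train_data = Numo::DFloat.cast(tfidf_rows.values_at(*train_idx))
test_data = Numo::DFloat.cast(tfidf_rows.values_at(*test_idx))
train_labels = Numo::DFloat.cast(sentiments.values_at(*train_idx)).reshape(train_idx.size, 1)
test_labels = sentiments.values_at(*test_idx)
# Build a simple one-hidden-layer network using Numo::NArray
input_size = train_data.shape[1]
hidden_size = 128
output_size = 1
learning_rate = 0.01
# Small random weights centered on zero so the sigmoids don't start out saturated
weights_hidden = Numo::DFloat.new(input_size, hidden_size).rand(-0.5, 0.5)
bias_hidden = Numo::DFloat.zeros(hidden_size)
weights_output = Numo::DFloat.new(hidden_size, output_size).rand(-0.5, 0.5)
bias_output = Numo::DFloat.zeros(output_size)
def sigmoid(x)
  1.0 / (1.0 + Numo::NMath.exp(-x))
end
# Methods can't see outer local variables, so the parameters are passed in explicitly.
# Returns both the hidden activations (needed for backpropagation) and the output.
def forward(input, w_hidden, b_hidden, w_output, b_output)
  hidden = sigmoid(input.dot(w_hidden) + b_hidden)
  output = sigmoid(hidden.dot(w_output) + b_output)
  [hidden, output]
end
# Train the model with full-batch gradient descent on the squared error
num_epochs = 1000
num_epochs.times do
  hidden_output, predicted = forward(train_data, weights_hidden, bias_hidden, weights_output, bias_output)
  error = train_labels - predicted                        # shape (n, 1)
  d_output = error * predicted * (1 - predicted)          # sigmoid'(z) = s * (1 - s)
  error_hidden = d_output.dot(weights_output.transpose)   # shape (n, hidden_size)
  d_hidden = error_hidden * hidden_output * (1 - hidden_output)
  weights_output += hidden_output.transpose.dot(d_output) * learning_rate
  bias_output += d_output.sum(axis: 0) * learning_rate
  weights_hidden += train_data.transpose.dot(d_hidden) * learning_rate
  bias_hidden += d_hidden.sum(axis: 0) * learning_rate
end
# Evaluate the model on the held-out review(s); with only four reviews this is illustrative
_, test_output = forward(test_data, weights_hidden, bias_hidden, weights_output, bias_output)
predictions = test_output.to_a.flatten.map { |prob| prob > 0.5 ? 1 : 0 }
accuracy = predictions.zip(test_labels).count { |pred, label| pred == label } / test_labels.size.to_f
puts "Accuracy: #{accuracy}"
# Visualize the sentiment label distribution using Nyaplot (show renders in an IRuby notebook)
plot = Nyaplot::Plot.new
plot.add(:histogram, sentiments)
plot.show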
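In the pipeline above, the reviews are tokenized one at a time with map. As a minimal sketch (an assumption, not part of the original pipeline), that preprocessing step can be spread across Ruby threads with a hypothetical parallel_map helper; the regex tokenizer is also an assumed stand-in. Keep in mind that CRuby's Global VM Lock lets only one thread run Ruby code at a time, so this pays off for I/O-bound preprocessing (for example, fetching reviews from files or an API) rather than pure CPU work.

```ruby
# Hypothetical helper: split items into roughly equal slices, process each
# slice in its own thread, and concatenate the results in the original order.
def parallel_map(items, workers: 4, &block)
  slice_size = [(items.size.to_f / workers).ceil, 1].max
  items.each_slice(slice_size)
       .map { |slice| Thread.new { slice.map(&block) } }
       .flat_map(&:value) # Thread#value joins the thread and returns its result
end

reviews = [
  "The product is amazing and exceeded my expectations!",
  "Not satisfied with the quality. Will not buy again.",
  "Excellent customer service. Very happy with my purchase.",
  "The packaging was damaged, but the product itself is good."
]

# Simple regex tokenizer (an assumption): lowercase, keep letters and apostrophes
tokens = parallel_map(reviews, workers: 2) { |review| review.downcase.scan(/[a-z']+/) }
tokens.each { |words| p words }
```

Slicing the input (rather than pushing results into a shared array) keeps the output in the original order without any locking.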
Peering into Ruby's Internals for ML.
Delving into Ruby's internals can be enlightening. Ruby's reference implementation (CRuby, also known as MRI) is written in C, and understanding how it works explains how Ruby can handle machine learning tasks effectively. Although Ruby wasn't designed specifically for machine learning, its extensibility allows developers to build libraries and gems that enable ML capabilities.
Memory Management: Ruby's garbage collector, part of its C implementation, handles the allocation and deallocation of Ruby objects. Numerical gems such as Numo::NArray keep their data in compact C buffers wrapped by a single Ruby object, so a large dataset or a neural network's weight matrix does not turn into millions of individually tracked objects.
C Extensions: Ruby's C API allows developers to write C extensions, bridging the gap between Ruby and lower-level operations. This extensibility enables the integration of optimized numerical libraries like BLAS and LAPACK, which are crucial for performing complex matrix operations common in machine learning algorithms.
Embedding Libraries: Developers can also call external libraries written in C or other languages from Ruby, either through C extensions or through foreign function interfaces such as the standard Fiddle library or the ffi gem. For instance, by linking against optimized linear algebra libraries like Intel's MKL or OpenBLAS, developers can significantly accelerate numerical computations within machine learning algorithms.
Interfacing with Gems: Ruby's C API allows gems to be developed with enhanced performance in mind. Disruptive gems like Numo::NArray and Numo::GSL can harness these capabilities to provide optimized numerical computations and integration with specialized scientific libraries.
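Multithreading: Ruby provides native threads, but CRuby's Global VM Lock (GVL) lets only one thread execute Ruby code at a time. Threads therefore help most with I/O-bound work such as loading data, and with C extensions that release the GVL during long numerical computations; for CPU-bound Ruby code such as data preprocessing or training loops, true parallelism requires multiple processes or Ractors (introduced as experimental in Ruby 3.0).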
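As a small illustration of calling into C from Ruby without writing an extension, the sketch below uses Fiddle, Ruby's standard foreign function interface library, to call the C math library's cos function directly. It assumes cos can be resolved from the already-loaded process, which holds on typical Linux and macOS builds because CRuby links against the C math library. Loading a library such as OpenBLAS or MKL would follow roughly the same pattern, passing the library's path to Fiddle.dlopen instead.

```ruby
require 'fiddle'

# Look up the C symbol `cos` among the libraries already loaded into this process
cos_ptr = Fiddle::Handle::DEFAULT['cos']

# Describe the C signature double cos(double) so Ruby can convert arguments and results
c_cos = Fiddle::Function.new(cos_ptr, [Fiddle::TYPE_DOUBLE], Fiddle::TYPE_DOUBLE)

puts c_cos.call(0.0)      # calls straight into the C library
puts c_cos.call(Math::PI)
```

Fiddle is convenient for quick experiments; for hot loops, a compiled C extension (the approach gems like Numo::NArray take) avoids the per-call conversion overhead.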
Example: Matrix Operations.
Let's consider a simple example of how Ruby's internals contribute to efficient matrix operations, a fundamental aspect of machine learning algorithms.
require 'numo/narray'
# Creating Numo::NArray objects
a = Numo::DFloat.new(3, 3).rand
b = Numo::DFloat.new(3, 3).rand
# Matrix multiplication using Numo::NArray
c = a.dot(b)
In the above code, when performing matrix multiplication with dot, Numo::NArray runs the multiply-add loop in compiled C code over contiguous memory buffers instead of in interpreted Ruby; with the companion numo-linalg gem, products like this can also be delegated to an optimized BLAS library such as OpenBLAS. This is a direct result of Ruby's ability to interface with C libraries, and it is what makes the language viable for machine learning workloads.
This efficiency is crucial in machine learning, where matrix operations are fundamental to algorithms like neural networks and support vector machines.
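For contrast, here is roughly the work that dot moves into C: a minimal pure-Ruby triple loop for matrix multiplication on nested arrays. The naive_matmul helper is hypothetical and not part of Numo; every multiply-add here goes through the Ruby interpreter, which is exactly the overhead a C implementation avoids.

```ruby
# Hypothetical helper: multiply a (rows x inner) matrix by an (inner x cols) matrix,
# both given as nested Ruby arrays, using the textbook triple loop.
def naive_matmul(a, b)
  rows, inner, cols = a.size, b.size, b.first.size
  raise ArgumentError, "inner dimensions differ" unless a.first.size == inner

  Array.new(rows) do |i|
    Array.new(cols) do |j|
      # Dot product of row i of a with column j of b
      (0...inner).sum { |k| a[i][k] * b[k][j] }
    end
  end
end

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
p naive_matmul(a, b) # => [[19, 22], [43, 50]]
```

Timing this against Numo's dot on larger matrices (for example with Ruby's Benchmark module) is an easy way to see the gap for yourself.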
Conclusion: Embracing Ruby's ML Potential.
Ruby's versatility and its growing array of gems make it an unexpectedly compelling choice for ML tasks. As we've witnessed, gems like Nyaplot, Numo::GSL, and RubyFann are pushing the boundaries of what's possible. By embracing these gems and even exploring Ruby's internals for extensibility, developers can unlock the language's potential in the world of machine learning. Ruby may not be the conventional choice, but it's proving its prowess in this evolving landscape. So, whether you're a Ruby enthusiast or a machine learning aficionado, Ruby's capabilities are certainly worth exploring further.