JT Dev for JetThoughts

Posted on • Updated on • Originally published at jetthoughts.com

Working with LLMs in Ruby on Rails: A Simple Guide

Why You Need to Work with LLMs Today

Large Language Models (LLMs) are reshaping how we build apps. Knowing how to use them lets you create interactive tools that understand and generate text, from chatbots to text analyzers. This guide shows how to run an LLM server locally and use it from a Ruby on Rails (RoR) project.

Running LLM Locally with Docker

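We will run a Llama model with Ollama inside Docker. Llama is popular for personal use, and Docker simplifies the setup. The commands below use the llama3 model tag; if you want Llama 3.1 specifically, use the llama3.1 tag instead.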

  • Install Docker: Use Docker Desktop, the official Docker client with a UI.

  • Run the LLM Server: Use the following command to start the Ollama server, which exposes its API on port 11434:
  docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
  • Select the LLM model (the first run downloads it, then opens an interactive prompt):
  docker exec -it ollama ollama run llama3
  • Test the server with:
  curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt":"Why is the sky blue? Answer with 10 words"}'

If the result looks something like this (one JSON object per line, with "done":true on the last one), then the server has started successfully:

{"model":"llama3","created_at":"2024-08-28T15:01:07.826076294Z","response":"Short","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:08.154276586Z","response":" wavelength","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:08.314917461Z","response":" blue","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:08.490800211Z","response":" light","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:08.661478628Z","response":" sc","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:08.83101417Z","response":"atters","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:09.002102128Z","response":" more","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:09.175030712Z","response":" in","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:09.34067667Z","response":" Earth","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:09.512882962Z","response":"'s","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:09.685311962Z","response":" atmosphere","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:09.87469392Z","response":".","done":false}
{"model":"llama3","created_at":"2024-08-28T15:01:10.089219045Z","response":"","done":true,"done_reason":"stop","context":[128006,882,128007,271,10445,374,279,13180,6437,30,22559,449,220,605,4339,128009,128006,78191,128007,271,12755,46406,6437,3177,1156,10385,810,304,9420,596,16975,13],"total_duration":12195522088,"load_duration":7132571086,"prompt_eval_count":21,"prompt_eval_duration":2754452000,"eval_count":13,"eval_duration":2263609000}
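
Each line of that output carries only a fragment of the answer in its "response" field. As a minimal sketch of how the fragments fit together, the snippet below parses a few sample lines (copied from the output above, with the timing fields left out) and joins them. The helper name assemble_response is made up for this illustration.

```ruby
require "json"

# Sample lines copied from the streamed output above (timing fields omitted).
stream = <<~NDJSON
  {"model":"llama3","response":"Short","done":false}
  {"model":"llama3","response":" wavelength","done":false}
  {"model":"llama3","response":" blue","done":false}
  {"model":"llama3","response":"","done":true,"done_reason":"stop"}
NDJSON

# Hypothetical helper: parse each JSON line and join the "response" fragments.
def assemble_response(ndjson)
  ndjson.each_line
        .map(&:strip)
        .reject(&:empty?)
        .map { |line| JSON.parse(line)["response"] }
        .join
end

answer = assemble_response(stream)
puts answer
```

Note that the fragments bring their own leading spaces, so they are joined without any separator.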

For other endpoints and options, see the Ollama server API documentation.
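
The same request can be made from Ruby instead of curl. The sketch below is one possible way to do it with the standard library, assuming the server from the previous steps is listening on localhost:11434. It sets "stream": false, an option of the generate endpoint that returns the whole answer as a single JSON object. The helper names build_generate_request and generate are made up for this example.

```ruby
require "net/http"
require "json"
require "uri"

# Endpoint taken from the curl example above.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"

# Hypothetical helper: builds the POST request without sending it.
def build_generate_request(prompt, model: "llama3", stream: false)
  uri = URI(OLLAMA_GENERATE_URL)
  request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
  request.body = { model: model, prompt: prompt, stream: stream }.to_json
  request
end

# Hypothetical helper: sends the request and returns the answer text.
# Only call this while the Ollama container is running.
def generate(prompt)
  request = build_generate_request(prompt)
  uri = request.uri
  response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
  JSON.parse(response.body)["response"]
end

request = build_generate_request("Why is the sky blue? Answer with 10 words")
puts request.body
```

With the server running, calling generate("Why is the sky blue? Answer with 10 words") should return the full answer as one string.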

Building a Ruby on Rails App

Let’s create a simple RoR app that connects to our LLM server.

  • Create a New Ruby on Rails Project:
rails new llm-chat
  • Generate a Controller with actions:
rails g controller chat index create
  • Add Routes for Chat: In config/routes.rb, add:
root "chat#index"
post "/", to: "chat#create"
  • Add WebSocket Route: In config/routes.rb, add:
mount ActionCable.server => '/cable'
  • Generate a WebSocket Channel:
rails generate channel Chat
  • Update the Chat Channel: In app/channels/chat_channel.rb, update the code:
class ChatChannel < ApplicationCable::Channel
  def subscribed
    stream_from "chat_channel"
  end

  def unsubscribed
  end
end
  • Update the Controller: In app/controllers/chat_controller.rb, modify the create method:
class ChatController < ApplicationController
  def index; end

  def create
    LlmJob.perform_later("http://localhost:11434/api/generate", params[:chat][:query])

    head :ok
  end
end
  • Create LlmJob:
$ rails generate job Llm
      invoke  test_unit
      create    test/jobs/llm_job_test.rb
      create  app/jobs/llm_job.rb
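  • LlmJob code: In app/jobs/llm_job.rb, add:
require 'net/http'
require 'json'

class LlmJob < ApplicationJob
  queue_as :default

  def perform(api_endpoint, prompt)
    uri = URI(api_endpoint)
    req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
    req.body = { model: "llama3", prompt: prompt }.to_json

    Net::HTTP.start(uri.hostname, uri.port) do |http|
      http.request(req) do |response|
        # The server streams one JSON object per line, but a network chunk may
        # hold several lines or only part of one, so buffer until a newline.
        buffer = String.new
        response.read_body do |chunk|
          buffer << chunk
          while (line = buffer.slice!(/\A.*?\n/))
            broadcast_line(line)
          end
        end
        broadcast_line(buffer) # trailing object without a final newline, if any
      end
    end
  end

  private

  # Parses one JSON line and pushes its text fragment to subscribers.
  def broadcast_line(line)
    return if line.strip.empty?

    parsed_response = JSON.parse(line)
    ActionCable.server.broadcast(
      "chat_channel",
      { message: parsed_response['response'], done: parsed_response['done'] }
    )
  end
end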
  • Frontend Chat Channel: In app/javascript/channels/chat_channel.js, add:
import { createConsumer } from "@rails/actioncable"

const consumer = createConsumer()

consumer.subscriptions.create("ChatChannel", {
  received(data) {
    document.getElementById("send-request").disabled = true;
    const chatBox = document.getElementById('chat-box');

    let botMessageElement = chatBox.querySelector('div[data-status="pending"]');

    if (!botMessageElement) {
      botMessageElement = document.createElement('div');
      botMessageElement.className = 'message bot';
      botMessageElement.setAttribute('data-status', 'pending');
      chatBox.appendChild(botMessageElement);
    }

    // Fragments already carry their own spacing (" sc" + "atters"), so append as-is.
    botMessageElement.textContent += data.message;

    if (data.done) {
      botMessageElement.setAttribute('data-status', 'done');
      document.getElementById("send-request").disabled = false;
    }

    chatBox.scrollTop = chatBox.scrollHeight;
  }
});
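
One detail of LlmJob is worth trying in isolation: a chunk yielded by read_body is a slice of the network stream, so it is not guaranteed to contain exactly one JSON object. The sketch below, which runs outside Rails, shows one way to reassemble complete lines before parsing them. The class name NdjsonBuffer and the simulated chunks are made up for this illustration.

```ruby
require "json"

# Hypothetical helper: collects raw chunks and yields one parsed JSON object
# per complete line, keeping any unfinished tail for the next chunk.
class NdjsonBuffer
  def initialize
    @buffer = String.new
  end

  def feed(chunk)
    @buffer << chunk
    while (line = @buffer.slice!(/\A.*?\n/))
      yield JSON.parse(line) unless line.strip.empty?
    end
  end
end

# Simulated chunks: the first ends mid-object, the second completes it and
# carries a whole second object.
chunks = [
  %({"response":" sc","done":fal),
  %(se}\n{"response":"atters","done":false}\n)
]

parser = NdjsonBuffer.new
events = []
chunks.each { |chunk| parser.feed(chunk) { |event| events << event } }
puts events.map { |event| event["response"] }.join
```

Calling JSON.parse directly on the first chunk would raise a parse error, which is why the buffer waits for a newline before parsing.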

Conclusion

You now have a basic RoR app that talks to a local LLM server. The server streams its response in chunks, a background job broadcasts each chunk over Action Cable, and the page displays them in real time. The same pattern is a solid starting point for adding AI features to your own apps.

You can find the full code here: GitHub repo