Building a Simple Recommendation Engine With the KNN Algorithm Using Redis

#redis #machinelearning #python #knn

Hi guys, today I thought of building a recommendation engine that recommends music when given an audio file as input.

Let us see how we can build this engine with Redis, using its built-in vector similarity search.

Start the Redis Stack server:

docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 redis/redis-stack:latest

Preparing the dataset

For the dataset, I downloaded music from this channel, https://www.youtube.com/@TheFatRat, in .wav format.

Then I created a dataset/dataset.json file (the path the ingestion script below reads) that maps each audio file to its YouTube link:

[
    {
        "file": "1.wav",
        "link": "https://www.youtube.com/watch?v=2QdPxdcMhFQ"
    },
    {
        "file": "2.wav",
        "link": "https://www.youtube.com/watch?v=2Ax_EIb1zks"
    },
    {
        "file": "3.wav",
        "link": "https://www.youtube.com/watch?v=wgip631nFdY"
    },
    {
        "file": "4.wav",
        "link": "https://www.youtube.com/watch?v=bgyO9bNbfb8"
    },
    {
        "file": "5.wav",
        "link": "https://www.youtube.com/watch?v=gHgv19ip-0c"
    },
    {
        "file": "6.wav",
        "link": "https://www.youtube.com/watch?v=3VTkBuxU4yk"
    },
    {
        "file": "7.wav",
        "link": "https://www.youtube.com/watch?v=cJglBxApcDM"
    },
    {
        "file": "8.wav",
        "link": "https://www.youtube.com/watch?v=j-2DGYNXRx0"
    },
    {
        "file": "9.wav",
        "link": "https://www.youtube.com/watch?v=M-P4QBt-FWw"
    },
    {
        "file": "10.wav",
        "link": "https://www.youtube.com/watch?v=cMg8KaMdDYo"
    }
]
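Because the ingestion script crashes on the first missing file or key, it can help to check the manifest first. Below is a minimal sketch, assuming the layout used in this post (a JSON array of objects with "file" and "link", with the .wav files in the same folder as the manifest); validate_dataset is a hypothetical helper, not part of the original project.

```python
import json
from pathlib import Path


def validate_dataset(manifest_path):
    """Return a list of problems in a dataset manifest (an empty list means OK).

    Hypothetical helper: assumes a JSON array of {"file", "link"} objects whose
    audio files live in the same folder as the manifest.
    """
    manifest_path = Path(manifest_path)
    with manifest_path.open("r", encoding="utf-8") as f:
        entries = json.load(f)

    problems = []
    for i, entry in enumerate(entries):
        # the ingestion script reads both of these keys
        for key in ("file", "link"):
            if key not in entry:
                problems.append(f"entry {i}: missing '{key}'")
        # the audio file must exist next to the manifest
        if "file" in entry and not (manifest_path.parent / entry["file"]).is_file():
            problems.append(f"entry {i}: file not found: {entry['file']}")
    return problems
```

Calling validate_dataset("./dataset/dataset.json") before running the ingestion script lists every problem at once instead of failing on the first one.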

Now, to insert the data into Redis, we can run the script below:

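import json
import threading
import uuid

import librosa
import numpy as np
import redis

# dataset.json lists each .wav file (relative to ./dataset/) and its YouTube link
with open("./dataset/dataset.json", "r") as f:
    dataset = json.load(f)

music_data = []

def load_music_file(data):
    # librosa.load decodes the audio and resamples it to 22050 Hz by default
    vector, sr = librosa.load("./dataset/" + data['file'])
    # list.append is thread-safe in CPython, so the threads can share this list
    music_data.append([vector, sr, data['link']])

threads = []

# load the files concurrently, one thread per file
for data in dataset:
    t = threading.Thread(target=load_music_file, args=(data,))
    t.start()
    threads.append(t)

for t in threads:
    t.join()

features = []

for y, sr, url in music_data:
    # MFCC matrix of shape (n_mfcc, n_frames); librosa computes 128 mel bands by
    # default, so at most 128 coefficients are available (n_mfcc=200 also yields 128)
    feature = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=128)
    # average over time to get one 128-dimensional vector per song
    features.append([np.mean(feature.T, axis=0), url])

redis_client = redis.Redis()

pipeline = redis_client.pipeline()

for feature, url in features:
    # hmset is deprecated in redis-py; hset with mapping= stores the same hash
    pipeline.hset(
        "music:" + str(uuid.uuid4()),
        mapping={"url": url, "vec": feature.astype(np.float32).tobytes()}
    )

# send all queued HSET commands in one round trip
pipeline.execute()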
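The loading step above starts one thread per file and appends to a shared list, so songs end up in completion order. A minimal alternative sketch, assuming you would rather cap the number of worker threads, uses concurrent.futures.ThreadPoolExecutor, whose map() returns results in input order. load_all and load_entry are hypothetical names, not from the original project.

```python
from concurrent.futures import ThreadPoolExecutor


def load_all(entries, load_fn, max_workers=4):
    """Apply load_fn to every entry on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of `entries` and re-raises any
        # exception from a worker when its result is collected
        return list(pool.map(load_fn, entries))


# Hypothetical usage with the script above (requires librosa and the dataset):
#
# def load_entry(data):
#     y, sr = librosa.load("./dataset/" + data["file"])
#     return y, sr, data["link"]
#
# music_data = load_all(dataset, load_entry)
```

Unlike the bare threads, a failed file raises its exception in the main thread instead of being printed and silently skipped.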

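We use the librosa Python package to load and preprocess the audio files. To turn each song into a vector, we compute its MFCCs (mel-frequency cepstral coefficients), a matrix with one column per audio frame, and average it over time, so every song gets a fixed-length vector regardless of its duration. For an introduction to MFCCs, see https://medium.com/prathena/the-dummys-guide-to-mfcc-aceab2450fd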
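The averaging step is what makes songs of different lengths comparable. The sketch below uses random matrices as stand-ins for librosa.feature.mfcc output (they are not real audio) to show that mean-pooling over the frame axis always yields a 128-long vector; mean_pool is a hypothetical helper equivalent to np.mean(feature.T, axis=0) in the script.

```python
import numpy as np


def mean_pool(mfcc):
    """Collapse an (n_mfcc, n_frames) MFCC matrix into one n_mfcc-long vector.

    Averaging over axis 1 (frames) is the same as np.mean(mfcc.T, axis=0).
    """
    return np.mean(mfcc, axis=1)


# Random stand-ins for the MFCCs of a short and a long song (not real audio)
rng = np.random.default_rng(0)
short_song = rng.normal(size=(128, 300))   # 128 coefficients, 300 frames
long_song = rng.normal(size=(128, 9000))   # 128 coefficients, 9000 frames

print(mean_pool(short_song).shape, mean_pool(long_song).shape)  # (128,) (128,)
```

The trade-off is that averaging throws away how the sound changes over time, which keeps the index simple but means two songs with similar overall timbre look alike even if their structure differs.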

Now, if you open http://localhost:8001 (RedisInsight, which ships with the redis-stack image), you can see that the songs have been added to the Redis database as music:<uuid> hashes.

Using the Redis KNN query to get recommendations

Before you can use Redis's KNN search, you must create a vector index over the music hashes, which can be done with the following command:

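FT.CREATE "idx:music"
    ON HASH
        PREFIX 1 "music:"
    SCHEMA
        "url" TEXT
        "vec" VECTOR HNSW
            6
            "TYPE" "FLOAT32"
            "DIM" 128
            "DISTANCE_METRIC" "COSINE"

Here 6 is the number of vector arguments that follow (three name/value pairs). DIM is 128 because each song vector holds 128 averaged MFCCs, and it must match the length of the vectors stored in the hashes. With COSINE, vector_score is 1 minus the cosine similarity, so lower scores mean more similar songs.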

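A mismatch between the vector length and DIM is easy to miss: the HSET still succeeds, but Redis will generally fail to index that hash. The sketch below is one way to guard against it; to_redis_vector, from_redis_vector and INDEX_DIM are hypothetical names, and the byte layout simply mirrors the .astype(np.float32).tobytes() call in the ingestion script.

```python
import numpy as np

INDEX_DIM = 128  # hypothetical constant; must equal "DIM" in the FT.CREATE command


def to_redis_vector(vec, dim=INDEX_DIM):
    """Serialize a vector as FLOAT32 bytes, refusing the wrong dimension."""
    arr = np.asarray(vec, dtype=np.float32)
    if arr.shape != (dim,):
        # a blob that is not exactly dim * 4 bytes would not be indexed,
        # so fail early instead of silently losing the song
        raise ValueError(f"expected shape ({dim},), got {arr.shape}")
    return arr.tobytes()  # same bytes as .astype(np.float32).tobytes()


def from_redis_vector(blob):
    """Inverse of to_redis_vector: decode FLOAT32 bytes back into an array."""
    return np.frombuffer(blob, dtype=np.float32)
```

For example, to_redis_vector(np.zeros(200)) raises ValueError, which is exactly the case of a 200-dimensional vector being written to a 128-dimensional index.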
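Now, to query the index directly with a Redis command, you can use the following, where K is the number of neighbours to return and <binary_form_of_vector> is the query vector as raw FLOAT32 bytes: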

FT.SEARCH idx:music 
    "*=>[KNN $K @vec $query_vector as vector_score]"  
    "PARAMS" "4" 
        K 2                 
        "query_vector"      
            <binary_form_of_vector>
    RETURN 2 vector_score url 
    SORTBY vector_score
    DIALECT 2
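To see what the KNN query computes, here is a brute-force version in NumPy. It is a sketch of exact cosine KNN, not Redis's HNSW implementation: with DISTANCE_METRIC COSINE, vector_score is 1 minus the cosine similarity, and SORTBY vector_score puts the closest songs first. knn_cosine and the toy vectors are made up for illustration.

```python
import numpy as np


def knn_cosine(query, vectors, k):
    """Exact KNN by cosine distance; returns (row_index, distance) pairs.

    distance = 1 - cosine_similarity, sorted ascending (closest first).
    Assumes no vector is all zeros, since cosine is undefined there.
    """
    q = np.asarray(query, dtype=np.float32)
    m = np.asarray(vectors, dtype=np.float32)
    sims = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    dists = 1.0 - sims
    order = np.argsort(dists)[:k]
    return [(int(i), float(dists[i])) for i in order]


# Four toy "songs" in 3 dimensions (real vectors would have 128)
songs = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
])
print(knn_cosine([1.0, 0.0, 0.0], songs, k=2))  # song 0 (identical), then song 1
```

HNSW is an approximate index, so on large collections Redis can occasionally return a slightly different neighbour set than this exact scan.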

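However, I recommend using Python instead of the raw command, because passing the binary query vector is much easier that way. You can get recommendations with the code below, which computes the query vector with the same MFCC pipeline used for the indexed songs: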

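import librosa
import numpy as np
import redis
from redis.commands.search.query import Query

redis_client = redis.Redis()

# compute the query vector exactly the same way as the indexed songs
y, sr = librosa.load("./example.wav")
feature = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=128)
vector = np.mean(feature.T, axis=0)
vector = vector.astype(np.float32).tobytes()

# the 3 nearest songs by cosine distance; a lower vector_score means more similar
q = Query(
    "(*)=>[KNN 3 @vec $vec_param AS vector_score]"
).sort_by("vector_score").return_fields("url", "vector_score").dialect(2)

params = {
    "vec_param": vector
}

results = []
query_results = redis_client.ft("idx:music").search(query=q, query_params=params)

for result in query_results.docs:
    results.append(result.url)

print(results)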
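The snippet above keeps only the URLs and drops vector_score. If you also want to show how close each match is, a small helper like the hypothetical to_recommendations below converts the COSINE distance into a similarity. It assumes each document has url and vector_score attributes and that vector_score comes back as a string, which is how redis-py exposes returned fields; the exclude_url option is an illustrative extra for when the uploaded song is itself in the index.

```python
def to_recommendations(docs, exclude_url=None):
    """Turn search result documents into (url, cosine_similarity) pairs.

    Assumes each doc has `url` and `vector_score` attributes, where
    vector_score is the COSINE distance as a string, so similarity = 1 - distance.
    `exclude_url` (hypothetical option) skips the query song if it is indexed.
    """
    recs = []
    for doc in docs:
        if doc.url == exclude_url:
            continue
        recs.append((doc.url, 1.0 - float(doc.vector_score)))
    return recs
```

With the query above, to_recommendations(query_results.docs) would return the same URLs in the same order, each paired with a similarity where 1.0 means an identical direction.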

I have also created a small Flask API so that we can use this engine from the browser:

# app.py
import os
import tempfile

import librosa
import numpy as np
import redis
from flask import Flask, jsonify, render_template, request
from redis.commands.search.query import Query

redis_client = redis.Redis()
app = Flask(__name__)

@app.route('/')
def index():
    return render_template("index.html")

@app.route("/get_recommendations", methods=['POST'])
def get_recommendations():
    music_file = request.files.get("music_file")
    if music_file is None or music_file.filename == "":
        return jsonify({"error": "no music_file uploaded"}), 400

    # save the upload to a temporary file (deleted when the block exits),
    # keeping the original extension so the audio decoder can recognise it
    suffix = os.path.splitext(music_file.filename)[1]
    with tempfile.NamedTemporaryFile(suffix=suffix) as tmp:
        music_file.save(tmp)
        tmp.flush()  # make sure every byte is on disk before librosa reads it
        y, sr = librosa.load(tmp.name)

    # same feature pipeline as the indexing script
    feature = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=128)
    vector = np.mean(feature.T, axis=0)
    vector = vector.astype(np.float32).tobytes()

    q = Query(
        "(*)=>[KNN 3 @vec $vec_param AS vector_score]"
    ).sort_by("vector_score").return_fields("url", "vector_score").dialect(2)

    params = {
        "vec_param": vector
    }

    results = []
    query_results = redis_client.ft("idx:music").search(query=q, query_params=params)

    for result in query_results.docs:
        results.append(result.url)

    return jsonify({
        "results": results
    })

if __name__ == '__main__':
    app.run(debug=True)

<!--- templates/index.html --->
<script>
    async function submitForm(event) {
        event.preventDefault()

        const fd = new FormData(event.currentTarget)
        const data = await (await fetch("{{url_for('get_recommendations')}}", {
            method: 'POST',
            body: fd,
        })).json()

        document.getElementById("results").innerHTML = ""
        data["results"].forEach((url) => {
            document.getElementById("results").innerHTML += `<a href="${url}">${url}</a><br/>`
        })

        return false
    }
</script>

<form action="#" onsubmit="submitForm(event)" enctype="multipart/form-data">
    <input type="file" name="music_file" id="">
    <button type="submit">submit</button>
</form>

<div id="results"></div>

You can find the complete code on GitHub:
https://github.com/rohit20001221/music_recommendation_engine

DEV Community
