Mangabo Kolawole

Posted on Aug 9 • Edited on Sep 15

Building a Music Streaming Service with Python, Golang, and React: From System Design to Coding Part 3

#go #python #react #architecture

Streaming is an interesting topic in software engineering. Whether it is about music, video, or just simple data, applying this concept from a design, architecture, and coding perspective can quickly become complex if it is not thought through correctly.

In this article series, we are building a service to stream music using Python, Golang, and React. By the end of this series of articles, you will learn:

How to build an API using Python and Golang
How to serve HTTP range requests
How to create a system design/architecture for a music streaming service

In the last part of this series, we built an iteration of the application in which a user can retrieve a list of songs and play a selected song from the browser, with an architecture that can support many requests.

We noticed some drawbacks in our architecture, mostly concerning the security and privacy of the songs. Any user can directly download the songs, which undermines the purpose of a music streaming application. In this article, we will address this point by adding a streaming service component and ensuring that it can scale when there are a lot of requests.

If you are interested in more content covering topics like this, subscribe to my newsletter for regular updates on software programming, architecture, and tech-related insights.

Setup

You can clone v1.5 of the project using the following commands:

git clone -b v1.5 https://github.com/koladev32/golang-react-music-streaming.git
cd golang-react-music-streaming
make setup

These commands clone the v1.5 branch and install the packages and dependencies.

Once the project is set up, we can now move to discussing architecture decisions and enhancement.

Architecture

The image below represents the architecture of the project in its current state.

This architecture is a straightforward monolith that consolidates all backend functionality—such as caching, song management, and database connections—into a single server or domain. To ensure that the bandwidth of the server is not used abnormally, we have delegated the file serving to an external storage. This is represented in the architecture by having the storage component outside of the server domain.

With this new scalable architecture in place, we now face a challenge: users can access direct download links and potentially bypass the streaming service, undermining the purpose of streaming content through our application.

To improve the system, we will redesign the architecture by incorporating a streaming service component.

In the updated architecture, we've introduced a new component called the Streaming Service on the server. This component interacts with the database to retrieve song information. Additionally, the storage component no longer communicates directly with the client but instead communicates with the Streaming Service.

How does the Streaming Service work? When a user requests a song, the API Gateway redirects the request to the Streaming Service. The Streaming Service contacts the database to retrieve the song information and uses the URL of the stored file to download it. The file is then divided into byte chunks, and these chunks are sent to the client in response to HTTP range requests.

Malicious users may still find ways to download the song. To make this harder, we can encrypt the buffered chunks with a key that both the client and the server possess. We discuss this option later in the article.

The Streaming Service will be written using Golang. We will build a streaming engine with Golang, taking advantage of its concurrency features, as we may have thousands of requests to handle per minute. The service could be written in Python, but the GIL (Global Interpreter Lock) can heavily limit the effectiveness of multi-threading in CPU-bound operations.

Python can still be used for streaming services, particularly if the focus is more on ease of development and if the performance demands are not as stringent. However, for a service that needs to efficiently handle a large number of concurrent requests with low latency, Go is often a preferred choice.

Now that we have a better understanding of the architecture changes, let's add the streaming engine.

Adding Golang Streaming Engine

The Streaming Engine will be written in Golang and will run as a separate service. Here are the requirements for this service:

The streaming engine will serve only one endpoint songs/listen/<id> where id is the id of the song the user wants to listen to. This will be used to retrieve the song from the storage component.
The streaming service will respond to the client using HTTP Range Requests. We will learn more about this concept after listing the requirements.
The streaming service should accept ranges in the headers and send partial responses. This ensures that the client can specify the range of file bytes it can handle, depending on its connection speed, for example.
Tasks such as buffering and reading the chunks of the downloaded file should be handled concurrently.

With the requirements stated, let's talk about HTTP range requests.

Explaining HTTP Range Requests

HTTP Range Requests are a technique that allows clients to request specific portions of a resource, especially when dealing with large files like videos or audio streams. This method is widely employed by streaming services to ensure that content begins playing almost immediately, without the need to download the entire file upfront.

To better understand how HTTP Range Requests operate, let’s break down the process step by step.

How HTTP Range Requests Work

When a client requests a resource, the server typically responds with the entire file. However, when dealing with large files, the server may indicate that it supports range requests, allowing the client to download the file in parts.

Initial Request

* The client starts by sending a standard HTTP `GET` request to retrieve the resource.

* The server might respond with the full resource and include an `Accept-Ranges: bytes` header to indicate that range requests are supported.

Next, the client can make use of the range request feature to download only the necessary parts of the file.

Client Sends Range Request

* To request a specific portion of the resource, the client includes a `Range` header in its request:

    ```http
    Range: bytes=0-1023
    ```

* This header specifies the desired byte range, in this case, the first 1024 bytes.

Upon receiving this request, the server will respond with only the requested portion of the file.

Server Responds with Partial Content

* The server responds with an HTTP `206 Partial Content` status, indicating that it is sending only a portion of the resource:

    ```http
    Content-Range: bytes 0-1023/5000
    ```

Once the client has received this part, it can continue to request additional parts of the file as needed.

Subsequent Requests

* If more data is needed, the client requests the next segment:

    ```http
    Range: bytes=1024-2047
    ```

* The server then responds with the next chunk, continuing this process until the entire file is downloaded or the client has obtained all the necessary parts.
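To see the exchange above from a client's point of view, here is a minimal sketch in Go. It is not part of the project: the in-memory test server, the 5000-byte payload, and the helper names `requestRange` and `newDemoServer` are assumptions made for illustration. The server side relies on the standard library's `http.ServeContent`, which handles `Range` headers for any `io.ReadSeeker`.

```go
package main

import (
    "bytes"
    "fmt"
    "io"
    "net/http"
    "net/http/httptest"
    "time"
)

// requestRange asks for bytes start..end (inclusive) of url and returns the
// status code, the Content-Range header and the number of body bytes received.
func requestRange(url string, start, end int64) (int, string, int, error) {
    req, err := http.NewRequest(http.MethodGet, url, nil)
    if err != nil {
        return 0, "", 0, err
    }
    req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return 0, "", 0, err
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        return 0, "", 0, err
    }
    return resp.StatusCode, resp.Header.Get("Content-Range"), len(body), nil
}

// newDemoServer serves a 5000-byte in-memory resource (hypothetical payload).
func newDemoServer() *httptest.Server {
    data := make([]byte, 5000)
    return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        http.ServeContent(w, r, "demo.bin", time.Time{}, bytes.NewReader(data))
    }))
}

func main() {
    srv := newDemoServer()
    defer srv.Close()

    status, contentRange, n, err := requestRange(srv.URL, 0, 1023)
    if err != nil {
        panic(err)
    }
    fmt.Println(status, contentRange, n)
}
```

Running it should print the `206` status, the `Content-Range` value for the first 1024 bytes, and the number of bytes actually received.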

HTTP Range Requests offer several benefits that make them particularly useful in scenarios where large files are involved.

Benefits of HTTP Range Requests

By allowing clients to download only the portions of a file they need, HTTP Range Requests provide several key advantages.

Improved User Experience: Users can start consuming content, such as streaming a video or audio file, almost immediately without waiting for the entire file to download. Additionally, they can skip to different parts of a file without downloading it in its entirety.
Better Bandwidth Management: Only the necessary portions of a file are downloaded, reducing unnecessary data transfer. This also allows downloads to resume from where they left off in case of interruptions.
Scalability for Servers: Servers can better manage their load by serving only the required parts of a resource, leading to more efficient distribution of bandwidth and resources.

To illustrate how this process works in a real-world scenario, consider the example of streaming a song.

Example Flow: Streaming a Song

When a user streams a song, the client and server communicate in a series of requests and responses.

Client Requests Audio Start: The client begins by requesting the first chunk of an audio file:
```
GET /audio.mp3 HTTP/1.1
Range: bytes=0-2047
```
Server Responds with Partial Content: The server sends the first 2048 bytes of the file:
```
HTTP/1.1 206 Partial Content
Content-Range: bytes 0-2047/100000
```
Client Requests Next Segment: As the audio plays, the client requests the next segment:
```
GET /audio.mp3 HTTP/1.1
Range: bytes=2048-4095
```
Server Responds with Next Chunk: The server sends the next portion of the file:
```
HTTP/1.1 206 Partial Content
Content-Range: bytes 2048-4095/100000
```
Client Seeks to Another Part: If the user skips ahead, the client requests a different part of the file:
```
GET /audio.mp3 HTTP/1.1
Range: bytes=8192-10239
```
Server Responds with the New Range: The server responds with the requested part:
```
HTTP/1.1 206 Partial Content
Content-Range: bytes 8192-10239/100000
```

This example highlights how HTTP Range Requests enable efficient and user-friendly streaming, providing a smoother experience by allowing immediate playback and more manageable file transfers.
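The sequential part of this flow can be sketched as a small helper that computes the `Range` header for each chunk. This is an illustration only: the function name `rangeHeaders` and the chunk size are assumptions, and a real player decides on its own which ranges to request.

```go
package main

import "fmt"

// rangeHeaders returns the Range header values a client would send to read a
// file of fileSize bytes sequentially, chunkSize bytes at a time.
func rangeHeaders(fileSize, chunkSize int64) []string {
    if fileSize <= 0 || chunkSize <= 0 {
        return nil
    }
    var headers []string
    for start := int64(0); start < fileSize; start += chunkSize {
        end := start + chunkSize - 1
        if end >= fileSize {
            end = fileSize - 1 // the last chunk may be shorter
        }
        headers = append(headers, fmt.Sprintf("bytes=%d-%d", start, end))
    }
    return headers
}

func main() {
    for _, h := range rangeHeaders(5000, 2048) {
        fmt.Println("Range:", h)
    }
}
```

For a 5000-byte file and 2048-byte chunks, the last range ends at byte 4999 because byte offsets are zero-based and inclusive.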

With HTTP Range requests explained, let's write the implementation in Golang. We will build an API to serve an endpoint and then write a function to handle the streaming via HTTP Range Requests.

Building the Streaming Service with Golang

In this section, we will build the streaming service with Golang. At the root of the project, create a new folder called streaming-engine. This directory will contain the backend for streaming written in Golang.

mkdir streaming-engine
cd streaming-engine

Then, inside this directory, run the following command to create the Golang project.

go mod init streaming-engine

Then install the required dependencies: Mux for routing, and GORM with its SQLite driver to interact with the database.

go get github.com/gorilla/mux
go get gorm.io/driver/sqlite
go get gorm.io/gorm

Once the installation is done, create a file called main.go where we will put the content of the backend logic.

Writing the streaming Engine backend logic

Now that the project has been set up, we can start writing the code for the streaming engine. Let's begin with the necessary imports and the definition of the essential struct:

package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
    "strconv"
    "strings"

    "github.com/gorilla/mux"
    "gorm.io/driver/sqlite"
    "gorm.io/gorm"
)

var db *gorm.DB
var err error

// Song represents the song model in the existing music_song table
type Song struct {
    ID        uint   `gorm:"column:id"`
    Name      string `gorm:"column:name"`
    File      string `gorm:"column:file"`
    Author    string `gorm:"column:author"`
    Thumbnail string `gorm:"column:thumbnail"`
}

// TableName overrides the table name used by Gorm
func (Song) TableName() string {
    return "music_song"
}

In the code above, we import the required packages to assist in writing the stream handler function and setting up the API. We also define variables such as db for database initialization and err for tracking errors throughout the application. The Song struct is defined to represent the song model in the music_song table. We override the default table name used by Gorm to ensure it correctly maps to the existing database table.

Next, we proceed with writing the function for database initialization:

func initDB() {
    // Initialize SQLite connection
    db, err = gorm.Open(sqlite.Open("../backend/db.sqlite3"), &gorm.Config{})
    if err != nil {
        log.Fatalf("Failed to connect to database: %v", err)
    }
}

In the code above, we define the initDB function to initialize the database connection using the gorm package's Open function. Ensure that the path to the SQLite database file matches your project structure, and adjust it if necessary.

Moving on, we will write the functions that will be utilized within the stream handler function:

Extracting the song ID from the request URL:

func getSongID(r *http.Request) (int, error) {
    params := mux.Vars(r)
    id, err := strconv.Atoi(params["id"])
    return id, err
}

In the code above, we extract the song ID from the URL parameters using mux.Vars. The function converts the ID from a string to an integer with strconv.Atoi and returns the ID along with any encountered errors.

Retrieving the song details from the database:

func getSongFromDB(id int) (Song, error) {
    var song Song
    err := db.First(&song, id).Error
    return song, err
}

The getSongFromDB function queries the database to retrieve the song details for the given ID using db.First. It returns the song data and any errors that arise during the query.

Fetching the file from the URL:

func fetchFile(fileURL string) (*http.Response, error) {
    fullURL := "http://localhost:8000/media/" + fileURL
    resp, err := http.Get(fullURL)
    if err != nil {
        return nil, fmt.Errorf("fetching file: %w", err)
    }
    if resp.StatusCode != http.StatusOK {
        resp.Body.Close() // release the connection before discarding the response
        return nil, fmt.Errorf("file not found on the server")
    }
    return resp, nil
}

In the code above, fetchFile constructs the full URL for the media file by appending the fileURL to a base URL and performs an HTTP GET request to retrieve the file. It returns the response or an error if the file is not found or an issue occurs.

Parsing the Range header to get the start and end bytes of the file:

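func parseRangeHeader(rangeHeader string, fileSize int64) (int64, int64, error) {
    // Only single byte ranges such as "bytes=0-1023" or "bytes=1024-" are supported
    if !strings.HasPrefix(rangeHeader, "bytes=") {
        return 0, 0, fmt.Errorf("unsupported range unit")
    }
    bytesRange := strings.Split(strings.TrimPrefix(rangeHeader, "bytes="), "-")
    start, err := strconv.ParseInt(bytesRange[0], 10, 64)
    if err != nil {
        return 0, 0, err
    }

    // An open-ended range ("bytes=1024-") runs to the end of the file
    end := fileSize - 1
    if len(bytesRange) > 1 && bytesRange[1] != "" {
        end, err = strconv.ParseInt(bytesRange[1], 10, 64)
        if err != nil {
            return 0, 0, err
        }
    }

    // An end position past the last byte is clamped rather than rejected
    if end >= fileSize {
        end = fileSize - 1
    }
    if start > end {
        return 0, 0, fmt.Errorf("invalid range")
    }

    return start, end, nil
}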

In the code above, we parse the Range header from the HTTP request to determine the start and end bytes of the file that the client wants to receive. We handle any errors in the range specification and return the start and end byte positions.
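HTTP also allows a suffix form, `bytes=-N`, meaning "the last N bytes of the file", which the parser in this article does not cover. If you need it, a separate helper along these lines could be tried first. This is a sketch only, and the name `parseSuffixRange` is hypothetical.

```go
package main

import (
    "fmt"
    "strconv"
    "strings"
)

// parseSuffixRange handles the "bytes=-N" form and returns the inclusive
// start and end offsets of the last N bytes of a file of fileSize bytes.
func parseSuffixRange(rangeHeader string, fileSize int64) (int64, int64, error) {
    const prefix = "bytes=-"
    if !strings.HasPrefix(rangeHeader, prefix) {
        return 0, 0, fmt.Errorf("not a suffix range: %q", rangeHeader)
    }
    n, err := strconv.ParseInt(strings.TrimPrefix(rangeHeader, prefix), 10, 64)
    if err != nil || n <= 0 || fileSize <= 0 {
        return 0, 0, fmt.Errorf("invalid suffix range: %q", rangeHeader)
    }
    if n > fileSize {
        n = fileSize // a suffix longer than the file selects the whole file
    }
    return fileSize - n, fileSize - 1, nil
}

func main() {
    start, end, err := parseSuffixRange("bytes=-500", 5000)
    if err != nil {
        panic(err)
    }
    fmt.Printf("bytes %d-%d/%d\n", start, end, 5000)
}
```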

Writing the partial content to the response:

func writePartialContent(w http.ResponseWriter, start, end, fileSize int64, resp *http.Response) error {
    // Skip the bytes before the start position. This must happen before the
    // reader goroutine starts and before any header is sent.
    if _, err := io.CopyN(io.Discard, resp.Body, start); err != nil {
        return fmt.Errorf("skipping to range start: %w", err)
    }

    w.Header().Set("Content-Range", fmt.Sprintf("bytes %d-%d/%d", start, end, fileSize))
    w.Header().Set("Accept-Ranges", "bytes")
    w.Header().Set("Content-Length", strconv.FormatInt(end-start+1, 10))
    w.Header().Set("Content-Type", "audio/mpeg")
    w.WriteHeader(http.StatusPartialContent)

    // dataChan carries chunks from the reader goroutine to the writer loop;
    // closing done tells the reader to stop if the client goes away.
    dataChan := make(chan []byte, 8)
    done := make(chan struct{})
    defer close(done)

    go func() {
        defer close(dataChan)
        bytesToRead := end - start + 1
        for bytesToRead > 0 {
            buffer := make([]byte, 1024) // fresh 1KB buffer per chunk, so a sent chunk is never overwritten
            n, err := resp.Body.Read(buffer)
            if int64(n) > bytesToRead {
                n = int(bytesToRead)
            }
            if n > 0 {
                select {
                case dataChan <- buffer[:n]:
                    bytesToRead -= int64(n)
                case <-done:
                    return
                }
            }
            if err != nil {
                return // io.EOF or a read error ends the stream
            }
        }
    }()

    // Write on the handler's goroutine: the ResponseWriter must not be used
    // after the handler returns.
    for chunk := range dataChan {
        if _, err := w.Write(chunk); err != nil {
            return fmt.Errorf("writing response: %w", err)
        }
    }

    return nil
}

In the code above, writePartialContent sets the headers necessary for partial content delivery and handles the concurrent reading and writing of the specified byte range. We use goroutines to buffer and write data concurrently, ensuring efficient streaming. If any errors occur during the process, they are returned as HTTP errors.

We can now use these functions in the stream handler function and create the API server to serve the streaming endpoint.

// Handles streaming of the file via HTTP range requests
func streamHandler(w http.ResponseWriter, r *http.Request) {
    id, err := getSongID(r)
    if err != nil {
        http.Error(w, "Invalid song ID", http.StatusBadRequest)
        return
    }

    song, err := getSongFromDB(id)
    if err != nil {
        http.Error(w, "Song not found", http.StatusNotFound)
        return
    }

    resp, err := fetchFile(song.File)
    if err != nil {
        http.Error(w, err.Error(), http.StatusNotFound)
        return
    }
    defer resp.Body.Close()

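    fileSize := resp.ContentLength

    rangeHeader := r.Header.Get("Range")
    if rangeHeader == "" {
        // No Range header: stream the whole file from the storage response.
        // song.File is a path relative to the storage server's media root,
        // not a local file, so http.ServeFile cannot serve it.
        w.Header().Set("Accept-Ranges", "bytes")
        w.Header().Set("Content-Type", "audio/mpeg")
        if fileSize >= 0 {
            w.Header().Set("Content-Length", strconv.FormatInt(fileSize, 10))
        }
        io.Copy(w, resp.Body)
        return
    }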

    start, end, err := parseRangeHeader(rangeHeader, fileSize)
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }

    if err := writePartialContent(w, start, end, fileSize, resp); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
}

func main() {
    initDB()

    r := mux.NewRouter()
    r.HandleFunc("/songs/listen/{id}", streamHandler).Methods("GET")

    log.Println("Server is running on port 8005")
    log.Fatal(http.ListenAndServe(":8005", r))
}

The streamHandler function handles HTTP range requests for streaming files by extracting the song ID, retrieving the song details from the database, and fetching the file from a URL. It then parses any range headers to determine which portion of the file to stream and writes this partial content to the HTTP response.

The main function initializes the database, sets up an HTTP router to handle requests for streaming songs, and starts the server on port 8005. In the streaming engine directory, run the following command to start the server.

go run .

We have now written the Golang service, and the client can stream a song via the /songs/listen/{id} endpoint, where id is the ID of the song.

Now that the service is written, we have to make some tweaks to the Django backend and the frontend.

Modifying the Backend and Frontend to use the Streaming service

Backend Modifications: Restricting Exposed Fields

In the Django backend, we need to control the data exposed through the API. To achieve this, we exclude the file field from the API response. Update the SongSerializer to display only the relevant fields:

# music/serializers.py

class SongSerializer(serializers.ModelSerializer):
    class Meta:
        model = Song
        fields = ['id', 'name', 'artist', 'duration', 'thumbnail']

In this configuration, the file field is deliberately omitted, ensuring it’s not exposed to the client.

Frontend Adjustments: Utilizing the Streaming Endpoint

On the frontend, adjust the logic to leverage the newly created streaming endpoint. The react-h5-audio-player package handles streaming, so manual management of the stream isn’t necessary.

Update the playSong function in app.js to correctly set the song’s URL:

// app.js
...
const playSong = (song) => {
  setCurrentSong(`http://localhost:8005/songs/listen/${song.id}`);
};
...

With this update, the frontend automatically streams the audio using the updated API, providing a seamless experience for the user.

With this final version of the application written, let's talk about some enhancements that can be made.

Enhancements

When building this application and planning the architecture, we aimed for scalability and reliability by making sure the architecture could support a large number of requests. Because we kept the coding part simple, it is important to point out some enhancements that should be made on both the architecture side and the coding side.

Architectural enhancements

At the moment, the current architecture has the streaming service component on the server. While streaming is done using HTTP range requests, it is also important to account for bandwidth usage. Here are the enhancements we can make to the architecture:

Move the streaming engine component out of the server and onto another server/domain with a cache component. The cache is important because the streaming engine reads song information from the database.
Move the API Gateway out of the server too. It can then route each request, depending on its URL, to either the streaming engine or the API server.

Here is the new diagram for the architecture following these changes.

Now that we have a better architectural proposal, let's talk about coding enhancements.

Coding enhancements

Many streaming services secure data during transmission by using encryption, which protects the content from unauthorized access and tampering. This process typically involves two main steps:

First, on the backend, the content is encrypted before being sent over the network. Encryption algorithms like AES-128 or AES-256 are commonly used for this purpose. The encrypted content is then delivered to the client via HTTP or HTTPS. Depending on the streaming format, encryption can be applied at the file level or to individual segments if the content is chunked.
Second, on the frontend, the client—such as a web player, mobile app, or smart TV—receives the encrypted content and decrypts it using a key provided by the backend. Decryption usually takes place within the browser or application runtime, with the key exchange secured through protocols like HTTPS or Digital Rights Management (DRM) technologies.

Examples of Encrypted Streaming

HLS (HTTP Live Streaming) with AES-128 Encryption: HLS is a widely used streaming protocol that supports AES-128 encryption. Media files are divided into segments, each encrypted individually. The decryption key is stored on the server and retrieved by the client via a secure HTTPS connection.
DASH (Dynamic Adaptive Streaming over HTTP) with Widevine DRM: DASH is another popular streaming protocol often paired with DRM systems like Google Widevine. The content is encrypted, and the decryption key is managed by the DRM system, ensuring secure key management and licensing. This setup allows only authorized clients to decrypt and play the content.
RTMP (Real-Time Messaging Protocol) with SSL/TLS: RTMP is used for low-latency streaming and can be secured with SSL/TLS for encrypted transmission. While RTMP is less common today compared to HLS and DASH, it remains relevant in some live-streaming scenarios.
DRM Systems like PlayReady or FairPlay: DRM systems such as Microsoft PlayReady and Apple FairPlay are integral to services like Netflix, Hulu, and Apple TV. These systems encrypt content on the server and control access to decryption keys through a licensing server, ensuring that only authorized or paying users can access the content.

This is an interesting step that can be added to the streaming engine to ensure that the streaming is reliably encrypted.

Conclusion

In this article, we've developed a streaming service using Golang and HTTP Range Requests, and explored key architectural improvements to enhance the security and efficiency of our application.

You can find the code for this article here.

This concludes this part of the series, but stay tuned for our next installment, where we'll integrate all the concepts covered here into a global-scale architecture. We’ll explore how to build a streaming service capable of serving users worldwide while maintaining high performance.

If you enjoyed this article, consider subscribing to my newsletter so you don't miss out on future updates.

Your feedback is valuable! If you have any suggestions, critiques, or questions, please leave a comment below.

Stay tuned for more exciting content! 🚀