
Building a Live Streaming app in Clojure

I want to echo John Carmack’s tweet that all the giant companies use open-source FFmpeg in their backends. FFmpeg is a core piece of technology that powers our live-streaming and recording system at Inspire Fitness. It is certainly high-quality open-source software, and we use it to record and stream countless hours of workout videos.

It looks like this:

[Screenshots: session details and live sessions]

Users can:

  1. watch live-streaming content, or
  2. playback on-demand videos from our content library

High-level overview

Behind the scenes, we have:

  1. IP cameras wired up in each studio room.
  2. Cameras that support the RTSP protocol.
  3. A software pipeline, integrated with our custom CMS, that broadcasts (streams) the camera feeds to the internet while simultaneously recording and storing the content in AWS S3 for on-demand playback.

How it all works together:

  1. We configure classes (the recordings) to start at a particular time in our dashboard. The time aligns with our studio schedules, where gym members would often join our classes to work out alongside the instructors.
  2. We kick off a dedicated EC2 instance, with FFmpeg baked into an AMI image, when the class starts. We call this our encoder/transcoder:
;; vm.clj

(aws/invoke state/ec2
  {:op       :RunInstances
   :request {:InstanceType      "c5d.2xlarge"
             :MaxCount          1
             :MinCount          1
             :SubnetId          "subnet-id"
             :ImageId           "ami-id"
             :SecurityGroupIds  ["sg-id"]
             :UserData          (build-user-data class-id)
             :TagSpecifications [{:ResourceType "instance"
                                  :Tags         [{:Key "Name" :Value (make-class-name class-id)}]}]}})
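The :UserData field is just a base64-encoded cloud-init script. As a minimal sketch, assuming the AMI already has our encoder uberjar baked in (build-user-data and the paths here are illustrative, not our exact script):

;; vm.clj (sketch)

(defn build-user-data
  "Renders a cloud-init shell script that launches the encoder for a
   given class, base64-encoded as RunInstances' :UserData expects."
  [class-id]
  (let [script (str "#!/bin/bash\n"
                    "java -jar /opt/encoder/encoder.jar " class-id "\n")]
    (.encodeToString (java.util.Base64/getEncoder)
                     (.getBytes script "UTF-8"))))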
  3. As soon as the EC2 instance boots up, it runs the cloud-init script, which starts the Clojure process; mount (a state management library) then starts the dependencies:
;; core.clj

(defn run-encoder []
  (mount/start #'state/s3
               #'state/db
               #'encoder/encoder))

(defn -main [& args]
  (run-encoder))
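For reference, state/s3 and state/db are ordinary mount states. A minimal sketch, using the same aws-api library as vm.clj (next.jdbc and the placeholder db spec are assumptions, not our real configuration):

;; state.clj (sketch)

(ns state
  (:require [mount.core :refer [defstate]]
            [cognitect.aws.client.api :as aws]
            [next.jdbc :as jdbc]))

;; aws-api clients, as used by (aws/invoke state/ec2 ...) in vm.clj
(defstate ec2
  :start (aws/client {:api :ec2}))

(defstate s3
  :start (aws/client {:api :s3}))

;; assumption: db is a JDBC datasource (placeholder spec)
(defstate db
  :start (jdbc/get-datasource {:dbtype "postgresql" :dbname "encoder"}))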
  4. This in turn calls start-stream:
;; encoder.clj

(defstate encoder
  :start (try
           (start-stream)
           (catch Exception e
             ;; error-handling here
             ))
  :stop (stop-stream))
  5. The start-stream logic is actually pretty simple. It pulls the feed from our camera and egresses out to our CDN partner. (The wait-for-manifest step is elided in the code; see the sketch after this block.)
(defn start-stream []
  (record :encoder :connect-to-camera (make-event-details ...))
  (manager/register :connect-to-camera
    (sh/proc "ffmpeg" "-hide_banner" "-re" "-rtsp_transport" "tcp" "-i"
      config/encoder-rtsp-endpoint
      "-c:a" "aac" "-ar" "48000" "-b:a" "128k"
      "-c:v" "h264" "-profile:v" "high"
      "-g" "48" "-keyint_min" "48" "-sc_threshold" "0" "-b:v" "3072k"
      "-maxrate" "3500k" "-vcodec" "libx264" "-bufsize" "3072k"
      "-hls_time" "6"
      "-hls_playlist_type" "event"
      "-hls_segment_filename" segment-file-pattern
      manifest-file-path))

  (redirect-stdout-stderr :connect-to-camera hls-file-path)

  ;; it takes some time to pull from the RTSP stream and write to the m3u8 file
  ;; the HLS egress looks at the m3u8 file; if it can't find it, the process will exit - so wait for the file
  (record :encoder :wait-for-manifest-file (make-event-details ...))

  (record :encoder :egress (make-event-details ...))
  (manager/register :egress
    (sh/proc "ffmpeg" "-hide_banner" "-re" "-i" manifest-file-path
      "-c:v" "copy" "-c:a" "aac" "-ar" "48000" "-b:a" "128k" "-f" "flv"
      config/encoder-ingress-endpoint))

  (redirect-stdout-stderr :egress egress-file-path))
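The manifest wait itself is elided above. A minimal polling sketch, assuming a simple deadline (the helper name and the one-minute timeout are illustrative):

;; encoder.clj (sketch)

(defn wait-for-manifest-file
  "Blocks until FFmpeg has written the m3u8 manifest, polling once a
   second; returns true if the file appeared before timeout-ms elapsed."
  [path timeout-ms]
  (let [deadline (+ (System/currentTimeMillis) timeout-ms)]
    (loop []
      (cond
        (.exists (java.io.File. path))          true
        (> (System/currentTimeMillis) deadline) false
        :else (do (Thread/sleep 1000) (recur))))))

;; e.g. (wait-for-manifest-file manifest-file-path 60000)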
  6. The output is a playback URL that our video player pulls from.
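Concretely, that playback URL points at an HLS manifest hosted by the CDN. The exact shape depends on the provider; a hypothetical example:

;; hypothetical URL shape; the real host/path comes from our CDN partner
(defn playback-url [class-id]
  (str "https://cdn.example.com/live/" class-id "/playlist.m3u8"))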

Notice we are essentially invoking the FFmpeg binary we bundled earlier to:

  1. take input over the RTSP transport protocol, and
  2. pass argument flags for the video/audio codecs:
| Flag | What it does |
| --- | --- |
| -re | read input at native frame rate; mainly used to simulate a grab device, i.e. if you want to stream a video file you would want this, otherwise it might stream too fast |
| -c:a aac | transcode the audio to the AAC codec |
| -ar 48000 | set the audio sample rate |
| -b:a 128k | set the audio bitrate |
| -c:v h264 or -c:v copy | transcode the video to H.264, or pass the frames through to the output verbatim |
| -hls_time | the duration of each video segment |
| -f flv | deliver the output stream in an FLV wrapper |
| rtmp:// | where the transcoded video stream gets pushed to |

The code is essentially a shell wrapper around FFmpeg command-line arguments. FFmpeg is the Swiss Army knife for all video/audio codecs.

The whole encoder.clj is about 300 lines long including error handling. It handles file uploads (video segment files, FFmpeg logs for debugging), egress to a primary and a secondary/fallback RTMP slot, and shutting down the processes and the EC2 instance when we are done with the recording.
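A minimal sketch of that teardown, reusing the aws-api client from earlier (manager/stop-all and upload-logs are hypothetical helper names, not our exact code):

;; encoder.clj (sketch)

(defn current-instance-id
  "Asks the EC2 instance metadata service (IMDSv1) who we are;
   only works when running on the instance itself."
  []
  (slurp "http://169.254.169.254/latest/meta-data/instance-id"))

(defn stop-stream []
  (manager/stop-all)                     ;; hypothetical: kill the FFmpeg processes
  (upload-logs state/s3 config/log-dir)  ;; hypothetical: ship remaining logs to S3
  (aws/invoke state/ec2
    {:op      :TerminateInstances
     :request {:InstanceIds [(current-instance-id)]}}))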

Lesson learned

This was a rescue project from Go to Clojure. The previous architecture had too many moving pieces, making HTTP requests across multiple microservices. The main server would crash daily due to improper handling of WebSocket messages, causing messages to be lost and encoder instances not to start on time.

The rewrite reduced the complexity. Simple Made Easy, as Rich Hickey would say.

Rewriting a project is rarely a good approach considering the opportunity cost. I evaluated a few offerings: mux.com, Cloudflare Stream, and Amazon IVS. On paper, they have all the building blocks we need. In addition, some have features like video analytics and signed/protected playback URLs, which would be useful for us.

Ultimately, the fact that we still had a two-year contract with the CDN company is why we still manage our own encoder.

In hindsight, considering the storage costs and S3 egress bandwidth fees on top of the CDN costs, I would probably go with a ready-made solution at our company's stage and start optimizing once we have more traffic.

The good thing is that this system works really well for live-streaming workloads with programmatic access (barring occasional internet hiccups in our studio).

If you have a video production pipeline that involves heavy video editing, going with prebuilt software could be more flexible until you solidify the core functionalities.

Special thanks to:

  1. Daniel Fitzpatrick and Vincent Ho, my coworkers who helped proofread this article and who maintain the broadcast system with me. I enjoy working with you both ❤️. We even have automated tests to prove the camera stream works end to end!
  2. Neil, our product manager, understands tech trade-offs and works with me to balance the product roadmap.
  3. Daniel Glauser, who hired me for this Clojure gig.
