Our team had a chance to sit down with Tim Specht, Co-Founder and CTO at Dubsmash. If you’re not familiar with the organization, Dubsmash is a company that provides a social platform for users to share videos via their mobile applications. Users can choose an audio recording or soundbite from TV shows, movies, music, and other internet trends and record a video of themselves dubbing over that clip of audio, which can then be posted and shared.
Dubsmash’s differentiating factor from other platforms on the market is its focus on creators. Rather than curating everything toward the consumer, Dubsmash designs the app's format, features, and functionality around the people making the content, because the company believes creators deserve attention. The technology is geared toward helping creators make compelling content so they can keep their followers engaged. Dubsmash also runs a program that helps artists and creators boost their careers while following their passions, a sign of its commitment to giving back to its community.
Additionally, Dubsmash has always valued, and will continue to value, diversity of all kinds. It has been reported that one of Dubsmash’s main competitors, TikTok, explicitly limited the promotion of posts featuring people considered unattractive or filmed in poor neighborhoods in its ‘For You’ feed of recommended videos. TikTok built its algorithms around the elements that generate the most engagement, such as favoring videos that fit traditional standards of beauty or use clean, modern homes as backdrops, to grab attention in the fastest, easiest way possible.
Dubsmash applies far less of this kind of content suppression, so everyone has a fair shot at getting their ideas, art, and content out there, regardless of how they look or where they film their videos. One advantage of not being TikTok is that the app feels less crowded by semi-pro creators and influencers, which gives users the sense that they’re more likely to land on Dubsmash's Trending or Explore page. The Trending page is dominated by hot new songs and flashy dances, even when they’re shot with a lower production quality that feels accessible. Dubsmash creates more opportunity by making its Explore page about discovering accounts and all the content they’ve made rather than individual videos. On TikTok, popular clips may rack up millions of views, while on Dubsmash they reach only tens of thousands. Still, there is enough visibility to make shooting on Dubsmash worthwhile, especially for lesser-known artists who want to be discovered.
Dubsmash has captured 27% of the U.S. short-form video market by installs, second only to TikTok at 59%. Among active users outside of TikTok, Dubsmash holds 73% of the U.S. market, compared with 23% for Triller, 3.6% for Firework, and 0% for Facebook’s Lasso. Dubsmash has three times as many active users and saw 38% more first-time downloads in 2018 than 2019. Thirty percent of Dubsmash’s daily users create content, and the app sees 30% month-over-month retention.
Dubsmash takes a hybrid approach to hosting. Because the product is fundamentally a mobile application for iOS and Android devices, only the backend and backend-related services need to be hosted. For these, Dubsmash relies heavily on Heroku’s Container Service. All Docker images are stored in Quay, which allows near real-time rollbacks if something goes wrong.
Heroku allows Dubsmash to scale both horizontally (more dynos) and vertically (larger dyno types), essentially without limit, and without needing a large team of backend infrastructure engineers. Simply build, push, and deploy.
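The build, push, and deploy loop, plus a Quay-based rollback, might look roughly like the following. This is a minimal sketch under stated assumptions: the Quay repository and app names are hypothetical, and rolling back by re-pushing an older Quay tag to Heroku's registry is an illustration, not Dubsmash's documented process. It builds each step as a command list (inspectable with `dry_run=True`) and assumes `docker login quay.io` and `heroku container:login` have already been run.

```python
import subprocess
from typing import List

Command = List[str]


def _heroku_image(app: str, process: str) -> str:
    # Heroku's container registry expects images at registry.heroku.com/<app>/<process-type>
    return f"registry.heroku.com/{app}/{process}"


def deploy_commands(app: str, quay_repo: str, tag: str, process: str = "web") -> List[Command]:
    """Build an image, keep a tagged copy in Quay, then push and release it on Heroku."""
    source = f"quay.io/{quay_repo}:{tag}"
    target = _heroku_image(app, process)
    return [
        ["docker", "build", "-t", source, "."],
        ["docker", "push", source],  # every tag stays in Quay for later rollbacks
        ["docker", "tag", source, target],
        ["docker", "push", target],
        ["heroku", "container:release", process, "--app", app],
    ]


def rollback_commands(app: str, quay_repo: str, previous_tag: str, process: str = "web") -> List[Command]:
    """Re-release an image already stored in Quay; nothing is rebuilt."""
    source = f"quay.io/{quay_repo}:{previous_tag}"
    target = _heroku_image(app, process)
    return [
        ["docker", "pull", source],
        ["docker", "tag", source, target],
        ["docker", "push", target],
        ["heroku", "container:release", process, "--app", app],
    ]


def scale_command(app: str, process: str, count: int, size: str) -> Command:
    # Horizontal scaling = count (number of dynos); vertical scaling = size (dyno type)
    return ["heroku", "ps:scale", f"{process}={count}:{size}", "--app", app]


def run(commands: List[Command], dry_run: bool = True) -> None:
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

For example, `run(rollback_commands("example-api", "example-org/api", "v41"))` prints the rollback steps without executing them. Because a rollback only re-tags and re-releases an existing image, it skips the slowest part of a deploy, the build.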
Aside from Heroku, Dubsmash makes heavy use of AWS as its primary cloud provider, with GCP coming in second thanks to its robust cloud data warehouse, BigQuery. According to Tim, “If AWS offers a service, Dubsmash likely uses it to some extent.”
GCP’s BigQuery delivers insight into user data in a fraction of the time required by Dubsmash’s previous homegrown solution, which was built on AWS Redshift among other tools. “What would normally take hours of query time can now be done effectively and cost-efficiently in seconds using GCP’s BigQuery, allowing the team at Dubsmash to increase overall productivity and user experience.”
Day to day, Dubsmash relies on dozens of third-party services from various vendors to maximize uptime and gain visibility into user metrics.
For example, Celery is used heavily as a distributed task queue for video decoding and encoding, processing millions of tasks daily. Memcached and Redis handle caching for snappy load times, while Airflow runs complex workflows such as analytical reporting and notifications.
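The caching layer described above typically follows a cache-aside pattern: check the cache, fall back to the source of truth on a miss, then store the result with an expiry. A minimal sketch follows. The `InMemoryCache` class is a hypothetical stand-in for a Memcached or Redis client so the example runs without either server; real clients spell the TTL argument differently (for instance `ex=` in redis-py and `expire=` in pymemcache).

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical stand-in for Memcached/Redis: get/set with a TTL in seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._entries[key] = (self._clock() + ttl, value)


def get_or_load(cache: InMemoryCache, key: str, ttl: float, loader: Callable[[], Any]) -> Any:
    """Cache-aside read: serve from cache, otherwise load from the source and cache it."""
    value = cache.get(key)
    if value is None:  # note: a loader result of None is never cached
        value = loader()
        cache.set(key, value, ttl)
    return value
```

The injectable `clock` exists only so expiry can be tested deterministically; production code would use the cache server's own expiry.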
The most important question we asked was which database Dubsmash uses, and why. The answer was short and sweet: PostgreSQL. The reasoning comes down to the many 1:1 relationships in Dubsmash’s data model. In contrast, a document database such as MongoDB would make less sense given its weaker support for relational data such as 1:N relationships, along with its primary/secondary setup. Reads, writes, and queries against Postgres are all efficient. Postgres works very well for Dubsmash’s use case, and the team stands by its decision to use it as the core database.
While this is not an exhaustive list of the infrastructure Dubsmash uses daily, it should give you a good idea of how Dubsmash operates.
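To make the relationship types concrete, here is a minimal schema sketch. The tables and columns are hypothetical, not Dubsmash's actual schema, and it uses Python's built-in `sqlite3` as a stand-in so it runs anywhere; the same DDL carries over to PostgreSQL with small changes such as identity columns. A `UNIQUE` foreign key enforces a 1:1 relationship, while a plain foreign key allows 1:N.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);

-- 1:1: UNIQUE on the foreign key allows at most one profile per user
CREATE TABLE profiles (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL UNIQUE REFERENCES users(id) ON DELETE CASCADE,
    bio     TEXT
);

-- 1:N: one user can own many videos
CREATE TABLE videos (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    title   TEXT NOT NULL
);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    return conn


def video_counts(conn: sqlite3.Connection) -> list:
    # A join across both relationship types, which a relational database handles natively
    return conn.execute(
        """
        SELECT u.username, p.bio, COUNT(v.id) AS videos
        FROM users u
        LEFT JOIN profiles p ON p.user_id = u.id
        LEFT JOIN videos v ON v.user_id = u.id
        GROUP BY u.id, u.username, p.bio
        ORDER BY u.username
        """
    ).fetchall()
```

Inserting a second profile for the same user fails with an integrity error, so the database itself guarantees the 1:1 rule rather than application code.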
Python is Dubsmash’s primary language; the mobile applications, however, are written in native languages. Currently, the iOS app uses Swift, while the Android app is being migrated from Java to Kotlin.
Docker has been a fantastic product for Dubsmash, and the team uses it heavily. It lets developers onboard themselves quickly and efficiently without needing in-depth knowledge of every part of the infrastructure stack, which relies on Python 3, Django, PostgreSQL, Memcached, Celery, and Redis, to name a few.
Without Docker and Docker Compose, Dubsmash’s engineers would waste precious time and resources getting up to speed with the various products the team uses, its best practices, and its setup and cleanup scripts.
“Docker has been a game-changer for our team here at Dubsmash,” according to Tim.
Dubsmash rarely builds in-house products, preferring to buy rather than build. However, certain requirements sometimes come into play, and you have to make business decisions that ultimately move your business and engineering team forward.
With this in mind, Dubsmash played around with AWS Kinesis, only to find that the product failed to meet their expectations from an analytics and cost perspective.
Instead, Dubsmash dumps all of its data into GCP’s BigQuery, letting the team run automated and ad-hoc queries on the fly in a matter of seconds using a simple, easy-to-use SQL dialect. Depending on the type of query and what it's needed for, the results end up in Google Data Studio for reporting or are handed to a workflow that Airflow takes over (e.g., user push notifications).
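An automated or ad-hoc query of the kind described above might look like this with the `google-cloud-bigquery` Python client. It's a sketch under assumptions: the table ID and its `user_id`/`event_date` columns are hypothetical, and the date range is passed as query parameters (`@start_date`, `@end_date`) rather than formatted into the SQL. The client is imported inside the function so the SQL builder can be exercised without GCP credentials.

```python
from datetime import date
from typing import Any, Dict, List

# Hypothetical events table with user_id and event_date (DATE) columns.
DAU_SQL = """
SELECT event_date, COUNT(DISTINCT user_id) AS daily_active_users
FROM `{table}`
WHERE event_date BETWEEN @start_date AND @end_date
GROUP BY event_date
ORDER BY event_date
"""


def build_dau_sql(table: str) -> str:
    # Table IDs can't be query parameters, so only trusted, internal names belong here.
    return DAU_SQL.format(table=table)


def daily_active_users(table: str, start: date, end: date) -> List[Dict[str, Any]]:
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client()  # uses Application Default Credentials
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("start_date", "DATE", start),
            bigquery.ScalarQueryParameter("end_date", "DATE", end),
        ]
    )
    rows = client.query(build_dau_sql(table), job_config=job_config).result()
    return [dict(row.items()) for row in rows]
```

The returned list of dicts is easy to hand to a reporting step or to an Airflow task; how Dubsmash actually wires that hand-off isn't described in the source.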
During Dubsmash's early stages, the engineering team experimented with several technologies to build a homegrown search engine for real-time user search. After many wasted hours and a lot of money spent hosting Elasticsearch, the team changed course and left search to the experts at Algolia.
Algolia focuses entirely on scalable search and, in Dubsmash's experience, delivers much faster query response times than Elasticsearch.
According to Dubsmash, Elasticsearch had a complicated, proprietary query language, performed poorly when indexing and reindexing data, and was an overall pain to keep running efficiently. With Algolia, Dubsmash cut the time it spends on search to a fraction and can focus on what matters most: user experience.
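Indexing user records in Algolia can be sketched as follows. This assumes the pre-v4 `algoliasearch` Python client (`SearchClient.create(...).init_index(...)` and `save_objects`); the v4 client has a different API, so check your installed version. The record fields and index name are hypothetical. Algolia identifies each record by `objectID`, so reusing the database primary key makes re-syncing a user an overwrite rather than a duplicate.

```python
from typing import Any, Dict, Iterable, List


def to_search_record(user: Dict[str, Any]) -> Dict[str, Any]:
    """Map a (hypothetical) user row to an Algolia record."""
    return {
        "objectID": str(user["id"]),  # stable ID: saving again replaces the record
        "username": user["username"],
        "display_name": user.get("display_name", ""),
        "followers": user.get("followers", 0),  # could serve as a custom ranking attribute
    }


def sync_users(index: Any, users: Iterable[Dict[str, Any]]) -> int:
    """Push users to the index; returns how many records were sent."""
    records: List[Dict[str, Any]] = [to_search_record(u) for u in users]
    if records:
        index.save_objects(records)
    return len(records)


def open_index(app_id: str, api_key: str, name: str = "users") -> Any:
    from algoliasearch.search_client import SearchClient  # pre-v4 client API

    return SearchClient.create(app_id, api_key).init_index(name)
```

Typical use would be `sync_users(open_index("APP_ID", "API_KEY"), rows)`, with the index object passed in so the mapping logic can be tested against a fake.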
At the core of Dubsmash lies Stream, the primary driving force for the Feeds infrastructure users rely on to discover content within the application.
Tim admitted that the team had been looking at Stream for a while and had even attempted to build an in-house feed service, which fell short on overall cost and performance.
“Building a custom in-house infrastructure to power feeds, accompanied with ranking (weights), speed, reliability, and something cost-effective is a true challenge, and we thank Stream for everything that they do.” - Tim Specht
By using Stream's feeds infrastructure, Dubsmash can scale to a virtually unlimited number of users while making real-time tweaks to sorting, weighting, time decay, and more, with peace of mind that the application won’t grind to a halt because of the hiccups a homegrown solution might have.
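Stream lets teams define ranking formulas that combine engagement weights with time-decay functions. The exact expression syntax lives in Stream's ranking documentation; the sketch below is only a local illustration of what such a formula computes, not Stream's API. The weights, the 24-hour scale, and the Gaussian decay shape are all assumptions.

```python
import math
from typing import Dict, List


def gauss_decay(age_hours: float, scale_hours: float, offset_hours: float = 0.0, decay: float = 0.5) -> float:
    """Multiplier in (0, 1]: 1.0 inside the offset, exactly `decay` at offset + scale."""
    distance = max(0.0, age_hours - offset_hours)
    return decay ** ((distance / scale_hours) ** 2)


def rank_score(likes: int, comments: int, age_hours: float,
               like_weight: float = 1.0, comment_weight: float = 2.0,
               scale_hours: float = 24.0, offset_hours: float = 1.0) -> float:
    # log1p dampens large counts so raw engagement doesn't swamp recency
    engagement = like_weight * math.log1p(likes) + comment_weight * math.log1p(comments)
    return engagement * gauss_decay(age_hours, scale_hours, offset_hours)


def rank(activities: List[Dict]) -> List[Dict]:
    """Sort feed activities by descending score."""
    return sorted(
        activities,
        key=lambda a: rank_score(a["likes"], a["comments"], a["age_hours"]),
        reverse=True,
    )
```

With these defaults, a one-hour-old clip with 100 likes outranks a three-day-old clip with 1,000 likes; changing `comment_weight` or `scale_hours` shifts that balance, which is the kind of real-time tuning described above.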
Email isn’t used all that often at Dubsmash, which relies heavily on SMS and push for notifications. When emails are sent, Dubsmash uses AWS Simple Email Service (AWS SES).
SMS is the primary form of communication that Dubsmash utilizes to interact with its users (aside from Push Notifications). To establish a reliable SMS workflow, Dubsmash uses AWS Simple Notification Service (AWS SNS) to send messages to users.
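With boto3, the SMS and email paths described above each come down to a single API call: `sns.publish` with a `PhoneNumber` for SMS, and `ses.send_email` for email. In this sketch the clients are passed in (for example `boto3.client("sns")` and `boto3.client("ses")`) so the functions can be tested without AWS; the sender address and the choice of the `Transactional` SMS type are assumptions.

```python
from typing import Any


def send_sms(sns_client: Any, phone_number: str, text: str) -> str:
    """Send one SMS via Amazon SNS; returns the SNS MessageId."""
    if not phone_number.startswith("+"):
        raise ValueError("SNS expects phone numbers in E.164 format, e.g. +15555550123")
    response = sns_client.publish(
        PhoneNumber=phone_number,
        Message=text,
        MessageAttributes={
            # Transactional SMS is optimized for delivery reliability (e.g. login codes)
            "AWS.SNS.SMS.SMSType": {"DataType": "String", "StringValue": "Transactional"},
        },
    )
    return response["MessageId"]


def send_email(ses_client: Any, sender: str, recipient: str, subject: str, body: str) -> str:
    """Send a plain-text email via Amazon SES; returns the SES MessageId."""
    response = ses_client.send_email(
        Source=sender,
        Destination={"ToAddresses": [recipient]},
        Message={
            "Subject": {"Data": subject},
            "Body": {"Text": {"Data": body}},
        },
    )
    return response["MessageId"]
```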
Push notifications are how Dubsmash connects with its users. For example, if a user hasn’t opened the app in a couple of days, Dubsmash triggers a push notification to re-engage them and bring them back into the app.
At times, Dubsmash has had to send several million push notifications in one go, peaking at more than 16 million push notifications in a single day. That volume is hard on any infrastructure because of the hardware and network requirements behind the scenes. To handle it, Dubsmash combines AWS Lambda and AWS SNS to deliver notifications efficiently.
To reach this scale, Dubsmash's engineers send batched requests to an AWS Lambda function, which divides the total number of messages into batches of 500. For each batch, additional Lambda instances are provisioned on the fly (written in Python for low overhead), and the messages are sent via AWS SNS.
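Selecting who should receive a re-engagement push is, at its core, a filter over last-seen timestamps. A minimal sketch, assuming a hypothetical two-day threshold for "a couple of days" and a hypothetical set of recently notified users to avoid repeat pushes:

```python
from datetime import datetime, timedelta
from typing import AbstractSet, Dict, List

# Hypothetical reading of "a couple of days"
INACTIVITY_THRESHOLD = timedelta(days=2)


def users_to_reengage(
    last_seen: Dict[str, datetime],
    now: datetime,
    threshold: timedelta = INACTIVITY_THRESHOLD,
    recently_notified: AbstractSet[str] = frozenset(),
) -> List[str]:
    """Return IDs of users inactive for at least `threshold` who weren't just notified."""
    return sorted(
        user_id
        for user_id, seen in last_seen.items()
        if now - seen >= threshold and user_id not in recently_notified
    )
```

In production this would more likely be a scheduled query over the user table than an in-memory dict, but the condition is the same.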
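The batching scheme can be sketched with boto3 as a dispatcher that splits device endpoints into batches of 500 and asynchronously invokes a worker Lambda per batch, with each worker publishing to SNS. The worker function name, the payload shape, and the use of per-device SNS endpoint ARNs are assumptions for illustration; clients are injected so the logic can be tested without AWS.

```python
import json
from typing import Any, Dict, Iterator, List, Optional, Sequence

BATCH_SIZE = 500  # batch size described above


def chunked(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def dispatch(lambda_client: Any, worker_function: str, endpoint_arns: Sequence[str], message: str) -> int:
    """Dispatcher: one asynchronous worker invocation per batch. Returns the batch count."""
    batches = 0
    for batch in chunked(endpoint_arns):
        lambda_client.invoke(
            FunctionName=worker_function,
            InvocationType="Event",  # async: Lambda queues the event and returns immediately
            Payload=json.dumps({"endpoint_arns": batch, "message": message}).encode("utf-8"),
        )
        batches += 1
    return batches


def worker_handler(event: Dict[str, Any], context: Any, sns_client: Optional[Any] = None) -> Dict[str, int]:
    """Worker Lambda: publish the message to every endpoint in its batch."""
    if sns_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        sns_client = boto3.client("sns")
    for arn in event["endpoint_arns"]:
        sns_client.publish(TargetArn=arn, Message=event["message"])
    return {"sent": len(event["endpoint_arns"])}
```

Because each invocation is asynchronous, the dispatcher finishes quickly and Lambda runs the workers in parallel, which is what lets a single send reach millions of devices.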
Streaming video has always been a challenge for many applications. Dubsmash knows this problem all too well, since it specializes in streaming videos to users' devices.
If you have any background in streaming video, you’ll know it’s a hard problem to solve. There are many constraints on file sizes, streaming formats (e.g., HLS, MPEG-DASH), and even restrictions imposed by the user's browser. To get around these constraints, Dubsmash stores all videos as MP4 files on AWS S3 and streams them to users' devices via AWS CloudFront, forcing MP4 rather than using adaptive streaming.
This ensures that videos are never downgraded in quality, delivering the best user experience possible, a core value of Dubsmash.
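Storing a finished MP4 in S3 and handing out a CloudFront URL might look like the following boto3 sketch. The bucket, key layout, and CloudFront domain are hypothetical, and the year-long `Cache-Control` lifetime assumes a video never changes once uploaded. Setting `ContentType` to `video/mp4` helps browsers and players treat the file as video rather than a download.

```python
from typing import Any
from urllib.parse import quote


def publish_video(s3_client: Any, bucket: str, key: str, local_path: str, cdn_domain: str) -> str:
    """Upload an MP4 to S3 and return the CloudFront URL that serves it."""
    if not key.endswith(".mp4"):
        raise ValueError("expected an .mp4 key: videos are served as plain MP4")
    s3_client.upload_file(
        local_path,
        bucket,
        key,
        ExtraArgs={
            "ContentType": "video/mp4",
            # Hypothetical policy: uploads are immutable, so edges and devices may cache for a year
            "CacheControl": "public, max-age=31536000, immutable",
        },
    )
    return f"https://{cdn_domain}/{quote(key)}"
```

In practice `s3_client` would be `boto3.client("s3")`; passing it in keeps the function testable.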
Dubsmash is always looking ahead in terms of how to grow and evolve as a social media platform. Here are some of their goals for the future:
- Build more tools dedicated to the content creators. Dubsmash wants to create more opportunities and flexibility for artists so that they can make high-engagement products for their audiences in creative ways.
- Grow the user base inclusively and fairly.
- Increase personalization and customization. Dubsmash wants to curate the most relevant content for each user, making relevant, appealing recommendations based on in-app activity.
- Continue to make Dubsmash a fun experience for both creators and consumers.
- Future features: filters, speed controls, and time stickers.
A lot of complex, detailed technical work goes on behind the scenes to create a simple, clean, straightforward user experience. Dubsmash will continue to differentiate itself by positioning the app as a legitimate tastemaking and community platform, as well as a utility for content creators. Dubsmash is now up to one billion video views per month, and that number will likely keep growing as the team keeps innovating and building.