
Alex Antsiferov


Moving Mongo Out of the Container | MongoDB Atlas Hackathon 2022 on DEV

What I built

I have a pet project that I started some time ago, while studying programming. It's a Telegram bot for people learning German, called Dasbot.
I'm pretty proud of its daily audience of a few hundred users who have collectively answered more than 300k quiz questions 😎, but I must confess: until now its database has been residing in a Docker container 📦. Like, not even on a mounted volume 🤦‍♂️.
This hackathon motivated me to correct this gruesome mistake.

Also, now that I know about change streams, I can display some real-time stats on the bot's web page, yay!

Category Submission:

No idea! Just wanted to share some life lessons :)

App Link

https://dasbot.yak.supplies

You're welcome to use the bot and answer its questions! (Especially if you're struggling with German like I do). If it annoys you, just ban it 😊

Screenshots

Dasbot stats page

Description

The German language is difficult! Especially terrible are its grammatical genders, which defy any logic, so you just have to memorize them.
Dasbot actually helps you do this, with a simple spaced repetition algorithm.
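The idea, as a rough sketch (not Dasbot's actual code, and the names are made up): every correct answer pushes the next review of a word further out, while a mistake brings it right back.

// Rough sketch of the repetition idea, not Dasbot's real code.
// Correct answers double the interval; a mistake resets it to one day.
function nextReview(card, isCorrect, now = new Date()) {
  const intervalDays = isCorrect ? Math.max(1, card.intervalDays * 2) : 1;
  const dueDate = new Date(now.getTime() + intervalDays * 24 * 60 * 60 * 1000);
  return { ...card, intervalDays, dueDate };
}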

It's written in Python, because I was studying Python at that time.
And it uses MongoDB for the database, because I didn't need much structure in my documents.
(There should be a photo of my desk here, covered with all the bureaucratic papers they send you twice a day here in Germany 📩).

In the database I keep everyone's scores needed for the repetition system. I also collect stats (user, word, answer, time) -- there could be some useful insights in there.
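Just for illustration, the documents could look roughly like this (the field names are made up, except correct, which the triggers further down rely on):

// Illustrative shapes only, not the actual schema.
// A per-user score document for the repetition system:
const score = { chat_id: 1234567, word: "Haus", level: 4, due: new Date("2022-12-01") };
// One entry in the stats collection (user, word, answer, time):
const stat = { chat_id: 1234567, word: "Haus", correct: true, ts: new Date("2022-11-28") };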

Link to Source Code

https://github.com/wetterkrank/dasbot -- main app
https://github.com/wetterkrank/dasbot-docs-live -- web app with the new /stats page

Permissive License

MIT License

Background

So, I used Docker.
It's a great tool! And I guess it's ok for a study project to spawn a database in a container. But when you do it in "production", you start collecting some gotchas. Here's a couple of mine.

  mongo:
    ports:
      - "0.0.0.0:27017:27017"

This was part of my docker-compose.yml.

After the launch, everything worked fine for a few days, and then I found my database empty!
I checked the Mongo logs and found some dropDatabase calls coming from unknown IPs. Hacked! 🪓 But how!? I knew my ufw rules by heart! What I didn't know is that Docker keeps its own iptables rules and will not be trammelled by a mere host firewall.
So when you publish the port on 0.0.0.0, you share it with a world full of people running port scanners. (Binding it to "127.0.0.1:27017:27017" instead keeps it reachable only from the host.)

Fast forward to this November. I just updated a config setting and decided to restart the containers manually.
Then I pinged the bot and was slightly surprised that it didn't recognise me. So I looked at the db collections... interesting... 0 documents... 😰
After scrolling up the shell history, I noticed that I had typed docker-compose down instead of docker-compose stop. There goes my data: down removes the containers, and mine had the data inside. Luckily, I had a backup 😅.

How I built it

As for the moving to Atlas part: this was simple!
I would have loved to use the live migration service, but since I decided to start with an M0 cluster I didn't have that option, so I just used mongorestore instead:

DB_CONTAINER="dasbot_db"
RESTORE_URI="mongodb+srv://$DB_USERNAME:$DB_PASSWORD@mydb.smth.mongodb.net/"

echo "Piping mongodump to mongorestore with Atlas as destination..."
docker exec $DB_CONTAINER mongodump --db=dasbot --archive | mongorestore --archive --drop --uri="$RESTORE_URI"

One notable hiccup was the speed of mongorestore -- a pitiful 50 MB of data took several minutes to load! However, increasing the number of insertion workers with the --numInsertionWorkersPerCollection flag helped.

For the change streams (real-time stats) exercise I had to refresh my knowledge of aggregation pipelines and write some JS code. I already mentioned the stats collection above; it can be used to build all kinds of reports.

So I've added a couple of triggers that aggregate this data and publish the updates to a separate database, plus an Atlas app that lets users access this database anonymously.

// Scheduled to run twice per day
// Updates correct / incorrect counters in answers_total
exports = function() {
    const mongodb = context.services.get("DasbotData");
    const collection = mongodb.db("dasbot").collection("stats");
    const pipeline = [
      { $group: {
          _id: { $cond: [ { $eq: ["$correct", true] }, 'correct', 'incorrect' ] },
          count: { "$sum": 1 }
        }
      },
      {
        $out: { db: "dasbot-meta", coll: "answers_total" }
      }
    ]
    collection.aggregate(pipeline);
};

// This runs on every `stats` insert and updates the aggregated results
exports = function(changeEvent) {  
  const db = context.services.get("DasbotData").db("dasbot-meta");
  const answers_total = db.collection("answers_total");

  const fullDocument = changeEvent.fullDocument;
  const key = fullDocument.correct ? "correct" : "incorrect";
  const options = { "upsert": true };

  answers_total.updateOne( { "_id": key }, { "$inc": { "count": 1 } }, options); // resulting doc: { _id: "correct" | "incorrect", count: <n> }
};

To display the data, I made a simple React app that uses the Realm Web SDK. Now, when someone answers the bot's question, you can immediately see it ⚡.
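Roughly, the page does something like this with the Realm Web SDK (the App ID is a placeholder, "mongodb-atlas" is the default name of the linked data source, and it assumes anonymous authentication is enabled on the Atlas app):

import * as Realm from "realm-web";

const app = new Realm.App({ id: "<your-app-id>" });

// Log in anonymously, read the current counters, then follow the change stream.
async function watchTotals(onChange) {
  const user = await app.logIn(Realm.Credentials.anonymous());
  const totals = user.mongoClient("mongodb-atlas").db("dasbot-meta").collection("answers_total");

  onChange(await totals.find({})); // initial counters
  for await (const change of totals.watch()) {
    onChange(await totals.find({})); // re-read on every update
  }
}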

Additional Resources/Info

This tutorial was quite handy!
