Alexey Timin for ReductStore

Posted on Oct 11, 2022

How to keep a history of file changes with C++

#cpp #tutorial #beginners #reductstorage

In this tutorial, I'll show how to keep track of file changes in a directory and store them in Reduct Storage using its C++ client SDK. You can find the full working example here.

Running Reduct Storage

If you're a Linux user, the easiest way to run the storage engine is with Docker. Here is an example docker-compose.yml file:

services:
  reduct-storage:
    image: reductstorage/engine:v1.0.1
    volumes:
      - ./data:/data
    environment:
      RS_LOG_LEVEL: DEBUG
    ports:
      - 8383:8383

You can also download the binaries and run them:

RS_DATA_PATH=./data reduct-storage

If everything is OK, you should see the Web Console at http://127.0.0.1:8383.

Installing Reduct Storage SDK for C++

Currently, you can only build and install the library manually. Follow these instructions.

File Watcher in C++

The SDK provides a CMake find script, so you can easily integrate it into your CMake project. Here is an example CMakeLists.txt:

cmake_minimum_required(VERSION 3.23)
project(file_watcher_example)

set(CMAKE_CXX_STANDARD 20)

find_package(ReductCpp 1.0.1)
find_package(ZLIB)
find_package(OpenSSL)

add_executable(file_watcher main.cc)
target_link_libraries(file_watcher 
  ${REDUCT_CPP_LIBRARIES} ${ZLIB_LIBRARIES} 
  OpenSSL::SSL OpenSSL::Crypto)

Now we're ready to write C++ code. Our main.cc file:

#include <reduct/client.h>

#include <chrono>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <map>
#include <regex>
#include <string>
#include <string_view>
#include <thread>

constexpr std::string_view kReductStorageUrl = "http://127.0.0.1:8383";
constexpr std::string_view kWatchedPath = "./";

namespace fs = std::filesystem;

int main() {
  using ReductClient = reduct::IClient;
  using ReductBucket = reduct::IBucket;

  auto client = ReductClient::Build(kReductStorageUrl);

  auto [bucket, err] = client->GetOrCreateBucket(
      "watched_files", ReductBucket::Settings{
                           .quota_type = ReductBucket::QuotaType::kFifo,
                           .quota_size = 100'000'000,  // 100 MB
                       });
  if (err) {
    std::cerr << "Failed to create bucket: " << err << std::endl;
    return -1;
  }

  std::cout << "Create bucket" << std::endl;

  std::map<std::string, fs::file_time_type> file_timestamp_map;
  for (;;) {
    for (auto& file : fs::directory_iterator(kWatchedPath)) {
      bool is_changed = false;
      // check only files
      if (!fs::is_regular_file(file)) {
        continue;
      }

      const auto filename = file.path().filename().string();
      auto ts = fs::last_write_time(file);

      if (file_timestamp_map.contains(filename)) {
        auto& last_ts = file_timestamp_map[filename];
        if (ts != last_ts) {
          is_changed = true;
        }
        last_ts = ts;
      } else {
        file_timestamp_map[filename] = ts;
        is_changed = true;
      }

      if (!is_changed) {
        continue;
      }

      // We use the filename as an entry name, which can't contain dots.
      std::string alias =
          std::regex_replace(filename, std::regex("\\."), "_");
      std::cout << "`" << filename << "` is changed. Storing as `" << alias
                << "` ";

      // Open in binary mode so the bytes read match fs::file_size().
      std::ifstream changed_file(file.path(), std::ios::binary);
      if (!changed_file) {
        std::cerr << "Failed to open file" << std::endl;
        return -1;
      }

      auto file_size = fs::file_size(file);

      auto write_err = bucket->Write(
          alias, std::chrono::file_clock::to_sys(ts),
          [file_size, &changed_file](ReductBucket::WritableRecord* rec) {
            rec->Write(file_size, [&](size_t offset, size_t size) {
              std::string buffer;
              buffer.resize(size);
              changed_file.read(buffer.data(), size);
              std::cout << "." << std::flush;
              return std::pair{offset + size <= file_size, buffer};
            });
          });

      if (write_err) {
        std::cout << " Err:" << write_err << std::endl;
      } else {
        std::cout << " OK (" << file_size / 1024 << " kB)" << std::endl;
      }
    }

    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }
  return 0;
}

OK, that's quite a few lines, but don't worry: it's a simple program. Let's look at the code in detail.

Creating a Bucket

To start writing to the database, we must create a bucket:

  auto client = ReductClient::Build(kReductStorageUrl);
  auto [bucket, err] = client->GetOrCreateBucket(
      "watched_files", ReductBucket::Settings{
                           .quota_type = ReductBucket::QuotaType::kFifo,
                           .quota_size = 100'000'000,  // 100 MB
                       });
  if (err) {
    std::cerr << "Failed to create bucket: " << err << std::endl;
    return -1;
  }

Here we build a client that connects to the storage engine at the kReductStorageUrl URL. Then we create a bucket named watched_files, or get it if it already exists. Note that we also pass some settings to limit its size to 100 MB, so that the storage engine starts removing the oldest data once we reach this quota.
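To make the FIFO quota idea more concrete, here is a toy model of it. This is only my illustration of the general "drop the oldest data when the quota is exceeded" behaviour, not how the storage engine is actually implemented. The FifoBucketModel class and its methods are hypothetical names that exist only in this sketch.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Toy model of a FIFO quota. Hypothetical class for illustration only;
// it is NOT part of the Reduct Storage SDK or engine.
class FifoBucketModel {
 public:
  explicit FifoBucketModel(std::size_t quota_size) : quota_size_(quota_size) {}

  // Appends a record, then drops the oldest records until the total size
  // fits into the quota again. A record larger than the whole quota ends
  // up being dropped as well.
  void Write(std::string record) {
    total_size_ += record.size();
    records_.push_back(std::move(record));
    while (total_size_ > quota_size_ && !records_.empty()) {
      total_size_ -= records_.front().size();
      records_.pop_front();
    }
  }

  std::size_t total_size() const { return total_size_; }
  std::size_t record_count() const { return records_.size(); }
  // Precondition: record_count() > 0.
  const std::string& oldest() const { return records_.front(); }

 private:
  std::size_t quota_size_;
  std::size_t total_size_ = 0;
  std::deque<std::string> records_;
};
```

With a quota of 10 bytes, writing three 4-byte records leaves only the last two in the model: the first one is removed to make room.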
The SDK doesn't throw any exceptions. Each method returns reduct::Error or reduct::Result<T>, so you can easily check the result in your code and print error messages.
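If you haven't worked with this style of error handling before, the sketch below shows the general pattern with simplified stand-in types. Error, Result<T> and ParsePort here are hypothetical and written just for this example. They are not the SDK's definitions, and the real reduct::Error and reduct::Result<T> may look different.

```cpp
#include <string>

// Simplified stand-ins for illustration only, NOT the SDK's types.
struct Error {
  int code = 0;
  std::string message;
  // An error object is "truthy" when something went wrong.
  explicit operator bool() const { return code != 0; }
};

template <typename T>
struct Result {
  T value;
  Error error;
};

// Hypothetical function that reports failures through its return value
// instead of throwing an exception.
Result<int> ParsePort(const std::string& text) {
  if (text.empty() || text.size() > 5) {
    return {0, Error{1, "invalid length"}};
  }
  int port = 0;
  for (char c : text) {
    if (c < '0' || c > '9') {
      return {0, Error{2, "not a number"}};
    }
    port = port * 10 + (c - '0');
  }
  if (port == 0 || port > 65535) {
    return {0, Error{3, "out of range"}};
  }
  return {port, Error{}};
}
```

The caller unpacks the result with a structured binding, for example auto [port, err] = ParsePort("8383");, and checks err before using port, which is the same shape as the GetOrCreateBucket call above.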

Watching Files

We implement the file watcher in a straightforward way:

  std::map<std::string, fs::file_time_type> file_timestamp_map;
  for (;;) {
    for (auto& file : fs::directory_iterator(kWatchedPath)) {
      bool is_changed = false;
      // check only files
      if (!fs::is_regular_file(file)) {
        continue;
      }

      const auto filename = file.path().filename().string();
      auto ts = fs::last_write_time(file);

      if (file_timestamp_map.contains(filename)) {
        auto& last_ts = file_timestamp_map[filename];
        if (ts != last_ts) {
          is_changed = true;
        }
        last_ts = ts;
      } else {
        file_timestamp_map[filename] = ts;
        is_changed = true;
      }

      if (!is_changed) {
        continue;
      }

      // Storing a changed file...
    }

    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }

We iterate over the given directory with fs::directory_iterator(kWatchedPath) and keep the last modification time of each file in the file_timestamp_map map. If a file is new (it wasn't in the map) or has changed (its timestamp is different), we set the is_changed flag so that the file gets stored.

Don't forget to sleep for a while at the end of each cycle to avoid overloading the CPU.

Storing Files

The history of a file is represented as an entry in Reduct Storage. Because an entry name can't contain ".", we have to replace the dots in our file names:

      // We use the filename as an entry name, which can't contain dots.
      std::string alias =
          std::regex_replace(filename, std::regex("\\."), "_");
      std::cout << "`" << filename << "` is changed. Storing as `" << alias
                << "` ";
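Since we only swap one character for another, a regular expression isn't strictly needed. Here is a small alternative sketch using std::replace. MakeEntryName is a hypothetical helper name used only in this example.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: turns a file name into a name without dots by
// replacing every '.' with '_'.
std::string MakeEntryName(std::string filename) {
  std::replace(filename.begin(), filename.end(), '.', '_');
  return filename;
}
```

For example, MakeEntryName("main.cc") gives "main_cc". Keep in mind that both this helper and the regex version map different files such as "a.b" and "a_b" to the same entry name, so their histories would be mixed in one entry.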

Then we open the changed file for reading:

      // Open in binary mode so the bytes read match fs::file_size().
      std::ifstream changed_file(file.path(), std::ios::binary);
      if (!changed_file) {
        std::cerr << "Failed to open file" << std::endl;
        return -1;
      }

And write it chunkwise to the storage engine:

      auto file_size = fs::file_size(file);
      auto write_err = bucket->Write(
          alias, std::chrono::file_clock::to_sys(ts),
          [file_size, &changed_file](ReductBucket::WritableRecord* rec) {
            rec->Write(file_size, [&](size_t offset, size_t size) {
              std::string buffer;
              buffer.resize(size);
              changed_file.read(buffer.data(), size);
              std::cout << "." << std::flush;
              return std::pair{offset + size <= file_size, buffer};
            });
          });

As you can see, it's quite verbose, but we send files in small chunks, so we can send terabytes without worrying about memory! If you put a huge file into your watched directory, you can see how fast Reduct Storage is.

Getting Data

You can get the data by using the Bucket::Query method. You can also use the Python or JavaScript client SDKs, or even wget:

wget http://127.0.0.1:8383/api/v1/b/watched_files/<File-Name>

I hope this was helpful! Thanks!
