Alexey Timin for ReductStore

How to keep a history of file changes with C++

Here, I'm going to show you how you can keep track of file changes in a directory and store them in Reduct Storage by using its C++ client SDK. You can find the full working example here.

Running Reduct Storage

If you're a Linux user, the easiest way to run the storage engine is to use Docker. Here is an example docker-compose.yml file:

services:
  reduct-storage:
    image: reductstorage/engine:v1.0.1
    volumes:
      - ./data:/data
    environment:
      RS_LOG_LEVEL: DEBUG
    ports:
      - 8383:8383

You can also download the binaries and run them:

RS_DATA_PATH=./data reduct-storage

If everything is OK, you should see the Web Console at http://127.0.0.1:8383.

Installing Reduct Storage SDK for C++

Currently, you can only build and install the library manually. Follow these instructions.

File Watcher in C++

The SDK provides a CMake find script, so you can easily integrate it into your CMake project. Here is an example CMakeLists.txt:

cmake_minimum_required(VERSION 3.23)
project(file_watcher_example)

set(CMAKE_CXX_STANDARD 20)

find_package(ReductCpp 1.0.1)
find_package(ZLIB)
find_package(OpenSSL)

add_executable(file_watcher main.cc)
target_link_libraries(file_watcher 
  ${REDUCT_CPP_LIBRARIES} ${ZLIB_LIBRARIES} 
  OpenSSL::SSL OpenSSL::Crypto)

Now we're ready to write some C++ code. Here is our main.cc file:

#include <reduct/client.h>

#include <filesystem>
#include <fstream>
#include <iostream>
#include <map>
#include <regex>
#include <thread>

constexpr std::string_view kReductStorageUrl = "http://127.0.0.1:8383";
constexpr std::string_view kWatchedPath = "./";

namespace fs = std::filesystem;

int main() {
  using ReductClient = reduct::IClient;
  using ReductBucket = reduct::IBucket;

  auto client = ReductClient::Build(kReductStorageUrl);

  auto [bucket, err] = client->GetOrCreateBucket(
      "watched_files", ReductBucket::Settings{
                           .quota_type = ReductBucket::QuotaType::kFifo,
                           .quota_size = 100'000'000,  // 100Mb
                       });
  if (err) {
    std::cerr << "Failed to create bucket" << err << std::endl;
    return -1;
  }

  std::cout << "Create bucket" << std::endl;

  std::map<std::string, fs::file_time_type> file_timestamp_map;
  for (;;) {
    for (auto& file : fs::directory_iterator(kWatchedPath)) {
      bool is_changed = false;
      // check only files
      if (!fs::is_regular_file(file)) {
        continue;
      }

      const auto filename = file.path().filename().string();
      auto ts = fs::last_write_time(file);

      if (file_timestamp_map.contains(filename)) {
        auto& last_ts = file_timestamp_map[filename];
        if (ts != last_ts) {
          is_changed = true;
        }
        last_ts = ts;
      } else {
        file_timestamp_map[filename] = ts;
        is_changed = true;
      }

      if (!is_changed) {
        continue;
      }

      // we use the filename as an entry name, and an entry name can't contain dots
      std::string alias =
          std::regex_replace(filename, std::regex("\\."), "_");
      std::cout << "`" << filename << "` is changed. Storing as `" << alias
                << "` ";

      std::ifstream changed_file(file.path());
      if (!changed_file) {
        std::cerr << "Failed open file";
        return -1;
      }

      auto file_size = fs::file_size(file);

      auto write_err = bucket->Write(
          alias, std::chrono::file_clock::to_sys(ts),
          [file_size, &changed_file](ReductBucket::WritableRecord* rec) {
            rec->Write(file_size, [&](size_t offset, size_t size) {
              std::string buffer;
              buffer.resize(size);
              changed_file.read(buffer.data(), size);
              std::cout << "." << std::flush;
              return std::pair{offset + size <= file_size, buffer};
            });
          });

      if (write_err) {
        std::cout << " Err:" << write_err << std::endl;
      } else {
        std::cout << " OK (" << file_size / 1024 << " kB)" << std::endl;
      }
    }

    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }
  return 0;
}


Okay, that's quite a few lines, but don't worry: it's a simple program. Let's look at the code in detail.

Creating a Bucket

To start writing to the database, we must create a bucket:

  auto client = ReductClient::Build(kReductStorageUrl);
  auto [bucket, err] = client->GetOrCreateBucket(
      "watched_files", ReductBucket::Settings{
                           .quota_type = ReductBucket::QuotaType::kFifo,
                           .quota_size = 100'000'000,  // 100Mb
                       });
  if (err) {
    std::cerr << "Failed to create bucket" << err << std::endl;
    return -1;
  }

Here we build a client that talks to the storage engine at the kReductStorageUrl URL. Then we create a bucket named watched_files or get an existing one. Note that we also provide settings to limit its size to 100 MB, so that the storage engine starts removing old data when we reach this quota.
The SDK doesn't throw any exceptions. Each method returns reduct::Error or reduct::Result<T>, so you can easily check the result in your code and print error messages.
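
For example, this is what the pattern looks like as a tiny standalone program. It only reuses the calls you've already seen above; the bucket name is just a placeholder:

#include <reduct/client.h>

#include <iostream>

int main() {
  auto client = reduct::IClient::Build("http://127.0.0.1:8383");

  // GetOrCreateBucket returns a reduct::Result<T>: unpack the value and the
  // error with structured bindings, then check the error like a boolean.
  auto [bucket, err] =
      client->GetOrCreateBucket("example_bucket", reduct::IBucket::Settings{});
  if (err) {
    // The error prints a readable message when streamed.
    std::cerr << "Failed to create bucket: " << err << std::endl;
    return -1;
  }

  std::cout << "Bucket is ready" << std::endl;
  return 0;
}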

Watching Files

We implement the file watcher in a straightforward way:

  std::map<std::string, fs::file_time_type> file_timestamp_map;
  for (;;) {
    for (auto& file : fs::directory_iterator(kWatchedPath)) {
      bool is_changed = false;
      // check only files
      if (!fs::is_regular_file(file)) {
        continue;
      }

      const auto filename = file.path().filename().string();
      auto ts = fs::last_write_time(file);

      if (file_timestamp_map.contains(filename)) {
        auto& last_ts = file_timestamp_map[filename];
        if (ts != last_ts) {
          is_changed = true;
        }
        last_ts = ts;
      } else {
        file_timestamp_map[filename] = ts;
        is_changed = true;
      }

      if (!is_changed) {
        continue;
      }

      // Storing a changed file...
    }

    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }


We iterate over the watched directory with fs::directory_iterator(kWatchedPath) and keep the last modification time of each file in the file_timestamp_map map. If a file is new (it wasn't in the map) or has changed (its timestamp is different), we set the is_changed flag and store the changed file.

Don't forget to sleep for a while at the end of each cycle to avoid overloading the CPU.
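
If you want to test the change-detection logic separately from the storage code, you can pull it out into a small helper. This is just an illustrative sketch; the HasChanged function isn't part of the original example:

#include <filesystem>
#include <map>
#include <string>

namespace fs = std::filesystem;

// Returns true if the file is new or its modification time differs from the
// one we saw last time, and updates the stored timestamp in both cases.
bool HasChanged(std::map<std::string, fs::file_time_type>& timestamps,
                const fs::directory_entry& file) {
  const auto filename = file.path().filename().string();
  const auto ts = fs::last_write_time(file);

  auto it = timestamps.find(filename);
  if (it == timestamps.end()) {
    timestamps[filename] = ts;
    return true;  // first time we see this file
  }

  const bool changed = (it->second != ts);
  it->second = ts;
  return changed;
}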

Storing Files

The history of a file is represented as an entry in Reduct Storage. Because an entry name can't contain dots, we replace them in the file name:

      // we use the filename as an entry name, and an entry name can't contain dots
      std::string alias =
          std::regex_replace(filename, std::regex("\\."), "_");
      std::cout << "`" << filename << "` is changed. Storing as `" << alias
                << "` ";
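By the way, since we only replace single characters, std::replace from <algorithm> works just as well and avoids building a regex. This is an alternative sketch, not what the example above uses:

#include <algorithm>
#include <string>

// Turn a filename into a valid entry name by replacing every '.' with '_'.
std::string MakeAlias(const std::string& filename) {
  std::string alias = filename;
  std::replace(alias.begin(), alias.end(), '.', '_');
  return alias;
}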

Then we open the changed file for reading:

      std::ifstream changed_file(file.path());
      if (!changed_file) {
        std::cerr << "Failed to open file" << std::endl;
        return -1;
      }
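The stream is opened in text mode here, which is fine on Linux. If you want the reads to be byte-exact for binary files on any platform, you may prefer to open the file in binary mode. Here is a small sketch of that variant; the OpenForUpload helper is just for illustration:

#include <filesystem>
#include <fstream>

// Open a file in binary mode so its bytes are read exactly as stored on disk.
std::ifstream OpenForUpload(const std::filesystem::path& path) {
  return std::ifstream(path, std::ios::binary);
}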

And write it to the storage engine in chunks:

      auto file_size = fs::file_size(file);
      auto write_err = bucket->Write(
          alias, std::chrono::file_clock::to_sys(ts),
          [file_size, &changed_file](ReductBucket::WritableRecord* rec) {
            rec->Write(file_size, [&](size_t offset, size_t size) {
              std::string buffer;
              buffer.resize(size);
              changed_file.read(buffer.data(), size);
              std::cout << "." << std::flush;
              return std::pair{offset + size <= file_size, buffer};
            });
          });

As you can see, it's quite verbose, but we send the file in small chunks, so we can send terabytes of data without worrying about memory. If you put a huge file into your watched directory, you can see how fast Reduct Storage is.

Getting Data

You can get the data by using the Bucket::Query method. You can also use the Python or JavaScript client SDKs, or even wget:

wget http://127.0.0.1:8383/api/v1/b/watched_files/<File-Name>

I hope it was helpful! Thanks!
