DEV Community

AnthonyCvn for ReductStore

Posted on • Originally published at reduct.store on

How to Store Vibration Sensor Data

Vibration Data Flow Intro

Efficient and effective storage of vibration data is important to a wide range of industries, particularly where accurate and complex predictive maintenance or optimization is required.

This blog post looks at best practices for managing vibration data, starting with storing both raw and pre-processed metrics to take advantage of their unique benefits. Then, we'll explore the differences between traditional time series databases and a time series object storage such as ReductStore.

ReductStore is designed to efficiently handle time series unstructured data, making it an excellent choice for storing high frequency vibration sensor measurements.

We'll also cover strategies for eliminating data loss through volume-based retention policy and automated replication, ensuring that critical information is always available for diagnostics and analysis.

Here is a summary of the topics we'll cover:

Store Both Raw and Pre-Processed Metrics

Maintaining both raw data and pre-processed metrics such as RMS (Root Mean Square), peak-to-peak and crest factor can be beneficial for several reasons.

  • Raw data provides the detailed granularity required for in-depth diagnostics and future algorithm development.
  • Pre-processed metrics provide immediate insight into equipment condition without the need for heavy computation.

Benefits of Pre-Processing Before Storage

The main advantage of pre-processing metrics is that it significantly reduces storage requirements by summarizing raw data into key metrics.

For example, a signal can be divided into 1 second chunks. The result is then a new signal sampled at 1Hz that aggregates the content of each chunk. As seen in the image below, the signal is divided into 1-second chunks, and the RMS value would be calculated for each chunk. For an original signal sampled at 10kHz, this would reduce the data size by a factor of 10,000. This approach is particularly useful for vibration data but some information is ultimately lost in the process.

Signal in Chunks

The typical metrics include:

  • RMS (Root Mean Square): Represents the power content of the signal by calculating the root of the mean square:
    • RMS=1ni=1nxi2\text{RMS} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} x_i^2 } , where xix_i is a sample of the signal and nn is the number of samples in a given time window (e.g., 10,000 samples for a 1 second chunk sampled at 10kHz).
  • Peak-to-Peak: Measures the difference between the maximum and minimum values, indicating the signal's amplitude range:
    • Peak-to-Peak=max(x)min(x)\text{Peak-to-Peak} = \max(x) - \min(x) , where max(x)\max(x) and min(x)\min(x) are the maximum and minimum values of the signal in a given time window (e.g., 1 second chunk).
  • Crest Factor: The ratio of the peak value of the signal to its RMS value, indicating how sharp or sudden the peaks in the signal are:
    • Crest Factor=max(x)RMS\text{Crest Factor} = \frac{\max(|x|)}{\text{RMS}} , where max(x)\max(|x|) is the maximum absolute value of the signal in a given time window.

Many other metrics can be calculated depending on the application, and more advanced signal processing or machine learning techniques can be applied to extract more information from the signal. At the same time, access to raw data is critical, as it enables the calculation of these various metrics and the application of various analytical methods. This is especially important as new techniques and algorithms are developed, allowing for continuous improvement and more accurate diagnosis.

Advantages of Storing Raw Data

Storing raw data in the context of vibration monitoring offers significant advantages. Raw sensor output stored as a blob captures the full fidelity of the original signal, allowing extensive post-processing and re-analysis using different algorithms or filters.

This flexibility is essential for developing new diagnostic tools or improving existing ones without the need for repeated data acquisition. For example, raw data can be used to perform detailed frequency analysis using FFT (see example below), detect spikes using time domain analysis, identify signal variations using wavelet transform, find faults in rotating equipment using envelope analysis, and understand structural vibrations using modal analysis.

FFT Image Example

In this example, the Fast Fourier Transform (FFT) is calculated to identify the frequency content of the signal. This information can be used to identify the dominant frequencies in the signal, which in turn can be an indication of rotating equipment faults such as unbalance, misalignment or bearing failure.

Use Time Series Databases

Time Series Databases (TSDBs) are specialised data storage systems optimised for handling time-indexed data. They excel at managing large volumes of sequentially generated data points, such as vibration measurements, by providing efficient write and read operations.

Time Series Object Store vs. Traditional Time Series Database

A time series object store and a traditional time series database can serve similar purposes, but have distinct architectural differences. While a traditional TSDB is optimised for high-speed ingestion and retrieval of scalar data points, an object store is designed to handle complex, high-dimensional data objects and their metadata. This makes the object store more versatile for applications that require rich contextual information, such as vibration analysis that includes waveform data and diagnostic logs.

Vibration Data in Time Series Object Store

With a time series object store, each chunk of vibration data is stored as a binary object with a timestamp along with metadata such as preprocess metrics that can be useful for filtering and replication (more on that later).

Data Flow Schema

This method allows efficient management of large vibration datasets by providing fast access to specific time periods. In fact, ReductStore outperforms TimescaleDB for blobs ranging from 1KB and larger using this method, with improvements ranging from 205% to 1300%.

Adopt Efficient Data Retention and Replication Strategies

By storing raw sensor data locally, we minimize latency and retain critical information for immediate diagnosis.

However, local storage can quickly fill up at the edge, leading to potential disk bottlenecks. To prevent this, we need to periodically eliminate data. However, this data may be critical for further analysis or diagnosis.

Automated replication of important data chunks is therefore essential to ensure that critical information is retained.

Volume-Based Retention Policy

A real-time FIFO (first-in, first-out) quota prevents disk space shortages in real time. Typically, databases implement retention policies based on time periods; in the case of ReductStore, retention can be set based on data volume. This is particularly useful when storing vibration sensor data on edge devices with limited storage capacity.

Time-based retention can lead to data loss during downtime. For example, if a system retains data for eight days and goes offline over the weekend, it might only capture six days of data before starting to overwrite. Additionally, if the device is offline for a long time, it could delete all existing data once restarted, losing important information for diagnostics.

Typical Weekend Problem

In the above diagram, the system retains data for eight days, but due to a weekend shutdown, it will only capture six days of data before starting to overwrite (assuming a retention policy of eight days).

With ReductStore's volume-based retention policy, the system retains data based on the amount of data stored, ensuring that critical information is preserved even after downtime or long periods of inactivity.

Data Replication Based on Pre-processed Metrics

By preprocessing vibration data to extract key metrics such as peak amplitude, frequency content, and RMS values, the system can prioritize these summarized metrics for replication.

Data Replication Schema

In this example, the system stores raw data locally and pre-processed metrics in a time series object store. Both the pre-processed metrics and the raw data are then automatically replicated to a central database. The replication strategy in this scenario focuses on reducing data transfer by replicating raw data only when necessary, based on the insights derived from the pre-processed metrics.

This approach ensures that important diagnostic information is preserved even if the raw data must be dropped due to storage constraints. For more details on data replication, check out this data replication guide.

Setting Up the Environment for Data Flow Example

Before we dive into the code, let's set up our environment. We'll be using Docker to run ReductStore and Python for our client application.

ReductStore Setup

Create a docker-compose.yml file with the following content:

version: "3.8"services: reductstore: image: reduct/store:latest ports: - "8383:8383" volumes: - data:/data environment: - RS_API_TOKEN=my-tokenvolumes: data: driver: local
Enter fullscreen mode Exit fullscreen mode

Then we ca run ReductStore with:

docker compose up -d
Enter fullscreen mode Exit fullscreen mode

This will start ReductStore on port 8383 with a simple API token for authentication.

Python Setup

Make sure you have Python 3.8+ installed in your environment. Then simply install the necessary libraries for our example using pip:

pip install reduct-py numpy
Enter fullscreen mode Exit fullscreen mode

Now that we have our environment set up, let's dive into the code.

Full Data Flow Example with Python

Let's break down the main components of our Python script:

Connecting to ReductStore

async def setup_reductstore() -> Bucket: client = Client("http://localhost:8383", api_token="my-token") return await client.create_bucket("sensor_data", exist_ok=True)
Enter fullscreen mode Exit fullscreen mode

This function establishes a connection to ReductStore and creates (or gets) a bucket named sensor_data. A bucket is a logical container for storing time series data, and each bucket can contain multiple entries (e.g., vibration_sensor_1, vibration_sensor_2).

Generating Simulated Sensor Data

def generate_sensor_data(frequency: int = 1000, duration: int = 1) -> np.ndarray: t = np.linspace(0, duration, frequency * duration) signal = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(len(t)) return signal.astype(np.float32)
Enter fullscreen mode Exit fullscreen mode

This function generates a simulated sensor signal: a simple sine wave with added noise. In a real-world scenario, you would replace this with actual sensor readings.

As we saw earlier, it's beneficial to divide the data into chunks for more efficient storage and querying. In this example, we generate 1 second of data at a time (1,000 samples at 1 kHz), that we'll store as a single entry in ReductStore.

Calculating Metrics

def calculate_metrics(signal: np.ndarray) -> tuple: rms = np.sqrt(np.mean(signal**2)) peak_to_peak = np.max(signal) - np.min(signal) crest_factor = np.max(np.abs(signal)) / rms return rms, peak_to_peak, crest_factor
Enter fullscreen mode Exit fullscreen mode

We calculate three common metrics for our signal: RMS (Root Mean Square), Peak-to-Peak, and Crest Factor.

Packing Binary Data

def pack_data(signal: np.ndarray) -> bytes: fmt = f">{len(signal)}f" return struct.pack(fmt, *signal)
Enter fullscreen mode Exit fullscreen mode

This function uses the struct module to pack our numpy array into a binary format, specifically with the format string ">1000f" (more details on this below). You may be wondering why we don't use numpy's tobytes() method. While tobytes() is convenient, it offers limited control over the byte format, which can lead to compatibility problems when reading the data on different devices.

The struct module, on the other hand, allows us to specify byte order and data type, preserving consistent data representation and avoiding compatibility problems. The format string ">1000f" is explained as follows, based on the struct module documentation:

  • > indicates big-endian byte order, with the most significant byte (leftmost byte) stored first, which is common in network communications.
  • 1000 is the number of elements in the array (1000 samples).
  • f is the data type (float) for each element, with a default size of 4 bytes.

The choice of binary data depends on your specific requirements, if you are using specific hardware or software that requires a different format, you can adjust the format string accordingly.

Some restricted embedded systems may require a specific byte order or data type, so it's important to understand the format requirements of your architecture.

Storing Data in ReductStore

HIGH_RMS = 1.0HIGH_CREST_FACTOR = 3.0HIGH_PEAK_TO_PEAK = 5.0async def store_data( bucket: Bucket, timestamp: int, packed_data: bytes, rms: float, peak_to_peak: float, crest_factor: float,): labels = { "rms": "high" if rms > HIGH_RMS else "low", "peak_to_peak": "high" if peak_to_peak > HIGH_PEAK_TO_PEAK else "low", "crest_factor": "high" if crest_factor > HIGH_CREST_FACTOR else "low", } await bucket.write("sensor_readings", packed_data, timestamp, labels=labels)
Enter fullscreen mode Exit fullscreen mode

This is where we store our packed chunk of data, along with labels that indicate whether each metric is high or low. This allows us to later replicate and filter data based on these metrics.

The hard-coded thresholds for high RMS, peak-to-peak, and crest factor values are for demonstration purposes. These values should be determined based on your specific sensor and application requirements and can be adjusted using environmental variables or configuration files.

Querying and Retrieving Data

async def query_data(bucket: Bucket, start_time: int, end_time: int): async for record in bucket.query( "sensor_readings", start=start_time, stop=end_time ): print(f"Timestamp: {record.timestamp}") print(f"Labels: {record.labels}") data = await record.read_all() num_points = len(data) // 4 fmt = f">{num_points}f" signal = struct.unpack(fmt, data) signal = np.array(signal, dtype=np.float32) print(f"Number of data points: {num_points}") print(f"First few values: {signal[:5]}") print("---")
Enter fullscreen mode Exit fullscreen mode

This function shows how to query data within a given time range and unpack the binary data back into the original Numpy array.

  1. The read_all() method reads the whole chunk of data at a given timestamp.
  2. The length of the data is divided by 4 to get the number of float values, since each float is 4 bytes long.
  3. The same format string used to pack the data is used to unpack the data to ensure that the data is interpreted correctly.
  4. The unpacked data is then converted back to a Numpy array, which is more convenient for further processing.

Main Execution

async def main(): bucket = await setup_reductstore() # Store 5 seconds of data for _ in range(5): timestamp = int(time.time() * 1_000_000) signal = generate_sensor_data() rms, peak_to_peak, crest_factor = calculate_metrics(signal) packed_data = pack_data(signal) await store_data( bucket, timestamp, packed_data, rms, peak_to_peak, crest_factor ) await asyncio.sleep(1) # Query the stored data for the last 5 seconds end_time = int(time.time() * 1_000_000) start_time = end_time - 5_000_000 await query_data(bucket, start_time, end_time)
Enter fullscreen mode Exit fullscreen mode

This is the main execution flow of our script, which demonstrates the complete data flow:

  1. We connect to ReductStore and create a bucket.
  2. We store 5 seconds in chunks of 1 second data. Each timestamp is generated using the current time in microseconds.
  3. We query the stored data for the last 5 seconds and print the results.

Now that we are able to store and query vibration sensor data, let's explore how to duplicate important data using ReductStore's replication feature.

Replication with ReductStore Web Console

In addition to storing and querying data, we can also set up replication to duplicate our sensor data across multiple ReductStore instances.

Replication tasks can be configured in a variety of ways, such as

  • Replicate all data from a source bucket to a target bucket (e.g. sensor_data to backup_data)
  • Replicate only specific entries, e.g. sensor_readings.
  • Replicate data based on labels, e.g. replicate only data with high RMS values to high_peak_to_peak bucket (as we'll do in this example).

Data Replication Flow

The replication task can be set using client libraries, HTTP API, CLI, provisioning, or the ReductStore web console. In this example, we'll use the web console for simplicity, but you can also refer to the Replication Guide for more details.

So the bucket structure that we'll set up is as follows:

  • sensor_data bucket contains all sensor readings under the entry sensor_readings.
  • high_peak_to_peak bucket will contain only the sensor readings with high peak-to-peak values under the same entry name sensor_readings.

Here's how to set it up:

  1. Access the ReductStore web console (typically at http://localhost:8383 if running locally).<!-- --> Dashboard Example
  2. Navigate to the "Replications" section to create a new replication with the "+" button.
  3. Enter a name for your replication task and add the following details:<!-- -->
    • Choose your source bucket (in this case, sensor_data).
    • Enter the destination bucket name (e.g., high_peak_to_peak)
    • Enter the details for your destination ReductStore instance (in this case, the same instance for demonstration purposes http://localhost:8383)
    • Enter the destination token (in this case, the same token my-token)<!-- --><!-- --> <!-- --> Settings Example
    • Then, set up filters based on labels to replicate only data with high RMS.<!-- --><!-- --> <!-- --> Filter Example
  4. Click "Create" to start the replication process.

With this new replication task, all new data from the sensor_data bucket with the label peak_to_peak = high will be replicated to the high_peak_to_peak bucket.

info

Only new data will be replicated, so you can set up replication tasks at any time without worrying about duplicating existing data.

Conclusion

We explored a practical implementation of storing and querying vibration sensor data using ReductStore and Python. Here's a summary of what we covered:

  1. We set up a ReductStore instance using Docker and installed the necessary Python libraries.
  2. We simulated vibration sensor data, calculated key metrics (RMS, peak-to-peak, and crest factor), and packaged the data into a binary format that we can configure to ensure the data can be read by other devices.
  3. We stored the data in ReductStore, along with labels indicating whether each metric was high or low.
  4. We demonstrated how to query the stored data and unpack it back into a usable format.
  5. Finally, we explored how to set up replication using the web console so that we could copy high peak-to-peak metrics into a separate bucket.

This setup can serve as a starting point for building your own vibration data management system, which you'll likely need to tailor to your specific needs and hardware.

Here are some key takeaways:

  • Time series object storage can offer significant advantages over traditional TSDBs by efficiently managing vibration data in chunks.
  • Volume-based retention policies and automated data replication ensure critical information is retained while minimizing storage constraints at the edge.
  • By pre-processing and prioritizing metadata for replication, critical diagnostic data remains accessible even if raw data is overwritten.

Thanks for reading, I hope this article will help you choose the right storage strategy for your vibration data. If you have any questions or comments, feel free to use the ReductStore Community Forum.

Top comments (0)