SingleStore (MemSQL): Basic Understanding

SingleStore

SingleStore DB is a distributed, relational database that handles both transactions and real-time analytics at scale. Querying uses standard SQL drivers and syntax, so it plugs into a broad ecosystem of tools and applications. Read the links below to get familiar with SingleStore DB:

Reference

How SingleStore DB Works

SingleStore DB is accessible through standard SQL drivers and supports ANSI SQL syntax, including joins, filters, and analytical capabilities (e.g., aggregates, GROUP BY, and window functions).

Data Ingestion

SingleStore DB can load data continuously or in bulk from a variety of sources. Popular sources include files, Kafka clusters, cloud storage such as Amazon S3, HDFS, and other databases. As a distributed system, SingleStore DB ingests data streams using parallel loading to maximize throughput.

SingleStore Pipelines is an easy-to-use built-in capability that extracts, transforms, and loads external data using sources such as Kafka, S3, Azure Blob, and filesystems.

Deploy

SingleStore DB can be deployed on bare metal, on virtual machines, or in the cloud by using SingleStore Tools.

Reference

  1. Managed Service
    • The scalable cloud database for data-intensive applications, deployed on AWS, Azure, or GCP.
  2. DB Software
    • Manually deploy and manage a SingleStore database cluster on your own hardware.

DB Creation

DROP DATABASE IF EXISTS play_game;
CREATE DATABASE play_game;
USE play_game;

CREATE TABLE play_game (
    msgId INT,
    msgDateTime DATETIME NOT NULL,
    gameName VARCHAR(60),
    timeSince DATETIME(6),  -- type assumed; this column is referenced by the SORT KEY and by the pipeline mapping below
    SORT KEY (timeSince),
    SHARD KEY (msgId)
);
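
As a quick smoke test of the schema, a hypothetical row can be inserted and read back (the values are made up for illustration):

INSERT INTO play_game (msgId, msgDateTime, gameName, timeSince)
VALUES (1, NOW(), 'chess', NOW());

SELECT * FROM play_game;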

Pipelines

MemSQL Pipelines is a MemSQL Database feature that natively ingests real-time data from external sources. As a built-in component of the database, Pipelines can extract, transform, and load external data without the need for third-party tools or middleware. Pipelines is robust, scalable, highly performant, and supports fully distributed workloads.

  • Alter Pipeline
  • Create Pipeline
  • Create Pipeline with Transform
  • Create Pipeline into Procedure
  • Extract Pipeline into Outfile

Reference

Create Pipeline

CREATE PIPELINE play_game
AS LOAD DATA KAFKA '127.0.0.1:9092/play-games-1'
SKIP DUPLICATE KEY ERRORS
INTO TABLE play_game
FORMAT JSON (
    msgId <- msgId,
    msgDateTime <- msgDateTime,
    gameName <- gameName,
    timeSince <- timeSince
);

START PIPELINE play_game;
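
Once the pipeline is started, its state can be checked directly from SQL. A minimal sanity check, assuming the pipeline has had time to ingest some messages:

SHOW PIPELINES;

SELECT COUNT(*) FROM play_game;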

Basic Commands

Operation          Syntax
Drop Pipeline      DROP PIPELINE play_game;
Drop Database      DROP DATABASE IF EXISTS play_game;
Drop Table         DROP TABLE IF EXISTS play_game;
Stop Pipeline      STOP PIPELINE play_game;
Test Pipeline      TEST PIPELINE play_game;
Extract Pipeline   EXTRACT PIPELINE play_game INTO OUTFILE 'file_name.json';

Reference

Load Data

The easiest way to load data is to first upload it to Amazon S3 or Azure Blob Storage, then use SingleStore Pipelines to extract it. An example S3 pipeline is sketched after the list below.

  • Load Data from Amazon Web Services (AWS)
  • Load Data from Microsoft Azure
  • Load Data from the Filesystem using a Pipeline
  • Load Data from Kafka
  • Load Data from MySQL, etc.
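
A sketch of the S3 path: the bucket name, prefix, region, and credentials below are placeholders, and the target table reuses play_game from earlier:

CREATE PIPELINE play_game_s3
AS LOAD DATA S3 'my-bucket/play-games'  -- hypothetical bucket/prefix
CONFIG '{"region": "us-east-1"}'
CREDENTIALS '{"aws_access_key_id": "YOUR_KEY", "aws_secret_access_key": "YOUR_SECRET"}'
INTO TABLE play_game
FORMAT JSON (
    msgId <- msgId,
    msgDateTime <- msgDateTime,
    gameName <- gameName,
    timeSince <- timeSince
);

START PIPELINE play_game_s3;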

Load Data from Kafka

Create Topic

bin/kafka-topics.sh --create --topic play-games-1 --partitions 1 --bootstrap-server localhost:9092 --replication-factor 1

Describe Topic

bin/kafka-topics.sh --describe --topic play-games-1 --bootstrap-server localhost:9092

Produce Messages

bin/kafka-console-producer.sh --topic play-games-1 --bootstrap-server localhost:9092

Insert data into Kafka using JSON

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic play-games-1 < PlayActivity.json

Consume Messages

bin/kafka-console-consumer.sh --topic play-games-1 --from-beginning --bootstrap-server localhost:9092

List Topic

bin/kafka-topics.sh --list --bootstrap-server localhost:9092

Delete Topic

bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic DummyTopic


SingleStore Stored Procedure Example

USE quickstart_kafka;
CREATE TABLE test2 (id INT, fname VARCHAR(50), lname VARCHAR(50), addr VARCHAR(50));

DELIMITER //
CREATE OR REPLACE PROCEDURE process_users(GENERIC_BATCH QUERY(GENERIC_JSON JSON)) AS
BEGIN
    -- Insert each JSON record from the incoming batch into test2
    INSERT INTO test2(id, fname, lname, addr)
    SELECT GENERIC_JSON::id, GENERIC_JSON::fname, GENERIC_JSON::lname, GENERIC_JSON::addr
    FROM GENERIC_BATCH;
END //
DELIMITER ;

CREATE OR REPLACE PIPELINE jsonproce1 AS LOAD DATA KAFKA '127.0.0.1:9092/test2'
INTO PROCEDURE process_users (GENERIC_JSON <- %) FORMAT JSON;

TEST PIPELINE jsonproce1;

START PIPELINE jsonproce1;
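
Assuming JSON messages are already flowing into the test2 topic, a quick sanity check that the procedure is writing rows:

SELECT COUNT(*) FROM test2;
SELECT * FROM test2 LIMIT 10;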

Manage Data

  • Local and Unlimited Database Storage Concepts
  • Unlimited Data Storage
  • Backing Up and Restoring Data (see the BACKUP/RESTORE sketch after this list)
  • Export a Particular Database >> mysqldump -h 127.0.0.1 -u root -p -P 3306 foo > foo.sql
  • Export All Databases >> mysqldump -h 127.0.0.1 -u root -p -P 3306 --all-databases > full_backup.sql
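
Besides mysqldump, SingleStore also offers native BACKUP and RESTORE commands. A minimal sketch, assuming a local backup directory (the path below is a placeholder):

BACKUP DATABASE play_game TO "/var/backups/play_game";

-- RESTORE expects that the database does not already exist on the cluster
RESTORE DATABASE play_game FROM "/var/backups/play_game";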

Additional Concepts:

Open MemSQL Studio

https://aws.amazon.com/quickstart/architecture/memsql/
https://aws-quickstart.s3.amazonaws.com/quickstart-memsql/doc/memsql-on-the-aws-cloud.pdf

--> SQL Editor
--> Master Aggregator Node (1 master): aggregators hold and share the cluster metadata
--> Child Aggregator Node (2 nodes)
--> Leaf Node (4 nodes): leaf nodes store the data

Every cluster has at least one master aggregator and one leaf node.

Cluster

A cluster encompasses all of the nodes that are included in a complete SingleStore DB installation. A cluster contains aggregator nodes and leaf nodes.

Connect to your Cluster

You have three ways to connect to your SingleStore DB cluster: SingleStore DB Studio, the singlestore client application, or through any compatible third-party applications.

Node

A node is a server that has an installation of a SingleStore DB instance.

Leaf

A leaf is a node that stores a subset of a cluster’s data. A cluster typically contains many leaves.

Partition

A partition contains a subset (a shard) of a database’s data. A leaf contains multiple partitions. When you run CREATE DATABASE, SingleStore DB splits the database into partitions, which are distributed evenly among the available leaves. With CREATE DATABASE, you can specify the number of partitions with the PARTITIONS=X option.

If you don’t specify the number of partitions explicitly, the default is used (the number of leaves times the value of the default_partitions_per_leaf engine variable).
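
For illustration, a hypothetical database created with an explicit partition count and then inspected (demo_db and the count of 16 are placeholders):

CREATE DATABASE demo_db PARTITIONS=16;

-- See how the partitions are distributed across leaves
SHOW PARTITIONS ON demo_db;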

Aggregator

An aggregator is a node that routes queries to the leaves, aggregates the intermediate results, and sends the final result back to the client.
There are two types of aggregators: master and child. A cluster contains exactly one master aggregator, a specialized aggregator responsible for cluster monitoring and failover.
A cluster may also contain zero or more child aggregators (depending on query volume).
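
The current topology can be listed from any aggregator with standard commands:

SHOW AGGREGATORS;
SHOW LEAVES;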

Time Series Functions:

For storing and manipulating time series data, SingleStore supports the following functions:

FIRST
LAST
TIME_BUCKET

FIRST
An aggregate function that returns the first value of a set of input values, defined as the value associated with the minimum time.

LAST
An aggregate function that returns the last value of a set of input values, defined as the value associated with the maximum time.
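
A minimal FIRST/LAST sketch against the play_game table defined earlier: for each game, return the msgId associated with the earliest and the latest timeSince value.

SELECT gameName,
       FIRST(msgId, timeSince) AS first_msg,
       LAST(msgId, timeSince)  AS last_msg
FROM play_game
GROUP BY gameName;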

TIME_BUCKET
Normalizes a timestamp to the start of its containing bucket (e.g. '4d' groups time into four-day windows), which is convenient for aggregating time series rows by fixed intervals. The examples below run against the play_game table defined earlier.

SELECT TIME_BUCKET('4d', timeSince), gameName
FROM play_game ORDER BY 2, 1;

SELECT TIME_BUCKET('1d', timeSince), gameName
FROM play_game WHERE timeSince BETWEEN '2021-10-06T00:00:10.530+00:00' AND '2021-10-07T23:00:10.530+00:00' ORDER BY 2, 1;

SELECT TIME_BUCKET('6h', timeSince) AS timeSince, gameName
FROM play_game
WHERE timeSince > NOW() - INTERVAL 1 DAY ORDER BY 2, 1;

Additional comments

Restart Studio

  • sudo systemctl restart singlestoredb-studio

Installation Steps

sudo apt-get update

http://localhost:9000/cluster
http://localhost:8080/cluster/localhost/dashboard

