DEV Community

yemingfeng
yemingfeng

Posted on

SDB :Pure golang development, distributed, rich data structure, persistent, easy-to-use NoSQL database

SDB :Pure golang development, distributed, rich data structure, persistent, easy-to-use NoSQL database


Why do you need SDB?

Consider the following business scenario:

  • Counting Service: Count the likes, plays and other data of the content
  • Commenting Service: After posting a comment, view a list of comments for a piece of content
  • Recommendation Service: each user has a recommendation list with content and weight

The above business scenarios can all be implemented through MySQL + Redis.

MySQL acts as a persistent capability in this scenario, and Redis provides read and write
capabilities for online services.

The above storage requirements are: persistence + high performance + rich data structure. Can only
one storage be used to satisfy the above scenario?

The answer is: very little. Some databases either support insufficient data structure, or data
structure is not rich enough, or the access cost is too high......

In order to solve the above problems, SDB was created.


A brief introduction to SDB


Architecture

architecture

  • Implemented a master-slave architecture based on the raft protocol.
  • When a write request is initiated, any node can be connected. If the node is the master node, the request will be processed directly, otherwise the request will be forwarded to the master node.
  • When initiating a read request, you can connect to any node, and the node will process the request directly.

Quick to use

Compile protobuf

Due to the use of protobuf, the project does not upload the go files generated by protobuf to
github. Need to manually trigger compilation of protobuf files

sh ./scripts/build_protobuf.sh
Enter fullscreen mode Exit fullscreen mode

Start the master server

sh ./scripts/start_sdb.sh
Enter fullscreen mode Exit fullscreen mode

Start the slave server1

sh ./scripts/start_slave1.sh
Enter fullscreen mode Exit fullscreen mode

Start the slave server2

sh ./scripts/start_slave2.sh
Enter fullscreen mode Exit fullscreen mode

Use pebble storage engine by default.

Use cli

Client use

package main

import (
    "github.com/yemingfeng/sdb/internal/util"
    pb "github.com/yemingfeng/sdb/pkg/protobuf"
    "golang.org/x/net/context"
    "google.golang.org/grpc"
)

var clientLogger = util.GetLogger("client")

func main() {
    conn, err := grpc.Dial(":10000", grpc.WithInsecure())
    if err != nil {
        clientLogger.Printf("faild to connect: %+v", err)
    }
    defer func() {
        _ = conn.Close()
    }()


    c := pb.NewSDBClient(conn)
    setResponse, err := c.Set(context.Background(),
        &pb.SetRequest{Key: []byte("hello"), Value: []byte("world")})
    clientLogger.Printf("setResponse: %+v, err: %+v", setResponse, err)
    getResponse, err := c.Get(context.Background(),
        &pb.GetRequest{Key: []byte("hello")})
    clientLogger.Printf("getResponse: %+v, err: %+v", getResponse, err)
}
Enter fullscreen mode Exit fullscreen mode

Performance Testing

Test script: benchmark

Testing Machine: MacBook Pro (13-inch, 2016, Four Thunderbolt 3 Ports)

Testing processor: 2.9GHz Dual-core Core i5

Testing memory: 8GB

Testing result: peek QPS > 10k,avg QPS > 9k,set avg time < 80ms,get avg time <
0.2ms

benchmark


Plan

  • [x] Write interface documentation
  • [x] Implement more APIs (2021.12.30)
    • [x] String
      • [x] SetNX
      • [x] GetSet
      • [x] MGet
      • [x] MSet
    • [x] List
      • [x] LMembers
      • [x] LLPush
    • [x] Set
      • [x] SMembers
    • [x] Sorted Set
      • [x] ZMembers
  • [x] Support for richer data structures (2021.01.20)
    • [x] bitset
    • [x] map
    • [x] geo hash
  • [x] sdb-cli (2021.03.10)
  • [x] master-slave architecture
  • [ ] Write sdb kv storage engine

Interface documentation

string

interface parameter description
Set key, value set kv
MSet keys, values set multi kv
SetNX key, value When key does not exist, set value
Get key Get the value corresponding to the key
MGet keys Get the value corresponding to a set of keys
Del key delete key
Incr key, delta Add delta to the key, and throw an exception if the value is not a number. If value does not exist, then value = delta

list

interface parameter description
LRPush key, values append values from the key array
LLPush key, values append values to the front of the key array
LPop keys, values remove all values elements from the key array
LRange key, offset, limit traverse keys in array order, starting at 0. If offset = -1, traverse from back to front
LExist key, values determine if values exist in the key array
LDel key delete a key array
LCount key returns the number of elements in the key array, the time complexity is high, not recommended
LMembers key traverse keys in array order. high time complexity, not recommended

set

interface parameter description
SPush key, values add values to the key collection
SPop keys, values remove all values elements from the key collection
SExist key, values determine whether values exist in the key set
SDel key delete a key set
SCount key returns the number of elements in the key set, the time complexity is high, not recommended
SMembers key iterate over keys by value size. High time complexity, not recommended

sorted set

interface parameter description
ZPush key, tuples add the values to the key sorted set, sorted by tuple.score from small to large
ZPop keys, values removes all values elements from the key sorted set
ZRange key, offset, limit according to the score size, iterate over the keys from small to large. If offset = -1, start traversing by score from large to small
ZExist key, values determine whether values exist in the key sorted set
ZDel key delete a key sorted set
ZCount key returns the number of elements in the sorted set of key, the time complexity is high, not recommended
ZMembers key according to the score size, iterate over the keys from small to large. High time complexity, not recommended

bloom filter

interface parameter description
BFCreate key, n, p create bloom filter, n is the number of elements, p is the false positive rate
BFDel key delete a key bloom filter
BFAdd key, values add values to bloom filter. An exception will be thrown when the bloom filter is not created
BFExist key, values determine whether values exist in the key bloom filter

hyper log log

interface parameter description
HLLCreate key create hyper log log
HLLDel key delete a hyper log log
HLLAdd key, values add values to hyper log log. An exception will be thrown when the hyper log log is not created
HLLCount key get the number of deduplicated elements of a hyper log log

bitset

interface parameter description
BSDel key delete a bitset
BSSetRange key, start, end, value set key [start, end) range bit to value
BSMSet key, bits, value set key bits to value
BSGetRange key, start, end get the bits in the range of key [start, end)
BSMGet key, bits get the bit of key bits
BSMCount key get the number of key bit = 1
BSMCountRange key, start, end get the number of key [start, end) bit = 1

map

interface parameter description
MPush key, pairs add pairs KV pairs to the key map
MPop key, keys remove all keys elements in the key map
MExist key, keys determine whether keys exist in the key map
MDel key delete a map
MCount key returns the number of elements in the key map, the time complexity is high, not recommended
MMembers key iterates over pairs by pair.key size. High time complexity, not recommended

geo hash

interface parameter description
GHCreate key, precision create a geo hash, precision stands for precision.
GHDel key delete a geo hash
GHAdd key, points add the points to the geo hash, and the id in the point is the unique identifier
GHPop key, ids delete points
GHGetBoxes key, point returns a list of points in the same box as a point in the key geo hash, sorted by distance from small to large
GHGetNeighbors key, point returns the list of points closest to the point in the key geo hash, sorted by distance from small to large
GHCount key returns the number of elements in the key geo hash, the time complexity is high, not recommended
GHMembers key returns a list of all points in the key geo hash. High time complexity, not recommended

page

interface parameter description
PList dataType, key, offset, limit query an existing element under a dataType. If key is not empty, get the element of key under this dataType.

pub sub

interface parameter description
Subscribe topic subscribe to a topic
Publish topic, payload post a payload to a topic

cluster

interface parameter description
CInfo Get node information in the cluster

Monitor

Install docker version grafana, prometheus (skipable)

Config grafana

The final effect can refer to: grafana diagram of performance test


Configuration parameters

parameter description default value
store.engine Storage engine, optional pebble, level, badger pebble
store.path storage directory ./master/db/
server.grpc_port grpc port 10000
server.http_port http port,for use by prometheus 11000
server.rate qps per second limit 30000
cluster.path directory where raft logs are stored ./master/raft
cluster.node_id the node id identified by the raft protocol, which must be unique 1
cluster.address address for raft correspondence 127.0.0.1:12000
cluster.master the address of the master node in the existing cluster is added through the http[http_port] interface exposed by the master node. If it is a new cluster, it is empty.
cluster.timeout raft protocol apply timeout, the unit is ms 10000
cluster.join as a slave node, whether to join the master node; it needs to be set to true for the first time to join, and false to start again after joining false

SDB principle - storage engine selection

The core problem of the SDB project is the problem of the data storage scheme.

First, we can't write a storage engine by hand. This is too much work and unreliable. We have to
find a storage solution suitable for SDB positioning in open source projects.

SDB requires a storage engine that can provide high-performance read and write capabilities.
Commonly used stand-alone storage engine solutions include B+ tree, LSM tree, and B tree.

There is also a background, golang performs very well in cloud native, and its performance is
comparable to C language, and the development efficiency is also high, so SDB prefers to use pure
golang for development.

So now the problem becomes: it is more difficult to find a storage engine developed in pure golang
version. After collecting a series of materials, I found the following open source solutions:

On the whole, the three storage engines golangdb, badger, and pebble are all very good.

In order to be compatible with these three storage engines, SDB Provides an
abstract interface
, and then adapt to the three storage engines.

The principle of SDB - data structure design

SDB has solved the problem of data storage through the previous three storage engines. But how to
support rich data structures on KV's storage engine?

Taking pebble as an example, first of all pebble provides the following interface capabilities:

  • set(k, v)
  • get(k)
  • del(k)
  • batch
  • iterator

Next, I take the support of the List data structure as an example to analyze how SDB supports List
through the pebble storage engine.

The List data structure provides the following interfaces: LRPush, LLPush, LPop, LExist, LRange,
LCount.

If the key of a List is: [hello], and the list elements of the List are: [aaa, ccc, bbb], then each
element of the List is stored in pebble as:

pebble key pebble value
l/hello/{unique_ordering_key1} aaa
l/hello/{unique_ordering_key2} ccc
l/hello/{unique_ordering_key3} bbb

The pebble key generation strategy for List elements:

  • data structure prefix: List is prefixed with l characters, Set is prefixed with s...
  • list key part: the key of List is hello
  • unique_ordering_key: The generation method is implemented by the snowflake algorithm, which guarantees local auto-increment
  • pebble value part: the real content of the List element, such as aaa, ccc, bbb

Why is the insertion order of List guaranteed?

This is because pebble is an implementation of LSM, which uses lexicographical ordering of keys
internally. In order to guarantee the insertion order, SDB adds unique_ordering_key to the pebble
key As the basis for sorting, thus ensuring the insertion order.

With the pebble key generation strategy, everything becomes easier. Let's look at the core logic of
LRPush, LLPush, LPop, and LRange:

LRPush

func LRPush(key []byte, values [][]byte) (bool, error) {
    batchAction := store.NewBatchAction()
    defer batchAction.Close()

    for _, value := range values {
        batchAction.Set(generateListKey(key, util.GetOrderingKey()), value)
    }

    return batchAction.Commit()
}
Enter fullscreen mode Exit fullscreen mode

LLPush

The logic of LLPush is very similar to that of LRPush. The difference is that as long as
{unique_ordering_key} is negative, it becomes the minimum value. In order to ensure that the values
are internally ordered, we have to - index. The logic is as follows:

func LLPush(key []byte, values [][]byte) (bool, error) {
    batch := store.NewBatch()
    defer batch.Close()

    for i, value := range values {
        batch.Set(generateListKey(key, -(math.MaxInt64 - util.GetOrderingKey())), value)
    }

    return batch.Commit()
}
Enter fullscreen mode Exit fullscreen mode

LPop

When writing to pebble, the key is generated through the unique_ordering_key scheme. Unable to find
elements of List in pebble directly in pebble key. When deleting an element, it is necessary to
traverse all elements of the List, find the element whose value = value to be deleted, and then
delete it. The core logic is as follows:

func LPop(key []byte, values [][]byte) (bool, error) {
    batchAction := store.NewBatchAction()
    defer batchAction.Close()

    store.Iterate(&store.IteratorOption{Prefix: generateListPrefixKey(key)},
        func(key []byte, value []byte) {
            for i := range values {
                if bytes.Equal(values[i], value) {
                    batchAction.Del(key)
                }
            }
        })

    return batchAction.Commit()
}
Enter fullscreen mode Exit fullscreen mode

LRange

Similar to delete logic, through iterator The interface is
traversed. Additional support for reverse iteration is here
Allow offset to pass in -1, which means to iterate from the back.

func LRange(key []byte, offset int32, limit uint32) ([][]byte, error) {
    index := int32(0)
    res := make([][]byte, limit)
    store.Iterate(&engine.PrefixIteratorOption{
        Prefix: generateListPrefixKey(key), Offset: offset, Limit: limit},
        func(key []byte, value []byte) {
            res[index] = value
            index++
        })
    return res[0:index], nil
}
Enter fullscreen mode Exit fullscreen mode

The above implements the support for the List data structure.

Other data structures are generally similar in logic, among
which sorted_set
more complicated. You can check it yourself.

LPop optimization

Smart people can see that the logic of LPop consumes a lot of performance in the case of a large
amount of data. It is because we cannot know the key corresponding to the value in the storage
engine, and we need to load all the elements in the List After coming out, judge one by one before
deleting.

In order to reduce time complexity and improve performance. Take List: [hello] -> [aaa, ccc, bbb] as
an example. The storage model will be changed to the following:

Positive row index structure [unchanged]:

pebble key pebble value
l/hello/{unique_ordering_key1} aaa
l/hello/{unique_ordering_key2} ccc
l/hello/{unique_ordering_key3} bbb

Secondary index structure

pebble key pebble value
l/hello/aaa/{unique_ordering_key1} aaa
l/hello/ccc/{unique_ordering_key2} ccc
l/hello/bbb/{unique_ordering_key3} bbb

With this auxiliary index, we can determine whether there is an element of a certain value in the
List by means of prefix retrieval. This reduces time complexity and improves performance. It is also
necessary to write the auxiliary index when writing the element, so the core logic will be changed
to:

func LRPush(key []byte, values [][]byte) (bool, error) {
    batch := store.NewBatch()
    defer batch.Close()

    for _, value := range values {
        id := util.GetOrderingKey()
        batch.Set(generateListKey(key, id), value)
        batch.Set(generateListIdKey(key, value, id), value)
    }

    return batch.Commit()
}

func LLPush(key []byte, values [][]byte) (bool, error) {
    batch := store.NewBatch()
    defer batch.Close()

    for i, value := range values {
        id := -(math.MaxInt64 - util.GetOrderingKey())
        batch.Set(generateListKey(key, id), value)
        batch.Set(generateListIdKey(key, value, id), value)
    }

    return batch.Commit()
}

func LPop(key []byte, values [][]byte) (bool, error) {
    batch := store.NewBatch()
    defer batch.Close()

    for i := range values {
        store.Iterate(&engine.PrefixIteratorOption{Prefix: generateListIdPrefixKey(key, values[i])},
            func(storeKey []byte, storeValue []byte) {
                if bytes.Equal(storeValue, values[i]) {
                    batch.Del(storeKey)

                    infos := strings.Split(string(storeKey), "/")
                    id, _ := strconv.ParseInt(infos[len(infos)-1], 10, 64)
                    batch.Del(generateListKey(key, id))
                }
            })
    }

    return batch.Commit()
}
Enter fullscreen mode Exit fullscreen mode

The principle of SDB - the mapping of relational model to KV model

With the above solution, you probably know how the KV storage engine supports data structures. But
this approach is crude and cannot be generalized.

Referring to TiDB Design, SDB has designed a
relational model to KV structure.

In SBD, data is constructed from Collections and Rows. in:

  • Collection
    A table similar to a database is a logical concept. Each dataType (such as List) corresponds to a
    Collection. A Collection contains multiple Rows.

  • A Row contains unique keys: key, id, value, indexes, is the data that is actually stored in KV
    storage
    . Each row row has rowKey as the unique value, rowKey = {dataType} + {key} + {id}

  • Each row contains N indexes, each index has indexKey as a unique value, indexKey
    = {dataType} + {key} + idx_{indexName} + {indexValue} + {id}

Take ListCollection as an example, the key of the List is [l1], assuming that the Collection has 4
rows of Row, and each row of Row has the index of value and score

Then the Row structure of each row is as follows:

 { {key: l1}, {id: 1.1}, {value: aaa}, {score: 1.1}, indexes: [ {name: "value", value: aaa}, {name: "score", value: 1.1} ] }
 { {key: l1}, {id: 2.2}, {value: bbb}, {score: 2.2}, indexes: [ {name: "value", value: bbb}, {name: "score", value: 2.2} ] }
 { {key: l1}, {id: 3.3}, {value: ccc}, {score: 3.3}, indexes: [ {name: "value", value: ccc}, {name: "score", value: 3.3} ] } 
 { {key: l1}, {id: 4.4}, {value: aaa}, {score: 4.4}, indexes: [ {name: "value", value: aaa}, {name: "score", value: 4.4} ] }
Enter fullscreen mode Exit fullscreen mode

Take Row with id = 1.1 as an example, dataType = 1, rowKey = 1/l1/1.1, valueIndexKey
= 1/l1/idx_value/aaa/1.1, scoreIndexKey = 1/l1/idx_score/1.1/1.1 The written data is:

    rowKey: 1/l1/1.1 -> { {key: l1}, {id: 1.1}, {value: aaa}, {score: 1.1}, indexes: [ {name: "value", value: aaa}, {name: "score", value: 1.1} ] }
    valueIndexKey: 1/l1/idx_value/aaa/1.1, -> 1/l1/1.1
    scoreIndexKey: 1/l1/idx_score/1.1/1.1 -> 1/l1/1.1
Enter fullscreen mode Exit fullscreen mode

In this way, the data structure, KV storage, and relational model are connected.


SDB principle - communication protocol scheme

After solving the problems of storage and data structure, SDB faces the problem of the [last mile]
which is the choice of communication protocol.

SDB is positioned to support multiple languages, so it is necessary to choose a communication
framework that supports multiple languages.

grpc is a very good choice. You only need to use the SDB proto file to automatically generate
clients in various languages through the protoc command line tool, which solves the problem of
developing different clients.


SDB principle - master-slave architecture

After solving all the above problems, SDB is already a reliable stand-alone NoSQL database.

In order to further optimize SDB, a master-slave architecture has been added to SDB.

There are three options for the raft protocol under the golang language, namely:

In order to support the Chinese project, dragonboat was chosen. PS: Thank you, Chinese!

After using the raft protocol, the throughput of the entire cluster drops a little, but with the horizontally expanded read capability, the benefits are still obvious.


Version update record

v3.0.0

  • commit Implemented master-slave architecture based on raft

v1.7.0

  • commit Use sharding to store bitset, bitset no longer needs to be initialized, with the function of [automatic expansion]

Thanks to the power of open source, I will not list them all here, please move to go.mod

Discussion (6)

Collapse
femolacaster profile image
femolacaster

valueIndexKey: 1/l1/idx_value/aaa/1.1, -> 1/l1/1.1
scoreIndexKey: 1/l1/idx_score/1.1/1.1 -> 1/l1/1.1

Why is there a comma before the arrow for valueIndexKey and not for scoreIndexKey? Typo?

Great thinking by the way! Love the concept behind the database.

Collapse
yemingfeng profile image
yemingfeng Author

Sorry, this is a typo, I'll fix it shortly.

Collapse
peteole profile image
Ole Petersen

Thanks for sharing it here! Super cool that there still is something going on in database development! Really great that it is distributed!

Collapse
yemingfeng profile image
yemingfeng Author

Yes, but this project is still relatively primitive and needs to be iterated slowly.

Collapse
ryan4yin profile image
ryan4yin | thiscute.eth • Edited on

this is really awesome! and there is a typo: 'In SBD, data is constructed from Collections and Rows.' It should be 'SDB'.

Collapse
yemingfeng profile image
yemingfeng Author

感谢。现在去修改。