Improve comments
alexander-akhmetov committed May 30, 2021
1 parent 757ece5 commit fefc9dc
Showing 24 changed files with 245 additions and 282 deletions.
54 changes: 23 additions & 31 deletions README.md
@@ -4,9 +4,9 @@

A simple key-value storage.

It was created for learning purposes: I wanted to know Go more and create my own small database.
It was created for learning purposes: I wanted to learn a bit of Go and create my own small database.

Not for production usage. :)
Not intended for production use. :)

Implemented storage types:

@@ -17,7 +17,7 @@ Implemented storage types:

## Usage

By default it uses `lsmt.Storage`, but you can change it in the [main.go](db/main.go).
By default it uses `lsmt.Storage`, but you can change this in [main.go](db/main.go).

### Command-line interface

@@ -68,36 +68,29 @@ func main() {
}
```

More about all this configuration options you can read in the `lsmt.Storage` section below.
More information about all these configuration options can be found in the `lsmt.Storage` section below.

## Internals

The database supports four different storage types.
The database supports different storage types.

### memory.Storage

It's a simple hash map which holds all in the memory. Nothing else :)
It's a simple hash map that holds everything in memory.
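
Roughly, it could be sketched like this (a simplified illustration, not the actual `memory.Storage` code; the mutex is an addition to keep the example safe for concurrent use):

```go
package memory

import "sync"

// Storage keeps every key-value pair in a plain Go map.
type Storage struct {
	mu   sync.RWMutex
	data map[string]string
}

// Start initializes the map.
func (s *Storage) Start() {
	s.data = map[string]string{}
}

// Set stores the value for the key, overwriting any previous value.
func (s *Storage) Set(key string, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
}

// Get returns the value and whether the key exists.
func (s *Storage) Get(key string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	value, ok := s.data[key]
	return value, ok
}
```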

### file.Storage

It stores all information in the file. When you add a new entry, it just appends key and value to the file. So it's very fast to add new information.
But when you are trying to get the key, it scans the whole file (and from the beginning, not from the end) to find the latest key. So reading is slow.
It stores all information in a file. When you add a new entry, it simply appends the key and value to the file. So it's very fast to add new information. However, when you try to retrieve a key, it scans the entire file (starting from the beginning, not the end) to find the latest key. Therefore, reading is slow.
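
In other words, writes are cheap appends and reads are full scans. A rough sketch of the idea (simplified; the real code is in `pkg/file/file.go` and uses helpers from the `utils` package, but the `key;value` line format matches what `Set` writes there):

```go
package filesketch

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// set appends one "key;value" line to the end of the file.
func set(filename, key, value string) error {
	f, err := os.OpenFile(filename, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0600)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = fmt.Fprintf(f, "%s;%s\n", key, value)
	return err
}

// get scans the whole file from the beginning and keeps the last
// value it sees for the key, because the latest append wins.
func get(filename, key string) (string, bool) {
	f, err := os.Open(filename)
	if err != nil {
		return "", false
	}
	defer f.Close()

	value, found := "", false
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		parts := strings.SplitN(scanner.Text(), ";", 2)
		if len(parts) == 2 && parts[0] == key {
			value, found = parts[1], true
		}
	}
	return value, found
}
```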

### indexedfile.Storage

It is a `FileStorage` with a simple index (hash map). When you add a new key, it saves offset in bytes to the map in the memory. To process `get` command it checks the index, finds offset in bytes and reads only a piece of the file. Writing and reading are fast, but you need a lot of memory to keep all keys in it.
This is a `FileStorage` with a simple index (hash map). When you add a new key, it saves the offset in bytes to the map in memory. To process the `get` command, it checks the index, finds the offset in bytes, and reads only a piece of the file. Writing and reading are fast, but you need a lot of memory to keep all the keys in it.
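
The essential difference is the in-memory offset map (a sketch under the same assumptions as above; the real implementation lives in `pkg/indexed_file/indexed_file.go`):

```go
package indexsketch

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// index maps a key to the byte offset of its latest record in the file.
var index = map[string]int64{}

// set appends the record and remembers where it starts.
func set(filename, key, value string) error {
	f, err := os.OpenFile(filename, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0600)
	if err != nil {
		return err
	}
	defer f.Close()

	offset, err := f.Seek(0, io.SeekEnd)
	if err != nil {
		return err
	}
	if _, err := fmt.Fprintf(f, "%s;%s\n", key, value); err != nil {
		return err
	}
	index[key] = offset
	return nil
}

// get jumps straight to the recorded offset and reads a single line,
// instead of scanning the whole file.
func get(filename, key string) (string, bool) {
	offset, ok := index[key]
	if !ok {
		return "", false
	}
	f, err := os.Open(filename)
	if err != nil {
		return "", false
	}
	defer f.Close()
	if _, err := f.Seek(offset, io.SeekStart); err != nil {
		return "", false
	}
	line, err := bufio.NewReader(f).ReadString('\n')
	if err != nil {
		return "", false
	}
	parts := strings.SplitN(strings.TrimSuffix(line, "\n"), ";", 2)
	if len(parts) != 2 {
		return "", false
	}
	return parts[1], true
}
```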

### lsmt.Storage

The most interesting part of this project. :)
It stores all data in sorted string tables (SSTables), which are essentially binary files. It supports sparse indexes, so you don't need a lot of memory to store all your keys like in `indexedfile.Storage`.

It's something similar to Google's LevelDB or Facebook's RocksDB.
It keeps all data in sorted string tables (SSTable) which are basically binary files.
Supports sparse indexes, so you don't need a lot of memory to store all your keys like in `indexedfile.Storage`.

But it will be slower than `indexedfile.Storage`, because it uses a red-black tree to store sparse index and it checks all SSTables when you retrieve a value because it can't say that it doesn't have this key without checking the SSTables on a disk.

To make it faster in this situation, we can use a Bloom filter.
However, it will be slower than `indexedfile.Storage`, because it uses a red-black tree to store a sparse index and it checks all SSTables when you retrieve a value: it can't tell whether it has a key without checking the SSTables on disk. It could probably use a Bloom filter for that.
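
A Bloom filter is not implemented here, but the idea is small enough to sketch: keep one filter per SSTable, add every key to it when the table is written, and skip the table on reads when the filter says the key is definitely absent. Everything below (the sizes, the FNV-based double hashing) is just for illustration:

```go
package bloomsketch

import "hash/fnv"

// bloom is a fixed-size Bloom filter with k derived bit positions per key.
type bloom struct {
	bits []bool
	k    uint32
}

func newBloom(m int, k uint32) *bloom {
	return &bloom{bits: make([]bool, m), k: k}
}

// positions derives k bit positions from one 64-bit FNV hash (double hashing).
func (b *bloom) positions(key string) []uint32 {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)
	out := make([]uint32, b.k)
	for i := uint32(0); i < b.k; i++ {
		out[i] = (h1 + i*h2) % uint32(len(b.bits))
	}
	return out
}

// Add records the key in the filter.
func (b *bloom) Add(key string) {
	for _, p := range b.positions(key) {
		b.bits[p] = true
	}
}

// MayContain returns false only when the key is certainly not in the set;
// true means "maybe", so the SSTable still has to be checked.
func (b *bloom) MayContain(key string) bool {
	for _, p := range b.positions(key) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}
```

On `Get`, any SSTable whose filter answers false could be skipped without touching the disk.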

```none
+------------+
@@ -144,7 +137,7 @@ Main parts:
* Flush queue (list of memtables)
* Flusher (dumps a memtable to a disk)
* SSTables storage (main storage for the data)
* Compaction (background process to remove old keys which were updated)
* Compaction (background process to remove old keys that were updated)

#### GET process

@@ -153,7 +146,7 @@ Main parts:
3. Check SSTables

It checks all these parts in this order to be sure that it returns the latest version of the key.
Each SSTable has its own index (red-black tree). It can be sparse: it will not keep each key-offset pair in the index,
Each SSTable has its own index. It can be sparse: it will not keep each key-offset pair in the index,
but it stores a key every N bytes. We can do this because SSTable files are sorted and read-only. When we need to find a
key, we look up its offset, or the offset of the closest smaller key. After that, we can load that part of the file into memory and find the value for the key.
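
The lookup itself is a binary search over the sparse index for the closest indexed key that is not greater than the requested one; from that offset the file is scanned forward. A sketch (the entry type and the scanning step are simplified, not the actual mdb code):

```go
package sstsketch

import "sort"

// indexEntry is one sparse index record: a key and the byte offset
// where that key starts in the SSTable file.
type indexEntry struct {
	Key    string
	Offset int64
}

// startOffset returns the offset to start scanning from: the entry with the
// largest key that is <= the requested key. ok is false when even the first
// indexed key is greater than the requested one, so the key cannot be in this table.
func startOffset(index []indexEntry, key string) (offset int64, ok bool) {
	// index is sorted by Key, because SSTable files are sorted and read-only.
	i := sort.Search(len(index), func(i int) bool { return index[i].Key > key })
	if i == 0 {
		return 0, false
	}
	return index[i-1].Offset, true
}
```

From that offset, entries are read one by one until the key is found or a larger key shows up, which also bounds how much of the file has to be loaded.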

@@ -164,18 +157,17 @@

#### Flush

When memtable becomes bigger than some threshold, core component puts it to the flush queue and initializes a new memtable.
Flusher is a background process which checks the queue and dumps memtables as sstables to a disk.
When the memtable becomes bigger than some threshold, the core component adds it to the flush queue and initializes a new memtable.
The flusher is a background process that checks the queue and dumps memtables as SSTables to disk.
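
A minimal version of that hand-off could look like this (a sketch only: the real memtable is not a plain map, and the queue and flush wiring in mdb may differ):

```go
package flushsketch

// memtable is a simplified stand-in for mdb's in-memory table.
type memtable struct {
	data map[string]string
	size int64
}

func newMemtable() *memtable {
	return &memtable{data: map[string]string{}}
}

// maybeRotate is called after a write: once the active memtable grows past
// maxSize, it is pushed to the flush queue and a fresh memtable takes its place.
func maybeRotate(active *memtable, maxSize int64, queue chan<- *memtable) *memtable {
	if active.size < maxSize {
		return active
	}
	queue <- active
	return newMemtable()
}

// flusher runs as a background goroutine and dumps queued memtables to disk;
// dump stands in for "write this memtable out as a sorted SSTable file".
func flusher(queue <-chan *memtable, dump func(*memtable)) {
	for mt := range queue {
		dump(mt)
	}
}
```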

#### Compaction

It's a periodical background process.
It merges small SSTable files into a bigger one and removes old key-value pairs which can be removed.
It's a periodical background process that merges small SSTable files into a larger one and removes old key-value pairs that can be removed.
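
The merge logic itself is in `pkg/lsmt/compaction.go` (part of this commit, shown below); the background part is essentially a ticker loop, something like this (the interval and stop-channel wiring are assumptions):

```go
package compactsketch

import "time"

// runCompaction periodically tries to merge small SSTables into a bigger one.
// compactOnce stands in for a call to the compact() function.
func runCompaction(interval time.Duration, stop <-chan struct{}, compactOnce func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			compactOnce()
		case <-stop:
			return
		}
	}
}
```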

#### SSTables storage

It's a disk storage. On start-up time `mdb` checks this folder and registers all files and builds indexes.
Files are read-only, `mdb` never change them. It can only merge them into a big one file, but without modifying old files.
It's the on-disk storage. During start-up, `mdb` checks this folder, registers all the files, and builds indexes.
Files are read-only; `mdb` never changes them. It can only merge them into a larger file, without modifying the old files.
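
Start-up is essentially "list the directory, then build an index for each file". A sketch with the standard library (the `.sstable` extension and the registration step are assumptions; mdb has its own `listSSTables` helper):

```go
package startupsketch

import (
	"os"
	"path/filepath"
	"strings"
)

// listSSTableFiles returns the paths of all SSTable files in the storage directory.
func listSSTableFiles(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var files []string
	for _, e := range entries {
		if !e.IsDir() && strings.HasSuffix(e.Name(), ".sstable") {
			files = append(files, filepath.Join(dir, e.Name()))
		}
	}
	return files, nil
}
```

Each returned file would then be read once with the configured read buffer size to build its sparse index.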

#### File format

@@ -193,13 +185,13 @@ entry_type:
##### Configuration

```none
CompactionEnabled bool // you can disable background compaction process
MinimumFilesToCompact int // how many files does it need to start compaction process
CompactionEnabled bool // Enable/disable the background compaction process
MinimumFilesToCompact int // How many files are needed to start the compaction process
MaxMemtableSize int64 // max size for memtable
MaxCompactFileSize int64 // do not compact files bigger than this size
SSTableReadBufferSize int // read buffer size: database will build indexes each
// <SSTableReadBufferSize> bytes. If you want to have non-sparse index
// put 1 here
MaxCompactFileSize int64 // Do not compact files bigger than this size
SSTableReadBufferSize int // Read buffer size: the database will build indexes every
// <SSTableReadBufferSize> bytes. If you want to have a non-sparse index
// put 1 here
```
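
Put together, configuring the storage might look roughly like this (a guess based on the field list above; the import path, field layout, and sensible defaults may differ, see `db/main.go` for the real usage):

```go
package main

import "github.com/alexander-akhmetov/mdb/pkg/lsmt"

func main() {
	storage := &lsmt.Storage{
		CompactionEnabled:     true,
		MinimumFilesToCompact: 4,
		MaxMemtableSize:       1 << 20,  // flush memtables bigger than 1 MiB
		MaxCompactFileSize:    64 << 20, // leave files bigger than 64 MiB alone
		SSTableReadBufferSize: 4096,     // one index entry per 4 KiB; 1 = non-sparse
	}
	storage.Start()

	storage.Set("key", "value")
	if value, ok := storage.Get("key"); ok {
		println(value)
	}
}
```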

#### Performance test mode
24 changes: 12 additions & 12 deletions cmd/perfomance.go
@@ -8,26 +8,26 @@ import (
"github.com/alexander-akhmetov/mdb/pkg"
)

// now it's better to start performance test with external output filtering:
// It's better to start the performance test with external output filtering:
//
// go run db/*.go -p 2>&1 | grep -v 'DEBUG'
//
// Otherwise it will print a lot of additional log informarion: per each inserted key.
// Later I will add log filtering to the performance test.
// Otherwise, it will print a lot of additional log information. For example, a log line for each inserted key.
// TODO: Add log filtering to the performance test.

// with this counter we will calculate how many
// inserts were made for the previous second
// With this counter, we will calculate how many
// inserts were made in the previous second.
var counter = 0

// This is an infinite loop which just writes random keys to the storage
// and every second it prints output: how many keys were inserted for the previous second,
// for example:
// This is an infinite loop that writes random keys to the storage
// and prints the output every second: how many keys were inserted.
// For example:
//
// 2018/08/17 07:21:39.010602 Inserted: 13141
// 2018/08/17 07:21:40.010651 Inserted: 13169
//
// It doesn't check are the inserted values valid or not.
// It just inserts key as fast as it can, nothing else.
// It doesn't check whether the inserted values are valid.
// It simply inserts keys as fast as possible, nothing more.
func performanceTest(db mdb.Storage, maxKeys int, checkKeys bool) {
go printStatsEverySecond()

@@ -57,8 +57,8 @@ func performanceTest(db mdb.Storage, maxKeys int, checkKeys bool) {

}

// printStatsEverySecond prints counter value and sets it to 0,
// then it sleeps for a second and does the same, again and again :)
// printStatsEverySecond prints the counter value, resets it,
// sleeps for a second, and repeats the process.
func printStatsEverySecond() {
for true == true {
log.Printf("%sInserted: %v%s", colorGreen, counter, colorNeutral)
1 change: 0 additions & 1 deletion pkg/base.go
@@ -1,5 +1,4 @@
// Package mdb is a simple key-value database
// license that can be found in the LICENSE file.
package mdb

import (
8 changes: 4 additions & 4 deletions pkg/file/file.go
@@ -10,20 +10,20 @@ import (

var writeMutex = &sync.Mutex{}

// Storage holds all information in a file
// Storage holds all information in a file.
type Storage struct {
Filename string
}

// Set saves given key and value
// Set saves the given key and value.
func (s *Storage) Set(key string, value string) {
writeMutex.Lock()
defer writeMutex.Unlock()
strToAppend := fmt.Sprintf("%s;%s\n", key, value)
utils.AppendToFile(s.Filename, strToAppend)
}

// Get returns a value by given key and boolean indicator that key exists
// Get returns a value for a given key and a boolean indicator of whether the key exists.
func (s *Storage) Get(key string) (string, bool) {
line, found := utils.FindLineByKeyInFile(s.Filename, key)
if found {
@@ -33,7 +33,7 @@ func (s *Storage) Get(key string) (string, bool) {
return "", false
}

// Start initializes Storage and creates file if needed
// Start initializes Storage and creates a file if needed.
func (s *Storage) Start() {
log.Println("[INFO] Starting file storage")
utils.StartFileDB()
2 changes: 1 addition & 1 deletion pkg/file/file_test.go
@@ -31,7 +31,7 @@ func TestFileStorage(t *testing.T) {

assert.Equal(t, expContent, content, "File content wrong")

// now let's read content from this file
// Let's read the content of this file

value, exists := storage.Get(testKey2)
assert.Equal(t, testValue2, value, "Wrong value")
12 changes: 6 additions & 6 deletions pkg/indexed_file/indexed_file.go
@@ -13,13 +13,13 @@ import (

var writeMutex = &sync.Mutex{}

// Storage holds all in a file
// Storage holds data in a file
type Storage struct {
Filename string
index map[string]int64
}

// Set saves given key and value
// Set saves the given key and value.
func (s *Storage) Set(key string, value string) {
writeMutex.Lock()
defer writeMutex.Unlock()
@@ -30,7 +30,7 @@ func (s *Storage) Set(key string, value string) {
utils.AppendToFile(s.Filename, strToAppend)
}

// Get returns a value by given key and boolean indicator that key exists
// Get returns a value for a given key and a boolean indicator of whether the key exists.
func (s *Storage) Get(key string) (string, bool) {
var line string
if offset, ok := s.index[key]; ok {
@@ -42,7 +42,7 @@ func (s *Storage) Get(key string) (string, bool) {
return "", false
}

// Start initializes Storage, creates file if needed and rebuilds index
// Start initializes the Storage, creates the file if needed and rebuilds the index.
func (s *Storage) Start() {
log.Println("[INFO] Starting indexed file storage")
utils.StartFileDB()
@@ -52,8 +52,8 @@ func (s *Storage) Start() {
log.Println("[DEBUG] Storage: started")
}

// rebuildIndex reads all file and builds initial index
// it will be very slow for large files
// rebuildIndex reads the file and builds an initial index.
// It is slow for large files.
func (s *Storage) rebuildIndex() {
s.index = map[string]int64{}

6 changes: 3 additions & 3 deletions pkg/indexed_file/indexed_file_test.go
@@ -31,7 +31,7 @@ func TestIndexedFileStorage(t *testing.T) {

assert.Equal(t, expContent, content, "File content wrong")

// now let's read content from this file
// Let's read the content of this file

value, exists := storage.Get(testKey)
assert.Equal(t, testValue, value, "Wrong value")
@@ -64,12 +64,12 @@ func TestIndexedFileStorageIndexBuild(t *testing.T) {
storage.Set(testKey, testValue)
storage.Set(testKey2, testValue2)

// clean index and check it
// clean the index and check it
storage.index = map[string]int64{}
assert.Equal(t, int64(0), storage.index[testKey], "index must be empty")
assert.Equal(t, int64(0), storage.index[testKey2], "index must be empty")

// now let's build index again
// build the index again
storage.Stop()
storage.Start()
assert.Equal(t, int64(0), storage.index[testKey], "wrong index offset")
5 changes: 2 additions & 3 deletions pkg/lsmt/binfile.go
@@ -10,13 +10,12 @@ import (

const filePermissions = 0600

// binScanner scans binary file and splits data into
// entry.DBEntry automatically
// binScanner scans a binary file and automatically splits data into entry.DBEntry.
type binScanner struct {
*bufio.Scanner
}

// appendBinaryToFile writes key-value in binary format
// appendBinaryToFile writes key-value pairs in binary format.
func appendBinaryToFile(filename string, entry *entry.DBEntry) {
// todo: move to entry
file, err := os.OpenFile(filename, os.O_APPEND|os.O_WRONLY, filePermissions)
10 changes: 5 additions & 5 deletions pkg/lsmt/binfile_test.go
@@ -12,7 +12,7 @@ import (

func TestAppendBinaryToFile(t *testing.T) {
// test appendBinaryToFile
// file must exists
// file must exist
testutils.SetUp()
defer testutils.Teardown()

@@ -30,11 +30,11 @@
Value: value,
})

// check that keys are added
expKeysMap := [][2]string{[2]string{key, value}}
// check that the keys are added
expKeysMap := [][2]string{{key, value}}
testutils.AssertKeysInFile(t, filename, expKeysMap)

// check binary content
// check the binary content
bytes := testutils.ReadFileBinary(filename)
expBytes := []byte{0x0, 0x0, 0x0, 0x0, 0x8, 0x0, 0x0, 0x0, 0xa, 0x74, 0x65, 0x73, 0x74, 0x2d, 0x6b, 0x65, 0x79, 0x74, 0x65, 0x73, 0x74, 0x2d, 0x76, 0x61, 0x6c, 0x75, 0x65}
assert.Equal(t, expBytes, bytes)
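
The expected byte slice above suggests the on-disk entry layout: a 1-byte entry type, a 4-byte big-endian key length, a 4-byte big-endian value length, then the raw key and value. A decoding sketch under that assumption (not the actual mdb code):

```go
package binsketch

import (
	"encoding/binary"
	"errors"
)

// decodeEntry parses one entry assuming the layout
// [1-byte type][4-byte key length][4-byte value length][key][value],
// with both lengths big-endian.
func decodeEntry(b []byte) (entryType byte, key, value string, err error) {
	if len(b) < 9 {
		return 0, "", "", errors.New("entry header too short")
	}
	entryType = b[0]
	keyLen := binary.BigEndian.Uint32(b[1:5])
	valueLen := binary.BigEndian.Uint32(b[5:9])
	if uint64(len(b)) < 9+uint64(keyLen)+uint64(valueLen) {
		return 0, "", "", errors.New("entry body too short")
	}
	key = string(b[9 : 9+keyLen])
	value = string(b[9+keyLen : 9+keyLen+valueLen])
	return entryType, key, value, nil
}
```
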
@@ -68,7 +68,7 @@ func TestNewBinFileScanner(t *testing.T) {
readBufferSize := 1024
scanner := newBinFileScanner(f, readBufferSize)

// we have only one key-value in the file
// we have only one key-value pair in the file
e, err := scanner.ReadEntry()
assert.Nil(t, err)
assert.Equal(t, &entry.DBEntry{Key: key, Value: value}, e)
18 changes: 9 additions & 9 deletions pkg/lsmt/compaction.go
@@ -13,9 +13,9 @@ var compactionMutex = &sync.Mutex{}

const ssTableReadBufferSize = 4096

// compact finds N SSTables in the workDir,
// which are can be merged together (they are must be smaller some limit)
// and merges them into a one bigger SSTable, then it removes old files
// compact finds N SSTables in the workDir
// that can be merged together (they must be smaller than some limit)
// and merges them into one bigger SSTable. Then it removes the old files.
func compact(workDir string, tmpDir string, minimumFilesToCompact int, maxCompactFileSize int64) (string, string, string, bool) {
compactionMutex.Lock()
defer compactionMutex.Unlock()
@@ -35,7 +35,7 @@
return fFile, sFile, tmpFilePath, true
}

// merge merges files into a one
// merge merges files into one.
func merge(fFile string, sFile string, mergeTo string) {
log.Printf("[DEBUG] Merging %s + %s => %s", fFile, sFile, mergeTo)

@@ -58,7 +58,7 @@ func merge(fFile string, sFile string, mergeTo string) {
sEntry, _ := secondScanner.ReadEntry()

for true == true {
// compare files line by line and add to the new file only last keys
// Compare files line by line and add only the latest keys to the new file.
for (sEntry.Key > fEntry.Key && fEntry.Key != "") || (fEntry.Key != "" && sEntry.Key == "") {
appendBinaryToFile(mergeTo, fEntry)
fEntry, _ = firstScanner.ReadEntry()
@@ -67,8 +67,8 @@ func merge(fFile string, sFile string, mergeTo string) {
for (sEntry.Key <= fEntry.Key && sEntry.Key != "") || (fEntry.Key == "" && sEntry.Key != "") {
appendBinaryToFile(mergeTo, sEntry)
for sEntry.Key == fEntry.Key {
// if keys are equal, we need to read next first key too,
// otherwise we will save it again in this loop
// If keys are equal, we need to read the next first key too,
// otherwise we will save it again in this loop.
fEntry, _ = firstScanner.ReadEntry()
}
sEntry, _ = secondScanner.ReadEntry()
@@ -79,8 +79,8 @@ func merge(fFile string, sFile string, mergeTo string) {
}
}

// getTwoFilesToCompact returns paths to to files which we can merge
// and boolean third argument which indicates can we merge files or not
// getTwoFilesToCompact returns paths to two files that we can merge
// and a boolean indicating whether we can merge the files or not.
func getTwoFilesToCompact(dir string, minimumFilesToCompact int, maxFileSize int64) (string, string, bool) {
allFiles := listSSTables(dir)

