Improve comments
alexander-akhmetov committed May 30, 2021
1 parent 757ece5 commit fefc9dc
Showing 24 changed files with 245 additions and 282 deletions.
54 changes: 23 additions & 31 deletions README.md
@@ -4,9 +4,9 @@

A simple key-value storage.

It was created for learning purposes: I wanted to know Go more and create my own small database.
It was created for learning purposes: I wanted to learn a bit of Go and create my own small database.

Not for production usage. :)
Not intended for production use. :)

Implemented storage types:

@@ -17,7 +17,7 @@ Implemented storage types:

## Usage

By default it uses `lsmt.Storage`, but you can change it in the [main.go](db/main.go).
By default it uses `lsmt.Storage`, but you can change this in [main.go](db/main.go).

### Command-line interface

@@ -68,36 +68,29 @@ func main() {
}
```

More about all this configuration options you can read in the `lsmt.Storage` section below.
More information about all these configuration options can be found in the `lsmt.Storage` section below.

## Internals

The database supports four different storage types.
The database supports different storage types.

### memory.Storage

It's a simple hash map which holds all in the memory. Nothing else :)
It's a simple hash map that holds everything in memory.
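
Roughly, it could be sketched like this (a simplified illustration, not the actual `memory.Storage` code; the mutex is an addition to keep the example safe for concurrent use):

```go
package memory

import "sync"

// Storage keeps every key-value pair in a plain Go map.
type Storage struct {
	mu   sync.RWMutex
	data map[string]string
}

// Start initializes the map.
func (s *Storage) Start() {
	s.data = map[string]string{}
}

// Set stores the value for the key, overwriting any previous value.
func (s *Storage) Set(key string, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
}

// Get returns the value and whether the key exists.
func (s *Storage) Get(key string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	value, ok := s.data[key]
	return value, ok
}
```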

### file.Storage

It stores all information in the file. When you add a new entry, it just appends key and value to the file. So it's very fast to add new information.
But when you are trying to get the key, it scans the whole file (and from the beginning, not from the end) to find the latest key. So reading is slow.
It stores all information in a file. When you add a new entry, it simply appends the key and value to the file. So it's very fast to add new information. However, when you try to retrieve a key, it scans the entire file (starting from the beginning, not the end) to find the latest key. Therefore, reading is slow.
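
In other words, writes are cheap appends and reads are full scans. A rough sketch of the idea (simplified; the real code is in `pkg/file/file.go` and uses helpers from the `utils` package, but the `key;value` line format matches what `Set` writes there):

```go
package filesketch

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// set appends one "key;value" line to the end of the file.
func set(filename, key, value string) error {
	f, err := os.OpenFile(filename, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0600)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = fmt.Fprintf(f, "%s;%s\n", key, value)
	return err
}

// get scans the whole file from the beginning and keeps the last
// value it sees for the key, because the latest append wins.
func get(filename, key string) (string, bool) {
	f, err := os.Open(filename)
	if err != nil {
		return "", false
	}
	defer f.Close()

	value, found := "", false
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		parts := strings.SplitN(scanner.Text(), ";", 2)
		if len(parts) == 2 && parts[0] == key {
			value, found = parts[1], true
		}
	}
	return value, found
}
```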

### indexedfile.Storage

It is a `FileStorage` with a simple index (hash map). When you add a new key, it saves offset in bytes to the map in the memory. To process `get` command it checks the index, finds offset in bytes and reads only a piece of the file. Writing and reading are fast, but you need a lot of memory to keep all keys in it.
This is a `FileStorage` with a simple index (hash map). When you add a new key, it saves the offset in bytes to the map in memory. To process the `get` command, it checks the index, finds the offset in bytes, and reads only a piece of the file. Writing and reading are fast, but you need a lot of memory to keep all the keys in it.
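
The essential difference is the in-memory offset map (a sketch under the same assumptions as above; the real implementation lives in `pkg/indexed_file/indexed_file.go`):

```go
package indexsketch

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// index maps a key to the byte offset of its latest record in the file.
var index = map[string]int64{}

// set appends the record and remembers where it starts.
func set(filename, key, value string) error {
	f, err := os.OpenFile(filename, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0600)
	if err != nil {
		return err
	}
	defer f.Close()

	offset, err := f.Seek(0, io.SeekEnd)
	if err != nil {
		return err
	}
	if _, err := fmt.Fprintf(f, "%s;%s\n", key, value); err != nil {
		return err
	}
	index[key] = offset
	return nil
}

// get jumps straight to the recorded offset and reads a single line,
// instead of scanning the whole file.
func get(filename, key string) (string, bool) {
	offset, ok := index[key]
	if !ok {
		return "", false
	}
	f, err := os.Open(filename)
	if err != nil {
		return "", false
	}
	defer f.Close()
	if _, err := f.Seek(offset, io.SeekStart); err != nil {
		return "", false
	}
	line, err := bufio.NewReader(f).ReadString('\n')
	if err != nil {
		return "", false
	}
	parts := strings.SplitN(strings.TrimSuffix(line, "\n"), ";", 2)
	if len(parts) != 2 {
		return "", false
	}
	return parts[1], true
}
```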

### lsmt.Storage

The most interesting part of this project. :)
It stores all data in sorted string tables (SSTables), which are essentially binary files. It supports sparse indexes, so you don't need a lot of memory to store all your keys like in `indexedfile.Storage`.

It's something similar to Google's LevelDB or Facebook's RocksDB.
It keeps all data in sorted string tables (SSTable) which are basically binary files.
Supports sparse indexes, so you don't need a lot of memory to store all your keys like in `indexedfile.Storage`.

But it will be slower than `indexedfile.Storage`, because it uses a red-black tree to store sparse index and it checks all SSTables when you retrieve a value because it can't say that it doesn't have this key without checking the SSTables on a disk.

To make it faster in this situation, we can use a Bloom filter.
However, it will be slower than `indexedfile.Storage`, because it uses a red-black tree to store a sparse index and it checks all SSTables when you retrieve a value: it can't tell whether it has a key without checking the SSTables on disk. It could probably use a Bloom filter for that.
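
A Bloom filter is not implemented here, but the idea is small enough to sketch: keep one filter per SSTable, add every key to it when the table is written, and skip the table on reads when the filter says the key is definitely absent. Everything below (the sizes, the FNV-based double hashing) is just for illustration:

```go
package bloomsketch

import "hash/fnv"

// bloom is a fixed-size Bloom filter with k derived bit positions per key.
type bloom struct {
	bits []bool
	k    uint32
}

func newBloom(m int, k uint32) *bloom {
	return &bloom{bits: make([]bool, m), k: k}
}

// positions derives k bit positions from one 64-bit FNV hash (double hashing).
func (b *bloom) positions(key string) []uint32 {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)
	out := make([]uint32, b.k)
	for i := uint32(0); i < b.k; i++ {
		out[i] = (h1 + i*h2) % uint32(len(b.bits))
	}
	return out
}

// Add records the key in the filter.
func (b *bloom) Add(key string) {
	for _, p := range b.positions(key) {
		b.bits[p] = true
	}
}

// MayContain returns false only when the key is certainly not in the set;
// true means "maybe", so the SSTable still has to be checked.
func (b *bloom) MayContain(key string) bool {
	for _, p := range b.positions(key) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}
```

On `Get`, any SSTable whose filter answers false could be skipped without touching the disk.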

```none
+------------+
@@ -144,7 +137,7 @@ Main parts:
* Flush queue (list of memtables)
* Flusher (dumps a memtable to a disk)
* SSTables storage (main storage for the data)
* Compaction (background process to remove old keys which were updated)
* Compaction (background process to remove old keys that were updated)

#### GET process

@@ -153,7 +146,7 @@ Main parts:
3. Check SSTables

It checks all these parts in this order to be sure that it returns the latest version of the key.
Each SSTable has its own index (red-black tree). It can be sparse: it will not keep each key-offset pair in the index,
Each SSTable has its own index. It can be sparse: it will not keep each key-offset pair in the index,
but it stores a key every N bytes. We can do this because SSTable files are sorted and read-only. When we need to find a
key, we look up its offset, or the offset of the closest smaller key. After that, we can load that part of the file into memory and find the value for the key.
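
The lookup itself is a binary search over the sparse index for the closest indexed key that is not greater than the requested one; from that offset the file is scanned forward. A sketch (the entry type and the scanning step are simplified, not the actual mdb code):

```go
package sstsketch

import "sort"

// indexEntry is one sparse index record: a key and the byte offset
// where that key starts in the SSTable file.
type indexEntry struct {
	Key    string
	Offset int64
}

// startOffset returns the offset to start scanning from: the entry with the
// largest key that is <= the requested key. ok is false when even the first
// indexed key is greater than the requested one, so the key cannot be in this table.
func startOffset(index []indexEntry, key string) (offset int64, ok bool) {
	// index is sorted by Key, because SSTable files are sorted and read-only.
	i := sort.Search(len(index), func(i int) bool { return index[i].Key > key })
	if i == 0 {
		return 0, false
	}
	return index[i-1].Offset, true
}
```

From that offset, entries are read one by one until the key is found or a larger key shows up, which also bounds how much of the file has to be loaded.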

@@ -164,18 +157,17 @@

#### Flush

When memtable becomes bigger than some threshold, core component puts it to the flush queue and initializes a new memtable.
Flusher is a background process which checks the queue and dumps memtables as sstables to a disk.
When the memtable becomes bigger than some threshold, the core component adds it to the flush queue and initializes a new memtable.
The flusher is a background process that checks the queue and dumps memtables as SSTables to disk.
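
A minimal version of that hand-off could look like this (a sketch only: the real memtable is not a plain map, and the queue and flush wiring in mdb may differ):

```go
package flushsketch

// memtable is a simplified stand-in for mdb's in-memory table.
type memtable struct {
	data map[string]string
	size int64
}

func newMemtable() *memtable {
	return &memtable{data: map[string]string{}}
}

// maybeRotate is called after a write: once the active memtable grows past
// maxSize, it is pushed to the flush queue and a fresh memtable takes its place.
func maybeRotate(active *memtable, maxSize int64, queue chan<- *memtable) *memtable {
	if active.size < maxSize {
		return active
	}
	queue <- active
	return newMemtable()
}

// flusher runs as a background goroutine and dumps queued memtables to disk;
// dump stands in for "write this memtable out as a sorted SSTable file".
func flusher(queue <-chan *memtable, dump func(*memtable)) {
	for mt := range queue {
		dump(mt)
	}
}
```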

#### Compaction

It's a periodical background process.
It merges small SSTable files into a bigger one and removes old key-value pairs which can be removed.
It's a periodical background process that merges small SSTable files into a larger one and removes old key-value pairs that can be removed.
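
The merge logic itself is in `pkg/lsmt/compaction.go` (part of this commit, shown below); the background part is essentially a ticker loop, something like this (the interval and stop-channel wiring are assumptions):

```go
package compactsketch

import "time"

// runCompaction periodically tries to merge small SSTables into a bigger one.
// compactOnce stands in for a call to the compact() function.
func runCompaction(interval time.Duration, stop <-chan struct{}, compactOnce func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			compactOnce()
		case <-stop:
			return
		}
	}
}
```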

#### SSTables storage

It's a disk storage. On start-up time `mdb` checks this folder and registers all files and builds indexes.
Files are read-only, `mdb` never change them. It can only merge them into a big one file, but without modifying old files.
It's the on-disk storage. During start-up, `mdb` checks this folder, registers all the files, and builds indexes.
Files are read-only; `mdb` never changes them. It can only merge them into a larger file, without modifying the old files.
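
Start-up is essentially "list the directory, then build an index for each file". A sketch with the standard library (the `.sstable` extension and the registration step are assumptions; mdb has its own `listSSTables` helper):

```go
package startupsketch

import (
	"os"
	"path/filepath"
	"strings"
)

// listSSTableFiles returns the paths of all SSTable files in the storage directory.
func listSSTableFiles(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var files []string
	for _, e := range entries {
		if !e.IsDir() && strings.HasSuffix(e.Name(), ".sstable") {
			files = append(files, filepath.Join(dir, e.Name()))
		}
	}
	return files, nil
}
```

Each returned file would then be read once with the configured read buffer size to build its sparse index.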

#### File format

@@ -193,13 +185,13 @@ entry_type:
##### Configuration

```none
CompactionEnabled bool // you can disable background compaction process
MinimumFilesToCompact int // how many files does it need to start compaction process
CompactionEnabled bool // Enable/disable the background compaction process
MinimumFilesToCompact int // How many files are needed to start the compaction process
MaxMemtableSize int64 // max size for memtable
MaxCompactFileSize int64 // do not compact files bigger than this size
SSTableReadBufferSize int // read buffer size: database will build indexes each
// <SSTableReadBufferSize> bytes. If you want to have non-sparse index
// put 1 here
MaxCompactFileSize int64 // Do not compact files bigger than this size
SSTableReadBufferSize int // Read buffer size: the database will build indexes every
// <SSTableReadBufferSize> bytes. If you want to have a non-sparse index
// put 1 here
```
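
Put together, configuring the storage might look roughly like this (a guess based on the field list above; the import path, field layout, and sensible defaults may differ, see `db/main.go` for the real usage):

```go
package main

import "github.com/alexander-akhmetov/mdb/pkg/lsmt"

func main() {
	storage := &lsmt.Storage{
		CompactionEnabled:     true,
		MinimumFilesToCompact: 4,
		MaxMemtableSize:       1 << 20,  // flush memtables bigger than 1 MiB
		MaxCompactFileSize:    64 << 20, // leave files bigger than 64 MiB alone
		SSTableReadBufferSize: 4096,     // one index entry per 4 KiB; 1 = non-sparse
	}
	storage.Start()

	storage.Set("key", "value")
	if value, ok := storage.Get("key"); ok {
		println(value)
	}
}
```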

#### Performance test mode
24 changes: 12 additions & 12 deletions cmd/perfomance.go
@@ -8,26 +8,26 @@ import (
"github.com/alexander-akhmetov/mdb/pkg"
)

// now it's better to start performance test with external output filtering:
// It's better to start the performance test with external output filtering:
//
// go run db/*.go -p 2>&1 | grep -v 'DEBUG'
//
// Otherwise it will print a lot of additional log informarion: per each inserted key.
// Later I will add log filtering to the performance test.
// Otherwise, it will print a lot of additional log information. For example, a log line for each inserted key.
// TODO: Add log filtering to the performance test.

// with this counter we will calculate how many
// inserts were made for the previous second
// With this counter, we will calculate how many
// inserts were made in the previous second.
var counter = 0

// This is an infinite loop which just writes random keys to the storage
// and every second it prints output: how many keys were inserted for the previous second,
// for example:
// This is an infinite loop that writes random keys to the storage
// and prints the output every second: how many keys were inserted.
// For example:
//
// 2018/08/17 07:21:39.010602 Inserted: 13141
// 2018/08/17 07:21:40.010651 Inserted: 13169
//
// It doesn't check are the inserted values valid or not.
// It just inserts key as fast as it can, nothing else.
// It doesn't check whether the inserted values are valid.
// It simply inserts keys as fast as possible, nothing more.
func performanceTest(db mdb.Storage, maxKeys int, checkKeys bool) {
go printStatsEverySecond()

@@ -57,8 +57,8 @@ func performanceTest(db mdb.Storage, maxKeys int, checkKeys bool) {

}

// printStatsEverySecond prints counter value and sets it to 0,
// then it sleeps for a second and does the same, again and again :)
// printStatsEverySecond prints the counter value, resets it,
// sleeps for a second, and repeats the process.
func printStatsEverySecond() {
for true == true {
log.Printf("%sInserted: %v%s", colorGreen, counter, colorNeutral)
1 change: 0 additions & 1 deletion pkg/base.go
@@ -1,5 +1,4 @@
// Package mdb is a simple key-value database
// license that can be found in the LICENSE file.
package mdb

import (
8 changes: 4 additions & 4 deletions pkg/file/file.go
@@ -10,20 +10,20 @@ import (

var writeMutex = &sync.Mutex{}

// Storage holds all information in a file
// Storage holds all information in a file.
type Storage struct {
Filename string
}

// Set saves given key and value
// Set saves the given key and value.
func (s *Storage) Set(key string, value string) {
writeMutex.Lock()
defer writeMutex.Unlock()
strToAppend := fmt.Sprintf("%s;%s\n", key, value)
utils.AppendToFile(s.Filename, strToAppend)
}

// Get returns a value by given key and boolean indicator that key exists
// Get returns a value for a given key and a boolean indicator of whether the key exists.
func (s *Storage) Get(key string) (string, bool) {
line, found := utils.FindLineByKeyInFile(s.Filename, key)
if found {
@@ -33,7 +33,7 @@ func (s *Storage) Get(key string) (string, bool) {
return "", false
}

// Start initializes Storage and creates file if needed
// Start initializes Storage and creates a file if needed.
func (s *Storage) Start() {
log.Println("[INFO] Starting file storage")
utils.StartFileDB()
2 changes: 1 addition & 1 deletion pkg/file/file_test.go
@@ -31,7 +31,7 @@ func TestFileStorage(t *testing.T) {

assert.Equal(t, expContent, content, "File content wrong")

// now let's read content from this file
// Let's read the content of this file

value, exists := storage.Get(testKey2)
assert.Equal(t, testValue2, value, "Wrong value")
12 changes: 6 additions & 6 deletions pkg/indexed_file/indexed_file.go
@@ -13,13 +13,13 @@ import (

var writeMutex = &sync.Mutex{}

// Storage holds all in a file
// Storage holds data in a file
type Storage struct {
Filename string
index map[string]int64
}

// Set saves given key and value
// Set saves the given key and value.
func (s *Storage) Set(key string, value string) {
writeMutex.Lock()
defer writeMutex.Unlock()
@@ -30,7 +30,7 @@ func (s *Storage) Set(key string, value string) {
utils.AppendToFile(s.Filename, strToAppend)
}

// Get returns a value by given key and boolean indicator that key exists
// Get returns a value for a given key and a boolean indicator of whether the key exists.
func (s *Storage) Get(key string) (string, bool) {
var line string
if offset, ok := s.index[key]; ok {
@@ -42,7 +42,7 @@ func (s *Storage) Get(key string) (string, bool) {
return "", false
}

// Start initializes Storage, creates file if needed and rebuilds index
// Start initializes the Storage, creates the file if needed and rebuilds the index.
func (s *Storage) Start() {
log.Println("[INFO] Starting indexed file storage")
utils.StartFileDB()
@@ -52,8 +52,8 @@ func (s *Storage) Start() {
log.Println("[DEBUG] Storage: started")
}

// rebuildIndex reads all file and builds initial index
// it will be very slow for large files
// rebuildIndex reads the file and builds an initial index.
// It is slow for large files.
func (s *Storage) rebuildIndex() {
s.index = map[string]int64{}

6 changes: 3 additions & 3 deletions pkg/indexed_file/indexed_file_test.go
@@ -31,7 +31,7 @@ func TestIndexedFileStorage(t *testing.T) {

assert.Equal(t, expContent, content, "File content wrong")

// now let's read content from this file
// Let's read the content of this file

value, exists := storage.Get(testKey)
assert.Equal(t, testValue, value, "Wrong value")
@@ -64,12 +64,12 @@ func TestIndexedFileStorageIndexBuild(t *testing.T) {
storage.Set(testKey, testValue)
storage.Set(testKey2, testValue2)

// clean index and check it
// clean the index and check it
storage.index = map[string]int64{}
assert.Equal(t, int64(0), storage.index[testKey], "index must be empty")
assert.Equal(t, int64(0), storage.index[testKey2], "index must be empty")

// now let's build index again
// build the index again
storage.Stop()
storage.Start()
assert.Equal(t, int64(0), storage.index[testKey], "wrong index offset")
5 changes: 2 additions & 3 deletions pkg/lsmt/binfile.go
@@ -10,13 +10,12 @@ import (

const filePermissions = 0600

// binScanner scans binary file and splits data into
// entry.DBEntry automatically
// binScanner scans a binary file and automatically splits data into entry.DBEntry.
type binScanner struct {
*bufio.Scanner
}

// appendBinaryToFile writes key-value in binary format
// appendBinaryToFile writes key-value pairs in binary format.
func appendBinaryToFile(filename string, entry *entry.DBEntry) {
// todo: move to entry
file, err := os.OpenFile(filename, os.O_APPEND|os.O_WRONLY, filePermissions)
10 changes: 5 additions & 5 deletions pkg/lsmt/binfile_test.go
@@ -12,7 +12,7 @@ import (

func TestAppendBinaryToFile(t *testing.T) {
// test appendBinaryToFile
// file must exists
// file must exist
testutils.SetUp()
defer testutils.Teardown()

@@ -30,11 +30,11 @@
Value: value,
})

// check that keys are added
expKeysMap := [][2]string{[2]string{key, value}}
// check that the keys are added
expKeysMap := [][2]string{{key, value}}
testutils.AssertKeysInFile(t, filename, expKeysMap)

// check binary content
// check the binary content
bytes := testutils.ReadFileBinary(filename)
expBytes := []byte{0x0, 0x0, 0x0, 0x0, 0x8, 0x0, 0x0, 0x0, 0xa, 0x74, 0x65, 0x73, 0x74, 0x2d, 0x6b, 0x65, 0x79, 0x74, 0x65, 0x73, 0x74, 0x2d, 0x76, 0x61, 0x6c, 0x75, 0x65}
assert.Equal(t, expBytes, bytes)
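
The expected byte slice above suggests the on-disk entry layout: a 1-byte entry type, a 4-byte big-endian key length, a 4-byte big-endian value length, then the raw key and value. A decoding sketch under that assumption (not the actual mdb code):

```go
package binsketch

import (
	"encoding/binary"
	"errors"
)

// decodeEntry parses one entry assuming the layout
// [1-byte type][4-byte key length][4-byte value length][key][value],
// with both lengths big-endian.
func decodeEntry(b []byte) (entryType byte, key, value string, err error) {
	if len(b) < 9 {
		return 0, "", "", errors.New("entry header too short")
	}
	entryType = b[0]
	keyLen := binary.BigEndian.Uint32(b[1:5])
	valueLen := binary.BigEndian.Uint32(b[5:9])
	if uint64(len(b)) < 9+uint64(keyLen)+uint64(valueLen) {
		return 0, "", "", errors.New("entry body too short")
	}
	key = string(b[9 : 9+keyLen])
	value = string(b[9+keyLen : 9+keyLen+valueLen])
	return entryType, key, value, nil
}
```
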
@@ -68,7 +68,7 @@ func TestNewBinFileScanner(t *testing.T) {
readBufferSize := 1024
scanner := newBinFileScanner(f, readBufferSize)

// we have only one key-value in the file
// we have only one key-value pair in the file
e, err := scanner.ReadEntry()
assert.Nil(t, err)
assert.Equal(t, &entry.DBEntry{Key: key, Value: value}, e)
18 changes: 9 additions & 9 deletions pkg/lsmt/compaction.go
@@ -13,9 +13,9 @@ var compactionMutex = &sync.Mutex{}

const ssTableReadBufferSize = 4096

// compact finds N SSTables in the workDir,
// which are can be merged together (they are must be smaller some limit)
// and merges them into a one bigger SSTable, then it removes old files
// compact finds N SSTables in the workDir
// that can be merged together (they must be smaller than some limit)
// and merges them into one bigger SSTable. Then it removes the old files.
func compact(workDir string, tmpDir string, minimumFilesToCompact int, maxCompactFileSize int64) (string, string, string, bool) {
compactionMutex.Lock()
defer compactionMutex.Unlock()
@@ -35,7 +35,7 @@
return fFile, sFile, tmpFilePath, true
}

// merge merges files into a one
// merge merges files into one.
func merge(fFile string, sFile string, mergeTo string) {
log.Printf("[DEBUG] Merging %s + %s => %s", fFile, sFile, mergeTo)

@@ -58,7 +58,7 @@ func merge(fFile string, sFile string, mergeTo string) {
sEntry, _ := secondScanner.ReadEntry()

for true == true {
// compare files line by line and add to the new file only last keys
// Compare files line by line and add only the latest keys to the new file.
for (sEntry.Key > fEntry.Key && fEntry.Key != "") || (fEntry.Key != "" && sEntry.Key == "") {
appendBinaryToFile(mergeTo, fEntry)
fEntry, _ = firstScanner.ReadEntry()
@@ -67,8 +67,8 @@ func merge(fFile string, sFile string, mergeTo string) {
for (sEntry.Key <= fEntry.Key && sEntry.Key != "") || (fEntry.Key == "" && sEntry.Key != "") {
appendBinaryToFile(mergeTo, sEntry)
for sEntry.Key == fEntry.Key {
// if keys are equal, we need to read next first key too,
// otherwise we will save it again in this loop
// If keys are equal, we need to read the next first key too,
// otherwise we will save it again in this loop.
fEntry, _ = firstScanner.ReadEntry()
}
sEntry, _ = secondScanner.ReadEntry()
@@ -79,8 +79,8 @@ func merge(fFile string, sFile string, mergeTo string) {
}
}

// getTwoFilesToCompact returns paths to to files which we can merge
// and boolean third argument which indicates can we merge files or not
// getTwoFilesToCompact returns paths to two files that we can merge
// and a boolean indicating whether we can merge the files or not.
func getTwoFilesToCompact(dir string, minimumFilesToCompact int, maxFileSize int64) (string, string, bool) {
allFiles := listSSTables(dir)

