|
1 | 1 | cabal-version: 3.4
|
2 | 2 | name: lsm-tree
|
3 | 3 | version: 0.1.0.0
|
4 |
| -synopsis: Log-structured merge-tree |
5 |
| -description: Log-structured merge-tree. |
| 4 | +synopsis: Log-structured merge-trees |
| 5 | +description: |
| 6 | + This package contains an efficient implementation of on-disk key–value storage, implemented as a log-structured merge-tree or LSM-tree. |
| 7 | + An LSM-tree is a data structure for key–value mappings, similar to "Data.Map", but optimized for large tables with a high insertion volume. |
| 8 | + It has support for: |
| 9 | + |
| 10 | + * Basic key–value operations, such as lookup, insert, and delete. |
| 11 | + * Range lookups, which efficiently retrieve the values for all keys in a given range. |
| 12 | + * Monoidal upserts (or \"mupserts\"), which combine the stored and new values. |
| 13 | + * BLOB storage, which associates a large auxiliary BLOB with a key. |
| 14 | + * Durable on-disk persistence and rollback via named snapshots. |
| 15 | + * Cheap table duplication where all duplicates can be independently accessed and modified. |
| 16 | + * High-performance lookups on SSDs using I\/O batching and parallelism. |
| 17 | + |
| 18 | + This package exports two modules: |
| 19 | + |
| 20 | + * "Database.LSMTree.Simple" |
| 21 | + |
| 22 | + This module exports a simplified API which picks sensible defaults for a number of configuration parameters. |
| 23 | + |
| 24 | + It does not support mupserts or BLOBs because of their unintuitive interaction; see [Mupserts and BLOBs](#mupsertsandblobs). |
| 25 | + |
| 26 | + If you are looking at this package for the first time, it is strongly recommended that you start by reading this module. |
| 27 | + |
| 28 | + * "Database.LSMTree" |
| 29 | + |
| 30 | + This module exports the full API. |
| 31 | + |
| 32 | + == Mupserts and BLOBs #mupsertsandblobs# |
| 33 | + |
| 34 | + The interaction between mupserts and BLOBs is unintuitive. |
| 35 | + A mupsert updates the value associated with the key by combining the old and new value with a user-specified function. |
| 36 | + However, this does not apply to any BLOB value associated with the key, which is simply overwritten by the new BLOB value. |
| 37 | + |
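The behaviour described in this section can be sketched as a small pure model built on "Data.Map" rather than on the package's own API. The names `Model`, `mupsertModel`, and `example` are hypothetical, and the assumption that every mupsert carries a BLOB is made only for illustration:

```haskell
import qualified Data.Map.Strict as Map

-- | Hypothetical pure model of a table: each key maps to a value and a BLOB.
-- This is not the package's own representation.
type Model k v b = Map.Map k (v, b)

-- | Model of a mupsert that also carries a BLOB (an assumption for illustration).
-- The value is combined with the user-specified function; the BLOB is overwritten.
mupsertModel :: Ord k => (v -> v -> v) -> k -> v -> b -> Model k v b -> Model k v b
mupsertModel combine k newV newB = Map.insertWith merge k (newV, newB)
  where
    -- 'Map.insertWith' passes the new entry first and the old entry second.
    merge (new, blob) (old, _oldBlob) = (combine new old, blob)

-- | The stored value 10 is combined with 5, while "blob-1" is replaced by "blob-2".
example :: Model String Int String
example = mupsertModel (+) "k" 5 "blob-2" (Map.fromList [("k", (10, "blob-1"))])
```

In this model, `Map.toList example` evaluates to `[("k",(15,"blob-2"))]`: the values are summed, but nothing of the old BLOB survives.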
| 38 | + == Portability #portability# |
| 39 | + |
| 40 | + * This package only supports 64-bit, little-endian systems. |
| 41 | + * On Windows, the package has only been tested with NTFS filesystems. |
| 42 | + * On Linux, executables using this package, including test and benchmark suites, must be compiled with the [@-threaded@](https://downloads.haskell.org/ghc/latest/docs/users_guide/phases.html#ghc-flag-threaded) GHC flag, which selects the threaded RTS. |
| 43 | + |
| 44 | + == Concurrency #concurrency# |
| 45 | + |
| 46 | + LSM-trees can be used concurrently, but with a few restrictions: |
| 47 | + |
| 48 | + * Each session locks its session directory. |
| 49 | + This means that a database cannot be accessed from different processes at the same time. |
| 50 | + * Tables can be used concurrently, and concurrent use of read operations such as lookups is deterministic. |
| 51 | + However, concurrent use of write operations such as insert or delete with any other operation results in a race condition. |
| 52 | + |
| 53 | + == Performance #performance# |
| 54 | + |
| 55 | + The worst-case time and space complexities are given in [big-O notation](http://en.wikipedia.org/wiki/Big_O_notation). |
| 56 | + The time cost of operations on LSM-trees is generally dominated by the number of disk I\/O actions. |
| 57 | + As such, the worst-case complexity of basic operations refers to the number of disk I\/O actions. |
| 58 | + |
| 59 | + TODO: Describe the time complexity of the basic operations. |
| 60 | + |
| 61 | + The in-memory size of an LSM-tree is described in terms of the variable \(n\), which refers to the number of /physical/ database entries. |
| 62 | + A /physical/ database entry is any key–operation pair, e.g., @Insert k v@ or @Delete k@, whereas a /logical/ database entry is determined by all physical entries with the same key. |
| 63 | + |
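The distinction between physical and logical entries can be illustrated with a small sketch. The `Op` type and the function names below are hypothetical stand-ins, not the package's own definitions, and the number of distinct keys is used only as an upper bound on the number of logical entries:

```haskell
import qualified Data.Set as Set

-- | Hypothetical stand-in for a key–operation pair; not the package's own type.
data Op k v = Insert k v | Delete k

-- | The key that an operation applies to.
opKey :: Op k v -> k
opKey (Insert k _) = k
opKey (Delete k)   = k

-- | Number of physical entries: every key–operation pair counts.
physicalCount :: [Op k v] -> Int
physicalCount = length

-- | Number of distinct keys, which bounds the number of logical entries,
-- since all physical entries with the same key determine one logical entry.
distinctKeys :: Ord k => [Op k v] -> Int
distinctKeys = Set.size . Set.fromList . map opKey

-- | Four physical entries spread over two keys.
history :: [Op String Int]
history = [Insert "a" 1, Insert "b" 2, Delete "a", Insert "a" 3]
```

Here `physicalCount history` is 4 while `distinctKeys history` is 2, so \(n\) in the size bounds counts all four pairs.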
| 64 | + The worst-case in-memory size of an LSM-tree is \(O(n)\). |
| 65 | + |
| 66 | + * The worst-case size of the write buffer is \(O(1)\). |
| 67 | + |
| 68 | + The maximum size of the write buffer depends on the write buffer allocation strategy, which is determined by the @'confWriteBufferAlloc'@ field of @'TableConfig'@. |
| 69 | + Regardless of write buffer allocation strategy, the size of the write buffer may never exceed 4 GiB. |
| 70 | + |
| 71 | + [@AllocNumEntries maxEntries@]: |
| 72 | + The maximum size of the write buffer is the maximum number of entries multiplied by the average size of a key–operation pair. |
| 73 | + |
| 74 | + * The worst-case size of the Bloom filters is \(O(n)\). |
| 75 | + |
| 76 | + The total size of all Bloom filters depends on the Bloom filter allocation strategy, which is determined by the @'confBloomFilterAlloc'@ field of @'TableConfig'@. |
| 77 | + |
| 78 | + [@AllocFixed bitsPerPhysicalEntry@]: |
| 79 | + The total size of all Bloom filters is the number of bits per physical entry multiplied by the number of physical entries. |
| 80 | + [@AllocRequestFPR requestedFPR@]: |
| 81 | + TODO: How does one determine the bloom filter size using @AllocRequestFPR@? |
| 82 | + |
| 83 | + * The worst-case size of the indexes is \(O(n)\). |
| 84 | + |
| 85 | + The total size of all indexes depends on the index type, which is determined by the @'confFencePointerIndex'@ field of @'TableConfig'@. |
| 86 | + The size of the various indexes is described in reference to the size of the database in [/memory pages/](https://en.wikipedia.org/wiki/Page_%28computer_memory%29). |
| 87 | + |
| 88 | + [@OrdinaryIndex@]: |
| 89 | + An ordinary index stores the maximum serialised key for each memory page. |
| 90 | + The total size of all indexes is proportional to the average size of one serialised key per memory page. |
| 91 | + [@CompactIndex@]: |
| 92 | + A compact index stores the 64 most significant bits of the minimum serialised key for each memory page, as well as 1 bit per memory page to resolve clashes, 1 bit per memory page to mark overflow pages, and a negligible amount of memory for tie breakers. |
| 93 | + The total size of all indexes is approximately 66 bits per memory page. |
| 94 | + |
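The size formulas for @AllocFixed@ Bloom filters and compact indexes can be turned into a rough estimator. This is a sketch under stated assumptions: the function names are hypothetical, the bits per entry are restricted to whole numbers for simplicity, and the tie-breaker memory of the compact index is ignored:

```haskell
-- | Total Bloom filter size in bits for a fixed number of bits per physical entry.
bloomFilterBits :: Integer -> Integer -> Integer
bloomFilterBits bitsPerPhysicalEntry physicalEntries =
  bitsPerPhysicalEntry * physicalEntries

-- | Approximate compact index size in bits, at roughly 66 bits per memory page
-- (64 key bits, 1 clash bit, and 1 overflow bit; tie breakers are ignored).
compactIndexBits :: Integer -> Integer
compactIndexBits memoryPages = 66 * memoryPages

-- | Convert a size in bits to bytes, rounding up.
bitsToBytes :: Integer -> Integer
bitsToBytes bits = (bits + 7) `div` 8
```

With invented inputs, `bitsToBytes (bloomFilterBits 10 100000000)` evaluates to 125000000 bytes for 100 million physical entries at 10 bits each, and `bitsToBytes (compactIndexBits 1000000)` evaluates to 8250000 bytes for one million memory pages.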
| 95 | + The total size of an LSM-tree must not exceed \(2^{41}\) physical entries. |
| 96 | + Violation of this condition /is/ checked and will throw a 'TableTooLargeError'. |
| 97 | + |
| 98 | + == Implementation |
| 99 | + |
| 100 | + The implementation of LSM-trees in this package draws inspiration from: |
| 101 | + |
| 102 | + * Chris Okasaki. |
| 103 | + 1998. |
| 104 | + \"Purely Functional Data Structures\" |
| 105 | + [doi:10.1017/CBO9780511530104](https://doi.org/10.1017/CBO9780511530104) |
| 106 | + * Niv Dayan, Manos Athanassoulis, and Stratos Idreos. |
| 107 | + 2017. |
| 108 | + \"Monkey: Optimal Navigable Key-Value Store.\" |
| 109 | + [doi:10.1145/3035918.3064054](https://doi.org/10.1145/3035918.3064054) |
| 110 | + * Subhadeep Sarkar, Dimitris Staratzis, Zichen Zhu, and Manos Athanassoulis. |
| 111 | + 2021. |
| 112 | + \"Constructing and analyzing the LSM compaction design space.\" |
| 113 | + [doi:10.14778/3476249.3476274](https://doi.org/10.14778/3476249.3476274) |
| 114 | + |
6 | 115 | license: Apache-2.0
|
7 | 116 | license-file: LICENSE
|
8 | 117 | author:
|
|