Commit 1a19e33

Merge pull request #654 from IntersectMBO/wenkokke/package-description
doc: add package description
2 parents 68487e4 + b966a47 commit 1a19e33

1 file changed: +111 -2 lines changed

lsm-tree.cabal

Lines changed: 111 additions & 2 deletions
@@ -1,8 +1,117 @@
 cabal-version: 3.4
 name: lsm-tree
 version: 0.1.0.0
-synopsis: Log-structured merge-tree
-description: Log-structured merge-tree.
+synopsis: Log-structured merge-trees
+description:
+This package contains an efficient implementation of on-disk key–value storage, implemented as a log-structured merge-tree or LSM-tree.
+An LSM-tree is a data structure for key–value mappings, similar to "Data.Map", but optimized for large tables with a high insertion volume.
+It has support for:
+
+* Basic key–value operations, such as lookup, insert, and delete.
+* Range lookups, which efficiently retrieve the values for all keys in a given range.
+* Monoidal upserts (or \"mupserts\") which combine the stored and new values.
+* BLOB storage which associates a large auxiliary BLOB with a key.
+* Durable on-disk persistence and rollback via named snapshots.
+* Cheap table duplication where all duplicates can be independently accessed and modified.
+* High-performance lookups on SSDs using I\/O batching and parallelism.
+
+This package exports two modules:
+
+* "Database.LSMTree.Simple"
+
+This module exports a simplified API which picks sensible defaults for a number of configuration parameters.
+
+It does not support mupserts or BLOBs, due to their unintuitive interaction, see [Mupserts and BLOBs](#mupsertsandblobs).
+
+If you are looking at this package for the first time, it is strongly recommended that you start by reading this module.
+
+* "Database.LSMTree"
+
+This module exports the full API.
+
+== Mupserts and BLOBs #mupsertsandblobs#
+
+The interaction between mupserts and BLOBs is unintuitive.
+A mupsert updates the value associated with the key by combining the old and new value with a user-specified function.
+However, this does not apply to any BLOB value associated with the key, which is simply overwritten by the new BLOB value.
+
+== Portability #portability#
+
+* This package only supports 64-bit, little-endian systems.
+* On Windows, the package has only been tested with NTFS filesystems.
+* On Linux, executables using this package, including test and benchmark suites, must be compiled with the [@-threaded@](https://downloads.haskell.org/ghc/latest/docs/users_guide/phases.html#ghc-flag-threaded) RTS option enabled.
+
+== Concurrency #concurrency#
+
+LSM-trees can be used concurrently, but with a few restrictions:
+
+* Each session locks its session directory.
+This means that a database cannot be accessed from different processes at the same time.
+* Tables can be used concurrently, and concurrent use of read operations such as lookups is deterministic.
+However, concurrent use of write operations such as insert or delete with any other operation results in a race condition.
+
+== Performance #performance#
+
+The worst-case time and space complexities are given in [big-O notation](http://en.wikipedia.org/wiki/Big_O_notation).
+The time cost of operations on LSM-trees is generally dominated by the number of disk I\/O actions.
+As such, the worst-case complexity of basic operations refers to the number of disk I\/O actions.
+
+TODO: Describe the time complexity of the basic operations.
+
+The in-memory size of an LSM-tree is described in terms of the variable \(n\), which refers to the number of /physical/ database entries.
+A /physical/ database entry is any key–operation pair, e.g., @Insert k v@ or @Delete k@, whereas a /logical/ database entry is determined by all physical entries with the same key.
+
+The worst-case in-memory size of an LSM-tree is \(O(n)\).
+
+* The worst-case size of the write buffer is \(O(1)\).
+
+The maximum size of the write buffer depends on the write buffer allocation strategy, which is determined by the @'confWriteBufferAlloc'@ field of @'TableConfig'@.
+Regardless of write buffer allocation strategy, the size of the write buffer may never exceed 4GiB.
+
+[@AllocNumEntries maxEntries@]:
+The maximum size of the write buffer is the maximum number of entries multiplied by the average size of a key–operation pair.
+
+* The worst-case size of the Bloom filters is \(O(n)\).
+
+The total size of all Bloom filters depends on the Bloom filter allocation strategy, which is determined by the @'confBloomFilterAlloc'@ field of @'TableConfig'@.
+
+[@AllocFixed bitsPerPhysicalEntry@]:
+The total size of all Bloom filters is the number of bits per physical entry multiplied by the number of physical entries.
+[@AllocRequestFPR requestedFPR@]:
+TODO: How does one determine the bloom filter size using @AllocRequestFPR@?
+
+* The worst-case size of the indexes is \(O(n)\).
+
+The total size of all indexes depends on the index type, which is determined by the @'confFencePointerIndex'@ field of @'TableConfig'@.
+The size of the various indexes is described in reference to the size of the database in [/memory pages/](https://en.wikipedia.org/wiki/Page_%28computer_memory%29).
+
+[@OrdinaryIndex@]:
+An ordinary index stores the maximum serialised key for each memory page.
+The total size of all indexes is proportional to the average size of one serialised key per memory page.
+[@CompactIndex@]:
+A compact index stores the 64 most significant bits of the minimum serialised key for each memory page, as well as 1 bit per memory page to resolve clashes, 1 bit per memory page to mark overflow pages, and a negligible amount of memory for tie breakers.
+The total size of all indexes is approximately 66 bits per memory page.
+
+The total size of an LSM-tree must not exceed \(2^{41}\) physical entries.
+Violation of this condition /is/ checked and will throw a 'TableTooLargeError'.
+
+== Implementation
+
+The implementation of LSM-trees in this package draws inspiration from:
+
+* Chris Okasaki.
+1998.
+\"Purely Functional Data Structures\"
+[doi:10.1017/CBO9780511530104](https://doi.org/10.1017/CBO9780511530104)
+* Niv Dayan, Manos Athanassoulis, and Stratos Idreos.
+2017.
+\"Monkey: Optimal Navigable Key-Value Store.\"
+[doi:10.1145/3035918.3064054](https://doi.org/10.1145/3035918.3064054)
+* Subhadeep Sarkar, Dimitris Staratzis, Zichen Zhu, and Manos Athanassoulis.
+2021.
+\"Constructing and analyzing the LSM compaction design space.\"
+[doi:10.14778/3476249.3476274](https://doi.org/10.14778/3476249.3476274)
+
 license: Apache-2.0
 license-file: LICENSE
 author:
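
The "Mupserts and BLOBs" section of the new description says that a mupsert combines the old and new values with a user-specified function, while the BLOB is simply overwritten. A minimal in-memory sketch of that behaviour, using `Data.Map.Strict` from the `containers` package, may make the asymmetry easier to see. The names `Entry`, `mupsertModel`, and `example` are hypothetical and are not part of the lsm-tree API; the sketch also assumes that a new entry without a BLOB drops the stored BLOB, which the description does not state explicitly.

```haskell
import qualified Data.Map.Strict as Map

-- | Hypothetical model of a table entry: a value plus an optional BLOB.
-- This is an illustration only, not a type exported by lsm-tree.
data Entry v b = Entry { entryValue :: v, entryBlob :: Maybe b }
  deriving (Eq, Show)

-- | Model of a mupsert: the values are combined with the user-supplied
-- function, whereas the BLOB of the new entry replaces the stored one.
mupsertModel
  :: Ord k
  => (v -> v -> v)          -- ^ user-specified combining function
  -> k
  -> Entry v b              -- ^ new entry
  -> Map.Map k (Entry v b)
  -> Map.Map k (Entry v b)
mupsertModel combine = Map.insertWith merge
  where
    -- 'Map.insertWith' passes the new entry first and the stored entry second.
    -- Assumption: the new BLOB wins even when it is 'Nothing'.
    merge new old =
      Entry (combine (entryValue new) (entryValue old)) (entryBlob new)

-- | The values 2 and 40 are summed; the old BLOB is discarded.
example :: Map.Map String (Entry Int String)
example =
  mupsertModel (+) "k" (Entry 2 (Just "new blob"))
    (Map.fromList [("k", Entry 40 (Just "old blob"))])
```

In this model, `Map.lookup "k" example` yields an entry whose value is the sum of both values but whose BLOB is only the new one, which is the interaction the description calls unintuitive.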

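The size bounds in the "Performance" section of the new description are simple products, so a small calculator can make them concrete. The sketch below encodes only the figures stated in the description (bits per physical entry for @AllocFixed@, roughly 66 bits per memory page for @CompactIndex@, the 4GiB write buffer ceiling, and the \(2^{41}\) entry limit). All function names are hypothetical and none of them come from the lsm-tree API; the number of memory pages is taken as an input because the description does not state a page size.

```haskell
-- | Total Bloom filter size in bits under the fixed allocation strategy:
-- bits per physical entry multiplied by the number of physical entries.
bloomFilterBitsFixed :: Integer -> Integer -> Integer
bloomFilterBitsFixed bitsPerPhysicalEntry numPhysicalEntries =
  bitsPerPhysicalEntry * numPhysicalEntries

-- | Approximate compact index size in bits: 64 key bits, 1 clash bit and
-- 1 overflow bit per memory page (tie breakers are ignored here).
compactIndexBits :: Integer -> Integer
compactIndexBits numPages = (64 + 1 + 1) * numPages

-- | Estimated write buffer size in bytes: maximum number of entries
-- multiplied by the average size of a key-operation pair.
writeBufferBytes :: Integer -> Integer -> Integer
writeBufferBytes maxEntries avgEntryBytes = maxEntries * avgEntryBytes

-- | The stated ceiling on the write buffer size: 4GiB.
maxWriteBufferBytes :: Integer
maxWriteBufferBytes = 4 * 2 ^ (30 :: Int)

-- | The stated limit on the number of physical entries: 2^41.
maxPhysicalEntries :: Integer
maxPhysicalEntries = 2 ^ (41 :: Int)

-- | Convert a size in bits to bytes, rounding up.
bitsToBytes :: Integer -> Integer
bitsToBytes bits = (bits + 7) `div` 8
```

For instance, under these assumptions a table with 100,000,000 physical entries and 10 bits per entry needs `bitsToBytes (bloomFilterBitsFixed 10 100000000)` bytes of Bloom filters, about 125 MB, and a compact index over 1,000,000 memory pages needs about 8.25 MB.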