IntersectMBO/lsm-tree


lsm-tree


⚠️ This library is in active development: there is currently no release schedule!

This package is developed by Well-Typed LLP on behalf of Input Output Global, Inc. (IOG) and INTERSECT. The main contributors are Duncan Coutts, Joris Dral, Matthias Heinzel, Wolfgang Jeltsch, Wen Kokke, and Alex Washburn.

Description

This package contains an efficient implementation of on-disk key–value storage, implemented as a log-structured merge-tree or LSM-tree. An LSM-tree is a data structure for key–value mappings, similar to Data.Map, but optimized for large tables with a high insertion volume. It has support for:

  • Basic key–value operations, such as lookup, insert, and delete.

  • Range lookups, which efficiently retrieve the values for all keys in a given range.

  • Monoidal upserts (or "mupserts") which combine the stored and new values.

  • BLOB storage which associates a large auxiliary BLOB with a key.

  • Durable on-disk persistence and rollback via named snapshots.

  • Cheap table duplication where all duplicates can be independently accessed and modified.

  • High-performance lookups on SSDs using I/O batching and parallelism.

This package exports two modules:

  • Database.LSMTree.Simple

    This module exports a simplified API which picks sensible defaults for a number of configuration parameters.

    It does not support mupserts or BLOBs, due to their unintuitive interaction; see Mupserts and BLOBs.

    If you are looking at this package for the first time, it is strongly recommended that you start by reading this module.

  • Database.LSMTree

    This module exports the full API.

Mupserts and BLOBs

The interaction between mupserts and BLOBs is unintuitive. A mupsert updates the value associated with the key by combining the old and new value with a user-specified function. However, this does not apply to any BLOB value associated with the key, which is simply overwritten by the new BLOB value.
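
The behaviour described above can be illustrated with a small in-memory model. This is only a sketch built on Data.Map from the containers package, not this package's API: the Entry type, the mupsertModel function, the use of String for values and BLOBs, and the choice to pass the new value as the first argument of the combining function are all assumptions made for illustration.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical model of one table entry: a value and an optional BLOB.
-- Strings stand in for serialised values and BLOBs.
data Entry = Entry
  { entryValue :: String
  , entryBlob  :: Maybe String
  } deriving (Eq, Show)

-- Model of a mupsert. The values are combined (here with (<>), new value
-- first, which is an assumption), whereas the BLOB is taken from the new
-- entry and the old BLOB is discarded without being combined.
mupsertModel :: Ord k => k -> Entry -> Map.Map k Entry -> Map.Map k Entry
mupsertModel = Map.insertWith combine
  where
    -- Map.insertWith passes the new entry first and the stored entry second.
    combine new old =
      Entry (entryValue new <> entryValue old) (entryBlob new)
```

In this model, a mupsert of Entry "new" (Just "blobB") onto a stored Entry "old" (Just "blobA") yields the combined value "newold" but only the BLOB "blobB".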

Portability

  • This package only supports 64-bit, little-endian systems.

  • On Windows, the package has only been tested with NTFS filesystems.

  • On Linux, executables using this package, including test and benchmark suites, must be compiled with the -threaded RTS option enabled.

Concurrency

LSM-trees can be used concurrently, but with a few restrictions:

  • Each session locks its session directory. This means that a database cannot be accessed from different processes at the same time.

  • Tables can be used concurrently, and concurrent use of read operations such as lookups is deterministic. However, concurrent use of write operations such as insert or delete with any other operation results in a race condition.
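
One way an application might avoid the race condition within a single process is to funnel every operation on a table through a lock. The following is a minimal sketch using MVar from base. The Guarded type and the newGuarded and withGuarded functions are hypothetical names and are not part of this package; the handle type t stands in for whatever table handle the application holds.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, withMVar)

-- Hypothetical wrapper: any handle paired with a lock.
data Guarded t = Guarded (MVar ()) t

-- Wrap a handle together with a fresh, unlocked lock.
newGuarded :: t -> IO (Guarded t)
newGuarded t = do
  lock <- newMVar ()
  pure (Guarded lock t)

-- Run an action with exclusive access to the handle. The lock is released
-- when the action finishes, including when it throws an exception.
withGuarded :: Guarded t -> (t -> IO a) -> IO a
withGuarded (Guarded lock t) action = withMVar lock (\_ -> action t)
```

This sketch also serialises reads, which is stricter than the restrictions above require, since concurrent reads are safe on their own. A reader–writer lock would allow more parallelism at the cost of a more involved implementation.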

Performance

The worst-case time and space complexities are given in big-O notation. The time cost of operations on LSM-trees is generally dominated by the number of disk I/O actions. As such, the worst-case complexity of basic operations refers to the number of disk I/O actions.

TODO: Describe the time complexity of the basic operations.

The in-memory size of an LSM-tree is described in terms of the variable n, which refers to the number of physical database entries. A physical database entry is any key–operation pair, e.g., Insert k v or Delete k, whereas a logical database entry is determined by all physical entries with the same key.
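
The distinction between physical and logical entries can be modelled in a few lines. This is an illustrative sketch only: the Op and Physical types and the functions below are hypothetical and do not reflect the package's internal representation. The key is stored alongside the operation rather than inside it, and physical entries are assumed to be listed oldest first.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical model of the operation in a key-operation pair.
data Op v = Insert v | Delete
  deriving (Eq, Show)

-- Physical entries, listed oldest first (an assumption of this model).
type Physical k v = [(k, Op v)]

-- The variable n counts physical entries, not distinct keys.
physicalCount :: Physical k v -> Int
physicalCount = length

-- Each logical entry is determined by all physical entries with the same
-- key: later operations override earlier ones, and a Delete removes the key.
logicalView :: Ord k => Physical k v -> Map.Map k v
logicalView = foldl apply Map.empty
  where
    apply m (k, Insert v) = Map.insert k v m
    apply m (k, Delete)   = Map.delete k m
```

For example, two inserts and one delete on the same key count as three physical entries, but contribute at most one logical entry.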

The worst-case in-memory size of an LSM-tree is O(n).

  • The worst-case size of the write buffer is O(1).

    The maximum size of the write buffer depends on the write buffer allocation strategy, which is determined by the confWriteBufferAlloc field of TableConfig. Regardless of the write buffer allocation strategy, the size of the write buffer may never exceed 4GiB.

    AllocNumEntries maxEntries
    The maximum size of the write buffer is the maximum number of entries multiplied by the average size of a key–operation pair.

  • The worst-case size of the Bloom filters is O(n).

    The total size of all Bloom filters depends on the Bloom filter allocation strategy, which is determined by the confBloomFilterAlloc field of TableConfig.

    AllocFixed bitsPerPhysicalEntry
    The total size of all Bloom filters is the number of bits per physical entry multiplied by the number of physical entries.

    AllocRequestFPR requestedFPR
    TODO: How does one determine the bloom filter size using AllocRequestFPR?

  • The worst-case size of the indexes is O(n).

    The total size of all indexes depends on the index type, which is determined by the confFencePointerIndex field of TableConfig. The size of the various indexes is described in reference to the size of the database in memory pages.

    OrdinaryIndex
    An ordinary index stores the maximum serialised key for each memory page. The total size of all indexes is proportional to the average size of one serialised key per memory page.

    CompactIndex
    A compact index stores the 64 most significant bits of the minimum serialised key for each memory page, as well as 1 bit per memory page to resolve clashes, 1 bit per memory page to mark overflow pages, and a negligible amount of memory for tie breakers. The total size of all indexes is approximately 66 bits per memory page.
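
The size formulas above for AllocNumEntries, AllocFixed, and CompactIndex can be turned into a rough back-of-the-envelope estimator. This is a sketch under stated assumptions: the function names are hypothetical and not part of the package, the inputs (average entry size, number of memory pages) must be supplied by the caller, and the results are approximations rather than exact memory usage.

```haskell
-- Write buffer under AllocNumEntries: the maximum number of entries
-- multiplied by the average size in bytes of a key-operation pair.
writeBufferBytes :: Integer -> Integer -> Integer
writeBufferBytes maxEntries avgEntryBytes = maxEntries * avgEntryBytes

-- Upper bound on the write buffer size stated above: 4GiB, in bytes.
writeBufferCapBytes :: Integer
writeBufferCapBytes = 4 * 1024 ^ (3 :: Int)

-- Bloom filters under AllocFixed: the number of bits per physical entry
-- multiplied by the number of physical entries n.
bloomFilterBits :: Integer -> Integer -> Integer
bloomFilterBits bitsPerPhysicalEntry n = bitsPerPhysicalEntry * n

-- Compact index: approximately 66 bits per memory page.
compactIndexBits :: Integer -> Integer
compactIndexBits pages = 66 * pages

-- Convert a number of bits to bytes, rounding up.
bitsToBytes :: Integer -> Integer
bitsToBytes bits = (bits + 7) `div` 8
```

With purely illustrative numbers, one million physical entries at 10 bits per entry give bitsToBytes (bloomFilterBits 10 1000000), which is 1250000 bytes of Bloom filters, and 1000 memory pages give bitsToBytes (compactIndexBits 1000), which is 8250 bytes of compact index.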

The total size of an LSM-tree must not exceed 2^41 physical entries. Violation of this condition is checked and will throw a TableTooLargeError.

Implementation

The implementation of LSM-trees in this package draws inspiration from:
