
common

Classes

Counter

Memory-efficient data structure that keeps track of how many times it has seen each symbol pushed to it.

Internally, it keeps:

an array with all the symbols it holds in the order they've been pushed to the data structure.
an associative array with the number of times each of these symbols appeared

The structure takes an argument, n, with the maximum number of symbols to keep track of. When a new symbol is pushed and the structure already holds n symbols, it starts evicting symbols in FIFO order, deleting each evicted symbol from the associative array if its cumulative count is one.

This greatly reduces the memory footprint of the data structure and allows it to work on datasets much larger than the allowed resident set size of the program.
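The eviction rule described above can be sketched roughly as follows. This is a hypothetical re-implementation for illustration only, not the code in common.js: the name createCounterSketch is invented, and the sketch assumes the internal array holds each distinct symbol once, in first-seen order.

```javascript
// Hypothetical sketch of the eviction behaviour described above.
// `createCounterSketch` is an illustrative name, not part of common.js.
function createCounterSketch({ n }) {
  const order = [];              // distinct symbols, oldest first (assumption)
  const o = Object.create(null); // symbol -> number of times seen

  function push(symbol) {
    if (symbol in o) {
      o[symbol] += 1;
      return;
    }
    if (order.length >= n) {
      // At the limit: evict the oldest tracked symbol (FIFO).
      const oldest = order.shift();
      // Only symbols seen exactly once lose their count.
      if (o[oldest] === 1) delete o[oldest];
    }
    order.push(symbol);
    o[symbol] = 1;
  }

  return { o, push };
}

const sketch = createCounterSketch({ n: 2 });
for (const s of ['a', 'a', 'b', 'c', 'd']) sketch.push(s);
console.log(sketch.o); // 'a' keeps its count of 2; 'b' was evicted
```

Symbols seen more than once keep their count after eviction, which is what makes the "seen only once" calculation possible later on.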

Usage:

const counter = Counter({ n:1024*1024 });

counter.push("[SYMBOL]");

counter.o["[SYMBOL]"]; // number of times SYMBOL appears in the input set

Calculating how many symbols appear only once in the input set is possible, even after they were deleted:

1. Retrieve the set of symbols that were seen more than once.

Object.entries(counter.o).filter(v => v[1] > 1);

2. Subtract that set from the full set of symbols in the input set. The remainder is the symbols that appear only once.

Object.keys("[INPUT SET]").filter(k => "[NOT IN (1)]");
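As a concrete version of the two steps above, the following sketch may help. It assumes the input set is available as a plain object keyed by symbol (as the Object.keys call suggests) and that counts has the same shape as counter.o. The helper name symbolsSeenOnce is hypothetical.

```javascript
// Hypothetical helper; `inputSet` is assumed to be a plain object keyed by symbol.
function symbolsSeenOnce(inputSet, counts) {
  // Step 1: symbols seen more than once are still present in the counts.
  const repeated = new Set(
    Object.entries(counts)
      .filter(([, count]) => count > 1)
      .map(([symbol]) => symbol)
  );
  // Step 2: everything in the input set that is not repeated appeared once.
  return Object.keys(inputSet).filter(symbol => !repeated.has(symbol));
}

const exampleCounts = { a: 2, c: 1, d: 1 }; // e.g. the contents of counter.o
const exampleInput = { a: true, b: true, c: true, d: true };
console.log(symbolsSeenOnce(exampleInput, exampleCounts)); // [ 'b', 'c', 'd' ]
```

Note that 'b' is reported correctly even though it no longer has an entry in the counts.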

Queue

A Queue is a construction that executes a given function with an aggregated invocation parameter list only after:

A threshold has been exceeded (n)
It is explicitly told to do so (flush)

This is useful when, for a given set of inputs, the overhead of calling a function once per element outweighs the work the function actually does. A Queue is a primitive for batching invocations of a single function.

For example, when inserting rows into a database, it is better to aggregate a few thousand rows and write them all in a single call than to write each row separately, since disk/network latency and other overheads consume the vast majority of such a function's execution time.

The construction takes two arguments:

n: number of items to buffer before triggering a flush operation
onFlush: (async) function that is called on each flush operation

And exposes two methods:

(async) flush: calls onFlush with the current aggregated queue and clears it
(async) push: adds an item to the queue, calls flush when n is reached

The onFlush callback's only argument is an array with all the elements that have been pushed to the queue up to the point when flush was called. For example:

// if your push calls look like this ...
await queue.push(1, 2);
await queue.push(3, 4, 5, 6);
await queue.push(7, 8);

// ... when queue.flush() is called ...
await queue.flush();

// ... onFlush will be called with an array like:
// [
//   [1, 2],
//   [3, 4, 5, 6],
//   [7, 8]
// ]

You should always await these methods, since once they resolve there is a guarantee that the queue has been processed properly.
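To make the push/flush contract more tangible, here is a minimal sketch of how such a queue could behave. It is not the implementation in common.js: createQueueSketch and demo are invented names, and the sketch assumes that a flush triggers as soon as n items are buffered and that flushing an empty queue does nothing.

```javascript
// Hypothetical sketch; `createQueueSketch` is an illustrative name, not the code in common.js.
function createQueueSketch({ n, onFlush }) {
  let items = [];

  async function flush() {
    if (items.length === 0) return; // assumption: nothing to do for an empty queue
    const batch = items;
    items = [];                      // clear before awaiting so new pushes start a fresh batch
    await onFlush(batch);
  }

  async function push(...args) {
    items.push(args);                // each call contributes one array of arguments
    if (items.length >= n) await flush();
  }

  return { push, flush };
}

async function demo() {
  const batches = [];
  const q = createQueueSketch({ n: 2, onFlush: async batch => { batches.push(batch); } });
  await q.push(1, 2);
  await q.push(3);    // second item reaches n, so onFlush runs here
  await q.push(4);
  await q.flush();    // flushes the remaining item
  return batches;
}

demo().then(batches => console.log(JSON.stringify(batches))); // [[[1,2],[3]],[[4]]]
```

Because push awaits the flush it triggers, a caller that awaits push never runs ahead of a slow onFlush.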

Usage:

const queue = Queue({
  n:1024,
  onFlush:async q => {
    "[PROCESS q]"
  }
});

while("[CONDITION]")
  await queue.push("[ARGUMENTS]");

// always call flush() at the end
// to make sure all the elements that remain in the queue are processed
await queue.flush();

XMLParser

A streaming XML parser using libexpat as backend. Requires node-expat (npm install node-expat) to work.

The parser operates on node.js Streams and does not allocate memory other than a few transient objects that are needed to populate startElement. This allows it to parse large XML files while keeping an extremely low memory footprint.

The parser takes a single argument, startElement, which is the name of the XML element tags in the document that will be returned to the user.

The parser inherits from EventEmitter. Each time an element matching startElement is found, a node event is emitted with a single object that holds all of the element's data, including its child nodes. The structure of this object is as follows:

attributes of the node are in an associative map stored under the "$" key
all text on this node is concatenated and stored under the "_" key
child nodes are stored in arrays under keys corresponding to their tag name
multiple child nodes with the same tag name are stored in the same array
each of these child nodes is an object that follows the same structure

For example, an XML file like:

<?xml version="1.0" encoding="UTF-8"?>
<root a="one" b="two">
  <first c="three">First</first>
  <second f="four">Second</second>
</root>

... would get turned into an object like:

// root
{
  $:{
    a:'one',
    b:'two'
  },
  first:{
    $:{ c:'three' },
    _:'First'
  },
  second:{
    $:{ f:'four' },
    _:'Second'
  }
}
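A node object of this shape can be traversed recursively. The helper below is a hypothetical sketch (walk is not part of common.js). Since the description mentions arrays of child nodes while the example shows single objects, the helper accepts either form.

```javascript
// Hypothetical helper; `walk` is not part of common.js.
function walk(node, tag, visit) {
  visit(tag, node);
  for (const [key, value] of Object.entries(node)) {
    if (key === '$' || key === '_') continue; // attributes and text, not children
    // [].concat accepts both a single child object and an array of children.
    for (const child of [].concat(value)) walk(child, key, visit);
  }
}

const exampleRoot = {
  $: { a: 'one', b: 'two' },
  first: [{ $: { c: 'three' }, _: 'First' }],
  second: [{ $: { f: 'four' }, _: 'Second' }]
};

const seen = [];
walk(exampleRoot, 'root', (tag, node) => seen.push(`${tag}:${node._ || ''}`));
console.log(seen); // [ 'root:', 'first:First', 'second:Second' ]
```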

Usage:

Instantiate the parser (new XMLParser()).
Pipe data to it as you would do with any other Stream.

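// Node.js built-ins used below
const { createReadStream } = require('fs');
const { pipeline } = require('stream/promises');

try {
  await pipeline([
    createReadStream("[FILENAME]"),
    new XMLParser({ startElement:"[ELEMENT TAG]" })
      .on('node', node => {
        node.$; // attributes
        node._; // text

        // tag names of the child nodes (every key except "$" and "_")
        Object.keys(node).filter(k => k !== '$' && k !== '_');
      })
  ]);
} catch(e) {
  // ...
}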

Functions

See code
