Abrimos-info/opensearch-extract

opensearch-extract

Extract documents or specific fields from an OpenSearch index. Supports extracting all documents, filtering by a list of IDs, or running a custom query. Output is newline-delimited JSON (NDJSON) written to stdout.

Usage

node index.js -u [OpenSearch URL] -s [index]

Parameters

Short  Long             Type      Default                 Description
-u     --elasticUri     String    http://localhost:9200/  OpenSearch cluster URL
-s     --source         String    (required)              Name of the index to extract from
-m     --mode           String    idList                  Extraction mode: idList or query (see Modes)
-i     --idField        String                            Document field to match IDs against (idList mode)
-e     --extractField   String[]                          Field to include in the output; repeatable. Omit to return the whole document
-f     --file           String                            Path to a file containing IDs (one per line) or a query in JSON
-p     --params         String[]                          Positional parameter to substitute into a query file; repeatable
-l     --limit          Number                            Maximum number of documents to return

Modes

There are three ways to select which documents to extract:

1. Extract everything (no --file)

When no file is provided, all documents in the index are returned.

node index.js -s my-index

2. ID list mode (--mode idList, default)

Provide a file with one ID per line and specify which document field to match against with --idField. The tool builds a terms query to fetch all matching documents.

# ids.txt contains one ID per line
node index.js -s my-index -f ids.txt -i document_id
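The README does not show the exact request the tool sends. As a minimal sketch, assuming each non-blank line of the file is one ID and the IDs are placed in a standard OpenSearch `terms` query on the `--idField` field, the search body could be built like this (`build_terms_query` is a hypothetical name, not part of the tool):

```python
import json
from typing import Iterable


def build_terms_query(id_lines: Iterable[str], id_field: str) -> dict:
    """Build a search body matching documents whose id_field equals any listed ID.

    Hypothetical helper: assumes one ID per line and skips blank lines.
    """
    ids = [line.strip() for line in id_lines if line.strip()]
    # A "terms" query matches documents whose field contains any of the given exact values.
    return {"query": {"terms": {id_field: ids}}}


if __name__ == "__main__":
    # Stand-in for reading ids.txt and passing -i document_id.
    lines = ["abc-1\n", "abc-2\n", "\n"]
    print(json.dumps(build_terms_query(lines, "document_id")))
    # prints {"query": {"terms": {"document_id": ["abc-1", "abc-2"]}}}
```

OpenSearch caps the number of values in a single `terms` query through the `index.max_terms_count` index setting (65,536 by default), so a very large ID file may need to be split into batches.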

3. Query mode (--mode query)

Provide a file containing a raw OpenSearch query in JSON format. The file should contain the full query body, e.g.:

{
  "query": {
    "range": {
      "timestamp": { "gte": "2025-01-01" }
    }
  }
}

node index.js -s my-index -m query -f my-query.json

Query files support positional parameter substitution using {$1}, {$2}, etc. Pass the values with -p, in order: the first -p replaces {$1}, the second replaces {$2}, and so on:

{
  "query": {
    "range": {
      "timestamp": { "gte": "{$1}", "lte": "{$2}" }
    }
  }
}

node index.js -s my-index -m query -f my-query.json -p 2025-01-01 -p 2025-12-31
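How the tool performs the substitution is not documented here. A minimal sketch, assuming placeholders are numbered from 1 and replaced as plain text before the file is parsed as JSON (`render_query` is a hypothetical name):

```python
import json
import re
from typing import Sequence

# Matches placeholders like {$1} or {$12} and captures the number.
PLACEHOLDER = re.compile(r"\{\$(\d+)\}")


def render_query(template: str, params: Sequence[str]) -> dict:
    """Replace each {$N} with params[N - 1], then parse the result as JSON.

    Hypothetical helper: assumes 1-based, purely textual substitution.
    """

    def replace(match: "re.Match[str]") -> str:
        index = int(match.group(1)) - 1
        if not 0 <= index < len(params):
            raise ValueError(f"no -p value supplied for {match.group(0)}")
        return params[index]

    return json.loads(PLACEHOLDER.sub(replace, template))


if __name__ == "__main__":
    template = '{"query": {"range": {"timestamp": {"gte": "{$1}", "lte": "{$2}"}}}}'
    print(render_query(template, ["2025-01-01", "2025-12-31"]))
```

Because this sketch substitutes raw text, a value containing a double quote or backslash would make the resulting JSON invalid; plain values such as dates, placed inside JSON strings as in the example query, are safe.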

Selecting fields

By default, the full document _source is returned. Use -e to extract only specific fields. Dot notation is supported for nested fields.

# Extract just the title and author name
node index.js -s my-index -e title -e author.name
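OpenSearch can filter fields on the server through `_source` filtering (for example `"_source": ["title", "author.name"]` in the search body), and the README does not say whether the tool relies on that or filters on the client. The following is a client-side sketch, assuming the output keeps the nested shape of the original document and simply omits missing fields; `extract_fields` and `get_path` are hypothetical names, and unlike `_source` filtering this sketch does not descend into arrays of objects.

```python
import copy
from typing import Any, Iterable

_MISSING = object()  # sentinel meaning "path not present in the document"


def get_path(doc: Any, path: str) -> Any:
    """Follow a dot-separated path such as "author.name" through nested objects."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def extract_fields(source: dict, fields: Iterable[str]) -> dict:
    """Keep only the requested fields, rebuilding their nested structure.

    Hypothetical helper: assumes missing fields are omitted from the output.
    """
    result: dict = {}
    for path in fields:
        value = get_path(source, path)
        if value is _MISSING:
            continue
        *parents, leaf = path.split(".")
        target = result
        for key in parents:
            target = target.setdefault(key, {})
        # Copy so later edits to the output never alias the original document.
        target[leaf] = copy.deepcopy(value)
    return result


if __name__ == "__main__":
    doc = {"title": "Example", "author": {"name": "Ana", "email": "a@example.org"}, "year": 2025}
    print(extract_fields(doc, ["title", "author.name"]))
    # prints {'title': 'Example', 'author': {'name': 'Ana'}}
```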

Limiting results

Use -l to cap the number of documents returned:

node index.js -s my-index -l 500
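The README does not describe how the tool pages through results. Returning a whole index generally requires the scroll API or `search_after`, because a single from/size search is capped by the `index.max_result_window` setting (10,000 by default). A minimal sketch of applying a limit while paging, with the actual request hidden behind a hypothetical `fetch_page` callable:

```python
from typing import Callable, Iterator, List, Optional

# Hypothetical page fetcher: returns the next batch of hits, or [] when results run out.
# In practice it would wrap a scroll or search_after request.
FetchPage = Callable[[], List[dict]]


def iter_hits(fetch_page: FetchPage, limit: Optional[int] = None) -> Iterator[dict]:
    """Yield hits across pages, stopping after `limit` hits when a limit is set."""
    emitted = 0
    # Check the limit before each request so no extra page is fetched once it is reached.
    while limit is None or emitted < limit:
        page = fetch_page()
        if not page:
            return  # no more results
        for hit in page:
            if limit is not None and emitted >= limit:
                return
            yield hit
            emitted += 1


if __name__ == "__main__":
    pages = iter([[{"_id": "1"}, {"_id": "2"}], [{"_id": "3"}]])

    def fetch() -> List[dict]:
        return next(pages, [])  # stand-in for real requests

    print([hit["_id"] for hit in iter_hits(fetch, limit=2)])  # prints ['1', '2']
```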

Output

Results are written to stdout as one JSON object per line (NDJSON), so they can be redirected to a file or piped into other tools:

node index.js -s my-index -e title > output.jsonl
node index.js -s my-index | jq '.title'
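Because each line is a complete JSON object, the output can also be consumed line by line without loading the whole file. A minimal Python sketch for reading it (`read_ndjson` is a hypothetical helper, not part of the tool):

```python
import io
import json
from typing import Iterator, TextIO


def read_ndjson(stream: TextIO) -> Iterator[dict]:
    """Parse one JSON object per line, skipping blank lines."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            doc = json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_number}: {err.msg}") from err
        yield doc


if __name__ == "__main__":
    # With a real file, use open("output.jsonl", encoding="utf-8") instead of StringIO.
    sample = io.StringIO('{"title": "First"}\n{"title": "Second"}\n')
    print([doc["title"] for doc in read_ndjson(sample)])  # prints ['First', 'Second']
```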
