Extract documents or specific fields from an OpenSearch index. Supports extracting all documents, filtering by a list of IDs, or running a custom query. Output is newline-delimited JSON (NDJSON) written to stdout.
node index.js -u [OpenSearch URL] -s [index]
| Short | Long | Type | Default | Description |
|---|---|---|---|---|
| -u | --elasticUri | String | http://localhost:9200/ | OpenSearch cluster URL |
| -s | --source | String | (required) | Name of the index to extract from |
| -m | --mode | String | idList | Extraction mode: idList or query (see below) |
| -i | --idField | String | | Document field to match IDs against (used in idList mode) |
| -e | --extractField | String[] | | Fields to include in output. Can be specified multiple times. Omit to get the whole document |
| -f | --file | String | | Path to a file containing IDs (one per line) or a query JSON |
| -p | --params | String[] | | Positional parameters to substitute into a query file. Can be specified multiple times |
| -l | --limit | Number | | Maximum number of documents to return |

There are three ways to select which documents to extract: all documents (no file), a list of IDs (idList mode), or a custom query (query mode).
When no file is provided, all documents in the index are returned.
node index.js -s my-index

Provide a file with one ID per line and specify which document field to match against with --idField. The tool builds a terms query to fetch all matching documents.
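
To make the idList behaviour concrete, here is a minimal sketch of how an ID file could be turned into a terms query body. The helper name `buildTermsQuery` and the exact body shape are assumptions for illustration; the tool's own implementation may differ.

```javascript
// Hypothetical helper (not taken from the tool's source): build an
// OpenSearch terms query body from the raw text of an ID file.
function buildTermsQuery(fileContents, idField) {
  const ids = fileContents
    .split(/\r?\n/)                     // one ID per line
    .map((line) => line.trim())         // tolerate stray whitespace
    .filter((line) => line.length > 0); // skip blank lines
  return { query: { terms: { [idField]: ids } } };
}

// Example: three IDs matched against the document_id field.
const body = buildTermsQuery("a1\nb2\n\nc3\n", "document_id");
console.log(JSON.stringify(body));
// prints {"query":{"terms":{"document_id":["a1","b2","c3"]}}}
```

OpenSearch limits how many values a single terms query may carry (the `index.max_terms_count` setting, 65,536 by default), so a very large ID file would presumably need to be sent in batches. Whether the tool does this is not stated here.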
# ids.txt contains one ID per line
node index.js -s my-index -f ids.txt -i document_id

Provide a file containing a raw OpenSearch query in JSON format. The file should contain the full query body, e.g.:
{
  "query": {
    "range": {
      "timestamp": { "gte": "2025-01-01" }
    }
  }
}

node index.js -s my-index -m query -f my-query.json

Query files support positional parameter substitution using {$1}, {$2}, etc. Pass values with -p:
{
  "query": {
    "range": {
      "timestamp": { "gte": "{$1}", "lte": "{$2}" }
    }
  }
}

node index.js -s my-index -m query -f my-query.json -p 2025-01-01 -p 2025-12-31

By default, the full document _source is returned. Use -e to extract only specific fields. Dot notation is supported for nested fields.
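
As an illustration of dot-notation field extraction, the sketch below resolves paths such as `author.name` against a document's `_source`. The helpers `getPath` and `pickFields` are hypothetical, and keying the output by the dotted path is an assumption; the tool may shape its output differently.

```javascript
// Hypothetical helper: walk a dotted path ("author.name") through nested objects.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) =>
      value != null && typeof value === "object" ? value[key] : undefined,
    doc
  );
}

// Hypothetical helper: keep only the requested fields.
// With no fields requested, the whole document is returned unchanged.
function pickFields(source, fields) {
  if (!fields || fields.length === 0) return source;
  const out = {};
  for (const field of fields) {
    const value = getPath(source, field);
    if (value !== undefined) out[field] = value; // assumption: missing fields are omitted
  }
  return out;
}

const source = {
  title: "Dune",
  author: { name: "Frank Herbert", born: 1920 },
  year: 1965,
};
console.log(JSON.stringify(pickFields(source, ["title", "author.name"])));
// prints {"title":"Dune","author.name":"Frank Herbert"}
```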
# Extract just the title and author name
node index.js -s my-index -e title -e author.name

Use -l to cap the number of documents returned:
node index.js -s my-index -l 500

Results are written to stdout as one JSON object per line (NDJSON), so you can pipe them to a file or other tools:
node index.js -s my-index -e title > output.jsonl
node index.js -s my-index | jq '.title'
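
If the output is consumed from another Node.js script rather than jq, the NDJSON can be parsed line by line. This is a minimal sketch; `parseNdjson` is a hypothetical helper and the sample string stands in for the contents of a file such as output.jsonl.

```javascript
// Hypothetical helper: parse NDJSON text into an array of objects.
function parseNdjson(text) {
  return text
    .split(/\r?\n/)                       // one JSON object per line
    .filter((line) => line.trim() !== "") // ignore blank lines, e.g. the trailing newline
    .map((line) => JSON.parse(line));
}

// Stand-in for the contents of output.jsonl.
const sample = '{"title":"First"}\n{"title":"Second"}\n';
for (const doc of parseNdjson(sample)) {
  console.log(doc.title);
}
// prints:
// First
// Second
```

For large extracts, reading the stream incrementally (for example with Node's `readline` module) avoids holding the whole output in memory.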