opensearch-extract

Extract documents or specific fields from an OpenSearch index. Supports extracting all documents, filtering by a list of IDs, or running a custom query. Output is newline-delimited JSON (NDJSON) written to stdout.

Usage

node index.js -u [OpenSearch URL] -s [index]

Parameters

| Short | Long | Type | Default | Description |
|-------|------|------|---------|-------------|
| `-u` | `--elasticUri` | String | `http://localhost:9200/` | OpenSearch cluster URL |
| `-s` | `--source` | String | (required) | Name of the index to extract from |
| `-m` | `--mode` | String | `idList` | Extraction mode: `idList` or `query` (see below) |
| `-i` | `--idField` | String | | Document field to match IDs against (used in `idList` mode) |
| `-e` | `--extractField` | String[] | | Field to include in the output. Repeat to select several fields; omit to return the whole document |
| `-f` | `--file` | String | | Path to a file containing IDs (one per line) or a JSON query |
| `-p` | `--params` | String[] | | Positional parameter to substitute into a query file. Repeat once per parameter |
| `-l` | `--limit` | Number | | Maximum number of documents to return |
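The parameter table maps fairly directly onto Node's built-in `util.parseArgs`. The sketch below is one possible definition, not the tool's actual parsing code (which may use a different library). `parseCliArgs` and its validation rules are assumptions. `parseArgs` only understands string and boolean options, so `--limit` is converted to a number by hand, and the `default` option requires Node.js 18.11 or later.

```javascript
const { parseArgs } = require("node:util");

// Hypothetical mapping of the parameter table onto util.parseArgs options.
const OPTIONS = {
  elasticUri: { type: "string", short: "u", default: "http://localhost:9200/" },
  source: { type: "string", short: "s" },
  mode: { type: "string", short: "m", default: "idList" },
  idField: { type: "string", short: "i" },
  extractField: { type: "string", short: "e", multiple: true }, // repeatable
  file: { type: "string", short: "f" },
  params: { type: "string", short: "p", multiple: true }, // repeatable, order preserved
  limit: { type: "string", short: "l" }, // parsed as a number below
};

function parseCliArgs(argv) {
  // strict mode (the default) rejects unknown flags and stray positionals
  const { values } = parseArgs({ args: argv, options: OPTIONS });

  if (!values.source) {
    throw new Error("--source (-s) is required");
  }
  if (values.mode !== "idList" && values.mode !== "query") {
    throw new Error(`Unknown --mode "${values.mode}" (expected idList or query)`);
  }

  let limit;
  if (values.limit !== undefined) {
    limit = Number(values.limit);
    if (!Number.isInteger(limit) || limit <= 0) {
      throw new Error(`--limit must be a positive integer, got "${values.limit}"`);
    }
  }

  return {
    ...values,
    extractField: values.extractField ?? [], // empty means "whole document"
    params: values.params ?? [],
    limit, // undefined means "no limit"
  };
}
```

In a script it would be called as `parseCliArgs(process.argv.slice(2))`.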

Modes

There are three ways to select which documents to extract:

1. Extract everything (no `--file`)

When no file is provided, all documents in the index are returned.
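Returning every document means paging through the index, because a single search returns only 10 hits by default. One common way to do this is the scroll API. The sketch below assumes a client shaped like the official `@opensearch-project/opensearch` JavaScript client, where `search`, `scroll` and `clearScroll` each resolve to `{ body }`. This README does not say whether `index.js` uses scroll, `search_after`, or something else, and `scrollAll` is a hypothetical name.

```javascript
// Scroll keep-alive sent with every request; each call extends the context.
const SCROLL_KEEP_ALIVE = "1m";

// Yield the _source of every document matched by `body` (all documents by default),
// one page at a time. `client` is assumed to expose search/scroll/clearScroll like
// the @opensearch-project/opensearch client, where each call resolves to { body }.
async function* scrollAll(client, index, body = { query: { match_all: {} } }, pageSize = 1000) {
  let response = await client.search({
    index,
    scroll: SCROLL_KEEP_ALIVE,
    size: pageSize,
    body: { sort: ["_doc"], ...body }, // _doc order is the cheapest order for scrolling
  });
  let scrollId = response.body._scroll_id;

  try {
    while (true) {
      const hits = response.body.hits.hits;
      if (hits.length === 0) break; // an empty page means every match has been returned
      for (const hit of hits) {
        yield hit._source;
      }
      response = await client.scroll({ scroll_id: scrollId, scroll: SCROLL_KEEP_ALIVE });
      scrollId = response.body._scroll_id;
    }
  } finally {
    // Free the server-side scroll context, including when the consumer stops early.
    if (scrollId) {
      await client.clearScroll({ scroll_id: scrollId });
    }
  }
}
```

Because it is an async generator, a consumer can write each document out as it arrives rather than holding the whole index in memory.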

node index.js -s my-index

2. ID list mode (`--mode idList`, default)

Provide a file with one ID per line and use `--idField` to name the document field to match against. The tool builds a `terms` query to fetch all matching documents.

# ids.txt contains one ID per line
node index.js -s my-index -f ids.txt -i document_id
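A minimal sketch of turning an ID file into `terms` query bodies. It assumes blank lines are ignored and duplicate IDs are dropped. A search returns only 10 hits unless `size` is set, and `from + size` may not exceed `index.max_result_window` (10,000 by default), so the hypothetical `buildIdQueries` helper splits the list into batches of at most 10,000 and sets `size` to each batch's length. It also assumes each ID matches at most one document. How `index.js` itself handles large ID lists is not documented in this README.

```javascript
const fs = require("node:fs");

// Default index.max_result_window in OpenSearch: from + size of a single search
// may not exceed this unless the index setting is raised.
const MAX_BATCH = 10000;

// Turn the text of an ID file (one ID per line) into one or more search request bodies.
function buildIdQueries(fileText, idField, batchSize = MAX_BATCH) {
  const lines = fileText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0); // skip blank lines
  const ids = [...new Set(lines)]; // de-duplicate, keeping first-seen order

  const bodies = [];
  for (let start = 0; start < ids.length; start += batchSize) {
    const batch = ids.slice(start, start + batchSize);
    bodies.push({
      size: batch.length, // assumes each ID matches at most one document
      query: { terms: { [idField]: batch } },
    });
  }
  return bodies;
}

// Convenience wrapper that reads the ID file from disk.
function buildIdQueriesFromFile(path, idField) {
  return buildIdQueries(fs.readFileSync(path, "utf8"), idField);
}
```

For the command above, `buildIdQueriesFromFile("ids.txt", "document_id")` would yield one body per batch of IDs.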

3. Query mode (`--mode query`)

Provide a file containing a raw OpenSearch query in JSON format. The file should contain the full search request body, including the top-level `query` key, e.g.:

{
  "query": {
    "range": {
      "timestamp": { "gte": "2025-01-01" }
    }
  }
}

node index.js -s my-index -m query -f my-query.json

Query files support positional parameter substitution using `{$1}`, `{$2}`, etc.: `{$1}` is replaced by the first `-p` value, `{$2}` by the second, and so on. Pass values with `-p`:

{
  "query": {
    "range": {
      "timestamp": { "gte": "{$1}", "lte": "{$2}" }
    }
  }
}

node index.js -s my-index -m query -f my-query.json -p 2025-01-01 -p 2025-12-31
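Because the placeholders sit inside JSON string literals, substitution is naturally done on the raw file text before parsing. A minimal sketch under that assumption (`substituteParams` and `loadQuery` are hypothetical names): each value is JSON-escaped so a parameter containing a quote or backslash cannot break the document, and a placeholder with no matching `-p` value raises an error instead of being left in place. The real tool may behave differently on both points.

```javascript
const fs = require("node:fs");

// Replace {$1}, {$2}, ... in raw query text with the matching -p values (1-based).
function substituteParams(text, params) {
  return text.replace(/\{\$(\d+)\}/g, (placeholder, digits) => {
    const index = Number(digits) - 1;
    if (index < 0 || index >= params.length) {
      throw new Error(`No -p value supplied for ${placeholder}`);
    }
    // Escape quotes, backslashes and control characters so the result stays valid JSON.
    return JSON.stringify(String(params[index])).slice(1, -1);
  });
}

// Read a query file, fill in its placeholders, and parse it into a request body.
function loadQuery(path, params = []) {
  const text = fs.readFileSync(path, "utf8");
  return JSON.parse(substituteParams(text, params));
}
```

With the example file and command above, `loadQuery("my-query.json", ["2025-01-01", "2025-12-31"])` would produce a `range` query with `gte` set to `2025-01-01` and `lte` set to `2025-12-31`.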

Selecting fields

By default, the full document `_source` is returned. Use `-e` to extract only specific fields. Dot notation is supported for nested fields.

# Extract just the title and author name
node index.js -s my-index -e title -e author.name
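Dot notation maps naturally onto walking nested objects. This README does not show the output shape for `-e author.name`. The sketch below assumes the nesting is preserved (`{"author": {"name": ...}}`), which is also what OpenSearch's own `_source` filtering returns, and it simply omits fields that are missing from a document. `pickFields` is a hypothetical helper. The tool might instead pass the field list to OpenSearch as `_source` rather than filtering on the client.

```javascript
// Copy only the requested dot-separated paths from `source`, keeping their nesting.
// Paths missing from the document are skipped. Arrays along a path are not
// traversed in this sketch, so "tags.0" is treated as missing.
function pickFields(source, paths) {
  const result = {};
  for (const path of paths) {
    const keys = path.split(".");

    // Walk down the document; stop at the first missing key.
    let value = source;
    let found = true;
    for (const key of keys) {
      const isPlainObject = value !== null && typeof value === "object" && !Array.isArray(value);
      if (isPlainObject && Object.prototype.hasOwnProperty.call(value, key)) {
        value = value[key];
      } else {
        found = false;
        break;
      }
    }
    if (!found) continue;

    // Rebuild the nesting in the result, e.g. "author.name" -> { author: { name } }.
    let target = result;
    for (const key of keys.slice(0, -1)) {
      if (target[key] === null || typeof target[key] !== "object") {
        target[key] = {};
      }
      target = target[key];
    }
    // Copy so the output never aliases (and later mutates) the source document.
    target[keys[keys.length - 1]] = structuredClone(value);
  }
  return result;
}
```

`structuredClone` is a global from Node.js 17 onward.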

Limiting results

Use `-l` to cap the number of documents returned:

node index.js -s my-index -l 500
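A limit is cheapest when iteration stops as soon as enough documents have been produced. A minimal sketch, assuming documents arrive as an async iterable such as a paginated search (`take` is a hypothetical helper, not part of the tool's documented interface). Breaking out of `for await` calls the source's `return()`, so a source that cleans up in a `finally` block, for example by releasing a scroll context, still gets to do so.

```javascript
// Yield at most `limit` items from an async iterable; no cap when `limit` is undefined.
async function* take(source, limit) {
  if (limit === undefined) {
    yield* source;
    return;
  }
  if (limit <= 0) return;
  let count = 0;
  for await (const item of source) {
    yield item;
    count += 1;
    if (count >= limit) break; // stop pulling from the source once the cap is reached
  }
}
```

A real implementation might also request no more than `limit` hits per page, so that `-l 500` does not fetch a full page of 1,000 documents.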

Output

Results are written to stdout as one JSON object per line (NDJSON), so you can pipe them to a file or other tools:

node index.js -s my-index -e title > output.jsonl
node index.js -s my-index | jq '.title'
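NDJSON works because `JSON.stringify` without an indent argument never emits a raw newline: newlines inside string values come out escaped as `\n`, so each document stays on one line. A minimal sketch of writing such output while respecting stream backpressure (`writeNdjson` is a hypothetical name; the tool's own output code may differ):

```javascript
const { once } = require("node:events");

// Write each document as one JSON line. If the stream's buffer is full, wait for
// "drain" before writing more, so large extracts don't pile up in memory.
// Accepts arrays as well as async iterables of documents.
async function writeNdjson(docs, stream = process.stdout) {
  let count = 0;
  for await (const doc of docs) {
    if (!stream.write(JSON.stringify(doc) + "\n")) {
      await once(stream, "drain");
    }
    count += 1;
  }
  return count; // number of lines written
}
```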
