feat(es): add dump helper and bulk-ndjson restore path by MattDevy · Pull Request #415 · elastic/cli

MattDevy · 2026-06-16T11:31:49Z

Summary

Adds elastic es helpers dump — exports one or more indices as bulk-format NDJSON (action + _source line pairs) using a per-index Point-in-Time + search_after sorted by _shard_doc, with --size, --keep-alive, --output (file or stdout), --skip-index-name, --add-id, --query (inline JSON), and --query-file (file or - for stdin).
Extends elastic es helpers bulk-ingest with a bulk-ndjson source format that streams pre-formatted action+doc line pairs verbatim into _bulk, so dump output round-trips through the existing ingester. --index is now optional in this mode (action lines may carry _index).
Inspired by escli-rs's utils dump / utils load — the use case is capturing a remote index for local debugging.
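For readers unfamiliar with the bulk format, here is a minimal sketch of how one search hit could become an action + _source line pair. It is not the PR's code. `formatHit` and the `Hit` type are hypothetical, and `skipIndexName` / `addId` stand in for --skip-index-name and --add-id. The `index` action matches what dump is later described as emitting.

```typescript
// Hypothetical shape of a search hit; only the fields used here.
interface Hit {
  _index: string
  _id: string
  _source: unknown
}

// Builds one bulk pair: an `index` action line followed by the _source line.
// skipIndexName mirrors --skip-index-name (omit _index so the target is chosen at ingest time).
// addId mirrors --add-id (carry the original document id into the action line).
function formatHit (hit: Hit, skipIndexName: boolean, addId: boolean): string {
  const meta: Record<string, string> = {}
  if (!skipIndexName) meta._index = hit._index
  if (addId) meta._id = hit._id
  return `${JSON.stringify({ index: meta })}\n${JSON.stringify(hit._source)}\n`
}

const hit: Hit = { _index: 'my-prod-idx', _id: 'abc', _source: { msg: 'hi' } }
process.stdout.write(formatHit(hit, true, true))
// {"index":{"_id":"abc"}}
// {"msg":"hi"}
```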

Why

Round-trip export/import for debugging is a recurring ask. scroll-search writes only _source lines (one-way, not re-ingestable) and bulk-ingest previously generated action lines from scratch, so it couldn't consume already-shaped bulk NDJSON. dump produces the right shape and the new bulk-ndjson source format closes the loop with no third command.

Design notes

PIT, not scroll. PIT + search_after gives consistent reads without leaving long-lived scroll contexts behind, and is what escli-rs uses. PIT is closed in a finally block so it doesn't leak on transport errors.
bulk-ndjson as a source format, not a parallel command. Reuses the existing bulk-ingest plumbing: resolveRawInputs for file/dir/stdin, generalised splitIntoBatches<T> with a sizeOf callback, plus the same retry/concurrency/progress-reporter helpers.
--json without --output is rejected with a clear error, because streamed NDJSON and a stats JSON blob would otherwise collide on stdout.
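The PIT + search_after loop with the PIT closed in a `finally` block can be sketched as follows. This assumes a hypothetical `PitTransport` wrapper around `POST /{index}/_pit`, `POST /_search` (no index in the path, since the PIT selects it), and `DELETE /_pit`; `dumpIndex` and its parameters are illustrative names, not the PR's.

```typescript
// Hypothetical minimal types; only the fields the loop touches.
interface SearchHit {
  _index: string
  _id: string
  _source: unknown
  sort: unknown[]
}

interface SearchResponse {
  pit_id?: string
  hits: { hits: SearchHit[] }
}

// Hypothetical transport wrapper around the open-PIT, search, and close-PIT requests.
interface PitTransport {
  openPit: (index: string, keepAlive: string) => Promise<string>
  search: (body: Record<string, unknown>) => Promise<SearchResponse>
  closePit: (id: string) => Promise<void>
}

// Pages through one index with a PIT and search_after sorted by _shard_doc,
// handing each page to onPage. Returns the number of hits seen.
async function dumpIndex (
  transport: PitTransport,
  index: string,
  size: number,
  keepAlive: string,
  query: unknown,
  onPage: (hits: SearchHit[]) => void
): Promise<number> {
  let pitId = await transport.openPit(index, keepAlive)
  let total = 0
  try {
    let searchAfter: unknown[] | undefined
    for (;;) {
      const body: Record<string, unknown> = {
        size,
        query,
        pit: { id: pitId, keep_alive: keepAlive },
        sort: [{ _shard_doc: 'asc' }]
      }
      if (searchAfter !== undefined) body.search_after = searchAfter
      const res = await transport.search(body)
      // The PIT id can change between responses; always continue with the latest one.
      if (res.pit_id != null) pitId = res.pit_id
      const hits = res.hits.hits
      if (hits.length === 0) break
      onPage(hits)
      total += hits.length
      searchAfter = hits[hits.length - 1].sort
    }
  } finally {
    // Runs on success and on transport errors alike, so the PIT is never left open.
    await transport.closePit(pitId)
  }
  return total
}
```

Sorting on `_shard_doc` gives a cheap, total tiebreaker within a PIT, which is what makes `search_after` resumable without a user-supplied sort field.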

Example

# Export a remote index, omit _index so it can be re-targeted, write to file
elastic --use-context remote es helpers dump \
  --indices my-prod-idx --skip-index-name \
  --query '{"range":{"@timestamp":{"gte":"now-1h"}}}' \
  --output dump.ndjson

# Re-ingest into a local cluster under a new index name
elastic --use-context local es helpers bulk-ingest \
  --source-format bulk-ndjson --index local-copy --data-file dump.ndjson

Test plan

Unit tests: 16 new for dump (PIT + search_after, multi-index, query inline/file, --output file vs stdout, --skip-index-name, --add-id, PIT cleanup on error, missing PIT id, empty query file, --json requires --output), 10 new for bulk-ndjson ingest (verbatim _bulk body, /{index}/_bulk routing, byte-size batching, odd-line / non-bulk-action validation, multi-file dir, --data-file+--data-dir conflict, empty file, no glob match, schema rejection of missing --index for other formats).
npm test — 1462/1462 pass.
npm run test:lint, npm run build, tsc --noEmit clean.
Branch coverage 90.35% (threshold 90%).
Manual end-to-end against a real cluster — leaving for reviewer / next pass.

Add `elastic es helpers dump` for exporting indices as bulk-format NDJSON using PIT + search_after, and extend `bulk-ingest` with a `bulk-ndjson` source format so the dump output can be streamed back into `_bulk`. Use case: capture a remote index for local debugging. Inspired by escli-rs (https://github.com/Anaethelion/escli-rs); ports the dump/load feature set into the elastic/cli helper conventions.

github-actions · 2026-06-16T11:34:26Z

✅ MegaLinter analysis: Success

Descriptor	Linter	Files	Errors	Warnings	Elapsed time
✅ COPYPASTE	jscpd	yes	no	no	9.7s
✅ REPOSITORY	gitleaks	yes	no	no	60.58s
✅ REPOSITORY	git_diff	yes	no	no	0.48s
✅ REPOSITORY	secretlint	yes	no	no	38.06s
✅ REPOSITORY	trivy	yes	no	no	16.98s
✅ TYPESCRIPT	eslint	5	0	0	5.22s
✅ YAML	yamllint	1	0	0	0.92s



Anaethelion

Just a small improvement.

Track the active PIT id and output fd in mutable refs visible to a SIGINT/SIGTERM handler that releases both before the process exits with code 130. The per-index `finally` and the signal handler share the same refs (and null them eagerly) so the two cleanup paths can't race into a double-close. Listeners are removed when the handler returns. Addresses Anaethelion review feedback on PR #415.
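The shared-ref cleanup pattern described in that commit can be sketched like this. It is an illustration, not the PR's code: `CleanupRefs`, `makeCleanup`, and `installSignalCleanup` are hypothetical names, and `closePit` / `closeFd` are injected so the race-free behaviour can be checked without a cluster.

```typescript
import { closeSync } from 'node:fs'

// Hypothetical mutable refs shared by the per-index finally block and the signal handler.
interface CleanupRefs {
  pitId: string | null
  fd: number | null
}

// Returns a cleanup function that releases whatever is still held. Both refs are nulled
// synchronously before any release call, so if the finally block and the signal handler
// both run, the second caller sees null and skips the close.
function makeCleanup (
  refs: CleanupRefs,
  closePit: (id: string) => Promise<void>,
  closeFd: (fd: number) => void = closeSync
): () => Promise<void> {
  return async () => {
    const fd = refs.fd
    const pitId = refs.pitId
    refs.fd = null
    refs.pitId = null
    if (fd !== null) closeFd(fd)
    // Best effort: a failed PIT close should not block shutdown.
    if (pitId !== null) await closePit(pitId).catch(() => {})
  }
}

// Installs SIGINT/SIGTERM handlers that run cleanup and then exit with code 130.
// The returned function removes both listeners once the dump finishes normally.
function installSignalCleanup (cleanup: () => Promise<void>): () => void {
  const onSignal = (): void => {
    void cleanup().finally(() => process.exit(130))
  }
  process.once('SIGINT', onSignal)
  process.once('SIGTERM', onSignal)
  return () => {
    process.removeListener('SIGINT', onSignal)
    process.removeListener('SIGTERM', onSignal)
  }
}
```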

Found something to address after approval

JoshMock

A few nits, and a few larger concerns. Overall looks awesome!

Regardless, I'm on PTO after today so don't wait to merge on my account. Worst case scenario, we have to make a second pass for some of my proposed improvements. 🖤

JoshMock · 2026-06-18T18:51:22Z


 Run `elastic es <command> --help` for all available options on any command.

+##### Dump and restore an index


Can you duplicate or move these docs into the ./docs/ directory as a guide? Probably worth adding similar guides for the other helpers as well, honestly, but we can handle that in a separate PR.

JoshMock · 2026-06-18T18:58:09Z

+      } catch (err) {
+        throw new Error(`bulk-ndjson: invalid action line at line ${lineNum}: ${err instanceof Error ? err.message : String(err)}`, { cause: err })
+      }
+      if (parsed == null || typeof parsed !== 'object' || Array.isArray(parsed)) {


A nit, probably for later, but it would be a nice improvement to standardize on using node:assert for all input validation of this sort (basically anywhere Zod or JSON schemas are not used for validation).
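As a rough illustration of that suggestion, action-line checks could be expressed with `node:assert` instead of hand-written `if`/`throw` branches. `parseActionLine` is a hypothetical helper; the JSON.parse error wrapping is kept from the diff so the line number still appears in the message.

```typescript
import { strict as assert } from 'node:assert'

// Hypothetical helper: validates one bulk action line with node:assert.
// JSON.parse failures are still wrapped so the error keeps the line number.
function parseActionLine (line: string, lineNum: number): Record<string, unknown> {
  let parsed: unknown
  try {
    parsed = JSON.parse(line)
  } catch (err) {
    throw new Error(`bulk-ndjson: invalid action line at line ${lineNum}: ${err instanceof Error ? err.message : String(err)}`, { cause: err })
  }
  assert.ok(
    parsed != null && typeof parsed === 'object' && !Array.isArray(parsed),
    `bulk-ndjson: action line ${lineNum} must be a JSON object`
  )
  const obj = parsed as Record<string, unknown>
  assert.equal(Object.keys(obj).length, 1, `bulk-ndjson: action line ${lineNum} must contain exactly one action`)
  return obj
}
```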

JoshMock · 2026-06-18T19:49:11Z

+  const pairs: string[] = []
+  let action: string | undefined
+  let lineNum = 0
+  let nonEmptyCount = 0
+
+  for (const line of raw.split('\n')) {
+    lineNum++
+    const trimmed = line.trim()
+    if (trimmed.length === 0) continue
+    nonEmptyCount++


The "every other line" loop scheme appears to overlook the fact that a delete action won't have a paired document on the following line. Also, update actions require documents to be wrapped in a {"doc": ...} envelope.

Since this ingest path will always consume the output of a dump command, maybe BULK_ACTIONS should be reduced to just index and create, or only index?
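A pair parser restricted to `index` and `create` might look like the sketch below. `parseBulkPairs` is a hypothetical name and the exact validation differs from the diff; it rejects `delete` (which has no document line) and `update` (which needs a `{"doc": ...}` envelope) outright, as the comment suggests.

```typescript
// Actions whose next line is always a plain document. delete and update are excluded.
const BULK_ACTIONS = new Set(['index', 'create'])

// Splits bulk NDJSON into action+document pairs, rejecting anything that is not a
// strict alternation of index/create action lines and document lines. Blank lines are skipped.
function parseBulkPairs (raw: string): string[] {
  const pairs: string[] = []
  let action: string | undefined
  let lineNum = 0
  for (const line of raw.split('\n')) {
    lineNum++
    const trimmed = line.trim()
    if (trimmed.length === 0) continue
    if (action === undefined) {
      const parsed: unknown = JSON.parse(trimmed)
      const keys = parsed != null && typeof parsed === 'object' && !Array.isArray(parsed)
        ? Object.keys(parsed)
        : []
      if (keys.length !== 1 || !BULK_ACTIONS.has(keys[0])) {
        throw new Error(`bulk-ndjson: line ${lineNum} is not an index/create action line`)
      }
      action = trimmed
    } else {
      pairs.push(`${action}\n${trimmed}\n`)
      action = undefined
    }
  }
  if (action !== undefined) throw new Error('bulk-ndjson: action line without a following document')
  return pairs
}
```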

JoshMock · 2026-06-18T19:50:39Z

+    return JSON.parse(input.query)
+  }
+  if (input.query_file != null) {
+    const raw = readRawInput(input.query_file)


If query_file is - (shorthand for stdin), will this try to read a file literally named -, or will it do the right thing?
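One way to handle `-` is to map it to file descriptor 0, since Node's `readFileSync` accepts a descriptor as well as a path. The sketch below is hypothetical (`readQueryFile` and the injectable `Reader` are not the PR's `readRawInput`); the reader is injectable only so the stdin branch can be tested.

```typescript
import { readFileSync } from 'node:fs'

// Hypothetical reader signature so the stdin branch can be exercised without a real stdin.
type Reader = (source: number | string) => string

const readUtf8: Reader = (source) => readFileSync(source, 'utf8')

// Reads --query-file, treating '-' as stdin (file descriptor 0) instead of a file named '-'.
function readQueryFile (path: string, read: Reader = readUtf8): unknown {
  const raw = read(path === '-' ? 0 : path)
  if (raw.trim().length === 0) {
    throw new Error(`--query-file ${path === '-' ? '(stdin)' : path} is empty`)
  }
  return JSON.parse(raw)
}
```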

JoshMock · 2026-06-18T19:59:24Z

+ * Reuses retry, concurrency, and progress reporting from the main flow; the only difference
+ * is that the input is already bulk-shaped, so each pair is sent through verbatim.
+ */
+async function runBulkNdjson (opts: BulkIngestInput, transport: EsClient): Promise<JsonValue> {


opts.pipeline and opts.routing are ignored in this function. Should they be handled here, or do we expect the bulk-ndjson format to never include those?
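One option that avoids rewriting the action lines is to pass both values as query parameters on the `_bulk` request, where Elasticsearch uses them as defaults for every action that does not set its own. A minimal sketch; `bulkPath` is a hypothetical helper.

```typescript
// Builds the _bulk path for one batch, optionally scoped to an index, with
// pipeline and routing sent as URL query parameters rather than per-action metadata.
function bulkPath (index: string | undefined, pipeline?: string, routing?: string): string {
  const base = index != null ? `/${encodeURIComponent(index)}/_bulk` : '/_bulk'
  const params = new URLSearchParams()
  if (pipeline != null) params.set('pipeline', pipeline)
  if (routing != null) params.set('routing', routing)
  const qs = params.toString()
  return qs.length > 0 ? `${base}?${qs}` : base
}
```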

JoshMock · 2026-06-18T20:19:11Z

+        const actionLine = addId
+          ? actionPrefix + JSON.stringify(hit._id) + actionSuffix
+          : actionPrefix
+        write(`${actionLine}\n${JSON.stringify(hit._source)}\n`)


Every hit will call write, which then calls writeSync, executing a filesystem call for each line. This could be optimized by using a writable file stream, or manually buffering rows and only writing when a byte threshold is exceeded, to significantly speed up dumps with more than a few thousand documents.
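The byte-threshold variant of that suggestion could look roughly like this. `createBufferedWriter` and `writeAllSync` are hypothetical names. The sketch accumulates chunks and hands them to the sink in one call once the threshold is crossed; the caller must call `flush()` at the end of the dump. `writeAllSync` loops because `writeSync` returns the number of bytes actually written.

```typescript
import { writeSync } from 'node:fs'

// Accumulates NDJSON chunks and passes them to `sink` in one call once at least
// thresholdBytes have been buffered, instead of one write per hit.
function createBufferedWriter (
  sink: (data: string) => void,
  thresholdBytes = 1 << 20
): { write: (chunk: string) => void, flush: () => void } {
  let chunks: string[] = []
  let bytes = 0
  const flush = (): void => {
    if (chunks.length === 0) return
    sink(chunks.join(''))
    chunks = []
    bytes = 0
  }
  return {
    write (chunk: string): void {
      chunks.push(chunk)
      bytes += Buffer.byteLength(chunk)
      if (bytes >= thresholdBytes) flush()
    },
    flush
  }
}

// Writes a whole string to fd, looping in case writeSync reports a partial write.
function writeAllSync (fd: number, data: string): void {
  const buf = Buffer.from(data, 'utf8')
  let offset = 0
  while (offset < buf.length) offset += writeSync(fd, buf, offset)
}
```

Usage would be along the lines of `const out = createBufferedWriter(data => writeAllSync(fd, data))`, then `out.write(pair)` per hit and `out.flush()` before closing `fd`.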

- dump: --query-file '-' now reads from stdin instead of trying to open a file literally named '-'.
- dump: batch each page of hits into a single write, cutting writeSync calls from O(docs) to O(pages); with per-hit writes, a multi-million-doc dump is syscall-bound rather than network-bound.
- bulk-ndjson: restrict accepted actions to `index` and `create`. The pair parser assumes every action is followed by a doc line, which is not true for `delete` (no doc) or `update` (needs a `{"doc": ...}` envelope). The producer this format is designed for (dump) only emits `index`, so rejecting the others is safer than silently corrupting input.
- bulk-ndjson: apply --pipeline and --routing as URL query params so they affect every action in the batch without rewriting pre-formatted action lines.
- docs: move the long-form dump-and-restore guide to docs/cli/stack/es/helpers/dump-and-restore.md; README links to it.

Addresses JoshMock review on PR #415.

github-actions · 2026-06-19T11:57:13Z

🔍 Preview links for changed docs

github-actions · 2026-06-19T11:57:14Z

✅ Elastic Docs Style Checker (Vale)

No issues found on modified lines!

The Vale linter checks documentation changes against the Elastic Docs style guide. To use Vale locally or report issues, refer to Elastic style guide for Vale.

MattDevy · 2026-06-19T12:27:27Z

Follow-up: filed #443 to track the memory profile of bulk-ingest.

The dump side streams (one PIT page at a time, bounded by --size), but the ingest side — for all source formats including the new bulk-ndjson, not just my addition — reads the full input into memory before flushing the first batch. Large dumps (100 MB+) bloat; multi-GB dumps OOM. Fix needs a readline-based streaming reader for every input source and a csv-parse (streaming) swap for CSV, so it's broader than this PR.
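A readline-based streaming reader for the bulk-ndjson case might look like the sketch below. It is not the fix tracked in #443, only an illustration: `streamBulkBatches` is a hypothetical name, and memory is bounded by roughly one batch of `maxBytes` rather than the whole input.

```typescript
import { createInterface } from 'node:readline'
import { Readable } from 'node:stream'

// Streams bulk NDJSON line by line and yields batches of action+document pairs,
// starting a new batch whenever adding the next pair would exceed maxBytes.
async function * streamBulkBatches (input: Readable, maxBytes: number): AsyncGenerator<string> {
  const rl = createInterface({ input, crlfDelay: Infinity })
  let batch = ''
  let batchBytes = 0
  let action: string | undefined
  for await (const line of rl) {
    const trimmed = line.trim()
    if (trimmed.length === 0) continue
    if (action === undefined) {
      action = trimmed
      continue
    }
    const pair = `${action}\n${trimmed}\n`
    action = undefined
    const pairBytes = Buffer.byteLength(pair)
    if (batchBytes > 0 && batchBytes + pairBytes > maxBytes) {
      yield batch
      batch = ''
      batchBytes = 0
    }
    batch += pair
    batchBytes += pairBytes
  }
  if (action !== undefined) throw new Error('bulk-ndjson: action line without a following document')
  if (batch.length > 0) yield batch
}
```

For a file, the input would be `createReadStream(path)` from `node:fs`; for stdin, `process.stdin`.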

MattDevy added 3 commits June 16, 2026 12:35

chore: regenerate NOTICE.txt for yaml@2.8.4

f83f44b

chore: regenerate NOTICE.txt for commander@15.0.0

d4c3a11

Supersedes the previous regeneration commit, which ran against a locally-symlinked node_modules with a stale yaml version. The actual drift on main was commander 14 -> 15 from #410, which had updated the lockfile but not NOTICE.txt.

docs: document dump helper and bulk-ndjson restore in README

3fd89fd

Adds a "Dump and restore an index" subsection under the `es` section with the round-trip example, a flag reference for `dump`, and a note on the new `bulk-ingest --source-format bulk-ndjson` mode.

MattDevy marked this pull request as ready for review June 16, 2026 12:06

MattDevy requested review from Anaethelion, JoshMock and margaretjgu June 16, 2026 12:07

MattDevy mentioned this pull request Jun 16, 2026

ci(kb): cleanup trap masks real failure in Buildkite KB functional tests #416

Closed

Merge branch 'main' into claude/admiring-nash-2e349b

cc780f8

Anaethelion requested changes Jun 17, 2026


Comment thread src/es/helpers/dump.ts

MattDevy requested a review from Anaethelion June 17, 2026 12:43

MattDevy and others added 2 commits June 17, 2026 13:49

fix(test): remove unused closeSync import in dump.test.ts

d6896ff

Merge branch 'main' into claude/admiring-nash-2e349b

9e774ac

Anaethelion previously approved these changes Jun 17, 2026


JoshMock reviewed Jun 18, 2026


MattDevy added 2 commits June 19, 2026 12:51

docs: move dump/restore guide out of auto-generated cli/ tree

f74048d

github-actions Bot deployed to docs-preview June 19, 2026 11:57

docs: fix Vale warnings in dump-and-restore guide

1037043

github-actions Bot deployed to docs-preview June 19, 2026 12:22

MattDevy mentioned this pull request Jun 19, 2026

perf(es): bulk-ingest buffers all input into memory; OOMs on large dumps #443

Open
