Migrate data to serverless using logstash #4039
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Conversation
✅ Vale Linting Results: No issues found on modified lines!
🔍 Preview links for changed docs
robbavey left a comment
A good start, but maybe we could consider adding some more details for more advanced scenarios
```
    docinfo => true
  }
}
```
Do you think it makes sense to split the input and output sections up to allow more context in each of them?
I'd like to see some explanation of why we are setting docinfo => true, and what it provides (metadata from retrieved docs to allow retention of index name and/or doc id). That could be a link into the Elasticsearch input doc, but the additional options available in the plugin might muddy rather than clarify.
That might also give space to talk about why users might want to consider setting size, slices and scroll settings depending on their needs and the original index. (Or we could put this in the "advanced" section)
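Purely as illustration of the kind of split-out, annotated example this comment is asking for (hostnames and API keys are placeholders, and the exact metadata path depends on the plugin's `docinfo_target`/ECS-compatibility setting, so check the Elasticsearch input reference before reusing this):

```
input {
  elasticsearch {
    hosts   => ["https://source-deployment.example.com:9243"]
    api_key => "REDACTED"
    index   => "my-index-*"
    # docinfo => true copies each document's metadata (_index, _id) into
    # the event's @metadata, so the output can keep the original index
    # name and document id instead of generating new ones.
    docinfo => true
  }
}

output {
  elasticsearch {
    hosts   => ["https://my-project.es.example.com:443"]
    api_key => "REDACTED"
    # Reuse the metadata captured by docinfo; this path assumes the
    # ECS-compatible default docinfo_target of the input plugin.
    index       => "%{[@metadata][input][elasticsearch][_index]}"
    document_id => "%{[@metadata][input][elasticsearch][_id]}"
  }
}
```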
The original KBA was intended to provide a basic configuration and flow that users could use to get some data into Serverless.
I'd like to keep the basic scenario as basic as possible, but with enough content to help a large percentage of our users be successful in their migration task.
@tatianafinamor, as the author of the migration KBA, can you give me more info on how you've seen the basic migration work with existing users? Any guesses as to what percentage of customers the migration steps in the KBA will work for?
Any other info that might help inform which content is MVP, and which content might be too much information and belongs in a more advanced section?
Hi @karenzone
Thank you for helping get those steps added to our docs, and for the additional questions.
The intention of this KB was to cover the basic data migration path using Logstash.
It is essentially just reading the existing indices from the Hosted deployment and writing them into Serverless using a modified Logstash output.
It does not cover dashboards, ingest pipelines, templates, or other Kibana objects, so those will need to be migrated separately.
From what I have seen so far, this approach works for roughly 30 to 40% of customers without needing extra steps. (Well, they all need to manually add their mappings; I will update that as a step on the KB.)
This percentage tends to be users with fairly standard index structures and not too many custom pipelines or dashboards.
For the remaining cases, the process still works, but needs some extra adjustments: things like rebuilding pipelines, fixing mapping conflicts, or handling very large historical indices.
One more thing worth mentioning: there is another KBA that is really helpful when the customer is also migrating agents, especially for Observability or Security use cases:
https://support.elastic.dev/knowledge/view/64804a87
I usually use both guides together when helping customers.
If it helps, I can help build out the full migration content covering dashboards, ingest pipelines, templates, and other assets.
Thank you!
If this guide is for advanced users of Logstash, then this is probably fine as-is. But I'm not sure that's the case. I think there's a chance folks using this guide are unfamiliar with LS. In that case, I agree with Rob here. If I'm migrating data, I guess I don't need to know what every line of my config file is doing. But if the value makes a difference in the data migrated, we should explain it.
I agree we don't want to overcomplicate or give a full tour of the entire Elasticsearch input, but I also think that we don't want to be too basic. We're still a long way from overcomplicating things in this guide. Breaking this up allows us to provide more context without things getting too busy. For example:
```
:::{admonition} Advanced migration
:applies_to: stack: preview

{{ls}} can handle more advanced migrations with field tracking settings in the [Elasticsearch input](https://www.elastic.co/docs/reference/logstash/plugins/plugins-inputs-elasticsearch) plugin. The field tracking feature adds cursor-like pagination functionality that can support more complex migrations and ongoing data migration over time.
```
Suggested change:

```diff
- {{ls}} can handle more advanced migrations with field tracking settings in the [Elasticsearch input](https://www.elastic.co/docs/reference/logstash/plugins/plugins-inputs-elasticsearch) plugin. The field tracking feature adds cursor-like pagination functionality that can support more complex migrations and ongoing data migration over time.
+ {{ls}} can handle more advanced migrations with field tracking settings in the [Elasticsearch input](https://www.elastic.co/docs/reference/logstash/plugins/plugins-inputs-elasticsearch) plugin. The field tracking feature adds cursor-like pagination functionality that allows more advanced migration features, including the ability to resume migration tasks after a Logstash restart, and support for ongoing data migration over time.
```
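As a rough sketch of what that field-tracking behavior could look like in practice (this is an assumption-laden illustration, not content from the PR: option names such as `tracking_field` and the `:last_value` query placeholder, as well as the hosts and keys, should be verified against the Elasticsearch input plugin reference):

```
input {
  elasticsearch {
    hosts   => ["https://source-deployment.example.com:9243"]
    api_key => "REDACTED"
    index   => "my-index-*"
    # Poll on a schedule and fetch only documents whose tracking field
    # is newer than the last value seen. The stored cursor lets the
    # migration resume after a Logstash restart and keep pulling new
    # data over time.
    schedule       => "* * * * *"
    tracking_field => "[event][ingested]"
    query          => '{ "query": { "range": { "event.ingested": { "gt": ":last_value" } } }, "sort": [ { "event.ingested": "asc" } ] }'
  }
}
```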
Nice. I'll add this on my next iteration.
Why not also add these here?

* `size` - Controls how many documents are retrieved per scroll. Larger values increase throughput but use more memory. ...
* `slices` - Enables parallel reads from the source index. ...
* `scroll` - Adjusts how long Elasticsearch keeps the scroll context alive. ...
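To make the suggestion concrete, here is one way those three settings could appear in the input block (the values and endpoints are illustrative assumptions, not recommendations from the PR):

```
input {
  elasticsearch {
    hosts   => ["https://source-deployment.example.com:9243"]
    api_key => "REDACTED"
    index   => "my-index-*"
    docinfo => true
    size    => 2000   # documents per scroll page; larger = more throughput, more memory
    slices  => 4      # parallel scroll slices reading the source index
    scroll  => "5m"   # how long Elasticsearch keeps each scroll context alive
  }
}
```

Sensible values depend on the size of the source index and the memory available to Logstash, which is presumably why the comment suggests framing these as tuning knobs rather than fixed defaults.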
And do we want to put this up in the configuration step? It's an afterthought here that most folks won't see until after they've already finished the above steps.
```
* [Verify data migration](#verify-migration)

## Step 1: Configure {{ls}} [configure-ls]
```
Should the first step be a brief blurb about Logstash? Incl. a link to installation or quick start docs?
Closes: https://github.com/elastic/ingest-dev/issues/5008
Logstash is recommended as the preferred migration path from ECH to Serverless.
This PR adds basic instructions.
PREVIEW: https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/4039/manage-data/migrate/migrate-with-logstash
Business and user need
"Customers get discouraged to move to serverless at the moment when they don't find a way to perform the migration."
Review path
Next steps
Follow-up issue and PR(s) to expand field tracking docs for migration and other use cases