systemtest/gencorpora: generate corpus metadata #8937

Merged
merged 10 commits into from
Aug 24, 2022

Conversation

lahsivjar
Contributor

@lahsivjar lahsivjar commented Aug 23, 2022

Motivation/summary

A corpus is required to define a Rally track. The corpora schema requires a list of document files, and each document entry has the following requirements:

  1. document-count and source-file are mandatory
  2. compressed-bytes and uncompressed-bytes are recommended

This PR adds support to generate the document file with the above fields (other than compressed-bytes).

Checklist

- [ ] Update CHANGELOG.asciidoc
- [ ] Update package changelog.yml (only if changes to apmpackage have been made)
- [ ] Documentation has been updated

How to test these changes

N/A

Related issues

Part of: #8754

@mergify
Contributor

mergify bot commented Aug 23, 2022

This pull request does not have a backport label. Could you fix it @lahsivjar? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.x is the label to automatically backport to the 7.x branch.
  • backport-7./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip label (Skip notification from the automated backport with mergify) Aug 23, 2022
@apmmachine
Contributor

apmmachine commented Aug 23, 2022

💚 Build Succeeded

Build stats

  • Start Time: 2022-08-24T02:17:22.501+0000

  • Duration: 25 min 8 sec

Test stats 🧪

Test Results
Failed 0
Passed 129
Skipped 0
Total 129

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate and publish the docker images.

  • /test windows : Build & tests on Windows.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@lahsivjar lahsivjar requested a review from a team August 23, 2022 05:26
@apmmachine
Contributor

apmmachine commented Aug 23, 2022

📚 Go benchmark report

Diff with the main branch

name                                                                                              old time/op    new time/op     delta
pkg:github.com/elastic/apm-server/internal/agentcfg goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/decoder goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/model/modelindexer goos:linux goarch:amd64
ModelIndexer/NoCompression-12                                                                       1.97µs ±17%     1.45µs ±26%  -26.44%  (p=0.032 n=5+5)
pkg:github.com/elastic/apm-server/internal/processor/stream goos:linux goarch:amd64
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/errors_transaction_id.ndjson-12         21.7µs ±30%     17.7µs ± 7%  -18.29%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/errors_2.ndjson-12                      21.4µs ±10%     19.6µs ± 8%   -8.54%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/errors_transaction_id.ndjson-12         13.5µs ± 7%     12.2µs ±14%   -9.63%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/minimal.ndjson-12                       14.0µs ±13%     12.3µs ± 3%  -11.98%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions.ndjson-12                  15.5µs ± 1%     15.1µs ± 1%   -2.86%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions_spans.ndjson-12            19.2µs ± 1%     18.5µs ± 1%   -3.27%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions_spans_rum.ndjson-12        2.69µs ± 1%     2.60µs ± 2%   -3.36%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/unknown-span-type.ndjson-12             8.25µs ± 1%     7.85µs ± 1%   -4.80%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/errors.ndjson-12                      10.9µs ± 1%     10.0µs ± 1%   -8.61%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/errors_2.ndjson-12                    8.62µs ± 2%     7.93µs ± 2%   -7.93%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/errors_rum.ndjson-12                  2.20µs ± 1%     1.99µs ± 1%   -9.83%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/errors_transaction_id.ndjson-12       5.73µs ± 1%     5.25µs ± 1%   -8.28%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/events.ndjson-12                      15.5µs ± 1%     14.4µs ± 1%   -7.15%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/heavy.ndjson-12                       1.58ms ± 1%     1.53ms ± 2%   -3.30%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-event-type.ndjson-12           790ns ± 1%      717ns ± 1%   -9.28%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-event.ndjson-12               3.60µs ± 0%     3.36µs ± 5%   -6.57%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-json-event.ndjson-12          1.27µs ± 2%     1.22µs ± 2%   -4.00%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-json-metadata.ndjson-12       1.98µs ± 1%     1.90µs ± 1%   -3.84%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-metadata-2.ndjson-12           485ns ± 2%      466ns ± 0%   -3.85%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-metadata.ndjson-12             491ns ± 1%      471ns ± 1%   -4.08%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/metadata-null-values.ndjson-12         991ns ± 1%      937ns ± 1%   -5.49%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/metadata.ndjson-12                    1.27µs ± 1%     1.22µs ± 3%   -4.10%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/metricsets.ndjson-12                  6.16µs ± 1%     5.92µs ± 2%   -3.82%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/minimal-service.ndjson-12             1.73µs ± 1%     1.65µs ± 2%   -4.42%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/minimal.ndjson-12                     5.10µs ± 1%     4.88µs ± 1%   -4.35%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/optional-timestamps.ndjson-12         3.11µs ± 2%     2.96µs ± 1%   -4.65%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/otel-bridge.ndjson-12                 3.89µs ± 1%     3.69µs ± 2%   -5.11%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/ratelimit.ndjson-12                   16.6µs ± 1%     16.0µs ± 1%   -3.27%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/span-links.ndjson-12                  1.83µs ± 1%     1.77µs ± 3%   -3.37%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/transactions-huge_traces.ndjson-12    4.99µs ± 2%     4.75µs ± 2%   -4.69%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/transactions.ndjson-12                13.3µs ± 3%     12.7µs ± 1%   -4.75%  (p=0.016 n=4+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/transactions_spans.ndjson-12          16.3µs ± 1%     15.6µs ± 1%   -4.48%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/transactions_spans_rum.ndjson-12      2.24µs ± 1%     2.16µs ± 1%   -3.74%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/transactions_spans_rum_2.ndjson-12    2.19µs ± 1%     2.08µs ± 1%   -4.83%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/unknown-span-type.ndjson-12           6.95µs ± 1%     6.68µs ± 1%   -3.82%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/internal/publish goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/txmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling goos:linux goarch:amd64
Process-12                                                                                          1.37µs ±19%     1.98µs ±55%  +44.38%  (p=0.032 n=5+5)
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage goos:linux goarch:amd64
ReadEvents/nop_codec/1_events-12                                                                    2.07µs ± 6%     1.87µs ± 6%   -9.92%  (p=0.008 n=5+5)
ReadEvents/nop_codec/1000_events-12                                                                  932µs ± 6%     1009µs ± 6%   +8.26%  (p=0.032 n=5+5)

name                                                                                              old alloc/op   new alloc/op    delta
pkg:github.com/elastic/apm-server/internal/agentcfg goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/decoder goos:linux goarch:amd64
CompressedRequestReader/deflate_content_encoding-12                                                 45.0kB ± 0%     45.0kB ± 0%   +0.01%  (p=0.032 n=5+5)
pkg:github.com/elastic/apm-server/internal/model/modelindexer goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/processor/stream goos:linux goarch:amd64
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/heavy.ndjson-12                         13.9MB ± 0%     13.9MB ± 0%   -0.24%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/transactions_spans.ndjson-12            93.6kB ± 1%     92.7kB ± 0%   -0.91%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/errors_rum.ndjson-12                    10.7kB ± 0%     10.5kB ± 1%   -1.36%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/errors_transaction_id.ndjson-12         22.2kB ± 1%     21.9kB ± 1%   -1.15%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/otel-bridge.ndjson-12                   18.3kB ± 1%     18.4kB ± 1%   +0.57%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/unknown-span-type.ndjson-12             25.4kB ± 1%     25.2kB ± 1%   -0.97%  (p=0.032 n=5+5)
pkg:github.com/elastic/apm-server/internal/publish goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/txmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage goos:linux goarch:amd64

name                                                                                              old allocs/op  new allocs/op   delta
pkg:github.com/elastic/apm-server/internal/agentcfg goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/decoder goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/model/modelindexer goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/processor/stream goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/publish goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/txmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage goos:linux goarch:amd64

name                                                                                              old speed      new speed       delta
pkg:github.com/elastic/apm-server/internal/processor/stream goos:linux goarch:amd64
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/errors_transaction_id.ndjson-12        181MB/s ±25%    216MB/s ± 7%  +19.47%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/errors_2.ndjson-12                     221MB/s ±10%    242MB/s ± 8%   +9.45%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/errors_transaction_id.ndjson-12        284MB/s ± 7%    316MB/s ±16%  +11.15%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/minimal.ndjson-12                     74.0MB/s ±12%   83.7MB/s ± 3%  +13.15%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions.ndjson-12                 364MB/s ± 1%    375MB/s ± 1%   +2.95%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions_spans.ndjson-12           304MB/s ± 1%    314MB/s ± 1%   +3.38%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions_spans_rum.ndjson-12       429MB/s ± 1%    444MB/s ± 2%   +3.49%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/unknown-span-type.ndjson-12            401MB/s ± 1%    421MB/s ± 1%   +5.04%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/errors.ndjson-12                     582MB/s ± 1%    636MB/s ± 1%   +9.42%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/errors_2.ndjson-12                   547MB/s ± 2%    594MB/s ± 2%   +8.60%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/errors_rum.ndjson-12                 861MB/s ± 1%    954MB/s ± 1%  +10.88%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/errors_transaction_id.ndjson-12      667MB/s ± 1%    728MB/s ± 1%   +9.03%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/events.ndjson-12                     479MB/s ± 1%    516MB/s ± 1%   +7.70%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/heavy.ndjson-12                      252MB/s ± 1%    261MB/s ± 2%   +3.43%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-event-type.ndjson-12         495MB/s ± 1%    546MB/s ± 1%  +10.23%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-event.ndjson-12              213MB/s ± 0%    228MB/s ± 5%   +7.17%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-json-event.ndjson-12         462MB/s ± 2%    481MB/s ± 2%   +4.18%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-json-metadata.ndjson-12      226MB/s ± 1%    235MB/s ± 1%   +4.00%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-metadata-2.ndjson-12         900MB/s ± 2%    936MB/s ± 0%   +3.99%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-metadata.ndjson-12           908MB/s ± 1%    947MB/s ± 1%   +4.25%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/metadata-null-values.ndjson-12       531MB/s ± 1%    561MB/s ± 1%   +5.81%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/metadata.ndjson-12                   977MB/s ± 1%   1019MB/s ± 3%   +4.31%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/metricsets.ndjson-12                 413MB/s ± 1%    430MB/s ± 2%   +3.99%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/minimal-service.ndjson-12            246MB/s ± 1%    258MB/s ± 2%   +4.62%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/minimal.ndjson-12                    202MB/s ± 1%    211MB/s ± 1%   +4.55%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/optional-timestamps.ndjson-12        330MB/s ± 2%    347MB/s ± 1%   +4.85%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/otel-bridge.ndjson-12                484MB/s ± 1%    510MB/s ± 2%   +5.39%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/ratelimit.ndjson-12                  254MB/s ± 1%    263MB/s ± 1%   +3.39%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/span-links.ndjson-12                 372MB/s ± 1%    385MB/s ± 3%   +3.53%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/transactions-huge_traces.ndjson-12   636MB/s ± 2%    667MB/s ± 2%   +4.92%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/transactions.ndjson-12               423MB/s ± 3%    445MB/s ± 1%   +4.97%  (p=0.016 n=4+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/transactions_spans.ndjson-12         356MB/s ± 1%    373MB/s ± 1%   +4.68%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/transactions_spans_rum.ndjson-12     515MB/s ± 1%    535MB/s ± 1%   +3.88%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/transactions_spans_rum_2.ndjson-12   511MB/s ± 1%    537MB/s ± 1%   +5.06%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/unknown-span-type.ndjson-12          476MB/s ± 1%    495MB/s ± 1%   +3.97%  (p=0.008 n=5+5)

report generated with https://pkg.go.dev/golang.org/x/perf/cmd/benchstat

Member

@axw axw left a comment

A few early comments. Looks good overall.

systemtest/gencorpora/catbulk.go (outdated)
systemtest/gencorpora/catbulk.go (outdated)
systemtest/gencorpora/catbulk.go (outdated)
systemtest/gencorpora/catbulk.go (outdated)
  for count := 0; scanner.Scan(); count++ {
-     fmt.Fprintln(writer, scanner.Text())
+     if _, err := fmt.Fprintln(&buf, scanner.Text()); err != nil {
Member

Suggested change
- if _, err := fmt.Fprintln(&buf, scanner.Text()); err != nil {
+ if _, err := buf.Write(scanner.Bytes()); err != nil {

suggestion: I think we can avoid a byte->string conversion and use the buffer directly. WDYT?

Contributor Author

This will not work as-is, since the bulk API requires newline-delimited JSON and the scanner strips EOL markers. However, it is possible to optimize performance here by customizing the scanner. I was being lazy here since it is not in the critical code path. Let me update this.

Member

@axw axw left a comment

LGTM!

@lahsivjar
Contributor Author

I have updated the code to split the input, expecting NDJSON where each document is a metadata line followed by a source line. This should clean up the code a bit and reduce the memory footprint.

@lahsivjar lahsivjar requested a review from axw August 24, 2022 01:58
Member

@axw axw left a comment

Nice. LGTM.

@lahsivjar lahsivjar merged commit 3cc896e into elastic:main Aug 24, 2022
@lahsivjar lahsivjar deleted the 8754_run-rally branch August 24, 2022 03:09
Labels
backport-skip Skip notification from the automated backport with mergify
4 participants