Restructure Packages and remove State Variables #191

Merged (20 commits) on May 20, 2022

Conversation

jlvoiseux
Contributor

@jlvoiseux jlvoiseux commented Apr 26, 2022

Motivation / Summary

This PR aims to resolve several tech-debt-related issues, as well as to integrate improvements discussed during past code reviews.

Changes

How to test

@github-actions github-actions bot added the aws-λ-extension AWS Lambda Extension label Apr 26, 2022
@apmmachine

apmmachine commented Apr 26, 2022

💚 Build Succeeded

Build stats

  • Start Time: 2022-05-19T09:55:36.232+0000

  • Duration: 7 min 4 sec

Test stats 🧪

Test Results
Failed 0
Passed 200
Skipped 4
Total 204

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@jlvoiseux jlvoiseux marked this pull request as ready for review May 4, 2022 07:58
@jlvoiseux jlvoiseux requested a review from a team May 4, 2022 08:22
Member

@axw axw left a comment

Here's some initial feedback.

Would it be possible to break this up a little bit? Maybe even just splitting 393aeef out into a separate PR? It's difficult to provide a comprehensive review when there are unrelated changes.

apm-lambda-extension/extension/apm_server.go (outdated)
apm-lambda-extension/logsapi/subscribe.go
apm-lambda-extension/main.go (outdated)
apm-lambda-extension/main.go (outdated)
apm-lambda-extension/main.go (outdated)
apm-lambda-extension/main.go (outdated)
apm-lambda-extension/main.go (outdated)
@jlvoiseux jlvoiseux force-pushed the tech-debt-refactor branch from 393aeef to 83b0ce8 Compare May 5, 2022 12:15
@jlvoiseux
Contributor Author

@axw Thank you for your feedback! I have rebased 393aeef out of this branch; it will be the subject of a later PR.

// Init APM Server Transport struct and start http server to receive data from agent
apmServerTransport := extension.InitApmServerTransport(config)
agentDataServer, err := extension.StartHttpServer(ctx, apmServerTransport)
if err != nil {
Contributor Author

There might be a way to remove one of these two calls, in a similar fashion to what we did with logsapi.Subscribe and logsTransport.

Member

@axw axw left a comment

Thanks @jlvoiseux, looking good.

I'm a bit concerned about races between agents requesting a flush, the completion of the invocation according to runtimeDone, and the flush timer. As it is we could mistakenly think we should flush earlier than we should. I'd like to dig into the requirements a bit here to see if we can simplify things.

apm-lambda-extension/main.go (outdated)
apm-lambda-extension/logsapi/subscribe.go (outdated)
apm-lambda-extension/logsapi/subscribe.go (outdated)
apm-lambda-extension/logsapi/subscribe.go (outdated)
apm-lambda-extension/main.go (outdated)
apm-lambda-extension/extension/apm_server.go (outdated)
apm-lambda-extension/logsapi/subscribe.go
apm-lambda-extension/extension/route_handlers.go (outdated)
apm-lambda-extension/extension/route_handlers.go (outdated)
@axw
Member

axw commented May 6, 2022

> I'm a bit concerned about races between agents requesting a flush, the completion of the invocation according to runtimeDone, and the flush timer. As it is we could mistakenly think we should flush earlier than we should. I'd like to dig into the requirements a bit here to see if we can simplify things.

After some offline discussions, it seems like what we have is desirable. It would be good to document some of the reasoning in the code (maybe at the select loop in processEvent) for posterity.

Still, we need to ensure signals cannot leak between invocations.
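
For readers following the discussion, here is a minimal sketch of the kind of select loop referred to above, where an agent flush request, the runtimeDone event, and a timer compete to end the wait. It is not the extension's actual code: the function name `waitForFlushTrigger`, the channel parameters, and the returned labels are all hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// waitForFlushTrigger (hypothetical name) blocks until one of three competing
// signals fires and reports which one ended the wait:
//   - agentDone:   the agent asked for a flush
//   - runtimeDone: the platform reported the end of the invocation
//   - deadline:    the flush timer expired
func waitForFlushTrigger(agentDone, runtimeDone <-chan struct{}, deadline time.Duration) string {
	timer := time.NewTimer(deadline)
	defer timer.Stop() // release the timer if another signal wins

	select {
	case <-agentDone:
		return "agent done"
	case <-runtimeDone:
		return "runtime done"
	case <-timer.C:
		return "deadline"
	}
}

func main() {
	agentDone := make(chan struct{})
	runtimeDone := make(chan struct{})

	// Simulate the runtime finishing before the agent requests a flush.
	close(runtimeDone)

	fmt.Println(waitForFlushTrigger(agentDone, runtimeDone, time.Second))
}
```

Whichever channel becomes ready first decides the outcome, which is why a signal left over from a previous invocation would end the wait prematurely.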

apm-lambda-extension/extension/apm_server.go (outdated)
apm-lambda-extension/extension/apm_server.go (outdated)
apm-lambda-extension/extension/apm_server.go (outdated)
apm-lambda-extension/extension/apm_server.go (outdated)
@jlvoiseux
Contributor Author

@axw, @felixbarny, thank you for your reviews. I implemented the related corrections last Friday.

To address the race related to AgentDoneSignal, I first reproduced it by adding a test in main_test.go. This test simulates the case where the AgentDoneSignal corresponding to invocation i is received after the RuntimeDoneSignal, thus interrupting the execution of invocation i+1.

This scenario is avoided by opening and closing the AgentDoneSignal for each Lambda invocation. A limitation of this solution is that it does not cover the edge case where the AgentDoneSignal of invocation i is received during invocation i+1. However, the only surefire way to handle this case would be to decompress the APM payload and check faas.execution_id. Moreover, this scenario is very unlikely, as the Agent will have been stopped and restarted by the Lambda service by the time we reach the next invocation.
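
As an illustration of the open/close-per-invocation idea described above, here is a minimal sketch. It is an assumption about the general shape of such a mechanism, not the code in this PR: the type `invocationSignal` and its methods `Open`, `Notify`, and `Close` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// invocationSignal (hypothetical) owns a "done" channel that only exists
// while an invocation is in progress.
type invocationSignal struct {
	mu   sync.Mutex
	done chan struct{}
}

// Open creates a fresh channel at the start of an invocation, so nothing
// buffered during an earlier invocation can be observed by this one.
func (s *invocationSignal) Open() <-chan struct{} {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.done = make(chan struct{}, 1)
	return s.done
}

// Notify delivers the signal without blocking. It reports false when no
// invocation is open or when a signal is already pending.
func (s *invocationSignal) Notify() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.done == nil {
		return false
	}
	select {
	case s.done <- struct{}{}:
		return true
	default:
		return false
	}
}

// Close ends the invocation and discards any signal that was not consumed.
func (s *invocationSignal) Close() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.done = nil
}

func main() {
	var sig invocationSignal

	sig.Open()  // invocation i starts
	sig.Close() // invocation i ends

	// A late signal for invocation i arrives between invocations: dropped.
	fmt.Println(sig.Notify())

	next := sig.Open() // invocation i+1 starts with an empty channel
	fmt.Println(len(next))
}
```

Note that this sketch shares the limitation stated above: a late signal that arrives after the next `Open` call is still attributed to the new invocation.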

Member

@axw axw left a comment

Thanks for the updates @jlvoiseux. Although there's still a theoretical race with AgentDoneSignal, it's at least no worse than it was before. Maybe we can eliminate it in the future.

@jlvoiseux
Contributor Author

Logs API Issue Fix
The issue mentioned offline was due to me overlooking the following documentation item:

HTTP (recommended) – Lambda delivers logs to a local HTTP endpoint (http://sandbox.localdomain:${PORT}/${PATH}) as an array of records in JSON format. The $PATH parameter is optional. Note that only HTTP is supported, not HTTPS. You can choose to receive logs through PUT or POST.

Prior to this PR, the default listener address was set in a global variable to sandbox and then modified to accommodate the needs of our tests. This is no longer the case thanks to the implementation of InitLogsTransport(). However, the default listener address set by that function was localhost, which is the value required by our tests. To solve this issue:

Benchmark
The numerous changes implemented by the current PR did not impact performance or error rate in high throughput scenarios:

v1.0.0
image

Current PR
image

Member

@axw axw left a comment

Thanks for the update! Change looks good.

@jlvoiseux jlvoiseux merged commit 849aea4 into elastic:main May 20, 2022
Labels
aws-λ-extension AWS Lambda Extension
Projects
None yet
4 participants