Reviewer's Guide

Optimize S3 CloudTrail event forwarding by avoiding an unnecessary sleep when records are available, add a lightweight chunking counter, and adjust tests/imports accordingly.

Sequence diagram for the updated AwsS3QueuedConnector run and forward_events flow:

```mermaid
sequenceDiagram
    participant Runner as RunLoop
    participant Connector as AbstractAwsS3QueuedConnector
    participant S3 as AwsS3Bucket
    participant Dest as DestinationPlatform
    Runner->>Connector: run()
    loop until alive_is_set
        Connector->>Connector: commit_marker()
        Connector->>Connector: forward_events() int
        activate Connector
        Connector->>S3: _fetch_next_objects(marker)
        S3-->>Connector: objects
        Connector->>S3: _fetch_events(bucket_name, _move_marker(objects))
        S3-->>Connector: events
        Connector->>Connector: _chunk_events(events, chunk_size)
        loop for each chunk
            Connector->>Dest: send_records(records, event_name, batch_size)
            Dest-->>Connector: ack
        end
        Connector-->>Runner: total_records
        deactivate Connector
        alt total_records == 0
            Connector->>Connector: sleep(configuration.frequency)
        else total_records > 0
            Note right of Connector: No sleep to minimize CloudTrail delay
        end
    end
```
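The alt branch at the end of the diagram can be sketched as a minimal run loop. This is an illustrative stand-in, not the connector's real implementation: `Runner` and the hard-coded `forward_events` return value are assumptions made for the example.

```python
import time
from threading import Event


class Runner:
    """Minimal run-loop sketch: sleep only when nothing was forwarded."""

    def __init__(self, frequency: int = 60):
        self.frequency = frequency
        self.alive = Event()  # set() to stop the loop

    def forward_events(self) -> int:
        # Stand-in: the real connector fetches S3 objects, chunks the
        # events and forwards them, returning the total record count.
        return 0

    def run(self) -> None:
        while not self.alive.is_set():
            total_records = self.forward_events()
            if total_records == 0:
                # Idle: wait before polling again.
                time.sleep(self.frequency)
            # Records were forwarded: loop again immediately to keep
            # CloudTrail delivery delay low.
```

The design choice is that the sleep acts purely as an idle backstop; as long as records keep arriving, the loop polls back-to-back.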
Updated class diagram for AbstractAwsS3QueuedConnector event handling:

```mermaid
classDiagram
    class AbstractAwsS3QueuedConnector {
        +configuration: AwsS3FetcherConfiguration
        +bucket_name: str
        +marker: str
        +alive: Event
        +trigger: Trigger
        +log(message: str, level: str) void
        +log_exception(ex: Exception, message: str) void
        +send_records(records: list, event_name: str, batch_size: int) void
        +commit_marker() void
        +_fetch_next_objects(marker: str) list
        +_move_marker(objects: list) str
        +_fetch_events(bucket_name: str, marker: str) Sequence
        +_chunk_events(events: Sequence, chunk_size: int) Generator~list~
        +forward_events() int
        +stop() void
        +run() void
    }
    class AwsS3FetcherConfiguration {
        +frequency: int
        +chunk_size: int
    }
    AbstractAwsS3QueuedConnector --> AwsS3FetcherConfiguration : uses
    class _chunk_events_changes {
        +counter: int
        +for_each_event(events)
        +if counter >= chunk_size then yield chunk
        +reset counter on new chunk
    }
    AbstractAwsS3QueuedConnector --> _chunk_events_changes : implements chunking logic
```
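The chunking logic summarized in `_chunk_events_changes` can be sketched as a standalone generator. `chunk_events` here is a hypothetical free-function stand-in for the connector's `_chunk_events` method, written to use the counter consistently:

```python
from collections.abc import Generator, Iterable
from typing import Any


def chunk_events(events: Iterable[Any], chunk_size: int) -> Generator[list, None, None]:
    """Yield lists of at most `chunk_size` events, tracking the current
    chunk size with a counter rather than calling len() each iteration."""
    chunk: list = []
    counter = 0
    for event in events:
        chunk.append(event)
        counter += 1
        if counter >= chunk_size:
            yield chunk
            chunk = []
            counter = 0  # reset counter on new chunk
    if counter:
        yield chunk  # flush the trailing partial chunk
```

Because it is a generator, callers only ever hold one chunk in memory at a time.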
File-Level Changes
Hey there - I've reviewed your changes - here's some feedback:
- The change to `_chunk_events` adds a `counter` to avoid calling `len(chunk)`, but `len` on a list is already O(1) in CPython and still used in the final `if` condition; consider either reverting to `len(chunk)` entirely or consistently using the counter and updating the comment to avoid implying a performance issue that isn't there.
- With the new `run` loop behavior, when records are continuously available there is no sleep at all, which may result in a tight loop consuming more CPU than before; consider keeping a minimal sleep or a backoff mechanism even when records are forwarded.
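One possible shape for the suggested backoff, as a rough sketch only. `run_with_backoff`, `min_sleep`, and `should_stop` are made-up names for illustration and are not part of the connector's API:

```python
import time


def run_with_backoff(forward_events, frequency: float,
                     min_sleep: float = 0.1,
                     should_stop=lambda: False) -> None:
    """Loop with a small pause even when records keep arriving, so the
    loop never spins at full speed; back off up to `frequency` when idle."""
    delay = min_sleep
    while not should_stop():
        total_records = forward_events()
        if total_records > 0:
            delay = min_sleep                  # busy: minimal pause only
        else:
            delay = min(delay * 2, frequency)  # idle: exponential backoff
        time.sleep(delay)
```

This keeps the low-latency behavior for busy periods while capping CPU use, and still converges to the configured `frequency` when the bucket is quiet.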
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The change to `_chunk_events` adds a `counter` to avoid calling `len(chunk)`, but `len` on a list is already O(1) in CPython and still used in the final `if` condition; consider either reverting to `len(chunk)` entirely or consistently using the counter and updating the comment to avoid implying a performance issue that isn’t there.
- With the new `run` loop behavior, when records are continuously available there is no sleep at all, which may result in a tight loop consuming more CPU than before; consider keeping a minimal sleep or a backoff mechanism even when records are forwarded.
## Individual Comments
### Comment 1
<location> `AWS/connectors/s3/logs/base.py:214-218` </location>
<code_context>
# get and forward events
try:
events = self._fetch_events(self.bucket_name, self._move_marker(objects))
chunks = self._chunk_events(list(events), self.configuration.chunk_size or 10000)
for records in chunks:
- self.log(message=f"forwarding {len(records)} records", level="info")
</code_context>
<issue_to_address>
**suggestion (performance):** Avoid materializing all events into a list before chunking to reduce memory pressure.
Wrapping `self._fetch_events(...)` in `list(...)` removes the benefits of streaming and can cause high memory usage if many events are returned. Since `_chunk_events` only iterates over the sequence, you can pass the iterator directly, e.g. `chunks = self._chunk_events(events, ...)`, to preserve streaming behavior and reduce memory pressure.
```suggestion
# get and forward events
try:
events = self._fetch_events(self.bucket_name, self._move_marker(objects))
chunks = self._chunk_events(events, self.configuration.chunk_size or 10000)
for records in chunks:
```
</issue_to_address>
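The streaming point in Comment 1 can be demonstrated with a small sketch. `fetch_events` is a hypothetical generator standing in for `_fetch_events`, and the event count is chosen for illustration; the key is that passing the iterator straight into the chunker keeps at most one chunk in memory instead of all 25,000 events at once:

```python
from collections.abc import Iterator


def fetch_events() -> Iterator[dict]:
    # Stand-in for _fetch_events: yields events lazily rather than
    # returning a fully built list.
    for i in range(25_000):
        yield {"eventID": i}


def chunk_events(events, chunk_size):
    # Same shape as the connector's chunker: iterate once, emit lists.
    chunk = []
    for event in events:
        chunk.append(event)
        if len(chunk) >= chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


# No list(fetch_events()) here: each chunk is built and released in turn.
for records in chunk_events(fetch_events(), 10_000):
    pass  # e.g. send_records(records, ...)
```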
squioc
left a comment
Hello,
Thank you for the work.
However, I have doubts about the impact of these changes. The code belongs to deprecated triggers.
Hello, I have doubts about these changes.
The triggers related to this code (CloudTrailLogsTrigger and FlowlogRecordsTrigger) are deprecated.
- automation-library/AWS/main.py, lines 26 to 27 in f1a33c0
- automation-library/AWS/trigger_cloudtrail_logs.json, lines 3 to 4 in f1a33c0
- automation-library/AWS/trigger_flowlogs.json, lines 3 to 4 in f1a33c0
Relates to 1193
Summary by Sourcery
Adjust S3 log forwarding to be rate-aware by tracking and returning the number of forwarded CloudTrail records per run and sleeping only when no records are processed.
Bug Fixes:
Enhancements:
Tests:
Chores: