
Skyhigh: fix loss events #2065

Draft
TOUFIKIzakarya wants to merge 1 commit into develop from fix/skyhigh_loss_events

Conversation

TOUFIKIzakarya (Collaborator) commented Feb 26, 2026

Summary by Sourcery

Synchronize event collection and forwarding using batch identifiers so that ingestion checkpoints are only advanced after successful forwarding.

Bug Fixes:

  • Prevent loss of Skyhigh SWG events by deferring time range advancement until batches are confirmed pushed to the intake.

Enhancements:

  • Propagate batch identifiers through collector, transformer, and forwarder pipelines to track pending and processed batches.
  • Introduce a batch status queue to coordinate push confirmations between the forwarder and collector, avoiding premature checkpoint updates.

Tests:

  • Update and extend unit tests to cover batch ID propagation, transformer behavior with tagged batches, and confirmation of batch completion by the forwarder.
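
The collect-then-confirm handshake described above can be sketched with plain queue.Queue objects. This is a simplified stand-in, not the PR's code: only batch_status_queue, the (batch_id, payload) tuples, and the confirm-before-checkpoint rule mirror the change; the function and variable names here are illustrative.

```python
import queue
import threading
import uuid

collect_queue: "queue.Queue" = queue.Queue()
batch_status_queue: "queue.Queue" = queue.Queue()
checkpoints = []  # stands in for the persisted time-range checkpoint

def collector(n_batches: int) -> None:
    for i in range(n_batches):
        batch_id = str(uuid.uuid4())
        collect_queue.put((batch_id, f"payload-{i}"))
        # Only advance the checkpoint once the forwarder confirms the batch.
        try:
            confirmed = batch_status_queue.get(timeout=5)
        except queue.Empty:
            continue  # timeout: the checkpoint is NOT advanced
        if confirmed == batch_id:
            checkpoints.append(batch_id)

def forwarder(n_batches: int) -> None:
    for _ in range(n_batches):
        batch_id, payload = collect_queue.get()
        # ... push `payload` to the intake here ...
        batch_status_queue.put(batch_id)  # confirm the successful push

t1 = threading.Thread(target=collector, args=(3,))
t2 = threading.Thread(target=forwarder, args=(3,))
t1.start(); t2.start(); t1.join(); t2.join()
```

If the forwarder never confirms, the collector times out and the checkpoint stays put, which is exactly the loss-prevention property the fix is after.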

sourcery-ai bot (Contributor) commented Feb 26, 2026

Reviewer's Guide

Implements batch-level tracking and confirmation between event collection, transformation, and forwarding to avoid losing Skyhigh SWG events, wiring a new batch status queue through the pipeline and updating tests accordingly.

Sequence diagram for Skyhigh SWG batch tracking and confirmation

```mermaid
sequenceDiagram
    participant SkyhighSecuritySWGTrigger as SkyhighSecuritySWGTrigger
    participant EventCollector as EventCollector
    participant Transformers as EventsTransformer
    participant EventsForwarder as EventsForwarder
    participant Intake as IntakeService
    participant BatchStatusQueue as batch_status_queue

    SkyhighSecuritySWGTrigger->>EventCollector: start(collect_queue, batch_status_queue)
    SkyhighSecuritySWGTrigger->>Transformers: start(collect_queue, forwarding_queue)
    SkyhighSecuritySWGTrigger->>EventsForwarder: start(forwarding_queue, batch_status_queue)

    loop periodic_collection
        EventCollector->>EventCollector: query_api()
        alt response available
            EventCollector->>EventCollector: generate batch_id
            EventCollector->>EventCollector: pending_batches[batch_id] = end_date
            EventCollector->>Transformers: put((batch_id, response)) on collect_queue

            EventCollector->>BatchStatusQueue: get(timeout=60)
            alt confirmed matching batch_id
                BatchStatusQueue-->>EventCollector: confirmed_batch_id
                EventCollector->>EventCollector: pending_batches.pop(batch_id)
                EventCollector->>EventCollector: _update_time_range()
                EventCollector->>EventCollector: _sleep_until_next_batch()
            else different batch_id
                BatchStatusQueue-->>EventCollector: other_batch_id
                EventCollector->>BatchStatusQueue: put(other_batch_id)
            end
        else no response
            EventCollector->>EventCollector: log No messages to forward
            EventCollector->>EventCollector: _update_time_range()
            EventCollector->>EventCollector: _sleep_until_next_batch()
        end
    end

    loop transform_loop
        Transformers->>Transformers: get((batch_id, response)) from collect_queue
        Transformers->>Transformers: for messages in batched(_transform(response))
        Transformers->>EventsForwarder: put((batch_id, list(messages))) on forwarding_queue
    end

    loop forward_loop
        EventsForwarder->>EventsForwarder: next_batch(max_batch_size)
        EventsForwarder->>EventsForwarder: collect batch_ids, events from forwarding_queue
        alt events not empty
            EventsForwarder->>Intake: push_events_to_intakes(events)
            Intake-->>EventsForwarder: success
            EventsForwarder->>EventsForwarder: processed_batches.add(batch_id)
            EventsForwarder->>BatchStatusQueue: put(batch_id)
        else no events
            EventsForwarder->>EventsForwarder: wait
        end
    end
```

Updated class diagram for Skyhigh SWG batch-aware workers

```mermaid
classDiagram
    class SkyhighSecuritySWGTrigger {
        +SkyhighSWGConfig configuration
        +run()
    }

    class EventCollector {
        +SkyhighSecuritySWGTrigger connector
        +queue~Queue~ events_queue
        +queue~Queue~ batch_status_queue
        +datetime trigger_activation
        +dict headers
        +str endpoint
        +datetime end_date
        +datetime start_date
        +str url
        +dict pending_batches
        +log(*args, **kwargs)
        +query_api()
        +next_batch()
        +_update_time_range()
        +_sleep_until_next_batch()
        +run()
    }

    class EventsTransformer {
        +SkyhighSecuritySWGTrigger connector
        +SkyhighSWGConfig configuration
        +queue~Queue~ queue
        +queue~Queue~ output_queue
        +int max_batch_size
        +run()
    }

    class EventsForwarder {
        +SkyhighSecuritySWGTrigger connector
        +SkyhighSWGConfig configuration
        +queue~Queue~ queue
        +queue~Queue~ batch_status_queue
        +int max_batch_size
        +set processed_batches
        +tuple~set, list~ next_batch(int max_batch_size)
        +run()
    }

    SkyhighSecuritySWGTrigger --> EventCollector : creates
    SkyhighSecuritySWGTrigger --> EventsTransformer : creates
    SkyhighSecuritySWGTrigger --> EventsForwarder : creates

    EventCollector --> EventsTransformer : puts(batch_id, response) on events_queue
    EventsTransformer --> EventsForwarder : puts(batch_id, messages) on output_queue
    EventsForwarder --> EventCollector : confirms batch_id via batch_status_queue
```

File-Level Changes

Change: Introduce batch IDs and a batch status queue to coordinate between collector, transformer, and forwarder so checkpoints only advance after a successful intake push.
  • Extend the EventCollector, Transformer, and EventsForwarder constructors and run loops to pass (batch_id, payload) tuples instead of raw messages.
  • Have EventCollector generate UUID batch IDs, track them in a pending_batches map, enqueue them with responses, and wait for a batch_status_queue confirmation before advancing the time range.
  • Modify Transformer to propagate batch IDs along with transformed event lists into the forwarding queue.
  • Update EventsForwarder to aggregate batch_ids with events, push the events to the intake, and then send one confirmation per processed batch_id on the batch_status_queue while deduplicating confirmations.
Files: SkyhighSecurity/gateway_cloud_services/trigger_skyhigh_security_swg.py

Change: Wire the new batch_status_queue through trigger startup and tests, and adapt the tests to the new (batch_id, events) interfaces and confirmation behavior.
  • Create a batch_status_queue in the trigger run method and pass it to the EventCollector, Transformer, and EventsForwarder instances, including on the restart path.
  • Add pytest fixtures for batch_status_queue and update the event_collector and forwarder fixtures to accept it.
  • Adjust transformer and forwarder tests to work with (batch_id, payload) tuples and assert that batch confirmation is sent on batch_status_queue.
Files: SkyhighSecurity/gateway_cloud_services/trigger_skyhigh_security_swg.py
SkyhighSecurity/tests/test_gateway_cloud_services_trigger.py
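
The forwarder-side "one confirmation per processed batch_id, deduplicated" step described above could look like this sketch. confirm_batches is a hypothetical helper, not the PR's actual code; only the batch_status_queue / processed_batches names come from the change.

```python
import queue

def confirm_batches(batch_ids, batch_status_queue, processed_batches):
    """Send one confirmation per batch_id, skipping already-confirmed ones."""
    for batch_id in batch_ids:
        if batch_id not in processed_batches:
            try:
                batch_status_queue.put(batch_id, block=False)
                processed_batches.add(batch_id)
            except queue.Full:
                # Collector will time out and retry without advancing its checkpoint.
                pass

status_q: "queue.Queue" = queue.Queue()
seen: set = set()
confirm_batches(["a", "b", "a"], status_q, seen)  # "a" appears twice
confirmed = []
while not status_q.empty():
    confirmed.append(status_q.get_nowait())
```

Marking a batch as processed only after the put succeeds keeps a full queue from silently swallowing a confirmation.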


Copilot AI (Contributor) left a comment

Pull request overview

This PR aims to prevent Skyhigh SWG event loss by introducing a “batch ID + push confirmation” mechanism so the collector only advances its checkpoint after events are forwarded successfully.

Changes:

  • Add a batch_status_queue and batch IDs to coordinate collector → transformer → forwarder processing.
  • Modify the collector to wait for batch push confirmation before updating the time range.
  • Update tests to account for the new batch/confirmation flow.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File: SkyhighSecurity/gateway_cloud_services/trigger_skyhigh_security_swg.py — Introduces batch IDs and a status queue to gate checkpoint updates on successful forwarding.
File: SkyhighSecurity/tests/test_gateway_cloud_services_trigger.py — Adjusts fixtures/tests to pass and assert batch/confirmation data through the pipeline.
Comments suppressed due to low confidence (1)

SkyhighSecurity/gateway_cloud_services/trigger_skyhigh_security_swg.py:404

  • EventsForwarder now requires batch_status_queue as the third constructor argument, but Workers.create(...) still instantiates it with (self, forwarding_queue, batch_size). That means batch_size (an int) will be passed as batch_status_queue, leading to runtime failures when the forwarder tries to put() confirmations. Update the worker creation call to pass batch_status_queue before batch_size.
```python
        batch_status_queue: queue.Queue = queue.Queue(maxsize=100)

        # start the event forwarder
        batch_size = int(os.environ.get("BATCH_SIZE", 10000))
        forwarders = Workers.create(
```

Comment on lines +202 to +225
```python
try:
    confirmed_batch_id = self.batch_status_queue.get(
        block=True, timeout=60
    )  # 60 second timeout
    if confirmed_batch_id == batch_id:
        self.log(
            message=f"Batch {batch_id} confirmed pushed", level="debug"
        )
        # Remove from pending
        self.pending_batches.pop(batch_id, None)
    else:
        self.log(
            message=f"Received confirmation for {confirmed_batch_id} but waiting for {batch_id}",
            level="warning",
        )
        # Put it back for next iteration
        self.batch_status_queue.put(confirmed_batch_id)
except queue.Empty:
    self.log(
        message=f"Timeout waiting for batch {batch_id} confirmation. Batch may still be processing.",
        level="warning",
    )
    # Note: We don't remove from pending, checkpoint won't be saved
    return
```
Copilot AI commented Feb 26, 2026

When a confirmation is received for a different batch ID, the code logs and re-queues that ID, but then continues and updates the time range anyway. This can advance the checkpoint even though the current batch_id hasn't been confirmed. Consider looping until the expected batch_id is confirmed (or a deadline is reached) and only then updating the time range.

Suggested change (replaces the block quoted above):

```python
# Wait up to 60 seconds for the specific batch_id to be confirmed.
deadline = datetime.now(timezone.utc) + timedelta(seconds=60)
while True:
    remaining = (deadline - datetime.now(timezone.utc)).total_seconds()
    if remaining <= 0:
        self.log(
            message=(
                f"Timeout waiting for batch {batch_id} confirmation. "
                "Batch may still be processing."
            ),
            level="warning",
        )
        # Note: We don't remove from pending, checkpoint won't be saved
        return
    try:
        confirmed_batch_id = self.batch_status_queue.get(
            block=True,
            timeout=remaining,
        )
    except queue.Empty:
        self.log(
            message=(
                f"Timeout waiting for batch {batch_id} confirmation. "
                "Batch may still be processing."
            ),
            level="warning",
        )
        # Note: We don't remove from pending, checkpoint won't be saved
        return
    if confirmed_batch_id == batch_id:
        self.log(
            message=f"Batch {batch_id} confirmed pushed",
            level="debug",
        )
        # Remove from pending
        self.pending_batches.pop(batch_id, None)
        break
    else:
        self.log(
            message=(
                f"Received confirmation for {confirmed_batch_id} "
                f"but waiting for {batch_id}"
            ),
            level="warning",
        )
        # Put it back for next iteration
        self.batch_status_queue.put(confirmed_batch_id)
```

```python
        message=f"Timeout waiting for batch {batch_id} confirmation. Batch may still be processing.",
        level="warning",
    )
    # Note: We don't remove from pending, checkpoint won't be saved
```

Copilot AI Feb 26, 2026


On confirmation timeout, next_batch() returns immediately. In run(), that causes the collector loop to re-query the same time range again without sleeping, potentially hammering the API and growing pending_batches indefinitely if confirmations never arrive. Consider sleeping/backing off on timeout and cleaning up or retrying the pending batch instead of returning to a tight loop.

Suggested change:

```python
# Note: We don't remove from pending, checkpoint won't be saved
# Sleep before returning to avoid tight loop and API hammering
self._sleep_until_next_batch()
```

Comment on lines +363 to +367

```python
# Confirm batches after successful push
for batch_id in batch_ids:
    if batch_id not in self.processed_batches:
        try:
            self.batch_status_queue.put(batch_id, block=False)
```

Copilot AI Feb 26, 2026


Batch confirmation is currently sent as soon as any events for a batch_id are pushed. However, the Transformer can emit multiple message chunks with the same batch_id (when the CSV response exceeds max_batch_size). Confirming after the first chunk can let the collector advance the checkpoint while later chunks are still queued, causing event loss on restart. Consider adding an explicit end-of-batch marker/count so the forwarder only confirms after the last chunk of a batch has been successfully pushed.
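
A minimal sketch of the end-of-batch-marker idea, assuming the transformer tags each chunk with an is_last flag. None of these names exist in the PR; this is one possible shape for the fix, not the project's code.

```python
import queue

def forward_with_eob(forwarding_queue, batch_status_queue):
    """Confirm a batch only once its final chunk (is_last flag) has been pushed."""
    while True:
        item = forwarding_queue.get()
        if item is None:  # shutdown sentinel, for this sketch only
            return
        batch_id, events, is_last = item
        # ... push `events` to the intake here ...
        if is_last:
            batch_status_queue.put(batch_id)  # whole batch is now safe to checkpoint

fq: "queue.Queue" = queue.Queue()
sq: "queue.Queue" = queue.Queue()
fq.put(("b1", ["e1"], False))  # first chunk: no confirmation yet
fq.put(("b1", ["e2"], True))   # last chunk: confirm the whole batch
fq.put(None)
forward_with_eob(fq, sq)
```

A chunk count sent up front (instead of a flag on the last chunk) would work equally well and tolerates chunk reordering.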

Comment on lines +320 to +322

```python
self.processed_batches: set = (
    set()
)  # Track which batch_ids we've already confirmed
```

Copilot AI Feb 26, 2026


self.processed_batches grows by one UUID per collected batch and is never pruned, so the forwarder will leak memory over long runtimes. If you need de-duplication, consider a bounded/TTL structure, or redesign the batching/confirmation so a growing global set isn't required.
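
One bounded alternative to the ever-growing set is an insertion-ordered dict used as a fixed-size window. This is a sketch under stated assumptions, not the PR's code; BoundedSeen is a hypothetical name.

```python
from collections import OrderedDict

class BoundedSeen:
    """Bounded stand-in for a growing processed_batches set: remembers at
    most `maxlen` batch ids, evicting the oldest first."""

    def __init__(self, maxlen: int = 10_000) -> None:
        self._seen: "OrderedDict[str, None]" = OrderedDict()
        self._maxlen = maxlen

    def add(self, batch_id: str) -> bool:
        """Return True if batch_id was not seen yet (i.e. should be confirmed)."""
        if batch_id in self._seen:
            return False
        self._seen[batch_id] = None
        if len(self._seen) > self._maxlen:
            self._seen.popitem(last=False)  # evict the oldest id
        return True

seen = BoundedSeen(maxlen=2)
results = [seen.add(b) for b in ["a", "b", "a", "c"]]
```

The window size bounds memory at the cost of possibly re-confirming a very old batch id, which the collector already tolerates by re-queueing unexpected confirmations.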

```diff
 @pytest.fixture
-def event_collector(trigger, events_queue):
-    return EventCollector(trigger, events_queue)
+def event_collector(trigger, events_queue, batch_status_queue):
```

Copilot AI Feb 26, 2026


The event_collector fixture now passes an empty batch_status_queue, but EventCollector.next_batch() blocks waiting for a confirmation (up to 60s). Tests like test_next_batch/test_next_batch_error_should_wait that call next_batch() without arranging a confirmation will hang or time out and will no longer observe the expected time range update. Update those tests to pre-populate batch_status_queue (or patch the confirmation wait) so they match the new behavior.

Suggested change:

```python
def event_collector(trigger, events_queue, batch_status_queue):
    # Pre-populate the batch_status_queue so that calls to EventCollector.next_batch()
    # have an immediate confirmation available and do not block waiting on an empty queue.
    batch_status_queue.put(None)
```
Comment on lines +173 to +174
```python
batch_status_queue = queue.Queue()
transformer = Transformer(trigger, input_queue, events_queue, batch_status_queue)
```

Copilot AI Feb 26, 2026


Transformer.__init__ currently accepts (connector, queue, output_queue, max_batch_size=...), but this test passes an extra batch_status_queue positional argument. This will be interpreted as max_batch_size (a Queue), causing runtime errors when batched(..., self.max_batch_size) runs. Update the test to match the actual Transformer signature (or update Transformer if it is supposed to take a status queue).

Suggested change:

```python
transformer = Transformer(trigger, input_queue, events_queue)
```
Comment on lines +181 to +183
```python
batch_ids, events = events_queue.get(block=False)
assert events == ["user_id=-1 username=foo"]
assert "batch-1" in batch_ids
```

Copilot AI Feb 26, 2026


Transformer puts (batch_id, list(messages)) into the output queue, but this test unpacks the item as (batch_ids, events) and then asserts membership ("batch-1" in batch_ids). This is currently just checking substring membership on a string and doesn't validate the intended contract. Update the test to unpack (batch_id, events) and assert equality on batch_id.

Suggested change:

```python
batch_id, events = events_queue.get(block=False)
assert events == ["user_id=-1 username=foo"]
assert batch_id == "batch-1"
```