feat: Add Anozrway connector #2025

Draft
helkabouss wants to merge 21 commits into SEKOIA-IO:develop from Akonis-cybersecurity:feat/anozrway


@helkabouss helkabouss commented Feb 12, 2026

Description

This PR adds a new connector for Anozrway to the SEKOIA.IO automation library.

Features

  • Historical Connector: Fetch historical events from Anozrway API
  • HTTP Client: Robust client with error handling and retry logic
  • Metrics: Prometheus metrics integration for monitoring
  • Tests: Comprehensive unit tests with high coverage
  • Docker: Containerized deployment ready

Components

  • anozrway_modules/client/: HTTP client and error handling
  • anozrway_modules/historical_connector.py: Main connector logic
  • anozrway_modules/models.py: Data models
  • anozrway_modules/metrics.py: Metrics collection
  • tests/: Complete test suite
  • manifest.json: Connector configuration
  • Dockerfile: Container configuration

Type of change

  • New connector/integration
  • Documentation included
  • Tests included

Testing

  • Unit tests pass locally
  • Manual testing completed
  • Docker build successful

Checklist

  • Code follows the project's style guidelines
  • Self-review completed
  • Documentation added
  • Tests added and passing
  • No breaking changes

Summary by Sourcery

Introduce an Anozrway integration module providing a historical connector that ingests leak detection events from the Anozrway Balise Pipeline into SEKOIA.IO, backed by an OAuth2-enabled HTTP client and observability via Prometheus metrics.

New Features:

  • Add an Anozrway historical connector that periodically fetches domain-based leak events and forwards them to a SEKOIA intake with checkpointing and de-duplication.
  • Introduce an OAuth2-based Anozrway HTTP client supporting domain search and events retrieval with configurable endpoints and headers.
  • Expose Anozrway configuration and metadata via a dedicated module, manifest, and connector registration entrypoint.

Enhancements:

  • Add Prometheus counters, gauges, and histograms to track collected, forwarded, and duplicate events as well as API usage and checkpoint age.
  • Containerize the Anozrway module with a Dockerfile and Poetry-based dependency management for deployment.

Build:

  • Introduce a pyproject.toml with Poetry configuration, runtime and test dependencies, formatting and coverage settings for the Anozrway package.

Documentation:

  • Document the new Anozrway connector and module metadata via manifest and changelog entries.

Tests:

  • Add comprehensive unit tests for the Anozrway HTTP client, historical connector behavior, and supporting module, models, and metrics with high coverage thresholds enforced via pytest configuration.


sourcery-ai bot commented Feb 12, 2026

Reviewer's Guide

Implements a new Anozrway historical connector module for SEKOIA.IO, including an async HTTP client with OAuth2, rate limiting and retry logic, a stateful historical connector with deduplication and checkpointing, Prometheus metrics, packaging/manifest/Docker support, and a comprehensive pytest suite.

Sequence diagram for Anozrway historical collection run loop

sequenceDiagram
    actor Operator
    participant Module as SekoiaModule
    participant Connector as AnozrwayHistoricalConnector
    participant Client as AnozrwayClient
    participant API as AnozrwayAPI
    participant Intake as SekoiaIntake
    participant Metrics as PrometheusMetrics

    Operator->>Module: start module.run()
    Module->>Connector: register and start anozrway_historical
    activate Connector

    loop while Connector.running
        Connector->>Client: __aenter__(module_config, trigger)
        activate Client
        Client->>API: _get_access_token() via token_url
        API-->>Client: access_token

        loop fetch_events windows
            Connector->>Client: fetch_events(context, domain, start_date, end_date)
            activate Client
            Client->>API: POST /events
            API-->>Client: { events: [...] }
            Client-->>Connector: List events
            deactivate Client

            Connector->>Connector: deduplicate, enrich, update max_seen_ts
            Connector->>Metrics: api_requests, api_request_duration, events_collected, events_duplicated

            alt non_empty_batch
                Connector->>Intake: push_data_to_intakes(events=batch)
                Connector->>Metrics: events_forwarded
            end
        end

        Connector->>Connector: save_checkpoint(last_seen+1s)
        Connector->>Metrics: checkpoint_age

        Client->>Client: __aexit__()
        deactivate Client

        Connector->>Connector: sleep(configuration.frequency)
    end

    Connector-->>Module: shutdown
    deactivate Connector
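
The windowed fetch and the `save_checkpoint(last_seen+1s)` step in the diagram above can be sketched as follows. This is an illustration under assumptions, not the connector's code: `iter_windows` and `next_checkpoint` are hypothetical helper names, and treating windows as half-open ranges clamped to the end time is a guess about the intended behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def iter_windows(
    start: datetime, end: datetime, window_seconds: int
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (window_start, window_end) ranges covering start..end."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    step = timedelta(seconds=window_seconds)
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)  # clamp the last window to `end`
        yield cursor, upper
        cursor = upper


def next_checkpoint(last_seen: datetime) -> datetime:
    """Move one second past the newest event so it is not fetched again."""
    return last_seen + timedelta(seconds=1)


if __name__ == "__main__":
    begin = datetime(2026, 2, 12, tzinfo=timezone.utc)
    for lower, upper in iter_windows(begin, begin + timedelta(minutes=25), 600):
        print(lower.isoformat(), "->", upper.isoformat())
```

Advancing the checkpoint by one second avoids re-reading the last event, at the cost of possibly skipping other events that share the same second; the de-duplication cache is what makes a more conservative checkpoint affordable.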

Class diagram for Anozrway connector, client and module

classDiagram
    class Module
    class Trigger
    class AsyncConnector
    class DefaultConnectorConfiguration
    class BaseModel

    class AnozrwayModule {
        +str name
        +str description
    }

    class AnozrwayModuleConfiguration {
    }

    class AnozrwayHistoricalConfiguration {
        +Optional~str~ intake_server
        +str intake_key
        +int frequency
        +int chunk_size
        +str context
        +str domains
        +int lookback_days
        +int window_seconds
    }

    class AnozrwayHistoricalConnector {
        +str name
        +AnozrwayHistoricalConfiguration configuration
        -PersistentJSON context_store
        -PersistentJSON event_cache_store
        -timedelta event_cache_ttl
        +last_checkpoint() datetime
        +save_checkpoint(last_seen datetime) void
        +fetch_events(client AnozrwayClient) AsyncGenerator~List~
        +next_batch() AsyncGenerator~List~
        +run() void
        +_async_run() void
        -_parse_domains() List~str~
        -_cleanup_event_cache() void
        -_is_new_event(cache_key str) bool
        +_compute_dedup_key(searched_domain str, event Dict) str
        +_extract_event_ts(event Dict) datetime
        +_extract_entity_id(event Dict) str
        +_safe_str(x Any) str
    }

    class AnozrwayClient {
        -Dict cfg
        -Optional~Trigger~ trigger
        -str base_url
        -str token_url
        -str client_id
        -str client_secret
        -Any x_restrict_access
        -int timeout
        -Optional~aiohttp.ClientSession~ _session
        -Optional~str~ _access_token
        -Optional~datetime~ _token_expires_at
        -AsyncLimiter _rate_limiter
        +AnozrwayClient(module_config Dict, trigger Trigger)
        +log(message str, level str) void
        +search_domain_v1(context str, domain str, start_date datetime, end_date datetime) List~Dict~
        +fetch_events(context str, domain str, start_date datetime, end_date datetime) List~Dict~
        +__aenter__() AnozrwayClient
        +__aexit__(exc_type Any, exc_val Any, exc_tb Any) void
        -_get_access_token() str
        -_to_iso(dt datetime) str
    }

    class AnozrwayError {
    }

    class AnozrwayAuthError {
    }

    class AnozrwayRateLimitError {
    }

    AnozrwayModule --|> Module
    AnozrwayModuleConfiguration --|> BaseModel
    AnozrwayHistoricalConfiguration --|> DefaultConnectorConfiguration
    AnozrwayHistoricalConnector --|> AsyncConnector
    AnozrwayHistoricalConnector o--> AnozrwayHistoricalConfiguration
    AnozrwayHistoricalConnector o--> AnozrwayClient

    AnozrwayClient ..> Trigger : optional

    AnozrwayError --|> Exception
    AnozrwayAuthError --|> AnozrwayError
    AnozrwayRateLimitError --|> AnozrwayError
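
The exception hierarchy at the bottom of the diagram is small enough to write out. The three class names and their inheritance come from the diagram; the docstrings and the `classify` helper are assumptions added to show why a shared base class is convenient for callers.

```python
class AnozrwayError(Exception):
    """Base class for errors raised by the Anozrway client."""


class AnozrwayAuthError(AnozrwayError):
    """Authentication failed or the token was rejected."""


class AnozrwayRateLimitError(AnozrwayError):
    """The API kept rate limiting the client."""


def classify(exc: Exception) -> str:
    """Hypothetical helper: map an exception to a handling strategy."""
    if isinstance(exc, AnozrwayAuthError):
        return "stop"  # credentials need fixing, retrying will not help
    if isinstance(exc, AnozrwayRateLimitError):
        return "backoff"  # wait before the next collection cycle
    if isinstance(exc, AnozrwayError):
        return "retry"  # any other client error
    return "raise"  # not ours, let it propagate


if __name__ == "__main__":
    print(classify(AnozrwayAuthError("401")))
```

Checking the most specific classes first matters here, because both subclasses also match `AnozrwayError`.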

File-Level Changes

Change: Introduce asynchronous Anozrway HTTP client with OAuth2, rate limiting, and robust error handling for domain search and events endpoints.
Details:
  • Create AnozrwayClient with configurable base URL, token URL, client credentials, timeout and optional x-restrict-access header
  • Implement OAuth2 client-credentials flow with token caching and proactive expiry handling in _get_access_token
  • Add search_domain_v1 and fetch_events methods that build JSON payloads, apply AsyncLimiter rate limiting, handle 401/429/retry with backoff, and normalize non-list responses to empty lists
  • Provide async context manager that manages aiohttp.ClientSession lifecycle and validates token on __aenter__
  • Define custom exception hierarchy AnozrwayError, AnozrwayAuthError, and AnozrwayRateLimitError for clearer error semantics
Files:
  • Anozrway/anozrway_modules/client/http_client.py
  • Anozrway/anozrway_modules/client/errors.py
  • Anozrway/tests/test_http_client.py

Change: Add historical connector that batches, deduplicates, and forwards Anozrway leak events while maintaining checkpoints and an event cache.
Details:
  • Define AnozrwayHistoricalConfiguration extending DefaultConnectorConfiguration with intake, frequency, chunking, domain list, context, lookback and window parameters
  • Implement AnozrwayHistoricalConnector as AsyncConnector using PersistentJSON stores for context and event cache, with a TTL-based cleanup mechanism
  • Compute deduplication keys from domain, nom_fuite, and event timestamp; track seen events in cache and expose duplicate metrics
  • Implement sliding-window event collection across configured domains, enriching events with _searched_domain and _context, normalizing download_links, and updating checkpoints based on max event timestamp
  • Provide next_batch and _async_run loop to stream batches to push_data_to_intakes with retry/backoff semantics and special handling for authentication failures
Files:
  • Anozrway/anozrway_modules/historical_connector.py
  • Anozrway/tests/test_historical_connector.py

Change: Expose Prometheus metrics and basic module scaffolding for the Anozrway integration.
Details:
  • Create Prometheus counters/gauges/histograms for events collected/forwarded/duplicated, API request counts and durations, and checkpoint age
  • Add AnozrwayModule wrapper with name/description plus an empty AnozrwayModuleConfiguration model
  • Wire the historical connector into a runnable module entrypoint and register it under the slug anozrway_historical
Files:
  • Anozrway/anozrway_modules/metrics.py
  • Anozrway/anozrway_modules/__init__.py
  • Anozrway/anozrway_modules/models.py
  • Anozrway/main.py
  • Anozrway/tests/test_misc.py

Change: Provide packaging, manifest, Docker, and test configuration for the new connector module.
Details:
  • Add Poetry project configuration with runtime and dev dependencies, formatting and coverage settings, and pytest options enforcing high coverage
  • Define manifest.json with connector configuration schema, defaults, secrets, and metadata (name, slug, uuid, categories)
  • Create Dockerfile that installs dependencies via Poetry and runs the module under a non-root user
  • Add pytest fixtures for ephemeral symphony storage and include auxiliary project files like changelog, logo, and connector definition JSON
Files:
  • Anozrway/pyproject.toml
  • Anozrway/manifest.json
  • Anozrway/Dockerfile
  • Anozrway/tests/conftest.py
  • Anozrway/CHANGELOG.md
  • Anozrway/connector_anozrway_historical.json
  • Anozrway/logo.svg
  • Anozrway/.gitignore
  • Anozrway/poetry.lock
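
The TTL-based event cache described for the historical connector can be illustrated with a small in-memory stand-in. This is a sketch under assumptions: the real connector persists its cache through PersistentJSON, whereas the `EventCache` class below, its method names, and its dictionary layout are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


class EventCache:
    """In-memory stand-in for the persisted event cache (hypothetical)."""

    def __init__(self, ttl: timedelta) -> None:
        self._ttl = ttl
        self._seen: Dict[str, datetime] = {}  # dedup key -> first time seen

    def cleanup(self, now: Optional[datetime] = None) -> int:
        """Drop entries older than the TTL and return how many were removed."""
        now = now or datetime.now(timezone.utc)
        expired = [
            key for key, seen_at in self._seen.items() if now - seen_at > self._ttl
        ]
        for key in expired:
            del self._seen[key]
        return len(expired)

    def is_new(self, key: str, now: Optional[datetime] = None) -> bool:
        """Return True the first time a key is seen, False for duplicates."""
        now = now or datetime.now(timezone.utc)
        if key in self._seen:
            return False
        self._seen[key] = now
        return True


if __name__ == "__main__":
    cache = EventCache(ttl=timedelta(hours=1))
    print(cache.is_new("example.com|leak|2026-02-12T00:00:00Z"))
    print(cache.is_new("example.com|leak|2026-02-12T00:00:00Z"))
```

The TTL bounds the cache size, but it also means an event re-delivered after the TTL has elapsed is treated as new, so the TTL should comfortably exceed the lookback period.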



@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 4 issues, and left some high level feedback:

  • The Dockerfile uses python:3.11 while pyproject.toml declares python = "^3.12"; aligning these versions will avoid surprises at runtime and ensure the image matches the declared runtime constraints.
  • In AnozrwayClient.__aenter__, if _get_access_token raises (e.g. bad credentials), the aiohttp.ClientSession is never closed; consider wrapping the token fetch in a try/except that closes the session on failure to avoid leaking connections.
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `Anozrway/anozrway_modules/historical_connector.py:154` </location>
<code_context>
+        """
+        nom_fuite = cls._safe_str(event.get("nom_fuite")).strip().lower()
+        ts = cls._extract_event_ts(event)
+        ts_s = ts.isoformat().replace("+00:00", "Z") if ts else ""
+
+        raw = "|".join(
</code_context>

<issue_to_address>
**issue (bug_risk):** Events without a parsable timestamp will all share the same dedup key segment, potentially collapsing distinct events.

When `ts` is `None`, `ts_s` becomes an empty string, so all events with the same `(searched_domain, nom_fuite)` but missing/unparseable timestamps collapse to one dedup key. If multiple such events are valid from the upstream API, this will drop data. Consider falling back to another stable field (or the raw timestamp string) instead of `""` to distinguish them.
</issue_to_address>
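
One possible shape for such a fallback is sketched below. It is an assumption-laden illustration rather than the connector's `_compute_dedup_key`: the function signature, the use of SHA-256, the `nots:` marker, and fingerprinting the whole event when the timestamp is missing are all choices made for this example. A naive `ts` would be interpreted as local time by `astimezone`.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def compute_dedup_key(
    searched_domain: str, event: Dict[str, Any], ts: Optional[datetime]
) -> str:
    nom_fuite = str(event.get("nom_fuite") or "").strip().lower()
    if ts is not None:
        ts_part = ts.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
    else:
        # Fallback (assumption): fingerprint the whole event so that distinct
        # events without a parsable timestamp still get distinct keys.
        canonical = json.dumps(event, sort_keys=True, default=str)
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        ts_part = "nots:" + digest[:16]
    raw = "|".join([searched_domain.strip().lower(), nom_fuite, ts_part])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # The "email" field is a made-up example attribute.
    first = {"nom_fuite": "Leak A", "email": "x@example.com"}
    second = {"nom_fuite": "Leak A", "email": "y@example.com"}
    print(compute_dedup_key("example.com", first, None))
    print(compute_dedup_key("example.com", second, None))
```

Serializing with `sort_keys=True` keeps the fingerprint stable when the API returns the same fields in a different order.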

### Comment 2
<location> `Anozrway/anozrway_modules/historical_connector.py:282-285` </location>
<code_context>
+        import asyncio
+        import signal
+
+        loop = asyncio.get_event_loop()
+
+        def handle_stop_signal():
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Using `asyncio.get_event_loop()` in `run` can be fragile on newer Python versions and when no default loop exists.

On Python 3.11+ this call is deprecated when no loop is running and may raise in some environments. Since this is the connector entrypoint, prefer creating and setting an explicit loop, e.g. `loop = asyncio.new_event_loop(); asyncio.set_event_loop(loop)`, rather than relying on the implicit global loop.

```suggestion
        import asyncio
        import signal

        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
```
</issue_to_address>

### Comment 3
<location> `Anozrway/tests/test_misc.py:29-35` </location>
<code_context>
+    assert AnozrwayModuleConfiguration() is not None
+
+
+def test_metrics_smoke():
+    assert events_collected._name == "anozrway_historical_events_collected"
+    assert events_forwarded._name == "anozrway_historical_events_forwarded"
+    assert events_duplicated._name == "anozrway_historical_events_duplicated"
+    assert api_requests._name == "anozrway_api_requests"
+    assert api_request_duration._name == "anozrway_api_request_duration_seconds"
+    assert checkpoint_age._name == "anozrway_checkpoint_age_seconds"
</code_context>

<issue_to_address>
**issue (testing):** Metric name expectations don't match the Prometheus convention used in the implementation

`Counter` metrics in `metrics.py` are defined with the `_total` suffix (e.g. `anozrway_historical_events_collected_total`), but these assertions expect names without `_total`, so `test_metrics_smoke` will fail despite correct metric definitions. Please align the expected names with the actual ones (including `_total` for counters) or relax the checks (e.g. `endswith("_events_collected_total")`) to assert the intended semantics rather than the full literal name.
</issue_to_address>

### Comment 4
<location> `Anozrway/anozrway_modules/client/http_client.py:89` </location>
<code_context>
+    def _to_iso(dt: datetime) -> str:
+        return dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
+
+    async def search_domain_v1(
+        self,
+        context: str,
</code_context>

<issue_to_address>
**issue (complexity):** Consider extracting the shared POST/retry/backoff/response-handling logic into a single private helper so the two public methods only construct payloads and delegate to it.

You can remove a lot of duplication by centralizing the “POST + retry + backoff + result extraction” logic into a single private helper, then make the two public methods just build payloads and delegate.

A focused refactor could look like this:

```python
# helper inside AnozrwayClient
async def _post_with_retry(
    self,
    path: str,
    payload: Dict[str, Any],
    *,
    result_key: str,
    unauthorized_msg: str,
    generic_error_msg: str,
) -> List[Dict[str, Any]]:
    if not self._session:
        raise AnozrwayError("HTTP session not initialized")

    url = f"{self.base_url}{path}"

    max_attempts = 3
    attempt = 0
    backoff = 1

    while attempt < max_attempts:
        attempt += 1

        # Build the headers on every attempt so that a retry after a 401
        # sends a freshly fetched token instead of the rejected one.
        access_token = await self._get_access_token()
        headers = {
            "Content-Type": "application/json",
            "authorization": f"Bearer {access_token}",
        }
        if self.x_restrict_access:
            headers["x-restrict-access"] = str(self.x_restrict_access)

        async with self._rate_limiter:
            async with self._session.post(
                url,
                json=payload,
                headers=headers,
                timeout=self.timeout,
                raise_for_status=False,
            ) as resp:
                status = resp.status

                if status == 401:
                    # drop the cached token, then retry while attempts remain
                    self._access_token = None
                    self._token_expires_at = None
                    if attempt < max_attempts:
                        continue
                    raise AnozrwayAuthError(unauthorized_msg)

                if status == 429:
                    await asyncio.sleep(60 * backoff)
                    backoff *= 2
                    continue

                if status != 200:
                    text = await resp.text()
                    raise AnozrwayError(f"{generic_error_msg} ({status}): {text}")

                data = await resp.json()

        results = data.get(result_key) or []
        if not isinstance(results, list):
            return []
        return results

    raise AnozrwayRateLimitError(
        f"Exceeded maximum retry attempts while calling {generic_error_msg}"
    )
```

Then your two public methods become much smaller and easier to reason about:

```python
async def search_domain_v1(
    self,
    context: str,
    domain: str,
    start_date: datetime,
    end_date: datetime,
) -> List[Dict[str, Any]]:
    payload = {
        "context": context,
        "domain": domain,
        "start_date": self._to_iso(start_date),
        "end_date": self._to_iso(end_date),
    }
    return await self._post_with_retry(
        "/v1/domain/searches",
        payload,
        result_key="results",
        unauthorized_msg="Unauthorized when calling Anozrway v1 domain search",
        generic_error_msg="v1 domain search failed",
    )

async def fetch_events(
    self,
    context: str,
    domain: str,
    start_date: datetime,
    end_date: datetime,
) -> List[Dict[str, Any]]:
    payload = {
        "context": context,
        "domain": domain,
        "start_date": self._to_iso(start_date),
        "end_date": self._to_iso(end_date),
    }
    return await self._post_with_retry(
        "/events",
        payload,
        result_key="events",
        unauthorized_msg="Unauthorized when calling Balise Pipeline /events",
        generic_error_msg="Balise Pipeline /events failed",
    )
```

This keeps all existing behavior (tokens, headers, rate limiting, 401 handling, 429 backoff, error messages) but makes future changes to the HTTP/retry logic localized to one place.
</issue_to_address>



socket-security bot commented Feb 12, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

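| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
|---|---|---|---|---|---|---|
| Added | werkzeug@3.1.6 | 96 | 100 | 100 | 100 | 100 |
| Added | rich@14.3.3 | 98 | 100 | 100 | 100 | 100 |
| Added | typer@0.24.0 | 99 | 100 | 100 | 100 | 100 |
| Added | annotated-doc@0.0.4 | 100 | 100 | 100 | 100 | 100 |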

View full report


socket-security bot commented Feb 12, 2026

All alerts resolved. Learn more about Socket for GitHub.

This PR previously contained dependency changes with security issues that have been resolved, removed, or ignored.

View full report

@helkabouss helkabouss marked this pull request as draft February 20, 2026 09:39