Skip to content

Conversation

@mgoworko
Copy link
Collaborator

@mgoworko mgoworko commented Oct 1, 2025

🚀New (ES) Audit log serializer compliant with Elastic Common Schema (ECS)
🧐Enhancement (ES) Enable nested field definitions in the configurable audit log serializer for more flexible audit logging

Summary by CodeRabbit

  • New Features

    • Added an ECS v1 audit log serializer and option to select ECS serializer/version.
    • Added support for processing-duration (nanos), numeric, boolean, and nested field types for richer structured audit output.
    • Configuration decoding updated to accept ECS serializer settings, ECS versioning, and nested field definitions.
  • Tests

    • Added unit and integration tests validating ECS-formatted output, nested-field handling, and ECS configuration paths.
    • Added utilities for deterministic timestamps and test request contexts.

@mgoworko mgoworko changed the title [RORDEV-1567] draft [RORDEV-1567] ECS serializer and improved configurable serializer Oct 12, 2025
@mgoworko mgoworko marked this pull request as ready for review October 12, 2025 13:24
@mgoworko mgoworko requested a review from coutoPL October 12, 2025 13:30
@coderabbitai

This comment was marked as outdated.

coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
core/src/main/scala/tech/beshu/ror/accesscontrol/factory/decoders/AuditingSettingsDecoder.scala (1)

227-239: Map invalid ECS version to AuditingSettingsCreationError.

Decoding failures for version currently bubble as generic DecodingFailure. Align with other paths by mapping to an AuditingSettingsCreationError.

Apply this diff:

-      version <- c.downField("version").as[Option[EcsSerializerVersion]]
+      version <- c.downField("version").as[Option[EcsSerializerVersion]]
+        .left.map(withAuditingSettingsCreationErrorMessage(msg => s"ECS serializer 'version' is invalid: $msg"))
🧹 Nitpick comments (1)
core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala (1)

33-85: Add ECS event.outcome and enforce trace.id format

  • In EcsV1AuditLogSerializer (lines 33–85), introduce an event.outcome descriptor mapping Allowed→“success”, Forbidden/Errored→“failure”, otherwise “unknown” per ECS.
  • Ensure trace.id (CorrelationId) emits a 16-byte lowercase hex string (32 chars, non-zero); if it doesn’t, emit it under labels or use another ECS field.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c0ca654 and aa15975.

📒 Files selected for processing (4)
  • audit/src/main/scala/tech/beshu/ror/audit/utils/AuditSerializationHelper.scala (4 hunks)
  • core/src/main/scala/tech/beshu/ror/accesscontrol/audit/configurable/AuditFieldValueDescriptorParser.scala (2 hunks)
  • core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala (1 hunks)
  • core/src/main/scala/tech/beshu/ror/accesscontrol/factory/decoders/AuditingSettingsDecoder.scala (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • audit/src/main/scala/tech/beshu/ror/audit/utils/AuditSerializationHelper.scala
🧰 Additional context used
🧬 Code graph analysis (3)
core/src/main/scala/tech/beshu/ror/accesscontrol/audit/configurable/AuditFieldValueDescriptorParser.scala (1)
audit/src/main/scala/tech/beshu/ror/audit/utils/AuditSerializationHelper.scala (2)
  • AuditFieldValueDescriptor (171-251)
  • ProcessingDurationNanos (192-192)
core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala (2)
audit/src/main/scala/tech/beshu/ror/audit/utils/AuditSerializationHelper.scala (31)
  • ror (28-299)
  • AllowedEventMode (161-165)
  • AuditFieldName (167-167)
  • AuditFieldValueDescriptor (171-251)
  • serialize (32-45)
  • serialize (47-66)
  • Nested (245-245)
  • Nested (247-249)
  • StaticText (237-237)
  • CorrelationId (199-199)
  • HttpPath (215-215)
  • RemoteAddress (219-219)
  • LocalAddress (221-221)
  • HttpMethod (211-211)
  • Content (223-223)
  • ContentLengthInBytes (225-225)
  • User (180-180)
  • ImpersonatedByUser (182-182)
  • Id (197-197)
  • Action (184-184)
  • Type (208-208)
  • FinalState (176-176)
  • ProcessingDurationNanos (192-192)
  • ErrorType (204-204)
  • ErrorMessage (206-206)
  • EsClusterName (233-233)
  • EsNodeName (231-231)
  • TaskId (201-201)
  • InvolvedIndices (186-186)
  • AclHistory (188-188)
  • XForwardedForHttpHeader (217-217)
audit/src/main/scala/tech/beshu/ror/audit/AuditResponseContext.scala (1)
  • AuditResponseContext (28-55)
core/src/main/scala/tech/beshu/ror/accesscontrol/factory/decoders/AuditingSettingsDecoder.scala (2)
audit/src/main/scala/tech/beshu/ror/audit/utils/AuditSerializationHelper.scala (9)
  • ror (28-299)
  • Reason (178-178)
  • AllowedEventMode (161-165)
  • AuditFieldName (167-167)
  • AuditFieldValueDescriptor (171-251)
  • BooleanValue (239-239)
  • NumericValue (241-241)
  • Nested (245-245)
  • Nested (247-249)
core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala (2)
  • EcsV1AuditLogSerializer (25-31)
  • EcsV1AuditLogSerializer (33-86)
🔇 Additional comments (4)
core/src/main/scala/tech/beshu/ror/accesscontrol/audit/configurable/AuditFieldValueDescriptorParser.scala (1)

30-31: Good parser enhancements (trim + nanos support).

  • Trimming placeholder keys reduces false negatives while keeping matching case-insensitive.
  • Added PROCESSING_DURATION_NANOS mapping aligns with helper and ECS needs.

Also applies to: 68-68

core/src/main/scala/tech/beshu/ror/accesscontrol/factory/decoders/AuditingSettingsDecoder.scala (3)

285-290: Serializer type: ECS registration and error message look good.

Recognizing ecs and listing it among allowed types is correct and user-friendly.

Also applies to: 306-307


309-318: Version decoder behavior is fine; pairs well with the left.map above.

"v1" accepted, others rejected with a clear message.


376-396: Robust, type-aware field decoding with nested support.

  • Supports booleans, numerics, strings via parser, and nested objects recursively.
  • Arrays are rejected with a clear error; good guardrail.

Please ensure docs mention that arrays aren’t supported in configurable fields and show an example for nested objects.

Comment on lines 66 to 72
AuditFieldName("event") -> AuditFieldValueDescriptor.Nested(
AuditFieldName("id") -> AuditFieldValueDescriptor.Id,
AuditFieldName("action") -> AuditFieldValueDescriptor.Action,
AuditFieldName("type") -> AuditFieldValueDescriptor.Type,
AuditFieldName("reason") -> AuditFieldValueDescriptor.FinalState,
AuditFieldName("duration") -> AuditFieldValueDescriptor.ProcessingDurationNanos,
),
Copy link

@coderabbitai coderabbitai bot Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix event.reason mapping (use Reason, not FinalState).

event.reason should carry the explanatory reason text, not the final state. Map it to AuditFieldValueDescriptor.Reason.

Apply this diff:

-      AuditFieldName("reason") -> AuditFieldValueDescriptor.FinalState,
+      AuditFieldName("reason") -> AuditFieldValueDescriptor.Reason,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
AuditFieldName("event") -> AuditFieldValueDescriptor.Nested(
AuditFieldName("id") -> AuditFieldValueDescriptor.Id,
AuditFieldName("action") -> AuditFieldValueDescriptor.Action,
AuditFieldName("type") -> AuditFieldValueDescriptor.Type,
AuditFieldName("reason") -> AuditFieldValueDescriptor.FinalState,
AuditFieldName("duration") -> AuditFieldValueDescriptor.ProcessingDurationNanos,
),
AuditFieldName("event") -> AuditFieldValueDescriptor.Nested(
AuditFieldName("id") -> AuditFieldValueDescriptor.Id,
AuditFieldName("action") -> AuditFieldValueDescriptor.Action,
AuditFieldName("type") -> AuditFieldValueDescriptor.Type,
AuditFieldName("reason") -> AuditFieldValueDescriptor.Reason,
AuditFieldName("duration") -> AuditFieldValueDescriptor.ProcessingDurationNanos,
),
🤖 Prompt for AI Agents
In
core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala
around lines 66 to 72, the mapping for AuditFieldName("reason") incorrectly uses
AuditFieldValueDescriptor.FinalState; change it to
AuditFieldValueDescriptor.Reason so event.reason carries the explanatory reason
text. Update the nested mapping entry to replace FinalState with Reason and
ensure imports/usage remain consistent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored the action section and used ECS 1.6 specification

 AuditFieldName("event") -> AuditFieldValueDescriptor.Nested(
      AuditFieldName("id") -> AuditFieldValueDescriptor.Id,
      AuditFieldName("dataset") -> AuditFieldValueDescriptor.Action, // ROR Action describes the resource being accessed, so it is dataset in ECS
      AuditFieldName("action") -> AuditFieldValueDescriptor.Type, // ROR Type describes operation that is executed, so it is ECS action
      AuditFieldName("reason") -> AuditFieldValueDescriptor.FinalState, // ROR Final state described the outcome, so it is the ECS reason of event happening, the ROR Reason is a custom detailed_reason label
      AuditFieldName("duration") -> AuditFieldValueDescriptor.ProcessingDurationNanos,
    ),

The ROR "reason" is now one of custom ECS labels:

    AuditFieldName("labels") -> AuditFieldValueDescriptor.Nested(
      ...
      AuditFieldName("detailed_reason") -> AuditFieldValueDescriptor.Reason,
    ),

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🌐 Web query:

ECS 1.6 event.reason event.action event.dataset field definitions

💡 Result:

Summary definitions (ECS v1.6 semantics are the same as current ECS event docs):

  • event.action — The action captured by the event; more specific than event.category (examples: "group-add", "process-started", "file-created"). Type: keyword. Level: core. [1]

  • event.reason — Why the event happened according to the source; explains the motive or cause for the action/outcome (example: "blocked site", "Terminated an unexpected process"). Type: keyword. Level: extended. [1]

  • event.dataset — Name of the dataset / which type of log the event comes from when a source publishes multiple log types; recommended convention is "." (example: "apache.access"). Type: keyword. Level: core. [2]

Sources:

  1. Elastic Common Schema — Event fields (event.action, event.reason). [Elastic docs]. [2]
  2. Elastic Common Schema — Event fields (event.dataset / implementation guidance). [Elastic docs]. [3]

@mgoworko, I appreciate the refactor to ECS 1.6, but I have concerns about the semantic alignment with the ECS specification:

Issues with current mappings:

  1. event.dataset → ROR Action: According to ECS, event.dataset identifies which log type/source the event comes from (e.g., "apache.access", "ror.audit"). Mapping it to ROR Action (like "cluster:internal_ror/user_metadata/get") misuses this field—it should be a static identifier like "ror.audit" or "readonlyrest.access".

  2. event.action → ROR Type: ECS defines event.action as "the action captured by the event" (e.g., "user-authentication", "resource-access"). ROR Type (e.g., "RRTestConfigRequest") seems too low-level. The ROR Action would be more semantically appropriate here as it describes what operation was attempted.

  3. event.reason → ROR FinalState: ECS defines event.reason as "why the event happened according to the source; explains the motive or cause." ROR FinalState (ALLOWED/FORBIDDEN) is an outcome, not a reason. This should map to event.outcome instead (which accepts values like "success", "failure", "unknown").

Recommended mapping:

AuditFieldName("event") -> AuditFieldValueDescriptor.Nested(
  AuditFieldName("id") -> AuditFieldValueDescriptor.Id,
  AuditFieldName("dataset") -> AuditFieldValueDescriptor.StaticText("ror.audit"),
  AuditFieldName("action") -> AuditFieldValueDescriptor.Action,
  AuditFieldName("reason") -> AuditFieldValueDescriptor.Reason,
  AuditFieldName("outcome") -> AuditFieldValueDescriptor.FinalState, // Map ALLOWED/FORBIDDEN to success/failure
  AuditFieldName("duration") -> AuditFieldValueDescriptor.ProcessingDurationNanos,
),
AuditFieldName("labels") -> AuditFieldValueDescriptor.Nested(
  AuditFieldName("request_type") -> AuditFieldValueDescriptor.Type,
  // ... other labels
),

This aligns better with ECS semantics where event.reason explains "why," event.outcome captures the result, and event.dataset identifies the log source.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgoworko is it ok now?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!


final case class Combined(values: List[AuditFieldValueDescriptor]) extends AuditFieldValueDescriptor

final case class Nested(values: Map[AuditFieldName, AuditFieldValueDescriptor]) extends AuditFieldValueDescriptor
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's one level of nesting. Let's do it in a more generic way (to allow many nesting levels).

BTW. We define fields like that:

fields: Map[AuditFieldName, AuditFieldValueDescriptor]

So, maybe we should not add Nested as a descriptor, but maybe we should modify the AuditFieldName definition instead.

E.g.
AuditFieldName is:

  • a string OR
  • is a composition of AuditFieldNames (or strings)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I see I was wrong about the nesting.
Nevertheless, let's consider the change above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually that was my first version of implementation - the field name as list of strings, representing nesting. I have it in git history, so I can do it with no problems if we so decide. But I honestly think that the current version is better, because:

  • it represents 1:1 the structure of the JSON, so is easier to grasp
  • it forces to defined all fields inside the section, which improves readability
  • it is especially better when user defines "configurable" serializer:

Example from tests:

readonlyrest:
  audit:
    enabled: true
    outputs:
    - type: log
      serializer:
        type: configurable
        verbosity_level_serialization_mode: [INFO]
        fields:
          custom_section:
            nested_text: "nt"
            nested_number: 123
            nested_boolean: true
            double_nested:
              double_nested_next: "dnt"
              triple_nested:
                triple_nested_next: "tnt"
          node_name_with_static_suffix: "{ES_NODE_NAME} with suffix"
          another_field: "{ES_CLUSTER_NAME} {HTTP_METHOD}"
          tid: "{TASK_ID}"
          bytes: "{CONTENT_LENGTH_IN_BYTES}"

With the approach based on the list of strings, it would look like that:

fields:
  - path: ["custom_section", "nested_text"]
    value: "nt"

  - path: ["custom_section", "nested_number"]
    value: 123

  - path: ["custom_section", "nested_boolean"]
    value: true

  - path: ["custom_section", "double_nested", "double_nested_next"]
    value: "dnt"

  - path: ["custom_section", "double_nested", "triple_nested", "triple_nested_next"]
    value: "tnt"

  - path: ["node_name_with_static_suffix"]
    value: "{ES_NODE_NAME} with suffix"

  - path: ["another_field"]
    value: "{ES_CLUSTER_NAME} {HTTP_METHOD}"

  - path: ["tid"]
    value: "{TASK_ID}"

  - path: ["bytes"]
    value: "{CONTENT_LENGTH_IN_BYTES}"

I think that the nested approach is more readable in the code too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I agree. I see it as follows:

  1. we have some representation of the fields and value descriptors in the ROR's YAML
  2. then we have our internal (domain) representation of it
  3. finally, we serialize the representation to the ES index document

And the only thing that is JSON here is dthe ocument.
The internal representation is not a JSON, and the "fields" representation in ROR's YAML is not a JSON either (we don't support arrays).

But I don't want you to change the YAML's representation. It's fine. But obivously we should detect invalid settings like:

readonlyrest:
  audit:
    enabled: true
    outputs:
    - type: log
      serializer:
        type: configurable
        verbosity_level_serialization_mode: [INFO]
        fields: 1 # or string, or bool, or array
readonlyrest:
  audit:
    enabled: true
    outputs:
    - type: log
      serializer:
        type: configurable
        verbosity_level_serialization_mode: [INFO]
        fields:
          custom_section:
            - sth: 1
            - sth: 2
readonlyrest:
  audit:
    enabled: true
    outputs:
    - type: log
      serializer:
        type: configurable
        verbosity_level_serialization_mode: [INFO]
        fields:
          custom_section: {}
          custom_section2: null

So, when we take into consideration the limitations, we could say that we have a map of simplified field path keys (without arrays) and value descriptor leafs.

In code, we represent it as Map[AuditFieldName, AuditFieldValueDescriptor], where

  • AuditFieldValueDescriptor is always a leaf (of type string, int, bool ... maybe null?!)
  • AuditFieldName is a path to this leaf (e.g., custom_section.double_nested.double_nested_next or node_name_with_static_suffix)

That's why I propose to define AuditFieldName (in the domain ... maybe we should rename it to AuditFieldPath) like this:

type AuditFieldPath = NonEmptyList[String].

And AuditFieldValueDescriptor will be free of sth like Nested.

WDYT?

@mgoworko mgoworko requested a review from coutoPL October 25, 2025 14:45
coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

Copy link
Collaborator

@coutoPL coutoPL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one comment left

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants