[RORDEV-1567] ECS serializer and improved configurable serializer #1165

mgoworko · 2025-10-01T21:20:18Z

🚀New (ES) Audit log serializer compliant with Elastic Common Schema (ECS)
🧐Enhancement (ES) Enable nested field definitions in the configurable audit log serializer for more flexible audit logging

Summary by CodeRabbit

New Features
- Added an ECS v1 audit log serializer and option to select ECS serializer/version.
- Added support for processing-duration (nanos), numeric, boolean, and nested field types for richer structured audit output.
- Configuration decoding updated to accept ECS serializer settings, ECS versioning, and nested field definitions.
Tests
- Added unit and integration tests validating ECS-formatted output, nested-field handling, and ECS configuration paths.
- Added utilities for deterministic timestamps and test request contexts.

core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

core/src/main/scala/tech/beshu/ror/accesscontrol/factory/decoders/AuditingSettingsDecoder.scala (1)
227-239: Map invalid ECS version to AuditingSettingsCreationError.

Decoding failures for version currently bubble as generic DecodingFailure. Align with other paths by mapping to an AuditingSettingsCreationError.

Apply this diff:
-      version <- c.downField("version").as[Option[EcsSerializerVersion]]
+      version <- c.downField("version").as[Option[EcsSerializerVersion]]
+        .left.map(withAuditingSettingsCreationErrorMessage(msg => s"ECS serializer 'version' is invalid: $msg"))

🧹 Nitpick comments (1)

core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala (1)

33-85: Add ECS event.outcome and enforce trace.id format

In EcsV1AuditLogSerializer (lines 33–85), introduce an event.outcome descriptor mapping Allowed→“success”, Forbidden/Errored→“failure”, otherwise “unknown” per ECS.

Ensure trace.id (CorrelationId) emits a 16-byte lowercase hex string (32 chars, non-zero); if it doesn’t, emit it under labels or use another ECS field.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c0ca654 and aa15975.

📒 Files selected for processing (4)

audit/src/main/scala/tech/beshu/ror/audit/utils/AuditSerializationHelper.scala (4 hunks)
core/src/main/scala/tech/beshu/ror/accesscontrol/audit/configurable/AuditFieldValueDescriptorParser.scala (2 hunks)
core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala (1 hunks)
core/src/main/scala/tech/beshu/ror/accesscontrol/factory/decoders/AuditingSettingsDecoder.scala (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

audit/src/main/scala/tech/beshu/ror/audit/utils/AuditSerializationHelper.scala

🧰 Additional context used

🧬 Code graph analysis (3)

core/src/main/scala/tech/beshu/ror/accesscontrol/audit/configurable/AuditFieldValueDescriptorParser.scala (1)

audit/src/main/scala/tech/beshu/ror/audit/utils/AuditSerializationHelper.scala (2)

AuditFieldValueDescriptor (171-251)

ProcessingDurationNanos (192-192)

core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala (2)

audit/src/main/scala/tech/beshu/ror/audit/utils/AuditSerializationHelper.scala (31)

ror (28-299)

AllowedEventMode (161-165)

AuditFieldName (167-167)

AuditFieldValueDescriptor (171-251)

serialize (32-45)

serialize (47-66)

Nested (245-245)

Nested (247-249)

StaticText (237-237)

CorrelationId (199-199)

HttpPath (215-215)

RemoteAddress (219-219)

LocalAddress (221-221)

HttpMethod (211-211)

Content (223-223)

ContentLengthInBytes (225-225)

User (180-180)

ImpersonatedByUser (182-182)

Id (197-197)

Action (184-184)

Type (208-208)

FinalState (176-176)

ProcessingDurationNanos (192-192)

ErrorType (204-204)

ErrorMessage (206-206)

EsClusterName (233-233)

EsNodeName (231-231)

TaskId (201-201)

InvolvedIndices (186-186)

AclHistory (188-188)

XForwardedForHttpHeader (217-217)

audit/src/main/scala/tech/beshu/ror/audit/AuditResponseContext.scala (1)

AuditResponseContext (28-55)

core/src/main/scala/tech/beshu/ror/accesscontrol/factory/decoders/AuditingSettingsDecoder.scala (2)

audit/src/main/scala/tech/beshu/ror/audit/utils/AuditSerializationHelper.scala (9)

ror (28-299)

Reason (178-178)

AllowedEventMode (161-165)

AuditFieldName (167-167)

AuditFieldValueDescriptor (171-251)

BooleanValue (239-239)

NumericValue (241-241)

Nested (245-245)

Nested (247-249)

core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala (2)

EcsV1AuditLogSerializer (25-31)

EcsV1AuditLogSerializer (33-86)

🔇 Additional comments (4)

core/src/main/scala/tech/beshu/ror/accesscontrol/audit/configurable/AuditFieldValueDescriptorParser.scala (1)

30-31: Good parser enhancements (trim + nanos support).

Trimming placeholder keys reduces false negatives while keeping matching case-insensitive.

Added PROCESSING_DURATION_NANOS mapping aligns with helper and ECS needs.

Also applies to: 68-68

core/src/main/scala/tech/beshu/ror/accesscontrol/factory/decoders/AuditingSettingsDecoder.scala (3)

285-290: Serializer type: ECS registration and error message look good.

Recognizing ecs and listing it among allowed types is correct and user-friendly.

Also applies to: 306-307

309-318: Version decoder behavior is fine; pairs well with the left.map above.

"v1" accepted, others rejected with a clear message.

376-396: Robust, type-aware field decoding with nested support.

Supports booleans, numerics, strings via parser, and nested objects recursively.

Arrays are rejected with a clear error; good guardrail.

Please ensure docs mention that arrays aren’t supported in configurable fields and show an example for nested objects.

coderabbitai · 2025-10-14T16:52:58Z

core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala

+    AuditFieldName("event") -> AuditFieldValueDescriptor.Nested(
+      AuditFieldName("id") -> AuditFieldValueDescriptor.Id,
+      AuditFieldName("action") -> AuditFieldValueDescriptor.Action,
+      AuditFieldName("type") -> AuditFieldValueDescriptor.Type,
+      AuditFieldName("reason") -> AuditFieldValueDescriptor.FinalState,
+      AuditFieldName("duration") -> AuditFieldValueDescriptor.ProcessingDurationNanos,
+    ),


⚠️ Potential issue | 🔴 Critical

Fix event.reason mapping (use Reason, not FinalState).

event.reason should carry the explanatory reason text, not the final state. Map it to AuditFieldValueDescriptor.Reason.

Apply this diff:

- AuditFieldName("reason") -> AuditFieldValueDescriptor.FinalState, + AuditFieldName("reason") -> AuditFieldValueDescriptor.Reason,

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

AuditFieldName("event") -> AuditFieldValueDescriptor.Nested(

AuditFieldName("id") -> AuditFieldValueDescriptor.Id,

AuditFieldName("action") -> AuditFieldValueDescriptor.Action,

AuditFieldName("type") -> AuditFieldValueDescriptor.Type,

AuditFieldName("reason") -> AuditFieldValueDescriptor.FinalState,

AuditFieldName("duration") -> AuditFieldValueDescriptor.ProcessingDurationNanos,

),

AuditFieldName("event") -> AuditFieldValueDescriptor.Nested(

AuditFieldName("id") -> AuditFieldValueDescriptor.Id,

AuditFieldName("action") -> AuditFieldValueDescriptor.Action,

AuditFieldName("type") -> AuditFieldValueDescriptor.Type,

AuditFieldName("reason") -> AuditFieldValueDescriptor.Reason,

AuditFieldName("duration") -> AuditFieldValueDescriptor.ProcessingDurationNanos,

),

🤖 Prompt for AI Agents

In core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala around lines 66 to 72, the mapping for AuditFieldName("reason") incorrectly uses AuditFieldValueDescriptor.FinalState; change it to AuditFieldValueDescriptor.Reason so event.reason carries the explanatory reason text. Update the nested mapping entry to replace FinalState with Reason and ensure imports/usage remain consistent.

I refactored the action section and used ECS 1.6 specification

AuditFieldName("event") -> AuditFieldValueDescriptor.Nested( AuditFieldName("id") -> AuditFieldValueDescriptor.Id, AuditFieldName("dataset") -> AuditFieldValueDescriptor.Action, // ROR Action describes the resource being accessed, so it is dataset in ECS AuditFieldName("action") -> AuditFieldValueDescriptor.Type, // ROR Type describes operation that is executed, so it is ECS action AuditFieldName("reason") -> AuditFieldValueDescriptor.FinalState, // ROR Final state described the outcome, so it is the ECS reason of event happening, the ROR Reason is a custom detailed_reason label AuditFieldName("duration") -> AuditFieldValueDescriptor.ProcessingDurationNanos, ),

The ROR "reason" is now one of custom ECS labels:

AuditFieldName("labels") -> AuditFieldValueDescriptor.Nested( ... AuditFieldName("detailed_reason") -> AuditFieldValueDescriptor.Reason, ),

🧩 Analysis chain

🌐 Web query:

ECS 1.6 event.reason event.action event.dataset field definitions

💡 Result:

Summary definitions (ECS v1.6 semantics are the same as current ECS event docs):

event.action — The action captured by the event; more specific than event.category (examples: "group-add", "process-started", "file-created"). Type: keyword. Level: core. [1]

event.reason — Why the event happened according to the source; explains the motive or cause for the action/outcome (example: "blocked site", "Terminated an unexpected process"). Type: keyword. Level: extended. [1]

event.dataset — Name of the dataset / which type of log the event comes from when a source publishes multiple log types; recommended convention is "." (example: "apache.access"). Type: keyword. Level: core. [2]

Sources:

Elastic Common Schema — Event fields (event.action, event.reason). [Elastic docs]. [2]

Elastic Common Schema — Event fields (event.dataset / implementation guidance). [Elastic docs]. [3]

@mgoworko, I appreciate the refactor to ECS 1.6, but I have concerns about the semantic alignment with the ECS specification:

Issues with current mappings:

event.dataset → ROR Action: According to ECS, event.dataset identifies which log type/source the event comes from (e.g., "apache.access", "ror.audit"). Mapping it to ROR Action (like "cluster:internal_ror/user_metadata/get") misuses this field—it should be a static identifier like "ror.audit" or "readonlyrest.access".

event.action → ROR Type: ECS defines event.action as "the action captured by the event" (e.g., "user-authentication", "resource-access"). ROR Type (e.g., "RRTestConfigRequest") seems too low-level. The ROR Action would be more semantically appropriate here as it describes what operation was attempted.

event.reason → ROR FinalState: ECS defines event.reason as "why the event happened according to the source; explains the motive or cause." ROR FinalState (ALLOWED/FORBIDDEN) is an outcome, not a reason. This should map to event.outcome instead (which accepts values like "success", "failure", "unknown").

Recommended mapping:

AuditFieldName("event") -> AuditFieldValueDescriptor.Nested( AuditFieldName("id") -> AuditFieldValueDescriptor.Id, AuditFieldName("dataset") -> AuditFieldValueDescriptor.StaticText("ror.audit"), AuditFieldName("action") -> AuditFieldValueDescriptor.Action, AuditFieldName("reason") -> AuditFieldValueDescriptor.Reason, AuditFieldName("outcome") -> AuditFieldValueDescriptor.FinalState, // Map ALLOWED/FORBIDDEN to success/failure AuditFieldName("duration") -> AuditFieldValueDescriptor.ProcessingDurationNanos, ), AuditFieldName("labels") -> AuditFieldValueDescriptor.Nested( AuditFieldName("request_type") -> AuditFieldValueDescriptor.Type, // ... other labels ),

This aligns better with ECS semantics where event.reason explains "why," event.outcome captures the result, and event.dataset identifies the log source.

@mgoworko is it ok now?

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

coutoPL · 2025-10-21T07:15:27Z

audit/src/main/scala/tech/beshu/ror/audit/utils/AuditSerializationHelper.scala

+
    final case class Combined(values: List[AuditFieldValueDescriptor]) extends AuditFieldValueDescriptor

+    final case class Nested(values: Map[AuditFieldName, AuditFieldValueDescriptor]) extends AuditFieldValueDescriptor


It's one level of nesting. Let's do it in a more generic way (to allow many nesting levels).

BTW. We define fields like that:

fields: Map[AuditFieldName, AuditFieldValueDescriptor]

So, maybe we should not add Nested as a descriptor, but maybe we should modify the AuditFieldName definition instead.

E.g.
AuditFieldName is:

a string OR

is a composition of AuditFieldNames (or strings)?

oh, I see I was wrong about the nesting.
Nevertheless, let's consider the change above.

Actually that was my first version of implementation - the field name as list of strings, representing nesting. I have it in git history, so I can do it with no problems if we so decide. But I honestly think that the current version is better, because:

it represents 1:1 the structure of the JSON, so is easier to grasp

it forces to defined all fields inside the section, which improves readability

it is especially better when user defines "configurable" serializer:

Example from tests:

readonlyrest: audit: enabled: true outputs: - type: log serializer: type: configurable verbosity_level_serialization_mode: [INFO] fields: custom_section: nested_text: "nt" nested_number: 123 nested_boolean: true double_nested: double_nested_next: "dnt" triple_nested: triple_nested_next: "tnt" node_name_with_static_suffix: "{ES_NODE_NAME} with suffix" another_field: "{ES_CLUSTER_NAME} {HTTP_METHOD}" tid: "{TASK_ID}" bytes: "{CONTENT_LENGTH_IN_BYTES}"

With the approach based on the list of strings, it would look like that:

fields: - path: ["custom_section", "nested_text"] value: "nt" - path: ["custom_section", "nested_number"] value: 123 - path: ["custom_section", "nested_boolean"] value: true - path: ["custom_section", "double_nested", "double_nested_next"] value: "dnt" - path: ["custom_section", "double_nested", "triple_nested", "triple_nested_next"] value: "tnt" - path: ["node_name_with_static_suffix"] value: "{ES_NODE_NAME} with suffix" - path: ["another_field"] value: "{ES_CLUSTER_NAME} {HTTP_METHOD}" - path: ["tid"] value: "{TASK_ID}" - path: ["bytes"] value: "{CONTENT_LENGTH_IN_BYTES}"

I think that the nested approach is more readable in the code too.

I'm not sure if I agree. I see it as follows:

we have some representation of the fields and value descriptors in the ROR's YAML

then we have our internal (domain) representation of it

finally, we serialize the representation to the ES index document

And the only thing that is JSON here is dthe ocument.
The internal representation is not a JSON, and the "fields" representation in ROR's YAML is not a JSON either (we don't support arrays).

But I don't want you to change the YAML's representation. It's fine. But obivously we should detect invalid settings like:

readonlyrest: audit: enabled: true outputs: - type: log serializer: type: configurable verbosity_level_serialization_mode: [INFO] fields: 1 # or string, or bool, or array

readonlyrest: audit: enabled: true outputs: - type: log serializer: type: configurable verbosity_level_serialization_mode: [INFO] fields: custom_section: - sth: 1 - sth: 2

readonlyrest: audit: enabled: true outputs: - type: log serializer: type: configurable verbosity_level_serialization_mode: [INFO] fields: custom_section: {} custom_section2: null

So, when we take into consideration the limitations, we could say that we have a map of simplified field path keys (without arrays) and value descriptor leafs.

In code, we represent it as Map[AuditFieldName, AuditFieldValueDescriptor], where

AuditFieldValueDescriptor is always a leaf (of type string, int, bool ... maybe null?!)

AuditFieldName is a path to this leaf (e.g., custom_section.double_nested.double_nested_next or node_name_with_static_suffix)

That's why I propose to define AuditFieldName (in the domain ... maybe we should rename it to AuditFieldPath) like this:

type AuditFieldPath = NonEmptyList[String].

And AuditFieldValueDescriptor will be free of sth like Nested.

WDYT?

.../scala/tech/beshu/ror/accesscontrol/audit/configurable/AuditFieldValueDescriptorParser.scala

core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala

core/src/test/scala/tech/beshu/ror/unit/acl/factory/AuditSettingsTests.scala

coutoPL

one comment left

mgoworko added 7 commits September 27, 2025 22:51

qs

e8dc296

qs

d7f70e0

Merge remote-tracking branch 'origin/develop' into RORDEV-1567

b3c3ba8

Merge remote-tracking branch 'origin/develop' into RORDEV-1567

2b6454e

qs

ff5f100

qs

1c5f7a4

ECS schema version field

944662b

mgoworko changed the title ~~[RORDEV-1567] draft~~ [RORDEV-1567] ECS serializer and improved configurable serializer Oct 12, 2025

mgoworko marked this pull request as ready for review October 12, 2025 13:24

mgoworko requested a review from coutoPL October 12, 2025 13:30

This comment was marked as outdated.

Sign in to view

This comment was marked as resolved.

Sign in to view

mgoworko added 4 commits October 14, 2025 18:22

add ECS serializer integration test

f940792

Merge remote-tracking branch 'origin/develop' into RORDEV-1567

a57e942

rabbit review change

59e4797

rabbit review change

c0ca654

mgoworko commented Oct 14, 2025

View reviewed changes

core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala Show resolved Hide resolved

This comment was marked as resolved.

Sign in to view

rabbit review change

aa15975

coderabbitai bot reviewed Oct 14, 2025

View reviewed changes

coutoPL reviewed Oct 21, 2025

View reviewed changes

mgoworko added 3 commits October 25, 2025 16:17

review changes

51ed19d

review changes

96304a7

Merge remote-tracking branch 'origin/develop' into RORDEV-1567

08f8c99

mgoworko requested a review from coutoPL October 25, 2025 14:45

This comment was marked as resolved.

Sign in to view

adjust tests

3f63b3b

This comment was marked as resolved.

Sign in to view

coutoPL reviewed Oct 28, 2025

View reviewed changes


		final case class Combined(values: List[AuditFieldValueDescriptor]) extends AuditFieldValueDescriptor

		final case class Nested(values: Map[AuditFieldName, AuditFieldValueDescriptor]) extends AuditFieldValueDescriptor

Uh oh!

[RORDEV-1567] ECS serializer and improved configurable serializer #1165

Are you sure you want to change the base?

[RORDEV-1567] ECS serializer and improved configurable serializer #1165

Uh oh!

Conversation

mgoworko commented Oct 1, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

This comment was marked as outdated.

This comment was marked as resolved.

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mgoworko Oct 25, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 25, 2025

Choose a reason for hiding this comment

Uh oh!

coutoPL Oct 28, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 28, 2025

Choose a reason for hiding this comment

Uh oh!

coutoPL Oct 21, 2025

Choose a reason for hiding this comment

Uh oh!

coutoPL Oct 21, 2025

Choose a reason for hiding this comment

Uh oh!

mgoworko Oct 25, 2025

Choose a reason for hiding this comment

Uh oh!

coutoPL Oct 28, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

coutoPL left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mgoworko commented Oct 1, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot Oct 14, 2025 •

edited

Loading