-
Couldn't load subscription status.
- Fork 164
[RORDEV-1567] ECS serializer and improved configurable serializer #1165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
core/src/main/scala/tech/beshu/ror/accesscontrol/factory/decoders/AuditingSettingsDecoder.scala (1)
227-239: Map invalid ECS version to AuditingSettingsCreationError.Decoding failures for
versioncurrently bubble as generic DecodingFailure. Align with other paths by mapping to an AuditingSettingsCreationError.Apply this diff:
- version <- c.downField("version").as[Option[EcsSerializerVersion]] + version <- c.downField("version").as[Option[EcsSerializerVersion]] + .left.map(withAuditingSettingsCreationErrorMessage(msg => s"ECS serializer 'version' is invalid: $msg"))
🧹 Nitpick comments (1)
core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala (1)
33-85: Add ECS event.outcome and enforce trace.id format
- In EcsV1AuditLogSerializer (lines 33–85), introduce an
event.outcomedescriptor mapping Allowed→“success”, Forbidden/Errored→“failure”, otherwise “unknown” per ECS.- Ensure
trace.id(CorrelationId) emits a 16-byte lowercase hex string (32 chars, non-zero); if it doesn’t, emit it underlabelsor use another ECS field.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
audit/src/main/scala/tech/beshu/ror/audit/utils/AuditSerializationHelper.scala(4 hunks)core/src/main/scala/tech/beshu/ror/accesscontrol/audit/configurable/AuditFieldValueDescriptorParser.scala(2 hunks)core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala(1 hunks)core/src/main/scala/tech/beshu/ror/accesscontrol/factory/decoders/AuditingSettingsDecoder.scala(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- audit/src/main/scala/tech/beshu/ror/audit/utils/AuditSerializationHelper.scala
🧰 Additional context used
🧬 Code graph analysis (3)
core/src/main/scala/tech/beshu/ror/accesscontrol/audit/configurable/AuditFieldValueDescriptorParser.scala (1)
audit/src/main/scala/tech/beshu/ror/audit/utils/AuditSerializationHelper.scala (2)
AuditFieldValueDescriptor(171-251)ProcessingDurationNanos(192-192)
core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala (2)
audit/src/main/scala/tech/beshu/ror/audit/utils/AuditSerializationHelper.scala (31)
ror(28-299)AllowedEventMode(161-165)AuditFieldName(167-167)AuditFieldValueDescriptor(171-251)serialize(32-45)serialize(47-66)Nested(245-245)Nested(247-249)StaticText(237-237)CorrelationId(199-199)HttpPath(215-215)RemoteAddress(219-219)LocalAddress(221-221)HttpMethod(211-211)Content(223-223)ContentLengthInBytes(225-225)User(180-180)ImpersonatedByUser(182-182)Id(197-197)Action(184-184)Type(208-208)FinalState(176-176)ProcessingDurationNanos(192-192)ErrorType(204-204)ErrorMessage(206-206)EsClusterName(233-233)EsNodeName(231-231)TaskId(201-201)InvolvedIndices(186-186)AclHistory(188-188)XForwardedForHttpHeader(217-217)audit/src/main/scala/tech/beshu/ror/audit/AuditResponseContext.scala (1)
AuditResponseContext(28-55)
core/src/main/scala/tech/beshu/ror/accesscontrol/factory/decoders/AuditingSettingsDecoder.scala (2)
audit/src/main/scala/tech/beshu/ror/audit/utils/AuditSerializationHelper.scala (9)
ror(28-299)Reason(178-178)AllowedEventMode(161-165)AuditFieldName(167-167)AuditFieldValueDescriptor(171-251)BooleanValue(239-239)NumericValue(241-241)Nested(245-245)Nested(247-249)core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala (2)
EcsV1AuditLogSerializer(25-31)EcsV1AuditLogSerializer(33-86)
🔇 Additional comments (4)
core/src/main/scala/tech/beshu/ror/accesscontrol/audit/configurable/AuditFieldValueDescriptorParser.scala (1)
30-31: Good parser enhancements (trim + nanos support).
- Trimming placeholder keys reduces false negatives while keeping matching case-insensitive.
- Added PROCESSING_DURATION_NANOS mapping aligns with helper and ECS needs.
Also applies to: 68-68
core/src/main/scala/tech/beshu/ror/accesscontrol/factory/decoders/AuditingSettingsDecoder.scala (3)
285-290: Serializer type: ECS registration and error message look good.Recognizing
ecsand listing it among allowed types is correct and user-friendly.Also applies to: 306-307
309-318: Version decoder behavior is fine; pairs well with the left.map above.
"v1"accepted, others rejected with a clear message.
376-396: Robust, type-aware field decoding with nested support.
- Supports booleans, numerics, strings via parser, and nested objects recursively.
- Arrays are rejected with a clear error; good guardrail.
Please ensure docs mention that arrays aren’t supported in configurable fields and show an example for nested objects.
| AuditFieldName("event") -> AuditFieldValueDescriptor.Nested( | ||
| AuditFieldName("id") -> AuditFieldValueDescriptor.Id, | ||
| AuditFieldName("action") -> AuditFieldValueDescriptor.Action, | ||
| AuditFieldName("type") -> AuditFieldValueDescriptor.Type, | ||
| AuditFieldName("reason") -> AuditFieldValueDescriptor.FinalState, | ||
| AuditFieldName("duration") -> AuditFieldValueDescriptor.ProcessingDurationNanos, | ||
| ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix event.reason mapping (use Reason, not FinalState).
event.reason should carry the explanatory reason text, not the final state. Map it to AuditFieldValueDescriptor.Reason.
Apply this diff:
- AuditFieldName("reason") -> AuditFieldValueDescriptor.FinalState,
+ AuditFieldName("reason") -> AuditFieldValueDescriptor.Reason,📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| AuditFieldName("event") -> AuditFieldValueDescriptor.Nested( | |
| AuditFieldName("id") -> AuditFieldValueDescriptor.Id, | |
| AuditFieldName("action") -> AuditFieldValueDescriptor.Action, | |
| AuditFieldName("type") -> AuditFieldValueDescriptor.Type, | |
| AuditFieldName("reason") -> AuditFieldValueDescriptor.FinalState, | |
| AuditFieldName("duration") -> AuditFieldValueDescriptor.ProcessingDurationNanos, | |
| ), | |
| AuditFieldName("event") -> AuditFieldValueDescriptor.Nested( | |
| AuditFieldName("id") -> AuditFieldValueDescriptor.Id, | |
| AuditFieldName("action") -> AuditFieldValueDescriptor.Action, | |
| AuditFieldName("type") -> AuditFieldValueDescriptor.Type, | |
| AuditFieldName("reason") -> AuditFieldValueDescriptor.Reason, | |
| AuditFieldName("duration") -> AuditFieldValueDescriptor.ProcessingDurationNanos, | |
| ), |
🤖 Prompt for AI Agents
In
core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala
around lines 66 to 72, the mapping for AuditFieldName("reason") incorrectly uses
AuditFieldValueDescriptor.FinalState; change it to
AuditFieldValueDescriptor.Reason so event.reason carries the explanatory reason
text. Update the nested mapping entry to replace FinalState with Reason and
ensure imports/usage remain consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored the action section and used ECS 1.6 specification
AuditFieldName("event") -> AuditFieldValueDescriptor.Nested(
AuditFieldName("id") -> AuditFieldValueDescriptor.Id,
AuditFieldName("dataset") -> AuditFieldValueDescriptor.Action, // ROR Action describes the resource being accessed, so it is dataset in ECS
AuditFieldName("action") -> AuditFieldValueDescriptor.Type, // ROR Type describes operation that is executed, so it is ECS action
AuditFieldName("reason") -> AuditFieldValueDescriptor.FinalState, // ROR Final state described the outcome, so it is the ECS reason of event happening, the ROR Reason is a custom detailed_reason label
AuditFieldName("duration") -> AuditFieldValueDescriptor.ProcessingDurationNanos,
),
The ROR "reason" is now one of custom ECS labels:
AuditFieldName("labels") -> AuditFieldValueDescriptor.Nested(
...
AuditFieldName("detailed_reason") -> AuditFieldValueDescriptor.Reason,
),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🌐 Web query:
ECS 1.6 event.reason event.action event.dataset field definitions
💡 Result:
Summary definitions (ECS v1.6 semantics are the same as current ECS event docs):
-
event.action — The action captured by the event; more specific than event.category (examples: "group-add", "process-started", "file-created"). Type: keyword. Level: core. [1]
-
event.reason — Why the event happened according to the source; explains the motive or cause for the action/outcome (example: "blocked site", "Terminated an unexpected process"). Type: keyword. Level: extended. [1]
-
event.dataset — Name of the dataset / which type of log the event comes from when a source publishes multiple log types; recommended convention is "." (example: "apache.access"). Type: keyword. Level: core. [2]
Sources:
- Elastic Common Schema — Event fields (event.action, event.reason). [Elastic docs]. [2]
- Elastic Common Schema — Event fields (event.dataset / implementation guidance). [Elastic docs]. [3]
@mgoworko, I appreciate the refactor to ECS 1.6, but I have concerns about the semantic alignment with the ECS specification:
Issues with current mappings:
-
event.dataset→ ROR Action: According to ECS,event.datasetidentifies which log type/source the event comes from (e.g., "apache.access", "ror.audit"). Mapping it to ROR Action (like "cluster:internal_ror/user_metadata/get") misuses this field—it should be a static identifier like"ror.audit"or"readonlyrest.access". -
event.action→ ROR Type: ECS definesevent.actionas "the action captured by the event" (e.g., "user-authentication", "resource-access"). ROR Type (e.g., "RRTestConfigRequest") seems too low-level. The ROR Action would be more semantically appropriate here as it describes what operation was attempted. -
event.reason→ ROR FinalState: ECS definesevent.reasonas "why the event happened according to the source; explains the motive or cause." ROR FinalState (ALLOWED/FORBIDDEN) is an outcome, not a reason. This should map toevent.outcomeinstead (which accepts values like "success", "failure", "unknown").
Recommended mapping:
AuditFieldName("event") -> AuditFieldValueDescriptor.Nested(
AuditFieldName("id") -> AuditFieldValueDescriptor.Id,
AuditFieldName("dataset") -> AuditFieldValueDescriptor.StaticText("ror.audit"),
AuditFieldName("action") -> AuditFieldValueDescriptor.Action,
AuditFieldName("reason") -> AuditFieldValueDescriptor.Reason,
AuditFieldName("outcome") -> AuditFieldValueDescriptor.FinalState, // Map ALLOWED/FORBIDDEN to success/failure
AuditFieldName("duration") -> AuditFieldValueDescriptor.ProcessingDurationNanos,
),
AuditFieldName("labels") -> AuditFieldValueDescriptor.Nested(
AuditFieldName("request_type") -> AuditFieldValueDescriptor.Type,
// ... other labels
),This aligns better with ECS semantics where event.reason explains "why," event.outcome captures the result, and event.dataset identifies the log source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mgoworko is it ok now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
|
|
||
| final case class Combined(values: List[AuditFieldValueDescriptor]) extends AuditFieldValueDescriptor | ||
|
|
||
| final case class Nested(values: Map[AuditFieldName, AuditFieldValueDescriptor]) extends AuditFieldValueDescriptor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's one level of nesting. Let's do it in a more generic way (to allow many nesting levels).
BTW. We define fields like that:
fields: Map[AuditFieldName, AuditFieldValueDescriptor]
So, maybe we should not add Nested as a descriptor, but maybe we should modify the AuditFieldName definition instead.
E.g.
AuditFieldName is:
- a string OR
- is a composition of
AuditFieldNames (or strings)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I see I was wrong about the nesting.
Nevertheless, let's consider the change above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually that was my first version of implementation - the field name as list of strings, representing nesting. I have it in git history, so I can do it with no problems if we so decide. But I honestly think that the current version is better, because:
- it represents 1:1 the structure of the JSON, so is easier to grasp
- it forces to defined all fields inside the section, which improves readability
- it is especially better when user defines "configurable" serializer:
Example from tests:
readonlyrest:
audit:
enabled: true
outputs:
- type: log
serializer:
type: configurable
verbosity_level_serialization_mode: [INFO]
fields:
custom_section:
nested_text: "nt"
nested_number: 123
nested_boolean: true
double_nested:
double_nested_next: "dnt"
triple_nested:
triple_nested_next: "tnt"
node_name_with_static_suffix: "{ES_NODE_NAME} with suffix"
another_field: "{ES_CLUSTER_NAME} {HTTP_METHOD}"
tid: "{TASK_ID}"
bytes: "{CONTENT_LENGTH_IN_BYTES}"
With the approach based on the list of strings, it would look like that:
fields:
- path: ["custom_section", "nested_text"]
value: "nt"
- path: ["custom_section", "nested_number"]
value: 123
- path: ["custom_section", "nested_boolean"]
value: true
- path: ["custom_section", "double_nested", "double_nested_next"]
value: "dnt"
- path: ["custom_section", "double_nested", "triple_nested", "triple_nested_next"]
value: "tnt"
- path: ["node_name_with_static_suffix"]
value: "{ES_NODE_NAME} with suffix"
- path: ["another_field"]
value: "{ES_CLUSTER_NAME} {HTTP_METHOD}"
- path: ["tid"]
value: "{TASK_ID}"
- path: ["bytes"]
value: "{CONTENT_LENGTH_IN_BYTES}"
I think that the nested approach is more readable in the code too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I agree. I see it as follows:
- we have some representation of the fields and value descriptors in the ROR's YAML
- then we have our internal (domain) representation of it
- finally, we serialize the representation to the ES index document
And the only thing that is JSON here is dthe ocument.
The internal representation is not a JSON, and the "fields" representation in ROR's YAML is not a JSON either (we don't support arrays).
But I don't want you to change the YAML's representation. It's fine. But obivously we should detect invalid settings like:
readonlyrest:
audit:
enabled: true
outputs:
- type: log
serializer:
type: configurable
verbosity_level_serialization_mode: [INFO]
fields: 1 # or string, or bool, or arrayreadonlyrest:
audit:
enabled: true
outputs:
- type: log
serializer:
type: configurable
verbosity_level_serialization_mode: [INFO]
fields:
custom_section:
- sth: 1
- sth: 2readonlyrest:
audit:
enabled: true
outputs:
- type: log
serializer:
type: configurable
verbosity_level_serialization_mode: [INFO]
fields:
custom_section: {}
custom_section2: nullSo, when we take into consideration the limitations, we could say that we have a map of simplified field path keys (without arrays) and value descriptor leafs.
In code, we represent it as Map[AuditFieldName, AuditFieldValueDescriptor], where
AuditFieldValueDescriptoris always a leaf (of type string, int, bool ... maybe null?!)AuditFieldNameis a path to this leaf (e.g.,custom_section.double_nested.double_nested_nextornode_name_with_static_suffix)
That's why I propose to define AuditFieldName (in the domain ... maybe we should rename it to AuditFieldPath) like this:
type AuditFieldPath = NonEmptyList[String].
And AuditFieldValueDescriptor will be free of sth like Nested.
WDYT?
.../scala/tech/beshu/ror/accesscontrol/audit/configurable/AuditFieldValueDescriptorParser.scala
Show resolved
Hide resolved
core/src/main/scala/tech/beshu/ror/accesscontrol/audit/ecs/EcsV1AuditLogSerializer.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/tech/beshu/ror/unit/acl/factory/AuditSettingsTests.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one comment left
🚀New (ES) Audit log serializer compliant with Elastic Common Schema (ECS)
🧐Enhancement (ES) Enable nested field definitions in the configurable audit log serializer for more flexible audit logging
Summary by CodeRabbit
New Features
Tests