
Forward structured application JSON logs, optionally enriched with ECS fields #258

Closed
pushred opened this issue Feb 23, 2023 · 3 comments

@pushred

pushred commented Feb 23, 2023

Describe the enhancement:

I am using the Kinesis input to collect structured logs from AWS CloudWatch that have been partly populated using the ecs-pino-format library. Inspecting the data output to Elasticsearch, I found that there are a couple of layers of structure around my application logs:

  • Elastic Serverless Forwarder's ECS fields
  • AWS log record structure, JSON stringified on the ECS message field
  • My application logs are further stringified on the log record's message field
{
  "@timestamp": "2023-02-14T21:26:04.387857Z",
  "message": "{\"id\":\"37385191276969030971785538657917125344955573022299717635\",\"timestamp\":1676409955997,\"message\":\"{\\\"log.level\\\":\\\"info\\\",\\\"@timestamp\\\":\\\"2023-02-14T21:25:55.996Z\\\",\\\"message\\\":\\\"beepboop\\\"}\"}",
}
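
For readability, here is the same document with each stringified layer decoded by hand (just an unescaped view of the nesting, not actual forwarder output):

{
  "@timestamp": "2023-02-14T21:26:04.387857Z",
  "message": {
    "id": "37385191276969030971785538657917125344955573022299717635",
    "timestamp": 1676409955997,
    "message": {
      "log.level": "info",
      "@timestamp": "2023-02-14T21:25:55.996Z",
      "message": "beepboop"
    }
  }
}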

I was expecting to see my application logs as-is. This could be achieved if:

  • The CloudWatch log record structure was parsed by ESF
  • ESF attempted to parse the log record message string as JSON
    • If message is valid JSON, merge the content into the root object that is forwarded
    • If message is not JSON, preserve it as-is
  • Optionally add (or overwrite/merge) ECS objects with data from ESF (a sketch of the resulting event follows below)
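
For the sample above, the forwarded event might then look something like this (a hypothetical sketch of the behaviour described, not current ESF output; where the CloudWatch record's own id and timestamp would end up is left open here):

{
  "@timestamp": "2023-02-14T21:25:55.996Z",
  "log.level": "info",
  "message": "beepboop"
}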

Describe a specific use case for the enhancement or feature:

When I initially saw this I thought that it was a reasonable preservation of the various layers my logs are passing through. I assumed that I could use Ingest Pipelines to parse each layer, extract my logs, and merge them into the ESF structure in order to leverage some of its ECS content while adding my own. For reference, here's a portion of my pipeline:

{
  "processors": [
    {
      "json": {
        "field": "message",
        "target_field": "parsed_cloudwatch_log_event"
      }
    },
    {
      "json": {
        "field": "parsed_cloudwatch_log_event.message",
        "add_to_root": true
      }
    },
    {
      "set": {
        "field": "@timestamp",
        "copy_from": "parsed_app_log.@timestamp",
        "ignore_failure": true
      }
    },
    {
      "set": {
        "field": "message",
        "copy_from": "parsed_app_log.message",
        "ignore_failure": true
      }
    },
    {
      "remove": {
        "field": "parsed_cloudwatch_log_event"
      }
    },
    {
      "remove": {
        "field": "parsed_app_log"
      }
    }
  ]
}

However I have encountered some limitations in the pipeline processors that are currently blocking me from accessing a log.level field that ecs-pino-format is adding.

In my own case I may need to stop using ecs-pino-format as I don't see the blocking issues in the data processors being resolved anytime soon. I don't really expect that this enhancement will be implemented either. But I wanted to at least document what I'm dealing with as a user.

@aspacca
Contributor

aspacca commented Mar 7, 2023

hello @pushred, sorry for the late reply

just to be sure I got your scenario right: you send the logs collected with ecs-pino-format to CloudWatch Logs and then have a subscription from the log group to a Kinesis data stream that you use as the input of the forwarder, is that right?

would you be able to use the CloudWatch Logs log group (or log stream) as the direct input of the forwarder? in that case the original JSON logs won't be wrapped in the subscription filter format sent to the Kinesis data stream, and you won't need any unfolding

I have the impression you have already added the expand_event_list_from_field: logEvents in your config, can you confirm?

I was expecting to see my application logs as-is. This could be achieved if:

  • The CloudWatch log record structure was parsed by ESF

  • ESF attempted to parse the log record message string as JSON

    • If message is valid JSON, merge the content into the root object that is forwarded
    • If message is not JSON, preserve it as-is
  • Optionally add (or overwrite/merge) ECS objects with data from ESF

I understand your expectation but that's not the current behaviour of the ESF

in your scenario each kinesis record has the following payload

{
    "owner": "111111111111",
    "logGroup": "CloudTrail/logs",
    "logStream": "111111111111_CloudTrail/logs_us-east-1",
    "subscriptionFilters": [
        "RecipientStream"
    ],
    "messageType": "DATA_MESSAGE",
    "logEvents": [
        {
            "id": "3195310660696698337880902507980421114328961542429EXAMPLE",
            "timestamp": 1432826855000,
            "message": "{\"log.level\":\"info\",\"@timestamp\":\"2023-02-14T21:25:55.996Z\",\"message\":\"beepboop\"}"
        },
        ...
    ]
}

this payload is already parsed as JSON, and on top of that you expand the entries in {"logEvents": [{forwarded event #1},{forwarded event #2}, etc]}

the problem is that {forwarded event #N} is another nested level of JSON, and what you would like to extract and forward as the event is the content of {"message": ... }

if you had the CloudWatch Logs log group/stream directly as the input of the forwarder, the scenario would be different:
in this case each log event message in the log group would have the following payload:

{"log.level":"info","@timestamp":"2023-02-14T21:25:55.996Z","message":"beepboop"}

and the forwarder would send the following event to ES:

{
  "@timestamp": "2023-02-14T21:26:04.387857Z",
  "message": "{\"log.level\":\"info\",\"@timestamp\":\"2023-02-14T21:25:55.996Z\",\"message\":\"beepboop\"}",
}

I would then expect you to be able to run the following processor (or something similar):

{
  "processors": [
    {
      "json": {
        "field": "message",
        "add_to_root": true,
        "add_to_root_conflict_strategy": "replace"
      }
    }
  ]
}
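
for the event above, that should yield something close to the following (a sketch, assuming the inner JSON parses as an object):

{
  "@timestamp": "2023-02-14T21:25:55.996Z",
  "log.level": "info",
  "message": "beepboop"
}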

it's unclear to me where the parsed_app_log field in your ingest pipeline comes from

even assuming you stick to the format of the subscription filter to the kinesis data stream you should be able to run something like the following ingest pipeline:

{
  "processors": [
    {
      "json": {
        "field": "message",
        "target_field": "parsed_cloudwatch_log_event"
      }
    },
    {
      "json": {
        "field": "parsed_cloudwatch_log_event.message",
        "add_to_root": true,
        "add_to_root_conflict_strategy": "replace"
      }
    },
    {
      "remove": {
        "field": "parsed_cloudwatch_log_event"
      }
    }
  ]
}
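
for reference, you could try this out with the simulate API before wiring it up; a sketch using the sample event from the issue description:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      { "json": { "field": "message", "target_field": "parsed_cloudwatch_log_event" } },
      { "json": { "field": "parsed_cloudwatch_log_event.message", "add_to_root": true, "add_to_root_conflict_strategy": "replace" } },
      { "remove": { "field": "parsed_cloudwatch_log_event" } }
    ]
  },
  "docs": [
    {
      "_source": {
        "@timestamp": "2023-02-14T21:26:04.387857Z",
        "message": "{\"id\":\"37385191276969030971785538657917125344955573022299717635\",\"timestamp\":1676409955997,\"message\":\"{\\\"log.level\\\":\\\"info\\\",\\\"@timestamp\\\":\\\"2023-02-14T21:25:55.996Z\\\",\\\"message\\\":\\\"beepboop\\\"}\"}"
      }
    }
  ]
}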

am I missing anything?

there's no plan at the moment to add a feature that merges the content of a JSON field into the root of the event directly in the forwarder: this kind of processing is intended to be handled in an ingest pipeline, once you are able to isolate the single event to be forwarded (as expand_event_list_from_field allows in this case)

@pushred
Author

pushred commented Mar 7, 2023

just to be sure I got your scenario right: you send the logs collected with ecs-pino-format to CloudWatch Logs and then have a subscription from the log group to a Kinesis data stream that you use as the input of the forwarder

Exactly

would you be able to use the CloudWatch Logs log group (or log stream) as the direct input of the forwarder? in that case the original JSON logs won't be wrapped in the subscription filter format sent to the Kinesis data stream, and you won't need any unfolding

We pursued this option originally but the issues noted in #219 led us to Kinesis.

I have the impression you have already added the expand_event_list_from_field: logEvents in your config, can you confirm?

Correct, we are using this and it does remove the Kinesis structure.

even assuming you stick to the format of the subscription filter to the kinesis data stream you should be able to run something like the following ingest pipeline

I actually tried a pipeline like this and ran into this error:

{
  "root_cause": [
    {
      "type": "illegal_argument_exception",
      "reason": "cannot add non-map fields to root of document"
    }
  ],
  "type": "illegal_argument_exception",
  "reason": "cannot add non-map fields to root of document"
}

We weren't able to make sense of it, however, and have ended up with a very verbose pipeline using set processors to copy values from parsed_app_log into the root individually. Perhaps this is a bug I should file an issue for somewhere; not sure where, though.
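
For reference, the workaround looks roughly like this (a sketch of the set-based approach; the second json processor targets parsed_app_log, and only a couple of the copied fields are shown):

{
  "processors": [
    { "json": { "field": "message", "target_field": "parsed_cloudwatch_log_event" } },
    { "json": { "field": "parsed_cloudwatch_log_event.message", "target_field": "parsed_app_log" } },
    { "set": { "field": "@timestamp", "copy_from": "parsed_app_log.@timestamp", "ignore_failure": true } },
    { "set": { "field": "message", "copy_from": "parsed_app_log.message", "ignore_failure": true } },
    { "remove": { "field": "parsed_cloudwatch_log_event" } },
    { "remove": { "field": "parsed_app_log" } }
  ]
}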

there's no plan at the moment to add a feature that merges the content of a JSON field into the root of the event directly in the forwarder: this kind of processing is intended to be handled in an ingest pipeline

Understood. In the end the ingest pipeline did meet our needs and I can see the benefit of not abstracting this away and preserving flexibility. Also understood the decision noted elsewhere to delegate data transformation to ingest pipelines.

This was just not a trivial "out of the box" experience, especially for someone like me who hasn't worked with the Elastic stack much before. The number of Lambda functions in our project is definitely a complicating factor and maybe not so common. I thought I should file an issue to provide the feedback and as a reference for anyone else who might face similar obstacles in the future.

@pushred pushred closed this as completed Mar 7, 2023
@aspacca
Contributor

aspacca commented Mar 8, 2023

We weren't able to make sense of it, however, and have ended up with a very verbose pipeline using set processors to copy values from parsed_app_log into the root individually. Perhaps this is a bug I should file an issue for somewhere; not sure where, though.

I'd suggest you open a thread at https://discuss.elastic.co/c/elastic-stack/elasticsearch/6 about that.
not sure if it's a bug or a missing feature, or if there is any workaround. in case it's a bug, the repo should be the elasticsearch one

This was just not a trivial "out of the box" experience, especially for someone like me who hasn't worked with the Elastic stack much before. The number of Lambda functions in our project is definitely a complicating factor and maybe not so common. I thought I should file an issue to provide the feedback and as a reference for anyone else who might face similar obstacles in the future.

sure, you did the right thing filing an issue: the knowledge will stay here for future users facing the same problems.
and we plan to collect this kind of feedback in a knowledge base related to the forwarder
