
Forward structured application JSON logs, optionally enriched with ECS fields #258

Closed
@pushred

Description

Describe the enhancement:

I am using the Kinesis input to collect structured logs from AWS CloudWatch that have been partly populated using the ecs-pino-format library. Inspecting the data output to Elasticsearch, I found that there are a couple of layers of structure around my application logs:

  • Elastic Serverless Forwarder's ECS fields
  • AWS log record structure, JSON stringified on the ECS message field
  • My application logs, further stringified on the log record's message field

{
  "@timestamp": "2023-02-14T21:26:04.387857Z",
  "message": "{\"id\":\"37385191276969030971785538657917125344955573022299717635\",\"timestamp\":1676409955997,\"message\":\"{\\\"log.level\\\":\\\"info\\\",\\\"@timestamp\\\":\\\"2023-02-14T21:25:55.996Z\\\",\\\"message\\\":\\\"beepboop\\\"}\"}",
}
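
To make the nesting concrete, here is a minimal Python sketch that rebuilds the document above from the inside out (values are copied from the example; this is illustration only, not ESF code):

import json

# Innermost layer: the application log as emitted via ecs-pino-format
app_log = {"log.level": "info", "@timestamp": "2023-02-14T21:25:55.996Z", "message": "beepboop"}

# Middle layer: the CloudWatch log record, with the app log stringified onto its message field
cloudwatch_event = {
    "id": "37385191276969030971785538657917125344955573022299717635",
    "timestamp": 1676409955997,
    "message": json.dumps(app_log),
}

# Outer layer: the ESF document, with the CloudWatch record stringified onto the ECS message field
esf_doc = {
    "@timestamp": "2023-02-14T21:26:04.387857Z",
    "message": json.dumps(cloudwatch_event),
}

# Recovering the application log therefore takes two rounds of JSON decoding
assert json.loads(json.loads(esf_doc["message"])["message"]) == app_log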

I was expecting to see my application logs as-is. This could be achieved if (see the sketch after this list):

  • The CloudWatch log record structure was parsed by ESF
  • ESF attempted to parse the log record's message string as JSON
    • If message is valid JSON, merge the content into the root object that is forwarded
    • If message is not valid JSON, preserve it as-is
  • Optionally add (or overwrite/merge) ECS objects with data from ESF
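
A rough sketch of that behaviour in Python, assuming a hypothetical merge step inside ESF that receives the already-decoded CloudWatch log event plus ESF's own ECS fields (the function and parameter names here are mine, not ESF's):

import json
from typing import Any


def build_forwarded_doc(cloudwatch_event: dict[str, Any], esf_fields: dict[str, Any]) -> dict[str, Any]:
    """Merge the application log into the forwarded document when it is valid JSON."""
    doc = dict(esf_fields)  # ESF's own ECS fields (@timestamp, cloud.*, aws.*, ...)
    raw_message = cloudwatch_event.get("message", "")
    try:
        parsed = json.loads(raw_message)
    except (json.JSONDecodeError, TypeError):
        parsed = None
    if isinstance(parsed, dict):
        doc.update(parsed)  # valid JSON object: merge into the root of the forwarded event
    else:
        doc["message"] = raw_message  # not JSON (or not an object): preserve the string as-is
    return doc

With the example document above, this would surface log.level, @timestamp, and message from my application log at the root of the forwarded event instead of leaving them double-stringified.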

Describe a specific use case for the enhancement or feature:

When I initially saw this I thought that it was a reasonable preservation of the various layers my logs are passing through. I assumed that I could use Ingest Pipelines to parse each layer, extract my logs, and merge them into the ESF structure in order to leverage some of its ECS content while adding my own. For reference, here's a portion of my pipeline:

{
  "processors": [
    {
      "json": {
        "field": "message",
        "target_field": "parsed_cloudwatch_log_event"
      }
    },
    {
      "json": {
        "field": "parsed_cloudwatch_log_event.message",
        "target_field": "parsed_app_log"
      }
    },
    {
      "set": {
        "field": "@timestamp",
        "copy_from": "parsed_app_log.@timestamp",
        "ignore_failure": true
      }
    },
    {
      "set": {
        "field": "message",
        "copy_from": "parsed_app_log.message",
        "ignore_failure": true
      }
    },
    {
      "remove": {
        "field": "parsed_cloudwatch_log_event"
      }
    },
    {
      "remove": {
        "field": "parsed_app_log"
      }
    }
  ]
}
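
For anyone trying to reproduce this, the pipeline above can be exercised without indexing anything via Elasticsearch's _ingest/pipeline/_simulate API. A minimal Python sketch (the cluster URL and the pipeline.json file are placeholders for my setup):

import json

import requests

with open("pipeline.json") as f:  # the {"processors": [...]} definition shown above
    pipeline = json.load(f)

# A document shaped like the ESF output shown earlier
doc = {
    "_source": {
        "@timestamp": "2023-02-14T21:26:04.387857Z",
        "message": json.dumps({
            "id": "37385191276969030971785538657917125344955573022299717635",
            "timestamp": 1676409955997,
            "message": json.dumps({
                "log.level": "info",
                "@timestamp": "2023-02-14T21:25:55.996Z",
                "message": "beepboop",
            }),
        }),
    }
}

resp = requests.post(
    "http://localhost:9200/_ingest/pipeline/_simulate",  # placeholder cluster URL
    json={"pipeline": pipeline, "docs": [doc]},
    timeout=10,
)
print(json.dumps(resp.json(), indent=2))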

However, I have encountered some limitations in the pipeline processors that are currently blocking me from accessing the log.level field that ecs-pino-format adds.

In my own case I may need to stop using ecs-pino-format, as I don't see the blocking issues in the pipeline processors being resolved anytime soon. I don't really expect this enhancement to be implemented either, but I wanted to at least document what I'm dealing with as a user.
