in_http: unstructured logs #10128

Open: wants to merge 2 commits into master

pwhelan (Contributor) commented Mar 24, 2025

Summary

Add support for unstructured logs to in_http.

Description

Add support for unstructured, newline-delimited logs sent with a Content-Type: text/plain header. Each line of the request body is emitted as a separate record under the log key.

This is modeled on the unstructured log support in in_tail. Neither parsers nor multiline are supported yet. I have already done the work for supporting parsers, which also makes it possible to implement the tag_key feature, and that could be added to this PR. Multiline support is still pending.
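As a rough illustration of the line splitting described above, here is a standalone C sketch. `next_line` is a hypothetical helper, not code from this PR or part of Fluent Bit's API. It assumes that CRLF line endings are tolerated by dropping the trailing `\r`, and that empty lines (such as the one after a final newline) produce no record. Each line it returns would become one record under the `log` key.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the next non-empty line of a
 * newline-delimited text/plain body.
 *
 * On success, sets *line and *line_len (the line is NOT NUL-terminated;
 * it points into body), advances *pos past the line and its '\n',
 * and returns 1. Returns 0 when no lines remain.
 *
 * Assumptions (not taken from the PR): a trailing '\r' is dropped so
 * "\r\n" endings work, and empty lines are skipped.
 */
static int next_line(const char *body, size_t body_len, size_t *pos,
                     const char **line, size_t *line_len)
{
    while (*pos < body_len) {
        const char *start = body + *pos;
        size_t remaining = body_len - *pos;
        const char *nl = memchr(start, '\n', remaining);
        size_t len = nl ? (size_t) (nl - start) : remaining;

        /* Move past this line, plus its '\n' if one was found. */
        *pos += nl ? len + 1 : len;

        /* Tolerate CRLF line endings. */
        if (len > 0 && start[len - 1] == '\r') {
            len--;
        }

        if (len > 0) {
            *line = start;
            *line_len = len;
            return 1;
        }
        /* Empty line: keep scanning. */
    }
    return 0;
}
```

A caller loops until `next_line` returns 0 and builds one record per returned line; because the returned pointer aims into the request body, the sketch itself copies nothing.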



Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to the packaging of containers or native binaries, please confirm it works for all targets.

  • Run a local packaging test showing that all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0. By submitting this pull request, I understand that this code will be released under the terms of that license.

pwhelan (Contributor, Author) commented Mar 26, 2025

I did not include a configuration example, since any configuration using the http input should suffice. The test below configures everything with command-line flags.

Here is a test run under Valgrind:

Submitting the logs:

╭─pwhelan@Phillips-MacBook-Pro ~
╰─$ cat errors.log
[Fri Dec 16 01:46:23 2005] [error] [client 1.2.3.4] Directory index forbidden by rule: /home/test/
[Fri Dec 16 01:54:34 2005] [error] [client 1.2.3.4] Directory index forbidden by rule: /apache/web-data/test2
[Fri Dec 16 02:25:55 2005] [error] [client 1.2.3.4] Client sent malformed Host header
[Mon Dec 19 23:02:01 2005] [error] [client 1.2.3.4] user test: authentication failure for "/~dcid/test1": Password Mismatch
╭─pwhelan@Phillips-MacBook-Pro ~
╰─$ cat errors.log | http hydra.lan:8000/ Content-type:text/plain
HTTP/1.1 201 Created
content-length: 0

The Valgrind output:

╭─pwhelan@hydra /home/pwhelan/Projects/work/fluent-bit.git/pwhelan-http-unstructured-logs/build  ‹system›  <pwhelan-http-unstructured-logs>
╰─$ valgrind ./bin/fluent-bit -i http -p port=8000 -o stdout -f 1
==2288== Memcheck, a memory error detector
==2288== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==2288== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==2288== Command: ./bin/fluent-bit -i http -p port=8000 -o stdout -f 1
==2288==
Fluent Bit v4.0.0
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _             ___  _____
|  ___| |                | |   | ___ (_) |           /   ||  _  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __/ /| || |/' |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| ||  /| |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /\___  |\ |_/ /
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/     |_(_)___/


[2025/03/26 12:08:34] [ info] [fluent bit] version=4.0.0, commit=3caea300ca, pid=2288
[2025/03/26 12:08:34] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/03/26 12:08:34] [ info] [simd    ] disabled
[2025/03/26 12:08:34] [ info] [output:stdout:stdout.0] worker #0 started
[2025/03/26 12:08:34] [ info] [cmetrics] version=0.9.9
[2025/03/26 12:08:34] [ info] [ctraces ] version=0.6.2
[2025/03/26 12:08:34] [ info] [input:http:http.0] initializing
[2025/03/26 12:08:34] [ info] [input:http:http.0] storage_strategy='memory' (memory only)
[2025/03/26 12:08:34] [ info] [sp] stream processor started
[0] http.0: [[1743001720.897367165, {}], {"log"=>"[Fri Dec 16 01:46:23 2005] [error] [client 1.2.3.4] Directory index forbidden by rule: /home/test/"}]
[1] http.0: [[1743001720.914443982, {}], {"log"=>"[Fri Dec 16 01:54:34 2005] [error] [client 1.2.3.4] Directory index forbidden by rule: /apache/web-data/test2"}]
[2] http.0: [[1743001720.915065654, {}], {"log"=>"[Fri Dec 16 02:25:55 2005] [error] [client 1.2.3.4] Client sent malformed Host header"}]
[3] http.0: [[1743001720.915142481, {}], {"log"=>"[Mon Dec 19 23:02:01 2005] [error] [client 1.2.3.4] user test: authentication failure for "/~dcid/test1": Password Mismatch"}]
^C[2025/03/26 12:08:47] [engine] caught signal (SIGINT)
[2025/03/26 12:08:47] [ warn] [engine] service will shutdown in max 5 seconds
[2025/03/26 12:08:47] [ info] [engine] service has stopped (0 pending tasks)
[2025/03/26 12:08:47] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2025/03/26 12:08:47] [ info] [output:stdout:stdout.0] thread worker #0 stopped
==2288==
==2288== HEAP SUMMARY:
==2288==     in use at exit: 0 bytes in 0 blocks
==2288==   total heap usage: 1,776 allocs, 1,776 frees, 1,010,222 bytes allocated
==2288==
==2288== All heap blocks were freed -- no leaks are possible
==2288==
==2288== For lists of detected and suppressed errors, rerun with: -s
==2288== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

The same configuration with verbose logging enabled (-vvv):

╭─pwhelan@hydra /home/pwhelan/Projects/work/fluent-bit.git/pwhelan-http-unstructured-logs/build  ‹system›  <pwhelan-http-unstructured-logs>
╰─$ ./bin/fluent-bit -vvv -i http -p port=8000 -o stdout -f 1
Fluent Bit v4.0.0
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _             ___  _____
|  ___| |                | |   | ___ (_) |           /   ||  _  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __/ /| || |/' |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| ||  /| |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /\___  |\ |_/ /
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/     |_(_)___/


[2025/03/26 12:10:55] [ info] Configuration:
[2025/03/26 12:10:55] [ info]  flush time     | 1.000000 seconds
[2025/03/26 12:10:55] [ info]  grace          | 5 seconds
[2025/03/26 12:10:55] [ info]  daemon         | 0
[2025/03/26 12:10:55] [ info] ___________
[2025/03/26 12:10:55] [ info]  inputs:
[2025/03/26 12:10:55] [ info]      http
[2025/03/26 12:10:55] [ info] ___________
[2025/03/26 12:10:55] [ info]  filters:
[2025/03/26 12:10:55] [ info] ___________
[2025/03/26 12:10:55] [ info]  outputs:
[2025/03/26 12:10:55] [ info]      stdout.0
[2025/03/26 12:10:55] [ info] ___________
[2025/03/26 12:10:55] [ info]  collectors:
[2025/03/26 12:10:55] [ info] [fluent bit] version=4.0.0, commit=3caea300ca, pid=2345
[2025/03/26 12:10:55] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2025/03/26 12:10:55] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/03/26 12:10:55] [ info] [simd    ] disabled
[2025/03/26 12:10:55] [ info] [cmetrics] version=0.9.9
[2025/03/26 12:10:55] [ info] [ctraces ] version=0.6.2
[2025/03/26 12:10:55] [ info] [input:http:http.0] initializing
[2025/03/26 12:10:55] [ info] [input:http:http.0] storage_strategy='memory' (memory only)
[2025/03/26 12:10:55] [debug] [http:http.0] created event channels: read=25 write=26
[2025/03/26 12:10:55] [debug] [downstream] listening on 0.0.0.0:8000
[2025/03/26 12:10:55] [debug] [stdout:stdout.0] created event channels: read=28 write=29
[2025/03/26 12:10:55] [ info] [sp] stream processor started
[2025/03/26 12:10:55] [ info] [output:stdout:stdout.0] worker #0 started
[2025/03/26 12:11:02] [debug] [task] created task=0x7ffff002b980 id=0 OK
[2025/03/26 12:11:02] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] http.0: [[1743001862.176759521, {}], {"log"=>"[Fri Dec 16 01:46:23 2005] [error] [client 1.2.3.4] Directory index forbidden by rule: /home/test/"}]
[1] http.0: [[1743001862.176789327, {}], {"log"=>"[Fri Dec 16 01:54:34 2005] [error] [client 1.2.3.4] Directory index forbidden by rule: /apache/web-data/test2"}]
[2] http.0: [[1743001862.176793575, {}], {"log"=>"[Fri Dec 16 02:25:55 2005] [error] [client 1.2.3.4] Client sent malformed Host header"}]
[3] http.0: [[1743001862.176800708, {}], {"log"=>"[Mon Dec 19 23:02:01 2005] [error] [client 1.2.3.4] user test: authentication failure for "/~dcid/test1": Password Mismatch"}]
[2025/03/26 12:11:02] [debug] [out flush] cb_destroy coro_id=0
[2025/03/26 12:11:02] [debug] [task] destroy task=0x7ffff002b980 (task_id=0)
^C[2025/03/26 12:11:04] [engine] caught signal (SIGINT)
[2025/03/26 12:11:04] [ warn] [engine] service will shutdown in max 5 seconds
[2025/03/26 12:11:04] [ info] [engine] service has stopped (0 pending tasks)
[2025/03/26 12:11:04] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2025/03/26 12:11:04] [ info] [output:stdout:stdout.0] thread worker #0 stopped

pwhelan marked this pull request as ready for review March 26, 2025 15:12
niedbalski self-requested a review March 31, 2025 12:41