Skip to content

Latest commit

 

History

History
79 lines (79 loc) · 2.41 KB

todos.org

File metadata and controls

79 lines (79 loc) · 2.41 KB

Have a ConsolePolicy which

Possible fields for a log message:

  • LEVEL: info, warn, error
  • MESSAGE:
  • PREFIX:
  • RAW_ERROR: Only see this if LEVEL=error
  • SUBJECT: publish, publisherror

Inplace message filter

Header

  • [ ] Check for header constraints in-place to avoid memory allocation.
  • [ ] Possible constraints
    • Timestamp
    • Node name
    • Pool name

How to make thing fast?

  • Run the first time to get the size of lines.
  • Second run will be used to filter messages?

File

  • size: Size of a file.
  • lines: Position of EOL.
  • data: A file content.

FileStats

  • path: Full path
  • stem: Stem
  • extension: File extension
  • mask: Read/Write/Execute etc
  • symlink: Is a symlink or not
  • is_dir: Is a folder or not

Tasks

Loop thought all log files and find all possible field name.

  • Extract JSON string.
  • Parse JSON string and update possible key list.

Data structure for scribe log

LogMessage

MessageHeader

MessageBody

Multi-threaded log parser.

Producer

  • Read trunk of data
  • Detect the last EOL and pass data to the consumer. Might need to push data into a queue?

Consumers

  • Pop data off queue.
  • Parse data and keep informat that we need.
  • Update the results (shared data)

Store data

  • Store data into NoSQL database.
  • Column: date_hour
  • Key: Message IDs.
  • Need to have a cache information for message ids.

Check message life cycle

  • Parse incomming messages.
  • Update message lifecycle table. A vector + a hash table.
  • Print out summary information.
  • Print out outlier.

What we can do to speedup fastgrep?

Parsing algorithm: Can we parse the read buffer instead of copying to the line buffer?

  • Handle cases that a line can be on many block.

Scan log file to cache message information such as

  • begin and end of a message.
  • Type of messages?
  • Timestamp

How do we cache messages

publish request.

  • Can be varied from 131 to several MBytes.

Control messages

  • We only store the message id (20 bytes) and message type (1 bytes)

Error messages

  • User error
  • RAW_ERROR: connect and publish errors etc.

Use cases

Support skip patterns

Filter message by time stamp.

Save found message to files in different format.

Check message life cycle.

Clustering messages

Questions

Can mixed-in offer better performance than policy based approach?