- LEVEL: info, warn, error
- MESSAGE:
- PREFIX:
- RAW_ERROR: Only present when LEVEL=error
- SUBJECT: publish, publisherror
- [ ] Check for header constraints in-place to avoid memory allocation.
- [ ] Possible constraints
- Timestamp
- Node name
- Pool name
- First pass: scan the file to get the size of each line.
- Second pass: possibly used to filter messages?
- size: Size of the file.
- lines: Positions of each EOL.
- data: The file content.
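
The three fields above could be filled in a single scan over the file content. The following is only a minimal sketch: `FileData` and `index_lines` are hypothetical names, and it assumes `\n` is the EOL byte.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileData:
    size: int          # size of the file in bytes
    lines: List[int]   # byte offset of every EOL
    data: bytes        # the file content


def index_lines(data: bytes) -> FileData:
    """Record the offset of each b'\\n' in data (assumed EOL marker)."""
    lines = []
    pos = data.find(b"\n")
    while pos != -1:
        lines.append(pos)
        pos = data.find(b"\n", pos + 1)
    return FileData(size=len(data), lines=lines, data=data)
```

With the offsets stored, line `i` is the slice between `lines[i - 1] + 1` and `lines[i]`, so a later pass can filter messages without rescanning for EOLs.
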
- path: Full path
- stem: Stem
- extension: File extension
- mask: Permission bits (read/write/execute, etc.)
- symlink: Whether the path is a symlink
- is_dir: Whether the path is a directory
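
As an illustration, the path fields listed above map fairly directly onto the Python standard library. This is a sketch, not the project's implementation; `PathInfo` and `describe` are hypothetical names.

```python
import os
import stat
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PathInfo:
    path: str        # full (absolute) path
    stem: str        # file name without the extension
    extension: str   # file extension, including the leading dot
    mask: int        # permission bits (read/write/execute, etc.)
    symlink: bool    # True if the path itself is a symlink
    is_dir: bool     # True if the path is a directory


def describe(p) -> PathInfo:
    p = Path(p)
    st = os.lstat(p)  # lstat does not follow symlinks
    return PathInfo(
        path=os.path.abspath(p),
        stem=p.stem,
        extension=p.suffix,
        mask=stat.S_IMODE(st.st_mode),
        symlink=stat.S_ISLNK(st.st_mode),
        is_dir=p.is_dir(),  # note: follows symlinks
    )
```
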
- Extract JSON string.
- Parse JSON string and update possible key list.
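
One possible way to do both steps, assuming each log line contains at most one JSON object embedded in other text. The helper names `extract_json` and `update_keys` are hypothetical.

```python
import json

_decoder = json.JSONDecoder()


def extract_json(line: str):
    """Return the first JSON object embedded in line, or None."""
    start = line.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`
            # and ignores whatever follows it.
            obj, _end = _decoder.raw_decode(line, start)
            return obj
        except json.JSONDecodeError:
            start = line.find("{", start + 1)
    return None


def update_keys(keys: set, line: str) -> set:
    """Add the top-level keys of the line's JSON object to keys."""
    obj = extract_json(line)
    if isinstance(obj, dict):
        keys.update(obj.keys())
    return keys
```
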
- Read a chunk of data.
- Detect the last EOL and pass data to the consumer. Might need to push data into a queue?
- Pop data off queue.
- Parse data and keep the information that we need.
- Update the results (shared data)
- Store data into NoSQL database.
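
The read/queue/consume steps above could look roughly like the sketch below. It makes several assumptions: `\n` is the EOL byte, one consumer thread is enough, and counting lines stands in for the real parsing. The NoSQL store is left out, and `read_blocks` and `run_pipeline` are hypothetical names.

```python
import queue
import threading


def read_blocks(stream, chunk_size=64 * 1024):
    """Yield blocks that always end on an EOL; carry the partial tail over."""
    tail = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        cut = buf.rfind(b"\n")      # last EOL in the buffer
        if cut == -1:
            tail = buf              # no complete line yet, keep reading
            continue
        yield buf[: cut + 1]
        tail = buf[cut + 1 :]
    if tail:
        yield tail                  # final line without a trailing EOL


def run_pipeline(stream, chunk_size=64 * 1024):
    q = queue.Queue(maxsize=8)      # bounded, so the reader cannot run far ahead
    results = {"lines": 0}          # shared data, written only by the consumer

    def consumer():
        while True:
            block = q.get()
            if block is None:       # sentinel: the producer is done
                break
            results["lines"] += len(block.splitlines())

    worker = threading.Thread(target=consumer)
    worker.start()
    for block in read_blocks(stream, chunk_size):
        q.put(block)
    q.put(None)
    worker.join()
    return results
```

Because the tail after the last EOL is carried into the next read, a line that straddles two chunks reaches the consumer in one piece.
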
- Column: date_hour
- Key: Message IDs.
- Need a cache of message IDs.
- Parse incoming messages.
- Update message lifecycle table. A vector + a hash table.
- Print out summary information.
- Print out outliers.
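
A "vector + hash table" lifecycle table can be sketched as a list of rows plus a dictionary mapping each message ID to its row index. The class and method names are hypothetical, and the events are assumed to be the SUBJECT values (`publish`, `publisherror`). How outliers are defined is not specified here, so the sketch covers only the summary.

```python
from dataclasses import dataclass, field


@dataclass
class Lifecycle:
    msg_id: str
    events: list = field(default_factory=list)  # subjects seen, in order


class LifecycleTable:
    def __init__(self):
        self.rows = []     # the vector: one Lifecycle per message
        self.index = {}    # the hash table: msg_id -> position in rows

    def update(self, msg_id: str, subject: str) -> None:
        i = self.index.get(msg_id)
        if i is None:                 # first time this message is seen
            i = len(self.rows)
            self.index[msg_id] = i
            self.rows.append(Lifecycle(msg_id))
        self.rows[i].events.append(subject)

    def summary(self) -> dict:
        """Count how many times each subject was observed."""
        counts = {}
        for row in self.rows:
            for subject in row.events:
                counts[subject] = counts.get(subject, 0) + 1
        return counts
```
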
- Handle the case where a line spans multiple blocks.
- begin and end of a message.
- Type of messages?
- Timestamp
- Can vary from 131 to several MBytes.
- We only store the message ID (20 bytes) and message type (1 byte).
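
The fixed 21-byte record (20-byte message ID plus 1-byte type) could be packed as shown below. This is a sketch under assumptions: the ID is treated as raw bytes, the type as an unsigned integer in 0..255, and `pack_record` / `unpack_record` are hypothetical names.

```python
import struct

# "=" disables native alignment padding; 20s = 20 raw bytes, B = unsigned byte.
RECORD = struct.Struct("=20sB")


def pack_record(msg_id: bytes, msg_type: int) -> bytes:
    if len(msg_id) != 20:
        raise ValueError("message id must be exactly 20 bytes")
    return RECORD.pack(msg_id, msg_type)


def unpack_record(raw: bytes):
    """Return (msg_id, msg_type) from a 21-byte record."""
    return RECORD.unpack(raw)
```

Storing only this record keeps the per-message cost constant regardless of the size of the message body.
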
- User error
- RAW_ERROR: connect errors, publish errors, etc.