
Added JSON-Formatted Output #67

Open: wants to merge 6 commits into master
samples/amcache.py (21 changes: 19 additions, 2 deletions)
@@ -15,7 +15,10 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+from __future__ import print_function
Owner

I always forget to do this, thanks!


 import sys
+import json
 import logging
 import datetime
 from collections import namedtuple
@@ -169,12 +172,15 @@ def main(argv=None):

     parser = argparse.ArgumentParser(
         description="Parse program execution entries from the Amcache.hve Registry hive")
+    group = parser.add_mutually_exclusive_group()
+    group.add_argument("-t", action="store_true", dest="do_timeline",
+                       help="Output in simple timeline format")
+    group.add_argument("-j", action="store_true", dest="do_json",
+                       help="Output in JSON-formatted strings")
     parser.add_argument("registry_hive", type=str,
                         help="Path to the Amcache.hve hive to process")
     parser.add_argument("-v", action="store_true", dest="verbose",
                         help="Enable verbose output")
-    parser.add_argument("-t", action="store_true", dest="do_timeline",
-                        help="Output in simple timeline format")
     args = parser.parse_args(argv[1:])

     if args.verbose:
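The mutually exclusive group in this hunk is what lets `argparse` reject `-t` and `-j` when both are given. A minimal standalone sketch (Python 3; the flags are copied from the diff, while `build_parser` and `try_parse` are hypothetical helper names) that checks both cases:

```python
import argparse


def build_parser():
    # Mirrors the flags in the diff: -t and -j share a mutually exclusive group.
    parser = argparse.ArgumentParser(
        description="Parse program execution entries from the Amcache.hve Registry hive")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-t", action="store_true", dest="do_timeline",
                       help="Output in simple timeline format")
    group.add_argument("-j", action="store_true", dest="do_json",
                       help="Output in JSON-formatted strings")
    parser.add_argument("registry_hive", type=str,
                        help="Path to the Amcache.hve hive to process")
    parser.add_argument("-v", action="store_true", dest="verbose",
                        help="Enable verbose output")
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejected argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # On a conflict argparse prints a "not allowed with argument" error
        # to stderr and exits with status 2.
        return None


if __name__ == "__main__":
    print(try_parse(["-j", "Amcache.hve"]))
    print(try_parse(["-t", "-j", "Amcache.hve"]))
```

Keeping the earlier standalone `parser.add_argument("-t", ...)` next to the group would not work: `argparse` raises `argparse.ArgumentError` (conflicting option string) when the same option string is registered twice, so that registration has to go once `-t` moves into the group.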
@@ -213,6 +219,17 @@ def main(argv=None):
         w.writerow(["timestamp", "timestamp_type", "path", "sha1"])
         for e in sorted(entries, key=lambda e: e.timestamp):
             w.writerow([e.timestamp, e.type, e.entry.path, e.entry.sha1])

Owner

Probably should add a check that -j and -t are not both provided. I assume the user would figure it out pretty quickly on their own, but it's best to be explicit.

+    elif args.do_json:
+        for e in ee:
+            document = {}
+            for i in FIELDS:
+                val = getattr(e, i.name, "-")
+                if isinstance(val, datetime.datetime):
+                    val = val.isoformat(" ")
+                document[i.name] = val
+            print(json.dumps(document, ensure_ascii=False).encode("utf-8"))
Owner

The Unicode handling here looks good to me. Nice work! `json.dumps` returns either `str` or `unicode`, which you correctly encode into a specific byte representation (UTF-8).

On Windows, I've occasionally hit issues where stdout is open in text mode, so writing binary data to it produces unexpected extra bytes (text mode translates each `\n` into `\r\n`). However, since here we're dealing with encoded text, I think we should be OK.
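For anyone porting this to Python 3 (an assumption; the diff targets Python 2 and `unicodecsv`), the same idea can be sketched as follows: `ensure_ascii=False` leaves non-ASCII characters unescaped, `.encode("utf-8")` turns the resulting `str` into bytes, and those bytes belong on a binary stream such as `sys.stdout.buffer`, because Python 3's `print()` would show a `bytes` object as its `b'...'` repr. The helper names are hypothetical.

```python
import io
import json


def encode_record(document):
    # ensure_ascii=False keeps characters like "é" as-is instead of "\u00e9";
    # the str is then encoded to UTF-8 bytes, terminated by a newline.
    return json.dumps(document, ensure_ascii=False).encode("utf-8") + b"\n"


def write_records(documents, out):
    # `out` must be a binary stream (e.g. sys.stdout.buffer on Python 3).
    for doc in documents:
        out.write(encode_record(doc))


if __name__ == "__main__":
    buf = io.BytesIO()
    write_records([{"path": "café.exe"}], buf)
    print(buf.getvalue())
```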

Owner

Note: the output of -j mode is not a single JSON document but a sequence of JSON documents, one per line. I've seen this called JSONL (JSON Lines) before (http://jsonlines.org/).

Making the output a single document makes it easier for most programs to ingest, but harder to process as a stream. JSONL works better for incremental processing. Which format do you think we should use?
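A small sketch of the trade-off, assuming each entry has already been turned into a dict: JSON Lines is produced with one `json.dumps` call per record and can be read back line by line, while a single JSON document (here a list) has to be parsed in full with one `json.loads`. The function names are illustrative, not from the PR.

```python
import io
import json


def to_jsonl(records):
    # One compact JSON document per line (JSON Lines).
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def to_single_document(records):
    # One JSON array holding every record; consumers parse it in one go.
    return json.dumps(list(records), ensure_ascii=False)


def iter_jsonl(stream):
    # Incremental reader: each non-empty line is decoded independently,
    # so a consumer can act on the first record before the last one arrives.
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


if __name__ == "__main__":
    records = [{"path": "a.exe"}, {"path": "b.exe"}]
    for rec in iter_jsonl(io.StringIO(to_jsonl(records))):
        print(rec["path"])
    print(json.loads(to_single_document(records)) == records)
```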


     else:
         w = unicodecsv.writer(sys.stdout, delimiter="|", quotechar="\"",
                               quoting=unicodecsv.QUOTE_MINIMAL, encoding="utf-8")
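The `elif args.do_json` branch converts each entry field by field because `json.dumps` raises `TypeError` for `datetime.datetime` values. A minimal sketch of that conversion as a standalone function, assuming hypothetical `Field` and `Entry` namedtuples in place of the script's real `FIELDS` and entry objects (only `.name` is relied on, as in the diff):

```python
import datetime
import json
from collections import namedtuple

# Hypothetical stand-ins for the script's FIELDS list and entry objects.
Field = namedtuple("Field", ["name"])
FIELDS = [Field("path"), Field("sha1"), Field("created_timestamp")]
Entry = namedtuple("Entry", ["path", "sha1", "created_timestamp"])


def entry_to_document(entry, fields=FIELDS):
    """Build a JSON-serializable dict, rendering datetimes as ISO 8601 with a space separator."""
    document = {}
    for field in fields:
        # Missing attributes fall back to "-", matching the diff's getattr default.
        val = getattr(entry, field.name, "-")
        if isinstance(val, datetime.datetime):
            val = val.isoformat(" ")
        document[field.name] = val
    return document


if __name__ == "__main__":
    e = Entry("notepad.exe", "0" * 40, datetime.datetime(2015, 3, 1, 12, 30))
    print(json.dumps(entry_to_document(e), ensure_ascii=False))
```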