Added JSON-Formatted Output #67

Open · wants to merge 6 commits into master
Conversation

PoorBillionaire (Contributor) commented Jun 25, 2016

Greets--

Short: This pull request adds functionality to the samples/amcache.py script to output results as JSON-formatted strings with the -j command-line switch.

Long:
amcache.py would be a great tool to use when sampling an environment for Amcache.hve artifacts at scale. I have considered various ways to consume and log the data it parses: shipping it to Elasticsearch or Logstash (via syslog/filebeat), or interpreting it with other Python scripts.

Of the options I have considered, JSON-formatted output would be extremely valuable - for example, in Logstash the syslog and beats plugins contain codec functionality for JSON-formatted events. Similarly, other Python scripts could benefit from JSON-formatted strings by loading them as dictionary objects to access and act upon the various key/value combinations.

Sample output:

dev@computer:~/python-registry$ python samples/amcache.py Amcache.hve -j

{"pe_checksum": 161438, "modified_timestamp": "2013-08-22 13:25:38.066004", "sha1": "0000f783a29297a42e86e7f2ef17d91737eb5add732d", "pe_sizeofimage": 143360, "first_run": "2014-11-21 09:54:02.063293", "language": 1033, "linker_timestamp": "2013-08-22 02:47:53", "company": "Microsoft Corporation", "switchbackcontext": 72057594037929472, "product": "Microsoft® Windows® Operating System", "header_hash": "0101c6ba8c455c8332340bc4a73f782f15b427336eff", "modified_timestamp2": "2013-08-22 13:25:38.053610", "version": "6.3.9600.16384 (winblue_rtm.130821-1623)", "id": "-", "file_description": "Dism Host Servicing Process", "created_timestamp": "2014-11-21 09:54:01.969542", "path": "C:\\Users\\Administrator\\AppData\\Local\\Temp\\9D1571B1-DEC2-4D4D-8166-81378BC8398F\\DismHost.exe", "version_number": "6.3.9600.16384", "size": 140392}

Note: I am highly inadequate when it comes to handling unicode strings, but I found a combination of methods that seemed to do the job. I would appreciate any further testing to ensure it works consistently with the rest of your project.

Thanks!

Adam
Tw: @_TrapLoop

import sys
import logging
import datetime
from collections import namedtuple
import json
williballenthin (Owner):

style nit pick: i like to order my imports by line length (with group of stdlib, pip-installable, and project-local modules).
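applied to the block above, that ordering might look something like this (all five imports are stdlib, so they form a single group; the reordering is just a sketch of the suggestion):

import sys
import json
import logging
import datetime
from collections import namedtuple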

williballenthin (Owner) commented Jun 25, 2016

these changes are great! thanks for taking the time to make enhancements, and then explain your reasoning. i provided some comments on the code, which are meant to be totally respectful and constructive. let me know if you have any questions or issues, and i'm happy to discuss.

@matthewdunwoody is currently working on an amcache.py invoker tool to make it easy to process large zip files of hives, and distribute the processing across many cores. i'll encourage him to make it available online for your enjoyment.

    print(json.dumps(document, ensure_ascii=False).encode("utf-8"))
else:
    w = unicodecsv.writer(sys.stdout, delimiter="|", quotechar="\"",
                          quoting=unicodecsv.QUOTE_MINIMAL, encoding="utf-8")
    w.writerow(map(lambda e: e.name, FIELDS))
    for e in ee:
        print(e)
        exit(type(e.path))
PoorBillionaire (Contributor, Author):

...I don't even remember doing this.

PoorBillionaire (Contributor, Author):

Thanks for the great feedback, it was very helpful. Let me know if those syntax and style changes work for you.

I need to do a bit of reading before I can say more about the JSON vs. jsonl formatting - I'll update this thread later tonight.

PoorBillionaire (Contributor, Author) commented Jun 29, 2016

Apologies for the delayed response.

I've thought a bit about the formatting: one large JSON document per hive file, or jsonl with each line representing a parsed entry in a given hive file. As you noted, Willi, the pros of jsonl involve incremental processing and streaming. There is also a nice simplicity to the jsonl approach, given the larger volume of small, self-contained entries parsed from the Amcache hive.
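To illustrate, jsonl output would simply be one self-contained object per line, along these lines (all values are placeholders, not real output):

{"path": "<path1>", "sha1": "<hash1>", "first_run": "<timestamp1>"}
{"path": "<path2>", "sha1": "<hash2>", "first_run": "<timestamp2>"}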

In the case of one JSON document, the benefit is that one could easily shovel the output of a single document to another script - though I feel like there is a decent amount of added complexity to the object. Each key in the document would need to be unique. That key could be a processed value of some kind - perhaps I could parse the base file name from the path attribute to be the key, with the value containing the other attributes from the FIELDS object:

{
    "<baseFileName1>": {
        "path": "<path>", "hash": "<hash>", "timestamp": "<timestamp>"
    },
    "<baseFileName2>": {
        "path": "<path>", "hash": "<hash>", "timestamp": "<timestamp>"
    }
}

...unless you had a different structure in mind.

Sheepishly, I suppose I'm saying "it depends". For me, it would be more natural to use jsonl output, which I could ship to Elasticsearch directly and easily without worrying much about parsing - or use an indexer to perform additional processing if needed. I am certainly open to other opinions; I am probably way too comfortable in my current way of thinking.

PoorBillionaire (Contributor, Author) commented Jun 29, 2016

Additionally, I realized today that regardless of the format used, I have been ignoring the fact that (in my use case, at least) I will need the subject hostname or IP address available at run time: if an analyst collects at scale, potentially thousands of Amcache.hve files are produced, and none of the analysis after that matters unless we can tie a given amcache entry to the host it was collected from.

In that case, perhaps the do_json piece would be wrapped in a function which takes a host identifier as an optional parameter. If the host parameter is provided, that value would be inserted into the JSON document; the value itself could be supplied at the command line. Thoughts?
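Something like this, perhaps (a rough sketch only: do_json and FIELDS come from this pull request, while the --host plumbing, the dict construction, and the default=str handling of datetime values are assumptions on my part):

import json

def do_json(ee, host=None):
    # emit one JSON document per parsed amcache entry (jsonl style);
    # `ee` is the iterable of parsed entries, assuming each entry exposes
    # one attribute per name in the module-level FIELDS list of amcache.py;
    # `host` is an optional identifier, e.g. from a --host switch (assumed name)
    for e in ee:
        document = {f.name: getattr(e, f.name) for f in FIELDS}
        if host is not None:
            document["host"] = host
        # default=str stringifies datetime values (assumption about input types)
        print(json.dumps(document, ensure_ascii=False, default=str))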
