Updating Complex Modules to Include LAVA Output

This guide outlines the process of updating existing complex xLEAPP modules to include LAVA output. These modules typically process a single group of file pattern searches but generate multiple HTML and TSV outputs.

Note that if you are just attempting to split output into multiple HTML pages or just handle HTML manually for some other reason but the LAVA and TSV is a single table/output you can probably still use the instructions found in Updating Modules for Automatic Output Generation. (Refer to carCD.py for an example of manual HTML with automatic LAVA/TSV processing)

Key Changes

Update the __artifacts_v2__ block
Modify imports (add LAVA and Media Manager functions)
Handle Media Files (if applicable)
Add LAVA output generation to the main function
Use special data type handlers for LAVA output

Detailed Process

1. Update the `__artifacts_v2__` block

Modify the __artifacts_v2__ dictionary to include all required fields for the complex artifact. This dictionary should be the first thing in the script, before any imports or other code. Make sure to set output_types to "none" since we're not using the artifact processor for automated output generation. Additionally, include a "function" key with the name of the main processing function as its value.

__artifacts_v2__ = {
    "complex_artifact": {
        "name": "Complex Artifact Name",
        "description": "Description of the complex artifact",
        "author": "@AuthorUsername",
        "version": "1.0",
        "date": "2023-05-24",
        "requirements": "none",
        "category": "Complex Artifact Category",
        "notes": "",
        "paths": ('Path/to/complex/artifact/files',),
        "output_types": "none",  # Changed from ["HTML", "TSV"] to "none"
        "function": "get_complex_artifact"  # This should match the name of your main processing function
    }
}

2. Modify Imports

Add imports for LAVA functions and, if your module handles media, the Media Manager functions from ilapfuncs.

# Existing imports for your module...
# import scripts.artifact_report # For manual HTML reports
# from scripts.ilapfuncs import tsv # For manual TSV

# Add these for LAVA and Media Management:
from scripts.lavafuncs import lava_process_artifact, lava_insert_sqlite_data
from scripts.ilapfuncs import (
    check_in_media,
    check_in_embedded_media,
    get_file_path, # If you use it to find files
    # ... other necessary ilapfuncs ...
)
import os # Often needed for path joining with 'data' subfolder

3. Handle Media Files (If Applicable)

If your complex artifact involves media files, use the Media Manager functions (check_in_media or check_in_embedded_media) to process them. This step should occur before you populate your data_list with media references and before you call lava_process_artifact.

Get artifact_info: Retrieve the artifact's metadata dictionary from __artifacts_v2__ using the current function's name.

# Inside your main processing function (e.g., get_complex_artifact)
# This assumes 'complex_artifact' is the key in __artifacts_v2__
# and 'get_complex_artifact' is the name of the current function.
# This dict is passed to check_in_media or check_in_embedded_media.
current_artifact_info = __artifacts_v2__["complex_artifact"]

Call check_in_media or check_in_embedded_media: These functions return a media_ref_id (string), which you then put into your data_list.

For files on disk (check_in_media):

image_file_pattern = "**/path/to/image.jpg" # Pattern to find the media file
found_image_path_in_extraction = get_file_path(files_found, image_file_pattern)

media_ref_id = None
if found_image_path_in_extraction:
    media_ref_id = check_in_media(
        artifact_info=current_artifact_info,
        report_folder=report_folder,
        seeker=seeker,
        files_found=files_found, 
        file_path=image_file_pattern, 
        name="Descriptive Name for Media Item" # Optional
    )
# Add media_ref_id (or placeholder if None) to your data_list
# e.g., data_list_for_part1.append((timestamp, event_details, media_ref_id))

For embedded binary data (check_in_embedded_media):

binary_media_data = record['blob_column'] # Actual bytes of the media
source_db_path = get_file_path(files_found, "**/source_app.db")

media_ref_id = None
if binary_media_data and source_db_path:
    media_ref_id = check_in_embedded_media(
        artifact_info=current_artifact_info,
        report_folder=report_folder, 
        seeker=seeker,
        source_file=source_db_path,
        data=binary_media_data,
        name="Embedded Media Item Name" # Optional
    )
# Add media_ref_id to your data_list

Update data_headers: For each data_list that will contain media_ref_ids, ensure the corresponding data_headers list marks that column with the type 'media'.
```
# For data_headers_part1 if it has a media column:
data_headers_part1 = [('Timestamp', 'datetime'), 'Description', ('Photo', 'media')]
#
# For data_headers_part2 if it also has media:
data_headers_part2 = ['User', ('Attachment', 'media', 'max-width:50px;')] # Optional style for HTML
```
This 'media' type is essential for lava_process_artifact to correctly handle the media_ref_id. It also helps the manual HTML report generation if you have a helper function that understands this tuple format for media.

4. Add LAVA output generation to the main function

After processing data (including any media calls), generating your manual HTML and TSV reports, add the LAVA output generation for each distinct dataset.

# In your main processing function, e.g., get_complex_artifact
def get_complex_artifact(files_found, report_folder, seeker, wrap_text, timezone_offset):
    data_list1 = []
    data_list2 = []
    data_headers1 = ['Column1', 'Column2', 'Column3']
    data_headers2 = ['ColumnA', 'ColumnB', 'ColumnC']

    for file_found in files_found:
        # Process data and populate data_list1 and data_list2
        # ...

    # Generate HTML report for the first artifact
    report1 = ArtifactHtmlReport('Complex Artifact - Part 1')
    report1.start_artifact_report(report_folder, 'Complex Artifact - Part 1')
    report1.add_script()
    report1.write_artifact_data_table(data_headers1, data_list1, file_found)
    report1.end_artifact_report()

    # Generate TSV for the first artifact
    tsv(report_folder, data_headers1, data_list1, 'Complex Artifact - Part 1')

    # Generate HTML report for the second artifact
    report2 = ArtifactHtmlReport('Complex Artifact - Part 2')
    report2.start_artifact_report(report_folder, 'Complex Artifact - Part 2')
    report2.add_script()
    report2.write_artifact_data_table(data_headers2, data_list2, file_found)
    report2.end_artifact_report()

    # Generate TSV for the second artifact
    tsv(report_folder, data_headers2, data_list2, 'Complex Artifact - Part 2')

    # Generate LAVA output
    category = "Complex Artifact Category"
    module_name = "get_complex_artifact"

    # Add special data type handlers for LAVA output
    data_headers1[0] = (data_headers1[0], 'datetime')
    data_headers2[2] = (data_headers2[2], 'date')

    # Process first artifact for LAVA
    table_name1, object_columns1, column_map1 = lava_process_artifact(category, module_name, 'Complex Artifact - Part 1', data_headers1, len(data_list1))
    lava_insert_sqlite_data(table_name1, data_list1, object_columns1, data_headers1, column_map1)

    # Process second artifact for LAVA
    table_name2, object_columns2, column_map2 = lava_process_artifact(category, module_name, 'Complex Artifact - Part 2', data_headers2, len(data_list2))
    lava_insert_sqlite_data(table_name2, data_list2, object_columns2, data_headers2, column_map2)

Important Notes

The output_types in __artifacts_v2__ is set to "none" because any HTML/TSV generation is manual. LAVA output is added by directly calling lava_process_artifact and lava_insert_sqlite_data.
Call LAVA functions for each distinct data table you want to represent in LAVA.
The category comes from your __artifacts_v2__ entry. The module_name for lava_process_artifact is the name of your Python artifact processing function (e.g., get_complex_artifact). The artifact_name_for_lava should be a unique, descriptive name for that specific table being generated.
Media Handling:
- Use check_in_media / check_in_embedded_media from ilapfuncs.py.
- Pass the data subfolder path (e.g., os.path.join(report_folder, 'data')) as the report_folder argument to these media functions.
- These functions handle copying/linking media and creating LAVA database entries (media_items, media_references).
- The media_ref_id returned is placed in your data_list.
- lava_process_artifact interprets this media_ref_id correctly when the column header is typed as 'media'.

These changes will add LAVA output capability to your complex module while maintaining its existing HTML and TSV outputs and utilizing special data type handlers for LAVA.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!