Add Provenance Tracking to File Upload Service #112

Merged
parthnair98 merged 1 commit into Redback-Operations:main from shimrxn:provenance-feature
Sep 16, 2025

@shimrxn shimrxn commented Sep 10, 2025

Pull Request: Add Provenance Tracking to File Upload Service

This PR introduces a provenance feature to the Streamlit file upload service, ensuring every uploaded file is accompanied by auditable metadata for traceability.

Key changes:

  • Added provenance fields (provenance_source, source_url) to capture dataset origin.

  • Implemented URL validation to ensure correct source formatting.

  • Extended upload flow to generate a provenance JSON log containing:

    • filename, project, preprocessing option, uploader info, and timestamp
    • digital signature for each provenance entry
  • Provenance logs are now stored in MinIO alongside the uploaded files.

  • Added a new "Provenance Logs" tab in the Streamlit UI to browse and view logs directly.
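The key changes above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the helper names (`is_valid_source_url`, `build_provenance_entry`, `verify_provenance_entry`) and the exact JSON keys are assumptions, and because the description does not say which signing scheme is used, HMAC-SHA256 with a shared secret stands in for the "digital signature" here.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from urllib.parse import urlparse


def is_valid_source_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that include a host."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def _canonical_bytes(fields: dict) -> bytes:
    # Sorted keys and fixed separators give a stable byte string to sign.
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_provenance_entry(filename, project, preprocessing, uploader,
                           provenance_source, source_url, secret_key: bytes) -> dict:
    """Build one provenance record and attach an HMAC over its fields."""
    if not is_valid_source_url(source_url):
        raise ValueError(f"invalid source_url: {source_url!r}")
    entry = {
        "filename": filename,
        "project": project,
        "preprocessing": preprocessing,
        "uploader": uploader,
        "provenance_source": provenance_source,
        "source_url": source_url,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Assumption: HMAC-SHA256 is a stand-in for the PR's signature scheme.
    entry["signature"] = hmac.new(
        secret_key, _canonical_bytes(entry), hashlib.sha256
    ).hexdigest()
    return entry


def verify_provenance_entry(entry: dict, secret_key: bytes) -> bool:
    """Recompute the HMAC over every field except the signature itself."""
    unsigned = {k: v for k, v in entry.items() if k != "signature"}
    expected = hmac.new(
        secret_key, _canonical_bytes(unsigned), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, str(entry.get("signature", "")))
```

In a flow like the one described, the resulting dict would be serialized with `json.dumps` and written to MinIO next to the uploaded file; the bucket and object naming are left out here because the description does not specify them.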

Impact:

  • Improves data governance and accountability by linking files to their origin.
  • Supports future auditing and compliance requirements.
  • Production-ready design aligned with Redback’s warehouse architecture.

@github-actions

🔒 Security Scan Results
=========================

Bandit Scan Results:
-------------------
Run started:2025-09-10 07:41:55.781022

Test results:
>> Issue: [B104:hardcoded_bind_all_interfaces] Possible binding to all interfaces.
   Severity: Medium   Confidence: Medium
   CWE: CWE-605 (https://cwe.mitre.org/data/definitions/605.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6/plugins/b104_hardcoded_bind_all_interfaces.html
   Location: ./Core DW Infrastructure/dremio-api/api.py:100:17
99	    port = int(os.getenv('FLASK_RUN_PORT', 5000))
100	    app.run(host='0.0.0.0', port=port)

--------------------------------------------------
>> Issue: [B104:hardcoded_bind_all_interfaces] Possible binding to all interfaces.
   Severity: Medium   Confidence: Medium
   CWE: CWE-605 (https://cwe.mitre.org/data/definitions/605.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6/plugins/b104_hardcoded_bind_all_interfaces.html
   Location: ./Core DW Infrastructure/flask/flaskapi_dw.py:86:17
85	if __name__ == '__main__':
86	    app.run(host='0.0.0.0', port=5000)  # Running on port 5000 IMPORTANT

--------------------------------------------------
>> Issue: [B104:hardcoded_bind_all_interfaces] Possible binding to all interfaces.
   Severity: Medium   Confidence: Medium
   CWE: CWE-605 (https://cwe.mitre.org/data/definitions/605.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6/plugins/b104_hardcoded_bind_all_interfaces.html
   Location: ./File Upload Service/flask/flaskapi_dw.py:86:17
85	if __name__ == '__main__':
86	    app.run(host='0.0.0.0', port=5000)  # Running on port 5000 IMPORTANT

--------------------------------------------------
>> Issue: [B104:hardcoded_bind_all_interfaces] Possible binding to all interfaces.
   Severity: Medium   Confidence: Medium
   CWE: CWE-605 (https://cwe.mitre.org/data/definitions/605.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6/plugins/b104_hardcoded_bind_all_interfaces.html
   Location: ./MongoDB_Connection/Project1/main.py:12:35
11	    debug_mode = os.environ.get('FLASK_DEBUG', 'False').lower() == 'true'
12	    app.run(debug=debug_mode, host='0.0.0.0')

--------------------------------------------------
>> Issue: [B104:hardcoded_bind_all_interfaces] Possible binding to all interfaces.
   Severity: Medium   Confidence: Medium
   CWE: CWE-605 (https://cwe.mitre.org/data/definitions/605.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6/plugins/b104_hardcoded_bind_all_interfaces.html
   Location: ./Structured Dremio Solution/Flask-api/api.py:100:17
99	    port = int(os.getenv('FLASK_RUN_PORT', 5000))
100	    app.run(host='0.0.0.0', port=port)

--------------------------------------------------
>> Issue: [B608:hardcoded_sql_expressions] Possible SQL injection vector through string-based query construction.
   Severity: Medium   Confidence: Low
   CWE: CWE-89 (https://cwe.mitre.org/data/definitions/89.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6/plugins/b608_hardcoded_sql_expressions.html
   Location: ./Structured Dremio Solution/Script/pipeline.py:168:12
167	    placeholders = ', '.join(['?' for _ in data[0]])
168	    query = f"INSERT INTO {table_name} VALUES ({placeholders})"
169	    cursor = conn.cursor()

--------------------------------------------------
>> Issue: [B108:hardcoded_tmp_directory] Probable insecure usage of temp file/directory.
   Severity: Medium   Confidence: Medium
   CWE: CWE-377 (https://cwe.mitre.org/data/definitions/377.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6/plugins/b108_hardcoded_tmp_directory.html
   Location: ./pre-processing/pre-processing.py:177:29
176	
177	            temp_file_path = f'/tmp/{obj.object_name}'
178	

--------------------------------------------------

Code scanned:
	Total lines of code: 2481
	Total lines skipped (#nosec): 0
	Total potential issues skipped due to specifically being disabled (e.g., #nosec BXXX): 0

Run metrics:
	Total issues (by severity):
		Undefined: 0
		Low: 10
		Medium: 7
		High: 0
	Total issues (by confidence):
		Undefined: 0
		Low: 1
		Medium: 6
		High: 10
Files skipped (0):

Dependency Check Results:
-----------------------

No critical security issues detected.

The code has passed all critical security checks.
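
For reference, two of the finding types reported above (B104 and B108) have common mitigations. The sketch below is illustrative only and is not part of this PR: the `APP_BIND_HOST` variable and both helper names are hypothetical, and containerised services may still need `0.0.0.0` set explicitly through that variable.

```python
import os
import tempfile
from pathlib import PurePosixPath


def bind_host() -> str:
    """B104: default to loopback; opt in to other interfaces via the environment.

    APP_BIND_HOST is a hypothetical variable name used for illustration.
    """
    return os.getenv("APP_BIND_HOST", "127.0.0.1")


def safe_temp_path(object_name: str, tmp_dir: str) -> str:
    """B108: build a path inside a private temp dir from an object name.

    Only the final path component is kept, so names such as
    '../../etc/passwd' cannot point outside tmp_dir.
    """
    base = PurePosixPath(object_name).name
    if base in ("", ".", ".."):
        base = "object"
    return os.path.join(tmp_dir, base)


if __name__ == "__main__":
    # TemporaryDirectory creates a directory readable only by the current
    # user and removes it (with its contents) when the block exits.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = safe_temp_path("raw/2025/data.csv", tmp_dir)
        with open(path, "wb") as fh:
            fh.write(b"example bytes")
        print(bind_host(), os.path.basename(path))
```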

@jd-deakin jd-deakin left a comment

Looks good, security improvement.

@parthnair98 parthnair98 merged commit 6496b02 into Redback-Operations:main Sep 16, 2025
3 checks passed