birhanu-eshete/SPIF

#SPIF - Sensitive and Private Information Filter

A tool for filtering open data for private/sensitive information before publication

This tool was developed during the RHoK Global Hackathon in Trento, Italy, on Dec 1-2, 2012.

The problem statement is at http://www.rhok.org/problems/ffilter-tool-screening-open-data-privatesensitive-information-publication

We would like to have your feedback!

##Features

Supports:

  • text files, CSV
  • XLS, XLSX, DOCX
  • XML, HTML
  • PDF

Filters:

  • Social Security Number
  • Credit Card Number
  • Telephone Number
  • Bank Account Number
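
As an illustration of how filters like these can work, here is a minimal sketch based on regular expressions. It is not SPIF's actual code: the `PATTERNS` table, the US-style formats it assumes, and the `scan_text` helper are all hypothetical. Bank account numbers are left out because their format varies by country and bank.

```python
import re

# Hypothetical patterns, assuming US-style formats; SPIF's real filters may differ.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b"),
}


def scan_text(text):
    """Return (line_number, filter_name, matched_text) for every match."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((line_number, name, match.group()))
    return findings
```

Reporting the line number alongside each match lets a reviewer jump straight to the suspicious value and decide whether it is a real finding.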

##Requirements/Dependencies

##Operating Systems

This script is independent of the host platform (Windows, Linux or Mac OS). All it needs is a Python interpreter. We tested this script on:

  • Mac OS X 10.6.4
  • Ubuntu 12.10 LTS

##Usage

  • Command Line Mode: `python spif.py filename.ext`
  • GUI Mode: `python main.py`

##Known Issues

  • Not all PDF files are supported by the PyPdf library. Do not be surprised if the script is selective about PDF files, because the conversion scheme matters.
  • DOCX conversion does not give the page number where sensitive or private information is found, so the reported location is a little coarse-grained.
  • Depending on what you are scanning, some filters can be noisy or quite useful. For example, a 16-digit pattern flagged as a credit card number in a session cookie of an HTML page is useless, while the same pattern in network traffic could be a real credit card number being sent over an unencrypted connection.
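
One common way to cut down the noise from 16-digit patterns is to check candidates against the Luhn checksum, which real card numbers satisfy. The sketch below is an assumption about how such a check could be added, not a description of what SPIF does; `luhn_valid` is a hypothetical helper.

```python
def luhn_valid(candidate):
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    # Keep only the digits, so spaces and dashes are ignored.
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if len(digits) < 13:  # shorter than typical card numbers
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

A passing checksum does not prove that a value is a card number, since roughly one in ten random digit strings also passes, so the context of the match still needs a human look.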

##Authors

  • Birhanu Eshete - birhanu.mekuria(at)gmail.com
  • Ali Fawzi Najm Al-Shammari - afnfun(at)yahoo.com
  • Michele Fogarolli - michelefoga(at)gmail.com

##License

This code is released under the MIT License.
