Merge pull request #234 from SAP/develop
upgrade to v4.8.0
marcorosa authored May 9, 2022
2 parents cffda44 + 6a0daa1 commit f368a05
Showing 6 changed files with 244 additions and 27 deletions.
7 changes: 7 additions & 0 deletions .pre-commit-hooks.yaml
@@ -0,0 +1,7 @@
- id: credential-digger-hook
  name: Credential Digger hook
  description: This hook runs Credential Digger on pre-committed files
  entry: credentialdigger
  args: ['hook']
  language: python
  pass_filenames: false
51 changes: 28 additions & 23 deletions README.md
@@ -14,27 +14,27 @@ TLDR; watch the video ⬇️



- [Why](#why)
- [Requirements](#requirements)
- [Download and installation](#download-and-installation)
- [How to run](#how-to-run)
- [Credential Digger](#credential-digger)
- [Why](#why)
- [Requirements](#requirements)
- [Download and Installation](#download-and-installation)
- [How to run](#how-to-run)
- [Add rules](#add-rules)
- [Scan a repository](#scan-a-repository)
- [Docker container](#docker-container)
- [Advanced installation](#advanced-install)
- [Docker container](#docker-container)
- [Advanced Installation](#advanced-installation)
- [Build from source](#build-from-source)
- [External postgres database](#external-postgres-database)
- [How to update the project](#how-to-updade-the-project)
- [Python library usage](#python-library-usage)
- [Add rules](#add-rules)
- [Scan a repository](#scan-a-repository)
- [CLI - Command Line Interface](#cli-command-line-interface)
- [Pypi install source](#pypi-install-source)
- [Wiki](#wiki)
- [Contributing](#contributing)
- [How to obtain support](#how-to-obtain-support)
- [News](#news)

- [How to update the project](#how-to-update-the-project)
- [Python library usage](#python-library-usage)
- [Add rules](#add-rules-1)
- [Scan a repository](#scan-a-repository-1)
- [CLI - Command Line Interface](#cli---command-line-interface)
- [pre-commit hook](#pre-commit-hook)
- [Wiki](#wiki)
- [Contributing](#contributing)
- [How to obtain support](#how-to-obtain-support)
- [News](#news)

## Why
In data protection, one of the most critical threats is hardcoded (or plaintext) credentials in open-source projects. Several tools are already available to detect leaks in open-source platforms, but the diversity of credentials (which depends on factors such as the programming language, code development conventions, or developers' personal habits) limits the effectiveness of these tools. Their lack of precision leads to a very high number of code fragments being incorrectly flagged as leaked secrets. Data wrongly detected as a leak is called _false positive_ data, and it makes up the vast majority of the data detected by currently available tools.
@@ -96,27 +96,27 @@ One of the core components of Credential Digger is the regular expression scanne
**Before the very first scan, you need to add the rules that will be used by the scanner.** This step is only needed once.

```bash
python -m credentialdigger add_rules --sqlite /path/to/data.db /path/to/rules.yaml
credentialdigger add_rules --sqlite /path/to/data.db /path/to/rules.yaml
```

### Scan a repository

After adding the rules, you can scan a repository:

```bash
python -m credentialdigger scan https://github.com/user/repo --sqlite /path/to/data.db
credentialdigger scan https://github.com/user/repo --sqlite /path/to/data.db
```

Machine learning models are not mandatory, but highly recommended in order to reduce the manual effort of reviewing the result of a scan:

```bash
python -m credentialdigger scan https://github.com/user/repo --sqlite /path/to/data.db --models PathModel PasswordModel
credentialdigger scan https://github.com/user/repo --sqlite /path/to/data.db --models PathModel PasswordModel
```

Like the models, the similarity feature is not mandatory, but it is highly recommended in order to reduce the manual effort of assessing the discoveries after a scan:

```bash
python -m credentialdigger scan https://github.com/user/repo --sqlite /path/to/data.db --similarity --models PathModel PasswordModel
credentialdigger scan https://github.com/user/repo --sqlite /path/to/data.db --similarity --models PathModel PasswordModel
```


@@ -237,9 +237,14 @@ Credential Digger also offers a simple CLI to scan a repository. The CLI support

Refer to the [Wiki](https://github.com/SAP/credential-digger/wiki) for all the supported commands and their usage.

## Pypi install source

Credential Digger is also available as a PyPI project: https://pypi.org/project/credentialdigger/
## pre-commit hook

Credential Digger can be used with the [pre-commit](https://pre-commit.com/) framework to scan staged files before each commit.

Please refer to the [Wiki page of the pre-commit hook](https://github.com/SAP/credential-digger/wiki/pre-commit-hook) for further information on its installation and execution.



## Wiki

7 changes: 6 additions & 1 deletion credentialdigger/__main__.py
@@ -2,5 +2,10 @@

from credentialdigger.cli import cli

if __name__ == "__main__":

def main():
    cli.main(sys.argv)


if __name__ == '__main__':
    main()
22 changes: 20 additions & 2 deletions credentialdigger/cli/cli.py
@@ -5,7 +5,7 @@
from credentialdigger import PgClient, SqliteClient
from dotenv import load_dotenv

from . import (add_rules, get_discoveries, scan, scan_path,
from . import (add_rules, get_discoveries, hook, scan, scan_path,
               scan_snapshot, scan_user, scan_wiki)

logger = logging.getLogger(__name__)
@@ -53,6 +53,17 @@ def main(sys_argv):
            during the scan (e.g., during the insertion of the detections in \
            the db)')

    parser_hook_base = customParser(add_help=False)
    parser_hook_base.add_argument(
        '--rules', default=None, type=str,
        help='Specify the yaml file path containing the scan rules \
            e.g., /path/to/rules.yaml')
    parser_hook_base.add_argument(
        '--no_interaction', action='store_true',
        help='Flag used to remove the interaction i.e., do not prompt if the \
            commit should continue in case of discoveries. If specified, \
            the hook will fail if discoveries are found.')

    # add_rules subparser configuration
    parser_add_rules = subparsers.add_parser(
        'add_rules', help='Add scanning rules from a file to the database',
@@ -95,6 +106,12 @@ def main(sys_argv):
        parents=[parser_dotenv, parser_sqlite])
    get_discoveries.configure_parser(parser_get_discoveries)

    # hook subparser configuration
    # (named parser_hook so it does not shadow parser_get_discoveries)
    parser_hook = subparsers.add_parser(
        'hook', help='Launch Credential Digger as a pre-commit hook',
        parents=[parser_dotenv, parser_sqlite, parser_hook_base])
    hook.configure_parser(parser_hook)

    # Run the parser
    if len(sys_argv) == 1:
        main_parser.print_help()
@@ -107,6 +124,7 @@ def main(sys_argv):
    if args.func in [
        add_rules.run,
        get_discoveries.run,
        hook.run,
        scan.run,
        scan_user.run,
        scan_wiki.run,
@@ -115,7 +133,7 @@ def main(sys_argv):
    ]:
        # Connect to db only when running commands that need it
        if args.sqlite:
            client = SqliteClient(args.sqlite)
            client = SqliteClient(os.path.expanduser(args.sqlite))
            logger.info('Database in use: Sqlite')
        else:
            client = PgClient(dbname=os.getenv('POSTGRES_DB'),
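The way the new `hook` subcommand inherits its options can be sketched in isolation. The snippet below is a minimal illustration, assuming that `customParser` behaves like the standard `argparse.ArgumentParser` for this purpose; `run_hook` is a hypothetical stand-in for `hook.run`, and the `--dotenv` parent is omitted for brevity.

```python
import argparse

# Shared option groups. add_help=False avoids a duplicate -h/--help
# when these parsers are reused through the `parents` argument.
parser_sqlite = argparse.ArgumentParser(add_help=False)
parser_sqlite.add_argument('--sqlite', default=None, type=str)

parser_hook_base = argparse.ArgumentParser(add_help=False)
parser_hook_base.add_argument('--rules', default=None, type=str)
parser_hook_base.add_argument('--no_interaction', action='store_true')

main_parser = argparse.ArgumentParser(prog='credentialdigger')
subparsers = main_parser.add_subparsers(dest='command')

# The subcommand receives every option defined by its parents
parser_hook = subparsers.add_parser(
    'hook', parents=[parser_sqlite, parser_hook_base])


def run_hook(args):
    """Hypothetical stand-in for hook.run (not the project's function)."""
    return 'hook'


# Same pattern as hook.configure_parser: bind the handler to the subcommand
parser_hook.set_defaults(func=run_hook)

args = main_parser.parse_args(
    ['hook', '--sqlite', '~/data.db', '--no_interaction'])
print(args.sqlite, args.rules, args.no_interaction)
```

Note that argparse leaves `~/data.db` unexpanded, which is presumably why this commit wraps `args.sqlite` in `os.path.expanduser` before creating the `SqliteClient`.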
180 changes: 180 additions & 0 deletions credentialdigger/cli/hook.py
@@ -0,0 +1,180 @@
"""
The 'hook' module can be used to run Credential Digger as a pre-commit hook.
It detects hardcoded secrets in staged files and blocks the commit before the
code becomes public.

usage: credentialdigger hook [-h] [--dotenv DOTENV] [--sqlite SQLITE]
                             [--rules RULES] [--no_interaction]

optional arguments:
  -h, --help        show this help message and exit
  --dotenv DOTENV   The path to the .env file which will be used in all
                    commands. If not specified, the one in the current
                    directory will be used (if present).
  --sqlite SQLITE   If specified, scan the repo using the sqlite client
                    passing as argument the path of the db.
                    Otherwise, use postgres (must be up and running)
  --rules RULES     Specify the yaml file path containing the scan rules
                    e.g., /path/to/rules.yaml
  --no_interaction  Flag used to remove the interaction i.e.,
                    do not prompt if the commit should continue
                    in case of discoveries. If specified, the hook will
                    fail in case of discoveries.
"""

import subprocess
import sys

from credentialdigger.models.model_manager import ModelManager


def configure_parser(parser):
    """ Configure arguments for command line parser.

    Parameters
    ----------
    parser: `credentialdigger.cli.customParser`
        Command line parser
    """
    parser.set_defaults(func=run)


def system(*args, **kwargs):
    """Run a command and get the result."""
    kwargs.setdefault('stdout', subprocess.PIPE)
    proc = subprocess.Popen(args, **kwargs)
    out, err = proc.communicate()
    return out


def print_msg(msg):
    """Print a message to /dev/tty."""
    subprocess.run(f'echo \"\n{msg}\n\" > /dev/tty',
                   shell=True,
                   stdout=subprocess.PIPE)


def ask_commit(str_discoveries):
    """Ask for the commit confirmation in case of possible leaks.

    Parameters
    ----------
    str_discoveries: str
        Discoveries formatted as a string
    """

    msg = 'You have the following discoveries:\n\n' \
          f'{str_discoveries}\nWould you like to commit anyway? (y/N)'
    print_msg(msg)

    sys.stdin = open('/dev/tty', 'r')
    # Create a process on /dev/tty to capture the input (commit or not)
    # It reads the input, saves it in userinput and echoes it
    # subprocess.check_output returns the output of the command, i.e.,
    # userinput
    user_input = subprocess.check_output('read -p \"\" userinput && echo '
                                         '\"$userinput\"',
                                         shell=True, stdin=sys.stdin).rstrip()

    return user_input.decode('utf-8')


def run(client, args):
    """Run Credential Digger on staged files.

    Parameters
    ----------
    client: `credentialdigger.Client`
        Instance of the client on which to save results
    args: `argparse.Namespace`
        Arguments from command line parser.

    Returns
    -------
        While this function returns nothing, it gives an exit status (integer)
        that is equal to the number of discoveries causing the hook to fail.
        An exit status of 0 means either that the scan detected no leaks in
        the staged files or, in interactive mode, that the user chose to
        commit despite the leaks. If the exit status is 0, the hook succeeds.
    """

    files_status = system('git', 'diff', '--name-status', '--staged'
                          ).decode('utf-8').splitlines()
    files = []
    for fs in files_status:
        stats = fs.split('\t')
        status = stats[0]
        # Check status using the first char
        # D = deleted files
        # R = renamed files
        if status[0] not in 'DR':
            # Get the name of the staged file
            filename = stats[1]
            files.append(filename)

    if args.rules:
        client.add_rules_from_file(args.rules)
    elif not client.get_rules():
        client.add_rules_from_file('./ui/backend/rules.yml')

    new_discoveries = []
    subprocess.run(f'echo \"\nChecking files={files} \" > /dev/tty',
                   shell=True,
                   stdout=subprocess.PIPE)

    # For optimization purposes, the PathModel and the PasswordModel are
    # separated, otherwise scan_path will call both models for each file
    # With this implementation the discoveries are accumulated and the
    # PasswordModel will be run only once for the password discoveries
    for staged_file in files:
        new_discoveries += client.scan_path(scan_path=staged_file,
                                            models=['PathModel'],
                                            force=True,
                                            debug=False)

    if not new_discoveries:
        print_msg('No hardcoded secrets found in your commit')
        sys.exit(0)

    rules = client.get_rules()
    password_rules = set([
        r['id'] for r in rules if r['category'] == 'password'])
    password_discoveries = []
    no_password_discoveries = []
    for d in new_discoveries:
        disc = client.get_discovery(d)
        if disc['rule_id'] in password_rules:
            password_discoveries.append(disc)
        else:
            no_password_discoveries.append(disc)

    # Run the PasswordModel
    disc = []
    if password_discoveries:
        mm = ModelManager('PasswordModel')
        disc = mm.launch_model_batch(password_discoveries)

    list_of_discoveries = []
    for d in disc:
        if d['state'] == 'new':
            list_of_discoveries.append(d)

    # There may be also discoveries other than passwords
    list_of_discoveries += no_password_discoveries
    # If all the discoveries were false positive discoveries
    if not list_of_discoveries:
        print_msg('No hardcoded secrets found in your commit')
        sys.exit(0)

    str_discoveries = ''
    for d in list_of_discoveries:
        # End each record with a newline so that the separator of one
        # discovery does not run into the first line of the next one
        str_discoveries += (f'file: {d["file_name"]}\n'
                            f'secret: {d["snippet"]}\n'
                            f'line number: {d["line_number"]}\n' +
                            40 * '-' + '\n')

    if not args.no_interaction and \
            ask_commit(str_discoveries).startswith(('y', 'Y')):
        print_msg('Committing...')
        sys.exit(0)
    sys.exit(len(list_of_discoveries))
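The staged-file selection at the top of `run` can be tried without a repository. The sketch below mirrors that loop on a hand-written sample of `git diff --name-status --staged` output; the helper name `staged_files_to_scan` and the sample paths are illustrative assumptions, not part of the project.

```python
def staged_files_to_scan(name_status_output):
    """Return the paths to scan from `git diff --name-status` output.

    Hypothetical helper mirroring the loop in hook.run: entries whose
    status starts with D (deleted) or R (renamed) are skipped.
    """
    files = []
    for line in name_status_output.splitlines():
        fields = line.split('\t')
        status = fields[0]
        # Skip empty lines as well as deleted and renamed entries
        if not status or status[0] in 'DR':
            continue
        # The first path column is the staged file name
        files.append(fields[1])
    return files


# Hand-written sample: added, modified, deleted, and renamed entries
sample = ('A\tnew_file.py\n'
          'M\tREADME.md\n'
          'D\told.txt\n'
          'R100\tsrc/a.py\tsrc/b.py\n')
print(staged_files_to_scan(sample))
```

As in the hook, a renamed file is skipped even when its content changed along with the rename, since only the first character of the status is inspected.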
4 changes: 3 additions & 1 deletion setup.py
@@ -13,7 +13,7 @@ def requirements():

setuptools.setup(
    name='credentialdigger',
    version='4.7.0',
    version='4.8.0',
    author='SAP SE',
    maintainer='Marco Rosa, Slim Trabelsi',
    maintainer_email='[email protected], [email protected]',
@@ -29,4 +29,6 @@ def requirements():
        'Operating System :: OS Independent',
    ],
    python_requires='>3.5, <3.10',
    entry_points={'console_scripts': ['credentialdigger=credentialdigger'
                                      '.__main__:main']},
)
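The `console_scripts` entry is what lets the README drop the `python -m` prefix: at install time a `credentialdigger` executable is generated that imports `credentialdigger.__main__` and calls `main()`. The following is a simplified sketch of how such a `name=module:attr` specification maps to a callable; `resolve_entry_point` is a hypothetical helper, not the installer's actual code, and the demo resolves a standard-library function so that it runs without the package installed.

```python
import importlib


def resolve_entry_point(spec):
    """Resolve a 'name=package.module:attr' string to (name, callable).

    Simplified illustration: real installers generate a wrapper script
    and support more forms (e.g. nested attributes) than handled here.
    """
    name, target = spec.split('=', 1)
    module_name, attr = target.split(':', 1)
    module = importlib.import_module(module_name.strip())
    return name.strip(), getattr(module, attr.strip())


# Demo with a standard-library target; the entry in setup.py would be
# 'credentialdigger=credentialdigger.__main__:main'
name, func = resolve_entry_point('dumps-json=json:dumps')
print(name, func([1, 2]))
```

This also explains the change to `__main__.py` in the same commit: the entry point needs an importable `main` function, which the previous `if __name__ == "__main__":` block did not provide.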
