Commit 557ca0e

ewdurbin, xmunoz and woodruffw authored
Tooling for automated detection of malware (#7377)
* Add new models for malware detection. (#7118)
* Add new models for malware detection.
  Fixes #7090 and #7092.
* Code review changes.
  - FK on release_file.id field instead of md5
  - Change message type from String to Text
  - Change Enum class in model to singular form
* Add admin interface to view and enable checks (#7134)
* Add admin interface to view and enable checks
  - Implement list, detail and change_state views (#7133)
  - Add unit tests for check admin view
* Add comprehensive test coverage for check admin
* Add initial hook-based check execution mechanism (#7160)
* Add initial hook-based check execution mechanism
* scratch/poc
* Add initial hook-based check execution mechanism
* Use sqlalchemy event hooks for malware checks
* Fix unit tests
* Add enum for MalwareCheckObjectType
* Add unit tests for init.
* Add tests for tasks, services, and utils.
  Also, some small bugfixes in MalwareCheckFactory and the get_enabled_checks method.
* Fix spurious task test.
* Add missing drop enum to downgrade function.
* Added TODO to dev/environment
* Be more explicit in check lookup
  Co-authored-by: Ernest W. Durbin III <[email protected]>
* Add malware check syncing mechanism (#7190)
* Add malware check syncing mechanism
* Code review changes.
* Refactor MalwareCheckBase. Fixes #7091. (#7196)
* Refactor MalwareCheckBase. Fixes #7091.
  Add Foreign Keys in MalwareVerdicts for other types of objects (Releases, Projects).
* Change verdict dict to kwargs.
* Add wipe-out functionality (#7202)
* Add wipe-out functionality
  Related: #7133
* Call list explicitly
* Add rudimentary verdicts view. Progress on #6062. (#7207)
* Add rudimentary verdicts view. Progress on #6062.
  Also, add some better testing logic for wiped_out condition.
* Code review changes.
  - Conditionally show fields that are populated
  - JSON pretty formatting
* Fix unit test bug.
  - Use `get` instead of `filter` to look up verdict by pkey.
* simplify unit tests for verdicts view
* introduce malware queue (#7227)
* introduce malware queue
* correct syntax, apparently list of tuples documented doesn't work.
* Add backfill functionality to check admin #7094 (#7232)
* Add backfill functionality to check admin #7094
  - Add backfill task
  - Change lookup of checks to check_name instead of id
  - Load checks that are also in "evaluation" state
* Add unit tests for backfill.
  - Log number of runs executed by backfill
  - Perform basic validation on sample_rate input
  - Clean up other testing logic.
* Remove superfluous 'all()'
* Code review changes.
  - Set backfill size to a fixed number, not configurable via web UI.
  - Backfill task enqueues run_check tasks
  - Only retry if `check.run` fails, not if loading the check fails.
  - Use exponential backoff for retries.
* Update warehouse/admin/templates/admin/malware/checks/detail.html
  Co-Authored-By: Ernest W. Durbin III <[email protected]>
  Co-authored-by: Ernest W. Durbin III <[email protected]>
* Refactor testing logic #7098 (#7257)
  - Add `schedule` field to MalwareCheck model #7096
  - Move ExampleCheck into tests/common/ to remove test dependency from prod code
  - Rename functions and classes to differentiate between "hooked" and "scheduled" checks
* Event-based Malware check (#7249)
* requirements: Introduce yara
* [WIP] malware/check: SetupPatternCheck
  In progress.
  Introduces SetupPatternCheck, an implementation of an event-based check that scans the `setup.py`s of release files for suspicious patterns.
* malware/checks: Give MalwareCheckBase.run/scan args, kwargs
* malware: Add check preparation
  Fiddle with the check/run signature a bit more.
* malware/checks: Unpack file path correctly
* docker-compose: Override FILES_BACKEND for worker
  The worker needs to be able to see the "files" virtual host during development so that malware checks can fetch their underlying release files.
* [WIP] malware/checks: setup.py extraction
* malware/checks: setup_patterns: Fix enum, seek
* malware/checks: setup_patterns: Apply YARA rules
  Each rule match becomes a verdict.
* malware/checks: setup_patterns: Prefer get over filter
* warehouse/{admin,malware}: Consistent enum names
  Also enforce uniqueness for enum values.
* warehouse/{admin,malware}: More enum changes
* tests: Update admin, malware tests
* tests: Fix enum, more test fixes
* tests: Add prepare tests
* malware/changes: base: Unpack id correctly
* tests: Begin adding SetupPatternCheck tests
* malware/checks: setup_patterns: Fix enum
* tests: More SetupPatternCheck tests
* warehouse/malware: setup_patterns: Fix enums
* tests: More SetupPatternCheck tests
* tests: Add license header
* malware/checks: setup_patterns: Add TODO
* tests: More SetupPatternCheck tests
* tests: More SetupPatternCheck tests
* tests: Complete extraction tests for SetupPatternCheck
* tests: Fix test
* malware/checks: Add docstring for prepare
* malware/checks: blacken
* malware/checks: Document, expand YARA rules
* tests, warehouse: Restructure utilities
* malware: Order some enums, reduce SetupPatternCheck verdicts
* malware/models: Add missing __lt__
* malware/checks: Always embed the model object in the prepared arguments
  Use it instead of performing a DB request in the check itself.
* malware/checks: Avoid raw bytes
* malware/changes: Remove unused import
* tests: Fixup malware tests
* warehouse/malware: blacken
* tests: Fill in malware coverage
* tests, warehouse: Add a benign verdict for SetupPatternCheck
* tests: blacken
* Implement scheduled checks #7093 (#7271)
* Implement scheduled checks #7093
  - Rename `run_backfill` to `run_evaluation` in admin malware view
  - Modify `run` and `scan` method signatures to accept `**kwargs`
  - Extend `run_check` to accommodate scheduled check functionality
* Reduce unit test flakiness
* Code review changes.
  Also replace `check.hooked_object` with `check.hooked_object.value` in check detail template.
* tests, warehouse: enum fixes
* Fix lint error
  Co-authored-by: William Woodruff <[email protected]>
* Add verdicts view filtering capabilities #6062. (#7322)
* Add verdicts view filtering capabilities #6062.
* Code review changes.
  - Refactor tests to be parametrized.
  - Pass `_query` to `route_path` in template.
  - Remove `is None` from filter query; it adds nothing.
* Add verdict administrator review. Fixes #6062. (#7339)
* Add verdict administrator review. Fixes #6062.
  - Add new `admin.verdicts.review` endpoint
  - Change layout of verdict list and detail view and add forms
  - Change sort order of the MalwareChecks, and update the tests
* Code review changes.
  - Rename MalwareVerdict field `administrator_verdict` to `reviewer_verdict`.
  - Change verdict review permission from `admin` to `moderator`.
* Misc cleanup and TODOs on malware checks. (#7355)
* Misc cleanup and TODOs on malware checks.
  - Change backfill function to invoke `IMalwareCheckService` interface
  - Add support for `kwargs` to `IMalwareCheckService` interface
  - Rename variable from reserved word `file` to `release_file`
  - Add `FatalCheckException` for non-retryable exceptions
  - Replace `MALWARE_CHECK_BACKEND` in dev/environment
* Make `IMalwareService` the entrypoint for `run_check`
  - Add `run_scheduled_check` task that invokes this interface.
  - Remove useless utility method
  - Move `FatalCheckException` into warehouse/malware/errors.py.
* malware/checks: PackageTurnover skeleton (#7321)
* malware/checks: PackageTurnover skeleton
* malware/checks: PackageTurnover: Add NOTE
* malware/checks: PackageTurnoverCheck: more work
* tests: blacken
* malware/checks: More PackageTurnoverCheck work
* malware/checks: Blacken
* malware/checks: Blacken
* package_turnover: Promote from indeterminate to threat
* tests: Begin adding package_turnover tests
* tests: Add remaining package_turnover tests
* tests: Drop unused imports
* warehouse: Drop (ww) from NOTE
* checks/package_turnover: Drop NOTE

Co-authored-by: Cristina <[email protected]>
Co-authored-by: William Woodruff <[email protected]>
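The messages above describe a retry policy: `check.run` failures are retried with exponential backoff, while `FatalCheckException` marks errors that must not be retried. The following is a minimal, framework-free sketch of that policy. `FatalCheckException` here is a local stand-in, and `backoff_delay`, `run_with_retries` and the `base`/`cap` defaults are hypothetical names and values, not Warehouse's task code.

```python
import time


class FatalCheckException(Exception):
    """Local stand-in for warehouse.malware.errors.FatalCheckException."""


def backoff_delay(retries, base=1.0, cap=300.0):
    # Exponential backoff: base, 2*base, 4*base, ... capped at `cap` seconds.
    return min(cap, base * (2 ** retries))


def run_with_retries(check_fn, max_retries=3, sleep=time.sleep, **kwargs):
    """Call check_fn(**kwargs), retrying transient failures with backoff.

    A FatalCheckException is re-raised immediately and never retried.
    """
    retries = 0
    while True:
        try:
            return check_fn(**kwargs)
        except FatalCheckException:
            raise
        except Exception:
            if retries >= max_retries:
                raise
            sleep(backoff_delay(retries))
            retries += 1
```

In a Celery task bound with `bind=True`, the loop would more likely be replaced by raising `self.retry(exc=exc, countdown=...)`. The sketch keeps the loop explicit so the fatal/transient split is easy to see.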
1 parent 3f0d4e0 commit 557ca0e


59 files changed: +4174 / -4 lines changed

Procfile

+1
@@ -2,4 +2,5 @@ release: bin/release
 web: bin/start-web python -m gunicorn.app.wsgiapp -c gunicorn.conf.py warehouse.wsgi:application
 web-uploads: bin/start-web python -m gunicorn.app.wsgiapp -c gunicorn-uploads.conf.py warehouse.wsgi:application
 worker: bin/start-worker celery -A warehouse worker -Q default -l info --max-tasks-per-child 32
+worker-malware: bin/start-worker celery -A warehouse worker -Q malware -l info --max-tasks-per-child 32
 worker-beat: bin/start-worker celery -A warehouse beat -S redbeat.RedBeatScheduler -l info

bin/release

+3
@@ -5,3 +5,6 @@ set -eo pipefail
 
 # Migrate our database to the latest revision.
 python -m warehouse db upgrade head
+
+# Insert/upgrade malware checks.
+python -m warehouse malware sync-checks
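The release step now runs `malware sync-checks` to "insert/upgrade" check definitions. The diff does not show how that command behaves. The sketch below assumes one plausible reading: checks are keyed by name, unknown names are inserted, and a higher version in code replaces the stored record. `CheckRecord` and `sync_checks` are hypothetical, and the real command may also handle removed checks or retain older versions, which this sketch does not.

```python
from dataclasses import dataclass


@dataclass
class CheckRecord:
    # Hypothetical stored row; the real MalwareCheck model has more fields.
    name: str
    version: int
    short_description: str


def sync_checks(code_checks, stored):
    """Reconcile check definitions from code with stored records.

    code_checks: mapping of name -> (version, short_description) defined in code.
    stored: mapping of name -> CheckRecord, updated in place.
    Returns (inserted, upgraded) lists of check names.
    """
    inserted, upgraded = [], []
    for name, (version, description) in sorted(code_checks.items()):
        record = stored.get(name)
        if record is None:
            stored[name] = CheckRecord(name, version, description)
            inserted.append(name)
        elif version > record.version:
            stored[name] = CheckRecord(name, version, description)
            upgraded.append(name)
    return inserted, upgraded
```

Running the sync a second time with unchanged definitions returns two empty lists. That idempotence is what makes it safe to call on every release.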

dev/environment

+2
@@ -29,6 +29,8 @@ MAIL_BACKEND=warehouse.email.services.SMTPEmailSender host=smtp port=2525 ssl=fa
 
 BREACHED_PASSWORDS=warehouse.accounts.NullPasswordBreachedService
 
+MALWARE_CHECK_BACKEND=warehouse.malware.services.PrinterMalwareCheckService
+
 METRICS_BACKEND=warehouse.metrics.DataDogMetrics host=notdatadog
 
 STATUSPAGE_URL=https://2p66nmmycsj3.statuspage.io
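Backend variables such as `MALWARE_CHECK_BACKEND` and `METRICS_BACKEND` follow a "dotted.path key=value ..." shape. The sketch below shows one way such a value could be split and resolved. It is an illustration under that assumption, not Warehouse's actual configuration loader, and `parse_backend_spec` and `resolve` are hypothetical helpers.

```python
import importlib
import shlex


def parse_backend_spec(value):
    """Split 'dotted.path key=value ...' into (dotted_path, options)."""
    parts = shlex.split(value)
    if not parts:
        raise ValueError("empty backend specification")
    dotted_path, *pairs = parts
    options = {}
    for pair in pairs:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, val = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        options[key] = val
    return dotted_path, options


def resolve(dotted_path):
    """Import 'package.module.Name' and return the Name attribute."""
    module_name, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)
```

Under this reading, `PrinterMalwareCheckService` takes no options, so its spec is just the dotted path.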

docker-compose.yml

+1
@@ -93,6 +93,7 @@ services:
     env_file: dev/environment
     environment:
       C_FORCE_ROOT: "1"
+      FILES_BACKEND: "warehouse.packaging.services.LocalFileStorage path=/var/opt/warehouse/packages/ url=http://files:9001/packages/{path}"
     links:
       - db
       - redis

requirements/main.in

+1
@@ -55,5 +55,6 @@ typeguard
 webauthn
 whitenoise
 WTForms>=2.0.0
+yara-python
 zope.sqlalchemy
 zxcvbn
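`yara-python` is added so that `SetupPatternCheck` can pull `setup.py` out of a release file and apply YARA rules, with each match becoming a verdict. The sketch below shows the same two steps using only the standard library. Plain `re` patterns stand in for YARA, and the pattern names are made up for illustration, not the check's real rules. The sketch also assumes a `.tar.gz` sdist whose `setup.py` sits one directory deep.

```python
import io
import re
import tarfile

# Illustrative patterns only: the real check uses YARA rules, not these regexes.
SUSPICIOUS_PATTERNS = {
    "process_spawn": re.compile(rb"\bsubprocess\.(Popen|call|run)\b"),
    "dynamic_exec": re.compile(rb"\b(exec|eval)\s*\("),
    "base64_decode": re.compile(rb"\bb64decode\s*\("),
}


def extract_setup_py(sdist_bytes):
    """Return the bytes of '<name>-<version>/setup.py' from a .tar.gz, or None."""
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            parts = member.name.split("/")
            if member.isfile() and len(parts) == 2 and parts[1] == "setup.py":
                # Read in memory; nothing is written to disk.
                return tar.extractfile(member).read()
    return None


def scan_setup_py(source):
    """Return the sorted names of patterns found; each hit would become a verdict."""
    return sorted(n for n, rx in SUSPICIOUS_PATTERNS.items() if rx.search(source))
```

Reading the member through `extractfile` instead of extracting the archive avoids writing untrusted paths to the worker's filesystem.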

requirements/main.txt

+14
@@ -594,6 +594,20 @@ wired==0.2.1 \
 wtforms==2.2.1 \
     --hash=sha256:0cdbac3e7f6878086c334aa25dc5a33869a3954e9d1e015130d65a69309b3b61 \
     --hash=sha256:e3ee092c827582c50877cdbd49e9ce6d2c5c1f6561f849b3b068c1b8029626f1
+yara-python==3.11.0 \
+    --hash=sha256:105d851e050b32951ee577148c7f1b18c0a7c64432fef8159069191d522fba86 \
+    --hash=sha256:1d35c7f606465015de02143dfa4e1ad2f4ee85fdb5d5af756b51b2bac62ac7bc \
+    --hash=sha256:24cd492d6bf8ecedb128f5b02886770be9df03bd1b84ab06a978d45bb1a8ff92 \
+    --hash=sha256:58cfc837e7769811afbfb19b1db952ec01e50cdbf9df576fb587e1e343694526 \
+    --hash=sha256:5b8d708751a66d1507d819218d06baccdf5527c147c2bd3062f087e2f367a17d \
+    --hash=sha256:6f90bb264470235549e1bb4e355fa82895409cd46f27aceecaddfbf55e66ed71 \
+    --hash=sha256:70d39c2238c5854e7cd8f11595317dc4d89417e88035d8acca24bcc58a93150f \
+    --hash=sha256:8d255349d69d833bca604b4215bdf499c87357172512273feb934f6442b8e6b2 \
+    --hash=sha256:8e44f9600607cb1d74a0f26df5d0a1c06ea54f4601206124f47f1bbb58e6a374 \
+    --hash=sha256:9e4fafc327e3a343c545dcf5f173fa8bc712aebffe5f034d205c0bac1f1c5df6 \
+    --hash=sha256:c919ee656139ed46a0056e8a3de179bbc98d42a2be6fb85c95b1e2ec65396b34 \
+    --hash=sha256:e4124414d3cff9a10669569a89f585f81c8114b283ab48b2e756e0347a89de0a \
+    --hash=sha256:f104f0bb21a0867f22e750bb4e05de629ec9f37facc84daf963385a86371b0d9
 zipp==2.1.0 \
     --hash=sha256:ccc94ed0909b58ffe34430ea5451f07bc0c76467d7081619a454bf5c98b89e28 \
     --hash=sha256:feae2f18633c32fc71f2de629bfb3bd3c9325cd4419642b1f1da42ee488d9b98
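Each `--hash=sha256:...` line pins one acceptable artifact for `yara-python==3.11.0`. pip enforces these pins itself in its hash-checking mode. The sketch below only illustrates the comparison it performs: a downloaded file is accepted if its SHA-256 digest is one of the listed values. `parse_hashes` and `verify_artifact` are hypothetical helpers.

```python
import hashlib

PREFIX = "--hash=sha256:"


def parse_hashes(lines):
    """Collect sha256 digests from requirement lines like the ones above."""
    digests = set()
    for line in lines:
        # Drop indentation and the trailing line-continuation backslash.
        line = line.strip().rstrip("\\").strip()
        if line.startswith(PREFIX):
            digests.add(line[len(PREFIX):])
    return digests


def verify_artifact(data, allowed):
    """True if the SHA-256 of `data` matches one of the pinned digests."""
    return hashlib.sha256(data).hexdigest() in allowed
```

Listing several hashes per release allows any of its published wheels or sdists to install, while a tampered file fails the check.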

tests/common/checks/__init__.py

+14
@@ -0,0 +1,14 @@
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .hooked import ExampleHookedCheck  # noqa
+from .scheduled import ExampleScheduledCheck  # noqa

tests/common/checks/hooked.py

+40
@@ -0,0 +1,40 @@
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from warehouse.malware.checks.base import MalwareCheckBase
+from warehouse.malware.errors import FatalCheckException
+from warehouse.malware.models import VerdictClassification, VerdictConfidence
+
+
+class ExampleHookedCheck(MalwareCheckBase):
+
+    version = 1
+    short_description = "An example hook-based check"
+    long_description = "The purpose of this check is to test the \
+implementation of a hook-based check. This check will generate verdicts if enabled."
+    check_type = "event_hook"
+    hooked_object = "File"
+
+    def __init__(self, db):
+        super().__init__(db)
+
+    def scan(self, **kwargs):
+        file_id = kwargs.get("obj_id")
+        if file_id is None:
+            raise FatalCheckException("Missing required kwarg `obj_id`")
+
+        self.add_verdict(
+            file_id=file_id,
+            classification=VerdictClassification.Benign,
+            confidence=VerdictConfidence.High,
+            message="Nothing to see here!",
+        )
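`ExampleHookedCheck.scan` reads `obj_id` from its kwargs, raises `FatalCheckException` when it is missing, and otherwise records one benign verdict through `add_verdict`. The sketch below exercises that control flow without a database. `InMemoryCheckBase` is a hypothetical stand-in for `MalwareCheckBase` that keeps verdicts in a list; the real base class presumably persists `MalwareVerdict` rows. The enum member names come from the commit, and their values are assumptions.

```python
import enum


class VerdictClassification(enum.Enum):
    # Hypothetical stand-in; only the member names appear in the commit.
    Benign = enum.auto()
    Indeterminate = enum.auto()
    Threat = enum.auto()


class FatalCheckException(Exception):
    """Stand-in for warehouse.malware.errors.FatalCheckException."""


class InMemoryCheckBase:
    """Hypothetical base class that records verdicts in a list, not the DB."""

    def __init__(self, db=None):
        self.db = db
        self.verdicts = []

    def add_verdict(self, **kwargs):
        self.verdicts.append(kwargs)

    def run(self, **kwargs):
        # Reset, delegate to the subclass's scan(), and return what it recorded.
        self.verdicts = []
        self.scan(**kwargs)
        return self.verdicts

    def scan(self, **kwargs):
        raise NotImplementedError


class DemoHookedCheck(InMemoryCheckBase):
    # Mirrors ExampleHookedCheck.scan, minus the confidence field.
    def scan(self, **kwargs):
        file_id = kwargs.get("obj_id")
        if file_id is None:
            raise FatalCheckException("Missing required kwarg `obj_id`")
        self.add_verdict(
            file_id=file_id,
            classification=VerdictClassification.Benign,
            message="Nothing to see here!",
        )
```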

tests/common/checks/scheduled.py

+37
@@ -0,0 +1,37 @@
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from warehouse.malware.checks.base import MalwareCheckBase
+from warehouse.malware.models import VerdictClassification, VerdictConfidence
+from warehouse.packaging.models import Project
+
+
+class ExampleScheduledCheck(MalwareCheckBase):
+
+    version = 1
+    short_description = "An example scheduled check"
+    long_description = "The purpose of this check is to test the \
+implementation of a scheduled check. This check will generate verdicts if enabled."
+    check_type = "scheduled"
+    schedule = {"minute": "0", "hour": "*/8"}
+
+    def __init__(self, db):
+        super().__init__(db)
+
+    def scan(self, **kwargs):
+        project = self.db.query(Project).first()
+        self.add_verdict(
+            project_id=project.id,
+            classification=VerdictClassification.Benign,
+            confidence=VerdictConfidence.High,
+            message="Nothing to see here!",
+        )
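The `schedule` attribute uses crontab-style fields. `{"minute": "0", "hour": "*/8"}` reads as "at minute 0 of every eighth hour", and the factory default `{"minute": "*/10"}` reads as "every ten minutes". The dict is presumably passed to something like `celery.schedules.crontab(**schedule)`, where omitted fields default to `"*"`. That is an assumption, because the scheduling code is not in this diff. The hypothetical matcher below handles only `*`, `*/N` and comma-separated integers.

```python
def field_matches(spec, value):
    """Match one crontab field: '*', '*/N', or a comma list of integers."""
    for part in spec.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            # Steps count from 0, the start of the minute and hour ranges.
            if value % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def fires_at(schedule, hour, minute):
    """True if a {'minute': ..., 'hour': ...} schedule fires at hour:minute.

    Missing fields default to '*'.
    """
    return field_matches(schedule.get("minute", "*"), minute) and field_matches(
        schedule.get("hour", "*"), hour
    )
```

For the example check above, this gives three runs a day, at 00:00, 08:00 and 16:00.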

tests/common/db/malware.py

+63
@@ -0,0 +1,63 @@
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import datetime
+
+import factory
+import factory.fuzzy
+
+from warehouse.malware.models import (
+    MalwareCheck,
+    MalwareCheckObjectType,
+    MalwareCheckState,
+    MalwareCheckType,
+    MalwareVerdict,
+    VerdictClassification,
+    VerdictConfidence,
+)
+
+from .base import WarehouseFactory
+from .packaging import FileFactory
+
+
+class MalwareCheckFactory(WarehouseFactory):
+    class Meta:
+        model = MalwareCheck
+
+    name = factory.fuzzy.FuzzyText(length=12)
+    version = 1
+    short_description = factory.fuzzy.FuzzyText(length=80)
+    long_description = factory.fuzzy.FuzzyText(length=300)
+    check_type = factory.fuzzy.FuzzyChoice(list(MalwareCheckType))
+    hooked_object = factory.fuzzy.FuzzyChoice(list(MalwareCheckObjectType))
+    schedule = {"minute": "*/10"}
+    state = factory.fuzzy.FuzzyChoice(list(MalwareCheckState))
+    created = factory.fuzzy.FuzzyNaiveDateTime(
+        datetime.datetime.utcnow() - datetime.timedelta(days=7)
+    )
+
+
+class MalwareVerdictFactory(WarehouseFactory):
+    class Meta:
+        model = MalwareVerdict
+
+    check = factory.SubFactory(MalwareCheckFactory)
+    release_file = factory.SubFactory(FileFactory)
+    release = None
+    project = None
+    manually_reviewed = True
+    reviewer_verdict = factory.fuzzy.FuzzyChoice(list(VerdictClassification))
+    classification = factory.fuzzy.FuzzyChoice(list(VerdictClassification))
+    confidence = factory.fuzzy.FuzzyChoice(list(VerdictConfidence))
+    message = factory.fuzzy.FuzzyText(length=80)
+    full_report_link = None
+    details = None
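The factories draw `classification` and `confidence` from enums that, according to the commit message, were made unique ("enforce uniqueness for enum values") and ordered ("Add missing `__lt__`"). The sketch below shows that combination with the standard library. `Confidence` is a hypothetical stand-in for `VerdictConfidence`, and its member values are assumptions. Ordering follows definition order, and `functools.total_ordering` derives the remaining comparisons from `__lt__`.

```python
import enum
import functools


@enum.unique
@functools.total_ordering
class Confidence(enum.Enum):
    # Hypothetical stand-in for VerdictConfidence; values are assumptions,
    # and members are declared from least to most confident.
    Low = "low"
    Medium = "medium"
    High = "high"

    def __lt__(self, other):
        if not isinstance(other, Confidence):
            return NotImplemented
        members = list(type(self))
        return members.index(self) < members.index(other)
```

`@enum.unique` raises `ValueError` at class-creation time if two members share a value, so accidental aliases cannot slip in.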

tests/common/db/packaging.py

+1
@@ -83,6 +83,7 @@ class Meta:
 
     release = factory.SubFactory(ReleaseFactory)
     python_version = "source"
+    filename = factory.fuzzy.FuzzyText(length=12)
     md5_digest = factory.LazyAttribute(
         lambda o: hashlib.md5(o.filename.encode("utf8")).hexdigest()
     )
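`md5_digest` is a `LazyAttribute` that reads `o.filename`. Declaring a fuzzy `filename` therefore gives every generated file a name from which the digest is derived. The sketch below reproduces that dependency in plain Python. `fuzzy_text` only approximates `factory.fuzzy.FuzzyText` (random ASCII letters), and `make_file_attrs` is a hypothetical helper, not part of factory_boy.

```python
import hashlib
import random
import string


def fuzzy_text(length=12, rng=random):
    # Rough equivalent of factory.fuzzy.FuzzyText: random ASCII letters.
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))


def make_file_attrs(filename=None):
    """Build attributes in declaration order, deriving md5_digest from filename."""
    attrs = {"python_version": "source"}
    attrs["filename"] = filename if filename is not None else fuzzy_text()
    # Same expression as the LazyAttribute lambda in the factory.
    attrs["md5_digest"] = hashlib.md5(attrs["filename"].encode("utf8")).hexdigest()
    return attrs
```

With `make_file_attrs("abc")`, the digest is the well-known MD5 of `abc`, `900150983cd24fb0d6963f7d28e17f72`.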

tests/conftest.py

+3
@@ -174,6 +174,9 @@ def app_config(database):
             "files.backend": "warehouse.packaging.services.LocalFileStorage",
             "docs.backend": "warehouse.packaging.services.LocalFileStorage",
             "mail.backend": "warehouse.email.services.SMTPEmailSender",
+            "malware_check.backend": (
+                "warehouse.malware.services.PrinterMalwareCheckService"
+            ),
             "files.url": "http://localhost:7000/",
             "sessions.secret": "123456",
             "sessions.url": "redis://localhost:0/",

tests/unit/admin/test_routes.py

+23
@@ -123,4 +123,27 @@ def test_includeme():
         pretend.call("admin.flags.edit", "/admin/flags/edit/", domain=warehouse),
         pretend.call("admin.squats", "/admin/squats/", domain=warehouse),
         pretend.call("admin.squats.review", "/admin/squats/review/", domain=warehouse),
+        pretend.call("admin.checks.list", "/admin/checks/", domain=warehouse),
+        pretend.call(
+            "admin.checks.detail", "/admin/checks/{check_name}", domain=warehouse
+        ),
+        pretend.call(
+            "admin.checks.change_state",
+            "/admin/checks/{check_name}/change_state",
+            domain=warehouse,
+        ),
+        pretend.call(
+            "admin.checks.run_evaluation",
+            "/admin/checks/{check_name}/run_evaluation",
+            domain=warehouse,
+        ),
+        pretend.call("admin.verdicts.list", "/admin/verdicts/", domain=warehouse),
+        pretend.call(
+            "admin.verdicts.detail", "/admin/verdicts/{verdict_id}", domain=warehouse
+        ),
+        pretend.call(
+            "admin.verdicts.review",
+            "/admin/verdicts/{verdict_id}/review",
+            domain=warehouse,
+        ),
     ]
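The new admin routes use `{check_name}` and `{verdict_id}` placeholders that each capture a single path segment. The sketch below is a simplified matcher showing how those patterns resolve to a route name and a match dict. Pyramid's real route matching supports more syntax, such as custom regexes and `*remainder`. `compile_route`, `match_route` and `ROUTES` are hypothetical names, and the route list is copied from the test above.

```python
import re

ROUTES = [
    ("admin.checks.list", "/admin/checks/"),
    ("admin.checks.detail", "/admin/checks/{check_name}"),
    ("admin.checks.change_state", "/admin/checks/{check_name}/change_state"),
    ("admin.checks.run_evaluation", "/admin/checks/{check_name}/run_evaluation"),
    ("admin.verdicts.list", "/admin/verdicts/"),
    ("admin.verdicts.detail", "/admin/verdicts/{verdict_id}"),
    ("admin.verdicts.review", "/admin/verdicts/{verdict_id}/review"),
]


def compile_route(pattern):
    """Compile a '{name}' route pattern; each placeholder matches one segment."""
    regex, pos = "", 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])
        regex += f"(?P<{m.group(1)}>[^/]+)"
        pos = m.end()
    regex += re.escape(pattern[pos:])
    return re.compile(regex + r"\Z")


def match_route(routes, path):
    """Return (route_name, matchdict) for the first matching route, else None."""
    for name, pattern in routes:
        m = compile_route(pattern).match(path)
        if m:
            return name, m.groupdict()
    return None
```

Because placeholders stop at `/`, `/admin/verdicts/5/review` reaches the review route rather than the detail route.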
