Skip to content
Open
Show file tree
Hide file tree
Changes from 11 commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
89d8906
add bleach and pytest and update fastapi and ruff
RafaelCenzano Oct 31, 2025
73e8a1f
new sanitization and validation helper functions for improved security
RafaelCenzano Oct 31, 2025
64751e3
validate against mongo content in inputs
RafaelCenzano Oct 31, 2025
cef2155
move common models and validation functions to common file
RafaelCenzano Oct 31, 2025
84226ea
validate code is UUID v4
RafaelCenzano Oct 31, 2025
31edd36
Add validation for backtest routes
RafaelCenzano Oct 31, 2025
15402e5
Add validation for loanertech routes
RafaelCenzano Oct 31, 2025
bd37edb
add validation for laf routes
RafaelCenzano Oct 31, 2025
0f8dcf0
add pytest for new functions in sanitize.py
RafaelCenzano Oct 31, 2025
480d978
new pytest github action
RafaelCenzano Oct 31, 2025
0690ac7
update pinned actions for workflows
RafaelCenzano Oct 31, 2025
2af5353
update name to Pytest
RafaelCenzano Oct 31, 2025
120098b
add permissions to workflow
RafaelCenzano Oct 31, 2025
5c10769
remove regex and just depend on bleach to remove tags
RafaelCenzano Oct 31, 2025
e41e3a9
compile regex expression for more efficient execution
RafaelCenzano Oct 31, 2025
ccf7938
exclude valkey/redis dumps
RafaelCenzano Oct 31, 2025
10a182d
Merge branch 'main' into update-input-security
RafaelCenzano Jan 5, 2026
0a78910
chore: add version in comment
RafaelCenzano Jan 5, 2026
266aded
refactor: use annotated type validation for style and pydantic v2 mat…
RafaelCenzano Jan 6, 2026
6239d81
fix: ensure validation for objectid is used
RafaelCenzano Jan 6, 2026
8d23f31
chore: create constant for location max length
RafaelCenzano Jan 7, 2026
ce7d0f9
test: cover all edge cases
RafaelCenzano Jan 7, 2026
3db9ec7
chore: add strip to all output for course code validation
RafaelCenzano Jan 7, 2026
58baa6c
chore: remove duplicate annotated namefilter type
RafaelCenzano Jan 7, 2026
a166cf6
refactor: phone regex
RafaelCenzano Jan 10, 2026
c813edb
fix: tests and mongo santize function
RafaelCenzano Jan 10, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 5 additions & 5 deletions .github/workflows/docker-deploy.yml
Original file line number Diff line number Diff line change
Expand Up @@ -22,18 +22,18 @@ jobs:
id-token: write
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 #v5
# Uses the `docker/login-action` action to log in to the Container registry registry using the account and password that will publish the packages. Once published, the packages are scoped to the account defined here.
- name: Log in to the Container registry
uses: docker/login-action@65b78e6e13532edd9afa3aa52ac7964289d1a9c1
uses: docker/login-action@5e57cd118135c172c3672efd75eb46360885c0ef #v3.6.2
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
# This step uses [docker/metadata-action](https://github.com/docker/metadata-action#about) to extract tags and labels that will be applied to the specified image. The `id` "meta" allows the output of this step to be referenced in a subsequent step. The `images` value provides the base name for the tags and labels.
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
uses: docker/metadata-action@c1e51972afc2121e065aed6d45c65596fe445f3f #v5.8.0
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
Expand All @@ -44,7 +44,7 @@ jobs:
# It uses the `tags` and `labels` parameters to tag and label the image with the output from the "meta" step.
- name: Build and push Docker image
id: push
uses: docker/build-push-action@f2a1d5e99d037542a71f64918e516c093c6f3fc4
uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 #v6.18.0
with:
context: .
push: true
Expand All @@ -53,7 +53,7 @@ jobs:

# This step generates an artifact attestation for the image, which is an unforgeable statement about where and how it was built. It increases supply chain security for people who consume the image. For more information, see "[AUTOTITLE](/actions/security-guides/using-artifact-attestations-to-establish-provenance-for-builds)."
- name: Generate artifact attestation
uses: actions/attest-build-provenance@db473fddc028af60658334401dc6fa3ffd8669fd
uses: actions/attest-build-provenance@977bb373ede98d70efdf65b84cb5f73e068dcc2a #v3.0.0
with:
subject-name: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME}}
subject-digest: ${{ steps.push.outputs.digest }}
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/docker-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 #v5

- name: "Build Docker Image"
run: |
Expand Down
32 changes: 32 additions & 0 deletions .github/workflows/pytest.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
name: pytest

on:
pull_request:
paths:
- "**.py"
- "uv.lock"
- ".python-version"
- "pyproject.toml"

jobs:
test:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 #v5

- name: Install uv
uses: astral-sh/setup-uv@85856786d1ce8acfbcc2f13a5f3fbd6b938f9f41 #v7.1.2
with:
version: "latest"

- name: Set up Python
uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c #v6
with:
python-version: "3.13.3"

- name: Install dependencies
run: uv sync --frozen --no-cache

- name: Run pytest
run: uv run pytest
4 changes: 2 additions & 2 deletions .github/workflows/ruff.yml
Original file line number Diff line number Diff line change
Expand Up @@ -9,5 +9,5 @@ jobs:
ruff:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
- uses: astral-sh/ruff-action@0c50076f12c38c3d0115b7b519b54a91cb9cf0ad
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 #v5
- uses: astral-sh/ruff-action@57714a7c8a2e59f32539362ba31877a1957dded1 #v3.5.1
6 changes: 6 additions & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -8,12 +8,14 @@ dependencies = [
"aiocache>=0.12.3",
"fastapi-sso>=0.17.0",
"fastapi[standard]>=0.115.6",
"bleach>=6.1.0",
"pydantic-settings>=2.7.1",
"pyjwt>=2.10.1",
"pymongo>=4.13.2",
"ruff>=0.9.2",
"sentry-sdk[fastapi]>=2.20.0",
"valkey-glide>=2.0.1",
"pytest>=8.3.3",
]

[tool.ruff]
Expand All @@ -28,3 +30,7 @@ quote-style = "double"
indent-style = "space"
line-ending = "lf"

[tool.pytest.ini_options]
pythonpath = ["."]
testpaths = ["tests"]

5 changes: 5 additions & 0 deletions server/database/laf.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
datetime_time_delta,
get_next_sequence_value,
)
from server.helpers.sanitize import reject_mongo_operators
from server.models.laf import ArchivedLAFItem, ExpiredItem, LAFItem, LostReportItem

sequence_id_collection = database.get_collection("sequence_id")
Expand Down Expand Up @@ -109,6 +110,7 @@ async def lost_report_helper(lost_report: dict) -> LostReportItem:

# Add a new laf item into to the database
async def add_laf(laf_data: dict) -> LAFItem:
reject_mongo_operators(laf_data)
type_id = await get_type_id(laf_data["type"])
del laf_data["type"]

Expand Down Expand Up @@ -149,6 +151,7 @@ async def update_laf(laf_id: int, laf_data: dict, now: datetime) -> bool:
del laf_data["type"]
laf_data["type_id"] = type_id

reject_mongo_operators(laf_data)
updated_laf_item = await laf_items_collection.update_one(
{"_id": laf_id}, {"$set": laf_data}
)
Expand Down Expand Up @@ -397,6 +400,7 @@ async def retrieve_expired_laf(

# Add a new lost report into to the database
async def add_lost_report(lost_report_data: dict, auth: bool) -> LostReportItem:
reject_mongo_operators(lost_report_data)
type_id = await get_type_id(lost_report_data["type"])
del lost_report_data["type"]
now = datetime.now()
Expand Down Expand Up @@ -437,6 +441,7 @@ async def update_lost_report(
del lost_report_data["type"]
lost_report_data["type_id"] = type_id

reject_mongo_operators(lost_report_data)
updated_lost_report = await lost_reports_collection.update_one(
{"_id": lost_report_id_bson}, {"$set": lost_report_data}
)
Expand Down
3 changes: 3 additions & 0 deletions server/database/loanertech.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

from server.database import database
from server.helpers.db import get_next_sequence_value
from server.helpers.sanitize import reject_mongo_operators
from server.models.loanertech import LoanerTechItem, LoanerTechItemUnauthorized

sequence_id_collection = database.get_collection("sequence_id")
Expand Down Expand Up @@ -47,6 +48,7 @@ async def retrieve_loanertechs() -> list[LoanerTechItem]:

# Add a new loanertech item into to the database
async def add_loanertech(loanertech_data: dict) -> LoanerTechItem:
reject_mongo_operators(loanertech_data)
# Add the ID to the loanertech data
loanertech_data["_id"] = await get_next_sequence_value(
"loanertech_id", sequence_id_collection
Expand Down Expand Up @@ -81,6 +83,7 @@ async def update_loanertech(id: int, data: dict) -> bool:
# Return false if an empty request body is sent.
if len(data) < 1:
return False
reject_mongo_operators(data)
loanertech = await loanertech_collection.find_one({"_id": id})
if loanertech:
updated_loanertech = await loanertech_collection.update_one(
Expand Down
56 changes: 56 additions & 0 deletions server/helpers/sanitize.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
from __future__ import annotations

import re
from typing import Any

import bleach

_OBJECT_ID_RE = re.compile(r"^[a-fA-F0-9]{24}$")


def strip_tags(text: str) -> str:
if text is None:
return ""
Comment on lines +12 to +14
Copy link

Copilot AI Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The strip_tags function returns an empty string when text is None, but this could mask errors. Consider raising a ValueError or TypeError instead to make it clear that None is not an acceptable input, ensuring callers handle null values explicitly.

Suggested change
def strip_tags(text: str) -> str:
if text is None:
return ""
def strip_tags(text: str | None) -> str:
if text is None:
raise TypeError("strip_tags() expected a string, got None")

Copilot uses AI. Check for mistakes.
# First, remove script and style tags with their content using regex
# This handles cases where bleach might leave script content
text = re.sub(
r"<script[^>]*>.*?</script>", "", text, flags=re.IGNORECASE | re.DOTALL
)
text = re.sub(r"<style[^>]*>.*?</style>", "", text, flags=re.IGNORECASE | re.DOTALL)
# Remove all remaining HTML tags using bleach
return bleach.clean(text, tags=[], attributes={}, strip=True)


def normalize_ws(text: str) -> str:
# Collapse whitespace and trim
return re.sub(r"\s+", " ", text or "").strip()


def sanitize_text(text: str, max_len: int | None = None) -> str:
cleaned = normalize_ws(strip_tags(str(text)))
if max_len is not None and len(cleaned) > max_len:
cleaned = cleaned[:max_len]
return cleaned


def is_valid_object_id(value: str) -> bool:
if not isinstance(value, str):
return False
return bool(_OBJECT_ID_RE.fullmatch(value))


def _reject_key(key: Any) -> None:
if isinstance(key, str) and (key.startswith("$") or "." in key):
raise ValueError("MongoDB operator or dotted keys are not allowed in input")


def reject_mongo_operators(obj: Any) -> Any:
# Recursively validate that no keys start with '$' or contain '.'
if isinstance(obj, dict):
for k, v in obj.items():
_reject_key(k)
reject_mongo_operators(v)
elif isinstance(obj, list):
for item in obj:
reject_mongo_operators(item)
return obj
20 changes: 0 additions & 20 deletions server/models/__init__.py
Original file line number Diff line number Diff line change
@@ -1,20 +0,0 @@
from typing import Any

from pydantic import BaseModel


class ResponseModel(BaseModel):
data: Any
message: str


class BoolResponse(ResponseModel):
data: bool


class StringListResponse(ResponseModel):
data: list[str]


class IntResponse(ResponseModel):
data: int
18 changes: 16 additions & 2 deletions server/models/auth.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,26 @@
from pydantic import BaseModel, Field
from uuid import UUID

from pydantic import BaseModel, Field, field_validator


class TokenRequest(BaseModel):
code: str = Field(...)

@field_validator("code", mode="before")
@classmethod
def v_code(cls, v: str) -> str:
# Strip whitespace
v = v.strip() if isinstance(v, str) else str(v)
try:
# Validate it's a valid UUID v4
uuid_obj = UUID(v, version=4)
Copy link

Copilot AI Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UUID validation logic has a flaw: UUID(v, version=4) doesn't actually validate that the UUID is version 4, it only attempts to parse v as a UUID and raises an error if the version parameter doesn't match. A non-v4 UUID string will still parse successfully. Use uuid_obj.version to check if it equals 4 after parsing, or use a regex pattern to validate the v4 format specifically.

Suggested change
uuid_obj = UUID(v, version=4)
uuid_obj = UUID(v)
if uuid_obj.version != 4:
raise ValueError("code must be a valid UUID v4")

Copilot uses AI. Check for mistakes.
return str(uuid_obj)
except (ValueError, AttributeError):
raise ValueError("code must be a valid UUID v4")

class Config:
json_schema_extra = {
"example": {
"code": "adsa-sda-dsa-ds-d-asd",
"code": "550e8400-e29b-41d4-a716-446655440000",
}
}
18 changes: 16 additions & 2 deletions server/models/backtest.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,20 @@
from typing import TypedDict
from typing import Annotated, TypedDict

from server.models import ResponseModel
from pydantic import BeforeValidator

from server.models.common import ResponseModel, validate_object_id


def validate_course_code(v: str) -> str:
"""Validate that a string is a valid course code (exactly 4 uppercase letters)."""
v = v.strip() if isinstance(v, str) else str(v)
if not (len(v) == 4 and v.isalpha() and v.isupper()):
raise ValueError("must be exactly 4 uppercase letters (A-Z)")
return v


ObjectId = Annotated[str, BeforeValidator(validate_object_id)]
CourseCode = Annotated[str, BeforeValidator(validate_course_code)]


class Course(TypedDict):
Expand Down
45 changes: 45 additions & 0 deletions server/models/common.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
from typing import Annotated, Any

from pydantic import BaseModel, BeforeValidator

from server.helpers.sanitize import is_valid_object_id, sanitize_text


def validate_name(v: str | None) -> str:
Copy link

Copilot AI Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validate_name function accepts None as input per the type hint (str | None) but will raise an AttributeError when strip() is called on None. Either remove None from the type hint or add explicit None handling before calling strip().

Suggested change
def validate_name(v: str | None) -> str:
def validate_name(v: str) -> str:

Copilot uses AI. Check for mistakes.
"""Validate and sanitize name filter (max 100 characters)."""
return sanitize_text(v, max_len=100)


def validate_object_id(v: str) -> str:
"""Validate that a string is a valid MongoDB ObjectId (24 hex characters)."""
if not is_valid_object_id(v):
raise ValueError("must be a valid ObjectId (24 hexadecimal characters)")
return v


def validate_name_filter(v: str | None) -> str | None:
"""Validate optional name filter."""
if v is None:
return None
return validate_name(v)


NameFilter = Annotated[str, BeforeValidator(validate_name_filter)]
Name = Annotated[str, BeforeValidator(validate_name)]


class ResponseModel(BaseModel):
data: Any
message: str


class BoolResponse(ResponseModel):
data: bool


class StringListResponse(ResponseModel):
data: list[str]


class IntResponse(ResponseModel):
data: int
Loading