Skip to content
46 changes: 39 additions & 7 deletions jwt_proxy/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,41 @@
import jwt
import requests
import json
import re

from jwt_proxy.audit import audit_HAPI_change

blueprint = Blueprint('auth', __name__)
SUPPORTED_METHODS = ('GET', 'POST', 'PUT', 'DELETE', 'OPTIONS')

# TODO: to be pulled into its own module and loaded per config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i love the TODO. makes sense to me to move this to a separate module - by that i mean a separate .py file for easier migration later.

def scope_filter(req, token):
# Check path
resource_pattern = rf"(Patient|DocumentReference)$"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear this approach leaves a backdoor to well tailored requests. A user could include a string like "Patient" in say an or-clause of a search, or the like.

Any reason to not just limit to the requested resource in the request path? i.e. {url}/Patient or {url}/DocumentReference for this use case?

Copy link
Contributor Author

@daniellrgn daniellrgn Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not just limit to the requested resource in the request path? i.e. {url}/Patient or {url}/DocumentReference for this use case?

That's the intention of this pattern, to enforce the request path (sans params) ends in either Patient or DocumentReference (it's checked against req.path, which will typically be fhir/DocumentReference etc.). I preferred containing this check within the function to defining a url rule or something, but let me know if there's a better/more pythonic way!

if re.search(resource_pattern, req.path) is None:
return False

user_id = token.get("sub")
identifier_pattern = rf"(https(:|%3[Aa])(\/|%2[Ff]){2}keycloak\.ltt\.cirg\.uw\.edu(%7[Cc]|\|))?{user_id}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is rather hardcoded to one keycloak install. it would be nice to look to a config value, similar to UPSTREAM_SERVER.

any reason to use re for URL parsing, vs. say urlparse from urllib.parse ?

Copy link
Contributor Author

@daniellrgn daniellrgn Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Paul, good points -

This pattern just matches identifier-like query param values like https://keycloak.ltt.cirg.uw.edu|keycloak-user-id
The code system https://keycloak.ltt.cirg.uw.edu is a dummy system url that cPRO adds to the LTT patient resources to identify keycloak user ids, and would certainly better live in config to support other codings/identifiers/systems.

I wasn't aware of urlparse, so thanks for the recommendation! I can't tell if it's quite equivalent to the use of this pattern though: it seems closer to a replacement for Flask's req.args call in the next line that parses the query params from the url, but if there's a better way to validate the content of a specific query param value there I'm all ears.

All that being said, I'll clean up this pattern - I'd hedged it a bit since I didn't know exactly what was url-encoded and what wasn't, but I've found that the query param values are always url-decoded by the req.args call here, and the later check against the identifier params in a POSTed resource's references will always come through url-encoded.

# Search params for keycloak id
params = req.args
id_param_value = params.get("identifier", params.get("_identifier", params.get("subject.identifier")))
if id_param_value is not None and re.search(identifier_pattern, id_param_value):
return True
# Search body for keycloak id
if req.is_json:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the request.json property is a nice shortcut. will be None if not included or wrong content-type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The Flask docs recommended preferring get_json(), though it looks like I don't need this check as get_json() includes it implicitly.

try:
body = req.get_json()
except (ValueError, TypeError):
return False
resource_type = body.get("resourceType")
if resource_type == "DocumentReference":
reference_string = body.get("subject", {}).get("reference")
if reference_string is not None and re.search(identifier_pattern, reference_string):
return True
return False



def proxy_request(req, upstream_url, user_info=None):
"""Forward request to given url"""
Expand Down Expand Up @@ -67,13 +96,16 @@ def validate_jwt(relative_path):
)
except jwt.exceptions.ExpiredSignatureError:
return jsonify(message="token expired"), 401

response_content = proxy_request(
req=request,
upstream_url=f"{current_app.config['UPSTREAM_SERVER']}/{relative_path}",
user_info=decoded_token.get("email") or decoded_token.get("preferred_username"),
)
return response_content

# TODO: call new function here to dynamically load a filter call dependent on config; hardwired for now
if scope_filter(request, decoded_token):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably best to exit early (with the return jsonify... as you have). i fear we'll get a deep nest of it clauses when additional filters come online.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks Paul! was going to mention something similar; sometimes called a "guard clause"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, thanks both!

response_content = proxy_request(
req=request,
upstream_url=f"{current_app.config['UPSTREAM_SERVER']}/{relative_path}",
user_info=decoded_token.get("email") or decoded_token.get("preferred_username"),
)
return response_content
return jsonify(message="invalid request"), 400
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this implies a lack of authorization, a 401 is a more accurate code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that 400 generally doesn't fit. I'd advocate to keep the 400 response on parsing errors like get_json failing though.
Would a 403 in response be an even better semantic fit, as by this point the user is authenticated but unauthorized to make the request, and reauthenticating will not help?

Copy link
Contributor Author

@daniellrgn daniellrgn Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disregard that first bit, I didn't realize werkzeug's BadRequest errors automatically convert to 400 when left uncaught. They sure do think these things through 😛



@blueprint.route("/fhir/.well-known/smart-configuration")
Expand Down