Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenTelemetry instrumentation for RBAC #771

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

pilhuhn
Copy link
Contributor

@pilhuhn pilhuhn commented Nov 18, 2022

Link(s) to Jira

Description of Intent of Change(s)

Support Distributed Tracing by instrumenting RBAC with OpenTelemetry instrumentation

Local Testing

How can the feature be exercised?

  • pip install
  • # Temporary patch until the Django2 instrumentation supports ASGI (or we move to Django 3+)
    cp patch/otel_middleware.py ${APP_ROOT}/lib64/python3.8/site-packages/opentelemetry/instrumentation/django/middleware/
  • Run Jaeger with native otel support enabled, set OTLP_ENABLED=True , run rbac and see traces in Jaeger

Dockerfile has the patching -- this is temporarily needed until we either move to Django3+ or open-telemetry/opentelemetry-python-contrib#1416 is fixed.

Checklist

  • if API spec changes are required, is the spec updated?
  • are there any pre/post merge actions required? if so, document here.
  • are theses changes covered by unit tests?
  • if warranted, are documentation changes accounted for?
  • does this require migration changes?
    • if yes, are they backwards compatible?
  • is there known, direct impact to dependent teams/components?
    • if yes, how will this be handled?

Secure Coding Practices Checklist Link

Secure Coding Practices Checklist

  • Input Validation
  • Output Encoding
  • Authentication and Password Management
  • Session Management
  • Access Control
  • Cryptographic Practices
  • Error Handling and Logging
  • Data Protection
  • Communication Security
  • System Configuration
  • Database Security
  • File Management
  • Memory Management
  • General Coding Practices

Copy link
Contributor

@dehort dehort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: I am not very familiar with the rbac service and opentelemetry.

Have we done any performance testing around these changes? Is this in the hot path? I would think so, as we appear to be installing a web server middleware.

If something goes wrong, how can we quickly/easily disable this? Is it as simple as setting OTLP_ENABLED=false? Is there an impact on performance with OTLP_ENABLED=false?

If something goes really really wrong, we need to be able to roll this change back out. I would like to see this change rebased / squashed into a single commit to make it easier to revert this change if needed.

I think we need to add the config env vars to the deploy/rbac-clowdapp.yml?

This change seems to be logging/sending the access-policy (and page?) to the telemetry server. How big are the access-policies?

@@ -0,0 +1,395 @@
# Copyright The OpenTelemetry Authors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file appears to come from here, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return self.get_paginated_response(page)
return Response({"data": access_policy})

tracer = trace.get_tracer(__name__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok to use the trace/tracer object when tracing is disabled (OTLP_ENABLED=false)? I assume that is fine and the trace object just becomes a NullObject in that case.

@@ -16,3 +16,12 @@ API_PATH_PREFIX=/api/rbac

DEVELOPMENT=True
# DJANGO_DEBUG=True

OTLP_ENABLED=True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit-picky (i can't help it...i'm sorry!) I kind of like being overly verbose on naming these types of variables. Maybe something like OPEN_TELEMETRY_ENABLED? My brain wires up "OTLP" to Online Transaction Processing (OLTP vs OTLP). :)


tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("access-get"):
span = trace.get_current_span()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to get the span from the context manager in the with clause instead of calling span = trace.get_current_span()? Something like this:

with tracer.start_as_current_span("access-get") as span:

span.add_event("access_policy", {"policy": '-none-' if access_policy is None else access_policy})
if access_policy is None:
# Wrap in a span of its own
with tracer.start_as_current_span("access-policy-none"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be more clear to name this span "access-policy-cache-miss" (if i understand the code correctly).

Like above it might be possible to get a reference to the child span as part of the with clause?

@jhutar
Copy link

jhutar commented Feb 6, 2023

/retest

@jhutar
Copy link

jhutar commented Feb 6, 2023

/rebuild

@dagbay-rh
Copy link
Contributor

@pilhuhn is this PR something we still want to get merged in? jw if its stale and can/should be closed or if some one should pick it back up

@pilhuhn
Copy link
Contributor Author

pilhuhn commented Aug 2, 2023

@dagbay-rh you should probably talk to @gmcculloug. I personally think it needs to be updated and merged

@syncrou
Copy link

syncrou commented Aug 2, 2023

@lphiri is this process moving forward platform wide? @pilhuhn I would want to hold off on this until I knew what direction the platform was moving on the Open Telemetry front.
cc @dagbay-rh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants