feat: Added OpenTelemetry Tracing Support
This commit adds OpenTelemetry tracing to improve the application's
observability. Developers and operators can now trace individual
operations in the application, which helps locate performance
bottlenecks and errors and shows how calls flow between the camera,
face-detection, and launcher components.

Key Changes:

1. **Tracing Integration**:
   - Reworked the `otel_handler.trace` decorator, which is applied across
     camera control, face detection, and launcher operations. When
     OpenTelemetry is configured, each call to a decorated function now
     runs in a span named after the function. The span records the
     call's arguments and return value, and any raised exception is
     recorded on the span before being re-raised.
   - Configured a `TracerProvider` that exports spans through a
     `BatchSpanProcessor` and an OTLP/HTTP span exporter to Grafana
     Cloud's traces endpoint.

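2. **Source Code Enhancements**:
   - Applied the `@otel_handler.trace` decorator to methods in
     `camera_control.py`, `camera_manager.py`, `face_detector.py`, and
     `tshirt_launcher.py` (both the real and simulated launchers), and to
     `main()` in `main.py`.
   - Renamed `trace_endpoint` to `traces_endpoint` in
     `OpenTelemetryCredentials` to match the name the new span exporter
     reads, and added a `trace_interval_minutes` parameter to
     `OpenTelemetryHandler` (the global `otel_handler` sets it to 1).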

3. **Documentation Update**:
   - Extended the `README.md` to include a new section on "Testing
     Traces," providing users with clear guidance on how to implement,
     test, and view traces.

Testing Done:
- **pytest**: Ran the full test suite to confirm that existing
  functionality is unaffected by the new tracing code.
- **Real Hardware Testing**: Ran the application on real hardware
  several times to check that the traces reflect its actual behavior
  and timing.
- **Grafana Cloud Verification**: Verified that the spans are captured
  and displayed correctly in Grafana Cloud.

Addresses GitHub Issue:
- This commit addresses GitHub issue #49 by adding OpenTelemetry
  tracing to improve observability and troubleshooting.

With tracing in place, the application's operations can be visualized
and analyzed in detail for development, troubleshooting, and
performance optimization.

ChatGPT links:
1. https://chat.openai.com/share/d9aaa3f5-d3b1-4ff5-81b7-68d204dbcc16
Mr. ChatGPT committed Jan 2, 2024
1 parent e43187c commit f84c368
Showing 8 changed files with 128 additions and 12 deletions.
53 changes: 53 additions & 0 deletions README.md
@@ -625,6 +625,59 @@ Before you begin, ensure you have the following:
After running the application and generating some data, you should see metrics appearing in your Grafana dashboard. Verify that the metrics make sense and reflect the application's operations accurately. Look for any discrepancies or unexpected behavior in metric reporting.

### Testing Traces

#### Tracing Functions

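To trace a function, import `otel_handler` from `pygptcourse.otel_decorators` and decorate the function with `@otel_handler.trace`. Each call then runs inside a span named after the function; if OpenTelemetry is not configured, the decorator simply calls the function without tracing it: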

```python
from pygptcourse.otel_decorators import otel_handler


@otel_handler.trace
def your_function_to_trace(arg1, arg2):
    # Your function logic
    ...
```
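
To see what the decorator records without a Grafana Cloud account, its wrapper logic can be mimicked in plain Python. The sketch below is a hypothetical stand-in, not the real handler: `FakeSpan`, `finished_spans`, and this local `trace` function are illustrative names. It assumes tracing is enabled (the real decorator's disabled branch simply calls the function) and mirrors how `otel_handler.trace` sets the `arguments`, `return_value`, and `error` attributes and records exceptions before re-raising them.

```python
from functools import wraps


class FakeSpan:
    """Hypothetical stand-in for an OpenTelemetry span: it only stores what would be recorded."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}
        self.exceptions = []

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def record_exception(self, exc):
        self.exceptions.append(exc)


finished_spans = []  # collected for inspection instead of being exported


def trace(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        span = FakeSpan(func.__name__)  # span named after the function
        finished_spans.append(span)
        # Same string format as the real decorator: positional args, a space, keyword args
        span.set_attribute("arguments", str(args) + " " + str(kwargs))
        try:
            result = func(*args, **kwargs)
            span.set_attribute("return_value", str(result))
            return result
        except Exception as e:
            span.set_attribute("error", True)
            span.record_exception(e)
            raise  # the caller still sees the original exception

    return wrapper


@trace
def add(a, b):
    return a + b


@trace
def divide(a, b):
    return a / b


add(2, b=3)
try:
    divide(1, 0)
except ZeroDivisionError:
    pass

for span in finished_spans:
    print(span.name, span.attributes)
```

When the decorated function is a method, `args` includes `self`, so in the real decorator the `arguments` attribute of spans such as `move_camera` also contains the instance's repr, which for a class without a custom `__repr__` looks like `<pygptcourse.camera_control.CameraControl object at 0x...>`.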

#### Viewing Traces in Grafana Cloud

Once tracing is enabled, you can visualize and analyze the traces in Grafana Cloud as follows.

##### Tracing Prerequisites

- Configure the application to send traces to Grafana Cloud's OTLP endpoint. `OpenTelemetryCredentials` reads the base endpoint from the `GRAFANA_OTLP_ENDPOINT` environment variable and appends `/v1/traces`, and the span exporter authenticates with a `Basic` authorization header built from the encoded credentials (`api_encoded_token`).
- Have access to the Grafana Cloud account that receives the traces, with permission to view them.
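
The prerequisite settings can be sketched as follows. The traces URL mirrors `OpenTelemetryCredentials` (base endpoint plus `/v1/traces`) and the header mirrors the span exporter's `authorization` header. How `api_encoded_token` is built is not shown in this commit; the sketch assumes it is the Base64 encoding of `<instance ID>:<API token>`, the Basic-auth form Grafana Cloud's OTLP endpoint accepts. The helper name `otlp_trace_settings` and all values are hypothetical placeholders.

```python
import base64


def otlp_trace_settings(endpoint, instance_id, api_token):
    """Hypothetical helper: derive the traces URL and auth header like the handler does."""
    # Assumption: the encoded token is base64("<instance id>:<api token>")
    encoded = base64.b64encode(f"{instance_id}:{api_token}".encode("utf-8")).decode("utf-8")
    # Same suffix that OpenTelemetryCredentials appends for traces
    traces_endpoint = endpoint + "/v1/traces"
    # Same header shape that the OTLPSpanExporter is given
    headers = {"authorization": f"Basic {encoded}"}
    return traces_endpoint, headers


# Placeholder values; substitute your own stack's endpoint and credentials
url, headers = otlp_trace_settings("https://otlp-gateway.example.net/otlp", "123456", "example-token")
print(url)
print(headers)
```

Because the suffix is appended verbatim, set `GRAFANA_OTLP_ENDPOINT` without a trailing slash; otherwise the URL ends in `//v1/traces`.
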
##### Viewing Traces

1. **Log in to Grafana Cloud**: Navigate to your Grafana Cloud instance and log in with your credentials.
1. **Navigate to the Traces Section**:
   - Once logged in, open the "Explore" section from the left-hand menu.
   - Within "Explore", select the traces data source. Grafana Cloud stores traces in Tempo (Grafana's tracing backend), so the data source is usually labeled "Traces" or "Tempo", depending on your setup.

1. **Selecting Data Source**:
   - If prompted, select the Tempo data source of the Grafana Cloud stack whose OTLP endpoint your application sends to. The OTLP endpoint only receives data; Grafana Cloud forwards the traces to Tempo, which is what you query.

1. **Exploring Traces**:
   - **View Trace List**: After running a search, you will see a list of recent traces. Each trace groups the spans from one outermost traced call and everything it called.
   - **Filtering and Searching**: Use the available filters or search to find specific traces. You can filter by service, span name (for this application, the decorated function's name), duration, and other trace attributes.
   - **Trace Details**: Click a trace to view its details, including its spans, their attributes, and any recorded errors.

1. **Understanding Trace Details**:
   - **Spans**: Each trace consists of one or more spans. With `@otel_handler.trace`, each span represents one call to a decorated function and is named after that function.
   - **Attributes**: The decorator records the call's arguments in an `arguments` attribute and its result in a `return_value` attribute. If the function raises, it sets `error` to `true` and records the exception on the span before re-raising it.
   - **Visualization**: Spans are typically visualized in a waterfall diagram showing the parent-child relationships and the time each span took.

#### Tips for Effective Trace Analysis

- **Correlate Logs and Metrics**: If possible, correlate trace data with logs and metrics to get a comprehensive view of the application behavior.
- **Use Trace ID**: If you need to correlate a trace with logs or other data, use the trace ID as a reference.
- **Regular Review**: Regularly review trace data to understand typical application behavior and identify areas for performance improvement or error correction.
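
To correlate logs with a trace, each log line can carry the trace ID in the 32-digit hexadecimal form Grafana displays. With the OpenTelemetry API, the current ID is `trace.get_current_span().get_span_context().trace_id`, an integer that is `0` when no span is active. The sketch below uses only the standard library: `TraceIdFilter` is a hypothetical helper, and the ID lookup is passed in as a function so the example runs without the SDK. In the application you would pass a function that returns the API value above; the hard-coded ID here is just an example value.

```python
import io
import logging


class TraceIdFilter(logging.Filter):
    """Hypothetical filter that stamps each log record with a hex trace ID."""

    def __init__(self, get_trace_id):
        super().__init__()
        self.get_trace_id = get_trace_id  # callable returning an int trace ID (0 = no active span)

    def filter(self, record):
        trace_id = self.get_trace_id()
        # Trace IDs are 128-bit integers; render them as 32 zero-padded hex digits
        record.trace_id = format(trace_id, "032x") if trace_id else "-"
        return True  # never drop records, only annotate them


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s"))
handler.addFilter(TraceIdFilter(lambda: 0x4BF92F3577B34DA6A3CE929D0E0E4736))

logger = logging.getLogger("pygptcourse.example")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the example's output out of the root logger

logger.info("Target not aligned. Holding launch.")
print(stream.getvalue())
```

Searching Grafana for the ID printed in a log line then leads straight to the matching trace.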

#### Grafana Cloud Support

For more detailed instructions or troubleshooting, refer to the Grafana Cloud documentation or contact Grafana Cloud support. Ensure your Grafana Cloud and OpenTelemetry configurations are correctly set up for successful trace collection and visualization.


## Credits

This code is based on the original source available at [https://github.com/hovren/pymissile](https://github.com/hovren/pymissile).
8 changes: 8 additions & 0 deletions src/pygptcourse/camera_control.py
@@ -1,5 +1,6 @@
 import time

+from pygptcourse.otel_decorators import otel_handler
 from pygptcourse.tshirt_launcher import (
     DOWN,
     LEFT,
@@ -20,18 +21,21 @@ class CameraControl:
     TOLERANCE = IMAGE_HEIGHT / (4 * 2)  # for 480, it is 60
     launch_count = 0

+    @otel_handler.trace
     def __init__(self, simulation_mode=False):
         self.simulation_mode = simulation_mode
         print("Starting initialization of Launcher")
         self.launcher = Launcher() if not simulation_mode else SimulatedLauncher()
         print("Finished initialization of Launcher")
         self.current_camera_position = [self.TOTAL_TIME_LR, self.TOTAL_TIME_TB]

+    @otel_handler.trace
     def start(self):
         if not self.launcher.running:
             print("Starting launcher...")
             self.launcher.start()

+    @otel_handler.trace
     def move_camera(self, direction, duration):
         cmd = STOP
         prev_current_camera_position = self.current_camera_position.copy()
@@ -77,6 +81,7 @@ def move_camera(self, direction, duration):
         )
         self.launcher.move(cmd, duration)

+    @otel_handler.trace
     def move_camera_to_center(self):
         print("Moving camera to center")
         # Move to bottom left (0, TOTAL_TIME_TB)
@@ -93,6 +98,7 @@ def move_camera_to_center(self):
         self.move_camera("RIGHT", self.TOTAL_TIME_LR / 2)
         self.move_camera("UP", self.TOTAL_TIME_TB / 2)

+    @otel_handler.trace
     def check_and_move_camera(self, face_center):
         dx = face_center[0] - (self.IMAGE_WIDTH / 2)
         dy = face_center[1] - (self.IMAGE_HEIGHT / 2)
@@ -114,6 +120,7 @@ def check_and_move_camera(self, face_center):

         return moving

+    @otel_handler.trace
     def launch_if_aligned(self, face_center):
         moving = self.check_and_move_camera(face_center)
         if not moving:
@@ -122,6 +129,7 @@ def launch_if_aligned(self, face_center):
         else:
             print("Target not aligned. Holding launch.")

+    @otel_handler.trace
     def stop(self):
         self.launcher.running = False
         self.launcher.close()
5 changes: 5 additions & 0 deletions src/pygptcourse/camera_manager.py
@@ -1,16 +1,21 @@
 import cv2  # type: ignore

+from pygptcourse.otel_decorators import otel_handler
+

 class CameraManager:
+    @otel_handler.trace
     def __init__(self, resolution=(640, 480)):
         self.video_capture = cv2.VideoCapture(0)
         self.video_capture.set(3, resolution[0])  # Horizontal resolution
         self.video_capture.set(4, resolution[1])  # Vertical resolution

+    @otel_handler.trace
     def start(self):
         # Additional logic for starting the camera can be added here
         return self.video_capture

+    @otel_handler.trace
     def stop(self):
         # Stop and release the video capture
         self.video_capture.release()
2 changes: 1 addition & 1 deletion src/pygptcourse/credentials.py
@@ -18,7 +18,7 @@ def __init__(self):
         ).decode("utf-8")
         self.endpoint = os.getenv("GRAFANA_OTLP_ENDPOINT")
         if self.endpoint:
-            self.trace_endpoint = self.endpoint + "/v1/traces"
+            self.traces_endpoint = self.endpoint + "/v1/traces"
             self.metrics_endpoint = self.endpoint + "/v1/metrics"
             self.logs_endpoint = self.endpoint + "/v1/logs"

2 changes: 2 additions & 0 deletions src/pygptcourse/face_detector.py
@@ -8,6 +8,7 @@ def __init__(self, face_images, image_loader):
         self.image_loader = image_loader
         self.face_encodings = self.load_and_encode_faces(face_images)

+    @otel_handler.trace
     def load_and_encode_faces(self, face_images):
         encodings = {}
         for name, image_path in face_images.items():
@@ -16,6 +17,7 @@ def load_and_encode_faces(self, face_images):
             encodings[name] = face_recognition.face_encodings(image)[0]
         return encodings

+    @otel_handler.trace
     def detect_faces(self, image):
         face_locations = face_recognition.face_locations(image)
         face_encodings = face_recognition.face_encodings(image, face_locations)
1 change: 1 addition & 0 deletions src/pygptcourse/main.py
@@ -25,6 +25,7 @@ def is_display_available():
     return "DISPLAY" in os.environ


+@otel_handler.trace
 def main():
     parser = argparse.ArgumentParser(description="Run the camera control system.")
     parser.add_argument(
52 changes: 41 additions & 11 deletions src/pygptcourse/otel_decorators.py
@@ -3,10 +3,14 @@
 from functools import wraps

 from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
+from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
 from opentelemetry.metrics import get_meter_provider, set_meter_provider
 from opentelemetry.sdk.metrics import MeterProvider
 from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
 from opentelemetry.sdk.resources import SERVICE_NAME, Resource
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor
+from opentelemetry.trace import get_tracer_provider, set_tracer_provider

 from pygptcourse.credentials import OpenTelemetryCredentials

@@ -33,9 +37,11 @@ def get_count(self, labels=None):


 class OpenTelemetryHandler:
-    def __init__(self):
+    def __init__(self, trace_interval_minutes=5):
         self.creds = OpenTelemetryCredentials()
         self.enabled = self.creds.is_configured()
+        self.last_trace_time = 0
+        self.trace_interval_seconds = trace_interval_minutes * 60

         if self.enabled:
             try:
@@ -59,6 +65,21 @@ def __init__(self):

                 self.meter = get_meter_provider().get_meter(service_name, VERSION)

+                # Setup Tracing
+                # Initialize the tracer provider and trace exporter
+                self.otlp_trace_exporter = OTLPSpanExporter(
+                    endpoint=f"{self.creds.traces_endpoint}",
+                    headers={"authorization": f"Basic {self.creds.api_encoded_token}"},
+                )
+                trace_provider = TracerProvider(resource=self.resource)
+                trace_provider.add_span_processor(
+                    BatchSpanProcessor(self.otlp_trace_exporter)
+                )
+
+                # Set the fully configured tracer provider globally
+                set_tracer_provider(trace_provider)
+                self.tracer = get_tracer_provider().get_tracer(service_name, VERSION)
+
                 # Metric definitions
                 self.usb_failures = self.meter.create_counter(
                     "usb_connection_failures",
@@ -73,6 +94,7 @@ def __init__(self):
                     description="Total number of faces detected",
                     unit="int",
                 )
+
             except Exception as e:
                 # Handle initialization failure by disabling OpenTelemetry and using dummy metrics
                 self.enabled = False
@@ -93,21 +115,29 @@ def _initialize_dummy_metrics(self):
     def trace(self, func):
         @wraps(func)
         def wrapper(*args, **kwargs):
-            if self.enabled:
-                # If OTLP is enabled, do something before the function (e.g., start a span)
+            if not self.enabled:  # Skip all tracing if OTLP is not configured
+                return func(*args, **kwargs)

-                # Execute the function
-                result = func(*args, **kwargs)
+            # Use the stored tracer instance
+            with self.tracer.start_as_current_span(func.__name__) as span:
+                # Capture and log function arguments
+                span.set_attribute("arguments", str(args) + " " + str(kwargs))

-                # Do something after the function (e.g., end the span)
+                try:
+                    # Execute the wrapped function
+                    result = func(*args, **kwargs)

-                return result
-            else:
-                # If OTLP is not enabled, just execute the function
-                return func(*args, **kwargs)
+                    # Capture and log the return value
+                    span.set_attribute("return_value", str(result))
+                    return result
+                except Exception as e:
+                    # Capture and log the exception details
+                    span.set_attribute("error", True)
+                    span.record_exception(e)
+                    raise

         return wrapper


 # Global instance of the handler
-otel_handler = OpenTelemetryHandler()
+otel_handler = OpenTelemetryHandler(trace_interval_minutes=1)
17 changes: 17 additions & 0 deletions src/pygptcourse/tshirt_launcher.py
@@ -8,6 +8,8 @@
 import usb.core  # type: ignore
 import usb.util  # type: ignore

+from pygptcourse.otel_decorators import otel_handler
+
 VENDOR = 0x1941
 PRODUCT = 0x8021

@@ -50,28 +52,35 @@ def __init__(self):
         self.running = False
         super().__init__()

+    @otel_handler.trace
     def send_command(self, command):
         print(f"Simulated sending command {command}")

+    @otel_handler.trace
     def start(self):
         self.running = True
         print("Simulated launcher started")

+    @otel_handler.trace
     def stop(self):
         self.running = False
         print("Simulated launcher stopped")

+    @otel_handler.trace
     def fire(self):
         print("Simulated firing")

+    @otel_handler.trace
     def move(self, command, duration):
         print(f"Simulating move with command {command} for duration {duration}")

+    @otel_handler.trace
     def close(self):
         print("Simulated launcher closed")


 class Launcher(AbstractLauncher):
+    @otel_handler.trace
     def __init__(self):
         dev = usb.core.find(idVendor=VENDOR, idProduct=PRODUCT)

@@ -119,16 +128,19 @@ def __init__(self):
         # except usb.core.USBError, e:
         # print("RESET ERROR", e)

+    @otel_handler.trace
     def start(self):
         self.running = True
         self.t = threading.Thread(target=self.read_process)
         self.t.start()
         self.running = True

+    @otel_handler.trace
     def stop(self):
         self.running = False
         print("Thread stopped")

+    @otel_handler.trace
     def read_process(self):
         abort_fire = False
         fire_complete_time = time.time()
@@ -199,19 +211,22 @@ def read_process(self):
         self.close()
         print("THREAD STOPPED")

+    @otel_handler.trace
     def read(self, length):
         try:
             return self.ep.read(length)
         except usb.core.USBError:
             return None

+    @otel_handler.trace
     def send_command(self, command):
         try:
             self.command = command
             self.dev.ctrl_transfer(0x21, 0x09, 0x200, 0, [command])
         except usb.core.USBError as e:
             print("SEND ERROR", e)

+    @otel_handler.trace
     def move(self, command, duration):
         try:
             self.send_command(command)
@@ -220,6 +235,7 @@ def move(self, command, duration):
         except usb.core.USBError as e:
             print("SEND ERROR", e)

+    @otel_handler.trace
     def fire(self):
         try:
             self.firing = True
@@ -230,6 +246,7 @@ def fire(self):

     # added to see if this would fix the overheating problem
     # after the program exits when connected to a Mac
+    @otel_handler.trace
     def close(self):
         self.stop()
         print("Closing connection")
