[WIP] Trace Query & Page Load Optimization #2334

Open
ps48 opened this issue Feb 3, 2025 · 1 comment
Labels: enhancement (New feature or request), traces (traces telemetry related features)


ps48 commented Feb 3, 2025

1. Overview

This document outlines the proposed optimizations for trace queries and page load performance in OpenSearch Dashboards. The goal is to enhance query efficiency, reduce data load times, and improve the user experience (UX) for trace and service views.

2. Pages Impacted

  • Trace View - details of a single trace, with span Gantt charts and a filtered service map
  • Trace Content - overview of all the spans and traces in the system for the given time frame
  • Service View - details of a particular service and its trends over the selected time frame
  • Service Content - overview of all the services for the given time frame, with a service map

3. Investigation

3.1 List of Queries

  • Identify all the existing queries running on the affected pages.
  • Categorize them by purpose and usage.

3.2 Query Initiators

  • Determine which components or user interactions trigger queries.
  • Assess whether some queries can be optimized or eliminated.

3.3 Context of Queries

  • Identify where each query is used in the UI.
  • Determine the semantic meaning of the queries for the user.

@ps48 & @TackAdam to add further details on investigations done on services and traces pages

4. Types of Optimization

4.1 OpenSearch Index Optimization

4.2 Query Optimization

  • Combine redundant queries to minimize requests.
  • Identify slow queries and optimize their execution.
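
As one illustration of combining redundant queries, the sketch below builds a single search body that returns a throughput histogram and an error count per bucket, instead of issuing two requests. The field names `startTime` and `status.code`, the error code value `2`, and the function name are assumptions made for this example, not taken from the plugin; adjust them to the actual span index mapping.

```typescript
// Assumed field names and error code; verify against the real index mapping.
const TIME_FIELD = "startTime";
const STATUS_FIELD = "status.code";
const ERROR_STATUS_CODE = 2;

interface TimeRange {
  from: string; // e.g. "now-1h"
  to: string; // e.g. "now"
}

// Hypothetical helper: one request body that serves both histograms.
function buildCombinedHistogramQuery(range: TimeRange, interval: string) {
  return {
    size: 0, // aggregations only, no hits in the response
    query: {
      bool: {
        filter: [
          { range: { [TIME_FIELD]: { gte: range.from, lte: range.to } } },
        ],
      },
    },
    aggs: {
      over_time: {
        date_histogram: { field: TIME_FIELD, fixed_interval: interval },
        aggs: {
          // Each bucket's doc_count gives the throughput; this filter
          // sub-aggregation adds the error count for the same bucket.
          errors: { filter: { term: { [STATUS_FIELD]: ERROR_STATUS_CODE } } },
        },
      },
    },
  };
}

const combinedBody = buildCombinedHistogramQuery(
  { from: "now-1h", to: "now" },
  "1m"
);
```

The error rate for a bucket can then be computed on the client as the `errors` count divided by the bucket's `doc_count`, so no second histogram request is needed.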

4.3 Pagination Implementation

  • Implement pagination for trace tables, span tables, and service tables.
  • Reduce the volume of data retrieved per request.
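
A minimal sketch of table pagination using the `from` and `size` search parameters follows. It assumes the index keeps the default `index.max_result_window` of 10,000; `buildPageRequest` and its parameters are hypothetical names for this example. Deeper pages would need a different mechanism such as `search_after`, which is not shown here.

```typescript
// OpenSearch default for index.max_result_window (assumed unchanged here).
const MAX_RESULT_WINDOW = 10000;

type SortDirection = "asc" | "desc";

interface PageRequest {
  from: number;
  size: number;
  sort: Array<Record<string, SortDirection>>;
}

// Hypothetical helper: translate a table page into from/size/sort.
function buildPageRequest(
  pageIndex: number, // zero-based page number from the table component
  pageSize: number,
  sortField: string,
  direction: SortDirection = "desc"
): PageRequest {
  if (!Number.isInteger(pageIndex) || pageIndex < 0) {
    throw new RangeError("pageIndex must be a non-negative integer");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be a positive integer");
  }
  const from = pageIndex * pageSize;
  if (from + pageSize > MAX_RESULT_WINDOW) {
    // from + size may not exceed the result window; ask the user to
    // narrow the filters (or switch to search_after) instead.
    throw new RangeError("requested page is beyond the result window");
  }
  return { from, size: pageSize, sort: [{ [sortField]: direction }] };
}

// Third page of 50 rows, newest first ("startTime" is an assumed field).
const thirdPage = buildPageRequest(2, 50, "startTime");
```

The returned object is meant to be merged into the search body for the trace, span, or service table, so each request fetches only the rows for the visible page.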

4.4 Managing Multiple Queries

  • Optimize queries on both the server side and in the components that initiate them.
  • Improve React state management to handle fewer redundant queries.
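
One possible way to cut redundant queries on the client is to share in-flight requests between components that ask for the same data. The sketch below is an illustration only; `InFlightDeduper` and the key format are hypothetical and not part of the plugin.

```typescript
// Hypothetical helper: callers that request the same key while a request
// is still pending receive the same promise instead of starting a new one.
class InFlightDeduper<T> {
  private readonly pending = new Map<string, Promise<T>>();

  run(key: string, fetcher: () => Promise<T>): Promise<T> {
    const existing = this.pending.get(key);
    if (existing !== undefined) {
      return existing; // reuse the request that is already in flight
    }
    const promise = fetcher();
    const clear = (): void => {
      this.pending.delete(key);
    };
    // Forget the entry once the request settles, on success or failure,
    // so that a later call fetches fresh data.
    promise.then(clear, clear);
    this.pending.set(key, promise);
    return promise;
  }

  get pendingCount(): number {
    return this.pending.size;
  }
}
```

A key could be derived from the query body and time range, for example `JSON.stringify(body)`, so that two React components mounting at the same time with identical parameters trigger a single request. This does not cache results after completion; it only collapses concurrent duplicates.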

4.5 Data Load Reduction

  • Reduce the number of records retrieved per page.
  • Optimize the backend response structure to minimize payload size.
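
To illustrate payload reduction, the sketch below requests only the fields a table needs through the `_source` search parameter and flattens hits into compact rows before they are sent to the browser. The field list and both helper names are assumptions for this example.

```typescript
interface RawHit {
  _id: string;
  _source: Record<string, unknown>;
}

// Assumed columns for a trace table; adjust to the real mapping.
const TRACE_TABLE_FIELDS: readonly string[] = [
  "traceId",
  "durationInNanos",
  "startTime",
];

// Hypothetical helper: add _source filtering to an existing search body.
function withSourceFilter<B extends object>(body: B, fields: readonly string[]) {
  return { ...body, _source: fields.slice() };
}

// Hypothetical helper: keep only the requested fields from each hit and
// drop the per-hit metadata wrapper.
function toCompactRows(
  hits: RawHit[],
  fields: readonly string[]
): Array<Record<string, unknown>> {
  return hits.map((hit) => {
    const row: Record<string, unknown> = {};
    fields.forEach((field) => {
      if (field in hit._source) {
        row[field] = hit._source[field];
      }
    });
    return row;
  });
}
```

`withSourceFilter` reduces what OpenSearch returns, while `toCompactRows` reduces what the server route forwards to the UI; either can be used on its own.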

4.6 UX Enhancements

  • Modify the UI to start with filtered views rather than full data loads.
  • Provide users with quick access to relevant data through pre-applied filters.

4.7 Benchmarking and Testing

TackAdam commented Feb 3, 2025

Trace View:

  • Query # 2: Main payload (should be able to cover the calls from queries 1, 3, 6, 9, and 10)
    • Query # 1 Overview (details can be derived from the payload)
    • Query # 3 Pie chart latency
    • Query # 6 Gantt chart (sort on the UI)
    • Query # 9 Span list
    • Query # 10 Tree view
  • Service map queries (pending consolidation into one call)
    • Query # 4 Target resource - service map (probably the nodes)
    • Query # 5 Service map, similar to the call above but adds target.domain
    • Query # 7 Service map request / getServiceEdgesQuery
    • Query # 8 Service map metrics (how are we landing here without a trace ID?)
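
A rough sketch of how the overview, the latency pie chart, and the Gantt ordering could be derived on the client from a single span payload is shown below. The `Span` shape and `deriveTraceViews` are hypothetical, and summing span durations per service is a simplification that counts nested spans more than once.

```typescript
// Hypothetical, simplified span shape for illustration only.
interface Span {
  spanId: string;
  serviceName: string;
  startTimeMs: number;
  durationMs: number;
  isError: boolean;
}

// Derive several views from one payload instead of one query per view.
function deriveTraceViews(spans: Span[]) {
  // Gantt chart: sort a copy on the UI side, earliest span first.
  const ganttRows = spans.slice().sort((a, b) => a.startTimeMs - b.startTimeMs);

  const latencyByService: Record<string, number> = {};
  let errorCount = 0;
  let traceStart = Infinity;
  let traceEnd = -Infinity;

  spans.forEach((span) => {
    // Pie chart: total span duration attributed to each service.
    latencyByService[span.serviceName] =
      (latencyByService[span.serviceName] || 0) + span.durationMs;
    if (span.isError) {
      errorCount += 1;
    }
    traceStart = Math.min(traceStart, span.startTimeMs);
    traceEnd = Math.max(traceEnd, span.startTimeMs + span.durationMs);
  });

  return {
    overview: {
      spanCount: spans.length,
      errorCount,
      totalDurationMs: spans.length > 0 ? traceEnd - traceStart : 0,
    },
    latencyByService,
    ganttRows,
  };
}
```

The span list can reuse `ganttRows` directly, and a tree view could be built from the same payload if each span also carried its parent span ID.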

Traces Content:

  • Query # 1: Main content for the table
    • Query # 2 Traces table → percentiles (check with PM whether the percentile of the group is useful)
      • May not be accurate, since it is based on only 10,000 entries
  • Trace Group (expanding the menu triggers additional queries)
    • Main table is made up of
      • Query # 1 Traces table, just the percentile
      • Query # 6 Latency part of the table
      • Query # 7 Main table information: duration / errors
    • Query # 2 Histogram throughput (should this be moved outside Trace Group and displayed at the top?)
    • Query # 3 Error rate histogram (should this be moved outside Trace Group and displayed at the top?)
    • Query # 4 Should be deleted: repeats the Traces Content top-level query # 1
    • Query # 5 Should be deleted: repeats the Traces Content top-level query # 2
