Skip to content

lttng: Add req_type to ProcessCompletions trace #877

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions include/nccl_ofi_tracepoint.h
Original file line number Diff line number Diff line change
Expand Up @@ -23,8 +23,8 @@
lttng_ust_tracepoint(nccl_ofi_plugin, Flush, request, nccl_req); \
} while(0)

#define NCCL_OFI_TRACE_COMPLETIONS_SENDRECV(dev,request,ctx) do { \
lttng_ust_tracepoint(nccl_ofi_plugin, ProcessCompletions, dev,request,ctx); \
#define NCCL_OFI_TRACE_COMPLETIONS_SENDRECV(dev,request,req_type,ctx) do { \
lttng_ust_tracepoint(nccl_ofi_plugin, ProcessCompletions, dev,request,req_type,ctx); \
} while(0)

/***** RDMA PROTOCL *****/
Expand Down Expand Up @@ -94,8 +94,8 @@
NCCL_OFI_TRACE_EAGER_RECV_NVTX(dev, rail_id, comm, msg_seq_num); \
} while(0)

#define NCCL_OFI_TRACE_COMPLETIONS(dev,request,ctx) do { \
lttng_ust_tracepoint(nccl_ofi_plugin, ProcessCompletions, dev,request,ctx); \
#define NCCL_OFI_TRACE_COMPLETIONS(dev,request,req_type,ctx) do { \
lttng_ust_tracepoint(nccl_ofi_plugin, ProcessCompletions, dev,request,req_type,ctx); \
} while(0)

#define NCCL_OFI_TRACE_FLUSH(request, nccl_req) do { \
Expand Down
2 changes: 2 additions & 0 deletions include/tracing_impl/lttng.h
Original file line number Diff line number Diff line change
Expand Up @@ -313,11 +313,13 @@ LTTNG_UST_TRACEPOINT_EVENT(
LTTNG_UST_TP_ARGS(
int, dev,
void *, request,
int, req_type,
void *, ctx
),
LTTNG_UST_TP_FIELDS(
lttng_ust_field_integer(int, dev, dev)
lttng_ust_field_integer_hex(uint64_t, request, (uint64_t)request)
lttng_ust_field_integer(int, req_type, req_type)
lttng_ust_field_integer(uint64_t, ctx, (uint64_t)ctx)
)
)
Expand Down
2 changes: 1 addition & 1 deletion src/nccl_ofi_rdma.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -721,7 +721,7 @@ static inline int inc_req_completion(nccl_net_ofi_rdma_req_t *req,
req->state = NCCL_OFI_RDMA_REQ_COMPLETED;

/* Trace this completion */
NCCL_OFI_TRACE_COMPLETIONS(req->dev_id, req, req);
NCCL_OFI_TRACE_COMPLETIONS(req->dev_id, req, req->type, req);
}

nccl_net_ofi_mutex_unlock(&req->req_lock);
Expand Down
2 changes: 1 addition & 1 deletion src/nccl_ofi_sendrecv.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -188,7 +188,7 @@ static int sendrecv_req_handle_cq_entry(nccl_net_ofi_context_t *ctx,

nccl_net_ofi_sendrecv_req_t *req = container_of(ctx, nccl_net_ofi_sendrecv_req_t, ctx);

NCCL_OFI_TRACE_COMPLETIONS_SENDRECV(req->dev_id, req, &ctx->ofi_ctx);
NCCL_OFI_TRACE_COMPLETIONS_SENDRECV(req->dev_id, req, -1, &ctx->ofi_ctx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we can use cq_entry->flags to differentiate between request types for SENDRECV.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a possibility, I tried replacing the -1 with cq_entry->flags for SENDRECV and it looks like the cq_entry->flags value here for send requests is 2058 for SEND requests (FI_SEND, FI_TAGGED, and FI_MSG) and 1034 for RECV requests (FI_RECV, FI_TAGGED, and FI_MSG). Libfabric flags defined here.

ProcessCompletions: { cpu_id = 22 }, { dev = 0, request = 0x79024BEC3F80, req_type = 2058, ctx = 133050770669456 }
...
ProcessCompletions: { cpu_id = 22 }, { dev = 0, request = 0x7901B0154EF0, req_type = 1034, ctx = 133048156114688 }
...

If representing the cq_entry->flags value as the req_type field would be appropriate I could reuse the same ProcessCompletions tracepoint event between RDMA and SENDRECV transports, otherwise I might need to create a separate ProcessCompletionsSendRecv tracepoint event that has a cq_entry_flags field instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The analog of RDMA's req->type is SENDRECV's req->direction (link); I would use that here.


if (cq_entry->flags & FI_RECV) {
sendrecv_req_update(req, NCCL_OFI_SENDRECV_REQ_COMPLETED, cq_entry->len);
Expand Down
Loading