
[BUG]: Nsight Systems does not work with the --gpu-metrics-device option #4797

SeungsuBaek opened this issue Dec 4, 2024 · 0 comments
Labels: ? - Needs Triage, bug
SeungsuBaek commented Dec 4, 2024

Version

rapidsai/base Docker image (https://hub.docker.com/r/rapidsai/base), version 24.10

Which installation method(s) does this occur on?

No response

Describe the bug.

Hi,

I want to profile cuGraph with Nsight Systems to check GPU DRAM bandwidth and PCIe bandwidth. For this, I use the nsys profile --gpu-metrics-device=0 command.

I get a profiling result, but it contains an error. Below is the output of the nsys profile --gpu-metrics-device=0 command:

Importer error status: Importation succeeded with non-fatal errors.
**** Analysis failed with:
Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  Props {
    Items {
      Type: ErrorText
      Value: "GPU Metrics [0]: NVPA_STATUS_ERROR\n- API function: Nvpw.GPU_PeriodicSampler_DecodeCounters_V2(&params)\n- Error code: 1\n- Source function: virtual QuadDDaemon::EventSource::PwMetrics::PeriodicSampler::DecodeResult QuadDDaemon::EventSource::{anonymous}::GpuPeriodicSampler::DecodeCounters(uint8_t*, size_t) const\n- Source location: /dvs/p4/build/sw/devtools/Agora/Rel/QuadD_Main/QuadD/Target/Daemon/EventSource/GpuMetrics.cpp:242"
    }
  }
}

(Image: Nsight Systems timeline showing the GPU Metrics row)

The image shows that GPU metrics collection suddenly stopped.
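For what it's worth, a common first mitigation for periodic-sampler failures like this is to sample GPU metrics less often. A minimal sketch of how one might wrap the profiling invocation (the helper name, frequency value, and dataset path are illustrative assumptions, not taken from the report):

```python
import shutil
import subprocess

def build_nsys_cmd(workload, device=0, frequency=5000, output="bfs_profile"):
    # Hypothetical helper (not part of nsys): assembles the profiling command.
    # Lowering --gpu-metrics-frequency below the default sampling rate is an
    # assumption-based mitigation that reduces pressure on the periodic
    # sampler whose decode step fails in the error above.
    return [
        "nsys", "profile",
        f"--gpu-metrics-device={device}",
        f"--gpu-metrics-frequency={frequency}",
        "-o", output,
    ] + workload

cmd = build_nsys_cmd(["python", "bfs.py", "--n_workers", "1",
                      "--visible_devices", "0",
                      "--dataset", "/path/to/edges.csv"])

# Only launch when nsys is actually on PATH.
if shutil.which("nsys"):
    subprocess.run(cmd, check=True)
```

If a lower frequency still reproduces the decode error, that would help narrow the problem down.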


import dask_cudf
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import cugraph
import cugraph.dask as dask_cugraph
import cugraph.dask.comms.comms as Comms
from cugraph.generators.rmat import rmat
import time
import argparse
import rmm

def main():
    parser = argparse.ArgumentParser()
    description = '''python bfs.py --n_workers 1 --visible_devices 0,1,2,3
                    --dataset /HUVM/dataset/graph/soc-twitter-2010.csv --loop'''
    parser.add_argument('--n_workers', type=int, required=True, help='number of workers')
    parser.add_argument('--visible_devices', type=str, required=True,
                        help='comma-separated CUDA_VISIBLE_DEVICES (e.g. 0,1,2,3)')
    parser.add_argument('--dataset', type=str, required=True, help='path to graph dataset')
    parser.add_argument('--loop', default=False, action='store_true', help='run one time or in loop')
    args = parser.parse_args()

    # Initialize the CUDA cluster
    cluster = LocalCUDACluster(
               rmm_managed_memory=True,
               rmm_pool_size="50GB",
               CUDA_VISIBLE_DEVICES=args.visible_devices,
               n_workers=args.n_workers
    )
    client = Client(cluster)
    Comms.initialize(p2p=True)

    # Initialize multi-GPU communication
    # Set the reader chunk size to automatically get one partition per GPU
    chunksize = dask_cugraph.get_chunksize(args.dataset)

    # Multi-GPU CSV reader
    e_list = dask_cudf.read_csv(
        args.dataset, chunksize=chunksize, delimiter=' ',
        names=['src', 'dst'], dtype=['int32', 'int32']
    )

    # Create a directed graph from the edge list
    G = cugraph.Graph(directed=True)
    G.from_dask_cudf_edgelist(e_list, source='src', destination='dst')

    # Run BFS in loop or once based on the argument
    if args.loop:
        while True:
            t_start = time.time()
            result = dask_cugraph.bfs(G, start=1)  # Use 'start' argument
#            wait(result)  # Ensure computation finishes
            print("Execution time: ", time.time() - t_start)
    else:
        t_start = time.time()
        result = dask_cugraph.bfs(G, start=1)  # Use 'start' argument
#        wait(result)  # Ensure computation finishes
        print("Execution time: ", time.time() - t_start)

    # Clean up
    Comms.destroy()
    client.close()
    cluster.close()

if __name__ == "__main__":
    main()

This is my BFS benchmark code.

Is there a known issue with profiling cuGraph applications using GPU performance counters?
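One note on the benchmark itself: dask_cugraph.bfs returns a distributed result, and the wait(result) calls are commented out, so the printed "Execution time" may measure only task submission rather than the BFS itself. The general pitfall, sketched here with Python's stdlib futures rather than cuGraph/Dask:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_task():
    # Stand-in for a distributed BFS: the real work takes noticeable time.
    time.sleep(0.2)
    return "done"

with ThreadPoolExecutor() as pool:
    t0 = time.time()
    future = pool.submit(slow_task)   # returns immediately; work runs in background
    submit_elapsed = time.time() - t0

    t0 = time.time()
    value = future.result()           # blocks until the work actually finishes
    wait_elapsed = time.time() - t0
```

Timing only the submit step gives a near-zero number; timing through result() (or, in Dask terms, wait()/compute()) captures the real execution, which also matters for lining timestamps up with the profiler's GPU metrics.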

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report