Unification #3

Merged
merged 70 commits into from
Oct 8, 2024
7618926
Add architecture query to GPU NVIDIA hardware sampler.
breyerml Aug 6, 2024
8618468
Add endianness queries for all GPUs.
breyerml Aug 6, 2024
8ed1bbb
Add vendor_id queries for all GPUs.
breyerml Aug 6, 2024
9d0093d
Use same YAML entry for all samplers.
breyerml Aug 6, 2024
62dfc33
Fix typo.
breyerml Aug 7, 2024
473eae6
Rename utilization samples.
breyerml Aug 7, 2024
11f9831
Fix usage of wrong variable type.
breyerml Aug 7, 2024
e7cc2b2
Rename performance_state to performance_level.
breyerml Aug 7, 2024
1d55f23
Merge branch 'main' into unification
breyerml Aug 16, 2024
e5f7337
Add samples to README (including TODOs).
breyerml Aug 16, 2024
1221588
Add architecture function for AMD GPUs
breyerml Aug 16, 2024
ce69a52
Add query for the number of cores.
breyerml Aug 16, 2024
567afcd
Update power related query to be more uniform (except Intel Level Zero).
breyerml Sep 12, 2024
0c31784
Clarify total energy consumption is only calculated and not sampled v…
breyerml Sep 12, 2024
046753c
Clarify total energy consumption is only calculated and not sampled v…
breyerml Sep 12, 2024
453cffe
Split time_point output into unit and values such that the unit prefi…
breyerml Sep 12, 2024
5d376d4
Unify clock related samples and add new ones depending on the target …
breyerml Sep 13, 2024
dfe75ae
(temporarily) disable level zero support.
breyerml Sep 13, 2024
2919e1c
Backport library to support C++17 instead of only C++20 (mainly chang…
breyerml Sep 16, 2024
0f7a253
Unify temperature related samples.
breyerml Sep 16, 2024
357bf24
Unify memory related samples.
breyerml Sep 16, 2024
e1a808c
Unify general samples.
breyerml Sep 16, 2024
ece190c
Prefix YAML entry to make its meaning clearer.
breyerml Sep 16, 2024
c8357ee
Consistent quoting of string-like values in the YAML file (and only i…
breyerml Sep 16, 2024
eaa4e4e
Move implementation to cpp file.
breyerml Sep 16, 2024
527c635
Clean-up utility header.
breyerml Sep 16, 2024
131de00
Update Python bindings.
breyerml Sep 16, 2024
6f40795
Update README tables.
breyerml Sep 16, 2024
e1a0da5
Update sample name.
breyerml Sep 16, 2024
d719e8f
Add new function returning relative time points (relative to the firs…
breyerml Sep 16, 2024
4f411f3
Add new python only functions that return a relative event, i.e., eve…
breyerml Sep 16, 2024
d6c69e9
Mark dump_yaml member functions as const.
breyerml Sep 17, 2024
cbbac19
Implement HWS_CUDA_ERROR_CHECK macro for also checking cuda error cod…
breyerml Sep 17, 2024
5fc03c2
Fix errors in documentation.
breyerml Sep 17, 2024
acb9826
Fix errors in documentation wrongly using PLSSVM.
breyerml Sep 17, 2024
bd4d987
Add a new system_hardware_sampler that automatically samples all avai…
breyerml Sep 17, 2024
7c96f02
Fix clang-tidy warnings.
breyerml Sep 17, 2024
9782e96
Remove unused fetch content.
breyerml Sep 23, 2024
1dd98f2
Add missing detail namespace qualifier.
breyerml Sep 24, 2024
28b52f4
Fix an error where the power usage was calculated from a reference po…
breyerml Sep 24, 2024
03e572e
Change order of device ID and bus ID.
breyerml Sep 24, 2024
6ae5c21
Add newlines between the different categories to make the YAML output…
breyerml Sep 24, 2024
d32f6bf
Fix some compilation warnings and linker errors.
breyerml Sep 24, 2024
89129b2
Update README file.
breyerml Sep 24, 2024
d26130a
Interpolate total power consumption from the current power usage on A…
breyerml Sep 24, 2024
d115e31
Update Intel GPU Level Zero implementation (not tested yet since curr…
breyerml Sep 24, 2024
81fe9cd
Add function to check whether a sample category as any sample. Output…
breyerml Sep 24, 2024
e3f7f3b
Only add newlines if the sample category isn't empty.
breyerml Sep 24, 2024
66ba78b
Add the possibility to disable sampling categories.
breyerml Sep 24, 2024
d5e33bf
Output throttle reasons as string and as bitmask.
breyerml Sep 24, 2024
4cceea6
Implement Intel GPU system_hardware_sampler device discovery.
breyerml Sep 24, 2024
ab809d9
Add a function to return the hardware samples as YAML string instead …
breyerml Sep 24, 2024
988dc77
Add a new function to retrieve the hardware samples only excluding ev…
breyerml Sep 27, 2024
68a3ad1
Make the device_identification function public.
breyerml Sep 27, 2024
d539560
Add alias targets.
breyerml Sep 27, 2024
a47e8fe
Fix turbostat logic related output bug.
breyerml Sep 30, 2024
f5747ae
Add version information using CMake configuration.
breyerml Sep 30, 2024
dc71dce
Add check that the sampling interval must not be zero.
breyerml Sep 30, 2024
ed85830
Update code examples.
breyerml Sep 30, 2024
257ca3d
Fix usage of wrong C++ standard in documentation string.
breyerml Sep 30, 2024
63bb80f
Add the possibility to generate a Doxygen documentation.
breyerml Sep 30, 2024
621f50d
Rename hardware_sampling folder to hws and change target library name.
breyerml Sep 30, 2024
5c21328
Add {fmt} to the install targets.
breyerml Oct 7, 2024
f8b4427
Undo last commit.
breyerml Oct 7, 2024
55e2936
Fix compilation error in the level zero error check function.
breyerml Oct 8, 2024
d1e878e
Add missing comparison to ZE_RESULT_SUCCESS.
breyerml Oct 8, 2024
751adee
Fix power related wrong units and values.
breyerml Oct 8, 2024
13daaa8
Correctly init level zero driver.
breyerml Oct 8, 2024
12da0d9
Try fixing installation issues.
breyerml Oct 8, 2024
7c0ce1e
Update README.
breyerml Oct 8, 2024
4 changes: 2 additions & 2 deletions .clang-format
@@ -77,9 +77,9 @@ ForEachMacros: [ 'foreach', 'Q_FOREACH', 'BOOST_FOREACH' ]
 IfMacros: [ ]
 IncludeBlocks: Regroup
 IncludeCategories:
-  - Regex: '^"hardware_sampling/'
+  - Regex: '^"hws/'
     Priority: 1
-  - Regex: '^"(pybind|nvml|rocm_smi|level_zero|subprocess)'
+  - Regex: '^"(pybind|nvml|cuda|rocm_smi|hip|level_zero|subprocess|fmt)'
     Priority: 2
   - Regex: '^.*'
     Priority: 3
43 changes: 43 additions & 0 deletions .github/workflows/documentation.yml
@@ -0,0 +1,43 @@
+name: Generate documentation
+
+# only trigger this action on specific events
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+    branches:
+      - main
+
+jobs:
+  build-documentation:
+    runs-on: ubuntu-latest
+    steps:
+      # checkout repository
+      - name: Checkout hws
+        uses: actions/[email protected]
+        with:
+          path: hardware_sampling
+      # install dependencies
+      - name: Dependencies
+        run: |
+          sudo apt update
+          sudo apt-get install -y doxygen graphviz
+      # configure project via CMake
+      - name: Configure
+        run: |
+          cd hardware_sampling
+          mkdir build
+          cd build
+          cmake -DHWS_ENABLE_DOCUMENTATION=ON ..
+      # build project
+      - name: Generate
+        run: |
+          cd hardware_sampling/build
+          make doc
+      # deploy generated documentation using github.io
+      - name: Deploy
+        uses: peaceiris/actions-gh-pages@v4
+        with:
+          github_token: ${{ secrets.GITHUB_TOKEN }}
+          publish_dir: ./hardware_sampling/docs/html
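The workflow runs `make doc` and publishes `docs/html`, but the `docs/CMakeLists.txt` that defines this target is not part of the diff shown here. As a rough sketch only, a `doc` target with that output location could be defined with CMake's bundled FindDoxygen module as follows. The input directory and the Doxygen settings are assumptions, not the project's actual configuration.

```cmake
# docs/CMakeLists.txt (hypothetical sketch, not the file added by this PR)
find_package(Doxygen REQUIRED OPTIONAL_COMPONENTS dot)

# write the generated files into the source tree so that the HTML output
# ends up in docs/html, the directory the workflow publishes
set(DOXYGEN_OUTPUT_DIRECTORY "${PROJECT_SOURCE_DIR}/docs")
set(DOXYGEN_GENERATE_HTML YES)

# create a custom target named "doc", i.e., the one invoked by `make doc`
doxygen_add_docs(doc
        "${PROJECT_SOURCE_DIR}/include"
        COMMENT "Generating API documentation with Doxygen")
```

Because the target is only created when `HWS_ENABLE_DOCUMENTATION=ON`, a default build would not require Doxygen to be installed.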
7 changes: 6 additions & 1 deletion .gitignore
@@ -36,6 +36,8 @@ Prerequisites
 # CMake ================================
 bin/
 build*/
+docs/html
+install*/
 cmake-build*/
 CMakeLists.txt.user
 CMakeCache.txt
@@ -53,4 +55,7 @@ CTestTestfile.cmake
 # IDEs ================================
 .idea/
 .vscode/
-.vs/
+.vs/
+
+# auto-generated version header
+include/hws/version.hpp
115 changes: 83 additions & 32 deletions CMakeLists.txt
@@ -6,27 +6,29 @@

 cmake_minimum_required(VERSION 3.22)

-project("HWS - Hardware Sampling for GPUs and CPUs"
+project("hws - Hardware Sampling for GPUs and CPUs"
         VERSION 1.0.0
         LANGUAGES CXX
-        DESCRIPTION "Hardware sampling (e.g., clock frequencies, memory consumption, temperatures, or energy draw) for CPUs, and GPUS.")
+        DESCRIPTION "Hardware sampling (e.g., clock frequencies, memory consumption, temperatures, or energy draw) for CPUs and GPUS.")

 # explicitly set library source files
 set(HWS_SOURCES
-        ${CMAKE_CURRENT_SOURCE_DIR}/src/hardware_sampling/event.cpp
-        ${CMAKE_CURRENT_SOURCE_DIR}/src/hardware_sampling/hardware_sampler.cpp
-        ${CMAKE_CURRENT_SOURCE_DIR}/src/hardware_sampling/utility.cpp
+        ${CMAKE_CURRENT_SOURCE_DIR}/src/hws/event.cpp
+        ${CMAKE_CURRENT_SOURCE_DIR}/src/hws/hardware_sampler.cpp
+        ${CMAKE_CURRENT_SOURCE_DIR}/src/hws/system_hardware_sampler.cpp
+        ${CMAKE_CURRENT_SOURCE_DIR}/src/hws/utility.cpp
 )

 # create hardware sampling library
-set(HWS_LIBRARY_NAME hardware_sampling)
+set(HWS_LIBRARY_NAME hws)
 add_library(${HWS_LIBRARY_NAME} SHARED ${HWS_SOURCES})
+add_library(hws::hws ALIAS ${HWS_LIBRARY_NAME})

 # set install target
 set(HWS_TARGETS_TO_INSTALL ${HWS_LIBRARY_NAME})

-# use C++20
-target_compile_features(${HWS_LIBRARY_NAME} PUBLIC cxx_std_20)
+# use C++17
+target_compile_features(${HWS_LIBRARY_NAME} PUBLIC cxx_std_17)

 # add target include directory
 target_include_directories(${HWS_LIBRARY_NAME} PUBLIC
@@ -58,6 +60,44 @@ endif ()
 message(STATUS "Setting the hardware sampler interval to ${HWS_SAMPLING_INTERVAL}ms.")
 target_compile_definitions(${HWS_LIBRARY_NAME} PUBLIC HWS_SAMPLING_INTERVAL=${HWS_SAMPLING_INTERVAL}ms)

+# install fmt as dependency
+include(FetchContent)
+set(HWS_fmt_VERSION 11.0.2)
+find_package(fmt 11.0.2 QUIET)
+if (fmt_FOUND)
+    message(STATUS "Found package fmt.")
+else ()
+    message(STATUS "Couldn't find package fmt. Building version ${HWS_fmt_VERSION} from source.")
+    set(FMT_PEDANTIC OFF CACHE INTERNAL "" FORCE)
+    set(FMT_WERROR OFF CACHE INTERNAL "" FORCE)
+    set(FMT_DOC OFF CACHE INTERNAL "" FORCE)
+    set(FMT_INSTALL ON CACHE INTERNAL "" FORCE) # let {fmt} handle the install target
+    set(FMT_TEST OFF CACHE INTERNAL "" FORCE)
+    set(FMT_FUZZ OFF CACHE INTERNAL "" FORCE)
+    set(FMT_CUDA_TEST OFF CACHE INTERNAL "" FORCE)
+    set(FMT_MODULE OFF CACHE INTERNAL "" FORCE)
+    set(FMT_SYSTEM_HEADERS ON CACHE INTERNAL "" FORCE)
+    # fetch string formatting library fmt
+    FetchContent_Declare(fmt
+            GIT_REPOSITORY https://github.com/fmtlib/fmt.git
+            GIT_TAG ${HWS_fmt_VERSION}
+            QUIET
+    )
+    FetchContent_MakeAvailable(fmt)
+    set_property(TARGET fmt PROPERTY POSITION_INDEPENDENT_CODE ON)
+    add_dependencies(${HWS_LIBRARY_NAME} fmt)
+endif ()
+target_link_libraries(${HWS_LIBRARY_NAME} PUBLIC fmt::fmt)
+
+########################################################################################################################
+##                                           configure version header                                               ##
+########################################################################################################################
+message(STATUS "Configuring version information.")
+configure_file(
+        ${CMAKE_CURRENT_SOURCE_DIR}/include/hws/version.hpp.in
+        ${CMAKE_CURRENT_SOURCE_DIR}/include/hws/version.hpp
+        @ONLY
+)

 ####################################################################################################################
 ##                                                 CPU measurements                                               ##
@@ -148,9 +188,9 @@ if (HWS_LSCPU_FOUND OR HWS_FREE_FOUND OR HWS_TURBOSTAT_EXECUTION_TYPE)
     # add source file to source file list
     target_sources(${HWS_LIBRARY_NAME} PRIVATE
             $<BUILD_INTERFACE:
-            ${CMAKE_CURRENT_SOURCE_DIR}/src/hardware_sampling/cpu/hardware_sampler.cpp;
-            ${CMAKE_CURRENT_SOURCE_DIR}/src/hardware_sampling/cpu/cpu_samples.cpp;
-            ${CMAKE_CURRENT_SOURCE_DIR}/src/hardware_sampling/cpu/utility.cpp;
+            ${CMAKE_CURRENT_SOURCE_DIR}/src/hws/cpu/hardware_sampler.cpp;
+            ${CMAKE_CURRENT_SOURCE_DIR}/src/hws/cpu/cpu_samples.cpp;
+            ${CMAKE_CURRENT_SOURCE_DIR}/src/hws/cpu/utility.cpp;
             >)

     # add compile definitions
@@ -166,15 +206,16 @@ endif ()
 # find libraries necessary for NVML and link against them
 find_package(CUDAToolkit QUIET)
 if (CUDAToolkit_FOUND)
-    target_link_libraries(${HWS_LIBRARY_NAME} PRIVATE CUDA::nvml)
+    target_link_libraries(${HWS_LIBRARY_NAME} PRIVATE CUDA::nvml CUDA::cudart)

     message(STATUS "Enable sampling of NVIDIA GPU information using NVML.")

     # add source file to source file list
     target_sources(${HWS_LIBRARY_NAME} PRIVATE
             $<BUILD_INTERFACE:
-            ${CMAKE_CURRENT_SOURCE_DIR}/src/hardware_sampling/gpu_nvidia/hardware_sampler.cpp;
-            ${CMAKE_CURRENT_SOURCE_DIR}/src/hardware_sampling/gpu_nvidia/nvml_samples.cpp
+            ${CMAKE_CURRENT_SOURCE_DIR}/src/hws/gpu_nvidia/hardware_sampler.cpp;
+            ${CMAKE_CURRENT_SOURCE_DIR}/src/hws/gpu_nvidia/nvml_samples.cpp;
+            ${CMAKE_CURRENT_SOURCE_DIR}/src/hws/gpu_nvidia/utility.cpp
             >)

     # add compile definition
@@ -190,16 +231,18 @@ endif ()
 ## try finding ROCm SMI
 find_package(rocm_smi QUIET)
 if (rocm_smi_FOUND)
-    target_link_libraries(${HWS_LIBRARY_NAME} PRIVATE -lrocm_smi64)
+    find_package(HIP REQUIRED)
+    target_link_libraries(${HWS_LIBRARY_NAME} PRIVATE -lrocm_smi64 hip::host)
     target_include_directories(${HWS_LIBRARY_NAME} PRIVATE ${ROCM_SMI_INCLUDE_DIR})

     message(STATUS "Enable sampling of AMD GPU information using ROCm SMI.")

     # add source file to source file list
     target_sources(${HWS_LIBRARY_NAME} PRIVATE
             $<BUILD_INTERFACE:
-            ${CMAKE_CURRENT_SOURCE_DIR}/src/hardware_sampling/gpu_amd/hardware_sampler.cpp;
-            ${CMAKE_CURRENT_SOURCE_DIR}/src/hardware_sampling/gpu_amd/rocm_smi_samples.cpp
+            ${CMAKE_CURRENT_SOURCE_DIR}/src/hws/gpu_amd/hardware_sampler.cpp;
+            ${CMAKE_CURRENT_SOURCE_DIR}/src/hws/gpu_amd/rocm_smi_samples.cpp;
+            ${CMAKE_CURRENT_SOURCE_DIR}/src/hws/gpu_amd/utility.cpp
             >)

     # add compile definition
@@ -222,9 +265,9 @@ if (level_zero_FOUND)
     # add source file to source file list
     target_sources(${HWS_LIBRARY_NAME} PRIVATE
             $<BUILD_INTERFACE:
-            ${CMAKE_CURRENT_SOURCE_DIR}/src/hardware_sampling/gpu_intel/hardware_sampler.cpp;
-            ${CMAKE_CURRENT_SOURCE_DIR}/src/hardware_sampling/gpu_intel/level_zero_samples.cpp;
-            ${CMAKE_CURRENT_SOURCE_DIR}/src/hardware_sampling/gpu_intel/utility.cpp
+            ${CMAKE_CURRENT_SOURCE_DIR}/src/hws/gpu_intel/hardware_sampler.cpp;
+            ${CMAKE_CURRENT_SOURCE_DIR}/src/hws/gpu_intel/level_zero_samples.cpp;
+            ${CMAKE_CURRENT_SOURCE_DIR}/src/hws/gpu_intel/utility.cpp
             >)

     # add compile definition
@@ -238,19 +281,27 @@ endif ()
 ##                                             enable Python bindings                                             ##
 ####################################################################################################################
 option(HWS_ENABLE_PYTHON_BINDINGS "Build language bindings for Python." ON)
-
 if (HWS_ENABLE_PYTHON_BINDINGS)
     add_subdirectory(bindings)
 endif ()


+########################################################################################################################
+##                                               add documentation                                                  ##
+########################################################################################################################
+option(HWS_ENABLE_DOCUMENTATION "Add documentation using Doxygen." OFF)
+if (HWS_ENABLE_DOCUMENTATION)
+    add_subdirectory(docs)
+endif ()
+
+
 ########################################################################################################################
 ##                                           add support for `make install`                                         ##
 ########################################################################################################################
 include(GNUInstallDirs)
 ## install all necessary library targets
 install(TARGETS ${HWS_TARGETS_TO_INSTALL}
-        EXPORT hardware_sampling_Targets
+        EXPORT hws_Targets
         ARCHIVE DESTINATION "${CMAKE_INSTALL_LIBDIR}"  # all files that are neither executables, shared lib or headers
         LIBRARY DESTINATION "${CMAKE_INSTALL_LIBDIR}"  # all shared lib files
         RUNTIME DESTINATION "${CMAKE_INSTALL_BINDIR}"  # all executables
@@ -264,28 +315,28 @@ install(DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/include/"
 ## manage version comparison
 include(CMakePackageConfigHelpers)
 write_basic_package_version_file(
-        "hardware_samplingConfigVersion.cmake"
+        "hwsConfigVersion.cmake"
         VERSION ${PROJECT_VERSION}
         COMPATIBILITY SameMajorVersion
 )

 ## generate configuration file
 configure_package_config_file(
-        "${CMAKE_CURRENT_SOURCE_DIR}/cmake/hardware_samplingConfig.cmake.in"
-        "${PROJECT_BINARY_DIR}/hardware_samplingConfig.cmake"
-        INSTALL_DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/hardware_sampling/cmake
+        "${CMAKE_CURRENT_SOURCE_DIR}/cmake/hwsConfig.cmake.in"
+        "${PROJECT_BINARY_DIR}/hwsConfig.cmake"
+        INSTALL_DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/hws/cmake
 )

 ## create and copy install-targets file
-install(EXPORT hardware_sampling_Targets
-        FILE hardware_samplingTargets.cmake
+install(EXPORT hws_Targets
+        FILE hwsTargets.cmake
         NAMESPACE hws::
-        DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/hardware_sampling/cmake
+        DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/hws/cmake
 )

 ## create file containing the build configuration and version information
 install(FILES
-        "${PROJECT_BINARY_DIR}/hardware_samplingConfig.cmake"
-        "${PROJECT_BINARY_DIR}/hardware_samplingConfigVersion.cmake"
-        DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/hardware_sampling/cmake
+        "${PROJECT_BINARY_DIR}/hwsConfig.cmake"
+        "${PROJECT_BINARY_DIR}/hwsConfigVersion.cmake"
+        DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/hws/cmake
 )
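
With the package files renamed to `hwsConfig.cmake` and `hwsTargets.cmake` and the targets exported under the `hws::` namespace, a downstream project would presumably locate and link an installed copy roughly as sketched below. The project name `hws_consumer`, the executable name `consumer`, and the source file `main.cpp` are placeholders, not part of this PR.

```cmake
# CMakeLists.txt of a hypothetical downstream project
cmake_minimum_required(VERSION 3.22)
project(hws_consumer LANGUAGES CXX)

# looks for hwsConfig.cmake; pass -DCMAKE_PREFIX_PATH=<install prefix>
# if hws was installed to a non-default location
find_package(hws CONFIG REQUIRED)

add_executable(consumer main.cpp)
# hws::hws is the exported library target (target name "hws" plus the "hws::" namespace);
# its PUBLIC cxx_std_17 compile feature applies to the consumer as well
target_link_libraries(consumer PRIVATE hws::hws)
```

Since the same name is also defined as an alias target in the build tree, the `target_link_libraries` line should work unchanged when hws is added via `add_subdirectory` or FetchContent instead of `find_package`.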