-
Notifications
You must be signed in to change notification settings - Fork 572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tpetra: copy and permute improvements #13714
base: develop
Are you sure you want to change the base?
Conversation
Signed-off-by: srkenno <[email protected]>
Signed-off-by: srkenno <[email protected]>
Should this PR be tested? Or is it WIP? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approve so that the AT runs..
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
Build InformationTest Name: PR_gcc
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi_debug
Jenkins Parameters
Build InformationTest Name: PR_clang
Jenkins Parameters
Build InformationTest Name: PR_cuda
Jenkins Parameters
Build InformationTest Name: PR_intel
Jenkins Parameters
Build InformationTest Name: PR_cuda-uvm
Jenkins Parameters
Using Repos:
Pull Request Author: tjfulle |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
Build InformationTest Name: PR_gcc
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi_debug
Jenkins Parameters
Build InformationTest Name: PR_clang
Jenkins Parameters
Build InformationTest Name: PR_cuda
Jenkins Parameters
Build InformationTest Name: PR_intel
Jenkins Parameters
Build InformationTest Name: PR_cuda-uvm
Jenkins Parameters
|
@cgcgcg I though pre-test inspection was sufficient to start the AT. Is approval now needed? |
@jhux2 I'm confused by this as well. |
@jhux2 @cgcgcg I checked the AT logs and it thought that this PR was already tested and had failed. My best guess is that it was picking up the previous failure from the old PR, since the commit was the same (basically creating a PR against It did explicitly see Jonathan's PRE-TEST-INSPECTED label and accept it. |
@sebrowne Thanks for looking into it! |
Signed-off-by: Christian Glusa <[email protected]>
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
Build InformationTest Name: PR_gcc
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi_debug
Jenkins Parameters
Build InformationTest Name: PR_clang
Jenkins Parameters
Build InformationTest Name: PR_cuda
Jenkins Parameters
Build InformationTest Name: PR_intel
Jenkins Parameters
Build InformationTest Name: PR_cuda-uvm
Jenkins Parameters
Using Repos:
Pull Request Author: tjfulle |
@trilinos/framework The framework-test failed with
Do I need to re-approve the job? |
The message about Please try re-running the framework-tests job now. |
@achauphan That worked. Thank you! |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
Build InformationTest Name: PR_gcc
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi_debug
Jenkins Parameters
Build InformationTest Name: PR_clang
Jenkins Parameters
Build InformationTest Name: PR_cuda
Jenkins Parameters
Build InformationTest Name: PR_intel
Jenkins Parameters
Build InformationTest Name: PR_cuda-uvm
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comments. Generally speaking, can copyAndPermuteNew
simply replace copyAndPermute
?
auto numEntries = rowInfo.numEntries; | ||
using inp_view_type = View<const GO*, Kokkos::HostSpace, MemoryUnmanaged>; | ||
inp_view_type inputInds(inputGblColInds, numInputInds); | ||
size_t numInserted; | ||
{ | ||
auto gblIndsHostView = this->gblInds_wdv.getHostView(Access::ReadWrite); | ||
// FIXME - device |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that the code needs to be rewritten to use device?
@@ -4805,6 +4828,14 @@ namespace Tpetra { | |||
const char tfecfFuncName[] = "copyAndPermute: "; | |||
const bool verbose = verbose_; | |||
|
|||
if (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete?
if (true) { | ||
const row_graph_type& srcRowGraph = | ||
dynamic_cast<const row_graph_type&> (source); | ||
copyAndPermuteNew(srcRowGraph, *this, numSameIDs, permuteToLIDs, permuteFromLIDs, INSERT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since copyAndPermute
calls copyAndPermuteNew
, would it make sense that cAPN
simply becomes cAP
?
Kokkos::abort("error"); \ | ||
} while(0) | ||
|
||
Kokkos::parallel_for("Tpetra_CrsGraph::copyAndPermuteNew2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be copyAndPermuteNew
?
#ifdef CRSGRAPH_INNER_ABORT | ||
#undef CRSGRAPH_INNER_ABORT | ||
#endif | ||
|
||
#define CRSGRAPH_INNER_ABORT(lin) do { \ | ||
printf("ERROR: Tpetra_CrsGraph_def.hpp:%d", lin); \ | ||
Kokkos::abort("error"); \ | ||
} while(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete?
@@ -624,7 +624,15 @@ class WrappedDualView { | |||
|
|||
// We check to see if the memory is not aliased *or* if it is a supported | |||
// (heterogeneous memory) accelerator (for shared host/device memory). | |||
return !memoryIsAliased() || Spaces::is_gpu_exec_space<typename DualViewType::execution_space>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this necessary?
typedef GlobalOrdinal GO; | ||
|
||
const bool sorted = graph.isSorted (); | ||
const bool atomic = useAtomicUpdatesByDefault; // FIXME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what needs to be fixed?
if (0 && targetIsLocallyIndexed && sourceIsLocallyIndexed) { | ||
|
||
// Create a mapping from the source's local column id's to my local column ids | ||
using DT = typename Node::device_type; | ||
const map_type& src_col_map = *(srcMat.getColMap()); | ||
const map_type& tgt_col_map = *(this->getColMap()); | ||
|
||
auto local_src_col_map = src_col_map.getLocalMap(); | ||
auto local_tgt_col_map = tgt_col_map.getLocalMap(); | ||
|
||
auto invalid = Teuchos::OrdinalTraits<LO>::invalid(); | ||
LO num_src_cols = static_cast<LO>(src_col_map.getLocalNumElements()); | ||
Kokkos::UnorderedMap<LO, LO, DT> lid_map(num_src_cols); | ||
for (LO src_local_col_idx=0; src_local_col_idx<num_src_cols; src_local_col_idx++) | ||
{ | ||
// FIXME using local maps here causes an exception in the current version, | ||
// possibly this is now invoked from host? | ||
auto global_idx = local_src_col_map.getGlobalElement(src_local_col_idx); | ||
auto tgt_local_col_idx = local_tgt_col_map.getLocalElement(global_idx); | ||
if (tgt_local_col_idx != invalid) { | ||
lid_map.insert(src_local_col_idx, tgt_local_col_idx); | ||
} | ||
} | ||
for (LO local_row=0; local_row<numSameIDs_as_LID; local_row++) | ||
{ | ||
values_host_view_type src_local_vals; | ||
local_inds_host_view_type src_local_cols; | ||
srcMat.getLocalRowView(local_row, src_local_cols, src_local_vals); | ||
|
||
values_host_view_type tgt_local_vals; | ||
local_inds_host_view_type tgt_local_cols; | ||
this->getLocalRowView(local_row, tgt_local_cols, tgt_local_vals); | ||
|
||
Kokkos::View<LO*, DT> indices("tgt_local_cols", src_local_cols.extent(0)); | ||
Kokkos::View<Scalar*, DT> values("tgt_local_vals", src_local_cols.extent(0)); | ||
size_t idx = 0; | ||
for (size_t offset=0; offset<src_local_cols.extent(0); offset++) { | ||
auto src_local_col_idx = src_local_cols(offset); | ||
if (lid_map.exists(src_local_col_idx)) { | ||
auto j = lid_map.find(src_local_col_idx); | ||
auto tgt_local_idx = lid_map.value_at(j); | ||
indices(idx) = tgt_local_idx; | ||
values(idx) = src_local_vals(offset); | ||
idx += 1; | ||
} | ||
} | ||
Kokkos::View<const local_ordinal_type*, DT> indices_const(indices.data(), indices.size()); | ||
const impl_scalar_type* const values_const_data = reinterpret_cast<const impl_scalar_type*>(values.data()); | ||
Kokkos::View<const impl_scalar_type*, DT> values_const(values_const_data, values.size()); | ||
auto inds = Kokkos::subview(indices_const, Kokkos::make_pair(size_t(0), idx)); | ||
auto vals = Kokkos::subview(values_const, Kokkos::make_pair(size_t(0), idx)); | ||
this->replaceLocalValues(local_row, inds, vals); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch never executes?
auto permuteToLIDs_h = permuteToLIDs.view_host (); | ||
TEUCHOS_ASSERT( ! permuteFromLIDs.need_sync_host () ); | ||
auto permuteFromLIDs_h = permuteFromLIDs.view_host (); | ||
if (1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete
} | ||
} | ||
|
||
// FIXME - need to apply the same approach as above to the permutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we open a separate issue so we don't forget this?
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
2 similar comments
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
@trilinos/tpetra
Work completed by @skennon10
Motivation
In some simulations, up to 70-80 of time can be spent in
Tpetra::CrsMatrix::copy_and_permute
. This PR reduces that time dramatically.Supersedes #13682
Stakeholder Feedback
From an affected customer:
Testing
Local Tpetra tests passed.