
Can't run on multiple localities with MPI #1277

Open
stevenrbrandt opened this issue Oct 6, 2020 · 4 comments

Comments

stevenrbrandt (Member) commented Oct 6, 2020

Run command

mpirun -np 4 /usr/local/build/bin/physl --dump-counters=py-csv.txt --dump-newick-tree=py-tree.txt --dump-dot=py-graph.txt --performance --print=result.py --hpx:run-hpx-main --hpx:thread=1 cannon.physl

Contents of cannon.physl

define$53$0(cannon$53$0, size$53$11, block(define$54$4(array1$54$4, random_d$54$13(list$54$22(size$54$23, size$54$29), find_here$54$36(), num_localities$54$49())), define$55$4(array2$55$4, random_d$55$13(list$55$22(size$55$23, size$55$29), find_here$55$36(), num_localities$55$49())), define$56$4(v1$56$4, cannon_product_d$56$9(array1$56$26, array2$56$34)), define$57$4(v2$57$4, dot_d$57$9(array1$57$15, array2$57$23)), all$58$11(__eq$58$15(v1$58$15, v2$58$21))))
cannon(120)

Generated from

def cannon(size):
    array1 = random_d([size, size], find_here(), num_localities())
    array2 = random_d([size, size], find_here(), num_localities())
    v1 = cannon_product_d(array1, array2)
    v2 = dot_d(array1, array2)
    return all(v1 == v2)

Output:

physl: exception caught:
the given component id does not belong to a local object: HPX(bad_parameter)
physl: exception caught:
the given component id does not belong to a local object: HPX(bad_parameter)
physl: exception caught:
the given component id does not belong to a local object: HPX(bad_parameter)
physl: exception caught:
the given component id does not belong to a local object: HPX(bad_parameter)
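For reference, the logic of the program above can be sketched locally, with NumPy standing in for the distributed primitives (this is a hypothetical stand-in, not Phylanx's implementation: `random_d` becomes `np.random.rand`, and both `cannon_product_d` and `dot_d` become the ordinary matrix product). Run on a single process, the two products should agree and the function should return True; the bug only manifests across localities.

```python
# Hypothetical single-process sketch of the PhySL program, using NumPy
# in place of Phylanx's distributed primitives. Both v1 and v2 are the
# same local matrix product, so the equality check must pass here.
import numpy as np

def cannon_local(size):
    array1 = np.random.rand(size, size)
    array2 = np.random.rand(size, size)
    v1 = array1 @ array2   # stand-in for cannon_product_d
    v2 = array1 @ array2   # stand-in for dot_d
    return bool(np.all(np.isclose(v1, v2)))
```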
NanmiaoWu (Contributor)

@stevenrbrandt Are you able to run it using srun instead of mpirun?

NanmiaoWu (Contributor) commented Oct 6, 2020

I tested with the latest stable HPX, blaze, blaze_tensor, and phylanx. I created a file named test_c.physl, which contains:

define$53$0(cannon$53$0, size$53$11, block(define$54$4(array1$54$4, random_d$54$13(list$54$22(size$54$23, size$54$29), find_here$54$36(), num_localities$54$49())), define$55$4(array2$55$4, random_d$55$13(list$55$22(size$55$23, size$55$29), find_here$55$36(), num_localities$55$49())), define$56$4(v1$56$4, cannon_product_d$56$9(array1$56$26, array2$56$34)), define$57$4(v2$57$4, dot_d$57$9(array1$57$15, array2$57$23)), all$58$11(__eq$58$15(v1$58$15, v2$58$21))))
cannon(120)

Tested on qbc; the error output is:

[nanmiao@qbc2 bin]$ srun -N 1 -n 4 /home/nanmiao/dev/src/phylanx/build/bin/physl --dump-counters=py-csv.txt --dump-newick-tree=py-tree.txt --dump-dot=py-graph.txt --performance --print=result.py --hpx:run-hpx-main --hpx:thread=1 /home/nanmiao/dev/src/phylanx/examples/algorithms/als/test_c.physl 
physl: exception caught:
test_c.physl(58, 15): __eq:: cannot broadcast a matrix into a differently sized matrix: HPX(bad_parameter)
physl: exception caught:
test_c.physl(58, 15): __eq:: cannot broadcast a matrix into a differently sized matrix: HPX(bad_parameter)
physl: exception caught:
test_c.physl(58, 15): __eq:: cannot broadcast a matrix into a differently sized matrix: HPX(bad_parameter)
physl: exception caught:
test_c.physl(58, 15): __eq:: cannot broadcast a matrix into a differently sized matrix: HPX(bad_parameter)
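For what it's worth, this message is consistent with an operand-shape mismatch at the elementwise `__eq`: the two sides of `v1 == v2` arrive with incompatible shapes. A minimal NumPy sketch of the same failure mode (the `(120, 60)` shape is purely illustrative, not the actual shape `dot_d` produced):

```python
# Illustrative only: two shapes that cannot be broadcast together, mirroring
# the "cannot broadcast a matrix into a differently sized matrix" error one
# would see if one operand of the equality came back wrongly sized.
import numpy as np

try:
    np.broadcast_shapes((120, 120), (120, 60))
except ValueError as err:
    print("broadcast failed:", err)
```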

stevenrbrandt (Member, Author)

@NanmiaoWu no, I'm running inside a singularity image.

hkaiser (Member) commented Oct 8, 2020

@stevenrbrandt, this could have been caused by a problem in HPX. Could you please try STEllAR-GROUP/hpx#5004 to see if this fixes your issue?

After applying the patch, I see the same error as @NanmiaoWu, caused by dot_d returning a badly-sized array. See #1284 for the corresponding ticket.
