WeeklyTelcon_20181117
Geoffrey Paulsen edited this page Jan 15, 2019 · 1 revision
      
    - Dialup Info: (Do not post to public mailing list or public wiki)
 
- Geoff Paulsen
 - Jeff Squyres
 - Akshay Venkatesh (nVidia)
 - Brian Barrett
 - Dan Topa (LANL)
 - David Bernholdt
 - Geoffroy Vallee
 - Howard Pritchard
 - Joshua Ladd
 - Josh Hursey
 - Matias Cabral
 - Matthew Dosanjh
 - Nathan Hjelm
 - Ralph Castain
 - Thomas Naughton
 - Todd Kordenbrock
 - Xin Zhao
 
- Edgar Gabriel
 - Aravind Gopalakrishnan (Intel)
 - Arm (UTK)
 - George
 - Peter Gottesman (Cisco)
 - mohan
 
- 
Vader Issue 6014 - Acquire thread fence anywhere we're using the sync builtin primitives.
- Symptom is hang.
 - Reproducer is tight loop on barrier
 - Need a FAQ entry on this.
 - Workaround is --disable-builtin-atomics; no real downside from doing this.
 
 - 3.x, 4.x, maybe 2.1.x?
 - master wasn't affected because it uses C11 by default.
 - Wherever this goes back to, we need to update PMIx there too, since it's the same code.
 - End of the day, this will drive a new release on v2.x, v3.0 and 3.1
 
 - 
Face to Face was next week
 - 
Summary of PMIx re-architecting for v5.0
 - 
Lots of TCP wire-up discussion
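
The fix direction noted for Vader Issue 6014 above (an acquire thread fence wherever the sync builtin primitives are used) can be illustrated with a minimal sketch. It assumes the GCC/Clang `__sync` and `__atomic` builtins; the names `lock_word`, `shared_counter`, `try_lock`, `unlock`, and `locked_increment` are hypothetical, and this is not the actual Open MPI or PMIx patch.

```c
/* Illustration only: hypothetical names, not Open MPI code. */
static int lock_word = 0;      /* 0 = free, 1 = held */
static int shared_counter = 0; /* data protected by lock_word */

/* Try to take the lock with a legacy __sync builtin, then issue an
 * explicit acquire fence so that later loads and stores are not
 * reordered before the successful compare-and-swap. */
static int try_lock(int *lock)
{
    if (__sync_bool_compare_and_swap(lock, 0, 1)) {
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
        return 1; /* lock taken */
    }
    return 0;     /* lock was already held */
}

/* Release the lock: the release store makes writes done inside the
 * critical section visible before the lock word reads as free. */
static void unlock(int *lock)
{
    __atomic_store_n(lock, 0, __ATOMIC_RELEASE);
}

/* Spin until the lock is taken, bump the counter, release. */
static int locked_increment(void)
{
    while (!try_lock(&lock_word)) {
        /* busy-wait */
    }
    int value = ++shared_counter;
    unlock(&lock_word);
    return value;
}
```

Calling `locked_increment()` from several threads would exercise the same acquire-after-CAS pattern; the sketch only shows where the fence sits relative to the builtin.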
 
Review All Open Blockers
- Schedule:
 - Driver: Assembly and locking fix, vader and pmix, etc.
 - Jeff will roll an RC end of this week.
 - May get ready to finish it, but not release it until January (since we're all going away).
 
Review v3.0.x Milestones v3.0.3
- Schedule:
 - v3.0.4 scheduled for May 2019
- PMIx 2.2 will be available next week
 
 
Review v3.1.x Milestones v3.1.0
- Schedule:
 - v3.1.4 scheduled for April 2019
- New PMIx available next week
 
 
Review v4.0.x Milestones v4.0.1
- Schedule: Need a quick turn around for a v4.0.1
 - v4.0.0 - a few major issues:
- mpi.h is correct, but the library is not building the removed and deprecated functions because they're missing from the Makefile.
 - Two issues hit via Spack packaging:
- Root cause may be: make -j creates TOO many threads of parallel execution on some OSes.
 - Max filename restriction on Fortran header files.
- PR 6121 (master) - should resolve this on v4.0.x
 
 
 
 - Discuss pulling PR 6110 into v4.0.1
- Bug, some OSHMEM APIs missed in v4.0.0
 - Jeff pulled up slides showing that we can ADD APIs in minor versions.
- Executables built against an older version must be able to run with a newer one.
 - We need to verify if the patch breaks anything with older built executables.
 
 - Because this PR is just adding functions, it should be okay.
 - Mellanox volunteered to test: build an executable with the old OMPI and run it with the newer OMPI.
 - If that test passes, everyone is okay with pulling this in.
 
 - UCX priority PR - expecting a PR from master
 - Matias Cabral: local procs with OFI MTL - the master PR is okay; it will be coming back to v4.0.x (6106)
 - Two rankfile mapper issues reported on mailing list. Howard will file issue.
 
- Libtool issue came up before or during supercomputing.
- this goes back to v3.0 or v3.1 (can't remember what user was actually using).
 - We made a backwards incompatible change to opal (not part of our ABI)
 - When we bumped the version numbers in libtool, we bumped them so that you couldn't use an old libopal with a new libmpi, on the basis that apps should only link against libmpi, so it shouldn't matter.
 - We had a user complaining that it was failing due to link errors. After a bit of investigation, his app was linking against the HDF5 library, which is linked using libtool, which does a secondary inspection and links against libopal.
- Not really an HDF5 bug, it's a libtool issue.
 - Literally nothing we can do for v3.0.3 (or nothing we can do for v3.0.x)
 - Probably want to figure out what to do here.
 
 - Option 1: Stop installing libtool .la files.
- Would actually be "gross"; have to talk to package managers, who have strong feelings.
 
 - Option 2: Start treating those libraries as part of our ABI guarantee.
 - Option 3: Someone's flavor of libtool has a patch so that it doesn't include the dependent library in the .la files.
- Jeff and Brian will look at the patch, and inquire upstream with libtool
 - 2015 was last time libtool had an active release.
- Don't know if there's much active libtool development anyway.
 
 - Need to feel out the libtool community about this.
 
 
 - Lots of golden balls on PRs due to Amazon AWS / Jenkins
- Very busy time for Amazon this week
 - Looks like the problem is in Jenkins (deadlocking on itself); the web interface is still up, but none of the instances spin down, etc. Need to go find the Jenkins bug report and see if they've made progress.
 - Need someone with root on Jenkins to look at it.
 - Update: Jenkins server is just dead.
 
 - What do we do about all of these Master PRs?
- We don't have a release off of master soon.
 - New PRs won't go yellow-ball because they don't spawn EC2 tasks (theory)
 - Will still run libfabric and some other tests.
 
 
- Releasing a new version at end of week or next week.
 
- Cisco has a one-sided info check that failed a hundred times.
- Cisco install fail looks like a legit compile fail (ipv6 master)
 
 
- We have a new ibm-ompi Slack channel for Open MPI developers.
- Not for users, just developers...
 - Email Jeff if you're interested in being added.
 
 
Review Master Pull Requests
- didn't discuss today.
 
Review Master MTT testing
- Mellanox, Sandia, Intel
 - LANL, Houston, IBM, Fujitsu
 - Amazon
 - Cisco, ORNL, UTK, NVIDIA