WeeklyTelcon_20170606
Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision
      
    - Dialup Info: (Do not post to public mailing list or public wiki)
 
- Geoff Paulsen
 - Edgar Gabriel
 - Artem Polyakov
 - Jeff Squyres (Cisco)
 - Howard Pritchard
 - Josh Hursey
 - Joshua Ladd
 - Mohan
 - Todd Kordenbrock
 - David Bernholdt
 - Nathan Hjelm
 - Ralph
 - Brian Barrett (Amazon)
 - Geoffroy Vallee
 - Mark Allen (IBM)
 - Sylvain Jeaugey
 - Thomas Naughton
 
Review All Open Blockers
- Discuss Progression Issue 3616
- openib progression issue.
 - Nathan will try to look at this during the week.
 - Not sure where we might block in the callbacks.
 - George is reworking the progression model; because a new model is coming, we just need an ugly solution for now.
 - Hit this in a non-contiguous one-sided put down in openib via osc_rdma. The accumulate wants to trigger another callback. And then a barrier to get the timing right.
 - Nathan thinks he can make the unlock non-blocking in the accumulate lock.
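
The idea of making the unlock non-blocking can be illustrated with a minimal sketch. This is not Open MPI code: `ProgressEngine`, `AccumulateLock`, and `unlock_nonblocking` are hypothetical names invented for illustration, and the real openib/osc_rdma logic is far more involved. The sketch only shows the general pattern, assuming a single-threaded progress loop: a callback that would otherwise wait inside the progress loop instead queues its completion step and returns.

```python
from collections import deque


class ProgressEngine:
    """Toy single-threaded progress loop (hypothetical, not Open MPI code)."""

    def __init__(self):
        self.pending = deque()   # callbacks waiting to run
        self.in_progress = False

    def post(self, callback):
        self.pending.append(callback)

    def progress(self):
        # Refuse to recurse: a callback that re-entered progress() and
        # waited there would wait on work only this loop can complete.
        if self.in_progress:
            return 0
        self.in_progress = True
        completed = 0
        try:
            while self.pending:
                callback = self.pending.popleft()
                callback()
                completed += 1
        finally:
            self.in_progress = False
        return completed


class AccumulateLock:
    """Hypothetical lock whose release completes asynchronously."""

    def __init__(self, engine):
        self.engine = engine
        self.held = False

    def acquire(self):
        self.held = True

    def unlock_nonblocking(self, on_released=None):
        # Queue the release step and return immediately instead of
        # spinning until the release has completed.
        def finish():
            self.held = False
            if on_released is not None:
                on_released()

        self.engine.post(finish)


def run_demo():
    engine = ProgressEngine()
    lock = AccumulateLock(engine)
    events = []

    def accumulate_callback():
        lock.acquire()
        events.append("accumulate")
        lock.unlock_nonblocking(lambda: events.append("released"))
        events.append("callback returned")

    engine.post(accumulate_callback)
    engine.progress()
    return events, lock.held


if __name__ == "__main__":
    print(run_demo())
```

In this toy model the callback returns before the lock is released, and the release is completed later by the same progress loop, so nothing ever blocks inside a callback.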
 
 
- released June 1st.
 - No driver for a v2.0.4 at this time.
 
- v2.1.1 went out in May
 - No driver for v2.1.2 at this time.
 
Review Milestones v3.0
- Planning to do v3.0 RC today, but lots of failures in nightly MTTs.
- Cisco killed a bunch of runs and will kick them off again.
 
 - Datatype and Info Key type errors out of IBM tests.
 - Amazon false positives because they're direct launching, but don't support dynamic processes in direct launch.
 - Howard sent out a request for NEWS updates.
 - One additional PMIx issue.
- orte, opal and PMIx, threading issue from IBM.
 - Some confusion over whether we have assembly backwards in PMIx 2.0 (Nathan or George)
 - Nathan can take a look when he gets into office today.
 - only seen evidence in PMIx.
 - Issue is in PMIx: https://github.com/pmix/pmix/issues/347
 
 - Ralph will sync up with Brian and Howard at end of day to hear the status of the issue for the v3.0 RC.
 
Review Master Pull Requests
Review Master MTT testing
- Still seeing some 'make check' errors; others have been fixed.
- IBM still seeing a hang in 'make check' - must be ppc64le specific. No timeout.
 
 - 32bit compiler stuff fixed in pmix fix.
 - Geoffroy Vallee - still seeing some problems disabling make check.
 - MPI_Sendrecv_replace - got fixed.
 - Timeouts are all CUDA related - nvidia.
- still there.
 
 - Issue: Red Hat stock autoconf (rather than build our own)
 - Need a maintainer for rankfile mapper.
- IBM will take up maintaining rankfile mapper from Ralph.
 
 
- Intel making lots of progress. Nice features, but not sure how to make the transition.
 - .ini files would need to be transitioned across because Python doesn't support funclets.
 - Does everyone have to transition the same day, or can the transition be one by one?
- Yes, everyone can transition in their own time.
 
 
- Face2Face Meeting-2017-07
- Date: July 11-13 (9am Tuesday to noon on Thursday).
 - Cisco has booked space in Chicago.
- Cisco has reserved some space right next to O'Hare (can get shuttle to hotel).
- we have met there before.
 
 - Jeff will come in Monday evening.
 
 
 
- Amazon - bringing much more testing online, and CI processes.
- v3.0.0 Release work
 - Improved Jenkins infrastructure. Hopefully some changes yesterday (in Jenkins setup at Amazon) will make it run a little faster.
 
 - Travis is now officially deactivated. No longer using Travis.
 
- Amazon
 - Cisco, ORNL, UTK, NVIDIA
 - Mellanox, Sandia, Intel
 - LANL, Houston, IBM, Fujitsu