mOS for HPC v1.0 Readme
What is mOS for HPC

mOS for HPC is an operating systems research project at Intel, targeting extreme scale HPC systems deploying converged workflows for modeling / simulation, data analytics, and AI. It aims to deliver a high performance computing environment with the scalability, low noise, and repeatability expected of lightweight kernels (LWK), while maintaining the overall Linux compatibility that HPC plus AI/ML applications need.
mOS for HPC continues to be under development. These materials are being made available to interested parties to explore, to test drive, and to provide feedback through the mailing list. The ability to compile, install, and boot a Linux kernel is required to get mOS up and running. A good understanding of what an OS does and how it interacts with the underlying hardware is needed to configure the LWK partition and get the most out of it. Support is limited by the development team's ability to respond through the mailing list.
What's new for v1.0?

Feature | Description |
---|---|
SLES15 SP4 Linux 5.14 base | mOS for HPC v1.0 is based on the SLES15 SP4 kernel from openSUSE (Linux 5.14). From a compatibility perspective, this version has been integrated and tested on a system based on SLES 15 SP3 with OpenHPC and MPICH. |
GPU resource management | Designate, reserve, and allocate GPU resources similarly to CPU and memory resources (dependency on the 1Source LevelZero package). New yod options and new mOS sysfs files manage GPU designation, usage counts, and NUMA placement. |
Bug fixes | Assorted bug fixes. |
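mOS publishes its LWK partition state through sysfs, and v1.0 extends that interface to GPUs. As a rough illustration, the sketch below reads a few such files from Python. The GPU file name is a placeholder assumption, not the released interface; the actual names of the new sysfs files, and the new yod options, are documented in the Administrator's Guide.

```python
# Minimal sketch (not the authoritative interface): peek at the mOS LWK
# partition via sysfs. /sys/kernel/mOS is where mOS exposes LWK state;
# "lwkgpus" below is a HYPOTHETICAL name for one of the new v1.0 GPU
# designation files -- check the Administrator's Guide for real names.
from pathlib import Path

MOS_SYSFS = Path("/sys/kernel/mOS")

def read_mos(name):
    """Return the stripped contents of an mOS sysfs file, or None if absent."""
    f = MOS_SYSFS / name
    return f.read_text().strip() if f.exists() else None

if __name__ == "__main__":
    print("LWK CPUs:  ", read_mos("lwkcpus"))   # designated LWK CPUs
    print("LWK memory:", read_mos("lwkmem"))    # designated LWK memory
    print("LWK GPUs:  ", read_mos("lwkgpus"))   # hypothetical GPU file
```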
Platform requirements

Development and testing for mOS for HPC v1.0 have been performed on multi-socket systems with the Intel(R) Xeon(R) Scalable processor family, including 4th generation Intel(R) Xeon(R) Scalable (Sapphire Rapids) processors. As a result, mOS for HPC includes optimizations for technologies such as multi-socket CPUs, high core counts, Intel(R) Hyper-Threading Technology, and complex memory configurations with up to 16 NUMA domains (DDR + high bandwidth memory); a sketch for inspecting the NUMA topology a node exposes follows the list below. Specific configurations include:
- 2x Intel(R) Xeon(R) Sapphire Rapids processors, 1TiB of DRAM and 128GiB of HBM, Intel(R) HT Technology on, booted in SNC4 and non-SNC sub-NUMA clustering modes with flat and cache HBM memory modes
- Intel(R) Xeon(R) Platinum 8168 processors with 192GiB of DDR4, Intel(R) HT Technology on, and booted without sub-numa clustering (SNC)
- Intel(R) Xeon(R) Gold 6140 processors with 128GiB of DDR4, Intel(R) HT Technology on, and booted without sub-numa clustering (SNC)
- Intel(R) Xeon Phi(TM) processor 7230
Your mileage may vary on other platforms and configurations in terms of functionality and performance.
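Since LWK memory designation is NUMA-aware, it can be useful to confirm how many NUMA domains the firmware actually exposes (SNC and HBM memory modes change the count) before configuring the LWK partition. This sketch uses only standard Linux sysfs paths, nothing mOS-specific:

```python
# Count NUMA nodes and their memory sizes using standard Linux sysfs.
# Useful before designating LWK memory: SNC4 plus flat HBM on a
# 2-socket Sapphire Rapids node can expose up to 16 NUMA domains.
import re
from pathlib import Path

NODE_DIR = Path("/sys/devices/system/node")

for node in sorted(NODE_DIR.glob("node[0-9]*")):
    meminfo = (node / "meminfo").read_text()
    # Each meminfo starts with a line like: "Node 0 MemTotal: 263921768 kB"
    m = re.search(r"MemTotal:\s+(\d+)\s+kB", meminfo)
    total_kb = int(m.group(1)) if m else 0
    cpulist = (node / "cpulist").read_text().strip()
    print(f"{node.name}: {total_kb / 1024**2:.1f} GiB, "
          f"CPUs: {cpulist or '(memory-only node)'}")
```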
Additional remarks:
- If you use the Intel(R) Xeon Phi(TM) processor 7230, Quadrant cluster mode with Flat memory mode is recommended.
- If you want to make all of MCDRAM available to applications on Intel(R) Xeon Phi(TM) processors, you must verify that MCDRAM is hot-pluggable in the BIOS settings. Please see the Administrator's Guide.
- Processors outside of the x86_64 architecture designation in Linux are unsupported; the kernel code will not configure or build.
The Linux distributions used by the development team for building, installing, and testing mOS for HPC have been SLES 15 SP4 and SLES 15 SP3 with OpenHPC and MPICH. There has been limited testing with Intel MPI. Other distributions have had almost no testing and may require adapting the build and install instructions to your environment.
mOS for HPC development plans to track Intel(R) oneAPI toolkits, Intel(R) Parallel Studio XE 2021 Cluster Edition for Linux*, and MPICH/MPICH4 updates as they become available. Almost no testing has been done using other compilers (e.g. gcc) or MPI runtimes (e.g. MVAPICH or OpenMPI).
Where to get code

The mOS for HPC source can be checked out from GitHub at https://github.com/intel/mOS. Please see the Administrator's Guide for further instructions.
Where to report issues or ask questions

Register for the mOS for HPC mailing list at https://groups.google.com/g/mos-devel/. Please submit feedback and follow discussions through this list.