
mOS for HPC v0.8 Readme


What is mOS for HPC

mOS for HPC is an operating systems research project at Intel, targeting extreme-scale HPC systems deploying converged workflows for modeling/simulation, data analytics, and AI. It aims to deliver a high-performance computing environment with the scalability, low noise, and repeatability expected of lightweight kernels (LWK), while maintaining the overall Linux compatibility that HPC plus AI/ML applications need.

mOS for HPC remains under development at this time. These materials are being made available to interested parties to explore, to test drive, and to provide feedback through the mailing list. Consider the quality level to be pre-alpha, provided as-is; it is not intended for production or business-critical use. Users are expected to have expert-level knowledge of Linux internals and operating system principles. Support is limited by the development team's ability to respond through the mailing list.

What's new for v0.8?

| Feature | Description | Learn More |
| --- | --- | --- |
| Linux 5.4.18 | mOS for HPC v0.8 is based on the long-term support kernel Linux 5.4.18 from kernel.org. From a compatibility perspective, this version has been integrated and tested on a system based on SLES 15 SP1 with OpenHPC. | |
| lwkctl auto | Automatic configuration of memory using the lwkctl command now defaults to balancing designations across memory domains of the same memory type. Designating the maximum possible memory from each node is still available through the keyword=value pair lwkmem=auto:max (see the sketch after this table). | man page for lwkctl |
| lwkctl precise | A new option, --precise, generates an LWK partition only if the exact memory requested can be satisfied. The previous default behavior, creating a partition with less memory when the request cannot be met, remains unchanged. | man page for lwkctl |
| Memory | Various bug fixes and improvements in the LWK memory manager. | git log |
| yod | Introduces additional resource reservation syntax that uses a file to map resources to specific MPI ranks (example after this table). | yod man page & git log |
| LWK behavior | All system calls are now handled locally by default. | |
| Test & validation | More unit tests. | |
| RAS | Improved and expanded RAS event system. | |
| Scheduler | Round-robin scheduling replaces FIFO as the default; round robin does not time slice when fewer than two processes are runnable. | |
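Putting the lwkctl and yod items above together, usage looks roughly like the following. This is a hedged sketch: the spec strings and the mpirun/yod invocation are illustrative, and the lwkctl and yod man pages are authoritative.

```sh
# Create an LWK partition with automatic configuration; in v0.8 the
# memory designations are balanced across domains of the same type.
# (Illustrative spec string -- see the lwkctl man page.)
sudo lwkctl -c "lwkcpus=auto lwkmem=auto"

# Designate the maximum possible memory from each node instead:
sudo lwkctl -c "lwkcpus=auto lwkmem=auto:max"

# With the new --precise option, creation fails outright if the exact
# memory request cannot be satisfied, instead of falling back to a
# smaller partition:
sudo lwkctl --precise -c "lwkcpus=auto lwkmem=auto"

# Launch an MPI job onto the LWK with yod; the new file-based per-rank
# resource mapping syntax is described in the yod man page.
mpirun -np 4 yod ./my_app
```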


Platform requirements

The development and testing for mOS for HPC v0.8 has been performed on systems with the Intel(R) Xeon(R) Scalable processor family and on systems with the Intel(R) Xeon Phi(TM) product family. As a result, mOS for HPC includes optimizations for technologies such as multi-socket CPUs, high core counts, Intel(R) Hyper-Threading Technology, and complex memory configurations with up to 8 NUMA domains (DDR + high bandwidth memory). Specific configurations include:

  • Intel(R) Xeon(R) Gold 6140 processors with 128 GiB of DDR4, Intel(R) HT Technology on, and booted without sub-NUMA clustering (SNC)
  • Intel(R) Xeon Phi(TM) processor 7250 with 96 GiB of DRAM and 16 GiB of MCDRAM, Intel(R) HT Technology on, and booted in sub-NUMA clustering 4 (SNC-4) mode with flat memory mode

Your mileage may vary on other platforms and configurations in terms of functionality and performance.
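To check how a given platform lines up with these configurations, standard Linux tools report the NUMA and CPU topology. A minimal sketch, assuming numactl and util-linux are installed:

```sh
# List NUMA nodes and their memory sizes; on Xeon Phi in flat or
# SNC-4 mode, MCDRAM shows up as separate CPU-less NUMA nodes.
numactl --hardware

# Report sockets, cores per socket, and threads per core
# (2 threads per core when Hyper-Threading is on).
lscpu
```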

Additional remarks:

  • If you use the Intel(R) Xeon Phi(TM) processor 7230, Quadrant cluster mode with flat memory mode is recommended.
  • If you want to make all of MCDRAM available to applications on Intel(R) Xeon Phi(TM) processors, verify that MCDRAM is hot-pluggable in the BIOS settings. Please see the Administrator's Guide.
  • The development team has observed lower performance of mOS for HPC when running in cache memory mode on Intel(R) Xeon Phi(TM) processors; this is not necessarily attributable to the hardware.
  • Processors outside the x86_64 architecture designation in Linux are unsupported; the kernel code will not configure and build for them.

The Linux distribution used by the development team for building, installing, and testing mOS for HPC has been SLES 15 SP1 with OpenHPC. There has also been limited testing with CentOS 7. Other distributions have had almost no testing and may require adapting the build and install instructions to your environment.

mOS for HPC development plans to track Intel(R) Parallel Studio XE 2020 Cluster Edition for Linux* and MPICH/MPICH4 updates as they become available. Almost no testing has been done using other compilers (e.g., gcc) or MPI runtimes (e.g., MVAPICH or OpenMPI).

Where to get code

The mOS for HPC source can be checked out from GitHub at https://github.com/intel/mOS. Please see the Administrator's Guide for further instructions.
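A minimal sketch of fetching the source and building the kernel, assuming a standard x86_64 kernel build flow; the Administrator's Guide is the authoritative reference for configuration and installation:

```sh
# Fetch the mOS for HPC source.
git clone https://github.com/intel/mOS.git
cd mOS

# mOS configures and builds only for x86_64 (see remarks above).
# Illustrative steps: start from the running kernel's configuration.
make olddefconfig
make -j"$(nproc)"
```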

Where to report issues or ask questions

Register for the mOS for HPC mailing list at https://groups.google.com/g/mos-devel/. Please submit feedback and follow discussions through this list.


*Other names and brands may be claimed as the property of others.