cern-like-cluster

Physicists who work on the Large Hadron Collider (LHC) at CERN tend to do most of their work over ssh on remote machines, which provide access to:

  1. a consistent environment;
  2. common software;
  3. common filesystems such as eos (Exabyte storage);
  4. and batch computing systems like the Worldwide LHC Computing Grid (1.4M cores).

The machines that we remote into are usually just one node in a cluster of machines that play different roles, and this repository aims to replicate a minimal example of such a cluster. This minimal example contains three machines: server, lx01, and exec01. These machines all run the same operating system (Rocky Linux 9.6) and the useful packages you might expect to find, e.g. gcc.

The server machine hosts an NFS share for the /home directory, which the other machines mount so that /home is identical across machines. The server machine also runs a local squid proxy for cvmfs, which essentially means it acts as a cvmfs cache for the cluster.
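As a rough sketch of what the Ansible side of this looks like (the subnet, group names, and export options here are assumptions, not necessarily what this repository uses):

```yaml
# Sketch: export /home from server and mount it on the clients.
- hosts: server
  become: true
  tasks:
    - name: Export /home to the cluster subnet (subnet is a placeholder)
      ansible.builtin.lineinfile:
        path: /etc/exports
        line: "/home 192.168.1.0/24(rw,sync,no_root_squash)"

    - name: Enable and start the NFS server
      ansible.builtin.systemd:
        name: nfs-server
        state: started
        enabled: true

- hosts: clients
  become: true
  tasks:
    - name: Mount the shared /home from server
      ansible.posix.mount:
        src: "server:/home"
        path: /home
        fstype: nfs
        state: mounted
```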

The clients, lx01 and exec01, have cvmfs mounted and configured to use server as a proxy. They also have eos mounted.
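On the client side, pointing cvmfs at the proxy comes down to a small configuration file, /etc/cvmfs/default.local. A minimal sketch, written as an Ansible task (3128 is squid's default port, and the repository list is just an example):

```yaml
- hosts: clients
  become: true
  tasks:
    - name: Configure cvmfs to use the squid proxy on server
      ansible.builtin.copy:
        dest: /etc/cvmfs/default.local
        content: |
          # Example repositories; server:3128 assumes squid's default port
          CVMFS_REPOSITORIES=cms.cern.ch,sft.cern.ch
          CVMFS_HTTP_PROXY="http://server:3128"
```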

Finally, the machines have htcondor installed for batch computing, with each machine playing a different role. In this setup, the intention is that a user logs in to lx01 (an interactive machine) and does their work there, and if they wish to submit some tasks/jobs to a pool of machines, they can. In this case, that "pool" is just one machine: exec01. In htcondor language, this means that lx01 is configured as an "access point", exec01 is configured as an "execution point", and server is set up as a "central manager", which negotiates between resources and resource requests, i.e. matches jobs submitted from access points with available execution points.
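HTCondor ships configuration templates for exactly this split of roles, so each machine only needs a short file in /etc/condor/config.d. A sketch as Ansible plays (the file name is my assumption; the ROLE templates and CONDOR_HOST knob are standard HTCondor configuration):

```yaml
- hosts: server
  become: true
  tasks:
    - name: Configure server as the central manager
      ansible.builtin.copy:
        dest: /etc/condor/config.d/01-role.conf
        content: |
          CONDOR_HOST = server
          use ROLE: CentralManager

- hosts: lx01
  become: true
  tasks:
    - name: Configure lx01 as an access point
      ansible.builtin.copy:
        dest: /etc/condor/config.d/01-role.conf
        content: |
          CONDOR_HOST = server
          use ROLE: Submit

- hosts: exec01
  become: true
  tasks:
    - name: Configure exec01 as an execution point
      ansible.builtin.copy:
        dest: /etc/condor/config.d/01-role.conf
        content: |
          CONDOR_HOST = server
          use ROLE: Execute
```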

The configuration of these machines is handled automatically with Ansible. A "playbook" is written which defines the configuration tasks, and this is paired with an "inventory" which defines the machines in the cluster. A playbook contains tasks like installing a particular package with dnf or mounting a drive. An inventory provides the IP addresses of the machines, along with other details such as which user Ansible should use when it sshes into them.
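For concreteness, here is a minimal sketch of such a pair (the IP addresses, group names, and ssh user are placeholders):

```yaml
# playbook.yml: install gcc on every machine in the cluster group
- hosts: cluster
  become: true
  tasks:
    - name: Install gcc with dnf
      ansible.builtin.dnf:
        name: gcc
        state: present
```

```yaml
# inventory.yml: the machines and how Ansible should reach them
cluster:
  hosts:
    server:
      ansible_host: 192.168.1.10
  children:
    clients:
      hosts:
        lx01:
          ansible_host: 192.168.1.11
        exec01:
          ansible_host: 192.168.1.12
  vars:
    ansible_user: rocky
```

The pair would then be run with something like `ansible-playbook -i inventory.yml playbook.yml`.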

Lastly, there is the issue of where the machines come from in the first place. For a real-world application, one would want to buy physical machines and also expand the minimal example described above, i.e. have many exec machines and possibly more lx machines. For my experimentation, I have instead chosen to use virtual machines, which I create on my old Dell XPS laptop using Proxmox VE.

Proxmox VE, in my basic understanding, is an operating system that provides native support for virtual machines and makes everything work nicely. Virtual machines can be managed via a web interface or on the command line, and I use a mixture of both.
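Since Ansible is already in play, VM creation can in principle be scripted too. A hedged sketch using the community.general.proxmox_kvm module (the node name, credentials, and hardware sizes are placeholders, and this is not necessarily how this repository creates its VMs):

```yaml
- hosts: localhost
  tasks:
    - name: Create the lx01 virtual machine (placeholder credentials and sizes)
      community.general.proxmox_kvm:
        api_host: proxmox.local
        api_user: root@pam
        api_password: "{{ proxmox_password }}"
        node: pve
        name: lx01
        cores: 2
        memory: 4096
        net:
          net0: "virtio,bridge=vmbr0"
        state: present
```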

Further details and instructions for creating the virtual machines and configuring them can be found (once added) in the proxmox and ansible directories of this repository.
