The following sections provide details on installing Omnia by running omnia.yml from the CLI.
Clone the Omnia repository:
git clone https://github.com/dellhpc/omnia.git
Note: After the Omnia repository is cloned, a folder named omnia is created. Ensure that you do not rename this folder.
Change the directory to omnia:
cd omnia
In the omnia_config.yml file, provide the required details (see the parameter guide for more information).
Note: Without the login node, Slurm jobs can be scheduled only through the manager node.
**Note**:
* Omnia checks whether Red Hat subscription is enabled on Red Hat nodes as a prerequisite. Check out [how to enable Red Hat subscription here](/omnia/Installation_Guides/ENABLING_OMNIA_FEATURES.html#red-hat-subscription). If Red Hat subscription is not enabled on the manager node, `omnia.yml` will fail. If compute nodes do not have Red Hat subscription enabled, `omnia.yml` will skip those nodes entirely.
* Ensure that all four groups (login_node, manager, compute, nfs_node) are present in the template, even if the IP addresses under the login_node and nfs_node groups are not updated.
Note: Omnia creates a log file; its location differs by OS (Leap OS vs. CentOS, Rocky, and Red Hat).
To skip the installation of Kubernetes, enter:
ansible-playbook omnia.yml -i inventory --skip-tags "kubernetes"
To skip the installation of Slurm, enter:
ansible-playbook omnia.yml -i inventory --skip-tags "slurm"
Note: If only Slurm is being installed on the cluster, docker credentials are not required.
Warning: LMOD and LUA are installed with Slurm when running omnia.yml. If LMOD and LUA are required, do not use the Slurm skip tag.
To skip the NFS client setup, enter the following command to skip the k8s_nfs_client_setup role of Kubernetes:
ansible-playbook omnia.yml -i inventory --skip-tags "nfs_client"
The default path of the Ansible configuration file is `/etc/ansible/`. If the file is not present in the default path, then edit the `ansible_config_file_path` variable to update the configuration path.
- Supported values for Kubernetes CNI are calico and flannel. The default value of CNI considered by Omnia is calico.
- The default value of Kubernetes Pod Network CIDR is 10.244.0.0/16. If 10.244.0.0/16 is already in use within your network, select a different Pod Network CIDR. For more information, see https://docs.projectcalico.org/getting-started/kubernetes/quickstart.
- To view or edit the omnia_config.yml file, run one of the following commands:
  - ansible-vault view omnia_config.yml --vault-password-file .omnia_vault_key (to view the file)
  - ansible-vault edit omnia_config.yml --vault-password-file .omnia_vault_key (to edit the file)
- It is suggested that you use the ansible-vault view or edit commands rather than the ansible-vault decrypt or encrypt commands. If you have used the ansible-vault decrypt or encrypt commands, provide 644 permissions to the omnia_config.yml file.
- Omnia uses slurm as the default username for MariaDB.
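As a concrete illustration of the 644-permission step above, the following sketch applies the permission to a placeholder file (the real target is a decrypted omnia_config.yml; the placeholder is created here only so the sketch is self-contained):

```shell
# Sketch of the permission fix suggested above; the file created here is a
# placeholder standing in for a decrypted omnia_config.yml.
printf 'placeholder\n' > omnia_config.yml
chmod 644 omnia_config.yml          # owner read/write, group/other read-only
stat -c '%a' omnia_config.yml       # prints 644
```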
The following Kubernetes roles are provided by Omnia when the omnia.yml file is run:
Caution: If the target node is running Rocky, Nvidia drivers will only be installed if kernel package upgrades are available. If not, the installation is skipped with a warning message.
A directory, /home/k8snfs, is created; compute nodes use this directory to share common files.
omnia.yml and skip Slurm using --skip-tags "slurm".
The following Slurm roles are provided by Omnia when the omnia.yml file is run:
To enable the login node, the login_node_required variable must be set to “true” in the omnia_config.yml file.
Note: If LeapOS is being deployed, login_common and login_server roles will be skipped.
To skip the installation of:
- The login node: In the omnia_config.yml file, set the login_node_required variable to "false".
- The FreeIPA server and client: Use --skip-tags freeipa while executing the omnia.yml file.
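Based on the variable named above, an omnia_config.yml entry controlling the login node might look like the following sketch (only the login_node_required name and its true/false values are confirmed by this document):

```yaml
# omnia_config.yml (excerpt): enable or disable the login node
login_node_required: true   # set to false to skip the login node
```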
To install both JupyterHub and Kubeflow, run the JupyterHub playbook first, followed by the Kubeflow playbook.
Commands to install JupyterHub and Kubeflow:
ansible-playbook platforms/jupyterhub.yml -i inventory
ansible-playbook platforms/kubeflow.yml -i inventory
Note: When Internet connectivity is unstable or slow, pulling the images required to create the Kubeflow containers may take longer. If the time limit is exceeded, the Apply Kubeflow configurations task may fail. To resolve this issue, redeploy the Kubernetes cluster and reinstall Kubeflow by completing the following steps:
- Format the OS on manager and compute nodes.
- In the omnia_config.yml file, change the k8s_cni variable value from calico to flannel.
- Run the Kubernetes and Kubeflow playbooks.
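The CNI change in the steps above is a one-line edit; k8s_cni and its calico/flannel values are named in this document, so the omnia_config.yml excerpt would look roughly like:

```yaml
# omnia_config.yml (excerpt): switch the Kubernetes CNI for the redeployment
k8s_cni: "flannel"   # default is "calico"
```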
To add a new node to the cluster, update the INVENTORY file in the omnia directory with the new node's IP address under the compute group. Ensure that the other nodes already in the cluster are also listed in the compute group along with the new node. Then, run omnia.yml to add the new node to the cluster and update the configurations of the manager node.
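As an illustration of the inventory update described above, a hypothetical INVENTORY excerpt might look like the following (the group names come from this document; all IP addresses are made up):

```ini
# Hypothetical INVENTORY file; IP addresses are illustrative only.
[manager]
10.0.0.10

[compute]
10.0.0.11   # existing compute node
10.0.0.12   # existing compute node
10.0.0.13   # newly added node

[login_node]
10.0.0.14

[nfs_node]
10.0.0.15
```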
BeeGFS is a hardware-independent POSIX parallel file system (a.k.a. Software-defined Parallel Storage) developed with a strong focus on performance and designed for ease of use, simple installation, and management. BeeGFS is created on an Available Source development model (source code is publicly available), offering a self-supported Community Edition and a fully supported Enterprise Edition with additional features and functionalities. BeeGFS is designed for all performance-oriented environments including HPC, AI and Deep Learning, Media & Entertainment, Life Sciences, and Oil & Gas (to name a few).
Once all the prerequisites are met, run omnia.yml to set up BeeGFS.