Omnia

Ansible playbook-based tools for deploying Slurm and Kubernetes clusters for High Performance Computing, Machine Learning, Deep Learning, and High-Performance Data Analytics

Install Omnia using CLI

The following sections describe how to install Omnia using the CLI. To install the Omnia appliance and use it to manage workloads, see Install the Omnia appliance and Monitor Kubernetes and Slurm.

Prerequisites

Install Omnia using CLI

  1. Clone the Omnia repository:
    git clone -b release https://github.com/dellhpc/omnia.git 
    

    Note: After the Omnia repository is cloned, a folder named omnia is created. Ensure that you do not rename this folder.

  2. Change the directory to omnia: cd omnia

  3. Create an inventory file in the omnia folder. Add the compute node IPs under the [compute] group and the manager node IP under the [manager] group. See the INVENTORY template file in the omnia/docs folder.

  4. To install Omnia:
    ansible-playbook omnia.yml -i inventory -e "ansible_python_interpreter=/usr/bin/python2" 
    
  5. By default, no skip tags are selected, and both Kubernetes and Slurm will be deployed.

To skip the installation of Kubernetes, enter:
ansible-playbook omnia.yml -i inventory -e "ansible_python_interpreter=/usr/bin/python2" --skip-tags "kubernetes"

To skip the installation of Slurm, enter:
ansible-playbook omnia.yml -i inventory -e "ansible_python_interpreter=/usr/bin/python2" --skip-tags "slurm"

To skip the NFS client setup (the k8s_nfs_client_setup role of Kubernetes), enter:
ansible-playbook omnia.yml -i inventory -e "ansible_python_interpreter=/usr/bin/python2" --skip-tags "nfs_client"
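
The skip-tag variants above differ only in the value passed to --skip-tags, so they can be generated from one small helper. The following is a minimal sketch, not part of Omnia: build_omnia_cmd is a hypothetical function name, and it only prints the command rather than running it. Ansible accepts a comma-separated list for --skip-tags; whether every combination of the tags above leaves Omnia in a useful state is an assumption you should verify for your cluster.

```shell
# Hypothetical helper (not shipped with Omnia): prints the omnia.yml command
# for a given set of skip tags so it can be reviewed before running.
build_omnia_cmd() {
  skip_tags="$1"   # "", "kubernetes", "slurm", "nfs_client", or a comma-separated list
  cmd='ansible-playbook omnia.yml -i inventory -e "ansible_python_interpreter=/usr/bin/python2"'
  if [ -n "$skip_tags" ]; then
    # Append the skip list only when one was requested.
    cmd="$cmd --skip-tags \"$skip_tags\""
  fi
  printf '%s\n' "$cmd"
}

build_omnia_cmd ""                  # default: deploy both Kubernetes and Slurm
build_omnia_cmd "kubernetes"        # skip Kubernetes
build_omnia_cmd "slurm,nfs_client"  # assumed combination: skip Slurm and the NFS client setup
```

Printing the command first makes it easy to confirm the tags before starting a long deployment; copy the printed line into the terminal to run it.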

  1. To provide the password for the MariaDB database (for Slurm accounting), the Kubernetes Pod Network CIDR, and the Kubernetes CNI, edit the omnia_config.yml file.
    Note:
    • Supported values for Kubernetes CNI are calico and flannel. The default is calico.
    • The default value of Kubernetes Pod Network CIDR is 10.244.0.0/16. If 10.244.0.0/16 is already in use within your network, select a different Pod Network CIDR. For more information, see https://docs.projectcalico.org/getting-started/kubernetes/quickstart.

To view the passwords set in omnia_config.yml at a later time, enter:
ansible-vault view omnia_config.yml --vault-password-file .omnia_vault_key

Omnia uses slurm as the default username for MariaDB.
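
The inventory file described in step 3 can be sketched as follows. This is an illustrative example only: the IP addresses are hypothetical placeholders, and the layout assumes the two group names mentioned in step 3, [manager] and [compute], with one address per line. Compare the result with the INVENTORY template in the docs folder before using it.

```shell
# Hypothetical example addresses; replace them with the real IPs of your nodes.
manager_ip="10.0.0.10"
compute_ips="10.0.0.11 10.0.0.12"

# Write the inventory file in the current directory (run this from the omnia folder).
# Caution: this overwrites any existing file named inventory.
{
  echo "[manager]"
  echo "$manager_ip"
  echo
  echo "[compute]"
  for ip in $compute_ips; do
    echo "$ip"
  done
} > inventory

# Show the result for review.
cat inventory
```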

Kubernetes roles

The following Kubernetes roles are provided by Omnia when the omnia.yml file is run:

Note:

Slurm roles

The following Slurm roles are provided by Omnia when the omnia.yml file is run:

Note: To install both the JupyterHub and Kubeflow playbooks, install the JupyterHub playbook first and then the Kubeflow playbook.

Commands to install JupyterHub and Kubeflow:

Note: When Internet connectivity is unstable or slow, pulling the images needed to create the Kubeflow containers may take longer. If the time limit is exceeded, the Apply Kubeflow configurations task may fail. To resolve this issue, redeploy the Kubernetes cluster and reinstall Kubeflow by completing the following steps:

Add a new compute node to the cluster

Update the INVENTORY file in the omnia directory by adding the new node's IP address under the compute group. Ensure that the nodes that are already part of the cluster are also listed in the compute group along with the new node. Then, run omnia.yml to add the new node to the cluster and update the configuration of the manager node.
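
The inventory update described above can be sketched as a small shell function. This is a hedged illustration, not an Omnia tool: add_compute_node is a hypothetical name, the addresses are placeholders, and the sketch assumes the file contains a line reading exactly [compute]. It is demonstrated on a temporary copy so that no real inventory is modified.

```shell
# Hypothetical helper (not part of Omnia): adds an IP address under the
# [compute] group of an inventory file, keeping the existing entries.
add_compute_node() {
  inventory_file="$1"
  new_ip="$2"
  # Do nothing if the address is already listed anywhere in the file.
  if grep -qxF "$new_ip" "$inventory_file"; then
    return 0
  fi
  # Print every line unchanged and emit the new IP right after the [compute] header.
  awk -v ip="$new_ip" '{ print } $0 == "[compute]" { print ip }' \
    "$inventory_file" > "$inventory_file.tmp" &&
    mv "$inventory_file.tmp" "$inventory_file"
}

# Demonstration on a throwaway file with hypothetical addresses.
demo_inventory="$(mktemp)"
printf '[manager]\n10.0.0.10\n\n[compute]\n10.0.0.11\n' > "$demo_inventory"
add_compute_node "$demo_inventory" "10.0.0.12"
cat "$demo_inventory"

# After updating the real inventory, re-run the install command from step 4:
#   ansible-playbook omnia.yml -i inventory -e "ansible_python_interpreter=/usr/bin/python2"
```

Because the function returns early when the address is already present, running it twice with the same IP leaves the file unchanged.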