Ansible playbook-based tools for deploying Slurm and Kubernetes clusters for High Performance Computing, Machine Learning, Deep Learning, and High-Performance Data Analytics

This project is maintained by dellhpc

Custom ISO provisioning on Dell EMC PowerEdge Servers

Configuring Servers with Out-of-Band Management (Provision Method: iDRAC)

Generating a Custom ISO

Run idrac_template via CLI

  1. Verify that /opt/omnia/idrac_inventory is created and updated with all iDRAC IP details. This is done automatically when control_plane.yml is run. If it’s not updated, run ansible-playbook collect_device_info.yml from the control_plane directory.
  2. Run ansible-playbook idrac.yml -i /opt/omnia/idrac_inventory
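
The two CLI steps above can be combined into a small wrapper. The following is a minimal sketch, not part of Omnia: the function names and the INVENTORY variable are hypothetical, and a non-empty inventory file is only an assumed stand-in for "created and updated". It also assumes both playbooks are reachable from the current directory.

```shell
# Sketch only. The default path comes from step 1; INVENTORY is a
# hypothetical override that is handy for dry runs.
INVENTORY="${INVENTORY:-/opt/omnia/idrac_inventory}"

# Succeeds when the inventory file exists and is not empty.
# Assumption: a non-empty file means the iDRAC IPs were collected.
inventory_ready() {
    [ -s "$1" ]
}

# Refreshes the inventory if it is missing or empty, then runs idrac.yml.
run_idrac_provisioning() {
    if ! inventory_ready "$INVENTORY"; then
        ansible-playbook collect_device_info.yml || return 1
    fi
    ansible-playbook idrac.yml -i "$INVENTORY"
}
```

Calling run_idrac_provisioning from the control_plane directory performs both steps; nothing runs until the function is called.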

Run idrac_template on the AWX UI

  1. Run kubectl get svc -n awx.
  2. Copy the Cluster-IP address of the awx-ui.
  3. To retrieve the AWX UI password, run kubectl get secret awx-admin-password -n awx -o jsonpath="{.data.password}" | base64 --decode.
  4. Open the default web browser on the control plane and enter http://<IP>:8052, where <IP> is the awx-ui IP address and 8052 is the awx-ui port number. Log in to the AWX UI with the username admin and the retrieved password.
  5. Under RESOURCES -> Templates, launch the idrac_template.
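
Steps 1 to 4 can also be scripted. This is a hedged sketch: the function names are hypothetical, and it assumes the service is named exactly awx-ui in the awx namespace, as the steps above suggest. The kubectl commands are the ones from the steps, with a jsonpath filter added to extract the Cluster-IP.

```shell
# Prints the Cluster-IP of the awx-ui service (steps 1 and 2).
# Assumption: the service name is exactly "awx-ui".
awx_ui_ip() {
    kubectl get svc awx-ui -n awx -o jsonpath='{.spec.clusterIP}'
}

# Prints the decoded AWX admin password (step 3).
awx_admin_password() {
    kubectl get secret awx-admin-password -n awx \
        -o jsonpath='{.data.password}' | base64 --decode
}

# Builds the AWX UI address from an IP address; 8052 is the awx-ui port (step 4).
awx_ui_url() {
    printf 'http://%s:8052\n' "$1"
}
```

On the control plane, awx_ui_url "$(awx_ui_ip)" prints the address to open in the browser, and awx_admin_password prints the password for the admin user.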

Omnia role used to provision custom ISO on PowerEdge Servers using iDRAC: provision_idrac

For the idrac.yml file to successfully provision the custom ISO on the PowerEdge Servers, ensure that the following prerequisites are met:

The provision_idrac file configures and validates the following:

Provisioning newly added PowerEdge servers in the cluster

To provision newly added servers, wait until their iDRAC IP addresses are automatically added to the idrac_inventory. After the iDRAC IP addresses are added, launch the iDRAC template on the AWX UI to provision the CentOS custom OS on the servers.

After the servers are provisioned, provisioned_idrac_inventory is created and updated on the AWX UI. To re-provision all the servers in the cluster, or only the faulty servers, remove the respective iDRAC IP addresses from provisioned_idrac_inventory on the AWX UI and then launch the iDRAC template. If required, you can delete provisioned_idrac_inventory from the AWX UI to remove the IP addresses of all provisioned servers.

Configuring Servers with In-Band Management (Provision Method: PXE)

Omnia role used: provision_cobbler
Ports used by Cobbler:

To create the Cobbler image, Omnia configures the following:

To access the Cobbler dashboard, enter https://<IP>/cobbler_web where <IP> is the Global IP address of the control plane. For example, enter to access the Cobbler dashboard.

Note: After the Cobbler Server provisions the operating system on the servers, IP addresses and hostnames are assigned by the DHCP service.

  • If a mapping file is not provided, Omnia assigns hostnames in the format computexxx-xxx, where “xxx-xxx” is the last two octets of the Host IP address. For example, a Host IP address whose last two octets are 0 and 11 is assigned the hostname compute0-11.
  • If a mapping file is provided, the hostnames follow the format provided in the mapping file.
  • If you want to add more nodes, append them to the existing mapping file. However, do not modify the entries for existing nodes, as this may impact the existing cluster.
  • With the addition of multiple profiles, the Cobbler container dynamically updates the mount point based on the value of provision_os in base_vars.yml.
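
The default naming rule (no mapping file) can be illustrated with a short sketch. The function name default_hostname and the sample address are hypothetical; this only mirrors the format described above and is not Omnia's own implementation.

```shell
# Derives the default hostname from the last two octets of an IPv4 address,
# following the computexxx-xxx format described above.
default_hostname() {
    third="$(printf '%s' "$1" | cut -d. -f3)"
    fourth="$(printf '%s' "$1" | cut -d. -f4)"
    printf 'compute%s-%s\n' "$third" "$fourth"
}

# Example with a made-up address:
# default_hostname 172.17.0.11
```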

DHCP routing using Cobbler

Omnia now supports DHCP routing via Cobbler. To enable routing, set primary_dns and secondary_dns in base_vars.yml to the appropriate IP addresses (hostnames are currently not supported). This configuration provides internet connectivity to compute nodes that are not directly connected to the internet (that is, nodes on which only the host network is configured).
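
Because hostnames are not accepted for primary_dns and secondary_dns, it can help to check the values before editing the file. The following is a minimal sketch of such a check; is_ipv4 is a hypothetical helper, not an Omnia utility, and it only validates the dotted-quad IPv4 form.

```shell
# Succeeds only for a dotted-quad IPv4 address with every octet in 0..255.
is_ipv4() {
    # Reject anything that is not four dot-separated groups of 1 to 3 digits.
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
    # Split on dots and check the range of each octet.
    old_ifs="$IFS"
    IFS=.
    set -- $1
    IFS="$old_ifs"
    for octet in "$@"; do
        [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# Example with a made-up value:
# is_ipv4 "8.8.8.8" && echo "usable as primary_dns"
```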

Security enhancements

Omnia provides the following options to enhance security on the provisioned PowerEdge servers:

Note: Use the ansible-vault view or edit commands rather than the ansible-vault decrypt or encrypt commands. If you have used the ansible-vault decrypt or encrypt commands, set the permissions of idrac_tools_vars.yml to 644.
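
The note above translates into the following sketch. The function names and the VARS_FILE variable are hypothetical, and the location of idrac_tools_vars.yml is an assumption (the current directory by default); adjust the path to wherever the file lives in your checkout.

```shell
# Assumption: idrac_tools_vars.yml is in the current directory unless
# VARS_FILE points elsewhere.
VARS_FILE="${VARS_FILE:-idrac_tools_vars.yml}"

# Read or change the encrypted file without leaving a decrypted copy on disk.
view_vars() { ansible-vault view "$VARS_FILE"; }
edit_vars() { ansible-vault edit "$VARS_FILE"; }

# Restore the permission the note asks for after decrypt or encrypt was used.
restore_vars_permission() { chmod 644 "$1"; }
```

For example, restore_vars_permission "$VARS_FILE" resets the file mode to 644 after an accidental ansible-vault decrypt followed by encrypt.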

On the AWX Dashboard, select the respective security requirement playbook and launch the iDRAC template by performing the following steps.

  1. On the AWX Dashboard, under RESOURCES -> Templates, select the idrac_template.
  2. Under the Details tab, click Edit.
  3. In the Edit Details page, click the Playbook drop-down menu and select tools/idrac_system_lockdown.yml, tools/idrac_secure_boot.yml, tools/idrac_2fa.yml, or tools/idrac_ldap.yml.
  4. Click Save.
  5. To launch the iDRAC template with the respective playbook selected, click Launch.