Ansible playbook-based tools for deploying Slurm and Kubernetes clusters for High Performance Computing, Machine Learning, Deep Learning, and High-Performance Data Analytics

This project is maintained by dellhpc

Parameters in opensm.conf

This file is located in /control_plane/input_params

Parameter Name Default Value Description
guid 0x0000000000000000 The port GUID on which OpenSM is running
m_key 0x0000000000000000 M_Key value sent to all ports qualifying all Set(PortInfo)
m_key_lease_period 0 The lease period used for the M_Key on this subnet in [sec]
m_key_protection_level 0 The protection level used for the M_Key on this subnet
m_key_lookup TRUE If TRUE, the SM tries to determine the m_key of unknown ports from the guid2mkey file. Otherwise, the SM does not try to determine the m_key of unknown ports.
sm_key 0x0000000000000001 SM_Key value of the SM used for SM authentication
sa_key 0x0000000000000001 SM_Key value used to qualify received SA queries as ‘trusted’
subnet_prefix 0xfe80000000000000 Subnet prefix used on this subnet
lmc 0 The LMC value used on this subnet
lmc_esp0 FALSE Determines whether the subnet LMC value is also used for enhanced switch port 0 (ESP0). If TRUE, the subnet LMC value is used for ESP0. Otherwise, the LMC value for ESP0s is 0.
sm_sl 0 sm_sl determines SMSL used for SM/SA communication
packet_life_time 0x12 The code of maximal time a packet can live in a switch.
The actual time is 4.096usec * 2^<packet_life_time>
The value 0x14 disables this mechanism
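The time encoding used by packet_life_time (and, in the same way, by head_of_queue_lifetime) can be sketched as follows. This is a minimal illustration, assuming the exponent is the configured code value itself; the function name `lifetime_usec` is hypothetical and not part of OpenSM.

```python
TIME_UNIT_USEC = 4.096  # base unit stated in the parameter description
DISABLED_CODE = 0x14    # per the description, this code disables the mechanism


def lifetime_usec(code):
    """Return the lifetime in microseconds for a code, or None if disabled.

    Assumption: the actual time is 4.096 usec * 2^code.
    """
    if code < 0:
        raise ValueError("code must be non-negative")
    if code == DISABLED_CODE:
        return None
    return TIME_UNIT_USEC * (2 ** code)


default_usec = lifetime_usec(0x12)
print(f"packet_life_time 0x12 -> {default_usec / 1e6:.3f} s")
print(f"packet_life_time 0x14 -> {lifetime_usec(0x14)}")
```

Under this assumption, the default code 0x12 corresponds to roughly 1.07 seconds.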
vl_stall_count 0x07 The number of sequential packets dropped that cause the port to enter the VLStalled state. The result of setting this value to 0 is undefined.
leaf_vl_stall_count 0x08 The number of sequential packets dropped that cause the port to enter the VLStalled state. This value is for switch ports driving a CA or router port. The result of setting this value to zero is undefined.
head_of_queue_lifetime 0x12 The code of maximal time a packet can wait at the head of transmission queue.
The actual time is 4.096usec * 2^<head_of_queue_lifetime>
The value 0x14 disables this mechanism
leaf_head_of_queue_lifetime 0x10 The maximal time a packet can wait at the head of queue on switch port connected to a CA or router port
max_op_vls 5 Limit the maximal operational VLs
force_link_speed 15 Force PortInfo:LinkSpeedEnabled on switch ports
If 0, don’t modify PortInfo:LinkSpeedEnabled on switch port
Otherwise, use value for PortInfo:LinkSpeedEnabled on switch port
Values are (IB Spec 1.2.1, Table 146 “PortInfo”)
1: 2.5 Gbps
3: 2.5 or 5.0 Gbps
5: 2.5 or 10.0 Gbps
7: 2.5 or 5.0 or 10.0 Gbps
2,4,6,8-14 Reserved
Default 15: set to PortInfo:LinkSpeedSupported
force_link_speed_ext 31 Force PortInfo:LinkSpeedExtEnabled on switch ports
If 0, don’t modify PortInfo:LinkSpeedExtEnabled on switch port
Otherwise, use value for PortInfo:LinkSpeedExtEnabled on switch port
Values are (MgtWG RefIDs #4722 and #9366)
1: 14.0625 Gbps
2: 25.78125 Gbps
3: 14.0625 Gbps or 25.78125 Gbps
4: 53.125 Gbps
5: 14.0625 Gbps or 53.125 Gbps
6: 25.78125 Gbps or 53.125 Gbps
7: 14.0625 Gbps, 25.78125 Gbps or 53.125 Gbps
30: Disable extended link speeds
Default 31: set to PortInfo:LinkSpeedExtSupported
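The values 1 through 7 listed for force_link_speed_ext combine like a bitmask (1, 2, and 4 each select one speed). The sketch below decodes a value on that basis. The function name and the wording of the returned strings are illustrative assumptions, not OpenSM output.

```python
# Single-bit values and speeds copied from the list above (values 1, 2, and 4).
EXT_SPEED_BITS = {
    0x1: 14.0625,
    0x2: 25.78125,
    0x4: 53.125,
}


def describe_force_link_speed_ext(value):
    """Return a human-readable description of a force_link_speed_ext value."""
    if value == 0:
        return "leave the enabled extended speeds unmodified"
    if value == 30:
        return "disable extended link speeds"
    if value == 31:
        return "set to PortInfo:LinkSpeedExtSupported"
    if 1 <= value <= 7:
        speeds = [gbps for bit, gbps in EXT_SPEED_BITS.items() if value & bit]
        return " or ".join(f"{gbps} Gbps" for gbps in speeds)
    raise ValueError(f"value {value} is not listed in the table")


for value in (0, 5, 7, 30, 31):
    print(value, "->", describe_force_link_speed_ext(value))
```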
force_link_width 255 Force PortInfo:LinkWidthEnabled on switch ports
If 0, don’t modify PortInfo:LinkWidthEnabled on switch port
Otherwise, use value for PortInfo:LinkWidthEnabled on switch port
Values are (IB Spec 1.2.1, Table 146 “PortInfo” augmented by MgtWG RefIDs #9306-9309)
fdr10 1 FDR10 on ports on devices that support FDR10
Accepted Values: 0 (don’t use fdr10 (no MLNX ExtendedPortInfo MADs)), 1 (enable fdr10 when supported), 2 (disable fdr10 when supported)
subnet_timeout 18 The subnet_timeout code that will be set for all the ports
The actual timeout is 4.096usec * 2^<subnet_timeout>
local_phy_errors_threshold 0x08 Threshold of local phy errors for sending Trap 129
overrun_errors_threshold 0x08 Threshold of credit overrun errors for sending Trap 130
use_mfttop TRUE Use SwitchInfo:MulticastFDBTop if advertised in PortInfo:CapabilityMask
no_partition_enforcement FALSE Disable partition enforcement by switches (DEPRECATED)
This option is DEPRECATED. Please use part_enforce instead
part_enforce both Partition enforcement type (for switches)
Accepted Values: both, out, in, off
Default Value: both (outbound and inbound enforcement)
allow_both_pkeys FALSE Allow both full and limited membership on the same partition
keep_pkey_indexes TRUE Keep current and take into account old pkey indexes during calculation of physical ports pkey tables
sm_assigned_guid 0x00 SM assigned GUID byte. The GUID is formed from the OpenFabrics OUI followed by 40 bits xy 00 ab cd ef, where xy is the SM assigned GUID byte and ab cd ef is an SM autogenerated 24 bits. The SM assigned GUID byte should be configured as subnet unique.
sweep_interval 10 The number of seconds between subnet sweeps (0 disables it)
reassign_lids FALSE If TRUE, causes all LIDs to be reassigned
force_heavy_sweep FALSE If TRUE forces every sweep to be a heavy sweep
sweep_on_trap TRUE If TRUE every trap 128 and 144 will cause a heavy sweep.
Note: Successive identical traps (>10) are suppressed
port_profile_switch_nodes FALSE If TRUE count switches as link subscriptions
port_prof_ignore_file   Name of file with port guids to be ignored by port profiling
hop_weights_file   The file holding routing weighting factors per output port
port_search_ordering_file   The file holding non-default port order per switch for routing
routing_engine   Multiple routing engines can be specified separated by commas so that specific ordering of routing algorithms will be tried if earlier routing engines fail.
Accepted Values: minhop, updn, dnup, file, ftree, lash, dor, torus-2QoS, nue, dfsssp, sssp
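Because routing_engine takes a comma-separated, ordered list, it can help to check a candidate value before writing it to the file. The following is a minimal sketch; the function name `parse_routing_engines` is hypothetical, and the accepted names are copied from the list above.

```python
# Engine names copied from the "Accepted Values" list for routing_engine.
ACCEPTED_ENGINES = (
    "minhop", "updn", "dnup", "file", "ftree", "lash",
    "dor", "torus-2QoS", "nue", "dfsssp", "sssp",
)


def parse_routing_engines(value):
    """Split a routing_engine value into an ordered list of engine names.

    Raises ValueError if any name is not in the accepted list.
    """
    engines = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in engines if name not in ACCEPTED_ENGINES]
    if unknown:
        raise ValueError(f"unknown routing engine(s): {', '.join(unknown)}")
    return engines


# The order is preserved: ftree is tried first, then updn, then minhop.
print(parse_routing_engines("ftree,updn,minhop"))
```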
avoid_throttled_links FALSE Routing engines will avoid throttled switch-to-switch links
supported by: nue, dfsssp, sssp
connect_roots FALSE Connect roots (use FALSE if unsure)
use_ucast_cache FALSE Use unicast routing cache (use FALSE if unsure)
lid_matrix_dump_file   Lid matrix dump file name
lfts_file   LFTs file name
root_guid_file   The file holding the root node guids (for fat-tree or Up/Down)
Note: Place one GUID per line
cn_guid_file   The file holding the fat-tree compute node guids (for fat-tree or Up/Down)
Note: Place one GUID per line
io_guid_file   The file holding the fat-tree I/O node guids (for fat-tree or Up/Down)
Note: Place one GUID per line. If only the io_guid file is provided, the remaining nodes are assumed to be compute nodes.
quasi_ftree_indexing FALSE If TRUE: enable alternative indexing policy for ftree routing in quasi-ftree topologies that can improve shift-pattern support. The switch indexing starts from root switch and leaf switches are termination points of BFS algorithm
If FALSE: the indexing starts from leaf switch (default)
max_reverse_hops 0 Number of reverse hops allowed for I/O nodes
Used for connectivity between I/O nodes connected to Top Switches
ids_guid_file   The file holding the node ids which will be used by Up/Down algorithm instead of GUIDs (one guid and id in each line)
guid_routing_order_file   The file holding guid routing order (for MinHop and Up/Down)
do_mesh_analysis FALSE Enable mesh topology analysis (for LASH algorithm)
lash_start_vl 0 Starting VL for LASH algorithm
nue_max_num_vls 1 Maximum number of VLs for the Nue routing algorithm (default: 1, to enforce deadlock-freedom even if QoS is not enabled). Set to 0 if Nue should automatically determine and choose the maximum supported by the fabric, or to any integer >= 1 (Nue then uses min(max_supported,nue_max_num_vls)).
nue_include_switches FALSE If TRUE, then Nue assumes that switches will send/receive data traffic, too, and hence their paths are included in the deadlock-avoidance calculation (use FALSE if unsure)
port_shifting FALSE Port Shifting (use FALSE if unsure)
scatter_ports 0 Assign ports in a random order instead of round-robin.
If 0: disable (default), else use the value as a random seed
guid_routing_order_no_scatter FALSE If TRUE, do not use scatter for ports defined in the guid_routing_order file
sa_db_file   SA database file name
sa_db_dump FALSE If TRUE causes OpenSM to dump SA database at the end of every light sweep, regardless of the verbosity level
torus_config /etc/rdma/torus-2QoS.conf Torus-2QoS configuration file name
sm_priority 0 SM priority used for deciding who is the master.
Accepted Values: 0 (lowest priority) to 15 (highest priority)
ignore_other_sm FALSE If TRUE other SMs on the subnet should be ignored
sminfo_polling_timeout 10000 Timeout in [msec] between two polls of active master SM
polling_retry_number 4 Number of failed polls of the remote SM after which it is declared dead
honor_guid2lid_file FALSE If TRUE honor the guid2lid file when coming out of standby state, if such file exists and is valid
max_wire_smps 4 Maximum number of SMPs sent in parallel
max_wire_smps2 4 Maximum number of timeout based SMPs allowed to be outstanding
A value less than or equal to max_wire_smps disables this mechanism
max_smps_timeout 600000 The timeout in [usec] used for sending SMPs above max_wire_smps limit and below max_wire_smps2 limit
transaction_timeout 200 The maximum time in [msec] allowed for a transaction to complete
transaction_retries 3 The maximum number of retries allowed for a transaction to complete
long_transaction_timeout 500 The maximum time in [msec] allowed for a “long” transaction to complete
Currently, the only long transaction is a Set of the optimized SL2VLMappingTable
max_msg_fifo_timeout 10000 Maximal time in [msec] a message can stay in the incoming message queue.
If there is more than one message in the queue and the last message stayed in the queue longer than this value, any SA request will be dropped immediately. BUSY status is not currently returned.
daemon FALSE Daemon mode
sm_inactive FALSE Subnet Inactive
babbling_port_policy FALSE Babbling Port Policy
drop_event_subscriptions FALSE Drop event subscriptions (InformInfo and ServiceRecord) on port removal and SM coming out of STANDBY
ipoib_mcgroup_creation_validation TRUE Validate IPoIB non-broadcast group creation parameters against broadcast group parameters per IETF RFC 4391 (default TRUE)
mcgroup_join_validation TRUE Validate multicast join parameters against multicast group parameters when MC group already exists
use_original_extended_sa_rates_only FALSE Use original extended SA rates only
The original extended SA rates are up through 300 Gbps (12x EDR)
Set to TRUE for subnets with old kernels/drivers that don’t understand the new SA rates for 2x link width and/or HDR link speed (19-22)
use_optimized_slvl FALSE Use Optimized SLtoVLMapping programming if supported by device
fsync_high_avail_files TRUE Sync in memory files used for high availability with storage
perfmgr FALSE Enable Performance Manager
perfmgr_redir TRUE Enable Redirection
perfmgr_sweep_time_s 180 sweep time in seconds
perfmgr_max_outstanding_queries 500 Max outstanding queries
perfmgr_ignore_cas FALSE Ignore CAs on sweep
perfmgr_rm_nodes TRUE Remove missing nodes from DB
perfmgr_log_errors TRUE Log error counters to opensm.log
perfmgr_query_cpi TRUE Query PerfMgt Get(ClassPortInfo) for extended capabilities
Extended capabilities include 64 bit extended counters and transmit wait support
perfmgr_xmit_wait_log FALSE Log xmit_wait errors
perfmgr_xmit_wait_threshold 65535 Wait threshold used when logging xmit_wait errors
event_db_dump_file   Dump file to dump the events to
event_plugin_name   Event plugin name(s)
event_plugin_options   Options string that would be passed to the plugin(s)
node_name_map_name   Node name map for mapping nodes to more descriptive node descriptions
Refer to man ibnetdiscover for more info
log_flags 0x03 The log flags used
force_log_flush FALSE Force flush of the log file after each log message
log_file /var/log/opensm.log Log file to be used
log_max_size 0 Limit the size of the log file in MB. If overrun, log is restarted
accum_log_file TRUE If TRUE will accumulate the log over multiple OpenSM sessions
per_module_logging_file /etc/rdma/per-module-logging.conf Per module logging configuration file
Each line in the config file contains <module_name><separator><log_flags>, where module_name is the file name including .c
separator is either = , space, or tab
log_flags uses the same flags as the coarse/overall logging
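A line of the per-module logging file could be parsed as sketched below. This assumes each line has the form module name, separator, flags, that the separator is "=", a space, or a tab, and that the flags are written as a number such as 0x03. The module name `osm_example.c` and the function name are hypothetical.

```python
import re


def parse_module_log_line(line):
    """Parse one per-module logging line into (module_name, flags).

    Assumption: the line is '<module_name><separator><flags>' where the
    separator is '=', a space, or a tab (surrounding whitespace is tolerated).
    """
    parts = re.split(r"[=\s]+", line.strip(), maxsplit=1)
    if len(parts) != 2 or not parts[0].endswith(".c"):
        raise ValueError(f"unrecognized per-module logging line: {line!r}")
    module_name, flags_text = parts
    # Base 0 lets int() accept both hexadecimal (0x03) and decimal (3) text.
    return module_name, int(flags_text, 0)


# Hypothetical module name, shown with each assumed separator.
for sample in ("osm_example.c=0x03", "osm_example.c 0x03", "osm_example.c\t0x03"):
    print(parse_module_log_line(sample))
```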
dump_files_dir /var/log/ The directory to hold the file OpenSM dumps
enable_quirks FALSE If TRUE enables new high risk options and hardware specific quirks
no_clients_rereg FALSE If TRUE disables client reregistration
disable_multicast FALSE If TRUE, OpenSM disables multicast support and no multicast routing is performed
exit_on_fatal TRUE If TRUE opensm will exit on fatal initialization issues
console off Accepted Values: off, local
console_port 10000 Telnet port for console
qos FALSE Enable QoS setup
qos_policy_file /etc/rdma/qos-policy.conf QoS policy file to be used
suppress_sl2vl_mad_status_errors FALSE Suppress QoS MAD status errors
qos_max_vls 0 QoS default options
qos_high_limit -1 QoS default options
qos_vlarb_high   QoS default options
qos_vlarb_low   QoS default options
qos_sl2vl   QoS default options
qos_sw0_max_vls 0 QoS Switch Port 0 options
qos_sw0_high_limit -1 QoS Switch Port 0 options
qos_sw0_vlarb_high   QoS Switch Port 0 options
qos_sw0_vlarb_low   QoS Switch Port 0 options
qos_sw0_sl2vl   QoS Switch Port 0 options
qos_swe_max_vls 0 QoS Switch external ports options
qos_swe_high_limit -1 QoS Switch external ports options
qos_swe_vlarb_high   QoS Switch external ports options
qos_swe_vlarb_low   QoS Switch external ports options
qos_swe_sl2vl   QoS Switch external ports options
qos_rtr_max_vls 0 QoS Router ports options
qos_rtr_high_limit -1 QoS Router ports options
qos_rtr_vlarb_high   QoS Router ports options
qos_rtr_vlarb_low   QoS Router ports options
qos_rtr_sl2vl   QoS Router ports options
congestion_control FALSE Enable Congestion Control Configuration
cc_key 0x0000000000000000 CCKey to use when configuring congestion control
Note that this does not configure a new CCKey; it only sets the CCKey to use
cc_max_outstanding_mads 500 Congestion Control Max outstanding MAD
cc_sw_cong_setting_control_map 0x0 Control Map - bitmask indicating which of the following are to be used
* bit 0 - victim mask
* bit 1 - credit mask
* bit 2 - threshold + packet size
* bit 3 - credit starvation threshold + return delay valid
* bit 4 - marking rate valid
cc_sw_cong_setting_victim_mask 0x0000000000000000000000000000000000000000000000000000000000000000 Victim Mask - 256 bit mask representing switch ports, mark packets with FECN whether they are the source or victim of congestion
* bit 0 - port 0 (enhanced port)
* bit 1 - port 1
* ...
* bit 254 - port 254
* bit 255 - reserved
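A 256-bit port mask such as the victim mask (or the credit mask, which uses the same layout) can be built from a list of port numbers, as sketched here. This assumes bit 0 is the least significant bit of the hexadecimal value; the function name `port_mask_hex` is hypothetical.

```python
def port_mask_hex(ports):
    """Build a 256-bit mask as a 64-digit hex string from switch port numbers.

    Assumption: bit N of the mask is the Nth least significant bit.
    Bit 255 is reserved, so only ports 0 through 254 are accepted.
    """
    mask = 0
    for port in ports:
        if not 0 <= port <= 254:
            raise ValueError(f"port {port} is outside the range 0-254")
        mask |= 1 << port
    return f"0x{mask:064x}"


# Select ports 1 and 2 (bits 1 and 2 set).
print(port_mask_hex([1, 2]))
```

An empty list yields the all-zero default shown in the table.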
cc_sw_cong_setting_credit_mask 0x0000000000000000000000000000000000000000000000000000000000000000 Credit Mask - 256 bit mask representing switch ports to apply credit starvation
* bit 0 - port 0 (enhanced port)
* bit 1 - port 1
* ...
* bit 254 - port 254
* bit 255 - reserved
cc_sw_cong_setting_threshold 0x00 Threshold - value indicating aggressiveness of congestion marking
0x0 - none, 0x1 - loose, …, 0xF - aggressive
cc_sw_cong_setting_packet_size 0 Packet Size - any packet less than this size will not be marked with a FECN
Units are in credits
cc_sw_cong_setting_credit_starvation_threshold 0x00 Credit Starvation Threshold - value indicating aggressiveness of credit starvation
Accepted Values: 0x0 (none), 0x1 (loose), …, 0xF (aggressive)
cc_sw_cong_setting_credit_starvation_return_delay 0:0 Credit Starvation Return Delay - in CCT entry shift:multiplier format, see IB spec
cc_sw_cong_setting_marking_rate 0 Marking Rate - mean number of packets between markings
cc_ca_cong_setting_port_control 0x0000 Port Control
bit 0 = 0, QP based congestion control
bit 0 = 1, SL/port based congestion control
cc_ca_cong_setting_control_map 0x0000 Control Map - 16 bit bitmask indicating which SLs should be configured
cc_ca_cong_setting_ccti_timer 0 0  
cc_ca_cong_setting_ccti_increase 0 0  
cc_ca_cong_setting_trigger_threshold 0 0  
cc_ca_cong_setting_ccti_min 0 0  
cc_cct   Comma separated list of CCT entries representing CCT.
Format is shift:multiplier,shift:multiplier,shift:multiplier,…
prefix_routes_file /etc/rdma/prefix-routes.conf Prefix routes file name
consolidate_ipv6_snm_req FALSE
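To check which of the parameters above are set in a given opensm.conf, the file can be read into a dictionary. The sketch below assumes one "name value" pair per line, separated by whitespace, with lines starting with # treated as comments. The function name `parse_opensm_conf` and the sample text are illustrative only.

```python
def parse_opensm_conf(text):
    """Parse opensm.conf-style text into a dict of parameter name -> value.

    Assumptions: one 'name value' pair per line, whitespace-separated,
    and lines beginning with '#' are comments. Values are kept as strings.
    """
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        params[name] = value
    return params


# Illustrative sample using defaults listed in the table above.
sample = """
# Subnet sweep settings
sweep_interval 10
sm_priority 0
log_file /var/log/opensm.log
"""

config = parse_opensm_conf(sample)
print(config["sweep_interval"], config["log_file"])
```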