Hi All,
I am trying to set up a 3-node cluster on Nutanix CE 2.1, but I cannot get past the creation of the auth certificates. My setup is as follows:
Dell PowerEdge R7515 2U Rackmount Server
1 x AMD EPYC 7H12 64Core 2.6GHz CPU
512GB (8x64GB PC4-3200AA-R) DDR4 RAM
6 x 1.6TB SSD PCIe NVMe Gen 4 2.5" Dell Enterprise Hot-Swap SSD - used for Data
PERC S150 Software Controller for NVMe Drives
24-way Backplane for up to 24 x 2.5" PCIe NVMe Hot-Swap Drives
Dell BOSS Controller Card w/ 2 x 240GB M.2 SATA SSD - used for AHV
Broadcom 5720 1GbE Dual Port Onboard LOM (2 x 1GbE RJ45 Ports)
Broadcom 57414 Dual Port Mezz LOM (2 x 10/25GbE SFP28 Ports)
1 x 960GB PCIe NVMe Gen 4 2.5" - used for CVM
These are connected to a Dell SF-S5148-ON 25GbE switch, which is then uplinked to a SonicWall TZ470 firewall.
The issue I am seeing is that, when creating the cluster, the CVM is unable to connect through the SSH control master and is then unable to copy the certificate files via SCP. I have checked communication across the hosts and CVMs, and I am able to SSH between them. I also notice in the logs (copied below) an error saying ‘hostname contains invalid characters’, yet the hostnames are the default hostnames the CVMs are given.
I can create a single-node cluster fine, so I don’t believe it is a hardware or compatibility issue, and all disks and network cards have loaded correctly.
I have followed all the documentation, video guides, and blog guides I have found for Nutanix CE 2.1 and have not modified anything in the process, so I am not sure what is going wrong.
Any help would be greatly appreciated!
Genesis logs for reference:
2025-04-22 14:30:27,689Z INFO 42674176 node_manager.py:9868 Fetching CVM ip configuration for the following CVM's ['10.100.0.21', '10.100.0.22', '10.100.0.23']to be populated into node discovery information
2025-04-22 14:30:27,695Z INFO 40540768 node_manager.py:7908 Fetching external IP configuration
2025-04-22 14:30:27,695Z ERROR 40540768 node_manager.py:8169 Zookeeper mapping is unconfigured
2025-04-22 14:30:27,705Z INFO 40540768 ipv4config.py:1059 Netmask is 255.255.255.0
2025-04-22 14:30:27,723Z INFO 40540768 ipv4config.py:1093 Discovered network information: hwaddr 52:54:00:74:a7:02, address 10.100.0.21, netmask 255.255.255.0, gateway 10.100.0.1, vlan None
2025-04-22 14:30:28,025Z INFO 40540768 kvm_utils.py:185 Interface with mac address 52:54:00:74:a7:02 does not have vlan id
2025-04-22 14:30:28,026Z INFO 40540768 node_manager.py:7960 Returning IP configuration for CVM: {'address': '10.100.0.21', 'netmask': '255.255.255.0', 'gateway': '10.100.0.1', 'vlan': None}, hypervisor: {'address': '10.100.0.11', 'netmask': '255.255.255.0', 'gateway': '10.100.0.1'}, IPMI: {'address': '10.100.50.11', 'netmask': '255.255.255.0', 'gateway': '10.100.50.1'}
2025-04-22 14:30:28,047Z INFO 42674176 node_manager.py:9874 Discovered unconfigured svms: ['10.100.0.21', '10.100.0.22', '10.100.0.23']
2025-04-22 14:30:28,526Z INFO 40540768 node_manager.py:4814 Following services are active for firewall op: ['GenesisGatewayServer']
2025-04-22 14:30:28,527Z INFO 40540768 common_utils.py:1007 Attempting to get IPv4 subnet for CVM iface eth0
2025-04-22 14:30:28,537Z INFO 40540768 ipv4config.py:1059 Netmask is 255.255.255.0
2025-04-22 14:30:28,555Z INFO 40540768 ipv4config.py:1093 Discovered network information: hwaddr 52:54:00:74:a7:02, address 10.100.0.21, netmask 255.255.255.0, gateway 10.100.0.1, vlan None
2025-04-22 14:30:28,555Z INFO 40540768 utils.py:288 Static port 2030 has been flagged for ipset relaxation
2025-04-22 14:30:28,555Z INFO 40540768 utils.py:291 Added new network definition
key:MGMT_NODE_IPS_NOSET
value:{'interface': 'eth0', 'source4': '10.100.0.0/24', 'source6': '::/0', 'ipset_name': '', 'tcp': [], 'udp': []}
2025-04-22 14:30:28,794Z INFO 40540768 salt.py:289 Executing salt call for CVM
2025-04-22 14:30:30,796Z INFO 40540768 salt.py:298 Successfully executed salt command
2025-04-22 14:30:30,796Z INFO 40540768 node_manager.py:7280 iptables rules applied for state kBaseConfig
2025-04-22 14:30:30,796Z ERROR 40540768 genesis_utils.py:3028 Unable to fetch cluster_functions from cached proto
2025-04-22 14:30:36,321Z INFO 41283456 genesis_utils.py:6307 Creating cluster certificates for 0006335e-d344-3a3d-7ee5-bc97e10924e0, attempt 1
2025-04-22 14:30:36,321Z INFO 41283456 genesis_utils.py:6358 Root certificates already exist
2025-04-22 14:30:36,322Z INFO 41283456 genesis_utils.py:6370 ICA certificates already exist
2025-04-22 14:30:36,322Z ERROR 41283456 genesis_utils.py:341 Failed to get CVM id
2025-04-22 14:30:36,322Z INFO 41283456 genesis_utils.py:6422 Getting IP information
2025-04-22 14:30:36,331Z INFO 41283456 ipv4config.py:1059 Netmask is 255.255.255.0
2025-04-22 14:30:36,348Z INFO 41283456 ipv4config.py:1093 Discovered network information: hwaddr 52:54:00:74:a7:02, address 10.100.0.21, netmask 255.255.255.0, gateway 10.100.0.1, vlan None
2025-04-22 14:30:36,348Z INFO 41283456 genesis_utils.py:6426 Current IP: 10.100.0.21
2025-04-22 14:30:36,348Z INFO 41283136 genesis_utils.py:6828 Setting up certificates on 10.100.0.22
2025-04-22 14:30:36,349Z INFO 41283616 genesis_utils.py:6828 Setting up certificates on 10.100.0.23
2025-04-22 14:31:06,989Z WARNING 41283136 command.py:226 Timeout executing scp -q -o CheckHostIp=no -o ConnectTimeout=15 -o StrictHostKeyChecking=no -o TCPKeepAlive=yes -o UserKnownHostsFile=/dev/null -o ControlPath=/home/nutanix/.ssh/controlmasters/tmp12vyamcw -o PreferredAuthentications=publickey -o IdentityFile=/home/nutanix/.ssh/id_rsa -r /home/certs/ica.crt 'nutanix@[10.100.0.22]:/home/certs/ica.crt': 30 secs elapsed
2025-04-22 14:31:06,989Z ERROR 41283136 genesis_utils.py:6907 Unable to send CA information to noderet: -1 out: b'' err: b''
2025-04-22 14:31:06,989Z ERROR 41283136 genesis_utils.py:6830 Unable to passwordless copy to other nodes, trying default password
2025-04-22 14:31:07,013Z WARNING 41283616 command.py:226 Timeout executing scp -q -o CheckHostIp=no -o ConnectTimeout=15 -o StrictHostKeyChecking=no -o TCPKeepAlive=yes -o UserKnownHostsFile=/dev/null -o ControlPath=/home/nutanix/.ssh/controlmasters/tmp4jwwp84j -o PreferredAuthentications=publickey -o IdentityFile=/home/nutanix/.ssh/id_rsa -r /home/certs/ica.crt 'nutanix@[10.100.0.23]:/home/certs/ica.crt': 30 secs elapsed
2025-04-22 14:31:07,013Z ERROR 41283616 genesis_utils.py:6907 Unable to send CA information to noderet: -1 out: b'' err: b''
2025-04-22 14:31:07,013Z ERROR 41283616 genesis_utils.py:6830 Unable to passwordless copy to other nodes, trying default password
2025-04-22 14:31:36,350Z ERROR 41283456 genesis_utils.py:6804 Timed out for nodes {<Future at 0x7f3bfdfd8ac0 state=running>, <Future at 0x7f3bfdfd8970 state=running>}
2025-04-22 14:31:39,166Z WARNING 41283136 command.py:367 Timeout executing scp -q -o CheckHostIp=no -o ConnectTimeout=15 -o StrictHostKeyChecking=no -o TCPKeepAlive=yes -o UserKnownHostsFile=/dev/null -o PreferredAuthentications=keyboard-interactive,password -o NumberOfPasswordPrompts=1 -r /home/certs/ica.crt 'nutanix@[10.100.0.22]:/home/certs/ica.crt': 30 secs elapsed
2025-04-22 14:31:39,166Z ERROR 41283136 genesis_utils.py:6907 Unable to send CA information to noderet: -1 out: b'' err: b''
2025-04-22 14:31:39,166Z ERROR 41283136 genesis_utils.py:6836 Failed to set up password based connection with 10.100.0.22 to transfer certificates
2025-04-22 14:31:39,178Z WARNING 41283616 command.py:367 Timeout executing scp -q -o CheckHostIp=no -o ConnectTimeout=15 -o StrictHostKeyChecking=no -o TCPKeepAlive=yes -o UserKnownHostsFile=/dev/null -o PreferredAuthentications=keyboard-interactive,password -o NumberOfPasswordPrompts=1 -r /home/certs/ica.crt 'nutanix@[10.100.0.23]:/home/certs/ica.crt': 30 secs elapsed
2025-04-22 14:31:39,178Z ERROR 41283616 genesis_utils.py:6907 Unable to send CA information to noderet: -1 out: b'' err: b''
2025-04-22 14:31:39,178Z ERROR 41283616 genesis_utils.py:6836 Failed to set up password based connection with 10.100.0.23 to transfer certificates
2025-04-22 14:31:39,179Z WARNING 41283456 genesis_utils.py:6437 Retrying for failed copy on [<Future at 0x7f3bfdfd8ac0 state=finished returned bool>, <Future at 0x7f3bfdfd8970 state=finished returned bool>]
2025-04-22 14:31:44,180Z INFO 41283616 genesis_utils.py:6828 Setting up certificates on <Future at 0x7f3bfdfd8ac0 state=finished returned bool>
2025-04-22 14:31:44,181Z INFO 41283136 genesis_utils.py:6828 Setting up certificates on <Future at 0x7f3bfdfd8970 state=finished returned bool>
2025-04-22 14:31:44,285Z ERROR 41283616 ssh_client.py:671 Error connecting through control master
2025-04-22 14:31:44,285Z INFO 41283616 genesis_utils.py:6886 Node <Future at 0x7f3bfdfd8ac0 state=finished returned bool> hasn't upgraded yet, using the legacy command
2025-04-22 14:31:44,285Z ERROR 41283616 genesis_utils.py:6889 Error when detecting if legacy path should be taken: b'Could not connect through control master'
2025-04-22 14:31:44,289Z ERROR 41283136 ssh_client.py:671 Error connecting through control master
2025-04-22 14:31:44,289Z INFO 41283136 genesis_utils.py:6886 Node <Future at 0x7f3bfdfd8970 state=finished returned bool> hasn't upgraded yet, using the legacy command
2025-04-22 14:31:44,289Z ERROR 41283136 genesis_utils.py:6889 Error when detecting if legacy path should be taken: b'Could not connect through control master'
2025-04-22 14:31:44,389Z ERROR 41283616 ssh_client.py:671 Error connecting through control master
2025-04-22 14:31:44,389Z ERROR 41283616 genesis_utils.py:6898 Unable to create directory for CA info cmd: sudo mkdir -p /home/certs out: b'' err: b'Could not connect through control master'
2025-04-22 14:31:44,389Z ERROR 41283616 genesis_utils.py:6830 Unable to passwordless copy to other nodes, trying default password
2025-04-22 14:31:44,393Z ERROR 41283136 ssh_client.py:671 Error connecting through control master
2025-04-22 14:31:44,393Z ERROR 41283136 genesis_utils.py:6898 Unable to create directory for CA info cmd: sudo mkdir -p /home/certs out: b'' err: b'Could not connect through control master'
2025-04-22 14:31:44,393Z ERROR 41283136 genesis_utils.py:6830 Unable to passwordless copy to other nodes, trying default password
2025-04-22 14:31:44,662Z INFO 41283616 genesis_utils.py:6886 Node <Future at 0x7f3bfdfd8ac0 state=finished returned bool> hasn't upgraded yet, using the legacy command
2025-04-22 14:31:44,662Z ERROR 41283616 genesis_utils.py:6889 Error when detecting if legacy path should be taken: b'hostname contains invalid characters\r\n'
2025-04-22 14:31:44,666Z INFO 41283136 genesis_utils.py:6886 Node <Future at 0x7f3bfdfd8970 state=finished returned bool> hasn't upgraded yet, using the legacy command
2025-04-22 14:31:44,666Z ERROR 41283136 genesis_utils.py:6889 Error when detecting if legacy path should be taken: b'hostname contains invalid characters\r\n'
2025-04-22 14:31:44,936Z ERROR 41283616 genesis_utils.py:6898 Unable to create directory for CA info cmd: sudo mkdir -p /home/certs out: b'' err: b'hostname contains invalid characters\r\n'
2025-04-22 14:31:44,936Z ERROR 41283616 genesis_utils.py:6836 Failed to set up password based connection with <Future at 0x7f3bfdfd8ac0 state=finished returned bool> to transfer certificates
2025-04-22 14:31:44,938Z ERROR 41283136 genesis_utils.py:6898 Unable to create directory for CA info cmd: sudo mkdir -p /home/certs out: b'' err: b'hostname contains invalid characters\r\n'
2025-04-22 14:31:44,938Z ERROR 41283136 genesis_utils.py:6836 Failed to set up password based connection with <Future at 0x7f3bfdfd8970 state=finished returned bool> to transfer certificates
2025-04-22 14:31:44,938Z ERROR 41283456 genesis_utils.py:6802 Failed to get certs to nodes [<Future at 0x7f3bfdfd8970 state=finished returned bool>, <Future at 0x7f3bfdfd8ac0 state=finished returned bool>]
2025-04-22 14:31:44,939Z ERROR 41283456 genesis_utils.py:6442 Failed to copy certs for [<Future at 0x7f3bfdfd8ac0 state=finished returned bool>, <Future at 0x7f3bfdfd8970 state=finished returned bool>]
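
For anyone who wants to skim the log above without reading every INFO line, here is a minimal Python sketch that counts the ERROR and WARNING entries per source location. The line layout in the regular expression is only inferred from the excerpt above, and the names LOG_LINE and summarize_problems are hypothetical helpers made up for this example; they are not part of any Nutanix tooling.

```python
import re
from collections import Counter

# Assumed line layout, inferred from the excerpt above (not an official format):
# <date> <time>Z <LEVEL> <thread id> <file>:<line> <message>
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2},\d{3})Z "
    r"(?P<level>[A-Z]+) (?P<thread>\d+) (?P<source>[\w.]+:\d+) (?P<message>.*)$"
)


def summarize_problems(lines, levels=("ERROR", "WARNING")):
    """Count log lines at the given levels, grouped by (level, source)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        # Lines that do not fit the assumed layout (e.g. wrapped output) are skipped.
        if match and match.group("level") in levels:
            counts[(match.group("level"), match.group("source"))] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the log above, used as sample input.
    sample = [
        "2025-04-22 14:30:27,695Z ERROR 40540768 node_manager.py:8169 Zookeeper mapping is unconfigured",
        "2025-04-22 14:30:27,705Z INFO 40540768 ipv4config.py:1059 Netmask is 255.255.255.0",
        "2025-04-22 14:31:44,285Z ERROR 41283616 ssh_client.py:671 Error connecting through control master",
        "2025-04-22 14:31:44,289Z ERROR 41283136 ssh_client.py:671 Error connecting through control master",
    ]
    for (level, source), count in summarize_problems(sample).most_common():
        print(f"{count:3d} {level:7s} {source}")
```

To run it against a saved copy of the log, pass an open file object to summarize_problems instead of the sample list. Grouping by source location makes it easier to see which failure repeats most often, for example the control master error versus the scp timeouts.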