21 November – VMware engineering has provided a fix for this issue. The results are posted at the end of this post.

17 October – This issue is currently under investigation by VMware and the SR has been escalated to VMware engineering. The workaround at this time is to not use the provisioning VMkernel interface and TCP/IP stack.

Part of a solution for a customer was the ability to migrate VMs between vCenter Servers. Furthermore, company policy dictates that the management network is used for management purposes only.

By default, data for VM cold migration, cloning, and snapshots is transferred through the management network. This traffic is called provisioning traffic. On a host, you can dedicate a separate VMkernel interface to the provisioning traffic, for example, to isolate this traffic on another VLAN.

To comply with this policy, the design used a dedicated provisioning VMkernel interface on the provisioning TCP/IP stack to isolate the provisioning traffic on a separate VLAN.
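For reference, this kind of configuration can also be applied from the ESXi shell with esxcli. The commands below are a minimal sketch rather than the actual design: the interface name (vmk2), the standard vSwitch port group name (Provisioning) and the netmask are assumptions, and only the 192.168.13.x subnet matches the addresses shown later in this post.

# Create the provisioning TCP/IP stack instance (skip this if it already exists on the host)
esxcli network ip netstack add -N "vSphereProvisioning"

# Create a VMkernel interface on the provisioning stack, backed by an existing port group
esxcli network ip interface add -i vmk2 -p "Provisioning" -N "vSphereProvisioning"

# Give the new interface a static IPv4 address on the provisioning VLAN
esxcli network ip interface ipv4 set -i vmk2 -I 192.168.13.10 -N 255.255.255.0 -t static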

When validating the design we ran into an issue with Cross vCenter vMotion. If the VM was powered on, the x-vCenter vMotion completed successfully, but when the VM was powered off, the x-vCenter vMotion failed with the error ‘Cannot connect to host’.

[Screenshot: vSphere Web Client error – Cannot connect to host]

Because the failure only happened when the VM was powered off, we immediately suspected the provisioning VMkernel interface and TCP/IP stack. We double-checked the configuration and verified that the VMkernel interfaces could reach each other.

[Screenshots: VMkernel configuration on esxa01 and esxb01]

[Screenshots: firewall configuration on esxa01]

[Screenshots: ping tests from esxa01 to esxb01 and from esxb01 to esxa01]
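The same reachability test can be run from the ESXi shell with vmkping. A sketch, assuming vmk2 is the provisioning VMkernel interface and 192.168.13.20 is the provisioning address of the other host (example values, the actual addresses are in the screenshots above):

# Ping the other host's provisioning VMkernel interface over the provisioning TCP/IP stack
vmkping ++netstack=vSphereProvisioning -I vmk2 192.168.13.20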

After this we examined the vpxa log on the source host and found connection errors between the provisioning VMkernel interfaces on the source and destination hosts.

[Screenshots: connection errors in the vpxa log on esxa01]

Because pinging between the VMkernel interfaces worked, we wanted to verify whether the provisioning network packets actually reached the VMkernel interfaces. To verify this we used the pktcap-uw tool, which is included by default in ESXi 5.5 and later versions. The pktcap-uw tool is an enhanced packet capture and analysis tool that can be used in place of the legacy tcpdump-uw tool.

With the pktcap-uw tool we generated receive and transmit captures on the provisioning VMkernel interfaces of both the source and destination hosts. The capture files were analyzed with Wireshark to verify whether the provisioning network packets were exchanged between the hosts.
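The captures themselves can be generated along these lines. This is a sketch, assuming vmk2 is the provisioning VMkernel interface and using example output paths:

# Capture traffic received on the provisioning VMkernel interface (--dir 0 = receive)
pktcap-uw --vmk vmk2 --dir 0 -o /tmp/vmk2_rx.pcap
# Capture traffic transmitted by the provisioning VMkernel interface (--dir 1 = transmit)
pktcap-uw --vmk vmk2 --dir 1 -o /tmp/vmk2_tx.pcap

The resulting .pcap files can then be copied off the host and opened in Wireshark, for example with a display filter such as tcp.port == 902 to isolate the NFC traffic.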

[Screenshot: receive packet capture on esxb01]

The picture above is taken from the receive packet capture on the destination host. As you can see, packets are received from the IP address 192.168.13.10 on port 902. This is the IP address of the provisioning VMkernel interface on the source host, and port 902 is the port used for provisioning (NFC) traffic.

Because traffic was flowing between the VMkernel interfaces, we next checked whether the NFC service was listening for connections on the provisioning VMkernel interfaces.

[Screenshots: connection tests between esxa01 and esxb01 on the provisioning VMkernel interfaces]

The NFC service was not listening on the provisioning VMkernel interfaces on either host. To verify whether the NFC service was listening at all, we performed the same test on the management VMkernel interfaces.

[Screenshots: connection tests between esxa01 and esxb01 on the management VMkernel interfaces]

This time the test was successful. It looks like the NFC service only accepts incoming connections on the management VMkernel interfaces. To investigate this further, we opened an SR with VMware.
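For reference, a check like this can be done from the ESXi shell. A sketch, assuming nc is used to test a TCP connection to port 902 on the other host (the destination addresses below are examples, not taken from the screenshots) and esxcli is used on the receiving host to see where port 902 is actually listening:

# From the source host: test TCP connectivity to port 902 on the destination host
nc -z 192.168.13.20 902   # provisioning VMkernel address (example)
nc -z 192.168.1.20 902    # management VMkernel address (example)

# On the destination host: list open connections/listeners and filter on port 902
esxcli network ip connection list | grep 902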

The SR was transferred to VMware engineering, and we had to wait a very long time.

The fix VMware engineering provided us was to increase the maximum memory that can be used by the NFC process (nfcd) on the vSphere hosts. The commands were (the first looks up the scheduler group ID of the nfcd process, the second raises its memory limit to 16 MB):

grpID=$(vsish -e set /sched/groupPathNameToID host vim vmvisor nfcd | cut -d' ' -f 1)
vsish -e set /sched/groups/$grpID/memAllocationInMB max=16

And to check the result:

vsish -e get /sched/groups/$grpID/memAllocationInMB

[Screenshot: the commands executed on an ESXi host]

After configuring the hosts with this, I performed a new x-vCenter vMotion and this time it was successful. We decided to run a few more between different hosts, and all of them completed successfully.

[Screenshot: successful cross vCenter vMotion]
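One caveat worth noting, as an assumption on my part rather than something covered in the SR: vsish changes are applied at runtime and normally do not survive a host reboot. If the setting needs to persist, one option is to re-apply the commands at boot via /etc/rc.local.d/local.sh:

# Added to /etc/rc.local.d/local.sh (before the final "exit 0") to re-apply the NFC memory limit at boot
grpID=$(vsish -e set /sched/groupPathNameToID host vim vmvisor nfcd | cut -d' ' -f 1)
vsish -e set /sched/groups/$grpID/memAllocationInMB max=16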

Many thanks to the VMware support representative for keeping the SR open and giving us updates on the progress!


3 thoughts on “Cross vCenter vMotion – Cannot connect to host”

  1. Hi, we are experiencing a similar issue where the “online vMotion” is failing when attempting it via XvC.

    When trying to increase the memory as described in the article, I get the below error.
    VSISHCmdSet():Set failed: Failure

    Any idea why we would get that error? We are running ESXi 6.0 U3.

    [root@ESXi1:~] grpID=$(vsish -e set /sched/groupPathNameToID host vim vmvisor nfcd | cut -d' ' -f 1)
    [root@ESX1:~] vsish -e set /sched/groups/$grpID/memAllocationInMB max=16
    VSISHCmdSet():Set failed: Failure
    [root@ESX1:~] vsish -e get /sched/groups/$grpID/memAllocationInMB
    memsched-allocation {
    min:1048096
    max:1048096
    shares:1073741823
    minLimit:1048096
    units:units: 3 -> mb
    }

    Thanks in advance.


    1. Hi, I’ve only tested this with ESXi 6.0 Update 2 and not with ESXi 6.0 Update 3. Maybe they changed something in the update that breaks this. The release notes do not mention anything about it.

