VSAN 6.2 UPGRADE ERROR

Update 30 March – VMware have released a KB article with steps how to solve this problem. VMware will be providing a permanent fix in due course. In the meantime they are providing a  script that will detect stranded objects, broken disk chains and objects with a CBT lock. Where possible, the script will then ask whether you want the issue to be fixed for you, align the objects, and bring everything to interim version 2.5. This will then allow the upgrade of the on-disk format to proceed to V3. All steps (as well as the script) can be found in the KB articles listed below.

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2144881

For a detailed description of the steps in the KB article Cormac Hogan has written a blog post.

http://cormachogan.com/2016/03/31/vsan-6-2-upgrade-failed-realign-objects/

With the announcement of GA of vSphere 6 Update 2 it was time to upgrade our lab environment to vSphere Update 2 and VSAN 6.2. Upgrading the PSC, VCSA and vSphere hosts went easily with no problems. What made me smile was that the VCSA upgrade 70% stuck GUI issue was fixed as well! Not only did VMware implement new features into Update 2 but also a lot off bug fixes were applied.

But now for the serious part, upgrading to VSAN 6.2. When you have updated your vCenter and vSphere hosts to Update 2 it is possible to upgrade your VSAN datastore. This is as easy as clicking one button. At least, it should be.. not for our lab environment 😦

The lab enviroment has been build when vSphere 6 GA was available and in the time after this it has been upgraded multiple times, used a lot and in occasion CPR had to be performed to get it back running.

All this made the automatic upgrade to VSAN 6.2 return with the error: “Failed to realign following VSAN objects”.

vsan_62_upgrade_failure

Luckily, after playing with the lab environment  I know my way around checking the current status of  VSAN. The health check plugin in the Web Client didn’t show any errors so I used the RVC on the vCenter Server Appliance to dig a little deeper.

I ran the  ‘vsan.check_state’ command the verify the status of the VSAN cluster. As shown in the picture below no immediate problems were found.

vsan_check_state

After this I decided to check the status of one of the objects presented in the error message with the ‘vsan.object_info’ command. The object presented in the error is a VMDK of the PSC. I random checked some other objects presented in the error and almost all of them where VMDK’s of VM’s on the VSAN datastore.

vsan_object_info

I knew it was possible to remove these objects from the VSAN datastore but since it were VMDK’s and for instance not folder or VM swap files this was not possible without breaking the VM. If you remove the object it will not be available anymore!

The objects that represented an VM swap file could be removed. I powered off the VM and logged on with SSH to a vSphere host in the VSAN cluster. On the host i used the ‘objtool’ command to remove the object.

esx_shell_remove_object.PNG

But after removing the objects I still had the problem that the remaining objects represented VMDK’s. Since we do not have any other storage in our lab environment then VSAN it was not possible to Storage vMotion alle these VM’s to different datastores. We use the default VSAN storage policy with an FTT configuration of ‘1’. With this configuration it is possible to survive 1 failure, eg disk group or host. With this in mind I decided to perform the upgrade to VSAN 6.2 manually.

When you create a new diskgroup it will automatically be created with the VSAN 6.2 on-disk format. Since I did not have any new disks I deciced to remove the disk groups one by one and recreate new disk groups with disks from the removed disk groups. Actually, this is the same as the automatic upgrade does!

When removing the disk group I decided to use the option to do a full data migration of the data on the disk group to another disk group. I think it should also be possible to skip this and let VSAN handle this as a disk group failure. The data will be resynchronised after the disk group has been removed, thus resulting in lower availability of the data for a period of time.

After a while I managed to upgrade all the disk groups to the new on-disk format and able to use the new VSAN 6.2 features. As it is a lab setup, we only have a hybrid VSAN config so not all new features are supported (deduplication, erasure coding).

Advertisements

2 thoughts on “VSAN 6.2 UPGRADE ERROR

    • No, I haven’t. Because it is our lab environment we do not have to ability to create a SR with VMware. It’s all licensed under NFR.
      Also I wanted to try to fix it myself to better understand the problem.

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s