Update manager woes

Had a problem this evening where an ESXi host would choke on the 6.5 U2 update. vSphere reported “The host returns esxupdate error code:15. The package manager transaction is not successful. Check the Update Manager log files and esxupdate log files for more details.” That led me to https://kb.vmware.com/kb/2030665

esxupdate.log had the Python errors listed in that KB. Quick fix, right? No, that would be too easy. I followed the easy instructions in that KB, where you just delete the /locker/packages/6.5.0 directory, then kicked off VUM and was presented with the same error. Tried the long fix: recreated the 6.5.0 structure from a good host. Again, same error.

Some digging led me to https://blog.definebroken.com/2017/07/28/patching-vsphere-esxi-to-6-5u1-failing-with-error-15-cause-ran-out-of-vfat-space-due-to-vsantrace-and-vsanobserver-files/

He had a problem with vsantrace files. I checked and that wasn’t my problem, but it did prompt me to watch vmkernel.log to see if there were any clues there.

Sure enough I was getting a bunch of out of space errors:

2018-08-11T04:28:18.037Z cpu23:124089)WARNING: VFAT: 313: VFAT volume mpx.vmhba32:C0:T0:L0:8 (UUID 597f645d-2327f2da-1218-246e963e79d0) is full.  (585696 sectors, 0 free sectors)

Did some du work and found that /store/var/core was taking up ~80MB:

[root@HOSTA1:/vmfs/volumes/597f645d-2327f2da-1218-246e963e79d0] du -h .
256.0K ./packages/var/db/locker/vibs
256.0K ./packages/var/db/locker/profiles
768.0K ./packages/var/db/locker
1.0M ./packages/var/db
1.3M ./packages/var
1.5M ./packages
80.2M ./var/core
80.4M ./var
256.0K ./epd-new
82.4M .

Sure enough, there was a zdump file in var/core.

A simple rm and then everything worked.
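The hunt above can be sketched as two small helpers. This is only a rough sketch: the function names are made up for illustration, the *zdump* filename pattern is an assumption based on the file I found, and I am assuming the du, sort, tail and find on your host behave like the usual ones.

```shell
# Hypothetical helpers, not part of ESXi.

# List the largest directories under a path (sizes in KB, biggest last).
biggest_dirs() {
    dir="${1:-.}"
    count="${2:-5}"
    du -k "$dir" 2>/dev/null | sort -n | tail -n "$count"
}

# Look for leftover core dump files; the name pattern is an assumption.
find_dumps() {
    find "${1:-.}" -type f -name '*zdump*'
}
```

For example, biggest_dirs /store 5 followed by find_dumps /store/var/core would have pointed straight at my problem. Look at what it finds before you rm anything.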

 

I didn’t catch this in my initial troubleshooting because the VFAT partition reported as 28% in use.

Since I was patching ESXi itself, the first step of the update is to delete and recreate the vmtools and floppy images vCenter uses for client OS installation. These are stored in the /store/packages/<ESXversion> directory. This freed a bit of space, but all the space on the partition was consumed when the update copied in the updated versions and packages for installation.

Sometimes it is just a different simple fix.


Unassociated vSAN objects

Since mid-2017 I have been aware of an issue with VMware Horizon where files are left behind when a VM is deleted. When Horizon creates a new machine with the same name, a new folder is created with _1 appended to the end (or _2, _3, … if the machine has been deleted multiple times). It seems this has been an issue with Horizon since v6.0, and VMware has a KB article with a workaround (KB2108928).

That workaround isn’t great: it is manual, but it works in a traditional storage environment. An admin would console/ssh into an ESXi host, issue an rm -f command on the offending folders, and be done. My virtual desktop VMs reside on a vSAN. Within a vSAN all the folders and vmdk files are objects; if I were to rm a folder on a vSAN datastore it would not delete the underlying objects, and they would still consume space on the disks. The rm command is not vSAN aware.

I could load up the vSphere web console and delete the directories individually. I could even use the HTML5 interface and select multiple folders for deletion simultaneously. In either case I need to check each individual folder to verify it is no longer in use.

There has to be a better way.

 

Thankfully the vSAN engineers have a command that will list the status of every object stored on the vSAN. This command is aware of which VM is associated with each object. Since the source VM has been deleted, the remaining objects will be unassociated. Be careful with unassociated objects: any template, ISO, txt, ova, etc. file that you have placed on your vsanDatastore and that is not mounted or in use by a VM will also be in the unassociated object list.

To get this list, log in to RVC on your vCenter appliance and execute:

vsan.obj_status_report . --print-uuids --print-table

(see https://blogs.vmware.com/vsphere/2014/07/managing-vsan-ruby-vsphere-console.html for info about RVC)

Initial header output:

[Screenshot: 20180713-vsan-obj_status_report-header]

Unassociated objects are below the list of VMs:

[Screenshot: 20180713-vsan-obj_status_report-unassociated]

We copy the list of unassociated objects into the clipboard.

 

Then on an ESXi host in the vSAN cluster we create a new file with vi and paste the contents in and save as unassociated.txt.

(I have not found a more elegant way of doing this; please let me know in the comments if you do. The esxcli vsan namespace commands are not aware of object and VM association.)

We now have a file that has your object UUIDs and some display artifacts.

We do some text processing to remove those artifacts:

cat unassociated.txt | awk '{print $2}' > UUID.txt

 

Now we have a file with just the UUID of the unassociated objects.
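If the pasted text does not line up neatly into columns, matching on the shape of a UUID is a bit more forgiving than picking the second field. A minimal sketch, assuming the object UUIDs are the usual 8-4-4-4-12 hex form and that grep on your host supports -o and -E (extract_uuids is just a name I made up):

```shell
# Hypothetical helper: pull anything that looks like a UUID out of the
# pasted RVC table and drop duplicates. sort -u also reorders the list,
# which does not matter for the later steps.
extract_uuids() {
    grep -oE '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}' "$1" | sort -u
}
```

Usage would be extract_uuids unassociated.txt > UUID.txt, with the rest of the steps unchanged.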

Time to translate the UUIDs into something we can use to filter and narrow the list to just the objects we would like to remove. We use the objtool command to loop through UUID.txt, get the metadata on each object, and output that to another file:

cat UUID.txt | while read UUID ; do echo -e "\nUUID: $UUID" ; /usr/lib/vmware/osfs/bin/objtool getAttr -u $UUID | grep -i 'friend\|class\|path'; done > uuid_status.txt
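The same loop can be written as a small function that is a little more defensive: it quotes the UUID, skips blank lines, and keeps objtool from reading the UUID list on stdin. This is only a sketch; collect_attrs and the OBJTOOL variable are names I made up, and the objtool path and getAttr -u call are the same ones used above.

```shell
# Path to objtool as used above; overridable, which is handy for testing.
OBJTOOL="${OBJTOOL:-/usr/lib/vmware/osfs/bin/objtool}"

# Hypothetical helper: print a "UUID:" header plus the friendly name,
# class and path lines for every UUID listed in the file given as $1.
collect_attrs() {
    while read -r uuid; do
        [ -n "$uuid" ] || continue          # skip blank lines
        printf '\nUUID: %s\n' "$uuid"
        "$OBJTOOL" getAttr -u "$uuid" < /dev/null | grep -i 'friend\|class\|path'
    done < "$1"
}
```

Usage would be collect_attrs UUID.txt > uuid_status.txt.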

 

uuid_status.txt now contains four lines per object: UUID, Friendly name, Object Class, and Path:

[Screenshot: 20180713-uuid_status.txt]

That’s not too useful if we want to filter this in a meaningful way.

Let’s make a csv we can ingest into something else (or filter further):

awk '/UUID: /{if (x)print x;x="";}{x=(!x)?$0:x","$0;}END{print x;}' uuid_status.txt |sed 's/UUID: //g' |sed 's/User friendly name://g' |sed 's/Object class: //g' | sed 's/Object path: //g' | sed 's/,$//g' > uuid_status.csv

Thanks to http://www.theunixschool.com/2012/05/awk-join-or-merge-lines-on-finding.html Example 4 for the awk command.

This will create a csv file with each object taking up one line.
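If the one-liner is hard to read, the same join can be done in a single awk pass. This is a sketch under the assumption that every attribute line looks like "Label: value", as in my uuid_status.txt; status_to_csv is a made-up name. Unlike the sed chain above, it also strips the space after every label.

```shell
# Hypothetical helper: turn the multi-line records in uuid_status.txt
# into one CSV line per object (UUID,friendly name,class,path).
status_to_csv() {
    awk '
        /^UUID: / {                       # a new record starts here
            if (row != "") print row
            row = substr($0, 7)           # keep only the UUID itself
            next
        }
        /^[[:space:]]*$/ { next }         # ignore blank separator lines
        {
            sub(/^[^:]*:[[:space:]]*/, "")  # drop the "Label:" prefix
            row = row "," $0
        }
        END { if (row != "") print row }
    ' "$1"
}
```

Usage would be status_to_csv uuid_status.txt > uuid_status.csv. Only the first colon on each line is treated as the label separator, so colons inside the object path are left alone.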

Further filtering needs to be done to only have the UUIDs of objects we truly wish to delete.

 

Positive match filtering:

Since all my VMs are created by Horizon they follow a naming pattern, and the friendly name and path are both based off the VM name, so I have an easy job filtering.

In my case the objects all contain VDI- at the beginning of the namespace or the filename:

grep ',VDI-\|/VDI-' uuid_status.csv > uuid_to_delete.csv

 

Negative match filtering:

If I didn’t have the luxury of positive match filtering I would have to generate my list based on exclusionary patterns.

For example, my App Volumes vmdks are unassociated, so I would filter out the appVolumes folder as well as the apps & writable template folders. If I had an ISO folder I could exclude it. And I would always want to exclude the .vsan.stats object:

grep -v appVolumes uuid_status.csv | grep -v _templates | grep -v ISOs | grep -v vsan.stats > uuid_to_delete.csv

 

(Note: I’m not including the leading / for folder names. If you use “/foldername” as your grep filter it will not match the namespace object. Deleting the namespace object removes vCenter’s access to the object.)

To be more confident, combine both positive and negative filtering:

grep ',VDI-\|/VDI-' uuid_status.csv | grep -v appVolumes | grep -v _templates | grep -v ISOs | grep -v vsan.stats > uuid_to_delete.csv
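The combined filter can also be wrapped in a function so the include and exclude patterns sit in one place. A minimal sketch using the same patterns as above; filter_candidates is a name I made up, and your patterns will differ.

```shell
# Hypothetical helper: keep rows that match the VDI- naming pattern,
# then drop anything that touches a folder or object we must keep.
filter_candidates() {
    grep -E ',VDI-|/VDI-' "$1" \
        | grep -Ev 'appVolumes|_templates|ISOs|vsan\.stats'
}
```

Usage would be filter_candidates uuid_status.csv > uuid_to_delete.csv. Comparing wc -l on uuid_status.csv and uuid_to_delete.csv is a quick sanity check on how much the filter removed.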

 

I like to do a visual check of my list, so I load up the csv in Excel and peruse the contents to be sure.

Once we have the final list of objects to delete in uuid_to_delete.csv, we need to remove everything except the UUID:

cat uuid_to_delete.csv | awk -F , '{print $1}' > uuid_to_delete.txt

 

Now comes the point of no return. I suggest verifying your backups prior to this step.

Deleting the objects:

cat uuid_to_delete.txt | while read UUID ; do /usr/lib/vmware/osfs/bin/objtool delete -u $UUID ; done
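Since there is no undo, a dry-run wrapper around the delete loop may be worth the extra typing. This is a sketch, not a tested tool: delete_objects, OBJTOOL and DRY_RUN are names I made up, and the objtool delete -u call is the same one used above. By default it only prints what it would delete.

```shell
# Path to objtool as used above; overridable, which is handy for testing.
OBJTOOL="${OBJTOOL:-/usr/lib/vmware/osfs/bin/objtool}"

# Hypothetical helper: delete every object listed in the file given as $1.
# Nothing is deleted unless DRY_RUN is explicitly set to 0.
delete_objects() {
    while read -r uuid; do
        [ -n "$uuid" ] || continue          # skip blank lines
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would delete: $uuid"
        else
            "$OBJTOOL" delete -u "$uuid" < /dev/null
        fi
    done < "$1"
}
```

Run delete_objects uuid_to_delete.txt first and read the output, then DRY_RUN=0 delete_objects uuid_to_delete.txt when you are sure.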

 

Yes, a few of these steps could be combined. I separated them for instruction and to reduce unintended consequences.