Let’s say a host in your pool won’t restart a VM and freezes halfway (that wonderful yellow icon). If you hit the VM’s console tab, it might be blank. If you hit the console tab of the host, it might also be blank. If you SSH in, it may connect, but you can’t pass any xe commands; they just sit there. If you attempt to migrate or stop a VM, it hangs. The host is essentially frozen, but VMs are still running on it just fine.
This is all a pretty good sign the XAPI service on the host is hung up. XAPI (the “XenAPI” toolstack) is the XenServer management toolstack, and it controls pretty much everything at the host layer. If it is hosed, XenCenter can’t talk to the host and you probably won’t be able to pass any xe commands. Quick way to troubleshoot this:
1. SSH into the host with the issue.
2. Type:
df -h
which will show the disk space usage on the file system. The “-h” switch prints sizes in human-readable units (K, M, G) instead of raw block counts. Much easier to read. We need to check the root partition and see if it is full. This is typically 4 GB and can be filled up by logs, which may cause the XAPI service to stop. If the XenServer root disk is full, you will probably see the host drop out of XenCenter because XAPI is stopped. You won’t be able to restart the XAPI service until you free up some space. Here is an example of the root being 100% full:
Extra tip, once you log in to one XenServer host, you can check other hosts remotely without having to SSH into each one in a different terminal. Just type:
ssh
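If you would rather script the disk check than eyeball the df output, here is a minimal sketch. The function names and the 90 percent default are my own choices for illustration, not anything XenServer ships or recommends. It uses `df -P` because the POSIX output format keeps each filesystem on a single line, which makes the percentage column easy to pull out.

```shell
# Extract the Capacity column (digits only) from `df -P` output on stdin.
# Line 1 is the header, line 2 is the filesystem we asked about.
parse_usage_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print the usage percentage of the filesystem holding a path (default: /).
root_usage_pct() {
    df -P "${1:-/}" | parse_usage_pct
}

# Succeed when a percentage is at or above a threshold.
# The default of 90 is an arbitrary example value (assumption).
is_nearly_full() {
    [ "$1" -ge "${2:-90}" ]
}
```

For example, `is_nearly_full "$(root_usage_pct /)" 90 && echo "dom0 root is nearly full"` prints a warning once the root partition reaches 90 percent.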
3. If the root is full like above, type:
cd /var/log
then
ls
to list the logs. Type:
du -ksh *
to list the logs with the sizes. If you find one that is too big, delete it:
rm
From here you can skip ahead below to step 6 and try restarting XAPI.
Also, you might want to consider moving your logs off to a different volume. If you fill your dom0 root, you’re basically hosing the XenServer. Citrix has a good article on how to move the /var/log directory to a different volume here:
http://support.citrix.com/article/CTX130245
or retain fewer logs by editing logrotate.conf here:
http://support.citrix.com/article/CTX131619
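For step 3, it can be quicker to have the shell rank the logs by size than to scan the whole listing. This is a rough sketch; `largest_logs` is a name I made up for the example. It uses `du -k` rather than `-h` so that `sort -n` can order the sizes numerically.

```shell
# List the N largest entries directly under a directory, biggest first.
# Sizes are printed in kilobytes so that sort -n can compare them.
# Defaults: /var/log and the top 10 entries.
largest_logs() {
    dir=${1:-/var/log}
    count=${2:-10}
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

Running `largest_logs /var/log 5` shows the five biggest files or subdirectories, which are the ones worth looking at before you delete anything.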
4. If your root is not full, the next thing you probably want to do is disable HA. You can do this in the XenCenter console or you can just type:
xe pool-ha-disable
or if you want to disable HA on a host (you’ll have to run this on each host though):
xe host-emergency-ha-disable force=true
5. After disabling HA, restart the toolstack:
xe-toolstack-restart
This will disconnect all the hosts in the pool in XenCenter, but don’t panic. Give it 10-20 seconds. Once the toolstack is restarted, the hosts will all reconnect to XenCenter. All pending actions like reboots, migrations, etc. will stop when you restart the toolstack, so you have a clean slate.
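If you are doing this from a script instead of watching XenCenter, you can poll until xe answers again. A minimal sketch follows; `wait_until` is a hypothetical helper I wrote for this post, and using `xe host-list` as the "is XAPI responding" probe is my assumption. If XAPI is truly hung, the xe call itself may block, so treat this as a convenience and not a guarantee.

```shell
# Retry a command once per second until it succeeds or attempts run out.
# Usage: wait_until ATTEMPTS COMMAND [ARGS...]
wait_until() {
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        # Discard the command's output; only its exit status matters here.
        "$@" >/dev/null 2>&1 && return 0
        attempts=$((attempts - 1))
        # Pause between tries, but not after the final failed attempt.
        [ "$attempts" -gt 0 ] && sleep 1
    done
    return 1
}
```

For example, `xe-toolstack-restart; wait_until 30 xe host-list && echo "XAPI is answering again"` waits up to roughly 30 seconds for the toolstack to come back.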
6. You should be able to console into your host with the issues now. Type:
service xapi status
and see if it is running. If you want to see how taxed XAPI is, type:
top
to see all the running processes. If XAPI is taking up 40% CPU or more, that is a good indication something is hung up on it.
If XAPI is not running or is very taxed, type:
service xapi restart
If it hangs at “Stopping xapi” or “Starting xapi”, you may need to kill the process.
Type:
kill
using the process ID from when you ran “service xapi status” or “top”. Then run service xapi status again to verify that all xapi processes have stopped. Then you can type:
service xapi restart
again if it didn’t automatically try and start already. Eventually it will say:
Starting xapi: ....start-of-day complete. [ OK ]
and you should see the host pop back in your XenCenter console. If you go back and run top, xapi should be taking up around 1% or less CPU.
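To check the CPU figure without sitting in top, something like the sketch below works. The function names are made up for this example, and the default threshold of 40 is just the rule of thumb from this post. One caveat: ps reports CPU averaged over the life of the process, so it will not match the live number in top exactly.

```shell
# Sum the CPU percentage of every process whose command name is "xapi".
# Expects `ps -eo pcpu,comm` style input on stdin: percent, then name.
sum_xapi_cpu() {
    awk '$2 == "xapi" { total += $1 } END { printf "%.1f\n", total }'
}

# Print the current summed CPU percentage for xapi on this host.
xapi_cpu() {
    ps -eo pcpu,comm | sum_xapi_cpu
}

# Succeed when a percentage meets the threshold (default 40).
# awk does the comparison because the values can be fractional.
cpu_is_high() {
    awk -v pct="$1" -v limit="${2:-40}" 'BEGIN { exit !(pct + 0 >= limit + 0) }'
}
```

For example, `cpu_is_high "$(xapi_cpu)" && echo "xapi looks busy or stuck"`.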
You can type:
xe task-list
to see all the running tasks which shouldn’t be much at this point. Don’t forget to re-enable HA after you’re done. Hope this helps someone.
kb
January 19, 2012 at 7:41 PM
Awesomeness! Thanks for this post, it was exactly what I needed.
Brett
January 27, 2012 at 12:22 AM
Great article. thanks!
Robb
February 24, 2012 at 2:45 PM
Thanks for posting this. I have, what I am sure, is a disk full issue, but I’m unable to ssh or console into the server so I can’t remove any logs to free up disk space.
Any thoughts how I can move forward? Will logrotate free up any space when it runs again at 4am, or will it fail as it doesn’t have anywhere to compress the files to?
And what are the reboot options? Good idea, bad idea? Is there a safe mode in XenServer 6 that might help?
Thanks in advance for any help/suggestions/thoughts…
-r
Cliff Hogan
March 3, 2012 at 7:37 PM
Excellent article. Although while encountering the issue I diagnosed it correctly based on http://support.citrix.com/article/CTX128316, this blog post provides what is missing from the Citrix KB article, i.e. how to start XAPI, which is essential if the host, the pool master in this case cannot be restarted. The kill command was the one that saved the day
Enzo
September 17, 2012 at 2:23 PM
As you said, you won’t be able to issue xe commands, so you won’t be able to stop HA in step 4. I have found that you can sometimes kill the stunnel processes, sometimes those are defunct and then you can restart xapissl and then do a tool stack restart:
killall -9 xapi
service xapissl restart
xe-toolstack-restart
Back in business.
alyami
December 26, 2012 at 7:19 AM
Thanks a million, great help
Jonathan James
March 20, 2013 at 3:04 AM
Excellent post! Many many thanks for rescuing me!
Frank
June 20, 2013 at 4:17 AM
It solves my problem.
Thanks a lot.
Anu Skariah
May 20, 2014 at 12:03 PM
Thanks Man. Awesome article.. Solved my problem.
Bill
August 15, 2014 at 10:15 AM
Works great, thanks
Danie vi
September 13, 2014 at 1:53 AM
How can we extend the root disk ?
Have any solutions ?
kasch
October 6, 2014 at 3:33 AM
Great! Many many thanks for rescuing me!
Kelly M
May 1, 2015 at 10:36 AM
You, Sir, are pure awesomeness! I was quite perplexed as to how my server showed to be powered off but yet my VM’s were still running. The logs were not full, but the xapi service had stopped/failed and I couldn’t figure out how to get to the server to do any maintenance. The ssh access, then toolstack reset and restarting xapi solved the issue.