I have two Azure virtual machines in different cloud services. Both are size D12 and both are configured identically. The local SSD D: drives on both deliver tens of thousands of IOs/sec. The difference is in C: drive performance.
Server A:
C:\>"c:\Program Files (x86)\SQLIO\sqlio.exe" -kW -t4 -s10 -f32 -b8 -BN
sqlio v1.5.SG
...
IOs/sec: 289.76
MBs/sec: 2.26
Not great, but I have the SSD and attached data disks for high-performance I/O. However, compare that to the other, identical VM:
Server B:
C:\>"c:\Program Files (x86)\SQLIO\sqlio.exe" -kW -t4 -s10 -f32 -b8 -BN
sqlio v1.5.SG
...
IOs/sec: 27.65
MBs/sec: 0.21
Terribly slow! That is about 10x slower than the other, identical VM, and it confirms what I had already suspected, since simple OS operations were taking forever. I also see errors in the event log stating that OS operations are taking much longer than expected and that a hardware failure is likely.
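As a sanity check on the figures from the two sqlio runs above, here is a minimal Python sketch that derives MB/sec from IOs/sec and computes the slowdown ratio. It assumes that -b8 means an 8 KB I/O size and that 1 MB is 1024 KB; the function names are my own and are not part of sqlio.

```python
# Figures copied from the two sqlio runs above.
BLOCK_KB = 8  # assumption: -b8 means an 8 KB I/O size


def throughput_mb_per_sec(iops, block_kb=BLOCK_KB):
    """Convert IOs/sec at a fixed block size into MB/sec (1 MB = 1024 KB)."""
    return iops * block_kb / 1024


def slowdown(fast_iops, slow_iops):
    """Return how many times slower the slow run is than the fast run."""
    return fast_iops / slow_iops


server_a_iops = 289.76
server_b_iops = 27.65

print(f"Server A: {throughput_mb_per_sec(server_a_iops):.2f} MB/s")
print(f"Server B: {throughput_mb_per_sec(server_b_iops):.2f} MB/s")
print(f"Server B is {slowdown(server_a_iops, server_b_iops):.1f}x slower")
```

The derived throughput agrees with the reported MBs/sec to within rounding on both servers, so the two runs used the same block size and the gap is in the I/O rate itself, not in how the test was invoked.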
Even though I don't have to manage my own hardware when using Azure, all VMs ultimately run on hardware, and hardware is subject to failure. So my question is:
What should I do when a VM is running on bad hardware?
Additionally, why isn't Microsoft Azure monitoring the underlying hardware and either doing something about it or at least notifying me, especially for disk I/O failures, which should be easy to spot?