To improve your service, track the fundamentals, not the symptoms and side effects.
Measure the outcomes that you are ultimately aiming for.
Do you want to restore service within 1 hour, or do you want the service to never actually go down?
Do you want to respond to incidents within 15 minutes, or do you want to have no incidents in the first place?
Favor the proactive over the reactive. Track the metrics that drive your goals. Monitor the resulting side effects and symptoms by exception. When the fundamentals are off track, address the causes, not the side effects.
[Originally published as part of the 99words project]