Written by Philip L Yuson
- Ensure that SLAs are met at reasonable cost.
- Measure achieved availability
- Continuous improvement of availability
Availability means that a service has little downtime and recovers rapidly when it does fail.
Reliability is the ability of a service to stay available for an agreed period without interruption. It aims to prevent downtime through reliable components, resilience (the ability to continue operating despite the failure of one or more components), and preventive maintenance.
Maintainability and recoverability cover the activities that keep services in operation and restore them after a failure.
Serviceability deals with services supplied by external service providers.
Availability Management Activities
- Availability service levels are taken from Service Level Management.
- Planning determines the availability requirements. It also includes designing for availability and recoverability.
- Monitoring measures the actual availability of the service. Measures include:
  - Mean time to repair (MTTR) is the average time from when the service becomes unavailable until it is available again. It is also referred to as downtime.
  - Mean time between failures (MTBF) is the average time during which the service is available between failures. It is also referred to as uptime.
  - Mean time between system incidents (MTBSI) is the average time from the start of one incident to the start of the next (MTTR + MTBF).
- Reporting covers the availability measures (MTTR, MTBF, MTBSI), overall uptime and downtime, the number of faults, and, where the availability SLA was not met, the reasons why.
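
The monitoring measures above can be illustrated with a small calculation. The sketch below is only an illustration, not part of any ITIL tool: the function name `availability_metrics` and the outage log format (a list of down/up timestamps) are assumptions made for this example. It also assumes the outages do not overlap and all fall inside the reporting period, and it takes MTBF as total uptime divided by the number of outages, which is a simplification.

```python
from datetime import datetime, timedelta


def availability_metrics(period_start, period_end, outages):
    """Compute MTTR, MTBF, MTBSI and availability for one reporting period.

    outages is a hypothetical log format: a list of (down_at, up_at)
    datetime pairs, assumed non-overlapping and inside the period.
    """
    if not outages:
        raise ValueError("at least one outage is needed to compute the means")

    total = period_end - period_start
    # Total downtime is the sum of all outage durations.
    downtime = sum((up - down for down, up in outages), timedelta())
    uptime = total - downtime
    count = len(outages)

    mttr = downtime / count   # average downtime per outage
    mtbf = uptime / count     # average uptime per outage (simplified)
    mtbsi = mttr + mtbf       # MTBSI = MTTR + MTBF
    availability_pct = uptime / total * 100

    return {
        "mttr": mttr,
        "mtbf": mtbf,
        "mtbsi": mtbsi,
        "availability_pct": availability_pct,
        "faults": count,
    }


# Example: a 30-day period (720 hours) with three outages of 2, 1 and 3 hours.
start = datetime(2024, 1, 1)
end = start + timedelta(days=30)
log = [
    (datetime(2024, 1, 5, 8), datetime(2024, 1, 5, 10)),
    (datetime(2024, 1, 12, 14), datetime(2024, 1, 12, 15)),
    (datetime(2024, 1, 20, 1), datetime(2024, 1, 20, 4)),
]
report = availability_metrics(start, end, log)
for name, value in report.items():
    print(name, value)
```

In this example the service was down for 6 of 720 hours, so MTTR is 2 hours, MTBF is 238 hours and MTBSI is 240 hours. The resulting availability percentage can then be compared with the availability target in the SLA.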