Hanemann, A., Sailer, M., Schmitz, D. (2005):
A Framework for Failure Impact Analysis and Recovery with Respect to Service Level Agreements
In today's IT service market, customers urge providers to guarantee quality of service (QoS),
with the guarantees laid down in Service Level Agreements (SLAs). To satisfy customers and to avoid
penalties, service providers have to ensure that the agreed SLAs are met. They therefore need
to deal effectively with resource failures that could endanger the SLAs by
affecting the provided services. The recovery effort should be chosen
in proportion to the expected SLA violation costs.
In this paper we present a framework to automatically determine the impact of resource
failures with respect to services and service level agreements. We achieve this by monitoring the
service quality from inside and outside the service provider and also by incorporating information
about the current and expected future service usage. The expected costs of the resource failures
are assessed to select an appropriate recovery alternative. Beyond this short-term perspective,
the impact analysis can also be employed to identify critical resources and to improve
service provisioning.
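
The cost-based choice of a recovery alternative can be illustrated with a small sketch. It is not the framework from the paper: the names RecoveryAlternative, expected_sla_cost and select_recovery, the linear penalty per hour, and the grace period are all assumptions made for illustration. The sketch simply picks the alternative with the lowest sum of recovery effort and expected SLA penalty.

```python
from dataclasses import dataclass


@dataclass
class RecoveryAlternative:
    # Hypothetical description of one way to recover from a resource failure.
    name: str
    effort_cost: float     # assumed cost of carrying out the recovery
    recovery_hours: float  # assumed time until the resource works again


def expected_sla_cost(recovery_hours, penalty_per_hour, grace_hours=0.0):
    # Assumed linear penalty model: nothing is charged within the grace
    # period, and each further hour of outage costs penalty_per_hour.
    return max(0.0, recovery_hours - grace_hours) * penalty_per_hour


def select_recovery(alternatives, penalty_per_hour, grace_hours=0.0):
    # Choose the alternative with the lowest total of recovery effort
    # plus expected SLA violation cost.
    def total_cost(alt):
        return alt.effort_cost + expected_sla_cost(
            alt.recovery_hours, penalty_per_hour, grace_hours
        )

    return min(alternatives, key=total_cost)


alternatives = [
    RecoveryAlternative("standard repair", effort_cost=100.0, recovery_hours=8.0),
    RecoveryAlternative("express repair", effort_cost=500.0, recovery_hours=2.0),
]

# A high penalty makes the expensive but fast alternative worthwhile.
high_penalty_choice = select_recovery(alternatives, penalty_per_hour=100.0, grace_hours=1.0)
# A low penalty favours the cheap but slow alternative.
low_penalty_choice = select_recovery(alternatives, penalty_per_hour=10.0, grace_hours=1.0)
```

In this toy model the same failure leads to different recovery decisions depending on the agreed penalty, which is the trade-off the abstract describes. A realistic model would also have to account for current and expected future service usage.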