Hanemann, A., Sailer, M., Schmitz, D. (2005):
A Framework for Failure Impact Analysis and Recovery with Respect to Service Level Agreements
In today's IT service market, customers urge providers to grant guarantees for quality of service (QoS), which are laid down in Service Level Agreements (SLAs). To satisfy customers and to avoid penalties, service providers have to ensure that the agreed SLAs are met. It is therefore necessary to deal effectively with resource failures that could endanger the SLAs by affecting the provided services. The effort invested in recovering from a failure should be proportionate to the expected SLA violation costs.
In this paper, we present a framework to automatically determine the impact of resource failures with respect to services and service level agreements. We achieve this by monitoring the service quality from inside and outside the service provider and by incorporating information about the current and expected future service usage. The expected costs of the resource failures are assessed in order to select an appropriate recovery alternative. Beyond this short-term perspective, the impact analysis can also be employed to identify critical resources and to improve service provisioning.
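
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the paper's framework: the data model (`Sla`, `RecoveryAlternative`), the dependency map, the linear penalty formula, and all function names are assumptions made here for illustration only. The sketch maps a failed resource to the services that depend on it, estimates the SLA penalty for a given recovery time weighted by expected usage, and picks the recovery alternative with the lowest total of recovery cost plus expected penalty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sla:
    """Hypothetical SLA model: a penalty accrues once downtime exceeds a tolerance."""
    service: str
    max_downtime_min: float   # downtime tolerated before penalties start
    penalty_per_min: float    # penalty per excess minute at full usage


@dataclass(frozen=True)
class RecoveryAlternative:
    """Hypothetical recovery option with a fixed cost and an expected duration."""
    name: str
    cost: float
    recovery_time_min: float


def affected_services(failed_resource, dependencies):
    """Return the services whose resource set contains the failed resource."""
    return sorted(s for s, res in dependencies.items() if failed_resource in res)


def expected_sla_cost(recovery_time_min, services, slas, expected_usage):
    """Sum the usage-weighted penalties over the SLAs of the affected services."""
    total = 0.0
    for sla in slas:
        if sla.service not in services:
            continue
        excess = max(0.0, recovery_time_min - sla.max_downtime_min)
        # Assumption: the penalty scales linearly with expected usage (default 1.0).
        total += excess * sla.penalty_per_min * expected_usage.get(sla.service, 1.0)
    return total


def select_recovery(failed_resource, dependencies, slas, expected_usage, alternatives):
    """Pick the alternative minimising recovery cost plus expected SLA penalty."""
    services = affected_services(failed_resource, dependencies)
    return min(
        alternatives,
        key=lambda a: a.cost
        + expected_sla_cost(a.recovery_time_min, services, slas, expected_usage),
    )
```

Under these assumptions, a cheap but slow repair is rejected whenever the penalties it would incur exceed the extra cost of a faster alternative, which reflects the abstract's idea that recovery effort should follow the expected SLA violation costs.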