Use of Self-Healing Techniques for Highly-Available Distributed Monitoring

keywords: Monitoring, self-healing, distributed systems, reliability, high availability
This paper addresses self-healing in monitoring systems. For complex distributed systems, a monitoring system should be ``intelligent'': as a first step, it can guide the user on what should be monitored. The next level of ``intelligence'' can be described by the term ``self-healing'': decisions made automatically by the monitoring system should make the monitored system behave in a more stable, reliable and predictable way. We present AgeMon, an agent-based distributed monitoring system in which agents perform strictly defined roles, and we discuss self-healing in the context of monitoring. Self-healing also applies to the monitoring system itself; a good example is the risk of losing monitoring data because of storage problems. AgeMon handles such problems by automatically electing substitute persistence agents to store the data.
mathematics subject classification 2000: 68-M14, 68-M15
reference: Vol. 37, 2018, No. 2, pp. 424–456