A More Faithful Formal Definition of the Desired Property for Distributed Snapshot Algorithms to Model Check the Property

keywords: Distributed snapshot algorithm, reachability, state machine, property specification, model checking
The first distributed snapshot algorithm was invented by Chandy and Lamport: Chandy-Lamport distributed snapshot algorithm (CLDSA). Distributed snapshot algorithms are crucial components to make distributed systems fault tolerant. Such algorithms are extremely important because many modern key software systems are in the form of distributed systems and should be fault tolerant. There are at least two desired properties such algorithms should satisfy: 1) the distributed snapshot reachability property (called the DSR property) and 2) the ability to run concurrently with, but not alter, an underlying distributed system (UDS). This paper identifies subtle errors in a paper on formalization of the DSR property and shows how to correct them. We give a more faithful formal definition of the DSR property; the definition involves two state machines -- one state machine M_UDS that formalizes a UDS and the other M_CLDSA that formalizes the UDS on which CLDSA is superimposed (UDS-CLDSA) -- and can be used to more precise model checking of the DSR property for CLDSA. We also prove a theorem on equivalence of our new definition and an existing one that only involves M_CLDSA to guarantee the validity of the existing model checking approach. Moreover, we prove the second property, namely that CLDSA does not alter the behaviors of UDS.
mathematics subject classification 2000: 68N30
reference: Vol. 38, 2019, No. 5, pp. 1009–1038