Non-Intrusive Data Inspection for Message-Based Systems

keywords: Message-based systems, message inspection, debugging distributed systems, ALICE O2 software, CERN, LHC
Over the years, research into debugging distributed systems with message-passing communication has focused on verifying the implementation of functionality, for example through race condition detection, rather than on the exchanged data. In this paper, we explore the previously undervalued approach of inspecting the exchanged data itself. We present a new component that gathers exchanged messages. We create a simplified model of message passing and base the component's design on it. Then, we discuss how to use the component to build tools that provide debugging information that is currently missing. Finally, we implement the component as part of the O2 framework and benchmark it. The results are promising: the component does not decrease the throughput.
mathematics subject classification 2000: 68W
reference: Vol. 40, 2021, No. 4, pp. 796–814