# **Runtime Instrumentation for Reactive Components**

## **Luca Aceto**

- Reykjavik University, Reykjavik, Iceland
- Gran Sasso Science Institute, L'Aquila, Italy

## **Duncan Paul Attard**

University of Glasgow, Glasgow, UK

## **Adrian Francalanza**

University of Malta, Msida, Malta

## **Anna Ingólfsdóttir** [#](mailto:annai@ru.is)

Reykjavik University, Reykjavik, Iceland

## **Abstract**

Reactive software calls for instrumentation methods that uphold the reactive attributes of systems. Runtime verification imposes another demand on the instrumentation, namely that the trace event sequences it reports to monitors are sound, that is, they reflect actual executions of the system under scrutiny. This paper presents RIARC, a novel decentralised instrumentation algorithm for outline monitors meeting these two demands. The asynchronous setting of reactive software complicates the instrumentation due to potential trace event loss or reordering. RIARC overcomes these challenges using a next-hop IP routing approach to rearrange and report events soundly to monitors.

RIARC is validated in two ways. We subject its corresponding implementation to rigorous systematic testing to confirm its correctness. In addition, we assess this implementation via extensive empirical experiments, subjecting it to large realistic workloads to ascertain its reactiveness. Our results show that RIARC optimises its memory and scheduler usage to maintain latency feasible for soft real-time applications. We also compare RIARC to inline and centralised monitoring, revealing that it induces comparable latency to inline monitoring in moderate concurrency settings, where software performs long-running, computationally-intensive tasks, such as in Big Data stream processing.

- **2012 ACM Subject Classification** Software and its engineering → Software verification and validation
- **Keywords and phrases** Runtime instrumentation, decentralised monitoring, reactive systems
- **Digital Object Identifier** [10.4230/LIPIcs.CVIT.2016.23](https://doi.org/10.4230/LIPIcs.CVIT.2016.23)

**Acknowledgements** We thank the anonymous reviewers and the Artifact Evaluation Committee for their constructive feedback. We thank Simon Fowler, Phil Trinder, and Keith Bugeja for their comments on improving this paper. This work was supported by EPSRC grant EP/T014628/1 (STARDUST).

<span id="page-0-0"></span>**1 Introduction**

Modern software is generally built in terms of concurrent components that execute without relying on a global clock or shared state [\[90\]](#page-31-0). Instead, components interact via non-blocking messaging, creating a loosely-coupled architecture known as a *reactive system* [\[8,](#page-27-0) [97\]](#page-31-1), which:

- responds in a timely manner (is *responsive*),
- remains available in the face of failure (is *resilient*),
- reacts to inputs from users or their environment (is *message-driven*), and
- grows and shrinks to accommodate varying computational loads (is *elastic*).

The real-world behaviour of reactive systems is hard to understand statically, and *monitoring* is used to inspect their operation at *runtime*, *e.g.* for debugging [\[114\]](#page-32-0), security checking [\[63\]](#page-29-0), profiling [\[79\]](#page-30-0), resource usage analysis [\[37\]](#page-28-0), *etc.* This paper considers runtime verification (RV), an application of monitoring used to detect whether the *current* execution of a system under scrutiny (SuS) deviates from its correct behaviour [\[15,](#page-27-1) [74,](#page-30-1) [21\]](#page-28-1). An RV monitor is a *sequence recogniser* [\[130,](#page-32-1) [104\]](#page-31-2): a state machine that incrementally analyses a *finite* fragment of the runtime information exhibited by a SuS to reach an *irrevocable* verdict (see [\[6,](#page-27-2) [5\]](#page-27-3) for details).

© Luca Aceto, Duncan Paul Attard, Adrian Francalanza, and Anna Ingólfsdóttir; licensed under Creative Commons License CC-BY 4.0 42nd Conference on Very Important Topics (CVIT 2016). Editors: John Q. Open and Joan R. Access; Article No. 23; pp. 23:1–23:53 Leibniz International Proceedings in Info [Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany](https://www.dagstuhl.de)

*Instrumentation* lies at the core of runtime monitoring [\[73,](#page-30-2) [21,](#page-28-1) [65\]](#page-30-3). It is the mechanism by which runtime information from a SuS is extracted and reported to monitors as a stream of system events called a *trace*. Software is typically instrumented in one of two ways. Inline instrumentation, or *inlining*, modifies the SuS by injecting tracing instructions at specific joinpoints, *e.g.* using AspectJ [\[93\]](#page-31-3) or BCEL [\[54\]](#page-29-1). Outline instrumentation, or *outlining*, uses an external tracing infrastructure to gather events, *e.g.* LTTng [\[56\]](#page-29-2) or OpenJ9 [\[59\]](#page-29-3), thereby treating the SuS as a *black box*. A key requirement setting RV apart from monitoring, *e.g.*, telemetry [\[88\]](#page-31-4) or profiling [\[128,](#page-32-2) [26\]](#page-28-2), is that the instrumentation must report *sound traces*.


<span id="page-1-4"></span>▶ **Definition 1** (Sound traces)**.** *A finite trace T is* sound *w.r.t. a system component P iff it is*

- **1.** Complete*. T contains* all *the events exhibited by P so far,* and
- **2.** Consistent*. The event sequence in T reflects the order these occur locally at P.*
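
Definition 1 can be read operationally. The Python sketch below is illustrative only and is not part of RIARC: it assumes that events are `(label, pid)` pairs and that the full local history of *P* is available for comparison, which a real instrumentation does not have. The names `is_complete`, `is_consistent`, and `is_sound` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[str, str]  # assumed encoding: (label, pid), e.g. ("send", "P")


def is_complete(trace: List[Event], local: List[Event]) -> bool:
    """True iff every event exhibited locally so far also occurs in the trace."""
    missing = Counter(local) - Counter(trace)
    return not missing


def is_consistent(trace: List[Event], local: List[Event]) -> bool:
    """True iff the trace lists its events in the order they occur locally,
    i.e. the trace is a subsequence of the local history."""
    remaining = iter(local)
    return all(event in remaining for event in trace)


def is_sound(trace: List[Event], local: List[Event]) -> bool:
    """A trace is sound iff it is both complete and consistent."""
    return is_complete(trace, local) and is_consistent(trace, local)
```

Under these assumptions, a trace that drops an event fails `is_complete`, whereas a trace that swaps two events or contains a spurious one fails `is_consistent`.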

Traces that violate this soundness invariant are unfit for RV, as omitted, spurious, or out-of-sequence events incorrectly characterise the system behaviour, *nullifying* the verdicts that monitors flag [\[21,](#page-28-1) [52\]](#page-29-4). Reactive software imposes another requirement: that the instrumentation *safeguards* the responsive, resilient, message-driven, and elastic attributes of the SuS. This necessitates an instrumentation method that is itself *reactive*, such that it:

- <span id="page-1-0"></span>**1.** does not hamper the SuS by inducing unfeasible runtime overhead (is responsive),
- <span id="page-1-2"></span>**2.** permits monitors to fail independently of SuS components (is resilient),
- <span id="page-1-3"></span>**3.** reacts to trace events without blocking the SuS (is message-driven), and
- <span id="page-1-1"></span>**4.** grows and shrinks in proportion to the size of the SuS (is elastic).

**Limitations of current RV instrumentation methods** State-of-the-art RV tools use instrumentation methods that do not satisfy *all* of the conditions [1](#page-1-0) – [4](#page-1-1) above. This renders them inapplicable to reactive software; see [\[65,](#page-30-3) Tables 3 and 4] for details. Many approaches, including [\[24,](#page-28-3) [31,](#page-28-4) [49,](#page-29-5) [78,](#page-30-4) [113,](#page-32-3) [129,](#page-32-4) [134,](#page-32-5) [17\]](#page-27-4), assume systems with a *fixed* architecture where the number of components remains constant at runtime, failing to meet condition [4](#page-1-1). Works foregoing the assumption of a fixed system size, such as [45, 94, 61, 60, 25, 31, 71, 3], inline the SuS with monitors *statically*. Inlining monitors pre-deployment inherently accommodates systems that grow and shrink (condition [4](#page-1-1)) as a by-product of the embedded monitor code that executes on the same thread of system components; see fig. [1a](#page-3-0). This scheme, however, has shortcomings that make it less suited to reactive software. Recent studies [\[21,](#page-28-1) [52\]](#page-29-4) observe that the lock-step execution of the SuS and monitors can impair the operation of the instrumented system, *e.g.* slow runtime analyses manifest as high latencies [\[38\]](#page-28-6), and faulty monitors may break the system [\[72\]](#page-30-6), which do not meet conditions [1](#page-1-0) and [2](#page-1-2) (*e.g. M<sup>Q</sup>* in fig. [1a](#page-3-0)). Other works [\[46,](#page-29-9) [14\]](#page-27-6) argue that errors, such as deadlocks or component crashes, are hard to detect since the monitoring logic shares the runtime thread of the affected component. Entwining the execution of the SuS and monitors may also diminish the scalability, performance, and resource usage efficiency of the monitored system because inlined monitor code cannot be run on separate threads [\[11\]](#page-27-7). Lastly, inlining is *incompatible* with unmodifiable software, such as closed-source components (*e.g. R* in figs. [1a](#page-3-0)–[1c](#page-3-0)), making outlining the only alternative.

Outline instrumentation *can* address the limitations of inlining by isolating the SuS and its monitors (works [\[45,](#page-29-6) [38,](#page-28-6) [39\]](#page-28-7) that view externalised monitors as 'outline' embed tracing code to extract events from the SuS, subjecting them to the cons of inlining). The latest survey on decentralised RV [\[74,](#page-30-1) Tables 1 and 2] establishes that outlining-based tools, *e.g.* [\[50,](#page-29-10) [16,](#page-27-8) [17,](#page-27-4) [75,](#page-30-7) [38,](#page-28-6) [39,](#page-28-7) [132,](#page-32-6) [66\]](#page-30-8), are variations on *centralised* instrumentation. In this set-up, events exhibited by SuS components are funnelled through a *global* trace buffer (*e.g.* $\kappa_{\{P,Q,R\}}$ in fig. [1b](#page-3-0)) that a singleton monitor can analyse asynchronously, meeting condition [3](#page-1-3). Yet, the central buffer introduces contention and sacrifices the scalability of the SuS [10], violating condition [4](#page-1-1). Centralised architectures are prone to single points of failure (SPOFs) [\[97,](#page-31-1) [96\]](#page-31-6) (violating condition [2](#page-1-2)), which is not ideal for monitoring medium-scale reactive systems.

**Contribution** We propose RIARC, a *decentralised* instrumentation algorithm for outline monitors that overcomes the above shortcomings, fulfilling conditions [1](#page-1-0)–[4](#page-1-1). Outline monitors minimise latency effects due to slow trace event analyses associated with inlining (meeting condition [1](#page-1-0)). While RIARC does not handle monitor failure explicitly, it intrinsically enjoys a modicum of partial failure by isolating the SuS and its decentralised monitor components (meeting condition [2](#page-1-2)); *e.g.* monitors $M_{\{P\}}$ and $M_{\{Q,R\}}$ in fig. [1c](#page-3-0). RIARC uses a tracing infrastructure to obtain system events passively without modifying the SuS (meeting condition [3](#page-1-3)). The algorithm equips each isolated monitor with a *local* trace buffer, using it to report events based on the SuS components a monitor is tasked to analyse (*e.g.* buffers $\kappa_{\{P\}}$ and $\kappa_{\{Q,R\}}$ in fig. [1c](#page-3-0)). RIARC reorganises its instrumentation set-up to reflect dynamic changes in the SuS. It reacts to specific events in traces to instrument monitors for new SuS components and to remove redundant monitors when it detects graceful or abnormal component terminations. This enables RIARC to grow and shrink the verification set-up on demand (meeting condition [4](#page-1-1)). Given the challenges in fulfilling the conditions [1](#page-1-0)–[4](#page-1-1), we scope our work to settings where communication is reliable (*i.e.,* no message corruption, duplication, or loss) [\[58\]](#page-29-11) and Byzantine failures do not arise [\[99\]](#page-31-7).

To the best of our knowledge, the approach RIARC advocates is novel. One reason why outlining has never been adopted for decentralising monitors is the set of onerous conditions [1](#page-1-0)–[4](#page-1-1) imposed by reactive software. Utilising non-invasive tracing makes our set-up necessarily *asynchronous*. At the same time, this complicates the instrumentation, which must ensure trace soundness (def. [1](#page-1-4)), notwithstanding the inherent phenomena arising from the concurrent execution of the SuS and monitors, *e.g.* trace event reordering and process crashes. Consequently, the second reason is that the overhead incurred to uphold this invariant (in addition to scaling the verification set-up as the SuS executes) is perceived as prohibitive when compared to inlining. This opinion is often reinforced when the viability of outline instrumentation is predicated on empirical criteria tied to monolithic, batch-style programs, that *may not* apply to reactive software (*e.g.* percentage slowdown); *e.g.* see [\[100,](#page-31-8) [117,](#page-32-7) [116,](#page-32-8) [47,](#page-29-12) [46,](#page-29-9) [124,](#page-32-9) [30,](#page-28-8) [101\]](#page-31-9).

This paper shows how instrumenting outline monitors under conditions [1](#page-1-0)–[4](#page-1-1) can be achieved using a decentralised approach that guarantees def. [1](#page-1-4), *and* with overheads considered feasible for typical soft real-time reactive systems. Concretely, we:

 **(i)** recall the benefits of the actor model of computation [\[85,](#page-30-9) [9\]](#page-27-10) for building reactive systems and argue how our model of processes and tracers readily maps to that setting, sec. [2;](#page-3-1)

 **(ii)** give a decentralised instrumentation algorithm for outline monitors, detailing how the reactive characteristics of the SuS can be preserved whilst ensuring def. [1,](#page-1-4) sec. [3;](#page-7-0)

 **(iii)** show the implementability of our algorithm in an actor language and systematically validate the correctness of its corresponding implementation w.r.t. def. [1](#page-1-4) by exhaustively inducing interleaved executions for a selection of instrumented systems, sec. [4;](#page-15-0)

 **(iv)** back up the feasibility of the implemented algorithm via a comprehensive empirical study that uses various workload configurations surpassing the state of the art, showing that the induced overhead minimally impacts the reactive attributes of the SuS, sec. [5.](#page-17-0)


<span id="page-3-0"></span>

**Figure 1** *P,Q,R* instrumented in inline (*left*), centralised (*middle*) and decentralised (*right*) modes

## <span id="page-3-1"></span>**2 A computational model for reactive systems**

The actor model [\[85,](#page-30-9) [9\]](#page-27-10) emerged as *the* paradigm to design and build reactive systems [\[33\]](#page-28-9). *Actors*, the units of decomposition in this model, are abstractions of concurrent entities that share no mutable memory with other actors. Instead, actors interact through asynchronous message passing and alter their internal state based on the messages they consume. Asynchronous communication decouples actors spatially and temporally, which fully isolates system components and establishes the foundation for resiliency and elasticity [\[32,](#page-28-10) [97\]](#page-31-1). Each actor is equipped with an incoming message buffer called the *mailbox*, from which messages deposited by other actors can be selectively read. Besides sending and receiving messages, actors can *spawn* other actors. Actors in a system are addressable by their unique process identifier (PID), which they use to engage in directed, *point-to-point* communication. This idea of addressability is central to the actor model: it enables reasoning about decentralised computation, as the identity of components or messages can be propagated through a system and used in handling complex tasks, such as process registration and failure recovery [\[33\]](#page-28-9). As is often the case in decentralised computations, we assume that messages exchanged between pairs of processes are always received in the order in which they have been sent [\[43\]](#page-29-13).
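
The ingredients of the actor model described above (PIDs, mailboxes, spawning, and non-blocking point-to-point sends with per-pair FIFO delivery) can be pictured with a small single-threaded Python sketch. It is a toy under stated assumptions, not an actor runtime: the class `ActorSystem` and its methods are hypothetical, scheduling is a simple round-robin loop, and selective reads from the mailbox are not modelled.

```python
from collections import deque
from itertools import count
from typing import Any, Callable, Deque, Dict

# A behaviour receives the system, the actor's own PID, and one message.
Behaviour = Callable[["ActorSystem", int, Any], None]


class ActorSystem:
    def __init__(self) -> None:
        self._fresh_pids = count(1)
        self.mailboxes: Dict[int, Deque[Any]] = {}   # PID -> FIFO mailbox
        self.behaviours: Dict[int, Behaviour] = {}   # PID -> message handler

    def spawn(self, behaviour: Behaviour) -> int:
        """Create a new actor and return its unique PID."""
        pid = next(self._fresh_pids)
        self.mailboxes[pid] = deque()
        self.behaviours[pid] = behaviour
        return pid

    def send(self, to: int, msg: Any) -> None:
        """Non-blocking send: the message is only appended to the mailbox."""
        self.mailboxes[to].append(msg)

    def run(self) -> None:
        """Let every actor consume one message per round until all mailboxes are empty."""
        progressed = True
        while progressed:
            progressed = False
            for pid in list(self.mailboxes):  # snapshot: actors may spawn others
                if self.mailboxes[pid]:
                    msg = self.mailboxes[pid].popleft()
                    self.behaviours[pid](self, pid, msg)
                    progressed = True
```

For instance, a relay actor that forwards every message it consumes to a sink actor delivers the messages to the sink in the order in which they were sent to the relay.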

Frameworks, notably Erlang [\[11\]](#page-27-7), Elixir [\[91\]](#page-31-10), Akka [\[1\]](#page-27-11) for Scala [\[120\]](#page-32-10), along with others [\[123,](#page-32-11) [139\]](#page-32-12), instantiate the actor model. We adopt Erlang since its ecosystem is specifically engineered for highly-concurrent, soft real-time reactive systems [\[140,](#page-32-13) [12,](#page-27-12) [44\]](#page-29-14). The Erlang virtual machine (EVM) implements actors as lightweight processes. It employs *per process* garbage collection that, unlike the JVM, does not subject the virtual machine to global unpredictable pauses [\[89,](#page-31-11) [119\]](#page-32-14). This factor minimises the impact on the soft real-time properties of a system *and* is also crucial to the empirical evaluation of sec. [5](#page-17-0) since it stabilises the variance in our experiments. The EVM exposes a flexible *process tracing* API aimed at reactive software [\[42\]](#page-29-15). Erlang provides other components, *e.g.* supervision trees, message queues, *etc.*, for building fault-tolerant distributed applications. While we scope our work to fault-free settings (see sec. [1](#page-0-0)), adopting Erlang gives us the foundation upon which our work can be naturally extended to address these aspects. Henceforth, we follow the established convention in Erlang literature and use the terms *actor*, *process*, and *component* synonymously.

### <span id="page-4-4"></span>**2.1 Process tracing and trace partitioning**

Processes in a concurrent system form a *tree*, starting at the *root* process that spawns *child* processes, and so forth<sup>[1](#page-4-0)</sup>. Concurrency induces inherent *partitions* to the execution of the SuS in the form of isolated traces that reflect the *local* behaviour at each process [\[17\]](#page-27-4). RIARC exploits this aspect to attain several benefits. First, one can *selectively* specify the SuS processes to be instrumented. The upshot is that fewer trace events need to be gathered, improving *efficiency*. Another benefit of partitioned traces is that each process can be dynamically instrumented, free from assumptions about the number of processes the SuS is expected to have. This makes the RV set-up *elastic*. Lastly, the instrumentation set-up can *partially fail*, as faulty SuS or monitor processes do not imperil the execution of one another.

<span id="page-4-1"></span> ▶ **Example 2** (Trace partitions)**.** Trace partitions enable RIARC to instrument a system in various arrangements. Fig. [2a](#page-5-0) depicts an interaction sequence for the execution of the SuS from sec. [1](#page-0-0). In this interaction, the root process, *P*, spawns *Q* and communicates with it, at which point *Q* spawns process *R*; *P* and *Q* eventually terminate. We denote the process *spawning* and *termination* trace events by $\sim$ and $\star$, and use ! and ? for *send* and *receive* events respectively. The *sound* trace partitions for the processes in fig. [2a](#page-5-0) are '$\sim_P.\,!_P.\,\star_P$' for *P*, '$?_Q.\,\sim_Q.\,\star_Q$' for *Q*, and the empty trace for *R*.

A centralised set-up such as that of fig. [1b](#page-3-0) can be obtained by instrumenting $\{P, Q, R\}$ with one monitor, $M_{\{P,Q,R\}}$, whereas instrumenting the components $\{P\}$ and $\{Q,R\}$ with monitors $M_{\{P\}}$ and $M_{\{Q,R\}}$ gives the decentralised arrangement of fig. [1c](#page-3-0). Each of these instrumentation arrangements generates different executions.

▶ **Example 3** (Sound traces)**.** For the case of fig. [1b](#page-3-0), RIARC can report to $M_{\{P,Q,R\}}$ *one* of four possible traces: '$\sim_P.\,!_P.\,\star_P.\,?_Q.\,\sim_Q.\,\star_Q$', '$\sim_P.\,!_P.\,?_Q.\,\star_P.\,\sim_Q.\,\star_Q$', '$\sim_P.\,!_P.\,?_Q.\,\sim_Q.\,\star_P.\,\star_Q$', or '$\sim_P.\,!_P.\,?_Q.\,\sim_Q.\,\star_Q.\,\star_P$'. These *sound* traces result from the interleaved execution of processes *P* and *Q*. Any other trace, *e.g.* '$\sim_P.\,\star_P.\,?_Q.\,\sim_Q.\,\star_Q$' or '$\sim_P.\,!_P.\,\star_P.\,?_Q.\,\star_Q.\,\sim_Q$', is unsound since it contradicts the local behaviour at processes *P* and *Q* of fig. [2a](#page-5-0). The former trace omits the request $!_P$ that *P* makes to *Q* (it is *incomplete* w.r.t. *P*), and the latter trace inverts $\sim_Q$ and $\star_Q$, suggesting that *Q* spawns *R* after *Q* terminates (it is *inconsistent* w.r.t. *Q*).
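
The count of four can be checked mechanically. The Python sketch below merges the local traces of *P* and *Q* from ex. 2 in every order-preserving way and keeps the merges that respect two causal constraints, which we assume here: the spawn of *Q* by *P* and the send by *P* both precede the receive at *Q*. The event names (`spawn_P`, `recv_Q`, and so on) and the function names are hypothetical.

```python
from itertools import combinations
from typing import Iterator, List, Sequence, Tuple

P_LOCAL = ["spawn_P", "send_P", "exit_P"]   # P spawns Q, sends to Q, terminates
Q_LOCAL = ["recv_Q", "spawn_Q", "exit_Q"]   # Q receives, spawns R, terminates

# Assumed causal constraints (a, b): event a must be reported before event b.
HAPPENS_BEFORE = [("spawn_P", "recv_Q"), ("send_P", "recv_Q")]


def interleavings(xs: Sequence[str], ys: Sequence[str]) -> Iterator[List[str]]:
    """Yield every merge of xs and ys that preserves both local orders."""
    n = len(xs) + len(ys)
    for slots in combinations(range(n), len(xs)):
        chosen = set(slots)               # positions taken by events of xs
        xi, yi = iter(xs), iter(ys)
        yield [next(xi) if k in chosen else next(yi) for k in range(n)]


def respects(trace: List[str], constraints: List[Tuple[str, str]]) -> bool:
    """True iff every (a, b) constraint has a occurring before b in the trace."""
    return all(trace.index(a) < trace.index(b) for a, b in constraints)


def sound_traces() -> List[List[str]]:
    return [t for t in interleavings(P_LOCAL, Q_LOCAL)
            if respects(t, HAPPENS_BEFORE)]
```

Of the twenty order-preserving merges, `sound_traces()` keeps four; they differ only in where `exit_P` falls relative to the three events of *Q*.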

▶ **Example 4** (Separate instrumentation)**.** Fig. [2b](#page-5-0) shows another decentralised set-up, where *P*, *Q*, and *R* are instrumented separately. In this case, the instrumentation should report to $M_{\{P\}}$, $M_{\{Q\}}$ and $M_{\{R\}}$ the events observed *locally* at each process, as stated in ex. [2](#page-4-1).
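
The three arrangements discussed in ex. 2 to ex. 4 differ only in how SuS processes are grouped under monitors. As a rough illustration, the following Python sketch projects one global event sequence onto the trace buffers induced by a given grouping. The `(label, pid)` encoding, the constant `GLOBAL`, and the function `partition` are assumptions made for this example; the sketch ignores the asynchrony that RIARC has to deal with.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Event = Tuple[str, str]  # assumed encoding: (label, pid)

# One possible global execution of the SuS of fig. 2a.
GLOBAL: List[Event] = [
    ("spawn", "P"), ("send", "P"), ("recv", "Q"),
    ("spawn", "Q"), ("exit", "P"), ("exit", "Q"),
]


def partition(trace: List[Event],
              groups: Iterable[Iterable[str]]) -> Dict[FrozenSet[str], List[Event]]:
    """Project a global trace onto one buffer per group of SuS processes."""
    buffers: Dict[FrozenSet[str], List[Event]] = {frozenset(g): [] for g in groups}
    for event in trace:
        for processes, buffer in buffers.items():
            if event[1] in processes:
                buffer.append(event)   # order of arrival is preserved
    return buffers
```

Calling `partition(GLOBAL, [{"P", "Q", "R"}])` mimics the centralised set-up, `[{"P"}, {"Q", "R"}]` the decentralised one of fig. 1c, and `[{"P"}, {"Q"}, {"R"}]` the separate instrumentation of fig. 2b, where the buffer for *R* stays empty.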

RIARC makes two assumptions about process tracing in order to support the instrumentation arrangements shown in figs. [1b](#page-3-0), [1c](#page-3-0), and [2b](#page-5-0):

<span id="page-4-3"></span><span id="page-4-2"></span>**A<sub>1</sub>** *Tracing process sets.* Tracing can gather events for *sets* of SuS processes, *e.g.* $\kappa_{\{P,Q,R\}}$ in fig. [1b](#page-3-0) gathers the events of $\{P,Q,R\}$, and $\kappa_{\{Q,R\}}$ in fig. [1c](#page-3-0) gathers the events of $\{Q,R\}$.

**A<sub>2</sub>** *Tracing inheritance.* Tracing gathers the events of a SuS process *and* the children it spawns by default to eliminate the risk that trace events from child processes are missed.

We opt for tracing inheritance since it follows established centralised RV monitoring tools, including [\[16,](#page-27-8) [41,](#page-29-16) [50,](#page-29-10) [113\]](#page-32-3). In fact, tracing assumptions A<sub>1</sub> and A<sub>2</sub> mean that centralised set-ups like that of fig. [1b](#page-3-0) can be obtained just by tracing the root process *P*. Tracing inheritance requires the instrumentation to *intervene* if it needs to channel the events of a child process into a *new* trace partition that is *independent* from that of its parent, *e.g.* as in fig. [1c](#page-3-0). In such cases, the instrumentation must first stop tracing the child process, allocate a fresh trace buffer, and resume tracing the child process. The out-of-sync execution of the SuS and instrumentation complicates the creation of these new trace partitions because it can lead to reordered or missed events. This, in turn, would violate trace soundness, def. [1](#page-1-4).

<span id="page-4-0"></span><sup>1</sup> For example, using spawn() in Erlang [\[42\]](#page-29-15) and Elixir [\[91\]](#page-31-10), ActorContext.spawn() in Akka [\[1\]](#page-27-11), Actor.createActor() in Thespian [\[123\]](#page-32-11), CreateProcess() in Windows [\[111\]](#page-31-12), *etc.*

<span id="page-5-0"></span>

**Figure 2** SuS with processes *P*, *Q*, and *R* instrumented with independent monitors

We supplement A<sub>1</sub> and A<sub>2</sub> with the following to keep our exposition in sec. [3](#page-7-0) manageable:

<span id="page-5-1"></span>**A<sub>3</sub>** *Single-process tracing.* Any SuS process can be traced *at most* once at any point in time.

<span id="page-5-2"></span>**A<sub>4</sub>** *Causally-ordered spawn events.* Tracing gathers the spawn trace event of a parent process before *all* the events of the child process spawned by that parent, *e.g.* if *P* spawns *Q*, and *Q* receives, as in fig. [2a](#page-5-0), the reported sequence is '$\sim_P.\,?_Q$' rather than '$?_Q.\,\sim_P$'.

The constraint of tracing assumption A<sub>3</sub> is easily overcome by replicating trace events for a process and reporting them to different monitors (*e.g.* the events in the trace partition of process *P* are replicated to monitors $M_{\{P_a\}}$, $M_{\{P_b\}}$, $M_{\{P_c\}}$ in fig. [2c](#page-5-0)). Tracing assumption A<sub>4</sub> requires trace buffers to reorder $\sim$ events using the spawner and spawned process information carried by each event before reporting them to monitors. Sec. [3.3](#page-12-0) gives more details.
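
One way to picture the reordering that A<sub>4</sub> calls for is a buffer that holds back the events of a process until the spawn event announcing that process has been released. The Python sketch below is a simplified illustration under our own assumptions, not the tracer logic of sec. 3.3: the `Event` fields and the class `SpawnOrderingBuffer` are hypothetical, and the set of root processes is taken to be known upfront.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Set


class Event(NamedTuple):
    label: str                    # "spawn", "exit", "send" or "recv"
    pid: str                      # process exhibiting the event (the spawner)
    child: Optional[str] = None   # spawned process; set for "spawn" events only


class SpawnOrderingBuffer:
    def __init__(self, roots: Set[str]) -> None:
        self.known: Set[str] = set(roots)          # processes already announced
        self.held: Dict[str, List[Event]] = defaultdict(list)
        self.out: List[Event] = []                 # events released to the monitor

    def push(self, event: Event) -> None:
        """Accept an event in arrival order, releasing or holding it."""
        if event.pid in self.known:
            self._release(event)
        else:
            self.held[event.pid].append(event)     # spawn of event.pid not seen yet

    def _release(self, event: Event) -> None:
        self.out.append(event)
        if event.label == "spawn" and event.child is not None:
            self.known.add(event.child)
            # Flush the child's held events, which may announce further processes.
            for pending in self.held.pop(event.child, []):
                self._release(pending)
```

If the receive event of *Q* arrives before the spawn event of *P*, the buffer still releases the spawn first, matching the '$\sim_P.\,?_Q$' order required by A<sub>4</sub>.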

▶ **Example 5** (Unsound traces)**.** Fig. [3a](#page-6-0) shows one possible configuration that can be reached by our three-process system introduced in fig. [2a](#page-5-0), where the trace buffer $\kappa_{\{P\}}$ contains the events for both *P* and *Q*. The trace in buffer $\kappa_{\{Q\}}$ is unsound, as it inaccurately characterises the local behaviour of process *Q* (the sound trace for *Q* should be '$?_Q.\,\sim_Q.\,\star_Q$', not '$\star_Q$').

RIARC programs trace buffers to coordinate with one another to ensure that sound traces are invariably reported to monitors. We refer to a trace buffer and the coordination logic it encapsulates as a *tracer*. RIARC employs an approach based on *next-hop routing* in IP networks [\[83,](#page-30-10) [107\]](#page-31-13) to counteract the effects of trace event reordering and loss by rearranging and forwarding events to different tracers. Fig. [3b](#page-6-0) conveys our organisation of tracers (refer to fig. [10](#page-34-0) in app. [A](#page-34-1) for legend). Sec. [3](#page-7-0) details how RIARC dynamically reorganises the tracer choreography and performs next-hop routing.
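
The general idea of next-hop routing, where each node knows only the neighbour to which a message should be passed next, can be sketched in a few lines of Python. This is a generic illustration and not the algorithm of sec. 3: the class `Tracer`, its `routes` table, and the `handle` method are hypothetical, and forwarding is modelled as a direct method call rather than asynchronous messaging.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, str]  # assumed encoding: (label, SuS pid)


class Tracer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.routes: Dict[str, "Tracer"] = {}   # SuS pid -> next-hop tracer
        self.reported: List[Event] = []         # events handed to the local monitor

    def handle(self, event: Event, hops: Optional[List[str]] = None) -> List[str]:
        """Report the event locally or forward it one hop; return the path taken."""
        path = (hops or []) + [self.name]
        next_hop = self.routes.get(event[1])
        if next_hop is None:
            self.reported.append(event)         # no route: this tracer owns the pid
            return path
        return next_hop.handle(event, path)
```

With a root tracer that routes events of *Q* and *R* to the tracer of *Q*, which in turn routes events of *R* to the tracer of *R*, an event of *R* gathered at the root reaches its destination in two hops, and no tracer needs a global view of the set-up.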

### <span id="page-5-3"></span>**2.2 Modelling decentralised instrumentation**

Since RV monitors are passive verdict-flagging machines (refer to sec. [1](#page-0-0)), they are orthogonal to our instrumentation. We, thus, focus our narrative on tracers and omit monitors, except when relevant in the surrounding context. The model assumes a set of SuS process names, $P,Q,R \in \text{PRC}$, and tracer names, $T \in \text{TRC}$, together with a countable set of PID values to reference processes. We distinguish between SuS and tracer PIDs, which we denote respectively by the sets, $p_S, q_S \in \text{PID}_S$ and $p_T, q_T \in \text{PID}_T$. The variables $i_S$ and $j_S$, and $i_T$ and $j_T$ range over PIDs from the corresponding sets $\text{PID}_S$ and $\text{PID}_T$. We also assume the function signature sets, $f_S \in \text{SIG}_S$, $f_T \in \text{SIG}_T$, and, $f_M \in \text{SIG}_M$, to denote SuS, tracer, and RV monitor functions, together with the variables $\varsigma_S$, $\varsigma_T$, and $\varsigma_M$ that range over each signature set. New SuS processes are created via the function spwn($\varsigma_S$) that accepts the function signature $\varsigma_S$ to be spawned, and returns a fresh PID, $i_S$. We overload spwn to spawn tracer signatures $\varsigma_T$ equivalently, returning corresponding PIDs, $i_T$. The function self obtains the PID of the process invoking it. We write *P* as shorthand for a singleton process set $\{P\}$ to simplify notation.

<span id="page-6-0"></span>

RIARC uses three message types, $\tau \in \{\text{evt}, \text{dtc}, \text{rtd}\}$. These determine when to *create* or *terminate* tracer processes, and what trace events to *route* between tracers:

- evt are *trace events* gathered via process tracing,
- dtc are *detach* requests that tracers exchange to reorganise the tracer choreography, and
- rtd are *routing* packets that transport evt or dtc messages forwarded between tracers.

We encode messages *m* as tuples. Trace event messages, $\langle \text{evt}, \ell, i_S, j_S, \varsigma_S \rangle$, comprise the event label $\ell$ that ranges over the SuS events $\sim$ (*spawn*), $\star$ (*exit*), ! (*send*), and ? (*receive*). The PID value $i_S$ identifies the SuS process exhibiting the trace event, and is defined for *all* events. The SuS PID $j_S$ and function signature $\varsigma_S$ depend on the type of the event. Tbl. [1a](#page-6-1) catalogues the values defined for each event. We write trace events in their shorthand form, omitting undefined values (denoted by $\perp$), *e.g.* $\langle \text{evt}, \star, i_S \rangle$ instead of $\langle \text{evt}, \star, i_S, \perp, \perp \rangle$.

Detach request messages have the form $\langle \text{dtc}, i_T, i_S \rangle$. A tracer with the PID $i_T$ uses dtc to request that another tracer *stop* tracing the SuS PID $i_S$. Routing packet messages, $\langle \text{rtd}, i_T, m \rangle$, move evt and dtc messages between tracers. The PID $i_T$ identifies the tracer that embeds the message *m* into the routing packet and dispatches it to other tracers. Tbl. [1b](#page-6-1) summarises detach request and routing packet messages.

<span id="page-6-1"></span>

**(a)** Messages encoding *spawn*, *exit*, *send*, and *receive* events

**(b)** Detach and routing messages

**Table 1** Trace event (evt), detach request (dtc), and routing packet (rtd) message index names

<span id="page-7-6"></span><span id="page-7-2"></span><span id="page-7-1"></span>

<span id="page-7-7"></span><span id="page-7-5"></span><span id="page-7-4"></span><span id="page-7-3"></span>**Table 2** RIARC approach to ensure trace soundness (def. [1](#page-1-4)) and reactive instrumentation (sec. [1](#page-0-0))

We reserve the variables *e*, *d*, and *r* for the message types evt, dtc, and rtd respectively. Our model uses the suggestive dot notation (.) to index message fields, *e.g.* $m.\tau$ reads the message type, $e.\ell$ reads the trace event label, *etc.* (see tbl. [1](#page-6-1)). For simplicity, we occasionally write the label $\ell$ in lieu of the full trace event form, *e.g.* we write $\star$ instead of $\langle \text{evt}, \star, i_S \rangle$.
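
The tuple encodings and the dot notation map naturally onto record types. The Python sketch below is one possible rendering and not the encoding used by the RIARC implementation: the class names and the field names (`tau`, `label`, `pid_s`, `pid_s2`, `sig_s`, `pid_t`, `msg`) are our own stand-ins for the index names of tbl. 1, and `None` plays the role of $\perp$.

```python
from dataclasses import dataclass
from typing import ClassVar, Optional, Union


@dataclass(frozen=True)
class Evt:
    """Trace event <evt, label, pid_s, pid_s2, sig_s>."""
    tau: ClassVar[str] = "evt"    # message type, shared by all Evt instances
    label: str                    # "spawn", "exit", "send" or "recv"
    pid_s: str                    # SuS process exhibiting the event
    pid_s2: Optional[str] = None  # second SuS PID, when defined for the label
    sig_s: Optional[str] = None   # function signature, when defined for the label


@dataclass(frozen=True)
class Dtc:
    """Detach request <dtc, pid_t, pid_s>."""
    tau: ClassVar[str] = "dtc"
    pid_t: str                    # tracer issuing the request
    pid_s: str                    # SuS process the recipient should stop tracing


@dataclass(frozen=True)
class Rtd:
    """Routing packet <rtd, pid_t, msg> carrying an evt or dtc message."""
    tau: ClassVar[str] = "rtd"
    pid_t: str                    # tracer that embedded msg and dispatched the packet
    msg: Union[Evt, Dtc]
```

With this rendering, the shorthand $\langle \text{evt}, \star, i_S \rangle$ corresponds to `Evt("exit", "p1")` for some SuS PID `"p1"`, and field access such as `m.tau` or `e.label` mirrors $m.\tau$ and $e.\ell$.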

## <span id="page-7-0"></span>**3 Decentralised instrumentation**

Our reason for encapsulating trace buffers and their coordination logic as tracers stems from the actor model. Trace buffers align with actor mailboxes, which localise the tracer state and enable tracers to run *independently*. The main logic replicated at each tracer is given in algs. [1](#page-10-0)–[3](#page-13-1). Tracers operate in two modes, *direct* ($\circ$) and *priority* ($\bullet$), to counteract the effects of trace event reordering. We organise our tracer logic in algs. [1](#page-10-0) and [3](#page-13-1) to reflect these modes, respectively. Algs. [1](#page-10-0) and [3](#page-13-1) use the function ANALYSEEVT, tasked with analysing events; see app. [C.5.2](#page-44-0) for details. Auxiliary tracer logic referenced in this section is relegated to app. [A](#page-34-1).

Every tracer maintains an internal state  $\sigma$  consisting of the following three maps:

- the *routing* map, $\Pi$, governing how events are routed to other tracers,
- the *instrumentation* map, $\Lambda$, that determines which SuS processes to instrument, and
- the *traced-processes* map, $\Gamma$, tracking the SuS process set that the tracer currently traces.

Tbl. [2](#page-7-1) summarises the challenges that RIARC needs to overcome to attain the reactive characteristics stated in sec. [1](#page-0-0). Requirements [$R_1$](#page-7-3) and $R_6$ in tbl. [2](#page-7-1) oblige the instrumentation to reorganise dynamically while the SuS executes to preserve its *elasticity*. Requirement [$R_4$](#page-7-4) offers a modicum of *resiliency* between the SuS and tracer processes, whereas $R_5$ minimises the instrumentation overhead by gathering only the events monitors require. This keeps the overall set-up *responsive*. Since RIARC builds on the actor model, it fulfils the *message-driven* requirement intrinsically. *Trace soundness* is safeguarded by requirements $R_2$ and $R_3$.
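The three maps that make up the tracer state $\sigma$ can be pictured with a small data structure. The following is a minimal sketch, not the RIARC implementation: the names `TracerState`, `Mode`, and the string PIDs are hypothetical, and the value type of $\Lambda$ (a monitor signature keyed by a SuS code signature) is an assumption read off the lookup $\sigma.\Lambda(e.\varsigma_S)$ in alg. 2.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Mode(Enum):
    """Tracer modes; the names are illustrative stand-ins for the paper's symbols."""
    DIRECT = "direct"      # the mode written as a hollow circle
    PRIORITY = "priority"  # the mode written as a filled circle


@dataclass
class TracerState:
    """Hypothetical container for the tracer state sigma."""
    # Routing map Pi: SuS PID -> PID of the next-hop tracer.
    routing: Dict[str, str] = field(default_factory=dict)
    # Instrumentation map Lambda: SuS code signature -> monitor signature (assumed).
    instrumentation: Dict[str, str] = field(default_factory=dict)
    # Traced-processes map Gamma: SuS PID -> mode in which it is traced.
    traced: Dict[str, Mode] = field(default_factory=dict)


# Example: a root tracer that traces p_s directly and routes events of q_s to q_t.
state = TracerState()
state.traced["p_s"] = Mode.DIRECT
state.routing["q_s"] = "q_t"
```

A missing key in `routing` plays the role of $\bot$, that is, no next-hop is defined for that SuS PID.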

The operations TRACE, CLEAR and PREEMPT give access to the tracing infrastructure. TRACE($i_S, i_T$) enables a tracer with PID $i_T$ to register its interest in receiving trace events of a SuS process with PID $i_S$. This operation can be undone using CLEAR($i_S, i_T$), which *blocks* the calling tracer $i_T$ and returns once all the trace event messages for the SuS process $i_S$ that are in transit to the tracer $i_T$ have been delivered to $i_T$. It is worth remarking that this behaviour conforms to our proviso in sec. [1](#page-0-0), *i.e.*, no communication faults. PREEMPT($i_S, i_T$) combines CLEAR and TRACE. It enables the pre-empting tracer $i_T$ to take control of tracing the SuS process $i_S$ from another tracer $i'_T$ that is currently tracing $i_S$. Tracers use CLEAR or PREEMPT to modify the default process-tracing inheritance behaviour that tracing assumption $A_2$ describes. We refer to alg. [5](#page-35-0) for the specifics of these operations.
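To make the relationship between the three operations concrete, the sketch below models the tracing infrastructure as a single in-memory table. It is an illustration under stated assumptions, not the operations of alg. 5: the class `TracingRegistry` and its methods are hypothetical, each SuS process is assumed to have at most one tracer, and the blocking behaviour of CLEAR is omitted because the toy model has no messages in transit.

```python
class TracingRegistry:
    """Toy stand-in for the tracing infrastructure (hypothetical, single-threaded)."""

    def __init__(self):
        # SuS PID -> PID of the tracer currently receiving its trace events.
        self.tracer_of = {}

    def trace(self, sus_pid, tracer_pid):
        """TRACE: register tracer_pid as the tracer of sus_pid."""
        if sus_pid in self.tracer_of:
            raise ValueError(f"{sus_pid} is already traced")
        self.tracer_of[sus_pid] = tracer_pid

    def clear(self, sus_pid, tracer_pid):
        """CLEAR: undo TRACE. The real operation also blocks until in-transit
        events are delivered; this toy model has none, so it returns at once."""
        if self.tracer_of.get(sus_pid) == tracer_pid:
            del self.tracer_of[sus_pid]

    def preempt(self, sus_pid, tracer_pid):
        """PREEMPT: CLEAR the current tracer (if any), then TRACE with the new one."""
        previous = self.tracer_of.get(sus_pid)
        if previous is not None:
            self.clear(sus_pid, previous)
        self.trace(sus_pid, tracer_pid)
        return previous

    def spawn(self, parent_pid, child_pid):
        """Tracing inheritance: a child starts out with the tracer of its parent."""
        if parent_pid in self.tracer_of:
            self.tracer_of[child_pid] = self.tracer_of[parent_pid]


registry = TracingRegistry()
registry.trace("p_s", "p_t")      # root tracer traces P
registry.spawn("p_s", "q_s")      # Q inherits the tracer of P
registry.preempt("q_s", "q_t")    # a new tracer takes over tracing Q
```

After these three calls, `registry.tracer_of` maps `p_s` to `p_t` and `q_s` to `q_t`, which mirrors the separate instrumentation of *P* and *Q* that the text describes.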

Our presentation in secs. [3.1](#page-8-0) – 3.6 of how RIARC addresses the challenges listed in tbl. [2](#page-7-1) focuses on the set-up of fig. [2b](#page-5-0), where the processes *P*, *Q* and *R* are instrumented separately. This specific case highlights two aspects. First, it *emphasises* the complications that RIARC overcomes to establish the desired set-up while ensuring trace soundness, def. [1](#page-1-4). Second, fig. [2b](#page-5-0) *covers all* other possible instrumentation set-ups. Disjoint sets of SuS processes, including the one shown in fig. [1c](#page-3-0), can be obtained when tracers do not act on certain $\sim$ (*spawn*) events, as sec. [3.1](#page-8-0) explains. Notably, *any* centralised set-up, *e.g.* the one in fig. [1b](#page-3-0), emerges naturally when the root tracer disregards all $\sim$ events exhibited by the SuS.

$\blacktriangleright$ Note 6 (Naming conventions). For clarity, we adopt the convention that a SuS process *P* is spawned from the signature $f_{S_P}$ and is assigned the PID $p_S$. A tracer for *P* is named $T_P$ (short for $T_{\{P\}}$) and has the PID $p_T$. Other processes are treated likewise, *e.g.* the SuS process *Q* has signature $f_{S_Q}$, PID $q_S$, while the tracer $T_Q$ for *Q* has PID $q_T$, *etc.*

## <span id="page-8-0"></span>**3.1 Growing the set-up**

Fig. [4](#page-8-1) illustrates how the hierarchical creation sequence of SuS processes described in sec. [2.1](#page-4-4) is exploited to instrument separate tracers. RIARC programs tracers to react to $\sim$ (*spawn*) events in the trace. In fig. [4a](#page-8-1), the root tracer $T_P$ traces process *P*, step $\mathbb{O}$. When *P* spawns process *Q*, *Q* automatically inherits $T_P$ (tracing assumption $A_2$ from sec. [2.1](#page-4-4)). Steps $\circledcirc$ in fig. [4a](#page-8-1) emphasise that tracing inheritance is instantaneous. The event $e = \langle \text{evt}, \sim, p_s, q_s, f_{s_Q} \rangle$ is generated by *P* when it spawns its child *Q*, step $\circled{3}$ in fig. [4a](#page-8-1). The PID values of the parent and child processes carried by *e*, namely $p_s$ and $q_s$, are accessed via the indexes $e.\iota_s$ and $e.\jmath_s$ respectively (see tbl. [1a](#page-6-1)). Tracer $T_P$ uses this PID information to instrument a new tracer $T_Q$ for process *Q* in step $\circled{0}$ of fig. [4b](#page-8-1). By invoking PREEMPT($q_S, q_T$), $T_Q$ takes over tracing process *Q* from the former tracer $T_P$ going forward. $T_Q$ creates a new trace partition for process *Q* that is independent of the partition of *P*, step **6**. Meanwhile, $T_P$ receives the send event $\langle \text{evt}, !, p_s, q_s \rangle$ in step $\textcircled{1}$ after *P* messages *Q* in step $\textcircled{1}$ of fig. [4c](#page-8-1). Subsequent $\sim$ events that $T_P$ or $T_Q$ may gather are handled as described in steps $\circled{3}$-$\circled{3}$. Figs. [4c](#page-8-1) and [4d](#page-8-1) show how the final tracer $T_R$ is instrumented in step $\textcircled{12}$ after *Q* spawns *R* in step $\textcircled{8}$. As before, $T_Q$ traces *R* automatically in step **8**. $T_Q$ receives the event $\langle \text{evt}, \sim, q_s, r_s, f_{s_R} \rangle$ generated by *Q* in step $\mathbb{I}$. $T_R$ invokes PREEMPT($r_S, r_T$) to create the trace partition for *R* in step $\mathbb{I}$.

<span id="page-8-1"></span>

**Figure 4** Growing the tracer instrumentation set-up for processes *P*, *Q* and *R* (monitors omitted)

<span id="page-9-1"></span>

**Figure 5** Next-hop trace event routing using tracer routing maps Π (monitors omitted)

## <span id="page-9-0"></span>**3.2 Ensuring complete traces**

The asynchrony between the SuS and tracer processes can induce the interleaved execution shown in fig. [5](#page-9-1), as an alternative execution to that shown in figs. [4b](#page-8-1)–[4d](#page-8-1). In fig. [5a](#page-9-1), $T_P$ is slow to handle the event $\sim_P$ it receives in $\circled{3}$ of fig. [4a](#page-8-1) and fails to instrument $T_Q$ promptly. Consequently, the events $?_Q$ and $\sim_Q$ that *Q* exhibits are sent to $T_P$ in steps $\mathcal D$ and $\mathcal Q$ of fig. [5a](#page-9-1). Step $\mathcal Q$ shows the case where $\langle \text{evt}, ?, q_S \rangle$ is processed by $T_P$, rather than by the *intended* tracer $T_Q$ that would have been instrumented by $T_P$. This error breaches the *completeness* property of trace soundness w.r.t. *Q*, as the events $?_Q$ and $\sim_Q$ meant for *Q* reach the wrong tracer $T_P$.

To address this issue, RIARC uses a next-hop routing approach, where tracers *retain* the events they should handle and *forward* the rest to neighbouring tracers. We use the term *dispatch tracer* (*dispatcher* for short) to describe a tracer that receives trace events meant to be handled by another tracer. For instance, $T_P$ in fig. [5a](#page-9-1) becomes the dispatch tracer for *Q* when it receives the events $?_Q$ and $\sim_Q$ exhibited by *Q*, steps $\circled{D}$ and $\circled{9}$. We expect these events to be handled by $T_Q$ once it is instrumented. Dispatchers are tasked with embedding trace events (evt) or detach requests (dtc) into routing packet messages (rtd) and transmitting them to the next *known* hop. In fig. [5b](#page-9-1), $T_P$ dispatches the events $?_Q$ and $\sim_Q$ as follows. It first instruments *Q* with $T_Q$ in step 11. Next, $T_P$ prepares $\langle \text{evt}, ?, q_S \rangle$ and $\langle \text{evt}, \sim, q_S, r_S, f_{S_R} \rangle$ for transmission by embedding each in rtd messages (steps 14 and 18). $T_P$ forwards the resulting routing packets, $\langle \text{rtd}, \langle \text{evt}, ?, q_S \rangle \rangle$ and $\langle \text{rtd}, \langle \text{evt}, \sim, q_S, r_S, f_{S_R} \rangle \rangle$, to its next-hop neighbour $T_Q$ in steps 15 and 19. The trace event $\langle \text{evt}, !, p_S, q_S \rangle$, however, is not forwarded but handled by $T_P$, as step 12 shows. Concurrently, $T_Q$ acts on the forwarded events $?_Q$ and $\sim_Q$ in steps 16 and 21 and instruments $T_R$ as a result, step 22.

Tracers determine the events to retain or forward using the routing map, $\Pi : \text{PID}_S \to \text{PID}_T$. Every tracer queries its private routing map for each message it receives on SuS PID $m.\imath_S$. A tracer forwards a message to its neighbouring tracer with PID $\jmath_T$ if a next-hop for that

```
■ Algorithm 1 Logic handling ◦ trace events, detach request dispatching, and forwarding
 1 def LOOP◦(σ, ςM)
 2   forever do
 3     m ← next message from trace buffer κ
 4     match m.τ do
 5       case evt: σ ← HANDLEVT◦(σ, ςM, m)
 6       case dtc: σ ← DISPATCHDTC(σ, ςM, m)
 7       case rtd: σ ← FORWDRTD◦(σ, ςM, m)
 8 def HANDLEVT◦(σ, ςM, e)
 9   match e.ℓ do
10     case ∼: return HANDLSPWN◦(σ, ςM, e)
11     case ⋆: return HANDLEXIT◦(σ, ςM, e)
12     case !,?: return HANDLCOMM◦(σ, ςM, e)
13 def HANDLSPWN◦(σ, ςM, e)
14   match σ.Π(e.ıS) do
15     case ⊥: # No next-hop for e.ıS; handle e
16       ANALYSEEVT(ςM, e) # C.5.2
17       σ ← INSTRUMENT◦(σ, e, self())
18     case ȷT: # Next-hop for e.ıS exists via ȷT
19       DISPATCH(e, ȷT)
         # Set next-hop of e.ȷS to tracer of e.ıS
20       σ.Π ← σ.Π ∪ {⟨e.ȷS, ȷT⟩}
21   return σ
22 def HANDLEXIT◦(σ, ςM, e)
23   match σ.Π(e.ıS) do
24     case ⊥: # No next-hop for e.ıS; handle e
25       ANALYSEEVT(ςM, e) # C.5.2
26       σ.Γ ← σ.Γ \ {⟨e.ıS, ◦⟩}
27       TRYGC(σ)
28     case ȷT: DISPATCH(e, ȷT)
29   return σ
30 def HANDLCOMM◦(σ, ςM, e)
31   match σ.Π(e.ıS) do
32     case ⊥: ANALYSEEVT(ςM, e) # C.5.2
33     case ȷT: DISPATCH(e, ȷT)
34   return σ
35 def DISPATCHDTC(σ, d)
36   match σ.Π(d.ıS) do
37     case ⊥: fail dtc next-hop must be defined
38     case ȷT:
39       DISPATCH(d, ȷT)
         # Next-hop for d.ıS no longer needed
40       σ.Π ← σ.Π \ {⟨d.ıS, ȷT⟩}
41       TRYGC(σ)
42   return σ
43 def FORWDRTD◦(σ, r)
44   m ← r.m # Read embedded message in r
45   match m.τ do
46     case dtc: return FORWDDTC(σ, r)
47     case evt: return FORWDEVT(σ, r)
48 def FORWDDTC(σ, r)
49   d ← r.m
50   match σ.Π(d.ıS) do
51     case ⊥: fail dtc next-hop must be defined
52     case ȷT:
53       FORWD(r, ȷT)
         # Next-hop for d.ıS no longer needed
54       σ.Π ← σ.Π \ {⟨d.ıS, ȷT⟩}
55       TRYGC(σ)
56   return σ
57 def FORWDEVT(σ, r)
58   e ← r.m
59   match σ.Π(e.ıS) do
60     case ⊥: fail evt next-hop must be defined
61     case ȷT:
62       FORWD(r, ȷT)
         # For spawn events, tracer also sets a new next-hop for e.ȷS
         # Next-hop of e.ȷS to same tracer of e.ıS
63       if (e.ℓ = ∼)
64         σ.Π ← σ.Π ∪ {⟨e.ȷS, ȷT⟩}
65   return σ
```
message exists, *i.e.*, $\Pi(m.\imath_S) = \jmath_T$. When the next-hop is undefined, *i.e.*, $\Pi(m.\imath_S) = \bot$, *m* is handled by the tracer. HANDLSPWN, HANDLEXIT and HANDLCOMM in alg. [1](#page-10-0) implement this forwarding logic on lines [14](#page-9-0), [23](#page-9-0) and [31](#page-9-0).
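The retain-or-forward decision described above amounts to a single lookup in the routing map. The sketch below illustrates it under simplifying assumptions: the function name `route` and the string PIDs are hypothetical, a plain dictionary stands in for $\Pi$, and a missing key plays the role of $\bot$.

```python
from typing import Dict, Optional, Tuple


def route(routing_map: Dict[str, str], sus_pid: str) -> Tuple[str, Optional[str]]:
    """Decide what a tracer does with a message about the SuS process sus_pid.

    Returns ("forward", next_hop) when a next-hop is defined for sus_pid,
    and ("handle", None) when the lookup yields no next-hop.
    """
    next_hop = routing_map.get(sus_pid)  # None stands in for "undefined"
    if next_hop is None:
        return ("handle", None)
    return ("forward", next_hop)


# Hypothetical routing map of the root tracer once Q has its own tracer.
pi_p = {"q_s": "q_t"}
decision_for_q = route(pi_p, "q_s")  # forwarded to q_t
decision_for_p = route(pi_p, "p_s")  # handled locally
```

Because the decision only consults the local map, a tracer needs no global view of the set-up to route a message.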

<span id="page-10-2"></span><span id="page-10-1"></span>Dynamically populating the routing map is key to transmitting messages between tracers. A tracer adds the new mapping $e.\jmath_S \mapsto \jmath_T$ to its routing map $\Pi$ in case [1](#page-10-1) or [2](#page-10-2) below whenever it processes spawn trace events $e = \langle \text{evt}, \sim, \imath_S, \jmath_S, \varsigma_S \rangle$. One of two cases is considered for $e.\imath_S$:

1. $\Pi(\imath_S) = \bot$. The next-hop for *e* is undefined, which cues the tracer to instrument the SuS process with PID $\jmath_S$. When applicable, the tracer processes the event *and* instruments a separate tracer with PID $\jmath_T$. It then adds the mapping $e.\jmath_S \mapsto \jmath_T$ to $\Pi$. The tracer leaves $\Pi$ *unmodified* and handles the event itself if a separate tracer is not required. Opting for a separate tracer is determined by the instrumentation map $\Lambda$, as discussed in sec. [3.5](#page-14-0).


| <b>Expect:</b> $e = \langle \text{evt}, \sim, \imath_S, \jmath_S, \varsigma_S \rangle$ | <b>Expect:</b> $e = \langle \text{evt}, \sim, \imath_S, \jmath_S, \varsigma_S \rangle$ |
|---|---|
| 1 def INSTRUMENT$_\circ(\sigma, e, \imath_T)$ | 8 def INSTRUMENT$_\bullet(\sigma, e, \imath_T)$ |
| 2 $\quad$ if $((\varsigma_M \leftarrow \sigma.\Lambda(e.\varsigma_S)) \neq \bot)$ | 9 $\quad$ if $((\varsigma_M \leftarrow \sigma.\Lambda(e.\varsigma_S)) \neq \bot)$ |
| $\qquad$ # New tracer $\jmath_T$ for new SuS process $e.\jmath_S$ | $\qquad$ # New tracer $\jmath_T$ for new SuS process $e.\jmath_S$ |
| 3 $\qquad \jmath_T \leftarrow \text{spwn}(\text{TRACER}(\sigma, \varsigma_M, e.\jmath_S, \imath_T))$ | 10 $\qquad \jmath_T \leftarrow \text{spwn}(\text{TRACER}(\sigma, \varsigma_M, e.\jmath_S, \imath_T))$ |
| 4 $\qquad \sigma.\Pi \leftarrow \sigma.\Pi \cup \{ \langle e.\jmath_S, \jmath_T \rangle \}$ | 11 $\qquad \sigma.\Pi \leftarrow \sigma.\Pi \cup \{ \langle e.\jmath_S, \jmath_T \rangle \}$ |
| 5 $\quad$ else | 12 $\quad$ else |
| $\qquad$ # In $\circ$ mode, this tracer has detached | $\qquad$ # In $\bullet$ mode, this tracer must detach |
| $\qquad$ # all processes from its dispatcher $\imath_T$ | $\qquad$ # SuS process $e.\jmath_S$ from its dispatcher $\imath_T$ |
| $\qquad$ # This tracer traces new SuS process $e.\jmath_S$ | 13 $\qquad$ DETACH$(e.\jmath_S, \imath_T)$ |
| $\qquad$ # by tracing inheritance assumption $A_2$ | $\qquad$ # This tracer traces new SuS process $e.\jmath_S$ |
| 6 $\qquad \sigma.\Gamma \leftarrow \sigma.\Gamma \cup \{ \langle e.\jmath_S, \circ \rangle \}$ | 14 $\qquad \sigma.\Gamma \leftarrow \sigma.\Gamma \cup \{ \langle e.\jmath_S, \bullet \rangle \}$ |
| 7 $\quad$ return $\sigma$ | 15 $\quad$ return $\sigma$ |

<span id="page-11-0"></span>■ **Algorithm 2** Tracer instrumentation operations for direct ( $\circ$ ) and priority (●) modes

2. $\Pi(\imath_S) = \jmath_T$. The next-hop for *e* is defined, and the tracer forwards the event to the neighbouring tracer $\jmath_T$. The tracer also records a new next-hop by adding $e.\jmath_S \mapsto \jmath_T$ to $\Pi$.

The addition of $e.\jmath_S \mapsto \jmath_T$ in cases [1](#page-10-1) and [2](#page-10-2) ensures that future events originating from $\jmath_S$ can always be forwarded via a next-hop to a neighbouring tracer $\jmath_T$ (see invariants on lines [37](#page-9-0), [51](#page-9-0), and [60](#page-9-0)). Fig. [5b](#page-9-1) shows the routing maps of the tracers $T_P$ and $T_Q$. $T_P$ adds $q_S \mapsto q_T$ in step $\textcircled{13}$ after processing $\langle \text{evt}, \sim, p_S, q_S, f_{S_Q} \rangle$ from its trace buffer in $\textcircled{10}$. $T_P$ then instruments *Q* with the tracer $T_Q$ in step $\overline{w}$; an instance of case [1](#page-10-1). The function INSTRUMENT in alg. [2](#page-11-0) details this on line [4](#page-10-2), where the mapping $e.\jmath_S \mapsto \jmath_T$ is added to $\Pi$ following the creation of tracer $\jmath_T$, line [3](#page-10-2). Step 20 of fig. [5b](#page-9-1) is an instance of case [2](#page-10-2). Here, $T_P$ adds $r_S \mapsto q_T$ to $\Pi_P$ after processing $\langle \text{evt}, \sim, q_S, r_S, f_{S_R} \rangle$ for *R* in step $\circled{B}$ since $\Pi_P(q_S) = q_T$. Crucially, $T_P$ does not instrument a new tracer, but delegates the task to $T_Q$ by forwarding $\sim_Q$. Lines [20](#page-9-0) and [64](#page-9-0) in alg. [1](#page-10-0) (and later line [24](#page-12-0) in alg. [3](#page-13-1)) are manifestations of this, where the mapping $e.\jmath_S \mapsto \jmath_T$ is added after the $\sim$ event *e* is forwarded to the next-hop $\jmath_T$. $T_Q$ instruments the SuS process *R* in step 22 with $T_R$, which has the PID $r_T$. It then adds the mapping $r_S \mapsto r_T$ to $\Pi_Q$ in step 24, as no next-hop is defined for $q_S$, *i.e.*, $\Pi_Q(q_S) = \bot$. Henceforth, any events exhibited by *R* and received at $T_P$ can be dispatched by the latter tracer through $T_Q$ to $T_R$.
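The two cases for populating the routing map can be replayed on the running example of processes *P*, *Q* and *R*. The following is a hedged sketch rather than the logic of algs. 1 – 3: the function `on_spawn`, its `new_tracer` parameter (standing in for the decision taken via $\Lambda$), and the string PIDs are all hypothetical, and the updates to $\Gamma$ are left out.

```python
def on_spawn(routing_map, parent_pid, child_pid, new_tracer=None):
    """Update a routing map for a spawn event where parent_pid spawns child_pid.

    new_tracer is the PID of a freshly instrumented tracer for the child, or
    None when no separate tracer is required (hypothetical stand-in for the
    instrumentation map lookup).
    """
    next_hop = routing_map.get(parent_pid)
    if next_hop is None:
        # Case 1: no next-hop for the parent, so this tracer handles the event.
        if new_tracer is not None:
            routing_map[child_pid] = new_tracer
            return ("instrument", new_tracer)
        return ("handle", None)  # routing map left unmodified
    # Case 2: next-hop exists; forward the event and route the child the same way.
    routing_map[child_pid] = next_hop
    return ("forward", next_hop)


pi_p, pi_q = {}, {}
on_spawn(pi_p, "p_s", "q_s", new_tracer="q_t")  # T_P instruments T_Q (case 1)
on_spawn(pi_p, "q_s", "r_s")                    # T_P forwards to T_Q (case 2)
on_spawn(pi_q, "q_s", "r_s", new_tracer="r_t")  # T_Q instruments T_R (case 1)
```

After the three calls, `pi_p` holds the mappings for `q_s` and `r_s`, both pointing to `q_t`, while `pi_q` maps `r_s` to `r_t`, matching the routing maps the text attributes to $T_P$ and $T_Q$.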

We note that every tracer is only aware of its neighbouring tracers. This means messages may pass through multiple tracers before reaching their intended destination. Next-hop routing keeps the logic inside RIARC straightforward since tracers forward messages based solely on local information in their routing map. Such an approach makes the instrumentation set-up readily adaptable to dynamic changes in the SuS, is easier to scale, and has been shown to induce lower latency when compared to general routing strategies [\[83,](#page-30-10) [107\]](#page-31-13). The DAG of interconnected tracers induced by next-hop routing ensures that every message is eventually delivered to the correct tracer if a path exists or is handled by the tracer otherwise. Fig. [5b](#page-9-1) illustrates this concept, where the next-hop mappings inside $\Pi_P$ point to $T_Q$, and the mappings in $\Pi_Q$ point to $T_R$ in turn. Consequently, any events that *R* exhibits and that $T_P$ receives are forwarded *twice* to reach the target tracer $T_R$: from tracer $T_P$ to $T_Q$, and from $T_Q$ to $T_R$. RIARC relies on the operations DISPATCH and FORWD to accomplish next-hop routing (see alg. [4](#page-34-2) in app. [A](#page-34-1)). DISPATCH creates a routing packet $\langle i_s, m \rangle$ and embeds the trace event or detach message *m* to be routed. Alg. [1](#page-10-0) shows how routing packets are handled by tracers. For instance, FORWDEVT extracts the embedded message from the routing packet on line [58](#page-9-0) and queries the routing map to determine the next-hop, line [59](#page-9-0). If a next-hop exists, the packet is forwarded, as FORWD($r, \jmath_T$) on line [62](#page-9-0) indicates. Crucially, the **fail** invariant on line [60](#page-9-0) asserts that the next-hop for a routing packet is *always* defined. The cases for DISPATCHDTC and FORWDDTC in alg. [1](#page-10-0) are analogous.
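The hop-by-hop delivery described above can be traced with a few lines of code. This sketch makes several simplifying assumptions: the function `delivery_path` is hypothetical, every tracer's routing map is visible in one dictionary (real tracers only see their own), tracer modes are ignored, and an undefined next-hop simply means that the current tracer handles the message.

```python
def delivery_path(routing_maps, start_tracer, sus_pid):
    """Follow next-hop mappings for sus_pid, starting at start_tracer.

    routing_maps maps each tracer PID to its private routing map. The result
    lists the tracers visited; the last one handles the message.
    """
    path = [start_tracer]
    current = start_tracer
    while True:
        next_hop = routing_maps[current].get(sus_pid)
        if next_hop is None:
            return path  # no next-hop: the current tracer handles the message
        if next_hop in path:
            # Cannot happen when the tracers form a DAG; guards the sketch.
            raise RuntimeError("routing cycle detected")
        path.append(next_hop)
        current = next_hop


# Routing maps of the running example (hypothetical string PIDs).
maps = {
    "p_t": {"q_s": "q_t", "r_s": "q_t"},
    "q_t": {"r_s": "r_t"},
    "r_t": {},
}
path_for_r = delivery_path(maps, "p_t", "r_s")
```

Here `path_for_r` lists `p_t`, `q_t` and `r_t` in that order, reflecting the two forwarding steps needed for an event of *R* that is received by $T_P$.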

## <span id="page-12-0"></span>**3.3 Ensuring consistent traces**

Next-hop routing alone does not guarantee trace consistency, *i.e.,* that the order of events in the trace reflects the one in which these occur locally at SuS processes, def. [1](#page-1-4). Trace event reordering arises when a tracer gathers events of a SuS process (we call these *direct events*) and simultaneously receives *routed events* concerning said process from other tracers. Fig. [6a](#page-12-1) gives another interleaving to the one of fig. [5b](#page-9-1) to underscore the deleterious effect such a race condition provokes when events are reordered at $T_Q$. In step $\textcircled{2}$, $T_Q$ takes over from $T_P$ to continue tracing process *Q*. $T_Q$ collects the event $\star_Q$ in step **55**, which happens before $T_Q$ receives the routed event $?_Q$ concerning *Q* in step 17 of fig. [6a](#page-12-1). If $T_Q$ processes events from its trace buffer $\kappa_Q$ in sequence, as in step $\Omega$, it violates trace consistency w.r.t. *Q* (the correct trace should be '$?_Q.\sim_Q.\star_Q$'). Naïvely handling $\star$ before ? erroneously reflects that *Q* receives messages after it terminates.

RIARC tracers resolve this issue by prioritising the processing of routed trace events using selective message reception [\[42\]](#page-29-15). In doing so, tracers encode the invariant that '*routed* events temporally precede all others that are gathered *directly* by the tracer'. RIARC tracers operate in one of two modes, priority (•) and direct ($\circ$), which adequately distinguishes past (*i.e.*, routed) and current (*i.e.,* direct) events from the perspective of the tracer receiving them.
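The effect of selective message reception can be imitated with an ordinary queue. The sketch below is an approximation under stated assumptions: `next_message`, the tuple encoding of messages, and the mode strings are hypothetical, and the switch to direct mode is triggered here by running out of routed messages, whereas RIARC tracers switch only after their detach requests are acknowledged.

```python
from collections import deque


def next_message(buffer, mode):
    """Dequeue the next message a tracer should process, or None if there is none.

    In priority mode only routed ("rtd") messages are selected, and direct
    events stay queued in their arrival order. In direct mode the buffer is
    consumed first-in, first-out.
    """
    if mode == "priority":
        for index, message in enumerate(buffer):
            if message[0] == "rtd":
                del buffer[index]
                return message
        return None
    return buffer.popleft() if buffer else None


# Trace buffer of a tracer for Q: a direct exit event arrives before two routed events.
buffer = deque([
    ("evt", "exit", "q_s"),
    ("rtd", ("evt", "recv", "q_s")),
    ("rtd", ("evt", "spawn", "q_s", "r_s")),
])
processed, mode = [], "priority"
while buffer:
    message = next_message(buffer, mode)
    if message is None:
        mode = "direct"  # simplification: see the assumption stated above
        continue
    processed.append(message)
```

The routed receive and spawn events end up in `processed` ahead of the directly gathered exit event, which restores the order in which *Q* exhibited them.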

Fig. [6b](#page-12-1) illustrates this concept. It shows that when in priority mode, $T_Q$ dequeues the routed events $?_Q$ and $\sim_Q$ labelled by $\bullet$ first. The event $?_Q$ is handled in step 23, whereas $\sim_Q$ results in the instrumentation of tracer $T_R$ in step 25 of fig. [6b](#page-12-1). Meanwhile, $T_Q$ can still receive events directly from *Q* while priority events are being handled. Yet, direct trace events from *Q* are considered only *after* $T_Q$ transitions to direct mode. Newly-instrumented tracers default to • mode to implement the described logic; see line [14](#page-34-3) in alg. [4](#page-34-2) of app. [A](#page-34-1). LOOP$_\bullet$ in alg. [3](#page-13-1) shows the logic prioritising routed events, which are dequeued on line [3](#page-12-0) and handled on line [6](#page-12-0). HANDLSPWN, HANDLEXIT, and HANDLCOMM in LOOP$_\circ$ and LOOP$_\bullet$

<span id="page-12-1"></span>

**Figure 6** Trace event reordering using priority (●) and direct (○) tracer modes (monitors omitted)


```
 1 def LOOP•(σ, ςM)
 2   forever do
 3     r ← next rtd message from trace buffer κ
 4     m ← r.m # Read embedded message in r
 5     match m.τ do
 6       case evt: σ ← HANDLEVT•(σ, ςM, r)
 7       case dtc:
           # dtc ack relayed from dispatch tracer
 8         σ ← HANDLDTC(σ, ςM, r)
 9 def HANDLEVT•(σ, ςM, r)
10   e ← r.m
11   match e.ℓ do
12     case ∼: return HANDLSPWN•(σ, ςM, r)
13     case ⋆: return HANDLEXIT•(σ, ςM, r)
14     case !,?: return HANDLCOMM•(σ, ςM, r)
15 def HANDLSPWN•(σ, ςM, r)
16   e ← r.m
17   match σ.Π(e.ıS) do
18     case ⊥: # No next-hop for e.ıS; handle e
19       ANALYSEEVT(ςM, e) # C.5.2
20       ıT ← r.ıT # Read PID of dispatch tracer
21       σ ← INSTRUMENT•(σ, e, ıT)
22     case ȷT: # Next-hop for e.ıS exists via ȷT
23       FORWD(r, ȷT)
         # Set next-hop of e.ȷS to tracer of e.ıS
24       σ.Π ← σ.Π ∪ {⟨e.ȷS, ȷT⟩}
25   return σ
26 def HANDLEXIT•(σ, ςM, r)
27   e ← r.m
28   match σ.Π(e.ıS) do
29     case ⊥: # No next-hop for e.ıS; handle e
30       ANALYSEEVT(ςM, e) # C.5.2
31       σ.Γ ← σ.Γ \ {⟨e.ıS, •⟩}
32       TRYGC(σ)
33     case ȷT: FORWD(r, ȷT)
34   return σ
35 def HANDLCOMM•(σ, ςM, r)
36   e ← r.m
37   match σ.Π(e.ıS) do
38     case ⊥: ANALYSEEVT(ςM, e) # C.5.2
39     case ȷT: FORWD(r, ȷT)
40   return σ
41 def HANDLDTC(σ, ςM, r)
42   d ← r.m
43   match σ.Π(d.ȷS) do
44     case ⊥:
45       assert d.ıT = self() unexpected dtc ack
46       σ.Γ ← σ.Γ \ {⟨d.ȷS, •⟩} ∪ {⟨d.ȷS, ◦⟩}
47       if ({⟨ıS, γ⟩ | ⟨ıS, γ⟩ ∈ σ.Γ, γ = •} = ∅)
48         LOOP◦(σ, ςM) # Put tracer in ◦ mode
49     case ȷT:
50       assert d.ıT ≠ self() dtc meant for ıT
51       FORWD(r, ȷT)
52   return σ
```
<span id="page-13-1"></span>**Algorithm 3** Logic handling • trace events, detach request acknowledgements, and forwarding

handle events *differently*. A tracer in direct mode performs *one* of three actions (see alg. [1](#page-10-0)):

1. <span id="page-13-2"></span>it *analyses* events for RV purposes via the function ANALYSEEVT($\varsigma_M, e$), *e.g.* line 32,
2. it *dispatches* events that it directly gathers using DISPATCH($e, \jmath_T$), when events ought to be handled by other tracers, *e.g.* line [33](#page-9-0), or
3. <span id="page-13-3"></span>it *forwards* routed events to the next-hop through FORWD($r, \jmath_T$), *e.g.* line [62](#page-9-0).

Tracers in priority mode exclusively handle routed messages as points [1](#page-13-2) and [3](#page-13-3) describe, *e.g.* lines [38](#page-12-0) and [39](#page-12-0) in alg. [3](#page-13-1). However, no event dispatching is performed.

## <span id="page-13-0"></span>**3.4 Isolating tracers**

A tracer in priority mode coordinates with the dispatch tracer of a particular SuS process it traces. This enables the tracer to determine when *all* of the events of that process have been routed to it by the dispatch tracer. The negotiation is effected using dtc, which the tracer sends to the relevant dispatch tracer. Each tracer records the set of processes it traces in the *traced-processes map*, $\Gamma: \text{PID}_s \to \{\circ,\bullet\}$. A SuS process mapping is added to $\Gamma$ when a tracer starts gathering trace events for that process and removed once the process terminates. Lines [6](#page-10-2) and [14](#page-10-2) in alg. [2](#page-11-0) add fresh mappings to $\Gamma$; lines [26](#page-9-0) in alg. [1](#page-10-0) and [31](#page-12-0) in alg. [3](#page-13-1) purge mappings from $\Gamma$. A tracer in priority mode must issue a dtc request *for each* process it tracks in $\Gamma$ before it can transition to direct mode and start operating on the trace events it gathers directly. The detach request, $d = \langle \text{dtc}, i_T, i_S \rangle$, contains the PIDs of the issuing tracer and the SuS process to be detached from the dispatch tracer. Once the tracer receives an acknowledgement to the dtc request for the SuS PID $d.\iota_{\rm s}$ from the dispatch tracer, it updates the corresponding entry $d.\iota_s \mapsto \bullet$ in $\Gamma$, marking it as detached, $d.\iota_s \mapsto \circ$. Alg. [3](#page-13-1) shows this logic on line [46](#page-12-0). A tracer transitions from priority to direct mode once *all* the processes in its $\Gamma$ map are marked detached; line [47](#page-12-0) in alg. [3](#page-13-1) performs this check. Once in direct mode, tracers are isolated from others in the choreography.

Fig. [6b](#page-12-1) depicts the tracer $T_Q$ in priority mode sending the detach request $\langle \text{dtc}, q_{\rm T}, q_{\rm S} \rangle$ for SuS PID *Q* to the dispatch tracer. This happens in step $\circled{3}$, after $T_Q$ starts tracing *Q* directly in step $\circled{2}$. Alg. [2](#page-10-2) effects this transaction with the dispatch tracer by the operation DETACH on line [13](#page-10-2); see app. [A](#page-34-1) for the definition of DETACH. The dtc request issued by $T_Q$ is deposited in the trace buffer of the dispatch tracer $T_P$ after the events $?_Q$ and $\sim_Q$. $T_P$ processes the messages in its buffer sequentially in $(\mathbb{D}, \mathbb{D}, \mathbb{D}, \mathbb{D})$ and $(\mathbb{B})$, and forwards $?_Q$ and $\sim_Q$ to $T_Q$, steps 18 and 21. Crucially, $T_P$ *acknowledges* the dtc request issued by $T_Q$: $T_P$ dispatches dtc back to tracer $T_Q$, as step 29 indicates. $T_Q$ first handles the events $?_Q$ and $\sim_Q$ (tagged with • in fig. [6b](#page-12-1)) in steps 23 and 24. Lastly, $T_Q$ handles dtc in step 30 and marks process *Q* as detached from its dispatch tracer $T_P$. The update on the traced-processes map $\Gamma$ is performed by HANDLDTC on line [46](#page-12-0) in alg. [3](#page-13-1). Tracer $T_Q$ in fig. [6b](#page-12-1) transitions to direct mode in step $\circled{3}$, when the only process *Q* that it traces is detached. $T_Q$ resumes handling $\star_Q$ in step 32, which is consistent w.r.t. the events exhibited locally at *Q*, *i.e.*, $?_Q \cdot \sim_Q \cdot \star_Q$.

An acknowledgement to a detach request sent from a dispatch tracer, $\langle \text{dtc}, \iota_{\rm T}, \iota_{\rm s} \rangle$, is generally propagated through multiple next-hops before it reaches the tracer with PID $\iota_{\rm T}$ issuing the request. Since a dtc request informs the dispatch tracer that $\iota_{\rm T}$ is gathering trace events for the SuS PID $\iota_{\rm s}$ *directly*, the next-hop entries in the routing maps of tracers on the DAG path from the dispatch tracer to $\iota_{\rm T}$ are *stale*. Each tracer on this DAG path purges the next-hop entry for the SuS PID $\iota_{\rm s}$ in $\Pi$ once it forwards dtc to the neighbouring tracer. DispatchDtc and ForwdDtc in alg. [1](#page-10-0) perform this clean-up. Fig. [6b](#page-12-1) does not illustrate the latter clean-up flow, which we summarise next. After receiving dtc, the dispatch tracer $T_P$ removes from $\Pi_P$ the next-hop mapping $q_{\rm S} \mapsto q_{\rm T}$ and calls DISPATCHDTC to acknowledge the detach request $\langle \text{dtc}, q_{\rm T}, q_{\rm S} \rangle$ it receives from $T_Q$. Similarly, $T_P$ removes $r_{\rm S} \mapsto q_{\rm T}$ once it acknowledges the detach request $\langle \text{dtc}, r_{\rm T}, r_{\rm S} \rangle$ sent from $T_R$. Once $T_Q$ receives the routing packet $\langle \text{rtd}, p_{\rm T}, \langle \text{dtc}, r_{\rm T}, r_{\rm S} \rangle \rangle$ that embeds the detach acknowledgement $T_P$ sends, it removes the next-hop mapping $r_{\rm S} \mapsto r_{\rm T}$ from $\Pi_Q$. $T_Q$ then forwards this dtc acknowledgement to $T_R$.

RIARC ensures that all routing packets carrying dtc acknowledgements terminate at the tracers that issued these dtc requests. This requires *one* of two tracer conditions to hold:

<span id="page-14-1"></span>**1.** either the tracer cannot forward the dtc acknowledgement to a next-hop, meaning that the tracer sent the dtc request, or

<span id="page-14-2"></span>**2.** the tracer can forward the dtc acknowledgement via a next-hop, in which case the tracer did not issue the dtc request.

Alg. [3](#page-13-1) enforces this invariant on lines [44](#page-12-0) and [45](#page-12-0) for case [1](#page-14-1), and on lines [49](#page-12-0) and [50](#page-12-0) for case [2](#page-14-2).
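The acknowledgement flow and the two terminating conditions can be illustrated with a small sequential model. The following Python sketch is not the Erlang implementation: the `Tracer` class, the function names, and the example choreography (with $T_P$ routing to $T_Q$, and $T_Q$ routing to $T_R$) are assumptions made for illustration, and message passing is replaced by a loop that follows next-hops.

```python
from dataclasses import dataclass, field

PRIORITY, DETACHED = "priority", "detached"  # stand-ins for the two states recorded in Gamma


@dataclass
class Tracer:
    pid: str
    routes: dict = field(default_factory=dict)  # routing map Pi: SuS PID -> next-hop tracer PID
    traced: dict = field(default_factory=dict)  # traced-processes map Gamma: SuS PID -> state


def acknowledge_detach(tracers, dispatch_pid, issuer_pid, sus_pid):
    """Route a dtc acknowledgement from the dispatch tracer towards the issuer.

    Each tracer on the path purges its stale next-hop entry for sus_pid.
    Returns the PIDs of the tracers the acknowledgement visits, in order.
    """
    path, current = [], tracers[dispatch_pid]
    while True:
        path.append(current.pid)
        next_hop = current.routes.pop(sus_pid, None)  # purge the stale entry, if any
        if next_hop is None:
            # Condition 1: no next-hop, so this tracer must have issued the request.
            if current.pid != issuer_pid:
                raise RuntimeError("dtc acknowledgement cannot reach its issuer")
            current.traced[sus_pid] = DETACHED
            return path
        # Condition 2: a next-hop exists, so this tracer did not issue the request.
        if current.pid == issuer_pid:
            raise RuntimeError("issuer still holds a next-hop for its own process")
        current = tracers[next_hop]


def can_switch_to_direct(tracer):
    """A tracer may leave priority mode once every process in Gamma is detached."""
    return all(state == DETACHED for state in tracer.traced.values())


def example_choreography():
    """Hypothetical set-up: T_P dispatches for Q and R; T_Q is the next-hop for R."""
    return {
        "pT": Tracer("pT", routes={"qS": "qT", "rS": "qT"}, traced={"pS": DETACHED}),
        "qT": Tracer("qT", routes={"rS": "rT"}, traced={"qS": PRIORITY}),
        "rT": Tracer("rT", traced={"rS": PRIORITY}),
    }
```

In this model, acknowledging the detach request for `rS` visits `pT`, `qT`, and `rT` in turn, leaving no routing entry for `rS` behind, after which `rT` satisfies the check for switching to direct mode.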

### <span id="page-14-0"></span>**3.5 Minimising overhead**

Instrumenting specific processes, in contrast to fully instrumenting the SuS, reduces the volume of gathered trace events and helps lower the runtime overhead induced. RIARC uses the instrumentation map, $\Lambda: \text{SIG}_{\rm S} \to \text{SIG}_{\rm M}$, to this end. $\Lambda$ specifies the SuS function signatures to instrument and the corresponding RV monitor signatures tasked with the analysis via ANALYSEEVT. RIARC utilises the signature $e.\varsigma_{\rm s}$ carried by spawn events $e = \langle \text{evt}, \sim, \iota_{\rm s}, \jmath_{\rm s}, \varsigma_{\rm s} \rangle$ to determine whether the SuS process spawning $e.\varsigma_{\rm s}$ requires a separate tracer. The INSTRUMENT operations in alg. [2](#page-10-2) perform this check against $\Lambda$ (lines 2 and [9](#page-10-2)). If a separate tracer is not required, the spawned process, $e.\jmath_{\rm s}$, is instrumented using the tracer of its parent process, $e.\iota_{\rm s}$; see tracing assumptions $A_1$ and $A_2$. This logic caters for all the set-ups shown in figs. [1b](#page-3-0), [1c](#page-3-0), and [2b](#page-5-0).

### <span id="page-15-1"></span>**3.6 Shrinking the set-up**

RIARC remains elastic by discarding unneeded tracers. Tracers in direct and priority mode purge SuS PID references from the traced-processes map when handling $\star$ trace events. HANDLEXIT$_\circ$ and HANDLEXIT$_\bullet$ implement this logic in algs. [1](#page-10-0) and [3](#page-13-1) on lines [26](#page-9-0) and [31](#page-12-0). Tracer termination does *not* occur as soon as the tracer has no processes left to trace, *i.e.,* when $\Gamma = \emptyset$, since the tracer may still be required to forward trace events to neighbouring tracers. Instead, tracers perform a garbage collection check each time a mapping from $\Gamma$ or $\Pi$ is removed. A tracer terminates when $\Gamma = \Pi = \emptyset$, indicating that it has neither SuS processes left to trace nor any next-hop forwarding to perform. TRYGC, used on lines 27, 41, and 55 in alg. [1](#page-10-0), as well as on line [32](#page-12-0) in alg. [3](#page-13-1), encapsulates this check. Note that garbage collection never prematurely disrupts the RV analysis that tracers conduct, as invocations to AnalyseEvt always precede TryGC checks in the logic of algs. [1](#page-10-0) and [3](#page-13-1).
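The termination condition can be stated compactly in code. The Python sketch below is illustrative only: `try_gc` and `handle_exit` are hypothetical names standing in for the TryGC and exit-handling logic, the two maps are plain dictionaries, and the `analyse` callback stands in for AnalyseEvt.

```python
def try_gc(traced, routes):
    """Return True when the tracer may terminate, i.e. Gamma and Pi are both empty."""
    return not traced and not routes


def handle_exit(traced, routes, sus_pid, analyse):
    """Handle an exit event: analyse it first, then purge Gamma and run the GC check."""
    analyse(("exit", sus_pid))  # the analysis always precedes the garbage collection check
    traced.pop(sus_pid, None)   # the terminated process no longer needs tracing
    return try_gc(traced, routes)
```

A tracer whose $\Gamma$ becomes empty while $\Pi$ still holds a next-hop entry is kept alive in this sketch, mirroring the requirement that it may still need to forward events.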

## <span id="page-15-0"></span>**4 Correctness validation**

We assess the validity of RIARC in two stages. First, we confirm its implementability by instantiating the core logic of algs. [1](#page-10-0)–[3](#page-13-1) in Erlang. Our implementation targets two RV scenarios: online and offline monitoring [\[65,](#page-30-3) [21\]](#page-28-1). Second, we subject the implementation to a series of systematic tests using a selection of instrumentation set-ups. These tests exhaustively emulate the interleaved execution of the SuS and tracer processes by generating all the *valid* permutations of events in a set of traces. This exercises the tracer choreography invariants mentioned in sec. [3](#page-7-0), confirming the integrity of the tracer DAG topology under each interleaving. We also use specialised RV monitor signatures in AnalyseEvt to assert the soundness (def. [1](#page-1-4)) of trace event sequences analysed by tracers; see algs. [1](#page-10-0) and [3](#page-13-1) in sec. [3](#page-7-0).
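To give a feel for what generating permutations of events in a set of traces involves, the following Python sketch enumerates every interleaving of per-process traces. It assumes, purely for illustration, that a permutation is valid when it preserves the local event order of each process; the soundness notion of def. 1 (including causality between a spawn event and the events of the spawned process) is not modelled, and the function name is hypothetical.

```python
def interleavings(traces):
    """Yield every merge of the given traces that keeps each trace's own event order."""
    traces = [list(trace) for trace in traces if trace]  # ignore exhausted traces
    if not traces:
        yield []
        return
    for i, trace in enumerate(traces):
        head, rest = trace[0], trace[1:]
        remaining = traces[:i] + [rest] + traces[i + 1:]
        # Emit the next event of trace i, then interleave whatever is left.
        for tail in interleavings(remaining):
            yield [head] + tail
```

The number of interleavings grows combinatorially with the trace lengths, which is one reason exhaustive exploration is typically applied to a small selection of set-ups.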

## <span id="page-15-3"></span>**4.1 Implementability**

Our implementation of RIARC maps the tracer processes from sec. [3](#page-7-0) to Erlang actors<sup>[2](#page-15-2)</sup>. The routing (Π), instrumentation (Λ), and traced-processes (Γ) maps constituting the tracer state *σ* are realised as Erlang maps for efficient access. Trace event buffers *κ* coincide with actor mailboxes, while the remaining logic in algs. [1](#page-10-0)–[3](#page-13-1) translates directly to Erlang code. This one-to-one mapping gives us confidence that our implementation reflects the algorithm logic.

In *online* RV, monitors analyse trace events while the SuS executes, whereas the *offline* setting defers this analysis until the system terminates. Fig. [11](#page-36-0) in app. [B.1](#page-36-1) captures the distinction in process tracing between online and offline instrumentation in our setting (showing trace buffers only). The online instrumentation set-up (fig. [11a](#page-36-0)) employs the tracing infrastructure offered by the EVM, which deposits SuS trace event messages in tracer mailboxes. Erlang tracing complies with tracing assumption $A_1$, enabling RIARC to instrument disjoint SuS process sets. We configure the EVM with the set_on_spawn flag so that spawned processes automatically inherit the same tracer as their parent [\[42\]](#page-29-15). This tracer assignment is atomic, meeting tracing assumption $A_2$. We also use the procs, send, and receive tracing flags, which constrain the events emitted by the EVM to $\sim$, $\star$, !, and ?. The EVM enforces single-process tracing, *i.e.*, tracing assumption $A_3$, and guarantees that $\sim$ events of descendant processes are causally-ordered [\[137\]](#page-32-15), *i.e.*, tracing assumption $A_4$.

<span id="page-15-2"></span>The artefact may be found at <https://doi.org/10.5281/zenodo.10634182>.

The offline counterpart differs only in its tracing layer, where events are read as *recorded* runs of the SuS. Recorded runs can be obtained externally, *e.g.* using DTrace [\[37\]](#page-28-0) or LTTng [\[56\]](#page-29-2), making it possible to monitor systems that execute outside of the EVM. Our bespoke offline tracing engine of fig. [11b](#page-36-0) fulfils tracing assumptions $A_1$–$A_4$. This is crucial since it permits the *same* implementation of RIARC to be used in online and offline settings. Sec. [4.2](#page-16-0) leverages this aspect to validate RIARC exhaustively using trace permutations.

We develop two versions of the Trace, Clear, and Preempt functions of alg. [5](#page-35-0) to standardise the tracing API for online and offline use. The overloads for online use give access to the EVM tracing via the Erlang built-in primitive trace [\[42\]](#page-29-15). The second set of overloads wraps around our offline tracing engine to replay files containing specifically-formatted trace events. Offline tracing relaxes tracing assumption $A_4$, as recorded runs do not generally guarantee that the $\sim$ events of descendant SuS processes are causally ordered. Our offline tracing logic relies on the PID information carried by $\sim$ events to rearrange them causally and recover the causal ordering per tracing assumption $A_4$. TRACE($\iota_{\rm S}, \iota_{\rm T}$) registers a tracer $\iota_{\rm T}$ with the offline tracing engine, which maintains an event buffer for $\iota_{\rm T}$, together with a set of SuS PIDs that $\iota_{\rm T}$ traces. A tracer can invoke TRACE with multiple SuS PIDs to register for the events of a set of processes, *i.e.*, tracing assumption $A_1$. The tracing engine accumulates the events it reads from file in each tracer buffer and delivers events to the corresponding tracer mailbox once the causal ordering between $\sim$ events of descendant SuS processes is established. Our offline tracing engine implements tracing inheritance (tracing assumption $A_2$) and enforces single-process tracing (tracing assumption $A_3$). Ex. [7](#page-36-2) in app. [B.1](#page-36-1) sketches how the tracing engine uses its internal tracer buffers to deliver events to tracers.

## <span id="page-16-0"></span>**4.2 Correctness**

Conventional testing does not guarantee the absence of concurrency errors due to the different interleaved executions that may be possible [\[108\]](#page-31-14). While subjecting the system under test to high loads raises the likelihood of obtaining more coverage, this still depends on external factors, such as scheduling, which dictate the executions induced in practice. Controlling the conditions for concurrency testing requires a *systematic exploration* of all the interleaved executions [\[77\]](#page-30-11). In fact, it is *not the size* of the testing load that matters, but the choice of interleaved executions that exhaust the space of possible system states [\[13\]](#page-27-13). Concuerror [\[48\]](#page-29-17) is a tool for systematic Erlang code testing. Unfortunately, we could not use Concuerror to test our RIARC implementation, as we were unable to integrate it with Erlang tracing.

We, nevertheless, adopt the systematic scheme advocated by Concuerror. Our approach uses the offline tracing tool described in sec. [4.1](#page-15-3) to induce specific interleaved sequences for instrumentation set-ups, such as those of figs. [1b](#page-3-0), [1c](#page-3-0), and [2a](#page-5-0). We obtain these sequences by taking all the sound (def. [1](#page-1-4)) event permutations of traces produced by the SuS. These sequences are then replayed by the offline tracing engine to systematically induce interleaving sequences in the SuS. Our final RIARC implementation embeds additional invariants besides the ones mentioned in sec. [3](#page-7-0), *e.g.* the **assert** and **fail** statements in algs. [1](#page-10-0) and [3](#page-13-1). Readers are referred to app. [B.2](#page-37-0) for the full list. We ascertain *trace soundness* for each SuS interleaving that is emulated. This is accomplished via the function AnalyseEvt, which we preload with monitors that assert the event sequence expected at each tracer. We also use identical tests in our empirical evaluation of sec. [5](#page-17-0) under high loads. It is worth mentioning that while we systematically drive the execution of the SuS, we do not control the execution of tracers. Yet, we indirectly induce various dynamic tracer arrangements in the monitor DAG topology under the different groupings of SuS process sets that tracers instrument. For example, we fully instrument the system depicted in fig. [2a](#page-5-0) in all its configurations, *e.g.* $C_1 = [T_{\{P\}} \leadsto \{P\}, T_{\{Q\}} \leadsto \{Q\}, T_{\{R\}} \leadsto \{R\}]$, $C_2 = [T_{\{P,Q\}} \leadsto \{P,Q\}, T_{\{R\}} \leadsto \{R\}]$, ..., $C_5 = [T_{\{P,Q,R\}} \leadsto \{P,Q,R\}]$, as well as instrument it partially, *e.g.* $C_6 = [T_{\{P\}} \leadsto \{P\}]$, $C_7 = [T_{\{P,Q\}} \leadsto \{P,Q\}]$, *etc.* Each of these configurations, when individually paired with every fabricated interleaved execution of the SuS, indicates that our RIARC implementation and the corresponding logic of sec. [3](#page-7-0) are correct.
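The full-instrumentation configurations $C_1, \ldots, C_5$ coincide in number with the set partitions of $\{P, Q, R\}$, where each block is the process set assigned to one tracer. The Python sketch below enumerates such groupings; treating partial instrumentation as partitions of every non-empty subset is an assumption made here for illustration, and the text does not state how many partial configurations were exercised.

```python
from itertools import combinations


def partitions(processes):
    """Yield every way of splitting the list `processes` into non-empty groups."""
    if not processes:
        yield []
        return
    first, rest = processes[0], processes[1:]
    for smaller in partitions(rest):
        # Either place `first` in one of the existing groups...
        for i, group in enumerate(smaller):
            yield smaller[:i] + [[first] + group] + smaller[i + 1:]
        # ...or give it a tracer (group) of its own.
        yield [[first]] + smaller


def configurations(processes):
    """Yield groupings for full and partial instrumentation (assumed: any non-empty subset)."""
    for size in range(1, len(processes) + 1):
        for subset in combinations(processes, size):
            yield from partitions(list(subset))
```

For three processes, `partitions` yields five groupings, and `configurations` yields fourteen under the stated assumption.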

## <span id="page-17-0"></span>**5 Empirical evaluation**

We assess the feasibility of our RIARC implementation, confirming it safeguards the *responsive*, *resilient*, *message-driven*, and *elastic* attributes of the SuS. Sec. [4](#page-15-0) targets a small selection of instrumentation set-ups to induce interleaved execution sequences and validate correctness exhaustively. We now employ *stress testing* [\[112\]](#page-31-15) to investigate how RIARC performs in terms of the *runtime overhead* it exhibits. Our study focusses on *online* monitoring, as its overhead requirement is far more stringent than offline monitoring [\[64,](#page-30-12) [65,](#page-30-3) [21,](#page-28-1) [74\]](#page-30-1). We evaluate RIARC against inline instrumentation since the latter is regarded as the most efficient instrumentation technique [\[63,](#page-29-0) [62,](#page-29-18) [21\]](#page-28-1). This comparison establishes a solid basis for our results to be generalised reliably. We also compare RIARC to centralised instrumentation to confirm that the latter approach does not scale under typical loads.

Our experiments are extensive. We use two hardware platforms to model edge-case scenarios based on limited hardware and general-case scenarios using commodity hardware. The evaluation subjects inline, centralised, and RIARC instrumentation to high loads that go beyond the state of the art and use realistic workload profiles. We gauge overhead under three performance metrics, the *response time*, *memory consumption*, and *scheduler utilisation*, which are crucial for reactive systems [\[7,](#page-27-14) [112\]](#page-31-15). Our results confirm that the overhead RIARC induces is adequate for applications such as soft real-time systems [\[42,](#page-29-15) [97\]](#page-31-1), where the latency requirement is typically in the order of seconds [\[95\]](#page-31-16). We also show that RIARC yields overhead comparable to inlining in settings exhibiting moderate concurrency.

### **5.1 Benchmarking tool**

Benchmarking is standard practice for gauging runtime overhead in software [\[103,](#page-31-17) [80,](#page-30-13) [36\]](#page-28-11). Frameworks, including DaCapo [\[28\]](#page-28-12) and Savina [\[87\]](#page-31-18), offer limited concurrency, making them inapplicable to our case; see app. [C.1](#page-40-0) for detailed reasons. Industry-proven *synthetic* load testing benchmarking tools cater to reactive systems, *e.g.* Apache JMeter [\[70\]](#page-30-14), Tsung [\[118\]](#page-32-16), and Basho Bench [\[23\]](#page-28-13). Their general-purpose design, however, necessarily treats systems as a black box by gathering metrics externally, which may impact measurement *precision* [\[7\]](#page-27-14). Moreover, these load testers generate standard workloads, *e.g.* Poisson processes [\[82,](#page-30-15) [105,](#page-31-19) [92\]](#page-31-20), but lack others, *e.g.* load bursts, that replicate typical operation or induce edge-case stress.

We adopt BenchCRV [\[7\]](#page-27-14), another synthetic load tester specific to RV benchmarking for reactive systems. It sets itself apart from the tools above because it does not require external software (*e.g.*, a web server) to drive tests. Instead, BenchCRV produces different models that *closely emulate* real-world software behaviour. These models are based on the master-worker paradigm [\[127\]](#page-32-17): a pervasive architecture in distributed (*e.g.* Big Data frameworks, render farms) and concurrent systems [\[138,](#page-32-18) [76,](#page-30-16) [55,](#page-29-19) [141\]](#page-33-0). Like Tsung and Basho Bench, BenchCRV exploits the lightweight EVM process model to generate highly-concurrent workloads.

BenchCRV creates master-worker models and induces workloads derived from configurable parameters. In these models, the master process spawns a series of workers and allocates tasks. The number of workers per benchmark run is set via the parameter *n*. Each worker task consists of a *batch* of requests that the worker receives, processes, and echoes back to the master process. The number of requests batched in one task is given by the parameter *w*. Workers terminate when all of their allotted tasks are processed and acknowledged by the master. BenchCRV creates workers based on *workload profiles*. A profile dictates how the master spreads its creation of workers along the loading timeline, *t*, given in seconds.

BenchCRV supports three workload profiles based on ones typical in practice (*e.g.* see fig. [13](#page-43-0)):

- **Steady** models the SuS under stable workload (Poisson process).
- **Pulse** models the SuS under gradually rising and falling workload (Normal distribution).
- **Burst** models the SuS under stress due to workload spikes (Log-normal distribution).

The tool records three performance metrics to give a multi-faceted view of system overhead:

- **Mean response time** in milliseconds (ms), gauging monitoring latency effects on the SuS.
- **Mean memory consumption** in GB, gauging monitoring memory pressure on the SuS.
- **Mean scheduler utilisation** as a percentage of the total processing capacity, showing how monitors maximise the scheduler use.

The prevalent use of the master-worker paradigm, the veracity with which BenchCRV models systems, the range of realistic workload profiles, and the choice of runtime metrics it gathers make this tool ideal for our experiments. Readers are referred to app. [C.2](#page-40-1) and [\[7\]](#page-27-14) for details.
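As a rough illustration of how the three workload profiles differ, the Python sketch below spreads the creation of *n* workers over a loading timeline of *t* seconds. It is not BenchCRV code: the function name and every distribution parameter (the centre and spread of the Pulse, the shape and scale of the Burst) are assumptions chosen only to produce the qualitative shapes. For the Steady profile, the sketch uses the fact that a Poisson process conditioned on *n* arrivals has uniformly distributed arrival times.

```python
import random
from collections import Counter


def spawn_schedule(profile, n, t, seed=0):
    """Return a list of length t giving how many workers are created in each second."""
    rng = random.Random(seed)

    def draw():
        if profile == "steady":
            return rng.uniform(0, t)                  # Poisson arrivals, conditioned on n
        if profile == "pulse":
            return rng.gauss(t / 2, t / 6)            # hypothetical centre and spread
        if profile == "burst":
            return rng.lognormvariate(0, 1) * t / 10  # hypothetical shape and scale
        raise ValueError(f"unknown profile: {profile}")

    counts = Counter()
    for _ in range(n):
        instant = draw()
        while not 0 <= instant < t:  # redraw samples that fall outside the timeline
            instant = draw()
        counts[int(instant)] += 1
    return [counts[second] for second in range(t)]
```

For instance, `spawn_schedule("burst", 100_000, 100)` front-loads worker creation into the early part of the timeline, whereas the Steady profile spreads it evenly.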

## <span id="page-18-4"></span>**5.2 Benchmark configuration**

The BenchCRV master-worker models we generate take the role of the SuS in our experiments. We consider *edge-case* and *general-case* hardware platform set-ups for the following reasons:

- <span id="page-18-0"></span>**P<sub>E</sub> Edge-case** captures platforms with *limited* hardware. It uses an Intel Core i7 M620 64-bit CPU with 8GB of memory, running Ubuntu 18.04 LTS and Erlang/OTP 22.2.1.
- <span id="page-18-1"></span>**P<sub>G</sub> General-case** captures platforms with *commodity* hardware. It uses an Intel Core i9 9880H 64-bit CPU with 16GB of memory, running macOS 12.3.1 and Erlang/OTP 25.0.3.

The EVMs on platforms $P_{\rm E}$ and $P_{\rm G}$ are set with 4 and 16 scheduling threads, respectively. These scheduler settings coincide with the processors available on each SMP [\[11\]](#page-27-7) platform.

We also use the $P_{\rm E}$ and $P_{\rm G}$ platforms with two concurrency scenarios for reactive systems:

- <span id="page-18-2"></span>**C<sub>H</sub> High concurrency scenarios** perform short-lived tasks, *e.g.* web apps that fulfil thousands of HTTP client requests by fetching static content or executing back-end commands.
- <span id="page-18-3"></span>**C<sub>M</sub> Moderate concurrency scenarios** engage in long-running, computationally-intensive tasks, *e.g.* Big Data stream processing frameworks.

Our benchmark workloads match the hardware capacity afforded by $P_{\rm E}$ and $P_{\rm G}$:

- **High concurrency benchmarks** on $P_{\rm E}$ set $n = 100$k workers and $w = 100$ work requests per worker. These generate $\approx n \times (w \text{ requests} + w \text{ responses}) = 20$M message exchanges between the master and worker processes, totalling $\approx 20\text{M} \times (! \text{ event} + ? \text{ event}) = 40$M analysable trace events. Platform $P_{\rm G}$ sets $n = 500$k workers batched with $w = 100$ requests to produce $\approx 100$M messages and $\approx 200$M trace events. The high concurrency model $C_{\rm H}$ is studied in sec. [5.4](#page-19-0).
- **Moderate concurrency benchmarks** on $P_{\rm G}$ set $n = 5$k workers and $w = 10$k work requests per worker. These settings yield roughly the same number of trace events as on $P_{\rm G}$ with concurrency scenario $C_{\rm H}$. The moderate concurrency model $C_{\rm M}$ is studied in sec. [5.5](#page-23-0).
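The message and event volumes quoted for these benchmarks follow from simple arithmetic on *n* and *w*. The short Python sketch below reproduces them under the assumption that every request is answered by exactly one response and that each message gives rise to one send and one receive trace event; spawn and exit events are ignored, which is why the figures in the text are approximate. The function name is hypothetical.

```python
def benchmark_volume(n, w):
    """Return (messages, trace_events) for n workers with w requests each."""
    messages = n * (w + w)     # w requests plus w responses per worker
    trace_events = messages * 2  # one send (!) and one receive (?) event per message
    return messages, trace_events
```

With $n = 5$k and $w = 10$k, the moderate concurrency setting produces the same totals as the high concurrency setting on $P_{\rm G}$, with the load concentrated in far fewer, longer-lived workers.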

All experiments in secs. [5.4](#page-19-0) and [5.5](#page-23-0) use a total loading time of $t = 100$s. Each experiment consists of *ten* benchmarks that apply Steady, Pulse, and Burst workloads. We repeat every experiment *three* times to obtain *negligible variability* and ensure the accuracy of our results; see app. [C.4](#page-42-0) for a summary of these workloads and app. [C.5](#page-44-1) for the precautions we take.


The hardware, OS, and Erlang versions of platforms $P_{\rm E}$ and $P_{\rm G}$, combined with the workloads of concurrency scenarios $C_{\rm H}$ and $C_{\rm M}$, provide generality to our conclusions.

## <span id="page-19-4"></span>**5.3 Instrumentation configuration**

One challenge in conducting our experiments is the lack of RV monitoring tools targeting the EVM. To the best of our knowledge [65, Tables 3 and 4], detectEr [75, 16, 17, 15, 73, 40] is the only RV tool for Erlang that implements centralised outline instrumentation<sup>[3](#page-19-1)</sup>. We are unaware of inline RV tools besides [\[39\]](#page-28-7) and [\[3,](#page-27-5) [4\]](#page-27-15). Since the former tool is *unavailable*, we use the latter, more recent work<sup>[4](#page-19-2)</sup>. In our experiments, we instrument the master *and each* worker process in the SuS models generated from sec. [5.2](#page-18-4) to exert the highest possible load and capture *worst-case* scenarios. BenchCRV annotates work requests and responses with a unique sequence number to account for each message in benchmark runs. We leverage this numbering to write specialised monitor replicas that ascertain the *soundness* of trace event sequences reported to every RV monitor linked with the master and workers; see app. [C.5](#page-44-1) for details. Equally crucial, this runtime checking introduces a degree of *realistic* RV analysis slowdown that is *uniform* across all monitors in the inline, centralised, and RIARC monitoring set-ups. We empirically estimate this slowdown at $\approx 5$µs per analysed event.

## <span id="page-19-0"></span>**5.4 High concurrency benchmarks**

We study runtime overhead in the high concurrency scenario $C_{\rm H}$ with two aims. First, we show the effect overhead has on the SuS as it executes. Specifically, we consider how the memory consumption and scheduler utilisation impact the *latency* a client of the SuS experiences, *e.g.* end-user or application. We use the edge-case platform $P_{\rm E}$ for these experiments; analogous results obtained on $P_{\rm G}$ are detailed in app. [C](#page-40-2). Our second goal targets the general-case platform $P_{\rm G}$ to assess the *scalability* of the instrumentation methods through their optimal use of the *additional* memory and scheduler capacity afforded by $P_{\rm G}$.

The charts in secs. [5.4.1](#page-19-3) – [5.4.3](#page-22-0) plot performance metrics, *e.g.* memory consumption (*y*-axis) against the number of concurrent worker processes or the execution duration (*x*-axis). Since inline instrumentation prevents us from delineating the SuS and monitoring-induced runtime overhead, we follow the standard RV literature practice and include the *baseline* plots, *e.g.* [\[17,](#page-27-4) [75,](#page-30-7) [46,](#page-29-9) [39,](#page-28-7) [102,](#page-31-21) [117,](#page-32-7) [115\]](#page-32-19). Baseline plots show the *unmonitored* SuS to compare the relative overhead between each evaluated instrumentation method.

## <span id="page-19-3"></span>**5.4.1 Instrumentation overhead**

The first set of experiments isolates the instrumentation overhead induced on the SuS: this is the aggregated cost of tracing *and* reporting the traces soundly per def. [1](#page-1-4) to RV monitors. Crucially, these experiments *omit monitors*, as we want to quantify the instrumentation overhead and understand its impact on the SuS. This enables us to focus on the differences between inlining, regarded as the most efficient instrumentation method [\[63,](#page-29-0) [62,](#page-29-18) [21\]](#page-28-1), and outlining. As far as we know [\[65,](#page-30-3) [74\]](#page-30-1), outlining has *never* been used for decentralised RV in a *dynamic* setting such as ours. While we confirm that inline instrumentation uses less memory and scheduler capacity, RIARC dynamically scales and economises their use *without* adverse impact on the latency. In fact, the latency induced by RIARC is a mere 519ms higher than that of inline instrumentation at the peak stress-inducing loading point of 3.7k workers/s under Burst workloads. Our experiments indicate that centralised instrumentation manages resources poorly due to its inability to scale, increasing the chances of failure; see sec. [5.4.2](#page-20-0).

Fig. [7](#page-21-0) plots our results. Centralised instrumentation carries the largest overhead penalty. Regardless of the workload applied, it uses the most memory, $\approx 3.8$GB, highlighting its inability to scale. This stems from the backlog of trace event messages that accumulate in the mailbox of the central tracer and is a manifestation of two aspects. First, the central tracer does not consume events at the same rate worker processes produce them. Evidence of this *bottleneck* is visible as high scheduler utilisation in fig. [7](#page-21-0) (bottom). This value settles at ≈ 36% for the benchmarks with ≈ 40k workers under the Steady workload and ≈ 60k workers under Pulse and Burst workloads. Interpreting these *<* 36% scheduler usage values in isolation may suggest that centralised instrumentation has the potential to scale. However, its memory consumption plots in fig. [7](#page-21-0) (middle) contradict this erroneous hypothesis.

<span id="page-19-1"></span><https://bitbucket.org/duncanatt/detecter-lite>

<span id="page-19-2"></span><https://github.com/ScienceofComputerProgramming/SCICO-D-22-00294>

By contrast, RIARC uses fewer resources to yield lower response times across the three workloads. The scheduler utilisation for RIARC slightly plateaus in the Steady (≈ 60k workers) and Pulse (≈ 70k workers) workload charts. This is not owed to scalability limitations of RIARC but to the intrinsic throttling instigated by the master process [\[127\]](#page-32-17). In fact, the plots for the baseline system and inline instrumentation in fig. [7](#page-21-0) (middle) exhibit analogous signs of throttling. Even at a peak Burst workload of 3.7k workers/s, inline and RIARC instrumentation consume fairly similar amounts of memory, 1.7GB *vs.* 1.9GB, respectively.

## <span id="page-20-0"></span>**5.4.2 Monitoring overhead**

Our second set of experiments extends the results of sec. [5.4.1](#page-19-3) and quantifies the cost of RV monitoring. The *runtime monitoring* overhead combines the instrumentation overhead and the slowdown due to the RV analysis, established at ≈ 5µs per event in sec. [5.3](#page-19-4) for our experiments. Fig. [8](#page-22-1) plots the instrumentation (*instr.*) overhead from sec. [5.4.1](#page-19-3) next to the runtime monitoring overhead (*mon.*). It shows that the RV analysis slowdown aggravates centralised monitoring to the point of crashing. Inline and RIARC monitoring are minimally affected. Our results also reveal that the instrumentation incurs the *major* overhead portion, not the RV analysis. Sec. [5.6](#page-25-0) comments on this finding in the context of existing RV tools.

Fig. [8](#page-22-1) plots our results under the Steady and Burst workloads; fig. [14](#page-46-0) in app. [C.6.1](#page-45-0) includes all three workloads. The charts for centralised monitoring exhibit a significant disparity between the instrumentation and runtime monitoring bar plots as the workload increases. This trend is consistent across both workloads in fig. [8](#page-22-1). The lack of scalability of centralised monitoring in fig. [8](#page-22-1) manifests as an increase in memory consumption but stabilised scheduler usage, as in fig. [7](#page-21-0). Memory consumption and scheduler usage for centralised monitoring grow rapidly beyond ≈ 30k and ≈ 20k workers under the Steady and Burst workloads, respectively. Bottlenecks led our experiments to crash (shown as missing bar plots in fig. [8](#page-22-1)). Crashes occur at ≈ 70k workers under the Steady and at ≈ 80k under the Burst workload. By analysing the resulting dumps, we could attribute these crashes to memory exhaustion, which caused the EVM to fail. The dumps indicate severe memory pressure due to the vast backlog of trace event messages in the mailbox of the central tracer.

Inline and RIARC monitoring scale to accommodate the RV analysis slowdown. This is confirmed by cross-referencing the memory consumption and scheduler utilisation in fig. [8](#page-22-1) for both monitoring methods. Each displays comparable overhead in its respective instrumentation and corresponding runtime monitoring bar plots. Fig. [8](#page-22-1) (top) shows that inline and RIARC monitoring increase the latency, albeit for different reasons. The internal operation of RIARC enables us to deduce that its latency stems from message routing and dynamic tracer reconfiguration. Its scheduler utilisation plots support this observation. The latency due to inlining is a direct effect of RV analysis slowdown, provoked by the lock-step execution of monitors and the SuS. Other works, *e.g.* [\[46,](#page-29-9) [38\]](#page-28-6), offer similar observations.

<span id="page-21-0"></span>

**Figure 7** Isolated instrumentation overhead (*high* workload, 100k workers)

Dissecting our results uncovers further subtleties. The optimal scheduler utilisation of RIARC implies that its monitors are active only when triggered by trace events and remain idle otherwise. This inference is supported by the absence of sudden or continued memory growth for RIARC in fig. [8](#page-22-1) (middle). The instrumentation and runtime monitoring latency bar plots for inline monitoring exhibit a growing pairwise gap that starts at ≈ 80k workers in fig. [8](#page-22-1) (top right). The respective gap for RIARC at this mark is perceptibly lower. We credit this lower latency gap to outlining, which absorbs the slowdown effect of RV analyses. This leads us to conjecture that RIARC could accommodate monitors that perform richer RV analyses with minimal impact on the SuS. Our calculations from fig. [8](#page-22-1) (top right) put the latency at 1093ms for inline monitoring *vs.* 1547ms for RIARC at a peak Burst workload of 3.7k workers/s: a 454ms difference, which is *lower* than the 519ms gap measured in sec. [5.4.1](#page-19-3). Sec. [5.5](#page-23-0) shows this gap is negligible in moderate concurrency scenarios.

<span id="page-22-1"></span>

**Figure 8** Instrumentation and RV monitoring overhead gap (*high* workload, 100k workers)

## <span id="page-22-0"></span>**5.4.3 Resource usage**

We employ platform $P_G$ with high concurrency $C_H$ to confirm that our observations about inline and RIARC monitoring transfer to general cases. Secs. [5.4.1](#page-19-3) and [5.4.2](#page-20-0) deem centralised monitoring to be impractical. We, thus, omit it from the sequel; see app. [C.6.3](#page-48-0) for results. Our experiments now use 16 scheduling threads, $n = 500k$ workers, and $w = 100$ requests per worker, producing ≈ 100M messages and ≈ 200M trace events. Fig. [13](#page-43-0) in app. [C.4](#page-42-0) renders these Steady, Pulse, and Burst workload models. Secs. [5.4.1](#page-19-3) and [5.4.2](#page-20-0) bound the memory and scheduler metrics to the period the SuS executes to portray the *actual overhead* impact on the system. We refocus that view to assess the monitoring overhead in *its entirety*, from the point of SuS launch until monitors complete their RV analysis. Doing so reveals how inline and RIARC monitoring optimise the use of added memory and processing capacity. Results show that inline and RIARC monitoring are elastic and dynamically adapt to changes in the applied workloads. App. [C.6.3](#page-48-0) reconfirms that centralised monitoring lacks this trait. Fig. [9](#page-23-1) gives a complete benchmark run under the Steady and Burst workloads. We relabel the *x*-axis with the benchmark duration and omit the response time plots since response time is inapplicable to these experiments (latency is an attribute of the SuS, not the monitors). In this run, the Steady workload generates a sustained load of ≈ 5k workers/s whereas Burst peaks at ≈ 17.8k workers/s under maximum load at ≈ 5s; see fig. [13](#page-43-0) in app. [C.4](#page-42-0).
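The message and trace event totals quoted for this set-up are consistent with simple arithmetic under two assumptions that this section does not spell out: each work request is answered by one response, and each message gives rise to one send and one receive trace event. The sketch below encodes those assumptions; the function name is hypothetical, and the estimate ignores other event kinds such as spawn and exit.

```python
def workload_totals(workers: int, requests_per_worker: int) -> tuple[int, int]:
    """Estimate message and trace event counts for a master-worker run.

    Assumptions (not stated in this section): one response per request,
    and one send plus one receive trace event per message.
    """
    requests = workers * requests_per_worker
    messages = 2 * requests       # request + response
    trace_events = 2 * messages   # send + receive event per message
    return messages, trace_events


high = workload_totals(500_000, 100)       # n = 500k, w = 100 (high concurrency)
moderate = workload_totals(5_000, 10_000)  # n = 5k, w = 10k (set-up of sec. 5.5)
print(high, moderate)
```

Under these assumptions, the high and moderate concurrency set-ups yield the same message and trace event volume, distributed over a different number of worker processes.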

Fig. [9](#page-23-1) (top) illustrates the memory consumption patterns for inline and RIARC monitoring, which exhibit *elasticity*. This elastic behaviour occurs at different points in the plots. Inline monitoring peaks at ≈ 3.7GB at ≈ 72s and RIARC at ≈ 5.7GB at ≈ 100s under the Burst workload. The memory consumption for both methods stabilises at ≈ 36s under the Steady workload, with ≈ 2.3GB for inline and ≈ 2.7GB for RIARC monitoring. Elasticity in these methods is due to different reasons: it is intrinsic to inline monitoring (see sec. [1](#page-0-0)), whereas RIARC spawns and garbage collects monitors on demand (secs. [3.1](#page-8-0) and [3.6](#page-15-1)). Fig. [16](#page-48-1) in app. [C.6.3](#page-48-0) confirms these observations under the Pulse workload. Centralised monitoring is *insensitive* to the workload applied, as figs. [17](#page-49-0) and [18](#page-49-1) in app. [C.6.3](#page-48-0) reconfirm.

The effect of dynamic message routing and tracer reconfiguration that RIARC performs is evident in the scheduler utilisation plots of fig. [9](#page-23-1). Under the Steady and Burst workloads, scheduler utilisation oscillates continually due to the sustained influx of trace events. Oscillations corroborate our observation in sec. [5.4.2](#page-20-0) about RIARC, namely, that monitors are activated by trace events but remain idle otherwise. Active monitor periods manifest as peaks in fig. [9](#page-23-1). Idle periods, where monitors are placed in the EVM waiting queues, are reflected as regions with low and stable scheduler utilisation. These oscillations showcase the message-driven aspect of RIARC, which analyses events asynchronously. Inlining exhibits minimal scheduler utilisation oscillations due to its lock-step execution with the SuS.

## <span id="page-23-0"></span>**5.5 Moderate concurrency benchmarks**

Our last experiment studies moderate concurrency scenarios $C_M$. The general-case platform $P_G$ sets $n = 5k$ workers and $w = 10k$ requests per worker, and uses 16 EVM schedulers. We show that under these loads, RIARC induces overhead on par with inline monitoring.

<span id="page-23-1"></span>

**Figure 9** Inline and RIARC monitoring resource usage (*high* workload, 500k workers)

Moderate concurrency alters the execution of the master-worker model, compared to our benchmarks of secs. [5.4.1](#page-19-3) – [5.4.3](#page-22-0). In this set-up, the master creates most of its worker processes at the initial stage of benchmark runs and spends the remaining time allocating work requests. This change increases the request throughput markedly, *e.g.* see tbl. [5](#page-42-1) in app. [C.4](#page-42-0). One consequence is that centralised monitoring consistently crashes under the rapid accumulation of messages in its mailbox. We, thus, limit our study to inline and RIARC monitoring.

Tbl. [3](#page-24-0) compares the results taken on platform $P_G$ from sec. [5.4.3](#page-22-0) with 500k workers (high concurrency, $C_H$) against the ones on $P_G$ with 5k workers (moderate concurrency, $C_M$). The figures shown estimate the percentage overhead w.r.t. the baseline systems $C_H$ and $C_M$ at *maximum* load. Our ensuing discussion is limited to the overhead under the Steady and Burst workloads since each respectively captures the SuS operation in *typical* and *severe* load conditions. Readers are referred to fig. [20](#page-51-0) in app. [C.6.4](#page-50-0) for the overhead comparison given in absolute metric values for the entirety of benchmark runs.

Tbl. [3](#page-24-0) indicates that the memory consumption overhead due to inline monitoring is not affected under the Steady workload: it remains at 1% in both the high and moderate concurrency scenarios $C_H$ and $C_M$. However, under the Burst workload, it decreases from 16% in $C_H$ to 1% in $C_M$. We observe the opposite effect on the scheduler utilisation overhead for inline monitoring. For the moderate concurrency case $C_M$, the scheduler overhead under the Steady and Burst workloads increases to 3% and 4%, respectively.

Tbl. [3](#page-24-0) also shows that under the Steady workload, RIARC induces a 23% memory overhead in concurrency scenario $C_H$ *vs.* 8% in concurrency scenario $C_M$, a decrease of 15%. Under the Burst workload, this overhead is reduced by 46%, from 56% in $C_H$ to 10% in $C_M$. The scheduler utilisation overhead for RIARC from $C_H$ to $C_M$ also registers drops of ≈ 71% under both Steady and Burst workloads. We attribute these overhead improvements to the lower number of worker processes the master creates in the moderate concurrency set-up, $C_M$. The long-running worker processes induce stability in the SuS. RIARC adapts to this change favourably by performing less trace event routing and fewer tracer reconfigurations. The ramification of this adaptability is perceivable in the latency overhead discussed next.

RIARC inflates the latency overhead from 95% in $C_H$ to 194% in $C_M$ under the Steady workload (+99%), and from 97% in $C_H$ to 190% in $C_M$ under the Burst workload (+93%). However, in $C_M$, RIARC induces *less latency* overhead than inline monitoring. Tbl. [3](#page-24-0) reveals that the latency overhead for inline monitoring grows from 4% in the high concurrency set-up $C_H$ to 246% in the moderate concurrency set-up $C_M$ under the Steady workload (+242%). It also grows under the Burst workload, from 55% in $C_H$ to 193% in $C_M$ (+138%). In fact, our calculations confirm that the *absolute* response time for inline monitoring is slightly worse than that of RIARC in $C_M$: 116ms *vs.* 98ms under the Steady, and 182ms *vs.* 179ms under the Burst workloads, respectively. This latency degradation for inline monitoring stems from the ≈ 5µs slowdown induced by the RV analysis, which results in frequent 'pausing' of worker processes. Monitors comprising richer analyses produce longer pauses in worker processes, which can degrade the response time further [\[46,](#page-29-9) [38,](#page-28-6) [72\]](#page-30-6).

<span id="page-24-0"></span>

**Table 3** Percentage overhead on $C_H$ (500k) and $C_M$ (5k) w.r.t. baseline at *maximum* workload
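The changes given in parentheses in the latency discussion are differences between two percentages, that is, percentage points rather than relative changes. The sketch below recomputes them from the latency overheads quoted in the text for tbl. [3](#page-24-0); the dictionary layout and the function name are illustrative only.

```python
# Latency overheads (%) quoted in the text for tbl. 3, as (C_H, C_M) pairs.
latency = {
    ("RIARC", "Steady"): (95, 194),
    ("RIARC", "Burst"): (97, 190),
    ("inline", "Steady"): (4, 246),
    ("inline", "Burst"): (55, 193),
}


def delta_points(pair: tuple[int, int]) -> int:
    """Change in overhead from C_H to C_M, in percentage points."""
    high, moderate = pair
    return moderate - high


deltas = {key: delta_points(pair) for key, pair in latency.items()}
for (method, workload), change in deltas.items():
    print(f"{method:6} {workload:6} {change:+d} points")
```

Comparing the $C_M$ entries of each pair also shows why RIARC fares better than inlining in the moderate concurrency set-up: 194 *vs.* 246 under Steady and 190 *vs.* 193 under Burst.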

## <span id="page-25-0"></span>**5.6 Discussion**

The RIARC scheduler utilisation in tbl. [3](#page-24-0) is higher than the reported values for inline monitoring. This should not be construed as an inefficiency. From a reactive systems perspective, growth in the scheduler utilisation indicates *scalability*, as the low memory consumption in tbl. [3](#page-24-0) affirms. RIARC benefits from the ample schedulers to improve the overall system response time *without* overtaxing the system. Indeed, fig. [20](#page-51-0) in app. [C.6.4](#page-50-0) demonstrates that the mean absolute scheduler utilisation in the benchmarks of sec. [5.5](#page-23-0) is just ≈ 10% under both the Steady and Burst workloads. Tbl. [3](#page-24-0) shows that the reduction in latency makes RIARC comparable to inline monitoring in moderate concurrency scenarios.

 Sec. [1](#page-0-0) names *responsiveness* as a key reactive systems attribute [\[97\]](#page-31-1). RIARC prioritises responsiveness by isolating its monitors into asynchronous concurrent units. This design naturally exploits the available processing capacity of the host platform by maximising monitor *parallelism* when possible. Inline monitoring reaps fewer benefits in identical settings because its lock-step execution with the SuS robs it of potential parallelism gains.

Secs. [5.4.1](#page-19-3) – [5.4.3](#page-22-0) attest to the impracticality of centralised monitoring for reactive systems. Bottlenecks hinder its ability to scale, compelling it to consume inordinate amounts of memory, which can lead to failure, as sec. [5.4.2](#page-20-0) shows. Despite these shortcomings, many RV tools in this setting use centralised monitoring, *e.g.* [\[50,](#page-29-10) [16,](#page-27-8) [133,](#page-32-20) [66,](#page-30-8) [84,](#page-30-17) [113,](#page-32-3) [75,](#page-30-7) [38,](#page-28-6) [41,](#page-29-16) [39,](#page-28-7) [2,](#page-27-16) [106\]](#page-31-22).

## **6 Conclusion**

Reactive software calls for instrumentation methods that uphold the responsive, resilient, message-driven, and elastic attributes of systems. This is attainable *only if* the instrumentation exhibits these qualities. Runtime verification imposes another demand on the instrumentation: that the trace event sequences it reports to monitors are *sound*, *i.e.,* traces do not omit events and preserve the ordering with which events occur locally at processes.

This paper presents RIARC, a novel decentralised instrumentation algorithm for outline monitors meeting these two demands. RIARC uses outline monitors to decouple the runtime analysis from system components, which minimises latency and promotes *responsiveness*. Outline monitors can fail independently of the system and each other to improve *resiliency*. RIARC gathers events non-invasively via a tracing infrastructure, making it *message-driven* and suited to cases where inlining is inapplicable. The algorithm is *elastic*: it reacts to specific events in the trace to instrument and garbage collect monitors on demand.

Our asynchronous setting complicates the instrumentation due to potential trace event loss or reordering. RIARC overcomes these challenges using a next-hop IP routing approach to rearrange and report events soundly to monitors. We validate RIARC by subjecting its corresponding Erlang implementation to rigorous systematic testing, confirming its correctness. This implementation is evaluated via extensive empirical experiments. These subject the implementation to large realistic workloads to ascertain its reactiveness. Our experiments show that RIARC optimises its memory and scheduler usage to maintain latency feasible for soft real-time applications. We also compare RIARC to inline and centralised monitoring, revealing that it induces *comparable* latency to inlining under moderate concurrency.

**Related work** Works on inlining besides the ones cited in sec. [1](#page-0-0), *e.g.* [\[81,](#page-30-18) [25,](#page-28-5) [50,](#page-29-10) [49,](#page-29-5) [53,](#page-29-20) [52\]](#page-29-4), do not separate the instrumentation and runtime analysis. This is common in monolithic settings, where the instrumentation is often assumed to induce minimal runtime overhead. As a result, many inline approaches focus on the efficiency of the analysis but neglect the instrumentation cost (*e.g.* [\[64\]](#page-30-12) attributes overhead solely to the analysis). Sec. [5.4.1](#page-19-3) shows this is not the case. This line of reasoning for monolithic systems is often ported to concurrent settings. For instance, [110, 133, 29, 46, 132, 67, 19] propose efficient runtime monitoring algorithms but do not account for, nor quantify, the overhead due to gathering trace events. Tools, such as [\[41,](#page-29-16) [38,](#page-28-6) [17,](#page-27-4) [35,](#page-28-16) [75,](#page-30-7) [142\]](#page-33-1), that quantify the runtime overhead coalesce the instrumentation and runtime analysis costs, making it difficult to gauge whether inefficiencies arise from one or the other. We are unaware of empirical studies such as ours that distinguish between the instrumentation and runtime analysis overhead.

Sec. [5.6](#page-25-0) remarks that centralised monitoring is used for concurrent runtime verification despite its evident limitations. One plausible reason for this is that the empirical scrutiny of such tools lacks proper benchmarking (*e.g.* [\[50,](#page-29-10) [16,](#page-27-8) [133,](#page-32-20) [66,](#page-30-8) [84\]](#page-30-17)) or uses insufficient workloads that fail to expose the issues of centralised set-ups (*e.g.* [\[113,](#page-32-3) [75,](#page-30-7) [38,](#page-28-6) [41,](#page-29-16) [39,](#page-28-7) [2,](#page-27-16) [106\]](#page-31-22)). Gathering inadequate metrics can also bias the interpretation of empirical data; see sec. [5.4.1](#page-19-3). Works, such as [\[39,](#page-28-7) [17,](#page-27-4) [35,](#page-28-16) [131\]](#page-32-21), consider the memory consumption and latency metrics. Our evaluation of inline, centralised, and RIARC monitoring uses (i) *combinations* of hardware and software, with (ii) two concurrency models that test *edge-case* and *general-case* scenarios, under (iii) *high* workloads that go beyond the state of the art, applying (iv) *realistic* workload profiles, interpreted against (v) *relevant* performance metrics that give a multi-faceted view of runtime overhead. To the best of our knowledge, this is generally not done in other studies, *e.g.* [\[117,](#page-32-7) [116,](#page-32-8) [47,](#page-29-12) [46,](#page-29-9) [124,](#page-32-9) [30,](#page-28-8) [109,](#page-31-24) [39,](#page-28-7) [41,](#page-29-16) [17,](#page-27-4) [50,](#page-29-10) [51,](#page-29-21) [53,](#page-29-20) [75,](#page-30-7) [60,](#page-29-8) [61,](#page-29-7) [27,](#page-28-17) [113,](#page-32-3) [100,](#page-31-8) [35\]](#page-28-16).

Outline instrumentation decouples the execution of the SuS and monitor components in space (*i.e.,* isolated threads) and time (*i.e.,* asynchronous messaging). The tracing infrastructure outline instrumentation uses mirrors the publish-subscribe (Pub/Sub) pattern [\[138\]](#page-32-18). In this set-up, consumers subscribe to a *broker* that advertises events. Centralised instrumentation follows a Pub/Sub approach: the SuS produces trace events and deposits them into *one* global trace buffer that tracers receive from (see fig. [1b](#page-3-0)). Despite similarities, *e.g.* tracers register and deregister with the tracing infrastructure at runtime, RIARC differs from conventional Pub/Sub messaging in three fundamental aspects. Chiefly, Pub/Sub publishers are unaware of the subscribers interested in receiving messages because this bookkeeping task is appointed to the broker. By contrast, next-hop routing relies on the *explicit* address of recipients to forward messages. Furthermore, in Pub/Sub messaging, subscribers do not communicate with publishers, whereas RIARC tracers exchange *direct* detach requests between one another to reorganise the choreography (refer to sec. [3.4](#page-13-0)). Lastly, Pub/Sub brokers are typically predefined and remain fixed, while trace partitioning *reconfigures* the tracing topology, creating and destroying brokers in reaction to dynamic changes in the SuS.
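The first of these differences can be made concrete with a small sketch. The Python code below is not RIARC: the class names, the `routes` table, and the event encoding are hypothetical, and detach requests and topology reconfiguration are omitted. It only contrasts a broker that fans events out to subscribers the publisher never names with tracers that forward each event to an explicitly addressed next hop.

```python
class Broker:
    """Pub/Sub: the broker, not the publisher, knows who receives events."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, inbox: list) -> None:
        self.subscribers.append(inbox)

    def publish(self, event) -> None:
        for inbox in self.subscribers:  # fan out to every subscriber
            inbox.append(event)


class Tracer:
    """Next-hop routing: each tracer forwards by explicit recipient address."""

    def __init__(self, name: str):
        self.name = name
        self.routes = {}    # SuS process id -> next-hop Tracer (hypothetical table)
        self.analysed = []  # events this tracer handles itself

    def handle(self, pid: str, event: str) -> list[str]:
        """Deliver an event and return the names of the tracers it visited."""
        next_hop = self.routes.get(pid)
        if next_hop is None:  # no route: this tracer handles the process itself
            self.analysed.append((pid, event))
            return [self.name]
        return [self.name] + next_hop.handle(pid, event)


broker = Broker()
inbox_a, inbox_b = [], []
broker.subscribe(inbox_a)
broker.subscribe(inbox_b)
broker.publish("send")  # the publisher never names inbox_a or inbox_b

root, mid, leaf = Tracer("t_root"), Tracer("t_mid"), Tracer("t_leaf")
root.routes["p3"] = mid  # root only knows the next hop, not the final tracer
mid.routes["p3"] = leaf

path = root.handle("p3", "send")   # forwarded hop by hop to t_leaf
local = root.handle("p1", "recv")  # no route for p1, so t_root analyses it
print(path, local)
```

In the routing half of the sketch, each tracer knows only its immediate successor for a given process, so changing the topology amounts to editing individual `routes` entries rather than a central subscriber list.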

One assumption we make about process tracing is $A_4$, *i.e.*, tracing gathers the spawn events of parent processes before all the events of child processes. While $A_4$ induces a partial order over trace events, it is *weaker* than happened-before causality [\[98\]](#page-31-25), as the events gathered from sets of child SuS processes need not be causally ordered. Demanding the latter condition would entail additional computation on the part of the tracing infrastructure and could increase runtime overhead. Maintaining minimal overhead is critical to our instrumentation because it preserves the responsiveness attribute of reactive systems. Tracing assumption $A_4$ and the RIARC logic detailed in sec. [3](#page-7-0) guarantee trace soundness (def. [1](#page-1-4)), which suffices for RV monitoring. Since our work targets soft real-time systems [\[97,](#page-31-1) [95\]](#page-31-16) scoped in a reliable messaging setting (see sec. [1](#page-0-0)), we do not tackle the problem of ensuring time-bounded causally-ordered message delivery [\[18\]](#page-27-18) nor implement exactly-once delivery semantics [\[86\]](#page-30-20). We will address these challenges in future extensions of this work.
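As an illustration of what $A_4$ does and does not constrain, the following sketch checks a recorded trace against it. The tuple encoding of events and the function name are hypothetical and serve only this example; the check is a reading of $A_4$ as stated above, not part of the RIARC implementation.

```python
def satisfies_a4(trace: list[tuple]) -> bool:
    """Check that each spawn event precedes every event of the spawned child.

    Events are tuples: ("spawn", parent_pid, child_pid) or (kind, pid).
    This encoding is hypothetical and used only for illustration.
    """
    seen = set()  # pids that have already produced an event
    for event in trace:
        if event[0] == "spawn":
            _, parent, child = event
            if child in seen:  # a child event was gathered before its spawn
                return False
            seen.add(parent)   # spawning is an event of the parent
        else:
            seen.add(event[1])
    return True


ok = [("spawn", "p", "q"), ("send", "q"), ("recv", "p"), ("exit", "q")]
bad = [("send", "q"), ("spawn", "p", "q")]
# Sibling events may interleave freely: the check imposes no order between q and r.
siblings = [("spawn", "p", "q"), ("spawn", "p", "r"), ("send", "r"), ("send", "q")]
print(satisfies_a4(ok), satisfies_a4(bad), satisfies_a4(siblings))
```

The `siblings` trace passes regardless of how the events of `q` and `r` are interleaved, which reflects why $A_4$ is weaker than a happened-before ordering.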

<span id="page-27-18"></span><span id="page-27-17"></span><span id="page-27-16"></span><span id="page-27-15"></span><span id="page-27-14"></span><span id="page-27-13"></span><span id="page-27-12"></span><span id="page-27-11"></span><span id="page-27-10"></span><span id="page-27-9"></span><span id="page-27-8"></span><span id="page-27-7"></span><span id="page-27-6"></span><span id="page-27-5"></span><span id="page-27-4"></span><span id="page-27-3"></span><span id="page-27-2"></span><span id="page-27-1"></span><span id="page-27-0"></span>

- <span id="page-28-19"></span> **20** Ezio Bartocci, Yliès Falcone, Borzoo Bonakdarpour, Christian Colombo, Normann Decker, Klaus Havelund, Yogi Joshi, Felix Klaedtke, Reed Milewicz, Giles Reger, Grigore Rosu, Julien Signoles, Daniel Thoma, Eugen Zalinescu, and Yi Zhang. First International Competition on Runtime Verification: Rules, Benchmarks, Tools, and Final Results of CRV 2014. *STTT*, 21:31–70, 2019.
- <span id="page-28-1"></span> **21** Ezio Bartocci, Yliès Falcone, Adrian Francalanza, and Giles Reger. Introduction to Runtime Verification. In *Lectures on Runtime Verification*, volume 10457 of *LNCS*, pages 1–33. Springer, 2018.
- <span id="page-28-20"></span> **22** Ezio Bartocci, Yliès Falcone, and Giles Reger. International Competition on Runtime Verification (CRV). In *TACAS*, volume 11429 of *LNCS*, pages 41–49, 2019.
- <span id="page-28-13"></span>**23** Basho. Bench, 2017. URL: [https://github.com/basho/basho\\_bench](https://github.com/basho/basho_bench).
- <span id="page-28-3"></span> **24** David A. Basin, Felix Klaedtke, and Eugen Zalinescu. Failure-Aware Runtime Verification of Distributed Systems. In *FSTTCS*, volume 45 of *LIPIcs*, pages 590–603, 2015.
- <span id="page-28-5"></span>**25** Andreas Bauer and Yliès Falcone. Decentralised LTL Monitoring. *FMSD*, 48:46–93, 2016.
- <span id="page-28-2"></span> **26** André Bento, Jaime Correia, Ricardo Filipe, Filipe Araújo, and Jorge Cardoso. Automated Analysis of Distributed Tracing: Challenges and Research Directions. *J. Grid Comput.*, 19(1):9, 2021.
- <span id="page-28-17"></span> **27** Shay Berkovich, Borzoo Bonakdarpour, and Sebastian Fischmeister. Runtime Verification with Minimal Intrusion through Parallelism. *FMSD*, 46:317–348, 2015.
- <span id="page-28-12"></span> **28** Stephen M. Blackburn, Robin Garner, Chris Hoffmann, Asjad M. Khan, Kathryn S. McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z. Guyer, Martin Hirzel, Antony L. Hosking, Maria Jump, Han Bok Lee, J. Eliot B. Moss, Aashish Phansalkar, Darko Stefanovic, Thomas VanDrunen, Daniel von Dincklage, and Ben Wiedermann. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In *OOPSLA*, pages 169–190, 2006.
- <span id="page-28-15"></span> **29** Eric Bodden. The Design and Implementation of Formal Monitoring Techniques. In *OOPSLA Companion*, pages 939–940, 2007.
- <span id="page-28-8"></span> **30** Eric Bodden, Laurie J. Hendren, Patrick Lam, Ondrej Lhoták, and Nomair A. Naeem. Collaborative Runtime Verification with Tracematches. *J. Log. Comput.*, 20:707–723, 2010.
- <span id="page-28-4"></span> **31** Borzoo Bonakdarpour, Pierre Fraigniaud, Sergio Rajsbaum, David A. Rosenblueth, and Corentin Travers. Decentralized Asynchronous Crash-Resilient Runtime Verification. In *CONCUR*, volume 59 of *LIPIcs*, pages 16:1–16:15, 2016.
- <span id="page-28-10"></span> **32** Jonas Bonér, Dave Farley, Roland Kuhn, and Martin Thompson. The Reactive Manifesto. Technical report, 2014.
- <span id="page-28-9"></span> **33** Jonas Bonér and Viktor Klang. Reactive Programming *vs.* Reactive Systems. Technical report, Lightbend Inc., 2016.
- <span id="page-28-18"></span> **34** Werner Buchholz. A Synthetic Job for Measuring System Performance. *IBM Syst. J.*, 8:309–318, 1969.
- <span id="page-28-16"></span> **35** Christian Bartolo Burlò, Adrian Francalanza, and Alceste Scalas. On the Monitorability of Session Types, in Theory and Practice. In *ECOOP*, volume 194 of *LIPIcs*, pages 20:1–20:30, 2021.
- <span id="page-28-11"></span> **36** Rajkumar Buyya, James Broberg, and Andrzej M. Goscinski. *Cloud Computing: Principles and Paradigms*. Wiley-Blackwell, 2011.
- <span id="page-28-0"></span>**37** Bryan Cantrill. Hidden in Plain Sight. *ACM Queue*, 4:26–36, 2006.
- <span id="page-28-6"></span> **38** Ian Cassar and Adrian Francalanza. On Synchronous and Asynchronous Monitor Instrumentation for Actor-based Systems. In *FOCLASA*, volume 175 of *EPTCS*, pages 54–68, 2014.
- <span id="page-28-7"></span> **39** Ian Cassar and Adrian Francalanza. On Implementing a Monitor-Oriented Programming Framework for Actor Systems. In *IFM*, volume 9681 of *LNCS*, pages 176–192, 2016.
- <span id="page-28-14"></span> **40** Ian Cassar, Adrian Francalanza, Duncan Paul Attard, Luca Aceto, and Anna Ingólfsdóttir. A Suite of Monitoring Tools for Erlang. In *RV-CuBES*, volume 3 of *Kalpa Publications in Computing*, pages 41–47, 2017.


- <span id="page-29-16"></span> **41** Ian Cassar, Adrian Francalanza, and Simon Said. Improving Runtime Overheads for detectEr. In *FESCA*, volume 178 of *EPTCS*, pages 1–8, 2015.
- <span id="page-29-15"></span> **42** Francesco Cesarini and Simon Thompson. *Erlang Programming: A Concurrent Approach to Software Development*. O'Reilly Media, 2009.
- <span id="page-29-13"></span> **43** Bernadette Charron-Bost, Friedemann Mattern, and Gerard Tel. Synchronous, Asynchronous, and Causally Ordered Communication. *Distributed Comput.*, 9(4):173–191, 1996.
- <span id="page-29-14"></span> **44** Natalia Chechina, Kenneth MacKenzie, Simon J. Thompson, Phil Trinder, Olivier Boudeville, Viktoria Fordós, Csaba Hoch, Amir Ghaffari, and Mario Moro Hernandez. Evaluating Scalable Distributed Erlang for Scalability and Reliability. *IEEE Trans. Parallel Distributed Syst.*, 28(8):2244–2257, 2017.
- <span id="page-29-6"></span> **45** Feng Chen and Grigore Rosu. Java-MOP: A Monitoring Oriented Programming Environment for Java. In *TACAS*, volume 3440 of *LNCS*, pages 546–550, 2005.
- <span id="page-29-9"></span> **46** Feng Chen and Grigore Rosu. Mop: An Efficient and Generic Runtime Verification Framework. In *OOPSLA*, pages 569–588, 2007.
- <span id="page-29-12"></span> **47** Feng Chen and Grigore Rosu. Parametric Trace Slicing and Monitoring. In *TACAS*, volume 5505 of *LNCS*, pages 246–261, 2009.
- <span id="page-29-17"></span> **48** Maria Christakis, Alkis Gotovos, and Konstantinos Sagonas. Systematic Testing for Detecting Concurrency Errors in Erlang Programs. In *ICST*, pages 154–163. IEEE Computer Society, 2013.
- <span id="page-29-5"></span> **49** Christian Colombo and Yliès Falcone. Organising LTL Monitors over Distributed Systems with a Global Clock. *FMSD*, 49:109–158, 2016.
- <span id="page-29-10"></span> **50** Christian Colombo, Adrian Francalanza, and Rudolph Gatt. Elarva: A Monitoring Tool for Erlang. In *RV*, volume 7186 of *LNCS*, pages 370–374, 2011.
- <span id="page-29-21"></span> **51** Christian Colombo, Adrian Francalanza, Ruth Mizzi, and Gordon J. Pace. polyLarva: Runtime Verification with Configurable Resource-Aware Monitoring Boundaries. In *SEFM*, volume 7504 of *LNCS*, pages 218–232, 2012.
- <span id="page-29-4"></span> **52** Christian Colombo and Gordon J. Pace. *Runtime Verification - A Hands-On Approach in Java*. Springer, 2022.
- <span id="page-29-20"></span> **53** Christian Colombo, Gordon J. Pace, and Gerardo Schneider. LARVA — Safer Monitoring of Real-Time Java Programs (Tool Paper). In *SEFM*, pages 33–37, 2009.
- <span id="page-29-1"></span> **54** Markus Dahm. Byte Code Engineering with the BCEL API. Technical report, Java Informationstage 99, 2001.
- <span id="page-29-19"></span> **55** Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. *Commun. ACM*, 51:107–113, 2008.
- <span id="page-29-2"></span> **56** Mathieu Desnoyers and Michel Dagenais. The LTTng Tracer: A Low Impact Performance and Behavior Monitor for GNU/Linux. Technical report, École Polytechnique de Montréal, 2006.
- <span id="page-29-22"></span> **57** Jay L. Devore and Kenneth N. Berk. *Modern Mathematical Statistics with Applications*. Springer, 2012.
- <span id="page-29-11"></span> **58** Jean Dollimore, Tim Kindberg, and George Coulouris. *Distributed Systems: Concepts and Design*. Addison-Wesley, 2005.
- <span id="page-29-3"></span>**59** Eclipse/IBM. OpenJ9, 2021. URL: <https://www.eclipse.org/openj9>.
- <span id="page-29-8"></span> **60** Antoine El-Hokayem and Yliès Falcone. Monitoring Decentralized Specifications. In *ISSTA*, pages 125–135, 2017.
- <span id="page-29-7"></span> **61** Antoine El-Hokayem and Yliès Falcone. On the Monitoring of Decentralized Specifications: Semantics, Properties, Analysis, and Simulation. *ACM Trans. Softw. Eng. Methodol.*, 29:1:1– 1:57, 2020.
- <span id="page-29-18"></span> **62** Úlfar Erlingsson. *The Inlined Reference Monitor Approach to Security Policy Enforcement*. PhD thesis, Cornell University, US, 2004.
- <span id="page-29-0"></span> **63** Úlfar Erlingsson and Fred B. Schneider. SASI Enforcement of Security Policies: A Retrospective. In *NSPW*, pages 87–95, 1999.


- <span id="page-30-12"></span> **64** Yliès Falcone, Klaus Havelund, and Giles Reger. A Tutorial on Runtime Verification. In *Engineering Dependable Software Systems*, volume 34 of *NATO Science for Peace and Security Series, D: Information and Communication Security*, pages 141–175. IOS Press, 2013.
- <span id="page-30-3"></span> **65** Yliès Falcone, Srdan Krstic, Giles Reger, and Dmitriy Traytel. A Taxonomy for Classifying Runtime Verification Tools. *STTT*, 23:255–284, 2021.
- <span id="page-30-8"></span> **66** Yliès Falcone, Hosein Nazarpour, Saddek Bensalem, and Marius Bozga. Monitoring Distributed Component-Based Systems. In *FACS*, volume 13077 of *LNCS*, pages 153–173, 2021.
- <span id="page-30-19"></span> **67** Yliès Falcone, Hosein Nazarpour, Mohamad Jaber, Marius Bozga, and Saddek Bensalem. Tracing Distributed Component-Based Systems, a Brief Overview. In *RV*, volume 11237 of *LNCS*, pages 417–425, 2018.
- <span id="page-30-21"></span> **68** Yliès Falcone, Dejan Nickovic, Giles Reger, and Daniel Thoma. Second International Competition on Runtime Verification CRV 2015. In *RV*, volume 9333 of *LNCS*, pages 405–422, 2015.
- <span id="page-30-22"></span> **69** Dror G. Feitelson. From Repeatability to Reproducibility and Corroboration. *ACM SIGOPS Oper. Syst. Rev.*, 49:3–11, 2015.
- <span id="page-30-14"></span><span id="page-30-5"></span>**70** Apache Software Foundation. JMeter, 2020. URL: <https://jmeter.apache.org>.
- **71** Pierre Fraigniaud, Sergio Rajsbaum, and Corentin Travers. On the Number of Opinions Needed for Fault-Tolerant Run-Time Monitoring in Distributed Systems. In *RV*, volume 8734 of *LNCS*, pages 92–107, 2014.
- <span id="page-30-6"></span>**72** Adrian Francalanza. A Theory of Monitors. *Inf. Comput.*, 281:104704, 2021.
- <span id="page-30-2"></span> **73** Adrian Francalanza, Luca Aceto, Antonis Achilleos, Duncan Paul Attard, Ian Cassar, Dario Della Monica, and Anna Ingólfsdóttir. A Foundation for Runtime Monitoring. In *RV*, volume 10548 of *LNCS*, pages 8–29, 2017.
- <span id="page-30-1"></span> **74** Adrian Francalanza, Jorge A. Pérez, and César Sánchez. Runtime Verification for Decentralised and Distributed Systems. In *Lectures on RV*, volume 10457 of *LNCS*, pages 176–210. Springer, 2018.
- <span id="page-30-7"></span> **75** Adrian Francalanza and Aldrin Seychell. Synthesising Correct Concurrent Runtime Monitors. *FMSD*, 46:226–261, 2015.
- <span id="page-30-16"></span>**76** Sukumar Ghosh. *Distributed Systems: An Algorithmic Approach*. CRC, 2014.
- <span id="page-30-11"></span> **77** Patrice Godefroid. Model Checking for Programming Languages using Verisoft. In *POPL*, pages 174–186. ACM Press, 1997.
- <span id="page-30-4"></span> **78** Susanne Graf, Doron A. Peled, and Sophie Quinton. Monitoring Distributed Systems Using Knowledge. In *FORTE*, volume 6722 of *LNCS*, pages 183–197, 2011.
- <span id="page-30-0"></span> **79** Susan L. Graham, Peter B. Kessler, and Marshall K. McKusick. gprof: A Call Graph Execution Profiler. In *SIGPLAN Symposium on Compiler Construction*, pages 120–126. ACM, 1982.
- <span id="page-30-13"></span> **80** Jim Gray. *The Benchmark Handbook for Database and Transaction Processing Systems*. Morgan Kaufmann, 1993.
- <span id="page-30-18"></span> **81** Radu Grigore, Dino Distefano, Rasmus Lerchedahl Petersen, and Nikos Tzevelekos. Runtime Verification Based on Register Automata. In *TACAS*, volume 7795 of *LNCS*, pages 260–276, 2013.
- <span id="page-30-15"></span> **82** Duncan A. Grove and Paul D. Coddington. Analytical Models of Probability Distributions for MPI Point-to-Point Communication Times on Distributed Memory Parallel Computers. In *ICA3PP*, volume 3719 of *LNCS*, pages 406–415, 2005.
- <span id="page-30-10"></span>**83** Eric A. Hall. *Internet Core Protocols: The Definitive Guide*. O'Reilly Media, 2000.
- <span id="page-30-17"></span> **84** Klaus Havelund, Giles Reger, Daniel Thoma, and Eugen Zalinescu. Monitoring Events that Carry Data. In *Lectures on Runtime Verification*, volume 10457 of *LNCS*, pages 61–102. Springer, 2018.
- <span id="page-30-9"></span> **85** Carl Hewitt, Peter Boehler Bishop, and Richard Steiger. A Universal Modular ACTOR Formalism for Artificial Intelligence. In *IJCAI*, pages 235–245, 1973.
- <span id="page-30-20"></span> **86** Yongqiang Huang and Hector Garcia-Molina. Exactly-Once Semantics in a Replicated Messaging System. In *ICDE*, pages 3–12. IEEE Computer Society, 2001.


- <span id="page-31-25"></span><span id="page-31-24"></span><span id="page-31-23"></span><span id="page-31-22"></span><span id="page-31-21"></span><span id="page-31-20"></span><span id="page-31-19"></span><span id="page-31-18"></span><span id="page-31-17"></span><span id="page-31-16"></span><span id="page-31-15"></span><span id="page-31-14"></span><span id="page-31-13"></span><span id="page-31-12"></span><span id="page-31-11"></span><span id="page-31-10"></span><span id="page-31-9"></span><span id="page-31-8"></span><span id="page-31-7"></span><span id="page-31-6"></span><span id="page-31-5"></span><span id="page-31-4"></span><span id="page-31-3"></span><span id="page-31-2"></span><span id="page-31-1"></span><span id="page-31-0"></span> **87** Shams Mahmood Imam and Vivek Sarkar. Savina - An Actor Benchmark Suite: Enabling Empirical Evaluation of Actor Libraries. In *AGERE!@SPLASH*, pages 67–80, 2014.
- **88** Justin Iurman, Frank Brockners, and Benoit Donnet. Towards Cross-Layer Telemetry. In *ANRW*, pages 15–21. ACM, 2021.
- **89** Richard Jones, Antony Hosking, and Eliot Moss. *The Garbage Collection Handbook: The Art of Automatic Memory Management*. CRC, 2020.
- **90** Nicolai M. Josuttis. *SOA in Practice: The Art of Distributed System Design: Theory in Practice*. O'Reilly Media, 2007.
- **91** Saša Jurić. *Elixir in Action*. Manning, 2019.
- **92** Bill Kayser. What is the expected distribution of website response times?, 2017. URL: <https://blog.newrelic.com/engineering/expected-distributions-website-response-times>.
- **93** Gregor Kiczales, Erik Hilsdale, Jim Hugunin, Mik Kersten, Jeffrey Palm, and William G. Griswold. An Overview of AspectJ. In *ECOOP*, volume 2072 of *LNCS*, pages 327–353, 2001.
- **94** Moonzoo Kim, Mahesh Viswanathan, Sampath Kannan, Insup Lee, and Oleg Sokolsky. Java-MaC: A Run-Time Assurance Approach for Java Programs. *FMSD*, 24:129–155, 2004.
- **95** Hermann Kopetz. *Real-Time Systems: Design Principles for Distributed Embedded Applications (Real-Time Systems Series)*. Springer, 2011.
- **96** Ajay D. Kshemkalyani and Mukesh Singhal. *Distributed Computing: Principles, Algorithms, and Systems*. Cambridge University Press, 2011.
- **97** Roland Kuhn, Brian Hanafee, and Jamie Allen. *Reactive Design Patterns*. Manning, 2016.
- **98** Leslie Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. *Commun. ACM*, 21(7):558–565, 1978.
- **99** Leslie Lamport, Robert E. Shostak, and Marshall C. Pease. The Byzantine Generals Problem. *ACM Trans. Program. Lang. Syst.*, 4:382–401, 1982.
- **100** Julien Lange and Nobuko Yoshida. Verifying Asynchronous Interactions via Communicating Session Automata. In *CAV*, volume 11561 of *LNCS*, pages 97–117, 2019.
- **101** Paul Lavery and Takuo Watanabe. An Actor-Based Runtime Monitoring System for Web and Desktop Applications. In *SNPD*, pages 385–390. IEEE Computer Society, 2017.
- **102** Philipp Lengauer, Verena Bitto, Hanspeter Mössenböck, and Markus Weninger. A Comprehensive Java Benchmark Study on Memory and Garbage Collection Behavior of DaCapo, DaCapo Scala, and SPECjvm2008. In *ICPE*, pages 3–14, 2017.
- **103** Bryon C. Lewis and Albert E. Crews. The Evolution of Benchmarking as a Computer Performance Evaluation Technique. *MIS Q.*, 9:7–16, 1985.
- **104** Jay Ligatti, Lujo Bauer, and David Walker. Edit Automata: Enforcement Mechanisms for Run-Time Security Policies. *Int. J. Inf. Sec.*, 4:2–16, 2005.
- **105** Zhen Liu, Nicolas Niclausse, and César Jalpa-Villanueva. Traffic Model and Performance Evaluation of Web Servers. *Perform. Evaluation*, 46:77–100, 2001.
- **106** Qingzhou Luo and Grigore Rosu. EnforceMOP: A Runtime Property Enforcement System for Multithreaded Programs. In *ISSTA*, pages 156–166, 2013.
- **107** Deep Medhi and Karthik Ramasamy. Chapter 3 - routing protocols: Framework and principles. In *Network Routing (Second Edition)*, The Morgan Kaufmann Series in Networking, pages 64–113. Morgan Kaufmann, 2018.
- **108** Silvana M. Melo, Jeffrey C. Carver, Paulo S. L. Souza, and Simone R. S. Souza. Empirical Research on Concurrent Software Testing: A Systematic Mapping Study. *Inf. Softw. Technol.*, 105:226–251, 2019.
- **109** Patrick O'Neil Meredith, Dongyun Jin, Dennis Griffith, Feng Chen, and Grigore Rosu. An Overview of the MOP Runtime Verification Framework. *STTT*, 14:249–289, 2012.
- **110** Patrick O'Neil Meredith and Grigore Rosu. Efficient Parametric Runtime Verification with Deterministic String Rewriting. In *ASE*, pages 70–80, 2013.
- **111** Microsoft. MSDN, 2021. URL: <https://msdn.microsoft.com>.
- **112** Ian Molyneaux. *The Art of Application Performance Testing 2e*. O'Reilly Media, 2014.

- <span id="page-32-3"></span> **113** Menna Mostafa and Borzoo Bonakdarpour. Decentralized Runtime Verification of LTL Specifications in Distributed Systems. In *IPDPS*, pages 494–503, 2015.
- <span id="page-32-0"></span> **114** Nicholas Nethercote and Julian Seward. Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation. In *PLDI*, pages 89–100. ACM, 2007.
- <span id="page-32-19"></span> **115** Rumyana Neykova. *Multiparty Session Types for Dynamic Verification of Distributed Systems*. PhD thesis, Imperial College London, UK, 2017.
- <span id="page-32-8"></span> **116** Rumyana Neykova and Nobuko Yoshida. Let it Recover: Multiparty Protocol-Induced Recovery. In *CC*, pages 98–108, 2017.
- <span id="page-32-7"></span>**117** Rumyana Neykova and Nobuko Yoshida. Multiparty Session Actors. *LMCS*, 13, 2017.
- <span id="page-32-16"></span>**118** Nicolas Niclausse. Tsung, 2017. URL: <http://tsung.erlang-projects.org>.
- <span id="page-32-14"></span> **119** Scott Oaks. *Java Performance: In-Depth Advice for Tuning and Programming Java 8, 11, and Beyond*. CRC, 2020.
- <span id="page-32-10"></span> **120** Martin Odersky, Lex Spoon, Bill Venners, and Frank Sommers. *Programming in Scala*. Artima Inc., 2021.
- <span id="page-32-26"></span> **121** Athanasios Papoulis. *Probability, Random Variables, and Stochastic Processes*. McGraw Hill, 1991.
- <span id="page-32-23"></span> **122** Aleksandar Prokopec, Andrea Rosà, David Leopoldseder, Gilles Duboscq, Petr Tuma, Martin Studener, Lubomír Bulej, Yudi Zheng, Alex Villazón, Doug Simon, Thomas Würthinger, and Walter Binder. Renaissance: Benchmarking Suite for Parallel Applications on the JVM. In *PLDI*, pages 31–47, 2019.
- <span id="page-32-11"></span>**123** Kevin Quick. Thespian, 2020. URL: <https://thespianpy.com/doc>.
- <span id="page-32-9"></span> **124** Giles Reger, Helena Cuenca Cruz, and David E. Rydeheard. MarQ: Monitoring at Runtime with QEA. In *TACAS*, volume 9035 of *LNCS*, pages 596–610, 2015.
- <span id="page-32-25"></span> **125** Giles Reger, Sylvain Hallé, and Yliès Falcone. Third International Competition on Runtime Verification - CRV 2016. In *RV*, volume 10012 of *LNCS*, pages 21–37, 2016.
- <span id="page-32-27"></span> **126** Giles Reger and David E. Rydeheard. From First-Order Temporal Logic to Parametric Trace Slicing. In *RV*, volume 9333 of *LNCS*, pages 216–232, 2015.
- <span id="page-32-17"></span> **127** Sartaj Sahni and George L. Vairaktarakis. The Master-Slave Paradigm in Parallel Computer and Industrial Settings. *J. Glob. Optim.*, 9:357–377, 1996.
- <span id="page-32-2"></span> **128** Raja R. Sambasivan, Ilari Shafer, Jonathan Mace, Benjamin H. Sigelman, Rodrigo Fonseca, and Gregory R. Ganger. Principled Workflow-Centric Tracing of Distributed Systems. In *SoCC*, pages 401–414. ACM, 2016.
- <span id="page-32-4"></span> **129** Torben Scheffel and Malte Schmitz. Three-Valued Asynchronous Distributed Runtime Verification. In *MEMOCODE*, pages 52–61, 2014.
- <span id="page-32-1"></span>**130** Fred B. Schneider. Enforceable Security Policies. *ACM Trans. Inf. Syst. Secur.*, 3:30–50, 2000.
- <span id="page-32-21"></span> **131** Joshua Schneider, David A. Basin, Frederik Brix, Srdan Krstic, and Dmitriy Traytel. Scalable Online First-Order Monitoring. *Int. J. Softw. Tools Technol. Transf.*, 23:185–208, 2021.
- <span id="page-32-6"></span> **132** Koushik Sen, Grigore Rosu, and Gul Agha. Runtime Safety Analysis of Multithreaded Programs. In *ESEC / SIGSOFT FSE*, pages 337–346, 2003.
- <span id="page-32-20"></span> **133** Koushik Sen, Grigore Rosu, and Gul Agha. Online Efficient Predictive Safety Analysis of Multithreaded Programs. *Int. J. Softw. Tools Technol. Transf.*, 8:248–260, 2006.
- <span id="page-32-5"></span> **134** Koushik Sen, Abhay Vardhan, Gul Agha, and Grigore Rosu. Efficient Decentralized Monitoring of Safety in Distributed Systems. In *ICSE*, pages 418–427, 2004.
- <span id="page-32-24"></span> **135** Andreas Sewe, Mira Mezini, Aibek Sarimbekov, and Walter Binder. DaCapo con Scala: design and analysis of a Scala benchmark suite for the JVM. In *OOPSLA*, pages 657–676, 2011.
- <span id="page-32-22"></span>**136** SPEC. SPECjvm2008, 2008. URL: <https://www.spec.org/jvm2008>.
- <span id="page-32-15"></span>**137** Erik Stenman. *The Erlang Runtime System*. 2023.
- <span id="page-32-18"></span>**138** Sasu Tarkoma. *Overlay Networks: Toward Information Networking*. Auerbach, 2010.
- <span id="page-32-12"></span>**139** The Pony Team. Ponylang, 2021. URL: <https://tutorial.ponylang.io>.
- <span id="page-32-13"></span> **140** Ulf T. Wiger, Gösta Ask, and Kent Boortz. World-Class Product Certification using Erlang. *ACM SIGPLAN Notices*, 37(12):25–34, 2002.


- <span id="page-33-0"></span>**141** Jiali Yao, Zhigeng Pan, and Hongxin Zhang. A Distributed Render Farm System for Animation Production. In *ICEC*, volume 5709 of *LNCS*, pages 264–269, 2009.
- <span id="page-33-1"></span> **142** Teng Zhang, Greg Eakman, Insup Lee, and Oleg Sokolsky. Overhead-Aware Deployment of Runtime Monitors. In *RV*, volume 11757 of *LNCS*, pages 375–381, 2019.

<span id="page-34-0"></span>

**Figure 10** Legend and notation for figures

## <span id="page-34-1"></span>**A Appendix A: Auxiliary Instrumentation Logic**

The operations $\text{DISPATCH}(m, \iota_{\text{T}})$ and $\text{FORWARD}(r, \iota_{\text{T}})$ given in alg. [4](#page-34-2) enable tracers to perform next-hop routing, as described in sec. [3](#page-7-0). DISPATCH embeds an evt or dtc acknowledgement message *m* into a rtd packet, which is sent to the next-hop tracer with PID $\iota_{\text{T}}$. In the packet, DISPATCH also inserts the PID of the invoker tracer, obtained via the function $\text{self}()$. This is the PID of the *dispatch tracer*, and is used when a *forwarded* spawn ($\multimap$) event results in the instrumentation of a new SuS process (line [20](#page-12-0) in alg. [3](#page-13-1)). Upon instrumenting the SuS PID carried by $\multimap$, the tracer issues a dtc request to that dispatch tracer PID. The function $\text{DETACH}(\iota_{\text{S}}, \iota_{\text{T}})$ encapsulates the detachment logic. It signals the dispatch tracer with PID $\iota_{\text{T}}$ that the SuS PID $\iota_{\text{S}}$ is being traced by the *current* tracer with PID $j_{\text{T}} = \text{self}()$; see line [13](#page-10-2) in alg. [2](#page-11-0) and line [13](#page-34-3) in alg. [4](#page-34-2). Before sending the dtc request, DETACH uses PREEMPT so that the current tracer $j_{\text{T}}$ takes over the tracing of SuS PID $\iota_{\text{S}}$. $\text{FORWARD}(r, \iota_{\text{T}})$ passes on the specified rtd packet *r* to the next-hop, $\iota_{\text{T}}$. TRYGC determines whether a tracer can be safely terminated by confirming that the traced-processes and routing maps for a tracer are both empty.
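The routing operations described above can be illustrated with a minimal Python sketch. It is not the Erlang realisation of alg. 4: the `Tracer` class, the `TRACERS` registry, the dictionary layout of packets and the PID strings are all hypothetical names chosen for illustration, and mailboxes are modelled as plain lists rather than process mailboxes.

```python
from dataclasses import dataclass, field


@dataclass
class Tracer:
    """Hypothetical tracer record; field names are illustrative only."""
    pid: str
    routes: dict = field(default_factory=dict)   # routing map: SuS PID -> next-hop tracer PID
    traced: set = field(default_factory=set)     # SuS PIDs this tracer traces directly
    mailbox: list = field(default_factory=list)  # stand-in for the tracer's mailbox


TRACERS = {}  # hypothetical registry: tracer PID -> Tracer


def spawn_tracer(pid):
    TRACERS[pid] = Tracer(pid)
    return TRACERS[pid]


def dispatch(sender, msg, next_hop):
    """Embed an evt or dtc message in an rtd packet stamped with the sender's PID."""
    packet = {"tag": "rtd", "dispatcher": sender.pid, "msg": msg}
    TRACERS[next_hop].mailbox.append(packet)


def forward(packet, next_hop):
    """Pass an existing rtd packet on to the next hop without altering it."""
    TRACERS[next_hop].mailbox.append(packet)


def try_gc(tracer):
    """A tracer may terminate only when both of its maps are empty."""
    return not tracer.traced and not tracer.routes


# Three tracers in a chain: a_T dispatches, b_T forwards, c_T traces q_S.
a, b, c = spawn_tracer("a_T"), spawn_tracer("b_T"), spawn_tracer("c_T")
a.routes["q_S"] = "b_T"
b.routes["q_S"] = "c_T"
c.traced.add("q_S")

event = {"tag": "evt", "kind": "recv", "src": "q_S"}
dispatch(a, event, a.routes[event["src"]])
packet = b.mailbox.pop(0)
forward(packet, b.routes[packet["msg"]["src"]])
```

In this sketch the packet that reaches `c_T` still carries the PID of the dispatch tracer `a_T`, which is the information a tracer would need in order to address a dtc request after instrumenting a new SuS process.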

Alg. [4](#page-34-2) also includes the function Tracer used by alg. [2](#page-11-0) to spawn the core logic of algs. [1](#page-10-0) and [3](#page-13-1) to execute in a separate tracer process. Tracer accepts four parameters:

- **1.** $\sigma$, the state of the parent tracer,
- **2.** $\varsigma_{\text{M}}$, the RV monitor signature utilised by the function ANALYSEEVT in algs. [1](#page-10-0) and [3](#page-13-1) to analyse trace events incrementally,
- **3.** $\iota_{\text{S}}$, the PID of the SuS process to instrument, and
- <span id="page-34-3"></span>**4.** $\iota_{\text{T}}$, the PID of the dispatch tracer (from the rtd packet) to which the dtc request is issued.

The process tracing functions Trace, Clear and Preempt described in sec. [3](#page-7-0) are listed in alg. [5](#page-35-0). Trace and Clear abstract the inner workings of the EVM tracing exposed via the Erlang built-in primitive trace, and the underlying operation of our offline tracing engine described in sec. [4.1](#page-15-3) and app. [B](#page-36-3).

<span id="page-34-2"></span>

<span id="page-35-0"></span>**Algorithm 5** Abstraction of the operations offered by process tracing

The function Start in alg. [6](#page-35-1) launches the SuS and root tracer in tandem. Start accepts the main SuS function signature $\varsigma_{\text{S}}$ together with the instrumentation map, $\Lambda$. Copies of this map (see line [12](#page-34-3) in alg. [4](#page-34-2)) are propagated between tracers, enabling them to determine whether a spawned SuS process requires instrumentation through a separate tracer. To safeguard against the initial loss of trace events, the SuS is launched in a *paused* state (line [2](#page-35-0)). This permits the root tracer to start tracing the root system process that runs $\varsigma_{\text{S}}$. ROOT resumes the system (line [6](#page-35-0)), and begins its trace inspection in *direct* mode, as line [8](#page-35-0) shows.
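The reason for launching the SuS in a paused state can be shown with a small Python sketch. It is an analogy built on threads, not the Erlang code of alg. 6: `TraceInfra`, `start` and `demo_sus` are hypothetical names, and the toy infrastructure is assumed to silently drop any event emitted before a tracer has registered.

```python
import queue
import threading


class TraceInfra:
    """Toy tracing infrastructure: events emitted before registration are lost."""

    def __init__(self):
        self.sink = None

    def register(self):
        self.sink = queue.Queue()
        return self.sink

    def emit(self, event):
        if self.sink is not None:
            self.sink.put(event)


def start(sus_main):
    infra = TraceInfra()
    resume = threading.Event()  # cleared, so the SuS starts out paused

    def paused_sus():
        resume.wait()           # block until the root tracer resumes the system
        sus_main(infra.emit)
        infra.emit(None)        # sentinel marking the end of the run

    sus = threading.Thread(target=paused_sus)
    sus.start()                 # launched, but unable to emit events yet
    sink = infra.register()     # the root tracer starts tracing first
    resume.set()                # only now is the system resumed
    collected = []
    while (event := sink.get()) is not None:
        collected.append(event)
    sus.join()
    return collected


def demo_sus(emit):
    for event in ("spawn", "send", "exit"):
        emit(event)


trace = start(demo_sus)
```

Because registration happens strictly before `resume.set()`, no event of `demo_sus` can be emitted while `sink` is still unset. Swapping those two steps would reintroduce the race that the paused launch is meant to rule out.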

<span id="page-35-1"></span>**Algorithm 6** Launching root SuS and tracer processes



## <span id="page-36-3"></span>**B Appendix B: Offline Tracing and Algorithm Invariants**

RIARC can be extended with the event reordering scheme described below when the underlying tracing infrastructure does not guarantee tracing assumption $A_4$. This can be done in Erlang by peeking at the mailbox using the built-in primitive process\_info. In principle, this is inefficient if the mailbox contains many messages [\[42\]](#page-29-15). We remark, however, that in practice, such inefficiency arises only in the extreme case where spawn ($\multimap$) events are deposited into a tracer mailbox in exactly the reverse order in which descendant processes are spawned. Alternatively, one can use an auxiliary trace buffer (*e.g.* a list) that is populated by dequeuing the tracer mailbox first. Both amendments can be made on line [3](#page-9-0) of algs. [1](#page-10-0) and [3](#page-13-1).

## <span id="page-36-1"></span>**B.1 Offline Tracing**

Ex. [7](#page-36-2) below sketches how our offline tracing engine operates. Internally, it uses tracer buffers and sets of processes to rearrange process spawn ($\multimap$) events for descendant SuS processes. The tracing engine rearranges $\multimap$ events using the PID information they carry. In doing so, it recovers the happens-before causality between $\multimap$ events. Concurrent $\multimap$ events for sibling processes, such as when process *P* spawns *Q* and *R*, are not reordered.

<span id="page-36-2"></span>▶ **Example 7** (Reordering spawn events)**.** Suppose the tracer $T_P$ with PID $p_{\text{T}}$ registers to trace the SuS process *P* with PID $p_{\text{S}}$. *P* spawns process *Q*, which, in turn, spawns *R*, as in fig. [5a](#page-9-1). $T_P$ invokes $\text{Trace}(p_{\text{S}}, p_{\text{T}})$, which registers its PID $p_{\text{T}}$ with the tracing engine. The tracing engine assigns the empty trace buffer *B* and set $S = \{p_{\text{S}}\}$ to $p_{\text{T}}$.

**Scan 1.** When the event $e_1 = \langle \text{evt}, ?, q_{\text{S}} \rangle$ is read into *B*, the engine does not deliver it to $p_{\text{T}}$. This occurs because none of the SuS PID values in *S* match the value of the originator PID in the $?_Q$ event, *i.e.*, $e_1.\iota_{\text{S}} = q_{\text{S}} \notin \{p_{\text{S}}\}$.

**Scan 2.** Event $e_2 = \langle \text{evt}, \multimap, q_{\text{S}}, r_{\text{S}}, f_{s_R} \rangle$ is read next into the buffer. A scan is performed but no action is taken, as $e_2.\iota_{\text{S}} = q_{\text{S}} \notin \{p_{\text{S}}\}$. *B* now contains '$?_Q . \multimap_Q$'.

**Scan 3.** Events $e_3 = \langle \text{evt}, \multimap, p_{\text{S}}, q_{\text{S}}, f_{s_Q} \rangle$ and $e_4 = \langle \text{evt}, !, p_{\text{S}}, q_{\text{S}} \rangle$ are appended to *B*. The engine scans *B* and dequeues $\langle \text{evt}, \multimap, p_{\text{S}}, q_{\text{S}}, f_{s_Q} \rangle$ since the value of the originator PID $e_3.\iota_{\text{S}} = p_{\text{S}}$ is contained in $\{p_{\text{S}}\}$. This triggers the event $\multimap_P$ to be delivered to $T_P$. Additionally, the engine sets $S = \{p_{\text{S}}, q_{\text{S}}\}$ per the inheritance tracing assumption $A_2$ of sec. [2](#page-3-1).

**Scan 4.** Updating *S* triggers another buffer scan to check whether any events require dequeuing. The event $\langle \text{evt}, ?, q_{\text{S}} \rangle$ is dequeued and delivered to $T_P$, since now, $e_1.\iota_{\text{S}} = q_{\text{S}} \in \{p_{\text{S}}, q_{\text{S}}\}$. Similarly, $\langle \text{evt}, \multimap, q_{\text{S}}, r_{\text{S}}, f_{s_R} \rangle$ is dequeued and delivered to $T_P$. *S* is updated to $\{p_{\text{S}}, q_{\text{S}}, r_{\text{S}}\}$. The engine continues scanning the buffer and dequeues $\langle \text{evt}, !, p_{\text{S}}, q_{\text{S}} \rangle$, which it delivers to $T_P$.

<span id="page-36-0"></span>

**Figure 11** Online tracing via the EVM and offline tracing based on replayed trace files


**Scan 5.** Since *B* is empty, the update in *S* does not trigger another buffer scan. The engine pauses until new events are read into the buffer.

The input trace in the buffer '$?_Q . \multimap_Q . \multimap_P . !_P$' has been delivered to $T_P$ as '$\multimap_P . ?_Q . \multimap_Q . !_P$', matching the one shown in fig. [5a](#page-9-1).
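The scan procedure walked through in ex. 7 can be condensed into a short Python sketch. It is a simplified model rather than the engine itself: the `Event` tuple, the `ReorderEngine` class and the string PIDs are hypothetical, events are read one at a time, and the function signature carried by spawn events is omitted.

```python
from collections import namedtuple

# kind is "spawn", "recv" or "send"; src is the originator SuS PID;
# child is the spawned SuS PID and is only set for spawn events.
Event = namedtuple("Event", "kind src child", defaults=(None,))


class ReorderEngine:
    def __init__(self, root_pid):
        self.buffer = []           # B: events read but not yet delivered
        self.traced = {root_pid}   # S: SuS PIDs whose events may be delivered
        self.delivered = []        # events handed to the tracer, in order

    def read(self, event):
        """Read one event into the buffer and rescan it."""
        self.buffer.append(event)
        self._scan()

    def _scan(self):
        progress = True
        while progress:            # rescan whenever an event was delivered
            progress = False
            for index, event in enumerate(self.buffer):
                if event.src in self.traced:
                    del self.buffer[index]
                    self.delivered.append(event)
                    if event.kind == "spawn":
                        # descendants inherit tracing (assumption A2)
                        self.traced.add(event.child)
                    progress = True
                    break


# Input of ex. 7: ?_Q . spawn_Q . spawn_P . !_P
engine = ReorderEngine("p")
for e in (Event("recv", "q"), Event("spawn", "q", "r"),
          Event("spawn", "p", "q"), Event("send", "p")):
    engine.read(e)
delivered_ex7 = [(e.kind, e.src) for e in engine.delivered]
```

Feeding the same class a trace that is already causally ordered leaves it unchanged, since every event is deliverable as soon as it is read.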

<span id="page-37-1"></span>▶ **Example 8** (Other interleaved executions)**.** Other executions are possible, *e.g.*, with the input buffer '$?_Q . \multimap_P . \multimap_Q . !_P$'.

We underscore that the *input* traces '$?_Q . \multimap_Q . \multimap_P . !_P$' and '$?_Q . \multimap_P . \multimap_Q . !_P$' from exs. [7](#page-36-2) and [8](#page-37-1) observe *trace consistency* of def. [1](#page-1-4) w.r.t. *P* and *Q*. By contrast, the input trace '$\multimap_Q . ?_Q . \multimap_P . !_P$' is inconsistent w.r.t. *Q*. Ex. [9](#page-37-2) shows that our tracing engine preserves *trace identity*, *i.e.,* a consistent trace with the correct causal ordering between spawn ($\multimap$) events in descendant SuS processes is not modified.

<span id="page-37-2"></span>▶ **Example 9** (Trace identity)**.** Given the same tracer set-up of ex. [7](#page-36-2), *i.e.*, $T_P$ initially tracing *P*, the buffer '$\multimap_P . ?_Q . !_P . \multimap_Q$', and $T = \{p_{\text{S}}\}$, our trace engine performs the following scans:

**Scan 1.** Event $e_1 = \langle \text{evt}, \multimap, p_{\text{S}}, q_{\text{S}}, f_{s_Q} \rangle$ is read and delivered to $T_P$ since $e_1.\iota_{\text{S}} = p_{\text{S}} \in \{p_{\text{S}}\}$. *T* is updated to $\{p_{\text{S}}, q_{\text{S}}\}$, by tracing assumption $A_2$.

**Scan 2.** The update in *T* triggers the next scan. Event $e_2 = \langle \text{evt}, ?, q_{\text{S}} \rangle$ is delivered to $T_P$, as $e_2.\iota_{\text{S}} = q_{\text{S}} \in \{p_{\text{S}}, q_{\text{S}}\}$. The events $\langle \text{evt}, !, p_{\text{S}}, q_{\text{S}} \rangle$ and $\langle \text{evt}, \multimap, q_{\text{S}}, r_{\text{S}}, f_{s_R} \rangle$ follow, and *T* is updated to $\{p_{\text{S}}, q_{\text{S}}, r_{\text{S}}\}$.

**Scan 3.** *B* is empty and no buffer scan is performed.

The event sequence '$\multimap_P . ?_Q . !_P . \multimap_Q$' in our initial buffer is delivered to $T_P$ unchanged.

## <span id="page-37-0"></span>**B.2 Algorithm Invariants**

The invariants listed below ensure the correct handling of evt, dtc, and rtd messages by tracers. Lines [37,](#page-9-0) [51,](#page-9-0) and [60](#page-9-0) in alg. [1,](#page-10-0) and lines [45](#page-12-0) and [50](#page-12-0) in alg. [3](#page-13-1) include the main invariants below (respectively $I_{17}$, $I_{20}$, and $I_{19}$ in alg. [1](#page-10-0) and $I_{22}$ in alg. [3](#page-13-1)). We elide the remaining invariants from algs. [1](#page-10-0) and [3](#page-13-1) in favour of presentation conciseness. As is the case with the invariants $I_{17}$, $I_{19}$, $I_{20}$, and $I_{22}$, our Erlang realisation of RIARC implements the elided ones as **assert** and **fail** statements. These invariants reason about general properties the tracer choreography should observe at all times. For instance, our invariants guarantee properties such as 'every trace event that is dispatched by the dispatch tracer eventually reaches the intended tracer', 'the monitor choreography grows dynamically', and 'redundant tracers are always garbage collected'. The invariants make use of three notions introduced in the main paper, which we recall for the benefit of readers.
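To give a flavour of what invariants realised as assert and fail statements might look like, the Python sketch below guards the two tracer maps with explicit checks. It covers only the simplest map-related invariants of this section ($I_1$ to $I_3$, $I_6$ and $I_7$), and every name in it (`TracerState`, `check`, the method names, the PID strings) is a hypothetical stand-in rather than part of the Erlang implementation.

```python
def check(condition, message):
    """Fail loudly when an invariant does not hold."""
    if not condition:
        raise AssertionError(message)


class TracerState:
    def __init__(self):
        self.traced = {}   # traced-processes map: SuS PID -> tracing mode
        self.routes = {}   # routing map: SuS PID -> next-hop tracer PID

    def add_traced(self, sus_pid, mode="direct"):
        check(sus_pid not in self.traced, f"I2: {sus_pid} is already traced")
        self.traced[sus_pid] = mode

    def remove_traced(self, sus_pid):
        check(sus_pid in self.traced, f"I3: {sus_pid} is not traced")
        del self.traced[sus_pid]

    def add_route(self, sus_pid, next_hop):
        check(sus_pid not in self.routes, f"I6: a route for {sus_pid} already exists")
        self.routes[sus_pid] = next_hop

    def remove_route(self, sus_pid):
        check(sus_pid in self.routes, f"I7: no route for {sus_pid}")
        del self.routes[sus_pid]

    def terminate(self):
        check(not self.traced and not self.routes,
              "I1: both maps must be empty before termination")
        return "terminated"


state = TracerState()
state.add_traced("p_S")
state.add_route("q_S", "t_Q")
```

Routing every map update through such guarded methods means that a violation surfaces at the point where it occurs, instead of later as a misrouted or lost trace event.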

▷ Note 10 (Tracers and messages).

- *Dispatch tracer*, sec. [3.2.](#page-9-0) A tracer that receives trace events meant to be handled by another tracer,
- *Forwarded message*, sec. [3.2.](#page-9-0) An evt or dtc message that is embedded in a rtd packet dispatched by a dispatch tracer,
- *Direct trace event*, sec. [3.3.](#page-12-0) An evt event that is not dispatched by a dispatch tracer but gathered from a SuS process via tracing.

We organise invariants into two categories: the first describes properties of the tracer DAG topology, while the second focusses on tracer coordination and correct message delivery.

- **Tracer choreography invariants** Ensure that a DAG topology between tracers is always maintained by dynamic message routing.
- **I<sub>1</sub>** A tracer *never* terminates unless its routing (Π) and traced-processes (Γ) maps are empty.
- <span id="page-38-3"></span>**I<sub>2</sub>** A tracer *never* adds a SuS PID that already exists in its traced-processes map Γ.
- <span id="page-38-4"></span>**I<sub>3</sub>** A tracer *never* removes an inexistent SuS PID from its traced-processes map Γ.
- <span id="page-38-9"></span>**I<sub>4</sub>** A tracer *always* acts on a spawn ($\multimap$) event by adding the spawned SuS PID to its traced-processes map Γ. *Requires invariant [I<sub>2</sub>](#page-38-3) to hold.*
- <span id="page-38-10"></span>**I<sub>5</sub>** A tracer *always* acts on an $\times$ event by removing the SuS PID from its traced-processes map Γ. *Requires invariant [I<sub>3</sub>](#page-38-4) to hold.*
- <span id="page-38-5"></span>**I<sub>6</sub>** A tracer *never* adds a next-hop that already exists in its routing map Π.
- <span id="page-38-6"></span>**I<sub>7</sub>** A tracer *never* removes an inexistent next-hop from its routing map Π.
- <span id="page-38-7"></span>**I<sub>8</sub>** A tracer *always* acts on a spawn ($\multimap$) event by adding a next-hop for the spawned SuS PID to its routing map Π. *Requires invariant [I<sub>6</sub>](#page-38-5) to hold.*
- **I<sub>9</sub>** A dispatch tracer that dispatches a spawn ($\multimap$) event *always* adds a next-hop for the spawned SuS PID to its routing map Π. *Requires invariant [I<sub>6</sub>](#page-38-5) to hold.*
- <span id="page-38-8"></span>**I<sub>10</sub>** A tracer that forwards a spawn ($\multimap$) event *always* adds a next-hop for the spawned SuS PID to its routing map Π. *Requires invariant [I<sub>6</sub>](#page-38-5) to hold.*
- <span id="page-38-11"></span>**I<sub>11</sub>** A dispatch tracer that dispatches a dtc acknowledgement *always* removes the corresponding next-hop for the detached SuS PID from its routing map Π. *Requires invariant [I<sub>7</sub>](#page-38-6) to hold.*
- <span id="page-38-14"></span>**I<sub>12</sub>** A tracer that forwards a dtc acknowledgement *always* removes the corresponding next-hop for the detached SuS PID from its routing map Π. *Requires invariant [I<sub>7</sub>](#page-38-6) to hold.*
- **Message routing invariants** Ensure that trace events are reported soundly to monitors.
- <span id="page-38-12"></span>**I<sub>13</sub>** A tracer *never* dispatches or forwards an evt or dtc message unless a route exists in its routing map Π. *Requires invariants [I](#page-38-7)<sub>8</sub>–[I](#page-38-8)<sub>10</sub> to hold.*
- <span id="page-38-13"></span>**I<sub>14</sub>** A tracer in • mode *always* prioritises rtd packets until it switches to ◦ mode.
- **I<sub>15</sub>** A tracer in • mode *always* transitions to ◦ mode only if all of the SuS PIDs in its traced-processes map Γ are marked as ◦ or Γ is empty.
- **I<sub>16</sub>** The total number of dtc requests a tracer issues is *always* equal to the sum of the number of SuS PIDs in its traced-processes map Γ and the number of terminated SuS PIDs for the tracer. *Requires invariants I<sub>4</sub> and [I](#page-38-10)<sub>5</sub> to hold.*
- <span id="page-38-0"></span>**I<sub>17</sub>** A tracer in ◦ mode *always* acts on a dtc request by dispatching it to the next-hop. *Requires invariants [I](#page-38-11)<sub>11</sub> and [I](#page-38-12)<sub>13</sub> to hold* (see line [37](#page-9-0) in alg. [1](#page-10-0)). If dispatching is not possible, the dtc request is incorrectly issued.
- **I<sub>18</sub>** A tracer in ◦ mode *always* acts on a direct evt by analysing or dispatching it to the next-hop. *Requires invariant [I](#page-38-12)<sub>13</sub> to hold.*
- <span id="page-38-2"></span>**I<sub>19</sub>** A tracer in ◦ mode *always* acts on a dispatched evt by forwarding it to the next-hop. *Requires invariant [I](#page-38-12)<sub>13</sub> to hold* (see line [60](#page-9-0) in alg. [1](#page-10-0)). Analysing a dispatched evt in ◦ mode means that the tracer dequeued a priority event, violating invariant [I](#page-38-13)<sub>14</sub>.
- <span id="page-38-1"></span>**I<sub>20</sub>** A tracer in ◦ mode *always* acts on a dispatched dtc acknowledgement by forwarding it to the next-hop. *Requires invariants [I](#page-38-14)<sub>12</sub> and [I](#page-38-12)<sub>13</sub> to hold* (see line [51](#page-9-0) in alg. [1](#page-10-0)). Handling a dispatched dtc acknowledgement in ◦ mode means that the tracer dequeued a priority acknowledgement, violating invariant [I](#page-38-13)<sub>14</sub>.
- **I<sub>21</sub>** A tracer in • mode *always* acts on a dispatched evt by analysing or forwarding it to the next-hop. *Requires invariant [I](#page-38-12)<sub>13</sub> to hold.* A tracer in • mode never dispatches events. Only tracers in ◦ mode can dispatch events, which are always direct events. Dispatching in • mode means that the tracer dequeued a non-priority event, violating invariant [I](#page-38-13)<sub>14</sub>.
- <span id="page-39-0"></span>**I<sub>22</sub>** A tracer in • mode *always* acts on a dispatched dtc acknowledgement by handling or forwarding it to the next-hop. *Requires invariants [I](#page-38-14)<sub>12</sub> and [I](#page-38-12)<sub>13</sub> to hold* (see lines [45](#page-12-0) and [50](#page-12-0) in alg. [3](#page-13-1)). A tracer in • mode never dispatches dtc acknowledgements. Only dispatch tracers in ◦ mode can dispatch dtc acknowledgements, which are always received from the tracers wishing to detach a SuS PID from the dispatch tracer. Dispatching in • mode means that the tracer dequeued a non-priority command, violating invariant [I](#page-38-13)<sub>14</sub>.

## <span id="page-40-2"></span>**C Appendix C: Empirical Evaluation**

 App. [C.1](#page-40-0) details why existing benchmarking tools adopted in monolithic RV are inapplicable to our work. We use BenchCRV, which is tailored for setting up and building experiments that target RV for reactive systems; see apps. [C.2](#page-40-1) and [C.3.](#page-41-0) The message numbering scheme BenchCRV employs in its master-worker models provides monitoring tools with a hook to implement assertions about trace events. We rely on this feature to ensure trace soundness in experiments. Our experiment set-up is summarised in app. [C.4,](#page-42-0) along with a list of precautions in app. [C.5.](#page-44-1) App. [C.6](#page-45-1) concludes with results supporting our arguments and conclusions in the main text.

## <span id="page-40-0"></span>**C.1 Benchmarking**

Benchmarking is a standard method of gauging runtime overhead in software [\[103,](#page-31-17) [80,](#page-30-13) [36\]](#page-28-11). Established benchmarks such as SPECjvm2008 [\[136\]](#page-32-22), DaCapo [\[28\]](#page-28-12), Renaissance [\[122\]](#page-32-23), and ScalaBench [\[135\]](#page-32-24), which were developed for fine-tuning aspects of the JVM and actor libraries, are used by the RV community to assess the applicability of monitoring, *e.g.* see [\[116,](#page-32-8) [47,](#page-29-12) [46,](#page-29-9) [124,](#page-32-9) [30,](#page-28-8) [109,](#page-31-24) [81\]](#page-30-18). These frameworks rely on third-party off-the-shelf (OTS) programs to broaden and diversify benchmark coverage. *Synthetic benchmarks*, *e.g.* Savina [\[87\]](#page-31-18), are an alternative way to perform benchmarking [\[34\]](#page-28-18) and offer benefits over their OTS program-based analogues. For instance, parameters are used to induce variations in the core benchmark behaviour, enabling them to *reproduce* and control the *repeatability* of experiments. Interested readers are referred to [\[7\]](#page-27-14) for a detailed account of the pros of synthetic benchmarking. None of the benchmarking tools cited is built with concurrency in mind, *e.g.* they cannot generate high workloads that follow profiles typical in practice [\[7\]](#page-27-14). Along with the synthetic benchmarking tools by the RV community [\[20,](#page-28-19) [68,](#page-30-21) [125,](#page-32-25) [22\]](#page-28-20), they gather metrics specific to *monolithic* batch-style programs (*e.g. execution slowdown*), which are orthogonal to reactive systems. These reasons make these tools inapplicable to our setting.

## <span id="page-40-1"></span>**C.2 BenchCRV workload parameters**

BenchCRV generates workloads based on profiles observed in practice. A workload profile dictates how the master spreads its creation of worker processes along the loading timeline, specified by the parameter *t* in seconds (s). The volume of workers per run is set via the parameter *n*. Every task the master allocates to a worker consists of a *batch* of requests that the worker receives and echoes back to the master. The number of requests batched in one task is given by the parameter *w*. BenchCRV uses *w* to generate different batch sizes for each worker to induce a modicum of variability in the master-worker models it generates. The actual batch size is generated around *w* by drawing the number of work requests from a normal distribution with mean $\mu = w$ and standard deviation $\sigma = \mu \times 0.02$.
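The batch-size draw described above can be illustrated with a minimal Python sketch. The function name `draw_batch_size`, the rounding to an integer, and the clamping to at least one request are assumptions made for illustration; they are not taken from BenchCRV.

```python
import random

def draw_batch_size(w: int, rng: random.Random) -> int:
    """Draw one worker's batch size from N(mu = w, sigma = 0.02 * w)."""
    sigma = w * 0.02
    # Rounding and clamping to >= 1 are assumptions of this sketch.
    return max(1, round(rng.gauss(w, sigma)))

rng = random.Random(42)  # fixed seed, so the drawn sizes are reproducible
sizes = [draw_batch_size(100, rng) for _ in range(10_000)]
mean = sum(sizes) / len(sizes)
```

With *w* = 100, the standard deviation is 2, so almost all batch sizes fall within a few requests of 100.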

BenchCRV offers three load profiles.

- **Steady** models scenarios where the SuS operates under stable conditions. The Steady workload is modelled on a homogeneous Poisson distribution with *rate* $\lambda$, which specifies the mean number of workers created per second along the loading timeline with the duration $t = \lceil n/\lambda \rceil$.
- **Pulse** models scenarios where the SuS experiences gradually rising and falling loads. The Pulse workload is configured by the *spread* parameter *η*, which determines how slowly or sharply the load increases as it nears its peak, halfway along *t*. Pulses are modelled on a Normal distribution with $\mu = t/2$ and $\sigma = \eta$.


<span id="page-41-1"></span>

| Param | Description | Param | Description |
|-------|-------------|-------|-------------|
| $n$ | Total number of worker processes per experiment | $\lambda$ | Steady workload rate |
| $w$ | Total number of requests per worker task | $\eta$ | Pulse workload spread |
| $t$ | Load timeline (inapplicable for Steady workload) | $\pi$ | Burst workload pinch |
| | | $\Pr(send)$ | Probability master issues a work request |
| | | $\Pr(recv)$ | Probability master dequeues a work response |
| | (a) Master-worker model parameters | | (b) Workload and reactiveness parameters |

**Table 4** BenchCRV configurable parameters for generating master-worker models and workloads

- **Burst** models scenarios where the SuS is stressed due to load spikes. The Burst workload is configured by the *pinch* parameter *π*, which controls the concentration of the initial load burst. Bursts are modelled on a Log-normal distribution with $\mu = \ln(m^2/\sqrt{p^2 + m^2})$ and $\sigma = \sqrt{\ln(1+p^2/m^2)}$.

Tbl. [4](#page-41-1) summarises the parameters used to generate master-worker models [\(4a\)](#page-41-1) and workloads [\(4b\)](#page-41-1). Fig. [13](#page-43-0) shows examples of the Steady, Pulse, and Burst workloads for a loading timeline of $t = 100$s. These benchmarks are set with $n = 500k$ workers and $w = 100$ work requests per batch. The Steady workload is configured with $\lambda = 5k$, Pulse with $\eta = 25$, and Burst with $\pi = 100$.
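The following Python sketch works through two of the formulas above. It computes the Steady timeline $t = \lceil n/\lambda \rceil$ and checks the Log-normal parameters of the Burst profile. The symbols *m* and *p* are not defined in this appendix; the sketch assumes they denote the target mean and standard deviation of the Log-normal variable, which is the standard reading of these formulas but remains an interpretation. All function names are hypothetical.

```python
import math

def steady_duration(n: int, lam: float) -> int:
    """Loading timeline of the Steady profile: t = ceil(n / lambda)."""
    return math.ceil(n / lam)

def lognormal_params(m: float, p: float) -> tuple[float, float]:
    """mu and sigma of the Burst Log-normal distribution, as given in the text."""
    mu = math.log(m**2 / math.sqrt(p**2 + m**2))
    sigma = math.sqrt(math.log(1 + p**2 / m**2))
    return mu, sigma

def lognormal_mean_std(mu: float, sigma: float) -> tuple[float, float]:
    """Mean and standard deviation of a Log-normal(mu, sigma) variable."""
    mean = math.exp(mu + sigma**2 / 2)
    std = math.sqrt((math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2))
    return mean, std

# n = 500k workers at lambda = 5k workers/s gives the 100 s timeline used above.
t = steady_duration(500_000, 5_000)

# Assumed reading: m is the desired mean, p the desired standard deviation.
mean, std = lognormal_mean_std(*lognormal_params(20.0, 5.0))
```

Under the assumed reading, the round trip recovers the mean and standard deviation passed in, which suggests that the two formulas simply convert a desired mean and spread into Log-normal parameters.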

Systems respond to load at different rates, *e.g.* due to the computational demand of tasks, IO, *etc.* BenchCRV simulates such phenomena via the parameters Pr(*send*) and Pr(*recv*). Pr(*send*) controls the probability that the master allocates requests to workers; Pr(*recv*) determines the probability that work responses received by the master are dequeued and acknowledged. Sending and receiving are *turn-based* and modelled on a Bernoulli trial [\[121\]](#page-32-26). The master picks a worker from its Work queue. It then draws a random number *X* from a uniform distribution on the interval [0, 1] and sends a work request when the Bernoulli trial succeeds, *i.e.*, $X \leq \Pr(send)$. The master decrements the work request counter for that worker and keeps sending requests to the same worker by drawing the next *X* until the Bernoulli trial fails, *i.e.*, $X > \Pr(send)$, or the request counter reaches 0. If a Bernoulli trial fails on the first request-sending attempt, the worker misses its turn, and the next worker in the Work queue is picked. The master dequeues the work responses it receives from workers using the same scheme. It repeatedly dequeues one response per successful Bernoulli trial, *i.e.*, $X \leq \Pr(recv)$, until the trial fails or the Receive queue is empty. The master signals workers to terminate once it acknowledges their work responses.
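The turn-based sending scheme can be sketched in Python as follows. This is an illustration under stated assumptions, not BenchCRV code: the names `send_turn` and `send_all` are hypothetical, a worker with outstanding requests is assumed to rejoin the back of the Work queue after its turn, and `random.random()` samples from [0, 1) rather than the closed interval.

```python
import random
from collections import deque

def send_turn(remaining: int, pr_send: float, rng: random.Random) -> int:
    """One worker's turn: send while X <= Pr(send) and requests remain.

    Returns the number of work requests sent during this turn.
    """
    sent = 0
    while remaining - sent > 0 and rng.random() <= pr_send:
        sent += 1
    return sent

def send_all(batches: dict[str, int], pr_send: float, seed: int = 0) -> list[tuple[str, int]]:
    """Serve the Work queue until every worker's request counter reaches 0."""
    rng = random.Random(seed)
    queue = deque(batches.items())
    log = []  # (worker, requests sent in that turn)
    while queue:
        worker, remaining = queue.popleft()
        sent = send_turn(remaining, pr_send, rng)
        log.append((worker, sent))
        if remaining - sent > 0:
            # Assumption: unfinished workers rejoin the back of the queue.
            queue.append((worker, remaining - sent))
    return log

log = send_all({"w1": 5, "w2": 3}, pr_send=0.9)
```

A turn that records zero requests corresponds to a worker missing its turn. Dequeuing responses with Pr(*recv*) would follow the same loop over the Receive queue.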

The developers of BenchCRV establish that setting $\Pr(send) = \Pr(recv) = 0.9$ yields SuS models that emulate *realistic* web-server response times. We use these recommended values in our experiments of sec. [5.](#page-17-0) Readers are referred to [\[7\]](#page-27-14) for details.

## <span id="page-41-0"></span>**C.3 BenchCRV messaging model**

<span id="page-42-2"></span>

**Figure 12** Centralised and RIARC monitoring arrangement on the master *M* and workers *W<sub>i</sub>*

The master-worker models that BenchCRV generates use a simple protocol to track the work requests allotted to different workers. Workers are initialised with IDs, which we denote by the placeholder *Id*, that enable the master to track the progress of the *tasks* assigned. Each worker task comprises a sequence of *NumReqs* work requests. The value of *NumReqs* for all workers is initially set to the value of the batch parameter *w*; see tbl. [4a.](#page-41-1) Work requests in a task are assigned a unique sequence number, *ReqNum*, where $1 \leq ReqNum \leq NumReqs$, that identifies each request sent to a worker. The master process relies on *ReqNum* to determine when a task assigned to a particular worker is completed. A worker task completes when $ReqNum = NumReqs$, whereupon the master sends a special termination message to the worker. The triple ⟨*Id*, *ReqNum*, *NumReqs*⟩ used in BenchCRV uniquely identifies work requests and responses in the system. BenchCRV relies on four messages to emulate work between the master and worker processes:

- $\langle Pid_M, \langle \text{chunk}, \langle Id, ReqNum, NumReqs \rangle \rangle \rangle$. Work request message that the master sends to the worker.
- $\langle Pid_M, \langle \text{term}, \langle Id, ReqNum, NumReqs \rangle \rangle \rangle$. Termination message that the master sends to the worker once the task is complete, *i.e.*, when $ReqNum = NumReqs$.
- $\langle Pid_w, \langle \text{ack}, \langle Id, ReqNum, NumReqs \rangle \rangle \rangle$. Work response message that the worker sends to the master.
- $\langle Pid_w, \langle \text{end}, \langle Id, ReqNum, NumReqs \rangle \rangle \rangle$. Completion message that the worker sends to the master when the last work request in a task is processed, *i.e.*, when $ReqNum = NumReqs$.
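To make the four message shapes concrete, the Python sketch below generates the messages exchanged for one worker task. The lock-step ordering (each request answered before the next is sent) is a simplifying assumption for readability; the actual exchange is asynchronous and turn-based. The names `Payload` and `task_exchange` are hypothetical.

```python
from typing import NamedTuple

class Payload(NamedTuple):
    id: int        # worker Id
    req_num: int   # ReqNum, with 1 <= ReqNum <= NumReqs
    num_reqs: int  # NumReqs, the number of requests in the task

def task_exchange(pid_m: str, pid_w: str, id_: int, num_reqs: int) -> list:
    """Messages for one task, as (sender PID, (tag, payload)) pairs."""
    msgs = []
    for req_num in range(1, num_reqs + 1):
        payload = Payload(id_, req_num, num_reqs)
        msgs.append((pid_m, ("chunk", payload)))
        # The worker answers the last request with 'end' instead of 'ack'.
        tag = "end" if req_num == num_reqs else "ack"
        msgs.append((pid_w, (tag, payload)))
    # The master terminates the worker once ReqNum = NumReqs.
    msgs.append((pid_m, ("term", Payload(id_, num_reqs, num_reqs))))
    return msgs

msgs = task_exchange("M", "W1", id_=1, num_reqs=3)
tags = [tag for _, (tag, _) in msgs]
```

For a task of three requests, the sketch yields three `chunk` messages, two `ack` responses, one `end` response, and a final `term` message.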

## <span id="page-42-0"></span>**C.4 Experiment set-up**

Our empirical evaluation of sec. [5](#page-17-0) configures benchmarks to monitor the master process and each worker that the master spawns. Fig. [12](#page-42-2) overviews the arrangements of centralised and RIARC monitoring; inline monitoring follows that of fig. [1a.](#page-3-0) Inline monitoring uses the tool of [\[3,](#page-27-5) [4\]](#page-27-15) to instrument the master and worker components in BenchCRV *statically*. The resulting modified code is then run in benchmarks. Centralised and RIARC monitoring rely on EVM tracing to gather events without modifying the BenchCRV code. Our centralised monitoring benchmarks utilise detectEr [\[75,](#page-30-7) [16,](#page-27-8) [17,](#page-27-4) [15,](#page-27-1) [73,](#page-30-2) [40\]](#page-28-14) to collectively instrument the master and every worker process with one central monitor. This central monitor, labelled *T<sub>C</sub>* in fig. [12a,](#page-42-2) analyses all the trace events gathered. The benchmarks set up with RIARC monitoring instrument the master and worker processes with identical monitor replicas, as illustrated in fig. [12b.](#page-42-2)

<span id="page-42-1"></span>

**Table 5** Benchmark configurations and message throughput at *maximum* Steady workloads

Tbl. [5](#page-42-1) summarises all our experiment configurations from sec. [5.2.](#page-18-4) The table includes the mean throughput of work request and response messages exchanged between the master and worker processes under the Steady workload at its maximum. This maximum workload is at 100k workers for the high concurrency scenario [C](#page-18-2)<sub>H</sub> on platform $P_E$, and at 500k workers for the high concurrency scenario [C](#page-18-2)<sub>H</sub> and 5k workers for the moderate concurrency scenario [C](#page-18-3)<sub>M</sub> on platform $P_G$. It is worth underscoring that the high and moderate concurrency settings used on platform $P_G$ yield approximately the same number of messages in the master-worker models generated by BenchCRV. However, the throughput of 328k messages/s generated by [C](#page-18-3)<sub>M</sub> is $\approx$ 50% higher than that of [C](#page-18-2)<sub>H</sub> at 218k messages/s. This gap in throughput stems from the task batch size *w*, which controls the number of requests the master issues to each worker. [C](#page-18-2)<sub>H</sub> and [C](#page-18-3)<sub>M</sub> assess two facets of inline, centralised, and RIARC instrumentation:

**Stress handling** [C](#page-18-2)<sub>H</sub> stresses each instrumentation method by inducing intense concurrency. The master provokes stress by spawning large numbers of workers (*n* = 500k) continually during benchmark runs. Combined with the short worker lifespan due to modest request processing ($w = 100$), this induces constant dynamic changes in the master-worker model. Intense concurrency tests the ability of RIARC to reorganise the tracer DAG topology and how this affects runtime overhead.

**Throughput handling** [C](#page-18-3)<sub>M</sub> studies how instrumentation copes with high message throughput. The master creates comparatively fewer workers (*n* = 5k), which engage in computationally long tasks (*w* = 100k). Most workers are spawned in the first stages of benchmark runs and produce master-worker models exhibiting milder concurrency where workers terminate less frequently. Milder concurrency tests how RIARC operates in stabler conditions and how the infrequent trace event routing and tracer reconfigurations affect runtime overhead. Sec. [5.5](#page-23-0) shows that inline and RIARC monitoring deliver similar results in these scenarios.

We reshape the stress and throughput factors described using the Steady, Pulse, and Burst workload profiles (see app. [C.2](#page-40-1)). This variation increases our benchmark coverage and, in turn, the generality of our conclusions drawn from the results. Fig. [13](#page-43-0) visualises the Steady, Pulse, and Burst workloads for the high concurrency scenario [C](#page-18-2)<sub>H</sub> with 500k workers for each of the *ten* benchmark runs we use in experiments.

<span id="page-43-0"></span>

**Figure 13** Steady, Pulse, and Burst workload distributions of 500k workers sustained for 100s

## <span id="page-44-1"></span>**C.5 Precautions**

 The following precautions minimise the biases in our benchmarks and enhance the repeatability of our empirical evaluation presented in sec. [5.](#page-17-0)

## **C.5.1 Repeatability**

Data variability affects the repeatability of experiments [\[69\]](#page-30-22). The coefficient of variation (CV) [\[57\]](#page-29-22), *i.e.*, the ratio of the standard deviation $\sigma$ to the mean $\bar{x}$, can be used to empirically establish the minimum number of experiment repetitions needed to obtain representative data. We denote this number by the variable *m*. The CV is calculated using $CV = \sigma / \bar{x}$.

 We choose the minimum value of *m* for our experiments as follows. First, we calculate the CV for the *first* batch of experiments for an initial number of repetitions *m*. This result, *cv*, is then compared to the CV calculation for the *next* batch of experiment repetitions, *m'*. The value *m'* increments the number of benchmark repetitions to take by some batch offset value *b*, *i.e.*,  $m' \leftarrow m + b$ . We denote the CV obtained from the new calculation over  $m'$  repetitions as  $cv'$ . The value  $cv$  is subtracted from  $cv'$ : if the difference is sufficiently small for some error threshold *ϵ*, the former number of repetitions, *m*, is selected. Otherwise, we repeat this procedure, setting  $cv \leftarrow cv'$  and calculating the *new* CV value,  $cv'$ , for the next batch increment,  $m'' \leftarrow m' + b$ . Crucially, the condition  $(cv' - cv) \leq \epsilon$  must hold for *all* the variables measured in the experiment before *m* can be fixed. We perform these calculations to determine the number of benchmark repetitions used in sec. [5.](#page-17-0)
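The selection procedure above can be sketched in Python. The sketch assumes a hypothetical callback `measure(k)` that returns, for each measured variable, the samples from the first *k* repetitions. It uses the sample standard deviation and a safety cap `max_m`, both of which are assumptions; the text does not specify either.

```python
import statistics

def cv(samples) -> float:
    """Coefficient of variation: standard deviation over mean."""
    return statistics.stdev(samples) / statistics.mean(samples)

def choose_repetitions(measure, m: int, b: int, eps: float, max_m: int = 100) -> int:
    """Return the smallest m whose CV changes by at most eps after b more runs.

    measure(k) is a hypothetical callback mapping each variable name to the
    samples of the first k repetitions.
    """
    prev = {name: cv(xs) for name, xs in measure(m).items()}
    while m + b <= max_m:
        nxt = {name: cv(xs) for name, xs in measure(m + b).items()}
        # The condition (cv' - cv) <= eps must hold for all variables.
        if all(nxt[name] - prev[name] <= eps for name in prev):
            return m
        m, prev = m + b, nxt
    raise ValueError("CV did not converge within max_m repetitions")

data = {"latency": [10, 10, 10, 20, 10, 20, 10, 20]}
measure = lambda k: {name: xs[:k] for name, xs in data.items()}
chosen = choose_repetitions(measure, m=2, b=2, eps=0.05)
```

In this toy data set, the CV jumps from 0 to 0.4 between two and four repetitions, so two repetitions are rejected. It then changes by less than 0.05 between four and six repetitions, so *m* = 4 is selected.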

 We also seed the Erlang pseudorandom number generator to minimise the data variability between experiments. Fixing the randomisation seed replicates the same workloads in all our experiments, making them repeatable. The upshot is that it requires fewer benchmark repetitions before the response time, memory consumption, and scheduler utilisation gathered by BenchCRV converge to an acceptable CV. Note that fixing the seed still permits our master-worker models to enjoy a degree of variability, which stems from the interleaved execution of processes due to scheduling.

## <span id="page-44-0"></span>**C.5.2 Centralised and decentralised monitoring**

RIARC projects the global trace into partitions that reflect the *local* execution at SuS processes. It exploits the natural tree relationship induced by process spawning to create trace partitions, as sec. [2.1](#page-4-4) remarks. By contrast, centralised monitoring gathers process events as one *global* trace sequence capturing the overall SuS behaviour. Existing work [\[47,](#page-29-12) [126\]](#page-32-27) shows how a global trace can be efficiently sliced to recover trace partitions via a technique called parametric trace slicing (PTS). PTS generates the same local view of the SuS process execution induced by RIARC. Our centralised monitoring set up with detectEr employs PTS. Its implementation consists of a specialised singleton monitor that *dynamically* demultiplexes the incoming stream of trace events. The projection relies on the PID carried by trace events, *i.e.*, *e.i.*<sub>S</sub> in tbl. [1a](#page-6-1) of sec. [2.1,](#page-4-4) to direct them to corresponding local monitors. PTS enables us to reuse the monitors from our benchmarks with inline and RIARC monitoring. One crucial benefit of monitor reuse is that the *same* RV analysis logic is executed by the centralised, inline, and RIARC monitors in our experiments, eliminating biases. The central monitor maintains a *monitor map* indexed by this PID to access the associated monitors efficiently and delegate the RV analysis. Our central monitor implementation ensures that every local monitor is created when needed and removed when its RV analysis completes. This measure guarantees the lowest possible overhead and does not bias our results against centralised monitoring.
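A minimal Python sketch of the demultiplexing idea follows. It is not the detectEr implementation: the class `CentralMonitor`, the convention that a local monitor returns `True` once its analysis completes, and the recording monitor used in the example are all assumptions made for illustration.

```python
class CentralMonitor:
    """Routes each trace event to the local monitor of the PID it carries."""

    def __init__(self, new_monitor):
        self._new_monitor = new_monitor  # factory: pid -> local monitor
        self.monitors = {}               # monitor map, indexed by PID

    def on_event(self, pid, event):
        monitor = self.monitors.get(pid)
        if monitor is None:
            # Local monitors are created only when first needed.
            monitor = self.monitors[pid] = self._new_monitor(pid)
        if monitor(event):
            # Analysis completed: remove the local monitor.
            del self.monitors[pid]

def make_recorder(log):
    """Hypothetical local monitor that records events until 'exit'."""
    def new_monitor(pid):
        def monitor(event):
            log.setdefault(pid, []).append(event)
            return event == "exit"
        return monitor
    return new_monitor

log = {}
central = CentralMonitor(make_recorder(log))
for pid, event in [("p1", "a"), ("p2", "x"), ("p1", "b"), ("p1", "exit")]:
    central.on_event(pid, event)
```

After the interleaved global trace is processed, `log` holds one partition per PID, and only the monitor for `p2` remains in the monitor map.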


The function ANALYSEEVT($\varsigma_M$, *e*) conducts the RV analysis. ANALYSEEVT takes a monitor signature, $\varsigma_M$, and reduces it by repeatedly applying it to the next event *e* from a sequence of trace events. Each application, $\varsigma_M(e_i)$, returns the *new* monitor state $\varsigma'_M$, which is used for the next reduction, $\varsigma'_M(e_{i+1})$, and so forth. ANALYSEEVT *stops* reducing $\varsigma_M$ when one of two conditions holds:

- **Verdict flag** signals that the RV monitor *accepts* or *rejects* the behaviour of the SuS process based on the events analysed. We refer interested readers to [\[21,](#page-28-1) [15,](#page-27-1) [73\]](#page-30-2) for an introduction to RV monitoring.
- **End of partition** informs the RV monitor that there are *no* further trace events to analyse for the SuS process. The end of the partition is marked by the $\star$ event.

 Either condition terminates the RV analysis, whereupon the monitor becomes stale. Sec. [3.6](#page-15-1) overviews how stale monitors are disposed of when tracers are garbage collected.
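The reduction performed by ANALYSEEVT can be sketched in Python as a loop over events. The encoding is an assumption: a monitor is modelled as a function from an event to either the next monitor state or a verdict string, and the constant `END` stands in for the end-of-partition marker.

```python
END = "end-of-partition"  # stand-in for the end-of-partition marker event
VERDICTS = ("accept", "reject")

def analyse_evt(monitor, events):
    """Reduce the monitor over events until a verdict or the partition end.

    Returns the verdict, or None when the partition ends (or the available
    events run out) before a verdict is reached.
    """
    for event in events:
        if event == END:
            return None
        monitor = monitor(event)  # new monitor state, or a verdict
        if monitor in VERDICTS:
            return monitor
    return None

def no_error(event):
    """Hypothetical monitor: rejects as soon as it sees an 'err' event."""
    return "reject" if event == "err" else no_error

verdict = analyse_evt(no_error, ["a", "b", "err", "c"])
```

The example monitor rejects on the third event, so the trailing event `c` is never analysed.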

In our empirical experiments, we use the sequence numbers carried by BenchCRV work request and response messages to ensure trace soundness; see app. [C.3.](#page-41-0) Our specialised monitor signature $\varsigma_M$ maintains an internal offset to assert the trace event number, *ReqNum*, expected next. Monitors also confirm that the trace is reported in its entirety. We rely on *NumReqs*, which is used by BenchCRV worker processes to detect that all the work request messages from their respective batches are delivered to them. These basic checks guarantee that the trace event sequences monitors receive are *complete* and *consistent* per def. [1.](#page-1-4)
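The two checks can be illustrated with a short Python sketch that operates on the *ReqNum* values observed in one trace partition. The function name `check_partition` and the flat list representation are assumptions; the actual monitors perform these checks incrementally as events arrive.

```python
def check_partition(req_nums, num_reqs: int) -> bool:
    """True iff the ReqNum values are exactly 1, 2, ..., num_reqs, in order."""
    expected = 1  # internal offset: the ReqNum expected next
    for req_num in req_nums:
        if req_num != expected:
            return False  # an event was lost, duplicated, or reordered
        expected += 1
    # The partition is complete only if all num_reqs events were seen.
    return expected == num_reqs + 1

ok = check_partition([1, 2, 3], num_reqs=3)
reordered = check_partition([1, 3, 2], num_reqs=3)
truncated = check_partition([1, 2], num_reqs=3)
```

The offset check catches reordered or duplicated events, while the final comparison against *NumReqs* catches a partition that ends early.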

## <span id="page-45-1"></span>**C.6 Further results**

We include further data plots supporting our conclusions of sec. [5.](#page-17-0)

## <span id="page-45-0"></span>**C.6.1 Monitoring overhead**

Fig. [14](#page-46-0) shows the overhead induced by centralised, inline, and RIARC monitoring. Charts include the overhead for the three monitoring methods under the Pulse workload to complete our findings from sec. [5.4.2.](#page-20-0) We recall that the *runtime monitoring* overhead combines the instrumentation and slowdown due to the RV analysis. Sec. [5.3](#page-19-4) establishes this RV slowdown at $\approx$ 5µs per analysed trace event in our experiments. The slowdown stems from the runtime checking that our monitors perform to ensure that the trace event sequences reported by the instrumentation are sound (def. [1](#page-1-4)); see also app. [C.5.2.](#page-44-0)

Like fig. [8](#page-22-1) from sec. [5.4.2,](#page-20-0) fig. [14](#page-46-0) demonstrates that centralised monitoring crashes in our experiments (marked by $\times$ in plots) when the Pulse workload is applied. The dumps recovered from crashes indicate that centralised monitoring fails for the reasons given in sec. [5.4.2.](#page-20-0) These plots also confirm that inline and RIARC monitoring are not afflicted by the $\approx$ 5µs RV analysis slowdown. We emphasise that RIARC induces almost comparable latency to inline monitoring even under the Pulse workload. Fig. [14](#page-46-0) (top right, middle) puts the latency at 212ms for inline monitoring *vs.* 538ms for RIARC at a peak Pulse workload of 1.7k workers/s. The difference of 326ms between the two methods is lower than the 454ms gap calculated for the Burst workload in sec. [5.4.2.](#page-20-0)

The plots in fig. [14](#page-46-0) (bottom) exhibit high scheduling utilisation: a byproduct of the limited number of scheduling threads (4) available on the edge-case platform $P_E$. Our plots in app. [C.6.2](#page-46-1) for experiments conducted on the general-case platform $P_G$ show that the scheduler utilisation is drastically reduced when using 16 scheduling threads. This reduction is exhibited even under the maximum workloads of $\approx 200M$ trace events, which is five times higher than the $\approx 40M$ workload used in fig. [14.](#page-46-0) Inline, and in particular, RIARC monitoring, benefit from the added scheduling capacity to scale accordingly. Centralised monitoring does not exhibit this behaviour; see app. [C.6.2](#page-46-1) for details.

<span id="page-46-0"></span>

**Figure 14** Instrumentation and RV monitoring overhead gap (*high* workload, 100k workers)

## <span id="page-46-1"></span>**C.6.2 Scaled set-up**

Our experiments on platform $P_E$ study how centralised, inline, and RIARC monitoring behave in edge-case situations where the memory is constrained and the possibility of parallelism is limited; see app. [C.6.1.](#page-45-0) The next set of experiments confirms that the same behaviour observed on platform $P_E$ for the three monitoring methods is preserved in general cases. These benchmarks are conducted on the general-case platform $P_G$ and use $n = 500k$ workers, $w = 100$ requests per worker, and 16 scheduling threads.

Fig. [15](#page-47-0) completes our view of instrumentation and runtime monitoring overhead given in fig. [8](#page-22-1) from sec. [5.4.2.](#page-20-0) The memory consumption and scheduler utilisation plots of fig. [15](#page-47-0) (bottom) magnify the bottleneck that afflicts centralised monitoring in fig. [8](#page-22-1) of sec. [5.4.2.](#page-20-0) In the latter benchmarks taken on the edge-case platform $P_E$ with 100k workers, centralised monitoring plateaus at a mean scheduler utilisation of $\approx 31.8\%$ at the $\approx 50$k workers mark before eventually crashing. By comparison, the plots in fig. [15](#page-47-0) show this to be at $\approx$ 4.7% at the *same* workload of 50k workers. This drop in scheduler utilisation for centralised monitoring stems from two reasons. First, the central monitor is limited in its use of the scheduling resources offered by platform $P_G$ due to the sequential processing of trace event messages. Second, the mean scheduler utilisation in this set-up is calculated over 16 scheduling threads.

<span id="page-47-0"></span>

**Figure 15** Instrumentation and RV monitoring overhead gap (*high* workload, 500k workers)

Sec. [5.4.2](#page-20-0) reports higher scheduler utilisation values on the edge-case platform $P_E$ because the EVM scheduling is limited to 4 threads; processes on $P_G$ are spread across more schedulers. The added parallelism gained through the extra 12 scheduling threads on platform $P_G$ permits workers to increase the message throughput in the corresponding master-worker models. For instance, the throughput of 162k messages/s with 100k workers under the Steady workload is raised to 218k messages/s in the benchmarks using 500k workers; refer to tbl. [5.](#page-42-1) This higher message throughput exacerbates the stress on the central monitor. We emphasise that the absence of crashes in the plots of fig. [15](#page-47-0) is attributable to the considerable memory provided by the general-case platform $P_G$ rather than to the ability of centralised monitoring to cope with high workloads. Fig. [15](#page-47-0) indicates that the continued increase in memory consumption eventually leads to failure when the memory capacity is exceeded.

Inline and RIARC monitoring enjoy the ample resources of platform $P_G$, scaling accordingly. This scalability manifests as conservative memory consumption and higher scheduler utilisation. Readers may notice the response time gains of centralised monitoring over inline and RIARC monitoring in fig. [15.](#page-47-0) We attribute this to very different reasons. The RV analysis slowdown causes the response time degradation in the case of inline monitoring. The latency overhead RIARC induces on our master-worker models is a byproduct of outline monitors, which compete for the same pool of scheduling threads used by worker processes. Under fair execution [\[137\]](#page-32-15), workers reside in the EVM waiting queues for longer periods, impacting their ability to respond to work requests promptly. Fig. [8](#page-22-1) in sec. [5.4.2](#page-20-0) exhibits analogous behaviour. We conjecture that the response time for RIARC monitoring drastically improves in less extreme scenarios than those used for our benchmarks, which instrument *every* worker process in the model (see sec. [5.3](#page-19-4)).

<span id="page-48-1"></span>

**Figure 16** Inline and RIARC monitoring resource usage (*high* workload, 500k workers)

## <span id="page-48-0"></span>**C.6.3 Resource usage**

Sec. [5.4.3](#page-22-0) gives an alternative view that studies the overall monitoring overhead, from the point of SuS launch until monitors complete their RV analysis. We supplement those results, showing that centralised monitoring is not scalable, whereas inline and RIARC monitoring leverage the extended processing capacity provided by the general-case platform $P_G$.

Fig. [16](#page-48-1) complements fig. [9](#page-23-1) in sec. [5.4.3,](#page-22-0) showing that inline and RIARC monitoring display elastic behaviour under Pulse workloads, too. Figs. [17](#page-49-0) and [18](#page-49-1) put the *same* plots of figs. [9](#page-23-1) and [16](#page-48-1) into the context of centralised monitoring. The former plots attest to the vast amounts of memory centralised monitoring consumes. They also highlight its lack of elasticity, where the memory consumption patterns are insensitive to the workload profile applied.

The sequential operation of the central monitor protracts the time taken for the RV analysis to complete. Such delays may render centralised monitoring inapplicable to cases where the RV set-up depends on timely detections, as in online monitoring. For instance, the

<span id="page-49-0"></span>

**Figure 17** Centralised, inline, and RIARC monitoring resource usage (*high* workload, 500k workers)

<span id="page-49-1"></span>

**Figure 18** Centralised, inline, and RIARC monitoring resource usage (*high* workload, 500k workers)

benchmark runs captured in fig. [17](#page-49-0) respectively take $\approx 862\%$ and $\approx 843\%$ longer to finish executing under the Steady and Burst workloads, when compared to the baseline system.

<span id="page-50-1"></span>

**Figure 19** Centralised, inline, and RIARC monitoring scheduler load (*high* workload, 500k workers)

Inline and RIARC monitoring terminate quicker under the same workloads. Inline monitoring registers an execution duration overhead of $\approx 1\%$ and $\approx 31\%$ w.r.t. the baseline system in fig. [17](#page-49-0) (bottom). RIARC monitoring prolongs the execution further, at $\approx 73\%$ and $\approx 85\%$ under the Steady and Burst workloads. Fig. [18](#page-49-1) for the Pulse workload shows analogous behaviour.
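The duration overheads quoted above can be read as relative increases in execution time over the baseline run. The following Python sketch illustrates that arithmetic under this assumed definition. The function name `duration_overhead_pct` and the durations are hypothetical; they are chosen only to yield percentages of the kind reported and are not measurements from our benchmarks.

```python
def duration_overhead_pct(baseline_s: float, monitored_s: float) -> float:
    """Relative increase of the monitored run over the baseline, in percent."""
    if baseline_s <= 0:
        raise ValueError("baseline duration must be positive")
    # Multiply before dividing to keep the arithmetic exact for round inputs.
    return (monitored_s - baseline_s) * 100.0 / baseline_s


# Hypothetical durations in seconds (assumption, not benchmark data).
baseline = 100.0
print(duration_overhead_pct(baseline, 173.0))  # 73.0
print(duration_overhead_pct(baseline, 962.0))  # 862.0
```

Under this reading, an overhead of 862% means the monitored run takes nearly ten times as long as the baseline, whereas 73% means it takes less than twice as long.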

Fig. [9](#page-23-1) of sec. [5.4.3](#page-22-0) and fig. [16](#page-48-1) unify the scheduler utilisation values by averaging over the 16 scheduler threads used in our general-case benchmarks on $P_G$. Scheduler oscillations with high peaks suggest simultaneous use of the scheduling threads. The absence of peaks in figs. [17](#page-49-0) and [18](#page-49-1) (bottom) for centralised monitoring results from the single-threaded monitor that cannot utilise other unoccupied EVM threads. Fig. [19](#page-50-1) records the load on the individual EVM scheduling threads ($S_1$ to $S_{16}$) for the centralised and RIARC monitoring benchmark runs of fig. [17](#page-49-0). The scheduler plots indicate *even load* distribution amongst the available threads for RIARC (top) under the Steady and Burst workloads. Even load distribution is consistent with the mean scheduler utilisation plots shown in fig. [17](#page-49-0) for RIARC monitoring. By contrast, the load distribution for centralised monitoring in fig. [19](#page-50-1) (bottom) becomes principally concentrated on scheduler threads $S_1$ and $S_2$ once the master and worker processes terminate. This behaviour is responsible for the *right skew* (*i.e.,* the right 'tail') in the scheduler utilisation plots of figs. [17](#page-49-0) and [18](#page-49-1) (bottom), which prolongs the execution of our centralised monitoring benchmarks.
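To illustrate why a mean utilisation plot and a per-thread load plot can tell different stories, the Python sketch below averages utilisation across threads and also computes each thread's share of the total load. It is a minimal illustration only: the helper names `mean_utilisation` and `load_share` and the sample values are hypothetical, and four threads stand in for the 16 used in our benchmarks.

```python
def mean_utilisation(samples):
    """Average the per-thread utilisation at each sampling instant.

    `samples` is a list of rows; each row holds one utilisation value
    (in percent) per scheduler thread.
    """
    return [sum(row) / len(row) for row in samples]


def load_share(samples):
    """Fraction of the total load each scheduler thread carries over the run."""
    totals = [sum(column) for column in zip(*samples)]
    grand_total = sum(totals)
    if grand_total == 0:
        return [0.0] * len(totals)
    return [total / grand_total for total in totals]


# Hypothetical samples for four threads (assumption, not benchmark data).
even = [[50, 50, 50, 50], [60, 60, 60, 60]]
skewed = [[80, 80, 0, 0], [80, 80, 0, 0]]

print(mean_utilisation(even))    # [50.0, 60.0]
print(load_share(even))          # [0.25, 0.25, 0.25, 0.25]
print(mean_utilisation(skewed))  # [40.0, 40.0]
print(load_share(skewed))        # [0.5, 0.5, 0.0, 0.0]
```

In the skewed case, the mean stays flat and low even though two threads are heavily loaded, which mirrors the concentration on two scheduler threads described above for centralised monitoring.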

## <span id="page-50-0"></span>**C.6.4 Moderate concurrency systems**

Tbl. [3](#page-24-0) in sec. [5.5](#page-23-0) summarises the percentage overhead due to inline and RIARC monitoring w.r.t. the baseline system under the Steady and Burst workloads. These results are given on the general-case platform [P](#page-18-1)<sub>G</sub> at *maximum* workloads with 500k workers (high concurrency, [C](#page-18-3)<sub>H</sub>) and 5k workers (moderate concurrency, C<sub>M</sub>). Fig. [20](#page-51-0) plots the results of *all* ten

<span id="page-51-0"></span>

**Figure 20** Inline and RIARC monitoring overhead gap (*high*/*moderate* workload, 500k/5k workers)

benchmark runs. The master process in scenario $C_H$ spawns substantially more worker processes than the master in $C_M$ in each corresponding benchmark run. These differences make the experiments on $C_H$ and $C_M$ incomparable in the number of processes created in a benchmark. For this reason, we use the benchmark run number (*x*-axis) to compare the overhead measured on $C_H$ and $C_M$ in fig. [20](#page-51-0). We recall that the benchmarks on $C_H$ and $C_M$ generate an approximately equal volume of trace event messages.

 $F$ izi3 Fig. [20](#page-51-0) (bottom) registers negligible changes in scheduler utilisation between  $C_M$  $C_M$  and [C](#page-18-2)<sub>H</sub> for inline monitoring. Inline monitoring reduces its consumption of memory in our  $_{1715}$  experiments with  $\text{C}_{\text{M}}$  $\text{C}_{\text{M}}$  $\text{C}_{\text{M}}$ . We attribute this to the lower number of workers BenchCRV creates relative to the models with  $C_{\rm H}$  $C_{\rm H}$ . This change lowers the strain on the master process  $_{1717}$  induced by the constant spawning of workers throughout benchmark runs, which shrinks the memory footprint of the generated master-worker models. RIARC benefits from these moderately-sized master-worker models, as the memory consumption plots in fig. [20](#page-51-0) (middle) indicate. However, most of the memory gains RIARC shows ensue from the fewer trace event routing and tracer reconfigurations it needs to perform compared to our experiments with concurrency scenario  $C_H$  $C_H$ . As a result, inline and RIARC monitoring consume comparable <sup>1723</sup> amounts of memory. RIARC recruits more scheduler capacity,  $\approx 6.4\%$  *vs.*  $\approx 4.2\%$  of inline

monitoring under both the Steady and Burst workloads. This slight $\approx 2.2\%$ increase in scheduler utilisation enables RIARC to optimise the latency, bringing it *on par* with the latency induced by inline monitoring.