Systems can be described in terms of their boundaries and the information that passes across those boundaries. But at some point the system needs to perform activities. In the parlance of the ‘actor model’ we might describe the activities as behaviours that are invoked when a message is received.
However, this doesn’t really help one get a handle on the causal cascade that could be triggered as a result of this message. Instead, to capture this notion of a long-running activity that is intended to complete, we often bandy around the term “process.”
However, what exactly is a process and how does this construct help us manage complexity within systems?
There are probably many formal definitions covering notions of causally related branching chains of events (similar to traces in event-based systems). However, without wanting to be quite so formal, and at the risk of oversimplifying, I am going to consider a process to be an activity in a system that can be described by:
- a trigger event - some information packet (i.e. message) that is pushed to the system and starts a flurry of activity
- a mapping activity - some function that translates this message into further messages, possibly in the context of the system’s state, and dispatches these to other systems (including the system itself or its state)
- a termination condition - some evaluation that becomes true when the activity is complete.
So, essentially, we are saying that a process has a start, a middle and an end in time.
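This three-part definition can be sketched in a few lines of Python. The names here (`Process`, `on_trigger`, the toy “batch” messages) are purely illustrative, not from any framework:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Process:
    """A process as defined above: a mapping from messages to further
    messages, plus a termination condition; a trigger starts it off."""
    mapping: Callable[[str], List[str]]        # message -> further messages
    is_complete: Callable[["Process"], bool]   # termination condition
    log: List[str] = field(default_factory=list)

    def on_trigger(self, msg: str) -> None:
        """Trigger event: start the flurry of activity."""
        queue = [msg]
        while queue and not self.is_complete(self):
            current = queue.pop(0)
            self.log.append(current)            # 'dispatch' = record, for the demo
            queue.extend(self.mapping(current))

# A toy run: the trigger fans out into three item messages, then the
# termination condition (four messages handled) becomes true.
p = Process(
    mapping=lambda m: [f"{m}/item{i}" for i in range(3)] if m == "batch" else [],
    is_complete=lambda proc: len(proc.log) >= 4,
)
p.on_trigger("batch")
```

The point of the sketch is only that start, middle and end are all explicit: the trigger, the mapping loop, and the termination check.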
However, building software (or general) systems that honour this is not necessarily as straightforward as one might like. In particular, the issue of completion and closure poses a problem.
Now, rather than claiming to always solve this, as engineers it is sometimes sufficient to simply be clear about the degree to which we have solved and implemented the notion of closure and completion of processes within the systems we build.
To this end we can potentially see a number of levels of process management providing increasing levels of surety:
- event fire and forget - the system receives a trigger and simply starts doing stuff by directly calculating and dispatching messages, but does not create any long-lasting context for the activity.
- create context without ACKs - the system receives a trigger and creates a dispatch context for the process that is intended to manage causally related ‘happens-after’ messages. But the process does not expect or track downstream ACKs.
- create context with ACKs - the system receives a trigger and creates a dispatch context to handle causally related ‘happens-after’ events. Additionally, it expects and notes ACKs from each downstream subsystem when subprocesses are initiated.
- create context with FINs - the system receives a trigger and creates a dispatch context to handle causally related ‘happens-after’ events. Additionally, it expects and tracks ACKs for created subprocesses, as well as FINs for completed subprocesses.
- create context with capabilities, ACKs and FINs - the system receives a trigger and creates a dispatch context to handle causally related ‘happens-after’ events. Additionally, it passes capabilities downstream when triggering subprocesses. These capabilities are tracked along with the ACKs and FINs of the corresponding subprocesses.
Now, with the informal definition of a process in mind we can see that the first approach provides no surety of operations. If wired correctly then the system will implicitly achieve the desired goal, but the notion of a process is never explicitly reified in the system and there is no explicit record of completion.
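A minimal sketch of this first level, with invented step names, makes the absence of any reified process visible:

```python
# Level 1, fire and forget: the trigger handler calculates and dispatches
# messages directly, but no context object survives to represent the process.
outbox = []   # stands in for 'dispatch to other systems'

def on_trigger(msg: str) -> None:
    for step in ("validate", "store", "notify"):   # invented steps
        outbox.append(f"{step}:{msg}")
    # Nothing else is recorded: no object says 'a process for msg is in
    # flight', and there is no record of completion anywhere.

on_trigger("order-42")
```

If the three steps are wired correctly the goal is achieved, but only implicitly: nothing in the system can be asked “is the process for order-42 done?”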
The second approach makes the notion of a process explicit, but is really being somewhat hopeful. It triggers downstream activity, but does not attempt to track that activity. As such, once all expected ‘happens-after’ events are accounted for and any final downstream activity is started, the best that the implementation can do is to consider the process complete. This is clearly an inaccurate representation of the system state since there may be any amount of activity still in flight when the process signals its completion.
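The hopefulness of the second level can be made concrete with a sketch (class and message names are invented for illustration):

```python
# Level 2 sketch: an explicit process context exists, but 'complete' only
# means 'everything was dispatched', not 'downstream work finished'.
class HopefulContext:
    def __init__(self):
        self.dispatched = []
        self.complete = False

    def dispatch(self, msg, send):
        self.dispatched.append(msg)   # record what we sent...
        send(msg)                     # ...but never track it afterwards

    def finish(self):
        # Declared complete once all expected messages are sent, even
        # though downstream activity may still be in flight.
        self.complete = True

downstream = []
ctx = HopefulContext()
for msg in ("resize:img1", "resize:img2"):
    ctx.dispatch(msg, downstream.append)
ctx.finish()
```

The process is now a first-class object, but its `complete` flag reflects only its own dispatching, not the state of the system.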
The third approach is a slight improvement in that the process expects to receive an acknowledgement that a downstream subprocess is taking place. As such, any triggered subprocesses are accounted for, but there is no mechanism by which to verify their completion. Beyond noting that a subprocess was initiated, then, there may be little point in retaining these details.
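The ACK bookkeeping at this level amounts to little more than a set of outstanding acknowledgements; a sketch, with invented subprocess names:

```python
# Level 3 sketch: the context expects an ACK for each subprocess it starts,
# so initiations are accounted for, but completion is still invisible.
class AckContext:
    def __init__(self):
        self.awaiting_ack = set()

    def start_subprocess(self, name: str) -> None:
        self.awaiting_ack.add(name)

    def on_ack(self, name: str) -> None:
        self.awaiting_ack.discard(name)

    def all_acknowledged(self) -> bool:
        # True once every subprocess has confirmed it is taking place.
        # Note this says nothing about whether any of them has finished.
        return not self.awaiting_ack

ctx = AckContext()
ctx.start_subprocess("thumbnail")
ctx.start_subprocess("index")
ctx.on_ack("thumbnail")
# ctx.all_acknowledged() is still False here: "index" has not ACKed yet
ctx.on_ack("index")
```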
In the fourth approach, we are finally able to maintain an accurate record of activity within the system. Here, the process not only records the activation of subprocesses, it also keeps track of their completion. That is, it has sufficient information not only to handle ‘happens-after’ events, but also to accurately signal completion of activities within the system. We can expect the process to remain alive in the system until all subprocesses have ceased their activity and signalled as much. This also means that the process now has sufficient information to accurately effect a process cancellation. Strictly speaking, one could argue that the third approach also has sufficient information, since it could keep a reference to all subprocesses and then cancel them even if they had already completed, but this does seem somewhat inelegant.
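Tracking both ACKs and FINs needs two pieces of state, and the same bookkeeping is enough to drive a targeted cancellation. A sketch, again with invented names:

```python
# Level 4 sketch: the process is complete only when every subprocess has
# acknowledged and then finished; cancel() targets only what is still live.
class FinContext:
    def __init__(self):
        self.pending = set()   # started, not yet ACKed
        self.running = set()   # ACKed, not yet FINed

    def start_subprocess(self, name: str) -> None:
        self.pending.add(name)

    def on_ack(self, name: str) -> None:
        self.pending.discard(name)
        self.running.add(name)

    def on_fin(self, name: str) -> None:
        self.running.discard(name)

    def complete(self) -> bool:
        return not self.pending and not self.running

    def cancel(self, send_cancel) -> None:
        # Request cancellation of everything not yet finished.
        for name in self.pending | self.running:
            send_cancel(name)

ctx = FinContext()
ctx.start_subprocess("a"); ctx.start_subprocess("b")
ctx.on_ack("a"); ctx.on_ack("b")
ctx.on_fin("a")
cancelled = []
ctx.cancel(cancelled.append)   # only "b" is still live at this point
ctx.on_fin("b")
```

Unlike the third level, cancellation here never bothers subprocesses that have already signalled their FIN.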
Finally, the fifth approach pushes this one step further. It allows the initiating process to manage the resources involved downstream. It also means that in addition to requesting a cancellation, which is somewhat discretionary, it can now enforce the reclamation of resources by revoking the capabilities it passed downstream.
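The enforcement half of this can be sketched as a revocable capability; the resource here is just a placeholder string, and the class is an illustration rather than any particular capability system:

```python
# Level 5 sketch: revoking the capability forcibly cuts off the downstream
# subprocess's access to the resource, rather than merely asking it to stop.
class Capability:
    def __init__(self, resource):
        self._resource = resource

    def use(self):
        if self._resource is None:
            raise PermissionError("capability revoked")
        return self._resource

    def revoke(self) -> None:
        # Reclaims the resource regardless of the subprocess's cooperation.
        self._resource = None

cap = Capability("db-connection")
assert cap.use() == "db-connection"   # downstream work proceeds normally
cap.revoke()                          # initiator enforces reclamation
# any further cap.use() now raises PermissionError
```

A cancellation request relies on the subprocess behaving; revocation does not, which is what makes it an enforcement mechanism rather than a courtesy.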
This last level of control is necessary in order to really be able to manage resources within a large-scale system, thereby keeping operations bounded and honouring the pragmatic resource limitations inherent in any real system.
Thus, by having a reasonable and workable definition of a process we make it possible to introduce better mechanisms for managing completion and closure of processes. As such, we open up the opportunity for being explicit about how we expect processes to operate within a system. Ultimately this means that we can make better and more considered trade-offs and decisions when designing complex systems, and we can provide surety around honouring those design decisions.