In this post we consider a long-running engineering project and see how taking a hypothetical, but explicitly model-driven, approach could help to guide the undertaking to completion.
Firstly, what is a “project”?
This may seem like an obvious question, but the definition is important, because all too often in organisations the colloquial understanding is something like:
A project is a catch all for stuff related to that thing that somebody somewhere wants and that is going to need a bunch of people and time.
This may not be far from the useful reality, but it is somewhat nebulous.
We could more usefully define a project as:
A project is the collective coordination of the effort of a number of people, machines and general entities over time in order to reach a well defined and bounded singular goal that has measurable completion criteria.
Now, we see that the emphasis shifts from:
- Hmmm, the project is somebody's problem and we all hope that they make it work.
- Ah, the project is an objectively defined problem that we can all work towards.
But other than a nice play on words, what does this really give us?
The Coordination Problem
A project is about coordinating the efforts and activities of a number of people and systems.
One of the easiest ways to see that projects have a large coordination component is to consider a project that is carried out by just one person. Suddenly the apparent velocity of work seems to increase.
This has been noted many times, but probably most famously in “The Mythical Man-Month”.
The reason is that when there is only one person involved, they only need to coordinate with themselves. This means that they have a very high bandwidth view of the world as they think through aspects of the work and how to sequence it.
However, no matter how effective a person is, their efforts can only scale so far. At some point, in order to tackle larger undertakings, more people need to be brought into the mix.
So, what exactly is a coordination problem? As alluded to, the issue is around the sequencing of work. This immediately raises the question of constraints: why can’t the work be sequenced in any way one likes? The answer is dependencies. Many items of work will depend on a previous item being completed first. And it is often desirable to break up work into smaller and smaller tasks so that each can be evaluated independently.
Thus, for work on a project to flow usefully we need to sequence work based on the dependencies between items.
However, the danger in viewing projects this way is that the simplest thing to do is to create a sequential path through the work. This leads to idle time or work blockages because different people are waiting for each other before being able to continue. That is, we run the risk of losing concurrency in the team.
Interestingly, this has a number of overlaps with computing systems, where the aim is to achieve high concurrency and utilisation of computational resources, and thereby better efficiency and lower wall-clock time for the completion of computations.
Now, there are different ways to improve utilisation. If the work can be broken down into clear domains, where complete domains have standardised dependencies, then we can use a pipeline model:
- in CPUs we have a literal instruction pipeline
- in software we could follow a design pattern such as SEDA (staged event-driven architecture)
- in project management we could follow something like Kanban
If, instead, the work can be broken up into a number of independent, yet well defined, pieces, then we use the worker pool model:
- in hardware we have many-core GPUs
- in data centres we have compute clusters
- in software we have thread pools
- in project management we could follow something like Scrum
The problem comes when the work is not independent, but the dependencies cannot be factored out in a standardised manner ahead of time. Then neither of the above architectural patterns succeeds at solving the coordination problem and one is unable to obtain a concomitant increase in concurrency or improvement in utilisation.
Yet, this is not an unsolved problem. Compilers solve this sort of problem in order to compute optimisations. Instruction dependencies are derived from the source code, written in a given language, and then instructions are reordered so that the semantic behaviour stays the same while the use of registers improves or efficiencies are secured by leveraging memory caching layers.
Operating systems also solve this problem by mapping processes to schedulers, while leveraging dependencies between processes and shared resources in order to reduce context switches while simultaneously ensuring effective use of I/O lines.
To this end, we can do the same for a project. As we decompose the undertaking into smaller work items we can immediately encode dependencies between the tasks. This will ultimately result in a graph of work that needs to be performed. Note that while the graph will often be tree-like in nature, as a result of the process of breaking down aspects into ever smaller pieces, there are often aspects that have more complex dependencies, giving rise to a general DAG (cycles would normally highlight some form of block or conflict in the project that needs to be resolved).
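As a minimal sketch of such a work graph, the snippet below encodes a handful of hypothetical tasks (the task names are invented for illustration) and uses Python's standard `graphlib` module to group them into "waves" of work that could proceed concurrently. It also shows how a cycle in the dependencies surfaces as an error. This is one possible encoding, not a prescription for how a tracking tool must work.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical work items: each task maps to the set of tasks it depends on.
TASKS = {
    "design": set(),
    "build_api": {"design"},
    "build_ui": {"design"},
    "integrate": {"build_api", "build_ui"},
    "release": {"integrate"},
}

def concurrent_waves(tasks):
    """Group tasks into waves; tasks within a wave do not depend on each other."""
    sorter = TopologicalSorter(tasks)
    sorter.prepare()  # raises CycleError if the dependencies contain a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave complete to unlock the next one
    return waves

def has_cycle(tasks):
    """Return True if the dependencies cannot be sequenced at all."""
    try:
        TopologicalSorter(tasks).prepare()
    except CycleError:
        return True
    return False

print(concurrent_waves(TASKS))
```

For the example graph, the second wave contains both `build_api` and `build_ui`, which is exactly the concurrency that a purely sequential plan would lose.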
Now, as with compilers, one can compute the dominator tree. This in turn allows one to understand which work items are actually independent of each other, and can therefore be carried out concurrently, versus where there are dependencies. The junctions in the dominator tree are essentially “barriers” where concurrent flows need to come together and coalesce, before work can continue again.
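To make the dominator idea concrete, here is a small sketch that computes immediate dominators for a work graph. It assumes the graph is acyclic and has a single entry task from which everything else is reachable; the task names and the helper names `immediate_dominators` and `merge_points` are hypothetical. Because the graph is a DAG, a single pass in topological order suffices, rather than the iterative algorithms a compiler would need for graphs with loops.

```python
from graphlib import TopologicalSorter

# Hypothetical work items: each task maps to the set of tasks it depends on.
TASKS = {
    "design": set(),
    "build_api": {"design"},
    "build_ui": {"design"},
    "integrate": {"build_api", "build_ui"},
    "release": {"integrate"},
}

def immediate_dominators(deps, entry):
    """Return {task: immediate dominator} for a DAG given as task -> prerequisites."""
    dom = {}
    # static_order() yields prerequisites before the tasks that need them.
    for task in TopologicalSorter(deps).static_order():
        preds = deps.get(task, set())
        if task == entry or not preds:
            dom[task] = {task}
        else:
            # A task is dominated by itself plus whatever dominates ALL its prerequisites.
            dom[task] = {task} | set.intersection(*(dom[p] for p in preds))
    idom = {}
    for task, doms in dom.items():
        strict = doms - {task}
        if strict:
            # Strict dominators form a chain; the closest one has the most dominators.
            idom[task] = max(strict, key=lambda d: len(dom[d]))
    return idom

def merge_points(deps):
    """Tasks with more than one prerequisite, i.e. where concurrent flows join."""
    return sorted(t for t, preds in deps.items() if len(preds) > 1)

print(immediate_dominators(TASKS, "design"))
print(merge_points(TASKS))
```

In this example `integrate` is dominated directly by `design` rather than by either build task, which reflects that the two build tasks sit on separate branches, and `integrate` is the point at which those branches must be merged.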
On the back of this view of a project, it becomes possible to decompose larger problems while still keeping track of the inter-dependencies in the work. That is, it becomes possible to hand out work to a team more fluidly, while understanding when the team members will need to coordinate by merging their work together at junction points.
Rather than always needing high bandwidth, e.g. time-consuming meetings and shared physical work environments, it becomes possible for team members to optimise their time around when they will need high bandwidth versus when it makes sense for them to be individually focused.
However, for this process to succeed there needs to be continual feedback between the team members and the tools tracking the work.
The Search Space Problem
A project has well defined bounds and context, but can be achieved via searches along many different paths.
If the complete solution to a given engineering problem is known a priori then solving the coordination problem via task dependencies would actually suffice as a reasonable project management approach.
That is, if we can detail all the work that needs to be carried out in order to solve a problem and therefore complete the project, then we can simply create a full decomposition, with dependencies, and then enact the work according to the resultant flow graph. This is similar to the way that compilers generally perform a once-off analysis and optimisation of the solution (program) and then produce a binary that will always execute in the same way.
For many complex engineering problems, we do not know the full solution up front. Sometimes this can be attributed to poor or insufficient planning and design, when in fact there was sufficient information and time to perform a full analysis. However, this is not always the case. Sometimes it makes sense to tackle the problem through a series of experiments. That is, we start whittling away at the problem ahead of knowing the full solution, and then we use the feedback from the system itself in order to decide what aspect to tackle next.
Thus, we are essentially treating the project as a search space. Rather than performing a full solution search ahead of investment in work on the project, we expend effort while we are searching for a viable path to reach the objective of the search.
Thinking of compilers, a simpler version of this might be JIT compilers that defer full optimisation of the program until runtime, so that they can use real world telemetry to guide the optimisation efforts.
Another approach to treating projects as a search space is the naive Monte Carlo method of experimentation, i.e. the random walk. In fact, I would posit that many large corporations with large cash flows use exactly this method because it is easy to parallelise if you have the resources. Simply set up many small teams all working on related and overlapping problems and then pick the “best” candidates that rise above the noise. If the degree of parallelisation in the search is high then novel solutions can often be found in relatively short amounts of time. Alas, many organisations also follow this approach unwittingly.
But most engineering efforts do not have the slack in their resources to be able to take this approach. Rather, a guided search is necessary.
For this, a bounded definition of the problem is required. This would be described relative to various constraint dimensions: technical, economic, skills availability, etc. This enables one to choose further search directions in the project, while ensuring that the constraints continue to be met.
For some dimensions these constraints can be evaluated through simple linear computations e.g. tracking time and costs, or pruning solutions which require skills that are not available.
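A sketch of what such simple checks might look like is given below. The `Candidate` fields, the figures and the skill names are all hypothetical; the point is only that these dimensions reduce to straightforward comparisons that can prune the search before any effort is spent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate direction for the next piece of work."""
    name: str
    cost: float        # estimated cost, in whatever currency the budget uses
    weeks: float       # estimated duration
    skills: frozenset  # skills the work would require

def feasible(candidates, budget_left, weeks_left, available_skills):
    """Keep only candidates that fit the remaining budget, time and skills."""
    return [
        c for c in candidates
        if c.cost <= budget_left
        and c.weeks <= weeks_left
        and c.skills <= available_skills  # subset test: every required skill is available
    ]

CANDIDATES = [
    Candidate("add_cache", cost=20.0, weeks=3, skills=frozenset({"backend"})),
    Candidate("rewrite_in_fpga", cost=90.0, weeks=12, skills=frozenset({"hardware"})),
    Candidate("tune_queries", cost=5.0, weeks=1, skills=frozenset({"backend", "sql"})),
]

kept = feasible(CANDIDATES, budget_left=50.0, weeks_left=8,
                available_skills={"backend", "sql"})
print([c.name for c in kept])
```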
For other dimensions it may be appropriate to build more complex models of the search space. To wit, for the technical constraints it may be better to build models of the dynamics of the system. These models are then kept synchronised with the current state of the actual system that has been developed to date via the experiments that are exploring the project search space. Then, when selecting the next experiment to embark on, it may be possible to first run a much faster and less resource-intensive exploratory experiment against the model. If this demonstrates sufficient merit within the model, then it can be selected as a complex task that can be added to the project's dependency graph, and sequenced for implementation by the team.
The accuracy of the model will then affect the fidelity of the search direction choice.
Additionally, each completed phase can then be evaluated against the model(s) at hand so as to gauge the accuracy and utility of the models themselves, or to further calibrate the models that are being used to guide the search across the project space.
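The loop described above, in which the model proposes, the team implements and the result is used to check the model, could be sketched roughly as follows. The function names, the candidate names and the predicted gains are all invented for illustration, and the "model" here is just a lookup of predicted gains standing in for whatever simulation or analytical model a real project would use.

```python
def select_experiment(candidates, predict, min_gain):
    """Pick the candidate with the best predicted gain, or None if none has enough merit."""
    best = max(candidates, key=predict, default=None)
    if best is None or predict(best) < min_gain:
        return None
    return best

def mean_abs_error(predicted, observed):
    """Average gap between what the model predicted and what was measured afterwards."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)

# Hypothetical predicted gains from running cheap experiments against the model.
PREDICTED_GAIN = {"add_cache": 0.30, "tune_queries": 0.10}

choice = select_experiment(PREDICTED_GAIN, PREDICTED_GAIN.get, min_gain=0.2)
print(choice)

# After the work is done, compare predictions with measured gains to judge the model.
error = mean_abs_error([0.30, 0.10], [0.20, 0.10])
print(error)
```

A growing error over successive phases would be a signal that the model needs recalibrating before it is trusted to guide the next choice of direction.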
The Completion Problem
A project has measurable progress that can be tracked and analysed in order to evaluate completion.
Long running projects are unavoidable when tackling complex engineering endeavours. However, there is a difference between “long running” and “never ending.”
It is fundamental to the success of the project that it has a clear completion goal (since by definition a project must eventually reach a point of closure). This, in turn, means that one needs to ensure that there is a measure of the progress of a project. Simply put, “Are we there yet?” and “How far until we get there?”
From these we can infer approximations to the question “When will we get there?”
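One very simple way to derive such an approximation is to extrapolate the observed rate of progress. The sketch below assumes progress is recorded as (time, progress) samples in consistent units, with at least two samples taken at distinct times, and that a straight-line extrapolation is acceptable; the function name is hypothetical and a real project may well need something less naive.

```python
def estimate_remaining_time(samples, target):
    """Extrapolate time left from (time, progress) samples ordered by time.

    Returns None when progress is flat or regressing, since no finite
    estimate can be extrapolated in that case.
    """
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    rate = (p1 - p0) / (t1 - t0)  # average progress per unit of time
    if rate <= 0:
        return None
    return (target - p1) / rate

# Progress went from 10 to 30 units over 4 weeks, with 100 units needed in total.
print(estimate_remaining_time([(0, 10), (4, 30)], target=100))
```

The `None` case matters as much as the estimate: it is the point at which the measurements say that the current strategy is not converging.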
The key here is to ensure that the measure of progress errs on the side of the quantitative rather than the qualitative. We need to be able to measure, in concrete and objective terms, our progress.
Being able to measure our progress, together with a well defined end point (i.e. knowing when we have achieved sufficient progress in order to declare completion), essentially encodes the objective function for the search. That is, we can engage in a search strategy as discussed previously, but simply achieving coverage of the search space is not sufficient. We also need to know when we have found a solution. We need to know when we’ve done enough work, and solved enough small problems to be able to objectively compute a signal that indicates we have now completed the project.
Now, as stated, these completion criteria define the objective function for the search. That is, when we evaluate the result of some phase in the project we can not only validate that we are operating within the constraints of the project, but also evaluate the impact that this phase has in helping to progress the project towards a point of completion.
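As an illustration, completion criteria can be written down as thresholds on measured metrics, so that "are we done?" becomes a computation rather than an opinion. The metric names and limits below are hypothetical, as is the encoding of each criterion as a ("max" or "min", limit) pair.

```python
# Hypothetical completion criteria: metric name -> (kind, limit).
# "max" means the metric must be at or below the limit, "min" at or above it.
CRITERIA = {
    "p99_latency_ms": ("max", 200.0),
    "test_coverage": ("min", 0.80),
}

def unmet_criteria(metrics, criteria):
    """Return the names of criteria not yet satisfied by the measured metrics."""
    failing = []
    for name, (kind, limit) in criteria.items():
        value = metrics[name]
        ok = value <= limit if kind == "max" else value >= limit
        if not ok:
            failing.append(name)
    return failing

def is_complete(metrics, criteria):
    """The project's halting condition: every criterion is met."""
    return not unmet_criteria(metrics, criteria)

measured = {"p99_latency_ms": 250.0, "test_coverage": 0.85}
print(unmet_criteria(measured, CRITERIA))
print(is_complete(measured, CRITERIA))
```

The same functions could be pointed at metrics predicted by a model instead of measured ones, which is one way to score a candidate direction before committing effort to it.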
Similarly, these same completion criteria can be used when evaluating potential avenues within the confines of the system models ahead of making actual changes to the final system.
An additional point that is crucial when carrying out long-running projects is the fact that the context can change during the lifespan of the project. Again, this is why it is valuable to collect continual telemetry from the system, giving metrics on the performance of the system being modified. These can provide guidance during the project even if other factors have come into play. The completion criteria remain the same, but the factors in that completion may be relative to measurements from the context.
If the environmental or contextual factors are changing faster than expected then this might show that there is actually a regression in progress, and that a different strategy is required. This can be carried out by feeding back into the search space decisions and then back into the coordination sequencing.
A project is not a static collection of steps. Nor is a project a never-ending quest. Rather, a project is a fully dynamic system in and of itself, one that has a well-defined halting condition.
The complexity in this system needs to be managed in the same ways that one would manage complexity in other systems. We create architectural separations between subsystems and aspects of the system that exhibit clear boundaries of decoupling or, reciprocally, of cohesion in operation. We build models to match these system elements. We guide change in the system relative to these models.
In the case of projects, one possible decomposition of the system is into:
- mechanisms supporting the coordination of work
- mechanisms enabling a guided search across solution spaces
- mechanisms for measurement of progress relative to completion criteria
If these three aspects of a project can be made to work seamlessly together, while operating efficiently apart, then there is every likelihood that the project will be effective, adaptive and successful.