Why do so?
- To identify ways in which the contents of storylines can be improved
- To identify how objectives relating to people’s participation can be better achieved
Contents of this page
- Dropout behavior
- Participants’ contributions to storylines
- Participants’ comments on storylines
- Participants’ evaluation of storylines
- Performance analysis
1. Dropouts
The participation rate in ParEvo exercises has varied according to the source of the participants. In most of the early ParEvo exercises, which were carried out with volunteers from a Community of Interest, some volunteers dropped out (24% on average). But in the two exercises carried out since then, each of which involved staff from a particular organisation, there were no dropouts.
2. Contributions made within each iteration
2.1 Available data
When a participant makes a new contribution to existing storylines, they make two types of connections: (a) they connect to another participant – the one whose contribution they are immediately adding to, and in doing so (b) they connect to a specific storyline – a string of contributions that have been built on, one after the other.
In a ParEvo exercise, data on these connections is accumulated in the form of two downloadable matrices, known in social network analysis (SNA) jargon as (a) an adjacency matrix, and (b) an affiliation matrix, respectively. An example of each is shown below. In the adjacency matrix, the cell values are the numbers of times the row actor has added a contribution to an existing contribution made by the column actor. In the affiliation matrix, the cells’ values indicate the number of times each column participant has contributed to a row storyline. These two examples are based on a pretest exercise, which only ran for four iterations. In exercises with more iterations, the cell values would be higher.
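As a rough illustration of how the two matrices relate to the underlying contributions, the sketch below builds both from a list of contribution records. The record format (contributor, recipient, storyline), the participant names and the storyline labels are all hypothetical; the actual ParEvo download may be structured differently.

```python
from collections import defaultdict

# Hypothetical records: (contributor, recipient, storyline).
# Each record means "contributor added a contribution on top of one
# made by recipient, within the given storyline".
records = [
    ("Ann", "Bob", "S1"),
    ("Bob", "Ann", "S1"),
    ("Cat", "Ann", "S2"),
    ("Ann", "Ann", "S2"),
    ("Cat", "Bob", "S1"),
]

def build_adjacency(records):
    """Rows = contributors, columns = recipients, cells = counts."""
    matrix = defaultdict(lambda: defaultdict(int))
    for contributor, recipient, _storyline in records:
        matrix[contributor][recipient] += 1
    return matrix

def build_affiliation(records):
    """Rows = storylines, columns = participants, cells = counts."""
    matrix = defaultdict(lambda: defaultdict(int))
    for contributor, _recipient, storyline in records:
        matrix[storyline][contributor] += 1
    return matrix

adjacency = build_adjacency(records)
affiliation = build_affiliation(records)

print(adjacency["Ann"]["Bob"])    # times Ann built on Bob
print(affiliation["S1"]["Cat"])   # times Cat contributed to storyline S1
```

Both matrices come from the same records, which is why the analyses of each can be read side by side.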
2.1.1 Adjacency matrix

2.1.2 Affiliation matrix

2.2 Analysis options
The contents of this section reflect my current preoccupations. There are probably many other ways of analysing the above data. If you do explore these, please let me know.
I have analysed the data in each of these matrices from three perspectives:
- Variations across rows
- Variations across columns
- Variations across the whole matrix
In all of these analyses, we can look at what is happening in terms of diversity. Why examine diversity?
- Variation is intrinsic to an evolutionary process
- Diversity is indicative of a degree of agency (/the absence of a common constraint)
- Lots of research has been done on diversity & group performance
- Simple but sophisticated measures are available, already used in other fields:
- Ecology
- Social Network Analysis
There is a big literature on the measurement of diversity. Here I make use of Stirling’s (1998) influential paper. He suggested that diversity can be measured on three dimensions: variety, balance, and disparity. In the immediate discussion below the focus will be mainly on variety and balance.
2.3 Participants’ relationships to each other
2.3.1 Participants as contributors – Variations across rows
Individual participants varied in the way they contributed to others’ existing contributions.
Variety in this context refers to the range of other participants they contributed to. Variety = count of row cell values >0 / sum of row cell values. A score of a hundred per cent means a participant built on every other participant’s contribution on at least one occasion. In the MSC pretest, participants’ variety ranged from 67% to 100%, with an average of 91%, whereas in the Brexit pretest variety ranged from 50% to 100%, with an average of 70%. Variety was greater in the MSC pretest.
Balance in this context refers to the extent to which their contributions were evenly spread across those they had contributed to. With datasets like the matrices above, balance can be measured by calculating the standard deviation (SD) of the values in a row. If all participants have received the same number of contributions from a row participant, the standard deviation will be zero. Alternatively, if the number of contributions they received varies widely, the standard deviation will be high. In the MSC pretest, the SD of values ranged from 0.0 to 0.5, with an average of 0.15, whereas in the Brexit pretest SD values ranged from 0.0 to 1.0, with an average of 0.39. Balance was greater in the MSC pretest.
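The two row measures can be sketched in a few lines of Python. This is an illustration only: the rows are hypothetical, and the choice of a population SD over the non-zero cells is my assumption, since the text does not say whether zeros are included or which SD formula was used. That choice is at least consistent with the reported values (a 2-and-1 split gives 0.5, and a 3-and-1 split gives 1.0).

```python
from statistics import pstdev

def row_variety(row):
    """Count of cells > 0 divided by the sum of the cell values."""
    total = sum(row)
    if total == 0:
        return None  # the participant made no contributions
    return sum(1 for value in row if value > 0) / total

def row_balance(row):
    """Population SD of the non-zero cells (0.0 = perfectly even).

    Assumption: zeros are left out and the population formula is used.
    """
    nonzero = [value for value in row if value > 0]
    if not nonzero:
        return None
    return pstdev(nonzero)

# Hypothetical adjacency-matrix rows; the first cell is the
# participant's own column (no self-contributions in these rows).
even_row = [0, 1, 1, 1]     # one contribution to each of three others
uneven_row = [0, 2, 1, 0]   # three contributions spread over two others

print(row_variety(even_row), row_balance(even_row))
print(row_variety(uneven_row), row_balance(uneven_row))
```

The same two functions can be applied to a column by passing the column's values instead of a row's.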
These two measures can be combined into a single measure known as Simpson’s Diversity Index. This is a more sophisticated measure, suitable when there is a larger and more varied number of values in the matrix. There is a useful online calculator here: https://www.alyoung.com/labs/biodiversity_calculator.html
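For readers who prefer to compute the index directly, here is a minimal sketch of one common form of Simpson's index of diversity, 1 - Σ n(n - 1) / (N(N - 1)), where n is each non-zero cell value and N is their sum. Several variants of the index exist, so the figures may not match those of the calculator linked above; the example rows are hypothetical.

```python
def simpson_diversity(counts):
    """Simpson's index of diversity, 1 - D, for a list of counts.

    0.0 means all contributions went to one recipient; values near 1.0
    mean contributions were spread widely and evenly.
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total < 2:
        return None  # undefined for fewer than two contributions
    dominance = sum(c * (c - 1) for c in counts) / (total * (total - 1))
    return 1 - dominance

print(simpson_diversity([1, 1, 1]))  # evenly spread over three recipients
print(simpson_diversity([2, 1, 0]))  # uneven spread over two recipients
print(simpson_diversity([3, 0, 0]))  # everything to one recipient
```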
A much simpler measure, which does not make these distinctions between variety and balance, is the proportion of a participant’s contributions which built on others’ contributions (and not their own). This is probably the most suitable for feedback to participants and one which might, if publicised, encourage such behavior. In Social Network Analysis terms this is a particular way of measuring OutDegree. In the MSC pretest, this percentage ranged from 33% to 100%. In the Brexit pretest, it ranged from 0% to 100%. Averaged over all participants, 73% of the MSC pretest’s contributions built on others’ contributions, whereas in the Brexit pretest the proportion was much lower at 33%.
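A minimal sketch of this simpler measure, assuming a square adjacency matrix held as a list of lists in which row i and column i refer to the same participant. The matrix values are hypothetical.

```python
def share_built_on_others(adjacency, i):
    """Proportion of participant i's contributions that built on others."""
    row = adjacency[i]
    total = sum(row)
    if total == 0:
        return None  # participant i made no contributions
    own = row[i]     # diagonal cell: contributions built on their own work
    return (total - own) / total

# Hypothetical 3 x 3 adjacency matrix (rows = contributors).
adjacency = [
    [1, 2, 0],
    [1, 0, 2],
    [0, 0, 3],
]

for i in range(len(adjacency)):
    print(i, share_built_on_others(adjacency, i))
```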
2.3.2 Participants as recipients – Variations across columns
Individual participants can also vary in the way others contribute to their existing contributions.
Variety in this context refers to the range of other participants they received contributions from. Variety here = count of column cell values >0 / sum of column cell values. A score of a hundred per cent means every other participant built on this participant’s contribution on at least one occasion. In the MSC pretest, this variety ranged from 67% to 100%, with an average of 90%, whereas in the Brexit pretest this variety ranged from 33% to 100%, with an average of 70%. Variety was greater in the MSC pretest.
Balance in this context refers to the extent to which the contributions of others were evenly received. In the MSC pretest, SD values ranged from 0.0 to 0.5, with an average of 0.15, whereas in the Brexit pretest SD values ranged from 0.0 to 0.5 with an average of 0.18. The difference in the balance of received contributions was very small.
Another measure that does not make these distinctions is the proportion of all contributions by all others which were received by a participant. In Social Network Analysis terms this is a particular way of measuring InDegree. As above, this is probably the most suitable for feedback to participants and one which might act as a motivator. In the MSC pretest, the proportion ranged from 8% to 15%. In the Brexit pretest, it ranged from 3% to 17%. These ranges might be expected to grow as the number of iterations increases. In the pretests, there were only four iterations each.
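The received-share measure can be read in more than one way. The sketch below takes one reading, which is an assumption on my part: the numerator is the number of contributions participant j received from others, and the denominator is the number of contributions made by everyone other than j. The matrix values are hypothetical.

```python
def share_received_from_others(adjacency, j):
    """Share of all contributions made by others that built on participant j."""
    size = len(adjacency)
    received = sum(adjacency[i][j] for i in range(size) if i != j)
    made_by_others = sum(sum(adjacency[i]) for i in range(size) if i != j)
    if made_by_others == 0:
        return None  # nobody else contributed anything
    return received / made_by_others

# Hypothetical 3 x 3 adjacency matrix (rows = contributors, columns = recipients).
adjacency = [
    [1, 2, 0],
    [1, 0, 2],
    [0, 0, 3],
]

for j in range(len(adjacency)):
    print(j, share_received_from_others(adjacency, j))
```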
2.3.3 Variation across the whole matrix – a whole network view
In the adjacency matrix, showing relationships between contributors and recipients, a simple aggregate measure of variety can be based on a count of the cells with any non-zero values in them. In Social Network Analysis terms this is a particular way of measuring network density. In the MSC pretest there were 23 such cells. This was 88% of the possible maximum, given that there were 26 contributions in total (the sum of all the cells). The whole matrix represents all the possible combinations of types of ideas. One could argue that a higher variety score means participants have been more willing to explore a wider range of ideas. In the Brexit pretest the variety measure was lower, at 66%.
An aggregate measure of the balance of these contributions would look at how evenly spread these contributions were. As above, the standard deviation was calculated for all the non-zero values in the adjacency matrix. In the MSC pretest example the SD was 0.33. In the Brexit pretest, it was 0.75, indicating a much more uneven spread of contributions.
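The whole-matrix versions of the two measures can be sketched as follows. The matrix is hypothetical, and the use of a population SD is my assumption, as the text does not say which SD formula was applied.

```python
from statistics import pstdev

def matrix_variety(matrix):
    """Non-zero cells as a proportion of the sum of all cells."""
    cells = [value for row in matrix for value in row]
    total = sum(cells)
    if total == 0:
        return None
    return sum(1 for value in cells if value > 0) / total

def matrix_balance(matrix):
    """Population SD of the non-zero cells (assumed formula)."""
    nonzero = [value for row in matrix for value in row if value > 0]
    if not nonzero:
        return None
    return pstdev(nonzero)

# Hypothetical 3 x 3 adjacency matrix.
adjacency = [
    [1, 2, 0],
    [1, 0, 2],
    [0, 0, 3],
]

print(matrix_variety(adjacency))   # 5 non-zero cells out of 9 contributions
print(matrix_balance(adjacency))

# The MSC pretest figure quoted above: 23 non-zero cells, 26 contributions.
print(round(23 / 26, 2))
```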
2.3.3.1 The network structure of participation
The same adjacency matrix data can be imported into social network analysis software to generate a visualisation of the network structure of participants’ contributions. Here are three examples.
The first example comes from a 1990s pretest of the ParEvo process. Each red node is a participant, each grey line is a contribution from one participant to another participant’s contribution. Thicker lines mean more contributions. Red lines mean reciprocated contributions.

In this example, there is a visible “clique” of three participants who built on each other’s contributions (shown connected by red links). This can be seen as a form of specialisation. Another type of specialisation can be seen when participants build on their own previous contributions. This is evident in the green diagonal cells in the adjacency matrix. These can be measured as a proportion of all cells in the matrix with values. In the MSC pretest, this proportion was 27%. In the Brexit pretest, it was much higher at 65%.
The second and third examples below come from the Brexit and MSC ParEvo pretests. The contrast in the structures is dramatic, with the MSC network structure having a much higher density (more of the possible links that could exist do exist). High density can be seen as representing an alternative strategy to specialisation i.e. diversification. People are building on a wide range of others’ contributions and a wide range of others are building on their contributions.


2.3.3.2 An Indegree x Outdegree scatterplot as a map of the structure of cooperation
In the adjacency matrix above, most participants can be seen to have: (a) added some or all of their contributions on to those of others, and (b) had some or all of their contributions added on to by others. In social network analysis these outward connections (“a”) are known as outdegree and the inward connections (“b”) are known as indegree. Row and column totals can be calculated to find the total outdegree and indegree values respectively for each participant. These values can then be plotted on a scatterplot, as shown below.

Figure 6
Given the range of possible indegree and outdegree values, four extreme types of participants can be identified, as shown above. None of these is intrinsically of more value than the others (e.g. “isolates” might equally be labeled “individualists”). It is possible that the optimum mix of the four types might be quite context specific. What has been interesting so far is that the distribution of participants across the four quadrants has been noticeably diverse, even among a small set of three exercises. In one ParEvo exercise most were in the “isolates” quadrant. In the other two, most were in the “follower” quadrant, but with one exercise having no connectors whereas the other did. Both had participants in the “leader” quadrant.
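The quadrant idea can be sketched in code, with two loud caveats. First, the mapping of labels to quadrants used here is my assumption (high indegree and high outdegree = connector, high indegree only = leader, high outdegree only = follower, neither = isolate). Second, the cut-off between "high" and "low" is taken as the midpoint of the observed range, which is only one of several plausible choices. The matrix is hypothetical.

```python
def degrees(adjacency):
    """Return (outdegree, indegree) lists from row and column totals."""
    size = len(adjacency)
    out_degree = [sum(row) for row in adjacency]
    in_degree = [sum(adjacency[i][j] for i in range(size)) for j in range(size)]
    return out_degree, in_degree

def midpoint(values):
    """Assumed cut-off: halfway between the smallest and largest value."""
    return (min(values) + max(values)) / 2

def classify(out_value, in_value, out_cut, in_cut):
    """Assign one of the four labels (assumed quadrant mapping)."""
    high_out = out_value > out_cut
    high_in = in_value > in_cut
    if high_out and high_in:
        return "connector"
    if high_in:
        return "leader"
    if high_out:
        return "follower"
    return "isolate"

names = ["A", "B", "C", "D"]
adjacency = [
    [0, 0, 0, 0],   # A builds on nobody
    [2, 0, 1, 0],   # B builds on A twice and C once
    [1, 3, 0, 0],   # C builds on A once and B three times
    [0, 0, 0, 0],   # D builds on nobody
]

out_degree, in_degree = degrees(adjacency)
out_cut, in_cut = midpoint(out_degree), midpoint(in_degree)
types = {
    name: classify(o, i, out_cut, in_cut)
    for name, o, i in zip(names, out_degree, in_degree)
}
print(types)
```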
2.3.3.3 Disparity
Disparity is the third dimension of diversity mentioned above. Disparity is the distance between two types, in terms of differences in their attributes. An ape and a human being are not very disparate, compared to an ape and a frog. One way of conceptualising and measuring disparity in a ParEvo exercise is to use the SNA measure known as “closeness”. Closeness is the average distance, in terms of links, of the shortest paths that connect a given actor in a network with each other actor. In Figure 3, the first of the three network diagrams shown above, C is the most distant, and so could be seen as the most disparate. E is the closest and can be seen as the least disparate. In Figure 5, the bottom diagram, the nodes at either end will have the lowest closeness measure, i.e. be the most disparate. Disparity may be a useful measure of how central or peripheral different participants are in the collective construction of storylines.
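The average shortest-path distance described above can be computed with a breadth-first search. The sketch below treats links as undirected, which is my assumption, and uses a hypothetical five-node chain rather than any of the networks in the figures. Closeness centrality is commonly reported as the reciprocal of this average, so a larger average distance corresponds to a lower closeness score.

```python
from collections import deque

def average_distance(links, start):
    """Mean shortest-path length from start to every node it can reach."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)  # links treated as undirected

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    # Nodes that cannot be reached from start are left out of the average.
    others = [d for node, d in distance.items() if node != start]
    if not others:
        return None
    return sum(others) / len(others)

# Hypothetical chain of five participants: A - B - C - D - E.
chain = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

print(average_distance(chain, "A"))  # an end node: furthest from the rest
print(average_distance(chain, "C"))  # the middle node: nearest to the rest
```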
2.4 Participants’ relationships to storylines
The same threefold perspective can be applied to the affiliation matrix, showing how participants contributed to different storylines:
- Variations across rows
- Variations across columns
- Variations across the whole matrix
2.4.1 Storylines as recipients – Variations across rows
The same variety and balance measures used above can also be applied to the affiliation matrix, showing the relationship between storylines and participants. In the MSC pretest affiliation matrix, the measure of variety of contributions received by different storylines ranged from 25% to 100%, with an average of 80%. Balance of their contributions ranged from an SD of 0.00 to 1.00, with an average of 0.22. An SD of 1.00 occurred where the storyline received 3 out of 4 contributions from one participant. An SD of 0.0 occurred where the storyline received an equal number of contributions from each participant.
Another recipient measure is the proportion of participants contributing to each surviving storyline (relative to the number possible given the number of iterations completed), a type of indegree measure. In the MSC pretest, storyline scores on this measure ranged from 25% to 75%. If wide ownership of storylines is desired then high scores on this measure would be valued.
Predictive models
The kind of data shown in Figure 3 can be analysed using simple machine learning algorithms, to find what combinations of participants best predict whether a storyline will survive or become extinct. Below is a Decision Tree model generated by the EvalC3 Excel app, using data from a recent ParEvo exercise, now anonymised. Although there were 11 participants, the outcome status of the 31 storylines can be predicted with the names of only 3 participants for the surviving storylines and 3 for the extinct storylines. These models are indicative of those participants’ influence on the development of those storylines.

2.4.2 Participants as contributors to storylines – Variations across columns
In the MSC pretest affiliation matrix, the variety measure for individual contributors ranged from 25% to 100% with an average of 74%. Balance of their contributions ranged from an SD of 0.00 to 1.00 with an average of 0.33.
Another contribution measure is the proportion of all of a participant’s contributions that are present in the surviving storylines to date. In the MSC pretest participants’ scores on this measure ranged from 0% to 80%, with an average of 51%. This might be considered as an achievement measure for individual participants if a gamified approach was being considered.
2.4.3 The whole network view
In the MSC pretest affiliation matrix, diversity was lower than in the adjacency matrix. Variety is lower, at 71% of the maximum possible. Balance is also lower, with an SD of 0.8. In the Brexit pretest, the corresponding values were 57% for variety and an SD of 0.98 for balance. These differences are similar to those found in the participants x participants adjacency matrix analysis. They don’t seem to tell us much that is new.
Probably of more interest is the measure of disparity when applied to a set of storylines generated in a particular ParEvo exercise. As explained above, disparity can be measured using the social network analysis metric of ‘closeness’. If we look at the tree structure of the surviving storylines, closeness is the distance between the end of one storyline and the end of another. One way of getting an intuitive idea of what the range of disparity might look like is to construct two tree structures representing two alternative sets of storylines, as shown below.
The first of these shows 10 storylines that have each developed without any branching. The distance between the ends of each of these storylines is long. It is 10 degrees back to the seed, and then 10 degrees out to any other storyline ending.

The second of these shows 10 storylines that have all branched out from one storyline in the most recent iteration. Here the distance between the end of any two storylines is only one degree back and one degree out again. (The grey nodes are extinct storylines that did not get built on.)
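The two extremes can be reproduced with a small sketch that stores each tree as a child-to-parent mapping and counts the steps from two storyline endings back to their nearest shared ancestor. The node labels and the mapping format are hypothetical; they stand in for the two tree structures described here, not for the ParEvo app's own data format.

```python
def path_to_seed(parent, node):
    """List of nodes from the given node back to the seed."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def end_to_end_distance(parent, a, b):
    """Steps back from a to the nearest shared ancestor, plus steps out to b."""
    steps_from_a = {node: steps for steps, node in enumerate(path_to_seed(parent, a))}
    for steps_from_b, node in enumerate(path_to_seed(parent, b)):
        if node in steps_from_a:
            return steps_from_a[node] + steps_from_b
    return None  # the two nodes are not in the same tree

# Ten storylines, ten iterations each, no branching after the seed.
unbranched = {}
for s in range(10):
    unbranched[(s, 1)] = "seed"
    for step in range(2, 11):
        unbranched[(s, step)] = (s, step - 1)

# One storyline for nine iterations, then ten endings in the tenth.
late_branching = {("trunk", 1): "seed"}
for step in range(2, 10):
    late_branching[("trunk", step)] = ("trunk", step - 1)
for s in range(10):
    late_branching[("end", s)] = ("trunk", 9)

print(end_to_end_distance(unbranched, (0, 10), (1, 10)))
print(end_to_end_distance(late_branching, ("end", 0), ("end", 1)))
```

Averaging this distance over all pairs of surviving storyline endings would give a single disparity figure for a whole exercise.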

Now for the sake of comparison, here are three examples generated by three ParEvo exercises. The first two were pretests prior to the development of the ParEvo app. The third was generated by an early use of the ParEvo app.



A crude measure of likely disparity of storylines is the number of initial (iteration 1) contributions that still belong to surviving storylines. Figure 9 has 45%, Figure 10 has 27% and Figure 11 has 45% – so Figure 10 storylines are marginally less disparate.
2.4.3 Exploration and Exploitation
At its simplest, exploration is the process of searching out and testing out of multiple alternatives. In contrast, exploitation involves focusing in on one option, to extract its full potential.
The distinction, and tension, between exploration and exploitation strategies has been around for a long time but is perhaps most strongly associated with a paper of that name by James March, published in 1991. Here is a recent review of the impact of that 1991 paper, showing just how wide its influence has been.
It seems possible that the prevalence of these contrasting strategies could be identified at two levels: Within individual storylines and within the whole set of storylines in an exercise.
2.4.3.1 Exploration within storylines
The number of side-branching storylines produced by a storyline could be significant. A higher proportion means there was a wider exploration of alternatives in the course of a given storyline’s development. In the MSC pretest one storyline had 3 side branches developed over four iterations (See branch 1.2.3.1.1. in Figure 9). In the Brexit pretest, 5 storylines had 2 side branches developed over four iterations. In an exercise with four iterations and 10 participants the maximum possible number of side branches for a given storyline would be, I think, 27 i.e. 9 per iteration, excluding the final iteration. Figure 8 is an example.
2.4.3.2 Exploration across all storylines
The proportion of extinct versus surviving storylines as a whole is another potentially useful measure. A higher proportion means there was a wider exploration of alternatives. If all participants contributed to their own storylines only, there would be no dead storylines at all in any generation (see Figure 7 above). On the other hand, if all participants contributed to the same storyline in each new iteration, there would be the highest possible proportion of dead storylines per iteration. The maximum proportion of extinct storylines = ((N - 1) * (I - 1)) / (N * I), where N is the number of participants and I is the number of iterations. This gives 81% for Figure 8 above. In the MSC pretest, 64% of all storylines became extinct. In the Brexit pretest, 47% became extinct, so there was less diversity there in the form of exploration of alternatives.
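As a quick check on the arithmetic, the maximum proportion can be computed directly from the formula (participants - 1) x (iterations - 1) divided by participants x iterations. The function name is my own, and the values of ten participants and ten iterations are an assumption based on the description of the Figure 8 tree; they do reproduce the 81% quoted.

```python
def max_extinct_proportion(participants, iterations):
    """Upper bound on the share of storylines that can become extinct."""
    extinct = (participants - 1) * (iterations - 1)
    total = participants * iterations
    return extinct / total

# Ten participants and ten iterations (assumed values for Figure 8).
print(f"{max_extinct_proportion(10, 10):.0%}")

# The same bound for a four-iteration pretest with ten participants.
print(f"{max_extinct_proportion(10, 4):.1%}")
```

Observed proportions, such as the 64% and 47% reported for the pretests, can then be read against this ceiling rather than against 100%.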
2.4.3.3 Exploration then exploitation?
The strategies that participants collectively use may change over time. Initially, it might be expected that exploration would prevail, then later on exploitation would become more dominant, as certain original storylines became the main focus of interest. In other words, the tree structure might start by looking like Figure 7 but then change towards one looking more like Figure 8.
Looking at Figures 9, 10 and 11, the proportion of original storylines that remained of interest in the last iteration was around 50-60%. It would be interesting to know more about how this balance changes (if at all) with a greater number of iterations.
In organic evolution, environmental demands may lead to only a few lineages surviving. But in small populations, there is an alternate interpretation for such an outcome, which may also apply with ParEvo storylines. This is known as “genetic drift”. In small populations, an accumulation of random choices can lead to some genes (read here storylines) becoming dominant. So in the ParEvo exercise context, a shared approach by participants would not necessarily be needed to generate an emerging dominance by a few original storylines.
3. Participation in the comment making stage of each ParEvo iteration
During each iteration, participants can be enabled (by the facilitator) to make a single comment on one or more of the contributions made to the storylines during that iteration. Facilitators can now download data showing which participants commented on which contributions.
4. Participating in the evaluation stage of the ParEvo exercise
After the last iteration of a ParEvo exercise it is useful to involve participants in some form of evaluation of the exercise results. The simplest way of doing so is to use the built-in evaluation mechanism. In its default form, participants are asked to identify individual storylines that are: (a) most likely to happen, (b) least likely to happen, (c) most desirable to happen, (d) least desirable to happen. After this step is completed, the aggregated results are then visible to participants in the same panel they used to record their judgements.
In addition, their responses are available to the facilitator as a downloaded data set. This is in the form of a series of matrices, one per respondent, where rows = storylines, columns = evaluation criteria, and cell values of 1 or 0 indicate whether or not a given criterion was chosen for a given storyline. Using UCINET social network analysis software it is possible to identify the degree to which the contents of the respondents’ matrices are correlated with each other. The resulting correlation matrix can then be visually represented as a network structure.
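For those without UCINET, the basic idea can be sketched as a plain Pearson correlation between the corresponding cells of two respondents' matrices. This is an illustration under that assumption, not a description of UCINET's own procedure, and the three response matrices are hypothetical.

```python
from math import sqrt

def flatten(matrix):
    """Turn a storylines x criteria matrix into one list of cells."""
    return [cell for row in matrix for cell in row]

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        return None  # undefined when one respondent's cells are all equal
    return covariance / sqrt(spread_x * spread_y)

# Hypothetical responses: rows = 3 storylines, columns = 4 criteria
# (most likely, least likely, most desirable, least desirable).
respondent_1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
respondent_2 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
respondent_3 = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]]

print(pearson(flatten(respondent_1), flatten(respondent_2)))
print(pearson(flatten(respondent_1), flatten(respondent_3)))
```

Repeating this for every pair of respondents gives the correlation matrix that can then be drawn as a network.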
The example below is based on data generated from a ParEvo pre-test exercise (anonymised). Each link represents a positive correlation between two respondents’ judgments. Thicker lines = a higher correlation. The three participants at the left had a larger number of correlated judgments and the most highly correlated judgments. Those on the right had fewer correlated judgments and these were of lower value.

Figure 12
One observation from this analysis was that similarities between respondents in evaluation judgments do not simply match similarities noticed during storyline construction, i.e. as in which participants added to which other participants’ contributions. This suggests participants’ judgments are changing over time.
5. Performance analysis
THIS SECTION IS CURRENTLY UNDER REVISION
An analysis of the factors which contribute to good performance will obviously depend on how good performance is defined.
At this point in time (2021 02 07) the optimal outcome of a ParEvo exercise would be a maximally diverse set of storylines. The rationale is that the participants are interested in identifying possible futures that might come into being and which they might need to be ready for, and this is by definition a difficult task. So, a ParEvo exercise is a little like using a fishing net, a process used to find a sufficient variety of possible futures to think about.
There are at least two ways of measuring the diversity of a set of storylines generated by a ParEvo exercise. The first would be just to look at the tree structure that has been generated and to measure the three facets of diversity (variety, balance, and disparity) that are present in the tree structure. So, a maximally diverse set of storylines defined in these terms might look like the tree structure in Figure 7. There are ten different types of storylines, there is one instance of each of these, and the distance between each of these is the maximum possible. We could go further in detail and look at the extent to which each of these storylines has been constructed by the participants, i.e. are each of them constructed by a single participant or each of them by all of the participants. I’m not sure which of these options would signify the greatest diversity…
The second way of measuring diversity in a set of storylines would be to pay more attention to the content of the storylines, as distinct from simply the network structure of how their contributions are connected. In the section of this website concerned with analysis of storylines, Figure 2 is a scatterplot showing how the surviving storylines are distributed on two axes: likelihood and desirability. Within this framework a diversity of storylines would be visible in a distribution of storylines that ranged across all possible values of likelihood and desirability. In some exercises an eyeball examination will be sufficient to identify where there are gaps, i.e. a whole quadrant in a scatterplot with no storylines present. In other exercises more quantitative measurement might be needed. For example, not only the number of quadrants containing storylines (variety) but also the number of storylines in each quadrant (balance).
There are probably also other ways of measuring diversity in a set of storylines. There may be other axes that storylines could be usefully plotted against e.g. sustainability x equity. Or one might be looking at the diversity of types of actors present in the storylines, as made evident through some form of content analysis.
In the section below I will try to list some ideas relating to how performance can be improved, given some adequate prior definition of performance.
1. Market favourites
One idea that has attracted me for a long time, but which has also worried me, is the possibility of publicising during a ParEvo exercise the extent to which each participant has had their contributions added to by others. I know from anecdotal reports that some participants are pleased to see when this happens to their own contributions, and disappointed when it doesn’t. But I’m also fairly certain that participants vary in the extent to which this matters to them at all. The question which remains of interest to me here is what would happen if an exercise facilitator did publicise, on an ongoing basis during an exercise, the extent to which each participant’s contributions were added to by others. At present each participant can identify how well their own contributions are doing in these terms, but they have no means of comparing their own performance with that of other participants.
On the surface this might incentivise creativity, in the form of imaginative alternatives that others want to build on, i.e. more diversity. But might there also be counter-productive aggregated effects? One possibility is that participants would try to build on the contributions of the person whose contributions have previously been the most popular. Figure 8 could be the result. This is a form of ‘exploitation’ orientated behaviour. It could lead to premature convergence of ideas, in situations where a greater diversity was possible or even really needed.
2. Courageous minorities
In a paper titled Optimal incentives for collective intelligence, Mann and Helbing (2017) are concerned about the consequences of a loss of diversity arising from the use of popularity measures, and explore ways of enhancing it in order to maximise potential collective intelligence. In particular, they ask “…how can minority viewpoints be fostered in the first place to enhance diversity and its potential benefits for collective intelligence?”. Their conclusion: “…we suggest that individuals should not be rewarded simply for having made successful predictions or findings and also that a total reward should not be equally distributed among those who have been successful or accurate. Instead, rewards should be primarily directed toward those who have made successful predictions in the face of majority opposition from their peers.”
The challenge here is how to operationalise this approach within a ParEvo exercise. It assumes that amongst the set of surviving storylines there are some best/most accurate storylines. One possible application would be to focus on how storylines are rated in terms of likelihood of occurring in real life, by participants during the evaluation stage of a ParEvo exercise. Storylines do vary in their rated likelihood.
Aggregate scores could be generated for each participant, which would be the sum of their scores for each storyline they had contributed to. Their score for each such storyline = the net number of people judging that story as likely to happen / the percentage of people who contributed to that storyline. The smaller the latter number, the higher a participant’s score would be.
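The scoring rule can be sketched as follows. Two points are my assumptions rather than anything stated above: "net number" is taken to mean the count of "most likely" votes minus the count of "least likely" votes, and the percentage of contributors is expressed on a 0 to 100 scale. The storylines, names and vote counts are hypothetical.

```python
def storyline_score(most_likely, least_likely, contributors, participants):
    """Net 'likely' votes divided by the percentage of people who contributed."""
    net_votes = most_likely - least_likely            # assumed meaning of "net"
    contributor_pct = 100 * contributors / participants
    return net_votes / contributor_pct

def participant_score(storylines, name, participants):
    """Sum of storyline scores over the storylines this person contributed to."""
    return sum(
        storyline_score(
            s["most_likely"], s["least_likely"], len(s["contributors"]), participants
        )
        for s in storylines
        if name in s["contributors"]
    )

# Hypothetical evaluation results from an exercise with ten participants.
storylines = [
    {"most_likely": 5, "least_likely": 1, "contributors": {"Ann"}},
    {"most_likely": 3, "least_likely": 1, "contributors": {"Ann", "Bob", "Cat", "Dan"}},
]

print(participant_score(storylines, "Ann", participants=10))
print(participant_score(storylines, "Bob", participants=10))
```

In this toy example the storyline built by one person alone earns its contributor far more than the widely shared one, which is the intended bias towards successful minority positions.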
I have tested this method on one recently completed ParEvo exercise that had ten participants and seven iterations. The storyline rated as most probable was built from the contributions of just one participant. In contrast, the two storylines with the widest range of participants had probability ratings which were more in the middle to upper-middle of the range. Not the worst, but also not the best. Two of the participants, who performed well in terms of the number of other participants who added to their contributions, performed poorly in terms of their predictions of the most probable storylines.
But this was only one ParEvo exercise. It will be interesting to see what happens as further ParEvo exercises are carried out.