Why do so?
- To identify ways in which the contents of storylines can be improved
- To identify how objectives relating to people’s participation can be better achieved
People participate in a ParEvo exercise in the following ways:
- By registering as a participant
- Making contributions in each iteration of a ParEvo exercise
- Making comments in each iteration of the ParEvo exercise
- Participating in the evaluation stage of the ParEvo exercise
We can analyse the nature of people’s participation at each of these stages in a ParEvo exercise.
It is also possible to analyse the relationship between these different kinds of events, in particular the relationship between (2) and (4): how people made contributions to the different storylines and how these relate to the subsequent judgements of those storylines by participants. See the last section.
Experience so far suggests that more people are likely to register than actually participate in a ParEvo exercise, and that as the exercise proceeds some people will drop out at various points. This tree diagram from a recent exercise shows the attrition rate as the exercise proceeded. Each row represents one iteration, with the first iteration at the top and the last at the bottom. Each node represents one participant’s contribution within an iteration.
2. Contributions made within each iteration
When a participant makes a new contribution to existing storylines, they make two types of connections: (a) they connect to another participant – the one whose contribution they are immediately adding to, and in doing so (b) they connect to a specific storyline – a string of contributions that have been built on, one after the other.
In a ParEvo exercise, data on these connections is accumulated in the form of two downloadable matrices, known in social network analysis (SNA) jargon as (a) an adjacency matrix, and (b) an affiliation matrix, respectively. An example of each is shown below. In the adjacency matrix, the cell values are the numbers of times the row actor has added a contribution to an existing contribution made by the column actor. In the affiliation matrix, the cells’ values indicate the number of times each column participant has contributed to a row storyline. These two examples are based on a pretest exercise, which only ran for four iterations. In exercises with more iterations, the cell values would be higher.
The contents of this section reflect my current preoccupations. There are probably many other ways of analysing the above data. If you do explore these, please let me know.
I have analysed the data in each of these matrices from three perspectives:
- Variations across rows
- Variations across columns
- Variations across the whole matrix
In all of these analyses, we can look at what is happening in terms of diversity. Why examine diversity?
- Variation is intrinsic to an evolutionary process
- Diversity is indicative of a degree of agency (/the absence of a common constraint)
- Lots of research has been done on diversity & group performance
- Simple but sophisticated measures available, already used in other fields:
- Social Network Analysis
There is a big literature on the measurement of diversity. Here I make use of Stirling’s (1998) influential paper. He suggested that diversity can be measured on three dimensions: variety, balance, and disparity. In the immediate discussion below the focus will be mainly on variety and balance.
Participants as contributors – Variations across rows
Individual participants varied in the way they contributed to others’ existing contributions. Variety in this context refers to the range of other participants they contributed to. Variety = count of row cell values > 0 / sum of row cell values. A score of 100% means a participant built on every other participant’s contribution on at least one occasion. In the MSC pretest, participants’ variety ranged from 67% to 100%, with an average of 91%, whereas in the Brexit pretest variety ranged from 50% to 100%, with an average of 70%. Variety was greater in the MSC pretest.
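The variety formula can be sketched in a few lines of Python. This is only an illustration: the matrix below is made up, and whether the diagonal (a participant building on their own contribution) should be included in the row is left as an assumption, since the row is used exactly as given.

```python
def row_variety(row):
    """Variety = count of cells > 0 divided by the sum of the cells."""
    total = sum(row)
    if total == 0:
        return None  # no contributions made, so variety is undefined
    return sum(1 for value in row if value > 0) / total


# Hypothetical adjacency matrix: rows = contributors, columns = recipients.
adjacency = [
    [0, 1, 1, 1],  # built once on each of three others
    [2, 0, 1, 0],  # built twice on one participant, once on another
    [0, 0, 0, 4],  # built four times on the same participant
]
varieties = [row_variety(row) for row in adjacency]
```

The same function can be applied to columns (participants as recipients) by transposing the matrix first, for example with `zip(*adjacency)`.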
Balance in this context refers to the extent to which their contributions were evenly spread across those they had contributed to, or not. With datasets like the matrices above, balance can be measured by calculating the standard deviation (SD) of the values in a row. If all participants have received the same number of contributions from a row participant, the standard deviation will be zero. Alternatively, if the number of contributions they received varies widely, the standard deviation will be high. In the MSC pretest, the SD of values ranged from 0.0 to 0.5, with an average of 0.15, whereas in the Brexit pretest SD values ranged from 0.0 to 1.0, with an average of 0.39. Balance was greater in the MSC pretest.
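A minimal sketch of the balance measure follows. Two choices here are assumptions rather than something stated in the text: the population SD is used (not the sample SD), and by default only non-zero cells are included, following the whole-matrix analysis further down, which is calculated on the non-zero values. The flag lets you include zeros instead.

```python
from statistics import pstdev


def row_balance(row, nonzero_only=True):
    """Population SD of a row's values; 0.0 means a perfectly even spread."""
    values = [v for v in row if v > 0] if nonzero_only else list(row)
    if not values:
        return None  # nothing to measure
    return pstdev(values)


even_spread = row_balance([1, 1, 1, 0])    # same count for each recipient
uneven_spread = row_balance([3, 1, 0, 0])  # three to one recipient, one to another
```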
These two measures can be combined into a single measure known as Simpson’s Diversity Index. There is a useful online calculator here: https://www.alyoung.com/labs/biodiversity_calculator.html This is a more sophisticated measure, suitable when there is a larger and more varied number of values in the matrix.
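For readers who prefer to calculate the index themselves, here is a hedged sketch. Several variants of Simpson's index exist; this one computes 1 − Σ n(n−1) / (N(N−1)), where n is each cell count and N is their total. Which variant best suits ParEvo data is an assumption on my part, and the result may differ from the variant reported by any particular online calculator.

```python
def simpson_diversity(counts):
    """Simpson's index of diversity: 1 - sum(n*(n-1)) / (N*(N-1))."""
    counts = [c for c in counts if c > 0]
    n_total = sum(counts)
    if n_total < 2:
        return None  # undefined for fewer than two contributions
    dominance = sum(c * (c - 1) for c in counts) / (n_total * (n_total - 1))
    return 1 - dominance


spread_evenly = simpson_diversity([1, 1, 1, 1])  # one contribution to each of four
all_to_one = simpson_diversity([4, 0, 0, 0])     # everything to one participant
```

Higher values indicate that contributions are both more widely and more evenly spread.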
A much simpler measure, which does not make these distinctions between variety and balance, is the proportion of a participant’s contributions which built on others’ contributions (and not their own). This is probably the most suitable for feedback to participants and one which might, if publicised, encourage such behaviour. In the MSC pretest, this percentage ranged from 33% to 100%. In the Brexit pretest, it ranged from 0% to 100%. Averaged over all participants, 73% of the MSC pretest’s contributions built on others’ contributions, whereas in the Brexit pretest the proportion was much lower at 33%.
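This simpler measure could be computed as below. The sketch assumes a square adjacency matrix in which rows and columns list participants in the same order, so that the diagonal cell holds the number of times a participant built on their own contributions. The example matrix is invented.

```python
def share_built_on_others(matrix, i):
    """Proportion of participant i's contributions that built on someone else's."""
    row = matrix[i]
    total = sum(row)
    if total == 0:
        return None  # participant i made no contributions
    return (total - row[i]) / total


# Hypothetical 3 x 3 adjacency matrix (rows = contributors, columns = recipients).
example = [
    [1, 2, 1],
    [0, 3, 0],
    [1, 1, 0],
]
shares = [share_built_on_others(example, i) for i in range(len(example))]
```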
Participants as recipients – Variations across columns
Individual participants can also vary in the way others contribute to their existing contributions. Variety in this context refers to the range of other participants they received contributions from. Variety here = count of column cell values > 0 / sum of column cell values. A score of 100% means every other participant built on this participant’s contribution on at least one occasion. In the MSC pretest, this variety ranged from 67% to 100%, with an average of 90%, whereas in the Brexit pretest this variety ranged from 33% to 100%, with an average of 70%. Variety was greater in the MSC pretest.
Balance in this context refers to the extent to which the contributions of others were evenly received. In the MSC pretest, SD values ranged from 0.0 to 0.5, with an average of 0.15, whereas in the Brexit pretest SD values ranged from 0.0 to 0.5 with an average of 0.18. The difference in the balance of received contributions was very small.
Another measure that does not make these distinctions is the proportion of all contributions by all others which were received by a participant. As above, this is probably the most suitable for feedback to participants and one which might act as a motivator. In the MSC pretest, the proportion ranged from 8% to 15%. In the Brexit pretest, it ranged from 3% to 17%. These ranges might be expected to grow as the number of iterations increases. In the pretests, there were only four iterations each.
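One possible reading of this measure is sketched below: the contributions a participant received from others (their column, excluding the diagonal) divided by all contributions made by the other participants (the sum of the other rows). That reading, and the example matrix, are assumptions for illustration only.

```python
def share_received(matrix, j):
    """Share of all contributions made by others that built on participant j."""
    n = len(matrix)
    received = sum(matrix[i][j] for i in range(n) if i != j)
    made_by_others = sum(sum(matrix[i]) for i in range(n) if i != j)
    if made_by_others == 0:
        return None  # nobody else contributed anything
    return received / made_by_others


# Hypothetical 3 x 3 adjacency matrix (rows = contributors, columns = recipients).
example_matrix = [
    [1, 2, 1],
    [0, 3, 0],
    [1, 1, 0],
]
received_shares = [share_received(example_matrix, j) for j in range(3)]
```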
Variation across the whole matrix
In the adjacency matrix, showing relationships between contributors and recipients a simple aggregate measure of variety can be based on a count of the cells with any non-zero values in them. In the MSC pretest there were 23. This was 88% of the possible maximum, given that there were 26 contributions in total (the sum of all the cells). The whole matrix represents all the possible combinations of types of ideas. One could argue that a higher variety score means participants have been more willing to explore a wider range of ideas. In the Brexit pretest the variety measure was lower, at 66%.
A measure of the balance of these contributions would look at how evenly spread these contributions were. As above, the standard deviation was calculated for all the non-zero values in the adjacency matrix. In the MSC pretest example, the SD was 0.33. In the Brexit pretest, it was 0.75, indicating a much more uneven spread of contributions.
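Both whole-matrix measures can be sketched together. Variety is the count of non-zero cells divided by the sum of all cells (23 / 26 ≈ 88% in the MSC pretest), and balance is the SD of the non-zero cells. Using the population SD is an assumption, and the small matrix is invented.

```python
from statistics import pstdev


def matrix_variety_and_balance(matrix):
    """Return (variety, balance) calculated over the non-zero cells of a matrix."""
    cells = [value for row in matrix for value in row if value > 0]
    if not cells:
        return None, None  # empty matrix: nothing to measure
    variety = len(cells) / sum(cells)
    balance = pstdev(cells)
    return variety, balance


# Hypothetical adjacency matrix with 6 non-zero cells and 7 contributions.
variety, balance = matrix_variety_and_balance([
    [1, 1, 0],
    [2, 0, 1],
    [0, 1, 1],
])
msc_variety = round(23 / 26, 2)  # the MSC pretest figure quoted above
```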
The network structure of participation
The same adjacency matrix data can be imported into social network analysis software to generate a visualisation of the network structure of participants’ contributions. Here are three examples.
The first example comes from a 1990s pretest of the ParEvo process. Each red node is a participant, each grey line is a contribution from one participant to another participant’s contribution. Thicker green lines mean more contributions. Red lines mean reciprocated contributions.
In this example, there is a visible “clique” of three participants who built on each other’s contributions (shown connected by red links). This can be seen as a form of specialisation. Another type of specialisation can be seen when participants build on their own previous contributions. This is evident in the green diagonal cells in the adjacency matrix. These can be measured as a proportion of all cells in the matrix with values. In the MSC pretest, this proportion was 27%. In the Brexit pretest, it was much higher at 65%.
The second and third examples below come from the Brexit and MSC ParEvo pretests. The contrast in the structures is dramatic, with the MSC network structure having a much higher density (more of the possible links that could exist do exist). High density can be seen as representing an alternative strategy to specialisation i.e. diversification. People are building on a wide range of others’ contributions and a wide range of others are building on their contributions.
Disparity is the third dimension of diversity mentioned above. Disparity is the distance between two types, in terms of differences in their attributes. An ape and a human being are not very disparate, compared to an ape and a frog. One way of conceptualising and measuring disparity in a ParEvo exercise is to use the SNA measure known as “closeness”. Closeness is the average distance, in terms of links, of the shortest paths that connect a given actor in a network with each other actor. In the first of the three network diagrams shown above, C is the most distant, and so could be seen as the most disparate. E is the closest and could be seen as the least disparate. In the bottom diagram, the nodes at either end will have the lowest closeness measure, i.e. be the most disparate. Disparity may be a useful measure of how central or peripheral different participants are in the collective construction of storylines.
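The average shortest-path distance described above can be calculated with a breadth-first search. The sketch below treats links as undirected, which is an assumption, and uses an invented five-node chain rather than any of the pretest networks. Note that SNA packages usually report closeness centrality as the reciprocal of this average distance, so a long average distance corresponds to a low closeness score.

```python
from collections import deque


def average_distance(graph, start):
    """Mean shortest-path length (in links) from start to every reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    others = [d for node, d in distance.items() if node != start]
    return sum(others) / len(others) if others else None


# Hypothetical chain of participants: A - B - C - D - E
chain = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
end_node = average_distance(chain, "A")     # peripheral, most disparate
middle_node = average_distance(chain, "C")  # central, least disparate
```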
Measuring the diversity of storylines
The same threefold perspective can be applied to the affiliation matrix, showing how participants contributed to different storylines:
- Variations across rows
- Variations across columns
- Variations across the whole matrix
Storylines as recipients – Variations across rows
The same variety and balance measures used above can also be applied to the affiliation matrix, showing the relationship between storylines and participants. In the MSC pretest affiliation matrix, the measure of variety of contributions received by different storylines ranged from 25% to 100% with an average of 80%. Balance of their contributions ranged from an SD of 0.00 to 1.00 with an average of 0.22. An SD of 1.00 occurred where the storyline received 3 out of 4 contributions from one participant. An SD of 0.0 occurred where the storyline received an equal number of contributions from each participant.
Another recipient measure is the proportion of participants contributing to each surviving storyline (relative to the number possible given the number of iterations completed). In the MSC pretest, storyline scores on this measure ranged from 25% to 75%. If wide ownership of storylines is desired then high scores on this measure would be valued.
Participants as contributors to storylines – Variations across columns
In the MSC pretest affiliation matrix, the variety measure for individual contributors ranged from 25% to 100% with an average of 74%. Balance of their contributions ranged from an SD of 0.00 to 1.00 with an average of 0.33.
Another contribution measure is the proportion of all of a participant’s contributions that are present in the surviving storylines to date. In the MSC pretest, participants’ scores on this measure ranged from 0% to 80%, with an average of 51%. This might be considered as an achievement measure for individual participants if a gamified approach was being considered.
The whole matrix view
In the MSC pretest affiliation matrix, diversity was lower than in the adjacency matrix. Variety is lower, at 71% of the maximum possible. Balance is also lower, with an SD of 0.8. In the Brexit pretest, the corresponding values were 57% for variety and, for balance, an SD of 0.98. These differences are similar to those found in the participants x participants adjacency matrix analysis. They don’t seem to tell us much that is new.
Probably of more interest is the measure of disparity when applied to a set of storylines generated in a particular ParEvo exercise. As explained above, disparity can be measured using the social network analysis metric of ‘closeness’. If we look at the tree structure of the surviving storylines, closeness is the distance between the end of one storyline and the end of another. One way of getting an intuitive idea of what the range of disparity might look like is to construct two tree structures representing two alternative sets of storylines, as shown below.
The first of these shows 10 storylines that have each developed without any branching. The distance between the ends of each of these storylines is long. It is 10 degrees back to the seed, and then 10 degrees out to any other storyline ending.
The second of these shows 10 storylines that have all branched out from one storyline in the most recent iteration. Here the distance between the ends of any two storylines is only one degree back and one degree out again. (The grey nodes are extinct storylines that did not get built on.)
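The two extremes can be reproduced with a small sketch that counts the links between two storyline endings via their nearest shared ancestor. The tree representation (a dictionary mapping each contribution to its parent) and all node names are hypothetical, not the ParEvo export format.

```python
def path_to_root(parent, node):
    """List of nodes from a contribution back to the seed."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def tip_distance(parent, a, b):
    """Number of links between two storyline endings in the tree."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(parent, a))}
    for steps_from_b, n in enumerate(path_to_root(parent, b)):
        if n in steps_from_a:  # first shared ancestor
            return steps_from_a[n] + steps_from_b
    raise ValueError("contributions do not share a seed")


# Case 1: 10 storylines, each 10 contributions long, no branching.
unbranched = {"seed": None}
for s in range(10):
    for i in range(1, 11):
        unbranched[(s, i)] = "seed" if i == 1 else (s, i - 1)

# Case 2: one storyline of 9 contributions, then 10 branches in the last iteration.
branched = {"seed": None}
for i in range(1, 10):
    branched[("trunk", i)] = "seed" if i == 1 else ("trunk", i - 1)
for k in range(10):
    branched[("tip", k)] = ("trunk", 9)

far_apart = tip_distance(unbranched, (0, 10), (1, 10))
close_together = tip_distance(branched, ("tip", 0), ("tip", 1))
```

Averaging `tip_distance` over all pairs of surviving storyline endings would give a single disparity figure for an exercise.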
Now for the sake of comparison, here are three examples generated by three ParEvo exercises. The first two were pretests prior to the development of the ParEvo app. The third was generated by an early use of the ParEvo app.
Exploration and Exploitation
At its simplest, exploration is the process of searching out and testing multiple alternatives. In contrast, exploitation involves focusing in on one option, to extract its full potential.
The distinction, and tension, between exploration and exploitation strategies has been around for a long time but is perhaps most strongly associated with a paper of that name by James March, published in 1991. Here is a recent review of the impact of that 1991 paper, showing just how wide its influence has been.
It seems possible that the prevalence of these contrasting strategies could be identified at two levels: Within individual storylines and within the whole set of storylines in an exercise.
Exploration within storylines
The number of side-branching storylines produced by a storyline could be significant. A higher proportion means there was a wider exploration of alternatives in the course of a given storyline’s development. In the MSC pretest one storyline had 3 side branches developed over four iterations (See branch 18.104.22.168.1. in Figure 7). In the Brexit pretest, 5 storylines had 2 side branches developed over four iterations. In an exercise with four iterations and 10 participants the maximum possible number of side branches for a given storyline would be, I think, 27 i.e. 9 per iteration, excluding the final iteration. Figure 8 is an example.
Exploration across all storylines
The proportion of extinct versus surviving storylines as a whole is another potentially useful measure. A higher proportion means there was a wider exploration of alternatives. If all participants contributed to their own storylines only, there would be no dead storylines at all per generation (See Figure 7 above). On the other hand, if all participants contributed to the same storyline in each new iteration, there would be the highest possible proportion of dead storylines per iteration (=((N-1)*(N-1))/N= 81% – See Figure 8 above). In the MSC pretest, 64% of all storylines became extinct. In the Brexit pretest, 47% became extinct. There was less diversity in the Brexit pretest, in the form of exploration of alternatives.
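A hedged sketch of how such a proportion might be counted from a record of contributions follows. The counting convention is my assumption and may not match how the pretest percentages above were calculated: a storyline is treated as extinct if its last contribution was made before the final iteration and nobody built on it, and as surviving if it ends in the final iteration. The data structure and identifiers are invented.

```python
def extinct_share(contributions, last_iteration):
    """Share of storyline endings that died before the final iteration.

    contributions maps a contribution id to (parent id or None, iteration number).
    """
    built_on = {parent for parent, _ in contributions.values() if parent is not None}
    endings = [cid for cid in contributions if cid not in built_on]
    extinct = [cid for cid in endings if contributions[cid][1] < last_iteration]
    return len(extinct) / len(endings)


# Hypothetical two-iteration exercise with three participants.
records = {
    "a": (None, 1), "b": (None, 1), "c": (None, 1),  # first iteration
    "d": ("a", 2), "e": ("a", 2), "f": ("b", 2),     # second iteration
}
share = extinct_share(records, last_iteration=2)  # only "c" was never built on
```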
Exploration then exploitation?
The strategies that participants collectively use may change over time. Initially, it might be expected that exploration would prevail, then later on exploitation would become more dominant, as certain original storylines became the main focus of interest. In other words, the tree structure might start by looking like Figure 7 but then change towards one looking more like Figure 1.
Looking at Figures 9, 10 and 11, the proportion of original storylines that remained of interest in the last iteration was around 50-60%. It would be interesting to know more about how this balance changes (if at all) with a greater number of iterations.
In organic evolution, environmental demands may lead to only a few lineages surviving. But in small populations, there is an alternate interpretation for such an outcome, which may also apply with ParEvo storylines. This is known as “genetic drift”. In small populations, an accumulation of random choices can lead to some genes (read here storylines) becoming dominant. So in the ParEvo exercise context, a shared approach by participants would not necessarily be needed to generate an emerging dominance by a few original storylines.
3. Participation in the comment making stage of each ParEvo iteration
During each iteration, participants are allowed to make a single comment on one or more of the contributions made to the storylines during that iteration. In a forthcoming version of ParEvo, data will be available showing which participants commented on which contributions.
4. Participating in the evaluation stage of the ParEvo exercise
After the last iteration of a ParEvo exercise, it is useful to involve participants in some form of evaluation of the exercise results. The simplest way of doing so is to use the built-in evaluation mechanism. In its default form, participants are asked to identify individual storylines that are: (a) most likely to happen, (b) least likely to happen, (c) most desirable to happen, (d) least desirable to happen. After this step is completed, the aggregated results are then visible to participants in the same panel they used to record their judgements.
In addition, their responses are available to the facilitator as a downloaded data set. This is in the form of a series of matrices, one per respondent, where rows = storylines, columns = evaluation criteria, and cell values of 1 or 0 indicate whether or not a given criterion was chosen for a given storyline. Using UCINET social network analysis software, it is possible to identify the degree to which the contents of the respondents’ matrices are correlated with each other. The resulting correlation matrix can then be visually represented as a network structure.
The example below is based on data generated from a ParEvo pre-test exercise (anonymised). Each link represents a positive correlation between two respondents’ judgments. Thicker lines = a higher correlation. The three participants at the left had a larger number of correlated judgments and the most highly correlated judgments. Those on the right had fewer correlated judgments and these were of lower value.
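For those without UCINET, the idea can be approximated in plain Python. The sketch below flattens each respondent's storylines x criteria matrix and computes a Pearson correlation for each pair of respondents. It is not UCINET's own routine, and the respondent names and matrices are invented.

```python
from itertools import combinations
from math import sqrt


def flatten(matrix):
    """Turn a storylines x criteria matrix into one list of 0/1 values."""
    return [value for row in matrix for value in row]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Hypothetical responses: 3 storylines (rows) x 4 criteria (columns).
responses = {
    "ann": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]],
    "ben": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]],
    "cat": [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]],
}
correlations = {
    (a, b): pearson(flatten(responses[a]), flatten(responses[b]))
    for a, b in combinations(responses, 2)
}
```

Pairs with a positive correlation could then be drawn as links in a network diagram, with line thickness proportional to the correlation.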
One observation from this analysis was that similarities between respondents in evaluation judgments do not simply match similarities noticed during storyline construction, i.e. as in which participants added to which other participants’ contributions. This suggests participants’ judgments are changing over time.
5. Performance analysis
How do we measure the contribution of individuals to surviving storylines in a way that contributes to, rather than undermines, the potential value of those storylines? For example, individuals could be scored according to the number of contributions their contributions attract from other participants. But would this sort of incentive – the desire to score highly in these terms – undermine the usefulness of the ParEvo process, for example by leading to a much less diverse set of storylines, and possibly a premature convergence on a narrow set of possibilities?
This research suggests that this kind of outcome could be what happens: Mann, R. P., & Helbing, D. (2017). Optimal incentives for collective intelligence. Proceedings of the National Academy of Sciences, 114(20), 5077–5082. In this paper, the measure I have proposed above is what the authors describe as a market-based incentive. Participants are rewarded for their ability to produce contributions that others value. And these rewards are nearly in real time, at the end of each iteration.
Mann and Helbing are concerned about the consequences of a loss of diversity and explore ways of enhancing it, in order to maximise potential collective intelligence. In particular, they ask “…how can minority viewpoints be fostered in the first place to enhance diversity and its potential benefits for collective intelligence?”. Their conclusion: “…we suggest that individuals should not be rewarded simply for having made successful predictions or findings and also that a total reward should not be equally distributed among those who have been successful or accurate. Instead, rewards should be primarily directed toward those who have made successful predictions in the face of majority opposition from their peers.”
How could this idea be applied to a ParEvo exercise? Two steps would be involved. The first would be to redefine the focus of ‘success’ and how it is measured. The second would be to think carefully about how to measure the extent to which individuals have contributed to that success. The suggestion is as follows:
- Success would be the development of storylines that are most probable. Such storylines could be identified either through the existing built-in participatory evaluation feature in ParEvo and/or by the evaluation judgements of independent third parties.
- Individual participants could be seen as predicting a storyline as most probable in as much as they had contributed one or more paragraphs of text to that storyline, prior to the evaluation stage
Aggregate scores could be generated for each participant, which would be the sum of their scores for each storyline they had contributed to. Their score for each such storyline = the net number of people judging that story as likely to happen / the percentage of people who contributed to that storyline. The smaller the number of others who shared their views, the higher their score would be.
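The scoring rule can be sketched as follows. Two details are assumptions: “net number” is read as the count of “most likely” judgements minus the count of “least likely” judgements, and the percentage is expressed on a 0 to 100 scale. The field names and figures are hypothetical.

```python
def participant_score(storylines, participant, n_participants):
    """Sum, over the storylines a participant contributed to, of
    net 'likely' judgements / percentage of participants who contributed."""
    total = 0.0
    for storyline in storylines:
        if participant in storyline["contributors"]:
            net_votes = storyline["most_likely"] - storyline["least_likely"]
            percent = 100 * len(storyline["contributors"]) / n_participants
            total += net_votes / percent
    return total


# Hypothetical exercise with 10 participants and two storylines.
storylines = [
    {"contributors": {"p1"}, "most_likely": 5, "least_likely": 1},
    {"contributors": {"p1", "p2", "p3", "p4", "p5"}, "most_likely": 3, "least_likely": 1},
]
score_p1 = participant_score(storylines, "p1", n_participants=10)
score_p2 = participant_score(storylines, "p2", n_participants=10)
```

In this invented example the participant who built a well-rated storyline alone scores far higher than one who only joined a widely shared storyline, which is the behaviour the rule is intended to reward.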
I have tested this method on one recently completed ParEvo exercise that had ten participants and seven iterations. The storyline rated as most probable was built from the contributions of just one participant. In contrast, the two storylines with the widest range of participants had probability ratings which were more in the middle to upper-middle of the range. Not the worst, but also not the best. Two of the participants, who performed well in terms of the number of other participants who added to their contributions, performed poorly in terms of their predictions of the most probable storylines.
But this was only one ParEvo exercise. It will be interesting to see what happens as further ParEvo exercises are carried out.