So far, I have identified three different ways of analysing the content of a ParEvo exercise:
1. Participatory, using forced-choice questions to identify storylines that are most exceptional on some predefined evaluation criteria.
2. Participatory, using a pile-sorting exercise to identify clusters of storylines with similar content, and to capture what characterises each cluster.
3. Machine learning (using a ‘topic modelling’ algorithm) to identify clusters of storylines having similar content.
Each of these is discussed in detail below. In addition, there are also discussions of other measurement issues: (a) Probability and uncertainty, (b) The social structure of evaluation judgments, and (c) Evaluating participatory reconstruction of histories.
1. Using predefined evaluation criteria
The ParEvo web app has a built-in evaluation stage. When this is triggered by the facilitator, at a time of their own choosing, the following panel appears above any currently visible storyline.
Participants are able to scan across all surviving storylines and click on each of the above options to reflect their choice of which storyline best fits each of these criteria. These choices are recorded and aggregated by the web app, and can be downloaded as a dataset by the facilitator. When the dataset is loaded into Excel, one way of visualising the aggregated evaluation results is as follows:
Each circle in the graph represents a particular storyline, and the number beside each circle is the ID of that storyline. The left-hand axis shows the degree to which the storyline has a low versus high probability of actually happening, and the bottom axis shows the degree to which the participants think the storyline concerns something that has low versus high desirability.
The central position of many of the storylines in this graph reflects contested judgments about them. If there were uniform agreement on the probability or desirability of a storyline, it would be located on the outer edges of this graph. Central positions are net positions on desirability and probability. In this example, there was most agreement about the probability of storyline 41 and the desirability of storyline 40.
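The idea of a "net position" can be illustrated with a minimal Python sketch. It assumes the downloaded dataset can be reduced to one record per choice, and the criterion names (`most_probable`, `least_probable`, `most_desirable`, `least_desirable`), the storyline IDs and the data are all invented for illustration; they are not the web app's actual export format.

```python
from collections import defaultdict

# Hypothetical records: (participant, storyline_id, criterion).
# IDs, criterion names and values are invented for illustration only.
choices = [
    ("p1", "S1", "most_probable"),
    ("p2", "S1", "most_probable"),
    ("p3", "S2", "most_desirable"),
    ("p1", "S2", "most_desirable"),
    ("p2", "S3", "most_probable"),
    ("p3", "S3", "least_probable"),
]

def net_positions(choices):
    """Return {storyline_id: (net_desirability, net_probability)}."""
    counts = defaultdict(lambda: defaultdict(int))
    for _participant, storyline, criterion in choices:
        counts[storyline][criterion] += 1
    positions = {}
    for storyline, c in counts.items():
        # Net position = choices at the high end minus choices at the low end.
        desirability = c["most_desirable"] - c["least_desirable"]
        probability = c["most_probable"] - c["least_probable"]
        positions[storyline] = (desirability, probability)
    return positions

positions = net_positions(choices)
for storyline, (x, y) in sorted(positions.items()):
    print(storyline, "desirability:", x, "probability:", y)
```

In this toy data, S3 receives one "most probable" and one "least probable" choice, so its net probability is zero and it would sit in the centre of the graph even though participants disagreed about it.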
There are other evaluation criteria that could be applied to ParEvo storylines. For example:
- Consistency/continuity between one contribution and the next
- Radical versus conservative: The extent to which a future state is different from the present
- Expected versus unexpected: Perhaps the same as the above, but more binary
- Describing local versus global changes
- Involving reversible versus irreversible changes
Future versions of ParEvo may enable the facilitator to choose their own preferred evaluation criteria.
2. Using participants’ evaluation criteria
This approach uses a method described as pile or card sorting. It can be done manually or through an online survey. In both cases, participants are asked to look at the surviving storylines and sort them into 2 piles, which can be of any size. The contents of each pile should have some characteristics in common which are not present in the other pile. Those characteristics should be ones that the participant sees as significant, or at least interesting, in the context of the aim of the particular ParEvo exercise. Once the 2 piles are constructed the participant should then describe what is unique about the contents of each of the piles.
Here is an example of how the exercise is presented via a Survey Monkey online survey:
This exercise generates a dataset describing which storylines were put into which groups (and each group’s characteristics) by which participants. Social network analysis software can then be used to provide an aggregate perspective, showing how storylines are connected to each other by frequently being placed in the same piles.
The same exercise generates qualitative data, i.e. the descriptions given by participants of the membership of each pile: what the members had in common and how they differed from other piles. Here are some of the descriptions of what is common between some of the above storylines:
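The aggregation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the participant names, storyline IDs and pile contents below are invented, and the co-occurrence count is one simple way of building the links that social network analysis software would then draw.

```python
from collections import Counter
from itertools import combinations

# Hypothetical pile-sort responses: each participant splits the
# storylines into two piles. All names and contents are invented.
pile_sorts = {
    "p1": [{"S1", "S2", "S3"}, {"S4", "S5", "S6"}],
    "p2": [{"S1", "S2"}, {"S3", "S4", "S5", "S6"}],
    "p3": [{"S1", "S2", "S3", "S4"}, {"S5", "S6"}],
}

def co_occurrence(pile_sorts):
    """Count how many participants put each pair of storylines in the same pile."""
    pair_counts = Counter()
    for piles in pile_sorts.values():
        for pile in piles:
            # Sort so each pair is always stored in the same order.
            for a, b in combinations(sorted(pile), 2):
                pair_counts[(a, b)] += 1
    return pair_counts

edges = co_occurrence(pile_sorts)
for (a, b), weight in edges.most_common():
    print(a, b, weight)
```

Each printed row is a weighted link between two storylines. Pairs that were never placed in the same pile do not appear in the list at all.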
Storyline 41 and 40
- These described more generalised societal problems
- These are attempts to explore issues (the challenge was, however, that issues were all at different scale and focus… making the threads not comparable)
- These describe bigger issues, more human-focused solutions
Storyline 41 and 37
- These storylines all involve named individuals and how they are reacting to climate change and other people’s reactions to climate change. It brings our focus down to a fairly micro level, where we can identify what might or could be happening.
- These show some adaption, accommodation though perhaps not enough.
Storyline 41 and 40 and 37
- These are optimistic
- These have more interesting information on the actions that pushed towards the mitigation of climate change effects
- These revealed the seeking of alternative actions and decision to understand and mitigate climate change
- These are broadly positive, showing signs that people will be able to raise climate change up the global agenda and mobilise efforts to do something about, albeit not without a lot of struggle and difficulty.
- Here some positive message is clear (though perhaps insufficient)
Looking back at Figure 2, it appears that the two clusters of storylines seen in Figure 4 occupy two distinct spaces on the graph: 40, 41 and 37 occupy the bottom right; 38, 39 and 42 occupy the top left.
3. Machine learning used to find storyline clusters
Storylines can also be grouped or clustered using machine learning algorithms. One of these algorithms, which I have tried, is called topic modelling. This can be done using an online machine learning platform known as BigML, which I can recommend.
The completed storylines generated by a ParEvo exercise can be downloaded by the facilitator. These can then be uploaded to BigML in CSV file format, where each row contains a complete storyline.
Within BigML, choices can be made as to how many clusters are to be identified. In one of my trial runs, I set this to 2, so that I could compare the results with those of the pile sorting exercise described above, which also generated 2 piles (per participant).
Two kinds of results are generated. One is a list of the words that are most probably associated with each cluster, in order of probability. The other is a list of the storylines, along with the probabilities of their belonging to each cluster, given the words found in those storylines. Here are examples of both of those outputs:
The results are broadly similar to the pile sorting exercise results shown in Figure 4: storylines 38, 39 and 42 are in one cluster and storylines 40, 41 and 37 are in another cluster (but bear in mind that the pile sorting exercise did not include all of the original participants).
It may be coincidental, but the 2 least similar surviving storylines listed above (39 and 40) are respectively about events happening in Greenland and the Pacific!
One of the challenges when using topic modelling as a clustering method is how to label each cluster. The BigML algorithm does this by selecting the word with the highest probability of being associated with a cluster. But this is not necessarily meaningful to humans. Scanning the other words with high probabilities may help, but not always. Another option that I have explored is to do predictive modelling to find the combinations of keywords that best predict membership of a cluster. And another option is simply to take the extreme cases, as I have above, and eyeball these to identify, in one’s own opinion, how they most differ.
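As a rough illustration of the labelling problem, the Python sketch below ranks words by how much more often they occur in one cluster than in the others. This is a deliberately simple frequency comparison, not BigML's topic model or its labelling rule. The cluster names, the texts and the stopword list are invented placeholders.

```python
import re
from collections import Counter

# Invented placeholder texts; real input would be the downloaded storylines
# grouped by the cluster each was assigned to.
clusters = {
    "cluster_1": ["ice melts and the village moves inland",
                  "the ice sheet retreats further"],
    "cluster_2": ["the island council plans a sea wall",
                  "the sea wall protects the island"],
}
STOPWORDS = {"the", "and", "a", "an", "of", "to", "in"}

def tokens(text):
    """Lower-case words with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def distinctive_words(clusters, top_n=3):
    """For each cluster, return the words most over-represented in it."""
    counts = {name: Counter(w for t in texts for w in tokens(t))
              for name, texts in clusters.items()}
    labels = {}
    for name, own in counts.items():
        others = Counter()
        for other_name, c in counts.items():
            if other_name != name:
                others.update(c)
        # Score = count inside the cluster minus count everywhere else.
        scores = {w: own[w] - others[w] for w in own}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        labels[name] = ranked[:top_n]
    return labels

labels = distinctive_words(clusters)
for name, words in labels.items():
    print(name, words)
```

A list of several distinctive words per cluster may be easier for a human reader to interpret than a single top word, though it still needs to be checked against the storylines themselves.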
Probability and uncertainty
In their discussion of scenario planning in the context of disasters, Briggs and Matejova (2019) argue that probability and uncertainty in scenarios are often conflated, or the distinction between them is not recognised, yet they have different consequences.
The graph shown above, generated from downloaded ParEvo exercise data, shows probability (likelihood) as one of the two dimensions. In the same graph, uncertainty can also be seen, in two ways:
- Participants gave contradictory ratings on probability (or desirability) to the same scenario, with the net result that the scenario fell in the middle of one or both axes of the graph above. See data points (storylines) 37, 38 and 39.
- Participants did not choose a given scenario to be more or less probable or more or less desirable. This suggests the scenario sits somewhere in the midpoint of both axes. These storylines do not appear as data points in the graph above. Their uncertainty is different from that of storylines with contradictory ratings.
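The difference between these two kinds of uncertainty can be made explicit with a small Python sketch. The category names (`agreed`, `contested`, `unrated`), the storyline IDs and the vote counts are invented for illustration. The function takes the number of "high" and "low" choices a storyline received on one criterion.

```python
def uncertainty_type(high_votes, low_votes):
    """Classify one storyline on one criterion (labels are illustrative)."""
    if high_votes == 0 and low_votes == 0:
        return "unrated"      # nobody chose it: absent from the graph
    if high_votes > 0 and low_votes > 0:
        return "contested"    # contradictory ratings pull it towards the middle
    return "agreed"           # all choices point the same way

# Hypothetical (high, low) probability vote counts per storyline.
probability_votes = {"S1": (3, 0), "S2": (2, 2), "S3": (0, 0)}

summary = {sid: uncertainty_type(high, low)
           for sid, (high, low) in probability_votes.items()}
print(summary)
```

Both the contested and the unrated storyline have a net position of zero, which is why the two cases need to be separated before the results are interpreted.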
The social structure of evaluation judgments
One of the data products from an evaluation process is a series of matrices, one per respondent, where rows = storylines, columns = evaluation criteria, and cell values of 1 or 0 indicate whether or not a given criterion was chosen for a given storyline. Using UCINET social network analysis software, it is possible to identify the degree to which the contents of the respondents’ matrices are correlated with each other. The resulting correlation matrix can then be visually represented as a network structure.
The example below is based on data generated from a ParEvo pre-test exercise (anonymised). Each link represents a positive correlation between two respondents’ judgments. Thicker lines = a higher correlation. The three participants at the left had a larger number of correlated judgments and the most highly correlated judgments. Those on the right had fewer correlated judgments and these were of lower value.
One observation from this analysis was that similarities between respondents in evaluation judgments do not simply match similarities noticed during storyline construction, i.e. as in which participants added to which other participants’ contributions. This suggests participants’ judgments are changing over time.
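The correlation step described in this section can be sketched in Python as follows. This is an assumption-laden illustration rather than a description of what UCINET does internally: it flattens each respondent's matrix into a list and computes a plain Pearson correlation for each pair of respondents. The respondent names and matrix values are invented.

```python
from itertools import combinations
from math import sqrt

# Hypothetical data: one 0/1 matrix per respondent,
# rows = storylines, columns = evaluation criteria.
matrices = {
    "r1": [[1, 0], [0, 1], [0, 0]],
    "r2": [[1, 0], [0, 1], [0, 0]],
    "r3": [[0, 1], [1, 0], [0, 0]],
}

def flatten(matrix):
    """Turn a matrix into a single list of cell values, row by row."""
    return [cell for row in matrix for cell in row]

def pearson(x, y):
    """Pearson correlation; None if either list has no variation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)

correlations = {
    (a, b): pearson(flatten(matrices[a]), flatten(matrices[b]))
    for a, b in combinations(sorted(matrices), 2)
}

# Keep only positive correlations as links, with the value as line weight.
links = {pair: r for pair, r in correlations.items() if r is not None and r > 0}
print(links)
```

In this toy data only r1 and r2 are linked, because r3 made the opposite choices on the first two storylines.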
Evaluating participatory reconstruction of histories
All the above has been about evaluating future scenarios. But as pointed out on the Purpose and Design page, ParEvo can also be used to reconstruct alternate histories, from a given point and location onwards. In those circumstances what sort of criteria would be relevant to the evaluation of the surviving storylines? Some candidates might be:
- Availability of evidence?
- Verifiability – of the events described, if no evidence is yet available?
- Continuity/coherence – are there missing gaps and discontinuities?
- Salience – are the most important events included?