So far, I have identified three different ways of analysing the content of a ParEvo exercise:
1. Participatory, using forced-choice questions to identify the storylines that are most exceptional on some predefined evaluation criteria.
2. Participatory, using a pile-sorting exercise to identify clusters of storylines with similar content, and to capture what characterises each cluster.
3. Machine learning (using a 'topic modelling' algorithm) to identify clusters of storylines having similar content.
1. Using predefined evaluation criteria
The ParEvo web app has a built-in evaluation stage. When this is triggered by the facilitator, at a time of their own choosing, the following panel appears above any currently visible storyline.
Participants are able to scan across all the surviving storylines and click on each of the above options to record which storyline, in their view, best fits each criterion. These choices are recorded and aggregated by the web app, and can be downloaded as a dataset by the facilitator. Once loaded into Excel, the aggregated evaluation results can be visualised as follows:
Each circle in the graph represents a particular storyline, and the number inside each circle is the ID of that storyline. The left-hand axis shows the degree to which participants rated the storyline as having a low versus high probability of actually happening, and the bottom axis shows the degree to which they rated it as having low versus high desirability.
The central position of many of the storylines in this graph reflects contested judgements. If there was uniform agreement on the probability or desirability of a storyline, it would be located on the outer edges of the graph. Central positions are net positions: 'most' and 'least' choices largely cancel each other out. In this example, there was most agreement about the probability and desirability of storylines 40 and 41.
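For anyone wanting to reproduce this kind of chart outside Excel, here is a minimal sketch in Python. The column names and vote counts are invented for illustration; the web app's actual export format may differ.

```python
# A sketch of the net-position chart. Column names and vote counts are
# invented; the web app's actual export format may differ.
import pandas as pd
import matplotlib.pyplot as plt

choices = pd.DataFrame({  # one row per participant choice
    "storyline_id": [37, 37, 38, 40, 40, 41],
    "criterion": ["most_probable", "least_probable", "most_desirable",
                  "most_probable", "most_desirable", "least_desirable"],
})

# Net position per axis = "most" votes minus "least" votes
counts = choices.value_counts(["storyline_id", "criterion"]).unstack(fill_value=0)
probability = counts.get("most_probable", 0) - counts.get("least_probable", 0)
desirability = counts.get("most_desirable", 0) - counts.get("least_desirable", 0)

fig, ax = plt.subplots()
ax.scatter(desirability, probability)
for sid in counts.index:  # label each circle with its storyline ID
    ax.annotate(str(sid), (desirability[sid], probability[sid]))
ax.set_xlabel("Net desirability (least to most)")
ax.set_ylabel("Net probability (least to most)")
plt.show()
```

A storyline with equal numbers of 'most' and 'least' votes on an axis lands at zero on that axis, which is the contested middle described above.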
There are other evaluation criteria which could be applied to ParEvo storylines. For example:
- Consistency / coherence
- Relevance / Utility
Future versions of ParEvo may enable the facilitator to choose their own preferred evaluation criteria.
2. Using participants’ evaluation criteria
This approach uses a method described as pile sorting or card sorting. It can be done manually or through an online survey. In both cases participants are asked to look at the surviving storylines and sort them into two piles, which can be of any size. The contents of each pile should have some characteristics in common which are not present in the other pile. Those characteristics should be ones that the participant sees as significant, or at least interesting, in the context of the aim of the particular ParEvo exercise. Once the two piles are constructed, the participant should then describe what is unique about the contents of each pile.
Here is an example of how the exercise is presented via a Survey Monkey online survey:
This exercise generates a dataset describing which storylines were put into which groups (and those groups' characteristics) by which participant. Social network analysis software can then be used to provide an aggregate perspective, showing the ways in which the different groupings (and their characteristics) are connected to each other.
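As an illustration of that aggregation step, here is a sketch using the networkx Python library in place of dedicated social network analysis software. The pile contents are invented; the idea is simply that two storylines become more strongly linked the more often participants sort them into the same pile.

```python
# A sketch of the aggregation step, with networkx standing in for dedicated
# social network analysis software. Pile contents below are invented.
import itertools
import networkx as nx

piles = [  # each pile: one participant's grouping of storyline IDs
    {"participant": "A", "storylines": [39, 43]},
    {"participant": "A", "storylines": [40, 41]},
    {"participant": "B", "storylines": [39, 40, 43]},
]

# Two storylines are linked each time they land in the same pile;
# the edge weight counts how often that happened.
G = nx.Graph()
for pile in piles:
    for a, b in itertools.combinations(sorted(pile["storylines"]), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

for a, b, w in G.edges(data="weight"):
    print(f"storylines {a} and {b} were grouped together {w} time(s)")
```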
I hope to have a new example of such a visualisation to display here in the near future. Stay tuned.
Example: Descriptions for the 6 piles that storyline 43 belonged to:
- Pile x reflects rising debate and efforts to gain sufficient momentum (either political or via science/journalism getting the message out as the saviour), but no clear resolution. An ongoing battle for public opinion to change the political landscape.
- Pile x storylines are more negative. The Greenland story has some positive features to it but the apparent physical reality behind the human story is very threatening. The story involving Francisca (one version) was negative in a different way, in the confusion of intention and hope at the level of individuals.
- Pile x storylines all involve named individuals and how they are reacting to climate change and other people’s reactions to climate change. It brings our focus down to a fairly micro level, where we can identify what might or could be happening.
- Pile x x-rays the challenges that surround the purposeful and sincere actions towards unfolding climate change conditions across the world
- Pile x has a personal story
- Pile x is pessimistic
3. Machine learning, used to find storyline clusters
Storylines can also be grouped or clustered using machine learning algorithms. One of these algorithms, which I have tried, is called topic modelling. This can be done using an online machine learning platform known as BigML, which I can recommend.
The completed storylines generated by a ParEvo exercise can be downloaded by the facilitator. These can then be uploaded to BigML in CSV file format, where each row contains a complete storyline.
Within BigML, choices can be made as to how many clusters are to be identified. In one of my trial runs I set this to 2, so I could compare the results with those of the pile sorting exercise described above, which also generated two piles (per participant).
Two kinds of results are generated. One is a list of the words most strongly associated with each cluster, in order of probability. The other is a list of the storylines along with the probability of each belonging to each cluster, given the words found in those storylines. Here are examples of both of those outputs:
It may be coincidental, but the 2 least similar surviving storylines listed above (39 and 40) are respectively about events happening in Greenland and the Pacific!
The results are broadly similar to the pile sorting exercise results: storylines 39 and 43 are in one cluster and storylines 40 and 41 are in another cluster (but bear in mind the pile sorting exercise did not include all participants).
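For readers who want to experiment with this kind of analysis without a BigML account, here is a sketch of the same two outputs using scikit-learn's LDA topic model. This is a stand-in for, not a reproduction of, BigML's algorithm, and the storyline texts are placeholders.

```python
# A sketch of the two outputs, using scikit-learn's LDA topic model as a
# local stand-in for BigML. Storyline texts are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

storylines = {
    39: "glaciers in greenland melt while local politics stalls",
    40: "pacific islands negotiate relocation as sea levels rise",
    41: "a coalition of pacific states pledges emission cuts at sea",
    43: "journalists in greenland campaign to shift public opinion",
}

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(storylines.values())

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)  # rows: storylines, columns: clusters

# Output 1: the words most strongly associated with each cluster
words = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [words[i] for i in weights.argsort()[::-1][:5]]
    print(f"cluster {k}: {top}")

# Output 2: each storyline's probability of belonging to each cluster
for sid, probs in zip(storylines, doc_topic):
    print(sid, [round(float(p), 2) for p in probs])
```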
One of the challenges when using topic modelling as a clustering method is how to label each cluster. The BigML algorithm does this by selecting the word with the highest probability of being associated with a cluster. But this is not necessarily meaningful to humans. Scanning the other words with high probabilities may help, but not always. Another option, which I have explored and which shows some promise, is to use predictive modelling to find the combinations of keywords which best predict membership of a cluster. A third option is simply to take the extreme cases, as I have above, and eyeball these to identify, in one's own opinion, how they most differ.
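Here is a sketch of that predictive modelling option, continuing from the scikit-learn example above (it reuses X, words and doc_topic): a shallow decision tree is fitted to the word counts, with the cluster assignments as the target, and its printed rules show which keyword combinations separate the clusters.

```python
# A sketch of the predictive modelling option, continuing from the LDA
# example above (X, words and doc_topic are reused from it).
from sklearn.tree import DecisionTreeClassifier, export_text

labels = doc_topic.argmax(axis=1)  # hard cluster label per storyline

# A shallow tree keeps the rules short enough to read as cluster labels
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X.toarray(), labels)

# The printed rules show the word-count thresholds separating the clusters
print(export_text(tree, feature_names=[str(w) for w in words]))
```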
Probability and uncertainty
In their discussion of scenario planning in the context of disasters, Briggs and Matejova (2019) argue that probability and uncertainty in scenarios are often conflated, or the distinction between them not recognised, yet they have different consequences.
In the evaluation stage of the ParEvo pre-test, scenario uncertainty could be seen in two ways:
- Participants gave contradictory ratings on probability (or desirability) to the same scenario, with the net result that the scenario fell in the middle of one or both axes of the graph above. See data points (storylines) 37, 38 and 39.
- Participants did not choose a given scenario as being more or less probable, or more or less desirable, at all. This suggests the scenario sits somewhere near the midpoint of both axes. These storylines do not appear as data points in the graph above. Their status is more uncertain than that of storylines with contradictory ratings. The sketch below separates these two cases.
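The distinction matters when reading the net-position chart: a net score of zero can come from many votes that cancel out, or from no votes at all. A small sketch, with invented vote tallies:

```python
# A net score of zero can come from contradictory votes or from no votes
# at all. Vote tallies here are invented.
votes = {  # storyline_id: ("most probable" votes, "least probable" votes)
    37: (3, 3),  # contradictory: many votes, net zero
    38: (2, 2),
    40: (4, 1),  # a clear judgment
    42: (0, 0),  # unrated: no votes at all, a more uncertain status
}

for sid, (most, least) in votes.items():
    net, total = most - least, most + least
    if total == 0:
        status = "unrated"
    elif net == 0:
        status = "contested"
    else:
        status = "clear"
    print(f"storyline {sid}: net {net:+d} from {total} votes -> {status}")
```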
The social structure of evaluation judgments
One of the data products from an evaluation process is a series of matrices, one per respondent, where rows = storylines, columns = evaluation criteria, and cell values of 1 or 0 = whether or not a given criterion was chosen for a given storyline. Using UCINET it is possible to identify the degree to which the contents of each respondent's matrix are correlated with each other. The resulting correlation matrix can then be visually represented as a network structure.
The example below is based on the MSC pre-test data (anonymised). Each link represents a positive correlation between two respondents' judgments. Thicker lines = a higher correlation. The three participants at the left had a larger number of correlated judgments, and the most highly correlated ones. Those on the right had fewer correlated judgments, and these were of lower value.
One observation from this analysis was that similarities between respondents in their evaluation judgments do not simply match similarities noticed during storyline construction, i.e. which participants added to which other participants' contributions. This suggests participants' judgments are changing over time.
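For those without UCINET, the correlation step can be sketched in Python: flatten each respondent's storyline-by-criterion matrix, correlate respondents pairwise, and keep the positive correlations as weighted links. The matrices here are invented.

```python
# A sketch of the correlation step: flatten each respondent's 0/1 matrix
# (rows = storylines, columns = criteria), correlate respondents pairwise,
# and keep positive correlations as weighted network links.
import numpy as np
import networkx as nx

respondents = {  # illustrative 0/1 choice matrices, not real data
    "R1": np.array([[1, 0], [0, 1], [1, 0]]),
    "R2": np.array([[1, 0], [0, 1], [0, 0]]),
    "R3": np.array([[0, 1], [1, 0], [0, 1]]),
}

G = nx.Graph()
names = list(respondents)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = np.corrcoef(respondents[a].ravel(), respondents[b].ravel())[0, 1]
        if r > 0:  # link only positively correlated judgment patterns
            G.add_edge(a, b, weight=round(r, 2))

print(list(G.edges(data=True)))
```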
Evaluating participatory reconstruction of histories
All the above has been about evaluating future scenarios. But as pointed out on the Start Here page, ParEvo can also be used to reconstruct alternate histories, from a given point and location onwards. In those circumstances what sort of criteria would be relevant to the evaluation of the surviving storylines? Some candidates might be:
- Availability of evidence?
- Verifiability – of the events described, if no evidence is yet available?
- Continuity/coherence – are there missing gaps and discontinuities?
- Salience – are the most important events included?