So far, I have identified four different ways of analysing the content of a ParEvo exercise:
1. Participatory, using forced-choice questions to identify storylines that are most exceptional on some predefined evaluation criteria.
2. Participatory, using a pile-sorting exercise to identify clusters of storylines with similar content, and to capture what characterises each cluster.
3. Specialist, where the facilitator (for example) codes the contents of each of the contributions to each of the storylines according to themes of prior and emerging interest to themselves or their clients. Both manual and semi-automated coding options are now possible.
4. Machine learning (using a ‘topic modelling’ algorithm) to identify clusters of storylines having similar content.
Each of these is discussed in detail below. In addition, see also Evaluating participatory reconstruction of histories.
1. Using predefined evaluation criteria
The ParEvo web app has a built-in evaluation stage. When this is triggered by the facilitator, at a time of their own choosing, the following panel appears above any currently visible storyline.
Likelihood and desirability
Participants are able to scan across all surviving storylines and click on each of the above options to reflect their choice of which storyline best fits each of these criteria. These choices are recorded and aggregated by the web app, and can be downloaded as a dataset by the facilitator. When loaded into Excel, one way of visualising the aggregated evaluation results is as follows:
Each circle in the graph represents a particular storyline, and the numbers beside the circle reflect the ID of that storyline. The Y axis shows the degree to which the storyline has a low versus high probability of actually happening, and the X axis shows the degree to which the participants think the storyline concerns something that has low versus high desirability.
The associated data matrix is shown in Figure 3.
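As an alternative to Excel, the aggregation behind this kind of graph can be sketched in a few lines of Python. This is a minimal sketch only: the record layout (participant, criterion, storyline) and the four criterion names are assumptions made for illustration, not the actual format of the ParEvo download, and the storyline labels are made up.

```python
from collections import defaultdict

# Hypothetical export format: one (participant, criterion, storyline) row per click.
votes = [
    ("p1", "most_likely", "A"),
    ("p1", "least_likely", "B"),
    ("p1", "most_desirable", "B"),
    ("p1", "least_desirable", "A"),
    ("p2", "most_likely", "A"),
    ("p2", "least_likely", "C"),
    ("p2", "most_desirable", "B"),
    ("p2", "least_desirable", "C"),
]

def tally(votes):
    """Count how often each storyline was picked under each criterion."""
    counts = defaultdict(lambda: defaultdict(int))
    for _participant, criterion, storyline in votes:
        counts[storyline][criterion] += 1
    return counts

def net_positions(votes):
    """Return {storyline: (net desirability, net probability)} as (x, y) coordinates."""
    counts = tally(votes)
    return {
        storyline: (
            c["most_desirable"] - c["least_desirable"],
            c["most_likely"] - c["least_likely"],
        )
        for storyline, c in counts.items()
    }

positions = net_positions(votes)
for storyline, (x, y) in sorted(positions.items()):
    print(storyline, "desirability:", x, "probability:", y)
```

Each net score is simply "most" picks minus "least" picks, so a storyline that nobody selected, or that attracted equal and opposite picks, ends up at zero on that axis.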
Probability and uncertainty
The central position of many of the storylines in this graph reflects their contested or uncertain judgment. If there was uniform agreement on the probability and desirability of a storyline then it would be located on the outer corners of this graph. More central positions can be net positions on desirability and probability, where opposing judgements offset each other. Or they can reflect the absence of judgements (i.e. not selected by anyone as most or least desirable or likely).
These distinctions have some degree of correspondence with the risk versus uncertainty distinction identified by economists (Knight). They would distinguish between:
- Scenarios where some relative probabilities can be assigned (identifiable risks)
- Scenarios where it is not possible to do so (uncertainties)
Except that in the above matrix there are four categories, not two:
- Scenarios where there is high agreement in probability. No examples in Figure 3 above.
- Scenarios where probability is partially agreed on (some 0 values in the matrix). Examples are #42 and #38 in Figure 3.
- Scenarios where probability judgements are contradictory, e.g. assessed as both more and less probable. Examples are #41 and #37 in Figure 3.
- Scenarios where there are no judgements on probability (all 0 values). Examples are #39, #40 and #43 in Figure 3.
This may be a more realistic view. Here is one commentary on the risk/uncertainty distinction: “Some economists have argued that this distinction is overblown. In the real business world, this objection goes, all events are so complex that forecasting is always a matter of grappling with “true uncertainty,” not risk; past data used to forecast risk may not reflect current conditions, anyway. In this view, “risk” would be best applied to a highly controlled environment, like a pure game of chance in a casino, and “uncertainty” would apply to nearly everything else”
Perhaps, unlike the striking academics in The Hitchhiker’s Guide to the Galaxy, we should not be demanding “rigidly defined areas of doubt and uncertainty” 🙂
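The four categories of probability judgement listed above can be expressed as a simple classification rule. The sketch below is illustrative only: the inputs (counts of "more probable" and "less probable" selections per storyline) are an assumed reading of the data matrix, and the threshold for "high agreement" (here, every participant choosing the same direction) is my own assumption rather than a rule defined by ParEvo.

```python
def classify_probability(n_more, n_less, n_participants, high_share=1.0):
    """Place one storyline into one of the four categories.

    n_more / n_less: how many participants picked the storyline as most /
    least probable. high_share is an assumed cut-off for 'high agreement'.
    """
    if n_more == 0 and n_less == 0:
        return "no judgements"
    if n_more > 0 and n_less > 0:
        return "contradictory"
    # Only one direction was chosen: how much of the group chose it?
    share = max(n_more, n_less) / n_participants
    return "high agreement" if share >= high_share else "partial agreement"

# Made-up counts for a group of 8 participants.
print(classify_probability(0, 0, 8))  # no judgements
print(classify_probability(3, 2, 8))  # contradictory
print(classify_probability(3, 0, 8))  # partial agreement
print(classify_probability(8, 0, 8))  # high agreement
```

The same function could be applied to desirability counts, since the matrix has the same structure on both axes.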
Update 2020 11 07: Reflecting on all the above, more guidance may still be needed on how to respond to the different storylines in Figure 3. Here are my current suggestions:
1. Firstly, pay attention to storylines with contradictory ratings on probability and desirability. If there is any opportunity to discuss these contradictory ratings with participants then use that opportunity to discuss and resolve those contradictions.
2. Storylines with high undesirability and low probability ratings can probably be ignored.
3. Storylines with high undesirability and high probability ratings should receive attention, focused on how those events can be avoided or, failing that, how their effects can be mitigated. An example is storyline #42 in Figure 3 above.
4. Storylines with high desirability and low probability may deserve more attention than those with high probability, i.e. those which are more likely to happen without further intervention. The focus here would be on identifying any ways of enabling these lower probability events to happen.
5. Storylines with high desirability and high probability may need less attention. For example, some form of monitoring to watch whether they are unfolding as expected may be enough.
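The five suggestions above amount to a small decision rule, which can be sketched as follows. This is a hedged illustration: it assumes each storyline has a net desirability score, a net probability score and a flag for contradictory ratings, and it uses the sign of each net score as the dividing line between "high" and "low", which is my simplification.

```python
def suggest_response(net_desirability, net_probability, contradictory=False):
    """Map one storyline's ratings to one of the suggested responses."""
    if contradictory:
        return "discuss and resolve the contradictory ratings with participants"
    if net_desirability == 0 or net_probability == 0:
        # Not covered by the five suggestions: no clear position on one axis.
        return "no clear position"
    desirable = net_desirability > 0
    probable = net_probability > 0
    if not desirable and not probable:
        return "can probably be ignored"
    if not desirable and probable:
        return "focus on avoidance or mitigation"
    if desirable and not probable:
        return "identify ways of enabling"
    return "monitor whether it unfolds as expected"

print(suggest_response(-2, 3))
print(suggest_response(2, -1))
```

Checking for contradictory ratings first mirrors the ordering of the suggestions, where resolving contradictions comes before anything else.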
It is now possible for a ParEvo exercise Facilitator to change the default evaluation criteria. Other possible criteria that could be used, but have not yet been tried out, are:
- Equity and Sustainability, where each axis ranges from most to least equitable/sustainable change, as described in the different ParEvo storylines
- Individual versus Society level changes, combined with Formal versus Informal change. This frame is borrowed from, and described in detail on page 22 of, The ALIV[H]E Framework: Action Linking Initiatives on Violence Against Women and HIV Everywhere.
2. Using participants’ evaluation criteria
This approach uses a method described as pile or card sorting. It can be done manually or through an online survey. In both cases, participants are asked to look at the surviving storylines and sort them into 2 piles, which can be of any size. The contents of each pile should have some characteristics in common which are not present in the other pile. Those characteristics should be ones that the participant sees as significant, or at least interesting, in the context of the aim of the particular ParEvo exercise. Once the 2 piles are constructed the participant should then describe what is unique about the contents of each of the piles.
Here is an example of how the exercise is presented via a Survey Monkey online survey:
This exercise generates a dataset describing which storylines were put into which piles (and the characteristics of each pile) by which participants. Social network analysis software can then be used to provide an aggregate perspective, showing how storylines are connected to each other by frequently being placed in the same piles.
The same exercise generates qualitative data, i.e. the descriptions given by participants of the membership of each pile – what the members had in common and how they differed from other piles. Here are some of the descriptions of what is common between some of the above storylines:
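The aggregation step described above can be sketched without any specialist software. The example below is a minimal sketch under assumed inputs: each record is a participant plus the list of storylines they placed in one pile, and the storyline labels are made up. It counts how often each pair of storylines was placed in the same pile, which gives a weighted edge list.

```python
from collections import Counter
from itertools import combinations

# Hypothetical pile-sort results: (participant, storylines placed in one pile).
piles = [
    ("p1", ["A", "B"]), ("p1", ["C", "D"]),
    ("p2", ["A", "B", "C"]), ("p2", ["D"]),
    ("p3", ["A", "B"]), ("p3", ["C", "D"]),
]

def co_placement(piles):
    """Count, for every pair of storylines, how many piles contained both."""
    counts = Counter()
    for _participant, members in piles:
        # sorted() gives each pair a canonical order, e.g. ("A", "B") not ("B", "A").
        for pair in combinations(sorted(set(members)), 2):
            counts[pair] += 1
    return counts

edges = co_placement(piles)
for (a, b), weight in edges.most_common():
    print(a, b, weight)
```

Pairs with higher counts were grouped together by more participants, so they would appear as stronger links in a network view of the storylines.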
Storyline 41 and 40
- These described more generalised societal problems
- These are attempts to explore issues (the challenge was, however, that issues were all at different scale and focus… making the threads not comparable)
- These describe bigger issues, more human-focused solutions
Storyline 41 and 37
- These storylines all involve named individuals and how they are reacting to climate change and other people’s reactions to climate change. It brings our focus down to a fairly micro level, where we can identify what might or could be happening.
- These show some adaption, accommodation though perhaps not enough.
Storyline 41 and 40 and 37
- These are optimistic
- These have more interesting information on the actions that pushed towards the mitigation of climate change effects
- These revealed the seeking of alternative actions and decision to understand and mitigate climate change
- These are broadly positive, showing signs that people will be able to raise climate change up the global agenda and mobilise efforts to do something about, albeit not without a lot of struggle and difficulty.
- Here some positive message is clear (though perhaps insufficient)
Looking back at Figure 2, it appears that the two clusters of storylines seen in Figure 5 occupy two distinct spaces on the graph: 40,41 and 37 occupy the bottom right; 38, 39 and 42 occupy the top left.
3. Specialist coding of themes within storylines
By specialist, I mean a person, such as a ParEvo facilitator, or others identified by them who have relevant skills. When examining the storylines generated during a ParEvo exercise they might recognise that the various storylines have different kinds of actors involved and refer to different kinds of events. These could be coded (i.e. categorised) and then analysed to identify: (a) the frequencies of different categories, and (b) the co-occurrence of these categories. Organisers of one of the completed ParEvo exercises have already gone down this route, categorising and coding the content of their generated storylines.
As I have explained elsewhere, I think that network analysis and visualisation software (such as Ucinet/Netdraw) can be a particularly useful means of analysing the relationships between the coded contents of narrative data.
Using the data generated by the content analysis mentioned above, I generated a network visualisation representing the co-occurrences of different themes within contributions, as shown below. Nodes = themes, links between nodes = co-occurrence of the linked themes. Red nodes = problem themes, green nodes = solution themes. Larger nodes = themes mentioned more frequently. Bear in mind that this is the core of a larger network structure; the peripheral themes with less frequent co-occurrences have been filtered out of this view. What is conspicuous about this network is that problem themes were more densely connected, i.e. had much higher levels of co-occurrence, when compared to solution themes. That does not augur well for the resolution of those problems, should future reality actually develop along these lines.
Making sense of complex network diagrams like this can be difficult. One other approach, after comparing core and periphery, is to take a look at selected ego-networks. An ego-network perspective selects one node, then the other nodes it is connected to, and the relationships between those other nodes. In the example below the selected ego node is one of the solution themes: vaccines. It is then connected to some but not all of the problem themes and some but not all of the solution themes. The network diagram also shows the frequency of mentions for each theme node, and the frequency of co-occurrence for each link between nodes.
The significance of the co-occurrence of any two themes needs to be assessed in the light of the frequencies of mentions of the individual linked nodes. So, for example, while the “crime and violence..” theme has 9 co-occurrences with the “food shortage and stockpiling …” theme, this was only a small proportion of the 23 mentions of the “food shortage and stockpiling …” theme. If these frequencies of mention and co-occurrence figures are placed in a Confusion Matrix structure (as shown below) it then appears that:
- “crime and violence ..” were almost sufficient, but by no means necessary, for “food shortage and stockpiling” OR…
- “food shortage and stockpiling” were almost necessary, but by no means sufficient, for “crime and violence ..”
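The arithmetic behind these two readings can be sketched as follows. The 9 co-occurrences and the 23 mentions of the food shortage theme come from the text above; the total number of mentions of the crime and violence theme is not given here, so the value of 10 used below is a hypothetical placeholder chosen only to illustrate "almost sufficient".

```python
def sufficiency_and_necessity(n_both, n_a, n_b):
    """Simple consistency ratios for 'A is sufficient / necessary for B'.

    n_both: cases with both A and B; n_a: cases with A; n_b: cases with B.
    """
    return {
        # How often B is present when A is present.
        "a_sufficient_for_b": n_both / n_a,
        # How often A is present when B is present.
        "a_necessary_for_b": n_both / n_b,
    }

# A = "crime and violence", B = "food shortage and stockpiling".
result = sufficiency_and_necessity(
    n_both=9,   # co-occurrences, from the text
    n_a=10,     # HYPOTHETICAL total mentions of A (not stated in the text)
    n_b=23,     # mentions of B, from the text
)
print(round(result["a_sufficient_for_b"], 2))  # 0.9
print(round(result["a_necessary_for_b"], 2))   # 0.39
```

Swapping the roles of A and B swaps the two ratios, which is why the same figures support both of the statements above: if A is nearly sufficient for B, then B is nearly necessary for A.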
There are two problems with the above analysis. One is that the density of the network may be partly a measurement artefact. Co-occurrence was measured using iterations as the unit of analysis. If individual contributions were used instead, the incidence of co-occurrence is likely to be lower. There were 10 contributions per iteration in this particular exercise, spread over 7 iterations.
The other is that while potential causal contributions can be identified for each link in the resulting network, it is not possible to identify the potential causal contributions of packages of links connected to an event of interest. That is, a configurational analysis is not possible. Yet in a complex system it is more than likely that most individual causal influences will be part of a package, rather than necessary and/or sufficient by themselves. The solution to this problem is, I think, the same as the above: use individual contributions as the unit of analysis rather than individual iterations. In the example used above, this would provide 60 “cases” compared to the 7 provided by using iterations as cases. There are two types of software that can then be used to do this type of configurational analysis: prediction modelling applications (BigML, EvalC3) and Qualitative Comparative Analysis (QCA) software.
2020 04 18 update: When the above content analysis was carried out the incidence of each theme was counted per iteration, over 7 iterations. Two of the theme categories were “Problems” (two sub-sets of 33 and 33) and “Solutions” (1 set of 29). As the iterations progressed two trends were visible:
- The number of problem themes (and their total incidence) declined.
- The number of solution themes (and their total incidence) increased.
It subsequently appeared that this trend could be largely explained by changes in guidance given by the exercise facilitator.
Semi-automated coding is now possible in two forms.
1. Keyword searches
These can now be carried out using the search facility within ParEvo. The results can then be saved and downloaded. Results of downloaded keyword searches are in the form of an affiliation matrix, where each row represents a keyword search, each column represents a particular contribution, and each cell value of one or zero represents whether or not that row's keyword was found in that column's contribution. These matrices can then be analysed using social network analysis software such as Ucinet/NetDraw.
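One common way of analysing an affiliation matrix of this kind is to convert it into a keyword-by-keyword co-occurrence matrix, where each cell counts the contributions in which both keywords were found. The sketch below is illustrative only: the keywords and the 0/1 values are made up, and the dictionary layout is an assumption rather than the actual format of the ParEvo download.

```python
# Hypothetical affiliation matrix: keyword -> 0/1 flag per contribution (column).
affiliation = {
    "flood":   [1, 1, 0, 1],
    "drought": [0, 1, 0, 1],
    "vaccine": [1, 0, 1, 0],
}

def cooccurrence(matrix):
    """Return {(keyword_a, keyword_b): number of contributions containing both}."""
    keywords = list(matrix)
    return {
        (a, b): sum(x * y for x, y in zip(matrix[a], matrix[b]))
        for a in keywords
        for b in keywords
    }

co = cooccurrence(affiliation)
print(co[("flood", "drought")])    # contributions mentioning both keywords
print(co[("flood", "flood")])      # diagonal: total mentions of one keyword
```

The diagonal cells give each keyword's total frequency, which is needed when judging how significant any particular co-occurrence count is.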
2. Text analytics software
This can now be used to automatically extract keywords, or phrases of two, three or more associated words, and words used to describe people, organisations, or places. Free and easy-to-use forms of software of this kind include: text2data.com, Rapid Table, and MonkeyLearn.
The same kinds of software can be used for “sentiment analysis”. I plan to use this to see if contributions containing more negative sentiments are selected for or against, in a given ParEvo exercise. For more on the reliability of sentiment analysis methods see Ribeiro, F. N., Araújo, M., Gonçalves, P., Benevenuto, F., & Gonçalves, M. A. (2016). SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods.
4. Machine learning used to find storyline clusters
Storylines can also be grouped or clustered using machine learning algorithms. One of these algorithms, which I have tried, is called topic modelling. This can be done using an online machine learning platform known as BigML, which I can recommend.
The completed storylines, generated by a ParEvo exercise, can be downloaded by the Facilitator. These can then be uploaded to BigML in CSV file format, where each row contains a complete storyline.
Within BigML choices can be made as to how many clusters are to be identified. In one of my trial runs, I set this choice as 2, so I could compare the results to those of the pile sorting exercise described above, which also generated 2 piles (per participant).
Two kinds of results are generated. One is a list of words that are most probably associated with each particular cluster, in order of probability. The other is a list of the storylines along with the probabilities of which clusters they belong to, given the words found in those storylines. Here are examples of both of those outputs:
The results are broadly similar to the pile sorting exercise results shown in Figure 5: storylines 38, 39 and 42 are in one cluster and storylines 40, 41 and 37 are in another cluster (but bear in mind the pile sorting exercise did not include all original participants).
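The degree of agreement between two such groupings can also be quantified rather than judged by eye. One simple option, which is my suggestion rather than something BigML reports, is the Rand index: the proportion of storyline pairs that both groupings treat the same way (together in both, or apart in both). The cluster memberships below are those stated in the text; the cluster labels themselves are arbitrary.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Share of item pairs on which two groupings agree (1.0 = identical)."""
    pairs = list(combinations(sorted(labels_a), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)

# Storyline ID -> cluster label, using the memberships described in the text.
pile_sort = {37: "x", 40: "x", 41: "x", 38: "y", 39: "y", 42: "y"}
topic_model = {37: 1, 40: 1, 41: 1, 38: 2, 39: 2, 42: 2}

print(rand_index(pile_sort, topic_model))  # 1.0

# HYPOTHETICAL variation: what if storyline 42 had landed in the other cluster?
variant = dict(topic_model)
variant[42] = 1
print(round(rand_index(pile_sort, variant), 2))  # 0.67
```

A value of 1.0 only says that the two methods split the storylines in the same way; it says nothing about why, so the participants' own pile descriptions remain the main source of interpretation.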
It may be coincidental, but the 2 least similar surviving storylines listed above (39 and 40) are respectively about events happening in Greenland and the Pacific!
One of the challenges when using topic modelling as a clustering method is how to label each cluster. The BigML algorithm does this by selecting the word with the highest probability of being associated with a cluster. But this is not necessarily meaningful to humans. Scanning the other words with high probabilities may help, but not always. Another option that I have explored is to do predictive modelling to find out which combinations of keywords best predict membership of a cluster. And another option is simply to take the extreme cases, as I have above, and eyeball these to identify, in one’s own opinion, how they most differ.
Evaluating participatory reconstruction of histories
All the above has been about evaluating future scenarios. But as pointed out on the Purpose and Design page, ParEvo can also be used to reconstruct alternate histories, from a given point and location onwards. In those circumstances what sort of criteria would be relevant to the evaluation of the surviving storylines? Some candidates might be:
- Availability of evidence?
- Verifiability – of the events described if no evidence is yet available?
- Continuity/coherence – are there gaps and discontinuities?
- Salience – are the most important events included?