On this page:
- Using predefined evaluation criteria
- Using emergent criteria
- Specialist coding of themes within storylines
- Machine learning
- Evaluating participatory reconstruction of histories
1. Monitoring versus evaluation: Most of this page discusses content analysis on the tacit assumption that an evaluation is taking place and that various forms of content analysis could be part of that process. That could be misleading. I have now set up a separate page on Qualitative Monitoring, arguing that this is well worth paying attention to.
2. Surviving versus extinct storylines: Most of the discussion below focuses on the analysis of the contents of surviving storylines. However, in many exercises these are outnumbered by extinct storylines. For more on the significance of extinct storylines go to: Analysing Extinct Storylines
1. Using predefined evaluation criteria
The ParEvo web app has a built-in evaluation stage. When this is triggered by the facilitator, at a time of their own choosing, the following panel appears above any currently visible storyline.
1.1 Likelihood and desirability
Participants are able to scan across all surviving storylines and click on each of the above options to indicate which storyline best fits each of these criteria. These choices are recorded and aggregated by the web app, and can be downloaded as a dataset by the facilitator. When loaded into Excel, one way of visualising the aggregated evaluation results is as follows:
Each circle in the graph represents a particular storyline, and the numbers beside the circle reflect the ID of that storyline. The Y axis shows the degree to which the storyline has a low versus high probability of actually happening, and the X axis shows the degree to which the participants think the storyline concerns something that has low versus high desirability.
The associated data matrix is shown in Figure 3.
1.2 Probability and uncertainty
The central position of many of the storylines in this graph reflects contested or uncertain judgements about them. If there were uniform agreement on the probability and desirability of a storyline then it would be located in one of the outer corners of this graph. More central positions are net positions on desirability and probability. Or they can reflect the absence of judgements (i.e. not selected by anyone as most or least desirable or likely).
These distinctions have some degree of correspondence with the risk versus uncertainty distinction identified by economists (Knight). They would distinguish between:
- Scenarios where some relative probabilities can be assigned (identifiable risks)
- Scenarios where it is not possible to do so (uncertainties)
Except that in the above matrix there are four categories, not two:
- Scenarios where there is high agreement on probability. No examples in Figure 3 above
- Scenarios where probability is partially agreed on (some 0 values in the matrix). Examples are #42 and #38 in Figure 3
- Scenarios where probability judgements are contradictory, e.g. assessed as both more and less probable. Examples are #41 and #37 in Figure 3
- Scenarios where there are no judgements on probability (all 0 values). Examples are #39, #40 and #43 in Figure 3
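The four categories can be sketched as a simple classification rule. This is only an illustration: the function name, the 75% threshold used for "high agreement", and the vote counts are all assumptions of mine, chosen to reproduce the pattern described above rather than taken from the Figure 3 data.

```python
from typing import Dict, Tuple


def classify_probability_judgements(
    votes: Dict[int, Tuple[int, int]],
    n_participants: int,
    agreement_share: float = 0.75,  # assumed cut-off for "high agreement"
) -> Dict[int, str]:
    """Assign each storyline to one of the four categories.

    votes[storyline_id] = (times chosen as most likely,
                           times chosen as least likely)
    """
    categories = {}
    for storyline_id, (most, least) in votes.items():
        if most == 0 and least == 0:
            categories[storyline_id] = "no judgements"
        elif most > 0 and least > 0:
            categories[storyline_id] = "contradictory"
        elif max(most, least) >= agreement_share * n_participants:
            categories[storyline_id] = "high agreement"
        else:
            categories[storyline_id] = "partial agreement"
    return categories


# Hypothetical vote counts for six participants (not the real exercise data)
example_votes = {
    37: (1, 3), 38: (2, 0), 39: (0, 0), 40: (0, 0),
    41: (1, 3), 42: (2, 0), 43: (0, 0),
}
result = classify_probability_judgements(example_votes, n_participants=6)
for storyline_id, category in sorted(result.items()):
    print(storyline_id, category)
```

The same rule could be applied to the desirability votes. The threshold is a judgement call and would need to be set for each exercise.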
This may be a more realistic view. Here is one commentary on the risk/uncertainty distinction: “Some economists have argued that this distinction is overblown. In the real business world, this objection goes, all events are so complex that forecasting is always a matter of grappling with “true uncertainty,” not risk; past data used to forecast risk may not reflect current conditions, anyway. In this view, “risk” would be best applied to a highly controlled environment, like a pure game of chance in a casino, and “uncertainty” would apply to nearly everything else.”
Perhaps, unlike the striking philosophers in The Hitchhiker's Guide to the Galaxy, we should not be demanding “rigidly defined areas of doubt and uncertainty” 🙂
1.3 Alternative criteria
It is now possible for a ParEvo exercise Facilitator to change the default evaluation criteria. Other possible criteria that could be used, but have not yet been tried out, are:
- Equity and Sustainability, where each axis ranges from the most to the least equitable/sustainable change, as described in the different ParEvo storylines
- Individual versus Society level changes, combined with Formal versus Informal change. This frame is borrowed from, and described in detail on, page 22 of The ALIV[H]E Framework: Action Linking Initiatives on Violence Against Women and HIV Everywhere.
2. Using emergent criteria
2.1 Pile sorting by participants
This approach uses a method described as pile or card sorting. It can be done manually or through an online survey. In both cases, participants are asked to look at the surviving storylines and sort them into 2 piles, which can be of any size. The contents of each pile should have some characteristics in common which are not present in the other pile. Those characteristics should be ones that the participant sees as significant, or at least interesting, in the context of the aim of the particular ParEvo exercise. Once the 2 piles are constructed the participant should then describe what is unique about the contents of each of the piles.
Here is an example of how the exercise is presented via a Survey Monkey online survey:
This exercise generates a dataset describing which storylines were put into which piles (and the characteristics of those piles) by which participants. Social network analysis software can then be used to provide an aggregate perspective, showing how storylines are connected to each other by frequently being placed in the same piles.
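The aggregation step can be sketched as follows. This is a minimal illustration, assuming the survey responses have already been converted into one list of piles per participant; the function name and the example sorts are hypothetical, not data from the exercise.

```python
from collections import Counter
from itertools import combinations


def pile_cooccurrence(sorts):
    """Count how often each pair of storylines was placed in the same pile.

    sorts: one entry per participant, each a list of piles,
           where a pile is a set of storyline IDs.
    """
    counts = Counter()
    for piles in sorts:
        for pile in piles:
            # Every pair inside a pile counts as one co-placement
            for a, b in combinations(sorted(pile), 2):
                counts[(a, b)] += 1
    return counts


# Hypothetical sorts by three participants
sorts = [
    [{37, 40, 41}, {38, 39, 42}],
    [{40, 41}, {37, 38, 39, 42}],
    [{37, 41}, {38, 39, 40, 42}],
]
pairs = pile_cooccurrence(sorts)

# Weighted edge list, strongest links first
for (a, b), weight in pairs.most_common():
    print(a, b, weight)
```

The resulting edge list (storyline, storyline, weight) is the kind of input that network analysis packages can usually import, with the weight showing how many participants put the two storylines together.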
The same exercise generates qualitative data, i.e. the descriptions given by participants of the membership of each pile: what the members had in common and how they differed from those in the other pile. Here are some of the descriptions of what is common between some of the above storylines:
Storyline 41 and 40
- These described more generalised societal problems
- These are attempts to explore issues (the challenge was, however, that issues were all at different scale and focus… making the threads not comparable)
- These describe bigger issues, more human-focused solutions
Storyline 41 and 37
- These storylines all involve named individuals and how they are reacting to climate change and other people’s reactions to climate change. It brings our focus down to a fairly micro level, where we can identify what might or could be happening.
- These show some adaption, accommodation though perhaps not enough.
Storyline 41 and 40 and 37
- These are optimistic
- These have more interesting information on the actions that pushed towards the mitigation of climate change effects
- These revealed the seeking of alternative actions and decision to understand and mitigate climate change
- These are broadly positive, showing signs that people will be able to raise climate change up the global agenda and mobilise efforts to do something about, albeit not without a lot of struggle and difficulty.
- Here some positive message is clear (though perhaps insufficient)
Looking back at Figure 2, it appears that the two clusters of storylines seen in Figure 5 occupy two distinct spaces on the graph: 40, 41 and 37 occupy the bottom right; 38, 39 and 42 occupy the top left.
2.2 Pile sorting variants by others
Pile sorting by observers
The same kind of pile sorting exercise can also be carried out by others, such as Observers. One option currently being explored is for a set of Observers to review each new set of contributions, and for each of them to sort those contributions into two piles (of any size) according to what they see as the most significant difference between them. These simple analyses could then serve two purposes:
- Firstly, where an observer identifies a particular type as being dominant, e.g. most are pessimistic, the facilitator may choose to add a comment to one or more of these kinds of contributions, raising a question like this: “Are there any possible positive outcomes here?” This might help restore more diversity of responses in the next round. But it would be a judgment call as to when and where they wanted to make such comments.
- The second use is as a form of running/cumulative content analysis, in contrast to the more common practice of only doing this at the end of an exercise (as in most ParEvo exercises to date). The Facilitator would accumulate a matrix of differences x contributions (with different observers' differences being identifiable). This data set could then be analysed to “characterise” the different kinds of storylines, e.g. surviving versus extinct, likely versus unlikely, etc. This form of analysis would differ from much content analysis in that it would engage multiple observers throughout, rather than one or two content analysts at the end of the exercise. Repetition of identified differences, by different observers, might give more confidence in their significance (but would probably also need further discussion).
In-depth pile sorting
Another type of pile sorting, known as Hierarchical Card Sorting (HCS), is particularly suited for use by single individuals, rather than a group of people (whose responses are then aggregated). HCS generates a more detailed nested categorisation of sorted entities. Because the ParEvo storylines are complex entities, a variant of the HCS has been developed to make the sorting task less cognitively demanding for respondents. This involves a sequence of pair comparisons of individual items, rather than comparisons of groups of items. Another simplifying difference is the focus on the final paragraph (and outcome) of each storyline as the items to be compared, rather than the whole text of each storyline. It is described in detail here: https://mande.co.uk/special-issues/hierarchical-card-sorting-hcs/#incremental Here is an example of a nested classification of storyline outcomes from a recent exercise. Storyline IDs are listed on the left.
3. Specialist coding of themes within storylines
3.1 Manual coding
By specialist, I mean a person, such as a ParEvo facilitator, or others identified by them who have relevant skills. When examining the storylines generated during a ParEvo exercise they might recognise that the various storylines have different kinds of actors involved and refer to different kinds of events. These could be coded (i.e. categorised) and then analysed to identify: (a) the frequencies of different categories, and (b) the co-occurrence of these categories. Organisers of one of the completed ParEvo exercises have already gone down this route, categorising and coding the content of their generated storylines.
As I have explained elsewhere, I think that network analysis and visualisation software (such as Ucinet/Netdraw) can be a particularly useful means of analysing the relationships between the coded contents of narrative data.
Using the data generated by the content analysis mentioned above, I generated a network visualisation representing the co-occurrences of different themes within contributions, as shown below. Nodes = themes; links between nodes = co-occurrence of the linked themes. Red nodes = problem themes; green nodes = solution themes. Larger nodes = themes mentioned more frequently. Bear in mind that this is the core of a larger network structure: the peripheral themes with less frequent co-occurrences have been filtered out of this view. What is conspicuous about this network is that problem themes were more densely connected, i.e. had much higher levels of co-occurrence, when compared to solution themes. That does not augur well for the resolution of those problems, should future reality actually develop along these lines.
Making sense of complex network diagrams like this can be difficult. One other approach, after comparing core and periphery, is to take a look at selected ego-networks. An ego-network perspective selects one node, then the other nodes it is connected to, and the relationships between those other nodes. In the example below the selected ego node is one of the solution themes: vaccines. It is connected to some but not all of the problem themes and some but not all of the solution themes. The network diagram also shows the frequency of mentions for each theme node, and the frequency of co-occurrence for each link between nodes.
The significance of the co-occurrence of any two themes needs to be assessed in the light of the frequencies of mentions of the individual linked nodes. So, for example, while the “crime and violence..” theme has 9 co-occurrences with the “food shortage and stockpiling …” theme, this was only a small proportion of the 23 mentions of the “food shortage and stockpiling …” theme. If these frequencies of mention and co-occurrence are placed in a Confusion Matrix structure (as shown below) it then appears that:
- “crime and violence ..” were almost sufficient, but by no means necessary, for “food shortage and stockpiling” OR…
- “food shortage and stockpiling” were almost necessary, but by no means sufficient, for “crime and violence ..”
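The reasoning behind these two readings can be sketched with simple ratios. The 9 co-occurrences and 23 mentions come from the text above; the total of 10 mentions for “crime and violence ..” is a hypothetical figure used only to make the example run, and the function name is my own.

```python
def necessity_sufficiency(both, a_total, b_total):
    """Treat theme A as a possible condition for theme B.

    both:    number of cases where A and B co-occur
    a_total: number of cases where A is mentioned
    b_total: number of cases where B is mentioned
    """
    return {
        # Share of A's cases where B is also present (A sufficient for B?)
        "sufficiency": both / a_total,
        # Share of B's cases where A is also present (A necessary for B?)
        "necessity": both / b_total,
    }


CRIME_TOTAL = 10  # hypothetical: not reported in the text above

# "crime and violence" as a condition for "food shortage and stockpiling"
crime_for_food = necessity_sufficiency(both=9, a_total=CRIME_TOTAL, b_total=23)

# The reverse reading: "food shortage" as a condition for "crime and violence"
food_for_crime = necessity_sufficiency(both=9, a_total=23, b_total=CRIME_TOTAL)

print(crime_for_food)
print(food_for_crime)
```

With these figures sufficiency is high and necessity is low in the first reading, and the reverse in the second, which is the pattern described in the two bullet points above.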
There are two problems with the above analysis. One is that the density of the network may be partly a measurement artefact. Co-occurrence was measured using iterations as the unit of analysis. If individual contributions were used instead, the incidence of co-occurrence would be likely to be lower. There were 10 contributions per iteration in this particular exercise, spread over 7 iterations.
The other is that while potential causal contributions can be identified for each link in the resulting network, it is not possible to identify the potential causal contributions of packages of links connected to an event of interest. That is, a configurational analysis is not possible. Yet in a complex system it is more than likely that most individual causal influences will be part of a package, rather than necessary and/or sufficient by themselves. The solution to this problem is, I think, the same as the above: use individual contributions as the unit of analysis rather than individual iterations. In the example used above, this would provide 60 “cases” compared to the 7 provided by using iterations as cases. There are two types of software that can then be used to do this type of configurational analysis: prediction modelling applications (BigML, EvalC3) and Qualitative Comparative Analysis (QCA) software.
2020 04 18 update: When the above content analysis was carried out the incidence of each theme was counted per iteration, over 7 iterations. Two of the theme categories were “Problems” (two sub-sets of 33 and 33) and “Solutions” (1 set of 29). As the iterations progressed two trends were visible:
- The number of problem themes (and their total incidence) declined.
- The number of solution themes (and their total incidence) increased.
It subsequently appeared that this trend could be largely explained by changes in guidance given by the exercise facilitator.
3.2 Automated coding
This is now possible in two forms.
3.2.1 Keyword searches
These can now be carried out using the search facility within ParEvo. The results can then be saved and downloaded. Results of downloaded keyword searches take the form of an affiliation matrix, where each row represents a keyword search, each column represents a particular contribution, and each cell value of one or zero shows whether that row's keyword was found in that column's contribution or not. These matrices can then be analysed using social network analysis software such as Ucinet/NetDraw.
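One simple analysis of such a matrix is to count how often pairs of keywords are found in the same contribution. The sketch below assumes the downloaded matrix has already been read into a dictionary of rows; the exact layout of the downloaded file is not shown here, and the keywords and values are hypothetical.

```python
def keyword_cooccurrence(matrix):
    """Count contributions in which both keywords of each pair were found.

    matrix: dict mapping keyword -> list of 0/1 values,
            one per contribution, in the same column order for every row.
    """
    keywords = list(matrix)
    return {
        (k1, k2): sum(x * y for x, y in zip(matrix[k1], matrix[k2]))
        for i, k1 in enumerate(keywords)
        for k2 in keywords[i + 1:]
    }


# Hypothetical affiliation matrix: 3 keyword searches x 5 contributions
affiliation = {
    "flood":     [1, 0, 1, 1, 0],
    "migration": [1, 1, 0, 1, 0],
    "drought":   [0, 0, 1, 0, 1],
}
cooccurrence = keyword_cooccurrence(affiliation)
for pair, count in cooccurrence.items():
    print(pair, count)
```

The output is a keyword x keyword edge list, the one-mode version of the affiliation matrix, which can be visualised as a network of themes.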
3.2.2 Text analytics software
This can now be used to automatically extract keywords, or phrases of two, three or more associated words, and words used to describe people, organisations, or places. Free and easy-to-use forms of software of this kind include: text2data.com, Rapid Table, and MonkeyLearn.
The same kinds of software can be used for “sentiment analysis”. I plan to use this to see if contributions containing more negative sentiments are selected for or against, in a given ParEvo exercise. For more on the reliability of sentiment analysis methods see Ribeiro, F. N., Araújo, M., Gonçalves, P., Benevenuto, F., & Gonçalves, M. A. (2016). SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods.
4. Machine learning
4.1 Used to find storyline clusters
Storylines can also be grouped or clustered using machine learning algorithms. One of these algorithms, which I have tried, is called topic modelling. This can be done using an online machine learning platform known as BigML, which I can recommend.
The completed storylines, generated by a ParEvo exercise, can be downloaded by the Facilitator. These can then be uploaded to BigML in CSV file format, where each row contains a complete storyline.
Within BigML choices can be made as to how many clusters are to be identified. In one of my trial runs, I set this to 2, so that I could compare the results to those of the pile sorting exercise described above, which also generated 2 piles (per participant).
Two kinds of results are generated. One is a list of the words that are most probably associated with each particular cluster, in order of probability. The other is a list of the storylines along with the probabilities of their belonging to each cluster, given the words found in those storylines. Here are examples of both of those outputs:
The results are broadly similar to the pile sorting exercise results shown in Figure 5: storylines 38, 39 and 42 are in one cluster and storylines 40, 41 and 37 are in another cluster (but bear in mind the pile sorting exercise did not include all original participants).
It may be coincidental, but the 2 least similar surviving storylines listed above (39 and 40) are respectively about events happening in Greenland and the Pacific!
One of the challenges when using topic modelling as a clustering method is how to label each cluster. The BigML algorithm does this by selecting the word with the highest probability of being associated with a cluster. But this is not necessarily meaningful to humans. Scanning the other words with high probabilities may help, but not always. Another option that I have explored is to do predictive modelling to find out which combinations of keywords best predict membership of a cluster. And another option is simply to take the extreme cases, as I have above, and eyeball these to identify, in one's own opinion, how they most differ.
4.2 Used to predict storyline survival
One of the datasets that can be downloaded is a Participant x Storyline matrix. This matrix shows participants in the rows and storylines in the columns, with cell values indicating which participant contributed to which storyline. The extinct storylines are shown in the uppermost section of the matrix and the surviving storylines in the lower part of the matrix. This data can be imported into and analysed by the EvalC3 Excel app. Using any one of its four search algorithms, it is possible to identify whether the contributions of any particular combination of participants to a storyline are a good predictor of whether that storyline will survive or become extinct.
My initial exploration of this type of analysis did not find any strong relationship between who contributed to a storyline and its survival or extinction. But this finding cannot be guaranteed for any and every exercise. It would be worth carrying out on each new exercise.
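The general idea of searching for a predictive combination of participants can be sketched as below. This is not EvalC3's own algorithm, just a small exhaustive search over combinations, scored by simple accuracy. The function name, the rule tested ("all members of the combination contributed, so the storyline survives") and the data are all hypothetical.

```python
from itertools import combinations


def best_participant_combo(contributors, survived, max_size=2):
    """Find the participant combination that best predicts survival.

    contributors: dict storyline_id -> set of participants who contributed
    survived:     dict storyline_id -> True (surviving) or False (extinct)
    Returns (accuracy, combo) for the best-scoring rule
    "every participant in combo contributed => storyline survives".
    """
    participants = sorted(set().union(*contributors.values()))
    best = (0.0, ())
    for size in range(1, max_size + 1):
        for combo in combinations(participants, size):
            # A prediction is correct when rule outcome matches actual outcome
            correct = sum(
                (set(combo) <= contributors[s]) == survived[s]
                for s in contributors
            )
            accuracy = correct / len(contributors)
            if accuracy > best[0]:
                best = (accuracy, combo)
    return best


# Hypothetical Participant x Storyline data for six storylines
contributors = {
    1: {"A", "B"}, 2: {"A", "C"}, 3: {"B", "C"},
    4: {"A", "B", "C"}, 5: {"C"}, 6: {"B"},
}
survived = {1: True, 2: False, 3: False, 4: True, 5: False, 6: False}

best = best_participant_combo(contributors, survived)
print(best)
```

With so few storylines a perfect fit can easily arise by chance, so any model found this way is a prompt for closer reading of the storylines rather than a finding in itself.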
Other similar forms of predictive modelling should also be possible. For example, using the co-occurrence of various keywords of interest as possible predictors of storyline survival or extinction. Data on the results of keyword searches can be downloaded in the form of an affiliation matrix showing which keywords were found in which numbered contributions. The latter could be coded according to whether they were part of extinct storyline segments, or not.
4.3 Used to summarise pile sorting data
Data generated by free sort exercises, discussed in section 2.2 above, can be used to summarise the contents of each storyline as a string of attributes. In each iteration observers may have free-sorted the contributions into two categories and then named each of these. This means that each storyline will be describable by the attributes that each of its constituent contributions now has. That data can then be summarised in a matrix where rows = storylines and columns = attributes (2 per iteration, shown as 0 or 1). An additional column can be added that describes the outcome status of each storyline. This could be its survival or extinction, or some other attribute based on free sorting or some other designed coding. The Excel-based EvalC3 app can then be used to identify one or more predictive models, which indicate what combinations of storyline attributes best predict the storyline outcome of interest.
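Building that matrix can be sketched as follows, assuming a single observer's sorts and one contribution per storyline per iteration. The function name, the data structures, the pile labels and the example values are all hypothetical.

```python
def storyline_attribute_rows(storylines, pile_of, labels, outcome):
    """Build a storyline x attribute table with a final outcome column.

    storylines: dict storyline_id -> list of contribution IDs, one per iteration
    pile_of:    dict contribution_id -> 0 or 1 (the pile it was sorted into)
    labels:     list, per iteration, of (name of pile 0, name of pile 1)
    outcome:    dict storyline_id -> outcome value, e.g. "surviving"
    """
    header = ["storyline"]
    header += [name for pair in labels for name in pair]
    header.append("outcome")

    rows = []
    for storyline_id, contributions in storylines.items():
        row = [storyline_id]
        for contribution_id in contributions:
            pile = pile_of[contribution_id]
            # Two 0/1 columns per iteration, one for each named pile
            row += [1 if pile == 0 else 0, 1 if pile == 1 else 0]
        row.append(outcome[storyline_id])
        rows.append(row)
    return header, rows


# Hypothetical two-iteration exercise
storylines = {"S1": ["c1", "c3"], "S2": ["c1", "c4"], "S3": ["c2", "c5"]}
pile_of = {"c1": 0, "c2": 1, "c3": 0, "c4": 1, "c5": 1}
labels = [("optimistic", "pessimistic"), ("local", "global")]
outcome = {"S1": "surviving", "S2": "extinct", "S3": "surviving"}

header, rows = storyline_attribute_rows(storylines, pile_of, labels, outcome)
print(header)
for row in rows:
    print(row)
```

A table in this shape, with one row per case, binary attribute columns and an outcome column, is the general form that predictive modelling tools expect. Where several observers have sorted the same contributions, their columns would need to be kept separate or combined by some agreed rule.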
5. Evaluating participatory reconstruction of histories
All the above has been about evaluating future scenarios. But as pointed out on the Purpose and Design page, ParEvo can also be used to reconstruct alternate histories, from a given point and location onwards. In those circumstances what sort of criteria would be relevant to the evaluation of the surviving storylines? Some candidates might be:
- Availability of evidence?
- Verifiability – of the events described if no evidence is yet available?
- Continuity/coherence – are there missing gaps and discontinuities?
- Salience – are the most important events included?