Evaluation

This page summarises the method and results of participants' evaluations in the ParEvo pre-tests (the MSC and Brexit exercises). In the pre-tests, participants were asked to assess the surviving storylines in terms of their probability and desirability. The same approach is now being built into the ParEvo web app. Later developments will enable the use of a wider range of evaluation tools, e.g.

  • The ability to use other evaluation criteria, e.g.
    • Consistency / coherence
    • Plausibility
    • Relevance / Utility
    • Originality
    • Verifiability
  • The use of open-ended questions e.g.
    • What is missing from all these surviving storylines?
    • What is the most important difference between the surviving storylines?

Go to: MSC exercise or Brexit exercise

MSC exercise

After the 4th iteration participants were sent a final Survey Monkey survey. The survey form listed all the surviving storylines and asked participants two questions, as follows:

1. This question is about probability.

In your own opinion, which of these storylines do you think is most likely to happen in real life?  And which is least likely?

Please identify one storyline in the "most" column, and one in the "least" column.

Go to this web page to see all the storylines, and use mouseover to see each storyline’s text. PS: Try to pay attention to whole storylines, from beginning to end, not just the most recent text additions.

2. This question is about desirability.

From your own perspective, which of these storylines would you most want to see happen in real life? And which would you least want to see?

Please identify one storyline in the "most" column, and one in the "least" column.

Go to this web page to see all the storylines, and use mouseover to see each storyline’s text. PS: Try to pay attention to whole storylines, from beginning to end, not just the most recent text additions.

The responses, from 9 of the 11 original participants, were aggregated in Excel and then plotted in the chart below. Each circle represents one storyline, now labelled. Larger circles = higher levels of agreement that a storyline is more or less probable or desirable. Note that the plotted position on each axis represents the net number of people judging a story as more versus less desirable and more versus less probable.
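The aggregation described above was done in Excel, but its logic can be sketched in a few lines. The code below is a minimal illustration using made-up votes (the storyline labels are real; the votes are not): each respondent names one "most" and one "least" storyline per criterion, the net score gives the plotted position on that axis, and the total vote count drives circle size.

```python
# Minimal sketch of the net-score aggregation, with hypothetical votes.
from collections import Counter

def net_scores(votes):
    """votes: list of (most, least) storyline-label pairs for one criterion.

    Returns (net, total): net = most-votes minus least-votes per storyline
    (plotted position); total = all votes attracted (circle size).
    """
    net, total = Counter(), Counter()
    for most, least in votes:
        net[most] += 1
        net[least] -= 1
        total[most] += 1
        total[least] += 1
    return net, total

# Hypothetical responses from three participants on the probability question:
prob_votes = [("1.3.1.1.1", "1.2.3.1.1"),
              ("1.3.1.1.1", "1.8.1.1.1"),
              ("1.6.1.1.1", "1.2.3.1.1")]
net, total = net_scores(prob_votes)
print(net["1.3.1.1.1"])    # 2  (net "most probable")
print(net["1.2.3.1.1"])    # -2 (net "least probable")
print(total["1.2.3.1.1"])  # 2  (two votes attracted)
```

The same function is applied once per criterion (probability and desirability), giving each storyline an (x, y) position and a size.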

Two storylines stood out as having high probability but contrasting desirability:

1.3.1.1.1 “The process itself proved valuable…” [More desirable]

1.6.1.1.1 “We were hearing the stories of some other areas…” [Less desirable]

[Chart: storyline ratings, net probability versus net desirability (MSC exercise)]

You can read the contents of each storyline by viewing the tree diagram here and holding your mouse over any node to see the text added at that point.

Some participants also left specific comments about their choices. You can also leave any comments of your own at the bottom of this web page.

  • Most likely to happen
    • 1.3.1.1.1: This outcome is less ambitious than the above, and therefore more likely to occur.
    • 1.6.1.1.1: The story revealed frustrations at the ground level, with deserving beneficiaries not getting the benefit of the project.
  • Least likely to happen
    • 1.2.3.1.1: It is very unlikely that a diverse group of participants would agree on a list of valuable interventions unless more effort is put into many iterations of sense-making sessions.
  • Most want to see happen
    • 1.2.3.1.1: The storyline at least shows MSC information about what participants value feeding back to change the M&E of the program. A second choice would be the 1.4 storyline, which shows program managers using the information in some way.
    • 1.3.1.1.1: Discovering a wider discrepancy when using indicator measurement alone.
  • Least want to see happen
    • 1.2.3.1.1: Finding out what people valued helped to revise the M&E plan. At one level, this is a good finding. However, it does not highlight the MSC process.
    • 1.8.1.1.1: The storyline is very centrally driven but ends up with no themes identified; this seems the opposite of emergent stories converging into MSC stories.
    • 1.9.1.1.1: It would be very disheartening to have to explain the usefulness of MSC at such a stage of the project.

Brexit exercise

After the 4th iteration participants were sent a final Survey Monkey survey, identical in form to the MSC survey described above: it listed all the surviving storylines and asked participants to identify the storylines they considered most and least likely to happen in real life, and those they would most and least want to see happen.

The responses, from 9 of the 11 original participants, were aggregated in Excel and then plotted in the chart below. Each circle represents one storyline, now labelled. Larger circles = a higher level of agreement that a storyline was more or less probable or desirable. Note that the plotted position on each axis represents the net number of people judging a story as more versus less desirable and more versus less probable.

[Chart: storyline ratings, net probability versus net desirability (Brexit exercise)]

You can read the contents of each storyline by viewing the tree diagram here and holding your mouse over any node to see the text added at that point.

Four participants also left specific comments about their choices. You can also leave any comments of your own at the bottom of this web page.

  • Most likely to happen
    • As there were four iterations, sometimes some
      iterations were more plausible or probable than others. In such
      instances, the last iteration determined how I rated the storyline
      overall.
    • Likely a hybrid of some of the answers will be the actual result.
      Hard to predict at the moment globally what is going on as a whole.
  • Most want to see happen
    • In the storylines 1.2.2.1.1 and 1.2.2.2.1, I like to see the
      “Remain wins, parliament cancels Brexit” of the 3rd iteration, but
      I don’t like to see the “decade of turbulence” or other negative
      developments. Even so, since the “cancel Brexit” was the most
      important/desirable outcome, I marked these as “Most want to see
      happen”. For 1.3.1.2.1, I very much like “a new wave of enthusiasm”
      and the “emergence of a powerful economic giant” to happen, and
      therefore marked it as “most want to see happen”, but deep down I
      very much doubt that it will come to be.
  • General comments
    • No consistent timescale, so it is not always possible to compare
      storylines internally. And, like many scenario-type narratives, how
      much of the parts do you have to agree with to agree with the whole?
      Again, like many scenario exercises, you tend to get the less
      desirable scenarios dominating.
    • No more comments. This is a good exercise already.

Probability and uncertainty

In their discussion of scenario planning in the context of disasters, Briggs and Matejova (2019) argue that the distinction between probability and uncertainty in scenarios is often conflated or not recognised, yet the two have different consequences.

In the evaluation stage of the ParEvo pre-tests, scenario uncertainty could be seen in two ways:

  • Participants gave contradictory ratings on probability (or desirability) to the same scenario, with the net result that the scenario fell in the middle of one or both axes of the graph above.
  • Participants did not choose a given scenario as more or less probable, or more or less desirable, at all. Plotting such a scenario at the midpoint of both axes is only a convention: since it was not rated at either extreme of either criterion, it could be located anywhere in a bigger space around that midpoint.
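The difference between these two cases can be made concrete with a small sketch. The thresholds below are hypothetical, purely for illustration: a storyline with offsetting votes ("contested") is distinguished from one that attracted no votes at all ("unrated"), even though both would sit near the midpoint of the plot.

```python
# Illustrative heuristic (not part of ParEvo itself) for separating the two
# kinds of "middle of the plot" storylines. Thresholds are assumptions.
def classify(net, total, votes_cast):
    """net: most-minus-least votes; total: all votes attracted;
    votes_cast: number of respondents who voted on this criterion."""
    if total == 0:
        return "unrated"      # case 2: never chosen; position is a convention
    if abs(net) <= 1 and total >= votes_cast // 2:
        return "contested"    # case 1: many offsetting most/least votes
    return "rated"            # clear net position at one extreme

print(classify(net=0, total=6, votes_cast=9))  # contested
print(classify(net=0, total=0, votes_cast=9))  # unrated
print(classify(net=4, total=5, votes_cast=9))  # rated
```

Under this sketch, "contested" and "unrated" storylines would warrant different follow-up: the former indicate real disagreement worth discussing, the latter a gap in attention.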

Meta-evaluation

Given the results shown in the scatter plot above, how can they be evaluated? There are two layers of data there. One is the rating of each storyline on each evaluation criterion: probability and desirability, in this example. The other is the degree of participant agreement on these ratings. Big circles towards the edges of the plot signify rated positions where participants were largely in agreement. Small circles towards the centre signify rated positions where participants were more in disagreement, their positive and negative ratings to some extent cancelling each other out.

Which of these types of outcome is more important and needs more attention afterwards? And how would one decide? Is usefulness a potential criterion? In other words, are there likely to be more practical implications for future actions arising from the storylines in the centre, or from those at the periphery?

The social structure of evaluation judgments

One of the data products of an evaluation process is a series of matrices, one per respondent, where rows = storylines, columns = evaluation criteria, and cell values of 1 or 0 indicate whether a given criterion was chosen for a given storyline. Using UCINET it is possible to identify the degree to which the contents of each matrix are correlated with each other. The resulting correlation matrix can then be visually represented as a network structure.
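For readers without UCINET, the core of this step is straightforward. The sketch below uses hypothetical 0/1 matrices for three respondents (flattened row by row; rows = storylines, columns = criteria) and plain Pearson correlation; UCINET's own procedures may differ in detail.

```python
# Sketch of the respondent-similarity step, without UCINET.
import math

def pearson(x, y):
    """Pearson correlation between two equal-length vectors of choices."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical flattened matrices: 3 storylines x 4 criteria per respondent
# (e.g. most/least probable, most/least desirable).
resp_a = [1, 0, 0, 0,  0, 0, 1, 0,  0, 1, 0, 1]
resp_b = [1, 0, 0, 0,  0, 0, 1, 0,  0, 1, 0, 1]  # same choices as A
resp_c = [0, 1, 0, 1,  1, 0, 0, 0,  0, 0, 1, 0]  # quite different choices

print(round(pearson(resp_a, resp_b), 2))  # 1.0  (identical judgments)
print(round(pearson(resp_a, resp_c), 2))  # -0.5 (opposing judgments)
```

Computing this for every pair of respondents yields the correlation matrix that is then drawn as a network, with line thickness proportional to the correlation.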

The example below is based on the MSC pre-test data (anonymised). Each link represents a positive correlation between two respondents' judgments. Thicker lines = a higher correlation. The three participants on the left had a larger number of correlated judgments, and the most highly correlated judgments. Those on the right had fewer correlated judgments, and these were of lower value.

[Network diagram: participants x participants correlations between evaluation judgments]

One observation from this analysis was that similarities between respondents' evaluation judgments do not simply match the similarities seen during storyline construction, i.e. in terms of which participants added to which other participants' contributions. This suggests participants' judgments change over time.

Evaluating participatory reconstruction of histories

All the above has been about evaluating future scenarios. But as pointed out on the Start Here page, ParEvo can also be used to reconstruct alternate histories, from a given point and location onwards. In those circumstances what sort of criteria would be relevant to the evaluation of the surviving storylines? Some candidates might be:

  • Availability of evidence?
  • Verifiability – of the events described, if no evidence is yet available?
  • Continuity/coherence – are there gaps and discontinuities?
  • Salience – are the most important events included?

2 thoughts on “Evaluation”

  1. Good analysis and findings. If a third party could review and give scores, then we could get a different opinion on the probability and desirability of this exercise.

  2. Hi Hanif,
    I agree with your point. The web version under development will have the option of allowing a wider group of people to make a summary judgement of the surviving storylines.
