ORGAPET Section D1:
Integrating and Interpreting Results of Evaluations

Nic Lampkin
Aberystwyth University, UK

Version 6, April 2008

D1-1    Introduction

Many of the evaluation tools presented in ORGAPET have been developed in the context of relatively simple programmes where the use of one or a few indicators presents no major problems. The challenge with complex policy programmes, such as organic action plans with their multiple objectives, action points and policy instruments, as well as multiple stakeholders and beneficiaries, is to reach a conclusion that reflects all the different elements fairly and appropriately.

This section examines how a diverse range of results can be integrated to provide an overall assessment or evaluative judgement of action plans, allowing for trade-offs and conflicts between objectives and the differing priorities of stakeholders. A range of approaches is described, from stakeholder feedback and expert judgement to formal methods such as multi-criteria and cost-benefit analysis, drawing on both qualitative information and quantitative indicators.

D1-2    Issues to be considered

It is important when synthesising the results of an evaluation to remember the original purpose of the evaluation (formative or summative) and the stage in the policy cycle (ex-ante, mid-term, ex-post), as this will have a significant influence on the way results are interpreted. Key issues that may need to be considered include:

D1-3    Interpreting and comparing indicator results

Indicators can rarely be interpreted in isolation and need to be looked at in a comparative framework in conjunction with qualitative findings. It is necessary to consider the context as a whole, the factors which help to facilitate or hinder the performance of the programme, the rationales of the programme and the process of implementation.

Evalsed provides some indications as to how the process of interpretation and comparison can be approached. Indicators can be compared with context or programme indicators, a procedure that can be carried out concisely by expressing the result as a percentage of the relevant context indicator (see Section C3 for examples). Comparison with other related indicators may also be worthwhile: for example, if planned expenditure, committed expenditure and actual expenditure are looked at together, is there evidence of implementation failure that might not be noticed when the indicators are examined individually?
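The two comparisons described above can be sketched in a few lines of code. This is an illustrative sketch only; the indicator names and figures are hypothetical, not taken from any actual programme.

```python
# Hypothetical sketch: expressing a result indicator as a percentage of its
# context indicator, and comparing related expenditure indicators side by side.

def share_of_context(result, context):
    """Express a result indicator as a percentage of the relevant context indicator."""
    return 100.0 * result / context

# e.g. organic land supported by the programme vs. total agricultural land (ha)
supported_ha, total_ha = 42_000, 600_000
coverage_pct = share_of_context(supported_ha, total_ha)  # 7.0

# Looking at planned, committed and actual expenditure together can reveal
# implementation failure that no single indicator shows on its own.
expenditure = {"planned": 10.0, "committed": 9.2, "actual": 5.1}  # EUR million
absorption_pct = share_of_context(expenditure["actual"], expenditure["planned"])
# A low absorption rate (here 51%) flags a possible implementation problem.
```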

The results and conclusions can be summarised in a synoptic table (Table D1-1), where policy measures or actions are shown in individual rows and the impacts (e.g. economic, environmental) are identified in columns. The available qualitative/quantitative data and (provisional) conclusions can be summarised in each cell.

Table D1-1: Extract from synoptic presentation of conclusions relative to an impact

Measure | Description of the equal opportunities impact
1. Training intended for the long-term unemployed | The training had a positive impact in terms of orientating women towards secure jobs traditionally occupied by men.
2. Aid for new business creation | 45% of the new businesses were created by women, compared with the national average of 30%. The difference can be attributed to the programme.
Other measures | Etc.

Source: Evalsed

Whilst it might be tempting to combine the results from several indicators into a single overall score or index, this is not advisable as important details may be lost, and the weightings used (if any) are likely to reflect only one particular perspective amongst many.

An alternative way of dealing with multiple indicators, especially when comparing different policy options, is to visualise them using radar or 'cobweb' diagrams (Figure D1-1). One policy option may perform very well on one indicator but relatively poorly on the others, while another option may score less well on that particular indicator but achieve higher scores for the others, giving a better result overall.

Figure D1-1: A hypothetical radar or ‘cobweb’ diagram
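Before indicators with different units can share the axes of a radar diagram, they need to be put on a common scale. The sketch below shows one common way of doing this (min-max rescaling to 0-1 across the options compared); the option names and values are purely illustrative.

```python
# Hypothetical sketch: preparing heterogeneous indicator values for a
# radar/'cobweb' diagram by rescaling each indicator to 0-1 across options.

options = {
    "Option A": {"uptake": 80, "income": 20, "biodiversity": 55},
    "Option B": {"uptake": 40, "income": 70, "biodiversity": 60},
}

def rescale_for_radar(options):
    """Min-max rescale each indicator to 0-1 across all options."""
    indicators = next(iter(options.values())).keys()
    scaled = {name: {} for name in options}
    for ind in indicators:
        values = [opt[ind] for opt in options.values()]
        lo, hi = min(values), max(values)
        for name, opt in options.items():
            scaled[name][ind] = (opt[ind] - lo) / (hi - lo) if hi > lo else 1.0
    return scaled

scaled = rescale_for_radar(options)
# Option A dominates on uptake (1.0 vs 0.0) but scores lower elsewhere;
# plotted on radar axes, such trade-offs become visible at a glance.
```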

When considering performance or cost-effectiveness, certain policy interventions may appear to perform less well than others, but there is also a need to consider the difficulty of the environment in which each intervention is being applied, and the relative size of the effect being achieved. For example, schools in poorer areas may achieve lower grades than schools in richer areas for the same investment but, if they are starting from a lower position, the value added could be greater. This may be resolved by a benchmarking approach using a cluster of similar situations as the basis for comparison.
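The schools example can be made concrete with a small sketch: raw final scores are benchmarked against the average value added within a cluster of similar situations. All names and figures below are hypothetical.

```python
# Hypothetical sketch of cluster benchmarking: compare value added (final minus
# baseline score) against the average for similar situations, not raw outcomes.
from collections import defaultdict

schools = [
    # (name, cluster, baseline score, final score)
    ("School X", "affluent", 70, 78),
    ("School Y", "affluent", 72, 79),
    ("School Z", "deprived", 45, 58),
    ("School W", "deprived", 44, 52),
]

def value_added(baseline, final):
    return final - baseline

cluster_va = defaultdict(list)
for _, cluster, base, final in schools:
    cluster_va[cluster].append(value_added(base, final))

# Benchmark: average value added within each cluster of similar situations.
benchmarks = {c: sum(v) / len(v) for c, v in cluster_va.items()}

# School Z has a lower final score than School X (58 vs 78), but its value
# added (13) exceeds its cluster benchmark (10.5), so it outperforms.
```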

Evalsed also identifies how indicators can be used to improve management. Here the emphasis is on managing performance rather than resources: assessments are concerned not with controlling the use of resources and the output achieved, but with overall performance in terms of results and impacts. Using this performance management approach, operators are given greater autonomy in the use of their resources. In return, they commit themselves to clearer objectives as regards the results and impacts to be obtained, and the indicators are used to measure their performance in this context.

However, the use of indicators in this way may be limited by potential adverse effects, which include:

  • Skimming-off or creaming,

  • Convergence to the average,

  • Unanticipated effects where results are subordinated to indicator scores.

Skimming-off or creaming effects can occur when organisations preferentially select beneficiaries who are most likely to provide good results or high indicator scores. For example, high examination performance grades can be achieved if only the best students are allowed to be entered. This effect is undesirable because it focuses assistance on those who are in relatively less need.

Convergence towards the average can occur if undue weight is given to poorly performing areas of activity. If resources are moved from better performing areas of activity to try to improve the poorly performing ones, improvement at the bottom end may be achieved at the cost of reduced performance at the top end, resulting in convergence towards the middle, rather than focusing on excellence.

Unanticipated effects can occur where indicators reward undesired results, or where operators work to deliver the indicator rather than the objective that the indicator is intended to reflect. For example, if the objective is to provide information to a target group but the indicator focuses strongly on the number of meetings to be held, then other forms of communication may be ignored if the operator adheres strictly to the target number of meetings.

Adverse effects inevitably appear after a system of indicators has functioned for two or three years, no matter how well it is designed. These undesirable effects are generally not foreseeable, but the possible appearance of such effects should not be an argument for refusing to measure performance. It is possible to minimise adverse effects, either by amending the indicator causing the problem, or by creating a procedure for interpretation of the indicator by expert panels. It is then important to watch out for the appearance of adverse effects and to correct the system when these effects appear.

D1-4    Stakeholder/expert judgement

The consideration of indicators so far has focused on specific issues of interpretation that might be addressed by the evaluation team or programme managers. However, their interpretations may be very different from those of beneficiaries and other stakeholders. In order to address this, one option is to present the results to the action plan steering group or a special workshop of stakeholders, and to invite comments on the interpretations that have been made. There is, of course, no guarantee that the different perspectives that might be presented will be taken up by the evaluators in the final report. There is also a potential conflict between the need for the evaluation team to be impartial and the perception that stakeholder views may be partial, i.e. be focused on their specific (particularly business or political) interests.

An alternative approach to constructing a synthetic judgement of the programme being evaluated is to use specially constituted expert panels, through procedures similar to those described in ORGAPET Sections A4 and C4. In this context, however, the expert panel is not being used to develop policy proposals or evaluate impacts in the context of individual indicators, but to collectively produce a value judgement on the programme as a whole. Expert panels are used to reach consensus on complex and ill-structured questions for which other tools do not provide univocal or credible answers. They are a particularly useful tool in relation to complex programmes, when it seems too difficult or complicated, in an evaluation, to embark on explanations or the grading of criteria in order to formulate conclusions. Expert panels can take account of the quantitative and qualitative information assembled as part of the evaluation, as well as the previous and external experiences of the experts. The practical steps involved in setting up expert panels are outlined in Evalsed.

The experts should be chosen to represent all points of view in a balanced and impartial way. These experts are independent specialists, recognised in the domain of the evaluated programme. The definition of expert can include stakeholders that meet pre-defined selection criteria. They are asked to examine all data and analyses made during the evaluation, in order to highlight areas of consensus on the conclusions that the evaluation must draw and, particularly, on the answers to the key evaluative questions. The panel does not fully explain its judgement references nor its trade-off between criteria, but the credibility of the evaluation is guaranteed by the fact that the conclusions result from consensus between people who are renowned specialists and represent the different 'schools of expertise'. The advantage of this type of approach is that it takes account of the different possible interpretations of results that might be made by different experts.

However, Evalsed also identifies potential weaknesses of an expert panel approach. The experts must have extensive experience in the field and, therefore, are at risk of bias and unwillingness to criticise the relevance of objectives or to focus on any undesirable effects. Moreover, the comparison of opinions often leads to the under-evaluation of minority points of view: the consensual mode of functioning on which the dynamics of the panel are based produces a convergence of opinions around majority values, which are not necessarily the most relevant. To some extent the potential weaknesses of expert panels can be avoided by taking precautions in the way they are assembled and organised. This could include:

Evalsed provides an example of the use of scoring systems in an expert panel context, in which an evaluation team had evidence of the impact of a range of measures, but the impacts were not directly comparable and the opinion of an expert panel was required. In a half-day seminar, the evaluation conclusions were presented to the participants, measure by measure, so that they could be validated and their credibility verified. The experts were then presented with a synoptic table of conclusions, and each participant was asked to situate, intuitively, the impacts of each measure on a scale (from maximum positive impact, through neutral impact, to maximum negative impact). The participants' classifications were compared, discussed with them and made to converge as much as possible.

After the seminar, the classifications were converted into scores ranging from -10 (maximum negative impact) to +10 (maximum positive impact), through 0 (neutral impact), and the synoptic table was converted into a table of ratings (impact scoring matrix). The construction of scoring scales allows comparisons to be made within the same column (e.g. one particular measure has a better rating than another with respect to a particular impact). Scores in different columns cannot be compared, however (e.g. a score of 5 for employment is not comparable to the same score for an environmental impact). Many of the procedures described here are similar to those used in the more formalised Nominal Group Technique described in detail in ORGAPET Section C4.
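The conversion from converged classifications to an impact scoring matrix can be sketched as follows. The five-point labels and the example measures are assumptions for illustration; Evalsed only fixes the two endpoints and the neutral point of the scale.

```python
# Hedged sketch: mapping experts' converged classifications onto a -10..+10
# scale to build an impact scoring matrix. Labels and measures are illustrative.

SCALE = {
    "maximum negative": -10,
    "negative": -5,
    "neutral": 0,
    "positive": 5,
    "maximum positive": 10,
}

# Converged classifications per measure and impact (hypothetical values)
classifications = {
    "Training for long-term unemployed": {
        "employment": "positive",
        "equal opportunities": "maximum positive",
    },
    "Aid for new business creation": {
        "employment": "maximum positive",
        "equal opportunities": "positive",
    },
}

# Impact scoring matrix: rows are measures, columns are impacts.
scores = {m: {imp: SCALE[c] for imp, c in impacts.items()}
          for m, impacts in classifications.items()}

# Scores are comparable down a column (one impact across measures), but not
# across columns: a 5 for employment is not equivalent to a 5 for environment.
```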

D1-5    Formal methods

More formalised techniques for making evaluative judgements include multi-criteria, cost-benefit and cost-effectiveness analysis, as well as benchmarking and Environmental Impact Assessment. The application of many of these approaches to agri-environmental policy evaluation was reviewed as part of an OECD-sponsored workshop (OECD, 2004). Some involve the allocation of monetary values to outcomes that are normally unpriced (due to the absence of a market for the 'goods'), potentially making them more difficult to apply. However, if it can be done, it might be worth determining a measure of the return to resources invested in the action plan, provided that the analysis focuses on the multiple objectives that organic farming and the organic action plan are seeking to deliver, not just on a single measure or objective.

D1-5.1    Multi-criteria analysis

Multi-criteria analysis (see also Bouyssou et al., 2006) is a decision-making tool used to assess alternative projects or heterogeneous policy measures, taking several criteria into account simultaneously in a complex situation. The method is designed to reflect the opinions of different actors – their participation is central to the approach. It may result in a single synthetic conclusion, or a range reflecting the different perspectives of partners. The approach described here was used as part of the EU-CEE-OFP project to evaluate different organic farming policies (Annex D1-1).

Many of the stages in applying the multi-criteria analysis approach, in particular the definition of the actions to be judged and the relevant performance criteria, are similar to the procedures outlined in ORGAPET Sections C1 and C2 for structuring objectives and defining indicators. The key issue at the synthesis stage is how the weightings for (and trade-offs between) performance criteria are determined by the evaluators and by the stakeholders.

The first step is the construction of a multi-criteria evaluation matrix which should have as many columns as there are criteria and as many rows as there are measures to be compared. Each cell represents the evaluation of one measure for one criterion. Multi-criteria analysis requires an evaluation of all the measures for all the criteria (no cell must remain empty), but does not require that all the evaluations take the same form, and can include a mix of quantitative criteria expressed by indicators, qualitative criteria expressed by descriptors, and intermediate criteria expressed by scores (similar to a synoptic table comparing measures and impacts).
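A minimal sketch of such a matrix, with the completeness check the method requires (no cell may remain empty), might look as follows; the measures, criteria and cell contents are hypothetical, and cells deliberately mix scores, indicator values and qualitative descriptors.

```python
# Minimal sketch of a multi-criteria evaluation matrix: one row per measure,
# one column per criterion, mixed cell types, and a completeness check.

criteria = ["economic", "environmental", "social"]
matrix = {
    "Measure 1": {"economic": 7, "environmental": "improved habitat", "social": 4},
    "Measure 2": {"economic": 3, "environmental": 8, "social": "weak uptake"},
}

def check_complete(matrix, criteria):
    """Raise if any measure/criterion cell is missing or empty."""
    for measure, row in matrix.items():
        for crit in criteria:
            if row.get(crit) in (None, ""):
                raise ValueError(f"Empty cell: {measure} / {crit}")
    return True

check_complete(matrix, criteria)  # passes: every cell is filled
```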

The relative merits of the different measures can then be compared by one of two scoring techniques: compensation or outranking. Outranking does not always produce clear conclusions, whereas analysis based on compensation is always conclusive, and from a technical point of view the compensation variant is also easier to implement. The most pragmatic way of designing the multi-criteria evaluation matrix is for the evaluation team to assign scoring scales to all the evaluation conclusions, whether quantitative or qualitative; the multi-criteria evaluation matrix is then equivalent to the impact scoring matrix. Usually the compensation method is used, unless members of the steering group identify a problem which might justify the use of the veto system.

The next step is to evaluate the impacts or effects of the actions in terms of each of the selected criteria. If the compensation method is used, the process involves allocating scores and a simple analysis using a basic spreadsheet. For the outranking variant, the approach will differ according to the type of analysis. The process could be based on quantitative data or undertaken, more subjectively, by experts or the stakeholders of the evaluation themselves. In reality, the technique usually combines factual and objective elements concerning impacts, with the points of view and preferences of the main partners or 'assessors' (e.g. evaluation steering group, using individual or focus group interviews). The assessors' preferences may be taken into account by:

  • direct expression in the form of weighting attributed to each criterion (e.g. distributing points in a voting system);

  • revealing preferences by classification of profiles, where successive pairs of profiles are presented, preferences for one compared with the other in the pair are expressed as weak, average, strong or very strong, and the results are analysed using dedicated software;

  • revealing preferences through the ranking of real projects, which may be seen by participants as more realistic than the classification of profiles approach.

In the final step, computer software can be used to sort the actions in relation to each other. A single weighting system for criteria can then be deduced, or the evaluation team and steering group can decide to establish average weightings, which has the effect of downplaying the different points of view among the assessors. There are three different approaches to the aggregation of judgements:

  • Personal judgements: the different judgement criteria are not synthesised in any way. Each participant constructs their own personal judgement based on the analysis and uses it to argue their point of view.

  • Assisting coalition: the different judgement criteria are ranked using a computer package. An action will be classified above another one if it has a better score for the majority of criteria (maximum number of allies) and if it has fewer 'eliminatory scores' across the criteria (minimum number of opponents).

  • Assisting compromise: a weighting of the criteria is proposed by the evaluator or negotiated by the participants. The result is a classification of actions in terms of their weighted score.
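The 'assisting coalition' idea above can be sketched as a pairwise outranking test. This is a simplified illustration, not a full outranking implementation: the veto threshold, the majority rule and the scores are all assumptions.

```python
# Hedged sketch of the 'assisting coalition' aggregation: action A is classified
# above action B if it scores better on a majority of criteria (allies) and
# attracts no more eliminatory (veto-level) scores (opponents).

VETO = -8  # assumed threshold: scores at or below this count as eliminatory

def outranks(a_scores, b_scores):
    """Pairwise outranking test over parallel lists of criterion scores."""
    allies = sum(1 for a, b in zip(a_scores, b_scores) if a > b)
    a_vetoes = sum(1 for a in a_scores if a <= VETO)
    b_vetoes = sum(1 for b in b_scores if b <= VETO)
    return allies > len(a_scores) / 2 and a_vetoes <= b_vetoes

action_a = [6, 4, -2, 7]   # better on three of four criteria, no vetoes
action_b = [5, 3, 1, -9]   # one eliminatory score
a_beats_b = outranks(action_a, action_b)  # True: 3 allies of 4, fewer vetoes
```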

It is now possible to calculate global weighted scores for the different measures. The results and impacts of each measure will have been evaluated in relation to the same criteria; all these evaluations will have been presented in the form of scores in an impact scoring matrix; and there is a weighting system which expresses the average preferences of assessors for a particular criterion. The global score is calculated by multiplying each elementary score by its weighting and by adding the elementary weighted scores. Based on weighted average scores, the evaluation team can classify measures by order of contribution to the overall success of the programme.
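The global weighted score calculation described above is straightforward to sketch: multiply each elementary score by its criterion weighting and sum. The weights and scores below are illustrative only.

```python
# Sketch of the compensation method's global weighted score: each elementary
# score multiplied by its weighting, then summed. All figures are hypothetical.

weights = {"economic": 0.5, "environmental": 0.3, "social": 0.2}

impact_scores = {
    "Measure 1": {"economic": 6, "environmental": -2, "social": 4},
    "Measure 2": {"economic": 2, "environmental": 8, "social": 5},
}

def global_score(scores, weights):
    """Weighted sum of elementary scores across criteria."""
    return sum(scores[c] * w for c, w in weights.items())

# Classify measures by order of contribution to overall programme success.
ranking = sorted(impact_scores,
                 key=lambda m: global_score(impact_scores[m], weights),
                 reverse=True)
# Measure 2 (weighted score 4.4) ranks above Measure 1 (3.2).
```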

The synthesised judgement on the effectiveness of measures is usually considered sound and impartial provided that:

  • the evaluation criteria have been validated by the steering group;

  • the conclusions on the impacts of each measure, as well as the impact scoring matrix summarising them, have been validated;

  • the weighting coefficients for criteria have been established with the assistance of the assessors and the agreement of the steering group.

Experience also shows that the partners are far more willing to accept the conclusions of the report if the evaluation team has recorded their opinions carefully and taken the trouble to take their preferences into account in presenting its conclusions. If, on the contrary, the evaluation team chooses and weights the criteria itself, without any interaction with its partners, the impartiality of the results will suffer and the multi-criteria analysis will be less useful.

D1-5.2    Cost-benefit analysis

Cost-benefit analysis (CBA) (see also Pearce et al., 2006) is a method of evaluating the net economic impact of a public project which has some similarities to multi-criteria analysis, but with the aim of expressing the result in monetary terms. Various techniques can be applied to the valuation of non-financial benefits so that externalities can also be taken into account. Projects typically involve public investments but, in principle, the same methodology is applicable to a variety of interventions, for example, subsidies for private projects, reforms in regulation, new tax rates. CBA is normally used in ex-ante evaluation to make a selection between projects, typically of a large infrastructure nature. It is not normally used to evaluate programmes and policies, even though, in principle, it could be used to study the effect of changes in specific political parameters (for example customs tariffs, pollution thresholds, etc.).
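At its core, CBA discounts streams of (monetised) benefits and costs to a net present value. The sketch below shows this core calculation only; the cash flows and the 4% discount rate are hypothetical, and real appraisals involve much more (valuation of externalities, sensitivity analysis, choice of discount rate).

```python
# Hedged sketch of the core CBA calculation: net present value of annual
# (benefit - cost) flows, including any monetised non-market benefits.

def npv(flows, rate):
    """Net present value of net flows, one entry per year, year 0 first."""
    return sum(f / (1 + rate) ** t for t, f in enumerate(flows))

# Year 0: capital outlay; years 1-4: net benefits including a hypothetical
# monetised environmental externality (EUR million, illustrative figures).
net_flows = [-10.0, 3.0, 3.5, 4.0, 4.0]
result = npv(net_flows, rate=0.04)
# A positive NPV indicates discounted benefits exceed discounted costs.
```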

D1-5.3    Cost-effectiveness analysis

Cost-effectiveness analysis (CEA) (see also OECD, 2004 and Annex D1-1) is a tool that can help to ensure efficient use of resources in sectors where benefits are difficult to value. It is a tool for the selection of alternative projects with the same objectives (quantified in physical terms). CEA can identify the alternative that, for a given output level, minimises the actual value of costs or, alternatively, for a given cost, maximises the output level. This might, for example, be relevant if organic farming is being compared with other agri-environment schemes in terms of biodiversity outputs and the costs of achieving those outputs. CEA is used when measurement of benefits in monetary terms is impossible, where the information required is difficult to determine, or where any attempt to make a precise monetary measurement of benefits would be open to considerable dispute. It does not, however, consider subjective judgements and is not helpful in the case of projects with multiple objectives. In the case of multiple objectives, a more sophisticated version of the tool could be used, weighted cost-effectiveness analysis, which gives weights to objectives in order to measure their priority scale. Another alternative is a multi-criteria analysis. The CEA technique, which looks at the cost of an intervention and relates it to the benefits created, is also closely related to the use of a Value for Money Assessment (though value for money does not necessarily mean achieving outcomes at the lowest cost).
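A minimal sketch of the CEA comparison described above: alternatives pursuing the same physically quantified objective, ranked by cost per unit of output. The scheme names, costs and outputs are hypothetical.

```python
# Illustrative sketch of cost-effectiveness analysis: schemes with the same
# biodiversity objective ranked by cost per unit of output; lower is better.

schemes = {
    "Organic farming support":   {"cost": 12.0, "output": 300},  # EUR m, habitat units
    "Agri-environment scheme A": {"cost": 9.0,  "output": 180},
    "Agri-environment scheme B": {"cost": 15.0, "output": 420},
}

def cost_effectiveness(scheme):
    """Cost per unit of (physically quantified) output."""
    return scheme["cost"] / scheme["output"]

ranked = sorted(schemes, key=lambda s: cost_effectiveness(schemes[s]))
# Scheme B delivers habitat units at the lowest unit cost (~0.036), so for a
# given budget it maximises output; note this ignores any other objectives.
```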

D1-5.4    Benchmarking

Adopted from the private sector, benchmarking has become an increasingly popular tool for improving the policy implementation processes and outcomes of the public sector. Benchmarking was originally developed by companies operating in an industrial environment to improve competition and has therefore been applied most widely at the level of the business enterprise. The technique is based on the exchange and comparison of information between organisations in a given field, one or more of which is regarded as an example of good or best practice. This is potentially relevant in a policy framework, including organic action plans, where, for example, comparisons are being made between countries or regions.

D1-5.5    Environmental Impact Assessment

In certain situations, Environmental Impact Assessment (EIA) may be relevant as a method of assessing the environmental impact of a project before it is undertaken. This might be relevant where a significant capital investment in processing or distribution facilities is involved, in particular where there might be concerns that the negative environmental impacts of a development of this type could outweigh the benefits to be derived from organic land management/production of the raw materials. It is seldom applied in a mid-term or ex-post situation, where the use of appropriate environmental indicators, analysed using the other techniques outlined above, is likely to be more relevant.

D1-6    Conclusions

An evaluation is incomplete if it only includes monitoring results for a series of indicators. There is a need for evaluative or synthetic judgements to be derived, a process which needs the input of stakeholders and impartial experts, so that different perspectives on interpreting the results can be considered. Where possible, a consensus on the overall effect of the programme is desirable - this should also include the answering of the key evaluation questions identified at the outset. To achieve this, adequate resources need to have been allocated to the evaluation process from the outset, to ensure monitoring systems can be put in place and to permit the final stages of the evaluation to take place as outlined in this section.

There is little point, however, in successfully completing an evaluation if the report is then filed away and nothing is done with it. There is a need to reflect and act on the results in an appropriate stakeholder context, such as that of an action plan steering group, and there is a need to be clear about who is responsible for taking actions arising from the evaluation and for monitoring that the actions have been taken. In an ex-ante or mid-term review, this may involve adjusting objectives, improving monitoring procedures, refining the measures or re-targeting resources. In an ex-post, summative context, the emphasis might be more on highlighting best practice and the general lessons learned (see ORGAPET Section A5-4).

The results of the evaluation also need to be communicated effectively, for example through seminars and publications, to a range of groups:

Ideally, the impact of the evaluation itself on achieving change, learning etc. should be assessed, including whether the evaluation reflected stakeholder goals and expectations.

D1-7    Checklist

  1. Has stakeholder/expert input into the evaluative judgements been included?

  2. Are formal methods for assessing the overall effects relevant?

  3. Have the key issues to be considered identified above been addressed?

  4. Have the key evaluative questions (defined as part of the scope of the evaluation) been answered?

  5. Has a process been put in place to ensure that the results of the evaluation are communicated and applied?

D1-8    References

Bouyssou, D., T. Marchant, M. Pirlot, A. Tsoukiàs and P. Vincke (2006) Evaluation and decision models with multiple criteria: Stepping stones for the analyst. International Series in Operations Research and Management Science, Volume 86. Springer, Boston.

OECD (2004) Evaluating Agri-Environmental Policies: Design, Practice and Results. Organisation for Economic Co-operation and Development, Paris.

Pearce, D., G. Atkinson and S. Mourato (2006) Cost benefit analysis and the environment. Organisation for Economic Co-operation and Development, Paris.

D1-9    Annexes

Annex D1-1    Application of multi-criteria analysis to the evaluation of organic farming policies in the EU-CEE-OFP project