Evaluating impact of participatory agricultural interventions: do we see what we want to see?


by Beliyou Haile, Carlo Azzarri, David Spielman, Evgeniya Anisimova, and Frank Place | March 20, 2017

“Agricultural (research) programs without rigorous impact evaluation tend to focus on rapid testing (and rollout) of technologies with high-probability adopters to provide swift feedback to donor constituencies. Such programs also tend to favor numeric accomplishments over a deeper understanding of complex development processes with the unintended consequence of promoting solutions without strong evidence of impact or cost-effectiveness.”

This rather stark observation comes from a recently published paper on the targeting and early effects of the Africa RISING program in Malawi. The study, conducted by one of the program’s partners, the International Food Policy Research Institute (IFPRI), reveals multiple challenges in evaluating the impact of participatory agricultural interventions, especially in programs that work with self-selected farmers and program-selected treatment areas. The authors illustrate how the program's approach, while useful in strengthening the innovative capabilities of participating farmers, may not provide much insight into the efficacy, scalability, and poverty reduction potential of the tested technologies.

Photo credit: S.Malyon/CIAT

In this blog, we asked the authors, Beliyou Haile, Carlo Azzarri, and David Spielman, to summarize the main findings of their study and answer a few questions about the lessons learned and the potential use of this work for future participatory agricultural interventions.


The current world population of 7.3 billion is expected to reach 9.7 billion in 2050, with Africa accounting for more than half of this growth. Will the world be able to feed this growing population sustainably? Are the right technologies being developed to allow smallholders to be part of the solutions?

The USAID-funded Africa Research in Sustainable Intensification for the Next Generation (Africa RISING) program aims to identify, test, and scale up successful technologies and management practices for core farming systems in Africa south of the Sahara, covering Mali, Ghana, Malawi, Tanzania, Zambia, and Ethiopia. Africa RISING technologies are best described as sustainable intensification solutions: an integrated set of inputs, practices, and technologies designed to increase productivity and conserve natural resources. The new approaches also build resilience to frequent, complex, or unpredictable climatic shocks. In a sense, then, Africa RISING is one of the projects exploring the intersection of productivity and sustainability that will be a big part of our future.

The study

The study focuses on the Africa RISING program in Malawi, where researchers have been testing several agricultural technologies and practices since 2012. The program is being implemented in the Dedza and Ntcheu districts of the Central region, and the technologies include fertilized maize, improved legume varieties, and maize-legume and legume-legume intercropping. The study examines the targeting and early effects of the technologies on maize yield and harvest values, based on data collected in 2013 from three groups: non-randomly selected households testing the technologies (beneficiaries), randomly selected households from program-target villages not participating in the program (non-beneficiaries), and randomly selected households from non-program villages with agroecological conditions similar to those of program villages, as measured by elevation and precipitation (controls). Researchers used alternative matching techniques to construct the missing counterfactual for beneficiaries and, in addition, conducted placebo tests to estimate the extent of targeting bias, if any. Three comparisons were conducted: beneficiaries versus controls (out-of-village comparison), beneficiaries versus non-beneficiaries (within-village comparison), and non-beneficiaries versus controls (placebo comparison).


The study finds that beneficiaries are systematically different from, i.e. better-off than, the other two groups along several socioeconomic dimensions, including education, land size, and asset-based wealth, which suggests program targeting of better-off households. After accounting for these observed differences through matching techniques, the out-of-village comparison shows significant positive effects on maize yield and harvest values, while the within-village estimates are statistically insignificant (or marginally significant) and much smaller in magnitude. While the statistical insignificance of the latter estimates is linked to the small sample size, their smaller magnitude points to possible upward bias in the out-of-village estimates. In other words, program beneficiaries could have attained higher yield and harvest values than control households even without the tested technologies, owing to the observed (and possibly unobserved) socioeconomic differences. This finding is further supported by the placebo test results, where belonging to a program-target village is associated with a significant positive “effect.” The authors acknowledge that the analysis is based on non-experimental methods and a single wave of data, with the limitations inherent in cross-sectional analysis.


Q: While the obvious best evaluation approach is to introduce randomness at the pilot site selection stage, this often does not happen and the control group is created after the treatment is rolled out. After examination of the data, it may turn out that some control villages and households are, in fact, unsuitable for this purpose. The technique of matching attempts to identify suitable controls for treatment observations. Matching can be implemented at village (i.e. whole treatment villages might be dropped in the absence of a good match) or household level, or some combination. What are the benefits and limitations of matching at the different levels? 

A: A successful match relies on two key assumptions. The first is conditional mean independence: conditional on observed covariates at the individual, household, or village level, potential outcomes are independent of treatment status. It holds only if the variables used in the matching exercise capture what is statistically important in explaining the outcomes; if they do not, important unobservable variables are likely at play and matching techniques are less valid. The second assumption is the overlap condition, which requires that units with similar observable characteristics (such as farm size, gender of the household head, or education level), summarized by a probability (or propensity) score, appear in both the treated and untreated groups. The success of any matching (at any level) depends on the extent to which these two assumptions hold. While the variables used for matching should explain some of the variability in treatment status, they should not predict it perfectly. (See Caliendo and Kopeinig (2005) for details on these assumptions.)

Most applied studies rely on data collected from households (or individuals) rather than from villages. The larger the sample, the easier it is to find a good matched control unit for each treated unit. If intra-village variation is more important than inter-village variation in explaining treatment outcomes, the conditional mean independence assumption is more likely to be satisfied with household (or individual)-level matching than with village-level matching. On the flip side, household-level matching does not allow one to control for village-level factors such as local governance systems or cultural norms.
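To make the mechanics concrete, here is a minimal sketch of propensity-score matching on simulated, entirely hypothetical household data. All numbers and variable names (a single "wealth" covariate, a built-in true effect of 1.0) are assumptions for illustration, not values from the study; the point is only that participation driven by wealth inflates the naive comparison, and matching on the estimated score pulls the estimate back toward the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical household data: program participation is driven by wealth,
# mimicking the self-selection of better-off beneficiaries.
n = 200
wealth = rng.normal(0.0, 1.0, n)
treated = (wealth + rng.normal(0.0, 1.0, n)) > 0
# Outcome (e.g. maize yield) depends on wealth AND treatment; true effect = 1.0.
outcome = 2.0 * wealth + 1.0 * treated + rng.normal(0.0, 1.0, n)

# Step 1: estimate propensity scores with a small hand-rolled logistic
# regression (gradient ascent on the average log-likelihood).
X = np.column_stack([np.ones(n), wealth])
w = np.zeros(2)
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w += 0.5 * X.T @ (treated - p) / n
ps = 1.0 / (1.0 + np.exp(-X @ w))

# Step 2: nearest-neighbour matching (with replacement) on the score.
controls = np.where(~treated)[0]
gaps = [outcome[i] - outcome[controls[np.argmin(np.abs(ps[controls] - ps[i]))]]
        for i in np.where(treated)[0]]
att = float(np.mean(gaps))  # average treatment effect on the treated

# The naive difference in means confounds treatment with wealth.
naive = outcome[treated].mean() - outcome[~treated].mean()
```

In this simulation the naive difference overstates the true effect by several multiples, while the matched estimate lands much closer to it; the residual gap is exactly the kind of bias that unobservables can still leave behind when conditional mean independence fails.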

Q: What does your experience suggest for how to better select, ex ante, control locations? Are some of the village level variables better indicators of underlying household characteristics than others?

A: One way to improve the matching between treatment and control sites is adequate characterization of the target area. Working with our local partners, we characterized the focus districts using several biophysical variables, including population density, elevation, precipitation, temperature, market access, length of growing period, and slope. Among them, elevation and temperature-adjusted rainfall proved the better proxies for agricultural potential and were used to stratify the target districts into “development domains.” After program implementers selected target sites, control sites were randomly selected under the constraints that they be “similar” to target sites while being distant enough to avoid contamination. Despite the explicit stratification and constrained random selection, observable as well as unobservable differences between the two groups can persist. This is the main challenge for research studies that attempt to establish attribution using a counterfactual group drawn from “similar” non-treated areas. One possible improvement, provided good-quality village-level data are available, is to base the site characterization on a richer set of relevant variables, including, for example, access to agricultural extension services.
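The constrained random selection described above can be sketched in a few lines. The village frame, the two stratum "bands," and the counts below are all hypothetical stand-ins for the real development domains, and the distance constraint (controls far enough from target sites) is noted but not modeled.

```python
import random
from collections import defaultdict

random.seed(7)

# Hypothetical village frame: (village_id, elevation_band, rainfall_band).
# The two bands stand in for the "development domains" used to stratify the
# districts; a real frame would also carry coordinates, so controls could be
# required to sit far enough from target sites to avoid contamination.
villages = [(i, i % 3, (i // 3) % 2) for i in range(60)]
target_ids = {v[0] for v in random.sample(villages, 10)}

# Pool the remaining (non-target) villages by stratum.
pool = defaultdict(list)
for vid, elev, rain in villages:
    if vid not in target_ids:
        pool[(elev, rain)].append(vid)

# For each target site, draw a control at random from the same stratum.
control_of = {}
for vid, elev, rain in villages:
    if vid in target_ids:
        control_of[vid] = random.choice(pool[(elev, rain)])
```

By construction every control shares its target's development domain, which is exactly what the stratification buys; what it cannot buy, as noted above, is balance on characteristics the strata do not capture.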

Q: In the case of settling for a second-best approach, and with recognized problems in doing so in this case study, what could be done to improve the validity of the results and what are the types of results that could still be obtained with confidence and be useful for decision makers?

A: In our situation, with non-randomly selected beneficiaries, more rigorous evidence could be generated if panel data (multiple years’ observations for the same households) were available, which would allow evaluators to control for unobserved, time-invariant confounding factors. Researchers would then have much more confidence that observed differences in outcomes result from the treatment rather than from other factors. This should be possible for this study, as we plan to continue collecting data in the years ahead.
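The value of a second survey round can be shown with a toy two-period simulation. Everything here is hypothetical (an unobserved household "quality" term, a built-in true effect of 1.0): the point is that differencing each household against itself removes any time-invariant confounder, which a single cross-section cannot do.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-round panel: an unobserved, time-invariant household
# "quality" drives both self-selection into the program and yields.
n = 100
quality = rng.normal(0.0, 1.0, n)
treated = quality > 0  # better-off households self-select
y_round1 = 2.0 * quality + rng.normal(0.0, 0.5, n)                  # pre-program
y_round2 = 2.0 * quality + 1.0 * treated + rng.normal(0.0, 0.5, n)  # true effect = 1.0

# A single cross-section (round 2 only) confounds treatment with quality.
cross_section = y_round2[treated].mean() - y_round2[~treated].mean()

# Differencing each household against itself removes the quality term:
# the simplest household fixed-effects / difference-in-differences idea.
change = y_round2 - y_round1
panel_estimate = change[treated].mean() - change[~treated].mean()
```

In this simulation the cross-sectional gap is several times the true effect, while the differenced estimate recovers it up to sampling noise, which is why collecting further waves for the same households matters so much here.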

In addition, larger sample sizes would make it possible to generate evidence on the effects of a specific technology (or technology mix), helping to prioritize the innovations with the highest potential.

One of the key lessons from this study, and our recommendation for policy makers and donors: rigorous impact evaluation needs to be embedded, from the very beginning, into research-for-development programs that aim to identify scalable technology options. This is crucial for increasing the chances of generating solid empirical evidence that is both internally valid and reasonably generalizable to the target population.

Citation for the article discussed in this blog:

Haile, B., Azzarri, C., Roberts, C. and Spielman, D. J. (2016), Targeting, bias, and expected impact of complex innovations on developing-country agriculture: evidence from Malawi. Agricultural Economics. http://doi.org/10.1111/agec.12336

The study described in the paper is a collaborative work between Africa Research in Sustainable Intensification for the Next Generation (Africa RISING) research program (funded by the United States Agency for International Development) and the CGIAR Research Program on Policies, Institutions, and Markets (PIM) led by IFPRI.

About the authors:

Beliyou Haile and Carlo Azzarri are Research Fellows at the Environment and Production Technology Division (EPTD) of IFPRI. David Spielman is a Senior Research Fellow at EPTD and co-leader of PIM's research cluster on Science Policy and Innovation Systems for Sustainable Intensification (Flagship 1). Evgeniya Anisimova is Communications Specialist at PIM. Frank Place, PIM Senior Research Fellow, contributed to this blog and suggested questions for the discussion.