Research Design Process and Stages: Questions and Credibility

01 March 2022

Research Design Process and Stages

What makes a research question good?

  • The answer to a good research question should produce knowledge that people will care about.

  • Addressing the question should (help) solve a problem, make a decision, or clarify/challenge our understanding of the world.

  • But an interesting question is not enough.

We also need a good research design

  • A good research design is a practical plan for research that makes the best use of available resources and produces a credible answer.

  • The quality of a research design can be assessed by how well it produces results that can be used to guide policy and improve science:

    • A great research design produces results that clearly point in certain directions that we care about.

    • A poor research design produces results that leave us in the dark — results with confusing interpretation.

The importance of theory

All research design involves theory, whether implicit or explicit.

  • Why do the research? We have implicit theories and values which guide the questions we ask. Our questions are value laden. For example, social scientists in the 1950s studied marijuana use as a form of “deviance”: their questions focused on “why are people making such bad decisions?” or “how can policy makers prevent marijuana use?” (see Becker (1998)).

  • Why do the research? We might want to change how scientists explain the world and/or change the policy decisions in (a) one place and time and/or (b) in other places and times.

  • Research focused on learning the causal effect of \(X\) on \(Y\) requires a model of the world: how intervention \(X\) might affect some outcome \(Y\), why, and how large the effect might be. Such a model helps us think about how a different intervention, or targeting different recipients, might lead to different results.

  • Our theories and models are important not just for generating hypotheses, but for informing design and strategies for inference.

  • Designing research will often clarify where we are less certain about our theories. Our theories will point to problems with our design. And questions arising from the process of design may indicate a need for more work on explanation and mechanism.

Designing or selecting your treatment

  • From this point forward, we will use \(T\) for treatment or what we want to learn the effect of. We will use \(X\) to refer to background variables.

  • Your treatment (\(T\)) and control (not \(T\)) need to clearly connect to your research question. (See the module on Measurement.)

  • The treatment you’re interested in might be a bundle of multiple components. If your research question is about one specific component, then the control should be different from the treatment in just that component. Everything else should be the same.

An example

A campaign where someone visits a home to talk with a family for 15 minutes to share health information.

  • If you’re interested in the effect of the specific information, then your control should still have all the other components (home visit with 15 minutes duration, similar visitor, etc.) but have different information. This design will not teach you about the effect of visits, just about the effect of information.

  • If your question focuses on the effect of visits, then you need a control group without a visit. But this design will not do a good job answering specific questions about information (visits and information are bundled together).
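The two designs above can be compared in a small simulation. This is a minimal sketch in plain Python, assuming an additive outcome model; the effect sizes `VISIT_EFFECT` and `INFO_EFFECT` and the function names are hypothetical values chosen for illustration, not estimates from any study.

```python
import random
from statistics import mean

VISIT_EFFECT = 2.0  # hypothetical effect of any 15-minute home visit
INFO_EFFECT = 1.0   # hypothetical additional effect of the health information


def outcome(rng, visit, info):
    """Assumed additive outcome model with unit-level noise."""
    return VISIT_EFFECT * visit + INFO_EFFECT * info + rng.gauss(0, 1)


def diff_in_means(rng, treated_arm, control_arm, n_per_arm=5000):
    """Draw each arm independently and compare mean outcomes."""
    treated = [outcome(rng, *treated_arm) for _ in range(n_per_arm)]
    control = [outcome(rng, *control_arm) for _ in range(n_per_arm)]
    return mean(treated) - mean(control)


rng = random.Random(1)
# Each arm is written as (visit, info).
info_only = diff_in_means(rng, (1, 1), (1, 0))  # control: visit with other information
bundle = diff_in_means(rng, (1, 1), (0, 0))     # control: no visit at all

print(f"visit+info vs. visit only: {info_only:.2f}")
print(f"visit+info vs. no visit:   {bundle:.2f}")
```

Under these assumptions the first comparison recovers only the information effect, while the second recovers the visit and information effects together and cannot separate them.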

Interpretation

  • Sometimes it’s not possible to separate out a specific component of your treatment.

  • For example, your partner community health organization that visits homes may not be interested in visiting homes to share other information. Then your control might be no visit.

  • You must be careful to interpret your effects as the effect of the information delivered in this particular way.

  • You will not be able to conclude that you have estimated the effect of only the information.

    • This might be fine for certain policy purposes: maybe the policy question is about the visits as an implicit bundle of treatments.

    • But it is difficult to interpret the results of this design as telling us something clear about information alone.

The Research Process

An overview of the research process

  • Articulate and fine-tune your question (interrogating why you are asking this question and what will happen given different kinds of possible answers).

  • Develop your research design.

  • Plan your analysis, state and justify specific hypotheses, and register this plan with a credible impersonal date stamp.

  • Implement your intervention and collect data.

  • Analyze your data and write up your results.

EGAP Research Design Form

Sections of the EGAP Research Design Form

  • Research question
  • Sample
  • Treatment
  • Outcome
  • Randomization strategy
  • Implementation
  • Power
  • Analysis and interpretation

Research question and motivation

  • What is the substantive motivation for this research? What problem are you trying to address? What decision are you trying to make?

  • Whose mind are you trying to change, and what do they currently believe?

  • What general theoretical questions can this research help address?

  • State your research question in one sentence.

  • What is your main hypothesis?

Sample

  • Where and when will your study take place?

  • Who/what units are in your study?

  • How is this sample selected?

  • Do some units need to be left out of the study, because they must receive treatment or must be left out of treatment for logistical or other reasons?

  • Do you expect treatment to work differently for certain subgroups?

Treatment

  • What is your treatment? Will you have multiple treatments?

  • What will your control condition be? Pure control or placebo?

  • Are there any ethical concerns with the treatment?

  • At what level will you randomize treatment?

Outcome

  • What is your primary outcome?

  • How will you measure it?

  • What data do you need? At what level is the measure available?

  • What are your priors about the outcome? This may come from previous studies or educated guesses.

  • How many rounds of data will you collect?

  • How will you minimize attrition?

  • How will you minimize mismeasurement and untruthful reporting?

Randomization strategy

  • What type of randomization strategy will you use? Examples: simple, complete, block, cluster, factorial, two-tier, stepped-wedge, etc.

  • Make sure this strategy is consistent with the level of randomization (possible clusters) and expected heterogeneity of treatment effects (possible blocks).

  • Specify your blocks and clusters (if any). How many will you have? How large will they be?

  • Is interference a possible concern? If so, how will your sample selection and randomization strategy minimize interference?
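Four of the strategies named above can be sketched in a few lines. This is an illustrative sketch in plain Python, not the interface of any randomization package; the function names, the "treat half" rule, and the example blocks and clusters are assumptions made for the example.

```python
import random


def simple_ra(rng, n, p=0.5):
    """Simple randomization: an independent coin flip for each unit."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def complete_ra(rng, n, m):
    """Complete randomization: exactly m of the n units are treated."""
    z = [1] * m + [0] * (n - m)
    rng.shuffle(z)
    return z


def block_ra(rng, blocks):
    """Block randomization: treat half of each block (rounded down)."""
    z = [0] * len(blocks)
    for b in sorted(set(blocks)):
        idx = [i for i, bi in enumerate(blocks) if bi == b]
        for i in rng.sample(idx, len(idx) // 2):
            z[i] = 1
    return z


def cluster_ra(rng, clusters):
    """Cluster randomization: all units in a cluster share one assignment."""
    ids = sorted(set(clusters))
    treated = set(rng.sample(ids, len(ids) // 2))
    return [1 if c in treated else 0 for c in clusters]


rng = random.Random(42)
blocks = ["urban"] * 6 + ["rural"] * 4     # hypothetical blocks
clusters = [i // 5 for i in range(20)]     # 4 hypothetical villages of 5 households
z_simple = simple_ra(rng, 10)
z_complete = complete_ra(rng, 10, 5)
z_block = block_ra(rng, blocks)
z_cluster = cluster_ra(rng, clusters)
```

Simple randomization can leave the arms unequal in size by chance. The other three fix the number treated overall, within each block, or at the cluster level.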

Implementation I

  • How will you do the actual randomization? In public, drawing from a bowl? On a computer?

  • Who will implement the treatment?

  • If there is a partner who will implement the treatment, what arrangements do you have?

  • What are the logistical challenges? Any special challenges for control units?

Implementation II

  • How will you track the quality of implementation?

  • How will you track compliance with the treatment?

  • How will you minimize non-compliance with the treatment (if applicable)?

  • How will you check the quality of your data?

  • How will data be anonymized and stored securely (if applicable)?

Power

  • What is your expected effect size?

    • This might be from a previous study or a target size below which one would not be interested in future interventions.
  • Power calculation.

    • If you have clusters, there are additional concerns with intra-cluster correlation.
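A rough power calculation can be sketched as follows. This is a minimal sketch that assumes a two-sided z-test on a difference in means, equal arm sizes, an effect measured in standard deviation units, and equal-sized clusters (so that the usual design effect 1 + (m − 1) × ICC applies). The function names and example numbers are hypothetical.

```python
from statistics import NormalDist


def power_two_arm(effect_sd, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in means."""
    nd = NormalDist()
    se = (2 / n_per_arm) ** 0.5            # SE of the difference, unit variance
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = effect_sd / se
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)


def design_effect(cluster_size, icc):
    """Variance inflation from assigning equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc


def power_clustered(effect_sd, n_per_arm, cluster_size, icc, alpha=0.05):
    """Power after shrinking the sample to its effective size."""
    n_effective = n_per_arm / design_effect(cluster_size, icc)
    return power_two_arm(effect_sd, n_effective, alpha)


# Hypothetical inputs: effect of 0.2 SD, 400 units per arm.
individual = power_two_arm(0.2, 400)
clustered = power_clustered(0.2, 400, cluster_size=20, icc=0.05)
print(f"individual assignment: {individual:.2f}")
print(f"cluster assignment:    {clustered:.2f}")
```

With the same number of units, even a modest intra-cluster correlation lowers power noticeably, which is why clustered designs need the ICC as an input.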

Analysis and interpretation

  • What is your estimand? (e.g., average treatment effect, complier average causal effect, etc.)

  • What is your estimator? (e.g., difference in means, OLS with block weights, any clustering). Note that this should be closely linked to your randomization design.

  • If you find that your results are consistent with your hypothesis, what alternative explanations might there be? What data would help you distinguish between your explanation and alternative ones?

  • If you find that your results are not consistent with your hypothesis, what data will help you figure out what might have happened?
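One way to see why the estimator should follow the randomization design is a small blocked example. This sketch uses made-up data in which two blocks have different baseline outcomes and different shares of treated units, with a true effect of 1 in each block. The block-size-weighted difference in means used here is one possible estimator, and all names and numbers are hypothetical.

```python
from statistics import mean


def diff_in_means(rows):
    """Pooled difference in mean outcomes, ignoring blocks."""
    treated = [r["Y"] for r in rows if r["Z"] == 1]
    control = [r["Y"] for r in rows if r["Z"] == 0]
    return mean(treated) - mean(control)


def blocked_diff_in_means(rows):
    """Average of within-block differences, weighted by block size."""
    n = len(rows)
    total = 0.0
    for b in sorted({r["block"] for r in rows}):
        in_block = [r for r in rows if r["block"] == b]
        total += len(in_block) / n * diff_in_means(in_block)
    return total


# Made-up data: block "a" treats 4 of 6 units, block "b" treats 2 of 6.
rows = (
    [{"block": "a", "Z": 1, "Y": 11.0}] * 4
    + [{"block": "a", "Z": 0, "Y": 10.0}] * 2
    + [{"block": "b", "Z": 1, "Y": 1.0}] * 2
    + [{"block": "b", "Z": 0, "Y": 0.0}] * 4
)

naive = diff_in_means(rows)
blocked = blocked_diff_in_means(rows)
print(f"pooled difference in means: {naive:.2f}")
print(f"block-weighted estimate:    {blocked:.2f}")
```

Because the high-baseline block has more treated units, the pooled comparison mixes the block difference into the estimate. The block-weighted estimator compares like with like.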

DeclareDesign

Introduction to DeclareDesign

  • DeclareDesign is a software package for R.

  • It helps us be concrete about the stages of research design by representing them in code. We can then simulate those stages to understand the properties of the statistical estimators and tests that we use.

  • For more, see https://declaredesign.org/getting-started

  • See also the module on Estimands and Estimators that uses DeclareDesign to help determine the correct estimators.

Introduction to DeclareDesign

  • See https://declaredesign.org/

  • Regardless of the method, research designs have four components, abbreviated MIDA:

    • M: Model (of how the world works)
    • I: Inquiry
    • D: Data strategy
    • A: Answer strategy
  • Critical insight: Simulation of a research design teaches what answers a research design can find.

  • Working with simulated data before data collection helps prevent errors and oversights.
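The four components can be written as four small functions and then simulated. This is a sketch in plain Python of the general idea only; it is not DeclareDesign's R interface, and the function names, the constant effect `tau`, the sample size, and the normal outcome model are all assumptions made for illustration.

```python
import random
from statistics import mean


def model(rng, n=100, tau=0.5):
    """M: potential outcomes for each unit; tau is an assumed constant effect."""
    units = []
    for _ in range(n):
        y0 = rng.gauss(0, 1)
        units.append({"Y0": y0, "Y1": y0 + tau})
    return units


def inquiry(units):
    """I: the average treatment effect, computed from potential outcomes."""
    return mean(u["Y1"] - u["Y0"] for u in units)


def data_strategy(rng, units):
    """D: complete random assignment, then reveal one outcome per unit."""
    n = len(units)
    z = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(z)
    return [{"Z": zi, "Y": u["Y1"] if zi else u["Y0"]} for zi, u in zip(z, units)]


def answer_strategy(data):
    """A: difference in means between treated and control units."""
    treated = [d["Y"] for d in data if d["Z"] == 1]
    control = [d["Y"] for d in data if d["Z"] == 0]
    return mean(treated) - mean(control)


def diagnose(sims=500, seed=7):
    """Repeat the design many times and summarize estimate minus estimand."""
    rng = random.Random(seed)
    errors = []
    for _ in range(sims):
        units = model(rng)
        errors.append(answer_strategy(data_strategy(rng, units)) - inquiry(units))
    return {"bias": mean(errors), "rmse": mean(e * e for e in errors) ** 0.5}


diagnosis = diagnose()
print(diagnosis)
```

Because the simulation knows the true value of the inquiry, it can report how far the answer strategy tends to land from it, before any real data are collected.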

Model

  • A model of how we think the world works, including:

    • \(T\)s and \(X\)s (treatments or focal causal variables like policy interventions and other background variables)

    • \(Y\)s (dependent variables)

    • Relations between variables (potential outcomes, functional forms, auxiliary variables and contexts)

    • Probability distributions over the \(X\)s, and possibly over the \(Y\)s as well.

  • This is the theory!

    • Codified numerically.
  • The model is wrong by definition. If it were correct, you wouldn’t need to do the study.

  • But without a model, we don’t have a place to start to assess what can be learned.

Inquiry

  • An answerable question.

  • What is the effect of a treatment \(T\) on an outcome \(Y\) ?

  • Usually a quantity of interest, some summary of the data:

    • Descriptive: What is the mean of \(Y\) in the treatment group?

    • Causal: What would be the average difference in \(Y\) if we switched units from treatment to control? If we claimed that \(T\) cannot cause \(Y\), how much evidence do we have about this claim?

    • This quantity is the estimand or hypothesis.

  • Not all questions that we want to ask are answerable.

    • And the range of inquiries we can make is limited: how much can we learn from a summary quantity such as the average treatment effect (ATE)?
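The two causal inquiries above can be written down using potential outcomes. The notation here is an assumption added for illustration: \(Y_i(1)\) and \(Y_i(0)\) are the outcomes that unit \(i\) would show under treatment and under control, and \(N\) is the number of units. The average treatment effect is then

\[
\text{ATE} = \frac{1}{N} \sum_{i=1}^{N} \bigl( Y_i(1) - Y_i(0) \bigr),
\]

and the claim that \(T\) cannot cause \(Y\) corresponds to the sharp null hypothesis of no effect for any unit:

\[
H_0 : Y_i(1) = Y_i(0) \quad \text{for all } i = 1, \dots, N.
\]

We observe only one of the two potential outcomes for each unit, so neither quantity can be computed directly from data. The first must be estimated and the second tested.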

Data

  • Realize (generate) data on the set of variables (all \(X\)s, \(T\)s and \(Y\)s)

  • A function of your model

  • Includes both:

    • Sampling — how units arrive in your sample

    • Treatment assignment — what values of endogenous variables are revealed

Answer

  • Given a realization of the data, generate an answer – an estimate of the quantity of interest (inquiry)

  • This is your estimator or test:

    • Difference-in-means

    • \(t\)-test

    • Regression methods

    • etc.

  • The answer is an estimate of the quantity of interest (the estimand) or a \(p\)-value (for a test of a hypothesis).
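An estimator and a test can be paired in one short function. This is a minimal sketch, assuming complete random assignment and a test of the sharp null of no effect carried out by re-shuffling the assignment; the data and the function names are made up for illustration.

```python
import random
from statistics import mean


def diff_in_means(y, z):
    """Estimator: mean outcome of treated units minus mean of control units."""
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    return mean(treated) - mean(control)


def permutation_p_value(y, z, sims=2000, seed=0):
    """Test: share of re-shuffled assignments at least as extreme as observed."""
    rng = random.Random(seed)
    observed = diff_in_means(y, z)
    z_perm = list(z)
    extreme = 0
    for _ in range(sims):
        rng.shuffle(z_perm)  # keeps the number of treated units fixed
        if abs(diff_in_means(y, z_perm)) >= abs(observed):
            extreme += 1
    return observed, extreme / sims


# Made-up outcomes for 5 treated and 5 control units.
y = [5.1, 6.0, 5.8, 6.3, 4.9, 4.2, 4.0, 4.7, 3.9, 4.4]
z = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
estimate, p_value = permutation_p_value(y, z)
print(f"estimate = {estimate:.2f}, p = {p_value:.3f}")
```

The shuffling step mirrors the assumed assignment procedure, which is one way the answer strategy stays tied to the data strategy.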

Pre-Registration of Analysis Plans

Bias in published research against null results

  • Researchers who anticipate or face difficulty publishing null results often never submit those manuscripts for review, or put them away in a “file drawer” after several rejections.

  • We all face incentives to change our specifications, measurements, or even hypotheses to get a statistically significant result (\(p\)-hacking) and so improve our chances of publication.

  • Even people not facing these incentives make many decisions when they analyze data: handling missing values and duplicate observations, creating scales, etc. And these choices can be consequential.

  • Overall result: reduced credibility for individual pieces of research and (rightly) reduced confidence in whether we actually know what we claim to know.
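The cost of choosing among analyses after seeing the data can be illustrated by simulation. This sketch assumes there is no true effect on any outcome, that the outcomes are independent with known unit variance, and that the analyst reports whichever of ten outcomes gives the smallest \(p\)-value. The function names and settings are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def null_p_value(rng, n_per_arm):
    """Two-sided z-test p-value for one outcome with no true effect."""
    treated = sum(rng.gauss(0, 1) for _ in range(n_per_arm)) / n_per_arm
    control = sum(rng.gauss(0, 1) for _ in range(n_per_arm)) / n_per_arm
    z = (treated - control) / (2 / n_per_arm) ** 0.5
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def false_positive_rate(n_outcomes, sims=1000, n_per_arm=50, alpha=0.05, seed=3):
    """Share of null studies that report at least one significant outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        p_values = [null_p_value(rng, n_per_arm) for _ in range(n_outcomes)]
        if min(p_values) < alpha:
            hits += 1
    return hits / sims


prespecified = false_positive_rate(n_outcomes=1)
best_of_ten = false_positive_rate(n_outcomes=10)
print(f"one pre-specified outcome: {prespecified:.2f}")
print(f"best of ten outcomes:      {best_of_ten:.2f}")
```

With one pre-specified outcome the false positive rate stays near the nominal 5%. Picking the best of ten independent outcomes pushes it toward 1 − 0.95^10, or about 40%.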

Towards review of design rather than outcomes

  • One part of solving this problem is to focus on the design, rather than the outcomes.

  • The bias against null results can be overcome by reviewing the design, prior to learning the results.

  • A good design executed well will produce credible research, which might be a null result. We want credible and actionable null results.

  • Reviews of designs are also an opportunity to improve the research before it is implemented.

Pre-registration of analysis plans and research designs I

  • Pre-registration is the filing of your research design and hypotheses with a publicly-accessible repository. EGAP hosts one that you can use for free (currently on OSF.io using the EGAP registration form).

  • Pre-registration does not preclude later exploratory analyses that were not stated in advance. You just have to clearly distinguish between the two.

Pre-registration of analysis plans and research designs II

  • Even if you will be submitting a paper with results rather than a design to an academic journal or you are primarily interested in a final report with findings for a policy audience, there are important advantages to you and to other researchers from pre-registering your research.

    • You can learn about other research, completed and in progress; others can learn about yours. We can learn about studies that produced null results.

    • It forces you to state your hypotheses and plan of analysis in advance of seeing the results, which limits \(p\)-hacking.

Summary

The research process: Questions, theory, and credibility

  • Research starts with our values and theories about how the world works.

  • It continues by articulating questions that can be clearly addressed by observation (in this course, using randomized experimentation).

  • Good questions have consequential answers: changing scientific explanations, changing policy decisions.

  • Good designs tick all the boxes and give readers reason to believe the results.

  • Checklists (like the research design form or pre-registration forms) help avoid careless errors.

  • Pre-registration further increases credibility and thus the odds of your results having an impact on science and policy.

References

Becker, Howard S. 1998. Tricks of the Trade: How to Think about Your Research While You’re Doing It. Chicago Guides to Writing, Editing, and Publishing. University of Chicago Press.