Threats to Validity: How to Handle Them in Your Defense or Viva
Examiners press on validity because it is where the link between your claims and your evidence either holds or breaks. Knowing the main types (internal, external, construct, and statistical conclusion validity), and being able to speak plainly about the threats you identified and the ones you chose to accept, is one of the most reliable ways to demonstrate methodological command.
Why examiners press on validity
A research claim is only as strong as the design that produced it. Examiners are not trying to disqualify your thesis when they ask about validity — they are checking whether you understand the limits of your own evidence. A candidate who can name the threats, trace their implications, and explain the trade-offs they accepted is showing exactly the kind of critical thinking a doctorate requires.
The questions tend to cluster around two concerns: whether the design produces the kind of evidence it claims to produce (internal and construct validity), and whether the results travel beyond the specific study (external validity). Statistical conclusion validity receives separate attention in quantitative work. The types are not independent: a design weak on construct validity usually produces findings with unclear external validity too.
Internal validity: did the design support a causal claim?
Internal validity is the question of whether the relationship you found between variables reflects the process you claim, rather than an artefact of the design or an alternative explanation. It matters most when you make causal claims: that X caused or influenced Y. If you are not making causal claims, say so clearly, and much of this pressure lifts.
The classic threats to internal validity include selection bias (the groups being compared were not equivalent before the study), history (something outside the study happened during data collection), maturation (participants changed over time for reasons unrelated to the intervention), and instrumentation (measuring instruments changed between time points). Not all threats apply to every design. Know which ones are plausible given your specific study.
How do you know the relationship you found isn't explained by a confounding variable?
The examiner wants to know whether you considered and, where possible, controlled for the most plausible alternatives. Name the confounds most relevant to your study, explain what you did about them (design controls, statistical controls, or transparent acknowledgment), and be clear about what remains unresolved. Claiming you have eliminated all confounds is not credible; claiming you identified and managed the most serious ones is.
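One common form of statistical control can be sketched with the first-order partial correlation, which estimates the association between X and Y after removing the linear influence of a measured third variable Z. This is a minimal Python sketch assuming a single measured confound and linear relationships; the correlation values are hypothetical, and partialling out Z does nothing about confounds you did not measure.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between X and Y after removing the linear influence of Z.

    Standard first-order formula:
        r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))
    Requires |r_xz| < 1 and |r_yz| < 1.
    """
    if abs(r_xz) >= 1 or abs(r_yz) >= 1:
        raise ValueError("correlations with Z must lie strictly between -1 and 1")
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical correlations: X and Y each correlate 0.7 with a confound Z.
raw_r = 0.50
adjusted_r = partial_correlation(r_xy=raw_r, r_xz=0.7, r_yz=0.7)
print(f"raw r = {raw_r:.2f}; controlling for Z = {adjusted_r:.2f}")
```

With these made-up values, a raw correlation of 0.50 shrinks to about 0.02 once Z is controlled. That is the pattern an examiner has in mind when asking whether a confound explains your result.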
You don't have a control or comparison group — how does that affect your conclusions?
Be direct about what the absence of a comparison group means for the causal claim. It likely means you can describe association or change over time but cannot attribute causation with confidence. If your research question was never causal — if you were mapping phenomena, not testing cause and effect — say so early and explain why the design is appropriate to the question.
External validity: do your findings travel?
External validity concerns the scope of your findings: whether the results hold in other populations, settings, or time periods. Examiners press on this when the sample is small, narrowly drawn, or recruited for convenience.
Generalisability depends on the type of research. A random probability sample from a defined population supports statistical generalisation to that population. A purposive case study does not support that kind of generalisation — but it may support analytic or theoretical generalisation: the claim that the mechanism, concept, or theory developed here applies in comparable contexts. These are legitimate but different claims. Know which one your design supports.
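To make statistical generalisation concrete: with a probability sample, sampling theory lets you attach a quantified margin of error to an estimate for the defined population. The sketch below assumes a simple random sample and uses the normal-approximation (Wald) interval for a proportion; the survey figures are hypothetical. Nothing in the calculation speaks to populations the sample was not drawn from, which is why a purposive design has to rest on analytic generalisation instead.

```python
import math
from statistics import NormalDist


def margin_of_error(p_hat: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of a normal-approximation (Wald) interval for a proportion.

    Assumes a simple random sample from the defined population.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical survey: 400 randomly sampled respondents, 50% agree.
print(f"margin of error: {margin_of_error(0.5, 400):.3f}")
```

For 400 respondents with 50% agreement, the margin is roughly plus or minus 0.049 (about five percentage points) at 95% confidence.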
Your sample is small and specific — why should we believe these findings apply beyond it?
Do not defend a claim the design cannot make. Instead, restate what kind of generalisation your study does support. A twelve-participant interview study does not generate prevalence estimates, but it can generate theoretical insight or grounded conceptual categories that transfer to other settings where the underlying conditions are similar. Explain what you are and are not claiming, and be specific about the conditions under which transfer is plausible.
How representative is your sample of the population you are interested in?
Know the answer to this before you walk in. If the sample is representative by design (probability sampling), say how. If it is not representative by design (convenience, snowball, purposive), explain the logic of the selection and what it buys you for your specific question. Avoid the phrase 'this is a limitation but...' as an opener — state the implication first, then the reasoning.
Construct validity: are you measuring what you think you are?
Construct validity asks whether your operationalisation — the questionnaire items, interview questions, behavioural measures, or documentary sources you used — actually captures the theoretical concept you are studying. This is quietly one of the hardest questions in empirical research, and examiners who specialise in methodology return to it repeatedly.
Common threats include construct under-representation (the measure captures only part of the construct) and construct-irrelevant variance (the measure also picks up variance from sources unrelated to the construct, such as other traits or the way questions are asked). An anxiety questionnaire that also picks up general neuroticism, or an interview protocol that prompts participants toward a particular answer: these are construct validity problems.
How did you operationalise [construct], and why is that a valid operationalisation?
Give the specific items, protocol, or measures you used, then explain the chain of reasoning: what theoretical definition you were working from, why the operationalisation captures that definition, and what alternatives you rejected and why. If you adapted an existing validated measure, explain the adaptation decisions. If you developed your own instrument, explain your validation steps.
Could participants have understood the questions differently from how you intended?
In survey and interview research, this is a specific construct validity threat. If you piloted the instrument, say what you found and how you revised it. If you did not pilot it, acknowledge that and assess the likely impact. Ambiguous constructs produce ambiguous data; examiners want to see you thought about this.
Statistical conclusion validity
Statistical conclusion validity is the most technically narrow of the four types and applies directly only to quantitative work. It asks whether you used statistical methods correctly and whether the statistics warrant your conclusions about the relationship between variables.
The main threats: insufficient statistical power (your sample is too small to detect a real effect reliably), fishing for significance across multiple comparisons without correction, violating the assumptions of the statistical test you used, and unreliable measurement reducing your ability to detect real signals. Know which of these is a plausible concern in your own study.
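The multiple-comparisons threat can be quantified. If you run m independent tests at level alpha, the chance of at least one false positive is 1 - (1 - alpha)^m. The minimal Python sketch below computes that figure and applies the Holm-Bonferroni step-down procedure, shown here as one standard correction among several (false discovery rate control is another). The p-values are hypothetical.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def holm_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm-Bonferroni step-down correction.

    Returns one reject/retain flag per hypothesis, in the original order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test is retained, all larger p-values are retained too
    return reject


print(f"FWER for 20 tests: {familywise_error_rate(0.05, 20):.2f}")
# Hypothetical p-values from four comparisons, all below 0.05 before correction.
print(holm_reject([0.010, 0.040, 0.030, 0.005]))
```

Twenty uncorrected tests carry roughly a 64% chance of at least one spurious result. In the Holm example all four p-values fall below 0.05, but only two survive correction. Holm's procedure controls the family-wise error rate without assuming the tests are independent, whereas the simple FWER formula does assume independence.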
How did you determine your sample size?
If you conducted a power analysis, explain the inputs (effect size estimate, alpha level, desired power) and where the effect size estimate came from. If a power analysis was not applicable or not feasible (qualitative designs justify sample size on different grounds, and secondary analysis of a fixed dataset leaves the sample size outside your control), explain the rationale for your sample and be honest about what this means for the precision of your estimates.
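The inputs an examiner will ask about map directly onto a sample-size calculation. The sketch below assumes the simplest case, a two-sided comparison of two group means with a standardised effect size d, and uses the normal approximation. Dedicated tools such as G*Power, or exact t-based methods, give a slightly larger n, so treat the result as a lower bound. The effect size of 0.5 is purely illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants per group for a two-sided, two-sample comparison
    of means, using the normal approximation:
        n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)


# Hypothetical effect size estimate (a conventional "medium" effect).
print(n_per_group(0.5))
```

With d = 0.5, alpha = 0.05, and power = 0.80 this returns 63 per group. Halving the assumed effect size roughly quadruples the required n, which is why the source of the effect size estimate matters so much.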
Reliability: consistency of measurement
Reliability is not the same as validity: a measure can be reliably wrong. A bathroom scale that consistently reads three kilos high is reliable but not valid. That said, unreliable measurement is a threat to validity: if your instrument is inconsistent, the random error it introduces weakens observed relationships between variables, obscuring real effects and limiting the conclusions you can draw.
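Classical test theory makes the cost of unreliability explicit: the expected observed correlation equals the true-score correlation multiplied by the square root of the product of the two measures' reliabilities (the relation behind Spearman's correction for attenuation). A minimal Python sketch, with hypothetical values:

```python
import math


def attenuated_correlation(true_r: float, reliability_x: float, reliability_y: float) -> float:
    """Expected observed correlation, under classical test theory, given the
    true-score correlation and the reliability of each measure."""
    for rel in (reliability_x, reliability_y):
        if not 0 < rel <= 1:
            raise ValueError("reliabilities must lie in (0, 1]")
    return true_r * math.sqrt(reliability_x * reliability_y)


# Hypothetical: a true correlation of 0.50, measured with reliabilities 0.70 and 0.80.
print(round(attenuated_correlation(0.50, 0.70, 0.80), 2))
```

Here a true correlation of 0.50 would be expected to appear as roughly 0.37. The formula concerns random error only; a constant bias like the bathroom scale's leaves reliability intact and is a validity problem instead.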
In quantitative work, reliability is assessed through internal consistency (e.g. Cronbach's alpha for scale items), test-retest reliability, or inter-rater reliability where multiple people code the same data. In qualitative work, the analogue is dependability: whether the process was documented and consistent enough that another researcher following the same logic would reach similar interpretations. Audit trails and inter-rater agreement in coding are the usual approaches; member-checking is also common, though it is more often framed as a credibility strategy.
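Two of the quantitative checks named above can be sketched in a few lines. The first function computes Cronbach's alpha from item scores; the second computes Cohen's kappa, one common inter-rater statistic for two coders assigning categorical codes (other coefficients, such as Krippendorff's alpha, handle more coders or missing data). Both are minimal illustrations assuming complete data, and the scores and codes are hypothetical.

```python
from collections import Counter
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    `items` is a list of item columns, each a list of scores in the same
    respondent order. Sample variances are used throughout; the n - 1 terms cancel.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's scale total
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must code the same non-empty set of units")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two coders' marginal proportions, per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical data: 3 scale items answered by 4 respondents.
items = [[2, 3, 4, 5], [3, 3, 4, 5], [2, 4, 4, 5]]
print(round(cronbach_alpha(items), 2))

# Hypothetical data: two coders assigning codes "A"/"B" to 6 interview excerpts.
print(round(cohens_kappa(list("AABBAB"), list("ABBBAA")), 2))
```

With these made-up data, alpha is about 0.95 while kappa is only about 0.33: the coders agree on 4 of 6 excerpts, but with two equally used codes, chance alone would produce 50% agreement. This is why raw percentage agreement is rarely enough on its own.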
The difference between a limitation and a fatal flaw
Every thesis has limitations. A limitation is a constraint you accepted knowingly, can explain, and whose implications you have thought through — it does not prevent the thesis from making a defensible contribution. A fatal flaw is a design problem that makes the central claim untenable. The examiner's job is partly to determine which you have.
A sample of convenience is almost always a limitation, not a flaw — provided you are making claims proportionate to what a convenience sample can support. A complete absence of any validity check on a central measure is closer to a flaw. Missing a major body of literature that directly challenges your theoretical framework is a flaw. The difference is whether the problem undermines the central argument or constrains how far it reaches.
This limitation seems like it could undermine your main finding — how do you respond?
Do not get defensive. Restate the finding, then trace the actual implication of the limitation for that specific finding. Often the limitation constrains generalisation or introduces uncertainty about mechanism, but does not make the finding itself wrong. If the limitation does materially weaken the finding, say so and explain why the thesis is still a defensible contribution at a reduced claim level. Examiners respect honesty; they distrust bravado.
Frequently asked questions
- Do I need to know all four types of validity for my defense?
- You need to know whichever types are relevant to your design, and be able to explain why others are less applicable. In quantitative causal research, all four are likely in play. In qualitative work, the trustworthiness framework may be more appropriate, but the underlying questions map closely onto the same concerns. Examiners are testing your ability to reason about the limits of your evidence — the framework is a tool for that, not an end in itself.
- What if an examiner identifies a validity threat I didn't address in the thesis?
- Acknowledge it. If the examiner has identified a genuine threat you overlooked, the strongest response is to assess its implication honestly: does it limit the scope of your conclusions, introduce uncertainty about a specific claim, or leave the central argument materially intact? Arguing that the threat does not exist when it does is the wrong move. Examiners are more interested in whether you can think through the implication than whether you caught every threat in advance.
- How should I talk about sample size and generalisability without being defensive?
- Separate the question of size from the question of what kind of generalisation the design supports. A smaller sample is not inherently problematic — it depends entirely on what you are claiming. Restate the type of generalisation your design supports (analytic, theoretical, or statistical), name the conditions under which that generalisation is plausible, and be direct about what it does not support. Defensiveness usually comes from implicitly defending a claim the design cannot make.
- My thesis is qualitative — do these validity concepts apply to me?
- The concepts apply, but the terminology often differs. Qualitative researchers typically work with credibility in place of internal validity, transferability in place of external validity, dependability in place of reliability, and confirmability in place of objectivity. The underlying questions (whether your interpretations are grounded, whether your findings transfer, whether your process was consistent) closely parallel the quantitative ones. Use the framework appropriate to your paradigm and be ready to explain what each term means and how you addressed the underlying concern.
The MockDefense Committee
Doctoral defense preparation, MockDefense
MockDefense builds AI examiners that rehearse the questions a real doctoral committee asks — on methodology, contribution, and the gaps you haven't patched yet. Our guides are written from that examiner's-eye view of what defenses actually test.