Psychology Dissertation Defense Questions: What Examiners Actually Ask

Psychology committees probe five areas most other disciplines do not: whether your statistical power was adequate before you collected data, whether your measures actually capture the constructs they claim to, how your findings hold up against the replication crisis, what alternative explanations survive your design, and how you handled the ethical tensions specific to psychological research. This page covers 25 questions drawn from those areas and from the theoretical framing of your thesis, with examiner-side notes on what each one is testing.

Last updated June 14, 2026

What psychology defenses probe that others don't

The core questions at any defense — why this design, what do the findings mean, what are the limits — apply across disciplines. What distinguishes a psychology defense is where those questions land hardest.

Psychology committees carry specific methodological concerns into the room. Statistical power and sample size justification are treated as first-order issues, not footnotes. Construct validity — whether your operationalisation of 'anxiety' or 'working memory' or 'implicit bias' actually captures what you claim — is contested ground. And since the early 2010s, most examiners in quantitative psychology will ask, explicitly or implicitly, how your work sits relative to the replication crisis: whether you pre-registered, whether your analyses were confirmatory or exploratory, whether you report effect sizes.

UK candidates face these questions in a viva voce, typically with two examiners over two to three hours. US candidates face them from a committee at the end of their oral presentation. The questions are largely the same; the pacing and sequencing differ.

Statistical power and sample size

Sample size justification is the first place many psychology examiners go in a quantitative defense. The question is not just 'is your N large enough?' — it is 'what principled basis did you use to determine what was large enough before you started?'

How did you justify your sample size?

Examiners are checking whether your N was determined a priori, based on an expected effect size and target power level, or post hoc after the data were already in hand. The acceptable answers vary by design: a power analysis with an effect size drawn from a meta-analysis or a well-powered prior study is strongest; if you used a rule of thumb, name it and say explicitly what it costs you. An honest 'this was a convenience sample, so the study has less power than I wanted' is better than a retroactive power calculation presented as prospective planning.
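The a priori calculation described above can be sketched for the simplest case, a two-sided comparison of two independent group means. This is a minimal sketch using the normal approximation, not a substitute for dedicated power software. The function name `n_per_group` and the example effect sizes are illustrative assumptions, and an exact t-based calculation returns slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group (normal approximation).

    d is the expected standardised mean difference (Cohen's d),
    ideally taken from a meta-analysis or a well-powered prior study.
    """
    if d == 0:
        raise ValueError("expected effect size must be non-zero")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Illustrative effect sizes only; substitute the value your literature supports.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

The point to carry into the defense is the direction of the reasoning: the expected effect size and target power come first, and N follows from them.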

What power did your study have to detect the effect you were looking for, and what effect size did you assume?

This is the harder version of the sample size question. Examiners want to see you distinguish between the effect size you planned for and the effect size you found and, if your study was underpowered, to hear you say clearly what that means for interpreting a non-significant result. A non-significant finding from an underpowered study says little about whether the effect exists; saying so yourself, before the examiner points it out, is the right move.
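To put a number on "underpowered", the power of a two-group comparison can be approximated from the assumed effect size and the sample actually collected. The sketch below uses the normal approximation. The function name `power_two_groups`, the assumed effect of d = 0.36, and the group sizes are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist


def power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(d) * math.sqrt(n_per_group / 2)  # expected z statistic under d
    # Probability of landing beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


assumed_d = 0.36  # hypothetical planning value, not an observed effect
for n in (120, 60):
    print(f"n = {n} per group: power is roughly {power_two_groups(assumed_d, n):.2f}")
```

Under these assumptions, halving the sample from 120 to 60 per group takes power from about 0.80 to about 0.50, which is the kind of statement an examiner wants to hear in place of "the sample was small".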

Your study did not find a significant effect. Does that mean there is no effect?

A null result from an adequately powered study that tested a precise hypothesis is evidence against an effect of the size the study was powered to detect; it does not rule out smaller effects. A null result from an underpowered study is not even that. Examiners want to hear you work through this distinction: power, the width of your confidence interval, and what effect size you can rule out based on your data. Saying 'the result was non-significant, so I conclude there is no effect' without engaging with power will draw a follow-up.
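One way to make "what effect size you can rule out" concrete is to put a confidence interval around the observed standardised difference. The sketch below uses a common large-sample approximation to the standard error of Cohen's d. The function name and the example numbers (an observed d of 0.10 with 60 per group, and a target effect of 0.36) are hypothetical.

```python
import math
from statistics import NormalDist


def d_confidence_interval(d, n1, n2, level=0.95):
    """Approximate confidence interval for Cohen's d (large-sample formula)."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se


observed_d, target_d = 0.10, 0.36  # hypothetical values for illustration
low, high = d_confidence_interval(observed_d, 60, 60)
print(f"95% CI for d: [{low:.2f}, {high:.2f}]")
if low <= target_d <= high:
    print("The target effect lies inside the interval, so the data do not rule it out.")
else:
    print("The target effect lies outside the interval.")
```

If the interval still contains the effect size you planned for, the honest reading is "inconclusive", not "no effect".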

Why did you report p-values rather than effect sizes, or vice versa?

Reporting p-values without effect sizes is harder to defend than it was a decade ago. If you reported both, explain how you interpreted each. If you reported only p-values, be ready to say what effect sizes would have added and why you did not include them. Examiners in effect-size-conscious departments will want to see Cohen's d, eta-squared, or equivalent alongside significance — and they will ask about the practical magnitude of the effects you found, not just whether they crossed .05.
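For readers who reported only p-values, Cohen's d for two independent groups is straightforward to recover from the raw scores. This is a minimal sketch using the pooled standard deviation; the function name and the scores are made up for illustration.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardised mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical scores, for illustration only.
treatment = [14, 17, 15, 19, 16, 18]
control = [13, 15, 14, 16, 12, 17]
print(f"Cohen's d = {cohens_d(treatment, control):.2f}")
```

Reporting the value alongside its confidence interval lets you answer the "practical magnitude" follow-up directly.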

Construct validity and measurement

Psychology measures human constructs — concepts that cannot be directly observed. That indirection is where examiners apply the most pressure, because the distance between the construct and the operationalisation is where the interesting epistemological work lives.

How do you know your measures actually captured the constructs they were intended to measure?

Examiners are asking about the construct validity evidence for the instruments you used, and for any adaptations you made to them. If you used an established scale, point to the validation studies behind it and note any known weaknesses. If you adapted a scale or created your own items, describe what evidence you collected — factor structure, internal consistency, convergent and discriminant correlations. The claim 'I used a validated scale' stops too soon if you cannot say what that validation showed.
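Internal consistency in your own data is one of the pieces of evidence mentioned above, and Cronbach's alpha is the statistic most often quoted for it. The sketch below computes alpha from a participants-by-items table. The function name and the responses are hypothetical, and alpha is only one reliability index among several.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per participant."""
    k = len(rows[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*rows))  # transpose to one tuple per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: five participants, three items on a 1-5 scale.
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 2],
    [1, 2, 1],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Quote the alpha from your own sample, not only the value in the scale's original validation paper.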

What is the difference between convergent and discriminant validity, and how did you address both?

Convergent validity, meaning your measure correlates with other measures of the same construct, is the easier half. Discriminant validity, meaning your measure does not correlate too strongly with measures of different constructs, is where most measurement stories get complicated. Examiners want to see that you tested both, not just one. If your anxiety measure correlates strongly with your depression measure, that is a discriminant validity problem worth addressing directly rather than hoping the committee does not notice.
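One common way to judge how serious an overlap like the anxiety and depression example is, though not the only one and not necessarily the approach your committee prefers, is to correct the observed correlation for unreliability in both measures. The sketch below applies the classical correction for attenuation. The function name, the correlation of .70, and the reliabilities of .85 and .80 are hypothetical.

```python
import math


def disattenuated_r(r_xy, reliability_x, reliability_y):
    """Classical correction for attenuation (Spearman's formula)."""
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(reliability_x * reliability_y)


# Hypothetical values for illustration only.
observed_r = 0.70
corrected = disattenuated_r(observed_r, 0.85, 0.80)
print(f"observed r = {observed_r:.2f}, corrected r = {corrected:.2f}")
```

A corrected correlation approaching 1 suggests the two scales may not be separating the constructs, which is the point an examiner will press.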

Did your measures perform the same way across all participant groups you compared?

This is a measurement invariance question, and it comes up whenever you compare construct means across groups: gender, clinical vs. non-clinical, different nationalities, pre- and post-intervention. If you ran a multi-group comparison without testing for configural, metric, and scalar invariance, an examiner may ask whether your group differences are real or a measurement artefact. Know whether you tested this, at which level invariance held, and what the results were.
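The three levels named above can be written as increasingly strict constraints on a multi-group factor model. The notation below is a standard textbook sketch, not taken from any particular software: $x_{ig}$ is the vector of item responses for person $i$ in group $g$, $\tau_g$ the item intercepts, $\Lambda_g$ the factor loadings, $\eta_{ig}$ the latent construct, and $\varepsilon_{ig}$ the residuals.

```latex
\begin{align}
\text{Configural:}\quad & x_{ig} = \tau_g + \Lambda_g\,\eta_{ig} + \varepsilon_{ig}
  && \text{(same factor structure in every group)} \\
\text{Metric:}\quad & \Lambda_g = \Lambda \ \text{for all } g
  && \text{(equal loadings)} \\
\text{Scalar:}\quad & \Lambda_g = \Lambda \ \text{and}\ \tau_g = \tau \ \text{for all } g
  && \text{(equal loadings and intercepts)}
\end{align}
```

Only when the scalar constraints hold does the expected item response reduce to $\tau + \Lambda\,\mathbb{E}[\eta \mid g]$, so that differences in observed means can be attributed to differences in the latent construct rather than to group-specific intercepts.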

Could your findings reflect response bias rather than genuine differences in the construct?

Self-report measures are vulnerable to acquiescence bias, social desirability, and demand characteristics. Examiners ask this to see whether you built in any checks — reverse-scored items, bogus items, implicit measures, behavioural outcomes — and whether you can distinguish what your design actually ruled out from what it assumed away. Naming the specific bias risk relevant to your measures is better than a generic acknowledgement that self-report has limitations.

You used [scale name] — are you aware of the criticisms of that measure in the recent literature?

Examiners sometimes ask this about measures with known contested validity, particularly scales that have been challenged during the post-replication-crisis measurement re-examination. The answer expected is not a defense of the scale at all costs — it is an honest account of what the criticisms are, what they mean for the construct you were studying, and whether you took any steps to address or mitigate them. Not knowing the criticism is worse than knowing it and having an argument about it.

The replication crisis and research practices

Since the Open Science Collaboration's 2015 replication attempt, in which only about 36% of replications produced a significant result compared with 97% of the original studies, examiners in quantitative psychology have been asking explicit questions about research practices. Candidates who have not thought about this will be uncomfortable. Those who have will find it relatively easy ground.

Did you pre-register your study? If not, how do you distinguish your confirmatory from your exploratory analyses?

Pre-registration matters here because it fixes the distinction between a hypothesis test and a pattern search before the data arrive. If you did pre-register, point to it. If you did not, the expected answer is not 'I should have' — it is an honest account of which hypotheses were specified in advance, which analyses were planned, and which emerged from looking at the data. Framing exploratory findings as hypothesis-generating rather than confirmatory is the right epistemic stance, and examiners will notice whether you use that framing.

How confident are you that your significant findings would replicate in an independent sample?

Not a gotcha — a genuine question about how seriously you have thought about evidential weight. Factors that speak to replicability include effect size (larger effects replicate better), whether the effect appeared in every cell of your design or only some, whether you ran any internal replications, and how your finding relates to the broader literature. Saying 'I cannot be certain, but here is what the pattern of evidence suggests' is more credible than either 'absolutely' or 'probably not'.

How did you decide which analyses to report? Were any analyses conducted but not included in the thesis?

This is the direct questionable research practices question. Selective reporting — running many analyses and reporting only the significant ones — inflates false positive rates. Examiners are checking whether your reported analyses were the planned analyses or a subset of a larger exploratory effort. If you ran additional analyses that did not reach significance, the expected answer is that you report them as exploratory or in a supplement, not that they do not appear because they were 'not relevant'. Transparency here is straightforward to demonstrate if you were transparent in the work itself.
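The inflation described above is easy to quantify in the simplest case. The sketch below assumes the tests are independent and that every null hypothesis is true, which real analyses rarely satisfy exactly, so treat the numbers as an illustration of the mechanism. The function name is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


for k in (1, 5, 10, 20):
    rate = familywise_error_rate(k)
    print(f"{k:>2} tests: P(at least one false positive) = {rate:.2f}")
```

With ten such tests the chance of at least one spurious significant result is roughly 40%, which is why examiners ask how many analyses were run and not just how many were reported.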

How does your work relate to previous failed replications of findings in this area?

For any active area of psychology research, there are probably attempted replications of foundational findings. Examiners expect you to know the replication record of the core effects you are building on or challenging. If a finding your thesis depends on has a poor replication record, your examiner may ask how that affects the theoretical scaffolding of your argument. Knowing the literature includes knowing where the bodies are buried.

Internal validity, confounds, and alternative explanations

Internal validity questions come in two forms: the broad 'what threatens your conclusions?' and the targeted 'could X explain your finding instead?' Examiners in psychology are trained to generate alternative accounts of data, and they will test whether you have done the same.

What are the main confounds in your study, and how did you address them?

Name the confounds specific to your design — not a generic list of threats to internal validity copied from a methods textbook. If you studied cognitive training effects, address practice effects and regression to the mean. If you studied implicit bias, address measurement reliability and the specific critique that IAT scores may not predict behaviour. Then say what you did: random assignment, covariates, active control conditions, statistical controls. Finally, be clear about which confounds remain and what they limit.

Could [specific alternative mechanism] explain your findings just as well as your theoretical account?

The examiner will name a specific alternative — arousal rather than attention, demand characteristics rather than genuine attitude change, social facilitation rather than the construct you measured. The expected answer works through whether the alternative is consistent with the full pattern of your data, not just the headline finding. If the alternative cannot account for a secondary result or the condition ordering, say so explicitly. If it remains viable, say what design addition would distinguish the two accounts.

Did you test for the role of mediators or moderators, and if not, why not?

A significant main effect tells you something happened; a mediation analysis begins to tell you why; a moderation analysis tells you for whom or under what conditions. Examiners ask this when a finding's theoretical interpretation depends on a mechanism that was not directly tested. If you have mediation analyses, explain what they add and what their limits are — mediation from cross-sectional data does not establish causal ordering. If you did not test for mediators, have a clear account of why that was beyond the scope of the study rather than simply an omission.
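The distinction drawn above can be made concrete with the simplest single-mediator model. This is the standard linear formulation, given as a sketch: $X$ is the predictor, $M$ the proposed mediator, $Y$ the outcome, and the $i$ and $e$ terms are intercepts and residuals.

```latex
\begin{align}
M &= i_1 + aX + e_1 \\
Y &= i_2 + c'X + bM + e_2 \\
\text{indirect effect} &= ab, \qquad \text{total effect } c = c' + ab
\end{align}
```

The decomposition $c = c' + ab$ holds for linear models fitted by ordinary least squares to the same sample. It is an accounting identity about the coefficients, not a demonstration that $X$ causes $M$ or that $M$ causes $Y$, which is why cross-sectional mediation cannot settle causal ordering.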

How do your findings generalise beyond your specific sample?

External validity in psychology rests on more than representativeness. WEIRD samples — Western, Educated, Industrialised, Rich, Democratic — are the norm in published psychology, and examiners will ask directly whether your findings are bounded to that population or speak more broadly. Be precise: 'this effect has been demonstrated in East Asian samples with similar results' is better than 'future research should test this cross-culturally'. Name what you know about the boundary conditions of your effect.

Theoretical framing and competing theories

Psychology is not short of competing theoretical accounts for most phenomena. Examiners test whether you chose a framework deliberately, understand its competitors, and can defend why your data bear on your chosen account rather than the alternatives.

Why did you use [theoretical framework X] rather than [theoretical framework Y]?

Examiners are not expecting you to have solved the theoretical debate. They want to see that you made a deliberate choice, can articulate what framework X predicts that Y does not, and can say why your research question was better suited to one than the other. Saying 'X is more widely used' is not a theoretical justification. Saying 'X makes a more specific prediction about the direction of effect in my conditions, whereas Y leaves that direction underspecified' is.

How do your findings sit with theories that predict the opposite result?

Every active research area has competing accounts. If your findings support theory A, an examiner may ask whether they are actually inconsistent with theory B, which your literature review will have mentioned. The right answer distinguishes between findings that are evidence against B and findings that are merely consistent with A while leaving B unaddressed. Claiming your results 'disprove' a competing theory when your design was not built to test it directly is a position an examiner will push back on.

What is the most important theoretical contribution your thesis makes?

Examiners ask this to see whether you can name a specific, bounded contribution rather than a vague advance. 'This thesis extends the literature on X' is not a contribution statement. 'This thesis provides the first evidence that effect X depends on condition Y, which the standard dual-process model does not predict' is a contribution statement. Be specific, and be honest about the magnitude — a careful, well-powered test of an existing hypothesis is a genuine contribution even if it does not overturn a theory.

Are there findings in your own data that your theoretical framework struggles to explain?

Examiners value intellectual honesty more than a tidy theoretical narrative. If there are anomalous cells in your design, unexpected interactions, or secondary findings that sit awkwardly with your framework, name them before the examiner does. Saying 'I found this pattern in condition 3 that my account does not predict cleanly — it might suggest [alternative mechanism]' is the kind of critical self-appraisal that distinguishes a researcher from a student who memorised an argument.

Ethics — deception, debriefing, and vulnerable groups

Psychology has a sharper ethics conversation than most disciplines because of its history with deception and its frequent work with clinical, child, and vulnerable populations. Examiners treat ethical questions as substantive methodological questions, not compliance formalities.

Your study used deception. How did you justify that, and what did your debriefing procedure involve?

Deception requires two justifications: that the research could not be conducted without it, and that the information withheld would not have altered participants' willingness to take part in ways that matter. Examiners want to see that you worked through both, not just ticked a box. Describe the debriefing procedure — what you told participants, when, how you monitored for distress, and what happened if a participant remained uncomfortable. The debriefing is not a formality; it is part of the design.

How did you assess and manage the risk of psychological distress in your participants?

This applies beyond clinical research. Any study touching trauma, stigma, sensitive attitudes, or performance under pressure needs an account of distress monitoring and participant care. Examiners ask whether you built stopping rules or distress protocols into the procedure, not just whether you included a wellbeing statement in the information sheet. If no participants triggered the protocol, say so — that is relevant data on whether your risk assessment was accurate.

If your sample included a vulnerable or clinical population, how did you balance scientific validity with participant welfare?

The expected answer is not 'welfare always wins' or 'scientific rigour comes first'. Examiners want to see that you worked through the specific tensions in your study — whether inclusion criteria excluded participants who might have enriched the sample but faced higher burden, whether the control group was adequately protected, and whether the research question was worth the burden placed on that population. The ethics committee approval is the starting point, not the end of the analysis.

How did you ensure your consent procedures were adequate for your population?

Standard consent procedures are designed for competent adults who speak the language of the information sheet. Examiners probe consent more closely when your sample was younger, had cognitive impairments, was in an institutional context, or could not be fully informed in advance because of the deceptive elements of the study. Name what you adapted and why.

How to prepare specifically for psychology defense questions

Most of these questions can be answered well if you have genuinely thought through the methodological choices in your thesis rather than written them up from habit. The preparation that matters is not rehearsing answers — it is re-reading your methods and results chapters looking for every place where a choice was made without a fully articulated justification.

For each measure you used: know the validation evidence, the internal consistency in your data, any known criticisms in recent literature. For each analysis: know whether it was planned or exploratory, and what the effect sizes were. For the ethics: be able to describe the debriefing from memory.

The questions about the replication crisis are not traps. Examiners are not expecting you to have solved the problem. They are checking that you are a researcher who knows the methodological context of your field and can situate your own work within it honestly. That is a low bar to clear if you have been paying attention to the field.

UK viva candidates should also be prepared for examiners to stay on a single topic for fifteen or twenty minutes. The question count matters less than depth. US defense candidates tend to face more breadth, with the committee moving across topics more quickly. In both formats, the psychology-specific questions in this guide are more likely to come from methodologists on the committee than from the theorists — though theorists will push on the framework sections.

Frequently asked questions

How much do I need to know about the replication crisis to defend a psychology dissertation?

Enough to situate your own work. You need to know the general finding, that replication rates in social and cognitive psychology were substantially lower than expected, and you need to know the methodological factors associated with lower replicability: small samples, low power, p-value thresholds without effect size reporting, selective reporting. You do not need to have memorised every failed replication, but you should know whether any core effects your thesis builds on have a contested replication record.

My study was not pre-registered. Will examiners view that negatively?

Not automatically. Pre-registration became more common after 2015, but the majority of completed psychology dissertations are not pre-registered. What matters is whether you distinguish clearly between confirmatory and exploratory analyses in your write-up, and whether you present exploratory findings as hypothesis-generating rather than as confirmed effects. Examiners who care about open science practices will not penalise you for not pre-registering; they will penalise you for treating exploratory analyses as if they were confirmatory ones.

Can I say my sample size was limited by time and funding constraints?

Yes, and you should. Pragmatic constraints are real and examiners understand them. The key is to say explicitly what power your study had to detect the target effect size given those constraints, and to be precise about what that means for interpreting null results. If you were constrained to 60 participants per group when 120 were needed for 80% power, do not present 'no significant effect was found' as a clean negative result. Present it as a non-significant result from a study with insufficient power to detect the target effect.

An examiner asked about a measurement invariance test I never ran. What should I say?

Acknowledge the gap and say specifically what it means for your comparisons. If you compared construct means across groups without testing measurement invariance, the honest position is that you assumed invariance rather than tested it, and that a failure of scalar invariance would mean your group difference could reflect measurement non-equivalence rather than true differences in the construct. Offering to analyse this as a revision is appropriate if the comparison was central to your argument.

How do I handle a question about ethics if my study used deception and I had some participant distress?

Describe what happened factually: how many participants indicated distress, what your protocol was, what actually occurred, and whether any changes were made to the procedure as a result. Examiners are not looking for a clean outcome. They are looking for a researcher who monitored the study, responded appropriately, and can reflect on what the experience showed about the protocol's adequacy. Distress that was managed well is not a failing; distress that was not monitored is.

The MockDefense Committee

Doctoral defense preparation, MockDefense

MockDefense builds AI examiners that rehearse the questions a real doctoral committee asks — on methodology, contribution, and the gaps you haven't patched yet. Our guides are written from that examiner's-eye view of what defenses actually test.
