A (not too serious) look at 20 years of Oxford PPE outcomes.
This is a prior, not a prediction.
Pick one paper per exam slot. Difficulty badges show spread (hover for definition).
This adjusts the model's baseline — a higher self-assessment shifts all your expected paper marks up.
Enter marks on papers you have a feel for — the rest will be simulated. For example: "If Ethics and IR are average but Micro goes badly, what do the others need to look like?"
Browse and compare 79 papers across Philosophy, Politics, and Economics.
Each dot is a paper, sized in proportion to candidate numbers; ✕ markers indicate papers with limited data. Hover for details, click to see a full profile.
Click any paper above or browse below.
Select a paper from the scatter plot or the list below.
Oxford PPE results at a glance — drawn from 15 examiner reports (2011–2025), with candidate numbers back to 2005.
Percentage of candidates awarded a First, 2005–2025. The 2020 spike is COVID, not a cohort change.
First-class rate by gender where data is available. The gap has been persistent at roughly 8–10 percentage points.
Weighted averages across all papers in each subject.
Full distribution of results each year.
Mean marks for papers with statistically significant drift (p<0.05), coloured by subject. Most papers are stable — 7 of 64 show significant trends (6 rising, 1 falling). 2020 included (marking was normal; only classification was anomalous). 2023 has no per-paper data (marking boycott).
How each paper's share of all sittings has changed over time. Papers shown have a statistically significant linear trend (p<0.05) and more than 1pp total drift. Dot = average share in the first 3 years of data; arrowhead = average share in the most recent 3 years. 2023 has no data (marking boycott).
Each paper's mark distribution is modelled using a latent ability factor model. The idea: students have some underlying ability level, and each paper's marks reflect that ability plus paper-specific noise.
This page describes the statistical model behind the grade prior calculator. The approach uses a latent ability factor model fitted to 15 years of Oxford PPE examiners' reports (2011–2025, covering candidates going back to 2005), with Monte Carlo simulation to estimate classification probabilities.
Your mark on any particular paper can be modelled as a function of your underlying ability, the paper's own mean and spread, and paper-specific random noise.
In expectation, a more academically-capable student will score higher across all their papers, but there's still randomness (e.g., from question selection, marker idiosyncrasies, and luck on the day).
Our model treats ability as a single shared parameter $\theta$, and then adds on independent random noise for each paper. In reality, there's more than one latent variable which affects marks (e.g., plausibly there are distinct features like "philosophy ability", "politics ability", and "economics ability"), but we don't have sufficiently granular data to reliably estimate these.
Since marks are bounded between 0 and 100 (and in practice cluster in the 55–75 range), we use a truncated normal distribution when fitting paper parameters. This prevents the model from placing probability mass on impossible marks like −5 or 110, and better fits the compression of marks near the boundaries of the scale.
Each paper's mark is modelled as:
$$\text{mark}_i = \mu_i + \sigma_i \sqrt{\rho} \cdot \theta + \varepsilon_i$$
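where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of marks on paper $i$, $\rho$ is the inter-paper correlation, $\theta \sim \mathcal{N}(0,1)$ is the student's latent ability, and $\varepsilon_i$ is independent paper-specific noise with variance $(1-\rho)\,\sigma_i^2$.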
The $\sigma_i \sqrt{\rho}$ term means that ability matters more on high-spread papers — i.e., they're more discriminating.
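The total variance in marks on paper $i$ is $\sigma_i^2$. The model splits this into a shared component $\rho\,\sigma_i^2$ driven by ability and a paper-specific component $(1-\rho)\,\sigma_i^2$ from noise.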
So, doing well on one paper is (weak) Bayesian evidence that your $\theta$ is high, which in turn predicts slightly higher marks on your other papers.
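To make the decomposition and the updating step concrete, here is a short derivation. It assumes $\theta \sim \mathcal{N}(0,1)$, noise terms $\varepsilon_i$ that are independent of $\theta$ and of each other, and it ignores truncation at 0 and 100. Under those assumptions each $\varepsilon_i$ must have variance $(1-\rho)\,\sigma_i^2$ for the total to come out at $\sigma_i^2$:

$$\operatorname{Var}(\text{mark}_i) = \underbrace{\rho\,\sigma_i^2}_{\text{shared ability}} + \underbrace{(1-\rho)\,\sigma_i^2}_{\text{paper-specific noise}} = \sigma_i^2$$

$$\operatorname{Cov}(\text{mark}_i, \text{mark}_j) = \rho\,\sigma_i\,\sigma_j \quad\Rightarrow\quad \operatorname{Corr}(\text{mark}_i, \text{mark}_j) = \rho \qquad (i \neq j)$$

$$\mathbb{E}\left[\text{mark}_j \mid \text{mark}_i = m\right] = \mu_j + \rho\,\sigma_j\,\frac{m - \mu_i}{\sigma_i}$$

With $\rho \approx 0.196$, a mark one standard deviation above a paper's mean raises the expected mark on each other paper by roughly a fifth of that paper's standard deviation.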
The examiners' reports give band counts: for each paper, the number of candidates scoring 70+, 60–69, 50–59, 40–49, 30–39, and <30. We fit a truncated normal $\mathcal{N}(\mu_i, \sigma_i^2)$ truncated to $[0, 100]$ by maximising the multinomial log-likelihood of these bin counts. Band data is available from 2017 onwards and is pooled across the years where it exists (2017–2019, 2021–2022, and 2024–2025); 2020 is excluded.
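A minimal sketch of this kind of fit, in Python with only the standard library. The function names, the starting values, the coarse-to-fine grid search (standing in for whatever optimiser the tool actually uses), and the treatment of band boundaries as continuous cut points at 30, 40, 50, 60, and 70 are all assumptions made for illustration.

```python
import math

# Band edges for <30, 30-39, 40-49, 50-59, 60-69, 70+ (assumed continuous cut points).
EDGES = [0.0, 30.0, 40.0, 50.0, 60.0, 70.0, 100.0]


def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def band_probs(mu, sigma):
    """Band probabilities under N(mu, sigma^2) truncated to [0, 100]."""
    cdf = [norm_cdf((e - mu) / sigma) for e in EDGES]
    total = cdf[-1] - cdf[0]  # probability mass inside [0, 100]
    return [(cdf[k + 1] - cdf[k]) / total for k in range(len(EDGES) - 1)]


def log_lik(mu, sigma, counts):
    """Multinomial log-likelihood of the band counts (up to a constant)."""
    ll = 0.0
    for n, p in zip(counts, band_probs(mu, sigma)):
        if n > 0:
            ll += n * math.log(max(p, 1e-300))  # guard against log(0)
    return ll


def fit_bands(counts, passes=6):
    """Maximise the log-likelihood with a coarse-to-fine grid search."""
    best_mu, best_sigma = 60.0, 10.0  # hypothetical starting point
    mu_step, sigma_step = 5.0, 2.0
    for _ in range(passes):
        candidates = [
            (best_mu + i * mu_step, best_sigma + j * sigma_step)
            for i in range(-5, 6)
            for j in range(-5, 6)
            if best_sigma + j * sigma_step > 0.5
        ]
        best_mu, best_sigma = max(
            candidates, key=lambda ms: log_lik(ms[0], ms[1], counts)
        )
        mu_step /= 4.0
        sigma_step /= 4.0
    return best_mu, best_sigma


if __name__ == "__main__":
    # Hypothetical counts, ordered <30, 30-39, 40-49, 50-59, 60-69, 70+.
    print(fit_bands([0, 1, 4, 30, 120, 45]))
```

Because only six band counts are observed, the fit is driven mostly by the proportions in the 60–69 and 70+ bands; sparse lower bands contribute little to the likelihood.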
For earlier years (2011–2016), examiners' reports provide only the mean and standard deviation for each paper rather than full band counts. For these we use method-of-moments — i.e., simply setting $\mu_i$ and $\sigma_i$ equal to the observed sample mean and standard deviation.
In total, 63 papers are fitted by MLE on band data and 16 by method-of-moments from reported summary statistics.
The inter-paper correlation $\rho \approx 0.196$ is calibrated so that the model reproduces the observed ~23% first-class rate when averaged across the full ability distribution (integrating over $\theta \sim \mathcal{N}(0,1)$). This was done via binary search on 500k simulations with the 8 most popular papers.
Note that at $\theta = 0$ (the median student), the First rate is only ~11%. The population average is pulled up by the right tail — analogous to how mean income exceeds median income.
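The gap between the median student and the population average can be illustrated with a simplified calculation. The sketch below is an assumption-laden toy, not the tool's calibration: it uses only the average-mark criterion (68.5), ignores the extra requirements and truncation, and uses hypothetical paper parameters ($\mu = 66$, $\sigma = 7$ for all 8 papers) together with the $\rho \approx 0.196$ quoted above.

```python
import math


def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_avg_first(theta, mu, sigma, rho, n_papers=8, threshold=68.5):
    """P(average mark >= threshold | theta), average-mark criterion only."""
    mean_avg = mu + sigma * math.sqrt(rho) * theta
    sd_avg = sigma * math.sqrt((1.0 - rho) / n_papers)  # noise averages out
    return 1.0 - norm_cdf((threshold - mean_avg) / sd_avg)


def population_rate(mu, sigma, rho, steps=4001, width=6.0):
    """Average p_avg_first over theta ~ N(0, 1) with a simple Riemann sum."""
    h = 2.0 * width / (steps - 1)
    total = 0.0
    for k in range(steps):
        t = -width + k * h
        density = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
        total += p_avg_first(t, mu, sigma, rho) * density * h
    return total


# Hypothetical parameters: every paper has mean 66 and standard deviation 7.
MU, SIGMA, RHO = 66.0, 7.0, 0.196
at_median = p_avg_first(0.0, MU, SIGMA, RHO)
averaged = population_rate(MU, SIGMA, RHO)

if __name__ == "__main__":
    print(f"theta = 0: {at_median:.3f}, population average: {averaged:.3f}")
```

With these made-up parameters the median-student figure comes out at roughly half the population average, the same qualitative pattern as the ~11% versus ~23% quoted above: students well above the median contribute far more Firsts than students well below it take away.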
For reference, the classification rules given in the examination conventions are reproduced below:
| Class | Average ≥ | Additional requirement |
|---|---|---|
| 1st | 68.5 | ≥ 2 marks of 70+, no mark below 50 |
| 2.1 | 59.0 | ≥ 3 marks of 60+ |
| 2.2 | 49.0 | ≥ 3 marks of 50+ |
| 3rd | 40.0 | ≥ 3 marks of 40+ |
| Pass | 30.0 | — |
Because the rules are conjunctive, high-variance papers introduce additional risk. For instance, a single mark below 50 blocks a First regardless of average, making high-$\sigma$ papers risky even when their mean is above 70.
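The table can be written as a small function. This is a sketch under two assumptions that the table does not state: classes are tested from the top down with the first match awarded, and anything below a 30.0 average is labelled "Fail". The function name `classify` is hypothetical.

```python
def classify(marks):
    """Return the class for a list of paper marks, following the table above.

    Assumptions: classes are tested from the top down and the first one whose
    conditions all hold is awarded; an average under 30.0 is labelled "Fail".
    """
    avg = sum(marks) / len(marks)

    def at_least(bar):
        # Number of marks at or above the given bar.
        return sum(1 for m in marks if m >= bar)

    if avg >= 68.5 and at_least(70) >= 2 and min(marks) >= 50:
        return "1st"
    if avg >= 59.0 and at_least(60) >= 3:
        return "2.1"
    if avg >= 49.0 and at_least(50) >= 3:
        return "2.2"
    if avg >= 40.0 and at_least(40) >= 3:
        return "3rd"
    if avg >= 30.0:
        return "Pass"
    return "Fail"


steady = [72, 71, 68, 67, 69, 66, 70, 68]  # average 68.875, three marks of 70+
spiky = [75, 75, 75, 75, 75, 75, 75, 49]   # average 71.75, but one mark below 50

if __name__ == "__main__":
    print(classify(steady))  # 1st
    print(classify(spiky))   # 2.1
```

The second candidate has the higher average but drops to a 2.1 because of the single mark of 49, which is the risk the conjunctive rules create.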
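For a given set of 8 papers and ability percentile, the tool draws $N = 50{,}000$ independent exam sittings. Each draw generates a mark for each of the 8 papers from the model above and applies the classification rules to the resulting set of marks.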
The reported probability for each class is the empirical frequency across all $N$ draws. Uncertainty from finite simulation is small (a binomial standard error of at most about $0.2\text{pp}$ at $N = 50{,}000$); the reported $\pm 3\text{pp}$ uncertainty reflects model limitations rather than Monte Carlo error.
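A minimal sketch of such a simulation, using only the Python standard library. Several details are assumptions rather than the tool's actual behaviour: papers are passed as hypothetical `(mu, sigma)` pairs, $\theta$ is drawn from $\mathcal{N}(0,1)$ unless a fixed value is supplied (how the tool maps an ability percentile to $\theta$ is not specified here), marks are clipped to $[0, 100]$ as a stand-in for truncation, and marks are not rounded. The compact `classify` restates the table of rules so that the block runs on its own.

```python
import math
import random
from collections import Counter

# (class, minimum average, mark bar, marks needed at the bar, floor on every mark)
RULES = [
    ("1st", 68.5, 70, 2, 50),
    ("2.1", 59.0, 60, 3, None),
    ("2.2", 49.0, 50, 3, None),
    ("3rd", 40.0, 40, 3, None),
]


def classify(marks):
    """Apply the rules from the top down (assumed order of testing)."""
    avg = sum(marks) / len(marks)
    for name, min_avg, bar, need, floor in RULES:
        if avg < min_avg:
            continue
        if sum(m >= bar for m in marks) < need:
            continue
        if floor is not None and min(marks) < floor:
            continue
        return name
    return "Pass" if avg >= 30.0 else "Fail"


def simulate(papers, rho, n_draws=50_000, theta=None, seed=0):
    """Estimate class probabilities as frequencies over simulated sittings."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_draws):
        # Shared ability: random across the population, or fixed by the caller.
        t = rng.gauss(0.0, 1.0) if theta is None else theta
        marks = []
        for mu, sigma in papers:
            eps = rng.gauss(0.0, sigma * math.sqrt(1.0 - rho))
            mark = mu + sigma * math.sqrt(rho) * t + eps
            marks.append(min(100.0, max(0.0, mark)))  # clip to the mark scale
        counts[classify(marks)] += 1
    return {name: n / n_draws for name, n in counts.items()}


if __name__ == "__main__":
    papers = [(66.0, 7.0)] * 8  # hypothetical paper parameters
    print(simulate(papers, rho=0.196))
```

Passing `theta=0.0` gives the outcome distribution for a median student, while leaving `theta` unset averages over the whole ability distribution.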
Source data is extracted from Oxford PPE Final Honour School internal examiners' reports, 2011–2025. Mark distributions come from band data (2017+) or reported summary statistics (2011–2016). 79 papers are fitted in total.
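Two years are worth special mention: 2020, when COVID produced an anomalous spike in Firsts although marking itself was normal, and 2023, when a marking boycott left no per-paper data.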