A (not too serious) look at 15 years of Oxford PPE results.
This is a prior, not a prediction.
Pick one paper per exam slot. Difficulty badges show volatility (hover for definition).
This adjusts the model's baseline — a higher self-assessment shifts all your expected paper marks up.
Percentile: 50th — shifts paper means by +0.0 marks
Enter marks on papers you have a feel for — the rest will be simulated. For example: "If Ethics and IR are average but Micro goes badly, what do the others need to look like?"
Browse and compare 79 papers across Philosophy, Politics, and Economics.
Each dot is a paper. Hover for details. Bubble size reflects average recent candidate numbers (2019–2025). X-axis: mean mark (higher = easier). Y-axis: standard deviation (higher = more volatile outcomes).
Click any paper above or browse below.
Select a paper from the scatter plot or the list below.
15 years of Oxford PPE results at a glance.
Percentage of candidates awarded a First, 2005–2025. The 2020 spike is COVID, not a cohort change.
First-class rate by gender, where data is available. The gap has persisted at roughly 8–10 percentage points.
Weighted averages across all papers in each subject.
Full distribution of results each year.
Mean marks for papers with statistically significant drift (p<0.05), coloured by subject. Most papers are stable — 7 of 64 show significant trends (6 rising, 1 falling). 2020 included (marking was normal; only classification was anomalous). 2023 has no per-paper data (marking boycott).
How each paper's share of all sittings has changed over time. Papers shown have a statistically significant linear trend (p<0.05) and more than 1pp total drift. Dot = average share in the first 3 years of data; arrowhead = average share in the most recent 3 years. 2023 has no data (marking boycott).
Each paper's mark distribution is modelled using a latent ability factor model. The idea: students have some underlying ability level, and each paper's marks reflect that ability plus paper-specific noise.
This page describes the statistical model behind the grade prior calculator. The approach uses a latent ability factor model fitted to 15 years of Oxford PPE examiners' report data, with Monte Carlo simulation to estimate classification probabilities.
Each paper's mark is modelled as a draw from:
$$\text{mark}_i = \mu_i + \lambda_i \cdot \theta + \varepsilon_i$$
The key idea is a variance decomposition: the total spread in marks on any paper ($\sigma_i^2$) is split into two components:

- a shared component, $\lambda_i^2$, reflecting how strongly the paper tracks the candidate's underlying ability, and
- a paper-specific noise component, $\sigma_i^2 - \lambda_i^2$, independent across papers.

The parameters:

- $\mu_i$: paper $i$'s mean mark
- $\lambda_i$: paper $i$'s loading on the latent ability factor
- $\theta \sim \mathcal{N}(0, 1)$: the candidate's latent ability, shared across all papers
- $\varepsilon_i \sim \mathcal{N}(0,\ \sigma_i^2 - \lambda_i^2)$: paper-specific noise
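Under one natural parameterisation (a uniform inter-paper correlation, so $\lambda_i = \sqrt{\rho}\,\sigma_i$; the paper means and SDs below are made up for illustration, not the fitted values), the generative model can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters, not the fitted values.
mu = np.array([66.0, 64.5, 67.2])    # paper means mu_i (hypothetical)
sigma = np.array([4.0, 5.5, 3.8])    # total SDs sigma_i (hypothetical)
rho = 0.196                          # inter-paper correlation from the text

# With a uniform correlation, lambda_i = sqrt(rho) * sigma_i, so that
# lambda_i * lambda_j / (sigma_i * sigma_j) = rho for every pair of papers.
lam = np.sqrt(rho) * sigma
eps_sd = np.sqrt(sigma**2 - lam**2)  # paper-specific noise SD

def draw_marks(theta, n=1):
    """Draw n sittings for a candidate with latent ability theta."""
    eps = rng.normal(0.0, eps_sd, size=(n, len(mu)))
    marks = mu + lam * theta + eps
    return np.clip(marks, 0.0, 100.0)  # crude stand-in for proper truncation
```

A higher-ability candidate (`theta > 0`) shifts every paper's conditional mean up by $\lambda_i \theta$, which is what couples marks across papers.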
Marks are bounded in $[0, 100]$ and typically clustered in the 55–75 range. An ordinary normal would place implausible probability mass below 0 or above 100. The truncated normal respects these bounds and better fits the observed compression of marks near class boundaries.
The examiners' reports give band counts: for each paper, the number of candidates scoring 70+, 60–69, 50–59, 40–49, 30–39, and <30. We fit a normal $\mathcal{N}(\mu_i, \sigma_i^2)$ truncated to $[0, 100]$ by maximising the multinomial log-likelihood of these bin counts. Data is pooled across all years with per-paper band data: 2017–2019, 2021–2022, and 2024–2025 (2020 is excluded; 2023 has none).
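A sketch of the fitting step, assuming band counts ordered from <30 up to 70+ (the function name, starting values, and optimiser choice are illustrative, not the project's actual code):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Band edges from the reports: <30, 30-39, 40-49, 50-59, 60-69, 70+.
EDGES = np.array([0.0, 30.0, 40.0, 50.0, 60.0, 70.0, 100.0])

def neg_log_lik(params, counts):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)              # keep sigma positive
    cdf = norm.cdf((EDGES - mu) / sigma)
    # Bin probabilities of the normal truncated to [0, 100].
    mass = (cdf[1:] - cdf[:-1]) / (cdf[-1] - cdf[0])
    return -np.sum(counts * np.log(mass + 1e-12))

def fit_band_counts(counts):
    """MLE of a [0, 100]-truncated normal from band counts (low to high)."""
    res = minimize(neg_log_lik, x0=[65.0, np.log(5.0)],
                   args=(np.asarray(counts, dtype=float),),
                   method="Nelder-Mead")
    mu, log_sigma = res.x
    return mu, np.exp(log_sigma)
```

Parameterising by $\log\sigma$ keeps the search unconstrained; the small constant inside the log guards against empty bands with zero predicted mass.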
The inter-paper correlation $\rho \approx 0.196$ is calibrated so that the model reproduces the observed ~23% first-class rate when averaged across the full ability distribution (integrating over $\theta \sim \mathcal{N}(0,1)$). This was done via binary search on 500k simulations with the 8 most popular papers.
Note that at $\theta = 0$ specifically (the median student), the First rate is only ~11%. The population average is pulled up by the right tail — analogous to how mean income exceeds median income.
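The calibration can be sketched as a binary search on $\rho$, here with made-up (identical) parameters for the 8 papers and the conjunctive First conditions from the classification section:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical parameters for the 8 most popular papers (not the fitted values).
mu = np.full(8, 66.5)
sigma = np.full(8, 5.0)

def first_rate(rho, n=100_000):
    """Population First rate under correlation rho, theta ~ N(0, 1)."""
    lam = np.sqrt(rho) * sigma
    eps_sd = np.sqrt(sigma**2 - lam**2)
    theta = rng.normal(size=(n, 1))                  # one ability per candidate
    marks = np.clip(mu + lam * theta + rng.normal(0.0, eps_sd, (n, 8)), 0, 100)
    avg = marks.mean(axis=1)
    return float(((avg >= 68.5)
                  & ((marks >= 70).sum(axis=1) >= 2)
                  & (marks.min(axis=1) >= 50)).mean())

def calibrate(target=0.23, lo=0.0, hi=0.9, iters=20):
    """Binary search on rho; the First rate rises with rho (fatter right tail)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if first_rate(mid) < target else (lo, mid)
    return (lo + hi) / 2
```

Higher $\rho$ inflates the variance of the 8-paper average, $\sigma^2(\rho + (1-\rho)/8)$, so more of the population clears the 68.5 average threshold; that monotonicity is what makes binary search valid.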
PPE uses conjunctive classification — candidates need both an average threshold and a minimum count of papers at the relevant mark level:
| Class | Average ≥ | Additional requirement |
|---|---|---|
| 1st | 68.5 | ≥ 2 marks of 70+, no mark below 50 |
| 2.1 | 59.0 | ≥ 3 marks of 60+ |
| 2.2 | 49.0 | ≥ 3 marks of 50+ |
| 3rd | 40.0 | ≥ 3 marks of 40+ |
| Pass | 30.0 | — |
These conjunctive rules create non-linear interactions with paper variance — for instance, a single mark below 50 vetoes a First regardless of average, making high-$\sigma$ papers risky even when their mean is above 70.
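The rules in the table can be expressed as a top-down check, returning the first class whose conditions are all met (a sketch; the below-Pass label is an assumption, and the actual regulations may include details not captured here):

```python
def classify(marks):
    """Apply the conjunctive PPE classification rules to a list of 8 marks."""
    marks = list(marks)
    avg = sum(marks) / len(marks)
    if avg >= 68.5 and sum(m >= 70 for m in marks) >= 2 and min(marks) >= 50:
        return "1st"
    if avg >= 59.0 and sum(m >= 60 for m in marks) >= 3:
        return "2.1"
    if avg >= 49.0 and sum(m >= 50 for m in marks) >= 3:
        return "2.2"
    if avg >= 40.0 and sum(m >= 40 for m in marks) >= 3:
        return "3rd"
    if avg >= 30.0:
        return "Pass"
    return "Fail"  # below every threshold in the table (assumed label)
```

The sub-50 veto is visible here: a candidate averaging 69.1 with seven marks of 72 and one of 49 drops from a First to a 2.1.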
For a given set of 8 papers and ability percentile, the tool draws $N = 50{,}000$ independent exam sittings. Each draw holds $\theta$ fixed at the chosen percentile of $\mathcal{N}(0, 1)$, samples paper-specific noise $\varepsilon_i$ for each paper, forms the marks $\mu_i + \lambda_i \theta + \varepsilon_i$ truncated to $[0, 100]$ (papers with user-entered marks keep those marks), and applies the classification rules above.
The reported probability for each class is the empirical frequency across all $N$ draws. Uncertainty from finite simulation is negligible ($\lt 0.1\text{pp}$ at $N = 50{,}000$); the reported $\pm 3\text{pp}$ uncertainty reflects model limitations rather than Monte Carlo error.
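Putting the pieces together, the simulation can be sketched as follows (identical hypothetical papers; only the First and 2.1 checks are shown, and truncation is applied to the conditional-on-$\theta$ distribution as a simplification):

```python
import numpy as np
from scipy.stats import norm, truncnorm

rng = np.random.default_rng(1)

# Hypothetical parameters for 8 papers; rho from the text.
mu = np.full(8, 66.0)
sigma = np.full(8, 5.0)
rho = 0.196
lam = np.sqrt(rho) * sigma
eps_sd = np.sqrt(sigma**2 - lam**2)

def class_probs(percentile, n=50_000):
    """Empirical class frequencies at a given ability percentile."""
    theta = norm.ppf(percentile / 100.0)
    cond_mu = mu + lam * theta                     # per-paper mean given theta
    a = (0.0 - cond_mu) / eps_sd                   # truncation bounds in
    b = (100.0 - cond_mu) / eps_sd                 # standardised units
    marks = truncnorm.rvs(a, b, loc=cond_mu, scale=eps_sd,
                          size=(n, 8), random_state=rng)
    avg = marks.mean(axis=1)
    first = ((avg >= 68.5)
             & ((marks >= 70).sum(axis=1) >= 2)
             & (marks.min(axis=1) >= 50))
    upper = ~first & (avg >= 59.0) & ((marks >= 60).sum(axis=1) >= 3)
    return {"1st": float(first.mean()), "2.1": float(upper.mean())}
```

Because marks are independent given $\theta$, each sitting is a single vectorised draw; the reported probabilities are just row-wise frequencies over the $n$ sittings.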
Source data is extracted from Oxford PPE Final Honour School internal examiners' reports, 2011–2025. Mark distributions come from band data: the percentage of candidates achieving each class-level mark on each paper. 79 papers are fitted in total — 63 via MLE on band data, 16 via method-of-moments from reported mean and standard deviation.
Two years receive special treatment:

- 2020: marking was normal but classification was anomalous (the COVID spike in Firsts), so the year appears in per-paper trend charts yet is excluded from the pooled distribution fits and from the first-class-rate calibration.
- 2023: the marking boycott means no per-paper data exists, so the year is absent from all paper-level analyses.