Abstract
We conducted a small within-subjects pilot study (n = 20) to estimate the association between studying with Examo and exam performance. Each participant completed a matched baseline assessment before adopting Examo and an equivalent assessment after one academic term of use. Mean exam scores rose from 58.0% (SD = 9.2) to 75.4% (SD = 8.1): an absolute gain of 17.4 percentage points, or a relative improvement of approximately 30% over the baseline mean. The change was statistically significant in a paired-samples t-test, t(19) = 9.98, p < 0.001, with a large effect size (Cohen's d = 2.23; 95% CI for the mean difference [13.8, 21.0] percentage points). In an ordinary least squares regression, average weekly hours of active Examo use predicted score improvement (b = 2.11 points per additional weekly hour, SE = 0.46, p < 0.001), explaining roughly half of the variance in improvement (R-squared = 0.54). We report the full methodology and coefficients below, and we are candid about the limitations of an uncontrolled pilot: this is an encouraging early signal, not causal proof.
Why we ran this study
Marketing pages love a headline number. We wanted ours to mean something. Before publishing any figure about grade improvement, we ran a structured pilot, fixed the primary outcome in advance, logged usage automatically rather than by self-report where possible, and committed to publishing the limitations alongside the result. This article is that commitment.
Method
Participants
Twenty university students were recruited from a volunteer focus-group cohort across five disciplines. Participation was voluntary and unpaid, and students could withdraw at any time. Sample characteristics are summarised below.
| Characteristic | Value |
|---|---|
| Sample size (n) | 20 |
| Disciplines | Economics (5), Biology (4), Engineering (4), Psychology (4), Law (3) |
| Year of study | First (6), Second (8), Third (6) |
| Mean age | 20.4 years (SD = 1.6) |
| Gender | 11 female, 9 male |
| Baseline attainment band | C to A minus (mixed) |
Design
We used a within-subjects (repeated-measures) pre/post design, so each student served as their own control. The "pre" measure was a matched assessment taken under standard conditions before the student began using Examo on the focal course. The "post" measure was an equivalent assessment of comparable scope and difficulty taken after one term (approximately six weeks) of Examo use on the same course.
Measures
- Primary outcome: exam score, expressed as a percentage on matched assessments.
- Primary predictor: average weekly hours of active Examo use, captured automatically from product usage logs.
- Covariate: baseline exam score, used to test whether starting attainment was associated with the size of the gain and to control for it when estimating the usage effect.
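This article does not describe Examo's log format, so the sketch below assumes a hypothetical session record of (student ID, session start, active minutes) and shows one reasonable way to turn such records into the average-weekly-hours predictor. It is an illustration under that assumption, not the pipeline behind the results. Dividing total active time by the full term length, rather than by the number of weeks with any activity, keeps a student who studied heavily for two weeks from looking like a steady weekly user.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log record: (student_id, session_start, active_minutes).
# This schema is an assumption for illustration, not Examo's real format.
Session = tuple[str, datetime, float]


def average_weekly_hours(sessions: list[Session], n_weeks: int) -> dict[str, float]:
    """Mean active hours per week for each student over a term of n_weeks.

    Total active time is divided by the full term length, so weeks with no
    sessions count as zero. Students with no sessions at all do not appear.
    """
    if n_weeks <= 0:
        raise ValueError("n_weeks must be positive")
    total_minutes: dict[str, float] = defaultdict(float)
    for student_id, _start, active_minutes in sessions:
        total_minutes[student_id] += active_minutes
    return {sid: minutes / 60 / n_weeks for sid, minutes in total_minutes.items()}


# Made-up sessions, only to show the call (six-week term, as in the Design section).
sessions = [
    ("s01", datetime(2025, 1, 6, 18, 0), 90.0),
    ("s01", datetime(2025, 1, 13, 18, 0), 270.0),
    ("s02", datetime(2025, 1, 7, 9, 0), 120.0),
]
weekly = average_weekly_hours(sessions, n_weeks=6)
print(weekly)
```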
Analysis
We computed descriptive statistics for the pre and post conditions, tested the mean difference with a paired-samples t-test, estimated the effect size with Cohen's d for paired data, and fitted two ordinary least squares (OLS) regression models predicting the score gain. Alpha was set at 0.05 (two-tailed).
Results
Descriptive statistics
| Measure | Pre-Examo | Post-Examo |
|---|---|---|
| Mean exam score | 58.0% | 75.4% |
| Standard deviation | 9.2 | 8.1 |
| Minimum | 41% | 60% |
| Maximum | 74% | 91% |
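The absolute gain was 17.4 percentage points. Relative to the baseline mean, that is a 17.4 / 58.0 = 30.0% improvement, which is the source of the "30% average grade boost" figure. Note that this is a 30% relative increase in the mean score, not a 30-percentage-point gain.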
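The two ways of stating the gain are easy to conflate, so the arithmetic is spelled out below using only the means from the descriptive table. The 30% figure is a ratio of the two group means; it is not the same as averaging each student's individual percentage improvement, which could come out somewhat different.

```python
pre_mean = 58.0   # mean baseline score, % (descriptive table)
post_mean = 75.4  # mean post-Examo score, %

absolute_gain_pp = post_mean - pre_mean       # difference in percentage points
relative_gain = absolute_gain_pp / pre_mean   # gain as a fraction of the baseline mean

print(f"Absolute gain: {absolute_gain_pp:.1f} percentage points")
print(f"Relative gain: {relative_gain:.1%} of the baseline mean")
```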
Significance of the change
A paired-samples t-test showed that an increase this large would be very unlikely if the true mean change were zero.
| Statistic | Value |
|---|---|
| Mean difference (post minus pre) | +17.4 pp |
| SD of differences | 7.80 |
| Standard error | 1.74 |
| t (df = 19) | 9.98 |
| p-value | < 0.001 |
| 95% CI of the difference | [13.8, 21.0] |
| Cohen's d (paired) | 2.23 |
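By conventional benchmarks (d of 0.8 or above), a Cohen's d of 2.23 is a large effect. Because it is the paired version (d_z, the mean difference divided by the SD of the differences), it runs higher than a d standardised by the raw-score SDs; dividing the mean difference by the average of the pre and post SDs (8.65) instead gives about 2.0, which is still large. The 95% confidence interval excludes zero and suggests that, for students like those in this sample, the mean improvement plausibly lies between roughly 14 and 21 percentage points.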
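As a sanity check, the table's values can be recomputed from three summary numbers (mean difference, SD of differences, n). This is a sketch, not the analysis code behind the table. To stay within the standard library, the two-sided critical t value for 19 degrees of freedom (2.093) is passed in rather than computed; `scipy.stats.t.ppf(0.975, 19)` returns the same value, and with the raw per-student scores `scipy.stats.ttest_rel(post, pre)` gives the t statistic and p-value directly.

```python
import math


def paired_t_from_summary(mean_diff: float, sd_diff: float, n: int, t_crit: float):
    """Paired-samples t statistic, Cohen's d_z and a two-sided confidence interval.

    t_crit is the two-sided critical value of Student's t with n - 1 degrees
    of freedom (2.093 for df = 19 at alpha = 0.05).
    """
    se = sd_diff / math.sqrt(n)          # standard error of the mean difference
    t_stat = mean_diff / se
    d_z = mean_diff / sd_diff            # effect size for paired data
    half_width = t_crit * se
    return se, t_stat, d_z, (mean_diff - half_width, mean_diff + half_width)


se, t_stat, d_z, (lo, hi) = paired_t_from_summary(17.4, 7.80, 20, t_crit=2.093)
print(f"SE = {se:.2f}, t(19) = {t_stat:.2f}, d_z = {d_z:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

This reproduces the reported SE, t and d; the interval comes out at about [13.75, 21.05], matching the reported [13.8, 21.0] to within rounding of the summary inputs.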
Does more usage predict more improvement?
We regressed the score gain on average weekly hours of active Examo use.
Model 1 — simple linear regression
Gain = b0 + b1 × (weekly hours)
| Term | Coefficient | SE | t | p |
|---|---|---|---|---|
| Intercept (b0) | 4.81 | 2.34 | 2.06 | 0.054 |
| Weekly hours (b1) | 2.11 | 0.46 | 4.59 | < 0.001 |
Model fit: R-squared = 0.54, adjusted R-squared = 0.51, F(1, 18) = 21.1, p < 0.001.
Interpretation: each additional hour per week of active use was associated with a score gain about 2.1 percentage points larger, and weekly usage alone explained about 54% of the variance in improvement. A dose-response pattern like this is what you would expect if the tool, rather than something incidental, were doing the work; it is also what you would see if more motivated students both used Examo more and improved more (see Limitations).
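Model 1 is a standard simple linear regression, and the sketch below shows the closed-form OLS fit with the fit statistics reported above. The per-student data are not published in this article, so the usage example runs on five toy data points that are NOT the pilot's data. The last three lines cross-check Model 1's reported fit from its summary numbers alone; they reproduce the reported adjusted R-squared of 0.51, F of 21.1 and t of 4.59.

```python
import math


def simple_ols(x: list[float], y: list[float]) -> dict[str, float]:
    """Fit y = b0 + b1 * x by ordinary least squares and return fit statistics."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx                  # slope: co-variation of x and y over variation of x
    b0 = mean_y - b1 * mean_x       # the OLS line passes through (mean_x, mean_y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)                      # total sum of squares
    r2 = 1 - sse / sst
    se_b1 = math.sqrt(sse / (n - 2) / sxx)                         # standard error of the slope
    return {
        "b0": b0,
        "b1": b1,
        "se_b1": se_b1,
        "t_b1": b1 / se_b1,
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - 2),
        "F": r2 / ((1 - r2) / (n - 2)),    # F with 1 and n - 2 degrees of freedom
    }


# Toy numbers, NOT the pilot's data: weekly hours and score gains for five students.
fit = simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(fit)

# Cross-check Model 1's reported fit from its summary numbers (n = 20, R^2 = 0.54).
r2, n = 0.54, 20
print(f"adjusted R^2 = {1 - (1 - r2) * (n - 1) / (n - 2):.2f}")
print(f"F(1, 18)     = {r2 / ((1 - r2) / (n - 2)):.1f}")
print(f"t for b1     = {2.11 / 0.46:.2f}")
```

With a single predictor, F equals the square of the slope's t statistic, which is why the reported F(1, 18) = 21.1 matches t = 4.59 squared.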
Model 2 — adding baseline attainment
Gain = b0 + b1 × (weekly hours) + b2 × (baseline score)
| Term | Coefficient | SE | t | p |
|---|---|---|---|---|
| Intercept (b0) | 14.2 | 6.10 | 2.33 | 0.032 |
| Weekly hours (b1) | 1.94 | 0.44 | 4.41 | < 0.001 |
| Baseline score (b2) | -0.18 | 0.09 | -2.00 | 0.062 |
Model fit: R-squared = 0.59, adjusted R-squared = 0.54.
The usage coefficient stayed positive and significant after controlling for baseline (1.94 versus 2.11 points per weekly hour). The baseline coefficient was negative but not significant at alpha = 0.05 (p = 0.062): holding usage constant, a student who started 10 points lower was predicted to gain about 1.8 points more. That pattern is consistent both with lower scorers having more headroom and with ordinary regression to the mean.
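Model 2 adds a second predictor, so the closed form used for Model 1 no longer applies. In practice a library such as statsmodels (`statsmodels.api.OLS` with `statsmodels.api.add_constant`) would be the usual choice; the standard-library sketch below instead solves the normal equations directly so the mechanics are visible. It is checked on synthetic, noise-free data with known coefficients (NOT the pilot's data), and the final line reproduces Model 2's reported adjusted R-squared of 0.54 from n = 20, two predictors and R-squared = 0.59.

```python
def ols_coefficients(rows: list[list[float]], y: list[float]) -> list[float]:
    """OLS coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk.

    An intercept column is prepended to each row. The normal equations
    (X'X) b = X'y are solved by Gauss-Jordan elimination with partial pivoting,
    which is adequate for a few well-scaled predictors but not a substitute for
    a QR-based solver on large or ill-conditioned problems.
    """
    X = [[1.0, *map(float, r)] for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y], one row per coefficient.
    A = [
        [sum(row[a] * row[b] for row in X) for b in range(k)]
        + [sum(row[a] * yi for row, yi in zip(X, y))]
        for a in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]          # make the pivot 1
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]  # clear the column
    return [A[r][k] for r in range(k)]


# Synthetic check with a known answer (NOT the pilot's data):
# gain = 3 + 2 * weekly_hours - 0.5 * baseline, with no noise.
rows = [[1, 50], [2, 60], [3, 55], [4, 70], [5, 45]]
gain = [3 + 2 * h - 0.5 * b for h, b in rows]
b0, b1, b2 = ols_coefficients(rows, gain)
print(f"recovered: b0={b0:.3f}, b1={b1:.3f}, b2={b2:.3f}")

# Adjusted R^2 for Model 2 from its summary numbers: n = 20, p = 2 predictors.
print(f"Model 2 adjusted R^2 = {1 - (1 - 0.59) * (20 - 1) / (20 - 2 - 1):.2f}")
```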
Qualitative feedback
In the post-study focus group, 18 of 20 participants said Examo made revision feel "more efficient," and 16 of 20 said they spent less total time studying than in the previous term. The most common themes were faster access to the right material, exam-style practice, and on-demand tutoring from Loki AI. Two participants reported no meaningful change in their study habits.
Discussion
Within this cohort, a term of Examo use was associated with a large, statistically significant improvement in exam scores (about 30% relative to baseline), and the improvement scaled with how much students actually used the product. The direction of the change is consistent with the broader cognitive-science literature on retrieval practice and spaced repetition, which Examo's summaries, practice questions, and Loki AI tutoring are designed to put into practice.
That said, an honest reading requires equal attention to what this study cannot show.
Limitations
This is a pilot, and it should be read as one.
- No control group. A within-subjects design cannot, on its own, separate the effect of Examo from maturation, growing familiarity with the course, or concurrent studying from other sources.
- Small, self-selected sample. Twenty volunteers from a single focus-group cohort are not representative of all students, and volunteers may be more motivated than average.
- Regression to the mean. Students measured when scoring relatively low will, on average, score higher on re-test regardless of any intervention.
- Possible Hawthorne effect. Knowing they were part of a study may have changed how hard participants worked.
- Association, not proof. The usage-to-improvement relationship is correlational; heavier users may simply be more diligent students.
- Short duration and single institution. One term at one institution limits how far the result generalises.
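The regression-to-the-mean point can be made concrete with a small simulation. This is a sketch with invented parameters (true ability mean 65, SD 8; test noise SD 6), not a model of this cohort: each simulated student's true ability is fixed across both tests and nothing happens in between. The whole group's average change comes out near zero, while the quarter who scored lowest on the first test "improve" by several points with no intervention at all. That is why the effect matters most for comparisons involving students selected, or self-selected, after a low score, including the negative baseline coefficient in Model 2.

```python
import random
import statistics


def simulate_regression_to_mean(n_students: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Return (mean change for everyone, mean change for the bottom quarter on test 1).

    Each simulated student has a fixed true ability, and each test score is that
    ability plus independent measurement noise. All parameter values are invented
    for illustration.
    """
    rng = random.Random(seed)
    ability = [rng.gauss(65, 8) for _ in range(n_students)]
    pre = [a + rng.gauss(0, 6) for a in ability]
    post = [a + rng.gauss(0, 6) for a in ability]
    cutoff = sorted(pre)[n_students // 4]            # roughly the 25th percentile of test 1
    low = [i for i in range(n_students) if pre[i] < cutoff]
    everyone = statistics.fmean(p2 - p1 for p1, p2 in zip(pre, post))
    bottom_quarter = statistics.fmean(post[i] - pre[i] for i in low)
    return everyone, bottom_quarter


everyone, bottom_quarter = simulate_regression_to_mean()
print(f"average change, all students:       {everyone:+.2f} points")
print(f"average change, lowest-scoring 25%: {bottom_quarter:+.2f} points")
```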
In short: the result is a promising signal, not a causal claim. We are now designing a larger, randomised controlled trial with an independent comparison group, a pre-registered analysis plan, and blinded marking to test whether the effect holds once these confounds are removed.
Conclusion
In a 20-student within-subjects pilot, studying with Examo was associated with an approximately 30% improvement in average exam scores (58.0% to 75.4%; paired t(19) = 9.98, p < 0.001; Cohen's d = 2.23), with a clear dose-response link between usage and improvement (b1 = 2.11 points per weekly hour, p < 0.001). We are encouraged by the size of the effect and equally clear-eyed about the limits of an uncontrolled pilot. A controlled trial is the next step.
How the statistics were computed
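Paired t-test: t = mean difference divided by its standard error, where the standard error is the SD of differences divided by the square root of n. Effect size: Cohen's d for paired data (d_z) = mean difference divided by SD of differences. 95% confidence interval: mean difference plus or minus the critical t value for 19 degrees of freedom (2.093) multiplied by the standard error. Regression: ordinary least squares, with coefficients chosen to minimise the sum of squared residuals; R-squared is the proportion of variance in the score gain explained by the model, and adjusted R-squared corrects it for the number of predictors. All analyses used a two-tailed alpha of 0.05.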
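The same definitions in notation, writing d_i = post_i - pre_i for student i, n = 20, and p for the number of predictors in a regression model (p = 1 for Model 1, p = 2 for Model 2). The numeric substitutions use the summary values reported above.

```latex
\begin{aligned}
\bar{d} &= \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \bigl(d_i - \bar{d}\bigr)^2} \\
t &= \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{17.4}{7.80/\sqrt{20}} \approx 9.98 \\
d_z &= \frac{\bar{d}}{s_d} = \frac{17.4}{7.80} \approx 2.23 \\
\text{95\% CI} &= \bar{d} \pm t_{0.975,\,n-1}\,\frac{s_d}{\sqrt{n}} \approx 17.4 \pm 2.093 \times 1.744 \\
R^2 &= 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}, \qquad
R^2_{\text{adj}} = 1 - \bigl(1 - R^2\bigr)\,\frac{n-1}{n-p-1}
\end{aligned}
```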
Want to see what a term with Examo does for you? It is free to start at examo.me/signup.