Comparison of Lakatos & Lan (1992), nQuery, East and R packages
This analysis started from a simple question:
nQuery cites Lakatos and Lan (1992) as a reference, but which method from that article does nQuery actually implement?
Because Lakatos and Lan (1992) is often cited in a generic way, it’s easy to forget that the paper actually compared three distinct methods for calculating sample sizes under proportional hazards:
- Freedman – a simple approximation that estimates the required number of events for a log-rank test under proportional hazards.
- RGS (Rubinstein–Gail–Santner) – an improved approximation that refines the variance calculation of the log-rank test and better accounts for the timing of events.
- Lakatos – a more general piecewise-exponential approach that models accrual, follow-up, and hazard rates over time, providing more realistic sample size calculations.
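To make the simplest of the three concrete, here is a minimal sketch of Freedman's events formula for 1:1 allocation. It is an illustration only, not the computation used in the 1992 article or by any of the tools compared here. The function name `freedman_events` is hypothetical, α is assumed to be two-sided, and the sketch returns the required number of events only. Converting events into a sample size needs accrual and follow-up assumptions that are not modeled here, so the output is not expected to reproduce the n_F column.

```python
from math import ceil
from statistics import NormalDist


def freedman_events(hazard_ratio, alpha=0.05, power=0.90):
    """Required number of events for a log-rank test with 1:1 allocation.

    Freedman-type approximation (two-sided alpha assumed):
        d = (z_{1 - alpha/2} + z_{power})^2 * ((1 + HR) / (1 - HR))^2
    """
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard_ratio must be positive and different from 1")
    normal = NormalDist()
    z_total = normal.inv_cdf(1 - alpha / 2) + normal.inv_cdf(power)
    ratio = (1 + hazard_ratio) / (1 - hazard_ratio)
    # Round up: a fractional event cannot be observed.
    return ceil(z_total ** 2 * ratio ** 2)


# Hazard ratios used in the scenarios of this comparison.
for hr in (0.667, 0.5, 0.25):
    print(f"HR = {hr}: {freedman_events(hr)} events")
```

The formula depends only on the hazard ratio, α, and power. The dependence on survival and accrual seen in the comparison comes entirely from the step that turns events into participants.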
To understand which method is implemented by each contemporary tool, we reproduced the set of scenarios from the 1992 article and compared the sample sizes obtained from:
- nQuery
- East
- rpact
- gsDesign2
- rashnu
against the benchmark values for Lakatos, Freedman, and RGS. We rely on rashnu::LakatosSampleSize as an R implementation of the original Lakatos method, allowing a direct comparison between our results and the classical Lakatos benchmarks.
Results
The table below compares the sample sizes calculated by the five software tools and packages (blue) with those reported in the article (brown).
Lakatos & Lan (1992) comparison with East, nQuery and R packages
Survival at 10 years; Study duration = 10 years; α = 0.05; β = 0.1

| Survival | HR | Accrual | n_rpact | n_rashnu | n_gsdesign2 | n_east | n_nquery | n_L | n_F | n_RGS |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.8 | 0.667 | 1 | 1588 | 1620 | 1610 | 1593 | 1640 | 1617 | 1628 | 1640 |
| 0.8 | 0.667 | 5 | 1979 | 2022 | 2007 | 1984 | 2046 | 2017 | 2024 | 2046 |
| 0.8 | 0.667 | 9 | 2671 | 2728 | 2710 | 2679 | 2764 | 2724 | 2709 | 2764 |
| 0.8 | 0.500 | 1 | 601 | 638 | 625 | 604 | 664 | 638 | 649 | 664 |
| 0.8 | 0.500 | 5 | 749 | 798 | 781 | 753 | 832 | 798 | 807 | 831 |
| 0.8 | 0.500 | 9 | 1011 | 1078 | 1055 | 1017 | 1124 | 1079 | 1081 | 1124 |
| 0.8 | 0.250 | 1 | 181 | 230 | 214 | 182 | 270 | 230 | 241 | 269 |
| 0.8 | 0.250 | 5 | 225 | 290 | 267 | 227 | 340 | 289 | 299 | 338 |
| 0.8 | 0.250 | 9 | 304 | 392 | 362 | 306 | 460 | 392 | 401 | 459 |
| 0.2 | 0.667 | 1 | 361 | 362 | 362 | 362 | 364 | 360 | 370 | 363 |
| 0.2 | 0.667 | 5 | 414 | 416 | 416 | 415 | 420 | 414 | 419 | 418 |
| 0.2 | 0.667 | 9 | 527 | 530 | 530 | 528 | 534 | 528 | 509 | 534 |
| 0.2 | 0.500 | 1 | 133 | 134 | 135 | 134 | 138 | 134 | 144 | 138 |
| 0.2 | 0.500 | 5 | 154 | 156 | 157 | 155 | 162 | 156 | 164 | 161 |
| 0.2 | 0.500 | 9 | 196 | 200 | 201 | 197 | 208 | 200 | 200 | 207 |
| 0.2 | 0.250 | 1 | 40 | 44 | 44 | 40 | 50 | 43 | 53 | 48 |
| 0.2 | 0.250 | 5 | 46 | 52 | 51 | 47 | 58 | 51 | 61 | 58 |
| 0.2 | 0.250 | 9 | 59 | 68 | 66 | 60 | 78 | 66 | 74 | 76 |

Blue (n_rpact, n_rashnu, n_gsdesign2, n_east, n_nquery): computed sample sizes
Brown (n_L, n_F, n_RGS): sample sizes from the 1992 article
The modern software outputs (blue columns) mostly form a tight cluster of values. In many scenarios, these columns differ only by a handful of participants.
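Within the brown block, RGS is the largest value in 12 of the 18 scenarios, and Freedman is the largest in the remaining 6, all of them at survival 0.2. Lakatos is the smallest value, or tied for smallest, in 16 of the 18 scenarios.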
nQuery closely matches RGS: the two columns are identical in 8 of the 18 scenarios and never differ by more than 2 participants.
Rashnu matches the Lakatos method, as expected, with differences of at most 5 participants row by row.
rpact, gsDesign2, and East do not match any single historical method. rpact and East nearly coincide (within 8 participants in every scenario), gsDesign2 runs somewhat higher, and all three generally fall closer to the Lakatos values than to Freedman or RGS.
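The column-by-column agreement described above can be checked directly from the table. The following is a minimal sketch with the values transcribed from the table in row order. The helper names `max_abs_diff` and `closest_method` are hypothetical, and "closest" is taken here to mean the smallest mean absolute difference across the 18 scenarios, which is one reasonable choice among several.

```python
# Sample sizes transcribed from the comparison table (18 scenarios, in row order).
TABLE = {
    "n_rpact":     [1588, 1979, 2671, 601, 749, 1011, 181, 225, 304,
                    361, 414, 527, 133, 154, 196, 40, 46, 59],
    "n_rashnu":    [1620, 2022, 2728, 638, 798, 1078, 230, 290, 392,
                    362, 416, 530, 134, 156, 200, 44, 52, 68],
    "n_gsdesign2": [1610, 2007, 2710, 625, 781, 1055, 214, 267, 362,
                    362, 416, 530, 135, 157, 201, 44, 51, 66],
    "n_east":      [1593, 1984, 2679, 604, 753, 1017, 182, 227, 306,
                    362, 415, 528, 134, 155, 197, 40, 47, 60],
    "n_nquery":    [1640, 2046, 2764, 664, 832, 1124, 270, 340, 460,
                    364, 420, 534, 138, 162, 208, 50, 58, 78],
    "n_L":         [1617, 2017, 2724, 638, 798, 1079, 230, 289, 392,
                    360, 414, 528, 134, 156, 200, 43, 51, 66],
    "n_F":         [1628, 2024, 2709, 649, 807, 1081, 241, 299, 401,
                    370, 419, 509, 144, 164, 200, 53, 61, 74],
    "n_RGS":       [1640, 2046, 2764, 664, 831, 1124, 269, 338, 459,
                    363, 418, 534, 138, 161, 207, 48, 58, 76],
}
ARTICLE_METHODS = ("n_L", "n_F", "n_RGS")  # brown columns
TOOLS = ("n_rpact", "n_rashnu", "n_gsdesign2", "n_east", "n_nquery")  # blue columns


def max_abs_diff(col_a, col_b):
    """Largest row-wise absolute difference between two columns."""
    return max(abs(a - b) for a, b in zip(TABLE[col_a], TABLE[col_b]))


def closest_method(tool):
    """Article method with the smallest mean absolute difference to a tool."""
    def mean_abs_diff(method):
        pairs = zip(TABLE[tool], TABLE[method])
        return sum(abs(a - b) for a, b in pairs) / len(TABLE[tool])
    return min(ARTICLE_METHODS, key=mean_abs_diff)


for tool in TOOLS:
    method = closest_method(tool)
    print(f"{tool}: closest to {method}, max difference {max_abs_diff(tool, method)}")
```

A mean absolute difference weights the large-sample scenarios more heavily than the small ones. A relative difference would be a natural alternative if the small-sample rows matter equally.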