PowerTOST • Reference-Scaled Average Bioequivalence

library(PowerTOST) # attach the library


Parameter Argument Purpose Default
\(\alpha\) alpha Nominal level of the test 0.05
\(\pi\) targetpower Minimum desired power 0.80
\(\theta_{0}\) theta0 ‘True’ or assumed T/R ratio 0.90
\(\theta_{1}\) theta1 Lower BE limit and PE constraint 0.80
\(\theta_{2}\) theta2 Upper BE limit and PE constraint 1.25
CV CV CV none
design design Planned replicate design "2x3x3"
regulator regulator ‘target’ jurisdiction "EMA"
nsims nsims Number of simulations see below
nstart nstart Start if a previous run failed none
imax imax Maximum number of iterations 100
print print Show information in the console? TRUE
details details Show details of the sample size search? FALSE
setseed setseed Issue a fixed seed of the random number generator? TRUE

Arguments targetpower, theta0, theta1, theta2, and CV have to be given as fractions, not percent.
The CV is the within-subject coefficient of variation. If one value is given, homoscedasticity (equal variances) is assumed and therefore, CVwT = CVwR. If two values are given (i.e., CV = c(x, y)) heteroscedasticity (unequal variances) is assumed, where CV[1] has to be CVwT and CV[2] CVwR.

If simulating for power (theta0 within the BE limits), nsims defaults to 100,000. If simulating for the empiric type I error (theta0 set to one of the BE limits), nsims defaults to 1 million.

Implemented Designs

#    design                        name   df
#   "2x2x3"   2x2x3 replicate crossover 2n-3
#   "2x2x4"   2x2x4 replicate crossover 3n-4
#   "2x4x4"   2x4x4 replicate crossover 3n-4
#   "2x3x3"   partial replicate (2x3x3) 2n-3
#   "2x4x2"            Balaam’s (2x4x2)  n-2
#  "2x2x2r" Liu’s 2x2x2 repeated x-over 3n-2

The terminology of the design argument follows this pattern: treatments x sequences x periods.

With foo(..., details = FALSE, print = FALSE) results are given as a data.frame with eleven columns Design, alpha, CVwT, CVwR, theta0, theta1, theta2, Sample size, Achieved power, Target power, and nlast. To access e.g., the sample size use either foo(...)[1, 8] or foo(...)[["Sample size"]]. We suggest to use the latter in your code for clarity.

The estimated sample size gives always the total number of subjects (not subject/sequence).

Conditions and Methods

Regulatory conditions and methods of evaluation are different.

#  regulator CVswitch   CVcap r_const      L      U pe_constr method
#        EMA      0.3 0.50000 0.76000 0.6984 1.4319      TRUE  ANOVA
#         HC      0.3 0.57382 0.76000 0.6667 1.5000      TRUE    ISC
#        FDA      0.3    none 0.89257   none   none      TRUE    ISC

CVswitch is the lower cap of scaling, i.e., if CVwR is below this value reference-scaling is not acceptable. The upper cap of scaling CVcap differes between the EMA and HC, whereas for the FDA scaling is unlimited. The regulatory constant r_const is used for calculating the expanded limits (EMA, HC) and ‘implied limits’ (FDA) based on swR: \(\left[ {L,U} \right] = 100{e^{ \mp 0.760 \cdot {s_{wR}}}}\)
Here L and U give the maximum acceptable expansion based on \(s^*_{wR}=\sqrt{log({CVcap}^2+1)}\). The point estimate constraint pe_constr [0.80, 1.25] is applicable in all regulations. Evaluation has to be performed by ANOVA (EMA) or a mixed-effects model (HC, FDA). For the latter intra-subject contrasts are a suffiently close approximation.

Sample Size

Highly Variable Drugs / Drug Products

Estimate the sample size for assumed intra-subject CV 0.55 (CVwT = CVwR). Employ the defaults (theta0 = 0.90, targetpower = 0.80, design = "2x3x3", nsims = 1e5).


Average Bioequivalence with Expanding Limits (ABEL). Default regulator = "EMA".
Note that this approach is recommended in other jurisdictions as well (e.g., the WHO; ASEAN States, Australia, Brazil, the East African Community, Egypt, the Eurasian Economic Union, New Zealand, the Russian Federation).


Whilst in full replicate designs simulating via the ‘key’ statistics closely matches the ‘gold standard’ of subject simulations, this is less so for unequal variances in the partial replicate design if CVwT > CVwR. Let us keep CVw 0.55 and split variances by a ratio of 1.5 (i.e., T has a higher variance than R).

Although the runtime will be longer, we recommend the function sampleN.scABEL.sdsims() instead.

The sample size is slightly larger.

Explore sample sizes for extreme heterogenicity (variance ratio 2.5) via the ‘key’ statistics and subject simulations (4- and 3-period full replicate and partial replicate dedigns).

As shown in the previous example, subject simulations are recommended for the partial replicate design. For full replicate designs simulations via the ‘key’ statistics give identical results and are recommended for speed reasons. In this example sampleN.scABEL() is 60times faster than sampleN.scABEL.sdsims().
However, if CVwTCVwR we get the identical result via the ‘key’ statistics.

Narrow Therapeutic Index Drugs (FDA)

Assuming heteroscedasticity (CVw 0.125, σ2 ratio 2.5, i.e., T has a substantially higher variability than R). Details of the sample size search suppressed. Assess additionally which one of the three components (scaled, ABE, swT/swR ratio) drives the sample size.

The swT/swR component shows the lowest power and hence, drives the sample size.
Compare that with homoscedasticity (CVwT = CVwR = 0.125):

Here the scaled ABE component shows the lowest power and drives the sample size, which is much lower than in the previous example.

Highly Variable Narrow Therapeutic Index Drugs (FDA)

Almost a contradiction in itself. Required for dagibatran and rivaroxaban.
Assuming homoscedasticity (CVwT = CVwR = 0.30). Employ the defaults (theta0 = 0.95, targetpower = 0.80, design = "2x2x4", nsims = 1e5). Details of the sample size search suppressed.

Assuming heteroscedasticity (CVw 0.30, σ2 ratio 2.5).

In this case a substantially higher sample size is required since the variability of T is higher than the one of R.


Power can by calculated by the counterparts of the respective sample size functions (instead the argument targetpower use the argument n and provide the observed theta0), i.e.,
power.scABEL(), power.RSABE(), power.NTIDFDA(), and power.HVNTID().

Type I Error

Contrary to average bioequivalence, where the Null-hypothesis is based on fixed limits, in reference-scaling the Null is generated in face of the data (i.e, the limits are random variables).

Endrényi and Tóthfalusi (2009,1 20192), Labes (20133), Wonnemann et al. (20154), Muñoz et al. (20165), Labes and Schütz (20166), Tóthfalusi and Endrényi (2016,7 20178), Molins et al. (20179), Deng and Zhou (201910) showed that under certain conditions (EMA, Health Canada: CVwR ~0.22–0.45, FDA: CVwR ≤0.30) the type I error will be substantially inflated.
Below the inflation region the study will be evaluated for ABE and the type I error controlled by the TOST. Above the inflation region the type I error is controlled by the PE restriction and for the EMA and Health Canada additionally by the upper cap of scaling.

CV      <- 0.35
res     <- data.frame(n = NA, CV = CV, TIE = NA)
res$n   <- sampleN.scABEL(CV = CV, design = "2x2x4", print = FALSE,
                          details = FALSE)[["Sample size"]]
U       <- scABEL(CV = CV)[["upper"]]
res$TIE <- power.scABEL(CV = CV, n = res$n, theta0 = U, design = "2x2x4")
print(res, row.names = FALSE)
#   n   CV      TIE
#  34 0.35 0.065566

With ~0.0656 the type I error is inflated (significantly larger than the nominal \(\alpha\) 0.05).

Fig. 1 Empiric type I error for the EMA’s ABEL, 4-period full replicate design. Pink plane at nominal \alpha 0.05. Contour lines enclose region of inflation.

Fig. 1 Empiric type I error for the EMA’s ABEL, 4-period full replicate design.
Pink plane at nominal \(\alpha\) 0.05. Contour lines enclose region of inflation.

A substantially higher inflation of the type I error was reported for the FDA’s model. However, Davit et al. (201211) assessed the type I error not at the ‘implied limits’ but with the ‘desired consumer risk model’ if \(s_{wR} \geq s_0\) (CVwR ≥~25.4%) at \(e^{log(1.25)/s_0\sqrt{log(CV_{wR}^{2}+1)}}\). Some statisticians call the latter ‘hocus-pocus’. However, even with this approach the type I error is still (although less) inflated.

res <- data.frame(CV = sort(c(seq(0.25, 0.32, 0.01), se2CV(0.25))),
                  impl.L = NA, impl.U = NA, impl.TIE = NA,
                  des.L = NA, des.U = NA, des.TIE = NA)
for (i in 1:nrow(res)) {
  res[i, 2:3] <- scABEL(CV = res$CV[i], regulator = "FDA")
  if (CV2se(res$CV[i]) <= 0.25) {
    res[i, 5:6] <- c(0.80, 1.25)
  } else {
    res[i, 5:6] <- exp(c(-1, +1)*(log(1.25)/0.25)*CV2se(res$CV[i]))
  res[i, 4] <- power.RSABE(CV = res$CV[i], theta0 = res[i, 3],
                           design = "2x2x4", n = 32, nsims = 1e6)
  res[i, 7] <- power.RSABE(CV = res$CV[i], theta0 = res[i, 5],
                           design = "2x2x4", n = 32, nsims = 1e6)
print(signif(res, 4), row.names = FALSE)
#     CV impl.L impl.U impl.TIE  des.L des.U des.TIE
#  0.250 0.8000  1.250  0.06068 0.8000 1.250 0.06036
#  0.254 0.8000  1.250  0.06396 0.8000 1.250 0.06357
#  0.260 0.8000  1.250  0.07008 0.7959 1.256 0.05692
#  0.270 0.8000  1.250  0.08352 0.7892 1.267 0.05047
#  0.280 0.8000  1.250  0.10130 0.7825 1.278 0.04770
#  0.290 0.8000  1.250  0.12290 0.7760 1.289 0.04644
#  0.300 0.8000  1.250  0.14710 0.7695 1.300 0.04562
#  0.310 0.7631  1.310  0.04515 0.7631 1.310 0.04466
#  0.320 0.7568  1.321  0.04373 0.7568 1.321 0.04325
Fig. 2 Empiric type I error for the FDA’s RSABE, 4-period full replicate design, n = 32. Thick line ‘implied limits’ (max. TIE 0.147 at CVwR 30%). Thin line ‘desired consumer risk model’ (max. TIE 0.0636 at CVwR 25.4%).

Fig. 2 Empiric type I error for the FDA’s RSABE, 4-period full replicate design, n = 32.
Thick line ‘implied limits’ (max. TIE 0.147 at CVwR 30%).
Thin line ‘desired consumer risk model’ (max. TIE 0.0636 at CVwR 25.4%).

Various approaches were suggested to control the patient’s risk. The methods of Labes and Schütz (2016) and Molins et al. (2017) are implemented in the function scABEL.ad(). The method of Tóthfalusi and Endrényi (2017) is implemented in the function power.RSABE2L.sds().

Iteratively adjusted α

If an inflated type I error is expected, \(\alpha\) is adjusted based on the observed CVwR and the study should be evaluated with a wider confidence interval (Labes and Schütz 2016). Implemented designs: "2x3x3" (default), "2x2x3", "2x2x4".
No adjustment is suggested if the study’s conditions (CVwR, sample size, design) will not lead to an inflated type I error.

Inside the region of inflated type I errors.

An adjusted \(\alpha\) of 0.0363 (i.e., the 92.74% CI) controls the patient’s risk. However, it leads to a slightly lower power (0.773 instead of 0.812).

In order to counteract this loss in power, we can adjust the sample size with the function sampleN.scABEL.ad().

We have to increase the sample size to 38 in order to maintain power. Since the type I error depends to a minor degree on the sample size as well, we have to adjust slightly more (\(\alpha\) 0.0361 instead of 0.0363 with 34 subjects).

Since the observed CVwR is not the true – unknown – one, Molins et al. recommended to ‘assume the worst’ and adjust for CVwR 0.30 in all cases.

# CV = 0.35, n = 34, design = "2x2x4"
#               method adj   alpha    TIE power
#  EMA (nominal alpha)  no 0.05000 0.0656 0.812
#     Labes and Schütz yes 0.03630 0.0500 0.773
#        Molins et al. yes 0.02857 0.0500 0.740

Although Molin’s adjusted \(\alpha\) controls the patient’s risk, it leads to a further loss in power.

Example with a CVwR far above the region of inflated type I errors (i.e., >0.45).

# CV = 0.8, n = 50, design = "2x2x4"
#            method adj  alpha    TIE power
#  Labes and Schütz  no 0.0500 0.0496 0.812
#     Molins et al. yes 0.0282 0.0500 0.732

For high variability the negative impact on power is substantial.

‘Exact’ Procedure

Proposed by Tóthfalusi and Endrényi (2016). Example of the ‘ncTOST’ method by the same authors (2017). Implemented designs: "2x3x3" (default), "2x2x3", "2x2x4".

With ~0.0482 the patient’s risk is controlled. However, the regulatory acceptance is unclear.


BE limits

Regulatory Settings


function author(s)
sampleN.scABEL, sampleN.RSABE, sampleN.NTIDFDA, sampleN.HVNTID,
power.scABEL, power.RSABE2L.sdsims, scABEL, reg_const
Detlew Labes
power.scABEL.sdsims Detlew Labes, Benjamin Lang
sampleN.scABEL.ad, sampleN.scABEL.sdsims, sampleN.RSABE2L.sdsims,
Helmut Schütz


Man-pages of functions used in examples of this vignette:

Online manual of all functions.


Helmut Schütz 2019-12-19

GPL-2 | GPL-3

  1. Endrényi L, Tóthfalusi L. Regulatory and study conditions for the determination of bioequivalence of highly variable drugs. J Pharm Sci. 2009: 12(1); 138–49. open access.

  2. Endrényi L, Tóthfalusi L. Bioequivalence for highly variable drugs: regulatory agreements, disagreements, and harmonization. J Pharmacokin Pharmacodyn. 2019: 46(2); 117–26. doi:s10928-019-09623-w.

  3. Labes D. ‘alpha’ of scaled ABE? BEBA Forum. Vienna, 2013. open access.

  4. Wonnemann M, Frömke C, Koch A. Inflation of the Type I Error: Investigations on Regulatory Recommendations for Bioequivalence of Highly Variable Drugs. Pharm Res. 2015: 32(1); 135–43. doi:10.1007/s11095-014-1450-z.

  5. Muñoz J, Alcaide D, Ocaña J. Consumer’s risk in the EMA and FDA regulatory approaches for bioequivalence in highly variable drugs. Stat Med. 2016: 35(12); 1933–43. doi:10.1002/sim.6834.

  6. Labes D, Schütz H. Inflation of Type I Error in the Evaluation of Scaled Average Bioequivalence, and a Method for its Control. Pharm Res. 2016: 33(11); 2805–14. doi:10.1007/s11095-016-2006-1.

  7. Tóthfalusi L, Endrényi L. An Exact Procedure for the Evaluation of Reference-Scaled Average Bioequivalence. AAPS J. 2016: 18(2); 476–89. doi:10.1208/s12248-016-9873-6.

  8. Tóthfalusi L, Endrényi L. Algorithms for Evaluating Reference Scaled Average Bioequivalence: Power, Bias, and Consumer Risk. Stat Med. 2017: 36(27); 4378–90. doi:10.1002/sim.7440.

  9. Molins E, Cobo E, Ocaña J. Two-Stage Designs Versus European Scaled Average Designs in Bioequivalence Studies for Highly Variable Drugs: Which to Choose? Stat Med. 2017: 36(30); 4777–88. doi:10.1002/sim.7452.

  10. Deng Y, Zhou X-H. Methods to control the empirical type I error rate in average bioequivalence tests for highly variable drugs. Stat Meth Med Res. Epub ahead of print 3 Sep 2019. doi:10.1177/0962280219871589.

  11. Davit BM, Chen ML, Conner DP, Haidar SH, Kim S, Lee CH, Lionberger RA, Makhlouf FT, Nwakama PE, Patel DT, Schuirmann DJ, Yu LX. Implementation of a Reference-Scaled Average Bioequivalence Approach for Highly Variable Generic Drug Products by the US Food and Drug Administration. AAPS J. 2012: 14(4); 915–24. doi:10.1208/s12248-012-9406-x.