To demonstrate how to use efficiency augmentation and the propensity score utilities available in the `personalized` package, we simulate data with two treatments. The treatment assignments are based on covariates and hence mimic an observational setting with no unmeasured confounders.

Because the treatment assignment depends on covariates, we must model the propensity score \(\pi(x) = Pr(T = 1 | X = x)\). Throughout, we assume that larger values of the outcome are better.

```
library(personalized)

set.seed(1)
n.obs  <- 1000
n.vars <- 50
x <- matrix(rnorm(n.obs * n.vars, sd = 3), n.obs, n.vars)

# simulate non-randomized treatment
xbetat   <- 0.5 + 0.25 * x[,21] - 0.25 * x[,41]
trt.prob <- exp(xbetat) / (1 + exp(xbetat))
trt      <- rbinom(n.obs, 1, prob = trt.prob)

# simulate delta
delta <- (0.5 + x[,2] - 0.5 * x[,3] - 1 * x[,11] + 1 * x[,1] * x[,12])

# simulate main effects g(X)
xbeta <- x[,1] + x[,11] - 2 * x[,12]^2 + x[,13] + 0.5 * x[,15]^2
xbeta <- xbeta + delta * (2 * trt - 1)

# simulate continuous outcomes
y <- drop(xbeta) + rnorm(n.obs)
```
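Before modeling, it can be worth confirming that the simulated assignment mechanism is genuinely confounded yet has overlap (true propensity scores bounded away from 0 and 1). A quick check on the quantities simulated above:

```
# sanity checks on the simulated assignment mechanism
table(trt)        # treatment group sizes
summary(trt.prob) # true propensity scores; should be bounded away from 0 and 1
```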

Estimation of the propensity score is a fundamental aspect of the estimation of individualized treatment rules (ITRs). The `personalized` package offers support tools for constructing the propensity score function used by `fit.subgroup()`, via the `create.propensity.function()` function. This tool allows for estimation of the propensity score in high dimensions via `glmnet`. In high dimensions it can be important to account for regularization bias via cross-fitting (https://arxiv.org/abs/1608.00060); `create.propensity.function()` offers a cross-fitting approach for high-dimensional propensity score estimation. A basic usage of this function with cross-fitting (with 10 folds) is as follows:

```
# arguments can be passed to cv.glmnet via `cv.glmnet.args`
prop.func <- create.propensity.function(crossfit = TRUE,
                                        nfolds.crossfit = 10,
                                        cv.glmnet.args = list(type.measure = "auc", nfolds = 5))
```
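For comparison, the propensity score function can also be written by hand: `fit.subgroup()` accepts any function of the covariates and treatment that returns the estimated \(Pr(T = 1 | X = x)\) for each observation. A minimal sketch using ordinary (unpenalized) logistic regression, with no cross-fitting — the name `prop.func.glm` is ours, not part of the package:

```
# hand-rolled alternative (a sketch): logistic regression on all covariates.
# fit.subgroup() calls this with the covariate matrix and treatment vector and
# expects a vector of estimated probabilities Pr(T = 1 | X = x) in return.
prop.func.glm <- function(x, trt)
{
    propens.model <- glm(trt ~ x, family = binomial())
    unname(fitted(propens.model))
}
```

In high dimensions this unpenalized fit would overfit badly, which is precisely why the `glmnet`-based `create.propensity.function()` with cross-fitting is preferable here.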

`prop.func` can then be passed to `fit.subgroup()` as follows:

```
subgrp.model <- fit.subgroup(x = x, y = y,
                             trt = trt,
                             propensity.func = prop.func,
                             loss = "sq_loss_lasso",
                             nfolds = 10) # option for cv.glmnet (for ITR estimation)

summary(subgrp.model)
```

```
## family: gaussian
## loss: sq_loss_lasso
## method: weighting
## cutpoint: 0
## propensity
## function: propensity.func
##
## benefit score: f(x),
## Trt recom = 1*I(f(x)>c)+0*I(f(x)<=c) where c is 'cutpoint'
##
## Average Outcomes:
## Recommended 0 Recommended 1
## Received 0 -10.3946 (n = 115) -17.6061 (n = 285)
## Received 1 -14.9248 (n = 368) -11.4371 (n = 232)
##
## Treatment effects conditional on subgroups:
## Est of E[Y|T=0,Recom=0]-E[Y|T=/=0,Recom=0]
## 4.5301 (n = 483)
## Est of E[Y|T=1,Recom=1]-E[Y|T=/=1,Recom=1]
## 6.169 (n = 517)
##
## NOTE: The above average outcomes are biased estimates of
## the expected outcomes conditional on subgroups.
## Use 'validate.subgroup()' to obtain unbiased estimates.
##
## ---------------------------------------------------
##
## Benefit score quantiles (f(X) for 1 vs 0):
## 0% 25% 50% 75% 100%
## -16.3571 -3.0735 0.1874 3.6822 14.9558
##
## ---------------------------------------------------
##
## Summary of individual treatment effects:
## E[Y|T=1, X] - E[Y|T=0, X]
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -32.7141 -6.1471 0.3749 0.8080 7.3644 29.9116
##
## ---------------------------------------------------
##
## 8 out of 50 interactions selected in total by the lasso (cross validation criterion).
##
## The first estimate is the treatment main effect, which is always selected.
## Any other variables selected represent treatment-covariate interactions.
##
## Trt1 V2 V3 V4 V12 V21 V29 V36 V41
## Estimate 0.6779 0.7138 -0.3185 -8e-04 -0.017 -1.1168 -0.012 0.2883 0.7512
```
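The estimated benefit scores \(f(x)\) and the implied treatment recommendations summarized above can also be extracted directly. A sketch, assuming the `predict()` method for fitted subgroup models with its `type` argument as described in the package documentation:

```
# benefit scores f(x) for the training covariates
bene.scores <- predict(subgrp.model, newx = x, type = "benefit.score")

# implied recommendations: 1 * I(f(x) > 0) under the default cutpoint of 0
trt.recom <- predict(subgrp.model, newx = x, type = "trt.group")
```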

Efficiency in estimating ITRs can be improved by including an augmentation term. The optimal augmentation term is generally a function of the main effects of the full outcome regression model marginalized over the treatment. Especially in high dimensions, regularization bias can have a negative impact on performance; cross-fitting is again a reasonable approach to circumventing this issue. Augmentation functions can be constructed (with cross-fitting as an option) via the `create.augmentation.function()` function, which works similarly to `create.propensity.function()`. The basic usage is as follows:

```
aug.func <- create.augmentation.function(family = "gaussian",
                                         crossfit = TRUE,
                                         nfolds.crossfit = 10,
                                         cv.glmnet.args = list(type.measure = "mae", nfolds = 5))
```
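As with the propensity score, the augmentation function can also be supplied by hand: it should take the covariates and outcome and return one predicted outcome per observation, which `fit.subgroup()` uses to center the outcome. A simple unpenalized sketch (the name `aug.func.lm` is ours, not part of the package):

```
# hand-rolled augmentation (a sketch): a linear model for the main effects.
# fit.subgroup() expects a function of (x, y) returning a vector of
# predicted outcomes, one per observation.
aug.func.lm <- function(x, y)
{
    lmod <- lm(y ~ x)
    unname(fitted(lmod))
}
```

Again, in high dimensions the penalized, cross-fit version constructed by `create.augmentation.function()` is the safer choice.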

`aug.func` can be used for augmentation by passing it to `fit.subgroup()` as follows:

```
subgrp.model.aug <- fit.subgroup(x = x, y = y,
                                 trt = trt,
                                 propensity.func = prop.func,
                                 augment.func = aug.func,
                                 loss = "sq_loss_lasso",
                                 nfolds = 10) # option for cv.glmnet (for ITR estimation)

summary(subgrp.model.aug)
```

```
## family: gaussian
## loss: sq_loss_lasso
## method: weighting
## cutpoint: 0
## augmentation
## function: augment.func
## propensity
## function: propensity.func
##
## benefit score: f(x),
## Trt recom = 1*I(f(x)>c)+0*I(f(x)<=c) where c is 'cutpoint'
##
## Average Outcomes:
## Recommended 0 Recommended 1
## Received 0 -6.1711 (n = 140) -20.1641 (n = 260)
## Received 1 -17.8515 (n = 207) -11.2203 (n = 393)
##
## Treatment effects conditional on subgroups:
## Est of E[Y|T=0,Recom=0]-E[Y|T=/=0,Recom=0]
## 11.6804 (n = 347)
## Est of E[Y|T=1,Recom=1]-E[Y|T=/=1,Recom=1]
## 8.9438 (n = 653)
##
## NOTE: The above average outcomes are biased estimates of
## the expected outcomes conditional on subgroups.
## Use 'validate.subgroup()' to obtain unbiased estimates.
##
## ---------------------------------------------------
##
## Benefit score quantiles (f(X) for 1 vs 0):
## 0% 25% 50% 75% 100%
## -5.8772 -0.5707 0.7856 2.3832 8.3908
##
## ---------------------------------------------------
##
## Summary of individual treatment effects:
## E[Y|T=1, X] - E[Y|T=0, X]
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -11.754 -1.141 1.571 1.647 4.766 16.782
##
## ---------------------------------------------------
##
## 1 out of 50 interactions selected in total by the lasso (cross validation criterion).
##
## The first estimate is the treatment main effect, which is always selected.
## Any other variables selected represent treatment-covariate interactions.
##
## Trt1 V2
## Estimate 0.8569 0.69
```

We first run the training/testing procedure to assess the performance of the non-augmented estimator:

```
valmod <- validate.subgroup(subgrp.model, B = 5,
                            method = "training_test",
                            train.fraction = 0.75)
valmod
```

```
## family: gaussian
## loss: sq_loss_lasso
## method: weighting
##
## validation method: training_test_replication
## cutpoint: 0
## replications: 5
##
## benefit score: f(x),
## Trt recom = 1*I(f(x)>c)+0*I(f(x)<=c) where c is 'cutpoint'
##
## Average Test Set Outcomes:
## Recommended 0 Recommended 1
## Received 0 -10.9315 (SE = 5.5637, n = 24.6) -16.2869 (SE = 2.7115, n = 73.4)
## Received 1 -14.7973 (SE = 2.647, n = 88.2) -11.6371 (SE = 3.6671, n = 63.8)
##
## Treatment effects conditional on subgroups:
## Est of E[Y|T=0,Recom=0]-E[Y|T=/=0,Recom=0]
## 3.8658 (SE = 6.848, n = 112.8)
## Est of E[Y|T=1,Recom=1]-E[Y|T=/=1,Recom=1]
## 4.6498 (SE = 2.2761, n = 137.2)
##
## Est of
## E[Y|Trt received = Trt recom] - E[Y|Trt received =/= Trt recom]:
## 4.0368 (SE = 3.0118)
```

Then we compare with the augmented estimator. Although this is based on just 5 replications, we can see that the augmented estimator is much better at discriminating between benefitting and non-benefitting patients, as evidenced by the larger conditional treatment effects within the subgroups it identifies compared with the smaller conditional treatment effects for the non-augmented estimator above.

```
valmod.aug <- validate.subgroup(subgrp.model.aug, B = 5,
                                method = "training_test",
                                train.fraction = 0.75)
valmod.aug
```

```
## family: gaussian
## loss: sq_loss_lasso
## method: weighting
##
## validation method: training_test_replication
## cutpoint: 0
## replications: 5
##
## benefit score: f(x),
## Trt recom = 1*I(f(x)>c)+0*I(f(x)<=c) where c is 'cutpoint'
##
## Average Test Set Outcomes:
## Recommended 0 Recommended 1
## Received 0 -7.7207 (SE = 4.8852, n = 37.8) -18.6941 (SE = 4.3459, n = 65.8)
## Received 1 -16.52 (SE = 4.0857, n = 56) -10.8847 (SE = 2.8148, n = 90.4)
##
## Treatment effects conditional on subgroups:
## Est of E[Y|T=0,Recom=0]-E[Y|T=/=0,Recom=0]
## 8.7993 (SE = 7.2892, n = 93.8)
## Est of E[Y|T=1,Recom=1]-E[Y|T=/=1,Recom=1]
## 7.8094 (SE = 5.1533, n = 156.2)
##
## Est of
## E[Y|Trt received = Trt recom] - E[Y|Trt received =/= Trt recom]:
## 7.9081 (SE = 5.1089)
```
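The two validation objects can also be compared graphically; the package provides `plotCompare()` for displaying multiple fitted or validated models side by side:

```
# side-by-side comparison of test-set outcomes for the two validated models
plotCompare(valmod, valmod.aug)
```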