This vignette provides a quick demo of the `truh` package. The example that we consider here is taken from Figure 3 of the paper: Trambak Banerjee, Bhaswar B. Bhattacharya and Gourab Mukherjee, Ann. Appl. Stat. 14(4): 1777-1805 (December 2020) <DOI: 10.1214/20-AOAS1362>.

We consider a nonparametric two-sample testing problem where the \(d\)-dimensional baseline (or uninfected) observations \(\boldsymbol{U}=(U_1,\ldots,U_n)\) are i.i.d. with cdf \(F_0\) and the \(d\)-dimensional treated (infected) observations \(\boldsymbol{V}=(V_1,\ldots,V_m)\) are i.i.d. with cdf \(G\). We assume that the heterogeneity in the baseline population is reflected by \(K\) different subgroups, each with a unimodal distribution with a distinct mode, with cdfs \(F_1,\ldots,F_K\) and mixing proportions \(w_1,\ldots,w_K\), such that \[F_0=\sum_{a=1}^{K}w_aF_a~\text{where}~w_a\in(0,1)~\text{and}~\sum_{a=1}^{K}w_a=1. \]
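As a small illustration of the mixture cdf \(F_0=\sum_a w_aF_a\), the R sketch below evaluates it at a single point when every component is \(N(\boldsymbol{\mu}_a,\boldsymbol{I}_d)\), which is the case in the example that follows. The helper `mix_cdf` and its arguments are hypothetical and not part of `truh`; the only fact it relies on is that, with identity covariance, each component cdf factorizes into a product of univariate normal cdfs.

```r
# Hypothetical helper (not part of truh): cdf of a K-component mixture of
# N(mu_a, I_d) distributions, evaluated at a single point x.
# With identity covariance the coordinates of each component are independent,
# so each component cdf is a product of univariate normal cdfs.
mix_cdf <- function(x, w, mu) {
  stopifnot(length(w) == nrow(mu), ncol(mu) == length(x),
            all(w > 0), abs(sum(w) - 1) < 1e-12)
  # one cdf value per component (row of mu)
  comp_cdf <- apply(mu, 1, function(m) prod(pnorm(x - m)))
  sum(w * comp_cdf)
}

# Example: the three-component baseline F_0 used in this vignette
w  <- c(0.3, 0.3, 0.4)
mu <- rbind(c(0, 0), c(0, -4), c(4, -2))
mix_cdf(c(0, 0), w, mu)   # roughly 0.225
```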

The goal is to test the following composite hypothesis: \[H_0:G\in\mathcal{F}(F_0)~\text{versus}~H_1:G\notin\mathcal{F}(F_0), \] where \(\mathcal{F}(F_0)\) is the convex hull of \(F_1,\ldots,F_K\). We take \(d=2,n=2000,m=500\) and sample \(U_1,\ldots,U_n\) from \(F_0\) where \[F_0=0.3N(\boldsymbol{0},\boldsymbol{I}_2)+0.3N(\boldsymbol{\mu}_1,\boldsymbol{I}_2)+0.4N(\boldsymbol{\mu}_2,\boldsymbol{I}_2), \] with \(\boldsymbol{\mu}_1=(0,-4)\) and \(\boldsymbol{\mu}_2=(4,-2)\).
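Spelled out, the convex hull appearing in \(H_0\) is the set of all mixtures of the component cdfs:

```latex
\mathcal{F}(F_0)=\Big\{\sum_{a=1}^{K}\lambda_aF_a:~\lambda_a\ge 0~\text{for all}~a~\text{and}~\sum_{a=1}^{K}\lambda_a=1\Big\}.
```

In particular, \(F_0\) itself (\(\lambda_a=w_a\)), any single component such as \(F_3\) (\(\lambda=(0,0,1)\)), and any reweighting of \(F_1,F_2,F_3\) belong to \(\mathcal{F}(F_0)\), whereas a mixture built from components outside \(\{F_1,\ldots,F_K\}\) generally does not.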

```
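n = 2000   # baseline sample size
d = 2      # dimension
# Sampling the baseline (uninfected) from the three-component mixture F_0
set.seed(1)
p <- runif(n,0,1)   # uniform draws that select a mixture component
set.seed(10)
# weight 0.3: N(0, I_2); weight 0.3: N((0,-4), I_2); weight 0.4: N((4,-2), I_2)
U <- (p<=0.3)*matrix(rnorm(d*n),n,d)+
  (p>0.3 & p<=0.6)*cbind(matrix(rnorm(n),n,1),
                         matrix(rnorm(n,-4),n,1))+
  (p>0.6)*cbind(matrix(rnorm(n,4),n,1),
                matrix(rnorm(n,-2),n,1))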
```
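The indicator trick above draws every component for every observation and keeps one of them through `(p<=0.3)`, `(p>0.3 & p<=0.6)` and `(p>0.6)`. An equivalent and more general way to sample from a Gaussian mixture is to draw a latent component label first. The sketch below (hypothetical helper `rgauss_mix`, not part of `truh`) does this with `sample.int`; it consumes the random number stream differently, so it will not reproduce `U` exactly, but its column means should be close to the mixture mean \(\sum_a w_a\boldsymbol{\mu}_a=(1.6,-2.0)\).

```r
# Hypothetical helper (not part of truh): draw n points from a mixture of
# N(mu_a, I_d) components by first drawing a component label for each point.
rgauss_mix <- function(n, w, mu) {
  K <- length(w)
  d <- ncol(mu)
  stopifnot(nrow(mu) == K)
  # latent component labels with P(label = a) = w[a]
  lab <- sample.int(K, size = n, replace = TRUE, prob = w)
  # standard normal noise shifted by the mean of each point's component
  matrix(rnorm(n * d), n, d) + mu[lab, , drop = FALSE]
}

set.seed(1)
w  <- c(0.3, 0.3, 0.4)
mu <- rbind(c(0, 0), c(0, -4), c(4, -2))
U_alt <- rgauss_mix(2000, w, mu)
colMeans(U_alt)   # should be close to the mixture mean (1.6, -2.0)
```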

To sample \(V_1,\ldots,V_m\) we consider three settings for \(G\).

- Setting 1: \(G=N(\boldsymbol{\mu}_2,\boldsymbol{I}_2)\), which is the cdf of the third component of \(F_0\). Since a single component is trivially a convex combination of \(F_1,\ldots,F_K\), \(G\in\mathcal{F}(F_0)\) and the null hypothesis \(H_0\) is true.

```
# Sampling the treated (infected)
m = 500
set.seed(50)
# Setting 1: G = N((4,-2), I_2), the third component of F_0
V1 <- cbind(matrix(rnorm(m,4),m,1),
            matrix(rnorm(m,-2),m,1))
# Scatter plot of the data
grp = c(rep('Baseline',n),
        rep('Treated',m))
plot(c(U[,1],V1[,1]), c(U[,2],V1[,2]),
pch = 19,
col = factor(grp),
xlab = 'X_1',
ylab = 'X_2')
# Legend
legend("topright",
legend = levels(factor(grp)),
pch = 19,
col = factor(levels(factor(grp))))
```

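- Setting 2: \(G=0.5N(\boldsymbol{\mu}_3,\boldsymbol{I}_2)+0.5N(\boldsymbol{\mu}_4,\boldsymbol{I}_2)\) where \(\boldsymbol{\mu}_3=0.25\boldsymbol{\mu}_1+0.5\boldsymbol{\mu}_2=(2,-2)\) and \(\boldsymbol{\mu}_4=(3/4)\boldsymbol{\mu}_2-(9/8)\boldsymbol{\mu}_1=(3,3)\), matching the means used in the code below. Neither \(N(\boldsymbol{\mu}_3,\boldsymbol{I}_2)\) nor \(N(\boldsymbol{\mu}_4,\boldsymbol{I}_2)\) is a component of \(F_0\), and finite Gaussian mixtures are identifiable, so \(G\) cannot be written as a convex combination of \(F_1,F_2,F_3\); hence \(G\notin\mathcal{F}(F_0)\).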

```
# Sampling the treated (infected)
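m = 500
set.seed(20)
q <- runif(m,0,1)   # uniform draws that select a mixture component
set.seed(50)
# Setting 2: equal-weight mixture of N((2,-2), I_2) and N((3,3), I_2)
V2 <- (q<=0.5)*cbind(matrix(rnorm(m,2),m,1),
                     matrix(rnorm(m,-2),m,1))+
  (q>0.5)*cbind(matrix(rnorm(m,3),m,1),
                matrix(rnorm(m,3),m,1))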
#Scatter plot of the data
plot(c(U[,1],V2[,1]), c(U[,2],V2[,2]),
pch = 19,
col = factor(grp),
xlab = 'X_1',
ylab = 'X_2')
# Legend
legend("topright",
legend = levels(factor(grp)),
pch = 19,
col = factor(levels(factor(grp))))
```
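The code for Setting 2 draws from \(N((2,-2),\boldsymbol{I}_2)\) and \(N((3,3),\boldsymbol{I}_2)\). As a quick consistency check against the definitions of \(\boldsymbol{\mu}_3\) and \(\boldsymbol{\mu}_4\), one can solve the \(2\times 2\) linear system that expresses each of these means in terms of \(\boldsymbol{\mu}_1=(0,-4)\) and \(\boldsymbol{\mu}_2=(4,-2)\). This is a sketch using base R's `solve`; the variable names are illustrative.

```r
# Means used for the two Setting 2 components in the code above
mu1 <- c(0, -4)
mu2 <- c(4, -2)
M <- cbind(mu1, mu2)          # columns are mu_1 and mu_2
coef3 <- solve(M, c(2, -2))   # coefficients (a, b) with a*mu_1 + b*mu_2 = (2, -2)
coef4 <- solve(M, c(3, 3))    # coefficients (a, b) with a*mu_1 + b*mu_2 = (3, 3)
coef3   # 0.25 and 0.5
coef4   # -1.125 (= -9/8) and 0.75 (= 3/4)
```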

- Setting 3: \(G=0.8N(\boldsymbol{0},\boldsymbol{I}_2)+0.1N(\boldsymbol{\mu}_1,\boldsymbol{I}_2)+0.1N(\boldsymbol{\mu}_2,\boldsymbol{I}_2)\). This is the most interesting setting: \(G\) reweights the same three components as \(F_0\), so \(G\in\mathcal{F}(F_0)\), but \(G\neq F_0\) because the mixing weights differ.

```
# Sampling the treated (infected)
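m = 500
set.seed(20)
q <- runif(m,0,1)   # uniform draws that select a mixture component
set.seed(50)
# Setting 3: components of F_0 reweighted to (0.8, 0.1, 0.1)
V3 <- (q<=0.8)*matrix(rnorm(d*m),m,d)+
  (q>0.8 & q<=0.9)*cbind(matrix(rnorm(m),m,1),
                         matrix(rnorm(m,-4),m,1))+
  (q>0.9)*cbind(matrix(rnorm(m,4),m,1),
                matrix(rnorm(m,-2),m,1))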
#Scatter plot of the data
plot(c(U[,1],V3[,1]), c(U[,2],V3[,2]),
pch = 19,
col = factor(grp),
xlab = 'X_1',
ylab = 'X_2')
# Legend
legend("topright",
legend = levels(factor(grp)),
pch = 19,
col = factor(levels(factor(grp))))
```
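One quick way to see that the Setting 3 distribution differs from \(F_0\) is to compare mixture means: the mean of \(\sum_a\lambda_aN(\boldsymbol{\mu}_a,\boldsymbol{I}_2)\) is \(\sum_a\lambda_a\boldsymbol{\mu}_a\). The sketch below (hypothetical helper `mix_mean`, not part of `truh`) gives \((1.6,-2.0)\) for \(F_0\) and \((0.4,-0.6)\) for \(G\); with `V3` generated above, `colMeans(V3)` should land near the latter.

```r
# Hypothetical helper: mean of a mixture of N(mu_a, I_2) components,
# i.e. sum_a w_a * mu_a (w is recycled down the rows of mu)
mix_mean <- function(w, mu) colSums(w * mu)

mu <- rbind(c(0, 0), c(0, -4), c(4, -2))   # components F_1, F_2, F_3
mean_F0 <- mix_mean(c(0.3, 0.3, 0.4), mu)  # baseline F_0
mean_G3 <- mix_mean(c(0.8, 0.1, 0.1), mu)  # Setting 3 distribution G
mean_F0   # approximately (1.6, -2.0)
mean_G3   # approximately (0.4, -0.6)
```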

Let us now run the `truh` testing procedure in each of these settings. Recall that the goal is to test the composite hypothesis \[H_0:G\in\mathcal{F}(F_0)~\text{versus}~H_1:G\notin\mathcal{F}(F_0).\]

- Setting 1: Here \(G=N(\boldsymbol{\mu}_2,\boldsymbol{I}_2)\) is the third component of \(F_0\), so \(G\in\mathcal{F}(F_0)\) and \(H_0\) is true.

```
library(truh)
truh.1 = truh(V1,U,B=200)
truh.1$pval
```

`## [1] 0.375`

So `truh` correctly fails to reject the null hypothesis.

- Setting 2: Here we know that \(G\notin\mathcal{F}(F_0)\) and so \(H_0\) is false.

```
library(truh)
truh.2 = truh(V2,U,B=200)
truh.2$pval
```

`## [1] 0`

We see that `truh` rejects the null hypothesis.

- Setting 3: Here \(G\in\mathcal{F}(F_0)\) but \(G\neq F_0\). The null hypothesis \(H_0\) is true in this setting.

```
library(truh)
truh.3 = truh(V3,U,B=200)
truh.3$pval
```

`## [1] 0.205`

In this case, `truh` makes the correct decision and fails to reject \(H_0\), even though \(G\neq F_0\).
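The three decisions follow a simple rule: reject \(H_0\) when the p-value is at most a significance level \(\alpha\). The vignette does not fix \(\alpha\); the sketch below assumes \(\alpha=0.05\) and uses the p-values printed above. Under this assumption `truh` rejects only in Setting 2, the one setting where \(H_0\) is false.

```r
# p-values reported above for the three settings (B = 200)
pvals <- c(setting1 = 0.375, setting2 = 0, setting3 = 0.205)
alpha <- 0.05   # assumed significance level; not specified in the vignette
decision <- ifelse(pvals <= alpha, "reject H0", "fail to reject H0")
decision
```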