# Introduction

bayesm’s posterior sampling function rhierMnlRwMixture permits the imposition of sign constraints on the individual-specific parameters of a hierarchical multinomial logit model. This may be desired if, for example, the researcher believes there are heterogenous effects from, say, price, but that all responses should be negative (i.e., sign-constrained). This vignette provides exposition of the model, discussion of prior specification, and an example.

# Model

The model follows the hierarchical multinomial logit specification given in Example 3 of the “bayesm Overview” Vignette, but will be repeated here succinctly. Individuals are assumed to be rational economic agents that make utility-maximizing choices. Utility is modeled as the sum of deterministic and stochastic components, where the inverse-logit of the probability of chosing an alternative is linear in the parameters and the error is assumed to follow a Type I Extreme Value distribution:

$U_{ij} = X_{ij}\beta_i + \varepsilon_{ij} \hspace{0.8em} \text{with} \hspace{0.8em} \varepsilon_{ij}\ \sim \text{ iid Type I EV}$

These assumptions yield choice probabilities of:

$\text{Pr}(y_i=j) = \frac{\exp \{x_{ij}'\beta_i\}}{\sum_{k=1}^p\exp\{x_{ik}'\beta_i\}}$

$$x_i$$ is $$n_i \times k$$ and $$i = 1, \ldots, N$$. There are $$p$$ alternatives, $$j = 1, \ldots, p$$. An outside option, often denoted $$j=0$$ can be introduced by assigning $$0$$’s to that option’s covariate ($$x$$) values.

We impose sign constraints by defining a $$k$$-length constraint vector SignRes that takes values from the set $$\{-1, 0, 1\}$$ to define $$\beta_{ik} = f(\beta_{ik}^*)$$ where $$f(\cdot)$$ is as follows:

$\beta_{ik} = f(\beta_{ik}^*) = \left\{ \begin{array}{lcl} \exp(\beta_{ik}^*) & \text{if} & \texttt{SignRes[k]} = 1 \\ \beta_{ik}^* & \text{if} & \texttt{SignRes[k]} = 0 \\ -\exp(\beta_{ik}^*) & \text{if} & \texttt{SignRes[k]} = -1 \\ \end{array} \right.$

The “deep” individual-specific parameters ($$\beta_i^*$$) are assumed to be drawn from a mixture of $$M$$ normal distributions with mean values driven by cross-sectional unit characteristics $$Z$$. That is, $$\beta_i^* = z_i' \Delta + u_i$$ where $$u_i$$ has a mixture-of-normals distribution.1

Considering $$\beta_i^*$$ a length-$$k$$ row vector, we will stack the $$N$$ $$\beta_i^*$$’s vertically and write:

$B=Z\Delta + U$ Thus we have $$\beta_i$$, $$z_i$$, and $$u_i$$ as the $$i^\text{th}$$ rows of $$B$$, $$Z$$, and $$U$$. $$B$$ is $$N \times k$$, $$Z$$ is $$N \times M$$, $$\Delta$$ is $$M \times k$$, and $$U$$ is $$N \times k$$ where the distribution on $$U$$ is such that:

$\Pr(\beta_{ik}^*) = \sum_{m=1}^M \pi_m \phi(z_i' \Delta \vert \mu_j, \Sigma_j)$

$$\phi$$ is the normal pdf.

# Priors

Natural conjugate priors are specified:

$\pi \sim \text{Dirichlet}(a)$ $\text{vec}(\Delta) = \delta \sim MVN(\bar{\delta}, A_\delta^{-1})$ $\mu_m \sim MVN(\bar{\mu}, \Sigma_m \otimes a_\mu^{-1})$ $\Sigma_m \sim IW(\nu, V)$

This specification of priors assumes that the $$(\mu_m,\Sigma_m)$$ are independent and that, conditional on the hyperparameters, the $$\beta_i$$’s are independent.

$$a$$ implements prior beliefs on the number of normal components in the mixture with a default of 5. $$\nu$$ is a “tightness” parameter of the inverted-Wishart distribution and $$V$$ is its location matrix. Without sign constraints, they default to $$\nu=k+3$$ and $$V=\nu I$$, which has the effect of centering the prior on $$I$$ and making it “barely proper”. $$a_\mu$$ is a tightness parameter for the priors on $$\mu$$, and when no sign constraints are imposed it defaults to an extremely diffuse prior of 0.01.

These defaults assume the logit coefficients ($$\beta_{ik}$$’s) are on the order of approximately 1 and, if so, are typically reasonable hyperprior values. However, when sign constraints are imposed, say, SignRes[k]=-1 such that $$\beta_{ik} = -\exp\{\beta_{ik}^*\}$$, then these hyperprior defults pile up mass near zero — a result that follows from the nature of the exponential function and the fact that the $$\beta_{ik}^*$$’s are on the log scale. Let’s show this graphically.

# define function
drawprior <- function (mubar_betak, nvar, ncomp, a, nu, Amu, V, ndraw) {
betakstar <- double(ndraw)
betak     <- double(ndraw)
otherbeta <- double(ndraw)
mubar     <- c(rep(0, nvar-1), mubar_betak)

for(i in 1:ndraw) {
comps=list()
for(k in 1:ncomp) {
Sigma <- rwishart(nu,chol2inv(chol(V)))$IW comps[[k]] <- list(mubar + t(chol(Sigma/Amu)) %*% rnorm(nvar), backsolve(chol(Sigma), diag(1,nvar)) ) } pvec <- rdirichlet(a) beta <- rmixture(1,pvec,comps)$x
betakstar[i] <- beta[nvar]
betak[i]     <- -exp(beta[nvar])
otherbeta[i] <- beta
}

return(list(betakstar=betakstar, betak=betak, otherbeta=otherbeta))
}
set.seed(1234)
# specify rhierMnlRwMixture defaults
mubar_betak <- 0
nvar  <- 10
ncomp <- 3
a     <- rep(5, ncomp)
nu    <- nvar + 3
Amu   <- 0.01
V     <- nu*diag(c(rep(1,nvar-1),1))
ndraw <- 10000
defaultprior <- drawprior(mubar_betak, nvar, ncomp, a, nu, Amu, V, ndraw)
# plot priors under defaults
par(mfrow=c(1,3))
trimhist <- -20
hist(defaultprior$betakstar, breaks=40, col="magenta", main="Beta_k_star", xlab="", ylab="", yaxt="n") hist(defaultprior$betak[defaultprior$betak>trimhist], breaks=40, col="magenta", main="Beta_k", xlab="", ylab="", yaxt="n", xlim=c(trimhist,0)) hist(defaultprior$otherbeta, breaks=40, col="magenta",
main="Other Beta", xlab="", ylab="", yaxt="n")