This vignette illustrates the standard use of the `PLNLDA`

function and the methods accompanying the R6 Classes `PLNLDA`

and `PLNLDAfit`

.

The packages required for the analysis are **PLNmodels** plus some others for data manipulation and representation:

`library(PLNmodels)`

We illustrate our point with the trichoptera data set, a full description of which can be found in the corresponding vignette. Data preparation is also detailed in the specific vignette.

```
data(trichoptera)
<- prepare_data(trichoptera$Abundance, trichoptera$Covariate) trichoptera
```

The `trichoptera`

data frame stores a matrix of counts (`trichoptera$Abundance`

), a matrix of offsets (`trichoptera$Offset`

) and some vectors of covariates (`trichoptera$Wind`

, `trichoptera$Temperature`

, etc.) In the following, we’re particularly interested in the `trichoptera$Group`

**discrete** covariate which corresponds to disjoint time spans during which the catching took place. The correspondence between group label and time spans is:

Label | Number.of.Consecutive.Nights | Date |
---|---|---|

1 | 12 | June 59 |

2 | 5 | June 59 |

3 | 5 | June 59 |

4 | 4 | June 59 |

5 | 4 | July 59 |

6 | 1 | June 59 |

7 | 3 | June 60 |

8 | 4 | June 60 |

9 | 5 | June 60 |

10 | 4 | June 60 |

11 | 1 | June 60 |

12 | 1 | July 60 |

In the vein of Fisher (1936) and Rao (1948), we introduce a multi-class LDA model for multivariate count data which is a variant of the Poisson Lognormal model of Aitchison and Ho (1989) (see the PLN vignette as a reminder). Indeed, it can viewed as a PLN model with a discrete group structure in the latent Gaussian space.

This PLN-LDA model can be written in a hierarchical framework where a sample of \(p\)-dimensional observation vectors \(\mathbf{Y}_i\) is related to some \(p\)-dimensional vectors of latent variables \(\mathbf{Z}_i\) and a discrete structure with \(K\) groups in the following way: \[\begin{equation} \begin{array}{rcl} \text{group structure } & \mathbf{\mu}_i = \mu_{g_i} & g_i \in \{1, \dots, K\} \\ \text{latent space } & \mathbf{Z}_i \quad \text{indep.} & \mathbf{Z}_i \sim \mathcal{N}({\boldsymbol\mu}_i, \boldsymbol{\Sigma}) & \\ \text{observation space } & Y_{ij} | Z_{ij} \quad \text{indep.} & Y_{ij} | Z_{ij} \sim \mathcal{P}\left(\exp\{Z_{ij}\}\right) \end{array} \end{equation}\] where \(g_i\) denotes the group sample \(i\) belongs to.

The different parameters \({\boldsymbol\mu}_k \in\mathbb{R}^p\) corresponds to the group-specific main effects and the variance matrix \(\boldsymbol{\Sigma}\) is shared among groups. An equivalent way of writing this model is the following: \[\begin{equation} \begin{array}{rcl} \text{latent space } & \mathbf{Z}_i \sim \mathcal{N}({\boldsymbol\mu}_i,\boldsymbol\Sigma) & \boldsymbol{\mu}_i = \mathbf{g}_i^\top \mathbf{M} \\ \text{observation space } & Y_{ij} | Z_{ij} \quad \text{indep.} & Y_{ij} | Z_{ij} \sim \mathcal{P}\left(\exp\{Z_{ij}\}\right), \end{array} \end{equation}\] where, with a slight abuse of notation, \(\mathbf{g}_i\) is a group-indicator vector of length \(K\) (\(g_{ik} = 1 \Leftrightarrow g_i = k\)) and \(\mathbf{M} = [\boldsymbol{\mu}_1^\top, \dots, \boldsymbol{\mu}_K^\top]^\top\) is a \(K \times p\) matrix collecting the group-specific main effects.

Just like PLN, PLN-LDA generalizes to a formulation close to a multivariate generalized linear model where the main effect is due to a linear combination of the discrete group structure, \(d\) covariates \(\mathbf{x}_i\) and a vector \(\mathbf{o}_i\) of \(p\) offsets in sample \(i\). The latent layer then reads \[\begin{equation} \mathbf{Z}_i \sim \mathcal{N}({\mathbf{o}_i + \mathbf{g}_i^\top \mathbf{M} + \mathbf{x}_i^\top\boldsymbol\Theta},\boldsymbol\Sigma) \end{equation}\] where \(\boldsymbol\Theta\) is a \(d\times p\) matrix of regression parameters.

Given:

- a new observation \(\mathbf{Y}\) with associated offset \(\mathbf{o}\) and covariates \(\mathbf{x}\)
- a model with estimated parameters \(\hat{\boldsymbol{\Sigma}}\), \(\hat{\boldsymbol{\Theta}}\), \(\hat{\mathbf{M}}\) and group counts \((n_1, \dots, n_K)\)

We can predict the observation’s group using Bayes rule as follows: for \(k \in {1, \dots, K}\), compute \[\begin{equation} \begin{aligned} f_k(\mathbf{Y}) & = p(\mathbf{Y} | \mathbf{g} = k, \mathbf{o}, \mathbf{x}, \hat{\boldsymbol{\Theta}}, \hat{\boldsymbol{\Sigma}}) \\ & = \boldsymbol{\Phi}_{PLN}(\mathbf{Y}; \mathbf{o} + \boldsymbol{\mu}_k + \mathbf{x}^\top \hat{\boldsymbol{\Theta}}, \hat{\boldsymbol{\Sigma}}) \\ p_k & = \frac{n_k}{\sum_{k' = 1}^K n_{k'}} \end{aligned} \end{equation}\] where \(\boldsymbol{\Phi}_{PLN}(\bullet; \boldsymbol{\mu}, \boldsymbol{\Sigma})\) is the density function of a PLN distribution with parameters \((\boldsymbol{\mu}, \boldsymbol{\Sigma})\). \(f_k(\mathbf{Y})\) and \(p_k\) are respectively plug-in estimates of (i) the probability of observing counts \(\mathbf{Y}\) in a sample from group \(k\) and (ii) the probability that a sample originates from group \(k\).

The posterior probability \(\hat{\pi}_k(\mathbf{Y})\) that observation \(\mathbf{Y}\) belongs to group \(k\) and most likely group \(\hat{k}(\mathbf{Y})\) can thus be defined as \[\begin{equation} \begin{aligned} \hat{\pi}_k(\mathbf{Y}) & = \frac{p_k f_k(\mathbf{Y})}{\sum_{k' = 1}^K p_{k'} f_{k'}(\mathbf{Y})} \\ \hat{k}(\mathbf{Y}) & = \underset{k \in \{1, \dots, K\}}{\arg\max} \hat{\pi}_k(\mathbf{Y}) \end{aligned} \end{equation}\]

Classification and prediction are the main objectives in (PLN-)LDA. To reach this goal, we first need to estimate the model parameters. Inference in PLN-LDA focuses on the group-specific main effects \(\mathbf{M}\), the regression parameters \(\boldsymbol\Theta\) and the covariance matrix \(\boldsymbol\Sigma\). Technically speaking, we can treat \(\mathbf{g}_i\) as a discrete covariate and estimate \([\mathbf{M}, \boldsymbol{\Theta}]\) using the same strategy as for the standard **PLN** model. Briefly, we adopt a variational strategy to approximate the log-likelihood function and optimize the consecutive variational surrogate of the log-likelihood with a gradient-ascent-based approach. To this end, we rely on the CCSA algorithm of Svanberg (2002) implemented in the C++ library (Johnson 2011), which we link to the package.

In the package, the PLN-LDA model is adjusted with the function `PLNLDA`

, which we review in this section. This function adjusts the model and stores it in a object of class `PLNLDAfit`

which inherits from the class `PLNfit`

, so we strongly recommend the reader to be somehow comfortable with `PLN`

and `PLNfit`

before using `PLNLDA`

(see the PLN vignette).

We start by adjusting the above model to the Trichoptera data set. We use `Group`

, the catching time spans, as a discrete structure and use log as an offset to capture differences in sampling luck.

The model can be fitted with the function `PLNLDA`

as follows:

```
<- PLNLDA(Abundance ~ 0 + offset(log(Offset)),
myLDA_nocov grouping = Group,
data = trichoptera)
```

```
##
## Performing discriminant Analysis...
## DONE!
```

Note that `PLNLDA`

uses the standard `formula`

interface, like every other model in the **PLNmodels** package.

`PLNLDAfit`

The `myLDA_nocov`

variable is an `R6`

object with class `PLNLDAfit`

, which comes with a couple of methods. The most basic is the `show/print`

method, which sends a brief summary of the estimation process and available methods:

` myLDA_nocov`

```
## Linear Discriminant Analysis for Poisson Lognormal distribution
## ==================================================================
## nb_param loglik BIC ICL
## 391 -800.028 -1560.879 -1225.04
## ==================================================================
## * Useful fields
## $model_par, $latent, $latent_pos, $var_par, $optim_par
## $loglik, $BIC, $ICL, $loglik_vec, $nb_param, $criteria
## * Useful S3 methods
## print(), coef(), sigma(), vcov(), fitted(), predict(), standard_error()
## * Additional fields for LDA
## $percent_var, $corr_map, $scores, $group_means
## * Additional S3 methods for LDA
## plot.PLNLDAfit(), predict.PLNLDAfit()
```

Comprehensive information about `PLNLDAfit`

is available via `?PLNLDAfit`

.

The user can easily access several fields of the `PLNLDAfit`

object using `S3`

methods, the most interesting ones are

- the \(p \times p\) covariance matrix \(\hat{\boldsymbol{\Sigma}}\):

`sigma(myLDA_nocov) %>% corrplot::corrplot(is.corr = FALSE)`

- the regression coefficient matrix \(\hat{\boldsymbol{\Theta}}\) (in this case
`NULL`

as there are no covariates)

`coef(myLDA_nocov)`

`## NULL`

- the \(p \times K\) matrix of group means \(\mathbf{M}\)

`$group_means %>% head() %>% knitr::kable(digits = 2) myLDA_nocov`

grouping1 | grouping2 | grouping3 | grouping4 | grouping5 | grouping6 | grouping7 | grouping8 | grouping9 | grouping10 | grouping11 | grouping12 |
---|---|---|---|---|---|---|---|---|---|---|---|

-21.61 | -6.82 | -21.18 | -20.00 | -22.96 | -23.54 | -21.85 | -20.99 | -21.06 | -5.67 | -3.48 | -19.99 |

-21.58 | -22.32 | -5.68 | -19.97 | -7.45 | -23.50 | -21.81 | -20.96 | -21.03 | -21.10 | -18.94 | -4.48 |

-2.38 | -3.96 | -2.36 | -2.92 | -6.48 | -5.54 | -2.69 | -2.03 | -2.71 | -2.65 | -19.04 | -3.85 |

-21.59 | -22.34 | -5.69 | -4.52 | -22.86 | -23.57 | -21.85 | -5.52 | -5.59 | -4.91 | -19.00 | -20.02 |

-0.28 | -0.26 | -0.62 | -1.09 | -0.62 | -0.11 | -0.66 | -0.80 | -0.39 | -0.45 | -0.61 | -1.16 |

-4.00 | -3.32 | -2.93 | -4.47 | -6.72 | -8.00 | -2.85 | -3.06 | -5.53 | -3.99 | -18.91 | -19.93 |

The `PLNLDAfit`

class also benefits from two important methods: `plot`

and `predict`

.

`plot`

methodThe `plot`

methods provides easy to interpret graphics which reveals here that the groups are well separated:

`plot(myLDA_nocov)`

By default, `plot`

shows the first 3 axis of the LDA when there are 4 or more groups and uses special representations for the edge cases of 3 or less groups.

`ggplot2`

-savvy users who want to make their own representations can extracts the \(n \times (K-1)\) matrix of sample scores from the `PLNLDAfit`

object …

`$scores %>% head %>% knitr::kable(digits = 2) myLDA_nocov`

LD1 | LD2 | LD3 | LD4 | LD5 | LD6 | LD7 | LD8 | LD9 | LD10 | LD11 |
---|---|---|---|---|---|---|---|---|---|---|

1141218 | -345890.7 | -19565.53 | -36438.04 | -8158.59 | 3126.82 | 3234.54 | -8.42 | 114.38 | -13.37 | -82.84 |

1142398 | -344464.1 | -19702.37 | -36728.69 | -8066.40 | 3125.09 | 3257.54 | -38.22 | 110.92 | -12.66 | -83.62 |

1153829 | -331104.0 | -21401.86 | -38662.74 | -7349.46 | 3108.75 | 3098.77 | -50.44 | 118.20 | -9.40 | -78.45 |

1166545 | -315989.7 | -23304.10 | -40733.95 | -6580.43 | 3094.36 | 2825.04 | 27.46 | 129.42 | -7.33 | -72.07 |

1159038 | -324807.4 | -22173.05 | -39485.71 | -7052.29 | 3101.76 | 2943.85 | 22.90 | 124.19 | -8.94 | -75.47 |

1149716 | -335826.4 | -20786.65 | -37948.35 | -7618.88 | 3115.49 | 3118.14 | -10.14 | 116.83 | -11.19 | -80.07 |

…or the \(p \times (K-1)\) matrix of correlations between scores and (latent) variables

`$corr_map %>% head %>% knitr::kable(digits = 2) myLDA_nocov`

LD1 | LD2 | LD3 | LD4 | LD5 | LD6 | LD7 | LD8 | LD9 | LD10 | LD11 | |
---|---|---|---|---|---|---|---|---|---|---|---|

Che | -0.35 | -0.07 | -0.37 | 0.22 | 0.56 | -0.48 | -0.20 | -0.53 | 0.83 | -0.42 | 0.06 |

Hyc | 0.08 | 0.62 | -0.13 | 0.32 | -0.11 | -0.05 | -0.73 | 0.81 | -0.22 | -0.12 | 0.61 |

Hym | 0.11 | 0.05 | 0.21 | -0.77 | -0.40 | -0.36 | 0.21 | -0.07 | -0.26 | -0.05 | -0.61 |

Hys | -0.41 | 0.21 | 0.68 | 0.00 | -0.33 | -0.61 | -0.02 | -0.12 | -0.29 | 0.01 | 0.28 |

Psy | 0.11 | -0.50 | -0.21 | -0.25 | 0.47 | -0.05 | 0.13 | -0.01 | 0.57 | -0.64 | -0.32 |

Aga | 0.13 | 0.04 | 0.18 | -0.90 | -0.11 | -0.28 | 0.01 | -0.26 | -0.05 | -0.26 | -0.50 |

`predict`

methodThe `predict`

method has a slightly different behavior than its siblings in other models of the **PLNmodels**. The goal of `predict`

is to predict the discrete class based on observed *species counts* (rather than predicting counts from known covariates).

By default, the `predict`

use the argument `type = "posterior"`

to output the matrix of log-posterior probabilities \(\log(\hat{\pi})_k\)

```
<- predict(myLDA_nocov, newdata = trichoptera)
predicted.class ## equivalent to
## predicted.class <- predict(myLDA_nocov, newdata = trichoptera, type = "posterior")
%>% head() %>% knitr::kable(digits = 2) predicted.class
```

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|

-58.34 | -81.98 | -79.01 | -96.96 | -90.19 | -73.65 | -66.79 | -124.62 | -109.22 | -61.47 | -199.99 | -135.64 |

-50.80 | -58.18 | -90.28 | -93.07 | -102.53 | -68.42 | -55.81 | -68.85 | -56.33 | -55.83 | -136.02 | -127.38 |

-53.71 | -63.29 | -134.23 | -111.11 | -139.37 | -71.58 | -63.21 | -98.08 | -79.75 | -63.61 | -157.23 | -170.35 |

-51.78 | -64.02 | -144.17 | -182.65 | -167.31 | -93.57 | -80.72 | -152.03 | -119.71 | -77.15 | -281.34 | -258.64 |

-54.13 | -60.22 | -87.34 | -131.04 | -107.78 | -78.52 | -67.02 | -100.03 | -79.94 | -65.95 | -193.25 | -170.21 |

-50.31 | -54.54 | -72.87 | -96.02 | -83.94 | -64.79 | -55.65 | -69.22 | -56.32 | -55.36 | -120.50 | -128.63 |

You can also show them in the standard (and human-friendly) \([0, 1]\) scale with `scale = "prob"`

to get the matrix \(\hat{\pi}_k\)

```
<- predict(myLDA_nocov, newdata = trichoptera, scale = "prob")
predicted.class %>% head() %>% knitr::kable(digits = 3) predicted.class
```

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|

0.958 | 0.000 | 0 | 0 | 0 | 0 | 0.000 | 0 | 0.000 | 0.042 | 0 | 0 |

0.982 | 0.001 | 0 | 0 | 0 | 0 | 0.007 | 0 | 0.004 | 0.006 | 0 | 0 |

1.000 | 0.000 | 0 | 0 | 0 | 0 | 0.000 | 0 | 0.000 | 0.000 | 0 | 0 |

1.000 | 0.000 | 0 | 0 | 0 | 0 | 0.000 | 0 | 0.000 | 0.000 | 0 | 0 |

0.998 | 0.002 | 0 | 0 | 0 | 0 | 0.000 | 0 | 0.000 | 0.000 | 0 | 0 |

0.972 | 0.014 | 0 | 0 | 0 | 0 | 0.005 | 0 | 0.002 | 0.006 | 0 | 0 |

Setting `type = "response"`

, we can predict the most likely group \(\hat{k}\) instead:

```
<- predict(myLDA_nocov, newdata = trichoptera, type = "response")
predicted.class predicted.class
```

```
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## 1 1 1 1 1 1 9 2 1 1 1 1 2 3 2 2 2 3 3 3 9 3 3 1 5 5
## 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
## 5 5 5 5 6 7 7 7 7 9 9 9 9 1 9 9 9 10 10 10 10 11 5
## Levels: 1 2 3 4 5 6 7 8 9 10 11 12
```

We can assess that the predictions are quite similar to the real group (*this is not a proper validation of the method as we used data set for both model fitting and prediction and are thus at risk of overfitting*).

`table(predicted.class, trichoptera$Group, dnn = c("predicted", "true"))`

```
## true
## predicted 1 2 3 4 5 6 7 8 9 10 11 12
## 1 10 0 0 1 0 0 0 0 1 0 0 0
## 2 1 4 0 0 0 0 0 0 0 0 0 0
## 3 0 1 4 1 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 2 4 0 0 0 0 0 0 1
## 6 0 0 0 0 0 1 0 0 0 0 0 0
## 7 0 0 0 0 0 0 3 1 0 0 0 0
## 8 0 0 0 0 0 0 0 0 0 0 0 0
## 9 1 0 1 0 0 0 0 3 4 0 0 0
## 10 0 0 0 0 0 0 0 0 0 4 0 0
## 11 0 0 0 0 0 0 0 0 0 0 1 0
## 12 0 0 0 0 0 0 0 0 0 0 0 0
```

Finally, we can get the coordinates of the new data on the same graph at the original ones with `type = "scores"`

. This is done by averaging the latent positions \(\hat{\mathbf{Z}}_i + \boldsymbol{\mu}_k\) (found when the sample is assumed to come from group \(k\)) and weighting them with the \(\hat{\pi}_k\). Some samples, have compositions that put them very far from their group mean.

```
library(ggplot2)
<- predict(myLDA_nocov, newdata = trichoptera, type = "scores")
predicted.scores colnames(predicted.scores) <- paste0("Axis.", 1:ncol(predicted.scores))
<- as.data.frame(predicted.scores)
predicted.scores $group <- trichoptera$Group
predicted.scoresplot(myLDA_nocov, map = "individual", nb_axes = 2, plot = FALSE) +
geom_point(data = predicted.scores,
aes(x = Axis.1, y = Axis.2, color = group, label = NULL))
```

It is possible to correct for other covariates before finding the LDA axes that best separate well the groups. In our case ,we’re going to use `Wind`

as a covariate and illustrate the main differences with before :

```
<- PLNLDA(Abundance ~ Wind + 0 + offset(log(Offset)),
myLDA_cov grouping = Group,
data = trichoptera)
```

```
##
## Performing discriminant Analysis...
## DONE!
```

All fields of our new `PLNLDA`

fit can be accessed as before with similar results. The only important difference is the result of `coef`

: since we included a covariate in the model, `coef`

now returns a 1-column matrix for \(\hat{\boldsymbol{\Theta}}\) instead of `NULL`

`coef(myLDA_cov) %>% head %>% knitr::kable()`

Wind | |
---|---|

Che | -0.3321637 |

Hyc | 1.2304346 |

Hym | -0.1930088 |

Hys | -0.4546537 |

Psy | 0.0384414 |

Aga | -0.0612139 |

The group-specific main effects can still be accessed with `$group_means`

`$group_means %>% head %>% knitr::kable(digits = 2) myLDA_cov`

grouping1 | grouping2 | grouping3 | grouping4 | grouping5 | grouping6 | grouping7 | grouping8 | grouping9 | grouping10 | grouping11 | grouping12 |
---|---|---|---|---|---|---|---|---|---|---|---|

-15.27 | -9.71 | -18.20 | -14.14 | -19.35 | -33.16 | -22.54 | -21.19 | -14.65 | 1.86 | 7.70 | -17.68 |

-18.08 | -33.08 | -2.47 | -11.94 | -10.57 | -36.95 | -25.22 | -22.50 | -11.06 | -13.04 | -3.98 | 0.51 |

-1.22 | -4.40 | -1.79 | -1.84 | -5.80 | -6.80 | -2.66 | -1.84 | -1.62 | -1.61 | -18.46 | -3.33 |

-18.37 | -25.03 | -3.47 | -0.76 | -20.44 | -29.31 | -22.72 | -5.24 | -1.55 | -0.95 | -13.90 | -19.20 |

-0.05 | -0.42 | -0.47 | -0.85 | -0.47 | -0.43 | -0.60 | -0.71 | -0.06 | -0.15 | -0.18 | -1.03 |

-1.93 | -4.05 | -1.79 | -2.52 | -5.54 | -10.38 | -2.72 | -2.76 | -3.43 | -1.88 | -16.91 | -20.10 |

`plot`

methodOnce again, the `plot`

method is very useful to get a quick look at the results.

`plot(myLDA_cov)`

`predict`

methodWe can again predict the most likely group for each sample :

`<- predict(myLDA_cov, newdata = trichoptera, type = "response") predicted.class_cov `

and check that we recover the correct class in most cases (again, we used the same data set for model fitting and group prediction only for ease of exposition):

`table(predicted.class_cov, trichoptera$Group, dnn = c("predicted", "true"))`

```
## true
## predicted 1 2 3 4 5 6 7 8 9 10 11 12
## 1 12 2 2 2 2 0 0 2 4 2 0 0
## 2 0 3 1 0 0 0 0 0 0 1 0 0
## 3 0 0 2 0 0 0 0 0 0 0 0 0
## 4 0 0 0 2 0 0 0 0 0 0 0 0
## 5 0 0 0 0 2 0 0 0 0 0 0 1
## 6 0 0 0 0 0 1 0 0 0 0 0 0
## 7 0 0 0 0 0 0 3 0 0 0 0 0
## 8 0 0 0 0 0 0 0 2 0 0 0 0
## 9 0 0 0 0 0 0 0 0 1 0 0 0
## 10 0 0 0 0 0 0 0 0 0 1 0 0
## 11 0 0 0 0 0 0 0 0 0 0 1 0
## 12 0 0 0 0 0 0 0 0 0 0 0 0
```

Aitchison, J., and C. H. Ho. 1989. “The Multivariate Poisson-Log Normal Distribution.” *Biometrika* 76 (4): 643–53.

Fisher, R. A. 1936. “The Use of Multiple Measurements in Taxonomic Problems.” *Annals of Eugenics* 7 (2): 179–88.

Johnson, Steven G. 2011. *The NLopt Nonlinear-Optimization Package*. https://nlopt.readthedocs.io/en/latest/.

Rao, C. Radhakrishna. 1948. “The Utilization of Multiple Measurements in Problems of Biological Classification.” *Journal of the Royal Statistical Society. Series B (Methodological)* 10 (2): 159–203.

Svanberg, Krister. 2002. “A Class of Globally Convergent Optimization Methods Based on Conservative Convex Separable Approximations.” *SIAM Journal on Optimization* 12 (2): 555–73.