# Factor Analysis with Missing Data

In this example, we will show how to use lslx to conduct semi-confirmatory factor analysis with missing data. The example uses data HolzingerSwineford1939 in the package lavaan. Hence, lavaan must be installed.

## Missing Data Construction

Because HolzingerSwineford1939 doesn’t contain missing values, we use the code in semTools to create NA (see the example of twostage() function in semTools).

data_miss <- lavaan::HolzingerSwineford1939
data_miss$x5 <- ifelse(data_miss$x1 <= quantile(data_miss$x1, .3), NA, data_miss$x5)
data_miss$age <- data_miss$ageyr + data_miss$agemo/12 data_miss$x9 <- ifelse(data_miss$age <= quantile(data_miss$age, .3),
NA, data_miss$x9) By the construction, we can see that the missingness of x5 depends on the value of x1 and the missingness of x9 relies on the age variable. Note that age is created by ageyr and agemo. Since ageyr and agemo are not the variables that we are interested, the two variables are treated as auxiliary in the later analysis. ## Model Specification and Object Initialization A usual confirmatory factor analysis (CFA) model is specified. model_miss <- "visual :=> x1 + x2 + x3 textual :=> x4 + x5 + x6 speed :=> x7 + x8 + x9 visual <=> 1 * visual textual <=> 1 * textual speed <=> 1 * speed" Here, 1 before * will be interpreted as fix(1). To initialize an lslx object with auxiliary variables, we need to specify the auxiliary_variable argument. The auxiliary_variable argument only accepts numeric variables. If any categorical variable is considered as a valid auxiliary variable, user should transform it as a set of dummy variables first. One possible method is using model.matrix function. library(lslx) lslx_miss <- lslx$new(model = model_miss, data = data_miss,
auxiliary_variable = c("ageyr", "agemo"))
An 'lslx' R6 class is initialized via 'data' argument.
Response Variables: x1 x2 x3 x4 x5 x6 x7 x8 x9
Latent Factors: visual textual speed
Auxiliary Variables: ageyr agemo 

Because the specified CFA might not fit the data well, we add a correlated residual structure to the model by $penalize_block() lslx_miss$penalize_block(block = "y<->y", type = "fixed", verbose = FALSE)

The code penalizes all the coefficients in y<->y block with fixed parameter type. Note that this model is not identified under the usual SEM framework. PL method can still estimate it because the penalty function introduces additional constraints on parameters. However, we don’t recommend using such type of model because it is difficult to be interpreted.

## Model Fitting

So far, the specified auxiliary variables are only stored in lslx object. They are actually used after implementing the $fit() related methods. lslx_miss$fit_lasso()
CONGRATS: Algorithm converges under EVERY specified penalty level.
Specified Tolerance for Convergence: 0.001
Specified Maximal Number of Iterations: 100 

By default, fit related methods implement two-step method (possibly with auxiliary variables) for handling missing values. User can specify the missing method explicitly via missing_method argument. Another missing method in the current version is listwise deletion. However, listwise deletion has no theoretical advantages over the two-step method.

## Model Summarizing

The following code summarizes the fitting result under the penalty level selected by a Robust version of Akaike information criterion (RAIC). The number of missing patterns shows how many missing patterns present in the data set (include the complete pattern). If the lslx object is initialized via raw data, by default, a corrected sandwich standard error will be used for coefficient test. The correction is based on the asymptotic covariance of saturated moments derived by full information maximum likelihood. Also, the mean adjusted likelihood ratio test is based on this quantity. For the reference, please see the section of Missing Data in ?lslx.

lslx_miss\$summarize(selector = "raic")
General Information
number of observations                             301.000
number of complete observations                    138.000
number of missing patterns                           4.000
number of groups                                     1.000
number of responses                                  9.000
number of factors                                    3.000
number of free coefficients                         30.000
number of penalized coefficients                    36.000

Numerical Conditions
selected lambda                                      0.130
selected delta                                        none
selected step                                         none
objective value                                      0.168
objective gradient absolute maximum                  0.001
objective Hessian convexity                          0.741
number of iterations                                 6.000
loss value                                           0.051
number of non-zero coefficients                     41.000
degrees of freedom                                  13.000
robust degrees of freedom                           16.174
scaling factor                                       1.244

Fit Indices
root mean square error of approximation (rmsea)      0.025
comparative fit index (cfi)                          0.997
non-normed fit index (nnfi)                          0.992
standardized root mean of residual (srmr)            0.032

Likelihood Ratio Test
statistic         df    p-value
unadjusted          15.431     13.000      0.281
mean-adjusted       12.403     13.000      0.495

Root Mean Square Error of Approximation Test
estimate      lower      upper
unadjusted           0.025      0.000      0.071
mean-adjusted        0.000      0.000      0.069

Coefficient Test (Std.Error = "sandwich")
Factor Loading
type  estimate  std.error  z-value  p-value  lower  upper
x1<-visual     free     0.921      0.092    9.992    0.000  0.740  1.101
x2<-visual     free     0.441      0.079    5.559    0.000  0.286  0.597
x3<-visual     free     0.607      0.070    8.631    0.000  0.469  0.745
x4<-textual     free     0.974      0.063   15.585    0.000  0.852  1.097
x5<-textual     free     1.057      0.063   16.738    0.000  0.933  1.180
x6<-textual     free     0.920      0.059   15.688    0.000  0.805  1.035
x7<-speed     free     0.438      0.094    4.669    0.000  0.254  0.622
x8<-speed     free     0.526      0.093    5.634    0.000  0.343  0.709
x9<-speed     free     0.799      0.092    8.660    0.000  0.618  0.980

Covariance
type  estimate  std.error  z-value  p-value  lower  upper
textual<->visual     free     0.486      0.075    6.505    0.000  0.340  0.633
speed<->visual     free     0.567      0.077    7.407    0.000  0.417  0.718
speed<->textual     free     0.243      0.089    2.732    0.003  0.069  0.417
x2<->x1      pen     0.000        -        -        -      -      -
x3<->x1      pen     0.000        -        -        -      -      -
x4<->x1      pen     0.048      0.050    0.958    0.169 -0.050  0.147
x5<->x1      pen    -0.112      0.089   -1.254    0.105 -0.286  0.063
x6<->x1      pen     0.000        -        -        -      -      -
x7<->x1      pen    -0.080      0.063   -1.261    0.104 -0.204  0.044
x8<->x1      pen     0.000        -        -        -      -      -
x9<->x1      pen     0.000        -        -        -      -      -
x3<->x2      pen     0.103      0.074    1.386    0.083 -0.043  0.249
x4<->x2      pen     0.000        -        -        -      -      -
x5<->x2      pen     0.000        -        -        -      -      -
x6<->x2      pen     0.000        -        -        -      -      -
x7<->x2      pen    -0.104      0.058   -1.816    0.035 -0.217  0.008
x8<->x2      pen     0.000        -        -        -      -      -
x9<->x2      pen     0.000        -        -        -      -      -
x4<->x3      pen     0.000        -        -        -      -      -
x5<->x3      pen    -0.079      0.060   -1.317    0.094 -0.196  0.039
x6<->x3      pen     0.000        -        -        -      -      -
x7<->x3      pen     0.000        -        -        -      -      -
x8<->x3      pen     0.000        -        -        -      -      -
x9<->x3      pen     0.000        -        -        -      -      -
x5<->x4      pen     0.000        -        -        -      -      -
x6<->x4      pen     0.000        -        -        -      -      -
x7<->x4      pen     0.079      0.043    1.851    0.032 -0.005  0.163
x8<->x4      pen     0.000        -        -        -      -      -
x9<->x4      pen     0.000        -        -        -      -      -
x6<->x5      pen     0.000        -        -        -      -      -
x7<->x5      pen     0.000        -        -        -      -      -
x8<->x5      pen     0.000        -        -        -      -      -
x9<->x5      pen     0.008      0.065    0.121    0.452 -0.120  0.136
x7<->x6      pen     0.000        -        -        -      -      -
x8<->x6      pen     0.013      0.042    0.300    0.382 -0.070  0.095
x9<->x6      pen    -0.017      0.048   -0.354    0.362 -0.111  0.077
x8<->x7      pen     0.253      0.072    3.531    0.000  0.113  0.394
x9<->x7      pen     0.000        -        -        -      -      -
x9<->x8      pen     0.000        -        -        -      -      -

Variance
type  estimate  std.error  z-value  p-value  lower  upper
visual<->visual    fixed     1.000        -        -        -      -      -
textual<->textual    fixed     1.000        -        -        -      -      -
speed<->speed    fixed     1.000        -        -        -      -      -
x1<->x1     free     0.473      0.143    3.300    0.000  0.192  0.754
x2<->x2     free     1.150      0.108   10.683    0.000  0.939  1.361
x3<->x3     free     0.878      0.077   11.470    0.000  0.728  1.028
x4<->x4     free     0.387      0.053    7.305    0.000  0.284  0.491
x5<->x5     free     0.410      0.064    6.435    0.000  0.285  0.535
x6<->x6     free     0.348      0.046    7.496    0.000  0.257  0.439
x7<->x7     free     0.930      0.086   10.861    0.000  0.762  1.098
x8<->x8     free     0.719      0.090    8.021    0.000  0.544  0.895
x9<->x9     free     0.334      0.137    2.437    0.007  0.065  0.603

Intercept
type  estimate  std.error  z-value  p-value  lower  upper
x1<-1     free     4.936      0.067   73.473    0.000  4.804  5.067
x2<-1     free     6.088      0.068   89.855    0.000  5.955  6.221
x3<-1     free     2.250      0.065   34.579    0.000  2.123  2.378
x4<-1     free     3.061      0.067   45.694    0.000  2.930  3.192
x5<-1     free     4.420      0.093   47.369    0.000  4.237  4.603
x6<-1     free     2.186      0.063   34.667    0.000  2.062  2.309
x7<-1     free     4.186      0.063   66.766    0.000  4.063  4.309
x8<-1     free     5.527      0.058   94.854    0.000  5.413  5.641
x9<-1     free     5.366      0.074   72.693    0.000  5.221  5.510