modelStudio - perks and features

Hubert Baniecki

2020-01-21

modelStudio::modelStudio uses your model, data and new observations, to provide local and global explanations. It generates plots and descriptions in the form of the serverless HTML site, that supports animations and interactivity made with D3.js.

Let’s use DALEX::HR dataset to explore modelStudio parameters:

train <- DALEX::HR[1:100,]
train$fired <- ifelse(train$status == "fired", 1, 0)
train <- train[,-6]

head(train)
DALEX::HR dataset
gender age hours evaluation salary fired
male 32.58 41.89 3 1 1
female 41.21 36.34 2 5 1
male 37.71 36.82 3 0 1
female 30.06 38.96 3 2 1
male 21.10 62.15 5 3 0
male 40.12 69.54 2 0 1

Prepare data and model for the explainer:

# create a random forest model
library("randomForest")
model <- randomForest(fired ~., data = train)

# prepare validation dataset
test <- DALEX::HR_test[1:100,]
test$fired <- ifelse(test$status == "fired", 1, 0)
test <- test[,-6]

# create an explainer
explainer <- DALEX::explain(model = model,
                            data = test[,-6],
                            y = test[,6],
                            verbose = FALSE)

# start modelStudio
library("modelStudio")

modelStudio parameters

local explanations

You can pass data points to new_observation parameter for local explanations such as Break Down, SHAP Values and Ceteris Paribus Profiles.

grid size

You can achieve bigger or smaller modelStudio grid with facet_dim parameter.

animations

You can manipulate time parameter to set animation length. Value 0 will make them invisible.

more calculations means more time

N is a number of observations used for calculation of partial dependency profiles. B is a number of random paths used for calculation of SHAP values. You can decrease N and B parameters to lower computation time or increase them to get more accurate empirical results.

progress bar

You can hide computation progress bar messages with show_info parameter.

viewer or browser?

You can change viewer parameter to set where to display modelStudio. Best described here: r2d3 viewer argument.


parallel computation

You can speed up modelStudio computation by setting parallel parameter to TRUE. It uses parallelMap package to calculate local explainers faster. It is really useful when using modelStudio with complicated models, vast datasets or simply many observations are being processed.

All options can be set outside of function call. More on that here.

#set up the cluster 
options(
  parallelMap.default.mode        = "socket",
  parallelMap.default.cpus        = 4,
  parallelMap.default.show.info   = FALSE
)

# calculations will be distributed into 4 cores
modelStudio(explainer, new_observation = test[1:16,], parallel = TRUE)

plot options

You can customize some of modelStudio looks by overwriting default options returned by modelStudioOptions().

# set additional graphical parameters
new_options <- modelStudioOptions(
  show_subtitle = TRUE,
  bd_subtitle = "Hello World",
  line_size = 5,
  point_size = 9,
  line_color = "pink",
  point_color = "purple",
  bd_positive_color = "yellow",
  bd_negative_color = "orange" 
)

modelStudio(explainer, new_observation = test[1,], options = new_options)

R mlr model

You can use DALEXtra::explain_*() functions to explain various models. Bellow basic example of making modelStudio for mlr model using DALEXtra::explain_mlr()

library(DALEXtra)
library(mlr)

# prepare mlr model
task <- mlr::makeRegrTask(id = "task",
                          data = train,
                          target = "fired")

learner <- mlr::makeLearner("regr.randomForest",
                            par.vals = list(ntree = 300),
                            predict.type = "response")

model <- mlr::train(learner, task)

# create an explainer for mlr model
explainer_mlr <- explain_mlr(model, data = test[,-6], y = test[,6])

# call model studio for mlr model
modelStudio(explainer_mlr,
            new_observation = test[1:4,],
            N = 100, B = 10)

Python scikit-learn model

Bellow basic example of making modelStudio for scikit-learn model using DALEXtra::explain_scikitlearn() and Python Anaconda.

library(DALEXtra)

titanic_test <- read.csv(system.file("extdata", "titanic_test.csv", package = "DALEXtra")) 
titanic_train <- read.csv(system.file("extdata", "titanic_train.csv", package = "DALEXtra"))

# read scikitlearn model
yml <- system.file("extdata", "scikitlearn.yml", package = "DALEXtra")
pkl_gbm <- system.file("extdata", "scikitlearn.pkl", package = "DALEXtra")
pkl_SGDC <- "SGDC.pkl"

# prepare an explainer for scikitlearn model
explainer_scikit <- explain_scikitlearn(pkl_gbm,
                                        yml = yml,
                                        data = titanic_test[,1:17],
                                        y = titanic_test[,18])

# start model studio
modelStudio(explainer_scikit,
            new_observation = titanic_test[1:4,1:17],
            N = 100, B = 10, options = modelStudioOptions(margin_left = 160))

modelStudio should work with any explainer class object. Find more about making those here.

References