SmartEDA

Authors: Dayanand Ubrangala, Kiran R, Ravi Prasad Kondapalli and Sayan Putatunda


In a sound statistical analysis, the first step is exploratory. Exploratory data analysis begins with univariate analysis, examining one variable at a time, followed by bivariate and then multivariate analysis. The SmartEDA package produces a complete exploratory data analysis by running a few functions instead of writing lengthy R code.

Functionalities of SmartEDA

The SmartEDA R package provides four main groups of functionality.


Journal of Open Source Software Article

An article describing the SmartEDA package for exploratory data analysis has been published in the Journal of Open Source Software (JOSS). Please cite the paper if you use SmartEDA in your work.


The package can be installed directly from CRAN with install.packages("SmartEDA").


To contribute, download the latest development version of SmartEDA from GitHub via devtools:

devtools::install_github("daya6489/SmartEDA", ref = "develop")



In this vignette, we will be using a simulated data set containing sales of child car seats at 400 different stores.

Data source: ISLR package.

Install the package "ISLR" to get the example data set.

    ## Load sample dataset from ISLR package
    Carseats <- ISLR::Carseats

Overview of the data

Understand the dimensions of the dataset, the variable names, the overall missing-value summary and the data type of each variable.

## Overview of the data
## Structure of the data
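As a rough illustration of what such an overview contains, the base R sketch below tabulates the dimensions, column types and per-variable missing counts. It is not the package's implementation. It uses the built-in `airquality` data set (an assumption, chosen so the snippet runs without ISLR and has some missing values), and the helper name `overview_sketch` is hypothetical. The packaged calls appear only as comments.

```r
# Hypothetical helper: a base R approximation of a data overview.
overview_sketch <- function(data) {
  list(
    n_rows    = nrow(data),
    n_cols    = ncol(data),
    n_numeric = sum(vapply(data, is.numeric, logical(1))),
    n_factor  = sum(vapply(data, is.factor, logical(1))),
    per_variable = data.frame(
      variable    = names(data),
      type        = vapply(data, function(x) class(x)[1], character(1)),
      n_missing   = vapply(data, function(x) sum(is.na(x)), integer(1)),
      pct_missing = round(100 * vapply(data, function(x) mean(is.na(x)), numeric(1)), 2),
      row.names   = NULL,
      stringsAsFactors = FALSE
    )
  )
}

# airquality ships with base R and contains missing values
ov <- overview_sketch(airquality)
ov$n_rows
ov$n_cols
ov$per_variable

# With SmartEDA and ISLR installed, the packaged equivalents would be:
# ExpData(data = Carseats, type = 1)   # overview of the data
# ExpData(data = Carseats, type = 2)   # structure of the data
```

The per-variable table makes it easy to spot columns with a high share of missing values before moving on to summaries.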

Summary of numerical variables

To summarise the numeric variables, use the following R code from this package.

## Summary statistics by – overall
## Summary statistics by – overall with correlation 
## Summary statistics by – category
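The three kinds of summary listed above can be sketched in base R as follows. This is only an illustration on the built-in `mtcars` data, not the package's implementation; the helper `num_summary` and the chosen variables are assumptions. The commented `ExpNumStat()` calls show how the packaged versions would typically be invoked.

```r
# Hypothetical helper: a handful of descriptive statistics for one numeric vector.
num_summary <- function(x) {
  c(n      = sum(!is.na(x)),
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE))
}

num_vars <- c("mpg", "hp", "wt")

# Overall summary: one row per variable
overall <- t(sapply(mtcars[num_vars], num_summary))

# Overall summary with correlation against a numeric target (here mpg)
cor_with_mpg <- sapply(mtcars[num_vars], cor, y = mtcars$mpg)
overall_cor  <- cbind(overall, cor_mpg = cor_with_mpg)

# Summary by category: mean of each variable within each level of gear
by_gear <- aggregate(mtcars[num_vars], by = list(gear = mtcars$gear), FUN = mean)

round(overall_cor, 2)
by_gear

# With SmartEDA and ISLR installed, the packaged equivalents would be:
# ExpNumStat(Carseats, by = "A",  gp = NULL)      # overall
# ExpNumStat(Carseats, by = "A",  gp = "Price")   # overall, with correlation
# ExpNumStat(Carseats, by = "GA", gp = "Urban")   # by category
```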

Graphical representation of all numeric features

## Generate Boxplot by category
ExpNumViz(mtcars,target="gear",type=2,nlim=25,fname = file.path(tempdir(),"Mtcars2"),Page = c(2,2))
## Generate Density plot
ExpNumViz(mtcars,target=NULL,type=3,nlim=25,fname = file.path(tempdir(),"Mtcars3"),Page = c(2,2))
## Generate Scatter plot
ExpNumViz(mtcars,target="carb",type=3,nlim=25,fname = file.path(tempdir(),"Mtcars4"),Page = c(2,2))

Summary of Categorical variables

## Frequency or custom tables for categorical variables
## Summary statistics of categorical variables
    ExpCatStat(Carseats,Target="Urban",result = "Stat",clim=10,nlim=5,Pclass="Yes")
## Information value and Odds value
    ExpCatStat(Carseats,Target="Urban",result = "IV",clim=10,nlim=5,Pclass="Yes")
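For readers unfamiliar with information value (IV), the sketch below computes it by hand for one categorical predictor and a binary target, using the common definition IV = sum over levels of (share of events - share of non-events) * log(share of events / share of non-events). It is a minimal base R illustration on `mtcars`, not the package's code; the helper `info_value` is hypothetical and does not handle levels with zero counts.

```r
# Hypothetical helper: information value of a categorical predictor x
# for a binary target y, where `event` is the level treated as the event.
info_value <- function(x, y, event) {
  tab <- table(x, y)
  event_col     <- as.character(event)
  non_event_col <- setdiff(colnames(tab), event_col)
  stopifnot(length(non_event_col) == 1)      # the target must be binary

  pct_event     <- as.vector(tab[, event_col]) / sum(tab[, event_col])
  pct_non_event <- as.vector(tab[, non_event_col]) / sum(tab[, non_event_col])
  woe <- log(pct_event / pct_non_event)      # weight of evidence per level

  data.frame(level = rownames(tab),
             pct_event, pct_non_event, woe,
             iv = (pct_event - pct_non_event) * woe)
}

# Predictor: number of cylinders; target: transmission type (am = 1 is the event)
iv_tab <- info_value(mtcars$cyl, mtcars$am, event = 1)
iv_tab
total_iv <- sum(iv_tab$iv)
total_iv
```

Each level contributes a non-negative amount, so a larger total indicates a predictor whose levels separate the two target classes more strongly.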

Graphical representation of all categorical variables

## column chart
    ExpCatViz(Carseats,target=NULL,fname=NULL,clim=10,col=NULL,margin=2,Page = c(2,1),sample=2)
## Stacked bar graph
    ExpCatViz(Carseats,target="Urban",fname=NULL,clim=10,col=NULL,margin=2,Page = c(2,1),sample=2)
## Variable importance graph using information values
    ExpCatStat(Carseats,Target="Urban",result = "Stat",clim=10,nlim=5,bins=10,Pclass="Yes",plot=TRUE,top=10)

Create HTML EDA report

Create an exploratory data analysis report in HTML format.


Quantile-quantile plot for numeric variables
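A quantile-quantile plot compares the sample quantiles of a variable with the quantiles of a reference distribution (usually the normal); points far from the reference line suggest skewness, heavy tails or outliers. The sketch below draws such plots with base R graphics on `mtcars`. The variable selection is an assumption, and the commented `ExpOutQQ()` call shows how the packaged version would typically be invoked.

```r
# Base R sketch: normal Q-Q plots for a few numeric columns of mtcars.
qq_vars <- c("mpg", "hp", "wt", "qsec")

old_par <- par(mfrow = c(2, 2))            # 2 x 2 grid of panels
qq_points <- lapply(qq_vars, function(v) {
  qq <- qqnorm(mtcars[[v]], main = paste("Normal Q-Q plot:", v))
  qqline(mtcars[[v]], col = "red")         # line through the 1st and 3rd quartiles
  qq                                       # keep theoretical (x) and sample (y) quantiles
})
par(old_par)                               # restore the previous layout
names(qq_points) <- qq_vars

# With SmartEDA and ISLR installed, the packaged equivalent would be:
# ExpOutQQ(Carseats, nlim = 10, fname = NULL, Page = c(2, 2), sample = 4)
```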


Parallel Co-ordinate plots

## Default ExpParcoord function
## With stratified rows and selected columns only
## Without stratification
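A parallel coordinate plot draws one line per observation across a set of variables that have been put on a common scale, which helps reveal groups that follow similar profiles. The following base R sketch illustrates the idea on `mtcars`; it is not the package's implementation, and the variable choice, the 0 to 1 rescaling and the helper `rescale01` are assumptions. The commented `ExpParcoord()` call shows a typical packaged invocation.

```r
# Base R sketch of a parallel coordinate plot.
pc_vars <- c("mpg", "disp", "hp", "wt", "qsec")

# Hypothetical helper: rescale a numeric vector to the range 0 to 1
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

scaled <- sapply(mtcars[pc_vars], rescale01)   # matrix: one row per car
group  <- factor(mtcars$cyl)                   # colour lines by number of cylinders

# matplot draws one line per column, so transpose to get one line per car
matplot(t(scaled), type = "l", lty = 1, col = as.integer(group),
        xaxt = "n", xlab = "", ylab = "Scaled value (0 to 1)",
        main = "Parallel coordinates: mtcars by cyl")
axis(1, at = seq_along(pc_vars), labels = pc_vars)
legend("topright", legend = levels(group),
       col = seq_along(levels(group)), lty = 1, title = "cyl")

# With SmartEDA and ISLR installed, a packaged equivalent would be:
# ExpParcoord(Carseats, Group = "ShelveLoc", Nvar = c("Price", "Income"),
#             Cvar = c("Urban", "US"))
```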

Univariate outlier analysis

## Boxplot method
  ExpOutliers(Carseats, varlist = c("Sales","CompPrice","Income"), method = "boxplot",  capping = c(0.1, 0.9))

## treating outlier value with mean imputation
  ExpOutliers(Carseats, varlist = c("Sales","CompPrice","Income"), method = "boxplot",  treatment = "mean", capping = c(0.1, 0.9))

## Standard deviation method
  ExpOutliers(Carseats, varlist = c("Sales","CompPrice","Income"), method = "3xStDev",  treatment = "mean", capping = c(0.1, 0.9))
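The two detection rules named above can be written out by hand. The sketch below applies the boxplot rule (values beyond 1.5 times the interquartile range from the quartiles) and the three-standard-deviation rule to `mtcars$hp`, then shows two simple treatments. It is a base R illustration under stated assumptions, not the package's code: replacing outliers with the mean of the non-outlying values and capping every value at the 10th and 90th percentiles are choices made for this example.

```r
x <- mtcars$hp

# Boxplot rule: flag values beyond 1.5 * IQR from the quartiles
q      <- quantile(x, c(0.25, 0.75), names = FALSE)
fences <- c(q[1] - 1.5 * diff(q), q[2] + 1.5 * diff(q))
box_flag <- x < fences[1] | x > fences[2]

# Standard deviation rule: flag values more than 3 SDs from the mean
sd_limits <- mean(x) + c(-3, 3) * sd(x)
sd_flag   <- x < sd_limits[1] | x > sd_limits[2]

# Treatment 1 (assumption): replace flagged values with the mean of the rest
x_mean <- replace(x, box_flag, mean(x[!box_flag]))

# Treatment 2 (assumption): cap all values at the 10th and 90th percentiles
caps     <- quantile(x, c(0.1, 0.9), names = FALSE)
x_capped <- pmin(pmax(x, caps[1]), caps[2])

x[box_flag]     # values flagged by the boxplot rule
sum(sd_flag)    # number of values flagged by the 3 SD rule
```

The two rules can disagree: the standard deviation itself is inflated by extreme values, so the 3 SD rule tends to flag fewer points than the boxplot rule on small samples.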

Exploratory analysis - Custom tables, summary statistics

Descriptive summary of all input variables for each level or combination of the grouping variables. Rows (cases) of the data can also be filtered while running the analysis.

    ExpCustomStat(Carseats,Cvar=c("US","ShelveLoc"),gpby=TRUE,filt="Urban=='Yes' & Population>150")
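The same filter-then-group pattern can be sketched in base R, which may help when checking the custom table by hand. The example below uses `mtcars` rather than Carseats (an assumption, so it runs without ISLR): it keeps the rows that satisfy a filter expression and then counts the observations in each combination of two grouping variables.

```r
# Keep only the rows that satisfy the filter condition
filtered <- subset(mtcars, am == 1 & hp > 100)

# Count observations for each observed combination of the grouping variables
counts <- aggregate(n ~ cyl + gear,
                    data = transform(filtered, n = 1L),
                    FUN  = sum)

# Share of the filtered rows that falls in each combination
counts$pct <- round(100 * counts$n / sum(counts$n), 2)
counts
```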



Please read the contribution guidelines before submitting a pull request (PR). Feel free to code and submit a new PR even if it is not perfect; we will help you make it a great one.


See article wiki page.


Chon Ho, Y. (2010). Exploratory data analysis in the context of data mining and resampling. International Journal of Psychological Research, 3(1), 9–22.

DiCerbo et al. (2015). In C. Loh, Y. Sheng, & D. Ifenthaler (Eds.), Serious Games Analytics. Advances in Game-Based Learning. Cham: Springer. doi:10.1007/978-3-319-05834-4

Hoaglin, D., Mosteller, F., & Tukey, J. (1983). Understanding robust and exploratory data analysis. Wiley Series in Probability and Mathematical Statistics, New York.

Jaggi, S. (2013). Descriptive statistics and exploratory data analysis. Indian Agricultural Statistics Research Institute

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2017). ISLR: Data for an Introduction to Statistical Learning with Applications in R.

Konopka et al. (2018). Exploratory data analysis of a clinical study group: Development of a procedure for exploring multidimensional data. PLoS ONE, 13(8).

Liu, Q. (2014, October). The Application of Exploratory Data Analysis in Auditing (PhD thesis). Rutgers, The State University of New Jersey, Newark, New Jersey.

Ma, X., Hummer, D., Golden, J. J., Fox, P. A., Hazen, R. M., Morrison, S. M., Downs, R.T., et al. (2017). Using Visual Exploratory Data Analysis to Facilitate Collaboration and Hypothesis Generation in Cross-Disciplinary Research. International Journal of Geo-Information, 6(368), 1–11.

Nair, A. (2018). RtutoR: Shiny Apps for Plotting and Exploratory Analysis.

Ryu, C. (2018). dlookr: Tools for Data Diagnosis, Exploration, Transformation.

Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley.

Ubrangala, D., Rama, K., Kondapalli, R. P., & Putatunda, S. (2018). SmartEDA: Summarize and Explore the Data. Retrieved from