deepdep package

Introduction

The deepdep package was created to acquire and visualize information on the dependencies of R packages in a smart and convenient way. Most of its functionality is contained in two functions: deepdep, which returns a data.frame describing the dependencies, and plot_dependencies, which visualizes that data.frame.

library(deepdep)

Use case

Suppose you are creating an R package and want to include a graph of its dependencies in your vignette, in the README.md file of your git repository, or in an article about your package. With deepdep you only need to type one line:

plot_dependencies("YourPackageName")

But before we describe in detail how this function works, let’s look at the other features of the package.

Features

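The deepdep package exports the following functions: get_available_packages, get_description, get_downloads, get_dependencies, deepdep and plot_dependencies.

These functions rely on each other and are listed from the lowest to the highest level. We describe what each of them does, and how, using examples.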

get_available_packages

This function lists, as the name indicates, available packages. The default behaviour is listing all CRAN packages.

t <- get_available_packages()
head(t, 100)
#>                   A3                aaSEA               ABACUS 
#>                 "A3"              "aaSEA"             "ABACUS" 
#>               abbyyR                  abc             abc.data 
#>             "abbyyR"                "abc"           "abc.data" 
#>              ABC.RAP               abcADM          ABCanalysis 
#>            "ABC.RAP"             "abcADM"        "ABCanalysis" 
#>             abcdeFBA             ABCoptim                ABCp2 
#>           "abcdeFBA"           "ABCoptim"              "ABCp2" 
#>                abcrf              abcrlda             abctools 
#>              "abcrf"            "abcrlda"           "abctools" 
#>                  abd                abdiv                  abe 
#>                "abd"              "abdiv"                "abe" 
#>                 abf2         ABHgenotypeR                abind 
#>               "abf2"       "ABHgenotypeR"              "abind" 
#>             abjutils                  abn          abnormality 
#>           "abjutils"                "abn"        "abnormality" 
#>          abodOutlier                 ABPS        AbsFilterGSEA 
#>        "abodOutlier"               "ABPS"      "AbsFilterGSEA" 
#>                AbSim            abstractr               abtest 
#>              "AbSim"          "abstractr"             "abtest" 
#>             abundant               Ac3net                  ACA 
#>           "abundant"             "Ac3net"                "ACA" 
#>                  acc        accelerometry         accelmissing 
#>                "acc"      "accelerometry"       "accelmissing" 
#>               accept   AcceptanceSampling               ACCLMA 
#>             "accept" "AcceptanceSampling"             "ACCLMA" 
#>              accrual              accrued               accSDA 
#>            "accrual"            "accrued"             "accSDA" 
#>                  ACD                 ACDm            ace2fastq 
#>                "ACD"               "ACDm"          "ace2fastq" 
#>             acebayes              acepack                 ACEt 
#>           "acebayes"            "acepack"               "ACEt" 
#>           acfMPeriod                 acid                acm4r 
#>         "acfMPeriod"               "acid"              "acm4r" 
#>             ACMEeqtl                acmeR                 ACNE 
#>           "ACMEeqtl"              "acmeR"               "ACNE" 
#>                 acnr              acopula     AcousticNDLCodeR 
#>               "acnr"            "acopula"   "AcousticNDLCodeR" 
#>                  acp                 aCRM            AcrossTic 
#>                "acp"               "aCRM"          "AcrossTic" 
#>                 acrt                  acs            ACSNMineR 
#>               "acrt"                "acs"          "ACSNMineR" 
#>                 acss            acss.data                ACSWR 
#>               "acss"          "acss.data"              "ACSWR" 
#>                ACTCD              ActFrag           Actigraphy 
#>              "ACTCD"            "ActFrag"         "Actigraphy" 
#>         ActiveDriver      ActiveDriverWGS       ActivePathways 
#>       "ActiveDriver"    "ActiveDriverWGS"     "ActivePathways" 
#>             activity       activityCounts             activPAL 
#>           "activity"     "activityCounts"           "activPAL" 
#>   activpalProcessing           actogrammr               actuar 
#> "activpalProcessing"         "actogrammr"             "actuar" 
#>           AcuityView                  ada               adabag 
#>         "AcuityView"                "ada"             "adabag" 
#>               adagio           adamethods        AdapEnetClass 
#>             "adagio"         "adamethods"      "AdapEnetClass" 
#>                adapr             AdapSamp           adaptalint 
#>              "adapr"           "AdapSamp"         "adaptalint" 
#>             AdaptFit           AdaptFitOS           AdaptGauss 
#>           "AdaptFit"         "AdaptFitOS"         "AdaptGauss" 
#>         adaptiveGPCA     AdaptiveSparsity          adaptivetau 
#>       "adaptiveGPCA"   "AdaptiveSparsity"        "adaptivetau" 
#>            adaptMCMC              adaptMT               ADAPTS 
#>          "adaptMCMC"            "adaptMT"             "ADAPTS" 
#>         adaptsmoFMRI            adaptTest          AdaSampling 
#>       "adaptsmoFMRI"          "adaptTest"        "AdaSampling" 
#>                 ADCT 
#>               "ADCT"

However, if you want to check whether a package is present in a slightly wider range, that is on CRAN or in the Bioconductor repositories, set the argument bioc = TRUE. In this case the function is simply a wrapper around BiocManager::available(), so you need to have the BiocManager package (available via CRAN) installed.

t <- get_available_packages(bioc = TRUE)
head(t, 100)
#>   [1] "A3"               "ABACUS"           "ABAData"         
#>   [4] "ABAEnrichment"    "ABC.RAP"          "ABCanalysis"     
#>   [7] "ABCoptim"         "ABCp2"            "ABHgenotypeR"    
#>  [10] "ABPS"             "ABSSeq"           "ABarray"         
#>  [13] "ACA"              "ACCLMA"           "ACD"             
#>  [16] "ACDm"             "ACE"              "ACEt"            
#>  [19] "ACME"             "ACMEeqtl"         "ACNE"            
#>  [22] "ACSNMineR"        "ACSWR"            "ACTCD"           
#>  [25] "ADAM"             "ADAMgui"          "ADAPTS"          
#>  [28] "ADCT"             "ADDT"             "ADGofTest"       
#>  [31] "ADMM"             "ADMMnet"          "ADMMsigma"       
#>  [34] "ADPF"             "ADPclust"         "ADaCGH2"         
#>  [37] "AEDForecasting"   "AER"              "AF"              
#>  [40] "AFM"              "AFheritability"   "AGD"             
#>  [43] "AGDEX"            "AGHmatrix"        "AGSDest"         
#>  [46] "AGread"           "AHCytoBands"      "AHEnsDbs"        
#>  [49] "AHM"              "AHMbook"          "AICcmodavg"      
#>  [52] "AID"              "AIG"              "AIM"             
#>  [55] "AIMS"             "ALA4R"            "ALDEx2"          
#>  [58] "ALDqr"            "ALEPlot"          "ALL"             
#>  [61] "ALLMLL"           "ALS"              "ALSCPC"          
#>  [64] "ALSM"             "ALTopt"           "ALassoSurvIC"    
#>  [67] "AMAP.Seq"         "AMARETTO"         "AMCP"            
#>  [70] "AMCTestmakeR"     "AMGET"            "AMIAS"           
#>  [73] "AMModels"         "AMOEBA"           "AMORE"           
#>  [76] "AMOUNTAIN"        "AMPLE"            "AMR"             
#>  [79] "ANF"              "ANN2"             "ANOM"            
#>  [82] "ANOVA.TFNs"       "ANOVAIREVA"       "ANOVAShiny"      
#>  [85] "ANOVAreplication" "APCanalysis"      "APFr"            
#>  [88] "APIS"             "APML0"            "APPEstimation"   
#>  [91] "APSIM"            "APSIMBatch"       "APfun"           
#>  [94] "APtools"          "AR"               "AR1seg"          
#>  [97] "ARCensReg"        "ARHT"             "ARIbrain"        
#> [100] "AROC"

Another possibility is to check which packages are installed locally. You do this by adding the local = TRUE argument.

t <- get_available_packages(local = TRUE)
head(t, 100)
#> [1] "deepdep"

The result of this function is cached (for more details, see the Caching section of this vignette).
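A typical use of such a vector is a membership check. The snippet below is a minimal sketch that uses only base R: it assumes that "installed" means whatever utils::installed.packages() reports, which may differ from how deepdep builds its local list, and is_listed is a hypothetical helper, not a deepdep function.

```r
# Base R only. Assumption: the installed set is what installed.packages()
# reports; this is not deepdep's own implementation.
installed <- rownames(utils::installed.packages())

# Hypothetical helper: flag which of the requested packages are in the vector.
is_listed <- function(packages, available) {
  stats::setNames(packages %in% available, packages)
}

status <- is_listed(c("utils", "surelyNotARealPackage123"), installed)
print(status)
```

The same helper would work with the vector returned by get_available_packages(), since it only relies on the package names being the values of a character vector.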

get_description

Once you know that a given package is available, you may want to obtain its DESCRIPTION, or at least its most essential parts, especially the dependencies. You can do this by calling:

get_description("DALEXtra")
#> DALEXtra: Extension for 'DALEX' Package
#> Maintainer: Szymon Maksymiuk <sz.maksymiuk@gmail.com> 
#> Description: 
#>  Provides wrapper of various machine learning models.
#> In applied machine learning, there
#> is a strong belief that we need to strike a balance
#> between interpretability and accuracy.
#> However, in field of the interpretable machine learning,
#> there are more and more new ideas for explaining black-box models,
#> that are implemented in 'R'.
#> 'DALEXtra' creates 'DALEX' Biecek (2018) <arXiv:1806.08915> explainer for many type of models
#> including those created using 'python' 'scikit-learn' and 'keras' libraries, 'java' 'h2o' library and
#> 'mljar' API. Important part of the package is Champion-Challenger analysis and innovative approach
#> to model performance across subsets of test data presented in Funnel Plot.
#> Third branch of 'DALEXtra' package is aspect importance analysis
#> that provides instance-level explanations for the groups of explanatory variables. 
#> Depends: R DALEX 
#> Imports: reticulate ggplot2 glmnet ggdendro gridExtra 
#> LinkingTo: 
#> Suggests: auditor ingredients gbm ggrepel h2o mljar mlr mlr3 randomForest rmarkdown rpart xgboost testthat 
#> Enhances: 
#> Scrap date: 2019-11-18 18:26:29

Again, you can pass bioc = TRUE if you want to look for the package in the Bioconductor repository. Notice that if the package is not found there, it will be searched for on CRAN. The reason for this behaviour is that packages hosted on Bioconductor are updated more often than on CRAN, and not every package is present in both repositories. The option local = TRUE, which restricts the search to installed packages, is also available. If a package is not available in a given source, the function returns NULL:

get_description("a4")
#> NULL
get_description("a4", bioc = TRUE)
#> a4: Automated Affymetrix Array Analysis Umbrella Package
#> Maintainer: Tobias Verbeke <tobias.verbeke@openanalytics.eu>, Willem Ligtenberg <willem.ligtenberg@openanalytics.eu> 
#> Description: 
#>  Automated Affymetrix Array Analysis Umbrella Package 
#> Depends: a4Base a4Preproc a4Classif a4Core a4Reporting 
#> Imports: 
#> LinkingTo: 
#> Suggests: MLP nlcv ALL Cairo 
#> Enhances: 
#> Scrap date:

The result of this function is also cached (for more details, see the Caching section of this vignette).
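The lookup order described above (Bioconductor first when bioc = TRUE, then CRAN, and NULL when nothing is found) can be sketched with toy data. Everything here is illustrative: toy_bioc, toy_cran and lookup_description are hypothetical names, and the real function queries the repositories rather than in-memory lists.

```r
# Toy stand-ins for repository indexes (illustrative data only).
toy_bioc <- list(a4 = "a4: Automated Affymetrix Array Analysis Umbrella Package")
toy_cran <- list(DALEXtra = "DALEXtra: Extension for 'DALEX' Package")

# Hypothetical helper mirroring the lookup order described in the text.
lookup_description <- function(package, bioc = FALSE) {
  # With bioc = TRUE, try the Bioconductor index first.
  if (bioc && !is.null(toy_bioc[[package]])) {
    return(toy_bioc[[package]])
  }
  # Otherwise (or as a fallback) use CRAN; a missing entry yields NULL.
  toy_cran[[package]]
}

cran_only     <- lookup_description("a4")                     # not on toy CRAN
with_bioc     <- lookup_description("a4", bioc = TRUE)        # found on toy Bioconductor
cran_fallback <- lookup_description("DALEXtra", bioc = TRUE)  # falls back to toy CRAN
```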

get_downloads

This function lets you find out how many times a specified package was downloaded. However, it works only with CRAN packages.

get_downloads("ggplot2")
#>   last_day last_week last_month last_quarter last_half grand_total
#> 1    42154    257239    1100176      2889969   5886983    29510797

The results of this function are not cached.

get_dependencies

After parsing the DESCRIPTION file, you can create a data.frame that describes the dependencies between a given package and others. You do this with the following function:

get_dependencies("ggplot2")
#>           name  version    type last_day last_week last_month last_quarter
#> 1       digest     <NA> Imports    46802    286381    1085360      2565509
#> 2       gtable >= 0.1.1 Imports    20477    115493     504677      1274863
#> 3     lazyeval     <NA> Imports    24058    147338     615762      1675621
#> 4         MASS     <NA> Imports     6428     38682     170078       429475
#> 5         mgcv     <NA> Imports     2664     22556      85557       237226
#> 6     reshape2     <NA> Imports    25812    158260     660910      1759029
#> 7        rlang >= 0.3.0 Imports    60842    368553    1719344      4215001
#> 8       scales >= 0.5.0 Imports    28073    169199     717096      1947306
#> 9       tibble     <NA> Imports    37344    225580     980343      2578959
#> 10 viridisLite     <NA> Imports    20337    114338     495203      1247046
#> 11       withr >= 2.0.0 Imports    21365    122020     524415      1357577
#>    last_half grand_total
#> 1    5459762    25373731
#> 2    2830038    15980087
#> 3    3599710    17276392
#> 4     795539     5944026
#> 5     647687     4784117
#> 6    3803374    19927559
#> 7    8319435    26138949
#> 8    4060027    20247243
#> 9    5301979    23225517
#> 10   2757092    11612034
#> 11   2969451    12497526
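To see where the name and version columns come from, it may help to look at how a single DESCRIPTION field can be split. The following is a minimal base R sketch under the assumption that a field is a comma-separated list with optional constraints in parentheses; parse_dependency_field is a hypothetical helper and not how deepdep necessarily does it.

```r
# Hypothetical helper, not part of deepdep: split one DESCRIPTION field
# (e.g. the value of Imports) into package names and version constraints.
parse_dependency_field <- function(field) {
  entries <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])
  entries <- entries[nzchar(entries)]
  # The package name is everything before an optional "(...)" constraint.
  name <- trimws(sub("\\(.*$", "", entries))
  # The constraint, when present, is the text inside the parentheses.
  has_version <- grepl("\\(.*\\)", entries)
  version <- ifelse(has_version,
                    trimws(sub("^.*\\((.*)\\).*$", "\\1", entries)),
                    NA_character_)
  data.frame(name = name, version = version, stringsAsFactors = FALSE)
}

imports <- "digest, gtable (>= 0.1.1), rlang (>= 0.3.0), tibble"
deps <- parse_dependency_field(imports)
print(deps)
```

Packages without a constraint get NA in the version column, matching the <NA> entries in the output above.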

As with the two previously described functions, get_available_packages and get_description, you can use bioc = TRUE or local = TRUE here and, again, if the package is not available, the result will be NULL. There are also further options to set.

The first one is the downloads parameter: should the number of downloads of each package be included? It uses get_downloads and works only with CRAN packages.

Another, more important parameter is dependency_type, which specifies how detailed the list of dependencies should be. The default value is c("Depends", "Imports"), but you can choose any combination of those and, additionally, "Suggests", "Enhances" and "LinkingTo".

get_dependencies("ggplot2", downloads = FALSE, dependency_type = c("Imports", "Suggests", "Enhances"))
#>             name        version     type
#> 1         digest           <NA>  Imports
#> 2         gtable       >= 0.1.1  Imports
#> 3       lazyeval           <NA>  Imports
#> 4           MASS           <NA>  Imports
#> 5           mgcv           <NA>  Imports
#> 6       reshape2           <NA>  Imports
#> 7          rlang       >= 0.3.0  Imports
#> 8         scales       >= 0.5.0  Imports
#> 9         tibble           <NA>  Imports
#> 10   viridisLite           <NA>  Imports
#> 11         withr       >= 2.0.0  Imports
#> 12          covr           <NA> Suggests
#> 13         dplyr           <NA> Suggests
#> 14 ggplot2movies           <NA> Suggests
#> 15        hexbin           <NA> Suggests
#> 16         Hmisc           <NA> Suggests
#> 17         knitr           <NA> Suggests
#> 18       lattice           <NA> Suggests
#> 19       mapproj           <NA> Suggests
#> 20          maps           <NA> Suggests
#> 21      maptools           <NA> Suggests
#> 22      multcomp           <NA> Suggests
#> 23       munsell           <NA> Suggests
#> 24          nlme           <NA> Suggests
#> 25       profvis           <NA> Suggests
#> 26      quantreg           <NA> Suggests
#> 27         rgeos           <NA> Suggests
#> 28     rmarkdown           <NA> Suggests
#> 29         rpart           <NA> Suggests
#> 30            sf       >= 0.7-3 Suggests
#> 31       svglite >=\n1.2.0.9001 Suggests
#> 32      testthat      >= 0.11.0 Suggests
#> 33        vdiffr       >= 0.3.0 Suggests
#> 34            sp           <NA> Enhances

The result of this function is not cached (at least not yet).
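Conceptually, dependency_type acts as a filter on the type column. The sketch below shows that idea on a small hand-written table taken from the ggplot2 output above; filter_by_type is a hypothetical helper, and the package itself may select the fields earlier, while parsing the DESCRIPTION.

```r
# A few rows copied from the ggplot2 example above (illustrative subset).
deps <- data.frame(
  name = c("gtable", "rlang", "covr", "knitr", "sp"),
  type = c("Imports", "Imports", "Suggests", "Suggests", "Enhances"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the rows whose type was requested.
filter_by_type <- function(deps, dependency_type = c("Depends", "Imports")) {
  deps[deps$type %in% dependency_type, , drop = FALSE]
}

default_deps <- filter_by_type(deps)
wider_deps   <- filter_by_type(deps, c("Imports", "Suggests", "Enhances"))
print(default_deps)
print(wider_deps)
```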

deepdep

This is the main function of the package. It is simply a wrapper around get_dependencies that lets you obtain not only the dependencies of a package, but also the dependencies of those dependencies, iteratively. (Now you know why we called it deepdep.)

The parameters are the same as in get_dependencies, but you can additionally specify the depth parameter, which sets how many iterations the function should perform. If depth equals 1, the result is the same as calling get_dependencies.

deepdep("ggplot2", depth = 2)
#>      origin         name   version    type
#> 1   ggplot2       digest      <NA> Imports
#> 2   ggplot2       gtable  >= 0.1.1 Imports
#> 3   ggplot2     lazyeval      <NA> Imports
#> 4   ggplot2         MASS      <NA> Imports
#> 5   ggplot2         mgcv      <NA> Imports
#> 6   ggplot2     reshape2      <NA> Imports
#> 7   ggplot2        rlang  >= 0.3.0 Imports
#> 8   ggplot2       scales  >= 0.5.0 Imports
#> 9   ggplot2       tibble      <NA> Imports
#> 10  ggplot2  viridisLite      <NA> Imports
#> 11  ggplot2        withr  >= 2.0.0 Imports
#> 12     mgcv         nlme >= 3.1-64 Depends
#> 13     mgcv       Matrix      <NA> Imports
#> 14 reshape2         plyr  >= 1.8.1 Imports
#> 15 reshape2         Rcpp      <NA> Imports
#> 16 reshape2      stringr      <NA> Imports
#> 17   scales       farver  >= 2.0.0 Imports
#> 18   scales     labeling      <NA> Imports
#> 19   scales      munsell    >= 0.5 Imports
#> 20   scales           R6      <NA> Imports
#> 21   scales RColorBrewer      <NA> Imports
#> 22   scales  viridisLite      <NA> Imports
#> 23   scales    lifecycle      <NA> Imports
#> 24   tibble          cli      <NA> Imports
#> 25   tibble       crayon  >= 1.3.4 Imports
#> 26   tibble        fansi  >= 0.4.0 Imports
#> 27   tibble       pillar >=\n1.3.1 Imports
#> 28   tibble    pkgconfig      <NA> Imports
#> 29   tibble        rlang  >= 0.3.0 Imports
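The iterative expansion can be sketched in base R with a toy dependency map standing in for repeated get_dependencies calls. This is only an illustration of the idea: toy_deps and expand_dependencies are hypothetical, the map is hand-written rather than fetched, and the choice to expand each package only once is an assumption.

```r
# Toy dependency map (illustrative data, not fetched from CRAN).
toy_deps <- list(
  ggplot2 = c("mgcv", "scales", "tibble"),
  mgcv    = c("nlme", "Matrix"),
  scales  = c("R6", "munsell"),
  tibble  = c("cli", "rlang")
)

# Hypothetical helper: collect (origin, name) pairs level by level.
expand_dependencies <- function(package, depth, dep_map) {
  result <- data.frame(origin = character(0), name = character(0),
                       stringsAsFactors = FALSE)
  current <- package
  visited <- character(0)
  for (level in seq_len(depth)) {
    next_level <- character(0)
    for (pkg in current) {
      children <- dep_map[[pkg]]  # NULL when the package has no entry
      if (length(children) > 0) {
        result <- rbind(result,
                        data.frame(origin = pkg, name = children,
                                   stringsAsFactors = FALSE))
        next_level <- c(next_level, children)
      }
    }
    # Assumption: every package is expanded at most once.
    visited <- c(visited, current)
    current <- setdiff(unique(next_level), visited)
  }
  result
}

depth_one <- expand_dependencies("ggplot2", depth = 1, dep_map = toy_deps)
depth_two <- expand_dependencies("ggplot2", depth = 2, dep_map = toy_deps)
print(depth_two)
```

With depth = 1 only the direct dependencies are returned; each further level adds the dependencies of the packages found in the previous one, with the origin column recording where each row came from.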

plot_dependencies

As the famous saying goes,

A picture is worth more than a thousand words.

That’s why we have the plot_dependencies function. It makes it easy to visualize the dependencies of a specified package.

The function is generic and currently supports two types of input: you can pass a deepdep object, that is the result of calling the deepdep function, or just the name of a package. With the latter option you can also pass arguments for get_dependencies as additional parameters.

dd <- deepdep("tibble", 2)
plot_dependencies(dd)

plot_dependencies("DT", depth = 2, dependency_type = c("Imports", "Depends", "Suggests"))

In each of the plots you can see one package name in the centre and two circles of packages gathered around it. These are the dependencies of the first and second level.

The default plot type is circular, as you can see in the examples above. However, you can set the type parameter to "tree".

plot_dependencies(dd, type = "tree")

Not all dependencies are plotted. To increase readability, dependencies between packages on the same level are hidden, but you can change this behaviour by setting same_level = TRUE:

plot_dependencies(dd, same_level = TRUE)
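To make "same level" concrete: if the level of a package is taken to be its distance from the central package, a same-level dependency is an edge whose two ends have equal levels. The base R sketch below works on a hand-written subset of the ggplot2 edges shown earlier; package_levels is a hypothetical helper, and the definition of level is an assumption about how the plot is laid out.

```r
# Edges copied from the deepdep("ggplot2", depth = 2) example (subset).
edges <- data.frame(
  origin = c("ggplot2", "ggplot2", "ggplot2", "ggplot2",
             "scales", "scales", "tibble", "tibble"),
  name   = c("rlang", "scales", "tibble", "viridisLite",
             "R6", "viridisLite", "cli", "rlang"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: level = number of steps from the root package.
package_levels <- function(edges, root) {
  lvl <- stats::setNames(0, root)
  frontier <- root
  depth <- 0
  while (length(frontier) > 0) {
    depth <- depth + 1
    children <- unique(edges$name[edges$origin %in% frontier])
    unseen <- setdiff(children, names(lvl))
    if (length(unseen) > 0) lvl[unseen] <- depth
    frontier <- unseen
  }
  lvl
}

lvl <- package_levels(edges, "ggplot2")
# An edge is "same level" when both of its ends sit on the same circle.
same_level <- unname(lvl[edges$origin] == lvl[edges$name])
same_level_edges <- edges[same_level, ]
print(same_level_edges)
```

In this subset the flagged edges are scales to viridisLite and tibble to rlang, because viridisLite and rlang are also direct dependencies of ggplot2.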

You can also make use of the numbers of downloads you obtained. There is an option to add labels only to a certain percentage of the most downloaded packages among those that are about to be plotted. This is meant to increase the readability of the plot.

plot_dependencies("tidyverse", type = "circular", label_percentage = 0.2, downloads = TRUE, depth = 3)
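The idea behind label_percentage can be sketched as picking the top share of packages by download count. The numbers below are the grand_total values from the get_dependencies("ggplot2") output above; which download column the package actually uses and how it rounds the count are assumptions, and n_labeled and labeled are illustrative names.

```r
# grand_total values copied from the ggplot2 example (ten of the packages).
downloads <- c(
  digest = 25373731, gtable = 15980087, lazyeval = 17276392,
  MASS = 5944026, reshape2 = 19927559, rlang = 26138949,
  scales = 20247243, tibble = 23225517, viridisLite = 11612034,
  withr = 12497526
)

label_percentage <- 0.2

# Assumption: the share is rounded to a whole number of packages.
n_labeled <- round(label_percentage * length(downloads))

# Names of the most downloaded packages, which would receive labels.
labeled <- names(sort(downloads, decreasing = TRUE))[seq_len(n_labeled)]
print(labeled)
```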

Finally, the returned object is a ggplot object, so you can easily modify it with the syntax known from the ggplot2 package. We also use the ggraph extension for plotting graphs.

plot_dependencies(dd) +
  ggplot2::scale_fill_manual(values = c("#462CF8", "#F23A90", "#AF1023")) +
  ggraph::scale_edge_color_manual(values = "black") 
#> Scale for 'fill' is already present. Adding another scale for 'fill', which
#> will replace the existing scale.

Caching

As we have noted repeatedly in the text, we use caching to make everything a little bit faster. Functions that operate on the lowest level scrape information from the repositories and then store it in temporary files on the local machine. As a consequence, calling a function a second time with the same set of parameters should be faster.

The cached state is also refreshed every 20 minutes to make sure you don’t miss any major update.

To make sure that you have the most recent data, call get_available_packages and get_description with the parameter reset_cache = TRUE.

get_available_packages(reset_cache = TRUE)
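The caching scheme described in this section (temporary file, 20-minute lifetime, reset_cache override) can be sketched in base R as follows. This is a minimal illustration, not the package's code: cache_file, slow_fetch and cached_fetch are hypothetical names, and slow_fetch merely stands in for a real request to a repository.

```r
# Hypothetical cache location inside the session's temporary directory.
cache_file <- file.path(tempdir(), "toy_cache_example.rds")
fetch_count <- 0

# Stand-in for an expensive request; counts how often it is really called.
slow_fetch <- function() {
  fetch_count <<- fetch_count + 1
  c("A3", "abc", "abind")
}

cached_fetch <- function(reset_cache = FALSE, max_age_mins = 20) {
  # The cache is fresh when the file exists and is younger than max_age_mins.
  fresh <- file.exists(cache_file) &&
    as.numeric(difftime(Sys.time(), file.mtime(cache_file),
                        units = "mins")) < max_age_mins
  if (fresh && !reset_cache) {
    return(readRDS(cache_file))
  }
  # Otherwise fetch again and overwrite the cached copy.
  value <- slow_fetch()
  saveRDS(value, cache_file)
  value
}

unlink(cache_file)                          # start from an empty cache
first  <- cached_fetch()                    # fetches and stores
second <- cached_fetch()                    # served from the cache
third  <- cached_fetch(reset_cache = TRUE)  # forces a new fetch
```

After these three calls slow_fetch has run twice: once for the initial fetch and once for the forced reset.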