After writing about 20 R packages, I found I had accumulated several utility functions that I used across different packages, so I decided to extract them into a separate package. Previously I had been using the evil triple-colon ::: to access these internal utility functions. Now with xfun, these functions have been exported and, more importantly, documented. It should be better to use them under the sun instead of in the dark.
This page shows examples of a subset of functions in this package. For a full list of functions, see the help page help(package = 'xfun'). The source package is available on GitHub: https://github.com/yihui/xfun.
I have been bitten many times by partial matching in lists, e.g., when I want x$a but the element a does not exist in the list x, it returns the value x$abc if abc exists in x. This is very annoying to me, which is why I created strict lists. A strict list is a list for which the partial matching of the $ operator is disabled. The functions xfun::strict_list() and xfun::as_strict_list() are the equivalents to base::list() and base::as.list() respectively, and they always return a strict list, e.g.,
## $aaa
## [1] "I am aaa"
##
## $b
## [1] 1 2 3 4 5
## [1] "I am aaa"
## [1] 1 2 3 4 5
## [1] "I am aaa"
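The printed results above are consistent with a short session like the following. This is only a sketch, assuming xfun is installed; the element names are taken from the output, and the is.null() checks are added here for illustration.

```r
library(xfun)

# a strict list: `$` only matches element names exactly
(z = strict_list(aaa = 'I am aaa', b = 1:5))
is.null(z$a)  # no element is named exactly `a`
z$aaa
z$b

z2 = unclass(z)  # back to a normal list
z2$a             # partial matching picks up the `aaa` element

z3 = as_strict_list(z2)  # a strict list again
is.null(z3$a)
```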
Similarly, the default partial matching in attr() can be annoying, too. The function xfun::attr() is simply a shorthand for attr(..., exact = TRUE).
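A small illustration of the difference, assuming xfun is installed. The attribute name abc is made up for this example.

```r
# an object with a single (made-up) attribute named `abc`
x = structure(1:3, abc = 'I am abc')

attr(x, 'a')                # base R falls back to partial matching
attr(x, 'a', exact = TRUE)  # exact matching finds nothing
xfun::attr(x, 'a')          # same as the previous line
xfun::attr(x, 'abc')        # the full name still works
```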
I want it, or I do not want. There is no “I probably want”.
When R prints a character vector, your eyes may be distracted by the indices like [1], double quotes, and escape sequences. To see a character vector in its “raw” form, you can use cat(..., sep = '\n'). The function raw_string() marks a character vector as “raw”, and the corresponding printing function will call cat(sep = '\n') to print the character vector to the console.
A
B
C
D
E
F
[1] "a \"b\"" "hello\tworld!"
a "b"
hello	world!
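A sketch of calls that should print results like those above, assuming xfun is installed:

```r
library(xfun)

raw_string(head(LETTERS))  # one element per line, no indices or quotes

x = c('a "b"', 'hello\tworld!')
print(x)       # the default method shows indices, quotes, and escapes
raw_string(x)  # printed through cat(sep = '\n') instead
```

The marked object is still an ordinary character vector underneath, so as.character() gives back the original strings.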
I have used paste(readLines('foo'), collapse = '\n') many times before I decided to write a simple wrapper function xfun::file_string(). This function also makes use of raw_string(), so you can see the content of a file in the console as a side effect, e.g.,
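A minimal sketch comparing the two approaches, assuming xfun is installed. It uses a temporary file (a made-up path) holding the same two lines as the output that follows.

```r
library(xfun)

# a throwaway file for illustration
f = tempfile(fileext = '.txt')
writeLines(c('YEAR: 2018', 'COPYRIGHT HOLDER: Yihui Xie'), f)

# the manual idiom
s1 = paste(readLines(f), collapse = '\n')

# the wrapper: printing it shows the file content as-is
s2 = file_string(f)
s2

# essentially a single character string
as.character(s2)
```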
YEAR: 2018
COPYRIGHT HOLDER: Yihui Xie
[1] "YEAR: 2018\nCOPYRIGHT HOLDER: Yihui Xie"
I can never remember how to properly use sed to search and replace strings in multiple files. My favorite IDE, RStudio, has not provided this feature yet (you can only search and replace in the currently opened file). Therefore I did a quick and dirty implementation in R, including functions gsub_files(), gsub_dir(), and gsub_ext(), to search and replace strings in multiple files under a directory. Note that the files are assumed to be encoded in UTF-8. If you do not use UTF-8, we cannot be friends. Seriously.
All functions are based on gsub_file(), which performs searching and replacing in a single file.

The function gsub_dir() is very flexible: you can limit the list of files by MIME types or by extensions. For example, if you want to do substitution in text files, you may use gsub_dir(..., mimetype = '^text/').
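A minimal sketch of gsub_file() and gsub_dir(), assuming xfun is installed. Everything happens in a temporary directory with made-up file names, so no real files are touched; the ext argument is used here instead of mimetype to keep the example free of extra dependencies.

```r
library(xfun)

# a throwaway directory with two small files
d = tempfile('gsub-demo-')
dir.create(d)
f1 = file.path(d, 'a.txt')
f2 = file.path(d, 'b.md')
writeLines(c('hello', 'world'), f1)
writeLines('hello world', f2)

# a single file: extra arguments are passed on to gsub()
gsub_file(f1, 'world', 'woRld', fixed = TRUE)
readLines(f1)

# a whole directory, restricted to files with the md extension
gsub_dir('hello', 'HELLO', dir = d, ext = 'md', fixed = TRUE)
readLines(f2)
readLines(f1)  # not an md file, so left alone by gsub_dir()
```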
WARNING: Before using these functions, make sure that you have backed up your files or put them under version control. The files will be modified in place. If you do not back up or use version control, there is no chance to regret.
Functions file_ext() and sans_ext() are based on functions in tools. The function with_ext() adds or replaces extensions of filenames, and it is vectorized.
## [1] "doc" "tex" "Rmd"
## [1] "abc" "def123" "path/to/foo"
## [1] "abc.txt" "def123.txt" "path/to/foo.txt"
## [1] "abc.ppt" "def123.sty" "path/to/foo.Rnw"
## [1] "abc.html" "def123.html" "path/to/foo.html"
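The five results above correspond to calls like these. This is a sketch assuming xfun is installed, with the input paths inferred from the printed values.

```r
library(xfun)

p = c('abc.doc', 'def123.tex', 'path/to/foo.Rmd')
file_ext(p)                             # extensions only
sans_ext(p)                             # paths without extensions
with_ext(p, '.txt')                     # one extension for all files
with_ext(p, c('.ppt', '.sty', '.Rnw'))  # one extension per file
with_ext(p, 'html')                     # the leading dot is optional
```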
The series of functions is_linux(), is_macos(), is_unix(), and is_windows() test the types of the OS, using the information from .Platform and Sys.info(), e.g.,
## [1] TRUE
## [1] TRUE
## [1] FALSE
## [1] FALSE
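A sketch of how these tests might be collected in one place, assuming xfun is installed. The results depend on the machine; the output above looks like it came from a Mac.

```r
library(xfun)

# one logical value per OS test
os = c(
  macos = is_macos(), unix = is_unix(),
  linux = is_linux(), windows = is_windows()
)
os
```

Note that macOS and Linux both count as Unix, so is_unix() can be TRUE together with one of them.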
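Oftentimes I see users attach a series of packages at the beginning of their scripts by repeating library() multiple times. This could be easily vectorized, and the function xfun::pkg_attach() does this job. For example, calling library(testit), library(parallel), library(tinytex), and library(mime) one after another is equivalent to xfun::pkg_attach(c('testit', 'parallel', 'tinytex', 'mime')).

I also see scripts that contain code to install a package if it is not available, e.g., if (!requireNamespace('tinytex')) install.packages('tinytex') followed by library(tinytex). This could be done via xfun::pkg_attach2('tinytex').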
The function pkg_attach2() is a shorthand for pkg_attach(..., install = TRUE), which means that if a package is not available, it will be installed. This function can also deal with multiple packages.
loadable() tests if a package is loadable.
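A runnable sketch of these helpers, assuming xfun is installed. It only touches packages that ship with R so nothing gets installed, and the last package name is made up and assumed not to exist.

```r
library(xfun)

# packages shipped with R
pkgs = c('tools', 'parallel')
sapply(pkgs, loadable)

# attach them in one call instead of one library() call per package
pkg_attach(pkgs)
paste0('package:', pkgs) %in% search()

# a made-up package name
loadable('aPackageThatDoesNotExist12345')
```

pkg_attach2() is left out on purpose because it may install packages.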
Functions read_utf8() and write_utf8() can be used to read/write files in UTF-8. They are simple wrappers of readLines() and writeLines().
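A minimal round trip through a temporary file, assuming xfun is installed. The strings are arbitrary examples with non-ASCII characters.

```r
library(xfun)

f = tempfile(fileext = '.txt')
x = c('caf\u00e9', 'na\u00efve')  # non-ASCII text written with escapes

write_utf8(x, f)  # always written as UTF-8, whatever the locale
y = read_utf8(f)  # read back as UTF-8
identical(y, x)
```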
The function numbers_to_words() (or n2w() for short) converts numbers to English words.
## [1] "Zero"
##  [1] "zero" "eleven"
##  [3] "twenty-two" "thirty-three"
##  [5] "forty-four" "fifty-five"
##  [7] "sixty-six" "seventy-seven"
##  [9] "eighty-eight" "ninety-nine"
## [11] "one hundred and ten" "one hundred and twenty-one"
## [1] "one million"
## [1] "one hundred billion, twelve million, three hundred forty-five thousand, six hundred seventy-eight"
## [1] "minus nine hundred eighty-seven million, six hundred fifty-four thousand, three hundred twenty-one"
## [1] "nine hundred ninety-nine trillion, nine hundred ninety-nine billion, nine hundred ninety-nine million, nine hundred ninety-nine thousand, nine hundred ninety-nine"
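Calls along these lines should give results like those above. This is a sketch assuming xfun is installed, and the inputs are inferred from the printed words.

```r
library(xfun)

n2w(0, cap = TRUE)                # capitalize the first letter
n2w(seq(0, 121, 11), and = TRUE)  # insert "and" after "hundred"
n2w(1e6)
n2w(1e11 + 12345678)
n2w(-987654321)
n2w(1e15 - 1)
```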
The function cache_rds() provides a simple caching mechanism: the first time an expression is passed to it, it saves the result to an RDS file; the next time it will read the RDS file and return the value instead of evaluating the expression again. If you want to invalidate the cache, you can use the argument rerun = TRUE.
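A sketch of this behavior outside knitr, assuming xfun is installed. The cache directory, the file name demo.rds, the wrapper slow(), and the counter n are all made up for illustration; the counter only serves to show when the expression is really evaluated.

```r
library(xfun)

# a throwaway cache directory (note the trailing slash)
d = paste0(tempfile('cache-demo-'), '/')
n = 0  # how many times the expression has been evaluated

slow = function(rerun = FALSE) {
  cache_rds({
    n <<- n + 1
    Sys.sleep(0.5)  # pretend the computing is slow
    1:10
  }, rerun = rerun, dir = d, file = 'demo.rds')
}

r1 = slow()              # evaluated, and the result is saved
r2 = slow()              # read back from the RDS file
r3 = slow(rerun = TRUE)  # cache invalidated, evaluated again
n
```

The second call returns almost immediately, and n only goes up on the first and third calls.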
When the function is used in a code chunk in a knitr document, the RDS cache file is saved to the path determined by the chunk label (the filename) and the chunk option cache.path (usually the cache directory), so you do not have to provide the file and dir arguments of cache_rds().
This caching mechanism is much simpler than knitr’s caching. Cache invalidation is often tricky (see this post), so this function may be helpful if you want more transparency and control over when to invalidate the cache (for cache_rds(), the cache is invalidated only when the cache file is deleted, which can be achieved via the argument rerun = TRUE).
Running R CMD check on the reverse dependencies of knitr and rmarkdown is my least favorite thing in developing R packages, because the numbers of their reverse dependencies are huge. The function rev_check() reflects some of my past experience in this process. I think I have automated it as much as possible, and made it as easy as possible to discover possible new problems introduced by the current version of the package (compared to the CRAN version). Finally I can just sit back and let it run.
rstudio_type() inputs characters in the RStudio source editor as if they were typed by a human. I came up with the idea when preparing my talk for rstudio::conf 2018 (see this post for more details).
Since I have never been fully satisfied by the output of sessionInfo(), I tweaked it to make it more useful in my use cases. For example, it is rarely useful to print out the names of base R packages, or information about the matrix products / BLAS / LAPACK. Oftentimes I want additional information in the session information, such as the Pandoc version when rmarkdown is used. The function session_info() tweaks the output of sessionInfo(), and makes it possible for other packages to append information in the output of session_info().
You can choose to print out the versions of only the packages you specify, e.g.,
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Catalina 10.15.3
##
## Locale: C / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8
##
## Package version:
##   knitr_1.28.2 rmarkdown_2.1.1 tinytex_0.21.2 xfun_0.13
##
## Pandoc version: 184.108.40.206
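A minimal sketch of restricting the report to chosen packages, assuming xfun is installed. Only xfun itself is listed so the call does not depend on other packages being present; the output above presumably came from a longer package vector.

```r
# report only the listed package, without its dependencies
s = xfun::session_info('xfun', dependencies = FALSE)
s
```

Versions and platform details will of course differ from machine to machine.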