# Wrapping APIs

A common use for httr2 is wrapping up a useful API and exposing it in an R package where each API endpoint (i.e. a URL with parameters) becomes an R function with documented arguments. This vignette will show you how, starting with a very simple API that doesn’t need authentication, then slowly working up in complexity. Along the way, you’ll learn how to:

• Expose important details from HTTP errors in R errors.

• Handle various types of authentication.

• Consistently throttle the rate of requests or dynamically respond to rate limiting headers sent by the server.

I assume you’re familiar with the basics of building a package. If not, you might want to read the “The Whole Game” chapter of R packages first.

library(httr2)

## Faker API

We’ll start with a very simple API, faker API, which provides a collection of techniques for generating fake data. Before we start writing the sort of functions that you might put in a package, we’ll perform a request just to see how the basics work:

# We start by creating a request that uses the base API url
req <- request("https://fakerapi.it/api/v1")
resp <- req %>%
  # Then we add on the images path
  req_url_path_append("images") %>%
  # Add query parameters _width and _quantity; the names need backticks
  # because they start with an underscore
  req_url_query(`_width` = 380, `_quantity` = 1) %>%
  req_perform()

# The result comes back as JSON
resp %>% resp_body_json() %>% str()
#> List of 4
#>  $ status: chr "OK"
#>  $ code  : int 200
#>  $ total : int 1
#>  $ data  :List of 1
#>   ..$ :List of 3
#>   .. ..$ title      : chr "Corporis omnis est sint et."
#>   .. ..$ description: chr "Laudantium quasi quia enim. Esse aut dolor quam tenetur. Nihil voluptatibus eius sed non autem autem sint. Haru"| __truncated__
#>   .. ..$ url        : chr "http://placeimg.com/380/480/any"

### Errors

It’s always worth a little early experimentation to see if we get any useful information from errors. The httr2 defaults get in your way here, because if you retrieve an unsuccessful HTTP response, you automatically get an error that prevents you from further inspecting the body:

req %>%
  req_url_path_append("invalid") %>%
  req_perform()
#> Error: HTTP 404 Not Found.

However, you can access the last response (successful or not) with last_response():

resp <- last_response()
resp %>% resp_body_json()
#> $status
#> [1] "Not found"
#>
#> $code
#> [1] 404
#>
#> $total
#> [1] 0

It doesn’t look like there’s anything useful there. Sometimes useful info is returned in the headers, so let’s check:

resp %>% resp_headers()
#> <httr2_headers>
#> Server: nginx
#> Content-Type: application/json
#> Transfer-Encoding: chunked
#> Connection: keep-alive
#> Vary: Accept-Encoding
#> X-Powered-By: PHP/7.3.16
#> Cache-Control: no-cache, private
#> Date: Mon, 27 Sep 2021 20:40:35 GMT
#> Access-Control-Allow-Origin: *
#> Access-Control-Allow-Methods: GET
#> Access-Control-Allow-Credentials: true
#> Access-Control-Max-Age: 86400
#> Access-Control-Allow-Headers: Content-Type, Authorization, X-Requested-With
#> Content-Encoding: gzip

It doesn’t look like we’re getting any more useful information, so we can leave the req_error() default as is. We’ll have another go later with an API that does provide more details.

### User agent

If you’re wrapping this code into a package, it’s considered polite to set a user agent, so that, if your package accidentally does something horribly wrong, the developers of the website can figure out who to reach out to. You can do this with the req_user_agent() function:

req %>%
  req_user_agent("my_package_name (http://my.package.web.site)") %>%
  req_dry_run()
#> GET /api/v1 HTTP/1.1
#> Host: fakerapi.it
#> User-Agent: my_package_name (http://my.package.web.site)
#> Accept: */*
#> Accept-Encoding: deflate, gzip

### Core request function

Once you’ve made a few successful requests, it’s worth seeing if you can figure out the general pattern so you can wrap it up into a function that will become the core of your package. For faker, I spent a little time with the documentation noting some commonalities:

• Every URL is of the form https://fakerapi.it/api/v1/{resource}, and data is passed to the resource with query parameters. All parameters start with _.

• Every resource has three common query parameters: _locale, _quantity, and _seed.

• All endpoints return JSON data.

This led me to construct the following function:

faker <- function(resource, ..., quantity = 1, locale = "en_US", seed = NULL) {
  params <- list(
    ...,
    quantity = quantity,
    locale = locale,
    seed = seed
  )
  names(params) <- paste0("_", names(params))

  request("https://fakerapi.it/api/v1") %>%
    req_url_path_append(resource) %>%
    req_url_query(!!!params) %>%
    req_user_agent("my_package_name (http://my.package.web.site)") %>%
    req_perform() %>%
    resp_body_json()
}

str(faker("images", width = 300))
#> List of 4
#>  $ status: chr "OK"
#>  $ code  : int 200
#>  $ total : int 1
#>  $ data  :List of 1
#>   ..$ :List of 3
#>   .. ..$ title      : chr "Repellat illo officiis nemo."
#>   .. ..$ description: chr "Quia fugit est sed velit sint sit minima. Iusto saepe sint hic recusandae ducimus expedita. Voluptatem quis ull"| __truncated__
#>   .. ..$ url        : chr "http://placeimg.com/300/480/any"

I’ve made a few important choices here:

• I’ve decided to supply default values for the quantity and locale parameters. This makes my function easier to demo in this vignette.

• I’ve used a default of NULL for the seed argument. req_url_query() will automatically drop NULL arguments so this means that no default value is sent to the API, but when you read the function definition you can still see that seed is accepted.

• I automatically prefix all query parameters with _ because argument names starting with _ are hard to type in R.

• My function generates the request, performs it, and extracts the body of the response. This works well for simple cases, but for more complex APIs you might want to return a request object that can be modified before being performed.

I also used one cool trick: req_url_query() uses dynamic dots, so we can use !!! to convert (e.g.) req_url_query(req, !!!list(`_quantity` = 1, `_locale` = "en_US")) into req_url_query(req, `_quantity` = 1, `_locale` = "en_US").

### Wrapping individual endpoints

faker() is quite general: it’s a good tool for the package developer because you can read the faker documentation and translate it to a function call. But it’s not very friendly for the package user who might not know anything about web APIs. So typically the next step in the process is to wrap up some individual endpoints with their own functions.

For example, let’s take the persons endpoint which has three additional parameters: gender (male or female), birthday_start, and birthday_end. A simple wrapper would start something like this:

faker_person <- function(gender = NULL, birthday_start = NULL, birthday_end = NULL, quantity = 1, locale = "en_US", seed = NULL) {
  faker(
    "persons",
    gender = gender,
    birthday_start = birthday_start,
    birthday_end = birthday_end,
    quantity = quantity,
    locale = locale,
    seed = seed
  )
}
str(faker_person("male"))
#> List of 4
#>  $ status: chr "OK"
#>  $ code  : int 200
#>  $ total : int 1
#>  $ data  :List of 1
#>   ..$ :List of 9
#>   .. ..$ firstname: chr "Muhammad"
#>   .. ..$ lastname : chr "Kilback"
#>   .. ..$ email    : chr "jacobson.domenica@yahoo.com"
#>   .. ..$ phone    : chr "+8824082411860"
#>   .. ..$ birthday : chr "1936-09-27"
#>   .. ..$ gender   : chr "male"
#>   .. ..$ address  :List of 9
#>   .. .. ..$ street        : chr "224 Wolff Point"
#>   .. .. ..$ streetName    : chr "Arnoldo Grove"
#>   .. .. ..$ buildingNumber: chr "2008"
#>   .. .. ..$ city          : chr "Port Pattie"
#>   .. .. ..$ zipcode       : chr "30524-8062"
#>   .. .. ..$ country       : chr "Guam"
#>   .. .. ..$ county_code   : chr "AW"
#>   .. .. ..$ latitude      : num 35.8
#>   .. .. ..$ longitude     : num -22.3
#>   .. ..$ website  : chr "http://bruen.org"
#>   .. ..$ image    : chr "http://placeimg.com/640/480/people"
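
It can help to confirm which URL a call such as faker_person("male") would produce without contacting the server. The sketch below rebuilds the request by hand and stops before performing it. faker_request() is a hypothetical helper (it is not part of httr2 or of the code above), and reading the URL from req$url assumes the request object keeps its URL in that field:

```r
library(httr2)

# Hypothetical helper: same URL-building logic as faker(), but it returns
# the request instead of performing it, so nothing is sent over the network.
faker_request <- function(resource, ..., quantity = 1, locale = "en_US", seed = NULL) {
  params <- list(..., quantity = quantity, locale = locale, seed = seed)
  names(params) <- paste0("_", names(params))

  request("https://fakerapi.it/api/v1") %>%
    req_url_path_append(resource) %>%
    req_url_query(!!!params)
}

# birthday_start and seed are NULL, so they should not appear in the URL;
# the remaining parameters should carry the "_" prefix.
req <- faker_request("persons", gender = "male", birthday_start = NULL)
req$url
```

Returning the request rather than the parsed body is also the design that the last bullet above hints at for more complex APIs.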

We could make it more user friendly by checking the input types, and returning the result as a tibble. I did a quick and dirty conversion using purrr; depending on your needs you could use base R code or tidyr::hoist().

library(purrr)

faker_person <- function(gender = NULL, birthday_start = NULL, birthday_end = NULL, quantity = 1, locale = "en_US", seed = NULL) {
  if (!is.null(gender)) {
    gender <- match.arg(gender, c("male", "female"))
  }
  if (!is.null(birthday_start)) {
    if (!inherits(birthday_start, "Date")) {
      stop("birthday_start must be a date")
    }
    birthday_start <- format(birthday_start, "%Y-%m-%d")
  }
  if (!is.null(birthday_end)) {
    if (!inherits(birthday_end, "Date")) {
      stop("birthday_end must be a date")
    }
    birthday_end <- format(birthday_end, "%Y-%m-%d")
  }

  json <- faker(
    "persons",
    gender = gender,
    birthday_start = birthday_start,
    birthday_end = birthday_end,
    quantity = quantity,
    locale = locale,
    seed = seed
  )

  tibble::tibble(
    firstname = map_chr(json$data, "firstname"),
    lastname = map_chr(json$data, "lastname"),
    email = map_chr(json$data, "email"),
    gender = map_chr(json$data, "gender")
  )
}
faker_person("male", quantity = 5)
#> # A tibble: 5 × 4
#>   firstname lastname email                      gender
#>   <chr>     <chr>    <chr>                      <chr>
#> 1 Makenna   Thiel    halvorson.mariah@jones.net male
#> 2 Adonis    Botsford grace.walsh@hotmail.com    male
#> 3 Dorthy    Hoeger   paucek.calista@gutmann.com male
#> 4 Emmitt    Kirlin   kulas.cali@hotmail.com     male
#> 5 Kennedy   Rice     nick.koch@gulgowski.com    male
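
The two birthday checks differ only in the argument name, so one option is to move them into a small helper. The sketch below is just one way you might structure this; check_date() is a hypothetical name and is not provided by httr2:

```r
# Hypothetical helper: validate an optional Date argument and convert it to
# the YYYY-MM-DD string used in the query. NULL passes through untouched so
# that req_url_query() can still drop it.
check_date <- function(x, arg = deparse(substitute(x))) {
  if (is.null(x)) {
    return(NULL)
  }
  if (!inherits(x, "Date")) {
    stop(arg, " must be a date", call. = FALSE)
  }
  format(x, "%Y-%m-%d")
}

birthday_start <- as.Date("1990-01-31")
formatted <- check_date(birthday_start)  # a character string
missing_ok <- check_date(NULL)           # stays NULL
```

Inside faker_person(), the two if blocks would then reduce to birthday_start <- check_date(birthday_start) and birthday_end <- check_date(birthday_end).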

The next steps would be to export and document this function; I’ll leave that up to you.

## Secret management

We need to take a quick break from APIs to talk about secrets. Secrets are important, because every API (except for very simple APIs like faker) is going to require that you identify yourself in some way, typically with an API key or a token. And even if you expect that your users will usually provide this information, you’re still going to need your own credentials in order to actually test your package.

This system is probably overkill if you only have one secret that you need to share in one or two places. But you almost invariably accumulate more secrets over time, and more people and computers that you need to share them with, so I think spending a little time to understand this system and set it up for your package will pay off in the long term.

### Basics

httr2 provides secret_encrypt() and secret_decrypt() to scramble secrets so that you can include them in your public source code without worrying that others can read them. There are three basic steps to this process:

1. You create an encryption key with secret_make_key() that is used to scramble and descramble secrets using symmetric cryptography:

key <- secret_make_key()
key
#> [1] "yPHVtbmSVXAOaXl5WG6W1g"

(Note that secret_make_key() uses a cryptographically secure random number generator provided by OpenSSL; it is not affected by R’s RNG settings, and there’s no way to make it reproducible.)

2. You scramble your secrets with secret_encrypt() and store the resulting text directly in the source code of your package:

secret_scrambled <- secret_encrypt("secret I need to work with an API", key)
secret_scrambled
#> [1] "5tAfRAIfX1cgTZGxV65taRXYKgUkvCMY0DkBbdlhO7lqhXMQrcYmyL7ynhg_mY6mFg"

3. When needed, you descramble the secret using secret_decrypt():

secret_decrypt(secret_scrambled, key)
#> [1] "secret I need to work with an API"

### Package keys and secrets

You can create any number of encryption keys, but I highly recommend that you create one key per package, which I’ll call the package key. In this section, I’ll show you how to store that key so that you (and your automated tests) can use it, but no one else can.

httr2 is built around the notion that this key should live in an environment variable. So the first step is to make your package key available on your local development machine by adding a line to your user-level .Renviron (which you can easily open with usethis::edit_r_environ()):

YOURPACKAGE_KEY=key_you_generated_with_secret_make_key

Now (after you restart R), you’ll be able to take advantage of a special secret_encrypt() and secret_decrypt() feature: the key argument can be the name of an environment variable, instead of the encryption key itself. In fact, this is the most natural usage.

secret_scrambled <- secret_encrypt("secret I need to work with an API", "YOURPACKAGE_KEY")
secret_scrambled
#> [1] "8tpRjEdFKmPq3xUXuhLrFmA1n75GVceiDdQmZ1MMEk6tZmNtLA4gQPAaZ-rO38cD6g"
secret_decrypt(secret_scrambled, "YOURPACKAGE_KEY")
#> [1] "secret I need to work with an API"
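
If you want to try the environment-variable form before editing .Renviron, you can set a throwaway key for the current session only. This is a minimal sketch: EXAMPLE_PKG_KEY is a made-up variable name, and the secret is a placeholder rather than a real credential:

```r
library(httr2)

# EXAMPLE_PKG_KEY is a made-up name used only for this demo.
Sys.setenv(EXAMPLE_PKG_KEY = secret_make_key())
has_key <- secret_has_key("EXAMPLE_PKG_KEY")

# Pass the *name* of the environment variable, not the key itself.
scrambled <- secret_encrypt("not a real secret", "EXAMPLE_PKG_KEY")
plain <- secret_decrypt(scrambled, "EXAMPLE_PKG_KEY")

# Remove the throwaway key again; without it the scrambled text
# can no longer be decrypted.
Sys.unsetenv("EXAMPLE_PKG_KEY")
has_key_after <- secret_has_key("EXAMPLE_PKG_KEY")
```

A key created this way disappears when the session ends, so anything you intend to keep should be encrypted with the key stored in .Renviron.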

You’ll also need to make the key available in your GitHub Actions (both check and pkgdown) so your automated tests can use it. This requires two steps:

1. Add the key to your repository’s secrets on GitHub, under the name YOURPACKAGE_KEY.

2. Share the key with the workflows that need it by adding a line to the appropriate workflow:

    env:
      YOURPACKAGE_KEY: ${{ secrets.YOURPACKAGE_KEY }}

You can see how httr2 does it in its GitHub workflow. Other continuous integration platforms will offer similar ways to make a key available as a secure environment variable.

### When the package key isn’t available

There are a few important cases where your code won’t have access to your package key: on CRAN, on the personal machines of external contributors, and in automated checks on their PRs. So if you want to share your package on CRAN or make it easy for others to contribute, you need to make sure that your examples, vignettes, and tests all work without error:

• In vignettes, you can run knitr::opts_chunk$set(eval = secret_has_key("YOURPACKAGE_KEY")) so that chunks are only evaluated if your key is available.

• In examples, you can surround code blocks that require your key with if (httr2::secret_has_key("YOURPACKAGE_KEY")) {}.

• You don’t need to do anything in tests because when secret_decrypt() is run by testthat, it will automatically skip() the test if the key isn’t available.

## NYTimes Books API

Next we’ll take a look at the NYTimes Books API. It requires a very simple authentication with an API key that’s included in every request. When you’re wrapping an API that has a key you’re going to face two struggles:

• How do you test your package without sharing your key with the whole world?

• How do you allow your users to supply their own key, without having to pass it to every function?

I’ll start by tackling the first problem because otherwise there’s no way for me to show how the API works in this vignette 😃. After the previous section on secret management, you can understand how the following code works to get my NYTimes Books API key:

my_key <- secret_decrypt("4Nx84VPa83dMt3X6bv0fNBlLbv3U4D1kHM76YisKEfpCarBm1UHJHARwJHCFXQSV", "HTTR2_KEY")

We’ll come back to the second problem at the very end of this section, because it’s easiest to tackle once we have a function in place.

### Security considerations

Note that including an API key as a query parameter is relatively insecure; if an API uses this method of auth, it’s typically because the key is relatively easy to create or gives relatively few privileges. Here it only takes a couple of minutes to generate your own NYTimes API key, so there’s little incentive for someone to try and steal yours.

The main problem of conveying credentials via the url is that it’s easily exposed, because httr2 makes no efforts to redact confidential information stored in query parameters. This means it’s relatively easy to leak your key if you use req_perform(verbose = 1), req_dry_run(), or even just print the request object. And indeed, you’ll see that in the examples below. This is bad practice for a real package, but I think it’s ok here because the key doesn’t allow you to do anything valuable and it makes teaching APIs so much easier.

### Basic request

Now let’s perform a test request and look at the response:

resp <- request("https://api.nytimes.com/svc/books/v3") %>%
  req_url_path_append("/reviews.json") %>%
  req_url_query(`api-key` = my_key, isbn = 9780307476463) %>%
  req_perform()
resp
#> <httr2_response>
#> GET
#> https://api.nytimes.com/svc/books/v3/reviews.json?api-key=qZ4iJAGzcL5drrRDErhTvuRalSlZxut4&isbn=9780307476463
#> Status: 200 OK
#> Content-Type: application/json
#> Body: In memory (1349 bytes)

Like most modern APIs, this one returns the results as JSON:

resp %>%
  resp_body_json() %>%
  str()
#> List of 4
#>  $ status     : chr "OK"
#>  $ copyright  : chr "Copyright (c) 2021 The New York Times Company. All Rights Reserved."
#>  $ num_results: int 2
#>  $ results    :List of 2
#>   ..$ :List of 9
#>   .. ..$ url           : chr "http://www.nytimes.com/2011/11/10/books/1q84-by-haruki-murakami-review.html"
#>   .. ..$ publication_dt: chr "2011-11-10"
#>   .. ..$ byline        : chr "JANET MASLIN"
#>   .. ..$ book_title    : chr "1Q84"
#>   .. ..$ book_author   : chr "Haruki Murakami"
#>   .. ..$ summary       : chr "In “1Q84,” the Japanese novelist Haruki Murakami writes about characters in a Tokyo with two moons."
#>   .. ..$ uuid          : chr "00000000-0000-0000-0000-000000000000"
#>   .. ..$ uri           : chr "nyt://book/00000000-0000-0000-0000-000000000000"
#>   .. ..$ isbn13        :List of 9
#>   .. .. ..$ : chr "9780307476463"
#>   .. .. ..$ : chr "9780307593313"
#>   .. .. ..$ : chr "9780307957023"
#>   .. .. ..$ : chr "9780345802934"
#>   .. .. ..$ : chr "9781446484197"
#>   .. .. ..$ : chr "9781446484203"
#>   .. .. ..$ : chr "9781455830497"
#>   .. .. ..$ : chr "9781469258843"
#>   .. .. ..$ : chr "9788483832967"
#>   ..$ :List of 9
#>   .. ..$ url           : chr "http://www.nytimes.com/2011/11/06/books/review/1q84-by-haruki-murakami-translated-by-jay-rubin-and-philip-gabri"| __truncated__
#>   .. ..$ publication_dt: chr "2011-11-06"
#>   .. ..$ byline        : chr "KATHRYN SCHULZ"
#>   .. ..$ book_title    : chr "1Q84"
#>   .. ..$ book_author   : chr "Haruki Murakami"
#>   .. ..$ summary       : chr "Haruki Murakami has translated Raymond Chandler into Japanese, and there’s a lot of Marlowe to his madness."
#>   .. ..$ uuid          : chr "00000000-0000-0000-0000-000000000000"
#>   .. ..$ uri           : chr "nyt://book/00000000-0000-0000-0000-000000000000"
#>   .. ..$ isbn13        :List of 9
#>   .. .. ..$ : chr "9780307476463"
#>   .. .. ..$ : chr "9780307593313"
#>   .. .. ..$ : chr "9780307957023"
#>   .. .. ..$ : chr "9780345802934"
#>   .. .. ..$ : chr "9781446484197"
#>   .. .. ..$ : chr "9781446484203"
#>   .. .. ..$ : chr "9781455830497"
#>   .. .. ..$ : chr "9781469258843"
#>   .. .. ..$ : chr "9788483832967"

Before we start wrapping this up into a function, let’s consider what happens with errors.

### Error handling

What happens if there’s an error? For example, if we deliberately supply an invalid key:

resp <- request("https://api.nytimes.com/svc/books/v3") %>%
  req_url_path_append("/reviews.json") %>%
  req_url_query(`api-key` = "invalid", isbn = 9780307476463) %>%
  req_perform()
#> Error: HTTP 401 Unauthorized.

To see if there’s any extra useful information we can again look at last_response():

resp <- last_response()
resp
#> <httr2_response>
#> GET
#> https://api.nytimes.com/svc/books/v3/reviews.json?api-key=invalid&isbn=9780307476463
#> Status: 401 Unauthorized
#> Content-Type: application/json
#> Body: In memory (90 bytes)
resp %>% resp_body_json()
#> $fault
#> $fault$faultstring
#> [1] "Invalid ApiKey"
#>
#> $fault$detail
#> $fault$detail$errorcode
#> [1] "oauth.v2.InvalidApiKey"

It looks like there’s some useful additional info in the faultstring:

resp %>% resp_body_json() %>% .$fault %>% .$faultstring
#> [1] "Invalid ApiKey"

To add that information to future errors we can use the body argument to req_error(). This should be a function that takes a response and returns a character vector of additional information to include in the error. Once we do that and re-fetch the request, we see the additional information displayed in the R error:

nytimes_error_body <- function(resp) {
  resp %>% resp_body_json() %>% .$fault %>% .$faultstring
}

resp <- request("https://api.nytimes.com/svc/books/v3") %>%
  req_url_path_append("/reviews.json") %>%
  req_url_query(`api-key` = "invalid", isbn = 9780307476463) %>%
  req_error(body = nytimes_error_body) %>%
  req_perform()
#> Error: HTTP 401 Unauthorized.
#> * Invalid ApiKey

### Rate limits

Another common source of errors is rate-limiting; this is used by many servers to prevent one unruly client consuming too many resources. The frequently asked questions page describes the rate limits for the NYT APIs:

“Yes, there are two rate limits per API: 4,000 requests per day and 10 requests per minute. You should sleep 6 seconds between calls to avoid hitting the per minute rate limit. If you need a higher rate limit, please contact us at .”

Many APIs return additional information about how long to wait when the rate limit is exceeded (often using the Retry-After header). So I deliberately violated the rate limit by quickly making 11 requests; unfortunately while the response was a standard 429 (Too many requests), it did not include any information about how long to wait in either the response body or the headers. That means we can’t use req_retry(), which automatically waits the amount of time the server requests.
Instead, we’ll use req_throttle() to ensure we don’t make more than 10 requests every 60 seconds:

req <- request("https://api.nytimes.com/svc/books/v3") %>%
  req_url_path_append("/reviews.json") %>%
  req_url_query(`api-key` = "invalid", isbn = 9780307476463) %>%
  req_throttle(10 / 60)

By default, req_throttle() shares the limit across all requests made to the host (i.e. api.nytimes.com). Since the docs suggest the rate limit applies per API, you might want to use the realm argument to be a bit more specific:

req <- request("https://api.nytimes.com/svc/books/v3") %>%
  req_url_path_append("/reviews.json") %>%
  req_url_query(`api-key` = "invalid", isbn = 9780307476463) %>%
  req_throttle(10 / 60, realm = "https://api.nytimes.com/svc/books")

### Wrapping it up

Putting together all the pieces above yields a function something like this:

nytimes_books <- function(api_key, path, ...) {
  request("https://api.nytimes.com/svc/books/v3") %>%
    # Use the path argument rather than a hard-coded endpoint
    req_url_path_append(path) %>%
    req_url_query(..., `api-key` = api_key) %>%
    req_error(body = nytimes_error_body) %>%
    req_throttle(10 / 60, realm = "https://api.nytimes.com/svc/books") %>%
    req_perform() %>%
    resp_body_json()
}

drunk <- nytimes_books(my_key, "/reviews.json", isbn = "0316453382")
drunk$results[[1]]$summary
#> [1] "In “Drunk,” Edward Slingerland plays devil’s advocate for the pleasure and utility of Dionysian abandon."

To finish this up for a real package, you’d want to:

• Add explicit arguments and check that they have the correct type.

• Export and document the function.

• Convert the nested list into a more user-friendly data structure (probably a data frame with one row per review).

You’d also want to provide some convenient way for the user to supply their own API key.

### User-supplied key

A good place to start is an environment variable, because environment variables are easy to set without typing anything in the console (which can get accidentally shared via your .Rhistory) and are easily set in automated processes.
Then you’d write a function to retrieve the API key, returning a helpful error message if it’s not found:

get_api_key <- function() {
  key <- Sys.getenv("NYTIMES_KEY")
  if (identical(key, "")) {
    stop("No API key found, please supply with api_key argument or with NYTIMES_KEY env var")
  }
  key
}

Then you could modify nytimes_books() to use get_api_key() as the default value for api_key. Since the argument is now optional, we can move it to the end of the argument list, since it’ll only be needed in exceptional circumstances.

nytimes_books <- function(path, ..., api_key = get_api_key()) {
  ...
}

You can make this approach a little more user friendly by providing a helper that sets the environment variable:

set_api_key <- function(key = NULL) {
  if (is.null(key)) {
    key <- askpass::askpass("Please enter your API key")
  }
  Sys.setenv("NYTIMES_KEY" = key)
}

Using askpass (or similar) here is good practice since you don’t want to encourage the user to type their secret key into the console, as mentioned above.

It’s a good idea to extend get_api_key() to automatically use your encrypted key to make it easier to write tests:

get_api_key <- function() {
  key <- Sys.getenv("NYTIMES_KEY")
  if (!identical(key, "")) {
    return(key)
  }

  if (is_testing()) {
    return(testing_key())
  } else {
    stop("No API key found, please supply with api_key argument or with NYTIMES_KEY env var")
  }
}

is_testing <- function() {
  identical(Sys.getenv("TESTTHAT"), "true")
}

testing_key <- function() {
  secret_decrypt("4Nx84VPa83dMt3X6bv0fNBlLbv3U4D1kHM76YisKEfpCarBm1UHJHARwJHCFXQSV", "HTTR2_KEY")
}

## GitHub Gists API

Next we’ll take a look at an API that can make changes on behalf of a user, not just retrieve data: GitHub’s gist API. This uses different HTTP methods to perform different actions, like creating, updating, and deleting gists. But before we can get to those, let’s handle authentication, rate-limiting, and errors.

### Authentication

The easiest way to authenticate with a GitHub API is to use a personal access token.
A token is an alternative to a username and password. You have one username + password per site; you can have one token per use case. This lets each use case have a minimal set of permissions, and you can easily revoke one token without affecting any other use case.

I created a personal access token specifically for this vignette that can only access gists, and, as in the last example, stored an encrypted version in this vignette:

token <- secret_decrypt("Guz59woxKoIO_JVtp2IzU3mFIU3ULtaUEa8xvvpYUBdVthR8jhxzc3bMZFhA9HL-ZK6YZudOI6g", "HTTR2_KEY")

If you want to run this vignette yourself, you’ll need to create a new token in your GitHub settings; just make sure it includes the “gist” scope. It’s also a good idea to give every token a descriptive name that reminds you of its motivating use case.

To authenticate a request with the token, we need to put it in the Authorization header with a “token” prefix:

req <- request("https://api.github.com/gists") %>%
  req_headers(Authorization = paste("token", token))

req %>% req_perform()
#> <httr2_response>
#> GET https://api.github.com/gists
#> Status: 200 OK
#> Content-Type: application/json
#> Body: In memory (68160 bytes)

Because the authorization header usually contains secret information, httr2 automatically redacts it:

req
#> <httr2_request>
#> GET https://api.github.com/gists
#> Headers:
#> • Authorization: '<REDACTED>'
#> Body: empty
req %>% req_dry_run()
#> GET /gists HTTP/1.1
#> Host: api.github.com
#> User-Agent: httr2/0.1.1 r-curl/4.3.2 libcurl/7.64.1
#> Accept: */*
#> Accept-Encoding: deflate, gzip
#> Authorization: <REDACTED>

### Errors

Once you’ve got authentication working, it’s always a good idea to work on errors next, since that will help you debug any failed requests. In my experience APIs rarely do a good job of documenting their errors, so you’ll often have to do a little experimentation.
To add to the pain, in large APIs different endpoints often return different amounts of information in different forms. You’ll typically need to tackle your error handling iteratively, improving your code each time you encounter a new problem.

While GitHub does document its errors, I’m sufficiently distrustful that I still want to construct a deliberately malformed query and see what happens:

resp <- request("https://api.github.com/gists") %>%
  req_url_query(since = "abcdef") %>%
  req_headers(Authorization = paste("token", token)) %>%
  req_perform()
#> Error: HTTP 422 Unprocessable Entity.

As documented, I get a 422 “Unprocessable Entity” error. But the response is rather different from the documentation, which suggests there should be a string message and a list of errors:

resp <- last_response()
resp
#> <httr2_response>
#> GET https://api.github.com/gists?since=abcdef
#> Status: 422 Unprocessable Entity
#> Content-Type: application/json
#> Body: In memory (156 bytes)
resp %>% resp_body_json()
#> $message
#> [1] "Invalid since parameter: 'abcdef'. Must be an ISO 8601 timestamp."
#>
#> $documentation_url
#> [1] "https://docs.github.com/v3/gists/#parameters"

I’ll proceed anyway, writing a function that extracts the data and formats it for presentation to the user:

gist_error_body <- function(resp) {
  body <- resp_body_json(resp)

  message <- body$message
  if (!is.null(body$documentation_url)) {
    message <- c(message, paste0("See docs at <", body$documentation_url, ">"))
  }
  message
}
gist_error_body(resp)
#> [1] "Invalid since parameter: 'abcdef'. Must be an ISO 8601 timestamp."
#> [2] "See docs at <https://docs.github.com/v3/gists/#parameters>"
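
An error-body function like this can also be exercised without a network connection by building a response by hand. The sketch below assumes that your version of httr2 exports response() for constructing test responses; the JSON bodies and the example.com URL are made up for illustration, and gist_error_body() is repeated so that the block runs on its own:

```r
library(httr2)

gist_error_body <- function(resp) {
  body <- resp_body_json(resp)

  message <- body$message
  if (!is.null(body$documentation_url)) {
    message <- c(message, paste0("See docs at <", body$documentation_url, ">"))
  }
  message
}

# Hand-built responses that mimic the shape of a GitHub error body.
# Nothing is sent over the network.
with_docs <- response(
  status_code = 422,
  headers = list(`Content-Type` = "application/json"),
  body = charToRaw(
    '{"message": "Invalid since parameter", "documentation_url": "https://example.com/docs"}'
  )
)
without_docs <- response(
  status_code = 404,
  headers = list(`Content-Type` = "application/json"),
  body = charToRaw('{"message": "Not Found"}')
)

bullets_with_docs <- gist_error_body(with_docs)        # message plus docs link
bullets_without_docs <- gist_error_body(without_docs)  # message only
```

Checks like these fit naturally into a package’s test suite, since they don’t need a token or the package key.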

Now I can pass this function to the body argument of req_error() and it will be automatically included in the error when a request fails:

request("https://api.github.com/gists") %>%
  req_url_query(since = "yesterday") %>%
  req_error(body = gist_error_body) %>%
  req_perform()
#> Error: HTTP 422 Unprocessable Entity.
#> * Invalid since parameter: 'yesterday'. Must be an ISO 8601 timestamp.
#> * See docs at <https://docs.github.com/v3/gists/#parameters>

Notice that each element of the character vector produced by gist_error_body() becomes a bullet in the resulting error.

### Rate-limiting

While we’re thinking about errors, it’s useful to look at what happens if the requests are rate limited. Luckily, GitHub consistently uses response headers to provide information about the remaining rate limits.

resp <- req %>% req_perform()
resp %>% resp_headers("ratelimit")
#> x-ratelimit-limit: 5000
#> x-ratelimit-remaining: 4971
#> x-ratelimit-reset: 1632778078
#> x-ratelimit-used: 29
#> x-ratelimit-resource: core

We can teach httr2 about this so it can automatically wait for a reset if the rate limit is hit. We need to define two functions. The first tells us whether or not a response has a transient error, i.e. it’s worth waiting and trying again. For GitHub, when the rate limit is exceeded, the response has a 403 status and an X-RateLimit-Remaining: 0 header:

gist_is_transient <- function(resp) {
  resp_status(resp) == 403 &&
    identical(resp_header(resp, "X-RateLimit-Remaining"), "0")
}
gist_is_transient(resp)
#> [1] FALSE

Then we need a function that tells us how long to wait. GitHub tells us when the rate limit resets (as a number of seconds since 1970-01-01) in the X-RateLimit-Reset header. To convert that to a number of seconds to wait, we first convert it to a number (since HTTP headers are always strings), then subtract off the current time (in number of seconds since 1970-01-01):

gist_after <- function(resp) {
  time <- as.numeric(resp_header(resp, "X-RateLimit-Reset"))
  time - unclass(Sys.time())
}
gist_after(resp)
#> [1] 2838.164
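
Because a real rate-limited response is awkward to trigger on demand, it can be useful to try the two helpers on a made-up response. This is a sketch under two assumptions: that your version of httr2 exports response() for building test responses, and that the helpers are written as the text above describes (they are repeated here so the block runs on its own). The 30 second reset is an arbitrary choice:

```r
library(httr2)

gist_is_transient <- function(resp) {
  resp_status(resp) == 403 &&
    identical(resp_header(resp, "X-RateLimit-Remaining"), "0")
}
gist_after <- function(resp) {
  time <- as.numeric(resp_header(resp, "X-RateLimit-Reset"))
  time - unclass(Sys.time())
}

# Pretend the limit resets 30 seconds from now. Header values are strings,
# so the reset time is formatted without scientific notation.
reset <- round(unclass(Sys.time())) + 30
limited <- response(
  status_code = 403,
  headers = list(
    `X-RateLimit-Remaining` = "0",
    `X-RateLimit-Reset` = format(reset, scientific = FALSE)
  )
)
ok <- response(status_code = 200)

limited_is_transient <- gist_is_transient(limited)  # rate limited: worth retrying
ok_is_transient <- gist_is_transient(ok)            # ordinary success: not transient
wait <- gist_after(limited)                         # roughly 30 seconds
```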

We then pass functions to req_retry() so httr2 has all the information it needs to handle rate-limiting automatically:

request("http://api.github.com") %>%
  req_retry(
    is_transient = gist_is_transient,
    after = gist_after,
    max_seconds = 60
  )
#> <httr2_request>
#> GET http://api.github.com
#> Body: empty
#> Policies:
#> • retry_max_wait: 60
#> • retry_is_transient: a function
#> • retry_after: a function

You also need to supply either max_tries or max_seconds in order to activate req_retry().

### Wrapping it all up

Let’s wrap up everything we’ve learned so far into a single function that creates a request:

req_gist <- function(token) {
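  request("https://api.github.com/gists") %>%
    # Authenticate with the supplied token
    req_headers(Authorization = paste("token", token)) %>%
    req_error(body = gist_error_body) %>%
    req_retry(
      is_transient = gist_is_transient,
      after = gist_after,
      # max_tries or max_seconds is needed to activate req_retry()
      max_seconds = 60
    )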
}

# Check it works:
req_gist(token) %>%
req_perform()
#> <httr2_response>
#> GET https://api.github.com/gists
#> Status: 200 OK
#> Content-Type: application/json
#> Body: In memory (68160 bytes)

We’ll use this as the basis to solve the next challenge: uploading a gist.

### Sending data

To create a gist we need to change the method to POST and add a body that contains data encoded as JSON. httr2 provides one function that does both of these things: req_body_json():

req <- req_gist(token) %>%
  req_body_json(list(
    description = "This is my cool gist!",
    files = list(test.R = list(content = "print('Hi!')")),
    public = FALSE
  ))
req %>% req_dry_run()
#> POST /gists HTTP/1.1
#> Host: api.github.com
#> User-Agent: httr2/0.1.1 r-curl/4.3.2 libcurl/7.64.1
#> Accept: */*
#> Accept-Encoding: deflate, gzip
#> Authorization: <REDACTED>
#> Content-Type: application/json
#> Content-Length: 100
#>
#> {"description":"This is my cool gist!","files":{"test.R":{"content":"print('Hi!')"}},"public":false}

Depending on the API you’re wrapping, you might need to send data in a different way. req_body_form() and req_body_multipart() make it easier to encode data in two other common forms. If the API requires something different you can use req_body_raw().
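
To make the alternatives a little more concrete, here’s a rough sketch of a form-encoded body and a raw body. The endpoint and field names are invented, and I’m assuming a recent httr2 where req_body_form() takes its fields as named arguments (older releases took a single list instead, so check the documentation for the version you have).

```r
library(httr2)

# Hypothetical endpoint; nothing is sent because we never perform the request
req <- request("https://example.com/api/items")

# Form-encoded body (application/x-www-form-urlencoded).
# Assumes fields are passed as named arguments, as in recent httr2 releases.
req_form <- req %>%
  req_body_form(name = "test", public = "false")

# Raw body: you supply the payload and its content type yourself
req_raw <- req %>%
  req_body_raw("<item><name>test</name></item>", type = "application/xml")
```

Piping either request into req_dry_run() shows exactly what would be sent, just like the JSON example above.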

Typically, the API will return some useful data about the resource you’ve just created. Here I’ll extract the gist ID so we can use it in the next examples, culminating in deleting the gist so I don’t end up with a bunch of duplicated gists 😃.

resp <- req %>% req_perform()
id <- resp %>% resp_body_json() %>% .$id
id
#> [1] "af48df660c167e7cb7d6cb2d1239a724"

### Changing a gist

Actually, that description wasn’t very true and I want to change it. To do so, I need to again send JSON encoded data, but this time I need to use the PATCH verb. So after adding the data to the request, I use req_method() to override the default method:

req <- req_gist(token) %>%
  req_url_path_append(id) %>%
  req_body_json(list(description = "This is a lame gist")) %>%
  req_method("PATCH")
req %>% req_dry_run()
#> PATCH /gists/af48df660c167e7cb7d6cb2d1239a724 HTTP/1.1
#> Host: api.github.com
#> User-Agent: httr2/0.1.1 r-curl/4.3.2 libcurl/7.64.1
#> Accept: */*
#> Accept-Encoding: deflate, gzip
#> Authorization: <REDACTED>
#> Content-Type: application/json
#> Content-Length: 37
#>
#> {"description":"This is a lame gist"}

### Deleting a gist

Deleting a gist is similar, except we don’t send any data; we just need to adjust the default method from GET to DELETE.

req <- req_gist(token) %>%
  req_url_path_append(id) %>%
  req_method("DELETE")
req %>% req_dry_run()
#> DELETE /gists/af48df660c167e7cb7d6cb2d1239a724 HTTP/1.1
#> Host: api.github.com
#> User-Agent: httr2/0.1.1 r-curl/4.3.2 libcurl/7.64.1
#> Accept: */*
#> Accept-Encoding: deflate, gzip
#> Authorization: <REDACTED>

req %>% req_perform()
#> <httr2_response>
#> DELETE https://api.github.com/gists/af48df660c167e7cb7d6cb2d1239a724
#> Status: 204 No Content

## OAuth

If the API provides access to a website where the user already has an account (think Twitter, Instagram, Facebook, Google, GitHub, etc), it’s likely to use OAuth to allow you to authorise on behalf of the user. OAuth[2] is an authorisation framework that’s designed so that you don’t have to share your username and password with an app; instead the app asks for permission to use your account. You’ve almost certainly used this before on the web; it’s used in most cases where one website wants to use another website on your behalf.

OAuth is a broad framework that has many, many different variants, which makes it hard to provide generalisable advice. The following advice draws on my experience working with a number of OAuth-using APIs, but don’t be surprised if you need to do something slightly different for the API you’re working with.

### Clients

The first step in working with any OAuth API is to create a client. This involves registering for a developer account on the API’s website and creating a new OAuth app. The process varies from API to API, but at the end of it you’ll get a client id and in most cases a client secret. (You’ll definitely need this for testing your package, and you’ll probably also bake it into your package for the convenience of your users. Bundling the app is user friendly, but not always possible, particularly if rate limits are enforced on a per-app rather than per-user basis. You should always provide some way for the user to provide their own app.)

If the API provides a way to authenticate your app without the client secret, you should leave it out of your package. But in most cases, you’ll need to include the secret in the package. You can use obfuscate() to hide the secret; this is not bulletproof[3], but in most cases it’ll be easier to create a new client than to try and steal yours. Additionally, it’s unusual for an OAuth client to be able to do anything in its own right, so even if someone does steal your secret there’s not much harm they can do with it.

To obfuscate a secret, call obfuscate():

obfuscate("secret")
#> obfuscated("OyE2iKLoZC9dDxj_xAcWMD-gAqxhGw")

Then use the client id from the website along with the obfuscated secret to create a client.
The following code shows a GitHub OAuth app that I created specifically for this vignette:

client <- oauth_client(
  id = "28acfec0674bb3da9f38",
  secret = obfuscated("J9iiGmyelHltyxqrHXW41ZZPZamyUNxSX1_uKnvPeinhhxET_7FfUs2X0LLKotXY2bpgOMoHRCo"),
  token_url = "https://github.com/login/oauth/access_token",
  name = "hadley-oauth-test"
)

You need to figure out the token_url from the documentation. I wish I could give good advice about how to find it 😞.

Note that if you print the client the secret is automatically redacted:

client
#> <httr2_oauth_client>
#> name: hadley-oauth-test
#> id: 28acfec0674bb3da9f38
#> secret: <REDACTED>
#> token_url: https://github.com/login/oauth/access_token
#> auth: oauth_client_req_auth_body

### Flows

Once you have a client you need to use it with a flow in order to get a token. OAuth provides a number of different “flows”; the most common is the “authorisation code” flow, which is implemented by req_oauth_auth_code(). You can try it out by running this code:

token <- oauth_flow_auth_code(client, auth_url = "https://github.com/login/oauth/authorize")

This flow can’t be used inside a vignette because it’s designed specifically for interactive use: it will open a webpage on GitHub that requires you to interactively confirm it’s OK for this app to use your GitHub account.

Other flows provide different ways of getting the token:

• req_oauth_client_credentials() is used to allow the client to perform actions on its own behalf (instead of on behalf of some other user). This is typically needed if you want to support service accounts, which are used in non-interactive environments.

• req_oauth_device() uses the “device” flow which is designed for devices like TVs that don’t have an easy way to enter data. It also works well from the console.

• req_oauth_bearer_jwt() uses a JWT signed by a private key.

• req_oauth_password() exchanges a user name and password for an access token.

• req_oauth_refresh() works directly with a refresh token that you already have. It’s useful for testing.

There’s one historically important OAuth flow that httr2 doesn’t support: the implicit grant flow. This is now mostly deprecated and was never a particularly good fit for native applications because it relies on a technique for returning the access token that only works inside a web browser.

When wrapping an API, you’ll need to carefully read the documentation to figure out which flows are available. Typically you’ll want to use the auth code flow, but if it’s not available you’ll need to carefully consider the others. An additional wrinkle is that many APIs don’t implement the flow in exactly the same way as the spec. If your initial attempt doesn’t work, you’re going to need to do some sleuthing. This is going to be painful, but unfortunately there’s no way around it. I recommend using with_verbosity() so you can see exactly what httr2 is sending to the server. You’ll then need to carefully compare this to the API documentation and play “spot the difference”.

### Tokens

The point of a flow is to get a token. You can use req_auth_bearer_token() to authorise a request with the access token stored inside the token object:

request("https://api.github.com/user") %>%
  req_auth_bearer_token(token$access_token) %>%
  req_perform() %>%
  resp_body_json() %>%
  .$name
#> [1] "Hadley Wickham"

However, in most cases you won’t want to do this, but instead allow httr2 to manage the whole process, by switching from oauth_flow_{name} to req_oauth_{name}:

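request("https://api.github.com/user") %>%
  # Let httr2 run the auth code flow and manage the token for us
  req_oauth_auth_code(client, auth_url = "https://github.com/login/oauth/authorize") %>%
  req_perform() %>%
  resp_body_json()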

This is important because most APIs provide only a short-lived access token that needs to be regularly refreshed using a longer-lived refresh token. httr2 will automatically refresh the token if it’s expired (i.e. its expiry date is in the past) or if the request errors with a 401 and there’s an invalid_token error in the WWW-Authenticate header.
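
If it helps to see those two conditions spelled out, here’s a small base R sketch of the decision. To be clear, token_needs_refresh() is a hypothetical helper I made up for illustration; it is not part of httr2 and not how httr2 is implemented internally, and the argument names are my own.

```r
# Hypothetical helper, not part of httr2.
# expires_at:       token expiry in seconds since 1970-01-01 (NULL if unknown)
# status:           HTTP status code of the last response
# www_authenticate: value of the WWW-Authenticate header (NULL if absent)
token_needs_refresh <- function(expires_at,
                                status = 200L,
                                www_authenticate = NULL,
                                now = unclass(Sys.time())) {
  # Condition 1: the expiry date is in the past
  expired <- !is.null(expires_at) && expires_at < now

  # Condition 2: the server rejected the token with a 401 + invalid_token
  rejected <- status == 401L &&
    !is.null(www_authenticate) &&
    grepl("invalid_token", www_authenticate, fixed = TRUE)

  expired || rejected
}

token_needs_refresh(expires_at = 100, now = 200)
#> [1] TRUE
token_needs_refresh(expires_at = 300, now = 200)
#> [1] FALSE
token_needs_refresh(
  expires_at = 300,
  status = 401L,
  www_authenticate = 'Bearer error="invalid_token"',
  now = 200
)
#> [1] TRUE
```

The third call is the interesting one: the token looks valid according to its expiry date, but the server disagrees, so a refresh is still warranted.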

### Caching

By default, req_oauth_auth_code() and friends will cache the token in memory, so that multiple requests in the same session all use the same token. In some cases, you may want to save the token so that it’s automatically used across sessions. This is easy to do (just set cache_disk = TRUE in req_oauth_auth_code()) but you need to carefully consider the consequences of saving the user’s credentials on disk.

httr2 does the best it can to save these credentials securely. They are stored in a local cache directory (rappdirs::user_cache_dir("httr2")) that should only be accessible to the current user, and are encrypted so they will be hard for any package other than httr2 to read. However, there’s no way to prevent other R code from using httr2 to access them, so if you do choose to cache tokens, you should inform the user and give them the ability to opt-out.
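
One way to give users that choice is to route the decision through an option. The following is only a sketch: the option name mypkg.cache_token, the client id, and the function names are all placeholders, and I’ve made caching off by default, which is stricter than a plain opt-out.

```r
library(httr2)

# Placeholder client; substitute the id (and secret, if needed) for your app
mypkg_client <- function() {
  oauth_client(
    id = "your-client-id",
    token_url = "https://github.com/login/oauth/access_token",
    name = "mypkg"
  )
}

# Hypothetical wrapper: disk caching is controlled by a user-settable option
# ("mypkg.cache_token" is a made-up name) and is off unless the user asks for it
req_mypkg_auth <- function(req,
                           cache_disk = getOption("mypkg.cache_token", FALSE)) {
  req_oauth_auth_code(
    req,
    client = mypkg_client(),
    auth_url = "https://github.com/login/oauth/authorize",
    cache_disk = cache_disk
  )
}
```

A user who wants tokens to persist across sessions would then run options(mypkg.cache_token = TRUE) before calling your functions, and your package documentation can explain what that implies.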

You can see which clients have cached tokens by looking in the cache directory used by httr2:

dir(rappdirs::user_cache_dir("httr2"), recursive = TRUE)
#> [1] "hadley-oauth-test/ae743e0fbd718c21f2cca632e77bd180-token.rds"

httr2 automatically deletes any cached tokens that are older than 30 days whenever it’s loaded. This means that you’ll need to re-auth at least once a month, but prevents tokens from hanging around on disk long after you’ve forgotten you created them.
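
If you’re curious which cached tokens are approaching that cutoff, you can look at file ages yourself. This sketch uses a hypothetical helper, cache_ages(), and assumes that age is judged by file modification time, which is my assumption rather than something guaranteed by httr2. It runs against a throwaway directory so it doesn’t touch your real cache.

```r
# Hypothetical helper: list files under a directory with their age in days,
# measured from the file modification time
cache_ages <- function(path) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE)
  age_days <- as.numeric(difftime(Sys.time(), file.mtime(files), units = "days"))
  data.frame(file = basename(files), age_days = age_days)
}

# Build a throwaway directory with one "old" and one "new" placeholder file
tmp <- tempfile("cache-demo-")
dir.create(tmp)
old <- file.path(tmp, "old-token.rds")
new <- file.path(tmp, "new-token.rds")
saveRDS("placeholder", old)
saveRDS("placeholder", new)

# Pretend the first file was last modified 40 days ago
Sys.setFileTime(old, Sys.time() - 40 * 24 * 60 * 60)

ages <- cache_ages(tmp)
ages$file[ages$age_days > 30]
#> [1] "old-token.rds"
```

To inspect the real cache, you could pass rappdirs::user_cache_dir("httr2") to cache_ages() instead of the temporary directory.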

1. Again, it’s still possible to extract it with a little extra work, but httr2 tries to help you avoid revealing it by accident. httr2 protects you from yourself, not from someone deliberately trying to find the secret.↩︎

2. Here I’ll only talk about OAuth 2.0 which is the only version in common use today. OAuth 1.0 is largely only of historical interest.↩︎

3. It uses secret_encrypt() with a special encryption key that’s bundled with httr2.↩︎