This vignette aims at explaining how you can complement a data.frame with weather data using rnoaa. In this vignette we shall use air quality data from the OpenAQ platform queried with the ropenaq package, for India. Using ropenaq one can get e.g. PM2.5 values over time in Indianapolis over the last days. For getting all data for march we’ll loop over several pages.
First, we need to know how many measures are available for Indianapolis for March 2016.
We filter negative values.
We now transform these data into daily data.
# only keep stations with geographical information measurementsIndianapolis <- filter(measurementsIndianapolis, !is.na(latitude)) # now transform to daily data measurementsIndianapolis <- measurementsIndianapolis %>% #mutate(day = as.Date(dateLocal)) %>% group_by(location, day) %>% summarize(value = mean(value), longitude = longitude, latitude = latitude) %>% ungroup() measurementsIndianapolis %>% head() %>% knitr::kable()
Air quality and weather are correlated, so one could be interested in getting a time series of say temperature for the same location. The OpenAQ platform itself does not provide weather data but nearly all stations have geographical coordinates. Our goal here will be to use rnoaa to complement this table with precipitation and temperature.
For finding the right station(s) we shall use the
meteo_nearby_stations function. It returns a list with the weather stations nearby each latitude/longitude given as arguments, respecting the other arguments such as maximal radius, first year with data, etc. For finding stations one might have to play a bit with the parameters until there is at least one station for each location.
Here we query stations with a less than 15km distance from the air quality stations, with precipitation and temperature data, and with data starting from 2016. Note that the query takes a while.
library("rnoaa") station_data <- ghcnd_stations() lat_lon_df <- select(measurementsIndianapolis, location, latitude, longitude) %>% unique() %>% ungroup() %>% rename(id = location) %>% mutate(id = factor(id)) stationsIndianapolis <- meteo_nearby_stations(lat_lon_df = as.data.frame(lat_lon_df), station_data = station_data, radius = 5, year_min = as.character(format(Sys.Date(), "%Y")), var = c("TAVG", "PRCP")) stationsIndianapolis <- unique(bind_rows(stationsIndianapolis) %>% select(- distance)) save(stationsIndianapolis, file = "data/stationsIndianapolis.RData")
Now let us plot the AQ and weather stations on a quick and dirty map with no legend, red for AQ stations, blue for weather stations.
For pulling weather data from these weather monitors, we shall use the
Here we notice some values are not available. Therefore, we might need to go back to weather stations searching with, for instance, a larger radius. In this case let’s say we’re ok with the result of the search.
Therefore, in this case we will bind the rows of the air quality table with the weather table.
Now some locations are air quality locations and have only missing values in the weather columns, and some locations are weather locations and have only missing values in the air quality columns.
We can plot the data we got.
data_plot <- measurementsIndianapolis %>% rename(pm25 = value) %>% select(- longitude, - latitude) data_plot <- tidyr::gather_(data_plot, "parameter", "value", names(data_plot)[3:ncol(data_plot)]) library("ggplot2") ggplot(data_plot) + geom_line(aes(x = day, y = value, col = location)) + facet_grid(parameter ~ ., scales = "free_y")