There are a lot of great R packages that let you import data from an API with a single function. However, sometimes an API doesn’t have an already-written function. The good news is that it’s easy to code your own.
I’ll demonstrate this with the AccuWeather API, but the process and code will work for most other APIs that use a key for authentication.
Sign up for API access
If you want to follow along, go to developer.accuweather.com and sign up for a free account. Under Packages and Pricing, select the Limited Trial, which allows 50 API calls per day — enough if you just want to check your local forecast a couple of times a day, but obviously not for any sort of public-facing application.
If you’re not immediately presented with an option to create an app, go to My Apps and create a new app.
I chose Other for where the API will be used, Internal App for what I’m creating, and Other for programming language (sadly, R isn’t an option). Your app should be assigned an API key.
If you don’t want to hard code that API key into your AccuWeather forecast script, save it as an R environment variable. The easiest way to do this is with the usethis package. usethis::edit_r_environ() opens your R environment file for editing. Add a line such as ACCUWEATHER_KEY = 'my_key_string' to that file, save the file, and restart your R session. You can now access the key value with Sys.getenv("ACCUWEATHER_KEY") instead of hard coding the value itself.
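A small guard can make a missing key obvious before any request is sent. This is only a sketch: it assumes the ACCUWEATHER_KEY variable name used above, and the helper name get_api_key() is made up for illustration.

```r
# Hypothetical helper: read an API key from an environment variable and
# fail early, with a readable message, if it has not been set.
get_api_key <- function(var = "ACCUWEATHER_KEY") {
  key <- Sys.getenv(var)  # Sys.getenv() returns "" when the variable is unset
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set. ",
         "Add it to your R environment file and restart R.", call. = FALSE)
  }
  key
}
```

Calling get_api_key() in place of a bare Sys.getenv() should turn a forgotten restart or a typo in the variable name into an immediate error rather than a confusing failed request.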
Determine the API’s URL structure
For this project, I’ll first load the httr, jsonlite, and dplyr packages: httr for getting data from the API, jsonlite for parsing it, and dplyr to eventually use pipes (you can also use the magrittr package).
Next — and this is critical — you need to know how to structure a URL in order to request the data you want from the API. Figuring out the query structure can be the hardest part of the process, depending on how well the API is documented. Fortunately, the AccuWeather API docs are pretty good.
Any API query needs a resource URL, or what I think of as the URL’s root, and then specific parts of the query. Here’s what AccuWeather says in its documentation for the one-day forecast API:
http://dataservice.accuweather.com/forecasts/v1/daily/1day/{locationKey}
The base URL for a forecast is mostly constant, but this one needs a location code. If you’re just looking for a forecast for one location, well, you can cheat and use the AccuWeather website to search for a forecast at accuweather.com and then check the URL that comes back. When I search for Zip code 01701 (our office in Framingham, MA), the following URL comes back along with the forecast:
https://www.accuweather.com/en/us/framingham/01701/weather-forecast/571_pc
See the /571_pc at the end? That’s the location key. You can also use an AccuWeather Locations API to pull location codes programmatically, which I’ll show in a bit, or one of AccuWeather’s Web-based Locations API tools such as City Search or Postal Code Search.
Construct a request URL
Query parameters for specific data requests get tacked onto the end of a base URL. The first parameter starts with a question mark followed by name equals value. Any additional key-value pairs are added with an ampersand followed by name equals value. So to add my API key, the URL would look like:
http://dataservice.accuweather.com/forecasts/v1/daily/1day/571_pc?apikey=MY_KEY
If I wanted to add a second query parameter — say, changing the default details from false to true — it would look like this:
http://dataservice.accuweather.com/forecasts/v1/daily/1day/571_pc?apikey=MY_KEY&details=true
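The question-mark-and-ampersand pattern can be sketched in a few lines of base R. The helper name build_request_url() is invented for this example, and MY_KEY is a placeholder rather than a real key.

```r
# Hypothetical helper: append name=value pairs to a base URL.
build_request_url <- function(base_url, params) {
  # Percent-encode each value so spaces, commas, and similar characters are URL-safe
  values <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  pairs <- paste0(names(params), "=", values)
  # "?" introduces the first pair; "&" separates any additional pairs
  paste0(base_url, "?", paste(pairs, collapse = "&"))
}

forecast_url <- build_request_url(
  "http://dataservice.accuweather.com/forecasts/v1/daily/1day/571_pc",
  list(apikey = "MY_KEY", details = "true")
)
forecast_url
```

With these inputs, the result should match the two-parameter URL shown above.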
Get the data
We can use the httr::GET() function to make an HTTP GET request of that URL, such as:
my_url <- paste0("http://dataservice.accuweather.com/forecasts/",
"v1/daily/1day/571_pc?apikey=",
Sys.getenv("ACCUWEATHER_KEY"))
my_raw_result <- httr::GET(my_url)
That paste0() command creating the URL broke the URL root into two lines for readability and then added the API key stored in the ACCUWEATHER_KEY R environment variable.
my_raw_result is a somewhat complex list. The actual data we want is mostly in content, but if you look at its structure, you’ll see it’s a “raw” format that looks like binary data.
Fortunately, the httr package makes it easy to convert from raw to a usable format with the content() function.
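To get a feel for what “raw” means here, this base R sketch turns a short JSON string into raw bytes and back. The JSON is a stand-in, not real API output, and the comparison with content() is only a loose analogy: content() also deals with details such as character encoding.

```r
# Stand-in for the raw bytes a response body carries (not real API output)
raw_body <- charToRaw('{"status": "ok"}')
print(raw_body)   # shows a vector of hexadecimal byte values, not readable text

# Converting the bytes back to a character string recovers the JSON text
text_body <- rawToChar(raw_body)
text_body
```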
Parse the results
content() gives you three conversion options: as raw (which definitely isn’t helpful in this case); parsed, which seems to usually return some sort of list; and text. For JSON, especially nested JSON, I find text to be the easiest to work with. Here is the code:
my_content <- httr::content(my_raw_result, as = 'text')
This is where the jsonlite package comes in. The fromJSON() function will turn a JSON text string from content() into a more usable R object.
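Here is a sketch of that conversion on a small, made-up JSON string, so it runs without an API key. The Date, EpochDate, and Temperature names mirror fields in the forecast data; the top-level names, the nested Minimum and Maximum names, and all of the values are placeholders for illustration.

```r
library(jsonlite)

# Made-up, abbreviated JSON for illustration only; a real response has more fields
sample_json <- '{
  "Headline": {"Text": "Sample headline"},
  "DailyForecasts": [
    {"Date": "2019-08-29T07:00:00-04:00",
     "EpochDate": 1567076400,
     "Temperature": {"Minimum": 60, "Maximum": 80}}
  ]
}'

sample_parsed <- fromJSON(sample_json)

names(sample_parsed)                             # top-level items of the list
class(sample_parsed$DailyForecasts)              # an array of objects becomes a data frame
class(sample_parsed$DailyForecasts$Temperature)  # a nested object becomes a data frame column
```

For the real forecast, the equivalent call would be fromJSON(my_content).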
Running dplyr’s glimpse() function on the object that fromJSON(my_content) returns gives a look at the structure.
It’s a list with two items. The first item has some metadata and a text field we might want. The second item is a data frame with a lot of data points we definitely want for the forecast.
Running glimpse() on just that data frame shows it was nested JSON, because some of the columns are actually their own data frames. But fromJSON() made it all pretty seamless.
Observations: 1
Variables: 8
$ Date        <chr> "2019-08-29T07:00:00-04:00"
$ EpochDate   <int> 1567076400
$ Temperature <df[,2]> <data.frame[1 x 2]>
$ Day         <df[,3]> <data.frame[1 x 3]>
$ Night       <df[,3]> <data.frame[1 x 3]>
$ Sources     <list> ["AccuWeather"]
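Columns that are themselves data frames can be reached with a second $, or flattened into ordinary columns with jsonlite’s flatten() function. The sketch below uses a hand-built stand-in for the forecast data frame; the Minimum and Maximum column names and the numbers are invented for illustration.

```r
# Hand-built stand-in for a forecast data frame with one nested column.
# The Minimum/Maximum names and the values are invented for illustration.
daily <- data.frame(Date = "2019-08-29T07:00:00-04:00", stringsAsFactors = FALSE)
daily$Temperature <- data.frame(Minimum = 60, Maximum = 80)

# Reach into the nested data frame with a second $
daily$Temperature$Maximum

# Or flatten nested data frames into ordinary top-level columns
daily_flat <- jsonlite::flatten(daily)
names(daily_flat)
```

Flattening can be handy if you plan to pass the result to functions that expect every column to be a plain vector.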
So these are the basic steps to pulling data from an API:
- Figure out the API’s base URL and query parameters, and construct a request URL.
- Run httr::GET() on the URL.
- Parse the results with content(). You can try it with as = 'parsed', but if that returns a complicated list, try as = 'text'.
- If necessary, run jsonlite::fromJSON() on that parsed object.
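The steps above can be bundled into one small function. This is a sketch, not a finished client: the name get_api_data() is made up, it assumes the API returns JSON, and it does no error handling. It passes query parameters through GET()’s query argument rather than pasting them into the URL.

```r
# Hypothetical wrapper around the steps listed above; assumes a JSON response.
get_api_data <- function(base_url, query = list()) {
  # Step 2: request the URL, letting httr append the query parameters
  raw_result <- httr::GET(base_url, query = query)
  # Step 3: convert the raw response body to a text string
  text_result <- httr::content(raw_result, as = "text", encoding = "UTF-8")
  # Step 4: turn the JSON text into an R list or data frame
  jsonlite::fromJSON(text_result)
}

# Example call (needs a valid key and network access, so it is commented out):
# forecast <- get_api_data(
#   "http://dataservice.accuweather.com/forecasts/v1/daily/1day/571_pc",
#   query = list(apikey = Sys.getenv("ACCUWEATHER_KEY"))
# )
```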
A couple more points before we wrap up. First, if you look again at my_raw_result, the initial object returned from GET(), you should see a status code. A 200 means all was OK, but a code in the 400s or 500s means something went wrong. If you’re writing a function or script, you can check whether the status code is in the 200s before additional code runs.
Second, if you’ve got multiple query parameters, it can get a little annoying to string them all together with a paste0() command. GET() has another option, which is creating a named list of query arguments, such as:
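# Base URL only, with no "?" or query string; GET() appends the query list
forecast_url <- "http://dataservice.accuweather.com/forecasts/v1/daily/1day/571_pc"
my_raw_result2 <- GET(forecast_url,
  query = list(
    apikey = Sys.getenv("ACCUWEATHER_KEY"),
    details = 'true'
  )
)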
See the structure? The GET() function takes the base URL as the first argument and a list of names and values as the query argument. Each one is name = value, with the name not in quotation marks. The rest of the code is the same.
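If you want to see the URL that a base URL and a query list add up to without spending one of your daily API calls, httr’s modify_url() function can build it without sending a request. In this sketch, MY_KEY is a placeholder and preview_url is just an illustrative variable name.

```r
library(httr)

# Build, but do not request, the URL that the base URL plus query list produces.
# "MY_KEY" is a placeholder, not a real key.
preview_url <- modify_url(
  "http://dataservice.accuweather.com/forecasts/v1/daily/1day/571_pc",
  query = list(apikey = "MY_KEY", details = "true")
)
preview_url
```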
That works for the AccuWeather Locations API as well.
I can use code similar to the forecast API code, but this time with the query parameters apikey (the AccuWeather key) and q (the text of the place I’m searching for):
base_url <- "http://dataservice.accuweather.com/locations/v1/cities/search"
ny_location_raw <- GET(base_url,
query = list(apikey = Sys.getenv("ACCUWEATHER_KEY"),
q = "New York, NY"
))
ny_parsed <- content(ny_location_raw, as = 'text') %>%
fromJSON()
The location code is in the Key column.
> glimpse(ny_parsed)
Observations: 1
Variables: 15
$ Version                <int> 1
$ Key                    <chr> "349727"
$ Type                   <chr> "City"
$ Rank                   <int> 15
$ LocalizedName          <chr> "New York"
$ EnglishName            <chr> "New York"
$ PrimaryPostalCode      <chr> "10007"
$ Region                 <df[,3]> <data.frame[1 x 3]>
$ Country                <df[,3]> <data.frame[1 x 3]>
$ AdministrativeArea     <df[,7]> <data.frame[1 x 7]>
$ TimeZone               <df[,5]> <data.frame[1 x 5]>
$ GeoPosition            <df[,3]> <data.frame[1 x 3]>
$ IsAlias                <lgl> FALSE
$ SupplementalAdminAreas <list> [<data.frame[1 x 3]>]
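From there, the key can be dropped into the forecast URL. The sketch below uses a two-column stand-in for ny_parsed, built from values in the output above, so it runs without calling the API; the variable names are illustrative.

```r
# Stand-in for ny_parsed, using two values from the output above
ny_parsed_example <- data.frame(
  Key = "349727",
  LocalizedName = "New York",
  stringsAsFactors = FALSE
)

# Take the first match in case a search returns several rows
location_key <- ny_parsed_example$Key[1]

# Build the one-day forecast base URL for that location
ny_forecast_url <- paste0(
  "http://dataservice.accuweather.com/forecasts/v1/daily/1day/",
  location_key
)
ny_forecast_url
```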
Now all you need is code to use the data you’ve pulled from the API.