Without fine-tuning or being trained on a specific topic, ChatGPT can answer questions about a wide range of technology subjects—including how to write R code. That means ChatGPT's power is available to any R programmer, even one who knows little about large language models. (A large language model, or LLM, is the technology underpinning AI chatbots like OpenAI's ChatGPT.)
An ecosystem is forming around ChatGPT and R, making it easy to incorporate the AI technology into your R language workflow. But before you begin using ChatGPT and tools associated with it for projects in R, there are a few important things to keep in mind:
- Everything you ask with these tools gets sent to OpenAI's servers. Don't use ChatGPT tools to process sensitive information.
- ChatGPT may confidently return answers that are wrong. Even incorrect responses can be a time-saving starting point, but don't assume the code will do exactly what you expect. Kyle Walker, an associate professor at Texas Christian University and author of the popular tidycensus R package, recently tweeted that ChatGPT can "supercharge your work if you understand a topic well," or it can leave you "exposed for not knowing what you are doing." The difference is in knowing when the AI output isn't right. Always check ChatGPT's responses.
- ChatGPT can generate different responses to the same query, and some answers might be accurate while others aren't. For instance, when I asked multiple times for a ggplot2 bar chart with blue bars, the code generated a graph with blue bars sometimes but not others, even though I submitted the exact same request. This is obviously less than ideal if you need a reproducible workflow.
- If there's been a recent update to a package you're using, ChatGPT won't know about it, since its training data ends in 2021.
- Most of the resources in this article require you to have your own OpenAI API key, and the API isn't free to use. While pricing is low at the moment, there's no guarantee it will stay that way. Current pricing is 2 cents per 10,000 tokens for the ChatGPT 3.5 turbo model. What does a token get you? As one example, the request to create a scatter plot from a 234-row mpg data set cost 38 tokens, a fraction of a cent.
- Asking ChatGPT for coding help is unlikely to ensnare you in the ethics of AI racial and gender bias. However, there are heated discussions about the wisdom of furnishing OpenAI with yet more data; the ethics of how the training data was scraped and repurposed; and whether it's better to use open source large language models (such as H2O.ai's h2oGPT) rather than OpenAI's. Those dilemmas are for every individual and organization to parse for themselves. However, as of this writing, there simply aren't R-specific LLM tools comparable to those building up around ChatGPT.
Now, let's look at some of the most notable R-focused ChatGPT resources currently available.
TheOpenAIR
TheOpenAIR package is an excellent choice for incorporating ChatGPT technology into your own R applications, such as a Shiny app that sends user input to the OpenAI API. You can register your key with the openai_api_key("YOUR-KEY") function.
Its chat() function gives you the option to print results to your console with chat("My request"), save results as text with my_results <- chat("My request", output = "message"), or return a complete API response object with my_results_object <- chat("My request", output = "response object"). The response object is a list that also includes information like tokens used.
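Putting those pieces together, a minimal session might look like the sketch below; the prompt text is just a placeholder, but the functions and arguments are the ones described above.

library(TheOpenAIR)
openai_api_key("YOUR-KEY")

# Print the response in the console
chat("Write R code for a bar chart of the mpg column in mtcars")

# Save only the text of the response
my_results <- chat("Write R code for a bar chart of the mpg column in mtcars", output = "message")

# Keep the full API response object, which also records tokens used
my_results_object <- chat("Write R code for a bar chart of the mpg column in mtcars", output = "response object")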
Other useful functions include count_tokens() to count the number of ChatGPT tokens a character string will cost when sent to the API, extract_r_code() to get R code from a ChatGPT response that includes a text explanation with code, and get_chatlog_id() to get the ID of the current chat (useful if you want to break up a complex application into smaller functions).
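Here is a brief sketch of how those helpers fit together, assuming each takes a character string as its first argument (check the package help if the signatures differ):

# Estimate what a prompt will cost before sending it
count_tokens("Write ggplot2 code for a scatter plot of hwy vs. displ in the mpg data")

# Ask for code, then pull just the R code out of the prose-plus-code answer
response_text <- chat("Write ggplot2 code for a scatter plot of hwy vs. displ in the mpg data", output = "message")
just_the_code <- extract_r_code(response_text)

# ID of the current chat, handy when splitting work across smaller functions
get_chatlog_id()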
The package has some general coding functions, as well. For example, write_code("filename") generates a prompt asking for your input and in what language you want the code written. refactor(), which is R-specific, does what you'd expect.
There are also functions to convert between Python and R or Java and R, although you may end up with warning messages such as “The conversion from R to Python has potentially resulted in invalid Python code. Please verify the output code carefully!”
Run help(package = "TheOpenAIR") in your R console to see its many other functions.
The package, by Ulrich Matter, an assistant professor at the University of St. Gallen in Switzerland, and St. Gallen PhD student Jonathan Chassot, is on CRAN.
RTutor
This app is an elegant and easy way to sample ChatGPT and R. Upload a data set, ask a question, and watch as it generates R code and your results, including graphics. Although it's named RTutor, the app can also generate Python code.
RTutor is on the web at https://rtutor.ai/. It's currently the only app or package listed that doesn't require a ChatGPT API key to use, but you're asked to supply your own for heavy use so as not to bill the creators' account.
The app's About page explains that RTutor's primary goal "is to help people with some R experience to learn R or be more productive ... RTutor can be used to quickly speed up the coding process using R. It gives you a draft code to test and refine. Be wary of bugs and errors."
The code for RTutor is open source and available on GitHub, so you can install your own local version. However, the license only allows nonprofit or non-commercial use, or commercial testing. RTutor is a personal project of Dr. Steven Ge, a professor of bioinformatics at South Dakota State University.
CodeLingo
This multi-language app "translates" code from one programming language to another. Available languages include Java, Python, JavaScript, C, C++, PHP, and more, including R. This is a web application only, available at https://analytica.shinyapps.io/codelingo/. You need to input your OpenAI API key to use it (you may want to regenerate the key after testing).
A request to translate code for a ggplot2 R graph into JavaScript generated output using the rather hard-to-learn D3 JavaScript library, as opposed to something a JavaScript newbie would be more likely to want, such as Observable Plot or Vega-Lite.
The request to translate into Python, shown in Figure 3, was more straightforward and used libraries I'd expect. However, ChatGPT didn't understand that "Set1" is a ColorBrewer color palette and can't be used directly in Python. As is the case for many ChatGPT uses, translating code between programming languages may give you a useful starting point, but you will need to know how to fix mistakes.
The app was created by Analytica Data Science Solutions.
askgpt
This package, available at https://github.com/JBGruber/askgpt, can be a good starting point for first-time users who want ChatGPT in their console, in part because it gives some instructions upon initial startup. Load the package with library(askgpt) and it responds with:
Hi, this is askgpt ☺.
• To start error logging, run `log_init()` now.
• To see what you can do use `?askgpt()`.
• Or just run `askgpt()` with any question you want!
Use the login() function without first storing a key, and you'll see a message on how to get an API key:
ℹ It looks like you have not provided an API key yet.
1. Go to <https://platform.openai.com/account/api-keys>
2. (Log into your account if you haven't done so yet)
3. On the site, click the button + Create new secret key to create an API key
4. Copy this key into R/RStudio
You'll be asked to save your key in your keyring, and then you're all set for future sessions. If your key is already stored, login() returns no message.
askgpt's default is to store results of your query as an object so you can save them to a variable like this one:
barchart_instructions <- askgpt("How do I make a bar chart with custom colors with ggplot2?")
Submit a query and you'll first see:
GPT is thinking ⠴
This way, you know your request has been sent and an answer should be forthcoming, instead of wondering what is happening after you hit submit.
Along with the package's general askgpt() function, there are a few coding-specific functions such as annotate_code(), explain_code(), and test_function(). These will involve cutting and pasting responses back into your source code.
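Here's a minimal sketch of how I'd expect those helpers to be called. I'm assuming each accepts a character string of code as its first argument, and the snippets are hypothetical, so check the package documentation before relying on this.

# Ask for explanatory comments to be added to a snippet of code
annotate_code("sapply(mtcars, mean)")

# Ask for a plain-language explanation of what a snippet does
explain_code("sapply(mtcars, mean)")

# Ask for a unit test for a (hypothetical) function
test_function("mean_mpg <- function(df) mean(df$mpg)")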
For those familiar with the OpenAI API, the package's chat_api() function allows you to set API parameters such as the model you want to use, the maximum tokens you're willing to spend per request, and your desired response temperature (which I'll explain in more detail later in the article).
The chat_api() function returns a list, with the text portion of the response in YourVariableName$choices[[1]]$message$content. Other useful info is stored in the list as well, such as the number of tokens used.
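As a hedged sketch, a call and the response handling might look like the following; the prompt is a placeholder and the argument names follow the OpenAI chat API, so check ?chat_api for the exact signature.

resp <- chat_api(
  "Write an R function that returns the mean of a numeric vector",
  model = "gpt-3.5-turbo",
  max_tokens = 200,
  temperature = 0
)

# Text of the answer, using the list structure described above
cat(resp$choices[[1]]$message$content)

# Token usage is also stored in the list (standard OpenAI response field)
resp$usage$total_tokens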
The askgpt package was created by Johannes Gruber, a post-doc researcher at Vrije Universiteit Amsterdam. It can be installed from CRAN.
gptstudio
This package and its sibling, gpttools (discussed below), feature RStudio add-ins to work with ChatGPT, although there are also some command-line functions that will work in any IDE or terminal.
You can access add-ins within RStudio either from the add-in drop-down menu above the code source pane or by searching for them via the RStudio command palette (Ctrl-Shift-P).
According to the package website, gptstudio is a general-purpose helper "for R programmers to easily incorporate use of large language models (LLMs) into their project workflows." It is on CRAN.
One add-in, ChatGPT, launches a browser-based app for asking your R coding questions, and offers options for programming style (tidyverse, base, or no preference) and proficiency (beginner, intermediate, advanced, and genius).
In the screenshot below, I've asked how to create a scatter plot in R as an intermediate coder with a tidyverse style.
Asking the same question with the base programming style produced code using base R’s plot function as the answer.
Although designed for R coding help, gptstudio can tap into more ChatGPT capabilities, so you can ask it anything you would ask the original web-based ChatGPT. For instance, this app worked just as well as a ChatGPT tool to write Python code and answer general questions like, "What planet is farthest away from the sun?"
Another of the gptstudio package's add-ins, ChatGPT in Source, seems closest to magic. You write code as usual in your source pane, add a comment requesting changes you'd like in the code, select the block of code including your comment, and apply the add-in. Then, voilà! Your requested changes are made.
When I applied the add-in to this code:
# Sort bars by descending Y value, rotate x-axis text 90 degrees, color bars steel blue
ggplot(states, aes(x = State, y = Pop_2020)) +
  geom_col()
My code was replaced with what is shown in the highlighted selection of Figure 5:
That's cool ... except if you run this code, the bars won't display as steel blue. Moving fill = "steelblue" inside geom_col() makes it work. That mistake has nothing to do with this specific add-in, but with the vagaries of ChatGPT itself. As I previously mentioned, I've run the same request other times and the results were accurate.
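For reference, one version that does render steel-blue bars with the same states data (my own fix, not the add-in's output) looks like this:

library(ggplot2)

ggplot(states, aes(x = reorder(State, -Pop_2020), y = Pop_2020)) +
  geom_col(fill = "steelblue") +   # fill set inside geom_col(), not aes()
  theme(axis.text.x = element_text(angle = 90, hjust = 1))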
Sending the following code to the ChatGPT in Source add-in generated complete instructions and code for a Shiny app:
# Create an R Shiny app with this data
states <- readr::read_csv("https://raw.githubusercontent.com/smach/SampleData/main/states.csv")
Submitting my request twice returned two completely different results, however: the first was a two-file app that forgot to load the ggplot2 library before using it; the second called columns that weren't actually in the data. It takes more work to craft a query that handles the specifics of an existing data set, but the code still could serve as a framework to build on.
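One way to give ChatGPT more to work with is to describe the data in the comment itself. A hypothetical, more specific version of the prompt, assuming the State and Pop_2020 columns used in the earlier example, might read:

# Create an R Shiny app with this data. The data frame has a State column
# (character) and a Pop_2020 column (numeric). Show a bar chart of Pop_2020
# by State, sorted in descending order, and load every library the app uses.
states <- readr::read_csv("https://raw.githubusercontent.com/smach/SampleData/main/states.csv")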
gptstudio was written by Michel Nivard and James Wade.
gpttools
The aim of the gpttools package "is to extend gptstudio for R package developers to more easily incorporate use of large language models (LLMs) into their project workflows," according to the package website. The gpttools package isn't on CRAN as of this writing. Instead, you can install gpttools from the JamesHWade/gpttools GitHub repo or R Universe with the following:
# Enable repository from jameshwade
options(repos = c(
  jameshwade = "https://jameshwade.r-universe.dev",
  CRAN = "https://cloud.r-project.org"
))
# Download and install gpttools in R
install.packages("gpttools")
The package's add-ins include:
- ChatGPT with Retrieval
- Convert Script to Function
- Add roxygen to Function (documents a function)
- Suggest Unit Test
- Document Data
- Suggest Improvements
To run an add-in, highlight your code and then select the add-in either from the RStudio Addins dropdown menu or by searching for it in the command palette (Tools > Show Command Palette, or Ctrl-Shift-P on Windows and Cmd-Shift-P on a Mac).
When I ran an add-in, I didn't always see a message telling me that something was happening, so be patient.
The Suggest Improvements add-in generated uncommented text below my function in an R file followed by modified code. Some of the suggestions weren't very helpful. For example, for this code
if (exportcsv) {
  filename_root <- strsplit(filename, "\\.")[[1]][1]
  filename_with_winner <- paste0(filename_root, "_winners.csv")
  rio::export(data, filename_with_winner)
}
the add-in recommended
Use `paste()` instead of `paste0()` to ensure a space is included between the names of the winners.
I didn't want a space in my file name! Still, I couldn't argue with all of its advice. The following suggestion seemed reasonable:
Use a switch statement instead of multiple if statements, to allow for additional functionality in the future
In this case, I'd be more likely to use dplyr's case_when() or data.table's fcase() than base R's switch().
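As a quick illustration with hypothetical values (not the app's actual code), the two approaches look like this:

# Base R switch() matches one string against named alternatives
file_type <- "csv"
ext <- switch(file_type,
              csv  = ".csv",
              xlsx = ".xlsx",
              ".txt")  # unnamed last value acts as the fallback

# dplyr's case_when() expresses the same mapping but is vectorized,
# so it also works on an entire column of file types at once
ext <- dplyr::case_when(
  file_type == "csv"  ~ ".csv",
  file_type == "xlsx" ~ ".xlsx",
  TRUE ~ ".txt"
)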
Make sure you have an original copy of your code if you're using any package's ChatGPT add-in, since there is a risk of code being overwritten in a way you don't necessarily want.
chatgpt
The chatgpt R package offers both functions and RStudio add-ins for using ChatGPT in R, with 10 add-ins documented at the time I tested.
Code-specific functions include comment_code(), complete_code(), create_unit_tests(), document_code(), find_issues_in_code(), and refactor_code(). There's also a generic ask_chatgpt() function and add-in if you'd like to use ChatGPT for something not code-related.
Store your key in your .Renviron file with OPENAI_API_KEY="your key" and you're good to go. If you attempt to run one of the add-ins before storing your key, you'll get an error message telling you how to do the key setup.
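With the key in place, a console session might look like the sketch below. The question and code snippet are placeholders, and I'm assuming the code-specific functions accept the code as a character string, so check the package help if that differs.

library(chatgpt)

# Ask a general question
cat(ask_chatgpt("What is the difference between a list and a vector in R?"))

# Ask for comments to be added to a (hypothetical) snippet of code
cat(comment_code("squares <- sapply(1:10, function(x) x^2)"))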
The package is on CRAN, or you can install the development version with
remotes::install_github("jcrodriguez1989/chatgpt", build_vignettes = TRUE)
When I tried an add-in without loading the package first, nothing happened. I then loaded the package with library(chatgpt) and got this message:
Warning message:
In run_addin("document_code") :
Please set one of `OPENAI_ADDIN_REPLACE=TRUE` or `OPENAI_VERBOSE=TRUE`
I followed the instructions in my R environment file, setting the verbose option to TRUE as I didn't want my initial code to be replaced.
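Concretely, that meant adding a line like this to my .Renviron file and restarting R (setting OPENAI_ADDIN_REPLACE=TRUE instead lets the add-ins overwrite the selected code):

OPENAI_VERBOSE=TRUE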
That resulted in both the query sent to ChatGPT and its response being displayed in my console.
With the option OPENAI_ADDIN_REPLACE=TRUE in my R environment file and my code selected in RStudio, some of my initial code occasionally disappeared when documentation was added. I ended up sticking with this package's command-line functions instead of the add-ins, but you might find the add-ins useful; just remember to make a copy of your code before experimenting.
The chatgpt package was created by Juan Cruz Rodriguez.
gptchatteR
The gptchatteR package is billed as "an experimental and unofficial wrapper for interacting with OpenAI GPT models in R." One of its advantages is its chatter.plot() function.
Install the package with
remotes::install_github("isinaltinkaya/gptchatteR", build_vignettes = TRUE, dependencies = TRUE)
This ensures that it also installs the required openai package. Then, you can load the package and authenticate with
library(gptchatteR)
chatter.auth("YOUR KEY")
Once that's done, launch a chat session with chatter.create().
The chatter.create() arguments include model for the OpenAI model to use (default is text-davinci-003), max_tokens for the maximum number of tokens you want it to use (default is 100), and a "temperature" set with an argument like this one:
chatter.create(temperature = 0)
According to the OpenAI documentation, the temperature setting can be between 0 and 1 and represents "how often the model outputs a less likely token."
The higher the temperature, the more random (and usually creative) the output. This, however, is not the same as "truthfulness." For most factual use cases such as data extraction, and truthful Q&A, the temperature of 0 is best.
The package default is a neutral 0.5. Unless you want to be entertained as opposed to getting usable code, it's worth setting your temperature to 0.
As of when I tested, the package was working but generated this warning:
The `engine_id` argument of `create_completion()` is deprecated as of openai 0.3.0.
ℹ Please use the `model` argument instead.
ℹ The deprecated feature was likely used in the gptchatteR package.
Please report the issue to the authors.
You can create a "casual" chat with chatter.chat("Your input here"). If you think you'll want follow-up after your initial request, use chatter.feed(), which stores your first query for use in a second question, and so on.
After running the following code:
library(gptchatteR)
chatter.auth(Sys.getenv("OPENAI_API_KEY"))
chatter.create(temperature = 0)
chatter.feed('I have the following data in R mydf <- data.frame(State = c("CT", "NJ", "NY"), Pop = c(3605944, 9288994, 20201249))')
myplot <- chatter.plot("Make a graph with State on the x axis and Pop on the Y axis")
a graph appeared in my RStudio view pane. The graph code was stored in myplot$code.
The gptchatteR package was created by Isin Altinkaya, a PhD fellow at the University of Copenhagen.
One more ...
That's the top eight ChatGPT tools for R. Here's one more, and I will keep adding to this list, so check back in the future.
chatgptimages wasn't designed to help you code. Instead, it uses a familiar R and Shiny interface to access another ChatGPT capability: creating images. There are a number of ethical intellectual property issues currently tangled up in AI image creation based on what was used to train models, which is important to keep in mind if you want to use this package for anything beyond entertainment.
That said, if you'd like to give it a try, note that it doesn't install like a usual package. First, make sure you also have shiny, golem, shinydashboard, openai, config, and testthat installed on your system.
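All six of those prerequisites are on CRAN, so if any are missing, one quick way to install them is a one-liner like this:

install.packages(c("shiny", "golem", "shinydashboard", "openai", "config", "testthat"))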
Then, fork and download the entire GitHub repo at https://github.com/analyticsinmotion/chatgpt-images-r-shiny or download and unzip the .zip file from https://github.com/analyticsinmotion/chatgpt-images-r-shiny. Open the chatgptimages.Rproj file in RStudio, open the run_dev.R file in the project's dev folder, and run that short file line by line. This app should open in your default browser:
Follow the instructions on storing a ChatGPT API key, and you can start creating and saving images.
The results look something like what's shown in Figure 7.
Beyond ChatGPT
If you'd like to test out other large language models that are open source, one non-R-specific tool, Chat with Open Large Language Models, is interesting. It offers access to nine different models as of this writing and an "arena" where you can test two at once and vote for the best.
Be aware of the terms of use: "non-commercial use only. It only provides limited safety measures and may generate offensive content. It must not be used for any illegal, harmful, violent, racist, or sexual purposes. The service collects user dialogue data for future research."
As a final note, H2O.ai has a website where you can test its model at https://gpt.h2o.ai/.