Review: GitHub Copilot preview gives me hope

The Copilot technical preview doesn’t always generate good, correct, or even running code, but it’s still somewhat useful. Future versions could be real time-savers.

People have been predicting the death of computer programming for as long as I can remember. It hasn’t happened (yet) for a variety of reasons, the most important of which is that programming is as much an art as it is a science or an engineering discipline.

GitHub Copilot, billed as “Your AI pair programmer” and currently in a limited technical preview, takes a stab at automating programming in a way that goes a bit beyond what IntelliSense and similar tools can provide. It’s not completely autonomous. As we’ll see, you have to declare (type out) your intentions before Copilot can generate meaningful code, and you also have to supervise Copilot to set it back on track when it inevitably slips off the rails.

Copilot is a cloud service with interfaces to Visual Studio Code (running on your own machine or running in the cloud on GitHub Codespaces); to JetBrains IDEs, such as IntelliJ IDEA; and to Neovim. The cloud service is a code prediction engine powered by OpenAI Codex, a language model trained on billions of lines of public code.

Yes, there has been controversy about Codex and Copilot. Before you start frothing at the mouth over Copilot’s potential copyright and privacy violations (I’m looking at you, Free Software Foundation), however, you need to understand that Codex was trained on publicly available code in a way often considered to be fair use within the machine learning community.

You also need to understand that Codex is a code synthesizer, not a search engine. The Copilot developers acknowledge that this may not be the last word on the subject:

… this is a new space, and we are keen to engage in a discussion with developers on these topics and lead the industry in setting appropriate standards for training AI models.

How GitHub Copilot works

According to GitHub, “OpenAI Codex was trained on publicly available source code and natural language, so it understands both programming and human languages. The GitHub Copilot editor extension sends your comments and code to the GitHub Copilot service, which then uses OpenAI Codex to synthesize and suggest individual lines and whole functions.” In addition, the service uses user choices to improve future suggestions.

As this diagram shows, GitHub Copilot is a service that uses the OpenAI Codex language model to provide suggestions based on editor content from Visual Studio Code and a few other editors.
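To make that round trip concrete, here is a toy TypeScript model of the flow: the editor sends the comments and code before the cursor, the service returns candidate completions, the editor shows the top one as ghost text, and the user’s accept-or-reject choice is logged as feedback. Every name in it (CompletionRequest, CompletionService, requestGhostText, Feedback) is hypothetical, and the stub service stands in for the real cloud service. It sketches the data flow under those assumptions; it is not the Copilot extension’s actual protocol.

```typescript
// Toy model of the editor-to-service round trip. All names are hypothetical;
// this is not the real Copilot extension protocol.

interface CompletionRequest {
  prefix: string;   // comments and code before the cursor
  language: string; // e.g. "typescript"
}

interface CompletionResponse {
  suggestions: string[]; // candidate lines or whole function bodies
}

// Stand-in for the cloud service, which in reality runs OpenAI Codex.
type CompletionService = (req: CompletionRequest) => CompletionResponse;

interface Feedback {
  suggestion: string;
  accepted: boolean;
}

// Editor side: send the context, offer the top suggestion as ghost text,
// and log whether the user accepted it so the service can learn from it.
function requestGhostText(
  service: CompletionService,
  req: CompletionRequest,
  userAccepts: (suggestion: string) => boolean,
  feedbackLog: Feedback[],
): string | undefined {
  const [top] = service(req).suggestions;
  if (top === undefined) return undefined; // nothing to show
  const accepted = userAccepts(top);
  feedbackLog.push({ suggestion: top, accepted });
  return accepted ? top : undefined;
}

// Usage with a canned stub service and a user who accepts the suggestion.
const log: Feedback[] = [];
const stub: CompletionService = () => ({ suggestions: ["return a + b;"] });
const inserted = requestGhostText(
  stub,
  { prefix: "function add(a, b) {", language: "javascript" },
  () => true,
  log,
);
console.log(inserted); // prints: return a + b;
```

The point of the model is the feedback log: every suggestion shown produces a signal, whether or not the user keeps it.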

Testing GitHub Copilot on Visual Studio Code

Copilot is currently in a limited technical preview. Before you can usefully install it, you need to apply to the preview program’s waitlist.

Once you have received your welcome e-mail, you can browse to the GitHub Copilot extension page on the Visual Studio Code Marketplace and install the extension. Then you’ll have to authorize the extension in Visual Studio Code. The Getting Started page has a tutorial you can follow, starting at step 2. In this tutorial you create a .js file, type

   function calculateDaysBetweenDates(begin, end) {

and wind up with a fully implemented function inferred from the function name. In the next tutorial on this page, you type a comment summarizing what a function should do, and wind up with a fully implemented function inferred from the comment, even though the function name is too general to be helpful.
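For readers without access to the preview, here is a hand-written TypeScript sketch of the kind of bodies the two tutorials ask Copilot to infer. It is not Copilot’s actual output, which varies from run to run. The first function assumes begin and end are Date objects; the second uses a hypothetical summary comment and a deliberately vague name, handle, to mirror the comment-driven tutorial.

```typescript
// Inferred from the name alone: whole calendar days from begin to end.
// Assumption: both arguments are Date objects in local time.
function calculateDaysBetweenDates(begin: Date, end: Date): number {
  const msPerDay = 24 * 60 * 60 * 1000;
  // Compare UTC midnights so daylight-saving shifts can't yield fractional days.
  const startUtc = Date.UTC(begin.getFullYear(), begin.getMonth(), begin.getDate());
  const endUtc = Date.UTC(end.getFullYear(), end.getMonth(), end.getDate());
  return Math.round((endUtc - startUtc) / msPerDay);
}

// Inferred from the comment, since the name "handle" says nothing.
// Count the words in a sentence that start with a capital letter.
function handle(sentence: string): number {
  return sentence.split(/\s+/).filter((word) => /^[A-Z]/.test(word)).length;
}

console.log(calculateDaysBetweenDates(new Date(2021, 0, 1), new Date(2021, 0, 31))); // 30
console.log(handle("Copilot runs in Visual Studio Code")); // 4
```

Even for a function this small, the date arithmetic has a trap (daylight-saving transitions), which is exactly the kind of detail you should check in whatever Copilot proposes.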

The GitHub Copilot extension page in the Visual Studio Code Marketplace. As you can see at the top, I have already installed the extension.

A screenshot of Visual Studio Code with GitHub Copilot active. I have just started the first tutorial in the documentation, and you can see the “ghost” code suggestion below what I typed, as well as the pop-up Copilot control bar. I created the new file as TypeScript rather than JavaScript, mostly because I’m bloody-minded. In this case Copilot generated JavaScript code anyway.

GitHub Copilot capabilities

In addition to inferring function bodies from the function name and from a summary comment, Copilot can take its cues from other code in the file you’re editing and from variable names. For example, if I type a colon after a variable name in TypeScript, Copilot will take a stab at filling in the type. If I type “var test1 =”, Copilot will cue on the word “test” and generate a runnable test for the previous function. If I type several lines that form a repetitive pattern, Copilot will try to generate more examples of the same pattern.
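The three cues described above might look like this in a TypeScript file. This is a hand-written illustration, not captured Copilot output: each comment marks what you would type, and the code after it shows the kind of completion to expect. The add function and the squares table are hypothetical.

```typescript
// A previous function for the "test" cue to refer back to (hypothetical).
function add(a: number, b: number): number {
  return a + b;
}

// Cue 1: type "const total:" and Copilot proposes a type; here, number.
const total: number = add(2, 3);

// Cue 2: type "var test1 =" and Copilot keys on "test" to exercise add().
var test1 = add(2, 3) === 5;

// Cue 3: type the first two entries of a pattern; Copilot offers more of them.
const squares: Record<string, number> = {
  one: 1 * 1,
  two: 2 * 2,
  three: 3 * 3, // the kind of line a completion would add
  four: 4 * 4,
};

console.log(total, test1, squares["four"]); // 5 true 16
```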

Copilot works with a broad set of frameworks and languages. It works best with Python, JavaScript, TypeScript, Ruby, Go, and, more recently, Java; support for the C family of languages (C, C++, and C#) is planned for the future. I have heard from others that it does very well with popular JavaScript frameworks such as React.

In the future, GitHub plans for Copilot to draw its context from more of the project than just the file you are editing.

I generated lines 8 and 9 by typing the beginnings of the lines and a bunch of tabs. I typed line 10 and the beginning of line 11, and Copilot finished line 11. I was working in TypeScript; this is after compilation to JavaScript, which I ran under Node.js as you can see at the bottom of the screen. Note the incorrect generated comments about the expected result values in lines 8 and 9.

GitHub Copilot limitations

First of all, Copilot doesn’t always generate good code. It doesn’t always generate correct code. Even worse, it doesn’t always generate runnable code. (I encountered all three cases in my testing.)

You absolutely need to review the code that Copilot generates. Treat it as though it were written by a green programmer intern who is good at Google searches but needs close supervision.

One way to avoid accepting the first snippet that Copilot offers is to use the “Open Copilot” option on its context menu, or use the Ctrl-Enter key combination, to bring up the Copilot suggestions window in a separate tab. Look at all 10 suggested solutions, and accept the one that’s closest to what you actually want. That done, you may then want to edit the generated code a bit to improve its robustness.
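To illustrate that last editing step, suppose the suggestion closest to what you want for a hypothetical averageOf function is the first version below. It handles the happy path but quietly returns NaN for an empty array. The second version is the kind of change a reviewer might make; both functions are invented for this example, not taken from Copilot.

```typescript
// Hypothetical accepted suggestion: fine on the happy path, fragile otherwise.
function averageOfSuggested(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length; // 0 / 0 is NaN for []
}

// The same function after review: reject empty input and non-finite values
// instead of silently returning NaN or Infinity.
function averageOf(values: readonly number[]): number {
  if (values.length === 0) {
    throw new RangeError("averageOf: values must not be empty");
  }
  let sum = 0;
  for (const v of values) {
    if (!Number.isFinite(v)) {
      throw new RangeError(`averageOf: non-finite value ${v}`);
    }
    sum += v;
  }
  return sum / values.length;
}

console.log(averageOf([2, 4, 9])); // 5
console.log(averageOfSuggested([])); // NaN
```

Whether to throw, return undefined, or return a default for empty input is a design choice that Copilot cannot make for you; the review is where that decision gets made.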

GitHub did a benchmark on Copilot code generation:

We recently benchmarked against a set of Python functions that have good test coverage in open source repos. We blanked out the function bodies and asked GitHub Copilot to fill them in. The model got this right 43% of the time on the first try, and 57% of the time when allowed 10 attempts. And it’s getting smarter all the time.

Obviously, 43% right isn’t a very good (or even acceptable) correctness score for production use, even though it’s an impressive accomplishment for a new code generation technology. Nevertheless, if you are a good code reviewer, you can edit Copilot-generated code to be correct and robust much more quickly than you could write it yourself from scratch, especially if you’re working with a library or framework that’s new to you.
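The benchmark GitHub describes, blanking out function bodies and checking generated replacements against existing tests, boils down to a simple scoring loop. The toy TypeScript version below scores a task as solved if any of the first k candidates passes all of its tests, which is how I read “right 57% of the time when allowed 10 attempts.” It is a sketch under that assumption, with invented tasks and candidates; it is not GitHub’s benchmark code.

```typescript
// Toy scoring harness; all names, tasks, and candidates are hypothetical.
type Candidate = (x: number) => number;

interface Task {
  name: string;
  tests: Array<[number, number]>; // [input, expected output]
  candidates: Candidate[];        // suggestions in the order offered
}

function passes(fn: Candidate, tests: Array<[number, number]>): boolean {
  try {
    return tests.every(([input, expected]) => fn(input) === expected);
  } catch {
    return false; // a candidate that throws counts as a failure
  }
}

// Fraction of tasks solved by at least one of the first k candidates.
function solvedWithin(tasks: Task[], k: number): number {
  if (tasks.length === 0) return 0;
  const solved = tasks.filter((t) =>
    t.candidates.slice(0, k).some((c) => passes(c, t.tests)),
  ).length;
  return solved / tasks.length;
}

const tasks: Task[] = [
  { name: "double", tests: [[1, 2], [3, 6]], candidates: [(x) => x * 2] },
  { name: "square", tests: [[2, 4], [3, 9]], candidates: [(x) => x * 2, (x) => x * x] },
  { name: "negate", tests: [[1, -1]], candidates: [(x) => x] },
];

console.log(solvedWithin(tasks, 1));  // ≈ 0.33 (1 of 3 tasks)
console.log(solvedWithin(tasks, 10)); // ≈ 0.67 (2 of 3 tasks)
```

The gap between the two scores is the practical argument for opening the suggestions window rather than accepting the first ghost-text completion: later candidates sometimes pass where the first one fails.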

The tab on the right shows 10 suggested code snippets for the function body. You can accept whichever is closest to what you want.

GitHub Copilot examples

There were roughly 25 small examples of Copilot code generation on its home page, and four larger examples with accompanying screen videos in the Copilot gallery when I looked on November 5, 2021. It’s likely that the Copilot team will post more examples in a wider variety of programming languages over time. By the way, it’s worth watching the animations in the examples on the home page, as well as downloading and watching the MP4 videos from the gallery.

GitHub Copilot example for sentiment analysis in Python, following the gallery. I typed parts of about six lines, plus a lot of tabs to accept the code. I also rejected several suggestions, including generated test sentences for the positive_sentences list that I considered negative. The code did not run until I installed the Python Requests package on my machine with pip3.

Overall, GitHub Copilot is somewhat useful in its current technical preview stage of development. Its current performance gives me hope that it will become even more of a time-saver in the future. Whether it will be worth buying the planned commercial Copilot product if and when it is released is an open question that will depend not only on its evolved performance but on your own skills and role.

There are several products that purport to compete with GitHub Copilot. The most promising of these seems to be Tabnine, from a company of the same name in Tel Aviv. Tabnine looks like IntelliSense on steroids, and can optionally train on your own code corpus as well as on open source code. A couple of the other alternatives essentially search Stack Overflow for relevant code, which makes me somewhat wary of their methodology.

It’s certainly worth trying GitHub Copilot in your own environment and following its progress over time.