LLMs and the rise of the AI code generators

Large language models like GPT-4 and tools like GitHub Copilot can make good programmers more efficient and bad programmers more dangerous. Are you ready to dive in?

Page 2 of 2

CodeT5

CodeT5 is a 2021 code-specific unified pre-trained encoder-decoder Transformer model from Salesforce AI Research. It’s based on the 2020 Google T5 model architecture, and was pre-trained on the CodeSearchNet data set plus some C/C# code from BigQuery. The official PyTorch implementation of CodeT5 resides on GitHub, and two checkpoints are available at Hugging Face, with links in the GitHub README.

GitHub Copilot

When I reviewed a pre-release version of GitHub Copilot in November 2021, I found that, while it didn’t always generate good, correct, or even running code, it was still somewhat useful. Copilot is based on OpenAI Codex, a descendant of GPT-3 that was fine-tuned for code generation on code from 54 million open-source GitHub repositories. GitHub Copilot currently costs $10 per month or $100 per year, unless you qualify for a free version.

I like the way that Copilot works within Visual Studio Code. You write the first line of a function, or a comment describing the function, and Copilot generates up to 10 versions of the function that you can use as is, edit, or discard. As I noted above, you should take any code generated by Copilot with a grain of salt, as it does tend to hallucinate, for example in the code comments in lines 8 and 9 of the example shown below.

[Screenshot not reproduced here. Image credit: IDG]

Code produced by GitHub Copilot. I generated lines 8 and 9 by typing the beginnings of the lines and a bunch of tabs. I typed line 10 and the beginning of line 11, and Copilot finished line 11. Note the incorrect generated comments about the expected result values in lines 8 and 9.
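One way to guard against comments that state the wrong expected value is to turn each claim into a check that actually runs. The following is a minimal sketch in Python, not the code from the screenshot: the `factorial` function, the `claimed` list, and the `check_claims` helper are all hypothetical names invented for illustration.

```python
def factorial(n: int) -> int:
    """Hypothetical stand-in for a function whose results a comment describes."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


# Claims of the kind a generated comment might make: (argument, claimed result).
# The last claim is deliberately wrong, to mimic a hallucinated comment.
claimed = [(5, 120), (10, 3628800), (6, 620)]


def check_claims(func, claims):
    """Return (argument, claimed, actual) for every claim that does not hold."""
    mismatches = []
    for arg, want in claims:
        actual = func(arg)
        if actual != want:
            mismatches.append((arg, want, actual))
    return mismatches


mismatches = check_claims(factorial, claimed)
print(mismatches)  # [(6, 620, 720)]
```

A comment can be wrong without anything noticing. A check like this fails as soon as the claim and the code disagree.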

GitHub Copilot X

GitHub Copilot X, currently in technical preview, is based on GPT-4. It “levels up” the original Copilot with chat and terminal interfaces, the ability to generate unit tests, the ability to generate pull request descriptions, and the ability to extract explanations from documentation.

GitHub Copilot X is greatly improved over the original GitHub Copilot, and can sometimes generate a correct function and set of tests without much human help. It still makes mistakes and hallucinates, but not nearly as much as its predecessor. For reference, see my November 2021 review of the original Copilot.

[Screenshot not reproduced here. Image credit: IDG]

I was able to get GitHub Copilot X to generate most of this correct function and good set of parameterized tests simply by typing the comment at the top and pressing Enter and Tab four or five times.
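For readers who have not seen the pattern, here is a rough sketch of what a function with a parameterized test set can look like in Python. It is not the code from the screenshot: `is_leap_year`, `LeapYearTests`, and `run_tests` are hypothetical names, and I use the standard library's `unittest` with `subTest` only to keep the sketch free of third-party dependencies.

```python
import unittest


def is_leap_year(year: int) -> bool:
    """Hypothetical function under test: Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


class LeapYearTests(unittest.TestCase):
    # One table of (input, expected) pairs drives the whole test.
    CASES = [
        (2000, True),   # divisible by 400
        (1900, False),  # divisible by 100 but not by 400
        (2024, True),   # divisible by 4
        (2023, False),  # not divisible by 4
    ]

    def test_cases(self):
        for year, expected in self.CASES:
            # subTest reports each failing row separately.
            with self.subTest(year=year):
                self.assertEqual(is_leap_year(year), expected)


def run_tests() -> unittest.TestResult:
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LeapYearTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` runs the table. Adding a case means adding one row, which makes it easy to extend generated tests with the edge cases the generator missed.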

IntelliSense and IntelliCode

Microsoft IntelliSense is a built-in capability of Visual Studio and Visual Studio Code that uses language semantics to offer a menu of choices for short code completions. It often works well for helping you find the API or method call you want, but tends to offer many choices.

IntelliCode is an add-in enhancement to IntelliSense that uses AI running on your local machine to detect your code context—including variable names, functions, and the type of code you’re writing—to give you the best suggestions, and in some cases give you whole-line completions. IntelliCode can also help you clean up repetitive code and recommend quick actions for common programming tasks.

IntelliCode works with C#, C++, Java, SQL, and XAML in Visual Studio 2022, and with TypeScript, JavaScript, and Python in Visual Studio Code.

Kite

Kite was an early attempt at using AI to help developers write code, operating from 2014 to 2021. While it attracted over 500K developers, it never generated any revenue. The Kiteco repositories contain most of its source code, but there were private bits that have been replaced with XXXXX, so some of the code won’t run.

PolyCoder

PolyCoder is a 2.7 billion parameter open-source large language model for code generation, released in 2022 by Carnegie Mellon University researchers and described in an accompanying paper. It’s based on the GPT-2 model architecture and trained on 249 GB of code across 12 programming languages. According to its authors, in the C programming language PolyCoder outperforms all models, including Codex.

Replit Ghostwriter

Replit Ghostwriter, released on Halloween 2022, offers five functions: code completion, code explanation, code transformation, code generation, and error detection with debugging, for $10 per month (more or less, depending on how many “cycles” you use). It integrates with the Replit online editor (only) and supports Python, Ruby, JavaScript, TypeScript, HTML, CSS, Go, Lisp, Haskell, Bash, C, C++, Rust, Java, and JSON.

According to Replit, Ghostwriter “returns results generated from large language models trained on publicly available code and tuned by Replit.” Replit doesn’t specify either the LLMs or the training corpora it uses for Ghostwriter, which opens it up to the same accusation that Emily Bender made about GPT-4: You should assume Ghostwriter to be toxic trash until and unless Replit is open about its training data, model architecture, etc. It also opens Replit up to the same accusations of “open-source software piracy” that are making their way through the courts against GitHub Copilot.

Tabnine

Tabnine, from a company of the same name in Tel Aviv, looks like IntelliSense on steroids, and can optionally train on your own code corpus as well as on open-source code. It does whole-line and full-function code completions in your editor or IDE, with support for 20 such tools, from Visual Studio Code and IntelliJ to Emacs and Vim.

Depending on the plan you choose, Tabnine can use a generic AI model trained on open-source code with permissive licenses, or a set of generative AI models optimized for all programming languages, “specialized to match your tech stack,” or a private code model trained on your own repositories.

Tabnine’s free Starter plan only does basic code completion. The Pro plan does whole-line and full-function code completions for $12 per user per month. Tabnine has not disclosed its model architecture or training corpora. So, by the Emily Bender principle, you should assume the worst about any code it generates.

Large language models can sometimes work to generate or complete code, whether or not they’ve been trained on code corpora. Language models that have been trained on code tend to know more about the importance of whitespace. And code generation products such as OpenAI Codex and Tabnine often have better integrations with programming editors than more generic language models.
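To see why whitespace matters so much, consider Python, where indentation is part of the syntax. In the sketch below (both function names are hypothetical), the two loops contain the same statements, and the only difference is that the `return` is indented one level deeper in the second version.

```python
def sum_values(values):
    total = 0
    for v in values:
        total += v
    return total  # runs once, after the loop has finished


def sum_values_misindented(values):
    total = 0
    for v in values:
        total += v
        return total  # one level deeper: returns during the first iteration


print(sum_values([1, 2, 3]))              # 6
print(sum_values_misindented([1, 2, 3]))  # 1
```

Both versions are valid Python, so no compiler error flags the second one. Only a test or a careful review catches it.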

We should expect AI code generators to improve with time and use. GitHub Copilot X is better than the original Copilot, and I’m confident the next Copilot will be better still. Nevertheless, you can never assume that code generated by AI of any kind is correct or efficient, or even that it will compile and run. You should always treat AI-generated code like a pull request from an unknown programmer, which means reviewing it, testing it, and debugging it before making it part of your application.
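As a minimal sketch of that review-and-test habit, the following Python helper checks that a generated snippet compiles, defines the expected function, and passes a few known cases. The helper name `vet_generated_code` and its report format are assumptions made up for this illustration. Because it executes the snippet, you should only run something like it in a sandbox or throwaway environment, and it supplements human review rather than replacing it.

```python
def vet_generated_code(source, func_name, cases):
    """Return a list of problems found in a generated snippet (empty if none).

    source    -- the generated Python source, as a string
    func_name -- the function the snippet is supposed to define
    cases     -- list of (args_tuple, expected_result) pairs
    """
    # Step 1: does it even compile?
    try:
        code = compile(source, "<generated>", "exec")
    except SyntaxError as exc:
        return [f"does not compile: {exc.msg}"]

    # Step 2: does it load, and does it define the function we asked for?
    namespace = {}
    try:
        exec(code, namespace)  # executes untrusted code: sandbox only
    except Exception as exc:
        return [f"fails when loaded: {exc!r}"]
    func = namespace.get(func_name)
    if not callable(func):
        return [f"no callable named {func_name!r}"]

    # Step 3: does it give the right answers on known cases?
    problems = []
    for args, expected in cases:
        try:
            actual = func(*args)
        except Exception as exc:
            problems.append(f"{func_name}{args} raised {exc!r}")
            continue
        if actual != expected:
            problems.append(
                f"{func_name}{args} returned {actual!r}, expected {expected!r}"
            )
    return problems


# A plausible-looking but wrong snippet, standing in for generated code.
generated = "def add(a, b):\n    return a - b\n"
print(vet_generated_code(generated, "add", [((1, 2), 3)]))
# ['add(1, 2) returned -1, expected 3']
```

An empty list means only that the snippet passed the cases you supplied. It says nothing about efficiency, security, or inputs you didn't think to test.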

Copyright © 2023 IDG Communications, Inc.
