When I wrote about the GitHub Copilot preview in 2021, I noted that the AI pair programmer didn't always generate good, correct, or even running code, but was still somewhat useful. At the time, I concluded that future versions could be real time-savers. Two years later, Copilot is improving. These days, it costs money even for individuals, and it has some competition. In addition, the scope of coding assistants has expanded beyond code generation to code explanations, pull request summaries, security scanning, and related tasks.
Three tools for AI pair programming
Let's start with a quick overview of the tools under review, then we'll dive in for a closer look at each one.
- Amazon CodeWhisperer is the product that competes most directly with Copilot. A “coding companion” like Copilot, CodeWhisperer integrates with Visual Studio Code and JetBrains IDEs, generates code suggestions in response to comments and code completions based on existing code, and can scan code for security issues. CodeWhisperer supports five programming languages well, and another 10 to a lesser degree. It can optionally flag and log references to code it uses and optionally filter out code suggestions that resemble open source training data.
- Google Bard is a web-based interface to LaMDA (Language Model for Dialogue Applications), a conversational AI model capable of fluid, multi-turn dialogue. Bard recently added the ability to help with coding and topics about coding. When Bard emits code that may be subject to an open source license, it cites its sources and provides the relevant information. Bard is also good at code explanations.
- GitHub Copilot X is a set of technical preview features that extend the original Copilot with chat and terminal interfaces, support for pull requests, and early adoption of OpenAI’s GPT-4. Currently, to access these Copilot X capabilities you need to have an active Copilot subscription and join the waiting list, with no guarantee about when you’ll get access to the new features. It took about a month for my invitation to arrive after I joined the waiting list. Note that Copilot X is not (and may never be) an available product offering of GitHub Copilot. I use the name Copilot X throughout this review to distinguish the GPT-4 version of Copilot I tested from the current Copilot product based on GPT-3.
Using one of these code generators is not the only way to generate code. To begin with, you can access general-purpose transformers like GPT-4 and its predecessors, including ChatGPT, BingGPT/Bing Chat (available in the Edge browser), and Boo.ai. There are also other code-specific AI tools, such as StarCoder, Tabnine, Cody, AlphaCode, Polycoder, and Replit Ghostwriter. In every case I’ve mentioned, it is vital to use discretion and carefully test and review the generated code before using it.
How the tools were tested
In my previous article about code generation, I evaluated the AI code generators based on the rather easy task of writing a program to determine the number of days between two dates. Most did okay, although some needed more guidance than others. For this review, I tried the code generators on the more difficult task of scraping InfoWorld.com for a list of articles. I gave them an outline but no additional help. None generated correct code, although some came closer than others. As an additional task, I asked the tools that support code explanation to explain a Python code example from an MIT Open Courseware introductory programming course.
For reference, the outline I gave to the code generators is:
Scrape infoworld.com front page:
Open https://www.infoworld.com/
Find all articles by looking for links with ‘article’ in the href; extract title, author, date from each
List all articles alphabetically by title; eliminate duplicates
List all articles alphabetically by author last name
List all articles latest first
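The list-handling half of that outline can be sketched independently of any scraping. What follows is a minimal sketch, assuming the scraping step has already produced one record per link. The `Article` type, its field names, and the sample records are hypothetical; they are not taken from InfoWorld's markup or from any tool's output.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    # Hypothetical record shape; a real scraper would fill these from the page.
    title: str
    author: str
    published: date


def dedupe_by_title(articles):
    """Keep the first article seen for each title, preserving input order."""
    seen = {}
    for a in articles:
        seen.setdefault(a.title, a)
    return list(seen.values())


def by_title(articles):
    return sorted(dedupe_by_title(articles), key=lambda a: a.title.lower())


def by_author_last_name(articles):
    # Naive assumption: the final whitespace-separated word is the last name.
    return sorted(dedupe_by_title(articles),
                  key=lambda a: a.author.split()[-1].lower())


def latest_first(articles):
    # Sorting real date objects avoids the pitfalls of sorting date strings.
    return sorted(dedupe_by_title(articles),
                  key=lambda a: a.published, reverse=True)


# Made-up sample data, including one duplicate.
sample = [
    Article("Zig basics", "Ann Lee", date(2023, 5, 1)),
    Article("Async Rust", "Bob Young", date(2023, 6, 10)),
    Article("Zig basics", "Ann Lee", date(2023, 5, 1)),
    Article("Go generics", "Cara Adams", date(2023, 4, 20)),
]

for a in latest_first(sample):
    print(a.published, a.title, "by", a.author)
```

The hard part of the task, which this sketch deliberately leaves out, is finding the title, author, and date in the page's HTML.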
In general, I tried to act like a more naive programmer than I am, just to see what the tools would do.
Now, let's look more closely at each of our code generators.
Amazon CodeWhisperer
Within your IDE, Amazon CodeWhisperer analyzes your English language comments and surrounding code to infer what code it should generate to complete what you are typing. It then offers code as a suggestion that you can accept or reject; you can also ask CodeWhisperer for alternate code, or ignore the suggestion and continue writing your own code. CodeWhisperer’s large language model (LLM) was trained on billions of lines of code, including Amazon and open source code. Any given suggestion is based not only on your comments and immediate code context, but also on the contents of other files open in the IDE.
In addition to code generation, CodeWhisperer can scan your Python, Java, and JavaScript code for security vulnerabilities and suggest fixes for them. The vulnerability lists it consults include Open Web Application Security Project (OWASP), crypto library best practices, AWS API best practices, and other API best practices. Security scans occur on-demand, unlike code completion, which is offered continuously as you code unless you turn off suggestions.
Programming languages and IDEs
CodeWhisperer’s best programming languages for code generation (the most prevalent languages in the training corpus) are Java, Python, JavaScript, TypeScript, and C#. It has been trained to a lesser extent on Ruby, Go, PHP, C++, C, Shell, Scala, Rust, Kotlin, and SQL.
There are CodeWhisperer plugins for Visual Studio Code and JetBrains IDEs. You can also activate CodeWhisperer for use inside AWS Cloud9 and AWS Lambda; in both cases, you must edit your IAM permissions as well as check the Enable CodeWhisperer box.
I installed CodeWhisperer in Visual Studio Code. There are four steps:
- Add the CodeWhisperer plugin to VS Code.
- Add a connection to AWS.
- Sign in on the AWS website.
- Start CodeWhisperer from the AWS developer tools panel.
Code suggestions and completions
It’s worth watching some of the videos demonstrating CodeWhisperer’s capabilities, listed on the CodeWhisperer resource page. They’ll tell you more than I can in words and screenshots. While watching them, it became clear to me that CodeWhisperer shines on code that calls AWS APIs.
Nevertheless, here's a screenshot of the generated code from CodeWhisperer:
Note that the generated code did not work, and also that CodeWhisperer generated multiple functions for more or less the same purpose. It was fairly close, however, and CodeWhisperer kind of read my mind about using Beautiful Soup to scrape the web page.
Security scans and open source references
I ran the CodeWhisperer security scan on my project, as well. Unsurprisingly, no issues were detected in this file, although the CodeWhisperer scan found a resource leak in another file generated by GitHub Copilot X. The CodeWhisperer security scan is a differentiating feature that hasn’t yet surfaced in the competing AI code generators.
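For readers unfamiliar with the term, a resource leak in Python typically means a file, socket, or similar handle that is opened but not reliably closed. The sketch below is a generic illustration of the pattern, not the code that CodeWhisperer actually flagged; the function names and the temporary file are made up for the example.

```python
import os
import tempfile


def count_lines_leaky(path):
    # Leak-prone: the file is never closed explicitly, so the handle stays
    # open until the interpreter happens to reclaim it.
    f = open(path)
    return len(f.readlines())


def count_lines_safe(path):
    # The with-statement closes the file on exit, even if reading raises.
    with open(path) as f:
        return len(f.readlines())


def demo():
    """Run both versions against a throwaway three-line file."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "sample.txt")
        with open(path, "w") as out:
            out.write("one\ntwo\nthree\n")
        return count_lines_leaky(path), count_lines_safe(path)


print(demo())
```

Both functions return the same count; the difference is only in whether the file handle is released deterministically.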
This file did not trigger any open source references, but when I opened the CodeWhisperer Reference Log, I found a link to the setting that controls whether CodeWhisperer will suggest code that has such references. In the Amazon examples I viewed, code suggestions with references displayed the license type at the top.
The bottom line for CodeWhisperer
CodeWhisperer generates code in an IDE and does security reviews, but it lacks a chat window and code explanations. The tool works more or less as advertised, and the price (free) is right for individual developers. For corporate developers, the $19 per month price for CodeWhisperer is the same as for Copilot. I’d favor CodeWhisperer over Copilot for shops that use AWS heavily and shops that need to know when code suggestions refer to open source.
Google Bard
Bard support for programming was announced April 21, 2023. The blog post mentioned “more than 20 programming languages including C++, Go, Java, Javascript, Python and Typescript.”
When I asked Bard for the full list, I received the following response:
Bard supports over 20 programming languages, including:
- C++
- Go
- Java
- JavaScript
- Python
- TypeScript
- C#
- PHP
- Ruby
- Swift
- Kotlin
- Rust
- Julia
- R
- Haskell
- Erlang
- Scala
- Lua
- OCaml
- Scheme
Bard can generate code in these languages, debug code, and explain code. It can also translate from one programming language to another.
Please note that Bard is still under development, so its capabilities may not be perfect. It is always a good idea to review AI-generated code carefully before using it.
I tested Bard on my code explanation and code generation tasks, which are shown below. The code explanation was quite good—so good that I wondered whether Bard had been trained on that MIT course code sample, which dates from 2008 and has been available on the web since then.
The code generation for the site scraper looked fairly credible at first glance but had several obvious problems on further inspection. It was a matter of clicking twice to open the code in Colab, but it threw a type error when I tried to run it, and it didn’t even get as far as the code I had identified as wrong.
Rather than fixing the problems, I tried running the code generation again, and the result made a bit more sense. It threw an attribute error when I tried to run it in Colab, however. I fed that traceback back to Bard, it generated a fix, I ran the “fixed” code in Colab, and there was a type error in a different spot. Enough, already! If I were actually a naive programmer, I would have thrown up my hands.
What about citing web pages and open source code? The Bard FAQ had this to say:
Bard, like some other standalone LLM experiences, is intended to generate original content and not replicate existing content at length. We’ve designed our systems to limit the chances of this occurring, and we will continue to improve how these systems function. If Bard does directly quote at length from a webpage, it cites that page.
Sometimes the same content may be found on multiple webpages and Bard attempts to point to a popular source. In the case of citations to code repositories, the citation may also reference an applicable open source license.
Here is the instruction I gave to Bard and the resulting dialog:
I'm about to give you some code to explain. Please wait until the code is complete.
Sure, I will wait until you are finished giving me the code. Please feel free to take your time. I am here to help you in any way that I can.
Here are some tips for explaining code:
• Start by explaining the overall purpose of the code. What does it do? What problem is it trying to solve?
• Then, break down the code into smaller chunks and explain each one in turn.
• Use clear and concise language. Avoid jargon and technical terms.
• Use examples to illustrate your points.
• Be patient and don't be afraid to ask questions.
I am confident that I can help you explain any code that you give me. Just let me know what you need and I will do my best to help.
And here is the code I provided:
import pylab, random

class Stock(object):
    def __init__(self, price, distribution, vol):
        self.price = price
        self.history = [price]
        self.distribution = distribution
        self.vol = vol
        self.lastChangeInfluence = 0.0
    def setPrice(self, price):
        self.price = price
        self.history.append(price)
    def getPrice(self):
        return self.price
    def makeMove(self, bias, mo):
        oldPrice = self.price
        baseMove = self.distribution(self.vol) + bias
        self.price = self.price * (1.0 + baseMove)
        self.price += mo*random.choice([0.0, 1.0])*self.lastChangeInfluence
        self.history.append(self.price)
        change = self.price - oldPrice
        if change >= 0:
            self.lastChangeInfluence = min(change, oldPrice*0.01)
        else:
            self.lastChangeInfluence = max(change, -oldPrice*0.01)
    def showHistory(self, fig, test):
        pylab.figure(fig)
        pylab.plot(self.history)
        pylab.title('Closing Prices, Test ' + test)
        pylab.xlabel('Day')
        pylab.ylabel('Price')

class SimpleMarket(object):
    def __init__(self, numStks, volUB):
        self.stks = []
        self.bias = 0.0
        for n in range(numStks):
            volatility = random.uniform(0, volUB)
            distribution = lambda vol: random.gauss(0.0, vol)
            stk = Stock(100.0, distribution, volatility)
            self.addStock(stk)
    def addStock(self, stk):
        self.stks.append(stk)
    def setBias(self, bias):
        self.bias = bias
    def getBias(self):
        return self.bias
    def getStocks(self):
        return self.stks[:]
    def move(self, mo):
        prices = []
        for s in self.stks:
            s.makeMove(self.bias, mo)
            prices.append(s.getPrice())
        return prices

class Market(SimpleMarket):
    def __init__(self, numStks, volUB, dailyBiasRange):
        SimpleMarket.__init__(self, numStks, volUB)
        self.dailyBiasRange = dailyBiasRange
    def move(self, mo):
        prices = []
        dailyBias = random.gauss(self.dailyBiasRange[0], self.dailyBiasRange[1])
        for s in self.stks:
            s.makeMove(self.bias + dailyBias, mo)
            prices.append(s.getPrice())
        return prices

def simMkt(mkt, numDays, mo):
    endPrices = []
    for i in range(numDays):
        vals = mkt.move(mo)
        vals = pylab.array(vals)
        mean = vals.sum()/float(len(vals))
        endPrices.append(mean)
    return endPrices

def plotAverageOverTime(endPrices, title):
    pylab.plot(endPrices)
    pylab.title(title)
    pylab.xlabel('Days')
    pylab.ylabel('Price')

def plotDistributionAtEnd(mkt, title, color):
    prices = []
    sumSoFar = 0
    for s in mkt.getStocks():
        prices.append(s.getPrice())
        sumSoFar += s.getPrice()
    mean = sumSoFar/float(len(prices))
    prices.sort()
    pylab.plot(prices, color)
    pylab.axhline(mean, color = color)
    pylab.title(title)
    pylab.xlabel('Stock')
    pylab.ylabel('Last Sale')
    pylab.semilogy()

def runTrial(showHistory, test, p):
    colors = ['b','g','r','c','m','y','k']
    mkt = Market(p['numStocks'], p['volUB'], p['dailyBiasRange'])
    mkt.setBias(p['bias'])
    endPrices = simMkt(mkt, p['numDays'], p['mo'])
    pylab.figure(1)
    plotAverageOverTime(endPrices, 'Average Closing Prices')
    pylab.figure(2)
    plotDistributionAtEnd(mkt, 'Distribution of Prices', colors[test%len(colors)])
    if showHistory:
        for s in mkt.getStocks():
            s.showHistory(test+2, str(test))

def runTest(numTrials):
    #Constants used in testing
    numDaysPerYear = 200.0
    params = {}
    params['numDays'] = 200
    params['numStocks'] = 500
    params['bias'] = 0.1/numDaysPerYear #General market bias
    params['volUB'] = 12.0/numDaysPerYear #Upper bound on volatility for a stock
    params['mo'] = 1.1/numDaysPerYear #Momentum factor
    params['dailyBiasRange'] = (0.0, 4.0/200.0)
    for t in range(1, numTrials+1):
        runTrial(True, t, params)

runTest(3)
pylab.show()
Here is some of Bard's response, which was quite in depth:
The code you have provided is a Python program that simulates a stock market. The program defines three classes:
Stock: This class represents a single stock. It has properties for the stock's price, history, distribution, volatility, and last change influence.
SimpleMarket: This class represents a simple stock market. It has a list of stocks, a bias, and a volatility upper bound.
Market: This class inherits from SimpleMarket and adds a daily bias range property.
The program also defines several functions:
• simMkt: This function simulates the market for a given number of days and momentum factor.
• plotAverageOverTime: This function plots the average closing price of the stocks over time.
• plotDistributionAtEnd: This function plots the distribution of prices at the end of the simulation.
• runTrial: This function runs a single simulation and plots the results.
• runTest: This function runs a number of simulations and plots the results.
The program is run by calling the runTest function with the desired number of simulations. The program will then generate three plots:
• A plot of the average closing price of the stocks over time.
• A plot of the distribution of prices at the end of the simulation.
• A plot of the history of prices for each stock.
The program can be used to simulate a variety of different stock markets. By changing the values of the parameters, you can change the volatility of the market, the bias of the market, and the momentum factor. You can also use the program to simulate different trading strategies. For example, you could simulate a strategy that buys stocks when they are undervalued and sells them when they are overvalued.
Bard also provided an explanation of how each part of the code worked:
• The Stock class represents a single stock. It has properties for the stock's price, history, distribution, volatility, and last change influence. The price property is the current price of the stock. The history property is a list of the stock's prices over time. The distribution property is a function that generates random prices for the stock. The volatility property is a measure of how much the stock's price can fluctuate. The lastChangeInfluence property is a measure of how much the stock's price is affected by its previous change.
• The SimpleMarket class represents a simple stock market. It has a list of stocks, a bias, and a volatility upper bound. The list of stocks property is a list of the stocks in the market. The bias property is a measure of how much the market is biased towards rising prices. The volatility upper bound property is a measure of how much the market can fluctuate.
• The Market class inherits from SimpleMarket and adds a daily bias range property. The daily bias range property is a measure of how much the market's bias can change from day to day.
• The simMkt function simulates the market for a given number of days and momentum factor. The number of days parameter is the number of days to simulate. The momentum factor parameter is a measure of how much the prices of stocks are affected by their momentum.
• The plotAverageOverTime function plots the average closing price of the stocks over time. The plotDistributionAtEnd function plots the distribution of prices at the end of the simulation.
• The runTrial function runs a single simulation and plots the results. The showHistory parameter is a boolean value that determines whether the history of prices for each stock is plotted. The test parameter is a unique identifier for the simulation.
• The runTest function runs a number of simulations and plots the results. The numTrials parameter is the number of simulations to run.
I hope this explanation is helpful. Please let me know if you have any other questions.
When I followed up, typing in, "where is the pylab module?" Bard responded with the location of the module, how to import it, a description of features, and a pointer to the documentation.
Additionally, here is the Bard-generated code for scraping InfoWorld.com for a list of articles:
import requests
from bs4 import BeautifulSoup

def scrape_infoworld():
    """Scrape the InfoWorld front page and return a list of articles."""

    # Get the HTML from the InfoWorld front page.
    response = requests.get("https://www.infoworld.com/")
    soup = BeautifulSoup(response.content, "html.parser")

    # Find all articles by looking for links with 'article' in the href.
    articles = soup.find_all("a", href=lambda x: "article" in x["href"])

    # Extract the title, author, and date from each article.
    article_data = []
    for article in articles:
        title = article.find("h3").text
        author = article.find("span", class_="byline").text
        date = article.find("time").text

        article_data.append({
            "title": title,
            "author": author,
            "date": date,
        })

    # Sort the articles alphabetically by title.
    article_data.sort(key=lambda x: x["title"])

    # Remove any duplicate articles.
    article_data = list(set(article_data))

    # Sort the articles alphabetically by author last name.
    article_data.sort(key=lambda x: x["author"].split()[-1])

    # Sort the articles latest first.
    article_data.sort(key=lambda x: x["date"], reverse=True)

    return article_data

if __name__ == "__main__":
    articles = scrape_infoworld()

    # Print the articles.
    for article in articles:
        print(article)
Explanation:
This code will first get the HTML from the InfoWorld front page. Then, it will find all articles by looking for links with 'article' in the href. For each article, it will extract the title, author, and date. The articles will then be sorted alphabetically by title, removed of any duplicates, sorted alphabetically by author last name, and sorted latest first. Finally, the articles will be printed.
And here is the runtime result:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-afbe7808577e> in <cell line: 40>()
39
40 if __name__ == "__main__":
---> 41 articles = scrape_infoworld()
42
43 # Print the articles.
6 frames
<ipython-input-1-afbe7808577e> in <lambda>(x)
10
11 # Find all articles by looking for links with 'article' in the href.
---> 12 articles = soup.find_all("a", href=lambda x: "article" in x["href"])
13
14 # Extract the title, author, and date from each article.
TypeError: string indices must be integers
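The TypeError follows from how Beautiful Soup treats a function passed as an attribute filter: it calls the function with the attribute's value (a string, or None when the tag has no such attribute), not with the tag itself. The generated lambda therefore ends up indexing a string with the key "href". The sketch below reproduces that without network access or a Beautiful Soup import by calling the filters directly on sample values; the name `href_has_article` and the sample hrefs are illustrative. It also shows a second problem in the generated code that the run never reached: `set()` cannot hold dictionaries.

```python
# The filter as generated: it assumes x is a tag-like mapping.
buggy_filter = lambda x: "article" in x["href"]


def href_has_article(href):
    """Filter written for what is actually passed in: the href value."""
    return href is not None and "article" in href


# Values of the kind an href filter receives: strings, or None if href is absent.
sample_hrefs = ["/article/123/some-story.html", "/about", None]

try:
    buggy_filter(sample_hrefs[0])
except TypeError as exc:
    print("buggy filter raised", type(exc).__name__)

print([h for h in sample_hrefs if href_has_article(h)])

# Second latent bug: dicts are unhashable, so a set of dicts also raises TypeError.
records = [{"title": "A"}, {"title": "A"}, {"title": "B"}]
try:
    list(set(records))
except TypeError as exc:
    print("set of dicts raised", type(exc).__name__)

# One workaround: key a dict on the field that defines a duplicate.
unique = list({r["title"]: r for r in records}.values())
print(unique)
```

With Beautiful Soup installed, the corrected filter would be passed as `soup.find_all("a", href=href_has_article)`. Whether the rest of the generated code would then work depends on the page's actual markup, which this sketch does not address.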
The bottom line for Bard
Bard has a chat interface and both generates and explains code, but it doesn't have an interactive IDE integration. Given that Bard is completely free at this time, and requires no setup, it’s certainly worth trying when you need to generate, debug, and explain code in any of the 20 supported languages.
GitHub Copilot X
GitHub Copilot X is greatly improved over the original GitHub Copilot, and can sometimes generate a correct function and set of tests without much human help. It still makes mistakes and hallucinates (generates false information), but not nearly as much as it once did.
In addition to generating code within a programming editor (at present, only the latest versions of Visual Studio and the latest Insiders build of Visual Studio Code), Copilot X adds a GPT-4 chat panel to the editor. It also adds a terminal interface, support for generating unit tests and pull request descriptions, and the ability to extract explanations from documentation.
I asked the Copilot X chat what programming languages it supports, and it answered “I support a wide range of programming languages, including but not limited to: Python, JavaScript, TypeScript, Ruby, Java, C++, C#, PHP, Go, Swift, Kotlin, Rust, and many more.” I did my testing primarily in Python.
When I used the Copilot Chat facility to ask Copilot X to explain the MIT market simulation code, it gave a partially correct answer. I had to metaphorically pull its teeth to get it to explain the rest of the code.
Copilot X's most notable failure was the web-scraping code generation task. The tool generated a bunch of superficially credible-looking code that didn’t use Beautiful Soup, but it was clear from reviewing the code that it would never work. I kept bringing the problems to Copilot Chat, but it just dug itself a deeper hole. I could probably have started over and given it better hints, including handing it an import from bs4 and adding some comments showing the HTML and directory structure of the InfoWorld home page. I didn't do it because that would not be in character for the naive coder persona I had adopted for this round of tests.
As with all AI helpers, you have to take the code generated by Copilot X with a huge grain of salt, just as you would for a pull request from an unknown programmer.
The bottom line for Copilot X
In addition to generating code within an IDE, Copilot X adds a GPT-4 chat panel to the editor. It also adds a terminal interface, support for unit test generation, support for generating pull request descriptions, and the ability to extract explanations from technical documentation. Again, Copilot X is a set of technical preview features that are not yet available in a Copilot product offering; their product form and pricing remain to be seen. GitHub Copilot currently costs $10 per month for individuals and $19 per user per month for businesses.
Conclusion
GitHub Copilot X works decently on simple problems, but not necessarily better than the combination of Amazon CodeWhisperer in a code editor and Google Bard in a browser. It’s too bad that CodeWhisperer doesn’t yet have a chat capability or the facility for explaining code, and it’s too bad that Bard doesn’t exactly integrate with an editor or IDE.
I’d be tempted to recommend Copilot X (i.e., the forthcoming version of GitHub Copilot based on GPT-4, whatever the name turns out to be) if it hadn’t gone off the rails on my advanced code generation task—mainly because it integrates chat and code generation in an editor. At this point, however, Copilot X isn’t quite ready. Overall, none of the code generation products are really up to snuff, although both Bard and Copilot X do a decent job of code explanation.
All of these products are in active development, so my recommendation is to keep watching them and experimenting, but don't put your faith in any of them just yet.