Llama Chat and Code Llama are good at coding

Llama 2 Chat can generate and explain Python code quite well, right out of the box. Code Llama’s fine-tuned models offer even better capabilities for code generation.

1 2 Page 2
Page 2 of 2

This model took my request for tests a little more seriously, and demonstrated how to use the Python Unit Test module.

I wondered how different the output of the Python-tuned Code Llama 7B model would be:

martinheller@Martins-M1-MBP ~ % ollama run codellama:7b-python
pulling manifest
pulling 1dd2448f61e7... 100% |████████████████████████████████████████████| (3.8/3.8 GB, 48 MB/s)
pulling 2c8743bdc4ad... 100% |███████████████████████████████████████████████| (7.0/7.0 kB, 52 MB/s)
pulling 38fa20ee7daa... 100% |███████████████████████████████████████████████| (4.8/4.8 kB, 64 MB/s)
pulling b507b9c2f6ca... 100% |██████████████████████████████████████████████████| (13/13 B, 60 kB/s)
pulling 404e21afdc6a... 100% |█████████████████████████████████████████████████| (30/30 B, 240 kB/s)
pulling b413591f8542... 100% |███████████████████████████████████████████████| (507/507 B, 3.2 MB/s)
verifying sha256 digest
writing manifest
success
>>> Generate a Python program to scrape a website. Include tests for python.org and infoworld.com
"""

import urllib.request, html.parser
class Website(object):
    """
    Class used to store the data scraped from the website
    :param str title: Title of article/page
    :param str author: Author of article/page
    :param str date: Date that the article was published on
    :param str content: Body text of article
    """
    
    def __init__(self, title=None, author=None, date=None, content=None):
        self.title = title #title of post/article
        self.author = author #author of the post or article
        self.date = date #date that the article was published on
        self.content = content #the body text of the post/article
class Article(object):
    """
    Class used to store data scraped from an infowars article
    :param str title: Title of article/page
    :param str author: Author of article/page
    :param str date: Date that the article was published on
    :param str content: Body text of article
    """
    
    def __init__(self, title=None, author=None, date=None, content=None):
        self.title = title #title of post/article
        self.author = author #author of the post or article
        self.date = date #date that the article was published on
        self.content = content #the body text of the post/article
def scrape(website_url, htmlparser=html.parser):
    """
    Function used to gather all the data from a website and return it in a form for use later

    :param str url: The URL of the website you are scraping from
    :param object parser: Defaults to the built-in html.parser but can be changed depending on the website's code style
    """
    
    #create website object and assign it to a variable called 'website'
    website = Website()
    
    #open url using urlopen function from urllib.request module
    with urllib.request.urlopen(website_url) as url:
        #use read function to get the contents of the page and store it in a variable called 'content'
        content = url.read()
        
        #use decode_content=True argument for html parsing module to be able to parse the url content
        parsed_content = htmlparser(markup=content, convert_charrefs=True, decode_entities=False, decode_content=True)
        print("\n")
        
        #search through content of webpage and find all instances where a tag is used to bold/italicize text in the article body text
        title = parsed_content.find("h1", {"class": "entry-title"})
        author = parsed_content.find("div", {"class": "author-info"}).text
        date = parsed_content.find("span", {"class": "updated"}).text
        content = parsed_content.findAll("p")[3].text + '\n' + parsed_content.findAll("p")[4].text + '\n' + parsed_content.findAll("p")[5].text
        
        #store all the data gathered from the website into variables in the 'website' object created earlier
        website.title = title.text
        website.author = author
        website.date = date
        website.content = content
        return website

All of a sudden, we see Python classes and a function that uses them. This run forgot to generate the unit tests. Let’s try the same prompt and model again and see if we get different output.

martinheller@Martins-M1-MBP ~ % ollama run codellama:7b-python
>>> Generate a Python program to scrape a website. Include tests for python.org and infoworld.com
.
#Python program to scrap data from website#
import requests
from bs4 import BeautifulSoup as soup
url_1 = 'http://python.org/'
reqs_1 = requests.get(url_1)
url_2 = 'https://www.infoworld.com/article/3067955/programming-language/python-the-best-programming-language.html'
reqs_2 = requests.get(url_2)
#HTML parsing from website python.org#
page_1 = reqs_1.text
#HTML parsing from website infoworld.com#
page_2 = reqs_2.text
#BeautifulSoup object for website python.org#
soup_1 = soup(page_1, 'html.parser')
#BeautifulSoup object for website infoworld.com#
soup_2 = soup(page_2,'html.parser')
def scrape():
    #Dictionary to store data from both websites python.org and infoworld.com#
    website_data = {}
    #Scraping all the text from headings of a website#
    headlines = soup_1.find_all('div',{'class':'headline'})
    for line in headlines:
        data = line.text
        if "Python" in data or 'Programming' in data:
            website_data['headlines'] = [data]
    
    #Scraping all the text from paragraphs of a website#
    paragrphs = soup_1.find_all('p')
    for line in paragrphs:
        data = line.text
        if "Python" in data or 'Programming' in data:
            website_data['paras'] = [data]
    
    #Scraping all the text from headings of a website#
    headlines2 = soup_2.find_all('p')
    for line in headlines2:
        data = line.text
        if "Python" in data or 'Programming' in data:
            website_data['headlines'] = [data]
    
    #Scraping all the text from paragraphs of a website#
    paragrphs2 = soup_1.find_all('p')
    for line in paragrphs2:
        data = line.text
        if "Python" in data or 'Programming' in data:
            website_data['paras'] = [data]
    
    #Return the dictionary of all text scraped from both websites python.org and infoworld.com#
    return website_data

This time the model generated code in a functional style instead of an object-oriented style. It did something ugly, though: Instead of using the home page of infoworld.com for its second test, it used the URL of an article about the Python programming language. Alas, that page does not currently exist, so we may have either bumped up against old content in the model, or the model had a hallucination.

Two Llamas for coding

As you’ve seen, Llama 2 Chat can generate and explain Python code quite well, right out of the box. There’s no need to fine-tune it further on code-generation tasks—although Meta has done exactly that for Code Llama.

Llama 2 Chat is not without controversy, however. Meta says that it’s open source, but the OSI begs to disagree, on two counts. Meta says that it’s more ethical and safer than other LLMs, but a class action lawsuit from three authors says that its training has violated their copyrights.

It’s nice that Llama 2 Chat works so well. It’s troubling that to train it to work well Meta may have violated copyrights. Perhaps, sooner rather than later, someone will find a way to train generative AIs to be effective without triggering legal problems.

Code Llama’s nine fine-tuned models offer additional capabilities for code generation, and the Python-specific versions seem to know something about Python classes and testing modules as well as about functional Python.

When the bigger Code Llama models are more widely available online running on GPUs, it will be interesting to see how they stack up against Llama 2 70B Chat. It will also be interesting to see how well the smaller Code Llama models perform for code completion when integrated with Visual Studio Code or another code editor.

At a Glance

Copyright © 2023 IDG Communications, Inc.

1 2 Page 2
Page 2 of 2