The launch of Microsoft’s new AI-powered Bing shone new light on the company’s investments in OpenAI’s large language models and in generative AI, turning them into a consumer-facing service. Early experiments with the service quickly revealed details of the predefined prompts that Microsoft was using to keep the Bing chatbot focused on delivering search results.
Large language models, like OpenAI’s GPT series, are best thought of as prompt-and-response tools. You give the model a prompt, and it responds with a series of words that fits the content, the style, and in some cases even the mood of the prompt. The models are trained on large amounts of data and then fine-tuned for a specific task. By providing a well-designed prompt and limiting the size of the response, you can reduce the risk of the model producing output that is grammatically correct but factually wrong.
Introducing prompt engineering
Microsoft’s Bing prompts showed that it was being constrained to simulate a helpful personality that would construct content from search results, using Microsoft’s own Prometheus model as a set of additional feedback loops to keep results on topic and in context. What’s perhaps most interesting about these prompts is that it’s clear Microsoft has been investing in a new software engineering discipline: prompt engineering.
It’s an approach that you should invest in too, especially if you’re working with Microsoft’s Azure OpenAI APIs. Generative AIs, like large language models, are going to be part of the public face of your application and your business, and you’re going to need to keep them on brand and under control. That requires prompt engineering: designing an effective configuration prompt, tuning the model, and ensuring user prompts don’t result in unwanted outputs.
Both Microsoft and OpenAI provide sandbox environments where you can build and test base prompts. You can paste in a prompt body, add sample user content, and see the typical output. Although there’s an element of randomness in the model, you’ll get similar outputs for a given input, so you can test out the features and construct the “personality” of your model.
This approach is not just necessary for chat- and text-based models; you’ll need some aspect of prompt engineering in a Codex-based AI-powered developer tool or in a DALL-E image generator being used for slide clip art or as part of a low-code workflow. Adding structure and control to prompts keeps generative AI productive, helps avoid errors, and reduces the risk of misuse.
Using prompts with Azure OpenAI
It’s important to remember that, beyond the prompt, you have other tools to control both context and consistency with large language models. One option is to limit the number of tokens that can be used in an interaction, constraining the length of the response (or, in a ChatGPT-based system, the responses). This keeps replies concise and less likely to go off topic.
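The idea of a token budget can be sketched in a few lines. This is a deliberately crude illustration: it stands in a whitespace split for the model’s real subword tokenizer, and in practice you would simply set the max_tokens parameter on the API request rather than truncate text yourself.

```python
def cap_tokens(text: str, max_tokens: int) -> str:
    """Keep only the first max_tokens whitespace-separated tokens.

    A rough stand-in for a real token budget: production code passes
    max_tokens to the completion API, and the model's own subword
    tokenizer does the counting.
    """
    tokens = text.split()
    return " ".join(tokens[:max_tokens])
```

A tighter budget produces a shorter, more focused response, at the cost of possibly cutting an answer off mid-thought.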
Working with the Azure OpenAI APIs is a relatively simple way to integrate large language models into your code, but the APIs reduce every interaction to delivering strings, so what’s needed is a way to manage those strings. Applying prompt engineering disciplines to your application takes a lot of code, with patterns and practices that go well beyond basic question-and-answer interactions.
Manage prompts with Prompt Engine
Microsoft has been working on an open source project, Prompt Engine, to manage prompts and deliver the expected outputs from a large language model, with JavaScript, C#, and Python releases all in separate GitHub repositories. All three have the same basic functionality: to manage the context of any interaction with a model.
If you’re using the JavaScript version, there’s support for three different classes of model: a generic prompt-based model, a code model, and a chat-based system. It’s a useful way to manage the various components of a well-designed prompt, supporting both your own inputs and user interactions (including model responses). That last part is important as a way of managing context between interactions, ensuring that state is preserved in chats and between lines of code in an application.
You get the same options from the Python version, allowing you to quickly use the same processes as JavaScript code. The C# version only offers generic and text analysis model support, but these can easily be repurposed for your choice of applications. The JavaScript option is good for web applications and Visual Studio Code extensions, whereas the Python tool is a logical choice for anyone working with many different machine learning tools.
The intent is to treat the large language model as a collaborator with the user, allowing you to build your own feedback loops around the AI, much like Microsoft’s Prometheus. By having a standard pattern for working with the model, you’re able to iterate around your own base prompts by tracking outputs and refining inputs where necessary.
Managing GPT interactions with Prompt Engine
Prompt Engine installs as a library from familiar repositories like npm and pip, with sample code in its GitHub repositories. Getting started is easy enough once you’ve imported the appropriate modules. Start with a Description of your prompt, followed by some example Interactions. For example, if you’re turning natural language into code, each Interaction is a pair: a sample query followed by the expected output code in the language you’re targeting.
You should provide several Interactions to build the most effective prompt. The default target language is Python, but you can configure your choice of language using a CodeEngineConfig call.
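The pattern is straightforward to sketch. The class and field names below are illustrative only, not Prompt Engine’s actual API (the real library exposes its own CodeEngine, Interaction, and CodeEngineConfig types): a description and a set of query/code Interaction pairs are serialized into a comment-prefixed prompt, with the user’s query appended at the end for the model to complete.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    query: str     # natural-language request
    response: str  # the code the model should produce for it


@dataclass
class CodeEngineSketch:
    description: str
    examples: list
    comment_prefix: str = "#"  # Python-style comments; swap in "//" to target JavaScript

    def build_prompt(self, user_query: str) -> str:
        # Description and queries become comments; responses stay as code.
        lines = [f"{self.comment_prefix} {self.description}"]
        for ex in self.examples:
            lines.append(f"{self.comment_prefix} {ex.query}")
            lines.append(ex.response)
        lines.append(f"{self.comment_prefix} {user_query}")
        return "\n".join(lines)


# Hypothetical usage: the prompt ends with the new query as a comment,
# inviting the model to continue with matching code.
engine = CodeEngineSketch(
    description="Natural language commands to math code",
    examples=[Interaction("what's 10 plus 18", "print(10 + 18)")],
)
prompt = engine.build_prompt("what's 4 squared?")
```

The resulting string is what gets sent to the completion API; the examples teach the model both the task and the expected output format.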
With a target language and a set of samples, you can now build a prompt from a user query. The resulting prompt string can be used in a call to the Azure OpenAI API. If you want to keep context with your next call, simply add the response to a new Interaction, and it will carry across to the next call. As it’s not part of the original sample Interactions, it won’t persist beyond the current user session and can’t be used by another user or in another call. This approach simplifies building dialogs, though it’s important to keep track of the total tokens used so your prompt doesn’t overrun the token limits of the model. Prompt Engine includes a way to ensure prompt length doesn’t exceed the maximum token number for your current model and prunes older dialogs where necessary. This approach does mean that dialogs can lose context, so you may need to help users understand there are limits to the length of a conversation.
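The pruning behavior described above can be sketched as a simple function. This is an approximation of the technique, not Prompt Engine’s implementation: it drops the oldest dialog turns first until the running total fits the budget, using a whitespace split in place of a real tokenizer.

```python
def prune_to_budget(dialog, max_tokens):
    """Drop the oldest (query, response) pairs until the dialog fits the budget.

    dialog: list of (user_query, model_response) tuples, oldest first.
    Token counts are approximated by whitespace splitting; a real
    implementation would use the model's tokenizer.
    """
    def cost(turn):
        return len(turn[0].split()) + len(turn[1].split())

    kept = list(dialog)
    while kept and sum(map(cost, kept)) > max_tokens:
        kept.pop(0)  # oldest turns go first, so recent context survives
    return kept
```

Because earlier turns are discarded, long conversations gradually lose their opening context, which is why users may need to be told a conversation has a practical length limit.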
If you’re explicitly targeting a chat system, you can configure user and bot names with a contextual description that includes bot behaviors and tone that can be included in the sample Interactions, again passing responses back to Prompt Engine to build context into the next prompt.
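A chat-style prompt of this kind can be assembled in much the same way. Again, the function and its defaults are a sketch of the pattern rather than Prompt Engine’s real API: a behavior description leads the prompt, prior turns are labeled with the configured names, and the prompt ends with the bot’s name so the model completes its next line.

```python
def build_chat_prompt(bot_description, turns, user_query,
                      user_name="USER", bot_name="BOT"):
    """Assemble a chat prompt from a behavior description and prior turns.

    turns: list of (user_text, bot_text) pairs carried over from earlier
    calls, giving the model the conversation's context.
    """
    lines = [bot_description]
    for user_text, bot_text in turns:
        lines.append(f"{user_name}: {user_text}")
        lines.append(f"{bot_name}: {bot_text}")
    lines.append(f"{user_name}: {user_query}")
    lines.append(f"{bot_name}:")  # trailing label cues the model's reply
    return "\n".join(lines)
```

Feeding each model response back in as a new turn is what keeps the persona and the conversation state consistent from call to call.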
You can use cached Interactions to add a feedback loop to your application, for example, looking for unwanted terms and phrases, or using the user rating of the response to determine which Interactions persist between prompts. Logging successful and unsuccessful prompts will allow you to build a more effective default prompt, adding new examples as needed. Microsoft suggests building a dynamic bank of examples that can be compared to the queries, using a set of similar examples to dynamically generate a prompt that approximates your user’s query and hopefully generates more accurate output.
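One way to approximate that dynamic example bank is a simple similarity ranking. The sketch below scores stored examples against the user’s query by word overlap (Jaccard similarity); a production system would more likely use embeddings, but the selection logic is the same: pick the k closest examples and build the prompt from those.

```python
def select_examples(bank, query, k=2):
    """Return the k stored examples most similar to the query.

    bank: list of (example_query, example_response) pairs.
    Similarity is Jaccard overlap of lowercased words, a crude
    stand-in for embedding-based similarity.
    """
    q = set(query.lower().split())

    def score(item):
        e = set(item[0].lower().split())
        return len(q & e) / len(q | e) if q | e else 0.0

    return sorted(bank, key=score, reverse=True)[:k]
```

The selected examples then become the Interactions for a freshly built prompt, so each query is answered with the few-shot examples most relevant to it.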
Prompt Engine is a simple tool that helps you construct an appropriate pattern for building prompts. It’s an effective way to manage the limitations of large language models like GPT-3 and Codex, and at the same time to build the necessary feedback loops that help avoid a model behaving in unanticipated ways.