Large language model-based AI systems are here to stay. While they’re unlikely to be the foundation of world-changing artificial general intelligence, they look set to become an important part of the modern software development toolkit. That’s because large language models (LLMs) not only can help us code, but can be used in our code, giving us capabilities that would demand impractical amounts of processing power if built with older techniques.
LLMs as natural language interfaces
Perhaps most important is the role large language models can play in providing natural language interfaces to applications, as Microsoft CEO Satya Nadella noted in his Inspire 2023 keynote to partners. Models like OpenAI’s GPT-4 are powerful tools for extracting meaning and sentiment from user inputs, quickly summarizing content, and delivering content in a surprising range of formats.
Once we start focusing on what we can use tools like GPT for, they stop being an interesting novelty and the powerful role they can play in our toolchains becomes apparent. After all, there are only so many times you can ask ChatGPT to “Tell me a story.”
Instead, we need to think of LLMs as a way of capturing user requests, parsing them, and then pushing them to APIs. It’s simple enough to use prompt engineering to get the outputs we want, for example generating formatted lists that contain key elements from user requests. But we can go further. We can take those outputs, parse them, and then use them in our applications, passing them to familiar APIs.
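As a minimal sketch of that text-first approach, assuming a hypothetical callLlm helper for the endpoint call; the prompt wording and the parsing rules here are illustrative, not a published recipe:

```typescript
// Hypothetical helper that sends a prompt to your LLM endpoint and
// returns the raw text completion.
declare function callLlm(prompt: string): Promise<string>;

async function extractItems(request: string): Promise<string[]> {
  const prompt =
    `List the items ordered in the following request, one per line, ` +
    `in the form "quantity x item".\nRequest: "${request}"`;
  const completion = await callLlm(prompt);
  // Fragile text handling like this is exactly what structured
  // JSON output (discussed below) lets you avoid.
  return completion
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => /^\d+\s*x\s/i.test(line));
}
```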
Using LLMs to generate JSON
But what if we could short-circuit that process and, instead of building complicated regular expression handlers to turn text into data, go straight to a machine-readable format like JSON? As the TypeChat team at Microsoft discovered, that’s surprisingly easy, because much of the training data for LLMs included well-formatted JSON documents. Simply craft a prompt that includes a sample JSON document, and the LLM you’re working with will return its output in the correct format.
So instead of building a traditional user interface for ordering, say, hamburgers, one that uses buttons and forms to collect the order, you could use a series of LLM calls—first to convert speech to text, and then to convert the text to JSON. Then all a customer needs to do is pick up their phone, tap a microphone icon, and ask for “one large cheeseburger with fries.”
That’s what Satya Nadella meant by a natural language interface, one that’s functionally identical to talking to a person. The LLM behind the scenes “understands” the request and then produces an API-ready output that delivers the user request to an ecommerce platform. We can significantly reduce the risk of incorrect outputs by grounding the model, using embeddings from external vector databases to constrain the outputs to a defined set of content, sometimes described as semantic memory.
If you’re using an LLM orchestration tool like LangChain or Semantic Kernel, you can make this grounding part of your application flow by taking a natural language input, summarizing it, and using that summary as an input to a vector search. The LLM takes those search results as part of a prompt and turns them into API-ready JSON for use by your application.
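Here is a hedged sketch of that flow in plain TypeScript, independent of any one orchestration library; all four helper functions are hypothetical stand-ins for the equivalent LangChain or Semantic Kernel building blocks:

```typescript
// Hypothetical building blocks for the grounding flow.
declare function summarize(input: string): Promise<string>;
declare function embed(text: string): Promise<number[]>;
declare function vectorSearch(embedding: number[], topK: number): Promise<string[]>;
declare function completeToJson(prompt: string): Promise<string>;

async function groundedRequest(userInput: string) {
  const summary = await summarize(userInput);
  const matches = await vectorSearch(await embed(summary), 3);
  // The retrieved content constrains the model to a known set of answers.
  const prompt =
    `Using only the following context:\n${matches.join("\n")}\n` +
    `Convert this request to JSON: "${userInput}"`;
  return JSON.parse(await completeToJson(prompt));
}
```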
Constraining LLM outputs to a JSON schema
But what’s really needed is a way to express this all programmatically, so we can write the necessary code to build these interfaces. That’s where the TypeChat library comes in, building on the familiar type definition model and tooling of TypeScript to help manage and parse these natural language connections.
Instead of providing an example JSON structure, as many other prompt engineering-driven approaches do, TypeChat uses type definitions to guide the LLM response. Experiments with handcrafted prompts showed that defining TypeScript-style types and using them to constrain the JSON produced by an LLM gave better responses. Furthermore, those responses could be validated by TypeScript’s transpiler, ensuring that errors could be detected quickly and used as feedback to build better responses in future.
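Returning to the hamburger-ordering example, a TypeChat-style response schema might look something like this; the exact fields are illustrative rather than drawn from Microsoft’s samples:

```typescript
// order.ts: an illustrative TypeChat-style response schema.
// Natural language comments in the schema become part of the prompt,
// steering the model's output.
export interface OrderItem {
  name: "cheeseburger" | "hamburger" | "fries" | "drink";
  size?: "small" | "medium" | "large"; // optional, defaults to medium
  quantity: number;
}

export interface Order {
  items: OrderItem[];
}
```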
Thus TypeChat uses constraints to reduce the risk of random responses from the model, and at the same time elicits JSON data that can be used by your application. The underlying LLM has been, for want of a better word, tamed—directed toward extracting the information your code needs from natural language inputs. As Microsoft puts it, TypeChat is designed to define a “response schema” that extracts intent from user requests and to use that intent as part of an API call.
Using TypeChat in your code
You can install TypeChat from npm, using it as a TypeScript module. The TypeChat development team recommends building your response schema outside of your application code, creating it as an interface and importing it along with the TypeChat library. This approach allows you to maintain your prompts outside the body of your code, making it easier to change your application quickly and add additional features as needed.
It’s easy to get started with TypeChat—all you need is a machine that’s set up with a recent Node.js install. Once you have cloned the TypeChat repository, use npm install to set it up. Microsoft offers a set of examples that can be built to try out different ways of using TypeChat, helping you see how different types of natural language interface can be used in your code. You will need either an OpenAI or Azure OpenAI endpoint, with your chosen model and your API key stored as environment variables or in a .env file that can be read at runtime.
Building a TypeChat interface to a large language model is as simple as configuring the model you’re using from your environment variables, then loading a schema before calling TypeChat’s createJsonTranslator method. You can then call the processRequests method with your user input to deliver your prompt, wait for a response, and extract the response data from the resulting JSON object.
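Based on the patterns in Microsoft’s TypeChat examples, a minimal program might look like the following; exact signatures and import locations can vary between TypeChat releases, and the Order schema is the illustrative one sketched earlier:

```typescript
import fs from "fs";
import path from "path";
import { createLanguageModel, createJsonTranslator, processRequests } from "typechat";
import { Order } from "./order";

// createLanguageModel reads the OpenAI or Azure OpenAI settings from
// your environment variables, as described above.
const model = createLanguageModel(process.env);

// The schema source file itself is passed to the translator so the type
// definitions can be included in the prompt. (__dirname assumes CommonJS.)
const schema = fs.readFileSync(path.join(__dirname, "order.ts"), "utf8");
const translator = createJsonTranslator<Order>(model, schema, "Order");

// processRequests runs the prompt/response loop; translate() returns
// either validated, schema-conforming JSON or an error message.
processRequests("order> ", process.argv[2], async (request) => {
  const response = await translator.translate(request);
  if (!response.success) {
    console.log(response.message);
    return;
  }
  console.log(JSON.stringify(response.data, undefined, 2));
});
```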
What’s nice about the TypeChat approach is that it’s still the familiar TypeScript (and by extension JavaScript) programming model. You’re using standard requests and responses. All you need to do is take in a text object, deliver it to the LLM endpoint, and extract the formatted data that corresponds to the schema you designed earlier, ready for use by your application business logic. There’s nothing new here—the constrained LLM output is like that from any RESTful API.
TypeChat best practices
One useful feature of TypeChat is the ability to have an “unknown” category in your schema that serves as an escape hatch. If a user’s request can’t be interpreted, the LLM response can be delivered here and used as a trigger to ask the user to refine the original query. At the same time, the unknown category can serve as a dump for spurious outputs. It’s worth logging the contents of this field in the JSON output to monitor overall accuracy and to refine your type schema where necessary.
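In schema terms, the escape hatch can be a dedicated type mixed into your results, in the style of TypeChat’s own sample schemas; the names here extend the illustrative Order schema from earlier:

```typescript
// Extending the illustrative Order schema with an "unknown" escape hatch.
export interface UnknownText {
  type: "unknown";
  text: string; // the part of the request that couldn't be interpreted
}

export interface Order {
  // Anything the model can't map to a known item lands in UnknownText,
  // where it can be logged and used to prompt the user to rephrase.
  items: (OrderItem | UnknownText)[];
}
```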
Other best practices include ensuring that your types are compatible with JSON, and keeping them simple. Natural language comments in your schema help build your prompts. The documentation compares using TypeChat to the MVVM (Model-View-ViewModel) pattern, describing TypeChat as a Response Model that plays the same bridging role between business logic code and user interfaces, with the LLM becoming the equivalent of the UI.
The combination of LLMs and TypeChat is effectively a new type of machine translation, but instead of linking human to human, it’s linking human to machine (and as we’ve seen with Semantic Kernel, vice versa). However, it shouldn’t be used as a shortcut to avoid user research. If you’re going to extract meaning from natural language user inputs, in speech or in text, you will still need an understanding of what users want from the system and how they ask for it. That will give you the structure you need to refine your response schema, and to build the necessary TypeChat code.
While the current version of TypeChat works with Azure OpenAI and OpenAI’s own APIs, it’s intended to be model-neutral. That should allow you to use it with Hugging Face transformers and Meta’s Llama 2. Because TypeChat is an open-source project with an MIT license, you have full access to the code on GitHub, and you can deliver any modifications you make as pull requests.