Microsoft CEO Satya Nadella described the arrival of large language AI models like GPT-4 as “a Mosaic moment,” comparable to the arrival of the first graphical web browser. Unlike that original Mosaic moment, when Microsoft came late to the browser wars and had to buy its first web development tooling, the company has taken pole position in AI, rapidly rolling out AI technologies across its enterprise and consumer products.
One key to understanding Microsoft is its view of itself as a platform company. That internal culture drives it to deliver tools and technologies for developers, along with foundations on which those developers can build. For AI, that starts with the Azure OpenAI APIs and extends to tools like Prompt Engine and Semantic Kernel, which simplify the development of custom experiences on top of OpenAI’s Transformer-based neural networks.
As a result, much of this year’s Microsoft Build developer event is focused on how you can use this tooling to build your own AI-powered applications, taking cues from the “copilot” model of assistive AI tooling that Microsoft has been rolling out across its Edge browser and Bing search engine, in GitHub and its developer tooling, and for enterprises in Microsoft 365 and the Power Platform. We’re also learning where Microsoft plans to fill in gaps in its platform and make its tooling a one-stop shop for AI development.
LLMs are vector processing tools
At the heart of a large language model like OpenAI’s GPT-4 is a massive neural network that works with a vector representation of language, looking for vectors similar to those that describe its prompts, then creating and refining the optimal path through a multidimensional semantic space to produce a comprehensible output. It’s similar to the approach used by search engines, but where search is about finding vectors similar to those that answer your queries, LLMs extend the set of semantic tokens that make up your initial prompt (and the prompt used to set the context of the LLM in use). That’s one reason why Microsoft’s first LLM products, GitHub Copilot and Bing Copilot, build on search-based services: They already use vector databases and indexes, providing context that keeps LLM responses on track.
Unfortunately for the rest of us, vector databases are relatively rare, and they are built on very different principles from familiar SQL and NoSQL databases. They’re perhaps best thought of as multi-dimensional extensions of graph databases, with data transformed and embedded as vectors with direction and size. Vectors make finding similar data fast and accurate, but they require a very different way of working than other forms of data.
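To make that idea of vector similarity concrete, here is a minimal sketch in Python with NumPy, not tied to any particular database, of the cosine distance metric most vector searches rely on. The three-dimensional vectors are invented for illustration; real embeddings run to hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: a score near 1.0 means the vectors point the same way."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy three-dimensional "embeddings" invented for illustration; real embedding
# models produce vectors with hundreds or thousands of dimensions.
contract_clause = np.array([0.12, 0.87, 0.45])
support_ticket = np.array([0.10, 0.80, 0.50])
press_release = np.array([0.90, 0.05, 0.10])

print(cosine_similarity(contract_clause, support_ticket))  # high score: semantically close
print(cosine_similarity(contract_clause, press_release))   # low score: semantically distant
```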
If we’re to build our own enterprise copilots, we need our own vector databases, as they allow us to extend and refine LLMs with our domain-specific data. Maybe that data is a library of common contracts, decades’ worth of product documentation, or even all of your customer support queries and answers. If we could store that data in just the right way, it could be used to build AI-powered interfaces to your business.
But do we have the time or the resources to take that data and store it in an unfamiliar format, on an unproven product? What we need is a way to deliver that data to AI quickly, building on tools we’re already using.
Vector search comes to Cosmos DB
Microsoft announced a series of updates to its Cosmos DB cloud-native document database at Build 2023. While most of the updates are focused on working with large amounts of data and managing queries, perhaps the most useful for AI application development is the addition of vector search capabilities. This also applies to existing Cosmos DB instances, allowing customers to avoid moving data to a new vector database.
Cosmos DB’s new vector search builds on the recently launched Cosmos DB for MongoDB vCore service, which allows you to scope instances to specific virtual infrastructure, with high availability across availability zones, and to use a more predictable per-node pricing model, while still using the familiar MongoDB APIs. Existing MongoDB databases can be migrated to Cosmos DB, allowing you to use MongoDB on premises to manage your data and Cosmos DB in Azure to run your applications. Cosmos DB’s new change feed tooling should make it easier to build replicas across regions, replicating changes from one database across other clusters.
Vector search extends this tooling, adding a new query mode to your databases that you can use in your AI applications. While vector search isn’t a true vector database, it offers many of the same features, including a way to store embeddings and use them as a search key for your data, applying the same similarity rules as more complex alternatives. The tooling Microsoft is launching supports basic vector indexing (using IVF flat), three types of distance metrics (cosine, Euclidean, and inner product), and the ability to store and search on vectors up to 2,000 dimensions in size. Distance metrics are a key feature of vector search, as they define how similarity between vectors is measured.
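In practice, that means creating a vector index on a collection before you can query it. Here’s a rough sketch of what that might look like from Python with pymongo against a Cosmos DB for MongoDB vCore collection, following the cosmosSearch index options Microsoft documented at launch; the connection string, database, collection, and field names are placeholders you would swap for your own.

```python
from pymongo import MongoClient

# Placeholder connection string for a Cosmos DB for MongoDB vCore cluster.
client = MongoClient("mongodb+srv://<user>:<password>@<cluster>.mongocluster.cosmos.azure.com/")
db = client["knowledgebase"]

# Create an IVF flat vector index over the "embedding" field.
# similarity can be "COS" (cosine), "L2" (Euclidean), or "IP" (inner product).
db.command({
    "createIndexes": "documents",
    "indexes": [
        {
            "name": "vectorSearchIndex",
            "key": {"embedding": "cosmosSearch"},
            "cosmosSearchOptions": {
                "kind": "vector-ivf",
                "numLists": 100,      # number of IVF clusters; tune for your data size
                "similarity": "COS",
                "dimensions": 1536,   # must match the embedding model's output size
            },
        }
    ],
})
```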
What’s perhaps most interesting about Microsoft’s initial solution is that it’s an extension to a popular document database. Using a document database to create a semantic store for an LLM makes a lot of sense: It’s a familiar tool we already know how to use to deliver and manage content. There are already libraries that allow us to capture and convert different document formats and encapsulate them in JSON, so we can go from existing storage tooling to LLM-ready vector embeddings without changing workflows or having to develop skills with a whole new class of databases.
It’s an approach that should simplify the task of assembling the custom data sets needed to build your own semantic search. Azure OpenAI provides APIs for generating embeddings from your documents that can then be stored in Cosmos DB along with the source documents. Applications will generate new embeddings based on user inputs that can be used with Cosmos DB vector search to find similar documents.
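As a sketch of that flow, the following Python uses the openai package against an Azure OpenAI embedding deployment, and pymongo to store the source document and its embedding in the same Cosmos DB record. The resource, deployment, and field names are assumptions for illustration, not fixed requirements.

```python
from openai import AzureOpenAI
from pymongo import MongoClient

# Placeholder Azure OpenAI resource and embedding deployment.
openai_client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<api-key>",
    api_version="2023-05-15",
)
collection = MongoClient("<cosmos-connection-string>")["knowledgebase"]["documents"]

def embed(text: str) -> list[float]:
    """Generate an embedding for a chunk of text with a deployed embedding model."""
    response = openai_client.embeddings.create(
        model="text-embedding-ada-002",  # your embedding deployment name
        input=text,
    )
    return response.data[0].embedding

# Store the source document alongside its embedding in the same Cosmos DB record.
doc = {"title": "2021 highways bid", "content": "Full text of the successful bid..."}
doc["embedding"] = embed(doc["content"])
collection.insert_one(doc)
```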
There’s no need for those documents to contain any of the keywords in the initial query; they only need to be semantically similar. All you need to do is run documents through a GPT summarizer and then generate embeddings, adding a data preparation step to your application development. Once you have a prepared data set, you will need to build a load process that automates adding embeddings as new documents are stored in Cosmos DB.
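The query side is the mirror image of that load process: embed the user’s input and hand it to the cosmosSearch aggregation stage. A sketch, reusing the embed() helper and collection above, with placeholder field names:

```python
def find_similar(query: str, k: int = 5) -> list[dict]:
    """Embed the user's query and return the k most semantically similar documents."""
    query_vector = embed(query)  # reuses the embed() helper defined earlier
    pipeline = [
        {
            "$search": {
                "cosmosSearch": {
                    "vector": query_vector,
                    "path": "embedding",
                    "k": k,
                },
                "returnStoredSource": True,
            }
        },
        {"$project": {"title": 1, "content": 1, "score": {"$meta": "searchScore"}}},
    ]
    return list(collection.aggregate(pipeline))

# No keyword overlap is required: a query about "road construction tenders" can surface
# the "2021 highways bid" document purely on semantic similarity.
matches = find_similar("road construction tenders")
```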
This approach should work well alongside the updates to Azure AI Studio that deliver AI-ready private data to your Azure OpenAI-based applications. What this means for your code is that it will be a lot easier to keep applications focused, reducing the risk of them going off prompt and generating illusory results. Instead, an application that generates bid responses for, say, government contracts can use document data from your company’s history of successful bids to produce an outline that can be fleshed out and personalized.
Using vector search as semantic memory
Along with its cloud-based AI tooling, Microsoft is bringing an interactive Semantic Kernel extension to Visual Studio Code, allowing developers to build and test AI skills and plugins around Azure OpenAI and OpenAI APIs using C# or Python. Tooling like Cosmos DB’s vector search should simplify building semantic memories for Semantic Kernel, allowing you to construct more complex applications around API calls. An example of how to use embeddings is available as an extension to the Copilot Chat sample, which should allow you to swap in a vector search in place of the prebuilt document analysis function.
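Semantic Kernel wraps that retrieval step behind its memory abstractions, but the underlying pattern is a straightforward retrieve-then-generate loop. Here’s a rough sketch of that loop without the SDK, reusing the helpers above; the chat deployment name is an assumption.

```python
def grounded_answer(question: str) -> str:
    """Answer a question using retrieved documents as semantic memory for the LLM."""
    context = "\n\n".join(doc["content"] for doc in find_similar(question, k=3))
    response = openai_client.chat.completions.create(
        model="gpt-4",  # your chat deployment name
        messages=[
            {"role": "system",
             "content": "Answer using only the provided documents.\n\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(grounded_answer("What commitments did we make on delivery timelines in past bids?"))
```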
Microsoft’s AI platform is very much that: a platform for you to build on. Azure OpenAI forms the backbone, hosting the LLMs. Bringing vector search to data in Cosmos DB will make it easier to ground results in our own organization’s knowledge and content. That should factor into other AI platform announcements, such as Azure Cognitive Search, which automates attaching your data sources to Azure OpenAI models and provides a simple endpoint for your applications, along with tooling to test the service without leaving Azure AI Studio.
What Microsoft is providing here is a spectrum of AI developer tooling that starts with Azure AI Studio and its low-code Copilot Maker, through custom Cognitive Search endpoints, to your own vector search across your documents. It should be enough to help you build the LLM-based application that meets your needs.