Using Hugging Face machine learning models in Azure

Microsoft’s recent Azure Open Source Day showed off a new reference application built using cloud-native tools and services, with a focus on Microsoft’s own open source tools. The app was built to be a service to help owners reunite with lost pets. It uses machine learning to quickly compare photographs of a missing animal with images from animal shelters, rescues, and community sites. It’s a good example of how open source tools can build complex sites and services, from infrastructure as code tools to application frameworks and various tools that add functionality to code.

At the heart of the application is an open source machine learning model, part of a library of many thousands of models and data sets developed by the Hugging Face community and built on top of its large selection of tools and services. The community’s scale is a good reason to use Hugging Face’s models, whether you import them for inferencing in your own code, run them on your own servers, or access them via a cloud API.

Why use Hugging Face?

There’s another reason for considering working with Hugging Face in Azure: It allows you to apply AI to many different business problems. Although Microsoft’s own Cognitive Services APIs cover many common AI scenarios with well-defined APIs, they’re one company’s opinionated view of what machine learning services make sense for enterprises. That does make them something of a jack-of-all-trades, designed for general purposes rather than specific tasks. If your code needs to support an edge case, it can be a lot of work to add appropriate tunings to the APIs.

Yes, there’s the option of building your own specific models using Azure’s Machine Learning studio, working with tools like PyTorch and TensorFlow to design and train models from scratch. But that requires significant data science and machine learning expertise. There are other issues with a “from scratch” approach to machine learning. Azure has an expanding number of virtual machine options for machine learning training, but the process can have significant compute requirements and is expensive to run, especially if you’re building a large model that requires a lot of data. We’re not all OpenAI and don’t have the budgets to build cloud-hosted supercomputers for training!

With over 40,000 models building on its Transformer model framework, Hugging Face can help short-circuit the customization problem by offering models that have been built and trained by the community for many more scenarios than Microsoft’s own services cover. You’re not limited to text, either; Hugging Face’s Transformers have been trained to work with natural language, audio, and computer vision. Hugging Face describes these functions as “tasks,” with, for example, over 2,000 different models for image classification and nearly 18,000 for text classification.

Hugging Face in Azure

Microsoft recently launched support for Hugging Face models on Azure, offering a set of endpoints that can be used in your code, with models imported from the Hugging Face Hub and from its pipeline API. Models are built and tested by the Hugging Face community, and the endpoint approach means they’re ready for inference.

Models are available for no cost; all you pay for are the Azure compute resources to run inferencing tasks. That’s not insignificant, especially if you are working with large amounts of data, and you should compare pricing with Azure’s own Cognitive Services.

Building endpoints for your code

Creating an endpoint is simple enough. In the Azure Marketplace, select Hugging Face Azure ML to add the service to your account. Add your endpoint to a resource group, then select a region and give it a name. You can now choose a model from the Hugging Face Hub and select the model ID and any associated tasks. Next, choose an Azure compute instance for the service and a VNet to keep your service secure. This is enough to create an endpoint, generating the URLs and keys necessary to use it.

Usefully, the service lets endpoints autoscale as necessary, based on the number of requests per minute. By default, you’re limited to a single instance, but you can use the sliders in the configuration screen to set a minimum and maximum number of instances. Scaling is driven by the average number of requests over a five-minute period, which smooths out short spikes in demand that could otherwise cause unnecessary costs.
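
To make the smoothing effect concrete, here is a minimal Python sketch of a scaler that sizes a pool from a five-minute average of requests per minute. It is an illustration only, not Azure’s actual scaling algorithm: the class name, the `target_rpm_per_instance` parameter, and the rounding rule are all assumptions made for the example.

```python
import math
from collections import deque


class RequestRateAutoscaler:
    """Toy autoscaler (hypothetical, not the Azure implementation).

    Sizes a pool of instances from the average requests per minute
    seen over a sliding window.
    """

    def __init__(self, min_instances=1, max_instances=1,
                 target_rpm_per_instance=60, window_minutes=5):
        if not 1 <= min_instances <= max_instances:
            raise ValueError("need 1 <= min_instances <= max_instances")
        self.min_instances = min_instances
        self.max_instances = max_instances
        # Assumed capacity of one instance; a real value depends on the model.
        self.target_rpm_per_instance = target_rpm_per_instance
        # Keep only the most recent per-minute request counts.
        self._window = deque(maxlen=window_minutes)

    def record_minute(self, request_count):
        """Store the number of requests received in the latest minute."""
        self._window.append(request_count)

    def desired_instances(self):
        """Return the instance count for the current windowed average."""
        if not self._window:
            return self.min_instances
        average_rpm = sum(self._window) / len(self._window)
        needed = math.ceil(average_rpm / self.target_rpm_per_instance)
        # Clamp to the configured minimum and maximum.
        return max(self.min_instances, min(self.max_instances, needed))


scaler = RequestRateAutoscaler(min_instances=1, max_instances=4,
                               target_rpm_per_instance=100)
for count in [80, 90, 600, 85, 95]:  # one spike in the third minute
    scaler.record_minute(count)

print(scaler.desired_instances())  # prints 2
```

The single 600-request minute would call for six instances if it were acted on directly; averaged over the window it comes to 190 requests per minute, so the sketch settles on two.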

For now, there’s very little documentation on the Azure integration, but you can get a feel for it by looking at Hugging Face’s AWS endpoint documentation. The Endpoint API is based on the existing Inference API, so you can use that API’s documentation to determine how to structure payloads.

The service gives you a handy playground URL to test out your inferencing model. This includes sample Python and JavaScript code, as well as the option of using curl from the command line. Data is sent as JSON, with responses delivered in a similar fashion. You can use standard libraries to assemble and process the JSON, allowing you to embed REST calls to the API in your code. If you’re using Python, you can take the sample code and copy it into a Jupyter notebook, where you can share tests with colleagues, collaboratively building a more complete application.
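
As a rough sketch of what such a REST call might look like, the Python below uses only the standard library to assemble a JSON request and read a JSON response. Several details are assumptions rather than documented behavior of the Azure service: the `{"inputs": ...}` payload shape is borrowed from Hugging Face’s Inference API, bearer-token authentication is assumed, the response is assumed to be a list of label and score pairs, and the URL and key are placeholders. The sample code in the playground is the authoritative version for your endpoint.

```python
import json
import urllib.request


def build_inference_request(endpoint_url, api_key, text):
    """Assemble a POST request with a JSON body (payload shape is assumed)."""
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_endpoint(request, timeout=30):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def top_label(predictions):
    """Pick the highest-scoring entry from [{"label": ..., "score": ...}, ...]."""
    return max(predictions, key=lambda item: item["score"])["label"]


# Placeholder values; substitute the URL and key generated for your endpoint.
request = build_inference_request(
    "https://example.invalid/score", "YOUR_API_KEY", "I love this product"
)
# predictions = call_endpoint(request)  # uncomment once the endpoint exists
# print(top_label(predictions))
```

Keeping request construction separate from the network call makes the payload easy to inspect in a notebook before anything is sent.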

Customizing Hugging Face models in Azure Machine Learning

You can now use Hugging Face’s foundation models in Azure Machine Learning with the same tools you use to build and train your own models. Although the capability is currently in preview, it’s a useful way of working with the models, letting you fine-tune and deploy them in your applications using familiar tools and technologies. You can search the Azure Machine Learning registry for models that are ready to run.

This is a quick way of adding pretrained model endpoints to your code; you also have the option of fine-tuning models on your own data, using Azure storage for both training and test data and working with Azure Machine Learning’s pipelines to manage the process. Treating Hugging Face models as a foundation for your own makes a lot of sense; they’re proven in a range of cases, even if those cases aren’t quite the same as yours. A model trained on recognizing flaws in metalwork has some of the features necessary for handling glass or plastic, so additional training on your own data will reduce the risk of error.

There’s a growing open source machine learning community, and it’s important that companies like Microsoft embrace it. Large vendors may have experience and skills, but they don’t have the scale of that wider community, or its specialization. By working with communities like Hugging Face, developers get more options and more choice. It’s a win for everyone.
