Microsoft has unveiled a preview of the ML.NET Text Classification API, which is intended to make it easier to train custom text classification models with the open-source ML.NET machine learning framework.
Introduced June 14, the ML.NET Text Classification API uses “state-of-the-art” deep learning techniques, Microsoft said. ML.NET allows developers to integrate custom machine learning models into .NET apps. Text classification is the process of applying labels or categories to text. Common use cases include categorizing email as spam or not spam, analyzing sentiment as positive or negative from customer reviews, and applying labels to support tickets.
The ML.NET Text Classification API is powered by the TorchSharp .NET library, which provides access to libtorch, the library that powers the PyTorch machine learning framework. TorchSharp offers low-level capabilities for training neural networks from scratch in .NET. For ML.NET, Microsoft has abstracted away some of TorchSharp's complexity to make this kind of training easier.
In collaboration with Microsoft Research, Microsoft took the TorchSharp implementation of NAS-BERT, a variant of BERT (Bidirectional Encoder Representations from Transformers) obtained through neural architecture search, and added it to ML.NET. Starting from a pre-trained version of this model, the Text Classification API uses the user's data to fine-tune the existing model rather than training a new model from scratch.
The Text Classification API is part of the 2.0.0 and 0.20.0 preview versions of ML.NET. In addition to the Microsoft.ML package, it requires the Microsoft.ML.TorchSharp package plus one TorchSharp runtime package: TorchSharp-cpu for CPU training, or TorchSharp-cuda-windows or TorchSharp-cuda-linux for GPU training.
Developers can install the packages with the NuGet package manager in Visual Studio or with the .NET CLI. Code samples for the API are available in the Text Classification API Notebook.
Microsoft noted that the API still has limitations, such as not supporting the Evaluate method for calculating evaluation metrics. Microsoft plans to improve the API and to introduce other scenario-based APIs.