There’s an old adage often shared by developers building on Microsoft platforms: “How can you tell if a Microsoft product is ready for prime time? When Microsoft uses it for one of its flagship applications or services.”
That means it was time to start using the Orleans distributed application framework when it powered large parts of Halo, time to use Fluid Framework when it went into Teams, and on and on. The latest service to get the stamp of approval is Windows containers on Azure Kubernetes Service. Microsoft has spent the past year or so working to move large pieces of the Microsoft 365 platform onto AKS with the aim of making it more scalable and flexible in the light of the rapid changes in work patterns driven by the COVID-19 pandemic.
Moving Microsoft 365 to cloud-native and AKS
Moving a service the size of Microsoft 365 to containers was a complex process; it had been hard enough going from the Office Online single-tenant systems to a multitenant virtualized architecture, especially when combined with a move to CI/CD (continuous integration and continuous delivery). That first shift put in place many of the architectural refinements that would be necessary for a shift to containers. First and foremost was moving state from the application VMs to what we now know as the Microsoft Graph. Still, a lot of the service was custom, especially for managing availability and supporting networking between the machines and services that made up a tenant.
That approach led to a lack of consistency: Application builds had to target specific platforms. It also built in architectural inefficiencies, as different server types were required to host different VMs, increasing the complexity and cost of the data centers that hosted Microsoft 365 services. Loads couldn’t easily be moved between servers to ensure optimal utilization, which reduced the cost advantages of hyperscale.
Building on Kubernetes required rethinking what had been monolithic services and refactoring them to work as scalable microservices. However, because the team could use Windows containers, it didn’t lose anything it was already using: AKS container hosts could be provisioned with the appropriate .NET tools and services with access to Windows APIs. While those host features are shared between containers, container isolation ensures they can be accessed securely.
At the same time, the smaller size of container instances compared to VMs ensures that more applications can be run on the same number of physical hosts, reducing overall costs and allowing more efficient use of Azure hardware. Microsoft’s internal accounting systems mean that groups need to budget for cloud usage, so any savings can be invested elsewhere in the service.
There are other benefits of moving to cloud-native architecture for Microsoft 365. All developers share the same API surface, which simplifies tests and change management and allows the team to use CI/CD as part of an application ops model, keeping platform ops separate from the code and managing the AKS features used by the service. Applications are built and deployed first to test clusters, then to early rings for internal users and external insiders before being moved to production.
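The staged rollout described above can be sketched as a gated loop. This is only an illustration of the idea, not Microsoft's pipeline: the ring names, the `roll_out` function, and the `is_healthy` callback are all hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical ring names, ordered from least to most exposure.
RINGS = ("test", "internal", "insiders", "production")


def roll_out(
    build: str,
    is_healthy: Callable[[str, str], bool],
    rings: Sequence[str] = RINGS,
) -> List[str]:
    """Promote a build ring by ring, stopping at the first failed gate.

    is_healthy(build, ring) stands in for whatever checks a real
    pipeline would run after deploying to that ring.
    Returns the rings the build passed, in order.
    """
    passed: List[str] = []
    for ring in rings:
        if not is_healthy(build, ring):
            break  # later rings never see a build that failed earlier
        passed.append(ring)
    return passed
```

A build that fails its check in an early ring never reaches production, which is the point of ordering the rings by audience size.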
How to containerize your own code
If Microsoft can move its code to containers and AKS, how can you do the same? Clearly, much of the change has to be organizational. You need a mature devops practice that’s already split into three parts, with dedicated infrastructure, platform, and applications teams. Then you need to lift and shift your code, making necessary changes to support working in a container environment. Monolithic applications are unlikely to function well in a container-based environment, especially one like AKS where much of your platform operations are automated, scaling on the fly and using platform-level service meshes to manage declarative networking and security.
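To make “scaling on the fly” concrete, here is a minimal sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: the desired count is the current count multiplied by the ratio of the observed metric to its target, rounded up. The article doesn’t say which autoscaling mechanism Microsoft 365 uses, so treat this as an illustration. The function name and the default limits are assumptions.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,    # assumed bounds, set per workload in practice
    max_replicas: int = 10,
    tolerance: float = 0.1,   # skip scaling when close to target
) -> int:
    """Return the replica count suggested by the HPA scaling formula."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # within tolerance: leave the deployment alone
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

With four replicas averaging 90% CPU against a 60% target, the function asks for six. A monolith that can’t run as multiple identical instances gets no benefit from this kind of automation.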
Usefully, Microsoft’s Windows Containers team recently put out documentation based on its experience working with customers like Microsoft 365. This gives you a set of pointers to consider when moving an application from a Windows Server environment—even one that’s virtualized—to containers. Working with containers isn’t like working with a server, even if we do get access to familiar APIs and libraries.
Keep an eye out for container blockers
Much of the list of blockers is common sense. Containers aren’t for interactive applications, and there’s no GUI support. The host OS is a version of Windows Server Core, so code needs to be designed to work with it, for example, by supporting only silent installs and not relying on RDP access. With no UI, code needs alternate management APIs, such as endpoints for use with Windows Admin Center.
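As a rough illustration of managing a headless service through an endpoint rather than a GUI, the sketch below exposes a small JSON status route using only the Python standard library. It is a generic HTTP example, not a Windows Admin Center integration. The `/healthz` path, the class name, and the `start_management_server` helper are assumptions.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ManagementHandler(BaseHTTPRequestHandler):
    """Serves read-only status over HTTP in place of a GUI."""

    def do_GET(self):
        if self.path == "/healthz":  # hypothetical route name
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def _send_json(self, code, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def start_management_server(host="127.0.0.1", port=0):
    """Start the endpoint on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer((host, port), ManagementHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

An orchestrator or operator tool can poll a route like this to decide whether the container is healthy, with no remote desktop session involved.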
Similarly, you should make sure that code never stores data inside a container. That includes settings. Containers need to be treated as stateless, ephemeral items that are created and destroyed as required by a container orchestration platform such as Kubernetes. If you’re targeting AKS, consider using an Azure storage service, such as Azure Files or Blob Storage, to hold state and data for your containers. That way, if a container handling a payment process fails, a replacement can pick up session state and carry on without a user noticing. Similarly, if demand requires extra containers, they can pick up application state and settings as soon as they’re ready to go.
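The payment example can be sketched as a worker that keeps nothing in memory between calls and reads and writes session state through an external store. In this sketch, `InMemoryStore` is only a stand-in for a real service such as Azure Files or Blob Storage, and `StateStore`, `PaymentWorker`, and `advance` are hypothetical names.

```python
import json
from typing import Dict, Optional, Protocol


class StateStore(Protocol):
    """Minimal interface an external store would need to satisfy."""

    def load(self, key: str) -> Optional[bytes]: ...
    def save(self, key: str, data: bytes) -> None: ...


class InMemoryStore:
    """Stand-in for external storage; lives outside any one worker."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def load(self, key: str) -> Optional[bytes]:
        return self._blobs.get(key)

    def save(self, key: str, data: bytes) -> None:
        self._blobs[key] = data


class PaymentWorker:
    """Holds no session state itself; every step goes through the store."""

    def __init__(self, store: StateStore) -> None:
        self._store = store

    def advance(self, session_id: str) -> int:
        raw = self._store.load(session_id)
        state = json.loads(raw) if raw else {"step": 0}
        state["step"] += 1
        self._store.save(session_id, json.dumps(state).encode("utf-8"))
        return state["step"]
```

If one `PaymentWorker` completes the first step and is then destroyed, a new instance built on the same store continues at the second step, which is the behavior a replacement container needs.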
There are other limitations. Your code needs to run on Windows Server 2016 or newer, so older applications may need some compatibility work. The same goes for older versions of the .NET Framework. Although Microsoft provides container images with supported versions, you’re best off making sure code runs under more recent versions, which are designed to support microservice architectures and have a smaller footprint, allowing more containers to run on the same host. It’s important to avoid any dependencies on Active Directory roles or, for that matter, on any other Windows Server infrastructure features. Your container is for your application, nothing else.
Take advantage of cloud services where possible
If you’re planning on moving to AKS or Azure Container Instances, or even Azure Container Apps, it’s worth considering where you can use other Azure services within your application. If you have dependencies on databases or other applications, you may well find using the Azure equivalent easier than setting up a virtual server to host the application. Alternatively, a cloud-optimized, vendor-supported version may be available in the Azure Marketplace. Similarly, where you might have used Active Directory for access control, consider using Azure Active Directory APIs, as these are compatible with ephemeral containers.
Microsoft’s containerization documentation provides suitable alternatives for on-premises services that aren’t supported in containers. Switching to them may take time and require additional development work, which could be a problem with legacy applications. In some cases, as much as you may want to move to cloud-native and containers, it may prove uneconomical or too complex.
Containerization is a useful technique for building new cloud-native applications, treating containers as the endpoint of a CI/CD pipeline, and using Kubernetes to orchestrate and scale the services that make up your application. Microsoft’s own experience shows that moving from virtual infrastructures to cloud-native is possible, and its documentation provides pointers on how to make the necessary changes. It’s not easy, but as Microsoft 365 proves, the benefits can be well worth the engineering effort necessary.