So, you’re building a cloud architecture and also designing generative AI-powered systems. What do you need to do differently? What do you need to do the same? What are the emerging best practices? After building a few of these in the past 20 years, and especially in the past two years, here are my recommendations:
Understand your use cases
Clearly define the purpose and goals of generative AI within your cloud architecture. The mistake I see most often is not understanding what generative AI is supposed to do within the business systems. Understand what you aim to achieve, whether it’s content generation, recommendation systems, or other applications.
This means writing stuff down and finding consensus on the objectives, how to address the goals, and most importantly, how to define success. This is not unique to generative AI; it is a step to win with every migration and net-new system built in the cloud.
I’m seeing whole generative AI projects in the cloud fail because they don’t have well-understood business use cases. Companies build something that is cool but does not return any value to the business. That won’t work.
Data sources and quality are key
Identify the data sources required for training and inference by the generative AI model. The data has to be accessible, good quality, and carefully managed. You must also ensure availability and compatibility with cloud storage solutions.
Generative AI systems are highly data-centric. I would call them data-oriented systems; the data is the fuel that drives outcomes from generative AI systems. Garbage in, garbage out.
Thus, it helps to focus on data accessibility as a primary driver of cloud architecture. You need to access most of the relevant data as training data, typically leaving it where it exists and not migrating it to a single physical entity. Otherwise, you end up with redundant data and no single source of truth. Consider efficient data pipelines for preprocessing and cleaning data before feeding it into the AI models. This ensures data quality and model performance.
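To make the preprocessing point concrete, here is a minimal sketch of a cleaning step that could sit in front of a model. It assumes records are plain dictionaries with a "text" field; the function name clean_records and the sample data are hypothetical, and a real pipeline would need far more than whitespace normalization and de-duplication.

```python
from typing import Iterable, Iterator


def clean_records(records: Iterable[dict]) -> Iterator[dict]:
    """Yield records with normalized text, skipping empty and duplicate rows.

    Hypothetical helper: assumes each record is a dict with a "text" field.
    """
    seen = set()
    for record in records:
        # Collapse runs of whitespace and trim the ends; treat None as empty.
        text = " ".join(str(record.get("text") or "").split())
        if not text:
            continue  # drop rows with no usable text
        key = text.lower()
        if key in seen:
            continue  # drop case-insensitive duplicates
        seen.add(key)
        yield {**record, "text": text}


# Illustrative sample data, not taken from any real system.
raw = [
    {"id": 1, "text": "  Quarterly   revenue rose. "},
    {"id": 2, "text": ""},
    {"id": 3, "text": "quarterly revenue rose."},
    {"id": 4, "text": None},
    {"id": 5, "text": "Churn fell in Q3."},
]
cleaned = list(clean_records(raw))
print([r["id"] for r in cleaned])
```

Because the function is a generator, it can stream records from wherever the data already lives rather than requiring a copy in one place, which fits the advice above about not migrating everything to a single store.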
This is about 80% of the success of cloud architectures that use generative AI. However, it is the most overlooked aspect, as cloud architects focus on the generative AI system processing more than on the data feeding these systems. Data is everything.
Data security and privacy
Just as data is important, so are security and privacy as applied to that data. Generative AI processing can combine seemingly innocuous data into output that exposes sensitive information.
Implement robust data security measures, encryption, and access controls to protect sensitive data used by the generative AI and the new data that generative AI may produce. At a minimum, comply with relevant data privacy regulations. This does not mean bolting some security system on your architecture as a final step; security must be architected into the systems at every step.
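One way to architect security in early is to redact obvious identifiers before text ever reaches a model or a log. The sketch below is only an illustration: the two patterns (email addresses and US-style Social Security numbers) and the redact function are my assumptions, and they are nowhere near a complete privacy control. Encryption, access controls, and regulatory compliance still have to be handled separately.

```python
import re

# Illustrative patterns only; real PII detection needs much broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Replace email addresses and SSN-shaped numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)


sample = "Contact jane.doe@example.com, SSN 123-45-6789."
redacted = redact(sample)
print(redacted)
```

The same filter can be applied to model output, since generated data may contain sensitive values that were never in a single source record.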
Scalability and inference resources
Plan for scalable cloud resources to accommodate varying workloads and data processing demands. Most companies consider auto-scaling and load-balancing solutions. One of the more significant mistakes I see is building systems that scale well but are hugely expensive. It’s best to balance scalability with cost-efficiency, which can be done but requires good architecture and FinOps practices.
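The balance between scaling and cost can be sketched as a scaling decision that is capped by a budget. This is a simplified illustration, not how any particular cloud auto-scaler works; the function desired_replicas, its parameters, and the numbers in the example are all hypothetical.

```python
import math


def desired_replicas(
    requests_per_sec: float,
    capacity_per_replica: float,
    hourly_cost_per_replica: float,
    hourly_budget: float,
    min_replicas: int = 1,
) -> int:
    """Return the replica count that load calls for, capped by the budget.

    Hypothetical policy: never go below min_replicas, never exceed what
    the hourly budget can pay for.
    """
    # Replicas needed to serve the current load.
    needed = max(min_replicas, math.ceil(requests_per_sec / capacity_per_replica))
    # Replicas the budget can pay for.
    affordable = max(min_replicas, int(hourly_budget // hourly_cost_per_replica))
    return min(needed, affordable)


# Load calls for 9 replicas, but a 20.00/hour budget at 3.00 each allows 6.
capped = desired_replicas(450, 50, 3.0, 20.0)
# Load calls for 3 replicas, which fits inside the budget.
uncapped = desired_replicas(120, 50, 3.0, 20.0)
print(capped, uncapped)
```

When the cap kicks in, the architecture has to decide what happens to the excess load (queueing, shedding, or a cheaper model), which is exactly the kind of trade-off that should be designed rather than discovered on the invoice.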
Also, examine training and inference resources. I suppose you’ve noticed that much of the news at cloud conferences is around this topic, and for good reason. Select appropriate cloud instances with GPUs or TPUs for model training and inference. Again, optimize the resource allocation for cost-efficiency.
Consider model selection
Choose the right generative AI architecture (generative adversarial networks, transformers, etc.) based on your specific use case and requirements. Consider cloud services for model training, such as Amazon SageMaker and others, and find optimized solutions. This also means understanding that you may have many connected models, which will be the norm.
Implement a robust model deployment strategy, including versioning and containerization, to make the AI model accessible to applications and services in your cloud architecture.
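As a rough illustration of versioning, here is a minimal in-memory registry that maps model versions to container images and supports promotion and rollback. The ModelRegistry class and the image names are hypothetical; a real deployment would lean on a managed model registry and a container registry rather than a Python object.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRegistry:
    """Hypothetical registry: version string -> container image reference."""

    versions: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # promoted versions, oldest first

    def register(self, version: str, image: str) -> None:
        if version in self.versions:
            raise ValueError(f"version {version} is already registered")
        self.versions[version] = image

    def promote(self, version: str) -> None:
        if version not in self.versions:
            raise KeyError(f"unknown version {version}")
        self.history.append(version)

    def rollback(self) -> str:
        """Drop the live version and return the one that is live afterward."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.history[-1]

    @property
    def live_image(self) -> str:
        return self.versions[self.history[-1]]


# Image names below are placeholders, not real registries.
registry = ModelRegistry()
registry.register("1.0.0", "registry.example.com/summarizer:1.0.0")
registry.register("1.1.0", "registry.example.com/summarizer:1.1.0")
registry.promote("1.0.0")
registry.promote("1.1.0")
rolled_back_to = registry.rollback()
print(rolled_back_to, registry.live_image)
```

Refusing to re-register an existing version keeps each version immutable, so applications that pin a version always get the same container.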
Monitoring and logging
Setting up monitoring and logging systems to track AI model performance, resource utilization, and potential issues is not optional. Establish alerting mechanisms for anomalies as well as observability systems that are built to deal with generative AI in the cloud.
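A simple form of anomaly alerting is to compare the latest measurement against a recent baseline. The sketch below flags an inference latency that sits more than a chosen number of standard deviations from the baseline mean. The function name, the threshold of 3.0, and the sample latencies are assumptions for illustration; production observability tools use more sophisticated detection.

```python
from statistics import mean, stdev


def is_anomalous(history: list, latest: float, threshold: float = 3.0) -> bool:
    """Return True when latest deviates from the history mean by more than
    threshold sample standard deviations (a basic z-score check)."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical inference latencies in milliseconds.
baseline_ms = [410, 395, 402, 420, 388, 405, 398, 415]
spike = is_anomalous(baseline_ms, 900)
normal = is_anomalous(baseline_ms, 412)
print(spike, normal)
```

The same check applies to other signals worth tracking for generative AI workloads, such as tokens per request, GPU utilization, or cost per call.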
Moreover, continuously monitor and optimize cloud resource costs, as generative AI can be resource intensive. Use cloud cost management tools and practices. This means having FinOps monitor all aspects of your deployment: operational cost-efficiency at a minimum, and architecture efficiency to evaluate whether your architecture is optimal. Most architectures need tuning and ongoing improvements.
Other considerations
Failover and redundancy are needed to ensure high availability, and disaster recovery plans can minimize downtime and data loss in case of system failures. Implement redundancy where necessary. Also, regularly audit and assess the security of your generative AI system within the cloud infrastructure. Address vulnerabilities and maintain compliance.
It’s a good idea to establish guidelines for ethical AI usage, especially when generating content or making decisions that impact users. Address bias and fairness concerns. There are currently lawsuits over AI and fairness, and you need to ensure that you’re doing the right thing. Continuously evaluate the user experience to ensure AI-generated content aligns with user expectations and enhances engagement.
Other aspects of cloud computing architecture are pretty much the same whether you’re using generative AI or not. The key is to be aware that some things are much more important and need to have more rigor, and there is always room for improvement.