Risk is part of our everyday lives. We risk an auto accident when we drive to the store but choose to do so since the value of obtaining food outweighs the minimal risk of driving a short distance. Most of us don’t even realize the amount of risk management we deal with subconsciously every day. On the other hand, most of us are very aware of the need to manage health risks during a pandemic.
The world of finance uses a term called “value at risk” (VaR) to quantify risk. This statistic estimates the extent of possible financial losses within a portfolio or investment position over a specific period, usually at a stated confidence level. It helps determine the impact of risk, whether the investment should be made at all, and how much it could cost if things go south versus the potential upside.
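As a rough illustration of how such a figure can be computed, the sketch below uses historical simulation, one common way to estimate VaR. The returns, the position value, and the function name `historical_var` are made up for illustration and do not come from any real portfolio.

```python
import math


def historical_var(returns, position_value, confidence=0.95):
    """Estimate value at risk by historical simulation (nearest-rank quantile).

    Returns the loss, in currency units, that was not exceeded in
    `confidence` of the observed periods.
    """
    if not returns:
        raise ValueError("returns must not be empty")
    # Convert each period return into a loss (positive number = money lost).
    losses = sorted(-r * position_value for r in returns)
    # Nearest-rank method: the smallest observed loss such that at least
    # `confidence` of the observations are less than or equal to it.
    rank = math.ceil(confidence * len(losses))
    return max(losses[rank - 1], 0.0)


# Hypothetical daily returns for a $1M position (illustrative numbers only).
sample_returns = [0.02, -0.01, 0.005, -0.03, 0.01, -0.015, 0.0, 0.012, -0.05, 0.008]
var_90 = historical_var(sample_returns, position_value=1_000_000, confidence=0.90)
print(f"1-day 90% VaR: ${var_90:,.0f}")
```

With only ten observations the estimate is crude; the point is simply that VaR turns a history of outcomes into a single "how bad could it plausibly get" number.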
In the world of cloud computing—specifically, cloud computing architecture—we must make risk-to-value decisions every day. Although risk can’t be eliminated, it can be managed. We start by understanding the trade-offs of using different types of technologies at different price points.
For example, let’s look at tri-redundant storage. You run primary and secondary redundant storage systems on a single cloud brand (say, AWS), then add a third redundant storage system on another cloud brand (say, Microsoft Azure). The second provider mitigates the risk that an outage at the first provider takes down both the primary and secondary systems.
This will significantly reduce the chance that the storage system will be offline for any period. Risk is not eliminated, however, and the cost to lower it is roughly three times what you would pay for a single non-redundant storage system. So, if you pay $30K per year for a single storage system, you will pay about $90K per year to be tri-redundant.
The next question: Is the extra storage cost worth it to lower the risk? For this answer, you need to consider the value at risk for each technology selection and configuration decision.
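One way to frame that question is to compare the extra annual spend with the expected downtime loss it removes. The sketch below is a minimal illustration under stated assumptions: the $30K and $90K figures come from the example above, while the 99.9% availability, the $5,000-per-hour downtime cost, and the assumption that the three copies fail independently are hypothetical placeholders you would replace with your own numbers.

```python
HOURS_PER_YEAR = 8760


def combined_availability(availability, copies):
    """Availability of `copies` redundant systems, assuming independent failures."""
    return 1.0 - (1.0 - availability) ** copies


def expected_annual_loss(availability, downtime_cost_per_hour):
    """Expected yearly cost of downtime at a given availability."""
    downtime_hours = (1.0 - availability) * HOURS_PER_YEAR
    return downtime_hours * downtime_cost_per_hour


# Costs from the storage example in the text.
single_cost = 30_000
tri_cost = 90_000

# Hypothetical inputs (assumptions, not figures from the article).
single_availability = 0.999
downtime_cost_per_hour = 5_000

loss_single = expected_annual_loss(single_availability, downtime_cost_per_hour)
loss_tri = expected_annual_loss(
    combined_availability(single_availability, 3), downtime_cost_per_hour
)

risk_reduction_value = loss_single - loss_tri  # expected loss avoided per year
extra_cost = tri_cost - single_cost            # premium paid for redundancy
net_benefit = risk_reduction_value - extra_cost

print(f"Expected loss avoided: ${risk_reduction_value:,.0f}")
print(f"Extra cost:            ${extra_cost:,.0f}")
print(f"Net benefit:           ${net_benefit:,.0f}")
```

With these placeholder inputs the net benefit is negative, so the extra storage would not pay for itself; a higher downtime cost or a lower baseline availability would change the answer. Real outages are rarely fully independent, so treat the combined availability as an optimistic upper bound.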
Consider the value of redundant compute platforms that act as hot standbys. The cost can easily double, so you must evaluate how much risk that removes and how much that risk matters to the business. This means understanding the risk components of the business and the technology that can lower those risks.
There are two extremes here:
- On one end are companies that bet the cloud provider will continue to offer good uptime with minimal or no outages. While there is risk, they judge that it’s not worth the extra expense to lower it. Think of industries that can continue to operate when systems are offline for a few hours and the outage doesn’t significantly impact the business. Not many businesses fit into this category these days, but they do exist. For these companies, risk is not a huge worry. They select and configure technology without giving much consideration to the possibility that it will fail in some way.
- On the other end are companies for which risk avoidance is a primary consideration. Banks and many other businesses can’t tolerate system downtime that could cost as much as $1 million per hour in lost revenue. The value at risk, and consequently the value of reducing risk, is much higher, and these companies may opt for the more costly system that reduces the risk that the systems will be down for a significant amount of time.
Most businesses fall somewhere between those two extremes, and the value of risk avoidance is not as well defined. If that describes your enterprise, you’ll need to conduct an analysis to understand what your risk tolerance should be and define the value of risk avoidance.
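A simple starting point for that analysis is a break-even figure: the hourly downtime cost above which a given risk-reduction measure pays for itself. The sketch below is only an illustration. The $60K premium and the $1 million per hour figure echo the examples in the text; the 8.76 hours of downtime avoided per year (roughly what moving from 99.9% availability to near 100% would save) and the other two business profiles are hypothetical assumptions.

```python
def break_even_cost_per_hour(extra_annual_cost, downtime_hours_avoided):
    """Hourly downtime cost at which the extra spend equals the loss avoided."""
    if downtime_hours_avoided <= 0:
        raise ValueError("downtime_hours_avoided must be positive")
    return extra_annual_cost / downtime_hours_avoided


def recommend(downtime_cost_per_hour, break_even):
    """Compare a business's hourly downtime cost against the break-even point."""
    if downtime_cost_per_hour > break_even:
        return "add redundancy"
    return "accept the risk"


EXTRA_ANNUAL_COST = 60_000      # $90K tri-redundant minus $30K single system
DOWNTIME_HOURS_AVOIDED = 8.76   # assumption: hours of outage avoided per year

break_even = break_even_cost_per_hour(EXTRA_ANNUAL_COST, DOWNTIME_HOURS_AVOIDED)

# Hourly downtime costs; only the bank figure comes from the text.
profiles = {
    "outage-tolerant business": 500,
    "mid-range business": 7_500,
    "bank": 1_000_000,
}
decisions = {name: recommend(cost, break_even) for name, cost in profiles.items()}

print(f"Break-even downtime cost: ${break_even:,.0f} per hour")
for name, decision in decisions.items():
    print(f"{name}: {decision}")
```

Businesses far above or below the break-even point have an easy decision. Those near it, like the mid-range profile here, are the ones that need better estimates of both downtime cost and hours avoided before committing.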
Right now, few cloud architects understand how to find this risk-to-value figure for their enterprise. They may erroneously assume that they need more risk avoidance and spend more money than they should for little value returned. Or they may assume that the least expensive solution is fine, even though it comes with more risk, when risk avoidance actually has a much higher value than they realize.
Most current cloud architecture solutions are either over-engineered or under-engineered, based on an incorrect assumption of risk tolerance. It’s not rocket science to figure out how much risk can be avoided to reach optimized value. You just have to take the time to understand the value at risk for your business.