Deep learning is a rapidly growing discipline that models high-level patterns in data as complex multilayered networks. Because it is one of the most general ways to model a problem, deep learning has the potential to solve the most challenging questions in machine learning and artificial intelligence. Companies like Microsoft and Google use deep learning to solve difficult problems in areas such as speech recognition, image recognition, 3-D object recognition, and natural language processing.
However, deep learning requires considerable computing power to construct a useful model. Until recently, the cost and availability of computing limited its practical application. Moreover, researchers lacked the theory and experience to apply deep learning to practical problems. Given available time and resources, other methods often performed better.
Today, the steady growth in computing power described by Moore’s Law has radically reduced computing costs. In addition, innovative algorithms provide faster and more efficient ways to train a model. With more experience and accumulated knowledge, data scientists also have more theory and practical guidance for deriving value from deep learning.
While media reports tend to focus on futuristic applications in speech and image recognition, data scientists are using deep learning to solve highly practical problems in all aspects of business. For example:
- Payment systems providers use deep learning to identify suspicious transactions in real time.
- Organizations with large data centers and computer networks use deep learning to mine log files and detect threats.
- Vehicle manufacturers and fleet operators use deep learning to mine sensor data to predict part and vehicle failure.
- Deep learning helps companies with large and complex supply chains predict delays and bottlenecks in production.
With the increased availability of deep learning software and the skills to use it effectively, you can expect the list of commercial applications to grow rapidly in the next several years.
The power of deep learning
Relative to other machine learning techniques, deep learning has four key advantages:
- Its ability to detect complex interactions among features
- Its ability to learn low-level features from minimally processed raw data
- Its ability to work with high-cardinality class memberships
- Its ability to work with unlabeled data
Taken together, these four strengths mean that deep learning can produce useful results where other methods fail, build more accurate models than other methods, and reduce the time needed to build a useful model.
Deep learning detects interactions among variables that may be invisible on the surface. Interactions are the effect of two or more variables acting in combination. For example, suppose that a drug causes side effects in young women, but not in older women. A predictive model that incorporates the combined effect of sex and age will perform much better than a model based on sex alone.
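To make the drug example concrete, here is a minimal sketch on a handful of made-up records. The data and the helper names `by_sex`, `by_sex_and_age`, `fit_group_means`, and `mse` are hypothetical. It compares a model that predicts the side-effect rate from sex alone with one that uses the combination of sex and age group. It uses simple group averages rather than deep learning; the point is only to show why capturing the interaction matters.

```python
# Hypothetical toy records: (sex, age_group, had_side_effect).
# In this made-up data the drug causes side effects in young women only.
records = [
    ("F", "young", 1), ("F", "young", 1),
    ("F", "old", 0), ("F", "old", 0),
    ("M", "young", 0), ("M", "old", 0),
]


def by_sex(row):
    return row[0]


def by_sex_and_age(row):
    return (row[0], row[1])


def fit_group_means(rows, key):
    """'Train' by averaging the outcome within each group defined by key."""
    totals, counts = {}, {}
    for row in rows:
        k = key(row)
        totals[k] = totals.get(k, 0) + row[2]
        counts[k] = counts.get(k, 0) + 1
    return {k: totals[k] / counts[k] for k in totals}


def mse(rows, key):
    """Mean squared error of the group-mean model on the same rows."""
    means = fit_group_means(rows, key)
    return sum((row[2] - means[key(row)]) ** 2 for row in rows) / len(rows)


print(f"sex only: {mse(records, by_sex):.3f}")             # sex only: 0.167
print(f"sex and age: {mse(records, by_sex_and_age):.3f}")  # sex and age: 0.000
```

The sex-only model has to predict a 50% rate for every woman, so it is wrong for both age groups; the combined model separates young from older women and fits this toy data exactly.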
Conventional predictive modeling methods can measure these effects, but only with a lot of manual hypothesis testing. Deep learning detects these interactions automatically and does not depend on an analyst’s expertise or prior hypotheses. It also models nonlinear relationships automatically: given enough neurons, a neural network can approximate any continuous function to arbitrary accuracy, and deep (multilayer) networks can often do so more efficiently than shallow ones.
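A classic illustration of a nonlinear interaction is XOR: the output is 1 when exactly one of two inputs is on, so neither input predicts anything by itself, and the best least-squares straight-line fit to the four points predicts 0.5 everywhere. The minimal sketch below uses a network with one hidden layer of two ReLU units whose weights are hand-picked (not learned) to show that such a layer can represent the interaction exactly; in practice, training would have to find weights like these. The function name `tiny_network` is hypothetical.

```python
def relu(z):
    return max(0.0, z)


def tiny_network(x1, x2):
    """Two hidden ReLU units with hand-picked (not learned) weights."""
    h1 = relu(x1 + x2)        # active when at least one input is on
    h2 = relu(x1 + x2 - 1.0)  # active only when both inputs are on
    return h1 - 2.0 * h2      # subtracting 2*h2 cancels the "both on" case


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [tiny_network(a, b) for a, b in inputs]
print(outputs)  # [0.0, 1.0, 1.0, 0.0]
```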
With conventional predictive analytics methods, success depends heavily on the data scientist’s ability to use feature engineering to prepare the data, a step that requires considerable domain knowledge and skill. Feature engineering also takes time. Deep learning works with minimally transformed raw data and learns the most predictive features automatically, without making assumptions about the distribution of the data.
The figures below illustrate the power of deep learning. The four charts demonstrate how different techniques model a complex pattern. In the lower right, the chart shows how a Generalized Linear Model fits a straight line through the data. Tree-based methods, such as Random Forests and Gradient Boosted Machines, in the lower left and upper right, respectively, perform better than a Generalized Linear Model. Instead of fitting a single straight line, these methods fit many straight-line segments through the data, markedly improving model “fit.” Deep learning, shown in the upper left, fits complex curves to the data, delivering the most accurate model.
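The contrast between a single straight line and many segments can be reproduced in a few lines. The sketch below is a simplified stand-in for the charts, under stated assumptions: an ordinary least-squares line plays the role of the Generalized Linear Model, and a single greedy regression tree of depth 2 stands in for the tree-based methods (Random Forests and Gradient Boosted Machines combine many such trees). The curved pattern y = x² and all function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x, standing in for the GLM."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_tree(xs, ys, depth):
    """Greedy one-feature regression tree: a piecewise-constant fit."""
    mean = sum(ys) / len(ys)
    pairs = sorted(zip(xs, ys))
    best = None
    if depth > 0:
        for i in range(1, len(pairs)):
            if pairs[i - 1][0] == pairs[i][0]:
                continue  # cannot split between identical x values
            sse = 0.0
            for part in (pairs[:i], pairs[i:]):
                m = sum(y for _, y in part) / len(part)
                sse += sum((y - m) ** 2 for _, y in part)
            if best is None or sse < best[0]:
                best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, i)
    if best is None:
        return lambda x: mean  # leaf: predict the average of its points
    _, threshold, i = best
    left = fit_tree([x for x, _ in pairs[:i]], [y for _, y in pairs[:i]], depth - 1)
    right = fit_tree([x for x, _ in pairs[i:]], [y for _, y in pairs[i:]], depth - 1)
    return lambda x: left(x) if x <= threshold else right(x)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical curved pattern: y = x**2 sampled on [-1, 1].
xs = [i / 10 for i in range(-10, 11)]
ys = [x * x for x in xs]

line = fit_line(xs, ys)
tree = fit_tree(xs, ys, depth=2)
print(f"line MSE: {mse(line, xs, ys):.4f}, tree MSE: {mse(tree, xs, ys):.4f}")
```

On this symmetric curve the best straight line is essentially flat, while even four tree segments track the bend; a neural network would go further and fit a smooth curve.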
Deep learning works well with what data scientists call high-cardinality class memberships, a type of data that has a very large number of discrete values. Practical examples of this type of problem include speech recognition, where a sound may be one of many possible words; image recognition, where an image may belong to any one of a very large number of classes; and recommendation engines, where the optimal item to offer can be one of many.
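A common way for a neural network to handle a very large number of classes is an output layer with one score per class, converted to probabilities with the softmax function and trained with cross-entropy loss. The minimal sketch below assumes a hypothetical 10,000-word vocabulary and made-up scores rather than a trained network.

```python
import math


def softmax(scores):
    """Turn one raw score per class into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Training loss: small when the true class gets high probability."""
    return -math.log(probs[true_class])


# Hypothetical output layer for a 10,000-word vocabulary: one score per word.
num_classes = 10_000
scores = [0.0] * num_classes
scores[42] = 8.0  # pretend the network strongly favors word #42

probs = softmax(scores)
predicted = max(range(num_classes), key=probs.__getitem__)
print(predicted, round(sum(probs), 6), round(cross_entropy(probs, 42), 3))
```

Even with a clear lead, word 42 receives only roughly 23% of the probability here, because 9,999 competitors share the rest; that is one reason high-cardinality problems need a lot of training data.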
Another strength of deep learning is its ability to learn from unlabeled data. Unlabeled data lacks a definite “meaning” pertinent to the problem at hand. Common examples include untagged images, videos, news articles, tweets, and computer logs. In fact, most of the data generated in the information economy today is unlabeled. Deep learning can detect fundamental patterns in such data, grouping similar items together or identifying outliers for investigation.
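Deep models for unlabeled data, such as autoencoders, are too large for a short listing, so the sketch below swaps in plain k-means clustering, a much simpler unsupervised method, to show the two tasks the paragraph names: grouping similar items and flagging outliers. The points, the choice of two clusters, the starting centroids, and the distance threshold of 3.0 are all hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids


def nearest_distance(p, centroids):
    return min(math.dist(p, c) for c in centroids)


# Hypothetical unlabeled 2-D points: two loose groups plus one stray point.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
          (9.0, 0.0)]
centroids = kmeans(points, centroids=[points[0], points[3]])
outliers = [p for p in points if nearest_distance(p, centroids) > 3.0]
print(centroids, outliers)
```

Because plain k-means assigns every point to some cluster, the stray point also pulls the second centroid toward itself; it is still flagged because it sits much farther from any centroid than the grouped points do.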
Drawbacks of deep learning
However, deep learning also has some disadvantages. Compared to other machine learning methods, it can be very difficult to interpret a model produced with deep learning. Such models may have many layers and thousands of nodes; interpreting each of these individually is impractical. Data scientists evaluate deep learning models by measuring how well they predict, treating the architecture itself as a “black box.”
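Some quick arithmetic shows why inspecting a model piece by piece is impractical: in a fully connected network, every node in one layer has a weight to every node in the next, plus one bias per node. The sketch below counts those parameters for a hypothetical network with 784 inputs (a 28x28 image), two hidden layers of 1,000 nodes, and 10 outputs; the layer sizes are illustrative assumptions.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases in a fully connected network with these layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical network: 784 inputs, two hidden layers of 1,000 nodes, 10 outputs.
sizes = [784, 1000, 1000, 10]
print(f"{sum(sizes[1:]):,} nodes, {dense_parameter_count(sizes):,} parameters")
# 2,010 nodes, 1,796,010 parameters
```

About two thousand nodes already imply nearly 1.8 million individually trained numbers.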
Critics sometimes object to this aspect of deep learning, but it’s important to keep in mind the goals of the analysis. For example, if the primary goal of the analysis is to explain variance or to attribute outcomes to treatments, deep learning may be the wrong method to choose. However, it is possible to rank the predictor variables based on their importance, which is often all that data scientists look for. Partial dependence plots offer the data scientist another way to visualize a deep learning model, by showing how its average prediction changes as one predictor varies.
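Both ideas can be computed without opening the black box. The minimal sketch below uses permutation importance (shuffle one predictor and measure how much the error grows), one common model-agnostic way to rank variables, together with the averaging step behind a partial dependence plot. The function `model` is a hypothetical stand-in for a trained network, and the data are synthetic.

```python
import random


def model(row):
    """Hypothetical stand-in for a trained network: strong on x0, weak on x1, ignores x2."""
    x0, x1, _x2 = row
    return 3.0 * x0 + 0.5 * x1 * x1


def mse(rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(rows, targets, feature, seed=0):
    """Increase in error after shuffling one predictor's column."""
    column = [r[feature] for r in rows]
    random.Random(seed).shuffle(column)
    shuffled = [r[:feature] + (v,) + r[feature + 1:] for r, v in zip(rows, column)]
    return mse(shuffled, targets) - mse(rows, targets)


def partial_dependence(rows, feature, value):
    """Average prediction when one predictor is set to `value` in every row."""
    fixed = [r[:feature] + (value,) + r[feature + 1:] for r in rows]
    return sum(model(r) for r in fixed) / len(fixed)


rng = random.Random(1)
rows = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(200)]
targets = [model(r) for r in rows]  # noise-free targets keep the example simple

importances = [permutation_importance(rows, targets, j) for j in range(3)]
ranking = sorted(range(3), key=lambda j: importances[j], reverse=True)
print("ranking:", ranking)
print("PD of x1:", [round(partial_dependence(rows, 1, v), 2) for v in (-1, 0, 1)])
```

Plotting `partial_dependence` over a grid of values gives the partial dependence curve; here the curve for x1 is a parabola and the curve for the ignored x2 is flat.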
Deep learning also shares other machine learning methods’ propensity to overfit the training data. This means the algorithm “memorizes” characteristics of the training data that may or may not generalize to the production environment where the model will be used. This problem is not unique to deep learning, and independent validation on data held out from training can detect it and help guard against it.
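Independent validation is easy to demonstrate. In the minimal sketch below, a hypothetical "memorizer" (a 1-nearest-neighbor lookup) reproduces the training data perfectly, while a simple one-parameter line does not; only the held-out validation rows reveal which model generalizes. The data, the 70/30 split, and the function names are assumptions made for illustration.

```python
import random

rng = random.Random(0)
# Hypothetical noisy data: the true signal is y = 2x plus unpredictable noise.
data = [(x, 2 * x + rng.gauss(0, 1)) for x in (rng.uniform(0, 10) for _ in range(1000))]
train, validation = data[:700], data[700:]  # validation rows are never used for fitting


def memorizer(x):
    """1-nearest-neighbor lookup: reproduces every training point exactly."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def fit_slope(pairs):
    """Least-squares slope b for the much simpler model y = b * x."""
    b = sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)
    return lambda x: b * x


simple = fit_slope(train)


def mse(model, pairs):
    return sum((model(x) - y) ** 2 for x, y in pairs) / len(pairs)


for name, m in (("memorizer", memorizer), ("simple line", simple)):
    print(f"{name}: train MSE {mse(m, train):.2f}, validation MSE {mse(m, validation):.2f}")
```

The memorizer's perfect training score comes from copying the noise along with the signal, so its validation error ends up worse than that of the simpler line.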
Because deep learning models are complex, they require a great deal of computing power to build. While the cost of computing has declined dramatically, computing is not free. For simpler problems with small data sets, deep learning may not produce sufficient added benefit over simpler methods to justify the cost and time.
Complexity is also a potential issue for deployment. Netflix never deployed the model that won its million-dollar prize because the engineering costs were too high. A predictive model that performs well with test data but cannot be implemented is useless.
Deep learning isn’t new; the original deep learning techniques date back to the 1950s. But as the cost of computing has fallen, the volume of data has risen, and the technology has improved, interest in deep learning has surged. Able to unlock the hidden relationships in massive data sets, without requiring domain expertise, time-consuming feature engineering, or even extensive data preparation, deep learning has become a compelling approach to solving a growing array of business problems.
SriSatish Ambati is co-founder and CEO of H2O.ai.
New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.