Review: IBM Watson strikes again
Built on Watson and SPSS predictive analytics, IBM's cloud machine learning services address the needs of developers, data scientists, and businesses
IBM Watson and Predictive Analytics
The IBM Watson AI system drew the world’s attention by winning at "Jeopardy" in February 2011 against two of the game’s all-time champions, and IBM has strived to apply the Watson system to more interesting problems than a trivia quiz ever since. IBM has also extended Watson’s capabilities to developers, data scientists, and even ordinary business users. Along with IBM’s SPSS predictive analytics software, Watson forms the foundation of IBM’s cloud offerings in machine learning and advanced analytics.
IBM breaks the Watson system into five parts: machine learning, question analysis, natural language processing, feature engineering, and ontology analysis. From these parts, IBM has built out a suite of composable cloud services from which you can make your own mini-Watson for a solution to your problem. (Note that compiling the knowledge base for the answers is easy: 95 percent of "Jeopardy" questions can be answered by the titles of Wikipedia articles.)
Meanwhile, IBM is collaborating on applying Watson techniques to health care, seismology, education, and genomics at enterprise scale. While these efforts are very interesting, especially in the long term, for the purposes of this review I’ll concentrate on Watson and other machine learning (ML) technology that is available for use in the IBM Cloud, which includes the Bluemix PaaS.
What other ML tech? In a distant corner of IBM’s far-flung empire, IBM SPSS offers both Windows and cloud implementations of the SPSS Modeler package, plus a Predictive Analytics service in the Bluemix PaaS that can run model predictions in real time and run periodic batch jobs to update the models. IBM SPSS Modeler is comparable to Azure Machine Learning and Databricks, while the IBM Watson services are comparable to Microsoft’s Project Oxford and Cortana Analytics, as well as to HPE’s Haven OnDemand.
IBM SPSS Modeler and Predictive Analytics
Let’s start with IBM SPSS Modeler and Predictive Analytics. I downloaded the 30-day free trial of SPSS Modeler for Windows and put it through its paces. The free version has the Personal Edition features enabled for the trial period: data access and export; automatic data prep, wrangling, and ETL; 30-plus base machine learning algorithms and automodeling; R extensibility and Python scripting. It does not have access to big data through an IBM SPSS Analytic Server for Hadoop/Spark, and it does not include champion/challenger functionality, A/B testing, text and entity analytics, or social network analysis. Those features come with the more expensive SKUs.
The ML algorithms in SPSS Modeler are comparable to what you find in Azure Machine Learning and Spark.ml, as are the feature selection methods and the selection of supported formats. Even the automodeling (train and score a bunch of models and pick the best) is comparable, though it’s more obvious how to use it in SPSS Modeler than in the others.
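To make the automodeling idea concrete, here is a minimal sketch in plain Python of "train a bunch of models, score them, and pick the best." It is a generic illustration, not SPSS Modeler's implementation; the two candidate models, the function names, and the tiny data set are all made up for the example.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Second candidate: ordinary least squares on a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def mse(model, xs, ys):
    """Mean squared error of a fitted model on a data set."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def automodel(candidates, train, holdout):
    """Fit every candidate on train, score it on holdout, return the best."""
    scored = []
    for name, fit in candidates.items():
        model = fit(*train)
        scored.append((mse(model, *holdout), name, model))
    error, name, model = min(scored, key=lambda item: item[0])
    return name, model, error

# Made-up data: four training rows and two holdout rows.
train = ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
holdout = ([5, 6], [10.1, 11.9])

best_name, best_model, best_error = automodel(
    {"mean": fit_mean, "line": fit_line}, train, holdout
)
```

A real automodeler would try many more algorithms and tune their settings, but the selection loop has the same shape.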
What SPSS Modeler has that you won’t find in Azure Machine Learning’s Jupyter Notebooks or Databricks’ Notebooks is a point-and-click interface. There was a time (long ago) when I gushed about how great it was that SPSS was making its statistical analysis programs easy to use by adding Windows mouse-and-menu interfaces. I no longer care much about that. In fact, I now prefer a notebook approach, primarily because an annotated live notebook (which I think I first saw in Mathcad for DOS) makes it easy for another analyst to follow what you’ve done and to check or extend your work.
Overall, I think that IBM SPSS Modeler is very capable and easy to use, with good performance, but it's awfully expensive. The “call for pricing” designation tells me that both SPSS Modeler Gold on IBM Cloud and SPSS Analytic Server are probably even more expensive.
What do you do with SPSS models once you’ve created them? Upload them to Bluemix. IBM Bluemix hosts Predictive Analytics Web services that apply SPSS models to expose a scoring API that you can call from your apps. IBM has posted two example apps on GitHub; these are based on sample data sets provided with SPSS Modeler, and they're implemented as Web services called by Node.js and/or Angular.js apps. Both look relatively straightforward.
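Calling a scoring Web service from an app generally comes down to an authenticated HTTP POST with the input row as JSON. The sketch below only builds such a request with Python's standard library. The URL, the access key, the `input` payload field, and the bearer-token header are all placeholders and assumptions, not the actual Predictive Analytics API; take the real endpoint, credentials, and payload format from the service documentation.

```python
import json
import urllib.request

# Placeholders: substitute the URL and key issued for your own service instance.
SCORING_URL = "https://example.invalid/score/my-model"
ACCESS_KEY = "YOUR-ACCESS-KEY"

def build_scoring_request(row, url=SCORING_URL, key=ACCESS_KEY):
    """Build (but do not send) a POST request carrying one input row as JSON."""
    body = json.dumps({"input": row}).encode("utf-8")  # payload shape is assumed
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + key,  # auth scheme is assumed
        },
        method="POST",
    )

request = build_scoring_request({"age": 42, "income": 55000})
# urllib.request.urlopen(request) would send it; that call is left out here
# because the endpoint above is only a placeholder.
```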
In addition to Web services, Predictive Analytics supports batch jobs to retrain and reevaluate models on additional data. Optionally, a batch job can update a deployed model with a retrained model; that solves the common problem of predictive models becoming stale as the data changes. Currently, Predictive Analytics batch jobs are only exposed as API calls; there is no user interface that I have found.
Watson in Bluemix
You'll find 18 Bluemix services listed under Watson, shown in the figure below. Each service exposes a REST API. In addition, you can download SDKs for using the API from your applications. For example, the AlchemyAPI has SDKs and examples available for Java, C/C++, C#, Perl, PHP, Python, Ruby, JavaScript, and Android OS. You’ll need an API key to run the samples and call the API successfully. In general, once you provision a Watson service in Bluemix, you will be presented with links to an online sample that you can run and fork, as well as to the documentation.
The AlchemyAPI offers a set of three services (AlchemyLanguage, AlchemyVision, and AlchemyData) that enable businesses and developers to build cognitive applications that understand the content and context within text and images. AlchemyLanguage processes text to score its sentiment, emotions (Beta), keywords, entities, and high-level concepts. AlchemyVision processes images to recognize images, scenes, and objects. AlchemyData provides searchable news and blog content enriched with natural language processing. AlchemyAPI appears to draw capabilities from several of the other Watson services and merge them into a single service that includes a combined call for Web pages.
Next up are Concept Expansion, which analyzes text and learns similar words or phrases based on context, and Concept Insights, which links documents that you provide with a preexisting graph of concepts based on Wikipedia topics. (Remember what I mentioned earlier about how well "Jeopardy" answers map to the titles of Wikipedia articles.) A note in the documentation says the Watson Concept Expansion Service tile will be removed from the Bluemix catalog on March 6, 2016. However, it was still there on March 18 as a beta service with a predefined data set and domain, and I was able to provision the service and run the sample.
The Dialog Service allows you to design the way an application interacts with a user through a conversational interface, using natural language and user profile information. The Document Conversion service converts a single HTML, PDF, or Microsoft Word document into normalized HTML, plain text, or a set of JSON-formatted Answer units that can be used with other Watson services.
Language Translation works in several knowledge domains and language pairs. In the news and conversation domains, the to/from pairs are English and Brazilian Portuguese, French, Modern Standard Arabic, or Spanish. In patents, the pairs are English and Brazilian Portuguese, Chinese, Korean, or Spanish. The Translation service can identify plain text as being written in one of 62 languages.
The Natural Language Classifier service applies cognitive computing techniques to return the best matching classes for a sentence, question, or phrase, after training on your set of classes and phrases. You can see how this capability was useful for playing "Jeopardy."
Personality Insights derives insights from transactional and social media data (at least 1,000 words written by a single individual) to identify psychological traits, which it returns as a tree of characteristics in JSON format. The Personality Insights API is documented for Curl, Node, and Java; the demo for the API analyzes the tweets of Oprah, Lady Gaga, and King James as well as several textual passages. Relationship Extraction parses sentences into their components and detects relationships between the components (parts of speech and functions) through contextual analysis.
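A tree of characteristics like the one Personality Insights returns is usually easier to work with once it is flattened into path/score pairs. The sketch below walks a small tree recursively. The field names (`name`, `children`, `score`) and the sample values are invented for illustration and are not the service's actual response schema.

```python
import json

# Made-up sample; the real response uses its own field names and values.
SAMPLE = json.loads("""
{"name": "root", "children": [
  {"name": "Group A", "children": [
    {"name": "Trait A1", "score": 0.91},
    {"name": "Trait A2", "score": 0.42}
  ]},
  {"name": "Group B", "children": [
    {"name": "Trait B1", "score": 0.77}
  ]}
]}
""")

def flatten(node, path=()):
    """Yield (path, score) for every leaf of a nested characteristics tree."""
    path = path + (node["name"],)
    children = node.get("children")
    if not children:
        yield " > ".join(path), node.get("score")
        return
    for child in children:
        yield from flatten(child, path)

traits = dict(flatten(SAMPLE))
top_trait = max(traits, key=traits.get)  # leaf with the highest score
```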
Retrieve and Rank is an ML-trained relevancy improver for Apache Solr search results. Solr is a taxonomy-aware search server built in turn on Apache Lucene full-text indexing.
The Speech to Text service converts the human voice into the written word for English, Japanese, Arabic (MSA), Mandarin, Portuguese (Brazil), and Spanish. Along with the text, the service returns metadata that includes confidence score per word, start/end time per word, and alternate hypotheses/N-Best (the N most likely alternatives) per phrase.
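Per-word confidence and timing are handy for flagging the parts of a transcript that a human should double-check. The following sketch picks the most likely alternative for a phrase and lists its low-confidence words with their time spans. The dictionary layout and the numbers are hypothetical stand-ins for the metadata described above, not the service's real response format.

```python
# Hypothetical phrase result: N-best alternatives, each with per-word metadata.
phrase = {
    "alternatives": [
        {
            "transcript": "recognize speech",
            "confidence": 0.83,
            "words": [
                {"word": "recognize", "confidence": 0.95, "start": 0.10, "end": 0.62},
                {"word": "speech", "confidence": 0.58, "start": 0.62, "end": 1.05},
            ],
        },
        {"transcript": "wreck a nice beach", "confidence": 0.11, "words": []},
    ]
}

def review_phrase(phrase, threshold=0.6):
    """Return the best transcript plus (word, start, end) for doubtful words."""
    best = max(phrase["alternatives"], key=lambda alt: alt["confidence"])
    doubtful = [
        (w["word"], w["start"], w["end"])
        for w in best["words"]
        if w["confidence"] < threshold
    ]
    return best["transcript"], doubtful

transcript, doubtful = review_phrase(phrase)
```

The 0.6 threshold is arbitrary; in practice you would tune it against transcripts you have checked by hand.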
The Text to Speech service processes text and natural language to generate synthesized audio output complete with appropriate cadence and intonation. Voices are available for U.S. and U.K. English, French, German, Italian, Castilian, North American Spanish, Brazilian Portuguese, and Japanese. According to the documentation, one of the three U.S. English voices was used as Watson’s voice for "Jeopardy," but that voice was not on offer when I ran the demo.
Tone Analyzer, still in beta, identifies emotion, social propensities, and writing styles from text. Tradeoff Analytics uses Pareto filtering techniques to identify the optimal alternatives across multiple criteria, then uses various analytical and visual approaches to help the decision maker explore the trade-offs within the identified set of alternatives.
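Pareto filtering itself is simple to state: discard any alternative that some other alternative matches or beats on every criterion and strictly beats on at least one. Here is a generic sketch of that filter, assuming lower is better for every criterion. It illustrates the technique rather than IBM's implementation, and the options and their (price, weight) values are made up.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better once.

    Lower values are treated as better for every criterion.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(options):
    """Keep only the options that no other option dominates."""
    return {
        name: scores
        for name, scores in options.items()
        if not any(dominates(other, scores) for other in options.values())
    }

# Made-up laptops scored on (price in dollars, weight in kg).
options = {
    "A": (500, 2.0),
    "B": (700, 1.2),
    "C": (750, 2.1),  # worse than A on both criteria, so it is filtered out
    "D": (900, 1.0),
}

front = pareto_front(options)
```

What remains (A, B, and D here) is the set of genuine trade-offs: moving from one to another improves one criterion only at the cost of another, which is where the visual exploration comes in.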
Finally, the Visual Recognition service enables you to analyze the visual appearance of JPEG images (or video frames) to understand what is happening in a scene. Using pretrained machine learning technology, semantic classifiers recognize many common visual entities, such as settings, objects, and events, returning labels and likelihood scores.
The three non-IBM Watson services on Bluemix are in closed betas.
Watson Analytics
Watson Analytics uses IBM’s own natural-language processing to make machine learning easier to use for business analysts and other non-data-scientist business roles. It is a Web application that apparently uses many of the services that IBM includes in the Watson section of Bluemix. I tried the free edition and used it to analyze the familiar bike rental data set supplied as one of the samples.
I can see where this approach could be useful for someone who wants the results of ML without programming or without even understanding the methods very well. However, I found that the natural language interface and all the helpful diagnostics mostly got in my way. That surprised me because the UIs of business intelligence products Tableau and Qlik Sense, which implement a subset of what Watson Analytics tries to accomplish, definitely did not get in my way.
I’ve tried to cover three (or more, depending on how you count) of IBM’s ML products in a single review. I’ll admit that wasn’t easy, and I wasn’t able to do as extensive an evaluation of each product as I would have liked, but I’ve still come to some general conclusions.
IBM SPSS Modeler offers conventional ML training and scoring in a Windows or online UI. It’s very good, but expensive. Bluemix Predictive Analytics can run the SPSS models as a Web service and return predictions. It can also run batch jobs to update the models.
Watson Services in Bluemix offer cloud services and APIs for useful and specialized ML applications. There are 15 IBM Watson services offered, which can be incorporated into your own applications. While they are all different, they appear to be good, reasonably priced additions to a programmer’s bag of tricks.
Watson Analytics is a Web application for analyzing data with ML and associated tools, including data exploration. Watson Analytics tries so hard to be easy to use that it makes me feel disoriented and makes me want to rip off the UI and fiddle with the code. I can see the value of Watson Analytics for its intended audience of business people not trained in data science, but I don’t particularly like it myself.
Actual data scientists will probably want to skip Watson Analytics in favor of SPSS Modeler and Watson Services in Bluemix. Business analysts could use Watson Analytics, but they might be better off using Tableau for their exploratory data analysis, then collaborating with a data scientist to develop their predictive models.
InfoWorld Scorecard | Variety of models (25%) | Ease of development (25%) | Integrations (15%) | Performance (15%) | Additional services (10%) | Value (10%) | Overall Score (100%)
---|---|---|---|---|---|---|---
IBM Watson and Predictive Analytics | 10 | 9 | 9 | 9 | 9 | 8 |
Copyright © 2016 IDG Communications, Inc.