Welcome to InfoWorld’s Technology of the Year Awards, our annual celebration of the best, most innovative, and most important products in the information technology landscape. In this 2019 edition of the awards, it will come as no surprise that containers, cloud-native application stacks, distributed data processing systems, and machine learning are major themes.
Among our 17 winners, you’ll find three leading machine learning libraries, a distributed training framework that accelerates deep learning, and an automated platform that guides nonexperts through feature engineering, model selection, training, and optimization. That makes more picks in machine learning than any other product category, including software development—a reflection of the astonishing level of activity in the space.
Three databases made our list of winners this year: a wide-column data store, a multi-purpose data store, and a database that seems as much application platform as data store. Because data always has to move from here to there, preferably in real time, we’ve also included two leading platforms for building stream processing applications.
Read on to learn about this year’s winners.
Kubernetes
Kubernetes (aka K8s) has had an astonishing rise over the past couple of years. Once one of a crowd of container orchestration systems, it is now rapidly becoming the standard platform everywhere, whether on one of the major cloud providers or in on-premises enterprise installations. If you’re in the operations realm, spending time getting to grips with Kubernetes will likely pay dividends as the open source project continues its relentless march.
Based on ideas and lessons learned from running Google’s massive data centers over the course of a decade, Kubernetes is a battle-tested platform for deploying, scaling, and monitoring container-based applications and workloads across large clusters. In the past year, Kubernetes releases brought major highlights including an overhaul of storage with a move to the Container Storage Interface, TLS-secured Kubelet bootstrapping, and improved support for Microsoft Azure.
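To give a flavor of what “deploying and scaling” means in practice, here is a minimal Deployment manifest sketch; the names and container image are hypothetical, but the shape is the standard one Kubernetes expects:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend            # hypothetical app name
spec:
  replicas: 3                   # Kubernetes keeps three pods running at all times
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
      - name: web
        image: example/web-frontend:1.0   # hypothetical image
        ports:
        - containerPort: 8080
```

Applied with `kubectl apply -f deployment.yaml`, this tells the cluster the desired state, and Kubernetes does the work of converging on it and keeping it there.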
We’ve also seen important additions to the core Kubernetes stack, such as Istio, which defines a service mesh for even more control over deployment, observability, and security. And we’ve seen more specialized frameworks appear, such as Kubeflow, which allows you to easily spin up TensorFlow or PyTorch machine learning pipelines on Kubernetes, all controlled by Jupyter Notebooks likewise running on the cluster.
The number of third-party tools and frameworks aimed at easing some aspect of Kubernetes management—from simplifying app definitions to monitoring multiple clusters—seems to grow with each passing day. As does the number of Kubernetes adopters, with major announcements and testimonials in 2018 coming from the likes of IBM, Huawei, Sling TV, and ING. Heck, even Chick-fil-A is running Kubernetes in every restaurant. Isn’t it about time you jumped on board?
—Ian Pointer
Firebase
In the future we may or may not have quantum computing, mind-reading AIs, and sublinear algorithms for solving the traveling salesman problem, but whatever comes along, we can be sure that we’ll call it a “database.” All great software technology eventually gets absorbed by the Borg tended by the DBAs. The emergence of Firebase is a good example of just how this will happen.
At first glance, Firebase looks like a simple storage solution for keys and their accompanying values. In other words, a bag of pairs that is kept reasonably consistent just like the other NoSQL databases. But over the years, Google has been adding features that have let Firebase do more and more of the work that a cloud-based web app might do. Google has even started referring to Firebase as a mobile platform.
Remember the challenge of caching data on the client when the Internet is less than perfect? The Firebase team realized that the synchronization routines that keep the database consistent are also ideal tools for pushing and pulling data from your mobile client. They opened up their synchronization process, and now your code doesn’t need to juggle a complicated handshaking algorithm or fiddle with the network. You just hand your bits to Firebase and, like magic, they appear in the copy on the handset. It’s all just one big database, and your server routines and client routines simply read and write from the communal pool.
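The pattern is easier to see in a toy model. The sketch below is not the Firebase SDK; it only illustrates the idea that server and client read and write one shared store, with listeners notified whenever a key changes instead of anyone managing the network by hand:

```python
class SharedStore:
    """Toy model of a synchronized key/value store (not the real Firebase API)."""

    def __init__(self):
        self._data = {}
        self._listeners = []  # callbacks fired on every write

    def on_change(self, callback):
        # A client registers interest in changes instead of polling the network.
        self._listeners.append(callback)

    def set(self, key, value):
        # Any writer -- server code or mobile client -- updates the communal
        # pool; the store pushes the change out to everyone listening.
        self._data[key] = value
        for notify in self._listeners:
            notify(key, value)

    def get(self, key):
        return self._data.get(key)


store = SharedStore()
seen = []
store.on_change(lambda k, v: seen.append((k, v)))  # the "handset" side
store.set("score", 42)                             # the "server" side
```

The real service adds persistence, offline queues, and conflict handling on top, but the programming model your code sees is this simple.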
Google keeps adding more as the company integrates Firebase with the rest of its stack. Authentication? Your social log-in to Facebook or, of course, Google, will get your users access to the right slices of the database. Analytics? Hosting? Messaging? All of Google’s solutions are gradually being pulled under the umbrella of the database. And the machine learning of the future? It’s already a beta option for Firebase users who want to analyze the key/value pairs already in the database. In a sense, we’ve already started to merge AIs with databases.
—Peter Wayner
Serverless Framework
The first generation of cloud, which rented us servers, saved us time by lifting all of the tedious hardware-related duties from our shoulders. The servers lived in distant buildings where the heating, cooling, and maintenance were someone else’s problem. The next generation of cloud technology is getting rid of the servers, at least in name, and saving us not only from fretting over operating system patches and updates, but from most of the headaches associated with application delivery.
There is still server hardware and an operating system somewhere under our code, but now even more of it is someone else’s responsibility. Instead of getting chores that come with root access, we can just upload our functions and let someone else’s stack of software evaluate them. We can focus on the functions and leave everything else to the little elves that keep the clouds running.
But there are challenges. Serverless computing means rethinking technical architectures. Relying on events and asynchronous queues requires refactoring applications into neatly divided tasks. While some tooling support has arrived, much still needs to be figured out: integration debugging, distributed monitoring, deployment packaging, function versioning, etc.
Then there is vendor lock-in to worry about. The leading FaaS (functions as a service) providers—AWS Lambda, Microsoft Azure Functions, and Google Cloud Functions—all have their own specialized methods for deployment and operation.
That is where Serverless Framework comes to the rescue, offering a layer of abstraction over vendor-specific implementations to streamline app deployment. The open source framework gives you convenient ways to test and deploy your functions to various cloud providers and eases configuration updates via a common YAML file, while also providing rich features for function management and security.
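As a sketch of what that common YAML looks like, a minimal `serverless.yml` might read as follows; the service, function, and handler names here are hypothetical:

```yaml
service: image-resizer            # hypothetical service name

provider:
  name: aws                       # swap for azure, google, etc.
  runtime: python3.8
  region: us-east-1

functions:
  resize:
    handler: handler.resize       # hypothetical module.function
    events:
      - http:
          path: resize
          method: post
```

A single `serverless deploy` then packages the function, provisions the HTTP trigger, and pushes everything to the chosen provider.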
In addition to the majors, Serverless Framework supports Kubeless, a framework for deploying FaaS on Kubernetes clusters, and Apache OpenWhisk, a Docker-based platform that underpins IBM Cloud Functions and offers broad language support and unique features to handle more-persistent connections.
Serverless computing is neither mature nor a silver bullet for every use case, but the economics and efficiency are hard to resist. With Serverless Framework available to smooth over the bumps, why not join the growing number of businesses turning to serverless to slash operational costs and speed up deployments?
—James R. Borck
Elastic Stack
If you’re running a user-facing web application these days, providing sophisticated search functionality is not optional. Users are constantly being presented with free-text search interfaces that will fix their spelling, automatically suggest alternative phrases, and highlight search results to show them why certain results were returned. Like it or not, these are the search standards you have to live up to.
Luckily, the Elastic Stack will meet all of your search needs and much more. Consisting primarily of Elasticsearch, Kibana, Logstash, and Beats, the Elastic Stack supports many use cases including user-facing document search and centralized log aggregation and analytics. Indexing documents one at a time or in bulk into Elasticsearch is a breeze from almost any language, complete with best guesses for mapping types for all of your fields (think column data types in relational databases). Now you have the full search API at your disposal, including fuzzy search, highlighting, and faceted search results. Pair that with a front-end tool like Searchkit and you’ll have a quick prototype of faceted, free-text searching in no time.
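For instance, a fuzzy, highlighted match against a hypothetical `title` field uses the standard query DSL; a query body along these lines tolerates a typo or two and marks up the hits:

```json
{
  "query": {
    "match": {
      "title": {
        "query": "kubernetes",
        "fuzziness": "AUTO"
      }
    }
  },
  "highlight": {
    "fields": { "title": {} }
  }
}
```

POSTed to an index’s `_search` endpoint, this returns scored results with the matching terms wrapped in highlight tags, ready for a front end to render.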
Aggregating logs from any number of separate services couldn’t be easier using Logstash and Beats, allowing you to send log lines to a centralized Elasticsearch cluster for easier troubleshooting and analytics. Once you have log data indexed, use Kibana to build charts and assemble dashboards to get system health at a glance. The Elastic Stack is one of today’s must-haves for any new web project.
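A minimal Logstash pipeline for that setup might look like the following; the port and host are assumptions for a local cluster:

```conf
input {
  beats {
    port => 5044                  # Filebeat and friends ship log lines here
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]   # assumed local Elasticsearch cluster
  }
}
```

Beats agents on each host forward logs to this pipeline, and everything lands in one searchable index for Kibana to chart.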
—Jonathan Freeman
DataStax Enterprise
Apache Cassandra—an open-source large-scale distributed column-family database inspired by Google’s Bigtable paper—is a great way to run massively scalable global data infrastructure. The masterless design is ideal for running many types of high-throughput cloud applications.
However, Cassandra is not the easiest system to deploy and manage. It also leaves you wanting when you try to build applications involving analytics, search, and graph operations. DataStax Enterprise (aka DSE) adds these capabilities along with improved performance and security, vastly improved management, advanced replication, in-memory OLTP, a bulk loader, tiered storage, search, analytics, and a developer studio.
Like Bigtable and Cassandra, DataStax Enterprise is best suited for large databases—terabytes to petabytes—and is best used with a denormalized schema that has many columns per row. DataStax and Cassandra users tend to use it for very large-scale applications. For example, eBay uses DataStax Enterprise to store 250TB of auction data with 6 billion writes and 5 billion reads daily.
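In CQL terms, that denormalized, query-first modeling might look like this hypothetical table, which keeps every bid for an auction in one wide partition so the hot read is a single ordered scan:

```sql
-- Hypothetical schema: one partition per auction, one row per bid,
-- clustered newest-first so "latest bids for auction X" reads one partition.
CREATE TABLE auctions.bids_by_auction (
    auction_id uuid,
    bid_time   timestamp,
    bidder_id  uuid,
    amount     decimal,
    PRIMARY KEY ((auction_id), bid_time)
) WITH CLUSTERING ORDER BY (bid_time DESC);
```

Duplicating data across tables like this, one table per query pattern, is the idiom that lets Cassandra and DSE scale writes and reads linearly.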
DataStax Enterprise 6 brought several new features in DSE Analytics, DSE Graph, and DSE Search in 2018, along with finer-grained security settings. Improvements to DataStax Studio track the improvements in DSE Analytics, such as support for Spark SQL, and expanded IDE support for DSE Graph with interactive graphs. To top it all off, benchmarks show DSE 6 to be multiples faster than Cassandra (see InfoWorld’s review).
—Andrew C. Oliver
Apache Kafka
Honestly, it’s odd to imagine a world without Apache Kafka. The distributed streaming platform will soon celebrate its eighth birthday, and the project continues to be the rock-solid open source choice for streaming applications, whether you’re adding something like Apache Storm or Apache Spark for processing or using the processing tools provided by Apache Kafka itself. Kafka can handle low-latency applications without breaking a sweat, and its log-based storage makes it a great choice where reliability is required.
For interfacing with databases and other data sources, Kafka Connect includes a host of connectors to popular offerings such as Microsoft SQL Server, Elasticsearch, HDFS, Amazon S3, and many more, allowing you to flow data into your Apache Kafka cluster simply by editing a configuration file. Imagine setting up an entire pipeline from a database to Amazon S3 without having to write custom code—or touch any Java code whatsoever.
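As an illustration, a JDBC source connector posted to the Kafka Connect REST API needs little more than a JSON config along these lines; the connection details and names here are hypothetical:

```json
{
  "name": "orders-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:sqlserver://db.example.com:1433;databaseName=shop",
    "mode": "incrementing",
    "incrementing.column.name": "order_id",
    "topic.prefix": "sqlserver-"
  }
}
```

Connect then polls the table for new rows and streams them into a `sqlserver-` prefixed topic, no custom code required.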
Confluent, one of the major developers of Apache Kafka—including the original creators: Jay Kreps, Neha Narkhede, and Jun Rao—offers a platform that builds on top of the open source offering. While this includes traditional enterprise goodies such as better operational user interfaces, it also includes KSQL, a library that provides you with the ability to interrogate and process the data held within Kafka topics using straight SQL.
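A taste of what that looks like, using a hypothetical clickstream topic:

```sql
-- Declare a stream over an existing topic, then aggregate it with plain SQL.
CREATE STREAM clicks (user_id VARCHAR, url VARCHAR)
  WITH (KAFKA_TOPIC = 'clicks', VALUE_FORMAT = 'JSON');

SELECT user_id, COUNT(*) AS views
FROM clicks
GROUP BY user_id;
```

The query runs continuously, updating its counts as new events arrive on the topic rather than returning a one-shot result set.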
And if you don’t feel up to the task of running Apache Kafka yourself, Google offers a managed platform in conjunction with Confluent, while Amazon has Managed Streaming for Kafka (Amazon MSK). Amazon MSK is currently in public preview, likely to hit general availability sometime in 2019.
—Ian Pointer
Apache Beam
Apache Beam takes a forward-thinking approach to developing batch and stream processing pipelines. Unlike most platforms, Beam abstracts away the development language from the final execution engine. You can write your pipeline in Java, Python, or Go, then mix and match a runtime engine to fit your specific needs—say, Apache Spark for in-memory jobs or Apache Flink for low-latency performance.
Your business logic isn’t pegged to a specific execution engine, so you’re not locked in as technologies obsolesce. Plus, developers don’t need to grapple with the specifics of runner configuration.
Internally, Beam manages all of the mechanics for temporal event processing. Whether it’s well-defined batches or out-of-sequence bursts coming from intermittent IoT sensors, Beam aggregates multiple event windows, waits for its onboard heuristics to determine that enough data has accumulated, then fires a trigger to begin processing. Transforms, data enrichment, and flow monitoring are all part of the mix.
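Stripped of Beam’s API, the core idea of windowed event processing looks something like this toy sketch, which buckets timestamped events into fixed (tumbling) windows and only fires a window once a watermark has passed its end:

```python
from collections import defaultdict

def window_counts(events, size, watermark):
    """Toy tumbling-window aggregation (an illustration, not Beam's API).

    events    -- (timestamp, key) pairs, possibly arriving out of order
    size      -- window length in seconds
    watermark -- time before which no more events are expected
    """
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        start = (ts // size) * size        # assign each event to its window
        windows[start][key] += 1
    # Fire only the windows that are complete according to the watermark.
    return {start: dict(counts)
            for start, counts in windows.items()
            if start + size <= watermark}

# Out-of-order events still land in the correct windows.
events = [(3, "a"), (12, "b"), (7, "a"), (14, "b"), (21, "a")]
fired = window_counts(events, size=10, watermark=20)
```

Beam generalizes exactly this pattern: windows, watermarks, and triggers are first-class concepts, and late-arriving data is handled by policy rather than by hand-rolled buffering code.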
Beam supports a multitude of runners (Spark, Flink, Google Dataflow, etc.), I/O transforms (Cassandra, HBase, Google BigQuery, etc.), messaging (Kinesis, Kafka, Google Pub/Sub, etc.), and file sources (HDFS, Amazon S3, Google Cloud Storage, etc.). The open source underpinnings of Beam are even showing up in third-party solutions like Talend Data Streams, which compiles to Beam pipelines.
Apache Beam doesn’t merely provide a solid engine for processing distributed ETL, real-time data analytics, and machine learning pipelines, it does so in a way that future-proofs your investment.
—James R. Borck
Redis
It’s a NoSQL database! It’s an in-memory cache! It’s a message broker! It’s all of the above and then some! Redis provides so many useful capabilities in one bag, it is not surprising that the so-called “in-memory data structure store” has become a staple of modern web application stacks, with library support in just about every programming language you might choose to use.
Redis offers the ability to work at just the level of complexity and power you need for a given job. If all you need is a simple in-memory cache for data fragments, you can have Redis set up and working with your application in just a few minutes. If you want what amounts to a disk-backed NoSQL system, with different data structures and your choice of cache eviction schemes, you can have that with just a little more effort.
Redis 5.0, released in October 2018, introduced many powerful new features, the most significant being the new stream data type. This log-like, append-only data structure makes it possible to build Apache Kafka-like messaging systems with Redis. Other improvements in Redis 5.0 include better memory management and fragmentation control—important performance enhancements for a system built around in-memory storage as its main metaphor.
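The stream type is essentially an append-only log in which every entry receives a monotonically increasing ID (millisecond-sequence pairs in real Redis), so consumers can replay from any point, much like Kafka offsets. This toy model mimics that XADD/XRANGE behavior; it is not the redis-py API:

```python
class ToyStream:
    """Toy append-only log mimicking Redis stream semantics (not redis-py)."""

    def __init__(self):
        self._entries = []   # list of (entry_id, fields), append-only
        self._seq = 0

    def xadd(self, fields):
        # Real Redis assigns IDs like "1526919030474-0"; here we just count up.
        self._seq += 1
        entry_id = f"{self._seq}-0"
        self._entries.append((entry_id, fields))
        return entry_id

    def xrange(self, start_id="-"):
        # Consumers replay the log from any starting ID onward.
        if start_id == "-":
            return list(self._entries)
        return [(i, f) for i, f in self._entries if i >= start_id]


s = ToyStream()
s.xadd({"sensor": "1", "temp": "19.8"})
last = s.xadd({"sensor": "2", "temp": "21.3"})
```

Real Redis streams add consumer groups, blocking reads, and acknowledgment on top, which is what makes the Kafka comparison apt.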
Redis Enterprise, available from Redis Labs, adds advanced features like shared-nothing clusters, automatic sharding and rebalancing, instant auto-failover, multi-rack and multi-region replication, tunable durability and consistency, and auto-tiering across RAM and Flash SSDs.
—Serdar Yegulalp
Visual Studio Code
The beauty of Visual Studio Code is that it can be just as much, or as little, as you want it to be. Visual Studio Code will serve as a fast and lightweight editor, if that’s all you need, or balloon into a full-blown development environment, thanks to plug-ins and add-ons for just about every major language or runtime in use today. Python, Java, Kotlin, Go, Rust, JavaScript, TypeScript, and Node.js (not to mention Microsoft’s own .NET languages) all have excellent support—as do supplementary document formats such as Markdown, HTML, reStructuredText, and LLVM IR.
In addition to broad support and wide adoption, Visual Studio Code stands out for the relentless stream of improvements and additions that pours into the product. No area of functionality has been ignored. Thus you’ll find strong support for Git, Team Foundation Server, Docker, code linting, refactoring, large files, and more. There’s even the ability to run Visual Studio Code in a self-contained directory, opening the door to repackaging Visual Studio Code as a standalone environment for whatever new purpose you could dream up.
—Serdar Yegulalp