The best open source software of 2022

InfoWorld’s 2022 Bossie Awards celebrate the most important and innovative application development, devops, data analytics, and machine learning tools of the year.

IDG

If you’re a software developer today—or a cloud engineer, or a data scientist, or anyone who works with code—then open source software is like the air that you breathe. But open source wasn’t always so ubiquitous. Twenty or thirty years ago, when “free software” was just getting started, open source projects were unusual, and the people who created them were generally academics, researchers, or eccentrics. Yet somehow this eccentricity caught on, and today open source software is, as the saying goes, eating the world.

Open source software projects have become the engines of innovation in virtually every corner of computing. Yesterday’s open source developers built the tools that built the internet, and today’s open source developers build on, inventing new and better tools for front-end development, back-end development, devops, data ops, distributed data processing, data analytics, machine learning… and much, much more.

We salute the year’s best of these cutting-edge open source projects in our 2022 Bossie Awards.

AlmaLinux

Does the world need more Linux distributions? Well, a lot of commercial software has standardized on Red Hat Enterprise Linux. The basically free-as-in-beer version of RHEL was CentOS, which Red Hat acquired and essentially snuffed out, at least as a drop-in replacement for RHEL. This went over predictably poorly with many people who used CentOS, resulting in the arrival of Rocky Linux and AlmaLinux to fill the gap. AlmaLinux claims binary compatibility with RHEL and community ownership.

— Andrew C. Oliver

Podman

New to the genre of science fiction horror… check that… container management comes Podman. Well, actually, Podman 1.0 was released in 2019. Unlike Docker, Podman can run as a single process with an unprivileged user and with comparatively fewer limitations than Docker rootless. Plus, the container images and pods tend to be smaller in Podman than in Docker. Moreover, you can build Kubernetes pods directly in Podman. Migration may even be as simple as alias docker=podman, since Podman supports many of the same commands. Maybe you want to undock and become a pod person?

— Andrew C. Oliver

Play with Docker

Sure, you can do all of the things with your laptop or EKS or GKE or whatever, but what if you just want to putz around with a few containers? Then you can just go to Play with Docker and do the things. While you cannot start running your new startup to do security, AI, or analytics (all new startups do those things now) because of a five-instance, four-hour limit, Play with Docker is a good place to try something out before you fully commit. And because maybe you do not want to expose yourself in public (always a bad idea), maybe you want to install an internal version of Play with Docker from the open source (MIT licensed) repository on GitHub so people in your organization can putz around?

— Andrew C. Oliver

Vaadin

A web framework that allows developers to implement web user interfaces in Java without having to code any HTML or JavaScript? What could be better? Not everyone who codes makes pretty UIs or likes dealing with HTML. Some coders would rather clean the toilet with a toothbrush. Vaadin not only makes new apps simpler to code, but its server-side API is somewhat Swing-like, so converting Swing applications to modern web applications is made somewhat less painful than full rewrites. You can check it out at vaadin.com or fork it from GitHub.

— Andrew C. Oliver

JHipster

JHipster is an ambitious, even visionary, full-stack, rapid application development platform for Java. Its most visionary aspect may be allowing a range of different technologies to fulfill different roles in both the front end and the data layer, but it doesn’t stop there. JHipster delivers a slew of other niceties including a CLI tool that handles scaffold generation and that works against heterogeneous technology stacks: MongoDB fronted by a Vue.js UI, Postgres fronted by React, and many other combos. All stitched together with state-of-the-art Spring/Java middleware.

JHipster also alleviates pain points like adding security via Spring Security. And you’ll get several modern deployment options and CI/CD integrations out of the box.

— Matthew Tyson

Solid

Choosing a winner among so many innovative front-end JavaScript frameworks is luxuriously difficult. Even the Facebook-backed front runner, React, remains an admirably evolutionary project that delivers compelling new features at a regular pace. Vue.js, Angular, and Svelte are all active and impressive. No matter what we choose, someone will say “What about framework X? It does Y better”—and they will be right!

Last year we gave the prize to Svelte. This year we stared long and hard at Astro, Qwik, and Solid. Astro gives us a groundbreaking approach to hydration with the concept of islands, which are usable in other frameworks to boot. Qwik is a bold reimagining of the entire reactivity paradigm from the ground up. Ultimately, Solid wins the day for delivering a host of best-in-class performance features in a familiar and easy-to-grasp package. 

— Matthew Tyson

Redwood

Picking a full-stack JavaScript framework is almost as hard as picking a pure front-end JS framework. The industry-leading Next.js (see the next entry) has not rested on its laurels. It remains a dynamic force that is still pushing the envelope. At the same time, alternatives like SvelteKit and Nuxt, as well as newer entrants like Blitz.js, are exploring new approaches and techniques. Among these newer frameworks, Redwood stands out for daring to have a strong opinion about how an app will be structured. This up-front decision makes for an admirably fast developer experience.

In addition to taking a Rails-like approach to recurring requirements like data modeling and scaffolding, Redwood tackles other real-world demands like security and tracing integrations. And Redwood allows for targeting a variety of deployment environments including serverless platforms like Vercel and Netlify.

— Matthew Tyson

Next.js

Next.js pioneered the full-stack JavaScript framework. Node.js opened the door to isomorphic JavaScript applications, and Next.js walked through it, ushering full-stack JavaScript into practical application. Next.js begins with the simple premise of uniting a React front end with a JavaScript server in a single build pipeline, then elaborates from there. Many important aspects of application development—from routing to data access, security to server-side rendering—are made simpler and more consistent. Moreover, Next.js supports a variety of deployment targets including serverless and edge. Its corporate backer, Vercel, leverages this support to enable automated deployments that hide the complexity of connecting the back end with the front end.

— Matthew Tyson

Wasmtime

Similar to what Node.js does for the JavaScript runtime, Wasmtime allows developers to leverage all of the advantages that WebAssembly provides inside the browser—including safe sandboxed execution, near-native performance, and support across multiple programming languages and platforms—outside the browser. Other Wasmtime benefits include fine-grained adjustments to CPU and memory use, high-speed execution thanks to the Cranelift code generator, and staying abreast of new WebAssembly features.

While earlier Wasmtime releases were already considered production-ready, Wasmtime 1.0 adds a slew of performance-related improvements: faster instantiation of Wasm modules, smarter memory use, and better runtime performance with optimized stack traces and cooperative multitasking. It’s a major milestone.

— Serdar Yegulalp

PyScript

One of the long-gestating promises of WebAssembly is enabling the use of languages other than JavaScript in the web browser. PyScript delivers a full Python runtime in the browser, allowing you to use Python in webpages as a full-blown scripting language. Even some advanced libraries like NumPy are supported, allowing you to construct powerful and complex apps with native HTML front ends, with no need for a Python server on the back end. Note that PyScript is currently experimental and brittle, and there’s typically a long startup time. But as a peek into the future, PyScript is tantalizing, and kicks open the door to a great many possibilities.

— Serdar Yegulalp

Hardhat

Developing for the blockchain is tricky, but new generations of tooling are making it easier. Hardhat is an excellent, open source framework that simplifies coding, testing, and deploying Dapps and smart contracts on Ethereum. Built around an extensible task runner and plug-in framework, Hardhat is flexible enough to handle most development workflows, and integrates with a local Ethereum testnet, which is essential for deploying and debugging code without interacting with the remote testnets.

Hardhat includes an extension for Visual Studio Code that supports Solidity, and offers Chai extensions for Ethereum-specific test case assertions. Beyond these useful features, Hardhat delivers a superb developer experience. Things tend to work just as you expect out of the box, making for a happier happy path.

— Matthew Tyson

OpenFGA

OpenFGA is Auth0’s open source implementation of a universal authorization platform based on Zanzibar, Google’s global authorization system. It is also the engine behind Auth0’s enterprise authorization-as-a-service offering. Addressing a broad range of authorization requirements, from role-based to relationship-based to fine-grained authorization, OpenFGA packs an incredible amount of power and flexibility into a built-for-scale package. It’s not only a major win for authorization know-how in the open source software community, but also a reaffirmation of the fundamental premise that what’s good for open source is good for the enterprise: the freedom of code as speech.
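To make the relationship-based idea concrete, here is a minimal sketch of a Zanzibar-style check in plain Python. It is not OpenFGA's API: the tuple store, the `IMPLIED` rule table, and the `check` function are hypothetical names invented for illustration, and the sample users, group, and document are made up. It only shows how a permission can follow from a chain of relationships rather than from a direct grant.

```python
from typing import Set, Tuple

# Hypothetical in-memory tuple store (an assumption, not OpenFGA's API).
# Each tuple reads: <subject> has <relation> on <object>.
RelationTuple = Tuple[str, str, str]  # (subject, relation, object)

TUPLES: Set[RelationTuple] = {
    ("user:anne", "member", "group:eng"),
    ("group:eng#member", "viewer", "doc:roadmap"),
    ("user:bob", "owner", "doc:roadmap"),
}

# Assumed rule for this sketch: an owner is implicitly a viewer.
IMPLIED = {"viewer": ["owner"]}


def check(subject: str, relation: str, obj: str, tuples=TUPLES, _seen=None) -> bool:
    """Return True if subject holds relation on obj, directly or indirectly."""
    seen = _seen if _seen is not None else set()
    key = (subject, relation, obj)
    if key in seen:  # guard against cyclic definitions
        return False
    seen.add(key)

    for s, r, o in tuples:
        if r != relation or o != obj:
            continue
        if s == subject:
            return True  # direct grant
        # A userset such as "group:eng#member" grants the relation to
        # everyone who holds "member" on "group:eng".
        if "#" in s:
            set_obj, set_rel = s.split("#", 1)
            if check(subject, set_rel, set_obj, tuples, seen):
                return True

    # Fall back to relations that imply the requested one.
    return any(check(subject, implied, obj, tuples, seen)
               for implied in IMPLIED.get(relation, []))
```

In this toy data set, `check("user:anne", "viewer", "doc:roadmap")` succeeds through group membership, and `check("user:bob", "viewer", "doc:roadmap")` succeeds because ownership implies viewing. A real deployment would express these rules in an authorization model and query the service instead of walking tuples in memory.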

— Matthew Tyson

Sentry

Alongside security, error and performance tracing are among the most frustratingly inevitable requirements for many apps. Cue a sigh of relief. Sentry offers an entire ecosystem of open source tools for monitoring the health of applications, services, and APIs, from the server-side API for collecting data, to a dashboard for making it manageable, to a comprehensive slew of application-side integrations.

These integrations address virtually any conceivable stack you might be using, from Perl to Python. Best of all, they are dead simple to use. Just import the library—no need to instrument your business logic with extraneous code. Sentry also offers integration points for a number of tools like project trackers, source control systems, and deployment platforms.

— Matthew Tyson

Appsmith

Appsmith is a low-code framework that helps back-end developers customize software like admin panels, forms, and dashboards with minimal HTML and CSS coding. The platform jumpstarts projects with pre-built UI components and reusable templates, integrates with a broad range of APIs, data sources, and cloud services, and supports both cloud and self-hosted deployment options. Appsmith boasts more than 10 million downloads on Docker and more than 21 thousand stars on GitHub, and it recently announced $41 million in Series B funding. Example use cases include customer support tools and internal processes such as communications.

— Isaac Sacolick

Spinnaker

Spinnaker is an open-source, multi-cloud continuous delivery platform that helps devops teams automate releases and implement canary and other deployment strategies. More than 220 companies use Spinnaker, including Airbnb, SAP, Pinterest, Mercari, and Salesforce, and the community has more than 2,500 contributors. Smaller engineering organizations are successful using Spinnaker too. For example, Upwave’s 20-person engineering team manages 100 deployments per week, with lead times under 20 minutes for changes. Major cloud providers support Spinnaker, and you’ll find several ebooks to help developers get started. Spinnaker has several notable success stories including the 2020 Biden for President campaign.

— Isaac Sacolick

Hypertrace

Built by Traceable on Apache Kafka, Hypertrace is an open-source, distributed tracing and observability engine capable of ingesting and processing huge volumes of real-time performance data from large numbers of services across sprawling cloud-native architectures. Hypertrace monitors your applications and microservices, tracing distributed transactions across their multiple touchpoints, and distills all of this information into service metrics and application flow maps, which it displays in fully customizable dashboards.

In addition to enabling path-based analysis, Hypertrace delivers real-time alerts that help you proactively address performance bottlenecks and troubling application delivery trends before they impact your bottom line. Hypertrace supports popular tracing formats out of the box—including Zipkin and Jaeger—and offers native instrumentation agents for Java, Go, and Python.

— James R. Borck

Gravitee

The Gravitee API management platform allows you to centrally manage, govern, and secure distributed APIs, an absolute necessity to rein in cost and complexity in today’s event-driven API and microservices world. Gravitee’s Cockpit portal pairs guided access with a feature-rich toolset for publishing, documenting, and discovering APIs, while its onboard API designer offers a visual, low-code approach to model development and documentation. Task automation minimizes errors, speeds debugging, and simplifies deployment. Secure access and auditing underpinnings allow you to lock down endpoints through authentication and authorization services. An enterprise license unlocks additional designer and production gateway features, along with perks like an alert engine, anomaly detection, and real-time analytics.

— James R. Borck

OpenTelemetry

For visibility into today’s distributed applications, yesterday’s simple logs and metrics are no longer enough. Hence the rise of observability tools like Zipkin and Jaeger, and paid services like Honeycomb, which allow developers to understand their applications at a much deeper level than ever before. Of course, the downside to this proliferation of new tools is that they all work a bit differently.

OpenTelemetry bridges the gaps between observability systems with a set of standard APIs and tools, uniting the generation, emission, collection, processing, and export of telemetry data in a vendor-agnostic fashion. Did you start out with Observability Product A, but then find yourself wanting to use Observability Product B? OpenTelemetry can make that happen with just a few small configuration changes.

— Ian Pointer

Grafana

Grafana’s creators strived to make one open-source dashboard to rule them all, and it’s hard to find another product that comes close—whether fully proprietary or open source with for-pay options, as Grafana is. Grafana 8.0 merged Prometheus alert visualization with Grafana’s native alerting, and complemented that with Prometheus Alertmanager handling. Grafana 9.0 adds query building tools for the Prometheus PromQL and Loki LogQL query languages; lets you preview dashboards with thumbnails instead of mere descriptions; and promotes role-based access control from beta to general availability in the enterprise edition of the product.

— Serdar Yegulalp

Dapr

With distributed applications, every time you build a new service you confront the same myriad concerns: securing connections, setting up observability, dealing with state, dealing with messaging, etc. All of these things need to be done time and time again, often with different third-party services, all of which add layers of cruft to your code and tie it to external services you may or may not want to continue to use.

Enter Dapr, an incubating project at CNCF, which strives to eliminate some of that hardship and duplication. Running as a sidecar to your application, Dapr abstracts away the complexity of microservice connectivity. Your app talks to Dapr, and Dapr does the rest, so you could be running on AWS and using Kinesis, or running on Google Cloud and using Pub/Sub, and your service need not know the difference. You can spend more time on your application logic and less on all that glue code.
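As a rough illustration of "your app talks to Dapr," the sketch below builds the HTTP request an application might send to its local sidecar to publish a message. It assumes Dapr's HTTP publish endpoint of the form `/v1.0/publish/<pubsub-name>/<topic>` and the conventional sidecar port 3500 (overridable through `DAPR_HTTP_PORT`). The helper name `build_publish_request`, the component name `orders-pubsub`, and the topic `orders` are hypothetical. Only the standard library is used, and nothing is sent unless a sidecar is actually running.

```python
import json
import os
import urllib.request


def build_publish_request(pubsub_name: str, topic: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a publish request aimed at the local Dapr sidecar."""
    # Assumption: the sidecar listens on localhost, port 3500 unless overridden.
    port = os.environ.get("DAPR_HTTP_PORT", "3500")
    url = f"http://localhost:{port}/v1.0/publish/{pubsub_name}/{topic}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical component and topic names; the broker behind "orders-pubsub"
# is chosen in Dapr's component configuration, not in this code.
request = build_publish_request("orders-pubsub", "orders", {"order_id": 42})

# With a sidecar running, the message would be sent like this:
# urllib.request.urlopen(request)
```

The application names only a logical component. Swapping the message broker behind it would be a configuration change on the Dapr side, which is the decoupling the paragraph above describes.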

— Ian Pointer

Redpanda

Redpanda is a drop-in replacement for Kafka written primarily in C++ using the Seastar asynchronous framework and the Raft consensus algorithm for its distributed log. It can deliver up to 10x lower average latencies and up to 6x faster Kafka transactions, all while running on fewer resources. Redpanda does not require using ZooKeeper or the JVM, and its source is available on GitHub under the Business Source License (BSL).

Even beyond its reimplementation in C++, Redpanda uses an asynchronous, shared-nothing, thread-per-core model, with no locking, minimal context switching, and thread-local memory access. Redpanda goes beyond the Kafka protocol into the future of streaming with inline WebAssembly transforms and geo-replicated hierarchical storage/shadow indexing.

— Martin Heller

Apache Iceberg

A high-performance format for huge analytic tables, Apache Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines such as Trino, Spark, Sonar, Presto, Hive, Flink, and Impala to safely work with the same tables, at the same time. Iceberg supports flexible SQL commands to merge new data, update existing rows, and perform targeted deletes. It can eagerly rewrite data files for read performance, or it can use delete deltas for faster updates. And Iceberg supports schema evolution, automatic partitioning, time travel queries, version rollback, and data compaction out of the box.
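Time travel and rollback both follow from treating a table as a sequence of immutable snapshots. The toy class below is a sketch of that idea only. It is not Iceberg's file format or API, and `ToyTable` and its methods are hypothetical names. Every write produces a new snapshot, readers can ask for an older one, and rollback just moves the "current" pointer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]


@dataclass
class ToyTable:
    # Snapshot 0 is the empty table; commits append new snapshots and never
    # modify earlier ones (rows are assumed to be treated as read-only).
    snapshots: List[Tuple[Row, ...]] = field(default_factory=lambda: [()])
    current: int = 0  # index of the snapshot that readers see by default

    def _commit(self, rows: List[Row]) -> int:
        self.snapshots.append(tuple(rows))
        self.current = len(self.snapshots) - 1
        return self.current

    def append(self, rows: List[Row]) -> int:
        """Add rows on top of the current snapshot; returns the new snapshot id."""
        return self._commit(list(self.snapshots[self.current]) + rows)

    def delete_where(self, predicate: Callable[[Row], bool]) -> int:
        """Targeted delete: commit a snapshot without the matching rows."""
        kept = [r for r in self.snapshots[self.current] if not predicate(r)]
        return self._commit(kept)

    def scan(self, snapshot_id: Optional[int] = None) -> List[Row]:
        """Read the current snapshot, or an older one for a time travel query."""
        sid = self.current if snapshot_id is None else snapshot_id
        return list(self.snapshots[sid])

    def rollback(self, snapshot_id: int) -> None:
        """Point the table back at an earlier snapshot without deleting history."""
        if not 0 <= snapshot_id < len(self.snapshots):
            raise ValueError("unknown snapshot")
        self.current = snapshot_id


table = ToyTable()
first = table.append([{"id": 1}, {"id": 2}])
second = table.delete_where(lambda row: row["id"] == 1)
```

After these two commits, `table.scan()` sees only the row with `id` 2, while `table.scan(first)` still returns both rows. Calling `table.rollback(first)` makes the older state current again.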

— Martin Heller

Apache Druid

Apache Druid is a real-time analytics database that delivers sub-second queries, high concurrency, and real-time and historical insights with stream ingestion from Kafka, Kinesis, and other platforms. The technology builds on 10 years of releases, more than 400 contributors, and a distributed data store architecture that combines ideas from data warehouses, time series databases, and search systems. Thousands of companies including Netflix, Salesforce, and Walmart use Druid to power analytics applications. Use cases include clickstream analytics, risk and fraud analysis, and supply chain analytics. Developers can review the introduction to Apache Druid and the ebook of success stories to get started.

— Isaac Sacolick

JAX

Among the innovations that power Google’s popular open source TensorFlow machine learning platform are automatic differentiation (Autograd) and the XLA (Accelerated Linear Algebra) optimizing compiler for deep learning. JAX, also from Google, is another project that brings together these two technologies, and it offers considerable benefits for speed and performance. When run on GPUs or TPUs, JAX can stand in for NumPy in programs that call it, and those programs run much faster. (The Autograd engine can automatically differentiate native Python and NumPy code.) Additionally, using JAX for neural networks can make adding new functionality much easier than expanding a larger framework like TensorFlow.
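For readers new to automatic differentiation, the following sketch shows the core idea in plain Python using dual numbers (forward mode). This is a teaching device swapped in for illustration, not JAX's implementation, which traces and transforms programs and then compiles them with XLA. The `Dual` class and the `derivative` helper are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to the input."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x: Dual) -> Dual:
    # Chain rule: d/dx sin(u) = cos(u) * u'
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def derivative(f, x: float) -> float:
    """Differentiate f at x by seeding the input's derivative with 1."""
    return f(Dual(x, 1.0)).deriv


def f(x):
    return 3 * x * x + sin(x)
```

Here `derivative(f, 2.0)` agrees with the hand-derived `6 * 2.0 + math.cos(2.0)` up to floating-point rounding. The function `f` is ordinary Python, and the derivative comes from how the operations are evaluated rather than from symbolic manipulation or finite differences.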

— Martin Heller

nbdev

One of the dirty secrets of notebook programming, using environments like Jupyter or Google Colab, is that it produces some of the worst spaghetti code you’ve ever seen, with data scientists hopping from cell to cell and creating an unmaintainable mess. Some even go so far as to say that notebook programming might be as harmful as GOTO was back in the day.

nbdev embraces the good of notebook programming and attempts to alleviate the bad. A Git-friendly, Jupyter notebook-driven development platform from fast.ai, nbdev gives data scientists the exploratory freedom they require, but also the ability to easily create modules with documentation and, yes, even proper tests, all within the same notebook. You’ll find it in use at companies like Netflix and Lyft and (naturally) fast.ai, which used nbdev to create the new version of the fast.ai library.

— Ian Pointer

Accelerate

What if you could add distributed training and inference at huge scale to any PyTorch code just by adding four lines of code? Straight out of the box, Hugging Face’s Accelerate allows you to use features like TPU devices or Microsoft’s DeepSpeed optimizations via simple configuration switches. Yes, you can train deep learning models at billion-parameter scale using techniques such as distributed training, sharded parallelism, and gradient accumulation, all handled behind the scenes by the Accelerate library. Making sure that the training of big models is not just limited to the heavyweights in the industry is important for diversity and experimentation, so it’s heartening to see Accelerate becoming part of the PyTorch ecosystem.

— Ian Pointer

Stable Diffusion

Stable Diffusion is a text-to-image AI model that generates images of simply astonishing quality. Barely two months old, the project has caught on like wildfire, with enthusiasts across the world already improving on the original work to make generation faster, to run on lower-memory GPUs, and to add in-painting and out-painting support. They’ve even got Stable Diffusion running on M1-powered MacBooks.

Stability.ai spent $600K training this model, and immediately gave it away as open source (contrast OpenAI’s DALL-E). While such a model definitely prompts concerns over dataset curation and the ability to create NSFW images, it is almost certainly better for this technology to be in everybody’s hands rather than just a few giant corporations, both for advancing research and for generating art pieces for years to come.

— Ian Pointer

EleutherAI

GPT-NeoX-20B is the new 20 billion parameter natural language processing model created by EleutherAI, publisher of the earlier GPT-J, a 6 billion parameter model. These models may seem small compared to OpenAI’s GPT-3, which has 175 billion parameters, but they have achieved strong benchmark results using LAMBADA, Winogrande, Hellaswag, and other datasets. You can test GPT-J in sentence completion and perform more advanced NLP tasks like translation and classification.

What’s behind EleutherAI’s push to open source such powerful models? Connor Leahy, one of the project’s founders, explained, “We have to think of AIs as weird aliens that don’t think like us.” The goal is to make this technology accessible to as many researchers as possible, so we can learn how to control it.

— Isaac Sacolick
