Bossie Awards 2016: The best open source application development tools
InfoWorld's top picks among the tools and frameworks for building web apps, mobile apps, and apps for data analysis and machine learning
The best open source development tools
For years we’ve been building applications that collect data from users and serve it back to them. We’re finally starting to do something with that data. Along with the best open source tools for building web apps, native mobile apps, and robotics and IoT apps, this year’s Bossie winners in application development include top projects for data analysis, statistical computing, machine learning, and deep learning. After all, if our applications can be reactive, responsive, and even “ambitious,” they can also be intelligent.
Johnny-Five
You might think the JavaScript language was a mismatch for robotics and IoT applications, but you’d be wrong. The movement to apply JavaScript to robotics has been growing since 2010. Yes, many production apps for microcontrollers are written in C or Python, but asking a student learning robotics to master C programming first is making things way too hard.
The Johnny-Five framework wants to be a baseline control kit for hardware projects. It supports a bunch of single-board computers, including Arduino (all models), Electric Imp, BeagleBone, Intel Galileo and Edison, Linino One, Pinoccio, pcDuino3, Raspberry Pi, Particle/Spark Core and Photon, Tessel 2, and TI LaunchPad. Johnny-Five can support so many boards because it is based on the Firmata protocol and has an IO plug-in architecture.
Depending on the board, Johnny-Five may run in an on-board Linux environment, or it may run on a host machine that talks to the board over serial USB, Ethernet, Wi-Fi, or Bluetooth. Running tethered isn’t a bad thing in a development environment; in production you might want to switch to a board with wireless or internal Linux support.
In Johnny-Five, the basic abstractions are hardware building blocks: boards, LEDs, servos, GPS, motors, relays, buttons, switches, sensors, and so on. Each class has pretty much the methods, properties, events, and collections you’d expect.
-- Martin Heller
Angular
AngularJS has been a popular framework for several years, primarily for its two-way data binding and its model-view-whatever (MVW) architecture. Angular 2 is a complete rewrite of AngularJS that uses TypeScript in preference to JavaScript. It is not backward compatible with AngularJS, though the ngUpgrade library lets the two frameworks run side by side during a migration.
Angular 2 supports web apps, native mobile apps (via various native renderers), and desktop apps (with Electron). Using TypeScript gives Angular 2 compile-time type and interface validation. TypeScript supports editing and debugging tooling that is aware of the Angular 2 framework and can provide good code completion and error flagging.
Angular 2 was designed with runtime efficiency in mind, as well as productivity: Angular view templates, which have a fairly simple syntax, are compiled into JavaScript that is optimized for modern JavaScript engines. The new component router in Angular 2 can do code-splitting (lazy loading) to reduce the amount of code delivered to render a view. This wasn’t introduced without problems. The initial two versions of the router had some issues with deep links into lazy-loaded code sections. The third time is supposed to be the charm; we’ll know for sure once it has been extensively field-tested.
In addition, Angular 2 has a CLI and a useful wrapper for the Web Animations API. It supports testing with Karma and Protractor, and it has fully ARIA-enabled components for accessibility.
-- Martin Heller
Bootstrap
Bootstrap claims to be the most popular HTML, CSS, and JavaScript framework for developing responsive, mobile-first projects on the web. Given that the main twbs/bootstrap repository on GitHub has more than 97,000 stars and almost 12,000 commits, that may well be so.
Bootstrap can speed up your responsive web development a great deal, especially if you use its fluid grid layout and theme stylesheets. On the other hand, Bootstrap tends to be porky as far as the amount of code it pulls in, violates several web development best practices, and can make your site look exactly like every other plain vanilla Bootstrap site on the planet (it’s popular, remember) if you don’t take the time to customize the appearance.
A front-end framework originally from Twitter, Bootstrap has nothing to say about the back end of an app. You’ll need to make the form actions point to your web or application server, or you can use a data-binding framework such as Angular along with Bootstrap. If you choose the latter, make sure you aren’t pulling in two copies of jQuery. Instead use one of the Angular UI Bootstrap projects that tie the two frameworks together properly.
The current production Bootstrap version as of this writing is v3.3.7; Bootstrap 4 is in alpha test.
-- Martin Heller
Ember
The developers of Ember consider it a framework for developing ambitious web applications. Often compared to Angular, Ember uses a model-view-viewmodel (MVVM) pattern, Handlebars templates for data binding, routes to handle URLs and invoke models, customizable HTML components, and a command line for scaffolding and the imposition of convention over configuration. In some ways, you could say that Ember is to Rails what JavaScript is to Ruby.
There are several optional parts in the Ember stack. Ember Data is a data-persistence library that provides many of the facilities of an object-relational mapper. Ember Inspector helps debug Ember apps in Chrome and Firefox. Fastboot lets Ember apps run in Node. Liquid Fire does animations for Ember.
Ember users sometimes report that the fixed conventions of Ember make it easy for them to understand unfamiliar Ember apps. (This is often said of Rails apps as well.) The downside of these conventions is that you have to bounce around at least eight directories to browse the source. An editor or IDE that can follow the references easily and understands .hbs files helps a great deal when working with Ember code.
Major Ember users include Zendesk, Yahoo, and Square.
-- Martin Heller
SamsaraJS
Last year we noted the promising Famous Engine, which made mobile web applications fast enough for 60fps animation, and the abrupt end to its open source development. One of the former senior developers at Famo.us, Dave Valdman, has since created the SamsaraJS project, a functional reactive library for animating layout. Like Famous, Samsara is about presentation -- moving rectangles around the screen at 60fps -- rather than content, and about web apps, not native mobile. It’s most definitely not an MVC framework, but you can use an MVC framework to set content into Samsara surfaces.
According to Valdman, “Samsara is about exploring user interfaces that rely heavily on gestures and animation. Often animation is used as some kind of attention-grabbing trick, but it can be more useful than that. Animation is the difference between discrete user interactions and continuous ones. For example ‘click page next’ versus ‘infinite scroll,’ or ‘click to close’ versus ‘swipe away.’ I think in the future, UIs that animate continuously with their user will be the norm, and clicking on things will be a thing of the past. Samsara tries to make these kinds of interactions easier and performant.”
-- Martin Heller
Bower
Bower is a front-end package manager often used for configuring versioned web projects, installing components such as frameworks, libraries, assets, and utilities. Bower requires Node.js, NPM, and Git, and it works hand-in-hand with build tools such as Grunt and Gulp, scaffolding tools such as Yeoman, and module loaders such as RequireJS. Bower is supported by several IDEs, including Visual Studio, NetBeans, and WebStorm.
Bower works by fetching and installing packages from all over, using its registry to find them on GitHub if you don’t specify a GitHub endpoint or a URL. Bower keeps track of local packages in a manifest file, bower.json. How you use packages is up to you. The Bower command line allows you to maintain your bower.json files as well as install all the packages mentioned in any existing bower.json manifest.
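A bower.json manifest is plain JSON. A minimal sketch might look like the following (the project name and the packages and versions listed are only illustrative):

```json
{
  "name": "my-app",
  "description": "Example front-end project",
  "private": true,
  "dependencies": {
    "bootstrap": "^3.3.7",
    "jquery": "^2.2.4"
  }
}
```

Running bower install in a directory containing this file pulls the listed packages, and their own dependencies, into bower_components.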
Bower originated at Twitter and now has a couple of hundred contributors.
-- Martin Heller
Yeoman
Yeoman is a scaffolding tool for web applications that works with a generator ecosystem, with build tools such as Gulp and Grunt, and with package managers such as NPM and Bower. There are more than 4,000 generators in the Yeoman ecosystem, including very popular ones with 10,000-plus installs, for tools and frameworks such as Angular, Karma, Mocha, and Jasmine.
To get started with Yeoman, install it with NPM:
npm install -g yo
Then either install a generator with NPM or use Yeoman interactively to install a generator:
npm install -g generator-webapp
Alternatively, typing yo
at the command line allows you to interactively search for and install a Yeoman generator with Yeoman itself -- handy because there are so many different generators available.
Once you have the generator you need installed, you can use it to create scaffolding for a project:
yo webapp
Yeoman can create projects in any language. The yeoman/yeoman repository on GitHub has more than 8,000 stars and 1,000 commits.
-- Martin Heller
JSHint
Starting in 1979, Lint was a standard Unix tool for checking C code for suspicious and nonportable constructs that might not be caught by the compiler. In the C/C++ world, most of the functions of Lint have long since been incorporated in the compiler.
New languages have spawned new tools, however. Lint-checking is especially important for JavaScript because it’s an interpreted, weakly typed language.
There are several good JavaScript linters: JSLint, its friendlier successor JSHint, and the pluggable ESLint are the top three by most accounts. JSHint is flexible and easy to set up. JSLint is less flexible, and ESLint is harder to set up. The jQuery, Bootstrap, and CouchDB projects are all standardized on JSHint, as are many software companies that use JavaScript heavily, such as Facebook, Twitter, and Yahoo.
In general, you should have JSHint plugged into your build process in multiple places. Install it to run automatically in the code editor you use for JavaScript projects so that you can fix problems as you code. Also run it automatically as part of the Grunt, Gulp, or other automation scripts you use to build your projects.
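Wherever JSHint runs, it reads its options from a .jshintrc file at the project root, which editor plug-ins and build tasks pick up automatically. A minimal sketch (the particular options shown are a matter of team taste):

```json
{
  "esversion": 6,
  "undef": true,
  "unused": true,
  "browser": true,
  "globals": {
    "jQuery": false
  }
}
```

Checking this file into version control means every developer and every build agent lints against the same rules.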
-- Martin Heller
Swift
Apple released the Swift programming language, one of the fastest-growing languages in history, to open source in December 2015. Since then, Swift has become more and more useful. The language has been slimmed down to be much less verbose, and the Core Foundation import and bridging have been beefed up to better handle memory management and type mapping between Swift code and system APIs.
According to Apple, Swift was designed to build on the best of C and Objective-C, but not to be backward-compatible with C. Swift has “safe programming patterns” and memory management by automatic reference counting. Swift got named parameters and the dynamic object model from Objective-C, and it has access to Cocoa frameworks.
Alas, Swift also picked up incredibly bloated source code from Objective-C. For example, take this bit of string code:
" Hello ".stringByTrimmingCharactersInSet(.whitespaceAndNewlineCharacterSet())
Let’s see. We know that the base object is a string, and white space and newlines are a character set. Do we need all that verbiage?
In Swift 3, the answer is no. The code above becomes:
" Hello ".trimmingCharacters(in: .whitespacesAndNewlines)
That’s the good news. The bad news is that all existing Swift code needs to be upgraded, typically using the Migrator tool in Xcode 8.
More good news is that Swift is also available on Linux (officially) and Windows (unofficially).
-- Martin Heller
Visual Studio Code
Visual Studio Code is a lightweight, portable, open source IDE from Microsoft, written mostly in TypeScript and built on top of the Electron shell. VS Code provides comprehensive editing and debugging support, an extensibility model, Git support, and lightweight integration with existing tools. There is a good selection of extensions to VS Code available from Microsoft and the community.
VS Code 1.3 introduced Visual Studio-like tabs in the editor, as well as Visual Studio-like global search and replace and diff-ing. Version 1.3 also introduced a new paradigm for stacks of open editors and the concept of a preview editor. Single-clicking a file in the Explorer brings the file up in a nonsticky editor that will be replaced by the next preview. Editing the file makes it sticky. VS Code retains its Visual Studio-like peeked editors.
VS Code separates the editor from language services. Various language services offer different levels of functionality. TypeScript and C# support refactoring on top of IntelliSense, while C++, CSS, HTML, JavaScript, JSON, Less, PHP, Python, and Sass support IntelliSense, outlining, and linting, but not refactoring. Another 20-odd languages only support syntax coloring and bracket matching.
VS Code has built-in debugging support for the Node.js runtime, and it can debug JavaScript, TypeScript, and any other language that gets transpiled to JavaScript. PHP, Ruby, Go, C#, Python, and many other languages have extensions that support debugging.
VS Code runs on OS X, Linux (both Debian and Red Hat distros), and Windows. VS Code on Windows requires .NET Framework 4.5.
-- Martin Heller
R
The R language, along with its environment, implements statistical computing and graphics: linear and nonlinear modeling, statistical tests, time series analysis, classification, clustering, and so on. As I discussed in 2015, R is a popular and useful tool for data scientists and statisticians that is increasingly being applied to big data. R is often compared to Python (augmented with NumPy, Pandas, and Statsmodels) for data analysis and machine learning.
The R project uses the GNU GPL 2 license and is run by an open source foundation that is dominated by the core development team. The Comprehensive R Archive Network (CRAN) hosts R source and binaries, as well as the many R add-on packages. There is an R Journal and an annual R user conference.
RStudio is a very good R development environment that is free and open source, but also has supported commercial licenses. There are a half-dozen other R IDEs and a dozen editors with R support, including Emacs.
Microsoft recently acquired Revolution R, and it released an R Server, as well as an R component in SQL Server 2016. Other databases with R support include dashDB, Vertica, Greenplum, Oracle, and Teradata.
The current build of R is version 3.3.1, Bug in Your Hair. (You can’t make this stuff up.) It is available compiled for Linux, OS X, and Windows, and in source code form, all from CRAN mirrors around the world. The read-only wch/r-source repository on GitHub, which is pulled from the R Project Subversion repository on an hourly basis, has almost 50,000 commits.
-- Martin Heller
Pandas
The Python language by itself is great for massaging data, but not so great for analysis and modeling. Enter Pandas, which provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Pythonistas claim that Pandas combined with Jupyter notebooks, Scikit-learn (for machine learning), and Statsmodels (statistics and econometrics) makes it unnecessary for them to use R for data analysis and modeling.
The key features of Pandas include a fast and efficient R-like DataFrame object for data manipulation with integrated indexing; methods for reading and writing data between in-memory data structures and different formats; data alignment and handling of missing data; reshaping and pivoting of data sets; aggregating or transforming data with a SQL-like group by engine; merging and joining of data sets; hierarchical axis indexing; time series functionality; and optimized performance. Pandas requires NumPy and optionally SciPy, Matplotlib, and Statsmodels. Pandas is in turn a dependency of Statsmodels.
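Those features map to only a few lines of code. Here is a minimal sketch, using made-up sales data, of the SQL-like group by engine and the merging of data sets:

```python
import pandas as pd

# A small, made-up data set
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "units": [10, 15, 5, 20],
})

# SQL-like "group by": total units per region
totals = sales.groupby("region", as_index=False)["units"].sum()

# Merging/joining of data sets, much like a SQL join
targets = pd.DataFrame({"region": ["east", "west"], "target": [20, 30]})
report = totals.merge(targets, on="region")
```

The resulting report DataFrame holds one row per region with its summed units alongside its target, ready for further analysis or plotting.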
-- Martin Heller
Scikit-learn
Scikit-learn is one of the key components of a Python-based toolkit for data analysis, along with Pandas, Jupyter notebooks, Statsmodels, and its own dependencies NumPy, SciPy, and Matplotlib. The Scikit-learn Python module implements tools for machine learning, including algorithms for classification, clustering, regression, dimensionality reduction, feature extraction, model selection, and preprocessing. There are enough algorithms that Scikit-learn supplies a cheat sheet to point you in the correct direction.
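To make the classification case concrete, here is a minimal sketch using Scikit-learn’s bundled iris data set; the estimator chosen here, a random forest, is just one of the many well-established options:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the bundled iris data set and hold out a test split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a classifier and score it on the held-out data
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test)
```

Every Scikit-learn estimator follows the same fit/predict/score pattern, which is a large part of why the library is so approachable.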
The machine learning algorithm selection in Scikit-learn is limited to well-established options, meaning there have been at least three years since publication, with 200-plus citations, wide use, and demonstrated utility. There are many companion Python modules that handle tasks outside the scope of Scikit-learn: Seqlearn and Hmmlearn for hidden Markov models, PyStruct for structured predictions, Pandas and Statsmodels for statistics, Theano for deep learning, Scikit-image for image processing, and NLTK for natural language processing.
-- Martin Heller
Caffe
Caffe is a deep learning framework from the Berkeley Vision and Learning Center, released under the BSD 2-Clause license. The core Caffe framework is written in C++ with support for CUDA on Nvidia GPUs and the ability to switch between running on CPUs and GPUs. Caffe has command-line, Python (including Jupyter Notebook), and Matlab interfaces.
While Caffe was originally aimed at computer vision learning using neural networks, it has also been used for speech, image sequence, and multimedia recognition. Philosophically, it is designed to be an expressive, extensible, modular, high-speed framework supported by an active community.
As a deep learning framework, Caffe implements a number of different kinds of compute layers -- such as data, convolution, loss, reduction, and pooling -- connected into a directed acyclic graph (DAG) network. An assortment of solvers, such as stochastic gradient descent (SGD), find the best parameters by forward inference and back-propagation, with automatic calculation of the gradients at each step. A “model zoo” is useful for sharing networks and solved weights. You can speed up your own work by adapting existing solved networks for different features and classifications.
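Networks and solvers in Caffe are described declaratively in protobuf text files rather than in code. A sketch of a single convolution layer definition, with illustrative blob and layer names, looks something like this:

```
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"   # input blob
  top: "conv1"     # output blob
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
  }
}
```

A full network is simply a sequence of such layer blocks wired together by their bottom and top blob names, which is what makes swapping in layers from model-zoo networks straightforward.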
Caffe installs on Docker, Ubuntu, OS X, RHEL, CentOS, Fedora, and Windows. An experimental branch of the Caffe repository implements an OpenCL back end as an alternative to CUDA.
Caffe is used at Facebook and Pinterest to recognize objects and flag objectionable content in uploaded images, at Adobe to catalog typefaces, and at Yahoo Japan to personalize news and content.
-- Martin Heller
CNTK
CNTK, the Computational Network Toolkit from Microsoft Research, is a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph. Microsoft describes it as production-quality, open source, multimachine, multi-GPU, and highly efficient for neural network training to recognize and classify speech, images, and text. You can install CNTK on Windows or Linux, on Azure, or as a Docker container.
When CNTK was released in January, Microsoft’s own comparisons showed it to be faster than all of the competing neural network learning toolkits -- Theano, TensorFlow, Torch 7, and Caffe -- running on the same hardware. Microsoft also claimed it was the only one to scale multiple CUDA (Nvidia) GPUs across multiple machines. Since then, Google has revealed its Tensor Processing Unit, which may change the speed equation when Tensor Processing Unit services for TensorFlow are available to Google Cloud customers. Google also added distributed computing support to TensorFlow.
CNTK supports feed-forward networks, convolutional neural networks (CNN), long short-term memory (LSTM), and recurrent neural networks (RNN), plus a full suite of training algorithms. Unique to CNTK, a 1-bit SGD algorithm improves performance for deep neural network training, but it carries a more restrictive license than the rest of CNTK.
A newer version of CNTK, planned for September 2016, will include reinforcement learning pipelines, as well as CNTK APIs supporting Python, C++, and .NET (C#) bindings.
-- Martin Heller
NLTK
NLTK, the Natural Language Toolkit, is a platform for building Python programs to work with human language data. It provides interfaces to more than 50 corpora and lexical resources such as WordNet, along with wrappers for natural language processing libraries, and a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.
NLTK originated at the University of Pennsylvania, and it is currently being used in courses at 32 universities worldwide. Highlights of NLTK include lexical analysis (that is, word and text tokenization); n-grams and collocations; part-of-speech tagging; a tree model and text chunker; and named-entity recognition.
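Several of these features work straight out of the box. A minimal sketch of stemming and n-gram extraction follows; note that corpus-backed features such as trained tokenizer models require a separate nltk.download() step first:

```python
from nltk.stem import PorterStemmer
from nltk.util import ngrams

# Stemming reduces inflected words to a common root form
stemmer = PorterStemmer()
stems = [stemmer.stem(w) for w in ["running", "runs", "easily"]]

# n-grams: sliding windows of adjacent tokens
bigrams = list(ngrams(["natural", "language", "processing"], 2))
```

Here the Porter stemmer maps both "running" and "runs" to the stem "run", and the bigram list pairs each token with its neighbor.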
NLTK is available for Windows, OS X, and Linux. There is an online book about NLTK, Natural Language Processing with Python. NLTK requires Python 2.7 or 3.2 or later.
-- Martin Heller
TensorFlow
If there is a “magic sauce” at Google today, it is machine learning and deep neural networks. The machine learning package Google uses is TensorFlow, assisted by Tensor processing units (TPUs) in its datacenters. TensorFlow was developed by the Google Brain team over several years and released to open source in November 2015.
TensorFlow does computation using data flow graphs for scalable machine learning. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture lets you deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code.
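A minimal sketch of such a data flow graph -- constructing it without running it -- might look like the following; the operation names are our own:

```python
import tensorflow as tf

g = tf.Graph()
with g.as_default():
    # Nodes are operations; edges carry the tensors between them
    a = tf.constant([[1.0, 2.0]], name="a")    # 1x2 tensor
    b = tf.constant([[3.0], [4.0]], name="b")  # 2x1 tensor
    c = tf.matmul(a, b, name="product")        # node producing a 1x1 result

# The graph records the operations without executing anything yet
op_names = [op.name for op in g.get_operations()]
```

Because the graph is a pure description of the computation, the TensorFlow runtime is free to place its nodes on whatever CPUs or GPUs are available before execution begins.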
You can install TensorFlow on Ubuntu Linux and OS X, using Python 2.7, 3.4, or 3.5. Nvidia CUDA GPUs are supported on Linux. Google supplies Docker images for TensorFlow with and without GPU support.
The principal language for using TensorFlow is Python, and there is limited support for C++. The tutorials supplied with TensorFlow include applications for classification of handwritten digits, image recognition, word embeddings, recurrent neural networks, sequence-to-sequence models for machine translation, natural language processing, and PDE (partial differential equation)-based simulations.
The tensorflow/tensorflow repository on GitHub has more than 30,000 stars and almost 8,000 commits. According to Jeff Dean, leader of the Google Brain team, in February 2016 there were 1,500 repositories on GitHub that mentioned TensorFlow, five of which were from Google.
-- Martin Heller
Theano
Theano is a Python library that lets you define, optimize, and evaluate mathematical expressions, especially ones with multidimensional arrays. It was developed at the LISA lab of the University of Montreal to support rapid development of efficient machine learning algorithms, and it has been used to support large-scale, computationally intensive scientific investigations since 2007. The University of Montreal uses Theano in its machine learning and deep learning classes.
Theano installs on Linux, OS X, and Windows, and it can use Python 2 or Python 3. Theano is tightly integrated with NumPy. It can use a CUDA (Nvidia) GPU transparently if you install GPU support.
Theano combines aspects of a computer algebra system (CAS) with aspects of an optimizing compiler. It can generate custom C code for many mathematical operations, can perform symbolic differentiation (for computing gradients, which are key to many machine learning optimizers), and can recognize some numerically unstable expressions and compute them with more stable algorithms.
A number of related projects use Theano, including Lasagne, Blocks, Keras, and OpenDeep to do neural network training; DeepMedic to do brain lesion segmentation; and Theanet and Elektronn for image classification.
-- Martin Heller
Torch
Torch is a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It is easy to use and efficient, thanks to the LuaJIT scripting language and an underlying C/CUDA implementation. (There is also an OpenCL port.) Torch comes with a large ecosystem of community-created packages in machine learning, computer vision, signal processing, parallel processing, image and video processing, and networking, among others, and it builds on top of the Lua community.
Neural networks, energy-based models, and numeric optimization are core features of Torch. These are built on top of tensors (n-dimensional arrays, themselves built on top of Lua tables) and basic linear algebra operations.
The Lua language provides benefits to users because it is easy to read and write. On the downside, Lua is less familiar to the deep learning community than Python.
The torch/torch7 repository on GitHub has more than 5,000 stars and 1,000 commits. The ratings on this repository understate the usage of Torch, however, as it is normally installed on Ubuntu, RHEL, and OS X by running a script. There is an EC2 AMI for Torch, as well as Docker images. You can embed Torch libraries into iOS, Android, and video game apps.
Torch is in use at Facebook and Twitter, as well as many universities and industrial labs. Google used Torch extensively before developing TensorFlow.
-- Martin Heller
GitLab
You can’t run a modern development operation without a distributed version control system. Open source tools like Git and Mercurial partially fill the need, but by themselves these tools restrict you to command-line interactions. This is where hosted tools like GitHub and Bitbucket come in, but both are closed source and their road maps have been somewhat opaque in the past. GitLab is an open source alternative to GitHub and Bitbucket with compelling features, an aggressive development cycle, and an exciting road map.
GitLab isn’t merely an open source version that stops at standard features like browsing code, reviewing merge requests, and submitting issues. It has support for confidential issues, for when you want to submit a sensitive or security-related issue to an open source project. It allows you to subscribe to an issue label, so you’ll get a notification any time the label is added to an issue. Need a configurable Kanban-style issue tracker? GitLab has you covered. It even has continuous integration out of the box and can deploy directly to Kubernetes.
-- Jonathan Freeman
Copyright © 2016 IDG Communications, Inc.