A Guide to Using NDArrays in Java

AWS’s Deep Java Library provides a powerful NDArray system to speed development.


Many development languages support a popular paradigm of N-dimensional arrays. NDArrays let you write numerical code that would otherwise require many levels of nested loops in just a few simple operations. Because the operations can be parallelized, this code often runs even faster than the equivalent loops. NDArrays are now standard practice in many fields such as data science, graphics, and deep learning, but they can be used in applications far beyond these.

In Python, the standard library for NDArrays is NumPy. However, Java has no equivalent standard library. One offering for Java developers interested in working with NDArrays is AWS’s Deep Java Library (DJL). Although DJL also provides deep learning features, at its core is a powerful NDArray system that can be used on its own to bring this paradigm into Java. With support for several deep learning frameworks (PyTorch, TensorFlow, MXNet), DJL allows NDArray operations to run at large scale and across multiple platforms. Whether you are running on CPU or GPU, PC or Android, it simply works.

In this tutorial, we will walk through how you can leverage DJL’s NDArray to write your NumPy code in Java, and then apply NDArrays in a real-world application.

Setup

You can use the following configuration in a Gradle project. Or, you can skip the setup and try it directly in our interactive online console.

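The configuration from the original article was shown as an image; here is a minimal sketch of what a Gradle setup for DJL’s NDArray typically looks like. The exact versions and the choice of the MXNet engine are assumptions — pick the engine and versions that match your project:

```groovy
dependencies {
    // DJL core API, which contains the NDArray interface (version is an assumption)
    implementation "ai.djl:api:0.6.0"
    // One engine implementation is needed at runtime; MXNet is used here as an example
    runtimeOnly "ai.djl.mxnet:mxnet-engine:0.6.0"
    // Downloads the native MXNet binaries for the current platform automatically
    runtimeOnly "ai.djl.mxnet:mxnet-native-auto:1.7.0-a"
}
```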

That’s it, now we can start our implementation.

Basic operation

Let’s first create a try block to establish a scope for our code (if you are using the interactive console, you can skip this step):

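A sketch of what that scoping block looks like (the class name is arbitrary):

```java
import ai.djl.ndarray.NDManager;

public class NDArrayDemo {
    public static void main(String[] args) {
        // try-with-resources scopes all NDArrays created by this manager;
        // they are freed automatically when the manager closes
        try (NDManager manager = NDManager.newBaseManager()) {
            // NDArray code goes here
        }
    }
}
```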

NDManager helps manage the memory usage of NDArrays. It creates them and helps clear them as well. Once you finish using an NDManager, it clears all of the NDArrays that were created within its scope. By tracking NDArray usage, NDManager helps the overall system utilize memory efficiently.

For comparison, let’s also look at how the code reads in Python’s NumPy. The NumPy snippets throughout this tutorial assume the library has been imported with its standard alias: import numpy as np.


In the following sections, we are going to compare the implementation and result between NumPy and DJL’s NDArray.

NDArray Creation

ones is an operation that generates an N-dimensional array filled with 1s.

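A sketch of the ones call in DJL, with the NumPy equivalent as a comment (the 2x3 shape is just an example):

```java
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.types.Shape;

public class OnesDemo {
    public static void main(String[] args) {
        try (NDManager manager = NDManager.newBaseManager()) {
            // A 2x3 matrix filled with 1s
            // NumPy equivalent: nd = np.ones((2, 3))
            NDArray nd = manager.ones(new Shape(2, 3));
            System.out.println(nd);
        }
    }
}
```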

You can also try out random generation. For example, we will generate uniformly distributed random values between 0 and 1.

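A sketch of uniform random generation (the 1x3 shape is an example):

```java
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.types.Shape;

public class RandomDemo {
    public static void main(String[] args) {
        try (NDManager manager = NDManager.newBaseManager()) {
            // A 1x3 array of uniform random values drawn from [0, 1)
            // NumPy equivalent: nd = np.random.uniform(0, 1, (1, 3))
            NDArray nd = manager.randomUniform(0f, 1f, new Shape(1, 3));
            System.out.println(nd);
        }
    }
}
```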

This is just a quick demo of some commonly used functions. The NDManager now offers more than 20 NDArray creation methods that cover most of the methods available in NumPy.

Math operation

We can also try some math operations using NDArrays. Assume we are trying to do a transpose and add a number to each element of the NDArray. We can achieve this by doing the following:

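A sketch of those math operations, using a 3x3 matrix of the values 1 through 9 as sample data (an assumption):

```java
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;

public class MathDemo {
    public static void main(String[] args) {
        try (NDManager manager = NDManager.newBaseManager()) {
            // NumPy equivalent: nd = np.arange(1, 10).reshape(3, 3)
            NDArray nd = manager.arange(1, 10).reshape(3, 3);
            // NumPy equivalent: nd = nd.transpose()
            nd = nd.transpose();
            // NumPy equivalent: nd = nd + 10
            nd = nd.add(10);
            System.out.println(nd);
        }
    }
}
```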

DJL now supports more than 60 different NumPy math methods covering most of the basic and advanced math functions.

Get and Set

One of the most powerful features of NDArray is its flexible data indexing inspired by a similar feature in NumPy.

Let’s assume we would like to filter out all values in an array that are smaller than 10.

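A sketch of boolean-mask filtering, keeping only the values greater than or equal to 10 (the sample values are assumptions):

```java
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;

public class FilterDemo {
    public static void main(String[] args) {
        try (NDManager manager = NDManager.newBaseManager()) {
            // Sample data: the integers 5 through 13
            NDArray nd = manager.arange(5, 14);
            // Keep only the values >= 10, dropping everything smaller
            // NumPy equivalent: nd = nd[nd >= 10]
            nd = nd.get(nd.gte(10));
            System.out.println(nd);
        }
    }
}
```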

Now let’s try something more complicated. Assume we have a 3x3 matrix and we would like to multiply its second column by 2.

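A sketch of that column update using NDIndex, again seeding the matrix with the values 1 through 9 (an assumption):

```java
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.index.NDIndex;

public class IndexDemo {
    public static void main(String[] args) {
        try (NDManager manager = NDManager.newBaseManager()) {
            NDArray nd = manager.arange(1, 10).reshape(3, 3);
            // Multiply the second column (index 1) by 2
            // NumPy equivalent: nd[:, 1] *= 2
            nd.set(new NDIndex(":, 1"), array -> array.mul(2));
            System.out.println(nd);
        }
    }
}
```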

In the above example, we introduced a Java concept called NDIndex. It mirrors most of the NDArray get/set functionality that NumPy supports. By simply passing a string representation of the index, developers can do all kinds of array manipulations seamlessly in Java.

Real world application

These operations are really helpful when we need to manipulate a huge dataset. Let’s walk through a specific use case: token classification. In this case, developers were doing sentiment analysis on text gathered from users by applying a deep learning algorithm to it. NDArray operations were applied in preprocessing and post-processing to encode and decode information.

Tokenization

Before we feed the data into an NDArray, we tokenize the input text into numbers. The tokenizer in the code block below is a Map<String, Integer> that serves as a vocabulary to convert text into a corresponding vector.

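A sketch of that tokenization step; the toy vocabulary and sample sentence are assumptions, and a real application would load a full vocabulary:

```java
import java.util.Arrays;
import java.util.Map;

public class TokenizeDemo {
    public static void main(String[] args) {
        // Toy vocabulary mapping each known word to an integer id (assumed values)
        Map<String, Integer> tokenizer = Map.of("i", 1, "like", 2, "djl", 3);
        String text = "I like DJL";
        // Convert each word to its id, using 0 for unknown words
        int[] vector = Arrays.stream(text.toLowerCase().split(" "))
                .mapToInt(token -> tokenizer.getOrDefault(token, 0))
                .toArray();
        System.out.println(Arrays.toString(vector)); // prints [1, 2, 3]
    }
}
```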

Processing

After that, we create an NDArray. To proceed further, we need to create a batch of tokens and apply some transformations to them.

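A sketch of that processing step. The token values, batch shape, and the scaling transformation are all assumptions for illustration:

```java
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.types.DataType;

public class ProcessDemo {
    public static void main(String[] args) {
        try (NDManager manager = NDManager.newBaseManager()) {
            // Token ids produced by the tokenization step (sample values assumed)
            int[] vector = {1, 2, 3, 4, 5, 6};
            NDArray array = manager.create(vector);
            // Reshape into a batch of 2 sequences of length 3
            array = array.reshape(2, 3);
            // A simple transformation: cast to float and scale down
            NDArray batch = array.toType(DataType.FLOAT32, false).div(10f);
            System.out.println(batch);
        }
    }
}
```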

Then, we can send this data to a deep learning model. Achieving the same thing in pure Java would require far more work. If we tried to implement the reshape function above, we would need to create an N-dimensional array in Java that looks like List<List<List<...List<Float>...>>> to cover all the different dimensions. We would then have to dynamically insert a new List<Float> containing the elements to build the resulting data structure.

Why should I use NDArray?

With the previous walkthrough, you should have some basic experience using NDArray in Java. To summarize, here are the three key advantages of using it:

  • Easy: Access to 60+ operators in Java with a simple input and the same output.
  • Fast: Full support for the most used deep learning frameworks, including TensorFlow, PyTorch, and MXNet. Now you can have your computation accelerated by MKL-DNN on CPUs, CUDA on GPUs, and more.
  • Deep Learning ready: It supports high dimensional arrays and sparse NDArray inputs*. You can apply this toolkit on all platforms including Apache Spark and Apache Beam for large-scale data processing. It’s a perfect tool for data preprocessing and post-processing.

*Sparse currently only covers COO in PyTorch and CSR/Row_Sparse in MXNet.

About NDArray and DJL

After trying NDArray creation and operations, you might wonder how DJL implements NDArray to achieve these behaviors. In this section, we will briefly walk through the architecture of NDArray.

NDArray Architecture

[Diagram: the three layers of the NDArray architecture]

As shown above, there are three key layers to the NDArray.

The interface layer contains NDArray, a Java interface that defines what an NDArray should look like. We carefully evaluated it and made every function signature general enough and easy to use.

The EngineProvider layer contains each engine’s implementation of the NDArray interface. This layer serves as an interpretation layer that maps engine-specific behavior to NumPy behavior, so every engine implementation behaves the same way NumPy does.

In the C++ layer, we built JNI and JNA bindings that expose C++ methods for Java to call. These bindings ensure we have enough methods to build the entire NDArray stack. They also deliver the best performance, since calls go directly from Java to C++ and all engines are implemented in C/C++.


Deep Java Library (DJL) is a deep learning framework written in Java that supports both training and inference. DJL is built on top of modern deep learning frameworks (TensorFlow, PyTorch, MXNet, etc.). You can easily use DJL to train your model or deploy your favorite models from a variety of engines without any additional conversion. It contains a powerful ModelZoo design that allows you to manage trained models and load them in a single line. The built-in ModelZoo currently supports more than 70 pre-trained, ready-to-use models from GluonCV, HuggingFace, TorchHub, and Keras.

The addition of the NDArray makes DJL the best toolkit in Java to run your Deep Learning application. It can automatically identify the platform you are running on and figure out whether to leverage GPU to run your application.

As of the most recent release, DJL 0.6.0 officially supports MXNet 1.7.0, PyTorch 1.5.0, and TensorFlow 2.2.0. We also have experimental support for PyTorch on Android.

Follow our GitHub, demo repository, Slack channel, and Twitter for more documentation and examples of DJL!


Copyright © 2020 IDG Communications, Inc.