Book review: 'Python Tools for Scientists'

Python has a wealth of scientific computing tools, so how do you decide which ones are right for you? This book cuts through the noise to help you deliver results.

By Serdar Yegulalp, Senior Writer, InfoWorld


Python has earned a name as a go-to language for working quickly and conveniently with data, performing data analysis, and getting things done. But because the Python ecosystem is so vast and powerful, many people who are just starting with the language have a hard time sorting through it all. "Do I use NumPy or Pandas for this job?" they ask, or "What's the difference between Plotly and Bokeh?" Sound familiar?

Python Tools for Scientists, by Lee Vaughn (No Starch Press, San Francisco), to be released in January 2023, is a guide for the Pythonically perplexed. As described in the introduction, this book is intended to be used as "a machete for hacking through the dense jungle of Python distributions, tools, and libraries." In keeping with that goal, the book confines itself to one popular Python distribution for scientific work, Anaconda, and to the common scientific computing tools and libraries packaged with it: the Spyder IDE, Jupyter Notebook, and JupyterLab, along with the NumPy, Matplotlib, Pandas, Seaborn, and Scikit-learn libraries.

Setting up a Python workspace

The first part of the book deals with setting up a workspace, in this case by installing Anaconda and getting familiar with tools like Jupyter and Spyder. It also covers the details of creating virtual environments and managing packages within them, with many detailed command-line instructions and screenshots throughout.

Getting to know the Python language

For those who don't know Python at all, the book's second part is a compressed primer for the language. Aside from covering the basics (Python syntax, data and container types, flow control, and functions and modules), it also provides detail on classes and object-oriented programming, writing self-documenting code, and working with files (text, pickled data, and JSON). If you need a more in-depth introduction, the preface points you toward more robust learning resources. That said, this section by itself is as detailed as some standalone "get started with Python" guides.
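To give a sense of the file-handling ground mentioned above, here is a minimal sketch of writing and reading the same small record as plain text, pickled data, and JSON. It is not taken from the book: the function name round_trip, the file names, and the sample record are all hypothetical, and it uses only Python's standard library.

```python
import json
import pickle
import tempfile
from pathlib import Path


def round_trip(record: dict, workdir: Path) -> dict:
    """Write `record` as text, pickle, and JSON, then read each copy back.

    Hypothetical helper for illustration only (not from the book).
    """
    text_path = workdir / "record.txt"
    pickle_path = workdir / "record.pkl"
    json_path = workdir / "record.json"

    # Plain text: one "key=value" line per item. Types are lost on the way
    # back, so every value is read as a string.
    text_path.write_text(
        "\n".join(f"{key}={value}" for key, value in record.items()),
        encoding="utf-8",
    )
    text_back = dict(
        line.split("=", 1)
        for line in text_path.read_text(encoding="utf-8").splitlines()
    )

    # Pickle: a binary, Python-specific format that preserves Python types.
    # Only unpickle data you trust.
    with pickle_path.open("wb") as handle:
        pickle.dump(record, handle)
    with pickle_path.open("rb") as handle:
        pickle_back = pickle.load(handle)

    # JSON: a text format readable from other languages; it preserves
    # strings, numbers, lists, and dicts with string keys.
    json_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    json_back = json.loads(json_path.read_text(encoding="utf-8"))

    return {"text": text_back, "pickle": pickle_back, "json": json_back}


if __name__ == "__main__":
    sample = {"name": "sample-01", "count": 3}  # made-up illustrative record
    with tempfile.TemporaryDirectory() as tmp:
        results = round_trip(sample, Path(tmp))
    for fmt, value in results.items():
        print(fmt, value)
```

In this sketch the pickle and JSON copies come back with the integer intact, while the plain-text copy returns it as a string, which is one practical reason to pick a structured format for anything beyond simple notes.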

Unpacking Anaconda

Part three tours many of the libraries packaged with Anaconda for general scientific computing (SciPy), deep learning, computer vision, natural language processing, dashboards and visualization, geospatial data and geovisualization, and many more. The goal of this section isn't to demonstrate the libraries in depth, but rather to lay out their differences and allow for informed choices between them. An example is the recommendation for how to choose a deep learning library:

If you’re brand new to deep learning, consider Keras, followed by PyTorch. [...] If you’re working with large datasets and need speed and performance, choose either PyTorch or TensorFlow.

Demonstrations

Part four goes into depth with several key libraries: NumPy, Matplotlib, Pandas, Seaborn (for data visualization), and Scikit-learn. Each library is demonstrated with practical examples. In the case of Pandas, Seaborn, and Scikit-learn, there's a fun project involving a dataset (the Palmer Penguins Project) that you can interact with as you read along.

This book does not cover some aspects of scientific computing with Python. For instance, Cython and Numba aren't discussed, and there's no mention of cross-integration with other scientific-computing languages like R or Fortran. Instead, this book stays focused on its main mission: guiding you through the thicket of scientific Python offerings available through Anaconda.


Serdar Yegulalp is a senior writer at InfoWorld, focused on machine learning, containerization, devops, the Python ecosystem, and periodic reviews.