Data Science

Data Science | News, how-tos, features, reviews, and videos

As data science goes mainstream, so does its language

Python may be the second choice to R, but its popularity and ease of use position it to dominate data science.

What is a data lake? Massively scalable storage for big data analytics

Dive into data lakes: what they are, how they're used, and how they both differ from and complement data warehouses.

Career roadmap: Machine learning scientist

Data scientists and machine learning scientists have similar roles, but a machine learning scientist specializes in researching and implementing complex algorithms.

Review: Databricks Lakehouse Platform

Databricks Lakehouse Platform combines cost-effective data storage with machine learning and data analytics, and it's available on AWS, Azure, and GCP. Could it be an affordable alternative for your data warehouse needs?

Databricks targets data pipeline automation with Delta Live Tables

The company’s new ETL framework aims to cut the time data scientists and engineers spend setting up reliable data pipelines and managing infrastructure.

Career roadmap: Machine learning engineer

As organizations worldwide adopt machine learning across virtually every industry, the demand for machine learning engineers is on the rise.

5 ways spreadsheets kill your business

Potentially error-prone, unsecured, and hard to maintain, spreadsheets create data silos and discourage collaboration.

Use synthetic data for continuous testing and machine learning

Where real data is unethical, unavailable, or doesn’t exist, synthetic data sets can provide the needed quantity and variety.

Google releases differential privacy pipeline for Python

PipelineDP allows datasets containing personal information to be aggregated in a way that preserves the privacy of individuals.

Best practices for developing governable AI

Focus on these engineering best practices to build high-quality models that can be governed effectively.

How CI/CD is different for data science

Moving data science into production has quite a few similarities to deploying an application. But there are key differences you shouldn’t overlook.

The high cost of data science toil

Data science toil saps agility and prevents organizations from scaling data science efforts efficiently and sustainably. Here’s how to avoid it.

Don’t rush to machine learning

A simpler approach—good data, SQL queries, if/then statements—often gets the job done.

JetBrains previews data science IDE

DataSpell offers data analysis and machine learning model prototyping.

What is AI bias mitigation, and how can it improve AI fairness?

Algorithmic biases that lead to unfair or arbitrary outcomes take many forms. But we also have many strategies and techniques to combat them.

When RPA meets data science

Data science can make robotic process automation more intelligent. Robotic process automation makes it easier to deploy data science models in production.

Data science needs drudges

Quality data science outputs depend on quality inputs. Data cleansing and preparation may not be exciting work, but they are critical.

An introduction to time series forecasting

Time series forecasts are used to predict a future value or a classification at a particular point in time. Here’s a brief overview of their common uses and how they are developed.

6 essential Python tools for data science—now improved

SciPy, Numba, Cython, Dask, Vaex, and Intel SDC all have new versions that aid big data analytics and machine learning projects.

The real successes of AI

Despite the hype, especially around self-driving cars, AI is writing code, designing Google chip floor plans, and telling us how much to trust it.