Data Science
As data science goes mainstream, so does its language
Python may be the second choice to R, but its popularity and ease of use position it to dominate data science.
What is a data lake? Massively scalable storage for big data analytics
Dive into data lakes: what they are, how they're used, and how they differ from and complement data warehouses.
Career roadmap: Machine learning scientist
Data scientists and machine learning scientists have similar roles, but a machine learning scientist specializes in researching and implementing complex algorithms.
Review: Databricks Lakehouse Platform
Databricks Lakehouse Platform combines cost-effective data storage with machine learning and data analytics, and it's available on AWS, Azure, and GCP. Could it be an affordable alternative for your data warehouse needs?
Databricks targets data pipeline automation with Delta Live Tables
The company’s new ETL framework aims to cut the time data scientists and engineers spend setting up reliable data pipelines and managing infrastructure.
Career roadmap: Machine learning engineer
As organizations worldwide adopt machine learning across virtually every industry, the demand for machine learning engineers is on the rise.
5 ways spreadsheets kill your business
Potentially error-prone, unsecured, and hard to maintain, spreadsheets create data silos and discourage collaboration.
Use synthetic data for continuous testing and machine learning
Where real data is unethical, unavailable, or doesn’t exist, synthetic data sets can provide the needed quantity and variety.
Google releases differential privacy pipeline for Python
PipelineDP allows datasets containing personal information to be aggregated in a way that preserves the privacy of individuals.
Best practices for developing governable AI
Focus on these engineering best practices to build high-quality models that can be governed effectively.
How CI/CD is different for data science
Moving data science into production has quite a few similarities to deploying an application. But there are key differences you shouldn’t overlook.
The high cost of data science toil
Data science toil saps agility and prevents organizations from scaling data science efforts efficiently and sustainably. Here’s how to avoid it.
Don’t rush to machine learning
A simpler approach—good data, SQL queries, if/then statements—often gets the job done.
JetBrains previews data science IDE
DataSpell offers data analysis and machine learning model prototyping.
What is AI bias mitigation, and how can it improve AI fairness?
Algorithmic biases that lead to unfair or arbitrary outcomes take many forms. But we also have many strategies and techniques to combat them.
When RPA meets data science
Data science can make robotic process automation more intelligent. Robotic process automation, in turn, makes it easier to deploy data science models in production.
Data science needs drudges
Quality data science outputs depend on quality inputs. Data cleansing and preparation may not be exciting work, but they're critical.
An introduction to time series forecasting
Time series forecasts are used to predict a future value or a classification at a particular point in time. Here’s a brief overview of their common uses and how they are developed.
6 essential Python tools for data science—now improved
SciPy, Numba, Cython, Dask, Vaex, and Intel SDC all have new versions that aid big data analytics and machine learning projects.
The real successes of AI
Despite the hype, especially around self-driving cars, AI is writing code, designing Google chip floor plans, and telling us how much to trust it.