Big Data
Splunk’s new AI tools aim to ease security, observability tasks
The AI tools introduced at the company’s .conf2023 include the Splunk AI Assistant, Splunk Machine Learning Toolkit 5.4, Splunk App for Anomaly Detection, and the Splunk App for Data Science and Deep Learning 5.1.
What is Apache Spark? The big data platform that crushed Hadoop
Fast, flexible, and developer-friendly, Apache Spark is the leading platform for large-scale SQL, batch processing, stream processing, and machine learning.
AWS simplifies data management, analytics with new services
A major theme at re:Invent 2022 was Amazon's effort to ease data management, as AWS announced new ETL capabilities and features for collaboration, searching, and cataloging.
Amazon Omics aims to optimize biological data analysis at scale
The bioinformatics service, made generally available at AWS re:Invent, is designed to help researchers and scientists store genomic and other biological data types and accelerate their analysis for precision medicine.
AWS Glue upgrades Spark engines, backs Ray framework
The serverless data integration service in the Amazon cloud also adds support for built-in Pandas APIs and the Apache Hudi, Apache Iceberg, and Delta Lake formats.
Starburst Galaxy gets data discoverability updates
At AWS re:Invent 2022, Starburst also announced support for AWS Lake Formation via the Starburst Enterprise suite to help joint customers implement a data mesh architecture.
When is enough data enough?
Maybe we don’t need more data; we just need people who understand the data we already have and its value in a business context.
Dremio Cloud review: A fast and flexible data lakehouse on AWS
Dremio Cloud leaps big data in a single bound with a fast SQL engine and optimizations that can accelerate queries dramatically. Plus, it lets you use other engines on the same data.
Why Apache Iceberg will rule data in the cloud
Apache Iceberg is an open table format that offers scalability, usability, and performance advantages for very large data sets. Here are five reasons Iceberg is optimal for cloud data workloads.
Databricks adds data governance, marketplace features
The data marketplace and other features are expected to accelerate data engineering tasks with an option for data monetization down the road, Databricks said.
Databricks open sources its Delta Lake data lakehouse
Databricks is open sourcing Delta Lake to counter criticism from rivals and take on Apache Iceberg as well as data warehouse products from Snowflake, Starburst, Dremio, Google Cloud, AWS, Oracle, and HPE.
12 programming tricks to cut your cloud bill
Cutting cloud costs is a team effort, and that includes developers. Here are 12 tricks for developing software that is cheaper to run in the cloud.
What is TensorFlow? The machine learning library explained
TensorFlow is a Python-friendly open source library for numerical computation that makes machine learning and developing neural networks faster and easier.
What is a data lake? Massively scalable storage for big data analytics
Dive into data lakes: what they are, how they're used, and how they both differ from and complement data warehouses.
Where AI has made real progress
Better data infrastructure has provided a big boost to AI’s growth, but some things still require a human.
Deep Dive
Machine learning megaguide: Amazon, Microsoft, Databricks, Google, HPE, IBM
Download InfoWorld's massive roundup of Amazon, Microsoft, Databricks, Google, HPE, and IBM machine learning toolkits
Deep Dive
Public cloud megaguide: Amazon, Microsoft, Google, IBM, and Joyent compared
The top five public clouds pile on the services and options, while adding unique twists
Deep Dive
Quick guide: Learn to crunch big data with R
Get started using the open source R programming language to do statistical computing and graphics on large data sets