Databricks on Wednesday introduced a new version of its data lakehouse offering, dubbed Delta Lake 3.0, to take on the rising popularity of the Apache Iceberg table format backed by rival Snowflake.
As part of Delta Lake 3.0, the company has introduced a new universal table format, called UniForm, that will allow enterprises to use the data lakehouse alongside other table formats such as Apache Iceberg and Apache Hudi, the company said.
A data lakehouse is a data architecture that offers both storage and analytics capabilities, in contrast to data lakes, which store data in its native format, and data warehouses, which store structured data typically queried with SQL.
UniForm eliminates the need to manually convert files between table formats when conducting analytics or building AI models across data lakes and data warehouses, Databricks said.
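In practice, enabling UniForm amounts to setting a table property so that Delta writes Iceberg-compatible metadata alongside its own. The PySpark sketch below assumes a Spark session with Delta Lake 3.0 configured and uses the `delta.universalFormat.enabledFormats` property from the release documentation; the table and column names are illustrative.

```python
from pyspark.sql import SparkSession

# Assumes a Spark session with Delta Lake 3.0 on the classpath and the
# Delta SQL extensions enabled; table and column names are illustrative.
spark = SparkSession.builder.appName("uniform-demo").getOrCreate()

# Create a Delta table that also emits Apache Iceberg metadata, so engines
# that read Iceberg can query the same underlying Parquet files.
spark.sql("""
    CREATE TABLE sales (order_id BIGINT, amount DOUBLE)
    USING DELTA
    TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'iceberg')
""")
```

Because Delta, Iceberg, and Hudi all store data as Parquet files, UniForm only has to generate the extra metadata layer, which keeps the overhead of supporting a second format low.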
The new table format, according to analysts, is Databricks’ strategy to connect its data lakehouse with the rest of the world and take on rival Snowflake, especially against the backdrop of Apache Iceberg garnering broader multivendor support over the past few years.
“With UniForm, Databricks is essentially saying, if you can’t beat them, join them,” said Tony Baer, principal analyst at dbInsight, likening the battle between the table formats to the one between Apple’s iOS and Google’s Android operating systems.
However, Baer believes that the adoption of lakehouses will depend on the ecosystems they provide, not just on table formats.
“Adoption of data lakehouses is still very preliminary as the ecosystems have only recently crystallized, and most enterprises are still learning what lakehouses are,” Baer said, adding that lakehouses may see meaningful adoption a year from now.
Countering Baer’s caution, Databricks said Delta Lake has seen nearly one billion downloads in a year. The company fully open sourced Delta Lake last year, a move it says has drawn contributions from engineers at AWS, Adobe, Twilio, eBay, and Uber.
Delta Kernel and liquid clustering
As part of Delta Lake 3.0, the company has also introduced two other features: Delta Kernel and liquid clustering.
According to Databricks, Delta Kernel addresses connector fragmentation by ensuring that all connectors are built using a core Delta library that implements Delta specifications.
This eliminates the need for enterprise users to update Delta connectors with each new version or protocol change, the company said.
Delta Kernel, according to SanjMo principal analyst Sanjeev Mohan, is like a connector development kit that abstracts many of the underlying details and instead provides a set of stable APIs.
“This reduces the complexity and time to build and deploy connectors. We expect that the system integrators will now be able to accelerate development and deployment of connectors, in turn further expanding Databricks’ partner ecosystem,” Mohan said.
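Delta Kernel itself ships as Java libraries, but the pattern Mohan describes can be sketched conceptually: connector authors program against a small, stable interface, and the kernel library absorbs Delta protocol details behind it. The Python sketch below illustrates only that idea; none of these names are the real Kernel API.

```python
from dataclasses import dataclass
from typing import Iterator

# Hypothetical stand-in for the stable surface a kernel library exposes.
# The real Delta Kernel is a Java library; these names are illustrative.
@dataclass
class ScanFile:
    path: str

class KernelStub:
    """Owns protocol details (log replay, checkpoints, deletion vectors)
    so connectors never parse the Delta transaction log themselves."""

    def __init__(self, table_path: str):
        self.table_path = table_path

    def scan_files(self) -> Iterator[ScanFile]:
        # A real kernel would resolve the latest snapshot from the
        # transaction log; this stub just yields a placeholder file.
        yield ScanFile(path=f"{self.table_path}/part-00000.parquet")

# A "connector" now needs only a few lines against the stable interface
# and keeps working when the Delta protocol gains new features.
def connector_list_files(table_path: str) -> list[str]:
    return [f.path for f in KernelStub(table_path).scan_files()]

print(connector_list_files("/data/events"))  # ['/data/events/part-00000.parquet']
```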
Liquid clustering has been introduced to address performance issues around data read and write operations, Databricks said.
Traditional methods such as Hive-style partitioning improve read and write performance by imposing a fixed data layout, but that rigidity increases data management complexity. Liquid clustering instead uses a flexible data layout that Databricks claims will provide cost-efficient clustering as data grows in size.
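For new tables, liquid clustering replaces partition columns with a CLUSTER BY clause, and subsequent OPTIMIZE runs re-cluster data incrementally. The PySpark sketch below assumes the CLUSTER BY syntax introduced with Delta Lake 3.0; the table and column names are illustrative.

```python
from pyspark.sql import SparkSession

# Assumes a Spark session with Delta Lake 3.0 configured; names illustrative.
spark = SparkSession.builder.appName("liquid-clustering-demo").getOrCreate()

# Instead of committing to a fixed Hive-style partition layout up front,
# the table declares clustering keys the engine can reorganize around.
spark.sql("""
    CREATE TABLE events (event_ts TIMESTAMP, user_id BIGINT, payload STRING)
    USING DELTA
    CLUSTER BY (event_ts)
""")

# OPTIMIZE re-clusters data incrementally as the table grows, rather than
# forcing the full rewrites that changing a partition scheme would require.
spark.sql("OPTIMIZE events")
```

Because the clustering keys are not baked into the directory structure, they can be changed later without rewriting the whole table, which is the flexibility Databricks is pointing to.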