After making Astra Streaming generally available last year, DataStax on Wednesday said that it was adding a Schema GPT Translator to the data and event streaming service.
Built on open source Apache Pulsar, Astra Streaming is a managed service that is available on AWS, Google Cloud Platform (GCP), and Microsoft Azure, enabling enterprises to stream real-time data for their applications.
According to DataStax, the new Schema GPT Translator automatically generates schema mappings, freeing developers from the time-consuming work of creating them by hand so they can focus on other parts of building real-time data pipelines.
Creating schema mappings is an essential part of developing a data pipeline, as the mappings enable data integration and interoperability among multiple systems and data sources.
“Systems within a streaming pipeline typically use different approaches for schema representations and data type definitions. This requires schemas within a pipeline to be mapped to each other manually, a process which is complicated, tedious, and error-prone,” Jamie Ferguson, senior director of product management at DataStax, wrote in a blog post.
“In addition to the complexity involved in creating schema mappings, these mappings must be updated when schemas evolve,” Ferguson added.
Schema GPT Translator cuts out manual processes
To avoid this time-consuming manual work, the Schema GPT Translator captures the contextual relationships and dependencies in a schema and quickly and accurately generates mappings to other schema representations and data types, the company said in its announcement of the new feature.
“Schema Translator fits into that evolution of approach around connecting data sources like databases into applications and vice versa, from object-relational mapping (ORMs) and API support to automated recommendations based on a generative AI model,” it added.
Currently, the Schema GPT Translator is available as part of the Astra DB Sink Connector and can generate mappings from schemas in Astra Streaming (represented in JSON or Avro) to schemas in Astra DB (represented in Cassandra Query Language, or CQL), with support for additional connectors coming soon, the company said.
Another advantage of the translator, according to Ferguson, is that schema mappings can be updated quickly as schemas evolve, whether because new data sources are added to a streaming pipeline or because business requirements change.
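To make concrete what such a mapping involves, here is a minimal, hand-written Python sketch of the kind of Avro-to-CQL type mapping the translator is meant to automate. It is not DataStax's code or API: the function names, the type tables and the example `Payment` schema are hypothetical, and the sketch covers only a few Avro primitive and logical types plus nullable unions.

```python
# Hypothetical, hand-written mapping from Avro primitive types to CQL types.
# This illustrates the kind of mapping the Schema GPT Translator generates;
# it is NOT DataStax's implementation.
AVRO_TO_CQL = {
    "boolean": "boolean",
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "bytes": "blob",
    "string": "text",
}

# Avro logical types that have a closer CQL equivalent than their base type.
LOGICAL_TO_CQL = {
    "uuid": "uuid",
    "timestamp-millis": "timestamp",
    "date": "date",
}


def avro_field_to_cql(field_type):
    """Return the CQL type for a single Avro field type."""
    # A union such as ["null", "string"] marks an optional field; CQL
    # columns are nullable by default, so only the non-null branch matters.
    if isinstance(field_type, list):
        non_null = [t for t in field_type if t != "null"]
        if len(non_null) != 1:
            raise ValueError(f"unsupported union: {field_type}")
        return avro_field_to_cql(non_null[0])
    # A dict may carry a logical type (e.g. uuid) on top of a base type.
    if isinstance(field_type, dict):
        logical = field_type.get("logicalType")
        if logical in LOGICAL_TO_CQL:
            return LOGICAL_TO_CQL[logical]
        return avro_field_to_cql(field_type["type"])
    if field_type not in AVRO_TO_CQL:
        raise ValueError(f"no CQL mapping for Avro type: {field_type}")
    return AVRO_TO_CQL[field_type]


def avro_record_to_cql_table(schema, keyspace, primary_key):
    """Build a CQL CREATE TABLE statement from an Avro record schema."""
    columns = [
        f"{f['name']} {avro_field_to_cql(f['type'])}" for f in schema["fields"]
    ]
    columns.append(f"PRIMARY KEY ({primary_key})")
    body = ",\n  ".join(columns)
    return f"CREATE TABLE {keyspace}.{schema['name'].lower()} (\n  {body}\n);"


# Hypothetical Avro record schema for a stream of payment events.
payment_schema = {
    "type": "record",
    "name": "Payment",
    "fields": [
        {"name": "payment_id", "type": {"type": "string", "logicalType": "uuid"}},
        {"name": "customer", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "note", "type": ["null", "string"]},
        {"name": "created_at", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    ],
}

print(avro_record_to_cql_table(payment_schema, "shop", "payment_id"))
```

Nested records, arrays, maps and schema changes are left out here; those are the cases where, as Ferguson notes, hand-maintained mappings become tedious and must be revisited every time a schema evolves.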
Enterprises that subscribe to Astra Streaming will get the new Schema GPT Translator at no additional cost. Astra Streaming offers subscriptions in three tiers, including a pay-as-you-go model.
GPT stands for generative pre-trained transformer, a type of AI model based on deep learning techniques. The term was popularized by OpenAI's ChatGPT; OpenAI has released several versions of its own GPT models, though it is not the only company to build them. DataStax did not immediately say which GPT model it is using.
Last week, DataStax said that it was partnering with Google Cloud to bring vector search to Astra DB in an attempt to make Apache Cassandra more compatible with AI and large language model (LLM) workloads.
Astra DB, built on Apache Cassandra, will arguably be one of the first offerings to bring vector search to the open source distributed database. Vector search for Cassandra itself is currently planned for the database's 5.0 release, according to a post from the Cassandra community, of which DataStax is a member.