Cloud-based data warehouse company Snowflake is shifting its attention toward large language models and generative AI. Launched in 2014 with a focus on disrupting the traditional data warehouse market and big-data analytics, the company has continued to add new features, such as its Native Application Framework, to target different sets of enterprise users.
At its annual Snowflake Summit on Tuesday, the company announced Snowpark Container Services, a partnership with Nvidia, and updates to its Streamlit Python library designed to help enterprise users manage large language models (LLMs) and build applications using them from within its Data Cloud Platform.
Snowpark Container Services, currently in private preview, will allow enterprises to bring more diverse workloads, including LLMs, to the Data Cloud Platform, said Christian Kleinerman, senior vice president of product at Snowflake, adding that it also allows developers to build applications in any programming language.
The new Container Services acts as a linchpin, connecting enterprise data stored in Snowflake with LLMs, model training interfaces, model governance frameworks, third-party data-augmenting applications, machine learning models, APIs, and Snowflake’s Native Application Framework.
“Snowpark Containerized Services will help companies to move workloads, such as machine learning models or LLMs, between public and private cloud based on the client’s preferences,” said Hyoun Park, lead analyst at Amalgam Insights.
Moving workloads securely will become increasingly important as enterprises discover that the massive data entry and usage associated with training LLMs and other machine learning models pose potential compliance risks, prompting them to move these models to governed and isolated systems, Park added.
Container Services will also help reduce the burden on Snowflake’s data warehousing engine as it will run in an abstracted Kubernetes environment, according to Doug Henschen, principal analyst at Constellation Research.
“Simply put, it is a way to run an array of application services directly on Snowflake data but without burdening the data warehouses and performance-sensitive analytical applications that run on them,” Henschen said.
Nvidia partnership provides technology for LLM training
In order to help enterprises train LLMs with data they have stored in Snowflake, the company has partnered with Nvidia to gain access to its AI Platform, which combines hardware and software capabilities. Snowflake will run Nvidia NeMo, a part of the AI Platform, from within the Data Cloud, the company said, adding that NeMo can be used for developing generative AI-based applications such as chatbots and intelligent search engines.
In addition, Snowpark Container Services will allow enterprises to gain access to third-party generative AI model providers such as Reka AI, said Sanjeev Mohan, principal analyst at SanjMo.
Other LLMs, such as those from OpenAI, Cohere, and Anthropic, can also be accessed via APIs, Mohan said.
Snowflake’s updates reveal a strategy that is aimed at taking on Databricks, analysts said.
“Databricks is currently offering far more capabilities for building native AI, ML [machine learning] models than Snowflake, especially with the MosaicML acquisition that promises abilities to train models cheaper and faster,” said Andy Thurai, principal analyst at Constellation Research.
The difference in strategy between the two companies, according to dbInsights’ principal analyst Tony Baer, seems to lie in how each approaches expanding its user base.
“Snowflake is seeking to extend from its base of data and BI developers to data scientists and data engineers, while Databricks is approaching from the opposite side,” Baer said.
Document AI generates insights from unstructured data
The new Container Services will allow enterprises to access data-augmenting and machine learning tools, such as Hex’s notebooks for analytics and data science, AI tools from Alteryx, Dataiku, and SAS, along with a data workflow management tool from Astronomer that is based on Apache Airflow, the company said. Third-party software from Amplitude, CARTO, H2O.ai, Kumo AI, Pinecone, RelationalAI, and Weights & Biases is also available.
Snowflake also said that it was releasing a self-developed LLM, dubbed Document AI, designed to generate insights from documents.
Document AI, which is built on technology from Snowflake’s acquisition of Applica last year, is targeted at helping enterprises make more use of unstructured data, the company said, adding that the new LLM can help enhance enterprise productivity.
DbInsights’ Baer believes that the addition of the new LLM is a step to keep pace with rival offerings from the stables of AWS, Oracle, and Microsoft.
MLOps tools and other updates
In order to help enterprises with machine learning model operations (MLOps), Snowflake has introduced the Snowpark Model Registry.
The registry, according to the company, is a unified repository for an enterprise’s machine learning models. It's designed to enable users to centralize the publishing and discovery of models, thereby streamlining collaboration between data scientists and machine learning engineers.
Although rivals such as AWS, Databricks, Google Cloud and Microsoft offer MLOps tools already, analysts see the new Model Registry as an important update.
“Model registries and repositories are one of the new great battlefields in data as companies choose where to place their treasured proprietary or commercial models and ensure that the storage, metadata, and versioning are appropriately governed,” Park said.
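To make the idea of centralized publishing and discovery concrete, here is a minimal, purely illustrative sketch of a model registry in Python. It is not Snowflake's Snowpark Model Registry API. The names `ModelRegistry`, `ModelVersion`, `publish`, `latest`, and `discover` are hypothetical and chosen only for this example, which keeps everything in memory.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    # Metadata recorded for one published version of a model.
    name: str
    version: int
    owner: str
    tags: dict = field(default_factory=dict)


class ModelRegistry:
    """Hypothetical in-memory registry; not Snowflake's API."""

    def __init__(self):
        # Maps a model name to its versions, oldest first.
        self._models = {}

    def publish(self, name, owner, **tags):
        # Each publish appends a new version numbered from 1.
        versions = self._models.setdefault(name, [])
        entry = ModelVersion(name, len(versions) + 1, owner, tags)
        versions.append(entry)
        return entry

    def latest(self, name):
        # Most recently published version of the named model.
        return self._models[name][-1]

    def discover(self, **tags):
        # Latest version of every model whose tags match all filters.
        return [
            versions[-1]
            for versions in self._models.values()
            if all(versions[-1].tags.get(k) == v for k, v in tags.items())
        ]


registry = ModelRegistry()
registry.publish("churn", owner="data-science", task="classification")
registry.publish("churn", owner="ml-engineering", task="classification")
registry.publish("forecast", owner="data-science", task="regression")

print(registry.latest("churn").version)  # 2
print([m.name for m in registry.discover(task="classification")])  # ['churn']
```

In this sketch, data scientists and machine learning engineers publish to the same store, so versioning and ownership metadata sit in one place. A production registry would add persistent storage and access controls, which are omitted here.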
In addition, Snowflake is advancing the integration of Streamlit into its Data Cloud Platform, bringing it into public preview for final fine-tuning before its general release.
Further, the company said that it was extending the use of Apache Iceberg tables to an enterprise’s own storage.
Other updates, mostly targeted at developers, include the integration of Git and a new command line interface (CLI) inside the Data Cloud Platform, both of which are in private preview.
While the native Git integration is expected to support CI/CD workflows, the new CLI will aid in application development and testing within Snowflake, the company said.
In order to help developers ingest streaming data and eliminate the boundaries between batch and streaming pipelines, Snowflake also unveiled new features in the form of Dynamic Tables and Snowpipe Streaming.
While Snowpipe Streaming is expected to reach general availability soon, Dynamic Tables is currently in public preview.
Snowflake also said that its Native Application Framework was now in public preview on AWS.