For the last few years, Snowflake has positioned itself as a powerhouse in the data world, noted above all for its ability to manage and query massive datasets efficiently at scale.
Historically, however, its orientation has been directed more towards BI and data warehousing than towards machine learning or AI. With the arrival of LLMs and the profound impact they are having on the data world, Snowflake could no longer afford to fall behind.
This moment presented a challenge beyond merely incorporating tools like Jupyter Notebooks, as Snowflake had done before. Large Language Models (LLMs) are a different kind of workload, posing new challenges to several aspects of the Snowflake platform.
Snowflake’s Architecture Adaptation
Snowflake’s architecture primarily involves handling structured and semi-structured data. Integrating LLMs, which often excel in processing unstructured text data, requires Snowflake to adapt or extend its architecture. This includes ensuring efficient data transfer between structured data storage and the LLMs, potentially demanding additional layers of data processing.
Additionally, limited integration with the machine learning ecosystem made ML tasks more difficult. Users typically had to extract data from Snowflake, process it using external ML tools (like Python scripts, Jupyter notebooks, or frameworks like TensorFlow or PyTorch), and then, if necessary, load the results back into Snowflake. This made the process slower and less secure.
Then there are the computational demands: LLMs often require high-performance GPUs or chips specialized for neural networks, hardware that was never the focus of Snowflake’s original data warehousing design.
To address these issues, Snowflake has opened three new fronts: Snowflake Cortex, Snowpark Container Services, and Document AI. As Jack the Ripper would say, let’s go by parts.
Snowflake Cortex
This service forms the foundation for incorporating ML into the Snowflake environment. One of the key features of Snowflake Cortex is its set of serverless functions. These functions enable users to perform a variety of tasks, such as data analysis and application development, directly within the Snowflake environment. This means that users can execute these functions without having to manage the underlying infrastructure, which significantly simplifies the process of developing and running AI-powered applications.
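As a flavor of what this looks like, Cortex exposes LLM capabilities as ordinary SQL functions callable from a query. The sketch below is illustrative: the `support_tickets` table and its columns are hypothetical, and the available function names and model identifiers depend on your Snowflake region and release.

```sql
-- Illustrative sketch: enrich a free-text column with Cortex LLM functions,
-- with no model infrastructure to manage (table and columns are hypothetical).
SELECT
    ticket_id,
    SNOWFLAKE.CORTEX.SUMMARIZE(ticket_body) AS summary,
    SNOWFLAKE.CORTEX.SENTIMENT(ticket_body) AS sentiment,
    SNOWFLAKE.CORTEX.COMPLETE(
        'mistral-7b',  -- model availability varies by region
        CONCAT('Classify this support ticket in one word: ', ticket_body)
    ) AS category
FROM support_tickets;
```

The point is that the model call is just another expression in the SELECT list, billed and governed like the rest of the query.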
Snowpark Container Services
Through the introduction of Snowpark Container Services, a framework that provides the infrastructure for running applications or services, Snowflake allows developers to run containerized data apps on Snowflake-managed infrastructure. This includes the ability to run containers accelerated with NVIDIA GPUs, which is vital for handling the computational demands of LLMs. By enabling these containers to operate directly within Snowflake accounts, the service ensures efficient data transfer and processing, bridging the gap between Snowflake’s structured data storage and the unstructured data processing capabilities of LLMs.
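A rough sketch of the workflow, to make this concrete: you create a compute pool backed by GPU instances, then define a service from a container specification. All names below (pool, service, image path, instance family) are placeholders, and the specification format has more options than shown here.

```sql
-- Illustrative sketch (names and instance families are placeholders):
-- 1) a GPU-backed compute pool to host the containers
CREATE COMPUTE POOL llm_pool
  MIN_NODES = 1
  MAX_NODES = 1
  INSTANCE_FAMILY = GPU_NV_S;

-- 2) a service running a containerized inference app inside the account
CREATE SERVICE llm_service
  IN COMPUTE POOL llm_pool
  FROM SPECIFICATION $$
    spec:
      containers:
        - name: inference
          image: /my_db/my_schema/my_repo/llm-inference:latest
      endpoints:
        - name: api
          port: 8080
          public: false
  $$;
```

Because the container runs inside the Snowflake account, the data it processes never has to leave the platform’s security perimeter.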
Document AI
As part of its strategy to better handle unstructured data, Snowflake has also developed Document AI. This tool leverages a purpose-built, multimodal LLM that is natively integrated within the Snowflake platform. Document AI allows users to extract and process content from unstructured documents, such as invoices or contractual terms, using a visual interface and natural language. This integration directly addresses the challenge of processing unstructured data within Snowflake’s traditionally structured data environment.
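Once a model build has been trained through the visual interface, extraction can be driven from plain SQL over staged files. The query below is a sketch: the model build name, stage, and version number are placeholders, and the result is a JSON document holding the extracted values.

```sql
-- Illustrative sketch: run a trained Document AI model build over staged
-- invoice PDFs (model build, stage, and version are placeholders).
SELECT
    relative_path,
    invoice_model!PREDICT(
        GET_PRESIGNED_URL(@invoice_stage, relative_path),
        1  -- model build version
    ) AS extracted_fields
FROM DIRECTORY(@invoice_stage);
```

The extracted fields then sit in a regular result set, ready to be flattened into the structured tables the rest of the warehouse already knows how to handle.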
From serverless functions in Snowflake Cortex, to harnessing the power of NVIDIA GPUs in Snowpark Container Services, to simplifying unstructured data processing with Document AI, Snowflake is poised to enter the battle for the next generation of data platforms.
Let’s dive deeper into these new functionalities in future posts.