What’s New in Snowflake Snowpark from Snowdays
- John Morrell
- November 19, 2021
New Features from Snowflake Snowdays!
At Snowday earlier this week, Snowflake announced a number of new features, demonstrated each one interactively, and interviewed a few key customers. Among the new capabilities were the following:
- Multiple enhancements to Snowpark, most notably new support for Python and extended support for Java & Scala,
- Data engineering feature enhancements, including faster data ingestion, schema detection, an ingestion dashboard, and serverless tasks,
- Additional capabilities to support data lakes within Smart Storage, including support for unstructured data, future integration with Delta Lake and Iceberg, and optimizations around file compression, scans, and queries,
- Cross cloud data replication to replicate data between Snowflake instances residing with different cloud providers,
- Governance features for dynamic and conditional data masking, access history, and object dependencies, and
- Support for a new usage-based pricing model and “try before you buy” data trials in the data marketplace.
Let’s explore what’s new from Snowflake Snowdays in the Snowpark and data engineering enhancements, as they are most relevant to the analytics and data engineering teams that Datameer serves with our solutions.
When Snowpark was first introduced, it provided a Dataframe API that allowed teams to access data in Snowflake from their favorite languages and notebooks. Snowpark can be used by data and analytics teams for everyday data querying, to create data engineering pipelines, and now for data science usage.
When Snowpark was introduced a few months ago, the intent was for it to support a variety of languages, including SQL, Java, Scala, and Python. At Snowday, Snowflake announced that Python, Java, and Scala support is now available in private or public preview. This allows teams with diverse programming skills across the most popular languages to use Snowpark.
With the Java and Scala support, teams can use Scala via Snowpark on the client side and use Java for user-defined functions (UDFs) and table functions, stored procedures, and processing unstructured files. In addition, a new Snowpark Java API is under development and will be released in the future. Execution is performed within a native Java engine embedded within Snowflake.
The new Python support within Snowpark also involves a native Python runtime engine embedded within Snowflake. The Python dataframe APIs are similar to PySpark. Python code is pushed down into Snowflake, where dataframe functions are converted into Snowflake SQL, then optimally executed. The integrated Python engine enables complete and common security, governance, and management from within Snowflake.
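To make the pushdown idea concrete, here is a minimal, self-contained sketch of how chained dataframe calls can be turned into a single SQL statement. It is a toy illustration only, not the Snowpark API: the `LazyFrame` class, its methods, and the `orders` table with its columns are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LazyFrame:
    """Toy stand-in for a dataframe (hypothetical; NOT the Snowpark API).

    Each method returns a new object that only records the operation.
    Nothing is executed until the accumulated plan is rendered as SQL.
    """
    table: str
    columns: Tuple[str, ...] = ("*",)
    predicates: Tuple[str, ...] = ()

    def select(self, *cols: str) -> "LazyFrame":
        # Record the projection; no data is read here.
        return LazyFrame(self.table, cols, self.predicates)

    def filter(self, predicate: str) -> "LazyFrame":
        # Record the filter condition alongside any earlier ones.
        return LazyFrame(self.table, self.columns, self.predicates + (predicate,))

    def to_sql(self) -> str:
        # Render the recorded operations as one SQL statement.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(f"({p})" for p in self.predicates)
        return sql


# Hypothetical table and column names, used only for illustration.
orders = LazyFrame("orders")
query = orders.filter("amount > 100").select("customer_id", "amount")
print(query.to_sql())
```

In this sketch the dataframe calls never touch any data; they only build up a description of the query. That is the general shape of the approach described above: the SQL is generated on the client and the work happens in the database, so the data itself does not have to move.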
In addition, Snowflake announced integration with Anaconda’s Python management services and hundreds of open-source Python libraries. This facilitates library and package management directly inside of the integrated Snowflake Python engine.
The key win of these announcements is the ability for teams with diverse programming language skills to work with and collaborate around Snowflake. The data used and consumed remains inside of Snowflake (it doesn’t need to be replicated or moved), which saves time and keeps the data secured and governed from one place: Snowflake. It also lets team members use their most familiar programming models and skills.
Python support is also highly relevant in getting more data scientists to actively use Snowflake. Python is a widely used language for data science. Now, with Snowpark Python support, data engineers, analysts, and data scientists can have shared use of Snowflake for their tasks.
Data Engineering Enhancements
Snowflake Snowdays also introduced a number of enhancements targeting data engineering and the building and management of data pipelines. These include:
- Faster data ingestion, with latency improvements of between 50% and 68% on data ingestion processes,
- Automated schema detection from Parquet, Avro, and ORC files and creation of table objects,
- An ingestion dashboard that allows teams to get a complete view of their data ingestion tasks and drill down into problems, and
- Serverless tasks where Snowflake will automatically figure out and configure the resources required for a task.
All of these enhancements help improve the efficiency and operation of data pipelines within Snowflake to lower operating costs.
The most important aspect of these announcements is the Java, Scala, and Python support within Snowpark. Snowflake’s objective with this announcement was to bring a more diverse programming audience to Snowpark and the Snowflake Data Cloud platform. The new capabilities do achieve that objective.
However, the two audiences that continue to be left out of the Snowflake picture are non-programmers and programmers who would prefer to take a low- or no-code approach to data transformation and modeling. In addition, the highly programmatic approach of Snowpark limits the collaboration between programmatic data teams and lower-coding analytics teams.
This is where Datameer comes into play. The Datameer solution is the industry’s first data transformation platform that offers a unique multi-persona SQL-code/Low-code/No-code toolset that lets anyone in the organization, regardless of technical skill levels, work with data. Datameer gives data engineers the control they require and analysts the ability to support a collaborative data modeling and transformation process. With Datameer, anyone in the organization can model and transform data.
In addition, the Datameer solution enables organizations to transform and model data directly in Snowflake. All transformation models are contained, managed, and executed within Snowflake. This lets you use the flexibility and dynamic scalability Snowflake offers, maintain a single repository for data and data models, eliminate data movement, and keep your data safe and secure within your Snowflake platform.
The Datameer SaaS data transformation solution for Snowflake is available today. Organizations are invited to schedule a personalized private preview to get a more detailed look at this unique new solution.