Ideal Development Environment Setup for Snowflake

  • How-Tos FAQs
  • September 29, 2020

We would like to talk about the Snowflake development environment setup. Please note that this is an opinion piece: it describes what is, in our minds, the ideal setup.

Language:

Snowflake started out with native support for Python, while Java and other JVM-based languages are only slowly getting comparable support by way of the Snowpark APIs. The flexibility Python currently enjoys on Snowflake is still very much missing for other languages. Time will tell which combination ultimately wins out, but for now Python is the clear winner. That being said, Python is also our favorite language: it has incredible data analytics and engineering support. Finally, we are extremely comfortable using it, so there is an inherent bias.
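
To illustrate how little ceremony is involved, here is a minimal connection sketch using the Snowflake Connector for Python. The credentials are read from environment variables, and the warehouse, database, and schema names are hypothetical placeholders, not a prescribed setup.

```python
import os

import snowflake.connector

# Minimal sketch: connect and run a sanity-check query.
# Account/user/password come from environment variables; the warehouse,
# database, and schema names below are hypothetical.
conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="DEV_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)

try:
    cur = conn.cursor()
    cur.execute("SELECT CURRENT_VERSION(), CURRENT_WAREHOUSE()")
    print(cur.fetchone())
finally:
    conn.close()
```

In a real project you would pull these secrets from a secrets manager or your orchestrator's connection store rather than plain environment variables.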

Integrated Development Environment or IDE:

Over the last couple of years, Microsoft’s Visual Studio Code (VSCode) has gained significant ground and, if we might add, has become the number one IDE out there. Its extensibility, flexibility, plugin support, built-in Linux support, and more have all played a key role in pushing this relatively new IDE to the top, competing with major players such as JetBrains.

One quick and relatively unknown point we’d like you to note: there is a plugin for Snowflake in the VSCode extensions marketplace. It’s called ‘Snowflake Driver for SQLTools’, developed by Peter Kosztolanyi, and has around 9,500 downloads at the time of writing.

Git:

While there are several focused tools such as GitHub, Bitbucket, and GitLab, in our experience the git offerings with DevOps capabilities from the three public cloud providers, i.e. Azure, AWS, and GCP, are the best. Their extensibility and connectivity to the other tools within their ecosystems make them supremely beneficial.

Orchestrator:

Apache Airflow is by far the best orchestration tool out there, and running it as a managed service on AWS (MWAA) is, in our view, the best combination. We have also used Prefect and found it to be a pretty decent product.
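
To make the Airflow recommendation concrete, here is a minimal DAG sketch that runs a Snowflake statement on a schedule. It assumes the Snowflake provider package is installed and a connection named snowflake_default exists in Airflow; the DAG id, schedule, and COPY statement are illustrative assumptions.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

# Hypothetical Airflow connection id configured via the Airflow UI or CLI.
SNOWFLAKE_CONN_ID = "snowflake_default"

with DAG(
    dag_id="snowflake_daily_load",
    start_date=datetime(2020, 9, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Load the day's files from an (assumed) external stage into a raw table.
    load_orders = SnowflakeOperator(
        task_id="load_orders",
        snowflake_conn_id=SNOWFLAKE_CONN_ID,
        sql="COPY INTO raw.orders FROM @raw.orders_stage FILE_FORMAT = (TYPE = CSV)",
    )
```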

ETL Tools:

In our humble opinion, Matillion (Snowflake-specific) and Informatica (general-purpose) take the cake here. Runners-up would be Azure Data Factory and AWS Glue. From an open-source ETL point of view, we would recommend Airbyte, and if you are looking at only data transformations, the T in ETL, then dbt (Data Build Tool) is a great choice. For serverless or event-based requirements, we have generally used the public cloud providers’ offerings, i.e. Azure Functions or AWS Lambda. Finally, for sub-minute or sub-second streaming data, we have used Spark Structured Streaming (on AWS EMR or Databricks) and Apache Kafka (Amazon MSK) in the cloud.
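
For the serverless, event-based case, a small AWS Lambda handler is often all that is needed. The sketch below assumes a hypothetical external stage and landing table, credentials supplied through environment variables, and an S3 ObjectCreated trigger; it is an illustration of the pattern, not a hardened implementation.

```python
import os

import snowflake.connector

# Hypothetical objects: an external stage pointing at the triggering S3
# bucket, and the raw table the files land in.
STAGE = "RAW.PUBLIC.ORDERS_STAGE"
TABLE = "RAW.PUBLIC.ORDERS"


def handler(event, context):
    """Triggered by an S3 ObjectCreated event; copies each new file into Snowflake."""
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="LOAD_WH",
    )
    try:
        for record in event["Records"]:
            key = record["s3"]["object"]["key"]
            conn.cursor().execute(
                f"COPY INTO {TABLE} FROM @{STAGE}/{key} "
                "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
            )
    finally:
        conn.close()
    return {"status": "ok"}
```

Snowpipe is the managed alternative to this pattern; the hand-rolled Lambda simply gives you more control over what happens around the load.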

Change Data Capture or CDC Tools:

We haven’t had much experience with dedicated CDC tools; generally, we have handled CDC ourselves with carefully written SQL. Recently, however, we’ve had the chance to use a market underdog, HVR. At first it was a bit cumbersome, but over time the many features it has under its hood have convinced us of its usefulness and practical approach.
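
To give a flavour of what hand-rolled CDC in SQL looks like, below is a minimal MERGE-based upsert run through the Python connector. The staging and target tables, key columns, and the idea of comparing an updated_at timestamp are all illustrative assumptions.

```python
# Hand-rolled CDC step: upsert changed rows from a staging table into the
# target with MERGE. Table and column names are purely illustrative.
MERGE_SQL = """
MERGE INTO analytics.customers AS tgt
USING staging.customers_changes AS src
    ON tgt.customer_id = src.customer_id
WHEN MATCHED AND src.updated_at > tgt.updated_at THEN UPDATE SET
    name = src.name,
    email = src.email,
    updated_at = src.updated_at
WHEN NOT MATCHED THEN INSERT (customer_id, name, email, updated_at)
    VALUES (src.customer_id, src.name, src.email, src.updated_at)
"""


def apply_cdc(conn):
    """Apply the latest batch of changes; `conn` is an open Snowflake connection."""
    conn.cursor().execute(MERGE_SQL)
```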

Data Catalog:

In this case, the best option we have come across is Amundsen. However, it is open source, and with that comes a lot of careful handling and management. For those who are not in favor of open source, a commercial alternative is Alation.

Testing and quality:

From an automated testing and data quality management point of view, we have never used a commercial product. Since we are on Python, we typically go for something like pytest and/or Great Expectations.
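
As an example of the kind of lightweight checks we mean, here is a pytest sketch with two data quality assertions against a Snowflake table. The connection fixture, warehouse/database names, and the orders table are assumptions for illustration.

```python
import os

import pytest
import snowflake.connector


@pytest.fixture(scope="module")
def conn():
    # Credentials come from environment variables; the warehouse and
    # database names below are hypothetical.
    connection = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="DEV_WH",
        database="ANALYTICS",
    )
    yield connection
    connection.close()


def test_orders_not_empty(conn):
    # The table should contain at least one row after a load.
    (count,) = conn.cursor().execute("SELECT COUNT(*) FROM public.orders").fetchone()
    assert count > 0


def test_orders_have_no_null_keys(conn):
    # The business key should never be null.
    (nulls,) = conn.cursor().execute(
        "SELECT COUNT(*) FROM public.orders WHERE order_id IS NULL"
    ).fetchone()
    assert nulls == 0
```

Great Expectations covers the same ground with declarative expectation suites; pytest is the simpler starting point when the team already lives in Python.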

This summarizes our ideal development environment setup for Snowflake.


Up Next:

Read SnowSQL Installation and Configuration