Loading Data into Snowflake with Efficiency

  • How-Tos FAQs
  • November 23, 2021
Get Started Transforming Your Data in Snowflake - feature img

Let’s pretend you are in charge of running a weekly survey about the impact of COVID on people’s day-to-day lives and analyzing the results of the data. All the information is currently in a number of csv files, but you want to get the data into Snowflake in order to more easily analyze and visualize the data. There are several ways of loading data into Snowflake with efficiency, each with various technical complexities and pros and cons, outlined as follows.

3 Options for Loading Data into Snowflake with Efficiency

Option #1: Automate csv upload via no code UI

Utilizing the Snowflake UI to load data works for one-off uploads; however, it can become quite cumbersome to manually create each table if there are numerous tables to be loaded. In this case, you may need a more automated approach to load multiple csv files. One such option is to use Datameer , a no-code solution that connects directly to Snowflake. Once you have a Datameer account, you can connect to your Snowflake instance and create a new project. Now, to load your survey data simply upload your csv file through the UI:

Choose your database destination:

Loading Data into Snowflake with Efficiency

Click upload and that’s it! The data will be immediately available in Snowflake, and you don’t have to go through the process of creating each column prior to uploading the file.

Option #2: Upload data via Snowflake UI

The first option to load the data would be to simply upload it via the Snowflake UI. Once you are in Snowflake and a database is created, you can click “Create a table”:

Loading Data into Snowflake with Efficiency

Give your table a name, and now you will need to create the columns one-by-one, with their respective datatypes. For example, for survey data I would create the following table:

Loading Data into Snowflake with Efficiency

 

After the table is created you can now load data from a csv file:

Loading Data into Snowflake with Efficiency -3

Select the file and work through the prompts to load the file. To ensure the full file is loaded, select “Stop loading, rollback and return the error”. This causes the entire file to fail, rather than only certain rows. You can then modify the issue based on the error log, reload the file and ensure that all data has been loaded properly.

 

Option #3: Using Python Pandas module combined with the Snowflake SQLAlchemy package

Datameer is a great option for non-coders or those working with a medium number of files, but what happens when you have hundreds of survey files that need to be loaded into Snowflake? You can still go through the Datameer upload process, or you could use a programming language such as Python. Python can connect directly to Snowflake, and by utilizing the Python Pandas module combined with the Snowflake SQLAlchemy package, you can easily load large amounts of data.

In this example, we will assume that each survey is a separate csv file with the appropriate headers, and that each file will be loaded into Snowflake as a separate table in the same database and schema. After setting up your python environment and installing the required packages (pandas, snowflake-connector-python, sqlalchemy, snowflake-sqlalchemy), a simple program can be run to loop through a list of files and load them into Snowflake. An example of the code can be similar to as follows:

mport sqlalchemy as sql
import pandas as pd
import os
from sqlalchemy.dialects import registry
 
# Setup an SQL Alchemy Engine connection
registry.register('snowflake', 'snowflake.sqlalchemy', 'dialect')
engine = sql.create_engine(
    'snowflake://{u}:{p}@{a}/{d}/{s}?warehouse={w}&role={r}'.format(
        u='USERNAME',
        p='PASSWORD',
        a='account.region',
        r='ROLE_NAME',
        d='DATABASE',
        s='SCHEMA',
        w='WAREHOUSE_NAME',
    )
)
 
# List of all survey filenames
csv_input_filepaths = [
    'survey1.csv',
    'survey2.csv',
    'survey3.csv',
    'survey4.csv',
]
 
try:
    # Process each path
    for path in csv_input_filepaths:
 
        # Create table with filename (i.e. "survey1.csv" will load with a table name of survey1)
        filename, _ext = os.path.splitext(os.path.basename(path))
 
        # Use pandas to load data with headers and use existing datatypes to populate schema
        data = pd.read_csv(path)
 
        # Store into Snowflake with table name from code above
        data.to_sql(filename, engine, index = False)
 
finally:
    # Disconnect engine
    engine.dispose()

Conclusion: Loading Data into Snowflake with Efficiency

Modify the csv filepaths according to your specifications, and the python program will load each file in a fraction of the time of loading It through any UI.

All in all, there are a few different ways that you may want to load data into Snowflake, depending on your technical acumen and the amount you are trying to load. Python is a great tool for quickly loading large number of data files. However, if you don’t know the Python language, any efficiency gains could be lost by the time it takes to get up to speed on programming. In that case, Datameer could be an ideal option for not only simple loading of files into Snowflake, but also other data transformations that can be handled through an easy-to-use UI.

Continue reading

Check out: How can I view row counts of all tables in a Snowflake database?

Related Posts

Top 5 Snowflake tools for Analysts- talend

Top 5 Snowflake Tools for Analysts

  • Ndz Anthony
  • February 26, 2024