Skip to main content

Let’s say that we have a database stored in Snowflake with some data that we want to use in a Machine Learning model set up in Python. Is there a way to use Snowflake data directly in Python? Lucky for us, yes there is.

Setup

Snowflake Python Connector

First of all, we have to install the connector that will let Python communicate with Snowflake. The first thing that we have to do is figure out the python version that we have installed. In the command line (I am currently using W11) execute

python --version

For me, it is Python 3.11.2. Now we can install the correct version of the connector

pip install -r https://raw.githubusercontent.com/snowflakedb/snowflake-connector-python/v3.0.0/tested_requirements/requirements_311.reqs

You should change the requirements_<python_version>.reqs at the end for your specific version. You can also change the connector version if needed (see other versions).

pip install snowflake-connector-python==3.0.0

Modify the 3.0.0 for the specific version that you have installed in the previous step.

Creating test data (optional)

You can create the test data that I used for this example with the following commands in Snowflake

CREATE DATABASE python_database;

CREATE TABLE python_database.public.cool_table(
    id INT AUTOINCREMENT PRIMARY KEY,
    name VARCHAR,
    email VARCHAR,
    num VARCHAR
);

INSERT INTO python_database.public.cool_table (id, name, email, num) VALUES 
(1, 'Seana Maunders', 'smaunders0@samsung.com', '555-983-6281'),
(2, 'Leanna Entres', 'lentres1@miibeian.gov.cn', '956-437-2011'),
(3, 'Cherin Geydon', 'cgeydon2@stumbleupon.com', '692-616-2135'),
(4, 'Klarrisa Thurlbeck', 'kthurlbeck3@ocn.ne.jp', '767-554-5346'),
(5, 'Dwayne Hurling', 'dhurling4@cbc.ca', '418-885-5011'),
(6, 'Rochette Ballham', 'rballham5@rambler.ru', '325-152-2767'),
(7, 'Brenna Fruish', 'bfruish6@google.com.hk', '766-239-1379'),
(8, 'Al Deare', 'adeare7@networksolutions.com', '845-560-8781'),
(9, 'Felicdad McClarence', 'fmcclarence8@salon.com', '129-698-9707'),
(10, 'Kiley Readhead', 'kreadhead9@instagram.com', '107-619-2098');

Make sure to have at least a warehouse available as well.

CREATE WAREHOUSE IF NOT EXISTS compute_wh
    WAREHOUSE_SIZE = XSMALL
    AUTO_SUSPEND = 60
    INITIALLY_SUSPENDED = TRUE;

Creating a new user (optional)

In order to connect Python and Snowflake together, we need to provide Python with the information to log in into Snowflake. To do this, we will have to create a file (or hardcode a variable in Python) with the user login information. Since we may not want to have our Snowflake password stored in an unencrypted file we can create a user with limited privileges in our Snowflake account.
We can do it with the following commands

CREATE ROLE python_role;

CREATE USER python_user
    LOGIN_NAME = 'python_user'
    PASSWORD = 'python_password'
    DEFAULT_ROLE = python_role;

GRANT ROLE python_role TO USER python_user;

Now we must provide some privileges to the python user so that he can extract some data from the account. We give it access to the database, schema, all the tables in the schema and usage on an x-small warehouse.

GRANT USAGE ON DATABASE python_database TO ROLE python_role;
GRANT USAGE ON SCHEMA python_database.public TO ROLE python_role;
GRANT SELECT ON ALL TABLES IN SCHEMA python_database.public TO ROLE python_role;
GRANT USAGE ON WAREHOUSE compute_wh TO ROLE python_role;

Verifying installation

We can use the test script that Snowflake itself provides. Save a python script (e.g. validate.py) with the following information

import snowflake.connector

# Gets the version
con = snowflake.connector.connect(
    user='python_user',
    password='python_password',
    account='<account_identifier>'  # Change this to your own account
    )
cs = con.cursor()
try:
    cs.execute("SELECT current_version()")
    one_row = cs.fetchone()
    print(one_row[0])
finally:
    cs.close()
ctx.close()

Upon execution of this script, we should obtain the current Snowflake version that is being used.

python validate.py # should return something like 7.7.4

Loading data to Python

Once the install is verified, we are finally able to use our Snowflake data in Python.

import snowflake.connector

con = snowflake.connector.connect(
    user='python_user',
    password='python_password',
    account='<account_identifier>',
    warehouse='compute_wh',
    database='python_database',
    schema='public'
)

for row in con.cursor().execute('select * from cool_table'):
    print(row)

If everything was set up correctly, we should now see the following printed on the terminal when running the script.

(1, 'Seana Maunders', 'smaunders0@samsung.com', '555-983-6281')
(2, 'Leanna Entres', 'lentres1@miibeian.gov.cn', '956-437-2011')
(3, 'Cherin Geydon', 'cgeydon2@stumbleupon.com', '692-616-2135')
(4, 'Klarrisa Thurlbeck', 'kthurlbeck3@ocn.ne.jp', '767-554-5346')
(5, 'Dwayne Hurling', 'dhurling4@cbc.ca', '418-885-5011')
(6, 'Rochette Ballham', 'rballham5@rambler.ru', '325-152-2767')
(7, 'Brenna Fruish', 'bfruish6@google.com.hk', '766-239-1379')
(8, 'Al Deare', 'adeare7@networksolutions.com', '845-560-8781')
(9, 'Felicdad McClarence', 'fmcclarence8@salon.com', '129-698-9707')
(10, 'Kiley Readhead', 'kreadhead9@instagram.com', '107-619-2098')

Therefore, when executing the SQL command as we did, we can obtain the rows from the table stored in Snowflake as a set of tuples, one for each row of the queried table.
Next up, we will see how on better and more efficient ways to obtain this data (i.e. using Pandas).

Auteur