Skip to main content

As a follow up of my previous blog on SQL injection into Snowflake, I made a small demo about data validation in Snowflake. Data validation is the practice of checking the correctness of your data. This can be in terms of completeness or data assumptions, but also to prevent someone with malicious intend from interfering with your database.

New students

To show how incorrect data can enter your Snowflake database, we have an example table students. The table is shown in the output image below.

Results of students table without data validation in snowflake
Result students table

Now we assume that we get a new student to the school. The most common way to load data into a Snowflake table is with a stage and then the COPY INTO statement. If we test to inject a name with some SQL code mixed in there, Snowflake actually doesn’t care. It will load the data with the VARCHAR datatype, and nothing breaks.

CREATE OR REPLACE STAGE new_students_stage;

COPY INTO students FROM @new_students_stage/new_students.csv file_format=(type=csv);

SELECT * FROM students;

Executing the code above will lead to the following result.

Results of students table without data validation in snowflake
Result new students table

Stored procedure

We can select and update the name of Little Bobby Tables with no problem. However, when your database includes dynamically created SQL statements, the name could lead to trouble. Imagine that we input the name of a student with a stored procedure instead of a manual copy statement.

CREATE OR REPLACE PROCEDURE input_student_name(student_name VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
BEGIN
    LET query := 'INSERT INTO students (name) VALUES (\'' || student_name || '\')' ;
    EXECUTE IMMEDIATE :query;
END;

Now if we execute the code, it will form two queries – just as the malicious intend was of the person who performed the SQL injection. Luckily, Snowflake saves us again!

Snowflake recognizes that you tried to execute one query, but there were two queries passed to the API call. The error stops this from happening. Unfortunately, this also suspends any queries that need to be processed afterwards. Can we prevent this from happening?

Binding variables

Yes – when we remove the string concatenation and instead use binding variables. This is also recommended by Snowflake themselves in the documentation. The code below shows how to use binding variables.

CREATE OR REPLACE PROCEDURE input_student_name(student_name VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
BEGIN
    LET query := 'INSERT INTO students (name) VALUES (?)';
    EXECUTE IMMEDIATE :query USING (student_name);
END;

If we execute this stored procedure, the whole value of student_name is given to the insert statement. Any quotation marks inside this VARCHAR variable, will just be part of the name.

Another way for data validation in Snowflake would be to check the type of characters in the student name. We could limit the name to alphanumeric characters. Unless Elon Musk has set a trend by including special characters in a child’s name, this should suffice. Read my next blog for a demo on how you could use REGEX to validate the data that is loaded in your Snowflake database.