Skip to main content

Let us add something new in What’s under the Snowflake(s)? In this post, we focus on the challenge of Week 37 – Basic from Frosty Friday. Topics: directory table, file urls and unstructured data.

Dear reader, this is SO TRUE! But, are you curious to know more about SnowPro Core exam? Read the dedicated blog post about it on Nimbus Intelligence blog.

What’s the goal?

How should the result be

A possible strategy

We can start by copy and paste the start up code. After having run the CREATE STORAGE INTEGRATION command, we have to create the external stage.

CREATE OR REPLACE STORAGE INTEGRATION week37_si
    TYPE = external_stage
    STORAGE_PROVIDER = 's3'
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::184545621756:role/week37'
    ENABLED = true
    STORAGE_ALLOWED_LOCATIONS = ('s3://frostyfridaychallenges/challenge_37/');

Since we want to have a directory view/table, in the script of the CREATE STAGE, we have to add DIRECTORY = (ENABLE = TRUE). In this way, a directory table is created with a stage. The stage references a bucket or container named load with a path of files. By means of STORAGE_ALLOWED_LOCATION we assure a secure access to the cloud storage location.

CREATE OR REPLACE STAGE external_stage_week_37
    URL = 's3://frostyfridaychallenges/challenge_37/'
    CREDENTIALS = (aws_role = 'arn:aws:iam::184545621756:role/week37')
    DIRECTORY = (ENABLE = TRUE
    REFRESH_ON_CREATE = TRUE);

LIST @external_stage_week_37;

By running the command LIST, we should see all the files in the external stage that you have created. There should be eleven files, as you can see in the below picture.

At this point, we need to create a directory table/view. Such table/view is similar to external tables in that it stores file-level metadata about the data files in a stage. For the challenge, it is needed. As a matter of fact, querying a directory table retrieves the Snowflake-hosted file URL to each file in the stage. A file URL permits prolonged access to a specified file. That is, the file URL does not expire. However, there are three types of file url: Scoped file URL, Stage file URL and Pre-signed URL.

Scoped file URL, stage file URL, Pre-signed URL

Since we have used a directory table, we can select the relative path from the output of a directory table query. Moreover, we can extract from it also the size and the file_url. If you want, you can also extract some other metadata like ‘last_modify’ (timestamp data type which indicates when the file was last updated in the stage), ‘MD5’ (HEX data type used as hash function producing a 128-bit hash value), and so on and so forth..

CREATE OR REPLACE VIEW table_from_external_stage_w37
AS SELECT 
   relative_path
  , size
  , file_url
  , build_scoped_file_url(@external_stage_week_37, relative_path) as scoped_file_url
  , build_stage_file_url(@external_stage_week_37, relative_path) as stage_file_url
  , get_presigned_url(@external_stage_week_37, relative_path) as presigned_url
FROM directory(@external_stage_week_37);
  • build_scoped_file_url:

BUILD_SCOPED_FILE_URL: Generates a scoped Snowflake-hosted URL to a staged file using the stage name and relative file path as inputs. A scoped URL is encoded and permits access to a specified file for a limited period of time.

  • build_stage_file_url

BUILD_STAGE_FILE_URL: Generates a Snowflake-hosted file URL to a staged file using the stage name and relative file path as inputs. A file URL permits prolonged access to a specified file. That is, the file URL does not expire.

  • get_presigned_url

GET_PRESIGNED_URL: Generates the pre-signed URL to a staged file using the stage name and relative file path as inputs.

By a select statement applied to the ‘table_from_external_stage_w37’, we will see the result as in the above picture.

However, there is need of few more words about the three kinds of file url. From the resulting view, it is possible to analyze the returns of each function. I suggest you to focus your attention to the following pictures! Curious to know more? Open the link in the picture’s description!


Isn’t unstructured data a really cool topic? I really believe so!

I hope you find this episode helpful in smashing the certification! In the meantime, you’ve already won another Frosty Friday challenge!

Auteur

  • Sofia Pierini

    She is a critical thinker who draws inspiration from the intersection of logic, coding, and philosophy. She holds a Master's Degree in Philosophy and graduated as a Data Engineer from the Nimbus Intelligence Academy, earning certifications in Snowflake and dbt. Strong-willed and multifaceted, she has demonstrated competence in various interdisciplinary fields. As the Leader of the Italian Snowflake User Group and a founding member of the Snowflake Squad, she is deeply committed to fostering a strong and dynamic Snowflake Community. Her passion for technology and progress is evident in her innovative approach to problem-solving. She enjoys hardcore punk music and loves coffee.

Sofia Pierini

She is a critical thinker who draws inspiration from the intersection of logic, coding, and philosophy. She holds a Master's Degree in Philosophy and graduated as a Data Engineer from the Nimbus Intelligence Academy, earning certifications in Snowflake and dbt. Strong-willed and multifaceted, she has demonstrated competence in various interdisciplinary fields. As the Leader of the Italian Snowflake User Group and a founding member of the Snowflake Squad, she is deeply committed to fostering a strong and dynamic Snowflake Community. Her passion for technology and progress is evident in her innovative approach to problem-solving. She enjoys hardcore punk music and loves coffee.