Skip to main content

The introduction of Snowflake Notebooks has made exploratory data analysis much easier and more accesible. Rather than having to set up an external connection through Snowflake from a Jupyter Notebook ran elsewhere, users can have their Notebooks directly in Snowsight and leverage Snowflake both as a data warehouse and a compute engine. However, with bringing data science code to Snowflake comes the need for collaborative sourcing and version control.

Remote Git repositories and Snowflake Notebooks

Snowflake supports a wide variety of remote git repositories, though not all of them to the same degree. Github, for example, supports oauth for connecting, while other platforms do not (yet). In this blog we will look at integrating a GitLab repository through project access tokens. This blog also does not consider the (public preview) Snowflake Workspaces. While Workspaces provide a more centralized way to connect to Git repositories, they do not at this time support editing Notebooks.

Setting up and storing a personal access token

To connect to a GitLab repository we will have to create a project access token. In your GitLab project, navigate to settings -> access tokens to get started. When adding a new project access token, we get to choose an expiry date, a role and a scope. For a Snowflake connection the token needs read and write access to your project, which you can cover by using the general ‘api’ scope. When you confirm to create a token, this will be the only time you can copy and save it. Tokens can be rotated (generated anew) and deleted, but not recovered when lost. Copy your token to a temporary safe place for now. Not that we will be using it twice further on in this tutorial.

Storing a secret

On Snowflake’s side, we will store this token as a secret. Secrets are designed to hold sensitive keys safely within Snowflake and use them as credentials. We will store the GitLab token as a password type, meaning we also require a non-blank username. This username at this point can be anything, and will not affect how actual pushes to the repository from a Notebook will be signed. The following SQL code can be used in a worksheet to create a secret:
CREATE OR REPLACE SECRET <database>.<schema>.gitlab_api_key
TYPE = PASSWORD
USERNAME='USERNAME@EMAIL.COM'
PASSWORD = '<GITLAB TOKEN HERE>';

Note that a secret is a schema-level object. While it won’t be shown in Snowsight while exploring your database structure, it is in fact stored with a fully qualified name and location. If you only provide a name, the secret will be stored in the database and schema currently selected in your worksheet. We will need the fully qualified name for our next SQL, where we create a git API integration.

Creating an API integration

CREATE OR REPLACE API INTEGRATION gitlab_api_integration
API_PROVIDER = git_https_api
API_ALLOWED_PREFIXES = ('https://gitlab.com')
ALLOWED_AUTHENTICATION_SECRETS = (<database>.<schema>.gitlab_api_key)
ENABLED = TRUE;

The integration, unlike the secret, is an account-level object. Any Snowflake user going to use this git integration will need explicit access to it through their role. As such, make sure to GRANT USAGE on both the secret and the integration to the relevant roles.
GRANT USAGE ON SECRET gitlab_api_key TO ROLE <NOTEBOOK_ROLE>;
GRANT USAGE ON INTEGRATION gitlab_api_integration TO ROLE <NOTEBOOK_ROLE>;

Connecting the GitLab repository

Connecting an existing Notebook to Snowflake

Select the API Integration and secret we made previously

If you already have a pre-existing Jupyter notebook (.ipynb) in your git repository ready to go, you can link this directly to Snowflake. Navigate to Notebooks under the Projects header, click the downwards arrow next to the ‘+ Notebook’ button in the upper right, and select ‘Create from repository’. After naming the notebook (this will be its name in Snowflake), you get to select the file location in the repository.

As this is the first time you are connecting to the repository, you will now click ‘Create Git repository’. This opens a dialogue window that allows you to paste your repository URL, and give it a name for Snowflake. For ‘API Integration’, select the integration we made in the previous step. Select a database and schema to save the repository, and add a comment if needed. Now toggle the Authentication slider to enabled, and select the secret we made in the first step of this tutorial. Congratulations, you have now succesfully linked you Git repository! Next time you add a file from the same repository, you can select the existing repository object and skip this paragraph.

The first time a Snowflake user links a file from a repository, Snowflake will also want user-specific credentials, used to sign off on any commits and pushes made by you. As the Gitlab API key does not have an inherent user linked, you can re-use the same token as your Personal Access token (in fact, every user can use the same token), and add your own Author name and Author email to identify your commits. These credentials will be saved by Snowflake for further use. Of course you are also free to create seperate personal access tokens for each user. This can make it easier to rotate and revoke tokens when needed.

Connecting a Snowflake-created Notebook to Git

If instead you already created a Notebook on the Snowflake side, and want to now push this to your Git repository, this is also possible. With the Notebook open, press ‘Connect Git Repository’ in the upper left corner. Similarly to the process above, if this is your first time linking Snowflake to this particular Git repository, you will have to make the repository object and select the previously created API integration and secret to establish the connection. If the repository already exists as a schema-level object, you can select it instead.

Note Snowflake automatically creates or overwrites a folder with the Notebook name in the linked repository. If this is the first time you (as a Snowflake user) pushes a file to this repository, Snowflake will want to store personal credentials and ask you once again for your personal access token. As outlined above, this can be either the same PAT for every user or a seperate one for each user, as Gitlab does not consider them user-specific.

  • Chris Verweij
    : Author

    Molecular biologist turned analytics engineer, taking both her knowledge and frustrations about data management in a laboratory environment to the traineeship. She lives with two black cats and a thousand books in Wageningen.

Chris Verweij

Molecular biologist turned analytics engineer, taking both her knowledge and frustrations about data management in a laboratory environment to the traineeship. She lives with two black cats and a thousand books in Wageningen.

Leave a Reply