Snowflake's features and capabilities enable you to manage, protect, and share your data securely and compliantly. This is called Data Governance. You can implement data governance in Snowflake with its Data Sensitivity & Access Visibility features, which let you classify your data and control how it is accessed and used. These features require Snowflake Enterprise Edition or higher.
You can tackle Snowflake Data Sensitivity & Access Visibility with two features: Object Tagging and Data Classification.
Object Tagging
A tag is a schema-level object that can be assigned to another Snowflake object. When you assign the tag to an object, you also give it an arbitrary string value, and Snowflake stores the tag and its value as a key-value pair. The tag name must be unique within its schema, and the tag value is always a string.
A single tag can be assigned to different object types at the same time (for example, to a warehouse and to a table). Each assignment can reuse a string value already used elsewhere or set a different one.
Creating and Assigning Tags
To create a tag, run one of the following commands. You can create a tag as a placeholder and add its allowed values later with the ALTER TAG command, or create it with its allowed values from the start.
CREATE [ OR REPLACE ] TAG [ IF NOT EXISTS ] <name> [ COMMENT = '<string_literal>' ]
CREATE [ OR REPLACE ] TAG [ IF NOT EXISTS ] <name>
[ ALLOWED_VALUES '<val_1>' [ , '<val_2>' [ , ... ] ] ]
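As a concrete illustration of the two approaches described above, here is a minimal sketch. The tag names (cost_center, data_sensitivity) and their values are hypothetical, and the statements assume the tags are created in the current database and schema by a role that is allowed to create tags.

```sql
-- Hypothetical tag names and values, for illustration only.

-- Approach 1: create the tag as a placeholder, with only a comment...
CREATE TAG IF NOT EXISTS cost_center
  COMMENT = 'Business unit that owns the object';

-- ...and restrict its allowed values later with ALTER TAG.
ALTER TAG cost_center ADD ALLOWED_VALUES 'finance', 'engineering';

-- Approach 2: create the tag with its allowed values in one statement.
CREATE TAG IF NOT EXISTS data_sensitivity
  ALLOWED_VALUES 'public', 'internal', 'confidential';
```

Once allowed values are defined, an assignment that uses any other value should be rejected, which helps keep tag values consistent across teams.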
After you have created a tag, you can assign it to objects (most Snowflake objects support tagging) or to specific table columns using these commands.
-- Add Tag to Objects
ALTER <object> <object_name> SET TAG <tag_name> = '<tag_value>';
-- Add Tag to Columns
ALTER TABLE <table_name>
MODIFY COLUMN <column_name>
SET TAG <tag_name> = '<tag_value>';
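To make the templates above concrete, the following sketch assigns tags to a warehouse and to a column, then reads the values back. The object names (reporting_wh, customers, email) and the tags cost_center and data_sensitivity are assumptions made for this example. SYSTEM$GET_TAG is the Snowflake function that returns the value of a tag on a given object.

```sql
-- Hypothetical objects and tags; adjust the names to your own account.

-- Tag an object (here, a warehouse).
ALTER WAREHOUSE reporting_wh SET TAG cost_center = 'finance';

-- Tag a single column of a table.
ALTER TABLE customers
  MODIFY COLUMN email
  SET TAG data_sensitivity = 'confidential';

-- Read back the tag value set on the warehouse and on the column.
SELECT SYSTEM$GET_TAG('cost_center', 'reporting_wh', 'warehouse');
SELECT SYSTEM$GET_TAG('data_sensitivity', 'customers.email', 'column');

-- Remove the tag from the warehouse when it no longer applies.
ALTER WAREHOUSE reporting_wh UNSET TAG cost_center;
```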
Monitoring Tags
After you define and assign tags to Snowflake objects, you can query the tags to monitor how those objects are used, which supports data governance operations such as auditing and reporting.
To retrieve your tags, use the TAGS view in the ACCOUNT_USAGE schema of the shared SNOWFLAKE database. This view can be thought of as a catalog for all tags in your Snowflake account that provides information on current and deleted tags.
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.TAGS
ORDER BY TAG_NAME;
Because you can assign tags to tables, views, and columns, setting a tag and then querying it lets you discover which database objects and columns contain sensitive information.
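One way to run this kind of discovery is the TAG_REFERENCES view, which sits in the same ACCOUNT_USAGE schema and lists which objects and columns carry which tag. The sketch below assumes a hypothetical tag named data_sensitivity with the value 'confidential'. Keep in mind that ACCOUNT_USAGE views are updated with some delay, so recent assignments may not show up immediately.

```sql
-- Hypothetical tag name and value. The tag name is stored in uppercase
-- because it was created as an unquoted identifier.
SELECT object_database,
       object_schema,
       object_name,
       column_name,   -- NULL when the tag is set on the object itself
       domain,        -- object type, e.g. TABLE or COLUMN
       tag_value
FROM SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES
WHERE tag_name = 'DATA_SENSITIVITY'
  AND tag_value = 'confidential'
  AND object_deleted IS NULL   -- skip objects that have been dropped
ORDER BY object_database, object_schema, object_name, column_name;
```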
Upon discovery, you can determine how best to make that data available, such as selective filtering using row access policies, or using masking policies to determine whether the data is tokenized, fully masked, partially masked, or unmasked.
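As a rough sketch of the masking option, the policy below shows the email column unmasked to one role and partially masked to everyone else. The policy name, the role PII_ANALYST, and the customers.email column are all hypothetical, and the masking rule is only one of many possible choices.

```sql
-- Hypothetical policy, role, table, and column names.
CREATE MASKING POLICY IF NOT EXISTS email_mask AS (val STRING)
  RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_ANALYST') THEN val   -- unmasked
    ELSE REGEXP_REPLACE(val, '.+@', '*****@')         -- partially masked
  END;

-- Attach the policy to the column discovered through its tag.
ALTER TABLE customers
  MODIFY COLUMN email
  SET MASKING POLICY email_mask;
```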
Data Classification
Data Classification is a multi-step process that associates Snowflake-defined tags (i.e. system tags) with columns. It does that by analyzing the cells and metadata of each column for personal data, so that a data engineer can then track this data.
Based on the tracking information and related audit processes, you can protect the column containing personal or sensitive data with a masking policy or the table containing this column with a row access policy.
System Tags and Categories
System tags are tags that Snowflake creates, maintains, and makes available in the shared SNOWFLAKE database. There are two Classification system tags, both of which exist in the SNOWFLAKE.CORE schema:
SNOWFLAKE.CORE.SEMANTIC_CATEGORY
SNOWFLAKE.CORE.PRIVACY_CATEGORY
The tag names, SEMANTIC_CATEGORY and PRIVACY_CATEGORY, correspond to the Classification categories that Snowflake assigns to the column data during the column sampling process:
- The semantic category identifies personal attributes. A non-exhaustive list of the personal attributes that Classification supports includes name, age, and gender. These three attributes are possible string values when assigning the SEMANTIC_CATEGORY tag to a column
- If the analysis determines that the column data corresponds to a semantic category, Snowflake further classifies the column to a privacy category. The privacy category has three values: identifier, quasi-identifier, or sensitive. These three values are the string values that can be specified when assigning the PRIVACY_CATEGORY Classification system tag to a column:
  - Identifier: These attributes uniquely identify an individual. Example attributes include name, social security number, and phone number
  - Quasi-identifier: These attributes can uniquely identify an individual when two or more of these attributes are combined. Example attributes include age and gender
  - Sensitive: These attributes are not considered enough to identify an individual but are information that the individual would rather not disclose for privacy reasons
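To show how these category values map onto the two system tags, here is a minimal sketch that sets them manually on a column. The table hr_data and the column fname are hypothetical. The values are written in uppercase, following the style used in Snowflake's own examples.

```sql
-- Hypothetical table and column names.
-- Mark the column as holding a person's name...
ALTER TABLE hr_data
  MODIFY COLUMN fname
  SET TAG SNOWFLAKE.CORE.SEMANTIC_CATEGORY = 'NAME';

-- ...and record that a name is an identifier.
ALTER TABLE hr_data
  MODIFY COLUMN fname
  SET TAG SNOWFLAKE.CORE.PRIVACY_CATEGORY = 'IDENTIFIER';
```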
Classification Process
Data Classification simplifies to a three-step process: analyze, review, and apply. Each of these steps has different operations:
- Analyze: you call the EXTRACT_SEMANTIC_CATEGORIES('<table_name>') function in the Snowflake account. This function analyzes the columns in a table and outputs the possible categories and associated metadata
- Review: you review the category results to make sure the output of the analyze step makes sense. If no revisions are necessary, you can proceed to the apply step. If revisions are necessary, you can revise the output of the analyze step before moving to the apply step
- Apply: now you can assign the system tags, either by manually setting a system tag on a column or by calling the ASSOCIATE_SEMANTIC_CATEGORY_TAGS stored procedure. You can then track the system tags and protect data with a masking policy or a row access policy
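The three steps above can be sketched as follows. The table name my_db.my_schema.hr_data is hypothetical. The review query only flattens the result into one row per column and does not assume anything about the inner layout of the JSON, which can differ between Snowflake releases.

```sql
-- Hypothetical fully qualified table name.

-- 1. Analyze: returns a JSON object keyed by column name.
SELECT EXTRACT_SEMANTIC_CATEGORIES('my_db.my_schema.hr_data');

-- 2. Review: one row per column, to make the result easier to inspect.
SELECT f.key::VARCHAR AS column_name,
       f.value        AS classification
FROM TABLE(FLATTEN(
       EXTRACT_SEMANTIC_CATEGORIES('my_db.my_schema.hr_data')::VARIANT)) AS f;

-- 3. Apply: assign the system tags suggested by the analysis.
CALL ASSOCIATE_SEMANTIC_CATEGORY_TAGS(
  'my_db.my_schema.hr_data',
  EXTRACT_SEMANTIC_CATEGORIES('my_db.my_schema.hr_data'));

-- Track the result (ACCOUNT_USAGE views are updated with some delay).
SELECT object_name, column_name, tag_name, tag_value
FROM SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES
WHERE tag_name IN ('SEMANTIC_CATEGORY', 'PRIVACY_CATEGORY')
ORDER BY object_name, column_name;
```

If the review step shows a column that was classified incorrectly, you can skip the stored procedure for that column and set the system tags by hand with ALTER TABLE ... MODIFY COLUMN ... SET TAG.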
You can read my previous blog on Data Access Policies, which are also part of Snowflake Data Governance, here.
Useful Links
Snowflake Tagging: https://docs.snowflake.com/en/user-guide/object-tagging
Snowflake Data Classification: https://docs.snowflake.com/en/user-guide/object-tagging