
Road to Snowflake Data Engineer: Storage and Data Protection

This is the third blog in the Road to Snowflake Data Engineer series. If you missed the first two, you can find them here:

  1. Road to Snowflake Data Engineer: Data Movement
  2. Road to Snowflake Data Engineer: Performance Optimization

Road to Snowflake Data Engineer is a five-part blog series in which I dive into the five domains assessed in the Snowflake SnowPro® Advanced Data Engineer certification exam, as outlined in the official SnowPro Advanced: Data Engineer Certification Exam Study Guide. Writing about the concepts covered in the guide helps me memorize them and prepare for the exam.

If you’re still in the process of preparing for the Snowflake SnowPro Core Certification, I recommend you read some of these blog posts first. They collect the experiences of my Nimbus Intelligence colleagues, who share their own approaches to obtaining the certification. Perhaps their study methods are more aligned with what works for you. If you’re aiming for the Snowflake Data Engineer Certification, keep reading.

Snowflake’s data recovery features

Time Travel and Fail-safe are the main data recovery features Snowflake offers.

Snowflake’s Time Travel feature is a game-changer in data recovery. It allows users to access historical data as it existed at a specific point in time within the retention period. This capability is pivotal for undoing accidental changes and recovering lost data. For example, if a table is mistakenly dropped, Time Travel can be used to restore it to its state before the incident occurred, as long as the recovery happens within the defined retention period. Read more in Time Travel Made Easy with Snowflake.
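
As a minimal sketch of what this looks like in practice (the orders table and the timestamp are hypothetical), you can query a table as it existed at an earlier point in time, or restore a dropped table:

```sql
-- Query the table as it existed at a specific point in time
-- (hypothetical table name and timestamp; must fall within the retention period).
SELECT *
FROM orders AT (TIMESTAMP => '2024-05-01 08:00:00'::TIMESTAMP_LTZ);

-- Query the table as it existed 30 minutes ago (offset is in seconds).
SELECT *
FROM orders AT (OFFSET => -60*30);

-- Restore a table that was accidentally dropped, while it is still in Time Travel.
UNDROP TABLE orders;
```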

While Time Travel provides a robust method for data recovery, Snowflake also offers an additional layer of protection called Fail-safe. This feature kicks in when the Time Travel period expires. Fail-safe protects the historical data of permanent tables from permanent loss for an additional, non-configurable seven days after the Time Travel period ends. During this time the data is no longer accessible to users; it can only be recovered by Snowflake itself, making Fail-safe a last resort for disaster recovery rather than a self-service tool, while still providing peace of mind for data reliability and compliance.

From my cheat sheet:

  • These features are included as standard for all accounts, i.e. no additional licensing is required; however, the default Time Travel retention period is 1 day.
  • Extended Time Travel (up to 90 days) requires Snowflake Enterprise Edition.
  • Both Time Travel and Fail-safe consume additional data storage, which incurs storage fees (see the example after this list).
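
A quick sketch of how retention is configured and how the associated storage can be monitored (the table name is hypothetical; the TABLE_STORAGE_METRICS view lives in the shared SNOWFLAKE database):

```sql
-- Extend Time Travel retention for a table
-- (values above 1 day require Enterprise Edition or higher).
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 30;

-- Inspect storage consumed by active data, Time Travel, and Fail-safe
-- (hypothetical table name).
SELECT table_name,
       active_bytes,
       time_travel_bytes,
       failsafe_bytes
FROM snowflake.account_usage.table_storage_metrics
WHERE table_name = 'ORDERS';
```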

How Snowflake streams impact Time Travel

Streams in Snowflake track changes (inserts, updates, and deletes) to tables, enabling incremental and near-real-time data processing. However, it’s crucial to understand how they interact with Time Travel. A stream stores an offset into the change history of its source table; if that offset falls outside the table’s data retention period before the stream is consumed, the stream becomes stale and can no longer return changes. To prevent this, Snowflake can temporarily extend the source table’s retention period, up to the MAX_DATA_EXTENSION_TIME_IN_DAYS parameter (14 days by default), which in turn increases Time Travel storage.
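
A small sketch of how you might keep an eye on stream staleness (the stream and table names are hypothetical):

```sql
-- Create a stream that tracks changes on a source table (hypothetical names).
CREATE OR REPLACE STREAM orders_stream ON TABLE orders;

-- SHOW STREAMS returns a STALE_AFTER timestamp; if the stream is not consumed
-- before that time, it becomes stale and must be recreated.
SHOW STREAMS LIKE 'orders_stream';

-- Cap how far Snowflake may extend the source table's retention to keep
-- streams from going stale (14 days is the default).
ALTER TABLE orders SET MAX_DATA_EXTENSION_TIME_IN_DAYS = 14;
```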

Analyze micro partitions using system functions

Snowflake stores data in micro-partitions that are automatically managed by the platform. Understanding the structure and optimization of these partitions is crucial for performance tuning.

The SYSTEM$CLUSTERING_INFORMATION function is vital for analyzing how well data is clustered around specified cluster keys. This function returns information about the clustering depth, which indicates the level of data sorting within the micro-partitions. Efficient clustering enhances query performance by minimizing the scan range, leading to faster and more cost-effective data retrieval.
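
For example (the table and column names here are hypothetical), clustering information can be requested for the table’s defined clustering key or for an arbitrary set of columns:

```sql
-- Clustering information based on the table's defined clustering key.
SELECT SYSTEM$CLUSTERING_INFORMATION('orders');

-- Clustering information for an arbitrary set of columns, useful when
-- evaluating a candidate clustering key.
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date, customer_id)');
```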

From my cheat sheet:

  • You can use SYSTEM$CLUSTERING_INFORMATION to return clustering information for any columns in the table, regardless of whether a clustering key is defined for the table.
  • You can use SYSTEM$CLUSTERING_DEPTH to calculate the depth for any columns in the table, regardless of the clustering key defined for the table (see the example after this list).
  • The number of distinct values (i.e., cardinality) in a column/expression is critical to selecting it as a clustering key. A column with very low cardinality might yield only minimal pruning. At the other extreme, a column with very high cardinality is also typically not a good candidate.
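
A minimal example of checking the depth for a candidate column (table and column names are hypothetical):

```sql
-- Average clustering depth for an arbitrary column, regardless of the
-- clustering key defined on the table.
SELECT SYSTEM$CLUSTERING_DEPTH('orders', '(customer_id)');

-- The optional third argument restricts the calculation to rows matching a predicate.
SELECT SYSTEM$CLUSTERING_DEPTH('orders', '(customer_id)', 'order_date > ''2024-01-01''');
```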

Creating new development environments using Time Travel and cloning

Cloning in Snowflake is not just about creating exact replicas of databases, schemas, or tables; it’s a powerful feature for development and testing. By cloning an object, you can make changes without affecting the production data. This functionality is integral when validating changes or experimenting with new features.

You can combine Time Travel and cloning to test changes in a safe environment. After cloning an object, you can apply and test changes, leveraging Time Travel to revert the clone back to its original state if needed. This way, only validated changes are promoted to production.
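
A rough sketch of this workflow, assuming a hypothetical production database called prod_db:

```sql
-- Create a zero-copy development clone of the production database
-- (hypothetical database names).
CREATE DATABASE dev_db CLONE prod_db;

-- Combine cloning with Time Travel: clone the database as it existed at an
-- earlier point in time (the timestamp must be within the retention period).
CREATE DATABASE dev_db_snapshot CLONE prod_db
  AT (TIMESTAMP => '2024-05-01 08:00:00'::TIMESTAMP_LTZ);
```

Because clones are zero-copy, the new environment initially shares micro-partitions with the source and adds storage only as the clone diverges from it.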

If a new feature or change leads to unexpected results, Time Travel allows you to roll back the changes by restoring the clone to a previous state. This functionality ensures continuous integrity and stability in the development process.

From my cheat sheet:

  • If the cloning operation for a database or schema fails, it is for one of these reasons:
    • Time Travel time is at or before the database creation time
    • Time Travel time is beyond the retention time of any current database child
  • When cloning a database, these will be cloned: 
    • Tasks
    • Streams
    • Materialized views
    • Pipes that reference an External (named) stage

    Cloning will not clone internal named stages or external tables.


Author

  • Darko Monzio Compagnoni

    Before becoming an analytics engineer, I worked in marketing, communications, customer support, and hospitality. I noticed how each of these fields, in its own way, benefits from decisions backed by data. Which fields don’t, after all? After spotting this pattern, I decided to retrain as a self-taught data analyst and then completed the Nimbus Intelligence Academy program, graduating as an Analytics Engineer with certifications in Snowflake, dbt, and Alteryx. I'm now equipped to bring my unique perspective to any data-driven team.

