As my colleague Gerard mentioned in his blog on features of Snowflake, warehouse elasticity is an important part of Snowflake. Being able to increase computing power when needed makes our job both easier and faster. Not all scaling options are the same, however. In this blog I will talk about warehouse elasticity and clarify the distinction between scaling up and scaling out.
Refresher: What is a warehouse?
A Snowflake virtual warehouse (not to be confused with a data warehouse) is a cluster of compute resources. Queries are executed on a warehouse, and the time the warehouse is active is paid for in credits, Snowflake's proprietary currency. Splitting work across multiple warehouses, combined with resource monitors, makes it easy to keep track of resource usage throughout the company.
Warehouse sizes and scaling up
Each virtual warehouse you create has a designated 'size', referring to the amount of computing resources it has available. Larger warehouses can execute queries faster, at the cost of extra credits for every hour they are active.
Where it gets interesting is that you pay credits for the time a warehouse is active, and larger warehouses need less time to process queries. The total cost of moving to a larger warehouse therefore does not have to be higher. Say that my query takes 2 hours to complete on an extra small warehouse at 1 credit per hour, costing a total of 2 credits. On a warehouse one size larger the query takes only 1 hour. That hour is twice as expensive at 2 credits per hour, but my final cost is still 2 credits, and I have saved myself an hour at no additional cost.
So why not always take the largest warehouse with the most computing power? The drawback is that you always pay for a minimum of one minute each time a warehouse is activated. A larger warehouse may cut my 10 second query down to 5 seconds, but I am billed for a full minute either way, and a minute on the larger warehouse costs twice as much. Scaling up to larger warehouses is therefore viable mainly for large queries that take more than one minute.
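To make the arithmetic concrete, here is a minimal sketch in Python. The rates for the extra small and small warehouse come from the example above; the rates for the larger sizes and the function name `query_cost` are my own illustrative assumptions, as is the idea that runtime halves with each size step (real queries do not always scale that neatly).

```python
# Credits billed per hour a warehouse is active. XSMALL and SMALL match the
# example in the text; the continued doubling is an assumption for illustration.
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8}


def query_cost(size: str, runtime_hours: float) -> float:
    """Credits spent keeping a warehouse of `size` active for `runtime_hours`."""
    return CREDITS_PER_HOUR[size] * runtime_hours


# Same workload, assumed to take half as long on the next size up.
xs_cost = query_cost("XSMALL", 2.0)  # 2 hours at 1 credit per hour
s_cost = query_cost("SMALL", 1.0)    # 1 hour at 2 credits per hour

print(f"extra small: {xs_cost} credits, small: {s_cost} credits")
```

Both runs come out at the same number of credits, which is the whole point: as long as the runtime really does shrink in proportion, the larger warehouse buys time without costing more.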
Multi-cluster warehouses and scaling out
A different kind of scaling becomes relevant when dealing with a large number of smaller queries. If a warehouse receives more queries from one or more users than it can run at the same time, the excess queries end up in a queue. People submitting the queries then have to wait before their requests are even picked up.
Multi-cluster warehouses solve this by temporarily adding extra clusters, in effect copies of the warehouse, which divide the queued queries among themselves. Each cluster has the same size as the original warehouse, at the same credit cost. This way a large number of queries can be handled simultaneously instead of waiting in line. While utilizing warehouse elasticity to increase the warehouse size is known as scaling up, enabling multi-cluster warehouses is known as scaling out.
A very big advantage of multi-cluster warehouses in Snowflake is that the scaling can be done automatically. An auto-scaled warehouse only starts additional clusters when queries begin to queue, and returns to a single cluster once the load drops. As such, there is no real reason not to have it enabled as a standard, though be aware that you will of course be paying for the additional clusters while they are running.
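To get a feel for what scaling out does to the bill, here is a toy model in Python. It is not Snowflake's actual scaling logic, which decides based on queuing and its own policy. The parameter `queries_per_cluster` (how many queries one cluster is assumed to run at once) and the function names are hypothetical, chosen only for illustration.

```python
import math


def clusters_needed(concurrent_queries: int, queries_per_cluster: int,
                    min_clusters: int = 1, max_clusters: int = 3) -> int:
    """Clusters a toy auto-scaler would run so that no query has to queue,
    capped between the configured minimum and maximum."""
    wanted = math.ceil(concurrent_queries / queries_per_cluster)
    return max(min_clusters, min(wanted, max_clusters))


def hourly_credits(clusters: int, credits_per_hour_per_cluster: float) -> float:
    """Every running cluster is billed at the rate of the warehouse size."""
    return clusters * credits_per_hour_per_cluster


# Assume one cluster handles 8 queries at a time and costs 2 credits per hour.
for load in (5, 20, 100):
    n = clusters_needed(load, queries_per_cluster=8)
    print(f"{load} queries -> {n} cluster(s), {hourly_credits(n, 2)} credits/hour")
```

With a light load the warehouse stays at one cluster and costs the same as a regular warehouse. Under heavy load it grows, but never past the maximum, so queries beyond that capacity would still queue.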