Welcome back to Killing JSON, the series in which I channel my frustration with loading data into teaching you something. Last week, we covered the ins and outs of the flatten function. This week, I’ll cover how to use it to load JSON data with the flatten function.
Setup
To set up this tutorial, I’m going to put some JSON data into a variant column (a column that can story any data type) in a table in Snowflake.
-- Create an external stage
create or replace stage my_stage
URL = 's3://snowflake-docs/tutorials/json';
-- Create a table with a variant column
create or replace table data (
col variant
);
-- Copy into this table
copy into data
from @my_stage/server/2.6/2016/07/15/15
file_format = (type = json);
The data looks like this (use select * from data
):
{
"device_type": "server",
"events": [
{
"f": 83,
"rv": "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19",
"t": 1437560931139,
"v": {
"ACHZ": 42869,
"ACV": 709489,
"DCA": 232,
"DCV": 62287,
"ENJR": 2599,
"ERRS": 205,
"MXEC": 487,
"TMPI": 9
},
"vd": 54,
"z": 1437644222811
},
{
"f": 1000083,
"rv": "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22",
"t": 1437036965027,
"v": {
"ACHZ": 6953,
"ACV": 346795,
"DCA": 250,
"DCV": 46066,
"ENJR": 9033,
"ERRS": 615,
"MXEC": 0,
"TMPI": 112
},
"vd": 626,
"z": 1437660796958
}
],
"version": 2.6
}
Querying Data
The data inside this variant column can already be navigated and accessed using valid keys:
-- Navigate using <column_name>:<key>:<key>
-- output: "server"
select col:device_type from data;
Retrieving Nested Fields
To retrieve nested fields, we can use the flatten function. But we will need to use a lateral join (using the lateral keyword) to query the flattened results. Let’s see what happens if we feed the column containing the variant data to the flatten function:
select *
from data,
lateral flatten(input => col) ;
COL | SEQ | KEY | PATH | INDEX | VALUE | THIS |
{«device_type»: «server», «events»: [{«f»: 83, … | 1 | device_type | device_type | null | server | {«device_type»: «server», «events»: [{«f»: 83, … |
{«device_type»: «server», «events»: [{«f»: 83, … | 1 | events | events | null | [{«f»: 83, … | {«device_type»: «server», «events»: [{«f»: 83, … |
{«device_type»: «server», «events»: [{«f»: 83, … | 1 | version | version | null | 2.6 | {«device_type»: «server», «events»: [{«f»: 83, … |
Very cool! The results of the flatten have been attached to our original data. (Note that each row in the COL column contains the entirety of the JSON data.)
We can also choose to call flatten on a specific nested field. The «events» column contains a lot of fields:
-- flatten on a specific field
select * from data,
lateral flatten( input=> col:events );
COL | SEQ | KEY | PATH | INDEX | VALUE | THIS |
{«device_type»: «server», «events»: … | 1 | null | [0] | 0 | {«f»: 83, «rv»: «15219.6… | [{«f»: 83, «rv»: «15219.6… |
{«device_type»: «server», «events»: … | 1 | null | [1] | 1 | {«f»: 83, «rv»: «15219.6… | [{«f»: 83, «rv»: «15219.6… |
We can now access the data within the «value» column by using the syntax for navigating json data:
-- get the field "f" within "events" and cast it to the number type
select
value:f::number
from data
, lateral flatten( INPUT => col:events );
VALUE:F::NUMBER |
83 |
1000083 |
Creating a Relational Table from JSON
To create a relational table from flattened JSON data, we first need to find the fields we would like to include in the final table.
select
col:device_type::string,
col:version::string,
col:f::number,
from data
, lateral flatten( input=> col:events );
COL:DEVICE_TYPE::STRING | COL:VERSION::STRING | VALUE:F::NUMBER |
server | 2.6 | 83 |
server | 2.6 | 1000083 |
Then, we can create the table from this select statement.
create table as flattened_data AS
select
data:device_type::string as device_type,
data:version::string as version,
value:f::number as f
from
raw_source,
lateral flatten( input => col:events );