Killing JSON: Using Flatten to Load JSON Data

Welcome back to Killing JSON, the series in which I channel my frustration with loading data into teaching you something. Last week, we covered the ins and outs of the flatten function. This week, I’ll cover how to use it to load JSON data with the flatten function.

Setup

To set up this tutorial, I’m going to put some JSON data into a variant column (a column that can story any data type) in a table in Snowflake.

-- Create an external stage
create or replace stage my_stage
  URL = 's3://snowflake-docs/tutorials/json';

-- Create a table with a variant column
create or replace table data (
  col variant
);

-- Copy into this table
copy into data
  from @my_stage/server/2.6/2016/07/15/15
  file_format = (type = json);

The data looks like this (use select * from data):

{
    "device_type": "server",
    "events": [
      {
        "f": 83,
        "rv": "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19",
        "t": 1437560931139,
        "v": {
          "ACHZ": 42869,
          "ACV": 709489,
          "DCA": 232,
          "DCV": 62287,
          "ENJR": 2599,
          "ERRS": 205,
          "MXEC": 487,
          "TMPI": 9
        },
        "vd": 54,
        "z": 1437644222811
      },
      {
        "f": 1000083,
        "rv": "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22",
        "t": 1437036965027,
        "v": {
          "ACHZ": 6953,
          "ACV": 346795,
          "DCA": 250,
          "DCV": 46066,
          "ENJR": 9033,
          "ERRS": 615,
          "MXEC": 0,
          "TMPI": 112
        },
        "vd": 626,
        "z": 1437660796958
      }
    ],
    "version": 2.6
    }

Querying Data

The data inside this variant column can already be navigated and accessed using valid keys:

-- Navigate using <column_name>:<key>:<key>
-- output: "server"
select col:device_type from data;

Retrieving Nested Fields

To retrieve nested fields, we can use the flatten function. But we will need to use a lateral join (using the lateral keyword) to query the flattened results. Let’s see what happens if we feed the column containing the variant data to the flatten function:

select *
from data, 
lateral flatten(input => col) ;

COL	SEQ	KEY	PATH	INDEX	VALUE	THIS
{«device_type»: «server», «events»: [{«f»: 83, …	1	device_type	device_type	null	server	{«device_type»: «server», «events»: [{«f»: 83, …
{«device_type»: «server», «events»: [{«f»: 83, …	1	events	events	null	[{«f»: 83, …	{«device_type»: «server», «events»: [{«f»: 83, …
{«device_type»: «server», «events»: [{«f»: 83, …	1	version	version	null	2.6	{«device_type»: «server», «events»: [{«f»: 83, …

Joining raw data with flattened data. In this case, THIS is the entirety of the JSON data.

Very cool! The results of the flatten have been attached to our original data. (Note that each row in the COL column contains the entirety of the JSON data.)

We can also choose to call flatten on a specific nested field. The «events» column contains a lot of fields:

-- flatten on a specific field
select * from data, 
lateral flatten( input=> col:events );

COL	SEQ	KEY	PATH	INDEX	VALUE	THIS
{«device_type»: «server», «events»: …	1	null	[0]	0	{«f»: 83, «rv»: «15219.6…	[{«f»: 83, «rv»: «15219.6…
{«device_type»: «server», «events»: …	1	null	[1]	1	{«f»: 83, «rv»: «15219.6…	[{«f»: 83, «rv»: «15219.6…

Flattening the «events» field.

We can now access the data within the «value» column by using the syntax for navigating json data:

-- get the field "f" within "events" and cast it to the number type
select
  value:f::number
 from data
 , lateral flatten( INPUT => col:events );

VALUE:F::NUMBER

1000083

Creating a Relational Table from JSON

To create a relational table from flattened JSON data, we first need to find the fields we would like to include in the final table.

select 
  col:device_type::string,
  col:version::string,
  col:f::number,
from data
, lateral flatten( input=> col:events );

COL:DEVICE_TYPE::STRING	COL:VERSION::STRING	VALUE:F::NUMBER
server	2.6	83
server	2.6	1000083

Then, we can create the table from this select statement.

create table as flattened_data AS
  select
    data:device_type::string as device_type,
    data:version::string     as version,
    value:f::number          as f
  from
    raw_source,
    lateral flatten( input => col:events );