Skip to main content
Data LoadJSONSnowflakeSQLUnstructured Data

Killing JSON: Using Flatten to Load JSON Data

By November 17, 2023No Comments

Welcome back to Killing JSON, the series in which I channel my frustration with loading data into teaching you something. Last week, we covered the ins and outs of the flatten function. This week, I’ll cover how to use it to load JSON data with the flatten function.

Setup

To set up this tutorial, I’m going to put some JSON data into a variant column (a column that can story any data type) in a table in Snowflake.

-- Create an external stage
create or replace stage my_stage
  URL = 's3://snowflake-docs/tutorials/json';

-- Create a table with a variant column
create or replace table data (
  col variant
);

-- Copy into this table
copy into data
  from @my_stage/server/2.6/2016/07/15/15
  file_format = (type = json);

The data looks like this (use select * from data):

{
    "device_type": "server",
    "events": [
      {
        "f": 83,
        "rv": "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19",
        "t": 1437560931139,
        "v": {
          "ACHZ": 42869,
          "ACV": 709489,
          "DCA": 232,
          "DCV": 62287,
          "ENJR": 2599,
          "ERRS": 205,
          "MXEC": 487,
          "TMPI": 9
        },
        "vd": 54,
        "z": 1437644222811
      },
      {
        "f": 1000083,
        "rv": "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22",
        "t": 1437036965027,
        "v": {
          "ACHZ": 6953,
          "ACV": 346795,
          "DCA": 250,
          "DCV": 46066,
          "ENJR": 9033,
          "ERRS": 615,
          "MXEC": 0,
          "TMPI": 112
        },
        "vd": 626,
        "z": 1437660796958
      }
    ],
    "version": 2.6
    }

Querying Data

The data inside this variant column can already be navigated and accessed using valid keys:

-- Navigate using <column_name>:<key>:<key>
-- output: "server"
select col:device_type from data;

Retrieving Nested Fields

To retrieve nested fields, we can use the flatten function. But we will need to use a lateral join (using the lateral keyword) to query the flattened results. Let’s see what happens if we feed the column containing the variant data to the flatten function:

select *
from data, 
lateral flatten(input => col) ;
COLSEQKEYPATHINDEXVALUETHIS
{“device_type”: “server”, “events”: [{“f”: 83, …1device_typedevice_typenullserver{“device_type”: “server”,
“events”: [{“f”: 83, …
{“device_type”: “server”, “events”: [{“f”: 83, …1eventseventsnull[{“f”: 83, …{“device_type”: “server”, “events”: [{“f”: 83, …
{“device_type”: “server”, “events”: [{“f”: 83, …1versionversionnull2.6{“device_type”: “server”, “events”: [{“f”: 83, …
Joining raw data with flattened data. In this case, THIS is the entirety of the JSON data.

Very cool! The results of the flatten have been attached to our original data. (Note that each row in the COL column contains the entirety of the JSON data.)

We can also choose to call flatten on a specific nested field. The “events” column contains a lot of fields:

-- flatten on a specific field
select * from data, 
lateral flatten( input=> col:events );
COLSEQKEYPATHINDEXVALUETHIS
{“device_type”: “server”, “events”: …1null[0]0{“f”: 83, “rv”: “15219.6…[{“f”: 83, “rv”: “15219.6…
{“device_type”: “server”, “events”: …1null[1]1{“f”: 83, “rv”: “15219.6…[{“f”: 83, “rv”: “15219.6…
Flattening the “events” field.

We can now access the data within the “value” column by using the syntax for navigating json data:

-- get the field "f" within "events" and cast it to the number type
select
  value:f::number
 from data
 , lateral flatten( INPUT => col:events );
VALUE:F::NUMBER
83
1000083

Creating a Relational Table from JSON

To create a relational table from flattened JSON data, we first need to find the fields we would like to include in the final table.

select 
  col:device_type::string,
  col:version::string,
  col:f::number,
from data
, lateral flatten( input=> col:events );
COL:DEVICE_TYPE::STRINGCOL:VERSION::STRINGVALUE:F::NUMBER
server2.683
server2.61000083

Then, we can create the table from this select statement.

create table as flattened_data AS
  select
    data:device_type::string as device_type,
    data:version::string     as version,
    value:f::number          as f
  from
    raw_source,
    lateral flatten( input => col:events );

Auteur

  • Edoardo Draetta

    Dedicated Analytics Engineer with a strong background in physics, bringing a unique blend of analytical rigor and scientific acumen to the world of data-driven decision-making. With a passion for unraveling complex phenomena and a keen eye for patterns, my solid foundation in physics equips me with a deep understanding of quantitative analysis, statistical modeling, and computational techniques, enabling me to develop innovative solutions to intricate problems.

Edoardo Draetta

Dedicated Analytics Engineer with a strong background in physics, bringing a unique blend of analytical rigor and scientific acumen to the world of data-driven decision-making. With a passion for unraveling complex phenomena and a keen eye for patterns, my solid foundation in physics equips me with a deep understanding of quantitative analysis, statistical modeling, and computational techniques, enabling me to develop innovative solutions to intricate problems.

Leave a Reply