Let’s wrap this week up!
Last week was Python project week for the Nimbus Intelligence Academy’s Analytics Engineers. The goal of the project was to collect data from an API (at least one!), store it in Snowflake, create an Entity Relationship Diagram, and then run a Python program with the Snowflake data.
In the next paragraphs, we will analyze what was needed to work with Python, Snowflake and APIs in order to get to the finish line.
As first: choose an API
The first, pivotal step is choosing the right Application Programming Interface (API). By connecting to an API, you can gather data; after that, you can store and display the gathered data and create your database.
- Choose an API: It is better to choose a Representational State Transfer (REST) API. Why? Because REST APIs focus on end-user readability and ease of consumption. A user who looks for information can make a call or request through a Hypertext Transfer Protocol (HTTP) client and, in this way, search for the needed information.
- Gather data from the API: If the end user is authorized to receive the information, the client receives it over HTTP, usually as XML or JSON. The latter is a lightweight data-interchange format that is easy for humans to read and write. It is based on two main data structures, objects and arrays, and is commonly used as the data format for web APIs, as the small snippet below illustrates.
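For example, here is a tiny, made-up JSON document and how Python parses it: JSON objects become dictionaries and JSON arrays become lists.

import json

# A small, made-up JSON document: an object that contains an array of objects
raw = '{"exercises": [{"name": "Squat", "muscle": "quadriceps"}]}'
data = json.loads(raw)          # the JSON object becomes a Python dict
first = data["exercises"][0]    # the JSON array becomes a Python list
print(first["name"])            # -> Squat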
For instance, let’s take the following API https://api.api-ninjas.com/v1/exercises?muscle={muscle} and see how to make requests to it using Python. Small tip: carefully reading the API documentation significantly reduces the time it takes to learn how to make requests.
def get_exercises_api():
    import requests
    import pandas as pd

    # List with all the muscles from the API documentation
    list_muscles = ['abdominals', 'abductors', 'adductors', 'biceps', 'calves',
                    'chest', 'forearms', 'glutes', 'hamstrings', 'lats', 'lower_back',
                    'middle_back', 'neck', 'quadriceps', 'traps', 'triceps']
    all_muscle_exercises = []
    for muscle in list_muscles:
        api_url = f'https://api.api-ninjas.com/v1/exercises?muscle={muscle}'
        headers = {'X-Api-Key': '6+Db18QRpOUqcFWFoXpuxw==1dppKFAgQc8ljPKV'}
        # Make the API request
        response = requests.get(api_url, headers=headers)
        # Check for errors
        if response.status_code == 200:
            # Extract the list of exercises from the response and collect them
            exercises = response.json()
            all_muscle_exercises.extend(exercises)
        else:
            print(f"Request for {muscle} failed: {response.status_code}")
    # Return everything as a single pandas DataFrame (one row per exercise)
    return pd.DataFrame(all_muscle_exercises)
- import requests: we use the Requests library, the de facto standard for making HTTP requests in Python.
- import pandas as pd: pandas is a Python library for data manipulation and analysis. For an overview, see https://github.com/pandas-dev/pandas.
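As a quick sanity check, you can call the function and inspect the result with pandas. This is just a sketch: it assumes get_exercises_api() returns the combined DataFrame as written above, and that the API exposes a 'muscle' field as described in its documentation.

# Gather the exercises and take a first look at the data
exercises_df = get_exercises_api()
print(exercises_df.shape)                     # number of exercises and columns
print(exercises_df['muscle'].value_counts())  # exercises per muscle group (field name from the API docs)
print(exercises_df.head())                    # first few rows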
As second: Snowflake Connector for Python
Having chosen the API, we need to link Python and Snowflake. We can use the Snowflake Connector for Python, which provides an interface for developing Python applications that can connect to Snowflake and perform all standard operations.
- Connecting to Snowflake – install the connector. You can do this both in Google Colab and in plain Python. For the first option, we suggest you read https://nimbusintelligence.com/2023/03/import-python-scripts-in-google-colab/: in a few lines, it lays out a precise and clear strategy for importing Python scripts in Google Colab. If you want to use Snowflake in Python, be careful to use a pandas-compatible version of the Snowflake Connector for Python. Do you want a clear blog about it? Read https://nimbusintelligence.com/2023/03/using-snowflake-data-in-python/.
- Snowflake Connector for Python. It establishes a connection to your own Snowflake instance, in which you create a new database and tables for the data. The code then has to store the data from the pandas DataFrame into the Snowflake tables, as sketched after the connection example below.
pip install "snowflake-connector-python[pandas]"
def check_exercises(df, U_EMAIL):
    import snowflake.connector
    import os
    import pandas as pd

    # Set up Snowflake connection
    conn = snowflake.connector.connect(
        user=os.environ["SNOWSQL_USR"],
        password=os.environ["SNOWSQL_PWD"],
        account=os.environ["SNOWSQL_ACC"],
        warehouse=os.environ["SNOWSQL_WH"],
        database=os.environ["SNOWSQL_DB"],
        schema=os.environ["SNOWSQL_SCH"])
    cur = conn.cursor()

    # Fetch the stored exercises (illustrative query; adapt it to your own schema)
    cur.execute("SELECT * FROM EXERCISE")
    results = cur.fetchall()

    # Close Snowflake connection
    cur.close()
    conn.close()

    # Convert results to dataframe
    df = pd.DataFrame(results, columns=["E_ID", "E_CREATION", "E_NAME", "E_TYPE",
                                        "E_MUSCLE", "E_EQUIPMENT", "E_DIFFICULTY",
                                        "E_INSTRUCTIONS", "E_POINTS"])
    return df
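To actually store the gathered data in Snowflake, the connector ships with a pandas helper, write_pandas. Below is a minimal sketch, assuming an open connection conn like the one above, the exercises_df DataFrame from the API step, and an existing EXERCISE table whose columns match the DataFrame (the table name is an assumption; a creation sketch follows the ERD note below).

from snowflake.connector.pandas_tools import write_pandas

# Load the DataFrame into the (already existing) EXERCISE table
success, nchunks, nrows, _ = write_pandas(conn, exercises_df, 'EXERCISE')
print(f"Loaded {nrows} rows into EXERCISE, success={success}")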
- Now, we are ready to build the database in Snowflake. But first, it is better to create the ERD! Here is a very good tool for that: https://www.lucidchart.com/pages/. Once the ERD is ready, the tables can be created from Python, as sketched below.
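The sketch below is hypothetical: it only uses the columns that appear later in this post, the data types are assumptions, and your ERD will likely contain more detail. It reuses an open connection conn as in the snippet above.

# Create the two tables used in this post (illustrative column types)
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS EXERCISE (
        E_ID INTEGER,
        E_CREATION TIMESTAMP,
        E_NAME VARCHAR,
        E_TYPE VARCHAR,
        E_MUSCLE VARCHAR,
        E_EQUIPMENT VARCHAR,
        E_DIFFICULTY VARCHAR,
        E_INSTRUCTIONS VARCHAR,
        E_POINTS INTEGER
    )
""")
cur.execute("""
    CREATE TABLE IF NOT EXISTS USER_EXERCISE_LOG (
        U_EMAIL VARCHAR,
        E_ID INTEGER
    )
""")
cur.close()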
As third: store data on Snowflake
At this point, you can “manipulate the data to extract relevant information”. Python can be used to extract and transform the data retrieved from the API in order to display the specific information we are interested in. In the following example, we want to know the ‘healthy points’ scored by the app’s user by doing sport and exercise.
SELECT U_EMAIL, COUNT(*) as total_ex, SUM(E_POINTS) as total_points
FROM USER_EXERCISE_LOG
LEFT JOIN EXERCISE ON USER_EXERCISE_LOG.E_ID = EXERCISE.E_ID
WHERE U_EMAIL = 'test@person.com'
GROUP BY U_EMAIL;
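To run this query from Python and get the result back as a DataFrame, one option is the cursor’s fetch_pandas_all() helper. This is a sketch that assumes the pandas-compatible connector and the open connection conn from before.

# Execute the aggregation query and fetch the result as a pandas DataFrame
query = """
    SELECT U_EMAIL, COUNT(*) AS total_ex, SUM(E_POINTS) AS total_points
    FROM USER_EXERCISE_LOG
    LEFT JOIN EXERCISE ON USER_EXERCISE_LOG.E_ID = EXERCISE.E_ID
    WHERE U_EMAIL = 'test@person.com'
    GROUP BY U_EMAIL
"""
cur = conn.cursor()
cur.execute(query)
points_df = cur.fetch_pandas_all()   # requires the pandas extra of the connector
print(points_df)
cur.close()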
Last, but not least: writing documentation
At this point, the goal of the project is reached! However, it is time to explain it to the audience. At the end of the day, your project will be handed over to someone else, and it is good practice to make the documentation as clear as possible. In https://nimbusintelligence.com/2023/03/5-best-practices-for-documenting-your-python-snowflake-project/, it is clearly stated that “documentation not only helps developers and readers of the code understand the project but also helps stakeholders and users to make informed decisions”. If you have no idea how to write good documentation, we invite you to read the blog! There are 5 precious tips that you cannot fail to follow!