Building Data Apps with Streamlit on Snowflake: A Step-by-Step Guide

From Data to Delight

7 min readOct 3, 2023

Last month, I had the opportunity to attend the Snowflake Data Cloud World Tour, which introduced me to the fascinating world of Snowflake. During the event, I discovered numerous exciting insights, including the ability to create Data Apps within the Snowflake dashboard and effortlessly share them with the world. This inspired me to explore building Data Apps within the Snowflake platform. In this article, we will explore the process of creating Data Apps on the Snowflake data dashboard.

Article Overview

Introduction to Streamlit
Why Building with Snowflake ❄️
Creating Data App
Considerations and Limitations
Wrapping Up with a Conclusion

Introduction to Streamlit:

Imagine turning your Python scripts into shareable web apps effortlessly, all without needing any front-end expertise. That’s exactly what Streamlit does. In recent years, it has become the go-to tool for creating data apps in Python. It’s so popular that 80% of Fortune 50 companies use it, and it has captured the hearts of hundreds of thousands of Data Engineer worldwide.

Why Building with Snowflake ❄️

Streamlit has been a trusted name in the data engineering world, with a history of enabling data app development since 2018. It has consistently been the top choice for building Data Apps. Major industry giants such as BlackRock, Instacart, Lacework, and others were already harnessing its power to create customer-facing apps within Snowflake.

Recognising the growing industry demand, Snowflake made an exciting move in early 2022 — they acquired Streamlit. Their objective was crystal clear: democratise data apps by merging the best data application framework with a secure, large-scale deployment platform.

Creating Data Apps

Step 1: Signup

Create an Account with AWS in the US Oregon Region

Step:2 Logging In and Activating Your 30-Day Free Trial

Creating Data App

What are we relay gonna make in this article?

We will beworking on the development of a hit rate calculator for DAAS projects, utilizing the Snowflake Marketplace and Streamlit. Now, you might be wondering what exactly hit rate is. Allow me to explain.

Ever wondered how websites maintain their performance while accommodating relentless web crawlers? Hitrate — a strategic gem that helps Data Engineer determine the optimal number of hits a website can handle without overloading the server or risking being blocked. It’s the secret sauce for achieving a seamless web crawling experience.

What is the method to determine the maximum hit rate?

Similarweb provides valuable analytics and web traffic data for domains, enabling us to determine the daily or hourly request limits. This information assists us in planning the timing for URL crawling based on our specific needs.

To achieve this I found Similarweb’s Traffic Growth dataset on Snowflake’s marketplace. I added it to my database by simply clicking on ‘Get’.

How to calculate Hitrate?

In this dataset, while there are numerous columns, we’re primarily interested in just two, for hit rate calculations:

TOTAL_ESTIMATED_VISITS: The total number of website visitors.
TOTAL_PAGES_PER_VISIT: The average number of pages per visit.

Now, let’s talk about playing nice with web servers. Just like in real life, online politeness is crucial. Web servers have rules, both unspoken and explicit, that dictate how fast we can crawl their content. To stay on their good side, we should limit our requests to just 8–10% of their capacity per day. It’s all about being a courteous web guest!

hitrate = (TOTAL_ESTIMATED_VISITS*TOTAL_PAGES_PER_VISIT)/30*(Politeness Policy/100)

Alright, now that you’ve got the math behind the calculation down, let’s put it into action.

Set Up Your Streamlit Environment

To build your Streamlit application in Snowsight(Snowflake UI), follow these straightforward steps:

Click on Streamlit on the left navigation menu.
Click on + Streamlit App on the top right
Enter App name
Select Warehouse (X-Small) and App location (Database and Schema) where you’d like to create the Streamlit applicaton. If you encounter a ‘no database found’ error, simply navigate to the ‘Data’ section in the menu bar, create a new database, and then give it another try.
Click on Create

Ta-daaaa!

This is a sample application that allows you to make instant code changes on the left and test them on the right for your data app.

Since I already have Similarweb’s Traffic Growth dataset in my database, I wrote a SQL query to extract the required data from it.

SELECT CLEAN_SITE, YEAR, MONTH, MAIN_CATEGORY, TOTAL_ESTIMATED_VISITS, TOTAL_PAGES_PER_VISIT
        FROM datafeeds.GLOBAL_GROWTH
        WHERE 
        clean_site = 'amazon.com'
        AND year = 2022
        AND month

Let’s enable this query using Streamlit and design a stunning user interface for it:

import streamlit as st
import re
import json
import requests
from math import floor
from snowflake.snowpark.context import get_active_session
import pandas as pd
import plotly.express as px

session = get_active_session()

# Create a dictionary to map month numbers to names
month_mapping = {
    1: 'January',
    2: 'February',
    3: 'March',
    4: 'April',
    5: 'May',
    6: 'June',
    7: 'July',
    8: 'August',
    9: 'September',
    10: 'October',
    11: 'November',
    12: 'December'
}

def similarweb_data_query(domain):
    # Load CO2 emissions data
    snow_df = session.sql(f"""
        SELECT CLEAN_SITE, YEAR, MONTH, MAIN_CATEGORY, TOTAL_ESTIMATED_VISITS, TOTAL_PAGES_PER_VISIT
        FROM GLOBAL_PUBLIC__PRIVATE_COMPANIES_TRAFFIC_GROWTH.datafeeds.GLOBAL_GROWTH
        WHERE 
        clean_site = '{domain}'
        AND year = 2022
        AND month
    """)
    
    #view the data table
    #st.table(snow_df)
    
    df = snow_df.to_pandas()
    # Use map to replace month numbers with names
    df['MONTH'] = df['MONTH'].map(month_mapping)
    
    TOTAL_ESTIMATED_VISITS_avg = df['TOTAL_ESTIMATED_VISITS'].mean()
    #st.write(TOTAL_ESTIMATED_VISITS_avg)
    
    TOTAL_PAGES_PER_VISIT_avg = df['TOTAL_PAGES_PER_VISIT'].mean()

    # Create a bar plot using Plotly Express
    fig = px.bar(
        df,
        x="MONTH",
        y="TOTAL_ESTIMATED_VISITS",
        labels={"MONTH": "Month", "TOTAL_ESTIMATED_VISITS": "Total Estimated Visits"},
        title=f"Total Estimated Visits vs. Month for {domain}",
    )
    
    st.plotly_chart(fig)

    
    return TOTAL_ESTIMATED_VISITS_avg, TOTAL_PAGES_PER_VISIT_avg


def human_format(num):
    num = float('{:.3g}'.format(num))
    magnitude = 0
    while abs(num) >= 1000:
        magnitude += 1
        num /= 1000.0
    return '{}{}'.format('{:f}'.format(num).rstrip('0').rstrip('.'), ['', 'K', 'M', 'B', 'T'][magnitude])


def result(Hit_rate,domain):
    Hit_rate_readable = human_format(Hit_rate)

    col1,col2 = st.columns(2)
    col1.metric("Hitrate", str(Hit_rate))
    col2.metric("Compact Hitrate", str(Hit_rate_readable))
    st.info(f"Your DAAS project can make a maximum of {Hit_rate_readable} daily hits for {domain}")


    
magnitudeDict={0:'', 1:'Thousand', 2:'Million', 3:'Billion', 4:'Trillion', 5:'Quadrillion', 6:'Quintillion', 7:'Sextillion', 8:'Septillion', 9:'Octillion', 10:'Nonillion', 11:'Decillion'}
def simplify(num):
    num=floor(num)
    magnitude=0
    while num>=1000.0:
        magnitude+=1
        num=num/1000.0
    return(f'{floor(num*100.0)/100.0} {magnitudeDict[magnitude]}')


#hitrate from webpage
def hitrate(domain,pp):
    hit_data = similarweb_data_query(domain)
    visit = int(float(hit_data[-1]))
    PagePerVisit = float(hit_data[0])
    print(visit, PagePerVisit)
    hitrate = (visit*PagePerVisit)/30*(pp/100)
    return result(hitrate,domain)


st.title("❄️ Hit Rate Calculator ❄️")
#st.header("**Enter the domain name: **")
try:
    form_1 = st.form(key='my-form1')
    url = form_1.text_input("Enter the domain name:")
    pp = form_1.number_input("Politeness Policy %: ",10)
    submit = form_1.form_submit_button('Submit')

    if submit:
        st.header("**Result**")
        if pp>10:
            st.success(f"It's Recommended To Follow The Politeness Policy: 10%.")
        hit_web_page = hitrate(url,pp)
        
except IndexError:
    #submit = form.form_submit_button('Submit')
    st.success(f'Input more then a Trillion is out of scope!')
except Exception as e:
    #submit = form.form_submit_button('Submit')
    st.write(e)
    st.success(f'Please Enter Valid Input!!')

Ready Steady Go!

Awesome! We’ve just witnessed how Snowflake can be leveraged to build real-world data applications, and it’s never been this easy and efficient before.

Considerations and Limitations

Please note that Streamlit is currently in Preview when integrated with Snowflake. Preview features are primarily provided for evaluation and testing purposes and should not be used in production systems or with production data.
Streamlit apps in Snowflake have a 16MB data retrieval limit per query. Going beyond this limit triggers a ‘MessageSizeError: Data Size exceeds message limit’ error.
Many Streamlit features, such as file_uploader, cache, and download_button, are currently unsupported in Streamlit within Snowflake.Read more on this…

Wrapping Up

As our community continues to grow, so does the demand for Data Apps. With more companies embracing modern data solutions, the need for quick and insightful decision-making tools has never been greater. In a vast landscape of transactional data spread across the internet, data companies have long sought a way to efficiently leverage their data for optimization.

Enter Snowflake — the game-changer that makes it all possible. And when paired with Streamlit, it becomes a dynamic duo, enabling users to make decisions faster and access valuable insights effortlessly, all on a single platform. Snowflake and Streamlit are shaping the future of data-driven decision-making and product enhancement. It’s a thrilling journey toward smarter and more efficient choices!

Join me on this data-driven adventure!

Connect with me on LinkedIn 🚀