Snowflake Massive Scale Testing — Part 2

John Thuma
5 min read · Oct 29, 2022

This is part 2 of the Snowflake Massive Scale Testing project I wrote about a month back, and I am eager for you to check it out. The first part of this work is in the link below:

A quick background:

A customer of mine, while I was working at Snowflake, wanted to test the scalability and cost of moving data from 5 SQL Server databases in each of 2,400 locations, with change data capture running from those locations every 5 minutes. They had doubts. Part one of this effort constructed the databases, schemas, tables, and other objects needed to test this scenario. This part of the scale test actually runs a BASH script to simulate the traffic. It uses SnowSQL, the command line interface tool that comes with Snowflake. I kept it pretty simple with respect to their data model: some NBA basketball data and a seed file with 20 records, used to populate 80 tables per schema. The cool thing is that I used stored procedures to build the objects, so you can test this against any number of objects you desire. The code below is 100% portable, so it can run in containers, on EC2 images, or on your desktop. I was able to load 48,000 tables at once, every 5 minutes, using multiple runs of the same code in different iTerm windows. Pretty sick.
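I did my parallel runs by hand in separate iTerm windows, but here is a minimal sketch of how you could kick off a couple of batches from a single terminal instead. The argument values mirror the sample call documented in the script header below; only the last argument (schemaCntSeed) changes per batch, so each run covers a different slice of schemas. Treat it as illustrative, not the exact commands I ran.

# Illustrative only: two batches of 100 schemas each, running in the background
./scaleControlSEED_v2.sh SCALE_MANAGEMENT 100 80 SB_ S T seed20.csv 1 /Users/jthuma/Desktop/BD_NRT/CODE/DATA 1 300 ACCOUNTADMIN CONSOL CONSOL_LOAD SB_1 jthuma sfsenorthamerica-az_demo170 1 &
./scaleControlSEED_v2.sh SCALE_MANAGEMENT 100 80 SB_ S T seed20.csv 1 /Users/jthuma/Desktop/BD_NRT/CODE/DATA 1 300 ACCOUNTADMIN CONSOL CONSOL_LOAD SB_1 jthuma sfsenorthamerica-az_demo170 101 &
wait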

I let it run for a few hours on a Sunday a few weeks ago, and not only did it scale and perform, it did so affordably. I think the whole day cost about the price of a fancy Starbucks coffee.

Anyway, the script is below, and I look forward to reading your feedback, suggestions, or general hate mail about why this is all ridiculous!

# AUTHOR: JOHN THUMA
# NOTES: This is a prototype. There are better ways to do this. Small files should be consolidated or streamed in instead of loaded with COPY
# Going to need a round-trip loop for the maximum number of iterations we want to make (4: four loads of 2,400 schemas that have 80 tables)
# Going to need to loop over schemas and then over tables, so 2,400 schemas and 80 tables per schema
# Master data directory containing MODEL files: /Users/jthuma/Desktop/BD-NRT/CODE/DATA
# Create a folder and Copy Seed MODEL file for each table in that directory and add edate and COLUMNS in each file for each table
# Get script ready to go: chmod +x scaleControl.sh
# Define input arguments for scaleControl.sh (Mandatory)
# -dbName Database we are going to populate (SCALE_BRONZE)
# -schemaCnt Schema loop control variable (number of schemas to load in this run, starting at schemaCntSeed) so we can do these in batches of 100
# -tableCnt (Table Loop Control Variable (1–80))
# -whPrefix Warehouse prefix ('SB_' for BRONZE)
# -schemaPrefix Schema prefix value ('S')
# -tablePrefix Table prefix value ('T')
# -fileModel The file we want to use to PUT & COPY (GOING TO COPY THIS FILE/RENAME/ADD COLUMNS)
# -sTrips The number of round trips I want to make (4) Outside loop to control the number of loads to simulate
# -sDir Seed directory
# -whScale A factor for rotating warehouses (tunable: the maximum number of active warehouses at any given time)
# -tSleep Number of seconds the system sleeps between trips; 300 is 5 minutes
# -lRole Security Role context to run snowsql
# -schemaCntSeed Start of the schema count so we can batch these up to run concurrently
#
# SAMPLE CALL
# chmod 777 scaleControlSEED_v2.sh
# RUNS 1 TRIP: TESTING for schemas 100 to 200 with 80 tables each.
# ./scaleControlSEED_v2.sh SCALE_MANAGEMENT 100 80 SB_ S T seed20.csv 1 /Users/jthuma/Desktop/BD_NRT/CODE/DATA 1 1 ACCOUNTADMIN CONSOL CONSOL_LOAD SB_1 jthuma sfsenorthamerica-az_demo170 99
# KEY LOCATION: /Users/jthuma/Desktop/BD-NRT/CODE/pkey

dbName=$1
schemaCnt=$2
tableCnt=$3
whPrefix=$4
schemaPrefix=$5
tablePrefix=$6
fileModel=$7
sTrips=$8
sDir=$9
whScale=${10}
tSleep=${11}
lRole=${12}
loadSchema=${13}
loadTable=${14}
loadWH=${15}
dbUser=${16}
conStr=${17}
schemaCntSeed=${18}

#DEFINE VARIABLES THIS WORKS
#edate="S1_T1_"$(date +%s)

#CREATE DIRECTORY FOR DATA

#CREATE COLUMNS THIS WORKS

#awk -v var=$edate 'BEGIN{ans=var} {print ans"," NR "," $0}' seed20.csv

echo START $(date)

#BEGIN TRIP LOOP
for ((i=1; i<=$sTrips; i++))
do

inTrip="TR"$i"_"
echo "TRIP: "$i
# TRIP LOOP
# DO STUFF TO PREPARE TRIP
# CREATE DIRECTORY FOR TRIP CONSOLIDATION
# MAKE DIRECTORY
tripDate=$(date +%s)
tripDirectory=$sDir"/"$inTrip$tripDate
mkdir -p $tripDirectory

cp $sDir"/"$fileModel $tripDirectory"/MODEL.csv2"

schemaTotal=$(($schemaCnt + $schemaCntSeed))

#BEGIN SCHEMA LOOP
for ((x=$schemaCntSeed; x<=$schemaTotal; x++))
do
aSchema=$schemaPrefix$x
echo "USE SCHEMA "$aSchema
# SCHEMA LOOP
# DO STUFF TO PREPARE FOR SCHEMAS



#BEGIN TABLE LOOP
for ((y=1; y<=$tableCnt; y++))
do
# TABLE LOOP
# DO STUFF TO PREPARE FOR TABLES

tDate=$(date +%s)
aTable=$tablePrefix$y
aSchemaTable=$inTrip$aSchema"_"$aTable"_"$tDate #THIS IS ALSO THE DIRECTORY NAME AND FILE NAMES

# COPY DATA FROM sDir (CHANGED TO SINGLE DIRECTORY — DIRECTORY STAYS CONSTANT) (MOVED THIS SO WE ONLY COPY ONCE)
#cp $sDir"/"$fileModel $tripDirectory"/"$aSchemaTable".csv2"

# ADD STATE FIELDS TO file
awk -v var=$aSchemaTable 'BEGIN{ans=var} {print ans"," NR "," $0}' $tripDirectory"/MODEL.csv2" >$tripDirectory"/"$aSchemaTable".csv"
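# (The awk above prepends the trip/schema/table/timestamp key plus the row number to every
# seed line, so a MODEL row like "a,b,c" becomes "TR1_S100_T1_<epoch>,1,a,b,c".)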

# WAREHOUSE ROTATION (WORKING ON THIS)######################################## (keep it simple for now)
# I WANT TO FORCE A SINGLE WAREHOUSE
aWHouse=$whPrefix$whScale
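# One possible rotation, just a sketch and not what this prototype does:
# aWHouse=$whPrefix$(( (x % whScale) + 1 ))   # spread schemas across whScale warehouses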


done #END TABLE LOOP
#sleep 1

done #END SCHEMA LOOP
# CONSOLIDATE FILES
cat $tripDirectory"/"*.csv > $tripDirectory"/merged_"$tripDate".csv"

# PUT FILES — BUILD THE FILE TO RUN IN SNOWSQL
# printf "USE ROLE $lRole; \nUSE DATABASE $dbName; \nUSE SCHEMA $loadSchema; \nUSE WAREHOUSE $loadWH; \nremove @%%CONSOL_LOAD; \nPUT file://$tripDirectory/merged_$tripDate.csv @%%$loadTable; \nCOPY INTO $loadTable \n" > $tripDirectory"/"$inTrip$tripDate".sql"
printf "USE ROLE $lRole; \nUSE DATABASE $dbName; \nUSE SCHEMA $loadSchema; \nUSE WAREHOUSE $loadWH; \nPUT file://$tripDirectory/merged_$tripDate.csv @%%$loadTable; \nCOPY INTO $loadTable \n" > $tripDirectory"/"$inTrip$tripDate".sql"
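# With the sample call above, the generated file looks roughly like this (<epoch> is the trip timestamp):
#   USE ROLE ACCOUNTADMIN;
#   USE DATABASE SCALE_MANAGEMENT;
#   USE SCHEMA CONSOL;
#   USE WAREHOUSE SB_1;
#   PUT file:///Users/jthuma/Desktop/BD_NRT/CODE/DATA/TR1_<epoch>/merged_<epoch>.csv @%CONSOL_LOAD;
#   COPY INTO CONSOL_LOAD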

#RUN THE BEAST (NEED TO THINK- STOPPED HERE!)
iObject=$tripDirectory"/"$inTrip$tripDate".sql"
echo $iObject
snowsql -a $conStr -u $dbUser -d $dbName --private-key-path /Users/jthuma/Desktop/BD_NRT/CODE/pkey/rsa_key.p8 -o friendly=false -o quiet=false -o output_format=html -f ${iObject} >> ./snowdb.html &

# BURST OUT USING STORED PROCEDURE

#SLEEP BETWEEN TRIPS (TURNED OFF FOR TESTING)
sleep $tSleep

done #END TRIP LOOP
echo FINISH $(date)
# CLEAN UP DIRECTORY (TURNED OFF FOR TESTING)
#rm -r "$sDir"/TR*
# MANUAL DELETE
# rm -r /Users/jthuma/Desktop/BD_NRT/CODE/DATA/TR*
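
If you want a quick sanity check after a run, a one-line row count against the consolidation table does the trick. This is a minimal sketch assuming the same connection values as the sample call in the script header (account, user, role, warehouse, key path, and the CONSOL.CONSOL_LOAD table); swap in your own.

# Hypothetical post-run check, not part of the script above
snowsql -a sfsenorthamerica-az_demo170 -u jthuma -r ACCOUNTADMIN -w SB_1 \
  --private-key-path /Users/jthuma/Desktop/BD_NRT/CODE/pkey/rsa_key.p8 \
  -o friendly=false \
  -q "SELECT COUNT(*) FROM SCALE_MANAGEMENT.CONSOL.CONSOL_LOAD;"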


John Thuma

Experienced Data and Analytics guru. 30 years of hands-on keyboard experience. Love hiking, writing, reading, and constant learning. All content is my opinion.