Snowflake Massive Scale Testing — Part 1
WARNING… READ AT YOUR OWN RISK! USE AT YOUR OWN RISK!!
I DOUBLE DOG DARE YOU!
IT CANT BE DONE!
SNOWFLAKE CANT DO IT!
YOU CANT HANDLE THE TRUTH! (Jack’s best movie)
So how many times in my life have I been challenged? It can’t be done. The product won’t scale. Concurrency will kill the project. Or this project scale will crush our budget. Do we have a plan B?
Why do I get these projects? CAUSE I F’ING LOVE A CHALLENGE. GO BIG OR STAY HOME! If it was easy go find someone else to do it. John Thuma loves hard problems. So let’s dig in!
NOTE: THIS IS PART 1… PART 2 WILL SHOW YOU HOW TO BUILD THE DATA IMPORTATION SCRIPTS. THEN THE REAL FUN BEGINS.
THE PROBLEM: I had a customer come at me with a very interesting use case. I have 2,400 remote offices…
Each office has a SQL Server with 80 tables. I am replicating the data to S3 every 5 minutes. I need for Snowflake to pick up that data and take it to a BRONZE database in my data mesh. Eventually we will curate and consolidate in a SILVER tier and then make the dat analytically ready in a GOLD database. From start to finish the processes can’t last more than 10 minutes. Source to Bronze to Silver to Gold all data must be imported into Snowflake in 10 Minutes — Across 2400 sites.
Can I get access to your schema? NO. Can I make one up? YES.
Can I consolidate all data from all sources into one BRONZE database/Schema? NO. Each Schema must be unique for each source. NICE! No problem.
Below is the code to build 2400 schemas representing 2400 SQL Server database schemas with 80 tables in it. I wrote the code to use SNOWSCRIPT and it is very flexible. Do you need 4800 schemas? No problem. Do you need 900 tables? No problem. I even provided the ability to replicate virtual data warehouses.
If I needed to build 80 tables in 2400 schemas what would I do? First I would use ZERO COPY CLONE! This is a fast way to copy Databases/Schemas/and tables without actually making a real copy. But in this case it is real. Because I built a MODEL SCHEMA with MODEL tables. I replicated the tables by using a custom stored procedure I wrote. Once done I replicated the schemas using another stored procedure. The calls are below and the complete code for the stored procedures is below too. You could even build your entire real database tables in the MODEL schema and go from there. Below are all the calls to the stored procedures. The code for the stored procedures is below.
SO IF I NEEDED 80 tables:
call MAKE_TABLES(1,80);
show tables;
SO IF I NEEDED 2400 schemas:
call MAKE_SCHEMAS(1,2400);
show schemas;
SO IF I NEEDED 500 VIRTUAL WAREHOUSES:
call MAKE_WAREHOUSE(1,2400,’B’ );
show warehouses;
I even wrote another stored procedure to drop the warehouses.
The image below shows the databases and some of the schemas.
THE CODE AND STORED PROCEDURES:
BUILD THE OBJECTS:
— DROP DATABASE SCALE_BRONZE;
— DROP DATABASE SCALE_SILVER;
— DROP DATABASE SCALE_GOLD;
— CREATE MODEL DATABASES
CREATE DATABASE SCALE_BRONZE;
CREATE DATABASE SCALE_SILVER;
CREATE DATABASE SCALE_GOLD;
CREATE DATABASE SCALE_MANAGEMENT;
— CREATE SCALE_MANAGEMENT DATABASE/SCHEMA/LOGGING TABLES
USE DATABASE SCALE_MANAGEMENT;
CREATE SCHEMA LOG;
USE SCHEMA LOG;
— BUILD LOG TABLE
CREATE OR REPLACE TABLE “SCALE_MANAGEMENT”.”LOG”.”EVENT_LOG”
(
EVENT_ID VARCHAR
,EVENT_NAME VARCHAR
,EPOCH INTEGER
,EVENT_STATE VARCHAR
,TRIP INTEGER
,SCHEMA_COUNT INTEGER
,TABLE_COUNT INTEGER
,FILE_MODEL INTEGER
,TRIP_COUNT INTEGER
,DB_NAME VARCHAR
,SCHEMA_NAME VARCHAR
,TABLE_NAME VARCHAR
,WAREHOUSE_NAME VARCHAR
,NOTES VARCHAR
) COMMENT = ‘SCALE LOG EVENTS TABLE’;
CREATE SCHEMA CONSOL;
CREATE OR REPLACE TABLE “SCALE_MANAGEMENT”.”CONSOL”.”CONSOL_LOAD”
(
BLOB_FILE_NAME VARCHAR
,BLOB_FILE_ROW_NBR VARCHAR
,URL VARCHAR
,GAME_TYPE VARCHAR
,LOCATION VARCHAR
,DATE VARCHAR
,TIME VARCHAR
,WINNING_TEAM VARCHAR
,QUARTER VARCHAR
,SECONDS_LEFT VARCHAR
,AWAY_TEAM VARCHAR
,AWAY_PLAY VARCHAR
,AWAY_SCORE VARCHAR
,HOME_TEAM VARCHAR
,HOME_PLAY VARCHAR
,HOME_SCORE VARCHAR
,SHOOTER VARCHAR
,SHOT_TYPE VARCHAR
,SHOT_OUTCOME VARCHAR
,SHOT_DISTANCE VARCHAR
,ASSISTER VARCHAR
,BLOCKER VARCHAR
,FOUL_TYPE VARCHAR
,FOULER VARCHAR
,FOULED VARCHAR
,REGBOUNDER VARCHAR
,REBOUND_TYPE VARCHAR
,VIOLATION_PLAYER VARCHAR
,VIOLATION_TYPE VARCHAR
,TIMEOUT_TEAM VARCHAR
,FREETHROW_SHOOTER VARCHAR
,FREETHROW_OUTCOME VARCHAR
,FREETHROW_NUMBER VARCHAR
,ENTER_GAME VARCHAR
,LEAVE_GAME VARCHAR
,TURNOVER_PLAYER VARCHAR
,TURNOVER_TYPE VARCHAR
,TURNOVER_CAUSE VARCHAR
,TURNOVER_CAUSE_PLAYER VARCHAR
,JUMPBALL_AWAY_PLAYER VARCHAR
,JUMPBALL_HOME_PLAYER VARCHAR
,JUMPBALL_POSSESSION VARCHAR
) COMMENT = ‘CONSOLIDATION TABLE’;
USE DATABASE SCALE_BRONZE;
CREATE SCHEMA MODEL;
USE SCHEMA MODEL;
CREATE OR REPLACE TABLE “SCALE_BRONZE”.”MODEL”.”MODEL”
(
BLOB_FILE_NAME VARCHAR
,BLOB_FILE_ROW_NBR VARCHAR
,URL VARCHAR
,GAME_TYPE VARCHAR
,LOCATION VARCHAR
,DATE VARCHAR
,TIME VARCHAR
,WINNING_TEAM VARCHAR
,QUARTER VARCHAR
,SECONDS_LEFT VARCHAR
,AWAY_TEAM VARCHAR
,AWAY_PLAY VARCHAR
,AWAY_SCORE VARCHAR
,HOME_TEAM VARCHAR
,HOME_PLAY VARCHAR
,HOME_SCORE VARCHAR
,SHOOTER VARCHAR
,SHOT_TYPE VARCHAR
,SHOT_OUTCOME VARCHAR
,SHOT_DISTANCE VARCHAR
,ASSISTER VARCHAR
,BLOCKER VARCHAR
,FOUL_TYPE VARCHAR
,FOULER VARCHAR
,FOULED VARCHAR
,REGBOUNDER VARCHAR
,REBOUND_TYPE VARCHAR
,VIOLATION_PLAYER VARCHAR
,VIOLATION_TYPE VARCHAR
,TIMEOUT_TEAM VARCHAR
,FREETHROW_SHOOTER VARCHAR
,FREETHROW_OUTCOME VARCHAR
,FREETHROW_NUMBER VARCHAR
,ENTER_GAME VARCHAR
,LEAVE_GAME VARCHAR
,TURNOVER_PLAYER VARCHAR
,TURNOVER_TYPE VARCHAR
,TURNOVER_CAUSE VARCHAR
,TURNOVER_CAUSE_PLAYER VARCHAR
,JUMPBALL_AWAY_PLAYER VARCHAR
,JUMPBALL_HOME_PLAYER VARCHAR
,JUMPBALL_POSSESSION VARCHAR
) COMMENT = ‘MODEL TABLE’;
show tables;
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— NEED TO BUILD A STORED PROC THAT WILL REPLICATE MODEL FOR NUMBER OF TABLES
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
create or replace procedure MAKE_TABLES(SEED INTEGER, MAXTABLE INTEGER)
returns varchar
language sql
as
declare
t_name VARCHAR ;
begin
loop
t_name := ‘SCALE_BRONZE.MODEL.T’ || SEED::VARCHAR;
CREATE OR REPLACE TABLE identifier(:t_name) CLONE SCALE_BRONZE.MODEL.MODEL;
if (SEED > MAXTABLE) then
break;
end if;
SEED := SEED + 1;
end loop;
return ‘DONE’;
end;
call MAKE_TABLES(1,80);
show tables;
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— NEED TO BUILD A STORED PROC THAT WILL REPLICATE MODEL SCHEMA FOR A COUNT OF SCHEMA NEEDS
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
create or replace procedure MAKE_SCHEMAS(SEED INTEGER, MAXSCHEMA INTEGER)
returns varchar
language sql
as
declare
s_name VARCHAR ;
begin
loop
s_name := ‘S’ || SEED::VARCHAR;
CREATE OR REPLACE SCHEMA identifier(:s_name) CLONE SCALE_BRONZE.MODEL;
if (SEED > MAXSCHEMA) then
break;
end if;
SEED := SEED + 1;
end loop;
return ‘DONE’;
end;
call MAKE_SCHEMAS(1,2400);
show schemas;
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— NEED TO BUILD A STORED PROC THAT WILL AUTO CREATE A BUNCH OF WAREHOUSES
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
create or replace procedure MAKE_WAREHOUSE(SEED INTEGER, SMAX INTEGER, OTYPE VARCHAR)
returns varchar
language sql
as
declare
s_name VARCHAR;
begin
loop
s_name := ‘S’ || OTYPE::VARCHAR || ‘_’ || SEED::VARCHAR;
— CREATE OR REPLACE SCHEMA identifier(:s_name) CLONE SCALE_BRONZE.MODEL;
CREATE OR REPLACE WAREHOUSE identifier(:s_name) warehouse_size = ‘XSMALL’ initially_suspended=true AUTO_SUSPEND = 90;
if (SEED >= SMAX) then
break;
end if;
SEED := SEED + 1;
end loop;
return ‘DONE’;
end;
call MAKE_WAREHOUSE(1,2400,’B’ );
show warehouses;
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— NEED TO BUILD A STORED PROC THAT WILL DROP MODEL ALL WAREHOUSES FOR A COUNT OF SCHEMA NEEDS
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
create or replace procedure DROP_WAREHOUSE(SEED INTEGER, SMAX INTEGER, postFix VARCHAR)
returns varchar
language sql
as
declare
s_name VARCHAR;
begin
loop
s_name := ‘S’ || postFix || SEED::VARCHAR;
— CREATE OR REPLACE SCHEMA identifier(:s_name) CLONE SCALE_BRONZE.MODEL;
DROP WAREHOUSE identifier(:s_name);
if (SEED >= SMAX) then
break;
end if;
SEED := SEED + 1;
end loop;
return ‘DONE’;
end;
call DROP_WAREHOUSE(1,500,’B_’);
show warehouses;