Snowflake or Databricks

John Thuma
Published in AWS Tip
4 min read · Oct 5, 2022

Should I go with Snowflake or Databricks? I hear this question almost every day, and it is an important one. It is like the truth between any two foes: there is one version from one perspective, another version from the opposing perspective, and then there is the truth. Let's dig in and break down each organization, and I will finish by giving my version of the truth.

First of all, why does my version matter? Because I have worked in the data industry for 30 years, every single day.

Preamble

Snowflake and Databricks are both fantastic organizations. Each has either invented or re-invented part of the data management industry. I will not disparage their technology, people, or processes. They do, however, compete ferociously against one another. In my view, Snowflake takes the higher road competitively, while Databricks is definitely more confrontational and aggressive. Both approaches are respectable as long as they are honest and ethical. I will leave that discovery up to you.

About Snowflake

Snowflake started out because its founders understood how users suffered with traditional relational OLAP solutions. That makes sense: they came from Oracle. They also understood how the cloud works. The founders didn't want to port an Oracle-like database over to the cloud as is, because that would not solve the problems the user base was experiencing. What were users suffering from? Scale, performance, concurrency, and the tons of expensive resources needed to keep the lights on! So they built Snowflake to solve these problems by taking all the good of a relational database platform and applying it to the cloud. The cloud makes it simple to spin up environments with elasticity for size or scale.

Who competes with Snowflake directly? All cloud-based OLAP databases, such as Redshift, Teradata, Oracle, Synapse, and Databricks. And yes, dare I say it, Cloudera. Snowflake is starting to blur the lines a bit with Iceberg (Data Lake), Snowpark (Data Science/Data Engineering), Data Sharing/Marketplace (Third-Party Data), and, coming soon, Unistore (OLTP). This is genuinely exciting, but is the Snowflake roadmap too wide? Maybe. I will say this about Snowflake: a few quarters ago it released innovations that hurt its own revenue. Snowflake improved its on-disk compression and made its compute more efficient. It is estimated that this cost Snowflake 9% of its revenue going forward.

About Databricks

Databricks was born out of frustration with two Apache projects, Hadoop and Spark, and with the commercial Hadoop vendors, of which only one is left: Cloudera. Databricks is the commercial entity behind Apache Spark. Spark itself grew out of frustration with Hadoop, which does not do well with concurrency and has huge latency issues. Hadoop MapReduce is dead and was replaced with Apache Spark to remedy these limitations. Apache Spark has problems of its own, and thus Databricks was born to take Spark to the enterprise.

Databricks is a pure development environment for data engineering, data streaming, and data science. It is great for micro-processes. It requires next-level skills to develop, support, and maintain, so Databricks is definitely not for everyone. That kind of talent is hard to find and hard to keep in the barn. It's not that you have to be a Scala or PySpark programmer; it is the very nature of the platform, which is highly technical. Why? Databricks requires a lot of tuning based on the use case, so you have to know what you are doing. It also takes a lot longer to get a solution to market. People with Databricks skills cost about 30% more than those for SQL-based platforms, and it has been my experience that it takes 50% longer to get a solution to market or to introduce change to an existing solution.
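To see what those two figures mean together, here is a rough back-of-the-envelope sketch in Python. It assumes the 30% labor premium and the 50% longer delivery time simply multiply, which is a simplification of mine. The baseline weekly rate and project length are made-up placeholders, not real market data.

```python
def labor_cost(weekly_rate, weeks, rate_multiplier=1.0, time_multiplier=1.0):
    """Labor cost = (weekly rate x rate multiplier) x (weeks x time multiplier)."""
    return (weekly_rate * rate_multiplier) * (weeks * time_multiplier)


# Hypothetical baseline for a SQL-based platform project (illustrative only).
SQL_WEEKLY_RATE = 4000.0
SQL_WEEKS = 10.0

sql_cost = labor_cost(SQL_WEEKLY_RATE, SQL_WEEKS)

# Apply the article's figures: 30% higher labor cost, 50% longer delivery.
databricks_cost = labor_cost(
    SQL_WEEKLY_RATE,
    SQL_WEEKS,
    rate_multiplier=1.30,
    time_multiplier=1.50,
)

# The ratio does not depend on the baseline numbers chosen above.
ratio = databricks_cost / sql_cost

print(f"SQL-based labor cost:  {sql_cost:,.0f}")
print(f"Databricks labor cost: {databricks_cost:,.0f}")
print(f"Ratio: {ratio:.2f}x")
```

Under that multiplicative assumption, the labor bill for the same deliverable comes out at roughly 1.95 times the SQL-based baseline, whatever baseline you plug in. If the premium and the delay overlap in practice, the real ratio would be lower.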

FACT: Even at the compute cost layer, Databricks is not more affordable than Snowflake. That is a myth, and one perpetuated by Databricks. On total cost of ownership, too, Snowflake is the lower-cost solution.

Who competes with Databricks? Leaving aside Snowflake and the database players, Databricks has no direct competitors in the marketplace other than Apache Spark itself, and Apache Spark is not a good alternative to Databricks. Maybe a GPU platform?

The way forward for Snowflake and Databricks

Databricks and Snowflake need to work together. It is a 1+1=3 relationship: together they can be a more powerful force. The way forward is Databricks for streaming ingest, rapid transformation, and rapid scoring, and Snowflake for business user consumption. Databricks can also take advantage of Snowflake's newest feature, Snowpark, for predictive model production.

The hyperscalers (GCP, AWS, and Azure) are going to come for you both and eventually catch up. One bad quarter and your company might be ripe for acquisition. So the path forward is for Snowflake and Databricks to get together and announce a powerful partnership.

Get over yourselves!

Experienced Data and Analytics guru. 30 years of hands-on keyboard experience. Love hiking, writing, reading, and constant learning. All content is my opinion.