Long Read: Databricks vs. Snowflake, the Final Chapter
Ideal Customer Profile, Macroeconomics and Streaming all tie in to the fundamental endgame for the data giants
Introduction
Over the last couple of years, we’ve seen something unprecedented in Data. Databricks and Snowflake have both become $1bn+ revenue companies, AI has taken off, and data has never been more important.
However, this has caused an immense amount of suffering and headache for the Data Community. The rapid growth means that there are no longer established categories, vendors tread on each other’s toes, and it’s no longer obvious what engineers should do.
Two of the biggest culprits for this are Databricks and Snowflake.
In a previous article, we showed how Databricks and Snowflake were looking to attack different streaming use-cases and potentially go after companies like Confluent.
The Streaming Wars 2.0
What not to miss in Real-time Data and AI in 2025
We’ll now see that this is indeed driving a wedge in ICP between Databricks and Snowflake, finally putting an end to what we believe is a “phony” rivalry between the two firms.
This will fundamentally put the two companies on very different paths, catering to very different personas doing very different things. We are therefore more and more likely to be in an age where companies run complex data platforms and require extensive integration.
Let’s dive in!
In this post I’ll make three key points:
1. The core moats of Snowflake and Databricks’ businesses are under attack — their icebergs are melting
2. Both of them need to pivot to win giant new territories — go find a new iceberg
3. For the flex, I will propose a possible and extreme path forward
Databricks and the Flink flinch
In party political terms, the “Apache Spark voter” is the electoral base for Databricks. But Spark is 11-year-old technology — as a technology moat, that’s positively Jurassic.
The Achilles’ heel for Spark is the growing number of low-latency real-time workloads — think fraud detection, personalization and, increasingly, AI agents. This weak spot was introduced in my Streaming Wars 2.0 post:
What is missing [in Databricks] is a real-time, event-based architecture. Sure, you can process huge batches of data quickly with Spark Structured Streaming, but it is not easy and I believe it is poorly suited to many of the event-based, operational workloads “Across the Atlantic”.
If you want to see a Brickster blink twice, ask about Apache Flink and enterprise customers’ ultra-low-latency workloads. Or mention the new “streamhouse” concept being developed by Ververica (an Alibaba spin-off), combining Flink with the new technologies Apache Paimon and Fluss. Mention ClickHouse and their massive Series C, raised so they can keep going after Databricks (and Snowflake) workloads.
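To make the latency gap concrete, here is a minimal PySpark sketch; the broker, topic and column names are hypothetical, not from anything Databricks ships. Even in “streaming” mode, classic Structured Streaming plans work as micro-batches, so end-to-end latency is floored by the trigger interval plus per-batch scheduling overhead, which is exactly the gap an event-at-a-time runtime like Flink attacks.

```python
# Minimal sketch, assuming a hypothetical Kafka broker and "payments"
# topic; none of these names come from the article.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fraud-scoring").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "payments")
    .load()
)

# Structured Streaming schedules work as micro-batches: this trigger
# fires at most once per second, so end-to-end latency can never beat
# the trigger interval plus per-batch planning cost.
query = (
    events.selectExpr("CAST(value AS STRING) AS payload")
    .writeStream.format("console")
    .trigger(processingTime="1 second")
    .start()
)
query.awaitTermination()
```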
Counter-measures to protect Spark are airborne: an under-reported release at DAIS was Spark Real-Time Mode. Spark is Databricks’ original open-source project — it’s the project for which Databricks has the most ‘moral authority’, to use an Index Ventures term.
As such, it’s incredibly important to the Databricks leadership that Spark doesn’t get “DeltaLaked” — it can’t afford for its signature technology to be caught on the wrong side of a strategic inflection point.
You’ve got Databricks Snowflake war all wrong; Tabular Acquired for $1bn
Databricks’ acquisition of Tabular shows the goal is far greater
“If you want big data processing and you want it done fast, vote Spark!” cries Databricks. “Flink is faster, easier and cheaper. The discerning voter votes Flink,” cries Ververica.
New territories for Databricks
Databricks can slow the melt-rate on its Spark iceberg. But the real trick is to keep expanding into new territories on the Risk game board: from “Unified Analytics Platform” to “The Data Lakehouse” to “The Data Intelligence Platform”. Where next?
Fundamentally Databricks can go up the stack towards the business user, or down the stack towards the full-stack developer and DevOps.
Now, the DAIS keynotes were a tour de force of a valley tech company at its peak, dropping great product after great product. No sign of that “difficult second album” at Moscone! But the myriad releases didn’t speak clearly to going up the stack OR going down the stack; it seemed to be a mix of both:
Databricks One — clearly going up the stack towards business users
Agent Bricks — interesting because it’s an agent builder, but it’s low-code. So it’s going up the stack towards “early majority IT”: if you like Alteryx or Matillion, you will love Agent Bricks
Databricks Apps — kind of mixed. The folk writing Databricks Apps will be general app developers and typical Databricks “builders”, but the apps should make data that is inside Databricks more accessible to Line of Business types. I don’t get this: I don’t think people who use Databricks want to build entire apps on it
Databricks LakeBase / Neon acquisition — definitely going down the stack towards full-stack engineers, vibe coders and AI engineers, targeting the massive TAM that is OLTP (far bigger than ETL or AI!). It could be fundamental for agents, with agent usage driven by business users
Databricks will go down the stack to picks and shovels
The moves up the stack definitely had some of the base worried — this was a banger of a post from Glen McCracken:

But more fundamentally as a strategic question: why bother going up the stack and building into multiple lines-of-business when you have 22k technical fans who go to all your concerts and will buy every album you put out?
AWS re:Invent had 60k in-person attendees in 2024; Databricks, with 22k in person at DAIS 2025, is already a third of the way there with way less than 30% of the primitive coverage… By way of comparison, Confluent’s Current 2024 event for Kafka engineers only pulled in 4k attendees *including online*. That’s fucking exceptional attendance.
Databricks’ Summit featured a folksy pre-recorded “I’m so excited!” chat between Ali Ghodsi and Satya Nadella. Databricks is one of only three full-OEM partnerships that Azure has — the others being OpenAI (which has a lot of baggage) and Redis (in the Honeymoon period) — why? Because Azure Databricks prints money for Azure.
And so I ask: is the real competition for Databricks a data warehousing & analytics company run by a seasoned ex-googler who loves adtech, or is it the hyperscaler that underpins so many Databricks logos?
In my view the play is becoming increasingly obvious.
Databricks will aim to win the hearts and minds of executives building on Azure. “Don’t do it on Azure. Of course they have some primitives, but Databricks is better, faster, and cheaper. Just do it over here instead.” Databricks wants to turn Azure into what AWS was for Snowflake — an increasingly low-level set of core primitives (S3 buckets and EC2 boxes become ADLS and VMs).
The YOLO play: buy Nebius
At some point, if you’re building all of the compute, storage, even networking services of a hyperscaler, why not speedrun to the end and *be* a hyperscaler?
Of course Databricks could still run nicely on the big three clouds, but for their 22k+ true fans they could offer an end-to-end cloud experience. Finally Databricks would own its own “route to the sea” — also providing a ton of negotiating leverage in talks with Azure and Amazon.
How to get there fast? The most obvious way is to acquire one of the fashionable new “GPU clouds”. CoreWeave is far too expensive, but Nebius, which houses a ton of hardcore engineers from Yandex Cloud, is a steal at $12bn.
Snowflake’s spiritual calling
Vincent van Gogh failed as an art dealer and preacherman. Michael Jordan wanted to be a pro baseball player. Jim Carrey tried to make it as a serious dramatic actor. They all eventually found what they were naturally good at — genius comes from within.
Snowflake wants to be a Silicon Valley AI factory — the core piece of AI infrastructure for people building with AI. This has been all over the Snowflake Summit and Snowflake World Tours for the last two years.
From the glossy keynote announcements to the in-person Sam Altman interview, Snowflake really want you to believe that data processing and Cortex are all you need to build production-grade AI applications.
In spite of this, budget-holders in blazers at Summit still outnumbered the builders with stickers on their laptops by 5 to 1. There were also many traditional IT folks in polos and slacks — all having a wonderful time.
This is because business folk appreciate what Snowflake itself seems unable to: that Snowflake has an *incredible* natural talent as a data platform for delivering next-gen line-of-business use cases and applications, especially ones with a heavy infusion of AI.
Software engineers at Netflix are not building their AI applications on Snowflake, but Bob McGraham (VP of Enterprise Data Architecture at United Liberty WestNorthern Community Trust) is now delivering a digital and AI transformation for a bank with $20bn in assets at lightning speed — something never seen before.
Snowflake’s iceberg isn’t melting because of Apache Iceberg — Apache Iceberg is just a storage format, one which Snowflake has done a great job of embracing.
In fact, I would say Snowflake is almost certainly the de facto option for anyone wanting to use Iceberg on AWS who isn’t already on Databricks. It certainly isn’t AWS Glue or S3.
Snowflake’s iceberg is melting because many people who like Iceberg look at Snowflake like Jeff Bezos looks at breakfast octopus. Not necessarily Snowflake users, but Iceberg suitors.
One of the most fascinating stories ever is Jeff Bezos having breakfast octopus.
So you must be thinking: what in the blue hell is breakfast octopus?

Central to this idea is cost, efficiency and ego.
It seems like the measure of a builder in data engineering is their ability to do things faster and cheaper. The ultimate BSD is someone who can reduce latency to a few minutes while decreasing cost by 10x.
Time does not matter. Total Cost of Ownership does not matter. “Your SQL compute margin is my opportunity to use DuckDB or ClickHouse” is what the people think — and indeed, many of them don’t just think, but do as well.
This talk by Jake Thomas from Okta is a great introduction to this playbook, and where Jake leads, other data engineers follow.
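For a flavour of what the playbook looks like in code, here is a minimal sketch; the bucket, path and column names are all invented. Point DuckDB straight at the Parquet files in object storage and skip the warehouse compute, and its margin, entirely.

```python
# Hypothetical sketch of the "your compute margin is my opportunity"
# playbook: query the lake directly with DuckDB instead of a warehouse.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")  # enables s3:// paths
con.execute("SET s3_region = 'us-east-1';")

# Aggregate straight off object storage; no warehouse credits burned.
daily_spend = con.execute(
    """
    SELECT order_date, SUM(amount) AS total_spend
    FROM read_parquet('s3://acme-lake/orders/*.parquet')
    GROUP BY order_date
    ORDER BY order_date
    """
).df()
print(daily_spend.head())
```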
Snowflake wants to appeal to builders, but builders are psychologically wired to route around Snowflake’s business model — which does not make for a happy family.
Snowflake need to go up the stack
If the builders want to commoditize compute but the business folks can’t get enough of you, then the correct direction of travel is clear.
Head up the stack into the various Lines of Business, starting with the ones that are most data and AI heavy.
Copy-paste the composable/headless playbook that imploded the packaged CDP category and roll it out to every other line of SaaS: every “source of truth”, every “system of record”. One by one.
Who is really Snowflake’s nemesis in this world? Is it Databricks? Or is it the father of SaaS, Salesforce, who wants to sell their own Data Cloud, just dropped $8bn to buy Informatica, and is now locking up the Slack data so rivals (and customers) can’t train AI on it?
Every large SaaS tool or “Business Application suite” (whatever you want to call it) can see the way things are headed and needs to pull as much data and AI into their sphere of influence as possible.
It is all tied to psychology, gravitational pull and spheres of influence.
If you’re the VP North America at SAP and your customer’s CIO already spends $20m a year with SAP, what do you do? Do you open up your platform and tell your customer to spend another $5m on other tools, or do you acquire and convince them with a few golf trips to “just do it in SAP”?
There are so many examples of this. Fabric no longer want you to go with Databricks; the “better together” rhetoric has been slashed because you can “just do it all in Azure”.
ServiceNow have acquired data.world. Salesforce have Informatica and the Salesforce Data Cloud (and of course, AgentForce). SAP are heavily partnering with Databricks. These are not moves that say “use Snowflake for your business critical AI workloads” — they are aggressive and designed to attack Snowflake’s moat.
We have seen this before, in marketing, an area which is typically very data-heavy.
Snowflake and other vendors realised that instead of sucking up all your data into a “Customer Data Platform” (or CDP) and analysing that data, figuring out where your marketing spend was working and where it wasn’t, and then pumping it out to different marketing automation tools, you could separate it out.
This created the implosion of packaged CDPs. For many of us data laymen, the first we probably heard of this was when Hightouch started plastering “composable CDP” everywhere — this was the tip of the iceberg and symptomatic of a much larger movement away from traditional CDPs like Adobe that were locking up Marketing spend while delivering awful returns.
CDPs pale in comparison to companies like Salesforce, ServiceNow, and SAP. Imagine what happens when these tools become composable too. They are not waiting for companies like Snowflake to figure this out — they are deploying counter-measures like “Data Clouds” and Databricks partnerships as fast as possible.
Snowflake going up the stack into Lines of Business doesn’t actually take away from all of their product build-out, from native apps and Cortex to the Crunchy Data OLTP and Openflow.
It just means that a primary audience for those features will be developers at ISV partners who are attacking these Line of Business workloads in a Snowflake-native way. Powered by Snowflake can become super strategic, and Snowflake can focus basically all their spend on Sales and Marketing (which they already do!).
How Snowflake increased ROI by 300% by visiting Cannes
Building Orchestra, I spend a lot of time getting our name out there by going to summits. I went to San Francisco, which was lit, but then I was dismayed to learn many Snowflake reps were going to Cannes and I wasn’t.
Why? The last time I went to Cannes I stayed in a mansion with JUUL, where we had hosted the e-cig wizard (the mansion was booked for a few weeks extra, so we made use of it) — and it was very fun. I would return.
Turns out it was because there is an event called Cannes Lions. Kind of like the Oscars for advertising, where CMOs fly in and LARP as characters from a Sofia Coppola movie.
The whole event screams “I collect contemporary art but also understand the semiotics of header bidding”. It made sense to me, as I neither have enough money to collect art nor understand anything about header bidding — maybe I will be invited next year (although advertising and media companies still seem to insist on doing massive Airflow deployments, so probably not).
The Snowflake folks in Cannes look like they’re having a great time, posting on LinkedIn in floaty white linens and hunting down their share of the Global 2000’s martech & adtech budget. They are moisturized, happy, in their lane, focused, and flourishing.
Snowflake CEO Sridhar Ramaswamy did not attend, but perhaps he should have: after all, he ran Ads & Commerce at Google and has a unique point of view on the future of search from building Neeva.
Sridhar would be a great addition to the barefoot industry panels on yachts, explaining the massive disruption coming to digital media and customer experience through LLMs.
Clearly, there are folks at Snowflake who understand that Cannes (and NRF, and Money 20/20) is where they need to hang out.
There are Snowflake folk who understand that the SaaS walled gardens and their “best of suite” Counter-Reformation are fundamentally hostile to the interests of modern data-driven enterprises.
I wrote about this over a year ago. What if all SaaS applications were built on Iceberg? Or at the very least, what if the crucial data ended up in centralised storage (Iceberg) without you having to do a massive ELT process… would you do it?
Migrating to an iceberg lakehouse: key architecture considerations
Why Apache Iceberg is heralding a new era of change in Data Engineering
“Bring Your Own Storage (BYOS)” has never been cooler
I would, and I already do, and so do many companies. Prequel already exists because people get that they just want their data in their own environment, so Prequel are creating a standard to help do it.
With Orchestra we don’t use Iceberg as our back-end for obvious reasons, but we open up our database, which means you can use dlt to load it into your destination instead.
What if every vendor did this? What if they dumped everything in Iceberg for you — wouldn’t that be great? No more ELT costs and no more horrible APIs to wrangle, plus an incentive model that makes sense.
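As a sketch of the consuming side, assuming a vendor exposed a read-only replica of their database (the connection string and table names below are all made up), dlt can replicate it into your warehouse in a few lines:

```python
# Hypothetical sketch: a vendor opens up a read-only replica of their
# database and you replicate it into your warehouse with dlt.
import dlt
from dlt.sources.sql_database import sql_database

source = sql_database(
    "postgresql://readonly:secret@vendor-replica.example.com/app_db",
    table_names=["accounts", "form_fills"],  # invented table names
)

pipeline = dlt.pipeline(
    pipeline_name="vendor_sync",
    destination="snowflake",  # credentials live in dlt's secrets.toml
    dataset_name="vendor_raw",
)

load_info = pipeline.run(source)
print(load_info)
```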
Does Snowflake leadership understand the rich potential of composable CRMs, CDPs and so on? It cannot exactly shout about it if it does. It would start a highly destructive value-war with many, many companies.
The YOLO play: acquire Hubspot
Salesforce has expanded into multiple adjacent and competitive categories. Why can’t Snowflake come up the stack and leverage all of its underlying primitives to offer a better, AI and warehouse-native CRM?
Customer system of record is a winner-takes-all market so there isn’t an easy-to-digest $3–5bn pureplay CRM for Snowflake to pick up. Pipedrive’s market penetration is way too low and SMBish, and building on one of the new AI CRM startups like Clarify.ai or Attio would take far too long.
That leaves Hubspot, at a market cap of $29bn (versus Snowflake at $74bn). Hubspot has a fantastic CRM business which is expanding into enterprise, no Data Cloud pretensions and a decent tier-two martech business.
On a personal note, I would love it if Hubspot was in Snowflake. A real-time event stream of CRM data (form fills etc.) combined with my real-time product data would basically allow me to use Orchestra to automate my entire Customer Success and Outreach function. I can’t do that today because the data feed from Hubspot to Snowflake isn’t “live” enough.
This is not a move to make Snowflake use Hubspot to dominate the CRM market. It’s a move to dominate the CRM market for Snowflake customers.
If you’re a Snowflake customer, and Snowflake can offer you live integrated CRM with no ELT costs, that’s fucking cool and every marketing executive should be bouncing off the walls to get that shit.
Why would you go to Attio or Clarify or whatever when Hubspot on Snowflake gives you your CRM data on a silver platter for free?
Sure, the Attios could integrate more tightly with ELT tools, but the service is still not the same. It’s still architecturally clunky. But what if they did what Orchestra does and gave you access to your metadata in real time? What if they built out that part of their back-end (the analytical data) on Snowflake and pushed it to you in a data share?
That would be the ultimate flex. Snowflake becomes a big stack bully. It doesn’t say “Hey look, Snowflake customers now have a CRM. I’m going to take my CRM to non-Snowflake customers and see if they want it”.
That’s a tough ol’ game. Instead they say “Hey Snowflake customer that uses a CRM that’s not ours — you’ve got all these issues with [Salesforce], remember those things you said you wanted to do in Cannes? Ha yeah those things. Well we can help you do that now, and the best thing is those pesky folks in IT always harping on about “openness” and “Owning our data”, well they were actually right — which is why CRM [blah], [blah] and [blah] you can still use because they’ve recently built on Snowflake too. Now how about that dinner I owe you?”
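Mechanically, the “pushed it to you in a data share” part is ordinary Snowflake plumbing. A rough sketch via the Python connector, where every account, database and table identifier is invented for illustration:

```python
# Rough sketch: a CRM vendor shares its analytical tables with a
# customer account via a Snowflake secure share. All identifiers here
# are invented.
import snowflake.connector

conn = snowflake.connector.connect(
    account="vendororg-vendoraccount",
    user="SHARE_ADMIN",
    password="...",  # use key-pair auth in anything real
    role="ACCOUNTADMIN",
)

statements = [
    "CREATE SHARE IF NOT EXISTS crm_share",
    "GRANT USAGE ON DATABASE crm_db TO SHARE crm_share",
    "GRANT USAGE ON SCHEMA crm_db.events TO SHARE crm_share",
    "GRANT SELECT ON TABLE crm_db.events.form_fills TO SHARE crm_share",
    # The customer mounts this as a read-only database: live CRM data,
    # no ELT pipeline, no Fivetran bill.
    "ALTER SHARE crm_share ADD ACCOUNTS = customerorg.customeraccount",
]
with conn.cursor() as cur:
    for stmt in statements:
        cur.execute(stmt)
conn.close()
```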
Snowflake force GTM teams everywhere to start segmenting their prospects in a new way: by data warehouse.
With SaaS experiences fully integrated with Snowflake being SO much better than ones that aren’t, you either go with that, or go with one that integrates with Snowflake.
Snowflake wins in both cases, and forces workloads like Hubspot’s tier-two martech automations into Snowflake, unleashing the power of composability — which is what we data folks want to do anyway.
This is how “Powered by Snowflake” goes mainstream.
Conclusion — the Snowflake Databricks chess war is over
Microsoft were very close to buying Neon too, and apparently moved too slowly.
Databricks are making up ground on Snowflake in SQL Land.
Both are expanding aggressively into AI and new use-cases. There is a phony war going on where Databricks are getting SQL revenue, but clearly it is not entirely at Snowflake’s expense, based on the chart above.
Databricks’ plan is pretty clear. It’s agents for their existing user base. Snowflake also want AI for their existing user base.
The strategy is working. There is just so much gravitational pull into Databricks and Snowflake they can almost sell whatever they want — someone is going to buy it.
I do not believe the two companies can continue to spat with each other and actually make money and grow how they need to. Snowflake are under attack from the big SaaS vendors — they should go after them with a suite of tools, marketing and sales playbooks designed to mirror the composable CDP story.
Ensuring those companies have an easy-to-use workflow builder will be critical.
The whole value prop of centralising these SaaS tools into Snowflake in a composable way hinges on business users (not just the data team) being able to automate AI workflows using data in Snowflake.
This is not possible with existing tooling and architecture. Data teams require 10+ tools to integrate, and frameworks like Airflow are hard to use, broken, and certainly not built in an AI-native way.
If Snowflake executes on these two parts of the strategy, companies using Snowflake and a Snowflake-native CRM could get immense value and break free of million-dollar Salesforce contracts and million-dollar Salesforce implementations. This is a huge, huge, huge value-add.
I used to work for a scale-up of 300 people that spent about £300k and six months on a Salesforce implementation. Incredibly expensive for what it was.
Don’t forget — the Snowflake-native CRM becomes a new standard of CRM for Snowflake customers. This forces other CRMs to “build on Snowflake” and be “powered by Snowflake”, which in turn drives Snowflake revenue through data landing in Snowflake, expanding budget for Data Teams as CRM companies take on the burden of responsibility for ELT.
This is attractive for practitioners, because we don’t need to spend any money with Fivetran or other ELT vendors for SaaS data. It just ends up in our warehouse.
Fundamentally, Snowflake start selling to data teams and business users for operational and commercial use-cases.
On the other hand, Databricks end up focussing solely on builder personas, technical data teams and especially those in Azure. They could buy Nebius or similar, but building their own Cloud is going to be very difficult to pull off.
This is because much of that data is already in Azure-owned VPCs. Databricks would need to get customers comfortable with moving data out of their Azure VPC, and Azure could retaliate with something like AWS’s S3 egress costs and kill the project dead in the water.
Instead, to prevent Fabric destroying Databricks revenues, Databricks will continue to invest in more features to entice the Microsoft personas away from Microsoft and into Databricks.
Weirdly, I think this will play out best for Snowflake. It is not easy to go up against a hyperscaler and win. The pull of Microsoft is enormous. Databricks may end up the same size as Snowflake for warehousing revenues — it may even end up larger — but it will be a Pyrrhic victory as it inevitably fights a much longer, more painful war with Microsoft.
Meanwhile, Snowflake will use warehousing as more of a proxy war as it starts to truly democratise AI and data workflows outside the SaaS walled gardens.
Exciting times. Please subscribe to the blog if you found this interesting, and comment your thoughts below!