The Rise of the Data Product Manager

You had the Rise of the Analytics Engineer; now it's time for the next one

Nov 17, 2023

How Data Product Managers use tools like Orchestra to run a great data stack.

Introduction

When I first heard that Data Product Managers existed, I was extremely fucking skeptical.

Coming from a data engineering background, I just couldn't fathom why (or how) a data product manager could deliver value over and above another technical member of a data team.

Now I knew I was biased. Product managers, in the software sense, can range from low value-add middle managers (like in any role) to visionaries who drive software teams to dizzying heights of productivity. Furthermore, in large organisations where products are expansive and complicated, it's simply not tenable to expect software engineers to be content to context-switch: to go from thinking in terms of design one moment to debugging gnarly JavaScript the next.

However, data's different. The architectural problems are often not complicated. And where they are, it's not really a "product" question. A simple architectural pattern would be to buy a data ingestion tool, buy a data store (data lake or warehouse, or both and do external tables), invest in some infrastructure to do basic transformations and testing, and you're more or less done (plus a bit of DevOps).

A simple data architecture. Image by the author. There aren't a million logos on this slide; there are 5, and really it's only 4 (you could remove the S3 one).

A more complicated structure would involve some kind of low latency and/or big data problem. The above architecture does not scale well, but often it doesn't need to. At this point, you'll need to be thinking about investing in some kind of message broker, certainly some kind of unstructured object store, and you'll have gnarly data transformations that require more custom infrastructure and processing power than simply dbt+Snowflake.

One box per service to ingest streaming data on Azure. Double the boxes if you want another environment. Image by the author (Source)

In the first example, the product manager adds nothing. The architecture, the fields people need and how to get there are all obvious. In the second, the problems are extremely low-level. Product Managers rarely go down to the technical level of designing infrastructure systems - that's what architects are for. It's no different with data, where you'll sometimes see larger companies hire an "enterprise data architect".

So what's changed?

What has changed

Extremely simple tooling

Something I believe the best product managers do is combine a bit of technical skill with their day-to-day. A product manager who knows a bit of React can easily implement fixes or small changes, and be very dangerous indeed! It's no different in data. Fivetran + dbt + Snowflake meant that, for a price, all you needed to set up a data team was an analytics engineer, the term obviously coined by dbt.

Sure - that analytics engineer wouldn't be expected to make good long-term infrastructure decisions. They might not be expected to build sustainable dbt models and patterns, or to set up end-to-end orchestration and testing frameworks. They'd probably rely a bit on the platform team for any infra needs too (namely, deploying dbt Core). They'd certainly not be in charge of elevating data literacy and advocating for data to the C-suite, but they might be in charge of building some dashboards for important people (yawn).

Something I've realised through personal experience at Codat, but also through others', is that the soft skills and the holistic things like improving data literacy, focussing fiercely on initiatives that build business value, and advocacy among executives are extremely important for a data team in a start-up environment (hell, even with enterprises moving to the cloud, it's the same).

If I had a choice to start my data team with a single AE or a Product Manager who knows SQL, I would choose the latter all day long. Had you asked me this two years ago, I would've said get a data engineer. But there's just no need. Ingestion pipelines are dead easy to set up. A Product Manager who knows how to model well but doesn't have the confidence to create dbt sprawl is awesome - because they're focused on business value. They don't even need to know SQL - you could use a tool like Coalesce.

The Coalesce data transformation user interface (UI). Image by the author

But what about orchestration, I hear you say? What about managing that infrastructure and having multiple environments for robust pipelines?

Not needed - we have companies that do this. Orchestra is one of them.

You can sign up to the beta here

It was revolutionary when dbt came along because it meant you didn't need any Python to get started as a data team, just SQL. Now, because tooling has advanced so much, you don't even need that.

Federated teams and approaches

Another approach that is becoming quite popular in larger organisations is that of having federated teams: teams that specialise in bits of the data pipeline. We know this as the traditional "data engineer vs. analytics engineer" paradigm.

What this means is that you can scale different parts independently. In the second example I spoke about earlier, where data engineers need to ingest a huge amount of data at low latencies, that can be its own team, without a product manager.

The data product manager can then work without worrying about how data arrives in the first place. Sure - to do it really properly they'll need to know how to check the data is up to scratch before kicking off their workflows (again, tools for this exist), and having tests in two places is a bit inefficient. But you get the benefit of treating data architecture in a decoupled, microservices style, which I personally like quite a lot.
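To make "checking the data is up to scratch before kicking off their workflows" a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the TableStats class, the ready_for_transformation function and the thresholds are hypothetical and not taken from any particular tool. In practice the stats would come from a query against the ingested table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class TableStats:
    """Hypothetical summary of a freshly ingested table."""
    row_count: int
    null_key_count: int
    last_loaded_at: datetime


def ready_for_transformation(
    stats: TableStats,
    now: datetime,
    min_rows: int = 1,
    max_age: timedelta = timedelta(hours=24),
) -> List[str]:
    """Return the reasons the data is not usable; an empty list means go."""
    failures = []
    if stats.row_count < min_rows:
        failures.append(
            f"expected at least {min_rows} rows, found {stats.row_count}"
        )
    if stats.null_key_count > 0:
        failures.append(f"{stats.null_key_count} rows have a null primary key")
    if now - stats.last_loaded_at > max_age:
        failures.append("data is older than the allowed maximum age")
    return failures


# Example: a stale, empty load should block the downstream workflow.
now = datetime(2023, 11, 17, 12, 0, tzinfo=timezone.utc)
stale = TableStats(row_count=0, null_key_count=0,
                   last_loaded_at=now - timedelta(hours=48))
for reason in ready_for_transformation(stale, now):
    print("blocked:", reason)
```

The point of returning reasons rather than a bare True/False is that whoever owns the downstream models can see why a run was held back without having to dig into the ingestion team's pipeline.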

Having a data product manager do this with a bit of analytics engineering resource will become more commonplace as it's so much more powerful to have someone advocating and focussing on ensuring data delivers value, than simply having AEs write code and maintain a Looker instance.

Databricks and Snowflake marketplaces

So far, I've made an implicit reference to only internal use-cases; data thought of "as a product" but where the customers are actually just employees in your company. Not much of a product, if you ask me.

The Databricks and Snowflake clouds have marketplace functionality, which is very powerful. Frank Slootman presented a knowledge graph of all the Snowflake instances in the Snowflake Data Cloud at their event in London earlier this quarter. TBC if that was an actual rendering of any real data, but the graphic looked very cool and it was, I thought, a good illustration of how different companies can share datasets between themselves.

The Snowflake Data Cloud. Image credit: Snowflake

A short aside - when I worked for JUUL (a company that makes and sells vapes) we would want to understand what our sales data was. This is non-trivial - think about all the places you can buy cigarettes or chewing gum. There are small chains, big supermarkets, the internet, corner shops - it's a difficult dataset to pin down. Fortunately IRI or Nielsen do it for you, but they charge you a pretty penny (I think it was about £150k a year for Sainsbury's data) and the delivery is total shit. It's all FTP or, most often, Excel files sent by email. Literally a nightmare.

End of aside - this is a big, big contrast to what the Databricks and Snowflake marketplaces offer. Imagine if IRI were on the Snowflake Data Cloud! You wouldn't even need to write any code. Just select * from IRI_DB.JUUL_SCHEMA.SAINSBURYS_DATA and you're good to go.

As this becomes more commonplace, and companies start selling more of their data (presumably at an aggregated level) to other companies, the function of this data, its quality and its SLA start to resemble more of a product. If something's not right, there needs to be someone on the end of an email. There have to be docs. There has to be an SLA. There has to be a "use-case" that isn't just "dashboarding" - this requires product-led thinking, and could be perfectly within the remit of a data product manager and analytics engineer to produce, which I think is extremely cool.
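One way to picture a dataset with an owner, docs, a use-case and an SLA is as a small contract that can be checked in code. The sketch below is purely illustrative: DataProductContract, sla_compliance, the dataset name, the email address and the URL are all made up for the example and don't correspond to any marketplace's actual API.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class DataProductContract:
    """Hypothetical description of a dataset sold to other companies."""
    name: str
    support_email: str             # someone on the end of an email
    docs_url: str                  # where the docs live
    use_case: str                  # something beyond "dashboarding"
    max_delivery_delay: timedelta  # the SLA itself


def sla_compliance(contract: DataProductContract,
                   delivery_delays: List[timedelta]) -> float:
    """Fraction of deliveries that arrived within the contract's SLA."""
    if not delivery_delays:
        raise ValueError("no deliveries to evaluate")
    on_time = sum(
        1 for delay in delivery_delays
        if delay <= contract.max_delivery_delay
    )
    return on_time / len(delivery_delays)


# Illustrative values only.
contract = DataProductContract(
    name="weekly_retail_sales",
    support_email="data-product@example.com",
    docs_url="https://example.com/docs/weekly_retail_sales",
    use_case="demand forecasting for retail suppliers",
    max_delivery_delay=timedelta(hours=6),
)
delays = [timedelta(hours=2), timedelta(hours=5),
          timedelta(hours=9), timedelta(hours=1)]
print(f"{sla_compliance(contract, delays):.0%} of deliveries met the SLA")
```

A number like this is the kind of thing a data product manager could report to customers, in the same way a software product team reports uptime.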

Conclusion

There are lots of things to be excited about in data but being an analytics engineer isn't one. While it's widely heralded as an awesome career path for business and data analysts wanting to be closer to the "code" (ahem - is SQL really a programming language? Do you even write tests? Do you even know what mocking is?), if it were me I would look beyond being the SQL-only person and brand myself as a data product manager. Writing awesome SQL and deploying it reliably and efficiently is indeed a skill, but it shouldn't be your main one. It should just be a facet you leverage to produce business value, perhaps as a data product manager.

