Platform Engineer Not Required: why the greatest barrier to Analytics is non-technical
Building a Great Data Team requires a clear Vision — rockstars can be your enemy
Substack note
I have long wondered what skills you truly need to deliver BI and Analytics en masse in an effective way. Coming from the business side, the obvious skill lacking in many (myself included) was the ability to communicate the complexity of data in a simple way anyone could understand.
Sadly, building data pipelines at scale is hard, so the technical folks who are good at that are rarely those who excel at communicating it to the business. This has created an enormous “data” sector with many roles, many permutations, and not a whole lot of success.
I believe we are now at an inflection point. For the vast majority of data use-cases, a managed ELT tool, a query engine, some basic SQL, a unified control plane and dashboarding knowledge are all that is required to implement them in a straightforward but scalable manner.
In this article, I explore how, but if you’re a technical leader the takeaway is basically:
Get someone who knows what they’re doing (from the human side).
Use Analytics Engineers, or people who know SQL, for the grunt work (a cheap source of labour, much better than data engineers, and good at talking to people).
Add Data Engineers when you really need Python.
Start with, and rely partially on, consultants. They implement these patterns all the time.
A Unified Control Plane is mandatory.
Introduction
Ever since Data Science became the sexiest job of the 21st century, similar professions have tried and failed to claim the same title.
Within the Data Space alone, there have been many contenders. One of the most well-known was the rise of the analytics engineer, brought about by dbt. But we have also seen the rise of the data engineer, the data product manager, and even the analytics pretendgineer.
One of the most common and well-thought-out titles for content in the Data and Software space is:
“The rise of the ____ ____”
For example
The rise of the pretendgineer (Benn Stancil)
The rise of the Analytics Engineer (dbt)
The rise of the Data Engineer (the internet)
The rise of the Data Product Manager (me)
Super catchy! But what’s the point? Why do we care?
Answer: we care because we’re interested in knowing if what we’re doing matters and is special. As new professions “rise”, they are the ones doing the important work. We want to be the ones doing that fun work. Remember when Data Science was cool?
But it also gets at something better and more interesting: what resources do you need to get something done?
This is why AI is so cool, at least for me. It means you can very easily and cheaply do something that would otherwise have required someone with very rare, very specialist knowledge, a very high degree of intellect and probably a very high salary.
So what’s happening in Data? What’s rising and what’s not?
In Data and Business Intelligence (but not AI) things are getting simpler
In Data and BI, I now believe the core technical skills required to build a successful data stack can be summarised in three letters:
SQL
Yes, ladies and gentlemen, I really believe there are patterns that make it incredibly easy to get stuff done properly. Contrary to what you may be reading about semantic fabric meshes, universal orchestration protocols and various three-letter acronyms in lower case beginning with the letter “d”, there is a tried and trusted pattern for getting data and business insights into production.
Consolidate data into a single place
Transform it
Surface it
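To make the three steps concrete, here is a minimal sketch, assuming an in-memory SQLite database as a stand-in for a real warehouse. The function name run_elt, the raw_orders table and the revenue_by_source view are hypothetical names invented for illustration; in practice a managed ELT tool would do the loading and a BI tool the surfacing. The point is only that the middle step is plain SQL.

```python
import sqlite3


def run_elt(sources):
    """Consolidate rows from several sources, transform them with SQL, surface a result.

    `sources` maps a (hypothetical) source name to a list of (order_id, amount) rows.
    """
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse

    # 1. Consolidate: land every source in a single raw table.
    conn.execute(
        "CREATE TABLE raw_orders (source TEXT, order_id INTEGER, amount REAL)"
    )
    for source_name, rows in sources.items():
        conn.executemany(
            "INSERT INTO raw_orders VALUES (?, ?, ?)",
            [(source_name, order_id, amount) for order_id, amount in rows],
        )

    # 2. Transform: nothing but SQL, here a view that aggregates per source.
    conn.execute(
        """
        CREATE VIEW revenue_by_source AS
        SELECT source, COUNT(*) AS orders, SUM(amount) AS revenue
        FROM raw_orders
        GROUP BY source
        """
    )

    # 3. Surface: the query a dashboard would run against the transformed view.
    result = conn.execute(
        "SELECT source, orders, revenue FROM revenue_by_source ORDER BY source"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    example_sources = {
        "shop": [(1, 20.0), (2, 30.0)],
        "app": [(3, 5.0)],
    }
    print(run_elt(example_sources))
```

Swapping SQLite for a cloud warehouse changes the connection and the loading step, but not the shape of the pattern.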
Now, I am not saying a bit of Python doesn’t go amiss here. DevOps will certainly come in handy. But when we speak of pure technical skill, we don’t need it anymore, and here is why:
The importance of Architecture, Data Modelling, and Common Sense
Something that falls out of this discussion is an increased focus on soft skills that don’t really sit nicely in the software engineering domain.
The importance of Data Architecture
Simply having the knowledge of what structure data pipelines should have is obviously incredibly important. Whoever is leading any data stack building initiatives needs to be able to justify why they’re doing what they’re doing.
This is not common knowledge. Typically the people with this knowledge are the people that have done it before. So as we’ve said before, having this knowledge (soft knowledge) overlaps with the hard knowledge of being able to build all this stuff from scratch.
Having some architectural basics is imperative, which leads us to the first topic in our two-week crash course:
Week 1: data architecture basics
You’ll understand: when to deviate from an ELT framework, when to use Spark, when to stream, when to use dbt, when to build vs. buy, when to use a monorepo orchestrator like Airflow, when to consider yourself a software engineer, how to make a really good espresso
You should read: probably Fundamentals of Data Engineering by Joe Reis and Matt Housley. Apart from this, the jury’s out. Anyone feel free to comment and help me out here?
The importance of Data Modelling
A noble shout-out has to go to data modelling.
The problem of code spaghetti is well documented.
This now exists in the data world as dbt spaghetti. I’ve spoken to countless hapless analytics engineers who spend 80% of their time debugging complicated DAGs with 2200 models.
Teams whose sole function is to answer basic business questions like “How many cars do we rent every month” or “what are we spending our marketing budget on”. Insanity.
Understand how to do data modelling. Understand when to say no. Push back — be a pain. Less is more, man.
Week 2, Days 1–4: data modelling basics
You’ll understand: star schema, data vault, when to deviate from star schema, One Big Table, dbt basics, what Coalesce.io is, how to set up Data Quality Testing frameworks, oh and Kimball (obviously), and Inmon (obviously)
You should read: Kimball and Inmon, inside out. This is all first-principles stuff, so if you, like me, like that, then you’ll be fine.
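As a rough illustration of why a small, well-modelled star schema beats 2200 models, here is a minimal sketch that answers the “How many cars do we rent every month” question from one fact table and two dimensions. The table names (fact_rental, dim_date, dim_car), the sample rows and the two helper functions are all hypothetical, and SQLite is only a stand-in for a real warehouse. The second function is one example of a basic data quality test: it counts fact rows that point at a missing dimension row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse

# A tiny star schema: one fact table referencing two dimension tables.
conn.executescript(
    """
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_car (car_key INTEGER PRIMARY KEY, model TEXT);
    CREATE TABLE fact_rental (
        rental_id INTEGER PRIMARY KEY,
        date_key INTEGER REFERENCES dim_date(date_key),
        car_key INTEGER REFERENCES dim_car(car_key),
        price REAL
    );
    INSERT INTO dim_date VALUES
        (20240115, 2024, 1), (20240120, 2024, 1), (20240203, 2024, 2);
    INSERT INTO dim_car VALUES (1, 'hatchback'), (2, 'van');
    INSERT INTO fact_rental VALUES
        (1, 20240115, 1, 40.0),
        (2, 20240120, 2, 75.0),
        (3, 20240203, 1, 40.0);
    """
)


def rentals_per_month(conn):
    """Answer the business question with a single join and a GROUP BY."""
    return conn.execute(
        """
        SELECT d.year, d.month, COUNT(*) AS rentals
        FROM fact_rental AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY d.year, d.month
        ORDER BY d.year, d.month
        """
    ).fetchall()


def orphaned_rentals(conn):
    """Data quality test: count fact rows whose car is missing from dim_car."""
    return conn.execute(
        """
        SELECT COUNT(*)
        FROM fact_rental AS f
        LEFT JOIN dim_car AS c ON c.car_key = f.car_key
        WHERE c.car_key IS NULL
        """
    ).fetchone()[0]


if __name__ == "__main__":
    print(rentals_per_month(conn))
    print(orphaned_rentals(conn))
```

A new question such as revenue per car model becomes another join against the same fact table rather than another layer of models.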
Why Data Teams need common sense
One of the biggest differences between Software Engineering and Data Engineering, particularly when you’re running a BI team (i.e. you don’t work at Netflix), is that if you fuck up, your CEO or CFO is on the call instead of a Product Manager.
This is a big deal — it means you need to learn how to speak to that person.
You won’t have a Product Manager fawning to learn your language or impress you by reading your code, understanding your database chat or explaining agile.
You’re on the back foot, and you had better know it.
Communication for anyone with “data” in their title is key. While a knowledge of technical components to “get stuff done” is imperative, you also need to be able to hold discussions with stakeholders about prioritisation, about timelines, about budgeting and so on.
Data allows you to answer hard questions (“Should we really be spending all this money on initiative A, B, C which have returns of 0, 10, 100 but you really care about A and B — shouldn’t we just do C?”) that make people feel uncomfortable — ensuring you don’t simply become the pariah of the organisation can be hard when you hold all the answers.
Days 5–7 are dedicated to emotional intelligence and communication.
Week 2, Days 5–7: communication, teamwork, empathy
You’ll understand: how to make your CEO prioritise, how to get Heads of departments to trust you, how to tell stories with Data, how to get your point across, how to make friends, how to say no (politely), manners, why your CEO is always right, why the CFO is probably your best friend
You should speak: to people that have done this type of thing before, probably consultants, since good consultants are good at this.
Conclusion — Send in the Clowns? Consultants
The development of technology in the data space has been enormous.
One of the missing pieces was the platform to knit everything together. Now that we have Orchestra, I genuinely believe the only technical barrier to achieving a world-class data team in 99% of cases is SQL.
The greater barrier is know-how. And as we’ve seen, typically those with know-how, a combination of expertise and soft skills, are those with technical backgrounds.
This results in no better outcome for the Enterprise or Start-up. A single rockstar or team of rockstars is still required.
But imagine if you had this expertise outsourced. Imagine if you could have it on demand, from a group of experts whose job it is to understand what to do and stay on top of trends, and who you can even ask for a refund if they get it wrong.
It just so happens these people exist. They are consultants. They always have done. Providing expertise is the perfect role for a consultant.
Give them your requirements
Take their recommendations
Have them do an initial build and a couple of hires
Be happy with a short three-month engagement: you now have a best-in-class data stack and two solid hires
This is by far the best pattern, in my eyes, for getting started with Business Intelligence.
Of course, you may be a company that requires a custom architecture. You may have petabytes of data that need processing every second. You may have low-latency AI use-cases that require GPUs for image processing or generation. You may be deploying miniaturised custom ML models onto phones to ensure your users never leave them.
If that’s you, I implore you, do anything but hire consultants. Do anything but leverage “Modern Data Stack” components. You’re building Products, you’re not powering dashboards.
However, for most of us, I do believe the path forward is finally much clearer. ☀️
Data Empathy Resources
Data Architecture Resources
James Serra’s book on Data Architecture
Mastering Data Warehouse Design: Relational and Dimensional Techniques by Claudia Imhoff, Nicholas Galemmo, Jonathan Geiger
A fantastic intro video from Alex Merced (link)
Steve Hoberman’s website (link)
Data Modelling Resources
Data Vault (link)
Data Mesh modelling (O’Reilly)
Data Modeling Essentials (Simsion and Witt)
Anything by John Giles, Steve Hoberman, DMC, Larry Burns, Bill Inmon, Venkat Subramaniam dev2next organizer, CJ Date
Corporate Information Factory by Bill Inmon, Claudia Imhoff
The Data Warehouse Toolkit by Kimball & Ross
For further reading…
anchormodeling.com by Lars Rönnbäck et al
Building a Scalable Data Warehouse with Data Vault 2.0 by Dan Linstedt and Michael Olschimke
Unified Star Schema by Francesco Puppini
Enterprise Model Patterns by David Hay is excellent.