Go from Analytics Engineer to Platform Engineer in 2 weeks
Data Platform Engineers do incredibly challenging work - work that is probably not necessary for the company you work for
Foreword
One of the biggest pains we see facing data engineers and analytics engineers is that they spend too much time maintaining boilerplate infrastructure, and still have no visibility into pipeline failures.
It means you’re constantly fighting fires and don’t have time to focus on building. What’s worse is that the business doesn’t trust the data.
At Orchestra we’re building a Unified Control Plane for Data Ops. A Data Status page if you like, with some incredible features to give data teams their time back so they can focus on building. You can try it now, free, here.
Introduction
There have been many “rises” of different professions in the data and tech space over the last few years. This started with the infamous “Rise of the Data Scientist”, dubbed the Sexiest Job of the 21st Century.
Since then, we’ve seen a few other iterations of this, presented below in a rough chronological order:
The rise of the Data Engineer (2017)
The rise of the Analytics Engineer (2022)
The rise of the Data Product Manager (2023)
The rise of the Analytics Pretendgineer (2024)
Super catchy! But what’s the point? Why do we care?
The answer is we care because we’re interested in knowing if what we’re doing matters and is special. As new professions “rise” they are the ones doing the important work. We want to be the ones doing that fun work. Remember when Data Science was cool?
Another dimension this relates to is the skills you need to get something done. When tasks become easier to complete, more people can do them, which means professions specialising in those tasks become relatively less valuable.
This is one of the reasons I think AI is genuinely cool. It means you can easily and cheaply do something that would otherwise have required someone with very specialist, very rare knowledge, and probably a very high salary.
Things are getting simpler
Historically to build a best-in-class data stack for BI and Analytics, you would need the following skills:
1. Knowledge of Data Architecture, to design the system
Who: Data Architect
2. Knowledge of Python and Data processing, to move data
Who: Data Engineer
3. Knowledge of DevOps and Infrastructure management, to run compute resources
Who: Data Platform Engineer
4. Knowledge of Data Modelling and SQL, to transform data
Who: Analytics Engineer
5. Knowledge of how to build great dashboards, to deliver insights
Who: Data Analyst
6. Knowledge of how to get organisations to understand data
Who: Team Lead / Head of Data / VP etc.
There are two common paths to becoming a data professional. Some folks come from software engineering and are naturally more technical. This persona is typically very strong in areas (1–3).
The other is to come from the business side. This persona is stronger in areas (4–6). Typically, skills can be summarised by the diagram below.
Generally speaking, and with exceptions, organisations seek the lowest common denominator when first tackling data. This might be by hiring a Data Engineer or Data Platform Engineer, since those personas are often multi-skilled and have some expertise in most areas.
The Skills Required are changing
ELT and Python
I am somewhat surprised nobody has written an article called the “Fall of the ETL developer”. The founders of data movement tools spotted an opportunity to help the data community when they realised thousands of ELT developers were writing the same code to move data from Salesforce to BigQuery.
Now, Data Engineers rely on third-party, commoditised tools for a lot of extraction and loading. This means there is less need for an in-house Python and ELT skillset.
Of course, there will always be edge cases that require some knowledge of Python. Your organisation may wish to do some streaming, which is not straightforward to implement well. But managed versions of these exist too, which again reduces the dependence of organisations on the traditional data engineering skillset.
DevOps, Infra and Architecture
The beauty of managed services is that often there is no infrastructure to maintain. Services also typically include software development best practices built-in, which means advanced knowledge of DevOps and Continuous Integration and Deployment is not necessary to make the most of them.
Architecture has also greatly simplified. Where organisations used to rely on complicated Extract-Transform-Load patterns, the popularity of Extract-Load-Transform patterns means those responsible for business logic and data transformation are no longer wedged into hardcore data engineering teams.
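To make the Extract-Load-Transform pattern concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a cloud warehouse. The table names, the sample rows and the function names are all assumptions made for illustration. In a real stack a managed tool would do the load step and the transform would typically live in a SQL modelling tool.

```python
import sqlite3

# Hypothetical raw records, standing in for what a managed
# extract-and-load tool would land in the warehouse untouched.
RAW_ORDERS = [
    ("o-1", "acme", "2024-01-03", 120.0),
    ("o-2", "acme", "2024-01-09", 80.0),
    ("o-3", "globex", "2024-01-11", 200.0),
]


def load_raw(conn: sqlite3.Connection) -> None:
    """Extract + Load: land the data as-is, with no business logic."""
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id TEXT, customer TEXT, ordered_on TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", RAW_ORDERS)


def transform(conn: sqlite3.Connection) -> None:
    """Transform: business logic is plain SQL, run inside the warehouse."""
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer, COUNT(*) AS order_count, SUM(amount) AS revenue
        FROM raw_orders
        GROUP BY customer
        """
    )


def run_elt() -> list[tuple]:
    """Run load then transform against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load_raw(conn)
    transform(conn)
    rows = conn.execute(
        "SELECT customer, order_count, revenue "
        "FROM customer_revenue ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_elt():
        print(row)
```

The point of the split is that the transform step needs only SQL, so whoever owns the business logic can own it without touching the extraction code.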
Orchestration and Monitoring
The eagle-eyed among you will notice an Apache Airflow logo in the image above. That’s because with a modular stack like this, you also need to join the stack up. This typically requires lots of custom Python code and some infrastructure management to run jobs.
However, there are now many orchestrators that also provide managed versions. Airflow is no longer the only option. There are even Modern Data Stack tools integrated with each other, which means you don’t even need to write the Python code to integrate the tools if you keep your architecture simple.
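As a rough illustration of what "joining the stack up" means, the sketch below runs three placeholder tasks in dependency order using the standard library's graphlib module. The task names and the `run_pipeline` helper are hypothetical, and this is not how Airflow or any particular orchestrator is implemented. It only shows the core idea of a dependency graph that an orchestrator resolves for you.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical run log so we can see what executed and in which order.
RUN_LOG: list[str] = []


def extract_load() -> None:
    RUN_LOG.append("extract_load: raw data landed")


def transform() -> None:
    RUN_LOG.append("transform: models rebuilt")


def refresh_dashboard() -> None:
    RUN_LOG.append("refresh_dashboard: BI extract refreshed")


# Placeholder tasks; in practice each would call a tool's API.
TASKS: dict[str, Callable[[], None]] = {
    "extract_load": extract_load,
    "transform": transform,
    "refresh_dashboard": refresh_dashboard,
}

# Each task maps to the set of tasks that must finish before it starts.
DEPENDS_ON: dict[str, set[str]] = {
    "extract_load": set(),
    "transform": {"extract_load"},
    "refresh_dashboard": {"transform"},
}


def run_pipeline(
    tasks: dict[str, Callable[[], None]],
    depends_on: dict[str, set[str]],
) -> list[str]:
    """Run every task after its upstream tasks; return the completed names."""
    completed: list[str] = []
    for name in TopologicalSorter(depends_on).static_order():
        tasks[name]()  # an exception here stops all downstream tasks
        completed.append(name)
    return completed


if __name__ == "__main__":
    print(run_pipeline(TASKS, DEPENDS_ON))
    print("\n".join(RUN_LOG))
```

Scheduling, retries, alerting and the infrastructure to run this reliably are the parts a managed orchestrator takes off your hands.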
Become a Platform Engineer in two weeks
I’m not going to argue that someone without any knowledge of data architecture can expect to be listened to when making data architecture decisions.
But I do believe with a bit of architecture knowledge and an awareness of the right managed options, you can build something that is both scalable and best-in-class as an Analytics Engineer.
However, to do that you’ll need to brush up in three areas.
1. Data Modelling
2. Data Architecture
3. Empathy
Hopefully you already have knowledge of (1), but if you’re reading this and aren’t already building data warehouses, this should be your first port of call.
Knowledge of Data Architecture is important for people to listen to you. If you can successfully argue for a simple architecture, then the ELT pattern is yours to own.
The final skill is empathy. Many Data Teams are in the unfortunate situation of working in companies that simply do not understand Data. I am of the opinion it is our job, as data practitioners, to bring business stakeholders on that journey with us — and for that, we should learn about empathy.
The 2 week Plan
Data Architecture
Simply having the knowledge of what structure data pipelines should have is obviously incredibly important. Whoever is leading any data stack building initiatives needs to be able to justify why they’re doing what they’re doing.
This is not common knowledge. Typically the people with this knowledge are the people that have done it before. So as we’ve said before, having this knowledge (soft knowledge) overlaps with the hard knowledge of being able to build all this stuff from scratch.
Having some architectural basics is imperative, which leads us to our first topic in our 2 week crash course:
Week 1: Data Architecture basics
You’ll understand: when to deviate from an ELT framework, when to use Spark, when to stream, when to use dbt, when to build vs. buy, when to use a monorepo orchestrator like Airflow, when to consider yourself a software engineer.
You should read: James Serra’s book on Data Architectures
Week 2: Data Modelling
Knowledge of Data Modelling is imperative.
Week 2, Days 1–4: Data modelling basics
You’ll understand: star schema, data vault, when to deviate from star schema, One Big Table, dbt basics, what is Coalesce.io, how to use Data Quality Testing frameworks, oh and Kimball (obviously), and Inmon (obviously)
You should read: The Database relational model by CJ Date, Dimensional Modelling techniques (Kimball), Mastering Data Warehouse Design (Imhoff)
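For a feel of what a star schema and basic data quality tests look like in practice, here is a small sketch using sqlite3. The tables, the sample rows (including a deliberately orphaned fact row) and the helper functions are assumptions for illustration. The three checks mirror the uniqueness, not-null and relationship tests that data quality testing frameworks commonly offer, but this is not any framework's actual implementation.

```python
import sqlite3


def build_star_schema(conn: sqlite3.Connection) -> None:
    """One dimension and one fact table; 'o-3' points at a missing customer."""
    conn.executescript(
        """
        CREATE TABLE dim_customer (customer_key INTEGER, customer_name TEXT);
        CREATE TABLE fact_orders (order_id TEXT, customer_key INTEGER, amount REAL);
        INSERT INTO dim_customer VALUES (1, 'acme'), (2, 'globex');
        INSERT INTO fact_orders VALUES
            ('o-1', 1, 120.0), ('o-2', 1, 80.0), ('o-3', 3, 200.0);
        """
    )


def count_not_null_failures(conn, table: str, column: str) -> int:
    """Rows where the column is NULL."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(sql).fetchone()[0]


def count_unique_failures(conn, table: str, column: str) -> int:
    """Distinct values that appear more than once."""
    sql = (
        f"SELECT COUNT(*) FROM (SELECT {column} FROM {table} "
        f"GROUP BY {column} HAVING COUNT(*) > 1)"
    )
    return conn.execute(sql).fetchone()[0]


def count_relationship_failures(conn, child: str, parent: str, key: str) -> int:
    """Fact rows whose key has no matching row in the dimension."""
    sql = (
        f"SELECT COUNT(*) FROM {child} c LEFT JOIN {parent} p "
        f"ON c.{key} = p.{key} WHERE c.{key} IS NOT NULL AND p.{key} IS NULL"
    )
    return conn.execute(sql).fetchone()[0]


def run_checks() -> dict[str, int]:
    """Build the schema and return the number of failing rows per check."""
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn)
    results = {
        "dim_customer.customer_key is unique": count_unique_failures(
            conn, "dim_customer", "customer_key"
        ),
        "fact_orders.order_id is not null": count_not_null_failures(
            conn, "fact_orders", "order_id"
        ),
        "fact_orders.customer_key exists in dim_customer": (
            count_relationship_failures(
                conn, "fact_orders", "dim_customer", "customer_key"
            )
        ),
    }
    conn.close()
    return results


if __name__ == "__main__":
    for check, failures in run_checks().items():
        print(f"{'PASS' if failures == 0 else 'FAIL'} ({failures}): {check}")
```

A check passes when it returns zero failing rows, so the orphaned order in the sample data surfaces as a failed relationship test rather than as a silently wrong dashboard number.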
Dealing with the Business
Days 5–7 are dedicated to emotional intelligence and communication.
Week 2, Days 5–7: Communication, teamwork, empathy
You’ll understand: how to make your CEO prioritise, how to get Heads of to trust you, how to tell stories with Data, how to get your point across, how to make friends, how to say no (politely), manners, why your CEO is always right, why the CFO is probably your best friend
You should read: Influence by Robert Cialdini, Thinking, Fast and Slow (Kahneman), The Science of Selling (Audiobook)
Conclusion
There is obviously no hope of becoming a master of Data Architecture, Data Modelling and Company-Influencer in 2 weeks. However, you can genuinely have a decent stab at it with the resources mentioned above.
If your organisation grows, then at some point you will undoubtedly need data engineers. The rise of streaming, Iceberg, data mesh, and generally just scale mean that specialisation is almost inevitable.
However we are at an interesting juncture, where a handful of analytics engineers can probably suffice for a small (< Series C) Technology Company, or a mid-sized (c.1,000 employees) company in a traditional sector.
The skill gap preventing those budding data professionals from taking things to the next level tends to be a rock-solid foundation in data modelling, a bit of architecture knowledge and, of course, the skill to navigate office politics.
I’m very excited to be working in Data at a time when technology is genuinely democratising how companies do analytics. If anyone has any recommendations for additional resources for our fictional training course, do get in touch.
I have no affiliation to any of the resources or authors mentioned in this article.
Find out more about Orchestra
Orchestra is a unified control plane for Data and AI Operations.
We help Data Teams spend less time maintaining infrastructure, make them proactive instead of reactive, and ultimately win trust in data and AI from the Business.
We do this by consolidating Orchestration with monitoring, data quality testing, and data discovery. You don’t need a separate observability, lineage or catalog tool with Orchestra.
Interesting article! Heads up there’s an O’Reilly book on Platform Engineering coming out in November, but I always take software eng’s take on platform engineering with a grain of salt. Like you said, with the managed services we in data have at our disposal, we won’t necessarily be managing a bunch of Docker, Kubernetes, and bash scripts. It’s more about making sure everything orchestrates well and plays nicely with the data lake[house] and/or data warehouse
Also, not sure I agree with the strict hierarchy of roles you show. It’s more of a Venn diagram to me (or an Euler diagram?) - Analysts tend to have skills DE’s don’t, and Analyst can promote up to AnEng